Why Pipeline?

Many trading algorithms have the following structure:

  1. For each asset in a known (large) set, compute N scalar values for the asset based on a trailing window of data.
  2. Select a smaller tradeable set of assets based on the values computed in (1).
  3. Calculate desired portfolio weights on the set of assets selected in (2).
  4. Place orders to move the algorithm's current portfolio allocations to the desired weights computed in (3).

There are several technical challenges with doing this robustly. These include:

  • efficiently querying large sets of assets
  • performing computations on large sets of assets
  • handling adjustments (splits and dividends)
  • asset delistings

Pipeline exists to solve these challenges by providing a uniform API for expressing computations on a diverse collection of datasets.

Research Notebooks vs Zipline Backtests

An ideal algorithm design workflow involves a research phase and an implementation phase. In the research phase, we can interact with data or quickly iterate on different ideas in a notebook. Algorithms are then implemented in Zipline where they can be backtested.

One feature of the Pipeline API is that constructing a pipeline is identical in a research notebook and in a Zipline algorithm. The only difference between using pipeline in the two environments is how it is run. This makes it easy to iterate on a pipeline design in research and then move it with a simple copy paste to a Zipline algorithm.

Computations

There are three types of computations that can be expressed in a pipeline: factors, filters, and classifiers. Abstractly, factors, filters, and classifiers all represent functions that produce a value from an asset and a moment in time. Factors, filters, and classifiers are distinguished by the types of values they produce.

Factors

A factor is a function from an asset and a moment in time to a numerical value. A simple example of a factor is the most recent price of a security. Given a security and a specific point in time, the most recent price is a number. Another example is the 10-day average trading volume of a security. Factors are most commonly used to assign values to securities which can then be used in a number of ways. A factor can be used in each of the following procedures:

  • computing target weights
  • generating alpha signal
  • constructing other, more complex factors
  • constructing filters

Filters

A filter is a function from an asset and a moment in time to a boolean. An example of a filter is a function indicating whether a security's price is below \$10. Given a security and a point in time, this evaluates to either True or False. Filters are most commonly used for describing sets of assets to include or exclude for some particular purpose.

Classifiers

A classifier is a function from an asset and a moment in time to a categorical output. More specifically, a classifier produces a string or an int that doesn't represent a numerical value (e.g. an integer label such as a sector code). Classifiers are most commonly used for grouping assets for complex transformations on Factor outputs. An example of a classifier is the exchange on which an asset is currently being traded.

Datasets

Pipeline computations can be performed using a variety of data such as pricing (OHLC) and volume data, fundamental data, and securities master data. We will explore these datasets in later lessons.

A typical pipeline usually involves multiple computations and datasets. In this tutorial, we will build up to a pipeline that selects liquid securities with large changes between their 10-day and 30-day average prices.


Next Lesson: Creating a Pipeline