End-of-Day Trading Rules

The momentum hypothesis holds that excess returns can be generated by buying recent winners. In this notebook we will use Zipline's Pipeline API to research the momentum factor on our sample data.

Zipline Strategy Structure

When developing a Zipline strategy, a good first step is to decide which of your trading rules use end-of-day data and which use intraday data.

A typical Zipline strategy uses the Pipeline API for end-of-day trading logic. Zipline "pipelines" run once per trading day before the market opens. They compute alpha factors from prior-day data and use those factors to filter a large universe down to a manageable number of securities.

A typical Zipline strategy then uses intraday data (which in live trading comes from a real-time data feed) to apply additional trading logic to this filtered universe of securities and make trading decisions throughout the trading day.

Run Pipeline

You can create and run pipelines interactively in a notebook. Although you could proceed directly to writing your Zipline strategy in a .py file, starting in a notebook is a great way to validate your code interactively before transitioning to a backtest, where debugging can be more laborious.

The first step is to define the pipeline. A pipeline has two main attributes: columns, which defines one or more factors to compute, and screen, which filters the pipeline to a subset of securities. Here, we filter the starting universe to include only stocks with 30-day average dollar volume above 10 million dollars, and for these securities we calculate a 12-month return (252 trading days):

In [1]:
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume, Returns

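pipeline = Pipeline(
    columns={
        # 252 trading days is roughly one calendar year
        "1y_returns": Returns(window_length=252),
    },
    # keep only securities whose 30-day average dollar volume exceeds $10 million
    screen=AverageDollarVolume(window_length=30) > 10e6
)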

The above code merely defines the pipeline but does not return any data. To compute the pipeline, we must run it on our sample data bundle.
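
Before running the pipeline, it can help to see roughly what the two built-in factors compute. The following is a minimal pandas sketch on made-up data, not Zipline's implementation. It assumes that a returns factor compares the last close in the window to the first, and that average dollar volume is the mean of close times volume over the window. The tickers, prices, window length, and threshold are all hypothetical.

```python
import pandas as pd

# Hypothetical daily closes and volumes for two made-up tickers.
dates = pd.date_range("2017-01-03", periods=4, freq="B")
closes = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0, 15.0], "BBB": [20.0, 19.0, 18.0, 16.0]},
    index=dates,
)
volumes = pd.DataFrame(
    {"AAA": [1000, 1000, 2000, 2000], "BBB": [500, 500, 500, 500]},
    index=dates,
)

window = 3  # toy window; the pipeline above uses 252 and 30

# Return over the window: last close versus first close in the window
# (assumption about how a window-based returns factor behaves).
window_returns = closes.pct_change(window - 1)

# Average dollar volume: mean of close * volume over the window.
avg_dollar_volume = (closes * volumes).rolling(window).mean()

# Screen: keep a security on a given day only if it is liquid enough.
passes_screen = avg_dollar_volume > 20_000  # hypothetical threshold

print(window_returns.iloc[-1])
print(passes_screen.iloc[-1])
```

On the last day of this toy data only AAA passes the screen, so only AAA would appear in that day's output.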

Since we will be using the same data bundle repeatedly in our analysis, we can set it as the default bundle to avoid always having to type the name of the bundle:

In [2]:
from quantrocket.zipline import set_default_bundle
set_default_bundle("usstock-free-1min")
Out[2]:
{'status': 'successfully set default bundle'}

And now we run the pipeline:

In [3]:
from zipline.research import run_pipeline
factors = run_pipeline(pipeline, start_date="2017-01-01", end_date="2019-01-01")
factors.head()
Out[3]:
                                                          1y_returns
2017-01-03 00:00:00+00:00 Equity(FIBBG00B3T3HD3 [AA])       0.923288
                          Equity(FIBBG000B9XRY4 [AAPL])     0.123843
                          Equity(FIBBG000BKZB36 [HD])       0.044736
                          Equity(FIBBG000BMHYD1 [JNJ])      0.179002
                          Equity(FIBBG000BFWKC0 [MON])      0.100381

For each date in the requested date range, the resulting DataFrame contains a row for each security that passed our screen on that date, plus a column for each of our requested factors in columns.
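
Because the result is indexed by both date and security, ordinary pandas group operations work on it. The sketch below builds a small stand-in DataFrame with the same two-level index shape and counts how many securities passed the screen on each date. The tickers and values are hypothetical, and plain strings stand in for Zipline's Equity objects.

```python
import pandas as pd

# Hypothetical stand-in for pipeline output: (date, asset) MultiIndex, one factor column.
day1 = pd.Timestamp("2017-01-03", tz="UTC")
day2 = pd.Timestamp("2017-01-04", tz="UTC")
index = pd.MultiIndex.from_tuples(
    [
        (day1, "AAA"),
        (day1, "BBB"),
        (day1, "CCC"),
        (day2, "AAA"),
        (day2, "CCC"),  # BBB did not pass the screen on this date
    ]
)
toy_factors = pd.DataFrame(
    {"1y_returns": [0.10, 0.25, -0.05, 0.12, -0.02]},
    index=index,
)

# Level 0 of the index is the date, so grouping on it gives per-day statistics.
securities_per_day = toy_factors.groupby(level=0).size()
print(securities_per_day)
```

A per-day count like this is a quick check that the screen leaves a reasonable number of securities throughout the date range.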

The run_pipeline function is only used in notebooks. In a Zipline strategy, you access pipeline results one date at a time (through the pipeline_output function). To get the exact data structure you'll use in Zipline, simply select a single date like this:

In [4]:
factors = factors.xs("2017-01-03")
factors.head()
Out[4]:
                               1y_returns
Equity(FIBBG00B3T3HD3 [AA])      0.923288
Equity(FIBBG000B9XRY4 [AAPL])    0.123843
Equity(FIBBG000BKZB36 [HD])      0.044736
Equity(FIBBG000BMHYD1 [JNJ])     0.179002
Equity(FIBBG000BFWKC0 [MON])     0.100381
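
To check logic against more than one date, a notebook can mimic the one-date-at-a-time access pattern by looping over the dates in the multi-date result. This is a minimal sketch on hypothetical data (made-up tickers and returns, with strings in place of Equity objects). It is not how Zipline itself delivers pipeline output.

```python
import pandas as pd

# Hypothetical multi-date pipeline-style output.
dates = pd.to_datetime(["2017-01-03", "2017-01-04"], utc=True)
index = pd.MultiIndex.from_product([dates, ["AAA", "BBB", "CCC"]])
toy_factors = pd.DataFrame(
    {"1y_returns": [0.10, 0.25, -0.05, 0.12, 0.20, 0.30]},
    index=index,
)

daily_leaders = {}
for date, daily in toy_factors.groupby(level=0):
    # Drop the date level so the frame is indexed by asset only,
    # the same shape as a single-date slice.
    daily = daily.droplevel(0)
    daily_leaders[date.date().isoformat()] = daily["1y_returns"].idxmax()

print(daily_leaders)
```

Each pass through the loop sees one day's rows indexed by security, so the loop body can be written as if it were handling a single day of pipeline output.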

By selecting a single day of pipeline output in a notebook, you can go ahead and write the end-of-day trading logic you will use in your strategy, and later transfer it to a .py file. Here, we sort the pipeline output by one-year returns and select the top 3 securities. These are the stocks our example strategy will buy.

In [5]:
returns = factors["1y_returns"].sort_values(ascending=False)
winners = returns.index[:3]
winners
Out[5]:
Index([ Equity(FIBBG00B3T3HD3 [AA]), Equity(FIBBG000GZQ728 [XOM]),
       Equity(FIBBG000BMHYD1 [JNJ])],
      dtype='object')
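
Wrapping the selection in a small function makes it easier to move into a .py file later. The helper below, select_winners, is a hypothetical name rather than part of Zipline. It is a sketch that also drops missing values first, on the assumption that securities without a full year of history may have no return value.

```python
import pandas as pd


def select_winners(daily_factors: pd.DataFrame, column: str = "1y_returns", n: int = 3) -> pd.Index:
    """Return the index labels of the n highest values in `column`, ignoring NaNs."""
    ranked = daily_factors[column].dropna().sort_values(ascending=False)
    return ranked.index[:n]


# Hypothetical single-day pipeline output with made-up tickers.
toy_daily = pd.DataFrame(
    {"1y_returns": [0.92, 0.12, 0.04, 0.18, float("nan"), 0.30]},
    index=["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
)

toy_winners = select_winners(toy_daily)
print(list(toy_winners))
```

Here EEE is excluded because its return is missing, and the three highest remaining returns belong to AAA, FFF, and DDD.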