Historical Data Collection

For backtesting, we will collect 1-minute bid-ask bars for all CL futures.

First, start IB Gateway:

In [1]:
from quantrocket.ibg import start_gateways
start_gateways(wait=True)
Out[1]:
{'ibg1': {'status': 'running'}}

Collect CL futures chain

Collect contract details for all available CL futures:

In [2]:
from quantrocket.master import collect_ibkr_listings
collect_ibkr_listings(exchanges="NYMEX", sec_types="FUT", symbols="CL")
Out[2]:
{'status': 'the IBKR listing details will be collected asynchronously'}

Monitor flightlog for a completion message:

quantrocket.master: INFO Collecting NYMEX FUT listings from IBKR website (CL only)
quantrocket.master: INFO Requesting details for 1 NYMEX listings found on IBKR website
quantrocket.master: INFO Saved 140 NYMEX listings to securities master database

Define universe of CL futures

Next we define a universe of CL futures for easy reference. To do so, download a CSV of CL futures from the securities master database:

In [3]:
from quantrocket.master import download_master_file
download_master_file("cl_futures.csv", exchanges="NYMEX", sec_types="FUT", symbols="CL")

Then upload the CSV to create the "cl-fut" universe:

In [4]:
from quantrocket.master import create_universe
create_universe("cl-fut", infilepath_or_buffer="cl_futures.csv")
Out[4]:
{'code': 'cl-fut', 'provided': 140, 'inserted': 140, 'total_after_insert': 140}

Define rollover rules

To define calendar spreads, we need rollover rules that specify which contract is treated as the front month and which contracts are treated as the various back months. Example rules are defined in quantrocket.master.rollover.yml, which rolls over 10 business days before expiration. See the usage guide for more rollover rule options.

The master service looks for this file in the codeload directory, so move it there to install it:

In [ ]:
# move file over unless it already exists
![ -e /codeload/quantrocket.master.rollover.y*ml ] && echo 'oops, the file already exists!' || mv quantrocket.master.rollover.yml /codeload/
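
To make the "10 business days before expiration" rule concrete, here is a minimal sketch of the date arithmetic in plain Python. It is not how the master service implements rollover; the helper name and the expiration date are hypothetical, and the sketch treats every weekday as a business day (exchange holidays are ignored).

```python
from datetime import date, timedelta

def subtract_business_days(d: date, n: int) -> date:
    """Step back n weekdays (Mon-Fri) from d. Assumption: holidays are ignored."""
    current = d
    remaining = n
    while remaining > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current

# Hypothetical expiration date, for illustration only
expiry = date(2024, 3, 20)
rollover_date = subtract_business_days(expiry, 10)
print(rollover_date)
```

Under this rule, the contract stops being treated as the front month on the computed date and the next contract in the chain takes its place.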

Collect historical data

Next we create a database for 1-minute historical data with the following parameters:

  • bar_type: The BID_ASK bar type provides the average bid and ask over the period of the bar.
  • outside_rth: We opt to include data from outside regular trading hours so that our moving averages and Bollinger Bands aren't jumpy.
  • shard: We shard/partition the database by month, resulting in a separate database per month (see the usage guide for more on sharding).
  • start_date: We set the start date to 2.5 years ago. IBKR only provides historical data for futures that expired less than 2 years ago, but the IBKR API will sometimes search unsuccessfully for data much earlier than that, which slows down data collection.

In [5]:
from quantrocket.history import create_ibkr_db
import pandas as pd

# Start 2.5 years back (365 * 2.5 = 912.5 days), formatted as an ISO date string
start_date = (pd.Timestamp.today() - pd.Timedelta(days=365*2.5)).date().isoformat()

create_ibkr_db("cl-1min-bbo",
               universes="cl-fut",
               bar_size="1 min",
               bar_type="BID_ASK",
               outside_rth=True,
               shard="month",
               start_date=start_date
              )
Out[5]:
{'status': 'successfully created quantrocket.v2.history.cl-1min-bbo.sqlite'}
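
To illustrate what "the average bid and ask over the period of the bar" could mean, the sketch below computes a time-weighted average of quote updates within a single 1-minute bar. The quote values and the helper name are hypothetical, and time-weighting is an assumption made for illustration; the source only states that the bar holds the average bid and ask.

```python
def time_weighted_average(quotes, bar_seconds=60):
    """Average a price over one bar, weighting each quote by how long it was in effect.

    quotes: list of (offset_seconds, price), sorted by offset, first offset 0.
    """
    total = 0.0
    for i, (start, price) in enumerate(quotes):
        # A quote stays in effect until the next update or the end of the bar
        end = quotes[i + 1][0] if i + 1 < len(quotes) else bar_seconds
        total += price * (end - start)
    return total / bar_seconds

# Hypothetical quote updates within one bar: (seconds into bar, price)
bids = [(0, 70.00), (30, 70.10)]
asks = [(0, 70.02), (30, 70.12)]

avg_bid = time_weighted_average(bids)
avg_ask = time_weighted_average(asks)
print(round(avg_bid, 4), round(avg_ask, 4))
```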

Then we collect the data. Be prepared for intraday data collection to take some time (perhaps a day or so depending on several variables).

In [6]:
from quantrocket.history import collect_history
collect_history("cl-1min-bbo")
Out[6]:
{'status': 'the historical data will be collected asynchronously'}

Monitor flightlog for completion:

quantrocket.history: INFO [cl-1min-bbo] Collecting history from IBKR for 144 securities in cl-1min-bbo
...
quantrocket.history: INFO [cl-1min-bbo] Saved 22664 total records for 50 total securities to quantrocket.v2.history.cl-1min-bbo.sqlite
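
If you would rather detect completion programmatically than watch the log, a line like the one above can be matched with a regular expression. This is a rough sketch that assumes the message format shown in this notebook; the pattern and function name are hypothetical and not part of the QuantRocket API.

```python
import re

# Assumption: completion messages follow the format shown in the flightlog above
COMPLETION_RE = re.compile(
    r"quantrocket\.history: INFO \[(?P<db>[^\]]+)\] "
    r"Saved (?P<records>\d+) total records for (?P<securities>\d+) total securities"
)

def parse_completion(line):
    """Return db code, record count and security count, or None if not a completion line."""
    match = COMPLETION_RE.search(line)
    if match is None:
        return None
    return {
        "db": match["db"],
        "records": int(match["records"]),
        "securities": int(match["securities"]),
    }

line = (
    "quantrocket.history: INFO [cl-1min-bbo] Saved 22664 total records "
    "for 50 total securities to quantrocket.v2.history.cl-1min-bbo.sqlite"
)
result = parse_completion(line)
print(result)
```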