Universe Selection

The Alpha Architect white paper calls for the trading strategy to run on the universe of NYSE stocks, excluding financials, REITs, and ADRs. Thus our first step is to create universes that define these different groups of securities.

All NYSE securities

First, download a CSV of all NYSE securities from the securities master. Passing fields="sharadar*" includes all Sharadar master fields in the output, and vendors="sharadar" limits the file to securities that are available from Sharadar.

In [1]:
from quantrocket.master import download_master_file
download_master_file("sharadar_nyse_securities.csv", exchanges="NYSE", fields="sharadar*", vendors="sharadar")

We can use the file to create the universe of all NYSE securities:

In [2]:
from quantrocket.master import create_universe
create_universe("nyse-stk", "sharadar_nyse_securities.csv")
Out[2]:
{'code': 'nyse-stk',
 'provided': 6777,
 'inserted': 6777,
 'total_after_insert': 6777}

Financials

Next we create a universe of financials. We'll exclude this universe (along with REITs and ADRs) when it comes time to run our backtest.

First, load the securities into pandas and list the sectors:

In [3]:
import pandas as pd
nyse_securities = pd.read_csv("sharadar_nyse_securities.csv")
nyse_securities.sharadar_Sector.unique()
Out[3]:
array(['Financial Services', 'Real Estate', 'Utilities', nan,
       'Industrials', 'Healthcare', 'Basic Materials',
       'Consumer Cyclical', 'Energy', 'Communication Services',
       'Consumer Defensive', 'Technology'], dtype=object)

In the Sharadar data, the financial sector is called "Financial Services". We filter the DataFrame to stocks in this sector, write them to a CSV file, and upload the file to create the universe of financial stocks:

In [4]:
nyse_securities[nyse_securities.sharadar_Sector == "Financial Services"].to_csv("sharadar_nyse_financials.csv")
create_universe("nyse-financials", "sharadar_nyse_financials.csv")
Out[4]:
{'code': 'nyse-financials',
 'provided': 872,
 'inserted': 872,
 'total_after_insert': 872}
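
If you would rather not leave intermediate CSV files on disk, the filtered rows can be written to an in-memory buffer instead. The following is a minimal sketch on a toy DataFrame: the Sid values are invented, and the commented-out create_universe call assumes the function accepts a file-like object in place of a file path, which should be checked against the QuantRocket API reference.

```python
import io

import pandas as pd

# Toy stand-in for the master file; the Sid values are made up.
securities = pd.DataFrame({
    "Sid": ["FI0001", "FI0002", "FI0003"],
    "sharadar_Sector": ["Financial Services", "Energy", "Financial Services"],
})

financials = securities[securities.sharadar_Sector == "Financial Services"]

# Write the CSV into an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
financials.to_csv(buffer, index=False)
buffer.seek(0)  # rewind so the next reader starts at the beginning

# Assumption: create_universe accepts a file-like object as well as a path.
# create_universe("nyse-financials", buffer)

# Reading the buffer back confirms it holds only the filtered rows.
round_trip = pd.read_csv(buffer)
```

Passing index=False keeps the DataFrame's row index out of the CSV, so the file contains only the master fields.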

REITs

Next we create a universe of REITs. From inspecting the master file we know that REITs are identified in the "sharadar_Industry" column:

In [5]:
nyse_securities[nyse_securities.sharadar_Industry.fillna("").str.contains("REIT")].to_csv("sharadar_nyse_reits.csv")
create_universe("nyse-reits", "sharadar_nyse_reits.csv")
Out[5]:
{'code': 'nyse-reits',
 'provided': 637,
 'inserted': 637,
 'total_after_insert': 637}
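
The fillna("") call in the filter above matters because some securities have no industry, and a missing value cannot be used as a True/False filter entry. The sketch below, on a toy Series with invented industry values, shows the tutorial's approach next to an equivalent one that uses the na parameter of str.contains.

```python
import numpy as np
import pandas as pd

# Toy industry column with one missing value; the values are illustrative.
industries = pd.Series(
    ["REIT - Retail", np.nan, "Banks - Regional", "REIT - Office"]
)

# The tutorial's approach: replace missing values with an empty string,
# which can never contain "REIT".
filled_mask = industries.fillna("").str.contains("REIT")

# Equivalent: tell str.contains to treat missing values as non-matches.
na_false_mask = industries.str.contains("REIT", na=False)

reits = industries[filled_mask]
```

Either mask selects the same rows, so the choice between them is a matter of taste.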

ADRs

To create a universe of ADRs, we can take advantage of the "sharadar_Category" field in the Sharadar data, which contains this information. First, look at the available categories:

In [6]:
nyse_securities.sharadar_Category.unique()
Out[6]:
array(['Domestic Preferred', 'ETD', 'ADR Preferred', 'Domestic', nan,
       'ETN', 'CEF', 'ETF', 'Domestic Primary', 'ADR', 'Canadian',
       'Domestic Secondary', 'ADR Primary', 'ADR Secondary',
       'Canadian Primary', 'Domestic Warrant', 'Canadian Preferred',
       'ADR Warrant', 'Canadian Warrant'], dtype=object)
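
All of the ADR categories begin with "ADR", so we can select them with a prefix match and preview the result:

In [8]: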
nyse_securities[nyse_securities.sharadar_Category.fillna("").str.startswith("ADR")][["sharadar_Ticker","sharadar_Name","sharadar_Category"]].head()
Out[8]:
    sharadar_Ticker  sharadar_Name                 sharadar_Category
6   BCS-PD           Barclays Plc                  ADR Preferred
12  HSEA             Hsbc Holdings Plc             ADR Preferred
14  BCS-PA           Barclays Plc                  ADR Preferred
25  NBG-PA           National Bank Of Greece Sa    ADR Preferred
26  AHL-PA           Aspen Insurance Holdings Ltd  ADR Preferred

Then create the ADR universe:

In [9]:
nyse_securities[nyse_securities.sharadar_Category.fillna("").str.startswith("ADR")].to_csv("sharadar_nyse_adrs.csv")
create_universe("nyse-adrs", "sharadar_nyse_adrs.csv")
Out[9]:
{'code': 'nyse-adrs',
 'provided': 656,
 'inserted': 656,
 'total_after_insert': 656}
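
To see how the three exclusion groups interact, the same filters can be combined in pandas. This is only an illustrative sketch on toy rows (the Sids and classifications are invented), not the way the backtest itself applies the exclusions.

```python
import pandas as pd

# Toy rows; Sids and classifications are invented for illustration.
securities = pd.DataFrame({
    "Sid": ["S1", "S2", "S3", "S4", "S5"],
    "sharadar_Sector": [
        "Financial Services", "Real Estate", "Energy",
        "Financial Services", None,
    ],
    "sharadar_Industry": [
        "Banks - Regional", "REIT - Office", "Oil & Gas E&P",
        "Banks - Diversified", None,
    ],
    "sharadar_Category": ["Domestic", "Domestic", "Domestic", "ADR", "ETF"],
})

# The same three filters used to build the universes above.
is_financial = securities.sharadar_Sector == "Financial Services"
is_reit = securities.sharadar_Industry.fillna("").str.contains("REIT")
is_adr = securities.sharadar_Category.fillna("").str.startswith("ADR")

# A security can fall into more than one group (S4 is both a financial
# and an ADR), so combine the masks with OR before negating.
excluded = is_financial | is_reit | is_adr
remaining = securities[~excluded]
```

Because the groups can overlap, the number of securities removed may be smaller than the sum of the three universe sizes.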