Interactive Strategy Development

We'll develop our strategy logic interactively in a notebook, peeking at the DataFrames as we go, then transfer the code to a .py file for backtesting with Moonshot.

As a reminder, here are the rules of the QVAL strategy as outlined in the Alpha Architect white paper:

  1. Universe selection
    1. Starting universe: all NYSE stocks
    2. Exclude financials, ADRs, REITs
    3. Liquidity screen: select top N percent of stocks by market cap (N=60)
  2. Apply value screen: select cheapest N percent of stocks by enterprise multiple (EV/EBIT) (N=10)
  3. Rank by quality: of the value stocks, select the N percent with the highest quality, as ranked by Piotroski F-Score (N=50)
  4. Apply equal weights
  5. Rebalance portfolio quarterly
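
Before writing any code, it helps to see how the screens compound. A back-of-the-envelope sketch, assuming a hypothetical starting universe of 1,600 stocks (the figures are illustrative only):

```python
# hypothetical universe size, for illustration only
universe_size = 1600

liquid = universe_size * 0.60     # top 60% by market cap -> 960 stocks
cheap = liquid * 0.10             # cheapest 10% by EV/EBIT -> 96 stocks
quality = cheap * 0.50            # top 50% by F-Score -> 48 stocks
```

So each screen prunes aggressively, and the final portfolio is a small fraction of the starting universe.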

Step 1.A-B: Query historical prices

Start by querying historical prices from your Sharadar history database. We specify our universe of NYSE stocks as well as the universes we wish to exclude.

For now we limit ourselves to a couple years of data to make it easier to work with. Later we'll run a backtest using a larger date range.

In [1]:
from quantrocket import get_prices

DB = "sharadar-us-stk-1d"
UNIVERSES = "nyse-stk"
EXCLUDE_UNIVERSES = ["nyse-financials", "nyse-reits", "nyse-adrs"]

prices = get_prices(DB, 
                    start_date="2014-01-01",
                    end_date="2016-01-01", 
                    universes=UNIVERSES,
                    exclude_universes=EXCLUDE_UNIVERSES, 
                    fields=["Close", "Volume"])
prices.tail()
Out[1]:
Sid                FIBBG0000018G2  FIBBG000001J87  ...  QA000000021599  QA000000021660
Field  Date
Volume 2015-12-24           344.0             0.0  ...         19740.0       2487841.0
       2015-12-28             0.0             0.0  ...         21078.0       4331514.0
       2015-12-29           970.0             0.0  ...         29483.0       5834458.0
       2015-12-30             6.0             0.0  ...         22549.0       4393923.0
       2015-12-31            15.0             0.0  ...         21686.0       6288810.0

5 rows × 1600 columns

Step 1.C: Filter by dollar volume

The QVAL white paper calls for limiting the universe to the top 60% of stocks by market cap. Although market cap is available in the Sharadar fundamental data, here we will use dollar volume as a proxy for market cap.

The code below will compute daily ranks by dollar volume and give us a boolean mask indicating which stocks have adequate dollar volume.

In [2]:
closes = prices.loc["Close"]
volumes = prices.loc["Volume"]

# calculate 90 day average dollar volume
avg_dollar_volumes = (closes * volumes).rolling(90).mean()

# rank biggest to smallest; pct=True gives percentile ranks between 0 and 1
dollar_volume_ranks = avg_dollar_volumes.rank(axis=1, ascending=False, pct=True)

have_adequate_dollar_volumes = dollar_volume_ranks <= 0.60
have_adequate_dollar_volumes.tail()
Out[2]:
Sid         FIBBG0000018G2  FIBBG000001J87  ...  QA000000021599  QA000000021660
Date
2015-12-24           False           False  ...           False            True
2015-12-28           False           False  ...           False            True
2015-12-29           False           False  ...           False            True
2015-12-30           False           False  ...           False            True
2015-12-31           False           False  ...           False            True

5 rows × 1600 columns

We'll use this filter in the next step.
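To see how the percentile ranks behave, here is a toy example with five hypothetical stocks on a single day. With `ascending=False`, the largest dollar volume receives the smallest percentile rank, so `rank <= 0.60` keeps the biggest 60% of names:

```python
import pandas as pd

# hypothetical average dollar volumes for five stocks on one day
avg_dollar_volumes = pd.DataFrame(
    {"A": [900.0], "B": [500.0], "C": [300.0], "D": [100.0], "E": [50.0]},
    index=pd.to_datetime(["2015-12-31"]))

# percentile ranks: A=0.2, B=0.4, C=0.6, D=0.8, E=1.0
ranks = avg_dollar_volumes.rank(axis=1, ascending=False, pct=True)

have_adequate = ranks <= 0.60   # keeps A, B, C; drops D, E
```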

Step 2: Apply value screen

Next, we use Sharadar fundamentals to identify the cheapest 10% of stocks as measured by the enterprise multiple (EV/EBIT).

The function get_sharadar_fundamentals_reindexed_like returns a DataFrame of fundamentals matching the dates and sids (security IDs) of the input DataFrame (in this case closes). The resulting DataFrame gives us the latest fundamental values as of any given date.

In [3]:
from quantrocket.fundamental import get_sharadar_fundamentals_reindexed_like

# Request EV/EBIT. The dimension "ART" (= "As reported - trailing twelve months") excludes restatements.
fundamentals = get_sharadar_fundamentals_reindexed_like(closes, fields=["EVEBIT"], dimension="ART")
enterprise_multiples = fundamentals.loc["EVEBIT"]

# Ignore negative enterprise multiples, which indicate negative earnings
enterprise_multiples = enterprise_multiples.where(enterprise_multiples > 0)

enterprise_multiples.tail()
Out[3]:
Sid         FIBBG0000018G2  FIBBG000001J87  ...  QA000000021599  QA000000021660
Date
2015-12-24             NaN             NaN  ...             8.0            16.0
2015-12-28             NaN             NaN  ...             8.0            16.0
2015-12-29             NaN             NaN  ...             8.0            16.0
2015-12-30             NaN             NaN  ...             8.0            16.0
2015-12-31             NaN             NaN  ...             8.0            16.0

5 rows × 1600 columns

We identify value stocks by ranking on enterprise multiple, but we only apply the rankings to stocks with adequate dollar volume:

In [4]:
value_ranks = enterprise_multiples.where(have_adequate_dollar_volumes).rank(axis=1, ascending=True, pct=True)
are_value_stocks = value_ranks <= (0.10)
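
The `.where()` / `.rank()` combination deserves a quick illustration: `.where(mask)` replaces excluded stocks with NaN, and `rank()` skips NaNs entirely, so illiquid names can never sneak into the value decile. A toy example with hypothetical multiples:

```python
import pandas as pd

multiples = pd.DataFrame(
    {"A": [5.0], "B": [8.0], "C": [3.0], "D": [20.0]},
    index=pd.to_datetime(["2015-12-31"]))
liquid = pd.DataFrame(
    {"A": [True], "B": [True], "C": [False], "D": [True]},
    index=multiples.index)

# C is the cheapest overall but is illiquid, so it gets NaN and no rank;
# among the rest: A=1/3, B=2/3, D=1.0
ranks = multiples.where(liquid).rank(axis=1, ascending=True, pct=True)
```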

Step 3: Filter by quality

The next step is to rank the value stocks by quality and select the top 50%. To do this we must calculate the Piotroski F-Score.

Piotroski F-Score

First, we query the relevant indicators to calculate the F-Score:

In [5]:
fundamentals = get_sharadar_fundamentals_reindexed_like(
    closes,
    dimension="ART", # As-reported trailing twelve months reports
    fields=[
        "ROA", # Return on assets
        "ASSETS", # Total Assets
        "NCFO", # Net Cash Flow from Operations
        "DE", # Debt to Equity Ratio
        "CURRENTRATIO", # Current ratio
        "SHARESWA", # Outstanding shares
        "GROSSMARGIN", # Gross margin
        "ASSETTURNOVER", # Asset turnover
])

return_on_assets = fundamentals.loc["ROA"]
total_assets = fundamentals.loc["ASSETS"]
operating_cash_flows = fundamentals.loc["NCFO"]
leverages = fundamentals.loc["DE"]
current_ratios = fundamentals.loc["CURRENTRATIO"]
shares_out = fundamentals.loc["SHARESWA"]
gross_margins = fundamentals.loc["GROSSMARGIN"]
asset_turnovers = fundamentals.loc["ASSETTURNOVER"]

Many Piotroski F-Score components compare current values to the previous period's values, so next we need DataFrames of the previous period's values. To do this in pandas, we identify which dates represent new fiscal periods, shift the previous period's value into the new period, and fill forward:

In [6]:
fundamentals = get_sharadar_fundamentals_reindexed_like(
    closes,
    dimension="ART", # As-reported trailing twelve month reports
    fields=["REPORTPERIOD"])

# get a boolean mask of the first day of each newly reported fiscal period
fiscal_periods = fundamentals.loc["REPORTPERIOD"]
are_new_fiscal_periods = fiscal_periods != fiscal_periods.shift()

# shift the ROAs forward one fiscal period by (1) shifting the ratios one day,
# (2) keeping only the ones that fall on the first day of the newly reported
# fiscal period, and (3) forward-filling
previous_return_on_assets = return_on_assets.shift().where(are_new_fiscal_periods).ffill()

# repeat for the other indicators
previous_leverages = leverages.shift().where(are_new_fiscal_periods).ffill()
previous_current_ratios = current_ratios.shift().where(are_new_fiscal_periods).ffill()
previous_shares_out = shares_out.shift().where(are_new_fiscal_periods).ffill()
previous_gross_margins = gross_margins.shift().where(are_new_fiscal_periods).ffill()
previous_asset_turnovers = asset_turnovers.shift().where(are_new_fiscal_periods).ffill()
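
The shift/where/ffill idiom is easier to see on a toy Series. Assume (hypothetically) a daily ROA series that stays constant within each fiscal period, two days per period:

```python
import pandas as pd

fiscal_periods = pd.Series(["Q1", "Q1", "Q2", "Q2", "Q3", "Q3"])
roas = pd.Series([0.05, 0.05, 0.07, 0.07, 0.04, 0.04])

# True on the first day of each newly reported period
are_new = fiscal_periods != fiscal_periods.shift()

# previous period's ROA, visible on every day of the current period
previous_roas = roas.shift().where(are_new).ffill()
# -> [NaN, NaN, 0.05, 0.05, 0.07, 0.07]
```

During the Q2 days the previous ROA is Q1's 0.05, and during Q3 it is Q2's 0.07; the first period has no predecessor, so it stays NaN.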

Now we can calculate the F-Score components; each resulting component is a DataFrame of booleans:

In [7]:
have_positive_return_on_assets = return_on_assets > 0
have_positive_operating_cash_flows = operating_cash_flows > 0
have_increasing_return_on_assets = return_on_assets > previous_return_on_assets
have_more_cash_flow_than_incomes = operating_cash_flows / total_assets > return_on_assets
have_decreasing_leverages = leverages < previous_leverages
have_increasing_current_ratios = current_ratios > previous_current_ratios
have_no_new_shares = shares_out <= previous_shares_out
have_increasing_gross_margins = gross_margins > previous_gross_margins
have_increasing_asset_turnovers = asset_turnovers > previous_asset_turnovers

Lastly, we convert the booleans to integers and sum to get the F-Score, which is a number between 0 and 9:

In [8]:
f_scores = (
    have_positive_return_on_assets.astype(int)
    + have_positive_operating_cash_flows.astype(int)
    + have_increasing_return_on_assets.astype(int)
    + have_more_cash_flow_than_incomes.astype(int)
    + have_decreasing_leverages.astype(int)
    + have_increasing_current_ratios.astype(int)
    + have_no_new_shares.astype(int)
    + have_increasing_gross_margins.astype(int)
    + have_increasing_asset_turnovers.astype(int)
)
f_scores.tail()
Out[8]:
Sid         FIBBG0000018G2  FIBBG000001J87  ...  QA000000021599  QA000000021660
Date
2015-12-24               0               0  ...               7               5
2015-12-28               0               0  ...               7               5
2015-12-29               0               0  ...               7               5
2015-12-30               0               0  ...               7               5
2015-12-31               0               0  ...               7               5

5 rows × 1600 columns
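
One subtlety to be aware of: in pandas, comparisons involving NaN evaluate to False, so a stock with missing fundamentals silently scores 0 on that component rather than propagating the NaN. A minimal sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

roas = pd.Series([0.05, np.nan])
previous_roas = pd.Series([0.03, 0.04])

# NaN > 0.04 evaluates to False, not NaN
have_increasing_roas = roas > previous_roas

print(have_increasing_roas.astype(int).tolist())  # [1, 0]
```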

Rank by F-Score

Now that we have the F-scores we can rank the value stocks by quality and select the top 50%. This gives us our DataFrame of long signals, which by convention we cast from boolean to int.

In [9]:
quality_ranks = f_scores.where(are_value_stocks).rank(axis=1, ascending=False, pct=True)
long_signals = quality_ranks <= (0.50)
long_signals = long_signals.astype(int)

Step 4: Apply equal weights

The QVAL strategy trades an equal-weighted portfolio. By convention, for an unlevered strategy the daily weights should add up to 1 (=100% invested), so we divide each day's signals by the number of signals to get the individual position weights:

In [10]:
daily_signal_counts = long_signals.abs().sum(axis=1)
daily_signal_counts.tail()
Out[10]:
Date
2015-12-24    49
2015-12-28    49
2015-12-29    49
2015-12-30    49
2015-12-31    49
dtype: int64
In [11]:
weights = long_signals.div(daily_signal_counts, axis=0).fillna(0)
weights.where(weights!=0).stack().tail()
Out[11]:
Date        Sid           
2015-12-31  FIBBG0029SNR63    0.020408
            FIBBG002B917C3    0.020408
            FIBBG00BN961G4    0.020408
            FIBBG00GTKSSK4    0.020408
            FIBBG00R2NHQ65    0.020408
dtype: float64
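
As a sanity check, each day's weights should sum to 1 whenever there is at least one signal. A toy example with hypothetical signals for three stocks over two days:

```python
import pandas as pd

long_signals = pd.DataFrame(
    {"A": [1, 1], "B": [0, 1], "C": [1, 0]},
    index=pd.to_datetime(["2015-12-30", "2015-12-31"]))

counts = long_signals.abs().sum(axis=1)             # 2 signals per day
weights = long_signals.div(counts, axis=0).fillna(0)
# each row sums to 1.0, with 0.5 in each held name and 0.0 elsewhere
```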

Step 5: Rebalance quarterly

Currently we have a DataFrame of signals that change every day, but the QVAL strategy calls for quarterly rebalancing.

To accomplish this with pandas, we resample the DataFrame to quarterly, take the last signal of the quarter, then reindex back to daily and fill forward:

In [12]:
# Resample daily to quarterly, taking the last day's signal
# For pandas offset aliases, see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
weights = weights.resample("Q").last()

# Reindex back to daily and fill forward
weights = weights.reindex(closes.index, method="ffill")
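
A toy Series shows the effect: daily values collapse to the quarter-end value, which then carries forward until the next quarter-end. (The dates below are hypothetical; note that newer pandas versions prefer the "QE" offset alias over "Q".)

```python
import pandas as pd

dates = pd.date_range("2015-09-28", "2015-10-05")
daily = pd.Series(range(len(dates)), index=dates, dtype=float)

# keep only the last value of each quarter (2015-09-30 and 2015-12-31)
quarterly = daily.resample("Q").last()

# back to daily: days before the first quarter-end are NaN; days after
# 2015-09-30 hold its value until the next quarter-end
rebalanced = quarterly.reindex(dates, method="ffill")
```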

Step 6: Positions and returns

The DataFrame of weights represents what we want to own, as calculated at the end of the day. Assuming we enter positions the next day, we simply shift the weights forward to simulate our positions:

In [13]:
positions = weights.shift()

To calculate the gross return (before costs), we multiply each security's percent change over the period by the size of the position.

Since positions represents when we enter the position, we must shift positions forward one more period to mark the end of the holding period: the percent change from one close to the next accrues to the position entered at the earlier close, not to the position we are just entering.

In [14]:
position_ends = positions.shift()
gross_returns = closes.pct_change() * position_ends

TIP: Proper alignment is important to avoid lookahead bias. Inspect your DataFrames as you go to validate your logic. If your DataFrames are large, as in this strategy, creating some toy DataFrames can be a good way to see what's happening:

In [15]:
import pandas as pd
toy_closes = pd.Series([50,100,50])
toy_pct_changes = toy_closes.pct_change()
toy_positions = pd.Series([0,1,0])
toy_position_ends = toy_positions.shift()
toy_returns = toy_pct_changes * toy_position_ends
pd.concat({
    "close": toy_closes,
    "pct_change": toy_pct_changes,
    "position": toy_positions,
    "position_ends": toy_position_ends,
    "return": toy_returns}, 
    axis=1)
Out[15]:
   close  pct_change  position  position_ends  return
0     50         NaN         0            NaN     NaN
1    100         1.0         1            0.0     0.0
2     50        -0.5         0            1.0    -0.5

Next Up

Part 4: Moonshot Backtest