Moonshot Strategy Code

The file kitchensink_ml.py contains the strategy code.

Prices to features

Due to the large number of features, the strategy's prices_to_features calls a variety of helper methods to create the various categories of features. Not only does this improve code readability but it also allows intermediate DataFrames to be garbage-collected more frequently, reducing memory usage.

Fundamental features

The method add_fundamental_features adds various fundamental values and ratios. For each fundamental field, we choose to rank the stocks and use the rank as the feature, rather than the raw fundamental value. This is meant to ensure more uniform scaling of features. For example:

features["enterprise_multiples_ranks"] = enterprise_multiples.rank(axis=1, pct=True).fillna(0.5)

The parameter pct=True causes pandas to express each stock's rank as a fraction between 0 and 1, nicely scaling the data. We use fillna(0.5) to place NaNs at the center rather than at either extreme, so that the model does not interpret them as having either a good or bad rank.
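To make the effect of pct=True and fillna(0.5) concrete, here is a minimal sketch on toy data. The tickers and values are made up for illustration and are not taken from the strategy file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are dates, columns are securities
enterprise_multiples = pd.DataFrame(
    {"AAA": [4.0, 5.0], "BBB": [8.0, np.nan], "CCC": [12.0, 3.0], "DDD": [16.0, 9.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)

# Rank across securities on each date (axis=1), expressed as a fraction
ranks = enterprise_multiples.rank(axis=1, pct=True)

# Securities with missing data receive no rank; place them at the midpoint
ranks_filled = ranks.fillna(0.5)
print(ranks_filled)
```

On the first date the four stocks receive 0.25, 0.5, 0.75, and 1.0. On the second date only three stocks have data, so pandas divides by three and the stock with the missing value is set to 0.5.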

Quality features

The method add_quality_features adds further fundamental features related to quality as defined in the Piotroski F-Score. We add the nine individual F-Score components as well as the daily F-Score ranks.
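As a rough sketch of how nine binary components could be combined into a score and then ranked, consider the following. The component names are hypothetical labels for the nine Piotroski criteria, and the random 0/1 values stand in for real fundamental data; the strategy file may compute these differently.

```python
import numpy as np
import pandas as pd

# Hypothetical labels for the nine F-Score criteria
COMPONENTS = [
    "positive_roa", "positive_operating_cash_flow", "increasing_roa",
    "cash_flow_exceeds_net_income", "decreasing_leverage",
    "increasing_current_ratio", "no_new_shares",
    "increasing_gross_margin", "increasing_asset_turnover",
]

rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=3)
sids = ["AAA", "BBB", "CCC", "DDD"]

# Placeholder data: each component is a 0/1 DataFrame shaped like prices
features = {
    name: pd.DataFrame(
        rng.integers(0, 2, size=(len(dates), len(sids))), index=dates, columns=sids
    )
    for name in COMPONENTS
}

# The F-Score is the count of criteria a stock satisfies (0 through 9)
f_scores = sum(features[name] for name in COMPONENTS)

# Rank the scores across stocks each day, as with the other fundamentals
features["f_score_ranks"] = f_scores.rank(axis=1, pct=True).fillna(0.5)
```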

Price and volume features

The method add_price_and_volume_features adds a number of features derived from price and volume including:

  • ranking by returns on several time frames (yearly, monthly, weekly, daily)
  • price level (above or below 10, above or below 2)
  • rankings by dollar volume
  • rankings by volatility
  • whether a volatility spike occurred today
  • whether a volume spike occurred today
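The features listed above can be sketched roughly as follows. This is an illustration only: the function name, the lookback windows, and the spike thresholds (a move of 3 standard deviations, volume at twice its average) are assumptions, not values taken from the strategy file.

```python
import numpy as np
import pandas as pd

def sketch_price_and_volume_features(closes, volumes):
    """Illustrative only: windows and spike thresholds are assumptions."""
    features = {}

    # Rank stocks by trailing return over several horizons (in trading days)
    for name, window in {"yearly": 252, "monthly": 22, "weekly": 5, "daily": 1}.items():
        returns = closes / closes.shift(window) - 1
        features[f"{name}_returns_ranks"] = returns.rank(axis=1, pct=True).fillna(0.5)

    # Price level flags
    features["above_10"] = (closes > 10).astype(int)
    features["above_2"] = (closes > 2).astype(int)

    # Rank by average dollar volume
    avg_dollar_volumes = (closes * volumes).rolling(20).mean()
    features["dollar_volume_ranks"] = avg_dollar_volumes.rank(axis=1, pct=True).fillna(0.5)

    # Rank by volatility of daily returns
    daily_returns = closes / closes.shift() - 1
    volatilities = daily_returns.rolling(20).std()
    features["volatility_ranks"] = volatilities.rank(axis=1, pct=True).fillna(0.5)

    # Spikes: compare today's value to the prior 20-day baseline
    features["volatility_spike"] = (daily_returns.abs() > 3 * volatilities.shift()).astype(int)
    prior_avg_volumes = volumes.rolling(20).mean().shift()
    features["volume_spike"] = (volumes > 2 * prior_avg_volumes).astype(int)
    return features

# Toy data: 300 business days of random prices and volumes for 4 stocks
rng = np.random.default_rng(0)
dates = pd.bdate_range("2019-01-01", periods=300)
sids = ["AAA", "BBB", "CCC", "DDD"]
closes = pd.DataFrame(
    20 * np.cumprod(1 + rng.normal(0, 0.01, size=(300, 4)), axis=0),
    index=dates, columns=sids)
volumes = pd.DataFrame(rng.integers(1000, 2000, size=(300, 4)), index=dates, columns=sids)
volumes.iloc[-1, 0] = 1_000_000  # plant an obvious volume spike on the last day

features = sketch_price_and_volume_features(closes, volumes)
```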

Technical indicator features

The method add_technical_indicator_features calculates several technical indicators for each stock in the universe:

  • where the price is in relation to its 20-day Bollinger Bands
  • RSI (Relative Strength Index)
  • Stochastic oscillator
  • Money Flow Index

Each indicator takes a value between 0 and 1. In the case of Bollinger Bands, the price can move outside the bands, which would result in a value less than 0 or greater than 1, so we winsorize the price at the upper and lower bands to keep the range between 0 and 1:

winsorized_closes = closes.where(closes > lower_bands, lower_bands).where(closes < upper_bands, upper_bands)
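For context, here is a self-contained sketch of how the winsorized closes might be turned into a 0-to-1 position within the bands. The band width of 2 standard deviations and the toy price data are assumptions; the strategy file may define the bands differently.

```python
import numpy as np
import pandas as pd

# Toy random-walk prices for three hypothetical tickers
rng = np.random.default_rng(0)
closes = pd.DataFrame(
    50 + rng.normal(0, 1, size=(60, 3)).cumsum(axis=0),
    index=pd.bdate_range("2020-01-01", periods=60),
    columns=["AAA", "BBB", "CCC"],
)

# 20-day Bollinger Bands (2 standard deviations is an assumed width)
mavgs = closes.rolling(20).mean()
stds = closes.rolling(20).std()
upper_bands = mavgs + 2 * stds
lower_bands = mavgs - 2 * stds

# Winsorize the price at the bands, as in the line above
winsorized_closes = closes.where(closes > lower_bands, lower_bands).where(
    closes < upper_bands, upper_bands)

# 0 means at (or below) the lower band, 1 means at (or above) the upper band
bollinger_position = (winsorized_closes - lower_bands) / (upper_bands - lower_bands)
```

The first 19 rows are NaN because the 20-day window is not yet full; how those rows are filled is left to the caller.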

Securities master features

The method add_securities_master_features adds a few features from the securities master database: whether the stock is an ADR, and what sector it belongs to. Note that sectors must be one-hot encoded, resulting in a boolean DataFrame for each sector indicating whether the stock belongs to that particular sector. See the usage guide for more on one-hot encoding.
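A minimal sketch of one-hot encoding sectors into one DataFrame per sector is shown below. The sector labels, tickers, and feature names are hypothetical, and the strategy file may build these DataFrames differently.

```python
import pandas as pd

closes = pd.DataFrame(
    [[10.0, 20.0, 30.0], [11.0, 19.0, 31.0]],
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
    columns=["AAA", "BBB", "CCC"],
)

# Hypothetical sector label for each security
sectors = pd.Series({"AAA": "Technology", "BBB": "Energy", "CCC": "Technology"})

features = {}
for sector in sectors.unique():
    # 1 if the security belongs to this sector, else 0
    in_sector = (sectors == sector).astype(int)
    # Repeat the per-security flag on every date so it is shaped like closes
    zeros = pd.DataFrame(0, index=closes.index, columns=closes.columns)
    features[f"sector_{sector}"] = zeros.add(in_sector, axis=1)

print(features["sector_Technology"])
```

Each security is flagged in exactly one of the resulting DataFrames, which lets the model treat sector membership as a set of independent yes/no features rather than as an ordered number.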

Market features

The method add_market_features adds several market-wide features to help the model know what is happening in the broader market, including:

  • whether the S&P 500 is above or below its 200-day moving average
  • the level of the VIX (specifically, where it falls within the range of 12 to 30, our chosen thresholds for low and high levels)
  • where the 10-day NYSE TRIN falls within the range of 0.5 to 2
  • the McClellan oscillator
  • whether the Hindenburg Omen triggered in the last 30 days
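One plausible way to express "where a value falls within a range" for the VIX and TRIN items above is to clip the series to the range and rescale it to 0 through 1. The helper name and sample values below are hypothetical, and the strategy file may compute these features differently.

```python
import pandas as pd

def scale_within_range(series, low, high):
    """Clip to [low, high], then rescale so low maps to 0 and high maps to 1."""
    return (series.clip(lower=low, upper=high) - low) / (high - low)

dates = pd.bdate_range("2020-01-01", periods=5)

# Made-up VIX closes, scaled within the 12 to 30 range
vix_closes = pd.Series([10.0, 12.0, 21.0, 30.0, 45.0], index=dates)
vix_levels = scale_within_range(vix_closes, 12, 30)

# Made-up 10-day TRIN values, scaled within the 0.5 to 2 range
trin_10d = pd.Series([0.4, 0.5, 1.25, 2.0, 3.0], index=dates)
trin_levels = scale_within_range(trin_10d, 0.5, 2)

print(vix_levels.tolist())   # [0.0, 0.0, 0.5, 1.0, 1.0]
print(trin_levels.tolist())  # [0.0, 0.0, 0.5, 1.0, 1.0]
```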

The first three of these features are derived from the index data collected from IBKR. We query this database for the date range of our prices DataFrame. (Note that we identified the database as the BENCHMARK_DB so that we can use SPY as the backtest benchmark; see the usage guide for more on benchmarks.)

# Get prices for SPY, VIX, TRIN-NYSE
market_prices = get_prices(self.BENCHMARK_DB,
                           fields="Close",
                           start_date=closes.index.min(),
                           end_date=closes.index.max())

market_closes = market_prices.loc["Close"]

Using SPY as an example, we extract the Series of SPY prices from the DataFrame and perform our calculations.

# Is S&P above its 200-day?
spy_closes = market_closes[self.SPY_SID]
spy_200d_mavg = spy_closes.rolling(200).mean()
spy_above_200d = (spy_closes > spy_200d_mavg).astype(int)

Now that we have a Series indicating whether SPY is above its moving average, we need to reshape the Series like our prices DataFrame, so that the SPY indicator is provided to the model as a feature for each stock. First, we reindex the Series like the prices DataFrame, in case there are any differences in dates between the two data sources (we don't expect there to be a difference but it is possible when using data from different sources). Then, we use apply to broadcast the Series along each column (i.e. each security) of the prices DataFrame:

# Must reindex like closes in case indexes differ
spy_above_200d = spy_above_200d.reindex(closes.index, method="ffill")
features["spy_above_200d"] = closes.apply(lambda x: spy_above_200d)
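The reindex-then-broadcast pattern can be tried in isolation on toy data. In this sketch the market Series is deliberately missing one date so that the forward fill has something to do; the tickers and values are made up.

```python
import pandas as pd

dates = pd.bdate_range("2020-01-01", periods=4)
closes = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0, 13.0], "BBB": [20.0, 19.0, 18.0, 17.0]},
    index=dates,
)

# Market-wide indicator that is missing the third date
spy_above_200d = pd.Series([1, 0, 1], index=dates[[0, 1, 3]])

# Align to the prices index, carrying the last known value forward
aligned = spy_above_200d.reindex(closes.index, method="ffill")

# Return the same Series for every column, giving one copy per security
broadcast = closes.apply(lambda x: aligned)
print(broadcast)
```

Every column of the result holds the same values (1, 0, 0, 1), so each stock sees the same market-wide feature on a given date.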

The McClellan oscillator is a market breadth indicator, which we calculate from the Sharadar data by counting the daily advancers and decliners and then calculating the indicator from these Series:

# McClellan oscillator
total_issues = closes.count(axis=1)
returns = closes.pct_change()
advances = returns.where(returns > 0).count(axis=1)
declines = returns.where(returns < 0).count(axis=1)
net_advances = advances - declines
pct_net_advances = net_advances / total_issues
ema_19 = pct_net_advances.ewm(span=19).mean()
ema_39 = pct_net_advances.ewm(span=39).mean()
mcclellan_oscillator = (ema_19 - ema_39) * 10
# Winsorize at 50 and -50
mcclellan_oscillator = mcclellan_oscillator.where(mcclellan_oscillator < 50, 50).where(mcclellan_oscillator > -50, -50)

Finally, as with the SPY indicator, we broadcast the Series with apply to shape the indicator like the prices DataFrame:

features["mcclellan_oscillator"] = closes.apply(lambda x: mcclellan_oscillator).fillna(0)

Targets

Having created all of our features, in prices_to_features we create our targets by asking the model to predict the one-week forward return:

def prices_to_features(self, prices):
    ...
    # Target to predict: next week return
    one_week_returns = (closes - closes.shift(5)) / closes.shift(5).where(closes.shift(5) > 0)
    targets = one_week_returns.shift(-5)
    ...
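To see how shift(-5) turns a trailing return into a forward-looking target, here is a small standalone example on made-up prices for a single hypothetical ticker.

```python
import pandas as pd

closes = pd.DataFrame(
    {"AAA": [100.0, 101, 102, 103, 104, 110, 111, 112, 113, 114, 121]},
    index=pd.bdate_range("2020-01-01", periods=11),
)

# Trailing one-week (5 trading day) return, guarding against non-positive prices
prior_closes = closes.shift(5)
one_week_returns = (closes - prior_closes) / prior_closes.where(prior_closes > 0)

# Shift backward so each row holds the return over the FOLLOWING 5 days
targets = one_week_returns.shift(-5)
print(targets)
```

The first row's target is 0.10, the return from 100 to 110 over the next five days. The last five rows are NaN because their forward returns are not yet known.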

Predictions to signals

The features and targets are fed to the machine learning model during training. During backtesting or live trading, the features (but not the targets) are fed to the model to generate predictions. The model's predictions are in turn passed to the predictions_to_signals method, which creates buy signals for the 10 stocks with the highest predicted return and sell signals for the 10 stocks with the lowest predicted return, provided they have adequate dollar volume.

We choose to train our model on all securities regardless of dollar volume but only trade securities with adequate dollar volume. We could alternatively have chosen to train only on the set of liquid securities we were willing to trade. The signal logic looks like this:

def predictions_to_signals(self, predictions, prices):
    ...
    # Buy (sell) stocks with best (worst) predicted return
    have_best_predictions = predictions.where(have_adequate_dollar_volumes).rank(ascending=False, axis=1) <= 10
    have_worst_predictions = predictions.where(have_adequate_dollar_volumes).rank(ascending=True, axis=1) <= 10
    # 1 for the best predictions, -1 for the worst, 0 otherwise
    signals = have_best_predictions.astype(int).where(
        have_best_predictions,
        -have_worst_predictions.astype(int).where(have_worst_predictions, 0))
    ...
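The ranking logic can be checked on a toy example. This sketch uses a cutoff of 2 instead of 10 so that it fits a six-stock universe, and the liquidity mask is made up; how have_adequate_dollar_volumes is computed is not shown in the excerpt above.

```python
import pandas as pd

index = pd.to_datetime(["2020-01-02"])
columns = list("ABCDEF")
predictions = pd.DataFrame(
    [[0.05, 0.03, 0.01, -0.01, -0.03, -0.05]], index=index, columns=columns)

# Hypothetical liquidity mask: A and F are too illiquid to trade
have_adequate_dollar_volumes = pd.DataFrame(
    [[False, True, True, True, True, False]], index=index, columns=columns)

n = 2  # the strategy uses 10

# Illiquid stocks become NaN, so they receive no rank and never pass the cutoff
eligible = predictions.where(have_adequate_dollar_volumes)
have_best_predictions = eligible.rank(ascending=False, axis=1) <= n
have_worst_predictions = eligible.rank(ascending=True, axis=1) <= n

# 1 for the best predictions, -1 for the worst, 0 otherwise
signals = have_best_predictions.astype(int).where(
    have_best_predictions,
    -have_worst_predictions.astype(int).where(have_worst_predictions, 0))
print(signals)
```

Although A has the highest prediction and F the lowest, both are excluded by the mask, so the longs are B and C and the shorts are D and E.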

Weight allocation and rebalancing

Capital is divided equally among the signals, with weekly rebalancing:

def signals_to_target_weights(self, signals, prices):
    # Allocate equal weights
    daily_signal_counts = signals.abs().sum(axis=1)
    weights = signals.div(daily_signal_counts, axis=0).fillna(0)

    # Rebalance weekly
    # Resample daily to weekly, taking the first day's signal
    # For pandas offset aliases, see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
    weights = weights.resample("W").first()
    # Reindex back to daily and fill forward
    weights = weights.reindex(prices.loc["Close"].index, method="ffill")
    ...
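The allocation and resampling steps can be run on two weeks of made-up signals to see how the weekly rebalancing behaves. This sketch reproduces only the lines shown above; whatever follows in the elided part of the method is not modeled.

```python
import pandas as pd

# Two full business weeks: Mon 2020-01-06 through Fri 2020-01-17
dates = pd.bdate_range("2020-01-06", periods=10)
signals = pd.DataFrame(
    [[1, -1, 0]] + [[0, 0, 1]] * 4 + [[-1, 0, 1]] * 5,
    index=dates, columns=["AAA", "BBB", "CCC"],
)

# Allocate equal weights among the day's signals
daily_signal_counts = signals.abs().sum(axis=1)
weights = signals.div(daily_signal_counts, axis=0).fillna(0)

# Resample to weekly, keeping each week's first daily weights
weekly = weights.resample("W").first()

# Reindex back to daily and fill forward
daily = weekly.reindex(dates, method="ffill")
print(weekly)
print(daily)
```

In this toy run, pandas labels each weekly bin with the Sunday that ends the week (2020-01-12 and 2020-01-19). After the reindex, the first week therefore has no weights (NaN), and the weights from Monday 2020-01-06 are the ones held throughout the second week. The signal changes later in the first week are ignored, which is the intended effect of rebalancing weekly.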