Questions

Can we create a supervised learning model for a CTA Strategy using the approaches below?

Chronological data splitting
- To prevent data leakage and simulate realistic backtesting
Random upsampling (non-synthetic)
- To handle class imbalance
Modelling with feature transformation via ensemble tree algorithms

Key Task Inputs:

Outputs:

Chronological Data Splitting

Data is split chronologically into train/validation/holdout periods
Assumes backtesting with the model retrained at a fixed time interval:
- Yearly
- Quarterly
- Monthly
- Daily
Validation Period:
- Predictions during this period are scored and used for hyperparameter tuning
Holdout Period:
- Predictions during this period are recorded but not used for any tuning
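
The split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `chronological_split`, the cut-off dates, and the monthly sample rows are assumptions made for the example, not part of the original pipeline.

```python
from datetime import date


def chronological_split(rows, train_end, validation_end):
    """Split (date, payload) rows into train/validation/holdout by date.

    train:      date <= train_end
    validation: train_end < date <= validation_end
    holdout:    date > validation_end
    """
    if train_end >= validation_end:
        raise ValueError("train_end must precede validation_end")
    # Sort by date so every split is in time order and no future row
    # can end up in an earlier period.
    ordered = sorted(rows, key=lambda r: r[0])
    train = [r for r in ordered if r[0] <= train_end]
    validation = [r for r in ordered if train_end < r[0] <= validation_end]
    holdout = [r for r in ordered if r[0] > validation_end]
    return train, validation, holdout


# Hypothetical monthly observations covering 2020 to 2022.
rows = [
    (date(year, month, 1), f"obs-{year}-{month:02d}")
    for year in (2020, 2021, 2022)
    for month in range(1, 13)
]
train, validation, holdout = chronological_split(
    rows, train_end=date(2020, 12, 31), validation_end=date(2021, 12, 31)
)
```

Moving both cut-off dates forward by the chosen retraining interval (yearly, quarterly, monthly, or daily) and refitting at each step would approximate the fixed-interval retraining assumed in the backtest.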

Random Upsampling (non-synthetic)

Signal modelling is a binary classification problem
CTA assets' signals tend to be very imbalanced (a ratio of around 1:4)
- Applying an ML model directly to such data results in it predicting the majority class most of the time
Approach
- The minority class is randomly upsampled (non-synthetically, by duplicating existing rows) to a 1:1 ratio
- Upsampling uses a fixed random seed so that results are reproducible
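
A minimal sketch of seeded, non-synthetic upsampling is shown below, using only the Python standard library. The function name `upsample_minority`, the seed value, and the toy 1:4 data are assumptions for illustration. The sketch also assumes that only the training split is upsampled, so that the validation and holdout periods keep their natural class ratio.

```python
import random


def upsample_minority(features, labels, seed=42):
    """Duplicate randomly chosen minority-class rows until classes are 1:1.

    Rows are drawn with replacement from the existing minority rows only,
    so no synthetic samples are created.
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")
    minority, majority = sorted(by_class, key=lambda c: len(by_class[c]))
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = rng.choices(by_class[minority], k=deficit)
    keep = list(range(len(labels))) + extra
    rng.shuffle(keep)  # avoid a block of duplicates at the end
    return [features[i] for i in keep], [labels[i] for i in keep]


# Toy training data with a 1:4 class ratio (20 positives, 80 negatives).
labels = [1] * 20 + [0] * 80
features = [[float(i)] for i in range(100)]
X_up, y_up = upsample_minority(features, labels, seed=7)
```

Calling the function again with the same seed returns the same rows in the same order.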

Modelling Methodology

Model Framework

Inputs -> Feature Transformation (FT) Ensemble Trees model -> One Hot Encoding -> Final Predictive Model

FT Ensemble Tree model:

Utilize a supervised learning ensemble tree algorithm for feature transformation
Approach:
- Trained on training data first (and never exposed to test data)
- Subsequently used as a feature transformation step where:
  - Preprocessed data is fed into the model
  - The leaf that each sample reaches in each tree is used as an input feature for the next model
Helps capture non-linear relationships in the data while acting as a form of regularization

Final Predictive Model:

A state-of-the-art gradient-boosted tree algorithm
Takes the one-hot encoded, feature-transformed data from the previous layer as input and performs binary classification
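
The three-stage framework above can be sketched with scikit-learn as follows. The specific estimators are assumptions: the source does not name the algorithms, so `RandomForestClassifier` stands in for the FT ensemble and `GradientBoostingClassifier` for the final model. The data is synthetic and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data; rows are treated as already ordered in time.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, y_train = X[:400], y[:400]
X_later = X[400:]  # plays the role of a validation or holdout period

# Stage 1: feature-transformation ensemble, fitted on training data only.
ft_model = RandomForestClassifier(n_estimators=20, max_depth=3, random_state=0)
ft_model.fit(X_train, y_train)

# apply() returns, for each sample, the index of the leaf reached in each tree.
train_leaves = ft_model.apply(X_train)  # shape: (n_samples, n_trees)
later_leaves = ft_model.apply(X_later)

# Stage 2: one-hot encode the leaf indices. Leaves not seen during fitting
# are encoded as all zeros rather than raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
train_encoded = encoder.fit_transform(train_leaves)
later_encoded = encoder.transform(later_leaves)

# Stage 3: final binary classifier trained on the encoded leaf features.
final_model = GradientBoostingClassifier(random_state=0)
final_model.fit(train_encoded, y_train)
predictions = final_model.predict(later_encoded)
```

Both the FT model and the encoder are fitted on the training period alone and only applied to later data, which keeps the transformation step free of leakage.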

Precision Charts

Precision = True Positives / Predicted Positives, computed on the combined predictions from the Validation and Holdout periods for each signal position (Long/Short)
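
As a worked illustration of the formula, the sketch below computes precision separately for each position. It assumes each position is treated in turn as the positive class; the helper name `precision` and the sample labels are hypothetical.

```python
def precision(actual, predicted, positive):
    """True positives divided by predicted positives for one class.

    Returns None when the class was never predicted, because the ratio
    is undefined in that case.
    """
    outcomes = [a for a, p in zip(actual, predicted) if p == positive]
    if not outcomes:
        return None
    true_positives = sum(1 for a in outcomes if a == positive)
    return true_positives / len(outcomes)


# Hypothetical combined validation and holdout predictions for one asset.
actual = ["Long", "Short", "Long", "Long", "Short", "Short"]
predicted = ["Long", "Long", "Long", "Short", "Short", "Short"]

long_precision = precision(actual, predicted, "Long")    # 2 of 3 Long calls correct
short_precision = precision(actual, predicted, "Short")  # 2 of 3 Short calls correct
```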

Chart 1: Histogram of Precision:

Chart 2: Precision Bar chart per Asset

CTA Signal Modelling