
Synthetic view of the ML pipeline and the compared strategy curves.
Quantitative Finance · Data Science · Machine Learning
This project is the machine-learning layer of the portfolio backtesting and wealth-tracking stack: it is used to test signals and measure their impact on portfolio performance. The repository implements a full pipeline, from market-data preparation to test-metric production, and keeps it reproducible through dedicated scripts and a run_all.py orchestrator. Classic baselines such as Equal Weight and Markowitz Minimum Variance are compared with supervised models such as Random Forest and Logistic Regression, which predict the direction of daily returns; the predicted probabilities are then converted into portfolio weights.
What this project demonstrates
Ability to turn a quantitative topic into a usable ML pipeline: feature engineering, variable selection, multi-model training, comparison against financial baselines, and clear reporting of results.

My role
I contributed to project structuring, to the integration of strategy-comparison modules, and to the design of an end-to-end flow covering data preparation, training, backtesting, and reporting.
Context
The objective was to go beyond an exploratory notebook and produce a rigorous, collaborative working base with modular code, dedicated scripts, exported metrics, and directly usable comparison charts.
Objective
Assess the contribution of supervised models to portfolio allocation in a structured way while preserving a robust financial reference through Equal Weight and Markowitz baselines.
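To make the two baselines concrete, here is a minimal sketch of how their weights can be computed. It is not the repository's code: the function names and synthetic returns are hypothetical, and the Minimum Variance part uses the unconstrained closed form (short positions allowed), whereas the project may well use a constrained, long-only optimizer.

```python
import numpy as np

def equal_weight(n_assets: int) -> np.ndarray:
    """Equal Weight baseline: every asset receives 1/N of the capital."""
    return np.full(n_assets, 1.0 / n_assets)

def min_variance_weights(returns: np.ndarray) -> np.ndarray:
    """Unconstrained minimum-variance weights: w = S^-1 1 / (1' S^-1 1).

    `returns` has shape (n_days, n_assets). Short positions are allowed,
    which is an assumption of this sketch.
    """
    cov = np.cov(returns, rowvar=False)      # sample covariance matrix S
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)         # S^-1 1 without forming the inverse
    return raw / raw.sum()                   # normalize so weights sum to 1

rng = np.random.default_rng(0)
# Synthetic daily returns for 3 hypothetical assets with rising volatility
sample_returns = rng.normal(0.0, [0.01, 0.02, 0.03], size=(500, 3))

w_eq = equal_weight(3)
w_mv = min_variance_weights(sample_returns)
```

In this toy setup the least volatile asset receives the largest Minimum Variance weight, which is the behavior that makes it a useful risk-oriented reference next to the naive 1/N allocation.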
Deep dive
The project follows a full quant and machine-learning workflow: price ingestion, technical-feature construction, directional model training by asset, portfolio backtest, and systematic comparison against financial baselines.
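As an illustration of the feature-construction and labeling steps, the sketch below builds a few technical features and a next-day direction label for a single asset. The specific indicators, window lengths, and column names are assumptions chosen for the example; the page does not list the indicators actually used.

```python
import numpy as np
import pandas as pd

def build_features(prices: pd.Series) -> pd.DataFrame:
    """Build illustrative technical features and a next-day direction label."""
    ret = prices.pct_change()
    feats = pd.DataFrame(index=prices.index)
    feats["ret_1d"] = ret                                    # last daily return
    feats["mom_5d"] = prices.pct_change(5)                   # 5-day momentum
    feats["vol_10d"] = ret.rolling(10).std()                 # 10-day volatility
    feats["sma_ratio_20d"] = prices / prices.rolling(20).mean() - 1.0
    # Label: 1 if the NEXT day's return is positive. shift(-1) pairs features
    # known at day t with the outcome of day t+1, so X never sees the future.
    next_ret = ret.shift(-1)
    feats["target_up"] = (next_ret > 0).astype(int)
    # Drop the last row (no next-day return) and the rolling warm-up rows
    return feats[next_ret.notna()].dropna()

rng = np.random.default_rng(1)
dates = pd.bdate_range("2022-01-03", periods=120)
# Synthetic price path for one hypothetical ticker
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 120))), index=dates)
dataset = build_features(prices)
```

Running the same function per ticker yields one dataset per asset, which matches the "directional model training by asset" step described above.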
Gallery


Visual comparison of the Random Forest model against the financial baselines.
Architecture
Modules such as src/data/load_data.py and src/data/preprocess.py load prices, compute returns, and create a chronological train/test split.
Technical indicators are built in src/features/technical_indicators.py, and ANOVA selection is applied through src/features/feature_selection.py to keep the most relevant variables.
Financial baselines live in src/baselines, while ML models such as Random Forest and Logistic Regression are implemented alongside the backtest logic in the model layer.
Scripts such as scripts/run_*.py and run_all.py execute the full pipeline, produce metric tables, and generate equity-curve comparisons.
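The data and feature layers described above can be sketched as a chronological split followed by ANOVA selection fitted on the training window only. This is an illustrative example, not the contents of preprocess.py or feature_selection.py: the helper name, the 80/20 split, the value of k, and the synthetic data are all assumptions. Only SelectKBest and f_classif are real scikit-learn APIs.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

def chronological_split(df: pd.DataFrame, train_frac: float = 0.8):
    """Split a date-indexed frame: oldest rows train, most recent rows test."""
    df = df.sort_index()
    cut = int(len(df) * train_frac)
    return df.iloc[:cut], df.iloc[cut:]

rng = np.random.default_rng(2)
idx = pd.bdate_range("2021-01-04", periods=300)
X = pd.DataFrame(rng.normal(size=(300, 6)), index=idx,
                 columns=[f"feat_{i}" for i in range(6)])
# Hypothetical label driven by feat_0, so the selector has a signal to find
y = pd.Series((X["feat_0"] + 0.5 * rng.normal(size=300) > 0).astype(int), index=idx)

X_train, X_test = chronological_split(X)
y_train, y_test = y.loc[X_train.index], y.loc[X_test.index]

# ANOVA F-test selection, fitted on the training window only so that the
# test period cannot influence which features are kept
selector = SelectKBest(score_func=f_classif, k=3).fit(X_train, y_train)
selected = list(X.columns[selector.get_support()])
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
```

Fitting the selector before splitting would let test-period information shape the feature set, which is exactly the kind of leakage the chronological split is meant to prevent.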
Technical choices
Every ML strategy is evaluated against Equal Weight and Markowitz to keep a stable financial benchmark rather than relying only on a classification score.
The train/test split follows market chronology to limit information leakage in time-series data.
Random Forest combines ANOVA feature selection with hyperparameter tuning, so the retained features and model settings are chosen by a repeatable search rather than set arbitrarily.
Predictions are generated at the ticker level and then aggregated through a weighting logic, which brings the model closer to a real allocation use case.
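One possible weighting logic for the ticker-level predictions is sketched below. The page does not specify the actual rule, so the threshold, the proportional-to-edge scheme, the equal-weight fallback, and the ticker names are all assumptions made for illustration.

```python
import pandas as pd

def probabilities_to_weights(proba_up: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Turn per-ticker P(up) into long-only weights that sum to 1 on each date.

    Tickers at or below `threshold` get zero weight; the others are weighted
    by how far their probability exceeds it. If no ticker passes on a given
    date, the row falls back to equal weight.
    """
    edge = (proba_up - threshold).clip(lower=0.0)
    total = edge.sum(axis=1)
    # Rows with total == 0 become NaN here and are filled by the fallback
    weights = edge.div(total.where(total > 0), axis=0)
    return weights.fillna(1.0 / proba_up.shape[1])

dates = pd.to_datetime(["2024-01-02", "2024-01-03"])
# Hypothetical model outputs: rows are dates, columns are tickers
proba = pd.DataFrame(
    {"AAA": [0.70, 0.40], "BBB": [0.60, 0.45], "CCC": [0.40, 0.30]}, index=dates
)
weights = probabilities_to_weights(proba)

# Realized returns of the day AFTER each prediction (hypothetical numbers)
next_day_returns = pd.DataFrame(
    {"AAA": [0.010, -0.002], "BBB": [-0.004, 0.003], "CCC": [0.002, 0.001]}, index=dates
)
portfolio_returns = (weights * next_day_returns).sum(axis=1)
```

On the first date only AAA and BBB clear the threshold, so they share the capital in proportion to their edge; on the second date no ticker does, and the sketch reverts to the Equal Weight allocation.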
Challenges
Build a clean chronological train/test pipeline to limit information leakage in a time-series setting.
Compare fundamentally different approaches, from static allocations to supervised models, under homogeneous metrics.
Maintain a readable architecture despite the growing number of components: features, models, baselines, and orchestration scripts.
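Comparing static allocations and supervised models under homogeneous metrics comes down to passing every strategy's daily returns through the same function. The sketch below shows one way to do that; the metric set, the 252-day annualization, and the zero risk-free rate in the Sharpe ratio are assumptions, since the page does not list the exact metrics exported by the project.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor

def performance_metrics(daily_returns: pd.Series) -> dict:
    """Compute the same risk/return metrics for any strategy's daily returns."""
    equity = (1.0 + daily_returns).cumprod()
    ann_return = equity.iloc[-1] ** (TRADING_DAYS / len(daily_returns)) - 1.0
    ann_vol = daily_returns.std() * np.sqrt(TRADING_DAYS)
    # Running peak includes the starting capital of 1.0
    peak = equity.cummax().clip(lower=1.0)
    drawdown = equity / peak - 1.0
    return {
        "annual_return": ann_return,
        "annual_volatility": ann_vol,
        # Sharpe with a zero risk-free rate (assumption of this sketch)
        "sharpe": ann_return / ann_vol if ann_vol > 0 else float("nan"),
        "max_drawdown": drawdown.min(),
    }

rng = np.random.default_rng(4)
# Synthetic daily returns standing in for two strategies' backtest outputs
strategies = {
    "equal_weight": pd.Series(rng.normal(0.0004, 0.010, 252)),
    "ml_strategy": pd.Series(rng.normal(0.0005, 0.012, 252)),
}
metrics_table = pd.DataFrame(
    {name: performance_metrics(r) for name, r in strategies.items()}
).T
```

Because every row of the table comes from the same function, differences between strategies reflect their returns and not differences in how each one was measured.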
Outcomes and learnings
Reproducible execution chain through run_prepare.py, run_baselines.py, run_random_forest.py, run_logistic_regression.py, and run_all.py.
Systematic production of test reports such as metrics_test_* and comparative figures such as equity_*_vs_baselines_test.png.
Integration of Random Forest with ANOVA feature selection and hyperparameter search through GridSearchCV.
Addition of a Logistic Regression supervised benchmark compared with classical financial strategies.
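The Random Forest setup mentioned above (ANOVA selection plus GridSearchCV) could look roughly like the following. This is a sketch under stated assumptions and not the content of run_random_forest.py: the parameter grid, the synthetic data, and the use of TimeSeriesSplit as the cross-validation scheme are illustrative choices that the page does not confirm.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(3)
# Synthetic, chronologically ordered features and up/down labels
X = rng.normal(size=(240, 8))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=240) > 0).astype(int)
X_train, y_train, X_test = X[:200], y[:200], X[200:]

# Putting the selector inside the pipeline refits it on each CV training
# fold, so validation folds never influence which features are kept
pipeline = Pipeline([
    ("anova", SelectKBest(score_func=f_classif)),
    ("rf", RandomForestClassifier(random_state=0)),
])
param_grid = {                      # hypothetical, deliberately small grid
    "anova__k": [2, 4],
    "rf__n_estimators": [50],
    "rf__max_depth": [3, None],
}
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),  # assumed: folds that respect time order
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Probability of an up move for each test-period row
proba_up = search.predict_proba(X_test)[:, 1]
```

The resulting probabilities are the kind of output that a weighting step can then turn into portfolio allocations for the backtest.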
Discuss
If this project is relevant to you, I can detail the initial need, data structure, assumptions, challenges encountered, and analysis limitations.