
March 2020
 

Making Sense of Machine Learning Algorithms

It is easier to trust machine learning predictions when we understand them better.

Machine learning (ML) enables powerful algorithms to analyze financial data in new and exciting ways. But this excitement is often tempered by fear that investors don’t really understand why a model behaves the way it does. We need to move beyond this “black box” stigma. We propose a framework that demystifies the predictions of any ML algorithm. Our approach computes what we call a “fingerprint” of a given model’s linear, nonlinear, and interaction effects, which drive its predictions — and ultimately its investment performance. In a real-world case study applied to currency return predictions, we find that popular ML models like neural networks and random forests think in ways that do indeed make sense, and which we can begin to understand. These fingerprints empower investors to describe and probe the similarities and differences across ML models, and to extract genuine insight from machine-learned rules.
 

Key highlights
Machine learning (ML) methods empower sophisticated algorithms to analyze financial data sets and make predictions against a defined goal. However, ML models are sometimes regarded as a “black box” when used for investment applications. In particular, investors demand a reliable and intuitive interpretation of how ML models think about data before they are confident enough to use them in practice.

To address this challenge, we propose a framework that decomposes ML predictions into three basic components: linear, nonlinear, and interaction effects. It then evaluates the predictive efficacy of each component. Together, these form a “Model Fingerprint,” which can be used to summarize the key characteristics of a model’s predictions.
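One common way to separate linear from nonlinear effects is to trace each factor’s partial dependence curve and fit a straight line to it: the line’s magnitude proxies the linear effect, and the curve’s average deviation from that line proxies the nonlinear effect (pairwise interactions can be measured analogously from two-factor partial dependence). The sketch below illustrates the idea on synthetic data with a scikit-learn gradient boosting model; the factor names, data, and the specific magnitude measure are illustrative assumptions, not the authors’ exact methodology.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Two toy factors; factor 0 is built with a quadratic (nonlinear) term.
X = rng.normal(size=(500, 2))
y = X[:, 0] + 0.5 * X[:, 0] ** 2 + X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

def partial_dependence_1d(model, X, j, grid):
    """Average prediction as feature j sweeps the grid, others held at observed values."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        out.append(model.predict(Xv).mean())
    return np.array(out)

def fingerprint(model, X, j, n_grid=20):
    grid = np.linspace(X[:, j].min(), X[:, j].max(), n_grid)
    pd_curve = partial_dependence_1d(model, X, j, grid)
    # Linear component: magnitude of the best straight-line fit to the curve.
    slope, intercept = np.polyfit(grid, pd_curve, 1)
    linear_fit = slope * grid + intercept
    linear = np.mean(np.abs(linear_fit - pd_curve.mean()))
    # Nonlinear component: average deviation of the curve from that line.
    nonlinear = np.mean(np.abs(pd_curve - linear_fit))
    return linear, nonlinear

lin0, nl0 = fingerprint(model, X, 0)
lin1, nl1 = fingerprint(model, X, 1)
# Factor 0 was generated with a quadratic term, so its nonlinear
# component should dominate factor 1's.
```

Because each component is expressed in the units of the predicted return, the linear, nonlinear, and interaction magnitudes can be compared directly across factors and across models, which is what makes the fingerprint useful as a summary.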

We apply our framework to a real-world case study in which Random Forest (RF), Gradient Boosting Machine (GBM), and Neural Network (NN) models are trained to predict monthly currency returns in the G10 universe. Carry, trend, valuation, equity differential, and market turbulence are used as input factors. The bar chart at right shows the prediction decomposition of the GBM model. Valuation is the top driver in the linear space, carry has the largest nonlinear effect, and the GBM model highlights the interaction between carry and turbulence when it comes to predicting monthly forward returns. The component efficacy chart reveals that the GBM can reliably learn simple rules in the linear space. The model also homes in on the efficacy of interaction effects: both the pairwise and the higher-order interactions captured by the model boost risk-adjusted portfolio performance.

As a way to demystify how an algorithm “thinks,” our “Model Fingerprint” framework is intuitive and may be generalized to other predictive tools and investment portfolios. As such, it allows researchers to tease out a machine learning model’s “personality” or, more aptly, its “machinality.”
