Today I would like to share some life lessons and insights from expert Robert Sinnott, who shares his in-depth knowledge of the modern evolution of systematic trading in the form of machine learning, and how machine learning has affected and shaped their short-term and long-term strategies. If you would like to hear the full episode, you can listen by clicking here. I’m pretty sure you are going to learn something valuable from what Robert shared with me.
Rob: In statistics, in machine learning and decision theory, many of these concepts that are talked about actually got their start in the ‘70s, in the ‘60s, even in the ‘50s. Where a lot of the nuance has come today, or a lot of the excitement has come today, is in where and how they are applied, in what data sets they’re applied to, and to what degree you can automate and systematize the application of these tools.
Machine learning is a toolbox. To say that we use machine learning to build our algorithms is kind of like saying, “I use tools to build a house.” It’s not really additive in terms of your understanding. So, let me break that down.
"Machine learning is a toolbox."
Let’s talk about what we actually do with machine learning. Before we go into the tools that we use, which I think are really interesting, let me also break down the problems that we try to solve, because I think there’s a lot of hype about machine learning, and I think that there are some kinds of problems where there is a lot of potential for growth.
I think that there are some kinds of problems that, regardless of the amount of machinery you throw at them, are still going to be a challenge, and are still going to be a place where people who are well versed in machines, and specifically their limitations, will be able to add value as humans rather than just automatons.
So when we think about machine learning, I would say that there are two kinds of problems. First, you have what you would think of as classification problems, or (what I like to call) stationary problems, meaning that the problem you’re working on doesn’t change over time.
Here’s a great example: Google has come out with a lot of really interesting results and a lot of really impressive, fast algorithms for identifying various things in videos and images. You go back ten years, and it was a really hard thing to identify a cat in a photo. Now it’s a really trivial thing. In fact, you can do that for arbitrary objects. You can just go online, and there are online classifiers that allow you to make these decisions.
One of the very, very early uses of these things in financial markets was counting or identifying cars in parking lots, identifying the levels of oil in silos, or trying to predict crop yields, things like these. With those kinds of questions, it doesn’t matter how many people are looking at the field trying to work out whether the forecast will be good or bad, or whether it is going to be a high-yield crop or a low-yield crop. That doesn’t change the success of detecting that yield.
It doesn’t matter how many people are looking at whether or not that’s a cat in the video or a cat in the image. That doesn’t change your success rate. That’s a static problem, and the more data you can throw at it, the more training samples you can throw at the problem, the better your algorithm will be, up to some asymptote.
What are the challenges there? Well, the challenges there are an abundance of features: the more things you potentially know about a data set, the harder it is to glean what’s true. There’s also noise: the noisier a data set (the more pixelated an image, for example), the harder it’s going to be to get your answer. But again, the more data you have, the more training samples you have, the better your algorithm will be in the end.
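To illustrate the stationary case, here is a minimal, hypothetical sketch using invented synthetic data (the distributions and sample sizes are my own, not Rob’s): a one-dimensional classifier whose learned decision boundary, and therefore its test accuracy, improves with more training samples until it approaches the problem’s asymptotic accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_and_test(n_train, n_test=100_000):
    # Two classes drawn from N(-1, 1) and N(+1, 1); the best possible
    # decision boundary is 0, with asymptotic accuracy of roughly 84%.
    x0 = rng.normal(-1.0, 1.0, n_train)
    x1 = rng.normal(+1.0, 1.0, n_train)
    threshold = (x0.mean() + x1.mean()) / 2   # learned boundary
    t0 = rng.normal(-1.0, 1.0, n_test)
    t1 = rng.normal(+1.0, 1.0, n_test)
    # Accuracy on fresh test data: class 0 below threshold, class 1 above.
    return ((t0 < threshold).mean() + (t1 >= threshold).mean()) / 2

for n in (10, 100, 10_000):
    print(f"{n:>6} training samples per class -> accuracy {train_and_test(n):.3f}")
```

Because the problem never changes, every extra training sample tightens the estimate of the boundary; the payoff simply flattens out near the asymptote.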
There’s a second kind of problem, and that’s, unfortunately, the problem that we typically have in trying to forecast financial markets. So again, in the first case you can do machine learning for yield discovery, you can do it for forecasting earnings, you can do it for trying to predict the number of SKUs that will be sold by a retailer. Those are all great things. Those are all classification problems. Those are all stationary.
But when you start trying to answer the question, “Well, given this earnings level and this book value and this momentum indicator and this sentiment out of the CEO on his earnings call, what does that mean for the return that’s going to happen between today and tomorrow?” That’s a much harder question because there’s a feedback loop.
The more people try to answer that question, and the better their answer is, the less relevant it is in figuring out where it’s going to go tomorrow. That is to say, the more people looking at the problem, the harder the problem becomes, and the features that you used to think were useful in making your prediction may no longer be useful because they’ve been fully priced in.
So, as a result, you have this competition, this fight to be first, this fight to be right among lots of intelligent people in the market, and it means that whatever you design for the second kind of problem, this forecasting problem, is going to disappear over time. It may have been that, in the past, simple value signals or value pricing in equities were very successful at determining the direction of markets. Or, for a more concrete example, if you were to go back to the ’80s and ’90s, simply knowing which direction the price moved over the last five or ten days was a really good indication of where prices were going to move over the next five days. It was a Sharpe-two or Sharpe-three strategy depending on how good your transaction costs were. That simply isn’t the case anymore.
So, the battle we have to fight comes from that second category of problem: whatever we think we know may work for some time, but then it will decay as more people figure out the things that we figured out. Even if we did have forecasting power before, we may not have forecasting power tomorrow.
"...things that used to not work may also start to work again."
So, it’s not just model overfit. There are lots of problems with model overfit, especially with noisy data like financial data, but it’s also model decay, and it’s this evolution of market participants that will make things that used to work cease to work. At the same time, if people stop paying attention to these particular features, things that used to not work may also start to work again. But that’s the challenge that we see today.
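Rob’s notion of model decay can be sketched with a toy simulation (every number here is invented for illustration): a feature whose predictive edge is gradually arbitraged away, visible as a shrinking rolling correlation with returns.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 4000
feature = rng.normal(size=T)
# Predictive power fades linearly to zero as the signal gets crowded out.
edge = np.linspace(0.2, 0.0, T)
returns = edge * feature + rng.normal(size=T)

def rolling_corr(x, y, window=1000):
    # Correlation of feature vs. returns in consecutive non-overlapping windows.
    return [np.corrcoef(x[i:i + window], y[i:i + window])[0, 1]
            for i in range(0, len(x) - window + 1, window)]

print([round(c, 2) for c in rolling_corr(feature, returns)])
```

A backtest fitted on the early windows would look great and then quietly stop working, which is exactly the decay-versus-overfit distinction Rob is drawing.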
Niels: It’s interesting. It’s very, very interesting as a field, and not a field that I know a lot about, I have to admit. I posed the same question a few months ago when I was interviewing the founders of AHL, and to my surprise David Harding and Martin Lueck, and I think Mike Adam as well, didn’t sound overly enthusiastic about artificial intelligence. Actually, to some extent, from memory, they were saying that, in a sense, we already have that, because each of our brains, each participant in the market, kind of makes up that structure.
To break it down in a very simple way, for me, the way I think of it is that the more examples the machine sees, the more it learns. But it learns from what’s happening in recent history, and we know that things constantly change and might even revert. You know, we don’t have this kind of data from fifty or a hundred years ago but, to some extent, you could argue that markets go back and replicate how things worked fifty or a hundred years ago. It doesn’t always have to be new change; it could be a change back to the way things were. So how do you overcome all of that? How do you make it work?
Rob: I don’t actually agree with them, to a large extent. I think that, as I gave with that example, I think that you can very easily overdo it with machine learning. I think you can very easily overfit your process and that overfitting can come from just finding noise and thinking it’s signal, or it can come from decaying of these attributes.
The things that we’re detecting, the things that we’re allowing our models to adjust to are generally fairly slow moving features. These are not things that are going to change from one month to the next. These are features that are going to develop and disappear over the course of years. Anything shorter than that and we have no hope of detecting and having any sense of confidence in it.
"They can never give too much emphasis to a trend."
The difference in the way we design our models is that we design them so that they (those algorithms) can shape those particular nuances between markets, but they have a common structure, common constraints, and common features across markets that we know (in some sense) hold economic value or have some economic truth to them. So again, putting my statistician’s hat on, putting my model builder’s hat on: when you have an infinitely wide feature space, anything can potentially go into forecasting any market. You have only so much data.
Suppose we have forty years of data. Well, congratulations, you have ten thousand data points at a daily frequency. Yes, you could chop it up second by second, or minute by minute, but for most of the things that we’re talking about (especially for momentum and trend following), it’s not going to be additive. The autocorrelation of the signal is too high.
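A quick sketch of why chopping the data finer doesn’t help a slow signal (a toy example; the 50-day window and the return series are my own assumptions): a moving-average momentum signal barely changes from one observation to the next, so finer sampling mostly produces near-duplicates.

```python
import numpy as np

rng = np.random.default_rng(1)
daily_returns = rng.normal(0.0, 0.01, 10_000)   # ~40 years of i.i.d. toy returns
# A 50-day moving-average "momentum" signal (window length is arbitrary).
signal = np.convolve(daily_returns, np.ones(50) / 50, mode="valid")

def autocorr(x, lag):
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

# Adjacent days share 49 of their 50 inputs, so the signal barely moves.
print(f"lag-1 autocorrelation: {autocorr(signal, 1):.3f}")
```

With an autocorrelation near one, consecutive observations carry almost no new information, so the effective sample size is far smaller than the raw point count.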
So, how do you deal with that problem? You deal with that problem by building structure. You deal with that problem by saying, “You know what, my learning models, my mechanisms, they can only learn a certain class of problem. They can only learn (within trend following) how I relatively weight my different horizons.”
They can never go shorter than trend. They can never give too much emphasis to a trend. The kinds of lessons they can extract lie in an actually very, very narrow region of that unlimited space because, otherwise, I can guarantee you’ll pick up noise.
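One hypothetical way to picture that kind of constraint (the horizons, the two-parameter form, and the softmax are my own illustrative choices, not the firm’s actual model): a couple of shared parameters generate the horizon weights, and the same scheme is applied to every market, so the learner literally cannot wander outside that narrow region.

```python
import numpy as np

# Hypothetical structure: rather than tuning every market separately, two
# shared parameters (a, b) determine how a handful of trend horizons are
# weighted, and the identical scheme is reused across all markets.
horizons = np.array([20, 60, 120, 250])   # lookback windows in days (assumed)

def horizon_weights(a, b):
    scores = a + b * np.log(horizons)
    # Softmax enforces the structural constraint: weights stay positive
    # and sum to one, so no horizon can be given runaway emphasis.
    w = np.exp(scores - scores.max())
    return w / w.sum()

print(horizon_weights(a=0.0, b=0.5).round(3))   # → [0.115 0.199 0.281 0.406]
```

Fitting two numbers instead of one free weight per horizon per market is the difference between a handful of parameters and hundreds, which is the overfitting point Rob makes next.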
An apocryphal story (probably apocryphal, though it very well may be true), told to me by Andy Gelman of Columbia, was that when he was either a grad student or an early professor, he had a student try to find the single best predictor of the S&P 500. What the student came back with, after studying thousands and tens of thousands of data series, was the quarterly price of butter in Bangladesh. Indeed, over the in-sample period it had a phenomenally high R², and on an out-of-sample basis it was worthless.
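The butter-in-Bangladesh effect is easy to reproduce with pure noise (a sketch with made-up random series, not real market data): search enough unrelated series and one of them will fit the in-sample target impressively, then fail out of sample.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 40   # ten years of quarterly observations
target_in, target_out = rng.normal(size=n), rng.normal(size=n)
# 10,000 candidate predictor series of pure noise ("butter prices").
candidates_in = rng.normal(size=(10_000, n))
candidates_out = rng.normal(size=(10_000, n))

def r2(x, y):
    return np.corrcoef(x, y)[0, 1] ** 2

in_sample = np.array([r2(c, target_in) for c in candidates_in])
best = in_sample.argmax()
print(f"best in-sample R^2: {in_sample[best]:.3f}")
print(f"same series, out of sample: {r2(candidates_out[best], target_out):.3f}")
```

With enough candidates, the winner of the in-sample beauty contest is essentially guaranteed to look good; the out-of-sample check is what exposes it as noise.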
The only way you can avoid those kinds of instances, besides model discipline, is by adding structure to your model: by knowing what kinds of features you can input, how those inputs can be used to develop that function, to develop that model, and what form the output will take. That’s where the art, so to speak, comes from.
The only difference between what we do and what Harding and AHL did back in the day is that we’ve tried to make it so that the algorithms add much of that asset-by-asset, or asset-class-by-asset-class, nuance. We parameterize how they learn, rather than trying to code and tune every asset on an item-by-item basis.
Why do we do that? Again, because if you have two or three parameters that allow us to figure out or to constrain how seventy or eighty markets are learning or are being detected or are following trends, we’re going to be much less likely to overfit our process. We’re going to be much less likely to overfit those parameters than if we have to fit eighty different markets or two hundred seventy, or two hundred forty different parameters per eighty markets. So, by narrowing that we’re actually taking advantage of these tools and of the systems and of the computation to make our jobs…