“I couldn’t resist the chance of working with a small tight-knit group of individuals who are focused more on the research, more on understanding the markets than just making money.” - Robert Sinnott
From the very beginning, Robert Sinnott has studied the movement of the markets and how to anticipate their changes. Robert earned both an A.B. and an A.M. in Statistics from Harvard University, where he focused on statistical machine learning, capital markets, and time series analysis.
In our conversation, my co-host Katy Kaminski and I talk with Robert about how he has used these skills to redesign AlphaSimplex’s performance analytics infrastructure and develop smarter machine learning processes, and how Robert employs this data with such success.
Thank you for listening and please welcome our guest, Robert Sinnott.
In This Episode, You'll Learn:
- The history behind AlphaSimplex and how it began
“How do you think about markets? What are your assumptions about how markets behave? What are your assumptions about how markets interact?” - Robert Sinnott
- How Robert got involved with AlphaSimplex
- How the surrounding technology industry affects Robert’s work
- What lessons Robert has learned from his time at AlphaSimplex
“So [our machine learning] is not a black box in any way, shape, or form to us, and that is critical in terms of how we develop these models.” - Robert Sinnott
- How Robert spends his time outside of his work
- How adaptive markets shape the way Robert works
- Why Robert utilizes machine learning and how it benefits AlphaSimplex
“You will never capture all of the factors, all of the drivers, all of the sources of impetus that are driving the individual actors of the markets.” - Robert Sinnott
- The dangers of relying too much on machine learning
- What makes AlphaSimplex’s machine learning so easy to understand
Connect with AlphaSimplex:
Visit the Website: www.alphasimplex.com
Call AlphaSimplex: +1 617-475-7100
E-Mail AlphaSimplex: firstname.lastname@example.org
Follow Robert Sinnott on LinkedIn
“All models are wrong; some are useful.” - Robert Sinnott
The following is a full detailed transcript of this conversation. Click here to subscribe to our mailing list, and get full access to our library of downloadable eBook transcripts!
So Rob, thanks so much for coming on the podcast today. Katy and I are very excited to have you here. We really appreciate you taking time out to do this. Now, your firm has an interesting story. There’s an interesting history behind it. I was hoping that maybe we could start off there by just you talking about the background of AlphaSimplex.
Sure, be happy to Niels, and thank you for having me on your show. So, AlphaSimplex is not a name that, at least a couple of years ago, most people thought of. You can think of us a little bit as one of the later comers to managed futures, but we were right there in the thick of it with the quantitative firms that started in the 90s. Like many other managers, we got our start and got most of our early funding as a consultancy rather than a manager or an asset manager. Then we did everything from tactical asset allocation, risk models, even trading our algorithmic execution models. It wasn’t until about 2003 that we began running assets under the name AlphaSimplex, and it wasn’t until around 2007, 2008, when we were acquired by Natixis, that we really started having the amount of distribution and the relative exposure that we’re experiencing today.
So Rob, maybe you can tell us a little bit about your path into managed futures, a little bit more about your background and how you ended up where you are today.
Absolutely, so, that’s well over a decade ago now. It’s kind of a funny story. When I first heard of AlphaSimplex, it was a much, much smaller firm, and I heard it from two different places. I heard it from one of my professors at the time, a gentleman named John Campbell. He runs Arrowstreet Advisors, a fairly well known fundamental shop these days; and I also found out from a friend who used to be the president of the ballroom dance team at my university.
It’s kind of funny. I went to Harvard, but the thing that was most important about my stay there was actually the ballroom dance team. That’s where I met my wife, and it’s also where I got my position at AlphaSimplex. It was a decision, at the time, between AlphaSimplex and Google.
Google was the tremendous tech firm we all know, and some of us love today, and AlphaSimplex was the new, entrepreneurial, very research focused organization. I couldn’t resist the pull of that opportunity, and also the chance of really working with a small, tight-knit group of individuals who were focused more on the research, focused more on understanding the markets than necessarily just making money.
That’s what really first appealed to me about AlphaSimplex. Now, the culture at AlphaSimplex is a little bit distinct. I mentioned before that it’s a research group. That’s how it started. It was founded by Andrew Lo, and in effect by a bunch of his grad students. While it has evolved continuously since that time, that same initial research culture just permeates everything that happens on the research side or really at AlphaSimplex as a whole.
AlphaSimplex is a research group. If you think about what the brand, I suppose, AlphaSimplex means today, it’s technology leveraged investing, it’s adaptive markets per Andrew’s paper – per his theory. But, really what it is, is an organization that pushes to try to squeeze the most out of the ideas and try to constantly build in that adaptation to the models and to the portfolios that we create.
Rob, you guys are sitting right there in Kendall Square, sort of in the heart of data science, so would you say that is something that runs in your blood right there? Can you feel it in the air, the technology, the research?
Yes, so if you walk out of our offices in Kendall Square, we’re across the street from Google and Microsoft and Facebook and Amazon and the rest of that Tech space, and we look right down on the MIT campus. We joke that the reason that we’re right there, along with everyone else, is to capture the young.
If you look at our research team, and no, we’re not all from MIT. I happen to be from that school down the road. Each person comes from a different background. So, my background is in statistics, before data science became the hype word that it is today. I have colleagues who are in computer science and decision theory and bio-statistics. We brought all of us together to try to take different perspectives on the challenge that is financial markets without a lot of the predispositions, without anywhere near the emphasis on Efficient Markets Hypothesis that might have occurred if we all were finance majors or econometricians. I would say that being right next to MIT, that engineering focus is, again, a critical part of the culture at AlphaSimplex.
So Rob, you’ve been with AlphaSimplex since before they launched a managed futures type product and it really has grown to be a large player in the managed futures space. What are some of the lessons that you’ve learned along the way? Give us some details on how this experience has been over the last decade or so that you’ve been working in the managed futures space.
You’re absolutely right. So, when we launched the managed futures fund, we tried to take a bit of a different track than many who had already been in the space. Of course, we didn’t have many of the advantages of some of the more established players. We launched our managed futures program in 2010, which was the better part of two decades, if not more, after the first guard in the space.
There were some benefits to that in that futures were a more developed market, that the number of contracts available was much larger, that the liquidity was much more available, that the tools for execution were much, much more developed than were present at that time. We also had the benefit that our infrastructure could be much more modern because we didn’t have many of those legacy problems that larger, older institutions might have.
We also had some weaknesses. The weaknesses being that none of us had worked at a CTA before. None of us had the background that kind of spawned all of the later firms that came out of AHL or Campbell, or the like. So, as a result, we had to learn some of those lessons for the first time.
If you look back to our own strategy, there were times where we did very well, and there were some times that, instead of having those earnings, we had learnings. As we go on, I’d be happy to go into those.
What they really came down to was: how do you think about markets? What are your assumptions about how markets behave? And, specifically, what are your assumptions about how markets interact?
I think that, as we grew our portfolios, they became more diversified. As we gained greater confidence in letting the signals drive that risk allocation rather than imposing a risk allocation upon the product, we were able to create a much more productive and much more robust system as a whole. As I said before, a lot of those lessons were learned the hard way.
Before we jump into some of the further background that I’m sure Katy might have some questions about, I’m also interested in allowing people to get to know you a little bit better. So, one of the things that I often talk to my guests about is, is there something that people might not know about you - some kind of hidden talent? I’ve heard the word ballroom a couple of times. I don’t know, but I’m kind of interested that it’s not all math and stats, but what’s the person Rob like when he’s not in the office?
So, I’m a bit old-fashioned, I suppose. As I said, my wife and I met on the ballroom dance team. We still go out swing dancing and salsa dancing when we can steal away from our kids. Though with two of them, and both of them being pretty young, that’s much less time than we ever thought beforehand. I am a big outdoorsman. I love hiking, mountain climbing, a little bit of biking, though I’ll be frank, that’s mostly commuting these days. I’m also very much a do-it-yourself kind of guy. We just finished one of those formative mistakes of every man’s life which is to remodel a house. I ended up doing much more of that myself than I ever intended, but that’s something I love.
It was the type of thing where I’d go in every day. I’d wake up in the morning. I’d put in an hour painting a room, or changing some plumbing, then go into the office. I'd do the research, work with the markets during the day and then come back and spend time with the family, put the kids to sleep, give them their baths, and then I’d be back fixing the next part of the house. So that would be another part that is probably not worth my time in an economic sense, but it’s something that gives me great joy.
Maybe we can also turn back to adaptive markets because I think this is something that specifically AlphaSimplex focuses a lot on. Maybe you could give us a little bit more background about how you think about adaptive markets and about how that runs in your research process as well as your investment process.
Sure, so when you think of adaptive markets… I know you’re referencing Andrew’s paper and Andrew’s theory, and I don’t want to take any credit away from that. It’s a tremendous theory. It’s a very comprehensive and cohesive view as to how you can bridge efficient markets and behavioral economics. But, at its core what it is really describing is that markets are mostly efficient, but they are composed of agents. They are composed of not perfectly rational (but most of the time pretty close to rational) beings that have different objectives and different focuses - different points of attention.
One of the clear outcomes of that model of the universe, especially if you add in random shocks (which I think we all agree occur, though some would argue about the randomness, and we can hit that later), when you build a model of the world of that variety you quickly realize two things: one, you will never capture all of the factors, all of the drivers, or all of the sources of impetus that are driving the individual actors of the markets; and two, as time goes on, those agents, those people, those traders, those human beings that are interacting are going to learn.
So, the things that have worked in the past are not necessarily going to work in the future, and they certainly will vary over time in terms of how successful they will be. That has direct relevance to the success of investment strategies. If you think about any source of return, that source of return (if it’s not simply a compensation for risk) is going to be balanced by two things: one, what is the long-term expectation of whatever premia you’re trying to capture; and two, how many other people are trying to go for it? How many other people are trying to supply that market with that hedge, that source of liquidity?
So, as a result, as more people pay attention to those, the opportunity declines. As more people forget about it or view it as anathema, view it as an uncompensated risk, the more attractive it actually will be. So, when you build your portfolios you want to make sure that: one, whatever you do can adjust to those changes; and two, when you build those portfolios, chances are the people designing them aren’t going to be any more able to adjust and adapt those portfolios than any other person in the market. To think otherwise would be a little bit arrogant, in my opinion.
So, you want to make it so that your portfolios, your models can make those adaptations themselves in a way that is, of course, consistent with the mandate of whatever the investment is, but also is eminently transparent so that you can make sure that, indeed, it’s doing what it’s supposed to do - that the models, that the algorithms (however you want to describe it) are doing what they’re supposed to be doing. That’s how we view the Adaptive Markets Hypothesis. It’s one part of the view of how the markets work and another part of the view of how should you participate in such a market.
Perhaps you could go a little bit more practical and tell us more about ways that you can actually implement this. Are there certain things that you think are particularly important for being adaptive? Or, are there certain techniques or certain approaches that you have adopted to help follow that view?
So, you can answer that question a bunch of ways. I think in our portfolios we try to do it at multiple levels. I’ll give you just three quick examples: so one, the product that we’re best known for at AlphaSimplex, at least for now, is our trend following fund - our trend following program.
One of the unique things about trend following (and I know you’ve discussed it many, many times on this program) is that trend following has a few advantages. It’s broadly diversified. It’s (if you are doing it, in my opinion, correctly) fairly risk managed in that it will, in effect, detect the changes in the market - the dislocations in the market, and have a certain (in an abstract sense) likelihood of capturing or potentially providing a return based on that movement.
If you think about what momentum really is, there aren't fundamental premia that are being captured by momentum. It’s not like carry. It’s not like value where you can at least quantify something. Maybe the quantified carry, or the quantified value component doesn’t directly, or linearly, or even inversely relate to the return you capture in the next period. But you do have, in some sense, a quantity that you’re trying to capture. It’s something specific that you’re detecting.
In the case of momentum you don’t know what’s causing the momentum. We like to try to think of these reasons. We like to try to build some intuition as to why the Mexican peso is moving so and so, or the U.S. dollar is moving in some direction, or gold is going up or down because that gives us greater confidence in what the signal is. It makes us feel like we better understand our systems. It allows us to create a narrative. But momentum doesn’t require us to know that narrative.
Momentum doesn’t require us to have that insight. It’s a generic, (in a mathematical sense) it’s a weak detector, it’s a weak learner. In doing so it gives us a modest degree of capturing any particular feature, even if we don’t know what that feature is, or what that factor is, or what that even is ahead of time.
So, that’s the first way you can think about how we go about managing this adaptiveness in a very, very coarse, cursory way at AlphaSimplex. It’s by allowing investors to have access to momentum at (what we think of) as a very cost-effective format. I won’t go any further than that for compliance reasons.
The second aspect is how do we follow that momentum, how do we build those signals, how do we apply that detection? So there are… When you’re asking formulation, again, there are two different sleeves or two different paths that you can take. One is that you can be very broadly diversified and I’m not just talking about across assets or across trend horizons (which, again, I’m sure you’ve spoken at length before on this podcast) but also approach.
Different trend models (be it a simple moving average, or a breakout model, or a dual moving average crossover, or something more exotic like a synthetic option straddle), they all behave, in general, like momentum, like trend. But, they all behave differently at different points in time based on differences in their definitions. They have different emphasis on very short-term behavior.
So, a simple moving average cares just as much about the return yesterday as it cares about the return a year ago on that day. A dual moving average crossover really doesn’t have much weight at all on what happened yesterday. It cares a lot about what happened at the particular moving average horizon - so if it’s a twenty day versus a hundred day, it cares a lot about what happened twenty days ago. So, those differences in definition lead to different outcomes.
So, if you’re going through a market that has a lot of mean reversion to it, maybe a dual moving average crossover is the way to go. But if you’re going through an environment that is very trendy, then a simple moving average might be the best way to go or even an exponentially weighted process which cares a lot about what happened yesterday and very little about what happened in the past.
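As an illustration of the point about definitions, each of these trend rules can be written as a weighted sum of past daily returns, and the weights show what each rule "cares about". The sketch below is not AlphaSimplex's code. It assumes log prices, reads the "simple moving average" above as an equal-weighted average of past returns, and uses hypothetical function names, a hypothetical `halflife` parameter, and the 20-day versus 100-day crossover mentioned in the conversation.

```python
import numpy as np

def equal_weights(n):
    """Equal-weighted average of the last n returns: every lag counts the same."""
    return np.full(n, 1.0 / n)

def price_minus_sma_weights(n, max_lag):
    """Weights on lagged returns implied by (log price - n-day SMA of log price).

    Lag 0 is the most recent return. The return at lag j is counted
    (n - 1 - j) / n times, and not at all once j >= n - 1.
    """
    lags = np.arange(max_lag)
    return np.clip(n - 1 - lags, 0, None) / n

def crossover_weights(fast, slow):
    """Weights implied by (fast SMA - slow SMA) of log price."""
    return price_minus_sma_weights(slow, slow) - price_minus_sma_weights(fast, slow)

def ewma_weights(halflife, max_lag):
    """Exponentially decaying weights, normalised to sum to one."""
    decay = 0.5 ** (1.0 / halflife)
    w = decay ** np.arange(max_lag)
    return w / w.sum()

if __name__ == "__main__":
    w = crossover_weights(20, 100)
    print("crossover weight on most recent return:", w[0])
    print("lag with the largest crossover weight:", int(np.argmax(w)))
    print("EWMA weight on most recent return:", ewma_weights(10, 100)[0])
```

Under these assumptions, the 20/100 crossover puts a weight of 1/20 - 1/100 = 0.04 on the most recent return, while its largest weight, 0.8, sits at lag 19 (the 20th most recent return). That is consistent with the remark that the crossover cares most about what happened around twenty days ago.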
The trick is you never know ahead of time which of those is going to be more successful. So, one way you can go about that problem is to combine the methodologies, to take a diversified approach, to take a weighted combination of those mechanisms. Another thing you can do is to try to build into your algorithms some ability of saying, “Actually, right now I should be very reactive, or I should be very conservative. I should be very, very tolerant of short-term noise.” The trick, of course, is how do you do that?
In trend following, you can say that there are only a few things that vary between managers. One of the big ones is how you relatively weight the trends that you see. Well, a lot of that comes down to what is that function. One of the ways that we think we add value is to add some dynamism to that - to try to learn from the data what that weighting is and to allow our algorithms, to allow our systems, to change over time in the lessons that they learn and in the functional response, the amount of weight that we give for a given signal.
Now, let me add a thousand caveats here. There are some really, really great ways to lose money by trying to highlight or trying to overfit a historical period. A really simple mechanism to try to really hone in on what worked really well in a situation like today, would be to go back in time and find the five, or the ten, or the fifty days that looked most like today under some feature set and to do exactly what was the optimal thing to do on those days, in the past.
I would be willing to bet you a fair amount of money that you would have two outcomes. One, your variation of positions would be huge. Your transactions costs would be huge. Your turnover would be monstrous, and you’d lose as a result. Maybe the signal might have a little bit of alpha, but probably not, it’s probably all statistical noise, but you will have cost, that is known, that is a given. On the other side, you could not try to make any of these decisions. You could just take that diversified approach that I was talking about before.
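The "find the days that looked most like today" mechanism described above is essentially a nearest-neighbour rule. The sketch below shows one way it could be written, purely to illustrate why turnover tends to be large. The feature matrix, the Euclidean distance, the function names, and the `k` and `warmup` parameters are all assumptions made for this example and are not taken from the conversation.

```python
import numpy as np

def analog_positions(features, next_returns, k=5, warmup=20):
    """Nearest-neighbour 'analog day' rule.

    features[t]     : feature vector observed on day t
    next_returns[t] : return from day t to day t + 1
    For each day t >= warmup, find the k earlier days whose features are
    closest to today's and take the sign of their average next-day return.
    Only days s < t are used, so next_returns[s] is already known on day t.
    """
    features = np.asarray(features, dtype=float)
    next_returns = np.asarray(next_returns, dtype=float)
    n = len(features)
    positions = np.zeros(n)
    for t in range(warmup, n):
        dist = np.linalg.norm(features[:t] - features[t], axis=1)
        nearest = np.argsort(dist, kind="stable")[:k]
        positions[t] = np.sign(next_returns[nearest].mean())
    return positions

def turnover(positions):
    """Average absolute day-to-day change in position."""
    return np.abs(np.diff(positions)).mean()

if __name__ == "__main__":
    # Pure noise: there is nothing to learn, yet the rule still trades heavily.
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((500, 3))
    rets = rng.standard_normal(500)
    pos = analog_positions(feats, rets)
    print("average daily turnover on noise:", turnover(pos[20:]))
```

On pure noise the rule has no forecasting power, yet it flips between long and short very often, so in this toy setting the trading cost is certain while the alpha is not.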
We think the optimal answer is somewhere in between. The trick is, for us, to figure out how adaptive our model should be, how specific, by market, they should be, and more specifically, how can we let our models shape themselves to each market and shape themselves to changes over time to best capture that return.
That, if I was going to describe anything, is where we spend a majority of our time and a majority of our effort (at least from an algorithm design perspective) in terms of adding value within our managed futures program. That’s what we mean by adaptation. Diversification, but also trying to hone in on how do we make that balance between overfit, but perfectly matched and overly generalized?
So Rob, not to use the common buzz word in our field, but it sounds like, in being adaptive, you may be using some machine learning type approaches. Could you elaborate a little bit more on that and what is your background with machine learning and how do you think about it in this space?
So, it’s absolutely machine learning. I think you’re right that machine learning is this incredible buzz word today, and for many people, it’s not something that they really ran into more than a year or two ago. My background is in statistics, and I got my degree in statistics back before statistics was a sexy field. When I was an undergrad, I was one of two statisticians in the field. I had a student-to-faculty ratio of one to five, one being the undergrad, five being the faculty. That’s why I went into statistics, truth be told.
In statistics, in machine learning and decision theory, many of these concepts that are talked about actually got their start in the ‘70s, in the ‘60s, even in the ‘50s. Where a lot of the nuance has come today, or a lot of the excitement has come today, is in where and how they are applied, in what data sets they’re applied to, and to what degree you can automate and systematize the application of these tools.
Machine learning is a toolbox. To say that we use machine learning to build our algorithms is kind of like saying, “I use tools to build a house.” It’s not really additive in terms of your understanding. So, let me break that down. Let’s talk about what actually we do with machine learning. Before we go into the tools that we use, which I think are really interesting, let me also break down to the problems that we try to solve because I think there’s a lot of hype about machine learning and I think that there are some kinds of problems where there is a lot of potential for growth.
I think that there are some kinds of problems that, regardless of the amount of machinery that you throw at them, are still going to be a challenge and are still going to be a source of where people who are well versed in machines, and specifically their limitations, will be able to still add value as humans rather than just automatons.
So when we think about machine learning, I would say that there are two kinds of problems. You have what you would think of as classification problems, or (as what I would like to call) stationary problems, meaning that the problem that you’re working on doesn’t change over time.
Here’s a great example: Google has come out with a lot of really interesting results and a lot of really impressive, fast algorithms for identifying various things in videos and images. You go back ten years, and it was a really hard thing to identify a cat in a photo. Now it’s a really trivial thing. In fact, you can do that for arbitrary objects. You can just go online, and there are online classifiers that allow you to make these decisions.
One of the very, very, early uses of these things was (in financial markets) things like counting cars in parking lots, or identifying cars in parking lots, or identifying the levels of oil in silos, or trying to predict crop yields, things like these. Those kinds of questions, where it doesn’t matter how many people are looking at the field to identify if this is going to be a good forecast or a bad forecast, or is it going to be a high yield crop or a low yield crop? That doesn’t change the success of detection of that yield.
It doesn’t matter how many people are looking at whether or not that’s a cat in the video or a cat in the image. That doesn’t change your success rate. That’s a static problem and the more data you can throw at it, the more training samples that you can throw at the problem, the better your algorithm will be up to some asymptote.
What are the challenges there? Well, the challenges there are an abundance of features. The more things that you potentially know about a data set the harder it is to glean what’s true. Also noise, the more noisy a data set, the more pixelated an image, for example, the harder it’s going to be to get your answer. But again, the more data that you have, the more training samples that you have, the better your algorithm will be in the end.
There’s a second kind of problem, and that’s, unfortunately, the problem that we typically have in trying to forecast financial markets. So again, in the first case where you can do machine learning for yield discovery, you can do it for forecasting earnings, you can do it for trying to predict the number of SKUs that will be sold by a retailer. Those are all great things. Those are all classification problems. Those are all stationary.
But when you start trying to answer the question, “Well, this earnings level and this book value and this momentum indicator and this sentiment out of the CEO on his earnings call, what does that mean for the return that’s going to happen between today and tomorrow?” that’s a much harder question because there’s a feedback loop.
The more people try to answer that question, and the better their answer is, the less relevant it is in figuring out where it’s going to go tomorrow. That is to say, the solution to the problem, the more people looking at the problem, make that problem harder and may make the features that you used to think were useful in making your prediction no longer useful because they’ve been fully priced in.
So, as a result, this competition, this fight to be first, this fight to be right among lots of intelligent people in the market means that whatever you design for the second kind of problem, this forecasting problem, is going to disappear over time. So, it may have been in the past that simple value signals or value pricing in equities may have been very successful at determining the direction of markets. Or, for a more concrete example, if you were to go back to the ’80s and ’90s, simply knowing which direction the price moved over the last five days or ten days was a really good indication of where prices were going to move over the next five days. It was a Sharpe 2 or Sharpe 3 strategy depending on how good your transaction costs were. That simply isn’t the case anymore.
So, the balance that we have to fight is because of the second category of problem. It’s whatever we think we know may work for some time, but then that will decay as more people figure out the things that we figured out, even if we did have forecasting power before, we may not have forecasting power tomorrow.
So, it’s not just model overfit. There are lots of problems with model overfit, especially noisy data like financial data, but it’s also model decay, and it’s this evolution of market participants that will make things that used to work cease to work. At the same time, if people stop paying attention to these particular features, things that used to not work may also start to work again. But that’s the challenge that we see today.
It’s interesting. It’s very, very interesting as an actual field and not a field that I know a lot about; I have to admit. I kind of posed the same question a few months ago when I was interviewing the founders of AHL and to my surprise both David Harding and Martin Lueck and I think Mike Adam as well, actually, they didn’t sound too overly enthusiastic about artificial intelligence. Actually, to some extent, from memory they were talking about, well, in a sense we already have that because each of our brains, each participant in the market kind of makes up that structure.
To break it down in a very simple way, for me, is of course that I think of it as the more examples the machine sees it learns more and more. But, it learns more and more from what’s happening in the recent history, and we know that things constantly change and might even go back. You know, we don’t have this kind of data from fifty or a hundred years ago, but actually, to some extent, you could argue that markets go back and replicate themselves into how things worked fifty or a hundred years ago. It doesn’t always have to be new change. It could be changed back to the way it was. So how do you overcome all of that? How do you make it work?
I don’t actually agree with them, to a large extent. I think that, as I gave with that example, I think that you can very easily overdo it with machine learning. I think you can very easily overfit your process and that overfitting can come from just finding noise and thinking it’s signal, or it can come from decaying of these attributes.
The things that we’re detecting, the things that we’re allowing our models to adjust to are generally fairly slow moving features. These are not things that are going to change from one month to the next. These are features that are going to develop and disappear over the course of years. Anything shorter than that and we have no hope of detecting and having any sense of confidence in it.
The difference in the way we design our models is that we design them so that they can (those algorithms can) shape those particular nuances between markets, but that they have a common structure, and have common constraints, and common features across markets that we know (in some sense) hold economic value or have some economic truth to them. So again, putting my statistician’s hat on, putting my model builder’s hat on, when you have an infinitely wide dimensional feature space, anything can go into forecasting any market, potentially. You have only so much data.
Suppose we have forty years of data, well congratulations, you have ten thousand data points on a daily basis. Yes, you could chop it up second to second, or minute to minute, but for most of the things that we’re talking about (especially for momentum and trend following), it’s not going to be additive. The autocorrelation of the signal is too high.
So, how do you deal with that problem? You deal with that problem by building structure. You deal with that problem by saying, “You know what, my learning models, my mechanisms, they can only learn a certain class of problem. They can only learn (within trend following) how I relatively weight my different horizons.”
They can never go shorter trend. They can never give too much emphasis to a trend. The kinds of lessons that they can extract are in an, actually, very, very narrow region of that unlimited space because, otherwise, I can guarantee you’ll pick up noise.
An apocryphal story (probably apocryphal), it very well may be true, told to me by Andy Gelman of Columbia was that when, either he was a grad student or when he was an early professor, he had a student try to find the single best predictor of the S&P 500. What the student came back with, after studying thousands and tens of thousands of databases was the quarterly price of butter in Bangladesh. Indeed, over the in-sample period it would have phenomenally high R², and on an out of sample basis, it was worthless.
The only way you can avoid those kinds of instances, besides model discipline, is by adding structure to your model: by knowing what kinds of features you can input, by knowing how those inputs can be used to develop that function, to develop that model, and what form the output will be. That’s where the art, so to speak, comes from.
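The butter story can be reproduced with pure noise. The sketch below is a toy simulation, not a reconstruction of the original exercise. The sample sizes, the number of candidate series, the seed, and the function names are all arbitrary assumptions. It searches thousands of random series for the one with the highest in-sample R² against a random target and then checks that same series out of sample.

```python
import numpy as np

def squared_corr(candidates, target):
    """Squared correlation of each row of `candidates` with `target`."""
    c = candidates - candidates.mean(axis=1, keepdims=True)
    y = target - target.mean()
    num = c @ y
    den = np.sqrt((c ** 2).sum(axis=1) * (y ** 2).sum())
    return (num / den) ** 2

def best_spurious_predictor(n_in=40, n_out=400, n_candidates=5000, seed=0):
    """Pick the noise series with the best in-sample fit, then test it out of sample.

    Everything here is independent random noise, so any in-sample fit is
    spurious by construction.
    """
    rng = np.random.default_rng(seed)
    total = n_in + n_out
    target = rng.standard_normal(total)
    candidates = rng.standard_normal((n_candidates, total))

    in_r2 = squared_corr(candidates[:, :n_in], target[:n_in])
    best = int(np.argmax(in_r2))
    out_r2 = squared_corr(candidates[best:best + 1, n_in:], target[n_in:])[0]
    return best, float(in_r2[best]), float(out_r2)

if __name__ == "__main__":
    best, r2_in, r2_out = best_spurious_predictor()
    print(f"best candidate #{best}: in-sample R^2 = {r2_in:.2f}, "
          f"out-of-sample R^2 = {r2_out:.3f}")
```

With a few thousand candidates and only forty in-sample points, the winning series typically shows a sizeable in-sample R² that collapses towards zero out of sample, which is the pattern the anecdote describes.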
The only difference between what we do and what Harding did and AHL did back in the day was that we’ve tried to make it so that the algorithms add much of that asset by asset, or asset class, by asset class nuance. We parameterize how they learn, rather than trying to code and tune every asset on an item by item basis.
Why do we do that? Again, because if you have two or three parameters that allow us to figure out or to constrain how seventy or eighty markets are learning or are being detected or are following trends, we’re going to be much less likely to overfit our process. We’re going to be much less likely to overfit those parameters than if we have to fit eighty different markets or two hundred seventy, or two hundred forty different parameters per eighty markets. So, by narrowing that we’re actually taking advantage of these tools and of the systems and of the computation to make our jobs…
Ready to learn more about the world’s Top Traders? Go to TOPTRADERSUNPLUGGED.COM and sign up to receive the full transcripts of the first ten episodes of the show, and visit the show notes where you can find useful links to other amazing resources.
Thanks for listening and we’ll see you on the next episode of Top Traders Unplugged.