Statistical analysis of backtest results
     
Page 1 of 1

Lebowski


Total Posts: 70
Joined: Jun 2015
 
Posted: 2018-09-04 21:25
Hi all. Long time no post, but hopefully EspressoLover et al. will drop some knowledge on me for free like when I was back in school.

In my mind, there are two obvious angles from which you can examine your backtest results:
1. Time series of returns. This is what we usually discuss. I’m going to state something potentially naive in the hope that someone corrects me and I learn something: “Beyond studying your residuals and computing the Sharpe/Sortino ratios, there’s not much to see there.”

2. Individual trades. This is where I’m really curious. What sort of statistics can you compute on the sample of trades? Specifically, from a high-frequency perspective, you likely want to know how quickly your alpha deteriorates, e.g. “Where would I have been filled if this signal came in at t + some delta?” Another interesting angle might be to look at the trades conditional on something observable, for discussion's sake the intensity of market order arrivals or the state of the LOB. With these examples in mind, what general guidance can you give me about looking at individual trades? What characteristics do the trades of a good HF backtest result have (besides edge)?
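One way to make the "filled at t + delta" question concrete is to sweep a delay and measure how far the reference price moves against you before you act. A minimal Python sketch, assuming a single time-sorted reference price series (e.g. the touch you would cross into) and signals given as hypothetical `(time, side)` pairs; it ignores queue position and fill probability entirely.

```python
from bisect import bisect_right


def asof_price(times, prices, t):
    """Last price observed at or before time t (times must be sorted)."""
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("no price at or before t")
    return prices[i]


def delay_cost_bps(times, prices, signals, delta):
    """Average adverse price move (bps) from acting `delta` time units late.

    signals: list of (time, side), side=+1 for buy, -1 for sell.
    Positive result = the delay makes entries worse on average.
    """
    costs = []
    for t, side in signals:
        p0 = asof_price(times, prices, t)          # price if acted on immediately
        p1 = asof_price(times, prices, t + delta)  # price if acted on after the delay
        costs.append(side * (p1 - p0) / p0 * 1e4)
    return sum(costs) / len(costs)
```

Evaluating `delay_cost_bps` over a range of `delta` values gives a rough alpha-decay curve for the signal.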

Thanks as always. Feel encouraged to correct anything you read above.

doomanx


Total Posts: 10
Joined: Jul 2018
 
Posted: 2018-09-06 10:56
I think the question is a little general. You mentioned HF: are you market making, latency arbing? A 'good' backtest is going to be conditional on the strategy, provided the slope of the PnL line is up, of course.

On a more basic level there's 'edge' in the sense of positive EV of your trades, but it's not a real edge if it's data mined, so a good strat will be statistically significantly 'non-random'. For that there are many recommendations on this forum (see the papers of White or Hansen for some ideas, although I think there are more pragmatic ways to approach the problem).
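White's and Hansen's papers describe bootstrap tests for data snooping across many candidate strategies. A minimal Python sketch of White's (2000) Reality Check, assuming you have each candidate's per-period returns in excess of a benchmark, resampled with the Politis-Romano stationary bootstrap; the average block length of 10 is an arbitrary assumption. Hansen's SPA test refines this by studentizing and adjusting the null so that clearly poor candidates do not inflate the p-value, which is not shown here.

```python
import math
import random


def stationary_bootstrap_indices(n, mean_block, rng):
    """Politis-Romano stationary bootstrap: random block starts, geometric
    block lengths with mean `mean_block`, wrapping circularly at the end."""
    p = 1.0 / mean_block
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < p:
            idx.append(rng.randrange(n))      # start a new block
        else:
            idx.append((idx[-1] + 1) % n)     # extend the current block
    return idx


def reality_check_pvalue(excess, n_boot=1000, mean_block=10, seed=0):
    """White's (2000) Reality Check p-value.

    excess[k][t]: return of candidate strategy k minus the benchmark at time t.
    H0: the best candidate does not beat the benchmark in expectation.
    """
    rng = random.Random(seed)
    n = len(excess[0])
    means = [sum(row) / n for row in excess]
    stat = math.sqrt(n) * max(means)
    exceed = 0
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        # Recentered bootstrap statistic: max over candidates of sqrt(n) * (mean* - mean)
        boot_stat = max(
            math.sqrt(n) * (sum(row[i] for i in idx) / n - m)
            for row, m in zip(excess, means)
        )
        if boot_stat >= stat:
            exceed += 1
    return exceed / n_boot
```

The test only means something if `excess` contains every variant you tried, not just the survivors.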

contango_and_cash


Total Posts: 87
Joined: Sep 2015
 
Posted: 2018-09-06 13:25
look at López de Prado's papers on backtesting.

from what you're saying it sounds more like market impact type studies and low latency trading.

can you do the same activity with smaller size and check that your realized trades are near your assumptions, etc.? high freq should generate enough data very quickly to know if any of your assumptions are very wrong.
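A minimal Python sketch of that check, assuming you can pair each small-size live fill with the price the backtest assumed for the same order; the `(side, assumed_price, live_price)` layout is hypothetical. It reports mean slippage in bps and a simple t-statistic that treats fills as independent, which is optimistic for clustered HF fills.

```python
import math
from statistics import mean, stdev


def fill_slippage_report(pairs):
    """Compare live fills with the prices the backtest assumed for the same orders.

    pairs: list of (side, assumed_price, live_price), side=+1 buy, -1 sell.
    Slippage is positive when the live fill was worse than assumed.
    """
    slip = [side * (live - assumed) / assumed * 1e4 for side, assumed, live in pairs]
    m = mean(slip)
    se = stdev(slip) / math.sqrt(len(slip))   # assumes independent fills
    t_stat = m / se if se > 0 else float("nan")
    return {"mean_bps": m, "t_stat": t_stat, "n": len(slip)}
```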


Lebowski


Total Posts: 70
Joined: Jun 2015
 
Posted: 2018-09-06 13:36
@doom Thanks a lot for the leads on Hansen and White's respective tests. That's exactly the sort of thing I was hoping to dig up here.

What I aim to do is build a really general backtesting framework. I say "HF" because the application is event-driven on LOB data, but I don't really have an answer at the present time as to what I'm trying to do. If there's a category called "market making and stat arb, not super low latency", I suppose that would be the intended use case.*

Will read up on both tests on the train over the next couple days. Thank you.

* I don't know; in practice that category might be an oxymoron. Beyond any particular use case, my motivation for building this is really just that I'm a long-term hobbyist (who has never placed a trade) and junior software engineer who kind of hopes my next job is building one of these.

@contango I've gathered that's how many of the "real" firms do this, but per the above I'm not really a real firm. They go live in small size because backtesting is spurious and a drain on firm resources. That's not really an option in my case, and indeed my case is more about the journey than the destination. Thanks also for the paper suggestion. I have his book but have not yet read the backtesting chapter. I'll take a look on your reminder. Thanks again!

EDIT: typos, consolidated replies.

doomanx


Total Posts: 10
Joined: Jul 2018
 
Posted: 2018-09-06 14:33
I assume from the post that your idea is to do some sort of stat arb with a 'market making flavor' to reduce t-costs. Forgive me if this is not the case, but if this is what you are thinking, I'm not convinced by the idea. Managing inventory risk and short-term prediction are pretty separate objectives. Much better to spend time modeling your market impact (don't read the crazy French closed-form-solution papers on this).

I don't necessarily agree with the recommendation of de Prado: his latest book has maybe one useful section (which happens to be an exposition rather than a recommendation of his), and the rest is just sensationalist. It's written like an academic who's never traded. Apparently the guy ran some funds or whatever, but I'm convinced nobody in industry uses the methods he presents. The core concept he's banging on about in his presentations (data mining bias) has been known for a long time (the papers I mentioned previously are testament to this), but because he's an 'ML' guy who's orders of magnitude smarter than those simian stats/econometrics guys, he thinks he's ahead of the game.

doomanx


Total Posts: 10
Joined: Jul 2018
 
Posted: 2018-09-06 14:34
I mean, come on, '7 Reasons why Machine Learning funds fail'? Are we on BuzzFeed now or what?

Lebowski


Total Posts: 70
Joined: Jun 2015
 
Posted: 2018-09-06 20:28
@doom I appreciate your comments on the idea of combining the two, and I further agree that short-term forecasting and managing inventory are indeed separate problems, but I ask that you view my reply more as “various intraday approaches” than as me saying explicitly that my strategy is this or that. I do not have anything resembling a strategy; what I have is an interest in building a generic intraday backtester. If we were in person I would shell you with questions about delineating the two, but we’re online, so I want to keep the discussion focused to the extent possible. I may, however, start another thread to discuss de Prado once I’ve read further. I put it down after chapter three to focus on other stuff, but the first couple of chapters didn’t seem too outlandish. Note that I haven’t gotten to the quantum computing chapter yet, though. I can tell you my desk has more salient concerns than that. Further to your comments, academics want to write about cool stuff; we just want to keep risk up.

contango_and_cash


Total Posts: 87
Joined: Sep 2015
 
Posted: 2018-09-10 16:08
@doomanx - it is useful to know his research exists, and to consider his opinion.

similar to Kelly, not all published results are followed in practice, but they are helpful to know of.

EspressoLover


Total Posts: 330
Joined: Jan 2015
 
Posted: 2018-09-11 19:03
In terms of point 2, here's a grab bag of some things I find useful from that perspective.

Returns by Horizon: Take every fill, then look at some metric of "market price" at some fixed time interval after the fill. What's the average return over that interval? E.g. if you fill a bid at $100, and the weighted mid-price 100 milliseconds later is $99.97, then your return for that horizon is -3 bps. Average that over all trades, or some interesting sub-sample of trades. Now do that for a sequence of horizons from very short to long and plot the curve.
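A minimal Python sketch of the returns-by-horizon (markout) curve, assuming fills as hypothetical `(time, side, fill_price)` tuples and a time-sorted reference price series such as weighted mid; the reference price at any time is taken as the last observation at or before it.

```python
from bisect import bisect_right


def markout_curve(fills, ref_times, ref_prices, horizons):
    """Average signed return (bps) of fills marked to a reference price at each horizon.

    fills: list of (time, side, fill_price) with side=+1 for buys, -1 for sells.
    ref_times / ref_prices: time-sorted reference series (e.g. weighted mid);
    the price at time t is the last observation at or before t.
    """
    curve = {}
    for h in horizons:
        rets = []
        for t, side, px in fills:
            i = bisect_right(ref_times, t + h) - 1
            if i < 0:
                raise ValueError("no reference price at or before t + h")
            rets.append(side * (ref_prices[i] - px) / px * 1e4)
        curve[h] = sum(rets) / len(rets)
    return curve
```

With the numbers from the example above (buy at $100, weighted mid $99.97 at 100 ms), `markout_curve([(0, 1, 100.0)], [0, 100], [100.0, 99.97], [100])` gives roughly -3 bps at the 100 ms horizon.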

This tells you a lot of interesting things. First, it helps you know the general time-frame of your edge. If almost all your PnL is realized in under a second, you want to focus on getting out of positions very quickly. Or if the curve keeps rising even well past your strategy's average holding period, you probably want to trade less.

It's also a pretty good spot-check for overfitting. You generally want to see a smooth rise on the curve, with the fastest rate of return at short horizons, then a general leveling off, with maybe a slow decay at long horizons as toxicity overwhelms alpha. Overfit signals generally have some sort of weird-ass structure, like all the returns coming from the middle horizons.

PnL by Source: I like to break this down into spread capture (or payment), fees/rebates, impact from the fill trade (e.g. if you're market making and the level got swiped, causing an immediate mark-to-market loss), and post-trade alpha/drift/toxicity (see above).

These are good metrics to keep in mind because they help you determine where to lean to squeeze more juice out of the strategy. E.g. if you're doing great on everything else but losing all your money on fill impact, then you probably need to focus more on getting better queue position. It's also good to look at how these numbers break down across different subsets of trades, e.g. morning vs. afternoon or narrow-spread vs. wide-spread.
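A minimal Python sketch of that breakdown, assuming per-fill snapshots of a reference mid just before the fill, just after it, and at a longer markout horizon. All dict keys are hypothetical, and attributing everything between "just after" and "later" to drift is one simple convention among several. The optional `group_key` handles subsetting such as morning vs. afternoon.

```python
from collections import defaultdict


def pnl_by_source(fills, group_key=lambda f: "all"):
    """Split per-fill PnL into spread capture, fees/rebates, fill impact, and drift.

    Each fill is a dict with hypothetical keys:
      side (+1/-1), qty, price, fee (positive = paid, negative = rebate),
      mid_before (mid just before the fill), mid_after (mid just after it),
      mid_later (mid at a longer markout horizon).
    The four parts sum to side*qty*(mid_later - price) - fee.
    """
    out = defaultdict(lambda: defaultdict(float))
    for f in fills:
        s, q = f["side"], f["qty"]
        g = out[group_key(f)]
        g["spread"] += s * q * (f["mid_before"] - f["price"])     # captured (+) or paid (-)
        g["fees"] += -f["fee"]                                      # rebates show up as positive
        g["impact"] += s * q * (f["mid_after"] - f["mid_before"])   # immediate move on the fill
        g["drift"] += s * q * (f["mid_later"] - f["mid_after"])     # post-trade alpha/toxicity
    return {k: dict(v) for k, v in out.items()}
```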

Risk-increasing vs. risk-reducing trades: Do trades that increase your total inventory have meaningfully different returns than those that decrease it? A lot of times, you'll see that the former have much better return characteristics than the latter. This could be for a number of reasons. One, it's a sign of overfitting. To see why, think about an overfitted case, where some optimizer makes all its PnL in backtest from doing the same trade a million times in a row. It's basically a sign that your fit has fewer degrees of freedom than the number of trades.

Two, assuming it's not overfit, it's a sign that your profit is compensation for some priced risk factor that others are offloading onto you. That's not necessarily a bad thing; for economic, institutional or even mechanical reasons the compensation could still be well worth it. But it's something to be aware of. It also increases the risk of fat-tailed drawdowns, since a common priced risk is more likely to be synchronously sold off by the entire market.
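A minimal Python sketch of the risk-increasing vs. risk-reducing split for a single instrument, assuming a time-ordered list of fills as hypothetical `(signed_qty, markout_bps)` pairs, with markouts computed as in the returns-by-horizon idea. A fill counts as risk-increasing if it raises absolute inventory; fills that cross through zero are classified by that net effect, which is a simplifying convention.

```python
def split_by_risk(fills, start_inventory=0):
    """Average markout for risk-increasing vs. risk-reducing fills.

    fills: time-ordered list of (signed_qty, markout_bps); signed_qty > 0 for buys.
    Returns {"increasing": (mean_bps, count), "reducing": (mean_bps, count)}.
    """
    inv = start_inventory
    groups = {"increasing": [], "reducing": []}
    for qty, ret in fills:
        new_inv = inv + qty
        # Net effect on |inventory| decides the class (zero-crossings included).
        key = "increasing" if abs(new_inv) > abs(inv) else "reducing"
        groups[key].append(ret)
        inv = new_inv
    return {k: (sum(v) / len(v) if v else float("nan"), len(v)) for k, v in groups.items()}
```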

A somewhat related variant is looking at returns by how many times in a row you've traded in the same direction. If you're trading a portfolio of instruments, you can also consider risk-reducing/increasing from the perspective of single-instrument or portfolio beta.
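A minimal Python sketch of the same-direction streak variant, assuming a time-ordered list of hypothetical `(side, markout_bps)` pairs for one instrument; pooling long streaks into a single bucket (`max_bucket`) is an arbitrary choice to keep sample sizes reasonable.

```python
from collections import defaultdict


def returns_by_streak(fills, max_bucket=5):
    """Average markout grouped by how many consecutive fills were on the same side.

    fills: time-ordered list of (side, markout_bps), side=+1/-1.
    Streak 1 = first fill after a side change; streaks >= max_bucket are pooled.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    prev_side, streak = None, 0
    for side, ret in fills:
        streak = streak + 1 if side == prev_side else 1
        prev_side = side
        b = min(streak, max_bucket)
        sums[b] += ret
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(counts)}
```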

Distribution of touch sizes on fills: If you're quoting at the bid, what's the average and typical range of bid size at the time of your fill? This is a good metric for estimating the scalability of a strategy. It's also a good way to subset trades to determine how likely your returns are to worsen if you size up. E.g. if returns are much better when the touch is showing
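A minimal Python sketch of subsetting fills by touch size, assuming hypothetical `(touch_size, markout_bps)` pairs where `touch_size` is the displayed size on your side of the book at fill time. It buckets by quantile using `statistics.quantiles` (Python 3.8+); the number of buckets is an arbitrary choice.

```python
from statistics import mean, quantiles


def returns_by_touch_size(fills, n_buckets=4):
    """(count, average markout) per touch-size quantile bucket, smallest sizes first.

    fills: list of (touch_size, markout_bps); needs at least two fills.
    """
    sizes = [s for s, _ in fills]
    cuts = quantiles(sizes, n=n_buckets)   # n_buckets - 1 cut points
    buckets = [[] for _ in range(n_buckets)]
    for size, ret in fills:
        b = sum(size > c for c in cuts)    # number of cut points below this size
        buckets[b].append(ret)
    return [(len(b), mean(b) if b else float("nan")) for b in buckets]
```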

Order lifetime characteristics: Things like how long the order was alive before it was filled, how long it spent at the touch vs. outside, what the range of alpha signals was during its lifetime, how far it moved up the queue during its life, how many orders joined behind it, what percent of the orders it shared a queue with were cancelled, etc. Calculating the average trade returns based on these types of things can point you to ways to optimize the strategy. E.g. maybe all your PnL comes from times you quoted deep in the book, and it's not worth it to join the top of the queue at the inside.
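A minimal Python sketch for one of these features (time resting before the fill), assuming filled orders as dicts with hypothetical keys `placed_at`, `filled_at` and `markout_bps`, and bucket edges you choose yourself. The same pattern applies to other lifetime features such as queue advance or fraction of time at the touch.

```python
from bisect import bisect_right
from collections import defaultdict


def returns_by_lifetime(orders, edges):
    """(count, average markout) of filled orders, bucketed by resting time.

    orders: list of dicts with hypothetical keys placed_at, filled_at, markout_bps.
    edges: sorted boundaries in the same time units, e.g. [0.1, 1.0, 10.0]
    gives bucket 0 (<0.1), 1 [0.1, 1.0), 2 [1.0, 10.0), 3 (>=10.0).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for o in orders:
        life = o["filled_at"] - o["placed_at"]
        b = bisect_right(edges, life)   # number of edges <= life
        sums[b] += o["markout_bps"]
        counts[b] += 1
    return {b: (counts[b], sums[b] / counts[b]) for b in sorted(counts)}
```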

[Placeholder if I think of anything else]

Good questions outrank easy answers. -Paul Samuelson

Lebowski


Total Posts: 70
Joined: Jun 2015
 
Posted: 2018-09-12 18:15
Thank you, David Shaw, Jim Simons, Blair Hull or whoever the legendary polymath behind @espressolover is. Let me digest this a bit, and perhaps we can discuss further once I get my thoughts together. At first blush, though, I very much like the idea of risk-on vs. risk-off analysis as well as PnL by source. The idea of decomposing PnL like this is not new to me, but I like the way you’ve adapted it to intraday trading. You’re adding a lot of value here (NP) with not a ton of obvious return, so count this as at least a “thanks.”