Statistical significance for a subset of a non-normal distribution
     

Strange


Total Posts: 1450
Joined: Jun 2004
 
Posted: 2018-09-09 21:06
Let's imagine I have a highly skewed and kurtotic distribution of returns (e.g. returns from selling S&P skew). I also have a model that selects a subset of these returns. Are there any reliable techniques that would tell me how statistically significant my model is?

My initial inclination is to use either the Mann–Whitney U test or the Wilcoxon test. In the first, I'd draw a random sample from the original distribution and compare it to the model sample. In the second, if I understand it correctly, I can simply compare a sample to the full population.

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always how much?'

TonyC
Nuclear Energy Trader

Total Posts: 1280
Joined: May 2004
 
Posted: 2018-09-09 22:00
sitting here in the Louis Armstrong Airport nursing my third beer waiting for my twice delayed flight to get me back home
... given three beers and two Bourbon and sodas, I vaguely recall that Mann-Whitney assumes the two samples are independent, while Wilcoxon (the signed-rank test) assumes the two samples are related, i.e. paired

to the extent that one of your distributions is a subset of the other, that would tend to make me think that it violates the independence assumption of Mann-Whitney, and that Wilcoxon is more appropriate

flaneur/boulevardier/remittance man/energy trader

Strange


Total Posts: 1450
Joined: Jun 2004
 
Posted: 2018-09-09 23:17
I thought that if I take random samples, they would be independent for the purposes of the test. But yes, I hear you.

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always how much?'

TonyC
Nuclear Energy Trader

Total Posts: 1280
Joined: May 2004
 
Posted: 2018-09-10 02:41
you make a good point about the random resampling, and now that you brought it up, I've had some second thoughts.

wilcoxon requires "paired" data sets. if I understand your hypothetical example, the model decides when to engage in shorting the skew as opposed to just naively always shorting the skew until you go bust.

so the model distribution is a subset of the stb (short till bust) distribution. the other "comparison distribution" is the excluded distribution: the stb distribution with the model's trades excluded

let's suppose that excluded distribution is 1000 observations and the model is 150 observations. I suppose you could randomly resample with replacement a hundred observations of the model and 100 observations of the excluded distribution, run the wilcoxon test on those hundred observation pairs, and wash rinse repeat ad infinitum so that you had a whole bunch of wilcoxon statistics.

but if my memory serves, (and consider that I am in the window of the bar at Mollys on Decatur Street because my flight did not leave tonight), if you do that you're really doing three-quarters of a Mann Whitney test

in fact if I remember correctly, you can actually run a Mann-Whitney test with the Wilcoxon routine by invoking the R command:
wilcox.test(x, y, paired = FALSE) rather than paired = TRUE.
(and if the only difference betwixt a wilcoxon test and a Mann Whitney test is setting a flag from true to false, that has to count for something)
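
For anyone following along in Python rather than R, here is a rough equivalent, assuming SciPy is available. There the two tests are separate functions rather than one flag: scipy.stats.mannwhitneyu for unpaired samples and scipy.stats.wilcoxon for paired ones. The data are synthetic placeholders, and pairing random resamples the way the post above describes is shown only to illustrate why it is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic, skewed stand-ins for returns (hypothetical data, not real P&L).
model = 0.02 - rng.lognormal(mean=-4.0, sigma=1.0, size=150)
excluded = 0.02 - rng.lognormal(mean=-3.8, sigma=1.0, size=1000)

# Unpaired comparison (Mann-Whitney U): the sample sizes may differ.
mw = stats.mannwhitneyu(model, excluded, alternative="two-sided")

# Paired comparison (Wilcoxon signed-rank): needs equal-length, matched pairs.
# Matching random draws like this has no real meaning, which is the weak
# point of resampling into "pairs".
pairs_a = rng.choice(model, size=100, replace=True)
pairs_b = rng.choice(excluded, size=100, replace=True)
sr = stats.wilcoxon(pairs_a, pairs_b)

print(f"Mann-Whitney U = {mw.statistic:.1f}, p = {mw.pvalue:.4f}")
print(f"Wilcoxon signed-rank W = {sr.statistic:.1f}, p = {sr.pvalue:.4f}")
```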

but like I said ... I'm in the window ... at the bar ... at Mollys.
so take all this not just with grains of salt but with truckloads of salt, as there is lots of hand-waving involved.

flaneur/boulevardier/remittance man/energy trader

ronin


Total Posts: 361
Joined: May 2006
 
Posted: 2018-09-10 16:09

It helps if you think a bit about what you think is different about your subset. Mean? Variance? Skew? Kurtosis?

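The out-of-the-box statistical tests measure the sameness of location (the mean for a t-test, the rank ordering for Mann-Whitney and Wilcoxon): are the two samples closer than some measure of uncertainty due to finite sampling? Neither rank test assumes normal data, although for large samples both are usually computed with a normal approximation to the test statistic.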

I assume that is what you are hoping for: do I generate some value by deciding when to sell puts, as opposed to just selling them all the time? But I would expect that the appropriate measure of noise in the denominator would be specific to your returns distribution, rather than coming out of some standardised test.

"There is a SIX am?" -- Arthur

EspressoLover


Total Posts: 337
Joined: Jan 2015
 
Posted: 2018-09-10 19:33
Maybe you're putting the cart before the horse. Just because the underlying distribution is skewed and leptokurtic doesn't mean that the sample statistics deviate significantly from normality. Unless the sample size is very small (fewer than 100 independent points) or the data are very non-normal, the CLT probably means that you can use plain ole' t-tests.

At the very least, I'd get a handle on the issue by bootstrapping the relevant sample stats. (I'm assuming that you primarily care about the difference between the population means.) Histogram out the values you get from bootstrapping. If you can't really eyeball a significant non-normality from this, then you're probably fine relying on the assumption that the CLT applies.
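
A minimal sketch of that bootstrap check, assuming NumPy and made-up data. The helper fake_returns, the group sizes (150 and 1000) and the crash probabilities are all hypothetical stand-ins for real model and non-model returns. It resamples each group with replacement, records the difference in means, and then summarises how normal the bootstrap distribution looks.

```python
import numpy as np

rng = np.random.default_rng(42)

def fake_returns(n, crash_prob):
    """Hypothetical skewed returns: many small gains, occasional large losses."""
    gains = rng.normal(0.01, 0.005, size=n)
    crashes = rng.random(n) < crash_prob
    return np.where(crashes, -rng.exponential(0.15, size=n), gains)

model = fake_returns(150, crash_prob=0.02)
rest = fake_returns(1000, crash_prob=0.05)

n_boot = 5000
diffs = np.empty(n_boot)
for b in range(n_boot):
    # Resample each group with replacement; record the difference in means.
    m = rng.choice(model, size=model.size, replace=True).mean()
    r = rng.choice(rest, size=rest.size, replace=True).mean()
    diffs[b] = m - r

# Skewness and excess kurtosis of the bootstrap distribution;
# values near 0 suggest the normal approximation is reasonable.
z = (diffs - diffs.mean()) / diffs.std()
skew = np.mean(z**3)
excess_kurt = np.mean(z**4) - 3.0
lo, hi = np.percentile(diffs, [2.5, 97.5])

print(f"mean diff = {diffs.mean():.5f}, 95% interval = ({lo:.5f}, {hi:.5f})")
print(f"skew = {skew:.3f}, excess kurtosis = {excess_kurt:.3f}")

# Text histogram, standing in for the eyeball check.
counts, edges = np.histogram(diffs, bins=15)
for c, e in zip(counts, edges[:-1]):
    print(f"{e:+.4f} {'#' * int(60 * c / counts.max())}")
```

If the interval excludes zero and the histogram looks roughly bell-shaped, a t-test on the means should tell much the same story; if the histogram is visibly lopsided, the percentile interval is the safer summary.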

Good questions outrank easy answers. -Paul Samuelson

TonyC
Nuclear Energy Trader

Total Posts: 1280
Joined: May 2004
 
Posted: 2018-09-10 21:58
Pay attention to what Ronin said, he's smarter 'n i am

flaneur/boulevardier/remittance man/energy trader

Strange


Total Posts: 1450
Joined: Jun 2004
 
Posted: 2018-09-11 03:42
@ronin Since I want to gauge the expectation of future PnL, it's the mean.

@EspressoLover Can we actually assume i.i.d., considering that these are a sample and a sub-sample? However, I'll try that tomorrow.

Spinning this in my head, I feel a little like there are two very distinct cases in terms of the significance testing. Can't quite put my finger on it, but something like "am I aligned with the median or against it"...

Let's say for simplicity that I buy or sell puts on S&P. In the first case, let's say I am a seller, collect the bleed but somehow I seem to avoid the drawdowns. Did I avoid the drawdowns due to dumb luck or due to some information content in my signal? Second case is I buy puts, don't seem to bleed much and seem to catch most of the big down moves. Do I think that my lack of bleed is just due to random luck or is it due to information content in my signal?

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always how much?'

TonyC
Nuclear Energy Trader

Total Posts: 1280
Joined: May 2004
 
Posted: 2018-09-11 05:22
maybe the Henriksson and Merton test of the effectiveness of a market timing signal?

Attached File: onmarkettimingpart2.pdf

but beware of options causing false findings of timing ability as described here
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1947323

flaneur/boulevardier/remittance man/energy trader

Strange


Total Posts: 1450
Joined: Jun 2004
 
Posted: 2018-09-11 06:05
@TonyC - thanks, I'll take a look!

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always how much?'

ronin


Total Posts: 361
Joined: May 2006
 
Posted: 2018-09-11 11:14
> he's smarter 'n i am

I wish I was mate, I wish I was...


@strange, in all seriousness.

Say you are picking some times to sell puts. You collect your pennies, and you get steamrolled every once in a while. Are you better than a dumb robot who sells puts all the time?

I would factor out various ways of being 'better'.
- are you collecting higher premia than average?
- are you having fewer drawdowns than average?
- are your drawdowns shallower than average?

Each of these (premia, drawdown frequency, drawdown depth) is probably more log Gaussian than your return distribution. So as a first go, I would try testing them separately using standardised tests. Out of those, the second one is going to be most suspicious, so if that's your edge you would probably have to dig in quite deep before you get comfortable.

And then there are ratios. Premium size to drawdown depth, premium size to drawdown frequency etc. Once you start digging in, something should come out.

If it doesn't, you'll have to just randomise your picks and work out where you are in the distribution of randomly picking strategies. I don't think there is a simple formula.

But that last scenario would probably not be tradable. "I am better than average, but I don't really know why" - it doesn't really fly off the shelves.
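
The "randomise your picks" fallback can be sketched roughly as follows, assuming NumPy and synthetic data. The return series, the pick count of 150 and the picked indices are all hypothetical. One caveat: drawing random days ignores any clustering in time of the real picks, so this is a first pass rather than a full treatment.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical per-period returns from always selling puts (synthetic, skewed).
n_obs = 1150
returns = np.where(rng.random(n_obs) < 0.05,
                   -rng.exponential(0.15, size=n_obs),
                   rng.normal(0.01, 0.005, size=n_obs))

# Hypothetical model picks: indices of the periods the model chose to trade.
picked = rng.choice(n_obs, size=150, replace=False)
observed = returns[picked].mean()

# Null distribution: mean return of random picks of the same size.
n_sims = 10_000
null_means = np.array([
    returns[rng.choice(n_obs, size=picked.size, replace=False)].mean()
    for _ in range(n_sims)
])

# One-sided p-value with the usual +1 correction so it is never exactly 0.
p_value = (1 + np.sum(null_means >= observed)) / (1 + n_sims)
percentile = 100.0 * np.mean(null_means < observed)

print(f"observed mean = {observed:.5f}")
print(f"percentile among random pickers = {percentile:.1f}, p = {p_value:.4f}")
```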

"There is a SIX am?" -- Arthur

Strange


Total Posts: 1450
Joined: Jun 2004
 
Posted: 2018-09-11 14:37
@ronin

I get it now, I confused a lot of people by talking about selling puts or whatever. It was a poor mnemonic device (in retrospect) because people right away start thinking about the distribution of returns.

- assume that this is a process for signal validation; involving the actual prices and P&L is not the right approach because it will mask the signal quality. If anything, let's convert the results into a digital form - "vol wins" or "vol loses".

- also, assume that each signal has some reasonable hypothesis behind it. It's not like "I am better than average, but I don't really know why" but more like "does buying puts after Trump tweets actually work" and "is it better to avoid selling puts on rainy days".

- there are several of these signals, but not a large number, so we don't need to make any form of data mining corrections.

- total number of these digital observations is large, but not unlimited (so it's not tick data);

So, to re-frame, looking at two types of models. Both are based on some external bit of data, both models are drawing a subsample from a highly asymmetrical binomial distribution

(a) model 1 aims to maximize draws from the bigger bucket
(b) model 2 aims to maximize draws from the smaller bucket
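
For the digital framing, one possible check is a hypergeometric tail probability, which is the same quantity as a one-sided Fisher exact test: if the model's picks were a random draw without replacement from all observations, how likely is it to land at least this many in the target bucket? The sketch below uses only the standard library, and every count in it is invented for illustration.

```python
from math import comb

def hypergeom_upper_tail(total, total_hits, picks, pick_hits):
    """P(at least pick_hits hits among picks draws without replacement)."""
    top = min(picks, total_hits)
    denom = comb(total, picks)
    num = sum(comb(total_hits, k) * comb(total - total_hits, picks - k)
              for k in range(pick_hits, top + 1))
    return num / denom

# Hypothetical counts: 1150 observations, 1000 in the bigger bucket and
# 150 in the smaller one; each model selects 150 observations.
total, big_bucket = 1150, 1000
small_bucket = total - big_bucket
picks = 150

# Model 1 wants the bigger bucket: say 140 of its 150 picks land there
# (about 130 would be expected by chance).
p_model1 = hypergeom_upper_tail(total, big_bucket, picks, 140)

# Model 2 wants the smaller bucket: say 35 of its 150 picks land there
# (about 20 would be expected by chance).
p_model2 = hypergeom_upper_tail(total, small_bucket, picks, 35)

print(f"model 1 one-sided p = {p_model1:.5f}")
print(f"model 2 one-sided p = {p_model2:.6f}")
```

The same function serves both models; only the bucket being counted changes, which is one way to see that the two cases are mirror images of a single test.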




I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always how much?'

ronin


Total Posts: 361
Joined: May 2006
 
Posted: 2018-09-11 16:51
> "does buying puts after Trump tweets actually work"

That's not a bad one. Let me know if it does...

I guess the problem is that I don't really know what sort of distribution we are dealing with here.

E.g., how much is 3 standard deviations in your distribution? Is it pretty much all there is, or is it just a little bit?



"There is a SIX am?" -- Arthur

nikol


Total Posts: 553
Joined: Jun 2005
 
Posted: 2018-09-11 17:47
What is the point in comparing set and subset unless there is some selection (filtering) model in between which extracts useful info, am I right?

If you care only about plain distribution (no time is involved, special ordering etc) then compare empirical CDFs: x=CDF(set), y=CDF(subset). If they are equal, you get diagonal qqplot.

Test for uniformity of U_i = CDF_set(x_i), evaluated at each subset point x_i,
with, for example, Kolmogorov-Smirnov or Anderson-Darling.

These examples can give you further inspiration: give more weight to the tails, apply other censoring, or choose the right metric, e.g. (F_set - F_subset)^2 or abs(F_set - F_subset). The particular choice requires some research and thinking about what kind of info is stored in your data.
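
A rough sketch of the empirical-CDF comparison, assuming NumPy and SciPy and synthetic data. Each subset point is pushed through the full set's empirical CDF, and the resulting values are tested for uniformity with Kolmogorov-Smirnov. The filter here is a random mask, a hypothetical stand-in for a real selection model. Because the subset is part of the set, the p-value is only approximate, so a two-sample test of subset against the remaining points is shown alongside.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic heavy-tailed "set" and a "subset" chosen by a hypothetical filter.
full = rng.standard_t(df=3, size=2000)
mask = rng.random(full.size) < 0.15
subset = full[mask]
rest = full[~mask]

# Empirical CDF of the full set, evaluated at each subset point.
sorted_full = np.sort(full)
u = np.searchsorted(sorted_full, subset, side="right") / full.size

# If the subset is just a random draw from the set, u should look Uniform(0, 1).
ks_uniform = stats.kstest(u, "uniform")

# Alternative without overlapping points: subset against everything else.
ks_two_sample = stats.ks_2samp(subset, rest)

print(f"KS vs uniform: D = {ks_uniform.statistic:.4f}, p = {ks_uniform.pvalue:.4f}")
print(f"KS two-sample: D = {ks_two_sample.statistic:.4f}, p = {ks_two_sample.pvalue:.4f}")
```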

EspressoLover


Total Posts: 337
Joined: Jan 2015
 
Posted: 2018-09-11 18:22
> Can we actually assume i.i.d. considering that these are sample and a sub-sample? However, I'll try that tomorrow.

I would suggest just comparing the sub-sample with the complement of the sub-sample. That eliminates any issues of overlapping sample points.

The alternative hypothesis is that the distribution of the sub-population is different from the distribution of the population (the null being that they are the same). That's true if and only if the sub-population is different from the complement-sub-population. So comparing sub-sample and complement tells you the same thing.
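
A small sketch of the sub-sample versus complement comparison, assuming NumPy and SciPy with synthetic returns and a random boolean mask standing in for the model's selections (both hypothetical). It also checks numerically the identity behind the "if and only if": with p the fraction selected, mean(sub) - mean(pop) = (1 - p) * (mean(sub) - mean(comp)), so one difference is zero exactly when the other is.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Synthetic skewed returns and a hypothetical selection mask.
n = 1150
returns = np.where(rng.random(n) < 0.05,
                   -rng.exponential(0.15, size=n),
                   rng.normal(0.01, 0.005, size=n))
selected = rng.random(n) < 0.13

sub = returns[selected]    # what the model picked
comp = returns[~selected]  # everything else: no overlapping points

# Welch's t-test (unequal variances) and Mann-Whitney U on disjoint samples.
welch = stats.ttest_ind(sub, comp, equal_var=False)
mw = stats.mannwhitneyu(sub, comp, alternative="two-sided")

# The population mean is a weighted average of the two group means, so the
# sub-vs-population gap is a fixed multiple of the sub-vs-complement gap.
p = sub.size / returns.size
gap_vs_population = sub.mean() - returns.mean()
gap_vs_complement = sub.mean() - comp.mean()

print(f"Welch t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
print(f"Mann-Whitney U = {mw.statistic:.1f}, p = {mw.pvalue:.4f}")
print(np.isclose(gap_vs_population, (1 - p) * gap_vs_complement))
```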

#drawdown

I just want to add here that estimating drawdown is a whole 'nother bucket of worms. Unlike mean/variance/skew, it's not a population summary statistic. It's a time-series property, because the sequence of returns matters. In which case coming up with a tractable, analytical statistical test is a lot harder.

Plus drawdown has all sorts of pathological issues in terms of its mathematical behavior. For example, a shorter sample will always have lower expected maximum drawdown than a longer sample, even if the returns are i.i.d. drawn from the same population. Now add on top that you want a nonparametric test, which is really hard to do with time series.
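
The sample-length effect is easy to see in a quick simulation. This is a sketch assuming NumPy and i.i.d. normal returns with an arbitrary small drift; the path lengths (250 and 1000) and the parameters are illustrative choices only. Maximum drawdown is measured on the additive cumulative P&L.

```python
import numpy as np

rng = np.random.default_rng(11)

def max_drawdown(returns):
    """Largest peak-to-trough drop of the cumulative (additive) P&L path."""
    pnl = np.concatenate(([0.0], np.cumsum(returns)))
    running_peak = np.maximum.accumulate(pnl)
    return np.max(running_peak - pnl)

def expected_max_drawdown(n_obs, n_paths=2000):
    """Average max drawdown over simulated i.i.d. paths of length n_obs."""
    paths = rng.normal(0.0005, 0.01, size=(n_paths, n_obs))
    return np.mean([max_drawdown(path) for path in paths])

dd_short = expected_max_drawdown(250)
dd_long = expected_max_drawdown(1000)

print(f"expected max drawdown, 250 obs:  {dd_short:.4f}")
print(f"expected max drawdown, 1000 obs: {dd_long:.4f}")
```

Both sets of paths come from the same distribution, yet the longer ones show the deeper average drawdown, which is why comparing raw drawdowns between a 150-trade subset and a 1000-trade complement is misleading.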

Personally, I'll use drawdown to give me a "gut feeling" about how a strategy behaves. But I'd just pass in terms of using it in any formal statistical sense (like testing the hypothesis that one series has larger expected drawdown than another series). I think as long as you estimate mean, variance, skew, kurtosis, auto-correlation, heteroskedasticity and regime effects, that pretty much captures everything relevant unless your time series is super-weird.

> assume that this is a process for signal validation, involving the actual prices and P&L quality is not the right approach because it will mask the signal quality.

If you're talking about signal fitting, not portfolio/risk management, then just use R-squared. Technically least-squares is MLE for normal distribution, but the target variable has to be really skewed or leptokurtic to meaningfully change the results. Very rare for returns, even VIX-type returns to be affected.

You can try for yourself: cap your dependent variable at three standard deviations from the mean and re-fit the least-squares model. I'm willing to bet this fitted model and the vanilla model have 90%+ correlation with each other.
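
A sketch of that experiment on synthetic data, assuming NumPy. The three predictors, the coefficients in beta_true and the Student-t noise (3 degrees of freedom, so fat-tailed) are all hypothetical; real signals and returns would replace them. It fits ordinary least squares on the raw target, refits on the target capped at three standard deviations, and correlates the two sets of fitted values.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical signals and a fat-tailed target.
n = 5000
X = rng.normal(size=(n, 3))
beta_true = np.array([0.5, -0.3, 0.2])
y = X @ beta_true + rng.standard_t(df=3, size=n)

def ols_fit(features, target):
    """Least-squares fit with an intercept; returns (coefficients, fitted)."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef, design @ coef

coef_raw, fit_raw = ols_fit(X, y)

# Cap the target at three standard deviations either side of its mean.
lower = y.mean() - 3.0 * y.std()
upper = y.mean() + 3.0 * y.std()
y_capped = np.clip(y, lower, upper)
coef_cap, fit_cap = ols_fit(X, y_capped)

corr = np.corrcoef(fit_raw, fit_cap)[0, 1]
n_capped = int(np.sum(y != y_capped))

print(f"points capped: {n_capped} of {n}")
print(f"raw coefficients:    {np.round(coef_raw, 3)}")
print(f"capped coefficients: {np.round(coef_cap, 3)}")
print(f"correlation of fitted values: {corr:.4f}")
```

With more than one predictor the correlation is not trivially 1, so it does measure how far the capping moves the fitted model.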

Good questions outrank easy answers. -Paul Samuelson

nikol


Total Posts: 553
Joined: Jun 2005
 
Posted: 2018-09-11 18:29
> I would suggest just comparing the sub-sample with the complement of the sub-sample. That eliminates any issues of overlapping sample points.


Worship

Strange


Total Posts: 1450
Joined: Jun 2004
 
Posted: 2018-09-11 23:38
@EspressoLover "I would suggest just comparing the sub-sample with the complement of the sub-sample."

That. Is. Smart.

The whole thing is about signal fitting only; in fact, it is rather specific to using a bunch of alternative data sets. The more I think about it, the more I like the idea of using signs instead of returns.

PS. would anyone really try to use historical option returns to understand their risk?

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always how much?'

TonyC
Nuclear Energy Trader

Total Posts: 1280
Joined: May 2004
 
Posted: 2018-09-12 04:29
> sub-sample with the complement of the sub-sample

complementary set, that's what I meant when I said excluded set as total set minus subset ... I am an inarticulate idiot

flaneur/boulevardier/remittance man/energy trader

goldorak


Total Posts: 1048
Joined: Nov 2004
 
Posted: 2018-09-12 08:47
Aside from the base question, and re: the assumptions needed to apply a t-test, you may find the paper below interesting, in the sense that you can compute an equivalent of the Sharpe ratio (hence a t-stat) without relying on any normality assumption.

Sharper asset ranking from total drawdown durations

The paper is of course available from the usual sources.

If you are not living on the edge you are taking up too much space.

Strange


Total Posts: 1450
Joined: Jun 2004
 
Posted: 2018-09-17 01:09
@goldorak - thanks! looks interesting

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always how much?'

nikol


Total Posts: 553
Joined: Jun 2005
 
Posted: 2018-09-24 12:17
@goldorak - thank you. duration is an interesting metric, simple and informative

goldorak


Total Posts: 1048
Joined: Nov 2004
 
Posted: 2018-09-24 14:01
and does not rely on distributional assumptions ;)

If you are not living on the edge you are taking up too much space.

doomanx


Total Posts: 15
Joined: Jul 2018
 
Posted: 2018-09-24 15:37
Really nice paper, goldorak, thanks for the recommendation. For those interested, you can find implementations of the proposed estimator at https://cran.r-project.org/web/packages/sharpeRratio/index.html for R and https://pypi.org/project/pysharperratio/0.1.10/ for Python.