Nonius

Founding Member Nonius Unbound

Total Posts: 11742 
Joined: Mar 2004 


I have return profiles for a number of fraudulent hedge funds and am looking for more. Does anyone have the returns of:
Cambridge Partners Ashbury Synergy Princeton Economics ETJ Nidra Sagam
?
Thanks in advance, rgds, Nonius 
Location: somewhere in CfA2 Great Wall. 



stralen


Total Posts: 5 
Joined: Dec 2007 


try the returns from https://investatolsen.com/ 
Fear causes hesitation, and hesitation will cause your worst fears to come true. 


purbani


Total Posts: 82 
Joined: Apr 2005 


Hi Nonius
We have had quite a bit of success using the Hurst exponent to identify time series that are statistically 'too good to be true', or that are in-sample, perfect-hindsight backtests. The method is not infallible (all PIPEs get flagged), but it correctly identified Fairfield Sentry and a number of other feeder funds. It can also be used to identify excess risk taking, as in the case of Amaranth. The reason this seems to work is that the Hurst exponent is related to the alpha-stable Levy distribution through the fractal dimension. It is therefore a bit like an exponential distribution and highly sensitive to tail events. Sornette's method of identifying bubble risk appears to work the same way, i.e. things that are growing at an apparently faster-than-exponential rate (think of the Cisco Fortune cover story in 1999 saying their projected earnings would exceed US GDP within 5 years, and more recently Amaranth) would grow to infinity in finite time, which is impossible, so they must return to a chaotic state or crash. Bottom line: Hurst is not a linear function. The optimal range is 0.55 to 0.68; anything above that is indicative of excess momentum or dodgy numbers.
Would you be prepared to share some of your fraudulent time series data to test this?
Kind regards,
Peter Urbani
peterurbaniinfiniticapitalcom
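
For readers who want to try the Hurst idea, here is a minimal sketch. The post does not say which estimator is used, so this assumes classic rescaled-range (R/S) analysis, which is only one of several options. The function name `hurst_rs`, the window sizes, and the synthetic test series are all illustrative assumptions, not anything from the post.

```python
import math
import random


def hurst_rs(series, min_window=8):
    """Rough Hurst exponent estimate via rescaled-range (R/S) analysis.

    Assumption: plain R/S with doubling window sizes; small-sample bias
    corrections are deliberately left out of this sketch.
    """
    n = len(series)
    log_sizes, log_rs = [], []
    size = min_window
    while size <= n // 2:
        rs_values = []
        # Split the series into non-overlapping chunks of this size.
        for start in range(0, n - size + 1, size):
            chunk = series[start:start + size]
            mean = sum(chunk) / size
            cum = lo = hi = ss = 0.0
            for x in chunk:
                dev = x - mean
                cum += dev                      # running sum of deviations
                lo, hi = min(lo, cum), max(hi, cum)
                ss += dev * dev
            sd = math.sqrt(ss / size)
            if sd > 0:
                rs_values.append((hi - lo) / sd)  # range rescaled by std dev
        if rs_values:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(sum(rs_values) / len(rs_values)))
        size *= 2
    # Least-squares slope of log(R/S) against log(window size).
    mx = sum(log_sizes) / len(log_sizes)
    my = sum(log_rs) / len(log_rs)
    num = sum((x - mx) * (y - my) for x, y in zip(log_sizes, log_rs))
    den = sum((x - mx) ** 2 for x in log_sizes)
    return num / den


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]  # uncorrelated synthetic "returns"
trend, level = [], 0.0
for x in noise:
    level += x
    trend.append(level)                             # strongly persistent series

h_noise = hurst_rs(noise)
h_trend = hurst_rs(trend)
print(f"noise: {h_noise:.2f}  persistent: {h_trend:.2f}")
```

On real fund data the series are much shorter (monthly returns), so the estimate will be noisy; the 0.55 to 0.68 band quoted above is the poster's rule of thumb, not something this sketch validates.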





Nonius

Founding Member Nonius Unbound

Total Posts: 11742 
Joined: Mar 2004 


Hi Peter,
That sounds interesting. My method computes fraud probabilities conditioned on a certain statistic related to returns. It appears to work well in all fraud cases we have in our DB, including all Madoff feeders.
I will look into how I could share data with you, perhaps with no names attached and mixed with returns not known to be fraudulent. 
Location: somewhere in CfA2 Great Wall. 


krott


Total Posts: 3 
Joined: Mar 2009 


I'm doing factor analysis on time series. Would be fun to see how it works on your data. Have returns from a few "interesting" funds that I can share too. Email me phynancenewsalphacom 




purbani


Total Posts: 82 
Joined: Apr 2005 


Hi Nonius
Sure, a no-names basis would be fine, as a blind test is generally better.
Kind regards,
Peter 




Added Sentry to our investable universe and the fund got the highest score.






What is your opinion of Evolution Capital Management? 



Nonius

Founding Member Nonius Unbound

Total Posts: 11742 
Joined: Mar 2004 


Sentry, Kingate, and all the other Mad feeders are at the top of mine along with Bayou, Wood River, Manhattan, and Valhalla. 
Location: somewhere in CfA2 Great Wall. 



aaron


Total Posts: 746 
Joined: Mar 2006 


Has anyone looked at applying Benford's Law? It states that the first nonzero digit of a random number should be distributed such that the frequency of k is Log(k+1) - Log(k), where Log means base-10 logarithm (or whatever base you're writing the number in). It's one of those results I find really satisfying. Obviously it depends on some assumptions about what "random" means, but it works in a lot of empirical tests.
When people make numbers up, they tend to use digits more uniformly than Benford's distribution. In fact, they tend to like 9's and 7's, which are infrequent in numbers derived from measurement.
Here are the theoretical frequencies from Benford's Law:
Digit  Frequency
1      0.3010
2      0.1761
3      0.1249
4      0.0969
5      0.0792
6      0.0669
7      0.0580
8      0.0512
9      0.0458
If the range of numbers is less than a few orders of magnitude, you can still apply the law with adjustments. 
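
The table above follows directly from the formula. A minimal sketch that reproduces it; the function name `benford_frequencies` and the `base` parameter are illustrative choices, not from the post:

```python
import math


def benford_frequencies(base=10):
    """Benford frequency of each leading digit k: log_b(k+1) - log_b(k)."""
    # log_b(k+1) - log_b(k) equals ln(1 + 1/k) / ln(b).
    return {k: math.log(1 + 1 / k) / math.log(base) for k in range(1, base)}


freqs = benford_frequencies()
for digit, freq in freqs.items():
    print(digit, f"{freq:.4f}")
```

The frequencies telescope to exactly 1 across the digits 1 to 9, which is a quick sanity check on any hand-typed table.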



jslade


Total Posts: 899 
Joined: Feb 2007 


I heard about someone running Benford's law on Bernie's returns early in the scandal.
Found the monthly returns data here: nakedshorts.typepad.com/files/madoff_fairfieldsentry3x.pdf
It kind of passes the sniff test:
Number of occurrences of 1:9 as MSD: 95 27 19 12 11 14 14 15 9
Frequency: 0.4398 0.125 0.088 0.0556 0.0509 0.0648 0.0648 0.0694 0.0417
(yeah, I posted something dumb before; sorry)

"Learning, n. The kind of ignorance distinguishing the studious." 



 

Lapin


Total Posts: 259 
Joined: Feb 2006 


My results.
I took the returns from their website http://www.zenithresources.com/performance_iop.php
First Digit  Frequency  %       Theory
0            0
1            38         34.86%  30.10%
2            15         13.76%  17.60%
3            7          6.42%   12.50%
4            13         11.93%  9.70%
5            4          3.67%   7.90%
6            7          6.42%   6.70%
7            10         9.17%   5.80%
8            11         10.09%  5.10%
9            4          3.67%   4.60%
Sum          109


[Edit: Typo in the results] 





You may also want to look at:
- Conditional serial correlation (resulting from return smoothing)
- Conflicts of interest (see here: http://www.adviserinfo.sec.gov/IAPD/Content/Search/iapd_OrgSearch.aspx)
I'd also be very interested in looking at the data.




aaron


Total Posts: 746 
Joined: Mar 2006 


"Sniff test"? I'm a chi-square man, I sniff at your sniff test.
Using your numbers, it fails with 0.0007 significance. But you have to adjust for the range of returns (Benford's only works if they cover a few orders of magnitude). The table below shows what the adjusted frequency should be and what my count of the actual first digits is (mine differ somewhat from yours).
Digit  Expected  Observed
1      66        84
2      66        29
3      28        20
4      12        13
5      12        11
6      9         17
7      7         14
8      7         15
9      7         11
This completely fails the chi-square test (significance = 1.5*10^-9). However, there are a lot of things that can cause violations of Benford's rule. It's most suspicious when the first digits are nearly uniform. You can sort of see this here if you ignore the 1's: the other number counts are far too even.
Zenith has a wider range, so the unadjusted Benford frequencies are reasonable. Its chi-square has a significance of 0.048, which fails at the 5% level, but given the assumptions necessary for Benford, that can be considered a strong pass. You can be pretty confident the numbers were not made up digit by digit, unless it was done by someone who knew about the Benford effect. That doesn't mean the numbers are correct: someone could have used a random number generator, or made them up in one set of units and transformed them, or produced this distribution in some other way.
Digit  Expected  Observed
1      33        38
2      19        15
3      14        7
4      11        13
5      9         4
6      7         7
7      6         10
8      6         11
9      5         4
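
For anyone who wants to repeat the Zenith check, here is a minimal sketch of a chi-square goodness-of-fit test against unadjusted Benford frequencies, using the first-digit counts posted earlier in the thread (38, 15, 7, 13, 4, 7, 10, 11, 4). It assumes 8 degrees of freedom (9 digits minus 1) and uses the closed-form tail probability that exists for even degrees of freedom, so it needs no external library. The function names are illustrative.

```python
import math


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_even_df(stat, df):
    """Upper-tail probability of a chi-square variable, valid for even df only.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form used here needs a positive even df")
    half = stat / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# First-digit counts for digits 1..9 as posted for the Zenith returns.
observed = [38, 15, 7, 13, 4, 7, 10, 11, 4]
n = sum(observed)
# Unadjusted Benford expectation: n * log10(1 + 1/k).
expected = [n * math.log10(1 + 1 / k) for k in range(1, 10)]

stat = chi_square_stat(observed, expected)
p_value = chi_square_sf_even_df(stat, df=len(observed) - 1)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

With these counts the p-value should land close to the 0.048 quoted above. For the Fairfield Sentry table, swap in the adjusted expected counts rather than the raw Benford ones.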





jslade


Total Posts: 899 
Joined: Feb 2007 


Man, my nose sucks. First off, I forgot to (abs) my returns: when I do, I get the same count numbers you guys do. I admit (re: chisq), I was just eyeballing the distribution for lots of 9's and not enough 1's. My copy of R is on the fritz for some reason.
For what it is worth, when you truncate the Fairfield Sentry series to be (about, I'm not being careful here) the same length as the zenith data, you get something more similar to the Zenith counts: 36 11 12 7 9 11 8 10 7
BTW, where did you get the significant digit adjusted Benford expectation from? My wakipedia level knowledge of Benford doesn't include this.
Might be fun to run chi-squared on a rolling window of the Fairfield Sentry returns to see if there is some obvious shift in where he pulled the numbers from, but it's time for weekend merriment.
Might also be fun to know why the Zenith and Fairfield Sentry returns are so different. Notice that? They don't line up at all. Not even the signs are the same.
Edit add: oh, duh: I didn't realize Zenith ~= Bernie. My apologies for blathering so. Still might be neat to run the rolling window on FFSentry looking for changepoints. Or maybe I can jigger some kind of Kalman thingee. 
"Learning, n. The kind of ignorance distinguishing the studious." 


aaron


Total Posts: 746 
Joined: Mar 2006 


Benford combines a real insight, which is useful for practical work, and a "gotcha" that's a lot of fun but gets overanalyzed into nonsense. The law was discovered empirically and given some math later, then given too much math (including a silly "proof").
The insight is that the distribution of the leading digit of random numbers is far from uniform except in very special cases, such as data uniformly distributed between 1 and 10. For example, the table below gives the first digit distribution of the standard Normal, followed by the Benford frequency, log_10(k+1) - log_10(k). You get the standard Normal frequency just by adding up the ranges (there are an infinite number of course, but almost all the probability is contained in forty or so). That's how you do an adjusted Benford: guess a distribution, fit the parameters, break it into all first-digit ranges with probability of more than, say, 0.000001, and add them up.
Digit  Std Normal  Benford
1      0.3595      0.3010
2      0.1290      0.1761
3      0.0865      0.1249
4      0.0810      0.0969
5      0.0774      0.0792
6      0.0734      0.0669
7      0.0691      0.0580
8      0.0644      0.0512
9      0.0596      0.0458
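
The "adding up the ranges" recipe can be sketched as follows for a Normal distribution. This is only an illustration of the procedure described above, assuming the absolute value of the variable is what gets digit-counted. The function name and the decade cutoffs `k_min` and `k_max` are arbitrary choices made here.

```python
import math


def first_digit_probs_abs_normal(mean=0.0, sd=1.0, k_min=-12, k_max=3):
    """First-digit distribution of |X| for X ~ Normal(mean, sd).

    For each digit d, sum P(d*10^k <= |X| < (d+1)*10^k) over decades k.
    """
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

    probs = {}
    for d in range(1, 10):
        total = 0.0
        for k in range(k_min, k_max + 1):
            lo, hi = d * 10.0 ** k, (d + 1) * 10.0 ** k
            # |X| in [lo, hi) means X is in [lo, hi) or in (-hi, -lo].
            total += (cdf(hi) - cdf(lo)) + (cdf(-lo) - cdf(-hi))
        probs[d] = total
    return probs


std_normal = first_digit_probs_abs_normal()
for digit, prob in std_normal.items():
    print(digit, f"{prob:.4f}")
```

For fund returns you would fit `mean` and `sd` (or a different distribution altogether) to the data first, then multiply the probabilities by the number of observations to get adjusted expected counts.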
The "gotcha" part is what gets you the elegant log_10(k+1) - log_10(k). Most people with enough quant skills to understand the question are likely to say you need to know the distribution of a random variable in order to predict the distribution of its first digit. They are surprised to learn empirically that the first digit shapes are pretty consistent for most types of data that span a few orders of magnitude; in fact only very special underlying distributions produce strong divergences from the pattern.
Suppose you count the first digits of all integers from 1 to N. If N has the form 10^k - 1, then the distribution is uniform. However, as N increases from 10^k to 10^(k+1) - 1, we first add 10^k 1's, then 10^k 2's and so on. Therefore, if we pick N at random, it's clear that the distribution of first digits has to be skewed to the left.
Now consider picking an integer from N1 to N2. As we go from N1 to the first 10^k - 1, we will have a distribution of first digits skewed to the right, using the reverse of the argument above. But the counts will be orders of magnitude smaller from N1 to the first 10^k - 1, than from the last 10^k to N2. So as long as the data span several orders of magnitude, we expect strong skewness to the left.
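
The integer-counting argument is easy to check by brute force. A small sketch (the helper name `first_digit_counts` is made up here):

```python
from collections import Counter


def first_digit_counts(n):
    """Count leading digits of the integers 1..n."""
    return Counter(int(str(i)[0]) for i in range(1, n + 1))


uniform = first_digit_counts(999)    # N of the form 10^k - 1
skewed = first_digit_counts(1999)    # N partway through the next decade

print(sorted(uniform.items()))
print(sorted(skewed.items()))
```

At N = 999 every digit appears equally often; by N = 1999 the extra thousand numbers have all gone to the digit 1, which is the skew described above.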
Now suppose I take a continuous measurement of something with a unimodal distribution. There is some range of 10^k to 10^(k+1) - 1 that is most probable. Within this range, any digit could be most frequent. For the standard Normal, for example, the most probable range is 0.1 to 1.0, which has 60% (I'm taking absolute values since the sign doesn't matter) of the probability mass. Within this range 1 is the most likely digit, and 9 the least. But if I used a Normal with mean of 0.5 and standard deviation of 1, 0.1 to 1.0 is still the most probable range, but now 4 is the most common digit within the range.
Measurements larger than the most probable range are more likely to be 1's than 9's, because the distribution is unimodal, so probabilities are decreasing away from the most common range. Measurements less than the most probable range are more likely to be 9's than 1's. However, just as with the integer counts, the first effect is likely to dominate. The range immediately above the most probable range is 10 times as big, while the range immediately below it is one-tenth the size. Therefore the distribution has much more chance to decay between the 1 above the range and the 9 above the range than from the 9 below the range to the 1 below the range.
The real reason this is true is people pick scales and units of measurement to force it. If we measured outside temperature in degrees Kelvin, the natural scale, the leading digit would almost always be 2, so we subtract 273 arbitrarily to get a variety of leading digits. If we measured sound power it would span so many orders of magnitude we would need scientific notation to preserve the significant digits. So we use decibels, meaning we convert to a logarithmic scale. If something has multiple modes, we tend to redefine it with things like seasonal adjustments or deviations from subgroup mean.
In my opinion the real insight of Benford's law is we measure things in a way that makes it true. 




kronon


Total Posts: 176 
Joined: Nov 2007 


You didn't need Benford's law to tell you that Madoff's returns were a pile of crap.
I'd hate to see the idiots who invested (especially the 'experts') use this as an excuse. I can hear it now: "you needed a highly sophisticated mathematical decoding operation to determine if the returns were fraudulent or not, and even then it was not clear..."
But for the less obvious BS it is a neat test.
I'm kind of worried now that in addition to all the tedious Madoff-induced questions we'll now get "Please explain your Benford law 0.051 p-value". 
"Some fool in a laboratory might blow up the universe unawares" Rutherford, 1903. 


Nonius

Founding Member Nonius Unbound

Total Posts: 11742 
Joined: Mar 2004 


Indeed. And, in fact, it would be nice to have a detector that works on insider-trading-generated returns as well as fabricated returns.
aaron, I like Benford and will look into it, but it's not that intuitive and works only in certain cases. 
Location: somewhere in CfA2 Great Wall. 



krott


Total Posts: 3 
Joined: Mar 2009 


This reminds me of the Russian election fraud story.
Now the regions that were out of line got better, and reported "normal" numbers. Small problem, the numbers were EXACTLY the same for all of them:
Check the tables at the bottom :) 



jslade


Total Posts: 899 
Joined: Feb 2007 


Aaron: thanks! In case anyone else is screwing around with Benford, I found this neat page with useful R functions in it:
http://www.stat.auckland.ac.nz/~fewster/benford/index.html
I also found the dude who ran Benford on Bernie when the scandal broke. Looks like he got something similar, but also didn't test for goodness of fit.
Finally, I modded up the code I found there so it works on returns (it removes the 0's and takes the absolute value), and included the data sets mentioned above, in case anyone else wants to screw around with it. Maybe it will be helpful to someone; no guarantees she did it right. Fun stuff to fiddle with anyway. Run funddata.plot() to see the Zenith and Fairfield results.
Attached File: jsladeBenfordGood.zip 
"Learning, n. The kind of ignorance distinguishing the studious." 




You can do this easily in a spreadsheet too. If a return is in cell A2, use =TEXT(ABS(A2),"0.00E+00") to coerce returns to text, then use the LEFT function to grab the first digit. COUNTIF then gets your distribution. 
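
The same trick outside a spreadsheet, as a rough Python sketch: format the absolute return in scientific notation and take the first character. Dropping exact zeros follows the approach mentioned earlier in the thread; the sample returns are made up purely for illustration.

```python
from collections import Counter


def first_digit(x):
    """Leading digit of |x|, mirroring TEXT(ABS(x),"0.00E+00") then LEFT(..., 1)."""
    return int(f"{abs(x):.2e}"[0])


def first_digit_distribution(returns):
    """Tally leading digits, skipping exact zeros (which have no leading digit)."""
    return Counter(first_digit(r) for r in returns if r != 0)


# Made-up monthly returns, for illustration only.
sample = [0.0123, -0.0045, 0.0187, 0.0, -0.0102, 0.0091]
dist = first_digit_distribution(sample)
print(sorted(dist.items()))
```

As in the spreadsheet version, the number is rounded to three significant figures before the digit is read off, so a value like 0.09996 counts as a leading 1.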



jslade


Total Posts: 899 
Joined: Feb 2007 


Spreadsheets? I should really learn how one day.
Meanwhile I have been screwing around with bias ratios for phraud detection. Bernie fails (as does Zenith), though so do some of whatever EDHEC uses for their strategy returns in the fEcofin package. Compared to some market neutral equity funds (which are all 1-2 in bias ratio), Bernie fails a lot. It's not a great detector, as it's really just based on some intuitions of how fraudsters work, but it's an extra data point to look at.
Might be useful to someone, so here, have some scripts: Attached File: biasratio.zip 
"Learning, n. The kind of ignorance distinguishing the studious." 



jslade


Total Posts: 899 
Joined: Feb 2007 


This perhaps should be an edit-add, but I have looked at the bias ratio for some other options shops' returns since Saturday, and it looks like Bernie would fit in nicely with the lot of them. Zenith has a high bias ratio because they short vol, which gives a smooth series of returns if you do it right (someone please correct me if I'm wrong; it ain't no random walk anyway, the way equities are). In a short vol strat, when you do get negative returns, they are likely to be several standard deviations out. Zenith is also high because they are good. Medallion would very likely "fail" this fraud detector also, because they only have a few down months. So, bias ratio is really just a dumb performance measure of the centroid of returns around zero. It might be useful as a component of a fraud detector, but it ain't good by itself, unless you're limiting your search to a sector which makes bets on random walks.
Funny, RiskData got the Financial Times to trumpet their new fraud detector (aka bias ratio) to the skies as something which will make all your whites whiter. I'll never believe anything I read there again. If they're just reissuing marketing crap, I may as well read the Weekly World News.
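
For reference, a minimal sketch of a bias ratio calculation. The thread never spells out the formula, so this assumes the commonly quoted definition: the count of returns in [0, +1 standard deviation] divided by one plus the count in [-1 standard deviation, 0). The use of the sample standard deviation and the sample data are also assumptions made for illustration. This is not the code from the attached scripts.

```python
import statistics


def bias_ratio(returns):
    """Bias ratio under the assumed definition:
    #(0 <= r <= sd) / (1 + #(-sd <= r < 0)), with sd the sample std dev.
    """
    sd = statistics.stdev(returns)
    small_gains = sum(1 for r in returns if 0 <= r <= sd)
    small_losses = sum(1 for r in returns if -sd <= r < 0)
    return small_gains / (1 + small_losses)


# Made-up returns (in percent): mostly small gains, one small loss, one big loss.
sample = [0.5, 0.6, 0.4, 0.7, -0.3, -2.0]
ratio = bias_ratio(sample)
print(ratio)
```

The made-up series mimics the short-vol pattern described above: many small gains and a rare large loss that falls outside the one-sigma band, which pushes the ratio up without any fraud involved.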

"Learning, n. The kind of ignorance distinguishing the studious." 


purbani


Total Posts: 82 
Joined: Apr 2005 


Hi jslade - generally agree with your comments. I have been looking into it also, with a view to adding it to our Infiniti Analytics Suite (IAS) package. It's obviously better than nothing, but the main problem I have with it is that, once again, it seems to be predicated on the assumption of 'normality'. As such it will flag legitimately positively skewed funds as well as the dodgy ones. We are looking into it a bit further to see if we can come up with a better version, but no cigar for infallible quantitative detection of fraud yet. Mark one eyeball, a site visit and some hard questions are still the order of the day. 



