Mistro


Total Posts: 21 
Joined: Aug 2018 


Hello,
I have created a binary-factor-based model for predicting earnings volatility, and I would like some help understanding the model a bit better.
My model is based on Aaron Brown's sports betting system. Here is the link to the article: https://storage.googleapis.com/wzukusers/user28782334/documents/595bf1eb3454dT5FzugN/NFL%20Demonstration.pdf
What he does is add a few weak binary predictors together to create a system that makes money.
For my research, I am using ORATS and Bloomberg data. The original data set was very large. After cleaning it, I was left with ~600 unique equities and ~9000 earnings events over the past 4 years. My dependent variable (DV) is: ((Straddle Price After Event - Straddle Price Before Event) / Straddle Price Before Event) * -1. I am multiplying by -1 because I would like the DV to be as if I were short vol through the events.
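To make the DV concrete, here is a minimal sketch of the formula above. The function name and the example prices are hypothetical and are not taken from the ORATS or Bloomberg data.

```python
def short_straddle_return(price_before: float, price_after: float) -> float:
    """Fractional return of being short the straddle through the event.

    Implements ((after - before) / before) * -1, so a straddle that loses
    value after earnings gives a positive number.
    """
    if price_before <= 0:
        raise ValueError("price_before must be positive")
    return -1.0 * (price_after - price_before) / price_before


# Hypothetical prices: straddle worth 10.00 before the event, 8.50 after.
example_return = short_straddle_return(10.0, 8.5)
```

A positive value means the short straddle made money; a negative value means the straddle gained value through the event.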
The first thing I did was find the median and mean PnL across all earnings events. This is the benchmark. The mean PnL was 3% and the median PnL was 7%. So the expected value was 3% per short straddle trade across all earnings events (before commissions and slippage).
Next I created a bunch of binary factors. For example: Near52WeekHigh. This variable tells you whether the stock is within 5% of its 52-week high. It is assigned a 1 if so, otherwise a 0.
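A factor like Near52WeekHigh could be coded roughly as follows. This is only a sketch: the function and parameter names are made up, and it assumes "within 5%" means the price is no more than 5% below the 52-week high.

```python
def near_52_week_high(price: float, high_52w: float, threshold: float = 0.05) -> int:
    """Return 1 if price is within `threshold` of the 52-week high, else 0.

    Assumption: "within 5%" is measured as a fraction of the 52-week high.
    """
    if high_52w <= 0:
        raise ValueError("high_52w must be positive")
    return 1 if price >= high_52w * (1.0 - threshold) else 0


# Hypothetical quotes against a 52-week high of 100.
flag_close = near_52_week_high(96.0, 100.0)   # 4% below the high
flag_far = near_52_week_high(80.0, 100.0)     # 20% below the high
```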
The way I tested each factor was by comparing the performance of the factor group vs. the non-factor group, year by year. If the factor group outperformed by a significant amount in more than 3 of the 4 years, the factor was added to my list of important factors.
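The year-by-year screen could look something like the sketch below. The event layout (dicts with `year`, `pnl`, and a 0/1 factor key) is an assumption, and so are `min_edge` and `min_years`: "a significant amount" is not pinned down above, so both are left as knobs rather than fixed rules.

```python
from collections import defaultdict
from statistics import mean


def yearly_factor_spread(events, factor):
    """Mean PnL of factor=1 events minus mean PnL of factor=0 events, per year."""
    buckets = defaultdict(lambda: {0: [], 1: []})
    for ev in events:
        buckets[ev["year"]][ev[factor]].append(ev["pnl"])
    spreads = {}
    for year, groups in sorted(buckets.items()):
        # Skip years where one of the two groups has no events.
        if groups[0] and groups[1]:
            spreads[year] = mean(groups[1]) - mean(groups[0])
    return spreads


def passes_screen(spreads, min_edge=0.02, min_years=3):
    """True if the factor group beat the rest by more than min_edge
    in at least min_years years (both thresholds are assumptions)."""
    winning_years = sum(1 for s in spreads.values() if s > min_edge)
    return winning_years >= min_years
```

With 4 years of data, `passes_screen(spreads, min_years=4)` would require the factor to win in every year.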
I have a question regarding creating binary factors (and factors in general). Does it make sense to turn something like Market Cap or P/E into a binary factor? I.e., find the median Market Cap and create a factor: 1 if greater than the median and 0 if below?
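For reference, the median split described in the question could be sketched like this. The function name and the sample values are hypothetical; values exactly equal to the median are assigned 0 here, which is an arbitrary choice.

```python
from statistics import median


def median_split(values):
    """Binarize a continuous variable: 1 if above the median, else 0.

    Returns the list of 0/1 flags and the cutoff that was used.
    """
    cutoff = median(values)
    flags = [1 if v > cutoff else 0 for v in values]
    return flags, cutoff


# Hypothetical market caps in billions of dollars.
flags, cutoff = median_split([1.2, 3.5, 8.0, 40.0, 250.0])
```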
After filtering the data set for 6 of the best factors, our mean return was 13% and the median return was 16%! However, our total number of trades was reduced from ~9000 to only 418. The filtered set outperformed during every single year starting in 2015.
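A rough sketch of the filtering step, assuming "filtering for the best factors" means keeping only events where every selected factor equals 1 (that reading is an assumption, as are the field names):

```python
from statistics import mean, median


def filter_and_summarize(events, factors):
    """Keep events where all listed factors are 1, then summarize PnL."""
    selected = [ev for ev in events if all(ev[f] == 1 for f in factors)]
    if not selected:
        return {"trades": 0, "mean": None, "median": None}
    pnls = [ev["pnl"] for ev in selected]
    return {"trades": len(pnls), "mean": mean(pnls), "median": median(pnls)}
```

Reporting the trade count next to the mean and median makes it easy to see how much of the sample each extra factor removes.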
If anyone could give me some feedback on how to create simple factors, that would be greatly appreciated. I think my next step would be to use a random forest to find the best node splits.
Thank you. 
men lie, women lie, numbers don't 

