Forums  > Pricing & Modelling  > smart processing for noisy option chain snapshots?  
     

Strange


Total Posts: 1434
Joined: Jun 2004
 
Posted: 2018-07-18 16:37
Help me brainstorm this, please. Let's assume that I have a noisy set of option prices for time T and underlying X (*). Unfortunately, it's a pretty safe assumption, since data from historical options vendors is pretty bad, seemingly regardless of the cost. I have datasets from several options data vendors and they are all equally bad. My task is to get the best possible implied forwards and implied volatility surface out of that set.

Problem 1. The underlying price is frequently snapped asynchronously vs. the option chain, so I cannot use it (I suspect this has to do with the underlying coming from a different vendor or on a separate feed).

Problem 2. Some of the option prices are snapped with a delay or are simply bad quotes, so they add a fair bit of noise. It's especially significant for underlying symbols that have a fairly sparse strike space.

Problem 3. American options on dividend-paying stocks make implying forwards a bit tricky. For European options, I can assume that each put/call pair has its own forward price and deal with noise at the level of implied volatility by fitting a reasonable model. For American options, the only way I can get an implied forward is by assuming that put/call parity approximately holds for the OTM calls, and I am finding that the resulting forwards are frequently all over the place.

So, I see a couple of hacky solutions, each one imperfect:

(1) Using implied forwards for problem 3. It is a "market-proper" solution: I can de-cab the far OTM calls and get the best forward using put deltas as the weights (see the sketch after this list). The problem is that if the forward is bad, the whole slice goes bad.

(1a) Then I'd use regular BS to imply vols, excluding any ITM or ATM(ish) calls. Problem: this throws away the ITM call data.

(1b) Use some form of American approximation to get vols. Problem: if the forward is bad, this can produce some really bad vols.


(2) A "holistic" solution would be to imply a risk-neutral distribution from the tight call and put spread prices for each expiration (still excluding ITM and ATM(ish) calls). This, supposedly, gives me the best implied forward and the outlier cleaning all at once. One choice is to check it against a normal distribution and clean it that way (either by regression or some form of smoothing); alternatively, I can try to fit something like a G/C distribution or a simple polynomial. The problem with this approach is that it can produce wildly warped results (IMHO) when it's run in batch. I'd then drop bad prices based on the fit, and the mean of the distribution becomes my implied forward.

(2a/b) Same as 1a and 1b, with the same problems.
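
To make (1) concrete, here's the kind of thing I have in mind. The parity relation C - P = D(F - K) is standard (and only approximate for American options), but the weighting scheme and all of the numbers below are just illustrative:

import numpy as np

def implied_forward(strikes, call_mids, put_mids, discount, put_deltas=None):
    # Per-pair forwards from put/call parity, C - P = D * (F - K),
    # i.e. F_K = K + (C - P) / D, combined into one estimate with a
    # weighted average to damp noisy quotes. Weighting by |put delta|
    # is one choice; uniform weights are the fallback here.
    K = np.asarray(strikes, dtype=float)
    F_k = K + (np.asarray(call_mids) - np.asarray(put_mids)) / discount
    w = np.ones_like(F_k) if put_deltas is None else np.abs(np.asarray(put_deltas))
    return np.average(F_k, weights=w)

# Illustrative numbers only: a ~100.25 forward recovered from mids.
K = [90.0, 95.0, 100.0, 105.0, 110.0]
C = [11.20, 7.10, 3.90, 1.80, 0.70]
P = [0.95, 1.85, 3.65, 6.55, 10.45]
print(implied_forward(K, C, P, discount=0.998))

A robust location estimate (a median, or some deviation-based re-weighting) instead of the plain average would also protect against the odd bad pair.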

So far I am leaning towards (2), especially for the hard cases. Any deep thoughts from the community?



I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

c10


Total Posts: 9
Joined: Jul 2015
 
Posted: 2018-07-22 06:28
These are tough and interesting problems. Quants, especially at OMMs, have thought about them for many years. The quality of their answers varies widely.

I can't explain what I do about all these issues right now, but if you send me some samples of your noisy snapshots of bids and asks (and dividend projections, if needed and available), I'll send you back our implied spot & borrows (for American exercise style, i.e. stocks and ETFs) or forwards (for European, i.e. indices), market implied vols, and fitted vol curves for each term.

Each output, whether borrow, forward, implied vol, or fitted vol-curve parameter, will come with an error bar.

Maybe this is helpful to you.

schmitty


Total Posts: 55
Joined: Jun 2006
 
Posted: 2018-07-22 07:27
I think the answer is (2), with triage. Vendor snapshots are often so bad, especially in less liquid names, that you really won't be able to fit usable per-expiry curves, let alone entire surfaces, for much of the US American-style (single-name) universe. Just cut back your expectations and model only the most liquid 1000-odd names with usable snapshot data.

You might get a better contemporaneous underlying bid/ask by pulling from your firm's archived historical tick/message feed.

When fitting your RNDs, obviously move arbs-at-mid (common in vendor historical snapshots) to no-arb levels first. If G/C means Gram-Charlier, be careful: many of the parameter sets you fit won't define a proper density (i.e. one that integrates to one and is nowhere negative), so you'll have to constrain the fit. Doing Breeden-Litzenberger first and then fitting on that is more robust than a direct fit (a minimal sketch follows). For names with upcoming earnings, a Gaussian mixture of three components fits well (no need for G/C or another fat-tailed distro), but you'll need a good initial guess to make the EM fit of the mixture model consistent.
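
To illustrate the B-L step, a minimal sketch: it assumes arb-free call mids on a uniform strike grid and a known discount factor, and every name in it is made up.

import numpy as np

def bl_density(strikes, call_mids, discount):
    # Breeden-Litzenberger: q(K) = (1/D) * d2C/dK2, here with central
    # second differences on a uniform strike grid of spacing dK.
    K = np.asarray(strikes, dtype=float)
    C = np.asarray(call_mids, dtype=float)
    dK = K[1] - K[0]
    q = (C[:-2] - 2.0 * C[1:-1] + C[2:]) / (dK * dK * discount)
    return K[1:-1], np.clip(q, 0.0, None)  # clip small negative lobes left by noise

And for the earnings case, the mixture can also be fit directly on call prices, since each normal component has a closed-form call value; that sidesteps EM entirely. This least-squares sketch, its parameterization, and the initial guess are all just one illustrative choice:

import numpy as np
from scipy.optimize import least_squares
from scipy.stats import norm

def mixture_call(K, w, m, s, discount):
    # C(K) = D * sum_i w_i * [(m_i - K) * Phi(d_i) + s_i * phi(d_i)],
    # with d_i = (m_i - K) / s_i, for S_T ~ sum_i w_i * N(m_i, s_i^2).
    K = np.atleast_1d(np.asarray(K, dtype=float))
    d = (m[:, None] - K[None, :]) / s[:, None]
    comp = (m[:, None] - K[None, :]) * norm.cdf(d) + s[:, None] * norm.pdf(d)
    return discount * (w[:, None] * comp).sum(axis=0)

def fit_earnings_mixture(strikes, call_mids, fwd_guess, discount):
    # Nine free parameters: softmax weights, three means, three log-stdevs.
    # The initial guess (one tight mode at the forward, two displaced wider
    # modes) is the part that needs care.
    strikes = np.asarray(strikes, dtype=float)
    call_mids = np.asarray(call_mids, dtype=float)
    def resid(p):
        a, m, log_s = p[:3], p[3:6], p[6:]
        w = np.exp(a) / np.exp(a).sum()  # positive weights summing to one
        return mixture_call(strikes, w, m, np.exp(log_s), discount) - call_mids
    x0 = np.concatenate([
        np.zeros(3),                                       # equal weights
        fwd_guess * np.array([1.00, 0.93, 1.07]),          # flat / down / up modes
        np.log(fwd_guess * np.array([0.03, 0.06, 0.06])),  # component std devs
    ])
    p = least_squares(resid, x0).x
    return np.exp(p[:3]) / np.exp(p[:3]).sum(), p[3:6], np.exp(p[6:])

The implied forward then falls out as sum_i w_i * m_i, and outliers show up as quotes far from the fitted prices.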

This is, overall, a difficult problem.

Strange


Total Posts: 1434
Joined: Jun 2004
 
Posted: 2018-07-22 08:01
@c10 I actually sent an email about a trial for the vol library a little while ago. You know, if you have the technology to do this, you should consider selling datasets or making it a service with a value-add fee (send us your data, we will fit it). I'll be your first subscriber :)

@schmitty To be honest, I'd love to get a good fit for the top 100-200 names, including the ETFs, to start with. A thousand names would be a birthday present :)

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

c10


Total Posts: 9
Joined: Jul 2015
 
Posted: 2018-07-22 23:23
@strange: Sorry, your email seems to have disappeared into the ether. I sent one to the address in your profile. If that also somehow disappears, please try info@VolaDynamics.com again (it usually works, as far as we know... ;-)

HankScorpio


Total Posts: 477
Joined: Mar 2007
 
Posted: 2018-07-24 13:56
My apologies for crashing this thread, but on a related note, can anyone recommend some vendors of historical (daily, at least 10 years) implied vol surface data for:
(i) the major developed-market equity indexes; specifically SPX, STOXX, FTSE-100, AS51, Nikkei-225;
(ii) FX: AUDUSD, EURUSD, GBPUSD, JPYUSD.

If we have to buy option data and back these out ourselves, then we'll do it, but I'm trying to avoid that if possible.

Thanks a lot.

Strange


Total Posts: 1434
Joined: Jun 2004
 
Posted: 2018-07-24 17:47
Bloomberg data is not good enough?

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

bs2167


Total Posts: 20
Joined: Feb 2010
 
Posted: 2018-07-25 16:48
@strange - have you looked at the smoothed data set offered by ORATS on Quandl? I have no experience with it (and am so lazy that I didn't take the time to read about their process), but it could potentially make your job a bit easier (assuming you don't find their methodology objectionable).

HankScorpio


Total Posts: 477
Joined: Mar 2007
 
Posted: 2018-07-26 12:41
@strange: The boss wants me to look for other providers, but it might just have to be Bloomberg to start with.

Strange


Total Posts: 1434
Joined: Jun 2004
 
Posted: 2018-07-27 05:07
@bs2167 - I have not, to be honest, though my expectation would be that it's just as sh*tty as the rest of them (of course, the sample data would be clean as a whistle, but the rest...)

@HankScorpio - ah, I see. Bloomberg also has vol surface data that's more detailed, but it's not cheap and kinda crappy (well, in line with the rest of the vendors). For indices it would be OK, though.

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

tbretagn


Total Posts: 260
Joined: Oct 2004
 
Posted: 2018-08-02 12:59
@HankScorpio I have used bank data; JPM has good stuff. Bloomberg is good enough for FX, I reckon.

And even if it is not true, one must believe in ancient history


HankScorpio


Total Posts: 477
Joined: Mar 2007
 
Posted: 2018-08-03 12:55
@tbretagn: Thank you.

Strange


Total Posts: 1434
Joined: Jun 2004
 
Posted: 2018-08-04 16:37
@tbretagn JPM data was for equity products? Did you have to jump through hoops to get it?

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

tbretagn


Total Posts: 260
Joined: Oct 2004
 
Posted: 2018-08-06 09:18
@Strange I use JPM data for equity index products and Citi data for FX (they have less equity data).
The swaption data is OK too.
Regarding hoops, well, it depends on your relationships (it seems you need a decent sales credit to get access to the full suite, but I'm sure that can be arranged).

And even if it is not true, one must believe in ancient history

Strange


Total Posts: 1434
Joined: Jun 2004
 
Posted: 2018-08-07 05:27
@tbretagn If I can get good data, I'll engage in any sort of relationship!

PS. Sent you an email


I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

Strange


Total Posts: 1434
Joined: Jun 2004
 
Posted: 2018-08-11 19:35
I had a chance to look at some vol data from dealers. Obviously, that's the way to go, but they want an arm and a leg in return.

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

EspressoLover


Total Posts: 330
Joined: Jan 2015
 
Posted: 2018-08-12 00:31
Not an options guy, but I do have a lot of experience with the pitfalls of unreliable data. Let me throw out an alternative: have you considered just buying raw book data directly from the exchange? Erroneous prices or asynchronous updates are very rarely an issue with exchange book data. Even between venues, the clocks are probably going to be synchronized to within a few milliseconds.

Obviously this is a lot more expensive than buying repackaged vendor data. But from a statistical learning standpoint, having a few months of clean data is often worth more than years of unreliable data. I would at least go through the exercise of calculating how much data you could get if you spent your budget on book data from the exchanges. Even if you decide against it, sometimes just getting a small sliver of exchange data is worth it to benchmark vendor data against.

Good questions outrank easy answers. -Paul Samuelson

Strange


Total Posts: 1434
Joined: Jun 2004
 
Posted: 2018-08-12 06:57
@EspressoLover, I considered purchasing tick/book data initially and decided against it. It's not the cost; it's the sheer amount of data that would be a bitch to handle.

Recently, I have thought about the problem a fair bit, since it's kinda preventing me from developing some strategies. I think the right answer might be to go with some approximation/cleaning solutions for the initial testing, which I am working on (might outsource it, actually, since it's a self-contained project). Later, if I think there is alpha in my approaches, there are several vendors that provide pre-fitted volatility surfaces (vols/divs/borrows) for single names and ETFs, averaged from several dealers; I might go that route.

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

ronin


Total Posts: 326
Joined: May 2006
 
Posted: 2018-09-04 10:06
@strange, I think you have two different problems here.

Cleaning up options data that may or may not be asynchronous with the underlying isn't too difficult. If you are talking about up to a few minutes of asynchronicity, you are looking at a tick or two of price uncertainty in the underlying. That translates into a tiny uncertainty in the implied vol, nothing that a decent vol surface interpolator can't handle. You will probably find that you are within the bid/ask at all times.
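
Back of the envelope, with made-up but representative numbers:

# A mis-snapped underlying moves the option price by ~delta * dS, which
# maps to an implied-vol error of ~delta * dS / vega. Illustrative figures
# for an ATM-ish one-month option on a $100 stock:
spot_err = 0.02   # two cents of asynchronous drift
delta = 0.5
vega = 0.115      # option price change per 1.0 vol point
vol_err = delta * spot_err / vega
print(f"implied-vol uncertainty ~ {vol_err:.3f} vol points")  # ~0.087

That is well inside any realistic bid/ask in vol terms.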

On the other hand, it sounds like your entire reason for doing this is to extract some alpha from tick-by-tick mispricings. I don't see how that could possibly survive the sort of data cleaning that you want to do.

Exchange data is probably the way to go, if that is what you are trying to do. The volume of data is what you will be facing in trading anyway; you might as well worry about it now.

"There is a SIX am?" -- Arthur

Strange


Total Posts: 1434
Joined: Jun 2004
 
Posted: 2018-09-05 14:21
I am mostly working on relative risk-premium types of problems, so I don't really care about tick-by-tick mispricings. But even at these low-frequency levels, issues with things like borrow rates, missing quotes, and misaligned forwards are pretty real.

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'