Forums  > Pricing & Modelling  > Model-free option pricing by Reinforcement Learning  
     
Page 1 of 4. Go to page: [1], 2, 3, 4 Next
Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-04-21 17:51
Any opinions on this research?

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3087076


https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3102707

katastrofa


Total Posts: 453
Joined: Jul 2008
 
Posted: 2018-05-03 23:22
No numerical experiments testing convergence. A big problem with RL is that it's not data-efficient. He doesn't show how long it takes for his agent to converge to the optimal policy.

Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-05-04 13:41
Hmm, the second paper does show examples. Your second statement is wrong too: the efficiency of RL depends on how it is formulated. Deep RL is indeed very data-hungry, due to the astronomical number of parameters needed there, but there is no general statement like the one you make. Depending on the formulation, RL may or may not be data-efficient. The third comment is wrong too - see the second paper.

katastrofa


Total Posts: 453
Joined: Jul 2008
 
Posted: 2018-05-05 09:05
You have a point, I was thinking about Deep RL.

Re convergence: there are numerical experiments (which report the number of MC paths used), but they only test the model in the Black-Scholes world. This is not an interesting case, because we already know how to price options there. An interesting case would be learning to price an option by observing real market data. Then the question is: how long does it take for the agent to learn the market dynamics and price the option? This is not answered in the paper.

Maybe I have too high expectations about the method. If all it's useful for is the Black-Scholes world, it's not very useful.
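For reference, the Black-Scholes sanity check being discussed can be sketched as follows: simulate GBM paths under the risk-neutral measure, price a call by Monte Carlo, and compare to the closed form. All parameters below are made up for illustration; nothing here comes from the papers.

```python
import math
import random

random.seed(0)

# Hypothetical parameters: a 3-month at-the-money call.
s0, k, r, sigma, t = 100.0, 100.0, 0.02, 0.2, 0.25
n_paths = 200_000

# Monte Carlo under the risk-neutral measure (plain GBM terminal values).
payoffs = []
for _ in range(n_paths):
    z = random.gauss(0.0, 1.0)
    st = s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
    payoffs.append(max(st - k, 0.0))
mc_price = math.exp(-r * t) * sum(payoffs) / n_paths

# Closed-form Black-Scholes price for comparison.
def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
d2 = d1 - sigma * math.sqrt(t)
bs_price = s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

print(mc_price, bs_price)  # the two should agree to within MC error
```

Any learning method tested in this world can be held to the same yardstick: its price should converge to `bs_price` as the number of paths grows.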

Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-05-05 14:38
Agree, real data is the most interesting case. In machine learning, people have developed a good habit of trying models on synthetic data first - something I think is under-utilized in finance. As the Q-learning used in these papers is a model-free method, it should work with real data too, probably with a convergence speed similar to what is shown in the papers. Given the model independence, I think your high expectations are justified, but of course it would be interesting to see how it works on real data, especially in a portfolio setting - and to see whether it lets us forget the non-existent volatility smile problem as a bad dream of Wall Street :)

Strange


Total Posts: 1422
Joined: Jun 2004
 
Posted: 2018-05-05 16:13
People have priced options using neural networks, used all sorts of non-linear regressions and now this. It’s a waste of electrons, imho

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always how much?'

katastrofa


Total Posts: 453
Joined: Jul 2008
 
Posted: 2018-05-05 21:44
If you don't have a model for the stochastic process driving the stock prices, how are you going to run your Monte Carlo simulation?

london


Total Posts: 307
Joined: Apr 2005
 
Posted: 2018-05-09 19:35
A weakness of applying ML to return forecasting is the insufficient volume of (stationary) data, so I hear of people using synthetic data.

The part I don't understand is: how does one create synthetic data (to train on) without a model of the system?

But if I did have a _good enough_ model of the system, then why wouldn't I just use that model to trade?

This is a genuine question and I'm probably missing something blindingly obvious to many folks here. Thank you for a simple answer that my small mind might grasp.

Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-05-09 20:05
Q-learning is a model-free method - it does not use any distributional assumptions about returns. So you don't have to rack your brain anymore choosing between, say, Heston and whatever else - it is entirely irrelevant.

Now, to test the framework, you can use synthetic data generated from a known distribution and test your model/implementation. In that case, you know what to expect.

Hope this helps.
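To make the "test on synthetic data where you know the answer" idea concrete, here is a minimal sketch with a toy one-step decision problem rather than anything from the papers - the action labels and reward means below are invented, and the update is just the generic Q-learning rule:

```python
import random

random.seed(1)

# Two actions with known true expected rewards, so we can check that
# Q-learning's estimates converge to the right values - the whole point
# of testing on synthetic data first. Labels and numbers are hypothetical.
true_means = {"hedge_a": 0.3, "hedge_b": 0.7}
q = {a: 0.0 for a in true_means}
alpha = 0.01    # learning rate
epsilon = 0.1   # exploration probability

for step in range(50_000):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        a = random.choice(list(q))
    else:
        a = max(q, key=q.get)
    r = random.gauss(true_means[a], 1.0)  # noisy observed reward
    q[a] += alpha * (r - q[a])            # Q-update (one-step, terminal)

best = max(q, key=q.get)
print(q, best)
```

Because the true means are known, convergence (and its speed) can be measured directly - exactly the kind of check katastrofa was asking about.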

Strange


Total Posts: 1422
Joined: Jun 2004
 
Posted: 2018-05-10 01:00
> Q-Learning is a method that is model-free - it does not use any distributional assumptions about returns.
In that case, how is it better than simply using historical distributions for pricing? There are plenty of robust statistical methods that let you estimate distributions from historical returns and use them for pricing.

Not trying to be a luddite, but
(a) I just don't see any appeal in what these people are trying to do in real life
(b) I don't see any improvement over the other ML-based pricing methods that did not stick


Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-05-10 16:05
Sorry, but this does sound like luddite talk. What are "other ML-based pricing methods that did not stick"? Are you against ML in general? Do you think that linear regression is ML or not? What are your "robust statistical methods" that you think make other approaches obsolete, and how exactly do you use them for pricing?

katastrofa


Total Posts: 453
Joined: Jul 2008
 
Posted: 2018-05-10 16:52
"Q-Learning is a method that is model-free - it does not use any distributional assumptions about returns. So you don't have to break your head anymore choosing, say, between Heston and whatever else - it is entirely irrelevant."

1. How do you model the discount factor? The BSM model solves this for you by expressing everything in the risk-neutral measure. Your paper assumes the agent operates in the real-world measure, but then you can't just use the risk-free rate for discounting rewards.

2. The data-driven Q-learning version is described thus in the 1st paper: "The data available is given by a set of N trajectories for the underlying stock S_t (expressed as a function of X_t using Eq. (24)), hedge position a_t, instantaneous reward R_t, and the next-time value X_{t+1}".

a) In order to use the model, you need to know an unobservable quantity (some trader's hedge positions, i.e. their deltas). The agent is not learning just the market dynamics; it's also learning some other trader's model (their hedge positions).

b) X_t is derived from the stock prices S_t using some parameters mu and sigma. Where do you take their values from?

c) The model requires N such paths (where N is undoubtedly large, say 10,000 or more). In order to price a 3-month option, you need 3 x 10,000 months' worth of data. This is impossible.


If you want to build something worthwhile, solve this problem: an agent observes traded stock prices, traded bond prices and traded option prices. They have a given financing rate and can invest cash at some other rate. Then they learn a strategy to hedge a European option at a non-standard strike. The goal is to maximise the amount of cash in the bank after the option expires or is exercised.

Strange


Total Posts: 1422
Joined: Jun 2004
 
Posted: 2018-05-10 18:03
> What are "other ML-based pricing methods that did not stick"?
There were attempts to price options using neural nets and robust regressions. That was a fairly long time ago and it was more or less a waste of electrons. My expectation is that this method will be of equal value.

> Are you against ML in general?
No. In fact, I use some simple ML approaches in some of my strategies. I just do not believe that ML can be used to price convex instruments IRL, simply because of the dimensionality issues. @london said it far more eloquently.

> What are your "robust statistical methods" that you think make other approaches obsolete, and how exactly you use them for pricing?
LOL, what? Did I say anywhere that risk-neutral pricing is "obsolete"?

I was simply pointing out that if you really want a model-free approach to pricing an option, you can estimate the probability density using whatever your favorite method is (including KDE, if you like ML) and integrate your payoff against that density. That approach is far less data-hungry.
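A rough sketch of what that might look like, with synthetic stand-in "historical" returns (real data, drift/measure corrections and discounting are all deliberately glossed over here):

```python
import math
import random

random.seed(2)

# Hypothetical stand-in for a sample of historical 3-month log-returns.
returns = [random.gauss(0.0, 0.1) for _ in range(500)]

# Gaussian kernel density estimate with the Silverman rule-of-thumb bandwidth.
n = len(returns)
mean = sum(returns) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in returns) / (n - 1))
h = 1.06 * sd * n ** (-0.2)

def kde(x):
    return sum(
        math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in returns
    ) / (n * h * math.sqrt(2 * math.pi))

# "Multiply the payoff by the density": price an at-the-money call by
# numerically integrating the payoff against the estimated density.
s0 = k = 100.0
dx = 1.2 / 2000
grid = [-0.6 + i * dx for i in range(2001)]
price = sum(max(s0 * math.exp(x) - k, 0.0) * kde(x) * dx for x in grid)
print(price)
```

Since the stand-in returns are N(0, 0.1), the result can be checked against the known lognormal expectation (roughly 4.2 for these numbers), which is exactly the kind of synthetic-data sanity check discussed above.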


Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-05-10 18:32
""my expectation is that this method would be of equal value" - You obviously did not take a look at the papers I asked about, because they offer a very different approach from neural nets, and they go without estimation of a probability density.

Strange


Total Posts: 1422
Joined: Jun 2004
 
Posted: 2018-05-10 19:23
(a) "equal value" - as worthless as neural-net based learning turned out to be. I have read the paper, though I can't claim to have a deep understanding of the statistics involved

(b) IMHO, it's not a matter of the ML approach; it's a matter of the required training data and the resulting dimensionality of the problem.



Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-05-11 07:19
Bravo, katastrofa - finally I see meaningful questions that go beyond purely entropy-increasing replies in the spirit of "I did not read the papers, but here is what I think about neural networks".

To your questions:

1. How do you model the discount factor? The BSM model solves this for you by expressing everything in the risk-neutral measure. Your paper assumes the agent operates in the real-world measure, but then you can't just use the risk-free rate for discounting rewards.

Once you have accounted for risk in the rewards, you can discount them at the risk-free rate.

2. The data-driven Q-learning version is described thus in the 1st paper: "The data available is given by a set of N trajectories for the underlying stock S_t (expressed as a function of X_t using Eq. (24)), hedge position a_t, instantaneous reward R_t, and the next-time value X_{t+1}".

a) In order to use the model, you need to know an unobservable quantity (some trader's hedge positions, i.e. their deltas). The agent is not learning just the market dynamics; it's also learning some other trader's model (their hedge positions).

Good point. The agent learns from recorded actions (re-hedges). They should 'match' the market, otherwise the trader continuously loses money. In the worst case, you can take purely random actions and the model will still learn the price asymptotically. There is also a version without observed rewards in the second paper.

b) X_t is derived from the stock prices S_t using some parameters mu and sigma. Where do you take their values from?

These are hyper-parameters, not parameters. There is no need to calibrate them in-sample.

c) The model requires N such paths (where N is undoubtedly large, say 10,000 or more). In order to price a 3-month option, you need 3 x 10,000 months' worth of data. This is impossible.

No, it is not. You can use bootstrapped data for both prices and hedges. I don't see any issue here.

If you want to build something worthwhile, solve this problem: an agent observes traded stock prices, traded bond prices and traded option prices. They have a given financing rate and can invest cash at some other rate. Then they learn a strategy to hedge a European option at a non-standard strike. The goal is to maximise the amount of cash in the bank after the option expires or is exercised.

Your suggestion is respectfully declined. It reminds me of a suggestion to spend the rest of one's life trying to compute the stability of a table with four legs.

katastrofa


Total Posts: 453
Joined: Jul 2008
 
Posted: 2018-05-11 09:51
"1. How do you model the discounting factor? BSM model solves this for you by expressing everything in the risk-neutral measure. Your paper assumes the agent operates in the real-world measure. But then can't just use the risk-free rate for discounting rewards.

Once you accounted for risk in rewards, you can discount them using a risk-free discount rate."

How do you account for the risk in the rewards? Your answer just kicks the can down the road.

2. The data-driven Q-learning version is described thus in the 1st paper: "The data available is given by a set of N trajectories for the underlying stock S_t (expressed as a function of X_t using Eq. (24)), hedge position a_t, instantaneous reward R_t, and the next-time value X_{t+1}".

a) In order to use the model, you need to know an unobservable quantity (some trader's hedge positions, i.e. their deltas). The agent is not learning just the market dynamics; it's also learning some other trader's model (their hedge positions).

"Good point. The agent learns from recorded actions (re-hedges). They should 'match' the market, otherwise the trader continuously loses money. In the worst case, you can take purely random actions and the model will still learn the price asymptotically. There is also a version without observed rewards in the second paper."

The second version still assumes you're observing the trades - hence, not usable in practice. If I can observe someone's trades, I can also walk over to them across the trading floor and ask what pricing model they use.

"b) Xt is derived from stock prices St using some parameters mu and sigma. Where do you take their values from?

These are hyper-parameters, not parameters. No need to calibrate them in-sample."

You still need to choose their values. How do you do that? Hyperparameters matter - there is a lot of debate about how you can overfit via hyperparameter optimisation. And your model is not data-efficient.

"c) The model requires N such paths (where N is undoubtedly large, say 10,000 or more). In order to price a 3-month options, you need 3x10,000 months worth of data. This is impossible.

No it is not not. You can use bootstapped data for both prices and hedges. Don't see any issue here."

I'm surprised that you don't! Bootstrapping assumes conditional independence - and your agent is trying to learn the correlations between X_t, a_t and X_{t+1}. You still don't see an issue here? In other words: by bootstrapping, you're making a model assumption. If you use bootstrapping to train your agent, you're not model-free.
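The objection can be made concrete with a toy AR(1) series: an i.i.d. bootstrap resample of the same data points has essentially zero lag-1 autocorrelation, i.e. the temporal structure an agent would need to learn is gone. The data below is synthetic and illustrative only:

```python
import random

random.seed(3)

# Simulate an AR(1) series with strong serial dependence (phi = 0.8).
n = 10_000
x = [0.0]
for _ in range(n - 1):
    x.append(0.8 * x[-1] + random.gauss(0.0, 1.0))

def lag1_autocorr(series):
    m = sum(series) / len(series)
    num = sum((series[i] - m) * (series[i + 1] - m) for i in range(len(series) - 1))
    den = sum((v - m) ** 2 for v in series)
    return num / den

# i.i.d. bootstrap: resample the same points with replacement,
# destroying the ordering and hence the serial dependence.
boot = [random.choice(x) for _ in range(n)]

print(lag1_autocorr(x), lag1_autocorr(boot))
```

The original series shows autocorrelation near 0.8 while the bootstrapped one sits near zero - which is why block-bootstrap variants exist, and why plain resampling is itself a (strong) model assumption.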


"If you want to build something worthwhile, solve this problem: an agent observes traded stock prices, traded bond prices and traded option prices. They have a given financing rate and can invest cash at some other rate. Then, they learn a strategy to hedge a European option at non-standard strike. The goal is to maximise the amount of cash in the bank after the option expires or is exercised.

Your suggestion is respectfully declined. Reminds me a suggestion to spend the rest of one's life trying to compute stability of a table with four legs."

I fail to see the analogy, but I gather that you prefer to stick to recasting old stuff using new buzzwords ;-)

Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-05-11 15:22
katastrofa, you are of course free to stick to the outdated garbage called risk-neutral models for another 40 years. Here is another paper, with a model that keeps the actions unobserved and produces a market model.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3174498

pj


Total Posts: 3400
Joined: Jun 2004
 
Posted: 2018-05-11 16:35
> outdated garbage called risk-neutral models
And, pray, why are they garbage?

The older I grow, the more I distrust the familiar doctrine that age brings wisdom Henry L. Mencken

Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-05-11 16:53
Because they only compute the mean of an option hedge portfolio and ignore the risk of mis-hedges - it is not priced.

pj


Total Posts: 3400
Joined: Jun 2004
 
Posted: 2018-05-11 17:29
Not true. If your market is incomplete, you have several risk-neutral models, and thus several available means.
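A one-period toy example of this point (numbers invented): in a trinomial market with r = 0, the martingale condition pins down only one relation among the three probabilities, leaving a one-parameter family of risk-neutral measures and a whole interval of arbitrage-free call prices:

```python
# One-period trinomial market: r = 0, S0 = 100, S1 in {90, 100, 110}.
# Incomplete: three states, two traded assets (stock and bond).
s0 = 100.0
states = [90.0, 100.0, 110.0]
k = 100.0
payoff = [max(s - k, 0.0) for s in states]  # call payoffs: [0, 0, 10]

prices = []
for i in range(1, 50):        # sweep the free parameter of the measure
    p_up = i / 100.0          # P(up) in (0, 0.5)
    p_down = p_up             # martingale condition E[S1] = S0 forces P(down) = P(up)
    p_mid = 1.0 - 2.0 * p_up
    assert abs(p_up * 110 + p_mid * 100 + p_down * 90 - s0) < 1e-9
    prices.append(p_up * payoff[2] + p_mid * payoff[1] + p_down * payoff[0])

# Each measure gives a different no-arbitrage price for the same option.
print(min(prices), max(prices))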


Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-05-11 17:35
Yes, that is correct. So you suggest estimating the risk in options by running a few incomplete-market models with different pricing measures? :)

pj


Total Posts: 3400
Joined: Jun 2004
 
Posted: 2018-05-11 17:44
No, I prefer sacrificing virgins.
Works same as ML.
Magic.


Nudnik Shpilkes


Total Posts: 47
Joined: Jan 2009
 
Posted: 2018-05-11 17:55
Let me know how it goes :)

"Economics ended up with the theory of rational expectations, which maintains that there is a single optimum view of the future, that which corresponds to it, and eventually all the market participants will converge around that view. This postulate is absurd, but it is
needed in order to allow economic theory to model itself on Newtonian Physics." (G. Soros)

For a more detailed discussion, you may also find this interesting:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1530046

Strange


Total Posts: 1422
Joined: Jun 2004
 
Posted: 2018-05-11 19:37
Look, it is hard for someone who never actually managed an options book to understand what actual purposes an option pricing model serves. I really think a lot of academics waste a lot of effort trying to solve problems that do not have much practical value.
