ZFS/dragonfly vs mongo

Rashomon


Total Posts: 186
Joined: Mar 2011
 
Posted: 2018-10-19 11:49
I noticed several years ago that AHL was using mongo. Any experiences / thoughts on using a distributed filesystem rather than a db with all its special APIs?

edit: should have said hammer instead of dragonfly …

prikolno


Total Posts: 17
Joined: Jul 2018
 
Posted: 2018-10-23 21:38
What are you looking to store, what are the applications/use cases, and how many concurrent applications/users? Just plain order book data for backtesting? It's hard to give you a good answer without knowing these three things. Both approaches work well for completely different use cases. I have 4 different FSes and 2 DBMSes in production, and I think a heterogeneous/tiered setup like this is necessary if you want the best of all worlds, at the expense of being more difficult to maintain.

+ ZFS and HAMMER aren't distributed; you still need to export them over something like NFS. You can also stack your DBMS over an unconventional file system, which is advantageous in certain cases, e.g. free node compression in the B+tree indices.

+ Man is still using Mongo. Their entire data set is in the 100s of GBs to single-digit TBs; I don't think anyone has put Arctic in production at a larger scale than that. The design emulates an object store and by extension has the pros and cons of an object store: it's easier to store metadata like versioning and data gaps, fine for high-bandwidth but bad for high-IOPS access patterns. The side advantage of emulating an object store over a DBMS like Mongo is that you get a query DSL, query optimization, integrity constraints (I'm using the term loosely here since Mongo is NoSQL), ease of cloud deployment, replication and sharding for "free". (There's a rough usage sketch at the end of this post.)

+ You can of course partially replicate the above functionality over your file system in a use-case-specific way, trading the flexibility of a DBMS for the performance of your customized solution.
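
For a rough sense of what the Arctic-style object store looks like from client code (this is a sketch, not Man's actual setup; it assumes a local mongod, the open-source arctic package, and made-up library/symbol names):

# Minimal sketch of Arctic's versioned, object-store-style API over Mongo.
# Assumes a local mongod and the `arctic` package; names are illustrative.
import pandas as pd
from arctic import Arctic

store = Arctic('localhost')                    # connect to the Mongo backend
store.initialize_library('ticks')              # create a versioned library (once)
lib = store['ticks']

df = pd.DataFrame({'px': [100.0, 100.5]},
                  index=pd.date_range('2018-10-19', periods=2, freq='s'))
lib.write('AAPL', df, metadata={'source': 'demo'})   # each write creates a new version

item = lib.read('AAPL')                        # returns a versioned item
print(item.data.head(), item.metadata, item.version)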

EspressoLover


Total Posts: 337
Joined: Jan 2015
 
Posted: 2018-11-01 20:16
> thoughts on using a … filesystem rather than a db with all its special APIs?

For batched WORM data like market data, you don't need any ACID guarantees or transactions. Right off the bat that's a major reason to prefer flat files. Every single language, tool and environment can read files out of the box; filesystems are rock solid; POSIX gives you a ton of capabilities; the structure is completely transparent; directory trees can be ported anywhere (including S3); and compression tools are easy to use and highly efficient.

The biggest reason to move market data out of flat files is secondary indexing. E.g. say you slice your data into the layout /[date]/[symbol]. Then it's easy to get all the data for one symbol across a given date range, but painful to sequence the data across every symbol for a single date. Databases obviously fix this problem. So can indexed file formats like HDF5 or Parquet, but you give up a lot of the POSIX semantics, portability, and stability that come with native flat files.
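
To make that concrete, here's a rough sketch of the painful case under /[date]/[symbol]: building one time-ordered stream across all symbols for a single date means holding every symbol file for that day open and merge-sorting on timestamp (the paths and the CSV layout are made up):

# Sketch: time-sequencing all symbols for one date under a /<date>/<symbol> flat-file tree.
# Paths and the "timestamp,..." CSV layout are illustrative assumptions.
import csv
import glob
import heapq

def rows_for(path):
    """Yield (timestamp, row) pairs from one symbol's file, assumed already time-sorted."""
    with open(path, newline='') as f:
        for row in csv.reader(f):
            yield (float(row[0]), row)         # column 0 assumed to be an epoch timestamp

def sequence_date(root, date):
    """Merge every symbol's file for one date into a single time-ordered stream.
    Note every file for the day ends up open at once - part of why this layout hurts."""
    paths = sorted(glob.glob(f"{root}/{date}/*"))
    return heapq.merge(*(rows_for(p) for p in paths), key=lambda t: t[0])

# Usage: iterate the merged stream for one day.
# for ts, row in sequence_date("/data/ticks", "2018-10-19"):
#     ...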

But that being said, storage is pretty damn cheap. There’s nothing inherently wrong with duplicating data, as long as you’re hygienic about keeping a single source of truth. You can easily slice and store the same data multiple ways corresponding to different indexing schemes. The marginal cost of a GB is like $0.25 on even the highest end drives.
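
For example, a bare-bones sketch of the "same data, two layouts" idea with plain flat files (the columns, paths and sample values here are made up):

# Sketch: write the same tick data under two flat-file layouts, one per access pattern.
# Assumes a pandas DataFrame with 'date' and 'symbol' columns; everything is illustrative.
import os
import pandas as pd

def write_layout(df, root, outer, inner):
    """Slice df into <root>/<outer>/<inner>.csv files, e.g. /date/symbol or /symbol/date."""
    for (o, i), chunk in df.groupby([outer, inner]):
        path = os.path.join(root, str(o))
        os.makedirs(path, exist_ok=True)
        chunk.to_csv(os.path.join(path, f"{i}.csv"), index=False)

ticks = pd.DataFrame({
    'date': ['2018-10-19', '2018-10-19'],
    'symbol': ['AAPL', 'MSFT'],
    'ts': [1539939600.0, 1539939600.5],
    'px': [219.3, 108.6],
})

write_layout(ticks, 'ticks_by_date', 'date', 'symbol')    # /[date]/[symbol].csv
write_layout(ticks, 'ticks_by_symbol', 'symbol', 'date')  # /[symbol]/[date].csv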

> ZFS + distributed filesystem

ZFS is really cool technology, but its experimental and unsupported status on Linux probably doesn't make it worth it over ext4.

Most run-of-the-mill needs are more than met by a Linux NFS server on 10 GbE with RAID10 and a couple dozen SSDs. That holds unless the I/O rate is super high, or a single instance of the app requires more than one rack of nodes.

If you’re scaled out of NFS, and talking about filesystems distributed over multiple hosts, most of the technology is pretty lacking. Lustre’s the only decent option, but it requires custom kernel support. GlusterFS is just terrible and unreliable. And I’ve heard similar things about using Ceph at the filesystem level. Fundamentally, I don’t think it’s possible to make a decent userland filesystem. If you do need that kind of scale, you’re probably better off following Spark’s philosophy and moving the computation to the data, rather than the data to the computation.

> I noticed several years ago that AHL was using mongo.

That AHL presentation was so ridiculous that part of me suspects it might be satire. The underlying dataset’s a terabyte, small enough that their data distribution platform could literally just be USB thumb drives. Yet for some reason they need a 17-node cluster of high-end servers backed by an InfiniBand SAN.

I’d highly discourage considering Mongo. Googling will reveal no shortage of issues and shortfalls. There’s maybe a very narrow technically justified use case: you can confidently predict that you’ll never ever need even the slightest bit of relational logic, your data is structured but that specific structure is constantly changing, and you don’t care about performance, stability or integrity. Otherwise, it’s very easy to be seduced into using it because it’s so easy to use in dev, whereas all its warts don’t appear until you’re stuck with it in prod.

There’s maybe a broader business justification that explains its current popularity. That's the stereotypical tech startup with an Instagram-esque growth rate, tons of funding, but a constant shortage of manpower. No need to waste any time because the learning curve is so short and there’s virtually no DBA effort required. As long as you’re flush with VC cash it’s easy to throw more hardware at the problem and horizontally scale out. Eventually you’re going to regret picking Mongo, but by that time you'll have grown into a unicorn with deep reservoirs of engineering talent.

Good questions outrank easy answers. -Paul Samuelson

svisstack


Total Posts: 313
Joined: Feb 2014
 
Posted: 2018-11-02 16:57
@EspressoLover: your post is a killer. Totally agree.

But I don't actually understand the issue with NFS scaling; you can always add another NFS server and build an abstraction where you search/read/write files simultaneously across X filesystems.

www.coinapi.io && www.cryptotick.com

prikolno


Total Posts: 17
Joined: Jul 2018
 
Posted: 2018-11-03 11:22
Interesting comments above, much of which makes sense, but I think it's important to point out a few issues:

> storage is[...] cheap
But I/O is incredibly expensive. And if your storage is inefficient, then so is your I/O.

> move the computation to the data, rather than the data to the computation.
Best point in your post IMO. And that's why, insofar as possible, you should perform any kind of query on your data at the storage layer before materializing it up the I/O hierarchy to your compute nodes - which is one of the areas where databases shine. My experience working with people is that most researchers, quant devs and data scientists write applications from the frame of reference of a single target node rather than a tiered system, which is why there are abstractions (Spark, Aerospike UDFs etc.) to assist with the latter.
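
To illustrate the idea (not anyone's actual setup): with Mongo you'd express the filter and aggregation as a server-side pipeline, so only the small result is materialized on the compute node. The collection name and schema below are made up:

# Sketch: push filtering/aggregation down to the storage layer instead of pulling raw
# rows to the client. Assumes pymongo and an illustrative 'ticks' collection/schema.
from pymongo import MongoClient

coll = MongoClient('mongodb://localhost')['md']['ticks']

# Bad: materialize every document on the compute node, then filter/aggregate locally.
# rows = list(coll.find({}))

# Better: the server scans, filters and aggregates; only the tiny result crosses the wire.
pipeline = [
    {'$match': {'symbol': 'AAPL', 'date': '2018-10-19'}},
    {'$group': {'_id': '$symbol',
                'vwap_num': {'$sum': {'$multiply': ['$px', '$qty']}},
                'qty': {'$sum': '$qty'}}},
]
for doc in coll.aggregate(pipeline):
    print(doc['_id'], doc['vwap_num'] / doc['qty'])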

However, at the same time you're right that for simple market data, most people aren't thinking of performing any query on the data.

> Most run-in-the-mill needs are more than met by a linux NFS server on 10 Gbe with RAID10 and a couple dozen SSDs.
I wouldn't do this for a few reasons. First, the kind of data you're talking about is mostly accessed sequentially, so the only SSDs worth using are NVMe; otherwise it's cheaper and similarly performant to store the data on spindles. NVMe SSDs can sustain over 1.25 GB/s and you'll be bottlenecked on your interconnect, so there's not even any reason for striping. The second reason is that at typical capacities (>=1TB), mirroring gets you below a consumer-grade SLA on permanent data loss during resilvering; you'd at least want dual parity. Also, SAS/FC/IB/PCIe cabling and supporting equipment are relatively cheap compared to 10 GbE.

> ZFS is really cool technology, but its experimental and unsupported status in linux probably doesn’t make it worth it over ext4
Are you referring to the FUSE implementation, or to ZFS on Linux (ZoL)? ZFS has been stable for a *very* long time. Most ZFS deployments I know of are sitting on some Solaris branch or FreeBSD. I can go into more detail if you're interested, but I know even Theodore Ts'o would advocate ZFS over ext4 for this use case.

> you can always put another NFS and make an abstraction where you search/read/write files simultaneously in X number of filesystems.
Yes, but why do it yourself when any modern FS with pooling achieves this at the disk level, and something like Lustre is maintained by experts with many man-years of head start over you?

jslade


Total Posts: 1138
Joined: Feb 2007
 
Posted: 2018-11-05 19:29
Mongo is shit, and database technology has actually regressed due to lazy open source nitwits touting nonsense like this.

If you're doing lots of active research on largish data you will end up reproducing some arbitrary subset of a columnar array DB like Kx or J. There should be some modified version of Greenspun's tenth rule for this: "Any sufficiently complicated data operation above 50 GB contains an ad hoc, informally specified, bug-ridden, slow implementation of half of KDB."
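
Case in point: the canonical operation everyone ends up hand-rolling is the as-of join of trades to the latest preceding quote (kdb's aj). A rough pandas sketch with made-up data:

# Sketch: the classic "you just reimplemented kdb's aj" operation - join each trade to
# the most recent quote at or before it. Assumes pandas; the data below is made up.
import pandas as pd

quotes = pd.DataFrame({
    'ts': pd.to_datetime(['2018-10-19 09:30:00.000', '2018-10-19 09:30:00.500']),
    'sym': ['AAPL', 'AAPL'],
    'bid': [219.25, 219.30],
    'ask': [219.27, 219.32],
})
trades = pd.DataFrame({
    'ts': pd.to_datetime(['2018-10-19 09:30:00.400', '2018-10-19 09:30:00.900']),
    'sym': ['AAPL', 'AAPL'],
    'px': [219.26, 219.31],
})

# Both frames must be sorted on the join key; 'by' restricts matches to the same symbol.
asof = pd.merge_asof(trades.sort_values('ts'), quotes.sort_values('ts'),
                     on='ts', by='sym', direction='backward')
print(asof)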

"Learning, n. The kind of ignorance distinguishing the studious."

gax


Total Posts: 18
Joined: Apr 2011
 
Posted: 2018-11-06 02:48
https://github.com/kevinlawler/kerf seems to be an interesting alternative to Kx. Not sure if it's still under active development.

svisstack


Total Posts: 313
Joined: Feb 2014
 
Posted: 2018-11-06 12:46
jslade: I would like to hear more of the story about Kerf if you're willing to share. It could be an interesting road.

www.coinapi.io && www.cryptotick.com

Rashomon


Total Posts: 186
Joined: Mar 2011
 
Posted: 2018-11-06 19:10
thanks for the answers, everyone. much appreciated.

prikolno: no, not even for a financial application. despite the increasing alibaba themes, NP still has the smartest people I can think of to ask a tech question. (clearly)

prikolno


Total Posts: 17
Joined: Jul 2018
 
Posted: 2018-11-06 22:06
Like a web application for SaaS delivery? That has very different requirements. Now you probably need horizontal scaling or multi-clustering, high availability, and support for many concurrent users. Your datasets are likely not WORM. And you probably don't need high throughput, but you do need low-latency random access. A file system can't do things like reoptimize a query into one scheduled I/O pass when it sees that there are 5000 users trying to access the same data nearly simultaneously.

There's probably a large subset of your data over which you do want ACID properties and relational integrity (e.g. account info), then another set which is semi-structured or schemaless (e.g. you're not sure what to collect for A/B or canary testing). Also, unintuitively to some, a columnar layout actually shines more in these settings than with regular financial time series data, because you should have very small materialized aggregates. You'll probably be interested in something like Snowflake instead of kdb.

It is still possible to write your own custom solution that still depends heavily on POSIX file system operations - I've done it myself - but I only recommend you do that if you know how to exploit modern tricks, like bypassing the page cache with O_DIRECT; managing your own lightweight thread pool with NUMA awareness - OK, these two are not so modern; or implementing adaptive radix trees for indexing entirely in-memory. You won't be able to beat Vertica, Snowflake, Cassandra or the like without knowing these hacks, plus domain-specific optimizations of your storage directly for the target data set.
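
For a flavor of the first trick, a minimal Linux-only sketch of an O_DIRECT read (the 4096 block size and the path are assumptions; buffer, offset and length all have to be aligned to the device's logical block size):

# Sketch of an O_DIRECT read on Linux: buffer, offset and length must all be aligned
# to the device's logical block size (4096 here is an assumption - check blockdev --getss).
import mmap
import os

BLOCK = 4096

def read_direct(path, offset, nbytes):
    """Read nbytes at offset, bypassing the page cache. offset/nbytes must be BLOCK-aligned."""
    assert offset % BLOCK == 0 and nbytes % BLOCK == 0
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        buf = mmap.mmap(-1, nbytes)            # anonymous mmap is page-aligned
        got = os.preadv(fd, [buf], offset)     # single aligned read, no page-cache copy
        return bytes(buf[:got])
    finally:
        os.close(fd)

# data = read_direct('/data/ticks/2018-10-19/AAPL.bin', 0, 1 << 20)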

Throw everything said earlier out of the window. Let me know if you'd like to discuss more over email.

prikolno


Total Posts: 17
Joined: Jul 2018
 
Posted: 2018-11-10 01:59
Another thought that crossed my mind: if you're offering a tech product that has multiple end users, you may need a multi-tenant architecture and a way to load balance among those heterogeneous users.

You rarely have to deal with that at a trading firm. At a smaller scale, a trading firm is really just one team. Even if you have multiple silos across a trading firm, the major resources you'll have to load balance are just your build servers and simulation cluster, which can be done quite easily within your build/CI application and cluster middleware or orchestration service. Colo servers and trading engine instances can be added like a single-tenant configuration; you just give each team a couple of boxes per venue.

Multi-tenancy is messy if your storage layer consists of multiple, separate file systems that you're stacking to increase capacity at each site. It is easier to think of it in terms of the performance (IOPS/transaction latency) of your database cluster and how to manage/throttle query traffic from your users to keep within that capacity. Throttling file system operations can be done too, obviously, just at a lower level.
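
For illustration only, one naive way to throttle per-tenant query traffic is a token bucket in front of the storage layer - this is a sketch, not any particular product's API, and the tenant names, rates and "cost" notion are made up:

# Sketch: a per-tenant token bucket in front of the storage layer, to keep aggregate
# query traffic within a capacity budget. All names, rates and costs are illustrative.
import time

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst    # tokens/sec and max bucket size
        self.tokens, self.last = burst, time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

buckets = {'tenant_a': TokenBucket(rate=100, burst=200),   # ~100 "IO units"/sec
           'tenant_b': TokenBucket(rate=20, burst=40)}

def run_query(tenant, query):
    """query is a plain dict here; 'estimated_ios' stands in for a real cost estimate."""
    if not buckets[tenant].allow(cost=query.get('estimated_ios', 1)):
        raise RuntimeError('throttled: tenant is over its I/O budget')
    # ... dispatch to the database / file system here ...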

Maggette


Total Posts: 1062
Joined: Jun 2007
 
Posted: 2018-11-10 16:35
"Another thought that crossed my mind: if you're offering a tech product that has multiple end users, you may need a multi-tenant architecture and a way to load balance among those heterogenous users.

You rarely have to deal with that at a trading firm. At a smaller scale, a trading firm is really just 1 team. Even if you have multiple silos across a trading firm, the major resources you'll have to load balance are just your build servers and simulation cluster, which can be done quite easily within your build/CI application and cluster middleware or orchestration service. Colo servers and trading engine instances can be added like a single-tenant configuration, you just give a couple of boxes per venue per team.

Multi-tenancy is messy if your storage layer consists of multiple, separate file systems that you're stacking to increase capacity at each site. It is easier to think of it in terms of performance (IOPS/transaction latency) of your database cluster and how to manage/throttle query traffic from your users to keep within that capacity. Throttling file system operations can be done too obviously, just lower level.
"

IMHO there is a well-proven stack with several architecture options for that kind of problem outside of finance. Message-based, immutable data, microservices and the "reactive manifesto" are the buzzwords to google. I've had some pleasant experiences building quite large and highly concurrent applications with the classic Apache jungle here... Vert.x and Akka.

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

prikolno


Total Posts: 17
Joined: Jul 2018
 
Posted: 2018-11-10 21:26
Agreed.