 Strange
|
|
| Total Posts: 1250 |
| Joined: Jun 2004 |
| |
|
I am trying to build a data acquisition and analysis server. The two important constraints are that (a) this is a "hobby" project, so I am a bit cost-sensitive, and (b) I live in Manhattan, so I can't keep a machine at home.
On the requirements side, I need a simple Linux server that will run some recurring scripts and give me the flexibility to install whatever software I need (e.g. I will need an instance of RStudio Server). I also need reliable backups. None of my stuff is computationally complex enough to need extra server power, nor do I need any sort of real-time capability.
So, I see two choices. I can buy a physical box and find a place nearby to put it (there are a few spots that would colocate a 1U server for approximately $100/month plus internet charges). Alternatively, I can subscribe to some sort of cloud service, but I am just not sure it will let me run my scripts reliably and install all of the software I'd need.
Does anyone have experience with either? Are there services aside from Amazon that provide cloud servers? |
It's buy futures, sell futures, when there is no future! |
|
|
 |
 goldorak
|
|
| Total Posts: 385 |
| Joined: Nov 2004 |
| |
|
Amazon does a perfect job in my opinion.
The best part is the snapshot concept: you can keep as many snapshots of a given disk as you wish. You can then recreate the disk (and even replicate the same data to other disks on other machines) from a snapshot. This is very useful for backtesting on a copy of your data on one machine while the "normal" machine keeps retrieving data to the original disk.
|
If you are not living on the edge you are taking up too much space. |
|
 |
 Strange
|
|
| Total Posts: 1250 |
| Joined: Jun 2004 |
| |
|
Do they have any restrictions on the software that you are allowed to run there? |
It's buy futures, sell futures, when there is no future! |
|
|
 |
 goldorak
|
|
| Total Posts: 385 |
| Joined: Nov 2004 |
| |
|
No. It is your machine, your system image, and your software. As long as you are not sending spam or trying to hack thousands of Windows machines, I guess you should be OK  |
If you are not living on the edge you are taking up too much space. |
|
 |
 FatChoi
|
|
| Total Posts: 107 |
| Joined: Feb 2008 |
| |
|
I'm a very happy user, and you have total control over your machine. For RStudio you could start by borrowing one of these. On costs, the hourly charges can add up, but you get a big discount if you are willing to pay upfront: on demand, a "small" instance will cost $60/month, but you can halve that with a $70 upfront payment, so $410 per year instead of $700. Storage costs are negligible until you get into hundreds of GB. Alternatively, since you pay by the hour, you could set things up to run on demand, for example running for two hours a night and sending a report somewhere static. I'm sure you can do all of this with R, but Python has excellent client tools for AWS such as boto and StarCluster, there's an easy way of running IPython and Notebooks on AWS on demand, and there are good tools like Requests and dexy for automating reporting and data gathering. |
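To make the cost comparison above easier to redo with current prices, here is a rough sketch of the arithmetic in Python. The hourly rates are assumptions back-derived from the $60/month and "halve that" figures in the post, not official AWS prices, and the function name annual_cost is made up for illustration.

```python
def annual_cost(hourly_rate, hours_per_day=24.0, upfront=0.0):
    """Yearly cost in dollars for an instance billed by the hour."""
    return upfront + hourly_rate * hours_per_day * 365


# Assumed rates, chosen only to roughly match the figures quoted in the post.
on_demand_rate = 0.08    # $/hour, close to $60/month when always on
reserved_rate = 0.04     # $/hour, half the on-demand rate
reserved_upfront = 70.0  # one-off payment for the discounted rate

always_on = annual_cost(on_demand_rate)
reserved = annual_cost(reserved_rate, upfront=reserved_upfront)
nightly = annual_cost(on_demand_rate, hours_per_day=2)  # two hours a night

for label, cost in [("always on, on demand", always_on),
                    ("always on, upfront discount", reserved),
                    ("two hours a night, on demand", nightly)]:
    print(f"{label}: about ${cost:.0f} per year")
```

With these assumed rates the always-on figures come out near the $700 and $410 quoted above, and the two-hours-a-night option is an order of magnitude cheaper. Storage and bandwidth charges are left out.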
|
|
|
 |
 Strange
|
|
| Total Posts: 1250 |
| Joined: Jun 2004 |
| |
|
I'd probably be looking to set up my own instance (for a variety of reasons, mostly because I want it to be tightly integrated with the data storage model), but anything on the order of a hundred or two per month would not be a big deal. Oh, and I am definitely not looking to migrate any of my tools to Python, at least not yet.
How do backups work? Is there a way to arrange a physical backup?
|
It's buy futures, sell futures, when there is no future! |
|
 |
 FatChoi
|
|
| Total Posts: 107 |
| Joined: Feb 2008 |
| |
|
I probably shouldn't have said "borrow": you can copy an AMI to your own EBS volume (a virtual disk that doesn't need a machine attached to stay around) and have the machine run that. It will be your machine; you just have less setup to do.
You can run a RAID array of EBS volumes for reliability. You can take snapshots of EBS disks and store them on the more robust S3 storage, and you can send Amazon a disk and have them dump your data onto it. There's probably someone who will send you a DVD every week if you point them at your data, but I've never looked. S3 seems impressively robust and I have good bandwidth, so I'm quite happy, but incremental backups can be easier with databases than with direct file-based storage plans. |
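For the snapshot step described above, a minimal sketch in Python might look like the following. It assumes boto3 (the current successor to the boto library mentioned earlier in the thread) and AWS credentials already configured on the machine. The helper name snapshot_volume and the volume id in the usage comment are hypothetical.

```python
import datetime


def snapshot_volume(ec2_client, volume_id, label="nightly"):
    """Ask EC2 to snapshot one EBS volume and return the new snapshot id.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d")
    response = ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description=f"{label} backup {stamp}",
    )
    return response["SnapshotId"]


# Usage, assuming boto3 is installed and credentials are set up:
#
#   import boto3
#   ec2 = boto3.client("ec2")
#   print(snapshot_volume(ec2, "vol-0123456789abcdef0"))  # hypothetical volume id
```

Passing the client in as an argument keeps the helper easy to call from a nightly scheduled job and easy to try out against a stub before pointing it at a real account.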
|
|
|
 |
|
Amazon is awesome, but setup can be a little complicated (you end up paying for the server, the bandwidth and the disk space separately, if I'm not wrong).
This adds up, and even if you plan on switching your instance off during the weekend it will probably cost more than any other VPS provider.
If you're looking to save a bit on costs you could look at VPSlink (tried it and liked it; you get to pick your favorite distro) or Linode (haven't tried it, but I hear good things about them). |
|
|
 |
 Strange
|
|
| Total Posts: 1250 |
| Joined: Jun 2004 |
| |
|
VPSLink looks good: it's a package deal, and for now I only need 20-30 GB at most to start. I'm going to try them today or tomorrow, though I still need to figure out how to set up the RStudio Server instance. Having read the docs, maybe I am better off hiring a consultant and taking careful notes; there are a lot of details there.
PS. I would imagine that once I am chugging along at full bore (we are probably talking 100s of GB of data once it's all set up), I will have to go with a local server solution. |
It's buy futures, sell futures, when there is no future! |
|
|
 |
 goldorak
|
|
| Total Posts: 385 |
| Joined: Nov 2004 |
| |
|
You do not need a separate "backup" with Amazon. It is more or less already built into the EBS concept.
|
If you are not living on the edge you are taking up too much space. |
|
 |
 Hansi
|
|
| Total Posts: 190 |
| Joined: Mar 2010 |
| |
|
I've used this as an alternative to Amazon for stuff that needed to be always on for multiple months with high RAM and HDD reqs: http://www.kimsufi.co.uk/
Doesn't allow for burst scaling but that's not what I was after. |
|
|
|
 |
|
Building from source might be complicated, but if you choose Ubuntu or Debian there are apparently ready-made binary packages that you can install with these 3 commands:
sudo apt-get install r-base
wget http://download2.rstudio.org/rstudio-server-0.96.316-i386.deb
sudo dpkg -i rstudio-server-0.96.316-i386.deb
Amazon would probably be your only choice if you decided to do some huge backtests: you could execute them in parallel over multiple instances using the doMC package... but as I understand it, this is not the case (yet). |
|
|
 |
 Strange
|
|
| Total Posts: 1250 |
| Joined: Jun 2004 |
| |
|
Cool! This looks like the best solution (lots of HD/RAM at the expense of traffic). I don't really need burst scaling either; this is much more of a storage + development platform.
PS. Would they be OK with me being in the US and all? |
It's buy futures, sell futures, when there is no future! |
|
|
 |
 Hansi
|
|
| Total Posts: 190 |
| Joined: Mar 2010 |
| |
|
Yes, should be okay. I know some US guys doing iOS games that use them for their European originating traffic and they didn't mention any issues signing up. |
|
|
 |