Re: Any quant programs taking advantage of the G4/G5 velocity engine?

exotiq


Total Posts: 243
Joined: Jun 2004
 
Posted: 2005-03-31 01:18

Considering I don't think the QuantLib XLL even runs on Excel for the Mac, this might be a bit of a stretch, but from what I understand, the Velocity Engine in the G4 and G5 chips makes short work of any heavy vector/matrix processing, so it should be ideal for finite difference calculations, option Greek visualizations, calibration, or even the occasional least squares Monte Carlo.

Does anyone know of any software or easy-to-use libraries that take advantage of this?


ajk


Total Posts: 2
Joined: Mar 2005
 
Posted: 2005-03-31 01:56
Did come across this library for Mac OS that may be of interest:

http://www.pixelglow.com/macstl

It appears to support some of the Intel-style SIMD units as well. I think I remember reading on the Apple site that a number of their standard libraries make use of the SIMD unit, so if you use their standard vector classes and the like, you may be using an optimized library without knowing it. IBM's developerWorks has some more technical details on exploiting the PowerPC architecture.

FDAXHunter
Founding Member

Total Posts: 8117
Joined: Mar 2004
 
Posted: 2005-03-31 06:49

For starters, you should write your code against the AltiVec-optimized libraries in Accelerate.framework, such as vDSP.h, cblas.h and clapack.h.
Compile with: gcc -O3 -framework Accelerate main.c

Then you can start looking at an autovectorizing optimizer (such as VAST/AltiVec).

And if that's not enough, start writing your own optimized code (compile using -faltivec). You then can access the AltiVec API in your code and do things like:

vec_add( vector1, vector2 );
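
For anyone who wants to see that call inside a whole function, here is a minimal sketch in C. The function name add_floats is made up for illustration; vec_ld, vec_add and vec_st are the standard AltiVec intrinsics. It assumes the compiler defines __ALTIVEC__ when AltiVec is enabled and that <altivec.h> is the right header for your toolchain (with Apple's gcc and -faltivec the intrinsics may already be available without it). Everywhere else it falls back to a plain loop, so it should also build on non-PowerPC machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __ALTIVEC__
#include <altivec.h>   /* assumption: header expected by mainline GCC with -maltivec */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

    assert(a != NULL && b != NULL && out != NULL);

#ifdef __ALTIVEC__
    /* vec_ld/vec_st need 16-byte aligned addresses, so only take the
       vector path when all three arrays are aligned. */
    if ((((uintptr_t)a | (uintptr_t)b | (uintptr_t)out) & 15u) == 0) {
        for (; i + 4 <= n; i += 4) {
            vector float va = vec_ld(0, a + i);   /* load 4 floats */
            vector float vb = vec_ld(0, b + i);
            vec_st(vec_add(va, vb), 0, out + i);  /* 4 adds in one instruction */
        }
    }
#endif

    /* Scalar loop: handles the tail, unaligned input, and non-AltiVec builds. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The point of the fallback loop is that the same source stays correct whether or not the vector unit is available; only the speed changes.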

Hope this helps.


Salman Pushdie

James
NP High Priest

Total Posts: 2024
Joined: May 2004
 
Posted: 2005-03-31 10:21
Damn FDAX, you are gonna have me an array of Macs before the year is out. I can already tell.

Prior to the publication of the Black-Scholes model in 1973, the quest for a valuation formula that would describe option prices reflected one of the most elusive goals in financial economics. Though much work was done in the 1960s, many of the insights and techniques used to solve the problem were presented or anticipated at the beginning of the twentieth century by Louis Bachelier, an obscure French mathematician. These innovations include the first graphical representation of option pricing, a mathematical description of stock prices utilizing Brownian motion and anticipating the efficient market hypothesis, and the first formal option pricing formula.

FDAXHunter
Founding Member

Total Posts: 8117
Joined: Mar 2004
 
Posted: 2005-03-31 11:50

Yes, Apple rule as far as computation goes.... against Intel's architecture anyway...

However, if you are looking at highly arithmetically intensive operations on lots of data, such as vector math, there is a more specialized solution that allows for order(s) of magnitude more than what you could achieve on today's main processors (and no, I don't care how good an assembly programmer you are: it cannot be done).

CPUs are generally designed with serial execution in mind. You may have several pipelines (which, in itself, can lead to problems). This is a conscious design choice, as modern CPUs need to be all-purpose number crunching machines. This generality means that you need an awful lot of so-called "control hardware" (hardware that directs where the computation should go and who should do it, rather than performing any computation directly. Remember the Turing machine: we need to be able to do pretty much anything).
So we always need to feed the CPU the right data, otherwise it stalls while the right data is fetched from main memory. This is why modern CPUs have such huge caches (in terms of surface area, a lot of a modern CPU is cache: Level 1, Level 2 and Level 3), so you don't have to take a trip down memory lane all the time!

Apple's AltiVec attempts to come to grips with data parallelism (applying the same computation to multiple data elements) but obviously does this only at limited bandwidth.

Matrix operations are much more efficient when expressed in the "stream programming" paradigm (take a stream of numbers, the longer the better, and crunch them through a single computation, making use of vast data parallelism).
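
To make the stream idea concrete, here is a small sketch in plain C: one kernel applied independently to every element of a stream. The names kernel_fn, stream_map and call_payoff are invented for this example, and the option payoff is just a convenient kernel. On the CPU the loop runs one element at a time; the thing to notice is that no iteration depends on any other, which is the property that lets stream hardware process many elements at once.

```c
#include <assert.h>
#include <stddef.h>

/* A kernel takes one stream element plus one constant parameter. */
typedef float (*kernel_fn)(float element, float param);

/* Hypothetical kernel: call payoff max(spot - strike, 0). */
float call_payoff(float spot, float strike)
{
    return spot > strike ? spot - strike : 0.0f;
}

/* Apply the same kernel to every element of the input stream.
   Each output depends only on its own input element, so the
   iterations could run in any order, or all at the same time. */
void stream_map(kernel_fn kernel, float param,
                const float *in, float *out, size_t n)
{
    size_t i;

    assert(kernel != NULL);
    for (i = 0; i < n; ++i)
        out[i] = kernel(in[i], param);
}
```

For example, calling stream_map(call_payoff, 100.0f, spots, payoffs, n) fills payoffs with the payoff of each simulated terminal price in spots.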

What supports this kind of data parallelism? Answer: the GPU.

GPUs are very high-bandwidth, massively parallel machines. They are programmable stream processors. You can achieve 10 times the speed that you could ever hope to achieve with today's CPUs. And the rift is growing, as GPU speed does not adhere to Moore's law. While a CPU may grow its capability at 70% a year, modern GPUs are growing at 400% a year.

So, push your matrix operations to the GPU and you don't have to worry about cache misses, branch prediction failures or anything.


Salman Pushdie

dgn2


Total Posts: 1907
Joined: May 2004
 
Posted: 2005-03-31 16:16
Every time you provide more information about these things, I get sucked in a little more. I am taking a hard look at the G5s. (I am also thinking about building a tiny recording studio in my home and these things are great for creative work)

...WARNING: I am an optimal f'er

Kutilya
Quote Machine

Total Posts: 1274
Joined: Jun 2004
 
Posted: 2005-03-31 16:29



I want an Apple Cry


Lost!

IAmEric
Phorgy Phynance
Banned
Total Posts: 2961
Joined: Oct 2004
 
Posted: 2005-03-31 16:40
Hi FDAXHunter,

My friend, an expert in scientific computing, has been pushing the idea of using the GPU for matrix operations for years. I didn't realize things had gotten to the point where it is possible. The last I checked, things were limited because you had to store numbers as "color" values which limited the precision. Is precision no longer an issue with numerics via the GPU?

Thanks,
Eric

FDAXHunter
Founding Member

Total Posts: 8117
Joined: Mar 2004
 
Posted: 2005-03-31 16:51
That limitation is long gone. GPUs have programmable shaders now, allowing you to work directly on a number of types (such as float, half, int etc.). You don't need to encode into textures and triangle lists any more; you can pass streams directly into the GPU and write your own little programs.

Salman Pushdie

dgn2


Total Posts: 1907
Joined: May 2004
 
Posted: 2005-03-31 16:53
Can you actually purchase a ready-made workstation with a GPU? Is specialized software required?

...WARNING: I am an optimal f'er

FDAXHunter
Founding Member

Total Posts: 8117
Joined: Mar 2004
 
Posted: 2005-03-31 16:56
Pretty much every computer for sale nowadays comes with a reasonably powerful GPU. Sometimes you can even fit two GPUs in a machine (this is done for the G5s driving the Apple 30" display, for example, which use two GeForce FX 5200 Ultras, made by nVidia).

Salman Pushdie

IAmEric
Phorgy Phynance
Banned
Total Posts: 2961
Joined: Oct 2004
 
Posted: 2005-03-31 17:12
FDAXHunter,

You da man Worship

I want to learn how to do that sheah. Where do I start?

Sweeeeeeeeeet.

Eric

Edit: This is phucking hot man. I love it!

FDAXHunter
Founding Member

Total Posts: 8117
Joined: Mar 2004
 
Posted: 2005-03-31 17:27

A word of caution: while playing around with the GPU for speed purposes may seem exciting, in the way that maybe adding N2O injection to your car is exciting, I can guarantee that your productivity will suffer... a lot. While GPU general purpose computation has come a long way in the last couple of years, you can't use it to write overly generic code. Basically you'll be writing a new routine for pretty much everything. While you can see speedups of one or two orders of magnitude for specialized operations (such as exponentiation), it's very likely that the real speedup is perhaps on the order of 2 to 5. Xserve cluster nodes cost 3K a pop, so you spend 30K and you've got ten times the power.

The best way to learn would be to look at nVidia's Cg language (while ATI seems to make more powerful chips at the moment, nVidia does a better job at "tutoring" programmers): Learn about Cg
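
One way to see why a 10x to 100x kernel speedup can shrink to 2x to 5x for a whole program is the usual Amdahl's law arithmetic, which is an interpretation added here and not something spelled out above: only the fraction of the run time that actually moves to the GPU gets faster. A minimal sketch in C, with a made-up function name and made-up example numbers:

```c
#include <assert.h>

/* Overall speedup when a fraction of the original run time is
   accelerated by kernel_speedup and the rest is left unchanged
   (Amdahl's law). Both the name and the inputs are illustrative. */
double overall_speedup(double accelerated_fraction, double kernel_speedup)
{
    assert(accelerated_fraction >= 0.0 && accelerated_fraction <= 1.0);
    assert(kernel_speedup > 0.0);
    return 1.0 / ((1.0 - accelerated_fraction)
                  + accelerated_fraction / kernel_speedup);
}
```

With 80% of the run time in a routine that becomes 50 times faster, overall_speedup(0.8, 50.0) comes out at roughly 4.6, because the untouched 20% now dominates.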

 


Salman Pushdie

Luiz Paulo


Total Posts: 116
Joined: Jun 2004
 
Posted: 2005-03-31 19:50

small threadjack. this from a research report i got a couple of weeks ago:

For as long as most can remember, Intel has held at least an 80% share of the PC microprocessor market. However, nothing lasts forever, and the recent introduction of the Cell chip by a consortium of Sony, Toshiba and IBM could signal the beginning of the end for Intel’s reign. The prototype Cell chip recently unveiled operates at 256 gigaflops, which is about ten times as fast as the speediest desktop PCs available today. The Cell chip is often referred to as a “supercomputer on a chip”. Cell chips can also be networked to multiply processing power. Combining only four Cell chips could produce over a teraflop of processing power, moving it into the ranks of today’s fastest computers. The Cell chip is expected to make its market debut in the next-generation Sony PlayStation 3 (PS 3), in 2006, which will contain four of the chips. The PS 3 is expected to produce unrivaled complex three-dimensional video and sound. IBM is planning initially to introduce the Cell chip in a supercomputer workstation, reportedly reaching 16 trillion flops in processing speed. Toshiba plans to produce a Cell chip-driven high-definition TV in 2006. “It is so fast there is no point talking about the number”, said a Cell engineer in a recent Forbes magazine article. “The beauty is in its flexibility”. Based on IBM’s POWER architecture for its “nucleus”, each Cell chip consists of eight “synergistic processing elements” that function as simple but powerful, independent processors to maximize performance, according to The Economist. The Cell chip divides the assigned tasks among the different processors to get the job done faster.


IAmEric
Phorgy Phynance
Banned
Total Posts: 2961
Joined: Oct 2004
 
Posted: 2005-03-31 23:31
My friend is channeling a message through me *body spasm*

I know about Cg. It is still not clear to me how one could retrieve the results other than through the framebuffer (yes, Cg provides data types like float, float4 etc., but except for the very recent cards, the hardware doesn't support them; I could be wrong on this). If your friend knows, could you ask him to give a pointer?

FDAXHunter
Founding Member

Total Posts: 8117
Joined: Mar 2004
 
Posted: 2005-04-02 00:49

No offense, but Cg has been around for, dunno, 2 years, maybe 3 years? And in high performance software development, what is the point of using anything but the latest hardware anyway? In a way, your friend (who I'm sure is an expert in this) sounds a bit like "Yes... 64 bit processing has some advantages, but nothing but the latest generation of CPUs supports it"? (confused here)

So I'm not sure what your friend calls "recent"... none of the machines around me are older than 18 months, but that may be different for people working in different fields. And I think we couldn't care less what a machine bought 5 years ago can or can't do.

Anyway, the way things are returned in stream programming is... through streams! (Gee, who would have guessed.) Your friend is right, you can only go through the frame buffer (and then write back to the texture memory). This is quite intentional though.

Your friend may be interested in these links: gpgpu.org and BrookGPU

I have some PDFs but they are like 60 MBs large. How can I get these to ya?


Salman Pushdie