
> I've gained a huge knowledge of low level programming

This is really interesting. I'm at least passingly aware of high-speed/high-frequency trading, but don't know the topic in any depth.

How low is low in this field? I'm picturing RTOSes running AVX512-heavy hand-optimized code, FPGA farms, custom network ASICs... how overly optimistic am I being here? Heh

Of course, such a vision is very lopsided, since HFT depends heavily on high-level intelligence. So perhaps it's realtime(ish) Linux and lots of GPUs.



Not realtime, because that only enforces 'precision', not low latency per se.

When I was working in this field, 2008-2011, there were guys doing FPGAs, custom TCP/IP stacks, custom network drivers, and dedicated networks and network cards for exchange data coming in and orders going out. Mostly Linux.

Hardware and low-level fun.

Although the fastest trades were always done by this one Catalan guy using Windows and .NET. I kid you not.

Good times. Soulless. But good.


How does realtime enforce precision and not latency? I was referring to hard realtime.

And wow, so I wasn't too far off the mark. FPGAs and exotic networking. Huh.

I remember reading a story about a trading floor running on SQL Server, which was doing continuous throughput of 6000 queries/second. I didn't know enough at the time to discern what percentage of that was writes, but I think the point may have been that it was all of it. This was quite a few years ago. So perhaps Windows isn't actually the slowe{st,r} system out there for certain tasks.


As I've always understood it (but I'm no RTOS expert), an RTOS does not guarantee LOWER latency. It guarantees A latency.

But again: not an RTOS expert. We had a lab that would constantly test configurations of hardware and software, and I remember them finding that an RTOS didn't help.


That's right, real-time does not mean real-fast. In a hard real-time system, there is a deterministic worst-case bound for response times. "Real fast" CPUs, like the latest and greatest Intel CPUs, are actually pretty difficult to get deterministic bounds on. There are factors like unpreventable SMI events, possibility of L1/L2/L3 cache misses, etc. Often systems that need to be really deterministic, like say an engine controller in a car, run on simple CPUs like the Cortex-R series from ARM.
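To make the "real-time is not real-fast" distinction concrete, here's a minimal sketch. The numbers are made up for illustration, and `latency_profile` and `meets_hard_deadline` are hypothetical helper names, not anything from a real RTOS toolkit. The point is only that a hard deadline is judged on the worst case, not on the mean.

```python
import statistics

def latency_profile(samples_us):
    """Summarise response times (microseconds): mean, 99th percentile, worst case."""
    ordered = sorted(samples_us)
    # Nearest-rank 99th percentile: rank = ceil(0.99 * n), using integer math.
    rank = max(1, -(-99 * len(ordered) // 100))
    return {
        "mean": statistics.fmean(ordered),
        "p99": ordered[rank - 1],
        "worst": ordered[-1],
    }

def meets_hard_deadline(samples_us, deadline_us):
    """Hard real-time only cares about the worst observed case, never the average."""
    return max(samples_us) <= deadline_us

# Made-up samples, for illustration only.
# "Fast" box: usually 5 us, with one 400 us stall (think SMI or a bad cache miss).
fast_cpu = [5] * 99 + [400]
# "Simple" box: slower, but always between 40 and 50 us.
simple_cpu = [40, 45, 50] * 33 + [50]

fast = latency_profile(fast_cpu)
simple = latency_profile(simple_cpu)
```

With a 100 us deadline, the "fast" box has by far the better mean and still fails, while the "simple" box passes. Note that measured samples can only ever disprove a bound; an actual hard real-time guarantee needs analysis of the worst case, not just testing.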


> "Real fast" CPUs, like the latest and greatest Intel CPUs, are actually pretty difficult to get deterministic bounds on. There are factors like unpreventable SMI events, possibility of L1/L2/L3 cache misses, etc.

Oh yeah. I remember reading something along the same lines about x86 a while back. I guess it didn't really go in properly, heh. Thanks

I'm reminded of the "x86 is high level" thing: https://news.ycombinator.com/item?id=9264195

Also, I think the iPhone 6's NVMe apparently uses a Cortex-R: https://ramtin-amin.fr/#nvmepcie


I'd say AVX512 is maybe not so great, because it can heat up your CPU to the point where it slows down the clock. AVX2 is probably required. But above all, test. Have a bunch of compilers, read about all the options, and see what is fastest.
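The "above all, test" part can be as simple as a harness that times every candidate and keeps the winner. A rough sketch follows, with Python callables standing in for what would really be builds from different compilers and flag sets. `pick_fastest` and both candidates are hypothetical names for illustration.

```python
import timeit

def manual_sum(values):
    """Deliberately naive loop, used as one of the candidates."""
    total = 0
    for v in values:
        total += v
    return total

def pick_fastest(candidates, repeat=5, number=200):
    """Time each candidate; return (winner_name, {name: best_seconds})."""
    results = {}
    for name, fn in candidates.items():
        # Take the minimum of several runs: the slower runs are mostly
        # noise from whatever else the machine was doing at the time.
        results[name] = min(timeit.repeat(fn, repeat=repeat, number=number))
    winner = min(results, key=results.get)
    return winner, results

data = list(range(1000))

# Stand-ins for "same code, different compiler or options".
candidates = {
    "builtin_sum": lambda: sum(data),
    "manual_loop": lambda: manual_sum(data),
}

winner, timings = pick_fastest(candidates)
```

Which candidate wins depends on the machine, which is rather the point: measure on the hardware you will actually run on.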

FPGA feed handlers are common, but now that can also be rented.

Whether you're using GPUs depends on what you're up to. A lot of the strategy testing requires a bunch of computing power but not speed. You then take your conclusions and implement something fast that doesn't necessarily use the GPU.

Realtime, but soft real time. It's not like a vehicle ABS system where you have to brake within x milliseconds or someone gets killed. I've seen places where they see the degradation over time and eventually decide it's time for the newest hardware, again.
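One way to picture "seeing the degradation over time" is tracking how often you miss a soft deadline. This is only a sketch with invented numbers; `miss_rate`, `first_period_over`, the 10 us deadline and the 25% tolerance are all assumptions for illustration, not how any particular shop does it.

```python
def miss_rate(latencies_us, deadline_us):
    """Fraction of responses that arrived after the soft deadline."""
    if not latencies_us:
        return 0.0
    late = sum(1 for x in latencies_us if x > deadline_us)
    return late / len(latencies_us)

def first_period_over(rates_by_period, tolerated_rate):
    """Index of the first period whose miss rate exceeds the tolerance, else None."""
    for i, rate in enumerate(rates_by_period):
        if rate > tolerated_rate:
            return i
    return None

# Hypothetical samples per quarter on the same hardware as load grows.
quarters = [
    [8, 9, 9, 10],
    [9, 10, 11, 12],
    [11, 12, 13, 15],
]

rates = [miss_rate(q, deadline_us=10) for q in quarters]
upgrade_at = first_period_over(rates, tolerated_rate=0.25)
```

In a soft real-time setup a late response just loses value, so a nonzero miss rate is tolerated until it crosses whatever threshold makes new hardware worth buying.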


Ah, I see. That reminds me of https://stackoverflow.com/questions/8389648/ (7 years ago, just normal AVX).

I also just found http://redd.it/8dhp7q asking about AVX512 slowdowns too.

TIL about FPGA feed handlers. (http://redd.it/56tw4n, one of the first hits for the term, was mildly interesting)

Hmm, good point about not needing speed. Yeah, 24 execution units each capable of 3 billion ops/sec is probably more performance than is needed :)

Interestingly, I would have imagined HFT as needing ABS-style hard realtime. But no, it neither needs that nor is simple enough to be encapsulated by that sort of embedded-style approach.



