I understand how BPF works for transparently steering TCP connections. But the a...

tgraf · on Dec 9, 2021

The model I'm describing contains two pieces: 1) Moving away from sidecars to per-node proxies that can be better integrated into the Linux kernel concept of namespacing instead of artificially injecting them with complicated iptables redirection logic at the network level. 2) Providing the HTTP awareness directly with eBPF using eBPF-based protocol parsers. The parser itself is written in eBPF which has a ton of security benefits because it runs in a sandboxed environment.

We are doing both. Aspect 2) is currently done for HTTP visibility and we will be working on connection splicing and HTTP header mutation going forward.

tptacek · on Dec 9, 2021

What does an HTTP parser written in BPF look like? Bounded loops only --- meaning no string libraries --- seems like a hell of a constraint there.
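To make the constraint concrete, here is a minimal sketch of what matching a request method tends to look like when no string library is available. It is plain userspace C that only imitates the XDP habit of checking every access against a `data_end` pointer; the names `classify_method` and `enum http_method` are made up for illustration and are not taken from any real parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
enum http_method { METHOD_UNKNOWN = 0, METHOD_GET, METHOD_POST, METHOD_PUT };

/* Classify the request method from the first bytes of a payload.
 * Imitates the BPF/XDP idiom: each group of reads is preceded by an
 * explicit check against data_end, and no libc string helpers are used. */
static enum http_method classify_method(const uint8_t *data,
                                        const uint8_t *data_end)
{
    /* Need at least 4 bytes ("GET " or "PUT ") before reading data[0..3]. */
    if (data + 4 > data_end)
        return METHOD_UNKNOWN;
    if (data[0] == 'G' && data[1] == 'E' && data[2] == 'T' && data[3] == ' ')
        return METHOD_GET;
    if (data[0] == 'P' && data[1] == 'U' && data[2] == 'T' && data[3] == ' ')
        return METHOD_PUT;

    /* "POST " is one byte longer, so the bound is checked again. */
    if (data + 5 > data_end)
        return METHOD_UNKNOWN;
    if (data[0] == 'P' && data[1] == 'O' && data[2] == 'S' &&
        data[3] == 'T' && data[4] == ' ')
        return METHOD_POST;

    return METHOD_UNKNOWN;
}
```

The byte-by-byte comparisons replace what would be a `strncmp` call in ordinary C, and the repeated bounds checks are the kind of thing a verifier would insist on before each wider read.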

star-trek-fleet · on Dec 10, 2021

Bounded loops plus the 1M instruction limit in the 5.4 kernel (I don't have a record at hand of the exact version) give a large range of supported headers. Also note that this BPF code runs at the network level, which is subject to the MTU limit as well; that is usually 1500 bytes but can now be tens of KB (65,535 bytes maximum in theory according to https://www.lifewire.com/definition-of-mtu-817948, but my networking knowledge is poor). Together these make it possible to effectively handle all possible headers.

HTTP is actually fine.

HTTP/2 will be a bigger issue, as it has HPACK and Huffman coding, which would be very complicated to maintain inside the BPF runtime. I haven't thought about it closely yet. But based on our experience at http://px.dev, I am not aware of any glaring technical obstacles.

tptacek · on Dec 10, 2021

This is interesting and all, but I've also written bounded loop BPF code on 5.6 kernels, and it is not easy to get the verifier to accept seemingly obvious loops. I'm not saying it's impossible, I'm saying I'd like to see what this code actually looks like. I'd be a little shocked if it just looked exactly like Node's HTTP parser.

star-trek-fleet · on Dec 10, 2021

I need to double-check when the bounded loop patch got into the kernel; I suppose it's 5.6, as you mentioned above.

What I actually was thinking is that one can write C code and ask the compiler to unroll it.

```
#pragma unroll
for (int i = 0; i < 100; ++i) {
    /* parsing code */
}
```
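A slightly fuller sketch of that idea, as plain userspace C rather than a loadable BPF program: a scan for the end of the request line with a compile-time iteration cap and a bounds check on every read. `MAX_SCAN` and `find_cr` are hypothetical names, the cap of 100 simply mirrors the snippet above, and the pragma is guarded because `#pragma unroll` is a Clang spelling.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SCAN 100 /* hypothetical cap on bytes inspected */

/* Return the offset of the first '\r' within the first MAX_SCAN bytes
 * of [data, data_end), or -1 if none is found in that window. */
static int find_cr(const uint8_t *data, const uint8_t *data_end)
{
    int i;

#if defined(__clang__)
#pragma unroll
#endif
    for (i = 0; i < MAX_SCAN; ++i) {
        /* Bounds check before every read, as a verifier would require. */
        if (data + i + 1 > data_end)
            return -1;
        if (data[i] == '\r')
            return i;
    }
    return -1; /* gave up: the line is longer than the cap */
}
```

The fixed trip count is what makes unrolling possible at all, and it is also why anything past the cap is simply not parsed.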

Also, the other comment notes the state bookkeeping needed for HTTP to maintain parser state when parsing spans multiple packets, assuming we are talking about XDP probes here.

One quick idea is to use BPF_TABLE(, uint128_t, some data structure). I haven't tested whether uint128_t is OK as a key type, and the data structure in the value needs more thought. Roughly, I am thinking of turning any state bookkeeping into BPF tables, keyed by whatever data matches the context. This probably means a uint128_t IPv4/6 address as the key, with a nested map keyed by port, or a combined IPv4 address and port.
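As a rough illustration of the keying idea, the sketch below packs a 16-byte address and a port into one flat key instead of nesting maps, and stores per-connection parser state under it. It is ordinary userspace C with a tiny linear-probing table standing in for a BPF hash map; every name here (`conn_key`, `conn_state`, `make_key_v4`, `state_lookup_or_init`) and the state fields are assumptions for illustration, not anything from BCC or a real product.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat key: 16-byte address plus port, 20 bytes total. */
struct conn_key {
    uint8_t  addr[16]; /* IPv6, or IPv4 stored as ::ffff:a.b.c.d */
    uint16_t port;
    uint16_t pad;      /* kept zero so byte-wise comparison is stable */
};

/* Hypothetical per-connection value. */
struct conn_state {
    uint32_t parser_state;
    uint32_t bytes_seen;
};

#define TABLE_SLOTS 64
struct slot { int used; struct conn_key key; struct conn_state val; };
static struct slot table[TABLE_SLOTS]; /* stand-in for a BPF hash map */

static struct conn_key make_key_v4(const uint8_t ip[4], uint16_t port)
{
    struct conn_key k;
    memset(&k, 0, sizeof(k));
    k.addr[10] = 0xff;           /* IPv4-mapped IPv6 prefix */
    k.addr[11] = 0xff;
    memcpy(&k.addr[12], ip, 4);
    k.port = port;
    return k;
}

static uint32_t hash_key(const struct conn_key *k)
{
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;    /* FNV-1a over the key bytes */
    for (size_t i = 0; i < sizeof(*k); ++i)
        h = (h ^ p[i]) * 16777619u;
    return h;
}

/* Find the state for a key, creating a zeroed entry on first use.
 * Returns NULL when the table is full. */
static struct conn_state *state_lookup_or_init(const struct conn_key *k)
{
    uint32_t h = hash_key(k);
    for (uint32_t i = 0; i < TABLE_SLOTS; ++i) {
        struct slot *s = &table[(h + i) % TABLE_SLOTS];
        if (!s->used) {
            s->used = 1;
            s->key = *k;
            memset(&s->val, 0, sizeof(s->val));
            return &s->val;
        }
        if (memcmp(&s->key, k, sizeof(*k)) == 0)
            return &s->val;
    }
    return NULL;
}
```

In an actual BPF program the table would presumably be a hash map accessed through `bpf_map_lookup_elem` and `bpf_map_update_elem`; the flat struct key avoids needing a second, nested map per address.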

It'll be interesting. I suppose the code from Isovalent will eventually be open sourced. Or is it already so? Haven't checked yet.

tptacek · on Dec 10, 2021

Bounded loops are 5.3. I'm just saying that after like 9 months of development following their introduction, it remained tricky to get the verifier to accept loops with seemingly obvious bounds. I know the feature works (I did ultimately get some loops working!) but I could not have straightforwardly ported userland C code to do it.

You've always been able to unroll loops, but of course you're chewing up code space doing that.

I don't know what BPF_TABLE is (I think it's a BCC-ism?) but BPF hash maps can take 16 byte keys. But notice that you're now writing something that looks nothing at all like Node's HTTP parser.

I'm not doubting that they did this work. I just want to know what it ends up looking like!

star-trek-fleet · on Dec 10, 2021

Oh nice, we haven't tried bounded loops, because our product is committed to supporting kernels as old as 4.13.

BPF_TABLE is BCC.

Matthias247 · on Dec 10, 2021

Another challenge I can see is where to actually store the state of a connection. Even if we just focus on HTTP/1.1, not all headers will be received at once, and data from previous segments needs to be carried forward. Would it be eBPF maps? Those also seem rather limited for this use case, and are probably also not extremely fast.

I can imagine getting something to work for http/1.1 - but http/2 with multiplexing and stateful header compression is a completely different beast.
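For the HTTP/1.1 case, the amount of state that has to be carried between segments can be quite small. The sketch below, in plain userspace C, detects the end-of-headers marker `\r\n\r\n` even when it is split across segments, keeping a single counter between calls. `hdr_scan_state`, `feed_segment`, and `MAX_SEGMENT` are hypothetical names, and in a BPF setting the struct would presumably live in a map entry per connection.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGMENT 1500 /* hypothetical per-segment scan cap */

/* State carried between segments: how many bytes of "\r\n\r\n"
 * have been matched so far (0..3). */
struct hdr_scan_state {
    uint8_t matched;
};

/* Feed one segment. Returns 1 once the full terminator has been seen
 * (and resets the state), 0 if more data is needed. */
static int feed_segment(struct hdr_scan_state *st,
                        const uint8_t *data, const uint8_t *data_end)
{
    for (int i = 0; i < MAX_SEGMENT; ++i) {
        if (data + i + 1 > data_end)
            break;
        uint8_t c = data[i];
        /* Odd positions of the pattern are '\n', even ones are '\r'. */
        uint8_t expect = (st->matched & 1) ? '\n' : '\r';
        if (c == expect) {
            st->matched++;
            if (st->matched == 4) {
                st->matched = 0;
                return 1;
            }
        } else {
            /* A stray '\r' can still start a new match. */
            st->matched = (c == '\r') ? 1 : 0;
        }
    }
    return 0;
}
```

This only finds where the headers end; carrying forward a header value that is itself split across segments needs buffering on top of this, which is where the map size limits mentioned above start to matter.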

tgraf · on Dec 9, 2021

It looks not too different from the majority of HTTP parsers out there written in C. Here is an example of NodeJS [0].

[0] https://github.com/nodejs/http-parser/blob/main/http_parser....

tptacek · on Dec 9, 2021

Node's HTTP parser doesn't have to placate the BPF verifier, is why I'm asking.