As far as I understand, the solution to this in sandboxes such as the JS world is simply to deny everyone timers with a resolution that could reveal cache misses. How much does software really rely on timers with this resolution? What would it mean if CPU manufacturers simply gave up and said: "To mitigate side channels, you can't have a clock so accurate that it lets you measure whether X has happened, because knowing that is equivalent to reading any memory"?
Or, instead of detecting various things and flushing out sensitive data on some context switch, the CPU just adds noise to the timers instead? I'm guessing this is a complete no-go, but I'm wondering why it is?
Adding noise just makes side channel attacks slower, it doesn't stop them; there are statistical techniques to extract the original signal from the signal plus noise, given enough samples.
For a simple example, imagine you want to distinguish a 1ms difference in the execution time of some operation. Without noise, you just have to time it; now let's randomly add either nothing or 1ms to the operation time, so the "fast" operation will take either +0ms or +1ms, and the "slow" operation will take either +1ms or +2ms. But if you repeat the same operation several times and average the execution times, the "fast" operation will take an average of +0.5ms, and the "slow" operation will take an average of +1.5ms. As you can see, in this simple example the random noise averages out to a constant 0.5ms offset, and the original 1ms signal is still visible on top of it.
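The averaging argument above can be checked with a small simulation. This is only a sketch of that toy model: the function names, the sample count, and the fixed seed are assumptions made for illustration, and no real timing is involved.

```python
import random
import statistics

def noisy_timing(true_ms: float, rng: random.Random) -> float:
    """Return the true duration plus either 0 ms or 1 ms of injected noise."""
    return true_ms + rng.choice([0.0, 1.0])

def estimate(true_ms: float, samples: int, rng: random.Random) -> float:
    """Average many noisy measurements of the same operation."""
    return statistics.fmean(noisy_timing(true_ms, rng) for _ in range(samples))

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the run is repeatable
    fast = estimate(0.0, 10_000, rng)  # "fast" operation, +0 ms baseline
    slow = estimate(1.0, 10_000, rng)  # "slow" operation, +1 ms baseline
    print(f"fast ~ {fast:.2f} ms, slow ~ {slow:.2f} ms, gap ~ {slow - fast:.2f} ms")
```

With enough samples the two averages settle near 0.5 ms and 1.5 ms, so the 1 ms gap survives the noise.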
Yes, it would be extremely difficult to make it so slow that it's not a viable attack vector. As long as the noise has a known distribution, it's not random enough, because averaging the measurements recovers the unknown value (plus the known mean of the noise). I remember too little statistics to know whether it's possible to add randomness such that the mean of the measurements does not reveal the underlying value. It does sound impossible, at least.
Meaning no matter how many samples you take, the mean (of the samples) is just as variable as an individual sample.
The mean and variance of the distribution (equivalently, of infinite samples) are undefined. The Cauchy is equivalent to a t distribution with 1 degree of freedom.
Infinite variability will be undesirable for lots of reasons though.
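The Cauchy claim above (the sample mean is as variable as a single sample) can be illustrated numerically. The sketch below is an illustration only: the helper names, trial counts, and the choice of interquartile range as the spread measure are assumptions, chosen because the variance of a Cauchy distribution is undefined.

```python
import math
import random
import statistics

def cauchy_sample(rng: random.Random) -> float:
    """Draw from a standard Cauchy distribution via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr(values) -> float:
    """Interquartile range: a spread measure that exists even for Cauchy data."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def spread_of_means(samples_per_mean: int, trials: int, rng: random.Random) -> float:
    """IQR of `trials` sample means, each taken over `samples_per_mean` draws."""
    means = [
        statistics.fmean(cauchy_sample(rng) for _ in range(samples_per_mean))
        for _ in range(trials)
    ]
    return iqr(means)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the run is repeatable
    for n in (1, 10, 100):
        print(f"mean of {n:>3} samples: IQR ~ {spread_of_means(n, 2000, rng):.2f}")
```

For well-behaved noise the spread would shrink like 1/sqrt(n); here it should stay close to 2, the interquartile range of a single standard Cauchy draw, however many samples go into each mean.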
The browser makers also had to disable mutable JS shared memory arrays (SharedArrayBuffer) until other mitigations were in place. Having a single thread that continuously increments a shared value serves as a good enough approximation of the CPU clock for these exploits.
I suppose then "adding timing noise" here would also require making sure instructions don't have fixed and dependable execution times, because then you can just manufacture a clock by incrementing a number and knowing how many clock cycles the increment is. So an increment cannot be a known number of cycles. It does sound messy.
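The "manufactured clock" idea can be sketched as a background thread that does nothing but increment a counter, which another thread samples before and after an operation. This is an illustration under loud assumptions: `CounterClock` and `ticks_elapsed` are hypothetical names, and CPython's interpreter lock makes this far coarser than the shared-memory browser version, so it only shows the principle.

```python
import threading
import time

class CounterClock:
    """A makeshift clock: a background thread that just increments a number."""

    def __init__(self):
        self.ticks = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.ticks += 1  # only this thread writes; readers just sample it

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()

def ticks_elapsed(clock: CounterClock, operation) -> int:
    """Measure an operation in counter ticks, without reading any real timer."""
    start = clock.ticks
    operation()
    return clock.ticks - start

if __name__ == "__main__":
    with CounterClock() as clock:
        # time.sleep stands in for "some operation" of differing duration.
        short = ticks_elapsed(clock, lambda: time.sleep(0.02))
        long = ticks_elapsed(clock, lambda: time.sleep(0.2))
    print(f"short op: {short} ticks, long op: {long} ticks")
```

The measuring code never asks the system for the time, yet the longer operation shows up as a larger tick count, which is why removing the timer API alone is not enough.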
Adding random delay makes timing attacks more costly but not impossible. Any random noise can simply be filtered out by performing the attack multiple times and averaging the measurements. This even works over the network with milliseconds of random delays.
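To put a rough number on "more costly": assume the injected noise is independent between measurements and has a finite standard deviation σ (the symbols σ, Δ, n and z are introduced here for illustration). Comparing two averages of n measurements each, the standard error of their difference is σ·sqrt(2/n), so resolving a true timing gap Δ at z standard errors needs roughly:

```latex
\sigma \sqrt{\frac{2}{n}} \le \frac{\Delta}{z}
\quad\Longrightarrow\quad
n \ge 2 \left( \frac{z\,\sigma}{\Delta} \right)^{2}
```

So making the noise 10 times larger makes the attack about 100 times slower, but the required sample count stays finite as long as σ is finite.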
That's true, of course. So basically adding timing noise is equivalent to adding artificial slowdowns. The only upside, I suppose, is that it might address all timing side-channel attacks in one go. So it's not 3% for one and 4% for the next and so on. It's a one-time cost to disable timing as an attack vector.
When you talk about timers and resolution, what do you actually mean? When I hear timer, I think about setTimeout, when I hear resolution, I'm thinking about screen resolutions.
Is that what you mean, or are you referring to other things?
A timer simply reports the current time. In this context, people are using timers to calculate how long an operation takes.
Resolution is the precision with which the timer reports the time. For example, it could report whole seconds elapsed (e.g. 8 seconds), it could report milliseconds (8.432 seconds), it could report microseconds (8.432389 seconds), etc.
Attackers want high resolution timers so that they can distinguish cache hits from misses (8.432389 seconds vs 8.432367 seconds, for example).
In this case, a higher resolution timer might be accurate to microseconds, while a lower resolution might be accurate down to milliseconds. Resolution here is about how low you can drill down before it becomes useless.
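To make the resolution point concrete, the sketch below rounds the two example readings from above down to different timer resolutions. The `quantize` helper is a hypothetical stand-in for what a coarsened timer does; timestamps are held as integer nanoseconds to avoid floating-point rounding.

```python
def quantize(timestamp_ns: int, resolution_ns: int) -> int:
    """Round a timestamp down to the nearest multiple of the timer resolution."""
    return timestamp_ns - (timestamp_ns % resolution_ns)

# The two readings from the example above, in nanoseconds.
miss_ns = 8_432_389_000  # 8.432389 s
hit_ns = 8_432_367_000   # 8.432367 s

for name, resolution_ns in [("1 s", 1_000_000_000), ("1 ms", 1_000_000), ("1 us", 1_000)]:
    a = quantize(miss_ns, resolution_ns)
    b = quantize(hit_ns, resolution_ns)
    print(f"{name:>5}: {a} vs {b} -> distinguishable: {a != b}")
```

At second or millisecond resolution both readings collapse to the same value, and only the microsecond timer still tells them apart.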