On Monday, Linux kernel creator Linus Torvalds vented his frustration about the lack of error-correcting code (ECC) RAM in consumer PCs and laptops.
… the misguided policy of “consumers don’t need ECC” [made] the ECC memory market disappear.
The arguments against ECC have always been completely frivolous. Now even memory manufacturers are starting to do ECC internally because they have finally realized that they absolutely have to.
If you are not familiar with ECC RAM, it is probably because you haven’t built or specified dedicated servers with server-grade CPUs and motherboards, which, unfortunately, is about the only place you actually find ECC. In short, ECC RAM includes a small amount of additional memory used for error detection and correction.
Memory errors and probabilities
In most modern implementations, this means that for every 64-bit word stored in RAM, there are eight check bits. A single-bit error (a 0 flipped to 1, or a 1 flipped to 0) can be detected and corrected automatically. Two flipped bits in the same word can be detected but cannot be corrected. Three or more flipped bits in the same word will likely be detected, but detection is not guaranteed.
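To make the 64-plus-8 arrangement concrete, here is a minimal sketch of one classic way to get that behavior: an extended Hamming code with seven position-based check bits plus one overall parity bit. The bit layout and the names `encode`, `decode`, and `DATA_POSITIONS` are assumptions chosen for illustration; real memory controllers may use different codes and layouts.

```python
# Sketch of a (72,64) single-error-correcting, double-error-detecting code.
# Bit i of the integer codeword is "position i". Position 0 holds an overall
# parity bit; positions 1, 2, 4, ..., 64 hold the seven Hamming check bits.
# This layout is illustrative, not what any particular DIMM or CPU uses.

CODE_BITS = 72
# Data bits live at every position from 1 to 71 that is not a power of two.
DATA_POSITIONS = [p for p in range(1, CODE_BITS) if p & (p - 1)]


def _syndrome(code: int) -> int:
    """XOR together the positions (1..71) of all set bits."""
    s = 0
    for p in range(1, CODE_BITS):
        if (code >> p) & 1:
            s ^= p
    return s


def encode(data: int) -> int:
    """Encode a 64-bit word into a 72-bit codeword."""
    code = 0
    for i, p in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            code |= 1 << p
    # Set check bits so the syndrome of a valid codeword is zero.
    s = _syndrome(code)
    for k in range(7):
        if (s >> k) & 1:
            code |= 1 << (1 << k)
    # Overall parity bit makes the total number of set bits even.
    if bin(code).count("1") % 2:
        code |= 1
    return code


def decode(code: int):
    """Return (status, data); status is 'ok', 'corrected', or 'uncorrectable'."""
    s = _syndrome(code)
    odd = bin(code).count("1") % 2 == 1
    if odd:
        # Odd parity: assume one flipped bit; the syndrome is its position.
        if s >= CODE_BITS:
            return "uncorrectable", None  # points outside the word: 3+ flips
        code ^= 1 << s
        status = "corrected"
    elif s:
        # Even parity but nonzero syndrome: two bits flipped, cannot be fixed.
        return "uncorrectable", None
    else:
        status = "ok"
    data = 0
    for i, p in enumerate(DATA_POSITIONS):
        if (code >> p) & 1:
            data |= 1 << i
    return status, data


word = 0xDEADBEEFCAFEF00D
cw = encode(word)
print(decode(cw)[0])                          # no flips
print(decode(cw ^ (1 << 17))[0])              # one flipped bit
print(decode(cw ^ (1 << 17) ^ (1 << 40))[0])  # two flipped bits
```

In this sketch, three flipped bits look like a single-bit error unless the syndrome happens to point outside the 72-bit word, so the decoder may "correct" the wrong bit. That is one illustration of why detection of three or more flips is not guaranteed.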
Bit flips can occur for a variety of reasons, ranging from cosmic ray impacts to minor hardware failure. A wide-ranging study of Google’s servers found that nearly 32 percent of all servers (and 8 percent of all dual in-line memory modules, or DIMMs) in Google’s fleet experience at least one memory error annually. But the vast majority of these errors are single-bit errors, and since Google uses server CPUs and ECC RAM, the hardware in question keeps on trucking.
In consumer devices, even these single-bit errors, which are more than 40 times more likely than multi-bit errors according to Google’s data, go undetected and can lead to system instability and data corruption.
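As a rough back-of-the-envelope illustration, the figures quoted above can be turned into a multi-year estimate. The sketch below assumes that the 8 percent per-DIMM annual rate from Google's server fleet carries over to another machine and that every DIMM-year is independent. Neither assumption is established by the study, so treat the result as an order-of-magnitude guide only. The function name and the two-DIMM, five-year scenario are hypothetical.

```python
def p_at_least_one_error(annual_rate: float, dimms: int, years: int) -> float:
    """Chance that at least one of `dimms` modules sees an error within `years`,
    treating every DIMM-year as an independent trial (a simplifying assumption)."""
    return 1.0 - (1.0 - annual_rate) ** (dimms * years)


ANNUAL_DIMM_ERROR_RATE = 0.08  # per-DIMM figure quoted from the Google study
SINGLE_TO_MULTI_RATIO = 40     # "more than 40 times", used here as a lower bound

# Hypothetical machine: two DIMMs kept in service for five years.
p = p_at_least_one_error(ANNUAL_DIMM_ERROR_RATE, dimms=2, years=5)

# If single-bit errors are at least 40 times as common as multi-bit errors,
# they make up at least 40/41 of all errors.
single_share = SINGLE_TO_MULTI_RATIO / (SINGLE_TO_MULTI_RATIO + 1)

print(f"P(at least one memory error): {p:.1%}")
print(f"Share of errors that are single-bit: at least {single_share:.1%}")
```

Under those assumptions, the two-DIMM machine is more likely than not to see at least one memory error over five years, and nearly all such errors would be of the single-bit kind that ECC corrects silently.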
Bit flips are not always accidental
Not every RAM error is caused by hardware failure or an unintended EMF issue. In recent years, researchers have developed practical, physics-based side-channel attacks that use rapid, controlled bit flips in areas of RAM an application can access to infer or modify data values in adjacent areas of RAM that it should not be able to reach.
Although ECC RAM cannot mitigate Spectre-style attacks that infer adjacent memory values, it can generally stop Rowhammer attacks, in which rapidly flipping bits in one area of RAM causes bits to change in an adjacent area.
Even when ECC cannot effectively prevent a Rowhammer attack from affecting the system (for example, when the attack flips several bits in a single word), it can at least alert the system to the problem, and in most cases it prevents the attack from doing anything other than causing downtime. (Most ECC systems are configured to halt the entire machine if an uncorrectable error is detected.)
Torvalds blames Intel
Memory manufacturers claim that it’s due to economics and lower power. And they’re lying bastards. Let me once again point to Rowhammer, and how these problems have existed for several generations already, but these people happily sold broken hardware to consumers and claimed it was an “attack,” when it was always “we’re cutting corners.”
How many times has a Rowhammer-like bit flip occurred purely through bad luck on real, non-attack loads? We will never know. Because Intel was pushing shit to consumers.
Torvalds takes the bold position that the lack of ECC RAM in consumer technology is Intel’s fault, a result of the company’s policy of artificial market segmentation. Intel has a vested interest in pushing deep-pocketed businesses toward its more expensive, and more profitable, server-grade CPUs rather than allowing them to use necessarily lower-margin consumer parts.
Removing ECC RAM support from CPUs that are not aimed directly at the server world is one of the ways Intel has kept those markets sharply segmented. Torvalds’ argument here is that Intel’s refusal to support ECC RAM in its consumer-targeted parts, along with its de facto monopoly in that space, is the real reason why ECC is nearly unavailable outside of the server space.
The usual argument for why there is no ECC in consumer technology is about cost, but we suspect Torvalds has the right of it here. Although ECC RAM is currently a hard-to-find specialty part, it only costs about 20 percent more per DIMM than ordinary non-ECC memory. The real problem is that without motherboards and CPUs that support it, it won’t do you any good.