The cost of high-performance GPUs, typically $8,000 or more, means they are frequently shared among dozens of users in cloud environments. Three new attacks demonstrate how a malicious user can gain full root control of a host machine by performing novel Rowhammer attacks on high-performance GPU cards made by Nvidia.
The attacks exploit memory hardware’s increasing susceptibility to bit flips, in which 0s stored in memory switch to 1s and vice versa. In 2014, researchers first demonstrated that repeated, rapid access—or “hammering”—of memory hardware known as DRAM creates electrical disturbances that flip bits. A year later, a different research team showed that by targeting specific DRAM rows storing sensitive data, an attacker could exploit the phenomenon to escalate an unprivileged user to root or evade security sandbox protections. Both attacks targeted DDR3 generations of DRAM.
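The core hammering pattern can be sketched in a few lines of C: two addresses that map to different rows of the same DRAM bank are read in a tight loop, with cache flushes forcing every read to reach DRAM. This is an illustrative sketch, not any of the published exploits; the `hammer` function and its parameters are hypothetical, and finding two addresses that actually share a bank requires reverse-engineered DRAM address mappings.

```c
#include <stdint.h>
#include <emmintrin.h>  /* _mm_clflush (x86 SSE2) */

/* Repeatedly activate two DRAM rows. The cache flushes are essential:
 * without them, the second and later reads would be served from the CPU
 * cache and never reach DRAM, so no rows would actually be hammered. */
void hammer(volatile uint8_t *row_a, volatile uint8_t *row_b, long reps) {
    for (long i = 0; i < reps; i++) {
        (void)*row_a;                      /* activate row A */
        (void)*row_b;                      /* activate row B */
        _mm_clflush((const void *)row_a);  /* evict so the next read hits DRAM */
        _mm_clflush((const void *)row_b);
    }
}
```

On a vulnerable module, bits in the rows physically adjacent to the two hammered rows can flip after enough iterations; on healthy memory the loop simply generates DRAM traffic.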
On Thursday, two research teams, working independently of each other, demonstrated attacks against two cards from Nvidia’s Ampere generation that take GPU rowhammering into new—and potentially much more consequential—territory: GDDR bit flips that give adversaries full control of CPU memory, resulting in full system compromise of the host machine. For the attack to work, IOMMU memory protection must be disabled, as it is by default in many BIOS settings.
A separate mitigation is to enable Error Correcting Codes (ECC) on the GPU, which Nvidia allows to be done from the command line.
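With Nvidia's driver, the command-line path to that mitigation is the `nvidia-smi` utility. A sketch, assuming a system with the Nvidia driver installed and administrative privileges:

```shell
# Query the current and pending ECC mode of the installed GPUs
nvidia-smi --query-gpu=ecc.mode.current,ecc.mode.pending --format=csv

# Enable ECC on GPU 0; the change takes effect after the next GPU reset or reboot
nvidia-smi -i 0 -e 1
```

Note that, as the comment below explains, enabling ECC on these cards trades away some memory capacity and performance.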
Kevin G
Ars Scholae Palatinae
Thursday at 2:54 PM
#12
The ECC functionality on Nvidia cards can take a pretty big performance hit because the cards do not include extra DRAM for ECC. On a 32 GB workstation GPU, enabling ECC reduces usable memory to 28 GB. So if you were using that extra memory and flipped on ECC, performance tanks as the remaining 4 GB gets paged out to host CPU memory. Beyond that, where the parity data for ECC resides is somewhat configurable. If it is on the same memory controller (which generally means the same memory chip, as often there is only one chip per memory channel), then the calculation is done inside the memory controller relatively quickly. This comes at the higher integrity risk of losing data if a memory chip fails, but it does protect against random bit flips. The other ECC scheme is more akin to software RAID 5, rotating where the parity data resides across the chips and across the various internal memory controllers. To compute ECC, one memory controller then has to wait for another controller to read that information and pass it along, which is a big performance penalty.
What this article doesn't cover is HBM, which can both have extra stacks of memory in a channel as well as extra bits of parity on each die in the stack. Most HBM ECC implementations leverage the extra memory on each die plus rotating where the parity data resides. The end result is effectively the same as having an extra DRAM chip on a DIMM. (For those who don't know, an 8 GB ECC DIMM will contain ten 1 GB memory chips, but the extra 2 GB is used exclusively for ECC and does not alter the usable capacity.)
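The capacity arithmetic behind those two ECC layouts is easy to check. A quick sketch using only the numbers in this comment (the function names are just for illustration):

```c
/* Inline GPU ECC carves parity out of the visible memory: with 32 GB
 * shrinking to 28 GB usable, one eighth of capacity is reserved. */
double gpu_usable_gb(double total_gb) {
    return total_gb * 7.0 / 8.0;
}

/* A DIMM with dedicated ECC chips keeps its full advertised capacity;
 * the parity lives entirely in the extra chips. */
int dimm_parity_gb(int chips, int gb_per_chip, int advertised_gb) {
    return chips * gb_per_chip - advertised_gb;
}
```

For the examples above: `gpu_usable_gb(32.0)` gives 28 GB, and `dimm_parity_gb(10, 1, 8)` gives the 2 GB of dedicated parity on the ten-chip DIMM.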
HBM controllers are rather complex, and the reason capacities like 141 GB exist is a single die failure in one of the many stacks. Instead of disabling a whole stack and reducing the memory capacity down to 120 GB, only the explicitly broken die is disabled.