Linux kernel security needs an overhaul

The Linux kernel today faces an unprecedented safety crisis. Much like when Ralph Nader famously told the American public that their cars were “unsafe at any speed” back in 1965, numerous security developers told the 2016 Linux Security Summit in Toronto that the operating system needs a total rethink to keep it fit for purpose.

No longer the niche concern of years past, Linux today underpins the server farms that run the cloud, more than a billion Android phones, and the coming tsunami of grossly insecure devices that will be hitched to the Internet of Things. Today’s world runs on Linux, and the security of its kernel is a single point of failure that will affect the safety and well-being of almost every human being on the planet in one way or another.

“Cars were designed to run but not to fail,” Kees Cook, head of the Linux Kernel Self Protection Project, and a Google employee working on the future of IoT security, said at the summit. “Very comfortable while you’re going down the road, but as soon as you crashed, everybody died.”

“That’s not acceptable anymore,” he added, “and in a similar fashion the Linux kernel needs to deal with attacks in a manner where it actually is expecting them and actually handles gracefully in some fashion the fact that it’s being attacked.”

Jeffrey Vander Stoep, a software engineer on the Android security team at Google, echoed Cook’s message: “This kind of hearkens back to last year’s keynote speech when [Konstantin “Kai” Ryabitsev] compared computer safety with the car industry years ago. We need more and we need better safety features, and with it in mind this may cause inconvenience for developers, we still need them.”

For his part, Kai, a senior systems administrator at the Linux Foundation who was unable to attend this year’s summit, is pleased that the car safety analogy is gaining traction.

“We approach security today as though we are still living in the world of the 1990s and 2000s, computers in a data centre managed by knowledgeable people,” he told Ars. But, he pointed out, most computers today—laptops, smartphones, IoT devices—are not managed and secured by IT professionals.

“For the cases where computers are not well protected in the hands of end-users who are not IT professionals, and who do not have any recourse to IT professional help, we need to design systems that proactively protect them,” he said. “We have to change the way we approach this dramatically, the same way the vehicle manufacturers in the 1970s did.”

This is, however, easier said than done.

Killing bug classes, not political dissidents

The clear consensus at the Linux Security Summit was that squashing bugs is a losing strategy. Many deployed devices running Linux will never receive security updates, and patching a security hole in the upstream kernel does nothing to ensure the safety of an IoT device that could be in use for a decade and may forever be ignored by the manufacturer.

Even devices that do receive patches may see long gaps between public bug discovery and a patch being applied. Cook gave the example of an Internet-connected door lock that an end-user might well use for 15 years or more. Such devices are likely to receive security patches only sporadically, if at all.

Worse, the lifetime of a critical security bug in the Linux kernel, from its introduction in a code commit to public discovery and the release of a patch, averages three years or more. According to Cook’s analysis, critical and high-severity security bugs in the upstream kernel have lifespans from 3.3 to 6.4 years between commit and discovery.

Red = critical severity bugs; orange = high; blue = medium; and black = low.
The X axis is the total number of security bugs; the Y axis shows the kernel version, so the height of each bar shows how long that bug was present.
Image credit Kees Cook

“The question I get a lot is ‘well isn’t this just theoretical?’” he said. “No-one’s actually finding these bugs to begin with, so there’s no window of opportunity. And that’s demonstrably false.”

Nation-state attackers are watching every commit, looking for an opening, he said, and “people are finding these bugs sometimes immediately when they’re introduced.”

He went on: “This seems to be a big thing that people for some reason just can’t accept mentally. You know, like ‘well I have no open bugs in my bug tracker, everything’s fine.’”

How, then, can the kernel proactively defend itself against bugs that have not yet been reported—or even implemented?

The answer, said Cook, could be a matter of life and death for some people: “If you’re a dissident, an activist somewhere in the world, and you’re getting spied on, your life is literally at risk because of these bugs. As we move forward, we need devices that can protect themselves.”

A closer look at the lifespan of critical- and high-severity security bugs in the upstream Linux kernel.
X axis is the number of security bugs; Y axis shows the kernel versions in which each security bug was present.
Image credit Kees Cook

Protecting a world in which critical infrastructure runs Linux—not to mention protecting journalists and political dissidents—begins with protecting the kernel. The way to do that is to focus on squashing entire classes of bugs, so that a single undiscovered bug would not be exploitable, even on a future device running an ancient kernel.

Further, since successful attacks today often require chaining multiple exploits together, finding ways to break the exploit chain is a critical goal.

Kernel drivers suck

However, that’s hard to do when the vast majority of kernel bugs come from vendor drivers, not the upstream Linux kernel, Vander Stoep said.

“Android does in fact inherit bugs from the upstream kernel,” he said, “but our data shows that most of Android’s kernel security vulnerabilities live in device drivers.”

A slide from Vander Stoep’s presentation at the Security Summit.
Image credit Jeffrey Vander Stoep


And, he explained, many more are introduced by manufacturers, meaning that securing the Linux kernel against bugs in code over which upstream has no control becomes the challenge.

“[Kernel] maintainers say ‘bugs you didn’t inherit from upstream are not upstream’s problem,’ but I think the reality is that this is what most Linux systems look like, and it’s not limited to Android devices,” he said. “Kernel defence will protect both code that comes from upstream as well as out-of-tree vulnerabilities. That’s a really important point.”

He was quick to add that he was not calling out any particular vendor for poor security practices. As he put it, to audience chuckles, “they’re really all doing poorly.”

The bug stops here

While the technical challenges the Linux kernel faces in protecting itself against zero-days are “incredibly complex,” Cook said that the politics of submitting patches upstream can be even more challenging.

Coming across as a consummate diplomat, both in his talk and in person, he gently chided the buck-passing over how kernel security issues are discovered, fixed, and deployed.

“I hear a lot of blame-shifting of where this problem needs to be solved,” he told the audience. “Even if upstream says ‘oh sure we found that bug, we fixed it,’ what kernel version was it fixed in? Did it end up in a stable release? Did a vendor backport it? Did the carrier for the phone take that update from the vendor and push it onto phones?”

He went on: “The idea is to build in the protection technologies from the start, so that when a bug comes along, we don’t really care.”

But these mitigations come with trade-offs in performance or maintainability, and, he hinted, convincing upstream kernel maintainers to accept them is a continuous struggle.

“Understanding that developing against upstream means you’re not writing code for the kernel, you’re writing code for the kernel developers,” he said.

Not just the developers, either. It’s for everyone in the Internet-connected world we now live in.

“If we are to start being mindful of this new era of computing,” said Kai, “we have to change the way we approach this dramatically, the same way that vehicle manufacturers in the 1970s did.”

Source: Ars Technica