Update May 14th, 2010: Gerwin Klein, a project lead for L4.verified, has posted some insightful comments. Also it's worth reading their website here that clearly explains what assumptions they make, and what they really prove, and what they don't.

You must have heard about it before: formally verified microkernels that offer 100% security... Why don't we use such a microkernel in Qubes then? (The difference between a micro-kernel and a type I hypervisor is blurry. Especially in case of a type I hypervisor used for running para-virtualized VMs, such as Xen used in Qubes. So I would call Xen a micro-kernel in this case, although it can also run fully-virtualized VMs, in which case it should be called a hypervisor I think.)

In order to formally prove some property of any piece of code, you need to first assume certain things. One such thing is the correctness of a compiler, so that you can be sure that all the properties you proved for the source code, still hold true for the binary generated from this source code. But let's say it's a feasible assumption -- we do have mature compilers indeed.

Another important assumption you need, and this is especially important in proving kernels/microkernels/hypervisors, is the model of the hardware your kernel interacts with. Not necessarily all the hardware, but at least the CPU (e.g. MMU, mode transitions, etc) and the Chipset.

While the CPUs are rather well understood today, and their architecture (we're talking IA32 here) doesn't change so dramatically from season to season. The chipsets, however, are a whole different story. If you take a spec for any modern chipset, let's say only the MCH part, the one closer to the processor (on Core i5/i7 even integrated on the same die), there are virtually hundreds of configuration registers there. Those registers are used for all sorts of different purposes -- they configure DRAM parameters, PCIe bridges, various system memory map characteristics (e.g. the memory reclaiming feature), access to the infamous SMM memory, and finally VT-d and TXT configuration.

So, how are all those details modeled in microkernels formal verification process? Well, as far as I'm aware, they are not! They are simply ignored. The nice way of saying this in academic papers is to say that "we trust the hardware". This, however, might be incorrectly understood by readers to mean "we don't consider physical attacks". But this is not equal! And I will give a practical example in a moment.

I can bet that even the chipset manufactures (think e.g. Intel) do not have formal models for their chipsets (again, I will give a good example to support this thesis below).

But why are the chipsets so important? Perhaps they are configured "safe by default" on power on, so even if we don't model all the configuration registers, and their effects on the system, and if we won't be playing with them, maybe it's safe to assume all will be fine then?

Well, it might be that way, if we could have secure microkernels without IOMMU/VT-d and without some trusted boot mechanism.

But we need IOMMU. Without IOMMU there is no security benefit of having a microkernel vs. having a good-old monolithic kernel. Let me repeat this statement again: there is no point in building a microkernel-based system, if we don't correctly use IOMMU to sandbox all the drivers.

Now, setting up IOMMU/VT-d permissions require programming the chipset's registers, and is by no means a trivial task (see the the Intel VT-d spec to get an impression, if you don't believe me). Correctly setting up IOMMU is one of the most security-critical tasks to be done by a hypervisor/microkernel, and so it would be logical to expect that they also formally prove that this part is done flawlessly...

The next thing is the trusted boot. I will argue that without proper trusted boot implementation, the system cannot be made secure. And I'm not talking about physical attacks, like Evil Maid. I'm talking about true, remote, software attacks. If you haven't read it already, please go back and read my very recent post on "Remotely Attacking Network Cards". Building on Loic's and Yves-Alexis' recent research, I describe there a scenario how we could take their attack further to compromise even such a securely designed system as Qubes. And this could be possible, because of a flaw in TXT implementation. And, indeed, we demonstrated an attack on Intel Trusted Execution Technology that exploits one such flaw before.

Let's quickly sketch the whole attack in points:

  1. The attacker attacks a flaw in the network card processing code (Loic and Yves-Alexis)

  2. The attacker replaces the NIC's firmware in EEPROM to survive the reboot (Loic and Yves-Alexis)

  3. The new firmware attacks the system trusted boot via a flaw in Intel TXT (ITL)

    • If the system uses SRTM instead, it's even easier -- see the previous post (ITL)

    • If you have new SINIT module that patched our attack, there is still an avenue to attack TXT via SMM (ITL)

  4. The microkernel/hypervisor gets compromised with a rootkit and the attacker gets full control over the system:o

And this is the practical example I mentioned above. I'm sure readers understand that this is just one example, of what could go wrong on the hardware level (and be reachable to a software-only attacker). Don't ignore hardware security! Even for software attacks!

A good question to ask is: would a system with a formally verified microkernel also be vulnerable to such an attack? And the answer is yes! Yes, unless we could model and prove correctness of the whole chipset and the CPU. But nobody can do that today, because it is impossible to build such a model. If it was, I'm pretty sure Intel would already have such a model and they would not release an SINIT module with this stupid implementation bug we found and exploited in our attack.

So, we see an example of a practical attack that could be used to fully compromise a well designed system, even if it had a formally verified microkernel/hypervisor. Compromise it remotely, over the network!

So, are all those whole microkernel/hypervisor formal verification attempts just a waste of time? Are they only good for academics so that they could write more papers for conferences? Or for some companies to use them in marketing?

Perhaps the formal verification of system software will never be able to catch up with the pace of hardware development... By the time people will learn how to build models (and how to solve them) for hardware used today, the hardware manufactures, in the meantime, will present a few new generations of the hardware. For which the academics will need another 5 years to catch up, and so on.

Perhaps the industry will take a different approach. Perhaps in the coming years we will get hardware that would allow us to create untrusted hypervisors/kernels that would not be able to read/write usermode pages (Hey Howard;)? This is currently not possible with the hardware we have, but, hey, why would a hypervisor need access to the Firefox pages?

And how this all will affect Qubes? Well, the Qubes project is not about building a hypervisor or a microkernel. Qubes is about how to take a secure hypervisor/microkernel, and how to build the rest of the system in a secure, and easy to use, way, using the isolation properties that this hypervisor/microkernel is expected to provide. So, whatever kernels we will have in the future (better formally verified, e.g. including the hardware in the model), or based on some exciting new hardware features, still Qubes architecture would make perfect sense, I think.