But anyway, let's look at the problem again...
Obviously, Blue Pill, like any other hardware based VMM, needs to intercept some events and instructions. One intercept which we need to take care of (in case of SVM technology), is the RDMSR EFER instruction - just because the bit 12th in EFER register signalizes weather processor works in SVM mode or not. So, we need to cheat about it to the guest.
Now, we can measure how many processor 'tics' the given instruction took to execute - all we need to do is to use a RDTSC instruction, which returns the processor's time stamp counter. So I did the measuring and it turned out that normally it takes around 90 ticks to execute RDMSR, while on a 'bluepilled' system it takes about 2100 tics. What a big difference you will say!
But SVM technology (and Intel VT-x also) offers a nice way to cheat the guest about those extra tics, by adjusting a special variable in VMCB, called TSC_OFFSET (in that case we would set this value to something around -2010, just before returning to the guest). As a result, guest can not realize that the RDMSR instruction took extra ticks, by using RDTSC instruction.
So, here's what we need to do: we need to prepare a test piece of code, which would involve calling e.g. RDMSR instruction something like a few millions of times and observer the timing using *external* clock (yes, VMM can also cheat about the internal real time clock). This external clock can also be a human being (=user). For example the detector could display a message to the user:
"Dear user, I'm going to run a test now; if this test took more then 1 minute, that would mean your computer is probably compromised with a VMM based malware. Press OK to continue..."
The first problem with this approach is how to generate a piece of code, which would execute for exactly 1 minute (or any other given amount of time) on a native machine, taking into account that we may have many different processor models, working with many different speeds, etc... One can say, that we can probe the processor speed, using some test instructions which we know for sure that are not intercepted (most of the instructions). But this is problematic, since the hypervisor can cheat as to how many tics those instructions took to execute (as RDTSC can be intercepted itself). Of course this is trivial, when we assume that we can run our detector before and after infection, but this is not the case in most practical scenarios. So, I don't know how to solve this problem (which doesn't mean it can't be solved though)... Any suggestions welcomed.
Without solving the above problem, we're facing a problem of false positives and negatives. Consider that the test above took 5 minutes (instead of one) - now does that mean that we took a too big testing code (because guest were cheated during calibration) and that the very processor just had to spent 5 minutes executing it or was this a sign of an infection - it's just that on a new processor model maybe the RDMSR interception slowdown would be of a factor of 5 instead of 20 as it's with the processor I have right now. And if it was 15 minutes?
Currently this is not a big problem, just because there are only two models of AMD processors supporting SVM on the market and each is available with few different clock speeds. So, we can probably hardcode the testing code into our detector (because the slowdown is so big). But how the situation will change during the next two years, when there will be much more processors supporting hardware virtualization on the market? We would have to have a database of processor models and how much test code we need to use on each of them. (oh btw, and how detector could detect on which model it's running? You bet, using a CPUID instruction, which can be intercepted...)
And still, even if we solved this problem, still this kind of detection would be annoying to users (imagine a user being forced to do this kind of "1-minute test", or even 10sec test, every 15 minutes or so) unless we used some kind of infrastructure providing external time measurement (can't be just public NTP, because NTP packets could be easily intercepted by the malware). So, we would need to setup encrypted NTP servers in each company... Ah great!
So, I find it quite surprising that some people diminish the threat introduced by hardware virtualization based malware. I would like to point out that it's somewhat ridicules situation, when the malware can be reliably written using perfectly documented features of the processor, while we need to do some timing based tricks to detect it :) Are we switching roles with malware writers?
What we need is a reliable detector, something which would return 0 or 1 depending whether we're inside a VM or not. And I really don't see how we can create such a program (i.e. a standalone generic detector).
For completeness, I should also mention, just as I did during my talks, that we're aware of another attack against Blue Pill which should be very reliable and that can be implemented as a standalone program, but unfortunately it seems to allow only for crashing the system when it's 'bluepilled'. This nice attack has been independently proposed by Alex Tereshkin and Oded Horowitz, BTW.
Some people talked about prevention... Can we disable virtualization in BIOS? I can't do it on my AMD machine - but I heard that vendors are going to release updates to allow for that. But, come on, this is not a good way to address this threat! It's better not to buy the processors supporting hardware virtualization!
One more thing - as I'm being continually asked about this - yes, it is possible to create a similar malware to Blue Pill using Intel VT-x, just like it was demonstrated by Dino Dai Zovi at Black Hat a week ago.