I tried to cover those questions in my recent keynote at ETISS, and you can grab the slides here.
Particularly, the slide #18 presents the idealistic view of an OS that could be achieved through the use of hardware virtualization and trusted boot technologies. It might look very similar to many other pictures of virtualized systems one can see these days, but what makes it special is that all the dark gray boxes represent untrusted domains (so, their compromise is not security-critical, except for the potential of a denial-of-service).
No OS currently implements this architecture, even Qubes. We still have Storage and GUI subsystem in Dom0 (so they are both trusted), although we already know (we think) how to implement the untrusted storage domain (this is described in detail in the arch spec), and the main reason we don't have it now is that TXT market adoption is so poor, that very few people could make use of it.
The GUI subsystem is, however, a much bigger challenge. When we think about, it should really feel impossible to have an untrusted GUI subsystem, because the GUI subsystem really "sees" all the pixmaps that are to be displayed to the user, so also all the confidential emails, documents, etc. The GUI is different in nature than the networking subsystem, where we can use encrypted protocols to prevent the netvm from sniffing or meaningfully intercepting the application-generated traffic, or the storage subsystem, where we can use fs-encryption and trusted boot technologies to keep the storage domain off from reading or modifying the files used by apps in a meaningful ways. We cannot really encrypt the pixmaps (in the apps, or AppVMs), because for this to work we would need to have graphics cards that would be able to do the decryption and key exchange (note how this is different from the case of an untrusted storage domain, where there is no need for internal hardware encryption!), and the idea of putting, essentially an HTTPS webserver on your GPU is doubtful at best, because it would essentially move the target from the GUI domain to the GPU, and there is really no reason why lots-of-code in the GPU were any harder to attack than lots-of-code in the GUI domain...
So we came out recently with an idea of a Split I/O model that is also presented in my slides, where we separate the user input (keyboard, mouse), and keep it still in dom0 (trusted domain), from the output (GUI, audio), which is moved into an untrusted GUI domain. We obviously need to make sure that the GUI domain cannot "talk" to other domains, to make sure it cannot "leak out" the secrets that it "sees" while processing the various pixmaps. For this we need to have the hypervisor ensure that all the inter-domain shared pages mapped into the GUI domain are read-only for the GUI domain, and this would imply that we need the GUI protocol, exposed by the GUI domain to other AppVMs, to be unidirectional.
There are more challenges though, e.g. how to keep the bandwith of timing covert channels, such as those through the CPU caches, between the GUI domain and other AppVMs on a reasonably low level (please note the distinction between a covert channel, which require cooperation of two domains, and a side-channel, which requires just one domain to be malicious - the latter are much more of a theoretical problem, and are of a concern only in some very high security military systems, while the former are easy to implement in practice usually, and present a practical problem in this very scenario).
Another problem, that was immediately pointed out by the ETISS audience, is that an attacker, who compromised the GUI domain, can manipulate the pixmaps that are being processed in the GUI subsystem to present false picture to the user (remember, the attacker should have no way to send them out anywhere). This includes attacks such as button relabeling ("OK" becomes "Cancel" and the other way around), content manipulation ("$1,000,000" instead of "$100", and vice-versa), security labels spoofing ("red"-labeled windows becoming "green"-labeled), and so on. It's an open question how practical these attacks are, at least when we consider automated attacks, as they require ability to extract some semantics from the pixmaps (where is the button, where is the decoration), as well as understanding the user's actions, intentions, and behavior (just automatically relabeling my Friefox label to "green" would be a poor attack, as I would immediately realize something is going wrong). Nevertheless this is a problem, and I'm not sure how this could be solved with the current hardware architecture.
But do we really need untrusted GUI domain? That depends. Currently in Qubes the GUI subsystem is located in dom0, and thus it is fully trusted, and this also means that a potential compromise of the GUI subsystem is considered fatal. We try to make an attack on GUI as hard as possible, and this is the reason we have designed and implemented special, very simple GUI protocol that is exposed to other AppVMs (instead of e.g. using the X protocol or VNC). But if we wanted to add some more "features", such as 3D hardware acceleration for the apps (3D acceleration is already available to the Window Manager in Qubes, but not for the apps), then we would not be able to keep the GUI protocol so simple anymore, and this might result in introducing exploitable fatal bugs. So, in that case it would be great to have untrusted GUI domain, because we would be able to provide feature-rich GUI protocols, with all the OpenGL-ish like things, without worrying that somebody might exploit the GUI backend. We would also not need to worry about putting all the various 3rd party software in the GUI domain, such as KDE, Xorg, and various 3rd party GPU drivers, like e.g. NVIDIA's closed source ones, and that some of it might be malicious.
So, generally, yes, we would like to have untrusted GUI domain - we can live without it, but then we will not have all the fancy 3D acceleration for games, and also need to carefully choose and verify the GUI-related software (which is lots of software).
But perhaps in the next 5 years everybody will have a computer with a few dozens of cores, and also the CPU-to-DRAM bandwidth will be orders of magnitude faster than today, and so there will be no longer a need to offload graphic intensive work to a specialized GPU, because one of our 64 cores will happily do the work? Wouldn't that be a nicer architecture, also for many other reasons (e.g. better utilization of power/circuit real estate)? In that case nobody will need OpenGL, and so there will be no need for a richer GUI protocol than what is already implemented in Qubes...
It's quite exciting to see what will happen (and what we will come up for Qubes) :)
BTW, some people might confuse X server de-privileging efforts, i.e. making the X server run without root privileges, which is being done in some Linux distros and BSDs, with what had been described in this article, namely making the GUI subsystem untrusted. Please note that a de-priviliged X server doesn't really solve any major security problems related to GUI subsystem, as whoever controls ("0wns") the X server (depriviliged or not) can steal or manipulate all the data that this X server is processing/displaying. Apparently there are some reasons why people want to run Xorg as non-root, but in case of typical desktop OSes this provides little security benefit (unless you want to run a few X servers with different user accounts, and on different vt's, which most people would never do anyway).