(Un)Trusting the Cloud
Everybody loves The Cloud these days, and it is not hard to understand why. When every person owns computers (devices), the cloud is really hard to beat when it comes to syncing all your digital life back and forth between all those devices, and also sharing with your family members, friends, and colleagues at work. From task lists, through calendars, through health & fitness data, to work-related documents. And I'm not even mentioning all the unencrypted email that is out there.
One doesn't need to be especially smart or security conscious to realize how much this might be a threat to security and privacy. How much easier would it be to attack somebody's laptop if I knew precisely in which hotel and when he or she is planning to stay? How much more expensive would my health and life insurance be, if they could get a look at my health and fitness progress? Etc.
But we're willing to sacrifice our privacy and security in exchange for easy of syncing and sharing of our data. We decide to trust The Cloud. What specifically does that mean?
First, it means we trust the particular cloud-based service vendor, such as the provides of our training monitoring app and service. We trust that this vendor is: 1) non-malicious and ethical, and so is not going to sell our private data to some other entity, e.g. insurance company, and 2) that the software written by this vendor is somehow secure, so it would not be easy for an attacker to break into their cloud service and download all the user's data (and then sell to health insurance companies).
Next, we trust the cloud infrastructure provider, such as Amazon EC2. We trust that the cloud provider is 1) non-malicious and ethical, and that they won't really read the memory of the virtual machine on which the previously mentioned cloud-service is running (and won't make it available to a local government officials, e.g. in China), and 2) that they secured their infrastructure properly (e.g. it wouldn't be easy for one customer to “escape” from a VM and read all the memory of the VMs belonging to other customers).
Finally we trust all the infrastructure that is in the middle between us and the service provider, such as e.g. the networking protocols, are safe to use (e.g. we trust all the engineers working in any of the ISP we use won't sniff/spoof our communication, e.g. by using some fake or quasi-fake SSL certs).
So, that's a hell of a lot of trusting! And the stake is high. Do we really need to make such a sacrifice? Do we really need to hand in all our private data to all those organizations? Of course we don't!
First, notice that in majority of cases, the cloud is only used basically as a on-line storage. No processing, just dump storage. Indeed, what kind of server-side processing does your task list or calender require? Or your freestyle swimming results? Or your conference slides? None.
And we know for very long how to safely keep secrets on untrusted storage, don't we? This is achieved via encryption (and digital signatures for integrity/authenticity). So, the idea is very simple: let's encrypt all the data before we send them to the cloud. The point here is, the encryption must be done by the app that is running on our client device. Not in the cloud, of course.
Ok, so let's say I have my calendar records encrypted in the cloud, how do I share it with my other devices and other people, such as my partner and colleagues at work? Very simple – you encrypt each record with a random symmetric key and then, for every other device or person who you want to grant access to your calendar you make the symmetric key available to this person, by encrypting it with their public key (if you're paranoid, you can even verify fingerprints using some out-band communication channel, such as phone, to ensure the cloud/service provider didn't do MITM attack on you). What if you want to share only some events (or some details) with some group of people (e.g. only your availability info)? Very simple – just encrypt those records you want to share in non-full access with some other symmetric key and publish only this key to those people/devices you want to grant such non-full access.
Implementing the above would require writing new end-user apps, or plugins for existing apps (such as Outlook), so that they do encryption/decryption/signing/verification before sending the data out to the cloud. But what stops the malicious vendor from offering apps that would be leaking out our secrets, e.g. the keys? Well, nothing actually. But this time, the vendor would need to explicitly build in some kind of backdoor into the app. The same could be done with any other vendor, and any other, non-cloud-based app. After all, how do we know that MS Word, which is not cloud-based yet, is not sending out fragments of our texts to Agent Smith? Note how different this is from a situation when the vendor already owns all our data, unencrypted, brought legitimately to their servers, and all they need to do is to read them from their own disks. No need to plant and distribute any backdoors!
In practice few vendors would be risking their reputation and would be willing to build in a backdoor into an app that is then made available to customers. Because every backdoor in such client-exposed code will sooner or later be found (You would really not believe what great lengths all those young people aimed with disassembler and debugger would go to, to win an economy class ticket to the middle of desert in the hottest summer season, just to be able to deliver a presentation on how evil/stupid a company X is ;).
One problem is, however, with accessing our encrypted cloud over a Web Browser. In contrast to apps, the web browser content is much less identifiable. An app can have a digital signature – everybody know its an App v 1.1, published by X. As explained above it would be rather stupid for X to plant a backdoor into such an app. But a Web-delivered Javascript is much more tentative, and it's very possible for X to e.g. deliver various versions of scripts to different customers. Digital signature on client-side scripts, paired with ability to whitelist allowed client-side-scripts, would likely solve this problem.
So, why we still haven't got client-side-encrypted cloud-services? The question is rhetorical, of course. Most vendors actually loves the idea of having unlimited access to their customers data. Do you think Google would be happy to give up an opportunity to data mine all your data? This might affect their ad business, health research, or just Secret Plan To 0wn The World. After our dead body, I can almost hear them yelling! After all they have just came up with Chrome OS to bring even more data into their data mining machine...
To sum it up, there is no technical reason we must entrust all those people with our most private data. Sooner or later somebody will start selling client-side-encrypted cloud services, and I would be the first person to sign up for it. Hopefully it will happen sooner than later (to late?).
This post also hopefully shows, again, one more aspect – that we can, relatively easy, move most of the IT infrastructure out of the “TCB” (Trusted Computing Base, used as metaphor here). In other words, we can design our systems and services so that we don't need to trust a whole lot of things, including servers and the networking infrastructure (except for its reliability, but not for its security). But, there always remains one element that we must trust – these are our client devices. If they are compromised, the attacker can steal everything.
Strangely most people still don't get it, or get it backwards. Just the fact that “information is not stored on the iPad but kept safe on the corporate network”, doesn't change anything! Really. If the attacker owns your iPad, then she also can do anything that the legitimate user could do from this iPad. So if you could get to the company's secret trade data from your iPad's Receiver, so would be able to do the malware/attacker.