Secure Distributed Computation in TANGO

Implementing a security framework for the TANGO project is a daunting task, especially considering that any proposed architecture should impact the ongoing development on the project as little as possible. In this blog post, we intend to cover one aspect of security in TANGO: securing application data as it is processed in TANGO. If this data is sensitive, then there may be legal or regulatory requirements to ensure that such data is not revealed to unauthorised individuals. Our previous research dealt with a similar problem: securing distributed computation in the cloud. The goal was to allow computation in the cloud without leaking input data, working data or the output to another party. This is especially important when considering that the application is run in a domain which is not under the administrative control of the application developer. The application developer might not trust that the cloud provider will not "snoop" on their computation. If the snooper is a privileged insider, then no sophisticated authentication or access control system will prevent their snooping because they are an authorised individual. So, to thwart the privileged insider, it would be desirable to keep the data confidential, i.e. encrypted, throughout the computation. This, of course, poses real challenges. While we can use common encryption systems, such as Advanced Encryption Standard (AES), to secure data at rest or in transport, for us to compute on it, it must be decrypted first. Systems, such as AES, do not allow for computation to be performed on the encrypted data (in cryptography jargon, we call the unencrypted data, plaintexts, and their encryptions, ciphertexts). If we were to perform an XOR on two ciphertexts produced by AES it would result in a piece of data which would produce garbage once decrypted. We could, of course, share the encryption key between the distributed worker nodes, allowing each worker to decrypt to perform computation and re-encrypt after. However, this raises several problems. How do we safely distribute the key to the workers without exposing it? How do we store the key so that it cannot be easily retrieved by a privileged insider? How do we process third-party encrypted data where we do not have access to the encryption key? Ideally, we would like to encrypt the data using a system which allows computation to be meaningfully performed on the ciphertexts without the need for the ciphertexts to be decrypted. Under such a system, we might, say, add or multiply ciphertexts, producing a new ciphertext, which when decrypted would give the sum or product of the plaintexts i.e. if m1 and m2 are two plaintexts encrypted as c1 and c2 then under our proposed cryptosystem c1 + c2 decrypts to m1 + m2. There is a field of cryptography, fully homomorphic encryption (FHE), which allows any computation to be performed on encrypted data. However, it is inefficient. The computation must be represented as an arithmetic circuit (like a Boolean circuit but with addition and multiplication as the gates). The input data must be encrypted bit-by-bit and the ciphertexts of each bit are large. The speed of execution is currently slow, taking milliseconds to perform a multiplication (and this is a considerable improvement over the early implementations of FHE). So, if we rule out FHE, what can we do? Well, we can sacrifice some flexibility in favour of speed. By this, we mean that we use alternate forms of encryption that only support a limited set of operations on encrypted data but are much more efficient. For example, a cryptosystem that supports integer arithmetic on ciphertexts. We are not limited to arithmetic operations: we have cryptosystems which allow for other operations, such as sorting of ciphertexts (order-preserving encryption) or searching (searchable encryption). How does this fit into TANGO? We are continuing to develop a C++ crypto library with several novel cryptosystems to support the operations described above. To support processing the ciphertexts, we are currently implementing a library of C++ classes which encapsulate the ciphertext and provide methods for the operations which can be performed on it. Using these libraries, an application can be developed using the TANGO programming model which processes encrypted data. This approach has the additional benefit that it does not impact current development of TANGO at all. We have only dealt with one aspect of security in TANGO here. Others, such as user-to-service authentication, service-to-service authentication, user authorisation, and service availability, will be addressed in another post.