How to encrypt health data for GDPR & HIPAA compliance
Encryption is often poorly understood, with many companies claiming that what they do is “the most secure”. Here, we summarize the main encryption methods and describe which to use to ensure GDPR and HIPAA compliance for your health applications.
Introduction
Data encryption is the process of transforming your data with a cryptographic operation so that it is unreadable without a key. It can be applied at rest (when the data is in storage on your device or server) or in transit (while the data is traveling across the network). Encryption uses mathematical functions to scramble the plaintext with a key, producing ciphertext. To access the data you decrypt the ciphertext using the same key (symmetric encryption) or the other key of a pair (asymmetric encryption). Many widely used symmetric algorithms, such as AES, are block ciphers: the data is split into fixed-size blocks, and a mode of operation determines how successive blocks are combined so that identical blocks do not produce identical ciphertext. Encryption is a vital element of data security under both GDPR and HIPAA.
Securing data at rest
Data at rest is an attractive target. Most data breaches you read about involve data that has been stolen from a company’s storage servers. Because all your data is stored there, a single breach has the potential to affect a large number of customers. Remember that the GDPR requires you to notify the data protection authority of a personal data breach within 72 hours of becoming aware of it, unless the breach is unlikely to result in a risk to individuals, and to inform the affected individuals when the risk to them is high.
Unencrypted data
Storing data without encryption is a bad idea. You are relying solely on your system’s access controls for protection. If an attacker bypasses those controls or physically steals the disk drives, they have access to all your data. Marginally more secure is disk-level encryption, where the entire disk is encrypted. This protects against physical theft, but the data is transparently decrypted whenever the machine is powered on and the disk is unlocked, so it offers no protection against an attacker who compromises the running system.
Database-level encryption
Here, each database is individually encrypted, and accessing it requires a key. This is a good first step, but still fairly weak: the loss of a single key exposes all the data in that database. This is typically the form of encryption that cloud service operators offer. In our experience, these keys are often stored in plaintext in something like a YAML file, which leaves you particularly vulnerable.
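As a small illustration of keeping the key out of a plaintext config file, the sketch below reads it from the process environment at start-up and refuses to run without it. The variable name DB_ENCRYPTION_KEY and the base64 encoding are assumptions made for this example; in practice the value would usually be injected by a secrets manager or key management service rather than set by hand.

```python
import base64
import os

# Hypothetical variable name, chosen for this example only.
KEY_ENV_VAR = "DB_ENCRYPTION_KEY"


def load_database_key(env_var: str = KEY_ENV_VAR) -> bytes:
    """Return the 32-byte database key stored base64-encoded in env_var."""
    encoded = os.environ.get(env_var)
    if encoded is None:
        # Fail fast instead of silently falling back to a default key.
        raise RuntimeError(f"{env_var} is not set; refusing to start without a key")
    key = base64.b64decode(encoded, validate=True)
    if len(key) != 32:
        raise ValueError("expected a 256-bit (32-byte) key")
    return key
```

The key still exists in memory while the application runs, so this only removes one easy target (the config file); it does not change the fact that a single key protects the whole database.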
Record-level encryption
Here, every single record is individually encrypted with its own key, so each stolen key unlocks only one record. This is a very strong level of protection, and is what you should use for health data. However, it is hard to implement. It requires secure (encrypted) key storage, since each record now has a key. In addition, you can no longer search the data directly, although there are techniques that restore searchability on selected fields. Record-level encryption also makes life difficult for some machine learning algorithms.
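To make the key-management and search problems more concrete, here is a minimal sketch using only the Python standard library. It is not a description of any particular product. It assumes two design choices that the text above does not prescribe: per-record keys are derived from a master key with HKDF-SHA256 (RFC 5869) instead of being stored one by one, and exact-match search on a field is supported through a keyed hash, often called a blind index. The function names and the "record-key:" label are hypothetical, and the actual encryption of the record (for example with an authenticated cipher from a vetted library) is deliberately left out.

```python
import hashlib
import hmac
import secrets


def derive_record_key(master_key: bytes, record_id: str) -> bytes:
    """Derive a 32-byte key for one record with HKDF-SHA256 (RFC 5869)."""
    # Extract step: condense the master key into a pseudorandom key.
    # With no salt supplied, RFC 5869 uses a string of zero bytes.
    prk = hmac.new(b"\x00" * 32, master_key, hashlib.sha256).digest()
    # Expand step: a single HMAC block yields the 32 bytes we need.
    info = b"record-key:" + record_id.encode("utf-8")
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()


def blind_index(index_key: bytes, value: str) -> str:
    """Keyed hash of a field value, stored next to the encrypted record."""
    normalized = value.strip().lower()
    return hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Example values; both keys would normally live in a key management system.
master_key = secrets.token_bytes(32)
index_key = secrets.token_bytes(32)

key_a = derive_record_key(master_key, "record-001")
key_b = derive_record_key(master_key, "record-002")
stored_index = blind_index(index_key, "Alice@Example.com")
query_index = blind_index(index_key, " alice@example.com ")
```

Here key_a and key_b differ, so a leaked record key exposes one record only, while stored_index and query_index match, so the server can find a record by email without storing the email in the clear. The trade-off of derivation is that the master key becomes the single secret to protect, and a blind index reveals which records share the same value.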
Other approaches
Pseudonymization can be used to improve the security of your data storage. Here, rather than store all personal and sensitive data together, the personal data is replaced with pseudonyms (randomly generated identifiers). You then securely store the mapping between the personal data and the pseudonym.
As an example, consider an application where all personal data is encrypted and stored in one database along with a pseudonym. The sensitive data is then stored in a different encrypted database along with the same pseudonym. If you want to access the data, you first have to find the appropriate pseudonym, then you use this to query the sensitive data.
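The two-step lookup can be sketched as follows. This is an illustration only: two in-memory dictionaries stand in for the two separately encrypted databases, encryption itself is omitted, and the function and field names are made up for the example.

```python
import secrets

# Stand-ins for two separate, individually encrypted databases.
personal_db = {}   # pseudonym -> personal data (name, email, ...)
sensitive_db = {}  # pseudonym -> sensitive health data


def register_patient(personal: dict, sensitive: dict) -> str:
    """Store both parts of a patient's data under a fresh random pseudonym."""
    pseudonym = secrets.token_hex(16)  # random, carries no information itself
    personal_db[pseudonym] = dict(personal)
    sensitive_db[pseudonym] = dict(sensitive)
    return pseudonym


def find_pseudonym(email: str):
    """Step 1: look up the pseudonym in the personal-data store."""
    for pseudonym, personal in personal_db.items():
        if personal.get("email") == email:
            return pseudonym
    return None


def sensitive_data_for(email: str) -> dict:
    """Step 2: use the pseudonym to query the sensitive-data store."""
    pseudonym = find_pseudonym(email)
    if pseudonym is None:
        raise KeyError("no patient with that email")
    return sensitive_db[pseudonym]


register_patient({"name": "Alice", "email": "alice@example.com"},
                 {"blood_pressure": "120/80"})
record = sensitive_data_for("alice@example.com")
```

An attacker who obtains only the sensitive-data store sees health values attached to random identifiers, with nothing linking them to a named person.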
Future approaches
Encryption is a constantly evolving technology area. One of the most exciting developments is homomorphic encryption. The reason homomorphic encryption is so desirable is that it allows you to perform operations on the encrypted data as if it were unencrypted. This offers the promise of a “best of both worlds” where you can encrypt health data at the end devices while retaining the ability to process it on your servers. However, for now, fully homomorphic encryption is too slow for most production workloads, so for general-purpose processing it remains more a vision than a reality.
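To show what "operating on encrypted data" means, here is a toy sketch of the Paillier cryptosystem, a well-known scheme that is homomorphic for addition only (not the fully homomorphic encryption discussed above). The primes are two small, well-known Mersenne primes picked for readability; they are far too small to be secure, and the function names are our own. Treat it as a demonstration of the idea, not as usable cryptography.

```python
import math
import secrets

# Toy parameters: two Mersenne primes. Far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                 # public modulus
N_SQUARED = N * N
G = N + 1                 # standard choice of generator
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAMBDA, -1, N)   # private: modular inverse of LAMBDA mod N


def encrypt(message: int) -> int:
    """Encrypt an integer 0 <= message < N with the public key (N, G)."""
    if not 0 <= message < N:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1  # random blinding factor
        if math.gcd(r, N) == 1:
            break
    return (pow(G, message, N_SQUARED) * pow(r, N, N_SQUARED)) % N_SQUARED


def decrypt(ciphertext: int) -> int:
    """Decrypt with the private values LAMBDA and MU."""
    x = pow(ciphertext, LAMBDA, N_SQUARED)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQUARED


# The server adds two readings without ever seeing them.
encrypted_sum = add_encrypted(encrypt(72), encrypt(65))
total = decrypt(encrypted_sum)  # 137
```

Only the holder of the private values can decrypt, yet the party holding just the ciphertexts was able to compute the sum.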
Securing data in transit
Data in transit is at risk from interception (eavesdropping attacks) and, potentially, redirection attacks (where an attacker causes the traffic to be sent to their server instead of yours).
HTTP (no encryption)
Plain HTTP sends everything in cleartext, which is definitely a bad thing. Browsers increasingly warn about or block HTTP, but native apps may still allow it. Put simply, do not use it for anything.
HTTPS (TLS)
This is really the minimum level of security for any application. HTTPS encrypts all your data before it is transmitted over the network. It does this using Transport Layer Security (TLS), the successor to Secure Sockets Layer (SSL). Although the name SSL is still widely used, all versions of SSL are deprecated and insecure, as are the early TLS versions 1.0 and 1.1. You should require TLS 1.2 or later.
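On the client side, insisting on a modern TLS version and on certificate verification takes only a few lines. The sketch below uses Python's standard ssl module; the helper names make_tls_context and fetch are our own, and the choice of TLS 1.2 as the floor is an assumption you may want to raise.

```python
import ssl
import urllib.request


def make_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and rejects old protocols."""
    # The default context already checks the certificate chain and hostname.
    context = ssl.create_default_context()
    # Refuse SSL 3.0, TLS 1.0 and TLS 1.1 explicitly.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def fetch(url: str) -> bytes:
    """Download a resource, refusing anything that is not HTTPS."""
    if not url.lower().startswith("https://"):
        raise ValueError("refusing to send data over a non-HTTPS URL")
    with urllib.request.urlopen(url, context=make_tls_context(), timeout=10) as response:
        return response.read()
```

This also guards against the redirection risk mentioned earlier: a server that cannot present a valid certificate for the expected hostname is rejected before any data is sent.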
End-to-end (E2E) encryption
E2E encryption is how messenger apps such as WhatsApp protect user messages. Each user holds their own key pair, and each pair of users derives shared keys that exist only on their devices and are used to encrypt the messages they exchange. As a result, any server or device in the middle is unable to read the messages. In an eHealth context, this might be useful for private consultations with a medical professional or for sending scans and documents. However, the fact that the data is invisible to the servers makes it unsuitable for the majority of eHealth scenarios, where the server needs to process the data.
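The core idea, that two devices can agree on a key which the relaying server never learns, can be sketched with a classic Diffie-Hellman exchange. This is a toy: the modulus is a small Mersenne prime chosen for readability, the names are our own, and real messengers use elliptic-curve key agreement inside a vetted protocol with authentication, which is omitted here.

```python
import hashlib
import secrets

# Toy group parameters. Far too small for real use.
PRIME = 2**127 - 1
GENERATOR = 3


def generate_key_pair():
    """Return (private, public); only the public value leaves the device."""
    private = secrets.randbelow(PRIME - 3) + 2  # in the range [2, PRIME - 2]
    public = pow(GENERATOR, private, PRIME)
    return private, public


def derive_shared_key(own_private: int, peer_public: int) -> bytes:
    """Combine our private value with the peer's public value."""
    shared_secret = pow(peer_public, own_private, PRIME)
    # Hash the shared secret down to a 32-byte symmetric key.
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()


# Patient and doctor each generate a key pair on their own device.
patient_private, patient_public = generate_key_pair()
doctor_private, doctor_public = generate_key_pair()

# The server relays only the public values.
patient_key = derive_shared_key(patient_private, doctor_public)
doctor_key = derive_shared_key(doctor_private, patient_public)
```

Both sides end up with the same 32-byte key, while the server, having seen only the two public values, cannot feasibly compute it when properly sized parameters are used.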
Conclusions
As we have seen, when you look at data encryption you have to consider both data at rest and data in transit. There is no use in encrypting data on the device if you then send it unencrypted, and vice versa. For data at rest, record-level encryption provides the highest security, although it’s hard to implement. For data in transit, E2E encryption is more secure, but it is unsuitable for many eHealth scenarios. As a result, you will have to fall back on HTTPS, that is, HTTP over TLS. Here at Chino.io, our health data storage API makes it extremely easy to implement record-level encryption and secure key storage. We also make your data records searchable without sacrificing data security. To find out more about this and our other services, do get in touch with us.