Researchers who need to process sensitive medical data in an international collaboration over untrusted public resources.
The advent of computer-aided medicine has facilitated the transition from the traditional qualitative analysis of medical images to automated quantitative analysis. Qualitative analysis relies on the experience and knowledge of specialized radiologists, who write their appraisal in natural language. This implies a high cost in time and money, and it limits the secondary use of such data for research. The generalisation of computer image analysis techniques and the increase in computing power have made it possible to measure different characteristics of a medical image (such as texture, shapes, volumes, the position of a component or its evolution over time) and so support the conclusions of radiology reports with quantitative evidence. However, medical image processing tasks, such as the extraction of biomarkers, sometimes require high-performance computing infrastructures and, in some cases, hardware accelerators (especially for Artificial Intelligence tools based on advanced Machine Learning techniques). Such advanced resources may not be available at the institutions where the data is acquired. Cloud service providers could fulfil these requirements.
On the other hand, medical data is a sensitive, highly protected asset that should be processed in an efficient and trustworthy manner. Medical data is universal and relevant to research at a global scale, and the secondary use of data from clinical practice is key to developing new diagnosis-assisting tools. Large-scale collaboration is essential when dealing with rare diseases (as the number of subjects in a single site or country may not be sufficient) or in international research studies. Processing data across multiple domains that have different levels of trust and are bound by different legal regulations adds complexity to the development of trustworthy medical data applications. For example, we may want to define different roles for clinical users within the country where the data was acquired and for medical researchers in other countries.
In this context, we identified the following requirements and challenges:
Sensitive data must not be accessible outside the boundaries of the hosting country. Sensitive data is protected by the Brazilian General Data Protection Law (LGPD) and must be processed under strong access protection measures, which involves using a cloud offering that is, as far as possible, free of vulnerabilities. Anonymised data can be released, but it should remain accessible only in a restricted environment. If data is processed outside Brazil, other regulations will also apply.
Medical image processing and Machine Learning model building require intensive computing resources. The necessary processing capabilities may not be available within the boundaries where the data is located, so the processing algorithms must run elsewhere. Access should be coherent and secure, and image processing should be efficient. In the case of ATMOSPHERE, the scale of the medical imaging data makes transferring it reasonable, because image processing and model building are highly computationally intensive.
Environments should be reproducible and processing should be repeatable. Reproducibility in medical research is extraordinarily important. Model building, image processing and classification should run in well-defined environments that can be reproduced for further analysis, including the definition of specific software versions and execution conditions.
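As an illustration only, the first and third requirements above can be expressed as two small checks. This is a minimal Python sketch under stated assumptions, not part of the ATMOSPHERE software: the names DataClass, Site, ALLOWED_SITES, may_hold and environment_fingerprint are hypothetical, and the policy table simply restates the requirements in code form.

```python
import hashlib
import json
from enum import Enum


class DataClass(Enum):
    SENSITIVE = "sensitive"
    ANONYMISED = "anonymised"


class Site(Enum):
    HOSTING_COUNTRY = "hosting_country"      # protected cloud inside the hosting country
    RESTRICTED_REMOTE = "restricted_remote"  # restricted environment in another country
    PUBLIC = "public"                        # any unrestricted location


# Illustrative policy (an assumption): sensitive data never leaves the hosting
# country; anonymised data may also live in a restricted remote environment.
ALLOWED_SITES = {
    DataClass.SENSITIVE: {Site.HOSTING_COUNTRY},
    DataClass.ANONYMISED: {Site.HOSTING_COUNTRY, Site.RESTRICTED_REMOTE},
}


def may_hold(data_class: DataClass, site: Site) -> bool:
    """Return True if data of this class may be stored or processed at the site."""
    return site in ALLOWED_SITES[data_class]


def environment_fingerprint(spec: dict) -> str:
    """Hash a description of the execution environment (software versions,
    execution conditions) so that two runs can be checked for equivalence.

    The spec is serialised with sorted keys, so key order does not matter.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing the fingerprint next to each result would let a later analysis confirm that it ran with the same software versions and execution conditions. The contents of the environment description are left open here; container image digests or package lock files are common choices.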
For this reason, we propose a model to federate virtual infrastructures where data can be processed securely and safely. In this model, data is stored encrypted in Brazil. Decrypting the data requires a key that is only provided to processes running inside an SGX enclave (a hardware extension that keeps a process's memory encrypted, so that even a user with administrative privileges on the cloud resources cannot access the data in memory). Data is anonymised and securely copied to remote storage where the accelerator computing resources are available. All communication takes place over a secure and isolated overlay network.
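The key-release and anonymisation steps of this model can be sketched as follows. This is a simplified Python illustration, not the ATMOSPHERE implementation: AttestationReport, KeyBroker, DIRECT_IDENTIFIERS and anonymise are hypothetical names, the attestation result is assumed to have been verified elsewhere, and the anonymisation shown (dropping direct identifiers and replacing the patient identifier with a keyed hash) is only a stand-in for the much more thorough treatment that real medical images require.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttestationReport:
    """Hypothetical stand-in for the outcome of SGX remote attestation."""
    enclave_measurement: str  # hex hash identifying the code loaded in the enclave
    verified: bool            # True if the hardware-signed evidence was validated


class KeyBroker:
    """Holds the data decryption key and releases it only to a trusted enclave."""

    def __init__(self, data_key: bytes, trusted_measurement: str):
        self._data_key = data_key
        self._trusted_measurement = trusted_measurement

    def release_key(self, report: AttestationReport) -> Optional[bytes]:
        # Refuse if the attestation evidence itself was not validated.
        if not report.verified:
            return None
        # Refuse if the enclave is not running the expected code.
        if not hmac.compare_digest(report.enclave_measurement,
                                   self._trusted_measurement):
            return None
        return self._data_key


# Fields treated as direct identifiers in this sketch (an assumption).
DIRECT_IDENTIFIERS = {"patient_name", "birth_date", "address"}


def anonymise(record: dict, pseudonym_secret: bytes) -> dict:
    """Drop direct identifiers and replace the patient ID with a keyed hash."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    digest = hmac.new(pseudonym_secret,
                      str(record["patient_id"]).encode("utf-8"),
                      hashlib.sha256)
    cleaned["subject_pseudonym"] = digest.hexdigest()
    return cleaned
```

In this sketch, a process outside an enclave, or inside an enclave running unexpected code, receives no key and so cannot decrypt the stored data. Only the output of anonymise would be copied to the remote storage next to the accelerators.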
ATMOSPHERE has defined an architecture for the trustworthy processing of sensitive data, demonstrated on an application for cardiac data processing. The application also leverages other ATMOSPHERE services for the evaluation of trustworthiness.