Research and industry groups working with medical images can use this paper to guide the adoption of cloud-based environments and convolutional neural networks as development and production tools.
Rapid progress is being made in the field of artificial intelligence (AI), as milestones are passed in a wide range of areas, including image understanding and medical diagnosis. With processing power evolving rapidly and data becoming increasingly available, learning methods such as convolutional neural networks are playing a crucial role in making sense of images and videos. In this position paper, we propose to use 3D convolutions to model the temporal dependencies between video frames in echocardiogram data. We applied the 3-dimensional Convolutional Neural Network (C3D) proposed by Tran et al. to automatically classify exams as Rheumatic Heart Disease (RHD) positive or negative.
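To make concrete how a 3D convolution captures temporal dependencies between frames, the following is a minimal sketch (in NumPy, not the C3D implementation itself): the kernel spans the time axis as well as the spatial axes, so a single filter can respond to motion between frames. The clip dimensions and the temporal-difference kernel below are illustrative choices, not taken from the paper.

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive 'valid' 3D convolution over a single-channel clip.

    video:  array of shape (T, H, W)  -- frames, height, width
    kernel: array of shape (t, h, w)  -- spans time AND space
    """
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# A temporal-difference kernel responds to motion between frames:
# it subtracts a 3x3 patch at frame t from the same patch at frame t+2,
# so a static clip produces an all-zero response.
clip = np.random.rand(16, 8, 8)          # 16 frames of 8x8 pixels
kernel = np.zeros((3, 3, 3))
kernel[0], kernel[2] = -1.0, 1.0
motion_map = conv3d_valid(clip, kernel)
print(motion_map.shape)                  # (14, 6, 6)
```

In a real network such as C3D, many of these kernels are learned from data and stacked in layers, but the sliding-window arithmetic is the same.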
Prevalent in low-income countries, Rheumatic Heart Disease (RHD) is a heart condition caused by an abnormal immune response to streptococcal infection. Although echocardiographic (echo) screening is the gold standard for diagnosing latent RHD, personnel shortages limit its broad implementation. To overcome this issue, we propose to develop a machine-learning model for automatic identification of RHD, to be used in subsequent steps of our screening solution to prioritize follow-up.
Thus, in this paper, we tackle RHD diagnosis by extracting visual and temporal features from echocardiographic videos using a 3D convolutional neural network implemented on top of the ATMOSPHERE environment, which provides the trustworthiness and availability guarantees required by our medical application. The learning model used for the classification task is a fine-tuned version of the C3D. Data were partitioned into training, validation, and test sets at a ratio of 80/10/10, respectively; the model was trained to identify RHD independently of the current stage of the disease, and the results were evaluated using a 10-fold cross-validation procedure. Videos were deidentified, clipped to 16 frames, resized to 128 x 171 pixels to match the original model's input format, and then the mean of the original training data was subtracted, in a process referred to as whitening. Using a random 112 x 112 crop of Color Doppler echos, the C3D achieved an overall accuracy of 62% on a per-exam basis, assessed through majority voting over the classifications of the individual videos.
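The preprocessing steps above (clip to 16 frames, resize to 128 x 171, mean subtraction, random 112 x 112 crop) can be sketched as follows. This is an illustrative NumPy version, not the paper's pipeline: the nearest-neighbour resize stands in for whatever interpolation the real pipeline uses, and the training-set mean is assumed to be precomputed elsewhere.

```python
import numpy as np

def nearest_resize(frame, out_h, out_w):
    """Nearest-neighbour resize of a (H, W, C) frame -- a stand-in for
    the interpolating resize a production pipeline would use."""
    h, w = frame.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frame[rows][:, cols]

def preprocess_clip(frames, mean, crop=112, rng=None):
    """Clip to 16 frames, resize to 128x171, subtract the training mean,
    then take a random crop.

    frames: sequence of (H, W, 3) frames
    mean:   (128, 171, 3) training-set mean (assumed precomputed)
    """
    rng = rng or np.random.default_rng()
    clip = np.stack([nearest_resize(f, 128, 171) for f in frames[:16]])
    clip = clip.astype(np.float32) - mean          # mean subtraction step
    top = rng.integers(0, 128 - crop + 1)
    left = rng.integers(0, 171 - crop + 1)
    return clip[:, top:top + crop, left:left + crop, :]

clip = preprocess_clip([np.zeros((480, 640, 3))] * 20,
                       mean=np.zeros((128, 171, 3), np.float32))
print(clip.shape)   # (16, 112, 112, 3)
```

The random crop serves as data augmentation during training, while the fixed clip length and spatial size match the C3D input format.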
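The per-exam majority-voting step can be illustrated with a small sketch. The label names and the tie-breaking rule below are assumptions for illustration; the paper does not specify how ties between the per-video classifications are resolved.

```python
from collections import Counter

def exam_prediction(video_labels):
    """Aggregate per-video predictions into one exam-level label by
    majority vote. Ties are broken in favour of 'RHD' here -- an
    assumption, chosen so a borderline exam is flagged for follow-up."""
    counts = Counter(video_labels)
    if counts["RHD"] >= counts["Normal"]:
        return "RHD"
    return "Normal"

print(exam_prediction(["RHD", "Normal", "RHD"]))      # RHD
print(exam_prediction(["Normal", "Normal", "RHD"]))   # Normal
```

Exam-level accuracy is then computed by comparing these aggregated labels against the exam-level ground truth.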