By applying Deep Learning and Computer Vision, we can achieve faster diagnostics, streamlining the way patients move through the treatment process and supporting the doctor's decision making.

Abstract

We developed an automatic screening/diagnostic system for diabetic retinopathy using an ensemble of deep neural networks followed by a random forest classifier. Our system has a sensitivity of 95% and a specificity of 65%.

// Example API Response

{
  "label": "No DR",
  "sum_r": 0.000285714285714285,
  "sum_y": 0.018857143,
  "red_alert": false,
  "yellow_alert": false,
  "probs": {
    "healthy": 0.908857143,
    "mild": 0.0722857143,
    "moderate": 0.0185714286,
    "severe": 0.000285714286,
    "proliferative": 0
  }
}
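
For illustration, the alert fields in the response above can be reproduced from the per-class probabilities: sum_y adds up the classes treated as positive by the yellow alert (moderate, severe, proliferative) and sum_r those of the red alert (severe, proliferative). The sketch below is in Python; the threshold values and the build_response helper are hypothetical placeholders, not the production values.

# Minimal sketch of how the alert fields in the API response can be derived
# from the per-class probabilities. The thresholds are hypothetical
# placeholders; the production system chooses them from the ROC analysis
# described later in this article.

def build_response(probs, yellow_threshold=0.5, red_threshold=0.5):
    """probs: dict with keys healthy, mild, moderate, severe, proliferative."""
    sum_y = probs["moderate"] + probs["severe"] + probs["proliferative"]  # yellow: moderate or worse
    sum_r = probs["severe"] + probs["proliferative"]                      # red: severe or worse
    labels = ["No DR", "Mild DR", "Moderate DR", "Severe DR", "Proliferative DR"]
    keys = ["healthy", "mild", "moderate", "severe", "proliferative"]
    label = labels[max(range(5), key=lambda i: probs[keys[i]])]           # most likely class
    return {
        "label": label,
        "sum_r": sum_r,
        "sum_y": sum_y,
        "red_alert": sum_r >= red_threshold,
        "yellow_alert": sum_y >= yellow_threshold,
        "probs": probs,
    }

response = build_response({
    "healthy": 0.908857143, "mild": 0.0722857143, "moderate": 0.0185714286,
    "severe": 0.000285714286, "proliferative": 0.0,
})
print(response["label"], response["yellow_alert"], response["red_alert"])  # No DR False False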

Problem overview

Diabetic retinopathy (DR), a major microvascular complication of diabetes, has a significant impact on the world's health systems. In Mexico alone this disease affects more than 11 million people [1]. Globally, the number of people with DR will grow from 126.6 million in 2010 to 191.0 million by 2030, and it is estimated that the number with vision-threatening diabetic retinopathy (VTDR) will increase from 37.3 million to 56.3 million, if prompt action is not taken.

Despite growing evidence documenting the effectiveness of routine DR screening and early treatment, DR frequently leads to poor visual functioning and remains the leading cause of blindness in working-age populations. DR has been neglected in health-care research and planning in many low-income countries, where access to trained eye-care professionals and tertiary eye-care services may be inadequate. Both demand for and supply of services may be a problem. Rates of compliance with diabetes medications and annual eye examinations may be low, and the reasons for this are multifactorial [2].

Motivation

With the intention of developing an automatic diagnostic system for the screening of patients with possible diabetic retinopathy, we used recent advances in computer vision and deep learning to train an ensemble of neural networks to detect this disease and its level of progression.


Model overview

Data

For training and validation, we used 85,000 high-resolution images, each one a digital slit lamp capture labeled with the corresponding diagnosis, made by a clinician who rated the severity of the disease. Each image is labeled as [0] no DR, [1] mild DR, [2] moderate DR, [3] severe DR or [4] proliferative DR. The per-class representation in the dataset is as follows:
Class              Number of images
No DR                        62,920
Mild DR                       5,650
Moderate DR                  12,440
Severe DR                     2,020
Proliferative DR              1,690
The data was randomly divided into train (90%) and test (10%) sets. Test results were used for early stopping during training and to choose some hyperparameters of the neural networks.
An example image from the original data.
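
As a sketch of this split (assuming scikit-learn; image_paths and labels are hypothetical placeholders for the dataset index, and the stratification option is our assumption rather than something stated above):

# Sketch of the 90/10 split described above, assuming scikit-learn.
from sklearn.model_selection import train_test_split

# `image_paths` and `labels` stand in for the real dataset index (placeholders).
image_paths = [f"img_{i}.jpeg" for i in range(100)]
labels = [i % 5 for i in range(100)]

train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels,
    test_size=0.10,     # 10% held out for early stopping / hyperparameter choices
    stratify=labels,    # assumption: keep class proportions similar in both sets
    random_state=42,
)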

Preprocessing

The eye is detected and the image is rescaled and adjusted so that the eye is always centered with a fixed size. RGB channels are locally normalized with a moving Gaussian kernel in order to highlight local image variability. This makes the model agnostic to global light intensity and other factors that depend on the particular camera used.
This image shows the final preprocessed image from a proliferative DR study, as used for neural network training.
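
A minimal sketch of this kind of preprocessing, assuming OpenCV and NumPy; the crop heuristic, kernel scale and output resolution are illustrative assumptions rather than the exact values used by our pipeline:

# Sketch of the preprocessing step, assuming OpenCV/NumPy. Kernel scale and
# output size are assumptions, not the exact values used by the system.
import cv2
import numpy as np

def preprocess(path, size=512):
    img = cv2.imread(path)

    # Rough eye detection: threshold away the dark background and crop to the
    # bounding box of the retina so the eye is centered with a fixed size.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ys, xs = np.where(gray > 10)
    img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    img = cv2.resize(img, (size, size))

    # Local normalization: subtract a Gaussian-blurred copy from each channel
    # to highlight local variability and suppress global illumination.
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=size / 30)
    img = cv2.addWeighted(img, 4, blurred, -4, 128)
    return img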

Neural Networks

Several neural networks were trained using different architectures (InceptionV3, ResNet50). Training leveraged transfer learning from an ImageNet model and was done in stages, starting from the top-most layers and gradually decreasing the learning rate. Each model took about two weeks to train on a server with two GPUs.
After training, a neural network can evaluate preprocessed images; this image shows a heatmap of where damage is found in a proliferative DR patient.
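
A minimal sketch of the staged fine-tuning described above, assuming Keras/TensorFlow; the layer boundaries, learning rates and (commented-out) training calls are illustrative assumptions:

# Sketch of staged transfer learning from ImageNet, assuming Keras/TensorFlow.
# Learning rates, layer boundaries and epochs are illustrative assumptions.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models, optimizers

base = InceptionV3(weights="imagenet", include_top=False, pooling="avg",
                   input_shape=(512, 512, 3))
head = layers.Dense(5, activation="softmax")(base.output)   # 5 DR severity classes
model = models.Model(base.input, head)

# Stage 1: train only the new top layer on frozen ImageNet features.
base.trainable = False
model.compile(optimizer=optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=test_ds, epochs=5)   # test set drives early stopping

# Later stages: gradually unfreeze deeper layers while lowering the learning rate.
for n_unfrozen, lr in [(100, 1e-4), (len(base.layers), 1e-5)]:
    base.trainable = True
    for layer in base.layers[:-n_unfrozen]:
        layer.trainable = False
    model.compile(optimizer=optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # model.fit(train_ds, validation_data=test_ds, epochs=5)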

Random Forest

We trained a Random Forest to combine the results of the different neural networks on both eyes of the patient with other statistics from the images, predicting the final probability that a particular image corresponds to each level of DR. This stage assigns to each image a vector with the probabilities of each class.
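
A sketch of this stage, assuming scikit-learn; the exact feature layout (per-network class probabilities for both eyes plus a few image statistics) and the forest size are assumptions based on the description above:

# Sketch of the Random Forest stage, assuming scikit-learn and NumPy.
# Feature layout is an assumption: per-network class probabilities for both
# eyes, concatenated with a few extra image statistics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_features(left_probs, right_probs, extra_stats):
    """left_probs / right_probs: (n_networks, 5) arrays of per-class probabilities."""
    return np.concatenate([left_probs.ravel(), right_probs.ravel(), extra_stats])

# X_train: one feature row per image; y_train: clinician labels 0..4 (placeholders).
forest = RandomForestClassifier(n_estimators=500, random_state=0)
# forest.fit(X_train, y_train)
# probs = forest.predict_proba(X_test)   # vector of per-class probabilities per image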

Label aggregation

Most guidelines recommend annual screening for those with no retinopathy or mild diabetic retinopathy, screening every 6 months for moderate diabetic retinopathy, and an ophthalmologist referral for treatment evaluation within a few weeks or months for severe or proliferative diabetic retinopathy [3].

Following other studies such as [3], we define a negative case as no DR or mild DR, and a positive case as moderate, severe or proliferative DR. The vector of probabilities is therefore collapsed into the probability of being a positive DR case. We can then build a ROC curve to choose the threshold for our prediction, which gives a family of operating points with different sensitivity and specificity; Figure X shows the different possibilities. Among these we chose a model with 95% sensitivity and a corresponding 65% specificity, so that it serves as a good first screening layer in a diagnostic pipeline.
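
A sketch of how the probabilities can be collapsed and the operating point chosen from the ROC curve, assuming scikit-learn; probs and y_class below are random placeholder arrays standing in for the test-set outputs and labels:

# Sketch of collapsing class probabilities into a single positive-DR score and
# picking the operating point from the ROC curve, assuming scikit-learn.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_class = rng.integers(0, 5, size=1000)         # placeholder labels 0..4
probs = rng.dirichlet(np.ones(5), size=1000)    # placeholder per-class probabilities

# positive case = moderate, severe or proliferative DR (classes 2, 3, 4)
p_positive = probs[:, 2:].sum(axis=1)
y_binary = (y_class >= 2).astype(int)

fpr, tpr, thresholds = roc_curve(y_binary, p_positive)
# choose the first operating point whose sensitivity (TPR) reaches 95%
idx = np.argmax(tpr >= 0.95)
print(f"sensitivity={tpr[idx]:.2f}, specificity={1 - fpr[idx]:.2f}, threshold={thresholds[idx]:.3f}")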

In a similar fashion, we created a Red alert using only severe and proliferative DR as positive cases and targeting a sensitivity of 0.9. These two alerts, yellow and red, have the following statistics:
Table 1
Class                                  Yellow alert                       Red alert
No DR                                  18%                                1%
Mild DR                                57%                                2%
Moderate DR                            90%                                38%
Severe DR                              98%                                89%
Proliferative DR                       98%                                91%
No DR or Mild DR                       35% (general specificity = 65%)    1%
Moderate, Severe or Proliferative      95% (general sensitivity)          50%
Severe or Proliferative                98%                                90%

The table shows the probability of triggering the Yellow or Red alert when the patient has a certain level of retinopathy. We see that the Red alert is only likely to be triggered by moderate, severe or proliferative DR, while the Yellow alert is more cautious and detects 95% of all positive cases. In combination, both alerts can be extremely useful for the early detection of diabetic retinopathy.

Further steps for improving the model’s performance:

  • More robust labeling, following the example of [3], would definitely decrease the prediction error. To do this, we will collaborate with a team of ophthalmologists for systematic, robust diagnosis and localization of lesions.
  • During the Random Forest stage, the inclusion of additional patient data (such as glucose levels, age, etc.) would be very valuable.
  • Currently, the model uses an ensemble of 3 neural networks. Using at least 10 could increase the accuracy of the model. In addition, working with larger (higher-resolution) images could allow us to detect smaller lesions. Both a larger ensemble and higher-resolution images imply more computing power during training.