Applications of Deep Learning: Convolutional Neural Network Models In the Healthcare Industry: Part 1

Rachana Reddy
Mar 31, 2022

In this post, I discuss the application of deep learning with convolutional neural networks (CNNs) in the healthcare industry by walking through two research papers: “Classification of breast cancer histology using deep learning” and “SD-CNN: A shallow-deep CNN for improved breast cancer diagnosis”. Using neural networks to better understand breast cancer is an intriguing idea.

Figure 1: (A) normal breast; (B) benign lesion; (C) in situ carcinoma, to which an unchecked benign lesion can progress; (D) invasive carcinoma


The dataset used in the first paper comes from the Breast Cancer Histology Challenge 2018. It consists of RGB color images of size 2048 x 1536 pixels.


The approach has two main parts:

  • Nuclei-based patch extraction
  • Transfer Learning for Patch-Wise Classification

Nuclei-based patch extraction

When a CNN is trained on whole images, it can overfit because the model may give weight to non-essential features, which leads to poor generalization. This is why the authors chose to work with patches rather than whole images. Most of the information relevant for classification lies in regions with high nuclear density, so the patches are built around the nuclei, include some of the surrounding tissue, and are brought to a consistent shape and size for accurate tissue classification. Each patch is 299 x 299 pixels, and neighboring patches overlap by 50%. Patches that do not have high nuclear density are discarded. To make the model more robust, the patches are also rotated and flipped horizontally and vertically for data augmentation. Inception-v3 serves as the base architecture for extracting features from these nuclei-based patches.
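To make the patch grid concrete, here is a minimal sketch of sliding a 299 x 299 window over an image with 50% overlap. The function names, the rounding of the stride to 149 pixels, and the `density` callable are my own assumptions for illustration; the paper's actual nuclear-density measure is not reproduced here.

```python
def patch_origins(width, height, patch=299, overlap=0.5):
    """Return the top-left (x, y) corner of every patch in a sliding-window grid."""
    # Assumption: the stride is the patch size times (1 - overlap), rounded down.
    stride = max(1, int(patch * (1 - overlap)))  # 299 * 0.5 -> 149 px
    xs = range(0, width - patch + 1, stride)
    ys = range(0, height - patch + 1, stride)
    return [(x, y) for y in ys for x in xs]


def keep_dense(origins, density, threshold):
    """Keep only the patches whose density score reaches the threshold.

    `density` is a hypothetical callable (x, y) -> float standing in for
    whatever nuclear-density measure is used on the real images.
    """
    return [origin for origin in origins if density(*origin) >= threshold]


origins = patch_origins(2048, 1536)
print(len(origins))  # number of candidate patches for one image
```

Under these assumptions, one 2048 x 1536 image yields a 12 x 9 grid of candidate patches before the low-density ones are discarded.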

Figure 2: Nuclei based patch extraction

Transfer Learning for Patch-Wise Classification

For weight initialization, random initialization is not a good fit because the dataset is too small to train such a network from scratch. The authors therefore employed transfer learning from the Inception-v3 architecture with some modifications: they removed the fully connected layer with 1024 neurons and added a softmax classifier with 4 neurons, one per class.

Training process

The training process has two main stages. In the first stage, the convolutional layers are frozen and only the top layers are trained. In the second stage, the last two inception blocks are fine-tuned along with the top layers.


After training, the patch-level predictions are combined by majority voting to determine the class of the entire image. In case of a tie, the class is chosen in the precedence order invasive, in situ, benign, normal, which avoids false negatives for the more dangerous disease classes.
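The voting rule is simple enough to sketch in a few lines. This is an illustrative version under my own naming (`PRECEDENCE`, `image_label`), not the authors' code; it assumes each patch prediction is already one of the four class names.

```python
from collections import Counter

# Tie-break order described above: the more dangerous class wins a tie.
PRECEDENCE = ["invasive", "in situ", "benign", "normal"]


def image_label(patch_labels):
    """Majority vote over patch predictions, with ties broken by PRECEDENCE."""
    if not patch_labels:
        raise ValueError("need at least one patch prediction")
    counts = Counter(patch_labels)
    top = max(counts.values())
    tied = [label for label in counts if counts[label] == top]
    # Among the tied classes, pick the one that comes first in PRECEDENCE.
    return min(tied, key=PRECEDENCE.index)
```

For example, an image whose patches split evenly between "normal" and "invasive" is labeled "invasive", so the tie is resolved toward the more serious diagnosis.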


The results are reported at two levels: patch-level accuracy and image-level accuracy. The average patch-wise accuracy was 79% across all four classes, compared with a previous benchmark of 66.7%. At the image level, the confusion matrix shows that the model confuses the normal class with the benign class because benign and normal images are highly similar. In situ images are confused with normal images for a similar reason. The overall image-level accuracy is 85%, which is higher than the patch-level accuracy because of the voting strategy.

Table 1: Four class confusion matrix: Normal, Benign, In situ, Invasive
Table 2: Two class confusion matrix: Non-carcinoma and Carcinoma

The second paper, “SD-CNN: A shallow-deep CNN for improved breast cancer diagnosis”, is relevant because breast cancer is the second leading cause of cancer death among women, yet it is one of the most treatable cancers if detected early. The paper mainly discusses how existing imaging technologies for detecting cancer can be used to train neural network models that classify images as benign or cancerous.


The dataset used in this experiment comes from two sources: one was acquired from a tertiary medical centre (Mayo Clinic Arizona), and the other is the public INbreast dataset.


For image pre-processing, the authors followed a four-step procedure.

  1. First, each image has a minimum-area bounding box that contains the tumor region, with (x_min, y_min) and (x_max, y_max) as its diagonal corner points. The bounding box size varies with the size of the tumor.
  2. Second, an enlarged rectangle 1.33 times the size of the bounding box is extracted and saved as a patch, so that it includes neighboring information, which was shown to improve classification accuracy.
  3. Third, the image intensity is normalized to between 0 and 1 using max-min normalization.
  4. Fourth, the patch is resized to 224 x 224 to take full advantage of the pre-trained ResNet model.
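Steps 2 and 3 above can be sketched as follows. This is only an illustration under assumptions of mine: I read "1.33 times the size" as scaling each side of the box about its center, I clip the result to the image, and I represent pixels as a plain 2-D list. The helper names are hypothetical, and the final resize to the network's input size is left to an image library.

```python
import math


def enlarge_box(box, image_size, scale=1.33):
    """Scale an (x_min, y_min, x_max, y_max) box about its center, clipped to the image."""
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) * scale / 2
    half_h = (y_max - y_min) * scale / 2
    # Round outwards so the enlarged box never ends up smaller than intended.
    return (
        max(0, math.floor(cx - half_w)),
        max(0, math.floor(cy - half_h)),
        min(width, math.ceil(cx + half_w)),
        min(height, math.ceil(cy + half_h)),
    )


def min_max_normalize(pixels):
    """Rescale a 2-D list of intensities to the range [0, 1]."""
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid dividing by zero
        return [[0.0 for _ in row] for row in pixels]
    return [[(value - lo) / (hi - lo) for value in row] for row in pixels]
```

Clipping matters for tumors near the image border, where the enlarged rectangle would otherwise extend past the edge of the mammogram.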

Model Training

After pre-processing, these patches, rather than the whole images, are used as input for training. The model has two parts: a Shallow-CNN used for virtual image rendering and a Deep-CNN used for feature generation.

  1. Shallow-CNN: virtual image rendering (Figure 3)
Figure 3: Architecture of 4-layer shallow-CNN for “virtual” recombined image rendering

  2. Deep-CNN: feature generation

Figure 4: Building blocks for traditional CNNs (left) and ResNet (right)
  • As seen in Figure 4, a ResNet building block adds its initial input (the shortcut) to the output of its stacked layers, so the shortcut gives gradients a direct path when the parameters are updated. As a result, it outperforms traditional deep CNNs, which are known to suffer from higher testing errors because the gradient tends to vanish as the number of layers increases.
Figure 5: Architecture of ResNet
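The difference between the two building blocks in Figure 4 comes down to one addition. The toy sketch below works on plain lists of numbers; `tiny` is a hypothetical stand-in for a stack of convolutional layers and is not taken from the paper.

```python
def plain_block(x, layer):
    """Traditional block: the output is only the transformed input."""
    return layer(x)


def residual_block(x, layer):
    """ResNet-style block: add the input (the shortcut) to the layer output."""
    return [f + xi for f, xi in zip(layer(x), x)]


def tiny(x):
    """Hypothetical stand-in for stacked layers with a small response."""
    return [0.25 * value for value in x]


x = [1.0, -2.0, 4.0]
print(plain_block(x, tiny))     # [0.25, -0.5, 1.0]
print(residual_block(x, tiny))  # [1.25, -2.5, 5.0]
```

Even when the stacked layers contribute little, the residual block still passes the input through, which is the intuition behind why very deep ResNets remain trainable.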


Prediction is done with gradient boosting trees. Boosting is a machine learning ensemble meta-algorithm that aims to reduce bias and variance. It converts weak learners into a strong one by weighting each training sample inversely to how well the previous weak learners performed on it.


Because the dataset is limited in size, the model’s performance is measured with leave-one-out cross-validation, which makes full use of the available data. The performance metrics are accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC).
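As a rough sketch of this evaluation scheme, the code below runs leave-one-out cross-validation and computes accuracy, sensitivity, and specificity. The classifier here is a hypothetical 1-nearest-neighbour on a single feature, used only as a stand-in so the example runs without extra libraries; the paper's classifier is gradient boosting trees, and AUC is not computed here.

```python
def leave_one_out(n):
    """Yield (train_indices, test_index), holding out each sample once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


def nearest_neighbour(train_x, train_y, x):
    """Hypothetical stand-in classifier: label of the closest training point."""
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[best]


def loocv_predictions(xs, ys, fit_predict):
    """Predict every sample with a model that never saw that sample."""
    preds = []
    for train, test in leave_one_out(len(xs)):
        train_x = [xs[j] for j in train]
        train_y = [ys[j] for j in train]
        preds.append(fit_predict(train_x, train_y, xs[test]))
    return preds


def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return (tp + tn) / len(pairs), tp / (tp + fn), tn / (tn + fp)


# Toy data: one made-up feature per case, 1 = cancer, 0 = benign.
xs = [0.1, 0.2, 0.45, 0.8, 0.9]
ys = [0, 0, 1, 1, 1]
preds = loocv_predictions(xs, ys, nearest_neighbour)
print(binary_metrics(ys, preds))
```

With n samples this trains n models, which is affordable only because the dataset is small; that is the same trade-off that makes leave-one-out attractive in this setting.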

Figure 6: Receiver operating characteristic curve for the model using FFDM images only versus FFDM and virtual recombined images
Table 4: Classification Performance of Experiment Using FFDM Imaging vs FFDM + Recombined imaging


In conclusion, the authors report that, based on their review of the literature, no existing study had investigated the potential of contrast-enhanced digital mammography (CEDM) imaging using a deep CNN. Their second contribution lies in addressing the limited accessibility of CEDM by developing SD-CNN to improve breast cancer diagnosis using full-field digital mammography (FFDM).