Automatic model for cervical cancer screening based on convolutional neural network: a retrospective, multicohort, multicenter study

Background The incidence rates of cervical cancer in developing countries have been steeply increasing while the medical resources for prevention, detection, and treatment are still quite limited. Computer-based deep learning methods can achieve high-accuracy fast cancer screening. Such methods can lead to early diagnosis, effective treatment, and hopefully successful prevention of cervical cancer. In this work, we seek to construct a robust deep convolutional neural network (DCNN) model that can assist pathologists in screening cervical cancer. Methods ThinPrep cytologic test (TCT) images diagnosed by pathologists from many collaborating hospitals in different regions were collected. The images were divided into a training dataset (13,775 images), validation dataset (2301 images), and test dataset (408,030 images from 290 scanned copies) for training and effect evaluation of a faster region convolutional neural network (Faster R-CNN) system. Results The sensitivity and specificity of the proposed cervical cancer screening system was 99.4 and 34.8%, respectively, with an area under the curve (AUC) of 0.67. The model could also distinguish between negative and positive cells. The sensitivity values of the atypical squamous cells of undetermined significance (ASCUS), the low-grade squamous intraepithelial lesion (LSIL), and the high-grade squamous intraepithelial lesions (HSIL) were 89.3, 71.5, and 73.9%, respectively. This system could quickly classify the images and generate a test report in about 3 minutes. Hence, the system can reduce the burden on the pathologists and saves them valuable time to analyze more complex cases. Conclusions In our study, a CNN-based TCT cervical-cancer screening model was established through a retrospective study of multicenter TCT images. This model shows improved speed and accuracy for cervical cancer screening, and helps overcome the shortage of medical resources required for cervical cancer screening.

. About four out of five medical cases occur in developing countries, and especially in China and India. Human papillomavirus (HPV) infections have been established as a main cause of cervical cancer [5][6][7]. Based on a survey of 30,207 cases at 13 Chinese medical centers, the HPV infection rate in China was found to be about 17.7% [8].
The incidence of cervical cancer can be reduced through early screening [9,10]. For severe cervical cancer types, screening can help in avoiding cancer-related deaths [11]. In 2012, the National Comprehensive Cancer Network (NCCN) published clinical practice guidelines for cervical cancer screening. These guidelines show that the combination of HPV testing and cytology has been used as a primary option for cervical cancer screening in women [12].
In the United States, the number of deaths from cervical cancer has decreased since the implementation of widespread cervical cancer screening from 2.8 to 2.3 deaths per 100,000 women between 2000 and 2015 [13]. However, cervical cancer screening is still a problem with low diagnostic sensitivity and specificity, especially in developing countries. While cervical cancer goes undetected in some patients, it is overtreated in others [14]. In short, it is important to improve the efficiency and accuracy of cervical cancer screening, especially in China and other developing countries.
In the 1990 s, ThinPrep cytologic test (TCT) technology was approved by the Food and Drug Administration (FDA) for clinical applications. Compared with the pap smear, this technique reduced the effects of mucus, blood, and inflammation in cervical cancer screening. It can also effectively reduce the dissatisfaction rate of cell screening by maintaining or improving the sensitivity of disease screening [15][16][17]. According to Chinese guidelines for diagnosis and treatment of cervical cancer published in 2018, the recommended preliminary screening method is TCT combined with HPV screening [18].
The diagnostic results of the TCT were classified using the Bethesda system [19]. This system represents an artificial screening method for determining the precancerous cervical lesions by microscopic assessment of the changes in cytoplasm, nuclear shape, and the fluid base color [18]. According to the Bethesda system, the smears are mainly divided into the following categories: Intraepithelial Lesion/Malignant Lesion cell (NILM), atypical squamous cells of undetermined significance (ASCUS), low-grade squamous intraepithelial lesion (LSIL), and high-grade squamous intraepithelial lesion (HSIL). Above all, this method requires trained pathologists to make a correct diagnosis at the cellular recognition stage.
In developing countries with a large population, like China, the following three problems are mostly observed in cervical screening that reduces its diagnostic accuracy: (1) identification of cells on TCT smear images is subjective and mostly relies on the experience and technology of the pathologists. It has been reported that the increase in pathologists' workload also leads to a reduced diagnostic sensitivity [20]. (2) The correct diagnosis of TCT images requires much of the pathologists' time. However, there is a significant shortage of pathologists in China. Indeed, 68.4% of the pathologists who provide cervical cancer screening services have only primary technical qualifications or no qualifications at all. (3)The gap between urban and rural medically trained pathologists has been increasing [21]. Moreover, it takes more time and effort for human eyes to read images, the number of readings per day is also limited [22].
With the rapid development of science and technology, the emergence of artificial intelligence has provided more sensitive and effective solutions to many medical imaging problems [23,24]. In recent years, deep learning algorithms have helped to identify patterns in classifying and quantifying medical images [25]. Significant progress has been made in the computer-aided diagnosis of medical images, such as the skin cancer diagnosis from melanoma images [25,26] and the detection of pulmonary nodules via computerized tomography (CT) images [27,28] and image-base detection of hepatocellular carcinoma using multiphasic magnetic resonance imaging (MRI) [29]. The concept of deep learning originated from the area of artificial neural network (ANN), which was helpful to extracting appropriate features from medical images. Among them, one of the leading neural networks for image recognition is a convolutional neural network (CNN) [30]. At present, this method is applied in most deep learning applications to achieve good detection results [31,32].
We have developed a target recognition system based on a CNN model to assist pathologists in the diagnosis of early cervical cancer screening. This method is based on faster region convolutional neural networks (Faster R-CNN) [33] to locate and identify lesion cells in the cervical TCT images, extract features, and automatically optimize them through updating the convolutional kernel parameters in the cyclic training process [34]. Convolutional neural networks have the advantages of automatically learning features during training, and possessing strong inferential capabilities in comparison with conventional image processing methods in complex scenes [35,36].
The trained faster RCNN model can automatically identify target regions and reduce the false detections of other non-target regions. As well, this model can generate region proposals on the feature map of the target region and its adjacent areas [33,37]. The region proposal features can distinguish between target and non-target regions through merging layers and classifiers. It can learn the common characteristics of cells of the same lesion grade and the characteristic differences among different lesion grade cells. The identification of generalized performance was reported to be better than performance of conventional image processing methods under the conditions of cell deformation and cell superposition.
In this research project, we have successfully developed an auxiliary diagnosis model for the diagnosis of early cervical cancer. This model separated the positive or negative images effectively, minimizing the rate of missed diagnosis. It also reduced pathologists' workload by saving their time and improved the accuracy of cervical cancer screening.

Methods
The analysis pipeline for the detection of cervical cancer from the TCT smear using the Faster R-CNN is depicted in Fig. 1. TCT images of cervical brush smears were randomly selected and collected from outpatients at the above nine hospitals. All TCT images were histopathologically diagnosed as with different grades of a lesion by at least two local experienced pathologists according to the Bethesda system. This diagnosis was used as the gold standard for subsequent experiments. The images were sent to Tongji Hospital affiliated to Tongji Medical College of Huazhong University of Science and Technology, and the abnormal cells were examined and labeled by a pathologist.

Image preparation and preprocessing
A pathologist from Tongji Hospital examined and preprocessed all images. High-quality images that met the requirements were selected. The inclusion criteria were as follows: The cells were evenly spread on the slide, the field of vision was clear, and the number of cells in the slide were moderate, with fewer overlapping cells.
Images of non-JPG formats (such as scanned copies from slides) couldn't be directly marked by pathologists because of the extra-large image size. So, these images were sent to Zhejiang University Rui Medical Artificial Intelligence Research Center for further processing and segmentation. One slide image was divided into 1407 single-field images with enlarged cervical cytology images using seamless slider technology. Figure 2 displays the process of the seamless slider technique, with a scanned image size of 83,712 × 86,784 × 3. In this work, the eyepiece has a 10X magnification, while the objective lens The primary purpose of lesion labeling was to train a model to recognize the abnormal cells. After that, all images with abnormal cell labeling were sent to Zhejiang University Rui Medical Artificial Intelligence Research Center for model training.

Image processing Image patching
Cervical cytology has a more diverse form of color expression compared with the standard red and blue. Because of the differences in the staining methods and the severity of cervical cancer, the cells show varied and complicated cell morphology patterns. The image analysis flow chart for the Faster R-CNN system for the detection of cervical cancer from the TCT smear is depicted in Fig. 1. In this paper, we created three different datasets. The first one was named as the training dataset with 13,775 images, of which 5414, 4292, and 4069 were ASCUS, LSIL, and HSIL images, respectively. These images showed 200X and 400X magnification levels. We used this dataset to train the Faster R-CNN model. The second dataset was named as the validation dataset with 2301 images, of which 1000 images had a 200X magnification and 1301 images had a 400X magnification. The detailed statistics of each kind of picture scan piece or field of view are presented in Table 1.

Image enhancement
Because the data samples in this study are limited, we employ image processing methods to simulate and transform the color and shape of the sample images in order to increase the data diversity and better simulate the real data variability. The image processing methods were implemented by an image rollover, changing the brightness, saturation, or adjusting the color deviation  within the original images. Figure 3 illustrates some of the images that are simulated or transformed during the enhancement process.

Network model
The processed cervical cytological images, 13,775 pictures, were used as training samples, and the labeled boxes and the pathological changes in irregular cells within the images were used as training labels. The deep CNN architecture contains a convolution kernel, an activation function and a pooling function of 3 × 3 and 1 × 1 sizes. During training, this architecture is used to extract image features and get feature maps. The feature map is fed into the region proposal network, and the IoU metric is used to select the region proposals for real and interfering targets. The proposal box was used to select the feature map in the feature graph of the previous step and process it through the grid pooling layer to obtain a pooled feature map. Then, the real target and interference target features were identified contained in the pooled feature graph by classification network. The deviation of the proposal box and ground-truth box was calculated. The network was optimized by backpropagation combining classification error and proposal box deviation, to achieve higher classification accuracy and more accurate localization of region proposals.

Loss function
During training, an objective loss function was used to measure the model localization and detection performance. This function is defined as: (1) where, i is the subscript of an anchor in a mini-batch, p i is the prediction possibility of an object in anchor i, and p gt i is the ground-truth tag. If the anchor is positive, the ground-truth label p gt i is 1, otherwise, it is 0. t i represents four parameterized coordinates of bounding-box of the predicted target region, (anchor-based transformation), t gt i is the ground-truth box corresponding to this positive anchor (transformation based on anchor) L cls represents classification loss. L reg stands for regression loss.

Regression loss
is the robust smooth-L1 loss function. p gt i indicates that regression loss is only activated when the value of positive anchor ( p gt i ) is equal to unity. Therefore, the regression adopted the calculation stable smooth-L1 method and used it to calculate the deviation between the predicted target coordinates and ground-truth. Then it minimized the error to make the predicted target position closer to ground-truth to achieve an accurate model positioning. The smooth-L1 function is used to calculate the distance between the coordinates of the predicted and groundtruth targets. The smooth-L1 loss is defined as:

Classification loss
There were two ways to classify the training samples. One was to divide the samples into normal or abnormal ones based on the presence of diseased cells. The classification loss uses cross-entropy, which is a binary classifier (only in the case of yes and no). For each category, the two probabilities predicted by the model were p and 1-p, respectively, and the classification loss is defined as: The other way was to divide the microscopic field of view into LSIL, HSIL, and ASCUS images based on the lesion degree. For the second classification method, the algorithm encoded multiple categories as a continuous sequence with 0 starting and 1 interval. The classification loss is as follows: where, N represents the number of categories, i.e., the types of lesions, y c values 0 or 1, if the sample category is the same as category C, the value is 1, otherwise, it is 0. p c represents the probability that the prediction samples belong to category C.

Model identification
For the feature identification of the same area, the voting method was used to select the category. The number of probability values that each model could output was the same as the number of categories to be identified, i.e., each category corresponded to a probability value. The probability value is directly proportional to the possibility. The higher the probability value is, the higher the possibility value is. The probability outputs of the two models were averaged, and then the category corresponding to the maximum probability value was selected as the output category. The average value calculated by the voting algorithm is as follows: where, i is the serial number of the category, N is the number of categories to be identified, p net m (i) is the probability value of category i in the N probability values output by the M th model, and p ave(i) is the average probability value of the model in category i, and m is the number of models. The category corresponding to the maximum probability value was selected as the final recognition category.

Statistical analysis
The receiver operating characteristic (ROC) curve was used to evaluate the ability of the auxiliary diagnosis model of cervical cancer to distinguish between the negative and positive images and to calculate the Area Under ROC Curve (AUC) values. AUC is a more objective evaluation index to measure the advantages and disadvantages of the two-classification model. The ROC curve and AUC value are calculated by Python (version 3.6) and scikit -learn (version 0.20.0).

Results
The samples were divided into normal, ASCUS, LSIL, and HSIL cells as well as squamous cell carcinoma (SqCa). Performance indicators included sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), defined as follows: where, TN, TP, FN, and FP represent the numbers of true-negative, true-positive, false-negative, and falsepositive images, respectively. Table 2 shows the confusion matrix of the classification results of the TCT technique used for the test data composed of 290 scans of complete slides. Of the 112 normal slides, 39 were correctly classified as normal and 73 were incorrectly classified as abnormal. Out of 178 abnormal slides, a total of 177 slides were correctly classified. This classifier resulted in a sensitivity of 99.4% and a specificity of 34.8%. The diagnostic test performance was better quantified using the positive predictive value (PPV) and the negative predictive value (NPV). The predictive value represented the probability that an experiment could make a correct judgment and the actual clinical significance of the results. In our system, the PPV and NPV were 70.8 and 97.5%, respectively. Furthermore, we constructed the ROC curve which is a comprehensive indicator of the continuous variables of sensitivity and specificity. The proposed system had an AUC of 0.67 for the ROC curve in Fig. 4.

Evaluation of lesiondegree classification
Our tests were performed on 1975 images generated after slide segmentation as shown in Fig. 2. This system didn't provide a direct diagnosis but provided a reference for the pathologists according to the cells with the highest degree of the lesion. Table 3 shows the confusion matrix used to test the results of TCT classification according to lesion degree. Of the 606 ASCUS images, 541 were correctly classified. Also, 422 of the 590 LSIL images and 576 of the 799 HSIL images were classified correctly. Therefore, the sensitivities for the ASCUS, LSIL, and HSIL were 89.3, 71.5, and 73.9%, respectively.

Discussion
In this paper, we have proposed a cervical cancer screening method based on CNN, which could successfully extract features from TCT images. The pathological condition of glass slides was diagnosed by dividing the images scanned by glass slides into a large number of single visual field images to establish the model, and by integrating the results of the visual field processing. In general, this model could distinguish between negative and positive cells, with a recognition sensitivity of 99.4%. Proper diversion of negative and positive marker images was also the focus of developing the cervical cancer-assisted screening system. This model achieved a 99.6% accuracy to exclude negative images (i.e. normal or healthy populations), and the remaining suspicious images were analyzed and confirmed by pathologists.
The specificity of the system was only 34.8%, and hence many normal cells were misclassified as abnormal ones. However, the key goal of this model was to ensure that there were no false-negative images as much as possible, and thus reduce the rate of missed diagnosis. The PPV was 70.8%, which meant that the test-positive patients  Although the low specificity means that more than half of the normal women may be classified as suspicious cases for further diagnosis, the system had a low false-negative rate of 0.06%, which indicated that the rate of missed diagnosis greatly decreased and it could effectively identify suspicious cases. This also helps pathologists improve their disease diagnosis efficiency, so they can spend more time on analyzing complex cases and ensuring that diseases are diagnosed quickly and accurately. Although cervical cancer screening has become very popular, there is an imbalance in medical resources and untrained pathologists in some parts of China. Our deep learning model may help solve the above problems, and lead to a more accurate and effective diagnosis. Moreover, this model can be re-classified according to the abnormal cell details. The sensitivity of ASCUS, LSIL, and HSIL was 89.3, 71.5, and 73.9%, respectively. According to the degree of a lesion, the classification of cervical cancer cells indicated that the model has good sensitivity to identify the degree of a lesion in a single field. The diagnosis of the TCT smear is a synthesis of the results of a large number of the single-field images, and our system was able to assist the specialists in locating and identifying different degrees of diseased cells quickly, allowing them to devote more time on more complex cases.
To our knowledge, only few studies on automatic diagnosis of cervical cancer were based on ThinPrep cytologic test (TCT) images instead of pap smears. This is mainly because TCT images are relatively difficult to analyze compared to the pap smear images. More specifically: (1) the TCT images do not currently have a good database, and the collection of such an image database is more difficult compared to the Harlve database of pap-smear images [38,39]; (2) TCT images have many overlapping cells, which are not as easy to analyze as the single cells in pap-smear images [40,41]; (3) the color and quality of TCT images obtained from different medical institutions may vary greatly. Therefore, these systems are poorly adapted for the proposed system, we trained it with data from hospitals at different levels in different regions of north and south China, which ensured the high adaptability of the system. Theoretically, the CNN system has good robustness and low dependence on smears staining and preparation. The system can adapt to the images from different hospitals and reduce the secondary examination caused by low-quality images. Future research is needed for further verification in other clinical applications. A sensitivity screening system may help pathologists to quickly diagnose some positive lesions with a computational speed of 70 − 10 milliseconds per single field. Therefore, this system can relieve the burden of the pathologists to some extent. In addition, the system not only reduces the influence of appellate factors, and ensures consistency in diagnosing images of different pathologists from different hospitals, but also increases the possibility of remote diagnosis. Meanwhile, because the automatic cervical cancer diagnosis system is mainly based on CNN models, the construction and diagnosis costs are very low.
However, our proposed system also has several major limitations. Firstly, Due to a high number of overlapping and adhesion cells, the differences between diseased cells and normal cells were not distinct. It was difficult for the system to learn the characteristics of cells, and hence its accuracy was relatively low. Secondly, the collected dataset didn't have enough samples of some pathological types such as atypical squamous cells-cannot exclude high-grade squamous intraepithelial lesion (ASC-H) or atypical glandular cells (AGC). This scarcity can't lead to effective training and accurate classification. Therefore, we plan to collect more samples and enlarge our dataset in order to arrive at more valuable findings. Ultimately, it is difficult to obtain complete TCT image information from outpatients. Future research should focus on more relevant patient information, such as age, HPV infection, or vaginal inflammation. Similarly, large amounts of data will be needed to design prospective follow-up studies to enhance the stability and accuracy of the model. In addition, with the discovery of new biomarkers, tumor diagnosis based on molecular imaging may be explored as a viable path [42][43][44].

Conclusions
The CNN-based TCT cervical cancer cell classification system proposed in this paper can effectively exclude negative smear samples, and identify the suspicious population in cervical cancer screening. It also showed high sensitivity and excellent performance in the identification of cervical cancer screening, which can save time for pathologists and provide an excellent secondary prevention effect. In comparison to the conventional diagnostic methods, this system has good robustness, objectivity, and small computational cost. Meanwhile, our system provides a possibility for online diagnosis of TCT images and is expected to contribute to the construction of primary medical care.