Breast Cancer Detection Using CLAHE-CNN Architecture

DOI : 10.17577/ICCIDT2K23-212


Avani Manesp

1 Student, Dept. Of Computer Science & Engineering, Mangalam College of Engineering, India,

Abijith Biju2

2 Student, Dept. Of Computer Science & Engineering, Mangalam College of Engineering, India,

Amal Mohanan3

3 Student, Dept. Of Computer Science & Engineering, Mangalam College of Engineering, India,

Jayakrishnan B4

4 Assistant Professor, Dept. of Computer Science & Engineering, Mangalam College of Engineering, India,

Abstract: Breast cancer has become one of the most lethal illnesses affecting women worldwide. Researchers around the world are working on breast cancer screening tools based on medical imaging, and the rapid growth of deep learning has drawn considerable attention in the medical imaging field. In this project, we use a CLAHE-CNN architecture in which a microscopic biopsy image is passed through a convolutional neural network that identifies cancerous features in the image. The process covers four modules: pre-processing, segmentation, feature extraction, and classification. Pre-processing applies Contrast Limited Adaptive Histogram Equalization (CLAHE) and a Laplacian filter, which provide sharper images for segmentation. Feature extraction and classification are performed by LeNet-5, a variant of the convolutional neural network (CNN). The resulting output is displayed as a test result. The suggested CLAHE-CNN architecture using LeNet-5 achieves an accuracy of 90.3%. We believe that the suggested approach will be of considerable value to healthcare practitioners in identifying breast cancer patients early and may support a prompt diagnosis.

Keywords: Mammography, breast cancer detection, multi-instance classification, deep convolutional neural network.

  1. INTRODUCTION

    Cancer is one of the most common diseases in India and a leading cause of mortality, with about 0.3 million deaths per year. The risk of developing the disease is rising because of lifestyle changes such as increased tobacco use, poorer dietary habits, and lack of physical activity. At the same time, recent combined advances in medicine and engineering have improved the prospects of curing cancer.

    Breast cancer is currently the most common cancer globally, accounting for 12.5% of all new annual cancer cases worldwide. It is also the most life-threatening disease among women after lung cancer. Breast cancer is categorized into various types according to the appearance of the cells under a microscope, and it can be treated effectively if it is detected early. Thus, the availability of proper screening methods is important for detecting the initial symptoms of breast cancer. Various imaging techniques are used for screening; the popular approaches are mammography, ultrasound, and thermography. Mammography is one of the most significant methods of early detection. Ultrasound, or diagnostic sonography, is widely used because mammography is not effective for dense breasts. However, small masses can be missed by radiography, and thermography may be more effective than ultrasound in diagnosing smaller cancerous masses.

    Because medical images suffer from intrinsic difficulties such as meager contrast, noise, and details that the eye cannot readily appreciate, tools have been developed to process and improve them. Artificial intelligence (AI), machine learning (ML), and convolutional neural networks (CNNs) are now among the fastest-growing areas of the healthcare industry. AI and ML research develops technological systems that solve complex tasks while reducing the need for human intelligence. Deep learning (DL), which is part of the machine learning family, is based on artificial neural networks. DL architectures such as deep neural networks (DNNs), recurrent neural networks (RNNs), deep belief networks (DBNs), and CNNs are applied in areas such as computer vision, audio recognition, speech recognition, social network filtering, natural language processing, machine translation, drug design, bioinformatics, medical image analysis, materials inspection, histopathological diagnosis, and board game programs. These technologies, in particular DL algorithms, can be applied to improve the diagnostic accuracy and efficiency of cancer detection.

    On the other hand, digital pathology (DP) digitizes histology slides to produce high-resolution images. These digitized images are used for detection, segmentation, and classification through image analysis techniques. Deep learning (DL) with CNNs may require extra steps, such as digital staining, to learn the patterns needed for image classification.

    Here we use a hybrid architecture of CLAHE and a deep convolutional neural network to classify microscopic breast images. Histopathology biopsy images are used for accurate detection of cancer. Microscopic biopsy images are characterized by the presence of isolated cells and cell clusters. In histopathology, the cancer detection process normally consists of categorizing the biopsy image as cancerous or noncancerous.

  2. BACKGROUND

    The study of cancer, called oncology, is the work of countless doctors and scientists around the world whose discoveries in anatomy, physiology, chemistry, epidemiology, and other related fields made oncology what it is today. Technological advances and the ever-increasing understanding of cancer make this field one of the most rapidly evolving areas of modern medicine. The growth in our knowledge of cancer biology has led to remarkable progress in cancer prevention, early detection, and treatment. Scientists have learned more about cancer in the last two decades than had been learned in all the preceding centuries. This does not change the fact, however, that all scientific knowledge builds on the knowledge already acquired through the hard work and discoveries of our predecessors.

    Breast cancer ranks first in global incidence and mortality among female cancers. Each year, breast cancer accounts for 24.2% of female cancer patients worldwide and 15% of female cancer deaths. The situation in China is more severe: incidence and mortality are increasing every year, and the proportion of young women in the affected population is also growing. Although the incidence of breast cancer is increasing year by year, the number of deaths due to breast cancer in developed regions such as Europe and the United States has begun to show a downward trend.

    "Early detection, early treatment" is the most important way to reduce breast cancer mortality while the cause of breast cancer remains uncertain. Cancer detection has always been a major issue for pathologists and medical practitioners in diagnosis and treatment planning. The manual identification of cancer from microscopic biopsy images is subjective in nature and may vary from expert to expert depending on their expertise and on other factors, including the lack of specific and accurate quantitative measures for classifying biopsy images as normal or cancerous. The automated identification of cancerous cells from microscopic biopsy images helps to alleviate these issues and provides better results if biologically interpretable and clinically significant feature-based approaches are used to identify the disease.

  3. PROBLEM DEFINITION

    According to the Global Cancer Statistics 2018 report, among females, breast cancer is the most frequently diagnosed cancer in the vast majority of countries (154 out of 185) and is also the leading cause of cancer death in over 100 countries. Even in the United States, a country with a developed healthcare system, breast cancer has the highest number of new cases of all kinds of cancer and is also the second most common cause of death from cancer. It has been verified that treating early-stage breast cancer can save lives. However, detecting early-stage breast cancer is a challenging task. The manual identification of cancer from microscopic biopsy images is subjective in nature. It may vary from expert to expert depending on their expertise and on other factors, including the lack of specific and accurate quantitative measures for classifying biopsy images as normal or cancerous. The mainstay method of breast cancer screening and diagnosis is mammography. A single mammography procedure usually produces multiple images, and a radiologist screens all of them one by one. The task is time-consuming and usually requires an expert radiologist to do it well. For the automatic classification of breast cancer on mammograms, a generalized regression artificial neural network (ANN) has been trained. An ANN is a machine learning algorithm suitable for different tasks including classification, prediction, and visualization. However, using an ANN for breast cancer detection causes difficulties in image classification and gives lower performance. During image classification, 2-dimensional images need to be converted to 1-dimensional vectors. This greatly increases the number of trainable parameters, because every pixel is connected to every neuron of the first fully connected layer. More trainable parameters require more storage and processing capability; in other words, the approach is expensive. Also, an existing system with a CNN architecture uses histogram equalization (HE), which produces over-contrasted images that make it difficult to analyze the cell structure. Hence an improved and more efficient system is required for detecting breast cancer with more accuracy and precision.
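The cost argument about flattened images can be made concrete with a little arithmetic. The sketch below compares the number of trainable parameters of a fully connected layer fed by a flattened image with that of a small convolutional layer. The image sizes, the 120-unit layer, and the six 5×5 filters are illustrative assumptions, not values reported in this paper.

```python
def dense_params(height, width, channels, units):
    """Weights plus biases of a fully connected layer fed by a flattened image."""
    inputs = height * width * channels
    return inputs * units + units


def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a convolutional layer with square kernels."""
    return kernel * kernel * in_channels * filters + filters


# Illustrative comparison for RGB images of growing size (assumed values).
for side in (32, 64, 128, 256):
    dense = dense_params(side, side, 3, units=120)
    conv = conv_params(5, 3, filters=6)
    print(f"{side}x{side} RGB: dense={dense:,} conv={conv:,}")
```

The dense count grows with the number of pixels, while the convolutional count depends only on the kernel size and the number of filters, which is the motivation for using a CNN rather than a plain ANN on images.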

  4. RELATED WORK

      1. Color Image Enhancement using Laplacian Filter and Contrast Limited Adaptive Histogram Equalization

        Image enhancement is the technique of improving the perception of information in images to provide better visualization. Images suffer from noise, low resolution, and low contrast, which degrade their quality, and hence enhancement is a must. There are two types of image enhancement techniques: (i) spatial domain and (ii) frequency domain. Image enhancement has various applications in medical imaging, aerial imaging, satellite images that suffer from various weather conditions, digital camera applications, underwater imaging, remote sensing, and forensic labs where samples are enhanced for evidence collection, identification, etc. Lu Wang and Cheolkon Jung proposed a contrast enhancement technique that preserves tone in the images by making use of rational tone mapping and constrained optimization. Other image enhancement techniques discussed in the literature include fuzzy-based enhancement, the Wiener filter, non-linear transfer functions, contrast stretching using neighborhood dependency, and tone reproduction. In the reviewed paper, color image enhancement using the Laplacian filter and CLAHE is proposed. The original RGB color image is converted to HSV and passed through a Laplacian filter processing block to obtain the luminance component of the image. The luminance component is then subjected to CLAHE to obtain an enhanced V component, with contrast stretching applied to S using an appropriate stretching factor. Finally, the enhanced image is converted from HSV back to RGB, giving better PSNR results. Fig 1 shows the proposed flow diagram of the color image enhancement. The merit is the better enhancement of the image; the demerit is that the other steps of image processing are not discussed.

        Fig 1. The proposed flow diagram

      2. Cell Image Segmentation Based on an Improved Watershed Algorithm

    Cell image segmentation has very high practical significance in medical diagnosis. However, cell images suffer from accretive (touching) cells, incoherent cell boundaries, and internal cavities that make segmentation difficult. In the reviewed paper, a watershed algorithm based on the distance transform is proposed to handle images with cell adhesion. First, image enhancement is carried out as pre-processing; then OTSU threshold segmentation is used to roughly segment the image; finally, a watershed algorithm with optimized seed points is adopted for fine segmentation. Experimental results showed that the proposed algorithm effectively solved the problems of cell adhesion and over-segmentation, achieved higher segmentation accuracy than the traditional watershed algorithm, and preserved the cell shape to the maximum extent. Therefore, watershed segmentation based on the distance transform is practical for accretive cell images. In the study of image processing, a watershed is a transformation defined on a grayscale image. The name refers metaphorically to a geological watershed, or drainage divide, which separates adjacent drainage basins. The watershed transformation treats the image it operates upon like a topographic map, with the brightness of each point representing its height, and finds the lines that run along the tops of ridges. There are different technical definitions of a watershed. In graphs, watershed lines may be defined on the nodes, on the edges, or as hybrid lines on both nodes and edges. Watersheds may also be defined in the continuous domain. There are also many different algorithms to compute watersheds. Watershed algorithms are used in image processing primarily for object segmentation, that is, for separating different objects in an image. This allows the objects to be counted or the separated objects to be analyzed further. The merit is that the paper provides details about image segmentation; the demerit is that sufficient information on image processing is not provided.
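The first two stages of the reviewed pipeline, Otsu thresholding followed by a distance transform that supplies watershed seed points, can be sketched in plain Python. This is a minimal illustration and not the reviewed authors' implementation: it assumes cells are brighter than the background, uses a city-block distance for simplicity, and all function names are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance.

    Pixels with value <= t form one class, the rest form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def cityblock_distance(mask):
    """Two-pass city-block distance from each foreground pixel to the background."""
    h, w = len(mask), len(mask[0])
    far = h + w  # larger than any possible distance inside the image
    dist = [[far if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    for y in range(h):                      # forward pass: look up and left
        for x in range(w):
            if dist[y][x]:
                if y > 0:
                    dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
                if x > 0:
                    dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    for y in range(h - 1, -1, -1):          # backward pass: look down and right
        for x in range(w - 1, -1, -1):
            if dist[y][x]:
                if y < h - 1:
                    dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
                if x < w - 1:
                    dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist
```

Local maxima of the distance map lie near cell centres, so they are natural candidates for the seed points from which the watershed flooding starts.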

  5. METHODOLOGY

    A. Proposed System

    The detection and classification of cancer from microscopic biopsy images are challenging tasks because an image usually contains many clusters and overlapping objects. The stages of the proposed methodology are the enhancement of microscopic images, segmentation of background cells, feature extraction, and finally classification. For the enhancement of the microscopic biopsy images, the contrast-limited adaptive histogram equalization approach is used, and for the segmentation of background cells, Meyer's watershed algorithm is used. In the feature extraction phase, various biologically interpretable and clinically significant features are extracted from the segmented images, including shape and morphology-based features, gray-level texture features, color-based features, and color gray-level texture features.

    The proposed system aims to develop a framework and a software tool for the automated detection and classification of cancer from microscopic biopsy images using a convolutional neural network. A hybrid method is proposed in which Contrast Limited Adaptive Histogram Equalization (CLAHE) together with the Laplacian filter is used as a pre-processing step for image enhancement, followed by segmentation. LeNet-5, a variant of the Convolutional Neural Network (CNN), is used for the classification of the images.

    CNN is a deep learning algorithm that is usually used to process spatial data such as images. A CNN can comprehend spatial information gradually, from low-level to high-level patterns, in a way inspired by the workings of the human nervous system. CNN operation has high complexity and requires a large amount of data and execution time during training because it is a hierarchical feed-forward operation in which the output of each process is used as the input of the following process. It is a complex mathematical operation that generally consists of convolutional layers, down-sampling (pooling) layers, activation functions, and a fully connected layer.

    Contrast-limited adaptive histogram equalization (CLAHE) is a developed version of adaptive histogram equalization (AHE) that increases contrast in the image by widening its intensity range, that is, by stretching out the most frequent intensity values. In CLAHE, the image is broken down into sub-images called tiles or blocks, and histogram equalization is performed on each tile. Histogram bins that exceed a certain value (the clip limit), which would cause the image to be over-amplified, are clipped, and the excess pixels are redistributed across the histogram, so that the contrast in the image becomes more visible.
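The clipping and redistribution step that distinguishes CLAHE from plain AHE can be illustrated on a single tile histogram. The following is a minimal sketch under simplifying assumptions: the excess is redistributed in one pass (so a bin may end up slightly above the limit), the clip limit is an arbitrary illustrative value, and the function names are hypothetical rather than taken from any library.

```python
def clip_histogram(hist, clip_limit):
    """Clip each bin at clip_limit and spread the excess evenly over all bins."""
    excess = sum(max(count - clip_limit, 0) for count in hist)
    clipped = [min(count, clip_limit) for count in hist]
    share, remainder = divmod(excess, len(hist))
    clipped = [count + share for count in clipped]
    for i in range(remainder):       # hand out leftover counts one bin at a time
        clipped[i] += 1
    return clipped


def equalization_lut(hist):
    """Map each grey level to a new level using the cumulative histogram."""
    total = sum(hist)
    levels = len(hist)
    lut, running = [], 0
    for count in hist:
        running += count
        lut.append((levels - 1) * running // total)
    return lut


# A tiny 4-level tile histogram dominated by one grey level.
tile_hist = [0, 12, 0, 4]
print(equalization_lut(tile_hist))                      # plain equalization
print(equalization_lut(clip_histogram(tile_hist, 6)))   # contrast-limited
```

Clipping flattens the dominant peak, so the resulting mapping has a gentler slope around the most frequent grey level and amplifies noise less than unclipped equalization.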

    LeNet-5 CNN architecture is made up of 7 layers: 3 convolutional layers, 2 subsampling layers, and 2 fully connected layers. The input layer comes first, but it is generally not counted as a layer of the network because nothing is learned in it. The input layer is built to take in 32×32 images, and these are the dimensions of the images passed into the next layer. Those who are familiar with the MNIST dataset will be aware that its images have dimensions 28×28. To meet the requirements of the input layer, the 28×28 images are padded to 32×32. In the original LeNet-5 work, the grayscale pixel values were normalized from the range 0 to 255 to values between -0.1 and 1.175. The reason for normalization is to give the batch of images a mean of approximately 0 and a standard deviation of approximately 1, which reduces the training time.

    Fig 2. The system architecture of the proposed system
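The layer dimensions of LeNet-5 follow from simple size arithmetic. The sketch below traces the feature-map side length through the network and applies the input normalization described above. The 5×5 kernels and 2×2 subsampling windows are those of the original LeNet-5 design and are an assumption here, since this paper does not list its kernel sizes; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Side length of a feature map after a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Side length of a feature map after square subsampling."""
    return (size - window) // stride + 1


def lenet5_sizes(input_size=32):
    """Trace the feature-map side length through C1, S2, C3, S4 and C5."""
    sizes = [("input", input_size)]
    size = input_size
    for name in ("C1", "S2", "C3", "S4", "C5"):
        size = conv_out(size, 5) if name.startswith("C") else pool_out(size)
        sizes.append((name, size))
    return sizes


def padding_per_side(image_size, target_size):
    """Pixels to add on each side to grow image_size to target_size."""
    return (target_size - image_size) // 2


def normalize(pixel, low=-0.1, high=1.175):
    """Map a grey value in 0..255 linearly onto the range [low, high]."""
    return low + (high - low) * pixel / 255.0


print(lenet5_sizes())            # side length after each layer
print(padding_per_side(28, 32))  # padding needed for MNIST-sized images
```

After C5 the feature maps are 1×1, which is why the remaining two layers can be fully connected.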

  6. MODULES

      1. PRE-PROCESSING

        Pre-processing is done using two methods: Contrast Limited Adaptive Histogram Equalization (CLAHE) and the Laplacian filter.

        Contrast Limited Adaptive Histogram Equalization (CLAHE)

        CLAHE is a form of adaptive histogram equalization in which the amplification of contrast is limited, so as to minimize the amplification of noise. Histogram equalization is an image processing technique that adjusts contrast using the histogram of pixel values. It increases the global contrast of images, especially when the image is represented by a narrow range of intensity values, and it is used to bring out detail within an image. Histogram equalization is considered effective when the histogram of the image is restricted to a particular region. It might not work well where there are large intensity variations and the histogram covers a large region, i.e. both bright and dark pixels are present. To solve this, adaptive histogram equalization is used. In this technique, the entire image is divided into small individual blocks called tiles, and histogram equalization is then performed in each of these tiles.
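Tile-wise equalization can be sketched as follows. This is a simplified illustration of adaptive histogram equalization, not the implementation used in this work: it equalizes each tile independently and omits both the clip limit and the interpolation between neighbouring tiles that practical CLAHE implementations use to hide tile borders. The tile size and function names are assumptions.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer grey levels."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)   # count of the darkest level present
    total = len(pixels)
    if total == cdf_min:                      # flat region: nothing to stretch
        return list(pixels)
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


def equalize_tiles(image, tile):
    """Apply histogram equalization independently inside each tile."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            coords = [(y, x)
                      for y in range(y0, min(y0 + tile, h))
                      for x in range(x0, min(x0 + tile, w))]
            values = equalize([image[y][x] for y, x in coords], 256)
            for (y, x), value in zip(coords, values):
                out[y][x] = value
    return out
```

Because each tile is stretched using only its own histogram, a low-contrast region is enhanced even when the rest of the image already spans the full intensity range.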

        Laplacian Filter

        The Laplacian of an image highlights regions of rapid intensity change and is an example of a second-order, or second-derivative, method of enhancement. It is particularly good at finding the fine detail in an image. Any feature with a sharp discontinuity will be enhanced by a Laplacian operator. The Laplacian is a well-known linear differential operator approximating the second derivative. Laplacian filters are derivative filters used to find areas of rapid change (edges) in images. Since derivative filters are very sensitive to noise, it is common to smooth the image (e.g., using a Gaussian filter) before applying the Laplacian. This two-step process is called the Laplacian of Gaussian (LoG) operation. When using a Laplacian kernel or any similar filter, the output can contain values that are quite large and may be negative, so it is important to use an image type that supports negative values and a large range, and then to scale the output. Alternatively, a scaling factor can be applied to the filter to restrict the range of values. The filtered image is then combined with the original image in order to sharpen it. The filtered image may have to be scaled before the two images are combined, and it may have to be translated by half the width of the convolution kernel in both the x and y directions so that the images register correctly. The enhancement sharpens the edges but also increases noise. If the original image is filtered with a simple Laplacian, the resulting output is rather noisy, and combining this output with the original gives a noisy result. On the other hand, using a larger Gaussian reduces the noise but also reduces the sharpening effect.

        Fig 3. Enhancement of image using Laplacian filter
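Laplacian sharpening can be sketched with a 3×3 kernel in plain Python. This is a minimal illustration under stated assumptions: the 4-neighbour kernel, the replicated border handling, and the `strength` parameter are choices made here for the example, and the Gaussian pre-smoothing step is omitted.

```python
# 4-neighbour Laplacian kernel (negative centre).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def laplacian(image):
    """Filter a grey image with the kernel above, replicating border pixels."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += LAPLACIAN[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc   # signed value: may be negative or exceed 255
    return out


def sharpen(image, strength=1.0):
    """Subtract the scaled Laplacian from the image and clamp to 0..255."""
    lap = laplacian(image)
    return [[min(255, max(0, round(image[y][x] - strength * lap[y][x])))
             for x in range(len(image[0]))]
            for y in range(len(image))]


# A vertical step edge: the dark side gets darker and the bright side brighter.
edge = [[10, 10, 200, 200]] * 3
print(laplacian(edge)[1])
print(sharpen(edge)[1])
```

The signed intermediate result shows why an image type with negative values is needed before scaling, and lowering `strength` corresponds to scaling the filtered image before it is combined with the original.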

        Fig 3 depicts the enhancement of images using the Laplacian filter. Here the original image is sharpened pixel-wise. For the Laplacian filter, the RGB image is converted into HSV, enhanced, and converted back into RGB.
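The RGB to HSV round trip described here can be sketched with Python's standard `colorsys` module. The example enhances only the V (brightness) channel and leaves hue and saturation untouched; the square-root curve is merely a stand-in for the actual enhancement step, and the function name is hypothetical.

```python
import colorsys


def enhance_value_channel(rgb_pixels, enhance):
    """Apply `enhance` to the V channel of each 8-bit RGB pixel."""
    out = []
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        v = min(1.0, max(0.0, enhance(v)))          # keep V inside [0, 1]
        r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
        out.append((round(r2 * 255), round(g2 * 255), round(b2 * 255)))
    return out


# Brighten dark pixels with a square-root curve (placeholder enhancement).
pixels = [(64, 64, 64), (200, 100, 50)]
print(enhance_value_channel(pixels, lambda v: v ** 0.5))
```

Working on V alone changes brightness and contrast without shifting colors, which is why the enhancement is carried out in HSV rather than directly on the RGB channels.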

      2. SEGMENTATION

        The segmentation task is to divide an area of nuclei into groups that have similar properties within each group. It is an important task for the quantitative analysis of histopathological images, in which the cells of interest are mixed with other cell types. However, many researchers have studied the use of texture information from a group of cells or from area-based images instead of a single cell. Examples include the classification of epithelium and stroma in colorectal images and the categorization of viable and non-viable tumors in lung tissue. High-performance results can be obtained without identifying the exact location of cells. In this study, the area-based texture information of BCCI is of interest. Therefore, the images are cropped to a smaller size before entering the classification process.

        Meyer's Flooding Algorithm

        Meyer's flooding algorithm works on a grayscale image. During the successive flooding of the grey-value relief, watersheds with adjacent catchment basins are constructed. The flooding is performed on the gradient image, so that the basins emerge along the edges. Normally this leads to an over-segmentation of the image, especially for noisy image material such as medical CT data. Either the image must be pre-processed or the regions must be merged afterward on the basis of a similarity criterion. A number of improvements, collectively called Priority-Flood, have since been made to this algorithm, including variants suitable for datasets consisting of trillions of pixels. The algorithm proceeds as follows:

        • A set of markers, pixels where the flooding shall start, is chosen. Each marker is given a different label.

        • The neighboring pixels of each marked area are inserted into a priority queue with a priority level corresponding to the gradient magnitude of the pixel.

        • The pixel with the highest priority (that is, the lowest gradient magnitude) is extracted from the priority queue. If the neighbors of the extracted pixel that have already been labeled all have the same label, then the pixel is given that label. All non-marked neighbors that are not yet in the priority queue are put into the priority queue.

        • The previous step is repeated until the priority queue is empty.
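The flooding steps above can be sketched with a binary heap as the priority queue. This is a minimal illustration rather than the implementation used in this work. It assumes 4-connectivity, integer marker labels greater than zero, and that pixels left with label 0 form the watershed lines. As a simplification, neighbours are queued only from pixels that actually received a label, so that every queued pixel has at least one labelled neighbour.

```python
import heapq
import itertools


def meyer_flood(gradient, markers):
    """Marker-based flooding on a 2-D gradient image (lists of lists)."""
    h, w = len(gradient), len(gradient[0])
    labels = [row[:] for row in markers]
    queued = [[False] * w for _ in range(h)]
    heap = []
    order = itertools.count()   # tie-breaker: first queued, first flooded

    def neighbours(y, x):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                yield yy, xx

    def push(y, x):
        if labels[y][x] == 0 and not queued[y][x]:
            queued[y][x] = True
            heapq.heappush(heap, (gradient[y][x], next(order), y, x))

    # Queue the neighbours of every marker pixel.
    for y in range(h):
        for x in range(w):
            if markers[y][x]:
                for yy, xx in neighbours(y, x):
                    push(yy, xx)

    # Pop the lowest-gradient pixel and label it if its neighbours agree.
    while heap:
        _, _, y, x = heapq.heappop(heap)
        seen = {labels[yy][xx] for yy, xx in neighbours(y, x) if labels[yy][xx]}
        if len(seen) == 1:
            labels[y][x] = seen.pop()
            for yy, xx in neighbours(y, x):
                push(yy, xx)
    return labels


# Two basins separated by a ridge of high gradient in the middle.
gradient = [[0, 1, 2, 9, 2, 1, 0]]
markers = [[1, 0, 0, 0, 0, 0, 2]]
print(meyer_flood(gradient, markers))
```

In the example the two basins grow towards each other, and the ridge pixel, whose neighbours carry different labels, stays unlabelled as the watershed line.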

        Fig 4. Enhancement of image after segmentation

      3. FEATURE EXTRACTION

        In order to yield a high classification performance, appropriate features need to be constructed and selected as classifier inputs. The main features found in BCCI research include color, morphology, and texture features. For example, the average values of L*, a*, and b* in the CIE-L*a*b* color space, a circularity ratio, and an area measure are features used to classify the cell types in a BCCI image. The most used color feature is the L*a*b* color space, which is transformed from red-green-blue (RGB) color in a pre-processing step. Morphological features, e.g. area, perimeter, convexity, and eccentricity, have been combined with textural features in each color channel of RGB. First-order texture features, i.e., mean and variance, have often been applied in breast cancer cell counting systems. Some researchers use them on their own, whereas others combine them to extract structural information from the segmented cells, for instance by combining first-order texture, Laws texture, and color features, or first-order texture and gray-level co-occurrence matrices (GLCM). Other applications in breast cancer research have also usually relied on texture features. Some of them, such as run-length matrices and fractal dimension (FD), have given high performance but have not been studied in the context of estrogen receptor status detection.
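A few of the features named above can be computed with short functions. The sketch below covers first-order texture (mean and variance), a gray-level co-occurrence matrix with its contrast measure, and the circularity ratio. It is illustrative only: the single horizontal offset, the number of gray levels, and the function names are assumptions, and the circularity uses the common definition 4πA/P².

```python
import math


def first_order(pixels):
    """Mean and (population) variance of a flat list of grey levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return mean, variance


def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for one pixel offset."""
    h, w = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                counts[image[y][x]][image[yy][xx]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def circularity(area, perimeter):
    """Equals 1.0 for a perfect circle and decreases for irregular shapes."""
    return 4 * math.pi * area / perimeter ** 2
```

Each function returns a plain number or small matrix, so the results can be concatenated into one feature vector per segmented cell or image region before classification.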

      4. DEEP MODEL LeNet-5

    LeNet is a type of Convolutional Neural Network (CNN) architecture. A CNN is a deep learning algorithm that takes in an input image, assigns weights and biases to various features in the image, and is able to classify the images. The architecture of a CNN resembles the connection network of neurons in the human brain. The steps involved in a CNN are Convolution, Pooling, Flattening, and the Fully Connected Layer.

    In Convolution, various feature detectors or filters are applied to the image with a specific filter size and stride. The main function of a feature detector is to extract features from the image and also to make the image smaller so that it is easier to process. Some information about the image is lost, but the feature detector brings out important features. Applying different filters yields different feature maps from the convolution layer. After the convolutional layer, the rectifier function is applied to bring non-linearity to the network. This is done because images themselves are non-linear, whereas convolution is a linear operation, i.e. element-wise multiplication and addition. The rectifier therefore acts on the filter output and breaks linearity. The whole process of applying the convolutional layer and the rectifier function is considered one step.

    In Pooling, a special property of neural networks called spatial invariance is maintained. This means that the network is tolerant to where features are located and to whether they differ slightly in texture or lie closer together or farther apart. So, if a feature is distorted, the network has some flexibility to still find that feature. Various methods of pooling are max pooling, mean pooling, and sum pooling; in this work, max pooling is used. Pooling also reduces the size of the images, and by reducing the number of parameters it reduces the chance of overfitting. Pooling is also known as downsampling. In Flattening, the pooled feature map is flattened so that it can be sent as input to the fully connected layer.

    The proposed method uses two hidden layers with the Rectified Linear Unit (ReLU) as the activation function, and in the output layer the softmax function is used to predict the probability values of both classes. The Adam optimizer is used in the backpropagation process to update the weights and biases. The main purpose of the network model is to use the extracted features as attributes for predicting the class of the image. This paper uses the LeNet architecture, which comprises two sets of convolutional, activation, and pooling layers. They are followed by a fully connected layer, an activation, one more fully connected layer, and finally a softmax classifier.
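The building blocks named in this section (convolution, ReLU, max pooling, flattening, and softmax) can be written out in plain Python to show what each step computes. This is a didactic sketch, not the trained LeNet-5 model of this work: the kernel values are arbitrary, no learning takes place, and, as in most CNN frameworks, the "convolution" slides the kernel without flipping it.

```python
import math


def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[y + i][x + j]
                 for i in range(kh) for j in range(kw))
             for x in range(0, w - kw + 1, stride)]
            for y in range(0, h - kh + 1, stride)]


def relu(fmap):
    """Rectifier: negative responses become zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[y + dy][x + dx] for dy in range(size) for dx in range(size))
             for x in range(0, w - size + 1, size)]
            for y in range(0, h - size + 1, size)]


def flatten(fmaps):
    """Concatenate several feature maps into one 1-D vector."""
    return [v for fmap in fmaps for row in fmap for v in row]


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One convolution + ReLU + pooling step on a toy 5x5 image.
image = [[(x + y) % 5 for x in range(5)] for y in range(5)]
kernel = [[1, 0], [0, -1]]                 # arbitrary illustrative filter
features = flatten([max_pool(relu(conv2d(image, kernel)))])
print(features)
print(softmax([0.2, 1.3]))                 # two classes, e.g. benign/malignant
```

In a full network the flattened vector would pass through the fully connected layers, whose outputs are the scores given to `softmax`.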

  7. CONCLUSION

An automated detection and classification procedure was presented for the detection of cancer from microscopic biopsy images using a convolutional neural network. The proposed analysis was based on tissue-level microscopic observations of cells and nuclei for cancer detection and classification. The stages of the proposed model are pre-processing of microscopic images, segmentation, feature extraction, and classification. In the future, the classification results will be studied to help the pathologist find the percentage and to improve the CA system using area-based texture information instead of information from the segmented cells. Moreover, a larger data set, cross-validation, and feature selection based on a selection algorithm will be applied to the CA system to improve classification performance.

REFERENCES

[1] Ankit Vidyarthi, Jatin Shad, Shubham Sharma, Paridhi Agarwal, "Detection and Classification of Cancer from Microscopic Biopsy Image Using Clinically Significant and Biologically Interpretable Features," IEEE, 2021.

[2] M.H. Motlagh, "Colour Image Enhancement using Laplacian Filter and Contrast Limited Adaptive Histogram Equalization," IEEE International Conference on Bioinformatics and Biomedicine, 2021.

[3] C. Wang, "Classifying Breast Cancer Regions in Microscopic Image using Texture Analysis and Neural Network," IEEE, July 2020.

[4] S. M. Pizer, E. P. Amburn, J. D. Austin, "Cell Image Segmentation Based on an Improved Watershed Algorithm," vol. 39, pp. 355-368, 2020.