Breast Cancer Detection Using CLAHE-CNN Architecture

Avani Manesh; Abijith Biju; Amal Mohanan; Jayakrishnan B

doi:10.17577/ICCIDT2K23-212

ICCIDT-2023 (Volume 11 – Issue 01)


Open Access
Authors : Avani Manesh, Abijith Biju, Amal Mohanan, Jayakrishnan B
Paper ID : ICCIDT2K23-212
Volume & Issue : Volume 11, Issue 01 (June 2023)
Published (First Online): 11-06-2023
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Avani Manesh1

1Student, Dept. of Computer Science & Engineering, Mangalam College of Engineering, India

Abijith Biju2

2Student, Dept. of Computer Science & Engineering, Mangalam College of Engineering, India

Amal Mohanan3

3Student, Dept. of Computer Science & Engineering, Mangalam College of Engineering, India

Jayakrishnan B4

4Assistant Professor, Dept. of Computer Science & Engineering, Mangalam College of Engineering, India

Abstract: Breast cancer has become one of the most lethal illnesses affecting women worldwide. Researchers around the world are working on breast cancer screening tools based on medical imaging, and the rapid growth of deep learning has drawn considerable attention in the medical imaging field. In this project, we use a CLAHE-CNN architecture in which a microscopic biopsy image is passed through a convolutional neural network that identifies cancerous features in the image. The process comprises four modules: pre-processing, segmentation, feature extraction, and classification. Pre-processing applies Contrast Limited Adaptive Histogram Equalization (CLAHE) and a Laplacian filter, which yield sharper images for segmentation. Feature extraction and classification are performed by LeNet-5, a variant of the convolutional neural network (CNN). The resulting output is displayed as a test result. The proposed CLAHE-CNN architecture using LeNet-5 achieves an accuracy of 90.3%. We believe the approach will be of considerable value to healthcare practitioners in identifying breast cancer early and, potentially, in reaching a prompt diagnosis.

Keywords: Mammography, breast cancer detection, multi-instance classification, deep convolutional neural network.

INTRODUCTION

Cancer is one of the most common diseases in India and a leading cause of mortality, with about 0.3 million deaths per year. The risk of developing the disease is rising with changes in lifestyle such as increased tobacco use, poorer dietary habits, and lack of physical activity. At the same time, the possibility of a cure has improved thanks to recent combined advances in medicine and engineering.

Breast cancer is currently the most common cancer globally, accounting for 12.5% of all new annual cancer cases worldwide. It is also the most life-threatening cancer among women after lung cancer. Breast cancer is categorized into various types according to the appearance of the cells under a microscope. It can be treated effectively when detected early, so proper screening methods are important for detecting its initial symptoms. Various imaging techniques are used for screening; the popular approaches are mammography, ultrasound, and thermography. Mammography is one of the most significant methods of early detection. Ultrasound, or diagnostic sonography, is widely used because mammography is less effective for dense breasts. However, small masses can be missed by radiography, and thermography may be more effective than ultrasound in detecting smaller cancerous masses.

Because of the intrinsic difficulties of medical images, such as poor contrast, noise, and details that the eye cannot readily perceive, tools have been developed to perform and improve image processing. Artificial intelligence (AI), machine learning (ML), and convolutional neural networks (CNN) are now among the fastest-growing areas of the healthcare industry. AI and ML research develops technological systems that solve complex tasks while reducing the need for human intelligence. Deep learning (DL), part of the machine learning family, is based on artificial neural networks. DL architectures such as deep neural networks (DNN), recurrent neural networks (RNN), deep belief networks (DBN), and CNN are applied in areas such as computer vision, audio recognition, speech recognition, social network filtering, natural language processing, machine translation, drug design, bioinformatics, medical image analysis, materials inspection, histopathological diagnosis, and board game programs. These technologies, DL algorithms in particular, can be applied to improve the diagnostic accuracy and efficiency of cancer detection.

Digital pathology (DP), on the other hand, is the digitization of histology slides to produce high-resolution images. These digitized images are used for detection, segmentation, and classification through image analysis techniques. Deep learning with CNNs may require extra steps, such as digital staining, to learn patterns for image classification.

Here we use a hybrid architecture of CLAHE and a deep convolutional neural network to classify breast microscopic images. Histopathology biopsy images are used for accurate detection of cancer. Microscopic biopsy images are characterized by the presence of isolated cells and cell clusters. In histopathology, cancer detection normally consists of categorizing the biopsy image as cancerous or noncancerous.
BACKGROUND

The study of cancer, called oncology, is the work of countless doctors and scientists around the world whose discoveries in anatomy, physiology, chemistry, epidemiology, and other related fields made oncology what it is today. Technological advances and the ever-increasing understanding of cancer make this field one of the most rapidly evolving areas of modern medicine. The growth in our knowledge of cancer biology has led to remarkable progress in cancer prevention, early detection, and treatment. Scientists have learned more about cancer in the last two decades than had been learned in all the preceding centuries. This does not change the fact, however, that all scientific knowledge builds on what was already acquired through the hard work and discoveries of our predecessors.

Breast cancer ranks first in global incidence and mortality among female cancers. Each year, 24.2% of female cancer patients worldwide are affected by breast cancer, and 15% of female cancer deaths are due to breast cancer. The situation in China is more severe: incidence and mortality increase every year, and the proportion of young women in the affected population is also increasing. Although incidence is rising year by year, the number of deaths due to breast cancer in developed regions such as Europe and the United States has begun to decline.

"Early detection, early treatment" is the most important way to reduce breast cancer mortality while the cause of breast cancer remains uncertain. Cancer detection has always been a major issue for pathologists and medical practitioners in diagnosis and treatment planning. The manual identification of cancer from microscopic biopsy images is subjective and may vary from expert to expert depending on their expertise and on other factors, including the lack of specific and accurate quantitative measures for classifying biopsy images as normal or cancerous. Automated identification of cancerous cells from microscopic biopsy images helps alleviate these issues and provides better results when biologically interpretable and clinically significant feature-based approaches are used to identify the disease.
PROBLEM DEFINITION

According to the Global Cancer Statistics 2018 report, breast cancer is the most frequently diagnosed cancer among females in the vast majority of countries (154 out of 185) and is the leading cause of cancer death in over 100 countries. Even in the United States, a country with a developed healthcare system, breast cancer has the highest number of new cases of all cancers and is the second most common cause of cancer death. It has been verified that treating early-stage breast cancer can save lives. However, detecting early-stage breast cancer is a challenging task. The manual identification of cancer from microscopic biopsy images is subjective. It may vary from expert to expert depending on their expertise and on other factors, including the lack of specific and accurate quantitative measures for classifying biopsy images as normal or cancerous. The mainstay method of breast cancer screening and diagnosis is mammography. A single mammography procedure usually produces multiple images, which a radiologist screens one by one. The task is time-consuming and usually requires an expert radiologist to do well. For the automatic classification of breast cancer on mammograms, a generalized regression artificial neural network has been trained. An artificial neural network (ANN) is a machine learning algorithm suitable for tasks including classification, prediction, and visualization. However, using an ANN for breast cancer detection makes image classification difficult and yields lower performance. For image classification, 2-dimensional images must be converted to 1-dimensional vectors, which greatly increases the number of trainable parameters because every pixel is connected to every unit of the first layer. More trainable parameters demand more storage and processing capability; in other words, the approach becomes expensive. In addition, an existing system with a CNN architecture uses histogram equalization (HE), which produces over-contrasted images that make the cell structure difficult to analyze. Hence an improved and more efficient system is required to detect breast cancer with greater accuracy and precision.
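The parameter-growth argument above can be made concrete with a small calculation. The sketch below compares the weight count of one fully connected layer fed a flattened image with that of one convolutional layer. The image sizes, the 120 hidden units, and the six 5×5 filters are illustrative assumptions, not values reported for the system described here.

```python
def dense_params(height, width, channels, units):
    """Weights plus biases of a fully connected layer fed a flattened image."""
    inputs = height * width * channels
    return inputs * units + units


def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a convolutional layer (independent of image size)."""
    return kernel * kernel * in_channels * filters + filters


if __name__ == "__main__":
    # Assumed sizes for illustration only.
    for side in (32, 64, 128):
        print(side, dense_params(side, side, 1, 120), conv_params(5, 1, 6))
```

Under these assumptions the fully connected layer needs 123,000 parameters for a 32×32 grayscale image and 491,640 for a 64×64 one, while the convolutional layer stays at 156 regardless of image size, because its filters are shared across all positions.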
RELATED WORK
Cell image segmentation has high practical significance in medical diagnosis, but cell images suffer from adhering (accretive) cells, incoherent cell boundaries, and internal cavities that make segmentation difficult. In the work reviewed here, a watershed algorithm based on the distance transform is proposed to segment images of adhering cells. First, image enhancement is carried out as pre-processing; then Otsu threshold segmentation produces a rough segmentation; finally, a watershed algorithm with optimized seed points performs the fine segmentation. Experimental results showed that the algorithm effectively addressed cell adhesion and over-segmentation, achieved higher segmentation accuracy than the traditional watershed algorithm, and preserved cell shape to the greatest extent. Watershed segmentation based on the distance transform is therefore practical for images of adhering cells.

In image processing, a watershed is a transformation defined on a grayscale image. The name refers metaphorically to a geological watershed, or drainage divide, which separates adjacent drainage basins. The watershed transformation treats the image it operates upon like a topographic map, with the brightness of each point representing its height, and finds the lines that run along the tops of ridges. There are different technical definitions of a watershed. In graphs, watershed lines may be defined on the nodes, on the edges, or as hybrid lines on both nodes and edges. Watersheds may also be defined in the continuous domain, and there are many different algorithms to compute them. Watershed algorithms are used in image processing primarily for object segmentation, that is, for separating different objects in an image. This allows the objects to be counted or analyzed further. The merit of the reviewed work is that it provides details about image segmentation; its demerit is that it does not provide sufficient information on image processing.
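The rough-segmentation step mentioned above, Otsu thresholding, can be sketched in a few lines. This is a minimal pure-Python illustration under the assumption of 8-bit grayscale pixels given as a flat list; the function name and the toy data are hypothetical, and the distance transform and watershed stages are not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form one class, the remaining pixels the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue          # no pixels at or below t yet
        w_high = total - w_low
        if w_high == 0:
            break             # every pixel is already in the low class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


if __name__ == "__main__":
    # Toy data (assumed): half dark pixels, half bright pixels.
    pixels = [10] * 50 + [200] * 50
    t = otsu_threshold(pixels)
    mask = [1 if p > t else 0 for p in pixels]
    print(t, sum(mask))
```

Whether the bright or the dark class corresponds to cells depends on the staining and imaging setup, so the direction of the comparison in the mask is an assumption to be checked against the actual images.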
METHODOLOGY

A. Proposed System

The detection and classification of cancer from microscopic biopsy images are challenging tasks because an image usually contains many clusters and overlapping objects. The stages of the proposed methodology are enhancement of the microscopic images, segmentation of background cells, feature extraction, and finally classification. Contrast-limited adaptive histogram equalization is used to enhance the microscopic biopsy images, and Meyer's watershed algorithm is used to segment background cells. In the feature extraction phase, various biologically interpretable and clinically significant shape- and morphology-based features are extracted from the segmented images, including gray-level texture features, color-based features, and color gray-level texture features.

The proposed system aims to develop a framework and a software tool for the automated detection and classification of cancer from microscopic biopsy images using a convolutional neural network. A hybrid method is proposed in which Contrast Limited Adaptive Histogram Equalization (CLAHE) together with a Laplacian filter serves as the image-enhancement pre-processing step, followed by segmentation. LeNet-5, a variant of the Convolutional Neural Network (CNN), is used to classify the images.
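As an illustration of the Laplacian part of the pre-processing step, the sketch below sharpens a grayscale image by subtracting its Laplacian. The 4-neighbour kernel, the subtraction rule, and the function name are assumptions chosen for illustration; the text does not specify which kernel the system uses or how the filter is combined with CLAHE.

```python
# 4-neighbour Laplacian kernel with a negative centre (an assumed choice).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def laplacian_sharpen(image):
    """Sharpen a 2-D grayscale image (list of rows, values 0..255).

    Each interior pixel becomes f - laplacian(f), clamped to 0..255.
    Border pixels are copied unchanged.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = sum(LAPLACIAN[j][i] * image[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            out[y][x] = min(255, max(0, image[y][x] - lap))
    return out


if __name__ == "__main__":
    # A slightly brighter pixel on a flat background is accentuated.
    patch = [[100, 100, 100],
             [100, 120, 100],
             [100, 100, 100]]
    print(laplacian_sharpen(patch)[1][1])
```

Flat regions are left unchanged because their Laplacian is zero, while local intensity differences such as cell boundaries are amplified.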

CNN is a deep learning algorithm that is usually used to process spatial data such as images. A CNN can learn spatial information progressively, from low-level to high-level patterns, an ability inspired by the workings of the human nervous system. Training a CNN is computationally complex and requires a large amount of data and execution time because of its feed-forward form: the operation is hierarchical, with the output of each stage used as the input of the next. A CNN generally consists of convolutional layers, down-sampling (pooling) layers, activation functions, and fully connected layers.

Contrast-limited adaptive histogram equalization (CLAHE) is a refinement of adaptive histogram equalization (AHE). It increases contrast by widening the intensity range of the image, in effect stretching out the most frequent intensity values. In CLAHE, the image is divided into sub-images called tiles or blocks, and histogram equalization is performed on each tile. Histogram bins that exceed a set clip limit, which would otherwise over-amplify contrast and noise, are clipped, and the clipped pixels are redistributed across the histogram. The result is an image in which contrast is more clearly visible without over-amplification.
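The clipping and redistribution step that distinguishes CLAHE from plain histogram equalization can be sketched for a single tile as follows. This is a simplified, single-pass illustration with hypothetical function names and an assumed clip limit; a full CLAHE implementation also interpolates between the mappings of neighbouring tiles, which is not shown.

```python
def clip_and_redistribute(hist, clip_limit):
    """Clip histogram bins at clip_limit and spread the excess evenly.

    Single pass only: after redistribution a bin may again sit slightly
    above the limit. The total pixel count is preserved.
    """
    excess = sum(max(0, h - clip_limit) for h in hist)
    clipped = [min(h, clip_limit) for h in hist]
    share, remainder = divmod(excess, len(hist))
    return [c + share + (1 if i < remainder else 0)
            for i, c in enumerate(clipped)]


def equalization_map(hist, levels=256):
    """Build an intensity mapping from the cumulative distribution of hist."""
    total = sum(hist)
    mapping, running = [], 0
    for h in hist:
        running += h
        mapping.append(round((levels - 1) * running / total))
    return mapping


if __name__ == "__main__":
    # Toy 4-bin tile histogram with one dominant bin (assumed values).
    tile_hist = [10, 0, 0, 2]
    limited = clip_and_redistribute(tile_hist, clip_limit=4)
    print(limited, equalization_map(limited, levels=4))
```

In practice a library routine would normally be used instead of hand-written code, for example OpenCV's `cv2.createCLAHE`, which takes a clip limit and a tile grid size.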

The LeNet-5 CNN architecture is made up of 7 layers: 3 convolutional layers, 2 subsampling layers, and 2 fully connected layers. The input layer comes first, but it is generally not counted as a layer of the network because nothing is learned in it. The input layer takes 32×32 images, and these are the dimensions passed into the next layer. Readers familiar with the MNIST dataset will know that its images are 28×28; to meet the requirements of the input layer, the 28×28 images are padded to 32×32. In the original LeNet-5 work, the grayscale pixel values were normalized from the range 0 to 255 to values between -0.1 and 1.175. The normalization makes the mean of the inputs roughly 0 and their standard deviation roughly 1, which reduces training time.
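The layer sizes implied by a 32×32 input can be checked with a little arithmetic. The sketch below assumes the classic LeNet-5 configuration (5×5 convolutions without padding, 2×2 subsampling, and 6, 16, and 120 feature maps); these values are not stated in the text above and may differ from the configuration actually trained. The normalization helper assumes a simple linear mapping of 0..255 onto -0.1..1.175.

```python
def conv_out(size, kernel, stride=1):
    """Side length after a convolution with no padding."""
    return (size - kernel) // stride + 1


def pool_out(size, window=2):
    """Side length after non-overlapping subsampling."""
    return size // window


def lenet5_shapes(side=32):
    """Return (layer, feature maps, side length) for the convolutional stages."""
    shapes = [("input", 1, side)]
    # (name, kind, feature maps); counts follow the classic LeNet-5 (assumed).
    stages = [("C1", "conv", 6), ("S2", "pool", 6), ("C3", "conv", 16),
              ("S4", "pool", 16), ("C5", "conv", 120)]
    for name, kind, maps in stages:
        side = conv_out(side, 5) if kind == "conv" else pool_out(side)
        shapes.append((name, maps, side))
    return shapes


def normalize(pixel):
    """Map a pixel in 0..255 linearly onto the range -0.1..1.175."""
    return -0.1 + pixel * (1.175 + 0.1) / 255


if __name__ == "__main__":
    for name, maps, side in lenet5_shapes():
        print(f"{name}: {maps} x {side} x {side}")
    print(normalize(0), normalize(255))
```

With these assumptions the spatial size shrinks 32, 28, 14, 10, 5, 1, so the third convolutional layer outputs 120 values that feed the two fully connected layers.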

Fig 2. The system architecture of the proposed system
MODULES
LeNet is a type of Convolutional Neural Network (CNN) architecture. A CNN is a deep learning algorithm that takes an input image, assigns learnable weights and biases to various features in the image, and is thereby able to classify images. The architecture of a CNN resembles the connection network of neurons in the human brain. The steps involved in a CNN are Convolution, Pooling, Flattening, and the Fully Connected Layer.

In Convolution, various feature detectors, or filters, are applied to the image with a specific filter size and stride. The main function of a feature detector is to extract features from the image while making it smaller and therefore easier to process. Some information about the image is lost, but the feature detector brings out important features. Applying different filters yields different feature maps from the convolution layer. After the convolutional layer, the rectifier function is applied to bring non-linearity to the network. This is done because images themselves are non-linear, whereas convolution is a linear operation (element-wise multiplication and addition), so the rectifier acts as a filter that breaks the linearity. Applying the convolutional layer and then the rectifier function is considered one step.

In Pooling, a property of neural networks called spatial invariance is maintained. It means that the network does not care exactly where features are located, whether they differ in texture, or whether they are closer together or farther apart. So, if a feature is distorted, the network has some flexibility to still find it. Methods of pooling include max pooling, mean pooling, and sum pooling; in this work, max pooling is used. Pooling also reduces the size of the images, and by reducing the number of parameters it lowers the chance of overfitting. Pooling is also known as downsampling.

In Flattening, the pooled feature map is flattened so that it can be passed as input to the fully connected layer. The proposed method uses two hidden layers with the Rectified Linear Unit (ReLU) as the activation function, and in the final output layer the softmax function is used to predict the probability values of both classes. The Adam optimizer is used in the backpropagation process to update the weights and biases. The main purpose of the network is to use the extracted features as attributes for predicting the class of the image. This paper uses the LeNet architecture, which comprises two sets of convolutional, activation, and pooling layers. They are followed by a fully connected layer, an activation, one more fully connected layer, and finally a softmax classifier.
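The forward steps described above (convolution, rectifier, max pooling, flattening, and softmax) can be traced on a toy example. The sketch below is a minimal pure-Python illustration; the 5×5 image, the hand-written 2×2 edge filter, and the two class scores are assumed values, and no trained weights, fully connected layers, or Adam updates are shown.

```python
import math


def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [[sum(kernel[j][i] * image[y * stride + j][x * stride + i]
                 for j in range(k) for i in range(k))
             for x in range(out_w)]
            for y in range(out_h)]


def relu(fmap):
    """Rectifier: negative responses become zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, window=2):
    """Keep the largest value in each non-overlapping window."""
    out_h, out_w = len(fmap) // window, len(fmap[0]) // window
    return [[max(fmap[y * window + j][x * window + i]
                 for j in range(window) for i in range(window))
             for x in range(out_w)]
            for y in range(out_h)]


def flatten(fmap):
    """Turn a 2-D feature map into a 1-D vector."""
    return [v for row in fmap for v in row]


def softmax(scores):
    """Convert class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    image = [[0, 0, 1, 1, 1] for _ in range(5)]   # vertical edge (toy data)
    kernel = [[-1, 1], [-1, 1]]                   # responds to dark-to-bright edges
    features = flatten(max_pool(relu(conv2d(image, kernel))))
    print(features)
    print(softmax([2.0, 0.5]))                    # assumed scores for two classes
```

The pooled features still flag the edge after the 4×4 feature map is reduced to 2×2, which is the tolerance to small shifts that the pooling paragraph describes.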
CONCLUSION

An automated detection and classification procedure was presented for the detection of cancer from microscopic biopsy images using a convolutional neural network. The proposed analysis was based on tissue-level microscopic observations of cells and nuclei for cancer detection and classification. The stages of the proposed model are pre-processing of microscopic images, segmentation, feature extraction, and classification. In the future, classification results will be studied to help the pathologist find the percentage and to improve the CA system using the area based on texture information instead of the segmented cell. Moreover, a larger data set, cross-validation, and feature selection based on a selection algorithm will be applied to the CA system to improve classification performance.
