© Springer Nature Singapore Pte Ltd. 2020
Sudeep Tanwar, Sudhanshu Tyagi and Neeraj Kumar (eds.), Multimedia Big Data Computing for IoT Applications, Intelligent Systems Reference Library 163, https://doi.org/10.1007/978-981-13-8759-3_10

Data Reduction Technique for Capsule Endoscopy

Kuntesh Jani1   and Rajeev Srivastava1  
(1)
Computer Science and Engineering Department, Indian Institute of Technology (BHU), Varanasi, India
 
 
Kuntesh Jani (Corresponding author)
 
Rajeev Srivastava

Abstract

The advancements in the fields of IoT and sensors generate a huge amount of data. These data serve as input to knowledge discovery and machine learning, producing unprecedented results in trend analysis, classification, prediction, fraud and fault detection, drug discovery, artificial intelligence and many more areas. One such cutting-edge technology is capsule endoscopy (CE). CE is a noninvasive, non-sedative, patient-friendly and particularly child-friendly alternative to conventional endoscopy for the diagnosis of gastrointestinal tract diseases. However, CE generates approximately 60,000 images per video. Further, when computer vision and pattern recognition techniques are applied to CE images for disease detection, the resulting data, called a feature vector, grows to 181,548 values for a single image. A machine learning task for computer-aided disease detection involves no fewer than thousands of images, making it highly data intensive. Processing such a huge amount of data is expensive in terms of computation, memory and time. Hence, a data reduction technique must be employed in such a way that minimal information is lost. It is important to note that features must be discriminative; redundant or correlated data is of little use. In this study, a data reduction technique is designed with the aim of maximizing information gain: it retains features that exhibit high variance and low correlation. The reduced feature vector is fed to a computer-based diagnosis system to detect ulcers in the gastrointestinal tract. The proposed data reduction technique reduces the feature set by 98.34%.

Keywords

Data reduction, CAD, Capsule endoscopy

1 Introduction

The advancements in the fields of IoT and sensors generate a huge amount of data. These data serve as input to knowledge discovery and machine learning, producing unprecedented results in trend analysis, classification, prediction, fraud and fault detection, drug discovery, artificial intelligence and many more areas [1]. One such cutting-edge technology is capsule endoscopy (CE). CE was introduced in the year 2000 by Given Imaging Inc., Israel. CE is a non-invasive, non-sedative, disposable and painless alternative to the conventional endoscopy procedure [2]. It provides a comfortable and efficient way to view the complete gastrointestinal (GI) tract [3]. The endoscopic device is a swallowable capsule weighing 3.7 g with dimensions of 11 × 26 mm. The capsule is ingested by the patient and propelled through the GI tract by natural peristalsis. Figure 1 presents the various components as well as the length of the capsule. The capsule captures images of the GI tract and transmits them to a receiver tied to the waist of the patient. Experts then analyze the received video with the help of a computer system to detect irregularities of the GI tract. With endoscopic techniques such as colonoscopy and conventional endoscopy, it is not possible to visualize the entire small intestine. Since CE helps doctors visualize the entire GI tract without sedation, invasive equipment, air inflation or radiation, the use of this technology in hospitals is increasing. To provide timely decisions from specialists at remote locations, CE can be combined with IoT [4] and mobile computing technology. Given the restrictions on memory, battery power and available communication capability, transmitting and studying CE video data becomes even more challenging.
Fig. 1

Capsule length and components [5, 6]

Figure 2 presents a general idea of a computer-aided CE system. While propelling through the GI tract, the capsule transmits data to the receiver at a rate of 2 frames per second. Approximately 8 h later, when the procedure ends, the images are retrieved onto a computer for experts to examine for potential abnormalities. The patient passes the capsule out of the body through natural passage; there is no need to retrieve the capsule, so problems related to sterilization and hygiene are automatically addressed.
Fig. 2

Capsule endoscopy system [7]

Since its approval by the U.S. Food and Drug Administration (FDA), more than one million capsules had been used by 2015 [8]. However, a CE video ranges from 6 to 8 h in length and generates 55,000–60,000 frames, which makes analysis time-consuming. Depending on the expertise of the examiner, an examination takes 45 min to 2 h. Besides the huge number of frames, the appearance of the GI tract, intestinal dynamics and the need for constant concentration further complicate diagnosis and analysis. With advancements in technology, the volume of data is likely to grow many fold [9]. Thus computer vision and machine learning, built together as a computer-aided diagnosis (CAD) system, and artificial intelligence in health care can be of great help to experts and physicians in diagnosing abnormalities [10, 11]. A CAD system capable of analyzing and understanding the visual scene will certainly assist the doctor with a precise, fast and accurate diagnosis. After manual analysis of a CE video, CAD can also provide a second opinion to a gastroenterologist. In medical imaging, CAD is a prominent research area capable of providing a precise diagnosis. The ultimate goals of CAD are to limit errors of interpretation and search, and to limit variation among experts. In particular, a computer-aided medical diagnostic system for CE can consist of the following units: (1) a data capturing and transmitting unit (the capsule); (2) a data receiver and storage unit (the waist belt); (3) a data processing unit for pre-processing and feature extraction; (4) a machine-learning-based classification unit or decision support system; (5) a user interaction unit for the final diagnostic report. In general, a complete automated abnormality detection system comprises a pre-processing unit, a segmentation unit, a feature extraction unit and a classification unit.
CE images also contain uninformative frames showing noise, dark regions, duplicate content, bubbles, intestinal fluids and food remains. It is important that pre-processing isolates such uninformative regions or images. The Poisson maximum likelihood estimation method may be used to remove Poisson noise [12]. Pre-processing noticeably improves computational efficiency and the overall detection rate. The task of the pre-processing and feature extraction unit is to supply CAD-system-friendly data [13]. Methods adopted for pre-processing in CE include contrast stretching, histogram equalization and adaptive contrast diffusion [14].

Segmentation is the process of extracting only the useful or informative part from the whole image. This process helps us concentrate only on the required portion instead of the whole image. Segmentation is performed using edge-based or region-based approaches, or a combination of both; each has its advantages and disadvantages. Many techniques have been applied to segment CE images, such as the Hidden Markov Model (HMM) [15], the total variation (TV) model [16] and Probabilistic Latent Semantic Analysis (pLSA) [17]. TV is a hybrid active contour model based on region and edge information; HMM is a statistical Markov model; and pLSA is an unsupervised machine learning technique.

Features in image analysis refer to a derived set of values providing discriminative and non-redundant information of the image. For visual patterns, extracting discriminative and robust features from the image is the most critical yet the most difficult step [18]. Researchers have explored texture, color, and shape based features in spatial as well as frequency domain to discriminate between normal and abnormal images of CE.

Classification is the last phase of the automated system. The process to predict unknown instances using the generated model is referred to as classification. Figure 3 presents a diagrammatic representation of the entire process.
Fig. 3

Diagrammatic representation of entire process

Amongst all GI tract abnormalities, the most common lesion is the ulcer. The mortality rate for bleeding ulcers is nearly 10% [19]. Two important causes of GI tract ulcers are non-steroidal anti-inflammatory drugs (NSAIDs) and the bacterium Helicobacter pylori (H. pylori). An untreated ulcer may lead to ulcerative colitis and Crohn's disease. Thus, timely detection of ulcers is a must. Table 1 presents a summary of various ulcer detection systems.
Table 1

Summary of prior art on ulcer detection

Work | Features used | Method/classifier used | Limitations | Performance | Dataset size
[20] | Texture and color | SVM | Very few data samples | Accuracy = 92.65%, Sensitivity = 94.12% | 340 images
[21] | The chromatic moment | Neural network | Texture feature is neglected; too few samples | Specificity = 84.68 ± 1.80, Sensitivity = 92.97 ± 1.05 | 100 images
[22] | LBP histogram | Multi-layer perceptron (MLP) and SVM | Too few samples for training | Accuracy = 92.37%, Specificity = 91.46%, Sensitivity = 93.28% | 100 images
[23] | Dif lac analysis | De-noising using bi-dimensional ensemble empirical mode decomposition (BEEMD) | Too few samples for training | Mean accuracy > 95% | 176 images
[24] | Texture and colour | Vector supported convex hull method | Low specificity; skewed data | Recall = 0.932, Precision = 0.696, Jaccard index = 0.235 | 36 images
[25] | Leung and Malik (LM) and LBP | k-nearest neighbor (k-NN) | Computationally intense; skewed data | Recall = 92%, Specificity = 91.8% | 1750 images

This study proposes a CAD system for ulcer detection in CE using an optimized feature set. The major contributions of this work are:
  • A data reduction technique

  • Automated ulcer detection using an optimized feature set

  • A thorough comparative analysis of our designed feature selection technique with other techniques

  • A thorough analysis of the performance of the designed system against other systems.

2 Materials and Models

2.1 Proposed System

This section explains the significance of each stage of the designed CAD system. The procedure of the entire system is as follows:
  1. Load CE images
  2. Perform noise removal
  3. Perform image enhancement
  4. Extract features
  5. Reduce the feature vector using the proposed data reduction technique
  6. Partition the data into training and testing sets
  7. Train the classifier model
  8. Classify test data using the trained classifier model
Figure 4 presents a brief idea of the proposed system.
Fig. 4

Brief idea of the system for automatic detection of ulcer

2.2 Pre-processing

Image enhancement is used to bring out all the details present in the image. It provides better perception for human visualization and better input for CAD systems [26]. Pre-processing noticeably improves the computational efficiency and overall detection rate of the system. CE images suffer from low contrast [27]. For digital images, the simplest definition of contrast is the difference between the maximum and minimum pixel intensity. Figure 5 shows the methodology used to remove existing noise and enhance the input image:
Fig. 5

Pre-processing methodology

Wiener filtering is used for noise smoothing. It performs a linear estimation of the local mean and variance of the pixels in the image and minimizes the overall error. For enhancement, contrast-limited adaptive histogram equalization (CLAHE) is applied after Wiener smoothing. Instead of operating on the whole image, CLAHE works on small regions and calculates a contrast transform function for each region individually. Bilinear interpolation merges neighbouring regions to eliminate artificially induced boundaries. This technique limits the contrast of the image while avoiding amplification of noise. Since CE images are prone to illumination problems and low contrast, CLAHE is applied only on the L component of the Lab color space for better enhancement. Finally, the image is converted back to RGB and passed on for further processing. Figure 6 shows sample output of this stage.
Fig. 6

Sample output of pre-processing
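The Wiener-plus-CLAHE pipeline above can be sketched in Python with SciPy and scikit-image. This is an illustrative approximation only: the chapter does not state its filter window or clip limit, so the parameters below are assumptions.

```python
import numpy as np
from scipy.signal import wiener
from skimage import color, exposure

def preprocess(rgb):
    """Wiener noise smoothing, then CLAHE on the L channel of Lab only.

    rgb: float RGB image with values in [0, 1], shape (H, W, 3).
    """
    # Adaptive Wiener filter per channel: local mean/variance estimation.
    den = np.stack([wiener(rgb[..., c], mysize=5) for c in range(3)], axis=-1)
    den = np.clip(den, 0.0, 1.0)

    # Enhance lightness only, leaving the chromatic components untouched.
    lab = color.rgb2lab(den)
    L = lab[..., 0] / 100.0                            # scale L* into [0, 1]
    lab[..., 0] = exposure.equalize_adapthist(L, clip_limit=0.02) * 100.0
    return np.clip(color.lab2rgb(lab), 0.0, 1.0)
```

`equalize_adapthist` performs exactly the tile-wise equalization with bilinear blending of neighbouring tiles described above.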

2.3 Extraction of Features

Three different features are included in this study: the local binary pattern (LBP), the gray-level co-occurrence matrix (GLCM) and the histogram of oriented gradients (HOG) together form the feature set. Ulcers in CE images exhibit very discriminative texture and color properties. GLCM is a statistical method useful for analyzing textures. This study utilizes 13 texture features computed from the GLCM, namely homogeneity, contrast, mean, correlation, energy, standard deviation, skewness, root mean square (RMS), variance, entropy, smoothness, kurtosis and inverse difference moment (IDM). Energy measures uniformity. Entropy is a measure of complexity and is large for non-uniform images; energy and entropy are strongly inversely correlated. Variance is a measure of non-uniformity; standard deviation and variance are strongly correlated. IDM is a measure of similarity and reaches its maximum when all elements in the image are the same.

HOG as a feature descriptor deals with ambiguities related to texture and color [28]. A distribution (histogram) of intensity gradients can better describe the appearance and shape of objects within the image. HOG identifies edges [29]. It is computed for every pixel after dividing the image into cells. All cells within a block are normalized, and the concatenation of all these histograms is the feature descriptor. Figure 7 shows a sample CE image and a visualization of the HOG descriptor.
Fig. 7

Sample CE image and HOG descriptor visualization
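For reference, a HOG descriptor and its visualization can be produced with scikit-image. The cell and block sizes below are generic defaults, not the authors' exact configuration (their settings yield 181,476 HOG values per CE frame), and a bundled sample image stands in for a CE frame.

```python
from skimage import color, data
from skimage.feature import hog

# Grayscale test image (512 x 512) standing in for a CE frame.
image = color.rgb2gray(data.astronaut())

# 9 orientation bins per 8x8-pixel cell, L2-Hys-normalized 2x2-cell blocks.
descriptor, hog_image = hog(image,
                            orientations=9,
                            pixels_per_cell=(8, 8),
                            cells_per_block=(2, 2),
                            block_norm="L2-Hys",
                            visualize=True)
print(descriptor.shape)
```

The descriptor length is (blocks per row) x (blocks per column) x (cells per block) x (orientations), which is what makes HOG vectors so large for full-resolution images.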

LBP is a very discriminative textural feature [30]. CE images exhibit high illumination variation due to the limited illumination capacity, the limited range of vision inside the GI tract and the motion of the camera; LBP is known to be robust to illumination variations. A total of 256 patterns are obtained from a 3 × 3 neighborhood. The texture feature descriptor is the 256-bin LBP histogram calculated over the region. A novel rotation-invariant LBP is proposed in [30]. Patterns are uniform if they contain at most two transitions from 0 to 1 or 1 to 0 on a circular ring. Examples of uniform patterns are 11111111 (no transitions) and 01000000 (2 transitions).
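A 59-bin uniform LBP histogram, matching the LBP feature count quoted in Sect. 2.4, can be computed with scikit-image: the non-rotation-invariant uniform mapping with 8 neighbours yields 58 uniform codes plus one catch-all bin for all non-uniform patterns.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1):
    """59-bin uniform LBP histogram over a grayscale image.

    With P=8 neighbours at radius R=1, 'nri_uniform' produces labels
    0..57 for the uniform patterns and 58 for everything non-uniform.
    """
    codes = local_binary_pattern(gray, P, R, method="nri_uniform")
    hist, _ = np.histogram(codes, bins=np.arange(60), density=True)
    return hist
```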

2.4 Feature Selection

An optimal feature set reduces the cost of recognition and can improve classification accuracy [31]. We compute 13 features from the GLCM, the HOG extraction yields 181,476 features, and the LBP extraction yields 59 features; in total, 181,548 features are extracted from each CE image in the dataset. This feature set produces excellent results for ulcer classification, but the number of features must be limited to a manageable size. To decrease the size of the feature set while maintaining classification performance, this study proposes a novel feature selection technique. Features with high variance can easily discriminate between two classes, but variance alone is not a good measure of information: two features can both exhibit high variance yet be correlated. Therefore, the proposed feature selection technique is designed on a dual criterion, high variance and low correlation, and is termed the high variance low correlation (HVLC) technique. HVLC reduces the obtained feature set by 98.34%, to 3,000 features. The technique employs particle swarm optimization (PSO) with a minimum-correlation fitness function to find the optimal subset. PSO is guided by local and global best values: pbest is the best value of a particle and gbest is the best value of the whole swarm. At each iteration, the ith particle updates its position P as per (1) and its velocity V as per (2).
$$ {\text{P}}_{\text{i}} \left( {\text{t + 1}} \right) = {\text{P}}_{\text{i}} \left( {\text{t}} \right) + {\text{V}}_{\text{i}} \left( {\text{t}} \right) $$
(1)
$$ \begin{aligned} {\text{V}}_{\text{i}} \left( {\text{t + 1}} \right) & = {\text{wV}}_{\text{i}} \left( {\text{t}} \right) + {\text{c1}} * {\text{rand}}\left( {\left[ {0,1} \right]} \right) * \left( {{\text{pbest}} - {\text{P}}_{\text{i}} \left( {\text{t}} \right)} \right) \\ & \quad + {\text{c}}2 * {\text{rand}}\left( {\left[ {0,1} \right]} \right) * \left( {{\text{gbest}} - {\text{P}}_{\text{i}} \left( {\text{t}} \right)} \right) \\ \end{aligned} $$
(2)
where t and t + 1 are two successive iterations, and the inertia weight w, cognitive coefficient c1 and social coefficient c2 are constants. c1 and c2 control the magnitude of the steps taken by a particle towards pbest (personal) and gbest (global), respectively. Table 2 presents the data reduction algorithm.
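To make the update concrete, here is a single hypothetical PSO step in NumPy. The 4-dimensional search space and the random vectors are illustrative only; following common PSO practice, the velocity of (2) is computed first and then used in the position update of (1).

```python
import numpy as np

rng = np.random.default_rng(1)

# Constants as listed in Table 2: inertia w, cognitive c1, social c2.
w, c1, c2 = 0.2, 2.0, 2.0

P = rng.random(4)        # current particle position P_i(t)
V = np.zeros(4)          # current velocity V_i(t)
pbest = rng.random(4)    # particle's personal best position
gbest = rng.random(4)    # swarm's global best position

# Eq. (2): new velocity from inertia, cognitive and social terms.
V = (w * V
     + c1 * rng.random(4) * (pbest - P)
     + c2 * rng.random(4) * (gbest - P))

# Eq. (1): move the particle.
P = P + V
```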
Table 2

Data reduction algorithm

Initial feature-set S = [1,2,…,n]

Set threshold T

Final feature set $$ {\texttt{F}}_{{\texttt{s}}} $$ =  null

Choose features with variance > T

Surviving feature set $$ {\texttt{S}}_{{\texttt{s}}} $$ = [1,2,…,m] where m < n

Set k = size of $$ {\texttt{F}}_{{\texttt{s}}} $$ from within $$ {\texttt{S}}_{{\texttt{s}}} $$ such that classification error err is very small

Set values of PSO control parameters: w = 0.2, c1 = c2 = 2

Create and initialize particles with values of P and V; initialize gbest of the population as infinity

Repeat:

For itr = 1 to population

Compute correlation C as a ranking criteria

f = argmin(C)

EndFor

Update pbest = min(f)

Update gbest = min (gbest,pbest)

Update P using (1)

Update V using (2)

Until the termination criterion is satisfied

Return improved-PSO selected values

Final reduced features $$ {\texttt{F}}_{{\texttt{F}}} $$ = [1,2,3,4,5,…,k] where k < m

The variance threshold is experimentally chosen to fit the application. The selected features are then fed to an SVM classifier to discriminate between ulcer and normal images.
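As a rough illustration of the dual criterion, the sketch below implements the two stages in Python, with a greedy minimum-correlation pass standing in for the PSO search (the chapter's actual second stage is PSO-driven; the threshold and k are placeholders).

```python
import numpy as np

def hvlc_select(X, var_threshold, k):
    """High-variance, low-correlation selection (greedy sketch).

    X: (n_samples, n_features) feature matrix.
    Returns the column indices of the k chosen features.
    """
    # Stage 1: keep only features whose variance exceeds the threshold.
    survivors = np.where(X.var(axis=0) > var_threshold)[0]
    Xs = X[:, survivors]

    # Stage 2: greedily grow a subset whose mutual correlation stays low.
    corr = np.abs(np.corrcoef(Xs, rowvar=False))
    selected = [int(np.argmax(Xs.var(axis=0)))]     # seed: highest variance
    while len(selected) < k:
        worst = corr[:, selected].max(axis=1)       # worst link to the subset
        worst[np.array(selected)] = np.inf          # never re-pick a feature
        selected.append(int(np.argmin(worst)))
    return survivors[np.array(selected)]
```

On the chapter's data, such a routine would map the 181,548-value vector down to k = 3,000 surviving feature indices.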

2.5 Classification

Ulcer detection in CE is a binary classification problem with exactly two classes, ulcer and normal. The SVM constructs the widest possible margin hyperplane that explicitly separates the samples of the two classes. The support vectors are the observations falling on the boundary of the slab parallel to the hyperplane. Figure 8 presents the concept of the SVM.
Fig. 8

The concept of SVM [32]
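In Python, the classification stage might be prototyped with scikit-learn's SVC. The Gaussian blobs below merely stand in for the HVLC-selected feature vectors (labels 0 = normal, 1 = ulcer); the kernel choice is an assumption, since the chapter does not state it.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data: two 50-dimensional classes with shifted means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 50)),    # class 0: "normal"
               rng.normal(1.5, 1.0, (100, 50))])   # class 1: "ulcer"
y = np.repeat([0, 1], 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Maximum-margin classifier; support vectors define the separating slab.
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.2f}")
```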

The subsequent section presents detailed result analysis of the performance of the proposed system.

3 Results Analysis and Discussion

3.1 Dataset

A total of 1200 images are extracted from CE videos [33], of which 201 are ulcer images and 999 are normal. The dimension of each image is 576 × 576 pixels. All images were manually diagnosed and annotated by physicians, providing the ground truth. To avoid imbalanced data and overfitting, 100 ulcer images and 100 normal images are carefully chosen from the annotated dataset.

3.2 Performance Metrics

The performance metrics used in this study are derived from the confusion matrix, whose structure is given below (Table 3).
Table 3

Structure of the confusion matrix

Observer versus classifier | Classifier predicts + | Classifier predicts −
Actual + | True Positive [TP] | False Negative [FN]
Actual − | False Positive [FP] | True Negative [TN]

Accuracy: Accuracy measures the capability of a system to identify samples correctly.
$$ {\text{Accuracy}} = \left( {{\text{TN}} + {\text{TP}}} \right)/\left( {{\text{TN}} + {\text{TP}} + {\text{FN}} + {\text{FP}}} \right) $$
(3)
Precision: Precision is the probability that an observation classified as positive is truly positive.
$$ {\text{Precision}} = \left[ {{\text{TP/}}\left( {\text{FP + TP}} \right)} \right] $$
(4)
Sensitivity: Sensitivity is the probability that the system returns a positive result for a truly positive observation.
$$ {\text{Sensitivity}} = {\text{TP}}/\left( {\text{TP + FN}} \right) $$
(5)
Specificity: Specificity is the probability that the system classifies a negative observation as negative.
$$ {\text{Specificity}} = {\text{TN}}/\left( {\text{TN + FP}} \right) $$
(6)
F-measure: The F-measure is the harmonic mean of precision and sensitivity. A value of 100 indicates a perfect system and 0 the worst.
$$ {\text{F}} - {\text{measure}} = 2 * \left( {{\text{Precision}} * {\text{Sensitivity}}} \right)/\left( {\text{Precision + Sensitivity}} \right) $$
(7)
Matthews correlation coefficient (MCC): MCC measures the quality of a binary classifier and remains informative even when the class sizes differ greatly [34].
$$ {\text{MCC = }}\left[ {\left( {{\text{TN}} * {\text{TP}}} \right) - \left( {{\text{FN}} * {\text{FP}}} \right)} \right]/{\text{SQRT}}\left[ {\left( {\text{FP + TP}} \right)\left( {\text{FN + TP}} \right)\left( {\text{FP + TN}} \right)\left( {\text{FN + TN}} \right)} \right] $$
(8)
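Metrics (3)–(8) follow directly from the four confusion-matrix counts. A small Python helper, with specificity taken as TN/(TN + FP), might look like:

```python
from math import sqrt

def metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics, as percentages except MCC (on [-1, 1])."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)          # Eq. (3)
    precision   = tp / (tp + fp)                           # Eq. (4)
    sensitivity = tp / (tp + fn)                           # Eq. (5)
    specificity = tn / (tn + fp)                           # Eq. (6)
    f_measure   = (2 * precision * sensitivity             # Eq. (7)
                   / (precision + sensitivity))
    mcc = ((tp * tn - fp * fn)                             # Eq. (8)
           / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"accuracy": 100 * accuracy, "precision": 100 * precision,
            "sensitivity": 100 * sensitivity, "specificity": 100 * specificity,
            "f_measure": 100 * f_measure, "mcc": mcc}
```

For example, 19 true positives, 19 true negatives, 1 false positive and 1 false negative give 95% on the percentage metrics and an MCC of 0.9.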

The system is implemented in MATLAB R2017a on a Dell OptiPlex 9010 desktop computer with an Intel Core i7 processor and 6 GB of RAM.

3.3 Analysis of Results

The performance of the automated ulcer detection system largely depends on the performance of the proposed feature selection technique. Therefore, the proposed method is thoroughly compared with three other feature selection methods, namely ReliefF [35], Fisher score [36] and Laplacian score [37]. ReliefF weights a feature on the basis of the distance between the observed feature and the given feature, and finally ranks the most suitable features. Fisher score ranks each feature based on the Fisher criterion. Laplacian-score-based feature ranking has the power to preserve locality. Table 4 compares the proposed feature selection technique with the three other techniques, and Fig. 9 presents a graphical representation of the same.
Table 4

A comparison of proposed feature selection

Method | Accuracy | Sensitivity | Specificity | Precision | F measure | MCC
Relief F | 93.7 | 87.5 | 100 | 100 | 93.3 | 88.19
Fisher score | 91.66 | 90 | 93.3 | 93.10 | 91.5 | 83.37
Laplacian score | 91.66 | 83.33 | 100 | 100 | 90.9 | 84.51
Proposed (HVLC) | 95 | 95 | 95 | 95 | 95 | 90

Fig. 9

Performance comparison of feature selection techniques

As presented in Fig. 9, the feature set obtained by the proposed HVLC technique outperforms the three other techniques in accuracy, sensitivity, F measure and MCC. Further, the ulcer detection system is compared with two other systems. Suman et al. [3] extracted features from relevant color bands and classified ulcer images using an SVM. Koshy and Gopi [38] extracted contourlet transform and log-Gabor based texture features and performed classification using an SVM. We implemented both prior systems on our hardware and dataset. The comparative results presented in Table 5 show that the proposed system outperforms the prior art. Figure 10 presents a graphical analysis of the results.
Table 5

A comparison of the proposed system

Method | Accuracy | Sensitivity | Specificity | Precision | F measure | MCC
[3] | 80 | 60 | 100 | 100 | 75 | 65.5
[38] | 88.75 | 77.5 | 100 | 100 | 87.3 | 79.5
Proposed | 95 | 95 | 95 | 95 | 95 | 90

Fig. 10

Performance comparison of CAD systems

As seen in Fig. 10, the proposed CAD system for ulcer detection in CE outperforms the other two systems in terms of accuracy, sensitivity, F measure and MCC. However, it does not outperform them in terms of specificity and precision, because the proposed system produces more false positives: approximately 5% of normal cases are misclassified as ulcer cases.

4 Conclusion

With the advancements in the fields of multimedia and IoT, data generation has increased tremendously. This study focuses on data reduction for one of the emerging medical imaging systems, capsule endoscopy. With its advanced imaging, CE generates a massive number of images with minute details. It is important for a CAD system to preserve the minute details of a CE image and thereby provide a precise diagnosis. This study addresses the dilemma of reducing data while preserving crucial information. The proposed data reduction technique reduces the feature vector from 181,548 to 3,000 values per image, a reduction of 98.34%, and yet the proposed system outperforms the other data reduction techniques and systems it is compared with. The significant reduction in the size of the data certainly reduces computational time and memory.