Fig. 3. Pap smear segmentation to achieve pixel-level classification.
and 200 from Dataset 2) was defined by Equation (2).
where $N_i$, $C_i$, $B_i$ and $D_i$ are the numbers of pixels from the nucleus, cytoplasm, background and debris of image $i$, as shown in Fig. 4.
Each pixel extracted from the image represents not only its intensity but also a set of image features that carry information on texture, borders, and colour within a pixel area of 0.201 μm². Choosing an appropriate feature vector for training the classifier was a major challenge and a novel task in the proposed approach. The pixel-level classifier was trained using a total of 226 training features from TWS (as shown in Table 1).
The classifier was trained using a set of TWS training features which included: (i) Noise reduction: Kuwahara and Bilateral filters were used to train the classifier on noise removal. These have been reported to be excellent filters for removing noise whilst preserving edges; (ii) Edge detection: a Sobel filter, Hessian matrix and Gabor filter were used to train the classifier on boundary detection in an image; and (iii) Texture filtering: the mean, variance, median, maximum, minimum and entropy filters were used for texture filtering. The TWS performed extremely well in the segmentation of single cells and the full Pap smear image, as shown in Fig. 5. This was very useful for the identification of cells and debris requiring further analysis.
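To make the texture features in (iii) concrete, the following is a minimal sketch of how the six statistics could be computed for one pixel over a small square neighbourhood. It is not the TWS implementation: the function names, the window radius and the border handling (clipping the window at the image edge) are assumptions chosen for illustration, and the image is represented as a plain list of rows of integer intensities.

```python
import math
from statistics import mean, median, pvariance


def neighbourhood(image, row, col, radius=1):
    """Collect intensities in a (2*radius+1)^2 window, clipped at the image border."""
    rows, cols = len(image), len(image[0])
    return [
        image[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
    ]


def entropy(values):
    """Shannon entropy (in bits) of the intensity histogram of the window."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def texture_features(image, row, col, radius=1):
    """Return the six texture statistics named in the text for one pixel."""
    window = neighbourhood(image, row, col, radius)
    return {
        "mean": mean(window),
        "variance": pvariance(window),  # population variance of the window
        "median": median(window),
        "maximum": max(window),
        "minimum": min(window),
        "entropy": entropy(window),
    }


# Hypothetical 3x3 grey-level patch with one bright pixel in the centre.
patch = [
    [10, 10, 10],
    [10, 50, 10],
    [10, 10, 10],
]
centre_features = texture_features(patch, 1, 1)
```

Stacking such per-pixel statistics (together with noise-reduction and edge responses) gives one feature vector per pixel, which is the kind of input a pixel-level classifier is trained on.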
The main limitation of many existing automated Pap smear analysis systems is that they struggle to overcome the complexity of Pap smear structures because they try to analyse the slide as a whole, which often contains multiple cells and debris. This can cause the algorithm to fail and requires higher computational power. Samples are covered in artefacts (such as blood cells, overlapping and folded cells, and bacteria) that hamper the segmentation process and generate a large number of suspicious objects. It has been shown that classifiers designed to differentiate between normal cells and pre-cancerous cells usually produce unpredictable results when artefacts exist in the Pap smear. In this paper, a technique to identify cervix cells using a three-phase sequential elimination scheme (depicted in Fig. 6) is presented.
The proposed three-phase elimination scheme sequentially removes objects from the Pap smear that are deemed unlikely to be cervix cells and are therefore treated as debris. This approach is beneficial as it allows a lower-dimensional decision to be made at each stage.
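The idea of sequential elimination can be sketched as a simple cascade in which each stage only sees the objects that survived the previous one. This is an illustrative sketch, not the authors' implementation: the function name, the object representation (dictionaries), and the precomputed `shape_ok` and `texture_ok` flags are hypothetical stand-ins for decisions whose criteria are not specified at this point in the text. Only the area bounds (625 µm² and 85,267 µm²) are taken from the paper, and the order of the second and third stages is an assumption.

```python
def sequential_elimination(objects, stages):
    """Apply each (name, keep) stage in order.

    Objects failing a stage are recorded under that stage's name and are
    not evaluated by any later stage.
    """
    survivors = list(objects)
    rejected = {}
    for name, keep in stages:
        passed, failed = [], []
        for obj in survivors:
            (passed if keep(obj) else failed).append(obj)
        rejected[name] = failed
        survivors = passed
    return survivors, rejected


# Hypothetical stage predicates; only the area bounds come from the paper.
stages = [
    ("size", lambda o: 625 <= o["area_um2"] <= 85267),
    ("shape", lambda o: o["shape_ok"]),      # placeholder decision
    ("texture", lambda o: o["texture_ok"]),  # placeholder decision
]

# Hypothetical segmented objects.
demo_objects = [
    {"id": "a", "area_um2": 300, "shape_ok": True, "texture_ok": True},
    {"id": "b", "area_um2": 1500, "shape_ok": False, "texture_ok": True},
    {"id": "c", "area_um2": 2000, "shape_ok": True, "texture_ok": True},
    {"id": "d", "area_um2": 1200, "shape_ok": True, "texture_ok": False},
]

cells, debris_by_stage = sequential_elimination(demo_objects, stages)
```

Because each stage evaluates a single criterion on a shrinking set of candidates, every decision stays low-dimensional, which is the benefit the scheme is designed to provide.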
Table 1. Total number of pixels and training features for TWS segmentation.
Class | Number of pixels | Number of training features
Fig. 5. Original single cell (A, D), single cell nuclei segmentations (B, E), single cell cytoplasm segmentations (C, F), original Pap smear image (G) and segmented Pap smear image (H).
Fig. 6. Three-phase sequential elimination approach for debris rejection.
Size Analysis: Size analysis is a set of procedures for determining a range of size measurements of particles. Area is one of the most basic features used in the field of automated cytology to separate cells from debris. Pap smear analysis is a well-studied field with much prior knowledge regarding cell properties. However, one of the key challenges with nucleus area assessment is that cancerous cells undergo a substantial increase in nuclear size. Therefore, determining an upper size threshold that does not systematically exclude diagnostic cells is more difficult, but has the advantage of reducing the search space. The method presented in this paper is based on lower and upper size thresholds for cervical cells. The pseudocode for the approach is shown in Equation (3).
$$\text{If } Area_{min} \le Area_{roi} \le Area_{max} \text{ then } \langle\text{foreground}\rangle \text{ else } \langle\text{background}\rangle, \qquad (3)$$
where $Area_{max}$ = 85,267 µm², $Area_{min}$ = 625 µm², and $Area_{roi}$ is the area of the object being analysed. Objects assigned to the background are regarded as debris and are thus discarded from the image. Particles whose area falls between $Area_{min}$ and $Area_{max}$ are further analysed during the next stages of texture and shape analysis (as shown in Fig. 7).
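The rule in Equation (3) can be sketched in a few lines. The two thresholds are the values stated above; treating both bounds as inclusive is an assumption, as is converting a pixel count to an area using the 0.201 µm² pixel area quoted earlier. The function and constant names are hypothetical.

```python
AREA_MIN_UM2 = 625.0      # lower size threshold from the text
AREA_MAX_UM2 = 85_267.0   # upper size threshold from the text
PIXEL_AREA_UM2 = 0.201    # area covered by one pixel, as quoted in the text


def area_from_pixels(pixel_count, pixel_area_um2=PIXEL_AREA_UM2):
    """Convert an object's pixel count to an area in square micrometres."""
    return pixel_count * pixel_area_um2


def classify_by_area(area_um2):
    """Equation (3): keep objects within the size bounds (assumed inclusive)."""
    if AREA_MIN_UM2 <= area_um2 <= AREA_MAX_UM2:
        return "foreground"
    return "background"


# Hypothetical objects given as pixel counts from a segmentation mask.
labels = [classify_by_area(area_from_pixels(n)) for n in (1_000, 5_000, 500_000)]
```

Objects labelled "background" would be discarded as debris, while "foreground" objects would move on to the subsequent analysis stages.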
Shape Analysis: The shape of the objects in a Pap smear is a key feature in differentiating between cells and debris. There are a number of methods for shape description, including region-based and contour-based approaches. Region-based methods are less sensitive to noise but are more computationally intensive, whereas contour-based methods are relatively efficient to calculate but more sensitive to noise. In this paper, a region-based method (perimeter²/area (P2A)) has been used. The P2A descriptor was chosen on the merit that it describes the similarity of an object to a circle. This makes it well suited as a cell nucleus descriptor, since nuclei are generally circular in appearance. The P2A is also