[Figure 1 panels: Central nervous system (a), (b); Breast (a), (b); Ovarian (a), (b). Curves: ReliefF-RBGSA and ReliefF-BGSA. x-axis: Number of Iterations; y-axis of the (a) panels: Accuracy(%).]
Fig. 1. Comparative evaluation curves of ReliefF-RBGSA and ReliefF-BGSA on classification accuracy (a) and number of genes (b).
Table 3 shows the experimental results for classification accuracy and number of genes obtained by ReliefF-RBGSA and ReliefF-BGSA on the 6 microarray datasets. Classification accuracy is reported in terms of best value, mean, and standard deviation (SD). In Table 3, Acc represents accuracy and NF indicates the average number of genes selected in 10 runs. Table 3 shows that ReliefF-RBGSA achieves better accuracy than ReliefF-BGSA while selecting a much smaller set of genes: it keeps the fewest relevant genes without decreasing accuracy.
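The best, mean, and SD entries described above could be computed from per-run results along the following lines. This is only a minimal sketch: the function name `summarize_runs`, the `higher_is_better` flag, and the use of the sample standard deviation are assumptions made for illustration (the text does not state which SD variant is used), and the three accuracy values are invented toy numbers, not results from the paper.

```python
from statistics import mean, stdev


def summarize_runs(values, higher_is_better=True):
    """Summarise one measure (e.g. Acc or NF) over independent runs.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    best = max(values) if higher_is_better else min(values)
    return {"best": best, "mean": mean(values), "sd": stdev(values)}


# Invented toy accuracies (%), for illustration only.
toy_acc = [90.0, 92.0, 94.0]
summary = summarize_runs(toy_acc)
print(summary)
```

For a measure such as NF, where fewer selected genes is better, the same helper would be called with `higher_is_better=False` so that "best" is the minimum.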
Table 3: Statistical results obtained by ReliefF-RBGSA and ReliefF-BGSA on 6 microarray datasets.
Datasets | Performance measures | ReliefF-RBGSA: Best, Mean, SD | ReliefF-BGSA: Best, Mean, SD
Colon | Acc, NF
Central nervous system | Acc, NF
ALL-AML | Acc, NF
Breast | Acc, NF
Lung | Acc, NF
Ovarian | Acc, NF
Table 4: P-values for ReliefF-RBGSA and ReliefF-BGSA for ten runs.
Table 4 presents the statistical p-values for ReliefF-RBGSA and ReliefF-BGSA. ReliefF-RBGSA shows very small p-values on all datasets when compared with ReliefF-BGSA, which indicates that the accuracy difference between the two methods is statistically significant.
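The text does not name the statistical test behind these p-values. As one possible illustration of how a p-value for paired per-run accuracies could be obtained, the sketch below uses an exact paired sign-flip permutation test; this choice of test, the function name `paired_sign_flip_p_value`, and the toy numbers are all assumptions, not the procedure used to produce Table 4.

```python
from itertools import product


def paired_sign_flip_p_value(a, b):
    """Two-sided exact sign-flip permutation test on paired samples.

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so every sign pattern is enumerated (2**n patterns,
    which is 1024 for 10 runs).
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Invented toy accuracies for two methods over three paired runs.
method_a = [3.0, 4.0, 5.0]
method_b = [1.0, 1.0, 1.0]
print(paired_sign_flip_p_value(method_a, method_b))
```

With 10 runs in which one method wins every time, the smallest p-value this test can return is 2/1024, so it can resolve differences at the usual 0.05 level.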
3.2.2. Comparison study between ReliefF-RBGSA-MNB and state-of-the-art methods
Table 5 gives the comparison results, including the mean classification accuracy and the average number of genes selected on the 6 datasets. In Table 5, CNS denotes Central nervous system. The mean accuracy is computed over 10 independent runs (each run includes 100 iterations) for each dataset individually, and the highest mean classification accuracy on each dataset is shown in boldface. ReliefF-RBGSA-MNB obtained a classification accuracy of 98.39%, the highest among the 6 approaches. It reached up to 100% classification accuracy and obtained competitive results on all datasets. The MBEGA method has the worst classification performance on each microarray cancer dataset.
Table 5: Experimental results of classification performance for 6 microarray cancer datasets.
4. Conclusion
In this work, we developed a novel method that integrates ReliefF and RBGSA-MNB to perform cancer classification. In the proposed method, the raw gene set is first reduced by ReliefF in order to remove irrelevant and redundant genes. Based on this reduced gene subset, RBGSA then selects an optimized gene set using a recursive feature elimination scheme, with a Multinomial Naive Bayes (MNB) classifier used for classification. This improves classification accuracy considerably while removing a large number of redundant features from the gene set. We compared the proposed model with other existing methods on 6 publicly available benchmark datasets. Experimental results show that ReliefF-RBGSA-MNB achieves the best accuracy in most cases. By removing irrelevant and redundant genes, ReliefF-RBGSA-MNB effectively decreases the dimensionality of the data. The resulting low-dimensional set contains the most important genes, which yield higher classification accuracy.
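To make the filter stage of this pipeline concrete, the sketch below ranks features with a simplified two-class Relief score (one nearest hit and one nearest miss per sample, Manhattan distance). This is deliberately not ReliefF itself, which extends the idea to k neighbours and multiple classes, and it is not the authors' implementation; the RBGSA search and the MNB classifier are not shown. The function names and the toy data are assumptions made for illustration.

```python
def relief_scores(X, y):
    """Simplified Relief weights for a two-class problem.

    A feature gains weight when it differs on the nearest sample of the
    other class (miss) and loses weight when it differs on the nearest
    sample of the same class (hit).
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


def top_k_features(X, y, k):
    """Indices of the k highest-scoring features (the reduced gene subset)."""
    w = relief_scores(X, y)
    return sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:k]


# Invented toy data: feature 0 separates the classes, feature 1 does not.
X_toy = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y_toy = [0, 0, 0, 1, 1, 1]
print(top_k_features(X_toy, y_toy, 1))
```

In a full pipeline, the indices returned by `top_k_features` would define the reduced gene subset that a wrapper search such as RBGSA then explores.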