关键词:
Ensemble learning
Active learning
Cancer classification
Gene expression data
Fuzzy set
Rough set
GENE-EXPRESSION DATA
TUMOR CLASSIFICATION
CLUSTER-ANALYSIS
PREDICTION
ALGORITHM
SELECTION
SVM
摘要:
Background and Objective: Classification of cancer from gene expression data is one of the major research areas in the field of machine learning and medical science. Generally, conventional supervised methods are not able to produce desired classification accuracy due to inadequate training samples present in gene expression data to train the system. Ensemble-based active learning technique in this situation can be effective as it determines few informative samples by all the base classifiers and ensemble the decisions of all the base classifiers to get the most informative samples. Most informative samples are labeled by the subject experts and those are added to the training set, which can improve the classification accuracy. Method: We propose a novel ensemble-based active learning using fuzzy-rough approach for cancer sample classification from microarray gene expression data. The proposed method is able to deal with the uncertainty, overlap and indiscernibility usually present in the subtype classes of the gene expression data and can improve the accuracy of the individual base classifier in presence of limited training samples. Results: The proposed method is validated using eight microarray gene expression datasets. The performance of the proposed method in terms of classification accuracy, precision, recall, F-1-measures and kappa is compared with six other methods. The improvements in accuracy achieved by the proposed method compared to its nearest competitive methods are 2.96%, 9.34%, 0.93%, 3.69%, 7.2% and 4.53% respectively for Colon cancer, Prostate cancer, SRBCT, Ovarian cancer, DLBCL and Central nervous system datasets. Results of the paired t-test justify the statistical relevance of the results in favor of the proposed method for most of the datasets. Conclusion: The proposed method is an effective general purpose ensemble-based active learning adopting the fuzzy-rough concept and therefore can be applied for other classification problem in future.