Ajou University repository

ClearF: A supervised feature scoring method to find biomarkers using class-wise embedding and reconstruction
Citations

SCOPUS

4

Publication Year
2019-07-11
Publisher
BioMed Central Ltd.
Citation
BMC Medical Genomics, Vol.12
Keyword
Breast cancer; Dimension reduction; Feature scoring; Feature selection; Low-dimensional embedding; Mutual information (MI); Principal component analysis (PCA); Reconstruction error
Mesh Keyword
Benchmarking; Biomarkers; Computational Biology; Supervised Machine Learning
All Science Classification Codes (ASJC)
Genetics; Genetics (clinical)
Abstract
Background: Feature selection or scoring methods for the detection of biomarkers are essential in bioinformatics. Various feature selection methods have been developed for the detection of biomarkers, and several studies have employed information-theoretic approaches. However, most of these methods require a long processing time. In addition, information-theoretic methods discretize continuous features, which is a drawback that can lead to the loss of information.
Results: In this paper, a novel supervised feature scoring method named ClearF is proposed. The proposed method is suitable for continuous-valued data and follows a principle similar to that of feature selection using mutual information, with the added advantage of reduced computation time. The proposed score calculation is motivated by the association between the reconstruction error and the information-theoretic measurement. Our method is based on class-wise low-dimensional embedding and the resulting reconstruction error. Given multi-class datasets such as a case-control study dataset, low-dimensional embedding is first applied to each class, and also to the entire dataset, to obtain compressed representations. Reconstruction is then performed to calculate the error of each feature, and the final score for each feature is defined in terms of the reconstruction errors. The correlation between the information-theoretic measurement and the proposed method is demonstrated using a simulation. For performance validation, we compared the classification performance of the proposed method with those of various algorithms on benchmark datasets.
Conclusions: The proposed method showed higher accuracy and lower execution time than the other established methods. Moreover, an experiment was conducted on the TCGA breast cancer dataset, and it was confirmed that the genes with the highest scores were highly associated with subtypes of breast cancer.
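The class-wise embedding and reconstruction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract says only that the score is "defined in terms of the reconstruction errors", so the formula used here (whole-data error minus the sum of class-wise errors, by analogy with mutual information as entropy minus conditional entropy) is an assumption, as are the function names, the use of PCA via SVD as the embedding, and the toy data.

```python
import numpy as np


def pca_reconstruction_error(X, n_components):
    """Per-feature squared error after projecting X onto its top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    X_hat = Xc @ V @ V.T  # reconstruction from the low-dimensional embedding
    return ((Xc - X_hat) ** 2).sum(axis=0)


def classwise_reconstruction_scores(X, y, n_components=1):
    """Hypothetical score: whole-data error minus summed class-wise errors (an assumption)."""
    total_err = pca_reconstruction_error(X, n_components)
    class_err = np.zeros(X.shape[1])
    for label in np.unique(y):
        class_err += pca_reconstruction_error(X[y == label], n_components)
    return total_err - class_err


# Toy data: feature 0 depends only on the class, feature 1 is small
# class-independent variation, feature 2 is a large nuisance direction.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([
    [-1, -1, -1, -1, 1, 1, 1, 1],
    [-0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5, 0.5],
    [-3, 3, -3, 3, -3, 3, -3, 3],
]).astype(float)

scores = classwise_reconstruction_scores(X, y, n_components=1)
print(scores)  # feature 0 (the class-dependent one) ranks first
```

In this toy case the single principal component absorbs the nuisance feature both in the whole dataset and within each class, so only the class-dependent feature keeps a large whole-data error while its class-wise error vanishes. On real data, features would typically need comparable scales (for example, standardization) for such a comparison to be meaningful; that preprocessing is likewise an assumption rather than something stated in the record.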
ISSN
1755-8794
Language
eng
URI
https://dspace.ajou.ac.kr/dev/handle/2018.oak/30815
DOI
https://doi.org/10.1186/s12920-019-0512-9
Type
Article
Funding
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-2018-0-01431) supervised by the IITP (Institute for Information & communications Technology Promotion). Publication costs are funded by IITP (IITP-2018-2018-0-01431) and Ajou University.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Sohn, Kyung-Ah (손경아)
Department of Software and Computer Engineering
