Ajou University repository

Transformer-Based Gene Scoring Model for Extracting Representative Characteristic of Central Dogma Process to Prioritize Pathogenic Genes Applying Breast Cancer Multi-omics Data
  • Jhee, Jong Ho
  • Song, Min Young
  • Kim, Byung Gon
  • Shin, Hyunjung
  • Lee, Soo Youn

DC Field | Value | Language
dc.contributor.author | Jhee, Jong Ho | -
dc.contributor.author | Song, Min Young | -
dc.contributor.author | Kim, Byung Gon | -
dc.contributor.author | Shin, Hyunjung | -
dc.contributor.author | Lee, Soo Youn | -
dc.date.issued | 2023-01-01 | -
dc.identifier.uri | https://aurora.ajou.ac.kr/handle/2018.oak/36929 | -
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85151568347&origin=inward | -
dc.description.abstract | Various deep learning approaches that use large-scale multi-omics data from cancer patients are now being applied to identify biomarkers of diverse cancer types. Because multi-omics data are generally high-dimensional while patient samples are relatively few, this imbalance is a recognized bottleneck in applying the integrated characteristics of multi-omics data to cancer research. Among dimensionality reduction techniques, deep learning-based approaches such as autoencoders are known to handle high-dimensional data with few samples well. However, their black-box nature makes it difficult to explain which genes are essential. In this study, we develop a transformer-based representative Central tendency Gene score considering Central Dogma process information (CGCD) model to predict optimized potential anti-breast cancer therapeutic target genes. It is based on a unified representation built from the compressed features that a Transformer learns from multi-omics data of 105 breast cancer patients in The Cancer Genome Atlas (TCGA). Unlike other autoencoder-based models, CGCD can derive gene scores from the self-attention mechanism in the transformer model. Significant encoding genes were selected by computing a p-value for each gene from its scores across all patients. To verify the ability of the CGCD score to predict target genes, we estimated a hazard ratio and p-value per gene through survival analysis with a Cox proportional hazards model, calculated the area under the curve (AUC) with the CGCD score and the p-value per patient, and performed biological functional analysis, including Gene Set Enrichment Analysis (GSEA). As the CGCD score became higher, the results showed a pronounced upward trend in the retention rate of breast cancer marker genes and pathways. From this point of view, the CGCD score, which reflects the harmony of multi-omics data within a gene, is considered a suitable criterion for predicting cancer diagnostic markers. | -
dc.description.sponsorship | This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (grant no. 2021R1I1A1A01058604), the Korea Initiative for fostering University of Research and Innovation Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (No. NRF2021M3H1A104892211), an Institute for Information communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. S2022A 068600023), and the Ajou University research fund. | -
dc.language.iso | eng | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.subject.mesh | 'omics' | -
dc.subject.mesh | Auto encoders | -
dc.subject.mesh | Breast Cancer | -
dc.subject.mesh | Cancer patients | -
dc.subject.mesh | Central dogma | -
dc.subject.mesh | Deep learning | -
dc.subject.mesh | Gene scoring | -
dc.subject.mesh | Multi-omic | -
dc.subject.mesh | P-values | -
dc.subject.mesh | Process information | -
dc.title | Transformer-Based Gene Scoring Model for Extracting Representative Characteristic of Central Dogma Process to Prioritize Pathogenic Genes Applying Breast Cancer Multi-omics Data | -
dc.type | Conference | -
dc.citation.conferenceDate | 2023.2.13. ~ 2023.2.16. | -
dc.citation.conferenceName | 2023 IEEE International Conference on Big Data and Smart Computing, BigComp 2023 | -
dc.citation.edition | Proceedings - 2023 IEEE International Conference on Big Data and Smart Computing, BigComp 2023 | -
dc.citation.endPage | 154 | -
dc.citation.startPage | 149 | -
dc.citation.title | Proceedings - 2023 IEEE International Conference on Big Data and Smart Computing, BigComp 2023 | -
dc.identifier.bibliographicCitation | Proceedings - 2023 IEEE International Conference on Big Data and Smart Computing, BigComp 2023, pp.149-154 | -
dc.identifier.doi | 10.1109/bigcomp57234.2023.00033 | -
dc.identifier.scopusid | 2-s2.0-85151568347 | -
dc.identifier.url | http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=10066534 | -
dc.subject.keyword | breast cancer | -
dc.subject.keyword | data integration | -
dc.subject.keyword | deep learning | -
dc.subject.keyword | gene scoring | -
dc.subject.keyword | Multi-omics | -
dc.type.other | Conference Paper | -
dc.description.isoa | false | -
dc.subject.subarea | Artificial Intelligence | -
dc.subject.subarea | Computer Science Applications | -
dc.subject.subarea | Computer Vision and Pattern Recognition | -
dc.subject.subarea | Information Systems | -
dc.subject.subarea | Information Systems and Management | -
dc.subject.subarea | Statistics, Probability and Uncertainty | -
dc.subject.subarea | Health Informatics | -
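
Illustrative note on the abstract: the record says that gene scores are derived from the self-attention mechanism and that a p-value is then computed per gene from the scores of all patients, but it does not give the architecture or the statistical test. The Python sketch below is only a rough illustration under stated assumptions, not the authors' CGCD implementation. Assumed here: each gene is one token with three omics features, a single attention head with random (untrained) projection matrices `w_q` and `w_k`, a gene's score is the average attention it receives, and the p-value comes from a one-sided normal-approximation test against a uniform-attention baseline. All function and variable names are hypothetical.

```python
import math
import numpy as np


def attention_gene_scores(x, w_q, w_k):
    """Score each gene of one patient by the average attention it receives.

    x: (n_genes, n_features) multi-omics features, one row (token) per gene.
    w_q, w_k: (n_features, d) query/key projections (random here, not learned).
    """
    q = x @ w_q
    k = x @ w_k
    logits = q @ k.T / math.sqrt(q.shape[1])      # scaled dot-product
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax
    return attn.mean(axis=0)                      # attention received per gene


def gene_p_values(scores):
    """One-sided p-value per gene (assumed test, not taken from the paper).

    scores: (n_patients, n_genes). Tests whether a gene's mean score exceeds
    the uniform baseline 1 / n_genes, using a normal approximation.
    """
    n_patients, n_genes = scores.shape
    baseline = 1.0 / n_genes
    mean = scores.mean(axis=0)
    sem = scores.std(axis=0, ddof=1) / math.sqrt(n_patients)
    z = (mean - baseline) / sem
    return np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in z])


# Synthetic stand-in data: 105 patients (as in the abstract), 20 genes,
# 3 omics features per gene (the gene and feature counts are assumptions).
rng = np.random.default_rng(0)
n_patients, n_genes, n_omics, d = 105, 20, 3, 8
w_q = rng.normal(size=(n_omics, d))
w_k = rng.normal(size=(n_omics, d))
data = rng.normal(size=(n_patients, n_genes, n_omics))

scores = np.stack([attention_gene_scores(p, w_q, w_k) for p in data])
p_values = gene_p_values(scores)
```

Because each attention row sums to one, the scores of a patient also sum to one, which is why `1 / n_genes` serves as a neutral baseline in this sketch. A trained model would replace the random projections, and the paper's actual scoring and test may differ.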

Related Researcher

Shin, HyunJung (신현정)
Department of Industrial Engineering
