Ajou University repository

Representation Learning of Biomedical Ontologies using Poincaré Embedding and Application to Genetic Risk Model
  • 김재식
Advisor
손경아
Affiliation
Graduate School, Ajou University
Department
Department of Computer Engineering, Graduate School
Publication Year
2021-08
Publisher
The Graduate School, Ajou University
Keyword
Poincaré ball; Polygenic risk score; Representation learning; Transformer
Description
Thesis (Master's)--Graduate School, Ajou University: Department of Computer Engineering, 2021. 8
Alternative Abstract
Knowledge in the Gene Ontology (GO) and Gene Ontology Annotation (GOA) can be manipulated primarily through vector representations of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space to measure their semantic similarity, using embedding methods such as Word2Vec-based ones to represent entities as numeric vectors. However, this approach is limited: embedding large graph-structured data in Euclidean space loses information about latent hierarchies, so the semantics of GO and GOA cannot be captured optimally. Hyperbolic spaces such as the Poincaré ball, by contrast, are better suited to modeling hierarchies because, owing to their negative curvature, distances grow exponentially toward the boundary. In this thesis, we propose hierarchical representations of GO and genes (HiG2Vec), which apply Poincaré embedding, a method specialized for representing hierarchies, through a two-step procedure: GO embedding and gene embedding. Through experiments, we show that our model represents the hierarchical structure better than other approaches and predicts interactions between genes or gene products as well as or better than previous studies. The results indicate that HiG2Vec is superior to other methods both in capturing GO and gene semantics and in data utilization. As an effective downstream application of gene embeddings, we propose TransformerPRS, a deep learning model that uses a transformer module derived from language models, and compare it with the conventional polygenic risk score (PRS), a widely used risk-scoring approach that derives a genetic risk for each individual from the sum of risk variants weighted by effect sizes from genome-wide association studies (GWASs).
In the experiments, TransformerPRS initialized with HiG2Vec showed better prediction performance than both TransformerPRS trained from scratch and conventional PRS. In addition, the self-attention module in a transformer block identified important features and their interactions. Our models can improve genetic risk prediction by indicating which genes and which interactions between genes have an important impact on prediction, information that conventional PRS does not capture.
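The two quantities the abstract describes in words can be illustrated with a minimal sketch. It assumes the standard unit Poincaré ball distance and a plain weighted-sum PRS. The function names `poincare_distance` and `polygenic_risk_score` are hypothetical and are not taken from the thesis or from HiG2Vec's code.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball.

    Assumes the standard formula
    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


def polygenic_risk_score(dosages, effect_sizes):
    """Conventional PRS: risk-allele dosages (0, 1, or 2) weighted by GWAS effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


# The same Euclidean gap (0.1) is a much longer hyperbolic distance near the boundary.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])

# Illustrative dosages and effect sizes only; these are not data from the thesis.
example_prs = polygenic_risk_score([0, 1, 2], [0.1, -0.2, 0.3])
```

Comparing `near_origin` with `near_boundary` shows the property the abstract relies on: points near the boundary are far apart even when their Euclidean coordinates are close, which leaves room for the many leaves of a hierarchy.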
Language
eng
URI
https://dspace.ajou.ac.kr/handle/2018.oak/20424
Type
Thesis
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.