Representation-learning based propensity score model for causal inference in high dimension

유승찬

Advisor: 박래웅

Affiliation: The Graduate School, Ajou University

Department: Department of Medicine, The Graduate School

Publication Year: 2021-02

Publisher: The Graduate School, Ajou University

Keyword: deep learning; observational study; propensity score; sample size

Description: Thesis (Ph.D.)--The Graduate School, Ajou University: Department of Medicine, 2021. 2

Alternative Abstract: Medical research attempting causal inference has surged with the wider adoption of electronic health records (EHRs) and the secondary use of large claims databases. Unlike in randomized clinical trials, treatment assignment in observational data is not independent of baseline characteristics. Hence, two key assumptions must be satisfied to estimate causal effects in an observational study: unconfoundedness and overlap. In most studies, unconfoundedness, rather than overlap, is the main challenge. Intuitively, unconfoundedness is more plausible when more covariates are included in the analysis. In this regard, the large-scale propensity score model (LSPS), which balances virtually all observed confounders, is preferable to a propensity score model that adjusts for tens of expert-selected variables. However, LSPS often fails to balance the available covariates in high-dimensional, low-sample-size (HDLSS) data, i.e., p >> n, where the number of covariates p far exceeds the number of observations n. This weakness hinders its wide adoption across a distributed research network based on standardized clinical data. Hence, this study aims to develop a more robust framework for propensity-score-based causal inference in HDLSS data: the database-wide representation-learning-based propensity score model (RLPS). RLPS is composed of two components: (1) a task-agnostic, database-wide asymmetrically stacked autoencoder (DASA) that abstracts high-dimensional features; and (2) a downstream Bayesian lasso that estimates the propensity score. DASA is trained in an unsupervised way on a database-wide feature matrix to distill a condensed, meaningful representation. Once DASA is pretrained, its deep encoder maps the covariates into the condensed space, and the Bayesian lasso then estimates the propensity score as a downstream task. Finally, propensity score matching is conducted to estimate the average treatment effect.
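The three stages described above (encode the covariates, estimate the propensity score with a sparse model, then match) can be sketched in a few lines. This is only an illustrative sketch under loud assumptions, not the thesis's implementation: `encode` is a placeholder that averages blocks of covariates and merely stands in for a pretrained deep encoder such as DASA; `fit_l1_logistic` is an L1-penalized logistic regression fitted by proximal gradient descent, used here as a simple point-estimate substitute for a Bayesian lasso; `match_1to1` is greedy 1:1 nearest-neighbour matching with a caliper of 0.2 standard deviations of the logit, a common convention that the abstract does not specify. All function names, parameters, and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def encode(x, block=5):
    """Placeholder encoder: average consecutive blocks of covariates.
    Stands in for a pretrained deep encoder; it is NOT the thesis's DASA."""
    return [sum(x[i:i + block]) / len(x[i:i + block])
            for i in range(0, len(x), block)]


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def propensity(z, w, b):
    """Estimated probability of treatment, clipped away from 0 and 1."""
    p = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)
    return min(max(p, 1e-9), 1.0 - 1e-9)


def fit_l1_logistic(Z, t, lam=0.01, lr=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.
    A simple substitute for a Bayesian lasso (assumption, not the source's model)."""
    n, k = len(Z), len(Z[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        resid = [sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - ti
                 for z, ti in zip(Z, t)]
        for j in range(k):
            grad = sum(r * z[j] for r, z in zip(resid, Z)) / n
            step = w[j] - lr * grad
            # Soft-thresholding shrinks small coefficients exactly to zero.
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
        b -= lr * sum(resid) / n  # the intercept is not penalized
    return w, b


def match_1to1(ps, t, caliper=0.2):
    """Greedy 1:1 nearest-neighbour matching on the logit of the propensity
    score, without replacement, within a caliper (in SDs of the logit)."""
    logit = [math.log(p / (1.0 - p)) for p in ps]
    width = caliper * statistics.stdev(logit)
    controls = {i for i, ti in enumerate(t) if ti == 0}
    pairs = []
    for i, ti in enumerate(t):
        if ti != 1 or not controls:
            continue
        j = min(controls, key=lambda c: abs(logit[c] - logit[i]))
        if abs(logit[j] - logit[i]) <= width:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


# Synthetic demonstration data (entirely made up for illustration).
random.seed(0)
n, p = 200, 20
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
signal = [sum(x[:5]) / 5 for x in X]  # confounder driving treatment and outcome
t = [1 if random.random() < sigmoid(2 * s) else 0 for s in signal]
y = [1.0 * ti + 2.0 * s + random.gauss(0, 1) for ti, s in zip(t, signal)]

Z = [encode(x) for x in X]                # step 1: condensed representation
w, b = fit_l1_logistic(Z, t)              # step 2: sparse propensity model
ps = [propensity(z, w, b) for z in Z]
pairs = match_1to1(ps, t)                 # step 3: propensity score matching
effect = (sum(y[i] - y[j] for i, j in pairs) / len(pairs)
          if pairs else float("nan"))
print(f"matched pairs: {len(pairs)}, matched mean difference: {effect:.2f}")
```

In the design described above, the encoder is trained once on the whole database and reused across studies, so only the low-dimensional downstream model has to be fitted on each small cohort.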
The performance of RLPS was evaluated in two clinical cases, each a comparative cohort study of new users: (1) angiotensin receptor blockers and calcium channel blockers in hypertension; and (2) ranitidine and other H2-receptor antagonists. In each case, samples of 1000 and of 500 patients were randomly drawn 100 times from a single standardized EHR database of a tertiary hospital. Unconfoundedness, accuracy of risk estimates, and residual bias were compared between RLPS and LSPS. Compared to LSPS, RLPS identified more overlap and achieved better balance across a large set of covariates between the target and comparator cohorts. RLPS mostly performed better when there was empirical equipoise. RLPS can be an attractive alternative to LSPS in studies where the number of covariates exceeds the number of observations. Furthermore, RLPS may facilitate population-level estimation studies using the EHRs of single institutions across a distributed research network.
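The evaluation design above (repeated random subsamples, with covariate balance compared between cohorts) can be illustrated as follows. This is a hedged sketch only: the standardized mean difference with a pooled standard deviation and the 0.1 threshold are common conventions assumed here, since the abstract does not state which balance metric or cutoff was used. The function names, the database size of 5000, and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    diff = statistics.mean(a) - statistics.mean(b)
    if pooled == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled


def fraction_balanced(X, t, threshold=0.1):
    """Share of covariates whose absolute SMD between the target (t == 1)
    and comparator (t == 0) cohorts is at most `threshold` (assumed cutoff)."""
    target = [x for x, ti in zip(X, t) if ti == 1]
    comparator = [x for x, ti in zip(X, t) if ti == 0]
    n_cov = len(X[0])
    balanced = sum(
        abs(standardized_mean_difference([r[j] for r in target],
                                         [r[j] for r in comparator])) <= threshold
        for j in range(n_cov)
    )
    return balanced / n_cov


def repeated_subsamples(n_total, size, repeats, seed=0):
    """Draw `repeats` random subsamples of row indices, each without replacement."""
    rng = random.Random(seed)
    return [rng.sample(range(n_total), size) for _ in range(repeats)]


# Synthetic illustration: 100 subsamples of 500 from a made-up pool of 5000 rows.
subsamples = repeated_subsamples(n_total=5000, size=500, repeats=100)

rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(300)]
t = [rng.randint(0, 1) for _ in range(300)]
balance = fraction_balanced(X, t)
print(f"fraction of covariates with |SMD| <= 0.1: {balance:.2f}")
```

In an actual comparison, `fraction_balanced` would be computed on the matched cohorts of each subsample and summarized over the 100 repeats for each propensity score model.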

Language: eng

URI: https://dspace.ajou.ac.kr/handle/2018.oak/20293

Type: Thesis
