Ajou University repository

Blind Face Restoration Using Swin Transformer with Global Semantic Token Regularization
  • 신창종

DC Field / Value
dc.contributor.advisor: 허용석
dc.contributor.author: 신창종
dc.date.issued: 2023-02
dc.identifier.other: 32682
dc.identifier.uri: https://dspace.ajou.ac.kr/handle/2018.oak/24490
dc.description: Master's thesis -- Ajou University Graduate School: Department of Artificial Intelligence, 2023. 2
dc.description.tableofcontents:
1 Introduction 1
2 Related Works 5
  2.1 Blind Face Restoration 5
3 Proposed Method 7
  3.1 Code generation (Stage 1) 8
  3.2 Code prediction (Stage 2) 10
    3.2.1 Encoder with (S)W-MSA 11
    3.2.2 Multi-scale cross-attention transformer 12
    3.2.3 Transformer 14
  3.3 Feature fusion (Stage 3) 15
    3.3.1 SW-TCAFM 16
    3.3.2 Semantic token regularization loss 19
4 Experiments and Results 21
  4.1 Evaluation Settings and Implementation 21
  4.2 Comparative Results with State-of-the-art Methods 22
  4.3 Ablation study 23
5 Conclusions 27
dc.language.iso: eng
dc.publisher: The Graduate School, Ajou University
dc.rights: Ajou University theses are protected by copyright.
dc.title: Blind Face Restoration Using Swin Transformer with Global Semantic Token Regularization
dc.type: Thesis
dc.contributor.affiliation: Ajou University Graduate School
dc.contributor.department: Graduate School, Department of Artificial Intelligence
dc.date.awarded: 2023-02
dc.description.degree: Master
dc.identifier.url: https://dcoll.ajou.ac.kr/dcollection/common/orgView/000000032682
dc.subject.keyword: Blind face restoration
dc.description.alternativeAbstract: In this thesis, we propose a framework for blind face restoration, which recovers a high-quality face image from an input with unknown degradations. Previous methods have shown that a Vector Quantization (VQ) codebook can serve as a powerful prior for blind face restoration. However, predicting code vectors from low-quality images remains challenging. To address this, we propose a multi-scale transformer consisting of multi-scale cross-attention (MSCA) blocks. The multi-scale transformer compensates for information lost in high-level features by globally fusing low-level and high-level features of different spatial resolutions. There is also a trade-off between the pixel-wise fidelity and the visual quality of the results. To improve fidelity, we employ shifted window cross-attention modules at multiple scales. However, the shifted window method cannot compute inter-window attention, and so cannot model the rich global context of a face. To solve this problem, we propose a shifted window token cross-attention module (SW-TCAFM) with a global class token that models the global facial context. The global class token aggregates information across all windows and passes it to the next step. In addition, we propose a semantic token regularization loss that makes each global class token represent a specific face component by exploiting a face parsing map prior. Our framework achieves superior performance in both quality and fidelity compared to state-of-the-art methods. In our experiments, the PSNR and FID of our framework improve on the state-of-the-art method by 3.21% and 2.92%, respectively.
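The abstract's key idea, window-local attention whose windows also attend to a shared global class token, which in turn aggregates information across all windows, can be illustrated with a toy sketch. This is a minimal NumPy illustration of the general mechanism, not the thesis's SW-TCAFM implementation; the single-head attention, the mean-pooling used to update the global token, and all function names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention_with_global_token(tokens, global_token, window_size):
    """Toy single-head window attention with a shared global token.

    Each window attends only within itself plus the global token (so
    per-window cost stays local), while the global token is refreshed by
    aggregating the attended outputs of every window -- a stand-in for
    'aggregating information across all windows'.
    """
    n, d = tokens.shape
    out = np.empty_like(tokens)
    pooled = []
    for start in range(0, n, window_size):
        win = tokens[start:start + window_size]      # (w, d) window tokens
        kv = np.vstack([win, global_token])          # append global token as an extra key/value
        attn = softmax(win @ kv.T / np.sqrt(d))      # (w, w+1) attention weights
        win_out = attn @ kv                          # attended window output
        out[start:start + window_size] = win_out
        pooled.append(win_out.mean(axis=0))          # per-window summary
    # Global token update: pool summaries from all windows.
    new_global = np.mean(pooled, axis=0, keepdims=True)
    return out, new_global

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))                 # 8 tokens, dim 4
g = rng.standard_normal((1, 4))                      # one global class token
out, g_new = window_attention_with_global_token(tokens, g, window_size=4)
print(out.shape, g_new.shape)                        # (8, 4) (1, 4)
```

In the thesis this token is further tied to face semantics via the semantic token regularization loss, which uses a face parsing map to push each global class token toward a specific face component; that supervision is not reflected in this sketch.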

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

File Download

  • There are no files associated with this item.