DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 허용석 | - |
dc.contributor.author | 신창종 | - |
dc.date.issued | 2023-02 | - |
dc.identifier.other | 32682 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/handle/2018.oak/24490 | - |
dc.description | Thesis (Master's) -- The Graduate School, Ajou University: Department of Artificial Intelligence, February 2023 | - |
dc.description.tableofcontents | 1 Introduction 1 <br>2 Related Works 5 <br> 2.1 Blind Face Restoration 5 <br>3 Proposed Method 7 <br> 3.1 Code generation (Stage 1) 8 <br> 3.2 Code prediction (Stage 2) 10 <br> 3.2.1 Encoder with (S)W-MSA 11 <br> 3.2.2 Multi-scale cross-attention transformer 12 <br> 3.2.3 Transformer 14 <br> 3.3 Feature fusion (Stage 3) 15 <br> 3.3.1 SW-TCAFM 16 <br> 3.3.2 Semantic token regularization loss 19 <br>4 Experiments and Results 21 <br> 4.1 Evaluation Settings and Implementation 21 <br> 4.2 Comparative Results with State-of-the-art Methods 22 <br> 4.3 Ablation study 23 <br>5 Conclusions 27 | - |
dc.language.iso | eng | - |
dc.publisher | The Graduate School, Ajou University | - |
dc.rights | Theses of Ajou University are protected by copyright. | - |
dc.title | Blind Face Restoration Using Swin Transformer with Global Semantic Token Regularization | - |
dc.type | Thesis | - |
dc.contributor.affiliation | Graduate School, Ajou University | - |
dc.contributor.department | Department of Artificial Intelligence, The Graduate School | - |
dc.date.awarded | 2023-02 | - |
dc.description.degree | Master | - |
dc.identifier.url | https://dcoll.ajou.ac.kr/dcollection/common/orgView/000000032682 | - |
dc.subject.keyword | Blind face restoration | - |
dc.description.alternativeAbstract | In this thesis, we propose a framework for blind face restoration, which recovers a high-quality face image from an input with unknown degradations. Previous methods have shown that a Vector Quantization (VQ) codebook can serve as a powerful prior for blind face restoration. <br>However, predicting code vectors from low-quality images remains challenging. To address this, we propose a multi-scale transformer consisting of multi-scale cross-attention (MSCA) blocks. The multi-scale transformer compensates for information lost in high-level features by globally fusing low-level and high-level features of different spatial resolutions. <br>There is also a trade-off between the pixel-wise fidelity and the visual quality of the results. To improve fidelity, we employ shifted-window cross-attention modules at multiple scales. However, the shifted-window method cannot compute inter-window attention and therefore struggles to model the rich global context of a face. To address this, we propose a shifted-window token cross-attention fusion module (SW-TCAFM) with a global class token that models the global facial context: the global class token aggregates information across all windows and passes it to the next stage. In addition, we propose a semantic token regularization loss that, using a face parsing map as a prior, encourages each global class token to represent a specific facial component. <br>Our framework outperforms state-of-the-art methods in both quality and fidelity. In our experiments, the PSNR and FID of our framework improve on the state-of-the-art method by 3.21% and 2.92%, respectively. | - |
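The abstract's central idea is a global class token that gathers context across attention windows, which plain (shifted-)window attention cannot do on its own. The following is a minimal PyTorch sketch of that idea only; the class name `GlobalTokenWindowAttention`, the tensor shapes, and the averaging step are illustrative assumptions, not the thesis's actual SW-TCAFM implementation.

```python
import torch
import torch.nn as nn


class GlobalTokenWindowAttention(nn.Module):
    """Window attention with a shared global class token that aggregates
    context across all windows (a hypothetical simplification of the
    global-class-token mechanism described in the abstract)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # One learnable global class token, shared by every window.
        self.global_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, windows: torch.Tensor):
        # windows: (num_windows, tokens_per_window, dim)
        n, _, _ = windows.shape
        g = self.global_token.expand(n, -1, -1)
        # Prepend the global token to each window so local tokens and the
        # global token exchange information within the window.
        x = torch.cat([g, windows], dim=1)
        x, _ = self.attn(x, x, x)
        # Average the per-window copies of the global token so the next
        # stage receives one token that has seen every window.
        fused_global = x[:, :1].mean(dim=0, keepdim=True)
        return x[:, 1:], fused_global


# Example: 16 windows of 7x7 tokens with 64 channels.
attn = GlobalTokenWindowAttention(dim=64)
local_out, global_out = attn(torch.randn(16, 49, 64))
print(local_out.shape, global_out.shape)  # (16, 49, 64) (1, 1, 64)
```

Averaging the per-window token copies is one simple way to fuse cross-window information into a single token; the thesis may propagate and regularize its global tokens differently (e.g., via the semantic token regularization loss tying tokens to face-parsing components).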