Ajou University repository

Blind Face Restoration Using Swin Transformer with Global Semantic Token Regularization
  • 신창종

DC Field / Value
dc.contributor.advisor: 허용석
dc.contributor.author: 신창종
dc.date.issued: 2023-02
dc.identifier.other: 32682
dc.identifier.uri: https://dspace.ajou.ac.kr/handle/2018.oak/24490
dc.description: Master's thesis -- Ajou University Graduate School: Department of Artificial Intelligence, 2023. 2
dc.description.tableofcontents:
1 Introduction 1
2 Related Works 5
  2.1 Blind Face Restoration 5
3 Proposed Method 7
  3.1 Code generation (Stage 1) 8
  3.2 Code prediction (Stage 2) 10
    3.2.1 Encoder with (S)W-MSA 11
    3.2.2 Multi-scale cross-attention transformer 12
    3.2.3 Transformer 14
  3.3 Feature fusion (Stage 3) 15
    3.3.1 SW-TCAFM 16
    3.3.2 Semantic token regularization loss 19
4 Experiments and Results 21
  4.1 Evaluation Settings and Implementation 21
  4.2 Comparative Results with State-of-the-art Methods 22
  4.3 Ablation study 23
5 Conclusions 27
dc.language.iso: eng
dc.publisher: The Graduate School, Ajou University
dc.rights: Ajou University theses are protected by copyright.
dc.title: Blind Face Restoration Using Swin Transformer with Global Semantic Token Regularization
dc.type: Thesis
dc.contributor.affiliation: Ajou University Graduate School
dc.contributor.department: Graduate School, Department of Artificial Intelligence
dc.date.awarded: 2023-02
dc.description.degree: Master
dc.identifier.url: https://dcoll.ajou.ac.kr/dcollection/common/orgView/000000032682
dc.subject.keyword: Blind face restoration
dc.description.alternativeAbstract: In this thesis, we propose a framework for blind face restoration, which recovers a high-quality face image from an input with unknown degradations. Previous methods have shown that a Vector Quantization (VQ) codebook can serve as a powerful prior for blind face restoration. However, predicting code vectors from low-quality images remains challenging. To address this, we propose a multi-scale transformer consisting of multi-scale cross-attention (MSCA) blocks. The multi-scale transformer compensates for information lost in high-level features by globally fusing low-level and high-level features of different spatial resolutions. There is also a trade-off between the pixel-wise fidelity and the visual quality of the results. To improve fidelity, we employ shifted window cross-attention modules at multiple scales. However, the shifted window method cannot compute inter-window attention, and so cannot model the rich global context of a face. To solve this problem, we propose a shifted window token cross-attention module (SW-TCAFM) with a global class token that models the global facial context. The global class token aggregates information across all windows and passes it to the next step. In addition, we propose a semantic token regularization loss that makes each global class token represent a specific face component by exploiting a face parsing map prior. Our framework achieves superior performance in both quality and fidelity compared to state-of-the-art methods. In our experiments, the PSNR and FID of our framework improve on the state-of-the-art method by 3.21% and 2.92%, respectively.
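The abstract's key idea, window-local attention whose windows also attend to a shared global class token, which in turn aggregates information across all windows, can be illustrated with a toy sketch. This is a minimal NumPy illustration of the general mechanism, not the thesis's SW-TCAFM implementation; the single-head attention, the mean-pooling used to update the global token, and all function names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention_with_global_token(tokens, global_token, window_size):
    """Toy single-head window attention with a shared global token.

    Each window attends only within itself plus the global token (so
    per-window cost stays local), while the global token is refreshed by
    aggregating the attended outputs of every window -- a stand-in for
    'aggregating information across all windows'.
    """
    n, d = tokens.shape
    out = np.empty_like(tokens)
    pooled = []
    for start in range(0, n, window_size):
        win = tokens[start:start + window_size]      # (w, d) window tokens
        kv = np.vstack([win, global_token])          # append global token as an extra key/value
        attn = softmax(win @ kv.T / np.sqrt(d))      # (w, w+1) attention weights
        win_out = attn @ kv                          # attended window output
        out[start:start + window_size] = win_out
        pooled.append(win_out.mean(axis=0))          # per-window summary
    # Global token update: pool summaries from all windows.
    new_global = np.mean(pooled, axis=0, keepdims=True)
    return out, new_global

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))                 # 8 tokens, dim 4
g = rng.standard_normal((1, 4))                      # one global class token
out, g_new = window_attention_with_global_token(tokens, g, window_size=4)
print(out.shape, g_new.shape)                        # (8, 4) (1, 4)
```

In the thesis this token is further tied to face semantics via the semantic token regularization loss, which uses a face parsing map to push each global class token toward a specific face component; that supervision is not reflected in this sketch.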

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

File Download

  • There are no files associated with this item.