Citation Export
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Bu, Xiao Qing | - |
dc.contributor.author | Sun, Yu Kuan | - |
dc.contributor.author | Wang, Jian Ming | - |
dc.contributor.author | Liu, Kun Liang | - |
dc.contributor.author | Liang, Jia Yu | - |
dc.contributor.author | Jin, Guang Hao | - |
dc.contributor.author | Chung, Tae Sun | - |
dc.date.issued | 2021-09-17 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/dev/handle/2018.oak/31611 | - |
dc.description.abstract | With the aid of one manually annotated frame, One-Shot Video Object Segmentation (OSVOS) uses a CNN architecture to tackle the problem of semi-supervised video object segmentation (VOS). However, annotating a pixel-level segmentation mask is expensive and time-consuming. To alleviate this problem, we explore a language-interactive way of initializing semi-supervised VOS and run semi-supervised methods in a weakly supervised mode. Our contributions are twofold: (i) we propose a variant of OSVOS initialized with referring expressions (REVOS), which locates the target object by maximizing the matching score between each candidate and the referring expression; (ii) since the segmentation performance of semi-supervised VOS methods varies dramatically depending on which frame is selected for annotation, we present a strategy for selecting the best annotation frame using an image similarity measurement. We are also the first to propose a multiple-frame annotation selection strategy for initializing semi-supervised VOS with more than one annotated frame. Finally, we evaluate our method on the DAVIS-2016 dataset, and experimental results show that REVOS achieves performance similar to OSVOS (79.94% vs. 80.1% average IoU). Although the current REVOS implementation is specific to the one-shot video object segmentation method, it can be applied more widely to other semi-supervised VOS methods. | - |
dc.description.sponsorship | This work was supported by The Tianjin Science and Technology Program under grant 19PTZWHZ00020 and the National Natural Science Foundation of China under grant 61902281. | - |
dc.description.sponsorship | This study is funded by The Tianjin Science and Technology Program(19PTZWHZ00020) and National Natural Science Foundation of China (Grant No. 61902281). | - |
dc.language.iso | eng | - |
dc.publisher | Elsevier B.V. | - |
dc.subject.mesh | Frame selection | - |
dc.subject.mesh | Image similarity | - |
dc.subject.mesh | Interactive way | - |
dc.subject.mesh | Referring expressions | - |
dc.subject.mesh | Segmentation masks | - |
dc.subject.mesh | Segmentation performance | - |
dc.subject.mesh | Semi-supervised method | - |
dc.subject.mesh | Video-object segmentation | - |
dc.title | Weakly supervised video object segmentation initialized with referring expression | - |
dc.type | Article | - |
dc.citation.endPage | 765 | - |
dc.citation.startPage | 754 | - |
dc.citation.title | Neurocomputing | - |
dc.citation.volume | 453 | - |
dc.identifier.bibliographicCitation | Neurocomputing, Vol.453, pp.754-765 | - |
dc.identifier.doi | 10.1016/j.neucom.2020.06.129 | - |
dc.identifier.scopusid | 2-s2.0-85092720316 | - |
dc.identifier.url | www.elsevier.com/locate/neucom | - |
dc.subject.keyword | Natural Language Processing | - |
dc.subject.keyword | Referring Expression | - |
dc.subject.keyword | Video Object Segmentation | - |
dc.description.isoa | false | - |
dc.subject.subarea | Computer Science Applications | - |
dc.subject.subarea | Cognitive Neuroscience | - |
dc.subject.subarea | Artificial Intelligence | - |
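
The abstract describes two algorithmic ideas: choosing the target object by maximizing a matching score against the referring expression, and choosing which frame(s) to annotate via image similarity. The Python snippet below is a minimal illustrative sketch of those ideas under assumed details (a colour-histogram descriptor, cosine similarity, and a generic `matcher` callable are stand-ins introduced here); it is not the authors' REVOS implementation, whose matching model and similarity measurement may differ.

```python
# Illustrative sketch only -- not the authors' REVOS code. It shows two ideas
# from the abstract: (i) picking the candidate object that best matches a
# referring expression, and (ii) picking annotation frames by image similarity.
# The colour-histogram descriptor and the `matcher` callable are assumptions.
import numpy as np


def select_target(candidates, expression, matcher):
    """Return the candidate (e.g. a mask or box) with the highest matching
    score against the referring expression, per contribution (i)."""
    scores = [matcher(c, expression) for c in candidates]
    return candidates[int(np.argmax(scores))]


def frame_descriptor(frame: np.ndarray, bins: int = 16) -> np.ndarray:
    """Simple per-channel colour histogram, L1-normalised (a stand-in feature)."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 255))[0]
             for c in range(frame.shape[-1])]
    feat = np.concatenate(hists).astype(np.float64)
    return feat / (feat.sum() + 1e-8)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def select_annotation_frames(frames, k: int = 1):
    """Return indices of the k frames most similar, on average, to the rest of
    the video -- the intuition behind annotating a 'representative' frame,
    per contribution (ii)."""
    feats = [frame_descriptor(f) for f in frames]
    scores = [np.mean([cosine(fi, fj) for j, fj in enumerate(feats) if j != i])
              for i, fi in enumerate(feats)]
    return list(np.argsort(scores)[::-1][:k])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
             for _ in range(10)]
    print(select_annotation_frames(video, k=2))
```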