Ajou University repository

Weakly supervised video object segmentation initialized with referring expression
  • Bu, Xiao Qing ;
  • Sun, Yu Kuan ;
  • Wang, Jian Ming ;
  • Liu, Kun Liang ;
  • Liang, Jia Yu ;
  • Jin, Guang Hao ;
  • Chung, Tae Sun
Citations

SCOPUS

3

Publication Year
2021-09-17
Publisher
Elsevier B.V.
Citation
Neurocomputing, Vol.453, pp.754-765
Keyword
Natural Language Processing; Referring Expression; Video Object Segmentation
Mesh Keyword
Frame selection; Image similarity; Interactive way; Referring expressions; Segmentation masks; Segmentation performance; Semi-supervised method; Video-object segmentation
All Science Classification Codes (ASJC)
Computer Science Applications; Cognitive Neuroscience; Artificial Intelligence
Abstract
With the aid of one manually annotated frame, One-Shot Video Object Segmentation (OSVOS) uses a CNN architecture to tackle the problem of semi-supervised video object segmentation (VOS). However, annotating a pixel-level segmentation mask is expensive and time-consuming. To alleviate this problem, we explore a language-interactive way of initializing semi-supervised VOS, which lets semi-supervised methods run in a weakly supervised mode. Our contributions are twofold. (i) We propose a variant of OSVOS initialized with referring expressions (REVOS), which locates the target object by selecting, among all candidates, the one with the highest matching score against the referring expression. (ii) Because the segmentation performance of semi-supervised VOS methods varies dramatically with the frame selected for annotation, we present a strategy for selecting the best annotation frame using an image similarity measurement. We are also the first to propose a multiple-frame annotation selection strategy for initializing semi-supervised VOS with more than one annotated frame. Finally, we evaluate our method on the DAVIS-2016 dataset; experimental results show that REVOS achieves performance (79.94% average IoU) similar to that of OSVOS (80.1%). Although the current REVOS implementation is specific to one-shot video object segmentation, the approach can be applied more widely to other semi-supervised VOS methods.
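The three ideas named in the abstract (choosing the candidate with the highest matching score, choosing an annotation frame by image similarity, and scoring with average IoU) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the use of cosine similarity over per-frame feature vectors, and the rule "pick the frame most similar on average to the others" are all assumptions made for illustration; the record does not specify the paper's actual matching model or similarity measure.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Mask = List[List[int]]  # binary mask as nested lists of 0/1 (illustrative type)


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average IoU over paired predicted and ground-truth masks."""
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


def pick_candidate(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Return the candidate whose matching score is highest.

    `score` is a hypothetical stand-in for a model that rates how well
    a candidate object matches the referring expression.
    """
    return max(candidates, key=score)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def pick_annotation_frame(features: Sequence[Sequence[float]]) -> int:
    """Index of the frame with the highest mean similarity to all other frames.

    Assumption: each frame is summarized by one feature vector, and the
    most "representative" frame is the best one to annotate.
    """
    n = len(features)

    def avg_similarity(i: int) -> float:
        others = [cosine(features[i], features[j]) for j in range(n) if j != i]
        return sum(others) / len(others) if others else 0.0

    return max(range(n), key=avg_similarity)
```

In this sketch a multiple-frame variant could repeat `pick_annotation_frame` on the frames not yet chosen, but how the paper selects several frames is not stated in this record.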
Language
eng
URI
https://dspace.ajou.ac.kr/dev/handle/2018.oak/31611
DOI
https://doi.org/10.1016/j.neucom.2020.06.129
Type
Article
Funding
This work was supported by the Tianjin Science and Technology Program under grant 19PTZWHZ00020 and the National Natural Science Foundation of China under grant 61902281.
Related Researcher

Chung, Tae-Sun (정태선)
Department of Software and Computer Engineering