Ajou University repository

A Novel Multi-Modal Network-Based Dynamic Scene Understanding
  • Uddin, Md Azher ;
  • Joolee, Joolekha Bibi ;
  • Lee, Young Koo ;
  • Sohn, Kyung Ah
Citations
SCOPUS: 1

Publication Year
2022-01-01
Journal
ACM Transactions on Multimedia Computing, Communications and Applications
Publisher
Association for Computing Machinery
Citation
ACM Transactions on Multimedia Computing, Communications and Applications, Vol.18 No.1
Keyword
Multi-modal network; stacked Bi-LSTM network; temporal mixed pooling; volume local directional transition pattern; volume symmetric gradient local graph structure
Mesh Keyword
Dynamic scenes; Graph structures; Memory network; Multimodal network; Stacked bidirectional long short-term memory network; Symmetrics; Temporal mixed pooling; Transition patterns; Volume local directional transition pattern; Volume symmetric gradient local graph structure
All Science Classification Codes (ASJC)
Hardware and Architecture; Computer Networks and Communications
Abstract
In recent years, dynamic scene understanding has gained attention from researchers because of its widespread applications. The key to understanding dynamic scenes lies in jointly representing appearance and motion features to obtain an informative description. Numerous methods have been introduced to solve the dynamic scene recognition problem; nevertheless, a few concerns still need to be investigated. In this article, we introduce a novel multi-modal network for dynamic scene understanding from video data, which captures both spatial appearance and temporal dynamics effectively. Furthermore, two-level joint tuning layers are proposed to integrate the global and local spatial features as well as the spatial and temporal stream deep features. To extract the temporal information, we present a novel dynamic descriptor, namely, Volume Symmetric Gradient Local Graph Structure (VSGLGS), which generates temporal feature maps similar to optical flow maps while overcoming the issues of optical flow maps. Additionally, a handcrafted spatiotemporal feature descriptor based on the Volume Local Directional Transition Pattern (VLDTP) is introduced, which extracts directional information by exploiting edge responses. Lastly, a stacked Bidirectional Long Short-Term Memory (Bi-LSTM) network with a temporal mixed pooling scheme is designed to capture the dynamic information without noise interference. The extensive experimental investigation shows that the proposed multi-modal network outperforms most of the state-of-the-art approaches for dynamic scene understanding.
ISSN
1551-6865
Language
eng
URI
https://aurora.ajou.ac.kr/handle/2018.oak/32642
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85127896688&origin=inward
DOI
https://doi.org/10.1145/3462218
Journal URL
http://dl.acm.org/citation.cfm?id=J961&picked=prox&cfid=195871604&cftoken=86191829
Type
Article
Funding
This research was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No.2016-0-00406, SIAT CCTV Cloud Platform), by the National Research Foundation of Korea grant funded by the Korea government (MSIT) (NRF-2019R1A2C1006608) and by the BK21 FOUR program of the Ministry of Education (NRF5199991014091).
Authors' addresses: Md. A. Uddin, Department of Artificial Intelligence, Ajou University, 206, World cup-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, 16499, Republic of Korea; email: azher006@yahoo.com; J. B. Joolee and Y.-K. Lee (corresponding author), Department of Computer Science and Engineering, Kyung Hee University, 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Republic of Korea; emails: julekhajulie@gmail.com, yklee@khu.ac.kr; K.-A. Sohn (corresponding author), Department of Software and Computer Engineering, and Department of Artificial Intelligence, Ajou University, 206, World cup-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, 16499, Republic of Korea; email: kasohn@ajou.ac.kr.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2022 Association for Computing Machinery. 1551-6857/2022/01-ART7 $15.00 https://doi.org/10.1145/3462218

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Sohn, Kyung-Ah (손경아)
Department of Software and Computer Engineering
