In recent years, dynamic scene understanding has gained attention from researchers because of its widespread applications. The key to successfully understanding dynamic scenes lies in jointly representing appearance and motion features to obtain an informative description. Numerous methods have been introduced to solve the dynamic scene recognition problem; nevertheless, several concerns still need to be investigated. In this article, we introduce a novel multi-modal network for dynamic scene understanding from video data that effectively captures both spatial appearance and temporal dynamics. Furthermore, two-level joint tuning layers are proposed to integrate global and local spatial features as well as spatial- and temporal-stream deep features. To extract temporal information, we present a novel dynamic descriptor, the Volume Symmetric Gradient Local Graph Structure (VSGLGS), which generates temporal feature maps similar to optical flow maps while avoiding the drawbacks of optical flow. Additionally, a handcrafted spatiotemporal feature descriptor based on the Volume Local Directional Transition Pattern (VLDTP) is introduced, which extracts directional information by exploiting edge responses. Lastly, a stacked Bidirectional Long Short-Term Memory (Bi-LSTM) network, together with a temporal mixed pooling scheme, is designed to capture dynamic information without noise interference. Extensive experimental investigation demonstrates that the proposed multi-modal network outperforms most state-of-the-art approaches to dynamic scene understanding.
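To make the final stage of the pipeline concrete, the following is a minimal PyTorch sketch of one plausible form of a stacked Bi-LSTM over per-frame features with a mixed temporal pooling that blends mean and max pooling. The feature dimension, hidden size, number of layers, class count, and the blend weight `alpha` are all illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch only (not the authors' released code): a stacked Bi-LSTM
# consumes a sequence of per-frame deep features; its outputs are pooled over
# time by mixing average pooling (robust to noise) with max pooling (salient
# dynamics). All sizes and the mixing weight `alpha` are assumed values.
import torch
import torch.nn as nn

class StackedBiLSTMMixedPool(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, num_layers=2,
                 num_classes=14, alpha=0.5):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden_dim, num_layers=num_layers,
                              batch_first=True, bidirectional=True)
        self.alpha = alpha  # mixing weight between mean and max pooling
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim) sequence of per-frame features
        h, _ = self.bilstm(x)        # (batch, time, 2 * hidden_dim)
        mean_pool = h.mean(dim=1)    # average over the temporal axis
        max_pool, _ = h.max(dim=1)   # max over the temporal axis
        pooled = self.alpha * mean_pool + (1 - self.alpha) * max_pool
        return self.fc(pooled)

# Example: 8 clips, 16 frames each, 2048-D per-frame features
logits = StackedBiLSTMMixedPool()(torch.randn(8, 16, 2048))
```

The mixed pooling is one simple way to temper frame-level noise (via averaging) without discarding strong transient responses (via the max term); the paper's actual scheme may weight or structure the pooling differently.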
This research was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2016-0-00406, SIAT CCTV Cloud Platform), by the National Research Foundation of Korea grant funded by the Korea government (MSIT) (NRF-2019R1A2C1006608), and by the BK21 FOUR program of the Ministry of Education (NRF5199991014091). Authors' addresses: Md. A. Uddin, Department of Artificial Intelligence, Ajou University, 206, World cup-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, 16499, Republic of Korea; email: azher006@yahoo.com; J. B. Joolee and Y.-K. Lee (corresponding author), Department of Computer Science and Engineering, Kyung Hee University, 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Republic of Korea; emails: julekhajulie@gmail.com, yklee@khu.ac.kr; K.-A. Sohn (corresponding author), Department of Software and Computer Engineering, and Department of Artificial Intelligence, Ajou University, 206, World cup-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, 16499, Republic of Korea; email: kasohn@ajou.ac.kr. © 2022 Association for Computing Machinery. https://doi.org/10.1145/3462218