A hybrid multi-modal visual data cross fusion network for indoor and outdoor segmentation

Multi-modal scene parsing is a prevalent topic in robotics and autonomous driving, since the knowledge from different modalities can complement each other. Recently, the success of self-attention-based methods has demonstrated the effectiveness of capturing long-range dependencies. However, the high computational cost of self-attention severely limits its application to multi-modal fusion. To alleviate this problem, this paper designs a multi-modal cross-fusion block (AC) and its elegant variant (EAC), both based on an additive attention mechanism, to capture global awareness among different modalities efficiently. Moreover, a simple yet efficient transformer-based trans-context block (TC) is presented to connect the contextual information. Based on these components, we propose a lightweight HCFNet, which can explore long-range dependencies of multi-modal information while keeping local details. Finally, we conduct comprehensive experiments and analyses on both indoor (NYUv2-13, -40) and outdoor (Cityscapes-11) datasets. Experimental results show that the proposed HCFNet achieves 66.9% and 51.5% mIoU on the NYUv2-13 and -40 class settings, outperforming current state-of-the-art multi-modal methods. Our model also shows a competitive mIoU of 80.6% on the Cityscapes-11 dataset. The code will be available at https://github.com/Superjie13/HCFNet.
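The results above are reported as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how this metric is commonly computed from predicted and ground-truth label maps. It is not taken from the HCFNet code base: the function name `mean_iou`, the flattened-list input format, and the `ignore_index` parameter are assumptions made for illustration, and the paper's evaluation protocol may differ in details such as which labels are ignored.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that occur in the prediction or ground truth.

    pred and target are flat sequences of integer class labels, one per pixel
    (a hypothetical input format chosen to keep the sketch dependency-free).
    """
    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabeled pixels
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        union = tp + fp + fn
        if union > 0:  # classes absent from both maps do not enter the mean
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with six pixels and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported figure such as 66.9% corresponds to this quantity accumulated over a whole test set rather than a single image.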

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

ICPR2022 (1).pdf (2.73 MB)

Origin: Files produced by the author(s)

Contributor: Désiré Sidibé

https://univ-evry.hal.science/hal-03719440

Submitted on: Monday, July 11, 2022, 11:22:52 AM

Last modification on: Wednesday, February 21, 2024, 1:47:15 PM

Long-term archiving on: Wednesday, October 12, 2022, 7:46:25 PM

Dates and versions

hal-03719440, version 1 (11-07-2022)

Identifiers

HAL Id: hal-03719440, version 1

Cite

Sijie Hu, Fabien Bonardi, Samia Bouchafa, Désiré Sidibé. A hybrid multi-modal visual data cross fusion network for indoor and outdoor segmentation. 26th International Conference on Pattern Recognition (ICPR 2022), Aug 2022, Montreal, Canada. pp. 2539-2545. ⟨hal-03719440⟩

Collections

UNIV-EVRY IBISC UNIV-PARIS-SACLAY IBISC-SIAM GS-ENGINEERING GS-COMPUTER-SCIENCE GS-LIFE-SCIENCES-HEALTH GS-SPORT-HUMAN-MOVEMENT
