
A central multimodal fusion framework for outdoor scene image segmentation

Abstract: Robust multimodal fusion is a challenging research problem in semantic scene understanding. In real-world applications, a fusion system can overcome the drawbacks of individual sensors by exploiting the different feature representations and statistical properties of multiple modalities (e.g., RGB-depth cameras, multispectral cameras). In this paper, we propose a novel central multimodal fusion framework for semantic image segmentation of road scenes, which aims to learn joint feature representations effectively and to combine deep neural networks with statistical priors. More specifically, the proposed framework automatically generates a central branch by sequentially mapping multimodal features, both low-level and high-level, into a common space. In addition, to reduce model uncertainty, we employ statistical fusion to compute the final prediction, which leads to a significant performance improvement. We conduct extensive experiments on various outdoor scene datasets. Both qualitative and quantitative results demonstrate that our central fusion framework achieves competitive performance against existing multimodal fusion methods.
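
The abstract does not say which statistical fusion rule is used. As a rough illustration only, the sketch below assumes a simple product rule over per-pixel class probabilities produced by three branches (RGB, depth, and the central branch); the function names `fuse_pixel` and `predict_label` and the example probabilities are hypothetical and are not taken from the paper.

```python
from math import prod


def fuse_pixel(prob_vectors):
    """Fuse per-class probabilities from several branches for one pixel.

    Assumption: a product rule followed by renormalisation. This is one
    common late-fusion scheme, not necessarily the one used in the paper.
    """
    n_classes = len(prob_vectors[0])
    # Multiply the probabilities each branch assigns to the same class.
    joint = [prod(p[c] for p in prob_vectors) for c in range(n_classes)]
    total = sum(joint)
    if total == 0.0:
        # Every class was vetoed by some branch: fall back to the mean.
        n = len(prob_vectors)
        return [sum(p[c] for p in prob_vectors) / n for c in range(n_classes)]
    # Renormalise so the fused scores sum to one.
    return [j / total for j in joint]


def predict_label(prob_vectors):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_pixel(prob_vectors)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs for one pixel and three classes.
rgb = [0.6, 0.3, 0.1]
depth = [0.2, 0.7, 0.1]
central = [0.3, 0.6, 0.1]

fused = fuse_pixel([rgb, depth, central])
label = predict_label([rgb, depth, central])
```

In this toy case the RGB branch alone would pick class 0, while the fused prediction picks class 1 because the other two branches agree on it. A real system would apply the same rule to every pixel of the full probability maps.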
Document type: Journal articles
Contributor: Frédéric Davesne
Submitted on: Friday, March 5, 2021 - 9:52:16 AM
Last modification on: Sunday, March 7, 2021 - 3:23:43 AM



Yifei Zhang, Olivier Morel, Ralph Seulin, Fabrice Mériaudeau, Désiré Sidibé. A central multimodal fusion framework for outdoor scene image segmentation. Multimedia Tools and Applications, Springer Verlag, In press, ⟨10.1007/s11042-020-10357-y⟩. ⟨hal-03160284⟩


