Journal articles

Deep multimodal fusion for semantic image segmentation: A survey

Yifei Zhang 1 Désiré Sidibé 2 Olivier Morel 1 Fabrice Mériaudeau 1
1 VIBOT - Equipe VIBOT - VIsion pour la roBOTique [ImViA EA7535 - ERL CNRS 6000]
CNRS - Centre National de la Recherche Scientifique : ERL 6000, ImViA - Imagerie et Vision Artificielle [Dijon]
Abstract : Recent advances in deep learning have shown excellent performance in various scene understanding tasks. However, in some complex environments or under challenging conditions, it is necessary to employ multiple modalities that provide complementary information on the same scene. A variety of studies have demonstrated that deep multimodal fusion for semantic image segmentation achieves significant performance improvements. These fusion approaches exploit multiple information sources and automatically generate an optimal joint prediction. This paper describes the essential background concepts of deep multimodal fusion and the relevant applications in computer vision. In particular, we provide a systematic survey of multimodal fusion methodologies, multimodal segmentation datasets, and quantitative evaluations on the benchmark datasets. Existing fusion methods are summarized according to a common taxonomy: early fusion, late fusion, and hybrid fusion. Based on their performance, we analyze the strengths and weaknesses of different fusion strategies. Current challenges and design choices are discussed, aiming to provide the reader with a comprehensive and heuristic view of deep multimodal image segmentation.
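As a rough illustration of the first two taxonomy terms named in the abstract, the sketch below contrasts early and late fusion for a single pixel. It assumes the common reading of these terms (early fusion combines modality features before one shared classifier; late fusion combines the predictions of per-modality classifiers) and is not taken from the paper. The linear scorer stands in for a deep network, and all names, weights, and feature values are hypothetical.

```python
from typing import List

Vector = List[float]


def linear_scores(features: Vector, weights: List[Vector]) -> Vector:
    """Return one score per class: dot product of features with each weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def argmax(scores: Vector) -> int:
    """Index of the highest score, i.e. the predicted class label."""
    return max(range(len(scores)), key=scores.__getitem__)


def early_fusion(rgb: Vector, depth: Vector, joint_weights: List[Vector]) -> int:
    """Concatenate the modality features first, then classify once."""
    return argmax(linear_scores(rgb + depth, joint_weights))


def late_fusion(rgb: Vector, depth: Vector,
                rgb_weights: List[Vector], depth_weights: List[Vector]) -> int:
    """Classify each modality separately, then average the class scores."""
    s_rgb = linear_scores(rgb, rgb_weights)
    s_depth = linear_scores(depth, depth_weights)
    return argmax([(a + b) / 2 for a, b in zip(s_rgb, s_depth)])


# Hypothetical toy pixel: three colour features and one depth feature.
rgb_pixel = [0.9, 0.1, 0.1]
depth_pixel = [0.2]

# Hypothetical weights for two classes (one row per class).
rgb_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
depth_w = [[0.5], [1.0]]
joint_w = [r + d for r, d in zip(rgb_w, depth_w)]  # weights over concatenated features

early_label = early_fusion(rgb_pixel, depth_pixel, joint_w)
late_label = late_fusion(rgb_pixel, depth_pixel, rgb_w, depth_w)
```

Under the same assumed reading, a hybrid scheme would mix both ideas, for example by sharing some layers across modalities while still merging separate per-modality outputs.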
Cited literature [138 references]

https://hal-univ-evry.archives-ouvertes.fr/hal-02963619
Contributor : Désiré Sidibé
Submitted on : Saturday, October 10, 2020 - 10:47:43 PM
Last modification on : Wednesday, October 14, 2020 - 4:22:11 AM

Citation

Yifei Zhang, Désiré Sidibé, Olivier Morel, Fabrice Mériaudeau. Deep multimodal fusion for semantic image segmentation: A survey. Image and Vision Computing, Elsevier, In press, ⟨10.1016/j.imavis.2020.104042⟩. ⟨hal-02963619⟩
