Journal article published in Image and Vision Computing, 2021

Deep multimodal fusion for semantic image segmentation: A survey

Abstract

Recent advances in deep learning have shown excellent performance in various scene understanding tasks. However, in complex environments or under challenging conditions, it is necessary to employ multiple modalities that provide complementary information on the same scene. A variety of studies have demonstrated that deep multimodal fusion for semantic image segmentation achieves significant performance improvements. These fusion approaches exploit the benefits of multiple information sources and automatically generate an optimal joint prediction. This paper describes the essential background concepts of deep multimodal fusion and the relevant applications in computer vision. In particular, we provide a systematic survey of multimodal fusion methodologies, multimodal segmentation datasets, and quantitative evaluations on the benchmark datasets. Existing fusion methods are summarized according to a common taxonomy: early fusion, late fusion, and hybrid fusion. Based on their performance, we analyze the strengths and weaknesses of different fusion strategies. Current challenges and design choices are discussed, aiming to provide the reader with a comprehensive and heuristic view of deep multimodal image segmentation.

Dates and versions

hal-02963619, version 1 (10-10-2020)

Cite

Yifei Zhang, Désiré Sidibé, Olivier Morel, Fabrice Mériaudeau. Deep multimodal fusion for semantic image segmentation: A survey. Image and Vision Computing, 2021, 105, pp.104042. ⟨10.1016/j.imavis.2020.104042⟩. ⟨hal-02963619⟩
