C. Hazirbas, L. Ma, C. Domokos, and D. Cremers, FuseNet: Incorporating depth into semantic segmentation via fusion-based CNN architecture, Asian Conference on Computer Vision (ACCV), pp. 213-228, 2016.

L. Ma, J. Stückler, C. Kerl, and D. Cremers, Multi-view deep learning for consistent semantic mapping with RGB-D cameras, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 598-605, 2017.

C. R. Qi, H. Su, K. Mo, and L. J. Guibas, PointNet: Deep learning on point sets for 3D classification and segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652-660, 2017.

C. R. Qi, L. Yi, H. Su, and L. J. Guibas, PointNet++: Deep hierarchical feature learning on point sets in a metric space, Advances in Neural Information Processing Systems, pp. 5099-5108, 2017.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440, 2015.

F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, 2015.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, pp. 834-848, 2017.

X. Qi, R. Liao, J. Jia, S. Fidler, and R. Urtasun, 3D graph neural networks for RGBD semantic segmentation, Proceedings of the IEEE International Conference on Computer Vision, pp. 5199-5208, 2017.

Y. Li, J. Zhang, Y. Cheng, K. Huang, and T. Tan, Semantics-guided multi-level RGB-D feature fusion for indoor semantic segmentation, 2017 IEEE International Conference on Image Processing (ICIP), pp. 1262-1266, 2017.

Y. Cheng, R. Cai, Z. Li, X. Zhao, and K. Huang, Locality-sensitive deconvolution networks with gated fusion for RGB-D indoor semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3029-3037, 2017.

L. Deng, M. Yang, T. Li, Y. He, and C. Wang, RFBNet: Deep multimodal networks with residual fusion blocks for RGB-D semantic segmentation, 2019.

S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, Learning rich features from RGB-D images for object detection and segmentation, European Conference on Computer Vision (ECCV), pp. 345-360, 2014.

O. Vinyals, C. Blundell, T. Lillicrap, and D. Wierstra, Matching networks for one shot learning, Advances in Neural Information Processing Systems, pp. 3630-3638, 2016.

X. Li, L. Yu, C. Fu, M. Fang, and P. Heng, Revisiting metric learning for few-shot image classification, arXiv preprint, 2019.

J. Snell, K. Swersky, and R. Zemel, Prototypical networks for few-shot learning, Advances in Neural Information Processing Systems, pp. 4077-4087, 2017.

V. Garcia and J. Bruna, Few-shot learning with graph neural networks, 2017.

A. Shaban, S. Bansal, Z. Liu, I. Essa, and B. Boots, One-shot learning for semantic segmentation, 2017.

K. Rakelly, E. Shelhamer, T. Darrell, A. Efros, and S. Levine, Conditional networks for few-shot semantic segmentation, 2018.

N. Dong and E. Xing, Few-shot semantic segmentation with prototype learning, British Machine Vision Conference (BMVC), vol. 3, 2018.

Z. Dong, R. Zhang, X. Shao, and H. Zhou, Multi-scale discriminative location-aware network for few-shot semantic segmentation, 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 2, pp. 42-47, 2019.

M. Siam, B. Oreshkin, and M. Jagersand, Adaptive masked proxies for few-shot segmentation, 2019.

X. Zhang, Y. Wei, Y. Yang, and T. Huang, SG-One: Similarity guidance network for one-shot semantic segmentation, 2018.

K. Wang, J. H. Liew, Y. Zou, D. Zhou, and J. Feng, PANet: Few-shot image semantic segmentation with prototype alignment, Proceedings of the IEEE International Conference on Computer Vision, pp. 9197-9206, 2019.

L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research, vol. 9, pp. 2579-2605, 2008.

T. Baltrušaitis, C. Ahuja, and L. Morency, Multimodal machine learning: A survey and taxonomy, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, issue 2, pp. 423-443, 2018.

D. Ramachandram and G. W. Taylor, Deep multimodal learning: A survey on recent advances and trends, IEEE Signal Processing Magazine, vol. 34, issue 6, pp. 96-108, 2017.

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler et al., The Cityscapes dataset for semantic urban scene understanding, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213-3223, 2016.

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common objects in context, European Conference on Computer Vision (ECCV), pp. 740-755, 2014.

A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury et al., PyTorch: An imperative style, high-performance deep learning library, Advances in Neural Information Processing Systems, pp. 8024-8035, 2019.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision, vol. 115, issue 3, pp. 211-252, 2015.