Multimodal Alignment and Fusion for Diabetic Retinopathy Detection using Deep Learning Approach
Keywords:
Multimodal alignment, late fusion, diabetic retinopathy, fundus, OCT scan, EHR
Abstract
This study proposes a framework that combines label-driven multimodal alignment with late fusion for diabetic retinopathy (DR) detection, integrating fundus images, OCT scans, and structured electronic health records (EHR). Because patient-wise paired datasets are often unavailable, the proposed alignment strategy groups and matches modalities by diagnostic label (DR/No_DR), ensuring semantic consistency across heterogeneous sources. Each modality is modeled with an architecture tailored to its data type (CNNs for fundus and OCT images, an ANN for EHR), and the resulting predictions are combined using four late fusion strategies: Simple Average, Weighted Average, Majority Voting, and Stacked Ensemble. Label-driven alignment lets the fused model draw on coherent diagnostic cues from aligned class distributions, even without patient-level pairing. Experimental results, evaluated on accuracy, specificity, sensitivity, and F1-Score, show that while the unimodal CNN-OCT and ANN-EHR models achieved strong accuracy, the Simple Average and Weighted Average fusion methods attained the highest F1-Score (0.999), indicating a strong precision–recall balance. Confusion matrix analysis further confirms high specificity and sensitivity, underscoring the ability of label-aligned multimodal fusion to exploit complementary diagnostic strengths. These findings suggest that label-driven alignment, coupled with averaging-based late fusion, improves predictive performance and enhances robustness and clinical applicability, offering a scalable and interpretable AI-assisted DR screening solution for real-world ophthalmology practice.
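The label-driven alignment and three of the four late fusion strategies can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the random within-class matching, the 0.5 decision threshold, and the example weights are all assumptions made for illustration. The Stacked Ensemble is omitted because it requires a trained meta-learner.

```python
import numpy as np

def align_by_label(fundus_labels, oct_labels, ehr_labels, seed=0):
    """Match sample indices across modalities that share a diagnostic label.

    Hypothetical scheme: within each class (0 = No_DR, 1 = DR), indices are
    shuffled and matched one-to-one up to the size of the smallest group.
    """
    rng = np.random.default_rng(seed)
    triples = []
    for label in (0, 1):
        groups = [
            [i for i, y in enumerate(labels) if y == label]
            for labels in (fundus_labels, oct_labels, ehr_labels)
        ]
        n = min(len(g) for g in groups)
        shuffled = [rng.permutation(g)[:n] for g in groups]
        triples += [(int(a), int(b), int(c), label) for a, b, c in zip(*shuffled)]
    return triples

def simple_average(probs):
    """Mean DR probability across models; probs has shape (n_models, n_samples)."""
    return np.mean(probs, axis=0)

def weighted_average(probs, weights):
    """Weighted mean across models; weights are normalized by their sum."""
    return np.average(probs, axis=0, weights=np.asarray(weights, dtype=float))

def majority_vote(probs, threshold=0.5):
    """Predict DR when more than half of the models exceed the threshold."""
    votes = (np.asarray(probs) >= threshold).astype(int)
    return (votes.sum(axis=0) * 2 > votes.shape[0]).astype(int)

# Illustrative per-model DR probabilities for three aligned samples
# (rows: fundus CNN, OCT CNN, EHR ANN). Values are made up.
probs = np.array([
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.3],
    [0.7, 0.1, 0.4],
])
avg = simple_average(probs)
wavg = weighted_average(probs, [0.5, 0.25, 0.25])
vote = majority_vote(probs)
```

Averaging operates on probabilities and so preserves each model's confidence, whereas majority voting discards it after thresholding; that difference is one plausible reason the averaging methods can score higher on F1.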
License
Copyright (c) 2026 Journal of Soft Computing and Data Mining

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.









