ECGNet-ViT: Hybridizing GoogleNet with Vision Transformer for Accurate COVID-19 Detection from ECG Images

Authors

  • Mudassar Khalid, Department of Electrical Engineering, Chulalongkorn University
  • Charnchai Pluempitiwiriyawej, Department of Electrical Engineering, Chulalongkorn University
  • Nawaporn Wisitpongphan, Department of Digital Network and Information Security Management, Faculty of Information Technology and Digital Innovation, King Mongkut’s University of Technology North Bangkok
  • Muhammad Asim Saleem, Department of Electrical Engineering, Chulalongkorn University
  • Sadia Jabbar Anwar, School of Computer Science, Northwestern Polytechnical University, Xi’an, P.R. China
  • Abdulkadhem A. Abdulkadhem, Department of Cyber Security, College of Sciences, Al-Mustaqbal University, Iraq
  • Tien Truong, School of Economics and Cognitive Science, University of California, Berkeley, USA

DOI:

https://doi.org/10.4186/ej.2025.29.10.97

Keywords:

Deep Learning, ECG Image Classification, COVID-19, GoogleNet, Swish, CNN, Vision Transformer

Abstract

COVID-19 has affected millions of people around the world over the last three years. Despite widespread vaccination efforts, infections persist and definitive treatments remain elusive. Early and accurate detection of COVID-19 is therefore critical to minimize invasive procedures and reduce mortality. Although radiographs and CT scans are commonly used to diagnose COVID-19, electrocardiogram (ECG) images remain underutilized despite their widespread availability. This limited use can be attributed to the complex transformations required to process ECG data, which increase computational demands. This study proposes a novel hybrid deep learning model, ECGNet-ViT, for COVID-19 detection. The model combines the multi-scale feature extraction capabilities of GoogleNet (GNet), enhanced with Swish activation functions and densely connected layers, with a Vision Transformer (ViT) to effectively capture long-range dependencies in classification tasks. This approach efficiently analyzes ECG data and accurately classifies samples into five categories: normal, COVID-19, myocardial infarction (MI), previous myocardial infarction (PMI), and arrhythmia (AHB). Comprehensive experiments on a publicly available ECG dataset demonstrate the effectiveness of the proposed model, which achieves 99.13% accuracy, 99.19% precision, 99.24% recall, and a 99.22% F1 score. These results highlight the potential of the proposed model to provide reliable, non-invasive support for COVID-19 diagnosis from ECG data.
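For readers who want a concrete picture of the architecture sketched in the abstract, below is a minimal, hypothetical PyTorch illustration of such a hybrid: a GoogleNet backbone with a Swish (SiLU) activation, whose feature map is converted into tokens for a small Transformer encoder and classified by a densely connected head. This is not the authors' implementation; all layer sizes, the token dimension, the encoder depth, and the placement of the Swish activation are assumptions made for illustration only.

```python
# Illustrative sketch only: GoogleNet-style CNN features + Swish (SiLU) +
# Transformer encoder + dense classification head for five ECG classes.
# Hyperparameters are assumptions, not the paper's configuration.
import torch
import torch.nn as nn
from torchvision.models import googlenet


class ECGNetViTSketch(nn.Module):
    def __init__(self, num_classes=5, embed_dim=256, depth=4, heads=8):
        super().__init__()
        backbone = googlenet(weights=None, aux_logits=False, init_weights=True)
        # Keep the convolutional stack up to inception5b; drop avgpool/dropout/fc.
        self.features = nn.Sequential(*list(backbone.children())[:-3])
        self.act = nn.SiLU()  # Swish activation (placement is illustrative)
        self.proj = nn.Conv2d(1024, embed_dim, kernel_size=1)  # 1024-ch map -> token dim
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads, dim_feedforward=4 * embed_dim,
            batch_first=True, activation="gelu")
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        # Densely connected classification head (positional embeddings omitted for brevity).
        self.head = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Linear(embed_dim, 128), nn.SiLU(),
            nn.Linear(128, num_classes))

    def forward(self, x):                                   # x: (B, 3, 224, 224) ECG images
        f = self.act(self.features(x))                      # (B, 1024, 7, 7) feature map
        tokens = self.proj(f).flatten(2).transpose(1, 2)    # (B, 49, embed_dim) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)      # prepend a class token
        z = self.encoder(torch.cat([cls, tokens], dim=1))   # model long-range dependencies
        return self.head(z[:, 0])                           # logits for the 5 classes


if __name__ == "__main__":
    model = ECGNetViTSketch()
    print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 5])
```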

Published In
Vol 29 No 10, Oct 31, 2025
How to Cite
[1]
M. Khalid, “ECGNet-ViT: Hybridizing GoogleNet with Vision Transformer for Accurate COVID-19 Detection from ECG Images”, Eng. J., vol. 29, no. 10, pp. 97-114, Oct. 2025.
