CSWin Transformer-CNN Encoder and Multi-Head Self-Attention Based CNN Decoder for Robust Medical Segmentation

Authors

  • J. Pandu, IQAC Department of ECE, Sreyas Institute of Engineering and Technology, Nagole, Bandlaguna, Hyderabad, Telangana, India
  • G. Ravi Shankar Reddy, Department of Electronics and Communication Engineering, CVR College of Engineering, Hyderabad, Telangana, India
  • CH. Ashok Babu, Department of Audiology, Helen Kellers Institute of Research & Rehabilitation for the Disabled Children, Hyderabad 500056, India

Keywords:

Medical image segmentation, CSWin Transformer, CNN, Multi-Head Self-Attention, dilated-Uper decoder

Abstract

Convolutional Neural Networks (CNNs) have proven highly effective in medical image segmentation because they capture fine local details such as edges and textures. However, their limited receptive field often impedes a comprehensive representation of global information. Transformers, on the other hand, have shown promise in modeling long-range dependencies, yet they occasionally struggle to capture high-level spatial features effectively. An ideal segmentation model should harness both local and global features to achieve precision and semantic accuracy. This article introduces a novel Cross-Shaped Window (CSWin) Transformer framework built on a U-shaped network architecture. The network combines a CNN encoder with a Multi-Head Self-Attention based CNN decoder. Within the CNN encoder, a transformer path with a shifted window mechanism is integrated to enhance the representation of both local and global information, supporting robust medical image segmentation. The encoder's skip connections are reinstated through the Multi-Head Self-Attention decoder. To decode a wide range of features and manage distortions in local details, a dilated-Uper decoder is introduced. The proposed method is evaluated on the Synapse dataset, where it surpasses existing approaches with an accuracy of approximately 93%.
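
As a rough illustration of how a cross-shaped window can mix local and global context, the sketch below lists the positions that a single query location could attend to when attention is restricted to one horizontal stripe and one vertical stripe of a feature map. It follows the general cross-shaped window idea only; the function names, the `stripe_width` parameter, and the 8×8 grid are assumptions made for this example and are not taken from the article's implementation.

```python
def stripe_index(pos, stripe_width):
    """Index of the stripe (band) that coordinate `pos` falls into."""
    return pos // stripe_width


def cross_shaped_window(row, col, height, width, stripe_width):
    """Return the set of (r, c) positions visible to a query at (row, col).

    Hypothetical helper for illustration only:
    - horizontal stripe: all columns, rows in the query's row band
    - vertical stripe:   all rows, columns in the query's column band
    The union of the two stripes forms the cross-shaped region.
    """
    row_band = stripe_index(row, stripe_width)
    col_band = stripe_index(col, stripe_width)

    horizontal = {
        (r, c)
        for r in range(height)
        for c in range(width)
        if stripe_index(r, stripe_width) == row_band
    }
    vertical = {
        (r, c)
        for r in range(height)
        for c in range(width)
        if stripe_index(c, stripe_width) == col_band
    }
    return horizontal | vertical


if __name__ == "__main__":
    # Assumed toy setting: 8x8 feature map, stripes 2 pixels wide.
    region = cross_shaped_window(row=3, col=5, height=8, width=8, stripe_width=2)
    for r in range(8):
        print("".join("#" if (r, c) in region else "." for c in range(8)))
```

In this toy setting the query sees a full-width band of rows and a full-height band of columns, so it reaches distant positions along both axes while attending to far fewer locations than full global attention over the 64-position grid.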

Published

21-06-2024

Section

Articles

How to Cite

J. Pandu, G. Ravi Shankar Reddy, & CH. Ashok Babu. (2024). CSWin Transformer-CNN Encoder and Multi-Head Self-Attention Based CNN Decoder for Robust Medical Segmentation. Journal of Soft Computing and Data Mining, 5(1), 57-69. https://publisher.uthm.edu.my/ojs/index.php/jscdm/article/view/17425