Department of Computer Science and Information Systems

Permanent URI for this collection: http://localhost:4000/handle/123456789/1928

Transformers for vision: a survey on innovative methods for computer vision
(IEEE, 2025-05) Kumar, Dhruv; Chalapathi, G.S.S.
Transformers have emerged as a groundbreaking architecture in the field of computer vision, offering a compelling alternative to traditional convolutional neural networks (CNNs) by modeling long-range dependencies and global context through self-attention mechanisms. Originally developed for natural language processing, transformers have been successfully adapted to a wide range of vision tasks, leading to significant improvements in performance and generalization. This survey provides a comprehensive overview of the fundamental principles of transformer architectures, highlighting the core mechanisms that distinguish them from CNNs: self-attention, multi-head attention, and positional encoding. We delve into the theoretical adaptations required to apply transformers to visual data, including image tokenization and the integration of positional embeddings. A detailed analysis of key transformer-based vision architectures, such as ViT, DeiT, Swin Transformer, PVT, Twins, and CrossViT, is presented, alongside their practical applications in image classification, object detection, video understanding, medical imaging, and cross-modal tasks. The paper further compares the performance of vision transformers with CNNs, examining their respective strengths, limitations, and the emergence of hybrid models. Finally, we discuss current challenges in deploying ViTs, such as computational cost, data efficiency, and interpretability, and explore recent advancements and future research directions, including efficient architectures, self-supervised learning, and multimodal integration.
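The two mechanisms the abstract names, image tokenization and self-attention, can be illustrated with a minimal sketch. It is not taken from the survey: the function names `patchify` and `self_attention` are hypothetical, the learned query/key/value projections are replaced by identity maps, and positional embeddings and multiple heads are omitted for brevity.

```python
import math

def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened, non-overlapping patches."""
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption: Q = K = V = tokens (identity projections); a real model
    would apply learned linear maps before this step.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all tokens (global context).
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

# Toy 4 x 4 "image" with intensities in [0, 1].
image = [[(r * 4 + c) / 15.0 for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)          # four tokens, each of dimension 4
attended = self_attention(tokens)    # same shape as tokens
```

Because every output token is a weighted average over all input tokens, a patch in one corner of the image can draw on information from any other patch in a single step, which is the long-range modeling that the abstract contrasts with the local receptive fields of CNNs.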
A WSN and vision based smart, energy efficient, scalable, and reliable parking surveillance system with optical verification at edge for resource constrained IoT devices
(Elsevier, 2024-12) Haribabu, K.
As urbanization accelerates, the demand for efficient parking surveillance solutions has increased. However, existing solutions often face challenges related to energy consumption, scalability, and reliability. To address these challenges, this paper introduces a smart hybrid parking surveillance system that integrates wireless sensor networks (WSNs) with a vision-based solution at the edge for resource-constrained IoT devices. The solution leverages WSNs for periodic readings of parking space occupancy and introduces a low-power sleep mode in the network for energy efficiency. It also applies optical verification strategies, using computer vision models such as R-CNN and Faster R-CNN FPN on ResNet50 and MobileNetv2 backbones, to distinguish between true and false positives in the WSN data and thereby improve the accuracy of parking space occupancy detection. Computation is performed on edge servers, which increases the responsiveness of the system, reduces data transmission, and enables real-time processing of data. The proposed solution automatically switches between WSN and vision-based sensing, resulting in lower energy consumption and a longer system lifespan without compromising accuracy. Experimental results show that models trained on the MobileNetv2 backbone were at least twice as fast in both image processing and training as models trained on the ResNet backbone. On the other hand, both the Faster R-CNN FPN (input resolution: 1440) and R-CNN (input resolution: 128) models trained on the MobileNetv2 backbone have slightly lower accuracies than the same models trained on the ResNet50 backbone.
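The idea of optical verification, in which the vision stage runs only to confirm a positive WSN reading, might be sketched as follows. This is an illustrative assumption and not the paper's actual switching logic, which the abstract does not specify: the function `confirm_occupancy`, its labels, and the `detect_vehicle` callback (a stand-in for a detector such as Faster R-CNN) are all hypothetical.

```python
from typing import Callable

def confirm_occupancy(sensor_occupied: bool,
                      detect_vehicle: Callable[[], bool]) -> str:
    """Label one WSN reading, invoking the vision detector only on positives.

    Assumption: `detect_vehicle` is a hypothetical callback that captures a
    frame and returns True if a vehicle is detected in the parking space.
    """
    if not sensor_occupied:
        # The camera and model stay idle, which is where energy would be saved.
        return "vacant"
    if detect_vehicle():
        # The sensor reading is confirmed optically: a true positive.
        return "occupied"
    # The sensor fired but no vehicle is visible: a false positive.
    return "false_positive"

# Usage with stub detectors in place of a real vision model.
print(confirm_occupancy(False, lambda: True))
print(confirm_occupancy(True, lambda: True))
print(confirm_occupancy(True, lambda: False))
```

Keeping the detector behind a callback means the comparatively expensive vision step is evaluated lazily, only for readings that the low-power sensor network has already flagged.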
