dc.description.abstract |
Distracted driving contributes to thousands of fatalities
and injuries globally. According to India’s Ministry of Road
Transport and Highways (MoRTH), distraction-related crashes
such as rear-end and off-road collisions accounted for nearly
one-fourth of all traffic incidents in 2022. The U.S. National
Highway Traffic Safety Administration (NHTSA) reported 3,275
deaths and over 324,000 injuries from distraction-related crashes
in 2023. In Europe, the European Road Safety Observatory
(ERSO) observed handheld phone use by drivers in up to 9.4%
of vehicles across member states, with self-reported texting rates
reaching 53%. Despite numerous studies and surveys on driver
distraction detection, existing literature remains fragmented,
often combining multiple sensor modalities or conflating distraction
with related driver states such as fatigue. Prior empirical efforts
also lack a unified benchmarking strategy to assess model
generalization under shifts in viewpoint or spectral input. This
paper presents a focused survey and empirical study of vision-only
distraction detection using deep learning models applied
to driver-facing camera inputs. It introduces a conceptual model
linking behavioral cues to cognitive distraction, defines the vision-based
Driver Distraction Detection (vDDD) system with alert
logic, and develops structured taxonomies of datasets, architectures,
and learning strategies. Using the 100-Driver dataset, the
empirical study evaluates 26 CNN classifiers under 64 cross-domain
configurations, systematically analyzing generalization
across modality and camera view changes. Results show that
frontal RGB-trained models generalize better than their NIR-trained
counterparts and that lightweight models trade accuracy
on rare classes for faster inference. The study
establishes the vDDD paradigm as a vision-based behavioral
modeling approach for distraction detection using driver-facing
camera data. It outlines future research directions in spectrum-aligned
augmentation, attention modeling, and lightweight visual-language
fusion, emphasizing deployment-focused strategies such
as quantization, contrastive learning, and progressive fine-tuning. |
en_US |