Abstract:
Visual data collected from drones has opened a new direction for surveillance applications and has recently attracted considerable attention among computer vision researchers. Due to the availability and increasing use of the drone for both public and private sectors, it is a critical futuristic technology to solve multiple surveillance problems in remote areas. One of the fundamental challenges in recognizing crowd monitoring videos’ human action is the precise modeling of an individual’s motion feature. Most state-of-the-art methods heavily rely on optical flow for motion modeling and representation, and motion modeling through optical flow is a time-consuming process. This article underlines this issue and provides a novel architecture that eliminates the dependency on optical flow. The proposed architecture uses two sub-modules, FMFM (faster motion feature modeling) and AAR (accurate action recognition), to accurately classify the aerial surveillance action. Another critical issue in aerial surveillance is a deficiency of the dataset. Out of few datasets proposed recently, most of them have multiple humans performing different actions in the same scene, such as a crowd monitoring video, and hence not suitable for directly applying to the training of action recognition models. Given this, we have proposed a novel dataset captured from top view aerial surveillance that has a good variety in terms of actors, daytime, and environment. The proposed architecture has shown the capability to be applied in different terrain as it removes the background before using the action recognition model. The proposed architecture is validated through the experiment with varying investigation levels and achieves a remarkable performance of 0.90 validation accuracy in aerial action recognition.