A Real-Time Fall Detection Framework Using Vision Transformer and LSTM for Elderly People
DOI: https://doi.org/10.55549/epstem.1358

Keywords: Vision transformer, LSTM deep learning, Temporal modeling, UR fall detection dataset, Real-time monitoring

Abstract
Falls are a major hazard for older people and people with limited mobility, and can result in life-altering injuries and hospitalizations. An efficient fall detection system can improve patient care by reducing the time to intervention. We introduce a deep learning fall detection framework that employs a Vision Transformer (ViT) network for spatial feature extraction and a Long Short-Term Memory (LSTM) network for temporal sequence modeling. The system analyzes video clips and classifies events as Fall or Not Fall with up to 99% accuracy. We developed a lightweight, modular architecture in Python, implemented with TensorFlow and PyTorch and optimized for speed and real-time processing. Evaluation was performed on the UR Fall Detection Dataset, which comprises labeled video sequences of simulated falls. The model achieved 98.45% accuracy on fall detection, demonstrating high generalization ability and a low false detection rate. In addition, a web interface for video upload was developed, enabling remote monitoring and real-time alerts and rendering the system ready for adoption in healthcare centers, assisted living facilities, and smart homes. By combining state-of-the-art techniques in vision and sequence modeling, the system offers a non-invasive alternative for long-term monitoring.
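The ViT-then-LSTM pipeline described above can be sketched at a toy scale: a stub stands in for the ViT backbone (one feature vector per frame), a minimal hand-written LSTM cell summarizes the frame sequence, and a sigmoid head produces a Fall / Not Fall decision. All dimensions, weights, and the 0.5 decision threshold below are illustrative assumptions, not the paper's actual configuration.

```python
import math
import random

# Toy sizes for illustration only; a real ViT embedding is typically 768-dim.
SEQ_LEN, FEAT_DIM, HIDDEN = 16, 8, 4
random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def vit_features_stub(num_frames):
    # Stand-in for the ViT backbone: one spatial feature vector per frame.
    return [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(num_frames)]

def rand_mat(rows, cols, s):
    return [[random.uniform(-s, s) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class TinyLSTM:
    """Minimal single-layer LSTM cell (input, forget, cell, output gates)."""
    def __init__(self, in_dim, hidden):
        s = 1.0 / math.sqrt(hidden)
        self.W = rand_mat(4 * hidden, in_dim + hidden, s)
        self.b = [0.0] * (4 * hidden)
        self.n = hidden

    def forward(self, seq):
        h = [0.0] * self.n
        c = [0.0] * self.n
        for x in seq:
            z = [zi + bi for zi, bi in zip(matvec(self.W, x + h), self.b)]
            i, f, g, o = (z[k * self.n:(k + 1) * self.n] for k in range(4))
            c = [sigmoid(fj) * cj + sigmoid(ij) * math.tanh(gj)
                 for ij, fj, gj, cj in zip(i, f, g, c)]
            h = [sigmoid(oj) * math.tanh(cj) for oj, cj in zip(o, c)]
        return h  # last hidden state summarizes the whole clip

lstm = TinyLSTM(FEAT_DIM, HIDDEN)
head = [random.gauss(0, 1) / math.sqrt(HIDDEN) for _ in range(HIDDEN)]

feats = vit_features_stub(SEQ_LEN)       # per-frame spatial features
p_fall = sigmoid(sum(w * h for w, h in zip(head, lstm.forward(feats))))
label = "Fall" if p_fall >= 0.5 else "Not Fall"
print(label, round(p_fall, 3))
```

In the actual system this structure would be built with trained TensorFlow/PyTorch modules; the sketch only shows how spatial features per frame flow into a recurrent summary before the binary decision.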
License
Copyright (c) 2025 The Eurasia Proceedings of Science, Technology, Engineering and Mathematics

This work is licensed under a Creative Commons Attribution 4.0 International License.


