RESUMO
Recently, the scientific community has placed great emphasis on the recognition of human activity, especially in the area of health and care for the elderly. There are already practical applications of activity recognition and unusual conditions that use body sensors such as wrist-worn devices or neck pendants. These relatively simple devices may be prone to errors, might be uncomfortable to wear, might be forgotten or not worn, and are unable to detect more subtle conditions such as incorrect postures. Therefore, other proposed methods are based on the use of images and videos to carry out human activity recognition, even in open spaces and with multiple people. However, the resulting increase in the size and complexity involved when using image data requires the use of the most recent advanced machine learning and deep learning techniques. This paper presents an approach based on deep learning with attention to the recognition of activities from multiple frames. Feature extraction is performed by estimating the pose of the human skeleton, and classification is performed using a neural network based on Bidirectional Encoder Representation of Transformers (BERT). This algorithm was trained with the UP-Fall public dataset, generating more balanced artificial data with a Generative Adversarial Neural network (GAN), and evaluated with real data, outperforming the results of other activity recognition methods using the same dataset.
Assuntos
Algoritmos , Redes Neurais de Computação , Humanos , Idoso , Aprendizado de Máquina , Esqueleto , PosturaRESUMO
In recent years, much effort has been devoted to the development of applications capable of detecting different types of human activity. In this field, fall detection is particularly relevant, especially for the elderly. On the one hand, some applications use wearable sensors that are integrated into cell phones, necklaces or smart bracelets to detect sudden movements of the person wearing the device. The main drawback of these types of systems is that these devices must be placed on a person's body. This is a major drawback because they can be uncomfortable, in addition to the fact that these systems cannot be implemented in open spaces and with unfamiliar people. In contrast, other approaches perform activity recognition from video camera images, which have many advantages over the previous ones since the user is not required to wear the sensors. As a result, these applications can be implemented in open spaces and with unknown people. This paper presents a vision-based algorithm for activity recognition. The main contribution of this work is to use human skeleton pose estimation as a feature extraction method for activity detection in video camera images. The use of this method allows the detection of multiple people's activities in the same scene. The algorithm is also capable of classifying multi-frame activities, precisely for those that need more than one frame to be detected. The method is evaluated with the public UP-FALL dataset and compared to similar algorithms using the same dataset.