1.
Sensors (Basel) ; 24(7)2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38610532

ABSTRACT

In emergency situations, every second counts for an ambulance navigating through traffic. Efficient use of traffic light systems can play a crucial role in minimizing response time. This paper introduces a novel automated Real-Time Ambulance in an Emergency Detector (RTAIAED). The proposed system uses special Lookout Stations (LSs), suitably positioned at a certain distance from each involved traffic light (TL), to obtain timely and safe transitions to green lights as the Ambulance in an Emergency (AIAE) approaches. The foundation of the proposed system is the simultaneous processing of video and audio data. The video analysis is inspired by the Part-Based Model theory and integrates tailored video detectors that leverage a custom YOLOv8 model for enhanced precision. Concurrently, the audio analysis component employs a neural network designed to analyze Mel Frequency Cepstral Coefficients (MFCCs), providing an accurate classification of auditory information. This dual-faceted approach facilitates a cohesive and synergistic analysis of sensory inputs. It incorporates a logic-based component to integrate and interpret the detections from each sensory channel, thereby ensuring the precise identification of an AIAE as it approaches a traffic light. Extensive experiments confirm the robustness of the approach and its reliability in real-world scenarios thanks to its real-time predictions (reaching 11.8 fps on a Jetson Nano with a response time of up to 0.25 s), showcasing the ability to detect AIAEs even in challenging conditions such as noisy environments, nighttime, or adverse weather, provided a suitable-quality camera is appropriately positioned. The RTAIAED is particularly effective on one-way roads, addressing the challenge of regulating the sequence of traffic light signals so as to ensure a green signal for the AIAE when it arrives at the TL, despite the "double red" periods in which the one-way stretch is cleared of vehicles coming from one direction before allowing those coming from the other side. It is also suitable for managing temporary situations, such as roadworks.
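The abstract describes fusing a video detector and an MFCC-based siren classifier through a logic component. The following is a minimal sketch of what such a fusion rule could look like; the class name, thresholds, and confirmation window are assumptions, not the authors' implementation.

# Hypothetical audio-visual fusion logic: require both cues within a short window.
from collections import deque
import time

class EmergencyFusion:
    def __init__(self, window_s=2.0, video_thr=0.6, audio_thr=0.6):
        self.window_s = window_s      # how long the two cues must co-occur (assumed)
        self.video_thr = video_thr    # min confidence from the YOLO-style detector (assumed)
        self.audio_thr = audio_thr    # min confidence from the MFCC siren classifier (assumed)
        self.events = deque()         # (timestamp, modality) pairs

    def update(self, t, video_conf, audio_conf):
        """Return True when both modalities fired within the window."""
        if video_conf >= self.video_thr:
            self.events.append((t, "video"))
        if audio_conf >= self.audio_thr:
            self.events.append((t, "audio"))
        # drop stale detections outside the window
        while self.events and t - self.events[0][0] > self.window_s:
            self.events.popleft()
        modalities = {m for _, m in self.events}
        return {"video", "audio"} <= modalities

fusion = EmergencyFusion()
print(fusion.update(time.time(), video_conf=0.8, audio_conf=0.7))  # True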

2.
J Voice ; 2023 Oct 25.
Article in English | MEDLINE | ID: mdl-37891129

ABSTRACT

The incidence rate of voice diseases is increasing year by year. The use of software for remote diagnosis is a technical development trend and has important practical value. Among voice diseases, common diseases that cause hoarseness include spasmodic dysphonia, vocal cord paralysis, vocal nodules, and vocal cord polyps. This paper presents a voice disease detection method that can be applied in a wide range of clinical settings. We cooperated with Xiangya Hospital of Central South University to collect voice samples from 352 different patients. Mel Frequency Cepstrum Coefficient (MFCC) parameters are extracted as input features to describe the voice numerically. An innovative model combining MFCC parameters with a single-convolution-layer CNN is proposed for fast calculation and classification. The highest accuracy we achieved was 92%, surpassing previously reported results. We also used the Advanced Voice Function Assessment Database (AVFAD) to evaluate the generalization ability of the proposed method, which achieved an accuracy of 98%. Experiments on clinical and standard datasets show that, for the pathological detection of voice diseases, our method offers substantial improvements in accuracy and computational efficiency.
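A minimal sketch of the MFCC-plus-single-convolution-layer idea follows, assuming 13 MFCCs over fixed-length clips and five output classes (healthy plus four disorders); all hyperparameters and the feature pipeline details are placeholders, not the published architecture.

# Hypothetical MFCC extraction + one-conv-layer CNN classifier.
import librosa
import torch
import torch.nn as nn

def mfcc_features(path, n_mfcc=13, frames=128):
    y, sr = librosa.load(path, sr=16000)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)       # (n_mfcc, T)
    m = librosa.util.fix_length(m, size=frames, axis=1)        # pad/trim the time axis
    return torch.tensor(m, dtype=torch.float32).unsqueeze(0)   # (1, n_mfcc, frames)

class SingleConvCNN(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # the single conv layer
        self.pool = nn.AdaptiveAvgPool2d((4, 4))
        self.fc = nn.Linear(16 * 4 * 4, n_classes)

    def forward(self, x):                    # x: (batch, 1, n_mfcc, frames)
        x = torch.relu(self.conv(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)

model = SingleConvCNN()
dummy = torch.randn(2, 1, 13, 128)           # stand-in for a batch of MFCC maps
print(model(dummy).shape)                     # torch.Size([2, 5])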

3.
Front Psychiatry ; 14: 1079448, 2023.
Article in English | MEDLINE | ID: mdl-37575564

ABSTRACT

Background: Vocal features have been exploited to distinguish depression from healthy controls. While there have been some claims of success, the degree to which changes in vocal features are specific to depression has not been systematically studied. Hence, we examined the performance of vocal features in differentiating depression from bipolar disorder (BD), schizophrenia, and healthy controls, as well as in pairwise classifications of the three disorders. Methods: We sampled 32 bipolar disorder patients, 106 depression patients, 114 healthy controls, and 20 schizophrenia patients. We extracted i-vectors from Mel-frequency cepstrum coefficients (MFCCs) and built logistic regression models with ridge regularization and 5-fold cross-validation on the training set, then applied the models to the test set. There were seven classification tasks: any disorder versus healthy controls; depression versus healthy controls; BD versus healthy controls; schizophrenia versus healthy controls; depression versus BD; depression versus schizophrenia; and BD versus schizophrenia. Results: The area under the curve (AUC) score for classifying depression and bipolar disorder was 0.5 (F-score = 0.44). For the other comparisons, the AUC scores ranged from 0.75 to 0.92 and the F-scores from 0.73 to 0.91. The model performance (AUC) for classifying depression and bipolar disorder was significantly worse than that for classifying bipolar disorder and schizophrenia (corrected p < 0.05), while there were no significant differences among the remaining pairwise comparisons of the seven classification tasks. Conclusion: Vocal features showed discriminatory potential in classifying depression against healthy controls, as well as against other mental disorders. Future research should systematically examine the mechanisms by which voice features distinguish depression from other mental disorders and develop more sophisticated machine learning models so that voice can better assist clinical diagnosis.
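The classification stage described here (ridge-regularized logistic regression with 5-fold cross-validation on i-vectors) can be sketched as below; the i-vector extraction itself is not shown, and the feature matrix and labels are synthetic placeholders.

# Hypothetical classification stage on precomputed i-vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(138, 400))          # stand-in i-vectors (e.g., depression vs. BD)
y = rng.integers(0, 2, size=138)         # dummy labels: 0 = bipolar disorder, 1 = depression

clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", C=1.0, max_iter=1000),  # L2 (ridge-style) regularization
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print("AUC per fold:", scores.round(3))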

4.
J Voice ; 2023 Jun 19.
Article in English | MEDLINE | ID: mdl-37344246

ABSTRACT

On the one hand, the relationship between formant frequencies and vocal tract length (VTL) has been intensively studied over the years. On the other hand, the connection between VTL and mel-frequency cepstral coefficients (MFCCs), which concisely codify the overall shape of a speaker's spectral envelope with just a few cepstral coefficients, has only been modestly analyzed and is worth further investigation. Thus, based on different statistical models, this article explores the advantages and disadvantages of the latter, relatively novel approach, in contrast to the former, which arises from more traditional studies. Additionally, VTL is usually assumed to be a static, inherent characteristic of speakers; that is, a single length parameter is frequently estimated per speaker. By contrast, in this paper we consider VTL estimation from a dynamic perspective, using modern real-time Magnetic Resonance Imaging (rtMRI) to measure VTL in parallel with audio signals. To support the experiments, data obtained from USC-TIMIT magnetic resonance videos were used, allowing for 2D real-time analysis of the articulators in motion. As a result, we observed that the performance of MFCCs is higher in speaker-dependent modeling; however, in cross-speaker modeling, which uses different speakers' data for training and evaluation, their performance is not significantly different from that obtained with formants. In addition, we note that the MFCC-based estimation is robust, has acceptable computational time complexity, and is consistent with the traditional approach.
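One simple way to set up the kind of statistical model discussed here, a mapping from frame-wise MFCC vectors to a VTL value, is sketched below. The data are synthetic placeholders, not the USC-TIMIT measurements, and the choice of ridge regression is an assumption about what a "statistical model" could be.

# Hypothetical frame-wise MFCC -> VTL regression on synthetic data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
mfcc_frames = rng.normal(size=(2000, 13))            # one 13-dim MFCC vector per frame
vtl_cm = 15 + 0.1 * (mfcc_frames @ rng.normal(size=13)) + rng.normal(scale=0.2, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(mfcc_frames, vtl_cm, test_size=0.2, random_state=1)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)              # speaker-dependent fit in this toy setup
print("MAE (cm):", round(mean_absolute_error(y_te, model.predict(X_te)), 3))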

5.
Sensors (Basel) ; 23(7)2023 Mar 28.
Article in English | MEDLINE | ID: mdl-37050599

ABSTRACT

With the worldwide carbon neutralization boom, low-speed heavy-load bearings have been widely used in the field of wind power. Bearing failure generates impulses when the rolling element passes the cracked surface of the bearing. Over the past decade, acoustic emission (AE) techniques have been used to detect failure signals. However, the high sampling rates of AE signals make it difficult to design and extract fault features; thus, deep neural network-based approaches have been proposed. In this paper, we propose an improved RepVGG bearing fault diagnosis technique. The normalized and noise-reduced bearing signals were first converted into Mel frequency cepstrum coefficients (MFCCs) and then input into the model. In addition, the exponential moving average method was used to optimize the model and improve its accuracy. Data were extracted from a test bench and a wind turbine main shaft bearing. Four damage classes were studied experimentally. The experimental results demonstrated that the improved RepVGG model can classify low-speed heavy-load bearing states using MFCCs. Furthermore, the effectiveness of the proposed model was assessed through comparisons with existing models.
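The exponential moving average (EMA) step mentioned for optimizing the model can be sketched as a shadow copy of the weights that is updated after each training step. The decay value and the tiny stand-in model below are placeholders, not the authors' RepVGG variant.

# Hypothetical EMA weight tracking for any torch model.
import copy
import torch
import torch.nn as nn

class EMAWrapper:
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()      # EMA copy used for evaluation
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # shadow <- decay * shadow + (1 - decay) * current weights
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

model = nn.Sequential(nn.Linear(13, 32), nn.ReLU(), nn.Linear(32, 4))  # 4 damage classes
ema = EMAWrapper(model)
# after each optimizer step during training:
ema.update(model)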

6.
Sensors (Basel) ; 21(16)2021 Aug 18.
Article in English | MEDLINE | ID: mdl-34450996

ABSTRACT

Intelligent systems are transforming the world, as well as our healthcare system. We propose a deep learning-based cough sound classification model that can distinguish between children with healthy coughs and those with pathological coughs such as asthma, upper respiratory tract infection (URTI), and lower respiratory tract infection (LRTI). To train a deep neural network model, we collected a new dataset of cough sounds, labelled with a clinician's diagnosis. The chosen model is a bidirectional long short-term memory network (BiLSTM) based on Mel-Frequency Cepstral Coefficient (MFCC) features. When trained to classify two classes of coughs, healthy or pathological (in general or belonging to a specific respiratory pathology), the model reaches an accuracy exceeding 84% against the label provided by the physician's diagnosis. To classify a subject's respiratory pathology, the results of multiple cough epochs per subject were combined, and the resulting prediction accuracy exceeds 91% for all three respiratory pathologies. However, when the model is trained to discriminate among four classes of coughs, overall accuracy drops: one class of pathological coughs is often misclassified as another. If one instead counts a healthy cough classified as healthy and a pathological cough classified as having some kind of pathology as correct, the overall accuracy of the four-class model is above 84%. A longitudinal study of the MFCC feature space comparing pathological and recovered coughs collected from the same subjects revealed that pathological coughs, irrespective of the underlying condition, occupy the same feature space, making them harder to differentiate using MFCC features alone.
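A minimal sketch of a bidirectional LSTM over MFCC frame sequences for binary healthy-versus-pathological classification is given below; the layer sizes and the time-pooling choice are assumptions rather than the published architecture.

# Hypothetical BiLSTM cough classifier over MFCC sequences.
import torch
import torch.nn as nn

class CoughBiLSTM(nn.Module):
    def __init__(self, n_mfcc=13, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, n_mfcc)
        out, _ = self.lstm(x)             # (batch, time, 2 * hidden)
        return self.fc(out.mean(dim=1))   # average over time, then classify

model = CoughBiLSTM()
dummy = torch.randn(4, 200, 13)           # 4 coughs, 200 MFCC frames each
print(model(dummy).shape)                 # torch.Size([4, 2])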


Subject(s)
Asthma; Cough; Asthma/diagnosis; Child; Cough/diagnosis; Humans; Longitudinal Studies; Neural Networks, Computer; Respiratory Sounds/diagnosis; Sound
7.
IEEE Open J Eng Med Biol ; 2: 304-313, 2021.
Article in English | MEDLINE | ID: mdl-35402977

ABSTRACT

Goal: Smartphones can be used to passively assess and monitor patients' speech impairments caused by ailments such as Parkinson's disease, Traumatic Brain Injury (TBI), Post-Traumatic Stress Disorder (PTSD), and neurodegenerative diseases such as Alzheimer's disease and dementia. However, passive audio recordings in natural settings often capture the speech of non-target speakers (cross-talk). Consequently, speaker separation, which identifies the target speaker's speech in audio recordings with two or more speakers' voices, is a crucial pre-processing step in such scenarios. Prior speech separation methods analyzed raw audio. However, in order to preserve speaker privacy, passively recorded smartphone audio is often reduced to derived speech features such as Mel-Frequency Cepstral Coefficients (MFCCs), on which machine learning-based speech assessment is then performed. In this paper, we propose Deep MFCC bAsed SpeaKer Separation (Deep-MASKS), a novel approach. Methods: Deep-MASKS uses an autoencoder to reconstruct the MFCC components of an individual's speech from an i-vector, x-vector, or d-vector representation of their speech learned during the enrollment period. Deep-MASKS utilizes a Deep Neural Network (DNN) for MFCC signal reconstruction, which yields a more accurate, higher-order function compared to prior work that utilized a mask. Unlike prior work that operates on utterances, Deep-MASKS operates on continuous audio recordings. Results: Deep-MASKS outperforms baselines, reducing the Mean Squared Error (MSE) of MFCC reconstruction by up to 44% and the number of additional bits required to represent clean speech entropy by 36%.
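One way such a reconstruction network could be set up is sketched below: a DNN that maps an MFCC frame of mixed speech plus a fixed speaker embedding (i/x/d-vector) to the target speaker's MFCC frame, trained with an MSE objective. Dimensions, architecture, and the dummy data are illustrative only, not the Deep-MASKS implementation.

# Hypothetical MFCC reconstruction network conditioned on a speaker embedding.
import torch
import torch.nn as nn

class MFCCSeparator(nn.Module):
    def __init__(self, n_mfcc=13, emb_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_mfcc + emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_mfcc),            # reconstructed target-speaker MFCCs
        )

    def forward(self, mixed_mfcc, speaker_emb):
        return self.net(torch.cat([mixed_mfcc, speaker_emb], dim=-1))

model = MFCCSeparator()
mixed = torch.randn(8, 13)                         # MFCC frames with cross-talk
emb = torch.randn(8, 256)                          # enrollment speaker embedding
loss = nn.functional.mse_loss(model(mixed, emb), torch.randn(8, 13))  # dummy target frames
loss.backward()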

8.
Ann Biomed Eng ; 48(4): 1322-1336, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31965359

ABSTRACT

The clinical assessment of speech abnormalities in Cerebellar Ataxia (CA) is time-consuming and inconsistent. We have developed an automated objective system to quantify CA severity and thereby facilitate remote monitoring and optimisation of therapeutic interventions. A quantitative acoustic assessment could prove to be a viable biomarker for this purpose. Our study explores the use of phase-based cepstral features extracted from the modified group delay function as a complement to the features obtained from the magnitude cepstrum. We selected a combination of 15 acoustic measurements using the RELIEF feature selection algorithm during the feature optimisation process. These features were used to segregate ataxic speakers from normal speakers (controls) and to objectively assess them based on severity. The effectiveness of our approach was experimentally evaluated through a clinical study involving 42 patients diagnosed with CA and 23 age-matched controls. A radial basis function kernel-based support vector machine (SVM) classifier achieved a classification accuracy of 84.6% in CA-control discrimination [area under the ROC curve (AUC) of 0.97] and 74% in the modified 3-level CA severity estimation (AUC of 0.90) deduced from the clinical ratings. The strong classification ability of the selected features and the SVM model supports this scheme's suitability for monitoring CA-related speech motor abnormalities.
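The final classification stage (an RBF-kernel SVM on a small set of selected acoustic features) can be sketched as follows; the feature matrix here is a synthetic placeholder for the 15 RELIEF-selected measurements, and the hyperparameters are assumptions.

# Hypothetical RBF-SVM classification on 15 selected acoustic features.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(65, 15))            # 42 ataxic + 23 control speakers, 15 features (dummy)
y = np.array([1] * 42 + [0] * 23)        # 1 = cerebellar ataxia, 0 = control

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").round(3))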


Subject(s)
Cerebellar Ataxia/physiopathology; Speech Disorders/classification; Adult; Aged; Aged, 80 and over; Female; Humans; Male; Middle Aged; Speech; Speech Disorders/physiopathology; Support Vector Machine
9.
Neural Comput Appl ; 30(8): 2581-2593, 2018.
Article in English | MEDLINE | ID: mdl-30363735

ABSTRACT

Speaker age and gender classification is one of the most challenging problems in speech signal processing. With developing technologies, identifying speaker age and gender has become a necessity for speaker verification and identification systems in applications such as identifying suspects in criminal cases, improving human-machine interaction, and adapting music for people waiting in a queue. Despite intensive studies conducted to extract descriptive and distinctive features, classification accuracies are still not satisfactory. In this work, a model for generating bottleneck features from a deep neural network and a Gaussian Mixture Model-Universal Background Model (GMM-UBM) classifier are proposed for the speaker age and gender classification problem. A deep neural network with a bottleneck layer is first trained in an unsupervised manner to calculate the initial weights between layers; it is then trained and tuned in a supervised manner to generate transformed mel-frequency cepstral coefficients (T-MFCCs). The GMM-UBM is used to build a GMM model for each class, and the models are used to classify speaker age and gender. The age-annotated database of German telephone speech (aGender) is used to evaluate the proposed classification system. The newly generated T-MFCCs show potential for significant improvements in speaker age and gender classification with the GMM-UBM classifier. The proposed system achieved an overall accuracy of 57.63%, with the highest per-class accuracy of 72.97% for adult female speakers.
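A simplified stand-in for the GMM-based scoring stage is sketched below: one GMM is fit per class on its training frames, and an utterance is assigned to the class with the highest average log-likelihood. This omits the UBM training and MAP adaptation of a true GMM-UBM system, and the data and class labels are synthetic placeholders.

# Hypothetical per-class GMM scoring on (transformed) MFCC frames.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
classes = ["adult_female", "adult_male"]                 # subset of the aGender classes (assumed)
train = {c: rng.normal(loc=i, size=(500, 13)) for i, c in enumerate(classes)}

# fit one GMM per age/gender class on its training frames
gmms = {c: GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(X)
        for c, X in train.items()}

def classify(utterance_frames):
    # average log-likelihood of the utterance under each class model
    scores = {c: g.score(utterance_frames) for c, g in gmms.items()}
    return max(scores, key=scores.get)

print(classify(rng.normal(loc=1, size=(200, 13))))        # -> "adult_male" on this toy data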

10.
Comput Biol Med ; 75: 118-29, 2016 08 01.
Article in English | MEDLINE | ID: mdl-27286184

ABSTRACT

Lung sounds convey useful information related to pulmonary pathology. In this paper, short-term spectral characteristics of lung sounds are studied to characterize them for the identification of associated diseases. Motivated by the success of cepstral features in speech signal classification, we evaluate five different cepstral features to recognize three types of lung sounds: normal, wheeze, and crackle. Subsequently, for fast and efficient classification, we propose a new feature set computed from the statistical properties of the cepstral coefficients. Experiments are conducted on a dataset of 30 subjects using an artificial neural network (ANN) as the classifier. Results show that the statistical features extracted from the mel-frequency cepstral coefficients (MFCCs) of lung sounds outperform commonly used wavelet-based features as well as standard cepstral coefficients, including MFCCs themselves. Further, we experimentally optimize the control parameters of the proposed feature extraction algorithm. Finally, we evaluate the features for noisy lung sound recognition and find that the newly investigated features are more robust than existing features, showing better recognition accuracy even at low signal-to-noise ratios (SNRs).
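A minimal sketch of the proposed feature idea, statistical summaries of a recording's MFCCs, is given below. The specific statistics (mean, standard deviation, skewness, kurtosis) and the sampling rate are assumptions about what "statistical properties" covers, not the paper's exact definition.

# Hypothetical statistical MFCC feature extraction for a lung-sound recording.
import librosa
import numpy as np
from scipy.stats import skew, kurtosis

def mfcc_statistics(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=4000)                        # lung sounds are low-bandwidth
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, frames)
    feats = np.concatenate([m.mean(axis=1), m.std(axis=1),
                            skew(m, axis=1), kurtosis(m, axis=1)])
    return feats                                               # fixed-length vector, 4 * n_mfcc

# feats = mfcc_statistics("lung_sound.wav")   # hypothetical file; feed into an ANN/MLP classifier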


Subject(s)
Algorithms; Neural Networks, Computer; Respiratory Sounds/classification; Signal Processing, Computer-Assisted; Humans; Signal-To-Noise Ratio
11.
Article in English | MEDLINE | ID: mdl-26380256

ABSTRACT

Obstructive sleep apnea (OSA) is a disorder characterized by repeated pauses in breathing during sleep, which lead to deoxygenation and voiced chokes at the end of each episode. OSA is associated with daytime sleepiness and an increased risk of serious conditions such as cardiovascular disease, diabetes, and stroke. Between 2 and 7% of the adult population globally has OSA, but it is estimated that up to 90% of those affected are undiagnosed and untreated. Diagnosis of OSA requires expensive and cumbersome screening. Audio offers a potential non-contact alternative, particularly given the ubiquity of excellent signal processing on every phone. Previous studies have focused on the classification of snoring and apneic chokes; however, such approaches require accurate identification of events, which leads to limited accuracy and small study populations. In this work, we propose an alternative approach in which multiscale entropy (MSE) coefficients are presented to a classifier to identify disorder in vocal patterns indicative of sleep apnea. A database of 858 patients was used, the largest reported in this domain. Apneic choke, snore, and noise events encoded with speech analysis features were input into a linear classifier. Coefficients of MSE derived from the first 4 h of each recording were used to train and test a random forest to classify patients as apneic or not. Standard speech analysis approaches for event classification achieved an out-of-sample accuracy (Ac) of 76.9% with a sensitivity (Se) of 29.2% and a specificity (Sp) of 88.7%, but with high variance. For OSA severity classification, MSE provided an out-of-sample Ac of 79.9%, Se of 66.0%, and Sp of 88.8%. Including demographic information improved the MSE-based classification performance to Ac of 80.5%, Se of 69.2%, and Sp of 87.9%. These results indicate that audio recordings could be used in screening for OSA, but are generally under-sensitive.
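A minimal sketch of the multiscale entropy pipeline follows: coarse-grain the signal at several scales, compute sample entropy at each scale, and feed the coefficients to a random forest. The entropy parameters (m, r), the number of scales, and the synthetic data are illustrative assumptions, not the study's settings.

# Hypothetical multiscale sample entropy + random forest classification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sample_entropy(x, m=2, r=0.2):
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    def count(mm):
        templ = np.array([x[i:i + mm] for i in range(len(x) - m)])  # same template count for m and m+1
        hits = 0
        for i in range(len(templ)):
            d = np.max(np.abs(templ - templ[i]), axis=1)            # Chebyshev distance to all templates
            hits += np.sum(d <= tol) - 1                            # drop the self-match
        return hits
    a, b = count(m + 1), count(m)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def multiscale_entropy(x, scales=5):
    # coarse-grain the signal at each scale, then compute sample entropy
    return [sample_entropy(np.mean(np.reshape(x[:len(x) // s * s], (-1, s)), axis=1))
            for s in range(1, scales + 1)]

rng = np.random.default_rng(4)
X = np.array([multiscale_entropy(rng.normal(size=1000)) for _ in range(40)])  # dummy recordings
y = rng.integers(0, 2, size=40)                     # dummy labels: 1 = apnoeic, 0 = non-apnoeic
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.score(X, y))                              # training accuracy on the toy data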
