ABSTRACT
Extraction of a target sound source amidst multiple interfering sound sources is difficult when there are fewer sensors than sources, as is the case for human listeners in the classic cocktail-party situation. This study compares the signal extraction performance of five algorithms using recordings of speech sources made with three different two-microphone arrays in three rooms of varying reverberation time. Test signals, consisting of two to five speech sources, were constructed for each room and array. The signals were processed with each algorithm, and the signal extraction performance was quantified by calculating the signal-to-noise ratio of the output. A frequency-domain minimum-variance distortionless-response beamformer outperformed the time-domain Frost beamformer and generalized sidelobe canceler for all tests with two or more interfering sound sources, and performed comparably to or better than the time-domain algorithms for tests with one interfering sound source. The frequency-domain minimum-variance algorithm offered performance comparable to that of the Peissig-Kollmeier binaural frequency-domain algorithm, but with much less distortion of the target signal. Comparisons were also made to a simple beamformer. In addition, computer simulations illustrate that, when processing speech signals, the chosen implementation of the frequency-domain minimum-variance technique adapts more quickly and accurately than time-domain techniques.
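As an illustration of the minimum-variance distortionless-response idea named above, the following is a minimal sketch of the textbook weight formula w = R^{-1} d / (d^H R^{-1} d) for a single frequency bin of a two-microphone array. It is not the implementation evaluated in the study: the function name `mvdr_weights`, the diagonal loading term, the steering vectors, and the synthetic interferer are all assumptions made for the example.

```python
import numpy as np

def mvdr_weights(R, d, loading=1e-6):
    """Textbook MVDR weights for one frequency bin.

    R: (M, M) spatial covariance matrix of the microphone signals.
    d: (M,) steering vector toward the target.
    loading: relative diagonal loading (an assumption, added for numerical stability).
    """
    M = R.shape[0]
    R_loaded = R + loading * np.trace(R).real / M * np.eye(M)
    Rinv_d = np.linalg.solve(R_loaded, d)   # R^{-1} d without forming the inverse
    return Rinv_d / (d.conj() @ Rinv_d)     # normalize so that w^H d = 1

# Hypothetical two-microphone scene: target at broadside, one interferer off-axis.
rng = np.random.default_rng(0)
d = np.array([1.0 + 0j, 1.0 + 0j])            # target arrives in phase at both mics
v = np.array([1.0 + 0j, np.exp(-1j * 0.8)])   # interferer with an inter-mic phase lag

n = 200  # number of snapshots used to estimate the covariance
s_int = rng.standard_normal(n) + 1j * rng.standard_normal(n)
noise = 0.1 * (rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n)))
X = np.outer(v, s_int) + noise                # interference plus sensor noise
R = X @ X.conj().T / n

w = mvdr_weights(R, d)
target_gain = w.conj() @ d                    # distortionless constraint: equals 1
interferer_gain = abs(w.conj() @ v)           # should be strongly attenuated
print(abs(target_gain), interferer_gain)
```

The normalization by d^H R^{-1} d is what enforces unit gain toward the target while the output power from other directions is minimized; with only two microphones, a single spatial null is available, which is consistent with the difficulty the abstract notes when sources outnumber sensors.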
Subject(s)
Attention, Dichotic Listening Tests, Hearing Aids, Perceptual Masking, Social Environment, Speech Acoustics, Speech Perception, Algorithms, Fourier Analysis, Hearing Aids/statistics & numerical data, Humans, Loudness Perception, Mathematical Computing, Pitch Discrimination, Prosthesis Design, Sound Spectrography, Speech Discrimination Tests
ABSTRACT
Although Central Institute for the Deaf (CID) W-1 stimuli are routinely used for speech recognition threshold (SRT) testing, they are not always familiar to new learners of English and often lead to erroneous assessments. To improve test accuracy, alternative stimuli were constructed by pairing familiar English digits. These digit pairs were used to measure SRT for 12 non-native speakers of English and 12 native speakers of English. Results indicate that digit pairs effectively measure SRT for both participant groups and, more importantly, that for non-native speakers of English, digit pairs are more accurate than CID W-1 words in measuring the hearing threshold for speech. Digit pairs have cross-linguistic appeal and should greatly facilitate accurate SRT testing for listeners with minimal exposure to English.