This work investigates Artificial Intelligence (AI) systems that detect respiratory insufficiency (RI) by analyzing speech audios, thus treating speech as a RI biomarker. Previous works collected RI data (P1) from COVID-19 patients during the first phase of the pandemic and trained modern AI models, such as CNNs and Transformers, which achieved $96.5\%$ accuracy, showing the feasibility of RI detection via AI. Here, we collect RI patient data (P2) with several causes besides COVID-19, aiming at extending AI-based RI detection. We also collected control data from hospital patients without RI. We show that the considered models, when trained on P1, do not generalize to P2, indicating that COVID-19 RI has features that may not be found in all RI types.
The NURC Project that started in 1969 to study the cultured linguistic urban norm spoken in five Brazilian capitals, was responsible for compiling a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries in 334 hours of recordings taken in S\~ao Paulo capital. Although 47 inquiries have transcripts, there was no alignment between the audio-transcription, and 328 inquiries were not transcribed. This article presents an evaluation and error analysis of three automatic speech recognition models trained with spontaneous speech in Portuguese and one model trained with prepared speech. The evaluation allowed us to choose the best model, using WER and CER metrics, in a manually aligned sample of NURC/SP, to automatically transcribe 284 hours.