Koichi Miyazaki

Structured State Space Decoder for Speech Recognition and Synthesis

Oct 31, 2022
Koichi Miyazaki, Masato Murata, Tomoki Koriyama

Automatic speech recognition (ASR) systems developed in recent years have shown promising results with self-attention models (e.g., Transformer and Conformer), which are replacing conventional recurrent neural networks. Meanwhile, a structured state space model (S4) has recently been proposed, producing promising results for various long-sequence modeling tasks, including raw speech classification. Like the Transformer model, the S4 model can be trained in parallel. In this study, we applied S4 as a decoder for ASR and text-to-speech (TTS) tasks and compared it with the Transformer decoder. For the ASR task, our experimental results demonstrate that the proposed model achieves a competitive word error rate (WER) of 1.88%/4.25% on the LibriSpeech test-clean/test-other sets and a character error rate (CER) of 3.80%/2.63%/2.98% on the CSJ eval1/eval2/eval3 sets. Furthermore, the proposed model is more robust than the standard Transformer model, particularly for long-form speech, on both datasets. For the TTS task, the proposed method outperforms the Transformer baseline.

* Submitted to ICASSP 2023
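
The WER figures quoted in the abstract above are conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring pipeline used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared verbatim
    (no text normalization), which real scoring tools usually add.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A character error rate can be obtained with the same recurrence by comparing characters instead of whitespace-separated words.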

Acoustic Event Detection with Classifier Chains

Feb 17, 2022
Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi

This paper proposes acoustic event detection (AED) with classifier chains, a new classifier based on the probabilistic chain rule. The proposed AED with classifier chains consists of a gated recurrent unit and performs iterative binary detection of each event, one at a time. In each iteration, the event's activity is estimated and used to condition the next output, following the probabilistic chain rule, to form classifier chains. The proposed method can therefore handle the interdependence among events during classification, whereas conventional AED methods, which use multiple binary classifiers built from a linear layer and a sigmoid function, assume conditional independence. In experiments with a real-recording dataset, the proposed method achieves superior AED performance, with a relative improvement of 14.80% over a convolutional recurrent neural network baseline system that uses multiple binary classifiers.

* 5 pages, presented at Interspeech 2021
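
The chain-rule factorization described in the abstract above, where each event's activity is conditioned on the events already decided, can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's model: the paper uses a gated recurrent unit, whereas this sketch substitutes a plain logistic unit per event, and the names `W`, `U`, `b`, `chain_joint_prob`, and `chain_greedy_decode` are hypothetical.

```python
import itertools

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def chain_joint_prob(x, y, W, U, b):
    """p(y | x) = prod_k p(y_k | x, y_1, ..., y_{k-1}).

    Assumption: each conditional is a logistic unit, standing in for the
    recurrent unit described in the abstract.
    """
    prob = 1.0
    for k in range(len(y)):
        # The logit for event k sees the input features and only the
        # activities of events 0..k-1 (empty for k = 0).
        logit = W[k] @ x + U[k, :k] @ y[:k] + b[k]
        p_active = sigmoid(logit)
        prob *= p_active if y[k] == 1 else 1.0 - p_active
    return prob


def chain_greedy_decode(x, W, U, b, threshold=0.5):
    """Decide events one by one, feeding each decision to the next step."""
    num_events = len(b)
    y = np.zeros(num_events)
    for k in range(num_events):
        logit = W[k] @ x + U[k, :k] @ y[:k] + b[k]
        y[k] = float(sigmoid(logit) >= threshold)
    return y


# Random toy parameters: 3 event classes, 4 input features.
rng = np.random.default_rng(0)
K, D = 3, 4
W = rng.normal(size=(K, D))
U = rng.normal(size=(K, K))  # only the strictly lower triangle is used
b = rng.normal(size=K)
x = rng.normal(size=D)

# Summing the joint probability over all 2^K activity patterns should give 1,
# confirming that the chain defines a proper joint distribution.
total = sum(
    chain_joint_prob(x, np.array(pattern, dtype=float), W, U, b)
    for pattern in itertools.product([0, 1], repeat=K)
)
y_hat = chain_greedy_decode(x, W, U, b)
```

Setting `U` to zero removes the dependence on earlier decisions and recovers the conditionally independent per-event sigmoid classifiers that the abstract describes as the conventional approach.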
