Yusuke Ijima

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis

Feb 11, 2024
Kenichi Fujita, Atsushi Ando, Yusuke Ijima

What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis

Jan 31, 2024
Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters

Jan 10, 2024
Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima

StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models

Nov 28, 2023
Kazuki Yamauchi, Yusuke Ijima, Yuki Saito

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

Jun 14, 2023
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma

Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model

Apr 24, 2023
Kenichi Fujita, Takanori Ashihara, Hiroki Kanagawa, Takafumi Moriya, Yusuke Ijima

SIMD-size aware weight regularization for fast neural vocoding on CPU

Nov 02, 2022
Hiroki Kanagawa, Yusuke Ijima

Model architectures to extrapolate emotional expressions in DNN-based text-to-speech

Feb 20, 2021
Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima

V2S attack: building DNN-based voice conversion from automatic speaker verification

Aug 05, 2019
Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari
