Nam Soo Kim

HILCodec: High Fidelity and Lightweight Neural Audio Codec

May 08, 2024

Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

Jan 03, 2024

Efficient Parallel Audio Generation using Group Masked Language Modeling

Jan 02, 2024

EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

Dec 11, 2023

Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction

Nov 08, 2023

EM-Network: Oracle Guided Self-distillation for Sequence Learning

Jun 14, 2023

MCR-Data2vec 2.0: Improving Self-supervised Speech Pre-training via Model-level Consistency Regularization

Jun 14, 2023

Towards single integrated spoofing-aware speaker verification embeddings

Jun 01, 2023

When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus

Apr 01, 2023

SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech

Nov 30, 2022