Anup Singh

Simultaneously Learning Robust Audio Embeddings and balanced Hash codes for Query-by-Example

Nov 20, 2022
Anup Singh, Kris Demuynck, Vipul Arora

Audio fingerprinting systems must efficiently and robustly identify query snippets in an extensive database. To this end, state-of-the-art systems use deep learning to generate compact audio fingerprints. These systems deploy indexing methods that quantize fingerprints to hash codes in an unsupervised manner to expedite the search. However, these methods generate imbalanced hash codes, which leads to suboptimal performance. Therefore, we propose a self-supervised learning framework that computes fingerprints and balanced hash codes end to end to achieve both fast and accurate retrieval. We model hash-code generation as a balanced clustering process, which we regard as an instance of the optimal transport problem. Experimental results indicate that the proposed approach improves retrieval efficiency while preserving high accuracy, particularly at high distortion levels, compared to competing methods. Moreover, our system is efficient and scalable in computational load and memory storage.

* Submitted to ICASSP2023
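
The balanced-clustering idea in the abstract above can be illustrated with a small sketch. It assumes an entropy-regularized optimal transport formulation solved by Sinkhorn-Knopp iterations, which is one common choice; the abstract does not say which solver the authors use. The function names, the `scores` matrix, and the `epsilon` and `n_iters` parameters are hypothetical and chosen only for illustration.

```python
import math


def sinkhorn_balanced_assign(scores, n_iters=100, epsilon=0.1):
    """Soft-assign N samples to K hash buckets with (near-)equal bucket load.

    scores[i][j] is an assumed similarity between sample i and bucket j.
    Returns an N x K matrix whose rows sum to 1 and whose columns
    each sum to approximately N / K.
    """
    n, k = len(scores), len(scores[0])
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: rescale so every bucket receives total mass N / K.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] * (n / k) / col_sums[j] for j in range(k)] for i in range(n)]
        # Row step: rescale so every sample distributes total mass 1.
        q = [[v / sum(row) for v in row] for row in q]
    return q


def hard_codes(q):
    """Pick the most probable bucket for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Every sample prefers bucket 0, so a plain argmax would put all four there.
scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
assignment = sinkhorn_balanced_assign(scores)
codes = hard_codes(assignment)
```

In this toy case an argmax over the raw scores sends every sample to bucket 0, whereas the balanced assignment splits the samples evenly between the two buckets. That even split is the property the abstract refers to as balanced hash codes.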

Attention-Based Audio Embeddings for Query-by-Example

Oct 16, 2022
Anup Singh, Kris Demuynck, Vipul Arora

An ideal audio retrieval system efficiently and robustly recognizes a short query snippet from an extensive database. However, the performance of well-known audio fingerprinting systems falls short at high signal distortion levels. This paper presents an audio retrieval system that generates noise- and reverberation-robust audio fingerprints using the contrastive learning framework. Using these fingerprints, the method performs a comprehensive search to identify the query audio and precisely estimate its timestamp in the reference audio. Our framework involves training a CNN to maximize the similarity between pairs of embeddings extracted from clean audio and its corresponding distorted and time-shifted version. We employ a channel-wise spectral-temporal attention mechanism to better discriminate the audio by giving more weight to the salient spectral-temporal patches in the signal. Experimental results indicate that our system is efficient in computation and memory usage, scales to a larger database, and is more accurate than competing state-of-the-art systems, particularly at higher distortion levels.
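
The training objective described above, pulling together the embeddings of a clean segment and its distorted version, can be sketched as follows. This assumes an InfoNCE-style loss with cosine similarity and a temperature; the abstract only says "contrastive learning framework", so the exact loss, the `temperature` value, and the function names here are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def contrastive_loss(clean, distorted, temperature=0.1):
    """InfoNCE-style loss over a batch of embedding pairs.

    clean[i] and distorted[i] are assumed to come from the same audio segment.
    Each clean embedding must pick out its own distorted partner among all
    distorted embeddings in the batch.
    """
    anchors = [l2_normalize(v) for v in clean]
    candidates = [l2_normalize(v) for v in distorted]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Cosine similarity to every candidate, sharpened by the temperature.
        logits = [
            sum(a * c for a, c in zip(anchor, cand)) / temperature
            for cand in candidates
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the matching pair as the correct class.
        total += log_denominator - logits[i]
    return total / len(anchors)


clean_batch = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]      # distorted versions stay close
mismatched = [[0.1, 0.9], [0.9, 0.1]]   # partners swapped

loss_matched = contrastive_loss(clean_batch, matched)
loss_mismatched = contrastive_loss(clean_batch, mismatched)
```

The loss is low when each clean embedding is closest to its own distorted partner and high when the partners are swapped, so minimizing it pushes the network toward distortion-invariant fingerprints.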
