Models, code, and papers for "Sangwook Park":
Simulators that generate observations based on theoretical models can be important tools for development, prediction, and assessment of signal processing algorithms. In order to design these simulators, painstaking effort is required to construct mathematical models according to their application. Complex models are sometimes necessary to represent a variety of real phenomena. In contrast, obtaining synthetic observations from generative models developed from real observations often require much less effort. This paper proposes a generative model based on adversarial learning. Given that observations are typically signals composed of a linear combination of sinusoidal waves and random noises, sinusoidal wave generating networks are first designed based on an adversarial network. Audio waveform generation can then be performed using the proposed network. Several approaches to designing the objective function of the proposed network using adversarial learning are investigated experimentally. In addition, amphibian sound classification is performed using a convolutional neural network trained with real and synthetic sounds. Both qualitative and quantitative results show that the proposed generative model makes realistic signals and is very helpful for data augmentation and data analysis.
We propose a condition-adaptive representation learning framework for the driver drowsiness detection based on 3D-deep convolutional neural network. The proposed framework consists of four models: spatio-temporal representation learning, scene condition understanding, feature fusion, and drowsiness detection. The spatio-temporal representation learning extracts features that can describe motions and appearances in video simultaneously. The scene condition understanding classifies the scene conditions related to various conditions about the drivers and driving situations such as statuses of wearing glasses, illumination condition of driving, and motion of facial elements such as head, eye, and mouth. The feature fusion generates a condition-adaptive representation using two features extracted from above models. The detection model recognizes drivers drowsiness status using the condition-adaptive representation. The condition-adaptive representation learning framework can extract more discriminative features focusing on each scene condition than the general representation so that the drowsiness detection method can provide more accurate results for the various driving situations. The proposed framework is evaluated with the NTHU Drowsy Driver Detection video dataset. The experimental results show that our framework outperforms the existing drowsiness detection methods based on visual analysis.
Performance of learning based Automatic Speech Recognition (ASR) is susceptible to noise, especially when it is introduced in the testing data while not presented in the training data. This work focuses on a feature enhancement for noise robust end-to-end ASR system by introducing a novel variant of denoising autoencoder (DAE). The proposed method uses skip connections in both encoder and decoder sides by passing speech information of the target frame from input to the model. It also uses a new objective function in training model that uses a correlation distance measure in penalty terms by measuring dependency of the latent target features and the model (latent features and enhanced features obtained from the DAE). Performance of the proposed method was compared against a conventional model and a state of the art model under both seen and unseen noisy environments of 7 different types of background noise with different SNR levels (0, 5, 10 and 20 dB). The proposed method also is tested using linear and non-linear penalty terms as well, where, they both show an improvement on the overall average WER under noisy conditions both seen and unseen in comparison to the state-of-the-art model.