Models, code, and papers for "Chengwei Huang":
In this dissertation the practical speech emotion recognition technology is studied, including several cognitive related emotion types, namely fidgetiness, confidence and tiredness. The high quality of naturalistic emotional speech data is the basis of this research. The following techniques are used for inducing practical emotional speech: cognitive task, computer game, noise stimulation, sleep deprivation and movie clips. A practical speech emotion recognition system is studied based on Gaussian mixture model. A two-class classifier set is adopted for performance improvement under the small sample case. Considering the context information in continuous emotional speech, a Gaussian mixture model embedded with Markov networks is proposed. A further study is carried out for system robustness analysis. First, noise reduction algorithm based on auditory masking properties is fist introduced to the practical speech emotion recognition. Second, to deal with the complicated unknown emotion types under real situation, an emotion recognition method with rejection ability is proposed, which enhanced the system compatibility against unknown emotion samples. Third, coping with the difficulties brought by a large number of unknown speakers, an emotional feature normalization method based on speaker-sensitive feature clustering is proposed. Fourth, by adding the electrocardiogram channel, a bi-modal emotion recognition system based on speech signals and electrocardiogram signals is first introduced. The speech emotion recognition methods studied in this dissertation may be extended into the cross-language speech emotion recognition and the whispered speech emotion recognition.
In this paper we study the personalized text search problem. The keyword based search method in conventional algorithms has a low efficiency in understanding users' intention since the semantic meaning, user profile, user interests are not always considered. Firstly, we propose a novel text search algorithm using a inverse filtering mechanism that is very efficient for label based item search. Secondly, we adopt the Bayesian network to implement the user interest prediction for an improved personalized search. According to user input, it searches the related items using keyword information, predicted user interest. Thirdly, the word vectorization is used to discover potential targets according to the semantic meaning. Experimental results show that the proposed search engine has an improved efficiency and accuracy and it can operate on embedded devices with very limited computational resources.
Simulation systems have become an essential component in the development and validation of autonomous driving technologies. The prevailing state-of-the-art approach for simulation is to use game engines or high-fidelity computer graphics (CG) models to create driving scenarios. However, creating CG models and vehicle movements (e.g., the assets for simulation) remains a manual task that can be costly and time-consuming. In addition, the fidelity of CG images still lacks the richness and authenticity of real-world images and using these images for training leads to degraded performance. In this paper we present a novel approach to address these issues: Augmented Autonomous Driving Simulation (AADS). Our formulation augments real-world pictures with a simulated traffic flow to create photo-realistic simulation images and renderings. More specifically, we use LiDAR and cameras to scan street scenes. From the acquired trajectory data, we generate highly plausible traffic flows for cars and pedestrians and compose them into the background. The composite images can be re-synthesized with different viewpoints and sensor models. The resulting images are photo-realistic, fully annotated, and ready for end-to-end training and testing of autonomous driving systems from perception to planning. We explain our system design and validate our algorithms with a number of autonomous driving tasks from detection to segmentation and predictions. Compared to traditional approaches, our method offers unmatched scalability and realism. Scalability is particularly important for AD simulation and we believe the complexity and diversity of the real world cannot be realistically captured in a virtual environment. Our augmented approach combines the flexibility in a virtual environment (e.g., vehicle movements) with the richness of the real world to allow effective simulation of anywhere on earth.
Online personalized news product needs a suitable cover for the article. The news cover demands to be with high image quality, and draw readers' attention at same time, which is extraordinary challenging due to the subjectivity of the task. In this paper, we assess the news cover from image clarity and object salience perspective. We propose an end-to-end multi-task learning network for image clarity assessment and semantic segmentation simultaneously, the results of which can be guided for news cover assessment. The proposed network is based on a modified DeepLabv3+ model. The network backbone is used for multiple scale spatial features exaction, followed by two branches for image clarity assessment and semantic segmentation, respectively. The experiment results show that the proposed model is able to capture important content in images and performs better than single-task learning baselines on our proposed game content based CIA dataset.
Acoustic anomaly detection aims at distinguishing abnormal acoustic signals from the normal ones. It suffers from the class imbalance issue and the lacking in the abnormal instances. In addition, collecting all kinds of abnormal or unknown samples for training purpose is impractical and timeconsuming. In this paper, a novel Gaussian Mixture Generative Adversarial Network (GMGAN) is proposed under semi-supervised learning framework, in which the underlying structure of training data is not only captured in spectrogram reconstruction space, but also can be further restricted in the space of latent representation in a discriminant manner. Experiments show that our model has clear superiority over previous methods, and achieves the state-of-the-art results on DCASE dataset.
Anomaly detection is a fundamental problem in computer vision area with many real-world applications. Given a wide range of images belonging to the normal class, emerging from some distribution, the objective of this task is to construct the model to detect out-of-distribution images belonging to abnormal instances. Semi-supervised Generative Adversarial Networks (GAN)-based methods have been gaining popularity in anomaly detection task recently. However, the training process of GAN is still unstable and challenging. To solve these issues, a novel adversarial dual autoencoder network is proposed, in which the underlying structure of training data is not only captured in latent feature space, but also can be further restricted in the space of latent representation in a discriminant manner, leading to a more accurate detector. In addition, the auxiliary autoencoder regarded as a discriminator could obtain an more stable training process. Experiments show that our model achieves the state-of-the-art results on MNIST and CIFAR10 datasets as well as GTSRB stop signs dataset.
One-class novelty detection is the process of determining if a query example differs from the training examples (the target class). Most of previous strategies attempt to learn the real characteristics of target sample by using generative adversarial networks (GANs) methods. However, the training process of GANs remains challenging, suffering from instability issues such as mode collapse and vanishing gradients. In this paper, by adopting non-adversarial generative networks, a novel decoder-encoder framework is proposed for novelty detection task, insteading of classical encoder-decoder style. Under the non-adversarial framework, both latent space and image reconstruction space are jointly optimized, leading to a more stable training process with super fast convergence and lower training losses. During inference, inspired by cycleGAN, we design a new testing scheme to conduct image reconstruction, which is the reverse way of training sequence. Experiments show that our model has the clear superiority over cutting-edge novelty detectors and achieves the state-of-the-art results on the datasets.