Models, code, and papers for "Chengwei Huang":
In this dissertation the practical speech emotion recognition technology is studied, including several cognitive related emotion types, namely fidgetiness, confidence and tiredness. The high quality of naturalistic emotional speech data is the basis of this research. The following techniques are used for inducing practical emotional speech: cognitive task, computer game, noise stimulation, sleep deprivation and movie clips. A practical speech emotion recognition system is studied based on Gaussian mixture model. A two-class classifier set is adopted for performance improvement under the small sample case. Considering the context information in continuous emotional speech, a Gaussian mixture model embedded with Markov networks is proposed. A further study is carried out for system robustness analysis. First, noise reduction algorithm based on auditory masking properties is fist introduced to the practical speech emotion recognition. Second, to deal with the complicated unknown emotion types under real situation, an emotion recognition method with rejection ability is proposed, which enhanced the system compatibility against unknown emotion samples. Third, coping with the difficulties brought by a large number of unknown speakers, an emotional feature normalization method based on speaker-sensitive feature clustering is proposed. Fourth, by adding the electrocardiogram channel, a bi-modal emotion recognition system based on speech signals and electrocardiogram signals is first introduced. The speech emotion recognition methods studied in this dissertation may be extended into the cross-language speech emotion recognition and the whispered speech emotion recognition.
In this paper we study the personalized text search problem. The keyword based search method in conventional algorithms has a low efficiency in understanding users' intention since the semantic meaning, user profile, user interests are not always considered. Firstly, we propose a novel text search algorithm using a inverse filtering mechanism that is very efficient for label based item search. Secondly, we adopt the Bayesian network to implement the user interest prediction for an improved personalized search. According to user input, it searches the related items using keyword information, predicted user interest. Thirdly, the word vectorization is used to discover potential targets according to the semantic meaning. Experimental results show that the proposed search engine has an improved efficiency and accuracy and it can operate on embedded devices with very limited computational resources.
Simulation systems have become an essential component in the development and validation of autonomous driving technologies. The prevailing state-of-the-art approach for simulation is to use game engines or high-fidelity computer graphics (CG) models to create driving scenarios. However, creating CG models and vehicle movements (e.g., the assets for simulation) remains a manual task that can be costly and time-consuming. In addition, the fidelity of CG images still lacks the richness and authenticity of real-world images and using these images for training leads to degraded performance. In this paper we present a novel approach to address these issues: Augmented Autonomous Driving Simulation (AADS). Our formulation augments real-world pictures with a simulated traffic flow to create photo-realistic simulation images and renderings. More specifically, we use LiDAR and cameras to scan street scenes. From the acquired trajectory data, we generate highly plausible traffic flows for cars and pedestrians and compose them into the background. The composite images can be re-synthesized with different viewpoints and sensor models. The resulting images are photo-realistic, fully annotated, and ready for end-to-end training and testing of autonomous driving systems from perception to planning. We explain our system design and validate our algorithms with a number of autonomous driving tasks from detection to segmentation and predictions. Compared to traditional approaches, our method offers unmatched scalability and realism. Scalability is particularly important for AD simulation and we believe the complexity and diversity of the real world cannot be realistically captured in a virtual environment. Our augmented approach combines the flexibility in a virtual environment (e.g., vehicle movements) with the richness of the real world to allow effective simulation of anywhere on earth.
Online personalized news product needs a suitable cover for the article. The news cover demands to be with high image quality, and draw readers' attention at same time, which is extraordinary challenging due to the subjectivity of the task. In this paper, we assess the news cover from image clarity and object salience perspective. We propose an end-to-end multi-task learning network for image clarity assessment and semantic segmentation simultaneously, the results of which can be guided for news cover assessment. The proposed network is based on a modified DeepLabv3+ model. The network backbone is used for multiple scale spatial features exaction, followed by two branches for image clarity assessment and semantic segmentation, respectively. The experiment results show that the proposed model is able to capture important content in images and performs better than single-task learning baselines on our proposed game content based CIA dataset.