Models, code, and papers for "Xin Wang":

Sparse Learning in reproducing kernel Hilbert space

Jan 03, 2019
Xin He, Junhui Wang

Sparse learning aims to learn the sparse structure of the true target function from the collected data, which plays a crucial role in high dimensional data analysis. This article proposes a unified and universal method for learning sparsity of M-estimators within a rich family of loss functions in a reproducing kernel Hilbert space (RKHS). The family of loss functions interested is very rich, including most commonly used ones in literature. More importantly, the proposed method is motivated by some nice properties in the induced RKHS, and is computationally efficient for large-scale data, and can be further improved through parallel computing. The asymptotic estimation and selection consistencies of the proposed method are established for a general loss function under mild conditions. It works for general loss function, admits general dependence structure, allows for efficient computation, and with theoretical guarantee. The superior performance of our proposed method is also supported by a variety of simulated examples and a real application in the human breast cancer study (GSE20194).

  Click for Model/Code and Paper
A Fast Training Algorithm for Deep Convolutional Fuzzy Systems with Application to Stock Index Prediction

Dec 07, 2018
Li-Xin Wang

A deep convolutional fuzzy system (DCFS) on a high-dimensional input space is a multi-layer connection of many low-dimensional fuzzy systems, where the input variables to the low-dimensional fuzzy systems are selected through a moving window (a convolution operator) across the input spaces of the layers. To design the DCFS based on input-output data pairs, we propose a bottom-up layer-by-layer scheme. Specifically, by viewing each of the first-layer fuzzy systems as a weak estimator of the output based only on a very small portion of the input variables, we can design these fuzzy systems using the WM Method. After the first-layer fuzzy systems are designed, we pass the data through the first layer and replace the inputs in the original data set by the corresponding outputs of the first layer to form a new data set, then we design the second-layer fuzzy systems based on this new data set in the same way as designing the first-layer fuzzy systems. Repeating this process we design the whole DCFS. Since the WM Method requires only one-pass of the data, this training algorithm for the DCFS is very fast. We apply the DCFS model with the training algorithm to predict a synthetic chaotic plus random time-series and the real Hang Seng Index of the Hong Kong stock market.

  Click for Model/Code and Paper
Gaussian-Chain Filters for Heavy-Tailed Noise with Application to Detecting Big Buyers and Big Sellers in Stock Market

May 09, 2014
Li-Xin Wang

We propose a new heavy-tailed distribution --- Gaussian-Chain (GC) distribution, which is inspirited by the hierarchical structures prevailing in social organizations. We determine the mean, variance and kurtosis of the Gaussian-Chain distribution to show its heavy-tailed property, and compute the tail distribution table to give specific numbers showing how heavy is the heavy-tails. To filter out the heavy-tailed noise, we construct two filters --- 2nd and 3rd-order GC filters --- based on the maximum likelihood principle. Simulation results show that the GC filters perform much better than the benchmark least-squares algorithm when the noise is heavy-tail distributed. Using the GC filters, we propose a trading strategy, named Ride-the-Mood, to follow the mood of the market by detecting the actions of the big buyers and the big sellers in the market based on the noisy, heavy-tailed price data. Application of the Ride-the-Mood strategy to five blue-chip Hong Kong stocks over the recent two-year period from April 2, 2012 to March 31, 2014 shows that their returns are higher than the returns of the benchmark Buy-and-Hold strategy and the Hang Seng Index Fund.

  Click for Model/Code and Paper
Recognizing License Plates in Real-Time

Jun 11, 2019
Xuewen Yang, Xin Wang

License plate detection and recognition (LPDR) is of growing importance for enabling intelligent transportation and ensuring the security and safety of the cities. However, LPDR faces a big challenge in a practical environment. The license plates can have extremely diverse sizes, fonts and colors, and the plate images are usually of poor quality caused by skewed capturing angles, uneven lighting, occlusion, and blurring. In applications such as surveillance, it often requires fast processing. To enable real-time and accurate license plate recognition, in this work, we propose a set of techniques: 1) a contour reconstruction method along with edge-detection to quickly detect the candidate plates; 2) a simple zero-one-alternation scheme to effectively remove the fake top and bottom borders around plates to facilitate more accurate segmentation of characters on plates; 3) a set of techniques to augment the training data, incorporate SIFT features into the CNN network, and exploit transfer learning to obtain the initial parameters for more effective training; and 4) a two-phase verification procedure to determine the correct plate at low cost, a statistical filtering in the plate detection stage to quickly remove unwanted candidates, and the accurate CR results after the CR process to perform further plate verification without additional processing. We implement a complete LPDR system based on our algorithms. The experimental results demonstrate that our system can accurately recognize license plate in real-time. Additionally, it works robustly under various levels of illumination and noise, and in the presence of car movement. Compared to peer schemes, our system is not only among the most accurate ones but is also the fastest, and can be easily applied to other scenarios.

* License Plate Detection and Recognition, Computer Vision, Supervised Learning 

  Click for Model/Code and Paper
Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization

Mar 19, 2019
Hesham Mostafa, Xin Wang

Deep neural networks are typically highly over-parameterized with pruning techniques able to remove a significant fraction of network parameters with little loss in accuracy. Recently, techniques based on dynamic re-allocation of non-zero parameters have emerged for training sparse networks directly without having to train a large dense model beforehand. We present a parameter re-allocation scheme that addresses the limitations of previous methods such as their high computational cost and the fixed number of parameters they allocate to each layer. We investigate the performance of these dynamic re-allocation methods in deep convolutional networks and show that our method outperforms previous static and dynamic parameterization methods, yielding the best accuracy for a given number of training parameters, and performing on par with networks obtained by iteratively pruning a trained dense model. We further investigated the mechanisms underlying the superior performance of the resulting sparse networks. We found that neither the structure, nor the initialization of the sparse networks discovered by our parameter reallocation scheme are sufficient to explain their superior generalization performance. Rather, it is the continuous exploration of different sparse network structures during training that is critical to effective learning. We show that it is more fruitful to explore these structural degrees of freedom than to add extra parameters to the network.

  Click for Model/Code and Paper
Deep Learning for Real-Time Crime Forecasting and its Ternarization

Nov 23, 2017
Bao Wang, Penghang Yin, Andrea L. Bertozzi, P. Jeffrey Brantingham, Stanley J. Osher, Jack Xin

Real-time crime forecasting is important. However, accurate prediction of when and where the next crime will happen is difficult. No known physical model provides a reasonable approximation to such a complex system. Historical crime data are sparse in both space and time and the signal of interests is weak. In this work, we first present a proper representation of crime data. We then adapt the spatial temporal residual network on the well represented data to predict the distribution of crime in Los Angeles at the scale of hours in neighborhood-sized parcels. These experiments as well as comparisons with several existing approaches to prediction demonstrate the superiority of the proposed model in terms of accuracy. Finally, we present a ternarization technique to address the resource consumption issue for its deployment in real world. This work is an extension of our short conference proceeding paper [Wang et al, Arxiv 1707.03340].

* 14 pages, 7 figures 

  Click for Model/Code and Paper
Scalable kernel-based variable selection with sparsistency

Feb 27, 2018
Xin He, Junhui Wang, Shaogao Lv

Variable selection is central to high-dimensional data analysis, and various algorithms have been developed. Ideally, a variable selection algorithm shall be flexible, scalable, and with theoretical guarantee, yet most existing algorithms cannot attain these properties at the same time. In this article, a three-step variable selection algorithm is developed, involving kernel-based estimation of the regression function and its gradient functions as well as a hard thresholding. Its key advantage is that it assumes no explicit model assumption, admits general predictor effects, allows for scalable computation, and attains desirable asymptotic sparsistency. The proposed algorithm can be adapted to any reproducing kernel Hilbert space (RKHS) with different kernel functions, and can be extended to interaction selection with slight modification. Its computational cost is only linear in the data dimension, and can be further improved through parallel computing. The sparsistency of the proposed algorithm is established for general RKHS under mild conditions, including linear and Gaussian kernels as special cases. Its effectiveness is also supported by a variety of simulated and real examples.

* 27 pages, 5 figures 

  Click for Model/Code and Paper
A multi-task learning model for malware classification with useful file access pattern from API call sequence

Oct 19, 2016
Xin Wang, Siu Ming Yiu

Based on API call sequences, semantic-aware and machine learning (ML) based malware classifiers can be built for malware detection or classification. Previous works concentrate on crafting and extracting various features from malware binaries, disassembled binaries or API calls via static or dynamic analysis and resorting to ML to build classifiers. However, they tend to involve too much feature engineering and fail to provide interpretability. We solve these two problems with the recent advances in deep learning: 1) RNN-based autoencoders (RNN-AEs) can automatically learn low-dimensional representation of a malware from its raw API call sequence. 2) Multiple decoders can be trained under different supervisions to give more information, other than the class or family label of a malware. Inspired by the works of document classification and automatic sentence summarization, each API call sequence can be regarded as a sentence. In this paper, we make the first attempt to build a multi-task malware learning model based on API call sequences. The model consists of two decoders, one for malware classification and one for $\emph{file access pattern}$ (FAP) generation given the API call sequence of a malware. We base our model on the general seq2seq framework. Experiments show that our model can give competitive classification results as well as insightful FAP information.

  Click for Model/Code and Paper
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment

Oct 28, 2019
Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Sequence-to-sequence text-to-speech (TTS) is dominated by soft-attention-based methods. Recently, hard-attention-based methods have been proposed to prevent fatal alignment errors, but their sampling method of discrete alignment is poorly investigated. This research investigates various combinations of sampling methods and probability distributions for alignment transition modeling in a hard-alignment-based sequence-to-sequence TTS method called SSNT-TTS. We clarify the common sampling methods of discrete variables including greedy search, beam search, and random sampling from a Bernoulli distribution in a more general way. Furthermore, we introduce the binary Concrete distribution to model discrete variables more properly. The results of a listening test shows that deterministic search is more preferable than stochastic search, and the binary Concrete distribution is robust with stochastic search for natural alignment transition.

* Submitted to ICASSP 2020 

  Click for Model/Code and Paper
Multi-modal Deep Analysis for Multimedia

Oct 11, 2019
Wenwu Zhu, Xin Wang, Hongzhi Li

With the rapid development of Internet and multimedia services in the past decade, a huge amount of user-generated and service provider-generated multimedia data become available. These data are heterogeneous and multi-modal in nature, imposing great challenges for processing and analyzing them. Multi-modal data consist of a mixture of various types of data from different modalities such as texts, images, videos, audios etc. In this article, we present a deep and comprehensive overview for multi-modal analysis in multimedia. We introduce two scientific research problems, data-driven correlational representation and knowledge-guided fusion for multimedia analysis. To address the two scientific problems, we investigate them from the following aspects: 1) multi-modal correlational representation: multi-modal fusion of data across different modalities, and 2) multi-modal data and knowledge fusion: multi-modal fusion of data with domain knowledge. More specifically, on data-driven correlational representation, we highlight three important categories of methods, such as multi-modal deep representation, multi-modal transfer learning, and multi-modal hashing. On knowledge-guided fusion, we discuss the approaches for fusing knowledge with data and four exemplar applications that require various kinds of domain knowledge, including multi-modal visual question answering, multi-modal video summarization, multi-modal visual pattern mining and multi-modal recommendation. Finally, we bring forward our insights and future research directions.


  Click for Model/Code and Paper
Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks

Sep 28, 2019
Yuhang Li, Xin Dong, Wei Wang

We proposed Additive Powers-of-Two~(APoT) quantization, an efficient non-uniform quantization scheme that attends to the bell-shaped and long-tailed distribution of weights in neural networks. By constraining all quantization levels as a sum of several Powers-of-Two terms, APoT quantization enjoys overwhelming efficiency of computation and a good match with weights' distribution. A simple reparameterization on clipping function is applied to generate better-defined gradient for updating of optimal clipping threshold. Moreover, weight normalization is presented to refine the input distribution of weights to be more stable and consistent. Experimental results show that our proposed method outperforms state-of-the-art methods, and is even competitive with the full-precision models demonstrating the effectiveness of our proposed APoT quantization. For example, our 3-bit quantized ResNet-34 on ImageNet only drops 0.3% Top-1 and 0.2% Top-5 accuracy without bells and whistles, while the computation of our model is approximately 2x less than uniformly quantized neural networks.

* quantization, efficient neural network 

  Click for Model/Code and Paper
Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments

Aug 30, 2019
Yusuke Yasuda, Xin Wang, Junichi Yamagishi

End-to-end text-to-speech (TTS) synthesis is a method that directly converts input text to output acoustic features using a single network. A recent advance of end-to-end TTS is due to a key technique called attention mechanisms, and all successful methods proposed so far have been based on soft attention mechanisms. However, although network structures are becoming increasingly complex, end-to-end TTS systems with soft attention mechanisms may still fail to learn and to predict accurate alignment between the input and output. This may be because the soft attention mechanisms are too flexible. Therefore, we propose an approach that has more explicit but natural constraints suitable for speech signals to make alignment learning and prediction of end-to-end TTS systems more robust. The proposed system, with the constrained alignment scheme borrowed from segment-to-segment neural transduction (SSNT), directly calculates the joint probability of acoustic features and alignment given an input text. The alignment is designed to be hard and monotonically increase by considering the speech nature, and it is treated as a latent variable and marginalized during training. During prediction, both the alignment and acoustic features can be generated from the probabilistic distributions. The advantages of our approach are that we can simplify many modules for the soft attention and that we can train the end-to-end TTS model using a single likelihood function. As far as we know, our approach is the first end-to-end TTS without a soft attention mechanism.

* To be appeared at SSW10 

  Click for Model/Code and Paper
Self-Supervised Dialogue Learning

Jun 30, 2019
Jiawei Wu, Xin Wang, William Yang Wang

The sequential order of utterances is often meaningful in coherent dialogues, and the order changes of utterances could lead to low-quality and incoherent conversations. We consider the order information as a crucial supervised signal for dialogue learning, which, however, has been neglected by many previous dialogue systems. Therefore, in this paper, we introduce a self-supervised learning task, inconsistent order detection, to explicitly capture the flow of conversation in dialogues. Given a sampled utterance pair triple, the task is to predict whether it is ordered or misordered. Then we propose a sampling-based self-supervised network SSN to perform the prediction with sampled triple references from previous dialogue history. Furthermore, we design a joint learning framework where SSN can guide the dialogue systems towards more coherent and relevant dialogue learning through adversarial training. We demonstrate that the proposed methods can be applied to both open-domain and task-oriented dialogue scenarios, and achieve the new state-of-the-art performance on the OpenSubtitiles and Movie-Ticket Booking datasets.

* 11pages, 2 figures, accepted to ACL 2019 

  Click for Model/Code and Paper
Neural source-filter waveform models for statistical parametric speech synthesis

Apr 27, 2019
Xin Wang, Shinji Takaki, Junichi Yamagishi

Neural waveform models such as WaveNet have demonstrated better performance than conventional vocoders for statistical parametric speech synthesis. As an autoregressive (AR) model, WaveNet is limited by a slow sequential waveform generation process. Some new models that use the inverse-autoregressive flow (IAF) can generate a whole waveform in a one-shot manner. However, these IAF-based models require sequential transformation during training, which severely slows down the training speed. Other models such as Parallel WaveNet and ClariNet bring together the benefits of AR and IAF-based models and train an IAF model by transferring the knowledge from a pre-trained AR teacher to an IAF student without any sequential transformation. However, both models require additional training criteria, and their implementation is prohibitively complicated. We propose a framework for neural source-filter (NSF) waveform modeling without AR nor IAF-based approaches. This framework requires only three components for waveform generation: a source module that generates a sine-based signal as excitation, a non-AR dilated-convolution-based filter module that transforms the excitation into a waveform, and a conditional module that pre-processes the acoustic features for the source and filer modules. This framework minimizes spectral-amplitude distances for model training, which can be efficiently implemented by using short-time Fourier transform routines. Under this framework, we designed three NSF models and compared them with WaveNet. It was demonstrated that the NSF models generated waveforms at least 100 times faster than WaveNet, and the quality of the synthetic speech from the best NSF model was better than or equally good as that from WaveNet.

* Submitted to IEEE/ACM TASLP 

  Click for Model/Code and Paper
Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation

Apr 04, 2019
Jiawei Wu, Xin Wang, William Yang Wang

The overreliance on large parallel corpora significantly limits the applicability of machine translation systems to the majority of language pairs. Back-translation has been dominantly used in previous approaches for unsupervised neural machine translation, where pseudo sentence pairs are generated to train the models with a reconstruction loss. However, the pseudo sentences are usually of low quality as translation errors accumulate during training. To avoid this fundamental issue, we propose an alternative but more effective approach, extract-edit, to extract and then edit real sentences from the target monolingual corpora. Furthermore, we introduce a comparative translation loss to evaluate the translated target sentences and thus train the unsupervised translation systems. Experiments show that the proposed approach consistently outperforms the previous state-of-the-art unsupervised machine translation systems across two benchmarks (English-French and English-German) and two low-resource language pairs (English-Romanian and English-Russian) by more than 2 (up to 3.63) BLEU points.

* 11 pages, 3 figures. Accepted to NAACL 2019 

  Click for Model/Code and Paper
AdaLinUCB: Opportunistic Learning for Contextual Bandits

Feb 20, 2019
Xueying Guo, Xiaoxiao Wang, Xin Liu

In this paper, we propose and study opportunistic contextual bandits - a special case of contextual bandits where the exploration cost varies under different environmental conditions, such as network load or return variation in recommendations. When the exploration cost is low, so is the actual regret of pulling a sub-optimal arm (e.g., trying a suboptimal recommendation). Therefore, intuitively, we could explore more when the exploration cost is relatively low and exploit more when the exploration cost is relatively high. Inspired by this intuition, for opportunistic contextual bandits with Linear payoffs, we propose an Adaptive Upper-Confidence-Bound algorithm (AdaLinUCB) to adaptively balance the exploration-exploitation trade-off for opportunistic learning. We prove that AdaLinUCB achieves O((log T)^2) problem-dependent regret upper bound, which has a smaller coefficient than that of the traditional LinUCB algorithm. Moreover, based on both synthetic and real-world dataset, we show that AdaLinUCB significantly outperforms other contextual bandit algorithms, under large exploration cost fluctuations.

  Click for Model/Code and Paper
Pornographic Image Recognition via Weighted Multiple Instance Learning

Feb 11, 2019
Jin Xin, Wang Yuhui, Tan Xiaoyang

In the era of Internet, recognizing pornographic images is of great significance for protecting children's physical and mental health. However, this task is very challenging as the key pornographic contents (e.g., breast and private part) in an image often lie in local regions of small size. In this paper, we model each image as a bag of regions, and follow a multiple instance learning (MIL) approach to train a generic region-based recognition model. Specifically, we take into account the region's degree of pornography, and make three main contributions. First, we show that based on very few annotations of the key pornographic contents in a training image, we can generate a bag of properly sized regions, among which the potential positive regions usually contain useful contexts that can aid recognition. Second, we present a simple quantitative measure of a region's degree of pornography, which can be used to weigh the importance of different regions in a positive image. Third, we formulate the recognition task as a weighted MIL problem under the convolutional neural network framework, with a bag probability function introduced to combine the importance of different regions. Experiments on our newly collected large scale dataset demonstrate the effectiveness of the proposed method, achieving an accuracy with 97.52% true positive rate at 1% false positive rate, tested on 100K pornographic images and 100K normal images.

* IEEE transactions on cybernetics, 2018 
* 9 pages, 3 figures 

  Click for Model/Code and Paper
Conditional Graph Neural Processes: A Functional Autoencoder Approach

Dec 13, 2018
Marcel Nassar, Xin Wang, Evren Tumer

We introduce a novel encoder-decoder architecture to embed functional processes into latent vector spaces. This embedding can then be decoded to sample the encoded functions over any arbitrary domain. This autoencoder generalizes the recently introduced Conditional Neural Process (CNP) model of random processes. Our architecture employs the latest advances in graph neural networks to process irregularly sampled functions. Thus, we refer to our model as Conditional Graph Neural Process (CGNP). Graph neural networks can effectively exploit `local' structures of the metric spaces over which the functions/processes are defined. The contributions of this paper are twofold: (i) a novel graph-based encoder-decoder architecture for functional and process embeddings, and (ii) a demonstration of the importance of using the structure of metric spaces for this type of representations.

* 3 pages, 1 figure, 1 table, published in the Third Workshop on Bayesian Deep Learning (NeurIPS 2018), Montr\'eal, Canada 

  Click for Model/Code and Paper
Neural source-filter-based waveform model for statistical parametric speech synthesis

Oct 31, 2018
Xin Wang, Shinji Takaki, Junichi Yamagishi

Neural waveform models such as the WaveNet are used in many recent text-to-speech systems, but the original WaveNet is quite slow in waveform generation because of its autoregressive (AR) structure. Although faster non-AR models were recently reported, they may be prohibitively complicated due to the use of a distilling training method and the blend of other disparate training criteria. This study proposes a non-AR neural source-filter waveform model that can be directly trained using spectrum-based training criteria and the stochastic gradient descent method. Given the input acoustic features, the proposed model first uses a source module to generate a sine-based excitation signal and then uses a filter module to transform the excitation signal into the output speech waveform. Our experiments demonstrated that the proposed model generated waveforms at least 100 times faster than the AR WaveNet and the quality of its synthetic speech is close to that of speech generated by the AR WaveNet. Ablation test results showed that both the sine-wave excitation signal and the spectrum-based training criteria were essential to the performance of the proposed model.

* Submitted to ICASSP 2019 

  Click for Model/Code and Paper