Supervised Machine Learning with a Novel Kernel Density Estimator

Oct 16, 2007

Yen-Jen Oyang, Darby Tien-Hao Chang, Yu-Yen Ou, Hao-Geng Hung, Chih-Peng Wu, Chien-Yu Chen

In recent years, kernel density estimation has been exploited by computer scientists to model machine learning problems. Kernel density estimation based approaches are of interest due to their low time complexity of either O(n) or O(n log n) for constructing a classifier, where n is the number of sampling instances. In the design of kernel density estimators, one essential issue is how fast the pointwise mean square error (MSE) and/or the integrated mean square error (IMSE) diminishes as the number of sampling instances increases. In this article, it is shown that with the proposed kernel function it is feasible to make the pointwise MSE of the density estimator converge at O(n^{-2/3}) regardless of the dimension of the vector space, provided that the probability density function at the point of interest meets certain conditions.
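As a rough illustration of the general idea (using a standard Gaussian kernel and an illustrative bandwidth, not the paper's novel estimator), a KDE-based classifier can be sketched as follows: estimate each class's density from its samples and assign a query point to the class with the highest prior-weighted density.

```python
import numpy as np

def gaussian_kde(points, x, bandwidth=0.5):
    """Standard Gaussian kernel density estimate at point x.

    points: (n_samples, d) array of sampling instances
    x: (d,) query point
    """
    d = points.shape[1]
    diffs = (points - x) / bandwidth
    norm = (2 * np.pi) ** (-d / 2) / (bandwidth ** d)
    return norm * np.mean(np.exp(-0.5 * np.sum(diffs ** 2, axis=1)))

def kde_classify(class_samples, x, bandwidth=0.5):
    """Assign x to the class whose prior-weighted density estimate is largest."""
    n_total = sum(len(s) for s in class_samples)
    scores = [len(s) / n_total * gaussian_kde(s, x, bandwidth)
              for s in class_samples]
    return int(np.argmax(scores))
```

Constructing this classifier is O(n) (the samples are simply stored), which is the complexity regime the abstract refers to; evaluation cost is what more refined estimators then optimize.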

**Click to Read Paper**

Boltzmann Generators - Sampling Equilibrium States of Many-Body Systems with Deep Learning

Dec 04, 2018

Frank Noé, Hao Wu

Computing equilibrium states in condensed-matter many-body systems, such as solvated proteins, is a long-standing challenge. Lacking methods for generating statistically independent equilibrium samples directly, vast computational effort is invested in simulating these systems in small steps, e.g., using Molecular Dynamics. Combining deep learning and statistical mechanics, we here develop Boltzmann Generators, which are shown to generate statistically independent samples of equilibrium states of representative condensed-matter systems and complex polymers. Boltzmann Generators use neural networks to learn a coordinate transformation from the complex configurational equilibrium distribution to a distribution that can be easily sampled. Accurate computation of free energy differences and discovery of new system states are demonstrated, providing a new statistical mechanics tool that performs orders of magnitude faster than standard simulation methods.
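The core mechanism can be sketched in a toy one-dimensional setting: sample from an easy latent distribution, push the samples through an invertible map, and reweight toward the Boltzmann distribution exp(-u(x)) via the change-of-variables formula. Here the map is a fixed affine transform with made-up parameters and the energy is a simple harmonic well; the paper learns the map with deep invertible networks.

```python
import numpy as np

def u(x):
    # Example energy: harmonic well centered at 2, so exp(-u) is N(2, 1).
    return 0.5 * (x - 2.0) ** 2

scale, shift = 1.5, 1.0  # assumed (untrained) affine "flow" parameters

def sample_and_reweight(n, rng):
    """Draw latent z ~ N(0,1), map x = F(z), and importance-weight toward exp(-u)."""
    z = rng.normal(size=n)
    x = scale * z + shift
    # Density of x under the flow, by the change-of-variables formula.
    log_qx = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi) - np.log(scale)
    log_w = -u(x) - log_qx              # unnormalized importance weights
    w = np.exp(log_w - log_w.max())
    return x, w / w.sum()
```

The reweighted average of any observable then estimates its Boltzmann expectation; for this energy the weighted mean of x should approach 2.0.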

**Click to Read Paper**

Variational approach for learning Markov processes from time series data

Dec 11, 2017

Hao Wu, Frank Noé

Inference, prediction and control of complex dynamical systems from time series are important in many areas, including financial markets, power grid management, climate and weather modeling, and molecular dynamics. The analysis of such highly nonlinear dynamical systems is facilitated by the fact that we can often find a (generally nonlinear) transformation of the system coordinates to features in which the dynamics can be excellently approximated by a linear Markovian model. Moreover, the many system variables often change collectively on large time- and length-scales, facilitating a low-dimensional analysis in feature space. In this paper, we introduce a variational approach for Markov processes (VAMP) that allows us to find optimal feature mappings and optimal Markovian models of the dynamics from given time series data. The key insight is that the best linear model can be obtained from the top singular components of the Koopman operator. This leads to the definition of a family of score functions called VAMP-r which can be calculated from data and employed to optimize a Markovian model. In addition, based on the relationship between the variational scores and approximation errors of Koopman operators, we propose a new VAMP-E score, which can be applied to cross-validation for hyper-parameter optimization and model selection in VAMP. VAMP is valid for both reversible and nonreversible processes and for stationary and non-stationary processes or realizations.
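A minimal version of the VAMP-2 score, the sum of squared singular values of the covariance-whitened Koopman matrix, can be computed from feature time series as sketched below. Details such as mean removal and eigenvalue regularization follow common practice rather than the paper's exact estimator.

```python
import numpy as np

def vamp2_score(X, Y, eps=1e-10):
    """Sketch of the VAMP-2 score for instantaneous features X = f(x_t)
    and time-lagged features Y = f(x_{t+tau}).

    Returns the squared Frobenius norm of C00^{-1/2} C01 C11^{-1/2};
    a larger score indicates a better linear Markovian model.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    C00, C01, C11 = X.T @ X / n, X.T @ Y / n, Y.T @ Y / n

    def inv_sqrt(C):
        # Symmetric inverse square root with a small eigenvalue floor.
        s, U = np.linalg.eigh(C)
        s = np.maximum(s, eps)
        return U @ np.diag(s ** -0.5) @ U.T

    K = inv_sqrt(C00) @ C01 @ inv_sqrt(C11)
    return np.sum(np.linalg.svd(K, compute_uv=False) ** 2)
```

For perfectly predictable dynamics (Y identical to X) the whitened Koopman matrix is the identity and the score equals the feature dimension; for unrelated X and Y it is close to zero.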

**Click to Read Paper**

Network Vector: Distributed Representations of Networks with Global Context

Sep 07, 2017

Hao Wu, Kristina Lerman


**Click to Read Paper**

Deep metric learning aims to learn a function mapping image pixels to embedding feature vectors that model the similarity between images. The majority of current approaches are non-parametric, learning the metric space directly through the supervision of similar (pairs) or relatively similar (triplets) sets of images. A difficult challenge for training these approaches is mining informative samples of images, as the metric space is learned with only the local context present within a single mini-batch. Alternative approaches use parametric metric learning to eliminate the need for sampling through supervision of images to proxies. Although this simplifies optimization, such proxy-based approaches have lagged behind in performance. In this work, we demonstrate that a standard classification network can be transformed into a variant of proxy-based metric learning that is competitive against non-parametric approaches across a wide variety of image retrieval tasks. We address key challenges in proxy-based metric learning such as performance under extreme classification and describe techniques to stabilize and learn higher dimensional embeddings. We evaluate our approach on the CAR-196, CUB-200-2011, Stanford Online Products, and In-Shop datasets for image retrieval and clustering. Finally, we show that our softmax classification approach can learn high-dimensional binary embeddings that achieve new state-of-the-art performance on all datasets evaluated with a memory footprint that is the same or smaller than competing approaches.
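The classification-as-proxy idea can be sketched as a softmax cross-entropy over L2-normalized embeddings and per-class weight vectors, which then act as proxies; the function names and temperature here are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def proxy_softmax_loss(emb, proxies, labels, temperature=0.05):
    """Softmax cross-entropy over class proxies (one weight vector per class).

    emb: (n, d) embeddings; proxies: (n_classes, d); labels: (n,) class ids.
    Both embeddings and proxies are L2-normalized, so the logits are scaled
    cosine similarities, tying classification to the metric space.
    """
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    logits = e @ p.T / temperature                  # cosine-similarity logits
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Minimizing this loss pulls each embedding toward its class proxy and away from the others, removing the need for pair or triplet mining; retrieval then uses cosine similarity between embeddings.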

**Click to Read Paper**

AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference

May 24, 2018

Jian-Hao Luo, Jianxin Wu


**Click to Read Paper**

Learning Effective Binary Visual Representations with Deep Networks

Mar 08, 2018

Jianxin Wu, Jian-Hao Luo


**Click to Read Paper**

A Multi-Axis Annotation Scheme for Event Temporal Relations

May 14, 2018

Qiang Ning, Hao Wu, Dan Roth

Existing temporal relation (TempRel) annotation schemes often have low inter-annotator agreements (IAA) even between experts, suggesting that the current annotation task needs a better definition. This paper proposes a new multi-axis modeling to better capture the temporal structure of events. In addition, we identify that event end-points are a major source of confusion in annotation, so we also propose to annotate TempRels based on start-points only. A pilot expert annotation using the proposed scheme shows significant improvement in IAA from the conventional 60's to 80's (Cohen's Kappa). This better-defined annotation scheme further enables the use of crowdsourcing to alleviate the labor intensity for each annotator. We hope that this work can foster more interesting studies towards event understanding.

**Click to Read Paper**

ResumeVis: A Visual Analytics System to Discover Semantic Information in Semi-structured Resume Data

May 15, 2017

Chen Zhang, Hao Wang, Yingcai Wu


**Click to Read Paper**

Sparse Estimation of Multivariate Poisson Log-Normal Models from Count Data

Aug 12, 2016

Hao Wu, Xinwei Deng, Naren Ramakrishnan


**Click to Read Paper**

Modeling Coherence for Discourse Neural Machine Translation

Nov 14, 2018

Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang


**Click to Read Paper**

Multi-channel Encoder for Neural Machine Translation

Dec 06, 2017

Hao Xiong, Zhongjun He, Xiaoguang Hu, Hua Wu


**Click to Read Paper**

ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression

Jul 20, 2017

Jian-Hao Luo, Jianxin Wu, Weiyao Lin

We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both the training and inference stages. We focus on filter-level pruning, i.e., a whole filter is discarded if it is deemed less important. Our method does not change the original network structure, so it can be perfectly supported by any off-the-shelf deep learning library. We formally establish filter pruning as an optimization problem, and reveal that we need to prune filters based on statistical information computed from the next layer, not the current layer, which differentiates ThiNet from existing methods. Experimental results demonstrate the effectiveness of this strategy, which has advanced the state-of-the-art. We also show the performance of ThiNet on the ILSVRC-12 benchmark. ThiNet achieves 3.31$\times$ FLOPs reduction and 16.63$\times$ compression on VGG-16, with only 0.52$\%$ top-5 accuracy drop. Similar experiments with ResNet-50 reveal that even for a compact network, ThiNet can reduce more than half of the parameters and FLOPs, at the cost of roughly 1$\%$ top-5 accuracy drop. Moreover, the original VGG-16 model can be further pruned into a very small model with only 5.05MB model size, preserving AlexNet-level accuracy but showing much stronger generalization ability.
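The key idea, scoring each channel by its effect on the next layer's output rather than by current-layer statistics, can be sketched for a toy linear "next layer"; the paper works with convolutional layers and a greedy subset-selection procedure rather than this independent per-channel ranking.

```python
import numpy as np

def rank_channels_by_next_layer(acts, next_weights):
    """Rank channels of a layer by how much zeroing each one perturbs
    the NEXT layer's pre-activations (toy linear next layer).

    acts: (n_samples, n_channels) activations of the layer being pruned
    next_weights: (n_channels, n_out) weights of the following layer
    Returns channel indices sorted from least to most important.
    """
    full = acts @ next_weights
    errors = []
    for c in range(acts.shape[1]):
        pruned = acts.copy()
        pruned[:, c] = 0.0                 # simulate discarding filter c
        errors.append(np.mean((pruned @ next_weights - full) ** 2))
    return np.argsort(errors)              # smallest reconstruction error first
```

Channels whose removal barely changes the next layer's output are pruned first, which is why a channel can have large activations yet still be unimportant if the next layer ignores it.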

**Click to Read Paper**

CIFT: Crowd-Informed Fine-Tuning to Improve Machine Learning Ability

Jun 28, 2017

John P. Lalor, Hao Wu, Hong Yu

Item Response Theory (IRT) allows for measuring the ability of Machine Learning models as compared to a human population. However, it is difficult to create a large dataset to train the ability of deep neural network models (DNNs). We propose Crowd-Informed Fine-Tuning (CIFT) as a new training process, where a pre-trained model is fine-tuned with a specialized supplemental training set obtained via IRT model-fitting on a large set of crowdsourced response patterns. With CIFT we can leverage the specialized set of data obtained through IRT to inform parameter tuning in DNNs. We experiment with two loss functions in CIFT to represent (i) memorization of fine-tuning items and (ii) learning a probability distribution over potential labels that is similar to the crowdsourced distribution over labels to simulate crowd knowledge. Our results show that CIFT improves ability for a state-of-the-art DNN model for Recognizing Textual Entailment (RTE) tasks and is generalizable to a large-scale RTE test set.
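The two fine-tuning objectives can be sketched as (i) cross-entropy against the single gold label and (ii) cross-entropy against the crowdsourced label distribution (soft targets); these are illustrative formulations of the two loss choices, not the paper's code.

```python
import numpy as np

def memorization_loss(pred, gold):
    """Cross-entropy against the single gold label: memorize fine-tuning items.

    pred: (n, n_labels) predicted probabilities; gold: (n,) label indices.
    """
    return -np.mean(np.log(pred[np.arange(len(gold)), gold]))

def crowd_distribution_loss(pred, crowd):
    """Cross-entropy against the crowdsourced label distribution.

    crowd: (n, n_labels) empirical distribution of crowd responses per item.
    Minimized when the model's distribution matches the crowd's.
    """
    return -np.mean(np.sum(crowd * np.log(pred), axis=1))
```

The first loss pushes all probability mass onto one label; the second keeps the model's uncertainty aligned with human disagreement on the item.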

**Click to Read Paper**