Models, code, and papers for "Xuan Zhang":

Machine Translation System Selection from Bandit Feedback

Feb 22, 2020
Jason Naradowsky, Xuan Zhang, Kevin Duh

Adapting machine translation systems in the real world is a difficult problem. In contrast to offline training, users cannot provide the type of fine-grained feedback typically used for improving the system. Moreover, users have different translation needs, and even a single user's needs may change over time. In this work we take a different approach, treating the problem of adapting as one of selection. Instead of adapting a single system, we train many translation systems using different architectures and data partitions. Using bandit learning techniques on simulated user feedback, we learn a policy to choose which system to use for a particular translation task. We show that our approach can (1) quickly adapt to address domain changes in translation tasks, (2) outperform the single best system in mixed-domain translation tasks, and (3) make effective instance-specific decisions when using contextual bandit strategies.
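
The selection idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which bandit algorithms are used, so the epsilon-greedy rule below is an assumption chosen for brevity. The three "systems" are hypothetical stand-ins that return a noisy quality score in place of simulated user feedback, and the function name and parameters are likewise illustrative. The contextual bandit strategies mentioned in the abstract are not shown.

```python
import random

def epsilon_greedy_select(reward_fns, rounds=2000, epsilon=0.1, seed=0):
    """Choose among candidate systems using only scalar (bandit) feedback."""
    rng = random.Random(seed)
    counts = [0] * len(reward_fns)   # how often each system was chosen
    means = [0.0] * len(reward_fns)  # running mean reward per system
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a system uniformly at random.
            arm = rng.randrange(len(reward_fns))
        else:
            # Exploit: use the system with the best estimate so far.
            arm = max(range(len(reward_fns)), key=lambda i: means[i])
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means

# Hypothetical systems: each returns a noisy score standing in for feedback.
systems = [
    lambda rng: rng.gauss(0.30, 0.05),
    lambda rng: rng.gauss(0.45, 0.05),
    lambda rng: rng.gauss(0.38, 0.05),
]
counts, means = epsilon_greedy_select(systems)
best = max(range(len(systems)), key=lambda i: counts[i])
print("most selected system:", best, "counts:", counts)
```

Only the reward of the chosen system is observed in each round, which is what separates bandit feedback from fully supervised training.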


Multi-Target, Multi-Camera Tracking by Hierarchical Clustering: Recent Progress on DukeMTMC Project

Dec 27, 2017
Zhimeng Zhang, Jianan Wu, Xuan Zhang, Chi Zhang

Although many methods perform well in single-camera tracking, multi-camera tracking remains a challenging problem that has received less attention. DukeMTMC is a large-scale, well-annotated multi-camera tracking benchmark that has enabled great progress in this field. This report briefly introduces our method on DukeMTMC and shows that simple hierarchical clustering with well-trained person re-identification features can achieve good results on this dataset.

* 4 pages, 1 figure 

Visual Graph Mining

Aug 13, 2017
Quanshi Zhang, Xuan Song, Ryosuke Shibasaki

In this study, we formulate the concept of "mining maximal-size frequent subgraphs" in the challenging domain of visual data (images and videos). Visual knowledge can usually be modeled as attributed relational graphs (ARGs), with local attributes representing local parts and pairwise attributes describing the spatial relationship between parts. Thus, from a practical perspective, such mining of maximal-size subgraphs can be regarded as a general platform for discovering and modeling the common objects within cluttered and unlabeled visual data. From a theoretical perspective, visual graph mining should encode and overcome the great fuzziness of messy data collected from complex real-world situations, which conflicts with the conventional theoretical basis of graph mining designed for tabular data. Common subgraphs hidden in these ARGs usually have soft attributes, with considerable inter-graph variation. More importantly, we should also discover the latent pattern space, including similarity metrics for the pattern and hidden node relations, during the mining process. In this study, we redefine the visual subgraph pattern that encodes all of these challenges in a general way, and propose an approximate but efficient solution to graph mining. We conduct five experiments to evaluate our method with different kinds of visual data, including videos and RGB/RGB-D images. These experiments demonstrate the generality of the proposed method.


Multiplicative Gaussian Particle Filter

Feb 29, 2020
Xuan Su, Wee Sun Lee, Zhen Zhang

We propose a new sampling-based approach for approximate inference in filtering problems. Instead of approximating conditional distributions with a finite set of states, as done in particle filters, our approach approximates the distribution with a weighted sum of functions from a set of continuous functions. Central to the approach is the use of sampling to approximate multiplications in the Bayes filter. We provide theoretical analysis, giving conditions under which sampling yields a good approximation. We next specialize to the case of weighted sums of Gaussians, and show how properties of Gaussians enable closed-form transition and efficient multiplication. Lastly, we conduct preliminary experiments on a robot localization problem and compare performance with the particle filter, to demonstrate the potential of the proposed method.
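
One of the Gaussian properties the abstract refers to can be illustrated with the standard identity that the product of two Gaussian densities is a scaled Gaussian. The sketch below covers only the univariate case and is not the paper's sampling scheme. The function names are illustrative.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def gauss_product(m1, v1, m2, v2):
    """Return (scale, mean, var) with N(x; m1, v1) * N(x; m2, v2) = scale * N(x; mean, var)."""
    var = 1.0 / (1.0 / v1 + 1.0 / v2)   # precisions add
    mean = var * (m1 / v1 + m2 / v2)    # precision-weighted mean
    scale = gauss_pdf(m1, m2, v1 + v2)  # normalizing constant of the product
    return scale, mean, var

scale, mean, var = gauss_product(0.0, 1.0, 2.0, 0.5)
for x in (-1.0, 0.5, 1.3, 3.0):
    lhs = gauss_pdf(x, 0.0, 1.0) * gauss_pdf(x, 2.0, 0.5)
    rhs = scale * gauss_pdf(x, mean, var)
    print(f"x={x}: product={lhs:.6e}, closed form={rhs:.6e}")
```

Because the product stays in the Gaussian family, multiplying two weighted sums of Gaussians yields another weighted sum, although the number of components grows with each multiplication.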


Quaternion Product Units for Deep Learning on 3D Rotation Groups

Dec 17, 2019
Xuan Zhang, Shaofei Qin, Yi Xu, Hongteng Xu

We propose a novel quaternion product unit (QPU) to represent data on 3D rotation groups. The QPU leverages quaternion algebra and the law of the 3D rotation group, representing 3D rotation data as quaternions and merging them via a weighted chain of Hamilton products. We prove that the representations derived by the proposed QPU can be disentangled into "rotation-invariant" features and "rotation-equivariant" features, which supports the rationality and the efficiency of the QPU in theory. We design quaternion neural networks based on our QPUs and make our models compatible with existing deep learning models. Experiments on both synthetic and real-world data show that the proposed QPU is beneficial for learning tasks that require rotation robustness.
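
The Hamilton product that the QPU chains together is standard quaternion multiplication. The sketch below assumes quaternions stored as `(w, x, y, z)` tuples and shows only an unweighted chain. The abstract does not specify how the weights enter the product, so that part is omitted. The function names are illustrative.

```python
from functools import reduce

def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def chain_product(quaternions):
    """Left-to-right chain of Hamilton products (unweighted)."""
    return reduce(hamilton_product, quaternions)

i = (0.0, 1.0, 0.0, 0.0)
j = (0.0, 0.0, 1.0, 0.0)
k = (0.0, 0.0, 0.0, 1.0)
print(hamilton_product(i, j))    # equals k
print(hamilton_product(j, i))    # equals -k: the product is not commutative
print(chain_product([i, j, k]))  # i * j * k = -1
```

The order of the factors matters because the product is non-commutative, which is why the chain is evaluated in a fixed left-to-right order.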


An Empirical Study on Leveraging Scene Graphs for Visual Question Answering

Jul 28, 2019
Cheng Zhang, Wei-Lun Chao, Dong Xuan

Visual question answering (Visual QA) has attracted significant attention in recent years. While a variety of algorithms have been proposed, most of them are built upon different combinations of image and language features as well as multi-modal attention and fusion. In this paper, we investigate an alternative approach inspired by conventional QA systems that operate on knowledge graphs. Specifically, we investigate the use of scene graphs derived from images for Visual QA: an image is abstractly represented by a graph with nodes corresponding to object entities and edges to object relationships. We adapt the recently proposed graph network (GN) to encode the scene graph and perform structured reasoning according to the input question. Our empirical studies demonstrate that scene graphs can already capture essential information of images and graph networks have the potential to outperform state-of-the-art Visual QA algorithms but with a much cleaner architecture. By analyzing the features generated by GNs we can further interpret the reasoning process, suggesting a promising direction towards explainable Visual QA.

* Accepted as oral presentation at BMVC 2019 

Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections

Mar 19, 2020
Yi-An Lai, Xuan Zhu, Yi Zhang, Mona Diab

Summarizing data samples by quantitative measures has a long history, with descriptive statistics being a case in point. However, as natural language processing methods flourish, there are still insufficient characteristic metrics to describe a collection of texts in terms of the words, sentences, or paragraphs they comprise. In this work, we propose metrics of diversity, density, and homogeneity that quantitatively measure the dispersion, sparsity, and uniformity of a text collection. We conduct a series of simulations to verify that each metric holds desired properties and resonates with human intuitions. Experiments on real-world datasets demonstrate that the proposed characteristic metrics are highly correlated with text classification performance of a renowned model, BERT, which could inspire future applications.

* Accepted by LREC 2020 

Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices

Nov 28, 2019
Weidong Cao, Liu Ke, Ayan Chakrabarti, Xuan Zhang

Recent works propose neural network- (NN-) inspired analog-to-digital converters (NNADCs) and demonstrate their great potential in many emerging applications. These NNADCs often rely on resistive random-access memory (RRAM) devices to realize the NN operations and require high-precision RRAM cells (6~12-bit) to achieve a moderate quantization resolution (4~8-bit). Such an optimistic assumption of RRAM resolution, however, is not supported by fabrication data of RRAM arrays in large-scale production processes. In this paper, we propose an NN-inspired super-resolution ADC based on low-precision RRAM devices by taking advantage of a co-design methodology that combines a pipelined hardware architecture with a custom NN training framework. Results obtained from SPICE simulations demonstrate that our method leads to a robust design of a 14-bit super-resolution ADC using 3-bit RRAM devices with improved power and speed performance and competitive figures of merit (FoMs). In addition to linear uniform quantization, the proposed ADC can also support configurable high-resolution nonlinear quantization with high conversion speed and low conversion energy, enabling future intelligent analog-to-information interfaces for near-sensor analytics and processing.

* 7 pages, ICCAD 2019 

Simple Physical Adversarial Examples against End-to-End Autonomous Driving Models

Mar 12, 2019
Adith Boloor, Xin He, Christopher Gill, Yevgeniy Vorobeychik, Xuan Zhang

Recent advances in machine learning, especially techniques such as deep neural networks, are promoting a range of high-stakes applications, including autonomous driving, which often relies on deep learning for perception. While deep learning for perception has been shown to be vulnerable to a host of subtle adversarial manipulations of images, end-to-end demonstrations of successful attacks, which manipulate the physical environment and result in physical consequences, are scarce. Moreover, attacks typically involve carefully constructed adversarial examples at the level of pixels. We demonstrate the first end-to-end attacks on autonomous driving in simulation, using simple physically realizable attacks: the painting of black lines on the road. These attacks target deep neural network models for end-to-end autonomous driving control. A systematic investigation shows that such attacks are surprisingly easy to engineer, and we describe scenarios (e.g., right turns) in which they are highly effective, and others that are less vulnerable (e.g., driving straight). Further, we use network deconvolution to demonstrate that the attacks succeed by inducing activation patterns similar to entirely different scenarios used in training.


Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

Jul 18, 2018
Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. This method is motivated by the nature of the monotonic alignment from phone sequences to acoustic sequences. Only the alignment paths that satisfy the monotonic condition are taken into consideration at each decoder timestep. The modified attention probabilities at each timestep are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or stay at each decoder timestep. Experimental results show that the proposed forward attention method achieves faster convergence and higher stability than the baseline attention method. In addition, forward attention with a transition agent can help improve the naturalness of synthetic speech and control its speed effectively.

* 5 pages, 3 figures, 2 tables. Published in IEEE International Conference on Acoustics, Speech and Signal Processing 2018 (ICASSP2018) 
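
The recursive computation described in the abstract can be sketched as follows. The abstract does not state the exact update, so the rule below is an assumption: at each decoder step, position n keeps mass only from paths that stayed at n or advanced from n-1, reweighted by the ordinary attention probability and renormalized. The transition agent is not modeled, and the function name is illustrative.

```python
def forward_attention_step(prev, probs):
    """One assumed forward-attention update.

    prev  : forward attention weights from the previous decoder step
    probs : ordinary attention probabilities at the current step
    """
    raw = []
    for n, y in enumerate(probs):
        stay = prev[n]                          # path stays at position n
        move = prev[n - 1] if n > 0 else 0.0    # path advances from n - 1
        raw.append((stay + move) * y)
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no monotonic path has non-zero probability")
    return [r / total for r in raw]

# Start with all mass on the first phone, then take two steps
# with uniform ordinary attention over four positions.
alpha = [1.0, 0.0, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]
alpha = forward_attention_step(alpha, uniform)
print(alpha)
alpha = forward_attention_step(alpha, uniform)
print(alpha)
```

Even with uniform ordinary attention, the weights can advance by at most one position per step, which is the monotonic constraint the abstract describes.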

AxTrain: Hardware-Oriented Neural Network Training for Approximate Inference

May 21, 2018
Xin He, Liu Ke, Wenyan Lu, Guihai Yan, Xuan Zhang

The intrinsic error tolerance of neural networks (NNs) makes approximate computing a promising technique to improve the energy efficiency of NN inference. Conventional approximate computing focuses on balancing the efficiency-accuracy trade-off for existing pre-trained networks, which can lead to suboptimal solutions. In this paper, we propose AxTrain, a hardware-oriented training framework to facilitate approximate computing for NN inference. Specifically, AxTrain leverages the synergy between two orthogonal methods: one actively searches for a network parameter distribution with high error tolerance, and the other passively learns resilient weights by numerically incorporating the noise distributions of the approximate hardware in the forward pass during the training phase. Experimental results from various datasets with near-threshold computing and approximate multiplication strategies demonstrate AxTrain's ability to obtain resilient neural network parameters and improve system energy efficiency.

* In International Symposium on Low Power Electronics and Design (ISLPED) 2018 

Automated Detecting and Placing Road Objects from Street-level Images

Sep 17, 2019
Chaoquan Zhang, Hongchao Fan, Wanzhi Li, Bo Mao, Xuan Ding

Navigation services utilized by autonomous vehicles or ordinary users require the availability of detailed information about road-related objects and their geolocations, especially at road intersections. However, these road intersections are mainly represented as point elements without detailed information, or are not even available in current versions of crowdsourced mapping databases, including OpenStreetMap (OSM). This study develops an approach to automatically detect road objects in street-level images and place them at the right location. Our processing pipeline relies on two convolutional neural networks: the first segments the images, while the second detects and classifies the specific objects. Moreover, to locate the detected objects, we establish an attributed topological binary tree (ATBT) based on urban grammar for each image to depict the coherent relations of topologies, attributes and semantics of the road objects. Then the ATBT is further matched with map features on OSM to determine the correct placement location. The proposed method has been applied to a case study in Berlin, Germany. We validate the effectiveness of our method on two object classes: traffic signs and traffic lights. Experimental results demonstrate that the proposed approach provides near-precise localization results in terms of completeness and positional accuracy. Among many potential applications, the output may be combined with other sources of data to guide autonomous vehicles.


Visualizing How Embeddings Generalize

Sep 16, 2019
Xiaotong Liu, Hong Xuan, Zeyu Zhang, Abby Stylianou, Robert Pless

Deep metric learning is often used to learn an embedding function that captures the semantic differences within a dataset. A key factor in many problem domains is how this embedding generalizes to new classes of data. In observing many triplet selection strategies for metric learning, we find that the best performance consistently arises from approaches that focus on a few well-selected triplets. We introduce visualization tools to illustrate how an embedding generalizes beyond measuring accuracy on validation data, and we illustrate the behavior of a range of triplet selection strategies.

* 8 pages, 4 figures, published in ICML workshop: Understanding and Improving Generalization in Deep Learning 

Open Set Domain Adaptation: Theoretical Bound and Algorithm

Jul 19, 2019
Zhen Fang, Jie Lu, Feng Liu, Junyu Xuan, Guangquan Zhang

Unsupervised domain adaptation for classification tasks has achieved great progress in leveraging the knowledge in a labeled (source) domain to improve the task performance in an unlabeled (target) domain by mitigating the effect of distribution discrepancy. However, most existing methods can only handle unsupervised closed set domain adaptation (UCSDA), where the source and target domains share the same label set. In this paper, we target a more challenging but realistic setting: unsupervised open set domain adaptation (UOSDA), where the target domain has unknown classes that the source domain does not have. This study is the first to give the generalization bound of open set domain adaptation through theoretically investigating the risk of the target classifier on the unknown classes. The proposed generalization bound for open set domain adaptation has a special term, namely open set difference, which reflects the risk of the target classifier on unknown classes. According to this generalization bound, we propose a novel and theoretically guided unsupervised open set domain adaptation method: Distribution Alignment with Open Difference (DAOD), which is based on the structural risk minimization principle and open set difference regularization. The experiments on several benchmark datasets show the superior performance of the proposed UOSDA method compared with the state-of-the-art methods in the literature.


A Scheme for Continuous Input to the Tsetlin Machine with Applications to Forecasting Disease Outbreaks

May 10, 2019
K. Darshana Abeyrathna, Ole-Christoffer Granmo, Xuan Zhang, Morten Goodwin

In this paper, we apply a promising new tool for pattern classification, namely the Tsetlin Machine (TM), to the field of disease forecasting. The TM is interpretable because it is based on manipulating expressions in propositional logic, leveraging a large team of Tsetlin Automata (TA). Apart from being interpretable, this approach is attractive due to its low computational cost and its capacity to handle noise. To attack the problem of forecasting, we introduce a preprocessing method that extends the TM so that it can handle continuous input. Briefly stated, we convert continuous input into a binary representation based on thresholding. The resulting extended TM is evaluated and analyzed using an artificial dataset. The TM is further applied to forecast dengue outbreaks in all seventeen regions of the Philippines using the spatio-temporal properties of the data. Experimental results show that dengue outbreak forecasts made by the TM are more accurate than those obtained by a Support Vector Machine (SVM), Decision Trees (DTs), and several multi-layered Artificial Neural Networks (ANNs), both in terms of forecasting precision and F1-score.

* 14 pages 
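
The thresholding step mentioned in the abstract can be sketched as follows. The abstract does not say how thresholds are chosen or which comparison is used, so both are assumptions here: the thresholds are the sorted unique values seen in a hypothetical training column, and each bit records whether the input is less than or equal to a threshold. The function names are illustrative.

```python
def fit_thresholds(values):
    """Assumed threshold choice: the sorted unique values of a training column."""
    return sorted(set(values))

def binarize(value, thresholds):
    """Encode one continuous value as bits, one per threshold (1 if value <= t)."""
    return [1 if value <= t else 0 for t in thresholds]

# Hypothetical continuous feature observed during training.
train_column = [5.0, 1.0, 3.0, 2.0, 4.0, 3.0]
thresholds = fit_thresholds(train_column)

print(thresholds)
print(binarize(3.5, thresholds))
print(binarize(0.2, thresholds))
```

The resulting bit vector is monotone in the input value, so the ordering of continuous values is preserved in the propositional inputs.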

Projecting "better than randomly": How to reduce the dimensionality of very large datasets in a way that outperforms random projections

Jan 03, 2019
Michael Wojnowicz, Di Zhang, Glenn Chisholm, Xuan Zhao, Matt Wolff

For very large datasets, random projections (RP) have become the tool of choice for dimensionality reduction. This is due to the computational complexity of principal component analysis. However, the recent development of randomized principal component analysis (RPCA) has opened up the possibility of obtaining approximate principal components on very large datasets. In this paper, we compare the performance of RPCA and RP in dimensionality reduction for supervised learning. In Experiment 1, we study a malware classification task on a dataset with over 10 million samples, almost 100,000 features, and over 25 billion non-zero values, with the goal of reducing the dimensionality to a compressed representation of 5,000 features. In order to apply RPCA to this dataset, we develop a new algorithm called large sample RPCA (LS-RPCA), which extends the RPCA algorithm to work on datasets with arbitrarily many samples. We find that classification performance is much higher when using LS-RPCA for dimensionality reduction than when using random projections. In particular, across a range of target dimensionalities, we find that using LS-RPCA reduces classification error by between 37% and 54%. Experiment 2 generalizes the phenomenon to multiple datasets, feature representations, and classifiers. These findings have implications for a large number of research projects in which random projections were used as a preprocessing step for dimensionality reduction. As long as accuracy is at a premium and the target dimensionality is sufficiently less than the numeric rank of the dataset, randomized PCA may be a superior choice. Moreover, if the dataset has a large number of samples, then LS-RPCA will provide a method for obtaining the approximate principal components.

* 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (DSAA) (pp. 184-193). IEEE 
* Originally published in IEEE DSAA in 2016; this post-print fixes a rendering error of the += operator in Algorithm 3 

SCPNet: Spatial-Channel Parallelism Network for Joint Holistic and Partial Person Re-Identification

Oct 16, 2018
Xing Fan, Hao Luo, Xuan Zhang, Lingxiao He, Chi Zhang, Wei Jiang

Holistic person re-identification (ReID) has received extensive study in the past few years and has achieved impressive progress. However, persons are often occluded by obstacles or other persons in practical scenarios, which makes partial person re-identification non-trivial. In this paper, we propose a spatial-channel parallelism network (SCPNet) in which each channel in the ReID feature pays attention to a given spatial part of the body. The spatial-channel correspondence supervises the network to learn discriminative features for both holistic and partial person re-identification. The single model trained on four holistic ReID datasets achieves competitive accuracy on these four datasets and outperforms the state-of-the-art methods on two partial ReID datasets without training.

* accepted by ACCV 2018 

A Learning-based Framework for Hybrid Depth-from-Defocus and Stereo Matching

Aug 06, 2018
Zhang Chen, Xinqing Guo, Siyuan Li, Xuan Cao, Jingyi Yu

Depth from defocus (DfD) and stereo matching are the two most studied passive depth sensing schemes. The techniques are essentially complementary: DfD can robustly handle repetitive textures that are problematic for stereo matching, whereas stereo matching is insensitive to defocus blur and can handle a large depth range. In this paper, we present a unified learning-based technique to conduct hybrid DfD and stereo matching. Our input is image triplets: a stereo pair and a defocused image of one of the stereo views. We first apply depth-guided light field rendering to construct a comprehensive training dataset for such hybrid sensing setups. Next, we adopt the hourglass network architecture to separately conduct depth inference from DfD and stereo. Finally, we explore different connection methods between the two separate networks to integrate them into a unified solution that produces high-fidelity 3D disparity maps. Comprehensive experiments on real and synthetic data show that our new learning-based hybrid 3D sensing technique can significantly improve accuracy and robustness in 3D reconstruction.


Sparse Photometric 3D Face Reconstruction Guided by Morphable Models

Nov 29, 2017
Xuan Cao, Zhang Chen, Anpei Chen, Xin Chen, Cen Wang, Jingyi Yu

We present a novel 3D face reconstruction technique that leverages sparse photometric stereo (PS) and the latest advances in face registration/modeling from a single image. We observe that the 3D morphable face approach provides a reasonable geometry proxy for light position calibration. Specifically, we develop a robust optimization technique that can calibrate per-pixel lighting direction and illumination at very high precision without assuming uniform surface albedos. Next, we apply semantic segmentation on input images and the geometry proxy to refine hairy vs. bare skin regions using tailored filters. Experiments on synthetic and real data show that by using a very small set of images, our technique is able to reconstruct fine geometric details such as wrinkles, eyebrows, whelks, pores, etc., comparable to and sometimes surpassing movie-quality productions.


Cooperative Hierarchical Dirichlet Processes: Superposition vs. Maximization

Jul 18, 2017
Junyu Xuan, Jie Lu, Guangquan Zhang, Richard Yi Da Xu

The cooperative hierarchical structure is a common and significant data structure observed in, or adopted by, many research areas, such as text mining (author-paper-word) and multi-label classification (label-instance-feature). Renowned Bayesian approaches for cooperative hierarchical structure modeling are mostly based on topic models. However, these approaches suffer from a serious issue in that the number of hidden topics/factors needs to be fixed in advance and an inappropriate number may lead to overfitting or underfitting. One elegant way to resolve this issue is Bayesian nonparametric learning, but existing work in this area still cannot be applied to cooperative hierarchical structure modeling. In this paper, we propose a cooperative hierarchical Dirichlet process (CHDP) to fill this gap. Each node in a cooperative hierarchical structure is assigned a Dirichlet process to model its weights on the infinite hidden factors/topics. Together with measure inheritance from the hierarchical Dirichlet process, two kinds of measure cooperation, i.e., superposition and maximization, are defined to capture the many-to-many relationships in the cooperative hierarchical structure. Furthermore, two constructive representations for CHDP, i.e., stick-breaking and international restaurant process, are designed to facilitate model inference. Experiments on synthetic and real-world data with cooperative hierarchical structures demonstrate the properties and the ability of CHDP for cooperative hierarchical structure modeling and its potential for practical application scenarios.

