Models, code, and papers for "Rui Zhu":

Statistical Characteristics of Driver Accelerating Behavior and Its Probability Model

Jul 03, 2019
Rui Liu, Xichan Zhu

The naturalistic driving data are employed to study the accelerating behavior of the driver. Firstly, the question that whether the database is big enough to achieve a convergent accelerating behavior of the driver is studied. The kernel density estimation is applied to estimate the distributions of the accelerations. The Kullback-Liebler divergence is employed to evaluate the distinction between datasets composed of different quantity of data. The results show that a convergent accelerating behavior of the driver can be obtained by using the database in this study. Secondly, the bivariate accelerating behavior is proposed. It is shown that the bivariate distribution between longitudinal acceleration and lateral acceleration follows the dual triangle distribution pattern. Two bivariate distribution models are proposed to explain this phenomenon, i.e. the bivariate Normal distribution model (BNDM) and the bivariate Pareto distribution model (BPDM). The univariate acceleration behavior is presented to examine which model is better. It is identified that the marginal distribution and conditional distribution of the accelerations approximately follow the univariate Pareto distribution. Hence, the BPDM is a more appropriate one to describe the bivariate accelerating behavior of the driver. This reveals that the bivariate distribution pattern will never reach a circle-shaped region.

* 8 pages, 14 figure 

  Click for Model/Code and Paper
A Model Parallel Proximal Stochastic Gradient Algorithm for Partially Asynchronous Systems

Oct 19, 2018
Rui Zhu, Di Niu

Large models are prevalent in modern machine learning scenarios, including deep learning, recommender systems, etc., which can have millions or even billions of parameters. Parallel algorithms have become an essential solution technique to many large-scale machine learning jobs. In this paper, we propose a model parallel proximal stochastic gradient algorithm, AsyB-ProxSGD, to deal with large models using model parallel blockwise updates while in the meantime handling a large amount of training data using proximal stochastic gradient descent (ProxSGD). In our algorithm, worker nodes communicate with the parameter servers asynchronously, and each worker performs proximal stochastic gradient for only one block of model parameters during each iteration. Our proposed algorithm generalizes ProxSGD to the asynchronous and model parallel setting. We prove that AsyB-ProxSGD achieves a convergence rate of $O(1/\sqrt{K})$ to stationary points for nonconvex problems under \emph{constant} minibatch sizes, where $K$ is the total number of block updates. This rate matches the best-known rates of convergence for a wide range of gradient-like algorithms. Furthermore, we show that when the number of workers is bounded by $O(K^{1/4})$, we can expect AsyB-ProxSGD to achieve linear speedup as the number of workers increases. We implement the proposed algorithm on MXNet and demonstrate its convergence behavior and near-linear speedup on a real-world dataset involving both a large model size and large amounts of data.

* arXiv admin note: substantial text overlap with arXiv:1802.08880 

  Click for Model/Code and Paper
Consensus-Based Transfer Linear Support Vector Machines for Decentralized Multi-Task Multi-Agent Learning

Mar 27, 2018
Rui Zhang, Quanyan Zhu

Transfer learning has been developed to improve the performances of different but related tasks in machine learning. However, such processes become less efficient with the increase of the size of training data and the number of tasks. Moreover, privacy can be violated as some tasks may contain sensitive and private data, which are communicated between nodes and tasks. We propose a consensus-based distributed transfer learning framework, where several tasks aim to find the best linear support vector machine (SVM) classifiers in a distributed network. With alternating direction method of multipliers, tasks can achieve better classification accuracies more efficiently and privately, as each node and each task train with their own data, and only decision variables are transferred between different tasks and nodes. Numerical experiments on MNIST datasets show that the knowledge transferred from the source tasks can be used to decrease the risks of the target tasks that lack training data or have unbalanced training labels. We show that the risks of the target tasks in the nodes without the data of the source tasks can also be reduced using the information transferred from the nodes who contain the data of the source tasks. We also show that the target tasks can enter and leave in real-time without rerunning the whole algorithm.

  Click for Model/Code and Paper
A Game-Theoretic Approach to Design Secure and Resilient Distributed Support Vector Machines

Feb 07, 2018
Rui Zhang, Quanyan Zhu

Distributed Support Vector Machines (DSVM) have been developed to solve large-scale classification problems in networked systems with a large number of sensors and control units. However, the systems become more vulnerable as detection and defense are increasingly difficult and expensive. This work aims to develop secure and resilient DSVM algorithms under adversarial environments in which an attacker can manipulate the training data to achieve his objective. We establish a game-theoretic framework to capture the conflicting interests between an adversary and a set of distributed data processing units. The Nash equilibrium of the game allows predicting the outcome of learning algorithms in adversarial environments, and enhancing the resilience of the machine learning through dynamic distributed learning algorithms. We prove that the convergence of the distributed algorithm is guaranteed without assumptions on the training data or network topologies. Numerical experiments are conducted to corroborate the results. We show that network topology plays an important role in the security of DSVM. Networks with fewer nodes and higher average degrees are more secure. Moreover, a balanced network is found to be less vulnerable to attacks.

* arXiv admin note: text overlap with arXiv:1710.04677 

  Click for Model/Code and Paper
Game-Theoretic Design of Secure and Resilient Distributed Support Vector Machines with Adversaries

Oct 12, 2017
Rui Zhang, Quanyan Zhu

With a large number of sensors and control units in networked systems, distributed support vector machines (DSVMs) play a fundamental role in scalable and efficient multi-sensor classification and prediction tasks. However, DSVMs are vulnerable to adversaries who can modify and generate data to deceive the system to misclassification and misprediction. This work aims to design defense strategies for DSVM learner against a potential adversary. We establish a game-theoretic framework to capture the conflicting interests between the DSVM learner and the attacker. The Nash equilibrium of the game allows predicting the outcome of learning algorithms in adversarial environments, and enhancing the resilience of the machine learning through dynamic distributed learning algorithms. We show that the DSVM learner is less vulnerable when he uses a balanced network with fewer nodes and higher degree. We also show that adding more training samples is an efficient defense strategy against an attacker. We present secure and resilient DSVM algorithms with verification method and rejection method, and show their resiliency against adversary with numerical experiments.

  Click for Model/Code and Paper
Vispi: Automatic Visual Perception and Interpretation of Chest X-rays

Jun 12, 2019
Xin Li, Rui Cao, Dongxiao Zhu

Medical imaging contains the essential information for rendering diagnostic and treatment decisions. Inspecting (visual perception) and interpreting image to generate a report are tedious clinical routines for a radiologist where automation is expected to greatly reduce the workload. Despite rapid development of natural image captioning, computer-aided medical image visual perception and interpretation remain a challenging task, largely due to the lack of high-quality annotated image-report pairs and tailor-made generative models for sufficient extraction and exploitation of localized semantic features, particularly those associated with abnormalities. To tackle these challenges, we present Vispi, an automatic medical image interpretation system, which first annotates an image via classifying and localizing common thoracic diseases with visual support and then followed by report generation from an attentive LSTM model. Analyzing an open IU X-ray dataset, we demonstrate a superior performance of Vispi in disease classification, localization and report generation using automatic performance evaluation metrics ROUGE and CIDEr.

  Click for Model/Code and Paper
Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization

Sep 15, 2018
Rui Zhu, Di Niu, Zongpeng Li

We study stochastic algorithms for solving nonconvex optimization problems with a convex yet possibly nonsmooth regularizer, which find wide applications in many practical machine learning applications. However, compared to asynchronous parallel stochastic gradient descent (AsynSGD), an algorithm targeting smooth optimization, the understanding of the behavior of stochastic algorithms for nonsmooth regularized optimization problems is limited, especially when the objective function is nonconvex. To fill this theoretical gap, in this paper, we propose and analyze asynchronous parallel stochastic proximal gradient (Asyn-ProxSGD) methods for nonconvex problems. We establish an ergodic convergence rate of $O(1/\sqrt{K})$ for the proposed Asyn-ProxSGD, where $K$ is the number of updates made on the model, matching the convergence rate currently known for AsynSGD (for smooth problems). To our knowledge, this is the first work that provides convergence rates of asynchronous parallel ProxSGD algorithms for nonconvex problems. Furthermore, our results are also the first to show the convergence of any stochastic proximal methods without assuming an increasing batch size or the use of additional variance reduction techniques. We implement the proposed algorithms on Parameter Server and demonstrate its convergence behavior and near-linear speedup, as the number of workers increases, on two real-world datasets.

  Click for Model/Code and Paper
A Block-wise, Asynchronous and Distributed ADMM Algorithm for General Form Consensus Optimization

Feb 24, 2018
Rui Zhu, Di Niu, Zongpeng Li

Many machine learning models, including those with non-smooth regularizers, can be formulated as consensus optimization problems, which can be solved by the alternating direction method of multipliers (ADMM). Many recent efforts have been made to develop asynchronous distributed ADMM to handle large amounts of training data. However, all existing asynchronous distributed ADMM methods are based on full model updates and require locking all global model parameters to handle concurrency, which essentially serializes the updates from different workers. In this paper, we present a novel block-wise, asynchronous and distributed ADMM algorithm, which allows different blocks of model parameters to be updated in parallel. The lock-free block-wise algorithm may greatly speedup sparse optimization problems, a common scenario in reality, in which most model updates only modify a subset of all decision variables. We theoretically prove the convergence of our proposed algorithm to stationary points for non-convex general form consensus problems with possibly non-smooth regularizers. We implement the proposed ADMM algorithm on the Parameter Server framework and demonstrate its convergence and near-linear speedup performance as the number of workers increases.

  Click for Model/Code and Paper
The Conditional Lucas & Kanade Algorithm

Mar 29, 2016
Chen-Hsuan Lin, Rui Zhu, Simon Lucey

The Lucas & Kanade (LK) algorithm is the method of choice for efficient dense image and object alignment. The approach is efficient as it attempts to model the connection between appearance and geometric displacement through a linear relationship that assumes independence across pixel coordinates. A drawback of the approach, however, is its generative nature. Specifically, its performance is tightly coupled with how well the linear model can synthesize appearance from geometric displacement, even though the alignment task itself is associated with the inverse problem. In this paper, we present a new approach, referred to as the Conditional LK algorithm, which: (i) directly learns linear models that predict geometric displacement as a function of appearance, and (ii) employs a novel strategy for ensuring that the generative pixel independence assumption can still be taken advantage of. We demonstrate that our approach exhibits superior performance to classical generative forms of the LK algorithm. Furthermore, we demonstrate its comparable performance to state-of-the-art methods such as the Supervised Descent Method with substantially less training examples, as well as the unique ability to "swap" geometric warp functions without having to retrain from scratch. Finally, from a theoretical perspective, our approach hints at possible redundancies that exist in current state-of-the-art methods for alignment that could be leveraged in vision systems of the future.

* 17 pages, 11 figures 

  Click for Model/Code and Paper
SFA: Small Faces Attention Face Detector

Dec 20, 2018
Shi Luo, Xiongfei Li, Rui Zhu, Xiaoli Zhang

In recent year, tremendous strides have been made in face detection thanks to deep learning. However, most published face detectors deteriorate dramatically as the faces become smaller. In this paper, we present the Small Faces Attention (SFA) face detector to better detect faces with small scale. First, we propose a new scale-invariant face detection architecture which pays more attention to small faces, including 4-branch detection architecture and small faces sensitive anchor design. Second, feature maps fusion strategy is applied in SFA by partially combining high-level features into low-level features to further improve the ability of finding hard faces. Third, we use multi-scale training and testing strategy to enhance face detection performance in practice. Comprehensive experiments show that SFA significantly improves face detection performance, especially on small faces. Our real-time SFA face detector can run at 5 FPS on a single GPU as well as maintain high performance. Besides, our final SFA face detector achieves state-of-the-art detection performance on challenging face detection benchmarks, including WIDER FACE and FDDB datasets, with competitive runtime speed. Both our code and models will be available to the research community.

* 10 pages, 0 figures, 0 tables, 41 references 

  Click for Model/Code and Paper
Expectile Matrix Factorization for Skewed Data Analysis

Mar 03, 2017
Rui Zhu, Di Niu, Linglong Kong, Zongpeng Li

Matrix factorization is a popular approach to solving matrix estimation problems based on partial observations. Existing matrix factorization is based on least squares and aims to yield a low-rank matrix to interpret the conditional sample means given the observations. However, in many real applications with skewed and extreme data, least squares cannot explain their central tendency or tail distributions, yielding undesired estimates. In this paper, we propose \emph{expectile matrix factorization} by introducing asymmetric least squares, a key concept in expectile regression analysis, into the matrix factorization framework. We propose an efficient algorithm to solve the new problem based on alternating minimization and quadratic programming. We prove that our algorithm converges to a global optimum and exactly recovers the true underlying low-rank matrices when noise is zero. For synthetic data with skewed noise and a real-world dataset containing web service response times, the proposed scheme achieves lower recovery errors than the existing matrix factorization method based on least squares in a wide range of settings.

* 8 page main text with 5 page supplementary documents, published in AAAI 2017 

  Click for Model/Code and Paper
Sentiment Analysis based on User Tag for Traditional Chinese Medicine in Weibo

Oct 13, 2014
Junhui Shen, Peiyan Zhu, Rui Fan, Wei Tan

With the acceptance of Western culture and science, Traditional Chinese Medicine (TCM) has become a controversial issue in China. So, it's important to study the public's sentiment and opinion on TCM. The rapid development of online social network, such as twitter, make it convenient and efficient to sample hundreds of millions of people for the aforementioned sentiment study. To the best of our knowledge, the present work is the first attempt that applies sentiment analysis to the domain of TCM on Sina Weibo (a twitter-like microblogging service in China). In our work, firstly we collect tweets topic about TCM from Sina Weibo, and label the tweets as supporting TCM and opposing TCM automatically based on user tag. Then, a support vector machine classifier has been built to predict the sentiment of TCM tweets without labels. Finally, we present a method to adjust the classifier result. The performance of F-measure attained with our method is 97%.

* 7 pages, 8 figures,3 tables 

  Click for Model/Code and Paper
Collaborative Filtering with Information-Rich and Information-Sparse Entities

Mar 06, 2014
Kai Zhu, Rui Wu, Lei Ying, R. Srikant

In this paper, we consider a popular model for collaborative filtering in recommender systems where some users of a website rate some items, such as movies, and the goal is to recover the ratings of some or all of the unrated items of each user. In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered, and further, we assume that some users rate many items (information-rich users) and some users rate only a few items (information-sparse users). When users (or items) are clustered, our algorithm can recover the rating matrix with $\omega(MK \log M)$ noisy entries while $MK$ entries are necessary, where $K$ is the number of clusters and $M$ is the number of items. In the case of co-clustering, we prove that $K^2$ entries are necessary for recovering the rating matrix, and our algorithm achieves this lower bound within a logarithmic factor when $K$ is sufficiently large. We compare our algorithms with a well-known algorithms called alternating minimization (AM), and a similarity score-based algorithm known as the popularity-among-friends (PAF) algorithm by applying all three to the MovieLens and Netflix data sets. Our co-clustering algorithm and AM have similar overall error rates when recovering the rating matrix, both of which are lower than the error rate under PAF. But more importantly, the error rate of our co-clustering algorithm is significantly lower than AM and PAF in the scenarios of interest in recommender systems: when recommending a few items to each user or when recommending items to users who only rated a few items (these users are the majority of the total user population). The performance difference increases even more when noise is added to the datasets.

  Click for Model/Code and Paper
Constrained Mutual Convex Cone Method for Image Set Based Recognition

Mar 14, 2019
Naoya Sogi, Rui Zhu, Jing-Hao Xue, Kazuhiro Fukui

In this paper, we propose a method for image-set classification based on convex cone models. Image set classification aims to classify a set of images, which were usually obtained from video frames or multi-view cameras, into a target object. To accurately and stably classify a set, it is essential to represent structural information of the set accurately. There are various representative image features, such as histogram based features, HLAC, and Convolutional Neural Network (CNN) features. We should note that most of them have non-negativity and thus can be effectively represented by a convex cone. This leads us to introduce the convex cone representation to image-set classification. To establish a convex cone based framework, we mathematically define multiple angles between two convex cones, and then define the geometric similarity between the cones using the angles. Moreover, to enhance the framework, we introduce a discriminant space that maximizes the between-class variance (gaps) and minimizes the within-class variance of the projected convex cones onto the discriminant space, similar to the Fisher discriminant analysis. Finally, the classification is performed based on the similarity between projected convex cones. The effectiveness of the proposed method is demonstrated experimentally by using five databases: CMU PIE dataset, ETH-80, CMU Motion of Body dataset, Youtube Celebrity dataset, and a private database of multi-view hand shapes.

* arXiv admin note: substantial text overlap with arXiv:1805.12467 

  Click for Model/Code and Paper
Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

May 28, 2018
Rui Luo, Yaodong Yang, Jianhong Wang, Zhanxing Zhu, Jun Wang

In this paper, we propose a novel sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for the purpose of multimodal Bayesian learning. It simulates a noisy dynamical system by incorporating both a continuously-varying tempering variable and the Nos\'e-Hoover thermostats. A significant benefit is that it is not only able to efficiently generate i.i.d. samples when the underlying posterior distributions are multimodal, but also capable of adaptively neutralising the noise arising from the use of mini-batches. While the properties of the approach have been studied using synthetic datasets, our experiments on three real datasets have also shown its performance gains over several strong baselines for Bayesian learning with various types of neural networks plunged in.

  Click for Model/Code and Paper
Learning Depth from Monocular Videos using Direct Methods

Dec 01, 2017
Chaoyang Wang, Jose Miguel Buenaposada, Rui Zhu, Simon Lucey

The ability to predict depth from a single image - using recent advances in CNNs - is of increasing interest to the vision community. Unsupervised strategies to learning are particularly appealing as they can utilize much larger and varied monocular video datasets during learning without the need for ground truth depth or stereo. In previous works, separate pose and depth CNN predictors had to be determined such that their joint outputs minimized the photometric error. Inspired by recent advances in direct visual odometry (DVO), we argue that the depth CNN predictor can be learned without a pose CNN predictor. Further, we demonstrate empirically that incorporation of a differentiable implementation of DVO, along with a novel depth normalization strategy - substantially improves performance over state of the art that use monocular videos for training.

  Click for Model/Code and Paper
Rethinking Reprojection: Closing the Loop for Pose-aware ShapeReconstruction from a Single Image

Jul 26, 2017
Rui Zhu, Hamed Kiani Galoogahi, Chaoyang Wang, Simon Lucey

An emerging problem in computer vision is the reconstruction of 3D shape and pose of an object from a single image. Hitherto, the problem has been addressed through the application of canonical deep learning methods to regress from the image directly to the 3D shape and pose labels. These approaches, however, are problematic from two perspectives. First, they are minimizing the error between 3D shapes and pose labels - with little thought about the nature of this label error when reprojecting the shape back onto the image. Second, they rely on the onerous and ill-posed task of hand labeling natural images with respect to 3D shape and pose. In this paper we define the new task of pose-aware shape reconstruction from a single image, and we advocate that cheaper 2D annotations of objects silhouettes in natural images can be utilized. We design architectures of pose-aware shape reconstruction which re-project the predicted shape back on to the image using the predicted pose. Our evaluation on several object categories demonstrates the superiority of our method for predicting pose-aware 3D shapes from natural images.

* First sub 

  Click for Model/Code and Paper
Sensing Subjective Well-being from Social Media

Aug 28, 2014
Bibo Hao, Lin Li, Rui Gao, Ang Li, Tingshao Zhu

Subjective Well-being(SWB), which refers to how people experience the quality of their lives, is of great use to public policy-makers as well as economic, sociological research, etc. Traditionally, the measurement of SWB relies on time-consuming and costly self-report questionnaires. Nowadays, people are motivated to share their experiences and feelings on social media, so we propose to sense SWB from the vast user generated data on social media. By utilizing 1785 users' social media data with SWB labels, we train machine learning models that are able to "sense" individual SWB from users' social media. Our model, which attains the state-by-art prediction accuracy, can then be used to identify SWB of large population of social media users in time with very low cost.

* 12 pages, 1 figures, 2 tables, 10th International Conference, AMT 2014, Warsaw, Poland, August 11-14, 2014. Proceedings 

  Click for Model/Code and Paper
TransGCN:Coupling Transformation Assumptions with Graph Convolutional Networks for Link Prediction

Oct 01, 2019
Ling Cai, Bo Yan, Gengchen Mai, Krzysztof Janowicz, Rui Zhu

Link prediction is an important and frequently studied task that contributes to an understanding of the structure of knowledge graphs (KGs) in statistical relational learning. Inspired by the success of graph convolutional networks (GCN) in modeling graph data, we propose a unified GCN framework, named TransGCN, to address this task, in which relation and entity embeddings are learned simultaneously. To handle heterogeneous relations in KGs, we introduce a novel way of representing heterogeneous neighborhood by introducing transformation assumptions on the relationship between the subject, the relation, and the object of a triple. Specifically, a relation is treated as a transformation operator transforming a head entity to a tail entity. Both translation assumption in TransE and rotation assumption in RotatE are explored in our framework. Additionally, instead of only learning entity embeddings in the convolution-based encoder while learning relation embeddings in the decoder as done by the state-of-art models, e.g., R-GCN, the TransGCN framework trains relation embeddings and entity embeddings simultaneously during the graph convolution operation, thus having fewer parameters compared with R-GCN. Experiments show that our models outperform the-state-of-arts methods on both FB15K-237 and WN18RR.

  Click for Model/Code and Paper
Learning Target-oriented Dual Attention for Robust RGB-T Tracking

Aug 12, 2019
Rui Yang, Yabin Zhu, Xiao Wang, Chenglong Li, Jin Tang

RGB-Thermal object tracking attempt to locate target object using complementary visual and thermal infrared data. Existing RGB-T trackers fuse different modalities by robust feature representation learning or adaptive modal weighting. However, how to integrate dual attention mechanism for visual tracking is still a subject that has not been studied yet. In this paper, we propose two visual attention mechanisms for robust RGB-T object tracking. Specifically, the local attention is implemented by exploiting the common visual attention of RGB and thermal data to train deep classifiers. We also introduce the global attention, which is a multi-modal target-driven attention estimation network. It can provide global proposals for the classifier together with local proposals extracted from previous tracking result. Extensive experiments on two RGB-T benchmark datasets validated the effectiveness of our proposed algorithm.

* Accepted by IEEE ICIP 2019 

  Click for Model/Code and Paper