Research papers and code for "Michael Zhang":
Graph convolutional neural networks (Graph-CNNs) extend traditional CNNs to handle data that is supported on a graph. The major challenges in working with data on graphs are that the support set (the vertices of the graph) does not typically have a natural ordering, and, in general, the topology of the graph is not regular (i.e., vertices do not all have the same number of neighbors). Graph-CNNs therefore have great potential for dealing with 3D point cloud data obtained by sampling a manifold. In this paper, we develop a Graph-CNN for classifying 3D point cloud data, called PointGCN. The architecture combines localized graph convolutions with two types of graph downsampling operations (also known as pooling). By effectively exploring the local structure of the point cloud with the Graph-CNN, the proposed architecture achieves competitive performance on the 3D object classification benchmark ModelNet, and it is more stable than competing schemes.
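
A minimal NumPy sketch of the kind of localized graph convolution such an architecture builds on: Chebyshev polynomial filters of the graph Laplacian applied to features on a kNN graph over the points. The graph construction, filter order K, and channel sizes below are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: Chebyshev-filtered graph convolution on a kNN graph of a point cloud.
import numpy as np

def knn_adjacency(points, k=8):
    """Symmetric kNN adjacency with Gaussian edge weights from 3D coordinates."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    idx = np.argsort(d2, axis=1)[:, :k]
    n = len(points)
    A = np.zeros((n, n))
    A[np.arange(n)[:, None], idx] = np.exp(-d2[np.arange(n)[:, None], idx])
    return np.maximum(A, A.T)                       # symmetrize

def chebyshev_conv(X, A, W):
    """Filter node features X (n x f_in) with K Chebyshev taps W (K x f_in x f_out)."""
    n = len(A)
    d = 1.0 / np.sqrt(np.maximum(A.sum(1), 1e-12))
    L = np.eye(n) - d[:, None] * A * d[None, :]     # normalized graph Laplacian
    L_hat = L - np.eye(n)                           # rescaled, assuming lmax ~ 2
    T_prev, T_cur = X, L_hat @ X                    # T_0 X and T_1 X
    out = T_prev @ W[0] + T_cur @ W[1]
    for k in range(2, len(W)):
        T_prev, T_cur = T_cur, 2 * L_hat @ T_cur - T_prev   # Chebyshev recurrence
        out = out + T_cur @ W[k]
    return out

pts = np.random.randn(1024, 3)                      # a toy point cloud
W = np.random.randn(3, 3, 16) * 0.1                 # K=3 taps, 3 -> 16 channels
H = np.maximum(chebyshev_conv(pts, knn_adjacency(pts), W), 0)   # ReLU activation
print(H.shape)                                      # (1024, 16)
```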

* Published as a conference paper at ICASSP 2018

The paper presents a knowledge representation language $\mathcal{A}log$ which extends ASP with aggregates. The goal is a language with a simple syntax and a clear intuitive and mathematical semantics. We give some properties of $\mathcal{A}log$, an algorithm for computing its answer sets, and a comparison with other approaches.

* arXiv admin note: text overlap with arXiv:1405.3637

The paper continues the investigation of Poincaré and Russell's Vicious Circle Principle (VCP) in the context of the design of logic programming languages with sets. We expand the previously introduced language Alog with aggregates by allowing infinite sets and several additional set-related constructs useful for knowledge representation and teaching. In addition, we propose an alternative formalization of the original VCP and incorporate it into the semantics of the new language, Slog+, which allows a more liberal construction of sets and their use in programming rules. We show that, for programs without disjunction and infinite sets, the formal semantics of aggregates in Slog+ coincides with that of several other known languages. Their intuitive and formal semantics, however, are based on quite different ideas and seem to be more involved than those of Slog+.

* Paper presented at the 9th Workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2016), New York City, USA, 16 October 2016

Stochastic algorithms are efficient approaches to solving machine learning and optimization problems. In this paper, we propose a general framework called Splash for parallelizing stochastic algorithms on multi-node distributed systems. Splash consists of a programming interface and an execution engine. Using the programming interface, the user develops sequential stochastic algorithms without worrying about any details of distributed computing. The algorithm is then automatically parallelized by a communication-efficient execution engine. We provide theoretical justification of the optimal rate of convergence for parallelizing stochastic gradient descent. Splash is built on top of Apache Spark. Real-data experiments on logistic regression, collaborative filtering, and topic modeling verify that Splash yields order-of-magnitude speedups over single-thread stochastic algorithms and over state-of-the-art implementations on Spark.
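
As a rough illustration of the execution model, here is a simplified stand-in: each worker runs sequential SGD on its own shard of the data and the iterates are periodically averaged. This captures the communication-efficient spirit but is not the Splash engine itself; the logistic-regression task and all constants are assumptions.

```python
# Sketch: rounds of local sequential SGD followed by iterate averaging.
import numpy as np

def local_sgd(w, X, y, lr=0.1):
    """One pass of sequential SGD for logistic regression on one shard."""
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))
        w = w - lr * (p - yi) * xi
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 5))
w_true = rng.normal(size=5)
y = (X @ w_true + 0.1 * rng.normal(size=4000) > 0).astype(float)

n_workers, w = 8, np.zeros(5)
shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))
for round_ in range(5):
    # the list comprehension below would run in parallel on a real cluster
    locals_ = [local_sgd(w.copy(), Xs, ys) for Xs, ys in shards]
    w = np.mean(locals_, axis=0)          # communication step: average iterates
print(np.round(w / np.linalg.norm(w), 2), np.round(w_true / np.linalg.norm(w_true), 2))
```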

* redo experiments to learn bigger models; compare Splash with state-of-the-art implementations on Spark

As a contribution to the challenge of building game-playing AI systems, we develop and analyse a formal language for representing and reasoning about strategies. Our logical language builds on the existing general Game Description Language (GDL) and extends it by a standard modality for linear time along with two dual connectives to express preferences when combining strategies. The semantics of the language is provided by a standard state-transition model. As such, problems that require reasoning about games can be solved by the standard methods for reasoning about actions and change. We also endow the language with a specific semantics by which strategy formulas are understood as move recommendations for a player. To illustrate how our formalism supports automated reasoning about strategies, we demonstrate two example methods of implementation: first, we formalise the semantic interpretation of our language in conjunction with game rules and strategy rules in the Situation Calculus; second, we show how the reasoning problem can be solved with Answer Set Programming.

We show that the multi-class support vector machine (MSVM) proposed by Lee et al. (2004) can be viewed as a MAP estimation procedure under an appropriate probabilistic interpretation of the classifier. We also show that this interpretation can be extended to a hierarchical Bayesian architecture and to a fully Bayesian inference procedure for multi-class classification based on data augmentation. We present empirical results showing that the advantages of the Bayesian formalism are obtained without a loss in classification accuracy.
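
To make the MAP reading concrete, the sketch below writes a regularized multiclass SVM objective as a negative log posterior: the hinge-type term plays the role of a negative log pseudo-likelihood and the L2 penalty that of a Gaussian prior on the weights. The specific multiclass hinge used here is a simplified stand-in, not the exact Lee et al. (2004) loss.

```python
# Sketch: regularized multiclass SVM objective read as a negative log posterior.
import numpy as np

def map_objective(W, X, Y, lam=0.1):
    """Simplified multiclass hinge loss + Gaussian prior = negative log posterior."""
    scores = X @ W                                   # (n, C) class scores
    correct = scores[np.arange(len(Y)), Y]
    margins = np.maximum(0, 1 + scores - correct[:, None])
    margins[np.arange(len(Y)), Y] = 0
    nll = margins.sum()                              # -log pseudo-likelihood (hinge)
    neg_log_prior = 0.5 * lam * (W ** 2).sum()       # Gaussian prior <-> L2 penalty
    return nll + neg_log_prior

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(20, 4)), rng.integers(0, 3, 20)
W = rng.normal(size=(4, 3)) * 0.1
print(round(map_objective(W, X, Y), 2))              # the quantity a MAP fit minimizes
```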

* Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their local value function estimates independently. Then, we introduce an additional consensus step to let all the agents asymptotically achieve agreement on the global optimal policy function. A convergence analysis of the proposed algorithm is provided, and its effectiveness is validated on a distributed resource allocation example. Compared to related distributed actor-critic methods, the agents here do not share information about their local tasks; instead, they coordinate to estimate the global policy function.
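
A toy sketch of the consensus step: each agent applies its own gradient estimate to a local copy of the policy parameter, then averages with its neighbors through a doubly stochastic mixing matrix. The gradient oracle and the ring topology are illustrative assumptions, not the paper's setup.

```python
# Sketch: local gradient updates followed by neighbor averaging (consensus).
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 4, 3
theta = [rng.normal(size=dim) for _ in range(n_agents)]
Wmix = np.array([[.50, .25, .00, .25],
                 [.25, .50, .25, .00],
                 [.00, .25, .50, .25],
                 [.25, .00, .25, .50]])   # doubly stochastic mixing matrix (ring of 4)

def local_policy_gradient(i, th):
    """Stand-in for agent i's off-policy actor-critic gradient estimate."""
    return -(th - i)                      # pulls toward an agent-specific target (toy)

for t in range(200):
    lr = 1.0 / (t + 10)                   # diminishing step size
    updated = [th + lr * local_policy_gradient(i, th) for i, th in enumerate(theta)]
    # consensus step: each agent mixes its copy with its neighbors' copies
    theta = [sum(Wmix[i, j] * updated[j] for j in range(n_agents)) for i in range(n_agents)]
print([np.round(th, 2) for th in theta])  # all local copies have (nearly) agreed
```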

Training Gaussian process-based models typically involves an $O(N^3)$ computational bottleneck due to inverting the covariance matrix. Popular methods for overcoming this matrix inversion problem cannot adequately model all types of latent functions and are often not parallelizable. However, a judicious choice of model structure can ameliorate this problem. A mixture-of-experts model that uses a mixture of $K$ Gaussian processes offers modeling flexibility and opportunities for scalable inference. Our embarrassingly parallel algorithm combines low-dimensional matrix inversions with importance sampling to yield a flexible, scalable mixture-of-experts model that offers performance comparable to Gaussian process regression at a much lower computational cost.
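
The computational point can be made concrete with a hard-gated toy version: each of K experts inverts only its own small covariance matrix, and the K solves are independent, hence embarrassingly parallel. The nearest-center gating below replaces the paper's importance-sampling inference and is purely illustrative.

```python
# Sketch: K independent small GP regressions instead of one O(N^3) solve.
import numpy as np

def rbf(A, B, ell=0.5):
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * ell ** 2))

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, 600))
y = np.sin(2 * x) + 0.1 * rng.normal(size=600)

K = 4
centers = np.quantile(x, (np.arange(K) + 0.5) / K)
assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)   # hard gating

def expert_predict(k, x_star):
    """Each expert's solve costs O((N/K)^3); the K solves can run in parallel."""
    xk, yk = x[assign == k], y[assign == k]
    Kxx = rbf(xk, xk) + 0.01 * np.eye(len(xk))
    alpha = np.linalg.solve(Kxx, yk)
    return rbf(x_star, xk) @ alpha

x_star = np.array([-2.0, 0.0, 2.0])
k_star = np.argmin(np.abs(x_star[:, None] - centers[None, :]), axis=1)
pred = np.array([expert_predict(k, np.array([xs]))[0] for k, xs in zip(k_star, x_star)])
print(np.round(pred, 2), np.round(np.sin(2 * x_star), 2))   # close to the truth
```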

Recent advances in deep learning have achieved impressive gains in classification accuracy on a variety of types of data, including images and text. Despite these gains, however, concerns have been raised about the calibration, robustness, and interpretability of these models. In this paper we propose a simple way to modify any conventional deep architecture to automatically provide more transparent explanations for classification decisions, as well as an intuitive notion of the credibility of each prediction. Specifically, we draw on ideas from nonparametric kernel regression, and propose to predict labels based on a weighted sum of training instances, where the weights are determined by distance in a learned instance-embedding space. Working within the framework of conformal methods, we propose a new measure of nonconformity suggested by our model, and experimentally validate the accompanying theoretical expectations, demonstrating improved transparency, controlled error rates, and robustness to out-of-domain data, without compromising on accuracy or calibration.
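
A minimal sketch of the prediction rule described above, assuming a Gaussian kernel on distances in the embedding space: labels come from a weighted sum over training instances, and a natural nonconformity score for a candidate label is the fraction of weight mass on the other classes. The identity embedding stands in for a learned network.

```python
# Sketch: kernel-weighted instance-based prediction with a conformal score.
import numpy as np

def embed(X):
    return X    # stand-in for a learned instance-embedding network

def class_weights(x, X_tr, y_tr, n_classes, gamma=4.0):
    d2 = ((embed(X_tr) - embed(x)) ** 2).sum(1)
    w = np.exp(-gamma * d2)                          # weights by embedding distance
    return np.array([w[y_tr == c].sum() for c in range(n_classes)])

def nonconformity(x, y, X_tr, y_tr, n_classes):
    w = class_weights(x, X_tr, y_tr, n_classes)
    return 1.0 - w[y] / w.sum()                      # mass assigned to other classes

rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(-1, .5, (50, 2)), rng.normal(1, .5, (50, 2))])
y_tr = np.repeat([0, 1], 50)
x = np.array([0.9, 1.1])                             # a test point near class 1
w = class_weights(x, X_tr, y_tr, 2)
print(w.argmax(), np.round(nonconformity(x, 1, X_tr, y_tr, 2), 3))
```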

* 13 pages, 8 figures, 5 tables, added DOI and updated to meet ACM formatting requirements, In Proceedings of FAT* (2019)

Transfer learning allows practitioners to recognize and apply knowledge learned in previous tasks (the source task) to new tasks or new domains (the target task) that share some commonality. Two important factors affecting the performance of transfer learning models are: (a) the size of the target dataset, and (b) the similarity in distribution between the source and target domains. Thus far, there has been little investigation into just how important these factors are. In this paper, we investigate the impact of target dataset size and source/target domain similarity on model performance through a series of experiments. We find that more data is always beneficial, and that model performance improves linearly with the log of the data size, until we run out of data. As the source and target domains diverge, more data is required, and fine-tuning yields better performance than feature extraction. When the source and target domains are similar and the data size is small, fine-tuning and feature extraction yield equivalent performance. Our hope is that by beginning this quantitative investigation of the effect of data volume and domain similarity in transfer learning, we might inspire others to explore the significance of data in developing more accurate statistical models.
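
In PyTorch terms, the two regimes compared above look roughly like this; the backbone, head, and class count are illustrative choices, not the paper's experimental setup.

```python
# Sketch: feature extraction (frozen backbone) vs. fine-tuning (all weights trainable).
import torch.nn as nn
import torchvision.models as models

def build(num_classes, fine_tune):
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # source-task weights
    if not fine_tune:
        for p in net.parameters():
            p.requires_grad = False         # feature extraction: freeze the backbone
    net.fc = nn.Linear(net.fc.in_features, num_classes)  # fresh target-task head
    return net

feature_extractor = build(num_classes=10, fine_tune=False)  # trains only the head
fine_tuned = build(num_classes=10, fine_tune=True)          # trains everything
```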

* 9 pages

Inference in latent feature models in the Bayesian nonparametric setting is generally difficult, especially in high-dimensional settings, because it usually requires proposing features from some prior distribution. In special cases, where the integration is tractable, we can sample feature assignments according to a predictive likelihood. However, this still may not be efficient in high dimensions. We present a novel method to accelerate the mixing of latent variable model inference by proposing feature locations from the data, as opposed to from the prior. This sampling method mixes efficiently, is computationally attractive because it can be performed in parallel, and is theoretically guaranteed to have the posterior as its limiting distribution.
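
A toy Metropolis-Hastings step illustrating the idea for a single feature location: propose near a randomly chosen data point rather than from the broad prior, and correct through the acceptance ratio so the posterior remains the limiting distribution. The one-dimensional Gaussian model is an assumed stand-in for the latent feature models discussed above.

```python
# Sketch: independence MH with a data-driven proposal instead of the prior.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(3.0, 0.5, size=40)          # observations tied to this feature

def log_prior(a):  return -0.5 * a ** 2 / 100.0              # broad N(0, 10^2) prior
def log_lik(a):    return -0.5 * ((data - a) ** 2).sum() / 0.25
def log_q(a):      # proposal density: mixture of Gaussian kernels at the data points
    return np.log(np.mean(np.exp(-0.5 * (data - a) ** 2 / 0.25)
                          / np.sqrt(2 * np.pi * 0.25)))

a = 0.0                                        # current feature location
for _ in range(500):
    a_new = rng.choice(data) + 0.5 * rng.normal()            # propose from the data
    log_alpha = (log_lik(a_new) + log_prior(a_new) + log_q(a)
                 - log_lik(a) - log_prior(a) - log_q(a_new)) # MH correction
    if np.log(rng.uniform()) < log_alpha:
        a = a_new
print(round(a, 2))                             # near the data mean, ~3.0
```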

This is a preliminary report on work aimed at making CR-Prolog -- a version of ASP with consistency-restoring rules -- more suitable for use in teaching and in large applications. First, we describe a sorted version of CR-Prolog called SPARC. Second, we translate a basic version of CR-Prolog into the language of DLV and compare its performance with the state-of-the-art CR-Prolog solver. The results form the foundation for a future, more efficient and user-friendly implementation of SPARC, and shed some light on the relationship between two useful knowledge representation constructs: consistency-restoring rules and the weak constraints of DLV.

* Proceedings of Answer Set Programming and Other Computing Paradigms (ASPOCP 2012), 5th International Workshop, September 4, 2012, Budapest, Hungary

Support vector machines (SVMs) naturally embody sparseness due to their use of hinge loss functions. However, SVMs cannot directly estimate conditional class probabilities. In this paper we propose and study a family of coherence functions, which are convex and differentiable, as surrogates for the hinge function. The coherence function is derived using the maximum-entropy principle and is characterized by a temperature parameter. It bridges the hinge function and the logit function of logistic regression: the limit of the coherence function at zero temperature is the hinge function, and the limit of the minimizer of its expected error is the minimizer of the expected error of the hinge loss. We refer to the use of the coherence function in large-margin classification as C-learning, and we present efficient coordinate descent algorithms for training regularized C-learning models.
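
The abstract does not reproduce the function's closed form, but a softplus-type smoothing of the hinge with temperature T has exactly the stated properties (convex, differentiable, hinge in the T -> 0 limit, logit-like at T = 1), and is sketched here under that assumption: C_T(z) = T log(1 + exp((1 - z)/T)).

```python
# Sketch: a temperature-indexed smooth surrogate that recovers the hinge as T -> 0.
import numpy as np

def coherence(z, T):
    """Numerically stable T * log(1 + exp((1 - z) / T))."""
    return T * np.logaddexp(0.0, (1.0 - z) / T)

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

z = np.linspace(-2, 3, 6)
for T in (1.0, 0.1, 0.01):
    print(T, np.round(coherence(z, T) - hinge(z), 3))   # gap shrinks as T -> 0
```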

* 28 pages

Human activity recognition has gained importance in recent years due to its applications in various fields such as health, security and surveillance, entertainment, and intelligent environments. A significant amount of work has been done on human activity recognition, and researchers have leveraged different approaches, such as wearable, object-tagged, and device-free, to recognize human activities. In this article, we present a comprehensive survey of the work conducted over the period 2010-2018 in various areas of human activity recognition, with a main focus on device-free solutions. The device-free approach is becoming very popular because the subject is not required to carry anything; instead, the environment is tagged with devices to capture the required information. We propose a new taxonomy for categorizing the research work conducted in the field of activity recognition and divide the existing literature into three sub-areas: action-based, motion-based, and interaction-based. We further divide these areas into ten different sub-topics and present the latest research work in these sub-topics. Unlike previous surveys, which focus on only one type of activity, we cover, to the best of our knowledge, all the sub-areas of activity recognition and provide a comparison of the latest research work in these sub-areas. Specifically, we discuss the key attributes and design approaches of the work presented. Then we provide an extensive analysis based on 10 important metrics to give the reader a complete overview of the state-of-the-art techniques and trends in the different sub-areas of human activity recognition. Finally, we discuss open research issues and provide future research directions in the field of human activity recognition.

* 28

In many applications, observed data are influenced by some combination of latent causes. For example, suppose sensors are placed inside a building to record responses such as temperature, humidity, power consumption and noise levels. These random, observed responses are typically affected by many unobserved, latent factors (or features) within the building such as the number of individuals, the turning on and off of electrical devices, power surges, etc. These latent factors are usually present for a contiguous period of time before disappearing; further, multiple factors could be present at a time. This paper develops new probabilistic methodology and inference methods for random object generation influenced by latent features exhibiting temporal persistence. Every datum is associated with subsets of a potentially infinite number of hidden, persistent features that account for temporal dynamics in an observation. The ensuing class of dynamic models constructed by adapting the Indian Buffet Process --- a probability measure on the space of random, unbounded binary matrices --- finds use in a variety of applications arising in operations, signal processing, biomedicine, marketing, image analysis, etc. Illustrations using synthetic and real data are provided.
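
For reference, here is a compact sampler for the standard Indian Buffet Process that the paper adapts: customer n takes an existing dish k with probability m_k/n and then tries Poisson(alpha/n) new dishes. The temporal-persistence extension itself is not reproduced here.

```python
# Sketch: generative sampler for the standard Indian Buffet Process.
import numpy as np

def sample_ibp(n_customers, alpha, rng):
    dishes = []                                   # per-dish popularity counts m_k
    rows = []
    for n in range(1, n_customers + 1):
        row = [int(rng.uniform() < m / n) for m in dishes]   # take dish k w.p. m_k / n
        for k, taken in enumerate(row):
            dishes[k] += taken
        new = rng.poisson(alpha / n)              # brand-new features for this customer
        dishes += [1] * new
        row += [1] * new
        rows.append(row)
    Z = np.zeros((n_customers, len(dishes)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(10, alpha=2.0, rng=np.random.default_rng(0))
print(Z)   # random, unbounded binary matrix: rows = data, columns = latent features
```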

The last decade has witnessed an explosion in the development of models, theory, and computational algorithms for "big data" analysis. In particular, distributed computing has served as a natural and dominant paradigm for statistical inference. However, the existing literature on parallel inference almost exclusively focuses on Euclidean data and parameters. While this assumption is valid for many applications, it is increasingly common to encounter problems where the data or the parameters lie on a non-Euclidean space, such as a manifold. Our work aims to fill a critical gap in the literature by generalizing parallel inference algorithms to optimization on manifolds. We show that our proposed algorithm is both communication efficient and carries theoretical convergence guarantees. In addition, we demonstrate the performance of our algorithm on the estimation of Fréchet means on simulated spherical data and on the low-rank matrix completion problem over Grassmann manifolds applied to the Netflix prize data set.
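
A simplified stand-in for the spherical example: each machine computes the Fréchet mean of its shard by intrinsic gradient descent with the sphere's exp/log maps, and the local estimates are combined by one more Fréchet-mean step. This shows the geometry involved, not the paper's exact communication scheme.

```python
# Sketch: per-machine Frechet means on the sphere, then one combining step.
import numpy as np

def log_map(p, q):
    """Tangent vector at p pointing along the geodesic toward q."""
    d = q - (p @ q) * p
    nd = np.linalg.norm(d)
    return np.zeros_like(p) if nd < 1e-12 else np.arccos(np.clip(p @ q, -1, 1)) * d / nd

def exp_map(p, v):
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def frechet_mean(points, iters=50):
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        mu = exp_map(mu, np.mean([log_map(mu, q) for q in points], axis=0))
    return mu

rng = np.random.default_rng(0)
true = np.array([0.0, 0.0, 1.0])
data = [x / np.linalg.norm(x)
        for x in (true + 0.2 * rng.normal(size=(400, 3)))]
local = [frechet_mean(data[i::4]) for i in range(4)]   # 4 machines, disjoint shards
print(np.round(frechet_mean(local), 3))                # close to the north pole
```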

* 15 pages

Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another challenge one may encounter is the presence of outliers and contamination that damage the inference quality. The parallel "divide and conquer" model selection strategy divides the observations of the full data set into roughly equal subsets and performs inference and model selection independently on each subset. After local subset inference, the method aggregates the posterior model probabilities or other model/variable selection criteria to obtain a final model using the geometric median. This approach leads to improved concentration in finding the "correct" model and model parameters and is provably robust to outliers and data contamination.
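
The aggregation step can be sketched with the classical Weiszfeld iteration for the geometric median; its robustness is what lets a few contaminated subsets be outvoted. The toy numbers below are illustrative.

```python
# Sketch: aggregating subset-level estimates by their geometric median (Weiszfeld).
import numpy as np

def geometric_median(points, iters=100, eps=1e-8):
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        w = 1.0 / np.maximum(d, eps)                  # inverse-distance weights
        y = (w[:, None] * points).sum(0) / w.sum()
    return y

rng = np.random.default_rng(0)
subset_estimates = rng.normal(0.0, 0.05, size=(20, 3))   # 20 subset-level summaries
subset_estimates[:3] += 10.0                              # 3 contaminated subsets
print(np.round(subset_estimates.mean(0), 2))              # the mean is dragged away
print(np.round(geometric_median(subset_estimates), 2))    # the median stays near 0
```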

* Computational Statistics & Data Analysis, Volume 127, 2018, Pages 229-247, ISSN 0167-9473

This paper addresses active state estimation with a team of robotic sensors. The states to be estimated are represented by spatially distributed, uncorrelated, stationary vectors. Given a prior belief on the geographic locations of the states, we cluster the states into moderately sized groups and propose a new hierarchical Dynamic Programming (DP) framework that computes optimal sensing policies for each cluster, mitigating the computational cost of planning optimal policies in the combined belief space. Then, we develop a decentralized assignment algorithm that dynamically allocates clusters to robots based on the pre-computed optimal policies for each cluster. The integrated distributed state estimation framework is optimal at the cluster level and also scales very well to large numbers of states and robot sensors. We demonstrate the efficiency of the proposed method in both simulations and real-world experiments using stereoscopic vision sensors.
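
A heavily simplified, hypothetical sketch of the allocation layer only: given per-cluster values V[c] pre-computed by the DP stage, robots are assigned to distinct clusters greedily by value minus travel cost. The paper's assignment algorithm is decentralized and dynamic; this centralized greedy loop merely illustrates the interface between the two stages.

```python
# Sketch: greedy allocation of clusters to robots using pre-computed cluster values.
import numpy as np

rng = np.random.default_rng(0)
n_robots, n_clusters = 3, 6
V = rng.uniform(1, 5, n_clusters)              # cluster values from per-cluster DP policies
travel = rng.uniform(0, 2, (n_robots, n_clusters))   # robot-to-cluster travel costs

score = V[None, :] - travel                    # expected value discounted by travel
assignment = {}
for _ in range(n_robots):
    r, c = np.unravel_index(np.argmax(score), score.shape)
    assignment[int(r)] = int(c)
    score[r, :] = -np.inf                      # robot r is now busy
    score[:, c] = -np.inf                      # cluster c is now taken
print(assignment)
```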

* IEEE Transactions on Control of Network Systems, December 2017

Statistical analysis (SA) is a complex process for deducing population properties from data. It usually takes a well-trained analyst to perform SA successfully, and applying SA to big data applications is extremely challenging. We propose to use deep neural networks to automate the SA process. In particular, we propose to construct convolutional neural networks (CNNs) to perform automatic model selection and parameter estimation, two of the most important SA tasks. We refer to the resulting CNNs as the neural model selector and the neural model estimator, respectively, which can be trained using labeled data systematically generated from candidate models. A simulation study shows that both the selector and the estimator demonstrate excellent performance. The idea and proposed framework can be extended to automate the entire SA process and have the potential to revolutionize how SA is performed in big data analytics.
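
A minimal sketch of the labeled-data pipeline the abstract describes: draw parameters for each candidate model, simulate a sample, and label it with the index of the generating model, producing (data, model) pairs on which a selector network can be trained. The two toy candidate models are assumptions for illustration.

```python
# Sketch: systematically generating labeled training data from candidate models.
import numpy as np

rng = np.random.default_rng(0)

def simulate(model, n=100):
    if model == 0:                       # candidate 0: Gaussian with random parameters
        return rng.normal(rng.uniform(-2, 2), rng.uniform(0.5, 2), n)
    else:                                # candidate 1: exponential with random scale
        return rng.exponential(rng.uniform(0.5, 2), n)

X, y = [], []
for _ in range(1000):
    m = int(rng.integers(0, 2))          # pick a candidate model
    X.append(np.sort(simulate(m)))       # sorted sample as a fixed-length input
    y.append(m)                          # label = index of the generating model
X, y = np.array(X), np.array(y)
print(X.shape, y.shape)                  # training pairs for the neural model selector
```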
