Models, code, and papers for "Anderson Y. Zhang":

Iterative Algorithm for Discrete Structure Recovery

Nov 04, 2019
Chao Gao, Anderson Y. Zhang

We propose a general modeling and algorithmic framework for discrete structure recovery that can be applied to a wide range of problems. Under this framework, we are able to study the recovery of clustering labels, ranks of players, and signs of regression coefficients from a unified perspective. A simple iterative algorithm is proposed for discrete structure recovery, which generalizes methods including Lloyd's algorithm and the iterative feature matching algorithm. A linear convergence result for the proposed algorithm is established in this paper under appropriate abstract conditions on stochastic errors and initialization. We illustrate our general theory by applying it on three representative problems: clustering in Gaussian mixture model, approximate ranking, and sign recovery in compressed sensing, and show that minimax rate is achieved in each case.

  Click for Model/Code and Paper
Theoretical and Computational Guarantees of Mean Field Variational Inference for Community Detection

Dec 11, 2017
Anderson Y. Zhang, Harrison H. Zhou

The mean field variational Bayes method is becoming increasingly popular in statistics and machine learning. Its iterative Coordinate Ascent Variational Inference algorithm has been widely applied to large scale Bayesian inference. See Blei et al. (2017) for a recent comprehensive review. Despite the popularity of the mean field method there exist remarkably little fundamental theoretical justifications. To the best of our knowledge, the iterative algorithm has never been investigated for any high dimensional and complex model. In this paper, we study the mean field method for community detection under the Stochastic Block Model. For an iterative Batch Coordinate Ascent Variational Inference algorithm, we show that it has a linear convergence rate and converges to the minimax rate within $\log n$ iterations. This complements the results of Bickel et al. (2013) which studied the global minimum of the mean field variational Bayes and obtained asymptotic normal estimation of global model parameters. In addition, we obtain similar optimality results for Gibbs sampling and an iterative procedure to calculate maximum likelihood estimation, which can be of independent interest.

  Click for Model/Code and Paper
Optimality of Spectral Clustering for Gaussian Mixture Model

Nov 01, 2019
Matthias Löffler, Anderson Y. Zhang, Harrison H. Zhou

Spectral clustering is one of the most popular algorithms to group high dimensional data. It is easy to implement and computationally efficient. Despite its popularity and successful applications, its theoretical properties have not been fully understood. The spectral clustering algorithm is often used as a consistent initializer for more sophisticated clustering algorithms. However, in this paper, we show that spectral clustering is actually already optimal in the Gaussian Mixture Model, when the number of clusters of is fixed and consistent clustering is possible. Contrary to that spectral gap conditions are widely assumed in literature to analyze spectral clustering, these conditions are not needed in this paper to establish its optimality.

  Click for Model/Code and Paper
Community Detection in Degree-Corrected Block Models

Jul 24, 2016
Chao Gao, Zongming Ma, Anderson Y. Zhang, Harrison H. Zhou

Community detection is a central problem of network data analysis. Given a network, the goal of community detection is to partition the network nodes into a small number of clusters, which could often help reveal interesting structures. The present paper studies community detection in Degree-Corrected Block Models (DCBMs). We first derive asymptotic minimax risks of the problem for a misclassification proportion loss under appropriate conditions. The minimax risks are shown to depend on degree-correction parameters, community sizes, and average within and between community connectivities in an intuitive and interpretable way. In addition, we propose a polynomial time algorithm to adaptively perform consistent and even asymptotically optimal community detection in DCBMs.

  Click for Model/Code and Paper
Achieving Optimal Misclassification Proportion in Stochastic Block Model

Oct 03, 2015
Chao Gao, Zongming Ma, Anderson Y. Zhang, Harrison H. Zhou

Community detection is a fundamental statistical problem in network data analysis. Many algorithms have been proposed to tackle this problem. Most of these algorithms are not guaranteed to achieve the statistical optimality of the problem, while procedures that achieve information theoretic limits for general parameter spaces are not computationally tractable. In this paper, we present a computationally feasible two-stage method that achieves optimal statistical performance in misclassification proportion for stochastic block model under weak regularity conditions. Our two-stage procedure consists of a generic refinement step that can take a wide range of weakly consistent community detection procedures as initializer, to which the refinement stage applies and outputs a community assignment achieving optimal misclassification proportion with high probability. The practical effectiveness of the new algorithm is demonstrated by competitive numerical results.

  Click for Model/Code and Paper
Superbloom: Bloom filter meets Transformer

Feb 11, 2020
John Anderson, Qingqing Huang, Walid Krichene, Steffen Rendle, Li Zhang

We extend the idea of word pieces in natural language models to machine learning tasks on opaque ids. This is achieved by applying hash functions to map each id to multiple hash tokens in a much smaller space, similarly to a Bloom filter. We show that by applying a multi-layer Transformer to these Bloom filter digests, we are able to obtain models with high accuracy. They outperform models of a similar size without hashing and, to a large degree, models of a much larger size trained using sampled softmax with the same computational budget. Our key observation is that it is important to use a multi-layer Transformer for Bloom filter digests to remove ambiguity in the hashed input. We believe this provides an alternative method to solving problems with large vocabulary size.

  Click for Model/Code and Paper
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Mar 14, 2018
Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.

* CVPR 2018 full oral, winner of the 2017 Visual Question Answering challenge 

  Click for Model/Code and Paper
Data Science in Service of Performing Arts: Applying Machine Learning to Predicting Audience Preferences

Sep 30, 2016
Jacob Abernethy, Cyrus Anderson, Alex Chojnacki, Chengyu Dai, John Dryden, Eric Schwartz, Wenbo Shen, Jonathan Stroud, Laura Wendlandt, Sheng Yang, Daniel Zhang

Performing arts organizations aim to enrich their communities through the arts. To do this, they strive to match their performance offerings to the taste of those communities. Success relies on understanding audience preference and predicting their behavior. Similar to most e-commerce or digital entertainment firms, arts presenters need to recommend the right performance to the right customer at the right time. As part of the Michigan Data Science Team (MDST), we partnered with the University Musical Society (UMS), a non-profit performing arts presenter housed in the University of Michigan, Ann Arbor. We are providing UMS with analysis and business intelligence, utilizing historical individual-level sales data. We built a recommendation system based on collaborative filtering, gaining insights into the artistic preferences of customers, along with the similarities between performances. To better understand audience behavior, we used statistical methods from customer-base analysis. We characterized customer heterogeneity via segmentation, and we modeled customer cohorts to understand and predict ticket purchasing patterns. Finally, we combined statistical modeling with natural language processing (NLP) to explore the impact of wording in program descriptions. These ongoing efforts provide a platform to launch targeted marketing campaigns, helping UMS carry out its mission by allocating its resources more efficiently. Celebrating its 138th season, UMS is a 2014 recipient of the National Medal of Arts, and it continues to enrich communities by connecting world-renowned artists with diverse audiences, especially students in their formative years. We aim to contribute to that mission through data science and customer analytics.

* Presented at the Data For Good Exchange 2016 

  Click for Model/Code and Paper
Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks

May 17, 2018
Guangzeng Xie, Yitan Wang, Shuchang Zhou, Zhihua Zhang

In this paper we explore acceleration techniques for large scale nonconvex optimization problems with special focuses on deep neural networks. The extrapolation scheme is a classical approach for accelerating stochastic gradient descent for convex optimization, but it does not work well for nonconvex optimization typically. Alternatively, we propose an interpolation scheme to accelerate nonconvex optimization and call the method Interpolatron. We explain motivation behind Interpolatron and conduct a thorough empirical analysis. Empirical results on DNNs of great depths (e.g., 98-layer ResNet and 200-layer ResNet) on CIFAR-10 and ImageNet show that Interpolatron can converge much faster than the state-of-the-art methods such as the SGD with momentum and Adam. Furthermore, Anderson's acceleration, in which mixing coefficients are computed by least-squares estimation, can also be used to improve the performance. Both Interpolatron and Anderson's acceleration are easy to implement and tune. We also show that Interpolatron has linear convergence rate under certain regularity assumptions.

  Click for Model/Code and Paper