Models, code, and papers for "Qi Ge":

OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks

Jun 06, 2019
Jiashi Li, Qi Qi, Jingyu Wang, Ce Ge, Yujian Li, Zhangzhang Yue, Haifeng Sun

Channel pruning can significantly accelerate and compress deep neural networks. Many channel pruning works utilize structured sparsity regularization to zero out all the weights in some channels and automatically obtain a structure-sparse network during training. However, these methods apply structured sparsity regularization to each layer separately, so the correlations between consecutive layers are omitted. In this paper, we first combine one out-channel in the current layer and the corresponding in-channel in the next layer as a regularization group, namely an out-in-channel. Our proposed Out-In-Channel Sparsity Regularization (OICSR) considers correlations between successive layers to further retain the predictive power of the compact network. Training with OICSR thoroughly transfers discriminative features into a fraction of the out-in-channels. Correspondingly, OICSR measures channel importance based on statistics computed from two consecutive layers rather than from an individual layer. Finally, a global greedy pruning algorithm is designed to remove redundant out-in-channels iteratively. Our method is comprehensively evaluated with various CNN architectures including CifarNet, AlexNet, ResNet, DenseNet and PreActSeNet on the CIFAR-10, CIFAR-100 and ImageNet-1K datasets. Notably, on ImageNet-1K, we reduce the FLOPs of ResNet-50 by 37.2% while outperforming the original model by 0.22% top-1 accuracy.
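
The grouping idea in this abstract can be illustrated with a minimal sketch. It assumes two consecutive fully connected layers stored as nested lists and a group-lasso style penalty (the sum of per-group L2 norms); the function names, the weight layout, and the use of the group norm as an importance score are illustrative assumptions, not the authors' released implementation.

```python
import math

def out_in_channel_norms(w_cur, w_next):
    """L2 norm of each out-in-channel group.

    w_cur[i]     : weights producing out-channel i of the current layer.
    w_next[j][i] : weight of in-channel i for output j of the next layer.
    (This layout is an assumption made for the sketch.)
    """
    norms = []
    for i in range(len(w_cur)):
        out_sq = sum(w * w for w in w_cur[i])          # out-channel i, current layer
        in_sq = sum(row[i] * row[i] for row in w_next)  # in-channel i, next layer
        norms.append(math.sqrt(out_sq + in_sq))
    return norms

def group_sparsity_penalty(w_cur, w_next):
    """Group-lasso style penalty: sum of the out-in-channel group norms."""
    return sum(out_in_channel_norms(w_cur, w_next))

def least_important_channel(w_cur, w_next):
    """Index of the group with the smallest norm (a hypothetical importance score)."""
    norms = out_in_channel_norms(w_cur, w_next)
    return min(range(len(norms)), key=norms.__getitem__)
```

Because each group spans both layers, driving a group norm to zero removes the out-channel and its matching in-channel together, which is the property the abstract contrasts with per-layer regularization.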

* Accepted to CVPR 2019. The pruned ResNet-50 model has been released at: withdrawn for personal reasons, without error

Double Neural Counterfactual Regret Minimization

Dec 27, 2018
Hui Li, Kailiang Hu, Zhibang Ge, Tao Jiang, Yuan Qi, Le Song

Counterfactual Regret Minimization (CFR) is a fundamental and effective technique for solving Imperfect Information Games (IIG). However, the original CFR algorithm only works for discrete state and action spaces, and the resulting strategy is maintained as a tabular representation. Such a tabular representation prevents the method from being directly applied to large games and from continuing to improve from a poor strategy profile. In this paper, we propose a double neural representation for imperfect information games, where one neural network represents the cumulative regret, and the other represents the average strategy. Furthermore, we adopt the counterfactual regret minimization algorithm to optimize this double neural representation. To make neural learning efficient, we also developed several novel techniques including a robust sampling method, mini-batch Monte Carlo Counterfactual Regret Minimization (MCCFR) and Monte Carlo Counterfactual Regret Minimization Plus (MCCFR+), which may be of independent interest. Experimentally, we demonstrate that the proposed double neural algorithm converges significantly better than its reinforcement learning counterpart.
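
For context, the tabular step that the cumulative-regret network stands in for can be sketched as standard regret matching. This is a minimal illustration of the classic rule for a single information set, not the neural method proposed in the paper; the function name is hypothetical.

```python
def regret_matching(cumulative_regret):
    """Turn cumulative regrets for one information set into a strategy.

    Actions are played in proportion to their positive cumulative regret;
    if no action has positive regret, the strategy is uniform.
    """
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    n = len(cumulative_regret)
    return [1.0 / n] * n
```

In the tabular setting one such regret vector is stored per information set, which is the memory cost that motivates replacing the table with a function approximator.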

One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

Mar 04, 2014
Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson

We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the best results achieved with a recurrent neural network based language model. The baseline unpruned Kneser-Ney 5-gram model achieves perplexity 67.6; a combination of techniques leads to 35% reduction in perplexity, or 10% reduction in cross-entropy (bits), over that baseline. The benchmark is available as a project; besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the baseline n-gram models.
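
The two reductions quoted above describe the same improvement, since cross-entropy in bits is the base-2 logarithm of perplexity. The short check below works through the arithmetic, assuming the rounded 35% figure is applied exactly to the 67.6 baseline; the variable names are illustrative.

```python
import math

def cross_entropy_bits(perplexity):
    """Cross-entropy in bits per word is log2 of the perplexity."""
    return math.log2(perplexity)

baseline_ppl = 67.6                          # unpruned Kneser-Ney 5-gram baseline
combined_ppl = baseline_ppl * (1 - 0.35)     # 35% lower perplexity (rounded figure)

h_base = cross_entropy_bits(baseline_ppl)
h_comb = cross_entropy_bits(combined_ppl)
relative_drop = (h_base - h_comb) / h_base   # roughly 0.10, i.e. about 10%

print(f"{h_base:.2f} bits -> {h_comb:.2f} bits ({relative_drop:.1%} lower)")
```

A large relative drop in perplexity therefore corresponds to a much smaller relative drop in bits, because the logarithm compresses the scale.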

* Accompanied by a project allowing anyone to generate the benchmark data, and use it to compare their language model against the ones described in the paper 

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Feb 21, 2019
Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework.

Learning Spatial Awareness to Improve Crowd Counting

Sep 16, 2019
Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann

The aim of crowd counting is to estimate the number of people in images by leveraging the annotation of center positions for pedestrians' heads. Promising progress has been made with the prevalence of deep Convolutional Neural Networks. Existing methods widely employ the Euclidean distance (i.e., $L_2$ loss) to optimize the model, which, however, has two main drawbacks: (1) the loss has difficulty in learning spatial awareness (i.e., the position of the head) since it struggles to retain the high-frequency variation in the density map, and (2) the loss is highly sensitive to various kinds of noise in crowd counting, such as zero-mean noise, head size changes, and occlusions. Although the Maximum Excess over SubArrays (MESA) loss has been previously proposed to address the above issues by finding the rectangular subregion whose predicted density map has the maximum difference from the ground truth, it cannot be solved by gradient descent, and thus can hardly be integrated into the deep learning framework. In this paper, we present a novel architecture called SPatial Awareness Network (SPANet) to incorporate spatial context for crowd counting. The Maximum Excess over Pixels (MEP) loss is proposed to achieve this by finding the pixel-level subregion with high discrepancy to the ground truth. To this end, we devise a weakly supervised learning scheme to generate such a region with a multi-branch architecture. The proposed framework can be integrated into existing deep crowd counting methods and is end-to-end trainable. Extensive experiments on four challenging benchmarks show that our method can significantly improve the performance of baselines. More remarkably, our approach outperforms the state-of-the-art methods on all benchmark datasets.

* ICCV 2019 Oral 

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

Sep 17, 2019
Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann

Tremendous variation in the scale of people/head size is a critical problem for crowd counting. To improve the scale invariance of feature representation, recent works extensively employ Convolutional Neural Networks with multi-column structures to handle different scales and resolutions. However, due to the substantial redundant parameters in columns, existing multi-column networks invariably exhibit almost the same scale features in different columns, which severely affects counting accuracy and leads to overfitting. In this paper, we attack this problem by proposing a novel Multi-column Mutual Learning (McML) strategy. It has two main innovations: 1) A statistical network is incorporated into the multi-column framework to estimate the mutual information between columns, which can approximately indicate the scale correlation between features from different columns. By minimizing the mutual information, each column is guided to learn features with different image scales. 2) We devise a mutual learning scheme that can alternately optimize each column while keeping the other columns fixed on each mini-batch training data. With such asynchronous parameter update process, each column is inclined to learn different feature representation from others, which can efficiently reduce the parameter redundancy and improve generalization ability. More remarkably, McML can be applied to all existing multi-column networks and is end-to-end trainable. Extensive experiments on four challenging benchmarks show that McML can significantly improve the original multi-column networks and outperform the other state-of-the-art approaches.

* ACM Multimedia 2019 

Artificial Intelligence BlockCloud (AIBC) Technical Whitepaper

Sep 26, 2019
Qi Deng

The AIBC is an Artificial Intelligence and blockchain technology based large-scale decentralized ecosystem that allows system-wide low-cost sharing of computing and storage resources. The AIBC consists of four layers: a fundamental layer, a resource layer, an application layer, and an ecosystem layer. The AIBC implements a two-consensus scheme to enforce upper-layer economic policies and achieve fundamental layer performance and robustness: the DPoEV incentive consensus on the application and resource layers, and the DABFT distributed consensus on the fundamental layer. The DABFT uses deep learning techniques to predict and select the most suitable BFT algorithm in order to achieve the best balance of performance, robustness, and security. The DPoEV uses the knowledge map algorithm to accurately assess the economic value of digital assets.

F-Cooper: Feature based Cooperative Perception for Autonomous Vehicle Edge Computing System Using 3D Point Clouds

Sep 13, 2019
Qi Chen

Autonomous vehicles are heavily reliant upon their sensors to perfect the perception of surrounding environments. However, with the current state of technology, the data a vehicle uses is confined to that from its own sensors. Data sharing between vehicles and/or edge servers is limited by the available network bandwidth and the stringent real-time constraints of autonomous driving applications. To address these issues, we propose a point cloud feature based cooperative perception framework (F-Cooper) for connected autonomous vehicles to achieve better object detection precision. Not only is feature-based data sufficient for the training process, but we also use the features' intrinsically small size to achieve real-time edge computing without running the risk of congesting the network. Our experimental results show that by fusing features, we are able to achieve a better object detection result, with around 10% improvement for detection within 20 meters and 30% for farther distances, as well as faster edge computing with a low communication delay, requiring 71 milliseconds in certain feature selections. To the best of our knowledge, we are the first to introduce feature-level data fusion to connected autonomous vehicles for the purpose of enhancing object detection and making real-time edge computing on inter-vehicle data feasible for autonomous vehicles.

* Accepted by SEC2019 

Submodular Mini-Batch Training in Generative Moment Matching Networks

Aug 03, 2017
Jun Qi

This article was withdrawn because (1) it was uploaded without the co-authors' knowledge or consent, and (2) there are allegations of plagiarism.

* The paper has been withdrawn. See the abstract for the reason 

Distributed Parameter Map-Reduce

Oct 03, 2015
Qi Li

This paper describes how to convert a machine learning problem into a series of map-reduce tasks. We study the logistic regression algorithm. In logistic regression, it is assumed that samples are independent and each sample is assigned a probability. Parameters are obtained by maximizing the product of all sample probabilities. The rapid expansion of training samples brings challenges to machine learning methods. Training samples are so numerous that they can only be stored in a distributed file system and processed by map-reduce style programs. The main step of logistic regression is inference. In the map-reduce spirit, each sample makes inference through a separate map procedure. But the premise of inference is that the map procedure holds parameters for all features in the sample. In this paper, we propose Distributed Parameter Map-Reduce, in which not only samples but also parameters are distributed across the nodes of a distributed file system. Through a series of map-reduce tasks, we assign each sample the parameters for its features, make inference for the sample, and update the parameters of the model. These steps are executed in a loop until convergence. We test the proposed algorithm in an actual Hadoop production environment. Experiments show that the acceleration of the algorithm is linear in the number of cluster nodes.
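
The loop described in this abstract can be sketched in a single process, assuming sparse samples stored as feature-to-value dictionaries and plain gradient descent on the logistic loss. The in-memory `params` lookup stands in for the map-reduce task that joins each sample with the parameters of its features; all names and the learning-rate settings are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import defaultdict

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def map_sample(sample, label, params):
    """Map step: with the parameters for its own features in hand, one sample
    makes inference and emits (feature, gradient contribution) pairs."""
    z = sum(params.get(f, 0.0) * v for f, v in sample.items())
    error = sigmoid(z) - label
    return [(f, error * v) for f, v in sample.items()]

def reduce_gradients(pairs):
    """Reduce step: sum the gradient contributions keyed by feature."""
    grad = defaultdict(float)
    for feature, g in pairs:
        grad[feature] += g
    return dict(grad)

def train(data, learning_rate=0.5, iterations=200):
    """Repeat map, reduce, and parameter update for a fixed number of rounds."""
    params = {}
    for _ in range(iterations):
        pairs = [p for sample, label in data
                 for p in map_sample(sample, label, params)]
        for feature, g in reduce_gradients(pairs).items():
            params[feature] = params.get(feature, 0.0) - learning_rate * g / len(data)
    return params
```

Keying the emitted pairs by feature is what lets the parameter update itself be partitioned across nodes, since each reducer only needs the contributions for the features it owns.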

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Jun 07, 2017
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas

Few prior works study deep learning on point sets. PointNet by Qi et al. is a pioneer in this direction. However, by design PointNet does not capture local structures induced by the metric space points live in, limiting its ability to recognize fine-grained patterns and generalizability to complex scenes. In this work, we introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, our network is able to learn local features with increasing contextual scales. With further observation that point sets are usually sampled with varying densities, which results in greatly decreased performance for networks trained on uniform densities, we propose novel set learning layers to adaptively combine features from multiple scales. Experiments show that our network called PointNet++ is able to learn deep point set features efficiently and robustly. In particular, results significantly better than state-of-the-art have been obtained on challenging benchmarks of 3D point clouds.

A Machine Learning Analysis of the Features in Deceptive and Credible News

Oct 05, 2019
Qi Jia Sun

Fake news is a type of pervasive propaganda that spreads misinformation online, taking advantage of social media's extensive reach to manipulate public perception. Over the past three years, fake news has become a focal discussion point in the media due to its impact on the 2016 U.S. presidential election. Fake news can have severe real-world implications: in 2016, a man walked into a pizzeria carrying a rifle because he read that Hillary Clinton was harboring children as sex slaves. This project presents a high accuracy (87%) machine learning classifier that determines the validity of news based on the word distributions and specific linguistic and stylistic differences in the first few sentences of an article. This can help readers identify the validity of an article by looking for specific features in the opening lines aiding them in making informed decisions. Using a dataset of 2,107 articles from 30 different websites, this project establishes an understanding of the variations between fake and credible news by examining the model, dataset, and features. This classifier appears to use the differences in word distribution, levels of tone authenticity, and frequency of adverbs, adjectives, and nouns. The differentiation in the features of these articles can be used to improve future classifiers. This classifier can also be further applied directly to browsers as a Google Chrome extension or as a filter for social media outlets or news websites to reduce the spread of misinformation.

Learning Generalized Transformation Equivariant Representations via Autoencoding Transformations

Jun 19, 2019
Guo-Jun Qi

Learning Transformation Equivariant Representations (TERs) seeks to capture the intrinsic visual structures of images through the representations that equivary to the applied transformations. It assumes that a transformation should be decoded from expressive representations of images before and after transformations. It greatly expands the scope of translation equivariance pinpointing the success of the Convolutional Neural Networks (CNNs) to develop a generic class of transformation equivariant representations. Unlike group equivariant convolutions that are limited to discrete transformations or linear transformation equivariance, we present a more flexible and tractable AutoEncoding Transformation (AET) model that can handle various types of transformations. Both deterministic AET and probabilistic Autoencoding Variational Transformations (AVT) models are presented. While the former trains transformation equivariant representations by directly reconstructing applied transformations, the latter is trained by maximizing the joint mutual information between the representations and the transformations. It leads to the Generalized TERs (GTERs) that could equivary against transformations in a more general manner by enabling them to capture more complex patterns of transformed visual structures beyond the linear TERs of a transformation group. We will further show that the presented approach can be extended to (semi-)supervised models by jointly maximizing the mutual information in the learned representations about the input labels and transformations. Experiment results following the standard evaluation protocols demonstrate the superior performances of the proposed models to the existing state-of-the-art unsupervised and (semi-)supervised approaches in literature.

* arXiv admin note: text overlap with arXiv:1903.10863 

Discriminative Cross-View Binary Representation Learning

Apr 04, 2018
Liu Liu, Hairong Qi

Learning compact representation is vital and challenging for large scale multimedia data. Cross-view/cross-modal hashing for effective binary representation learning has received significant attention with exponentially growing availability of multimedia content. Most existing cross-view hashing algorithms emphasize the similarities in individual views, which are then connected via cross-view similarities. In this work, we focus on the exploitation of the discriminative information from different views, and propose an end-to-end method to learn semantic-preserving and discriminative binary representation, dubbed Discriminative Cross-View Hashing (DCVH), in light of learning multitasking binary representation for various tasks including cross-view retrieval, image-to-image retrieval, and image annotation/tagging. The proposed DCVH has the following key components. First, it uses convolutional neural network (CNN) based nonlinear hashing functions and multilabel classification for both images and texts simultaneously. Such hashing functions achieve effective continuous relaxation during training without explicit quantization loss by using Direct Binary Embedding (DBE) layers. Second, we propose an effective view alignment via Hamming distance minimization, which is efficiently accomplished by bit-wise XOR operation. Extensive experiments on two image-text benchmark datasets demonstrate that DCVH outperforms state-of-the-art cross-view hashing algorithms as well as single-view image hashing algorithms. In addition, DCVH can provide competitive performance for image annotation/tagging.
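
The XOR-based Hamming distance mentioned above can be shown in a few lines, assuming each binary code is packed into a Python integer; the function name and the example codes are illustrative.

```python
def hamming_distance(code_a, code_b):
    """Count differing bits between two binary codes packed as integers.

    XOR leaves a 1 exactly where the codes disagree, so the distance
    is the number of set bits in the result.
    """
    return bin(code_a ^ code_b).count("1")

# Hypothetical 4-bit codes for an image and a text description.
image_code = 0b1011
text_code = 0b0010
print(hamming_distance(image_code, text_code))  # 2
```

Minimizing this quantity between the two views pulls the image and text codes for the same item toward the same bit pattern.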

* WACV2018 
* Published in WACV2018. Code will be available soon 

Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities

Mar 19, 2018
Guo-Jun Qi

In this paper, we present the Lipschitz regularization theory and algorithms for a novel Loss-Sensitive Generative Adversarial Network (LS-GAN). Specifically, it trains a loss function to distinguish between real and fake samples by designated margins, while learning a generator alternately to produce realistic samples by minimizing their losses. The LS-GAN further regularizes its loss function with a Lipschitz regularity condition on the density of real data, yielding a regularized model that can better generalize to produce new data from a reasonable number of training examples than the classic GAN. We will further present a Generalized LS-GAN (GLS-GAN) and show it contains a large family of regularized GAN models, including both LS-GAN and Wasserstein GAN, as its special cases. Compared with the other GAN models, we will conduct experiments to show both LS-GAN and GLS-GAN exhibit competitive ability in generating new images in terms of the Minimum Reconstruction Error (MRE) assessed on a separate test set. We further extend the LS-GAN to a conditional form for supervised and semi-supervised learning problems, and demonstrate its outstanding performance on image classification tasks.
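
The margin idea in this abstract can be sketched as a hinge-style penalty: the learned loss of a real sample should be smaller than that of a generated sample by at least a designated margin. The functions below are a minimal illustration under that assumption; the names, the scalar inputs, and the weighting factor `lam` are hypothetical and do not reproduce the authors' code.

```python
def margin_violation(loss_real, loss_fake, margin):
    """Hinge penalty: positive only when the real sample's loss is not
    lower than the fake sample's loss by at least `margin`."""
    return max(0.0, margin + loss_real - loss_fake)

def loss_function_objective(loss_real, loss_fake, margin, lam=1.0):
    """Assumed objective for the learned loss function: keep real-sample
    loss low and penalize margin violations, weighted by `lam`."""
    return loss_real + lam * margin_violation(loss_real, loss_fake, margin)
```

Once a generated sample is already separated from the real one by the margin, the penalty vanishes, so the loss function stops pushing that pair further apart.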

* The source code for both LS-GAN and GLS-GAN is publicly available. LS-GAN is also supported by Microsoft CNTK.

Neural Network-Assisted Nonlinear Multiview Component Analysis: Identifiability and Algorithm

Sep 19, 2019
Qi Lyu, Xiao Fu

Multiview analysis aims at extracting shared latent components from data samples that are acquired in different domains, e.g., image, text, and audio. Classic multiview analysis, e.g., Canonical Correlation Analysis (CCA), tackles this problem via matching the linearly transformed views in a certain latent domain. More recently, powerful nonlinear learning tools such as kernel methods and neural networks have been utilized for enhancing the classic CCA. However, unlike linear CCA, whose theoretical aspects are clearly understood, nonlinear CCA approaches are largely intuition-driven. In particular, it is unclear under what conditions the shared latent components across the views can be identified, although identifiability plays an essential role in many applications. In this work, we revisit nonlinear multiview analysis and address both the theoretical and computational aspects. We take a nonlinear multiview mixture learning viewpoint, which is a natural extension of the classic generative models for linear CCA. From there, we derive a nonlinear multiview analysis criterion. We show that minimizing this criterion leads to identification of the latent shared components up to certain ambiguities, under reasonable conditions. Our derivation and formulation also offer new insights and interpretations to existing deep neural network-based CCA formulations. On the computation side, we propose an effective algorithm with simple and scalable update rules. A series of simulations and real-data experiments corroborate our theoretical analysis.

Self-driving scale car trained by Deep reinforcement Learning

Sep 08, 2019
Qi Zhang, Tao Du

This paper considers the problem of self-driving algorithms based on deep learning. This is a hot topic because self-driving is the most important application field of artificial intelligence. Existing work has focused on deep learning, which can learn end-to-end self-driving control directly from raw sensory data, but this method is just a mapping between images and driving. We instead use deep reinforcement learning to train a self-driving car in a virtual simulation environment created with Unity and then migrate it to reality. Deep reinforcement learning gives the machine human-like driving decision-making ability. The virtual-to-real training method efficiently handles the problem that reinforcement learning requires reward from the environment, which could damage the car. We have derived a theoretical model and analysis of how to use Deep Q-learning to control a car. We have carried out simulations in the Unity virtual environment to evaluate the performance. Finally, we successfully migrated the model to the real world and realized self-driving.
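
The learning rule behind Deep Q-learning can be illustrated with its tabular form, where a dictionary of action values replaces the neural network. This is a minimal sketch of the standard Q-learning update, not the authors' driving model; the state names, action indices, and hyperparameters are hypothetical.

```python
def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9, done=False):
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a).

    q maps each state to a list of action values; `done` marks a terminal
    transition, for which there is no future value to bootstrap from.
    """
    best_next = 0.0 if done else max(q[next_state])
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]

# Hypothetical two-state example: action 0 in "on_track" leads to "curve".
q = {"on_track": [0.0, 0.0], "curve": [1.0, 2.0]}
q_learning_update(q, "on_track", 0, reward=1.0, next_state="curve",
                  alpha=0.5, gamma=0.5)
```

In the deep variant the same target is used as a regression label for a network that maps camera images to action values, which is why a simulator that can supply rewards safely is useful.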

Efficiency of Coordinate Descent Methods For Structured Nonconvex Optimization

Sep 03, 2019
Qi Deng, Chenghao Lan

Novel coordinate descent (CD) methods are proposed for minimizing nonconvex functions consisting of three terms: (i) a continuously differentiable term, (ii) a simple convex term, and (iii) a concave and continuous term. First, by extending randomized CD to nonsmooth nonconvex settings, we develop a coordinate subgradient method that randomly updates block-coordinate variables by using a block composite subgradient mapping. This method converges asymptotically to critical points with a proven sublinear convergence rate for certain optimality measures. Second, we develop a randomly permuted CD method with two alternating steps: linearizing the concave part and cycling through variables. We prove asymptotic convergence to critical points and a sublinear complexity rate for objectives with both smooth and concave parts. Third, we extend accelerated coordinate descent (ACD) to nonsmooth and nonconvex optimization to develop a novel randomized proximal DC algorithm whereby we solve the subproblem inexactly by ACD. Convergence is guaranteed with at most a few ACD iterations for each DC subproblem, and convergence complexity is established for identification of some approximate critical points. Fourth, we further develop the third method to minimize certain ill-conditioned nonconvex functions: weakly convex functions with high ratios of Lipschitz constant to negative curvature. We show that, under specific criteria, the ACD-based randomized method has superior complexity compared to conventional gradient methods. Finally, an empirical study on sparsity-inducing learning models demonstrates that CD methods are superior to gradient-based methods for certain large-scale problems.
