Models, code, and papers for "Qiang Sun":

Modified Multidimensional Scaling and High Dimensional Clustering

Oct 24, 2018
Xiucai Ding, Qiang Sun

Multidimensional scaling is an important dimension reduction tool in statistics and machine learning. Yet few theoretical results characterizing its statistical performance exist, not to mention any in high dimensions. By considering a unified framework that includes low, moderate and high dimensions, we study multidimensional scaling in the setting of clustering noisy data. Our results suggest that, in order to achieve consistent estimation of the embedding scheme, the classical multidimensional scaling needs to be modified, especially when the noise level increases. To this end, we propose {\it modified multidimensional scaling} which applies a nonlinear transformation to the sample eigenvalues. The nonlinear transformation depends on the dimensionality, sample size and unknown moment. We show that modified multidimensional scaling followed by various clustering algorithms can achieve exact recovery, i.e., all the cluster labels can be recovered correctly with probability tending to one. Numerical simulations and two real data applications lend strong support to our proposed methodology. As a byproduct, we unify and improve existing results on the $\ell_{\infty}$ bound for eigenvectors under only low bounded moment conditions. This can be of independent interest.

* 31 pages, 4 figures 

  Click for Model/Code and Paper
A cooperative game for automated learning of elasto-plasticity knowledge graphs and models with AI-guided experimentation

Mar 08, 2019
Kun Wang, WaiChing Sun, Qiang Du

We introduce a multi-agent meta-modeling game to generate data, knowledge, and models that make predictions on constitutive responses of elasto-plastic materials. We introduce a new concept from graph theory where a modeler agent is tasked with evaluating all the modeling options recast as a directed multigraph and find the optimal path that links the source of the directed graph (e.g. strain history) to the target (e.g. stress) measured by an objective function. Meanwhile, the data agent, which is tasked with generating data from real or virtual experiments (e.g. molecular dynamics, discrete element simulations), interacts with the modeling agent sequentially and uses reinforcement learning to design new experiments to optimize the prediction capacity. Consequently, this treatment enables us to emulate an idealized scientific collaboration as selections of the optimal choices in a decision tree search done automatically via deep reinforcement learning.


  Click for Model/Code and Paper
An Analysis of Classical Multidimensional Scaling

Jan 15, 2019
Anna Little, Yuying Xie, Qiang Sun

Classical multidimensional scaling is an important tool for dimension reduction in many applications. Yet few theoretical results characterizing its statistical performance exist. In this paper, we provide a theoretical framework for analyzing the quality of embedded samples produced by classical multidimensional scaling. This lays down the foundation for various downstream statistical analysis. As an application, we study its performance in the setting of clustering noisy data. Our results provide scaling conditions on the sample size, ambient dimensionality, between-class distance and noise level under which classical multidimensional scaling followed by a clustering algorithm can recover the cluster labels of all samples with high probability. Numerical simulations confirm these scaling conditions are sharp in low, moderate, and high dimensional regimes. Applications to both human RNAseq data and natural language data lend strong support to the methodology and theory.

* 37 pages including supplementary material 

  Click for Model/Code and Paper
Stochastic Training of Residual Networks: a Differential Equation Viewpoint

Dec 01, 2018
Qi Sun, Yunzhe Tao, Qiang Du

During the last few years, significant attention has been paid to the stochastic training of artificial neural networks, which is known as an effective regularization approach that helps improve the generalization capability of trained models. In this work, the method of modified equations is applied to show that the residual network and its variants with noise injection can be regarded as weak approximations of stochastic differential equations. Such observations enable us to bridge the stochastic training processes with the optimal control of backward Kolmogorov's equations. This not only offers a novel perspective on the effects of regularization from the loss landscape viewpoint but also sheds light on the design of more reliable and efficient stochastic training strategies. As an example, we propose a new way to utilize Bernoulli dropout within the plain residual network architecture and conduct experiments on a real-world image classification task to substantiate our theoretical findings.

* 20 pages, 8 figures, and 1 table 

  Click for Model/Code and Paper
An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches

Apr 24, 2018
Zhuoyao Zhong, Lei Sun, Qiang Huo

The anchor mechanism of Faster R-CNN and SSD framework is considered not effective enough to scene text detection, which can be attributed to its IoU based matching criterion between anchors and ground-truth boxes. In order to better enclose scene text instances of various shapes, it requires to design anchors of various scales, aspect ratios and even orientations manually, which makes anchor-based methods sophisticated and inefficient. In this paper, we propose a novel anchor-free region proposal network (AF-RPN) to replace the original anchor-based RPN in the Faster R-CNN framework to address the above problem. Compared with a vanilla RPN and FPN-RPN, AF-RPN can get rid of complicated anchor design and achieve higher recall rate on large-scale COCO-Text dataset. Owing to the high-quality text proposals, our Faster R-CNN based two-stage text detection approach achieves state-of-the-art results on ICDAR-2017 MLT, ICDAR-2015 and ICDAR-2013 text detection benchmarks when using single-scale and single-model (ResNet50) testing only.

* Technical report 

  Click for Model/Code and Paper
Nonconvex Regularized Robust Regression with Oracle Properties in Polynomial Time

Jul 09, 2019
Xiaoou Pan, Qiang Sun, Wen-Xin Zhou

This paper investigates tradeoffs among optimization errors, statistical rates of convergence and the effect of heavy-tailed random errors for high-dimensional adaptive Huber regression with nonconvex regularization. When the additive errors in linear models have only bounded second moment, our results suggest that adaptive Huber regression with nonconvex regularization yields statistically optimal estimators that satisfy oracle properties as if the true underlying support set were known beforehand. Computationally, we need as many as O(log s + log log d) convex relaxations to reach such oracle estimators, where s and d denote the sparsity and ambient dimension, respectively. Numerical studies lend strong support to our methodology and theory.

* 55 pages 

  Click for Model/Code and Paper
Distributionally Robust Reduced Rank Regression and Principal Component Analysis in High Dimensions

Oct 18, 2018
Kean Ming Tan, Qiang Sun, Daniela Witten

We propose robust sparse reduced rank regression and robust sparse principal component analysis for analyzing large and complex high-dimensional data with heavy-tailed random noise. The proposed methods are based on convex relaxations of rank-and sparsity-constrained non-convex optimization problems, which are solved using the alternating direction method of multipliers (ADMM) algorithm. For robust sparse reduced rank regression, we establish non-asymptotic estimation error bounds under both Frobenius and nuclear norms, while existing results focus mostly on rank-selection and prediction consistency. Our theoretical results quantify the tradeoff between heavy-tailedness of the random noise and statistical bias. For random noise with bounded $(1+\delta)$th moment with $\delta \in (0,1)$, the rate of convergence is a function of $\delta$, and is slower than the sub-Gaussian-type deviation bounds; for random noise with bounded second moment, we recover the results obtained under sub-Gaussian noise. Furthermore, the transition between the two regimes is smooth. For robust sparse principal component analysis, we propose to truncate the observed data, and show that this truncation will lead to consistent estimation of the eigenvectors. We then establish theoretical results similar to those of robust sparse reduced rank regression. We illustrate the performance of these methods via extensive numerical studies and two real data applications.


  Click for Model/Code and Paper
Mask R-CNN with Pyramid Attention Network for Scene Text Detection

Nov 22, 2018
Zhida Huang, Zhuoyao Zhong, Lei Sun, Qiang Huo

In this paper, we present a new Mask R-CNN based text detection approach which can robustly detect multi-oriented and curved text from natural scene images in a unified manner. To enhance the feature representation ability of Mask R-CNN for text detection tasks, we propose to use the Pyramid Attention Network (PAN) as a new backbone network of Mask R-CNN. Experiments demonstrate that PAN can suppress false alarms caused by text-like backgrounds more effectively. Our proposed approach has achieved superior performance on both multi-oriented (ICDAR-2015, ICDAR-2017 MLT) and curved (SCUT-CTW1500) text detection benchmark tasks by only using single-scale and single-model testing.

* Accepted by WACV 2019 

  Click for Model/Code and Paper
Nonlocal Neural Networks, Nonlocal Diffusion and Nonlocal Modeling

Oct 25, 2018
Yunzhe Tao, Qi Sun, Qiang Du, Wei Liu

Nonlocal neural networks have been proposed and shown to be effective in several computer vision tasks, where the nonlocal operations can directly capture long-range dependencies in the feature space. In this paper, we study the nature of diffusion and damping effect of nonlocal networks by doing spectrum analysis on the weight matrices of the well-trained networks, and then propose a new formulation of the nonlocal block. The new block not only learns the nonlocal interactions but also has stable dynamics, thus allowing deeper nonlocal structures. Moreover, we interpret our formulation from the general nonlocal modeling perspective, where we make connections between the proposed nonlocal network and other nonlocal models, such as nonlocal diffusion process and Markov jump process.

* Accepted by NIPS 2018 

  Click for Model/Code and Paper
Graphical Nonconvex Optimization for Optimal Estimation in Gaussian Graphical Models

Jun 04, 2017
Qiang Sun, Kean Ming Tan, Han Liu, Tong Zhang

We consider the problem of learning high-dimensional Gaussian graphical models. The graphical lasso is one of the most popular methods for estimating Gaussian graphical models. However, it does not achieve the oracle rate of convergence. In this paper, we propose the graphical nonconvex optimization for optimal estimation in Gaussian graphical models, which is then approximated by a sequence of convex programs. Our proposal is computationally tractable and produces an estimator that achieves the oracle rate of convergence. The statistical error introduced by the sequential approximation using the convex programs are clearly demonstrated via a contraction property. The rate of convergence can be further improved using the notion of sparsity pattern. The proposed methodology is then extended to semiparametric graphical models. We show through numerical studies that the proposed estimator outperforms other popular methods for estimating Gaussian graphical models.

* 3 figures 

  Click for Model/Code and Paper
Selective Image Super-Resolution

Oct 27, 2010
Ju Sun, Qiang Chen, Shuicheng Yan, Loong-Fah Cheong

In this paper we propose a vision system that performs image Super Resolution (SR) with selectivity. Conventional SR techniques, either by multi-image fusion or example-based construction, have failed to capitalize on the intrinsic structural and semantic context in the image, and performed "blind" resolution recovery to the entire image area. By comparison, we advocate example-based selective SR whereby selectivity is exemplified in three aspects: region selectivity (SR only at object regions), source selectivity (object SR with trained object dictionaries), and refinement selectivity (object boundaries refinement using matting). The proposed system takes over-segmented low-resolution images as inputs, assimilates recent learning techniques of sparse coding (SC) and grouped multi-task lasso (GMTL), and leads eventually to a framework for joint figure-ground separation and interest object SR. The efficiency of our framework is manifested in our experiments with subsets of the VOC2009 and MSRC datasets. We also demonstrate several interesting vision applications that can build on our system.

* 20 pages, 5 figures. Submitted to Computer Vision and Image Understanding in March 2010. Keywords: image super resolution, semantic image segmentation, vision system, vision application 

  Click for Model/Code and Paper
A unified framework of predicting binary interestingness of images based on discriminant correlation analysis and multiple kernel learning

Oct 14, 2019
Qiang Sun, Liting Wang, Maohui Li, Longtao Zhang, Yuxiang Yang

In the modern content-based image retrieval systems, there is an increasingly interest in constructing a computationally effective model to predict the interestingness of images since the measure of image interestingness could improve the human-centered search satisfaction and the user experience in different applications. In this paper, we propose a unified framework to predict the binary interestingness of images based on discriminant correlation analysis (DCA) and multiple kernel learning (MKL) techniques. More specially, on the one hand, to reduce feature redundancy in describing the interestingness cues of images, the DCA or multi-set discriminant correlation analysis (MDCA) technique is adopted to fuse multiple feature sets of the same type for individual cues by taking into account the class structure among the samples involved to describe the three classical interestingness cues, unusualness,aesthetics as well as general preferences, with three sets of compact and representative features; on the other hand, to make good use of the heterogeneity from the three sets of high-level features for describing the interestingness cues, the SimpleMKL method is employed to enhance the generalization ability of the built model for the task of the binary interestingness classification. Experimental results on the publicly-released interestingness prediction data set have demonstrated the rationality and effectiveness of the proposed framework in the binary prediction of image interestingness where we have conducted several groups of comparative studies across different interestingness feature combinations, different interestingness cues, as well as different feature types for the three interestingness cues.

* 30 pages, 9 figures, 6 tables 

  Click for Model/Code and Paper
Communication-efficient sparse regression: a one-shot approach

Aug 11, 2015
Jason D. Lee, Yuekai Sun, Qiang Liu, Jonathan E. Taylor

We devise a one-shot approach to distributed sparse regression in the high-dimensional setting. The key idea is to average "debiased" or "desparsified" lasso estimators. We show the approach converges at the same rate as the lasso as long as the dataset is not split across too many machines. We also extend the approach to generalized linear models.

* 29 pages, 3 figures 

  Click for Model/Code and Paper
OpenHowNet: An Open Sememe-based Lexical Knowledge Base

Jan 28, 2019
Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Qiang Dong, Maosong Sun, Zhendong Dong

In this paper, we present an open sememe-based lexical knowledge base OpenHowNet. Based on well-known HowNet, OpenHowNet comprises three components: core data which is composed of more than 100 thousand senses annotated with sememes, OpenHowNet Web which gives a brief introduction to OpenHowNet as well as provides online exhibition of OpenHowNet information, and OpenHowNet API which includes several useful APIs such as accessing OpenHowNet core data and drawing sememe tree structures of senses. In the main text, we first give some backgrounds including definition of sememe and details of HowNet. And then we introduce some previous HowNet and sememe-based research works. Last but not least, we detail the constituents of OpenHowNet and their basic features and functionalities. Additionally, we briefly make a summary and list some future works.

* 4 pages, 3 figures 

  Click for Model/Code and Paper
Efficient Delivery Policy to Minimize User Traffic Consumption in Guaranteed Advertising

Nov 23, 2016
Jia Zhang, Zheng Wang, Qian Li, Jialin Zhang, Yanyan Lan, Qiang Li, Xiaoming Sun

In this work, we study the guaranteed delivery model which is widely used in online display advertising. In the guaranteed delivery scenario, ad exposures (which are also called impressions in some works) to users are guaranteed by contracts signed in advance between advertisers and publishers. A crucial problem for the advertising platform is how to fully utilize the valuable user traffic to generate as much as possible revenue. Different from previous works which usually minimize the penalty of unsatisfied contracts and some other cost (e.g. representativeness), we propose the novel consumption minimization model, in which the primary objective is to minimize the user traffic consumed to satisfy all contracts. Under this model, we develop a near optimal method to deliver ads for users. The main advantage of our method lies in that it consumes nearly as least as possible user traffic to satisfy all contracts, therefore more contracts can be accepted to produce more revenue. It also enables the publishers to estimate how much user traffic is redundant or short so that they can sell or buy this part of traffic in bulk in the exchange market. Furthermore, it is robust with regard to priori knowledge of user type distribution. Finally, the simulation shows that our method outperforms the traditional state-of-the-art methods.

* Already accepted by AAAI'17 

  Click for Model/Code and Paper
Automatic Detection of ECG Abnormalities by using an Ensemble of Deep Residual Networks with Attention

Aug 27, 2019
Yang Liu, Runnan He, Kuanquan Wang, Qince Li, Qiang Sun, Na Zhao, Henggui Zhang

Heart disease is one of the most common diseases causing morbidity and mortality. Electrocardiogram (ECG) has been widely used for diagnosing heart diseases for its simplicity and non-invasive property. Automatic ECG analyzing technologies are expected to reduce human working load and increase diagnostic efficacy. However, there are still some challenges to be addressed for achieving this goal. In this study, we develop an algorithm to identify multiple abnormalities from 12-lead ECG recordings. In the algorithm pipeline, several preprocessing methods are firstly applied on the ECG data for denoising, augmentation and balancing recording numbers of variant classes. In consideration of efficiency and consistency of data length, the recordings are padded or truncated into a medium length, where the padding/truncating time windows are selected randomly to sup-press overfitting. Then, the ECGs are used to train deep neural network (DNN) models with a novel structure that combines a deep residual network with an attention mechanism. Finally, an ensemble model is built based on these trained models to make predictions on the test data set. Our method is evaluated based on the test set of the First China ECG Intelligent Competition dataset by using the F1 metric that is regarded as the harmonic mean between the precision and recall. The resultant overall F1 score of the algorithm is 0.875, showing a promising performance and potential for practical use.

* 8 pages, 2 figures, conference 

  Click for Model/Code and Paper
Question Guided Modular Routing Networks for Visual Question Answering

Apr 17, 2019
Yanze Wu, Qiang Sun, Jianqi Ma, Bin Li, Yanwei Fu, Yao Peng, Xiangyang Xue

Visual Question Answering (VQA) faces two major challenges: how to better fuse the visual and textual modalities and how to make the VQA model have the reasoning ability to answer more complex questions. In this paper, we address both challenges by proposing the novel Question Guided Modular Routing Networks (QGMRN). QGMRN can fuse the visual and textual modalities in multiple semantic levels which makes the fusion occur in a fine-grained way, it also can learn to reason by routing between the generic modules without additional supervision information or prior knowledge. The proposed QGMRN consists of three sub-networks: visual network, textual network and routing network. The routing network selectively executes each module in the visual network according to the pathway activated by the question features generated by the textual network. Experiments on the CLEVR dataset show that our model can outperform the state-of-the-art. Models and Codes will be released.


  Click for Model/Code and Paper
Real time backbone for semantic segmentation

Mar 16, 2019
Zhengeng Yang, Hongshan Yu, Qiang Fu, Wei Sun, Wenyan Jia, Mingui Sun, Zhi-Hong Mao

The rapid development of autonomous driving in recent years presents lots of challenges for scene understanding. As an essential step towards scene understanding, semantic segmentation thus received lots of attention in past few years. Although deep learning based state-of-the-arts have achieved great success in improving the segmentation accuracy, most of them suffer from an inefficiency problem and can hardly applied to practical applications. In this paper, we systematically analyze the computation cost of Convolutional Neural Network(CNN) and found that the inefficiency of CNN is mainly caused by its wide structure rather than the deep structure. In addition, the success of pruning based model compression methods proved that there are many redundant channels in CNN. Thus, we designed a very narrow while deep backbone network to improve the efficiency of semantic segmentation. By casting our network to FCN32 segmentation architecture, the basic structure of most segmentation methods, we achieved 60.6\% mIoU on Cityscape val dataset with 54 frame per seconds(FPS) on $1024\times2048$ inputs, which already outperforms one of the earliest real time deep learning based segmentation methods: ENet.

* 6pages 

  Click for Model/Code and Paper
Combination of Diverse Ranking Models for Personalized Expedia Hotel Searches

Nov 29, 2013
Xudong Liu, Bing Xu, Yuyu Zhang, Qiang Yan, Liang Pang, Qiang Li, Hanxiao Sun, Bin Wang

The ICDM Challenge 2013 is to apply machine learning to the problem of hotel ranking, aiming to maximize purchases according to given hotel characteristics, location attractiveness of hotels, user's aggregated purchase history and competitive online travel agency information for each potential hotel choice. This paper describes the solution of team "binghsu & MLRush & BrickMover". We conduct simple feature engineering work and train different models by each individual team member. Afterwards, we use listwise ensemble method to combine each model's output. Besides describing effective model and features, we will discuss about the lessons we learned while using deep learning in this competition.

* 6 pages, 3 figures 

  Click for Model/Code and Paper
FAB: A Robust Facial Landmark Detection Framework for Motion-Blurred Videos

Oct 26, 2019
Keqiang Sun, Wayne Wu, Tinghao Liu, Shuo Yang, Quan Wang, Qiang Zhou, Zuochang Ye, Chen Qian

Recently, facial landmark detection algorithms have achieved remarkable performance on static images. However, these algorithms are neither accurate nor stable in motion-blurred videos. The missing of structure information makes it difficult for state-of-the-art facial landmark detection algorithms to yield good results. In this paper, we propose a framework named FAB that takes advantage of structure consistency in the temporal dimension for facial landmark detection in motion-blurred videos. A structure predictor is proposed to predict the missing face structural information temporally, which serves as a geometry prior. This allows our framework to work as a virtuous circle. On one hand, the geometry prior helps our structure-aware deblurring network generates high quality deblurred images which lead to better landmark detection results. On the other hand, better landmark detection results help structure predictor generate better geometry prior for the next frame. Moreover, it is a flexible video-based framework that can incorporate any static image-based methods to provide a performance boost on video datasets. Extensive experiments on Blurred-300VW, the proposed Real-world Motion Blur (RWMB) datasets and 300VW demonstrate the superior performance to the state-of-the-art methods. Datasets and models will be publicly available at https://keqiangsun.github.io/projects/FAB/FAB.html.

* Accepted to ICCV 2019. Project page: https://keqiangsun.github.io/projects/FAB/FAB.html 

  Click for Model/Code and Paper