Research papers and code for "Qiang Sun":
Multidimensional scaling is an important dimension reduction tool in statistics and machine learning. Yet few theoretical results characterizing its statistical performance exist, not to mention any in high dimensions. By considering a unified framework that includes low, moderate and high dimensions, we study multidimensional scaling in the setting of clustering noisy data. Our results suggest that, in order to achieve consistent estimation of the embedding scheme, classical multidimensional scaling needs to be modified, especially when the noise level increases. To this end, we propose modified multidimensional scaling, which applies a nonlinear transformation to the sample eigenvalues. The nonlinear transformation depends on the dimensionality, the sample size and an unknown moment. We show that modified multidimensional scaling followed by various clustering algorithms can achieve exact recovery, i.e., all the cluster labels can be recovered correctly with probability tending to one. Numerical simulations and two real data applications lend strong support to our proposed methodology. As a byproduct, we unify and improve existing results on the $\ell_{\infty}$ bound for eigenvectors under only low-order bounded moment conditions. This may be of independent interest.

* 31 pages, 4 figures
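Below is a minimal illustrative sketch, not the paper's exact procedure: classical multidimensional scaling via double centering and eigendecomposition, with a nonlinear transformation applied to the sample eigenvalues before forming the embedding. The soft-thresholding form of the transformation and the choice of `tau` are placeholder assumptions; the paper's transformation depends on the dimensionality, the sample size and an unknown moment.

```python
import numpy as np

def modified_mds_sketch(D, k, tau=0.0):
    """Illustrative MDS with a nonlinear transformation of the sample eigenvalues.

    D   : (n, n) matrix of squared pairwise distances
    k   : embedding dimension
    tau : level of the (assumed) eigenvalue shrinkage
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ D @ J                          # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)          # ascending order
    idx = np.argsort(eigvals)[::-1][:k]           # top-k eigenpairs
    lam, V = eigvals[idx], eigvecs[:, idx]
    lam_mod = np.maximum(lam - tau, 0.0)          # nonlinear shrinkage (assumed form)
    return V * np.sqrt(lam_mod)                   # (n, k) embedded samples

# toy usage: two noisy clusters in a high-dimensional ambient space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 200)), rng.normal(3, 1, (20, 200))])
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Y = modified_mds_sketch(D, k=2, tau=5.0)
print(Y.shape)  # (40, 2)
```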
We introduce a multi-agent meta-modeling game to generate data, knowledge, and models that make predictions on the constitutive responses of elasto-plastic materials. We introduce a new concept from graph theory in which a modeler agent is tasked with evaluating all the modeling options, recast as a directed multigraph, and finding the optimal path that links the source of the directed graph (e.g. strain history) to the target (e.g. stress), as measured by an objective function. Meanwhile, the data agent, which is tasked with generating data from real or virtual experiments (e.g. molecular dynamics, discrete element simulations), interacts with the modeler agent sequentially and uses reinforcement learning to design new experiments to optimize the prediction capacity. Consequently, this treatment enables us to emulate an idealized scientific collaboration as selections of the optimal choices in a decision tree search carried out automatically via deep reinforcement learning.

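A minimal sketch of the directed-multigraph view: nodes are physical quantities, edges are candidate model components, and the modeler agent seeks the path from source to target with the best total score. The graph, edge names and scores below are hypothetical placeholders, and the reinforcement-learning components are not reproduced.

```python
# Modeling options as edges of a small directed multigraph; pick the path from
# 'strain' to 'stress' with the best total score (placeholder names and scores).
graph = {
    "strain":   [("elastic_model_A", "elastic_strain", 0.7),
                 ("elastic_model_B", "elastic_strain", 0.9)],
    "elastic_strain": [("plastic_flow_rule", "stress", 0.6),
                       ("hyperelastic_law", "stress", 0.8)],
}

def best_path(graph, src, dst, path=(), score=0.0):
    """Exhaustively enumerate paths (fine for tiny graphs) and keep the best one."""
    if src == dst:
        return path, score
    best = (None, float("-inf"))
    for edge_name, nxt, s in graph.get(src, []):
        cand = best_path(graph, nxt, dst, path + (edge_name,), score + s)
        if cand[1] > best[1]:
            best = cand
    return best

print(best_path(graph, "strain", "stress"))
# (('elastic_model_B', 'hyperelastic_law'), 1.7)
```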
Classical multidimensional scaling is an important tool for dimension reduction in many applications. Yet few theoretical results characterizing its statistical performance exist. In this paper, we provide a theoretical framework for analyzing the quality of embedded samples produced by classical multidimensional scaling. This lays down the foundation for various downstream statistical analyses. As an application, we study its performance in the setting of clustering noisy data. Our results provide scaling conditions on the sample size, ambient dimensionality, between-class distance and noise level under which classical multidimensional scaling followed by a clustering algorithm can recover the cluster labels of all samples with high probability. Numerical simulations confirm that these scaling conditions are sharp in the low, moderate, and high dimensional regimes. Applications to both human RNAseq data and natural language data lend strong support to the methodology and theory.

* 37 pages including supplementary material
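For intuition, here is a minimal sketch of the pipeline the theory concerns: classical multidimensional scaling (the `tau = 0` case of the modified-scaling sketch above) followed by k-means on the embedded samples. The toy data, dimensions and tuning are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def classical_mds(D, k):
    """Classical MDS: double centering + top-k eigendecomposition.
    D is an (n, n) matrix of squared pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(-0.5 * J @ D @ J)
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# toy example: two well-separated noisy clusters in a high-dimensional space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 500)), rng.normal(4, 1, (30, 500))])
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared pairwise distances
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(classical_mds(D, 2))
print(labels)   # e.g. 30 zeros followed by 30 ones (up to label permutation)
```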
During the last few years, significant attention has been paid to the stochastic training of artificial neural networks, which is known to be an effective regularization approach that helps improve the generalization capability of trained models. In this work, the method of modified equations is applied to show that the residual network and its variants with noise injection can be regarded as weak approximations of stochastic differential equations. Such observations enable us to bridge the stochastic training processes and the optimal control of backward Kolmogorov equations. This not only offers a novel perspective on the effects of regularization from the loss landscape viewpoint but also sheds light on the design of more reliable and efficient stochastic training strategies. As an example, we propose a new way to utilize Bernoulli dropout within the plain residual network architecture and conduct experiments on a real-world image classification task to substantiate our theoretical findings.

* 20 pages, 8 figures, and 1 table
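As a rough illustration of the modified-equations viewpoint, the sketch below shows how a residual update with injected Gaussian noise coincides with an Euler-Maruyama step of a stochastic differential equation. The residual map `f`, step size and noise scale are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# A residual update with injected noise,
#     x_{n+1} = x_n + h * f(x_n) + sqrt(h) * sigma * xi_n,   xi_n ~ N(0, I),
# is exactly an Euler-Maruyama step for the SDE  dX = f(X) dt + sigma dW.
def f(x):
    return -x + np.tanh(x)            # stand-in for a residual branch

def noisy_residual_trajectory(x0, h=0.1, sigma=0.3, n_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_steps):
        x = x + h * f(x) + np.sqrt(h) * sigma * rng.standard_normal(x.shape)
        traj.append(x.copy())
    return np.stack(traj)

print(noisy_residual_trajectory(np.ones(4)).shape)   # (51, 4)
```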
The anchor mechanism of the Faster R-CNN and SSD frameworks is considered not effective enough for scene text detection, which can be attributed to its IoU-based matching criterion between anchors and ground-truth boxes. In order to better enclose scene text instances of various shapes, anchors of various scales, aspect ratios and even orientations must be designed manually, which makes anchor-based methods sophisticated and inefficient. In this paper, we propose a novel anchor-free region proposal network (AF-RPN) to replace the original anchor-based RPN in the Faster R-CNN framework to address this problem. Compared with a vanilla RPN and FPN-RPN, AF-RPN can get rid of complicated anchor design and achieves a higher recall rate on the large-scale COCO-Text dataset. Owing to the high-quality text proposals, our Faster R-CNN based two-stage text detection approach achieves state-of-the-art results on the ICDAR-2017 MLT, ICDAR-2015 and ICDAR-2013 text detection benchmarks using only single-scale and single-model (ResNet50) testing.

* Technical report
This paper investigates tradeoffs among optimization errors, statistical rates of convergence and the effect of heavy-tailed random errors for high-dimensional adaptive Huber regression with nonconvex regularization. When the additive errors in linear models have only a bounded second moment, our results suggest that adaptive Huber regression with nonconvex regularization yields statistically optimal estimators that satisfy oracle properties as if the true underlying support set were known beforehand. Computationally, we need at most $O(\log s + \log\log d)$ convex relaxations to reach such oracle estimators, where $s$ and $d$ denote the sparsity and the ambient dimension, respectively. Numerical studies lend strong support to our methodology and theory.

* 55 pages
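A minimal sketch of one convex stage of this kind of procedure: Huber loss with a sample-size-dependent robustification parameter, combined with an $\ell_1$ penalty and solved by proximal gradient descent. The choices of `tau`, `lam` and the step size are illustrative assumptions, and the paper's sequence of nonconvex relaxations is not reproduced.

```python
import numpy as np

def huber_grad(r, tau):
    """Derivative of the Huber loss (per residual), with robustification parameter tau."""
    return np.where(np.abs(r) <= tau, r, tau * np.sign(r))

def adaptive_huber_lasso(X, y, lam=0.1, tau=None, step=None, n_iter=500):
    """One l1-penalized convex stage solved by proximal gradient descent.
    lam, tau and the step size are illustrative, not the paper's tuning."""
    n, d = X.shape
    tau = tau or np.sqrt(n) / np.log(d)                  # adaptive robustification level
    step = step or n / np.linalg.norm(X, 2) ** 2         # 1 / Lipschitz constant of the gradient
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = -X.T @ huber_grad(y - X @ beta, tau) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50); beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_t(df=2.5, size=200)     # heavy-tailed noise
print(np.round(adaptive_huber_lasso(X, y)[:5], 2))
```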
We propose robust sparse reduced rank regression and robust sparse principal component analysis for analyzing large and complex high-dimensional data with heavy-tailed random noise. The proposed methods are based on convex relaxations of rank- and sparsity-constrained non-convex optimization problems, which are solved using the alternating direction method of multipliers (ADMM) algorithm. For robust sparse reduced rank regression, we establish non-asymptotic estimation error bounds under both Frobenius and nuclear norms, while existing results focus mostly on rank selection and prediction consistency. Our theoretical results quantify the tradeoff between heavy-tailedness of the random noise and statistical bias. For random noise with bounded $(1+\delta)$th moment with $\delta \in (0,1)$, the rate of convergence is a function of $\delta$ and is slower than the sub-Gaussian-type deviation bounds; for random noise with a bounded second moment, we recover the results obtained under sub-Gaussian noise. Furthermore, the transition between the two regimes is smooth. For robust sparse principal component analysis, we propose to truncate the observed data, and show that this truncation leads to consistent estimation of the eigenvectors. We then establish theoretical results similar to those of robust sparse reduced rank regression. We illustrate the performance of these methods via extensive numerical studies and two real data applications.

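To illustrate the truncation idea for robust PCA, here is a minimal sketch: truncate the observations element-wise at a level `tau` and take the leading eigenvector of the resulting sample covariance. The truncation level and the data-generating model are illustrative assumptions, and the sparsity constraint from the paper is omitted.

```python
import numpy as np

def truncate(X, tau):
    """Element-wise truncation (winsorization) of the data at level tau."""
    return np.clip(X, -tau, tau)

def truncated_top_eigvec(X, tau):
    """Leading eigenvector of the covariance of the truncated data."""
    Xt = truncate(X, tau)
    S = np.cov(Xt, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    return vecs[:, -1]

rng = np.random.default_rng(0)
n, d = 500, 20
v = np.zeros(d); v[:4] = 0.5                        # true leading direction
X = rng.standard_t(df=3, size=(n, 1)) * v + 0.5 * rng.standard_t(df=3, size=(n, d))
tau = 2.0 * (n / np.log(d)) ** 0.25                 # heuristic truncation level (assumed)
print(np.round(truncated_top_eigvec(X, tau)[:6], 2))
```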
In this paper, we present a new Mask R-CNN based text detection approach which can robustly detect multi-oriented and curved text in natural scene images in a unified manner. To enhance the feature representation ability of Mask R-CNN for text detection tasks, we propose to use the Pyramid Attention Network (PAN) as a new backbone network of Mask R-CNN. Experiments demonstrate that PAN can suppress false alarms caused by text-like backgrounds more effectively. Our proposed approach achieves superior performance on both multi-oriented (ICDAR-2015, ICDAR-2017 MLT) and curved (SCUT-CTW1500) text detection benchmarks using only single-scale and single-model testing.

* Accepted by WACV 2019
Nonlocal neural networks have been proposed and shown to be effective in several computer vision tasks, where the nonlocal operations can directly capture long-range dependencies in the feature space. In this paper, we study the nature of diffusion and the damping effect of nonlocal networks by performing spectral analysis on the weight matrices of well-trained networks, and then propose a new formulation of the nonlocal block. The new block not only learns the nonlocal interactions but also has stable dynamics, thus allowing deeper nonlocal structures. Moreover, we interpret our formulation from a general nonlocal modeling perspective, where we make connections between the proposed nonlocal network and other nonlocal models, such as the nonlocal diffusion process and the Markov jump process.

* Accepted by NIPS 2018
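For reference, the sketch below shows the forward pass of a standard embedded-Gaussian nonlocal operation with a residual connection, the kind of block such networks build on. The paper's modified formulation with stabilized dynamics is not reproduced, and all weights here are random placeholders.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_block(x, W_theta, W_phi, W_g, W_out):
    """Embedded-Gaussian nonlocal operation with a residual connection.
    x: (N, C) features at N spatial positions; weights are assumed learned elsewhere."""
    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g
    attn = softmax(theta @ phi.T, axis=-1)        # (N, N) pairwise affinities
    return x + (attn @ g) @ W_out                 # aggregate long-range context + residual

rng = np.random.default_rng(0)
N, C, Ci = 64, 32, 16
x = rng.standard_normal((N, C))
Ws = [rng.standard_normal((C, Ci)) * 0.1 for _ in range(3)]
W_out = rng.standard_normal((Ci, C)) * 0.1
print(nonlocal_block(x, *Ws, W_out).shape)        # (64, 32)
```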
We consider the problem of learning high-dimensional Gaussian graphical models. The graphical lasso is one of the most popular methods for estimating Gaussian graphical models. However, it does not achieve the oracle rate of convergence. In this paper, we propose graphical nonconvex optimization for optimal estimation in Gaussian graphical models, which is then approximated by a sequence of convex programs. Our proposal is computationally tractable and produces an estimator that achieves the oracle rate of convergence. The statistical error introduced by the sequential approximation using the convex programs is clearly demonstrated via a contraction property. The rate of convergence can be further improved using the notion of sparsity pattern. The proposed methodology is then extended to semiparametric graphical models. We show through numerical studies that the proposed estimator outperforms other popular methods for estimating Gaussian graphical models.

* 3 figures
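As context, here is a minimal usage sketch of the graphical lasso baseline that the paper improves upon, using scikit-learn. The regularization level and the toy chain-graph model are illustrative assumptions, and the paper's sequential convex approximations are not reproduced.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Baseline graphical lasso estimate of a sparse precision matrix; alpha is an
# illustrative regularization level, not a tuned value.
rng = np.random.default_rng(0)
d = 10
Theta = np.eye(d) + np.diag(0.4 * np.ones(d - 1), 1) + np.diag(0.4 * np.ones(d - 1), -1)
Sigma = np.linalg.inv(Theta)                       # tridiagonal precision -> chain graph
X = rng.multivariate_normal(np.zeros(d), Sigma, size=500)

model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_[:4, :4], 2))       # estimated (sparse) precision block
```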
In this paper we propose a vision system that performs image Super Resolution (SR) with selectivity. Conventional SR techniques, whether based on multi-image fusion or example-based construction, fail to capitalize on the intrinsic structural and semantic context in the image, and perform "blind" resolution recovery over the entire image area. By comparison, we advocate example-based selective SR, whereby selectivity is exemplified in three aspects: region selectivity (SR only at object regions), source selectivity (object SR with trained object dictionaries), and refinement selectivity (object boundary refinement using matting). The proposed system takes over-segmented low-resolution images as inputs, assimilates recent learning techniques of sparse coding (SC) and grouped multi-task lasso (GMTL), and leads eventually to a framework for joint figure-ground separation and interest object SR. The efficiency of our framework is demonstrated in experiments with subsets of the VOC2009 and MSRC datasets. We also demonstrate several interesting vision applications that can build on our system.

* 20 pages, 5 figures. Submitted to Computer Vision and Image Understanding in March 2010. Keywords: image super resolution, semantic image segmentation, vision system, vision application
We devise a one-shot approach to distributed sparse regression in the high-dimensional setting. The key idea is to average "debiased" or "desparsified" lasso estimators. We show the approach converges at the same rate as the lasso as long as the dataset is not split across too many machines. We also extend the approach to generalized linear models.

* 29 pages, 3 figures
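A minimal sketch of the one-shot idea: fit a lasso on each machine, apply a debiasing correction, and average the results. For brevity the sketch assumes a design with identity population covariance, so the precision matrix in the debiasing step is the identity; in general it must be estimated (e.g. by nodewise regression). All tuning values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def debiased_lasso(X, y, lam):
    """Debiased lasso on one machine, assuming an identity-covariance design so the
    precision matrix in the debiasing correction is the identity."""
    n = X.shape[0]
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    return beta + X.T @ (y - X @ beta) / n         # one-step debiasing correction

rng = np.random.default_rng(0)
n_machines, n_per, d = 10, 200, 100
beta_true = np.zeros(d); beta_true[:3] = [1.0, -2.0, 1.5]

estimates = []
for _ in range(n_machines):
    X = rng.standard_normal((n_per, d))            # data local to one machine
    y = X @ beta_true + rng.standard_normal(n_per)
    estimates.append(debiased_lasso(X, y, lam=0.1))

beta_avg = np.mean(estimates, axis=0)              # one-shot averaged estimator
print(np.round(beta_avg[:5], 2))
```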
In this paper, we present an open sememe-based lexical knowledge base, OpenHowNet. Built on the well-known HowNet, OpenHowNet comprises three components: the core data, which consists of more than 100 thousand senses annotated with sememes; OpenHowNet Web, which gives a brief introduction to OpenHowNet and provides an online exhibition of OpenHowNet information; and the OpenHowNet API, which includes several useful APIs such as accessing OpenHowNet core data and drawing the sememe tree structures of senses. In the main text, we first give some background, including the definition of sememes and details of HowNet, and then introduce previous HowNet- and sememe-based research. Finally, we detail the constituents of OpenHowNet and their basic features and functionalities, and briefly summarize and list future work.

* 4 pages, 3 figures
In this work, we study the guaranteed delivery model, which is widely used in online display advertising. In the guaranteed delivery scenario, ad exposures (also called impressions in some works) to users are guaranteed by contracts signed in advance between advertisers and publishers. A crucial problem for the advertising platform is how to fully utilize the valuable user traffic to generate as much revenue as possible. Different from previous works, which usually minimize the penalty of unsatisfied contracts and some other cost (e.g. representativeness), we propose a novel consumption minimization model, in which the primary objective is to minimize the user traffic consumed to satisfy all contracts. Under this model, we develop a near-optimal method to deliver ads to users. The main advantage of our method is that it consumes nearly the least possible user traffic to satisfy all contracts, so more contracts can be accepted to produce more revenue. It also enables publishers to estimate how much user traffic is redundant or in short supply, so that they can sell or buy this part of the traffic in bulk on the exchange market. Furthermore, it is robust with respect to prior knowledge of the user type distribution. Finally, simulations show that our method outperforms traditional state-of-the-art methods.

* Accepted by AAAI'17
Visual Question Answering (VQA) faces two major challenges: how to better fuse the visual and textual modalities, and how to endow the VQA model with the reasoning ability to answer more complex questions. In this paper, we address both challenges by proposing the novel Question Guided Modular Routing Networks (QGMRN). QGMRN can fuse the visual and textual modalities at multiple semantic levels, which makes the fusion occur in a fine-grained way; it can also learn to reason by routing between the generic modules without additional supervision information or prior knowledge. The proposed QGMRN consists of three sub-networks: a visual network, a textual network and a routing network. The routing network selectively executes each module in the visual network according to the pathway activated by the question features generated by the textual network. Experiments on the CLEVR dataset show that our model can outperform the state of the art. Models and code will be released.

The rapid development of autonomous driving in recent years presents many challenges for scene understanding. As an essential step towards scene understanding, semantic segmentation has thus received much attention in the past few years. Although deep learning based state-of-the-art methods have achieved great success in improving segmentation accuracy, most of them suffer from inefficiency and can hardly be applied in practical applications. In this paper, we systematically analyze the computation cost of Convolutional Neural Networks (CNNs) and find that their inefficiency is caused mainly by their wide structure rather than their deep structure. In addition, the success of pruning-based model compression methods shows that there are many redundant channels in CNNs. We therefore design a very narrow yet deep backbone network to improve the efficiency of semantic segmentation. By casting our network into the FCN32 segmentation architecture, the basic structure of most segmentation methods, we achieve 60.6\% mIoU on the Cityscapes val set at 54 frames per second (FPS) on $1024\times2048$ inputs, which already outperforms ENet, one of the earliest real-time deep learning based segmentation methods.

* 6 pages
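A back-of-the-envelope calculation illustrating why width dominates the computation cost: the multiply-accumulate count of a 3x3 convolution stack grows quadratically with channel width but only linearly with depth. The configurations below are illustrative, not the paper's actual backbone.

```python
# Multiply-accumulate count of a stack of 3x3 conv layers on an HxW feature map:
#   cost ~= depth * H * W * k*k * C_in * C_out  (quadratic in width, linear in depth).
def conv_stack_macs(depth, channels, h=128, w=256, k=3):
    return depth * h * w * k * k * channels * channels

wide_shallow = conv_stack_macs(depth=10, channels=256)
narrow_deep = conv_stack_macs(depth=40, channels=64)
print(f"wide & shallow: {wide_shallow / 1e9:.1f} GMACs")   # ~193.3 GMACs
print(f"narrow & deep : {narrow_deep / 1e9:.1f} GMACs")    # ~48.3 GMACs
```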
The ICDM Challenge 2013 is to apply machine learning to the problem of hotel ranking, aiming to maximize purchases according to given hotel characteristics, the location attractiveness of hotels, users' aggregated purchase history and competitive online travel agency information for each potential hotel choice. This paper describes the solution of team "binghsu & MLRush & BrickMover". We conduct simple feature engineering and train different models by each individual team member. Afterwards, we use a listwise ensemble method to combine each model's output. Besides describing the effective models and features, we discuss the lessons we learned while using deep learning in this competition.

* 6 pages, 3 figures
Recent research on facial expression recognition has made a lot of progress due to the development of deep learning technologies, but some typical challenging problems, such as the variety of rich facial expressions and poses, are still not resolved. To solve these problems, we develop a new Facial Expression Recognition (FER) framework by incorporating facial poses into our image synthesis and classification process. There are two major novelties in this work. First, we create a new facial expression dataset of more than 200k images with 119 persons, 4 poses and 54 expressions. To our knowledge, this is the first dataset to label faces with subtle emotion changes for expression recognition purposes. It is also the first dataset that is large enough to validate the FER task on unbalanced poses, expressions, and zero-shot subject IDs. Second, we propose a facial pose generative adversarial network (FaPE-GAN) to synthesize new facial expression images to augment the dataset for training purposes, and then learn a LightCNN based Fa-Net model for expression classification. Finally, we advocate four novel learning tasks on this dataset. The experimental results well validate the effectiveness of the proposed approach.

* 10 pages, 8 figures
Research has shown the difficulty of obtaining proximity while maintaining diversity when solving many-objective optimization problems (MaOPs). The complexity of the true Pareto Front (PF) poses serious challenges for reference vector based algorithms because of their insufficient adaptability to the characteristics of the true PF when no prior knowledge is available. This paper proposes a many-objective optimization Algorithm with two Interacting processes: cascade Clustering and reference point incremental Learning (CLIA). In the population selection process based on cascade clustering, using the reference vectors provided by the incremental learning process, the non-dominated and the dominated individuals are clustered and sorted in different manners in a cascade style and are selected in a round-robin fashion for better proximity and diversity. In the reference vector adaptation process based on reference point incremental learning, using feedback from the clustering process, the proper distribution of reference points is gradually obtained by incremental learning and the reference vectors are repositioned accordingly. The advantages of CLIA lie not only in its effective and efficient performance, but also in its versatility in dealing with the diverse characteristics of the true PF, using only the interactions between the two processes without incurring extra evaluations. The experimental studies on many benchmark problems show that CLIA is competitive, efficient and versatile compared with state-of-the-art algorithms.

* IEEE Transactions on Evolutionary Computation, 2018
* 15 pages
Bronchoscopy inspection, as a follow-up procedure to radiological imaging, plays a key role in lung disease diagnosis and in determining treatment plans for patients. Doctors need to decide in a timely manner whether to biopsy a patient when performing bronchoscopy. However, doctors also need to be very selective with biopsies, as biopsies may cause uncontrollable bleeding of the lung tissue, which is life-threatening. To help doctors be more selective on biopsies and provide a second opinion on diagnosis, in this work we propose a computer-aided diagnosis (CAD) system for lung diseases, including cancers and tuberculosis (TB). The system is developed based on transfer learning, and we propose a novel transfer learning method: sentential fine-tuning. Compared to traditional fine-tuning methods, our method achieves the best performance. We obtained an overall accuracy of 77.0% on a dataset of 81 normal cases, 76 tuberculosis cases and 277 lung cancer cases, while the other traditional transfer learning methods achieve accuracies of 73% and 68%. The detection accuracies of our method for cancers, TB and normal cases are 87%, 54% and 91%, respectively. This indicates that the CAD system has the potential to improve lung disease diagnosis accuracy in bronchoscopy, and it might also help doctors be more selective with biopsies.
