Models, code, and papers for "Han Qiu":
In this manuscript we consider the problem of jointly estimating multiple graphical models in high dimensions. We assume that the data are collected from n subjects, each of which consists of T possibly dependent observations. The graphical models of subjects vary, but are assumed to change smoothly corresponding to a measure of closeness between subjects. We propose a kernel based method for jointly estimating all graphical models. Theoretically, under a double asymptotic framework, where both (T,n) and the dimension d can increase, we provide the explicit rate of convergence in parameter estimation. It characterizes the strength one can borrow across different individuals and impact of data dependence on parameter estimation. Empirically, experiments on both synthetic and real resting state functional magnetic resonance imaging (rs-fMRI) data illustrate the effectiveness of the proposed method.
We propose an approximation algorithm for efficient correlation search in time series data. In our method, we use Fourier transform and neural network to embed time series into a low-dimensional Euclidean space. The given space is learned such that time series correlation can be effectively approximated from Euclidean distance between corresponding embedded vectors. Therefore, search for correlated time series can be done using an index in the embedding space for efficient nearest neighbor search. Our theoretical analysis illustrates that our method's accuracy can be guaranteed under certain regularity conditions. We further conduct experiments on real-world datasets and the results show that our method indeed outperforms the baseline solution. In particular, for approximation of correlation, our method reduces the approximation loss by a half in most test cases compared to the baseline solution. For top-$k$ highest correlation search, our method improves the precision from 5\% to 20\% while the query time is similar to the baseline approach query time.
Generating plausible hair image given limited guidance, such as sparse sketches or low-resolution image, has been made possible with the rise of Generative Adversarial Networks (GANs). Traditional image-to-image translation networks can generate recognizable results, but finer textures are usually lost and blur artifacts commonly exist. In this paper, we propose a two-phase generative model for high-quality hair image synthesis. The two-phase pipeline first generates a coarse image by an existing image translation model, then applies a re-generating network with self-enhancing capability to the coarse image. The self-enhancing capability is achieved by a proposed structure extraction layer, which extracts the texture and orientation map from a hair image. Extensive experiments on two tasks, Sketch2Hair and Hair Super-Resolution, demonstrate that our approach is able to synthesize plausible hair image with finer details, and outperforms the state-of-the-art.
With more encrypted network traffic gets involved in the Internet, how to effectively identify network traffic has become a top priority in the field. Accurate identification of the network traffic is the footstone of basic network services, say QoE, bandwidth allocation, and IDS. Previous identification methods either cannot deal with encrypted traffics or require experts to select tons of features to attain a relatively decent accuracy.In this paper, we present a Deep Learning based end-to-end network traffic identification framework, termed TEST, to avoid the aforementioned problems. CNN and LSTM are combined and implemented to help the machine automatically extract features from both special and time-related features of the raw traffic. The presented framework has two layers of structure, which made it possible to attain a remarkable accuracy on both encrypted traffic classification and intrusion detection tasks. The experimental results demonstrate that our model can outperform previous methods with a state-of-the-art accuracy of 99.98%.
Diabetic retinopathy is the most important complication of diabetes. Early diagnosis of retinal lesions helps to avoid visual loss or blindness. Due to high-resolution and small-size lesion regions, applying existing methods, such as U-Nets, to perform segmentation on fundus photography is very challenging. Although downsampling the input images could simplify the problem, it loses detailed information. Conducting patch-level analysis helps reaching fine-scale segmentation yet usually leads to misunderstanding as the lack of context information. In this paper, we propose an efficient network that combines them together, not only being aware of local details but also taking fully use of the context perceptions. This is implemented by integrating the decoder parts of a global-level U-net and a patch-level one. The two streams are jointly optimized, ensuring that they are enhanced mutually. Experimental results demonstrate our new framework significantly outperforms existing patch-based and global-based methods, especially when the lesion regions are scattered and small-scaled.
In this paper, we propose the first sketching system for interactively personalized and photorealistic face caricaturing. Input an image of a human face, the users can create caricature photos by manipulating its facial feature curves. Our system firstly performs exaggeration on the recovered 3D face model according to the edited sketches, which is conducted by assigning the laplacian of each vertex a scaling factor. To construct the mapping between 2D sketches and a vertex-wise scaling field, a novel deep learning architecture is developed. With the obtained 3D caricature model, two images are generated, one obtained by applying 2D warping guided by the underlying 3D mesh deformation and the other obtained by re-rendering the deformed 3D textured model. These two images are then seamlessly integrated to produce our final output. Due to the severely stretching of meshes, the rendered texture is of blurry appearances. A deep learning approach is exploited to infer the missing details for enhancing these blurry regions. Moreover, a relighting operation is invented to further improve the photorealism of the result. Both quantitative and qualitative experiment results validated the efficiency of our sketching system and the superiority of our proposed techniques against existing methods.
The recent research of facial expression recognition has made a lot of progress due to the development of deep learning technologies, but some typical challenging problems such as the variety of rich facial expressions and poses are still not resolved. To solve these problems, we develop a new Facial Expression Recognition (FER) framework by involving the facial poses into our image synthesizing and classification process. There are two major novelties in this work. First, we create a new facial expression dataset of more than 200k images with 119 persons, 4 poses and 54 expressions. To our knowledge this is the first dataset to label faces with subtle emotion changes for expression recognition purpose. It is also the first dataset that is large enough to validate the FER task on unbalanced poses, expressions, and zero-shot subject IDs. Second, we propose a facial pose generative adversarial network (FaPE-GAN) to synthesize new facial expression images to augment the data set for training purpose, and then learn a LightCNN based Fa-Net model for expression classification. Finally, we advocate four novel learning tasks on this dataset. The experimental results well validate the effectiveness of the proposed approach.
The neural seq2seq based question generation (QG) is prone to generating generic and undiversified questions that are poorly relevant to the given passage and target answer. In this paper, we propose two methods to address the issue. (1) By a partial copy mechanism, we prioritize words that are morphologically close to words in the input passage when generating questions; (2) By a QA-based reranker, from the n-best list of question candidates, we select questions that are preferred by both the QA and QG model. Experiments and analyses demonstrate that the proposed two methods substantially improve the relevance of generated questions to passages and answers.
Selecting input data or design points for statistical models has been of great interest in sequential design and active learning. In this paper, we present a new strategy of selecting the design points for a regression model when the underlying regression function is discontinuous. Two main motivating examples are (1) compressed material imaging with the purpose of accelerating the imaging speed and (2) design for regression analysis over a phase diagram in chemistry. In both examples, the underlying regression functions have discontinuities, so many of the existing design optimization approaches cannot be applied for the two examples because they mostly assume a continuous regression function. There are some studies for estimating a discontinuous regression function from its noisy observations, but all noisy observations are typically provided in advance in these studies. In this paper, we develop a design strategy of selecting the design points for regression analysis with discontinuities. We first review the existing approaches relevant to design optimization and active learning for regression analysis and discuss their limitations in handling a discontinuous regression function. We then present our novel design strategy for a regression analysis with discontinuities: some statistical properties with a fixed design will be presented first, and then these properties will be used to propose a new criterion of selecting the design points for the regression analysis. Sequential design of experiments with the new criterion will be presented with numerical examples.
The potential positive impact of autonomous driving and driver assistance technolo- gies have been a major impetus over the last decade. On the flip side, it has been a challenging problem to analyze the performance of human drivers or autonomous driving agents quantitatively. In this work, we propose a generic method that compares the performance of drivers or autonomous driving agents even if the environmental conditions are different, by using the driver behavioral advantage instead of absolute metrics, which efficiently removes the environmental factors. A concrete application of the method is also presented, where the performance of more than 100 truck drivers was evaluated and ranked in terms of fuel efficiency, covering more than 90,000 trips spanning an average of 300 miles in a variety of driving conditions and environments.
Optimization problems with uncertain fitness functions are common in the real world, and present unique challenges for evolutionary optimization approaches. Existing issues include excessively expensive evaluation, lack of solution reliability, and incapability in maintaining high overall fitness during optimization. Using conversion rate optimization as an example, this paper proposes a series of new techniques for addressing these issues. The main innovation is to augment evolutionary algorithms by allocating evaluation budget through multi-armed bandit algorithms. Experimental results demonstrate that multi-armed bandit algorithms can be used to allocate evaluations efficiently, select the winning solution reliably and increase overall fitness during exploration. The proposed methods can be generalized to any optimization problems with noisy fitness functions.
Most density-based clustering methods largely rely on how well the underlying density is estimated. However, density estimation itself is also a challenging problem, especially the determination of the kernel bandwidth. A large bandwidth could lead to the over-smoothed density estimation in which the number of density peaks could be less than the true clusters, while a small bandwidth could lead to the under-smoothed density estimation in which spurious density peaks, or called the "ripple noise", would be generated in the estimated density. In this paper, we propose a density-based hierarchical clustering method, called the Deep Nearest Neighbor Descent (D-NND), which could learn the underlying density structure layer by layer and capture the cluster structure at the same time. The over-smoothed density estimation could be largely avoided and the negative effect of the under-estimated cases could be also largely reduced. Overall, D-NND presents not only the strong capability of discovering the underlying cluster structure but also the remarkable reliability due to its insensitivity to parameters.
We present a dictionary learning approach to compensate for the transformation of faces due to changes in view point, illumination, resolution, etc. The key idea of our approach is to force domain-invariant sparse coding, i.e., design a consistent sparse representation of the same face in different domains. In this way, classifiers trained on the sparse codes in the source domain consisting of frontal faces for example can be applied to the target domain (consisting of faces in different poses, illumination conditions, etc) without much loss in recognition accuracy. The approach is to first learn a domain base dictionary, and then describe each domain shift (identity, pose, illumination) using a sparse representation over the base dictionary. The dictionary adapted to each domain is expressed as sparse linear combinations of the base dictionary. In the context of face recognition, with the proposed compositional dictionary approach, a face image can be decomposed into sparse representations for a given subject, pose and illumination respectively. This approach has three advantages: first, the extracted sparse representation for a subject is consistent across domains and enables pose and illumination insensitive face recognition. Second, sparse representations for pose and illumination can subsequently be used to estimate the pose and illumination condition of a face image. Finally, by composing sparse representations for subject and the different domains, we can also perform pose alignment and illumination normalization. Extensive experiments using two public face datasets are presented to demonstrate the effectiveness of our approach for face recognition.
Nowadays, data are generated massively and rapidly from scientific fields as bioinformatics, neuroscience and astronomy to business and engineering fields. Cluster analysis, as one of the major data analysis tools, is therefore more significant than ever. We propose in this work an effective Semi-supervised Divisive Clustering algorithm (SDC). Data points are first organized by a minimal spanning tree. Next, this tree structure is transitioned to the in-tree structure, and then divided into sub-trees under the supervision of the labeled data, and in the end, all points in the sub-trees are directly associated with specific cluster centers. SDC is fully automatic, non-iterative, involving no free parameter, insensitive to noise, able to detect irregularly shaped cluster structures, applicable to the data sets of high dimensionality and different attributes. The power of SDC is demonstrated on several datasets.
Social network analysis (SNA), which is a research field describing and modeling the social connection of a certain group of people, is popular among network services. Our topic words analysis project is a SNA method to visualize the topic words among emails from Obama.com to accounts registered in Columbus, Ohio. Based on Latent Dirichlet Allocation (LDA) model, a popular topic model of SNA, our project characterizes the preference of senders for target group of receptors. Gibbs sampling is used to estimate topic and word distribution. Our training and testing data are emails from the carbon-free server Datagreening.com. We use parallel computing tool BashReduce for word processing and generate related words under each latent topic to discovers typical information of political news sending specially to local Columbus receptors. Running on two instances using paralleling tool BashReduce, our project contributes almost 30% speedup processing the raw contents, comparing with processing contents on one instance locally. Also, the experimental result shows that the LDA model applied in our project provides precision rate 53.96% higher than TF-IDF model finding target words, on the condition that appropriate size of topic words list is selected.
A low-rank transformation learning framework for subspace clustering and classification is here proposed. Many high-dimensional data, such as face images and motion sequences, approximately lie in a union of low-dimensional subspaces. The corresponding subspace clustering problem has been extensively studied in the literature to partition such high-dimensional data into clusters corresponding to their underlying low-dimensional subspaces. However, low-dimensional intrinsic structures are often violated for real-world observations, as they can be corrupted by errors or deviate from ideal models. We propose to address this by learning a linear transformation on subspaces using matrix rank, via its convex surrogate nuclear norm, as the optimization criteria. The learned linear transformation restores a low-rank structure for data from the same subspace, and, at the same time, forces a a maximally separated structure for data from different subspaces. In this way, we reduce variations within subspaces, and increase separation between subspaces for a more robust subspace clustering. This proposed learned robust subspace clustering framework significantly enhances the performance of existing subspace clustering methods. Basic theoretical results here presented help to further support the underlying framework. To exploit the low-rank structures of the transformed subspaces, we further introduce a fast subspace clustering technique, which efficiently combines robust PCA with sparse modeling. When class labels are present at the training stage, we show this low-rank transformation framework also significantly enhances classification performance. Extensive experiments using public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art methods for subspace clustering and classification.
We propose a low-rank transformation-learning framework to robustify subspace clustering. Many high-dimensional data, such as face images and motion sequences, lie in a union of low-dimensional subspaces. The subspace clustering problem has been extensively studied in the literature to partition such high-dimensional data into clusters corresponding to their underlying low-dimensional subspaces. However, low-dimensional intrinsic structures are often violated for real-world observations, as they can be corrupted by errors or deviate from ideal models. We propose to address this by learning a linear transformation on subspaces using matrix rank, via its convex surrogate nuclear norm, as the optimization criteria. The learned linear transformation restores a low-rank structure for data from the same subspace, and, at the same time, forces a high-rank structure for data from different subspaces. In this way, we reduce variations within the subspaces, and increase separations between the subspaces for more accurate subspace clustering. This proposed learned robust subspace clustering framework significantly enhances the performance of existing subspace clustering methods. To exploit the low-rank structures of the transformed subspaces, we further introduce a subspace clustering technique, called Robust Sparse Subspace Clustering, which efficiently combines robust PCA with sparse modeling. We also discuss the online learning of the transformation, and learning of the transformation while simultaneously reducing the data dimensionality. Extensive experiments using public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art subspace clustering methods.
In the recent work of Candes et al, the problem of recovering low rank matrix corrupted by i.i.d. sparse outliers is studied and a very elegant solution, principal component pursuit, is proposed. It is motivated as a tool for video surveillance applications with the background image sequence forming the low rank part and the moving objects/persons/abnormalities forming the sparse part. Each image frame is treated as a column vector of the data matrix made up of a low rank matrix and a sparse corruption matrix. Principal component pursuit solves the problem under the assumptions that the singular vectors of the low rank matrix are spread out and the sparsity pattern of the sparse matrix is uniformly random. However, in practice, usually the sparsity pattern and the signal values of the sparse part (moving persons/objects) change in a correlated fashion over time, for e.g., the object moves slowly and/or with roughly constant velocity. This will often result in a low rank sparse matrix. For video surveillance applications, it would be much more useful to have a real-time solution. In this work, we study the online version of the above problem and propose a solution that automatically handles correlated sparse outliers. The key idea of this work is as follows. Given an initial estimate of the principal directions of the low rank part, we causally keep estimating the sparse part at each time by solving a noisy compressive sensing type problem. The principal directions of the low rank part are updated every-so-often. In between two update times, if new Principal Components' directions appear, the "noise" seen by the Compressive Sensing step may increase. This problem is solved, in part, by utilizing the time correlation model of the low rank part. We call the proposed solution "Real-time Robust Principal Components' Pursuit".
We revisit the initialization of deep residual networks (ResNets) by introducing a novel analytical tool in free probability to the community of deep learning. This tool deals with non-Hermitian random matrices, rather than their conventional Hermitian counterparts in the literature. As a consequence, this new tool enables us to evaluate the singular value spectrum of the input-output Jacobian of a fully- connected deep ResNet for both linear and nonlinear cases. With the powerful tool of free probability, we conduct an asymptotic analysis of the spectrum on the single-layer case, and then extend this analysis to the multi-layer case of an arbitrary number of layers. In particular, we propose to rescale the classical random initialization by the number of residual units, so that the spectrum has the order of $O(1)$, when compared with the large width and depth of the network. We empirically demonstrate that the proposed initialization scheme learns at a speed of orders of magnitudes faster than the classical ones, and thus attests a strong practical relevance of this investigation.
Kernel method is a very powerful tool in machine learning. The trick of kernel has been effectively and extensively applied in many areas of machine learning, such as support vector machine (SVM) and kernel principal component analysis (kernel PCA). Kernel trick is to define a kernel function which relies on the inner-product of data in the feature space without knowing these feature space data. In this paper, the kernel trick will be employed to extend the algorithm of spectrum sensing with leading eigenvector under the framework of PCA to a higher dimensional feature space. Namely, the leading eigenvector of the sample covariance matrix in the feature space is used for spectrum sensing without knowing the leading eigenvector explicitly. Spectrum sensing with leading eigenvector under the framework of kernel PCA is proposed with the inner-product as a measure of similarity. A modified kernel GLRT algorithm based on matched subspace model will be the first time applied to spectrum sensing. The experimental results on simulated sinusoidal signal show that spectrum sensing with kernel PCA is about 4 dB better than PCA, besides, kernel GLRT is also better than GLRT. The proposed algorithms are also tested on the measured DTV signal. The simulation results show that kernel methods are 4 dB better than the corresponding linear methods. The leading eigenvector of the sample covariance matrix learned by kernel PCA is more stable than that learned by PCA for different segments of DTV signal.