Research papers and code for "Jie An":
3D shape analysis is an important research topic in computer vision and graphics. While existing methods have generalized image-based deep learning to meshes using graph-based convolutions, the lack of an effective pooling operation restricts the learning capability of their networks. In this paper, we propose a novel pooling operation for mesh datasets with the same connectivity but different geometry, by building a mesh hierarchy using mesh simplification. For this purpose, we develop a modified mesh simplification method to avoid generating highly irregularly sized triangles. Our pooling operation effectively encodes the correspondence between coarser and finer meshes in the hierarchy. We then present a variational auto-encoder structure with the edge contraction pooling and graph-based convolutions, to explore probability latent spaces of 3D surfaces. Our network requires far fewer parameters than the original mesh VAE and thus can handle denser models thanks to our new pooling operation and convolutional kernels. Our evaluation also shows that our method has better generalization ability and is more reliable in various applications, including shape generation, shape interpolation and shape embedding.

Click to Read Paper and Get Code
This article proposes the axiomatizations of contingency logics of various natural classes of neighborhood frames. In particular, by defining a suitable canonical neighborhood function, we give sound and complete axiomatizations of monotone contingency logic and regular contingency logic, thereby answering two open questions raised by Bakhtiari, van Ditmarsch, and Hansen. The canonical function is inspired by a function proposed by Kuhn in~1995. We show that Kuhn's function is actually equal to a related function originally given by Humberstone.

* 18 pages. arXiv admin note: substantial text overlap with arXiv:1802.03516
Click to Read Paper and Get Code
In this paper, we propose a principled deep reinforcement learning (RL) approach that is able to accelerate the convergence rate of general deep neural networks (DNNs). With our approach, a deep RL agent (synonym for optimizer in this work) is used to automatically learn policies about how to schedule learning rates during the optimization of a DNN. The state features of the agent are learned from the weight statistics of the optimizee during training. The reward function of this agent is designed to learn policies that minimize the optimizee's training time given a certain performance goal. The actions of the agent correspond to changing the learning rate for the optimizee during training. As far as we know, this is the first attempt to use deep RL to learn how to optimize a large-sized DNN. We perform extensive experiments on a standard benchmark dataset and demonstrate the effectiveness of the policies learned by our approach.

* We choose to withdraw this paper. The DQN itself has too many hyperparameters, which makes it almost impossible to be applied to reasonably large datasets. In the later versions (from v4) with SGDR experiments, it seems that the agent only performs random actions
Click to Read Paper and Get Code
In this paper, we propose a sampling-based planning and optimal control method of nonlinear systems under non-differentiable constraints. Motivated by developing scalable planning algorithms, we consider the optimal motion plan to be a feedback controller that can be approximated by a weighted sum of given bases. Given this approximate optimal control formulation, our main contribution is to introduce importance sampling, specifically, model-reference adaptive search algorithm, to iteratively compute the optimal weight parameters, i.e., the weights corresponding to the optimal policy function approximation given chosen bases. The key idea is to perform the search by iteratively estimating a parametrized distribution which converges to a Dirac's Delta that infinitely peaks on the global optimal weights. Then, using this direct policy search, we incorporated trajectory-based verification to ensure that, for a class of nonlinear systems, the obtained policy is not only optimal but robust to bounded disturbances. The correctness and efficiency of the methods are demonstrated through numerical experiments including linear systems with a nonlinear cost function and motion planning for a Dubins car.

* submitted to IEEE ACC 2017
Click to Read Paper and Get Code
Image perceptual hashing finds applications in content indexing, large-scale image database management, certification and authentication and digital watermarking. We propose a Block-DCT and PCA based image perceptual hash in this article and explore the algorithm in the application of tamper detection. The main idea of the algorithm is to integrate color histogram and DCT coefficients of image blocks as perceptual feature, then to compress perceptual features as inter-feature with PCA, and to threshold to create a robust hash. The robustness and discrimination properties of the proposed algorithm are evaluated in detail. Our algorithms first construct a secondary image, derived from input image by pseudo-randomly extracting features that approximately capture semi-global geometric characteristics. From the secondary image (which does not perceptually resemble the input), we further extract the final features which can be used as a hash value (and can be further suitably quantized). In this paper, we use spectral matrix invariants as embodied by Singular Value Decomposition. Surprisingly, formation of the secondary image turns out be quite important since it not only introduces further robustness, but also enhances the security properties. Indeed, our experiments reveal that our hashing algorithms extract most of the geometric information from the images and hence are robust to severe perturbations (e.g. up to %50 cropping by area with 20 degree rotations) on images while avoiding misclassification. Experimental results show that the proposed image perceptual hash algorithm can effectively address the tamper detection problem with advantageous robustness and discrimination.

* 7 pages, 5 figrues
Click to Read Paper and Get Code
The fundamental assignment problem is in search of welfare maximization mechanisms to allocate items to agents when the private preferences over indivisible items are provided by self-interested agents. The mainstream mechanism \textit{Random Priority} is asymptotically the best mechanism for this purpose, when comparing its welfare to the optimal social welfare using the canonical \textit{worst-case approximation ratio}. Despite its popularity, the efficiency loss indicated by the worst-case ratio does not have a constant bound. Recently, [Deng, Gao, Zhang 2017] show that when the agents' preferences are drawn from a uniform distribution, its \textit{average-case approximation ratio} is upper bounded by 3.718. They left it as an open question of whether a constant ratio holds for general scenarios. In this paper, we offer an affirmative answer to this question by showing that the ratio is bounded by $1/\mu$ when the preference values are independent and identically distributed random variables, where $\mu$ is the expectation of the value distribution. This upper bound also improves the upper bound of 3.718 in [Deng, Gao, Zhang 2017] for the Uniform distribution. Moreover, under mild conditions, the ratio has a \textit{constant} bound for any independent random values. En route to these results, we develop powerful tools to show the insights that in most instances the efficiency loss is small.

* To appear in IJCAI 2019
Click to Read Paper and Get Code
Semi-supervised learning, which has emerged from the beginning of this century, is a new type of learning method between traditional supervised learning and unsupervised learning. The main idea of semi-supervised learning is to introduce unlabeled samples into the model training process to avoid performance (or model) degeneration due to insufficiency of labeled samples. Semi-supervised learning has been applied successfully in many fields. This paper reviews the development process and main theories of semi-supervised learning, as well as its recent advances and importance in solving real-world problems demonstrated by typical application examples.

* Chinese language, 14 pages
Click to Read Paper and Get Code
Nowadays, listening music has been and will always be an indispensable part of our daily life. In recent years, sentiment analysis of music has been widely used in the information retrieval systems, personalized recommendation systems and so on. Due to the development of deep learning, this paper commits to find an effective approach for mood tagging of Chinese song lyrics. To achieve this goal, both machine-learning and deep-learning models have been studied and compared. Eventually, a CNN-based model with pre-trained word embedding has been demonstrated to effectively extract the distribution of emotional features of Chinese lyrics, with at least 15 percentage points higher than traditional machine-learning methods (i.e. TF-IDF+SVM and LIWC+SVM), and 7 percentage points higher than other deep-learning models (i.e. RNN, LSTM). In this paper, more than 160,000 lyrics corpus has been leveraged for pre-training word embedding for mood tagging boost.

Click to Read Paper and Get Code
With rapid development of neural networks, deep-learning has been extended to various natural language generation fields, such as machine translation, dialogue generation and even literature creation. In this paper, we propose a theme-aware language generation model for Chinese music lyrics, which improves the theme-connectivity and coherence of generated paragraphs greatly. A multi-channel sequence-to-sequence (seq2seq) model encodes themes and previous sentences as global and local contextual information. Moreover, attention mechanism is incorporated for sequence decoding, enabling to fuse context into predicted next texts. To prepare appropriate train corpus, LDA (Latent Dirichlet Allocation) is applied for theme extraction. Generated lyrics is grammatically correct and semantically coherent with selected themes, which offers a valuable modelling method in other fields including multi-turn chatbots, long paragraph generation and etc.

Click to Read Paper and Get Code
This paper focuses on a robotic picking tasks in cluttered scenario. Because of the diversity of objects and clutter by placing, it is much difficult to recognize and estimate their pose before grasping. Here, we use U-net, a special Convolution Neural Networks (CNN), to combine RGB images and depth information to predict picking region without recognition and pose estimation. The efficiency of diverse visual input of the network were compared, including RGB, RGB-D and RGB-Points. And we found the RGB-Points input could get a precision of 95.74%.

* 5 pages, 6 figures
Click to Read Paper and Get Code
Human eyes concentrate different facial regions during distinct cognitive activities. We study utilising facial visual saliency maps to classify different facial expressions into different emotions. Our results show that our novel method of merely using facial saliency maps can achieve a descent accuracy of 65\%, much higher than the chance level of $1/7$. Furthermore, our approach is of semi-supervision, i.e., our facial saliency maps are generated from a general saliency prediction algorithm that is not explicitly designed for face images. We also discovered that the classification accuracies of each emotional class using saliency maps demonstrate a strong positive correlation with the accuracies produced by face images. Our work implies that humans may look at different facial areas in order to perceive different emotions.

Click to Read Paper and Get Code
With the growing prevalence of smart grid technology, short-term load forecasting (STLF) becomes particularly important in power system operations. There is a large collection of methods developed for STLF, but selecting a suitable method under varying conditions is still challenging. This paper develops a novel reinforcement learning based dynamic model selection (DMS) method for STLF. A forecasting model pool is first built, including ten state-of-the-art machine learning based forecasting models. Then a Q-learning agent learns the optimal policy of selecting the best forecasting model for the next time step, based on the model performance. The optimal DMS policy is applied to select the best model at each time step with a moving window. Numerical simulations on two-year load and weather data show that the Q-learning algorithm converges fast, resulting in effective and efficient DMS. The developed STLF model with Q-learning based DMS improves the forecasting accuracy by approximately 50%, compared to the state-of-the-art machine learning based STLF models.

Click to Read Paper and Get Code
We present Semantic WordRank (SWR), an unsupervised method for generating an extractive summary of a single document. Built on a weighted word graph with semantic and co-occurrence edges, SWR scores sentences using an article-structure-biased PageRank algorithm with a Softplus function adjustment, and promotes topic diversity using spectral subtopic clustering under the Word-Movers-Distance metric. We evaluate SWR on the DUC-02 and SummBank datasets and show that SWR produces better summaries than the state-of-the-art algorithms over DUC-02 under common ROUGE measures. We then show that, under the same measures over SummBank, SWR outperforms each of the three human annotators (aka. judges) and compares favorably with the combined performance of all judges.

* 12 pages, accepted by IDEAL2018
Click to Read Paper and Get Code
We propose a novel dependency-based hybrid tree model for semantic parsing, which converts natural language utterance into machine interpretable meaning representations. Unlike previous state-of-the-art models, the semantic information is interpreted as the latent dependency between the natural language words in our joint representation. Such dependency information can capture the interactions between the semantics and natural language words. We integrate a neural component into our model and propose an efficient dynamic-programming algorithm to perform tractable inference. Through extensive experiments on the standard multilingual GeoQuery dataset with eight languages, we demonstrate that our proposed approach is able to achieve state-of-the-art performance across several languages. Analysis also justifies the effectiveness of using our new dependency-based representation.

* Accepted by EMNLP 2018
Click to Read Paper and Get Code
With recent progress in algorithms and the availability of massive amounts of computation power, application of machine learning techniques is becoming a hot topic in the oil and gas industry. One of the most promising aspects to apply machine learning to the upstream field is the rock facies classification in reservoir characterization, which is crucial in determining the net pay thickness of reservoirs, thus a definitive factor in drilling decision making process. For complex machine learning tasks like facies classification, feature engineering is often critical. This paper shows the inclusion of physics-motivated feature interaction in feature augmentation can further improve the capability of machine learning in rock facies classification. We demonstrate this approach with the SEG 2016 machine learning contest dataset and the top winning algorithms. The improvement is roboust and can be $\sim5\%$ better than current existing best F-1 score, where F-1 is an evaluation metric used to quantify average prediction accuracy.

* 8 pages, 7 figures
Click to Read Paper and Get Code
Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss functions as well as training nonconvex deep neural networks. The theory assumes that one can easily compute an unbiased gradient estimator, which is usually the case due to the sample average nature of empirical risk minimization. There exist, however, many scenarios (e.g., graph learning) where an unbiased estimator may be as expensive to compute as the full gradient, because training examples are interconnected. In a recent work, Chen et al. (2018) proposed using a consistent gradient estimator as an economic alternative. Encouraged by empirical success, we show, in a general setting, that consistent estimators result in the same convergence behavior as do unbiased ones. Our analysis covers strongly convex, convex, and nonconvex objectives. This work opens several new research directions, including the development of more efficient SGD updates with consistent estimators and the design of efficient training algorithms for large-scale graphs.

Click to Read Paper and Get Code
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.

* Accepted at ACL 2018 as Long paper
Click to Read Paper and Get Code
This paper is concerned with the hard thresholding operator which sets all but the $k$ largest absolute elements of a vector to zero. We establish a {\em tight} bound to quantitatively characterize the deviation of the thresholded solution from a given signal. Our theoretical result is universal in the sense that it holds for all choices of parameters, and the underlying analysis depends only on fundamental arguments in mathematical optimization. We discuss the implications for two domains: Compressed Sensing. On account of the crucial estimate, we bridge the connection between the restricted isometry property (RIP) and the sparsity parameter for a vast volume of hard thresholding based algorithms, which renders an improvement on the RIP condition especially when the true sparsity is unknown. This suggests that in essence, many more kinds of sensing matrices or fewer measurements are admissible for the data acquisition procedure. Machine Learning. In terms of large-scale machine learning, a significant yet challenging problem is learning accurate sparse models in an efficient manner. In stark contrast to prior work that attempted the $\ell_1$-relaxation for promoting sparsity, we present a novel stochastic algorithm which performs hard thresholding in each iteration, hence ensuring such parsimonious solutions. Equipped with the developed bound, we prove the {\em global linear convergence} for a number of prevalent statistical models under mild assumptions, even though the problem turns out to be non-convex.

* Journal of Machine Learning Research 18(208): 1-42, 2018
* V1 was submitted to COLT 2016. V2 fixes minor flaws, adds extra experiments and discusses time complexity, V3 has been accepted to JMLR
Click to Read Paper and Get Code
This paper describes NCRF++, a toolkit for neural sequence labeling. NCRF++ is designed for quick implementation of different neural sequence labeling models with a CRF inference layer. It provides users with an inference for building the custom model structure through configuration file with flexible neural feature design and utilization. Built on PyTorch, the core operations are calculated in batch, making the toolkit efficient with the acceleration of GPU. It also includes the implementations of most state-of-the-art neural sequence labeling models such as LSTM-CRF, facilitating reproducing and refinement on those methods.

* ACL 2018, demonstration paper
Click to Read Paper and Get Code