Models, code, and papers for "Li Xu":
Images have become one of the most popular types of media through which users convey their emotions within online social networks. Although vast amount of research is devoted to sentiment analysis of textual data, there has been very limited work that focuses on analyzing sentiment of image data. In this work, we propose a novel visual sentiment prediction framework that performs image understanding with Deep Convolutional Neural Networks (CNN). Specifically, the proposed sentiment prediction framework performs transfer learning from a CNN with millions of parameters, which is pre-trained on large-scale data for object recognition. Experiments conducted on two real-world datasets from Twitter and Tumblr demonstrate the effectiveness of the proposed visual sentiment analysis framework.
In this paper, we focus on the task of generating a pun sentence given a pair of word senses. A major challenge for pun generation is the lack of large-scale pun corpus to guide the supervised learning. To remedy this, we propose an adversarial generative network for pun generation (Pun-GAN), which does not require any pun corpus. It consists of a generator to produce pun sentences, and a discriminator to distinguish between the generated pun sentences and the real sentences with specific word senses. The output of the discriminator is then used as a reward to train the generator via reinforcement learning, encouraging it to produce pun sentences that can support two word senses simultaneously. Experiments show that the proposed Pun-GAN can generate sentences that are more ambiguous and diverse in both automatic and human evaluation.
Sparse representation based classification (SRC) methods have achieved remarkable results. SRC, however, still suffer from requiring enough training samples, insufficient use of test samples and instability of representation. In this paper, a stable inverse projection representation based classification (IPRC) is presented to tackle these problems by effectively using test samples. An IPR is firstly proposed and its feasibility and stability are analyzed. A classification criterion named category contribution rate is constructed to match the IPR and complete classification. Moreover, a statistical measure is introduced to quantify the stability of representation-based classification methods. Based on the IPRC technique, a robust tumor recognition framework is presented by interpreting microarray gene expression data, where a two-stage hybrid gene selection method is introduced to select informative genes. Finally, the functional analysis of candidate's pathogenicity-related genes is given. Extensive experiments on six public tumor microarray gene expression datasets demonstrate the proposed technique is competitive with state-of-the-art methods.
Segmentation of colorectal cancerous regions from Magnetic Resonance (MR) images is a crucial procedure for radiotherapy which conventionally requires accurate delineation of tumour boundaries at an expense of labor, time and reproducibility. To address this important yet challenging task within the framework of performance-leading deep learning methods, regions of interest (RoIs) localization from large whole volume 3D images serves as a preceding operation that brings about multiple benefits in terms of speed, target completeness and reduction of false positives. Distinct from sliding window or discrete localization-segmentation based models, we propose a novel multi-task framework referred to as 3D RoI-aware U-Net (3D RU-Net), for RoI localization and intra-RoI segmentation where the two tasks share one backbone encoder network. With the region proposals from the encoder, we crop multi-level feature maps from the backbone network to form a GPU memory-efficient decoder for detail-preserving intra-RoI segmentation. To effectively train the model, we designed a Dice formulated loss function for the global-to-local multi-task learning procedure. Based on the promising efficiency gains demonstrated by the proposed method, we went on to ensemble multiple models to achieve even higher performance costing minor extra computational expensiveness. Extensive experiments were subsequently conducted on 64 cancerous cases with a four-fold cross-validation, and the results showed significant superiority in terms of accuracy and efficiency over conventional state-of-the art frameworks. In conclusion, the proposed method has a huge potential for extension to other 3D object segmentation tasks from medical images due to its inherent generalizability. The code for the proposed method is publicly available.
Link prediction is critical for the application of incomplete knowledge graph (KG) in the downstream tasks. As a family of effective approaches for link predictions, embedding methods try to learn low-rank representations for both entities and relations such that the bilinear form defined therein is a well-behaved scoring function. Despite of their successful performances, existing bilinear forms overlook the modeling of relation compositions, resulting in lacks of interpretability for reasoning on KG. To fulfill this gap, we propose a new model called DihEdral, named after dihedral symmetry group. This new model learns knowledge graph embeddings that can capture relation compositions by nature. Furthermore, our approach models the relation embeddings parametrized by discrete values, thereby decrease the solution space drastically. Our experiments show that DihEdral is able to capture all desired properties such as (skew-) symmetry, inversion and (non-) Abelian composition, and outperforms existing bilinear form based approach and is comparable to or better than deep learning models such as ConvE.
In this work, we address the task of weakly-supervised human action segmentation in long, untrimmed videos. Recent methods have relied on expensive learning models, such as Recurrent Neural Networks (RNN) and Hidden Markov Models (HMM). However, these methods suffer from expensive computational cost, thus are unable to be deployed in large scale. To overcome the limitations, the keys to our design are efficiency and scalability. We propose a novel action modeling framework, which consists of a new temporal convolutional network, named Temporal Convolutional Feature Pyramid Network (TCFPN), for predicting frame-wise action labels, and a novel training strategy for weakly-supervised sequence modeling, named Iterative Soft Boundary Assignment (ISBA), to align action sequences and update the network in an iterative fashion. The proposed framework is evaluated on two benchmark datasets, Breakfast and Hollywood Extended, with four different evaluation metrics. Extensive experimental results show that our methods achieve competitive or superior performance to state-of-the-art methods.
Neural segmentation has a great impact on the smooth implementation of local anesthesia surgery. At present, the network for the segmentation includes U-NET  and SegNet . U-NET network has short training time and less training parameters, but the depth is not deep enough. SegNet network has deeper structure, but it needs longer training time, and more training samples. In this paper, we propose an improved U-NET neural network for the segmentation. This network deepens the original structure through importing residual network. Compared with U-NET and SegNet, the improved U-NET network has fewer training parameters, shorter training time and get a great improvement in segmentation effect. The improved U-NET network structure has a good application scene in neural segmentation.
Action segmentation as a milestone towards building automatic systems to understand untrimmed videos has received considerable attention in the recent years. It is typically being modeled as a sequence labeling problem but contains intrinsic and sufficient differences than text parsing or speech processing. In this paper, we introduce a novel hybrid temporal convolutional and recurrent network (TricorNet), which has an encoder-decoder architecture: the encoder consists of a hierarchy of temporal convolutional kernels that capture the local motion changes of different actions; the decoder is a hierarchy of recurrent neural networks that are able to learn and memorize long-term action dependencies after the encoding stage. Our model is simple but extremely effective in terms of video sequence labeling. The experimental results on three public action segmentation datasets have shown that the proposed model achieves superior performance over the state of the art.
This paper first analyzes the resolution complexity of two random CSP models (i.e. Model RB/RD) for which we can establish the existence of phase transitions and identify the threshold points exactly. By encoding CSPs into CNF formulas, it is proved that almost all instances of Model RB/RD have no tree-like resolution proofs of less than exponential size. Thus, we not only introduce new families of CNF formulas hard for resolution, which is a central task of Proof-Complexity theory, but also propose models with both many hard instances and exact phase transitions. Then, the implications of such models are addressed. It is shown both theoretically and experimentally that an application of Model RB/RD might be in the generation of hard satisfiable instances, which is not only of practical importance but also related to some open problems in cryptography such as generating one-way functions. Subsequently, a further theoretical support for the generation method is shown by establishing exponential lower bounds on the complexity of solving random satisfiable and forced satisfiable instances of RB/RD near the threshold. Finally, conclusions are presented, as well as a detailed comparison of Model RB/RD with the Hamiltonian cycle problem and random 3-SAT, which, respectively, exhibit three different kinds of phase transition behavior in NP-complete problems.
To study the structure of solutions for random k-SAT and random CSPs, this paper introduces the concept of average similarity degree to characterize how solutions are similar to each other. It is proved that under certain conditions, as r (i.e. the ratio of constraints to variables) increases, the limit of average similarity degree when the number of variables approaches infinity exhibits phase transitions at a threshold point, shifting from a smaller value to a larger value abruptly. For random k-SAT this phenomenon will occur when k>4 . It is further shown that this threshold point is also a singular point with respect to r in the asymptotic estimate of the second moment of the number of solutions. Finally, we discuss how this work is helpful to understand the hardness of solving random instances and a possible application of it to the design of search algorithms.
Phase transition is an important feature of SAT problem. For random k-SAT model, it is proved that as r (ratio of clauses to variables) increases, the structure of solutions will undergo a sudden change like satisfiability phase transition when r reaches a threshold point. This phenomenon shows that the satisfying truth assignments suddenly shift from being relatively different from each other to being very similar to each other.
In this paper we propose a random CSP model, called Model GB, which is a natural generalization of standard Model B. It is proved that Model GB in which each constraint is easy to satisfy exhibits non-trivial behaviour (not trivially satisfiable or unsatisfiable) as the number of variables approaches infinity. A detailed analysis to obtain an asymptotic estimate (good to 1+o(1)) of the average number of nodes in a search tree used by the backtracking algorithm on Model GB is also presented. It is shown that the average number of nodes required for finding all solutions or proving that no solution exists grows exponentially with the number of variables. So this model might be an interesting distribution for studying the nature of hard instances and evaluating the performance of CSP algorithms. In addition, we further investigate the behaviour of the average number of nodes as r (the ratio of constraints to variables) varies. The results indicate that as r increases, random CSP instances get easier and easier to solve, and the base for the average number of nodes that is exponential in r tends to 1 as r approaches infinity. Therefore, although the average number of nodes used by the backtracking algorithm on random CSP is exponential, many CSP instances will be very easy to solve when r is sufficiently large.
In this paper we propose a new type of random CSP model, called Model RB, which is a revision to the standard Model B. It is proved that phase transitions from a region where almost all problems are satisfiable to a region where almost all problems are unsatisfiable do exist for Model RB as the number of variables approaches infinity. Moreover, the critical values at which the phase transitions occur are also known exactly. By relating the hardness of Model RB to Model B, it is shown that there exist a lot of hard instances in Model RB.
Answer selection is an important subtask of question answering (QA), where deep models usually achieve better performance. Most deep models adopt question-answer interaction mechanisms, such as attention, to get vector representations for answers. When these interaction based deep models are deployed for online prediction, the representations of all answers need to be recalculated for each question. This procedure is time-consuming for deep models with complex encoders like BERT which usually have better accuracy than simple encoders. One possible solution is to store the matrix representation (encoder output) of each answer in memory to avoid recalculation. But this will bring large memory cost. In this paper, we propose a novel method, called hashing based answer selection (HAS), to tackle this problem. HAS adopts a hashing strategy to learn a binary matrix representation for each answer, which can dramatically reduce the memory cost for storing the matrix representations of answers. Hence, HAS can adopt complex encoders like BERT in the model, but the online prediction of HAS is still fast with a low memory cost. Experimental results on three popular answer selection datasets show that HAS can outperform existing models to achieve state-of-the-art performance.
Recently, bidirectional recurrent neural network (BRNN) has been widely used for question answering (QA) tasks with promising performance. However, most existing BRNN models extract the information of questions and answers by directly using a pooling operation to generate the representation for loss or similarity calculation. Hence, these existing models don't put supervision (loss or similarity calculation) at every time step, which will lose some useful information. In this paper, we propose a novel BRNN model called full-time supervision based BRNN (FTS-BRNN), which can put supervision at every time step. Experiments on the factoid QA task show that our FTS-BRNN can outperform other baselines to achieve the state-of-the-art accuracy.
We recently have witnessed many ground-breaking results in machine learning and computer vision, generated by using deep convolutional neural networks (CNN). While the success mainly stems from the large volume of training data and the deep network architectures, the vector processing hardware (e.g. GPU) undisputedly plays a vital role in modern CNN implementations to support massive computation. Though much attention was paid in the extent literature to understand the algorithmic side of deep CNN, little research was dedicated to the vectorization for scaling up CNNs. In this paper, we studied the vectorization process of key building blocks in deep CNNs, in order to better understand and facilitate parallel implementation. Key steps in training and testing deep CNNs are abstracted as matrix and vector operators, upon which parallelism can be easily achieved. We developed and compared six implementations with various degrees of vectorization with which we illustrated the impact of vectorization on the speed of model training and testing. Besides, a unified CNN framework for both high-level and low-level vision tasks is provided, along with a vectorized Matlab implementation with state-of-the-art speed performance.
Deep learning achieves excellent performance in many domains, so we not only apply it to the navel orange semantic segmentation task to solve the two problems of distinguishing defect categories and identifying the stem end and blossom end, but also propose a fastidious attention mechanism to further improve model performance. This lightweight attention mechanism includes two learnable parameters, activations and thresholds, to capture long-range dependence. Specifically, the threshold picks out part of the spatial feature map and the activation excite this area. Based on activations and thresholds training from different types of feature maps, we design fastidious self-attention module (FSAM) and fastidious inter-attention module (FIAM). And then construct the Fastidious Attention Network (FANet), which uses U-Net as the backbone and embeds these two modules, to solve the problems with semantic segmentation for stem end, blossom end, flaw and ulcer. Compared with some state-of-the-art deep-learning-based networks under our navel orange dataset, experiments show that our network is the best performance with pixel accuracy 99.105%, mean accuracy 77.468%, mean IU 70.375% and frequency weighted IU 98.335%. And embedded modules show better discrimination of 5 categories including background, especially the IU of flaw is increased by 3.165%.
We propose a fast potential splitting Markov Chain Monte Carlo method which costs $O(1)$ time each step for sampling from equilibrium distributions (Gibbs measures) corresponding to particle systems with singular interacting kernels. We decompose the interacting potential into two parts, one is of long range but is smooth, and the other one is of short range but may be singular. To displace a particle, we first evolve a selected particle using the stochastic differential equation (SDE) under the smooth part with the idea of random batches, as commonly used in stochastic gradient Langevin dynamics. Then, we use the short range part to do a Metropolis rejection. Different from the classical Langevin dynamics, we only run the SDE dynamics with random batch for a short duration of time so that the cost in the first step is $O(p)$, where $p$ is the batch size. The cost of the rejection step is $O(1)$ since the interaction used is of short range. We justify the proposed random-batch Monte Carlo method, which combines the random batch and splitting strategies, both in theory and with numerical experiments. While giving comparable results for typical examples of the Dyson Brownian motion and Lennard-Jones fluids, our method can save more time when compared to the classical Metropolis-Hastings algorithm.
Training generative models that can generate high-quality text with sufficient diversity is an important open problem for Natural Language Generation (NLG) community. Recently, generative adversarial models have been applied extensively on text generation tasks, where the adversarially trained generators alleviate the exposure bias experienced by conventional maximum likelihood approaches and result in promising generation quality. However, due to the notorious defect of mode collapse for adversarial training, the adversarially trained generators face a quality-diversity trade-off, i.e., the generator models tend to sacrifice generation diversity severely for increasing generation quality. In this paper, we propose a novel approach which aims to improve the performance of adversarial text generation via efficiently decelerating mode collapse of the adversarial training. To this end, we introduce a cooperative training paradigm, where a language model is cooperatively trained with the generator and we utilize the language model to efficiently shape the data distribution of the generator against mode collapse. Moreover, instead of engaging the cooperative update for the generator in a principled way, we formulate a meta learning mechanism, where the cooperative update to the generator serves as a high level meta task, with an intuition of ensuring the parameters of the generator after the adversarial update would stay resistant against mode collapse. In the experiment, we demonstrate our proposed approach can efficiently slow down the pace of mode collapse for the adversarial text generators. Overall, our proposed method is able to outperform the baseline approaches with significant margins in terms of both generation quality and diversity in the testified domains.
We present a novel approach to category-level 6D object pose and size estimation. To tackle intra-class shape variation, we learn canonical shape space (CASS), a unified representation for a large variety of instances of a certain object category. In particular, CASS is modeled as the latent space of a deep generative model of canonical 3D shapes with normalized pose and size. We train a variational auto-encoder (VAE) for generating 3D point clouds in the canonical space from an RGBD image. The VAE is trained in a cross-category fashion, exploiting the publicly available large 3D shape repositories. Since the 3D point cloud is generated in normalized pose and size, the encoder of the VAE learns view-factorized RGBD embedding. It maps an RGBD image in arbitrary view into a pose-independent 3D shape representation. Object pose and size are then estimated via contrasting it with a pose-dependent feature of the input RGBD extracted with a separate deep neural networks. We integrate the learning of CASS and pose and size estimation into an end-to-end trainable network, achieving the state-of-the-art pose and size accuracy.