We present KOKO, a system that takes declarative information extraction to a new level by incorporating advances in natural language processing into its extraction language. KOKO is novel in that its extraction language simultaneously supports conditions on the surface text and on the structure of the dependency parse trees of sentences, thereby allowing for more refined extractions. KOKO also supports conditions that are forgiving to linguistic variation in how concepts are expressed, and allows evidence to be aggregated from the entire document in order to filter extractions. To scale up, KOKO exploits a multi-indexing scheme and heuristics for efficient extraction. We extensively evaluate KOKO over publicly available text corpora. We show that KOKO's indices take up the smallest amount of space and are notably faster and more effective than a number of prior indexing schemes. Finally, we demonstrate that KOKO scales to a corpus of 5 million Wikipedia articles.
We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference. Our method is based on iteratively adjusting the neural network parameters so that the output changes along a Stein variational gradient direction (Liu & Wang, 2016) that maximally decreases the KL divergence with the target distribution. Our method works for any target distribution specified by its unnormalized density function, and can train any black-box architecture that is differentiable with respect to the parameters we want to adapt. We demonstrate our method with a number of applications, including variational autoencoders (VAEs) with expressive encoders that model complex latent-space structures, and hyper-parameter learning of MCMC samplers that allows Bayesian inference to adaptively improve itself as more data is seen.
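To make the Stein variational gradient direction concrete, here is a minimal NumPy sketch of the particle-based update from Liu & Wang (2016) with an RBF kernel; the bandwidth h and step size are illustrative choices, not values from the paper:

    import numpy as np

    def svgd_direction(X, grad_log_p, h=1.0):
        """Stein variational gradient for particles X of shape (n, d):
        phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
        with an RBF kernel k(x, y) = exp(-||x - y||^2 / h)."""
        diffs = X[:, None, :] - X[None, :, :]             # (n, n, d), x_i - x_j
        K = np.exp(-np.sum(diffs ** 2, axis=-1) / h)      # kernel matrix (n, n)
        # kernel-smoothed log-density gradient plus a repulsive term that keeps
        # particles spread out; together they decrease the KL divergence
        phi = (K @ grad_log_p(X) + (2.0 / h) * np.einsum('ij,ijd->id', K, diffs)) / len(X)
        return phi

    # usage: move particles toward a standard normal target, grad log N(0, I) = -x
    rng = np.random.default_rng(0)
    X = 3.0 * rng.normal(size=(100, 2))
    for _ in range(500):
        X += 0.1 * svgd_direction(X, lambda X: -X)

In the paper's setting, the same direction is back-propagated into the sampler network's parameters rather than applied to particles directly.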
The eigendecomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly scalable, spectrum-preserving graph sparsification algorithm that enables the construction of ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph spectra, such as the first few eigenvectors of the original graph Laplacian. Our approach immediately leads to scalable spectral clustering of large data networks without sacrificing solution quality. The proposed method starts by constructing low-stretch spanning trees (LSSTs) from the original graphs, and then iteratively recovers small portions of "spectrally critical" off-tree edges to the LSSTs by leveraging a spectral off-tree embedding scheme. To determine the suitable number of off-tree edges to recover, an eigenvalue stability checking scheme is proposed, which robustly preserves the first few Laplacian eigenvectors within the sparsified graph. Additionally, an incremental graph densification scheme is proposed to identify extra edges that are missing from the original NN graphs but can still play important roles in spectral clustering tasks. Our experimental results on a variety of well-known data sets show that the proposed method can dramatically reduce the complexity of NN graphs, leading to significant speedups in spectral clustering.
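The eigenvalue stability check can be illustrated with a short SciPy sketch: compare the first few Laplacian eigenvalues before and after sparsification and stop recovering off-tree edges once they agree. The relative tolerance and the choice of eigensolver here are assumptions for illustration, not the paper's exact procedure:

    import numpy as np
    from scipy.sparse.linalg import eigsh

    def smallest_eigvals(L, k=5):
        """First k (smallest) eigenvalues of a sparse graph Laplacian."""
        return np.sort(eigsh(L, k=k, which='SM', return_eigenvectors=False))

    def spectrum_preserved(L_orig, L_sparse, k=5, tol=0.05):
        """True if the first k nontrivial eigenvalues moved by < tol (relative)."""
        v0, v1 = smallest_eigvals(L_orig, k), smallest_eigvals(L_sparse, k)
        rel = np.abs(v1[1:] - v0[1:]) / np.maximum(v0[1:], 1e-12)  # skip the zero eigenvalue
        return bool(np.all(rel < tol))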
Voxels are an important format for representing geometric data, and they have been widely used for 3D deep learning in shape analysis due to their generalization ability and regular structure. However, fine-grained tasks like part segmentation require detailed structural information, which pushes up the voxel resolution and in turn exhausts computational resources. In this paper, we propose a novel volumetric convolutional neural network that can extract discriminative features encoding detailed information from voxelized 3D data at a limited resolution. To this end, a spatial dense extraction (SDE) module is designed to preserve spatial resolution during feature extraction, alleviating the loss of detail caused by sub-sampling operations such as max-pooling. An attention feature aggregation (AFA) module is also introduced to adaptively select informative features from different abstraction scales, leading to segmentation with both semantic consistency and highly accurate details. Experimental results on a large-scale dataset demonstrate the effectiveness of our method for 3D shape part segmentation.
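As one plausible reading of the AFA module (the exact architecture is not specified in the abstract), the sketch below aggregates per-voxel feature vectors from several abstraction scales with attention weights; the scoring vector w is a hypothetical learned parameter:

    import numpy as np

    def attention_aggregate(feats, w):
        """Attention-weighted fusion of per-voxel features across scales.
        feats: list of (C,) vectors, one per abstraction scale; w: (C,) scorer."""
        scores = np.array([f @ w for f in feats])
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                           # softmax over scales
        return sum(a * f for a, f in zip(alpha, feats))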
Given two possible treatments, there may exist subgroups that benefit more from one treatment than the other. This problem is relevant to the field of marketing, where treatments may correspond to different ways of selling a product. It is similarly relevant to the field of public policy, where treatments may correspond to specific government programs. And finally, personalized medicine is a field wholly devoted to understanding which subgroups of individuals will benefit from particular medical treatments. We present a computationally fast tree-based method, ABtree, for treatment effect differentiation. Unlike other methods, ABtree specifically produces decision rules for optimal treatment assignment on a per-individual basis. The treatment choices are selected to maximize the overall occurrence of a desired binary outcome, conditional on a set of covariates. In this poster, we present the methodology for tree growth and pruning, and show performance results on both simulated and real data.
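The abstract does not give ABtree's exact split criterion, but the flavor of treatment-effect-aware splitting can be sketched as follows: a split is scored by how many extra successes the children could achieve if each were assigned its empirically better treatment. This is a hypothetical, simplified criterion for illustration only:

    import numpy as np

    def node_value(y, t):
        """Expected successes if everyone in the node received the better arm.
        y: binary outcomes; t: treatment indicator (0 = A, 1 = B)."""
        rate_a = y[t == 0].mean() if np.any(t == 0) else 0.0
        rate_b = y[t == 1].mean() if np.any(t == 1) else 0.0
        return len(y) * max(rate_a, rate_b)

    def split_gain(y, t, left):
        """Value of the two children minus the parent; grow the tree on the
        covariate threshold (boolean mask `left`) that maximizes this."""
        return node_value(y[left], t[left]) + node_value(y[~left], t[~left]) - node_value(y, t)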
In this paper we study the problem of optimally laying out an offshore wind farm to minimize wake-effect impacts. Considering the specific requirements of the offshore wind farm in question, we propose an adaptive genetic algorithm (AGA) that introduces location swaps in place of the random crossovers of conventional GAs. This way, the total number of turbines in the resulting layout is kept exactly at the initially specified value. We evaluate the proposed AGA method on three cases, with free wind speeds of 12 m/s and 20 m/s and a typical offshore wind distribution, respectively. Numerical results verify the effectiveness of the proposed algorithm, which converges much faster than conventional GAs.
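The key invariant behind the location-swap operator is easy to see in code: swapping one occupied grid cell with one empty cell can never change the number of turbines, whereas a random crossover can. A minimal sketch over a binary grid layout (the flat grid encoding is an assumption for illustration):

    import numpy as np

    def location_swap(layout, rng):
        """Move a turbine from an occupied cell to an empty cell; the count
        of 1s (turbines) in the layout is preserved exactly."""
        occupied = np.flatnonzero(layout == 1)
        empty = np.flatnonzero(layout == 0)
        child = layout.copy()
        child[rng.choice(occupied)] = 0
        child[rng.choice(empty)] = 1
        return child

    rng = np.random.default_rng(0)
    parent = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    assert location_swap(parent, rng).sum() == parent.sum()  # turbine count fixed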
We present CGO-AS, a generalized Ant System (AS) implemented in the framework of Cooperative Group Optimization (CGO), to show how optimization can be leveraged by mixing individual and social learning. The ant colony is a simple yet efficient natural system for understanding the effects of primary intelligence on optimization. However, existing AS algorithms mostly focus on the capability of using social heuristic cues while ignoring individual learning. CGO can integrate the advantages of a cooperative group with a low-level algorithm-portfolio design, and the agents of CGO can perform both individual and social search. In CGO-AS, each ant (agent) is equipped with an individual memory and implements a novel search strategy that uses individual and social cues in a controlled proportion. CGO-AS is therefore especially useful for exposing the power of mixed individual and social learning in improving optimization. Optimization performance is tested on instances of the Traveling Salesman Problem (TSP). The results show that a cooperative ant group using both individual and social learning outperforms systems that use either form of learning alone. The best performance is achieved when agents use individual memory as their primary information source while simultaneously using social memory as search guidance. Compared with existing AS systems, CGO-AS reaches higher-quality solutions at a faster learning speed, especially in the later learning cycles. The leverage in optimization achieved by CGO-AS is most likely due to its inherent ability to adaptively maintain population diversity in the agents' individual memories and to accelerate learning with knowledge accumulated in the social memory.
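The controlled mix of individual and social cues might be sketched as below; this is a hypothetical simplification (the actual CGO-AS strategy is more elaborate) in which an ant follows its own best-tour memory with probability p_ind and otherwise samples from the shared pheromone matrix:

    import numpy as np

    def choose_next_city(current, unvisited, pheromone, own_best_next, p_ind, rng):
        """Hypothetical mixed rule: individual memory as the primary source,
        social (pheromone) memory as guidance. `unvisited` is a 1-D array of
        city indices; `own_best_next` comes from the ant's own best tour."""
        if own_best_next in unvisited and rng.random() < p_ind:
            return own_best_next                   # individual cue
        w = pheromone[current, unvisited]          # social cue
        return unvisited[rng.choice(len(unvisited), p=w / w.sum())]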
In the ECOC framework, the ternary coding strategy is widely deployed in the coding process. It relabels classes with {-1, 0, 1}, where -1/1 assigns the corresponding classes to the negative/positive group, and 0 causes the corresponding classes to be ignored during training. However, such hard labels may lose information about the tendency of the class distributions. Instead, we propose a Centroid distance-based Soft coding scheme, named CSECOC, to capture this tendency. In our algorithm, Sequential Forward Floating Selection (SFFS) is applied to search for an optimal class assignment by minimizing the ratio of intra-group to inter-group distance; this generates an initial hard coding matrix. We then propose a measure, called coverage, that describes the probability of a sample from a class falling into the correct group. The coverage of a class in a group replaces the corresponding hard element, forming a soft coding matrix. Compared with hard elements, such soft elements reflect the tendency of a class to belong to the positive or negative group. Regressors, rather than classifiers, are used as base learners in this algorithm. To the best of our knowledge, this is the first time a soft coding scheme has been proposed. Results on five UCI datasets show that, compared with some state-of-the-art ECOC algorithms, our algorithm produces comparable or better classification accuracy with small-scale ensembles.
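A minimal sketch of the coverage measure as described, here approximated by nearest-centroid assignment (an assumption, since the abstract does not fix the assignment rule):

    import numpy as np

    def coverage(X_class, centroid_own, centroid_other):
        """Fraction of a class's samples lying closer to the centroid of
        their assigned group than to the opposite group's centroid."""
        d_own = np.linalg.norm(X_class - centroid_own, axis=1)
        d_other = np.linalg.norm(X_class - centroid_other, axis=1)
        return float(np.mean(d_own < d_other))

The soft matrix element for a class assigned to the positive group would then be +coverage, and -coverage for the negative group.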
We propose to prune a random forest (RF) for resource-constrained prediction. We first construct an RF and then prune it to optimize expected feature cost and accuracy. We pose RF pruning as a novel 0-1 integer program with linear constraints that encourage feature re-use. We establish total unimodularity of the constraint set to prove that the corresponding LP relaxation solves the original integer program exactly. We then exploit connections to combinatorial optimization and develop an efficient primal-dual algorithm that scales to large datasets. In contrast to our bottom-up approach, which benefits from a good RF initialization, conventional methods are top-down, acquiring features based on their utility value, and are generally intractable, requiring heuristics. Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms.
This paper addresses how to more effectively coordinate depth with RGB in order to boost the performance of RGB-D object detection. In particular, we investigate two primary ideas under the CNN model: property derivation and property fusion. Firstly, we propose that depth can be utilized not only as extra information alongside RGB but also to derive further visual properties that comprehensively describe the objects of interest. We therefore construct a two-stage learning framework consisting of property derivation and fusion, where properties can be derived either from the provided color/depth channels or from their pairs (e.g., the geometric contour adopted in this paper). Secondly, we explore how different properties should be fused during feature learning, which, under the CNN model, boils down to deciding at which layer the properties should be combined. Our analysis shows that different semantic properties should be learned separately and combined just before the final classifier. Such a detection scheme is in fact consistent with the mechanism of the primary visual cortex (V1) in the brain. We experimentally evaluate the proposed method on a challenging dataset and achieve state-of-the-art performance.
Recurrent neural networks (RNNs) can be used to solve sequence-to-sequence problems, where both the input and the output have sequential structure. Usually there are implicit relations between the two structures, but it is hard for the common RNN model to fully exploit them. In this survey, we introduce attention-based RNN models, which can focus on different parts of the input for each output item in order to explore and take advantage of the implicit relations between input and output items. The different attention mechanisms are described in detail. We then introduce some applications in computer vision that apply attention-based RNN models. The superiority of the attention-based RNN model is shown by experimental results. Finally, some future research directions are given.
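As a concrete example of the attention mechanisms surveyed, here is a minimal NumPy sketch of Bahdanau-style additive attention, in which each output step computes a convex combination of encoder states; the parameter shapes are illustrative:

    import numpy as np

    def additive_attention(h_enc, s_dec, W_h, W_s, v):
        """h_enc: (T, d_h) encoder states; s_dec: (d_s,) decoder state;
        W_h: (d_a, d_h), W_s: (d_a, d_s), v: (d_a,) learned parameters."""
        scores = np.tanh(h_enc @ W_h.T + s_dec @ W_s.T) @ v  # (T,) alignment scores
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                                 # attention weights
        context = alpha @ h_enc                              # weighted sum of inputs
        return context, alpha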
We consider the problem of learning decision rules for prediction under a feature budget constraint. In particular, we are interested in pruning an ensemble of decision trees to reduce expected feature cost while maintaining high prediction accuracy for any test example. We propose a novel 0-1 integer program formulation for ensemble pruning. Our pruning formulation is general: it takes any ensemble of decision trees as input. By explicitly accounting for feature sharing across trees together with the accuracy/cost trade-off, our method is able to significantly reduce feature cost by pruning subtrees that add more feature cost than they contribute in prediction accuracy. Theoretically, we prove that a linear programming relaxation produces the exact solution of the original integer program. This allows us to use efficient convex optimization tools to obtain an optimally pruned ensemble for any given budget. Empirically, our pruning algorithm significantly improves the performance of the state-of-the-art ensemble method BudgetRF.
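The exactness of the LP relaxation can be demonstrated on a toy instance: with "keep child implies keep parent" constraints (one +1 and one -1 per row, a totally unimodular matrix), the relaxed program returns an integral solution. The numbers below are illustrative, not the paper's formulation:

    import numpy as np
    from scipy.optimize import linprog

    gain = np.array([0.00, 0.15, 0.05, 0.02])  # accuracy contribution per subtree
    cost = np.array([0.0, 1.0, 2.0, 1.5])      # expected feature cost per subtree
    c = cost - 10.0 * gain                     # minimize cost minus weighted gain

    # z_child <= z_parent for a tiny tree: node 0 is the root, 1 its child,
    # and nodes 2, 3 are children of node 1
    A_ub = np.array([[-1, 1, 0, 0],
                     [0, -1, 1, 0],
                     [0, -1, 0, 1]])
    b_ub = np.zeros(3)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * 4)
    print(res.x)  # [1, 1, 0, 0]: integral despite the continuous relaxation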
We seek decision rules for prediction-time cost reduction, where complete data is available for training but, at prediction time, each feature can only be acquired at an additional cost. We propose a novel random forest algorithm that minimizes prediction error for a user-specified average feature acquisition budget. While random forests yield strong generalization performance, they do not explicitly account for feature costs, and they require low correlation among trees, which amplifies costs. Our random forest grows trees with low acquisition cost and high strength based on greedy minimax cost-weighted-impurity splits. Theoretically, we establish near-optimal acquisition cost guarantees for our algorithm. Empirically, on a number of benchmark datasets we demonstrate superior accuracy-cost curves against state-of-the-art prediction-time algorithms.
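A simplified sketch of a cost-weighted impurity split score in the spirit of the algorithm; the linear cost penalty with weight alpha is an illustrative stand-in, not the paper's exact minimax criterion:

    import numpy as np

    def gini(y):
        """Gini impurity of a label vector."""
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def cost_weighted_split_score(y_left, y_right, feature_cost, alpha=1.0):
        """Weighted child impurity plus a penalty for acquiring the splitting
        feature; lower is better, so cheap, pure splits are preferred."""
        n = len(y_left) + len(y_right)
        impurity = (len(y_left) * gini(y_left) + len(y_right) * gini(y_right)) / n
        return impurity + alpha * feature_cost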
We propose novel methods for the max-cost Discrete Function Evaluation Problem (DFEP) under budget constraints. We are motivated by applications such as clinical diagnosis, where a patient is subjected to a sequence of (possibly expensive) tests before a decision is made. Our goal is to develop strategies for minimizing max-costs. The problem is known to be NP-hard, and greedy methods based on specialized impurity functions have been proposed. We develop a broad class of admissible impurity functions that admit monomials, classes of polynomials, and hinge-loss functions, allowing for flexible impurity design with provably optimal approximation bounds. This flexibility is important for datasets where max-cost can be overly sensitive to "outliers": outliers bias max-cost toward a few examples that require a large number of tests for classification. We design admissible functions that allow for an accuracy-cost trade-off and result in O(log n) guarantees on the optimal cost among trees with corresponding classification accuracy levels.
Binary codes have been widely used in vision problems as a compact feature representation that achieves both space and time advantages. Various methods have been proposed to learn data-dependent hash functions that map a feature vector to a binary code. However, considerable information is inevitably lost during the binarization step, which also causes ambiguity when measuring sample similarity with the Hamming distance. Moreover, the learned hash functions cannot be changed after training, making them incapable of adapting to new data outside the training set. To address both issues, in this paper we propose a flexible bitwise weight learning framework based on the binary codes obtained by state-of-the-art hashing methods, and incorporate the learned weights into a weighted Hamming distance computation. We formulate the framework as a ranking problem and leverage the Ranking SVM model to tackle the weight learning offline. The framework is further extended to an online mode that updates the weights each time new data arrives, making it scalable to large and dynamic data sets. Extensive experimental results demonstrate significant performance gains from using binary codes with bitwise weighting in image retrieval tasks. Appealingly, the online weight learning achieves accuracy comparable to its offline counterpart, which makes our approach practical for realistic applications.
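A minimal sketch of the weighted Hamming distance used at query time: each bit position contributes its learned weight instead of a uniform 1, so informative bits can dominate the ranking:

    import numpy as np

    def weighted_hamming(a, b, w):
        """a, b: binary codes (0/1 arrays); w: learned non-negative bit weights."""
        return float(np.sum(w * (a != b)))

    a = np.array([1, 0, 1, 1])
    b = np.array([1, 1, 0, 1])
    print(weighted_hamming(a, b, np.ones(4)))                       # plain Hamming: 2.0
    print(weighted_hamming(a, b, np.array([0.9, 0.1, 0.7, 0.3])))   # weighted: 0.8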
This paper describes our solution to the ICML multi-modal learning challenge. The solution comprises constructing three levels of representation in three consecutive stages and choosing correct tag words with a data-specific strategy. Firstly, we use typical methods to obtain level-1 representations: each image is represented using MPEG-7 and gist descriptors together with additional features released by the contest organizers, and the corresponding word tags are represented by a bag-of-words model with a dictionary of 4000 words. Secondly, we learn level-2 representations using two stacked RBMs for each modality. Thirdly, we propose a bimodal auto-encoder to learn the similarities/dissimilarities between paired image-tag inputs as level-3 representations. Finally, during the test phase, based on one observation about the dataset, we devise a data-specific strategy to choose the correct tag words, leading to a leap in overall performance. Our final average accuracy on the private test set is 100%, ranking first in this challenge.
Visual reasoning is a special visual question answering problem that is multi-step and compositional by nature and requires intensive text-vision interactions. We propose CMM: Cascaded Mutual Modulation, a novel end-to-end visual reasoning model. CMM includes a multi-step comprehension process for both question and image; in each step, we use a Feature-wise Linear Modulation (FiLM) technique to let the textual and visual pipelines mutually control each other. Experiments show that CMM significantly outperforms most related models and reaches state-of-the-art performance on two visual reasoning benchmarks, CLEVR and NLVR, collected from synthetic and natural language respectively. Ablation studies confirm that both our multi-step framework and our visually guided language modulation are critical to the task. Our code is available at https://github.com/FlamingHorizon/CMM-VR.
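The FiLM operation at the heart of each modulation step is a per-channel affine transform whose parameters are predicted by the other pipeline; a minimal sketch (the tensor shapes are illustrative):

    import numpy as np

    def film(x, gamma, beta):
        """Feature-wise Linear Modulation: x is a (C, H, W) feature map;
        gamma and beta are (C,) vectors predicted from the other modality."""
        return gamma[:, None, None] * x + beta[:, None, None]

In CMM this modulation runs in both directions, so the question representation is also modulated by visual features at each cascade step.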
Computer-vision-based fine-grained recognition has received great attention in recent years. Existing works focus on discriminative part localization and feature learning. In this paper, to improve the performance of fine-grained recognition, we first try to precisely locate as many salient parts of the object as possible. Then we measure the classification probability that can be obtained by using each part separately for object classification. Finally, by extracting efficient features from each part, combining them, and feeding the result to a classifier, we obtain improved accuracy over state-of-the-art algorithms on the CUB200-2011 bird dataset.
The aim of sparse approximation is to estimate a sparse signal from a measurement matrix and an observation vector. It is widely used in data analytics, image processing, and communication. A great deal of research has been done in this area and many off-the-shelf algorithms have been proposed, but most of them cannot offer a real-time solution, which limits their application prospects. To address this issue, we devise a novel sparse approximation algorithm based on the Lagrange programming neural network (LPNN), the locally competitive algorithm (LCA), and the projection theorem. LPNN and LCA are both analog neural networks that can produce real-time solutions. The non-differentiable objective function is handled via the LCA concept, and, utilizing the projection theorem, we further modify the dynamics and propose a new system with global asymptotic stability. Simulation results show that the proposed sparse approximation method produces real-time solutions with satisfactory MSEs.
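For reference, here is a minimal discrete-time sketch of the standard LCA dynamics (Rozell et al.), whose soft-thresholding activation handles the non-differentiable l1 term; the paper's LPNN-plus-projection variant modifies these dynamics, which this sketch does not reproduce:

    import numpy as np

    def soft_threshold(u, lam):
        """Proximal operator of the l1 norm; handles the non-differentiability."""
        return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

    def lca(Phi, y, lam=0.1, tau=0.01, steps=5000):
        """Euler-discretized LCA: at equilibrium, a solves the LASSO-style
        problem min ||y - Phi a||^2 / 2 + lam * ||a||_1."""
        G = Phi.T @ Phi                      # neuron interaction (inhibition) matrix
        b = Phi.T @ y                        # input drive
        u = np.zeros(Phi.shape[1])           # internal states
        for _ in range(steps):
            a = soft_threshold(u, lam)       # firing rates (the sparse code)
            u += tau * (b - u - (G - np.eye(len(u))) @ a)
        return soft_threshold(u, lam)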
In this paper, we propose a conceptually simple and geometrically interpretable objective function for deep face verification: the additive margin Softmax (AM-Softmax). In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and inter-class difference is large is of great importance for good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than existing works. We also emphasize and discuss the importance of feature normalization. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin Softmax loss consistently outperforms current state-of-the-art methods using the same network architecture and training dataset. Our code is available at https://github.com/happynear/AMSoftmax
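A minimal NumPy sketch of the AM-Softmax loss: both features and class weights are l2-normalized, the margin m is subtracted from the target-class cosine, and the result is scaled by s before the usual cross-entropy (s = 30 and m = 0.35 are values commonly used with this loss):

    import numpy as np

    def am_softmax_loss(feats, W, labels, s=30.0, m=0.35):
        """feats: (N, d) embeddings; W: (d, C) class weights; labels: (N,)."""
        f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        w = W / np.linalg.norm(W, axis=0, keepdims=True)
        logits = s * (f @ w)                    # scaled cosine similarities
        idx = np.arange(len(labels))
        logits[idx, labels] -= s * m            # additive margin on the target class
        mx = logits.max(axis=1, keepdims=True)  # numerically stable log-sum-exp
        lse = mx[:, 0] + np.log(np.exp(logits - mx).sum(axis=1))
        return float(np.mean(lse - logits[idx, labels]))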