Models, code, and papers for "Bin Liu":
In computer vision, an entity such as an image or video is often represented as a set of instance vectors, which can be SIFT, motion, or deep learning feature vectors extracted from different parts of that entity. Thus, it is essential to design efficient and effective methods to compare two sets of instance vectors. Existing methods such as FV, VLAD or Super Vectors have achieved excellent results. However, this paper shows that these methods are designed based on a generative perspective, and a discriminative method can be more effective in categorizing images or videos. The proposed D3 (discriminative distribution distance) method effectively compares two sets as two distributions, and proposes a directional total variation distance (DTVD) to measure how separated are they. Furthermore, a robust classifier-based method is proposed to estimate DTVD robustly. The D3 method is evaluated in action and image recognition tasks and has achieved excellent accuracy and speed. D3 also has a synergy with FV. The combination of D3 and FV has advantages over D3, FV, and VLAD.
Bayesian optimization (BO) is a powerful paradigm for derivative-free global optimization of a black-box objective function (BOF) that is expensive to evaluate. However, the overhead of BO can still be prohibitive if the maximum number of allowed function evaluations is less than required. In this paper, we investigate how to reduce the required number of function evaluations for BO without compromise in solution quality. We explore the idea of posterior regularization for harnessing low fidelity (LF) data within the Gaussian process upper confidence bound (GP-UCB) framework. The LF data are assumed to arise from previous evaluations of an LF approximation of the BOF. An extra GP expert called LF-GP is trained to fit the LF data. We develop a dynamic weighted product of experts (DW-POE) fusion operator. The regularization is induced from this operator on the posterior of the BOF. The impact of the LF-GP expert on the resulting regularized posterior is adaptively adjusted via Bayesian formalism. Extensive experimental results on benchmark BOF optimization tasks demonstrate the superior performance of the proposed algorithm over state-of-the-art.
This contribution presents a very brief and critical discussion on automated machine learning (AutoML), which is categorized here into two classes, referred to as narrow AutoML and generalized AutoML, respectively. The conclusions yielded from this discussion can be summarized as follows: (1) most existent research on AutoML belongs to the class of narrow AutoML; (2) advances in narrow AutoML are mainly motivated by commercial needs, while any possible benefit obtained is definitely at a cost of increase in computing burdens; (3)the concept of generalized AutoML has a strong tie in spirit with artificial general intelligence (AGI), also called "strong AI", for which obstacles abound for obtaining pivotal progresses.
There is a recent interest in developing statistical filtering methods for stochastic optimization (FSO) by leveraging a probabilistic perspective of the incremental proximity methods (IPMs). The existent FSO methods are derived based on the Kalman filter (KF) and extended KF (EKF). Different with classical stochastic optimization methods such as the stochastic gradient descent (SGD) and typical IPMs, such KF-type algorithms possess a desirable property, namely they do not require pre-scheduling of the learning rate for convergence. However, on the other side, they have inherent limitations inherited from the nature of KF mechanisms. It is a consensus that the class of particle filters (PFs) outperforms the KF and its variants remarkably for nonlinear and/or non-Gaussian statistical filtering tasks. Hence, it is natural to ask if the FSO methods can benefit from the PF theory to get around of the limitations of the KF-type IPMs. We provide an affirmative answer to the aforementioned question by developing three PF based SO (PFSO) algorithms. We also provide a discussion of relationships among (1) PF methods designed for stochastic dynamic filtering; (2) PF methods designed for static parameter estimation; and (3) our PFSO algorithms. For performance evaluation, we apply the proposed algorithms to solve a least-square fitting problem using simulated dataset, and the empirical risk minimization (ERM) problem in binary classification using real datasets. The experimental results demonstrate that our algorithms outperform remarkably existent methods in terms of numerical stability, convergence speed, classification error rate and flexibility in handling different types of models and loss functions.
Class-imbalance is an inherent characteristic of multi-label data which affects the prediction accuracy of most multi-label learning methods. One efficient strategy to deal with this problem is to employ resampling techniques before training the classifier. Existing multilabel sampling methods alleviate the (global) imbalance of multi-label datasets. However, performance degradation is mainly due to rare subconcepts and overlapping of classes that could be analysed by looking at the local characteristics of the minority examples, rather than the imbalance of the whole dataset. We propose a new method for synthetic oversampling of multi-label data that focuses on local label distribution to generate more diverse and better labeled instances. Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed approach in a variety of evaluation measures, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data.
Class imbalance is an intrinsic characteristic of multi-label data. Most of the labels in multi-label data sets are associated with a small number of training examples, much smaller compared to the size of the data set. Class imbalance poses a key challenge that plagues most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception to this rule, as each of the binary models it builds is trained from all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, where we build a varying number of binary models per label and construct chains of different sizes, in order to improve the exploitation of majority examples with approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches in a variety of evaluation metrics.
This paper is concerned with a recently developed paradigm for population-based optimization, termed particle filter optimization (PFO). This paradigm is attractive in terms of coherence in theory and easiness in mathematical analysis and interpretation. Current PFO algorithms only work for single-objective optimization cases, while many real-life problems involve multiple objectives to be optimized simultaneously. To this end, we make an effort to extend the scope of application of the PFO paradigm to multi-objective optimization (MOO) cases. An idea called path sampling is adopted within the PFO scheme to balance the different objectives to be optimized. The resulting algorithm is thus termed PFO with Path Sampling (PFOPS). The validity of the presented algorithm is assessed based on three benchmark MOO experiments, in which the shapes of the Pareto fronts are convex, concave and discontinuous, respectively.
In this article, we are concerned with tracking an object of interest in video stream. We propose an algorithm that is robust against occlusion, the presence of confusing colors, abrupt changes in the object feature space and changes in object size. We develop the algorithm within a Bayesian modeling framework. The state space model is used for capturing the temporal correlation in the sequence of frame images by modeling the underlying dynamics of the tracking system. The Bayesian model averaging (BMA) strategy is proposed for fusing multi-clue information in the observations. Any number of object features are allowed to be involved in the proposed framework. Every feature represents one source of information to be fused and is associated with an observation model. The state inference is performed by employing the particle filter methods. In comparison with related approaches, the BMA based tracker is shown to have robustness, expressivity, and comprehensibility.
A robust algorithm solution is proposed for tracking an object in complex video scenes. In this solution, the bootstrap particle filter (PF) is initialized by an object detector, which models the time-evolving background of the video signal by an adaptive Gaussian mixture. The motion of the object is expressed by a Markov model, which defines the state transition prior. The color and texture features are used to represent the object, and a marginal likelihood based feature fusion approach is proposed. A corresponding object template model updating procedure is developed to account for possible scale changes of the object in the tracking process. Experimental results show that our algorithm beats several existing alternatives in tackling challenging scenarios in video tracking tasks.
AxialGen is a research prototype for automatically generating the axial map, which consists of the least number of the longest visibility lines (or axial lines) for representing individual linearly stretched parts of open space of an urban environment. Open space is the space between closed spaces such as buildings and street blocks. This paper aims to provide an accessible guide to software AxialGen, and the underlying concepts and ideas. We concentrate on the explanation and illustration of the key concept of bucket: its definition, formation and how it is used in generating the axial map. Keywords: Bucket, visibility, medial axes, axial lines, isovists, axial map
Axial lines are defined as the longest visibility lines for representing individual linear spaces in urban environments. The least number of axial lines that cover the free space of an urban environment or the space between buildings constitute what is often called an axial map. This is a fundamental tool in space syntax, a theory developed by Bill Hillier and his colleagues for characterizing the underlying urban morphologies. For a long time, generating axial lines with help of some graphic software has been a tedious manual process that is criticized for being time consuming, subjective, or even arbitrary. In this paper, we redefine axial lines as the least number of individual straight line segments mutually intersected along natural streets that are generated from street center lines using the Gestalt principle of good continuity. Based on this new definition, we develop an automatic solution to generating the newly defined axial lines from street center lines. We apply this solution to six typical street networks (three from North America and three from Europe), and generate a new set of axial lines for analyzing the urban morphologies. Through a comparison study between the new axial lines and the conventional or old axial lines, and between the new axial lines and natural streets, we demonstrate with empirical evidence that the newly defined axial lines are a better alternative in capturing the underlying urban structure. Keywords: Space syntax, street networks, topological analysis, traffic, head/tail division rule
Based on the concepts of isovists and medial axes, we developed a set of algorithms that can automatically generate axial lines for representing individual linearly stretched parts of open space of an urban environment. Open space is the space between buildings, where people can freely move around. The generation of the axial lines has been a key aspect of space syntax research, conventionally relying on hand-drawn axial lines of an urban environment, often called axial map, for urban morphological analysis. Although various attempts have been made towards an automatic solution, few of them can produce the axial map that consists of the least number of longest visibility lines, and none of them really works for different urban environments. Our algorithms provide a better solution than existing ones. Throughout this paper, we have also argued and demonstrated that the axial lines constitute a true skeleton, superior to medial axes, in capturing what we perceive about the urban environment. Keywords: Visibility, space syntax, topological analysis, medial axes, axial lines, isovists
Detection of arbitrarily rotated objects is a challenging task due to the difficulties of locating the multi-angle objects and separating them effectively from the background. The existing methods are not robust to angle varies of the objects because of the use of traditional bounding box, which is a rotation variant structure for locating rotated objects. In this article, a new detection method is proposed which applies the newly defined rotatable bounding box (RBox). The proposed detector (DRBox) can effectively handle the situation where the orientation angles of the objects are arbitrary. The training of DRBox forces the detection networks to learn the correct orientation angle of the objects, so that the rotation invariant property can be achieved. DRBox is tested to detect vehicles, ships and airplanes on satellite images, compared with Faster R-CNN and SSD, which are chosen as the benchmark of the traditional bounding box based methods. The results shows that DRBox performs much better than traditional bounding box based methods do on the given tasks, and is more robust against rotation of input image and target objects. Besides, results show that DRBox correctly outputs the orientation angles of the objects, which is very useful for locating multi-angle objects efficiently. The code and models are available at https://github.com/liulei01/DRBox.
This paper addresses maximum likelihood (ML) estimation based model fitting in the context of extrasolar planet detection. This problem is featured by the following properties: 1) the candidate models under consideration are highly nonlinear; 2) the likelihood surface has a huge number of peaks; 3) the parameter space ranges in size from a few to dozens of dimensions. These properties make the ML search a very challenging problem, as it lacks any analytical or gradient based searching solution to explore the parameter space. A population based searching method, called estimation of distribution algorithm (EDA), is adopted to explore the model parameter space starting from a batch of random locations. EDA is featured by its ability to reveal and utilize problem structures. This property is desirable for characterizing the detections. However, it is well recognized that EDAs can not scale well to large scale problems, as it consists of iterative random sampling and model fitting procedures, which results in the well-known dilemma curse of dimensionality. A novel mechanism to perform EDAs in interactive random subspaces spanned by correlated variables is proposed and the hope is to alleviate the curse of dimensionality for EDAs by performing the operations of sampling and model fitting in lower dimensional subspaces. The effectiveness of the proposed algorithm is verified via both benchmark numerical studies and real data analysis.
In this paper, we are concerned with a branch of evolutionary algorithms termed estimation of distribution (EDA), which has been successfully used to tackle derivative-free global optimization problems. For existent EDA algorithms, it is a common practice to use a Gaussian distribution or a mixture of Gaussian components to represent the statistical property of available promising solutions found so far. Observing that the Student's t distribution has heavier and longer tails than the Gaussian, which may be beneficial for exploring the solution space, we propose a novel EDA algorithm termed ESTDA, in which the Student's t distribution, rather than Gaussian, is employed. To address hard multimodal and deceptive problems, we extend ESTDA further by substituting a single Student's t distribution with a mixture of Student's t distributions. The resulting algorithm is named as estimation of mixture of Student's t distribution algorithm (EMSTDA). Both ESTDA and EMSTDA are evaluated through extensive and in-depth numerical experiments using over a dozen of benchmark objective functions. Empirical results demonstrate that the proposed algorithms provide remarkably better performance than their Gaussian counterparts.
We propose a novel framework to understand the text by converting sentences or articles into video-like 3-dimensional tensors. Each frame, corresponding to a slice of the tensor, is a word image that is rendered by the word's shape. The length of the tensor equals to the number of words in the sentence or article. The proposed transformation from the text to a 3-dimensional tensor makes it very convenient to implement an $n$-gram model with convolutional neural networks for text analysis. Concretely, we impose a 3-dimensional convolutional kernel on the 3-dimensional text tensor. The first two dimensions of the convolutional kernel size equal the size of the word image and the last dimension of the kernel size is $n$. That is, every time when we slide the 3-dimensional kernel over a word sequence, the convolution covers $n$ word images and outputs a scalar. By iterating this process continuously for each $n$-gram along with the sentence or article with multiple kernels, we obtain a 2-dimensional feature map. A subsequent 1-dimensional max-over-time pooling is applied to this feature map, and three fully-connected layers are used for conducting text classification finally. Experiments of several text classification datasets demonstrate surprisingly superior performances using the proposed model in comparison with existing methods.
Rolling Horizon Evolutionary Algorithms (RHEA) are a class of online planning methods for real-time game playing; their performance is closely related to the planning horizon and the search time allowed. In this paper, we propose to learn a prior for RHEA in an offline manner by training a value network and a policy network. The value network is used to reduce the planning horizon by providing an estimation of future rewards, and the policy network is used to initialize the population, which helps to narrow down the search scope. The proposed algorithm, named prior-based RHEA (p-RHEA), trains policy and value networks by performing planning and learning iteratively. In the planning stage, the horizon-limited search assisted with the policy network and value network is performed to improve the policies and collect training samples. In the learning stage, the policy network and value network are trained with the collected samples to learn better prior knowledge. Experimental results on OpenAI Gym MuJoCo tasks show that the performance of the proposed p-RHEA is significantly improved compared to that of RHEA.
CT image reconstruction from incomplete data, such as sparse views and limited angle reconstruction, is an important and challenging problem in medical imaging. This work proposes a new deep convolutional neural network (CNN), called JSR-Net, that jointly reconstructs CT images and their associated Radon domain projections. JSR-Net combines the traditional model based approach with deep architecture design of deep learning. A hybrid loss function is adopted to improve the performance of the JSR-Net making it more effective in protecting important image structures. Numerical experiments demonstrate that JSR-Net outperforms some latest model based reconstruction methods, as well as a recently proposed deep model.