Models, code, and papers for "Bo Zhang":

In recent years, deep learning has achieved remarkable achievements in many fields, including computer vision, natural language processing, speech recognition and others. Adequate training data is the key to ensure the effectiveness of the deep models. However, obtaining valid data requires a lot of time and labor resources. Data augmentation (DA) is an effective alternative approach, which can generate new labeled data based on existing data using label-preserving transformations. Although we can benefit a lot from DA, designing appropriate DA policies requires a lot of expert experience and time consumption, and the evaluation of searching the optimal policies is costly. So we raise a new question in this paper: how to achieve automated data augmentation at as low cost as possible? We propose a method named BO-Aug for automating the process by finding the optimal DA policies using the Bayesian optimization approach. Our method can find the optimal policies at a relatively low search cost, and the searched policies based on a specific dataset are transferable across different neural network architectures or even different datasets. We validate the BO-Aug on three widely used image classification datasets, including CIFAR-10, CIFAR-100 and SVHN. Experimental results show that the proposed method can achieve state-of-the-art or near advanced classification accuracy. Code to reproduce our experiments is available at https://github.com/zhangxiaozao/BO-Aug.

In school, a teacher plays an important role in various classroom teaching patterns. Likewise to this human learning activity, the learning using privileged information (LUPI) paradigm provides additional information generated by the teacher to 'teach' learning algorithms during the training stage. Therefore, this novel learning paradigm is a typical Teacher-Student Interaction mechanism. This paper is the first to present a random vector functional link network based on the LUPI paradigm, called RVFL+. Rather than simply combining two existing approaches, the newly-derived RVFL+ fills the gap between neural networks and the LUPI paradigm, which offers an alternative way to train RVFL networks. Moreover, the proposed RVFL+ can perform in conjunction with the kernel trick for highly complicated nonlinear feature learning, which is termed KRVFL+. Furthermore, the statistical property of the proposed RVFL+ is investigated, and we derive a sharp and high-quality generalization error bound based on the Rademacher complexity. Competitive experimental results on 14 real-world datasets illustrate the great effectiveness and efficiency of the novel RVFL+ and KRVFL+, which can achieve better generalization performance than state-of-the-art algorithms.

This study provides a systematic review of the recent advances in designing the intelligent tutoring robot (ITR), and summarises the status quo of applying artificial intelligence (AI) techniques. We first analyse the environment of the ITR and propose a relationship model for describing interactions of ITR with the students, the social milieu and the curriculum. Then, we transform the relationship model into the perception-planning-action model for exploring what AI techniques are suitable to be applied in the ITR. This article provides insights on promoting human-robot teaching-learning process and AI-assisted educational techniques, illustrating the design guidelines and future research perspectives in intelligent tutoring robots.

It is well-understood that the robustness of mechanical and robotic control systems depends critically on minimizing sensitivity to arbitrary application-specific details whenever possible. For example, if a system is defined and performs well in one particular Euclidean coordinate frame then it should be expected to perform identically if that coordinate frame is arbitrarily rotated or scaled. Similarly, the performance of the system should not be affected if its key parameters are all consistently defined in metric units or in imperial units. In this paper we show that a recently introduced generalized matrix inverse permits performance consistency to be rigorously guaranteed in control systems that require solutions to underdetermined and/or overdetermined systems of equations.

Hashing method maps similar high-dimensional data to binary hashcodes with smaller hamming distance, and it has received broad attention due to its low storage cost and fast retrieval speed. Pairwise similarity is easily obtained and widely used for retrieval, and most supervised hashing algorithms are carefully designed for the pairwise supervisions. As labeling all data pairs is difficult, semi-supervised hashing is proposed which aims at learning efficient codes with limited labeled pairs and abundant unlabeled ones. Existing methods build graphs to capture the structure of dataset, but they are not working well for complex data as the graph is built based on the data representations and determining the representations of complex data is difficult. In this paper, we propose a novel teacher-student semi-supervised hashing framework in which the student is trained with the pairwise information produced by the teacher network. The network follows the smoothness assumption, which achieves consistent distances for similar data pairs so that the retrieval results are similar for neighborhood queries. Experiments on large-scale datasets show that the proposed method reaches impressive gain over the supervised baselines and is superior to state-of-the-art semi-supervised hashing methods.

Dantzig Selector (DS) is widely used in compressed sensing and sparse learning for feature selection and sparse signal recovery. Since the DS formulation is essentially a linear programming optimization, many existing linear programming solvers can be simply applied for scaling up. The DS formulation can be explained as a basis pursuit denoising problem, wherein the data matrix (or measurement matrix) is employed as the denoising matrix to eliminate the observation noise. However, we notice that the data matrix may not be the optimal denoising matrix, as shown by a simple counter-example. This motivates us to pursue a better denoising matrix for defining a general DS formulation. We first define the optimal denoising matrix through a minimax optimization, which turns out to be an NPhard problem. To make the problem computationally tractable, we propose a novel algorithm, termed as Optimal Denoising Dantzig Selector (ODDS), to approximately estimate the optimal denoising matrix. Empirical experiments validate the proposed method. Finally, a novel sparse reinforcement learning algorithm is formulated by extending the proposed ODDS algorithm to temporal difference learning, and empirical experimental results demonstrate to outperform the conventional vanilla DS-TD algorithm.

Hashing method maps similar data to binary hashcodes with smaller hamming distance, which has received a broad attention due to its low storage cost and fast retrieval speed. With the rapid development of deep learning, deep hashing methods have achieved promising results in efficient information retrieval. Most of the existing deep hashing methods adopt pairwise or triplet losses to deal with similarities underlying the data, but the training is difficult and less efficient because $O(n^2)$ data pairs and $O(n^3)$ triplets are involved. To address these issues, we propose a novel deep hashing algorithm with unary loss which can be trained very efficiently. We first of all introduce a Unary Upper Bound of the traditional triplet loss, thus reducing the complexity to $O(n)$ and bridging the classification-based unary loss and the triplet loss. Second, we propose a novel Semantic Cluster Deep Hashing (SCDH) algorithm by introducing a modified Unary Upper Bound loss, named Semantic Cluster Unary Loss (SCUL). The resultant hashcodes form several compact clusters, which means hashcodes in the same cluster have similar semantic information. We also demonstrate that the proposed SCDH is easy to be extended to semi-supervised settings by incorporating the state-of-the-art semi-supervised learning algorithms. Experiments on large-scale datasets show that the proposed method is superior to state-of-the-art hashing algorithms.

Mega-city analysis with very high resolution (VHR) satellite images has been drawing increasing interest in the fields of city planning and social investigation. It is known that accurate land-use, urban density, and population distribution information is the key to mega-city monitoring and environmental studies. Therefore, how to generate land-use, urban density, and population distribution maps at a fine scale using VHR satellite images has become a hot topic. Previous studies have focused solely on individual tasks with elaborate hand-crafted features and have ignored the relationship between different tasks. In this study, we aim to propose a universal framework which can: 1) automatically learn the internal feature representation from the raw image data; and 2) simultaneously produce fine-scale land-use, urban density, and population distribution maps. For the first target, a deep convolutional neural network (CNN) is applied to learn the hierarchical feature representation from the raw image data. For the second target, a novel CNN-based universal framework is proposed to process the VHR satellite images and generate the land-use, urban density, and population distribution maps. To the best of our knowledge, this is the first CNN-based mega-city analysis method which can process a VHR remote sensing image with such a large data volume. A VHR satellite image (1.2 m spatial resolution) of the center of Wuhan covering an area of 2606 km2 was used to evaluate the proposed method. The experimental results confirm that the proposed method can achieve a promising accuracy for land-use, urban density, and population distribution maps.

Manifold learning is a hot research topic in the field of computer science. A crucial issue with current manifold learning methods is that they lack a natural quantitative measure to assess the quality of learned embeddings, which greatly limits their applications to real-world problems. In this paper, a new embedding quality assessment method for manifold learning, named as Normalization Independent Embedding Quality Assessment (NIEQA), is proposed. Compared with current assessment methods which are limited to isometric embeddings, the NIEQA method has a much larger application range due to two features. First, it is based on a new measure which can effectively evaluate how well local neighborhood geometry is preserved under normalization, hence it can be applied to both isometric and normalized embeddings. Second, it can provide both local and global evaluations to output an overall assessment. Therefore, NIEQA can serve as a natural tool in model selection and evaluation tasks for manifold learning. Experimental results on benchmark data sets validate the effectiveness of the proposed method.

This paper addresses the automatic image segmentation problem in a region merging style. With an initially over-segmented image, in which the many regions (or super-pixels) with homogeneous color are detected, image segmentation is performed by iteratively merging the regions according to a statistical test. There are two essential issues in a region merging algorithm: order of merging and the stopping criterion. In the proposed algorithm, these two issues are solved by a novel predicate, which is defined by the sequential probability ratio test (SPRT) and the maximum likelihood criterion. Starting from an over-segmented image, neighboring regions are progressively merged if there is an evidence for merging according to this predicate. We show that the merging order follows the principle of dynamic programming. This formulates image segmentation as an inference problem, where the final segmentation is established based on the observed image. We also prove that the produced segmentation satisfies certain global properties. In addition, a faster algorithm is developed to accelerate the region merging process, which maintains a nearest neighbor graph in each iteration. Experiments on real natural images are conducted to demonstrate the performance of the proposed dynamic region merging algorithm.

The evolution of MobileNets has laid a solid foundation for neural network applications on mobile end. With the latest MobileNetV3, neural architecture search again claimed its supremacy in network design. Unfortunately, till today all mobile methods mainly focus on CPU latencies instead of GPU, the latter, however, is much preferred in practice for it has faster speed, lower overhead and less interference. Bearing the target hardware in mind, we propose the first Mobile GPU-Aware (MoGA) neural architecture search in order to be precisely tailored for real-world applications. Further, the ultimate objective to devise a mobile network lies in achieving better performance by maximizing the utilization of bounded resources. Urging higher capability while restraining time consumption is not reconcilable. We alleviate the tension by weighted evolution techniques. Moreover, we encourage increasing the number of parameters for higher representational power. With 200x fewer GPU days than MnasNet, we obtain a series of models that outperform MobileNetV3 under the similar latency constraints, i.e., MoGA-A achieves 75.9% top-1 accuracy on ImageNet, MoGA-B meets 75.5% which costs only 0.5 ms more on mobile GPU. MoGA-C best attests GPU-awareness by reaching 75.3% and being slower on CPU but faster on GPU.The models and test code is made available here https://github.com/xiaomi-automl/MoGA.

Experimental evaluation is a major research methodology for investigating clustering algorithms. For this purpose, a number of benchmark datasets have been widely used in the literature and their quality plays an important role on the value of the research work. However, in most of the existing studies, little attention has been paid to the specific properties of the datasets and they are often regarded as black-box problems. In our work, with the help of advanced visualization and dimension reduction techniques, we show that there are potential issues with some of the popular benchmark datasets used to evaluate clustering algorithms that may seriously compromise the research quality and even may produce completely misleading results. We suggest that significant efforts need to be devoted to improving the current practice of experimental evaluation of clustering algorithms by having a principled analysis of each benchmark dataset of interest.

Deep generative models play an increasingly important role in machine learning and computer vision. However there are two fundamental issues hindering real-world applications of these techniques: the learning difficulty of variational inference in Variational AutoEncoder (VAE) and the functional absence of encoding samples in Generative Adversarial Network (GAN). In this paper, we manage to address these issues in one framework by proposing a novel algorithm named Latently Invertible Autoencoder (LIA). A deep invertible network and its inverse mapping are symmetrically embedded in the latent space of VAE. Thus the partial encoder first transforms inputs to be feature vectors and then the distribution of these feature vectors is reshaped to approach a prior by the invertible network. The decoder proceeds in reverse order of composite mappings of the complete encoder. The two-stage stochasticity-free training is devised to train LIA via adversarial learning, in the sense that we first train a standard GAN whose generator is the decoder of LIA and then an autoencoder in the adversarial manner by detaching the invertible network from LIA. Experiments conducted on the FFHQ dataset validate the effectiveness of LIA for inference and generation tasks.

Traditionally the danger cylinder is intimately related to the solution stability in P3P problem. In this work, we show that the danger cylinder is also closely related to the multiple-solution phenomenon. More specifically, we show when the optical center lies on the danger cylinder, of the 3 possible P3P solutions, i.e., one double solution, and two other solutions, the optical center of the double solution still lies on the danger cylinder, but the optical centers of the other two solutions no longer lie on the danger cylinder. And when the optical center moves on the danger cylinder, accordingly the optical centers of the two other solutions of the corresponding P3P problem form a new surface, characterized by a polynomial equation of degree 12 in the optical center coordinates, called the Companion Surface of Danger Cylinder (CSDC). That means the danger cylinder always has a companion surface. For the significance of CSDC, we show that when the optical center passes through the CSDC, the number of solutions of P3P problem must change by 2. That means CSDC acts as a delimitating surface of the P3P solution space. These new findings shed some new lights on the P3P multi-solution phenomenon, an important issue in PnP study.

Traditionally, the P3P problem is solved by firstly transforming its 3 quadratic equations into a quartic one, then by locating the roots of the resulting quartic equation and verifying whether a root does really correspond to a true solution of the P3P problem itself. However, a root of the quartic equation does not always correspond to a solution of the P3P problem. In this work, we show that when the optical center is outside of all the 6 toroids defined by the control point triangle, each positive root of the Grunert's quartic equation must correspond to a true solution of the P3P problem, and the corresponding P3P problem cannot have a unique solution, it must have either 2 positive solutions or 4 positive solutions. In addition, we show that when the optical center passes through any one of the 3 toroids among these 6 toroids ( except possibly for two concentric circles) , the number of the solutions of the corresponding P3P problem always changes by 1, either increased by 1 or decreased by 1.Furthermore we show that such changed solutions always locate in a small neighborhood of control points, hence the 3 toroids are critical surfaces of the P3P problem and the 3 control points are 3 singular points of solutions. A notable example is that when the optical center passes through the outer surface of the union of the 6 toroids from the outside to inside, the number of the solutions must always decrease by 1. Our results are the first to give an explicit and geometrically intuitive relationship between the P3P solutions and the roots of its quartic equation. It could act as some theoretical guidance for P3P practitioners to properly arrange their control points to avoid undesirable solutions.

It is well known that the P3P problem could have 1, 2, 3 and at most 4 positive solutions under different configurations among its 3 control points and the position of the optical center. Since in any real applications, the knowledge on the exact number of possible solutions is a prerequisite for selecting the right one among all the possible solutions, the study on the phenomenon of multiple solutions in the P3P problem has been an active topic . In this work, we provide some new geometric interpretations on the multi-solution phenomenon in the P3P problem, our main results include: (1): The necessary and sufficient condition for the P3P problem to have a pair of side-sharing solutions is the two optical centers of the solutions both lie on one of the 3 vertical planes to the base plane of control points; (2): The necessary and sufficient condition for the P3P problem to have a pair of point-sharing solutions is the two optical centers of the solutions both lie on one of the 3 so-called skewed danger cylinders;(3): If the P3P problem has other solutions in addition to a pair of side-sharing ( point-sharing) solutions, these remaining solutions must be a point-sharing ( side-sharing ) pair. In a sense, the side-sharing pair and the point-sharing pair are companion pairs. In sum, our results provide some new insights into the nature of the multi-solution phenomenon in the P3P problem, in addition to their academic value, they could also be used as some theoretical guidance for practitioners in real applications to avoid occurrence of multiple solutions by properly arranging the control points.

Face parsing is a basic task in face image analysis. It amounts to labeling each pixel with appropriate facial parts such as eyes and nose. In the paper, we present a interlinked convolutional neural network (iCNN) for solving this problem in an end-to-end fashion. It consists of multiple convolutional neural networks (CNNs) taking input in different scales. A special interlinking layer is designed to allow the CNNs to exchange information, enabling them to integrate local and contextual information efficiently. The hallmark of iCNN is the extensive use of downsampling and upsampling in the interlinking layers, while traditional CNNs usually uses downsampling only. A two-stage pipeline is proposed for face parsing and both stages use iCNN. The first stage localizes facial parts in the size-reduced image and the second stage labels the pixels in the identified facial parts in the original image. On a benchmark dataset we have obtained better results than the state-of-the-art methods.

This paper presents a robust matrix elastic net based canonical correlation analysis (RMEN-CCA) for multiple view unsupervised learning problems, which emphasizes the combination of CCA and the robust matrix elastic net (RMEN) used as coupled feature selection. The RMEN-CCA leverages the strength of the RMEN to distill naturally meaningful features without any prior assumption and to measure effectively correlations between different 'views'. We can further employ directly the kernel trick to extend the RMEN-CCA to the kernel scenario with theoretical guarantees, which takes advantage of the kernel trick for highly complicated nonlinear feature learning. Rather than simply incorporating existing regularization minimization terms into CCA, this paper provides a new learning paradigm for CCA and is the first to derive a coupled feature selection based CCA algorithm that guarantees convergence. More significantly, for CCA, the newly-derived RMEN-CCA bridges the gap between measurement of relevance and coupled feature selection. Moreover, it is nontrivial to tackle directly the RMEN-CCA by previous optimization approaches derived from its sophisticated model architecture. Therefore, this paper further offers a bridge between a new optimization problem and an existing efficient iterative approach. As a consequence, the RMEN-CCA can overcome the limitation of CCA and address large-scale and streaming data problems. Experimental results on four popular competing datasets illustrate that the RMEN-CCA performs more effectively and efficiently than do state-of-the-art approaches.