Models, code, and papers for "Yi Wang":

The nearest subspace classifier (NSS) estimates the underlying subspace of each class and assigns each data point to the class whose subspace is nearest to it. This paper mainly studies how well NSS generalizes to new samples. We prove that NSS is strongly consistent under certain assumptions. For completeness, NSS is evaluated experimentally on various simulated and real data sets and compared with several other classifiers based on linear models. The results also show that NSS achieves effective classification and is very efficient, especially on large-scale data sets.
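
To make the classification rule concrete, here is a minimal Python/NumPy sketch. It assumes each class is modelled by a linear subspace through the origin, spanned by the top `dim` right singular vectors of that class's training data; the use of SVD, the fixed `dim`, and the helper names `fit_subspaces` and `predict` are illustrative assumptions, not necessarily the estimator analysed in the paper.

```python
import numpy as np


def fit_subspaces(X, y, dim):
    """Estimate a `dim`-dimensional linear subspace for each class (assumed: via SVD).

    X: (n_samples, n_features) array; y: (n_samples,) array of class labels.
    Returns {label: (n_features, dim) matrix with orthonormal columns}.
    """
    bases = {}
    for label in np.unique(y):
        Xc = X[y == label]
        if dim > min(Xc.shape):
            raise ValueError(f"dim={dim} too large for class {label!r}")
        # The leading right singular vectors span the best-fitting subspace.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        bases[label] = Vt[:dim].T
    return bases


def predict(bases, X):
    """Assign each row of X to the class whose subspace has the smallest residual."""
    labels = list(bases)
    residuals = []
    for label in labels:
        U = bases[label]
        projection = X @ U @ U.T  # orthogonal projection onto the class subspace
        residuals.append(np.linalg.norm(X - projection, axis=1))
    return np.array(labels)[np.argmin(np.stack(residuals), axis=0)]
```

Because each basis is orthonormal, the distance from a point to a class subspace is simply the norm of its residual after projection, so prediction costs one matrix product per class.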

The user's mental state is receiving increasing attention in human-robot interaction. Measuring and identifying tension, as one aspect of psychological state, has practical significance, yet there is currently no suitable method for measuring it. First, we summarize the usable eye-movement indices. We then study the extraction of eye-movement feature parameters under normal illumination, including face localization, eye localization, pupil diameter, and pupil center. Combining these with the judgment of tension from eye images, we extract accurate gaze-direction information. Finally, experiments show that the proposed method is effective.

Referring expressions are natural language descriptions that identify a particular object within a scene and are widely used in daily conversation. In this work, we focus on segmenting the object in an image specified by a referring expression. To this end, we propose an end-to-end trainable comprehension network consisting of language and visual encoders that extract feature representations from both domains. We introduce spatial-aware dynamic filters to transfer knowledge from text to image and to effectively capture the spatial information of the specified object. To improve communication between the language and visual modules, we employ a caption generation network that takes features shared across both domains as input and improves both representations through a consistency constraint that encourages the generated sentence to be similar to the given referring expression. We evaluate the proposed framework on two referring expression datasets and show that our method performs favorably against state-of-the-art algorithms.

In this paper, we consider a generic probabilistic discriminative learner from the functional viewpoint and argue that, to make it learn well, it is necessary to constrain its hypothesis space to a set of non-trivial piecewise constant functions. To achieve this goal, we present a scalable unsupervised regularization framework. On the theoretical front, we prove that this framework is conducive to a factually confident and smooth discriminative model and connect it to an adversarial Taboo game, spectral clustering and virtual adversarial training. Experimentally, we take deep neural networks as our learners and demonstrate that, when trained under our framework in the unsupervised setting, they not only achieve state-of-the-art clustering results but also generalize well on both synthetic and real data.

We introduce the bilingual dual-coding theory as a model of bilingual mental representation. Based on this model, lexical selection neural networks are implemented for a connectionist transfer project in machine translation. This lexical selection approach has two advantages. First, it is learnable, so little human effort on knowledge engineering is required. Second, it is psycholinguistically well founded.

This paper presents a semantic brain-computer interface (BCI) agent with particle swarm optimization (PSO), based on the Fuzzy Markup Language (FML), for Go learning and prediction applications. We also establish an Open Go Darkforest (OGD) cloud platform with the Facebook AI Research (FAIR) open-source Darkforest and ELF OpenGo AI bots. The Japanese robot Palro simultaneously reports the predicted move advantage in the board game Go to the Go players for reference or learning. The proposed semantic BCI agent operates on human-based BCI data from the players' brain waves and machine-based game data from the predictions of the OGD cloud platform, optimizing the parameters between humans and machines. Experimental results show that the proposed human and smart machine co-learning mechanism performs favorably. In the future, we hope to provide students with a better online learning environment that combines different kinds of handheld devices, robots, or computer equipment to achieve desired and intellectual learning goals.
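
The abstract does not say how PSO is applied to the agent's parameters. For readers unfamiliar with the optimizer itself, the following is a generic global-best PSO sketch in Python/NumPy; the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the box bounds, and the toy objective in the usage note are all assumptions, not the paper's configuration.

```python
import numpy as np


def pso_minimize(f, lower, upper, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic global-best particle swarm optimization (illustrative settings).

    f: objective mapping a 1-D parameter vector to a float (to be minimized).
    lower, upper: per-dimension bounds of the search box.
    Returns (best position found, best objective value).
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    x = rng.uniform(lower, upper, size=(n_particles, dim))  # particle positions
    v = np.zeros_like(x)                                     # particle velocities
    pbest = x.copy()                                         # personal bests
    pbest_val = np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()                   # global best

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)  # simple boundary handling
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(pbest_val.min())
```

For example, minimizing the hypothetical objective `sum((p - [1, -2])**2)` over the box `[-5, 5]^2` should return a point close to `(1, -2)`. Clipping positions to the bounds is only one of several boundary-handling schemes.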

Robotic software and hardware systems for autonomous surface vehicles have been developed for transportation, military, and ocean research for decades. Previous efforts in the RobotX Challenges of 2014 and 2016 facilitated development for important tasks such as obstacle avoidance and docking. Team NCTU is motivated by the AI Driving Olympics (AI-DO) developed by the Duckietown community and applies its principles to the RobotX Challenge. With containerization (Docker) and a unified AI agent interface (observations and actions), we can better 1) integrate solutions developed in different middlewares (ROS and MOOS), 2) develop essential functionalities from simulation (Gazebo) to real robots (either miniaturized or full-sized WAM-V), and 3) compare classic model-based and learning-based approaches. Finally, we set up an outdoor on-surface platform with localization services for evaluation. Preliminary results from Team NCTU's participation in the 2018 RobotX competition in Hawaii will be presented.

Given new pairs of source and target point sets, standard point set registration methods repeatedly conduct an independent iterative search for the geometric transformation that aligns the source point set with the target one. This limits their use in applications that require real-time registration of large-volume data sets. This paper presents a novel method, named coherent point drift networks (CPD-Net), for unsupervised learning of geometric transformations towards real-time non-rigid point set registration. In contrast to previous efforts (e.g., coherent point drift), CPD-Net learns, from a training data set, a displacement field function that estimates the geometric transformation, and consequently predicts the desired transformation for previously unseen pairs without any additional iterative optimization. Furthermore, CPD-Net leverages the ability of deep neural networks to fit arbitrary functions, which lets it adapt to different levels of complexity in the desired geometric transformation. In particular, we prove that CPD-Net learns a continuous displacement vector function, which avoids imposing the additional parametric smoothness constraints used in previous works. Our experiments verify CPD-Net's impressive performance for non-rigid point set registration on various 2D/3D data sets, even in the presence of significant displacement noise, outliers, and missing points. Our code is available at https://github.com/nyummvc/CPD-Net.

We extend the probabilistic action language pBC+ with the notion of utility from decision theory. The semantics of the extended pBC+ can be defined as a shorthand notation for a decision-theoretic extension of the probabilistic answer set programming language LPMLN. Alternatively, the semantics of pBC+ can be defined in terms of a Markov Decision Process (MDP), which allows MDPs to be represented in a succinct and elaboration-tolerant way and allows an MDP solver to be leveraged to compute pBC+. This idea led to the design of the system pbcplus2mdp, which finds an optimal policy for a pBC+ action description using an MDP solver.
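
As background for the MDP route, the sketch below shows value iteration, one standard way an MDP solver can compute an optimal policy. It is not pbcplus2mdp: it assumes the MDP has already been extracted (for example, from a pBC+ action description) into a transition array `P` of shape (actions, states, states) and an expected-reward array `R` of shape (states, actions), both hypothetical input formats.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Generic value iteration for a finite MDP (illustrative, not pbcplus2mdp).

    P[a, s, t]: probability of moving from state s to state t under action a.
    R[s, a]: expected immediate reward for taking action a in state s.
    Returns (optimal state values, greedy optimal policy as action indices).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1) for guaranteed convergence")
    V = np.zeros(P.shape[1])
    while True:
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

On a two-state toy MDP where action 0 stays put, action 1 switches state, and only staying in state 1 earns reward 1, `value_iteration` with `gamma=0.9` returns the policy `[1, 0]` (switch from state 0, stay in state 1) with values close to 9 and 10.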

Stochastic gradient descent updates parameters using the summed gradient computed from a random data batch. This summation leads to an unbalanced training process if the data are unbalanced. To address this issue, this paper takes both the error mean and the error variance into consideration, and our algorithm adaptively adjusts the trade-off between the two terms. Because this algorithm suppresses error variance, we name it Variance Suppression Gradient Descent (VSSGD). Experimental results demonstrate that VSSGD can accelerate training, effectively prevent overfitting, and improve the network's ability to learn from small samples.
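
The abstract does not give the exact VSSGD objective. A minimal sketch, assuming the batch loss is the mean per-sample error plus `lam` times the (population) error variance, shows why such a term can counter imbalance: its gradient reweights samples rather than treating them uniformly. The fixed `lam` is a hypothetical stand-in for the paper's adaptive trade-off, and the function names are illustrative.

```python
import numpy as np


def mean_variance_loss(errors, lam):
    """Assumed objective: mean per-sample error + lam * variance of the errors."""
    e = np.asarray(errors, dtype=float)
    return e.mean() + lam * e.var()


def per_sample_weights(errors, lam):
    """Gradient of the assumed objective with respect to each per-sample error.

    d(mean)/d(e_i) = 1/n and d(var)/d(e_i) = 2 * (e_i - mean(e)) / n, so
    weight_i = (1 + 2 * lam * (e_i - mean(e))) / n.
    By the chain rule, the parameter gradient is sum_i weight_i * d(e_i)/d(theta),
    so samples with above-average error receive more weight than under plain averaging.
    """
    e = np.asarray(errors, dtype=float)
    return (1.0 + 2.0 * lam * (e - e.mean())) / e.size
```

For errors `[1, 1, 1, 5]` and `lam=0.5`, the loss is `2 + 0.5 * 3 = 3.5` and the weights are `[0, 0, 0, 1]`: the entire gradient is routed to the single badly fit sample, whereas plain averaging would give each sample weight `0.25`. A large `lam` can make some weights negative, which is one reason the trade-off between the two terms has to be tuned or adapted.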

LPMLN is a probabilistic extension of answer set programs with a weight scheme derived from that of Markov Logic. Previous work has shown how inference in LPMLN can be achieved. In this paper, we present the concept of weight learning in LPMLN and learning algorithms for LPMLN derived from those for Markov Logic. We also present a prototype implementation that uses answer set solvers for learning, as well as some example domains that illustrate distinct features of LPMLN learning. Because learning in LPMLN follows the stable model semantics, it can learn parameters for probabilistic extensions of knowledge-rich domains where answer set programming has proven useful but has been limited to the deterministic case, such as reachability analysis and reasoning about actions in dynamic domains. We also apply the method to learn the parameters for probabilistic abductive reasoning about actions.
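
Markov Logic weight learning maximizes the log-likelihood, whose gradient for each weight is the observed count of satisfied ground formulas minus its expected count. The sketch below applies that rule to a log-linear distribution over an explicitly enumerated list of candidate stable models, assuming the per-rule counts have already been computed (for example, with an answer set solver). The input format and the name `learn_weights` are hypothetical, and the prototype described above may obtain expected counts differently.

```python
import numpy as np


def learn_weights(counts, observed, lr=0.5, n_iters=2000):
    """Maximum-likelihood weights for a log-linear model over candidate models.

    counts[m, i]: number of ground instances of soft rule i satisfied by
        candidate stable model m (hypothetical input, assumed precomputed).
    observed: indices of the candidate models that occur in the training data.
    P(m) is taken proportional to exp(sum_i w_i * counts[m, i]).
    """
    counts = np.asarray(counts, dtype=float)
    target = counts[np.asarray(observed)].mean(axis=0)  # empirical counts
    w = np.zeros(counts.shape[1])
    for _ in range(n_iters):
        scores = counts @ w
        p = np.exp(scores - scores.max())  # subtract the max for numerical stability
        p /= p.sum()
        expected = p @ counts              # expected counts under current weights
        w += lr * (target - expected)      # gradient ascent: observed - expected
    return w
```

With two candidate models, one satisfying the single soft rule and one not, and training data in which the first model appears three times out of four, the learned weight approaches `ln 3 ≈ 1.0986`, the value at which the model assigns probability 0.75 to the first model. If every observed model satisfies every ground instance, the maximum-likelihood weight is unbounded and the iteration simply keeps increasing it.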

We present a probabilistic extension of the action language BC+. Just as BC+ is defined as a high-level notation of answer set programs for describing transition systems, the proposed language, which we call pBC+, is defined as a high-level notation of LPMLN programs, a probabilistic extension of answer set programs. We show how probabilistic reasoning about transition systems, such as prediction, postdiction, and planning problems, as well as probabilistic diagnosis for dynamic domains, can be modeled in pBC+ and computed using an implementation of LPMLN.

Recent advances in deep learning have opened new opportunities for learning a high-quality 3D model from a single 2D image, given sufficient training on large-scale data sets. However, the significant imbalance between the available amounts of images and 3D models, together with the limited availability of labeled 2D image data (i.e., manually annotated pairs of images and their corresponding 3D models), severely hampers the training of most supervised deep learning methods in practice. In this paper, driven by a novel design of adversarial networks, we develop an unsupervised learning paradigm that reconstructs 3D models from a single 2D image without requiring manually annotated pairs of input images and their associated 3D models. Specifically, the paradigm first trains an adaptation network, an autoencoder with an adversarial loss, which embeds the unpaired synthesized 2D image domain and the real-world image domain into a shared latent vector space. Then, we jointly train a 3D deconvolutional network that maps the latent vector space to the 3D object space, together with the embedding process. Our experiments verify our network's robust and superior performance in 3D volumetric object generation from a single 2D image.

Markov Logic Networks (MLN) and Probabilistic Soft Logic (PSL) are widely applied formalisms in Statistical Relational Learning, an emerging area of Artificial Intelligence concerned with combining logical and statistical AI. Despite their resemblance, the relationship between them has not been formally stated. In this paper, we describe the precise semantic relationship between them from a logical perspective. This is facilitated by first extending fuzzy logic to allow weights, which can also be viewed as a generalization of PSL, and then relating that generalization to MLN. We observe that the relationship between PSL and MLN is analogous to the known relationship between fuzzy logic and Boolean logic, and furthermore that the weight scheme of PSL is essentially a generalization of the weight scheme of MLN to the many-valued setting.
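
The fuzzy-versus-Boolean analogy can be seen on a single ground rule `a ∧ b → c`. PSL, which uses Łukasiewicz logic, penalizes a rule by its distance to satisfaction, max(0, truth(body) - truth(head)). The sketch below, with hypothetical helper names, checks that on Boolean inputs this penalty is 1 for a violated ground rule and 0 otherwise, mirroring how an MLN ground formula contributes its weight only when it is satisfied. It illustrates the idea only and is not the paper's weighted fuzzy logic.

```python
def luk_and(a, b):
    """Lukasiewicz t-norm, the conjunction used by PSL."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body, head):
    """The Lukasiewicz implication body -> head has truth min(1, 1 - body + head),
    so it falls short of full truth by max(0, body - head)."""
    return max(0.0, body - head)


def rule_penalty(a, b, c):
    """Penalty of the ground rule (a AND b) -> c for truth values in [0, 1]."""
    return distance_to_satisfaction(luk_and(a, b), c)
```

On soft truth values, e.g. `a=0.9, b=0.8, c=0.3`, the penalty is `max(0, 0.7 - 0.3) = 0.4`, a graded version of the same violation.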

Simple tree models for articulated objects have prevailed over the last decade. However, it is also believed that these simple tree models cannot capture large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) Are simple tree models sufficient? More specifically, 2) how can tree models be used effectively in human pose estimation? And 3) how can combined parts be used together with single parts efficiently? Suppose we have a set of single parts and combined parts, and the goal is to estimate the joint distribution of their locations. Surprisingly, we find that no latent variables are introduced on the Leeds Sport Dataset (LSP) when learning latent trees for a deformable model, which aims to approximate the joint distribution of body part locations using a minimal tree structure. This suggests that one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build visual categories of the combined parts and then perform inference on the learned latent tree. Our method outperforms the state of the art on LSP, both when the training images come from the same dataset and when they come from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

Parsing human poses in images is fundamental to extracting critical visual information for artificially intelligent agents. Our goal is to learn self-contained body part representations from images, which we call visual symbols, together with their symbol-wise geometric contexts in this parsing process. Each symbol is learned individually by categorizing visual features aided by geometric information. For the categorization, we use a Latent Support Vector Machine followed by an efficient cross-validation procedure to learn visual symbols. These symbols then naturally define the geometric contexts of body parts at a fine granularity. When the structure of the compositional parts is a tree, we derive an efficient approach to estimating human poses in images. Experiments on two large datasets suggest that our approach outperforms state-of-the-art methods.

The trace norm is widely used in multi-task learning because it can discover low-rank structures among tasks in terms of model parameters. With the emergence of big datasets and the popularity of deep learning techniques, tensor trace norms have been used in deep multi-task models. However, existing tensor trace norms cannot discover all the low-rank structures, and they require users to determine the importance of their components manually. To solve both issues, in this paper we propose a Generalized Tensor Trace Norm (GTTN). The GTTN is defined as a convex combination of the matrix trace norms of all possible tensor flattenings, and hence it can discover all possible low-rank structures. In the induced objective function, we learn the combination coefficients of the GTTN to determine the importance of each component automatically. Experiments on real-world datasets demonstrate the effectiveness of the proposed GTTN.
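
A minimal NumPy sketch of the GTTN value for fixed coefficients. A flattening assigns a subset S of modes to the matrix rows and the remaining modes to the columns; S and its complement give transposed matrices with the same trace (nuclear) norm, so only subsets containing mode 0 are enumerated, which yields 2^(d-1) - 1 distinct flattenings of a d-way tensor. Treating the coefficients as given (rather than learned, as in the paper) and the function names are assumptions.

```python
import itertools

import numpy as np


def flattenings(T):
    """Yield each distinct matrix flattening (unfolding) of tensor T.

    Modes in `rows` index the matrix rows and the remaining modes index the
    columns. A subset and its complement give transposed matrices with equal
    trace norm, so only proper subsets that contain mode 0 are used.
    """
    d = T.ndim
    for r in range(d - 1):
        for rest in itertools.combinations(range(1, d), r):
            rows = (0,) + rest
            cols = tuple(m for m in range(d) if m not in rows)
            n_rows = int(np.prod([T.shape[m] for m in rows]))
            yield np.transpose(T, rows + cols).reshape(n_rows, -1)


def gttn(T, coeffs):
    """Convex combination of the trace (nuclear) norms of all flattenings of T."""
    mats = list(flattenings(T))
    coeffs = np.asarray(coeffs, dtype=float)
    if len(coeffs) != len(mats):
        raise ValueError(f"expected {len(mats)} coefficients, got {len(coeffs)}")
    if np.any(coeffs < 0) or not np.isclose(coeffs.sum(), 1.0):
        raise ValueError("coefficients must be non-negative and sum to 1")
    return float(sum(c * np.linalg.norm(M, ord="nuc") for c, M in zip(coeffs, mats)))
```

For a rank-one tensor every flattening has rank one, so each trace norm equals the product of the factor norms. With equal coefficients `[1/3, 1/3, 1/3]`, the GTTN of `einsum("i,j,k->ijk", [1, 2], [3, 0, 4], [1, 0])` is therefore `sqrt(5) * 5 * 1`.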

This paper offers a novel mathematical approach, the modified Fractional-order Steepest Descent Method (FSDM), for training BackPropagation Neural Networks (BPNNs); this differs from the majority of previous approaches. Fractional calculus is a promising mathematical method that has the potential to assume a prominent role in applications of neural networks and cybernetics because of its inherent strengths, such as long-term memory, nonlocality, and weak singularity. Therefore, to improve the optimization performance of classic first-order BPNNs, this paper studies whether FSDM can be modified to generalize classic first-order BPNNs to modified-FSDM-based Fractional-order Backpropagation Neural Networks (FBPNNs). Motivated by this idea, this paper proposes an application of fractional calculus that implements a modified-FSDM-based FBPNN whose reverse incremental search proceeds in the negative directions of the approximate fractional-order partial derivatives of the squared error. First, the theoretical concept of a modified-FSDM-based FBPNN is described mathematically. Then, the mathematical proof of fractional-order global optimal convergence, an assumption on the structure, and the fractional-order multi-scale global optimization of a modified-FSDM-based FBPNN are analysed in detail. Finally, we perform comparative experiments between a modified-FSDM-based FBPNN and a classic first-order BPNN: an example of function approximation, fractional-order multi-scale global optimization, and two performance comparisons on real data. The major advantage of a modified-FSDM-based FBPNN over a classic first-order BPNN is the more efficient optimal searching capability of its fractional-order multi-scale global optimization in determining the global optimal solution.
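
The abstract does not spell out the modified FSDM update. As an illustration of the general idea only, the sketch below uses one simple approximation of the order-alpha Caputo derivative with lower terminal `c`: treating f' as constant (equal to its value at the current point x) on the interval between c and x gives D^alpha f(x) ≈ f'(x) |x - c|^(1 - alpha) / Gamma(2 - alpha). This one-dimensional update, its parameters, and the name `fractional_gd` are assumptions, not the paper's FBPNN training rule.

```python
import math


def fractional_gd(grad, x0, alpha=0.5, c=0.0, lr=0.1, n_iters=200):
    """One-dimensional descent along an approximate Caputo derivative.

    Assumed approximation (not the paper's modified FSDM): with lower terminal c
    and 0 < alpha <= 1,
        D^alpha f(x) ~= f'(x) * |x - c| ** (1 - alpha) / Gamma(2 - alpha),
    obtained by treating f' as constant between c and x. The absolute value keeps
    the step direction equal to -f'(x). alpha = 1 recovers ordinary gradient descent.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    scale = 1.0 / math.gamma(2.0 - alpha)
    x = float(x0)
    for _ in range(n_iters):
        frac_grad = grad(x) * abs(x - c) ** (1.0 - alpha) * scale
        x -= lr * frac_grad
    return x
```

Minimizing `(x - 3)**2` (gradient `2 * (x - 3)`) from `x0 = 1` with `alpha = 0.5` converges to 3. The update also vanishes at `x = c`, so in this simple form the lower terminal has to be chosen away from the minimizer being sought.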

Distributed knowledge is the sum of the knowledge in a group: what would be known by someone who can discern between two possible worlds whenever any member of the group can discern between them. Distributed knowledge is sometimes referred to as the potential knowledge of a group, or the joint knowledge its members could obtain if they had unlimited means of communication. In epistemic logic, the formula D_G φ is intended to express that group G has distributed knowledge of φ, i.e., that there is enough information in the group to infer φ. But this is not the same as reasoning about what happens if the members of the group share their information. In this paper we introduce an operator R_G such that R_G φ means that φ is true after the members of G have shared all their information with each other, i.e., after G's distributed knowledge has been resolved. The R_G operators are called resolution operators. Semantically, we say that an expression R_G φ is true iff φ is true in what van Benthem [11, p. 249] calls (G's) communication core: the model update obtained by removing, for members of G, links to states that are not linked by all members of G. We study logics with different combinations of resolution operators and operators for common and distributed knowledge. Of particular interest is the relationship between distributed and common knowledge. The main results are sound and complete axiomatizations.
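
The communication-core update and its link to distributed knowledge can be checked on a small Kripke model. The sketch below assumes each agent's accessibility relation is stored as a set of `(state, state)` pairs and a formula as the set of states where it holds (hypothetical representations). It gives every member of G the intersection of the group's relations and leaves other agents untouched.

```python
# Hypothetical representation: relations[agent] is a set of (s, t) pairs meaning
# "at s, the agent considers t possible"; a formula is given by its set of states.


def communication_core(relations, group):
    """Members of `group` keep only the links shared by every member of `group`;
    agents outside the group keep their original relation."""
    shared = set.intersection(*(relations[a] for a in group))
    return {a: set(shared) if a in group else set(rel)
            for a, rel in relations.items()}


def knows(rel, state, phi_states):
    """K phi holds at `state` iff phi holds at every state reachable via `rel`."""
    return all(t in phi_states for (s, t) in rel if s == state)


def distributed_knowledge(relations, group, state, phi_states):
    """D_G phi: evaluate K phi against the intersection of the group's relations."""
    shared = set.intersection(*(relations[a] for a in group))
    return knows(shared, state, phi_states)


# Example model: a cannot tell 1 from 2, b cannot tell 1 from 3; phi holds only at 1.
identity = {(1, 1), (2, 2), (3, 3)}
relations = {
    "a": identity | {(1, 2), (2, 1)},
    "b": identity | {(1, 3), (3, 1)},
}
phi = {1}
core = communication_core(relations, {"a", "b"})
```

In this model neither agent alone knows φ at state 1, but the group has distributed knowledge of it, and after the update agent `a` does know it: R_G K_a φ holds at state 1 while K_a φ does not.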