Research papers and code for "Yi Wang":
The nearest subspace classifier (NSS) estimates the underlying subspace within each class and assigns each data point to the class whose subspace is nearest to it. This paper mainly studies how well NSS generalizes to new samples. It is proved that NSS is strongly consistent under certain assumptions. For completeness, NSS is evaluated through experiments on various simulated and real data sets, in comparison with other linear-model-based classifiers. It is also shown that NSS obtains effective classification results and is very efficient, especially for large-scale data sets.
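
As a rough illustration of the idea (not the paper's implementation), the sketch below fits a low-dimensional subspace per class with an SVD and classifies by the smallest projection residual; the subspace dimension k and the mean-centering are assumptions.

import numpy as np

class NearestSubspaceClassifier:
    """Minimal NSS sketch: estimate a k-dimensional subspace per class and
    assign each sample to the class with the smallest projection residual."""

    def __init__(self, k=5):
        self.k = k
        self.subspaces = {}   # label -> (class mean, orthonormal basis)

    def fit(self, X, y):
        for label in np.unique(y):
            Xc = X[y == label]
            mu = Xc.mean(axis=0)
            # top-k right singular vectors span the estimated subspace
            _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
            self.subspaces[label] = (mu, Vt[:self.k].T)
        return self

    def predict(self, X):
        labels = list(self.subspaces)
        residuals = []
        for label in labels:
            mu, B = self.subspaces[label]
            Z = X - mu
            residuals.append(np.linalg.norm(Z - Z @ B @ B.T, axis=1))
        return np.array(labels)[np.argmin(np.stack(residuals), axis=0)]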

Click to Read Paper and Get Code
A user's mental state becomes increasingly important during the course of human-robot interaction, and tension, as a measurable and identifiable psychological state, has practical significance; at present, however, there is no suitable method for measuring it. We first summarize several usable eye-movement indices. We then study the extraction of eye-movement features under normal illumination, including face localization, eye localization, pupil-diameter measurement, and pupil-center characteristic parameters. Combined with a judgment of tension from eye images, we extract accurate gaze-direction information. Finally, experiments demonstrate that the proposed method is effective.
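
The abstract does not spell out the extraction pipeline, so the following is only a plausible sketch using OpenCV Haar cascades for face and eye localization and dark-blob thresholding for the pupil; the threshold value and the cascade files are assumptions, not the paper's method.

import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def pupil_features(frame_bgr):
    """Return (pupil center in image coordinates, pupil diameter in pixels) per detected eye."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    results = []
    for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = gray[fy:fy + fh, fx:fx + fw]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face):
            eye = face[ey:ey + eh, ex:ex + ew]
            # assume the pupil is the darkest blob in the eye region
            _, binary = cv2.threshold(eye, 40, 255, cv2.THRESH_BINARY_INV)
            contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            if not contours:
                continue
            (cx, cy), radius = cv2.minEnclosingCircle(max(contours, key=cv2.contourArea))
            results.append(((fx + ex + cx, fy + ey + cy), 2.0 * radius))
    return results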

Click to Read Paper and Get Code
In this paper, we consider a generic probabilistic discriminative learner from the functional viewpoint and argue that, to make it learn well, it is necessary to constrain its hypothesis space to a set of non-trivial piecewise constant functions. To achieve this goal, we present a scalable unsupervised regularization framework. On the theoretical front, we prove that this framework is conducive to a factually confident and smooth discriminative model and connect it to an adversarial Taboo game, spectral clustering and virtual adversarial training. Experimentally, we take deep neural networks as our learners and demonstrate that, when trained under our framework in the unsupervised setting, they not only achieve state-of-the-art clustering results but also generalize well on both synthetic and real data.
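
The paper's own regularizer is not reproduced here; purely as a point of reference for the stated connection to virtual adversarial training, a minimal PyTorch sketch of a VAT-style smoothness penalty (assumed hyperparameters xi and eps, one power iteration) looks roughly as follows.

import torch
import torch.nn.functional as F

def vat_penalty(model, x, xi=1e-6, eps=2.0):
    """Smoothness penalty in the spirit of virtual adversarial training."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)                       # reference predictions
    d = torch.randn_like(x)
    d = xi * F.normalize(d.flatten(1), dim=1).view_as(x)     # tiny random probe
    d.requires_grad_(True)
    kl = F.kl_div(F.log_softmax(model(x + d), dim=1), p, reduction="batchmean")
    grad = torch.autograd.grad(kl, d)[0]                     # most sensitive direction
    r_adv = eps * F.normalize(grad.detach().flatten(1), dim=1).view_as(x)
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p, reduction="batchmean")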

* 9 pages
Click to Read Paper and Get Code
We introduce the bilingual dual-coding theory as a model for bilingual mental representation. Based on this model, lexical selection neural networks are implemented for a connectionist transfer project in machine translation. This lexical selection approach has two advantages. First, it is learnable: little human effort on knowledge engineering is required. Second, it is psycholinguistically well-founded.

* 3 pages, ACL94 student session
Click to Read Paper and Get Code
This paper presents a semantic brain computer interface (BCI) agent with particle swarm optimization (PSO) based on a Fuzzy Markup Language (FML) for Go learning and prediction applications. We also establish an Open Go Darkforest (OGD) cloud platform with the Facebook AI Research (FAIR) open-source Darkforest and ELF OpenGo AI bots. The Japanese robot Palro simultaneously predicts the move advantage in the board game Go for Go players' reference or learning. The proposed semantic BCI agent operates on human-based BCI data from the players' brain waves and machine-based game data from the predictions of the OGD cloud platform to optimize the parameters between humans and machines. Experimental results show that the proposed human and smart machine co-learning mechanism performs favorably. We hope to provide students with a better online learning environment, combining different kinds of handheld devices, robots, or computer equipment, to achieve a desired and intellectual learning goal in the future.
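
Which FML fuzzy-set parameters are tuned is specified in the paper, not here; the sketch below is only a generic particle swarm optimizer of the kind such an agent could call, with assumed inertia and acceleration coefficients.

import numpy as np

def pso(objective, dim, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0)):
    """Minimal PSO: minimize `objective` over `dim` dimensions within `bounds`."""
    lo, hi = bounds
    pos = np.random.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = np.random.rand(n_particles, dim), np.random.rand(n_particles, dim)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# e.g. pso(lambda p: float(np.sum(p ** 2)), dim=4) converges towards the zero vector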

Click to Read Paper and Get Code
Given new pairs of source and target point sets, standard point set registration methods often repeatedly conduct an independent iterative search for the desired geometric transformation to align the source point set with the target one. This limits their use in applications requiring real-time point set registration on large-volume datasets. This paper presents a novel method, named Coherent Point Drift Networks (CPD-Net), for unsupervised learning of geometric transformations towards real-time non-rigid point set registration. In contrast to previous efforts (e.g. coherent point drift), CPD-Net learns a displacement field function to estimate the geometric transformation from a training dataset and can consequently predict the desired geometric transformation to align previously unseen pairs without any additional iterative optimization. Furthermore, CPD-Net leverages the power of deep neural networks to fit an arbitrary function, adaptively accommodating different levels of complexity of the desired geometric transformation. In particular, CPD-Net is proved, with a theoretical guarantee, to learn a continuous displacement vector function, which avoids imposing the additional parametric smoothness constraints of previous works. Our experiments verify CPD-Net's impressive performance for non-rigid point set registration on various 2D/3D datasets, even in the presence of significant displacement noise, outliers, and missing points. Our code is available at https://github.com/nyummvc/CPD-Net.
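
For orientation only (the actual architecture and losses are in the repository above), a toy PyTorch sketch of the core idea of regressing a per-point displacement field from global features of both point sets, trained with an assumed Chamfer loss, could look like this.

import torch
import torch.nn as nn

class DisplacementFieldNet(nn.Module):
    """Sketch: encode both point sets into global features, then regress a
    per-point displacement for the source set (layer sizes are assumptions)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        self.decoder = nn.Sequential(nn.Linear(3 + 2 * feat_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, src, tgt):                          # src, tgt: (B, N, 3)
        f_src = self.encoder(src).max(dim=1).values       # (B, feat_dim) global features
        f_tgt = self.encoder(tgt).max(dim=1).values
        g = torch.cat([f_src, f_tgt], dim=1).unsqueeze(1).expand(-1, src.shape[1], -1)
        return src + self.decoder(torch.cat([src, g], dim=2))   # warped source

def chamfer(a, b):
    """Symmetric Chamfer distance used as an unsupervised alignment loss."""
    d = torch.cdist(a, b)                                 # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()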

Click to Read Paper and Get Code
We extend the probabilistic action language pBC+ with the notion of utility, as in decision theory. The semantics of the extended pBC+ can be defined as a shorthand notation for a decision-theoretic extension of the probabilistic answer set programming language LPMLN. Alternatively, the semantics of pBC+ can also be defined in terms of Markov Decision Processes (MDPs), which in turn allows MDPs to be represented in a succinct and elaboration-tolerant way and allows an MDP solver to be leveraged to compute pBC+. This idea led to the design of the system pbcplus2mdp, which finds an optimal policy for a pBC+ action description using an MDP solver.
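
pbcplus2mdp itself is not reproduced here; as a reminder of what the MDP-solver back end computes, a tabular value iteration sketch (assumed discount factor and dense transition matrices) is shown below.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Tabular value iteration.
    P: (A, S, S) transition probabilities, R: (S, A) expected rewards."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.einsum("ast,t->sa", P, V)   # (S, A) action values
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new             # optimal policy and value function
        V = V_new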

* 13 pages, to appear in LPNMR 2019
Click to Read Paper and Get Code
Stochastic gradient descent updates parameters with a summed gradient computed from a random data batch. If the data are unbalanced, this summation leads to an unbalanced training process. To address this issue, this paper takes both the error variance and the error mean into consideration; our algorithm also adaptively adjusts the trade-off between the two terms. Because the algorithm suppresses the error variance, we name it Variance Suppression Gradient Descent (VSSGD). Experimental results demonstrate that VSSGD accelerates the training process, effectively prevents overfitting, and improves a network's capacity to learn from small samples.
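
The paper's exact formulation and its adaptive trade-off rule are not given in the abstract, so the snippet below is only one hedged reading of the idea in PyTorch: penalize the variance of per-sample errors alongside their mean, with a fixed coefficient standing in for the adaptive one.

import torch.nn.functional as F

def variance_suppressed_loss(logits, targets, lam=0.1):
    """Mean of per-sample losses plus a variance penalty; `lam` trades the two
    terms off (the paper's adaptive adjustment rule is not reproduced here)."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    return per_sample.mean() + lam * per_sample.var()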

* More experiments are needed to prove this theory, but I'm not quite sure yet
Click to Read Paper and Get Code
LPMLN is a probabilistic extension of answer set programs with a weight scheme derived from that of Markov Logic. Previous work has shown how inference in LPMLN can be achieved. In this paper, we present the concept of weight learning in LPMLN and learning algorithms for LPMLN derived from those for Markov Logic. We also present a prototype implementation that uses answer set solvers for learning, as well as some example domains that illustrate distinct features of LPMLN learning. Learning in LPMLN is in accordance with the stable model semantics, so it learns parameters for probabilistic extensions of knowledge-rich domains where answer set programming has been shown to be useful but limited to the deterministic case, such as reachability analysis and reasoning about actions in dynamic domains. We also apply the method to learn parameters for probabilistic abductive reasoning about actions.

* Technical Report of the paper to appear in 16th International Conference on Principles of Knowledge Representation and Reasoning
Click to Read Paper and Get Code
We present a probabilistic extension of the action language BC+. Just as BC+ is defined as a high-level notation of answer set programs for describing transition systems, the proposed language, which we call pBC+, is defined as a high-level notation of LPMLN programs---a probabilistic extension of answer set programs. We show how probabilistic reasoning about transition systems, such as prediction, postdiction, and planning problems, as well as probabilistic diagnosis for dynamic domains, can be modeled in pBC+ and computed using an implementation of LPMLN.

* Paper presented at the 34th International Conference on Logic Programming (ICLP 2018), Oxford, UK, July 14 to July 17, 2018. 18 pages, LaTeX, 1 PDF figure (arXiv:YYMM.NNNNN)
Click to Read Paper and Get Code
Recent advancements in deep learning have opened new opportunities for learning a high-quality 3D model from a single 2D image, given sufficient training on large-scale data sets. However, the significant imbalance between the available amounts of images and 3D models, and the limited availability of labeled 2D image data (i.e. manually annotated pairs of images and their corresponding 3D models), severely impact the training of most supervised deep learning methods in practice. In this paper, driven by a novel design of adversarial networks, we develop an unsupervised learning paradigm to reconstruct 3D models from a single 2D image, which requires no manually annotated pairs of input images and their associated 3D models. In particular, the paradigm begins by training an adaptation network via an autoencoder with adversarial loss, which embeds the unpaired 2D synthesized-image domain and the real-world image domain into a shared latent vector space. Then, we jointly train a 3D deconvolutional network to transform the latent vector space to the 3D object space together with the embedding process. Our experiments verify our network's robust and superior performance in handling 3D volumetric object generation from a single 2D image.

Click to Read Paper and Get Code
Markov Logic Networks (MLN) and Probabilistic Soft Logic (PSL) are widely applied formalisms in Statistical Relational Learning, an emerging area in Artificial Intelligence concerned with combining logical and statistical AI. Despite their resemblance, the relationship between them has not been formally stated. In this paper, we describe the precise semantic relationship between them from a logical perspective. This is facilitated by first extending fuzzy logic to allow weights, which can also be viewed as a generalization of PSL, and then relating that generalization to MLN. We observe that the relationship between PSL and MLN is analogous to the known relationship between fuzzy logic and Boolean logic, and furthermore that the weight scheme of PSL is essentially a generalization of the weight scheme of MLN to the many-valued setting.
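
For readers unfamiliar with the two weight schemes, the standard definitions (not necessarily the paper's notation) are, in LaTeX:

P_{\mathrm{MLN}}(I) \;\propto\; \exp\Big(\sum_j w_j\, n_j(I)\Big),
\qquad
f_{\mathrm{PSL}}(I) \;\propto\; \exp\Big(-\sum_j w_j\, d_j(I)^{p}\Big), \quad p \in \{1, 2\},

where n_j(I) counts the true groundings of formula j in a two-valued interpretation I, and d_j(I) is the Lukasiewicz distance to satisfaction of formula j in a many-valued interpretation I.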

* In Working Notes of the 6th International Workshop on Statistical Relational AI
Click to Read Paper and Get Code
Simple tree models for articulated objects have prevailed over the last decade. However, it is also believed that these simple tree models cannot capture large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how can tree models be used effectively in human pose estimation? and 3) how should combined parts be used together with single parts efficiently? Assume we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced on the Leeds Sport Dataset (LSP) when learning latent trees for a deformable model, which aims at approximating the joint distribution of body part locations using a minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts and then perform inference on the learned latent tree. Our method outperforms the state of the art on LSP, both when the training images come from the same dataset and when they come from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

* CVPR 2013
Click to Read Paper and Get Code
Parsing human poses in images is fundamental to extracting critical visual information for artificial intelligent agents. Our goal is to learn self-contained body part representations from images, which we call visual symbols, and their symbol-wise geometric contexts in this parsing process. Each symbol is learned individually by categorizing visual features with the aid of geometric information. In the categorization, we use a Latent Support Vector Machine followed by an efficient cross-validation procedure to learn visual symbols. These symbols then naturally define geometric contexts of body parts at a fine granularity. When the structure of the compositional parts is a tree, we derive an efficient approach to estimating human poses in images. Experiments on two large datasets suggest our approach outperforms state-of-the-art methods.

* IJCAI 2013
Click to Read Paper and Get Code
This paper offers a novel mathematical approach, the modified Fractional-order Steepest Descent Method (FSDM), for training BackPropagation Neural Networks (BPNNs), which differs from the majority of previous approaches. Fractional calculus, a promising mathematical method, has the potential to assume a prominent role in applications of neural networks and cybernetics because of its inherent strengths, such as long-term memory, nonlocality, and weak singularity. Therefore, to improve the optimization performance of classic first-order BPNNs, in this paper we study whether classic first-order BPNNs can be generalized to modified-FSDM-based Fractional-order Backpropagation Neural Networks (FBPNNs). Motivated by this, the paper proposes an application of fractional calculus to implement a modified-FSDM-based FBPNN whose reverse incremental search proceeds in the negative directions of the approximate fractional-order partial derivatives of the squared error. First, the theoretical concept of a modified-FSDM-based FBPNN is described mathematically. Then, the mathematical proof of fractional-order global optimal convergence, an assumption on the structure, and the fractional-order multi-scale global optimization of a modified-FSDM-based FBPNN are analysed in detail. Finally, we compare a modified-FSDM-based FBPNN with a classic first-order BPNN through experiments on an example function approximation, fractional-order multi-scale global optimization, and two comparative evaluations on real data. The more efficient search capability of the fractional-order multi-scale global optimization in finding the global optimal solution is the major advantage of a modified-FSDM-based FBPNN over a classic first-order BPNN.
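
The modified FSDM itself is defined in the paper; purely as an illustration of what a fractional-order update can look like, the sketch below uses a common Caputo-type approximation from the fractional gradient descent literature, with the previous iterate as the lower terminal (an assumption, not the paper's rule).

import math
import numpy as np

def fractional_gradient_step(w, grad, w_prev, alpha=0.9, lr=0.01, eps=1e-8):
    """One hedged fractional-order update: approximate the order-alpha Caputo
    derivative by grad * |w - w_prev|**(1 - alpha) / Gamma(2 - alpha)."""
    frac_grad = grad * np.abs(w - w_prev + eps) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return w - lr * frac_grad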

Click to Read Paper and Get Code
Distributed knowledge is the sum of the knowledge in a group: it is what someone would know if they could discern between two possible worlds whenever any member of the group can discern between them. Sometimes distributed knowledge is referred to as the potential knowledge of a group, or the joint knowledge its members could obtain if they had unlimited means of communication. In epistemic logic, the formula D_G{\phi} is intended to express the fact that group G has distributed knowledge of {\phi}, i.e. that there is enough information in the group to infer {\phi}. But this is not the same as reasoning about what happens if the members of the group share their information. In this paper we introduce an operator R_G such that R_G{\phi} means that {\phi} is true after the members of G have shared all their information with each other - after G's distributed knowledge has been resolved. The R_G operators are called resolution operators. Semantically, we say that an expression R_G{\phi} is true iff {\phi} is true in what van Benthem [11, p. 249] calls (G's) communication core: the model update obtained by removing, for members of G, links to states that are not linked by all members of G. We study logics with different combinations of resolution operators and operators for common and distributed knowledge. Of particular interest is the relationship between distributed and common knowledge. The main results are sound and complete axiomatizations.
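
A minimal sketch of the communication-core update, with an accessibility relation represented simply as a set of state pairs per agent (the modal-language machinery is omitted):

def resolve(relations, group):
    """After the members of `group` share everything they know, each of them
    keeps only the links that every member of the group has; other agents
    are unaffected. `relations` maps agent -> set of (state, state) pairs."""
    shared = set.intersection(*(relations[a] for a in group))
    return {a: (shared if a in group else links) for a, links in relations.items()}

# Example: agents a and b jointly rule out the worlds they can individually distinguish.
R = {"a": {(1, 1), (1, 2)}, "b": {(1, 1), (1, 3)}, "c": {(1, 1), (1, 2), (1, 3)}}
print(resolve(R, {"a", "b"}))   # a and b both keep only {(1, 1)}; c is unchanged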

* EPTCS 215, 2016, pp. 31-50
* In Proceedings TARK 2015, arXiv:1606.07295
Click to Read Paper and Get Code
This paper revisits the problem of analyzing multiple ratings given by different judges. Different from previous work that focuses on distilling the true labels from noisy crowdsourcing ratings, we emphasize gaining diagnostic insights into our in-house, well-trained judges. We generalize the well-known Dawid-Skene model (Dawid & Skene, 1979) to a spectrum of probabilistic models under the same "TrueLabel + Confusion" paradigm, and show that our proposed hierarchical Bayesian model, called HybridConfusion, consistently outperforms Dawid-Skene on both synthetic and real-world data sets.
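
As background for what is being generalized (this is the classic Dawid-Skene EM, not the HybridConfusion model), a compact NumPy sketch under the assumption that every judge rates every item:

import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """labels: (n_items, n_judges) ratings in {0..n_classes-1}, no missing values."""
    n_items, n_judges = labels.shape
    # initialize posteriors over true labels with per-item vote counts
    T = np.array([np.bincount(labels[i], minlength=n_classes) for i in range(n_items)], dtype=float)
    T /= T.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class priors and per-judge confusion matrices
        prior = T.mean(axis=0)
        conf = np.zeros((n_judges, n_classes, n_classes)) + 1e-6
        for j in range(n_judges):
            for k in range(n_classes):
                conf[j, :, k] += T[labels[:, j] == k].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior over the true label of each item
        logT = np.log(prior) + sum(np.log(conf[j, :, labels[:, j]]).T for j in range(n_judges))
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T, conf, prior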

* ICML2012
Click to Read Paper and Get Code
State-of-the-art text spotting systems typically aim to detect isolated words or word-by-word text in images of natural scenes and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated words may be easier to recognize. On this basis, we propose a novel "semantic-based text recognition" (STR) deep learning model that reads text in images with the help of context. STR consists of several modules. We introduce the Text Grouping and Arranging (TGA) algorithm to connect and order isolated text regions. A text-recognition network interprets isolated words. Benefiting from semantic information, a sequence-to-sequence network model efficiently corrects inaccurate and uncertain phrases produced earlier in the STR pipeline. We present experiments on two new distinct datasets that contain scanned catalog images of interior designs and photographs of protesters with hand-written signs, respectively. Our results show that our STR model outperforms a baseline method that uses state-of-the-art single-word recognition techniques on both datasets. STR yields a high accuracy of 90% on the catalog images and 71% on the more difficult protest images, suggesting its generality in recognizing text.

Click to Read Paper and Get Code