As computer systems take on more complex tasks, they themselves become more complex, and it becomes harder to handle manually the many existing conditions and to anticipate unknown new ones. This situation urgently calls for computer technology that can judge and decide automatically according to various conditions. Current ANN (Artificial Neural Network) models are good at perceptual intelligence but poor at cognitive intelligence such as logical representation, so they cannot handle this situation well. Researchers have therefore tried to design novel models that represent and store logical relations in neural network structures; this type of model is called a KBNN (Knowledge-Based Neural Network). In such models, the neurons and links are designed specifically for representing logical relations, and the network structure is constructed according to those relations, which allows the rule libraries of expert systems to be constructed automatically. In this paper, KBNN is further improved by redesigning its neurons and links. With this improvement, neurons represent only things and links represent only the logical relations between things, so no extra logical neurons are needed. Moreover, methods for constructing and adjusting the network structure are designed on the basis of the redesigned neurons and links, so that the structure can be constructed and adjusted dynamically according to the logical relations. A probabilistic weight-adjustment mechanism further allows the model to represent logical relations under uncertainty.

As computers handle increasingly complicated tasks in variable environments, it has become an urgent requirement that artificial intelligence be able to judge and decide automatically according to numerous specific conditions, so as to deal with complicated and variable cases. ANNs, being inspired by the brain, are good candidates. However, most current numeric ANNs are not good at representing logical relations, because they still try to represent logical relations as ratios based on function approximation. Meanwhile, researchers have been designing novel neural network models that can represent logical relations. In this work, a novel neural network model specifically for representing logical relations is proposed and applied. New neurons and multiple kinds of links are defined; in particular, inhibitory links are introduced alongside exciting links. Unlike in current numeric ANNs, one end of an inhibitory link connects to an exciting link rather than to a neuron. Inhibitory links conditionally inhibit the exciting links they connect to, which lets the model represent logical relations correctly. The model can simulate the operations of Boolean logic gates and construct complex logical relations with simpler network structures than recent work in this area. This work suggests ways to make neural networks represent logical relations more directly and efficiently, and the model could complement current numeric ANNs in dealing with logical problems and expand the application areas of ANNs.
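
The abstract does not give the model's activation rules, so the following is only a hypothetical toy sketch of the central idea that an inhibitory link attaches to an exciting link rather than to a neuron. It assumes binary neurons, that a neuron fires when at least one incoming exciting link is active and not inhibited, and that an inhibitory link blocks its exciting link whenever the inhibitor's source neuron is active. The names `LogicNet`, `excite`, `build_gate`, and the always-on `TRUE` neuron are illustrative, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ExcitingLink:
    src: str
    dst: str
    # Hypothetical: neurons whose activity blocks this link (the inhibitory links).
    inhibitors: list = field(default_factory=list)


class LogicNet:
    """Toy network: neurons only hold truth values; links carry the logic."""

    def __init__(self):
        self.links = []

    def excite(self, src, dst, inhibited_by=()):
        self.links.append(ExcitingLink(src, dst, list(inhibited_by)))

    def evaluate(self, inputs, neuron):
        # "TRUE" is an always-active source neuron (an assumption, used for NOT).
        cache = {"TRUE": True, **inputs}
        return self._value(neuron, cache)

    def _value(self, neuron, cache):
        # Assumes the link graph is acyclic; a cycle would recurse forever.
        if neuron in cache:
            return cache[neuron]
        fired = False
        for link in self.links:
            if link.dst != neuron:
                continue
            blocked = any(self._value(i, cache) for i in link.inhibitors)
            if not blocked and self._value(link.src, cache):
                fired = True
        cache[neuron] = fired
        return fired


def build_gate(kind):
    net = LogicNet()
    if kind == "OR":
        net.excite("a", "out")
        net.excite("b", "out")
    elif kind == "NOT":
        net.excite("TRUE", "out", inhibited_by=["a"])
    elif kind == "AND":
        # not_b is an intermediate neuron representing "b is false".
        net.excite("TRUE", "not_b", inhibited_by=["b"])
        net.excite("a", "out", inhibited_by=["not_b"])
    elif kind == "XOR":
        # Each input's link is inhibited by the other input: no hidden neuron.
        net.excite("a", "out", inhibited_by=["b"])
        net.excite("b", "out", inhibited_by=["a"])
    else:
        raise ValueError(kind)
    return net
```

For example, `build_gate("XOR").evaluate({"a": True, "b": False}, "out")` returns `True`. In this toy, the XOR case needs no hidden neuron, whereas a single-layer threshold network cannot represent XOR at all.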

Recognizing actions from still images has recently attracted considerable research attention. In this paper, we model an action class as a flexible number of spatial configurations of body parts by proposing a new spatial SPN (Sum-Product Network). First, we discover a set of parts in image collections via unsupervised learning. Then, our new spatial SPN is applied to model the spatial relationships and the high-order correlations of parts. To learn robust networks, we further develop a hierarchical spatial SPN method, which models pairwise spatial relationships between parts inside sub-images and models the correlations of sub-images via extra SPN layers. Our method is shown to be effective on two benchmark datasets.
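
For readers unfamiliar with SPNs, the following minimal sketch evaluates a generic sum-product network; it is not the paper's spatial or hierarchical SPN. It assumes hypothetical binary variables indicating whether a body part is detected, models an action class as a weighted mixture (sum node) over part configurations (product nodes), and omits spatial relationships entirely. All part names and probabilities are invented for illustration.

```python
import math


class Leaf:
    """Bernoulli distribution over one binary variable (e.g. 'part present')."""

    def __init__(self, var, p_true):
        self.var, self.p_true = var, p_true

    def value(self, evidence):
        x = evidence.get(self.var)
        if x is None:  # unobserved variable: marginalized out
            return 1.0
        return self.p_true if x else 1.0 - self.p_true


class Product:
    """Product node: children must cover disjoint sets of variables."""

    def __init__(self, children):
        self.children = children

    def value(self, evidence):
        return math.prod(child.value(evidence) for child in self.children)


class Sum:
    """Sum node: weighted mixture of children over the same variables."""

    def __init__(self, weighted_children):
        self.weighted_children = weighted_children  # [(weight, node)], weights sum to 1

    def value(self, evidence):
        return sum(w * child.value(evidence) for w, child in self.weighted_children)


# Hypothetical action class with two alternative part configurations.
config_a = Product([Leaf("arm_raised", 0.9), Leaf("leg_bent", 0.2), Leaf("ball_near_hand", 0.8)])
config_b = Product([Leaf("arm_raised", 0.3), Leaf("leg_bent", 0.9), Leaf("ball_near_hand", 0.6)])
action = Sum([(0.6, config_a), (0.4, config_b)])
```

For instance, `action.value({"arm_raised": True})` gives the marginal 0.6 × 0.9 + 0.4 × 0.3 ≈ 0.66. Partial evidence is marginalized in a single bottom-up pass, which is what makes inference in SPNs tractable.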

We propose a novel approach to enhance the discriminability of Convolutional Neural Networks (CNNs). The key idea is to build a tree structure that progressively learns fine-grained features to distinguish a subset of classes, by learning features only among these classes. Such features are expected to be more discriminative than features learned for all the classes. We develop a new algorithm to effectively learn the tree structure from a large number of classes. Experiments on large-scale image classification tasks demonstrate that our method can boost the performance of a given basic CNN model. Because the method is general, it can potentially be combined with many other deep learning models.

* Neurocomputing 2017
Convolutional-deconvolution networks can be adopted to perform end-to-end saliency detection. However, they do not work well with objects at multiple scales. To overcome this limitation, in this work we propose a recurrent attentional convolutional-deconvolution network (RACDNN). Using spatial transformer and recurrent network units, RACDNN iteratively attends to selected image sub-regions to refine saliency progressively. Besides tackling the scale problem, RACDNN can also learn context-aware features from past iterations to enhance saliency refinement in future iterations. Experiments on several challenging saliency detection datasets validate the effectiveness of RACDNN and show that it outperforms state-of-the-art saliency detection methods.

* CVPR 2016
This paper addresses the problem of matching pedestrians across multiple camera views, known as person re-identification. Variations in lighting, environment, and pose across camera views make re-identification a challenging problem. Previous methods address these challenges by designing specific features or by learning a distance function. We propose a hierarchical feature learning framework that learns invariant representations from labeled image pairs. A mapping is learned such that the extracted features are invariant for images of the same individual across views. To learn robust representations and generalize better to unseen data, the system has to be trained with a large amount of data. However, most person re-identification datasets are small. Manually augmenting a dataset by partially corrupting the input data adds computational burden, as it requires several training epochs to converge. We propose a hierarchical network that incorporates a marginalization technique, which reaps the benefits of training on large datasets without explicit augmentation. We compare our approach with several baseline algorithms as well as popular linear and non-linear metric learning algorithms, and demonstrate improved performance on the challenging publicly available datasets VIPeR, CUHK01, CAVIAR4REID and iLIDS. Our approach also achieves state-of-the-art results on these datasets.
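
The abstract does not spell out the marginalization technique. A well-known instance of the general idea is the marginalized denoising autoencoder (mDA) of Chen et al. (ICML 2012), which replaces explicit dropout corruption of the inputs with its closed-form expectation, so the effect of infinitely many corrupted copies is obtained in one pass. The sketch below implements that swapped-in technique; it is not claimed to be the authors' network, and the function names are illustrative.

```python
import numpy as np


def marginalized_moments(X, p):
    """Expected second moments under dropout corruption, in closed form.

    X: (d, n) clean data, one sample per column.
    p: probability that each input feature is independently set to zero.
    Returns P = E[x x~^T] (d, d+1) and Q = E[x~ x~^T] (d+1, d+1), where x~
    is the corrupted input with a constant bias feature appended.
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])              # append bias row
    q = np.concatenate([np.full(d, 1.0 - p), [1.0]])  # survival probabilities; bias never dropped
    S = Xb @ Xb.T                                     # scatter matrix
    Q = S * np.outer(q, q)                            # off-diagonal: both features survive
    np.fill_diagonal(Q, q * np.diag(S))               # diagonal: a single feature survives
    P = S[:d, :] * q[np.newaxis, :]                   # clean target times corrupted input
    return P, Q


def mda_layer(X, p, reg=1e-5):
    """One marginalized denoising layer: W = P Q^{-1}, output tanh(W x_b)."""
    d, n = X.shape
    P, Q = marginalized_moments(X, p)
    # Q is symmetric, so W = P Q^{-1} equals solve(Q, P^T)^T.
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T
    Xb = np.vstack([X, np.ones((1, n))])
    return W, np.tanh(W @ Xb)
```

A hierarchy can be formed by feeding one layer's output into the next, e.g. `H1 = mda_layer(X, 0.3)[1]` followed by `H2 = mda_layer(H1, 0.3)[1]`; each layer's cost is independent of how many corrupted copies it implicitly averages over.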

This correspondence studies a basic problem in classification: how to evaluate different classifiers. Although conventional performance indexes such as accuracy are commonly used in classifier selection or evaluation, information-based criteria such as mutual information are becoming popular in feature and model selection. In this work, we propose to assess classifiers in terms of normalized mutual information (NI), a novel criterion for classifier evaluation that is well defined within a compact range. We derive closed-form relations between normalized mutual information and accuracy, precision, and recall in binary classification. By exploring these relations, we reveal that NI is actually a set of nonlinear functions, with a concordant power-exponent form, of each performance index. The relations can also be expressed in terms of precision and recall, or of false-alarm and hit rate (recall).

* 8 pages, 8 figures, and 2 tables
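
As a sketch, NI can be computed directly from a confusion matrix. This assumes NI is the mutual information between the true labels T and the predictions Y normalized by the entropy of T, NI = I(T;Y)/H(T), which lies in [0, 1]; the abstract does not state the normalization, so treat it as an assumption. The function name is illustrative.

```python
import math


def normalized_mutual_information(cm):
    """NI = I(T; Y) / H(T) for a confusion matrix cm[t][y] of counts.

    Assumption: normalization by the entropy of the true labels T.
    Natural logarithms are used; the base cancels in the ratio.
    """
    n = sum(sum(row) for row in cm)
    p_t = [sum(row) / n for row in cm]  # marginal of true labels
    p_y = [sum(row[y] for row in cm) / n for y in range(len(cm[0]))]  # marginal of predictions
    mi = 0.0
    for t, row in enumerate(cm):
        for y, count in enumerate(row):
            if count:  # 0 * log 0 is taken as 0
                p_ty = count / n
                mi += p_ty * math.log(p_ty / (p_t[t] * p_y[y]))
    h_t = -sum(p * math.log(p) for p in p_t if p > 0)
    if h_t == 0:
        raise ValueError("NI is undefined when only one true class occurs")
    return mi / h_t
```

For a balanced binary problem with 90% accuracy and symmetric errors, `[[45, 5], [5, 45]]` gives NI ≈ 0.531. Note that a classifier that swaps every label also scores NI = 1, since mutual information measures dependence rather than agreement.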
Sponsored search on E-commerce platforms such as Amazon, Taobao and Tmall gives sellers an effective way to reach potential buyers whose purposes are most relevant. In this paper, we study the problem of optimizing the auction mechanism for sponsored search on Alibaba's mobile E-commerce platform. Besides generating revenue, we must maintain an efficient marketplace with plenty of quality users, guarantee a reasonable return on investment (ROI) for advertisers, and at the same time facilitate a pleasant shopping experience for users. Together, these requirements pose a constrained optimization problem. Directly optimizing over auction parameters yields a discontinuous, non-convex problem that resists effective solution. One of our major contributions is a practical convex optimization formulation of the original problem. We devise a novel re-parametrization of the auction mechanism with discrete sets of representative instances. To construct the optimization problem, we build an auction simulation system that estimates the business indicators resulting from the selected parameters by replaying auctions recorded from real online requests. We summarize experiments on real search traffic that analyze the effects of the fidelity of the auction simulation, the efficacy under various constraint targets, and the influence of regularization. The experimental results show that with proper entropy regularization, we are able to maximize revenue while keeping other business indicators within given ranges.

* 6 pages, 1 figure
The performance of face detection has improved greatly with the development of convolutional neural networks. However, occlusion caused by masks and sunglasses is still a challenging problem. Improving recall on such occluded cases usually brings the risk of more false positives. In this paper, we present a novel face detector called Face Attention Network (FAN), which significantly improves face detection recall in occluded cases without compromising speed. More specifically, we propose a new anchor-level attention, which highlights features from the face region. Integrating it with our anchor assignment strategy and data augmentation techniques, we obtain state-of-the-art results on public face detection benchmarks such as WiderFace and MAFA. The code will be released for reproduction.

Selecting attractive photos from a human action shot sequence is quite challenging because of the subjective nature of "attractiveness", which mainly combines the human pose in action and the background. Prior works have actively studied high-level image attributes including interestingness, memorability, popularity, and aesthetics. However, none of them has studied the "attractiveness" of human action shots. In this paper, we present the first study of the "attractiveness" of human action shots, taking a systematic data-driven approach. Specifically, we create a new action-shot dataset composed of about 8000 high-quality action-shot photos. We further conduct rich crowd-sourced human judgment studies on Amazon Mechanical Turk (AMT), covering both the global attractiveness of a single photo and the relative attractiveness of a pair of photos. We then propose a deep Siamese network with a novel hybrid distribution matching loss to fully exploit both types of ratings. Extensive experiments reveal that (1) action-shot attractiveness is subjective but predictable, and (2) our proposed method is both efficient and effective at predicting attractive human action shots.

Feature engineering has been the key to the success of many prediction models. However, the process is non-trivial and often requires manual feature engineering or exhaustive searching. DNNs are able to learn feature interactions automatically; however, they generate all the interactions implicitly and are not necessarily efficient at learning all types of cross features. In this paper, we propose the Deep & Cross Network (DCN), which keeps the benefits of a DNN model and, beyond that, introduces a novel cross network that is more efficient at learning certain bounded-degree feature interactions. In particular, DCN explicitly applies feature crossing at each layer, requires no manual feature engineering, and adds negligible extra complexity to the DNN model. Our experimental results demonstrate its superiority over state-of-the-art algorithms on the CTR prediction dataset and the dense classification dataset, in terms of both model accuracy and memory usage.

* In Proceedings of AdKDD and TargetAd, Halifax, NS, Canada, August 14, 2017, 7 pages
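
The DCN cross layer computes x_{l+1} = x_0 x_l^T w_l + b_l + x_l, where x_0 is the input (embedding) vector and w_l, b_l are d-dimensional weight and bias vectors. The NumPy sketch below shows only the cross network, with no embedding layer, deep network, or training; the function names are illustrative.

```python
import numpy as np


def cross_layer(x0, xl, w, b):
    """One DCN cross layer, batched: x_{l+1} = x0 x_l^T w + b + x_l.

    x0, xl: (batch, d) arrays; w, b: (d,) arrays.
    x0 x_l^T w equals x0 scaled by the scalar x_l . w, so the d x d outer
    product is never formed and each layer costs O(d) per example.
    """
    return x0 * (xl @ w)[:, None] + b + xl


def cross_network(x0, weights, biases):
    """Stack cross layers; every layer re-uses the original input x0."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x
```

Because each layer multiplies by x_0 once more, an l-layer cross network represents feature interactions up to degree l + 1 while adding only 2d parameters per layer. In DCN, the cross network's output is concatenated with a parallel deep network's output before the final prediction layer.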
Fully Convolutional Networks (FCNs) have achieved great success in dense prediction tasks including semantic segmentation. In this paper, we begin by discussing the architectural limitations of FCN in building a strong segmentation network. Next, we present our Improved Fully Convolution Network (IFCN). In contrast to FCN, IFCN introduces a context network that progressively expands the receptive fields of feature maps. In addition, dense skip connections are added so that the context network can be optimized effectively. More importantly, these dense skip connections enable IFCN to fuse rich-scale context to make reliable predictions. Empirically, these architectural modifications prove significant in enhancing segmentation performance. Without any contextual post-processing, IFCN significantly advances the state of the art on the ADE20K (ImageNet scene parsing), Pascal Context, Pascal VOC 2012 and SUN-RGBD segmentation datasets.

We adopt Convolutional Neural Networks (CNNs) as our parametric model to learn discriminative features and classifiers for local patch classification. Based on the occurrence frequency distribution of classes, an ensemble of CNNs (CNN-Ensemble) is learned, in which each CNN component focuses on learning different and complementary visual patterns. The CNN-Ensemble outputs the local beliefs of pixels. Since visually similar pixels are indistinguishable in local context, we leverage global scene semantics to alleviate the local ambiguity. The global scene constraint is achieved mathematically by adding a global energy term to the labeling energy function, and it is estimated in practice within a non-parametric framework. A large-margin CNN metric learning method is also proposed for better global belief estimation. Finally, integrating the local and global beliefs gives the class likelihood of pixels, based on which maximum marginal inference is performed to generate the label prediction maps. Even without any post-processing, we achieve state-of-the-art results on the challenging SiftFlow and Barcelona benchmarks.

* 13 Pages, 6 figures, IEEE Transactions on Image Processing (T-IP) 2016
In image labeling, local representations for image units are usually generated from their surrounding image patches, so long-range contextual information is not effectively encoded. In this paper, we introduce recurrent neural networks (RNNs) to address this issue. Specifically, directed acyclic graph RNNs (DAG-RNNs) are proposed to process DAG-structured images, which enables the network to model long-range semantic dependencies among image units. Our DAG-RNNs greatly enhance the discriminative power of local representations, which significantly benefits local classification. Meanwhile, we propose a novel class weighting function that attends to rare classes, which markedly boosts recognition accuracy for infrequent classes. Integrated with convolution and deconvolution layers, our DAG-RNNs achieve new state-of-the-art results on the challenging SiftFlow, CamVid and Barcelona benchmarks.

Artificial intelligence researchers have achieved human-level performance in large-scale perfect-information games, but it is still a challenge to achieve (nearly) optimal results, in other words an approximate Nash equilibrium, in large-scale imperfect-information games (e.g., war games, football coaching, or business strategy). Neural Fictitious Self Play (NFSP) is an effective algorithm for learning an approximate Nash equilibrium of imperfect-information games from self-play without prior domain knowledge. However, it relies on a Deep Q-Network, which is off-line and hard to converge in online games where the opponent's strategy changes, so it cannot approach an approximate Nash equilibrium in games with a large search scale and deep search depth. In this paper, we propose Monte Carlo Neural Fictitious Self Play (MC-NFSP), an algorithm that combines Monte Carlo tree search with NFSP and greatly improves performance on large-scale zero-sum imperfect-information games. Experimentally, we demonstrate that MC-NFSP converges to an approximate Nash equilibrium in games with large search depth, whereas NFSP does not. Furthermore, we develop Asynchronous Neural Fictitious Self Play (ANFSP), which uses an asynchronous, parallel architecture to collect game experience. In experiments, we show that parallel actor-learners further accelerate and stabilize training.

Caching is envisioned to play a critical role in next-generation content delivery infrastructure, cellular networks, and Internet architectures. By storing the most popular content at storage-enabled network entities during off-peak periods, caching can benefit both the network infrastructure and end users during peak periods. In this context, distributing the limited storage capacity across network entities calls for decentralized caching schemes. Many practical caching systems involve a parent caching node connected to multiple leaf nodes to serve user file requests. To model the two-way interactive influence between caching decisions at the parent and leaf nodes, a reinforcement learning framework is put forth. To handle the large continuous state space, a scalable deep reinforcement learning approach is pursued. The novel approach relies on a deep Q-network to learn the Q-function, and thus the optimal caching policy, in an online fashion. Endowing the parent node with the ability to learn and adapt to the unknown policies of leaf nodes, as well as to the spatio-temporal evolution of file requests, results in remarkable caching performance, as corroborated by numerical tests.

* 10 pages, 16 figures
Web services are basic functions of a software system that support the concept of service-oriented architecture. They are often composed together to provide added value, a process known as web service composition. Owing to the complexity of this problem, researchers often employ Evolutionary Computation (EC) techniques to efficiently construct composite services with near-optimal functional quality (i.e., Quality of Semantic Matchmaking), non-functional quality (i.e., Quality of Service), or both. With a significant increase in service composition requests, many requests have similar input and output requirements but vary because of the different preferences of different user segments. This problem is often treated as multi-objective service composition, so as to cope with the preferences of different user segments simultaneously. Rather than taking a multi-objective approach, which gives rise to a solution-selection challenge, in this work we treat multiple similar service composition requests as jointly forming an evolutionary multi-tasking problem. We propose an effective permutation-based evolutionary multi-tasking approach that simultaneously generates a set of solutions, one for each service request. We also introduce a neighborhood structure over multiple tasks so that newly evolved solutions can be evaluated on related tasks. Compared with a state-of-the-art single-tasking EC-based method, our proposed method performs better while taking only a fraction of the time. We also find that using a proper neighborhood structure can enhance the effectiveness of our approach.

Multiview canonical correlation analysis (MCCA) seeks latent low-dimensional representations of multiview data of shared entities (a.k.a. common sources). However, existing MCCA approaches do not exploit the geometry of the common sources, which may be available a priori or can be constructed using certain domain knowledge. This prior information about the common sources can be encoded by a graph and invoked as a regularizer to enrich the maximum-variance MCCA framework. In this context, the present paper's novel graph-regularized (G) MCCA approach minimizes the distance between the wanted canonical variables and the common low-dimensional representations, while accounting for graph-induced knowledge of the common sources. Relying on a function that captures the extent to which the low-dimensional representations of the multiple views are similar, a generalization bound of GMCCA is established based on Rademacher complexity. Tailored for setups where the number of data pairs is smaller than the data vector dimensions, a graph-regularized dual MCCA approach is also developed. To further deal with nonlinearities present in the data, graph-regularized kernel MCCA variants are put forward too. Interestingly, the solutions of the graph-regularized linear, dual, and kernel MCCA are all provided in terms of generalized eigenvalue decomposition. Several corroborating numerical tests using real datasets are provided to showcase the merits of the graph-regularized MCCA variants relative to several competing alternatives including MCCA, Laplacian-regularized MCCA, and (graph-regularized) PCA.
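
To illustrate how a graph regularizer can still leave a closed-form spectral solution, here is a minimal sketch under an assumed objective: minimize sum_m ||U_m^T X_m - S||_F^2 + gamma tr(S L S^T) subject to S S^T = I, where X_m (d_m x N) holds view m, S (r x N) holds the common representations, and L is the graph Laplacian over the N entities. Eliminating each U_m by least squares turns this into maximizing tr(S C S^T) with C = sum_m X_m^T (X_m X_m^T)^{-1} X_m - gamma L, so the rows of S are the top-r eigenvectors of C (here an ordinary eigendecomposition, a special case of the generalized one). This is a sketch of the idea, not the paper's exact formulation; the function name and the small ridge `reg` are assumptions.

```python
import numpy as np


def gmcca(views, laplacian, dim, gamma=0.1, reg=1e-6):
    """Graph-regularized MCCA-style solver (sketch under an assumed objective).

    views: list of (d_m, N) arrays whose columns are aligned across views.
    laplacian: (N, N) graph Laplacian encoding prior similarity of the N entities.
    Returns S (dim, N) with orthonormal rows and per-view maps U_m (d_m, dim).
    """
    C = -gamma * laplacian
    for X in views:
        A = X @ X.T + reg * np.eye(X.shape[0])  # small ridge for numerical safety
        C = C + X.T @ np.linalg.solve(A, X)     # projection onto the row space of X
    _, evecs = np.linalg.eigh(C)                # eigenvalues in ascending order
    S = evecs[:, ::-1][:, :dim].T               # top-`dim` eigenvectors as rows
    Us = []
    for X in views:
        A = X @ X.T + reg * np.eye(X.shape[0])
        Us.append(np.linalg.solve(A, X @ S.T))  # least-squares U_m given S
    return S, Us
```

Setting `gamma=0` drops the graph term, so the same routine also gives the unregularized baseline for comparison.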

Contemporary power grids are being challenged by rapid voltage fluctuations caused by the large-scale deployment of renewable generation, electric vehicles, and demand response programs. In this context, monitoring the grid's operating conditions in real time becomes increasingly critical. Given the growing scale and nonconvexity of the problem, however, existing power system state estimation (PSSE) schemes become computationally expensive or yield suboptimal performance. To bypass these hurdles, this paper advocates deep neural networks (DNNs) for real-time power system monitoring. By unrolling an iterative physics-based prox-linear solver, a novel model-specific DNN is developed for real-time PSSE with affordable training and minimal tuning effort. To enable system awareness even ahead of time, as well as to endow the DNN-based estimator with resilience, deep recurrent neural networks (RNNs) are also pursued for power system state forecasting. Deep RNNs leverage the long-term nonlinear dependencies present in historical voltage time series to enable forecasting, and they are easy to implement. Numerical tests showcase improved performance of the proposed DNN-based estimation and forecasting approaches compared with existing alternatives. In real load data experiments on the IEEE 118-bus benchmark system, the novel model-specific DNN-based PSSE scheme outperforms the competing alternatives, including the widely adopted Gauss-Newton PSSE solver, by nearly an order of magnitude.

Principal component analysis (PCA) is widely used for feature extraction and dimensionality reduction, with documented merits in diverse tasks involving high-dimensional data. Standard PCA copes with one dataset at a time, but it is challenged when multiple datasets must be analyzed jointly. In certain data science settings, however, one is often interested in extracting the most discriminative information from one dataset of particular interest (a.k.a. target data) relative to the other(s) (a.k.a. background data). To this end, this paper puts forth a novel approach, termed discriminative (d) PCA, for such discriminative analytics of multiple datasets. Under certain conditions, dPCA is proved to be least-squares optimal in recovering the component vector unique to the target data relative to background data. To account for nonlinear data correlations, (linear) dPCA models for one or multiple background datasets are generalized through kernel-based learning. Interestingly, all dPCA variants admit an analytical solution obtainable with a single (generalized) eigenvalue decomposition. Finally, corroborating dimensionality reduction tests using both synthetic and real datasets are provided to validate the effectiveness of the proposed methods.

* there are some mistakes
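
A minimal sketch, assuming dPCA seeks directions u that maximize the ratio of target variance to background variance, u^T C_x u / u^T C_y u, whose maximizers solve the generalized eigenproblem C_x u = lambda C_y u. The sketch reduces this to an ordinary symmetric eigenproblem by Cholesky whitening of the background covariance. The function names and the ridge `reg` are illustrative, and the kernel variants are not covered.

```python
import numpy as np


def dpca_from_cov(Cx, Cy, k, reg=1e-8):
    """Top-k solutions of Cx u = lam Cy u via Cholesky whitening of Cy."""
    d = Cx.shape[0]
    L = np.linalg.cholesky(Cy + reg * np.eye(d))  # Cy = L L^T (ridge keeps it positive definite)
    Linv = np.linalg.inv(L)
    M = Linv @ Cx @ Linv.T                        # symmetric: target covariance in whitened space
    evals, V = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1][:k]           # largest variance ratios first
    U = Linv.T @ V[:, order]                      # map back: u = L^{-T} v
    U /= np.linalg.norm(U, axis=0)                # unit-norm directions
    return U, evals[order]


def dpca(X, Y, k):
    """X: (n_x, d) target samples; Y: (n_y, d) background samples, one per row."""
    Cx = np.cov(X, rowvar=False)
    Cy = np.cov(Y, rowvar=False)
    return dpca_from_cov(Cx, Cy, k)
```

With C_x = diag(4, 6, 1) and C_y = diag(1, 6, 1), standard PCA on the target would pick the second axis (variance 6), whereas this sketch returns the first axis, the only direction whose variance is larger in the target than in the background.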