Models, code, and papers for "Gang Wang":

Scene Text Recognition With Finer Grid Rectification

Jan 26, 2020
Gang Wang

Scene text recognition is a challenging problem because of irregular styles and various distortions. This paper proposes an end-to-end trainable model consisting of a finer rectification module and a bidirectional attentional recognition network (Firbarn). The rectification module adopts a finer grid to rectify the distorted input image, and the bidirectional decoder contains only one decoding layer instead of two separate ones. Firbarn can be trained in a weakly supervised way, requiring only the scene text images and the corresponding word labels. With the flexible rectification and the novel bidirectional decoder, extensive evaluation on the standard benchmarks shows that Firbarn outperforms previous works, especially on irregular datasets.

* 6 pages, 3 figures

A Novel Neural Network Structure Constructed according to Logical Relations

Mar 07, 2019
Wang Gang

To solve more complex problems, computer systems become more and more complex. It becomes harder to handle the various conditions, including unknown new conditions, manually in advance. This situation urgently requires the development of computer technology for automatic judgement and decision-making according to various conditions. Current ANN (Artificial Neural Network) models are good at perceptual intelligence but not at cognitive intelligence such as logical representation, so they do not deal with the above situation well. Therefore, researchers have tried to design novel models that represent and store logical relations in the neural network structure; this type of model is called a KBNN (Knowledge-Based Neural Network). In this type of model, the neurons and links are designed specifically for logical relation representation, and the neural network structures are constructed according to logical relations, allowing the rule libraries of expert systems to be constructed automatically. In this paper, a further improvement on KBNN is made by redesigning the neurons and links. With this improvement, neurons solely represent things while links solely represent logical relations between things, and thus no extra logical neurons are needed. Moreover, the related construction and adjustment methods of the neural network structure are also designed based on the redesigned neurons and links, so that the neural network structure is dynamically constructed and adjusted according to the logical relations. The probabilistic mechanism for weight adjustment allows the neural network model to further represent logical relations under uncertainty.


A Novel Neural Network Model Specified for Representing Logical Relations

Aug 02, 2017
Gang Wang

As computers handle more and more complicated tasks in variable environments, it becomes an urgent requirement that artificial intelligence be able to judge and decide automatically according to numerous specific conditions, so as to deal with complicated and variable cases. ANNs, inspired by the brain, are a good candidate. However, most current numeric ANNs are not good at representing logical relations, because these models still try to represent logical relations in the form of ratios based on functional approximation. On the other hand, researchers have been trying to design novel neural network models that represent logical relations. In this work, a novel neural network model specified for representing logical relations is proposed and applied. New neurons and multiple kinds of links are defined. Inhibitory links are introduced besides exciting links. Unlike in current numeric ANNs, one end of an inhibitory link connects to an exciting link rather than to a neuron. Inhibitory links conditionally inhibit the connected exciting links so that the model represents logical relations correctly. This model can simulate the operations of Boolean logic gates and construct complex logical relations with simpler neural network structures than recent works in this area. This work provides some ideas for making neural networks represent logical relations more directly and efficiently, and the model could be used as a complement to current numeric ANNs to deal with logical issues and expand the application areas of ANNs.
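
The idea of an inhibitory link that targets an exciting link, rather than a neuron, can be illustrated with a toy sketch. The code below is not the paper's neuron or link definition; the function `evaluate`, the tuple layouts, and the single-layer evaluation are all assumptions made for illustration. It shows how XOR can be wired with two exciting links and two inhibitory links, with no extra logic neurons.

```python
def evaluate(inputs, exciting_links, inhibitory_links):
    """Return the set of target neurons activated in one step (hypothetical model).

    inputs:           dict mapping neuron name -> bool (is the neuron active?)
    exciting_links:   list of (link_id, source, target)
    inhibitory_links: list of (source, link_id); each one targets an exciting
                      link, not a neuron, and blocks it while its source is active
    """
    blocked = {
        link_id
        for source, link_id in inhibitory_links
        if inputs.get(source, False)
    }
    active = set()
    for link_id, source, target in exciting_links:
        if inputs.get(source, False) and link_id not in blocked:
            active.add(target)
    return active


# XOR wiring: each input excites OUT and inhibits the other input's exciting link.
XOR_EXCITING = [("e1", "A", "OUT"), ("e2", "B", "OUT")]
XOR_INHIBITORY = [("B", "e1"), ("A", "e2")]


def xor_gate(a, b):
    result = evaluate({"A": a, "B": b}, XOR_EXCITING, XOR_INHIBITORY)
    return "OUT" in result
```

Calling `xor_gate(True, False)` activates `OUT` through link `e1`, while `xor_gate(True, True)` blocks both exciting links, so `OUT` stays inactive.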


Hierarchical Spatial Sum-Product Networks for Action Recognition in Still Images

Jul 08, 2016
Jinghua Wang, Gang Wang

Recognizing actions from still images has been widely studied recently. In this paper, we model an action class as a flexible number of spatial configurations of body parts by proposing a new spatial SPN (Sum-Product Network). First, we discover a set of parts in image collections via unsupervised learning. Then, our new spatial SPN is applied to model the spatial relationships and also the high-order correlations of parts. To learn robust networks, we further develop a hierarchical spatial SPN method, which models pairwise spatial relationships between parts inside sub-images and models the correlation of sub-images via extra SPN layers. Our method is shown to be effective on two benchmark datasets.


Learning Fine-grained Features via a CNN Tree for Large-scale Classification

Sep 22, 2017
Zhenhua Wang, Xingxing Wang, Gang Wang

We propose a novel approach to enhance the discriminability of Convolutional Neural Networks (CNN). The key idea is to build a tree structure that could progressively learn fine-grained features to distinguish a subset of classes, by learning features only among these classes. Such features are expected to be more discriminative, compared to features learned for all the classes. We develop a new algorithm to effectively learn the tree structure from a large number of classes. Experiments on large-scale image classification tasks demonstrate that our method could boost the performance of a given basic CNN model. Our method is quite general, hence it can potentially be used in combination with many other deep learning models.

* Neurocomputing 2017 

Recurrent Attentional Networks for Saliency Detection

Apr 12, 2016
Jason Kuen, Zhenhua Wang, Gang Wang

Convolutional-deconvolution networks can be adopted to perform end-to-end saliency detection. However, they do not work well with objects of multiple scales. To overcome this limitation, we propose a recurrent attentional convolutional-deconvolution network (RACDNN). Using spatial transformer and recurrent network units, RACDNN is able to iteratively attend to selected image sub-regions to perform saliency refinement progressively. Besides tackling the scale problem, RACDNN can also learn context-aware features from past iterations to enhance saliency refinement in future iterations. Experiments on several challenging saliency detection datasets validate the effectiveness of RACDNN, and show that RACDNN outperforms state-of-the-art saliency detection methods.

* CVPR 2016 

Hierarchical Invariant Feature Learning with Marginalization for Person Re-Identification

Nov 30, 2015
Rahul Rama Varior, Gang Wang

This paper addresses the problem of matching pedestrians across multiple camera views, known as person re-identification. Variations in lighting conditions, environment and pose across camera views make re-identification a challenging problem. Previous methods address these challenges by designing specific features or by learning a distance function. We propose a hierarchical feature learning framework that learns invariant representations from labeled image pairs. A mapping is learned such that the extracted features are invariant for images belonging to the same individual across views. To learn robust representations and to achieve better generalization to unseen data, the system has to be trained with a large amount of data. Critically, most person re-identification datasets are small. Manually augmenting the dataset by partial corruption of input data introduces additional computational burden, as it requires several training epochs to converge. We propose a hierarchical network which incorporates a marginalization technique that can reap the benefits of training on large datasets without explicit augmentation. We compare our approach with several baseline algorithms as well as popular linear and non-linear metric learning algorithms and demonstrate improved performance on challenging publicly available datasets: VIPeR, CUHK01, CAVIAR4REID and iLIDS. Our approach also achieves state-of-the-art results on these datasets.


Derivations of Normalized Mutual Information in Binary Classifications

Nov 23, 2007
Yong Wang, Bao-Gang Hu

This correspondence studies a basic problem of classification: how to evaluate different classifiers. Although conventional performance indexes, such as accuracy, are commonly used in classifier selection or evaluation, information-based criteria, such as mutual information, are becoming popular in feature and model selection. In this work, we propose to assess classifiers in terms of normalized mutual information (NI), which is novel and well defined in a compact range for classifier evaluation. We derive closed-form relations of normalized mutual information with respect to accuracy, precision, and recall in binary classification. By exploring the relations among them, we reveal that NI is actually a set of nonlinear functions, with a concordant power-exponent form, of each performance index. The relations can also be expressed with respect to precision and recall, or to false alarm and hitting rate (recall).

* 8 pages, 8 figures, and 2 tables 
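
As a rough numerical companion to the abstract above, the sketch below computes a normalized mutual information score from a binary confusion matrix alongside accuracy. The normalization used here, mutual information divided by the entropy of the true labels, is an assumption, since the abstract does not spell out the definition. The function names are illustrative and the paper's closed-form relations are not reproduced.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_mutual_information(tp, fn, fp, tn):
    """NI = I(T; Y) / H(T) for binary confusion counts (assumed normalization)."""
    n = tp + fn + fp + tn
    # Rows: true class (positive, negative). Columns: predicted (positive, negative).
    joint = [[tp / n, fn / n], [fp / n, tn / n]]
    p_true = [sum(row) for row in joint]
    p_pred = [joint[0][j] + joint[1][j] for j in range(2)]
    mi = 0.0
    for i in range(2):
        for j in range(2):
            p = joint[i][j]
            if p > 0:
                mi += p * math.log2(p / (p_true[i] * p_pred[j]))
    h_true = entropy(p_true)
    return mi / h_true if h_true > 0 else 0.0


def accuracy(tp, fn, fp, tn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + fn + fp + tn)
```

With this normalization, a perfect classifier on balanced data scores 1, and a classifier whose predictions are independent of the labels scores 0 even though its accuracy may be 0.5.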

Practical Constrained Optimization of Auction Mechanisms in E-Commerce Sponsored Search Advertising

Jul 31, 2018
Gang Bai, Zhihui Xie, Liang Wang

Sponsored search in E-commerce platforms such as Amazon, Taobao and Tmall provides sellers an effective way to reach potential buyers with most relevant purpose. In this paper, we study the auction mechanism optimization problem in sponsored search on Alibaba's mobile E-commerce platform. Besides generating revenue, we are supposed to maintain an efficient marketplace with plenty of quality users, guarantee a reasonable return on investment (ROI) for advertisers, and meanwhile facilitate a pleasant shopping experience for the users. These requirements essentially pose a constrained optimization problem. Directly optimizing over auction parameters yields a discontinuous, non-convex problem that denies effective solutions. One of our major contributions is a practical convex optimization formulation of the original problem. We devise a novel re-parametrization of the auction mechanism with discrete sets of representative instances. To construct the optimization problem, we build an auction simulation system which estimates the resulting business indicators of the selected parameters by replaying the auctions recorded from real online requests. We summarize experiments on real search traffic that analyze the effects of the fidelity of auction simulation, the efficacy under various constraint targets and the influence of regularization. The experimental results show that with proper entropy regularization, we are able to maximize revenue while constraining other business indicators within given ranges.

* 6 pages, 1 figure 

Face Attention Network: An Effective Face Detector for the Occluded Faces

Nov 22, 2017
Jianfeng Wang, Ye Yuan, Gang Yu

The performance of face detection has been largely improved with the development of convolutional neural networks. However, occlusion caused by masks and sunglasses is still a challenging problem. Improving the recall on these occluded cases usually brings the risk of high false positives. In this paper, we present a novel face detector called Face Attention Network (FAN), which can significantly improve the recall of face detection in the occluded case without compromising speed. More specifically, we propose a new anchor-level attention, which highlights the features from the face region. Integrated with our anchor assignment strategy and data augmentation techniques, we obtain state-of-the-art results on public face detection benchmarks like WiderFace and MAFA. The code will be released for reproduction.


Understanding and Predicting The Attractiveness of Human Action Shot

Nov 02, 2017
Bin Dai, Baoyuan Wang, Gang Hua

Selecting attractive photos from a human action shot sequence is quite challenging because of the subjective nature of "attractiveness", which is mainly a combined factor of human pose in action and the background. Prior works have actively studied high-level image attributes including interestingness, memorability, popularity, and aesthetics. However, none of them has studied the "attractiveness" of human action shots. In this paper, we present the first study of the "attractiveness" of human action shots by taking a systematic data-driven approach. Specifically, we create a new action-shot dataset composed of about 8000 high-quality action-shot photos. We further conduct rich crowd-sourced human judgment studies on Amazon Mechanical Turk (AMT) in terms of the global attractiveness of a single photo and the relative attractiveness of a pair of photos. A deep Siamese network with a novel hybrid distribution matching loss is further proposed to fully exploit both types of ratings. Extensive experiments reveal that (1) the property of action shot attractiveness is subjective but predictable, and (2) our proposed method is both efficient and effective for predicting attractive human action shots.


Deep & Cross Network for Ad Click Predictions

Aug 17, 2017
Ruoxi Wang, Bin Fu, Gang Fu, Mingliang Wang

Feature engineering has been the key to the success of many prediction models. However, the process is non-trivial and often requires manual feature engineering or exhaustive searching. DNNs are able to automatically learn feature interactions; however, they generate all the interactions implicitly, and are not necessarily efficient in learning all types of cross features. In this paper, we propose the Deep & Cross Network (DCN), which keeps the benefits of a DNN model and, beyond that, introduces a novel cross network that is more efficient in learning certain bounded-degree feature interactions. In particular, DCN explicitly applies feature crossing at each layer, requires no manual feature engineering, and adds negligible extra complexity to the DNN model. Our experimental results demonstrate its superiority over state-of-the-art algorithms on the CTR prediction dataset and dense classification dataset, in terms of both model accuracy and memory usage.

* In Proceedings of AdKDD and TargetAd, Halifax, NS, Canada, August, 14, 2017, 7 pages 
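
The explicit per-layer feature crossing mentioned in the abstract can be sketched in a few lines. The abstract itself does not state the layer formula. The code below assumes the commonly cited form of the DCN cross layer, x_{l+1} = x_0 (x_l · w_l) + b_l + x_l, and uses plain Python lists instead of a deep learning framework. The function names are illustrative, and the deep (DNN) branch is omitted.

```python
def cross_layer(x0, xl, w, b):
    """One cross layer: x_next = x0 * (xl . w) + b + xl (assumed formulation).

    x0: input feature vector, reused at every layer
    xl: output of the previous cross layer
    w, b: this layer's weight and bias vectors, same length as x0
    """
    scalar = sum(x_i * w_i for x_i, w_i in zip(xl, w))
    return [x0_i * scalar + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0, weights, biases):
    """Stack cross layers; each extra layer raises the interaction degree by one."""
    x = list(x0)
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x
```

Each layer adds only two vectors of parameters (`w` and `b`), which is consistent with the abstract's claim of negligible extra complexity.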

Improving Fully Convolution Network for Semantic Segmentation

Nov 28, 2016
Bing Shuai, Ting Liu, Gang Wang

Fully Convolution Networks (FCN) have achieved great success in dense prediction tasks including semantic segmentation. In this paper, we start by discussing FCN and its architectural limitations in building a strong segmentation network. Next, we present our Improved Fully Convolution Network (IFCN). In contrast to FCN, IFCN introduces a context network that progressively expands the receptive fields of feature maps. In addition, dense skip connections are added so that the context network can be effectively optimized. More importantly, these dense skip connections enable IFCN to fuse rich-scale context to make reliable predictions. Empirically, these architectural modifications are shown to significantly enhance segmentation performance. Without any contextual post-processing, IFCN significantly advances the state of the art on the ADE20K (ImageNet scene parsing), Pascal Context, Pascal VOC 2012 and SUN-RGBD segmentation datasets.


Scene Parsing with Integration of Parametric and Non-parametric Models

Apr 20, 2016
Bing Shuai, Zhen Zuo, Gang Wang, Bing Wang

We adopt Convolutional Neural Networks (CNNs) as our parametric model to learn discriminative features and classifiers for local patch classification. Based on the occurrence frequency distribution of classes, an ensemble of CNNs (CNN-Ensemble) is learned, in which each CNN component focuses on learning different and complementary visual patterns. The local beliefs of pixels are output by CNN-Ensemble. Considering that visually similar pixels are indistinguishable under local context, we leverage the global scene semantics to alleviate the local ambiguity. The global scene constraint is mathematically achieved by adding a global energy term to the labeling energy function, and it is practically estimated in a non-parametric framework. A large-margin-based CNN metric learning method is also proposed for better global belief estimation. In the end, the integration of local and global beliefs gives rise to the class likelihood of pixels, based on which maximum marginal inference is performed to generate the label prediction maps. Even without any post-processing, we achieve state-of-the-art results on the challenging SiftFlow and Barcelona benchmarks.

* 13 Pages, 6 figures, IEEE Transactions on Image Processing (T-IP) 2016 

DAG-Recurrent Neural Networks For Scene Labeling

Nov 23, 2015
Bing Shuai, Zhen Zuo, Gang Wang, Bing Wang

In image labeling, local representations for image units are usually generated from their surrounding image patches, so long-range contextual information is not effectively encoded. In this paper, we introduce recurrent neural networks (RNNs) to address this issue. Specifically, directed acyclic graph RNNs (DAG-RNNs) are proposed to process DAG-structured images, which enables the network to model long-range semantic dependencies among image units. Our DAG-RNNs are capable of tremendously enhancing the discriminative power of local representations, which significantly benefits the local classification. Meanwhile, we propose a novel class weighting function that attends to rare classes, which phenomenally boosts the recognition accuracy for infrequent classes. Integrated with convolution and deconvolution layers, our DAG-RNNs achieve new state-of-the-art results on the challenging SiftFlow, CamVid and Barcelona benchmarks.


A Multistep Lyapunov Approach for Finite-Time Analysis of Biased Stochastic Approximation

Sep 10, 2019
Gang Wang, Bingcong Li, Georgios B. Giannakis

Motivated by the widespread use of temporal-difference (TD-) and Q-learning algorithms in reinforcement learning, this paper studies a class of biased stochastic approximation (SA) procedures under a mild "ergodic-like" assumption on the underlying stochastic noise sequence. Building upon a carefully designed multistep Lyapunov function that looks ahead to several future updates to accommodate the stochastic perturbations (for control of the gradient bias), we prove a general result on the convergence of the iterates, and use it to derive non-asymptotic bounds on the mean-square error in the case of constant stepsizes. This novel looking-ahead viewpoint renders finite-time analysis of biased SA algorithms under a large family of stochastic perturbations possible. For direct comparison with existing contributions, we also demonstrate these bounds by applying them to TD- and Q-learning with linear function approximation, under the practical Markov chain observation model. The resultant finite-time error bound for both the TD- as well as the Q-learning algorithms is the first of its kind, in the sense that it holds i) for the unmodified versions (i.e., without making any modifications to the parameter updates) using even nonlinear function approximators; as well as for Markov chains ii) under general mixing conditions and iii) starting from any initial distribution, at least one of which has to be violated for existing results to be applicable.

* 28 Pages 
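
For readers unfamiliar with the setting analyzed above, the sketch below shows the standard TD(0) update with linear function approximation and a constant stepsize, the kind of stochastic approximation iterate such finite-time bounds apply to. It illustrates the textbook update only, not the paper's multistep Lyapunov analysis. The function names and argument layout are assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def td0_update(theta, phi_s, phi_next, reward, gamma, alpha):
    """One TD(0) step for the linear value estimate V(s) = theta . phi(s).

    theta:    current parameter vector
    phi_s:    feature vector of the current state
    phi_next: feature vector of the next state observed along the Markov chain
    reward:   observed reward for the transition
    gamma:    discount factor
    alpha:    constant stepsize
    """
    td_error = reward + gamma * dot(theta, phi_next) - dot(theta, phi_s)
    return [t + alpha * td_error * f for t, f in zip(theta, phi_s)]
```

Because consecutive samples come from a single Markov chain trajectory, the update direction is a biased estimate of its stationary mean, which is why the abstract treats TD-learning as biased stochastic approximation.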

Memetic EDA-Based Approaches to Comprehensive Quality-Aware Automated Semantic Web Service Composition

Jun 19, 2019
Chen Wang, Hui Ma, Gang Chen, Sven Hartmann

Comprehensive quality-aware automated semantic web service composition is an NP-hard problem, where service composition workflows are unknown, and comprehensive quality, i.e., Quality of Service (QoS) and Quality of Semantic Matchmaking (QoSM), is simultaneously optimized. The objective of this problem is to find a solution with optimized or near-optimized overall QoS and QoSM within polynomial time for a service request. In this paper, we propose novel memetic EDA-based approaches to tackle this problem. The proposed method investigates the effectiveness of several neighborhood structures of composite services by proposing domain-dependent local search operators. Apart from that, a joint strategy for the local search procedure is proposed and integrated with a modified EDA to reduce the overall computation time of our memetic approach. To better demonstrate the effectiveness and scalability of our approach, we create a more challenging, augmented version of the service composition benchmark based on WSC-08 \cite{bansal2008wsc} and WSC-09 \cite{kona2009wsc}. Experimental results on this benchmark show that one of our proposed memetic EDA-based approaches (i.e., MEEDA-LOP) significantly outperforms existing state-of-the-art algorithms.


Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games

Apr 06, 2019
Li Zhang, Wei Wang, Shijian Li, Gang Pan

Researchers in artificial intelligence have achieved human-level intelligence in large-scale perfect-information games, but it is still a challenge to achieve (nearly) optimal results (in other words, an approximate Nash equilibrium) in large-scale imperfect-information games (e.g., war games, football coaching or business strategies). Neural Fictitious Self Play (NFSP) is an effective algorithm for learning an approximate Nash equilibrium of imperfect-information games from self-play without prior domain knowledge. However, it relies on a Deep Q-Network, which is off-line and hard to converge in online games with changing opponent strategies, so it cannot approach an approximate Nash equilibrium in games with a large search scale and deep search depth. In this paper, we propose Monte Carlo Neural Fictitious Self Play (MC-NFSP), an algorithm that combines Monte Carlo tree search with NFSP and greatly improves performance on large-scale zero-sum imperfect-information games. Experimentally, we demonstrate that MC-NFSP can converge to an approximate Nash equilibrium in games with large-scale search depth while NFSP cannot. Furthermore, we develop Asynchronous Neural Fictitious Self Play (ANFSP), which uses an asynchronous and parallel architecture to collect game experience. In experiments, we show that parallel actor-learners further accelerate and stabilize training.


Adaptive Caching via Deep Reinforcement Learning

Feb 27, 2019
Alireza Sadeghi, Gang Wang, Georgios B. Giannakis

Caching is envisioned to play a critical role in next-generation content delivery infrastructure, cellular networks, and Internet architectures. By smartly storing the most popular contents at storage-enabled network entities during off-peak demand instances, caching can benefit both the network infrastructure and end users during on-peak periods. In this context, distributing the limited storage capacity across network entities calls for decentralized caching schemes. Many practical caching systems involve a parent caching node connected to multiple leaf nodes to serve user file requests. To model the two-way interactive influence between caching decisions at the parent and leaf nodes, a reinforcement learning framework is put forth. To handle the large continuous state space, a scalable deep reinforcement learning approach is pursued. The novel approach relies on a deep Q-network to learn the Q-function, and thus the optimal caching policy, in an online fashion. Reinforcing the parent node with the ability to learn and adapt to unknown policies of leaf nodes, as well as to the spatio-temporal dynamic evolution of file requests, results in remarkable caching performance, as corroborated through numerical tests.

* 10 pages, 16 figures 

Evolutionary Multitasking for Semantic Web Service Composition

Feb 18, 2019
Chen Wang, Hui Ma, Gang Chen, Sven Hartmann

Web services are basic functions of a software system that support the concept of service-oriented architecture. They are often composed together to provide added value, which is known as web service composition. Due to the complexity of this problem, researchers often employ Evolutionary Computation (EC) techniques to efficiently construct composite services with near-optimized functional quality (i.e., Quality of Semantic Matchmaking), non-functional quality (i.e., Quality of Service), or both. With a significant increase in service composition requests, many composition requests have similar input and output requirements but may vary due to different preferences from different user segments. This problem is often treated as multi-objective service composition so as to cope with different preferences from different user segments simultaneously. Instead of taking a multi-objective approach, which gives rise to a solution selection challenge, we treat multiple similar service composition requests as jointly forming an evolutionary multi-tasking problem in this work. We propose an effective permutation-based evolutionary multi-tasking approach that can simultaneously generate a set of solutions, one for each service request. We also introduce a neighborhood structure over multiple tasks to allow newly evolved solutions to be evaluated on related tasks. Our proposed method performs better, at the cost of only a fraction of the time, than one state-of-the-art single-tasking EC-based method. We also find that the use of a proper neighborhood structure can enhance the effectiveness of our approach.

