Models, code, and papers for "Liang Wang":

Segmenting DNA sequence into 'words'

Jun 22, 2013
Wang Liang

This paper presents a novel method to segment/decode DNA sequences based on an n-gram statistical language model. First, by analyzing the genomes of 12 model species, we find that most DNA 'words' are 12 to 15 bp long. We then design an unsupervised, probability-based approach to segment the DNA sequences. A benchmark for the segmentation method is also proposed.
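
The segmentation step lends itself to a short illustration. Below is a minimal sketch (not the paper's implementation) of unsupervised maximum-probability segmentation over a DNA string, assuming a hypothetical unigram dictionary word_prob that maps candidate 'words' of up to 15 bp to probabilities; the dynamic program selects the segmentation with the highest total log-probability.

    import math

    def segment(seq, word_prob, max_len=15):
        # best[i] = (log-prob, split point) for the best segmentation of seq[:i]
        best = [(0.0, 0)] + [(-math.inf, 0)] * len(seq)
        for i in range(1, len(seq) + 1):
            for j in range(max(0, i - max_len), i):
                w = seq[j:i]
                p = word_prob.get(w)
                if p is None and len(w) == 1:
                    p = 1e-9  # fallback for unknown single bases (an assumption)
                if p is not None:
                    cand = best[j][0] + math.log(p)
                    if cand > best[i][0]:
                        best[i] = (cand, j)
        words, i = [], len(seq)
        while i > 0:  # backtrack through the stored split points
            j = best[i][1]
            words.append(seq[j:i])
            i = j
        return words[::-1]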

* 12 pages, 2 figures 

Adaptively Connected Neural Networks

Apr 07, 2019
Guangrun Wang, Keze Wang, Liang Lin

This paper presents a novel adaptively connected neural network (ACNet) to improve traditional convolutional neural networks (CNNs) in two aspects. First, ACNet employs a flexible way to switch between global and local inference when processing the internal feature representations, by adaptively determining the connection status among the feature nodes (e.g., pixels of the feature maps; in the computer vision domain a node refers to a pixel of a feature map, while in the graph domain a node denotes a graph node). We show that existing CNNs, the classical multilayer perceptron (MLP), and the recently proposed non-local network (NLN) are all special cases of ACNet. Second, ACNet is also capable of handling non-Euclidean data. Extensive experimental analyses on a variety of benchmarks (i.e., ImageNet-1k classification, COCO 2017 detection and segmentation, CUHK03 person re-identification, CIFAR analysis, and Cora document categorization) demonstrate that ACNet can not only achieve state-of-the-art performance but also overcome the limitations of the conventional MLP and CNN. The code is available at https://github.com/wanggrun/Adaptively-Connected-Neural-Networks.
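
As a rough illustration of the adaptive-connection idea (a sketch under our own assumptions, not the released code at the URL above), the block below mixes a per-node transform, a local convolution, and a global pooled transform with learned weights, so a layer can shift between local and global inference:

    import torch
    import torch.nn as nn

    class AdaptiveBlock(nn.Module):
        """Hypothetical sketch: learned mixing of self/local/global branches."""
        def __init__(self, channels):
            super().__init__()
            self.self_t = nn.Conv2d(channels, channels, 1)               # per-node transform
            self.local_t = nn.Conv2d(channels, channels, 3, padding=1)   # local (CNN-like) inference
            self.global_t = nn.Linear(channels, channels)                # global (MLP/NLN-like) inference
            self.alpha = nn.Parameter(torch.zeros(3))                    # learned connection weights

        def forward(self, x):
            n, c, h, w = x.shape
            g = self.global_t(x.mean(dim=(2, 3))).view(n, c, 1, 1).expand(n, c, h, w)
            m = torch.softmax(self.alpha, dim=0)
            return m[0] * self.self_t(x) + m[1] * self.local_t(x) + m[2] * g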

* Accepted by CVPR 2019 

BARNet: Bilinear Attention Network with Adaptive Receptive Field for Surgical Instrument Segmentation

Jan 20, 2020
Zhen-Liang Ni, Gui-Bin Bian, Guan-An Wang, Xiao-Hu Zhou, Zeng-Guang Hou, Xiao-Liang Xie, Zhen Li, Yu-Han Wang

Surgical instrument segmentation is extremely important for computer-assisted surgery. Unlike common object segmentation, it is more challenging due to the large illumination and scale variation caused by special surgical scenes. In this paper, we propose a novel bilinear attention network with an adaptive receptive field to address these two challenges. For the illumination variation, the bilinear attention module captures second-order statistics to encode global contexts and semantic dependencies between local pixels. With them, semantic features in challenging areas can be inferred from their neighbors, and the distinction between various semantics can be boosted. For the scale variation, our adaptive receptive field module aggregates multi-scale features and automatically fuses them with different weights. Specifically, it encodes the semantic relationship between channels to emphasize feature maps with appropriate scales, changing the receptive field of subsequent convolutions. The proposed network achieves the best performance, 97.47% mean IOU, on Cata7 and ranks first on EndoVis 2017, surpassing the second-place method by 10.10% IOU.
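
The bilinear attention idea can be sketched as follows (our own simplified reading, not the authors' code): pairwise second-order pixel affinities are computed from the feature map itself and used to propagate context between positions:

    import torch

    def bilinear_attention(x):
        # x: (N, C, H, W); affinities from second-order statistics of the features
        n, c, h, w = x.shape
        f = x.view(n, c, h * w)
        attn = torch.softmax(torch.bmm(f.transpose(1, 2), f) / c ** 0.5, dim=-1)  # (N, HW, HW)
        ctx = torch.bmm(f, attn.transpose(1, 2)).view(n, c, h, w)  # context-enhanced features
        return x + ctx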


RAUNet: Residual Attention U-Net for Semantic Segmentation of Cataract Surgical Instruments

Oct 02, 2019
Zhen-Liang Ni, Gui-Bin Bian, Xiao-Hu Zhou, Zeng-Guang Hou, Xiao-Liang Xie, Chen Wang, Yan-Jie Zhou, Rui-Qi Li, Zhen Li

Semantic segmentation of surgical instruments plays a crucial role in robot-assisted surgery. However, accurate segmentation of cataract surgical instruments remains a challenge due to specular reflection and class imbalance. In this paper, an attention-guided network is proposed to segment cataract surgical instruments. A new attention module is designed to learn discriminative features and address the specular reflection issue. It captures global context and encodes semantic dependencies to emphasize key semantic features, boosting the feature representation. This attention module has very few parameters, which helps to save memory; thus, it can be flexibly plugged into other networks. Besides, a hybrid loss that merges cross entropy and the logarithm of the Dice loss is introduced to train our network and address the class imbalance issue. A new dataset named Cata7 is constructed to evaluate our network. To the best of our knowledge, this is the first cataract surgical instrument dataset for semantic segmentation. On this dataset, RAUNet achieves state-of-the-art performance: 97.71% mean Dice and 95.62% mean IOU.
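
The hybrid loss admits a compact sketch. Assuming binary masks for simplicity (the exact weighting in the paper may differ), it combines cross entropy with the negative logarithm of the Dice coefficient:

    import torch
    import torch.nn.functional as F

    def hybrid_loss(logits, target, eps=1e-6, lam=0.5):
        # logits, target: (N, 1, H, W) floats; lam balances the two terms (assumed weighting)
        ce = F.binary_cross_entropy_with_logits(logits, target)
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum()
        dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
        return lam * ce - (1 - lam) * torch.log(dice)  # cross entropy + log-Dice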

* Accepted by the 26th International Conference on Neural Information Processing (ICONIP 2019) 

Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks

Apr 12, 2017
Hongsong Wang, Liang Wang

Recently, skeleton-based action recognition has gained popularity due to cost-effective depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches based on handcrafted features are too limited to represent the complexity of motion patterns. Recent methods that use Recurrent Neural Networks (RNNs) to handle raw skeletons focus only on the contextual dependency in the temporal domain and neglect the spatial configurations of articulated skeletons. In this paper, we propose a novel two-stream RNN architecture to model both temporal dynamics and spatial configurations for skeleton-based action recognition. We explore two different structures for the temporal stream: a stacked RNN and a hierarchical RNN, the latter designed according to human body kinematics. We also propose two effective methods to model the spatial structure by converting the spatial graph into a sequence of joints. To improve the generalization of our model, we further exploit 3D-transformation-based data augmentation, including rotation and scaling, to transform the 3D coordinates of skeletons during training. Experiments on 3D action recognition benchmark datasets show that our method brings considerable improvement for a variety of actions, i.e., generic actions, interaction activities, and gestures.
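
The rotation- and scaling-based augmentation can be illustrated in a few lines of NumPy (a sketch with assumed angle and scale ranges, not the authors' exact settings):

    import numpy as np

    def augment_skeleton(joints, max_deg=30.0, scale_range=(0.9, 1.1)):
        # joints: (T, J, 3) 3D coordinates over T frames and J joints
        a = np.deg2rad(np.random.uniform(-max_deg, max_deg))
        rot_y = np.array([[np.cos(a), 0, np.sin(a)],
                          [0, 1, 0],
                          [-np.sin(a), 0, np.cos(a)]])  # rotation about the vertical axis
        s = np.random.uniform(*scale_range)              # random global scaling
        return s * joints @ rot_y.T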

* Accepted to the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017 

Uncovering Dominant Social Class in Neighborhoods through Building Footprints: A Case Study of Residential Zones in Massachusetts using Computer Vision

Jun 12, 2019
Qianhui Liang, Zhoutong Wang

In urban theory, urban form is related to social and economic status. This paper explores uncovering zip-code-level income through urban form by analyzing figure-ground maps, a simple, prevalent, and precise representation of urban form in the field of urban studies. Deep learning in computer vision enables such representation maps to be studied at a large scale. We propose to train a DCNN model to identify and uncover the internal bridge between social class and urban form. Further, using hand-crafted informative visual features related to urban form properties (building size, building density, etc.), we apply a random forest classifier to interpret how morphological properties are related to social class.
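
The interpretability step maps onto a standard random forest workflow; a minimal sketch with hypothetical feature names and stand-in data follows (scikit-learn, not the authors' pipeline):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.random((100, 2))          # hand-crafted features: [building_size, building_density]
    y = (X[:, 1] > 0.5).astype(int)   # stand-in class labels, for illustration only
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    for name, imp in zip(["building_size", "building_density"], clf.feature_importances_):
        print(name, round(imp, 3))    # which morphological property drives the prediction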

* Publishing as conference proceeding of 16th International Conference on Computers in Urban Planning and Urban Management 

Scalable Compression of Deep Neural Networks

Aug 26, 2016
Xing Wang, Jie Liang

Deep neural networks generally involve layers with millions of parameters, making them difficult to deploy and update on devices with limited resources, such as mobile phones and other smart embedded systems. In this paper, we propose a scalable representation of the network parameters, so that different applications can select the most suitable bit rate for the network based on their own storage constraints. Moreover, when a device needs to upgrade to a higher-rate network, the existing low-rate network can be reused, and only some incremental data needs to be downloaded. We first hierarchically quantize the weights of a pre-trained deep neural network to enforce weight sharing. Next, we adaptively select the bits assigned to each layer given the total bit budget. After that, we retrain the network to fine-tune the quantized centroids. Experimental results show that our method achieves scalable compression with graceful degradation in performance.
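
A toy version of the hierarchical quantization step might look like this (a sketch of codebook refinement under our own assumptions, not the paper's exact scheme): a coarse k-means codebook serves low bit rates, and each centroid's members are re-clustered to supply the incremental high-rate detail:

    import numpy as np
    from sklearn.cluster import KMeans

    def hierarchical_codebooks(weights, coarse_k=4, fine_k=4):
        w = weights.reshape(-1, 1)
        coarse = KMeans(n_clusters=coarse_k, n_init=10, random_state=0).fit(w)
        fine = {}
        for c in range(coarse_k):  # refine each coarse cluster for the higher bit rate
            members = w[coarse.labels_ == c]
            k = min(fine_k, len(members))
            fine[c] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(members)
        return coarse, fine  # low-rate codebook + incremental refinements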

* 5 pages, 4 figures, ACM Multimedia 2016 

Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

Feb 03, 2015
Xiaolong Wang, Liang Lin

This paper studies a novel discriminative part-based model to represent and recognize object shapes with an "And-Or graph". The model consists of three layers: leaf-nodes with collaborative edges for localizing local parts, or-nodes specifying the switching among leaf-nodes, and a root-node encoding the global verification. A discriminative learning algorithm, extended from the CCCP [23], is proposed to train the model in a dynamical manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is automatically determined while optimizing the multi-layer parameters during iteration. The advantages of our method are two-fold. (i) The And-Or graph model enables us to handle large intra-class variance and background clutter well for object shape detection in images. (ii) The proposed learning algorithm is able to obtain the And-Or graph representation without requiring elaborate supervision or initialization. We validate the proposed method on several challenging databases (e.g., INRIA-Horse, ETHZ-Shape, and UIUC-People), where it outperforms state-of-the-art approaches.
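
The layered scoring can be summarized schematically (our own reading of the model, with hypothetical score functions): each or-node switches to its best-scoring leaf, and the root adds global verification:

    def and_or_score(x, root_score, or_nodes):
        # or_nodes: list of or-nodes, each a list of leaf scoring functions
        total = root_score(x)  # global verification at the root
        for leaves in or_nodes:
            total += max(leaf(x) for leaf in leaves)  # or-node switches to its best leaf
        return total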

* Advances in Neural Information Processing Systems (pp. 242-250), 2014 
* 9 pages, 4 figures, NIPS 2012 

Biconvex Landscape In SDP-Related Learning

Nov 03, 2018
En-Liang Hu, Bo Wang

Many machine learning problems can be reduced to learning a low-rank positive semidefinite matrix (denoted $Z$), which leads to a semidefinite program (SDP). Existing SDP solvers are often too expensive for large-scale learning. To avoid solving the SDP directly, some works convert it into a nonconvex program by factorizing $Z$ as $XX^\top$. However, this brings higher-order nonlinearity, resulting in a scarcity of structure in the subsequent optimization. In this paper, we propose a novel surrogate for SDP-related learning in which the structure of the subproblems is exploited. More specifically, we surrogate the unconstrained SDP with a biconvex problem by factorizing $Z$ as $XY^\top$ and using a Courant penalty to penalize the difference between $X$ and $Y$, so that the resulting subproblems are convex. Furthermore, we provide a theoretical bound on the associated penalty parameter under the assumption that the objective function is Lipschitz-smooth, such that the proposed surrogate solves the original SDP whenever the penalty parameter exceeds this bound. Experiments on two SDP-related machine learning applications demonstrate that the proposed algorithm is as accurate as the state-of-the-art but faster for large-scale learning.
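
For concreteness, here is a toy alternating scheme on a least-squares instance (a sketch under our own choice of objective, not the paper's solver): with $Z = XY^\top$ and the Courant penalty $\frac{\rho}{2}\|X - Y\|_F^2$, each subproblem in $X$ or $Y$ is a convex least-squares solve:

    import numpy as np

    def biconvex_fit(M, r, rho=1.0, iters=50):
        # minimize ||X @ Y.T - M||_F^2 + (rho / 2) * ||X - Y||_F^2, alternating over X and Y
        n = M.shape[0]
        rng = np.random.default_rng(0)
        X = rng.standard_normal((n, r))
        Y = X.copy()
        for _ in range(iters):
            # X-step: X (Y.T Y + rho/2 I) = M Y + rho/2 Y  -- a convex least-squares solve
            A = Y.T @ Y + (rho / 2) * np.eye(r)
            X = np.linalg.solve(A, (M @ Y + (rho / 2) * Y).T).T
            # Y-step: symmetric update with the roles of X and Y swapped
            B = X.T @ X + (rho / 2) * np.eye(r)
            Y = np.linalg.solve(B, (M.T @ X + (rho / 2) * X).T).T
        return X, Y  # X ~ Y at convergence, so Z = X @ Y.T is (near) positive semidefinite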


Query Adaptive Late Fusion for Image Retrieval

Oct 31, 2018
Zhongdao Wang, Liang Zheng, Shengjin Wang

Feature fusion is a commonly used strategy in image retrieval tasks, which aggregates the matching responses of multiple visual features. Feasible sets of features can be either different descriptors (SIFT, HSV) for an entire image or the same descriptor for different local parts (face, body). Ideally, the to-be-fused heterogeneous features are assumed to be discriminative and complementary to each other. However, the effectiveness of different features varies dramatically across queries. That is to say, for some queries a feature may be neither discriminative nor complementary to the existing ones, while for other queries it suffices. As a result, it is important to estimate the effectiveness of features in a query-adaptive manner. To this end, this article proposes a new late fusion scheme at the score level. We base our method on the observation that the sorted score curves contain patterns that describe their effectiveness: for example, an "L"-shaped curve indicates that the feature is discriminative, while a gradually descending curve suggests a bad feature. As such, this paper introduces a query-adaptive late fusion pipeline. In its hand-crafted version, it can serve as an unsupervised approach for tasks like particular object retrieval; in its learned version, built on a trainable neural module, it can also be applied to supervised tasks like person recognition and pedestrian retrieval. Extensive experiments are conducted on two object retrieval datasets and one person recognition dataset. We show that our method is able to highlight the good features and suppress the bad ones, is resilient to distractor features, and achieves very competitive retrieval accuracy compared with the state of the art. On an additional person re-identification dataset, the application scope and limitations of the proposed method are studied.
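
The curve-shape heuristic is easy to sketch (a hand-crafted variant under our own normalization, not the paper's exact formula): a feature whose sorted, normalized score curve encloses a small area is "L"-shaped and receives a high fusion weight:

    import numpy as np

    def query_adaptive_fusion(score_lists):
        # score_lists: one 1-D array of matching scores per feature, for a single query
        fused = None
        for s in score_lists:
            curve = np.sort(s)[::-1]
            curve = curve / (curve[0] + 1e-12)     # normalize by the top score
            weight = 1.0 / (curve.mean() + 1e-12)  # small area => "L" shape => high weight
            fused = weight * s if fused is None else fused + weight * s
        return fused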


Irregular Convolutional Neural Networks

Jun 24, 2017
Jiabin Ma, Wei Wang, Liang Wang

Convolutional kernels are basic and vital components of deep Convolutional Neural Networks (CNNs). In this paper, we equip convolutional kernels with shape attributes to obtain deep Irregular Convolutional Neural Networks (ICNNs). Compared to a traditional CNN applying regular convolutional kernels such as ${3\times3}$, our approach trains irregular kernel shapes to better fit the geometric variations of input features. In other words, shapes are learnable parameters in addition to weights. The kernel shapes and weights are learned simultaneously during end-to-end training with the standard back-propagation algorithm. Experiments on semantic segmentation validate the effectiveness of the proposed ICNN.
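
A 1D toy of learnable kernel shape (our own sketch, not the ICNN code) treats the tap positions as parameters and keeps them differentiable via linear interpolation:

    import torch
    import torch.nn as nn

    class IrregularConv1d(nn.Module):
        """Sketch: kernel tap positions are learned jointly with the weights."""
        def __init__(self, k=3, span=4.0):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(k))
            self.pos = nn.Parameter(torch.linspace(-span / 2, span / 2, k))  # learnable shape

        def forward(self, x):  # x: (L,) 1D signal
            L = x.shape[0]
            idx = torch.arange(L, dtype=x.dtype).unsqueeze(1) + self.pos  # (L, k) sample points
            lo = idx.floor().clamp(0, L - 1)
            hi = (lo + 1).clamp(0, L - 1)
            frac = idx - lo
            samples = (1 - frac) * x[lo.long()] + frac * x[hi.long()]  # linear interpolation
            return samples @ self.weight  # gradients flow to both weight and pos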

* 7 pages, 5 figures, 3 tables 

Instance-aware Image and Sentence Matching with Selective Multimodal LSTM

Nov 17, 2016
Yan Huang, Wei Wang, Liang Wang

Effective image and sentence matching depends on measuring their global visual-semantic similarity well. Based on the observation that such a global similarity arises from a complex aggregation of multiple local similarities between pairwise instances of the image (objects) and the sentence (words), we propose a selective multimodal Long Short-Term Memory network (sm-LSTM) for instance-aware image and sentence matching. The sm-LSTM includes a multimodal context-modulated attention scheme at each timestep that can selectively attend to a pair of image and sentence instances by predicting pairwise instance-aware saliency maps for the image and sentence. For the selected pairwise instances, their representations are obtained from the predicted saliency maps and then compared to measure their local similarity. By similarly measuring multiple local similarities over a few timesteps, the sm-LSTM sequentially aggregates them with hidden states to obtain a final matching score as the desired global similarity. Extensive experiments show that our model can match images and sentences with complex content well, achieving state-of-the-art results on two public benchmark datasets.
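
One attention step can be sketched as follows (a simplified, hypothetical version of the context-modulated scheme): the hidden state selects one image instance and one sentence instance, and their cosine similarity is the local similarity for that timestep:

    import torch
    import torch.nn.functional as F

    def local_similarity_step(img, sent, h):
        # img: (R, D) region features; sent: (T, D) word features; h: (D,) hidden state
        a_img = F.softmax(img @ h, dim=0)    # saliency over image instances
        a_sent = F.softmax(sent @ h, dim=0)  # saliency over sentence instances
        v = a_img @ img                      # attended image representation (D,)
        u = a_sent @ sent                    # attended sentence representation (D,)
        return F.cosine_similarity(v, u, dim=0)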


CatGAN: Category-aware Generative Adversarial Networks with Hierarchical Evolutionary Learning for Category Text Generation

Nov 20, 2019
Zhiyue Liu, Jiahai Wang, Zhiwei Liang

Generating multiple categories of text is a challenging task that draws increasing attention. Since generative adversarial nets (GANs) have shown competitive results on general text generation, previous works have extended them to category text generation. However, complicated model structures and learning strategies limit their performance and exacerbate training instability. This paper proposes a category-aware GAN (CatGAN), which consists of an efficient category-aware model for category text generation and a hierarchical evolutionary learning algorithm for training the model. The category-aware model directly measures the gap between real and generated samples on each category; reducing this gap guides the model to generate high-quality category samples. The Gumbel-Softmax relaxation further frees our model from complicated learning strategies for updating CatGAN on discrete data. Moreover, focusing only on sample quality normally leads to mode collapse, so a hierarchical evolutionary learning algorithm is introduced to stabilize the training procedure and obtain a trade-off between quality and diversity while training CatGAN. Experimental results demonstrate that CatGAN outperforms most existing state-of-the-art methods.
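
The Gumbel-Softmax trick itself is standard; a minimal sketch of how it lets gradients flow through discrete token choices (using PyTorch's built-in, not CatGAN's full generator):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(4, 10, requires_grad=True)        # generator logits over a 10-token vocab
    tokens = F.gumbel_softmax(logits, tau=1.0, hard=True)  # one-hot samples with a differentiable surrogate
    loss = tokens.sum()                                    # stand-in for the discriminator signal
    loss.backward()                                        # gradients reach the discrete choices
    print(logits.grad.shape)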

* 15 pages, 4 figures. Accepted by AAAI 2020 

An Efficient Method of Detection and Recognition in Remote Sensing Image Based on multi-angle Region of Interests

Jul 22, 2019
Hongyu Wang, Wei Liang, Guangcun Shan

Presently, deep learning technology is widely used in the field of image recognition, but it mainly targets the recognition and detection of ordinary pictures and common scenes. As special images, remote sensing images have different shooting angles and methods compared with ordinary ones, which makes them play an irreplaceable role in some areas. In this paper, a new model for object detection and recognition in remote sensing images is proposed, based on a deep convolutional neural network that provides multi-level information about images, combined with an RPN (Region Proposal Network) that generates multi-angle ROIs (Regions of Interest). In the experiments, it achieves better results than traditional approaches, which demonstrates that the proposed model has great potential for application in remote sensing image recognition.
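
Multi-angle ROI generation can be illustrated by enumerating rotated anchors at each feature-map location (a sketch with assumed scales, ratios, and angles, not the paper's configuration):

    import numpy as np

    def rotated_anchors(cx, cy, scales=(32, 64), ratios=(0.5, 1.0, 2.0),
                        angles=(0, 45, 90, 135)):
        # returns (cx, cy, w, h, theta) proposals at one feature-map location
        anchors = []
        for s in scales:
            for r in ratios:
                w, h = s * np.sqrt(r), s / np.sqrt(r)  # preserve area s^2, vary aspect ratio
                for a in angles:
                    anchors.append((cx, cy, w, h, a))
        return anchors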

* 4 pages, 3 figures 

Richly Activated Graph Convolutional Network for Action Recognition with Incomplete Skeletons

May 17, 2019
Yi-Fan Song, Zhang Zhang, Liang Wang

Current methods for skeleton-based human action recognition usually work with completely observed skeletons. However, in real scenarios it is common to capture incomplete and noisy skeletons, which deteriorates the performance of traditional models. To enhance the robustness of action recognition models to incomplete skeletons, we propose a multi-stream graph convolutional network (GCN) for exploring sufficient discriminative features distributed over all skeleton joints. Here, each stream of the network is responsible only for learning features from the currently unactivated joints, which are identified by the class activation maps (CAM) obtained from the preceding streams, so that the proposed method activates substantially more joints than traditional methods. Thus, the proposed method is termed richly activated GCN (RA-GCN), where the richly discovered features improve the robustness of the model. Compared with state-of-the-art methods, RA-GCN achieves comparable performance on the NTU RGB+D dataset. Moreover, on a synthetic occlusion dataset, RA-GCN significantly alleviates the performance deterioration.
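
The CAM-based masking admits a short sketch (our reading, with assumed tensor shapes): the class activation map over joints from one stream marks activated joints, and the next stream sees only the remainder:

    import torch

    def unactivated_joint_mask(feat, fc_weight, label, thresh=0.5):
        # feat: (C, T, V) per-joint features; fc_weight: (num_classes, C); label: class index
        cam = torch.einsum('c,ctv->tv', fc_weight[label], feat)  # activation per frame/joint
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-6)
        return (cam.mean(dim=0) < thresh).float()  # (V,) 1 = joint still unactivated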

* Accepted by ICIP 2019, 5 pages, 3 figures, 3 tables 

A Dynamic Evolutionary Framework for Timeline Generation based on Distributed Representations

May 15, 2019
Dongyun Liang, Guohua Wang, Jing Nie

Given a collection of timestamped web documents related to an evolving topic, timeline summarization (TS) highlights the topic's most important events in the form of relevant summaries that represent its development over time. Most previous work focuses on fully-observable ranking models and depends on hand-designed features or complex mechanisms that may not generalize well. We present a novel dynamic framework for evolutionary timeline generation leveraging distributed representations, which dynamically finds the most likely sequence of evolutionary summaries in the timeline, called the Viterbi timeline, and reduces the impact of events that are irrelevant to or repeated within the topic. The assumptions of coherence and a global view run through our model: we exploit adjacent relevance to constrain timeline coherence and ensure the events evolve within the same topic from a global view. Experimental results demonstrate that our framework is feasible for extracting summaries for timeline generation, outperforms various competitive baselines, and achieves state-of-the-art performance as an unsupervised approach.
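
The Viterbi search over candidate summaries can be written down directly (a sketch with assumed score matrices): relevance scores play the role of emissions and adjacent coherence the role of transitions:

    import numpy as np

    def viterbi_timeline(rel, coh):
        # rel: (T, K) candidate relevance per step; coh: (T-1, K, K) adjacent coherence
        T, K = rel.shape
        score, back = rel[0].copy(), np.zeros((T, K), dtype=int)
        for t in range(1, T):
            trans = score[:, None] + coh[t - 1]  # (K, K): best predecessor per candidate
            back[t] = trans.argmax(axis=0)
            score = trans.max(axis=0) + rel[t]
        path = [int(score.argmax())]
        for t in range(T - 1, 0, -1):            # backtrack the Viterbi timeline
            path.append(int(back[t, path[-1]]))
        return path[::-1]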

* 4 pages, next version will be submitted to a conference 

Robust Encoder-Decoder Learning Framework towards Offline Handwritten Mathematical Expression Recognition Based on Multi-Scale Deep Neural Network

Feb 18, 2019
Guangcun Shan, Hongyu Wang, Wei Liang

Offline handwritten mathematical expression recognition is a challenging task because it mainly involves two problems: on one hand, how to correctly recognize different mathematical symbols; on the other hand, how to correctly recognize the two-dimensional structure of mathematical expressions. Inspired by recent work in deep learning, a new neural network model that combines a multi-scale convolutional neural network (CNN) with an attention-based recurrent neural network (RNN) is proposed to translate two-dimensional handwritten mathematical expressions into one-dimensional LaTeX sequences. The proposed model achieves a WER of 25.715% and an ExpRate of 28.216%.
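
The attention-based decoding step can be sketched in a few lines (hypothetical shapes and projection, not the paper's model): at each step the RNN state attends over the multi-scale CNN features to build a context vector for emitting the next LaTeX token:

    import torch

    def attend_step(enc, h, W_a):
        # enc: (L, D) flattened CNN features; h: (D,) decoder state; W_a: (D, D) projection
        energy = enc @ (W_a @ h)          # attention energies over feature locations
        alpha = torch.softmax(energy, dim=0)
        return alpha @ enc                # context vector for the next LaTeX token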

* 11 pages, 16 figures 
