Models, code, and papers for "Teng Wang":

The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge

Jun 16, 2017
He-Da Wang, Teng Zhang, Ji Wu

This article describes the final solution of team monkeytyping, who finished in second place in the YouTube-8M video understanding challenge. The dataset used in this challenge is a large-scale benchmark for multi-label video classification. We extend the work in [1] and propose several improvements for frame sequence modeling. We propose a network structure called Chaining that can better capture the interactions between labels. Also, we report our approaches in dealing with multi-scale information and attention pooling. In addition, We find that using the output of model ensemble as a side target in training can boost single model performance. We report our experiments in bagging, boosting, cascade, and stacking, and propose a stacking algorithm called attention weighted stacking. Our final submission is an ensemble that consists of 74 sub models, all of which are listed in the appendix.

* Submitted to the CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding 

  Click for Model/Code and Paper
A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities

Dec 25, 2018
Teng Wang, Chao Wang, Xuehai Zhou, Huaping Chen

With the rapid development of in-depth learning, neural network and deep learning algorithms have been widely used in various fields, e.g., image, video and voice processing. However, the neural network model is getting larger and larger, which is expressed in the calculation of model parameters. Although a wealth of existing efforts on GPU platforms currently used by researchers for improving computing performance, dedicated hardware solutions are essential and emerging to provide advantages over pure software solutions. In this paper, we systematically investigate the neural network accelerator based on FPGA. Specifically, we respectively review the accelerators designed for specific problems, specific algorithms, algorithm features, and general templates. We also compared the design and implementation of the accelerator based on FPGA under different devices and network models and compared it with the versions of CPU and GPU. Finally, we present to discuss the advantages and disadvantages of accelerators on FPGA platforms and to further explore the opportunities for future research.


  Click for Model/Code and Paper
Soft-Autoencoder and Its Wavelet Shrinkage Interpretation

Dec 31, 2018
Fenglei Fan, Mengzhou Li, Yueyang Teng, Ge Wang

Deep learning is a main focus of artificial intelligence and has greatly impacted other fields. However, deep learning is often criticized for its lack of interpretation. As a successful unsupervised model in deep learning, various autoencoders, especially convolutional autoencoders, are very popular and important. Since these autoencoders need improvements and insights, in this paper we shed light on the nonlinearity of a deep convolutional autoencoder in perspective of perfect signal recovery. In particular, we propose a new type of convolutional autoencoders, termed as Soft-Autoencoder (Soft-AE), in which the activations of encoding layers are implemented with adaptable soft-thresholding units while decoding layers are realized with linear units. Consequently, Soft-AE can be naturally interpreted as a learned cascaded wavelet shrinkage system. Our denoising numerical experiments on CIFAR-10, BSD-300 and Mayo Clinical Challenge Dataset demonstrate that Soft-AE gives a competitive performance relative to its counterparts.


  Click for Model/Code and Paper
MsCGAN: Multi-scale Conditional Generative Adversarial Networks for Person Image Generation

Oct 19, 2018
Wei Tang, Teng Li, Fudong Nian, Meng Wang

To synthesize high quality person images with arbitrary poses is challenging. In this paper, we propose a novel Multi-scale Conditional Generative Adversarial Networks (MsCGAN), aiming to convert the input conditional person image to a synthetic image of any given target pose, whose appearance and the texture are consistent with the input image. MsCGAN is a multi-scale adversarial network consisting of two generators and two discriminators. One generator transforms the conditional person image into a coarse image of the target pose globally, and the other is to enhance the detailed quality of the synthetic person image through a local reinforcement network. The outputs of the two generators are then merged into a synthetic, discriminant and high-resolution image. On the other hand, the synthetic image is down-sampled to multiple resolutions as the input to multi-scale discriminator networks. The proposed multi-scale generators and discriminators handling different levels of visual features can benefit to synthesizing high resolution person images with realistic appearance and texture. Experiments are conducted on the Market-1501 and DeepFashion datasets to evaluate the proposed model, and both qualitative and quantitative results demonstrate superior performance of the proposed MsCGAN.


  Click for Model/Code and Paper
Model-Free Control for Distributed Stream Data Processing using Deep Reinforcement Learning

Mar 02, 2018
Teng Li, Zhiyuan Xu, Jian Tang, Yanzhi Wang

In this paper, we focus on general-purpose Distributed Stream Data Processing Systems (DSDPSs), which deal with processing of unbounded streams of continuous data at scale distributedly in real or near-real time. A fundamental problem in a DSDPS is the scheduling problem with the objective of minimizing average end-to-end tuple processing time. A widely-used solution is to distribute workload evenly over machines in the cluster in a round-robin manner, which is obviously not efficient due to lack of consideration for communication delay. Model-based approaches do not work well either due to the high complexity of the system environment. We aim to develop a novel model-free approach that can learn to well control a DSDPS from its experience rather than accurate and mathematically solvable system models, just as a human learns a skill (such as cooking, driving, swimming, etc). Specifically, we, for the first time, propose to leverage emerging Deep Reinforcement Learning (DRL) for enabling model-free control in DSDPSs; and present design, implementation and evaluation of a novel and highly effective DRL-based control framework, which minimizes average end-to-end tuple processing time by jointly learning the system environment via collecting very limited runtime statistics data and making decisions under the guidance of powerful Deep Neural Networks. To validate and evaluate the proposed framework, we implemented it based on a widely-used DSDPS, Apache Storm, and tested it with three representative applications. Extensive experimental results show 1) Compared to Storm's default scheduler and the state-of-the-art model-based method, the proposed framework reduces average tuple processing by 33.5% and 14.0% respectively on average. 2) The proposed framework can quickly reach a good scheduling solution during online learning, which justifies its practicability for online control in DSDPSs.

* 14 pages, this paper has been accepted by VLDB 2018 

  Click for Model/Code and Paper
Hybrid Linear Modeling via Local Best-fit Flats

May 01, 2012
Teng Zhang, Arthur Szlam, Yi Wang, Gilad Lerman

We present a simple and fast geometric method for modeling data by a union of affine subspaces. The method begins by forming a collection of local best-fit affine subspaces, i.e., subspaces approximating the data in local neighborhoods. The correct sizes of the local neighborhoods are determined automatically by the Jones' $\beta_2$ numbers (we prove under certain geometric conditions that our method finds the optimal local neighborhoods). The collection of subspaces is further processed by a greedy selection procedure or a spectral method to generate the final model. We discuss applications to tracking-based motion segmentation and clustering of faces under different illuminating conditions. We give extensive experimental evidence demonstrating the state of the art accuracy and speed of the suggested algorithms on these problems and also on synthetic hybrid linear data as well as the MNIST handwritten digits data; and we demonstrate how to use our algorithms for fast determination of the number of affine subspaces.

* International Journal of Computer Vision Volume 100, Issue 3 (2012), Page 217-240 
* This version adds some clarifications and numerical experiments as well as strengthens the previous theorem. For face experiments, we use here the Extended Yale Face Database B (cropped faces unlike previous version). This database points to a failure mode of our algorithms, but we suggest and successfully test a workaround 

  Click for Model/Code and Paper
Randomized hybrid linear modeling by local best-fit flats

May 05, 2010
Teng Zhang, Arthur Szlam, Yi Wang, Gilad Lerman

The hybrid linear modeling problem is to identify a set of d-dimensional affine sets in a D-dimensional Euclidean space. It arises, for example, in object tracking and structure from motion. The hybrid linear model can be considered as the second simplest (behind linear) manifold model of data. In this paper we will present a very simple geometric method for hybrid linear modeling based on selecting a set of local best fit flats that minimize a global l1 error measure. The size of the local neighborhoods is determined automatically by the Jones' l2 beta numbers; it is proven under certain geometric conditions that good local neighborhoods exist and are found by our method. We also demonstrate how to use this algorithm for fast determination of the number of affine subspaces. We give extensive experimental evidence demonstrating the state of the art accuracy and speed of the algorithm on synthetic and real hybrid linear data.

* 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (13-18 June 2010), pp. 1927-1934 
* To appear in the proceedings of CVPR 2010 

  Click for Model/Code and Paper
CT-image Super Resolution Using 3D Convolutional Neural Network

Jun 24, 2018
Yukai Wang, Qizhi Teng, Xiaohai He, Junxi Feng, Tingrong Zhang

Computed Tomography (CT) imaging technique is widely used in geological exploration, medical diagnosis and other fields. In practice, however, the resolution of CT image is usually limited by scanning devices and great expense. Super resolution (SR) methods based on deep learning have achieved surprising performance in two-dimensional (2D) images. Unfortunately, there are few effective SR algorithms for three-dimensional (3D) images. In this paper, we proposed a novel network named as three-dimensional super resolution convolutional neural network (3DSRCNN) to realize voxel super resolution for CT images. To solve the practical problems in training process such as slow convergence of network training, insufficient memory, etc., we utilized adjustable learning rate, residual-learning, gradient clipping, momentum stochastic gradient descent (SGD) strategies to optimize training procedure. In addition, we have explored the empirical guidelines to set appropriate number of layers of network and how to use residual learning strategy. Additionally, previous learning-based algorithms need to separately train for different scale factors for reconstruction, yet our single model can complete the multi-scale SR. At last, our method has better performance in terms of PSNR, SSIM and efficiency compared with conventional methods.


  Click for Model/Code and Paper
Sample-Efficient Neural Architecture Search by Learning Action Space

Jun 17, 2019
Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian

Neural Architecture Search (NAS) has emerged as a promising technique for automatic neural network design. However, existing NAS approaches often utilize manually designed action space, which is not directly related to the performance metric to be optimized (e.g., accuracy). As a result, using manually designed action space to perform NAS often leads to sample-inefficient explorations of architectures and thus can be sub-optimal. In order to improve sample efficiency, this paper proposes Latent Action Neural Architecture Search (LaNAS) that learns the action space to recursively partition the architecture search space into regions, each with concentrated performance metrics (\emph{i.e.}, low variance). During the search phase, as different architecture search action sequences lead to regions of different performance, the search efficiency can be significantly improved by biasing towards the regions with good performance. On the largest NAS dataset NasBench-101, our experimental results demonstrated that LaNAS is 22x, 14.6x and 12.4x more sample-efficient than random search, regularized evolution, and Monte Carlo Tree Search (MCTS) respectively. When applied to the open domain, LaNAS finds an architecture that achieves SoTA 98.0% accuracy on CIFAR-10 and 75.0% top1 accuracy on ImageNet (mobile setting), after exploring only 6,000 architectures.


  Click for Model/Code and Paper
Interactive Binary Image Segmentation with Edge Preservation

Sep 10, 2018
Jianfeng Zhang, Liezhuo Zhang, Yuankai Teng, Xiaoping Zhang, Song Wang, Lili Ju

Binary image segmentation plays an important role in computer vision and has been widely used in many applications such as image and video editing, object extraction, and photo composition. In this paper, we propose a novel interactive binary image segmentation method based on the Markov Random Field (MRF) framework and the fast bilateral solver (FBS) technique. Specifically, we employ the geodesic distance component to build the unary term. To ensure both computation efficiency and effective responsiveness for interactive segmentation, superpixels are used in computing geodesic distances instead of pixels. Furthermore, we take a bilateral affinity approach for the pairwise term in order to preserve edge information and denoise. Through the alternating direction strategy, the MRF energy minimization problem is divided into two subproblems, which then can be easily solved by steepest gradient descent (SGD) and FBS respectively. Experimental results on the VGG interactive image segmentation dataset show that the proposed algorithm outperforms several state-of-the-art ones, and in particular, it can achieve satisfactory edge-smooth segmentation results even when the foreground and background color appearances are quite indistinctive.


  Click for Model/Code and Paper
Semi-supervised Feature Learning For Improving Writer Identification

Oct 06, 2018
Shiming Chen, Yisong Wang, Chin-Teng Lin, Weiping Ding, Zehong Cao

Data augmentation is usually used by supervised learning approaches for offline writer identification, but such approaches require extra training data and potentially lead to overfitting errors. In this study, a semi-supervised feature learning pipeline was proposed to improve the performance of writer identification by training with extra unlabeled data and the original labeled data simultaneously. Specifically, we proposed a weighted label smoothing regularization (WLSR) method for data augmentation, which assigned the weighted uniform label distribution to the extra unlabeled data. The WLSR method could regularize the convolutional neural network (CNN) baseline to allow more discriminative features to be learned to represent the properties of different writing styles. The experimental results on well-known benchmark datasets (ICDAR2013 and CVL) showed that our proposed semi-supervised feature learning approach could significantly improve the baseline measurement and perform competitively with existing writer identification approaches. Our findings provide new insights into offline write identification.

* This manuscript is submitting to Information Science 

  Click for Model/Code and Paper
Coverage Sampling Planner for UAV-enabled Environmental Exploration and Field Mapping

Jul 12, 2019
Teng Li, Chaoqun Wang, Max Q. -H. Meng, Clarence W. de Silva

Unmanned Aerial Vehicles (UAVs) have been implemented for environmental monitoring by using their capabilities of mobile sensing, autonomous navigation, and remote operation. However, in real-world applications, the limitations of on-board resources (e.g., power supply) of UAVs will constrain the coverage of the monitored area and the number of the acquired samples, which will hinder the performance of field estimation and mapping. Therefore, the issue of constrained resources calls for an efficient sampling planner to schedule UAV-based sensing tasks in environmental monitoring. This paper presents a mission planner of coverage sampling and path planning for a UAV-enabled mobile sensor to effectively explore and map an unknown environment that is modeled as a random field. The proposed planner can generate a coverage path with an optimal coverage density for exploratory sampling, and the associated energy cost is subjected to a power supply constraint. The performance of the developed framework is evaluated and compared with the existing state-of-the-art algorithms, using a real-world dataset that is collected from an environmental monitoring program as well as physical field experiments. The experimental results illustrate the reliability and accuracy of the presented coverage sampling planner in a prior survey for environmental exploration and field mapping.


  Click for Model/Code and Paper
Crowded Scene Analysis: A Survey

Feb 06, 2015
Teng Li, Huan Chang, Meng Wang, Bingbing Ni, Richang Hong, Shuicheng Yan

Automated scene analysis has been a topic of great interest in computer vision and cognitive science. Recently, with the growth of crowd phenomena in the real world, crowded scene analysis has attracted much attention. However, the visual occlusions and ambiguities in crowded scenes, as well as the complex behaviors and scene semantics, make the analysis a challenging task. In the past few years, an increasing number of works on crowded scene analysis have been reported, covering different aspects including crowd motion pattern learning, crowd behavior and activity analysis, and anomaly detection in crowds. This paper surveys the state-of-the-art techniques on this topic. We first provide the background knowledge and the available features related to crowded scenes. Then, existing models, popular algorithms, evaluation protocols, as well as system performance are provided corresponding to different aspects of crowded scene analysis. We also outline the available datasets for performance evaluation. Finally, some research problems and promising future directions are presented with discussions.

* 20 pages in IEEE Transactions on Circuits and Systems for Video Technology, 2015 

  Click for Model/Code and Paper
SRM: An Efficient Framework for Autonomous Robotic Exploration in Indoor Environments

Dec 24, 2018
Chaoqun Wang, Delong Zhu, Teng Li, Max Q. -H. Meng, Clarence De. Silva

In this paper, we propose an integrated framework for the autonomous robotic exploration in indoor environments. Specially, we present a hybrid map, named Semantic Road Map (SRM), to represent the topological structure of the explored environment and facilitate decision-making in the exploration. The SRM is built incrementally along with the exploration process. It is a graph structure with collision-free nodes and edges that are generated within the sensor coverage. Moreover, each node has a semantic label and the expected information gain at that location. Based on the concise SRM, we present a novel and effective decision-making model to determine the next-best-target (NBT) during the exploration. The model concerns the semantic information, the information gain, and the path cost to the target location. We use the nodes of SRM to represent the candidate targets, which enables the target evaluation to be performed directly on the SRM. With the SRM, both the information gain of a node and the path cost to the node can be obtained efficiently. Besides, we adopt the cross-entropy method to optimize the path to make it more informative. We conduct experimental studies in both simulated and real-world environments, which demonstrate the effectiveness of the proposed method.


  Click for Model/Code and Paper
Mobile APP User Attribute Prediction by Heterogeneous Information Network Modeling

Oct 06, 2019
Hekai Zhang, Jibing Gong, Zhiyong Teng, Dan Wang, Hongfei Wang, Linfeng Du, Zakirul Alam Bhuiyan

User-based attribute information, such as age and gender, is usually considered as user privacy information. It is difficult for enterprises to obtain user-based privacy attribute information. However, user-based privacy attribute information has a wide range of applications in personalized services, user behavior analysis and other aspects. this paper advances the HetPathMine model and puts forward TPathMine model. With applying the number of clicks of attributes under each node to express the user's emotional preference information, optimizations of the solution of meta-path weight are also presented. Based on meta-path in heterogeneous information networks, the new model integrates all relationships among objects into isomorphic relationships of classified objects. Matrix is used to realize the knowledge dissemination of category knowledge among isomorphic objects. The experimental results show that: (1) the prediction of user attributes based on heterogeneous information networks can achieve higher accuracy than traditional machine learning classification methods; (2) TPathMine model based on the number of clicks is more accurate in classifying users of different age groups, and the weight of each meta-path is consistent with human intuition or the real world situation.

* 10 pages,3 figures,International Conference on Dependability in Sensor, Cloud, and Big Data Systems and Applications 

  Click for Model/Code and Paper
Object-Agnostic Suction Grasp Affordance Detection in Dense Cluster Using Self-Supervised Learning.docx

Jun 07, 2019
Mingshuo Han, Wenhai Liu., Zhenyu Pan, Teng Xue, Quanquan Shao, Jin Ma, Weiming Wang

In this paper we study grasp problem in dense cluster, a challenging task in warehouse logistics scenario. By introducing a two-step robust suction affordance detection method, we focus on using vacuum suction pad to clear up a box filled with seen and unseen objects. Two CNN based neural networks are proposed. A Fast Region Estimation Network (FRE-Net) predicts which region contains pickable objects, and a Suction Grasp Point Affordance network (SGPA-Net) determines which point in that region is pickable. So as to enable such two networks, we design a self-supervised learning pipeline to accumulate data, train and test the performance of our method. In both virtual and real environment, within 1500 picks (~5 hours), we reach a picking accuracy of 95% for known objects and 90% for unseen objects with similar geometry features.


  Click for Model/Code and Paper
Privacy-preserving Crowd-guided AI Decision-making in Ethical Dilemmas

Jun 04, 2019
Teng Wang, Jun Zhao, Han Yu, Jinyan Liu, Xinyu Yang, Xuebin Ren, Shuyu Shi

With the rapid development of artificial intelligence (AI), ethical issues surrounding AI have attracted increasing attention. In particular, autonomous vehicles may face moral dilemmas in accident scenarios, such as staying the course resulting in hurting pedestrians or swerving leading to hurting passengers. To investigate such ethical dilemmas, recent studies have adopted preference aggregation, in which each voter expresses her/his preferences over decisions for the possible ethical dilemma scenarios, and a centralized system aggregates these preferences to obtain the winning decision. Although a useful methodology for building ethical AI systems, such an approach can potentially violate the privacy of voters since moral preferences are sensitive information and their disclosure can be exploited by malicious parties. In this paper, we report a first-of-its-kind privacy-preserving crowd-guided AI decision-making approach in ethical dilemmas. We adopt the notion of differential privacy to quantify privacy and consider four granularities of privacy protection by taking voter-/record-level privacy protection and centralized/distributed perturbation into account, resulting in four approaches VLCP, RLCP, VLDP, and RLDP. Moreover, we propose different algorithms to achieve these privacy protection granularities, while retaining the accuracy of the learned moral preference model. Specifically, VLCP and RLCP are implemented with the data aggregator setting a universal privacy parameter and perturbing the averaged moral preference to protect the privacy of voters' data. VLDP and RLDP are implemented in such a way that each voter perturbs her/his local moral preference with a personalized privacy parameter. Extensive experiments on both synthetic and real data demonstrate that the proposed approach can achieve high accuracy of preference aggregation while protecting individual voter's privacy.

* 11pages 

  Click for Model/Code and Paper
Bayesian Grasp: Robotic visual stable grasp based on prior tactile knowledge

May 30, 2019
Teng Xue, Wenhai Liu, Mingshuo Han, Zhenyu Pan, Jin Ma, Quanquan Shao, Weiming Wang

Robotic grasp detection is a fundamental capability for intelligent manipulation in unstructured environments. Previous work mainly employed visual and tactile fusion to achieve stable grasp, while, the whole process depending heavily on regrasping, which wastes much time to regulate and evaluate. We propose a novel way to improve robotic grasping: by using learned tactile knowledge, a robot can achieve a stable grasp from an image. First, we construct a prior tactile knowledge learning framework with novel grasp quality metric which is determined by measuring its resistance to external perturbations. Second, we propose a multi-phases Bayesian Grasp architecture to generate stable grasp configurations through a single RGB image based on prior tactile knowledge. Results show that this framework can classify the outcome of grasps with an average accuracy of 86% on known objects and 79% on novel objects. The prior tactile knowledge improves the successful rate of 55% over traditional vision-based strategies.

* ICRA2019: ViTac Workshop 

  Click for Model/Code and Paper
Parameter Constrained Transfer Learning for Low Dose PET Image Denoising

Oct 13, 2019
Yu Gong, Yueyang Teng, Hongming Shan, Taohui Xiao, Ming Li, Guodong Liang, Ge Wang, Shanshan Wang

Positron emission tomography (PET) is widely used in clinical practice. However, the potential risk of PET-associated radiation dose to patients needs to be minimized. With reduction of the radiation dose, the resultant images may suffer from noise and artifacts which compromises the diagnostic performance. In this paper, we propose a parameter-constrained generative adversarial network with Wasserstein distance and perceptual loss (PC-WGAN) for low-dose PET image denoising. This method makes two main contributions: 1) a PC-WGAN framework is designed to denoise low-dose PET images without compromising structural details; and 2) a transfer learning strategy is developed to train PC-WGAN with parameters being constrained, which has major merits; namely, making the training process of PC-WGAN efficient and improving the quality of denoised images. The experimental results on clinical data show that the proposed network can suppress image noise more effectively while preserving better image fidelity than three selected state-of-the-art methods.

* 10 pages and 12 figures 

  Click for Model/Code and Paper
A Study of Deep Feature Fusion based Methods for Classifying Multi-lead ECG

Aug 06, 2018
Bin Chen, Wei Guo, Bin Li, Rober K. F. Teng, Mingjun Dai, Jianping Luo, Hui Wang

An automatic classification method has been studied to effectively detect and recognize Electrocardiogram (ECG). Based on the synchronizing and orthogonal relationships of multiple leads, we propose a Multi-branch Convolution and Residual Network (MBCRNet) with three kinds of feature fusion methods for automatic detection of normal and abnormal ECG signals. Experiments are conducted on the Chinese Cardiovascular Disease Database (CCDD). Through 10-fold cross-validation, we achieve an average accuracy of 87.04% and a sensitivity of 89.93%, which outperforms previous methods under the same database. It is also shown that the multi-lead feature fusion network can improve the classification accuracy over the network only with the single lead features.

* 6 pages, 5 figures 

  Click for Model/Code and Paper