Models, code, and papers for "Yunpeng Li":

From Here to There: Video Inbetweening Using Direct 3D Convolutions

Jun 04, 2019
Yunpeng Li, Dominik Roblek, Marco Tagliasacchi

We consider the problem of generating plausible and diverse video sequences, when we are only given a start and an end frame. This task is also known as inbetweening, and it belongs to the broader area of stochastic video generation, which is generally approached by means of recurrent neural networks (RNN). In this paper, we propose instead a fully convolutional model to generate video sequences directly in the pixel domain. We first obtain a latent video representation using a stochastic fusion mechanism that learns how to incorporate information from the start and end frames. Our model learns to produce such latent representation by progressively increasing the temporal resolution, and then decode in the spatiotemporal domain using 3D convolutions. The model is trained end-to-end by minimizing an adversarial loss. Experiments on several widely-used benchmark datasets show that it is able to generate meaningful and diverse in-between video sequences, according to both quantitative and qualitative evaluations.

  Click for Model/Code and Paper
A brief network analysis of Artificial Intelligence publication

Nov 23, 2013
Yunpeng Li, Jie Liu, Yong Deng

In this paper, we present an illustration to the history of Artificial Intelligence(AI) with a statistical analysis of publish since 1940. We collected and mined through the IEEE publish data base to analysis the geological and chronological variance of the activeness of research in AI. The connections between different institutes are showed. The result shows that the leading community of AI research are mainly in the USA, China, the Europe and Japan. The key institutes, authors and the research hotspots are revealed. It is found that the research institutes in the fields like Data Mining, Computer Vision, Pattern Recognition and some other fields of Machine Learning are quite consistent, implying a strong interaction between the community of each field. It is also showed that the research of Electronic Engineering and Industrial or Commercial applications are very active in California. Japan is also publishing a lot of papers in robotics. Due to the limitation of data source, the result might be overly influenced by the number of published articles, which is to our best improved by applying network keynode analysis on the research community instead of merely count the number of publish.

* 18 pages, 7 figures 

  Click for Model/Code and Paper
An Empirical Study of Generative Models with Encoders

Dec 19, 2018
Paul K. Rubenstein, Yunpeng Li, Dominik Roblek

Generative adversarial networks (GANs) are capable of producing high quality image samples. However, unlike variational autoencoders (VAEs), GANs lack encoders that provide the inverse mapping for the generators, i.e., encode images back to the latent space. In this work, we consider adversarially learned generative models that also have encoders. We evaluate models based on their ability to produce high quality samples and reconstructions of real images. Our main contributions are twofold: First, we find that the baseline Bidirectional GAN (BiGAN) can be improved upon with the addition of an autoencoder loss, at the expense of an extra hyper-parameter to tune. Second, we show that comparable performance to BiGAN can be obtained by simply training an encoder to invert the generator of a normal GAN.

  Click for Model/Code and Paper
Defuzzify firstly or finally: Dose it matter in fuzzy DEMATEL under uncertain environment?

Mar 20, 2014
Yunpeng Li, Ya Li, Jie Liu, Yong Deng

Decision-Making Trial and Evaluation Laboratory (DEMATEL) method is widely used in many real applications. With the desirable property of efficient handling with the uncertain information in decision making, the fuzzy DEMATEL is heavily studied. Recently, Dytczak and Ginda suggested to defuzzify the fuzzy numbers firstly and then use the classical DEMATEL to obtain the final result. In this short paper, we show that it is not reasonable in some situations. The results of defuzzification at the first step are not coincide with the results of defuzzification at the final step.It seems that the alternative is to defuzzification in the final step in fuzzy DEMATEL.

  Click for Model/Code and Paper
Facial Expression Restoration Based on Improved Graph Convolutional Networks

Oct 23, 2019
Zhilei Liu, Le Li, Yunpeng Wu, Cuicui Zhang

Facial expression analysis in the wild is challenging when the facial image is with low resolution or partial occlusion. Considering the correlations among different facial local regions under different facial expressions, this paper proposes a novel facial expression restoration method based on generative adversarial network by integrating an improved graph convolutional network (IGCN) and region relation modeling block (RRMB). Unlike conventional graph convolutional networks taking vectors as input features, IGCN can use tensors of face patches as inputs. It is better to retain the structure information of face patches. The proposed RRMB is designed to address facial generative tasks including inpainting and super-resolution with facial action units detection, which aims to restore facial expression as the ground-truth. Extensive experiments conducted on BP4D and DISFA benchmarks demonstrate the effectiveness of our proposed method through quantitative and qualitative evaluations.

* Accepted by MMM2020 

  Click for Model/Code and Paper
Dynamic Feature Fusion for Semantic Edge Detection

Feb 25, 2019
Yuan Hu, Yunpeng Chen, Xiang Li, Jiashi Feng

Features from multiple scales can greatly benefit the semantic edge detection task if they are well fused. However, the prevalent semantic edge detection methods apply a fixed weight fusion strategy where images with different semantics are forced to share the same weights, resulting in universal fusion weights for all images and locations regardless of their different semantics or local context. In this work, we propose a novel dynamic feature fusion strategy that assigns different fusion weights for different input images and locations adaptively. This is achieved by a proposed weight learner to infer proper fusion weights over multi-level features for each location of the feature map, conditioned on the specific input. In this way, the heterogeneity in contributions made by different locations of feature maps and input images can be better considered and thus help produce more accurate and sharper edge predictions. We show that our model with the novel dynamic feature fusion is superior to fixed weight fusion and also the na\"ive location-invariant weight fusion methods, via comprehensive experiments on benchmarks Cityscapes and SBD. In particular, our method outperforms all existing well established methods and achieves new state-of-the-art.

  Click for Model/Code and Paper
Microwave breast cancer detection using Empirical Mode Decomposition features

Feb 24, 2017
Hongchao Song, Yunpeng Li, Mark Coates, Aidong Men

Microwave-based breast cancer detection has been proposed as a complementary approach to compensate for some drawbacks of existing breast cancer detection techniques. Among the existing microwave breast cancer detection methods, machine learning-type algorithms have recently become more popular. These focus on detecting the existence of breast tumours rather than performing imaging to identify the exact tumour position. A key step of the machine learning approaches is feature extraction. One of the most widely used feature extraction method is principle component analysis (PCA). However, it can be sensitive to signal misalignment. This paper presents an empirical mode decomposition (EMD)-based feature extraction method, which is more robust to the misalignment. Experimental results involving clinical data sets combined with numerically simulated tumour responses show that combined features from EMD and PCA improve the detection performance with an ensemble selection-based classifier.

  Click for Model/Code and Paper
Deep Reasoning with Multi-scale Context for Salient Object Detection

Jan 24, 2019
Zun Li, Congyan Lang, Yunpeng Chen, Junhao Liew, Jiashi Feng

To detect and segment salient objects accurately, existing methods are usually devoted to designing complex network architectures to fuse powerful features from the backbone networks. However, they put much less efforts on the saliency inference module and only use a few fully convolutional layers to perform saliency reasoning from the fused features. However, should feature fusion strategies receive much attention but saliency reasoning be ignored a lot? In this paper, we find that weakness of the saliency reasoning unit limits salient object detection performance, and claim that saliency reasoning after multi-scale convolutional features fusion is critical. To verify our findings, we first extract multi-scale features with a fully convolutional network, and then directly reason from these comprehensive features using a deep yet light-weighted network, modified from ShuffleNet, to fast and precisely predict salient objects. Such simple design is shown to be capable of reasoning from multi-scale saliency features as well as giving superior saliency detection performance with less computation cost. Experimental results show that our simple framework outperforms the best existing method with 2.3\% and 3.6\% promotion for F-measure scores, 2.8\% reduction for MAE score on PASCAL-S, DUT-OMRON and SOD datasets respectively.

* Draft, 6 pages, 3 figures, 1 table 

  Click for Model/Code and Paper
$A^2$-Nets: Double Attention Networks

Oct 27, 2018
Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng

Learning to capture long-range relations is fundamental to image/video recognition. Existing CNN models generally rely on increasing depth to model such relations which is highly inefficient. In this work, we propose the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently. The component is designed with a double attention mechanism in two steps, where the first step gathers features from the entire space into a compact set through second-order attention pooling and the second step adaptively selects and distributes features to each location via another attention. The proposed double attention block is easy to adopt and can be plugged into existing deep neural networks conveniently. We conduct extensive ablation studies and experiments on both image and video recognition tasks for evaluating its performance. On the image recognition task, a ResNet-50 equipped with our double attention blocks outperforms a much larger ResNet-152 architecture on ImageNet-1k dataset with over 40% less the number of parameters and less FLOPs. On the action recognition task, our proposed model achieves the state-of-the-art results on the Kinetics and UCF-101 datasets with significantly higher efficiency than recent works.

* Accepted at NIPS 2018 

  Click for Model/Code and Paper
Multi-Fiber Networks for Video Recognition

Sep 18, 2018
Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng

In this paper, we aim to reduce the computational cost of spatio-temporal deep neural networks, making them run as fast as their 2D counterparts while preserving state-of-the-art accuracy on video recognition benchmarks. To this end, we present the novel Multi-Fiber architecture that slices a complex neural network into an ensemble of lightweight networks or fibers that run through the network. To facilitate information flow between fibers we further incorporate multiplexer modules and end up with an architecture that reduces the computational cost of 3D networks by an order of magnitude, while increasing recognition performance at the same time. Extensive experimental results show that our multi-fiber architecture significantly boosts the efficiency of existing convolution networks for both image and video recognition tasks, achieving state-of-the-art performance on UCF-101, HMDB-51 and Kinetics datasets. Our proposed model requires over 9x and 13x less computations than the I3D and R(2+1)D models, respectively, yet providing higher accuracy.

* ECCV 2018, Code is on GitHub 

  Click for Model/Code and Paper
Weaving Multi-scale Context for Single Shot Detector

Dec 08, 2017
Yunpeng Chen, Jianshu Li, Bin Zhou, Jiashi Feng, Shuicheng Yan

Aggregating context information from multiple scales has been proved to be effective for improving accuracy of Single Shot Detectors (SSDs) on object detection. However, existing multi-scale context fusion techniques are computationally expensive, which unfavorably diminishes the advantageous speed of SSD. In this work, we propose a novel network topology, called WeaveNet, that can efficiently fuse multi-scale information and boost the detection accuracy with negligible extra cost. The proposed WeaveNet iteratively weaves context information from adjacent scales together to enable more sophisticated context reasoning while maintaining fast speed. Built by stacking light-weight blocks, WeaveNet is easy to train without requiring batch normalization and can be further accelerated by our proposed architecture simplification. Experimental results on PASCAL VOC 2007, PASCAL VOC 2012 benchmarks show signification performance boost brought by WeaveNet. For 320x320 input of batch size = 8, WeaveNet reaches 79.5% mAP on PASCAL VOC 2007 test in 101 fps with only 4 fps extra cost, and further improves to 79.7% mAP with more iterations.

  Click for Model/Code and Paper
Dual Path Networks

Aug 01, 2017
Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan, Jiashi Feng

In this work, we present a simple, highly efficient and modularized Dual Path Network (DPN) for image classification which presents a new topology of connection paths internally. By revealing the equivalence of the state-of-the-art Residual Network (ResNet) and Densely Convolutional Network (DenseNet) within the HORNN framework, we find that ResNet enables feature re-usage while DenseNet enables new features exploration which are both important for learning good representations. To enjoy the benefits from both path topologies, our proposed Dual Path Network shares common features while maintaining the flexibility to explore new features through dual path architectures. Extensive experiments on three benchmark datasets, ImagNet-1k, Places365 and PASCAL VOC, clearly demonstrate superior performance of the proposed DPN over state-of-the-arts. In particular, on the ImagNet-1k dataset, a shallow DPN surpasses the best ResNeXt-101(64x4d) with 26% smaller model size, 25% less computational cost and 8% lower memory consumption, and a deeper DPN (DPN-131) further pushes the state-of-the-art single model performance with about 2 times faster training speed. Experiments on the Places365 large-scale scene dataset, PASCAL VOC detection dataset, and PASCAL VOC segmentation dataset also demonstrate its consistently better performance than DenseNet, ResNet and the latest ResNeXt model over various applications.

* for code and models, see 

  Click for Model/Code and Paper
A Unified RGB-T Saliency Detection Benchmark: Dataset, Baselines, Analysis and A Novel Approach

Jan 11, 2017
Chenglong Li, Guizhao Wang, Yunpeng Ma, Aihua Zheng, Bin Luo, Jin Tang

Despite significant progress, image saliency detection still remains a challenging task in complex scenes and environments. Integrating multiple different but complementary cues, like RGB and Thermal (RGB-T), may be an effective way for boosting saliency detection performance. The current research in this direction, however, is limited by the lack of a comprehensive benchmark. This work contributes such a RGB-T image dataset, which includes 821 spatially aligned RGB-T image pairs and their ground truth annotations for saliency detection purpose. The image pairs are with high diversity recorded under different scenes and environmental conditions, and we annotate 11 challenges on these image pairs for performing the challenge-sensitive analysis for different saliency detection algorithms. We also implement 3 kinds of baseline methods with different modality inputs to provide a comprehensive comparison platform. With this benchmark, we propose a novel approach, multi-task manifold ranking with cross-modality consistency, for RGB-T saliency detection. In particular, we introduce a weight for each modality to describe the reliability, and integrate them into the graph-based manifold ranking algorithm to achieve adaptive fusion of different source data. Moreover, we incorporate the cross-modality consistent constraints to integrate different modalities collaboratively. For the optimization, we design an efficient algorithm to iteratively solve several subproblems with closed-form solutions. Extensive experiments against other baseline methods on the newly created benchmark demonstrate the effectiveness of the proposed approach, and we also provide basic insights and potential future research directions for RGB-T saliency detection.

  Click for Model/Code and Paper
BCCNet: Bayesian classifier combination neural network

Nov 29, 2018
Olga Isupova, Yunpeng Li, Danil Kuzin, Stephen J Roberts, Katherine Willis, Steven Reece

Machine learning research for developing countries can demonstrate clear sustainable impact by delivering actionable and timely information to in-country government organisations (GOs) and NGOs in response to their critical information requirements. We co-create products with UK and in-country commercial, GO and NGO partners to ensure the machine learning algorithms address appropriate user needs whether for tactical decision making or evidence-based policy decisions. In one particular case, we developed and deployed a novel algorithm, BCCNet, to quickly process large quantities of unstructured data to prevent and respond to natural disasters. Crowdsourcing provides an efficient mechanism to generate labels from unstructured data to prime machine learning algorithms for large scale data analysis. However, these labels are often imperfect with qualities varying among different citizen scientists, which prohibits their direct use with many state-of-the-art machine learning techniques. We describe BCCNet, a framework that simultaneously aggregates biased and contradictory labels from the crowd and trains an automatic classifier to process new data. Our case studies, mosquito sound detection for malaria prevention and damage detection for disaster response, show the efficacy of our method in the challenging context of developing world applications.

* Presented at NeurIPS 2018 Workshop on Machine Learning for the Developing World 

  Click for Model/Code and Paper
Cost-sensitive detection with variational autoencoders for environmental acoustic sensing

Dec 07, 2017
Yunpeng Li, Ivan Kiskin, Davide Zilli, Marianne Sinka, Henry Chan, Kathy Willis, Stephen Roberts

Environmental acoustic sensing involves the retrieval and processing of audio signals to better understand our surroundings. While large-scale acoustic data make manual analysis infeasible, they provide a suitable playground for machine learning approaches. Most existing machine learning techniques developed for environmental acoustic sensing do not provide flexible control of the trade-off between the false positive rate and the false negative rate. This paper presents a cost-sensitive classification paradigm, in which the hyper-parameters of classifiers and the structure of variational autoencoders are selected in a principled Neyman-Pearson framework. We examine the performance of the proposed approach using a dataset from the HumBug project which aims to detect the presence of mosquitoes using sound collected by simple embedded devices.

* Presented at the NIPS 2017 Workshop on Machine Learning for Audio Signal Processing 

  Click for Model/Code and Paper
Mosquito detection with low-cost smartphones: data acquisition for malaria research

Dec 06, 2017
Yunpeng Li, Davide Zilli, Henry Chan, Ivan Kiskin, Marianne Sinka, Stephen Roberts, Kathy Willis

Mosquitoes are a major vector for malaria, causing hundreds of thousands of deaths in the developing world each year. Not only is the prevention of mosquito bites of paramount importance to the reduction of malaria transmission cases, but understanding in more forensic detail the interplay between malaria, mosquito vectors, vegetation, standing water and human populations is crucial to the deployment of more effective interventions. Typically the presence and detection of malaria-vectoring mosquitoes is only quantified by hand-operated insect traps or signified by the diagnosis of malaria. If we are to gather timely, large-scale data to improve this situation, we need to automate the process of mosquito detection and classification as much as possible. In this paper, we present a candidate mobile sensing system that acts as both a portable early warning device and an automatic acoustic data acquisition pipeline to help fuel scientific inquiry and policy. The machine learning algorithm that powers the mobile system achieves excellent off-line multi-species detection performance while remaining computationally efficient. Further, we have conducted preliminary live mosquito detection tests using low-cost mobile phones and achieved promising results. The deployment of this system for field usage in Southeast Asia and Africa is planned in the near future. In order to accelerate processing of field recordings and labelling of collected data, we employ a citizen science platform in conjunction with automated methods, the former implemented using the Zooniverse platform, allowing crowdsourcing on a grand scale.

* Presented at NIPS 2017 Workshop on Machine Learning for the Developing World 

  Click for Model/Code and Paper
Video Scene Parsing with Predictive Feature Learning

Dec 13, 2016
Xiaojie Jin, Xin Li, Huaxin Xiao, Xiaohui Shen, Zhe Lin, Jimei Yang, Yunpeng Chen, Jian Dong, Luoqi Liu, Zequn Jie, Jiashi Feng, Shuicheng Yan

In this work, we address the challenging video scene parsing problem by developing effective representation learning methods given limited parsing annotations. In particular, we contribute two novel methods that constitute a unified parsing framework. (1) \textbf{Predictive feature learning}} from nearly unlimited unlabeled video data. Different from existing methods learning features from single frame parsing, we learn spatiotemporal discriminative features by enforcing a parsing network to predict future frames and their parsing maps (if available) given only historical frames. In this way, the network can effectively learn to capture video dynamics and temporal context, which are critical clues for video scene parsing, without requiring extra manual annotations. (2) \textbf{Prediction steering parsing}} architecture that effectively adapts the learned spatiotemporal features to scene parsing tasks and provides strong guidance for any off-the-shelf parsing model to achieve better video scene parsing performance. Extensive experiments over two challenging datasets, Cityscapes and Camvid, have demonstrated the effectiveness of our methods by showing significant improvement over well-established baselines.

* 15 pages, 7 figures, 5 tables, currently v2 

  Click for Model/Code and Paper
A Survey on Theoretical Advances of Community Detection in Networks

Aug 26, 2018
Yunpeng Zhao

Real-world networks usually have community structure, that is, nodes are grouped into densely connected communities. Community detection is one of the most popular and best-studied research topics in network science and has attracted attention in many different fields, including computer science, statistics, social sciences, among others. Numerous approaches for community detection have been proposed in literature, from ad-hoc algorithms to systematic model-based approaches. The large number of available methods leads to a fundamental question: whether a certain method can provide consistent estimates of community labels. The stochastic blockmodel (SBM) and its variants provide a convenient framework for the study of such problems. This article is a survey on the recent theoretical advances of community detection. The authors review a number of community detection methods and their theoretical properties, including graph cut methods, profile likelihoods, the pseudo-likelihood method, the variational method, belief propagation, spectral clustering, and semidefinite relaxations of the SBM. The authors also briefly discuss other research topics in community detection such as robust community detection, community detection with nodal covariates and model selection, as well as suggest a few possible directions for future research.

* Wire Computational Statistics, 2017 

  Click for Model/Code and Paper
Network Inference from Temporal-Dependent Grouped Observations

Aug 25, 2018
Yunpeng Zhao

In social network analysis, the observed data is usually some social behavior, such as the formation of groups, rather than an explicit network structure. Zhao and Weko (2017) propose a model-based approach called the hub model to infer implicit networks from grouped observations. The hub model assumes independence between groups, which sometimes is not valid in practice. In this article, we generalize the idea of the hub model into the case of grouped observations with temporal dependence. As in the hub model, we assume that the group at each time point is gathered by one leader. Unlike in the hub model, the group leaders are not sampled independently but follow a Markov chain, and other members in adjacent groups can also be correlated. An expectation-maximization (EM) algorithm is developed for this model and a polynomial-time algorithm is proposed for the E-step. The performance of the new model is evaluated under different simulation settings. We apply this model to a data set of the Kibale Chimpanzee Project.

  Click for Model/Code and Paper