Models, code, and papers for "Ming Yang":

Res2Net: A New Multi-scale Backbone Architecture

Apr 02, 2019
Shang-Hua Gao, Ming-Ming Cheng, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang, Philip Torr

Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely used datasets, e.g., CIFAR-100 and ImageNet. Further ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, verify the superiority of the Res2Net over the state-of-the-art baseline methods. The source code and trained models will be made publicly available.
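The hierarchical residual-like connections described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: following our reading of the Res2Net design, the block's channels are split into `s` groups $x_1, \dots, x_s$, the first group passes through unchanged, the second goes through its own 3x3 operator ($y_2 = K_2(x_2)$), and each later group adds the previous group's output before its operator, $y_i = K_i(x_i + y_{i-1})$. To stay dependency-free, `conv3` is a hypothetical stand-in that replaces each 3x3 convolution with a 1D 3-tap filter, and the 1x1 convolutions and concatenation around the split are omitted.

```python
from typing import Callable, List

Signal = List[float]


def conv3(signal: Signal, taps=(1.0, 1.0, 1.0)) -> Signal:
    """Hypothetical stand-in for a 3x3 conv: a 1D 3-tap filter with zero padding."""
    padded = [0.0] + signal + [0.0]
    return [
        taps[0] * padded[j] + taps[1] * padded[j + 1] + taps[2] * padded[j + 2]
        for j in range(len(signal))
    ]


def add(a: Signal, b: Signal) -> Signal:
    return [u + v for u, v in zip(a, b)]


def res2net_split(groups: List[Signal], ops: List[Callable[[Signal], Signal]]) -> List[Signal]:
    """Hierarchical residual-like connections over channel groups.

    y_1 = x_1, y_2 = K_2(x_2), y_i = K_i(x_i + y_{i-1}) for i > 2.
    ops[i] plays the role of K_{i+1}; ops[0] is unused.
    """
    outputs = [groups[0]]  # first group is passed through unchanged
    for i in range(1, len(groups)):
        inp = groups[i] if i == 1 else add(groups[i], outputs[-1])
        outputs.append(ops[i](inp))
    return outputs


def receptive_widths(scale: int, length: int = 11) -> List[int]:
    """Feed an impulse into every group and count nonzero outputs per group."""
    impulse = [0.0] * length
    impulse[length // 2] = 1.0
    groups = [list(impulse) for _ in range(scale)]
    ys = res2net_split(groups, [conv3] * scale)
    return [sum(1 for v in y if v != 0.0) for y in ys]


print(receptive_widths(4))  # [1, 3, 5, 7]
```

With positive taps the support of each group's impulse response equals its receptive field, so the output of group $i$ covers $2i - 1$ positions: a single block mixes several scales, which is the granular multi-scale behaviour the abstract describes.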


Simultaneous Subspace Clustering and Cluster Number Estimating based on Triplet Relationship

Jan 23, 2019
Jie Liang, Jufeng Yang, Ming-Ming Cheng, Paul L. Rosin, Liang Wang

In this paper we propose a unified framework to simultaneously discover the number of clusters and group the data points into them using subspace clustering. Real data distributed in a high-dimensional space can be disentangled into a union of low-dimensional subspaces, which can benefit various applications. To explore such intrinsic structure, state-of-the-art subspace clustering approaches often optimize a self-representation problem among all samples to construct a pairwise affinity graph for spectral clustering. However, a graph with pairwise similarities lacks robustness for segmentation, especially for samples that lie on the intersection of two subspaces. To address this problem, we design a hyper-correlation-based data structure termed the triplet relationship, which reveals high relevance and local compactness among three samples. The triplet relationship can be derived from the self-representation matrix and used to iteratively assign the data points to clusters. The three samples in each triplet are encouraged to be highly correlated and are treated as a meta-element during clustering, which is more robust than pairwise relationships when segmenting two densely distributed subspaces. Based on the triplet relationship, we propose a unified optimization scheme to automatically calculate clustering assignments. Specifically, we optimize a model selection reward and a fusion reward by simultaneously maximizing the similarity of triplets from different clusters while minimizing the correlation of triplets from the same cluster. The proposed algorithm also automatically reveals the number of clusters and fuses groups to avoid over-segmentation. Extensive experimental results on both synthetic and real-world datasets validate the effectiveness and robustness of the proposed method.

* 13 pages, 4 figures, 6 tables 

A Closed-form Solution to Photorealistic Image Stylization

Jul 27, 2018
Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, Jan Kautz

Photorealistic image stylization concerns transferring the style of a reference photo to a content photo with the constraint that the stylized photo should remain photorealistic. While several photorealistic image stylization methods exist, they tend to generate spatially inconsistent stylizations with noticeable artifacts. In this paper, we propose a method to address these issues. The proposed method consists of a stylization step and a smoothing step. While the stylization step transfers the style of the reference photo to the content photo, the smoothing step ensures spatially consistent stylizations. Each of the steps has a closed-form solution and can be computed efficiently. We conduct extensive experimental validations. The results show that human subjects prefer the outputs of the proposed method over those of the competing methods, and that the proposed method runs much faster. Source code and additional results are available at https://github.com/NVIDIA/FastPhotoStyle .

* Accepted by ECCV 2018 
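The closed form of the smoothing step can be illustrated on a toy graph. A minimal sketch, assuming (from our reading of the FastPhotoStyle paper, not from this abstract) that smoothing is a manifold-ranking problem with solution $r^* = (1-\alpha)(I - \alpha S)^{-1} y$, where $S = D^{-1/2} W D^{-1/2}$ is the normalized pixel-affinity matrix and $y$ is one channel of the stylized image. The affinity matrix `w`, the value `alpha=0.99`, and the helper names are hypothetical; every node is assumed to have positive degree, and a dense Gaussian-elimination solve only makes sense for tiny graphs.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_affinity(w: Matrix) -> Matrix:
    """S = D^{-1/2} W D^{-1/2}, with D the diagonal degree matrix of W."""
    deg = [sum(row) for row in w]
    n = len(w)
    return [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def smooth(y: List[float], w: Matrix, alpha: float = 0.99) -> List[float]:
    """Closed-form smoothing r = (1 - alpha) (I - alpha S)^{-1} y."""
    s = normalized_affinity(w)
    n = len(y)
    a = [[(1.0 if i == j else 0.0) - alpha * s[i][j] for j in range(n)] for i in range(n)]
    return solve(a, [(1 - alpha) * v for v in y])


# Hypothetical 3-node graph with equal affinities.
triangle = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
print(smooth([1.0, 0.0, 0.0], triangle))
```

On this graph a constant signal is left unchanged, while an isolated spike is spread almost evenly across its neighbours; a real image would use a sparse solver over all pixels.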

Superpixel Sampling Networks

Jul 26, 2018
Varun Jampani, Deqing Sun, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz

Superpixels provide an efficient low/mid-level representation of image data, which greatly reduces the number of image primitives for subsequent vision tasks. Existing superpixel algorithms are not differentiable, making them difficult to integrate into otherwise end-to-end trainable deep neural networks. We develop a new differentiable model for superpixel sampling that leverages deep networks for learning superpixel segmentation. The resulting "Superpixel Sampling Network" (SSN) is end-to-end trainable, which allows learning task-specific superpixels with flexible loss functions and has fast runtime. Extensive experimental analysis indicates that SSNs not only outperform existing superpixel algorithms on traditional segmentation benchmarks, but can also learn superpixels for other tasks. In addition, SSNs can be easily integrated into downstream deep networks resulting in performance improvements.

* ECCV 2018. Project URL: https://varunjampani.github.io/ssn/ 
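The differentiable sampling idea can be sketched as soft clustering. A minimal sketch, assuming (from our reading of the SSN paper, not this abstract) a differentiable SLIC-style update: each pixel feature $F_p$ gets a soft association $Q_{pi} = \exp(-\lVert F_p - S_i \rVert^2)$ with every superpixel center $S_i$, and the centers are recomputed as association-weighted means. Every step is a smooth function of the features, which is what lets gradients flow from a task loss back into a feature network. The original restricts each pixel to nearby superpixels and learns $F_p$ with a CNN; here the features, initial centers, and iteration count are hypothetical, and the centers are assumed to start close enough to the data that no association column underflows to zero.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def associations(features: List[Vec], centers: List[Vec]) -> List[Vec]:
    """Soft pixel-to-superpixel associations Q[p][i] = exp(-||F_p - S_i||^2)."""
    return [[math.exp(-sq_dist(f, s)) for s in centers] for f in features]


def soft_superpixels(
    features: List[Vec], centers: List[Vec], iters: int = 5
) -> Tuple[List[Vec], List[int]]:
    for _ in range(iters):
        q = associations(features, centers)
        new_centers = []
        for i in range(len(centers)):
            z = sum(q[p][i] for p in range(len(features)))  # normalizer for center i
            new_centers.append([
                sum(q[p][i] * features[p][d] for p in range(len(features))) / z
                for d in range(len(centers[i]))
            ])
        centers = new_centers
    q = associations(features, centers)
    # Hard labels for inspection only; the argmax is not differentiable.
    labels = [max(range(len(centers)), key=lambda i: row[i]) for row in q]
    return centers, labels


pixels = [[0.0], [0.2], [0.1], [3.0], [3.1], [2.9]]
centers, labels = soft_superpixels(pixels, [[0.5], [2.5]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The final `argmax` only reads off hard superpixels; training would use the soft associations directly.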

Context-Aware Synthesis and Placement of Object Instances

Dec 07, 2018
Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz

Learning to insert an object instance into an image in a semantically coherent manner is a challenging and interesting problem. Solving it requires (a) determining a location to place an object in the scene and (b) determining its appearance at the location. Such an object insertion model can potentially facilitate numerous image editing and scene parsing applications. In this paper, we propose an end-to-end trainable neural network for the task of inserting an object instance mask of a specified class into the semantic label map of an image. Our network consists of two generative modules where one determines where the inserted object mask should be (i.e., location and scale) and the other determines what the object mask shape (and pose) should look like. The two modules are connected together via a spatial transformation network and jointly trained. We devise a learning procedure that leverages both supervised and unsupervised data and show that our model can insert an object at diverse locations with various appearances. We conduct extensive experimental validations with comparisons to strong baselines to verify the effectiveness of the proposed network.


Enhanced-alignment Measure for Binary Foreground Map Evaluation

Jul 24, 2018
Deng-Ping Fan, Cheng Gong, Yang Cao, Bo Ren, Ming-Ming Cheng, Ali Borji

Existing binary foreground map (FM) measures address various types of errors in either pixel-wise or structural ways. These measures consider pixel-level match or image-level information independently, while cognitive vision studies have shown that human vision is highly sensitive to both global information and local details in scenes. In this paper, we take a detailed look at current binary FM evaluation measures and propose a novel and effective E-measure (Enhanced-alignment measure). Our measure combines local pixel values with the image-level mean value in one term, jointly capturing image-level statistics and local pixel matching information. We demonstrate the superiority of our measure over the available measures on 4 popular datasets via 5 meta-measures: ranking models for applications, demoting generic maps, demoting random Gaussian noise maps, ground-truth switch, and human judgments. We find large improvements in almost all the meta-measures. For instance, in terms of application ranking, we observe improvements ranging from 9.08% to 19.65% compared with other popular measures.

* 8 pages, 10 figures, IJCAI 2018 (oral) 
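The single-term combination of local pixel values and the image-level mean can be sketched as follows. The formulas are our reading of the E-measure paper and are an assumption rather than something this abstract spells out: bias matrices $\varphi = M - \mu_M$ for the foreground map and the ground truth, alignment $\xi = 2\varphi_{GT} \circ \varphi_{FM} / (\varphi_{GT} \circ \varphi_{GT} + \varphi_{FM} \circ \varphi_{FM})$, enhanced alignment $(1 + \xi)^2 / 4$, averaged over all pixels. The `eps` guard and the omission of special handling for all-zero or all-one ground truth are our own simplifications.

```python
from typing import List

Map = List[List[float]]


def e_measure(fm: Map, gt: Map, eps: float = 1e-8) -> float:
    """Enhanced-alignment score of a binary foreground map against ground truth."""
    pixels_fm = [v for row in fm for v in row]
    pixels_gt = [v for row in gt for v in row]
    n = len(pixels_gt)
    mu_fm = sum(pixels_fm) / n  # image-level mean of the foreground map
    mu_gt = sum(pixels_gt) / n  # image-level mean of the ground truth
    total = 0.0
    for f, g in zip(pixels_fm, pixels_gt):
        phi_f = f - mu_fm  # local value relative to the global mean
        phi_g = g - mu_gt
        xi = 2 * phi_g * phi_f / (phi_g * phi_g + phi_f * phi_f + eps)  # in [-1, 1]
        total += (1 + xi) ** 2 / 4  # enhanced alignment, in [0, 1]
    return total / n


gt = [[0.0, 1.0], [1.0, 0.0]]
inverted = [[1.0 - v for v in row] for row in gt]
print(e_measure(gt, gt), e_measure(inverted, gt))
```

A map identical to the ground truth scores close to 1 and its inverse scores close to 0, because each pixel's bias is either aligned with or opposite to the ground truth's bias.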

Learning Binary Residual Representations for Domain-specific Video Streaming

Dec 14, 2017
Yi-Hsuan Tsai, Ming-Yu Liu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz

We study domain-specific video streaming. Specifically, we target a streaming setting where the videos to be streamed from a server to a client are all in the same domain and they have to be compressed to a small size for low-latency transmission. Several popular video streaming services, such as the video game streaming services of GeForce Now and Twitch, fall in this category. While conventional video compression standards such as H.264 are commonly used for this task, we hypothesize that one can leverage the property that the videos are all in the same domain to achieve better video quality. Based on this hypothesis, we propose a novel video compression pipeline. Specifically, we first apply H.264 to compress domain-specific videos. We then train a novel binary autoencoder to encode the leftover domain-specific residual information frame-by-frame into binary representations. These binary representations are then compressed and sent to the client together with the H.264 stream. In our experiments, we show that our pipeline yields consistent gains over standard H.264 compression across several benchmark datasets while using the same channel bandwidth.

* Accepted in AAAI'18. Project website at https://research.nvidia.com/publication/2018-02_Learning-Binary-Residual 

Image Formation Model Guided Deep Image Super-Resolution

Aug 25, 2019
Jinshan Pan, Yang Liu, Deqing Sun, Jimmy Ren, Ming-Ming Cheng, Jian Yang, Jinhui Tang

We present a simple and effective image super-resolution algorithm that imposes an image formation constraint on the deep neural networks via pixel substitution. The proposed algorithm first uses a deep neural network to estimate intermediate high-resolution images, blurs the intermediate images using known blur kernels, and then substitutes values of the pixels at the un-decimated positions with those of the corresponding pixels from the low-resolution images. The output of the pixel substitution process strictly satisfies the image formation model and is further refined by the same deep neural network in a cascaded manner. The proposed framework is trained in an end-to-end fashion and can work with existing feed-forward deep neural networks for super-resolution and converges fast in practice. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods.

* We need to improve this paper 
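The pixel-substitution step can be illustrated in one dimension. A minimal sketch, assuming the formation model $y = (k \otimes x)\downarrow_s$ (blur with a known kernel $k$, then keep every $s$-th sample starting at index 0); the sampling phase, the zero padding, and all names here are our assumptions, and the deep network and cascaded refinement are left out. `x_hat` stands for the network's intermediate high-resolution estimate.

```python
from typing import List


def blur_same(x: List[float], kernel: List[float]) -> List[float]:
    """'Same'-size 1D filtering with zero padding; equals convolution for symmetric kernels."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def decimate(x: List[float], s: int) -> List[float]:
    """Keep every s-th sample, starting at index 0."""
    return x[::s]


def pixel_substitution(
    x_hat: List[float], lr: List[float], kernel: List[float], s: int
) -> List[float]:
    """Blur the estimate, then overwrite the un-decimated positions with low-resolution pixels."""
    z = blur_same(x_hat, kernel)
    if len(decimate(z, s)) != len(lr):
        raise ValueError("low-resolution signal does not match the decimation grid")
    for k, value in enumerate(lr):
        z[k * s] = value
    return z


z = pixel_substitution([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.5, 3.0, 4.5], [0.25, 0.5, 0.25], 2)
print(decimate(z, 2))  # [1.5, 3.0, 4.5]
```

Whatever `x_hat` the network produces, decimating the substituted signal returns the low-resolution input exactly, which is the sense in which this sketch's output satisfies the formation model; the remaining positions keep the blurred estimate for the network to refine.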

EGNet: Edge Guidance Network for Salient Object Detection

Aug 22, 2019
Jia-Xing Zhao, Jiangjiang Liu, Deng-Ping Fan, Yang Cao, Jufeng Yang, Ming-Ming Cheng

Fully convolutional neural networks (FCNs) have shown their advantages in the salient object detection task. However, most existing FCN-based methods still suffer from coarse object boundaries. In this paper, to solve this problem, we focus on the complementarity between salient edge information and salient object information. Accordingly, we present an edge guidance network (EGNet) for salient object detection with three steps to simultaneously model these two kinds of complementary information in a single network. In the first step, we extract the salient object features in a progressive fusion manner. In the second step, we integrate the local edge information and global location information to obtain the salient edge features. Finally, to sufficiently leverage these complementary features, we couple the same salient edge features with salient object features at various resolutions. Benefiting from the rich edge information and location information in salient edge features, the fused features can help locate salient objects, and especially their boundaries, more accurately. Experimental results demonstrate that the proposed method performs favorably against the state-of-the-art methods on six widely used datasets without any pre-processing or post-processing. The source code is available at http://mmcheng.net/egnet/.


Semantic Edge Detection with Diverse Deep Supervision

Apr 09, 2018
Yun Liu, Ming-Ming Cheng, JiaWang Bian, Le Zhang, Peng-Tao Jiang, Yang Cao

Semantic edge detection (SED), which aims at jointly extracting edges as well as their category information, has far-reaching applications in domains such as semantic segmentation, object proposal generation, and object recognition. SED naturally requires achieving two distinct supervision targets: locating fine detailed edges and identifying high-level semantics. We shed light on how such distracted supervision targets prevent state-of-the-art SED methods from effectively using deep supervision to improve results. In this paper, we propose a novel fully convolutional neural network architecture using diverse deep supervision (DDS) within a multi-task framework where lower layers aim at generating category-agnostic edges, while higher layers are responsible for the detection of category-aware semantic edges. To overcome the distracted supervision challenge, a novel information converter unit is introduced, whose effectiveness has been extensively evaluated on several popular benchmark datasets, including SBD, Cityscapes, and PASCAL VOC2012. Source code will be released upon paper acceptance.


MatchBench: An Evaluation of Feature Matchers

Aug 07, 2018
JiaWang Bian, Ruihan Yang, Yun Liu, Le Zhang, Ming-Ming Cheng, Ian Reid, WenHai Wu

Feature matching is one of the most fundamental and active research areas in computer vision. A comprehensive evaluation of feature matchers is necessary, since it would advance both the development of this field and high-level applications such as Structure-from-Motion and Visual SLAM. However, to the best of our knowledge, no previous work targets the evaluation of feature matchers; existing work focuses only on evaluating feature detectors and descriptors. As a result, the field lacks standard datasets and evaluation metrics for comparing different feature matchers fairly. To this end, we present the first uniform feature matching benchmark to facilitate the evaluation of feature matchers. In the proposed benchmark, matchers are evaluated in different aspects, involving matching ability, correspondence sufficiency, and efficiency. Their performance is also investigated in different scenes and for different matching types. Subsequently, we carry out an extensive evaluation of different state-of-the-art matchers on the benchmark and make in-depth analyses based on the reported results. These analyses can guide the design of practical matching systems in real applications and also point to potential future research directions in the field of feature matching.


UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking

Sep 04, 2016
Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang, Siwei Lyu

In recent years, numerous effective multi-object tracking (MOT) methods have been developed, driven by the wide range of applications. Existing performance evaluations of MOT methods usually separate the object tracking step from the object detection step by using the same fixed object detection results for comparisons. In this work, we perform a comprehensive quantitative study of the effects of object detection accuracy on the overall MOT performance, using the new large-scale University at Albany DETection and tRACking (UA-DETRAC) benchmark dataset. The UA-DETRAC benchmark dataset consists of 100 challenging video sequences captured from real-world traffic scenes (over 140,000 frames with rich annotations, including occlusion, weather, vehicle category, truncation, and vehicle bounding boxes) for object detection, object tracking, and MOT systems. We evaluate complete MOT systems constructed from combinations of state-of-the-art object detection and object tracking methods. Our analysis shows the complex effects of object detection accuracy on MOT system performance. Based on these observations, we propose new evaluation tools and metrics for MOT systems that consider both object detection and object tracking for comprehensive analysis.

* 18 pages, 11 figures 

Virtual-to-Real: Learning to Control in Visual Semantic Segmentation

Oct 28, 2018
Zhang-Wei Hong, Chen Yu-Ming, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, Hsuan-Kung Yang, Brian Hsi-Lin Ho, Chih-Chieh Tu, Yueh-Chuan Chang, Tsu-Ching Hsiao, Hsin-Wei Hsiao, Sih-Pin Lai, Chun-Yi Lee

Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus, recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of the models trained in virtual worlds to the real world. This paper proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation for relating these two modules. The perception module translates the perceived RGB image to semantic image segmentation. The control policy module is implemented as a deep reinforcement learning agent, which performs actions based on the translated image segmentation. Our architecture is evaluated in an obstacle avoidance task and a target following task. Experimental results show that our architecture significantly outperforms all of the baseline methods in both virtual and real environments, and learns faster than them. We also present a detailed analysis of a variety of configurations and validate the transferability of our modular architecture.

* 7 pages, accepted by IJCAI-18 

Approximation capabilities of neural networks on unbounded domains

Oct 21, 2019
Yang Qu, Ming-Xi Wang

We prove universal approximation theorems of neural networks in $L^{p}(\mathbb{R} \times [0, 1]^n)$, under the conditions that $p \in [2, \infty)$ and that the activation function belongs to a class that includes, among others, monotone sigmoid, ReLU, ELU, softplus, and leaky ReLU functions. Our results partially generalize classical universal approximation theorems on $[0,1]^n$.


The option pricing model based on time values: an application of the universal approximation theory on unbounded domains

Oct 02, 2019
Yang Qu, Ming-Xi Wang

Hutchinson, Lo and Poggio raised the question of whether learning networks can learn the Black-Scholes formula, and they proposed a network mapping the ratio of underlying price to strike $S_t/K$ and the time to maturity $\tau$ directly into the ratio of option price to strike $C_t/K$. In this paper we propose a novel decision function and study the network mapping $S_t/K$ and $\tau$ into the ratio of time value to strike $V_t/K$. The use of time values in artificial intelligence fits traders' natural intelligence. Empirical experiments are carried out to demonstrate that it significantly improves Hutchinson-Lo-Poggio's original model through faster learning and better generalization performance. In order to take a conceptual viewpoint and to prove that $V_t/K$ but not $C_t/K$ can be approximated by superpositions of logistic functions on its domain of definition, we work on the theory of universal approximation on unbounded domains. We prove some general results which imply that an artificial neural network with a single hidden layer and sigmoid activation represents no function in $L^{p}(\mathbb{R}^2 \times [0, 1]^{n})$ unless it is constant zero, and that an artificial neural network with a single hidden layer and logistic activation is a universal approximator of $L^{2}(\mathbb{R} \times [0, 1]^{n})$. Our work partially generalizes Cybenko's fundamental universal approximation theorem on the unit hypercube $[0, 1]^{n}$.
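The switch from the target $C_t/K$ to $V_t/K$ can be made concrete with synthetic Black-Scholes prices. A minimal sketch, assuming a European call with constant rate $r$ and volatility $\sigma$, and taking time value as the call price minus its intrinsic value $\max(S_t - K, 0)$, so that $V_t/K = C_t/K - \max(S_t/K - 1, 0)$; the function names and parameter values are illustrative, not taken from the paper, and both $S_t/K$ and $\tau$ are assumed positive.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_over_strike(moneyness: float, tau: float, r: float, sigma: float) -> float:
    """Black-Scholes call price divided by strike, C/K, as a function of S/K and tau."""
    d1 = (math.log(moneyness) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return moneyness * norm_cdf(d1) - math.exp(-r * tau) * norm_cdf(d2)


def time_value_over_strike(moneyness: float, tau: float, r: float, sigma: float) -> float:
    """V/K = C/K - max(S/K - 1, 0): call price minus intrinsic value, per unit strike."""
    return bs_call_over_strike(moneyness, tau, r, sigma) - max(moneyness - 1.0, 0.0)


for m in (0.8, 1.0, 1.2, 5.0):
    print(m, bs_call_over_strike(m, 1.0, 0.03, 0.25), time_value_over_strike(m, 1.0, 0.03, 0.25))
```

For deep in-the-money options $C_t/K$ grows roughly linearly in $S_t/K$, whereas $V_t/K$ stays bounded; this is one intuitive contrast between the two targets, not the paper's proof.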


A Novel Demodulation and Estimation Algorithm for Blackout Communication: Extract Principal Components with Deep Learning

May 27, 2019
Haoyan Liu, Yanming Liu, Ming Yang

For reentry or near-space communication, owing to the influence of the time-varying plasma sheath channel environment, the received IQ baseband signals are severely rotated on the constellation. Research has shown that the frequency of electron density varies from 20 kHz to 100 kHz, which is on the same order as the symbol rate of most TT&C communication systems, so a large amount of bandwidth is consumed to track the time-varying channel with traditional estimation. In this paper, motivated by principal curve analysis, we propose a deep learning (DL) algorithm called the symmetric manifold network (SMN) to extract the curves on the constellation and classify the signals based on the curves. The key advantage is that SMN can achieve joint optimization of demodulation and channel estimation. In our simulations, the new algorithm significantly reduces the symbol error rate (SER) compared to existing algorithms and enables accurate estimation of fading with extremely high bandwidth utilization.


S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

Sep 26, 2016
Yi Yang, Ming-Wei Chang

Non-linear models have recently received a lot of attention as people start to discover the power of statistical and embedding features. However, tree-based models are seldom studied in the context of structured learning despite their recent success on various classification and ranking tasks. In this paper, we propose S-MART, a tree-based structured learning framework based on multiple additive regression trees. S-MART is especially suitable for handling tasks with dense features, and can be used to learn many different structures under various loss functions. We apply S-MART to the task of tweet entity linking, a core component of tweet information extraction, which aims to identify and link name mentions to entities in a knowledge base. A novel inference algorithm is proposed to handle the special structure of the task. The experimental results show that S-MART significantly outperforms state-of-the-art tweet entity linking systems.

* Appeared in ACL 2015 proceedings. This is an updated version. More details available in the pdf file 

Provably Fast and Accurate Recovery of Evolutionary Trees through Harmonic Greedy Triplets

Nov 23, 2000
Miklos Csuros, Ming-Yang Kao

We give a greedy learning algorithm for reconstructing an evolutionary tree based on a certain harmonic average on triplets of terminal taxa. After the pairwise distances between terminal taxa are estimated from sequence data, the algorithm runs in O(n^2) time using O(n) work space, where n is the number of terminal taxa. These time and space complexities are optimal in the sense that the size of an input distance matrix is n^2 and the size of an output tree is n. Moreover, in the Jukes-Cantor model of evolution, the algorithm recovers the correct tree topology with high probability using sample sequences of length polynomial in (1) n, (2) the logarithm of the error probability, and (3) the inverses of two small parameters.

* The paper will appear in SIAM Journal on Computing 

Theoretical Investigation of Composite Neural Network

Oct 18, 2019
Ming-Chuan Yang, Meng Chang Chen

A composite neural network is a rooted directed acyclic graph combining a set of pre-trained and non-instantiated neural network models. A pre-trained neural network model is well crafted for a specific task, has instantiated weights, and is generally well trained to approximate a specific function. Despite a general belief that a composite neural network may perform better than a single component, the overall performance characteristics are not clear. In this work, we prove that there exist parameters such that a composite neural network performs better than any of its pre-trained components with a high probability bound.

