Models, code, and papers for "Ye Tian":

THORS: An Efficient Approach for Making Classifiers Cost-sensitive

Nov 07, 2018
Ye Tian, Weiping Zhang

In this paper, we propose an effective THresholding method based on ORder Statistic, called THORS, to convert an arbitrary scoring-type classifier, which can induce a continuous cumulative distribution function of the score, into a cost-sensitive one. The procedure uses order statistics to find an optimal classification threshold and requires almost no knowledge of the classifier itself. Unlike common data-driven methods, we analytically show that THORS has guaranteed performance, theoretical bounds on the costs, and lower time complexity. Based on these guarantees and on empirical results from several real-world data sets, we argue that THORS is the preferred cost-sensitive technique.

* 26 pages, 6 figures 
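
The exact THORS procedure is given in the paper. As a loose illustration of the general idea only (not the authors' algorithm), the sketch below picks a classification threshold from the order statistics (the sorted values) of the scores so as to minimize empirical misclassification cost. The function name and the cost parameters `cost_fp` and `cost_fn` are hypothetical.

```python
from typing import Sequence, Tuple


def min_cost_threshold(scores: Sequence[float], labels: Sequence[int],
                       cost_fp: float, cost_fn: float) -> Tuple[float, float]:
    """Pick the threshold among the sorted scores that minimizes empirical cost.

    Hypothetical helper, not the THORS procedure. An example is predicted
    positive when its score >= threshold; labels are 0 (negative) or 1 (positive).
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    pairs = sorted(zip(scores, labels))  # order statistics of the scores
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Threshold above every score: everything is predicted negative.
    best_t, best_cost = float("inf"), cost_fn * n_pos
    pos_below = 0  # positives ranked strictly below the candidate threshold
    for i, (s, y) in enumerate(pairs):
        if i == 0 or s != pairs[i - 1][0]:  # evaluate each distinct score once
            fn = pos_below                  # positives predicted negative
            fp = n_neg - (i - pos_below)    # negatives at or above the threshold
            cost = cost_fp * fp + cost_fn * fn
            if cost < best_cost:
                best_t, best_cost = s, cost
        pos_below += y
    return best_t, best_cost
```

For example, `min_cost_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], cost_fp=1.0, cost_fn=5.0)` returns `(0.35, 1.0)`: because a missed positive is five times as costly as a false alarm, the threshold moves down until both positives are accepted. A single pass over the sorted scores keeps the search at O(n log n).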

Discovering Spatio-Temporal Action Tubes

Nov 29, 2018
Yuancheng Ye, Xiaodong Yang, Yingli Tian

In this paper, we address the challenging problem of spatial and temporal action detection in videos. We first develop an effective approach to localize frame-level action regions by integrating static and kinematic information through early- and late-fusion detection schemes. To exploit important temporal connections among the detected action regions, we propose a tracking-by-point-matching algorithm that stitches the discrete action regions into a continuous spatio-temporal action tube. A recurrent 3D convolutional neural network (R3DCNN) is used to predict action categories and determine the temporal boundaries of the generated tubes. We then introduce an action footprint map to refine the candidate tubes based on the action-specific spatial characteristics preserved in the convolutional layers of R3DCNN. In extensive experiments, our method achieves superior detection results on three public benchmark datasets: UCFSports, J-HMDB and UCF101.


ClusterNet: 3D Instance Segmentation in RGB-D Images

Sep 19, 2018
Lin Shao, Ye Tian, Jeannette Bohg

We propose a method for instance-level segmentation that uses RGB-D data as input and provides detailed information about the location, geometry and number of individual objects in the scene. This level of understanding is fundamental for autonomous robots, as it enables safe and robust decision-making under the large uncertainty of the real world. In our model, we propose to use the first and second order moments of the object occupancy function to represent an object instance. We train an hourglass Deep Neural Network (DNN) where each pixel in the output votes for the 3D position of the corresponding object center and for the object's size and pose. The final instance segmentation is achieved through clustering in the space of moments. The object-centric training loss is defined on the output of the clustering. Our method outperforms the state-of-the-art instance segmentation method on our synthesized dataset. We show that our method generalizes well to real-world data, achieving visually better segmentation results.
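
To make "first and second order moments" concrete, the sketch below computes the mean (first-order moment) and covariance (second-order central moment) of a set of 3D points assumed to belong to one object. This is only the moment computation under that assumption; the voting network and the clustering step described above are not shown, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def first_second_moments(points: Sequence[Point]) -> Tuple[List[float], List[List[float]]]:
    """Return (mean, covariance) of 3D points; hypothetical helper for illustration."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    # First-order moment: the centroid, i.e. a candidate object center.
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    # Second-order central moment: a 3x3 covariance describing spread and orientation.
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
            for j in range(3)] for i in range(3)]
    return mean, cov
```

For the four corners of a flat 2x2 square, `first_second_moments([(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)])` gives a mean of `[1.0, 1.0, 0.0]` and a diagonal covariance with zero variance along z, reflecting that the object is flat in that direction. Roughly, the mean encodes position while the covariance's eigenvectors and eigenvalues encode orientation and extent.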


Interpretable Intuitive Physics Model

Aug 29, 2018
Tian Ye, Xiaolong Wang, James Davidson, Abhinav Gupta

Humans have a remarkable ability to use physical commonsense and predict the effect of collisions. But do they understand the underlying factors? Can they predict if the underlying factors have changed? Interestingly, in most cases humans can predict the effects of similar collisions under different conditions, such as changes in mass or friction. It is postulated that this is primarily because we learn to model physics with meaningful latent variables. This does not imply that we can estimate the precise values of these variables (e.g., the exact mass or friction). Inspired by this observation, we propose an interpretable intuitive physics model where specific dimensions in the bottleneck layers correspond to different physical properties. To demonstrate that our system models these underlying physical properties, we train our model on collisions of different shapes (cube, cone, cylinder, sphere, etc.) and test on collisions of unseen combinations of shapes. Furthermore, we demonstrate that our model generalizes well even when similar scenes are simulated with different underlying properties.


Spatial Transformer Introspective Neural Network

May 16, 2018
Yunhan Zhao, Ye Tian, Wei Shen, Alan Yuille

Natural images contain many variations such as illumination differences, affine transformations, and shape distortions. Correctly classifying these variations is a long-standing problem. The most commonly adopted solution is to build large-scale datasets that contain objects under different variations. However, this approach is not ideal, since it is computationally expensive and it is hard to cover all variations in a single dataset. To address this difficulty, we propose the spatial transformer introspective neural network (ST-INN), which explicitly generates samples with affine transformation variations unseen in the training set. Experimental results indicate that ST-INN improves classification accuracy on several benchmark datasets, including MNIST, affNIST, SVHN and CIFAR-10. We further extend our method to cross-dataset classification tasks and few-shot learning problems to verify it under extreme conditions, and observe substantial improvements.

* Submitted to BMVC2018 

PlatEMO: A MATLAB Platform for Evolutionary Multi-Objective Optimization

Jan 04, 2017
Ye Tian, Ran Cheng, Xingyi Zhang, Yaochu Jin

Over the last three decades, a large number of evolutionary algorithms have been developed for solving multi-objective optimization problems. However, there has been no up-to-date and comprehensive software platform for researchers to properly benchmark existing algorithms and for practitioners to apply selected algorithms to their real-world problems. The need for such a common tool is even more urgent because the source code of many proposed algorithms has not been made publicly available. To address these issues, in this paper we present PlatEMO, a MATLAB platform for evolutionary multi-objective optimization, which includes more than 50 multi-objective evolutionary algorithms and more than 100 multi-objective test problems, along with several widely used performance indicators. With a user-friendly graphical user interface, PlatEMO enables users to easily compare several evolutionary algorithms at one time and collect statistical results in Excel or LaTeX files. More importantly, PlatEMO is completely open source, so users can develop new algorithms on top of it. This paper introduces the main features of PlatEMO and illustrates how to use it for performing comparative experiments, embedding new algorithms, creating new test problems, and developing performance indicators. Source code of PlatEMO is now available at: http://bimk.ahu.edu.cn/index.php?s=/Index/Software/index.html.

* IEEE Computational Intelligence Magazine, 2017, 12(4): 73-87 
* 20 pages, 12 figures, 4 tables 

Graph Star Net for Generalized Multi-Task Learning

Jun 21, 2019
Lu Haonan, Seth H. Huang, Tian Ye, Guo Xiuyan

In this work, we present graph star net (GraphStar), a novel and unified graph neural net architecture which utilizes message-passing relay and attention mechanism for multiple prediction tasks - node classification, graph classification and link prediction. GraphStar addresses many earlier challenges facing graph neural nets and achieves non-local representation without increasing the model depth or bearing heavy computational costs. We also propose a new method to tackle topic-specific sentiment analysis based on node classification and text classification as graph classification. Our work shows that 'star nodes' can learn effective graph-data representation and improve on current methods for the three tasks. Specifically, for graph classification and link prediction, GraphStar outperforms the current state-of-the-art models by 2-5% on several key benchmarks.
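
The abstract does not define the star node precisely. Assuming it behaves like a virtual node connected to every real node (our reading, not the authors' definition), the sketch below shows why such a node gives non-local reach without extra depth: any two nodes become at most two hops apart. The helper names are hypothetical, and no learned message passing is shown, only graph connectivity.

```python
from collections import deque
from typing import Dict, Set


def add_star_node(adj: Dict[int, Set[int]]) -> int:
    """Add a virtual 'star' node linked to every existing node (undirected); return its id."""
    star = max(adj, default=-1) + 1
    adj[star] = set(adj)  # right-hand side is evaluated before the star key exists
    for node in adj[star]:
        adj[node].add(star)
    return star


def hop_distance(adj: Dict[int, Set[int]], src: int, dst: int) -> int:
    """Breadth-first search; returns -1 if dst is unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return -1


# Example graph: a path 0-1-2-3-4-5.
path = {i: set() for i in range(6)}
for i in range(5):
    path[i].add(i + 1)
    path[i + 1].add(i)
```

On the path graph, `hop_distance(path, 0, 5)` is 5 before calling `add_star_node(path)` and 2 afterwards, so a message-passing network would need only two layers to relay information between the endpoints.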


Self-Attention Transducers for End-to-End Speech Recognition

Sep 28, 2019
Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengqi Wen

Recurrent neural network transducers (RNN-T) have been successfully applied in end-to-end speech recognition. However, the recurrent structure makes them difficult to parallelize. In this paper, we propose a self-attention transducer (SA-T) for speech recognition. RNNs are replaced with self-attention blocks, which can model long-term dependencies within sequences and can be parallelized efficiently. Furthermore, a path-aware regularization is proposed to help SA-T learn alignments and improve performance. Additionally, a chunk-flow mechanism is used to achieve online decoding. All experiments are conducted on the Mandarin Chinese dataset AISHELL-1. The results demonstrate that our proposed approach achieves a 21.3% relative reduction in character error rate compared with the baseline RNN-T. In addition, SA-T with the chunk-flow mechanism can perform online decoding with only slight performance degradation.


Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

Jul 13, 2019
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen

Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous works use linear interpolation or a fusion network to integrate external language models. However, these approaches introduce external components and increase decoding computation. In this paper, we instead propose a knowledge-distillation-based training approach for integrating external language models into a sequence-to-sequence model. A recurrent neural network language model, trained on large-scale external text, generates soft labels to guide the training of the sequence-to-sequence model. Thus, the language model plays the role of the teacher. This approach does not add any external component to the sequence-to-sequence model during testing, and it can be flexibly combined with shallow fusion during decoding. The experiments are conducted on the public Chinese datasets AISHELL-1 and CLMAD. Our approach achieves a character error rate of 9.3%, an 18.42% relative reduction compared with the vanilla sequence-to-sequence model.

* 5 pages, 3 figures, accepted by INTERSPEECH 2019 
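
A common way to train a student on a teacher's soft labels is to mix the usual cross-entropy against the ground-truth token with a cross-entropy against the teacher's output distribution. The sketch below shows that generic objective for a single output step; the interpolation weight `alpha` and the function names are assumptions for illustration, and the paper's exact loss may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], hard_label: int,
                      teacher_probs: Sequence[float], alpha: float = 0.5) -> float:
    """(1 - alpha) * CE(hard label) + alpha * CE(teacher soft labels); alpha is hypothetical."""
    probs = softmax(student_logits)
    eps = 1e-12  # guards against log(0)
    hard_ce = -math.log(probs[hard_label] + eps)
    soft_ce = -sum(t * math.log(p + eps) for t, p in zip(teacher_probs, probs))
    return (1 - alpha) * hard_ce + alpha * soft_ce
```

With `alpha=0` this reduces to ordinary cross-entropy training; larger values let the language-model teacher's distribution over plausible next characters shape the student, which is how spelling knowledge could be transferred without adding a component at test time.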

DeepShift: Towards Multiplication-Less Neural Networks

May 30, 2019
Mostafa Elhoushi, Farhan Shafiq, Ye Tian, Joey Yiwei Li, Zihao Chen

Deep learning models, especially deep convolutional neural networks (DCNNs), have achieved high accuracy in several computer vision applications. However, for deployment in mobile environments, the high computation and power budget proves to be a major bottleneck. Convolutional and fully connected layers, because of their intensive use of multiplications, are the dominant contributors to this computation budget. This paper proposes to tackle this problem by introducing two new operations, convolutional shifts and fully-connected shifts, which replace multiplications altogether with bitwise shifts and bitwise negation. This family of neural network architectures (which use convolutional shifts and fully-connected shifts) is referred to as DeepShift models. With such DeepShift models, which can be implemented with no multiplications, the authors have obtained accuracies of up to 93.6% on the CIFAR10 dataset, and Top-1/Top-5 accuracies of 70.9%/90.13% on the ImageNet dataset. Extensive testing was performed on various well-known CNN architectures after converting all their convolution layers and fully connected layers to their bitwise shift counterparts, and we show that in some architectures, the Top-1 accuracy drops by less than 4% and the Top-5 accuracy drops by less than 1.5%. The experiments were conducted using the PyTorch framework, and the code for training and running is submitted along with the paper and will be made available online.
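
The core arithmetic trick can be sketched as follows, assuming each weight is rounded to the form sign * 2**p (a simplification for illustration; the paper's training and quantization details are not reproduced). Multiplying an integer activation by such a weight then needs only a bit shift and a sign flip. Both function names are hypothetical.

```python
import math
from typing import Tuple


def to_shift_and_sign(w: float) -> Tuple[int, int]:
    """Round a nonzero weight to sign * 2**p and return (sign, p); p may be negative."""
    if w == 0:
        raise ValueError("zero weights need separate handling")
    sign = -1 if w < 0 else 1
    p = round(math.log2(abs(w)))  # nearest power of two in log space
    return sign, p


def shift_multiply(x: int, sign: int, p: int) -> int:
    """Compute x * sign * 2**p for integer x using only shifts and negation.

    For p < 0 the right shift floors, so results are approximate for odd values.
    """
    y = x << p if p >= 0 else x >> -p
    return -y if sign < 0 else y
```

For example, a weight of 3.7 becomes `(1, 2)`, so `shift_multiply(5, 1, 2)` gives 20 (versus the exact 18.5), and a weight of -0.3 becomes `(-1, -2)`, so `shift_multiply(12, -1, -2)` gives -3. The rounding error this introduces is why some accuracy drop is expected after conversion.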


Accelerating Training using Tensor Decomposition

Sep 10, 2019
Mostafa Elhoushi, Ye Henry Tian, Zihao Chen, Farhan Shafiq, Joey Yiwei Li

Tensor decomposition is one of the well-known approaches to reduce the latency and number of parameters of a pre-trained model. In this paper, however, we propose using tensor decomposition to reduce the time needed to train a model from scratch. In our approach, we train the model from scratch (i.e., from randomly initialized weights) with its original architecture for a small number of epochs, then decompose the model, and then continue training the decomposed model until the end. An optional step converts the decomposed architecture back to the original architecture. We present results of this approach on both the CIFAR10 and ImageNet datasets, and show that training can be up to 2x faster with an accuracy drop of at most 1.5%, and with no accuracy drop in other cases. This training acceleration approach is independent of hardware and is expected to yield similar speedups on both CPU and GPU platforms.
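
The abstract does not say which decomposition is used. As a hypothetical illustration of where the savings come from, the arithmetic below compares a dense m x n layer with a rank-r factorization into two layers of sizes m x r and r x n. For fully connected layers, the parameter count also equals the multiply-accumulate count per example, which is what shortens each training step. The function names are assumptions.

```python
def dense_params(m: int, n: int) -> int:
    """Weights (and multiply-accumulates per example) of a dense m x n layer, biases ignored."""
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    """Same count when W (m x n) is replaced by U (m x r) followed by V (r x n)."""
    return r * (m + n)


def max_saving_rank(m: int, n: int) -> int:
    """Largest rank r for which the factorized layer is strictly smaller than the dense one."""
    return (m * n - 1) // (m + n)
```

For a 1024 x 1024 layer, rank 128 needs 262,144 weights instead of 1,048,576 (4x fewer), and any rank up to 511 still saves something. Choosing the rank trades this speedup against the accuracy drop mentioned above.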


Learning a Multitask Curriculum for Neural Machine Translation

Aug 28, 2019
Wei Wang, Ye Tian, Jiquan Ngiam, Yinfei Yang, Isaac Caswell, Zarana Parekh

Existing curriculum learning research in neural machine translation (NMT) mostly focuses on a single final task such as selecting data for a domain or for denoising, and considers in-task example selection. This paper studies the data selection problem in multitask setting. We present a method to learn a multitask curriculum on a single, diverse, potentially noisy training dataset. It computes multiple data selection scores for each training example, each score measuring how useful the example is to a certain task. It uses Bayesian optimization to learn a linear weighting of these per-instance scores, and then sorts the data to form a curriculum. We experiment with three domain translation tasks: two specific domains and the general domain, and demonstrate that the learned multitask curriculum delivers results close to individually optimized models and brings solid gains over no curriculum training, across all test sets.

* 12 pages 
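
The final step described above (combining per-instance scores linearly and sorting) can be sketched as below. In the paper the weights are learned with Bayesian optimization; here they are fixed, hypothetical values, and ordering from highest to lowest combined score is an assumption about the curriculum direction.

```python
from typing import Dict, List, Sequence


def curriculum_order(scores: Dict[str, Sequence[float]],
                     weights: Sequence[float]) -> List[str]:
    """Order example ids by a weighted sum of their per-task scores, highest first."""
    combined = {}
    for example_id, task_scores in scores.items():
        if len(task_scores) != len(weights):
            raise ValueError(f"{example_id}: expected {len(weights)} scores")
        # One linear combination of the per-task data selection scores.
        combined[example_id] = sum(w * s for w, s in zip(weights, task_scores))
    return sorted(combined, key=combined.get, reverse=True)
```

With two task scores per sentence pair and weights `[0.7, 0.3]`, `curriculum_order({"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.5, 0.5]}, [0.7, 0.3])` returns `["a", "c", "b"]`. Changing the weights re-ranks the same data, which is why a search over weights can tune the curriculum toward several tasks at once.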

A New Approach for Resource Scheduling with Deep Reinforcement Learning

Jun 21, 2018
Yufei Ye, Xiaoqin Ren, Jin Wang, Lingxiao Xu, Wenxia Guo, Wenqiang Huang, Wenhong Tian

With the rapid development of deep learning, deep reinforcement learning (DRL) has begun to appear in the field of resource scheduling in recent years. Building on previous research on DRL in the literature, we introduce the online resource scheduling algorithm DeepRM2 and the offline resource scheduling algorithm DeepRM_Off. Compared with the state-of-the-art DRL algorithm DeepRM and heuristic algorithms, our proposed algorithms converge faster and achieve better scheduling efficiency with regard to average slowdown time, job completion time and rewards.


Information Competing Process for Learning Diversified Representations

Jun 04, 2019
Jie Hu, Rongrong Ji, ShengChuan Zhang, Xiaoshuai Sun, Qixiang Ye, Chia-Wen Lin, Qi Tian

Learning representations with diversified information remains an open problem. Towards learning diversified representations, a new approach, termed Information Competing Process (ICP), is proposed in this paper. Aiming to enrich the information carried by feature representations, ICP separates a representation into two parts with different mutual information constraints. The separated parts are forced to accomplish the downstream task independently in a competitive environment, which prevents each part from learning what the other has learned for the downstream task. These competing parts are then combined synergistically to complete the task. By fusing representation parts learned competitively under different conditions, ICP facilitates obtaining diversified representations that contain complementary information. Experiments on image classification and image reconstruction tasks demonstrate the great potential of ICP to learn discriminative and disentangled representations in both supervised and self-supervised learning settings.


Computer-aided Detection of Squamous Carcinoma of the Cervix in Whole Slide Images

May 27, 2019
Ye Tian, Li Yang, Wei Wang, Jing Zhang, Qing Tang, Mili Ji, Yang Yu, Yu Li, Hong Yang, Airong Qian

Goal: Squamous cell carcinoma of the cervix is one of the most prevalent cancers in women worldwide. Traditionally, the indispensable diagnostic method for cervical squamous carcinoma is histopathological assessment performed by a pathologist under a microscope. However, human evaluation of pathology slides depends heavily on the pathologist's experience, so large inter- and intra-observer variability exists. Digital pathology combined with deep learning provides an opportunity to improve the objectivity and efficiency of histopathologic slide analysis. Methods: In this study, we obtained 800 haematoxylin and eosin stained slides from 300 patients with cervical squamous carcinoma. Based on the morphological heterogeneity of the tumor and its adjacent area, we built deep learning models using popular convolutional neural network architectures (Inception-v3, InceptionResNet-v2 and ResNet50). A random forest was then introduced for feature extraction and slide-based classification. Results: The overall performance of our proposed models on slide-based tumor discrimination was outstanding, with AUC scores > 0.94. However, localization of lesions in whole slide images was mediocre (FROC scores > 0.52) due to the extreme complexity of tumor tissues. Conclusion: For the first time, our analysis workflow highlighted a quantitative visual-based slide analysis of cervical squamous carcinoma. Significance: This study demonstrates a pathway to assist pathologists and accelerate the diagnosis of patients by using new computational approaches.

* 8 pages, 5 figures

Towards Generation and Evaluation of Comprehensive Mapping Robot Datasets

May 23, 2019
Hongyu Chen, Xiting Zhao, Jianwen Luo, Zhijie Yang, Zehao Zhao, Haochuan Wan, Xiaoya Ye, Guangyuan Weng, Zhenpeng He, Tian Dong, Sören Schwertfeger

This paper presents a fully hardware-synchronized mapping robot with support for a hardware-synchronized external tracking system, for super-precise timing and localization. We also employ a professional, static 3D scanner for ground truth map collection. Three datasets are generated to evaluate the performance of mapping algorithms within a room and between rooms. Based on these datasets we generate maps and trajectory data, which are then fed into evaluation algorithms. The mapping and evaluation procedures are designed to be easily reproducible for maximum comparability. In the end, we draw a couple of conclusions about the tested SLAM algorithms.


AIBench: An Industry Standard Internet Service AI Benchmark Suite

Aug 13, 2019
Wanling Gao, Fei Tang, Lei Wang, Jianfeng Zhan, Chunxin Lan, Chunjie Luo, Yunyou Huang, Chen Zheng, Jiahui Dai, Zheng Cao, Daoyi Zheng, Haoning Tang, Kunlin Zhan, Biao Wang, Defei Kong, Tong Wu, Minghe Yu, Chongkang Tan, Huan Li, Xinhui Tian, Yatao Li, Junchao Shao, Zhenyu Wang, Xiaoyu Wang, Hainan Ye

Today's Internet services are undergoing fundamental changes and shifting to an intelligent computing era in which AI is widely employed to augment services. In this context, many innovative AI algorithms, systems, and architectures have been proposed, and the importance of benchmarking and evaluating them has grown. However, modern Internet services adopt a microservice-based architecture and consist of various modules. The diversity of these modules and the complexity of execution paths, the massive scale and complex hierarchy of datacenter infrastructure, and the confidentiality of data sets and workloads pose great challenges to benchmarking. In this paper, we present the first industry-standard Internet service AI benchmark suite, AIBench, developed with seventeen industry partners, including several top Internet service providers. AIBench provides a highly extensible, configurable, and flexible benchmark framework that contains loosely coupled modules. We identify sixteen prominent AI problem domains, such as learning to rank, each of which forms an AI component benchmark, from the three most important Internet service domains: search engine, social network, and e-commerce; this is by far the most comprehensive AI benchmarking effort. On the basis of the AIBench framework, abstracting the real-world data sets and workloads from one of the top e-commerce providers, we design and implement the first end-to-end Internet service AI benchmark, which contains the primary modules in the critical paths of an industry-scale application and is scalable to deploy on different cluster scales. The specifications, source code, and performance numbers are publicly available from the BenchCouncil web site http://www.benchcouncil.org/AIBench/index.html.

* 24 pages 

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Feb 21, 2019
Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework.


Over-parameterization as a Catalyst for Better Generalization of Deep ReLU network

Sep 30, 2019
Yuandong Tian

To analyze deep ReLU networks, we adopt a student-teacher setting in which an over-parameterized student network learns from the output of a fixed teacher network of the same depth, with Stochastic Gradient Descent (SGD). Our contributions are two-fold. First, we prove that when the gradient is zero (or bounded above by a small constant) at every data point in training, a situation called the interpolation setting, there exists a many-to-one alignment between student and teacher nodes in the lowest layer under mild conditions. This suggests that generalization to unseen data is achievable, even though the same condition often leads to zero training error. Second, analysis of noisy recovery and training dynamics in 2-layer networks shows that strong teacher nodes (with large fan-out weights) are learned first, while subtle teacher nodes remain unlearned until a late stage of training. As a result, it could take a long time to converge to these small-gradient critical points. Our analysis shows that over-parameterization plays two roles: (1) it is a necessary condition for alignment to happen at the critical points, and (2) in training dynamics, it helps student nodes cover more teacher nodes with fewer iterations. Both improve generalization. Experiments support our findings.

