Models, code, and papers for "Yu Zhang":

Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis

Aug 30, 2018
Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan

Although end-to-end text-to-speech (TTS) models such as Tacotron have shown excellent results, they typically require a sizable set of high-quality <text, audio> pairs for training, which are expensive to collect. In this paper, we propose a semi-supervised training framework to improve the data efficiency of Tacotron. The idea is to allow Tacotron to utilize the textual and acoustic knowledge contained in large, publicly available text and speech corpora. Importantly, these external data are unpaired and potentially noisy. Specifically, we first embed each word in the input text into word vectors and condition the Tacotron encoder on them. We then use an unpaired speech corpus to pre-train the Tacotron decoder in the acoustic domain. Finally, we fine-tune the model using the available paired data. We demonstrate that the proposed framework enables Tacotron to generate intelligible speech using less than half an hour of paired training data.
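
As a rough sketch of the first idea (conditioning the encoder on pre-trained word vectors), the snippet below broadcasts each word's embedding onto the character-level encoder inputs before they enter a Tacotron-style encoder. All module names, dimensions, and the concatenate-then-project scheme are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch: condition a Tacotron-style encoder on pre-trained word
# vectors by concatenating each character embedding with the embedding of the
# word it belongs to. Names and dimensions are illustrative only.
import torch
import torch.nn as nn

class WordConditionedEncoderInput(nn.Module):
    def __init__(self, num_chars, char_dim=256, word_dim=300):
        super().__init__()
        self.char_embed = nn.Embedding(num_chars, char_dim)
        # Projects the concatenated [char; word] vector back to the encoder width.
        self.proj = nn.Linear(char_dim + word_dim, char_dim)

    def forward(self, char_ids, word_vectors, char_to_word):
        # char_ids:      (B, T_char)            character indices
        # word_vectors:  (B, T_word, word_dim)  frozen pre-trained embeddings
        # char_to_word:  (B, T_char)            index of the word each character belongs to
        chars = self.char_embed(char_ids)                         # (B, T_char, char_dim)
        words = torch.gather(
            word_vectors, 1,
            char_to_word.unsqueeze(-1).expand(-1, -1, word_vectors.size(-1)),
        )                                                         # (B, T_char, word_dim)
        return self.proj(torch.cat([chars, words], dim=-1))       # fed to the Tacotron encoder
```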


3D-CariGAN: An End-to-End Solution to 3D Caricature Generation from Face Photos

Mar 15, 2020
Zipeng Ye, Ran Yi, Minjing Yu, Juyong Zhang, Yu-Kun Lai, Yong-jin Liu

Caricature is an artistic style of depicting human faces that attracts considerable research in computer vision. So far, all existing 3D caricature generation methods require some caricature-related information as input, e.g., a caricature sketch or a 2D caricature. However, this kind of input is difficult for non-professional users to provide. In this paper, we propose an end-to-end deep neural network model that generates a high-quality 3D caricature with a simple face photo as input. The most challenging issue in our system is that the source domain of face photos (characterized by 2D normal faces) is significantly different from the target domain of 3D caricatures (characterized by 3D exaggerated face shapes and textures). To address this challenge, we (1) build a large dataset of 6,100 3D caricature meshes and use it to establish a PCA model of the 3D caricature shape space, and (2) detect landmarks in the input face photo and use them to set up correspondences between the 2D caricature and the 3D caricature shape. Our system can automatically generate high-quality 3D caricatures. In many situations, users want to control the output in a simple and intuitive way, so we further introduce an easy-to-use interactive control based on three horizontal lines and one vertical line. Experiments and user studies show that our system is easy to use and can generate high-quality 3D caricatures.
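
A minimal sketch of the PCA shape-space idea described above, using scikit-learn on flattened mesh vertices in dense correspondence; the dataset loading, vertex count, and number of components are placeholders, not the paper's actual pipeline.

```python
# Hypothetical sketch: build a PCA model over flattened 3D caricature meshes
# and reconstruct a shape from a low-dimensional coefficient vector.
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: N meshes with V vertices each, in dense correspondence.
N, V = 6100, 5000
meshes = np.random.randn(N, V * 3)           # stand-in for real caricature meshes

pca = PCA(n_components=100)
coeffs = pca.fit_transform(meshes)           # each mesh -> 100-D shape coefficients

# A new caricature shape can then be generated from predicted coefficients.
new_coeffs = np.zeros((1, 100))
new_shape = pca.inverse_transform(new_coeffs).reshape(V, 3)
```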


Salient Object Detection via High-to-Low Hierarchical Context Aggregation

Dec 28, 2018
Yun Liu, Yu Qiu, Le Zhang, JiaWang Bian, Guang-Yu Nie, Ming-Ming Cheng

Recent progress on salient object detection mainly focuses on how to effectively integrate convolutional side-output features in convolutional neural networks (CNNs). Accordingly, most existing state-of-the-art saliency detectors design complex network structures to fuse the side-output features of the backbone feature extraction networks. However, should the fusion strategies become ever more complex for accurate salient object detection? In this paper, we observe that the contexts of a natural image can be well expressed by a high-to-low self-learning of side-output convolutional features. The contexts of an image usually refer to its global structures, and the top layers of a CNN usually learn to convey global information; on the other hand, it is difficult for the intermediate side-output features to express contextual information. Hence, we design an hourglass network with intermediate supervision to learn contextual features in a high-to-low manner. The learned hierarchical contexts are aggregated to generate a hybrid contextual expression for an input image. Finally, the hybrid contextual features can be used for accurate saliency estimation. We extensively evaluate our method on six challenging saliency datasets, and our simple method achieves state-of-the-art performance under various evaluation metrics. Code will be released upon paper acceptance.
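
The high-to-low aggregation with intermediate supervision might be pictured roughly as follows: each side-output feature map produces its own supervised saliency prediction, and the predictions are then fused. The backbone, channel widths, and fusion layer below are simplified stand-ins rather than the paper's architecture.

```python
# Hypothetical sketch: predict saliency from several side-output feature maps,
# supervise each prediction (intermediate supervision), then fuse them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalContextHead(nn.Module):
    def __init__(self, side_channels=(512, 256, 128)):
        super().__init__()
        self.preds = nn.ModuleList([nn.Conv2d(c, 1, 1) for c in side_channels])
        self.fuse = nn.Conv2d(len(side_channels), 1, 1)

    def forward(self, side_feats, out_size):
        # side_feats: feature maps ordered from high (coarse) to low (fine) levels
        side_maps = [
            F.interpolate(p(f), size=out_size, mode="bilinear", align_corners=False)
            for p, f in zip(self.preds, side_feats)
        ]
        fused = self.fuse(torch.cat(side_maps, dim=1))
        # Each side map and the fused map would receive a saliency loss during training.
        return side_maps, fused
```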


Deep Line Art Video Colorization with a Few References

Mar 30, 2020
Min Shi, Jia-Qi Zhang, Shu-Yu Chen, Lin Gao, Yu-Kun Lai, Fang-Lue Zhang

Coloring line art images based on the colors of reference images is an important stage in animation production, but it is time-consuming and tedious. In this paper, we propose a deep architecture to automatically color line art videos with the same color style as given reference images. Our framework consists of a color transform network and a temporal constraint network. The color transform network takes the target line art images, as well as the line art and color images of one or more reference images, as input and generates the corresponding target color images. To cope with large differences between the target line art image and the reference color images, our architecture utilizes non-local similarity matching to determine region correspondences between the target image and the reference images, which are used to transfer local color information from the references to the target. To ensure global color style consistency, we further incorporate Adaptive Instance Normalization (AdaIN), with transformation parameters obtained from a style embedding vector that describes the global color style of the references, extracted by an embedder. The temporal constraint network takes the reference images and the target image together in chronological order and learns spatiotemporal features through 3D convolution to ensure temporal consistency between the target image and the reference images. Our model can achieve even better coloring results by fine-tuning its parameters with only a small number of samples when dealing with an animation of a new style. To evaluate our method, we build a line art coloring dataset. Experiments show that our method achieves the best performance on line art video coloring compared to state-of-the-art methods and other baselines.
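
As background for the AdaIN step mentioned above, here is a generic Adaptive Instance Normalization sketch in which the scale and shift are predicted from a style embedding vector; the linear head and dimensions are assumptions, not the paper's exact design.

```python
# Hypothetical sketch: AdaIN whose affine parameters come from a style embedding
# vector (e.g., one summarizing the references' global color style).
import torch
import torch.nn as nn

class AdaINFromEmbedding(nn.Module):
    def __init__(self, num_features, style_dim=256):
        super().__init__()
        self.affine = nn.Linear(style_dim, 2 * num_features)  # predicts gamma and beta

    def forward(self, x, style):
        # x: (B, C, H, W) content features, style: (B, style_dim)
        gamma, beta = self.affine(style).chunk(2, dim=1)
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True) + 1e-5
        normalized = (x - mean) / std
        return gamma[:, :, None, None] * normalized + beta[:, :, None, None]
```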


On Reinforcement Learning for Full-length Game of StarCraft

Sep 23, 2018
Zhen-Jia Pang, Ruo-Ze Liu, Zhou-Yu Meng, Yi Zhang, Yang Yu, Tong Lu

StarCraft II poses a grand challenge for reinforcement learning. Its main difficulties include a huge state and action space and a long time horizon. In this paper, we investigate a hierarchical reinforcement learning approach for StarCraft II. The hierarchy involves two levels of abstraction. One is the macro-actions automatically extracted from expert trajectories, which reduce the action space by an order of magnitude while remaining effective. The other is a two-layer hierarchical architecture, which is modular and easy to scale, enabling curriculum transfer from simpler tasks to more complex tasks. The reinforcement training algorithm for this architecture is also investigated. On a 64x64 map and using restricted units, we achieve a winning rate of more than 99% against the level-1 built-in AI. Using the curriculum transfer learning algorithm and a mixture of combat models, we achieve an over 93% winning rate for Protoss against the most difficult non-cheating built-in AI (level-7) of Terran, training within two days on a single machine with only 48 CPU cores and 8 K40 GPUs. The approach also shows strong generalization when tested against previously unseen opponents, including the cheating levels of the built-in AI and all levels of the Zerg and Protoss built-in AI. We hope this study sheds some light on future research in large-scale reinforcement learning.
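
One simple way to extract macro-actions from expert trajectories, purely as an illustration of the idea (the paper's extraction procedure may differ), is to keep the most frequent short action sequences:

```python
# Hypothetical sketch: mine frequent action n-grams from expert trajectories
# and treat the top-k sequences as macro-actions.
from collections import Counter

def mine_macro_actions(trajectories, n=3, top_k=20):
    """trajectories: list of action-id sequences taken from expert replays."""
    counts = Counter()
    for actions in trajectories:
        for i in range(len(actions) - n + 1):
            counts[tuple(actions[i:i + n])] += 1
    return [seq for seq, _ in counts.most_common(top_k)]

# Example: three toy trajectories over a small action vocabulary.
demo = [[1, 2, 3, 1, 2, 3], [1, 2, 3, 4], [4, 1, 2, 3]]
print(mine_macro_actions(demo, n=3, top_k=2))   # [(1, 2, 3), ...]
```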


PRS-Net: Planar Reflective Symmetry Detection Net for 3D Models

Oct 24, 2019
Lin Gao, Ling-Xiao Zhang, Hsien-Yu Meng, Yi-Hui Ren, Yu-Kun Lai, Leif Kobbelt

In geometry processing, symmetry is universal high-level structural information of 3D models that benefits many geometry processing tasks, including shape segmentation, alignment, matching, and completion. It is therefore an important problem to analyze various forms of symmetry of 3D shapes, and planar reflective symmetry is the most fundamental one. Traditional methods based on spatial sampling can be time-consuming and may not be able to identify all the symmetry planes. In this paper, we present a novel learning framework to automatically discover global planar reflective symmetry of a 3D shape. Our framework trains an unsupervised 3D convolutional neural network to extract global model features and then outputs possible global symmetry parameters, where input shapes are represented using voxels. We introduce a dedicated symmetry distance loss along with a regularization loss to avoid generating duplicated symmetry planes. Our network can also identify isotropic shapes by predicting their rotation axes. We further provide a method to remove invalid and duplicated planes and axes. We demonstrate that our method is able to produce reliable and accurate results. Our neural network-based method is hundreds of times faster than the state-of-the-art sampling-based method, and it is robust even with noisy or incomplete input surfaces.

* Corrected typos 
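
A rough numpy illustration of the symmetry distance idea: reflect sampled surface points across a candidate plane and measure how far the reflections fall from the original point set. The paper's exact loss, voxel representation, and regularization are not reproduced here.

```python
# Illustration: symmetry distance of a point set w.r.t. the plane n . x + d = 0.
import numpy as np

def reflect(points, n, d):
    """Reflect points across the plane n . x + d = 0."""
    norm = np.linalg.norm(n)
    n, d = n / norm, d / norm
    signed = points @ n + d                       # signed distance to the plane
    return points - 2.0 * signed[:, None] * n[None, :]

def symmetry_distance(points, n, d):
    mirrored = reflect(points, n, d)
    # Distance from each mirrored point to its nearest original point.
    dists = np.linalg.norm(mirrored[:, None, :] - points[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

pts = np.random.randn(200, 3)
pts = np.vstack([pts, pts * np.array([-1.0, 1.0, 1.0])])        # symmetric about x = 0
print(symmetry_distance(pts, np.array([1.0, 0.0, 0.0]), 0.0))   # approximately 0
```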

SDM-NET: Deep Generative Network for Structured Deformable Mesh

Sep 03, 2019
Lin Gao, Jie Yang, Tong Wu, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai, Hao Zhang

We introduce SDM-NET, a deep generative neural network which produces structured deformable meshes. Specifically, the network is trained to generate a spatial arrangement of closed, deformable mesh parts, which respects the global part structure of a shape collection, e.g., chairs, airplanes, etc. Our key observation is that while the overall structure of a 3D shape can be complex, the shape can usually be decomposed into a set of parts, each homeomorphic to a box, and the finer-scale geometry of the part can be recovered by deforming the box. The architecture of SDM-NET is that of a two-level variational autoencoder (VAE). At the part level, a PartVAE learns a deformable model of part geometries. At the structural level, we train a Structured Parts VAE (SP-VAE), which jointly learns the part structure of a shape collection and the part geometries, ensuring coherence between global shape structure and surface details. Through extensive experiments and comparisons with state-of-the-art deep generative models of shapes, we demonstrate the superiority of SDM-NET in generating meshes with high visual quality, flexible topology, and meaningful structures, which benefit shape interpolation and other subsequent modeling tasks.

* Conditionally Accepted to Siggraph Asia 2019 
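
The two-level design might be pictured with a minimal VAE building block like the one below, used once per part (the PartVAE) and once over the concatenated part codes and structure (the SP-VAE); everything here is a simplified stand-in for the paper's architecture, not its actual layers or losses.

```python
# Hypothetical sketch: a minimal VAE block; an SDM-NET-style model stacks one
# such block per part (PartVAE) and another over part codes + structure (SP-VAE).
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, in_dim, latent_dim=64, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        recon = self.dec(z)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl
```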

Towards a General-Purpose Linguistic Annotation Backend

Dec 13, 2018
Graham Neubig, Patrick Littell, Chian-Yu Chen, Jean Lee, Zirui Li, Yu-Hsiang Lin, Yuyan Zhang

Language documentation is inherently a time-intensive process; transcription, glossing, and corpus management consume a significant portion of documentary linguists' work. Advances in natural language processing can help to accelerate this work, using the linguists' past decisions as training material, but questions remain about how to prioritize human involvement. In this extended abstract, we describe the beginnings of a new project that will attempt to ease this language documentation process through the use of natural language processing (NLP) technology. It is based on (1) methods to adapt NLP tools to new languages, based on recent advances in massively multilingual neural networks, and (2) backend APIs and interfaces that allow linguists to upload their data. We then describe our current progress on two fronts: automatic phoneme transcription, and glossing. Finally, we briefly describe our future directions.

* 4 pages, 8 figures, accepted by ComputEL-3 

Choosing Transfer Languages for Cross-Lingual Learning

Jun 07, 2019
Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig

Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving the performance of natural language processing (NLP) on low-resource languages. However, given a particular task language, it is not clear which language to transfer from, and the standard strategy is to select languages based on ad hoc criteria, usually the intuition of the experimenter. Since a large number of features contribute to the success of cross-lingual transfer (including phylogenetic similarity, typological properties, lexical overlap, and size of available data), even the most enlightened experimenter rarely considers all of these factors for the particular task at hand. In this paper, we treat the task of automatically selecting optimal transfer languages as a ranking problem and build models that consider the aforementioned features to perform this prediction. In experiments on representative NLP tasks, we demonstrate that our model predicts good transfer languages much better than ad hoc baselines considering single features in isolation, and we glean insights into which features are most informative for each NLP task, which may inform future ad hoc selection even without use of our method. Code, data, and pre-trained models are available at https://github.com/neulab/langrank

* Proceedings of ACL 2019 
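
The ranking formulation could be prototyped along these lines with a LambdaRank-style gradient-boosted ranker; the random features, relevance labels, and model choice here are illustrative assumptions, not a description of the released langrank models.

```python
# Hypothetical sketch: rank candidate transfer languages for each task language
# from features such as typological similarity, lexical overlap, and data size.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
n_task_langs, n_candidates, n_feats = 20, 30, 6

X = rng.normal(size=(n_task_langs * n_candidates, n_feats))   # one row per (task, candidate) pair
y = rng.integers(0, 5, size=n_task_langs * n_candidates)      # graded relevance of each candidate
group = [n_candidates] * n_task_langs                         # rows grouped by task language

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=100)
ranker.fit(X, y, group=group)

# For a new task language, score its candidate transfer languages and sort.
scores = ranker.predict(rng.normal(size=(n_candidates, n_feats)))
print(np.argsort(-scores)[:5])   # indices of the top-5 predicted transfer languages
```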

A Simple General Approach to Balance Task Difficulty in Multi-Task Learning

Feb 12, 2020
Sicong Liang, Yu Zhang

In multi-task learning, the difficulty levels of different tasks vary. There are many works that handle this situation, and we classify them into five categories: the direct sum approach, the weighted sum approach, the maximum approach, the curriculum learning approach, and the multi-objective optimization approach. These approaches have their own limitations, for example, relying on manually designed rules to update task weights, non-smooth objective functions, and failing to incorporate functions other than training losses. In this paper, to alleviate these limitations, we propose a Balanced Multi-Task Learning (BMTL) framework. Different from existing studies that rely on task weighting, the BMTL framework transforms the training loss of each task to balance difficulty levels among tasks, based on the intuitive idea that tasks with larger training losses should receive more attention during the optimization procedure. We analyze the transformation function and derive necessary conditions. The proposed BMTL framework is very simple and can be combined with most multi-task learning models. Empirical studies show the state-of-the-art performance of the proposed BMTL framework.
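
As a sketch of the loss-transformation idea (tasks with larger training losses get larger effective gradient weight), one monotone, convex transformation in the spirit of the description is an exponential with a temperature; the exact transformation and conditions derived in the paper are not reproduced here.

```python
# Hypothetical sketch: instead of summing raw task losses, sum transformed losses
# so that tasks with larger losses contribute larger gradients.
import torch

def balanced_multitask_loss(task_losses, temperature=1.0):
    """task_losses: list of scalar tensors, one training loss per task."""
    losses = torch.stack(task_losses)
    # d/dl exp(l / T) = exp(l / T) / T, so larger-loss tasks receive more attention.
    return torch.exp(losses / temperature).sum()

# Toy usage with two dummy "tasks".
l1 = torch.tensor(0.2, requires_grad=True)
l2 = torch.tensor(2.0, requires_grad=True)
balanced_multitask_loss([l1, l2]).backward()
print(l1.grad, l2.grad)   # the larger loss receives the larger gradient weight
```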


Convergence Behaviour of Some Gradient-Based Methods on Bilinear Games

Aug 15, 2019
Guojun Zhang, Yaoliang Yu

Min-max optimization has attracted much attention in the machine learning community due to the popularization of deep generative models and adversarial training. The optimization is quite different from traditional minimization analysis; for example, gradient descent does not converge in one of the simplest settings -- bilinear games. In this paper, we study several gradient-based algorithms for bilinear min-max games: gradient descent, extra-gradient, optimistic gradient descent, and the momentum method, for both simultaneous and alternating updates. We provide necessary and sufficient conditions for their convergence using the Schur theorem. Furthermore, by extending these algorithms to more general parameter settings, we are able to optimize over larger parameter spaces to find the optimal convergence rates. Our results imply that alternating updates converge more easily in min-max games than simultaneous updates.
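
The bilinear-game behavior mentioned above (simultaneous gradient descent-ascent diverges while alternating updates behave differently) is easy to reproduce numerically. The sketch below is a generic illustration on f(x, y) = x * y, not the paper's analysis or parameter settings.

```python
# Illustration: gradient descent-ascent on the bilinear game f(x, y) = x * y.
# Simultaneous updates spiral outward (diverge); alternating updates stay bounded.
def simultaneous(x, y, lr=0.1, steps=200):
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x
    return abs(x), abs(y)

def alternating(x, y, lr=0.1, steps=200):
    for _ in range(steps):
        x = x - lr * y        # x updates first...
        y = y + lr * x        # ...then y sees the new x
    return abs(x), abs(y)

print("simultaneous:", simultaneous(1.0, 1.0))   # iterates grow without bound
print("alternating: ", alternating(1.0, 1.0))    # iterates remain bounded
```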


Policy Optimization with Stochastic Mirror Descent

Jun 25, 2019
Long Yang, Yu Zhang

Stochastic mirror descent (SMD) offers simplicity of implementation, low memory requirements, and low computational complexity. However, the non-convexity of the objective function, together with its non-stationary sampling process, is the main bottleneck in applying SMD to reinforcement learning. To address this problem, we propose mirror policy optimization (MPO), which estimates the policy gradient via a dynamic batch size of gradient information. Compared with REINFORCE or VPG, the proposed MPO improves the convergence rate from $\mathcal{O}(1/\sqrt{N})$ to $\mathcal{O}(\ln N/N)$. We also propose the VRMPO algorithm, a variance-reduced implementation of MPO. We prove the convergence of VRMPO and analyze its computational complexity. We evaluate VRMPO on the MuJoCo continuous control tasks; the results show that VRMPO outperforms or matches several state-of-the-art algorithms: DDPG, TRPO, PPO, and TD3.


Expected Sarsa($\lambda$) with Control Variate for Variance Reduction

Jun 25, 2019
Long Yang, Yu Zhang

Off-policy learning is powerful for reinforcement learning. However, the high variance of off-policy evaluation is a critical challenge, which can cause off-policy learning with function approximation to fall into uncontrolled instability. In this paper, to reduce this variance, we introduce the control variate technique into Expected Sarsa($\lambda$) and propose a tabular ES($\lambda$)-CV algorithm. We prove that, given a proper estimator of the value function, the proposed ES($\lambda$)-CV enjoys a lower variance than Expected Sarsa($\lambda$). Furthermore, to extend ES($\lambda$)-CV to a convergent algorithm with linear function approximation, we propose the GES($\lambda$) algorithm under a convex-concave saddle-point formulation. We prove that the convergence rate of GES($\lambda$) achieves $\mathcal{O}(1/T)$, which matches or outperforms several state-of-the-art gradient-based algorithms while using a more relaxed step size. Numerical experiments show that the proposed algorithm is stable and converges faster, with lower variance, than several state-of-the-art gradient-based TD learning algorithms: GQ($\lambda$), GTB($\lambda$), and ABQ($\zeta$).
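
For reference, the one-step tabular Expected Sarsa update that the proposed ES($\lambda$)-CV builds on looks like the following; the eligibility traces and control-variate terms of the proposed algorithm are not shown, and the data structures are only one simple way to write it.

```python
# Background sketch: one-step tabular Expected Sarsa update (no traces, no
# control variate). Q maps (state, action) to a value, and policy(state)
# returns a dict of action probabilities.
def expected_sarsa_update(Q, policy, s, a, r, s_next, alpha=0.1, gamma=0.99):
    expected_next = sum(p * Q.get((s_next, b), 0.0)
                        for b, p in policy(s_next).items())
    td_error = r + gamma * expected_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return td_error
```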


From Abstractions to "Natural Languages" for Planning Agents

May 01, 2019
Yu Zhang, Li Wang

Despite our unique ability to use natural languages, we know little about their origins, such as how they were created and how they evolved. The answer lies deep in the evolution of our cognitive and social abilities over a very long period of time, which is beyond our scrutiny. Existing studies on the origin of languages often focus on the emergence of specific language features (such as recursion) without supporting a comprehensive view. Investigation of restricted language representations, such as temporal logic, unfortunately does not reveal much about the impetus underlying language formation and evolution, since much of their construction is based on natural languages themselves. In this paper, we investigate the origin of "natural languages" in a restricted setting involving only planning agents. Similar to the common view that considers languages a tool for grounding symbols to semantic meanings, we take the view that a language for planning agents is a tool for grounding symbols to physical configurations. From this perspective, a language is used by the agents to coordinate their behaviors during planning. With a few assumptions, we show that such a language is closely connected to a type of domain abstraction, from which a language can be constructed. We study how such abstractions can be identified and discuss how to use them during planning. We apply our method to several domains, discuss the results, and examine relaxations of the assumptions made.


Painting on Placement: Forecasting Routing Congestion using Conditional Generative Adversarial Nets

Apr 15, 2019
Cunxi Yu, Zhiru Zhang

The physical design process commonly consumes hours to days for large designs, and routing is known to be the most critical step. The demand for accurate routing quality prediction rises to a new level as hardware innovation accelerates with advanced technology nodes. This work presents an approach that forecasts the density of all routing channels over the entire floorplan, with features collected up to the placement stage, using conditional GANs. Specifically, forecasting routing congestion is cast as an image translation (colorization) problem. The proposed approach is applied to a) placement exploration for minimum congestion, b) constrained placement exploration, and c) real-time congestion forecasting during incremental placement, using eight designs targeting a fixed FPGA architecture.

* 6 pages, 9 figures, to appear at DAC'19 
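
A compact sketch of the image-translation framing: a generator maps placement-stage feature maps to a congestion heat map, and a conditional discriminator judges (input, heat map) pairs. The layer sizes and channel counts below are generic pix2pix-style assumptions, not the paper's exact networks or training losses.

```python
# Hypothetical sketch: congestion forecasting as conditional image translation.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, in_ch=4):                     # e.g., placement feature maps
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),          # predicted congestion map
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self, in_ch=4):
        super().__init__()
        self.net = nn.Sequential(                    # conditioned on the input features
            nn.Conv2d(in_ch + 1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 4, stride=2, padding=1),  # patch-level real/fake logits
        )

    def forward(self, features, congestion):
        return self.net(torch.cat([features, congestion], dim=1))

G, D = Generator(), Discriminator()
features = torch.randn(2, 4, 64, 64)                 # toy placement-stage feature maps
fake = G(features)
logits = D(features, fake)                           # (2, 1, 16, 16) patch decisions
```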

Progressive Explanation Generation for Human-robot Teaming

Feb 02, 2019
Yu Zhang, Mehrdad Zakershahrak

Generating explanations for its behavior is an essential capability for a robotic teammate. Explanations help human partners better understand the situation and maintain trust in their teammates. Prior work on robots generating explanations focuses on providing the reasoning behind their decision making. These approaches, however, fail to heed the cognitive effort required to understand an explanation. In other words, while they provide the right explanations from the explainer's perspective, the explainee's part of the equation is ignored. In this work, we address an important aspect along this direction that contributes to a better understanding of a given explanation, which we refer to as the progressiveness of explanations. A progressive explanation improves understanding by limiting the cognitive effort required at each step of making the explanation. As a result, such explanations are expected to be smoother and hence easier to understand. We present a general formulation of progressive explanation and provide algorithms based on several alternative quantifications of cognitive effort as an explanation is being made, which we evaluate in a standard planning competition domain.

