Models, code, and papers for "Yu Zhang":

##### Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis

Aug 30, 2018
Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan

Although end-to-end text-to-speech (TTS) models such as Tacotron have shown excellent results, they typically require a sizable set of high-quality <text, audio> pairs for training, which are expensive to collect. In this paper, we propose a semi-supervised training framework to improve the data efficiency of Tacotron. The idea is to allow Tacotron to utilize textual and acoustic knowledge contained in large, publicly-available text and speech corpora. Importantly, these external data are unpaired and potentially noisy. Specifically, first we embed each word in the input text into word vectors and condition the Tacotron encoder on them. We then use an unpaired speech corpus to pre-train the Tacotron decoder in the acoustic domain. Finally, we fine-tune the model using available paired data. We demonstrate that the proposed framework enables Tacotron to generate intelligible speech using less than half an hour of paired training data.

##### Salient Object Detection via High-to-Low Hierarchical Context Aggregation

Dec 28, 2018
Yun Liu, Yu Qiu, Le Zhang, JiaWang Bian, Guang-Yu Nie, Ming-Ming Cheng

Recent progress on salient object detection mainly aims at exploring how to effectively integrate convolutional side-output features in convolutional neural networks (CNNs). Based on this, most of the existing state-of-the-art saliency detectors design complex network structures to fuse the side-output features of the backbone feature extraction networks. However, should the fusion strategies be more and more complex for accurate salient object detection? In this paper, we observe that the contexts of a natural image can be well expressed by a high-to-low self-learning of side-output convolutional features. As we know, the contexts of an image usually refer to the global structures, and the top layers of a CNN usually learn to convey global information. On the other hand, it is difficult for the intermediate side-output features to express contextual information. Here, we design an hourglass network with intermediate supervision to learn contextual features in a high-to-low manner. The learned hierarchical contexts are aggregated to generate the hybrid contextual expression for an input image. Finally, the hybrid contextual features can be used for accurate saliency estimation. We extensively evaluate our method on six challenging saliency datasets, and our simple method achieves state-of-the-art performance under various evaluation metrics. Code will be released upon paper acceptance.

##### On Reinforcement Learning for Full-length Game of StarCraft

Sep 23, 2018
Zhen-Jia Pang, Ruo-Ze Liu, Zhou-Yu Meng, Yi Zhang, Yang Yu, Tong Lu

StarCraft II poses a grand challenge for reinforcement learning. Its main difficulties include a huge state and action space and a long time horizon. In this paper, we investigate a hierarchical reinforcement learning approach for StarCraft II. The hierarchy involves two levels of abstraction. One is the macro-actions automatically extracted from expert trajectories, which reduce the action space by an order of magnitude yet remain effective. The other is a two-layer hierarchical architecture which is modular and easy to scale, enabling a curriculum transferring from simpler tasks to more complex tasks. The reinforcement training algorithm for this architecture is also investigated. On a 64x64 map and using restrictive units, we achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we can achieve a winning rate of over 93% for Protoss against the most difficult non-cheating built-in AI (level-7) of Terran, training within two days using a single machine with only 48 CPU cores and 8 K40 GPUs. It also shows strong generalization performance when tested against never-seen opponents, including cheating-level built-in AI and all levels of Zerg and Protoss built-in AI. We hope this study could shed some light on future research on large-scale reinforcement learning.

##### PRS-Net: Planar Reflective Symmetry Detection Net for 3D Models

Oct 24, 2019
Lin Gao, Ling-Xiao Zhang, Hsien-Yu Meng, Yi-Hui Ren, Yu-Kun Lai, Leif Kobbelt

In geometry processing, symmetry is a universal type of high-level structural information in 3D models and benefits many geometry processing tasks, including shape segmentation, alignment, matching, and completion. Thus it is an important problem to analyze various forms of the symmetry of 3D shapes. The planar reflective symmetry is the most fundamental one. Traditional methods based on spatial sampling can be time-consuming and may not be able to identify all the symmetry planes. In this paper, we present a novel learning framework to automatically discover global planar reflective symmetry of a 3D shape. Our framework trains an unsupervised 3D convolutional neural network to extract global model features and then outputs possible global symmetry parameters, where input shapes are represented using voxels. We introduce a dedicated symmetry distance loss along with a regularization loss to avoid generating duplicated symmetry planes. Our network can also identify isotropic shapes by predicting their rotation axes. We further provide a method to remove invalid and duplicated planes and axes. We demonstrate that our method is able to produce reliable and accurate results. Our neural network-based method is hundreds of times faster than the state-of-the-art method, which is based on sampling. Our method is also robust even with noisy or incomplete input surfaces.
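
To make the idea of a symmetry distance concrete, the following is a minimal brute-force sketch, not the network or the loss from the paper. It assumes the shape is given as a small point list rather than voxels, and the names `reflect` and `symmetry_distance` are hypothetical. A candidate plane is written as `normal . p + offset = 0`; each point is mirrored across it and compared with its nearest original point.

```python
import math

def reflect(point, normal, offset):
    """Reflect a 3D point across the plane normal . p + offset = 0."""
    nx, ny, nz = normal
    norm_sq = nx * nx + ny * ny + nz * nz
    x, y, z = point
    # Twice the signed plane distance (scaled by |n|^2) mirrors the point.
    t = 2.0 * (nx * x + ny * y + nz * z + offset) / norm_sq
    return (x - t * nx, y - t * ny, z - t * nz)

def symmetry_distance(points, normal, offset):
    """Mean distance from each reflected point to its nearest original point."""
    total = 0.0
    for p in points:
        q = reflect(p, normal, offset)
        total += min(math.dist(q, r) for r in points)
    return total / len(points)

# Eight corners of a cube centred at the origin (hypothetical test shape).
cube = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]

good = symmetry_distance(cube, (1.0, 0.0, 0.0), 0.0)    # plane x = 0
bad = symmetry_distance(cube, (1.0, 0.0, 0.0), -0.5)    # plane x = 0.5
print(good, bad)
```

A true symmetry plane gives a distance of zero, while a shifted plane gives a strictly positive value, which is the property a learned loss of this kind would exploit.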

* Corrected typos

##### SDM-NET: Deep Generative Network for Structured Deformable Mesh

Sep 03, 2019
Lin Gao, Jie Yang, Tong Wu, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai, Hao Zhang

We introduce SDM-NET, a deep generative neural network which produces structured deformable meshes. Specifically, the network is trained to generate a spatial arrangement of closed, deformable mesh parts, which respect the global part structure of a shape collection, e.g., chairs, airplanes, etc. Our key observation is that while the overall structure of a 3D shape can be complex, the shape can usually be decomposed into a set of parts, each homeomorphic to a box, and the finer-scale geometry of the part can be recovered by deforming the box. The architecture of SDM-NET is that of a two-level variational autoencoder (VAE). At the part level, a PartVAE learns a deformable model of part geometries. At the structural level, we train a Structured Parts VAE (SP-VAE), which jointly learns the part structure of a shape collection and the part geometries, ensuring coherence between global shape structure and surface details. Through extensive experiments and comparisons with the state-of-the-art deep generative models of shapes, we demonstrate the superiority of SDM-NET in generating meshes with visual quality, flexible topology, and meaningful structures, which benefit shape interpolation and other subsequent modeling tasks.

* Conditionally accepted to SIGGRAPH Asia 2019

##### Towards a General-Purpose Linguistic Annotation Backend

Language documentation is inherently a time-intensive process; transcription, glossing, and corpus management consume a significant portion of documentary linguists' work. Advances in natural language processing can help to accelerate this work, using the linguists' past decisions as training material, but questions remain about how to prioritize human involvement. In this extended abstract, we describe the beginnings of a new project that will attempt to ease this language documentation process through the use of natural language processing (NLP) technology. It is based on (1) methods to adapt NLP tools to new languages, based on recent advances in massively multilingual neural networks, and (2) backend APIs and interfaces that allow linguists to upload their data. We then describe our current progress on two fronts: automatic phoneme transcription, and glossing. Finally, we briefly describe our future directions.

* 4 pages, 8 figures, accepted by ComputEL-3

##### Choosing Transfer Languages for Cross-Lingual Learning

Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving the performance of natural language processing (NLP) on low-resource languages. However, given a particular task language, it is not clear which language to transfer from, and the standard strategy is to select languages based on ad hoc criteria, usually the intuition of the experimenter. Since a large number of features contribute to the success of cross-lingual transfer (including phylogenetic similarity, typological properties, lexical overlap, or size of available data), even the most enlightened experimenter rarely considers all these factors for the particular task at hand. In this paper, we consider this task of automatically selecting optimal transfer languages as a ranking problem, and build models that consider the aforementioned features to perform this prediction. In experiments on representative NLP tasks, we demonstrate that our model predicts good transfer languages much better than ad hoc baselines considering single features in isolation, and glean insights on which features are most informative for each NLP task, which may inform future ad hoc selection even without use of our method. Code, data, and pre-trained models are available at https://github.com/neulab/langrank

* Proceedings of ACL 2019

##### Convergence Behaviour of Some Gradient-Based Methods on Bilinear Games

Aug 15, 2019
Guojun Zhang, Yaoliang Yu

Min-max optimization has attracted much attention in the machine learning community due to the popularization of deep generative models and adversarial training. The optimization is quite different from traditional minimization analysis. For example, gradient descent does not converge in one of the simplest settings -- bilinear games. In this paper, we try to understand several gradient-based algorithms for bilinear min-max games: gradient descent, extra-gradient, optimistic gradient descent and the momentum method, for both simultaneous and alternating updates. We provide necessary and sufficient conditions for their convergence, using the Schur theorem. Furthermore, by extending these algorithms to more general parameter settings, we are able to optimize over larger parameter spaces to find the optimal convergence rates. Our results imply that alternating updates converge more easily in min-max games than simultaneous updates.
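
The non-convergence of gradient descent on a bilinear game can be seen on the scalar game f(x, y) = x * y, where x is minimized and y is maximized. The sketch below is an illustration under assumed settings (step size 0.1, starting point (1, 1), 200 steps) and is not taken from the paper; the function names are hypothetical.

```python
import math

def simultaneous_gd(x, y, lr):
    """Both players step using the gradients at the current point."""
    return x - lr * y, y + lr * x

def alternating_gd(x, y, lr):
    """The max player responds to the min player's already-updated x."""
    x_new = x - lr * y
    return x_new, y + lr * x_new

def extra_gradient(x, y, lr):
    """Take a look-ahead step, then update with gradients at that point."""
    x_half, y_half = x - lr * y, y + lr * x
    return x - lr * y_half, y + lr * x_half

def run(step, steps=200, lr=0.1):
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)  # distance to the equilibrium (0, 0)

for name, step in [("simultaneous", simultaneous_gd),
                   ("alternating", alternating_gd),
                   ("extra-gradient", extra_gradient)]:
    print(f"{name:>15}: distance after 200 steps = {run(step):.4f}")
```

With these settings, simultaneous gradient descent spirals away from the equilibrium, extra-gradient contracts toward it, and plain alternating gradient descent stays bounded without converging.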

##### Policy Optimization with Stochastic Mirror Descent

Jun 25, 2019
Long Yang, Yu Zhang

Stochastic mirror descent (SMD) retains the advantages of simplicity of implementation, low memory requirements, and low computational complexity. However, the non-convexity of the objective function, together with its non-stationary sampling process, is the main bottleneck in applying SMD to reinforcement learning. To address this problem, we propose mirror policy optimization (MPO), which estimates the policy gradient via a dynamic batch size of gradient information. Compared with REINFORCE or VPG, the proposed MPO improves the convergence rate from $\mathcal{O}(1/\sqrt{N})$ to $\mathcal{O}(\ln N/N)$. We also propose the VRMPO algorithm, a variance-reduced implementation of MPO. We prove the convergence of VRMPO and show its computational complexity. We evaluate the performance of VRMPO on the MuJoCo continuous control tasks; the results show that VRMPO outperforms or matches several state-of-the-art algorithms: DDPG, TRPO, PPO, and TD3.
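
As a reminder of how simple a stochastic mirror descent step is, here is a generic sketch using the negative-entropy mirror map on the probability simplex, which yields the exponentiated-gradient update. It is not MPO or VRMPO; the linear objective, the noise level, and all names are assumptions made for illustration.

```python
import math
import random

def mirror_descent_step(weights, grad, lr):
    """One entropic mirror descent step on the probability simplex."""
    # Gradient step in the dual (log) space, then map back and normalise.
    scaled = [w * math.exp(-lr * g) for w, g in zip(weights, grad)]
    total = sum(scaled)
    return [s / total for s in scaled]

def noisy_gradient(costs, rng, noise=0.5):
    """Stochastic gradient of the linear objective sum_i w_i * costs_i."""
    return [c + rng.gauss(0.0, noise) for c in costs]

rng = random.Random(0)
costs = [1.0, 0.2, 0.7]      # hypothetical expected cost of each action
weights = [1.0 / 3] * 3      # start from the uniform distribution
for _ in range(500):
    weights = mirror_descent_step(weights, noisy_gradient(costs, rng), lr=0.1)

best = max(range(len(weights)), key=lambda i: weights[i])
print(best, [round(w, 4) for w in weights])
```

The iterate stays a valid probability distribution at every step without an explicit projection, and its mass concentrates on the lowest-cost action despite the gradient noise.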

##### Expected Sarsa($\lambda$) with Control Variate for Variance Reduction

Jun 25, 2019
Long Yang, Yu Zhang

Off-policy learning is powerful for reinforcement learning. However, the high variance of off-policy evaluation is a critical challenge, which causes off-policy learning with function approximation to fall into uncontrolled instability. In this paper, to reduce the variance, we introduce the control variate technique to Expected Sarsa($\lambda$) and propose a tabular ES($\lambda$)-CV algorithm. We prove that, given a proper estimator of the value function, the proposed ES($\lambda$)-CV enjoys a lower variance than Expected Sarsa($\lambda$). Furthermore, to extend ES($\lambda$)-CV to a convergent algorithm with linear function approximation, we propose the GES($\lambda$) algorithm under the convex-concave saddle-point formulation. We prove that the convergence rate of GES($\lambda$) achieves $\mathcal{O}(1/T)$, which matches or outperforms several state-of-the-art gradient-based algorithms, while using a more relaxed step size. Numerical experiments show that the proposed algorithm is stable and converges faster with lower variance than several state-of-the-art gradient-based TD learning algorithms: GQ($\lambda$), GTB($\lambda$) and ABQ($\zeta$).
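
For readers unfamiliar with the base algorithm, the sketch below shows the standard one-step tabular Expected Sarsa update, without eligibility traces and without the control variate, so it does not reproduce ES($\lambda$)-CV or GES($\lambda$). The two-state problem, the step size, and the discount factor are hypothetical values chosen for illustration.

```python
def expected_sarsa_update(q, state, action, reward, next_state, policy,
                          alpha=0.1, gamma=0.9):
    """Apply one tabular Expected Sarsa update in place; return the TD error."""
    # Expectation of Q(next_state, .) under the target policy.
    expected_next = sum(p * q[next_state][a]
                        for a, p in enumerate(policy[next_state]))
    td_error = reward + gamma * expected_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error

# Hypothetical two-state, two-action problem.
q = [[0.0, 0.0], [1.0, 3.0]]
policy = [[0.5, 0.5], [0.25, 0.75]]   # target policy pi(a | s)
delta = expected_sarsa_update(q, state=0, action=1, reward=1.0,
                              next_state=1, policy=policy)
print(delta, q[0][1])
```

Because the target averages over the next action instead of sampling it, the update removes one source of randomness compared with Sarsa; the abstract's control variate targets the variance that remains in the off-policy setting.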

##### From Abstractions to "Natural Languages" for Planning Agents

May 01, 2019
Yu Zhang, Li Wang

Despite our unique ability to use natural languages, we know little about their origins, such as how they were created and how they evolved. The answer lies deep in the evolution of our cognitive and social abilities over a very long period of time, which is beyond our scrutiny. Existing studies on the origin of languages often focus on the emergence of specific language features (such as recursion) without supporting a comprehensive view. Investigation of restricted language representations, such as temporal logic, unfortunately does not reveal much about the impetus underlying language formation and evolution, since much of their construction is based on natural languages themselves. In this paper, we investigate the origin of "natural languages" in a restricted setting involving only planning agents. Similar to a common view that considers languages as a tool for grounding symbols to semantic meanings, we take the view that a language for planning agents is a tool for grounding symbols to physical configurations. From this perspective, a language is used by the agents to coordinate their behaviors during planning. With a few assumptions, we show that language is closely connected to a type of domain abstraction, based on which a language can be constructed. We study how such abstractions can be identified and discuss how to use them during planning. We apply our method to several domains and discuss the results and possible relaxations of the assumptions made.

##### Painting on Placement: Forecasting Routing Congestion using Conditional Generative Adversarial Nets

Apr 15, 2019
Cunxi Yu, Zhiru Zhang

The physical design process commonly consumes hours to days for large designs, and routing is known as the most critical step. Demands for accurate routing quality prediction have risen to a new level in order to accelerate hardware innovation with advanced technology nodes. This work presents an approach that forecasts the density of all routing channels over the entire floorplan, with features collected up to placement, using conditional GANs. Specifically, forecasting the routing congestion is formulated as an image translation (colorization) problem. The proposed approach is applied to a) placement exploration for minimum congestion, b) constrained placement exploration and c) forecasting congestion in real time during incremental placement, using eight designs targeting a fixed FPGA architecture.

* 6 pages, 9 figures, to appear at DAC'19

##### Progressive Explanation Generation for Human-robot Teaming

Feb 02, 2019
Yu Zhang, Mehrdad Zakershahrak

Generating explanations of its own behavior is an essential capability for a robotic teammate. Explanations help human partners better understand the situation and maintain trust in their teammates. Prior work on robots generating explanations focuses on providing the reasoning behind their decision making. These approaches, however, fail to heed the cognitive requirement of understanding an explanation. In other words, while they provide the right explanations from the explainer's perspective, the explainee part of the equation is ignored. In this work, we address an important aspect along this direction that contributes to a better understanding of a given explanation, which we refer to as the progressiveness of explanations. A progressive explanation improves understanding by limiting the cognitive effort required at each step of making the explanation. As a result, such explanations are expected to be smoother and hence easier to understand. A general formulation of progressive explanation is presented. Algorithms are provided based on several alternative quantifications of cognitive effort as an explanation is being made, which are evaluated in a standard planning competition domain.

##### Interactive Plan Explicability in Human-Robot Teaming

Jan 17, 2019
Mehrdad Zakershahrak, Yu Zhang

Human-robot teaming is one of the most important applications of artificial intelligence in the fast-growing field of robotics. For effective teaming, a robot must not only maintain a behavioral model of its human teammates to project the team status, but also be aware of its human teammates' expectations of it. Being aware of the human teammates' expectations leads to robot behaviors that better align with those expectations, thus facilitating more efficient and potentially safer teams. Our work addresses the problem of human-robot cooperation with the consideration of such teammate models in sequential domains by leveraging the concept of plan explicability. In plan explicability, however, the human is considered solely as an observer. In this paper, we extend plan explicability to consider interactive settings where human and robot behaviors can influence each other. We term this new measure Interactive Plan Explicability. We compare the joint plan generated with the consideration of this measure using the fast forward planner (FF) with the plan created by FF without such consideration, as well as the plan created with actual human subjects. Results indicate that the explicability score of plans generated by our algorithm is comparable to the human plan, and better than the plan created by FF without considering the measure, implying that the plans created by our algorithms align better with expected joint plans of the human during execution. This can lead to more efficient collaboration in practice.

##### A Survey on Multi-Task Learning

Jul 27, 2018
Yu Zhang, Qiang Yang

Multi-Task Learning (MTL) is a learning paradigm in machine learning that aims to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey of MTL. First, we classify different MTL algorithms into several categories, including the feature learning approach, low-rank approach, task clustering approach, task relation learning approach, and decomposition approach, and then discuss the characteristics of each approach. In order to improve the performance of learning tasks further, MTL can be combined with other learning paradigms including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning and graphical models. When the number of tasks is large or the data dimensionality is high, batch MTL models have difficulty handling the situation, so online, parallel and distributed MTL models, as well as dimensionality reduction and feature hashing, are reviewed to reveal their computational and storage advantages. Many real-world applications use MTL to boost their performance and we review representative works. Finally, we present theoretical analyses and discuss several future directions for MTL.

##### Webpage Saliency Prediction with Two-stage Generative Adversarial Networks

May 29, 2018
Yu Li, Ya Zhang

Web page saliency prediction is a challenging problem in image transformation and computer vision. In this paper, we propose a new model that incorporates web page outline information to predict people's regions of interest in a web page. For each web page image, our model generates a saliency map that indicates the regions of interest for people. A two-stage generative adversarial network is proposed, and image outline information is introduced for better transfer. Experimental results on the FIWI dataset show that our model achieves better performance in saliency prediction.

##### Learning of Agent Capability Models with Applications in Multi-agent Planning

Nov 04, 2014
Yu Zhang, Subbarao Kambhampati

One important challenge for a set of agents to achieve more efficient collaboration is for these agents to maintain proper models of each other. An important aspect of these models of other agents is that they are often partial and incomplete. Thus far, there are two common representations of agent models: MDP based and action based, which are both based on action modeling. In many applications, agent models may not have been given, and hence must be learnt. While it may seem convenient to use either MDP based or action based models for learning, in this paper, we introduce a new representation based on capability models, which has several unique advantages. First, we show that learning capability models can be performed efficiently online via Bayesian learning, and the learning process is robust to high degrees of incompleteness in plan execution traces (e.g., with only start and end states). While high degrees of incompleteness in plan execution traces present learning challenges for MDP based and action based models, capability models can still learn to *abstract* useful information out of these traces. As a result, capability models are useful in applications in which such incompleteness is common, e.g., a robot learning a human model from observations and interactions. Furthermore, when used in multi-agent planning (with each agent modeled separately), capability models provide flexible abstraction of actions. The limitation, however, is that the synthesized plan is incomplete and abstract.