Models, code, and papers for "Weilin Zhang":

The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

Nov 20, 2018
Ke Chen, Weilin Zhang, Shlomo Dubnov, Gus Xia

With recent breakthroughs in artificial neural networks, deep generative models have become one of the leading techniques for computational creativity. Despite very promising progress on image and short sequence generation, symbolic music generation remains a challenging problem since the structure of compositions is usually complicated. In this study, we attempt to solve the melody generation problem constrained by a given chord progression. This music meta-creation problem can also be incorporated into a plan recognition system with user inputs and predictive structural outputs. In particular, we explore the effect of explicit architectural encoding of musical structure by comparing two sequential generative models: LSTM (a type of RNN) and WaveNet (a dilated temporal CNN). As far as we know, this is the first study to apply WaveNet to symbolic music generation, as well as the first systematic comparison between temporal CNNs and RNNs for music generation. We conduct a survey to evaluate our generations and implement the Variable Markov Oracle for music pattern discovery. Experimental results show that encoding structure more explicitly using a stack of dilated convolution layers improves performance significantly, and that a global encoding of the underlying chord progression in the generation procedure yields further gains.

* 8 pages, 13 figures 
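
To illustrate why a stack of dilated convolutions can encode long-range structure explicitly, the sketch below computes how many input time steps one output step can see. The kernel size of 2 and the doubling dilation schedule are assumptions for illustration; the abstract does not state the configuration used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Number of input time steps seen by one output step of a stack of
    stride-1 dilated 1-D convolutions, one layer per entry in `dilations`."""
    field = 1
    for d in dilations:
        # Each layer widens the view by (kernel_size - 1) * dilation steps.
        field += (kernel_size - 1) * d
    return field


# Assumed schedule: the dilation doubles at every layer (1, 2, 4, ..., 128).
doubling = [2 ** i for i in range(8)]
dilated = receptive_field(2, doubling)

# Same depth without dilation, for comparison.
plain = receptive_field(2, [1] * 8)

print(dilated, plain)
```

Under these assumptions, eight dilated layers cover far more context than eight undilated ones, which is one plausible reason the dilated model captures phrase-level structure more readily.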

Cross-Batch Memory for Embedding Learning

Dec 14, 2019
Xun Wang, Haozhi Zhang, Weilin Huang, Matthew R. Scott

Mining informative negative instances is of central importance to deep metric learning (DML). However, the hard-mining ability of existing DML methods is intrinsically limited by mini-batch training, where only a mini-batch of instances is accessible at each iteration. In this paper, we identify a "slow drift" phenomenon by observing that the embedding features drift exceptionally slowly even as the model parameters are updated throughout the training process. This suggests that the features of instances computed at preceding iterations closely approximate the features the current model would extract. We propose a cross-batch memory (XBM) mechanism that memorizes the embeddings of past iterations, allowing the model to collect sufficient hard negative pairs across multiple mini-batches - even over the whole dataset. Our XBM can be directly integrated into a general pair-based DML framework. We demonstrate that, without bells and whistles, XBM-augmented DML can boost performance considerably on image retrieval. In particular, with XBM, a simple contrastive loss achieves large R@1 improvements of 12%-22.5% on three large-scale datasets, easily surpassing the most sophisticated state-of-the-art methods by a large margin. Our XBM is conceptually simple, easy to implement - using several lines of code - and memory efficient, requiring a negligible 0.2 GB of extra GPU memory.

* under review 
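
The memory mechanism described above can be sketched as a fixed-size queue of past embeddings that is searched for hard negatives. This is a minimal pure-Python illustration, not the authors' implementation: the class name `CrossBatchMemory`, the cosine-similarity test, and the `margin` and `capacity` values are all assumptions, and a real system would store GPU tensors and feed the mined pairs into a pair-based loss.

```python
from collections import deque
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


class CrossBatchMemory:
    """Hypothetical FIFO store of (embedding, label) pairs from past iterations."""

    def __init__(self, capacity):
        # With maxlen set, the oldest entries are dropped automatically.
        self.queue = deque(maxlen=capacity)

    def enqueue(self, embeddings, labels):
        self.queue.extend(zip(embeddings, labels))

    def hard_negatives(self, anchor, label, margin):
        """Stored embeddings of other classes whose similarity exceeds `margin`."""
        return [(e, l) for e, l in self.queue
                if l != label and cosine(anchor, e) > margin]


memory = CrossBatchMemory(capacity=4)
memory.enqueue([[1.0, 0.0], [0.0, 1.0]], ["cat", "dog"])  # iteration 1
memory.enqueue([[0.9, 0.1], [0.8, 0.6]], ["dog", "cat"])  # iteration 2

# Mine negatives for a current-batch anchor against everything in memory.
negatives = memory.hard_negatives([1.0, 0.0], "cat", margin=0.5)
print(negatives)
```

Reusing stale embeddings in this way is only reasonable because of the slow-drift observation: if features changed quickly between iterations, the stored vectors would no longer resemble what the current model produces.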

CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images

Oct 18, 2018
Sheng Guo, Weilin Huang, Haozhi Zhang, Chenfan Zhuang, Dengke Dong, Matthew R. Scott, Dinglong Huang

We present a simple yet efficient approach capable of training deep neural networks on large-scale weakly supervised web images, which are crawled raw from the Internet using text queries, without any human annotation. We develop a principled learning strategy by leveraging curriculum learning, with the goal of handling a massive amount of noisy labels and data imbalance effectively. We design a new learning curriculum by measuring the complexity of data using its distribution density in a feature space, and rank the complexity in an unsupervised manner. This allows for an efficient implementation of curriculum learning on large-scale web images, resulting in a high-performance CNN model in which the negative impact of noisy labels is reduced substantially. Importantly, we show by experiments that images with highly noisy labels can, surprisingly, improve the generalization capability of the model by serving as a form of regularization. Our approach obtains state-of-the-art performance on four benchmarks: WebVision, ImageNet, Clothing-1M and Food-101. With an ensemble of multiple models, we achieved a top-5 error rate of 5.2% on the WebVision challenge for 1000-category classification. This result was the top performance by a wide margin, outperforming second place by nearly 50% in relative error rate. Code and models are available at: https://github.com/MalongTech/CurriculumNet .

* Accepted to ECCV 2018. 16 pages, 5 figures, 5 tables 
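
The idea of ranking samples by their distribution density in feature space can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: density is approximated here by counting neighbours within a hypothetical `cutoff` distance, and the function names and toy features are invented for the example.

```python
import math


def local_density(features, cutoff):
    """For each sample, count the other samples closer than `cutoff`."""
    return [
        sum(1 for j, g in enumerate(features)
            if j != i and math.dist(f, g) < cutoff)
        for i, f in enumerate(features)
    ]


def curriculum_order(features, cutoff):
    """Sample indices sorted from densest (assumed cleanest) to sparsest."""
    density = local_density(features, cutoff)
    return sorted(range(len(features)), key=lambda i: -density[i])


# Toy 2-D features for one category: a tight cluster plus one outlier.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
order = curriculum_order(features, cutoff=0.5)
print(order)
```

Samples in dense regions are assumed to share a consistent appearance and so are more likely to carry correct labels, which is why they would come first in a curriculum; the sparse outlier is scheduled last.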

The iMaterialist Fashion Attribute Dataset

Jun 14, 2019
Sheng Guo, Weilin Huang, Xiao Zhang, Prasanna Srikhanta, Yin Cui, Yuan Li, Matthew R. Scott, Hartwig Adam, Serge Belongie

Large-scale image databases such as ImageNet have significantly advanced image classification and other visual recognition tasks. However, most of these datasets are constructed only for single-label, coarse object-level classification. For real-world applications, multiple labels and fine-grained categories are often needed, yet very few such datasets exist publicly, especially large-scale, high-quality ones. In this work, we contribute to the community a new dataset called iMaterialist Fashion Attribute (iFashion-Attribute) to address this problem in the fashion domain. The dataset is constructed from over one million fashion images with a label space that includes 8 groups of 228 fine-grained attributes in total. Each image is annotated by experts with multiple, high-quality fashion attributes. The result is the first known million-scale multi-label and fine-grained image dataset. We conduct extensive experiments and provide baseline results with modern deep Convolutional Neural Networks (CNNs). Additionally, we demonstrate that models pre-trained on iFashion-Attribute achieve superior transfer learning performance on fashion-related tasks compared with pre-training on ImageNet or other fashion datasets. Data is available at: https://github.com/visipedia/imat_fashion_comp


Efforts estimation of doctors annotating medical image

Jan 06, 2019
Yang Deng, Yao Sun, Yongpei Zhu, Yue Xu, Qianxi Yang, Shuo Zhang, Mingwang Zhu, Jirang Sun, Weiling Zhao, Xiaobo Zhou, Kehong Yuan

Accurate annotation of medical images is a crucial step for the clinical application of image AI. However, annotating medical images incurs a great deal of effort and expense because of their high complexity and the need for experienced doctors. To reduce annotation cost, several active learning methods have been proposed. But such methods only cut the number of annotation candidates and do not study how much effort the doctor actually spends, which is not enough, since annotating even a small amount of medical data takes the doctor a lot of time. In this paper, we propose a new criterion to evaluate the effort of doctors annotating medical images. First, by combining active learning and a U-shape network, we employ a suggestive annotation strategy to choose the most effective annotation candidates. Then we exploit a fine annotation platform to reduce the annotation effort on each candidate and, for the first time, use a new criterion to quantitatively calculate the effort taken by doctors. In our work, we take MR brain tissue segmentation as an example to evaluate the proposed method. Extensive experiments on the well-known IBSR18 dataset and the MRBrainS18 Challenge dataset show that, using the proposed strategy, state-of-the-art segmentation performance can be achieved with only 60% of the annotation candidates, and annotation effort can be reduced by at least 44%, 44%, and 47% on CSF, GM, and WM, respectively.

