Models, code, and papers for "Yi Liu":

Computation-Performance Optimization of Convolutional Neural Networks with Redundant Kernel Removal

Apr 10, 2018
Chih-Ting Liu, Yi-Heng Wu, Yu-Sheng Lin, Shao-Yi Chien

Deep Convolutional Neural Networks (CNNs) are widely employed in modern computer vision algorithms, where the input image is convolved iteratively by many kernels to extract the knowledge behind it. However, with the depth of convolutional layers getting deeper and deeper in recent years, the enormous computational complexity makes it difficult to be deployed on embedded systems with limited hardware resources. In this paper, we propose two computation-performance optimization methods to reduce the redundant convolution kernels of a CNN with performance and architecture constraints, and apply it to a network for super resolution (SR). Using PSNR drop compared to the original network as the performance criterion, our method can get the optimal PSNR under a certain computation budget constraint. On the other hand, our method is also capable of minimizing the computation required under a given PSNR drop.

* This paper was accepted by 2018 The International Symposium on Circuits and Systems (ISCAS) 

  Click for Model/Code and Paper
Learning to Compose with Professional Photographs on the Web

Jul 18, 2017
Yi-Ling Chen, Jan Klopp, Min Sun, Shao-Yi Chien, Kwan-Liu Ma

Photo composition is an important factor affecting the aesthetics in photography. However, it is a highly challenging task to model the aesthetic properties of good compositions due to the lack of globally applicable rules to the wide variety of photographic styles. Inspired by the thinking process of photo taking, we formulate the photo composition problem as a view finding process which successively examines pairs of views and determines their aesthetic preferences. We further exploit the rich professional photographs on the web to mine unlimited high-quality ranking samples and demonstrate that an aesthetics-aware deep ranking network can be trained without explicitly modeling any photographic rules. The resulting model is simple and effective in terms of its architectural design and data sampling method. It is also generic since it naturally learns any photographic rules implicitly encoded in professional photographs. The experiments show that the proposed view finding network achieves state-of-the-art performance with sliding window search strategy on two image cropping datasets.

* Scripts and pre-trained models available at 

  Click for Model/Code and Paper
Revisiting the problem of audio-based hit song prediction using convolutional neural networks

Apr 05, 2017
Li-Chia Yang, Szu-Yu Chou, Jen-Yu Liu, Yi-Hsuan Yang, Yi-An Chen

Being able to predict whether a song can be a hit has impor- tant applications in the music industry. Although it is true that the popularity of a song can be greatly affected by exter- nal factors such as social and commercial influences, to which degree audio features computed from musical signals (whom we regard as internal factors) can predict song popularity is an interesting research question on its own. Motivated by the recent success of deep learning techniques, we attempt to ex- tend previous work on hit song prediction by jointly learning the audio features and prediction models using deep learning. Specifically, we experiment with a convolutional neural net- work model that takes the primitive mel-spectrogram as the input for feature learning, a more advanced JYnet model that uses an external song dataset for supervised pre-training and auto-tagging, and the combination of these two models. We also consider the inception model to characterize audio infor- mation in different scales. Our experiments suggest that deep structures are indeed more accurate than shallow structures in predicting the popularity of either Chinese or Western Pop songs in Taiwan. We also use the tags predicted by JYnet to gain insights into the result of different models.

* To appear in the proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 

  Click for Model/Code and Paper
Increasing Compactness Of Deep Learning Based Speech Enhancement Models With Parameter Pruning And Quantization Techniques

May 31, 2019
Jyun-Yi Wu, Cheng Yu, Szu-Wei Fu, Chih-Ting Liu, Shao-Yi Chien, Yu Tsao

Most recent studies on deep learning based speech enhancement (SE) focused on improving denoising performance. However, successful SE applications require striking a desirable balance between denoising performance and computational cost in real scenarios. In this study, we propose a novel parameter pruning (PP) technique, which removes redundant channels in a neural network. In addition, a parameter quantization (PQ) technique was applied to reduce the size of a neural network by representing weights with fewer cluster centroids. Because the techniques are derived based on different concepts, the PP and PQ can be integrated to provide even more compact SE models. The experimental results show that the PP and PQ techniques produce a compacted SE model with a size of only 10.03% compared to that of the original model, resulting in minor performance losses of 1.43% (from 0.70 to 0.69) for STOI and 3.24% (from 1.85 to 1.79) for PESQ. The promising results suggest that the PP and PQ techniques can be used in a SE system in devices with limited storage and computation resources.

* 4pages, 6 figures 

  Click for Model/Code and Paper
When Causal Intervention Meets Image Masking and Adversarial Perturbation for Deep Neural Networks

Feb 13, 2019
Chao-Han Huck Yang, Yi-Chieh Liu, Pin-Yu Chen, Xiaoli Ma, Yi-Chang James Tsai

Discovering and exploiting the causality in deep neural networks (DNNs) are crucial challenges for understanding and reasoning causal effects (CE) on an explainable visual model. "Intervention" has been widely used for recognizing a causal relation ontologically. In this paper, we propose a causal inference framework for visual reasoning via do-calculus. To study the intervention effects on pixel-level feature(s) for causal reasoning, we introduce pixel-wise masking and adversarial perturbation. In our framework, CE is calculated using features in a latent space and perturbed prediction from a DNN-based model. We further provide a first look into the characteristics of discovered CE of adversarially perturbed images generated by gradient-based methods. Experimental results show that CE is a competitive and robust index for understanding DNNs when compared with conventional methods such as class-activation mappings (CAMs) on the ChestX-ray 14 dataset for human-interpretable feature(s) (e.g., symptom) reasoning. Moreover, CE holds promises for detecting adversarial examples as it possesses distinct characteristics in the presence of adversarial perturbations.

* Submitted to IEEE International Conference on Image Processing (ICIP) 2019, Pytorch code will be released in Jun, 2019 

  Click for Model/Code and Paper
Interpretable Self-Attention Temporal Reasoning for Driving Behavior Understanding

Nov 06, 2019
Yi-Chieh Liu, Yung-An Hsieh, Min-Hung Chen, Chao-Han Huck Yang, Jesper Tegner, Yi-Chang James Tsai

Performing driving behaviors based on causal reasoning is essential to ensure driving safety. In this work, we investigated how state-of-the-art 3D Convolutional Neural Networks (CNNs) perform on classifying driving behaviors based on causal reasoning. We proposed a perturbation-based visual explanation method to inspect the models' performance visually. By examining the video attention saliency, we found that existing models could not precisely capture the causes (e.g., traffic light) of the specific action (e.g., stopping). Therefore, the Temporal Reasoning Block (TRB) was proposed and introduced to the models. With the TRB models, we achieved the accuracy of $\mathbf{86.3\%}$, which outperform the state-of-the-art 3D CNNs from previous works. The attention saliency also demonstrated that TRB helped models focus on the causes more precisely. With both numerical and visual evaluations, we concluded that our proposed TRB models were able to provide accurate driving behavior prediction by learning the causal reasoning of the behaviors.

* Submitted to IEEE ICASSP 2020; Pytorch code will be released soon 

  Click for Model/Code and Paper
Synthesizing New Retinal Symptom Images by Multiple Generative Models

Feb 11, 2019
Yi-Chieh Liu, Hao-Hsiang Yang, Chao-Han Huck Yang, Jia-Hong Huang, Meng Tian, Hiromasa Morikawa, Yi-Chang James Tsai, Jesper Tegner

Age-Related Macular Degeneration (AMD) is an asymptomatic retinal disease which may result in loss of vision. There is limited access to high-quality relevant retinal images and poor understanding of the features defining sub-classes of this disease. Motivated by recent advances in machine learning we specifically explore the potential of generative modeling, using Generative Adversarial Networks (GANs) and style transferring, to facilitate clinical diagnosis and disease understanding by feature extraction. We design an analytic pipeline which first generates synthetic retinal images from clinical images; a subsequent verification step is applied. In the synthesizing step we merge GANs (DCGANs and WGANs architectures) and style transferring for the image generation, whereas the verified step controls the accuracy of the generated images. We find that the generated images contain sufficient pathological details to facilitate ophthalmologists' task of disease classification and in discovery of disease relevant features. In particular, our system predicts the drusen and geographic atrophy sub-classes of AMD. Furthermore, the performance using CFP images for GANs outperforms the classification based on using only the original clinical dataset. Our results are evaluated using existing classifier of retinal diseases and class activated maps, supporting the predictive power of the synthetic images and their utility for feature extraction. Our code examples are available online.

* AI for Retinal Image Analysis Workshop ACCV 2018 

  Click for Model/Code and Paper
Universal low-rank matrix recovery from Pauli measurements

Nov 03, 2011
Yi-Kai Liu

We study the problem of reconstructing an unknown matrix M of rank r and dimension d using O(rd poly log d) Pauli measurements. This has applications in quantum state tomography, and is a non-commutative analogue of a well-known problem in compressed sensing: recovering a sparse vector from a few of its Fourier coefficients. We show that almost all sets of O(rd log^6 d) Pauli measurements satisfy the rank-r restricted isometry property (RIP). This implies that M can be recovered from a fixed ("universal") set of Pauli measurements, using nuclear-norm minimization (e.g., the matrix Lasso), with nearly-optimal bounds on the error. A similar result holds for any class of measurements that use an orthonormal operator basis whose elements have small operator norm. Our proof uses Dudley's inequality for Gaussian processes, together with bounds on covering numbers obtained via entropy duality.

* Advances in Neural Information Processing Systems (NIPS) 24, pp.1638-1646 (2011) 
* v2: corrected typos, added proof details, 9+8 pages, to appear in NIPS 2011 

  Click for Model/Code and Paper
Team NCTU: Toward AI-Driving for Autonomous Surface Vehicles -- From Duckietown to RobotX

Oct 31, 2019
Yi-Wei Huang, Tzu-Kuan Chuang, Ni-Ching Lin, Yu-Chieh Hsiao, Pin-Wei Chen, Ching-Tang Hung, Shih-Hsing Liu, Hsiao-Sheng Chen, Ya-Hsiu Hsieh, Ching-Tang Hung, Yen-Hsiang Huang, Yu-Xuan Chen, Kuan-Lin Chen, Ya-Jou Lan, Chao-Chun Hsu, Chun-Yi Lin, Jhih-Ying Li, Jui-Te Huang, Yu-Jen Menn, Sin-Kiat Lim, Kim-Boon Lua, Chia-Hung Dylan Tsai, Chi-Fang Chen, Hsueh-Cheng Wang

Robotic software and hardware systems of autonomous surface vehicles have been developed in transportation, military, and ocean researches for decades. Previous efforts in RobotX Challenges 2014 and 2016 facilitates the developments for important tasks such as obstacle avoidance and docking. Team NCTU is motivated by the AI Driving Olympics (AI-DO) developed by the Duckietown community, and adopts the principles to RobotX challenge. With the containerization (Docker) and uniformed AI agent (with observations and actions), we could better 1) integrate solutions developed in different middlewares (ROS and MOOS), 2) develop essential functionalities of from simulation (Gazebo) to real robots (either miniaturized or full-sized WAM-V), and 3) compare different approaches either from classic model-based or learning-based. Finally, we setup an outdoor on-surface platform with localization services for evaluation. Some of the preliminary results will be presented for the Team NCTU participations of the RobotX competition in Hawaii in 2018.

  Click for Model/Code and Paper
Optimal Function Approximation with Relu Neural Networks

Sep 10, 2019
Bo Liu, Yi Liang

We consider in this paper the optimal approximations of convex univariate functions with feed-forward Relu neural networks. We are interested in the following question: what is the minimal approximation error given the number of approximating linear pieces? We establish the necessary and sufficient conditions and uniqueness of optimal approximations, and give lower and upper bounds of the optimal approximation errors. Relu neural network architectures are then presented to generate these optimal approximations. Finally, we propose an algorithm to find the optimal approximations, as well as prove its convergence and validate it with experimental results.

  Click for Model/Code and Paper
Robust video object tracking via Bayesian model averaging based feature fusion

Sep 06, 2016
Yi Dai, Bin Liu

In this article, we are concerned with tracking an object of interest in video stream. We propose an algorithm that is robust against occlusion, the presence of confusing colors, abrupt changes in the object feature space and changes in object size. We develop the algorithm within a Bayesian modeling framework. The state space model is used for capturing the temporal correlation in the sequence of frame images by modeling the underlying dynamics of the tracking system. The Bayesian model averaging (BMA) strategy is proposed for fusing multi-clue information in the observations. Any number of object features are allowed to be involved in the proposed framework. Every feature represents one source of information to be fused and is associated with an observation model. The state inference is performed by employing the particle filter methods. In comparison with related approaches, the BMA based tracker is shown to have robustness, expressivity, and comprehensibility.

* Opt. Eng. 55(8), 083102 (2016) 

  Click for Model/Code and Paper
Robust video object tracking using particle filter with likelihood based feature fusion and adaptive template updating

Sep 28, 2015
Yi Dai, Bin Liu

A robust algorithm solution is proposed for tracking an object in complex video scenes. In this solution, the bootstrap particle filter (PF) is initialized by an object detector, which models the time-evolving background of the video signal by an adaptive Gaussian mixture. The motion of the object is expressed by a Markov model, which defines the state transition prior. The color and texture features are used to represent the object, and a marginal likelihood based feature fusion approach is proposed. A corresponding object template model updating procedure is developed to account for possible scale changes of the object in the tracking process. Experimental results show that our algorithm beats several existing alternatives in tackling challenging scenarios in video tracking tasks.

* 5 pages, 5 pages, conference 

  Click for Model/Code and Paper
Feature Selection Based on Confidence Machine

Jan 13, 2015
Chang Liu, Yi Xu

In machine learning and pattern recognition, feature selection has been a hot topic in the literature. Unsupervised feature selection is challenging due to the loss of labels which would supply the related information.How to define an appropriate metric is the key for feature selection. We propose a filter method for unsupervised feature selection which is based on the Confidence Machine. Confidence Machine offers an estimation of confidence on a feature'reliability. In this paper, we provide the math model of Confidence Machine in the context of feature selection, which maximizes the relevance and minimizes the redundancy of the selected feature. We compare our method against classic feature selection methods Laplacian Score, Pearson Correlation and Principal Component Analysis on benchmark data sets. The experimental results demonstrate the efficiency and effectiveness of our method.

* 10 pages 

  Click for Model/Code and Paper
Classical Chinese Sentence Segmentation for Tomb Biographies of Tang Dynasty

Aug 28, 2019
Chao-Lin Liu, Yi Chang

Tomb biographies of the Tang dynasty provide invaluable information about Chinese history. The original biographies are classical Chinese texts which contain neither word boundaries nor sentence boundaries. Relying on three published books of tomb biographies of the Tang dynasty, we investigated the effectiveness of employing machine-learning methods for algorithmically identifying the pauses and terminals of sentences in the biographies. We consider the segmentation task as a classification problem. Chinese characters that are and are not followed by a punctuation mark are classified into two categories. We applied a machine-learning-based mechanism, the conditional random fields (CRF), to classify the characters (and words) in the texts, and we studied the contributions of selected types of lexical information to the resulting quality of the segmentation recommendations. This proposal presented at the DH 2018 conference discussed some of the basic experiments and their evaluations. By considering the contextual information and employing the heuristics provided by experts of Chinese literature, we achieved F1 measures that were better than 80%. More complex experiments that employ deep neural networks helped us further improve the results in recent work.

* 6 pages, 3 figures, 2 tables, presented at the 2019 International Conference on Digital Humanities (ADHO) 

  Click for Model/Code and Paper
TrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing Multiple Ratings

Jun 18, 2012
Chao Liu, Yi-Min Wang

This paper revisits the problem of analyzing multiple ratings given by different judges. Different from previous work that focuses on distilling the true labels from noisy crowdsourcing ratings, we emphasize gaining diagnostic insights into our in-house well-trained judges. We generalize the well-known DawidSkene model (Dawid & Skene, 1979) to a spectrum of probabilistic models under the same "TrueLabel + Confusion" paradigm, and show that our proposed hierarchical Bayesian model, called HybridConfusion, consistently outperforms DawidSkene on both synthetic and real-world data sets.

* ICML2012 

  Click for Model/Code and Paper
Dilated Convolution with Dilated GRU for Music Source Separation

Jun 04, 2019
Jen-Yu Liu, Yi-Hsuan Yang

Stacked dilated convolutions used in Wavenet have been shown effective for generating high-quality audios. By replacing pooling/striding with dilation in convolution layers, they can preserve high-resolution information and still reach distant locations. Producing high-resolution predictions is also crucial in music source separation, whose goal is to separate different sound sources while maintaining the quality of the separated sounds. Therefore, this paper investigates using stacked dilated convolutions as the backbone for music source separation. However, while stacked dilated convolutions can reach wider context than standard convolutions, their effective receptive fields are still fixed and may not be wide enough for complex music audio signals. To reach information at remote locations, we propose to combine dilated convolution with a modified version of gated recurrent units (GRU) called the `Dilated GRU' to form a block. A Dilated GRU unit receives information from k steps before instead of the previous step for a fixed k. This modification allows a GRU unit to reach a location with fewer recurrent steps and run faster because it can execute partially in parallel. We show that the proposed model with a stack of such blocks performs equally well or better than the state-of-the-art models for separating vocals and accompaniments.

  Click for Model/Code and Paper
Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks

May 27, 2019
Guanzhong Tian, Yi Yuan, Yong liu

We propose an end to end deep learning approach for generating real-time facial animation from just audio. Specifically, our deep architecture employs deep bidirectional long short-term memory network and attention mechanism to discover the latent representations of time-varying contextual information within the speech and recognize the significance of different information contributed to certain face status. Therefore, our model is able to drive different levels of facial movements at inference and automatically keep up with the corresponding pitch and latent speaking style in the input audio, with no assumption or further human intervention. Evaluation results show that our method could not only generate accurate lip movements from audio, but also successfully regress the speaker's time-varying facial movements.

  Click for Model/Code and Paper
Meta Filter Pruning to Accelerate Deep Convolutional Neural Networks

Apr 08, 2019
Yang He, Ping Liu, Linchao Zhu, Yi Yang

Existing methods usually utilize pre-defined criterions, such as p-norm, to prune unimportant filters. There are two major limitations in these methods. First, the relations of the filters are largely ignored. The filters usually work jointly to make an accurate prediction in a collaborative way. Similar filters will have equivalent effects on the network prediction, and the redundant filters can be further pruned. Second, the pruning criterion remains unchanged during training. As the network updated at each iteration, the filter distribution also changes continuously. The pruning criterions should also be adaptively switched. In this paper, we propose Meta Filter Pruning (MFP) to solve the above problems. First, as a complement to the existing p-norm criterion, we introduce a new pruning criterion considering the filter relation via filter distance. Additionally, we build a meta pruning framework for filter pruning, so that our method could adaptively select the most appropriate pruning criterion as the filter distribution changes. Experiments validate our approach on two image classification benchmarks. Notably, on ILSVRC-2012, our MFP reduces more than 50% FLOPs on ResNet-50 with only 0.44% top-5 accuracy loss.

* 10 pages 

  Click for Model/Code and Paper
Pruning Filter via Geometric Median for Deep Convolutional Neural Networks Acceleration

Nov 01, 2018
Yang He, Ping Liu, Ziwei Wang, Yi Yang

Previous works utilized "smaller-norm-less-important" criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, PFGM compresses CNN models by determining and pruning those filters with redundant information via Geometric Median (GM), rather than those with "relatively less" importance. When applied to two image classification benchmarks, our method validates its usefulness and strengths. Notably, on Cifar-10, PFGM reduces more than 52% FLOPs on ResNet-110 with even 2.69% relative accuracy improvement. Besides, on ILSCRC-2012, PFGM reduces more than 42% FLOPs on ResNet-101 without top-5 accuracy drop, which has advanced the state-of-the-art.

* 9 pages 

  Click for Model/Code and Paper
Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network

Jul 30, 2018
Hao-Min Liu, Yi-Hsuan Yang

Research on automatic music generation has seen great progress due to the development of deep neural networks. However, the generation of multi-instrument music of arbitrary genres still remains a challenge. Existing research either works on lead sheets or multi-track piano-rolls found in MIDIs, but both musical notations have their limits. In this work, we propose a new task called lead sheet arrangement to avoid such limits. A new recurrent convolutional generative model for the task is proposed, along with three new symbolic-domain harmonic features to facilitate learning from unpaired lead sheets and MIDIs. Our model can generate lead sheets and their arrangements of eight-bar long. Audio samples of the generated result can be found at

* 7 pages, 7 figures and 4 tables 

  Click for Model/Code and Paper