Models, code, and papers for "Wei Wang":

Electrocardiography Separation of Mother and Baby

Nov 05, 2014
Wei Wang

Extraction of the electrocardiography (ECG or EKG) signals of mother and baby is a challenging task, because a single device records a mixture of both heartbeats. In this paper, we design a filter to separate the two signals from each other.
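
As a rough illustration of the separation idea (not the filter designed in the paper), the sketch below applies FastICA from scikit-learn to a synthetic two-lead mixture; the assumption that the device provides more than one lead, the surrogate heartbeat signals, and all parameters are illustrative.

    import numpy as np
    from sklearn.decomposition import FastICA

    # Synthetic stand-ins for maternal and fetal heartbeats; a real recording
    # would replace X with the measured abdominal leads.
    rng = np.random.default_rng(0)
    t = np.linspace(0, 10, 5000)
    maternal = np.sin(2 * np.pi * 1.2 * t)             # ~72 bpm surrogate
    fetal = 0.3 * np.sin(2 * np.pi * 2.3 * t + 0.5)    # ~140 bpm surrogate
    X = np.c_[maternal + fetal,
              0.7 * maternal + 1.3 * fetal] + 0.01 * rng.standard_normal((t.size, 2))

    ica = FastICA(n_components=2, random_state=0)
    sources = ica.fit_transform(X)   # columns: estimated maternal / fetal signals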


  Click for Model/Code and Paper
Optical Character Recognition, Using K-Nearest Neighbors

Nov 05, 2014
Wei Wang

The problem of optical character recognition (OCR) has been widely discussed in the literature. Given handwritten text, the program aims to recognize the characters. Even though there are several approaches to this problem, it remains open. In this paper we propose an approach based on the k-nearest neighbors algorithm that achieves an accuracy of more than 90%. The training and run time is also very short.
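
A minimal sketch of the k-nearest-neighbors classification step, using the scikit-learn digits dataset as a stand-in for the handwritten data; the dataset and the choice k = 3 are assumptions, not the paper's setup.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Small stand-in dataset: 8x8 grayscale digit images flattened to 64 features.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    knn = KNeighborsClassifier(n_neighbors=3)   # k is a tunable assumption
    knn.fit(X_train, y_train)
    print("test accuracy:", knn.score(X_test, y_test))   # typically well above 90% on this dataset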


  Click for Model/Code and Paper
Factorization of Language Models through Backing-Off Lattices

May 26, 2003
Wei Wang

Factorization of statistical language models is the task of resolving the most discriminative model into factored models and combining them into a new model that provides a better estimate. Most previous work focuses on factorizing models of sequential events, each of which allows only one manner of factorization. To enable parallel factorization, which allows a model event to be resolved in more than one way at the same time, we propose a general framework. We adopt a backing-off lattice to reflect parallel factorizations and to define the paths along which a model is resolved into factored models, use a mixture model to combine parallel paths in the lattice, and generalize Katz's backing-off method to integrate all the mixture models obtained by traversing the entire lattice. Based on this framework, we formulate two types of model factorization that are used in natural language modeling.
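
A minimal sketch of the combination step only, assuming factored component models and fixed mixture weights are already available; the stub models, path definitions, and weights are illustrative, and the generalized Katz backing-off of the paper is not reproduced.

    def mixture_backoff(word, history, paths, weights):
        """Combine parallel factored estimates p_k(word | f_k(history)).

        paths   : list of (factor_fn, model) pairs, one per lattice path; factor_fn
                  maps the full history to the reduced history used on that path.
        weights : mixture weights, assumed to sum to 1 (the paper would estimate
                  them, e.g. on held-out data).
        """
        return sum(w * model.prob(word, factor_fn(history))
                   for w, (factor_fn, model) in zip(weights, paths))

    class UnigramStub:
        """Toy component model; a real path would be a factored back-off model."""
        def __init__(self, probs):
            self.probs = probs
        def prob(self, word, reduced_history):
            return self.probs.get(word, 1e-6)

    paths = [(lambda h: h[-1:], UnigramStub({"cat": 0.2})),      # word-based path
             (lambda h: ("NOUN",), UnigramStub({"cat": 0.1}))]   # class-based path
    print(mixture_backoff("cat", ("the", "black"), paths, [0.6, 0.4]))   # -> 0.16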

* 7 pages and 1 figure; technical report; added one more related work, and removed some errors 

  Click for Model/Code and Paper
The Mixing method: low-rank coordinate descent for semidefinite programming with diagonal constraints

Jul 04, 2018
Po-Wei Wang, Wei-Cheng Chang, J. Zico Kolter

In this paper, we propose a low-rank coordinate descent approach to structured semidefinite programming with diagonal constraints. The approach, which we call the Mixing method, is extremely simple to implement, has no free parameters, and typically attains an order of magnitude or better improvement in optimization performance over the current state of the art. We show that the method is strictly decreasing, converges to a critical point, and further that for sufficient rank all non-optimal critical points are unstable. Moreover, we prove that with a suitable step size, the Mixing method converges to the global optimum of the semidefinite program almost surely, at a locally linear rate, under random initialization. This is the first low-rank semidefinite programming method shown to achieve a global optimum on the spherical manifold without assumptions. We apply our algorithm to two related domains: solving the maximum cut semidefinite relaxation, and solving a maximum satisfiability relaxation (we also briefly consider additional applications such as learning word embeddings). In all settings, we demonstrate substantial improvement over the existing state of the art along various dimensions; in total, this work expands the scope and scale of problems that can be solved using semidefinite programming methods.
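
A minimal sketch of the coordinate update on the maximum cut SDP relaxation: factor X = V^T V with unit-norm columns and cyclically set each column to the normalized negative weighted sum of its neighbors. The rank heuristic, iteration count, and toy graph are illustrative assumptions, not the paper's reference implementation.

    import numpy as np

    def mixing_method_maxcut(W, k=None, n_iter=100, seed=0):
        """Low-rank coordinate descent for min_{|v_i|=1} sum_ij W_ij <v_i, v_j>."""
        n = W.shape[0]
        k = k or int(np.ceil(np.sqrt(2 * n))) + 1    # rank heuristic; an assumption
        rng = np.random.default_rng(seed)
        V = rng.standard_normal((k, n))
        V /= np.linalg.norm(V, axis=0)               # unit-norm columns
        for _ in range(n_iter):
            for i in range(n):
                g = V @ W[:, i]                      # sum_j W_ij v_j (W_ii assumed 0)
                norm = np.linalg.norm(g)
                if norm > 0:
                    V[:, i] = -g / norm              # closed-form coordinate minimizer
        return V

    # Toy example: random weighted graph with zero diagonal.
    rng = np.random.default_rng(1)
    W = rng.random((20, 20)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
    V = mixing_method_maxcut(W)
    print("SDP objective <W, V^T V>:", np.sum(W * (V.T @ V)))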


  Click for Model/Code and Paper
Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification

Sep 06, 2019
Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, Wei Wang

Adversarial attacks against machine learning models have threatened various real-world applications such as spam filtering and sentiment analysis. In this paper, we propose a novel framework, learning to DIScriminate Perturbations (DISP), to identify and adjust malicious perturbations, thereby blocking adversarial attacks on text classification models. To identify adversarial attacks, a perturbation discriminator estimates how likely each token in the text is to have been perturbed and provides a set of potential perturbations. For each potential perturbation, an embedding estimator learns to restore the embedding of the original word based on the context, and a replacement token is chosen via approximate kNN search. DISP can block adversarial attacks for any NLP model without modifying the model structure or training procedure. Extensive experiments on two benchmark datasets demonstrate that DISP significantly outperforms baseline methods in blocking adversarial attacks for text classification. In addition, in-depth analysis shows the robustness of DISP across different situations.
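
A brute-force stand-in for the replacement step, choosing the vocabulary word whose embedding is nearest to the estimator's output; the paper uses approximate kNN search, and the toy vocabulary and embeddings here are purely illustrative.

    import numpy as np

    def nearest_token(estimated_vec, embedding_matrix, vocab, k=1):
        """Return the k vocabulary tokens whose embeddings are closest (cosine) to estimated_vec."""
        emb = embedding_matrix / np.linalg.norm(embedding_matrix, axis=1, keepdims=True)
        q = estimated_vec / np.linalg.norm(estimated_vec)
        scores = emb @ q
        top = np.argsort(-scores)[:k]
        return [vocab[i] for i in top]

    # Toy vocabulary and embeddings; a real system would use the model's own embedding table.
    vocab = ["good", "great", "bad", "terrible"]
    E = np.array([[0.9, 0.1], [0.8, 0.2], [-0.9, 0.1], [-0.8, 0.3]])
    print(nearest_token(np.array([0.85, 0.15]), E, vocab, k=2))   # -> ['good', 'great']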

* 10 pages, 8 tables, 4 figures 

  Click for Model/Code and Paper
Dual Adaptivity: A Universal Algorithm for Minimizing the Adaptive Regret of Convex Functions

Jun 26, 2019
Lijun Zhang, Guanghui Wang, Wei-Wei Tu, Zhi-Hua Zhou

To deal with changing environments, a new performance measure, adaptive regret, defined as the maximum static regret over any interval, has been proposed in online learning. Under the setting of online convex optimization, several algorithms have been successfully developed to minimize the adaptive regret. However, existing algorithms lack universality in the sense that they can only handle one type of convex function and need a priori knowledge of parameters. By contrast, there exist universal algorithms, such as MetaGrad, that attain optimal static regret for multiple types of convex functions simultaneously. Along this line of research, this paper presents the first universal algorithm for minimizing the adaptive regret of convex functions. Specifically, we borrow the idea of maintaining multiple learning rates from MetaGrad to handle the uncertainty of functions, and utilize the technique of sleeping experts to capture changing environments. In this way, our algorithm automatically adapts to the property of functions (convex, exponentially concave, or strongly convex), as well as the nature of environments (stationary or changing). As a by-product, it also allows the type of functions to switch between rounds.


  Click for Model/Code and Paper
Feature functional theory - binding predictor (FFT-BP) for the blind prediction of binding free energies

Mar 31, 2017
Bao Wang, Zhixiong Zhao, Duc D. Nguyen, Guo-Wei Wei

We present a feature functional theory - binding predictor (FFT-BP) for protein-ligand binding affinity prediction. The underpinning assumptions of FFT-BP are as follows: i) representability: there exists a microscopic feature vector that can uniquely characterize and distinguish one protein-ligand complex from another; ii) feature-function relationship: the macroscopic features of a complex, including its binding free energy, are functionals of the microscopic feature vectors; and iii) similarity: molecules with similar microscopic features have similar macroscopic features, such as binding affinity. Physical models, such as implicit solvent models and quantum theory, are utilized to extract microscopic features, while machine learning algorithms are employed to rank the similarity among protein-ligand complexes. A large variety of numerical validations and tests confirm the accuracy and robustness of the proposed FFT-BP model. The root mean square errors (RMSEs) of FFT-BP blind predictions on a benchmark set of 100 complexes, the PDBBind v2007 core set of 195 complexes, and the PDBBind v2015 core set of 195 complexes are 1.99, 2.02, and 1.92 kcal/mol, respectively. The corresponding Pearson correlation coefficients are 0.75, 0.80, and 0.78, respectively.
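
For reference, the two evaluation metrics quoted above can be computed as follows; the numbers in this sketch are made up, not FFT-BP predictions.

    import numpy as np

    pred = np.array([-7.1, -9.4, -6.2, -8.8, -5.0])   # predicted binding free energies (kcal/mol), made up
    expt = np.array([-6.5, -10.1, -6.0, -8.2, -5.9])  # experimental values, made up

    rmse = np.sqrt(np.mean((pred - expt) ** 2))
    pearson = np.corrcoef(pred, expt)[0, 1]
    print(f"RMSE = {rmse:.2f} kcal/mol, Pearson r = {pearson:.2f}")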

* 25 pages, 11 figures 

  Click for Model/Code and Paper
Learning Gender-Neutral Word Embeddings

Aug 29, 2018
Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, Kai-Wei Chang

Word embedding models have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, embeddings trained on human-generated corpora have been demonstrated to inherit strong gender stereotypes that reflect social constructs. To address this concern, in this paper, we propose a novel training procedure for learning gender-neutral word embeddings. Our approach aims to preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence. Based on the proposed method, we generate a Gender-Neutral variant of GloVe (GN-GloVe). Quantitative and qualitative experiments demonstrate that GN-GloVe successfully isolates gender information without sacrificing the functionality of the embedding model.
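
As a hedged illustration of how reserved gender dimensions might be used downstream, the sketch below drops an assumed final gender coordinate before computing similarity; the dimension layout and vectors are illustrative, not the released GN-GloVe format.

    import numpy as np

    def neutral_similarity(u, v, n_gender_dims=1):
        """Cosine similarity computed only on the gender-free coordinates.

        Assumes the protected gender dimensions sit at the end of each vector
        (an illustrative convention, not necessarily the released format).
        """
        u_n, v_n = u[:-n_gender_dims], v[:-n_gender_dims]
        return float(u_n @ v_n / (np.linalg.norm(u_n) * np.linalg.norm(v_n)))

    # Toy 4-d vectors: first 3 dims semantic, last dim gender.
    doctor = np.array([0.8, 0.1, 0.4, 0.05])
    nurse  = np.array([0.7, 0.2, 0.5, -0.60])
    print(neutral_similarity(doctor, nurse))   # similarity ignoring the gender coordinate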

* EMNLP 2018 

  Click for Model/Code and Paper
All about Structure: Adapting Structural Information across Domains for Boosting Semantic Segmentation

Mar 26, 2019
Wei-Lun Chang, Hui-Po Wang, Wen-Hsiao Peng, Wei-Chen Chiu

In this paper we tackle the problem of unsupervised domain adaptation for semantic segmentation, where we attempt to transfer knowledge learned from synthetic datasets with ground-truth labels to real-world images without any annotation. With the hypothesis that the structural content of images is the most informative and decisive factor for semantic segmentation and can be readily shared across domains, we propose a Domain Invariant Structure Extraction (DISE) framework to disentangle images into domain-invariant structure and domain-specific texture representations, which can further realize image translation across domains and enable label transfer to improve segmentation performance. Extensive experiments verify the effectiveness of our proposed DISE model and demonstrate its superiority over several state-of-the-art approaches.


  Click for Model/Code and Paper
Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

May 26, 2018
Chung-Wei Lee, Wei Fang, Chih-Kuan Yeh, Yu-Chiang Frank Wang

In this paper, we propose a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL), which is able to predict multiple unseen class labels for each input instance. Inspired by the way humans utilize semantic knowledge between objects of interest, we propose a framework that incorporates knowledge graphs for describing the relationships between multiple labels. Our model learns an information propagation mechanism from the semantic label space, which can be applied to model the interdependencies between seen and unseen class labels. With such investigation of structured knowledge graphs for visual reasoning, we show that our model can be applied to both multi-label classification and ML-ZSL tasks. Compared to state-of-the-art approaches, comparable or improved performance can be achieved by our method.

* CVPR 2018 

  Click for Model/Code and Paper
End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

Mar 15, 2018
Szu-Wei Fu, Tao-Wei Wang, Yu Tsao, Xugang Lu, Hisashi Kawai

A speech enhancement model is used to map noisy speech to clean speech. In the training stage, an objective function is often adopted to optimize the model parameters. However, in most studies, there is an inconsistency between the model optimization criterion and the criterion used to evaluate the enhanced speech. For example, speech intelligibility is usually evaluated with the short-time objective intelligibility (STOI) measure, while the frame-based minimum mean square error (MMSE) between estimated and clean speech is widely used to optimize the model. Due to this inconsistency, there is no guarantee that the trained model provides optimal performance in applications. In this study, we propose an end-to-end utterance-based speech enhancement framework using fully convolutional neural networks (FCNs) to reduce the gap between the optimization and evaluation criteria. Because of the utterance-based optimization, temporal correlation information over long speech segments, or even the entire utterance, can be considered when perception-based objective functions are used for direct optimization. As an example, we implement the proposed FCN enhancement framework to optimize the STOI measure. Experimental results show that the STOI of test speech is better than that of conventional MMSE-optimized speech due to the consistency between the training and evaluation targets. Moreover, by integrating STOI into model optimization, the intelligibility for human subjects and an automatic speech recognition (ASR) system on the enhanced speech is also substantially improved compared to speech enhanced with the MMSE criterion.
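
A minimal sketch of an utterance-level fully convolutional waveform model; layer count, kernel size, and activations are assumptions, and the training loss in the paper would be a differentiable STOI-based objective rather than the usual MMSE.

    import torch
    import torch.nn as nn

    class WaveformFCN(nn.Module):
        """Tiny utterance-level FCN: waveform in, waveform out (any length)."""
        def __init__(self, channels=64, n_layers=4, kernel_size=55):
            super().__init__()
            layers, in_ch = [], 1
            for _ in range(n_layers):
                layers += [nn.Conv1d(in_ch, channels, kernel_size, padding=kernel_size // 2),
                           nn.LeakyReLU(0.2)]
                in_ch = channels
            layers += [nn.Conv1d(in_ch, 1, kernel_size, padding=kernel_size // 2), nn.Tanh()]
            self.net = nn.Sequential(*layers)

        def forward(self, noisy):              # noisy: (batch, 1, n_samples)
            return self.net(noisy)             # enhanced waveform of the same length

    model = WaveformFCN()
    enhanced = model(torch.randn(2, 1, 16000))   # two one-second utterances at 16 kHz
    print(enhanced.shape)                        # torch.Size([2, 1, 16000])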

* Accepted in IEEE Transactions on Audio, Speech and Language Processing (TASLP) 

  Click for Model/Code and Paper
Learning Deep Latent Spaces for Multi-Label Classification

Jul 03, 2017
Chih-Kuan Yeh, Wei-Chieh Wu, Wei-Jen Ko, Yu-Chiang Frank Wang

Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural network (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of a label-correlation sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and autoencoder, which allows end-to-end learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets with different scales confirm the effectiveness and robustness of our proposed method, which is shown to perform favorably against state-of-the-art methods for multi-label classification.

* published in AAAI-2017 

  Click for Model/Code and Paper
Architecture-aware Network Pruning for Vision Quality Applications

Aug 05, 2019
Wei-Ting Wang, Han-Lin Li, Wei-Shiang Lin, Cheng-Ming Chiang, Yi-Min Tsai

Convolutional neural networks (CNNs) deliver impressive achievements in the computer vision and machine learning fields. However, CNNs incur high computational complexity, especially for vision quality applications because of their large image resolutions. In this paper, we propose an iterative architecture-aware pruning algorithm with an adaptive magnitude threshold that cooperates with quality-metric measurement. We show the performance improvement on vision quality applications and provide comprehensive analysis with flexible pruning configurations. With the proposed method, the Multiply-Accumulate (MAC) counts of state-of-the-art low-light imaging (SID) and super-resolution (EDSR) models are reduced by 58% and 37%, respectively, without quality drop. The memory bandwidth (BW) requirements of the convolutional layers can also be reduced by 20% to 40%.
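
A minimal sketch of one magnitude-pruning step with a per-layer threshold chosen to hit a target sparsity; the iterative schedule and quality-metric feedback described in the paper are not reproduced, and all sizes are illustrative.

    import numpy as np

    def prune_by_magnitude(weights, sparsity):
        """Zero out the smallest-magnitude weights so that `sparsity` fraction is removed."""
        flat = np.abs(weights).ravel()
        threshold = np.quantile(flat, sparsity)      # layer-wise magnitude threshold
        mask = np.abs(weights) > threshold
        return weights * mask, mask

    rng = np.random.default_rng(0)
    conv_w = rng.standard_normal((64, 32, 3, 3))     # toy conv layer: out_ch, in_ch, kH, kW
    pruned, mask = prune_by_magnitude(conv_w, sparsity=0.5)
    print("kept fraction:", mask.mean())             # ~0.5 after one pruning iteration

    # An iterative scheme would re-train, measure the quality metric (e.g. PSNR),
    # and raise the threshold only while the metric stays within tolerance.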

* Accepted to be Published in the 26th IEEE International Conference on Image Processing (ICIP 2019). Updated to contain the IEEE copyright notice 

  Click for Model/Code and Paper
Multichannel Speech Enhancement by Raw Waveform-mapping using Fully Convolutional Networks

Sep 26, 2019
Chang-Le Liu, Szu-Wei Fu, You-Jin Lee, Yu Tsao, Jen-Wei Huang, Hsin-Min Wang

In recent years, waveform-mapping-based speech enhancement (SE) methods have garnered significant attention. These methods generally use a deep learning model to directly process and reconstruct speech waveforms. Because both the input and output are in waveform format, the waveform-mapping-based SE methods can overcome the distortion caused by imperfect phase estimation, which may be encountered in spectral-mapping-based SE systems. So far, most waveform-mapping-based SE methods have focused on single-channel tasks. In this paper, we propose a novel fully convolutional network (FCN) with Sinc and dilated convolutional layers (termed SDFCN) for multichannel SE that operates in the time domain. We also propose an extended version of SDFCN, called the residual SDFCN (termed rSDFCN). The proposed methods are evaluated on two multichannel SE tasks, namely the dual-channel inner-ear microphones SE task and the distributed microphones SE task. The experimental results confirm the outstanding denoising capability of the proposed SE systems on both tasks and the benefits of using the residual architecture on the overall SE performance.


  Click for Model/Code and Paper
Taking Human out of Learning Applications: A Survey on Automated Machine Learning

Oct 31, 2018
Yao Quanming, Wang Mengshuo, Jair Escalante Hugo, Guyon Isabelle, Hu Yi-Qi, Li Yu-Feng, Tu Wei-Wei, Yang Qiang, Yu Yang

Machine learning techniques have become deeply rooted in our everyday life. However, since it is knowledge- and labor-intensive to pursue good learning performance, human experts are heavily engaged in every aspect of machine learning. In order to make machine learning techniques easier to apply and to reduce the demand for experienced human experts, automatic machine learning (AutoML) has emerged as a hot topic in both industry and academia. In this paper, we provide a survey of existing AutoML works. First, we introduce and define the AutoML problem, with inspiration from both the automation and machine learning realms. Then, we propose a general AutoML framework that not only covers almost all existing approaches but also guides the design of new methods. Afterward, we categorize and review the existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons underlying their successful applications. We hope this survey can serve not only as an insightful guideline for AutoML beginners but also as an inspiration for future research.

* This is a preliminary version and will be kept updated 

  Click for Model/Code and Paper
Demonstration of Topological Data Analysis on a Quantum Processor

Jan 19, 2018
He-Liang Huang, Xi-Lin Wang, Peter P. Rohde, Yi-Han Luo, You-Wei Zhao, Chang Liu, Li Li, Nai-Le Liu, Chao-Yang Lu, Jian-Wei Pan

Topological data analysis offers a robust way to extract useful information from noisy, unstructured data by identifying its underlying structure. Recently, an efficient quantum algorithm was proposed [Lloyd, Garnerone, Zanardi, Nat. Commun. 7, 10138 (2016)] for calculating Betti numbers of data points -- topological features that count the number of topological holes of various dimensions in a scatterplot. Here, we implement a proof-of-principle demonstration of this quantum algorithm by employing a six-photon quantum processor to successfully analyze the topological features, via Betti numbers, of a network of three data points, providing new insights into data analysis in the era of quantum computing.

* Accepted by Optica 

  Click for Model/Code and Paper
Depth and Reflection Total Variation for Single Image Dehazing

Jan 22, 2016
Wei Wang, Chuanjiang He

Haze removal has been a very challenging problem due to its ill-posedness, which becomes even more severe when the input is only a single hazy image. In this paper, we present a new approach to removing haze from a single input image. The proposed method combines the model widely used to describe the formation of a hazy image with the Retinex assumption that an image is the product of the illumination and the reflection. We assume that the depth and reflection functions in the model are spatially piecewise smooth, and use the total variation for regularization. The proposed model is formulated as a constrained optimization problem, which is solved by an alternating minimization scheme and the fast gradient projection algorithm. Some theoretical analyses are given for the proposed model and algorithm. Finally, numerical examples are presented to demonstrate that our method can effectively restore vivid, high-contrast images from hazy inputs.
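
For context, the haze formation model referred to above is I(x) = J(x) t(x) + A (1 - t(x)); the sketch below simply inverts it for the scene radiance given an assumed transmission map and atmospheric light, without the TV-regularized estimation proposed in the paper.

    import numpy as np

    def dehaze(I, t, A, t_min=0.1):
        """Invert the haze model I = J * t + A * (1 - t) for the scene radiance J.

        I : hazy image, float array in [0, 1], shape (H, W, 3)
        t : estimated transmission map, shape (H, W); in the paper it would come
            from a TV-regularized depth estimate, here it is assumed given
        A : estimated atmospheric light, shape (3,)
        """
        t = np.clip(t, t_min, 1.0)[..., None]        # avoid division by near-zero transmission
        J = (I - A) / t + A
        return np.clip(J, 0.0, 1.0)

    # Toy example with a constant transmission map.
    I = np.random.default_rng(0).random((4, 4, 3))
    J = dehaze(I, t=np.full((4, 4), 0.6), A=np.array([0.9, 0.9, 0.9]))
    print(J.shape)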


  Click for Model/Code and Paper
Duckiefloat: a Collision-Tolerant Resource-Constrained Blimp for Long-Term Autonomy in Subterranean Environments

Oct 31, 2019
Yi-Wei Huang, Chen-Lung Lu, Kuan-Lin Chen, Po-Sheng Ser, Jui-Te Huang, Yu-Chia Shen, Pin-Wei Chen, Po-Kai Chang, Sheng-Cheng Lee, Hsueh-Cheng Wang

There are several challenges for search and rescue robots: mobility, perception, autonomy, and communication. Inspired by the DARPA Subterranean (SubT) Challenge, we propose an autonomous blimp robot, which has the advantages of low power consumption and collision tolerance compared to other aerial vehicles such as drones. This is important for search and rescue tasks that usually last for one or more hours. However, the constrained underground passages limit the size of the blimp envelope and its payload, making the proposed system resource-constrained. Therefore, careful design consideration is needed to build a blimp system with on-board artifact search and SLAM. To achieve long-term operation, we propose a failure-aware algorithm with minimal communication to a human supervisor, who maintains situational awareness and sends control signals to the blimp when needed.


  Click for Model/Code and Paper
Team NCTU: Toward AI-Driving for Autonomous Surface Vehicles -- From Duckietown to RobotX

Oct 31, 2019
Yi-Wei Huang, Tzu-Kuan Chuang, Ni-Ching Lin, Yu-Chieh Hsiao, Pin-Wei Chen, Ching-Tang Hung, Shih-Hsing Liu, Hsiao-Sheng Chen, Ya-Hsiu Hsieh, Ching-Tang Hung, Yen-Hsiang Huang, Yu-Xuan Chen, Kuan-Lin Chen, Ya-Jou Lan, Chao-Chun Hsu, Chun-Yi Lin, Jhih-Ying Li, Jui-Te Huang, Yu-Jen Menn, Sin-Kiat Lim, Kim-Boon Lua, Chia-Hung Dylan Tsai, Chi-Fang Chen, Hsueh-Cheng Wang

Robotic software and hardware systems for autonomous surface vehicles have been developed for transportation, military, and ocean research for decades. Previous efforts in the RobotX Challenges of 2014 and 2016 facilitated developments for important tasks such as obstacle avoidance and docking. Team NCTU is motivated by the AI Driving Olympics (AI-DO) developed by the Duckietown community, and adapts its principles to the RobotX challenge. With containerization (Docker) and a unified AI agent interface (with observations and actions), we can better 1) integrate solutions developed in different middlewares (ROS and MOOS), 2) develop essential functionalities from simulation (Gazebo) to real robots (either miniaturized or full-sized WAM-V), and 3) compare different approaches, whether classic model-based or learning-based. Finally, we set up an outdoor on-surface platform with localization services for evaluation. Some preliminary results from Team NCTU's participation in the RobotX competition in Hawaii in 2018 are presented.


  Click for Model/Code and Paper
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

Apr 07, 2019
Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu

We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a frame-by-frame basis, which are not applicable to many video analytic tasks where spatio-temporal features are prevailing. In this paper we propose a novel self-supervised approach to learning spatio-temporal features for video representation. Inspired by the success of two-stream approaches in video classification, we propose to learn visual features by regressing both motion and appearance statistics along spatial and temporal dimensions, given only the input video data. Specifically, we extract statistical concepts (fast-motion region and the corresponding dominant direction, spatio-temporal color diversity, dominant color, etc.) from simple patterns in both the spatial and temporal domains. Unlike prior puzzle tasks that are hard even for humans to solve, the proposed approach is consistent with inherent human visual habits and is therefore easy to answer. We conduct extensive experiments with C3D to validate the effectiveness of our proposed approach. The experiments show that our approach can significantly improve the performance of C3D when applied to video classification tasks. Code is available at https://github.com/laura-wang/video_repres_mas.
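
A rough sketch of one of the statistical targets (the fastest-moving spatial block), approximated with frame differences instead of the optical flow used in the paper; the block size and the synthetic clip are assumptions.

    import numpy as np

    def fastest_block(frames, block=8):
        """Return (row, col) of the spatial block with the largest average motion magnitude.

        frames : (T, H, W) grayscale clip; motion is approximated here by absolute
                 frame differences rather than optical flow.
        """
        diff = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=0)   # (H, W)
        H, W = diff.shape
        grid = diff[:H - H % block, :W - W % block].reshape(H // block, block, W // block, block)
        motion = grid.mean(axis=(1, 3))                                          # per-block motion
        return np.unravel_index(np.argmax(motion), motion.shape)

    clip = np.random.default_rng(0).integers(0, 256, size=(16, 64, 64))          # synthetic clip
    print("fastest-moving block:", fastest_block(clip))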

* CVPR 2019 

  Click for Model/Code and Paper