Models, code, and papers for "Yang Wu":

Deep Compressive Autoencoder for Action Potential Compression in Large-Scale Neural Recording

Sep 17, 2018
Tong Wu, Wenfeng Zhao, Edward Keefer, Zhi Yang

Understanding the coordinated activity underlying brain computations requires large-scale, simultaneous recordings from distributed neuronal structures at cellular-level resolution. One major hurdle in designing high-bandwidth, high-precision, large-scale neural interfaces lies in the formidable data streams generated by the recorder chip, which must be transferred online to a remote computer. The data rates can require hundreds to thousands of I/O pads on the recorder chip and power consumption on the order of Watts for data streaming alone. We developed a deep learning-based compression model to reduce the data rate of multichannel action potentials. The proposed model is built upon a deep compressive autoencoder (CAE) with discrete latent embeddings. The encoder is equipped with residual transformations to extract representative features from spikes, which are mapped into the latent embedding space and updated via vector quantization (VQ). The decoder network reconstructs spike waveforms from the quantized latent embeddings. Experimental results show that the proposed model consistently outperforms conventional methods, achieving much higher compression ratios (20-500x) with better or comparable reconstruction accuracy. Testing results also indicate that the CAE is robust against a diverse range of imperfections, such as waveform variation and spike misalignment, and that compression has only a minor influence on spike sorting accuracy. Furthermore, we have estimated the hardware cost and real-time performance of the CAE and shown that it could support thousands of recording channels simultaneously without excessive power/heat dissipation. The proposed model can reduce the required data transmission bandwidth in large-scale recording experiments while maintaining good signal quality. The code for this work is available at https://github.com/tong-wu-umn/spike-compression-autoencoder
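
As a rough illustration of the VQ step described above, here is a minimal PyTorch-style sketch of nearest-neighbor codebook quantization with a straight-through gradient. The names and shapes are illustrative assumptions, not the authors' code (which is in the linked repository).

```python
import torch

def vector_quantize(z_e, codebook):
    """Map encoder outputs to their nearest codebook embeddings.

    z_e:      (batch, d) encoder features (illustrative shape)
    codebook: (K, d) learnable embedding vectors
    Returns the quantized latents and the chosen codebook indices.
    """
    # Pairwise L2 distance from every feature to every codebook vector
    dists = torch.cdist(z_e, codebook)          # (batch, K)
    idx = dists.argmin(dim=1)                   # nearest embedding per feature
    z_q = codebook[idx]                         # (batch, d) quantized latents
    # Straight-through estimator so gradients still reach the encoder
    z_q = z_e + (z_q - z_e).detach()
    return z_q, idx
```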

* 19 pages, 13 figures 

Detecting 11K Classes: Large Scale Object Detection without Fine-Grained Bounding Boxes

Aug 14, 2019
Hao Yang, Hao Wu, Hao Chen

Recent advances in deep learning have greatly boosted the performance of object detection. State-of-the-art methods such as Faster-RCNN, FPN and R-FCN have achieved high accuracy on challenging benchmark datasets. However, these methods require fully annotated object bounding boxes for training, which are incredibly hard to scale up due to the high annotation cost. Weakly-supervised methods, on the other hand, only require image-level labels for training, but their performance is far below that of their fully-supervised counterparts. In this paper, we propose a semi-supervised large-scale fine-grained detection method, which needs only bounding box annotations for a small number of coarse-grained classes and image-level labels for large-scale fine-grained classes, and can detect all classes at nearly fully-supervised accuracy. We achieve this by exploiting the correlations between coarse-grained and fine-grained classes with a shared backbone, soft-attention based proposal re-ranking, and a dual-level memory module. Experimental results show that our method achieves detection accuracy close to that of state-of-the-art fully-supervised methods on two large-scale datasets, ImageNet and OpenImages, with only a small fraction of fully annotated classes.

* Accepted to ICCV 2019 

Position Estimation of Camera Based on Unsupervised Learning

May 05, 2018
YanTong Wu, Yang Liu

Recovering a scene's 3D structure and the camera pose from a video sequence is an exciting task. Most current solutions divide it into two parts: monocular depth recovery and camera pose estimation. Monocular depth recovery is often studied independently, with better depth estimates then used to solve for pose, while the camera pose itself is, in most cases, still estimated by traditional SLAM (Simultaneous Localization And Mapping) methods. Unsupervised monocular depth recovery and pose estimation has benefited from the work of [1] and achieved good results. In this paper, we improve the method of [1]. Our emphasis lies on improving the idea and related theory, introducing more reasonable inter-frame constraints, and finally synthesizing the camera trajectory from inter-frame pose estimates in a unified world coordinate system. Our results achieve better performance.
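
As a minimal sketch of the trajectory-synthesis step, the snippet below chains per-frame relative poses into absolute poses in one world frame, assuming each estimate is a 4x4 homogeneous transform; the paper's exact pose convention may differ.

```python
import numpy as np

def compose_trajectory(relative_poses):
    """Chain inter-frame poses into absolute camera poses in a world frame.

    relative_poses: list of 4x4 homogeneous transforms, one per frame pair,
    assumed here to give the pose of camera t+1 expressed in frame t.
    """
    pose = np.eye(4)                 # world frame anchored at the first camera
    trajectory = [pose.copy()]
    for T in relative_poses:
        pose = pose @ T              # use np.linalg.inv(T) for the opposite convention
        trajectory.append(pose.copy())
    return trajectory
```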

* 6 pages, 5 figures, 1 table 

Beyond Low-Rank Representations: Orthogonal Clustering Basis Reconstruction with Optimized Graph Structure for Multi-view Spectral Clustering

Mar 22, 2018
Yang Wang, Lin Wu

Low-Rank Representation (LRR) is arguably one of the most powerful paradigms for multi-view spectral clustering: it elegantly encodes the multi-view local graph/manifold structures into an intrinsic low-rank self-expressive data similarity embedded in high-dimensional space, yielding a better graph partition than single-view counterparts. In this paper we revisit LRR from a fundamentally different perspective, discovering that it is essentially a latent clustered orthogonal projection based representation coupled with an optimized local graph structure for spectral clustering; each column of the representation is fundamentally a cluster basis orthogonal to the others, indicating its members, which intuitively projects the view-specific feature representation onto the space spanned by all orthogonal bases so as to characterize the cluster structures. Upon this finding, we propose our technique with the following contributions: (1) we decompose LRR into a latent clustered orthogonal representation via low-rank matrix factorization, to encode more flexible cluster structures than LRR over primal data objects; (2) we convert the problem of LRR into that of simultaneously learning the orthogonal clustered representation and an optimized local graph structure for each view; (3) the learned orthogonal clustered representations and local graph structures enjoy the same magnitude across views, so that an ideal multi-view consensus can be readily achieved. Experiments on multi-view datasets validate its superiority.
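
For reference, the classical LRR objective that this paper reinterprets can be written in its standard noise-free form (our notation, which may differ from the paper's) as

$$\min_{Z}\ \|Z\|_{*} \quad \text{s.t.} \quad X = XZ,$$

where $X$ is the data matrix, $Z$ the self-expressive representation, and $\|Z\|_{*}$ the nuclear norm; the finding above amounts to factorizing such a $Z$ into mutually orthogonal cluster bases.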

* Accepted to appear in Neural Networks, Elsevier, on 9th March 2018 

Multi-View Spectral Clustering via Structured Low-Rank Matrix Factorization

Dec 07, 2017
Yang Wang, Lin Wu

Multi-view data clustering attracts more attention than its single-view counterpart because leveraging multiple independent and complementary sources of information from multi-view feature spaces outperforms using a single one. Multi-view spectral clustering aims at yielding a data partition agreement over the local manifold structures of the views by seeking eigenvalue-eigenvector decompositions. However, as we observe, this classical paradigm still suffers from (1) overlooking the flexible local manifold structure, caused by (2) enforcing low-rank data correlation agreement among all views; worse still, (3) LRR is not intuitively flexible enough to capture the latent data clustering structures. In this paper, (a) we present a structured LRR obtained by factorizing into latent low-dimensional data-cluster representations, which characterize the data clustering structure for each view. Upon such a representation, (b) a Laplacian regularizer is imposed to preserve the flexible local manifold structure of each view. (c) We present an iterative multi-view agreement strategy that minimizes a divergence objective among all factorized latent data-cluster representations during each iteration of the optimization, where the latent representation from each view serves to regulate those from the other views; this intuitive process iteratively coordinates all views towards agreement. (d) We remark that such a data-cluster representation can flexibly encode the data clustering structure from any view with an adaptive input cluster number. To this end, (e) a novel non-convex objective function is proposed and solved via an efficient alternating minimization strategy. A complexity analysis is also presented. Extensive experiments on real-world multi-view datasets demonstrate its superiority over the state-of-the-art.
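
As a pointer to contribution (b), a Laplacian regularizer of the standard manifold-preserving form (our notation, not necessarily the paper's) is

$$\mathcal{R}\big(U^{(v)}\big) = \operatorname{tr}\big(U^{(v)\top} L^{(v)} U^{(v)}\big),$$

where $L^{(v)}$ is the graph Laplacian of view $v$ and $U^{(v)}$ the factorized data-cluster representation; minimizing it encourages nearby points in view $v$ to have similar rows in $U^{(v)}$.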

* Accepted to appear at IEEE Trans on Neural Networks and Learning Systems 

Structured Deep Hashing with Convolutional Neural Networks for Fast Person Re-identification

Dec 03, 2017
Lin Wu, Yang Wang

Given a pedestrian image as a query, the purpose of person re-identification is to identify the correct match from a large collection of gallery images depicting the same person captured by disjoint camera views. The critical challenge is how to construct a robust yet discriminative feature representation that captures the compounded variations in pedestrian appearance. To this end, deep learning methods have been proposed to extract hierarchical features against the extreme variability of appearance. However, existing methods in this category generally neglect the efficiency of the matching stage, whereas the search speed of a re-identification system is crucial in real-world applications. In this paper, we present a novel deep hashing framework with Convolutional Neural Networks (CNNs) for fast person re-identification. Technically, we simultaneously learn both CNN features and hash functions/codes to obtain robust yet discriminative features and similarity-preserving hash codes. Thereby, person re-identification can be resolved by efficiently computing and ranking the Hamming distances between images. A structured loss function defined over positive pairs and hard negatives is proposed to formulate a novel optimization problem, so that fast convergence and a more stable optimized solution can be obtained. Extensive experiments on two benchmarks, CUHK03 \cite{FPNN} and Market-1501 \cite{Market1501}, show that the proposed deep architecture is effective compared with the state-of-the-art.
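
To illustrate why hashing makes the matching stage fast, here is a minimal sketch of Hamming-distance ranking over binary codes; the names are illustrative, not the paper's code.

```python
import numpy as np

def hamming_rank(query_code, gallery_codes):
    """Rank gallery images by Hamming distance to a query hash code.

    query_code:    (n_bits,) binary array for the query pedestrian
    gallery_codes: (N, n_bits) binary array for the gallery
    """
    dists = np.count_nonzero(gallery_codes != query_code, axis=1)
    return np.argsort(dists)  # gallery indices, nearest first
```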

* To appear at Computer Vision and Image Understanding 

Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition

Sep 18, 2017
Lin Wu, Yang Wang

Fine-grained visual recognition typically depends on modeling subtle differences between object parts. However, these parts often exhibit dramatic visual variations, such as occlusions, viewpoint changes, and spatial transformations, making them hard to detect. In this paper, we present a novel attention-based model to automatically, selectively and accurately focus on critical object regions of higher importance against appearance variations. Given an image, two different Convolutional Neural Networks (CNNs) are constructed, and their outputs are correlated through bilinear pooling to simultaneously focus on discriminative regions and extract relevant features. To capture the spatial distributions among local regions with visual attention, soft-attention based spatial Long Short-Term Memory units (LSTMs) are incorporated to realize spatially recurrent, visually selective processing over local input patterns. These intuitions equip our network with the following novel model: two-stream CNN layers, a bilinear pooling layer, and a spatial recurrent layer with location attention are jointly trained in an end-to-end fashion to serve as the part detector and feature extractor, whereby relevant features are localized and extracted attentively. We show the significance of our network on two well-known visual recognition tasks: fine-grained image classification and person re-identification.
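
A minimal sketch of the bilinear pooling step that correlates the two CNN streams, assuming both streams output feature maps with the same channel count; the signed square-root and L2 normalization are common for bilinear features and may differ from the paper's exact recipe.

```python
import torch

def bilinear_pool(feat_a, feat_b):
    """Correlate two CNN feature maps by bilinear pooling.

    feat_a, feat_b: (batch, c, h, w) outputs of the two CNN streams
    Returns a (batch, c*c) descriptor aggregated over all locations.
    """
    b, c, h, w = feat_a.shape
    fa = feat_a.view(b, c, h * w)
    fb = feat_b.view(b, c, h * w)
    # Outer product at each spatial location, averaged over locations
    phi = torch.bmm(fa, fb.transpose(1, 2)) / (h * w)   # (b, c, c)
    phi = phi.view(b, -1)
    # Signed square-root and L2 normalization of the pooled descriptor
    phi = torch.sign(phi) * torch.sqrt(phi.abs() + 1e-12)
    return torch.nn.functional.normalize(phi)
```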

* 8 pages 

Finding Modes by Probabilistic Hypergraphs Shifting

Apr 12, 2017
Yang Wang, Lin Wu

In this paper, we develop a novel paradigm, namely hypergraph shift, to find robust graph modes by a probabilistic voting strategy; the resulting modes are semantically sound in addition to meeting the self-cohesiveness requirement of graph modes. Unlike existing techniques that seek graph modes by shifting vertices based on pair-wise edges (i.e., an edge with two ends), our paradigm is based on shifting high-order edges (hyperedges) to deliver graph modes. Specifically, we cast the problem of seeking graph modes as that of seeking maximizers of a novel objective function, with the aim of generating good graph modes by shifting edges in hypergraphs. As a result, the generated graph modes, based on dense subhypergraphs, may more accurately capture object semantics besides satisfying self-cohesiveness. We also formally prove that our technique always converges. Extensive empirical studies on synthetic and real-world data sets are conducted on clustering and graph matching; they demonstrate that our techniques significantly outperform the existing ones.

* Fixing some minor issues in PAKDD 2014 

Robust Hashing for Multi-View Data: Jointly Learning Low-Rank Kernelized Similarity Consensus and Hash Functions

Nov 17, 2016
Lin Wu, Yang Wang

Learning hash functions/codes for similarity search over multi-view data is attracting increasing attention, where similar hash codes are assigned to data objects with consistent neighborhood relationships across views. Traditional methods in this category suffer from three inherent limitations: 1) they commonly adopt a two-stage scheme where a similarity matrix is first constructed, followed by subsequent hash function learning; 2) they are commonly developed under the assumption that data samples with multiple representations are noise-free, which is not practical in real-life applications; 3) they often incur a cumbersome training model caused by neighborhood graph construction using all $N$ points in the database ($O(N)$). In this paper, we study the problem of jointly and efficiently training robust hash functions over data objects with multi-feature representations that may be noise-corrupted. To achieve both robustness and training efficiency, we propose an approach to effectively and efficiently learn low-rank kernelized \footnote{We use kernelized similarity rather than kernel, as it is not a squared symmetric matrix for the data-landmark affinity matrix.} hash functions shared across views. Specifically, we utilize landmark graphs to construct tractable similarity matrices in multiple views to automatically discover the neighborhood structure in the data. To learn robust hash functions, a latent low-rank kernel function is used to construct hash functions that accommodate linearly inseparable data. In particular, a latent kernelized similarity matrix is recovered by rank minimization on multiple kernel-based similarity matrices. Extensive experiments on real-world multi-view datasets validate the efficacy of our method in the presence of error corruptions.

* Accepted to appear in Image and Vision Computing 

Detecting Image Forgeries using Geometric Cues

Dec 17, 2010
Lin Wu, Yang Wang

This chapter presents a framework for detecting fake regions using various methods, including watermarking techniques and blind approaches. In particular, we describe the current categories of blind approaches, which can be divided into five: pixel-based, format-based, camera-based, physically-based and geometric-based techniques. We then take a second look at the geometric-based techniques and further categorize them in detail. In the following section, the state-of-the-art methods involved in the geometric techniques are elaborated.

* 18 pages, 10 figures 

Distributed Machine Learning on Mobile Devices: A Survey

Sep 18, 2019
Renjie Gu, Shuo Yang, Fan Wu

In recent years, mobile devices have developed rapidly, gaining stronger computation capability and larger storage. Some computation-intensive machine learning and deep learning tasks can now run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning has been proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and uploads only computation results, rather than original data, to contribute to the optimization of the global model. This architecture can not only relieve the computation and storage burden on servers, but also protect users' sensitive information. Another benefit is bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey of recent studies on mobile distributed machine learning. We survey a number of widely-used mobile distributed machine learning methods and present an in-depth discussion of the challenges and future directions in this area. We believe that this survey gives a clear overview of mobile distributed machine learning and provides guidelines for applying it to real applications.
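
To make the "upload results, not data" idea concrete, here is a minimal federated-averaging-style sketch on a toy linear model; it illustrates one family of methods such a survey covers, not a specific system from the paper, and all names are illustrative.

```python
import numpy as np

def local_update(global_weights, local_data, lr=0.01, epochs=1):
    """Run SGD on-device and return only the weight delta, not the data."""
    w = global_weights.copy()
    for _ in range(epochs):
        for x, y in local_data:           # (feature vector, target) pairs
            grad = (w @ x - y) * x        # gradient of a squared loss (toy model)
            w -= lr * grad
    return w - global_weights             # only this delta leaves the device

def server_round(global_weights, device_datasets):
    """Aggregate the device updates into the global model."""
    deltas = [local_update(global_weights, d) for d in device_datasets]
    return global_weights + np.mean(deltas, axis=0)
```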


Self-Supervised Dialogue Learning

Jun 30, 2019
Jiawei Wu, Xin Wang, William Yang Wang

The sequential order of utterances is often meaningful in coherent dialogues, and changes in utterance order can lead to low-quality, incoherent conversations. We consider order information a crucial supervised signal for dialogue learning, which, however, has been neglected by many previous dialogue systems. Therefore, in this paper, we introduce a self-supervised learning task, inconsistent order detection, to explicitly capture the flow of conversation in dialogues. Given a sampled utterance pair triple, the task is to predict whether it is ordered or misordered. We then propose a sampling-based self-supervised network, SSN, to perform the prediction with sampled triple references from the previous dialogue history. Furthermore, we design a joint learning framework where SSN can guide dialogue systems towards more coherent and relevant dialogue learning through adversarial training. We demonstrate that the proposed methods can be applied to both open-domain and task-oriented dialogue scenarios, and achieve new state-of-the-art performance on the OpenSubtitles and Movie-Ticket Booking datasets.
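
A minimal sketch of how ordered/misordered training triples could be constructed for such a task; this samples single dialogue turns for simplicity, whereas the paper works with utterance pairs, and the actual sampling details may differ.

```python
import random

def sample_triple(dialogue, misorder_prob=0.5):
    """Sample a triple of dialogue turns and label its order.

    dialogue: turns in their original order (length >= 3, assumed distinct)
    Returns (triple, label): 1 if the triple keeps dialogue order, 0 if not.
    """
    i, j, k = sorted(random.sample(range(len(dialogue)), 3))
    triple = [dialogue[i], dialogue[j], dialogue[k]]
    if random.random() < misorder_prob:
        a, b = random.sample(range(3), 2)
        triple[a], triple[b] = triple[b], triple[a]  # guaranteed misordering
        return triple, 0
    return triple, 1
```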

* 11 pages, 2 figures, accepted to ACL 2019 

Revisiting EmbodiedQA: A Simple Baseline and Beyond

Apr 08, 2019
Yu Wu, Lu Jiang, Yi Yang

In Embodied Question Answering (EmbodiedQA), an agent interacts with an environment to gather the information necessary for answering user questions. Existing works have laid a solid foundation for solving this interesting problem. But the current performance, especially in navigation, suggests that EmbodiedQA might be too challenging for current approaches. In this paper, we empirically study this problem and introduce 1) a simple yet effective baseline that can be end-to-end optimized by SGD; 2) an easier and more practical setting for EmbodiedQA where an agent has a chance to adapt the trained model to a new environment before it actually answers users' questions. In the new setting, we randomly place a few objects in new environments and upgrade the agent policy via a distillation network to retain the generalization ability of the trained model. On the EmbodiedQA v1 benchmark, under the standard setting, our simple baseline achieves results very competitive with the state of the art; in the new setting, we find that this small change in setting yields a notable gain in navigation.


Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation

Apr 04, 2019
Jiawei Wu, Xin Wang, William Yang Wang

The overreliance on large parallel corpora significantly limits the applicability of machine translation systems to the majority of language pairs. Back-translation has been dominantly used in previous approaches for unsupervised neural machine translation, where pseudo sentence pairs are generated to train the models with a reconstruction loss. However, the pseudo sentences are usually of low quality as translation errors accumulate during training. To avoid this fundamental issue, we propose an alternative but more effective approach, extract-edit, to extract and then edit real sentences from the target monolingual corpora. Furthermore, we introduce a comparative translation loss to evaluate the translated target sentences and thus train the unsupervised translation systems. Experiments show that the proposed approach consistently outperforms the previous state-of-the-art unsupervised machine translation systems across two benchmarks (English-French and English-German) and two low-resource language pairs (English-Romanian and English-Russian) by more than 2 (up to 3.63) BLEU points.

* 11 pages, 3 figures. Accepted to NAACL 2019 

Reinforcement Learning for Optimal Load Distribution Sequencing in Resource-Sharing System

Feb 05, 2019
Fei Wu, Yang Cao, Thomas Robertazzi

Divisible Load Theory (DLT) is a powerful tool for modeling divisible load problems in data-intensive systems. This paper studies an optimal divisible-load distribution sequencing problem using a machine learning framework. The problem is to decide the optimal sequence in which to distribute a divisible load to processors in order to achieve the minimum finishing time. The scheduling is performed in a resource-sharing system where each physical processor is virtualized into multiple virtual processors. A reinforcement learning method, the multi-armed bandit (MAB), is used for our problem. We first provide a naive solution using a MAB algorithm and then perform several optimizations. Various numerical tests are conducted. Our algorithm shows increasing performance as training progresses, and the global optimum is achieved when the sample size is large enough.
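
As an illustration of the MAB machinery, here is a minimal UCB1-style arm selection over candidate distribution sequences; the paper's exact bandit variant and reward definition may differ.

```python
import math

def ucb_sequence_choice(counts, rewards, c=2.0):
    """Pick a load-distribution sequence (arm) by the UCB1 rule.

    counts[i]:  times sequence i has been tried
    rewards[i]: cumulative reward of sequence i (e.g. negated finishing time)
    """
    total = sum(counts)
    best, best_score = None, -math.inf
    for i in range(len(counts)):
        if counts[i] == 0:
            return i                  # try every sequence at least once
        score = rewards[i] / counts[i] + math.sqrt(c * math.log(total) / counts[i])
        if score > best_score:
            best, best_score = i, score
    return best
```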


Attacks on State-of-the-Art Face Recognition using Attentional Adversarial Attack Generative Network

Nov 30, 2018
Qing Song, Yingqi Wu, Lu Yang

With the broad use of face recognition, its vulnerability to attacks gradually emerges. It is therefore important to study how face recognition networks can be attacked. In this paper, we focus on a novel attack against face recognition networks that misleads the network into identifying someone as a chosen target person, rather than merely causing an inconspicuous misclassification. For this purpose, we introduce a specific attentional adversarial attack generative network to generate fake face images. To capture the semantic information of the target person, this work adds a conditional variational autoencoder and attention modules to learn instance-level correspondences between faces. Unlike a traditional two-player GAN, this work introduces face recognition networks as a third player participating in the competition between generator and discriminator, which allows the attacker to impersonate the target person better. The generated faces are hard for onlookers to notice, can evade recognition by state-of-the-art networks, and are mostly recognized as the target person.


An Adaptive Oversampling Learning Method for Class-Imbalanced Fault Diagnostics and Prognostics

Nov 19, 2018
Wenfang Lin, Zhenyu Wu, Yang Ji

Data-driven fault diagnostics and prognostics suffers from the class-imbalance problem in industrial systems, which challenges common machine learning algorithms because the features of minority-class samples become difficult to learn. Synthetic oversampling methods are commonly used to tackle this problem by generating minority-class samples to balance the distributions between majority and minority classes. However, many oversampling methods are inappropriate in that they cannot generate effective and useful minority-class samples under different data distributions, which further complicates learning. Thus, this paper proposes a novel adaptive oversampling technique, the EM-based Weighted Minority Oversampling TEchnique (EWMOTE), for industrial fault diagnostics and prognostics. The method comprises a weighted minority sampling strategy to identify hard-to-learn, informative minority fault samples and an Expectation Maximization (EM) based imputation algorithm to generate fault samples. To validate the performance of the proposed method, experiments are conducted on two real datasets. The results show that the method achieves better performance than other oversampling-based baseline models on both binary-class and multi-class imbalance learning tasks at different imbalance ratios.
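
For context, generic synthetic oversampling interpolates between a minority sample and one of its nearest neighbors, as in the sketch below; EWMOTE replaces this with weighted sample selection and EM-based imputation, which are not shown here.

```python
import numpy as np

def smote_like_sample(minority, k=5, n_new=100, rng=None):
    """Generate synthetic minority samples by neighbor interpolation
    (the generic idea EWMOTE builds on, not the paper's EM-based variant)."""
    rng = rng or np.random.default_rng()
    new = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        # k nearest neighbors of sample i within the minority class
        d = np.linalg.norm(minority - minority[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]        # skip the sample itself
        j = rng.choice(nbrs)
        lam = rng.random()
        new.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.array(new)
```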

* 8 pages 

Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval

Oct 29, 2018
Lin Wu, Yang Wang, Ling Shao

In this paper, we propose a novel deep generative approach to cross-modal retrieval that learns hash functions in the absence of paired training samples through a cycle consistency loss. Our approach employs an adversarial training scheme to learn a pair of hash functions enabling translation between modalities while assuming the underlying semantic relationship. To endow the hash codes of each input-output pair with semantics, a cycle consistency loss is further imposed on top of the adversarial training to strengthen the correlations between inputs and corresponding outputs. Our approach is generative: it learns hash functions such that the learned hash codes maximally correlate each input-output correspondence, while also being able to regenerate the inputs so as to minimize the information loss. Learning to hash is thus performed by jointly optimizing the parameters of the hash functions across modalities as well as the associated generative models. Extensive experiments on a variety of large-scale cross-modal data sets demonstrate that our proposed method achieves better retrieval results than the state-of-the-art.

* To appear in IEEE Trans. Image Processing. arXiv admin note: text overlap with arXiv:1703.10593 by other authors 

Privacy-Protective-GAN for Face De-identification

Jun 23, 2018
Yifan Wu, Fan Yang, Haibin Ling

Face de-identification has become increasingly important as image sources grow explosively and become easily accessible. The advance of new face recognition techniques also raises concerns about privacy leakage. The mainstream pipelines for face de-identification are mostly based on the k-same framework, which has been criticized for low effectiveness and poor visual quality. In this paper, we propose a new framework called Privacy-Protective-GAN (PP-GAN) that adapts GAN with novel verificator and regulator modules specially designed for the face de-identification problem, ensuring that the de-identified output retains structural similarity to a single input. We evaluate the proposed approach in terms of privacy protection, utility preservation, and structural similarity. Our approach not only outperforms existing face de-identification techniques but also provides a practical framework for adapting GAN with priors of domain knowledge.


Binary output layer of feedforward neural networks for solving multi-class classification problems

Jan 22, 2018
Sibo Yang, Chao Zhang, Wei Wu

Considered in this short note is the design of the output-layer nodes of feedforward neural networks for solving multi-class classification problems with $r$ ($r \ge 3$) classes of samples. The common and conventional setting of the output layer, called the "one-to-one approach" in this paper, is as follows: the output layer contains $r$ output nodes corresponding to the $r$ classes, and for an input sample of the $i$-th class, the ideal output is 1 for the $i$-th output node and 0 for all the other output nodes. We propose in this paper a new "binary approach": suppose $r \in (2^{q-1}, 2^q]$ with $q \ge 2$; then we let the output layer contain $q$ output nodes, and let the ideal outputs for the $r$ classes be designed in a binary manner. Numerical experiments carried out in this paper show that our binary approach does an equally good job as, but uses fewer output nodes than, the traditional one-to-one approach.
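
A minimal sketch of the binary target encoding (illustrative; the note's exact bit convention may differ):

```python
import math

def binary_targets(num_classes):
    """Ideal outputs for the 'binary approach': q = ceil(log2(r)) output
    nodes, with class i encoded as the q-bit binary representation of i."""
    q = math.ceil(math.log2(num_classes))
    return [[(i >> b) & 1 for b in reversed(range(q))] for i in range(num_classes)]

# e.g. r = 5 classes need only q = 3 nodes instead of 5 one-hot nodes:
# binary_targets(5) -> [[0,0,0], [0,0,1], [0,1,0], [0,1,1], [1,0,0]]
```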

