Models, code, and papers for "Jiaqi Wang":

Fast and Scalable Estimator for Sparse and Unit-Rank Higher-Order Regression Models

Nov 29, 2019
Jiaqi Zhang, Beilun Wang

Because tensor data appear more and more frequently in various scientific researches and real-world applications, analyzing the relationship between tensor features and the univariate outcome becomes an elementary task in many fields. To solve this task, we propose \underline{Fa}st \underline{S}parse \underline{T}ensor \underline{R}egression model (FasTR) based on so-called unit-rank CANDECOMP/PARAFAC decomposition. FasTR first decomposes the tensor coefficient into component vectors and then estimates each vector with $\ell_1$ regularized regression. Because of the independence of component vectors, FasTR is able to solve in a parallel way and the time complexity is proved to be superior to previous models. We evaluate the performance of FasTR on several simulated datasets and a real-world fMRI dataset. Experiment results show that, compared with four baseline models, in every case, FasTR can compute a better solution within less time.

* arXiv admin note: substantial text overlap with arXiv:1911.12965 

  Click for Model/Code and Paper
Sparse and Low-Rank Tensor Regression via Parallel Proximal Method

Nov 29, 2019
Jiaqi Zhang, Beilun Wang

Motivated by applications in various scientific fields having demand of predicting relationship between higher-order (tensor) feature and univariate response, we propose a \underline{S}parse and \underline{L}ow-rank \underline{T}ensor \underline{R}egression model (SLTR). This model enforces sparsity and low-rankness of the tensor coefficient by directly applying $\ell_1$ norm and tensor nuclear norm on it respectively, such that (1) the structural information of tensor is preserved and (2) the data interpretation is convenient. To make the solving procedure scalable and efficient, SLTR makes use of the proximal gradient method to optimize two norm regularizers, which can be easily implemented parallelly. Additionally, a tighter convergence rate is proved over three-order tensor data. We evaluate SLTR on several simulated datasets and one fMRI dataset. Experiment results show that, compared with previous models, SLTR is able to obtain a solution no worse than others with much less time cost.

  Click for Model/Code and Paper
Hierarchical Attention Generative Adversarial Networks for Cross-domain Sentiment Classification

Mar 27, 2019
Yuebing Zhang, Duoqian Miao, Jiaqi Wang

Cross-domain sentiment classification (CDSC) is an importance task in domain adaptation and sentiment classification. Due to the domain discrepancy, a sentiment classifier trained on source domain data may not works well on target domain data. In recent years, many researchers have used deep neural network models for cross-domain sentiment classification task, many of which use Gradient Reversal Layer (GRL) to design an adversarial network structure to train a domain-shared sentiment classifier. Different from those methods, we proposed Hierarchical Attention Generative Adversarial Networks (HAGAN) which alternately trains a generator and a discriminator in order to produce a document representation which is sentiment-distinguishable but domain-indistinguishable. Besides, the HAGAN model applies Bidirectional Gated Recurrent Unit (Bi-GRU) to encode the contextual information of a word and a sentence into the document representation. In addition, the HAGAN model use hierarchical attention mechanism to optimize the document representation and automatically capture the pivots and non-pivots. The experiments on Amazon review dataset show the effectiveness of HAGAN.

  Click for Model/Code and Paper
A Performance Evaluation of Correspondence Grouping Methods for 3D Rigid Data Matching

Jul 05, 2019
Jiaqi Yang, Ke Xian, Peng Wang, Yanning Zhang

Seeking consistent point-to-point correspondences between 3D rigid data (point clouds, meshes, or depth maps) is a fundamental problem in 3D computer vision. While a number of correspondence selection methods have been proposed in recent years, their advantages and shortcomings remain unclear regarding different applications and perturbations. To fill this gap, this paper gives a comprehensive evaluation of nine state-of-the-art 3D correspondence grouping methods. A good correspondence grouping algorithm is expected to retrieve as many as inliers from initial feature matches, giving a rise in both precision and recall as well as facilitating accurate transformation estimation. Toward this rule, we deploy experiments on three benchmarks with different application contexts including shape retrieval, 3D object recognition, and point cloud registration together with various perturbations such as noise, point density variation, clutter, occlusion, partial overlap, different scales of initial correspondences, and different combinations of keypoint detectors and descriptors. The rich variety of application scenarios and nuisances result in different spatial distributions and inlier ratios of initial feature correspondences, thus enabling a thorough evaluation. Based on the outcomes, we give a summary of the traits, merits, and demerits of evaluated approaches and indicate some potential future research directions.

* Extension of 3DV 2017 paper 

  Click for Model/Code and Paper
Evaluating Local Geometric Feature Representations for 3D Rigid Data Matching

Jun 29, 2019
Jiaqi Yang, Siwen Quan, Peng Wang, Yanning Zhang

Local geometric descriptors remain an essential component for 3D rigid data matching and fusion. The devise of a rotational invariant local geometric descriptor usually consists of two steps: local reference frame (LRF) construction and feature representation. Existing evaluation efforts have mainly been paid on the LRF or the overall descriptor, yet the quantitative comparison of feature representations remains unexplored. This paper fills this gap by comprehensively evaluating nine state-of-the-art local geometric feature representations. Our evaluation is on the ground that ground-truth LRFs are leveraged such that the ranking of tested feature representations are more convincing as opposed to existing studies. The experiments are deployed on six standard datasets with various application scenarios (shape retrieval, point cloud registration, and object recognition) and data modalities (LiDAR, Kinect, and Space Time) as well as perturbations including Gaussian noise, shot noise, data decimation, clutter, occlusion, and limited overlap. The evaluated terms cover the major concerns for a feature representation, e.g., distinctiveness, robustness, compactness, and efficiency. The outcomes present interesting findings that may shed new light on this community and provide complementary perspectives to existing evaluations on the topic of local geometric feature description. A summary of evaluated methods regarding their peculiarities is also presented to guide real-world applications and new descriptor crafting.

  Click for Model/Code and Paper
On the Robustness of the Backdoor-based Watermarking in Deep Neural Networks

Jun 18, 2019
Masoumeh Shafieinejad, Jiaqi Wang, Nils Lukas, Florian Kerschbaum

Obtaining the state of the art performance of deep learning models imposes a high cost to model generators, due to the tedious data preparation and the substantial processing requirements. To protect the model from unauthorized re-distribution, watermarking approaches have been introduced in the past couple of years. The watermark allows the legitimate owner to detect copyright violations of their model. We investigate the robustness and reliability of state-of-the-art deep neural network watermarking schemes. We focus on backdoor-based watermarking and show that an adversary can remove the watermark fully by just relying on public data and without any access to the model's sensitive information such as the training data set, the trigger set or the model parameters. We as well prove the security inadequacy of the backdoor-based watermarking in keeping the watermark hidden by proposing an attack that detects whether a model contains a watermark.

  Click for Model/Code and Paper
Efficient Private ERM for Smooth Objectives

May 24, 2017
Jiaqi Zhang, Kai Zheng, Wenlong Mou, Liwei Wang

In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves the running time of previous state-of-the-art private optimization algorithms, for both $\epsilon$-DP and $(\epsilon, \delta)$-DP. For non-convex but smooth objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm, which provably converges to a stationary point with privacy guarantee. Besides the expected utility bounds, we also provide guarantees in high probability form. Experiments demonstrate that our algorithm consistently outperforms existing method in both utility and running time.

  Click for Model/Code and Paper
Robot Calligraphy using Pseudospectral Optimal Control in Conjunction with a Simulated Brush Model

Nov 18, 2019
Sen Wang, Jiaqi Chen, Xuanliang Deng, Seth Hutchinson, Frank Dellaert

Chinese calligraphy is a unique form of art that has great artistic value but is difficult to master. In this paper, we make robots write calligraphy. Learning methods could teach robots to write, but may not be able to generalize to new characters. As such, we formulate the calligraphy writing problem as a trajectory optimization problem, and propose a new virtual brush model for simulating the real dynamic writing process. Our optimization approach is taken from pseudospectral optimal control, where the proposed dynamic virtual brush model plays a key role in formulating the objective function to be optimized. We also propose a stroke-level optimization to achieve better performance compared to the character-level optimization proposed in previous work. Our methodology shows good performance in drawing aesthetically pleasing characters.

* conference paper submitted to ICRA2020 

  Click for Model/Code and Paper
Region Proposal by Guided Anchoring

Jan 10, 2019
Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, Dahua Lin

Region anchors are the cornerstone of modern object detection techniques. State-of-the-art detectors mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the spatial domain with a predefined set of scales and aspect ratios. In this paper, we revisit this foundational stage. Our study shows that it can be done much more effectively and efficiently. Specifically, we present an alternative scheme, named Guided Anchoring, which leverages semantic features to guide the anchoring. The proposed method jointly predicts the locations where the center of objects of interest are likely to exist as well as the scales and aspect ratios at different locations. On top of predicted anchor shapes, we mitigate the feature inconsistency with a feature adaption module. We also study the use of high-quality proposals to improve detection performance. The anchoring scheme can be seamlessly integrated to proposal methods and detectors. With Guided Anchoring, we achieve $9.1\%$ higher recall on MS COCO with $90\%$ fewer anchors than the RPN baseline. We also adopt Guided Anchoring in Fast R-CNN, Faster R-CNN and RetinaNet, respectively improving the detection mAP by $2.2\%$, $2.7\%$ and $1.2\%$.

  Click for Model/Code and Paper
Modelling Protagonist Goals and Desires in First-Person Narrative

Aug 29, 2017
Elahe Rahimtoroghi, Jiaqi Wu, Ruimin Wang, Pranav Anand, Marilyn A Walker

Many genres of natural language text are narratively structured, a testament to our predilection for organizing our experiences as narratives. There is broad consensus that understanding a narrative requires identifying and tracking the goals and desires of the characters and their narrative outcomes. However, to date, there has been limited work on computational models for this problem. We introduce a new dataset, DesireDB, which includes gold-standard labels for identifying statements of desire, textual evidence for desire fulfillment, and annotations for whether the stated desire is fulfilled given the evidence in the narrative context. We report experiments on tracking desire fulfillment using different methods, and show that LSTM Skip-Thought model achieves F-measure of 0.7 on our corpus.

* 10 pages, 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL 2017) 

  Click for Model/Code and Paper
Research on the fast Fourier transform of image based on GPU

May 29, 2015
Feifei Shen, Zhenjian Song, Congrui Wu, Jiaqi Geng, Qingyun Wang

Study of general purpose computation by GPU (Graphics Processing Unit) can improve the image processing capability of micro-computer system. This paper studies the parallelism of the different stages of decimation in time radix 2 FFT algorithm, designs the butterfly and scramble kernels and implements 2D FFT on GPU. The experiment result demonstrates the validity and advantage over general CPU, especially in the condition of large input size. The approach can also be generalized to other transforms alike.

  Click for Model/Code and Paper
Inference for Network Structure and Dynamics from Time Series Data via Graph Neural Network

Jan 18, 2020
Mengyuan Chen, Jiang Zhang, Zhang Zhang, Lun Du, Qiao Hu, Shuo Wang, Jiaqi Zhu

Network structures in various backgrounds play important roles in social, technological, and biological systems. However, the observable network structures in real cases are often incomplete or unavailable due to measurement errors or private protection issues. Therefore, inferring the complete network structure is useful for understanding complex systems. The existing studies have not fully solved the problem of inferring network structure with partial or no information about connections or nodes. In this paper, we tackle the problem by utilizing time series data generated by network dynamics. We regard the network inference problem based on dynamical time series data as a problem of minimizing errors for predicting future states and proposed a novel data-driven deep learning model called Gumbel Graph Network (GGN) to solve the two kinds of network inference problems: Network Reconstruction and Network Completion. For the network reconstruction problem, the GGN framework includes two modules: the dynamics learner and the network generator. For the network completion problem, GGN adds a new module called the States Learner to infer missing parts of the network. We carried out experiments on discrete and continuous time series data. The experiments show that our method can reconstruct up to 100% network structure on the network reconstruction task. While the model can also infer the unknown parts of the structure with up to 90% accuracy when some nodes are missing. And the accuracy decays with the increase of the fractions of missing nodes. Our framework may have wide application areas where the network structure is hard to obtained and the time series data is rich.

  Click for Model/Code and Paper
CARAFE: Content-Aware ReAssembly of FEatures

May 07, 2019
Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin

Feature upsampling is a key operation in a number of modern convolutional network architectures, e.g. feature pyramids. Its design is critical for dense prediction tasks such as object detection and semantic/instance segmentation. In this work, we propose Content-Aware ReAssembly of FEatures (CARAFE), a universal, lightweight and highly effective operator to fulfill this goal. CARAFE has several appealing properties: (1) Large field of view. Unlike previous works (e.g. bilinear interpolation) that only exploit sub-pixel neighborhood, CARAFE can aggregate contextual information within a large receptive field. (2) Content-aware handling. Instead of using a fixed kernel for all samples (e.g. deconvolution), CARAFE enables instance specific content-aware handling, which generates adaptive kernels on-the-fly. (3) Lightweight and fast to compute. CARAFE introduces little computational overhead and can be readily integrated into modern network architectures. We conduct comprehensive evaluations on standard benchmarks in object detection, instance/semantic segmentation and inpainting. CARAFE shows consistent and substantial gains across all the tasks (1.2%, 1.3%, 1.8%, 1.1db respectively) with negligible computational overhead. It has great potential to serve as a strong building block for future research.

* Technical Report 

  Click for Model/Code and Paper
Optimizing Video Object Detection via a Scale-Time Lattice

Apr 16, 2018
Kai Chen, Jiaqi Wang, Shuo Yang, Xingcheng Zhang, Yuanjun Xiong, Chen Change Loy, Dahua Lin

High-performance object detection relies on expensive convolutional networks to compute features, often leading to significant challenges in applications, e.g. those that require detecting objects from video streams in real time. The key to this problem is to trade accuracy for efficiency in an effective way, i.e. reducing the computing cost while maintaining competitive performance. To seek a good balance, previous efforts usually focus on optimizing the model architectures. This paper explores an alternative approach, that is, to reallocate the computation over a scale-time space. The basic idea is to perform expensive detection sparsely and propagate the results across both scales and time with substantially cheaper networks, by exploiting the strong correlations among them. Specifically, we present a unified framework that integrates detection, temporal propagation, and across-scale refinement on a Scale-Time Lattice. On this framework, one can explore various strategies to balance performance and cost. Taking advantage of this flexibility, we further develop an adaptive scheme with the detector invoked on demand and thus obtain improved tradeoff. On ImageNet VID dataset, the proposed method can achieve a competitive mAP 79.6% at 20 fps, or 79.0% at 62 fps as a performance/speed tradeoff.

* Accepted to CVPR 2018. Project page: 

  Click for Model/Code and Paper
Side-Aware Boundary Localization for More Precise Object Detection

Dec 09, 2019
Jiaqi Wang, Wenwei Zhang, Yuhang Cao, Kai Chen, Jiangmiao Pang, Tao Gong, Jianping Shi, Chen Change Loy, Dahua Lin

Current object detection frameworks mainly rely on bounding box regression to localize objects. Despite the remarkable progress in recent years, the precision of bounding box regression remains unsatisfactory, hence limiting performance in object detection. We observe that precise localization requires careful placement of each side of the bounding box. However, the mainstream approach, which focuses on predicting centers and sizes, is not the most effective way to accomplish this task, especially when there exists displacements with large variance between the anchors and the targets.In this paper, we propose an alternative approach, named as Side-Aware Boundary Localization (SABL), where each side of the bounding box is respectively localized with a dedicated network branch. Moreover, to tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket. We test the proposed method on both two-stage and single-stage detection frameworks. Replacing the standard bounding box regression branch with the proposed design leads to significant improvements on Faster R-CNN, RetinaNet, and Cascade R-CNN, by 3.0%, 1.6%, and 0.9%, respectively. Code and models will be available at

* Technical Report 

  Click for Model/Code and Paper
Hybrid Task Cascade for Instance Segmentation

Jan 22, 2019
Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

Cascade is a classic yet powerful architecture that has boosted performance on various tasks. However, how to introduce cascade to instance segmentation remains an open question. A simple combination of Cascade R-CNN and Mask R-CNN only brings limited gain. In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation. In this work, we propose a new framework, Hybrid Task Cascade (HTC), which differs in two important aspects: (1) instead of performing cascaded refinement on these two tasks separately, it interweaves them for a joint multi-stage processing; (2) it adopts a fully convolutional branch to provide spatial context, which can help distinguishing hard foreground from cluttered background. Overall, this framework can learn more discriminative features progressively while integrating complementary features together in each stage. Without bells and whistles, a single HTC obtains 38.4% and 1.5% improvement over a strong Cascade Mask R-CNN baseline on MSCOCO dataset. More importantly, our overall system achieves 48.6 mask AP on the test-challenge dataset and 49.0 mask AP on test-dev, which are the state-of-the-art performance.

* Technical report. Winning entry of COCO 2018 Challenge (object detection task) 

  Click for Model/Code and Paper
MMDetection: Open MMLab Detection Toolbox and Benchmark

Jun 17, 2019
Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

We present MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules. The toolbox started from a codebase of MMDet team who won the detection track of COCO Challenge 2018. It gradually evolves into a unified platform that covers many popular detection methods and contemporary modules. It not only includes training and inference codes, but also provides weights for more than 200 network models. We believe this toolbox is by far the most complete detection toolbox. In this paper, we introduce the various features of this toolbox. In addition, we also conduct a benchmarking study on different methods, components, and their hyper-parameters. We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new detectors. Code and models are available at The project is under active development and we will keep this document updated.

* Technical report of MMDetection. 11 pages 

  Click for Model/Code and Paper
Deep Recurrent Convolutional Neural Network: Improving Performance For Speech Recognition

Dec 27, 2016
Zewang Zhang, Zheng Sun, Jiaqi Liu, Jingwen Chen, Zhao Huo, Xiao Zhang

A deep learning approach has been widely applied in sequence modeling problems. In terms of automatic speech recognition (ASR), its performance has significantly been improved by increasing large speech corpus and deeper neural network. Especially, recurrent neural network and deep convolutional neural network have been applied in ASR successfully. Given the arising problem of training speed, we build a novel deep recurrent convolutional network for acoustic modeling and then apply deep residual learning to it. Our experiments show that it has not only faster convergence speed but better recognition accuracy over traditional deep convolutional recurrent network. In the experiments, we compare the convergence speed of our novel deep recurrent convolutional networks and traditional deep convolutional recurrent networks. With faster convergence speed, our novel deep recurrent convolutional networks can reach the comparable performance. We further show that applying deep residual learning can boost the convergence speed of our novel deep recurret convolutional networks. Finally, we evaluate all our experimental networks by phoneme error rate (PER) with our proposed bidirectional statistical n-gram language model. Our evaluation results show that our newly proposed deep recurrent convolutional network applied with deep residual learning can reach the best PER of 17.33\% with the fastest convergence speed on TIMIT database. The outstanding performance of our novel deep recurrent convolutional neural network with deep residual learning indicates that it can be potentially adopted in other sequential problems.

* 11 pages, 13 figures 

  Click for Model/Code and Paper
CN-Probase: A Data-driven Approach for Large-scale Chinese Taxonomy Construction

Feb 27, 2019
Jindong Chen, Ao Wang, Jiangjie Chen, Yanghua Xiao, Zhendong Chu, Jingping Liu, Jiaqing Liang, Wei Wang

Taxonomies play an important role in machine intelligence. However, most well-known taxonomies are in English, and non-English taxonomies, especially Chinese ones, are still very rare. In this paper, we focus on automatic Chinese taxonomy construction and propose an effective generation and verification framework to build a large-scale and high-quality Chinese taxonomy. In the generation module, we extract isA relations from multiple sources of Chinese encyclopedia, which ensures the coverage. To further improve the precision of taxonomy, we apply three heuristic approaches in verification module. As a result, we construct the largest Chinese taxonomy with high precision about 95% called CN-Probase. Our taxonomy has been deployed on Aliyun, with over 82 million API calls in six months.

  Click for Model/Code and Paper
Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding

Dec 07, 2016
Zheng Sun, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, Xiao Zhang

Creating aesthetically pleasing pieces of art, including music, has been a long-term goal for artificial intelligence research. Despite recent successes of long-short term memory (LSTM) recurrent neural networks (RNNs) in sequential learning, LSTM neural networks have not, by themselves, been able to generate natural-sounding music conforming to music theory. To transcend this inadequacy, we put forward a novel method for music composition that combines the LSTM with Grammars motivated by music theory. The main tenets of music theory are encoded as grammar argumented (GA) filters on the training data, such that the machine can be trained to generate music inheriting the naturalness of human-composed pieces from the original dataset while adhering to the rules of music theory. Unlike previous approaches, pitches and durations are encoded as one semantic entity, which we refer to as note-level encoding. This allows easy implementation of music theory grammars, as well as closer emulation of the thinking pattern of a musician. Although the GA rules are applied to the training data and never directly to the LSTM music generation, our machine still composes music that possess high incidences of diatonic scale notes, small pitch intervals and chords, in deference to music theory.

* 6 pages, 4 figures 

  Click for Model/Code and Paper