Models, code, and papers for "Junjie Yin":

A Deep Learning Based Chatbot for Campus Psychological Therapy

Oct 09, 2019
Junjie Yin, Zixun Chen, Kelai Zhou, Chongyuan Yu

In this paper, we propose Evebot, an innovative, sequence-to-sequence (Seq2seq) based, fully generative conversational system for the diagnosis of negative emotions and the prevention of depression through positively suggestive responses. The system consists of an assembly of deep-learning based models, including a Bi-LSTM based model for detecting users' negative emotions and obtaining a psychological-counselling corpus for training the chatbot, an anti-language sequence-to-sequence neural network, and a maximum mutual information (MMI) model. As adolescents are reluctant to show their negative emotions in physical interaction, traditional methods of emotion analysis and comforting may not work. Therefore, this system emphasizes using a virtual platform to detect signs of depression or anxiety, channel adolescents' stress and mood, and thus prevent the emergence of mental illness. We launched the integrated chatbot system on an online platform for real-world campus applications. In a one-month user study, we observe a greater increase in positivity than with the other public chatbots in the control group.

* 31 pages 

Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder

Nov 05, 2018
Junjie Zeng, Long Qin, Yue Hu, Cong Hu, Quanjun Yin

In this paper, we present a hierarchical path planning framework called SG-RL (subgoal graphs-reinforcement learning) to plan rational paths for agents maneuvering in continuous and uncertain environments. By "rational", we mean (1) efficient path planning that eliminates first-move lags, and (2) collision-free, smooth paths that satisfy the agents' kinematic constraints. SG-RL works in a two-level manner. At the first level, SG-RL uses a geometric path-planning method, i.e., Simple Subgoal Graphs (SSG), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG-RL uses an RL method, i.e., Least-Squares Policy Iteration (LSPI), to learn near-optimal motion-planning policies that generate kinematically feasible and collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSG overcomes the sparse-reward and local-minimum-trap limitations of RL agents; thus, LSPI can be used to generate paths in complex environments. The second advantage is that, when the environment changes slightly (i.e., unexpected obstacles appear), SG-RL does not need to reconstruct subgoal graphs and replan subgoal sequences using SSG, since LSPI can deal with uncertainties by exploiting its generalization ability to handle changes in environments. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG-RL works well on large-scale maps with relatively low action-switching frequencies and shorter path lengths, and can deal with small changes in environments. We further demonstrate that the design of reward functions and the types of training environments are important factors for learning feasible policies.

* 20 pages 

A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis

Nov 11, 2019
Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang

In a Mandarin text-to-speech (TTS) system, the front-end text processing module significantly influences the intelligibility and naturalness of synthesized speech. Building a typical pipeline-based front-end, which consists of multiple individual components, requires extensive effort. In this paper, we propose a unified sequence-to-sequence front-end model for Mandarin TTS that converts raw text to linguistic features directly. Compared to the pipeline-based front-end, our unified front-end achieves comparable performance in polyphone disambiguation and prosody word prediction, and improves intonation phrase prediction by 0.0738 in F1 score. We also implemented the unified front-end with Tacotron and WaveRNN to build a Mandarin TTS system. Speech synthesized by this system achieved a MOS (4.38) comparable to that of the pipeline-based front-end (4.37) and close to that of human recordings (4.49).

* Submitted to ICASSP 2020 

A hybrid text normalization system using multi-head self-attention for mandarin

Nov 11, 2019
Junhui Zhang, Junjie Pan, Xiang Yin, Chen Li, Shichao Liu, Yang Zhang, Yuxuan Wang, Zejun Ma

In this paper, we propose a hybrid text normalization system using multi-head self-attention. The system combines the advantages of a rule-based model and a neural model for text preprocessing tasks. Previous studies in Mandarin text normalization usually use a set of hand-written rules, which are hard to improve on general cases. Our proposed system is motivated by the neural models from recent studies and achieves better performance on our internal news corpus. This paper also describes different attempts to deal with the imbalanced pattern distribution of the dataset. Overall, the performance of the system is improved by over 1.5% at the sentence level, and it has the potential to improve further.

* 4 pages of content, 1 page of reference, 3 figures, submitted to ICASSP 2020 

DeepCenterline: a Multi-task Fully Convolutional Network for Centerline Extraction

Mar 25, 2019
Zhihui Guo, Junjie Bai, Yi Lu, Xin Wang, Kunlin Cao, Qi Song, Milan Sonka, Youbing Yin

A novel centerline extraction framework is reported which combines an end-to-end trainable multi-task fully convolutional network (FCN) with a minimal path extractor. The FCN simultaneously computes centerline distance maps and detects branch endpoints. The method generates single-pixel-wide centerlines with no spurious branches. It handles arbitrary tree-structured objects with no prior assumption regarding the depth of the tree or its bifurcation pattern. It is also robust to substantial scale changes across different parts of the target object and to minor imperfections of the object's segmentation mask. To the best of our knowledge, this is the first deep-learning based centerline extraction method that guarantees a single-pixel-wide centerline for a complex tree-structured object. The proposed method is validated in coronary artery centerline extraction on a dataset of 620 patients (400 of which are used as the test set). This application is challenging due to the large number of coronary branches, branch tortuosity, and large variations in length, thickness, shape, etc. The proposed method generates well-positioned centerlines, exhibits fewer missing branches, and is more robust in the presence of minor imperfections of the object segmentation mask. Compared to a state-of-the-art traditional minimal path approach, our method improves the patient-level success rate of centerline extraction from 54.3% to 88.8% according to independent human expert review.

* Accepted by the international conference on Information Processing in Medical Imaging (IPMI) 2019 

Residual Attention based Network for Hand Bone Age Assessment

Dec 21, 2018
Eric Wu, Bin Kong, Xin Wang, Junjie Bai, Yi Lu, Feng Gao, Shaoting Zhang, Kunlin Cao, Qi Song, Siwei Lyu, Youbing Yin

Computerized automatic methods have been employed to boost the productivity as well as objectiveness of hand bone age assessment. These approaches make predictions according to the whole X-ray images, which include other objects that may introduce distractions. Instead, our framework is inspired by the clinical workflow (Tanner-Whitehouse) of hand bone age assessment, which focuses on the key components of the hand. The proposed framework is composed of two components: a Mask R-CNN subnet of pixelwise hand segmentation and a residual attention network for hand bone age assessment. The Mask R-CNN subnet segments the hands from X-ray images to avoid the distractions of other objects (e.g., X-ray tags). The hierarchical attention components of the residual attention subnet force our network to focus on the key components of the X-ray images and generate the final predictions as well as the associated visual supports, which is similar to the assessment procedure of clinicians. We evaluate the performance of the proposed pipeline on the RSNA pediatric bone age dataset and the results demonstrate its superiority over the previous methods.

Model-free prediction of spatiotemporal dynamical systems with recurrent neural networks: Role of network spectral radius

Oct 10, 2019
Junjie Jiang, Ying-Cheng Lai

A common difficulty in applications of machine learning is the lack of any general principle for guiding the choices of key parameters of the underlying neural network. Focusing on a class of recurrent neural networks - reservoir computing systems that have recently been exploited for model-free prediction of nonlinear dynamical systems, we uncover a surprising phenomenon: the emergence of an interval in the spectral radius of the neural network in which the prediction error is minimized. In a three-dimensional representation of the error versus time and spectral radius, the interval corresponds to the bottom region of a "valley." Such a valley arises for a variety of spatiotemporal dynamical systems described by nonlinear partial differential equations, regardless of the structure and the edge-weight distribution of the underlying reservoir network. We also find that, while the particular location and size of the valley would depend on the details of the target system to be predicted, the interval tends to be larger for undirected than for directed networks. The valley phenomenon can be beneficial to the design of optimal reservoir computing, representing a small step forward in understanding these machine-learning systems.

* 15 pages, 13 figures 
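
The spectral radius discussed in the abstract above is the largest absolute eigenvalue of the reservoir's weight matrix, and a common way to tune it is to rescale the matrix to a target value. The sketch below is only an illustration under stated assumptions, not the authors' code: it handles the symmetric (undirected) case only, estimates the radius by power iteration on the squared matrix, and the names `spectral_radius_symmetric`, `rescale_to_radius`, and `random_symmetric` are hypothetical.

```python
import math
import random


def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def spectral_radius_symmetric(a, iters=500, seed=0):
    """Estimate the spectral radius of a SYMMETRIC matrix.

    Power iteration is applied to A*A, whose eigenvalues are lambda^2 >= 0,
    so eigenvalues of opposite sign cannot make the iteration oscillate.
    """
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in a]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam_sq = 0.0
    for _ in range(iters):
        w = matvec(a, matvec(a, v))                     # apply A twice
        lam_sq = sum(wi * vi for wi, vi in zip(w, v))   # Rayleigh quotient of A*A
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return math.sqrt(max(lam_sq, 0.0))


def rescale_to_radius(a, target):
    """Scale all weights so that the spectral radius equals `target`."""
    rho = spectral_radius_symmetric(a)
    if rho == 0.0:
        raise ValueError("matrix has zero spectral radius; cannot rescale")
    return [[x * target / rho for x in row] for row in a]


def random_symmetric(n, density, seed):
    """Random sparse symmetric weight matrix (an assumed reservoir topology)."""
    rng = random.Random(seed)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < density:
                a[i][j] = a[j][i] = rng.uniform(-1.0, 1.0)
    return a


reservoir = rescale_to_radius(random_symmetric(20, 0.3, seed=1), target=0.9)
```

Sweeping `target` over a range of values and recording the prediction error for each is one way to look for the kind of low-error interval the abstract describes.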

Object-Part Attention Model for Fine-grained Image Classification

Sep 25, 2017
Yuxin Peng, Xiangteng He, Junjie Zhao

Fine-grained image classification aims to recognize hundreds of subcategories belonging to the same basic-level category, such as 200 subcategories of birds, which is highly challenging due to large variance within the same subcategory and small variance among different subcategories. Existing methods generally first locate the objects or parts and then discriminate which subcategory the image belongs to. However, they mainly have two limitations: (1) They rely on object or part annotations, which are highly labor-intensive. (2) They ignore the spatial relationships between the object and its parts as well as among these parts, both of which are significantly helpful for finding discriminative parts. Therefore, this paper proposes the object-part attention model (OPAM) for weakly supervised fine-grained image classification, and the main novelties are: (1) The object-part attention model integrates two levels of attention: object-level attention localizes the objects in images, and part-level attention selects discriminative parts of the object. Both are jointly employed to learn multi-view and multi-scale features to enhance their mutual promotion. (2) The object-part spatial constraint model combines two spatial constraints: the object spatial constraint ensures that selected parts are highly representative, and the part spatial constraint eliminates redundancy and enhances the discrimination of selected parts. Both are jointly employed to exploit the subtle and local differences for distinguishing the subcategories. Importantly, neither object nor part annotations are used in our proposed approach, which avoids the heavy labor of labeling. Compared with more than 10 state-of-the-art methods on 4 widely-used datasets, our OPAM approach achieves the best performance.

* 14 pages, submitted to IEEE Transactions on Image Processing 

Visualization of Convolutional Neural Networks for Monocular Depth Estimation

Apr 06, 2019
Junjie Hu, Yan Zhang, Takayuki Okatani

Recently, convolutional neural networks (CNNs) have shown great success on the task of monocular depth estimation. A fundamental yet unanswered question is how CNNs can infer depth from a single image. Toward answering this question, we visualize the inference of a CNN by identifying the pixels of an input image that are relevant to depth estimation. We formulate this as an optimization problem of identifying the smallest number of image pixels from which the CNN can estimate a depth map with the minimum difference from the estimate from the entire image. To cope with the difficulty of optimizing through a deep CNN, we propose to use another network to predict those relevant image pixels in a forward computation. In our experiments, we first show the effectiveness of this approach, and then apply it to different depth estimation networks on indoor and outdoor scene datasets. The results provide several findings that help explore the above question.

Detecting Collusive Cliques in Futures Markets Based on Trading Behaviors from Real Data

Oct 07, 2011
Junjie Wang, Shuigeng Zhou, Jihong Guan

In financial markets, abnormal trading behaviors pose a serious challenge to market surveillance and risk management. What is worse, there is an increasing number of abnormal trading events in which experienced traders form a collusive clique and collaborate to manipulate some instruments, misleading other investors by applying similar trading behaviors to maximize their personal benefits. In this paper, a method is proposed to detect the hidden collusive cliques involved in an instrument of futures markets by first calculating the correlation coefficient between any two eligible unified aggregated time series of signed order volume, and then combining the connected components from multiple sparsified weighted graphs constructed from the correlation matrices, where each retained correlation coefficient is over a user-specified threshold. Experiments conducted on real order data from the Shanghai Futures Exchange show that the proposed method can effectively detect suspect collusive cliques. A tool based on the proposed method has been deployed in the exchange as a pilot application for futures market surveillance and risk management.

* 13 pages, 5 figures and 3 tables. submitted to Neurocomputing 
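
The detection pipeline in the abstract above (pairwise correlation of signed order-volume series, thresholding, then connected components) can be sketched for a single graph as follows. This is a simplified illustration, not the deployed tool: it builds one thresholded graph rather than combining several sparsified graphs, and the trader names, the sample series, and the function `candidate_cliques` are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def candidate_cliques(series, threshold):
    """Group accounts whose series correlate above `threshold`.

    An edge joins two accounts when their correlation exceeds the threshold;
    connected components with more than one member are returned.
    """
    parent = {name: name for name in series}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(series, 2):
        if pearson(series[a], series[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in series:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]


# Made-up signed order volumes per time bucket (positive = buy, negative = sell).
orders = {
    "trader_a": [5, -3, 4, -2, 6, -1],
    "trader_b": [4, -2, 5, -3, 5, -2],
    "trader_c": [6, -4, 3, -1, 7, -2],
    "trader_d": [-1, 2, 0, 3, -2, 1],
}
suspects = candidate_cliques(orders, threshold=0.9)
```

In this toy data the first three traders move together and end up in one component, while `trader_d` trades against them and is left out.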

An Efficient Optical Flow Based Motion Detection Method for Non-stationary Scenes

Nov 21, 2018
Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

Real-time motion detection in non-stationary scenes is a difficult task due to dynamic backgrounds, changing foreground appearance, and limited computational resources. These challenges degrade the performance of existing methods in practical applications. In this paper, an optical flow based framework is proposed to address this problem. By applying a novel strategy for utilizing optical flow, our method requires no model construction, training, or updating and can be performed efficiently. In addition, a dual judgment mechanism with adaptive intervals and adaptive thresholds is designed to improve the system's adaptation to different situations. In the experiments, we quantitatively and qualitatively validate the effectiveness and feasibility of our method on videos under various scene conditions. The experimental results show that our method adapts itself to different situations and outperforms the state-of-the-art real-time methods, indicating the advantages of our optical flow based method.

* 6 pages. arXiv admin note: substantial text overlap with arXiv:1807.04890 

Signed Graph Attention Networks

Jun 26, 2019
Junjie Huang, Huawei Shen, Liang Hou, Xueqi Cheng

Graph or network data is ubiquitous in the real world, including social networks, information networks, traffic networks, biological networks and various technical networks. The non-Euclidean nature of graph data poses a challenge for modeling and analyzing it. Recently, the Graph Neural Network (GNN) has been proposed as a general and powerful framework to handle tasks on graph data, e.g., node embedding, link prediction and node classification. As a representative implementation of GNNs, the Graph Attention Network (GAT) has been successfully applied in a variety of tasks on real datasets. However, GAT is designed for networks with only positive links and fails to handle signed networks, which contain both positive and negative links. In this paper, we propose Signed Graph Attention Networks (SiGAT), generalizing GAT to signed networks. SiGAT incorporates graph motifs into GAT to capture two well-known theories in signed network research, i.e., balance theory and status theory. In SiGAT, motifs offer a flexible structural pattern for aggregating and propagating messages on the signed network to generate node embeddings. We evaluate the proposed SiGAT method by applying it to the signed link prediction task. Experimental results on three real datasets demonstrate that SiGAT outperforms feature-based methods, network embedding methods and state-of-the-art GNN-based methods such as signed graph convolutional networks (SGCN).

* Accepted and to appear at ICANN2019 
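
Balance theory, one of the two theories named in the abstract above, is usually stated on triangles: a triangle is balanced when the product of its three edge signs is positive ("the friend of my friend is my friend", "the enemy of my enemy is my friend"). The sketch below only illustrates that textbook rule on an undirected signed graph; it is not SiGAT's motif extraction, and the function `classify_triangles` and the toy edges are assumptions.

```python
from itertools import combinations


def classify_triangles(edges):
    """Split the triangles of an undirected signed graph by balance.

    `edges` maps a node pair (u, v) to +1 or -1. A triangle is balanced when
    the product of its three signs is positive, and unbalanced otherwise.
    """
    sign = {}
    nodes = set()
    for (u, v), s in edges.items():
        sign[frozenset((u, v))] = s
        nodes.update((u, v))

    balanced, unbalanced = [], []
    for a, b, c in combinations(sorted(nodes), 3):
        pairs = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(p in sign for p in pairs):  # all three edges must exist
            product = sign[pairs[0]] * sign[pairs[1]] * sign[pairs[2]]
            (balanced if product > 0 else unbalanced).append((a, b, c))
    return balanced, unbalanced


# Toy signed graph: +1 marks a positive link, -1 a negative link.
toy_edges = {
    ("u", "v"): +1,
    ("v", "w"): -1,
    ("u", "w"): -1,
    ("w", "x"): +1,
    ("u", "x"): +1,
}
balanced, unbalanced = classify_triangles(toy_edges)
```

Here `(u, v, w)` has two negative links and is balanced, while `(u, w, x)` has exactly one negative link and is unbalanced.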

Impression Network for Video Object Detection

Dec 16, 2017
Congrui Hetang, Hongwei Qin, Shaohui Liu, Junjie Yan

Video object detection is more challenging than image object detection. Previous works have shown that applying an object detector frame by frame is not only slow but also inaccurate. Visual clues are weakened by defocus and motion blur, causing failures on the corresponding frames. Multi-frame feature fusion methods have proved effective in improving accuracy, but they dramatically sacrifice speed. Feature propagation based methods have proved effective in improving speed, but they sacrifice accuracy. So is it possible to improve speed and performance simultaneously? Inspired by how humans use impressions to recognize objects in blurry frames, we propose Impression Network, which embodies a natural and efficient feature aggregation mechanism. In our framework, an impression feature is established by iteratively absorbing sparsely extracted frame features. The impression feature is propagated all the way down the video, helping enhance the features of low-quality frames. This impression mechanism makes it possible to perform long-range multi-frame feature fusion among sparse keyframes with minimal overhead. It significantly improves the per-frame detection baseline on ImageNet VID while being 3 times faster (20 fps). We hope Impression Network can provide a new perspective on video feature enhancement. Code will be made available.

* Tech Report 

AlphaStock: A Buying-Winners-and-Selling-Losers Investment Strategy using Interpretable Deep Reinforcement Attention Networks

Jul 24, 2019
Jingyuan Wang, Yang Zhang, Ke Tang, Junjie Wu, Zhang Xiong

Recent years have witnessed the successful marriage of finance innovations and AI techniques in various finance applications including quantitative trading (QT). Despite great research efforts devoted to leveraging deep learning (DL) methods for building better QT strategies, existing studies still face serious challenges especially from the side of finance, such as the balance of risk and return, the resistance to extreme loss, and the interpretability of strategies, which limit the application of DL-based strategies in real-life financial markets. In this work, we propose AlphaStock, a novel reinforcement learning (RL) based investment strategy enhanced by interpretable deep attention networks, to address the above challenges. Our main contributions are summarized as follows: i) We integrate deep attention networks with a Sharpe ratio-oriented reinforcement learning framework to achieve a risk-return balanced investment strategy; ii) We suggest modeling interrelationships among assets to avoid selection bias and develop a cross-asset attention mechanism; iii) To our best knowledge, this work is among the first to offer an interpretable investment strategy using deep reinforcement learning models. The experiments on long-periodic U.S. and Chinese markets demonstrate the effectiveness and robustness of AlphaStock over diverse market states. It turns out that AlphaStock tends to select the stocks as winners with high long-term growth, low volatility, high intrinsic value, and being undervalued recently.

* Accepted for POSTER presentation at KDD2019 Applied Data Science Track 
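
The Sharpe ratio that the abstract above uses as its reinforcement learning objective measures return per unit of risk. The sketch below uses the common textbook form (mean excess return divided by its sample standard deviation); the exact variant used by AlphaStock is not given in the abstract, and the function name and return series are illustrative assumptions.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation.

    `returns` holds per-period returns and `risk_free` is the per-period
    risk-free rate (assumed constant here).
    """
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        raise ValueError("returns have zero volatility; ratio is undefined")
    return statistics.mean(excess) / spread


# Two made-up strategies with the same average return but different volatility.
steady = [0.01, 0.02, 0.01, 0.02]
volatile = [0.10, -0.07, 0.09, -0.06]
steady_score = sharpe_ratio(steady)
volatile_score = sharpe_ratio(volatile)
```

Both series average 1.5% per period, but the steadier one scores higher, which is the risk-return balance an objective of this kind rewards.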

CRAFT Objects from Images

Apr 12, 2016
Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li

Object detection is a fundamental problem in image understanding. One popular solution is the R-CNN framework and its fast versions. They decompose the object detection problem into two cascaded easier tasks: 1) generating object proposals from images, 2) classifying proposals into various object categories. Although these two tasks are relatively easier, they are not solved perfectly and there is still room for improvement. In this paper, we push the "divide and conquer" solution even further by dividing each task into two sub-tasks. We call the proposed method "CRAFT" (Cascade Region-proposal-network And FasT-rcnn), which tackles each task with a carefully designed network cascade. We show that the cascade structure helps in both tasks: in proposal generation, it provides more compact and better localized object proposals; in object classification, it reduces false positives (mainly between ambiguous categories) by capturing both inter- and intra-category variances. CRAFT achieves consistent and considerable improvement over the state-of-the-art on object detection benchmarks like PASCAL VOC 07/12 and ILSVRC.

* CVPR2016 

SGD Converges to Global Minimum in Deep Learning via Star-convex Path

Jan 02, 2019
Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding of how and why SGD can train these complex networks towards a global minimum. In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. Our argument exploits the following two important properties: 1) the training loss can achieve zero value (approximately), which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. In such a context, our analysis shows that SGD, although it has long been considered a randomized algorithm, converges in an intrinsically deterministic manner to a global minimum.

* ICLR2019 
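
For readers unfamiliar with the term used in the abstract above, the following states the textbook notion of star-convexity and the one-step SGD inequality it yields. The symbols ($\ell$ for a sampled loss, $x^*$ for a global minimizer, $\eta$ for the step size, $g_k$ for the stochastic gradient at $x_k$) are assumptions chosen here, and the paper's exact path-wise condition may differ from this general form.

```latex
% Star-convexity of \ell with respect to a minimizer x^*:
\ell\big(\alpha x^* + (1-\alpha)\,x\big) \;\le\; \alpha\,\ell(x^*) + (1-\alpha)\,\ell(x),
\qquad \forall x,\ \alpha \in [0,1].

% For differentiable \ell, dividing by \alpha and letting \alpha \to 0 gives
\langle \nabla \ell(x),\, x - x^* \rangle \;\ge\; \ell(x) - \ell(x^*).

% One SGD step x_{k+1} = x_k - \eta\, g_k with g_k = \nabla \ell(x_k) then satisfies
\|x_{k+1} - x^*\|^2
  \;=\; \|x_k - x^*\|^2 - 2\eta\,\langle g_k,\, x_k - x^* \rangle + \eta^2 \|g_k\|^2
  \;\le\; \|x_k - x^*\|^2 - 2\eta\,\big(\ell(x_k) - \ell(x^*)\big) + \eta^2 \|g_k\|^2 .
```

When the step size is small enough that the last term is dominated by the loss gap, the distance to $x^*$ shrinks at every step, which gives some intuition for how a stochastic method could behave deterministically along such a path.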

Factorized Attention: Self-Attention with Linear Complexities

Dec 04, 2018
Zhuoran Shen, Mingyuan Zhang, Shuai Yi, Junjie Yan, Haiyu Zhao

Recent works have been applying self-attention to various fields in computer vision and natural language processing. However, the memory and computational demands of existing self-attention operations grow quadratically with the spatiotemporal size of the input. This prohibits the application of self-attention on large inputs, e.g., long sequences, high-definition images, or large videos. To remedy this, this paper proposes a novel factorized attention (FA) module, which achieves the same expressive power as previous approaches with substantially less memory and computational consumption. The resource-efficiency allows more widespread and flexible application of it. Empirical evaluations on object recognition demonstrate the effectiveness of these advantages. FA-augmented models achieved state-of-the-art performance for object detection and instance segmentation on MS-COCO. Further, the resource-efficiency of FA democratizes self-attention to fields where the prohibitively high costs currently prevent its application. The state-of-the-art result for stereo depth estimation on the Scene Flow dataset exemplifies this.
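
One well-known route from quadratic to linear cost in attention-like products is matrix associativity: computing `K^T V` first avoids ever forming the n-by-n matrix `Q K^T`. The sketch below demonstrates only that reordering on plain, unnormalized matrices; the abstract above does not say how FA normalizes or factorizes its attention, so this is an assumed illustration rather than the paper's module, and all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def quadratic_order(q, k, v):
    """(Q K^T) V: builds an n-by-n intermediate, so cost grows with n^2."""
    return matmul(matmul(q, transpose(k)), v)


def linear_order(q, k, v):
    """Q (K^T V): builds a d-by-d intermediate, so cost grows linearly in n."""
    return matmul(q, matmul(transpose(k), v))


def multiplication_counts(n, d_k, d_v):
    """Scalar multiplications needed by each ordering."""
    quadratic = n * n * d_k + n * n * d_v
    linear = d_k * n * d_v + n * d_k * d_v
    return quadratic, linear


# n = 4 positions, d = 2 channels; integer entries keep the comparison exact.
q = [[1, 2], [3, 4], [5, 6], [7, 8]]
k = [[1, 0], [0, 1], [1, 1], [2, 1]]
v = [[1, 1], [2, 0], [0, 3], [1, 2]]
same_result = quadratic_order(q, k, v) == linear_order(q, k, v)
```

Both orderings give identical outputs, but only the first needs memory proportional to the square of the input size, which is the bottleneck the abstract describes for long sequences and large images.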

Deep Cocktail Network: Multi-source Unsupervised Domain Adaptation with Category Shift

Mar 02, 2018
Ruijia Xu, Ziliang Chen, Wangmeng Zuo, Junjie Yan, Liang Lin

Unsupervised domain adaptation (UDA) conventionally assumes that labeled source samples come from a single underlying source distribution. In practical scenarios, however, labeled data are typically collected from diverse sources. The multiple sources differ not only from the target but also from each other; thus, the domain adapters should not all be modeled in the same way. Moreover, those sources may not completely share their categories, which brings a further transfer challenge called category shift. In this paper, we propose a deep cocktail network (DCTN) to battle the domain and category shifts among multiple sources. Motivated by the theoretical results in \cite{mansour2009domain}, the target distribution can be represented as a weighted combination of the source distributions, and multi-source unsupervised domain adaptation via DCTN is then performed as two alternating steps: i) It deploys multi-way adversarial learning to minimize the discrepancy between the target and each of the multiple source domains, which also yields source-specific perplexity scores denoting the possibilities that a target sample belongs to the different source domains. ii) The multi-source category classifiers are integrated with the perplexity scores to classify target samples, and the pseudo-labeled target samples together with the source samples are used to update the multi-source category classifiers and the feature extractor. We evaluate DCTN on three domain adaptation benchmarks, which clearly demonstrate the superiority of our framework.

* Accepted for publication in Conference on Computer Vision and Pattern Recognition(CVPR), 2018 
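
The idea in the abstract above of classifying a target sample by weighting each source classifier with a source-specific score can be sketched as a normalized weighted average. This is a deliberately simplified illustration: it assumes all sources share the same categories (so it ignores category shift), treats the perplexity scores as given non-negative numbers, and the function `combine_predictions` is hypothetical rather than DCTN's actual rule.

```python
def combine_predictions(source_probs, perplexity_scores):
    """Weighted average of per-source class probabilities.

    `source_probs[s][c]` is source classifier s's probability for class c, and
    `perplexity_scores[s]` says how strongly the target sample resembles
    source s (assumed non-negative, with at least one positive value).
    """
    total = sum(perplexity_scores)
    if total <= 0:
        raise ValueError("at least one perplexity score must be positive")
    weights = [score / total for score in perplexity_scores]
    n_classes = len(source_probs[0])
    return [sum(w * probs[c] for w, probs in zip(weights, source_probs))
            for c in range(n_classes)]


# Two source classifiers voting on two classes; the sample looks more like source 0.
combined = combine_predictions(
    source_probs=[[0.8, 0.2], [0.4, 0.6]],
    perplexity_scores=[3.0, 1.0],
)
predicted_class = max(range(len(combined)), key=combined.__getitem__)
```

Because the weights are normalized, the combined vector remains a probability distribution whenever each source's output is one.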

Learning to Cluster Faces on an Affinity Graph

May 05, 2019
Lei Yang, Xiaohang Zhan, Dapeng Chen, Junjie Yan, Chen Change Loy, Dahua Lin

Face recognition has seen remarkable progress in recent years, and its performance has reached a very high level. Taking it to the next level requires substantially larger data, which would involve prohibitive annotation cost. Hence, exploiting unlabeled data becomes an appealing alternative. Recent works have shown that clustering unlabeled faces is a promising approach, often leading to notable performance gains. Yet, how to cluster effectively, especially on a large-scale (i.e., million-level or above) dataset, remains an open question. A key challenge lies in the complex variations of cluster patterns, which make it difficult for conventional clustering methods to meet the needed accuracy. This work explores a novel approach, namely, learning to cluster instead of relying on hand-crafted criteria. Specifically, we propose a framework based on graph convolutional networks, which combines a detection module and a segmentation module to pinpoint face clusters. Experiments show that our method yields significantly more accurate face clusters, which, as a result, also lead to further performance gains in face recognition.

* 8 pages, 8 figures, CVPR 2019 
