Models, code, and papers for "Hui Shi":

##### Sequential VAE-LSTM for Anomaly Detection on Time Series

Oct 10, 2019
Run-Qing Chen, Guang-Hui Shi, Wan-Lei Zhao, Chang-Hui Liang

To support stable web-based applications and services, anomalies in IT performance status must be detected in a timely manner. Moreover, the performance trend across the time series should be predicted. In this paper, we propose SeqVL (Sequential VAE-LSTM), a neural network model based on both VAE (Variational Auto-Encoder) and LSTM (Long Short-Term Memory). This work is the first attempt to integrate unsupervised anomaly detection and trend prediction under one framework. Moreover, this model performs considerably better on detection and prediction than VAE or LSTM working alone. On unsupervised anomaly detection, SeqVL achieves competitive experimental results compared with other state-of-the-art methods on public datasets. On trend prediction, SeqVL outperforms several classic time series prediction models in experiments on a public dataset.

* 7 pages, 4 figures
##### Gastroscopic Panoramic View: Application to Automatic Polyps Detection under Gastroscopy

Oct 19, 2019
Shi Chenfei, Yan Xue, Chuan Jiang, Hui Tian, Bei Liu

Endoscopic diagnosis is an important means of gastric polyp detection. In this paper, a panoramic image of gastroscopy is developed, which can display the inner surface of the stomach intuitively and comprehensively. Moreover, the proposed automatic detection solution can help doctors locate polyps automatically and reduce missed diagnoses. The main contributions of this paper are as follows. Firstly, a gastroscopic panorama reconstruction method is developed. The reconstruction does not require additional hardware devices and properly solves the problems of texture dislocation and illumination imbalance. Secondly, an end-to-end multi-object detector for gastroscopic panoramas is trained based on a deep learning framework. Compared with traditional solutions, the automatic polyp detection system can locate all polyps on the inner wall of the stomach in real time and assist doctors in finding the lesions. Thirdly, the system was evaluated in the Affiliated Hospital of Zhejiang University. The results show that the average error of the panorama is less than 2 mm, the accuracy of the polyp detection is 95%, and the recall rate is 99%. In addition, the research roadmap of this paper has guiding significance for endoscopy-assisted detection in other human soft cavities.

##### Clustering by Orthogonal NMF Model and Non-Convex Penalty Optimization

Jun 03, 2019
Shuai Wang, Tsung-Hui Chang, Ying Cui, Jong-Shi Pang

The non-negative matrix factorization (NMF) model with an additional orthogonality constraint on one of the factor matrices, called the orthogonal NMF (ONMF), has been found to provide improved clustering performance over the K-means. Solving the ONMF model is a challenging optimization problem due to the existence of both orthogonality and nonnegativity constraints, and most of the existing methods directly deal with the orthogonality constraint in its original form via various optimization techniques. In this paper, we propose a new ONMF based clustering formulation that equivalently transforms the orthogonality constraint into a set of norm-based non-convex equality constraints. We then apply a non-convex penalty (NCP) approach to add the non-convex equality constraints to the objective as penalty terms, leaving simple non-negativity constraints only in the penalized problem. One smooth penalty formulation and one non-smooth penalty formulation are respectively studied, and theoretical conditions for the penalized problems to provide feasible stationary solutions to the ONMF based clustering problem are presented. Experimental results based on both synthetic and real datasets are presented to show that the proposed NCP methods are computationally time efficient, and either match or outperform the existing K-means and ONMF based methods in terms of the clustering performance.

##### Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation

Sep 04, 2018
Xinge Zhu, Hui Zhou, Ceyuan Yang, Jianping Shi, Dahua Lin

Due to the expensive and time-consuming annotations (e.g., segmentation) for real-world images, recent works in computer vision resort to synthetic data. However, the performance on real images often drops significantly because of the domain shift between the synthetic data and the real images. In this setting, domain adaptation is an appealing option. Effective domain adaptation approaches shape representations that (1) are discriminative for the main task and (2) generalize well under domain shift. To this end, we propose a novel loss function, i.e., Conservative Loss, which penalizes the extremely good and bad cases while encouraging the moderate examples. More specifically, it enables the network to learn features that are discriminative via gradient descent and invariant to the change of domains via gradient ascent. Extensive experiments on synthetic-to-real segmentation adaptation show that our proposed method achieves state-of-the-art results. Ablation studies give more insights into the properties of the Conservative Loss. Exploratory experiments and discussion demonstrate that our Conservative Loss is flexible rather than restricted to an exact form.

* ECCV 2018
##### Worst-Case Linear Discriminant Analysis as Scalable Semidefinite Feasibility Problems

Nov 27, 2014
Hui Li, Chunhua Shen, Anton van den Hengel, Qinfeng Shi

In this paper, we propose an efficient semidefinite programming (SDP) approach to worst-case linear discriminant analysis (WLDA). Compared with traditional LDA, WLDA considers the dimensionality reduction problem from the worst-case viewpoint, which is in general more robust for classification. However, the original WLDA problem is non-convex and difficult to optimize. We reformulate the WLDA optimization problem as a sequence of semidefinite feasibility problems. To solve these feasibility problems efficiently, we design a new scalable optimization method with quasi-Newton methods and eigen-decomposition as the core components. The proposed method is orders of magnitude faster than standard interior-point based SDP solvers. Experiments on a variety of classification problems demonstrate that our approach achieves better performance than standard LDA. Our method is also much faster and more scalable than WLDA based on standard interior-point SDP solvers. The computational complexity for an SDP with $m$ constraints and matrices of size $d$ by $d$ is roughly reduced from $\mathcal{O}(m^3+md^3+m^2d^2)$ to $\mathcal{O}(d^3)$ ($m>d$ in our case).

* 14 pages
##### Residual Block-based Multi-Label Classification and Localization Network with Integral Regression for Vertebrae Labeling

Jan 01, 2020
Chunli Qin, Demin Yao, Han Zhuang, Hui Wang, Yonghong Shi, Zhijian Song

Accurate identification and localization of the vertebrae in CT scans is a critical and standard preprocessing step for clinical spinal diagnosis and treatment. Existing methods are mainly based on the integration of multiple neural networks, and most of them use a Gaussian heat map to locate each vertebra's centroid. However, the process of obtaining the centroid coordinates from heat maps is non-differentiable, so it is impossible to train the network to label the vertebrae directly. Therefore, for end-to-end differentiable training of vertebra coordinates on CT scans, a robust and accurate automatic vertebral labeling algorithm is proposed in this study. Firstly, a novel residual-based multi-label classification and localization network is developed, which can not only capture multi-scale features but also utilize the residual module and skip connections to fuse multi-level features. Secondly, to solve the problem that the process of finding coordinates is non-differentiable and the spatial structure is not destructible, an integral regression module is used in the localization network. It combines the advantages of heat map representation and direct coordinate regression to achieve end-to-end training, and is compatible with any heat-map-based key point detection method for medical images. Finally, multi-label classification of the vertebrae is carried out, which uses bidirectional long short-term memory (Bi-LSTM) to enhance the learning of long contextual information and improve the classification performance. The proposed method is evaluated on a challenging dataset and the results are significantly better than the state-of-the-art methods (mean localization error <3 mm).
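
The integral regression idea mentioned in this abstract can be illustrated with a small sketch. It is not the authors' network: it assumes a 2D heat map stored as nested lists (the paper works on CT volumes) and a hypothetical function name `integral_regression`. Instead of taking the non-differentiable argmax, the heat map is normalized with a softmax and the coordinate is read off as the expected position.

```python
import math

def integral_regression(heatmap):
    """Return the expected (row, col) position under a softmax of the heat map."""
    # Subtract the peak value before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    # Expected coordinate = sum over pixels of position * probability.
    exp_row = sum(r * sum(row) for r, row in enumerate(weights)) / total
    exp_col = sum(c * w for row in weights for c, w in enumerate(row)) / total
    return exp_row, exp_col

# Toy heat map (assumed data): one sharp peak at row 2, column 3.
heatmap = [[0.0] * 5 for _ in range(4)]
heatmap[2][3] = 30.0
row, col = integral_regression(heatmap)
print(round(row, 3), round(col, 3))
```

Because every step is a smooth function of the heat map values, a coordinate loss could in principle be back-propagated through it, which is the property the abstract relies on.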

* 10 pages with 9 figures
##### Learning to Synthesize Fashion Textures

Nov 18, 2019
Wu Shi, Tak-Wai Hui, Ziwei Liu, Dahua Lin, Chen Change Loy

Existing unconditional generative models mainly focus on modeling general objects, such as faces and indoor scenes. Fashion textures, another important type of visual element around us, have not been extensively studied. In this work, we propose an effective generative model for fashion textures and also comprehensively investigate the key components involved: internal representation, latent space sampling and the generator architecture. We use the Gram matrix as a suitable internal representation for modeling realistic fashion textures, and further design two dedicated modules for modulating the Gram matrix into a low-dimensional vector. Since fashion textures are scale-dependent, we propose a recursive auto-encoder to capture the dependency between multiple granularity levels of texture features. Another important observation is that fashion textures are multi-modal. We fit and sample from a Gaussian mixture model in the latent space to improve the diversity of the generated textures. Extensive experiments demonstrate that our approach is capable of synthesizing more realistic and diverse fashion textures than other state-of-the-art methods.

##### Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

Sep 07, 2019
Wenjie Shi, Shiji Song, Hui Wu, Ya-Chu Hsu, Cheng Wu, Gao Huang

Model-free deep reinforcement learning (RL) algorithms have been widely used for a range of complex control tasks. However, slow convergence and sample inefficiency remain challenging problems in RL, especially when handling continuous and high-dimensional state spaces. To tackle this problem, we propose a general acceleration method for model-free, off-policy deep RL algorithms by drawing on the idea underlying regularized Anderson acceleration (RAA), which is an effective approach to accelerating the solution of fixed-point problems with perturbations. Specifically, we first explain how policy iteration can be applied directly with Anderson acceleration. Then we extend RAA to the case of deep RL by introducing a regularization term to control the impact of perturbations induced by function approximation errors. We further propose two strategies, i.e., progressive update and adaptive restart, to enhance the performance. The effectiveness of our method is evaluated on a variety of benchmark tasks, including Atari 2600 and MuJoCo. Experimental results show that our approach substantially improves both the learning speed and final performance of state-of-the-art deep RL algorithms.
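
To make the fixed-point view concrete, here is a minimal sketch of Anderson acceleration with a Tikhonov regularizer on a scalar toy problem. It is not the paper's deep RL algorithm: the function names, the window size of 2, and the choice to scale the regularizer by the largest squared residual are all assumptions of this sketch. The mixing weights minimize the squared combined residual plus a penalty on the weights, subject to the weights summing to one.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def regularized_anderson(g, x0, window=2, lam=1e-8, tol=1e-10, max_iter=100):
    """Approximate a fixed point x = g(x) of a scalar map g."""
    xs, gs = [x0], [g(x0)]
    for it in range(1, max_iter + 1):
        res = [gi - xi for gi, xi in zip(gs, xs)]  # residuals g(x_i) - x_i
        if abs(res[-1]) < tol:
            return xs[-1], it
        n = len(res)
        # Regularizer scaled by the largest squared residual (assumption).
        mu = lam * max(r * r for r in res)
        gram = [[res[i] * res[j] + (mu if i == j else 0.0) for j in range(n)]
                for i in range(n)]
        # Weights: alpha = (R^T R + mu I)^{-1} 1, normalized to sum to one.
        z = solve(gram, [1.0] * n)
        total = sum(z)
        alpha = [zi / total for zi in z]
        x_new = sum(a * gi for a, gi in zip(alpha, gs))
        xs.append(x_new)
        gs.append(g(x_new))
        xs, gs = xs[-window:], gs[-window:]  # keep only the recent history
    return xs[-1], max_iter

x_star, iterations = regularized_anderson(math.cos, 1.0)
print(x_star, iterations)
```

With `window=1` the update reduces to the plain iteration `x = g(x)`; a larger window reuses earlier iterates to extrapolate, and the regularizer keeps the weights bounded when the residuals are noisy or nearly collinear.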

* 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)
##### Robust Multi-Modality Multi-Object Tracking

Sep 09, 2019
Wenwei Zhang, Hui Zhou, Shuyang Sun, Zhe Wang, Jianping Shi, Chen Change Loy

Multi-sensor perception is crucial to ensure reliability and accuracy in autonomous driving systems, while multi-object tracking (MOT) improves on that by tracing the sequential movement of dynamic objects. Most current approaches to multi-sensor multi-object tracking either lack reliability because they rely tightly on a single input source (e.g., the center camera), or are not accurate enough because they fuse the results from multiple sensors in post-processing without fully exploiting the inherent information. In this study, we design a generic sensor-agnostic multi-modality MOT framework (mmMOT), where each modality (i.e., sensor) is capable of performing its role independently to preserve reliability, while accuracy is further improved through a novel multi-modality fusion module. Our mmMOT can be trained in an end-to-end manner, which enables joint optimization of the base feature extractor of each modality and an adjacency estimator across modalities. Our mmMOT also makes the first attempt to encode a deep representation of the point cloud in the data association process in MOT. We conduct extensive experiments to evaluate the effectiveness of the proposed framework on the challenging KITTI benchmark and report state-of-the-art performance. Code and models are available at https://github.com/ZwwWayne/mmMOT.

* To appear in ICCV 2019. Code and models are available at https://github.com/ZwwWayne/mmMOT
##### From Non-Paying to Premium: Predicting User Conversion in Video Games with Ensemble Learning

Retaining premium players is key to the success of free-to-play games, but most of them do not start purchasing right after joining the game. By exploiting the exceptionally rich datasets recorded by modern video games (which provide information on the individual behavior of each and every player), survival analysis techniques can be used to predict which players are more likely to become paying (or even premium) users and when, both in terms of time and game level, the conversion will take place. Here we show that a traditional semi-parametric model (Cox regression), a random survival forest (RSF) technique and a method based on conditional inference survival ensembles all yield very promising results. However, the last approach has the advantage of being able to correct the inherent bias in RSF models by dividing the procedure into two steps: first selecting the best predictor to perform the splitting and then the best split point for that covariate. The proposed conditional inference survival ensembles method could be readily used in operational environments for early identification of premium players and the parts of the game that may prompt them to become paying users. Such knowledge would allow developers to induce their conversion and, more generally, to better understand the needs of their players and provide them with a personalized experience, thereby increasing their engagement and paving the way to higher monetization.

* social games, conversion prediction, ensemble methods, survival analysis, online games, user behavior

Recently, autonomous driving development has ignited competition among car makers and technology corporations. Low-level automation cars are already commercially available. But highly automated vehicles, where the vehicle drives itself without human monitoring, are still in their infancy. Such autonomous vehicles (AVs) rely on the computing system in the car to interpret the environment and make driving decisions. Therefore, computing system design is essential, particularly for achieving driving safety. However, to our knowledge, no clear guideline exists so far regarding safety-aware AV computing system and architecture design. To understand the safety requirements of AV computing systems, we performed a field study by running industrial Level-4 autonomous driving fleets in various locations, road conditions, and traffic patterns. The field study indicates that traditional computing system performance metrics, such as tail latency, average latency, maximum latency, and timeout, cannot fully satisfy the safety requirements of AV computing system design. To address this issue, we propose a "safety score" as a primary metric for measuring the level of safety in AV computing system design. Furthermore, we propose a perception latency model, which helps architects estimate the safety score of a given architecture and system design without physically testing it in an AV. We demonstrate the use of our safety score and latency model by developing and evaluating a safety-aware hardware resource management scheme for AV computing systems.

##### Human vs. Computer Go: Review and Prospect

The Google DeepMind challenge match in March 2016 was a historic achievement for computer Go development. This article discusses the development of computational intelligence (CI) and its relative strength in comparison with human intelligence for the game of Go. We first summarize the milestones achieved for computer Go from 1998 to 2016. Then, the computer Go programs that have participated in previous IEEE CIS competitions as well as methods and techniques used in AlphaGo are briefly introduced. Commentaries from three high-level professional Go players on the five AlphaGo versus Lee Sedol games are also included. We conclude that AlphaGo beating Lee Sedol is a huge achievement in artificial intelligence (AI) based largely on CI methods. In the future, powerful computer Go programs such as AlphaGo are expected to be instrumental in promoting Go education and AI real-world applications.

##### AIM 2019 Challenge on Constrained Super-Resolution: Methods and Results

This paper reviews the AIM 2019 challenge on constrained example-based single image super-resolution with a focus on the proposed solutions and results. The challenge had 3 tracks. Taking the three main aspects (i.e., number of parameters, inference/running time, fidelity (PSNR)) of MSRResNet as the baseline, Track 1 aims to reduce the number of parameters while being constrained to maintain or improve the running time and the PSNR result, while Tracks 2 and 3 aim to optimize the running time and the PSNR result, respectively, under constraints on the other two aspects. Each track had an average of 64 registered participants, and 12 teams submitted final results. They gauge the state-of-the-art in single image super-resolution.

##### PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report

This paper reviews the first challenge on efficient perceptual image enhancement with a focus on deploying deep learning models on smartphones. The challenge consisted of two tracks. In the first one, participants solved the classical image super-resolution problem with a bicubic downscaling factor of 4. The second track was aimed at real-world photo enhancement, and the goal was to map low-quality photos from the iPhone 3GS device to the same photos captured with a DSLR camera. The target metric used in this challenge combined the runtime, PSNR scores and the solutions' perceptual results measured in a user study. To ensure the efficiency of the submitted models, we additionally measured their runtime and memory requirements on Android smartphones. The proposed solutions significantly improved on the baseline results, defining the state-of-the-art for image enhancement on smartphones.

##### Information Retrieval and Its Sister Disciplines

Dec 05, 2019
Grace Hui Yang

This article presents a summary graph to show the relationships between Information Retrieval (IR) and other related disciplines. The figure tells the key differences between them and the conditions under which one would transition into another.

##### Entanglement Entropy of Target Functions for Image Classification and Convolutional Neural Network

Oct 16, 2017
Ya-Hui Zhang

The success of deep convolutional neural networks (CNNs) in computer vision, especially in image classification problems, calls for a new information theory for functions of images, instead of images themselves. In this article, after establishing a deep mathematical connection between the image classification problem and the quantum spin model, we propose to use entanglement entropy, a generalization of classical Boltzmann-Shannon entropy, as a powerful tool to characterize the information needed to represent a general function of an image. We prove that there is a sub-volume-law bound for the entanglement entropy of target functions of reasonable image classification problems. Therefore, target functions of image classification occupy only a small subspace of the whole Hilbert space. As a result, a neural network with a polynomial number of parameters is efficient for representing such target functions. The concept of entanglement entropy can also be useful for characterizing the expressive power of different neural networks. For example, we show that to maintain the same expressive power, the number of channels $D$ in a convolutional neural network should scale with the number of convolution layers $n_c$ as $D\sim D_0^{\frac{1}{n_c}}$. Therefore, a deeper CNN with large $n_c$ is more efficient than a shallow one.

* 9 pages, 1 figure
##### Developing Parallel Dependency Graph In Improving Game Balancing

Jan 26, 2013
Sim-Hui Tee

The dependency graph is a data architecture that models all the dependencies between the different types of assets in a game. It depicts the dependency-based relationships between the assets of a game. For example, a player must construct an arsenal before he can build weapons. It is vital that the dependency graph of a game is designed logically to ensure a logical sequence of game play. However, a merely logical dependency graph is not sufficient to sustain the players' enduring interest in a game, which brings the problem of game balancing into the picture. The issue of game balancing arises when players feel they have no chance of winning against AI opponents who are more skillful at the game. At the current state of research, the architecture of the dependency graph is monolithic for the players. The sequence of asset possession is always foreseeable because there is only a single dependency graph. Game balancing is impossible when the assets of AI players overwhelmingly outnumber those of human players. This paper proposes a parallel architecture of dependency graphs for the AI players and human players. Instead of having a single dependency graph, a parallel architecture is proposed where the dependency graph of the AI player is adjustable with that of the human player, using a support dependency as a game balancing mechanism. This paper shows that the parallel dependency graph helps to improve game balancing.
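
As a rough illustration of the single (monolithic) dependency graph described in this abstract, the sketch below encodes the arsenal-before-weapons example as a directed acyclic graph. Apart from `arsenal` and `weapons`, the asset names are hypothetical, and the parallel architecture with a support dependency proposed in the paper is not modeled here. It relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each asset maps to the set of assets that must already exist.
# "barracks" and "soldiers" are made-up names for illustration.
dependencies = {
    "arsenal": {"barracks"},
    "weapons": {"arsenal"},
    "soldiers": {"barracks", "weapons"},
}

def build_order(deps):
    """Return one valid order in which all assets can be acquired."""
    return list(TopologicalSorter(deps).static_order())

def can_build(asset, owned, deps):
    """Check whether every prerequisite of `asset` is already owned."""
    return deps.get(asset, set()) <= set(owned)

print(build_order(dependencies))
print(can_build("weapons", {"barracks"}, dependencies))
print(can_build("weapons", {"barracks", "arsenal"}, dependencies))
```

With only one such graph, the order of asset possession is fixed in advance for every player, which is the predictability the abstract points to.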

* 5 pages
##### DE-PACRR: Exploring Layers Inside the PACRR Model

Jul 24, 2017
Andrew Yates, Kai Hui

Recent neural IR models have demonstrated deep learning's utility in ad-hoc information retrieval. However, deep models have a reputation for being black boxes, and the roles of a neural IR model's components may not be obvious at first glance. In this work, we attempt to shed light on the inner workings of a recently proposed neural IR model, namely the PACRR model, by visualizing the output of intermediate layers and by investigating the relationship between intermediate weights and the ultimate relevance score produced. We highlight several insights, hoping that such insights will be generally applicable.

* Neu-IR 2017 SIGIR Workshop on Neural Information Retrieval
##### Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Probe and Learn Neural Networks

Jun 06, 2015
Shiliang Zhang, Hui Jiang

In this paper, we propose a novel model for high-dimensional data, called the Hybrid Orthogonal Projection and Estimation (HOPE) model, which combines a linear orthogonal projection and a finite mixture model under a unified generative modeling framework. The HOPE model itself can be learned unsupervised from unlabelled data based on maximum likelihood estimation as well as discriminatively from labelled data. More interestingly, we have shown that the proposed HOPE models are closely related to neural networks (NNs) in the sense that each hidden layer can be reformulated as a HOPE model. As a result, the HOPE framework can be used as a novel tool to probe why and how NNs work and, more importantly, to learn NNs in either supervised or unsupervised ways. In this work, we have investigated the HOPE framework to learn NNs for several standard tasks, including image recognition on MNIST and speech recognition on TIMIT. Experimental results have shown that the HOPE framework yields significant performance gains over the current state-of-the-art methods in various types of NN learning problems, including unsupervised feature learning and supervised or semi-supervised learning.

* Journal of Machine Learning Research (JMLR), 17(37):1-33, 2016. (http://jmlr.org/papers/v17/15-335.html)
* 31 pages, 5 Figures, technical report