Models, code, and papers for "Da Li":

Large-Scale Pedestrian Retrieval Competition

Mar 06, 2019
Da Li, Zhang Zhang

The Large-Scale Pedestrian Retrieval Competition (LSPRC) mainly focuses on person retrieval which is an important end application in intelligent vision system of surveillance. Person retrieval aims at searching the interested target with specific visual attributes or images. The low image quality, various camera viewpoints, large pose variations and occlusions in real scenes make it a challenge problem. By providing large-scale surveillance data in real scene and standard evaluation methods that are closer to real application, the competition aims to improve the robust of related algorithms and further meet the complicated situations in real application. LSPRC includes two kinds of tasks, i.e., Attribute based Pedestrian Retrieval (PR-A) and Re-IDentification (ReID) based Pedestrian Retrieval (PR-ID). The normal evaluation index, i.e., mean Average Precision (mAP), is used to measure the performances of the two tasks under various scale, pose and occlusion. While the method of system evaluation is introduced to evaluate the person retrieval system in which the related algorithms of the two tasks are integrated into a large-scale video parsing platform (named ISEE) combing with algorithm of pedestrian detection.

  Access Model/Code and Paper
Facial Emotion Recognition Using Deep Learning

Oct 19, 2019
Ching-Da Wu, Li-Heng Chen

We aim to construct a system that captures real-world facial images through the front camera on a laptop. The system is capable of processing/recognizing the captured image and predict a result in real-time. In this system, we exploit the power of deep learning technique to learn a facial emotion recognition (FER) model based on a set of labeled facial images. Finally, experiments are conducted to evaluate our model using largely used public database.

* 5 pages, 7 figures, 4 tables 

  Access Model/Code and Paper
Dense Motion Estimation for Smoke

Sep 08, 2016
Da Chen, Wenbin Li, Peter Hall

Motion estimation for highly dynamic phenomena such as smoke is an open challenge for Computer Vision. Traditional dense motion estimation algorithms have difficulties with non-rigid and large motions, both of which are frequently observed in smoke motion. We propose an algorithm for dense motion estimation of smoke. Our algorithm is robust, fast, and has better performance over different types of smoke compared to other dense motion estimation algorithms, including state of the art and neural network approaches. The key to our contribution is to use skeletal flow, without explicit point matching, to provide a sparse flow. This sparse flow is upgraded to a dense flow. In this paper we describe our algorithm in greater detail, and provide experimental evidence to support our claims.

* ACCV2016 

  Access Model/Code and Paper
Magnitude Bounded Matrix Factorisation for Recommender Systems

Jul 15, 2018
Shuai Jiang, Kan Li, Richard Yi Da Xu

Low rank matrix factorisation is often used in recommender systems as a way of extracting latent features. When dealing with large and sparse datasets, traditional recommendation algorithms face the problem of acquiring large, unrestrained, fluctuating values over predictions especially for users/items with very few corresponding observations. Although the problem has been somewhat solved by imposing bounding constraints over its objectives, and/or over all entries to be within a fixed range, in terms of gaining better recommendations, these approaches have two major shortcomings that we aim to mitigate in this work: one is they can only deal with one pair of fixed bounds for all entries, and the other one is they are very time-consuming when applied on large scale recommender systems. In this paper, we propose a novel algorithm named Magnitude Bounded Matrix Factorisation (MBMF), which allows different bounds for individual users/items and performs very fast on large scale datasets. The key idea of our algorithm is to construct a model by constraining the magnitudes of each individual user/item feature vector. We achieve this by converting from the Cartesian to Spherical coordinate system with radii set as the corresponding magnitudes, which allows the above constrained optimisation problem to become an unconstrained one. The Stochastic Gradient Descent (SGD) method is then applied to solve the unconstrained task efficiently. Experiments on synthetic and real datasets demonstrate that in most cases the proposed MBMF is superior over all existing algorithms in terms of accuracy and time complexity.

* 11 pages, 6 figures, TNNLS 

  Access Model/Code and Paper
Learn to Model Motion from Blurry Footages

Apr 19, 2017
Wenbin Li, Da Chen, Zhihan Lv, Yan Yan, Darren Cosker

It is difficult to recover the motion field from a real-world footage given a mixture of camera shake and other photometric effects. In this paper we propose a hybrid framework by interleaving a Convolutional Neural Network (CNN) and a traditional optical flow energy. We first conduct a CNN architecture using a novel learnable directional filtering layer. Such layer encodes the angle and distance similarity matrix between blur and camera motion, which is able to enhance the blur features of the camera-shake footages. The proposed CNNs are then integrated into an iterative optical flow framework, which enable the capability of modelling and solving both the blind deconvolution and the optical flow estimation problems simultaneously. Our framework is trained end-to-end on a synthetic dataset and yields competitive precision and performance against the state-of-the-art approaches.

* Preprint of our paper accepted by Pattern Recognition 

  Access Model/Code and Paper
A Large-scale Distributed Video Parsing and Evaluation Platform

Nov 29, 2016
Kai Yu, Yang Zhou, Da Li, Zhang Zhang, Kaiqi Huang

Visual surveillance systems have become one of the largest data sources of Big Visual Data in real world. However, existing systems for video analysis still lack the ability to handle the problems of scalability, expansibility and error-prone, though great advances have been achieved in a number of visual recognition tasks and surveillance applications, e.g., pedestrian/vehicle detection, people/vehicle counting. Moreover, few algorithms explore the specific values/characteristics in large-scale surveillance videos. To address these problems in large-scale video analysis, we develop a scalable video parsing and evaluation platform through combining some advanced techniques for Big Data processing, including Spark Streaming, Kafka and Hadoop Distributed Filesystem (HDFS). Also, a Web User Interface is designed in the system, to collect users' degrees of satisfaction on the recognition tasks so as to evaluate the performance of the whole system. Furthermore, the highly extensible platform running on the long-term surveillance videos makes it possible to develop more intelligent incremental algorithms to enhance the performance of various visual recognition tasks.

* Accepted by Chinese Conference on Intelligent Visual Surveillance 2016 

  Access Model/Code and Paper
Robust Scene Text Recognition Using Sparse Coding based Features

Dec 29, 2015
Da-Han Wang, Hanzi Wang, Dong Zhang, Jonathan Li, David Zhang

In this paper, we propose an effective scene text recognition method using sparse coding based features, called Histograms of Sparse Codes (HSC) features. For character detection, we use the HSC features instead of using the Histograms of Oriented Gradients (HOG) features. The HSC features are extracted by computing sparse codes with dictionaries that are learned from data using K-SVD, and aggregating per-pixel sparse codes to form local histograms. For word recognition, we integrate multiple cues including character detection scores and geometric contexts in an objective function. The final recognition results are obtained by searching for the words which correspond to the maximum value of the objective function. The parameters in the objective function are learned using the Minimum Classification Error (MCE) training method. Experiments on several challenging datasets demonstrate that the proposed HSC-based scene text recognition method outperforms HOG-based methods significantly and outperforms most state-of-the-art methods.

  Access Model/Code and Paper
Self-Supervised Learning For Few-Shot Image Classification

Nov 14, 2019
Da Chen, Yuefeng Chen, Yuhong Li, Feng Mao, Yuan He, Hui Xue

Few-shot image classification aims to classify unseen classes with limited labeled samples. Recent works benefit from the meta-learning process with episodic tasks and can fast adapt to class from training to testing. Due to the limited number of samples for each task, the initial embedding network for meta learning becomes an essential component and can largely affects the performance in practice. To this end, many pre-trained methods have been proposed, and most of them are trained in supervised way with limited transfer ability for unseen classes. In this paper, we proposed to train a more generalized embedding network with self-supervised learning (SSL) which can provide slow and robust representation for downstream tasks by learning from the data itself. We evaluate our work by extensive comparisons with previous baseline methods on two few-shot classification datasets ({\em i.e.,} MiniImageNet and CUB). Based on the evaluation results, the proposed method achieves significantly better performance, i.e., improve 1-shot and 5-shot tasks by nearly \textbf{3\%} and \textbf{4\%} on MiniImageNet, by nearly \textbf{9\%} and \textbf{3\%} on CUB. Moreover, the proposed method can gain the improvement of (\textbf{15\%}, \textbf{13\%}) on MiniImageNet and (\textbf{15\%}, \textbf{8\%}) on CUB by pretraining using more unlabeled data. Our code will be available at \hyperref[]{}

* Submitted to ICASSP 2020, Implementation 

  Access Model/Code and Paper
Learning to Generalize: Meta-Learning for Domain Generalization

Oct 10, 2017
Da Li, Yongxin Yang, Yi-Zhe Song, Timothy M. Hospedales

Domain shift refers to the well known problem that a model trained in one source domain performs poorly when applied to a target domain with different statistics. {Domain Generalization} (DG) techniques attempt to alleviate this issue by producing models which by design generalize well to novel testing domains. We propose a novel {meta-learning} method for domain generalization. Rather than designing a specific model that is robust to domain shift as in most previous DG work, we propose a model agnostic training procedure for DG. Our algorithm simulates train/test domain shift during training by synthesizing virtual testing domains within each mini-batch. The meta-optimization objective requires that steps to improve training domain performance should also improve testing domain performance. This meta-learning procedure trains models with good generalization ability to novel domains. We evaluate our method and achieve state of the art results on a recent cross-domain image classification benchmark, as well demonstrating its potential on two classic reinforcement learning tasks.

* 8 pages, 2 figures, under review of AAAI 2018 

  Access Model/Code and Paper
Deeper, Broader and Artier Domain Generalization

Oct 09, 2017
Da Li, Yongxin Yang, Yi-Zhe Song, Timothy M. Hospedales

The problem of domain generalization is to learn from multiple training domains, and extract a domain-agnostic model that can then be applied to an unseen domain. Domain generalization (DG) has a clear motivation in contexts where there are target domains with distinct characteristics, yet sparse data for training. For example recognition in sketch images, which are distinctly more abstract and rarer than photos. Nevertheless, DG methods have primarily been evaluated on photo-only benchmarks focusing on alleviating the dataset bias where both problems of domain distinctiveness and data sparsity can be minimal. We argue that these benchmarks are overly straightforward, and show that simple deep learning baselines perform surprisingly well on them. In this paper, we make two main contributions: Firstly, we build upon the favorable domain shift-robust properties of deep learning methods, and develop a low-rank parameterized CNN model for end-to-end DG learning. Secondly, we develop a DG benchmark dataset covering photo, sketch, cartoon and painting domains. This is both more practically relevant, and harder (bigger domain shift) than existing benchmarks. The results show that our method outperforms existing DG alternatives, and our dataset provides a more significant DG challenge to drive future research.

* 9 pages, 4 figures, ICCV 2017 

  Access Model/Code and Paper
Trajectory Grouping with Curvature Regularization for Tubular Structure Tracking

Mar 08, 2020
Li Liu, Jiong Zhang, Da Chen, HUazhong Shu, Laurent D. Cohen

Tubular structure tracking is an important and difficult problem in the fields of computer vision and medical image analysis. The minimal path models have exhibited its power in tracing tubular structures, by which a centerline can be naturally treated as a minimal path with a suitable geodesic metric. However, existing minimal path-based tubular structure tracing models still suffer from difficulty like the shortcuts and short branches combination problems, especially when dealing with the images with a complicated background. We introduce a new minima path-based model for minimally interactive tubular structure centerline extraction in conjunction with a perceptual grouping scheme. We take into account the prescribed tubular trajectories and the relevant curvature-penalized geodesic distances for minimal paths extraction in a graph-based optimization way. Experimental results on both synthetic and real images prove that the proposed model indeed obtains outperformance comparing to state-of-the-art minimal path-based tubular structure tracing algorithms.

  Access Model/Code and Paper
Subgoal Discovery for Hierarchical Dialogue Policy Learning

Sep 22, 2018
Da Tang, Xiujun Li, Jianfeng Gao, Chong Wang, Lihong Li, Tony Jebara

Developing agents to engage in complex goal-oriented dialogues is challenging partly because the main learning signals are very sparse in long conversations. In this paper, we propose a divide-and-conquer approach that discovers and exploits the hidden structure of the task to enable efficient policy learning. First, given successful example dialogues, we propose the Subgoal Discovery Network (SDN) to divide a complex goal-oriented task into a set of simpler subgoals in an unsupervised fashion. We then use these subgoals to learn a multi-level policy by hierarchical reinforcement learning. We demonstrate our method by building a dialogue agent for the composite task of travel planning. Experiments with simulated and real users show that our approach performs competitively against a state-of-the-art method that requires human-defined subgoals. Moreover, we show that the learned subgoals are often human comprehensible.

* 11 pages, 6 figures, EMNLP 2018 

  Access Model/Code and Paper
Sketch-a-Classifier: Sketch-based Photo Classifier Generation

Apr 30, 2018
Conghui Hu, Da Li, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales

Contemporary deep learning techniques have made image recognition a reasonably reliable technology. However training effective photo classifiers typically takes numerous examples which limits image recognition's scalability and applicability to scenarios where images may not be available. This has motivated investigation into zero-shot learning, which addresses the issue via knowledge transfer from other modalities such as text. In this paper we investigate an alternative approach of synthesizing image classifiers: almost directly from a user's imagination, via free-hand sketch. This approach doesn't require the category to be nameable or describable via attributes as per zero-shot learning. We achieve this via training a {model regression} network to map from {free-hand sketch} space to the space of photo classifiers. It turns out that this mapping can be learned in a category-agnostic way, allowing photo classifiers for new categories to be synthesized by user with no need for annotated training photos. {We also demonstrate that this modality of classifier generation can also be used to enhance the granularity of an existing photo classifier, or as a complement to name-based zero-shot learning.

* published in CVPR2018 as spotlight 

  Access Model/Code and Paper
VisualBERT: A Simple and Performant Baseline for Vision and Language

Aug 09, 2019
Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang

We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals with state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.

* Work in Progress 

  Access Model/Code and Paper
Deep Factorised Inverse-Sketching

Aug 07, 2018
Kaiyue Pang, Da Li, Jifei Song, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales

Modelling human free-hand sketches has become topical recently, driven by practical applications such as fine-grained sketch based image retrieval (FG-SBIR). Sketches are clearly related to photo edge-maps, but a human free-hand sketch of a photo is not simply a clean rendering of that photo's edge map. Instead there is a fundamental process of abstraction and iconic rendering, where overall geometry is warped and salient details are selectively included. In this paper we study this sketching process and attempt to invert it. We model this inversion by translating iconic free-hand sketches to contours that resemble more geometrically realistic projections of object boundaries, and separately factorise out the salient added details. This factorised re-representation makes it easier to match a free-hand sketch to a photo instance of an object. Specifically, we propose a novel unsupervised image style transfer model based on enforcing a cyclic embedding consistency constraint. A deep FG-SBIR model is then formulated to accommodate complementary discriminative detail from each factorised sketch for better matching with the corresponding photo. Our method is evaluated both qualitatively and quantitatively to demonstrate its superiority over a number of state-of-the-art alternatives for style transfer and FG-SBIR.

* Accepted to ECCV 2018 

  Access Model/Code and Paper
Episodic Training for Domain Generalization

Jan 31, 2019
Da Li, Jianshu Zhang, Yongxin Yang, Cong Liu, Yi-Zhe Song, Timothy M. Hospedales

Domain generalization (DG) is the challenging and topical problem of learning models that generalize to novel testing domain with different statistics than a set of known training domains. The simple approach of aggregating data from all source domains and training a single deep neural network end-to-end on all the data provides a surprisingly strong baseline that surpasses many prior published methods. In this paper we build on this strong baseline by designing an episodic training procedure that trains a single deep network in a way that exposes it to the domain shift that characterises a novel domain at runtime. Specifically, we decompose a deep network into feature extractor and classifier components, and then train each component by simulating it interacting with a partner who is badly tuned for the current domain. This makes both components more robust, ultimately leading to our networks producing state-of-the-art performance on three DG benchmarks. As a demonstration, we consider the pervasive workflow of using an ImageNet trained CNN as a fixed feature extractor for downstream recognition tasks. Using the Visual Decathlon benchmark, we demonstrate that our episodic-DG training improves the performance of such a general purpose feature extractor by explicitly training it for robustness to novel problems. This provides the largest-scale demonstration of heterogeneous DG to date.

* technical report 

  Access Model/Code and Paper
Signet Ring Cell Detection With a Semi-supervised Learning Framework

Jul 09, 2019
Jiahui Li, Shuang Yang, Xiaodi Huang, Qian Da, Xiaoqun Yang, Zhiqiang Hu, Qi Duan, Chaofu Wang, Hongsheng Li

Signet ring cell carcinoma is a type of rare adenocarcinoma with poor prognosis. Early detection leads to huge improvement of patients' survival rate. However, pathologists can only visually detect signet ring cells under the microscope. This procedure is not only laborious but also prone to omission. An automatic and accurate signet ring cell detection solution is thus important but has not been investigated before. In this paper, we take the first step to present a semi-supervised learning framework for the signet ring cell detection problem. Self-training is proposed to deal with the challenge of incomplete annotations, and cooperative-training is adapted to explore the unlabeled regions. Combining the two techniques, our semi-supervised learning framework can make better use of both labeled and unlabeled data. Experiments on large real clinical data demonstrate the effectiveness of our design. Our framework achieves accurate signet ring cell detection and can be readily applied in the clinical trails. The dataset will be released soon to facilitate the development of the area.

* Published in The 26th international conference on Information Processing in Medical Imaging (IPMI) 

  Access Model/Code and Paper
Class Regularization: Improve Few-shot Image Classification by Reducing Meta Shift

Dec 18, 2019
Da Chen, Feng Mao, Mingli Song, Yuan He, Xiang Wu, Jinqiao Wang, Wenbin Li, Yongliang Yang, Hui Xue

Few-shot image classification requires the classifier to robustly cope with unseen classes even if there are only a few samples for each class. Recent advances benefit from the meta-learning process where episodic tasks are formed to train a model that can adapt to class change. However, these tasks are independent to each other and existing works mainly rely on limited samples of individual support set in a single meta task. This strategy leads to severe meta shift issues across multiple tasks, meaning the learned prototypes or class descriptors are not stable as each task only involves their own support set. To avoid this problem, we propose a concise Class Regularization Network which aggregates the embedding features of all samples in the entire training set and further regularizes the generated class descriptor. The key is to train a class encoder and decoder structure that can encode the embedding sample features into a class domain with trained class basis, and generate a more stable and general class descriptor from the decoder. We evaluate our work by extensive comparisons with previous methods on two benchmark datasets (MiniImageNet and CUB). The results show that our method achieves state-of-the-art performance over previous work.

  Access Model/Code and Paper
Learning with Hierarchical Complement Objective

Nov 17, 2019
Hao-Yun Chen, Li-Huang Tsai, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

Label hierarchies widely exist in many vision-related problems, ranging from explicit label hierarchies existed in image classification to latent label hierarchies existed in semantic segmentation. Nevertheless, state-of-the-art methods often deploy cross-entropy loss that implicitly assumes class labels to be exclusive and thus independence from each other. Motivated by the fact that classes from the same parental category usually share certain similarity, we design a new training diagram called Hierarchical Complement Objective Training (HCOT) that leverages the information from label hierarchy. HCOT maximizes the probability of the ground truth class, and at the same time, neutralizes the probabilities of rest of the classes in a hierarchical fashion, making the model take advantage of the label hierarchy explicitly. The proposed HCOT is evaluated on both image classification and semantic segmentation tasks. Experimental results confirm that HCOT outperforms state-of-the-art models in CIFAR-100, ImageNet-2012, and PASCAL-Context. The study further demonstrates that HCOT can be applied on tasks with latent label hierarchies, which is a common characteristic in many machine learning tasks.

  Access Model/Code and Paper
HAR-Net: Joint Learning of Hybrid Attention for Single-stage Object Detection

Apr 25, 2019
Ya-Li Li, Shengjin Wang

Object detection has been a challenging task in computer vision. Although significant progress has been made in object detection with deep neural networks, the attention mechanism is far from development. In this paper, we propose the hybrid attention mechanism for single-stage object detection. First, we present the modules of spatial attention, channel attention and aligned attention for single-stage object detection. In particular, stacked dilated convolution layers with symmetrically fixed rates are constructed to learn spatial attention. The channel attention is proposed with the cross-level group normalization and squeeze-and-excitation module. Aligned attention is constructed with organized deformable filters. Second, the three kinds of attention are unified to construct the hybrid attention mechanism. We then embed the hybrid attention into Retina-Net and propose the efficient single-stage HAR-Net for object detection. The attention modules and the proposed HAR-Net are evaluated on the COCO detection dataset. Experiments demonstrate that hybrid attention can significantly improve the detection accuracy and the HAR-Net can achieve the state-of-the-art 45.8\% mAP, outperform existing single-stage object detectors.

  Access Model/Code and Paper