Models, code, and papers for "Fan Jiang":

Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification

Jan 02, 2015
Jianqing Fan, Yang Feng, Jiancheng Jiang, Xin Tong

We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.

* 30 pages, 2 figures 
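
The augmentation step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the `eps` smoothing term, the function names, and the toy data are all assumptions made for this example, and the penalized logistic regression that would be fit on the transformed features is omitted.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` with a Gaussian kernel."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

    return density

def fit_log_ratio_transforms(X, y, bandwidth=0.5, eps=1e-12):
    """For each feature, estimate the log ratio of class-1 to class-0 marginal densities."""
    transforms = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        f, g = gaussian_kde(pos, bandwidth), gaussian_kde(neg, bandwidth)
        # eps (an assumption of this sketch) keeps the logarithm finite in the tails.
        transforms.append(
            lambda x, f=f, g=g: math.log(f(x) + eps) - math.log(g(x) + eps)
        )
    return transforms

def augment(X, transforms):
    """Replace every raw feature value by its estimated marginal log density ratio."""
    return [[t(v) for t, v in zip(transforms, row)] for row in X]

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[-2.1, 0.3], [-1.9, -0.2], [-2.0, 0.1], [2.0, 0.0], [1.8, -0.1], [2.2, 0.2]]
y = [0, 0, 0, 1, 1, 1]
transforms = fit_log_ratio_transforms(X, y)
Z = augment(X, transforms)
```

On this toy data the informative feature maps to strongly negative values for class 0 and strongly positive values for class 1, while the noise feature stays near zero, which is what a sparse linear classifier on `Z` would exploit.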

Salient Objects in Clutter: Bringing Salient Object Detection to the Foreground

Jul 22, 2018
Deng-Ping Fan, Ming-Ming Cheng, Jiang-Jiang Liu, Shang-Hua Gao, Qibin Hou, Ali Borji

We provide a comprehensive evaluation of salient object detection (SOD) models. Our analysis identifies a serious design bias of existing SOD datasets which assumes that each image contains at least one clearly outstanding salient object in low clutter. This design bias has led to saturated high performance for state-of-the-art SOD models when evaluated on existing datasets. The models, however, are still far from satisfactory when applied to real-world daily scenes. Based on our analyses, we first identify 7 crucial aspects that a comprehensive and balanced dataset should fulfill. Then, we propose a new high quality dataset and update the previous saliency benchmark. Specifically, our SOC (Salient Objects in Clutter) dataset includes images with salient and non-salient objects from daily object categories. Beyond object category annotations, each salient image is accompanied by attributes that reflect common challenges in real-world scenes. Finally, we report attribute-based performance assessment on our dataset.

* ECCV 2018 

Dataless training of generative models for the inverse design of metasurfaces

Jun 18, 2019
Jiaqi Jiang, Jonathan A. Fan

Metasurfaces are subwavelength-structured artificial media that can shape and localize electromagnetic waves in unique ways. The inverse design of metasurfaces is a non-convex optimization problem in a high dimensional space, making global optimization a huge challenge. We present a new type of global optimization algorithm, based on the training of a generative neural network without a training set, which can produce high-performance metasurfaces. Instead of directly optimizing devices one at a time, we reframe the optimization as the training of a generator that iteratively enhances the probability of generating high-performance devices. The loss function used for backpropagation is defined as a function of generated patterns and their efficiency gradients, which are calculated by the adjoint variable method using the forward and adjoint electromagnetic simulations. We observe that distributions of devices generated by the network continuously shift towards high-performance design space regions over the course of optimization. Upon training completion, the best-generated devices have efficiencies comparable to or exceeding the best devices designed using standard topology optimization. We envision that our proposed global optimization algorithm generally applies to other gradient-based optimization problems in optics, mechanics and electronics.

* 9 pages, 5 figures 

Progressive-Growing of Generative Adversarial Networks for Metasurface Optimization

Dec 02, 2019
Fufang Wen, Jiaqi Jiang, Jonathan A. Fan

Generative adversarial networks, which can generate metasurfaces based on a training set of high performance device layouts, have the potential to significantly reduce the computational cost of the metasurface design process. However, basic GAN architectures are unable to fully capture the detailed features of topologically complex metasurfaces, and generated devices therefore require additional computationally-expensive design refinement. In this Letter, we show that GANs can better learn spatially fine features from high-resolution training data when their network architecture and training set are grown progressively. Our results indicate that with this training methodology, the best generated devices have performances that compare well with the best devices produced by gradient-based topology optimization, thereby eliminating the need for additional design refinement. We envision that this network training method can generalize to other physical systems where device performance is strongly correlated with fine geometric structuring.

Connectionist Recommendation in the Wild

Sep 15, 2018
Zachary A. Pardos, Zihao Fan, Weijie Jiang

The aggregate behaviors of users can collectively encode deep semantic information about the objects with which they interact. In this paper, we demonstrate novel ways in which the synthesis of these data can illuminate the terrain of users' environment and support them in their decision making and wayfinding. Recurrent Neural Networks and skip-gram models, approaches popularized by their application to modeling language, are applied in a novel way to student university enrollment sequences to create vector representations of courses and map out traversals across them. We present demonstrations of how scrutability from these neural networks can be gained and how the combination of these techniques can be seen as an evolution of content tagging and a means for a recommender to balance user preferences inferred from data with those explicitly specified. From validation of the models to the development of a UI, we discuss additional requisite functionality informed by the results of a field study leading to the ultimate deployment of the system at a university.
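
To make the skip-gram idea concrete, the sketch below shows how (center, context) training pairs could be drawn from one enrollment sequence. It is only an illustration under stated assumptions: the course codes, the window size, and the function name are hypothetical, and the abstract does not say how the authors actually construct their training pairs.

```python
def skipgram_pairs(sequence, window=1):
    """Return (center, context) pairs for every item and its neighbours within `window`."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

# Hypothetical enrollment sequence for one student, in chronological order.
enrollments = ["MATH1", "STAT2", "CS3"]
pairs = skipgram_pairs(enrollments, window=1)
```

A skip-gram model trained on such pairs places courses that appear in similar enrollment contexts close together in the embedding space.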

Deep Scene Text Detection with Connected Component Proposals

Aug 17, 2017
Fan Jiang, Zhihui Hao, Xinran Liu

The computer vision community has witnessed a growing demand for natural-scene text detection, since text information plays a significant role in scene understanding and image indexing. Deep neural networks are widely used for this task, as in other common vision problems, because of their strong capabilities in pixel-wise classification and word localization. In this paper, we present a novel two-task network that integrates bottom and top cues. The first task predicts a pixel-by-pixel labeling, from which word proposals are generated with a canonical connected component analysis. The second task outputs a bundle of character candidates that are later used to verify the word proposals. The two sub-networks share base convolutional features, and we present a new loss to strengthen the interaction between them. We evaluate the proposed network on public benchmark datasets and show that it can detect arbitrary-orientation scene text with a finer output boundary. On the ICDAR 2013 text localization task, we achieve state-of-the-art performance with an F-score of 0.919 and a much better recall of 0.915.

* 10 pages, 5 figures 

Attentive Geo-Social Group Recommendation

Nov 15, 2019
Fei Yu, Feiyi Fan, Shouxu Jiang, Kaiping Zheng

Social activities play an important role in people's daily lives, since they are where people interact. For recommendations based on social activities, it is vital to have not only the activity information but also individuals' social relations. Thanks to geo-social networks and the widespread use of location-aware mobile devices, massive geo-social data is now readily available for exploitation by recommendation systems. In this paper, a novel group recommendation method, called attentive geo-social group recommendation, is proposed to recommend to the target user both activity locations and a group of users who may join the activities. We present an attention mechanism to model the influence of the target user $u_T$ in candidate user groups that satisfy the social constraints. It helps to retrieve the optimal user group and activity topic candidates, and also explains the group decision-making process. Once the user group and topics are retrieved, a novel efficient spatial query algorithm, SPA-DF, is employed to determine the activity location under the constraints of the given user group and activity topic candidates. The proposed method is evaluated on real-world datasets, and the experimental results show that the proposed model significantly outperforms baseline methods.

* 12 pages, 7 figures 

STNReID: Deep Convolutional Networks with Pairwise Spatial Transformer Networks for Partial Person Re-identification

Mar 17, 2019
Hao Luo, Xing Fan, Chi Zhang, Wei Jiang

Partial person re-identification (ReID) is a challenging task because only partial information of person images is available for matching target persons. Few studies, especially on deep learning, have focused on matching partial person images with holistic person images. This study presents a novel deep partial ReID framework based on pairwise spatial transformer networks (STNReID), which can be trained on existing holistic person datasets. STNReID includes a spatial transformer network (STN) module and a ReID module. The STN module samples an affined image (a semantically corresponding patch) from the holistic image to match the partial image. The ReID module extracts the features of the holistic, partial, and affined images. Competition (or confrontation) is observed between the STN module and the ReID module, and two-stage training is applied to acquire a strong STNReID for partial ReID. Experimental results show that our STNReID obtains 66.7% and 54.6% rank-1 accuracies on partial ReID and partial iLIDS datasets, respectively. These values are at par with those obtained with state-of-the-art methods.

SphereReID: Deep Hypersphere Manifold Embedding for Person Re-Identification

Jul 02, 2018
Xing Fan, Wei Jiang, Hao Luo, Mengjuan Fei

Many current successful Person Re-Identification (ReID) methods train a model with the softmax loss function to classify images of different persons and obtain the feature vectors at the same time. However, the underlying feature embedding space is ignored. In this paper, we use a modified softmax function, termed Sphere Softmax, to solve the classification problem and learn a hypersphere manifold embedding simultaneously. A balanced sampling strategy is also introduced. Finally, we propose a convolutional neural network called SphereReID that adopts Sphere Softmax, and we train a single model end-to-end with a new warming-up learning rate schedule on four challenging datasets: Market-1501, DukeMTMC-reID, CUHK03, and CUHK-SYSU. Experimental results demonstrate that this single model outperforms the state-of-the-art methods on all four datasets without fine-tuning or re-ranking. For example, it achieves 94.4% rank-1 accuracy on Market-1501 and 83.9% rank-1 accuracy on DukeMTMC-reID. The code and trained weights of our model will be released.

* Contribute to Journal of Visual Communication and Image Representation 
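
The abstract above does not give the Sphere Softmax formula. The sketch below shows one common way to put a softmax classifier on a hypersphere: L2-normalize both the feature and the class weight vectors, scale the resulting cosine similarities, and apply cross-entropy. The function names, the `scale` value, and this exact form are assumptions for illustration, and the paper's formulation may differ.

```python
import math

def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sphere_softmax_loss(feature, class_weights, label, scale=14.0):
    """Cross-entropy over scaled cosine similarities (assumed form, see lead-in)."""
    f = l2_normalize(feature)
    logits = []
    for w in class_weights:
        w_hat = l2_normalize(w)
        cos_theta = sum(a * b for a, b in zip(f, w_hat))
        logits.append(scale * cos_theta)
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

weights = [[1.0, 0.0], [0.0, 1.0]]
loss_short = sphere_softmax_loss([1.0, 0.0], weights, label=0)
loss_long = sphere_softmax_loss([10.0, 0.0], weights, label=0)
```

Because of the normalization, `loss_short` and `loss_long` coincide: only the direction of the feature vector matters, which is what confines the learned embedding to a hypersphere.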

DFPENet-geology: A Deep Learning Framework for High Precision Recognition and Segmentation of Co-seismic Landslides

Aug 28, 2019
Qingsong Xu, Chaojun Ouyang, Tianhai Jiang, Xuanmei Fan, Duoxiang Cheng

This paper develops a robust model, Dense Feature Pyramid with Encoder-decoder Network (DFPENet), to understand and fuse the multi-scale features of objects in remote sensing images. The proposed method achieves a competitive segmentation accuracy on the public ISPRS 2D Semantic. Furthermore, a comprehensive and widely-used scheme is proposed for co-seismic landslide recognition, which integrates image features extracted from the DFPENet model, geologic features, temporal resolution, landslide spatial analysis, and transfer learning, while only RGB images are used. To corroborate its feasibility and applicability, the proposed scheme is applied to two earthquake-triggered landslides in Jiuzhaigou (China) and Hokkaido (Japan), using available pre- and post-earthquake remote sensing images. The experiments show that the proposed scheme presents a new state-of-the-art performance in regional landslide identification, and performs well in different seismic landslide recognition tasks, though landslide boundary error is not considered. The proposed scheme demonstrates a competitive performance for high-precision, high-efficiency and cross-scene recognition of earthquake disasters, which may serve as a starting point for the application of deep learning methods in co-seismic landslide recognition.

* 31 pages, 11 figures, 6 tables, Preprint submitted to Remote Sensing of Environment 

SCPNet: Spatial-Channel Parallelism Network for Joint Holistic and Partial Person Re-Identification

Oct 16, 2018
Xing Fan, Hao Luo, Xuan Zhang, Lingxiao He, Chi Zhang, Wei Jiang

Holistic person re-identification (ReID) has received extensive study in the past few years and has achieved impressive progress. However, persons are often occluded by obstacles or other persons in practical scenarios, which makes partial person re-identification non-trivial. In this paper, we propose a spatial-channel parallelism network (SCPNet) in which each channel in the ReID feature pays attention to a given spatial part of the body. The spatial-channel correspondence supervises the network to learn discriminative features for both holistic and partial person re-identification. The single model trained on four holistic ReID datasets achieves competitive accuracy on these four datasets and outperforms the state-of-the-art methods on two partial ReID datasets without training on them.

* accepted by ACCV 2018 

Learning Deep Face Representation

Mar 12, 2014
Haoqiang Fan, Zhimin Cao, Yuning Jiang, Qi Yin, Chinchilla Doudou

Face representation is a crucial step of face recognition systems. An optimal face representation should be discriminative, robust, compact, and very easy to implement. While numerous hand-crafted and learning-based representations have been proposed, considerable room for improvement is still present. In this paper, we present a very easy-to-implement deep learning framework for face representation. Our method is based on a new deep network structure (called Pyramid CNN). The proposed Pyramid CNN adopts a greedy-filter-and-down-sample operation, which enables the training procedure to be very fast and computation-efficient. In addition, the structure of Pyramid CNN can naturally incorporate feature sharing across multi-scale face representations, increasing the discriminative ability of the resulting representation. Our basic network is capable of achieving high recognition accuracy (85.8% on the LFW benchmark) with only an 8-dimensional representation. When extended to a feature-sharing Pyramid CNN, our system achieves state-of-the-art performance (97.3%) on the LFW benchmark. We also introduce a new benchmark of realistic face images from social networks and validate that our proposed representation generalizes well.

Learning What Data to Learn

Feb 28, 2017
Yang Fan, Fei Tian, Tao Qin, Jiang Bian, Tie-Yan Liu

Machine learning is essentially the science of playing with data. An adaptive data selection strategy, which dynamically chooses different data at various training stages, can reach a more effective model in a more efficient way. In this paper, we propose a deep reinforcement learning framework, which we call Neural Data Filter (NDF), to explore automatic and adaptive data selection in the training process. In particular, NDF takes advantage of a deep neural network to adaptively select and filter important data instances from a sequential stream of training data, such that the future accumulative reward (e.g., the convergence speed) is maximized. In contrast to previous studies in data selection, which are mainly based on heuristic strategies, NDF is quite generic and thus suitable for many machine learning tasks. Taking neural network training with stochastic gradient descent (SGD) as an example, comprehensive experiments with various neural network models (e.g., multi-layer perceptron networks, convolutional neural networks and recurrent neural networks) and several applications (e.g., image classification and text understanding) demonstrate that NDF-powered SGD can achieve accuracy comparable to that of the standard SGD process while using less data and fewer iterations.

* A preliminary version will appear in ICLR 2017, workshop track. 

Data-driven metasurface discovery

Nov 29, 2018
Jiaqi Jiang, David Sell, Stephan Hoyer, Jason Hickey, Jianji Yang, Jonathan A. Fan

A long-standing challenge with metasurface design is identifying computationally efficient methods that produce high performance devices. Design methods based on iterative optimization push the performance limits of metasurfaces, but they require extensive computational resources that limit their implementation to small numbers of microscale devices. We show that generative neural networks can learn from a small set of topology-optimized metasurfaces to produce large numbers of high-efficiency, topologically-complex metasurfaces operating across a large parameter space. This approach enables considerable savings in computation cost compared to brute force optimization. As a model system, we employ conditional generative adversarial networks to design highly-efficient metagratings over a broad range of deflection angles and operating wavelengths. Generated device designs can be further locally optimized and serve as additional training data for network refinement. Our design concept utilizes a relatively small initial training set of just a few hundred devices, and it serves as a more general blueprint for the AI-based analysis of physical systems where access to large datasets is limited. We envision that such data-driven design tools can be broadly utilized in other domains of optics, acoustics, mechanics, and electronics.

* 14 pages, 5 figures 

Learning Aggregated Transmission Propagation Networks for Haze Removal and Beyond

Jul 31, 2018
Risheng Liu, Xin Fan, Minjun Hou, Zhiying Jiang, Zhongxuan Luo, Lei Zhang

Single image dehazing is an important low-level vision task with many applications. Early research investigated different kinds of visual priors to address this problem. However, these priors may fail when their assumptions are not valid on specific images. Recent deep networks also achieve relatively good performance in this task. Unfortunately, because they disregard the rich physical rules governing haze, large amounts of data are required for their training. More importantly, they may still fail when the testing images contain completely different haze distributions. By combining these two perspectives, this paper designs a novel residual architecture that aggregates both prior (i.e., domain knowledge) and data (i.e., haze distribution) information to propagate transmissions for scene radiance estimation. We further present a variational energy based perspective to investigate the intrinsic propagation behavior of our aggregated deep model. In this way, we bridge the gap between prior-driven models and data-driven networks, leveraging the advantages of previous dehazing approaches while avoiding their limitations. A lightweight learning framework is proposed to train our propagation network. Finally, by introducing a task-aware image separation formulation with a flexible optimization scheme, we extend the proposed model to more challenging vision tasks, such as underwater image enhancement and single image rain removal. Experiments on both synthetic and real-world images demonstrate the effectiveness and efficiency of the proposed framework.

High Quality Image Interpolation via Local Autoregressive and Nonlocal 3-D Sparse Regularization

Dec 25, 2012
Xinwei Gao, Jian Zhang, Feng Jiang, Xiaopeng Fan, Siwei Ma, Debin Zhao

In this paper, we propose a novel image interpolation algorithm that combines the local autoregressive (AR) model and the nonlocal adaptive 3-D sparse model as regularization constraints within a regularization framework. The local AR regularization used to estimate the high-resolution (HR) image differs from conventional AR models, which compute weighted interpolation coefficients without considering the rough structural similarity between the low-resolution (LR) and HR images. The nonlocal adaptive 3-D sparse model is then formulated to regularize the interpolated HR image, which provides a way to correct pixels affected by the numerical stability problem of the AR model. In addition, a new Split-Bregman based iterative algorithm is developed to solve the above optimization problem. Experimental results demonstrate that the proposed algorithm achieves significant performance improvements over traditional algorithms in terms of both objective quality and visual perception.

* 4 pages, 5 figures, 2 tables, to be published at IEEE Visual Communications and Image Processing (VCIP) 2012 

$S^4$Net: Single Stage Salient-Instance Segmentation

Nov 21, 2017
Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Tai-Jiang Mu, Shi-Min Hu

In this paper, we consider an interesting vision problem: salient instance segmentation. In addition to producing approximate bounding boxes, our network outputs high-quality instance-level segments. Taking into account the category-independent property of each target, we design a single stage salient instance segmentation framework with a novel segmentation branch. Our new branch considers not only the local context inside each detection window but also its surrounding context, enabling us to distinguish the instances in the same scope even with obstruction. Our network is end-to-end trainable and runs at a fast speed (40 fps when processing an image with resolution $320 \times 320$). We evaluate our approach on a publicly available benchmark and show that it outperforms alternative solutions. In addition, we provide a thorough analysis of the design choices to help readers better understand the function of each part of our network. To facilitate the development of this area, our code will be available at \url{}.

Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks

Jun 03, 2019
Zhun Fan, Jiewei Lu, Benzhang Qiu, Tao Jiang, Kang An, Alex Noel Josephraj, Chuliang Wei

Automated steel bar counting and center localization play an important role in the factory automation of steel bars. Traditional methods focus only on steel bar counting, and their performance is often limited by complex industrial environments. The convolutional neural network (CNN), which has great capability to deal with complex tasks in challenging environments, is applied in this work. A framework called CNN-DC is proposed to achieve automated steel bar counting and center localization simultaneously. CNN-DC first detects the candidate center points with a deep CNN. Then an effective clustering algorithm named Distance Clustering (DC) is proposed to cluster the candidate center points and locate the true centers of steel bars. CNN-DC achieves 99.26% accuracy for steel bar counting and 4.1% center offset for center localization on the established steel bar dataset, which demonstrates that it performs well on both tasks. Code is made publicly available at:

* Ready to submit IEEE Transactions on Industrial Informatics 

A Semantic-based Medical Image Fusion Approach

Jun 01, 2019
Fanda Fan, Yunyou Huang, Lei Wang, Xingwang Xiong, Zihan Jiang, Zhifei Zhang, Jianfeng Zhan

It is necessary for clinicians to comprehensively analyze patient information from different sources. Medical image fusion is a promising approach to providing overall information from medical images of different modalities. However, existing medical image fusion approaches ignore the semantics of images, making the fused image difficult to understand. In this paper, we put forward a semantic-based medical image fusion methodology, and as an implementation, we propose a Fusion W-Net (FW-Net) for multimodal medical image fusion. The experimental results are promising: the fused image generated by our approach greatly reduces the semantic information loss and has visual effects comparable to those of state-of-the-art approaches. Our approach and tool have great potential to be applied in the clinical setting. The source code of FW-Net is available at

VLUC: An Empirical Benchmark for Video-Like Urban Computing on Citywide Crowd and Traffic Prediction

Nov 16, 2019
Renhe Jiang, Zekun Cai, Zhaonan Wang, Chuang Yang, Zipei Fan, Xuan Song, Kota Tsubouchi, Ryosuke Shibasaki

Nowadays, massive urban human mobility data are being generated from mobile phones, car navigation systems, and traffic sensors. Predicting the density and flow of crowds or traffic at a citywide level has become possible by using big data and cutting-edge AI technologies. It has been a very significant research topic with high social impact, and it can be widely applied to emergency management, traffic regulation, and urban planning. In particular, by meshing a large urban area into a number of fine-grained mesh-grids, citywide crowd and traffic information over a continuous time period can be represented like a video, where each timestamp can be seen as one video frame. Based on this idea, a series of methods have been proposed to address video-like prediction for citywide crowd and traffic. In this study, we publish a new aggregated human mobility dataset generated from a real-world smartphone application and build a standard benchmark for this kind of video-like urban computing with this new dataset and the existing open datasets. We first comprehensively review the state-of-the-art literature and formulate the density and in-out flow prediction problem, then conduct a thorough performance assessment of those methods. With this benchmark, we hope researchers can easily follow up and quickly launch new solutions on this topic.
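
The video-like representation described above can be sketched as a simple rasterization of location records into per-timestamp density grids. This is a minimal illustration, not the benchmark's code: the record layout `(t, lat, lon)`, the bounding box, the grid size, and the function name are all assumptions of this example.

```python
def rasterize(records, t_steps, height, width, bounds):
    """Count records per mesh cell and timestamp, giving a [t][row][col] density 'video'."""
    lat_min, lat_max, lon_min, lon_max = bounds
    video = [[[0] * width for _ in range(height)] for _ in range(t_steps)]
    for t, lat, lon in records:
        inside = lat_min <= lat < lat_max and lon_min <= lon < lon_max
        if not (0 <= t < t_steps and inside):
            continue  # Drop records outside the time range or the meshed area.
        row = int((lat - lat_min) / (lat_max - lat_min) * height)
        col = int((lon - lon_min) / (lon_max - lon_min) * width)
        video[t][row][col] += 1
    return video

# Hypothetical records: (timestamp index, latitude, longitude).
records = [(0, 0.5, 0.5), (0, 0.6, 0.4), (0, 1.5, 0.5), (1, 1.5, 1.5), (5, 0.5, 0.5)]
video = rasterize(records, t_steps=2, height=2, width=2, bounds=(0.0, 2.0, 0.0, 2.0))
```

Each `video[t]` is one frame, so a sequence of frames can be passed to any model designed for video-style spatio-temporal prediction.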
