Models, code, and papers for "Shu Liang":

Sep 19, 2017
Davis Liang, Yan Shu

* IJCNLP 2017
##### Normalized Total Gradient: A New Measure for Multispectral Image Registration

Feb 15, 2017
Shu-Jie Chen, Hui-Liang Shen

Image registration is a fundamental issue in multispectral image processing. In filter-wheel-based multispectral imaging systems, the non-coplanar placement of the filters always causes misalignment among the channel images. The selective spectral response of multispectral imaging raises two challenges for image registration. First, the intensity levels of a local region may differ across individual channel images. Second, the local intensity may vary rapidly in some channel images while remaining stationary in others. Conventional multimodal measures, such as mutual information, the correlation coefficient, and the correlation ratio, can register images with different regional intensity levels but fail under severe local intensity variation. In this paper, a new measure, the normalized total gradient (NTG), is proposed for multispectral image registration. NTG is applied to the difference between two channel images. It rests on the key assumption (an observation) that the gradient of the difference image between two aligned channel images is sparser than that between two misaligned ones. A registration framework that incorporates an image pyramid and global/local optimization is further introduced for rigid transforms. Experimental results validate that the proposed method is effective for multispectral image registration and performs better than conventional methods.

* 12 pages, 11 figures
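The abstract states the idea but not the formula. As a minimal sketch, assuming NTG takes the form $\mathrm{NTG}(f_1, f_2) = \|\nabla(f_1 - f_2)\|_1 / (\|\nabla f_1\|_1 + \|\nabla f_2\|_1)$, the measure can be computed with forward differences in plain Python. The helper `shift_right` and the toy step-edge images are hypothetical; a real registration would minimize this value over rigid-transform parameters inside the image pyramid and optimization framework described in the abstract.

```python
def total_gradient(img):
    """Sum of absolute horizontal and vertical forward differences of a 2D list."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += abs(img[r][c + 1] - img[r][c])
            if r + 1 < rows:
                total += abs(img[r + 1][c] - img[r][c])
    return total


def ntg(f1, f2, eps=1e-12):
    """Assumed NTG form: total gradient of the difference over the summed total gradients."""
    diff = [[a - b for a, b in zip(row1, row2)] for row1, row2 in zip(f1, f2)]
    return total_gradient(diff) / (total_gradient(f1) + total_gradient(f2) + eps)


def shift_right(img, k):
    """Hypothetical helper: shift columns right by k pixels, replicating the left edge."""
    return [[row[max(c - k, 0)] for c in range(len(row))] for row in img]


# Toy channels: same vertical edge, but channel 2 has different intensity levels.
ch1 = [[0, 0, 10, 10, 10] for _ in range(5)]
ch2 = [[2 * v + 5 for v in row] for row in ch1]

aligned = ntg(ch1, ch2)                     # 50 / (50 + 100) = 1/3
misaligned = ntg(ch1, shift_right(ch2, 1))  # 150 / (50 + 100) = 1.0
print(aligned < misaligned)                 # True
```

Even though the two channels have different intensity levels, the aligned pair yields a sparser difference gradient and therefore a lower NTG, which is the behavior the key assumption relies on.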
##### 3D Face Hallucination from a Single Depth Frame

Sep 13, 2018
Shu Liang, Ira Kemelmacher-Shlizerman, Linda G. Shapiro

We present an algorithm that takes a single frame of a person's face from a depth camera, e.g., Kinect, and produces a high-resolution 3D mesh of the input face. We leverage a dataset of 3D face meshes of 1204 distinct individuals ranging in age from 3 to 40, captured in a neutral expression. We divide the input depth frame into semantically significant regions (eyes, nose, mouth, cheeks) and search the database for the best matching shape per region. We further combine the input depth frame with the matched database shapes into a single mesh, producing a high-resolution shape of the input person. Our system is fully automatic and uses only depth data for matching, making it invariant to imaging conditions. We evaluate our results using ground-truth shapes and compare them to state-of-the-art shape estimation methods. We demonstrate the robustness of our local matching approach with high-quality reconstructions of faces that fall outside the dataset span, e.g., faces of people older than 40, facial expressions, and different ethnicities.

* Published in 3DV 2014
##### Head Reconstruction from Internet Photos

Sep 13, 2018
Shu Liang, Linda G. Shapiro, Ira Kemelmacher-Shlizerman

3D face reconstruction from Internet photos has recently produced exciting results. A person's face, e.g., Tom Hanks, can be modeled and animated in 3D from a completely uncalibrated photo collection. Most methods, however, focus solely on the face area and mask out the rest of the head. This paper proposes that head modeling from the Internet is a solvable problem. We target reconstruction of the rough shape of the head. Our method gradually "grows" the head mesh, starting from the frontal face and extending to the remaining views using photometric stereo constraints. We call our method the boundary-value growing algorithm. Results on photos of celebrities downloaded from the Internet are presented.

* Published in ECCV 2016
##### A Comprehensive Survey on Cross-modal Retrieval

Jul 21, 2016
Kaiye Wang, Qiyue Yin, Wei Wang, Shu Wu, Liang Wang

In recent years, cross-modal retrieval has drawn much attention due to the rapid growth of multimodal data. It takes one type of data as the query to retrieve relevant data of another type; for example, a user can use a text query to retrieve relevant pictures or videos. Since the query and its retrieved results can be of different modalities, how to measure the content similarity between different modalities of data remains a challenge. Various methods have been proposed to deal with this problem. In this paper, we first review a number of representative methods for cross-modal retrieval and classify them into two main groups: 1) real-valued representation learning, and 2) binary representation learning. Real-valued representation learning methods aim to learn real-valued common representations for different modalities of data. To speed up cross-modal retrieval, a number of binary representation learning methods have been proposed to map different modalities of data into a common Hamming space. Then, we introduce several multimodal datasets in the community and show experimental results on two commonly used multimodal datasets. The comparison reveals the characteristics of different kinds of cross-modal retrieval methods, which is expected to benefit both practical applications and future research. Finally, we discuss open problems and future research directions.

* 20 pages, 11 figures, 9 tables
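For the binary representation learning group, once every modality is mapped into a common Hamming space, retrieval reduces to ranking items by Hamming distance to the query code. The sketch below assumes the codes have already been learned; the 8-bit codes and item names are made up for illustration and do not come from any method in the survey.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def retrieve(query_code: int, database: dict, k: int = 3) -> list:
    """Return the k item ids closest to the query in Hamming space (ties broken by id)."""
    ranked = sorted(database.items(), key=lambda item: (hamming(query_code, item[1]), item[0]))
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical 8-bit image codes, already mapped into the common Hamming space.
image_codes = {
    "img_beach": 0b10110010,
    "img_city": 0b01001101,
    "img_forest": 0b10110110,
    "img_night": 0b01001111,
}
text_query_code = 0b10110011  # hypothetical code for a text query

print(retrieve(text_query_code, image_codes, k=2))  # ['img_beach', 'img_forest']
```

XOR plus a bit count is what makes binary codes fast to compare, which is the speed motivation the survey gives for this group of methods.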
##### Variational Quantum Algorithms for Dimensionality Reduction and Classification

Oct 27, 2019
Jin-Min Liang, Shu-Qian Shen, Ming Li, Lei Li

Dimensionality reduction and classification play a critical role in pattern recognition and machine learning. In this work, we present a quantum neighborhood preserving embedding and a quantum local discriminant embedding for dimensionality reduction and classification. These two algorithms have an exponential speedup over their respective classical counterparts. Along the way, we propose a variational quantum generalized eigenvalue solver (VQGE) that finds the generalized eigenvalues and eigenvectors of a matrix pencil $(\mathcal{G},\mathcal{S})$ with coherence time $O(1)$. We conduct a numerical experiment that solves a problem of size $2^5\times2^5$. Moreover, our results offer two optional outputs, in quantum or classical form, which can be directly applied in subsequent quantum or classical machine learning processes.

##### Context-aware Sequential Recommendation

Sep 19, 2016
Qiang Liu, Shu Wu, Diyi Wang, Zhaokang Li, Liang Wang

Since sequential information plays an important role in modeling user behaviors, various sequential recommendation methods have been proposed. Methods based on the Markov assumption are widely used but combine several of the most recent components independently. Recently, methods based on Recurrent Neural Networks (RNN) have been successfully applied to several sequential modeling tasks. However, in real-world applications, these methods have difficulty modeling contextual information, which has been shown to be very important for behavior modeling. In this paper, we propose a novel model named Context-Aware Recurrent Neural Networks (CA-RNN). Instead of using the constant input matrix and transition matrix of conventional RNN models, CA-RNN employs adaptive context-specific input matrices and adaptive context-specific transition matrices. The adaptive context-specific input matrices capture the external situations in which user behaviors happen, such as time, location, and weather. The adaptive context-specific transition matrices capture how the lengths of time intervals between adjacent behaviors in historical sequences affect the transition of global sequential features. Experimental results show that the proposed CA-RNN model yields significant improvements over state-of-the-art sequential recommendation methods and context-aware recommendation methods on two public datasets, i.e., the Taobao dataset and the Movielens-1M dataset.

* IEEE International Conference on Data Mining (ICDM) 2016, to appear
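A simplified sketch of one CA-RNN step, assuming the hidden state is updated as $h_t = \tanh(M_{c_t} x_t + W_{\Delta_t} h_{t-1})$, where the input matrix $M_{c_t}$ is selected by the input context (here weekday vs. weekend) and the transition matrix $W_{\Delta_t}$ by a bucketed time interval. The context names, the one-hour bucket boundary, the tanh nonlinearity, and the matrix values are all hypothetical choices for illustration, not the paper's settings.

```python
import math

# Hypothetical context-specific parameters (2-dimensional toy example).
INPUT_MATS = {
    "weekday": [[1.0, 0.0], [0.0, 1.0]],
    "weekend": [[0.0, 1.0], [1.0, 0.0]],
}
TRANSITION_MATS = {
    "short_gap": [[0.5, 0.0], [0.0, 0.5]],
    "long_gap": [[0.1, 0.0], [0.0, 0.1]],
}


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def interval_bucket(seconds):
    """Hypothetical bucketing of the time gap between adjacent behaviors."""
    return "short_gap" if seconds < 3600 else "long_gap"


def ca_rnn_step(x, h_prev, input_ctx, gap_seconds):
    """One recurrent step with context-selected input and transition matrices."""
    a = matvec(INPUT_MATS[input_ctx], x)
    b = matvec(TRANSITION_MATS[interval_bucket(gap_seconds)], h_prev)
    return [math.tanh(p + q) for p, q in zip(a, b)]


x = [1.0, 0.0]   # embedding of the current item
h0 = [0.0, 0.0]  # initial hidden state
h_weekday = ca_rnn_step(x, h0, "weekday", gap_seconds=600)
h_weekend = ca_rnn_step(x, h0, "weekend", gap_seconds=600)
```

The same item and history produce different hidden states depending on the input context, and different time gaps scale the previous state differently; a conventional RNN would use a single matrix for each role.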
##### Cross-modal supervised learning for better acoustic representations

Jan 01, 2020
Shaoyong Jia, Xin Shu, Yang Yang, Dawei Liang, Qiyue Liu, Junhui Liu

Obtaining large-scale human-labeled datasets to train acoustic representation models is very challenging. In contrast, data with machine-generated labels is easy to collect. In this work, we propose to exploit machine-generated labels to learn better acoustic representations, based on the synchronization between vision and audio. First, we collect a large-scale video dataset of 15 million samples totaling 16,320 hours. Each video is 3 to 5 seconds long and is annotated automatically by publicly available visual and audio classification models. Second, we train various classical convolutional neural networks (CNNs), including VGGish, ResNet 50, and Mobilenet v2. We also make several improvements to VGGish and achieve better results. Finally, we transfer our models to three external standard benchmarks for the audio classification task and achieve a significant performance boost over state-of-the-art results. Models and code are available at: https://github.com/Deeperjia/vgg-like-audio-models.

##### Personalizing Graph Neural Networks with Attention Mechanism for Session-based Recommendation

Dec 02, 2019
Shu Wu, Mengqi Zhang, Xin Jiang, Xu Ke, Liang Wang

Personalized session-based recommendation aims to predict a user's next click based on their sequential behaviors. Existing session-based recommendation methods treat all of a user's sessions as a single sequence, ignoring the relationships among sessions. Moreover, most of them neglect complex item transitions and the collaborative relationship between users and items. To this end, we propose a novel method, Personalizing Graph Neural Networks with Attention Mechanism (A-PGNN for brevity). A-PGNN mainly consists of two components. One is the Personalizing Graph Neural Network (PGNN), which captures complex transitions in user session sequences; compared with the traditional Graph Neural Network (GNN) model, it also considers the role of users in the sequence. The other is a Dot-Product Attention mechanism, which draws on the attention mechanism in machine translation to explicitly model the effect of historical sessions on the current session. Together, these two parts make it possible to learn the multi-level transition relationships between items and sessions in a user-specific fashion. Extensive experiments on two real-world datasets show that A-PGNN consistently and significantly outperforms state-of-the-art personalized session-based recommendation methods.
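The Dot-Product Attention component can be illustrated as attention in which the current session's representation is the query and the historical session representations serve as keys and values. The $1/\sqrt{d}$ scaling follows the Transformer convention and is an assumption here, as are the toy vectors; A-PGNN's actual projections and how it fuses the result with the PGNN output are not modeled.

```python
import math


def dot_product_attention(query, keys, values):
    """Softmax-weighted sum of values, scored by scaled dot products with the query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, context


# Hypothetical session representations.
current_session = [1.0, 0.0]
history = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = dot_product_attention(current_session, history, history)
```

Historical sessions that resemble the current one receive larger weights, so `context` summarizes the user's past in a way that depends on what they are doing now.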

##### Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction

Oct 12, 2019
Zekun Li, Zeyu Cui, Shu Wu, Xiaoyu Zhang, Liang Wang

Click-through rate (CTR) prediction is an essential task in web applications such as online advertising and recommender systems, whose features are usually in multi-field form. The key to this task is modeling feature interactions among different feature fields. Recently proposed deep-learning-based models follow a general paradigm: raw sparse multi-field input features are first mapped into dense field embedding vectors and then simply concatenated together to feed into deep neural networks (DNN) or other specifically designed networks to learn high-order feature interactions. However, the simple *unstructured combination* of feature fields inevitably limits the capability to model sophisticated interactions among different fields in a sufficiently flexible and explicit fashion. In this work, we propose to represent the multi-field features intuitively as a graph, where each node corresponds to a feature field and different fields can interact through edges. The task of modeling feature interactions can thus be converted to modeling node interactions on the corresponding graph. To this end, we design a novel model, Feature Interaction Graph Neural Networks (Fi-GNN). Taking advantage of the strong representative power of graphs, our proposed model can not only model sophisticated feature interactions in a flexible and explicit fashion but also provide good model explanations for CTR prediction. Experimental results on two real-world datasets show its superiority over state-of-the-art methods.

* 10 pages, accepted by the 2019 Conference on Information and Knowledge Management (CIKM-2019)
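A heavily simplified sketch of the graph view: every feature field becomes a node of a fully connected graph, and one propagation step lets each field absorb information from the others. The full model's message and update functions are richer than this; the uniform mean aggregation, the residual weight `alpha`, and the field names and embeddings below are simplifying assumptions for illustration.

```python
def build_field_graph(fields):
    """Fully connected graph over feature fields, without self-loops."""
    return {f: [g for g in fields if g != f] for f in fields}


def propagate(embeddings, graph, alpha=0.5):
    """One message-passing step: each field adds alpha times the mean of its neighbors."""
    updated = {}
    for node, neighbors in graph.items():
        dim = len(embeddings[node])
        msg = [sum(embeddings[n][i] for n in neighbors) / len(neighbors) for i in range(dim)]
        updated[node] = [h + alpha * m for h, m in zip(embeddings[node], msg)]
    return updated


# Hypothetical dense field embeddings for one ad impression.
fields = ["user_age", "ad_category", "device"]
embeddings = {
    "user_age": [1.0, 0.0],
    "ad_category": [0.0, 1.0],
    "device": [1.0, 1.0],
}
graph = build_field_graph(fields)
updated = propagate(embeddings, graph)
```

Unlike plain concatenation, the interaction structure here is explicit: which fields exchange information is encoded in `graph`, and a learned model could weight each edge separately.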
##### Semi-supervised Compatibility Learning Across Categories for Clothing Matching

Jul 31, 2019
Zekun Li, Zeyu Cui, Shu Wu, Xiaoyu Zhang, Liang Wang

Learning the compatibility between fashion items across categories is a key task in fashion analysis, which can decode the secret of clothing matching. The main idea of this task is to map items into a latent style space where compatible items stay close. Previous works try to build such a transformation by minimizing the distances between annotated compatible items, which requires massive item-level supervision. However, such annotated data are expensive to obtain and can hardly cover the numerous items with various styles in real applications. In such cases, these supervised methods fail to achieve satisfactory performance. In this work, we propose a semi-supervised method to learn the compatibility across categories. We observe that the distributions of different categories have intrinsically similar structures. Accordingly, the better the distributions align, the closer compatible items across these categories become. To achieve the alignment, we minimize the distances between distributions with unsupervised adversarial learning, as well as the distances between some annotated compatible items, which act as anchor points to help the alignment. Experimental results on two real-world datasets demonstrate the effectiveness of our method.

* 6 pages, 4 figures, accepted by ICME 2019
##### Semi-supervised Node Classification via Hierarchical Graph Convolutional Networks

Mar 05, 2019
Fenyu Hu, Yanqiao Zhu, Shu Wu, Liang Wang, Tieniu Tan

Graph convolutional networks (GCNs) have been successfully applied to node classification tasks in network mining. However, most of these models, which are based on neighborhood aggregation, are shallow and lack a "graph pooling" mechanism, which prevents them from obtaining adequate global information. To increase the receptive field, we propose a novel deep Hierarchical Graph Convolutional Network (H-GCN) for semi-supervised node classification. H-GCN first repeatedly aggregates structurally similar nodes into hyper-nodes and then refines the coarsened graph back to the original to restore the representation of each node. Instead of merely aggregating one- or two-hop neighborhood information, the proposed coarsening procedure enlarges the receptive field of each node, so more global information can be learned. Comprehensive experiments on public datasets demonstrate the effectiveness of the proposed method over state-of-the-art methods. Notably, our model gains substantial improvements when only a few labeled samples are provided.

* 7 pages, 3 figures
##### Dressing as a Whole: Outfit Compatibility Learning Based on Node-wise Graph Neural Networks

Feb 21, 2019
Zeyu Cui, Zekun Li, Shu Wu, Xiaoyu Zhang, Liang Wang

With the rapid development of the fashion market, customers' demand for fashion recommendation is rising. In this paper, we investigate a practical problem of fashion recommendation by answering the question "which item should we select to match the given fashion items and form a compatible outfit?" The key to this problem is estimating outfit compatibility. Previous works that focus on the compatibility of two items, or that represent an outfit as a sequence, fail to make full use of the complex relations among items in an outfit. To remedy this, we propose to represent an outfit as a graph. In particular, we construct a Fashion Graph, where each node represents a category and each edge represents the interaction between two categories. Accordingly, each outfit can be represented as a subgraph by putting items into their corresponding category nodes. To infer outfit compatibility from such a graph, we propose Node-wise Graph Neural Networks (NGNN), which can better model node interactions and learn better node representations. In NGNN, the node interaction on each edge is different and is determined by parameters correlated with the two connected nodes. An attention mechanism is used to calculate the outfit compatibility score from the learned node representations. NGNN can model outfit compatibility not only from the visual or textual modality but also from multiple modalities. We conduct experiments on two tasks: (1) Fill-in-the-blank: suggesting an item that matches the existing components of an outfit; (2) Compatibility prediction: predicting the compatibility scores of given outfits. Experimental results demonstrate the superiority of our proposed method over others.

* 11 pages, accepted by the 2019 World Wide Web Conference (WWW-2019)
##### ICE: Information Credibility Evaluation on Social Media via Representation Learning

Oct 24, 2016
Qiang Liu, Shu Wu, Feng Yu, Liang Wang, Tieniu Tan

With the rapid growth of social media, rumors also spread widely on these platforms and harm people's daily lives. Information credibility evaluation has therefore drawn attention from both academic and industrial communities. Current methods mainly focus on feature engineering and have achieved some success. However, feature-engineering-based methods require a lot of labor and cannot fully reveal the underlying relations among data. In our view, the key elements of user behaviors for evaluating credibility can be summarized as "who", "what", "when", and "how". Existing methods cannot model the correlation among these key elements during the spreading of microblogs. In this paper, we propose a novel representation learning method, Information Credibility Evaluation (ICE), to learn representations of information credibility on social media. In ICE, latent representations are learned for modeling user credibility, behavior types, temporal properties, and comment attitudes. The aggregation of these factors in the microblog spreading process yields the representation of a user's behavior, and the aggregation of these dynamic representations generates the credibility representation of an event spreading on social media. Moreover, a pairwise learning method is applied to maximize the credibility difference between rumors and non-rumors. To evaluate the performance of ICE, we conduct experiments on a Sina Weibo dataset, and the results show that our ICE model outperforms state-of-the-art methods.

* IEEE Transactions on Information Forensics and Security (TIFS), under review
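The pairwise objective can be sketched as a logistic ranking loss over (non-rumor, rumor) pairs, assuming each event's credibility representation has already been mapped to a scalar score where higher means more credible. The loss form $\log(1 + e^{-(s^+ - s^-)})$ and the scores are assumptions for illustration; the abstract does not specify ICE's exact objective.

```python
import math


def pairwise_loss(credible_scores, rumor_scores):
    """Mean logistic pairwise loss; small when every non-rumor outscores every rumor."""
    total, count = 0.0, 0
    for s_pos in credible_scores:
        for s_neg in rumor_scores:
            # -log(sigmoid(s_pos - s_neg)) == log(1 + exp(-(s_pos - s_neg)))
            total += math.log1p(math.exp(-(s_pos - s_neg)))
            count += 1
    return total / count


good_ranking = pairwise_loss([2.0, 1.5], [-1.0, 0.0])   # non-rumors scored higher
bad_ranking = pairwise_loss([-1.0, 0.0], [2.0, 1.5])    # rumors scored higher
```

Minimizing this loss pushes the scores of non-rumors and rumors apart, which is one concrete way to "maximize the credibility difference" between the two classes.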
##### GraphAIR: Graph Representation Learning with Neighborhood Aggregation and Interaction

Nov 14, 2019
Fenyu Hu, Yanqiao Zhu, Shu Wu, Weiran Huang, Liang Wang, Tieniu Tan

Graph representation learning is of paramount importance for a variety of graph analytical tasks, ranging from node classification to community detection. Recently, graph convolutional networks (GCNs) have been successfully applied to graph representation learning. These GCNs generate node representations by aggregating features from neighborhoods, following the "neighborhood aggregation" scheme. Despite promising performance on various tasks, existing GCN-based models have difficulty capturing the complicated non-linearity of graph data. In this paper, we first prove theoretically that the coefficients of the neighborhood interacting terms are relatively small in current models, which explains why GCNs barely outperform linear models. Then, to better capture the complicated non-linearity of graph data, we present a novel GraphAIR framework that models neighborhood interaction in addition to neighborhood aggregation. Comprehensive experiments on benchmark tasks, including node classification and link prediction on public datasets, demonstrate the effectiveness of the proposed method over state-of-the-art methods.

* 8 pages, in submission to IEEE Transactions on Knowledge and Data Engineering
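To make the distinction between aggregation and interaction concrete, the sketch below computes, for one node, the mean of its neighborhood features (a linear term) and the mean element-wise product over pairs of neighbors (a second-order interaction term). This is an illustrative assumption about what "neighborhood interaction" means, not GraphAIR's actual layer, which uses learned transformations; the toy graph and features are hypothetical.

```python
from itertools import combinations


def aggregate(features, neighbors):
    """Neighborhood aggregation: element-wise mean of neighbor features (linear)."""
    dim = len(features[neighbors[0]])
    return [sum(features[n][i] for n in neighbors) / len(neighbors) for i in range(dim)]


def interact(features, neighbors):
    """Neighborhood interaction: mean element-wise product over neighbor pairs (quadratic)."""
    dim = len(features[neighbors[0]])
    pairs = list(combinations(neighbors, 2))
    if not pairs:  # a single neighbor has no pairwise interactions
        return [0.0] * dim
    return [sum(features[a][i] * features[b][i] for a, b in pairs) / len(pairs) for i in range(dim)]


# Hypothetical node features; node 0's neighborhood includes a self-loop.
features = {0: [1.0, 2.0], 1: [3.0, 0.0], 2: [1.0, 1.0]}
neighborhood = [0, 1, 2]

linear_part = aggregate(features, neighborhood)
interaction_part = interact(features, neighborhood)
node_repr = linear_part + interaction_part  # list concatenation of both views
```

The aggregation term is linear in the input features, while the interaction term contains products such as $x_j \odot x_k$ that a purely linear aggregation cannot express.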
##### Identification of primary angle-closure on AS-OCT images with Convolutional Neural Networks

Oct 23, 2019
Chenglang Yuan, Cheng Bian, Hongjian Kang, Shu Liang, Kai Ma, Yefeng Zheng

Primary angle-closure disease (PACD) is a severe retinal disease, which might cause irreversible vision loss. In clinical practice, accurate identification of angle closure and localization of the scleral spur's position on anterior segment optical coherence tomography (AS-OCT) images are essential for diagnosing PACD. However, manual delineation can suffer from low accuracy and low efficiency. In this paper, we propose an efficient and accurate end-to-end architecture for angle-closure classification and scleral spur localization. Specifically, we use a revised ResNet152 as our backbone to improve the accuracy of angle-closure identification. For scleral spur localization, we adopt EfficientNet as the encoder because of its powerful feature extraction capability. By combining the skip-connection module and the pyramid pooling module, the network can collect semantic cues in feature maps at multiple dimensions and scales. We then propose a novel keypoint registration loss to constrain the model's attention to the intensity and location of the scleral spur area. Extensive experiments evaluate our method on the angle-closure glaucoma evaluation (AGE) Challenge dataset. The results show that our proposed architecture ranks first in the classification task on the test dataset and achieves an average Euclidean distance error of 12.00 pixels in the scleral spur localization task.

* Third place in the angle-closure glaucoma evaluation (AGE) Challenge, MICCAI 2019
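The localization metric in the abstract, the average Euclidean distance between predicted and ground-truth scleral spur positions, can be computed as follows. The point coordinates are made up, and any challenge-specific details (for example, how points are paired per image) are not modeled.

```python
import math


def mean_euclidean_error(predicted, ground_truth):
    """Average pixel distance between paired predicted and ground-truth keypoints."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need equally sized, non-empty lists of (x, y) points")
    total = sum(
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    )
    return total / len(predicted)


# Hypothetical predictions for two images (pixel coordinates).
pred = [(103.0, 204.0), (50.0, 60.0)]
truth = [(100.0, 200.0), (50.0, 60.0)]
error = mean_euclidean_error(pred, truth)  # (5.0 + 0.0) / 2 = 2.5
```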