Research papers and code for "Yu-Shen Liu":
Deep learning has achieved remarkable results in 3D shape analysis by learning global shape features from the pixel-level over multiple views. Previous methods, however, compute low-level features for entire views without considering part-level information. In contrast, we propose a deep neural network, called Parts4Feature, to learn 3D global features from part-level information in multiple views. We introduce a novel definition of generally semantic parts, which Parts4Feature learns to detect in multiple views from different 3D shape segmentation benchmarks. A key idea of our architecture is that it transfers the ability to detect semantically meaningful parts in multiple views to learn 3D global features. Parts4Feature achieves this by combining a local part detection branch and a global feature learning branch with a shared region proposal module. The global feature learning branch aggregates the detected parts in terms of learned part patterns with a novel multi-attention mechanism, while the region proposal module enables locally and globally discriminative information to be promoted by each other. We demonstrate that Parts4Feature outperforms the state-of-the-art under three large-scale 3D shape benchmarks.

* To appear at IJCAI2019
Click to Read Paper and Get Code
Exploring contextual information in the local region is important for shape understanding and analysis. Existing studies often employ hand-crafted or explicit ways to encode contextual information of local regions. However, it is hard to capture fine-grained contextual information in hand-crafted or explicit manners, such as the correlation between different areas in a local region, which limits the discriminative ability of learned features. To resolve this issue, we propose a novel deep learning model for 3D point clouds, named Point2Sequence, to learn 3D shape features by capturing fine-grained contextual information in a novel implicit way. Point2Sequence employs a novel sequence learning model for point clouds to capture the correlations by aggregating multi-scale areas of each local region with attention. Specifically, Point2Sequence first learns the feature of each area scale in a local region. Then, it captures the correlation between area scales in the process of aggregating all area scales using a recurrent neural network (RNN) based encoder-decoder structure, where an attention mechanism is proposed to highlight the importance of different area scales. Experimental results show that Point2Sequence achieves state-of-the-art performance in shape classification and segmentation tasks.

* To be published in AAAI 2019
Click to Read Paper and Get Code
In this paper we present a novel unsupervised representation learning approach for 3D shapes, which is an important research challenge as it avoids the manual effort required for collecting supervised data. Our method trains an RNN-based neural network architecture to solve multiple view inter-prediction tasks for each shape. Given several nearby views of a shape, we define view inter-prediction as the task of predicting the center view between the input views, and reconstructing the input views in a low-level feature space. The key idea of our approach is to implement the shape representation as a shape-specific global memory that is shared between all local view inter-predictions for each shape. Intuitively, this memory enables the system to aggregate information that is useful to better solve the view inter-prediction tasks for each shape, and to leverage the memory as a view-independent shape representation. Our approach obtains the best results using a combination of L_2 and adversarial losses for the view inter-prediction task. We show that VIP-GAN outperforms state-of-the-art methods in unsupervised 3D feature learning on three large scale 3D shape benchmarks.

* To be published at AAAI 2019
Click to Read Paper and Get Code
A recent method employs 3D voxels to represent 3D shapes, but this limits the approach to low resolutions due to the computational cost caused by the cubic complexity of 3D voxels. Hence the method suffers from a lack of detailed geometry. To resolve this issue, we propose Y^2Seq2Seq, a view-based model, to learn cross-modal representations by joint reconstruction and prediction of view and word sequences. Specifically, the network architecture of Y^2Seq2Seq bridges the semantic meaning embedded in the two modalities by two coupled `Y' like sequence-to-sequence (Seq2Seq) structures. In addition, our novel hierarchical constraints further increase the discriminability of the cross-modal representations by employing more detailed discriminative information. Experimental results on cross-modal retrieval and 3D shape captioning show that Y^2Seq2Seq outperforms the state-of-the-art methods.

* To be pubilished at AAAI 2019
Click to Read Paper and Get Code
Learning global features by aggregating information over multiple views has been shown to be effective for 3D shape analysis. For view aggregation in deep learning models, pooling has been applied extensively. However, pooling leads to a loss of the content within views, and the spatial relationship among views, which limits the discriminability of learned features. We propose 3DViewGraph to resolve this issue, which learns 3D global features by more effectively aggregating unordered views with attention. Specifically, unordered views taken around a shape are regarded as view nodes on a view graph. 3DViewGraph first learns a novel latent semantic mapping to project low-level view features into meaningful latent semantic embeddings in a lower dimensional space, which is spanned by latent semantic patterns. Then, the content and spatial information of each pair of view nodes are encoded by a novel spatial pattern correlation, where the correlation is computed among latent semantic patterns. Finally, all spatial pattern correlations are integrated with attention weights learned by a novel attention mechanism. This further increases the discriminability of learned features by highlighting the unordered view nodes with distinctive characteristics and depressing the ones with appearance ambiguity. We show that 3DViewGraph outperforms state-of-the-art methods under three large-scale benchmarks.

* To appear at IJCAI2019
Click to Read Paper and Get Code
Deep Convolutional Neural Networks (CNNs) are widely employed in modern computer vision algorithms, where the input image is convolved iteratively by many kernels to extract the knowledge behind it. However, with the depth of convolutional layers getting deeper and deeper in recent years, the enormous computational complexity makes it difficult to be deployed on embedded systems with limited hardware resources. In this paper, we propose two computation-performance optimization methods to reduce the redundant convolution kernels of a CNN with performance and architecture constraints, and apply it to a network for super resolution (SR). Using PSNR drop compared to the original network as the performance criterion, our method can get the optimal PSNR under a certain computation budget constraint. On the other hand, our method is also capable of minimizing the computation required under a given PSNR drop.

* This paper was accepted by 2018 The International Symposium on Circuits and Systems (ISCAS)
Click to Read Paper and Get Code