Models, code, and papers for "Yu-Kun Lai":

Robust Non-Rigid Registration With Reweighted Dual Sparsities

Mar 15, 2017
Jingyu Yang, Kun Li, Yu-Kun Lai, Daoliang Guo

Non-rigid registration is challenging because it is ill-posed with high degrees of freedom and is thus sensitive to noise and outliers. We propose a robust non-rigid registration method using reweighted sparsities on position and transformation to estimate the deformations between 3-D shapes. We formulate the energy function with dual sparsities on both the data term and the smoothness term, and define the smoothness constraint using local rigidity. The dual-sparsity based non-rigid registration model is enhanced with a reweighting scheme, and solved by transferring the model into some alternating optimized subproblems which have exact solutions and guaranteed convergence. Experimental results on both public datasets and real scanned datasets show that our method outperforms the state-of-the-art methods and is more robust to noise and outliers than conventional non-rigid registration methods.

* 12 pages, submitted to IEEE Transactions on Visualization and Computer Graphics 

  Access Model/Code and Paper
Mesh Variational Autoencoders with Edge Contraction Pooling

Aug 07, 2019
Yu-Jie Yuan, Yu-Kun Lai, Jie Yang, Hongbo Fu, Lin Gao

3D shape analysis is an important research topic in computer vision and graphics. While existing methods have generalized image-based deep learning to meshes using graph-based convolutions, the lack of an effective pooling operation restricts the learning capability of their networks. In this paper, we propose a novel pooling operation for mesh datasets with the same connectivity but different geometry, by building a mesh hierarchy using mesh simplification. For this purpose, we develop a modified mesh simplification method to avoid generating highly irregularly sized triangles. Our pooling operation effectively encodes the correspondence between coarser and finer meshes in the hierarchy. We then present a variational auto-encoder structure with the edge contraction pooling and graph-based convolutions, to explore probability latent spaces of 3D surfaces. Our network requires far fewer parameters than the original mesh VAE and thus can handle denser models thanks to our new pooling operation and convolutional kernels. Our evaluation also shows that our method has better generalization ability and is more reliable in various applications, including shape generation, shape interpolation and shape embedding.

  Access Model/Code and Paper
3D-CariGAN: An End-to-End Solution to 3D Caricature Generation from Face Photos

Mar 15, 2020
Zipeng Ye, Ran Yi, Minjing Yu, Juyong Zhang, Yu-Kun Lai, Yong-jin Liu

Caricature is a kind of artistic style of human faces that attracts considerable research in computer vision. So far all existing 3D caricature generation methods require some information related to caricature as input, e.g., a caricature sketch or 2D caricature. However, this kind of input is difficult to provide by non-professional users. In this paper, we propose an end-to-end deep neural network model to generate high-quality 3D caricature with a simple face photo as input. The most challenging issue in our system is that the source domain of face photos (characterized by 2D normal faces) is significantly different from the target domain of 3D caricatures (characterized by 3D exaggerated face shapes and texture). To address this challenge, we (1) build a large dataset of 6,100 3D caricature meshes and use it to establish a PCA model in the 3D caricature shape space and (2) detect landmarks in the input face photo and use them to set up correspondence between 2D caricature and 3D caricature shape. Our system can automatically generate high-quality 3D caricatures. In many situations, users want to control the output by a simple and intuitive way, so we further introduce a simple-to-use interactive control with three horizontal and one vertical lines. Experiments and user studies show that our system is easy to use and can generate high-quality 3D caricatures.

  Access Model/Code and Paper
Deep Line Art Video Colorization with a Few References

Mar 24, 2020
Min Shi, Jia-Qi Zhang, Shu-Yu Chen, Lin Gao, Yu-Kun Lai, Fang-Lue Zhang

Coloring line art images based on the colors of reference images is an important stage in animation production, which is time-consuming and tedious. In this paper, we propose a deep architecture to automatically color line art videos with the same color style as the given reference images. Our framework consists of a color transform network and a temporal constraint network. The color transform network takes the target line art images as well as the line art and color images of one or more reference images as input, and generates corresponding target color images. To cope with larger differences between the target line art image and reference color images, our architecture utilizes non-local similarity matching to determine the region correspondences between the target image and the reference images, which are used to transform the local color information from the references to the target. To ensure global color style consistency, we further incorporate Adaptive Instance Normalization (AdaIN) with the transformation parameters obtained from a style embedding vector that describes the global color style of the references, extracted by an embedder. The temporal constraint network takes the reference images and the target image together in chronological order, and learns the spatiotemporal features through 3D convolution to ensure the temporal consistency of the target image and the reference image. Our model can achieve even better coloring results by fine-tuning the parameters with only a small amount of samples when dealing with an animation of a new style. To evaluate our method, we build a line art coloring dataset. Experiments show that our method achieves the best performance on line art video coloring compared to the state-of-the-art methods and other baselines.

  Access Model/Code and Paper
PRS-Net: Planar Reflective Symmetry Detection Net for 3D Models

Oct 24, 2019
Lin Gao, Ling-Xiao Zhang, Hsien-Yu Meng, Yi-Hui Ren, Yu-Kun Lai, Leif Kobbelt

In geometry processing, symmetry is the universally high-level structural information of the 3d models and benefits many geometry processing tasks including shape segmentation, alignment, matching, completion, e.g.. Thus it is an important problem to analyze various forms of the symmetry of 3D shapes. The planar reflective symmetry is the most fundamental one. Traditional methods based on spatial sampling can be time consuming and may not be able to identify all the symmetry planes. In this paper, we present a novel learning framework to automatically discover global planar reflective symmetry of a 3D shape. Our framework trains an unsupervised 3D convolutional neural network to extract global model features and then outputs possible global symmetry parameters, where input shapes are represented using voxels. We introduce a dedicated symmetry distance loss along with a regularization loss to avoid generating duplicated symmetry planes. Our network can also identify isotropic shapes by predicting their rotation axes. We further provide a method to remove invalid and duplicated planes and axes. We demonstrate that our method is able to produce reliable and accurate results. Our neural network-based method is hundreds of times faster than the state-of-the-art method, which is based on sampling. Our method is also robust even with noisy or incomplete input surfaces.

* Corrected typos 

  Access Model/Code and Paper
SDM-NET: Deep Generative Network for Structured Deformable Mesh

Sep 03, 2019
Lin Gao, Jie Yang, Tong Wu, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai, Hao Zhang

We introduce SDM-NET, a deep generative neural network which produces structured deformable meshes. Specifically, the network is trained to generate a spatial arrangement of closed, deformable mesh parts, which respect the global part structure of a shape collection, e.g., chairs, airplanes, etc. Our key observation is that while the overall structure of a 3D shape can be complex, the shape can usually be decomposed into a set of parts, each homeomorphic to a box, and the finer-scale geometry of the part can be recovered by deforming the box. The architecture of SDM-NET is that of a two-level variational autoencoder (VAE). At the part level, a PartVAE learns a deformable model of part geometries. At the structural level, we train a Structured Parts VAE (SP-VAE), which jointly learns the part structure of a shape collection and the part geometries, ensuring a coherence between global shape structure and surface details. Through extensive experiments and comparisons with the state-of-the-art deep generative models of shapes, we demonstrate the superiority of SDM-NET in generating meshes with visual quality, flexible topology, and meaningful structures, which benefit shape interpolation and other subsequently modeling tasks.

* Conditionally Accepted to Siggraph Asia 2019 

  Access Model/Code and Paper
Learning to Reconstruct and Understand Indoor Scenes from Sparse Views

Jun 19, 2019
Jingyu Yang, Ji Xu, Kun Li, Yu-Kun Lai, Huanjing Yue, Jianzhi Lu, Hao Wu, Yebin Liu

This paper proposes a new method for simultaneous 3D reconstruction and semantic segmentation of indoor scenes. Unlike existing methods that require recording a video using a color camera and/or a depth camera, our method only needs a small number of (e.g., 3-5) color images from uncalibrated sparse views as input, which greatly simplifies data acquisition and extends applicable scenarios. Since different views have limited overlaps, our method allows a single image as input to discern the depth and semantic information of the scene. The key issue is how to recover relatively accurate depth from single images and reconstruct a 3D scene by fusing very few depth maps. To address this problem, we first design an iterative deep architecture, IterNet, that estimates depth and semantic segmentation alternately, so that they benefit each other. To deal with the little overlap and non-rigid transformation between views, we further propose a joint global and local registration method to reconstruct a 3D scene with semantic information from sparse views. We also make available a new indoor synthetic dataset simultaneously providing photorealistic high-resolution RGB images, accurate depth maps and pixel-level semantic labels for thousands of complex layouts, useful for training and evaluation. Experimental results on public datasets and our dataset demonstrate that our method achieves more accurate depth estimation, smaller semantic segmentation errors and better 3D reconstruction results, compared with state-of-the-art methods.

  Access Model/Code and Paper
Learning from Web Data: the Benefit of Unsupervised Object Localization

Dec 21, 2018
Xiaoxiao Sun, Liang Zheng, Yu-Kun Lai, Jufeng Yang

Annotating a large number of training images is very time-consuming. In this background, this paper focuses on learning from easy-to-acquire web data and utilizes the learned model for fine-grained image classification in labeled datasets. Currently, the performance gain from training with web data is incremental, like a common saying "better than nothing, but not by much". Conventionally, the community looks to correcting the noisy web labels to select informative samples. In this work, we first systematically study the built-in gap between the web and standard datasets, i.e. different data distributions between the two kinds of data. Then, in addition to using web labels, we present an unsupervised object localization method, which provides critical insights into the object density and scale in web images. Specifically, we design two constraints on web data to substantially reduce the difference of data distributions for the web and standard datasets. First, we present a method to control the scale, localization and number of objects in the detected region. Second, we propose to select the regions containing objects that are consistent with the web tag. Based on the two constraints, we are able to process web images to reduce the gap, and the processed web data is used to better assist the standard dataset to train CNNs. Experiments on several fine-grained image classification datasets confirm that our method performs favorably against the state-of-the-art methods.

* 13 pages, 9 figures, 6 tables 

  Access Model/Code and Paper
MW-GAN: Multi-Warping GAN for Caricature Generation with Multi-Style Geometric Exaggeration

Jan 07, 2020
Haodi Hou, Jing Huo, Jing Wu, Yu-Kun Lai, Yang Gao

Given an input face photo, the goal of caricature generation is to produce stylized, exaggerated caricatures that share the same identity as the photo. It requires simultaneous style transfer and shape exaggeration with rich diversity, and meanwhile preserving the identity of the input. To address this challenging problem, we propose a novel framework called Multi-Warping GAN (MW-GAN), including a style network and a geometric network that are designed to conduct style transfer and geometric exaggeration respectively. We bridge the gap between the style and landmarks of an image with corresponding latent code spaces by a dual way design, so as to generate caricatures with arbitrary styles and geometric exaggeration, which can be specified either through random sampling of latent code or from a given caricature sample. Besides, we apply identity preserving loss to both image space and landmark space, leading to a great improvement in quality of generated caricatures. Experiments show that caricatures generated by MW-GAN have better quality than existing methods.

  Access Model/Code and Paper
A Deep Learning Driven Active Framework for Segmentation of Large 3D Shape Collections

Jul 17, 2018
David George, Xianguha Xie, Yu-Kun Lai, Gary KL Tam

High-level shape understanding and technique evaluation on large repositories of 3D shapes often benefit from additional information known about the shapes. One example of such information is the semantic segmentation of a shape into functional or meaningful parts. Generating accurate segmentations with meaningful segment boundaries is, however, a costly process, typically requiring large amounts of user time to achieve high quality results. In this paper we present an active learning framework for large dataset segmentation, which iteratively provides the user with new predictions by training new models based on already segmented shapes. Our proposed pipeline consists of three novel components. First, we a propose a fast and relatively accurate feature-based deep learning model to provide dataset-wide segmentation predictions. Second, we propose an information theory measure to estimate the prediction quality and for ordering subsequent fast and meaningful shape selection. Our experiments show that such suggestive ordering helps reduce users time and effort, produce high quality predictions, and construct a model that generalizes well. Finally, we provide effective segmentation refinement features to help the user quickly correct any incorrect predictions. We show that our framework is more accurate and in general more efficient than state-of-the-art, for massive dataset segmentation with while also providing consistent segment boundaries.

* 16 pages, 17 figures, 5 tables 

  Access Model/Code and Paper
Alive Caricature from 2D to 3D

Mar 27, 2018
Qianyi Wu, Juyong Zhang, Yu-Kun Lai, Jianmin Zheng, Jianfei Cai

Caricature is an art form that expresses subjects in abstract, simple and exaggerated view. While many caricatures are 2D images, this paper presents an algorithm for creating expressive 3D caricatures from 2D caricature images with a minimum of user interaction. The key idea of our approach is to introduce an intrinsic deformation representation that has a capacity of extrapolation enabling us to create a deformation space from standard face dataset, which maintains face constraints and meanwhile is sufficiently large for producing exaggerated face models. Built upon the proposed deformation representation, an optimization model is formulated to find the 3D caricature that captures the style of the 2D caricature image automatically. The experiments show that our approach has better capability in expressing caricatures than those fitting approaches directly using classical parametric face models such as 3DMM and FaceWareHouse. Moreover, our approach is based on standard face datasets and avoids constructing complicated 3D caricature training set, which provides great flexibility in real applications.

* Accepted to CVPR 2018 

  Access Model/Code and Paper
Learning-based Real-time Detection of Intrinsic Reflectional Symmetry

Nov 01, 2019
Yi-Ling Qiao, Lin Gao, Shu-Zhi Liu, Ligang Liu, Yu-Kun Lai, Xilin Chen

Reflectional symmetry is ubiquitous in nature. While extrinsic reflectional symmetry can be easily parametrized and detected, intrinsic symmetry is much harder due to the high solution space. Previous works usually solve this problem by voting or sampling, which suffer from high computational cost and randomness. In this paper, we propose \YL{a} learning-based approach to intrinsic reflectional symmetry detection. Instead of directly finding symmetric point pairs, we parametrize this self-isometry using a functional map matrix, which can be easily computed given the signs of Laplacian eigenfunctions under the symmetric mapping. Therefore, we train a novel deep neural network to predict the sign of each eigenfunction under symmetry, which in addition takes the first few eigenfunctions as intrinsic features to characterize the mesh while avoiding coping with the connectivity explicitly. Our network aims at learning the global property of functions, and consequently converts the problem defined on the manifold to the functional domain. By disentangling the prediction of the matrix into separated basis, our method generalizes well to new shapes and is invariant under perturbation of eigenfunctions. Through extensive experiments, we demonstrate the robustness of our method in challenging cases, including different topology and incomplete shapes with holes. By avoiding random sampling, our learning-based algorithm is over 100 times faster than state-of-the-art methods, and meanwhile, is more robust, achieving higher correspondence accuracy in commonly used metrics.

  Access Model/Code and Paper
LaplacianNet: Learning on 3D Meshes with Laplacian Encoding and Pooling

Oct 30, 2019
Yi-Ling Qiao, Lin Gao, Jie Yang, Paul L. Rosin, Yu-Kun Lai, Xilin Chen

3D models are commonly used in computer vision and graphics. With the wider availability of mesh data, an efficient and intrinsic deep learning approach to processing 3D meshes is in great need. Unlike images, 3D meshes have irregular connectivity, requiring careful design to capture relations in the data. To utilize the topology information while staying robust under different triangulation, we propose to encode mesh connectivity using Laplacian spectral analysis, along with Mesh Pooling Blocks (MPBs) that can split the surface domain into local pooling patches and aggregate global information among them. We build a mesh hierarchy from fine to coarse using Laplacian spectral clustering, which is flexible under isometric transformation. Inside the MPBs there are pooling layers to collect local information and multi-layer perceptrons to compute vertex features with increasing complexity. To obtain the relationships among different clusters, we introduce a Correlation Net to compute a correlation matrix, which can aggregate the features globally by matrix multiplication with cluster features. Our network architecture is flexible enough to be used on meshes with different numbers of vertices. We conduct several experiments including shape segmentation and classification, and our LaplacianNet outperforms state-of-the-art algorithms for these tasks on ShapeNet and COSEG datasets.

  Access Model/Code and Paper