Models, code, and papers for "Shichen Liu":
Rendering bridges the gap between 2D vision and 3D scenes by simulating the physical process of image formation. By inverting such renderer, one can think of a learning approach to infer 3D information from 2D images. However, standard graphics renderers involve a fundamental discretization step called rasterization, which prevents the rendering process to be differentiable, hence able to be learned. Unlike the state-of-the-art differentiable renderers, which only approximate the rendering gradient in the back propagation, we propose a truly differentiable rendering framework that is able to (1) directly render colorized mesh using differentiable functions and (2) back-propagate efficient supervision signals to mesh vertices and their attributes from various forms of image representations, including silhouette, shading and color images. The key to our framework is a novel formulation that views rendering as an aggregation function that fuses the probabilistic contributions of all mesh triangles with respect to the rendered pixels. Such formulation enables our framework to flow gradients to the occluded and far-range vertices, which cannot be achieved by the previous state-of-the-arts. We show that by using the proposed renderer, one can achieve significant improvement in 3D unsupervised single-view reconstruction both qualitatively and quantitatively. Experiments also demonstrate that our approach is able to handle the challenging tasks in image-based shape fitting, which remain nontrivial to existing differentiable renderers.
Rendering is the process of generating 2D images from 3D assets, simulated in a virtual environment, typically with a graphics pipeline. By inverting such renderer, one can think of a learning approach to predict a 3D shape from an input image. However, standard rendering pipelines involve a fundamental discretization step called rasterization, which prevents the rendering process to be differentiable, hence suitable for learning. We present the first non-parametric and truly differentiable rasterizer based on silhouettes. Our method enables unsupervised learning for high-quality 3D mesh reconstruction from a single image. We call our framework `soft rasterizer' as it provides an accurate soft approximation of the standard rasterizer. The key idea is to fuse the probabilistic contributions of all mesh triangles with respect to the rendered pixels. When combined with a mesh generator in a deep neural network, our soft rasterizer is able to generate an approximated silhouette of the generated polygon mesh in the forward pass. The rendering loss is back-propagated to supervise the mesh generation without the need of 3D training data. Experimental results demonstrate that our approach significantly outperforms the state-of-the-art unsupervised techniques, both quantitatively and qualitatively. We also show that our soft rasterizer can achieve comparable results to the cutting-edge supervised learning method and in various cases even better ones, especially for real-world data.
Recent advances in 3D deep learning have shown that it is possible to train highly effective deep models for 3D shape generation, directly from 2D images. This is particularly interesting since the availability of 3D models is still limited compared to the massive amount of accessible 2D images, which is invaluable for training. The representation of 3D surfaces itself is a key factor for the quality and resolution of the 3D output. While explicit representations, such as point clouds and voxels, can span a wide range of shape variations, their resolutions are often limited. Mesh-based representations are more efficient but are limited by their ability to handle varying topologies. Implicit surfaces, however, can robustly handle complex shapes, topologies, and also provide flexible resolution control. We address the fundamental problem of learning implicit surfaces for shape inference without the need of 3D supervision. Despite their advantages, it remains nontrivial to (1) formulate a differentiable connection between implicit surfaces and their 2D renderings, which is needed for image-based supervision; and (2) ensure precise geometric properties and control, such as local smoothness. In particular, sampling implicit surfaces densely is also known to be a computationally demanding and very slow operation. To this end, we propose a novel ray-based field probing technique for efficient image-to-field supervision, as well as a general geometric regularizer for implicit surfaces, which provides natural shape priors in unconstrained regions. We demonstrate the effectiveness of our framework on the task of single-view image-based 3D shape digitization and show how we outperform state-of-the-art techniques both quantitatively and qualitatively.
In the 'Big Data' era, many real-world applications like search involve the ranking problem for a large number of items. It is important to obtain effective ranking results and at the same time obtain the results efficiently in a timely manner for providing good user experience and saving computational costs. Valuable prior research has been conducted for learning to efficiently rank like the cascade ranking (learning) model, which uses a sequence of ranking functions to progressively filter some items and rank the remaining items. However, most existing research of learning to efficiently rank in search is studied in a relatively small computing environments with simulated user queries. This paper presents novel research and thorough study of designing and deploying a Cascade model in a Large-scale Operational E-commerce Search application (CLOES), which deals with hundreds of millions of user queries per day with hundreds of servers. The challenge of the real-world application provides new insights for research: 1). Real-world search applications often involve multiple factors of preferences or constraints with respect to user experience and computational costs such as search accuracy, search latency, size of search results and total CPU cost, while most existing search solutions only address one or two factors; 2). Effectiveness of e-commerce search involves multiple types of user behaviors such as click and purchase, while most existing cascade ranking in search only models the click behavior. Based on these observations, a novel cascade ranking model is designed and deployed in an operational e-commerce search application. An extensive set of experiments demonstrate the advantage of the proposed work to address multiple factors of effectiveness, efficiency and user experience in the real-world application.
Deep neural networks are increasingly used on mobile devices, where computational resources are limited. In this paper we develop CondenseNet, a novel network architecture with unprecedented efficiency. It combines dense connectivity with a novel module called learned group convolution. The dense connectivity facilitates feature re-use in the network, whereas learned group convolutions remove connections between layers for which this feature re-use is superfluous. At test time, our model can be implemented using standard group convolutions, allowing for efficient computation in practice. Our experiments show that CondenseNets are far more efficient than state-of-the-art compact convolutional networks such as MobileNets and ShuffleNets.
Tasks such as search and recommendation have become increas- ingly important for E-commerce to deal with the information over- load problem. To meet the diverse needs of di erent users, person- alization plays an important role. In many large portals such as Taobao and Amazon, there are a bunch of di erent types of search and recommendation tasks operating simultaneously for person- alization. However, most of current techniques address each task separately. This is suboptimal as no information about users shared across di erent tasks. In this work, we propose to learn universal user representations across multiple tasks for more e ective personalization. In partic- ular, user behavior sequences (e.g., click, bookmark or purchase of products) are modeled by LSTM and attention mechanism by integrating all the corresponding content, behavior and temporal information. User representations are shared and learned in an end-to-end setting across multiple tasks. Bene ting from better information utilization of multiple tasks, the user representations are more e ective to re ect their interests and are more general to be transferred to new tasks. We refer this work as Deep User Perception Network (DUPN) and conduct an extensive set of o ine and online experiments. Across all tested ve di erent tasks, our DUPN consistently achieves better results by giving more e ective user representations. Moreover, we deploy DUPN in large scale operational tasks in Taobao. Detailed implementations, e.g., incre- mental model updating, are also provided to address the practical issues for the real world applications.
This document describes our solution for the VATEX Captioning Challenge 2019, which requires generating descriptions for the videos in both English and Chinese languages. We identified three crucial factors that improve the performance, namely: multi-view features, hybrid reward, and diverse ensemble. Our method achieves the 2nd and the 3rd places on the Chinese and English video captioning tracks, respectively.
Ranking is a fundamental and widely studied problem in scenarios such as search, advertising, and recommendation. However, joint optimization for multi-scenario ranking, which aims to improve the overall performance of several ranking strategies in different scenarios, is rather untouched. Separately optimizing each individual strategy has two limitations. The first one is lack of collaboration between scenarios meaning that each strategy maximizes its own objective but ignores the goals of other strategies, leading to a sub-optimal overall performance. The second limitation is the inability of modeling the correlation between scenarios meaning that independent optimization in one scenario only uses its own user data but ignores the context in other scenarios. In this paper, we formulate multi-scenario ranking as a fully cooperative, partially observable, multi-agent sequential decision problem. We propose a novel model named Multi-Agent Recurrent Deterministic Policy Gradient (MA-RDPG) which has a communication component for passing messages, several private actors (agents) for making actions for ranking, and a centralized critic for evaluating the overall performance of the co-working actors. Each scenario is treated as an agent (actor). Agents collaborate with each other by sharing a global action-value function (the critic) and passing messages that encodes historical information across scenarios. The model is evaluated with online settings on a large E-commerce platform. Results show that the proposed model exhibits significant improvements against baselines in terms of the overall performance.