Models, code, and papers for "Lei Li":
It has been shown that gradient descent can yield the zero training loss in the over-parametrized regime (the width of the neural networks is much larger than the number of data points). In this work, combining the ideas of some existing works, we investigate the gradient descent method for training two-layer neural networks for approximating some target continuous functions. By making use the generic chaining technique from probability theory, we show that gradient descent can yield an exponential convergence rate, while the width of the neural networks needed is independent of the size of the training data. The result also implies some strong approximation ability of the two-layer neural networks without curse of dimensionality.
In this paper, we focus on the task of generating a pun sentence given a pair of word senses. A major challenge for pun generation is the lack of large-scale pun corpus to guide the supervised learning. To remedy this, we propose an adversarial generative network for pun generation (Pun-GAN), which does not require any pun corpus. It consists of a generator to produce pun sentences, and a discriminator to distinguish between the generated pun sentences and the real sentences with specific word senses. The output of the discriminator is then used as a reward to train the generator via reinforcement learning, encouraging it to produce pun sentences that can support two word senses simultaneously. Experiments show that the proposed Pun-GAN can generate sentences that are more ambiguous and diverse in both automatic and human evaluation.
With billions of personal images being generated from social media and cameras of all sorts on a daily basis, security and privacy are unprecedentedly challenged. Although extensive attempts have been made, existing face image de-identification techniques are either insufficient in photo-reality or incapable of balancing privacy and usability qualitatively and quantitatively, i.e., they fail to answer counterfactual questions such as "is it private now?", "how private is it?", and "can it be more private?" In this paper, we propose a novel framework called AnonymousNet, with an effort to address these issues systematically, balance usability, and enhance privacy in a natural and measurable manner. The framework encompasses four stages: facial attribute estimation, privacy-metric-oriented face obfuscation, directed natural image synthesis, and adversarial perturbation. Not only do we achieve the state-of-the-arts in terms of image quality and attribute prediction accuracy, we are also the first to show that facial privacy is measurable, can be factorized, and accordingly be manipulated in a photo-realistic fashion to fulfill different requirements and application scenarios. Experiments further demonstrate the effectiveness of the proposed framework.
In this paper, we study a multi-step interactive recommendation problem, where the item recommended at current step may affect the quality of future recommendations. To address the problem, we develop a novel and effective approach, named CFRL, which seamlessly integrates the ideas of both collaborative filtering (CF) and reinforcement learning (RL). More specifically, we first model the recommender-user interactive recommendation problem as an agent-environment RL task, which is mathematically described by a Markov decision process (MDP). Further, to achieve collaborative recommendations for the entire user community, we propose a novel CF-based MDP by encoding the states of all users into a shared latent vector space. Finally, we propose an effective Q-network learning method to learn the agent's optimal policy based on the CF-based MDP. The capability of CFRL is demonstrated by comparing its performance against a variety of existing methods on real-world datasets.
Despite their exceptional flexibility and popularity, the Monte Carlo methods often suffer from slow mixing times for challenging statistical physics problems. We present a general strategy to overcome this difficulty by adopting ideas and techniques from the machine learning community. We fit the unnormalized probability of the physical model to a feedforward neural network and reinterpret the architecture as a restricted Boltzmann machine. Then, exploiting its feature detection ability, we utilize the restricted Boltzmann machine for efficient Monte Carlo updates and to speed up the simulation of the original physical system. We implement these ideas for the Falicov-Kimball model and demonstrate improved acceptance ratio and autocorrelation time near the phase transition point.
Very short-term convective storm forecasting, termed nowcasting, has long been an important issue and has attracted substantial interest. Existing nowcasting methods rely principally on radar images and are limited in terms of nowcasting storm initiation and growth. Real-time re-analysis of meteorological data supplied by numerical models provides valuable information about three-dimensional (3D), atmospheric, boundary layer thermal dynamics, such as temperature and wind. To mine such data, we here develop a convolution-recurrent, hybrid deep-learning method with the following characteristics: (1) the use of cell-based oversampling to increase the number of training samples; this mitigates the class imbalance issue; (2) the use of both raw 3D radar data and 3D meteorological data re-analyzed via multi-source 3D convolution without any need for handcraft feature engineering; and (3) the stacking of convolutional neural networks on a long short-term memory encoder/decoder that learns the spatiotemporal patterns of convective processes. Experimental results demonstrated that our method performs better than other extrapolation methods. Qualitative analysis yielded encouraging nowcasting results.
We present a variational renormalization group (RG) approach using a deep generative model based on normalizing flows. The model performs hierarchical change-of-variables transformations from the physical space to a latent space with reduced mutual information. Conversely, the neural net directly maps independent Gaussian noises to physical configurations following the inverse RG flow. The model has an exact and tractable likelihood, which allows unbiased training and direct access to the renormalized energy function of the latent variables. To train the model, we employ probability density distillation for the bare energy function of the physical problem, in which the training loss provides a variational upper bound of the physical free energy. We demonstrate practical usage of the approach by identifying mutually independent collective variables of the Ising model and performing accelerated hybrid Monte Carlo sampling in the latent space. Lastly, we comment on the connection of the present approach to the wavelet formulation of RG and the modern pursuit of information preserving RG.
This paper is about labeling video frames with action classes under weak supervision in training, where we have access to a temporal ordering of actions, but their start and end frames in training videos are unknown. Following prior work, we use an HMM grounded on a Gated Recurrent Unit (GRU) for frame labeling. Our key contribution is a new constrained discriminative forward loss (CDFL) that we use for training the HMM and GRU under weak supervision. While prior work typically estimates the loss on a single, inferred video segmentation, our CDFL discriminates between the energy of all valid and invalid frame labelings of a training video. A valid frame labeling satisfies the ground-truth temporal ordering of actions, whereas an invalid one violates the ground truth. We specify an efficient recursive algorithm for computing the CDFL in terms of the logadd function of the segmentation energy. Our evaluation on action segmentation and alignment gives superior results to those of the state of the art on the benchmark Breakfast Action, Hollywood Extended, and 50Salads datasets.
The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths. In this work, we propose a family of estimates based on the order statistics over the path ensemble, which allows one to flexibly drive the learning process, towards or against risks. On top of this formulation, we systematically study the impacts of different methods for estimating advantages. Our findings reveal that biased estimates, when chosen appropriately, can result in significant benefits. In particular, for the environments with sparse rewards, optimistic estimates would lead to more efficient exploration of the policy space; while for those where individual actions can have critical impacts, conservative estimates are preferable. On various benchmarks, including MuJoCo continuous control, Terrain locomotion, Atari games, and sparse-reward environments, the proposed biased estimation schemes consistently demonstrate improvement over mainstream methods, not only accelerating the learning process but also obtaining substantial performance gains.
Remarkable gains in deep learning usually rely on tremendous supervised data. Ensuring the modality diversity for one object in training set is critical for the generalization of cutting-edge deep models, but it burdens human with heavy manual labor on data collection and annotation. In addition, some rare or unexpected modalities are new for the current model, causing reduced performance under such emerging modalities. Inspired by the achievements in speech recognition, psychology and behavioristics, we present a practical solution, self-reinforcing unsupervised matching (SUM), to annotate the images with 2D structure-preserving property in an emerging modality by cross-modality matching. This approach requires no any supervision in emerging modality and only one template in seen modality, providing a possible route towards continual learning.
Neural style transfer has been demonstrated to be powerful in creating artistic image with help of Convolutional Neural Networks (CNN). However, there is still lack of computational analysis of perceptual components of the artistic style. Different from some early attempts which studied the style by some pre-processing or post-processing techniques, we investigate the characteristics of the style systematically based on feature map produced by CNN. First, we computationally decompose the style into basic elements using not only spectrum based methods including Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT) but also latent variable models such Principal Component Analysis (PCA), Independent Component Analysis (ICA). Then, the decomposition of style induces various ways of controlling the style elements which could be embedded as modules in state-of-the-art style transfer algorithms. Such decomposition of style brings several advantages. It enables the computational coding of different artistic styles by our style basis with similar styles clustering together, and thus it facilitates the mixing or intervention of styles based on the style basis from more than one styles so that compound style or new style could be generated to produce styled images. Experiments demonstrate the effectiveness of our method on not only painting style transfer but also sketch style transfer which indicates possible applications on picture-to-sketch problems.
Using deep learning, this paper addresses the problem of joint object boundary detection and boundary motion estimation in videos, which we named boundary flow estimation. Boundary flow is an important mid-level visual cue as boundaries characterize objects spatial extents, and the flow indicates objects motions and interactions. Yet, most prior work on motion estimation has focused on dense object motion or feature points that may not necessarily reside on boundaries. For boundary flow estimation, we specify a new fully convolutional Siamese network (FCSN) that jointly estimates object-level boundaries in two consecutive frames. Boundary correspondences in the two frames are predicted by the same FCSN with a new, unconventional deconvolution approach. Finally, the boundary flow estimate is improved with an edgelet-based filtering. Evaluation is conducted on three tasks: boundary detection in videos, boundary flow estimation, and optical flow estimation. On boundary detection, we achieve the state-of-the-art performance on the benchmark VSB100 dataset. On boundary flow estimation, we present the first results on the Sintel training dataset. For optical flow estimation, we run the recent approach CPMFlow but on the augmented input with our boundary-flow matches, and achieve significant performance improvement on the Sintel benchmark.
How can we enable computers to automatically answer questions like "Who created the character Harry Potter"? Carefully built knowledge bases provide rich sources of facts. However, it remains a challenge to answer factoid questions raised in natural language due to numerous expressions of one question. In particular, we focus on the most common questions --- ones that can be answered with a single fact in the knowledge base. We propose CFO, a Conditional Focused neural-network-based approach to answering factoid questions with knowledge bases. Our approach first zooms in a question to find more probable candidate subject mentions, and infers the final answers with a unified conditional probabilistic framework. Powered by deep recurrent neural networks and neural embeddings, our proposed CFO achieves an accuracy of 75.7% on a dataset of 108k questions - the largest public one to date. It outperforms the current state of the art by an absolute margin of 11.8%.
In this paper, we propose a novel algorithm for analysis-based sparsity reconstruction. It can solve the generalized problem by structured sparsity regularization with an orthogonal basis and total variation regularization. The proposed algorithm is based on the iterative reweighted least squares (IRLS) model, which is further accelerated by the preconditioned conjugate gradient method. The convergence rate of the proposed algorithm is almost the same as that of the traditional IRLS algorithms, that is, exponentially fast. Moreover, with the specifically devised preconditioner, the computational cost for each iteration is significantly less than that of traditional IRLS algorithms, which enables our approach to handle large scale problems. In addition to the fast convergence, it is straightforward to apply our method to standard sparsity, group sparsity, overlapping group sparsity and TV based problems. Experiments are conducted on a practical application: compressive sensing magnetic resonance imaging. Extensive results demonstrate that the proposed algorithm achieves superior performance over 14 state-of-the-art algorithms in terms of both accuracy and computational cost.
Bandwidth slicing is introduced to support federated learning in edge computing to assure low communication delay for training traffic. Results reveal that bandwidth slicing significantly improves training efficiency while achieving good learning accuracy.
This paper develops a new deep neural network optimized equalization framework for massive multiple input multiple output orthogonal frequency division multiplexing (MIMO-OFDM) systems that employ low-resolution analog-to-digital converters (ADCs) at the base station (BS). The use of low-resolution ADCs could largely reduce hardware complexity and circuit power consumption, however, makes the channel station information almost blind to the BS, hence causing difficulty in solving the equalization problem. In this paper, we consider a supervised learning architecture, where the goal is to learn a representative function that can predict the targets (constellation points) from the inputs (outputs of the low-resolution ADCs) based on the labeled training data (pilot signals). Specially, our main contributions are two-fold: 1) First, we design a new activation function, whose outputs are close to the constellation points when the parameters are finally optimized, to help us fully exploit the stochastic gradient descent method for the discrete optimization problem. 2) Second, an unsupervised loss is designed and then added to the optimization objective, aiming to enhance the representation ability (so-called generalization). The experimental results reveal that the proposed equalizer is robust to different channel taps (i.e., Gaussian, and Poisson), significantly outperforms the linearized MMSE equalizer, and shows potential for pilot saving.