Models, code, and papers for "Yan Yan":

Different from traditional hyperspectral super-resolution approaches that focus on improving the spatial resolution, spectral super-resolution aims at producing a high-resolution hyperspectral image from the RGB observation with super-resolution in spectral domain. However, it is challenging to accurately reconstruct a high-dimensional continuous spectrum from three discrete intensity values at each pixel, since too much information is lost during the procedure where the latent hyperspectral image is downsampled (e.g., with x10 scaling factor) in spectral domain to produce an RGB observation. To address this problem, we present a multi-scale deep convolutional neural network (CNN) to explicitly map the input RGB image into a hyperspectral image. Through symmetrically downsampling and upsampling the intermediate feature maps in a cascading paradigm, the local and non-local image information can be jointly encoded for spectral representation, ultimately improving the spectral reconstruction accuracy. Extensive experiments on a large hyperspectral dataset demonstrate the effectiveness of the proposed method.

PSO is a widely recognized optimization algorithm inspired by social swarm. In this brief we present a heterogeneous strategy particle swarm optimization (HSPSO), in which a proportion of particles adopt a fully informed strategy to enhance the converging speed while the rest are singly informed to maintain the diversity. Our extensive numerical experiments show that HSPSO algorithm is able to obtain satisfactory solutions, outperforming both PSO and the fully informed PSO. The evolution process is examined from both structural and microscopic points of view. We find that the cooperation between two types of particles can facilitate a good balance between exploration and exploitation, yielding better performance. We demonstrate the applicability of HSPSO on the filter design problem.

Ghosting artifacts caused by moving objects or misalignments is a key challenge in high dynamic range (HDR) imaging for dynamic scenes. Previous methods first register the input low dynamic range (LDR) images using optical flow before merging them, which are error-prone and cause ghosts in results. A very recent work tries to bypass optical flows via a deep network with skip-connections, however, which still suffers from ghosting artifacts for severe movement. To avoid the ghosting from the source, we propose a novel attention-guided end-to-end deep neural network (AHDRNet) to produce high-quality ghost-free HDR images. Unlike previous methods directly stacking the LDR images or features for merging, we use attention modules to guide the merging according to the reference image. The attention modules automatically suppress undesired components caused by misalignments and saturation and enhance desirable fine details in the non-reference images. In addition to the attention model, we use dilated residual dense block (DRDB) to make full use of the hierarchical features and increase the receptive field for hallucinating the missing details. The proposed AHDRNet is a non-flow-based method, which can also avoid the artifacts generated by optical-flow estimation error. Experiments on different datasets show that the proposed AHDRNet can achieve state-of-the-art quantitative and qualitative results.

Modern large scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead for exchanging information, such as stochastic gradients, among different nodes. Recently, gradient sparsification techniques have been proposed to reduce communications cost and thus alleviate the network overhead. However, most of gradient sparsification techniques consider only synchronous parallelism and cannot be applied in asynchronous scenarios, such as asynchronous distributed training for federated learning at mobile devices. In this paper, we present a dual-way gradient sparsification approach (DGS) that is suitable for asynchronous distributed training. We let workers download model difference, instead of the global model, from the server, and the model difference information is also sparsified so that the information exchanged overhead is reduced by sparsifying the dual-way communication between the server and workers. To preserve accuracy under dual-way sparsification, we design a sparsification aware momentum (SAMomentum) to turn sparsification into adaptive batch size between each parameter. We conduct experiments at a cluster of 32 workers, and the results show that, with the same compression ratio but much lower communication cost, our approach can achieve better scalability and generalization ability.

In this work, we mainly study the influence of the 2D warping module for one-shot face recognition.

In this paper, we propose a new primal-dual algorithm for minimizing $f(x) + g(x) + h(Ax)$, where $f$, $g$, and $h$ are proper lower semi-continuous convex functions, $f$ is differentiable with a Lipschitz continuous gradient, and $A$ is a bounded linear operator. The proposed algorithm has some famous primal-dual algorithms for minimizing the sum of two functions as special cases. E.g., it reduces to the Chambolle-Pock algorithm when $f = 0$ and the proximal alternating predictor-corrector when $g = 0$. For the general convex case, we prove the convergence of this new algorithm in terms of the distance to a fixed point by showing that the iteration is a nonexpansive operator. In addition, we prove the $O(1/k)$ ergodic convergence rate in the primal-dual gap. With additional assumptions, we derive the linear convergence rate in terms of the distance to the fixed point. Comparing to other primal-dual algorithms for solving the same problem, this algorithm extends the range of acceptable parameters to ensure its convergence and has a smaller per-iteration cost. The numerical experiments show the efficiency of this algorithm.

This paper presents our segmentation system developed for the MLP 2017 shared tasks on cross-lingual word segmentation and morpheme segmentation. We model both word and morpheme segmentation as character-level sequence labelling tasks. The prevalent bidirectional recurrent neural network with conditional random fields as the output interface is adapted as the baseline system, which is further improved via ensemble decoding. Our universal system is applied to and extensively evaluated on all the official data sets without any language-specific adjustment. The official evaluation results indicate that the proposed model achieves outstanding accuracies both for word and morpheme segmentation on all the languages in various types when compared to the other participating systems.

A central problem in analyzing networks is partitioning them into modules or communities. One of the best tools for this is the stochastic block model, which clusters vertices into blocks with statistically homogeneous pattern of links. Despite its flexibility and popularity, there has been a lack of principled statistical model selection criteria for the stochastic block model. Here we propose a Bayesian framework for choosing the number of blocks as well as comparing it to the more elaborate degree- corrected block models, ultimately leading to a universal model selection framework capable of comparing multiple modeling combinations. We will also investigate its connection to the minimum description length principle.

In this paper, we proposed a new approximate heuristic search algorithm: Cascading A*, which is a two-phrase algorithm combining A* and IDA* by a new concept "envelope ball". The new algorithm CA* is efficient, able to generate approximate solution and any-time solution, and parallel friendly.

We introduce a new algorithm, called adaptive sparse backfitting algorithm, for solving high dimensional Sparse Additive Model (SpAM) utilizing symmetric, non-negative definite smoothers. Unlike the previous sparse backfitting algorithm, our method is essentially a block coordinate descent algorithm that guarantees to converge to the optimal solution. It bridges the gap between the population backfitting algorithm and that of the data version. We also prove variable selection consistency under suitable conditions. Numerical studies on both synthesis and real data are conducted to show that adaptive sparse backfitting algorithm outperforms previous sparse backfitting algorithm in fitting and predicting high dimensional nonparametric models.

This article studies the problem of image restoration of observed images corrupted by impulse noise and mixed Gaussian impulse noise. Since the pixels damaged by impulse noise contain no information about the true image, how to find this set correctly is a very important problem. We propose two methods based on blind inpainting and $\ell_0$ minimization that can simultaneously find the damaged pixels and restore the image. By iteratively restoring the image and updating the set of damaged pixels, these methods have better performance than other methods, as shown in the experiments. In addition, we provide convergence analysis for these methods, these algorithms will converge to coordinatewise minimum points. In addition, they will converge to local minimum points (or with probability one) with some modifications in the algorithms.

Representing defeasibility is an important issue in common sense reasoning. In reasoning about action and change, this issue becomes more difficult because domain and action related defeasible information may conflict with general inertia rules. Furthermore, different types of defeasible information may also interfere with each other during the reasoning. In this paper, we develop a prioritized logic programming approach to handle defeasibilities in reasoning about action. In particular, we propose three action languages {\cal AT}^{0}, {\cal AT}^{1} and {\cal AT}^{2} which handle three types of defeasibilities in action domains named defeasible constraints, defeasible observations and actions with defeasible and abnormal effects respectively. Each language with a higher superscript can be viewed as an extension of the language with a lower superscript. These action languages inherit the simple syntax of {\cal A} language but their semantics is developed in terms of transition systems where transition functions are defined based on prioritized logic programs. By illustrating various examples, we show that our approach eventually provides a powerful mechanism to handle various defeasibilities in temporal prediction and postdiction. We also investigate semantic properties of these three action languages and characterize classes of action domains that present more desirable solutions in reasoning about action within the underlying action languages.

Prioritized default reasoning has illustrated its rich expressiveness and flexibility in knowledge representation and reasoning. However, many important aspects of prioritized default reasoning have yet to be thoroughly explored. In this paper, we investigate two properties of prioritized logic programs in the context of answer set semantics. Specifically, we reveal a close relationship between mutual defeasibility and uniqueness of the answer set for a prioritized logic program. We then explore how the splitting technique for extended logic programs can be extended to prioritized logic programs. We prove splitting theorems that can be used to simplify the evaluation of a prioritized logic program under certain conditions.

Spectral Clustering (SC) is a widely used data clustering method which first learns a low-dimensional embedding $U$ of data by computing the eigenvectors of the normalized Laplacian matrix, and then performs k-means on $U^\top$ to get the final clustering result. The Sparse Spectral Clustering (SSC) method extends SC with a sparse regularization on $UU^\top$ by using the block diagonal structure prior of $UU^\top$ in the ideal case. However, encouraging $UU^\top$ to be sparse leads to a heavily nonconvex problem which is challenging to solve and the work (Lu, Yan, and Lin 2016) proposes a convex relaxation in the pursuit of this aim indirectly. However, the convex relaxation generally leads to a loose approximation and the quality of the solution is not clear. This work instead considers to solve the nonconvex formulation of SSC which directly encourages $UU^\top$ to be sparse. We propose an efficient Alternating Direction Method of Multipliers (ADMM) to solve the nonconvex SSC and provide the convergence guarantee. In particular, we prove that the sequences generated by ADMM always exist a limit point and any limit point is a stationary point. Our analysis does not impose any assumptions on the iterates and thus is practical. Our proposed ADMM for nonconvex problems allows the stepsize to be increasing but upper bounded, and this makes it very efficient in practice. Experimental analysis on several real data sets verifies the effectiveness of our method.

Deep convolutional networks (CNNs) have exhibited their potential in image inpainting for producing plausible results. However, in most existing methods, e.g., context encoder, the missing parts are predicted by propagating the surrounding convolutional features through a fully connected layer, which intends to produce semantically plausible but blurry result. In this paper, we introduce a special shift-connection layer to the U-Net architecture, namely Shift-Net, for filling in missing regions of any shape with sharp structures and fine-detailed textures. To this end, the encoder feature of the known region is shifted to serve as an estimation of the missing parts. A guidance loss is introduced on decoder feature to minimize the distance between the decoder feature after fully connected layer and the ground-truth encoder feature of the missing parts. With such constraint, the decoder feature in missing region can be used to guide the shift of encoder feature in known region. An end-to-end learning algorithm is further developed to train the Shift-Net. Experiments on the Paris StreetView and Places datasets demonstrate the efficiency and effectiveness of our Shift-Net in producing sharper, fine-detailed, and visually plausible results. The codes and pre-trained models are available at https://github.com/Zhaoyi-Yan/Shift-Net.

It has long been noticed that high dimension data exhibits strange patterns. This has been variously interpreted as either a "blessing" or a "curse", causing uncomfortable inconsistencies in the literature. We propose that these patterns arise from an intrinsically hierarchical generative process. Modeling the process creates a web of constraints that reconcile many different theories and results. The model also implies high dimensional data posses an innate separability that can be exploited for machine learning. We demonstrate how this permits the open-set learning problem to be defined mathematically, leading to qualitative and quantitative improvements in performance.

Partial multi-label learning (PML), which tackles the problem of learning multi-label prediction models from instances with overcomplete noisy annotations, has recently started gaining attention from the research community. In this paper, we propose a novel adversarial learning model, PML-GAN, under a generalized encoder-decoder framework for partial multi-label learning. The PML-GAN model uses a disambiguation network to identify noisy labels and uses a multi-label prediction network to map the training instances to the disambiguated label vectors, while deploying a generative adversarial network as an inverse mapping from label vectors to data samples in the input feature space. The learning of the overall model corresponds to a minimax adversarial game, which enhances the correspondence of input features with the output labels. Extensive experiments are conducted on multiple datasets, while the proposed model demonstrates the state-of-the-art performance for partial multi-label learning.

Emerging transportation modes, including car-sharing, bike-sharing, and ride-hailing, are transforming urban mobility but have been shown to reinforce socioeconomic inequities. Spatiotemporal demand prediction models for these new mobility regimes must therefore consider fairness as a first-class design requirement. We present FairST, a fairness-aware model for predicting demand for new mobility systems. Our approach utilizes 1D, 2D and 3D convolutions to integrate various urban features and learn the spatial-temporal dynamics of a mobility system, but we include fairness metrics as a form of regularization to make the predictions more equitable across demographic groups. We propose two novel spatiotemporal fairness metrics, a region-based fairness gap (RFG) and an individual-based fairness gap (IFG). Both quantify equity in a spatiotemporal context, but vary by whether demographics are labeled at the region level (RFG) or whether population distribution information is available (IFG). Experimental results on real bike share and ride share datasets demonstrate the effectiveness of the proposed model: FairST not only reduces the fairness gap by more than 80%, but can surprisingly achieve better accuracy than state-of-the-art yet fairness-oblivious methods including LSTMs, ConvLSTMs, and 3D CNN.

We consider the problem of high-dimensional misspecified phase retrieval. This is where we have an $s$-sparse signal vector $\mathbf{x}_*$ in $\mathbb{R}^n$, which we wish to recover using sampling vectors $\textbf{a}_1,\ldots,\textbf{a}_m$, and measurements $y_1,\ldots,y_m$, which are related by the equation $f(\left<\textbf{a}_i,\textbf{x}_*\right>) = y_i$. Here, $f$ is an unknown link function satisfying a positive correlation with the quadratic function. This problem was analyzed in a recent paper by Neykov, Wang and Liu, who provided recovery guarantees for a two-stage algorithm with sample complexity $m = O(s^2\log n)$. In this paper, we show that the first stage of their algorithm suffices for signal recovery with the same sample complexity, and extend the analysis to non-Gaussian measurements. Furthermore, we show how the algorithm can be generalized to recover a signal vector $\textbf{x}_*$ efficiently given geometric prior information other than sparsity.

The stable model semantics had been recently generalized to non-Herbrand structures by several works, which provides a unified framework and solid logical foundations for answer set programming. This paper focuses on the expressiveness of normal and disjunctive programs under the general stable model semantics. A translation from disjunctive programs to normal programs is proposed for infinite structures. Over finite structures, some disjunctive programs are proved to be intranslatable to normal programs if the arities of auxiliary predicates and functions are bounded in a certain way. The equivalence of the expressiveness of normal programs and disjunctive programs over arbitrary structures is also shown to coincide with that over finite structures, and coincide with whether NP is closed under complement. Moreover, to capture the exact expressiveness, some intertranslatability results between logic program classes and fragments of second-order logic are obtained.