Models, code, and papers for "Shuai Li":

We present a formal measure-theoretical theory of neural networks (NN) built on probability coupling theory. Our main contributions are summarized as follows. * Built on the formalism of probability coupling theory, we derive an algorithm framework, named Hierarchical Measure Group and Approximate System (HMGAS), nicknamed S-System, that is designed to learn the complex hierarchical, statistical dependency in the physical world. * We show that NNs are special cases of S-System when the probability kernels assume certain exponential family distributions. Activation Functions are derived formally. We further endow geometry on NNs through information geometry, show that intermediate feature spaces of NNs are stochastic manifolds, and prove that "distance" between samples is contracted as layers stack up. * S-System shows NNs are inherently stochastic, and under a set of realistic boundedness and diversity conditions, it enables us to prove that for large size nonlinear deep NNs with a class of losses, including the hinge loss, all local minima are global minima with zero loss errors, and regions around the minima are flat basins where all eigenvalues of Hessians are concentrated around zero, using tools and ideas from mean field theory, random matrix theory, and nonlinear operator equations. * S-System, the information-geometry structure and the optimization behaviors combined completes the analog between Renormalization Group (RG) and NNs. It shows that a NN is a complex adaptive system that estimates the statistic dependency of microscopic object, e.g., pixels, in multiple scales. Unlike clear-cut physical quantity produced by RG in physics, e.g., temperature, NNs renormalize/recompose manifolds emerging through learning/optimization that divide the sample space into highly semantically meaningful groups that are dictated by supervised labels (in supervised NNs).

Neural networks are a revolutionary but immature technique that is fast evolving and heavily relies on data. To benefit from the newest development and newly available data, we want the gap between research and production as small as possibly. On the other hand, differing from traditional machine learning models, neural network is not just yet another statistic model, but a model for the natural processing engine --- the brain. In this work, we describe a neural network library named {\texttt akid}. It provides higher level of abstraction for entities (abstracted as blocks) in nature upon the abstraction done on signals (abstracted as tensors) by Tensorflow, characterizing the dataism observation that all entities in nature processes input and emit out in some ways. It includes a full stack of software that provides abstraction to let researchers focus on research instead of implementation, while at the same time the developed program can also be put into production seamlessly in a distributed environment, and be production ready. At the top application stack, it provides out-of-box tools for neural network applications. Lower down, akid provides a programming paradigm that lets user easily build customized models. The distributed computing stack handles the concurrency and communication, thus letting models be trained or deployed to a single GPU, multiple GPUs, or a distributed environment without affecting how a model is specified in the programming paradigm stack. Lastly, the distributed deployment stack handles how the distributed computing is deployed, thus decoupling the research prototype environment with the actual production environment, and is able to dynamically allocate computing resources, so development (Devs) and operations (Ops) could be separated. Please refer to http://akid.readthedocs.io/en/latest/ for documentation.

Many well-established recommender systems are based on representation learning in Euclidean space. In these models, matching functions such as the Euclidean distance or inner product are typically used for computing similarity scores between user and item embeddings. This paper investigates the notion of learning user and item representations in Hyperbolic space. In this paper, we argue that Hyperbolic space is more suitable for learning user-item embeddings in the recommendation domain. Unlike Euclidean spaces, Hyperbolic spaces are intrinsically equipped to handle hierarchical structure, encouraged by its property of exponentially increasing distances away from origin. We propose HyperBPR (Hyperbolic Bayesian Personalized Ranking), a conceptually simple but highly effective model for the task at hand. Our proposed HyperBPR not only outperforms their Euclidean counterparts, but also achieves state-of-the-art performance on multiple benchmark datasets, demonstrating the effectiveness of personalized recommendation in Hyperbolic space.

Future mobile communication networks require an Aerial Base Station (ABS) with fast mobility and long-term hovering capabilities. At present, unmanned aerial vehicles (UAV) or drones do not have long flight times and are mainly used for monitoring, surveillance, and image post-processing. On the other hand, the traditional airship is too large and not easy to take off and land. Therefore, we propose to develop an "Artificial Intelligence (AI) Drone-Cruiser" base station that can help 5G mobile communication systems and beyond quickly recover the network after a disaster and handle the instant communications by the flash crowd. The drone-cruiser base station can overcome the communications problem for three types of flash crowds, such as in stadiums, parades, and large plaza so that an appropriate number of aerial base stations can be accurately deployed to meet large and dynamic traffic demands. Artificial intelligence can solve these problems by analyzing the collected data, and then adjust the system parameters in the framework of Self-Organizing Network (SON) to achieve the goals of self-configuration, self-optimization, and self-healing. With the help of AI technologies, 5G networks can become more intelligent. This paper aims to provide a new type of service, On-Demand Aerial Base Station as a Service. This work needs to overcome the following five technical challenges: innovative design of drone-cruisers for the long-time hovering, crowd estimation and prediction, rapid 3D wireless channel learning and modeling, 3D placement of aerial base stations and the integration of WiFi front-haul and millimeter wave/WiGig back-haul networks.

The quality of inverse problem solutions obtained through deep learning [Barbastathis et al, 2019] is limited by the nature of the priors learned from examples presented during the training phase. In the case of quantitative phase retrieval [Sinha et al, 2017, Goy et al, 2019], in particular, spatial frequencies that are underrepresented in the training database, most often at the high band, tend to be suppressed in the reconstruction. Ad hoc solutions have been proposed, such as pre-amplifying the high spatial frequencies in the examples [Li et al, 2018]; however, while that strategy improves resolution, it also leads to high-frequency artifacts as well as low-frequency distortions in the reconstructions. Here, we present a new approach that learns separately how to handle the two frequency bands, low and high; and also learns how to synthesize these two bands into the full-band reconstructions. We show that this "learning to synthesize" (LS) method yields phase reconstructions of high spatial resolution and artifact-free; and it is also resilient to high-noise conditions, e.g. in the case of very low photon flux. In addition to the problem of quantitative phase retrieval, the LS method is applicable, in principle, to any inverse problem where the forward operator treats different frequency bands unevenly, i.e. is ill-posed.

We consider a new setting of online clustering of contextual cascading bandits, an online learning problem where the underlying cluster structure over users is unknown and needs to be learned from a random prefix feedback. More precisely, a learning agent recommends an ordered list of items to a user, who checks the list and stops at the first satisfactory item, if any. We propose an algorithm of CLUB-cascade for this setting and prove an $n$-step regret bound of order $\tilde{O}(\sqrt{n})$. Previous work corresponds to the degenerate case of only one cluster, and our general regret bound in this special case also significantly improves theirs. We conduct experiments on both synthetic and real data, and demonstrate the effectiveness of our algorithm and the advantage of incorporating online clustering method.

Beetle antennae search (BAS) is an efficient meta-heuristic algorithm inspired by foraging behaviors of beetles. This algorithm includes several parameters for tuning and the existing results are limited to solve single objective optimization. This work pushes forward the research on BAS by providing one variant that releases the tuning parameters and is able to handle multi-objective optimization. This new approach applies normalization to simplify the original algorithm and uses a penalty function to exploit infeasible solutions with low constraint violation to solve the constraint optimization problem. Extensive experimental studies are carried out and the results reveal efficacy of the proposed approach to constraint handling.

Meta-heuristic algorithms have become very popular because of powerful performance on the optimization problem. A new algorithm called beetle antennae search algorithm (BAS) is proposed in the paper inspired by the searching behavior of longhorn beetles. The BAS algorithm imitates the function of antennae and the random walking mechanism of beetles in nature, and then two main steps of detecting and searching are implemented. Finally, the algorithm is benchmarked on 2 well-known test functions, in which the numerical results validate the efficacy of the proposed BAS algorithm.

Bio-inspired algorithms have received a significant amount of attention in both academic and engineering societies. In this paper, based on the observation of two major survival rules of a species of woodlice, i.e., porcellio scaber, we design and propose an algorithm called the porcellio scaber algorithm (PSA) for solving optimization problems, including differentiable and non-differential ones as well as the case with local optimums. Numerical results based on benchmark problems are presented to validate the efficacy of PSA.

We propose an efficient Context-Aware clustering of Bandits (CAB) algorithm, which can capture collaborative effects. CAB can be easily deployed in a real-world recommendation system, where multi-armed bandits have been shown to perform well in particular with respect to the cold-start problem. CAB utilizes a context-aware clustering augmented by exploration-exploitation strategies. CAB dynamically clusters the users based on the content universe under consideration. We give a theoretical analysis in the standard stochastic multi-armed bandits setting. We show the efficiency of our approach on production and real-world datasets, demonstrate the scalability, and, more importantly, the significant increased prediction performance against several state-of-the-art methods.

This paper proposes adaptive fractional order graph neural network (AFGNN), optimized by a time-varying fractional order gradient descent method to address the challenges of local optimum of classic and fractional GNNs which are specialised at aggregating information from the feature and adjacent matrices of connected nodes and their neighbours to solve learning tasks on non-Euclidean data such as graphs. To overcome the high computational complexity of fractional order derivations, the proposed model approximately calculates the fractional order gradients. We further prove such approximation is feasible and the AFGNN is unbiased. Extensive experiments on benchmark citation networks and object recognition challenges confirm the performance of AFGNN. The first group of experiments show that the results of AFGNN outperform the steepest gradient based method and conventional GNNs on the citation networks. The second group of experiments demonstrate that AFGNN excels at image recognition tasks where the images have a significant amount of missing pixels and expresses improved accuracy than GNNs.

The problem of multi-armed bandits (MAB) asks to make sequential decisions while balancing between exploitation and exploration, and have been successfully applied to a wide range of practical scenarios. Various algorithms have been designed to achieve a high reward in a long term. However, its short-term performance might be rather low, which is injurious in risk sensitive applications. Building on previous work of conservative bandits, we bring up a framework of contextual combinatorial conservative bandits. An algorithm is presented and a regret bound of $\tilde O(d^2+d\sqrt{T})$ is proven, where $d$ is the dimension of the feature vectors, and $T$ is the total number of time steps. We further provide an algorithm as well as regret analysis for the case when the conservative reward is unknown. Experiments are conducted, and the results validate the effectiveness of our algorithm.

The beetle antennae search algorithm was recently proposed and investigated for solving global optimization problems. Although the performance of the algorithm and its variants were shown to be better than some existing meta-heuristic algorithms, there is still a lack of convergence analysis. In this paper, we provide theoretical analysis on the convergence of the beetle antennae search algorithm. We test the performance of the BAS algorithm via some representative benchmark functions. Meanwhile, some applications of the BAS algorithm are also presented.

Deep Neural Network (DNN)-based image reconstruction, despite many successes, often exhibits uneven fidelity between high and low spatial frequency bands. In this paper we propose the Learning Synthesis by DNN (LS-DNN) approach where two DNNs process the low and high spatial frequencies, respectively, and, improving over [30], the two DNNs are trained separately and a third DNN combines them into an image with high fidelity at all bands. We demonstrate LS-DNN in two canonical inverse problems: super-resolution (SR) in diffraction-limited imaging (DLI), and quantitative phase retrieval (QPR). Our results also show comparable or improved performance over perceptual-loss based SR [21], and can be generalized to a wider range of image recovery problems.

In this paper, we extend a bio-inspired algorithm called the porcellio scaber algorithm (PSA) to solve constrained optimization problems, including a constrained mixed discrete-continuous nonlinear optimization problem. Our extensive experiment results based on benchmark optimization problems show that the PSA has a better performance than many existing methods or algorithms. The results indicate that the PSA is a promising algorithm for constrained optimization.

The recent successful deep neural networks are largely trained in a supervised manner. It {\it associates} complex patterns of input samples with neurons in the last layer, which form representations of {\it concepts}. In spite of their successes, the properties of complex patterns associated a learned concept remain elusive. In this work, by analyzing how neurons are associated with concepts in supervised networks, we hypothesize that with proper priors to regulate learning, neural networks can automatically associate neurons in the intermediate layers with concepts that are aligned with real world concepts, when trained only with labels that associate concepts with top level neurons, which is a plausible way for unsupervised learning. We develop a prior to verify the hypothesis and experimentally find the proposed prior help neural networks automatically learn both basic physical concepts at the lower layers, e.g., rotation of filters, and highly semantic concepts at the higher layers, e.g., fine-grained categories of an entry-level category.

We provide two distributed confidence ball algorithms for solving linear bandit problems in peer to peer networks with limited communication capabilities. For the first, we assume that all the peers are solving the same linear bandit problem, and prove that our algorithm achieves the optimal asymptotic regret rate of any centralised algorithm that can instantly communicate information between the peers. For the second, we assume that there are clusters of peers solving the same bandit problem within each cluster, and we prove that our algorithm discovers these clusters, while achieving the optimal asymptotic regret rate within each one. Through experiments on several real-world datasets, we demonstrate the performance of proposed algorithms compared to the state-of-the-art.

Classical collaborative filtering, and content-based filtering methods try to learn a static recommendation model given training data. These approaches are far from ideal in highly dynamic recommendation domains such as news recommendation and computational advertisement, where the set of items and users is very fluid. In this work, we investigate an adaptive clustering technique for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings. Our algorithm takes into account the collaborative effects that arise due to the interaction of the users with the items, by dynamically grouping users based on the items under consideration and, at the same time, grouping items based on the similarity of the clusterings induced over the users. The resulting algorithm thus takes advantage of preference patterns in the data in a way akin to collaborative filtering methods. We provide an empirical analysis on medium-size real-world datasets, showing scalability and increased prediction performance (as measured by click-through rate) over state-of-the-art methods for clustering bandits. We also provide a regret analysis within a standard linear stochastic noise setting.

We investigate an efficient context-dependent clustering technique for recommender systems based on exploration-exploitation strategies through multi-armed bandits over multiple users. Our algorithm dynamically groups users based on their observed behavioral similarity during a sequence of logged activities. In doing so, the algorithm reacts to the currently served user by shaping clusters around him/her but, at the same time, it explores the generation of clusters over users which are not currently engaged. We motivate the effectiveness of this clustering policy, and provide an extensive empirical analysis on real-world datasets, showing scalability and improved prediction performance over state-of-the-art methods for sequential clustering of users in multi-armed bandit scenarios.

We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation ("bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant increase in prediction performance over state-of-the-art methods for bandit problems.