Research papers and code for "representation learning":
A longstanding question in computer vision concerns the representation of 3D shapes for recognition: should 3D shapes be represented with descriptors operating on their native 3D formats, such as voxel grids or polygon meshes, or can they be effectively represented with view-based descriptors? We address this question in the context of learning to recognize 3D shapes from a collection of their rendered views on 2D images. We first present a standard CNN architecture trained to recognize the shapes' rendered views independently of each other, and show that a 3D shape can be recognized even from a single view at an accuracy far higher than using state-of-the-art 3D shape descriptors. Recognition rates further increase when multiple views of the shapes are provided. In addition, we present a novel CNN architecture that combines information from multiple views of a 3D shape into a single, compact shape descriptor offering even better recognition performance. The same architecture can be applied to accurately recognize human hand-drawn sketches of shapes. We conclude that a collection of 2D views can be highly informative for 3D shape recognition and is amenable to emerging CNN architectures and their derivatives.
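
To make the aggregation step concrete, here is a minimal PyTorch sketch of the multi-view idea: a shared CNN encodes each rendered view, and an element-wise max over views pools the per-view features into one compact descriptor. The tiny backbone, 12 views, and 40 classes are placeholders echoing the ModelNet40 setup, not the architecture used in the paper.

```python
# Minimal sketch of multi-view feature aggregation ("view pooling").
# The small backbone below is illustrative only, not the paper's CNN.
import torch
import torch.nn as nn

class MultiViewDescriptor(nn.Module):
    def __init__(self, feat_dim=128, num_classes=40):
        super().__init__()
        # Shared per-view feature extractor (stand-in for a deep CNN).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, views):                  # views: (batch, n_views, 3, H, W)
        b, v, c, h, w = views.shape
        feats = self.backbone(views.reshape(b * v, c, h, w)).reshape(b, v, -1)
        descriptor = feats.max(dim=1).values   # element-wise max over views
        return self.classifier(descriptor), descriptor

logits, desc = MultiViewDescriptor()(torch.randn(2, 12, 3, 64, 64))
```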

* v1: Initial version. v2: An updated ModelNet40 training/test split is used; results with low-rank Mahalanobis metric learning are added. v3 (ICCV 2015): A second camera setup without the upright orientation assumption is added; some accuracy and mAP numbers change slightly because a small issue in mesh rendering related to specularities was fixed.
Click to Read Paper and Get Code
For roughly a century, many representation learning approaches have been proposed to learn the intrinsic structure of data, both linear and nonlinear, supervised and unsupervised. In recent years, deep architectures in particular have been widely applied to representation learning and have delivered top results in many tasks, such as image classification, object detection, and speech recognition. In this paper, we review the development of data representation learning methods. Specifically, we investigate both traditional feature learning algorithms and state-of-the-art deep learning models. We introduce the history of data representation learning and provide pointers to available resources (e.g., online courses, tutorials, and books) and toolboxes. Finally, we conclude the paper with remarks and some interesting research directions on data representation learning.

* About 20 pages. Submitted to the Journal of Finance and Data Science as an invited paper.
Click to Read Paper and Get Code
Inspired by the success of deep learning in Computer Vision and Natural Language Processing, this learning paradigm has also found its way into the field of Music Information Retrieval. In order to benefit from deep learning in an effective but also efficient manner, deep transfer learning has become a common approach: the output of a pre-trained neural network is reused as the basis for a new learning task. The underlying hypothesis is that if the initial and new learning tasks show commonalities and are applied to the same type of input data (e.g. music audio), the generated deep representation of the data is also informative for the new task. Since, however, most networks used to generate deep representations are trained on a single initial learning source, the validity of this hypothesis is questionable for an arbitrary future task. In this paper, we present the results of our investigation into what the most important factors are for generating deep representations for data and learning tasks in the music domain. We conducted this investigation via an extensive empirical study involving multiple learning sources, as well as multiple deep learning architectures with varying levels of information sharing between sources. We then validate the learned representations on multiple target datasets. The results of our experiments yield several insights into how to approach the design of methods for learning widely deployable deep data representations in the music domain.
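
The transfer recipe described above reduces to a small amount of code: freeze the pre-trained network, treat its output as fixed features, and train only a lightweight model for the new task. In this sketch, `pretrained_net` is a placeholder for any network mapping inputs to 512-dimensional features, and the 10 target classes are assumed for illustration.

```python
# Sketch of deep transfer learning: reuse a frozen pre-trained network's
# output as features for a new (e.g. music) task.
import torch
import torch.nn as nn

def extract_features(pretrained_net, x):
    pretrained_net.eval()
    with torch.no_grad():              # the deep representation stays fixed
        return pretrained_net(x)

# Lightweight model for the new task, on top of the frozen representation.
head = nn.Linear(512, 10)              # 512-d features, 10 classes: assumed
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

def train_step(pretrained_net, x, y):
    feats = extract_features(pretrained_net, x)
    loss = nn.functional.cross_entropy(head(feats), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```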

* This work has been submitted to "Neural Computing and Applications: Special Issue on Deep Learning for Music and Audio" and is currently under review
Click to Read Paper and Get Code
It is widely believed that learning good representations is one of the main reasons for the success of deep neural networks. Although highly intuitive, there is a lack of theory and systematic approaches for quantitatively characterizing what representations deep neural networks learn. In this work, we move a tiny step towards a theory and better understanding of the representations. Specifically, we study a simpler problem: how similar are the representations learned by two networks with identical architecture but trained from different initializations? We develop a rigorous theory based on the neuron activation subspace match model. The theory gives a complete characterization of the structure of neuron activation subspace matches, where the core concepts are maximum match and simple match, which describe the overall and the finest similarity between sets of neurons in two networks, respectively. We also propose efficient algorithms to find the maximum match and simple matches. Finally, we conduct extensive experiments using our algorithms. Experimental results suggest that, surprisingly, representations learned by the same convolutional layers of networks trained from different initializations are not as similar as prevalently expected, at least in terms of subspace match.
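
The paper's maximum/simple match algorithms are more involved, but the flavor of comparing activation subspaces can be illustrated with principal angles between the spans of two networks' activation matrices on the same inputs; this proxy is a simplification, not the authors' exact procedure.

```python
# Illustrative proxy for comparing activation subspaces of two networks:
# principal angles between the spans of their activation matrices.
import numpy as np

def subspace_similarity(acts_a, acts_b):
    """acts_*: (n_samples, n_neurons) activations on identical inputs."""
    qa, _ = np.linalg.qr(acts_a - acts_a.mean(0))  # orthonormal basis of span
    qb, _ = np.linalg.qr(acts_b - acts_b.mean(0))
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return cosines  # values near 1 => the subspaces nearly coincide

sim = subspace_similarity(np.random.randn(1000, 64), np.random.randn(1000, 64))
```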

* 17 pages, 6 figures
Click to Read Paper and Get Code
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.

Click to Read Paper and Get Code
Deep learning owes its success to three key factors: the scale of data, enhanced models that learn representations from data, and the scale of computation. This book chapter presents the importance of the data-driven approach to learning good representations from both big data and small data. In terms of big data, it is widely accepted in the research community that more data leads to better representations and better classification. The question is then how to learn representations from big data, and how to perform representation learning when data is scarce. We address the first question by presenting CNN model enhancements in the aspects of representation, optimization, and generalization. To address the small-data challenge, we show transfer representation learning to be effective: it transfers the representation learned in a source domain, where abundant training data is available, to a target domain where training data is scarce. Transfer representation learning gave the OM (otitis media) and melanoma diagnosis modules of our XPRIZE Tricorder device (which finished 2nd out of 310 competing teams) a significant boost in diagnosis accuracy.
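
A minimal sketch of the transfer step, under assumed layer sizes and a hypothetical two-class target task: copy the source-domain network, freeze the generic early layers, and fine-tune only the top layer together with a new target-domain head.

```python
# Sketch of transfer representation learning for a small-data target domain.
import torch
import torch.nn as nn

# Stand-in for a network trained on the abundant source domain.
source_net = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(),   # early layers: generic features
    nn.Linear(128, 64), nn.ReLU(),    # top layer: more domain-specific
)
for p in source_net.parameters():
    p.requires_grad = False           # keep the transferred representation
for p in source_net[2].parameters():
    p.requires_grad = True            # but allow the top layer to adapt

# New head for the scarce target domain (hypothetical two-class diagnosis).
model = nn.Sequential(source_net, nn.Linear(64, 2))
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```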

* Book chapter
Click to Read Paper and Get Code
A variety of representation learning approaches have been investigated for reinforcement learning; much less attention, however, has been given to investigating the utility of sparse coding. Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that result in discriminative representations. In this work, we develop a supervised sparse coding objective for policy evaluation. Despite the non-convexity of this objective, we prove that all local minima are global minima, making the approach amenable to simple optimization strategies. We empirically show that it is key to use a supervised objective, rather than the more straightforward unsupervised sparse coding approach. We compare the learned representations to a canonical fixed sparse representation, called tile coding, demonstrating that the sparse coding representation outperforms a wide variety of tile-coding representations.
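
For reference, the fixed baseline mentioned above is easy to reproduce: tile coding builds several slightly offset uniform grids and activates exactly one feature per grid, yielding a sparse binary representation. A minimal single-dimension version follows.

```python
# Minimal tile coding: overlapping uniform grids ("tilings"), each
# contributing one active feature per input, give a sparse binary vector.
import numpy as np

def tile_code(x, n_tilings=8, n_tiles=10, low=0.0, high=1.0):
    """x: scalar in [low, high]. Returns a sparse binary feature vector."""
    feats = np.zeros(n_tilings * n_tiles)
    width = (high - low) / n_tiles
    for t in range(n_tilings):
        offset = t * width / n_tilings       # each tiling is shifted slightly
        idx = int(np.clip((x - low + offset) / width, 0, n_tiles - 1))
        feats[t * n_tiles + idx] = 1.0       # one active tile per tiling
    return feats

phi = tile_code(0.37)   # 8 active features out of 80
```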

* 6(+1) pages, 2 figures, International Joint Conference on Artificial Intelligence 2017
Click to Read Paper and Get Code
We study the problem of representation learning in goal-conditioned hierarchical reinforcement learning. In such hierarchical structures, a higher-level controller solves tasks by iteratively communicating goals which a lower-level policy is trained to reach. Accordingly, the choice of representation -- the mapping of observation space to goal space -- is crucial. To study this problem, we develop a notion of sub-optimality of a representation, defined in terms of expected reward of the optimal hierarchical policy using this representation. We derive expressions which bound the sub-optimality and show how these expressions can be translated to representation learning objectives which may be optimized in practice. Results on a number of difficult continuous-control tasks show that our approach to representation learning yields qualitatively better representations as well as quantitatively better hierarchical policies, compared to existing methods (see videos at https://sites.google.com/view/representation-hrl).
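
A schematic of the control loop described above may help: the higher-level controller emits a goal in the learned representation space every k steps, and the lower-level policy is rewarded for approaching it. All components here (`env`, `f`, and both policies) are placeholders, not the paper's implementation.

```python
# Schematic of a goal-conditioned hierarchy; f maps observations to goal space.
import numpy as np

def rollout(env, f, high_policy, low_policy, horizon=1000, k=10):
    obs = env.reset()
    total_reward, low_rewards = 0.0, []
    for t in range(horizon):
        if t % k == 0:
            goal = high_policy(f(obs))       # high level emits a new goal
        action = low_policy(f(obs), goal)    # low level acts toward the goal
        obs, reward, done = env.step(action)
        # Intrinsic low-level reward: proximity to the goal, measured in the
        # learned representation space f.
        low_rewards.append(-np.linalg.norm(f(obs) - goal))
        total_reward += reward
        if done:
            break
    return total_reward, low_rewards
```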

Click to Read Paper and Get Code
Representation learning algorithms are designed to learn abstract features that characterize data. State representation learning (SRL) focuses on a particular kind of representation learning in which the learned features are low-dimensional, evolve through time, and are influenced by the actions of an agent. The representation is learned to capture the variation in the environment generated by the agent's actions; this kind of representation is particularly suitable for robotics and control scenarios. In particular, the low dimensionality of the representation helps to overcome the curse of dimensionality, provides easier interpretation and utilization by humans, and can improve performance and speed in policy learning algorithms such as reinforcement learning. This survey covers the state of the art in state representation learning from recent years. It reviews different SRL methods that involve interaction with the environment, their implementations, and their applications in robotics control tasks (simulated or real). In particular, it highlights how generic learning objectives are differently exploited in the reviewed algorithms. Finally, it discusses evaluation methods for assessing the learned representation and summarizes current and future lines of research.

Click to Read Paper and Get Code
Representation learning is a central challenge across a range of machine learning areas. In reinforcement learning, effective and functional representations have the potential to tremendously accelerate learning progress and solve more challenging problems. Most prior work on representation learning has focused on generative approaches, learning representations that capture all underlying factors of variation in the observation space in a more disentangled or well-ordered manner. In this paper, we instead aim to learn functionally salient representations: representations that are not necessarily complete in terms of capturing all factors of variation in the observation space, but rather aim to capture those factors of variation that are important for decision making -- that are "actionable." These representations are aware of the dynamics of the environment, and capture only the elements of the observation that are necessary for decision making rather than all factors of variation, without explicit reconstruction of the observation. We show how these representations can be useful to improve exploration for sparse reward problems, to enable long horizon hierarchical reinforcement learning, and as a state representation for learning policies for downstream tasks. We evaluate our method on a number of simulated environments, and compare it to prior methods for representation learning, exploration, and hierarchical reinforcement learning.
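
As a loose illustration only: one generic way to obtain dynamics-aware features without reconstruction is an inverse-dynamics loss, training the encoder to predict the action from consecutive encoded observations. This is a stand-in for the idea, not the paper's actual objective, and all sizes are assumed.

```python
# Generic stand-in: dynamics-aware features without pixel reconstruction,
# via an inverse-dynamics (action-prediction) loss.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 8))
inverse_model = nn.Linear(16, 4)   # predicts which of 4 actions was taken
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(inverse_model.parameters()), lr=1e-3)

def inverse_dynamics_step(obs, next_obs, action):
    """obs, next_obs: (batch, 64) observations; action: (batch,) action ids."""
    z = torch.cat([encoder(obs), encoder(next_obs)], dim=-1)
    loss = nn.functional.cross_entropy(inverse_model(z), action)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

loss = inverse_dynamics_step(torch.randn(32, 64), torch.randn(32, 64),
                             torch.randint(0, 4, (32,)))
```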

Click to Read Paper and Get Code
Advances in neural network based classifiers have transformed automatic feature learning from a pipe dream of stronger AI to a routine and expected property of practical systems. Since the emergence of AlexNet, every winning submission of the ImageNet challenge has employed end-to-end representation learning, and due to the utility of good representations for transfer learning, representation learning has become an important task in its own right, distinct from supervised learning. At present, this distinction is inconsequential, as supervised methods are state-of-the-art in learning transferable representations. But recent work has shown that generative models can also be powerful agents of representation learning. Will the representations learned from these generative methods ever rival the quality of those from their supervised competitors? In this work, we argue in the affirmative: from an information-theoretic perspective, generative models have greater potential for representation learning. Based on several experimentally validated assumptions, we show that supervised learning is upper bounded in its capacity for representation learning in ways that certain generative models, such as Generative Adversarial Networks (GANs), are not. We hope that our analysis will provide a rigorous motivation for further exploration of generative representation learning.

* Submitted to ICLR 2017 Workshop Track
Click to Read Paper and Get Code
In this study, we propose the integration of competitive learning into convolutional neural networks (CNNs) to improve representation learning and the efficiency of fine-tuning. Conventional CNNs are trained with backpropagation, which enables powerful representation learning through a discrimination task. However, it requires a huge amount of labeled data, and labeled data are much harder to acquire than unlabeled data. Thus, efficient use of unlabeled data is becoming crucial for DNNs. To address this problem, we introduce unsupervised competitive learning into the convolutional layer and utilize unlabeled data for effective representation learning. Validation experiments on a toy model demonstrated that this representation learning effectively extracted image bases into the convolutional filters using unlabeled data, and accelerated the subsequent supervised fine-tuning with backpropagation. The benefit was more apparent when the number of filters was sufficiently large; in that case, the error rate dropped steeply in the initial phase of fine-tuning. The proposed method thus allows CNNs to use more filters, enabling a more detailed and more general representation, and points toward neural networks that are not only deep but also broad.
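
The unsupervised update at the heart of the method can be sketched in a few lines: for each (unlabeled) image patch, the most similar filter wins and is nudged toward that patch. Patch size, filter count, and normalization below are illustrative assumptions.

```python
# Minimal sketch of unsupervised competitive (winner-take-all) learning of
# convolutional filters from unlabeled image patches.
import numpy as np

rng = np.random.default_rng(0)
filters = rng.standard_normal((32, 5 * 5))        # 32 filters on 5x5 patches
filters /= np.linalg.norm(filters, axis=1, keepdims=True)

def competitive_update(patch, lr=0.05):
    patch = patch / (np.linalg.norm(patch) + 1e-8)
    winner = np.argmax(filters @ patch)           # best-matching filter wins
    filters[winner] += lr * (patch - filters[winner])
    filters[winner] /= np.linalg.norm(filters[winner])

for _ in range(10000):                            # unlabeled patches only
    competitive_update(rng.standard_normal(25))
```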

* Appeared at NIPS 2017 Workshop: Deep Learning: Bridging Theory and Practice
Click to Read Paper and Get Code
There is general consensus that learning representations is useful for a variety of reasons, e.g. efficient use of labeled data (semi-supervised learning), transfer learning, and understanding the hidden structure of data. Popular techniques for representation learning include clustering, manifold learning, kernel learning, autoencoders, Boltzmann machines, etc. To study the relative merits of these techniques, it is essential to formalize the definition and goals of representation learning, so that they all become instances of the same definition. This paper introduces such a formal framework, which also formalizes the utility of learning the representation. It is related to previous Bayesian notions, but with some new twists. We show the usefulness of our framework by exhibiting simple and natural settings -- linear mixture models and loglinear models -- where the power of representation learning can be formally shown. In these examples, representation learning can be performed provably and efficiently under plausible assumptions (despite being NP-hard), and furthermore: (i) it greatly reduces the need for labeled data (semi-supervised learning); (ii) it allows solving classification tasks when simpler approaches like nearest neighbors require too much data; and (iii) it is more powerful than manifold learning methods.

* 22 pages
Click to Read Paper and Get Code
Scaling end-to-end reinforcement learning to control real robots from vision presents a series of challenges, in particular in terms of sample efficiency. In contrast to end-to-end learning, state representation learning can help learn a compact, efficient, and relevant representation of states that speeds up policy learning, reducing the number of samples needed, and that is easier to interpret. We evaluate several state representation learning methods on goal-based robotics tasks and propose a new unsupervised model that stacks representations and combines the strengths of several of these approaches. This method encodes all the relevant features, performs on par with or better than end-to-end learning, and is robust to hyperparameter changes.

* GitHub repo: https://github.com/araffin/srl-zoo; documentation: https://srl-zoo.readthedocs.io/en/latest/; part of SRL-Toolbox: https://s-rl-toolbox.readthedocs.io/en/latest/
Click to Read Paper and Get Code
A lot of the recent success in natural language processing (NLP) has been driven by distributed vector representations of words trained on large amounts of text in an unsupervised manner. These representations are typically used as general purpose features for words across a range of NLP problems. However, extending this success to learning representations of sequences of words, such as sentences, remains an open problem. Recent work has explored unsupervised as well as supervised learning techniques with different training objectives to learn general purpose fixed-length sentence representations. In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model. We train this model on several data sources with multiple training objectives on over 100 million sentences. Extensive experiments demonstrate that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods. We present substantial improvements in the context of transfer learning and low-resource settings using our learned general-purpose representations.
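
Structurally, the framework amounts to one shared sentence encoder feeding several task-specific heads. The sketch below shows that wiring with a small GRU encoder and made-up task names and sizes; the paper's model is a much larger recurrent network trained on over 100 million sentences.

```python
# Sketch of multi-task sentence representation learning: one shared encoder,
# several task-specific heads. Sizes and task names are placeholders.
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, vocab=10000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)  # shared across tasks
        self.heads = nn.ModuleDict({
            "nli": nn.Linear(dim, 3),      # e.g. entailment labels (assumed)
            "parse": nn.Linear(dim, 50),   # placeholder output size
        })

    def forward(self, tokens, task):
        _, h = self.encoder(self.embed(tokens))
        sentence_rep = h[-1]               # fixed-length sentence representation
        return self.heads[task](sentence_rep), sentence_rep

model = SharedEncoderMTL()
logits, rep = model(torch.randint(0, 10000, (4, 20)), task="nli")
```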

* Accepted at ICLR 2018
Click to Read Paper and Get Code
We investigate a paradigm in multi-task reinforcement learning (MT-RL) in which an agent is placed in an environment and needs to learn to perform a series of tasks within this space. Since the environment does not change, there is potentially a lot of common ground amongst tasks, and learning to solve them individually seems extremely wasteful. In this paper, we explicitly model and learn this shared structure as it arises in the state-action value space. We show how one can jointly learn optimal value functions by modifying the popular Value-Iteration and Policy-Iteration procedures to accommodate this shared representation assumption and to leverage the power of multi-task supervised learning. Finally, we demonstrate that the proposed model and training procedures are able to infer good value functions even in low-sample regimes. In addition to data efficiency, our analysis shows that learning abstractions of the state space jointly across tasks leads to more robust, transferable representations with the potential for better generalization.
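
One way to visualize the shared-structure assumption, though not the paper's training procedure: stack the tasks' value functions into a matrix and factorize it, so each task's values become a task-specific combination of shared state features. The rank and the random placeholder values below are assumptions.

```python
# Schematic of shared structure in value space: a low-rank factorization of
# per-task value functions yields a shared state representation.
import numpy as np

V = np.random.randn(100, 6)             # placeholder: 100 states x 6 tasks
U, s, Wt = np.linalg.svd(V, full_matrices=False)
k = 3                                   # assumed rank of the shared structure
state_features = U[:, :k] * s[:k]       # shared low-dimensional state features
task_weights = Wt[:k]                   # per-task combination weights
V_hat = state_features @ task_weights   # jointly reconstructed value functions
```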

Click to Read Paper and Get Code
Entity images could provide significant visual information for knowledge representation learning. Most conventional methods learn knowledge representations merely from structured triples, ignoring rich visual information extracted from entity images. In this paper, we propose a novel Image-embodied Knowledge Representation Learning model (IKRL), where knowledge representations are learned with both triple facts and images. More specifically, we first construct representations for all images of an entity with a neural image encoder. These image representations are then integrated into an aggregated image-based representation via an attention-based method. We evaluate our IKRL models on knowledge graph completion and triple classification. Experimental results demonstrate that our models outperform all baselines on both tasks, which indicates the significance of visual information for knowledge representations and the capability of our models in learning knowledge representations with images.
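
A simplified reading of the aggregation step: score each image representation by its compatibility with the entity's structure-based embedding and take the attention-weighted sum. Dimensions are placeholders.

```python
# Sketch of attention-based aggregation over an entity's image embeddings.
import torch

def aggregate_images(img_embs, entity_emb):
    """img_embs: (n_images, d); entity_emb: (d,)"""
    scores = img_embs @ entity_emb       # compatibility of each image
    attn = torch.softmax(scores, dim=0)  # attention weights over the images
    return attn @ img_embs               # aggregated image-based representation

agg = aggregate_images(torch.randn(5, 50), torch.randn(50))
```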

* 7 pages; Accepted by IJCAI-2017
Click to Read Paper and Get Code
Recently, multi-view representation learning has become a rapidly growing direction in machine learning and data mining. This paper introduces two categories of multi-view representation learning: multi-view representation alignment and multi-view representation fusion. Accordingly, we first review the representative methods and theories of multi-view representation learning from the perspective of alignment, such as correlation-based alignment; representative examples are canonical correlation analysis (CCA) and its several extensions. Then, from the perspective of representation fusion, we investigate the advancement of multi-view representation learning, ranging from generative methods (multi-modal topic learning, multi-view sparse coding, and multi-view latent space Markov networks) to neural network-based methods (multi-modal autoencoders, multi-view convolutional neural networks, and multi-modal recurrent neural networks). Further, we investigate several important applications of multi-view representation learning. Overall, this survey aims to provide an insightful overview of the theoretical foundations and state-of-the-art developments in the field of multi-view representation learning and to help researchers find the most appropriate tools for particular applications.
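
Since CCA is the prototypical alignment-based method cited above, a small runnable example may be useful: two views generated from shared latent factors are projected into maximally correlated subspaces using scikit-learn's CCA.

```python
# Canonical correlation analysis: project two views so that the projections
# are maximally correlated. Synthetic data with shared latent factors.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
z = rng.standard_normal((500, 2))                 # shared latent factors
view1 = z @ rng.standard_normal((2, 10)) + 0.1 * rng.standard_normal((500, 10))
view2 = z @ rng.standard_normal((2, 8)) + 0.1 * rng.standard_normal((500, 8))

cca = CCA(n_components=2)
p1, p2 = cca.fit_transform(view1, view2)          # aligned representations
corr = [np.corrcoef(p1[:, i], p2[:, i])[0, 1] for i in range(2)]  # near 1
```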

* Accepted by IEEE Transactions on Knowledge and Data Engineering
Click to Read Paper and Get Code
Recent successes in training deep neural networks have prompted active investigation into the features learned on their intermediate layers. Such research is difficult because it requires making sense of non-linear computations performed by millions of parameters, but valuable because it increases our ability to understand current models and create improved versions of them. In this paper we investigate the extent to which neural networks exhibit what we call convergent learning, which occurs when the representations learned by multiple nets converge to a set of features which are either individually similar between networks or where subsets of features span similar low-dimensional spaces. We propose a specific method of probing representations: training multiple networks and then comparing and contrasting their individual, learned representations at the level of neurons or groups of neurons. We begin research into this question using three techniques to approximately align different neural networks on a feature level: a bipartite matching approach that makes one-to-one assignments between neurons, a sparse prediction approach that finds one-to-many mappings, and a spectral clustering approach that finds many-to-many mappings. This initial investigation reveals a few previously unknown properties of neural networks, and we argue that future research into the question of convergent learning will yield many more. The insights described here include (1) that some features are learned reliably in multiple networks, yet other features are not consistently learned; (2) that units learn to span low-dimensional subspaces and, while these subspaces are common to multiple networks, the specific basis vectors learned are not; (3) that the representation codes show evidence of being a mix between a local code and slightly, but not fully, distributed codes across multiple units.
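
The first of the three probes, bipartite matching, is straightforward to sketch: correlate the units of two networks on identical inputs and solve a one-to-one assignment that maximizes total correlation. This is a sketch of the general technique, not the authors' exact code.

```python
# Sketch of the bipartite-matching probe: one-to-one assignment of units
# across two networks, maximizing unit-by-unit correlation.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_units(acts_a, acts_b):
    """acts_*: (n_samples, n_units) activations on identical inputs."""
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = a.T @ b / len(a)                      # unit-by-unit correlations
    rows, cols = linear_sum_assignment(-corr)    # maximize total correlation
    return cols, corr[rows, cols]                # partner unit and its correlation

pairing, strengths = match_units(np.random.randn(1000, 64),
                                 np.random.randn(1000, 64))
```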

* Published as a conference paper at ICLR 2016
Click to Read Paper and Get Code