Multi-hop inference is necessary for machine learning systems to successfully solve tasks such as Recognising Textual Entailment and Machine Reading. In this work, we demonstrate the effectiveness of adaptive computation for learning the number of inference steps required for examples of different complexity, and we show that learning the correct number of inference steps is itself difficult. We introduce the first model to incorporate Adaptive Computation Time for this purpose; it provides a small performance benefit over a similar model without an adaptive component, while also enabling considerable insight into the model's reasoning process.
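
As background, the sketch below illustrates the halting mechanism of Adaptive Computation Time in plain Python/NumPy: inference steps run until the accumulated halting probability crosses a threshold, and the intermediate states are mixed by their halting weights. The function names (`step_fn`, `act_halting`), the step cap, and the NumPy formulation are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def act_halting(step_fn, state, max_steps=10, eps=0.01):
    """ACT-style halting over repeated inference steps (illustrative).

    step_fn(state) -> (new_state, halt_logit) is an assumed interface.
    Steps are executed until the cumulative halting probability exceeds
    1 - eps; the intermediate states are combined with their halting
    weights, and the remainder weight goes to the final state.
    """
    cumulative = 0.0
    weighted_state = np.zeros_like(state, dtype=float)
    steps = 0
    for _ in range(max_steps):
        state, halt_logit = step_fn(state)
        p = 1.0 / (1.0 + np.exp(-halt_logit))   # halting probability
        steps += 1
        if cumulative + p >= 1.0 - eps:
            weighted_state += (1.0 - cumulative) * state  # remainder weight
            break
        weighted_state += p * state
        cumulative += p
    return weighted_state, steps
```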

* Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Transportation and traffic are currently undergoing a rapid increase in both scale and complexity. At the same time, an increasing share of traffic participants are being transformed into agents driven or supported by artificial intelligence, resulting in mixed-intelligence traffic. This work explores the implications of distributed decision-making in mixed-intelligence traffic. The investigations are carried out on the basis of an online-simulated highway scenario, namely the MIT DeepTraffic simulation. In the first step, traffic agents are trained by means of a deep reinforcement learning approach deployed inside an elitist evolutionary algorithm for hyperparameter search. The resulting architectures and training parameters are then used either to train a single autonomous traffic agent and transfer the learned weights onto a multi-agent scenario, or to conduct multi-agent learning directly. Both learning strategies are evaluated on different ratios of mixed-intelligence traffic and assessed according to the average speed of all agents driven by artificial intelligence. Traffic patterns that provoke a reduction in traffic flow are analyzed with respect to the different strategies.
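
As a rough illustration of the hyperparameter search described above, the sketch below shows a generic elitist evolutionary loop over a discrete hyperparameter space, assuming a black-box `evaluate` function (e.g. the average agent speed returned by the simulation). The population size, elite count, and mutation scheme are placeholders, not the authors' actual configuration.

```python
import random

def elitist_search(evaluate, space, pop_size=10, elite=2, generations=20):
    """Minimal elitist evolutionary hyperparameter search (illustrative).

    evaluate(params) -> fitness; space maps each hyperparameter name to
    a list of candidate values.
    """
    def sample():
        return {k: random.choice(v) for k, v in space.items()}

    def mutate(params):
        child = dict(params)
        key = random.choice(list(space))
        child[key] = random.choice(space[key])   # resample one gene
        return child

    population = [sample() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        elites = scored[:elite]                  # elitism: keep the best unchanged
        population = elites + [mutate(random.choice(elites))
                               for _ in range(pop_size - elite)]
    return max(population, key=evaluate)
```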

* CEUR Workshop Proceedings 2018
* Proc. of the 10th International Workshop on Agents in Traffic and Transportation (ATT 2018), co-located with ECAI/IJCAI, AAMAS and ICML 2018 conferences (FAIM 2018)
Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self-attention) influences both end task accuracy and qualitative properties of the learned representations. We show there is a tradeoff between speed and accuracy, but all architectures learn high-quality contextual representations that outperform word embeddings on four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth, from exclusively morphological at the word embedding layer, through local syntax in the lower contextual layers, to longer-range semantics such as coreference at the upper layers. Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.

* EMNLP 2018
Ontology alignment is the task of identifying semantically equivalent entities from two given ontologies. Different ontologies have different representations of the same entity, resulting in a need to de-duplicate entities when merging ontologies. We propose a method for enriching entities in an ontology with external definition and context information, and we use this additional information for ontology alignment. We develop a neural architecture capable of encoding the additional information when it is available, and show that the addition of external data results in an F1-score of 0.69 on the Ontology Alignment Evaluation Initiative (OAEI) largebio SNOMED-NCI subtask, comparable with the entity-level matchers in a state-of-the-art system.

* ACL 2018 BioNLP workshop
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
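
The abstract describes the word vectors as learned functions of the biLM's internal states; a common concrete form is a softmax-weighted combination of the layer activations with a learned global scale. The PyTorch-style sketch below (class and parameter names are our own) illustrates that idea.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Learned weighted combination of biLM layer activations.

    One concrete 'learned function of internal states': softmax-normalised
    per-layer weights plus a global scale gamma.
    """
    def __init__(self, num_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_activations):
        # layer_activations: list of tensors, each (batch, seq_len, dim)
        normed = torch.softmax(self.weights, dim=0)
        mixed = sum(w * h for w, h in zip(normed, layer_activations))
        return self.gamma * mixed
```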

* NAACL 2018. Originally posted to openreview 27 Oct 2017. v2 updated for NAACL camera ready
This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding. AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily. It is built on top of PyTorch, allowing for dynamic computation graphs, and provides (1) a flexible data API that handles intelligent batching and padding, (2) high-level abstractions for common operations in working with text, and (3) a modular and extensible experiment framework that makes doing good science easy. It also includes reference implementations of high quality approaches for both core semantic problems (e.g. semantic role labeling (Palmer et al., 2005)) and language understanding applications (e.g. machine comprehension (Rajpurkar et al., 2016)). AllenNLP is an ongoing open-source effort maintained by engineers and researchers at the Allen Institute for Artificial Intelligence.

* Describes the initial version of AllenNLP. Many features and models have been added since the first release. This is the paper to cite if you use AllenNLP in your research. Updated 5/31/2018 with the version accepted to the NLP OSS workshop held at ACL 2018
Neuromorphic computing has come to refer to a variety of brain-inspired computers, devices, and models that contrast with the pervasive von Neumann computer architecture. This biologically inspired approach has created highly connected synthetic neurons and synapses that can be used to model neuroscience theories as well as solve challenging machine learning problems. The promise of the technology is to create a brain-like ability to learn and adapt, but the technical challenges are significant: they range from building an accurate neuroscience model of how the brain works, to finding materials and engineering breakthroughs to build devices that support these models, to creating a programming framework so the systems can learn, to creating applications with brain-like capabilities. In this work, we provide a comprehensive survey of the research and motivations for neuromorphic computing over its history. We begin with a 35-year review of the motivations and drivers of neuromorphic computing, then look at the major research areas of the field, which we define as neuro-inspired models, algorithms and learning approaches, hardware and devices, supporting systems, and finally applications. We conclude with a broad discussion of the major research topics that need to be addressed in the coming years for the promise of neuromorphic computing to be fulfilled. The goals of this work are to provide an exhaustive review of the research conducted in neuromorphic computing since the inception of the term, and to motivate further work by illuminating gaps in the field where new research is needed.

Convolutional neural networks (CNNs) are limited in their ability to handle geometric information due to their fixed grid kernel structure. The availability of depth data enables progress in RGB-D semantic segmentation with CNNs. State-of-the-art methods either use depth as additional images or process spatial information in 3D volumes or point clouds, and these approaches suffer from high computation and memory cost. To address these issues, we present Depth-aware CNN by introducing two intuitive, flexible and effective operations: depth-aware convolution and depth-aware average pooling. By leveraging depth similarity between pixels during information propagation, geometry is seamlessly incorporated into the CNN. Without introducing any additional parameters, both operators can be easily integrated into existing CNNs. Extensive experiments and ablation studies on challenging RGB-D semantic segmentation benchmarks validate the effectiveness and flexibility of our approach.
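
A naive, single-channel NumPy sketch of the depth-aware convolution idea: each neighbour's contribution to the usual weighted sum is additionally scaled by a similarity term that decays with its depth difference to the centre pixel. The exponential similarity and the `alpha` value are assumptions for illustration, not necessarily the paper's exact operator.

```python
import numpy as np

def depth_aware_conv2d(x, depth, weight, alpha=1.0):
    """Naive single-channel depth-aware convolution (illustrative).

    x, depth : (H, W) feature map and depth map
    weight   : (k, k) convolution kernel
    Neighbours with similar depth to the centre pixel contribute fully;
    geometrically distant neighbours are down-weighted.
    """
    k = weight.shape[0]
    pad = k // 2
    xp = np.pad(x, pad)
    dp = np.pad(depth, pad)
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k]
            dpatch = dp[i:i + k, j:j + k]
            sim = np.exp(-alpha * np.abs(dpatch - depth[i, j]))  # depth similarity
            out[i, j] = np.sum(weight * sim * patch)
    return out
```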

Genetic Programming (GP) has found various applications. Understanding this type of algorithm from a theoretical point of view is a challenging task. The first results on the computational complexity of GP have been obtained for problems with isolated program semantics. With this paper, we push forward the computational complexity analysis of GP on a problem with dependent program semantics. We study the well-known sorting problem in this context and analyze rigorously how GP can deal with different measures of sortedness.
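
Two example sortedness measures of the kind used in this line of analysis, expressed as short Python functions; the names and exact definitions here are illustrative.

```python
def inversions(perm):
    """INV-style measure: number of pairs in the wrong order (0 = sorted)."""
    return sum(1 for i in range(len(perm))
                 for j in range(i + 1, len(perm)) if perm[i] > perm[j])

def correct_positions(perm):
    """HAM-style measure: elements already at their sorted position."""
    return sum(1 for i, v in enumerate(sorted(perm)) if perm[i] == v)
```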

* 12 pages
When a problem instance is perturbed by a small modification, one would hope to find a good solution for the new instance by building on a known good solution for the previous one. Via a rigorous mathematical analysis, we show that evolutionary algorithms, despite usually being robust problem solvers, can have unexpected difficulties in solving such re-optimization problems. When started with a random Hamming neighbor of the optimum, the (1+1) evolutionary algorithm takes $\Omega(n^2)$ time to optimize the LeadingOnes benchmark function, which is the same asymptotic optimization time as when started from a randomly chosen solution. There is hence no significant advantage to re-optimizing a structurally good solution. We then propose a way to overcome such difficulties. As our mathematical analysis reveals, the reason for this undesired behavior is that, during the optimization, structurally good solutions can easily be replaced by structurally worse solutions of equal or better fitness. We propose a simple diversity mechanism that prevents this behavior, thereby reducing the re-optimization time for LeadingOnes to $O(\gamma\delta n)$, where $\gamma$ is the population size used by the diversity mechanism and $\delta \le \gamma$ is the Hamming distance of the new optimum from the previous solution. We show similarly fast re-optimization times for the optimization of linear functions with changing constraints and for the minimum spanning tree problem.
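
For reference, a minimal Python sketch of the LeadingOnes benchmark and the (1+1) evolutionary algorithm analyzed above; for the re-optimization setting, `start` would be a random Hamming neighbor of the all-ones string. The explicit evaluation budget is our own addition.

```python
import random

def leading_ones(x):
    """Number of consecutive 1-bits at the start of the bit string."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def one_plus_one_ea(n, start, budget=10**6):
    """(1+1) EA: flip each bit with prob. 1/n, keep the offspring if not worse."""
    x = list(start)
    for step in range(budget):
        y = [b ^ 1 if random.random() < 1.0 / n else b for b in x]
        if leading_ones(y) >= leading_ones(x):
            x = y
        if leading_ones(x) == n:
            return step + 1          # iterations until the optimum is found
    return budget
```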

Evolutionary algorithms have been widely studied from a theoretical perspective. In particular, the area of runtime analysis has contributed significantly to a theoretical understanding and provided insights into the working behaviour of these algorithms. We study how these insights into evolutionary processes can be used for evolutionary art. We introduce the notion of evolutionary image transition which transfers a given starting image into a target image through an evolutionary process. Combining standard mutation effects known from the optimization of the classical benchmark function OneMax and different variants of random walks, we present ways of performing evolutionary image transition with different artistic effects.
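
A rough sketch of the OneMax-style asymmetric mutation that underlies evolutionary image transition: pixels still showing the starting image are copied from the target with some probability, while pixels already matching the target are left unchanged. The pixel representation and the mutation probability are assumptions for illustration.

```python
import random

def image_transition_step(current, target, p=0.05):
    """One asymmetric mutation step of evolutionary image transition.

    current, target: 2D lists of pixel values of equal size.
    Analogous to flipping 0-bits in OneMax: only pixels that differ
    from the target may change, each with probability p.
    """
    next_image = [row[:] for row in current]
    for i in range(len(current)):
        for j in range(len(current[0])):
            if current[i][j] != target[i][j] and random.random() < p:
                next_image[i][j] = target[i][j]
    return next_image
```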

Information forms the basis for all human behavior, including the ubiquitous decision-making that people constantly perform in their everyday lives. It is thus the mission of researchers to understand how humans process information to reach decisions. In order to facilitate this task, this work proposes a novel method of studying the reception of granular expressions in natural language. The approach utilizes LASSO regularization as a statistical tool to extract decisive words from textual content and draw statistical inferences based on the correspondence between the occurrences of words and an exogenous response variable. Accordingly, the method has immediate implications for social sciences and Information Systems research: anyone can now identify text segments and word choices that are statistically relevant to authors or readers and, based on this knowledge, test hypotheses from behavioral research. We demonstrate the contribution of our method by examining how authors communicate subjective information through narrative materials, which allows us to answer the question of which words to choose when communicating negative information. On the other hand, we show that investors trade not only on facts in financial disclosures but are also distracted by filler words and non-informative language. Practitioners, for example those in investor communications or marketing, can exploit our insights to enhance their writing based on the true perception of word choice.
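
A minimal sketch of the general idea using scikit-learn, assuming a numeric response variable aligned with the texts: fit an L1-regularized linear model on word counts and keep the words whose coefficients survive the regularization. The preprocessing and the regularization strength are placeholders, not the study's actual setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Lasso

def decisive_words(texts, response, alpha=0.01):
    """Return words with non-zero LASSO coefficients (illustrative).

    texts    : list of documents
    response : numeric response value per document (e.g. a reader reaction)
    """
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)          # sparse document-term matrix
    model = Lasso(alpha=alpha)
    model.fit(X, response)
    vocab = vectorizer.get_feature_names_out()   # requires scikit-learn >= 1.0
    return {w: c for w, c in zip(vocab, model.coef_) if c != 0.0}
```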

Point clouds are an efficient data format for 3D data. However, existing 3D segmentation methods for point clouds either do not model local dependencies (e.g. PointNet) or require additional computation (e.g. Kd-Net, PointNet++). This work presents a novel 3D segmentation framework, RSNet (code released at https://github.com/qianguih/RSNet), to efficiently model local structures in point clouds. The key component of RSNet is a lightweight local dependency module: a combination of a novel slice pooling layer, Recurrent Neural Network (RNN) layers, and a slice unpooling layer. The slice pooling layer is designed to project the features of unordered points onto an ordered sequence of feature vectors so that traditional end-to-end learning algorithms (RNNs) can be applied. The performance of RSNet is validated by comprehensive experiments on the S3DIS, ScanNet, and ShapeNet datasets. In its simplest form, RSNet surpasses all previous state-of-the-art methods on these benchmarks, and comparisons against previous state-of-the-art methods demonstrate its efficiency.
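
A simplified NumPy sketch of the slice pooling idea: point features are max-pooled within slices along one axis, producing an ordered sequence of slice features that an RNN can consume (and that a slice unpooling layer could later redistribute back to points). The slicing axis, number of slices, and empty-slice handling are our own assumptions.

```python
import numpy as np

def slice_pooling(points, features, num_slices=32, axis=2):
    """Project unordered point features onto an ordered slice sequence.

    points   : (N, 3) point coordinates
    features : (N, D) per-point features
    Returns (num_slices, D) slice features and the per-point slice ids.
    """
    coords = points[:, axis]
    edges = np.linspace(coords.min(), coords.max(), num_slices + 1)
    slice_ids = np.clip(np.digitize(coords, edges) - 1, 0, num_slices - 1)
    out = np.full((num_slices, features.shape[1]), -np.inf)
    for sid, feat in zip(slice_ids, features):
        out[sid] = np.maximum(out[sid], feat)    # max-pool within each slice
    out[out == -np.inf] = 0.0                    # empty slices become zeros
    return out, slice_ids
```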

* Camera-ready version for the CVPR 2018 spotlight. Code is available at https://github.com/qianguih/RSNet
The placement of wind turbines on a given area of land such that the wind farm produces a maximum amount of energy is a challenging optimization problem. In this article, we tackle this problem, taking into account wake effects that are produced by the different turbines on the wind farm. We significantly improve upon existing results for the minimization of wake effects by developing a new problem-specific local search algorithm. One key step in the speed-up of our algorithm is the reduction in computation time needed to assess a given wind farm layout compared to previous approaches. Our new method allows the optimization of large real-world scenarios within a single night on a standard computer, whereas weeks on specialized computing servers were required for previous approaches.

* 16 pages, 2 algorithms, 4 figures, 1 table
Evolutionary diversity optimization aims to compute a diverse set of solutions where all solutions meet a given quality criterion. With this paper, we bridge the areas of evolutionary diversity optimization and evolutionary multi-objective optimization. We show how popular indicators frequently used in the area of multi-objective optimization can be used for evolutionary diversity optimization. Our experimental investigations for evolving diverse sets of TSP instances and images according to various features show that two of the most prominent multi-objective indicators, namely the hypervolume indicator and the inverted generational distance, provide excellent results in terms of visualization and various diversity indicators.
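
As an example of such an indicator, the sketch below computes the inverted generational distance of a population with respect to a reference set in feature (or objective) space; smaller values indicate better coverage of the reference set. This is a generic formulation, not the paper's exact experimental setup.

```python
import numpy as np

def inverted_generational_distance(reference, population):
    """IGD: mean distance from each reference point to its nearest
    member of the population."""
    reference = np.asarray(reference, dtype=float)
    population = np.asarray(population, dtype=float)
    # pairwise distances: (num_reference, num_population)
    dists = np.linalg.norm(reference[:, None, :] - population[None, :, :], axis=-1)
    return dists.min(axis=1).mean()
```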

This research proposes a novel indicator-based hybrid evolutionary approach that combines approximate and exact algorithms. We apply it to a new bi-criteria formulation of the travelling thief problem, which is known to the Evolutionary Computation community as a benchmark multi-component optimisation problem that interconnects two classical NP-hard problems: the travelling salesman problem and the 0-1 knapsack problem. Our approach employs the exact dynamic programming algorithm for the underlying Packing-While-Travelling (PWT) problem as a subroutine within a bi-objective evolutionary algorithm. This design takes advantage of the data extracted from the Pareto fronts generated by the dynamic program to achieve better solutions. Furthermore, we develop a number of novel indicators and selection mechanisms to strengthen the synergy between the two algorithmic components of our approach. The results of computational experiments show that the approach is capable of outperforming the state-of-the-art results for the single-objective case of the problem.

Many evolutionary and constructive heuristic approaches have been introduced to solve the Traveling Thief Problem (TTP). However, the accuracy of such approaches is unknown due to their inability to find global optima. In this paper, we propose three exact algorithms and a hybrid approach to the TTP. We compare these with state-of-the-art approaches to gather a comprehensive overview of the accuracy of heuristic methods for solving small TTP instances.

* 13 pages, 1 figure
Wind energy plays an increasing role in the supply of energy worldwide. The energy output of a wind farm is highly dependent on the weather conditions present at its site. If the output can be predicted more accurately, energy suppliers can coordinate the collaborative production of different energy sources more efficiently to avoid costly overproduction. With this paper, we take a computer science perspective on energy prediction based on weather data and analyze the important parameters as well as their correlation with the energy output. To deal with the interaction of the different parameters, we use symbolic regression based on the genetic programming tool DataModeler. Our studies are carried out on publicly available weather and energy data for a wind farm in Australia. We reveal the correlation of the different variables with the energy output. The model obtained for energy prediction gives a very reliable prediction of the energy output for new weather data.

* 13 pages, 11 figures, 2 tables
With this paper, we contribute to the understanding of ant colony optimization (ACO) algorithms by formally analyzing their runtime behavior. We study simple MAX-MIN ant systems on the class of linear pseudo-Boolean functions defined on binary strings of length $n$. Our investigations point out how the progress according to function values is stored in the pheromone values. We provide a general upper bound of $O((n^3 \log n)/\rho)$ for two ACO variants on all linear functions, where $\rho$ determines the pheromone update strength. Furthermore, we show improved bounds for two well-known linear pseudo-Boolean functions called OneMax and BinVal and give additional insights using an experimental study.
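
A minimal Python sketch of a MAX-MIN ant system of the kind analyzed above, applied to OneMax: pheromone values are kept within bounds, a solution is sampled bit-wise from the pheromones, and the pheromones are shifted towards the best-so-far solution with update strength rho. The parameter values and the evaluation budget are illustrative.

```python
import random

def mmas_onemax(n, rho=0.1, budget=10**5):
    """MAX-MIN ant system sketch for OneMax (illustrative parameters).

    Each bit has a pheromone value bounded in [1/n, 1 - 1/n]; bit i is set
    to 1 with probability tau[i] when constructing a solution.
    """
    lo, hi = 1.0 / n, 1.0 - 1.0 / n
    tau = [0.5] * n
    best = None
    for step in range(budget):
        x = [1 if random.random() < t else 0 for t in tau]
        if best is None or sum(x) >= sum(best):
            best = x                              # best-so-far update
        tau = [min(hi, (1 - rho) * t + rho) if b == 1 else max(lo, (1 - rho) * t)
               for t, b in zip(tau, best)]        # reinforce the best-so-far bits
        if sum(best) == n:
            return step + 1
    return budget
```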

* 19 pages, 2 figures
One main challenge in the design of networks is that traffic load is not generally known in advance. This makes it hard to adequately devote resources so as to best prevent or mitigate bottlenecks. While several authors have shown how to predict traffic in a coarse-grained manner by aggregating flows, fine-grained prediction of traffic at the level of individual flows, including bursty traffic, is widely considered to be impossible. This paper presents, to the best of our knowledge, the first approach to fine-grained per-flow traffic prediction. In short, we introduce the Frequency-based Kernel Kalman Filter (FKKF), which predicts individual flows' behavior based on measurements. Our FKKF relies on the well-known Kalman Filter in combination with a kernel to support the prediction of non-linear functions. Furthermore, we change the operating space from the time domain to the frequency domain. In this space, into which we transform the input data via a Short-Time Fourier Transform (STFT), the peak structures of flows can be predicted after extracting their key characteristics, via Principal Component Analysis (PCA), from past and ongoing flows that stem from the same socket-to-socket connection. We demonstrate the effectiveness of our approach on popular benchmark traces from a university data center. Our approach predicts traffic for 17 out of 20 groups of flows with an average prediction error of 6.43%, on average about 0.49 seconds in advance, whereas existing coarse-grained approaches exhibit prediction errors of 77% at best.
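
A hedged sketch of the feature-extraction part of the pipeline described above: a per-flow time series is transformed into frequency space with a Short-Time Fourier Transform, and the resulting spectra are compressed with PCA, yielding low-dimensional features on which a (kernel) Kalman filter could operate. All parameter values and function names here are placeholders, not the paper's configuration.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import PCA

def frequency_features(flow_series, fs=10.0, nperseg=64, n_components=5):
    """STFT + PCA feature extraction for one flow's traffic time series.

    flow_series : sampled per-flow load values (e.g. bytes per interval)
    Returns one low-dimensional feature vector per STFT time frame.
    """
    _, _, Z = stft(np.asarray(flow_series, dtype=float), fs=fs, nperseg=nperseg)
    spectra = np.abs(Z).T                       # one magnitude spectrum per frame
    k = min(n_components, spectra.shape[0], spectra.shape[1])
    pca = PCA(n_components=k)
    return pca.fit_transform(spectra)
```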
