Models, code, and papers for "Jay A":

Data mining involves the systematic analysis of large data sets, and data mining in agricultural soil datasets is exciting and modern research area. The productive capacity of a soil depends on soil fertility. Achieving and maintaining appropriate levels of soil fertility, is of utmost importance if agricultural land is to remain capable of nourishing crop production. In this research, Steps for building a predictive model of soil fertility have been explained. This paper aims at predicting soil fertility class using decision tree algorithms in data mining . Further, it focuses on performance tuning of J48 decision tree algorithm with the help of meta-techniques such as attribute selection and boosting.

Despite outstanding success in vision amongst other domains, many of the recent deep learning approaches have evident drawbacks for robots. This manuscript surveys recent work in the literature that pertain to applying deep learning systems to the robotics domain, either as means of estimation or as a tool to resolve motor commands directly from raw percepts. These recent advances are only a piece to the puzzle. We suggest that deep learning as a tool alone is insufficient in building a unified framework to acquire general intelligence. For this reason, we complement our survey with insights from cognitive development and refer to ideas from classical control theory, producing an integrated direction for a lifelong learning architecture.

Major mistakes do not read

In many scenarios, humans prefer a text-based representation of quantitative data over numerical, tabular, or graphical representations. The attractiveness of textual summaries for complex data has inspired research on data-to-text systems. While there are several data-to-text tools for time series, few of them try to mimic how humans summarize for time series. In this paper, we propose a model to create human-like text descriptions for time series. Our system finds patterns in time series data and ranks these patterns based on empirical observations of human behavior using utility estimation. Our proposed utility estimation model is a Bayesian network capturing interdependencies between different patterns. We describe the learning steps for this network and introduce baselines along with their performance for each step. The output of our system is a natural language description of time series that attempts to match a human's summary of the same data.

There is a resurging interest in developing a neural-network-based solution to the supervised machine learning problem. The convolutional neural network (CNN) will be studied in this note. To begin with, we introduce a RECOS transform as a basic building block of CNNs. The "RECOS" is an acronym for "REctified-COrrelations on a Sphere". It consists of two main concepts: 1) data clustering on a sphere and 2) rectification. Afterwards, we interpret a CNN as a network that implements the guided multi-layer RECOS transform with three highlights. First, we compare the traditional single-layer and modern multi-layer signal analysis approaches, point out key ingredients that enable the multi-layer approach, and provide a full explanation to the operating principle of CNNs. Second, we discuss how guidance is provided by labels through backpropagation (BP) in the training. Third, we show that a trained network can be greatly simplified in the testing stage demanding only one-bit representation for both filter weights and inputs.

This work attempts to address two fundamental questions about the structure of the convolutional neural networks (CNN): 1) why a non-linear activation function is essential at the filter output of every convolutional layer? 2) what is the advantage of the two-layer cascade system over the one-layer system? A mathematical model called the "REctified-COrrelations on a Sphere" (RECOS) is proposed to answer these two questions. After the CNN training process, the converged filter weights define a set of anchor vectors in the RECOS model. Anchor vectors represent the frequently occurring patterns (or the spectral components). The necessity of rectification is explained using the RECOS model. Then, the behavior of a two-layer RECOS system is analyzed and compared with its one-layer counterpart. The LeNet-5 and the MNIST dataset are used to illustrate discussion points. Finally, the RECOS model is generalized to a multi-layer system with the AlexNet as an example. Keywords: Convolutional Neural Network (CNN), Nonlinear Activation, RECOS Model, Rectified Linear Unit (ReLU), MNIST Dataset.

Entity resolution, the problem of identifying the underlying entity of references found in data, has been researched for many decades in many communities. A common theme in this research has been the importance of incorporating relational features into the resolution process. Relational entity resolution is particularly important in knowledge graphs (KGs), which have a regular structure capturing entities and their interrelationships. We identify three major problems in KG entity resolution: (1) intra-KG reference ambiguity; (2) inter-KG reference ambiguity; and (3) ambiguity when extending KGs with new facts. We implement a framework that generalizes across these three settings and exploits this regular structure of KGs. Our framework has many advantages over custom solutions widely deployed in industry, including collective inference, scalability, and interpretability. We apply our framework to two real-world KG entity resolution problems, ambiguity in NELL and merging data from Freebase and MusicBrainz, demonstrating the importance of relational features.

We present a long-term intrinsically motivated structure learning method for modeling transition dynamics during controlled interactions between a robot and semi-permanent structures in the world. In particular, we discuss how partially-observable state is represented using distributions over a Markovian state and build models of objects that predict how state distributions change in response to interactions with such objects. These structures serve as the basis for a number of possible future tasks defined as Markov Decision Processes (MDPs). The approach is an example of a structure learning technique applied to a multimodal affordance representation that yields a population of forward models for use in planning. We evaluate the approach using experiments on a bimanual mobile manipulator (uBot-6) that show the performance of model acquisition as the number of transition actions increases.

Sentence embedding is an important research topic in natural language processing (NLP) since it can transfer knowledge to downstream tasks. Meanwhile, a contextualized word representation, called BERT, achieves the state-of-the-art performance in quite a few NLP tasks. Yet, it is an open problem to generate a high quality sentence representation from BERT-based word models. It was shown in previous study that different layers of BERT capture different linguistic properties. This allows us to fusion information across layers to find better sentence representation. In this work, we study the layer-wise pattern of the word representation of deep contextualized models. Then, we propose a new sentence embedding method by dissecting BERT-based word models through geometric analysis of the space spanned by the word representation. It is called the SBERT-WK method. No further training is required in SBERT-WK. We evaluate SBERT-WK on semantic textual similarity and downstream supervised tasks. Furthermore, ten sentence-level probing tasks are presented for detailed linguistic analysis. Experiments show that SBERT-WK achieves the state-of-the-art performance. Our codes are publicly available.

Limited informative data remains the primary challenge for optimization the expensive complex systems. Learning from limited data and finding the set of variables that optimizes an expected output arise practically in the design problems. In such situations, the underlying function is complex yet unknown, a large number of variables are involved though not all of them are important, and the interactions between the variables are significant. On the other hand, it is usually expensive to collect more data and the outcome is under uncertainty. Unfortunately, despite being real-world challenges, exiting works have not addressed these jointly. We propose a new surrogate optimization approach in this article to tackle these challenges. We design a flexible, non-interpolating, and parsimonious surrogate model using a partitioning technique. The proposed model bends at near-optimal locations and identifies the peaks and valleys for optimization purposes. To discover new candidate points an exploration-exploitation Pareto method is implemented as a sampling strategy. Furthermore, we develop a smart replication approach based on hypothesis testing to overcome the uncertainties associated with the black-box outcome. The Smart-Replication approach identifies promising points to replicate rather than wasting evaluation on less informative data points. We conduct a comprehensive set of experiments on challenging global optimization test functions to evaluate the performance of our proposal.

A new machine learning methodology, called successive subspace learning (SSL), is introduced in this work. SSL contains four key ingredients: 1) successive near-to-far neighborhood expansion; 2) unsupervised dimension reduction via subspace approximation; 3) supervised dimension reduction via label-assisted regression (LAG); and 4) feature concatenation and decision making. An image-based object classification method, called PixelHop, is proposed to illustrate the SSL design. It is shown by experimental results that the PixelHop method outperforms the classic CNN model of similar model complexity in three benchmarking datasets (MNIST, Fashion MNIST and CIFAR-10). Although SSL and deep learning (DL) have some high-level concept in common, they are fundamentally different in model formulation, the training process and training complexity. Extensive discussion on the comparison of SSL and DL is made to provide further insights into the potential of SSL.

In this work, we propose a new data visualization and clustering technique for discovering discriminative structures in high-dimensional data. This technique, referred to as cPCA++, utilizes the fact that the interesting features of a "target" dataset may be obscured by high variance components during traditional PCA. By analyzing what is referred to as a "background" dataset (i.e., one that exhibits the high variance principal components but not the interesting structures), our technique is capable of efficiently highlighting the structure that is unique to the "target" dataset. Similar to another recently proposed algorithm called "contrastive PCA" (cPCA), the proposed cPCA++ method identifies important dataset specific patterns that are not detected by traditional PCA in a wide variety of settings. However, the proposed cPCA++ method is significantly more efficient than cPCA, because it does not require the parameter sweep in the latter approach. We applied the cPCA++ method to the problem of image splicing localization. In this application, we utilize authentic edges as the background dataset and the spliced edges as the target dataset. The proposed method is significantly more efficient than state-of-the-art methods, as the former does not require iterative updates of filter weights via stochastic gradient descent and backpropagation, nor the training of a classifier. Furthermore, the cPCA++ method is shown to provide performance scores comparable to the state-of-the-art Multi-task Fully Convolutional Network (MFCN).

Roads are critically important infrastructure to societal and economic development, with huge investments made by governments every year. However, methods for monitoring those investments tend to be time-consuming, laborious, and expensive, placing them out of reach for many developing regions. In this work, we develop a model for monitoring the quality of road infrastructure using satellite imagery. For this task, we harness two trends: the increasing availability of high-resolution, often-updated satellite imagery, and the enormous improvement in speed and accuracy of convolutional neural network-based methods for performing computer vision tasks. We employ a unique dataset of road quality information on 7000km of roads in Kenya combined with 50cm resolution satellite imagery. We create models for a binary classification task as well as a comprehensive 5-category classification task, with accuracy scores of 88 and 73 percent respectively. We also provide evidence of the robustness of our methods with challenging held-out scenarios, though we note some improvement is still required for confident analysis of a never before seen road. We believe these results are well-positioned to have substantial impact on a broad set of transport applications.

We explore a human-driven approach to annotation, curated training (CT), in which annotation is framed as teaching the system by using interactive search to identify informative snippets of text to annotate, unlike traditional approaches which either annotate preselected text or use active learning. A trained annotator performed 80 hours of CT for the thirty event types of the NIST TAC KBP Event Argument Extraction evaluation. Combining this annotation with ACE results in a 6% reduction in error and the learning curve of CT plateaus more slowly than for full-document annotation. 3 NLP researchers performed CT for one event type and showed much sharper learning curves with all three exceeding ACE performance in less than ninety minutes, suggesting that CT can provide further benefits when the annotator deeply understands the system.

In this work, we analyze how memory forms in recurrent neural networks (RNN) and, based on the analysis, how to increase their memory capabilities in a mathematical rigorous way. Here, we define memory as a function that maps previous elements in a sequence to the current output. Our investigation concludes that the three RNN cells: simple RNN (SRN), long short-term memory (LSTM) and gated recurrent unit (GRU) all suffer memory decay as a function of the distance between the output to the input. To overcome this limitation by design, we introduce trainable scaling factors which act like an attention mechanism to increase the memory response to the semantic inputs if there is a memory decay and to decrease the response if memory decay of the noises is not fast enough. We call the new design extended LSTM (ELSTM). Next, we present a dependent bidirectional recurrent neural network (DBRNN), which is more robust to previous erroneous predictions. Extensive experiments are carried out on different language tasks to demonstrate the superiority of our proposed ELSTM and DBRNN solutions. In dependency parsing (DP), our proposed ELTSM has achieved up to 30% increase of labeled attachment score (LAS) as compared to LSTM and GRU. Our proposed models also outperformed other state-of-the-art models such as bi-attention and convolutional sequence to sequence (convseq2seq) by close to 10% LAS.

Knowledge bases (KB) constructed through information extraction from text play an important role in query answering and reasoning. In this work, we study a particular reasoning task, the problem of discovering causal relationships between entities, known as causal discovery. There are two contrasting types of approaches to discovering causal knowledge. One approach attempts to identify causal relationships from text using automatic extraction techniques, while the other approach infers causation from observational data. However, extractions alone are often insufficient to capture complex patterns and full observational data is expensive to obtain. We introduce a probabilistic method for fusing noisy extractions with observational data to discover causal knowledge. We propose a principled approach that uses the probabilistic soft logic (PSL) framework to encode well-studied constraints to recover long-range patterns and consistent predictions, while cheaply acquired extractions provide a proxy for unseen observations. We apply our method gene regulatory networks and show the promise of exploiting KB signals in causal discovery, suggesting a critical, new area of research.

Being motivated by the multilayer RECOS (REctified-COrrelations on a Sphere) transform, we develop a data-driven Saak (Subspace approximation with augmented kernels) transform in this work. The Saak transform consists of three steps: 1) building the optimal linear subspace approximation with orthonormal bases using the second-order statistics of input vectors, 2) augmenting each transform kernel with its negative, 3) applying the rectified linear unit (ReLU) to the transform output. The Karhunen-Lo\'eve transform (KLT) is used in the first step. The integration of Steps 2 and 3 is powerful since they resolve the sign confusion problem, remove the rectification loss and allow a straightforward implementation of the inverse Saak transform at the same time. Multiple Saak transforms are cascaded to transform images of a larger size. All Saak transform kernels are derived from the second-order statistics of input random vectors in a one-pass feedforward manner. Neither data labels nor backpropagation is used in kernel determination. Multi-stage Saak transforms offer a family of joint spatial-spectral representations between two extremes; namely, the full spatial-domain representation and the full spectral-domain representation. We select Saak coefficients of higher discriminant power to form a feature vector for pattern recognition, and use the MNIST dataset classification problem as an illustrative example.

In this paper, we develop a 2D and 3D segmentation pipelines for fully automated cardiac MR image segmentation using Deep Convolutional Neural Networks (CNN). Our models are trained end-to-end from scratch using the ACD Challenge 2017 dataset comprising of 100 studies, each containing Cardiac MR images in End Diastole and End Systole phase. We show that both our segmentation models achieve near state-of-the-art performance scores in terms of distance metrics and have convincing accuracy in terms of clinical parameters. A comparative analysis is provided by introducing a novel dice loss function and its combination with cross entropy loss. By exploring different network structures and comprehensive experiments, we discuss several key insights to obtain optimal model performance, which also is central to the theme of this challenge.

This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Specifically, the proposed measure is a combined approach that inherits the edge-based approach of the edge counting scheme, which is then enhanced by the node-based approach of the information content calculation. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828) with a benchmark based on human similarity judgements, whereas an upper bound (r = 0.885) is observed when human subjects replicate the same task.