Models, code, and papers for "Vinicius B":
In this work, we present a novel strategy for correcting imperfections in occupancy grid maps called map decay. The objective of map decay is to correct invalid occupancy probabilities of map cells that are unobservable by sensors. The strategy was inspired by an analogy between the memory architecture believed to exist in the human brain and the maps maintained by an autonomous vehicle. It consists in merging sensory information obtained during runtime (online) with a priori data from a high-precision map constructed offline. In map decay, cells observed by sensors are updated using traditional occupancy grid mapping techniques and unobserved cells are adjusted so that their occupancy probabilities tend to the values found in the offline map. This strategy is grounded in the idea that the most precise information available about an unobservable cell is the value found in the high-precision offline map. Map decay was successfully tested and is still in use in the IARA autonomous vehicle from Universidade Federal do Esp\'irito Santo.
We propose a bio-inspired foveated technique to detect cars in a long range camera view using a deep convolutional neural network (DCNN) for the IARA self-driving car. The DCNN receives as input (i) an image, which is captured by a camera installed on IARA's roof; and (ii) crops of the image, which are centered in the waypoints computed by IARA's path planner and whose sizes increase with the distance from IARA. We employ an overlap filter to discard detections of the same car in different crops of the same image based on the percentage of overlap of detections' bounding boxes. We evaluated the performance of the proposed augmented-range vehicle detection system (ARVDS) using the hardware and software infrastructure available in the IARA self-driving car. Using IARA, we captured thousands of images of real traffic situations containing cars in a long range. Experimental results show that ARVDS increases the Average Precision (AP) of long range car detection from 29.51% (using a single whole image) to 63.15%.
We propose the use of deep neural networks (DNN) for solving the problem of inferring the position and relevant properties of lanes of urban roads with poor or absent horizontal signalization, in order to allow the operation of autonomous cars in such situations. We take a segmentation approach to the problem and use the Efficient Neural Network (ENet) DNN for segmenting LiDAR remission grid maps into road maps. We represent road maps using what we called road grid maps. Road grid maps are square matrixes and each element of these matrixes represents a small square region of real-world space. The value of each element is a code associated with the semantics of the road map. Our road grid maps contain all information about the roads' lanes required for building the Road Definition Data Files (RDDFs) that are necessary for the operation of our autonomous car, IARA (Intelligent Autonomous Robotic Automobile). We have built a dataset of tens of kilometers of manually marked road lanes and used part of it to train ENet to segment road grid maps from remission grid maps. After being trained, ENet achieved an average segmentation accuracy of 83.7%. We have tested the use of inferred road grid maps in the real world using IARA on a stretch of 3.7 km of urban roads and it has shown performance equivalent to that of the previous IARA's subsystem that uses a manually generated RDDF.
Autonomous terrestrial vehicles must be capable of perceiving traffic lights and recognizing their current states to share the streets with human drivers. Most of the time, human drivers can easily identify the relevant traffic lights. To deal with this issue, a common solution for autonomous cars is to integrate recognition with prior maps. However, additional solution is required for the detection and recognition of the traffic light. Deep learning techniques have showed great performance and power of generalization including traffic related problems. Motivated by the advances in deep learning, some recent works leveraged some state-of-the-art deep detectors to locate (and further recognize) traffic lights from 2D camera images. However, none of them combine the power of the deep learning-based detectors with prior maps to recognize the state of the relevant traffic lights. Based on that, this work proposes to integrate the power of deep learning-based detection with the prior maps used by our car platform IARA (acronym for Intelligent Autonomous Robotic Automobile) to recognize the relevant traffic lights of predefined routes. The process is divided in two phases: an offline phase for map construction and traffic lights annotation; and an online phase for traffic light recognition and identification of the relevant ones. The proposed system was evaluated on five test cases (routes) in the city of Vit\'oria, each case being composed of a video sequence and a prior map with the relevant traffic lights for the route. Results showed that the proposed technique is able to correctly identify the relevant traffic light along the trajectory.
Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one's experiences--a hallmark of human intelligence from infancy--remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between "hand-engineering" and "end-to-end" learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias--the graph network--which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.
In this paper, we demonstrate Untrue News, a new search engine for fake stories. Untrue News is easy to use and offers useful features such as: a) a multi-language option combining fake stories from different countries and languages around the same subject or person; b) an user privacy protector, avoiding the filter bubble by employing a bias-free ranking scheme; and c) a collaborative platform that fosters the development of new tools for fighting disinformation. Untrue News relies on Elasticsearch, a new scalable analytic search engine based on the Lucene library that provides near real-time results. We demonstrate two key scenarios: the first related to a politician - looking how the categories are shown for different types of fake stories - and a second related to a refugee - showing the multilingual tool. A prototype of Untrue News is accessible via http://untrue.news
Convolutional neural networks (ConvNets) are the actual standard for image recognizement and classification. On the present work we develop a Computer Aided-Diagnosis (CAD) system using ConvNets to classify a x-rays chest images dataset in two groups: Normal and Pneumonia. The study uses ConvNets models available on the PyTorch platform: AlexNet, SqueezeNet, ResNet and Inception. We initially use three training styles: complete from scratch using random initialization, using a pre-trained ImageNet model training only the last layer adapted to our problem (transfer learning) and a pre-trained model modified training all the classifying layers of the model (fine tuning). The last strategy of training used is with data augmentation techniques that avoid over fitting problems on ConvNets yielding the better results on this study
Several constrained optimization problems have been adequately solved over the years thanks to advances in the metaheuristics area. In this paper, we evaluate a novel self-adaptive and auto-constructive metaheuristic called Drone Squadron Optimization (DSO) in solving constrained engineering design problems. This paper evaluates DSO with death penalty on three widely tested engineering design problems. Results show that the proposed approach is competitive with some very popular metaheuristics.
In this work, we explore the issue of the inter-annotator agreement for training and evaluating automated segmentation of skin lesions. We explore what different degrees of agreement represent, and how they affect different use cases for segmentation. We also evaluate how conditioning the ground truths using different (but very simple) algorithms may help to enhance agreement and may be appropriate for some use cases. The segmentation of skin lesions is a cornerstone task for automated skin lesion analysis, useful both as an end-result to locate/detect the lesions and as an ancillary task for lesion classification. Lesion segmentation, however, is a very challenging task, due not only to the challenge of image segmentation itself but also to the difficulty in obtaining properly annotated data. Detecting accurately the borders of lesions is challenging even for trained humans, since, for many lesions, those borders are fuzzy and ill-defined. Using lesions and annotations from the ISIC Archive, we estimate inter-annotator agreement for skin-lesion segmentation and propose several simple procedures that may help to improve inter-annotator agreement if used to condition the ground truths.
Unsupervised change detection techniques are generally constrained to two multi-band optical images acquired at different times through sensors sharing the same spatial and spectral resolution. This scenario is suitable for a straight comparison of homologous pixels such as pixel-wise differencing. However, in some specific cases such as emergency situations, the only available images may be those acquired through different kinds of sensors with different resolutions. Recently some change detection techniques dealing with images with different spatial and spectral resolutions, have been proposed. Nevertheless, they are focused on a specific scenario where one image has a high spatial and low spectral resolution while the other has a low spatial and high spectral resolution. This paper addresses the problem of detecting changes between any two multi-band optical images disregarding their spatial and spectral resolution disparities. We propose a method that effectively uses the available information by modeling the two observed images as spatially and spectrally degraded versions of two (unobserved) latent images characterized by the same high spatial and high spectral resolutions. Covering the same scene, the latent images are expected to be globally similar except for possible changes in spatially sparse locations. Thus, the change detection task is envisioned through a robust fusion task which enforces the differences between the estimated latent images to be spatially sparse. We show that this robust fusion can be formulated as an inverse problem which is iteratively solved using an alternate minimization strategy. The proposed framework is implemented for an exhaustive list of applicative scenarios and applied to real multi-band optical images. A comparison with state-of-the-art change detection methods evidences the accuracy of the proposed robust fusion-based strategy.
This paper proposes Drone Squadron Optimization, a new self-adaptive metaheuristic for global numerical optimization which is updated online by a hyper-heuristic. DSO is an artifact-inspired technique, as opposed to many algorithms used nowadays, which are nature-inspired. DSO is very flexible because it is not related to behaviors or natural phenomena. DSO has two core parts: the semi-autonomous drones that fly over a landscape to explore, and the Command Center that processes the retrieved data and updates the drones' firmware whenever necessary. The self-adaptive aspect of DSO in this work is the perturbation/movement scheme, which is the procedure used to generate target coordinates. This procedure is evolved by the Command Center during the global optimization process in order to adapt DSO to the search landscape. DSO was evaluated on a set of widely employed benchmark functions. The statistical analysis of the results shows that the proposed method is competitive with the other methods in the comparison, the performance is promising, but several future improvements are planned.
Advances in machine learning and deep neural networks for object detection, coupled with lower cost and power requirements of cameras, led to promising vision-based solutions for sUAS detection. However, solely relying on the visible spectrum has previously led to reliability issues in low contrast scenarios such as sUAS flying below the treeline and against bright sources of light. Alternatively, due to the relatively high heat signatures emitted from sUAS during flight, a long-wave infrared (LWIR) sensor is able to produce images that clearly contrast the sUAS from its background. However, compared to widely available visible spectrum sensors, LWIR sensors have lower resolution and may produce more false positives when exposed to birds or other heat sources. This research work proposes combining the advantages of the LWIR and visible spectrum sensors using machine learning for vision-based detection of sUAS. Utilizing the heightened background contrast from the LWIR sensor combined and synchronized with the relatively increased resolution of the visible spectrum sensor, a deep learning model was trained to detect the sUAS through previously difficult environments. More specifically, the approach demonstrated effective detection of multiple sUAS flying above and below the treeline, in the presence of heat sources, and glare from the sun. Our approach achieved a detection rate of 71.2 +- 8.3%, improving by 69% when compared to LWIR and by 30.4% when visible spectrum alone, and achieved false alarm rate of 2.7 +- 2.6%, decreasing by 74.1% and by 47.1% when compared to LWIR and visible spectrum alone, respectively, on average, for single and multiple drone scenarios, controlled for the same confidence metric of the machine learning object detector of at least 50%. Videos of the solution's performance can be seen at https://sites.google.com/view/tamudrone-spie2020/.
Mutual information (MI) is the standard method used in image registration and the most studied one but can diverge and produce wrong results when used in an automated manner. In this study we compared the results of the ITK Mattes MI function, used in 3D Slicer and ITK derived software solutions, and our own MICUDA Shannon and Tsallis MI functions under the translation, rotation and scale transforms in a 3D mathematical space. This comparison allows to understand why registration fails in some circumstances and how to produce a more robust automated algorithm to register medical images. Since our algorithms were designed to use GPU computations we also have a huge gain in speed while improving the quality of registration.
Keyword extraction has received an increasing attention as an important research topic which can lead to have advancements in diverse applications such as document context categorization, text indexing and document classification. In this paper we propose STF-IDF, a novel semantic method based on TF-IDF, for scoring word importance of informal documents in a corpus. A set of nearly four million documents from health-care social media was collected and was trained in order to draw semantic model and to find the word embeddings. Then, the features of semantic space were utilized to rearrange the original TF-IDF scores through an iterative solution so as to improve the moderate performance of this algorithm on informal texts. After testing the proposed method with 200 randomly chosen documents, our method managed to decrease the TF-IDF mean error rate by a factor of 50% and reaching the mean error of 13.7%, as opposed to 27.2% of the original TF-IDF.
Graphs are useful structures that can model several important real-world problems. Recently, learning graphs have drawn considerable attention, leading to the proposal of new methods for learning these data structures. One of these studies produced NetGAN, a new approach for generating graphs via random walks. Although NetGAN has shown promising results in terms of accuracy in the tasks of generating graphs and link prediction, the choice of vertices from which it starts random walks can lead to inconsistent and highly variable results, especially when the length of walks is short. As an alternative to random starting, this study aims to establish a new method for initializing random walks from a set of dense vertices. We purpose estimating the importance of a node based on the inverse of its influence over the whole vertices of its neighborhood through random walks of different sizes. The proposed method manages to achieve significantly better accuracy, less variance and lesser outliers.
Lexicase selection achieves very good solution quality by introducing ordered test cases. However, the computational complexity of lexicase selection can prohibit its use in many applications. In this paper, we introduce Batch Tournament Selection (BTS), a hybrid of tournament and lexicase selection which is approximately one order of magnitude faster than lexicase selection while achieving a competitive quality of solutions. Tests on a number of regression datasets show that BTS compares well with lexicase selection in terms of mean absolute error while having a speed-up of up to 25 times. Surprisingly, BTS and lexicase selection have almost no difference in both diversity and performance. This reveals that batches and ordered test cases are completely different mechanisms which share the same general principle fostering the specialization of individuals. This work introduces an efficient algorithm that sheds light onto the main principles behind the success of lexicase, potentially opening up a new range of possibilities for algorithms to come.
In this work, we propose an analysis of the presence of gender bias associated with professions in Portuguese word embeddings. The objective of this work is to study gender implications related to stereotyped professions for women and men in the context of the Portuguese language.
The remarkable technological advance in well-equipped wearable devices is pushing an increasing production of long first-person videos. However, since most of these videos have long and tedious parts, they are forgotten or never seen. Despite a large number of techniques proposed to fast-forward these videos by highlighting relevant moments, most of them are image based only. Most of these techniques disregard other relevant sensors present in the current devices such as high-definition microphones. In this work, we propose a new approach to fast-forward videos using psychoacoustic metrics extracted from the soundtrack. These metrics can be used to estimate the annoyance of a segment allowing our method to emphasize moments of sound pleasantness. The efficiency of our method is demonstrated through qualitative results and quantitative results as far as of speed-up and instability are concerned.
Automated discourse analysis tools based on Natural Language Processing (NLP) aiming at the diagnosis of language-impairing dementias generally extract several textual metrics of narrative transcripts. However, the absence of sentence boundary segmentation in the transcripts prevents the direct application of NLP methods which rely on these marks to function properly, such as taggers and parsers. We present the first steps taken towards automatic neuropsychological evaluation based on narrative discourse analysis, presenting a new automatic sentence segmentation method for impaired speech. Our model uses recurrent convolutional neural networks with prosodic, Part of Speech (PoS) features, and word embeddings. It was evaluated intrinsically on impaired, spontaneous speech, as well as, normal, prepared speech, and presents better results for healthy elderly (CTL) (F1 = 0.74) and Mild Cognitive Impairment (MCI) patients (F1 = 0.70) than the Conditional Random Fields method (F1 = 0.55 and 0.53, respectively) used in the same context of our study. The results suggest that our model is robust for impaired speech and can be used in automated discourse analysis tools to differentiate narratives produced by MCI and CTL.
Archetypal scenarios for change detection generally consider two images acquired through sensors of the same modality. However, in some specific cases such as emergency situations, the only images available may be those acquired through different kinds of sensors. More precisely, this paper addresses the problem of detecting changes between two multi-band optical images characterized by different spatial and spectral resolutions. This sensor dissimilarity introduces additional issues in the context of operational change detection. To alleviate these issues, classical change detection methods are applied after independent preprocessing steps (e.g., resampling) used to get the same spatial and spectral resolutions for the pair of observed images. Nevertheless, these preprocessing steps tend to throw away relevant information. Conversely, in this paper, we propose a method that more effectively uses the available information by modeling the two observed images as spatial and spectral versions of two (unobserved) latent images characterized by the same high spatial and high spectral resolutions. As they cover the same scene, these latent images are expected to be globally similar except for possible changes in sparse spatial locations. Thus, the change detection task is envisioned through a robust multi-band image fusion method which enforces the differences between the estimated latent images to be spatially sparse. This robust fusion problem is formulated as an inverse problem which is iteratively solved using an efficient block-coordinate descent algorithm. The proposed method is applied to real panchormatic/multispectral and hyperspectral images with simulated realistic changes. A comparison with state-of-the-art change detection methods evidences the accuracy of the proposed strategy.