Models, code, and papers for "Alberto F":
Decreasing costs of vision sensors and advances in embedded hardware have boosted lane-related research (detection, estimation, and tracking) in the past two decades. The interest in this topic has increased even more with the demand for advanced driver assistance systems (ADAS) and self-driving cars. Although these problems have been extensively studied independently, there is still a need for studies that propose a combined solution for the multiple problems related to the ego-lane, such as lane departure warning (LDW), lane change detection, lane marking type (LMT) classification, road marking detection and classification, and detection of the presence of adjacent lanes (i.e., the immediate left and right lanes). In this paper, we propose a real-time Ego-Lane Analysis System (ELAS) capable of estimating ego-lane position, classifying LMTs and road markings, performing LDW, and detecting lane change events. The proposed vision-based system works on a temporal sequence of images. Lane marking features are extracted in perspective and Inverse Perspective Mapping (IPM) images that are combined to increase robustness. The final estimated lane is modeled as a spline using a combination of methods (Hough lines with Kalman filter and spline with particle filter). Based on the estimated lane, all other events are detected. To validate ELAS and address the lack of lane datasets in the literature, a new dataset with more than 20 different scenes (in more than 15,000 frames) and covering a variety of scenarios (urban roads, highways, traffic, shadows, etc.) was created. The dataset was manually annotated and made publicly available to enable evaluation of several events that are of interest to the research community (i.e., lane estimation, change, and centering; road markings; intersections; LMTs; crosswalks; and adjacent lanes). ELAS achieved high detection rates in all real-world events and proved to be ready for real-time applications.
Currently, self-driving cars rely greatly on the Global Positioning System (GPS) infrastructure, although there is an increasing demand for alternative methods for GPS-denied environments. One of them is known as place recognition, which associates images of places with their corresponding positions. We previously proposed systems based on Weightless Neural Networks (WNN) to address this problem as a classification task. This covers only one part of the global localization problem and is not precise enough for driverless cars. Instead of just recognizing past places and outputting their poses, a global localization system should estimate the pose of current place images. In this paper, we propose to tackle this problem as follows. First, given a live image, the place recognition system returns the most similar image and its pose. Then, given the live and recollected images, a visual localization system outputs the relative camera pose represented by those images. To estimate the relative camera pose between the recollected and the current images, a Convolutional Neural Network (CNN) is trained with the two images as input and a relative pose vector as output. Together, these systems solve the global localization problem using the topological and metric information to approximate the current vehicle pose. The full approach is compared to a Real-Time Kinematic GPS system and a Simultaneous Localization and Mapping (SLAM) system. Experimental results show that the proposed approach correctly localizes a vehicle 90% of the time with a mean error of 1.20 m, compared to 1.12 m for the SLAM system and 0.37 m for the GPS, 89% of the time.
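The two-stage idea above (a recollected pose from place recognition, refined by a relative pose from the CNN) can be illustrated with a small pose composition sketch. It assumes planar poses of the form (x, y, heading) and a relative pose expressed in the frame of the recollected image; the abstract does not state the pose representation, and the function name `compose_pose` is hypothetical.

```python
import math


def compose_pose(reference, relative):
    """Compose a planar reference pose with a relative pose.

    Assumption: poses are (x, y, theta) tuples, theta in radians, and the
    relative pose is given in the reference frame. This is an illustration,
    not the representation used in the paper.
    """
    x, y, theta = reference
    dx, dy, dtheta = relative
    # Rotate the relative translation into the global frame, then translate.
    gx = x + dx * math.cos(theta) - dy * math.sin(theta)
    gy = y + dx * math.sin(theta) + dy * math.cos(theta)
    # Wrap the resulting heading to (-pi, pi].
    gtheta = math.atan2(math.sin(theta + dtheta), math.cos(theta + dtheta))
    return gx, gy, gtheta


# Pose of the recollected image (from place recognition) and a
# hypothetical relative pose (as a CNN might output it).
recollected = (1.0, 2.0, math.pi / 2)
relative = (1.0, 0.0, 0.0)
print(compose_pose(recollected, relative))
```

Here the live camera is one metre ahead of the recollected pose, which faces along the positive y axis, so the composed pose moves along y while keeping the heading.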
An important logistics application of robotics involves manipulators that pick-and-place objects placed in warehouse shelves. A critical aspect of this task corresponds to detecting the pose of a known object in the shelf using visual data. Solving this problem can be assisted by the use of an RGB-D sensor, which also provides depth information beyond visual data. Nevertheless, it remains a challenging problem since multiple issues need to be addressed, such as low illumination inside shelves, clutter, texture-less and reflective objects, as well as the limitations of depth sensors. This paper provides a new rich data set for advancing the state-of-the-art in RGBD-based 3D object pose estimation, which is focused on the challenges that arise when solving warehouse pick-and-place tasks. The publicly available data set includes thousands of images and corresponding ground truth data for the objects used during the first Amazon Picking Challenge at different poses and clutter conditions. Each image is accompanied by ground truth information to assist in the evaluation of algorithms for object detection. To show the utility of the data set, a recent algorithm for RGBD-based pose estimation is evaluated in this paper. Based on the measured performance of the algorithm on the data set, various modifications and improvements are applied to increase the accuracy of detection. These steps can be easily applied to a variety of different methodologies for object pose detection and improve performance in the domain of warehouse pick-and-place.
Correctly identifying crosswalks is an essential task for driving and for mobility autonomy. Many crosswalk classification, detection, and localization systems have been proposed in the literature over the years. These systems use different perspectives to tackle the crosswalk classification problem: satellite imagery, cockpit view (from the top of a car or behind the windshield), and pedestrian perspective. Most of the works in the literature are designed and evaluated using small and local datasets, i.e., datasets that present low diversity. Scaling to large datasets poses a challenge for the annotation procedure. Moreover, there is still a need for cross-database experiments in the literature because it is usually hard to collect data in the same place and conditions as the final application. In this paper, we present a crosswalk classification system based on deep learning. For that, crowdsourcing platforms, such as OpenStreetMap and Google Street View, are exploited to enable automatic training via automatic acquisition and annotation of a large-scale database. Additionally, this work proposes a comparison study of models trained using fully-automatic data acquisition and annotation against models trained on partially manually annotated data. Cross-database experiments were also included to show that the proposed methods can be used in real-world applications. Our results show that the model trained on the fully-automatic database achieved high overall accuracy (94.12%), and that a statistically significant improvement (to 96.30%) can be achieved by manually annotating a specific part of the database. Finally, the results of the cross-database experiments show that both models are robust to the many variations of images and scenarios, presenting consistent behavior.
High-resolution satellite imagery has been increasingly used in remote sensing classification problems. One of the main factors is the availability of this kind of data. Even so, very little effort has been devoted to the zebra crossing classification problem. In this letter, crowdsourcing systems are exploited in order to enable the automatic acquisition and annotation of a large-scale satellite imagery database for crosswalk-related tasks. Then, this dataset is used to train deep-learning-based models in order to accurately classify satellite images according to whether or not they contain zebra crossings. A novel dataset with more than 240,000 images from 3 continents, 9 countries, and more than 20 cities was used in the experiments. Experimental results showed that freely available crowdsourcing data can be used to train robust models that accurately (97.11%) perform crosswalk classification on a global scale.
Community detection is key to understanding the structure of complex networks. However, the lack of appropriate evaluation strategies for this specific task may produce biased and incorrect results that might invalidate further analyses or applications based on such networks. In this context, the main contribution of this paper is an approach that supports a robust quality evaluation when detecting communities in real-world networks. In our approach, we use multiple strategies that capture distinct aspects of the communities. The conclusion on the quality of these communities is based on the consensus among the strategies adopted for the structural evaluation, as well as on the comparison with communities detected by different methods and with their existing ground truths. In this way, our approach allows one to overcome biases in network data, detection algorithms, and evaluation metrics, thus providing more consistent conclusions about the quality of the detected communities. Experiments conducted with several real and synthetic networks provided results that show the effectiveness of our approach.
In the past few years, Convolutional Neural Networks (CNNs) have been achieving state-of-the-art performance on a variety of problems. Many companies invest resources and money to build these models and provide them as an API; it is therefore in their best interest to protect them, i.e., to prevent others from copying them. Recent studies revealed that state-of-the-art CNNs are vulnerable to adversarial example attacks, and this weakness indicates that CNNs do not need to operate in the problem domain (PD). Therefore, we hypothesize that they also do not need to be trained with examples of the PD in order to operate in it. Given these facts, in this paper, we investigate whether a target black-box CNN can be copied by persuading it to confess its knowledge through random non-labeled data. The copy process has two steps: i) the target network is queried with random data and its predictions are used to create a fake dataset with the knowledge of the network; and ii) a copycat network is trained with the fake dataset and should be able to achieve performance similar to that of the target network. This hypothesis was evaluated locally on three problems (facial expression, object, and crosswalk classification) and against a cloud-based API. In the copy attacks, images from both the non-problem domain and the PD were used. All copycat networks achieved at least 93.7% of the performance of the original models with non-problem domain data, and at least 98.6% using additional data from the PD. Additionally, the copycat CNN successfully copied at least 97.3% of the performance of the Microsoft Azure Emotion API. Our results show that it is possible to create a copycat CNN by simply querying a target network as a black box with random non-labeled data.
Deep learning techniques have enabled the emergence of state-of-the-art models to address object detection tasks. However, these techniques are data-driven, so their accuracy depends on a training dataset that must resemble the images in the target task. The acquisition of a dataset involves annotating images, an arduous and expensive process that generally requires time and manual effort. Thus, a challenging scenario arises when the target domain of application has no annotated dataset available, forcing tasks in such situations to rely on a training dataset from a different domain. Object detection for autonomous vehicles shares this issue: it is a vital task, and the large number of driving scenarios yields several application domains that require annotated data for training. In this work, a method for training a car detection system with annotated data from a source domain (day images) without requiring the image annotations of the target domain (night images) is presented. For that, a model based on Generative Adversarial Networks (GANs) is explored to enable the generation of an artificial dataset with its respective annotations. The artificial dataset (fake dataset) is created by translating images from the day-time domain to the night-time domain. The fake dataset, which comprises annotated images of only the target domain (night images), is then used to train the car detector model. Experimental results showed that the proposed method achieved significant and consistent improvements, including an increase of more than 10% in detection performance compared to training with only the available annotated data (i.e., day images).
Deep learning has been successfully applied to several problems related to autonomous driving. Often, these solutions rely on large networks that require databases of real image samples of the problem (i.e., real world) for proper training. The acquisition of such real-world data sets is not always possible in the autonomous driving context, and sometimes their annotation is not feasible (e.g., it takes too long or is too expensive). Moreover, in many tasks, there is an intrinsic data imbalance that most learning-based methods struggle to cope with. It turns out that traffic sign detection is a problem in which all three of these issues appear together. In this work, we propose a novel database generation method that requires only (i) arbitrary natural images, i.e., no real image from the domain of interest, and (ii) templates of the traffic signs, i.e., templates synthetically created to illustrate the appearance of each traffic sign category. The effortlessly generated training database is shown to be effective for training a deep detector (such as Faster R-CNN) on German traffic signs, achieving 95.66% mAP on average. In addition, the proposed method is able to detect traffic signs with an average precision, recall, and F1-score of about 94%, 91%, and 93%, respectively. Surprisingly, the experiments show that detectors can be trained with simple data generation methods and without problem domain data for the background, which runs counter to the common understanding of deep learning.
In this work, we present a novel strategy for correcting imperfections in occupancy grid maps called map decay. The objective of map decay is to correct invalid occupancy probabilities of map cells that are unobservable by sensors. The strategy was inspired by an analogy between the memory architecture believed to exist in the human brain and the maps maintained by an autonomous vehicle. It consists of merging sensory information obtained during runtime (online) with a priori data from a high-precision map constructed offline. In map decay, cells observed by sensors are updated using traditional occupancy grid mapping techniques, and unobserved cells are adjusted so that their occupancy probabilities tend to the values found in the offline map. This strategy is grounded in the idea that the most precise information available about an unobservable cell is the value found in the high-precision offline map. Map decay was successfully tested and is still in use in the IARA autonomous vehicle from Universidade Federal do Espírito Santo.
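A minimal sketch of the update rule described above may help make it concrete. It is not the IARA implementation: the log-odds update for observed cells is the textbook binary Bayes filter, while the linear pull toward the offline value, the `decay_rate` parameter, and the function names are assumptions chosen for illustration.

```python
import math


def _logit(p, eps=1e-6):
    # Clamp to avoid log(0); eps is an illustrative choice.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def map_decay_step(online, offline, observed, measurements, decay_rate=0.1):
    """Update a flat occupancy grid once.

    online, offline: occupancy probabilities per cell.
    observed: booleans, True if a sensor saw the cell in this step.
    measurements: inverse sensor model probability per observed cell
                  (ignored for unobserved cells).
    decay_rate: hypothetical rate at which unobserved cells move
                toward the offline map value.
    """
    updated = []
    for p, p_off, seen, z in zip(online, offline, observed, measurements):
        if seen:
            # Standard log-odds occupancy update with a 0.5 prior.
            log_odds = _logit(p) + _logit(z)
            updated.append(1.0 / (1.0 + math.exp(-log_odds)))
        else:
            # Map decay: drift toward the high-precision offline value.
            updated.append(p + decay_rate * (p_off - p))
    return updated


grid = map_decay_step(
    online=[0.5, 0.9],
    offline=[0.5, 0.1],
    observed=[True, False],
    measurements=[0.7, None],
)
print(grid)
```

With repeated steps and no new observations, the second cell keeps moving toward the offline value of 0.1, which is the behaviour the abstract describes for unobservable cells.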
We propose a bio-inspired foveated technique to detect cars in a long-range camera view using a deep convolutional neural network (DCNN) for the IARA self-driving car. The DCNN receives as input (i) an image, which is captured by a camera installed on IARA's roof; and (ii) crops of the image, which are centered on the waypoints computed by IARA's path planner and whose sizes increase with the distance from IARA. We employ an overlap filter to discard detections of the same car in different crops of the same image based on the percentage of overlap of the detections' bounding boxes. We evaluated the performance of the proposed augmented-range vehicle detection system (ARVDS) using the hardware and software infrastructure available in the IARA self-driving car. Using IARA, we captured thousands of images of real traffic situations containing cars at long range. Experimental results show that ARVDS increases the Average Precision (AP) of long-range car detection from 29.51% (using a single whole image) to 63.15%.
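The overlap filter mentioned above can be sketched as a greedy duplicate-removal pass. The abstract only says that the filter uses the percentage of overlap between bounding boxes; measuring overlap as intersection over union (IoU), keeping the higher-scoring detection, and the 0.5 threshold are assumptions made for this illustration, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlap_filter(detections, threshold=0.5):
    """Keep one detection per car.

    detections: list of (box, score) pairs already mapped to the
    coordinates of the full image. threshold is an assumed value.
    """
    kept = []
    # Visit detections from most to least confident.
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a detection only if it does not overlap a kept one too much.
        if all(iou(box, other) < threshold for other, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),    # car seen in the whole image
    ((1, 1, 11, 11), 0.8),    # same car seen again in a crop
    ((50, 50, 60, 60), 0.7),  # a different car
]
print(overlap_filter(detections))
```

Boxes detected inside a crop must first be translated back to full-image coordinates so that overlaps between crops are comparable.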
We present the Model-Predictive Motion Planner (MPMP) of the Intelligent Autonomous Robotic Automobile (IARA). IARA is a fully autonomous car that uses a path planner to compute a path from its current position to the desired destination. Using this path, the current position, a goal in the path, and a map, IARA's MPMP is able to compute smooth trajectories from its current position to the goal in less than 50 ms. MPMP computes the poses of these trajectories so that they follow the path closely and, at the same time, keep a safe distance from any obstacles. Our experiments have shown that MPMP is able to compute trajectories that precisely follow a path produced by a human driver (distance of 0.15 m on average) while smoothly driving IARA at speeds of up to 32.4 km/h (9 m/s).
We propose the use of deep neural networks (DNN) for solving the problem of inferring the position and relevant properties of lanes of urban roads with poor or absent horizontal signalization, in order to allow the operation of autonomous cars in such situations. We take a segmentation approach to the problem and use the Efficient Neural Network (ENet) DNN for segmenting LiDAR remission grid maps into road maps. We represent road maps using what we call road grid maps. Road grid maps are square matrices, and each element of these matrices represents a small square region of real-world space. The value of each element is a code associated with the semantics of the road map. Our road grid maps contain all information about the roads' lanes required for building the Road Definition Data Files (RDDFs) that are necessary for the operation of our autonomous car, IARA (Intelligent Autonomous Robotic Automobile). We have built a dataset of tens of kilometers of manually marked road lanes and used part of it to train ENet to segment road grid maps from remission grid maps. After being trained, ENet achieved an average segmentation accuracy of 83.7%. We have tested the use of inferred road grid maps in the real world using IARA on a 3.7 km stretch of urban roads, and it has shown performance equivalent to that of IARA's previous subsystem, which uses a manually generated RDDF.
Autonomous terrestrial vehicles must be capable of perceiving traffic lights and recognizing their current states to share the streets with human drivers. Human drivers can usually identify the relevant traffic lights with ease, whereas selecting the relevant ones remains an issue for autonomous cars. To deal with this issue, a common solution for autonomous cars is to integrate recognition with prior maps. However, an additional solution is required for the detection and recognition of the traffic light. Deep learning techniques have shown great performance and power of generalization, including in traffic-related problems. Motivated by the advances in deep learning, some recent works leveraged state-of-the-art deep detectors to locate (and further recognize) traffic lights in 2D camera images. However, none of them combine the power of deep learning-based detectors with prior maps to recognize the state of the relevant traffic lights. Based on that, this work proposes to integrate the power of deep learning-based detection with the prior maps used by our car platform IARA (acronym for Intelligent Autonomous Robotic Automobile) to recognize the relevant traffic lights of predefined routes. The process is divided into two phases: an offline phase for map construction and traffic light annotation; and an online phase for traffic light recognition and identification of the relevant ones. The proposed system was evaluated on five test cases (routes) in the city of Vitória, each case being composed of a video sequence and a prior map with the relevant traffic lights for the route. Results showed that the proposed technique is able to correctly identify the relevant traffic light along the trajectory.
Global optimization problems whose objective function is expensive to evaluate can be solved effectively by recursively fitting a surrogate function to function samples and minimizing an acquisition function to generate new samples. The acquisition step trades off between seeking a new optimization vector where the surrogate is minimum (exploitation of the surrogate) and looking for regions of the feasible space that have not yet been visited and that may potentially contain better values of the objective function (exploration of the feasible space). This paper proposes a new global optimization algorithm that uses a combination of inverse distance weighting (IDW) and radial basis functions (RBF) to construct the acquisition function. Rather arbitrary constraints that are simple to evaluate can be easily taken into account by the approach. Compared to Bayesian optimization, the proposed algorithm is computationally lighter and, as we show on a set of benchmark global optimization and hyperparameter tuning problems, it has very similar (and sometimes superior) performance. MATLAB and Python implementations of the proposed approach are available at http://cse.lab.imtlucca.it/~bemporad/idwgopt
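As background for the inverse distance weighting ingredient named above, the following sketch shows a classic IDW (Shepard) interpolant fitted to a few samples. It is only one building block and not the acquisition function of the paper: the weight 1/d^2 is the textbook choice and may differ from the weighting used there, and the function name `idw_predict` is hypothetical.

```python
def idw_predict(x, samples, values):
    """Classic inverse distance weighting interpolation.

    x: query point (tuple of floats).
    samples: list of already evaluated points.
    values: objective values at those points.
    Weights are 1 / squared Euclidean distance (an assumed, textbook choice).
    """
    numerator = 0.0
    denominator = 0.0
    for xi, fi in zip(samples, values):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        if d2 == 0.0:
            # The interpolant reproduces the data at sampled points.
            return fi
        weight = 1.0 / d2
        numerator += weight * fi
        denominator += weight
    return numerator / denominator


samples = [(0.0,), (2.0,)]
values = [1.0, 3.0]
print(idw_predict((1.0,), samples, values))  # midpoint, equal weights
print(idw_predict((0.0,), samples, values))  # exactly at a sample
```

Because nearby samples dominate the weighted average, the distance to the nearest sample also gives a natural signal for exploration: points far from every sample are the ones the surrogate knows least about.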
Natural languages are complexly structured entities. They exhibit characterising regularities that can be exploited to link them to one another. In this work, I compare two morphological aspects of languages: Written Patterns and Sentence Structure. I show how languages spontaneously group by similarity in both analyses and derive an average language distance. Finally, exploiting Sentence Structure, I developed an Artificial Neural Network capable of distinguishing languages, suggesting that not only word roots but also grammatical sentence structure is a characterising trait which alone suffices to identify them.
Attacks on networks are becoming more complex and sophisticated every day. Beyond the so-called script-kiddies and hacking newbies, there is a myriad of professional attackers seeking to make serious profits by infiltrating corporate networks. Hostile governments, big corporations, and mafias are constantly increasing their resources and skills in cybercrime in order to spy, steal, or cause damage more effectively. Traditional approaches to Network Security seem to be hitting their limits, and the need for a smarter approach to threat detection is being recognized. This paper provides an introduction to the need for evolution of Cyber Security techniques and how Artificial Intelligence could be applied to help solve some of the problems. It also provides a high-level overview of some state-of-the-art AI Network Security techniques, and finishes by analysing the foreseeable future of the application of AI to Network Security.
Verbal metonymy has received relatively scarce attention in the field of computational linguistics despite the fact that a model to accurately paraphrase metonymy has applications both in academia and the technology sector. The method described in this paper makes use of data from the British National Corpus in order to create word vectors, find instances of verbal metonymy and generate potential paraphrases. Two different ways of creating word vectors are evaluated in this study: Continuous bag of words and Skip-grams. Skip-grams are found to outperform the Continuous bag of words approach. Furthermore, the Skip-gram model is found to operate with better-than-chance accuracy and there is a strong positive relationship (phi coefficient = 0.61) between the model's classification and human judgement of the ranked paraphrases. This study lends credence to the viability of modelling verbal metonymy through computational methods based on distributional semantics.
We introduce a method based on the Public Goods Game for solving optimization tasks. In particular, we focus on the Traveling Salesman Problem, i.e., an NP-hard problem whose search space grows exponentially with the number of cities. The proposed method considers a population whose agents are provided with a random solution to the given problem. Agents then interact by playing the Public Goods Game using the fitness of their solution as the currency of the game. Notably, agents with better solutions provide higher contributions, while those with lower ones tend to imitate the solution of richer agents to increase their fitness. Numerical simulations show that the proposed method is able to compute exact solutions, as well as suboptimal ones, in the considered search spaces. As a result, beyond proposing a new heuristic for combinatorial optimization problems, our work aims to highlight the potential of evolutionary game theory beyond its current horizons.
In this work we introduce an evolutionary strategy to solve combinatorial optimization tasks, i.e., problems characterized by a discrete search space. In particular, we focus on the Traveling Salesman Problem (TSP), i.e., a famous NP-hard problem whose search space grows exponentially with the number of cities. The solutions of the TSP can be encoded as arrays of cities and evaluated by a fitness computed according to a cost function (e.g., the length of a path). Our method is based on the evolution of an agent population by means of an imitative mechanism that we call 'partial imitation'. In particular, agents receive a random solution and then, interacting among themselves, may imitate the solutions of agents with a higher fitness. Since the imitation mechanism is only partial, agents copy only one entry (randomly chosen) of another array (i.e., solution). In doing so, the population converges towards a shared solution, behaving like a spin system undergoing a cooling process, i.e., driven towards an ordered phase. We highlight that the adopted 'partial imitation' mechanism allows the population to generate solutions over time, before reaching the final equilibrium. Results of numerical simulations show that our method is able to find, in a finite time, both optimal and suboptimal solutions, depending on the size of the considered search space.
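The partial imitation mechanism described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how a tour stays a valid permutation after one entry is copied, so the sketch swaps two cities inside the imitator's own tour; the pairing of agents, the population size, the number of steps, and all function names are hypothetical.

```python
import math
import random


def tour_length(tour, coords):
    """Length of the closed path that visits the cities in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))


def partial_imitation(own, model, rng):
    """Copy one randomly chosen entry of `model` into a copy of `own`.

    Assumption: validity is kept by swapping, inside the copy, the city
    currently at that position with the city being imitated.
    """
    k = rng.randrange(len(own))
    new = own[:]
    j = new.index(model[k])
    new[k], new[j] = new[j], new[k]
    return new


def evolve(coords, n_agents=20, steps=2000, seed=0):
    """Let randomly paired agents imitate partners with shorter tours."""
    rng = random.Random(seed)
    cities = list(range(len(coords)))
    population = [rng.sample(cities, len(cities)) for _ in range(n_agents)]
    for _ in range(steps):
        a, b = rng.sample(range(n_agents), 2)
        # Shorter tour means higher fitness; only then does `a` imitate `b`.
        if tour_length(population[b], coords) < tour_length(population[a], coords):
            population[a] = partial_imitation(population[a], population[b], rng)
    return population


coords = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0.5)]
population = evolve(coords)
best = min(population, key=lambda t: tour_length(t, coords))
print(best, tour_length(best, coords))
```

In this sketch the agent holding the shortest tour never imitates anyone, so the best tour length in the population cannot get worse over time, while the other agents gradually align with it.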