Models, code, and papers for "Lucas C":

Multi-set Canonical Correlation Analysis simply explained

Feb 11, 2018
Lucas C Parra

There are a multitude of methods to perform multi-set correlated component analysis (MCCA), including some that require iterative solutions. The methods differ on the criterion they optimize and the constraints placed on the solutions. This note focuses perhaps on the simplest version, which can be solved in a single step as the eigenvectors of matrix ${\bf D}^{-1} {\bf R}$. Here ${\bf R}$ is the covariance matrix of the concatenated data, and ${\bf D}$ is its block-diagonal. This note shows that this solution maximizes inter-set correlation (ISC) without further constraints. It also relates the solution to a two step procedure, which first whitens each dataset using PCA, and then performs an additional PCA on the concatenated and whitened data. Both these solutions are known, although a clear derivation and simple implementation are hard to find. This short note aims to remedy this.

Exploiting video sequences for unsupervised disentangling in generative adversarial networks

Oct 16, 2019
Facundo Tuesca, Lucas C. Uzal

In this work we present an adversarial training algorithm that exploits correlations in video to learn --without supervision-- an image generator model with a disentangled latent space. The proposed methodology requires only a few modifications to the standard algorithm of Generative Adversarial Networks (GAN) and involves training with sets of frames taken from short videos. We train our model over two datasets of face-centered videos which present different people speaking or moving the head: VidTIMIT and YouTube Faces datasets. We found that our proposal allows us to split the generator latent space into two subspaces. One of them controls content attributes, those that do not change along short video sequences. For the considered datasets, this is the identity of the generated face. The other subspace controls motion attributes, those attributes that are observed to change along short videos. We observed that these motion attributes are face expressions, head orientation, lips and eyes movement. The presented experiments provide quantitative and qualitative evidence supporting that the proposed methodology induces a disentangling of this two kinds of attributes in the latent space.

* This preprint is the result of the work done for the undergraduate dissertation of F. Tuesca supervised by L.C. Uzal and presented in June 2018
Tissue segmentation with deep 3D networks and spatial priors

May 24, 2019
Lukas Hirsch, Yu Huang, Lucas C Parra

Conventional automated segmentation of the human head distinguishes different tissues based on image intensities in an MRI volume and prior tissue probability maps (TPM). This works well for normal head anatomies, but fails in the presence of unexpected lesions. Deep convolutional neural networks leverage instead volumetric spatial patterns and can be trained to segment lesions, but have thus far not integrated prior probabilities. Here we add to a three-dimensional convolutional network spatial priors with a TPM, morphological priors with conditional random fields, and context with a wider field-of-view at lower resolution. The new architecture, which we call MultiPrior, was designed to be a fully-trainable, three-dimensional convolutional network. Thus, the resulting architecture represents a neural network with learnable spatial memories. When trained on a set of stroke patients and healthy subjects, MultiPrior outperforms the state-of-the-art segmentation tools such as DeepMedic and SPM segmentation. The approach is further demonstrated on patients with disorders of consciousness, where we find that cognitive state correlates positively with gray-matter volumes and negatively with the extent of ventricles. We make the code and trained networks freely available to support future clinical research projects.

Exploiting GAN Internal Capacity for High-Quality Reconstruction of Natural Images

Oct 26, 2019
Marcos Pividori, Guillermo L. Grinblat, Lucas C. Uzal

Generative Adversarial Networks (GAN) have demonstrated impressive results in modeling the distribution of natural images, learning latent representations that capture semantic variations in an unsupervised basis. Beyond the generation of novel samples, it is of special interest to exploit the ability of the GAN generator to model the natural image manifold and hence generate credible changes when manipulating images. However, this line of work is conditioned by the quality of the reconstruction. Until now, only inversion to the latent space has been considered, we propose to exploit the representation in intermediate layers of the generator, and we show that this leads to increased capacity. In particular, we observe that the representation after the first dense layer, present in all state-of-the-art GAN models, is expressive enough to represent natural images with high visual fidelity. It is possible to interpolate around these images obtaining a sequence of new plausible synthetic images that cannot be generated from the latent space. Finally, as an example of potential applications that arise from this inversion mechanism, we show preliminary results in exploiting the learned representation in the attention map of the generator to obtain an unsupervised segmentation of natural images.

* This preprint is the result of the work done for the undergraduate dissertation of M. Pividori supervised by L.C. Uzal and G.L. Grinblat, and presented in July 2019
Correlated Components Analysis - Extracting Reliable Dimensions in Multivariate Data

Sep 10, 2018
Lucas C. Parra, Stefan Haufe, Jacek P. Dmochowski

How does one find dimensions in multivariate data that are reliably expressed across repetitions? For example, in a brain imaging study one may want to identify combinations of neural signals that are reliably expressed across multiple trials or subjects. For a behavioral assessment with multiple ratings, one may want to identify an aggregate score that is reliably reproduced across raters. Correlated Components Analysis (CorrCA) addresses this problem by identifying components that are maximally correlated between repetitions (e.g. trials, subjects, raters). Here we formalize this as the maximization of the ratio of between-repetition to within-repetition covariance. We show that this criterion maximizes repeat-reliability, defined as mean over variance across repeats, and that it leads to CorrCA or to multi-set Canonical Correlation Analysis, depending on the constraints. Surprisingly, we also find that CorrCA is equivalent to Linear Discriminant Analysis for equal-mean signals, which provides an unexpected link between classic concepts of multivariate analysis. We provided an exact parametric test for statistical significance based on the F-statistic for normally distributed independent samples, and present and validate shuffle statistics for the case of dependent samples. Regularization and extension to non-linear mappings using kernels are also presented. The algorithms are demonstrated on a series of data analysis applications, and we provide all code and data required to reproduce the results.

Dynamic texture analysis with diffusion in networks

Jun 27, 2018
Lucas C. Ribas, Wesley N. Goncalves, Odemir M. Bruno

Dynamic texture is a field of research that has gained considerable interest from computer vision community due to the explosive growth of multimedia databases. In addition, dynamic texture is present in a wide range of videos, which makes it very important in expert systems based on videos such as medical systems, traffic monitoring systems, forest fire detection system, among others. In this paper, a new method for dynamic texture characterization based on diffusion in directed networks is proposed. The dynamic texture is modeled as a directed network. The method consists in the analysis of the dynamic of this network after a series of graph cut transformations based on the edge weights. For each network transformation, the activity for each vertex is estimated. The activity is the relative frequency that one vertex is visited by random walks in balance. Then, texture descriptor is constructed by concatenating the activity histograms. The main contributions of this paper are the use of directed network modeling and diffusion in network to dynamic texture characterization. These tend to provide better performance in dynamic textures classification. Experiments with rotation and interference of the motion pattern were conducted in order to demonstrate the robustness of the method. The proposed approach is compared to other dynamic texture methods on two very well know dynamic texture database and on traffic condition classification, and outperform in most of the cases.

* 30 pages, 20 figures

May 17, 2018
Guillermo L. Grinblat, Lucas C. Uzal, Pablo M. Granitto

Generative Adversarial Networks (GANs) produce systematically better quality samples when class label information is provided., i.e. in the conditional GAN setup. This is still observed for the recently proposed Wasserstein GAN formulation which stabilized adversarial training and allows considering high capacity network architectures such as ResNet. In this work we show how to boost conditional GAN by augmenting available class labels. The new classes come from clustering in the representation space learned by the same GAN model. The proposed strategy is also feasible when no class information is available, i.e. in the unsupervised setup. Our generated samples reach state-of-the-art Inception scores for CIFAR-10 and STL-10 datasets in both supervised and unsupervised setup.

* Under consideration at Pattern Recognition Letters
Spatio-spectral networks for color-texture analysis

Sep 13, 2019
Leonardo F. S. Scabini, Lucas C. Ribas, Odemir M. Bruno

Texture is one of the most-studied visual attribute for image characterization since the 1960s. However, most hand-crafted descriptors are monochromatic, focusing on the gray scale images and discarding the color information. In this context, this work focus on a new method for color texture analysis considering all color channels in a more intrinsic approach. Our proposal consists of modeling color images as directed complex networks that we named Spatio-Spectral Network (SSN). Its topology includes within-channel edges that cover spatial patterns throughout individual image color channels, while between-channel edges tackle spectral properties of channel pairs in an opponent fashion. Image descriptors are obtained through a concise topological characterization of the modeled network in a multiscale approach with radially symmetric neighborhoods. Experiments with four datasets cover several aspects of color-texture analysis, and results demonstrate that SSN overcomes all the compared literature methods, including known deep convolutional networks, and also has the most stable performance between datasets, achieving $98.5(\pm1.1)$ of average accuracy against $97.1(\pm1.3)$ of MCND and $96.8(\pm3.2)$ of AlexNet. Additionally, an experiment verifies the performance of the methods under different color spaces, where results show that SSN also has higher performance and robustness.

MOEA/D with Uniformly Randomly Adaptive Weights

When working with decomposition-based algorithms, an appropriate set of weights might improve quality of the final solution. A set of uniformly distributed weights usually leads to well-distributed solutions on a Pareto front. However, there are two main difficulties with this approach. Firstly, it may fail depending on the problem geometry. Secondly, the population size becomes not flexible as the number of objectives increases. In this paper, we propose the MOEA/D with Uniformly Randomly Adaptive Weights (MOEA/DURAW) which uses the Uniformly Randomly method as an approach to subproblems generation, allowing a flexible population size even when working with many objective problems. During the evolutionary process, MOEA/D-URAW adds and removes subproblems as a function of the sparsity level of the population. Moreover, instead of requiring assumptions about the Pareto front shape, our method adapts its weights to the shape of the problem during the evolutionary process. Experimental results using WFG41-48 problem classes, with different Pareto front shapes, shows that the present method presents better or equal results in 77.5% of the problems evaluated from 2 to 6 objectives when compared with state-of-the-art methods in the literature.

* Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), 2018
Incremental Bounded Model Checking of Artificial Neural Networks in CUDA

Artificial Neural networks (ANNs) are powerful computing systems employed for various applications due to their versatility to generalize and to respond to unexpected inputs/patterns. However, implementations of ANNs for safety-critical systems might lead to failures, which are hardly predicted in the design phase since ANNs are highly parallel and their parameters are hardly interpretable. Here we develop and evaluate a novel symbolic software verification framework based on incremental bounded model checking (BMC) to check for adversarial cases and coverage methods in multi-layer perceptron (MLP). In particular, we further develop the efficient SMT-based Context-Bounded Model Checker for Graphical Processing Units (ESBMC-GPU) in order to ensure the reliability of certain safety properties in which safety-critical systems can fail and make incorrect decisions, thereby leading to unwanted material damage or even put lives in danger. This paper marks the first symbolic verification framework to reason over ANNs implemented in CUDA. Our experimental results show that our approach implemented in ESBMC-GPU can successfully verify safety properties and covering methods in ANNs and correctly generate 28 adversarial cases in MLPs.

* 8 pages
Egocentric Hand Track and Object-based Human Action Recognition

Egocentric vision is an emerging field of computer vision that is characterized by the acquisition of images and video from the first person perspective. In this paper we address the challenge of egocentric human action recognition by utilizing the presence and position of detected regions of interest in the scene explicitly, without further use of visual features. Initially, we recognize that human hands are essential in the execution of actions and focus on obtaining their movements as the principal cues that define actions. We employ object detection and region tracking techniques to locate hands and capture their movements. Prior knowledge about egocentric views facilitates hand identification between left and right. With regard to detection and tracking, we contribute a pipeline that successfully operates on unseen egocentric videos to find the camera wearer's hands and associate them through time. Moreover, we emphasize on the value of scene information for action recognition. We acknowledge that the presence of objects is significant for the execution of actions by humans and in general for the description of a scene. To acquire this information, we utilize object detection for specific classes that are relevant to the actions we want to recognize. Our experiments are targeted on videos of kitchen activities from the Epic-Kitchens dataset. We model action recognition as a sequence learning problem of the detected spatial positions in the frames. Our results show that explicit hand and object detections with no other visual information can be relied upon to classify hand-related human actions. Testing against methods fully dependent on visual features, signals that for actions where hand motions are conceptually important, a region-of-interest-based description of a video contains equally expressive information with comparable classification performance.

* Accepted for publication at UIC 2019:Track 3, 8 pages, 5 figures, Index terms: egocentric action recognition, hand detection, hand tracking, hand identification, sequence classification, code available at: https://github.com/georkap/hand_track_classification
Fusion of complex networks and randomized neural networks for texture analysis

This paper presents a high discriminative texture analysis method based on the fusion of complex networks and randomized neural networks. In this approach, the input image is modeled as a complex networks and its topological properties as well as the image pixels are used to train randomized neural networks in order to create a signature that represents the deep characteristics of the texture. The results obtained surpassed the accuracies of many methods available in the literature. This performance demonstrates that our proposed approach opens a promising source of research, which consists of exploring the synergy of neural networks and complex networks in the texture analysis field.

* 13 pages, 4 figures
Counterexample Guided Inductive Optimization Applied to Mobile Robots Path Planning (Extended Version)

We describe and evaluate a novel optimization-based off-line path planning algorithm for mobile robots based on the Counterexample-Guided Inductive Optimization (CEGIO) technique. CEGIO iteratively employs counterexamples generated from Boolean Satisfiability (SAT) and Satisfiability Modulo Theories (SMT) solvers, in order to guide the optimization process and to ensure global optimization. This paper marks the first application of CEGIO for planning mobile robot path. In particular, CEGIO has been successfully applied to obtain optimal two-dimensional paths for autonomous mobile robots using off-the-shelf SAT and SMT solvers.

* 7 pages, 14rd Latin American Robotics Symposium (LARS'2017)
Counterexample Guided Inductive Optimization

This paper describes three variants of a counterexample guided inductive optimization (CEGIO) approach based on Satisfiability Modulo Theories (SMT) solvers. In particular, CEGIO relies on iterative executions to constrain a verification procedure, in order to perform inductive generalization, based on counterexamples extracted from SMT solvers. CEGIO is able to successfully optimize a wide range of functions, including non-linear and non-convex optimization problems based on SMT solvers, in which data provided by counterexamples are employed to guide the verification engine, thus reducing the optimization domain. The present algorithms are evaluated using a large set of benchmarks typically employed for evaluating optimization techniques. Experimental results show the efficiency and effectiveness of the proposed algorithms, which find the optimal solution in all evaluated benchmarks, while traditional techniques are usually trapped by local minima.

Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks

The Dice score is widely used for binary segmentation due to its robustness to class imbalance. Soft generalisations of the Dice score allow it to be used as a loss function for training convolutional neural networks (CNN). Although CNNs trained using mean-class Dice score achieve state-of-the-art results on multi-class segmentation, this loss function does neither take advantage of inter-class relationships nor multi-scale information. We argue that an improved loss function should balance misclassifications to favour predictions that are semantically meaningful. This paper investigates these issues in the context of multi-class brain tumour segmentation. Our contribution is threefold. 1) We propose a semantically-informed generalisation of the Dice score for multi-class segmentation based on the Wasserstein distance on the probabilistic label space. 2) We propose a holistic CNN that embeds spatial information at multiple scales with deep supervision. 3) We show that the joint use of holistic CNNs and generalised Wasserstein Dice scores achieves segmentations that are more semantically meaningful for brain tumour segmentation.

* Accepted as an oral presentation at the MICCAI 2017 Brain Lesion (BrainLes) Workshop
Scalable multimodal convolutional networks for brain tumour segmentation

Brain tumour segmentation plays a key role in computer-assisted surgery. Deep neural networks have increased the accuracy of automatic segmentation significantly, however these models tend to generalise poorly to different imaging modalities than those for which they have been designed, thereby limiting their applications. For example, a network architecture initially designed for brain parcellation of monomodal T1 MRI can not be easily translated into an efficient tumour segmentation network that jointly utilises T1, T1c, Flair and T2 MRI. To tackle this, we propose a novel scalable multimodal deep learning architecture using new nested structures that explicitly leverage deep features within or across modalities. This aims at making the early layers of the architecture structured and sparse so that the final architecture becomes scalable to the number of modalities. We evaluate the scalable architecture for brain tumour segmentation and give evidence of its regularisation effect compared to the conventional concatenation approach.

* Paper accepted at MICCAI 2017
Traffic Light Recognition Using Deep Learning and Prior Maps for Autonomous Cars

Autonomous terrestrial vehicles must be capable of perceiving traffic lights and recognizing their current states to share the streets with human drivers. Most of the time, human drivers can easily identify the relevant traffic lights. To deal with this issue, a common solution for autonomous cars is to integrate recognition with prior maps. However, additional solution is required for the detection and recognition of the traffic light. Deep learning techniques have showed great performance and power of generalization including traffic related problems. Motivated by the advances in deep learning, some recent works leveraged some state-of-the-art deep detectors to locate (and further recognize) traffic lights from 2D camera images. However, none of them combine the power of the deep learning-based detectors with prior maps to recognize the state of the relevant traffic lights. Based on that, this work proposes to integrate the power of deep learning-based detection with the prior maps used by our car platform IARA (acronym for Intelligent Autonomous Robotic Automobile) to recognize the relevant traffic lights of predefined routes. The process is divided in two phases: an offline phase for map construction and traffic lights annotation; and an online phase for traffic light recognition and identification of the relevant ones. The proposed system was evaluated on five test cases (routes) in the city of Vit\'oria, each case being composed of a video sequence and a prior map with the relevant traffic lights for the route. Results showed that the proposed technique is able to correctly identify the relevant traffic light along the trajectory.

* Accepted in 2019 International Joint Conference on Neural Networks (IJCNN)
ToolNet: Holistically-Nested Real-Time Segmentation of Robotic Surgical Tools

Real-time tool segmentation from endoscopic videos is an essential part of many computer-assisted robotic surgical systems and of critical importance in robotic surgical data science. We propose two novel deep learning architectures for automatic segmentation of non-rigid surgical instruments. Both methods take advantage of automated deep-learning-based multi-scale feature extraction while trying to maintain an accurate segmentation quality at all resolutions. The two proposed methods encode the multi-scale constraint inside the network architecture. The first proposed architecture enforces it by cascaded aggregation of predictions and the second proposed network does it by means of a holistically-nested architecture where the loss at each scale is taken into account for the optimization process. As the proposed methods are for real-time semantic labeling, both present a reduced number of parameters. We propose the use of parametric rectified linear units for semantic labeling in these small architectures to increase the regularization ability of the design and maintain the segmentation accuracy without overfitting the training sets. We compare the proposed architectures against state-of-the-art fully convolutional networks. We validate our methods using existing benchmark datasets, including ex vivo cases with phantom tissue and different robotic surgical instruments present in the scene. Our results show a statistically significant improved Dice Similarity Coefficient over previous instrument segmentation methods. We analyze our design choices and discuss the key drivers for improving accuracy.

* Paper accepted at IROS 2017
NiftyNet: a deep-learning platform for medical imaging

Medical image analysis and computer-assisted intervention problems are increasingly being addressed with deep-learning-based solutions. Established deep-learning platforms are flexible but do not provide specific functionality for medical image analysis and adapting them for this application requires substantial implementation effort. Thus, there has been substantial duplication of effort and incompatible infrastructure developed across many research groups. This work presents the open-source NiftyNet platform for deep learning in medical imaging. The ambition of NiftyNet is to accelerate and simplify the development of these solutions, and to provide a common mechanism for disseminating research outputs for the community to use, adapt and build upon. NiftyNet provides a modular deep-learning pipeline for a range of medical imaging applications including segmentation, regression, image generation and representation learning applications. Components of the NiftyNet pipeline including data loading, data augmentation, network architectures, loss functions and evaluation metrics are tailored to, and take advantage of, the idiosyncracies of medical image analysis and computer-assisted intervention. NiftyNet is built on TensorFlow and supports TensorBoard visualization of 2D and 3D images and computational graphs by default. We present 3 illustrative medical image analysis applications built using NiftyNet: (1) segmentation of multiple abdominal organs from computed tomography; (2) image regression to predict computed tomography attenuation maps from brain magnetic resonance images; and (3) generation of simulated ultrasound images for specified anatomical poses. NiftyNet enables researchers to rapidly develop and distribute deep learning solutions for segmentation, regression, image generation and representation learning applications, or extend the platform to new applications.

* Wenqi Li and Eli Gibson contributed equally to this work. M. Jorge Cardoso and Tom Vercauteren contributed equally to this work. 26 pages, 6 figures; Update includes additional applications, updated author list and formatting for journal submission
The STRANDS Project: Long-Term Autonomy in Everyday Environments

Thanks to the efforts of the robotics and autonomous systems community, robots are becoming ever more capable. There is also an increasing demand from end-users for autonomous service robots that can operate in real environments for extended periods. In the STRANDS project we are tackling this demand head-on by integrating state-of-the-art artificial intelligence and robotics research into mobile service robots, and deploying these systems for long-term installations in security and care environments. Over four deployments, our robots have been operational for a combined duration of 104 days autonomously performing end-user defined tasks, covering 116km in the process. In this article we describe the approach we have used to enable long-term autonomous operation in everyday environments, and how our robots are able to use their long run times to improve their own performance.