Models, code, and papers for "Luc Van Gool":

A Riemannian Network for SPD Matrix Learning

Dec 22, 2016
Zhiwu Huang, Luc Van Gool

Symmetric Positive Definite (SPD) matrix learning methods have become popular in many image and video processing tasks, thanks to their ability to learn appropriate statistical representations while respecting the Riemannian geometry of the underlying SPD manifolds. In this paper we build a Riemannian network architecture to open up a new direction of non-linear SPD matrix learning in a deep model. In particular, we devise bilinear mapping layers to transform input SPD matrices into more desirable SPD matrices, exploit eigenvalue rectification layers to apply a non-linear activation function to the new SPD matrices, and design an eigenvalue logarithm layer to perform Riemannian computing on the resulting SPD matrices for regular output layers. To train the proposed deep network, we exploit a new backpropagation with a variant of stochastic gradient descent on Stiefel manifolds to update the structured connection weights and the involved SPD matrix data. We show through experiments that the proposed SPD matrix network can be trained simply and outperforms existing SPD matrix learning and state-of-the-art methods on three typical visual classification tasks.
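The three layer types described above can be sketched in a few lines of NumPy; this is an illustrative toy forward pass, not the paper's implementation (function names, matrix sizes and the rectification threshold are our assumptions):

```python
import numpy as np

def bimap(X, W):
    """Bilinear mapping layer: X -> W^T X W stays SPD when W has full
    column rank (W lives on a Stiefel manifold, W^T W = I)."""
    return W.T @ X @ W

def reeig(X, eps=1e-4):
    """Eigenvalue rectification (non-linearity): clamp eigenvalues at eps."""
    vals, vecs = np.linalg.eigh(X)
    return vecs @ np.diag(np.maximum(vals, eps)) @ vecs.T

def logeig(X):
    """Eigenvalue logarithm layer: map the SPD manifold to a flat space
    where regular output layers can operate."""
    vals, vecs = np.linalg.eigh(X)
    return vecs @ np.diag(np.log(vals)) @ vecs.T

# toy forward pass on a random 5x5 SPD input, reduced to 3x3
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
X = A @ A.T + 1e-3 * np.eye(5)                    # input SPD matrix
W, _ = np.linalg.qr(rng.standard_normal((5, 3)))  # orthonormal weights
Y = logeig(reeig(bimap(X, W)))                    # feeds regular layers
```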

* Revised arXiv version, AAAI-17 camera-ready 

Does V-NIR based Image Enhancement Come with Better Features?

Aug 24, 2016
Vivek Sharma, Luc Van Gool

Image enhancement using the visible (V) and near-infrared (NIR) bands usually enhances useful image details. The enhanced images are evaluated by observers' perception, rather than by quantitative feature evaluation. Can we therefore say that images enhanced using NIR information have better features than the features computed directly in the Red, Green, and Blue color channels? In this work, we present a new method to enhance visible images using NIR information via edge-preserving filters, and also investigate which method performs best from an image-feature standpoint. We then show that our proposed enhancement method produces more stable features than the existing state-of-the-art methods.

Image-level Classification in Hyperspectral Images using Feature Descriptors, with Application to Face Recognition

May 11, 2016
Vivek Sharma, Luc Van Gool

In this paper, we propose a novel pipeline for image-level classification of hyperspectral images. In doing so, we show that discriminative spectral information in image-level features leads to significantly improved performance in a face recognition task. We also explore the potential of traditional feature descriptors for hyperspectral images. From our evaluations, we observe that SIFT features outperform both the state-of-the-art hyperspectral face recognition methods and the other descriptors. With the increasing deployment of hyperspectral sensors in a multitude of applications, we believe that our approach can effectively exploit the spectral information in hyperspectral images, leading to more accurate classification.

Unsupervised High-level Feature Learning by Ensemble Projection for Semi-supervised Image Classification and Image Clustering

Feb 04, 2016
Dengxin Dai, Luc Van Gool

This paper investigates the problem of image classification with limited or no annotations, but abundant unlabeled data. This setting arises in many tasks, such as semi-supervised image classification, image clustering, and image retrieval. Unlike previous methods, which develop or learn sophisticated regularizers for classifiers, our method learns a new image representation by exploiting the distribution patterns of all data available for the task at hand. In particular, a rich set of visual prototypes is sampled from all available data and taken as surrogate classes to train discriminative classifiers; images are projected via the classifiers; and the projected values, i.e. similarities to the prototypes, are stacked to build the new feature vector. The training set is noisy; hence, in the spirit of ensemble learning, we create a set of diverse training sets, leading to diverse classifiers. The method is dubbed Ensemble Projection (EP). EP captures not only the characteristics of individual images, but also the relationships among images. It is conceptually simple and computationally efficient, yet effective and flexible. Experiments on eight standard datasets show that: (1) EP outperforms previous methods for semi-supervised image classification; (2) EP produces promising results for self-taught image classification, where unlabeled samples are a random collection of images rather than drawn from the same distribution as the labeled ones; and (3) EP improves over the original features for image clustering. The code of the method is available on the project page.
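The projection step can be illustrated with a deliberately simplified sketch, where cosine similarity to a sampled prototype stands in for the score of a trained discriminative classifier (the paper trains real classifiers on surrogate classes; all names, sizes and constants here are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 16))  # all available (unlabeled) image features

def ensemble_projection(X, n_prototypes=10, n_ensembles=5, rng=rng):
    """Simplified Ensemble Projection sketch: for each ensemble member,
    sample prototypes as surrogate classes and project every image onto
    them; the projected values are stacked into a new feature vector."""
    feats = []
    for _ in range(n_ensembles):
        idx = rng.choice(len(X), n_prototypes, replace=False)
        P = X[idx]
        # cosine similarity of every image to every prototype
        sim = (X @ P.T) / (np.linalg.norm(X, axis=1, keepdims=True)
                           * np.linalg.norm(P, axis=1) + 1e-12)
        feats.append(sim)
    return np.hstack(feats)  # one new feature vector per image

F = ensemble_projection(X)   # shape: (n_images, n_prototypes * n_ensembles)
```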

* 22 pages, 8 figures 

Probabilistic Regression for Visual Tracking

Mar 27, 2020
Martin Danelljan, Luc Van Gool, Radu Timofte

Visual tracking is fundamentally the problem of regressing the state of the target in each video frame. While significant progress has been achieved, trackers are still prone to failures and inaccuracies. It is therefore crucial to represent the uncertainty in the target estimation. Although current prominent paradigms rely on estimating a state-dependent confidence score, this value lacks a clear probabilistic interpretation, complicating its use. In this work, we therefore propose a probabilistic regression formulation and apply it to tracking. Our network predicts the conditional probability density of the target state given an input image. Crucially, our formulation is capable of modeling label noise stemming from inaccurate annotations and ambiguities in the task. The regression network is trained by minimizing the Kullback-Leibler divergence. When applied to tracking, our formulation not only allows a probabilistic representation of the output, but also substantially improves the performance. Our tracker sets a new state-of-the-art on six datasets, achieving 59.8% AUC on LaSOT and 75.8% Success on TrackingNet. The code and models are available at
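On a discretized state space, a KL training objective of this kind takes a simple form; the sketch below (the 1D grid, the Gaussian label model and all constants are our illustrative assumptions, not the paper's setup) computes the loss between a noisy label distribution and a predicted density:

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete densities over a grid of states."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# a Gaussian around the annotated state models label noise
grid = np.linspace(-1.0, 1.0, 101)

def gaussian(mu, sigma):
    g = np.exp(-0.5 * ((grid - mu) / sigma) ** 2)
    return g / g.sum()

p_label = gaussian(0.20, 0.05)  # noisy annotation
q_pred = gaussian(0.25, 0.08)   # network's predicted density
loss = kl_div(p_label, q_pred)  # minimized during training
```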

* CVPR 2020. Includes appendix 

Deep Unfolding Network for Image Super-Resolution

Mar 23, 2020
Kai Zhang, Luc Van Gool, Radu Timofte

Learning-based single image super-resolution (SISR) methods are continuously showing superior effectiveness and efficiency over traditional model-based methods, largely due to the end-to-end training. However, different from model-based methods that can handle the SISR problem with different scale factors, blur kernels and noise levels under a unified MAP (maximum a posteriori) framework, learning-based methods generally lack such flexibility. To address this issue, this paper proposes an end-to-end trainable unfolding network which leverages both learning-based methods and model-based methods. Specifically, by unfolding the MAP inference via a half-quadratic splitting algorithm, a fixed number of iterations consisting of alternately solving a data subproblem and a prior subproblem can be obtained. The two subproblems then can be solved with neural modules, resulting in an end-to-end trainable, iterative network. As a result, the proposed network inherits the flexibility of model-based methods to super-resolve blurry, noisy images for different scale factors via a single model, while maintaining the advantages of learning-based methods. Extensive experiments demonstrate the superiority of the proposed deep unfolding network in terms of flexibility, effectiveness and also generalizability.
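The half-quadratic splitting alternation can be illustrated on a 1D toy problem; here a closed-form data step alternates with soft-thresholding standing in for the learned prior module (the paper's data term additionally models blurring and downsampling, which we omit for brevity; all constants and names are ours):

```python
import numpy as np

def soft_threshold(x, t):
    """Stand-in for the learned prior module (a denoiser network in the paper)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def hqs_restore(y, n_iters=8, mu=1.0, lam=0.1):
    """Half-quadratic splitting for min_x 0.5||y - x||^2 + lam * prior(x):
    alternate a closed-form data subproblem and a prior subproblem for a
    fixed number of iterations, as in an unfolded network."""
    z = y.copy()
    for _ in range(n_iters):
        x = (y + mu * z) / (1.0 + mu)    # data subproblem (closed form)
        z = soft_threshold(x, lam / mu)  # prior subproblem (neural module)
    return z

y = np.array([0.05, 1.0, -0.04, 0.8])   # noisy observation
x_hat = hqs_restore(y)                   # small entries are suppressed
```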

* Accepted by CVPR 2020. Project page: 

Learning Better Lossless Compression Using Lossy Compression

Mar 23, 2020
Fabian Mentzer, Luc Van Gool, Michael Tschannen

We leverage the powerful lossy image compression algorithm BPG to build a lossless image compression system. Specifically, the original image is first decomposed into the lossy reconstruction obtained after compressing it with BPG and the corresponding residual. We then model the distribution of the residual with a convolutional neural network-based probabilistic model that is conditioned on the BPG reconstruction, and combine it with entropy coding to losslessly encode the residual. Finally, the image is stored using the concatenation of the bitstreams produced by BPG and the learned residual coder. The resulting compression system achieves state-of-the-art performance in learned lossless full-resolution image compression, outperforming previous learned approaches as well as PNG, WebP, and JPEG2000.
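The overall encode/decode structure can be sketched with coarse quantization standing in for BPG and a raw residual standing in for the entropy-coded one (both simplifications ours); the point is that the lossy reconstruction plus the residual recovers the image exactly:

```python
import numpy as np

def lossy_encode(img, step=16):
    """Crude stand-in for BPG: coarse quantization of pixel values."""
    return (img // step) * step

def lossless_compress(img, step=16):
    """Store the lossy reconstruction plus the residual; in the paper the
    residual is entropy-coded under a CNN probability model conditioned
    on the BPG reconstruction."""
    lossy = lossy_encode(img, step)
    residual = img.astype(np.int16) - lossy.astype(np.int16)
    return lossy, residual

def lossless_decompress(lossy, residual):
    return (lossy.astype(np.int16) + residual).astype(np.uint8)

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
lossy, residual = lossless_compress(img)
restored = lossless_decompress(lossy, residual)  # bit-exact reconstruction
```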

* CVPR'20 camera-ready version 

Replacing Mobile Camera ISP with a Single Deep Learning Model

Feb 13, 2020
Andrey Ignatov, Luc Van Gool, Radu Timofte

As the popularity of mobile photography grows constantly, considerable effort is now being invested in building complex hand-crafted camera ISP solutions. In this work, we demonstrate that even the most sophisticated ISP pipelines can be replaced with a single end-to-end deep learning model trained without any prior knowledge about the sensor and optics used in a particular device. For this, we present PyNET, a novel pyramidal CNN architecture designed for fine-grained image restoration that implicitly learns to perform all ISP steps such as image demosaicing, denoising, white balancing, color and contrast correction, demoireing, etc. The model is trained to convert RAW Bayer data obtained directly from the mobile camera sensor into photos captured with a professional high-end DSLR camera, making the solution independent of any particular mobile ISP implementation. To validate the proposed approach on real data, we collected a large-scale dataset of 10 thousand full-resolution RAW-RGB image pairs captured in the wild with the Huawei P20 cameraphone (12.3 MP Sony Exmor IMX380 sensor) and a Canon 5D Mark IV DSLR. The experiments demonstrate that the proposed solution can reach the level of the embedded P20 ISP pipeline, which, unlike our approach, combines data from two (RGB + B/W) camera sensors. The dataset, pre-trained models and code used in this paper are available on the project website.

MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning

Jan 19, 2020
Simon Vandenhende, Stamatios Georgoulis, Luc Van Gool

In this paper, we highlight the importance of considering task interactions at multiple scales when distilling task information in a multi-task learning setup. In contrast to common belief, we show that tasks with high pattern affinity at a certain scale are not guaranteed to retain this behaviour at other scales, and vice versa. We propose a novel architecture, MTI-Net, that builds upon this finding in three ways. First, it explicitly models task interactions at every scale via a multi-scale multi-modal distillation unit. Second, it propagates distilled task information from lower to higher scales via a feature propagation module. Third, it aggregates the refined task features from all scales via a feature aggregation unit to produce the final per-task predictions. Extensive experiments on two multi-task dense labeling datasets show that, unlike prior work, our multi-task model delivers on the full potential of multi-task learning, that is, smaller memory footprint, reduced number of calculations, and better performance w.r.t. single-task learning.

Extremely Weak Supervised Image-to-Image Translation for Semantic Segmentation

Sep 18, 2019
Samarth Shukla, Luc Van Gool, Radu Timofte

Recent advances in generative models and adversarial training have led to a flourishing image-to-image (I2I) translation literature. The current I2I translation approaches require training images from the two domains that are either all paired (supervised) or all unpaired (unsupervised). In practice, obtaining paired training data in sufficient quantities is often very costly and cumbersome. Therefore solutions that employ unpaired data, while less accurate, are largely preferred. In this paper, we aim to bridge the gap between supervised and unsupervised I2I translation, with application to semantic image segmentation. We build upon pix2pix and CycleGAN, state-of-the-art seminal I2I translation techniques. We propose a method to select (very few) paired training samples and achieve significant improvements in both supervised and unsupervised I2I translation settings over random selection. Further, we boost the performance by incorporating both (selected) paired and unpaired samples in the training process. Our experiments show that an extremely weak supervised I2I translation solution using only one paired training sample can achieve a quantitative performance much better than the unsupervised CycleGAN model, and comparable to that of the supervised pix2pix model trained on thousands of pairs.

Semi-Supervised Learning by Augmented Distribution Alignment

May 20, 2019
Qin Wang, Wen Li, Luc Van Gool

In this work, we propose a simple yet effective semi-supervised learning approach called Augmented Distribution Alignment. We reveal that an essential sampling bias exists in semi-supervised learning due to the limited number of labeled samples, which often leads to a considerable empirical distribution mismatch between labeled and unlabeled data. To address this, we propose to align the empirical distributions of labeled and unlabeled data to alleviate the bias. On the one hand, we adopt an adversarial training strategy to minimize the distribution distance between labeled and unlabeled data, as inspired by domain adaptation works. On the other hand, to deal with the small sample size of the labeled data, we also propose a simple interpolation strategy to generate pseudo training samples. These two strategies can be easily implemented in existing deep neural networks. We demonstrate the effectiveness of our proposed approach on the benchmark SVHN and CIFAR10 datasets, on which we achieve new state-of-the-art error rates of $3.54\%$ and $10.09\%$, respectively. Our code will be available at \url{}.
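The interpolation strategy for pseudo training samples can be sketched as a convex combination of labeled and unlabeled points (the Beta-distributed mixing weight and all names are our illustrative assumptions, not the paper's exact recipe):

```python
import numpy as np

def interpolate_samples(x_labeled, x_unlabeled, rng):
    """Generate pseudo training samples by convex interpolation between
    paired labeled and unlabeled points; lam in [0, 1] per sample."""
    lam = rng.beta(2.0, 2.0, size=(len(x_labeled), 1))
    return lam * x_labeled + (1.0 - lam) * x_unlabeled

rng = np.random.default_rng(3)
xl = rng.standard_normal((32, 8))   # labeled features
xu = rng.standard_normal((32, 8))   # unlabeled features
x_pseudo = interpolate_samples(xl, xu, rng)  # augmented training points
```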

Learning Accurate, Comfortable and Human-like Driving

Mar 26, 2019
Simon Hecker, Dengxin Dai, Luc Van Gool

Autonomous vehicles are more likely to be accepted if they drive accurately and comfortably, but also similarly to how human drivers would. This is especially true when autonomous and human-driven vehicles need to share the same road. The main research focus thus far, however, has been on improving driving accuracy only. This paper formalizes all three concerns, with the aim of accurate, comfortable and human-like driving. Three contributions are made. First, numerical map data from HERE Technologies are employed for more accurate driving; a set of map features believed to be relevant to driving is engineered for better navigation. Second, the learning procedure is improved from a pointwise prediction to a sequence-based prediction, and passengers' comfort measures are embedded into the learning algorithm. Finally, we take advantage of advances in adversarial learning to learn human-like driving; specifically, the standard L1 or L2 loss is augmented by an adversarial loss based on a discriminator trained to distinguish between human driving and machine driving. Our model is trained and evaluated on the Drive360 dataset, which features 60 hours and 3000 km of real-world driving data. Extensive experiments show that our driving model is more accurate, more comfortable, and behaves more like a human driver than previous methods. The resources of this work will be released on the project page.
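The augmented objective, an imitation loss plus an adversarial term, can be sketched as follows (the weighting, the logit-based discriminator score and all names are illustrative assumptions of ours, not the paper's exact formulation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def driving_loss(pred, target, disc_score, w_adv=0.1):
    """L1 imitation term plus an adversarial term that rewards predictions
    the discriminator scores as 'human driving' (disc_score: the
    discriminator's logit for the predicted maneuver sequence)."""
    l1 = np.mean(np.abs(pred - target))
    adv = -np.log(sigmoid(disc_score) + 1e-12)  # generator side of GAN loss
    return float(l1 + w_adv * adv)

pred = np.array([0.10, -0.02])   # predicted steering / speed
target = np.array([0.12, 0.00])  # human maneuvers
loss = driving_loss(pred, target, disc_score=1.0)
```

The adversarial term shrinks as the discriminator becomes more convinced the prediction is human-like, pushing the driving model beyond plain imitation.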

* submitted to a conference, 10 pages, 3 figures 

Real-time 3D Traffic Cone Detection for Autonomous Driving

Feb 06, 2019
Ankit Dhall, Dengxin Dai, Luc Van Gool

Considerable progress has been made in semantic scene understanding of road scenes with monocular cameras. It is, however, mainly related to certain classes such as cars and pedestrians. This work investigates traffic cones, an object class crucial for traffic control in the context of autonomous vehicles. 3D object detection using images from a monocular camera is intrinsically an ill-posed problem. In this work, we leverage the unique structure of traffic cones and propose a pipelined approach to the problem. Specifically, we first detect cones in images with a tailored 2D object detector; then the spatial arrangement of keypoints on a traffic cone is detected by our deep structural regression network, where the fact that the cross-ratio is projection invariant is leveraged for network regularization; finally, the 3D position of the cones is recovered by the classical Perspective-n-Point algorithm. Extensive experiments show that our approach can accurately detect traffic cones and estimate their position in the 3D world in real time. The proposed method is also deployed on a real-time, critical system. It runs efficiently on the low-power Jetson TX2, providing accurate 3D position estimates, allowing a race-car to map and drive autonomously on an unseen track indicated by traffic cones. With the help of robust and accurate perception, our race-car won both Formula Student Competitions held in Italy and Germany in 2018, cruising at a top speed of 54 km/h. Visualizations of the complete pipeline, mapping and navigation can be found on our project page.
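The projective invariance of the cross-ratio, which the abstract cites as the regularizer, is easy to verify numerically for four collinear points under an arbitrary 1D projective transform (the points and the transform below are arbitrary choices of ours):

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points, here given as scalars on a line."""
    return ((a - c) * (b - d)) / ((b - c) * (a - d))

def homography_1d(x, h):
    """1D projective map x -> (h00*x + h01) / (h10*x + h11)."""
    return (h[0, 0] * x + h[0, 1]) / (h[1, 0] * x + h[1, 1])

pts = np.array([0.0, 1.0, 2.0, 4.0])     # keypoints along a cone edge
H = np.array([[2.0, 1.0], [0.3, 1.5]])   # arbitrary projective transform
projected = homography_1d(pts, H)

cr_before = cross_ratio(*pts)
cr_after = cross_ratio(*projected)        # identical up to rounding
```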

* 8 pages, 11 figures. arXiv admin note: substantial text overlap with arXiv:1809.10548 

Semantic Nighttime Image Segmentation with Synthetic Stylized Data, Gradual Adaptation and Uncertainty-Aware Evaluation

Jan 17, 2019
Christos Sakaridis, Dengxin Dai, Luc Van Gool

This work addresses the problem of semantic segmentation of nighttime images. The main direction of recent progress in semantic segmentation pertains to daytime scenes with favorable illumination conditions. We focus on improving the performance of state-of-the-art methods on the nighttime domain by adapting them to nighttime data without extra annotations, and designing a new evaluation framework to address the uncertainty of semantics in nighttime images. To this end, we make the following contributions: 1) a novel pipeline for dataset-scale guided style transfer to generate synthetic nighttime images from real daytime input; 2) a framework to gradually adapt semantic segmentation models from day to night via stylized and real images of progressively increasing darkness; 3) a novel uncertainty-aware annotation and evaluation framework and metric for semantic segmentation in adverse conditions; 4) the Dark Zurich dataset with 2416 nighttime and 2920 twilight unlabeled images plus 20 nighttime images with pixel-level annotations that conform to our newly-proposed evaluation. Our experiments evidence that both our stylized data per se and our gradual adaptation significantly boost performance at nighttime both for standard evaluation metrics and our metric. Moreover, our new evaluation reveals that state-of-the-art segmentation models output overly confident predictions at indiscernible regions compared to visible ones.

* 10 pages 

Unsupervised Deep Single-Image Intrinsic Decomposition using Illumination-Varying Image Sequences

Sep 03, 2018
Louis Lettry, Kenneth Vanhoey, Luc Van Gool

Machine learning based Single Image Intrinsic Decomposition (SIID) methods decompose a captured scene into its albedo and shading images by using the knowledge of a large set of known and realistic ground truth decompositions. Collecting and annotating such a dataset is an approach that cannot scale to sufficient variety and realism. We free ourselves from this limitation by training on unannotated images. Our method leverages the observation that two images of the same scene but with different lighting provide useful information on their intrinsic properties: by definition, albedo is invariant to lighting conditions, and cross-combining the estimated albedo of a first image with the estimated shading of a second one should lead back to the second one's input image. We transcribe this relationship into a siamese training scheme for a deep convolutional neural network that decomposes a single image into albedo and shading. The siamese setting allows us to introduce a new loss function including such cross-combinations, and to train solely on (time-lapse) images, discarding the need for any ground truth annotations. As a result, our method has the good properties of i) taking advantage of the time-varying information of image sequences in the (pre-computed) training step, ii) not requiring ground truth data to train on, and iii) being able to decompose single images of unseen scenes at runtime. To demonstrate and evaluate our work, we additionally propose a new rendered dataset containing illumination-varying scenes and a set of quantitative metrics to evaluate SIID algorithms. Despite its unsupervised nature, our results compete with state of the art methods, including supervised and non data-driven methods.
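The cross-combination constraint can be written down directly: with image = albedo × shading, the albedo of one image combined with the shading of the other must reproduce the other image. A toy sketch (the loss terms are equally weighted and all names are ours; the paper's actual loss differs in details):

```python
import numpy as np

def siid_losses(A1, S1, A2, S2, I1, I2):
    """Unsupervised SIID losses for two images of the same scene under
    different lighting: reconstruction, albedo consistency (albedo is
    lighting-invariant), and the cross-combination term (albedo of one
    image with shading of the other reproduces the other image)."""
    recon = np.mean((A1 * S1 - I1) ** 2) + np.mean((A2 * S2 - I2) ** 2)
    albedo_consistency = np.mean((A1 - A2) ** 2)
    cross = np.mean((A1 * S2 - I2) ** 2) + np.mean((A2 * S1 - I1) ** 2)
    return float(recon + albedo_consistency + cross)

rng = np.random.default_rng(4)
A = rng.uniform(0.1, 1.0, (4, 4))           # shared albedo
S1, S2 = rng.uniform(0.1, 1.0, (2, 4, 4))   # two lighting conditions
I1, I2 = A * S1, A * S2                     # the two observed images
loss = siid_losses(A, S1, A, S2, I1, I2)    # zero for a perfect decomposition
```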

* To appear in Pacific Graphics 2018 

End-to-End Learning of Driving Models with Surround-View Cameras and Route Planners

Aug 06, 2018
Simon Hecker, Dengxin Dai, Luc Van Gool

For human drivers, having rear and side-view mirrors is vital for safe driving. They deliver a more complete view of what is happening around the car. Human drivers also heavily exploit their mental map for navigation. Nonetheless, several methods have been published that learn driving models with only a front-facing camera and without a route planner. This lack of information renders the self-driving task quite intractable. We investigate the problem in a more realistic setting, which consists of a surround-view camera system with eight cameras, a route planner, and a CAN bus reader. In particular, we develop a sensor setup that provides data for a 360-degree view of the area surrounding the vehicle, the driving route to the destination, and low-level driving maneuvers (e.g. steering angle and speed) by human drivers. With such a sensor setup we collect a new driving dataset, covering diverse driving scenarios and varying weather/illumination conditions. Finally, we learn a novel driving model by integrating information from the surround-view cameras and the route planner. Two route planners are exploited: 1) by representing the planned routes on OpenStreetMap as a stack of GPS coordinates, and 2) by rendering the planned routes on TomTom Go Mobile and recording the progression into a video. Our experiments show that: 1) 360-degree surround-view cameras help avoid failures made with a single front-view camera, in particular for city driving and intersection scenarios; and 2) route planners help the driving task significantly, especially for steering angle prediction.

* to be published at ECCV 2018 

Multi-bin Trainable Linear Unit for Fast Image Restoration Networks

Jul 30, 2018
Shuhang Gu, Radu Timofte, Luc Van Gool

Tremendous advances in image restoration tasks such as denoising and super-resolution have been achieved using neural networks. Such approaches generally employ very deep architectures, a large number of parameters, large receptive fields and high nonlinear modeling capacity. In order to obtain efficient and fast image restoration networks, one should improve upon the above-mentioned requirements. In this paper we propose a novel activation function, the multi-bin trainable linear unit (MTLU), for increasing the nonlinear modeling capacity together with lighter and shallower networks. We validate the proposed fast image restoration networks for image denoising (FDnet) and super-resolution (FSRnet) on standard benchmarks. We achieve large improvements in both memory and runtime over the current state of the art for comparable or better PSNR accuracies.
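A minimal sketch of the MTLU idea: split the input range into uniform bins and give each bin its own learnable affine map (the bin layout and parameter shapes here are our illustrative assumptions). With two bins split at zero and slopes (0, 1), it reduces to ReLU:

```python
import numpy as np

def mtlu(x, slopes, biases, bin_width=1.0, x_min=-2.0):
    """Multi-bin Trainable Linear Unit sketch: the input range is divided
    into uniform bins; bin k applies its own affine map a_k * x + b_k.
    slopes and biases are the learnable per-bin parameters."""
    k = np.clip(((x - x_min) / bin_width).astype(int), 0, len(slopes) - 1)
    return slopes[k] * x + biases[k]

# two bins split at 0: slopes (0, 1), biases (0, 0) recover ReLU
slopes = np.array([0.0, 1.0])
biases = np.array([0.0, 0.0])
x = np.array([-1.5, -0.2, 0.3, 1.7])
y = mtlu(x, slopes, biases, bin_width=2.0, x_min=-2.0)
```

Because each bin's slope and bias are trained, a single MTLU layer can express richer non-linearities than a fixed activation, which is how the paper trades depth for capacity.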

Progressive Structure from Motion

Jul 10, 2018
Alex Locher, Michal Havlena, Luc Van Gool

Structure from Motion, the sparse 3D reconstruction of a scene from individual photos, is a long-studied topic in computer vision. Yet none of the existing reconstruction pipelines fully addresses a progressive scenario, in which images become available only during the reconstruction process and intermediate results are delivered to the user. Incremental pipelines are capable of growing a 3D model but often get stuck in local minima due to wrong (binding) decisions taken based on incomplete information. Global pipelines, on the other hand, need access to the complete viewgraph and are not capable of delivering intermediate results. In this paper we propose a new reconstruction pipeline that works progressively rather than in a batch processing scheme. The pipeline is able to recover from failed reconstructions in early stages, avoids taking binding decisions, delivers progressive output, and yet maintains the capabilities of existing pipelines. We demonstrate and evaluate our method on diverse challenging public and dedicated datasets, including those with highly symmetric structures, and compare to the state of the art.

* Accepted to ECCV 2018 

Failure Prediction for Autonomous Driving

May 04, 2018
Simon Hecker, Dengxin Dai, Luc Van Gool

The primary focus of autonomous driving research is to improve driving accuracy. While great progress has been made, state-of-the-art algorithms still fail at times. Such failures may have catastrophic consequences. It is therefore important that automated cars foresee problems ahead as early as possible. This is also of paramount importance if the driver will be asked to take over. We conjecture that failures do not occur randomly. For instance, driving models are more likely to fail at places with heavy traffic, at complex intersections, and/or under adverse weather/illumination conditions. This work presents a method to learn to predict the occurrence of these failures, i.e. to assess how difficult a scene is for a given driving model and to possibly give the human driver an early heads-up. A camera-based driving model is developed and trained on real driving datasets. The discrepancies between the model's predictions and the human `ground-truth' maneuvers are then recorded to yield `failure' scores. Experimental results show that the failure score can indeed be learned and predicted. Thus, our prediction method is able to improve the overall safety of an automated driving model by alerting the human driver in a timely manner, leading to better human-vehicle collaborative driving.

* published in IEEE Intelligent Vehicle Symposium 2018 

ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes

Apr 07, 2018
Yuhua Chen, Wen Li, Luc Van Gool

Exploiting synthetic data to learn deep models has attracted increasing attention in recent years. However, the intrinsic domain difference between synthetic and real images usually causes a significant performance drop when applying the learned model to real world scenarios. This is mainly due to two reasons: 1) the model overfits to synthetic images, making the convolutional filters incompetent to extract informative representation for real images; 2) there is a distribution difference between synthetic and real data, which is also known as the domain adaptation problem. To this end, we propose a new reality oriented adaptation approach for urban scene semantic segmentation by learning from synthetic data. First, we propose a target guided distillation approach to learn the real image style, which is achieved by training the segmentation model to imitate a pretrained real style model using real images. Second, we further take advantage of the intrinsic spatial structure presented in urban scene images, and propose a spatial-aware adaptation scheme to effectively align the distribution of two domains. These two modules can be readily integrated with existing state-of-the-art semantic segmentation networks to improve their generalizability when adapting from synthetic to real urban scenes. We evaluate the proposed method on Cityscapes dataset by adapting from GTAV and SYNTHIA datasets, where the results demonstrate the effectiveness of our method.

* Add experiments on SYNTHIA, CVPR 2018 camera-ready version 
