Oliver Wasenmüller

360° from a Single Camera: A Few-Shot Approach for LiDAR Segmentation

Sep 12, 2023
Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmüller

Deep learning applications on LiDAR data suffer from a strong domain gap when applied to different sensors or tasks. For these methods to reach accuracy on new data comparable to the values reported on public benchmarks, a large-scale annotated dataset is necessary. However, in practical applications labeled data is costly and time-consuming to obtain. These factors have motivated research into label-efficient methods, but a large gap remains to their fully supervised counterparts. Thus, we propose ImageTo360, an effective and streamlined few-shot approach to label-efficient LiDAR segmentation. Our method utilizes an image teacher network to generate semantic predictions for LiDAR data within a single camera view. The teacher is used to pretrain the LiDAR segmentation student network, prior to optional fine-tuning on 360° data. Our method is implemented in a modular manner on the point level and as such is generalizable to different architectures. We improve over the current state-of-the-art results for label-efficient methods and even surpass some traditional fully supervised segmentation networks.
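
The idea of generating semantic predictions for LiDAR points inside a single camera view can be illustrated with a small sketch. This is not the authors' implementation: it assumes a pinhole camera, a known LiDAR-to-camera transform, and a per-pixel class map that stands in for the image teacher's output. The function name `transfer_labels` and all parameter names are hypothetical.

```python
import numpy as np

def transfer_labels(points, K, T_cam_lidar, label_img, ignore=-1):
    """Give each LiDAR point the class of the pixel it projects to.

    points: (N, 3) LiDAR coordinates, K: (3, 3) pinhole intrinsics,
    T_cam_lidar: (4, 4) LiDAR-to-camera transform,
    label_img: (H, W) class ids (stand-in for the image teacher's output).
    Points behind the camera or outside the image keep the `ignore` label.
    """
    h, w = label_img.shape
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_cam_lidar @ homo.T).T[:, :3]          # points in the camera frame
    labels = np.full(len(points), ignore, dtype=np.int64)

    in_front = cam[:, 2] > 1e-6                    # drop points behind the camera
    uvw = (K @ cam[in_front].T).T                  # perspective projection
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)

    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]         # indices into the original cloud
    labels[idx] = label_img[v[inside], u[inside]]
    return labels
```

Points that receive the `ignore` label would simply be excluded from the pretraining loss; the remaining points act as pseudo-labels for the student.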

* ICCV Workshop 2023

Transformer-based Detection of Microorganisms on High-Resolution Petri Dish Images

Aug 21, 2023
Nikolas Ebert, Didier Stricker, Oliver Wasenmüller

Many medical or pharmaceutical processes have strict guidelines regarding continuous hygiene monitoring. This often involves the labor-intensive task of manually counting microorganisms in Petri dishes by trained personnel. Automation attempts often struggle with several major challenges, including significant scale differences, low separation, and low contrast. To address these challenges, we introduce AttnPAFPN, a high-resolution detection pipeline that leverages a novel transformer variation, the efficient-global self-attention mechanism. Our streamlined approach can be easily integrated into almost any multi-scale object detection pipeline. In a comprehensive evaluation on the publicly available AGAR dataset, we demonstrate the superior accuracy of our network over the current state of the art. To demonstrate the task-independent performance of our approach, we perform further experiments on the COCO and LIVECell datasets.

* This paper has been accepted at IEEE International Conference on Computer Vision Workshops (ICCV workshop), 2023

Light-Weight Vision Transformer with Parallel Local and Global Self-Attention

Jul 18, 2023
Nikolas Ebert, Laurenz Reichardt, Didier Stricker, Oliver Wasenmüller

While transformer architectures have dominated computer vision in recent years, these models cannot easily be deployed on hardware with limited resources for autonomous driving tasks that require real-time performance. Their computational complexity and memory requirements limit their use, especially for applications with high-resolution inputs. In our work, we redesign the powerful state-of-the-art Vision Transformer PLG-ViT into a much more compact and efficient architecture that is suitable for such tasks. We identify computationally expensive blocks in the original PLG-ViT architecture and propose several redesigns aimed at reducing the number of parameters and floating-point operations. As a result of our redesign, we are able to reduce PLG-ViT in size by a factor of 5, with a moderate drop in performance. We propose two variants, one optimized for the best trade-off between parameter count and runtime and one for the best trade-off between parameter count and accuracy. With only 5 million parameters, we achieve 79.5% top-1 accuracy on the ImageNet-1K classification benchmark. Our networks demonstrate great performance on general vision benchmarks like COCO instance segmentation. In addition, we conduct a series of experiments demonstrating the potential of our approach in solving various tasks specifically tailored to the challenges of autonomous driving and transportation.

* This paper has been accepted at IEEE Intelligent Transportation Systems Conference (ITSC), 2023

Multitask Network for Joint Object Detection, Semantic Segmentation and Human Pose Estimation in Vehicle Occupancy Monitoring

May 03, 2022
Nikolas Ebert, Patrick Mangat, Oliver Wasenmüller

To ensure safe autonomous driving, precise information about the conditions in and around the vehicle must be available. Accordingly, the monitoring of occupants and objects inside the vehicle is crucial. In the state of the art, single or multiple deep neural networks are used for either object recognition, semantic segmentation, or human pose estimation. In contrast, we propose our Multitask Detection, Segmentation and Pose Estimation Network (MDSP), the first multitask network solving all three tasks jointly in the area of occupancy monitoring. Due to the shared architecture, memory and computing costs can be saved while achieving higher accuracy. Furthermore, our architecture allows a flexible combination of the three tasks during a simple end-to-end training. We perform comprehensive evaluations on the public datasets SVIRO and TiCaM to demonstrate its superior performance.

* This paper has been accepted at IEEE Intelligent Vehicles Symposium (IV), 2022 (ORAL)

Detection of Driver Drowsiness by Calculating the Speed of Eye Blinking

Oct 21, 2021
Muhammad Fawwaz Yusri, Patrick Mangat, Oliver Wasenmüller

Many road accidents are caused by driver drowsiness. While there are methods to detect closed eyes, it is a non-trivial task to detect the gradual process of a driver becoming drowsy. We consider a simple real-time drowsiness detection system based solely on the eye blinking rate derived from the eye aspect ratio. For eye detection, we use HOG and a linear SVM. If the speed of eye blinking drops below an empirically determined threshold, the system triggers an alarm, thereby preventing the driver from falling into microsleep. In this paper, we extensively evaluate the minimal requirements for the proposed system. We find that this system works well if the face is directed toward the camera, but it becomes less reliable once the head is tilted significantly. The results of our evaluations provide the foundation for further developments of our drowsiness detection system.
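
As a rough illustration of deriving a blink count from the eye aspect ratio (EAR), the sketch below assumes the commonly used six-landmark EAR definition, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), with p1 and p4 at the eye corners. The abstract does not state which definition or thresholds the authors use, so the function names and the `closed_thresh` value are hypothetical.

```python
import math

def eye_aspect_ratio(eye):
    """EAR for six (x, y) landmarks p1..p6 ordered around the eye contour.

    p1 and p4 are the eye corners; (p2, p6) and (p3, p5) are the
    upper/lower lid pairs. The ratio shrinks toward 0 as the eye closes.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

def count_blinks(ear_series, closed_thresh=0.2):
    """Count open-to-closed transitions in per-frame EAR values.

    closed_thresh is an illustrative value, not one taken from the paper.
    """
    blinks, closed = 0, False
    for ear in ear_series:
        if ear < closed_thresh and not closed:
            blinks += 1
        closed = ear < closed_thresh
    return blinks
```

Dividing the blink count by the length of the observation window gives a blink rate that can be compared against an empirically chosen alarm threshold.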

* This paper has been accepted at the Upper-Rhine Artificial Intelligence Symposium 2021

PDC: Piecewise Depth Completion utilizing Superpixels

Jul 14, 2021
Dennis Teutscher, Patrick Mangat, Oliver Wasenmüller

Depth completion from sparse LiDAR and high-resolution RGB data is one of the foundations for autonomous driving techniques. Current approaches often rely on CNN-based methods with several known drawbacks, such as flying pixels at depth discontinuities and overfitting to both a given dataset and an error metric. Thus, we propose our novel Piecewise Depth Completion (PDC), which works entirely without deep learning. PDC segments the RGB image into superpixels corresponding to regions with similar depth values. Superpixels corresponding to the same objects are gathered using a cost map. In the end, we obtain detailed depth images with state-of-the-art accuracy. In our evaluation, we show both the influence of the individual proposed processing steps and the overall performance of our method on the challenging KITTI dataset.
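
To make the piecewise idea concrete, here is a deliberately simplified sketch that fills each segment with the median of the sparse depth samples falling inside it. It is not PDC itself: the superpixel segmentation is taken as a given label map, the cost-map merging step is omitted, and the constant-depth-per-segment model and the name `fill_by_segment` are assumptions made for illustration.

```python
import numpy as np

def fill_by_segment(sparse_depth, segments):
    """Fill every segment with the median of its valid depth samples.

    sparse_depth: (H, W) array where 0 marks a missing measurement.
    segments: (H, W) integer label map (e.g. from a superpixel algorithm).
    Segments without any measurement stay 0.
    """
    dense = np.zeros(sparse_depth.shape, dtype=float)
    for seg_id in np.unique(segments):
        region = segments == seg_id
        samples = sparse_depth[region]
        samples = samples[samples > 0]        # keep valid measurements only
        if samples.size:
            dense[region] = np.median(samples)
    return dense
```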

* This paper has been accepted at IEEE Intelligent Transportation Systems Conference (ITSC), 2021

DVMN: Dense Validity Mask Network for Depth Completion

Jul 14, 2021
Laurenz Reichardt, Patrick Mangat, Oliver Wasenmüller

LiDAR depth maps provide environmental guidance in a variety of applications. However, such depth maps are typically sparse and insufficient for complex tasks such as autonomous navigation. State-of-the-art methods use image-guided neural networks for dense depth completion. We develop a guided convolutional neural network focusing on gathering dense and valid information from sparse depth maps. To this end, we introduce a novel layer with spatially variant and content-dependent dilation to include additional data from sparse input. Furthermore, we propose a sparsity-invariant residual bottleneck block. We evaluate our Dense Validity Mask Network (DVMN) on the KITTI depth completion benchmark and achieve state-of-the-art results. At the time of submission, our network is the leading method using sparsity-invariant convolution.
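
For readers unfamiliar with sparsity-invariant convolution, the sketch below shows the basic operation in its commonly cited form: the weighted sum over valid pixels is normalized by the number of valid pixels in the window, and the validity mask is propagated with a max over the same window. This is a single-channel NumPy illustration under those assumptions, not the DVMN layer with content-dependent dilation; the function name `sparse_conv2d` is hypothetical.

```python
import numpy as np

def sparse_conv2d(depth, mask, weight, bias=0.0, eps=1e-8):
    """Single-channel sparsity-invariant convolution with zero padding.

    depth: (H, W) values, mask: (H, W) with 1 for valid and 0 for missing,
    weight: (k, k) kernel with odd k.
    Returns the normalized response and the propagated validity mask.
    """
    k = weight.shape[0]
    pad = k // 2
    d = np.pad(depth * mask, pad)     # invalid pixels contribute nothing
    m = np.pad(mask, pad)
    h, w = depth.shape
    out = np.zeros((h, w))
    new_mask = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            dw = d[i:i + k, j:j + k]
            mw = m[i:i + k, j:j + k]
            # Normalize by the count of valid inputs in the window.
            out[i, j] = (dw * weight).sum() / (mw.sum() + eps) + bias
            # A location becomes valid if any input in its window was valid.
            new_mask[i, j] = mw.max()
    return out, new_mask
```

Because of the normalization, the response does not shrink as the input gets sparser, which is the property the name refers to.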

* This paper has been accepted at IEEE Intelligent Transportation Systems Conference (ITSC), 2021

Autoencoder Based Inter-Vehicle Generalization for In-Cabin Occupant Classification

May 07, 2021
Steve Dias Da Cruz, Bertram Taetz, Oliver Wasenmüller, Thomas Stifter, Didier Stricker

Common domain shift problem formulations consider the integration of multiple source domains, or of the target domain, during training. Regarding the generalization of machine learning models between different car interiors, we formulate the criterion of training in a single vehicle: with access neither to the target distribution of the vehicle the model would be deployed to nor to multiple vehicles during training. We performed an investigation on the SVIRO dataset for occupant classification on the rear bench and propose an autoencoder-based approach to improve the transferability. The autoencoder is on par with commonly used classification models when trained from scratch and sometimes outperforms models pre-trained on a large amount of data. Moreover, the autoencoder can transform images from unknown vehicles into the vehicle it was trained on. These results are corroborated by an evaluation on real infrared images from two vehicle interiors.

* This paper has been accepted at IEEE Intelligent Vehicles Symposium (IV), 2021

SALT: A Semi-automatic Labeling Tool for RGB-D Video Sequences

Feb 22, 2021
Dennis Stumpf, Stephan Krauß, Gerd Reis, Oliver Wasenmüller, Didier Stricker

Large labeled data sets are one of the essential foundations of modern deep learning techniques. Therefore, there is an increasing need for tools that make it possible to label large amounts of data as intuitively as possible. In this paper, we introduce SALT, a tool to semi-automatically annotate RGB-D video sequences to generate 3D bounding boxes for full six Degrees of Freedom (DoF) object poses, as well as pixel-level instance segmentation masks for both RGB and depth. Besides bounding box propagation through various interpolation techniques, as well as algorithmically guided instance segmentation, our pipeline also provides built-in pre-processing functionalities to facilitate the data set creation process. By making full use of SALT, annotation time can be reduced by a factor of up to 33.95 for bounding box creation and 8.55 for RGB segmentation without compromising the quality of the automatically generated ground truth.
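
Bounding box propagation by interpolation can be pictured with the simplest case, linear interpolation between two annotated keyframes. The abstract does not say which interpolation techniques SALT uses, so this is only a sketch: the box layout `(cx, cy, cz, w, h, d)` and the function name are assumptions, and orientation (which would need rotation interpolation for full 6-DoF poses) is left out.

```python
def interpolate_box(box_a, box_b, frame_a, frame_b, frame):
    """Linearly interpolate a box between two annotated keyframes.

    box_a, box_b: (cx, cy, cz, w, h, d) at frame_a and frame_b.
    frame: index of the frame to fill in, with frame_a <= frame <= frame_b.
    """
    t = (frame - frame_a) / (frame_b - frame_a)   # 0 at frame_a, 1 at frame_b
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))


# Usage: annotate frames 0 and 10 by hand, fill in frame 5 automatically.
mid_box = interpolate_box((0, 0, 0, 1, 1, 1), (10, 0, 0, 3, 1, 1), 0, 10, 5)
```

With this scheme an annotator labels only the keyframes and corrects the interpolated boxes where the motion is not linear.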

* Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Volume 4: VISAPP (2021), 595-603
* VISAPP 2021 full paper (9 pages, 6 figures), published by SciTePress: https://www.scitepress.org/PublicationsDetail.aspx?ID=ywQZ3GZrka8=&t=1

HPERL: 3D Human Pose Estimation from RGB and LiDAR

Oct 16, 2020
Michael Fürst, Shriya T. P. Gupta, René Schuster, Oliver Wasenmüller, Didier Stricker

In-the-wild human pose estimation has huge potential for various fields, ranging from animation and action recognition to intention recognition and prediction for autonomous driving. The current state of the art focuses only on RGB and RGB-D approaches for predicting the 3D human pose. However, not using precise LiDAR depth information limits the performance and leads to very inaccurate absolute pose estimation. With LiDAR sensors becoming more affordable and common on robots and autonomous vehicle setups, we propose an end-to-end architecture using RGB and LiDAR to predict the absolute 3D human pose with unprecedented precision. Additionally, we introduce a weakly-supervised approach to generate 3D predictions using 2D pose annotations from PedX [1]. This allows for many new opportunities in the field of 3D human pose estimation.

* 7 pages, 6 figures, 4 tables, LiDAR and RGB Fusion
