Diffusion generative models have recently emerged as a robust technique for producing and modifying coherent, high-quality video. This survey offers a systematic overview of the critical elements of diffusion models for video generation, covering applications, architectural choices, and the modeling of temporal dynamics. Recent advancements in the field are summarized and grouped into development trends. The survey concludes with an overview of remaining challenges and an outlook on the future of the field. Website: https://github.com/ndrwmlnk/Awesome-Video-Diffusion-Models
The lane graph is a key component of high-definition (HD) maps and is crucial for downstream tasks such as autonomous driving and navigation planning. Previously, He et al. (2022) explored the extraction of the lane-level graph from aerial imagery using a segmentation-based approach. However, segmentation networks struggle to produce perfect segmentation masks, which leads to inaccurate lane graph extraction. We explore additional enhancements to refine this segmentation-based approach and extend it with a diffusion probabilistic model (DPM) component. This combination further improves the GEO F1 and TOPO F1 scores, key indicators of lane graph quality, for the undirected graph in non-intersection areas. We conduct experiments on a publicly available dataset, demonstrating that our method outperforms the previous approach, particularly in enhancing the connectivity of the graph, as measured by the TOPO F1 score. Moreover, we perform ablation studies on the individual components of our method to understand their contribution and evaluate their effectiveness.
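To illustrate how such a DPM component can refine a coarse segmentation output, here is a minimal sketch of a standard DDPM sampling loop (Ho et al., 2020) conditioned on the segmentation mask; the noise predictor `eps_model` and the conditioning interface are hypothetical stand-ins, not the paper's implementation:

```python
import torch

def ddpm_refine(eps_model, coarse_mask, T=1000, device="cpu"):
    """Refine a coarse lane-segmentation mask with a DDPM sampler.

    eps_model: noise predictor eps_theta(x_t, t, cond) -- hypothetical stand-in.
    coarse_mask: (B, 1, H, W) segmentation output used as conditioning.
    """
    betas = torch.linspace(1e-4, 0.02, T, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn_like(coarse_mask)  # start from pure noise
    for t in reversed(range(T)):
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        t_batch = torch.full((x.shape[0],), t, device=device)
        eps = eps_model(x, t_batch, coarse_mask)
        # standard DDPM posterior mean, then add noise scaled by sigma_t
        x = (x - (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        x = x + torch.sqrt(betas[t]) * z
    return x  # refined mask, to be thresholded before graph extraction
```

The refined mask would then be thresholded and fed to the existing graph-extraction stage in place of the raw segmentation output.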
Physical reasoning is a crucial aspect of the development of general AI systems, given that human learning starts with interacting with the physical world before progressing to more complex concepts. Although researchers have studied and assessed the physical reasoning of AI approaches through various specific benchmarks, there is no comprehensive approach to evaluating and measuring progress. We therefore aim to offer an overview of existing benchmarks and their solution approaches, and propose a unified perspective for measuring the physical reasoning capacity of AI systems. We select benchmarks that are designed to test algorithmic performance on physical reasoning tasks. While each of the selected benchmarks poses a unique challenge, their ensemble provides a comprehensive proving ground for an AI generalist agent, with a measurable skill level for various physical reasoning concepts. This gives such an ensemble of benchmarks an advantage over holistic benchmarks that aim to simulate the real world by intertwining its complexity with many concepts at once. We group the presented physical reasoning benchmarks into subcategories so that narrower generalist AI agents can be tested on these groups first.
The advent of tactile sensors in robotics has sparked many ideas on how robots can leverage direct contact measurements of their environment interactions to improve manipulation tasks. An important line of research in this regard is grasp force control, which aims to manipulate objects safely by limiting the force exerted on the object. While prior works have either hand-modeled their force controllers, employed model-based approaches, or have not shown sim-to-real transfer, we propose a model-free deep reinforcement learning approach trained in simulation and transferred to the robot without further fine-tuning. To this end, we present a simulation environment that produces realistic normal forces, which we use to train continuous force control policies. An evaluation in which we compare against a baseline and perform an ablation study shows that our approach outperforms the hand-modeled baseline and that our proposed inductive bias and domain randomization facilitate sim-to-real transfer. Code, models, and supplementary videos are available at https://sites.google.com/view/rl-force-ctrl
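As a rough illustration of the kind of domain randomization involved, the following sketch resamples contact parameters of a toy 1-D spring-damper contact model at every episode reset; all ranges, units, and the environment interface are illustrative assumptions, not the paper's simulator:

```python
import numpy as np

class RandomizedContactEnv:
    """Toy sketch of domain randomization for sim-to-real force control."""

    def reset(self, rng=None):
        rng = rng or np.random.default_rng()
        # Resample physical parameters every episode so the policy
        # cannot overfit to a single simulated contact model.
        self.stiffness = rng.uniform(200.0, 2000.0)  # N/m
        self.damping = rng.uniform(1.0, 20.0)        # N*s/m
        self.sensor_noise = rng.uniform(0.0, 0.05)   # N, additive
        self.penetration = 0.0
        return self._observe()

    def step(self, closing_velocity, dt=0.01):
        # Simple spring-damper normal force while fingertip penetrates object.
        self.penetration = max(0.0, self.penetration + closing_velocity * dt)
        force = self.stiffness * self.penetration + self.damping * max(closing_velocity, 0.0)
        return self._observe(), force

    def _observe(self):
        noisy = self.stiffness * self.penetration + np.random.normal(0.0, self.sensor_noise)
        return np.array([max(noisy, 0.0)])
```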
We present a holistic grasping controller that combines free-space position control and in-contact force control for reliable grasping under uncertain object pose estimates. Using tactile fingertip sensors, the controller minimizes undesired object displacement during grasping by pausing the closing motion of individual joints on first contact until force closure is established. While holding an object, the controller is compliant with external forces to avoid high internal object forces and prevent object damage. Gravity as an external force is explicitly considered and compensated for, thus preventing gravity-induced object drift. We evaluate the controller in two experiments on the TIAGo robot and its parallel-jaw gripper, demonstrating the effectiveness of the approach for robust grasping and minimal object displacement. In a series of ablation studies, we show the utility of the individual controller components.
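A minimal sketch of the per-joint contact logic described above, assuming a hypothetical interface where fingertip normal forces are read at each control tick; the gains and names are illustrative, and gravity compensation would add a feed-forward term on top of the force target:

```python
def grasp_step(tactile_force, target_force, k_close=0.02, k_force=0.001):
    """One control tick of the grasp controller (illustrative sketch).

    tactile_force: dict of fingertip name -> measured normal force (N).
    Returns per-joint position deltas to command.
    """
    deltas = {}
    for name, force in tactile_force.items():
        if force <= 0.0:
            deltas[name] = k_close                      # free space: keep closing
        else:
            # in contact: pause closing, regulate toward the target force
            deltas[name] = k_force * (target_force - force)
    return deltas
```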
In this paper, we compare methods for estimating the complexity of two-dimensional shapes and introduce a method that exploits the reconstruction loss of variational autoencoders (VAEs) with different latent vector sizes. Although the complexity of a shape is not a well-defined attribute, different aspects of it can be estimated. We demonstrate that our method captures some aspects of shape complexity. Code and training details will be made publicly available.
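As a rough sketch of the idea, one can train small VAEs with increasing latent sizes on the same shape and read the resulting reconstruction-loss profile as a complexity estimate; the architecture, image size (64x64 binary masks), and hyperparameters below are illustrative assumptions, not the paper's setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Fully connected VAE for 64x64 binary shape images (illustrative)."""
    def __init__(self, latent_dim):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 64 * 64), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z).view_as(x), mu, logvar

def complexity_profile(shape_batch, latent_dims=(2, 4, 8, 16)):
    """Reconstruction loss per latent size; simpler shapes flatten out earlier."""
    losses = {}
    for d in latent_dims:
        vae = TinyVAE(d)
        opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
        for _ in range(200):  # short training loop for illustration
            recon, mu, logvar = vae(shape_batch)
            kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
            loss = F.binary_cross_entropy(recon, shape_batch) + 1e-3 * kl
            opt.zero_grad()
            loss.backward()
            opt.step()
        losses[d] = float(F.binary_cross_entropy(recon, shape_batch))
    return losses
```

Intuitively, a complex shape keeps losing information at small latent sizes, so its loss profile decays more slowly as the latent dimension grows.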
Our goal with this survey is to provide an overview of state-of-the-art deep learning technologies for face generation and editing. We cover the most popular recent architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross-domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow the generation of high-quality face images and offer rich interfaces for controllable semantic editing while preserving photo quality. We aim to provide an entry point into the field for readers who have a basic knowledge of deep learning and are looking for an accessible introduction and overview.
The Learn-to-Race Autonomous Racing Virtual Challenge, hosted on the www.aicrowd.com platform, consisted of two tracks: Single Camera and Multi Camera. Our UniTeam team was among the final winners in the Single Camera track, where the agent is required to complete a previously unknown F1-style track in minimum time with the fewest off-road driving violations. In our approach, we used a U-Net architecture for road segmentation, a variational autoencoder for encoding the road binary mask, and a nearest-neighbor search strategy that selects the best action for a given state. Our agent achieved an average speed of 105 km/h on stage 1 (known track) and 73 km/h on stage 2 (unknown track) without any off-road driving violations. Here we present our solution and results.
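The action-selection step reduces to a nearest-neighbor lookup in the VAE latent space; a minimal sketch, where the memory of recorded latents and actions is a hypothetical stand-in for the stored reference states:

```python
import numpy as np

def select_action(latent, memory_latents, memory_actions):
    """Nearest-neighbor policy over VAE latents (illustrative sketch).

    latent: encoding of the current road mask, shape (d,).
    memory_latents: (N, d) latents of previously recorded states.
    memory_actions: (N, 2) steering/throttle recorded for those states.
    """
    dists = np.linalg.norm(memory_latents - latent, axis=1)
    return memory_actions[np.argmin(dists)]
```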
In this work, we explain the "You Only Look Once" (YOLO) single-stage object detection approach as a parallel classification of 10647 fixed region proposals. We support this view by showing that each of YOLO's output pixels is attentive to a specific sub-region of the previous layers, comparable to a local region proposal. This understanding reduces the conceptual gap between YOLO-like single-stage object detection models, RCNN-like two-stage region-proposal-based models, and ResNet-like image classification models. In addition, we have created interactive exploration tools for a better visual understanding of the YOLO information processing streams: https://limchr.github.io/yolo_visualization
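The count of 10647 proposals follows directly from the YOLOv3 output geometry (a 416x416 input, three output grids at strides 32, 16, and 8, and three anchor boxes per grid cell), which a few lines verify:

```python
# YOLOv3 with a 416x416 input predicts on three grids (stride 32, 16, 8),
# with 3 anchor boxes per grid cell:
grids = [416 // s for s in (32, 16, 8)]    # [13, 26, 52]
proposals = sum(3 * g * g for g in grids)  # 3 * (169 + 676 + 2704)
print(proposals)                           # 10647
```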
In this work, we demonstrate how to adapt a publicly available pre-trained Jukebox model to the problem of audio source separation from a single mixed audio channel. Our neural network architecture for transfer learning is fast to train, and our results demonstrate performance comparable to other state-of-the-art approaches. We provide an open-source implementation of our architecture (https://rebrand.ly/transfer-jukebox-github).
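Schematically, the transfer-learning setup amounts to freezing the pre-trained backbone and training a small separation head on top of its features; in the sketch below, `encoder` is a generic stand-in for the Jukebox representation, and the head architecture and masking formulation are illustrative assumptions rather than the paper's exact design:

```python
import torch
import torch.nn as nn

class SeparationHead(nn.Module):
    """Small trainable head on top of a frozen pre-trained audio encoder."""

    def __init__(self, encoder, feat_dim, n_sources=2):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad = False  # transfer learning: freeze the backbone
        self.head = nn.Sequential(
            nn.Conv1d(feat_dim, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, n_sources, kernel_size=1), nn.Sigmoid())

    def forward(self, mix):
        with torch.no_grad():
            h = self.encoder(mix)  # assumed to yield (B, feat_dim, T) features
        return self.head(h)        # per-source masks in [0, 1]
```

Only the head's parameters receive gradients, which is what keeps training fast relative to fine-tuning the full model.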