Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Niko Sünderhauf

Open-Set Recognition in the Age of Vision-Language Models

Mar 25, 2024
Dimity Miller, Niko Sünderhauf, Alex Kenna, Keita Mason

Are vision-language models (VLMs) open-set models because they are trained on internet-scale datasets? We answer this question with a clear no - VLMs introduce closed-set assumptions via their finite query set, making them vulnerable to open-set conditions. We systematically evaluate VLMs for open-set recognition and find they frequently misclassify objects not contained in their query set, leading to alarmingly low precision when tuned for high recall and vice versa. We show that naively increasing the size of the query set to contain more and more classes does not mitigate this problem, but instead causes diminishing task performance and open-set performance. We establish a revised definition of the open-set problem for the age of VLMs, define a new benchmark and evaluation protocol to facilitate standardised evaluation and research in this important area, and evaluate promising baseline approaches based on predictive uncertainty and dedicated negative embeddings on a range of VLM classifiers and object detectors.

* 31 pages, under review

Via

Access Paper or Ask Questions

LHManip: A Dataset for Long-Horizon Language-Grounded Manipulation Tasks in Cluttered Tabletop Environments

Dec 20, 2023
Federico Ceola, Lorenzo Natale, Niko Sünderhauf, Krishan Rana

Instructing a robot to complete an everyday task within our homes has been a long-standing challenge for robotics. While recent progress in language-conditioned imitation learning and offline reinforcement learning has demonstrated impressive performance across a wide range of tasks, they are typically limited to short-horizon tasks -- not reflective of those a home robot would be expected to complete. While existing architectures have the potential to learn these desired behaviours, the lack of the necessary long-horizon, multi-step datasets for real robotic systems poses a significant challenge. To this end, we present the Long-Horizon Manipulation (LHManip) dataset comprising 200 episodes, demonstrating 20 different manipulation tasks via real robot teleoperation. The tasks entail multiple sub-tasks, including grasping, pushing, stacking and throwing objects in highly cluttered environments. Each task is paired with a natural language instruction and multi-camera viewpoints for point-cloud or NeRF reconstruction. In total, the dataset comprises 176,278 observation-action pairs which form part of the Open X-Embodiment dataset. The full LHManip dataset is made publicly available at https://github.com/fedeceola/LHManip.

* Submitted to IJRR

Via

Access Paper or Ask Questions

Contrastive Language, Action, and State Pre-training for Robot Learning

Apr 21, 2023
Krishan Rana, Andrew Melnik, Niko Sünderhauf

Figure 1 for Contrastive Language, Action, and State Pre-training for Robot Learning

Figure 2 for Contrastive Language, Action, and State Pre-training for Robot Learning

Figure 3 for Contrastive Language, Action, and State Pre-training for Robot Learning

Figure 4 for Contrastive Language, Action, and State Pre-training for Robot Learning

In this paper, we introduce a method for unifying language, action, and state information in a shared embedding space to facilitate a range of downstream tasks in robot learning. Our method, Contrastive Language, Action, and State Pre-training (CLASP), extends the CLIP formulation by incorporating distributional learning, capturing the inherent complexities and one-to-many relationships in behaviour-text alignment. By employing distributional outputs for both text and behaviour encoders, our model effectively associates diverse textual commands with a single behaviour and vice-versa. We demonstrate the utility of our method for the following downstream tasks: zero-shot text-behaviour retrieval, captioning unseen robot behaviours, and learning a behaviour prior for language-conditioned reinforcement learning. Our distributional encoders exhibit superior retrieval and captioning performance on unseen datasets, and the ability to generate meaningful exploratory behaviours from textual commands, capturing the intricate relationships between language, action, and state. This work represents an initial step towards developing a unified pre-trained model for robotics, with the potential to generalise to a broad range of downstream tasks.

Via

Access Paper or Ask Questions

Addressing the Challenges of Open-World Object Detection

Mar 27, 2023
David Pershouse, Feras Dayoub, Dimity Miller, Niko Sünderhauf

Figure 1 for Addressing the Challenges of Open-World Object Detection

Figure 2 for Addressing the Challenges of Open-World Object Detection

Figure 3 for Addressing the Challenges of Open-World Object Detection

Figure 4 for Addressing the Challenges of Open-World Object Detection

We address the challenging problem of open world object detection (OWOD), where object detectors must identify objects from known classes while also identifying and continually learning to detect novel objects. Prior work has resulted in detectors that have a relatively low ability to detect novel objects, and a high likelihood of classifying a novel object as one of the known classes. We approach the problem by identifying the three main challenges that OWOD presents and introduce OW-RCNN, an open world object detector that addresses each of these three challenges. OW-RCNN establishes a new state of the art using the open-world evaluation protocol on MS-COCO, showing a drastically increased ability to detect novel objects (16-21% absolute increase in U-Recall), to avoid their misclassification as one of the known classes (up to 52% reduction in A-OSE), and to incrementally learn to detect them while maintaining performance on previously known classes (1-6% absolute increase in mAP).

Via

Access Paper or Ask Questions

ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields in Dynamic Scenes

Nov 11, 2022
Jad Abou-Chakra, Feras Dayoub, Niko Sünderhauf

Figure 1 for ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields in Dynamic Scenes

Figure 2 for ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields in Dynamic Scenes

Figure 3 for ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields in Dynamic Scenes

Figure 4 for ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields in Dynamic Scenes

Neural Radiance Fields (NeRFs) learn implicit representations of - typically static - environments from images. Our paper extends NeRFs to handle dynamic scenes in an online fashion. We propose ParticleNeRF that adapts to changes in the geometry of the environment as they occur, learning a new up-to-date representation every 350 ms. ParticleNeRF can represent the current state of dynamic environments with much higher fidelity as other NeRF frameworks. To achieve this, we introduce a new particle-based parametric encoding, which allows the intermediate NeRF features - now coupled to particles in space - to move with the dynamic geometry. This is possible by backpropagating the photometric reconstruction loss into the position of the particles. The position gradients are interpreted as particle velocities and integrated into positions using a position-based dynamics (PBS) physics system. Introducing PBS into the NeRF formulation allows us to add collision constraints to the particle motion and creates future opportunities to add other movement priors into the system, such as rigid and deformable body

Via

Access Paper or Ask Questions

ParticleNeRF: Particle Based Encoding for Online Neural Radiance Fields in Dynamic Scenes

Nov 08, 2022
Jad Abou-Chakra, Feras Dayoub, Niko Sünderhauf

Figure 1 for ParticleNeRF: Particle Based Encoding for Online Neural Radiance Fields in Dynamic Scenes

Figure 2 for ParticleNeRF: Particle Based Encoding for Online Neural Radiance Fields in Dynamic Scenes

Figure 3 for ParticleNeRF: Particle Based Encoding for Online Neural Radiance Fields in Dynamic Scenes

Figure 4 for ParticleNeRF: Particle Based Encoding for Online Neural Radiance Fields in Dynamic Scenes

Neural Radiance Fields (NeRFs) are coordinate-based implicit representations of 3D scenes that use a differentiable rendering procedure to learn a representation of an environment from images. This paper extends NeRFs to handle dynamic scenes in an online fashion. We do so by introducing a particle-based parametric encoding, which allows the intermediate NeRF features -- now coupled to particles in space -- to be moved with the dynamic geometry. We backpropagate the NeRF's photometric reconstruction loss into the position of the particles in addition to the features they are associated with. The position gradients are interpreted as particle velocities and integrated into positions using a position-based dynamics (PBS) physics system. Introducing PBS into the NeRF formulation allows us to add collision constraints to the particle motion and creates future opportunities to add other movement priors into the system such as rigid and deformable body constraints. We show that by allowing the features to move in space, we incrementally adapt the NeRF to the changing scene.

Via

Access Paper or Ask Questions

Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics

Nov 04, 2022
Krishan Rana, Ming Xu, Brendan Tidd, Michael Milford, Niko Sünderhauf

Figure 1 for Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics

Figure 2 for Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics

Figure 3 for Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics

Figure 4 for Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics

Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning. Skills are typically extracted from expert demonstrations and are embedded into a latent space from which they can be sampled as actions by a high-level RL agent. However, this skill space is expansive, and not all skills are relevant for a given robot state, making exploration difficult. Furthermore, the downstream RL agent is limited to learning structurally similar tasks to those used to construct the skill space. We firstly propose accelerating exploration in the skill space using state-conditioned generative models to directly bias the high-level agent towards only sampling skills relevant to a given state based on prior experience. Next, we propose a low-level residual policy for fine-grained skill adaptation enabling downstream RL agents to adapt to unseen task variations. Finally, we validate our approach across four challenging manipulation tasks that differ from those used to build the skill space, demonstrating our ability to learn across task variations while significantly accelerating exploration, outperforming prior works. Code and videos are available on our project website: https://krishanrana.github.io/reskill.

* 6th Conference on Robot Learning (CoRL), 2022

Via

Access Paper or Ask Questions

Retrospectives on the Embodied AI Workshop

Oct 17, 2022
Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu

Figure 1 for Retrospectives on the Embodied AI Workshop

Figure 2 for Retrospectives on the Embodied AI Workshop

Figure 3 for Retrospectives on the Embodied AI Workshop

Figure 4 for Retrospectives on the Embodied AI Workshop

We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.

Via

Access Paper or Ask Questions

Density-aware NeRF Ensembles: Quantifying Predictive Uncertainty in Neural Radiance Fields

Sep 19, 2022
Niko Sünderhauf, Jad Abou-Chakra, Dimity Miller

Figure 1 for Density-aware NeRF Ensembles: Quantifying Predictive Uncertainty in Neural Radiance Fields

Figure 2 for Density-aware NeRF Ensembles: Quantifying Predictive Uncertainty in Neural Radiance Fields

Figure 3 for Density-aware NeRF Ensembles: Quantifying Predictive Uncertainty in Neural Radiance Fields

Figure 4 for Density-aware NeRF Ensembles: Quantifying Predictive Uncertainty in Neural Radiance Fields

We show that ensembling effectively quantifies model uncertainty in Neural Radiance Fields (NeRFs) if a density-aware epistemic uncertainty term is considered. The naive ensembles investigated in prior work simply average rendered RGB images to quantify the model uncertainty caused by conflicting explanations of the observed scene. In contrast, we additionally consider the termination probabilities along individual rays to identify epistemic model uncertainty due to a lack of knowledge about the parts of a scene unobserved during training. We achieve new state-of-the-art performance across established uncertainty quantification benchmarks for NeRFs, outperforming methods that require complex changes to the NeRF architecture and training regime. We furthermore demonstrate that NeRF uncertainty can be utilised for next-best view selection and model refinement.

Via

Access Paper or Ask Questions

Noisy Inliers Make Great Outliers: Out-of-Distribution Object Detection with Noisy Synthetic Outliers

Aug 29, 2022
Samuel Wilson, Tobias Fischer, Feras Dayoub, Niko Sünderhauf

Figure 1 for Noisy Inliers Make Great Outliers: Out-of-Distribution Object Detection with Noisy Synthetic Outliers

Figure 2 for Noisy Inliers Make Great Outliers: Out-of-Distribution Object Detection with Noisy Synthetic Outliers

Figure 3 for Noisy Inliers Make Great Outliers: Out-of-Distribution Object Detection with Noisy Synthetic Outliers

Figure 4 for Noisy Inliers Make Great Outliers: Out-of-Distribution Object Detection with Noisy Synthetic Outliers

Many high-performing works on out-of-distribution (OOD) detection use real or synthetically generated outlier data to regularise model confidence; however, they often require retraining of the base network or specialised model architectures. Our work demonstrates that Noisy Inliers Make Great Outliers (NIMGO) in the challenging field of OOD object detection. We hypothesise that synthetic outliers need only be minimally perturbed variants of the in-distribution (ID) data in order to train a discriminator to identify OOD samples -- without expensive retraining of the base network. To test our hypothesis, we generate a synthetic outlier set by applying an additive-noise perturbation to ID samples at the image or bounding-box level. An auxiliary feature monitoring multilayer perceptron (MLP) is then trained to detect OOD feature representations using the perturbed ID samples as a proxy. During testing, we demonstrate that the auxiliary MLP distinguishes ID samples from OOD samples at a state-of-the-art level, reducing the false positive rate by more than 20\% (absolute) over the previous state-of-the-art on the OpenImages dataset. Extensive additional ablations provide empirical evidence in support of our hypothesis.

Via

Access Paper or Ask Questions