Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sarah Parisot

MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

Apr 03, 2024
Petru-Daniel Tudosiu, Yongxin Yang, Shifeng Zhang, Fei Chen, Steven McDonagh, Gerasimos Lampouras, Ignacio Iacobacci, Sarah Parisot

Text-to-image generation has achieved astonishing results, yet precise spatial controllability and prompt fidelity remain highly challenging. This limitation is typically addressed through cumbersome prompt engineering, scene layout conditioning, or image editing techniques which often require hand drawn masks. Nonetheless, pre-existing works struggle to take advantage of the natural instance-level compositionality of scenes due to the typically flat nature of rasterized RGB output images. Towards adressing this challenge, we introduce MuLAn: a novel dataset comprising over 44K MUlti-Layer ANnotations of RGB images as multilayer, instance-wise RGBA decompositions, and over 100K instance images. To build MuLAn, we developed a training free pipeline which decomposes a monocular RGB image into a stack of RGBA layers comprising of background and isolated instances. We achieve this through the use of pretrained general-purpose models, and by developing three modules: image decomposition for instance discovery and extraction, instance completion to reconstruct occluded areas, and image re-assembly. We use our pipeline to create MuLAn-COCO and MuLAn-LAION datasets, which contain a variety of image decompositions in terms of style, composition and complexity. With MuLAn, we provide the first photorealistic resource providing instance decomposition and occlusion information for high quality images, opening up new avenues for text-to-image generative AI research. With this, we aim to encourage the development of novel generation and editing technology, in particular layer-wise solutions. MuLAn data resources are available at https://MuLAn-dataset.github.io/.

* CVPR 2024 - Project page: https://MuLAn-dataset.github.io/

Via

Access Paper or Ask Questions

Optimisation-Based Multi-Modal Semantic Image Editing

Nov 28, 2023
Bowen Li, Yongxin Yang, Steven McDonagh, Shifeng Zhang, Petru-Daniel Tudosiu, Sarah Parisot

Image editing affords increased control over the aesthetics and content of generated images. Pre-existing works focus predominantly on text-based instructions to achieve desired image modifications, which limit edit precision and accuracy. In this work, we propose an inference-time editing optimisation, designed to extend beyond textual edits to accommodate multiple editing instruction types (e.g. spatial layout-based; pose, scribbles, edge maps). We propose to disentangle the editing task into two competing subtasks: successful local image modifications and global content consistency preservation, where subtasks are guided through two dedicated loss functions. By allowing to adjust the influence of each loss function, we build a flexible editing solution that can be adjusted to user preferences. We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits, through both qualitative and quantitative experiments.

Via

Access Paper or Ask Questions

Learning to Name Classes for Vision and Language Models

Apr 04, 2023
Sarah Parisot, Yongxin Yang, Steven McDonagh

Figure 1 for Learning to Name Classes for Vision and Language Models

Figure 2 for Learning to Name Classes for Vision and Language Models

Figure 3 for Learning to Name Classes for Vision and Language Models

Figure 4 for Learning to Name Classes for Vision and Language Models

Large scale vision and language models can achieve impressive zero-shot recognition performance by mapping class specific text queries to image content. Two distinct challenges that remain however, are high sensitivity to the choice of handcrafted class names that define queries, and the difficulty of adaptation to new, smaller datasets. Towards addressing these problems, we propose to leverage available data to learn, for each class, an optimal word embedding as a function of the visual content. By learning new word embeddings on an otherwise frozen model, we are able to retain zero-shot capabilities for new classes, easily adapt models to new datasets, and adjust potentially erroneous, non-descriptive or ambiguous class names. We show that our solution can easily be integrated in image classification and object detection pipelines, yields significant performance gains in multiple scenarios and provides insights into model biases and labelling errors.

* Accepted to CVPR 2023

Via

Access Paper or Ask Questions

Content-Diverse Comparisons improve IQA

Nov 09, 2022
William Thong, Jose Costa Pereira, Sarah Parisot, Ales Leonardis, Steven McDonagh

Figure 1 for Content-Diverse Comparisons improve IQA

Figure 2 for Content-Diverse Comparisons improve IQA

Figure 3 for Content-Diverse Comparisons improve IQA

Figure 4 for Content-Diverse Comparisons improve IQA

Image quality assessment (IQA) forms a natural and often straightforward undertaking for humans, yet effective automation of the task remains highly challenging. Recent metrics from the deep learning community commonly compare image pairs during training to improve upon traditional metrics such as PSNR or SSIM. However, current comparisons ignore the fact that image content affects quality assessment as comparisons only occur between images of similar content. This restricts the diversity and number of image pairs that the model is exposed to during training. In this paper, we strive to enrich these comparisons with content diversity. Firstly, we relax comparison constraints, and compare pairs of images with differing content. This increases the variety of available comparisons. Secondly, we introduce listwise comparisons to provide a holistic view to the model. By including differentiable regularizers, derived from correlation coefficients, models can better adjust predicted scores relative to one another. Evaluation on multiple benchmarks, covering a wide range of distortions and image content, shows the effectiveness of our learning scheme for training image quality assessment models.

* Accepted at British Machine Vision Conference (BMVC) 2022

Via

Access Paper or Ask Questions

CLAD: A realistic Continual Learning benchmark for Autonomous Driving

Oct 07, 2022
Eli Verwimp, Kuo Yang, Sarah Parisot, Hong Lanqing, Steven McDonagh, Eduardo Pérez-Pellitero, Matthias De Lange, Tinne Tuytelaars

Figure 1 for CLAD: A realistic Continual Learning benchmark for Autonomous Driving

Figure 2 for CLAD: A realistic Continual Learning benchmark for Autonomous Driving

Figure 3 for CLAD: A realistic Continual Learning benchmark for Autonomous Driving

Figure 4 for CLAD: A realistic Continual Learning benchmark for Autonomous Driving

In this paper we describe the design and the ideas motivating a new Continual Learning benchmark for Autonomous Driving (CLAD), that focuses on the problems of object classification and object detection. The benchmark utilises SODA10M, a recently released large-scale dataset that concerns autonomous driving related problems. First, we review and discuss existing continual learning benchmarks, how they are related, and show that most are extreme cases of continual learning. To this end, we survey the benchmarks used in continual learning papers at three highly ranked computer vision conferences. Next, we introduce CLAD-C, an online classification benchmark realised through a chronological data stream that poses both class and domain incremental challenges; and CLAD-D, a domain incremental continual object detection benchmark. We examine the inherent difficulties and challenges posed by the benchmark, through a survey of the techniques and methods used by the top-3 participants in a CLAD-challenge workshop at ICCV 2021. We conclude with possible pathways to improve the current continual learning state of the art, and which directions we deem promising for future research.

Via

Access Paper or Ask Questions

Re-examining Distillation For Continual Object Detection

Apr 04, 2022
Eli Verwimp, Kuo Yang, Sarah Parisot, Hong Lanqing, Steven McDonagh, Eduardo Pérez-Pellitero, Matthias De Lange, Tinne Tuytelaars

Figure 1 for Re-examining Distillation For Continual Object Detection

Figure 2 for Re-examining Distillation For Continual Object Detection

Figure 3 for Re-examining Distillation For Continual Object Detection

Figure 4 for Re-examining Distillation For Continual Object Detection

Training models continually to detect and classify objects, from new classes and new domains, remains an open problem. In this work, we conduct a thorough analysis of why and how object detection models forget catastrophically. We focus on distillation-based approaches in two-stage networks; the most-common strategy employed in contemporary continual object detection work.Distillation aims to transfer the knowledge of a model trained on previous tasks -- the teacher -- to a new model -- the student -- while it learns the new task. We show that this works well for the region proposal network, but that wrong, yet overly confident teacher predictions prevent student models from effective learning of the classification head. Our analysis provides a foundation that allows us to propose improvements for existing techniques by detecting incorrect teacher predictions, based on current ground-truth labels, and by employing an adaptive Huber loss as opposed to the mean squared error for the distillation loss in the classification heads. We evidence that our strategy works not only in a class incremental setting, but also in domain incremental settings, which constitute a realistic context, likely to be the setting of representative real-world problems.

Via

Access Paper or Ask Questions

Long-tail Recognition via Compositional Knowledge Transfer

Dec 13, 2021
Sarah Parisot, Pedro M. Esperanca, Steven McDonagh, Tamas J. Madarasz, Yongxin Yang, Zhenguo Li

Figure 1 for Long-tail Recognition via Compositional Knowledge Transfer

Figure 2 for Long-tail Recognition via Compositional Knowledge Transfer

Figure 3 for Long-tail Recognition via Compositional Knowledge Transfer

Figure 4 for Long-tail Recognition via Compositional Knowledge Transfer

In this work, we introduce a novel strategy for long-tail recognition that addresses the tail classes' few-shot problem via training-free knowledge transfer. Our objective is to transfer knowledge acquired from information-rich common classes to semantically similar, and yet data-hungry, rare classes in order to obtain stronger tail class representations. We leverage the fact that class prototypes and learned cosine classifiers provide two different, complementary representations of class cluster centres in feature space, and use an attention mechanism to select and recompose learned classifier features from common classes to obtain higher quality rare class representations. Our knowledge transfer process is training free, reducing overfitting risks, and can afford continual extension of classifiers to new classes. Experiments show that our approach can achieve significant performance boosts on rare classes while maintaining robust common class performance, outperforming directly comparable state-of-the-art models.

Via

Access Paper or Ask Questions

Learning Compositional Shape Priors for Few-Shot 3D Reconstruction

Jun 16, 2021
Mateusz Michalkiewicz, Stavros Tsogkas, Sarah Parisot, Mahsa Baktashmotlagh, Anders Eriksson, Eugene Belilovsky

Figure 1 for Learning Compositional Shape Priors for Few-Shot 3D Reconstruction

Figure 2 for Learning Compositional Shape Priors for Few-Shot 3D Reconstruction

Figure 3 for Learning Compositional Shape Priors for Few-Shot 3D Reconstruction

Figure 4 for Learning Compositional Shape Priors for Few-Shot 3D Reconstruction

The impressive performance of deep convolutional neural networks in single-view 3D reconstruction suggests that these models perform non-trivial reasoning about the 3D structure of the output space. Recent work has challenged this belief, showing that, on standard benchmarks, complex encoder-decoder architectures perform similarly to nearest-neighbor baselines or simple linear decoder models that exploit large amounts of per-category data. However, building large collections of 3D shapes for supervised training is a laborious process; a more realistic and less constraining task is inferring 3D shapes for categories with few available training examples, calling for a model that can successfully generalize to novel object classes. In this work we experimentally demonstrate that naive baselines fail in this few-shot learning setting, in which the network must learn informative shape priors for inference of new categories. We propose three ways to learn a class-specific global shape prior, directly from data. Using these techniques, we are able to capture multi-scale information about the 3D shape, and account for intra-class variability by virtue of an implicit compositional structure. Experiments on the popular ShapeNet dataset show that our method outperforms a zero-shot baseline by over 40%, and the current state-of-the-art by over 10%, in terms of relative performance, in the few-shot setting.

* 13 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2004.06302

Via

Access Paper or Ask Questions

Probabilistic 3D surface reconstruction from sparse MRI information

Oct 05, 2020
Katarína Tóthová, Sarah Parisot, Matthew Lee, Esther Puyol-Antón, Andrew King, Marc Pollefeys, Ender Konukoglu

Figure 1 for Probabilistic 3D surface reconstruction from sparse MRI information

Figure 2 for Probabilistic 3D surface reconstruction from sparse MRI information

Figure 3 for Probabilistic 3D surface reconstruction from sparse MRI information

Figure 4 for Probabilistic 3D surface reconstruction from sparse MRI information

Surface reconstruction from magnetic resonance (MR) imaging data is indispensable in medical image analysis and clinical research. A reliable and effective reconstruction tool should: be fast in prediction of accurate well localised and high resolution models, evaluate prediction uncertainty, work with as little input data as possible. Current deep learning state of the art (SOTA) 3D reconstruction methods, however, often only produce shapes of limited variability positioned in a canonical position or lack uncertainty evaluation. In this paper, we present a novel probabilistic deep learning approach for concurrent 3D surface reconstruction from sparse 2D MR image data and aleatoric uncertainty prediction. Our method is capable of reconstructing large surface meshes from three quasi-orthogonal MR imaging slices from limited training sets whilst modelling the location of each mesh vertex through a Gaussian distribution. Prior shape information is encoded using a built-in linear principal component analysis (PCA) model. Extensive experiments on cardiac MR data show that our probabilistic approach successfully assesses prediction uncertainty while at the same time qualitatively and quantitatively outperforms SOTA methods in shape prediction. Compared to SOTA, we are capable of properly localising and orientating the prediction via the use of a spatially aware neural network.

* MICCAI 2020

Via

Access Paper or Ask Questions

Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection

Aug 26, 2020
Carlo Biffi, Steven McDonagh, Philip Torr, Ales Leonardis, Sarah Parisot

Figure 1 for Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection

Figure 2 for Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection

Figure 3 for Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection

Figure 4 for Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection

Object detection has witnessed significant progress by relying on large, manually annotated datasets. Annotating such datasets is highly time consuming and expensive, which motivates the development of weakly supervised and few-shot object detection methods. However, these methods largely underperform with respect to their strongly supervised counterpart, as weak training signals \emph{often} result in partial or oversized detections. Towards solving this problem we introduce, for the first time, an online annotation module (OAM) that learns to generate a many-shot set of \emph{reliable} annotations from a larger volume of weakly labelled images. Our OAM can be jointly trained with any fully supervised two-stage object detection method, providing additional training annotations on the fly. This results in a fully end-to-end strategy that only requires a low-shot set of fully annotated images. The integration of the OAM with Fast(er) R-CNN improves their performance by $17\%$ mAP, $9\%$ AP50 on PASCAL VOC 2007 and MS-COCO benchmarks, and significantly outperforms competing methods using mixed supervision.

* Accepted at ECCV 2020. Camera-ready version and Appendices

Via

Access Paper or Ask Questions