Ligong Han

Score-Guided Diffusion for 3D Human Recovery

Mar 14, 2024
Anastasis Stathopoulos, Ligong Han, Dimitris Metaxas

We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction. These inverse problems involve fitting a human body model to image observations, a task traditionally solved through optimization techniques. ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model. The diffusion model is trained to capture the conditional distribution of the human model parameters given an input image. By guiding its denoising process with a task-specific score, ScoreHMR effectively solves inverse problems for various applications without the need to retrain the task-agnostic diffusion model. We evaluate our approach in three settings: (i) single-frame model fitting; (ii) reconstruction from multiple uncalibrated views; and (iii) reconstructing humans in video sequences. ScoreHMR consistently outperforms all optimization baselines on popular benchmarks across all settings. We make our code and models available at https://statho.github.io/ScoreHMR.

* CVPR 2024 (project page: https://statho.github.io/ScoreHMR)
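The score guidance described in the ScoreHMR abstract can be illustrated with a toy sketch. It is not ScoreHMR's implementation: the "denoiser" below is the closed-form optimum for a standard-normal prior (a hypothetical stand-in for the trained model over body parameters), the observation is a linear map `A @ x = y` rather than 2D keypoint reprojection, and the task-loss gradient is applied to the clean estimate inside a DDIM step, a simplification of guiding the score. `toy_eps_model` and `guided_ddim` are hypothetical names.

```python
import numpy as np

def toy_eps_model(x_t, alpha_bar):
    # Hypothetical stand-in for the trained denoiser: for a standard-normal
    # prior, the optimal noise prediction is sqrt(1 - alpha_bar) * x_t.
    return np.sqrt(1.0 - alpha_bar) * x_t

def guided_ddim(A, y, steps=50, guidance=0.5, seed=0):
    """Deterministic DDIM sampling that nudges each clean estimate toward A @ x = y."""
    rng = np.random.default_rng(seed)
    alpha_bars = np.linspace(1e-3, 0.999, steps)  # from noisy to nearly clean
    x = rng.standard_normal(A.shape[1])           # start from pure noise
    for i, ab in enumerate(alpha_bars):
        eps = toy_eps_model(x, ab)
        x0_hat = (x - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
        if guidance > 0:
            # Gradient step on the task loss 0.5 * ||A @ x0 - y||^2, then
            # re-derive the noise estimate consistent with the corrected x0.
            x0_hat = x0_hat - guidance * A.T @ (A @ x0_hat - y)
            eps = (x - np.sqrt(ab) * x0_hat) / np.sqrt(1.0 - ab)
        ab_next = alpha_bars[i + 1] if i + 1 < steps else 1.0
        x = np.sqrt(ab_next) * x0_hat + np.sqrt(1.0 - ab_next) * eps
    return x
```

Setting `guidance=0.0` recovers plain prior sampling, which makes it easy to compare fitted and unfitted results. The denoiser is never retrained; only the sampling loop changes, mirroring the task-agnostic design emphasized above.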

DMCVR: Morphology-Guided Diffusion Model for 3D Cardiac Volume Reconstruction

Aug 18, 2023
Xiaoxiao He, Chaowei Tan, Ligong Han, Bo Liu, Leon Axel, Kang Li, Dimitris N. Metaxas

Accurate 3D cardiac reconstruction from cine magnetic resonance imaging (cMRI) is crucial for improved cardiovascular disease diagnosis and understanding of the heart's motion. However, current cardiac MRI-based reconstruction technology used in clinical settings is 2D with limited through-plane resolution, resulting in low-quality reconstructed cardiac volumes. To better reconstruct 3D cardiac volumes from sparse 2D image stacks, we propose DMCVR, a morphology-guided diffusion model for 3D cardiac volume reconstruction that synthesizes high-resolution 2D images and corresponding 3D reconstructed volumes. Our method outperforms previous approaches by conditioning the generative model on cardiac morphology, which eliminates the time-consuming iterative optimization of the latent code and improves generation quality. The learned latent spaces capture the global semantics, local cardiac morphology, and details of each 2D cMRI slice, and are highly interpretable and useful for reconstructing 3D cardiac shape. Our experiments show that DMCVR is highly effective in several aspects, such as 2D generation and 3D reconstruction performance. With DMCVR, we can produce high-resolution 3D cardiac MRI reconstructions, surpassing current techniques. Our proposed framework has great potential for improving the accuracy of cardiac disease diagnosis and treatment planning. Code can be accessed at https://github.com/hexiaoxiao-cs/DMCVR.

* Accepted in MICCAI 2023

Improving Tuning-Free Real Image Editing with Proximal Guidance

Jun 29, 2023
Ligong Han, Song Wen, Qi Chen, Zhixing Zhang, Kunpeng Song, Mengwei Ren, Ruijiang Gao, Yuxiao Chen, Di Liu, Qilong Zhangli, Anastasis Stathopoulos, Jindong Jiang, Zhaoyang Xia, Akash Srivastava, Dimitris Metaxas

DDIM inversion has revealed the remarkable potential of real image editing within diffusion-based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier-free guidance (CFG) scales are used for enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories at larger CFG scales, enabling real image editing with cross-attention control. Negative-prompt inversion (NPI) further offers a training-free, closed-form solution to NTI. However, it may introduce artifacts and is still constrained by DDIM reconstruction quality. To overcome these limitations, we propose proximal guidance and incorporate it into NPI with cross-attention control. We enhance NPI with a regularization term and reconstruction guidance, which reduces artifacts while retaining its training-free nature. Additionally, we extend these concepts to mutual self-attention control, enabling geometry and layout alterations in the editing process. Our method provides an efficient and straightforward approach to real image editing with minimal computational overhead.

* Added an extension of proximal guidance to MasaCtrl
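As a rough illustration of where proximal guidance acts in the Proximal Guidance paper above, the sketch below computes a classifier-free-guided noise estimate and optionally passes the guidance direction through a proximal operator before scaling it. The choice of element-wise soft thresholding (the proximal operator of an L1 penalty) and the names `soft_threshold` and `guided_eps` are assumptions for illustration; the abstract does not specify which operator is used. Under NPI, `eps_uncond` would be the prediction conditioned on the source-prompt embedding instead of the null-text embedding.

```python
import numpy as np

def soft_threshold(v, lam):
    # Proximal operator of lam * ||v||_1: shrinks every entry toward zero by lam
    # and sets entries with magnitude below lam exactly to zero.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def guided_eps(eps_uncond, eps_cond, scale, lam=0.0):
    # Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond).
    # With lam > 0 the guidance direction first passes through the L1 prox
    # (an assumed regularizer), so small differences are not amplified.
    delta = eps_cond - eps_uncond
    if lam > 0:
        delta = soft_threshold(delta, lam)
    return eps_uncond + scale * delta
```

With `lam=0.0` this reduces to ordinary CFG, so the regularizer can be switched on without changing the rest of the sampling loop.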

Learning Articulated Shape with Keypoint Pseudo-labels from Web Images

Apr 27, 2023
Anastasis Stathopoulos, Georgios Pavlakos, Ligong Han, Dimitris Metaxas

This paper shows that it is possible to learn models for monocular 3D reconstruction of articulated objects (e.g., horses, cows, sheep), using as few as 50-150 images labeled with 2D keypoints. Our proposed approach involves training category-specific keypoint estimators, generating 2D keypoint pseudo-labels on unlabeled web images, and using both the labeled and self-labeled sets to train 3D reconstruction models. It is based on two key insights: (1) 2D keypoint estimation networks trained on as few as 50-150 images of a given object category generalize well and generate reliable pseudo-labels; (2) a data selection mechanism can automatically create a "curated" subset of the unlabeled web images that can be used for training -- we evaluate four data selection methods. Coupling these two insights enables us to train models that effectively utilize web images, resulting in improved 3D reconstruction performance for several articulated object categories beyond the fully-supervised baseline. Our approach can quickly bootstrap a model and requires only a few images labeled with 2D keypoints. This requirement can be easily satisfied for any new object category. To showcase the practicality of our approach for predicting the 3D shape of arbitrary object categories, we annotate 2D keypoints on giraffe and bear images from COCO -- the annotation process takes less than 1 minute per image.

* CVPR 2023 (project page: https://statho.github.io/projects/animals3d/index.html)
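One simple way to build a "curated" subset from keypoint pseudo-labels, as described in the abstract above, is to rank web images by the confidence of their predicted keypoints. The sketch below is hypothetical: the abstract mentions four data selection methods without detailing them, and the thresholds and the function `select_pseudo_labeled` are illustrative assumptions, not the paper's criteria.

```python
from statistics import mean

def select_pseudo_labeled(predictions, conf_thresh=0.5, min_visible=8, top_k=None):
    """Keep web images whose predicted 2D keypoints look reliable.

    predictions: dict mapping image_id -> list of (x, y, confidence) per keypoint.
    Returns image ids sorted from most to least confident.
    """
    scored = []
    for image_id, kps in predictions.items():
        confident = [c for _, _, c in kps if c >= conf_thresh]
        if len(confident) < min_visible:
            continue  # too few trustworthy keypoints to supervise 3D shape
        scored.append((mean(c for _, _, c in kps), image_id))
    scored.sort(reverse=True)  # most confident images first
    if top_k is not None:
        scored = scored[:top_k]
    return [image_id for _, image_id in scored]
```

The selected ids would then be combined with the small hand-labeled set to train the 3D reconstruction model.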

Constructive Assimilation: Boosting Contrastive Learning Performance through View Generation Strategies

Apr 08, 2023
Ligong Han, Seungwook Han, Shivchander Sudalairaj, Charlotte Loh, Rumen Dangovski, Fei Deng, Pulkit Agrawal, Dimitris Metaxas, Leonid Karlinsky, Tsui-Wei Weng, Akash Srivastava

Transformations based on domain expertise (expert transformations), such as random-resized-crop and color-jitter, have proven critical to the success of contrastive learning techniques such as SimCLR. Recently, several attempts have been made to replace such domain-specific, human-designed transformations with generated views that are learned. However, for image data, none of these view-generation methods has so far been able to outperform expert transformations. In this work, we tackle a different question: instead of replacing expert transformations with generated views, can we constructively assimilate generated views with expert transformations? We answer this question in the affirmative and propose a view generation method and a simple, effective assimilation method that together improve on the state of the art by up to ~3.6% on three different datasets. Importantly, we conduct a detailed empirical study that systematically analyzes a range of view generation and assimilation methods and provides a holistic picture of the efficacy of learned views in contrastive representation learning.

* Accepted at Generative Models for Computer Vision Workshop 2023
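The sketch below shows the SimCLR-style NT-Xent objective that the views discussed in Constructive Assimilation feed into, with one view of each image assumed to come from expert transformations and the other from a generator. Pairing views this way is an assumption for illustration; the abstract does not detail the assimilation method, and `nt_xent` is a hypothetical helper.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for N positive pairs (z1[i], z2[i])."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # Index of each sample's positive partner in the concatenated batch.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-softmax over each row.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

In practice `z1` and `z2` would be projection-head outputs for an expert-augmented view and a generated view of the same batch of images.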

SVDiff: Compact Parameter Space for Diffusion Fine-Tuning

Mar 22, 2023
Ligong Han, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, Feng Yang

Diffusion models have achieved remarkable success in text-to-image generation, enabling the creation of high-quality images from text prompts or other modalities. However, existing methods for customizing these models struggle to handle multiple personalized subjects and are prone to overfitting. Moreover, their large number of parameters makes model storage inefficient. In this paper, we propose a novel approach to address these limitations in existing text-to-image diffusion models for personalization. Our method involves fine-tuning the singular values of the weight matrices, leading to a compact and efficient parameter space that reduces the risk of overfitting and language drift. We also propose a Cut-Mix-Unmix data-augmentation technique to enhance the quality of multi-subject image generation, as well as a simple text-based image editing framework. Our proposed SVDiff method has a significantly smaller model size (1.7MB for StableDiffusion) than existing methods (vanilla DreamBooth 3.66GB, Custom Diffusion 73MB), making it more practical for real-world applications.

* 20 pages, 21 figures
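The core idea of SVDiff, fine-tuning only singular values, can be sketched as a layer that freezes the SVD factors of a pretrained weight matrix and learns a shift of its singular values. The additive shift clamped at zero and the class name `SpectralShiftLinear` are assumptions of this sketch; the abstract states only that the singular values are fine-tuned.

```python
import numpy as np

class SpectralShiftLinear:
    """Linear layer with W = U diag(relu(sigma + delta)) V^T; only delta is trainable."""

    def __init__(self, weight):
        # Frozen SVD factors of the pretrained weight matrix.
        self.u, self.sigma, self.vt = np.linalg.svd(weight, full_matrices=False)
        self.delta = np.zeros_like(self.sigma)  # the only fine-tuned parameters

    def weight(self):
        shifted = np.maximum(self.sigma + self.delta, 0.0)  # keep singular values >= 0
        return (self.u * shifted) @ self.vt  # scales column k of U by shifted[k]

    def __call__(self, x):
        return x @ self.weight().T
```

For an m-by-n weight matrix this trains min(m, n) values instead of m * n, which is where the compact parameter space comes from.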

Learning Complementary Policies for Human-AI Teams

Feb 06, 2023
Ruijiang Gao, Maytal Saar-Tsechansky, Maria De-Arteaga, Ligong Han, Wei Sun, Min Kyung Lee, Matthew Lease

Human-AI complementarity is important when neither the algorithm nor the human yields dominant performance across all instances in a given context. Recent work that explored human-AI collaboration has considered decisions that correspond to classification tasks. However, in many important contexts where humans can benefit from AI complementarity, humans undertake courses of action. In this paper, we propose a framework for a novel human-AI collaboration for selecting advantageous courses of action, which we refer to as Learning Complementary Policy for Human-AI teams (LCP-HAI). Our solution aims to exploit human-AI complementarity to maximize decision rewards by learning both an algorithmic policy that aims to complement humans and a routing model that defers decisions to either a human or the AI to leverage the resulting complementarity. We then extend our approach to leverage opportunities and mitigate risks that arise in important contexts in practice: 1) when a team is composed of multiple humans with differential and potentially complementary abilities, 2) when the observational data includes consistent deterministic actions, and 3) when the covariate distribution of future decisions differs from that in the historical data. We demonstrate the effectiveness of our proposed methods using real human responses and semi-synthetic data, and find that our methods offer reliable and advantageous performance across settings, outperforming either the human or the AI making decisions on their own. We also find that the proposed extensions effectively improve the robustness of human-AI collaboration performance in different challenging settings.

* Previous name: Robust Human-AI Collaboration with Bandit Feedback; Best student paper award at Conference on Information Systems and Technology (CIST), 2022
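The routing idea in LCP-HAI can be illustrated with a minimal sketch that, given per-instance reward estimates for the algorithmic policy and for a human, defers each decision to whichever is expected to do better. In the paper both the policy and the router are learned from observational data; here the reward estimates are simply supplied, and `Instance`, `route`, `team_reward`, and the optional `deferral_cost` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    ai_reward: float     # estimated reward if the algorithmic policy acts
    human_reward: float  # estimated reward if the decision is deferred to a human

def route(instances, deferral_cost=0.0):
    """Return 'ai' or 'human' per instance, whichever has higher expected reward."""
    return ["human" if inst.human_reward - deferral_cost > inst.ai_reward else "ai"
            for inst in instances]

def team_reward(instances, decisions):
    """Total reward realized by following the given routing decisions."""
    return sum(inst.human_reward if d == "human" else inst.ai_reward
               for inst, d in zip(instances, decisions))
```

With `deferral_cost=0` and accurate estimates, per-instance routing can never do worse than letting either party decide alone, which is the complementarity the abstract targets.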

Diffusion Guided Domain Adaptation of Image Generators

Dec 09, 2022
Kunpeng Song, Ligong Han, Bingchen Liu, Dimitris Metaxas, Ahmed Elgammal

Can a text-to-image diffusion model be used as a training objective for adapting a GAN generator to another domain? In this paper, we show that classifier-free guidance can be leveraged as a critic, enabling generators to distill knowledge from large-scale text-to-image diffusion models. Generators can be efficiently shifted into new domains indicated by text prompts without access to ground-truth samples from the target domains. We demonstrate the effectiveness and controllability of our method through extensive experiments. Although not trained to minimize CLIP loss, our model achieves equally high CLIP scores and significantly lower FID than prior work on short prompts, and outperforms the baseline qualitatively and quantitatively on long and complicated prompts. To the best of our knowledge, the proposed method is the first attempt at incorporating large-scale pre-trained diffusion models and distillation sampling for text-driven image generator domain adaptation, and it achieves a quality that was previously not possible. Moreover, we extend our work to 3D-aware style-based generators and DreamBooth guidance.

* Project website: https://styleganfusion.github.io/
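Using a diffusion model as a training objective, as in the domain adaptation paper above, typically relies on score distillation: the generator's output is noised, the diffusion model predicts the noise, and the residual is backpropagated as a gradient. The sketch below is a toy under stated assumptions: the "generator" is just a parameter vector, the "diffusion model" is the exact noise predictor for a point-mass target (a hypothetical stand-in for a text-conditioned model, whose classifier-free-guided prediction would take the place of `toy_eps_pred`), and the weighting `1 - alpha_bar` is an illustrative choice.

```python
import numpy as np

def toy_eps_pred(x_t, alpha_bar, target):
    # Exact noise prediction when the "text-conditioned" data distribution is a
    # point mass at `target`: x_t = sqrt(ab) * target + sqrt(1 - ab) * eps.
    return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)

def sds_step(theta, target, rng, lr=0.05):
    """One score-distillation update for a toy generator whose output is x = theta."""
    alpha_bar = rng.uniform(0.05, 0.95)        # random noise level
    eps = rng.standard_normal(theta.shape)
    x_t = np.sqrt(alpha_bar) * theta + np.sqrt(1.0 - alpha_bar) * eps
    weight = 1.0 - alpha_bar                   # illustrative weighting w(t)
    # Residual between predicted and injected noise; dx/dtheta is the identity here.
    grad = weight * (toy_eps_pred(x_t, alpha_bar, target) - eps)
    return theta - lr * grad
```

For this toy denoiser the sampled noise cancels, so each update is a scaled step of `theta` toward `target`; with a real model the residual is noisy and the gradient is averaged over many training steps.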

SINE: SINgle Image Editing with Text-to-Image Diffusion Models

Dec 08, 2022
Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, Jian Ren

Recent works on diffusion models have demonstrated a strong capability for conditional image generation, e.g., text-guided image synthesis. Such success inspires many efforts to use large-scale pre-trained diffusion models to tackle a challenging problem: real image editing. Existing works in this area learn a unique textual token corresponding to several images containing the same object. However, in many circumstances only one image is available, such as the painting Girl with a Pearl Earring. Fine-tuning pre-trained diffusion models on a single image with existing approaches causes severe overfitting. Information leakage from the pre-trained diffusion models prevents edits from keeping the content of the given image while creating the new features described by the language guidance. This work aims to address the problem of single-image editing. We propose a novel model-based guidance built upon classifier-free guidance so that the knowledge from the model trained on a single image can be distilled into the pre-trained diffusion model, enabling content creation even with one given image. Additionally, we propose patch-based fine-tuning that can effectively help the model generate images of arbitrary resolution. We provide extensive experiments to validate the design choices of our approach and show promising editing capabilities, including changing style, content addition, and object manipulation. The code is available for research purposes at https://github.com/zhang-zx/SINE.git.

* Project website: https://zhang-zx.github.io/SINE/

On the Importance of Calibration in Semi-supervised Learning

Oct 10, 2022
Charlotte Loh, Rumen Dangovski, Shivchander Sudalairaj, Seungwook Han, Ligong Han, Leonid Karlinsky, Marin Soljacic, Akash Srivastava

State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data by combining consistency regularization and pseudo-labeling. During pseudo-labeling, the model's predictions on unlabeled data are used for training, so model calibration is important for mitigating confirmation bias. Yet many SOTA methods are optimized for model performance, with little focus on improving model calibration. In this work, we empirically demonstrate that model calibration is strongly correlated with model performance and propose to improve calibration via approximate Bayesian techniques. We introduce a family of new SSL models that optimize for calibration and demonstrate their effectiveness across the standard vision benchmarks CIFAR-10, CIFAR-100, and ImageNet, giving up to 15.9% improvement in test accuracy. Furthermore, we demonstrate their effectiveness on additional realistic and challenging problems, such as class-imbalanced datasets and photonics science.

* 24 pages
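Calibration, as discussed in the abstract above, is commonly quantified with the expected calibration error (ECE), which compares average confidence with accuracy inside confidence bins. The function below is a standard equal-width-bin ECE written for illustration; whether the paper measures calibration with this exact binning is not stated in the abstract, and `expected_calibration_error` is a hypothetical helper name.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence and average |accuracy - confidence|,
    weighting each bin by the fraction of samples it holds."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```

A model that predicts 0.9 confidence but is right only half the time has an ECE of 0.4 on those samples; its pseudo-labels would be trusted more than they deserve, which is the confirmation-bias risk described above.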
