David A. Ross

SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

Mar 02, 2024
Via arXiv

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024
Via arXiv

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Oct 09, 2023
Via arXiv

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Jul 03, 2023
Via arXiv

$IC^3$: Image Captioning by Committee Consensus

Feb 16, 2023
Via arXiv

Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

Dec 20, 2022
Via arXiv

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Dec 10, 2022
Via arXiv

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

May 12, 2022
Via arXiv

Learn to Dance with AIST++: Music Conditioned 3D Dance Generation

Feb 02, 2021
Via arXiv

Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

Jul 29, 2020
Via arXiv