Yonatan Bitton

TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

May 07, 2024
Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang

Via arXiv

ImageInWords: Unlocking Hyper-Detailed Image Descriptions

May 05, 2024
Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut

Via arXiv

DOCCI: Descriptions of Connected and Contrasting Images

Apr 30, 2024
Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason Baldridge

Via arXiv

ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies

Mar 02, 2024
Oren Sultan, Yonatan Bitton, Ron Yosef, Dafna Shahaf

Via arXiv

A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

Feb 02, 2024
Alon Jacovi, Yonatan Bitton, Bernd Bohnet, Jonathan Herzig, Or Honovich, Michael Tseng, Michael Collins, Roee Aharoni, Mor Geva

Via arXiv

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

Dec 05, 2023
Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor

Via arXiv

VideoCon: Robust Video-Language Alignment via Contrast Captions

Nov 15, 2023
Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, Aditya Grover

Via arXiv

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

Aug 12, 2023
Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schmidt

Via arXiv

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Aug 07, 2023
Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt

Via arXiv

Read, Look or Listen? What's Needed for Solving a Multimodal Dataset

Jul 06, 2023
Netta Madvil, Yonatan Bitton, Roy Schwartz

Via arXiv