Jennifer Healey

Adobe

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

Apr 23, 2024
Wanrong Zhu, Jennifer Healey, Ruiyi Zhang, William Yang Wang, Tong Sun

Recent advancements in instruction-following models have made user interactions with models more user-friendly and efficient, broadening their applicability. In graphic design, non-professional users often struggle to create visually appealing layouts due to limited skills and resources. In this work, we introduce a novel multimodal instruction-following framework for layout planning, allowing users to easily arrange visual elements into tailored layouts by specifying canvas size and design purpose, such as for book covers, posters, brochures, or menus. We developed three layout reasoning tasks to train the model in understanding and executing layout instructions. Experiments on two benchmarks show that our method not only simplifies the design process for non-professionals but also surpasses the performance of few-shot GPT-4V models, with mIoU higher by 12% on Crello. This progress highlights the potential of multimodal instruction-following models to automate and simplify the design process, providing an approachable solution for a wide range of design tasks on visually-rich documents.
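The abstract reports results as mIoU (mean intersection over union) between predicted and reference layouts. As a rough illustration only, the sketch below computes IoU for axis-aligned boxes and averages it over paired elements. The box format `(x_min, y_min, x_max, y_max)`, the one-to-one pairing of predicted and reference elements, and the function names `box_iou` and `mean_iou` are assumptions made for this example; the paper's exact evaluation protocol is not described in the text above.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(predicted, reference):
    """Average IoU over paired boxes (assumes element i matches element i)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    total = sum(box_iou(p, r) for p, r in zip(predicted, reference))
    return total / len(predicted)
```

For example, `mean_iou([(0, 0, 2, 2)], [(1, 1, 3, 3)])` scores a single element whose predicted box overlaps its reference box by one unit square out of a union of seven.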

Gaudí: Conversational Interactions with Deep Representations to Generate Image Collections

Dec 05, 2021
Victor S. Bursztyn, Jennifer Healey, Vishwa Vinay

Based on recent advances in realistic language modeling (GPT-3) and cross-modal representations (CLIP), Gaudí was developed to help designers search for inspirational images using natural language. In the early stages of the design process, with the goal of eliciting a client's preferred creative direction, designers will typically create thematic collections of inspirational images called "mood-boards". Creating a mood-board involves sequential image searches which are currently performed using keywords or images. Gaudí transforms this process into a conversation where the user is gradually detailing the mood-board's theme. This representation allows our AI to generate new search queries from scratch, straight from a project briefing, following a theme hypothesized by GPT-3. Compared to previous computational approaches to mood-board creation, to the best of our knowledge, ours is the first attempt to represent mood-boards as the stories that designers tell when presenting a creative direction to a client.

* Accepted at the NeurIPS 2021 Workshop on Machine Learning for Creativity and Design

Multiscale Manifold Warping

Sep 19, 2021
Sridhar Mahadevan, Anup Rao, Georgios Theocharous, Jennifer Healey

Many real-world applications, including bioinformatics, handwriting recognition, activity recognition, and human-robot coordination, require aligning two temporal sequences. Dynamic Time Warping (DTW) is a popular alignment method, but can fail on high-dimensional real-world data where the dimensions of aligned sequences are often unequal. In this paper, we show that exploiting the multiscale manifold latent structure of real-world data can yield improved alignment. We introduce a novel framework called Warping on Wavelets (WOW) that integrates DTW with a multi-scale manifold learning framework called Diffusion Wavelets. We present a theoretical analysis of the WOW family of algorithms and show that it outperforms previous state-of-the-art methods, such as canonical time warping (CTW) and manifold warping, on several real-world datasets.

* 18 pages
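
For readers unfamiliar with the baseline the abstract builds on, here is a minimal sketch of classic Dynamic Time Warping for two one-dimensional sequences. It is not the WOW algorithm and does not involve Diffusion Wavelets; the function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for illustration. The sketch also assumes both sequences share the same (scalar) feature space, which is the restriction the abstract says breaks down on high-dimensional data.

```python
import math


def dtw_distance(a, b):
    """Classic DTW alignment cost between two sequences of numbers."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because one element may align with several elements of the other sequence, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the sequences differ in length.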

"It doesn't look good for a date": Transforming Critiques into Preferences for Conversational Recommendation Systems

Sep 15, 2021
Victor S. Bursztyn, Jennifer Healey, Nedim Lipka, Eunyee Koh, Doug Downey, Larry Birnbaum

Conversations aimed at determining good recommendations are iterative in nature. People often express their preferences in terms of a critique of the current recommendation (e.g., "It doesn't look good for a date"), requiring some degree of common sense for a preference to be inferred. In this work, we present a method for transforming a user critique into a positive preference (e.g., "I prefer more romantic") in order to retrieve reviews pertaining to potentially better recommendations (e.g., "Perfect for a romantic dinner"). We leverage a large neural language model (LM) in a few-shot setting to perform critique-to-preference transformation, and we test two methods for retrieving recommendations: one that matches embeddings, and another that fine-tunes an LM for the task. We instantiate this approach in the restaurant domain and evaluate it using a new dataset of restaurant critiques. In an ablation study, we show that utilizing critique-to-preference transformation improves recommendations, and that there are at least three general cases that explain this improved performance.

* Accepted to EMNLP 2021's main conference

Developing a Conversational Recommendation System for Navigating Limited Options

Apr 13, 2021
Victor S. Bursztyn, Jennifer Healey, Eunyee Koh, Nedim Lipka, Larry Birnbaum

We have developed a conversational recommendation system designed to help users navigate through a set of limited options to find the best choice. Unlike many internet-scale systems that use a single set of search terms and return a ranked list of options from among thousands, our system uses multi-turn user dialog to deeply understand the user's preferences. The system responds in context to the user's specific and immediate feedback to make sequential recommendations. We envision our system would be highly useful in situations with intrinsic constraints, such as finding the right restaurant within walking distance or the right retail item within a limited inventory. Our research prototype instantiates the former use case, leveraging real data from Google Places, Yelp, and Zomato. We evaluated our system against a similar system that did not incorporate user feedback in a 16-person remote study, generating 64 scenario-based search journeys. When our recommendation system was successfully triggered, we saw both an increase in efficiency and a higher confidence rating with respect to final user choice. We also found that users preferred our system (75%) compared with the baseline.

* 7 pages, 4 figures, to appear in CHI 2021 as a Late Breaking Work, see "https://chi2021.acm.org/"
