Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chun Yu

GestureGPT: Zero-shot Interactive Gesture Understanding and Grounding with Large Language Model Agents

Oct 30, 2023
Xin Zeng, Xiaoyu Wang, Tengxiang Zhang, Chun Yu, Shengdong Zhao, Yiqiang Chen

Current gesture recognition systems primarily focus on identifying gestures within a predefined set, leaving a gap in connecting these gestures to interactive GUI elements or system functions (e.g., linking a 'thumb-up' gesture to a 'like' button). We introduce GestureGPT, a novel zero-shot gesture understanding and grounding framework leveraging large language models (LLMs). Gesture descriptions are formulated based on hand landmark coordinates from gesture videos and fed into our dual-agent dialogue system. A gesture agent deciphers these descriptions and queries about the interaction context (e.g., interface, history, gaze data), which a context agent organizes and provides. Following iterative exchanges, the gesture agent discerns user intent, grounding it to an interactive function. We validated the gesture description module using public first-view and third-view gesture datasets and tested the whole system in two real-world settings: video streaming and smart home IoT control. The highest zero-shot Top-5 grounding accuracies are 80.11% for video streaming and 90.78% for smart home tasks, showing potential of the new gesture understanding paradigm.

Via

Access Paper or Ask Questions

MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention

Sep 28, 2023
Ruolan Wu, Chun Yu, Xiaole Pan, Yujia Liu, Ningning Zhang, Yue Fu, Yuhan Wang, Zhi Zheng, Li Chen, Qiaolei Jiang, Xuhai Xu, Yuanchun Shi

Figure 1 for MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention

Figure 2 for MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention

Figure 3 for MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention

Figure 4 for MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention

Problematic smartphone use negatively affects physical and mental health. Despite the wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users' physical contexts and mental states. We first conduct a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone use: boredom, stress, and inertia. This informs our design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. We leverage large language models (LLMs) to enable the automatic and dynamic generation of effective persuasion content. We develop MindShift, a novel LLM-powered problematic smartphone use intervention technique. MindShift takes users' in-the-moment physical contexts, mental states, app usage behaviors, users' goals & habits as input, and generates high-quality and flexible persuasive content with appropriate persuasion strategies. We conduct a 5-week field experiment (N=25) to compare MindShift with baseline techniques. The results show that MindShift significantly improves intervention acceptance rates by 17.8-22.5% and reduces smartphone use frequency by 12.1-14.4%. Moreover, users have a significant drop in smartphone addiction scale scores and a rise in self-efficacy. Our study sheds light on the potential of leveraging LLMs for context-aware persuasion in other behavior change domains.

Via

Access Paper or Ask Questions

Modeling the Trade-off of Privacy Preservation and Activity Recognition on Low-Resolution Images

Mar 18, 2023
Yuntao Wang, Zirui Cheng, Xin Yi, Yan Kong, Xueyang Wang, Xuhai Xu, Yukang Yan, Chun Yu, Shwetak Patel, Yuanchun Shi

Figure 1 for Modeling the Trade-off of Privacy Preservation and Activity Recognition on Low-Resolution Images

Figure 2 for Modeling the Trade-off of Privacy Preservation and Activity Recognition on Low-Resolution Images

Figure 3 for Modeling the Trade-off of Privacy Preservation and Activity Recognition on Low-Resolution Images

Figure 4 for Modeling the Trade-off of Privacy Preservation and Activity Recognition on Low-Resolution Images

A computer vision system using low-resolution image sensors can provide intelligent services (e.g., activity recognition) but preserve unnecessary visual privacy information from the hardware level. However, preserving visual privacy and enabling accurate machine recognition have adversarial needs on image resolution. Modeling the trade-off of privacy preservation and machine recognition performance can guide future privacy-preserving computer vision systems using low-resolution image sensors. In this paper, using the at-home activity of daily livings (ADLs) as the scenario, we first obtained the most important visual privacy features through a user survey. Then we quantified and analyzed the effects of image resolution on human and machine recognition performance in activity recognition and privacy awareness tasks. We also investigated how modern image super-resolution techniques influence these effects. Based on the results, we proposed a method for modeling the trade-off of privacy preservation and activity recognition on low-resolution images.

* This paper has been accepted by the ACM CHI 2023

Via

Access Paper or Ask Questions

Revamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users

Feb 01, 2021
Ruolin Wang, Zixuan Chen, Mingrui "Ray" Zhang, Zhaoheng Li, Zhixiu Liu, Zihan Dang, Chun Yu, Xiang "Anthony" Chen

Figure 1 for Revamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users

Figure 2 for Revamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users

Figure 3 for Revamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users

Figure 4 for Revamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users

Online shopping has become a valuable modern convenience, but blind or low vision (BLV) users still face significant challenges using it, because of: 1) inadequate image descriptions and 2) the inability to filter large amounts of information using screen readers. To address those challenges, we propose Revamp, a system that leverages customer reviews for interactive information retrieval. Revamp is a browser integration that supports review-based question-answering interactions on a reconstructed product page. From our interview, we identified four main aspects (color, logo, shape, and size) that are vital for BLV users to understand the visual appearance of a product. Based on the findings, we formulated syntactic rules to extract review snippets, which were used to generate image descriptions and responses to users' queries. Evaluations with eight BLV users showed that Revamp 1) provided useful descriptive information for understanding product appearance and 2) helped the participants locate key information efficiently.

Via

Access Paper or Ask Questions

Pursuing Sources of Heterogeneity in Modeling Clustered Population

Mar 10, 2020
Yan Li, Chun Yu, Yize Zhao, Robert H. Aseltine, Weixin Yao, Kun Chen

Figure 1 for Pursuing Sources of Heterogeneity in Modeling Clustered Population

Figure 2 for Pursuing Sources of Heterogeneity in Modeling Clustered Population

Figure 3 for Pursuing Sources of Heterogeneity in Modeling Clustered Population

Figure 4 for Pursuing Sources of Heterogeneity in Modeling Clustered Population

Researchers often have to deal with heterogeneous population with mixed regression relationships, increasingly so in the era of data explosion. In such problems, when there are many candidate predictors, it is not only of interest to identify the predictors that are associated with the outcome, but also to distinguish the true sources of heterogeneity, i.e., to identify the predictors that have different effects among the clusters and thus are the true contributors to the formation of the clusters. We clarify the concepts of the source of heterogeneity that account for potential scale differences of the clusters and propose a regularized finite mixture effects regression to achieve heterogeneity pursuit and feature selection simultaneously. As the name suggests, the problem is formulated under an effects-model parameterization, in which the cluster labels are missing and the effect of each predictor on the outcome is decomposed to a common effect term and a set of cluster-specific terms. A constrained sparse estimation of these effects leads to the identification of both the variables with common effects and those with heterogeneous effects. We propose an efficient algorithm and show that our approach can achieve both estimation and selection consistency. Simulation studies further demonstrate the effectiveness of our method under various practical scenarios. Three applications are presented, namely, an imaging genetics study for linking genetic factors and brain neuroimaging traits in Alzheimer's disease, a public health study for exploring the association between suicide risk among adolescents and their school district characteristics, and a sport analytics study for understanding how the salary levels of baseball players are associated with their performance and contractual status.

Via

Access Paper or Ask Questions