HCI
2024-04-11
Goal Recognition via Linear Programming
Authors: Felipe Meneguzzi, Luísa R. de A. Santos, Ramon Fraga Pereira, André G. Pereira
Link: http://arxiv.org/abs/2404.07934v1
Abstract: Goal Recognition is the task by which an observer aims to discern the goals that correspond to plans that comply with the perceived behavior of subject agents, given as a sequence of observations. Research on Goal Recognition as Planning encompasses reasoning about the model of a planning task, the observations, and the goals using planning techniques, resulting in very efficient recognition approaches. In this article, we design novel recognition approaches that rely on the Operator-Counting framework, proposing new constraints, and analyze these constraints' properties both theoretically and empirically. The Operator-Counting framework is a technique that efficiently computes heuristic estimates of cost-to-goal using Integer/Linear Programming (IP/LP). On the theoretical side, we prove that the new constraints provide lower bounds on the cost of plans that comply with observations. We also provide an extensive empirical evaluation to assess how the new constraints improve the quality of the solution, finding that they are especially informative for deciding which goals are unlikely to be part of the solution. Our novel recognition approaches have two pivotal advantages: first, they employ new IP/LP constraints for efficiently recognizing goals; second, we show how the new IP/LP constraints can improve the recognition of goals under both partial and noisy observability.
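To make the Operator-Counting idea concrete, here is a minimal LP sketch in the spirit of the framework: variables count how often each operator appears in a plan, disjunctive landmark constraints require certain operators to occur, and the LP optimum is a lower bound on plan cost. The operator costs and landmarks are hypothetical; this is an illustration, not the authors' implementation.

```python
# Illustrative operator-counting LP: minimize total operator cost subject
# to disjunctive landmark constraints (hypothetical costs and landmarks).
from scipy.optimize import linprog

costs = [1.0, 1.0, 2.0]          # hypothetical cost of operators 0, 1, 2
landmarks = [[0, 1], [2]]        # "op0 or op1 must occur"; "op2 must occur"

# linprog minimizes c @ y subject to A_ub @ y <= b_ub, so encode each
# landmark sum_{o in L} y_o >= 1 as -sum_{o in L} y_o <= -1.
A_ub = [[-1.0 if o in lm else 0.0 for o in range(len(costs))] for lm in landmarks]
b_ub = [-1.0] * len(landmarks)

res = linprog(c=costs, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(costs))
print(f"LP lower bound on plan cost: {res.fun:.2f}")  # 1.0 + 2.0 = 3.00
```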
Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation
Authors: Jinkyung Park, Pamela Wisniewski, Vivek Singh
Link: http://arxiv.org/abs/2404.07926v1
Abstract: In this position paper, we discuss the potential for leveraging LLMs as interactive research tools to facilitate collaboration between human coders and AI to effectively annotate online risk data at scale. Collaborative human-AI labeling is a promising approach to annotating large-scale and complex data for various tasks. Yet, tools and methods to support effective human-AI collaboration for data annotation are under-studied. This gap is pertinent because co-labeling tasks need to support a two-way interactive discussion that can add nuance and context, particularly in the context of online risk, which is highly subjective and contextualized. Therefore, we discuss some of the early benefits and challenges of using LLM-based tools for risk annotation and suggest future directions for the HCI research community to leverage LLMs as research tools to facilitate human-AI collaboration in contextualized online data annotation. Our research interests align very well with the purposes of the LLMs as Research Tools workshop to identify ongoing applications and challenges of using LLMs to work with data in HCI research. We anticipate learning valuable insights from organizers and participants into how LLMs can help reshape the HCI community's methods for working with data.
Snake Story: Exploring Game Mechanics for Mixed-Initiative Co-creative Storytelling Games
Authors: Daijin Yang, Erica Kleinman, Giovanni Maria Troiano, Elina Tochilnikova, Casper Harteveld
Link: http://arxiv.org/abs/2404.07901v1
Abstract: Mixed-initiative co-creative storytelling games have existed for some time as a way to merge storytelling with play. However, modern mixed-initiative co-creative storytelling games predominantly prioritize story creation over gameplay mechanics, which might not resonate with all players. As such, there is untapped potential for creating mixed-initiative games with more complex mechanics in which players can engage with both co-creation and gameplay goals. To explore the potential of more prominent gameplay in mixed-initiative co-creative storytelling games, we created Snake Story, a variation of the classic Snake game featuring a human-AI co-writing element. To explore how players interact with the mixed-initiative game, we conducted a qualitative playtest with 11 participants. Analysis of both think-aloud and interview data revealed that players' strategies and experiences were affected by their perception of Snake Story as either a collaborative tool, a traditional game, or a combination of both. Based on these findings, we present design considerations for future development in mixed-initiative co-creative gaming.
Apprentice Tutor Builder: A Platform For Users to Create and Personalize Intelligent Tutors
Authors: Glen Smith, Adit Gupta, Christopher MacLellan
Link: http://arxiv.org/abs/2404.07883v1
Abstract: Intelligent tutoring systems (ITS) are effective for improving students' learning outcomes. However, their development is often complex, time-consuming, and requires specialized programming and tutor design knowledge, thus hindering their widespread application and personalization. We present the Apprentice Tutor Builder (ATB), a platform that simplifies tutor creation and personalization. Instructors can utilize ATB's drag-and-drop tool to build tutor interfaces. Instructors can then interactively train the tutors' underlying AI agent to produce expert models that can solve problems. Training is achieved using multiple interaction modalities, including demonstrations, feedback, and user labels. We conducted a user study with 14 instructors to evaluate the effectiveness of ATB's design with end users. We found that users enjoyed the flexibility of the interface builder and the ease and speed of agent teaching, but often desired additional time-saving features. With these insights, we identified a set of design recommendations for our platform and others that utilize interactive AI agents for tutor creation and customization.
The Dance of Logic and Unpredictability: Examining the Predictability of User Behavior on Visual Analytics Tasks
Authors: Alvitta Ottley
Link: http://arxiv.org/abs/2404.07865v1
Abstract: The quest to develop intelligent visual analytics (VA) systems capable of collaborating and naturally interacting with humans presents a multifaceted and intriguing challenge. VA systems designed for collaboration must adeptly navigate a complex landscape filled with the subtleties and unpredictabilities that characterize human behavior. However, it is noteworthy that scenarios exist where human behavior manifests predictably. These scenarios typically involve routine actions or present a limited range of choices. This paper delves into the predictability of user behavior in the context of visual analytics tasks. It offers an evidence-based discussion on the circumstances under which predicting user behavior is feasible and those where it proves challenging. We conclude with a forward-looking discussion of the future work necessary to cultivate more synergistic and efficient partnerships between humans and the VA system. This exploration is not just about understanding our current capabilities and limitations in mirroring human behavior but also about envisioning and paving the way for a future where human-machine interaction is more intuitive and productive.
Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models -- Technical Challenges and Implications for Monitoring and Verification
Authors: Tuong Vy Nguyen, Alexander Glaser, Felix Biessmann
Link: http://arxiv.org/abs/2404.07754v1
Abstract: Novel deep-learning (DL) architectures have reached a level where they can generate digital media, including photorealistic images, that are difficult to distinguish from real data. These technologies have already been used to generate training data for Machine Learning (ML) models, and large text-to-image models like DALL-E 2, Imagen, and Stable Diffusion are achieving remarkable results in realistic high-resolution image generation. Given these developments, issues of data authentication in monitoring and verification deserve a careful and systematic analysis: How realistic are synthetic images? How easily can they be generated? How useful are they for ML researchers, and what is their potential for Open Science? In this work, we use novel DL models to explore how synthetic satellite images can be created using conditioning mechanisms. We investigate the challenges of synthetic satellite image generation and evaluate the results based on authenticity and state-of-the-art metrics. Furthermore, we investigate how synthetic data can alleviate the lack of data in the context of ML methods for remote sensing. Finally, we discuss the implications of synthetic satellite imagery in the context of monitoring and verification.
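For readers unfamiliar with text-to-image pipelines, a minimal text-conditioned generation call with an off-the-shelf Stable Diffusion checkpoint might look like the sketch below; the model choice and prompt are assumptions, not the setup used in the paper.

```python
# Minimal text-conditioned image generation with the diffusers library
# (illustrative only; not the authors' pipeline).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "satellite image of an industrial facility, overhead view"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("synthetic_satellite.png")
```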
Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models
Authors: Marvin Pafla, Kate Larson, Mark Hancock
Link: http://arxiv.org/abs/2404.07725v1
Abstract: The field of eXplainable artificial intelligence (XAI) has produced a plethora of methods (e.g., saliency maps) to gain insight into artificial intelligence (AI) models, and has exploded with the rise of deep learning (DL). However, human-participant studies question the efficacy of these methods, particularly when the AI output is wrong. In this study, we collected and analyzed 156 human-generated text and saliency-based explanations from a question-answering task (N=40) and compared them empirically to state-of-the-art XAI explanations (integrated gradients, conservative LRP, and ChatGPT) in a human-participant study (N=136). Our findings show that participants found human saliency maps to be more helpful in explaining AI answers than machine saliency maps, but performance negatively correlated with trust in the AI model and explanations. This finding hints at the dilemma of AI errors in explanation, where helpful explanations can lead to lower task performance when they support wrong AI predictions.
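As background on one of the compared methods, the sketch below computes integrated-gradients attributions for a toy logistic model; the weights and inputs are hypothetical and only illustrate the attribution formula, not the study's QA setup.

```python
# Integrated gradients on a toy logistic model:
# IG_i = (x_i - b_i) * mean over the path of dF/dx_i, for baseline b.
import numpy as np

w = np.array([0.5, -1.2, 2.0])             # hypothetical model weights
x = np.array([1.0, 0.3, 0.8])              # input features
baseline = np.zeros_like(x)

def grad(z):                               # dF/dz for F(z) = sigmoid(w @ z)
    s = 1.0 / (1.0 + np.exp(-(w @ z)))
    return s * (1.0 - s) * w

steps = 50
alphas = (np.arange(steps) + 0.5) / steps  # midpoint Riemann sum over path
avg_grad = np.mean([grad(baseline + a * (x - baseline)) for a in alphas], axis=0)
attributions = (x - baseline) * avg_grad
print(attributions)                        # one salience score per feature
```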
Efficient sEMG-based Cross-Subject Joint Angle Estimation via Hierarchical Spiking Attentional Feature Decomposition Network
Authors: Xin Zhou, Chuang Lin, Can Wang, Xiaojiang Peng
Link: http://arxiv.org/abs/2404.07517v1
Abstract: Surface electromyography (sEMG) has demonstrated significant potential in simultaneous and proportional control (SPC). However, existing algorithms for predicting joint angles based on sEMG often suffer from high inference costs or are limited to specific subjects rather than cross-subject scenarios. To address these challenges, we introduce a hierarchical Spiking Attentional Feature Decomposition Network (SAFE-Net). This network initially compresses sEMG signals into neural spiking forms using a Spiking Sparse Attention Encoder (SSAE). Subsequently, the compressed features are decomposed into kinematic and biological features through a Spiking Attentional Feature Decomposition (SAFD) module. Finally, the kinematic and biological features are used to predict joint angles and identify subject identities, respectively. Our validation on two datasets (SIAT-DB1 and SIAT-DB2) and comparison with two existing methods, Informer and Spikformer, demonstrate that SSAE achieves significant power savings of 39.1% and 37.5%, respectively, over them in terms of inference cost. Furthermore, SAFE-Net surpasses Informer and Spikformer in recognition accuracy on both datasets. This study underscores the potential of SAFE-Net to advance the field of SPC in lower limb rehabilitation exoskeleton robots.
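To illustrate the general idea of spike-based encoding (not SAFE-Net's attention-based SSAE), a leaky integrate-and-fire encoder over one sEMG channel might look like this; the constants and toy signal are assumptions.

```python
# Leaky integrate-and-fire (LIF) encoding of a continuous signal into spikes.
import numpy as np

def lif_encode(signal, tau=0.9, threshold=1.0):
    """The membrane potential leaks by tau and integrates the input;
    a spike fires and the membrane resets when the threshold is crossed."""
    v, spikes = 0.0, np.zeros_like(signal)
    for t, x in enumerate(signal):
        v = tau * v + x
        if v >= threshold:
            spikes[t] = 1.0
            v = 0.0                        # reset after spike
    return spikes

rng = np.random.default_rng(0)
semg = np.abs(rng.normal(0.3, 0.2, size=200))   # rectified toy sEMG channel
print(int(lif_encode(semg).sum()), "spikes in 200 timesteps")
```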
Interactive Prompt Debugging with Sequence Salience
Authors: Ian Tenney, Ryan Mullins, Bin Du, Shree Pandya, Minsuk Kahng, Lucas Dixon
Link: http://arxiv.org/abs/2404.07498v1
Abstract: We present Sequence Salience, a visual tool for interactive prompt debugging with input salience methods. Sequence Salience builds on widely used salience methods for text classification and single-token prediction, and extends this to a system tailored for debugging complex LLM prompts. Our system is well-suited for long texts, and expands on previous work by 1) providing controllable aggregation of token-level salience to the word, sentence, or paragraph level, making salience over long inputs tractable; and 2) supporting rapid iteration where practitioners can act on salience results, refine prompts, and run salience on the new output. We include case studies showing how Sequence Salience can help practitioners work with several complex prompting strategies, including few-shot, chain-of-thought, and constitutional principles. Sequence Salience is built on the Learning Interpretability Tool, an open-source platform for ML model visualizations, and code, notebooks, and tutorials are available at http://goo.gle/sequence-salience.
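The controllable aggregation step can be pictured with a small sketch: token-level salience scores pooled to sentence spans with a selectable operator. The scores and spans below are made up; the real tool operates inside the Learning Interpretability Tool.

```python
# Pool token-level salience to coarser spans so long prompts stay readable.
import numpy as np

tokens   = ["The", "cat", "sat", ".", "It", "purred", "."]
salience = np.array([0.1, 0.7, 0.3, 0.0, 0.2, 0.9, 0.1])
sentences = [(0, 4), (4, 7)]               # token index ranges per sentence

def aggregate(scores, spans, how="max"):
    pool = {"max": np.max, "mean": np.mean, "sum": np.sum}[how]
    return [float(pool(scores[a:b])) for a, b in spans]

print(aggregate(salience, sentences, how="max"))   # [0.7, 0.9]
print(aggregate(salience, sentences, how="mean"))  # sentence-level averages
```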
RASSAR: Room Accessibility and Safety Scanning in Augmented Reality
Authors: Xia Su, Han Zhang, Kaiming Cheng, Jaewook Lee, Qiaochu Liu, Wyatt Olson, Jon Froehlich
Link: http://arxiv.org/abs/2404.07479v1
Abstract: The safety and accessibility of our homes is critical to quality of life and evolves as we age, become ill, host guests, or experience life events such as having children. Researchers and health professionals have created assessment instruments such as checklists that enable homeowners and trained experts to identify and mitigate safety and access issues. With advances in computer vision, augmented reality (AR), and mobile sensors, new approaches are now possible. We introduce RASSAR, a mobile AR application for semi-automatically identifying, localizing, and visualizing indoor accessibility and safety issues such as an inaccessible table height or unsafe loose rugs using LiDAR and real-time computer vision. We present findings from three studies: a formative study with 18 participants across five stakeholder groups to inform the design of RASSAR, a technical performance evaluation across ten homes demonstrating state-of-the-art performance, and a user study with six stakeholders. We close with a discussion of future AI-based indoor accessibility assessment tools, RASSAR's extensibility, and key application scenarios.
Diversity's Double-Edged Sword: Analyzing Race's Effect on Remote Pair Programming Interactions
Authors: Shandler A. Mason, Sandeep Kaur Kuttal
Link: http://arxiv.org/abs/2404.07427v1
Abstract: Remote pair programming is widely used in software development, but no research has examined how race affects these interactions. We embarked on this study due to the historical underrepresentation of Black developers in the tech industry, with White developers comprising the majority. Our study involved 24 experienced developers, forming 12 gender-balanced same- and mixed-race pairs. Pairs collaborated on a programming task using the think-aloud method, followed by individual retrospective interviews. Our findings revealed elevated productivity scores for mixed-race pairs, with no differences in code quality between same- and mixed-race pairs. Mixed-race pairs excelled in task distribution, shared decision-making, and role-exchange but encountered communication challenges, discomfort, and anxiety, shedding light on the complexity of diversity dynamics. Our study emphasizes race's impact on remote pair programming and underscores the need for diverse tools and methods to address racial disparities in collaboration.
Too good to be true: People reject free gifts from robots because they infer bad intentions
Authors: Benjamin Lebrun, Andrew Vonasch, Christoph Bartneck
Link: http://arxiv.org/abs/2404.07409v1
Abstract: A recent psychology study found that people sometimes reject overly generous offers from people because they imagine hidden "phantom costs" must be part of the transaction. Phantom costs occur when a person seems overly generous for no apparent reason. This study aims to explore whether people can imagine phantom costs when interacting with a robot. To this end, screen or physically embodied agents (human or robot) offered people either a cookie or a cookie + $2. Participants were then asked to choose whether to accept or decline the offer. Results showed that people did perceive phantom costs in the cookie + $2 conditions when interacting with a human, but also with a robot, across both embodiment levels, leading to the characteristic behavioral effect that offering more money made people less likely to accept the offer. While people were more likely to accept offers from a robot than from a human, people more often accepted offers from humans when they were physically embodied than when they were screen-embodied, but were equally likely to accept the offer from a robot whether it was screen or physically embodied. This suggests that people can treat robots (and humans) as social agents with hidden intentions and knowledge, and that this influences their behavior toward them. This provides new insights not only into how people make decisions when interacting with a robot but also into how robot embodiment impacts HRI research.
SealMates: Supporting Communication in Video Conferencing using a Collective Behavior-Driven Avatar
Authors: Mark Armstrong, Chi-Lan Yang, Kinga Skiers, Mengzhen Lim, Tamil Selvan Gunasekaran, Ziyue Wang, Takuji Narumi, Kouta Minamizawa, Yun Suen Pai
Link: http://arxiv.org/abs/2404.07403v1
Abstract: The limited nonverbal cues and spatially distributed nature of remote communication make it challenging for unacquainted members to be expressive during social interactions over video conferencing. Though video conferencing enables seeing others' facial expressions, the visual feedback can instead lead to unexpected self-focus, resulting in users missing cues for others to engage in the conversation equally. To support expressive communication and equal participation among unacquainted counterparts, we propose SealMates, a behavior-driven avatar that infers the engagement level of the group based on collective gaze and speech patterns and then moves across interlocutors' windows in the video conferencing interface. By conducting a controlled experiment with 15 groups of triads, we found the avatar's movement encouraged more self-disclosure and made people perceive that everyone was more equally engaged in the conversation than when there was no behavior-driven avatar. We discuss how a behavior-driven avatar influences distributed members' perceptions and the implications of avatar-mediated communication for future platforms.
2024-04-10
BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks
Authors: Ruijia Cheng, Titus Barik, Alan Leung, Fred Hohman, Jeffrey Nichols
Link: http://arxiv.org/abs/2404.07387v1
Abstract: Novices frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generation with an additional ephemeral UI step, offering users UI-based scaffolds as an intermediate stage between user prompts and code generation. We present this workflow in BISCUIT, an extension for JupyterLab that provides users with ephemeral UIs generated by LLMs based on the context of their code and intentions, scaffolding users to understand, guide, and explore with LLM-generated code. Through 10 user studies where novices used BISCUIT for machine learning tutorials, we discover that BISCUIT offers users a semantic representation of code to aid their understanding, reduces the complexity of prompt engineering, and creates a playground for users to explore different variables and iterate on their ideas. We discuss the implications of our findings for a UI-centric interaction paradigm in code-generation LLMs.
Interactive Explanation of Visual Patterns in Dimensionality Reductions with Predicate Logic
Authors: Brian Montambault, Gabriel Appleby, Jen Rogers, Camelia D. Brumar, Mingwei Li, Remco Chang
Link: http://arxiv.org/abs/2404.07386v1
Abstract: Dimensionality reduction techniques are widely used for visualizing high-dimensional data. However, support for interpreting patterns of dimension reduction results in the context of the original data space is often insufficient. Consequently, users may struggle to extract insights from the projections. In this paper, we introduce DimBridge, a visual analytics tool that allows users to interact with visual patterns in a projection and retrieve corresponding data patterns. DimBridge supports several interactions, allowing users to perform various analyses, from contrasting multiple clusters to explaining complex latent structures. Leveraging first-order predicate logic, DimBridge identifies subspaces in the original dimensions relevant to a queried pattern and provides an interface for users to visualize and interact with them. We demonstrate how DimBridge can help users overcome the challenges associated with interpreting visual patterns in projections.
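A toy version of the predicate idea: given points selected in a projection, search each original dimension for an interval that covers the selection and measure how exclusive it is. This only illustrates simple interval predicates, not DimBridge's first-order predicate logic.

```python
# For a selection made in the 2D projection, find per-dimension intervals
# in the original space that cover the selection and score their precision.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(500, 4))           # original high-dim data
selected = (X[:, 0] > 0.6) & (X[:, 2] < 0.3)   # stand-in for a lasso selection

for d in range(X.shape[1]):
    lo, hi = X[selected, d].min(), X[selected, d].max()
    inside = (X[:, d] >= lo) & (X[:, d] <= hi)
    precision = selected[inside].mean()        # how exclusive the interval is
    print(f"dim {d}: [{lo:.2f}, {hi:.2f}]  precision={precision:.2f}")
# Dimensions 0 and 2 yield tighter, higher-precision intervals; the others
# cover nearly the full range and explain little.
```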
Building Workflows for Interactive Human in the Loop Automated Experiment (hAE) in STEM-EELS
Authors: Utkarsh Pratiush, Kevin M. Roccapriore, Yongtao Liu, Gerd Duscher, Maxim Ziatdinov, Sergei V. Kalinin
Link: http://arxiv.org/abs/2404.07381v1
Abstract: Exploring the structural, chemical, and physical properties of matter on the nano- and atomic scales has become possible with the recent advances in aberration-corrected electron energy-loss spectroscopy (EELS) in scanning transmission electron microscopy (STEM). However, the current paradigm of STEM-EELS relies on classical rectangular grid sampling, in which all surface regions are assumed to be of equal a priori interest. This is typically not the case for real-world scenarios, where phenomena of interest are concentrated in a small number of spatial locations. One of the foundational problems is the discovery of nanometer- or atomic-scale structures having specific signatures in EELS spectra. Here we systematically explore the hyperparameters controlling deep kernel learning (DKL) discovery workflows for STEM-EELS and identify the role of the local structural descriptors and acquisition functions in the experiment progression. In agreement with actual experiments, we observe that for certain parameter combinations the experiment path can become trapped in local minima. We demonstrate approaches for monitoring the automated experiment in both the real space and the feature space of the system, and for monitoring the knowledge acquisition of the DKL model. Based on these, we construct intervention strategies, thus defining a human-in-the-loop automated experiment (hAE). This approach can be further extended to other techniques, including 4D STEM and other forms of spectroscopic imaging.
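The acquisition-function part of such a workflow can be sketched abstractly: given a surrogate model's posterior mean and uncertainty over candidate probe locations, an upper-confidence-bound rule selects the next measurement. The numbers below are synthetic, and the paper's DKL surrogate is replaced by placeholders.

```python
# Upper-confidence-bound acquisition over candidate probe locations.
import numpy as np

rng = np.random.default_rng(2)
mu = rng.uniform(0, 1, 100)          # hypothetical posterior mean per location
sigma = rng.uniform(0.01, 0.3, 100)  # hypothetical posterior std per location

def ucb(mu, sigma, beta=2.0):
    # beta trades exploration (high) against exploitation (low)
    return mu + beta * sigma

next_point = int(np.argmax(ucb(mu, sigma)))
print("next probe location:", next_point)
# A human-in-the-loop intervention could, e.g., raise beta when the
# experiment path appears trapped in a local minimum.
```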
Fabricating Paper Circuits with Subtractive Processing
Authors: Ruhan Yang, Krithik Ranjan, Ellen Yi-Luen Do
Link: http://arxiv.org/abs/2404.07364v1
Abstract: This paper introduces a new method of paper circuit fabrication that overcomes design barriers and increases flexibility in circuit design. Conventional circuit boards rely on thin traces, which limits the complexity and accuracy when applied to paper circuits. To address this issue, we propose a method that uses large conductive zones in paper circuits and performs subtractive processing during their fabrication. This approach eliminates design barriers and allows for more flexibility in circuit design. We introduce PaperCAD, a software tool that simplifies the design process by converting traditional circuit design to paper circuit design. We demonstrate our technique by creating two paper circuit boards. Our approach has the potential to promote the development of new applications for paper circuits.
"We Need Structured Output": Towards User-centered Constraints on Large Language Model Output
Authors: Michael Xieyang Liu, Frederick Liu, Alexander J. Fiannaca, Terry Koo, Lucas Dixon, Michael Terry, Carrie J. Cai
Link: http://arxiv.org/abs/2404.07362v1
Abstract: Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. We identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adheres to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool.
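A low-level constraint of the kind respondents described can be sketched as a simple post-hoc validator; the schema and length bound below are assumptions, not an API proposed by the paper.

```python
# Check an LLM output against a structured-format and length constraint.
import json

def check_constraints(llm_output: str, required_keys=("title", "tags"),
                      max_chars=500):
    if len(llm_output) > max_chars:
        return False, "output too long"
    try:
        data = json.loads(llm_output)       # structured-format constraint
    except json.JSONDecodeError:
        return False, "not valid JSON"
    missing = [k for k in required_keys if k not in data]
    return (not missing), f"missing keys: {missing}" if missing else "ok"

print(check_constraints('{"title": "Demo", "tags": ["hci"]}'))  # (True, 'ok')
```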
Enhancing Accessibility in Soft Robotics: Exploring Magnet-Embedded Paper-Based Interactions
Authors: Ruhan Yang, Ellen Yi-Luen Do
Link: http://arxiv.org/abs/2404.07360v1
Abstract: This paper explores the implementation of embedded magnets to enhance paper-based interactions. The integration of magnets in paper-based interactions simplifies the fabrication process, making it more accessible for building soft robotics systems. We discuss various interaction patterns achievable through this approach and highlight their potential applications.
A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos
Authors: Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang
Link: http://arxiv.org/abs/2404.07351v1
Abstract: Eye-tracking applications that utilize the human gaze in video understanding tasks have become increasingly important. To effectively automate the process of video analysis based on eye-tracking data, it is important to accurately replicate human gaze behavior. However, this task presents significant challenges due to the inherent complexity and ambiguity of human gaze patterns. In this work, we introduce a novel method for simulating human gaze behavior. Our approach uses a transformer-based reinforcement learning algorithm to train an agent that acts as a human observer, with the primary role of watching videos and simulating human gaze behavior. We employed an eye-tracking dataset gathered from videos generated by the VirtualHome simulator, with a primary focus on activity recognition. Our experimental results demonstrate the effectiveness of our gaze prediction method by highlighting its capability to replicate human gaze behavior and its applicability for downstream tasks where real human gaze is used as input.
Mixed Reality Heritage Performance As a Decolonising Tool for Heritage Sites
Authors: Mariza Dima, Damon Daylamani-Zad, Vangelis Lympouridis
Link: http://arxiv.org/abs/2404.07348v1
Abstract: In this paper we introduce two world-first Mixed Reality (MR) experiences that fuse smart AR glasses and live theatre and take place at a heritage site, with the purpose of revealing the site's hidden and difficult histories about slavery. We term these unique general-audience experiences Mixed Reality Heritage Performances (MRHP). Along with the development of our initial two performances, we designed and developed a tool and guidelines that can help heritage organisations with their decolonising process by critically engaging the public with under-represented voices and viewpoints of troubled European and colonial narratives. The evaluations showed the embodied and affective potential of MRHP to attract and educate heritage visitors. Insights from the design process are being formulated into an extensive design toolkit that aims to support experience design, theatre, and heritage professionals in collaboratively carrying out similar projects.
Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention
Authors: Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang
Link: http://arxiv.org/abs/2404.07347v1
Abstract: Humans utilize their gaze to concentrate on essential information while perceiving and interpreting intentions in videos. Incorporating human gaze into computational algorithms can significantly enhance model performance in video understanding tasks. In this work, we address a challenging and innovative task in video understanding: predicting the actions of an agent in a video based on a partial video. We introduce the Gaze-guided Action Anticipation algorithm, which establishes a visual-semantic graph from the video input. Our method utilizes a Graph Neural Network to recognize the agent's intention and predict the action sequence to fulfill this intention. To assess the efficiency of our approach, we collect a dataset containing household activities generated in the VirtualHome environment, accompanied by human gaze data of viewing videos. Our method outperforms state-of-the-art techniques, achieving a 7% improvement in accuracy for 18-class intention recognition. This highlights the efficiency of our method in learning important features from human gaze data.
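One round of message passing on a toy visual-semantic graph might look like the following; the adjacency, features, and weights are made up, and the paper's actual GNN and intention head are more elaborate.

```python
# Mean-aggregation message passing: each node averages its neighbors'
# features, applies a learned transform, and a nonlinearity.
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)     # agent -- object edges
H = np.array([[1.0, 0.0],                  # node features: agent
              [0.0, 1.0],                  # node features: cup
              [0.5, 0.5]])                 # node features: table
W = np.array([[0.8, 0.2], [0.1, 0.9]])     # hypothetical learned weights

deg = A.sum(axis=1, keepdims=True)
H_next = np.tanh((A / deg) @ H @ W)        # aggregate neighbors, transform
print(H_next)                              # updated node embeddings
```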
Evaluating Navigation and Comparison Performance of Computational Notebooks on Desktop and in Virtual Reality
Authors: Sungwon In, Erick Krokos, Kirsten Whitley, Chris North, Yalong Yang
Link: http://arxiv.org/abs/2404.07161v1
Abstract: The computational notebook serves as a versatile tool for data analysis. However, its conventional user interface falls short of keeping pace with the ever-growing data-related tasks, signaling the need for novel approaches. With the rapid development of interaction techniques and computing environments, there is a growing interest in integrating emerging technologies in data-driven workflows. Virtual reality, in particular, has demonstrated its potential in interactive data visualizations. In this work, we aimed to experiment with adapting computational notebooks into VR and verify the potential benefits VR can bring. We focus on the navigation and comparison aspects as they are primitive components in analysts' workflow. To further improve comparison, we have designed and implemented a Branching&Merging functionality. We tested computational notebooks on the desktop and in VR, both with and without the added Branching&Merging capability. We found VR significantly facilitated navigation compared to desktop, and the ability to create branches enhanced comparison.
Exploring Physiological Responses in Virtual Reality-based Interventions for Autism Spectrum Disorder: A Data-Driven Investigation
Authors: Gianpaolo Alvari, Ersilia Vallefuoco, Melanie Cristofolini, Elio Salvadori, Marco Dianti, Alessia Moltani, Davide Dal Castello, Paola Venuti, Cesare Furlanello
Link: http://arxiv.org/abs/2404.07159v1
Abstract: Virtual Reality (VR) has emerged as a promising tool for enhancing social skills and emotional well-being in individuals with Autism Spectrum Disorder (ASD). Through a technical exploration, this study employs a multiplayer serious gaming environment within VR, engaging 34 individuals diagnosed with ASD and using high-precision biosensors for a comprehensive view of the participants' arousal and responses during the VR sessions. Participants were subjected to a series of 3 virtual scenarios designed in collaboration with stakeholders and clinical experts to promote socio-cognitive skills and emotional regulation in a controlled and structured virtual environment. We combined the framework with wearable non-invasive sensors for bio-signal acquisition, focusing on the collection of heart rate variability and respiratory patterns to monitor participants' behaviors. Further, behavioral assessments were conducted using observation and semi-structured interviews, with the data analyzed in conjunction with physiological measures to identify correlations and explore digital-intervention efficacy. Preliminary analysis revealed significant correlations between physiological responses and behavioral outcomes, indicating the potential of physiological feedback to enhance VR-based interventions for ASD. The study demonstrated the feasibility of using real-time data to adapt virtual scenarios, suggesting a promising avenue to support personalized therapy. The integration of quantitative physiological feedback into digital platforms represents a forward step in personalized intervention for ASD. By leveraging real-time data to adjust therapeutic content, this approach promises to enhance the efficacy and engagement of digital-based therapies.
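As an example of the kind of heart-rate-variability feature such pipelines compute, the sketch below derives RMSSD from a short series of RR intervals; the values are hypothetical, and the study's exact feature set is not specified beyond HRV and respiration.

```python
# RMSSD: root mean square of successive differences of RR intervals (ms),
# a standard short-term HRV measure.
import numpy as np

rr = np.array([812, 798, 830, 845, 801, 790, 822])  # hypothetical RR (ms)
rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))
print(f"RMSSD = {rmssd:.1f} ms")
```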
How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models
Authors: Unnseo Park, Venkatesh Sivaraman, Adam Perer
Link: http://arxiv.org/abs/2404.07148v1
Abstract: Reinforcement learning (RL) is a promising approach to generate treatment policies for sepsis patients in intensive care. While retrospective evaluation metrics show decreased mortality when these policies are followed, studies with clinicians suggest their recommendations are often spurious. We propose that these shortcomings may be due to lack of diversity in observed actions and outcomes in the training data, and we construct experiments to investigate the feasibility of predicting sepsis disease severity changes due to clinician actions. Preliminary results suggest incorporating action information does not significantly improve model performance, indicating that clinician actions may not be sufficiently variable to yield measurable effects on disease progression. We discuss the implications of these findings for optimizing sepsis treatment.
"My toxic trait is thinking I'll remember this": gaps in the learner experience of video tutorials for feature-rich software
Authors: Ian Drosos, Advait Sarkar, Andrew D. Gordon
Link: http://arxiv.org/abs/2404.07114v1
Abstract: Video tutorials are a popular medium for informal and formal learning. However, when learners attempt to view and follow along with these tutorials, they encounter what we call gaps, that is, issues that can prevent learning. We examine the gaps encountered by users of video tutorials for feature-rich software, such as spreadsheets. We develop a theory and taxonomy of such gaps, identifying how they act as barriers to learning, by collecting and analyzing 360 viewer comments from 90 Microsoft Excel video tutorials published by 43 creators across YouTube, TikTok, and Instagram. We conducted contextual interviews with 8 highly influential tutorial creators to investigate the gaps their viewers experience and how they address them. Further, we obtained insights into their creative process and frustrations when creating video tutorials. Finally, we presented creators with two designs that aim to address the gaps identified in the comment analysis, gathering feedback and alternative design ideas.
VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning
Authors: Alexandros Xenos, Niki Maria Foteinopoulou, Ioanna Ntinou, Ioannis Patras, Georgios Tzimiropoulos
Link: http://arxiv.org/abs/2404.07078v1
Abstract: Recognising emotions in context involves identifying the apparent emotions of an individual, taking into account contextual cues from the surrounding scene. Previous approaches to this task have involved the design of explicit scene-encoding architectures or the incorporation of external scene-related information, such as captions. However, these methods often utilise limited contextual information or rely on intricate training pipelines. In this work, we leverage the groundbreaking capabilities of Vision-and-Large-Language Models (VLLMs) to enhance in-context emotion classification without introducing complexity to the training process in a two-stage approach. In the first stage, we propose prompting VLLMs to generate descriptions in natural language of the subject's apparent emotion relative to the visual context. In the second stage, the descriptions are used as contextual information and, along with the image input, are used to train a transformer-based architecture that fuses text and visual features before the final classification task. Our experimental results show that the text and image features have complementary information, and our fused architecture significantly outperforms the individual modalities without any complex training methods. We evaluate our approach on three different datasets, namely, EMOTIC, CAER-S, and BoLD, and achieve state-of-the-art or comparable accuracy across all datasets and metrics compared to much more complex approaches. The code will be made publicly available on github: https://github.com/NickyFot/EmoCommonSense.git
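The second-stage fusion can be pictured as a simple late-fusion classifier over image and description features; the feature extractors, dimensions, and linear head below are stand-ins for the paper's transformer-based fusion (EMOTIC's 26 emotion categories set the output size).

```python
# Late fusion of image and text features followed by a linear classifier.
import numpy as np

rng = np.random.default_rng(3)
img_feat = rng.normal(size=512)   # e.g., from a vision backbone (assumed)
txt_feat = rng.normal(size=512)   # e.g., from encoding the VLLM description

fused = np.concatenate([img_feat, txt_feat])       # simple late fusion
W = rng.normal(scale=0.01, size=(26, 1024))        # 26 EMOTIC categories
logits = W @ fused
print("predicted emotion id:", int(np.argmax(logits)))
```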
WordDecipher: Enhancing Digital Workspace Communication with Explainable AI for Non-native English Speakers
Authors: Yuexi Chen, Zhicheng Liu
Link: http://arxiv.org/abs/2404.07005v1
Abstract: Non-native English speakers (NNES) face challenges in digital workspace communication (e.g., emails, Slack messages), often inadvertently translating expressions from their native languages, which can lead to awkward or incorrect usage. Current AI-assisted writing tools are equipped with fluency enhancement and rewriting suggestions; however, NNES may struggle to grasp the subtleties among various expressions, making it challenging to choose the one that accurately reflects their intent. Such challenges are exacerbated in high-stake text-based communications, where the absence of non-verbal cues heightens the risk of misinterpretation. By leveraging the latest advancements in large language models (LLM) and word embeddings, we propose WordDecipher, an explainable AI-assisted writing tool to enhance digital workspace communication for NNES. WordDecipher not only identifies the perceived social intentions detected in users' writing, but also generates rewriting suggestions aligned with users' intended messages, either numerically or by inferring from users' writing in their native language. Then, WordDecipher provides an overview of nuances to help NNES make selections. Through a usage scenario, we demonstrate how WordDecipher can significantly enhance an NNES's ability to communicate her request, showcasing its potential to transform workspace communication for NNES.
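One way to picture the nuance-ranking step: score rewrite candidates by embedding similarity to the user's intended social intention. The vectors below are random placeholders; WordDecipher's actual models and intention taxonomy differ.

```python
# Rank rewrite candidates by cosine similarity to an intention embedding.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(4)
intent = rng.normal(size=50)                  # e.g., "polite but firm"
candidates = {f"rewrite_{i}": rng.normal(size=50) for i in range(3)}

ranked = sorted(candidates, key=lambda k: cosine(candidates[k], intent),
                reverse=True)
print("best-aligned suggestion:", ranked[0])
```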
Untangling Critical Interaction with AI in Students' Written Assessment
Authors: Antonette Shibani, Simon Knight, Kirsty Kitto, Ajanie Karunanayake, Simon Buckingham Shum
Link: http://arxiv.org/abs/2404.06955v1
Abstract: Artificial Intelligence (AI) has become a ubiquitous part of society, but a key challenge exists in ensuring that humans are equipped with the required critical thinking and AI literacy skills to interact with machines effectively by understanding their capabilities and limitations. These skills are particularly important for learners to develop in the age of generative AI where AI tools can demonstrate complex knowledge and ability previously thought to be uniquely human. To activate effective human-AI partnerships in writing, this paper provides a first step toward conceptualizing the notion of critical learner interaction with AI. Using both theoretical models and empirical data, our preliminary findings suggest a general lack of Deep interaction with AI during the writing process. We believe that the outcomes can lead to better task and tool design in the future for learners to develop deep, critical thinking when interacting with AI.
ChildCIdbLong: Longitudinal Child-Computer Interaction Database and Quantitative Analysis for Child Development
Authors: Juan Carlos Ruiz-Garcia, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Jaime Herreros-Rodriguez
Link: http://arxiv.org/abs/2404.06919v1
Abstract: This article provides a comprehensive overview of recent research in the area of Child-Computer Interaction (CCI). The main contributions of the present article are two-fold. First, we present a novel longitudinal CCI database named ChildCIdbLong, which comprises over 600 children aged 18 months to 8 years, acquired continuously over 4 academic years (2019-2023). As a result, ChildCIdbLong comprises over 12K test acquisitions on a tablet device. Different tests are considered in ChildCIdbLong, requiring different touch and stylus gestures, enabling evaluation of skills like hand-eye coordination, fine motor skills, planning, and visual tracking, among others. In addition to the ChildCIdbLong database, we propose a novel quantitative metric called Test Quality (Q), designed to measure the motor and cognitive development of children through their interaction with a tablet device. In order to provide a better comprehension of the proposed Q metric, popular percentile-based growth representations are introduced for each test, providing a two-dimensional space to compare children's development with respect to the typical age skills of the population. The results achieved in the present article highlight the potential of the novel ChildCIdbLong database in conjunction with the proposed Q metric to measure the motor and cognitive development of children as they grow up. The proposed framework could be very useful as an automatic tool to support child experts (e.g., paediatricians, educators, or neurologists) for early detection of potential physical/cognitive impairments during children's development.
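Percentile-based growth representations of the kind described can be sketched as follows; the synthetic Q scores and bin width are assumptions, not the database's actual values.

```python
# Build per-age percentile curves for a test quality score Q, as with
# standard pediatric growth charts.
import numpy as np

rng = np.random.default_rng(5)
ages = rng.uniform(1.5, 8.0, size=2000)            # years
q = 0.1 * ages + rng.normal(0, 0.1, size=2000)     # toy Q improving with age

bins = np.arange(1.5, 8.5, 0.5)
for lo in bins[:-1]:
    mask = (ages >= lo) & (ages < lo + 0.5)
    p25, p50, p75 = np.percentile(q[mask], [25, 50, 75])
    print(f"age {lo:.1f}-{lo + 0.5:.1f}: P25={p25:.2f} P50={p50:.2f} P75={p75:.2f}")
# A child's (age, Q) point can then be read against these curves.
```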
SARA: Smart AI Reading Assistant for Reading Comprehension
Authors: Enkeleda Thaqi, Mohamed Mantawy, Enkelejda Kasneci
Link: http://arxiv.org/abs/2404.06906v1
Abstract: SARA integrates Eye Tracking and state-of-the-art large language models in a mixed reality framework to enhance the reading experience by providing personalized assistance in real-time. By tracking eye movements, SARA identifies the text segments that attract the user's attention the most and potentially indicate uncertain areas and comprehension issues. The process involves these key steps: text detection and extraction, gaze tracking and alignment, and assessment of detected reading difficulty. The results are customized solutions presented directly within the user's field of view as virtual overlays on identified difficult text areas. This support enables users to overcome challenges like unfamiliar vocabulary and complex sentences by offering additional context, rephrased solutions, and multilingual help. SARA's innovative approach demonstrates it has the potential to transform the reading experience and improve reading proficiency.
Impact of Extensions on Browser Performance: An Empirical Study on Google Chrome
Authors: Bihui Jin, Heng Li, Ying Zou
Link: http://arxiv.org/abs/2404.06827v1
Abstract: Web browsers have been used widely by users to conduct various online activities, such as information seeking or online shopping. To improve user experience and extend the functionality of browsers, practitioners provide mechanisms to allow users to install third-party-provided plugins (i.e., extensions) on their browsers. However, little is known about the performance implications caused by such extensions. In this paper, we conduct an empirical study to understand the impact of extensions on the user-perceived performance (i.e., energy consumption and page load time) of Google Chrome, the most popular browser. We study a total of 72 representative extensions from 11 categories (e.g., Developer Tools and Sports). We observe that browser performance can be negatively impacted by the use of extensions, even when the extensions are used in unintended circumstances (e.g., when logging into an extension is not granted but required, or when an extension is not used for designated websites). We also identify a set of factors that significantly influence the performance impact of extensions, such as code complexity and privacy practices (i.e., collection of user data) adopted by the extensions. Based on our empirical observations, we provide recommendations for developers and users to mitigate the performance impact of browser extensions, such as conducting performance testing and optimization for unintended usage scenarios of extensions, or adhering to proper usage practices of extensions (e.g., logging into an extension when required).
A proposal for a revised meta-architecture of intelligent tutoring systems to foster explainability and transparency for educators
Authors: Florian Gnadlinger, Simone Kriglstein
Link: http://arxiv.org/abs/2404.06820v1
Abstract: This contribution draws attention to implications connected with meta-architectural design decisions for intelligent tutoring systems in the context of formative assessments. As a first result of addressing this issue, this contribution presents a meta-architectural system design that includes the role of educators.
Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems
Authors: Zhengyuan Liu, Stella Xin Yin, Geyu Lin, Nancy F. Chen
Link: http://arxiv.org/abs/2404.06762v1
Abstract: Intelligent Tutoring Systems (ITSs) can provide a personalized and self-paced learning experience. The emergence of large language models (LLMs) further enables better human-machine interaction, and facilitates the development of conversational ITSs in various disciplines such as math and language learning. In dialogic teaching, recognizing and adapting to individual characteristics can significantly enhance student engagement and learning efficiency. However, characterizing and simulating students' personas remain challenging in training and evaluating conversational ITSs. In this work, we propose a framework to construct profiles of different student groups by refining and integrating both cognitive and noncognitive aspects, and leverage LLMs for personality-aware student simulation in a language learning scenario. We further enhance the framework with multi-aspect validation, and conduct extensive analysis from both teacher and student perspectives. Our experimental results show that state-of-the-art LLMs can produce diverse student responses according to the given language ability and personality traits, and trigger teachers' adaptive scaffolding strategies.
Incremental XAI: Memorable Understanding of AI with Incremental Explanations
Authors: Jessica Y. Bo, Pan Hao, Brian Y. Lim
Link: http://arxiv.org/abs/2404.06733v1
Abstract: Many explainable AI (XAI) techniques strive for interpretability by providing concise salient information, such as sparse linear factors. However, users see either inaccurate global explanations or highly varying local explanations. We propose to provide more detailed explanations by leveraging the human cognitive capacity to accumulate knowledge by incrementally receiving more details. Focusing on linear factor explanations (factors $\times$ values = outcome), we introduce Incremental XAI to automatically partition explanations for general and atypical instances by providing Base + Incremental factors to help users read and remember more faithful explanations. Memorability is improved by reusing base factors and reducing the number of factors shown in atypical cases. In modeling, formative, and summative user studies, we evaluated the faithfulness, memorability, and understandability of Incremental XAI against baseline explanation methods. This work contributes towards more usable explanations that users can better internalize, facilitating intuitive engagement with AI.
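The factors $\times$ values = outcome scheme with Base + Incremental factors can be sketched in a few lines; the factor values are made up and only illustrate how atypical instances add a correction on top of reused base factors.

```python
# Base + Incremental linear factor explanations: typical instances use the
# base factors alone; atypical ones add a small incremental correction.
import numpy as np

base_w = np.array([0.5, 0.3])               # base factors (reused everywhere)
incr_w = np.array([0.0, 0.4])               # extra factors for atypical cases

def explain(x, atypical=False):
    w = base_w + (incr_w if atypical else 0)
    contributions = w * x                    # factors x values
    return contributions, contributions.sum()  # ... = outcome

x = np.array([2.0, 1.0])
print(explain(x))                 # base-only explanation
print(explain(x, atypical=True))  # base + incremental explanation
```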
MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education
Authors: Murong Yue, Wijdane Mifdal, Yixuan Zhang, Jennifer Suh, Ziyu Yao
Link: http://arxiv.org/abs/2404.06711v1
Abstract: Mathematical modeling (MM) is considered a fundamental skill for students in STEM disciplines. Practicing the MM skill is often the most effective when students can engage in group discussion and collaborative problem-solving. However, due to unevenly distributed teachers and educational resources needed to monitor such group activities, students do not always receive equal opportunities for this practice. Excitingly, large language models (LLMs) have recently demonstrated strong capability in both modeling mathematical problems and simulating characters with different traits and properties. Drawing inspiration from the advancement of LLMs, in this work, we present MATHVC, the very first LLM-powered virtual classroom containing multiple LLM-simulated student characters, with whom a human student can practice their MM skill. To encourage each LLM character's behaviors to be aligned with their specified math-relevant properties (termed "characteristics alignment") and the overall conversational procedure to be close to an authentic student MM discussion (termed "conversational procedural alignment"), we proposed three innovations: integrating MM domain knowledge into the simulation, defining a symbolic schema as the ground for character simulation, and designing a meta planner at the platform level to drive the conversational procedure. Through experiments and ablation studies, we confirmed the effectiveness of our simulation approach and showed the promise for MATHVC to benefit real-life students in the future.
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Authors: Yu Ying Chiu, Liwei Jiang, Maria Antoniak, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi
Link: http://arxiv.org/abs/2404.06664v1
Abstract: Frontier large language models (LLMs) are developed by researchers and practitioners with skewed cultural backgrounds and on datasets with skewed sources. However, LLMs' (lack of) multicultural knowledge cannot be effectively assessed with current methods for developing benchmarks. Existing multicultural evaluations primarily rely on expensive and restricted human annotations or potentially outdated internet resources. Thus, they struggle to capture the intricacy, dynamics, and diversity of cultural norms. LLM-generated benchmarks are promising, yet risk propagating the same biases they are meant to measure. To synergize the creativity and expert cultural knowledge of human annotators and the scalability and standardizability of LLM-based automation, we introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build a truly challenging evaluation dataset for assessing the multicultural knowledge of LLMs, while improving annotators' capabilities and experiences. Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating, in a gamified manner, cultural questions that modern LLMs fail at. Importantly, the increased level of AI assistance (e.g., LLM-generated revision hints) empowers users to create more difficult questions with enhanced perceived creativity of their own, shedding light on the promise of involving heavier AI assistance in modern evaluation dataset creation procedures. Through a series of 1-hour workshop sessions, we gathered CULTURALBENCH-V0.1, a compact yet high-quality evaluation dataset built from users' red-teaming attempts, on which different families of modern LLMs perform with accuracy ranging from 37.7% to 72.2%, revealing a notable gap in LLMs' multicultural proficiency.
2024-04-09
Missing Pieces: How Framing Uncertainty Impacts Longitudinal Trust in AI Decision Aids -- A Gig Driver Case Study
Authors: Rex Chen, Ruiyi Wang, Norman Sadeh, Fei Fang
Link: http://arxiv.org/abs/2404.06432v1
Abstract: Decision aids based on artificial intelligence (AI) are becoming increasingly common. When such systems are deployed in environments with inherent uncertainty, following AI-recommended decisions may lead to a wide range of outcomes. In this work, we investigate how the framing of uncertainty in outcomes impacts users' longitudinal trust in AI decision aids, which is crucial to ensuring that these systems achieve their intended purposes. More specifically, we use gig driving as a representative domain to address the question: how does exposing uncertainty at different levels of granularity affect the evolution of users' trust and their willingness to rely on recommended decisions? We report on a longitudinal mixed-methods study $(n = 51)$ where we measured the trust of gig drivers as they interacted with an AI-based schedule recommendation tool. Statistically significant quantitative results indicate that participants' trust in and willingness to rely on the tool for planning depended on the perceived accuracy of the tool's estimates; that providing ranged estimates improved trust; and that increasing prediction granularity and using hedging language improved willingness to rely on the tool even when trust was low. Additionally, we report on interviews with participants which revealed a diversity of experiences with the tool, suggesting that AI systems must build trust by going beyond general designs to calibrate the expectations of individual users.
Apprentices to Research Assistants: Advancing Research with Large Language Models
Authors: M. Namvarpour, A. Razi
Link: http://arxiv.org/abs/2404.06404v1
Abstract: Large Language Models (LLMs) have emerged as powerful tools in various research domains. This article examines their potential through a literature review and firsthand experimentation. While LLMs offer benefits like cost-effectiveness and efficiency, challenges such as prompt tuning, biases, and subjectivity must be addressed. The study presents insights from experiments utilizing LLMs for qualitative analysis, highlighting successes and limitations. Additionally, it discusses strategies for mitigating challenges, such as prompt optimization techniques and leveraging human expertise. This study aligns with the 'LLMs as Research Tools' workshop's focus on integrating LLMs into HCI data work critically and ethically. By addressing both opportunities and challenges, our work contributes to the ongoing dialogue on their responsible application in research.
ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos
Authors: Sharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy, Raveendran Paramesran, Raphael C. -W. Phan
Link: http://arxiv.org/abs/2404.06243v1
Abstract: Human action or activity recognition in videos is a fundamental task in computer vision with applications in surveillance and monitoring, self-driving cars, sports analytics, human-robot interaction and many more. Traditional supervised methods require large annotated datasets for training, which are expensive and time-consuming to acquire. This work proposes a novel approach using Cross-Architecture Pseudo-Labeling with contrastive learning for semi-supervised action recognition. Our framework leverages both labeled and unlabelled data to robustly learn action representations in videos, combining pseudo-labeling with contrastive learning for effective learning from both types of samples. We introduce a novel cross-architecture approach where 3D Convolutional Neural Networks (3D CNNs) and video transformers (ViT) are utilised to capture different aspects of action representations; hence we call it ActNetFormer. The 3D CNNs excel at capturing spatial features and local dependencies in the temporal domain, while ViT excels at capturing long-range dependencies across frames. By integrating these complementary architectures within the ActNetFormer framework, our approach can effectively capture both local and global contextual information of an action. This comprehensive representation learning enables the model to achieve better performance in semi-supervised action recognition tasks by leveraging the strengths of each of these architectures. Experimental results on standard action recognition datasets demonstrate that our approach performs better than the existing methods, achieving state-of-the-art performance with only a fraction of labeled data. The official website of this work is available at: https://github.com/rana2149/ActNetFormer.
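The pseudo-labeling half of the approach can be sketched as a confidence-thresholded selection step, where one architecture labels unlabeled clips for the other; the probabilities and threshold below are synthetic placeholders.

```python
# Confidence-thresholded pseudo-labeling: keep only unlabeled clips the
# teacher architecture is confident about.
import numpy as np

def pseudo_labels(probs, threshold=0.9):
    conf = probs.max(axis=1)
    keep = conf >= threshold
    return np.flatnonzero(keep), probs[keep].argmax(axis=1)

rng = np.random.default_rng(6)
cnn_probs = rng.dirichlet(np.ones(10) * 0.3, size=100)  # toy 3D-CNN outputs
idx, labels = pseudo_labels(cnn_probs)
print(f"{len(idx)} of 100 clips pseudo-labeled for the transformer branch")
```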
Multimodal Road Network Generation Based on Large Language Model
Authors: Jiajing Chen, Weihang Xu, Haiming Cao, Zihuan Xu, Yu Zhang, Zhao Zhang, Siyao Zhang
Link: http://arxiv.org/abs/2404.06227v1
Abstract: With the increasing popularity of ChatGPT, large language models (LLMs) have demonstrated capabilities in communication and reasoning that are promising for the intelligentization of the transportation sector. However, they still face challenges in domain-specific knowledge. This paper aims to leverage LLMs' reasoning and recognition abilities to replace traditional user interfaces and create an "intelligent operating system" for transportation simulation software, exploring their potential with transportation modeling and simulation. We introduce Network Generation AI (NGAI), integrating LLMs with road network modeling plugins, validated through experiments for accuracy and robustness. NGAI's effective use has reduced modeling costs, revolutionized transportation simulations, optimized user steps, and proposed a novel approach for LLM integration in the transportation field.
Privacy-preserving Scanpath Comparison for Pervasive Eye Tracking
Authors: Suleyman Ozdel, Efe Bozkir, Enkelejda Kasneci
Link: http://arxiv.org/abs/2404.06216v1
Abstract: As eye tracking becomes pervasive with screen-based devices and head-mounted displays, privacy concerns regarding eye-tracking data have escalated. While state-of-the-art approaches for privacy-preserving eye tracking mostly involve differential privacy and empirical data manipulations, previous research has not focused on methods for scanpaths. We introduce a novel privacy-preserving scanpath comparison protocol designed for the widely used Needleman-Wunsch algorithm, a generalized version of the edit distance algorithm. In particular, by incorporating the Paillier homomorphic encryption scheme, our protocol ensures that no private information is revealed. Furthermore, we introduce a random processing strategy and a multi-layered masking method to obfuscate the values while preserving the original order of encrypted editing operation costs. This minimizes communication overhead, requiring a single communication round for each iteration of the Needleman-Wunsch process. We demonstrate the efficiency and applicability of our protocol on three publicly available datasets with comprehensive computational performance analyses and make our source code publicly accessible.
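For reference, the Needleman-Wunsch dynamic program that the protocol privatizes is the classic recurrence sketched below in plain Python; this plaintext version deliberately omits the paper's actual contribution, namely evaluating the same min-recurrence over Paillier-encrypted operation costs with masking.

```python
def needleman_wunsch(a, b, gap=1, mismatch=1):
    """Plain (unencrypted) Needleman-Wunsch edit-distance table.
    The paper's protocol runs this min-recurrence over Paillier-encrypted
    costs; the encryption and masking layers are omitted in this sketch."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub,  # substitution / match
                          D[i - 1][j] + gap,      # deletion
                          D[i][j - 1] + gap)      # insertion
    return D[n][m]
```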
EVE: Enabling Anyone to Train Robot using Augmented Reality
Authors: Jun Wang, Chun-Cheng Chang, Jiafei Duan, Dieter Fox, Ranjay Krishna
Link: http://arxiv.org/abs/2404.06089v1
Abstract: The increasing affordability of robot hardware is accelerating the integration of robots into everyday activities. However, training a robot to automate a task typically requires physical robots and expensive demonstration data from trained human annotators. Consequently, only those with access to physical robots produce demonstrations to train robots. To mitigate this issue, we introduce EVE, an iOS app that enables everyday users to train robots using intuitive augmented reality visualizations without needing a physical robot. With EVE, users can collect demonstrations by specifying waypoints with their hands, visually inspecting the environment for obstacles, modifying existing waypoints, and verifying collected trajectories. In a user study ($N=14$, $D=30$) consisting of three common tabletop tasks, EVE outperformed three state-of-the-art interfaces in success rate and was comparable to kinesthetic teaching (physically moving a real robot) in completion time, usability, motion intent communication, enjoyment, and preference ($mean_{p}=0.30$). We conclude by enumerating limitations and design considerations for future AR-based demonstration collection systems for robotics.
Breathing New Life into Existing Visualizations: A Natural Language-Driven Manipulation Framework
Authors: Can Liu, Jiacheng Yu, Yuhan Guo, Jiayi Zhuang, Yuchu Luo, Xiaoru Yuan
Link: http://arxiv.org/abs/2404.06039v1
Abstract: We propose an approach to manipulate existing interactive visualizations to answer users' natural language queries. We analyze the natural language tasks and propose a design space of a hierarchical task structure, which allows for a systematic decomposition of complex queries. We introduce a four-level visualization manipulation space to facilitate in-situ manipulations of visualizations, enabling fine-grained control over the visualization elements. Our methods comprise two essential components: the natural language-to-task translator and the visualization manipulation parser. The natural language-to-task translator employs advanced NLP techniques to extract structured, hierarchical tasks from natural language queries, even those with varying degrees of ambiguity. The visualization manipulation parser leverages the hierarchical task structure to streamline these tasks into a sequence of atomic visualization manipulations. To illustrate the effectiveness of our approach, we provide real-world examples and experimental results. The evaluation highlights the precision of our natural language parsing capabilities and underscores the smooth transformation of visualization manipulations.
Cymatics Cup: Shape-Changing Drinks by Leveraging Cymatics
Authors: Weijen Chen, Yang Yang, Kao-Hua Liu, Yun Suen Pai, Junichi Yamaoka, Kouta Minamizawa
Link: http://arxiv.org/abs/2404.06027v1
Abstract: To enhance the dining experience, prior studies in Human-Computer Interaction (HCI) and gastrophysics have demonstrated that modifying the static shape of solid foods can amplify taste perception. However, the exploration of dynamic shape-changing mechanisms in liquid foods remains largely untapped. In the present study, we employ cymatics, a scientific discipline focused on using sound frequencies to generate patterns in liquids and particles, to augment the drinking experience. Using speakers, we dynamically reshaped liquids exhibiting five distinct taste profiles and evaluated the resultant changes in taste perception and drinking experience. Our research objectives extend beyond merely augmenting taste from visual to tactile sensations; we also prioritize the experiential aspects of drinking. Through a series of experiments and workshops, we revealed a significant impact on taste perception and overall drinking experience when mediated by cymatics effects. Building upon these findings, we designed and developed tableware to integrate cymatics principles into gastronomic experiences.
Combinational Nonuniform Timeslicing of Dynamic Networks
Authors: Seokweon Jung, DongHwa Shin, Hyeon Jeon, Jinwook Seo
Link: http://arxiv.org/abs/2404.06021v1
Abstract: Dynamic networks represent the complex and evolving interrelationships between real-world entities. Given the scale and variability of these networks, finding an optimal slicing interval is essential for meaningful analysis. Nonuniform timeslicing, which adapts to density changes within the network, is drawing attention as a solution to this problem. In this research, we categorized existing algorithms into two domains -- data mining and visualization -- according to their approach to the problem. The data mining approach focuses on capturing temporal patterns of dynamic networks, while the visualization approach emphasizes lessening the burden of analysis. We then introduce a novel nonuniform timeslicing method that synthesizes the strengths of both approaches, demonstrating its efficacy with real-world data. The findings suggest that combining the two approaches offers the potential for more effective network analysis.
Inclusive Practices for Child-Centered AI Design and Testing
Authors: Emani Dotch, Vitica Arnold
Link: http://arxiv.org/abs/2404.05920v1
Abstract: We explore ideas and inclusive practices for designing and testing child-centered artificially intelligent technologies for neurodivergent children. AI is promising for supporting social communication, self-regulation, and sensory processing challenges common for neurodivergent children. The authors, both neurodivergent individuals and related to neurodivergent people, draw from their professional and personal experiences to offer insights on creating AI technologies that are accessible and include input from neurodivergent children. We offer ideas for designing AI technologies for neurodivergent children and considerations for including them in the design process while accounting for their sensory sensitivities. We conclude by emphasizing the importance of adaptable and supportive AI technologies and design processes and call for further conversation to refine child-centered AI design and testing methods.
2024-04-08
ClusterRadar: an Interactive Web-Tool for the Multi-Method Exploration of Spatial Clusters Over Time
Authors: Lee Mason, Blánaid Hicks, Jonas S. Almeida
Link: http://arxiv.org/abs/2404.05897v1
Abstract: Spatial cluster analysis, the detection of localized patterns of similarity in geospatial data, has a wide-range of applications for scientific discovery and practical decision making. One way to detect spatial clusters is by using local indicators of spatial association, such as Local Moran's I or Getis-Ord Gi*. However, different indicators tend to produce substantially different results due to their distinct operational characteristics. Choosing a suitable method or comparing results from multiple methods is a complex task. Furthermore, spatial clusters are dynamic and it is often useful to track their evolution over time, which adds an additional layer of complexity. ClusterRadar is a web-tool designed to address these analytical challenges. The tool allows users to easily perform spatial clustering and analyze the results in an interactive environment, uniquely prioritizing temporal analysis and the comparison of multiple methods. The tool's interactive dashboard presents several visualizations, each offering a distinct perspective of the temporal and methodological aspects of the spatial clustering results. ClusterRadar has several features designed to maximize its utility to a broad user-base, including support for various geospatial formats, and a fully in-browser execution environment to preserve the privacy of sensitive data. Feedback from a varied set of researchers suggests ClusterRadar's potential for enhancing the temporal analysis of spatial clusters.
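As background on the indicators named above, one common textbook form of Local Moran's I for location $i$ (notation assumed here, not taken from the tool's documentation) is

$$I_i = \frac{x_i - \bar{x}}{m_2} \sum_{j} w_{ij}\,(x_j - \bar{x}), \qquad m_2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \bar{x})^2,$$

where $w_{ij}$ are spatial weights between locations $i$ and $j$. A strongly positive $I_i$ flags a location whose value resembles its neighbors' (a potential cluster core), while a strongly negative $I_i$ flags a spatial outlier. Different indicators weight and normalize such sums differently, which is one reason their cluster maps can disagree and why multi-method comparison is useful.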
With or Without Permission: Site-Specific Augmented Reality for Social Justice CHI 2024 Workshop Proceedings
Authors: Rafael M. L. Silva, Ana María Cárdenas Gasca, Joshua A. Fisher, Erica Principe Cruz, Cinthya Jauregui, Amy Lueck, Fannie Liu, Andrés Monroy-Hernández, Kai Lukoff
Link: http://arxiv.org/abs/2404.05889v1
Abstract: This volume represents the proceedings of the With or Without Permission: Site-Specific Augmented Reality for Social Justice workshop at CHI 2024.
Youth as Peer Auditors: Engaging Teenagers with Algorithm Auditing of Machine Learning Applications
Authors: Luis Morales-Navarro, Yasmin B. Kafai, Vedya Konda, Danaë Metaxa
Link: http://arxiv.org/abs/2404.05874v1
Abstract: As artificial intelligence/machine learning (AI/ML) applications become more pervasive in youth lives, supporting them to interact, design, and evaluate applications is crucial. This paper positions youth as auditors of their peers' ML-powered applications to better understand algorithmic systems' opaque inner workings and external impacts. In a two-week workshop, 13 youth (ages 14-15) designed and audited ML-powered applications. We analyzed pre/post clinical interviews in which youth were presented with auditing tasks. The analyses show that after the workshop all youth identified algorithmic biases and inferred dataset and model design issues. Youth also discussed algorithmic justice issues and ML model improvements. Furthermore, youth reflected that auditing provided them new perspectives on model functionality and ideas to improve their own models. This work contributes (1) a conceptualization of algorithm auditing for youth; and (2) empirical evidence of the potential benefits of auditing. We discuss potential uses of algorithm auditing in learning and child-computer interaction research.
An empirical evaluation for defining a mid-air gesture dictionary for web-based interaction
Authors: Thomas Pasquale, Cristina Gena, Fabiana Vernero
Link: http://arxiv.org/abs/2404.05842v1
Abstract: This paper presents an empirical evaluation of mid-air gestures in a web setting. Fifty-six (56) subjects, all of them HCI students, were divided into 16 groups and involved as designers. Each group worked separately with the same requirements. Firstly, designers identified the main actions required for a web-based interaction with a university classroom search service. Secondly, they proposed a set of mid-air gestures to carry out the identified actions: 99 different mid-air gestures for 16 different web actions were produced in total. Then, designers validated their proposals involving external subjects, namely 248 users in total. Finally, we analyzed their results and identified the most recurring or intuitive gestures as well as the potential criticalities associated with their proposals. Hence, we defined a mid-air gesture dictionary that contains, according to our analysis, the most suitable gestures for each identified web action. Our results suggest that most people tend to replicate, in touchless interactions, the gestures they use in touch-based and mouse-based interfaces, overlooking the fact that these can be problematic due to the different distance between the user and the device in each interaction context.
Human-Machine Interaction in Automated Vehicles: Reducing Voluntary Driver Intervention
Authors: Xinzhi Zhong, Yang Zhou, Varshini Kamaraj, Zhenhao Zhou, Wissam Kontar, Dan Negrut, John D. Lee, Soyoung Ahn
Link: http://arxiv.org/abs/2404.05832v1
Abstract: This paper develops a novel car-following control method to reduce voluntary driver interventions and improve traffic stability in Automated Vehicles (AVs). Through a combination of experimental and empirical analysis, we show how voluntary driver interventions can instigate substantial traffic disturbances that are amplified along the traffic upstream. Motivated by these findings, we present a framework for driver intervention based on evidence accumulation (EA), which describes the evolution of the driver's distrust in automation, ultimately resulting in intervention. Informed through the EA framework, we propose a deep reinforcement learning (DRL)-based car-following control for AVs that is strategically designed to mitigate unnecessary driver intervention and improve traffic stability. Numerical experiments are conducted to demonstrate the effectiveness of the proposed control model.
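Evidence-accumulation models of driver takeover are typically drift-diffusion-style processes; the sketch below illustrates only the generic mechanism, with illustrative parameter names and trigger, and is not the paper's calibrated EA model.

```python
import random

def simulate_intervention(gap_errors, drift=0.05, noise=0.1, threshold=1.0):
    """Generic evidence-accumulation sketch: the driver's distrust in
    automation grows with observed discrepancies (here, car-following
    gap errors) plus noise, and an intervention is triggered once the
    accumulated evidence crosses a threshold. All parameters are
    illustrative assumptions."""
    evidence = 0.0
    for t, err in enumerate(gap_errors):
        evidence = max(0.0, evidence + drift * err + random.gauss(0.0, noise))
        if evidence >= threshold:
            return t  # time step at which the driver takes over
    return None       # no intervention during this episode
```

A control policy that keeps such an accumulator below threshold is, roughly, what the paper's DRL-based car-following controller is trained to achieve.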
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Authors: Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan
Link: http://arxiv.org/abs/2404.05719v1
Abstract: Recent advancements in multimodal large language models (MLLMs) have been noteworthy; yet these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with user interface (UI) screens. In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding, and reasoning capabilities. Given that UI screens typically exhibit a more elongated aspect ratio and contain smaller objects of interest (e.g., icons, texts) than natural images, we incorporate "any resolution" on top of Ferret to magnify details and leverage enhanced visual features. Specifically, each screen is divided into 2 sub-images based on the original aspect ratio (i.e., horizontal division for portrait screens and vertical division for landscape screens). Both sub-images are encoded separately before being sent to LLMs. We meticulously gather training samples from an extensive range of elementary UI tasks, such as icon recognition, text finding, and widget listing. These samples are formatted for instruction-following with region annotations to facilitate precise referring and grounding. To augment the model's reasoning ability, we further compile a dataset for advanced tasks, including detailed description, perception/interaction conversations, and function inference. After training on the curated datasets, Ferret-UI exhibits outstanding comprehension of UI screens and the capability to execute open-ended instructions. For model evaluation, we establish a comprehensive benchmark encompassing all the aforementioned tasks. Ferret-UI not only excels beyond most open-source UI MLLMs, but also surpasses GPT-4V on all the elementary UI tasks.
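The aspect-ratio-based division described above is easy to picture; here is a minimal sketch, assuming sub-image boxes are given as (left, top, right, bottom) pixel coordinates, which is our convention rather than the paper's.

```python
def split_screen(width, height):
    """Divide a UI screenshot into two sub-images along its longer axis,
    as the abstract describes: a horizontal cut for portrait screens and
    a vertical cut for landscape screens."""
    if height >= width:   # portrait: top and bottom halves
        mid = height // 2
        return [(0, 0, width, mid), (0, mid, width, height)]
    else:                 # landscape: left and right halves
        mid = width // 2
        return [(0, 0, mid, height), (mid, 0, width, height)]

print(split_screen(1170, 2532))  # portrait phone screen -> two stacked halves
```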
Eye Tracking on Text Reading with Visual Enhancements
Authors: Franziska Huth, Maurice Koch, Miriam Awad, Daniel Weiskopf, Kuno Kurzhals
Link: http://arxiv.org/abs/2404.05572v1
Abstract: The interplay between text and visualization is gaining importance for media where traditional text is enriched by visual elements to improve readability and emphasize facts. In two controlled eye-tracking experiments ($N=12$), we approach answers to the question: How do visualization techniques influence reading behavior? We compare plain text to that marked with highlights, icons, and word-sized data visualizations. We assess quantitative metrics (eye movement, completion time, error rate) and subjective feedback (personal preference and ratings). The results indicate that visualization techniques, especially in the first experiment, show promising trends for improved reading behavior. The results also show the need for further research to make reading more effective and inform suggestions for future studies.
Interactive Formal Specification for Mathematical Problems of Engineers
Authors: Walther Neuper
Link: http://arxiv.org/abs/2404.05462v1
Abstract: The paper presents the second part of a precise description of the prototype that has been developed in the course of the ISAC project over the last two decades. This part describes the "specify-phase", while the first part, describing the "solve-phase", is already published. In the specify-phase a student interactively constructs a formal specification. The ISAC prototype implements formal specifications as established in theoretical computer science; however, the input language for the construction avoids requiring users to have knowledge of logic, which makes the system useful for various engineering faculties (and also for high school). The paper not only discusses ISAC's design of the specify-phase in detail, but also gives a brief introduction to the implementation, with the aim of advertising the re-use of formal frameworks (including their respective front-ends) with their generic tools for language definition and their rich pool of software components for formal mathematics.
Unlocking Adaptive User Experience with Generative AI
Authors: Yutan Huang, Tanjila Kanij, Anuradha Madugalla, Shruti Mahajan, Chetan Arora, John Grundy
Link: http://arxiv.org/abs/2404.05442v1
Abstract: Developing user-centred applications that address diverse user needs requires rigorous user research. This is time-consuming, effortful, and costly. With the recent rise of generative AI techniques based on Large Language Models (LLMs), there is a possibility that these powerful tools can be used to develop adaptive interfaces. This paper presents a novel approach to developing user personas and adaptive interface candidates for a specific domain using ChatGPT. We develop user personas and adaptive interfaces using both ChatGPT and a traditional manual process and compare the outcomes. To obtain data for the personas, we collected data from 37 survey participants and 4 interviews in collaboration with a not-for-profit organisation. The comparison of ChatGPT-generated content and manual content indicates promising results that encourage using LLMs in the adaptive interface design process.
Re-Ranking News Comments by Constructiveness and Curiosity Significantly Increases Perceived Respect, Trustworthiness, and Interest
Authors: Emily Saltz, Zaria Howard, Tin Acosta
Link: http://arxiv.org/abs/2404.05429v1
Abstract: Online commenting platforms have commonly developed systems to address online harms by removing and down-ranking content. An alternative, under-explored approach is to focus on up-ranking content to proactively prioritize prosocial commentary and set better conversational norms. We present a study with 460 English-speaking US-based news readers to understand the effects of re-ranking comments by constructiveness, curiosity, and personal stories on a variety of outcomes related to willingness to participate and engage, as well as perceived credibility and polarization in a comment section. In our rich-media survey experiment, participants across these four ranking conditions and a control group reviewed prototypes of comment sections of a Politics op-ed and Dining article. We found that outcomes varied significantly by article type. Up-ranking curiosity and constructiveness improved a number of measures for the Politics article, including perceived Respect, Trustworthiness, and Interestingness of the comment section. Constructiveness also increased perceptions that the comments were favorable to Republicans, with no condition worsening perceptions of partisans. Additionally, in the Dining article, personal stories and constructiveness rankings significantly improved the perceived informativeness of the comments. Overall, these findings indicate that incorporating prosocial qualities of speech into ranking could be a promising approach to promote healthier, less polarized dialogue in online comment sections.
Indexing Analytics to Instances: How Integrating a Dashboard can Support Design Education
Authors: Ajit Jain, Andruid Kerne, Nic Lupfer, Gabriel Britain, Aaron Perrine, Yoonsuck Choe, John Keyser, Ruihong Huang, Jinsil Seo, Annie Sungkajun, Robert Lightfoot, Timothy McGuire
Link: http://arxiv.org/abs/2404.05417v1
Abstract: We investigate how to use AI-based analytics to support design education. The analytics at hand measure multiscale design, that is, students' use of space and scale to visually and conceptually organize their design work. With the goal of making the analytics intelligible to instructors, we developed a research artifact integrating a design analytics dashboard with design instances, and the design environment that students use to create them. We theorize about how Suchman's notion of mutual intelligibility requires contextualized investigation of AI in order to develop findings about how analytics work for people. We studied the research artifact in 5 situated course contexts, in 3 departments. A total of 236 students used the multiscale design environment. The 9 instructors who taught those students experienced the analytics via the new research artifact. We derive findings from a qualitative analysis of interviews with instructors regarding their experiences. Instructors reflected on how the analytics and their presentation in the dashboard have the potential to affect design education. We develop research implications addressing: (1) how indexing design analytics in the dashboard to actual design work instances helps design instructors reflect on what they mean and, more broadly, is a technique for how AI-based design analytics can support instructors' assessment and feedback experiences in situated course contexts; and (2) how multiscale design analytics, in particular, have the potential to support design education. By indexing, we mean linking that provides context: here, connecting the numbers of the analytics with visually annotated design work instances.
WebXR, A-Frame and Networked-Aframe as a Basis for an Open Metaverse: A Conceptual Architecture
Authors: Giuseppe Macario
Link: http://arxiv.org/abs/2404.05317v1
Abstract: This work proposes a WebXR-based cross-platform conceptual architecture, leveraging the A-Frame and Networked-Aframe frameworks, in order to facilitate the development of an open, accessible, and interoperable metaverse. By introducing the concept of spatial web app, this research contributes to the discourse on the metaverse, offering an architecture that democratizes access to virtual environments and extended reality through the web, and aligns with Tim Berners-Lee's original vision of the World Wide Web as an open platform in the digital realm.
Exploiting Preference Elicitation in Interactive and User-centered Algorithmic Recourse: An Initial Exploration
Authors: Seyedehdelaram Esfahani, Giovanni De Toni, Bruno Lepri, Andrea Passerini, Katya Tentori, Massimo Zancanaro
Link: http://arxiv.org/abs/2404.05270v1
Abstract: Algorithmic Recourse aims to provide actionable explanations, or recourse plans, to overturn potentially unfavourable decisions taken by automated machine learning models. In this paper, we propose an interaction paradigm based on a guided interaction pattern aimed at both eliciting the users' preferences and guiding them toward effective recourse interventions. In a fictional task of money lending, we compare this approach with an exploratory interaction pattern based on a combination of alternative plans and the possibility for users to freely change the configurations themselves. Our results suggest that users may recognize that the guided interaction paradigm improves efficiency. However, they also feel less freedom to experiment with "what-if" scenarios. Nevertheless, the time spent on the purely exploratory interface tends to be perceived as a lack of efficiency, which reduces attractiveness, perspicuity, and dependability. Conversely, for the guided interface, more time on the interface seems to increase its attractiveness, perspicuity, and dependability while not impacting the perceived efficiency. This might suggest that this type of interface should combine the two approaches, supporting exploratory behavior while gently pushing toward a guided, effective solution.
Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy
Authors: Giang Nguyen, Mohammad Reza Taesiri, Sunnie S. Y. Kim, Anh Nguyen
Link: http://arxiv.org/abs/2404.05238v1
Abstract: Via thousands of papers in Explainable AI (XAI), attention maps [vaswani2017attention] and feature attribution maps [bansal2020sam] have been established as a common means for explaining the input features that are important to AI's decisions. It is an interesting but unexplored question whether allowing users to edit the importance scores of input features at test time would improve the human-AI team's accuracy on downstream tasks. In this paper, we address this question by taking CHM-Corr, a state-of-the-art, ante-hoc explanation method [taesiri2022visual] that first predicts patch-wise correspondences between the input and the training-set images, and then uses them to make classification decisions. We build an interactive interface on top of CHM-Corr, enabling users to directly edit the initial feature attribution map provided by CHM-Corr. Via our CHM-Corr++ interface, users gain insights into if, when, and how the model changes its outputs, enhancing understanding beyond static explanations. Our user study with 18 machine learning researchers who performed $\sim$1,400 decisions shows that our interactive approach does not improve user accuracy on CUB-200 bird image classification over static explanations. This challenges the belief that interactivity inherently boosts XAI effectiveness [sokol2020one, sun2022exploring, shen2024towards, singh2024rethinking, mindlin2024beyond, lakkaraju2022rethinking, cheng2019explaining, liu2021understanding] and raises the need for future research. Our work contributes to the field by open-sourcing an interactive tool for manipulating model attention, and it lays the groundwork for future research to enable effective human-AI interaction in computer vision. We release code and data on GitHub (https://anonymous.4open.science/r/CHMCorrPlusPlus/). Our interface is available at http://137.184.82.109:7080/.
Fair Machine Guidance to Enhance Fair Decision Making in Biased People
Authors: Mingzhe Yang, Hiromi Arai, Naomi Yamashita, Yukino Baba
Link: http://arxiv.org/abs/2404.05228v1
Abstract: Teaching unbiased decision-making is crucial for addressing biased decision-making in daily life. Although both raising awareness of personal biases and providing guidance on unbiased decision-making are essential, the latter topic remains under-researched. In this study, we developed and evaluated an AI system aimed at educating individuals on making unbiased decisions using fairness-aware machine learning. In a between-subjects experimental design, 99 participants who were prone to bias performed personal assessment tasks. They were divided into two groups: a) those who received AI guidance for fair decision-making before the task and b) those who received no such guidance but were informed of their biases. The results suggest that although several participants doubted the fairness of the AI system, fair machine guidance prompted them to reassess their views regarding fairness, reflect on their biases, and modify their decision-making criteria. Our findings provide insights into the design of AI systems for guiding fair decision-making in humans.
Evaluation of an LLM in Identifying Logical Fallacies: A Call for Rigor When Adopting LLMs in HCI Research
Authors: Gionnieve Lim, Simon T. Perrault
Link: http://arxiv.org/abs/2404.05213v1
Abstract: There is increasing interest in the adoption of LLMs in HCI research. However, LLMs may often be regarded as a panacea because of their powerful capabilities with an accompanying oversight on whether they are suitable for their intended tasks. We contend that LLMs should be adopted in a critical manner following rigorous evaluation. Accordingly, we present the evaluation of an LLM in identifying logical fallacies that will form part of a digital misinformation intervention. By comparing to a labeled dataset, we found that GPT-4 achieves an accuracy of 0.79, and for our intended use case that excludes invalid or unidentified instances, an accuracy of 0.90. This gives us the confidence to proceed with the application of the LLM while keeping in mind the areas where it still falls short. The paper describes our evaluation approach, results and reflections on the use of the LLM for our intended task.
2024-04-07
Chart What I Say: Exploring Cross-Modality Prompt Alignment in AI-Assisted Chart Authoring
Authors: Nazar Ponochevnyi, Anastasia Kuzminykh
Link: http://arxiv.org/abs/2404.05103v1
Abstract: Recent chart-authoring systems, such as Amazon Q in QuickSight and Copilot for Power BI, demonstrate an emergent focus on supporting natural language input to share meaningful insights from data through chart creation. Currently, chart-authoring systems tend to integrate voice input capabilities by relying on speech-to-text transcription, processing spoken and typed input similarly. However, cross-modality input comparisons in other interaction domains suggest that the structure of spoken and typed-in interactions could notably differ, reflecting variations in user expectations based on interface affordances. Thus, in this work, we compare spoken and typed instructions for chart creation. Findings suggest that while both text and voice instructions cover chart elements and element organization, voice descriptions have a variety of command formats, element characteristics, and complex linguistic features. Based on these findings, we developed guidelines for designing voice-based authoring-oriented systems and additional features that can be incorporated into existing text-based systems to support speech modality.
Co-design Accessible Public Robots: Insights from People with Mobility Disability, Robotic Practitioners and Their Collaborations
Authors: Howard Ziyu Han, Franklin Mingzhe Li, Alesandra Baca Vazquez, Daragh Byrne, Nikolas Martelaro, Sarah E Fox
Link: http://arxiv.org/abs/2404.05050v1
Abstract: Sidewalk robots are increasingly common across the globe. Yet, their operation on public paths poses challenges for people with mobility disabilities (PwMD) who face barriers to accessibility, such as insufficient curb cuts. We interviewed 15 PwMD to understand how they perceive sidewalk robots. Findings indicated that PwMD feel they have to compete for space on the sidewalk when robots are introduced. We next interviewed eight robotics practitioners to learn about their attitudes towards accessibility. Practitioners described how issues often stem from robotic companies addressing accessibility only after problems arise. Both interview groups underscored the importance of integrating accessibility from the outset. Building on this finding, we held four co-design workshops with PwMD and practitioners in pairs. These convenings brought to bear accessibility needs around robots operating in public spaces and in the public interest. Our study aims to set the stage for a more inclusive future around public service robots.
Reduction of Forgetting by Contextual Variation During Encoding Using 360-Degree Video-Based Immersive Virtual Environments
Authors: Takato Mizuho, Takuji Narumi, Hideaki Kuzuoka
Link: http://arxiv.org/abs/2404.05007v1
Abstract: Recall impairment in an environmental context different from that of learning is called context-dependent forgetting. Two learning methods have been proposed to prevent context-dependent forgetting: reinstatement and decontextualization. Reinstatement matches the environmental context between learning and retrieval, whereas decontextualization involves repeated learning in various environmental contexts and eliminates the context dependency of memory. Conventionally, these methods have been validated by switching between physical rooms. However, in this study, we use immersive virtual environments (IVEs) as the environmental context, assisted by virtual reality (VR), which is known for its low cost and high reproducibility compared to traditional manipulation. Whereas most existing studies using VR have failed to reveal the reinstatement effect, we test its occurrence using a 360-degree video-based IVE with improved familiarity and realism instead of a computer graphics-based IVE. Furthermore, we are the first to address decontextualization using VR. Our experiment showed that repeated learning in the same constant IVE as retrieval did not significantly reduce forgetting compared to repeated learning in different constant IVEs. Conversely, repeated learning in various IVEs reduced forgetting significantly more than repeated learning in constant IVEs. These findings contribute to the design of IVEs for VR-based applications, particularly in educational settings.
Towards Developing Brain-Computer Interfaces for People with Multiple Sclerosis
Authors: John S. Russo, Tim Mahoney, Kirill Kokorin, Ashley Reynolds, Chin-Hsuan Sophie Lin, Sam E. John, David B. Grayden
Link: http://arxiv.org/abs/2404.04965v2
Abstract: Multiple Sclerosis (MS) is a severely disabling condition that leads to various neurological symptoms. A Brain-Computer Interface (BCI) may substitute some lost function; however, there is a lack of BCI research in people with MS. To progress this research area effectively and efficiently, we aimed to evaluate user needs and assess the feasibility and user-centric requirements of a BCI for people with MS. We conducted an online survey of 34 people with MS to qualitatively assess user preferences and establish the initial steps of user-centred design. The survey aimed to understand their interest and preferences in BCI and bionic applications. We demonstrated widespread interest for BCI applications in all stages of MS, with a preference for a non-invasive (n = 12) or minimally invasive (n = 15) BCI over carer assistance (n = 6). Qualitative assessment indicated that this preference was not influenced by level of independence. Additionally, strong interest was noted in bionic technology for sensory and autonomic functions. Considering the potential to enhance independence and quality of life for people living with MS, the results emphasise the importance of user-centred design for future advancement of BCIs that account for the unique pathological changes associated with MS.
Balancing Information Perception with Yin-Yang: Agent-Based Information Neutrality Model for Recommendation Systems
Authors: Mengyan Wang, Yuxuan Hu, Shiqing Wu, Weihua Li, Quan Bai, Verica Rupar
Link: http://arxiv.org/abs/2404.04906v1
Abstract: While preference-based recommendation algorithms effectively enhance user engagement by recommending personalized content, they often result in the creation of "filter bubbles". These bubbles restrict the range of information users interact with, inadvertently reinforcing their existing viewpoints. Previous research has focused on modifying these underlying algorithms to tackle this issue. Yet, approaches that maintain the integrity of the original algorithms remain largely unexplored. This paper introduces an Agent-based Information Neutrality model grounded in the Yin-Yang theory, namely, AbIN. This innovative approach targets the imbalance in information perception within existing recommendation systems. It is designed to integrate with these preference-based systems, ensuring the delivery of recommendations with neutral information. Our empirical evaluation demonstrated the model's efficacy, showcasing its capacity to expand information diversity while respecting user preferences. Consequently, AbIN emerges as an instrumental tool in mitigating the negative impact of filter bubbles on information consumption.
2024-04-06
Navigating the Landscape of Hint Generation Research: From the Past to the Future
Authors: Anubhav Jangra, Jamshid Mozafari, Adam Jatowt, Smaranda Muresan
Link: http://arxiv.org/abs/2404.04728v1
Abstract: Digital education has gained popularity in the last decade, especially after the COVID-19 pandemic. With the improving capabilities of large language models to reason and communicate with users, envisioning intelligent tutoring systems (ITSs) that can facilitate self-learning is not very far-fetched. One integral component to fulfill this vision is the ability to give accurate and effective feedback via hints to scaffold the learning process. In this survey article, we present a comprehensive review of prior research on hint generation, aiming to bridge the gap between research in education and cognitive science, and research in AI and Natural Language Processing. Informed by our findings, we propose a formal definition of the hint generation task, and discuss the roadmap of building an effective hint generation system aligned with the formal definition, including open challenges, future directions and ethical considerations.
"Don't Step on My Toes": Resolving Editing Conflicts in Real-Time Collaboration in Computational Notebooks
Authors: April Yi Wang, Zihan Wu, Christopher Brooks, Steve Oney
Link: http://arxiv.org/abs/2404.04695v1
Abstract: Real-time collaborative editing in computational notebooks can improve the efficiency of teamwork for data scientists. However, working together through synchronous editing of notebooks introduces new challenges. Data scientists may inadvertently interfere with each others' work by altering the shared codebase and runtime state if they do not set up a social protocol for working together and monitoring their collaborators' progress. In this paper, we propose a real-time collaborative editing model for resolving conflict edits in computational notebooks that introduces three levels of edit protection to help collaborators avoid introducing errors to both the program source code and changes to the runtime state.
Designing for Complementarity: A Conceptual Framework to Go Beyond the Current Paradigm of Using XAI in Healthcare
Authors: Elisa Rubegni, Omran Ayoub, Stefania Maria Rita Rizzo, Marco Barbero, Guenda Bernegger, Francesca Faraci, Francesca Mangili, Emiliano Soldini, Pierpaolo Trimboli, Alessandro Facchini
Link: http://arxiv.org/abs/2404.04638v1
Abstract: The widespread use of Artificial Intelligence-based tools in the healthcare sector raises many ethical and legal problems, one of the main reasons being their black-box nature and, therefore, the seeming opacity and inscrutability of their characteristics and decision-making process. The literature extensively discusses how this can lead to phenomena of over-reliance and under-reliance, ultimately limiting the adoption of AI. We addressed these issues by building a theoretical framework based on three concepts: Feature Importance, Counterexample Explanations, and Similar-Case Explanations. Grounded in the literature, the model was deployed within a case study in which, using a participatory design approach, we designed and developed a high-fidelity prototype. Through the co-design and development of the prototype and the underlying model, we advanced the knowledge of how to design AI-based systems for enabling complementarity in the decision-making process in the healthcare domain. Our work aims to contribute to the current discourse on designing AI systems that support clinicians' decision-making processes.
Analyzing LLM Usage in an Advanced Computing Class in India
Authors: Chaitanya Arora, Utkarsh Venaik, Pavit Singh, Sahil Goyal, Jatin Tyagi, Shyama Goel, Ujjwal Singhal, Dhruv Kumar
Link: http://arxiv.org/abs/2404.04603v1
Abstract: This paper investigates the usage patterns of undergraduate and graduate students when engaging with large language models (LLMs) to tackle programming assignments in the context of advanced computing courses. Existing work predominantly focuses on the influence of LLMs in introductory programming contexts. Additionally, there is a scarcity of studies analyzing actual conversations between students and LLMs. Our study provides a comprehensive quantitative and qualitative analysis of raw interactions between students and LLMs within an advanced computing course (Distributed Systems) at an Indian university. We further complement this by conducting student interviews to gain deeper insights into their usage patterns. Our study shows that students make use of LLMs in various ways: generating or debugging code by identifying and fixing errors, copying and pasting assignment descriptions into LLM interfaces for specific solutions, asking conceptual questions about complex programming ideas or theoretical concepts, and generating test cases to check code functionality and robustness. Our analysis covers over 4,000 prompts from 411 students and interviews with 10 of them. It shows that LLMs excel at generating boilerplate code and assisting in debugging, while students handle the integration of components and system troubleshooting. This aligns with the learning objectives of advanced computing courses, which are oriented towards teaching students how to build systems and troubleshoot, with less emphasis on generating code from scratch. Therefore, LLM tools can be leveraged to increase student productivity, as shown by the data we collected. This study contributes to the ongoing discussion on LLM use in education, advocating for their usefulness in advanced computing courses to complement higher-level learning and productivity.
TeleAware Robot: Designing Awareness-augmented Telepresence Robot for Remote Collaborative Locomotion
Authors: Ruyi Li, Yaxin Zhu, Min Liu, Yihang Zeng, Shanning Zhuang, Jiayi Fu, Yi Lu, Guyue Zhou, Can Liu, Jiangtao Gong
Link: http://arxiv.org/abs/2404.04579v1
Abstract: Telepresence robots can be used to support users to navigate an environment remotely and share the visiting experience with their social partners. Although such systems allow users to see and hear the remote environment and communicate with their partners via live video feed, this does not provide enough awareness of the environment and their remote partner's activities. In this paper, we introduce an awareness framework for collaborative locomotion in scenarios of onsite and remote users visiting a place together. From an observational study of small groups of people visiting exhibitions, we derived four design goals for enhancing the environmental and social awareness between social partners, and developed a set of awareness-enhancing techniques to add to a standard telepresence robot - named TeleAware robot. Through a controlled experiment simulating a guided exhibition visiting task, TeleAware robot showed the ability to lower the workload, facilitate closer social proximity, and improve mutual awareness and social presence compared with the standard one. We discuss the impact of mobility and roles of local and remote users, and provide insights for the future design of awareness-enhancing telepresence robot systems that facilitate collaborative locomotion.
A Map of Exploring Human Interaction patterns with LLM: Insights into Collaboration and Creativity
Authors: Jiayang Li, Jiale Li
Link: http://arxiv.org/abs/2404.04570v1
Abstract: The outstanding performance of large language models has driven the evolution of current AI system interaction patterns. This has led to considerable discussion within the Human-AI Interaction (HAII) community. Numerous studies explore this interaction from technical, design, and empirical perspectives. However, the majority of current literature reviews concentrate on interactions across the wider spectrum of AI, with limited attention given to the specific realm of interaction with LLMs. We searched for articles on human interaction with LLMs, selecting 110 relevant publications meeting a consensus definition of Human-AI interaction. Subsequently, we developed a comprehensive Mapping Procedure, structured in five distinct stages, to systematically analyze and categorize the collected publications. Applying this methodical approach, we meticulously mapped the chosen studies, culminating in a detailed and insightful representation of the research landscape. Overall, our review presents a novel approach, introducing a distinctive mapping method specifically tailored to evaluate human-LLM interaction patterns. We conducted a comprehensive analysis of current research in related fields, employing clustering techniques for categorization, which enabled us to clearly delineate the status and challenges prevalent in each identified area.
Language Models as Critical Thinking Tools: A Case Study of Philosophers
Authors: Andre Ye, Jared Moore, Rose Novick, Amy X. Zhang
Link: http://arxiv.org/abs/2404.04516v1
Abstract: Current work in language models (LMs) helps us speed up or even skip thinking by accelerating and automating cognitive work. But can LMs help us with critical thinking -- thinking in deeper, more reflective ways which challenge assumptions, clarify ideas, and engineer new concepts? We treat philosophy as a case study in critical thinking, and interview 21 professional philosophers about how they engage in critical thinking and on their experiences with LMs. We find that philosophers do not find LMs to be useful because they lack a sense of selfhood (memory, beliefs, consistency) and initiative (curiosity, proactivity). We propose the selfhood-initiative model for critical thinking tools to characterize this gap. Using the model, we formulate three roles LMs could play as critical thinking tools: the Interlocutor, the Monitor, and the Respondent. We hope that our work inspires LM researchers to further develop LMs as critical thinking tools and philosophers and other 'critical thinkers' to imagine intellectually substantive uses of LMs.
Majority Voting of Doctors Improves Appropriateness of AI Reliance in Pathology
Authors: Hongyan Gu, Chunxu Yang, Shino Magaki, Neda Zarrin-Khameh, Nelli S. Lakis, Inma Cobos, Negar Khanlou, Xinhai R. Zhang, Jasmeet Assi, Joshua T. Byers, Ameer Hamza, Karam Han, Anders Meyer, Hilda Mirbaha, Carrie A. Mohila, Todd M. Stevens, Sara L. Stone, Wenzhong Yan, Mohammad Haeri, Xiang 'Anthony' Chen
Link: http://arxiv.org/abs/2404.04485v1
Abstract: As Artificial Intelligence (AI) makes advancements in medical decision-making, there is a growing need to ensure that doctors develop appropriate reliance on AI to avoid adverse outcomes. However, existing methods for enabling appropriate AI reliance might encounter challenges when applied in the medical domain. In this regard, this work employs and provides the validation of an alternative approach -- majority voting -- to facilitate appropriate reliance on AI in medical decision-making. This is achieved by a multi-institutional user study involving 32 medical professionals with various backgrounds, focusing on the pathology task of visually detecting a pattern, mitoses, in tumor images. Here, the majority voting process was conducted by synthesizing decisions under AI assistance from a group of pathology doctors (pathologists). Two metrics were used to evaluate the appropriateness of AI reliance: Relative AI Reliance (RAIR) and Relative Self-Reliance (RSR). Results showed that even with groups of three pathologists, majority-voted decisions significantly increased both RAIR and RSR -- by approximately 9% and 31%, respectively -- compared to decisions made by one pathologist collaborating with AI. This increased appropriateness resulted in better precision and recall in the detection of mitoses. While our study is centered on pathology, we believe these insights can be extended to general high-stakes decision-making processes involving similar visual tasks.
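A minimal sketch of the aggregation step, assuming binary labels; the study's actual synthesis of AI-assisted decisions may differ in detail.

```python
from collections import Counter

def majority_vote(decisions):
    """Aggregate AI-assisted decisions (e.g., 'mitosis' / 'not_mitosis')
    from a small group of doctors into one group decision; with groups
    of three and binary labels, ties cannot occur."""
    return Counter(decisions).most_common(1)[0][0]

# Example: three pathologists' AI-assisted calls on one image patch
print(majority_vote(["mitosis", "not_mitosis", "mitosis"]))  # -> "mitosis"
```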
2024-04-05
HIV Client Perspectives on Digital Health in Malawi
Authors: Lisa Orii, Caryl Feldacker, Jacqueline Madalitso Huwa, Agness Thawani, Evelyn Viola, Christine Kiruthu-Kamamia, Odala Sande, Hannock Tweya, Richard Anderson
Link: http://arxiv.org/abs/2404.04444v1
Abstract: eHealth has strong potential to advance HIV care in low- and middle-income countries. Given the sensitivity of HIV-related information and the risks associated with unintended HIV status disclosure, clients' privacy perceptions towards eHealth applications should be examined to develop client-centered technologies. Through focus group discussions with antiretroviral therapy (ART) clients from Lighthouse Trust, Malawi's public HIV care program, we explored perceptions of data security and privacy, including their understanding of data flow and their concerns about data confidentiality across several layers of data use. Our findings highlight the broad privacy concerns that affect ART clients' day-to-day choices, clients' trust in Malawi's health system, and their acceptance of, and familiarity with, point-of-care technologies used in HIV care. Based on our findings, we provide recommendations for building robust digital health systems in low- and middle-income countries with limited resources, nascent privacy regulations, and political will to take action to protect client data.
Humanoid Robots at work: where are we ?
Authors: Fabrice R. Noreils
Link: http://arxiv.org/abs/2404.04249v1
Abstract: Launched by Elon Musk with his Optimus, we are witnessing a new race in which many companies have already engaged. The objective is to put to work a new generation of humanoid robots in demanding industrial environments within 2 or 3 years. Is this objective realistic? The aim of this document, and its main contribution, is to provide some hints by covering the following topics: first, an analysis of 12 companies based on eight criteria that help distinguish companies by their maturity and approach to the market; second, because these humanoids are very complex systems, an overview of the technological challenges to be addressed; third, a look at Operation and Maintenance, which become critical when humanoids are deployed at scale, and at what is new with these complex machines; finally, a discussion of pilots, the last step in testing the feasibility of a new system before mass deployment and an important test of a product's maturity and of the humanoid supplier's strategy to address a market, including two pragmatic approaches.
Social Skill Training with Large Language Models
Authors: Diyi Yang, Caleb Ziems, William Held, Omar Shaikh, Michael S. Bernstein, John Mitchell
Link: http://arxiv.org/abs/2404.04204v1
Abstract: People rely on social skills like conflict resolution to communicate effectively and to thrive in both work and personal life. However, practice environments for social skills are typically out of reach for most people. How can we make social skill training more available, accessible, and inviting? Drawing upon interdisciplinary research from communication and psychology, this perspective paper identifies the social skill barriers that limit entry into specialized fields. Then we present a solution that leverages large language models for social skill training via a generic framework. Our AI Partner, AI Mentor framework merges experiential learning with realistic practice and tailored feedback. This work ultimately calls for cross-disciplinary innovation to address the broader implications for workforce development and social equality.
Designing Robots to Help Women
Authors: Martin Cooney, Lena Klasén, Fernando Alonso-Fernandez
Link: http://arxiv.org/abs/2404.04123v1
Abstract: Robots are being designed to help people in an increasing variety of settings--but seemingly little attention has been given so far to the specific needs of women, who represent roughly half of the world's population but are highly underrepresented in robotics. Here we used a speculative prototyping approach to explore this expansive design space: First, we identified some potential challenges of interest, including crimes and illnesses that disproportionately affect women, as well as potential opportunities for designers, which were visualized in five sketches. Then, one of the sketched scenarios was further explored by developing a prototype, of a robotic helper drone equipped with computer vision to detect hidden cameras that could be used to spy on women. While object detection introduced some errors, hidden cameras were identified with a reasonable accuracy of 80% (Intersection over Union (IoU) score: 0.40). Our aim is that the identified challenges and opportunities could help spark discussion and inspire designers, toward realizing a safer, more inclusive future through responsible use of technology.
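For reference, the IoU score reported above is the standard overlap measure for detections: intersection area divided by union area. A minimal sketch for axis-aligned boxes given as (left, top, right, bottom) pixel coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned bounding boxes;
    1.0 means perfect overlap, 0.0 means no overlap. The abstract
    reports a mean IoU of 0.40 for hidden-camera detections."""
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])
    inter = max(0, right - left) * max(0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # -> ~0.143
```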
ChoreoVis: Planning and Assessing Formations in Dance Choreographies
Authors: Samuel Beck, Nina Doerr, Kuno Kurzhals, Alexander Riedlinger, Fabian Schmierer, Michael Sedlmair, Steffen Koch
Link: http://arxiv.org/abs/2404.04100v1
Abstract: Sports visualization has developed into an active research field over the last decades. Many approaches focus on analyzing movement data recorded from unstructured situations, such as soccer. For the analysis of choreographed activities like formation dancing, however, the goal differs, as dancers follow specific formations in coordinated movement trajectories. To date, little work exists on how visual analytics methods can support such choreographed performances. To fill this gap, we introduce a new visual approach for planning and assessing dance choreographies. In terms of planning choreographies, we contribute a web application with interactive authoring tools and views for the dancers' positions and orientations, movement trajectories, poses, dance floor utilization, and movement distances. For assessing dancers' real-world movement trajectories, extracted by manual bounding box annotations, we developed a timeline showing aggregated trajectory deviations and a dance floor view for detailed trajectory comparison. Our approach was developed and evaluated in collaboration with dance instructors, showing that introducing visual analytics into this domain promises improvements in training efficiency for the future.
Hierarchical Neural Additive Models for Interpretable Demand Forecasts
Authors: Leif Feddersen, Catherine Cleophas
Link: http://arxiv.org/abs/2404.04070v1
Abstract: Demand forecasts are the crucial basis for numerous business decisions, ranging from inventory management to strategic facility planning. While machine learning (ML) approaches offer accuracy gains, their interpretability and acceptance are notoriously lacking. Addressing this dilemma, we introduce Hierarchical Neural Additive Models for time series (HNAM). HNAM expands upon Neural Additive Models (NAM) by introducing a time-series specific additive model with a level and interacting covariate components. Covariate interactions are only allowed according to a user-specified interaction hierarchy. For example, weekday effects may be estimated independently of other covariates, whereas a holiday effect may depend on the weekday and an additional promotion may depend on both former covariates that are lower in the interaction hierarchy. Thereby, HNAM yields an intuitive forecasting interface in which analysts can observe the contribution for each known covariate. We evaluate the proposed approach and benchmark its performance against other state-of-the-art machine learning and statistical models extensively on real-world retail data. The results reveal that HNAM offers competitive prediction performance whilst providing plausible explanations.
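Based on the abstract's description, HNAM's additive structure can be sketched roughly as follows (the notation is assumed here, not taken from the paper):

$$\hat{y}_t = \ell_t + \sum_{k=1}^{K} f_k\big(x_{k,t} \mid x_{1,t}, \dots, x_{k-1,t}\big),$$

where $\ell_t$ is the level component, covariates are ordered by the user-specified interaction hierarchy, and each effect $f_k$ may condition only on covariates lower in that hierarchy (e.g., a promotion effect may depend on weekday and holiday, but not vice versa). Because the components enter additively, analysts can read off each covariate's contribution to a forecast separately.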
VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots
Authors: Akhil Padmanabha, Jessie Yuan, Janavi Gupta, Zulekha Karachiwalla, Carmel Majidi, Henny Admoni, Zackory Erickson
Link: http://arxiv.org/abs/2404.04066v1
Abstract: Physically assistive robots present an opportunity to significantly increase the well-being and independence of individuals with motor impairments or other forms of disability who are unable to complete activities of daily living. Speech interfaces, especially ones that utilize Large Language Models (LLMs), can enable individuals to effectively and naturally communicate high-level commands and nuanced preferences to robots. Frameworks for integrating LLMs as interfaces to robots for high level task planning and code generation have been proposed, but fail to incorporate human-centric considerations which are essential while developing assistive interfaces. In this work, we present a framework for incorporating LLMs as speech interfaces for physically assistive robots, constructed iteratively with 3 stages of testing involving a feeding robot, culminating in an evaluation with 11 older adults at an independent living facility. We use both quantitative and qualitative data from the final study to validate our framework and additionally provide design guidelines for using LLMs as speech interfaces for assistive robots. Videos and supporting files are located on our project website: https://sites.google.com/andrew.cmu.edu/voicepilot/
Which Experimental Design is Better Suited for VQA Tasks? Eye Tracking Study on Cognitive Load, Performance, and Gaze Allocations
Authors: Sita A. Vriend, Sandeep Vidyapu, Amer Rama, Kun-Ting Chen, Daniel Weiskopf
Link: http://arxiv.org/abs/2404.04036v1
Abstract: We conducted an eye-tracking user study with 13 participants to investigate the influence of stimulus-question ordering and question modality on participants performing visual question-answering (VQA) tasks. We examined cognitive load, task performance, and gaze allocations across five distinct experimental designs, aiming to identify setups that minimize the cognitive burden on participants. The collected performance and gaze data were analyzed using quantitative and qualitative methods. Our results indicate a significant impact of stimulus-question ordering on cognitive load and task performance, as well as a noteworthy effect of question modality on task performance. These findings offer insights for the experimental design of controlled user studies in visualization research.
Validation of critical maneuvers based on shared control
Authors: Mauricio Marcano, Joseba Sarabia, Asier Zubizarreta, Sergio Díaz
Link: http://arxiv.org/abs/2404.04011v1
Abstract: This paper presents the validation of shared control strategies for critical maneuvers in automated driving systems. Shared control involves collaboration between the driver and automation, allowing both parties to actively engage and cooperate at different levels of the driving task. The involvement of the driver adds complexity to the control loop, necessitating comprehensive validation methodologies. The proposed approach focuses on two critical maneuvers: overtaking in low visibility scenarios and lateral evasive actions. A modular architecture with an arbitration module and shared control algorithms is implemented, primarily focusing on the lateral control of the vehicle. The validation is conducted using a dynamic simulator, involving 8 real drivers interacting with a virtual environment. The results demonstrate improved safety and user acceptance, indicating the effectiveness of the shared control strategies in comparison with no shared-control support. Future work involves implementing shared control in drive-by-wire systems to enhance safety and driver comfort during critical maneuvers. Overall, this research contributes to the development and validation of shared control approaches in automated driving systems.
From Theory to Comprehension: A Comparative Study of Differential Privacy and $k$-Anonymity
Authors: Saskia Nuñez von Voigt, Luise Mehner, Florian Tschorsch
Link: http://arxiv.org/abs/2404.04006v1
Abstract: The notion of $\varepsilon$-differential privacy is a widely used concept of providing quantifiable privacy to individuals. However, it is unclear how to explain the level of privacy protection provided by a differential privacy mechanism with a given $\varepsilon$. In this study, we focus on users' comprehension of the privacy protection provided by a differential privacy mechanism. To do so, we study three variants of explaining the privacy protection provided by differential privacy: (1) the original mathematical definition; (2) $\varepsilon$ translated into a specific privacy risk; and (3) an explanation using the randomized response technique. We compare users' comprehension of privacy protection employing these explanatory models with their comprehension of the privacy protection of $k$-anonymity as a comprehensibility baseline. Our findings suggest that participants' comprehension of differential privacy protection is enhanced by the privacy risk model and the randomized response-based model. Moreover, our results confirm our intuition that privacy protection provided by $k$-anonymity is more comprehensible.
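The randomized-response explanation has a standard quantitative reading: a binary randomized-response mechanism satisfies $\varepsilon$-differential privacy when the probability of reporting one's true value is $e^\varepsilon / (1 + e^\varepsilon)$. A small sketch of that translation, included here for illustration; the paper's exact explanatory wording may differ.

```python
# Translating epsilon into a concrete disclosure probability via
# binary randomized response (standard result; illustrative sketch).
import math

def truthful_response_probability(epsilon: float) -> float:
    # At equality of the epsilon-DP constraint p / (1 - p) = e^eps,
    # the mechanism reports the true value with probability:
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

for eps in (0.1, 0.5, 1.0, 2.0):
    print(f"epsilon={eps}: answers truthfully with probability "
          f"{truthful_response_probability(eps):.2f}")
```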
Approximate UMAP allows for high-rate online visualization of high-dimensional data streams
Authors: Peter Wassenaar, Pierre Guetschel, Michael Tangermann
Link: http://arxiv.org/abs/2404.04001v1
Abstract: In the BCI field, introspection and interpretation of brain signals are desired for providing feedback or to guide rapid paradigm prototyping but are challenging due to the high noise level and dimensionality of the signals. Deep neural networks are often introspected by transforming their learned feature representations into 2- or 3-dimensional subspace visualizations using projection algorithms like Uniform Manifold Approximation and Projection (UMAP). Unfortunately, these methods are computationally expensive, making the projection of data streams in real-time a non-trivial task. In this study, we introduce a novel variant of UMAP, called approximate UMAP (aUMAP). It aims at generating rapid projections for real-time introspection. To study its suitability for real-time projecting, we benchmark it against standard UMAP and its neural-network counterpart, parametric UMAP. Our results show that approximate UMAP delivers projections that replicate the projection space of standard UMAP while reducing projection time by an order of magnitude and maintaining the same training time.
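The reported speed-up plausibly comes from replacing per-sample UMAP optimization with a lookup. A minimal version of the idea, embedding each streaming sample as the mean embedding of its nearest training neighbors; the paper's exact interpolation scheme may differ.

```python
# Nearest-neighbor approximation in the spirit of aUMAP (our sketch).
import numpy as np
import umap  # pip install umap-learn
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 64))        # stand-in for training features
reducer = umap.UMAP(n_components=2).fit(X_train)
knn = NearestNeighbors(n_neighbors=5).fit(X_train)

def approx_project(x_new):
    # Embed a streaming sample as the mean embedding of its k nearest
    # training points: a lookup, so no per-sample UMAP optimization.
    _, idx = knn.kneighbors(x_new.reshape(1, -1))
    return reducer.embedding_[idx[0]].mean(axis=0)

print(approx_project(rng.normal(size=64)))
```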
Tensions between Preference and Performance: Designing for Visual Exploration of Multi-frequency Medical Network Data
Authors: Christian Knoll, Laura Koesten, Isotta Rigoni, Serge Vulliémoz, Torsten Möller
Link: http://arxiv.org/abs/2404.03965v1
Abstract: The analysis of complex high-dimensional data is a common task in many domains, resulting in bespoke visual exploration tools. Expectations and practices of domain experts as users do not always align with visualization theory. In this paper, we report on a design study in the medical domain where we developed two high-fidelity prototypes encoding EEG-derived brain network data with different types of visualizations. We evaluate these prototypes regarding effectiveness, efficiency, and preference with two groups: participants with domain knowledge (domain experts in medical research) and those without domain knowledge, both groups having little or no visualization experience. A requirement analysis and study of low-fidelity prototypes revealed a strong preference for a novel and aesthetically pleasing visualization design, as opposed to a design that is considered more optimal based on visualization theory. Our study highlights the pros and cons of both approaches, discussing trade-offs between task-specific measurements and subjective preference. While the aesthetically pleasing and novel low-fidelity prototype was favored, the results of our evaluation show that, in most cases, this was not reflected in participants' performance or subjective preference for the high-fidelity prototypes.
Open vocabulary keyword spotting through transfer learning from speech synthesis
Authors: Kesavaraj V, Anil Kumar Vuppala
Link: http://arxiv.org/abs/2404.03914v1
Abstract: Identifying keywords in an open-vocabulary context is crucial for personalizing interactions with smart devices. Previous approaches to open vocabulary keyword spotting depend on a shared embedding space created by audio and text encoders. However, these approaches suffer from heterogeneous modality representations (i.e., audio-text mismatch). To address this issue, our proposed framework leverages knowledge acquired from a pre-trained text-to-speech (TTS) system. This knowledge transfer allows for the incorporation of awareness of audio projections into the text representations derived from the text encoder. The performance of the proposed approach is compared with various baseline methods across four different datasets. The robustness of our proposed model is evaluated by assessing its performance across different word lengths and in an Out-of-Vocabulary (OOV) scenario. Additionally, the effectiveness of transfer learning from the TTS system is investigated by analyzing its different intermediate representations. The experimental results indicate that, in the challenging LibriPhrase Hard dataset, the proposed approach outperformed the cross-modality correspondence detector (CMCD) method by a significant improvement of 8.22% in area under the curve (AUC) and 12.56% in equal error rate (EER).
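For reference, the two reported metrics follow their standard definitions; the sketch below computes both from detector scores, using synthetic labels purely for illustration.

```python
# Computing AUC and EER from keyword-detector scores (standard definitions).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

labels = np.array([1, 1, 0, 0, 1, 0])              # 1 = keyword present
scores = np.array([0.9, 0.7, 0.4, 0.2, 0.6, 0.5])  # detector confidences

auc = roc_auc_score(labels, scores)
fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
# EER is the operating point where false accepts ~= false rejects
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]
print(f"AUC={auc:.3f}, EER={eer:.3f}")
```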
Effects of Multisensory Feedback on the Perception and Performance of Virtual Reality Hand-Retargeted Interaction
Authors: Hyunyoung Jang, Jinwook Kim, Jeongmi Lee
Link: http://arxiv.org/abs/2404.03899v1
Abstract: Retargeting methods that modify the visual representation of real movements have been widely used to expand the interaction space and create engaging virtual reality experiences. For optimal user experience and performance, it is essential to specify the perception of retargeting and utilize the appropriate range of modification parameters. However, previous studies mostly concentrated on whether users perceived the target sense or not and rarely examined the perceptual accuracy and sensitivity to retargeting. Moreover, it is unknown how the perception and performance in hand-retargeted interactions are influenced by multisensory feedback. In this study, we used rigorous psychophysical methods to specify users' perceptual accuracy and sensitivity to hand-retargeting and provide acceptable ranges of retargeting parameters. We also presented different multisensory feedback simultaneously with the retargeting to probe its effect on users' perception and task performance. The experimental results showed that providing continuous multisensory feedback, proportionate to the distance between the virtual hand and the targeted destination, heightened the accuracy of users' perception of hand retargeting without altering their perceptual sensitivity. Furthermore, the utilization of multisensory feedback considerably improved the precision of task performance, particularly at lower gain factors. Based on these findings, we propose design guidelines and potential applications of VR hand-retargeted interactions and multisensory feedback for optimal user experience and performance.
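Hand retargeting in this line of work is commonly formalized as a gain applied to the hand's displacement from a reference point, so a gain below 1 makes the virtual hand undershoot the real one. A textbook-style sketch; the gain values here are illustrative, not the paper's calibrated thresholds.

```python
# Classic gain-based hand retargeting (textbook form; values illustrative).
import numpy as np

def retarget(real_hand, origin, gain):
    # Scale the hand's displacement from a reference origin so the
    # virtual hand moves gain-times the real movement.
    return np.asarray(origin) + gain * (np.asarray(real_hand) - np.asarray(origin))

real = [0.10, 0.00, 0.30]
print(retarget(real, origin=[0.0, 0.0, 0.0], gain=0.8))  # virtual hand lags the real one
```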
Buck You: Designing Easy-to-Onboard Blockchain Applications with Zero-Knowledge Login and Sponsored Transactions on Sui
Authors: Eason Chen, Zimo Xiao, Justa Liang, Damien Chen, Pierce Hung, Kostas Kryptos Chalkias
Link: http://arxiv.org/abs/2404.03845v1
Abstract: In this paper, we developed a blockchain application to demonstrate the functionality of Sui's recent innovations: Zero Knowledge Login and Sponsored Transactions. Zero Knowledge Login allows users to create and access their blockchain wallets with just their OAuth accounts (e.g., Google, Facebook, Twitch), while Sponsored Transactions eliminate the need for users to prepare transaction fees, as they can delegate fees to sponsors' accounts. Additionally, thanks to Sui's Storage Rebate feature, sponsors in Sponsored Transactions can profit from the sponsorship, achieving a win-win and sustainable service model. Zero Knowledge Login and Sponsored Transactions are pivotal in overcoming key challenges novice blockchain users face, particularly in managing private keys and depositing initial transaction fees. By addressing these challenges in the user experience of blockchain, Sui makes the blockchain more accessible and engaging for novice users and paves the way for the broader adoption of blockchain applications in everyday life.
2024-04-04
SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers
Authors: Jonathan F. Carter, João Jorge, Oliver Gibson, Lionel Tarassenko
Link: http://arxiv.org/abs/2404.03831v1
Abstract: Advances in camera-based physiological monitoring have enabled the robust, non-contact measurement of respiration and the cardiac pulse, which are known to be indicative of the sleep stage. This has led to research into camera-based sleep monitoring as a promising alternative to "gold-standard" polysomnography, which is cumbersome, expensive to administer, and hence unsuitable for longer-term clinical studies. In this paper, we introduce SleepVST, a transformer model which enables state-of-the-art performance in camera-based sleep stage classification (sleep staging). After pre-training on contact sensor data, SleepVST outperforms existing methods for cardio-respiratory sleep staging on the SHHS and MESA datasets, achieving total Cohen's kappa scores of 0.75 and 0.77 respectively. We then show that SleepVST can be successfully transferred to cardio-respiratory waveforms extracted from video, enabling fully contact-free sleep staging. Using a video dataset of 50 nights, we achieve a total accuracy of 78.8% and a Cohen's $\kappa$ of 0.71 in four-class video-based sleep staging, setting a new state-of-the-art in the domain.
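Cohen's kappa, the agreement score reported above, corrects raw accuracy for chance agreement between the predicted and reference hypnograms; a minimal sketch with stand-in data:

```python
# Cohen's kappa as used for sleep-staging agreement (standard metric).
from sklearn.metrics import cohen_kappa_score

# Stand-in hypnograms: 0=wake, 1=light, 2=deep, 3=REM
reference = [0, 1, 1, 2, 2, 3, 1, 0]
predicted = [0, 1, 2, 2, 2, 3, 1, 1]
print(cohen_kappa_score(reference, predicted))  # chance-corrected agreement in [-1, 1]
```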
I Did Not Notice: A Comparison of Immersive Analytics with Augmented and Virtual Reality
Authors: Xiaoyan Zhou, Anil Ufuk Batmaz, Adam S. Williams, Dylan Schreiber, Francisco Ortega
Link: http://arxiv.org/abs/2404.03814v1
Abstract: Immersive environments enable users to engage in embodied interaction, enhancing the sensemaking processes involved in completing tasks such as immersive analytics. Previous comparative studies on immersive analytics using augmented and virtual realities have revealed that users employ different strategies for data interpretation and text-based analytics depending on the environment. Our study seeks to investigate how augmented and virtual reality influences sensemaking processes in quantitative immersive analytics. Our results, derived from a diverse group of participants, indicate that users demonstrate comparable performance in both environments. However, it was observed that users exhibit a higher tolerance for cognitive load in VR and travel further in AR. Based on our findings, we recommend providing users with the option to switch between AR and VR, thereby enabling them to select an environment that aligns with their preferences and task requirements.
Learning Social Fairness Preferences from Non-Expert Stakeholder Opinions in Kidney Placement
Authors: Mukund Telukunta, Sukruth Rao, Gabriella Stickney, Venkata Sriram Siddardh Nadendla, Casey Canfield
Link: http://arxiv.org/abs/2404.03800v1
Abstract: Modern kidney placement incorporates several intelligent recommendation systems which exhibit social discrimination due to biases inherited from training data. Although initial attempts were made in the literature to study algorithmic fairness in kidney placement, these methods replace true outcomes with surgeons' decisions due to the long delays involved in recording such outcomes reliably. However, the replacement of true outcomes with surgeons' decisions disregards expert stakeholders' biases as well as social opinions of other stakeholders who do not possess medical expertise. This paper alleviates the latter concern and designs a novel fairness feedback survey to evaluate an acceptance rate predictor (ARP) that predicts a kidney's acceptance rate in a given kidney-match pair. The survey is launched on Prolific, a crowdsourcing platform, and public opinions are collected from 85 anonymous crowd participants. A novel social fairness preference learning algorithm is proposed based on minimizing social feedback regret computed using a novel logit-based fairness feedback model. The proposed model and learning algorithm are both validated using simulation experiments as well as Prolific data. Public preferences towards group fairness notions in the context of kidney placement have been estimated and discussed in detail. The specific ARP tested in the Prolific survey has been deemed fair by the participants.
Revisiting Categorical Color Perception in Scatterplots: Sequential, Diverging, and Categorical Palettes
Authors: Chin Tseng, Arran Zeyu Wang, Ghulam Jilani Quadri, Danielle Albers Szafir
Link: http://arxiv.org/abs/2404.03787v1
Abstract: Existing guidelines for categorical color selection are heuristic, often grounded in intuition rather than empirical studies of readers' abilities. While design conventions recommend palettes maximize hue differences, more recent exploratory findings indicate other factors, such as lightness, may play a role in effective categorical palette design. We conducted a crowdsourced experiment on mean value judgments in multi-class scatterplots using five color palette families--single-hue sequential, multi-hue sequential, perceptually-uniform multi-hue sequential, diverging, and multi-hue categorical--that differ in how they manipulate hue and lightness. Participants estimated relative mean positions in scatterplots containing 2 to 10 categories using 20 colormaps. Our results confirm heuristic guidance that hue-based categorical palettes are most effective. However, they also provide additional evidence that scalable categorical encoding relies on more than hue variance.
Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations
Authors: Mahjabin Nahar, Haeseung Seo, Eun-Ju Lee, Aiping Xiong, Dongwon Lee
Link: http://arxiv.org/abs/2404.03745v1
Abstract: The widespread adoption and transformative effects of large language models (LLMs) have sparked concerns regarding their capacity to produce inaccurate and fictitious content, referred to as `hallucinations'. Given the potential risks associated with hallucinations, humans should be able to identify them. This research aims to understand the human perception of LLM hallucinations by systematically varying the degree of hallucination (genuine, minor hallucination, major hallucination) and examining its interaction with warning (i.e., a warning of potential inaccuracies: absent vs. present). Participants (N=419) from Prolific rated the perceived accuracy and engaged with content (e.g., like, dislike, share) in a Q/A format. Results indicate that humans rank content as truthful in the order genuine > minor hallucination > major hallucination and user engagement behaviors mirror this pattern. More importantly, we observed that warning improves hallucination detection without significantly affecting the perceived truthfulness of genuine content. We conclude by offering insights for future tools to aid human detection of hallucinations.
Explaining Explainability: Understanding Concept Activation Vectors
Authors: Angus Nicolson, Lisa Schut, J. Alison Noble, Yarin Gal
Link: http://arxiv.org/abs/2404.03713v1
Abstract: Recent interpretability methods propose using concept-based explanations to translate the internal representations of deep learning models into a language that humans are familiar with: concepts. This requires understanding which concepts are present in the representation space of a neural network. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs. CAVs may be: (1) inconsistent between layers, (2) entangled with different concepts, and (3) spatially dependent. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact. Understanding these properties can be used to our advantage. For example, we introduce spatially dependent CAVs to test if a model is translation invariant with respect to a specific concept and class. Our experiments are performed on ImageNet and a new synthetic dataset, Elements. Elements is designed to capture a known ground truth relationship between concepts and classes. We release this dataset to facilitate further research in understanding and evaluating interpretability methods.
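For context, CAVs are conventionally learned as the weight vector of a linear probe that separates concept activations from random activations (Kim et al.'s original recipe, sketched here on synthetic data):

```python
# Standard CAV construction: a linear probe over layer activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts_concept = rng.normal(0.5, 1.0, size=(100, 512))  # activations for concept exemplars
acts_random = rng.normal(0.0, 1.0, size=(100, 512))   # activations for random images

X = np.vstack([acts_concept, acts_random])
y = np.array([1] * 100 + [0] * 100)
probe = LogisticRegression(max_iter=1000).fit(X, y)

cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # the concept direction
# Re-learning the probe at another layer can yield a very different direction,
# which is the layer inconsistency the paper investigates.
```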
Creator Hearts: Investigating the Impact of Positive Signals from YouTube Creators in Shaping Comment Section Behavior
Authors: Frederick Choi, Charlotte Lambert, Vinay Koshy, Sowmya Pratipati, Tue Do, Eshwar Chandrasekharan
Link: http://arxiv.org/abs/2404.03612v1
Abstract: Much of the research in online moderation focuses on punitive actions. However, emerging research has shown that positive reinforcement is effective at encouraging desirable behavior on online platforms. We extend this research by studying the "creator heart" feature on YouTube, quantifying their primary effects on comments that receive hearts and on videos where hearts have been given. We find that creator hearts increased the visibility of comments, and increased the amount of positive engagement they received from other users. We also find that the presence of a creator hearted comment soon after a video is published can incentivize viewers to comment, increasing the total engagement with the video over time. We discuss the potential for creators to use hearts to shape behavior in their communities by highlighting, rewarding, and incentivizing desirable behaviors from users. We discuss avenues for extending our study to understanding positive signals from moderators on other platforms.
Integrating Large Language Models with Multimodal Virtual Reality Interfaces to Support Collaborative Human-Robot Construction Work
Authors: Somin Park, Carol C. Menassa, Vineet R. Kamat
Link: http://arxiv.org/abs/2404.03498v1
Abstract: In the construction industry, where work environments are complex, unstructured and often dangerous, the implementation of Human-Robot Collaboration (HRC) is emerging as a promising advancement. This underlines the critical need for intuitive communication interfaces that enable construction workers to collaborate seamlessly with robotic assistants. This study introduces a conversational Virtual Reality (VR) interface integrating multimodal interaction to enhance intuitive communication between construction workers and robots. By integrating voice and controller inputs with the Robot Operating System (ROS), Building Information Modeling (BIM), and a game engine featuring a chat interface powered by a Large Language Model (LLM), the proposed system enables intuitive and precise interaction within a VR setting. Evaluated by twelve construction workers through a drywall installation case study, the proposed system demonstrated its low workload and high usability with succinct command inputs. The proposed multimodal interaction system suggests that such technological integration can substantially advance the integration of robotic assistants in the construction industry.
Agora Elevator Bodily Sensation Study -- a report
Authors: Rebekah Rousi
Link: http://arxiv.org/abs/2404.03356v1
Abstract: This study set out to examine the relationship between expressed social emotions (i.e., what people say they are feeling) and physical sensations, the connection between emotion and bodily experience. It additionally provided the opportunity to investigate how neurological findings on gender differences can be observed in practice: what difference do varying levels of mirror neuron activity make in behaviour and judgment? The following report documents the study, procedure, results and findings.
Influence of Gameplay Duration, Hand Tracking, and Controller Based Control Methods on UX in VR
Authors: Tanja Kojić, Maurizio Vergari, Simon Knuth, Maximilian Warsinke, Sebastian Möller, Jan-Niklas Voigt-Antons
Link: http://arxiv.org/abs/2404.03337v1
Abstract: Inside-out tracking is growing popular in consumer VR, enhancing accessibility. It uses HMD camera data and neural networks for effective hand tracking. However, few user experience studies have compared this method to traditional controllers, and there is no consensus on the optimal control technique. This paper investigates the impact of control method and gaming duration on VR user experience, hypothesizing that hand tracking might be preferred for short sessions and by users new to VR due to its simplicity. Through a lab study with twenty participants evaluating presence, emotional response, UX quality, and flow, findings revealed that control type and session length affect user experience without a significant interaction effect. Controllers were generally superior, attributed to their reliability, and longer sessions increased presence and realism. The study found that individuals with more VR experience were more inclined to recommend hand tracking to others, which contradicted predictions.
Exploring Emotions in Multi-componential Space using Interactive VR Games
Authors: Rukshani Somarathna, Gelareh Mohammadi
Link: http://arxiv.org/abs/2404.03239v1
Abstract: Emotion understanding is a complex process that involves multiple components. The ability to recognise emotions not only leads to new context-awareness methods but also enhances the effectiveness of system interaction by perceiving and expressing emotions. Despite the attention to discrete and dimensional models, neuroscientific evidence supports the view that emotions are complex and multi-faceted. One framework that resonates well with such findings is the Component Process Model (CPM), a theory that considers the complexity of emotions with five interconnected components: appraisal, expression, motivation, physiology and feeling. However, the relationship between CPM and discrete emotions has not yet been fully explored. Therefore, to better understand the processes underlying emotions, we operationalised a data-driven approach using interactive Virtual Reality (VR) games and collected multimodal measures (self-reports, physiological and facial signals) from 39 participants. We used Machine Learning (ML) methods to identify the unique contributions of each component to emotion differentiation. Our results showed the role of different components in emotion differentiation, with the model including all components demonstrating the most significant contribution. Moreover, we found that at least five dimensions are needed to represent the variation of emotions in our dataset. These findings also have implications for using VR environments in emotion research and highlight the role of physiological signals in emotion recognition within such environments.
NLP4Gov: A Comprehensive Library for Computational Policy Analysis
Authors: Mahasweta Chakraborti, Sailendra Akash Bonagiri, Santiago Virgüez-Ruiz, Seth Frey
Link: http://arxiv.org/abs/2404.03206v1
Abstract: Formal rules and policies are fundamental in formally specifying a social system: its operation, boundaries, processes, and even ontology. Recent scholarship has highlighted the role of formal policy in collective knowledge creation, game communities, the production of digital public goods, and national social media governance. Researchers have shown interest in how online communities convene tenable self-governance mechanisms to regulate member activities and distribute rights and privileges by designating responsibilities, roles, and hierarchies. We present NLP4Gov, an interactive kit to train and aid scholars and practitioners alike in computational policy analysis. The library explores and integrates methods and capabilities from computational linguistics and NLP to generate semantic and symbolic representations of community policies from text records. Versatile, documented, and accessible, NLP4Gov provides granular and comparative views into institutional structures and interactions, along with other information extraction capabilities for downstream analysis.
Towards Collaborative Family-Centered Design for Online Safety, Privacy and Security
Authors: Mamtaj Akter, Zainab Agha, Ashwaq Alsoubai, Naima Ali, Pamela Wisniewski
Link: http://arxiv.org/abs/2404.03165v1
Abstract: Traditional online safety technologies often overly restrict teens and invade their privacy, while parents often lack knowledge regarding their digital privacy. As such, prior researchers have called for more collaborative approaches on adolescent online safety and networked privacy. In this paper, we propose family-centered approaches to foster parent-teen collaboration in ensuring their mobile privacy and online safety while respecting individual privacy, to enhance open discussion and teens' self-regulation. However, challenges such as power imbalances and conflicts with family values arise when implementing such approaches, making parent-teen collaboration difficult. Therefore, attending the family-centered design workshop will provide an invaluable opportunity for us to discuss these challenges and identify best research practices for the future of collaborative online safety and privacy within families.
Biodegradable Interactive Materials
Authors: Zhihan Zhang, Mallory Parker, Kuotian Liao, Jerry Cao, Anandghan Waghmare, Joseph Breda, Chris Matsumura, Serena Eley, Eleftheria Roumeli, Shwetak Patel, Vikram Iyer
Link: http://arxiv.org/abs/2404.03130v1
Abstract: The sense of touch is fundamental to how we interact with the physical and digital world. Conventional interactive surfaces and tactile interfaces use electronic sensors embedded into objects; however, this approach poses serious challenges both for environmental sustainability and a future of truly ubiquitous interaction systems where information is encoded into everyday objects. In this work, we present Biodegradable Interactive Materials: backyard-compostable interactive interfaces that leverage information encoded in material properties. Inspired by natural systems, we propose an architecture that programmatically encodes multidimensional information into materials themselves and combines them with wearable devices that extend human senses to perceive the embedded data. We combine unrefined biological matter from plants and algae like chlorella with natural minerals like graphite and magnetite to produce materials with varying electrical, magnetic, and surface properties. We perform in-depth analysis using physics models, computational simulations, and real-world experiments to characterize their information density and develop decoding methods. Our passive, chip-less materials can robustly encode 12 bits of information, equivalent to 4096 unique classes. We further develop wearable device prototypes that can decode this information during touch interactions using off-the-shelf sensors. We demonstrate sample applications such as customized buttons, tactile maps, and interactive surfaces. We further demonstrate that these interactive materials naturally degrade outdoors within 21 days and perform a comparative environmental analysis of the benefits of this approach.
2024-04-03
Writing with AI Lowers Psychological Ownership, but Longer Prompts Can Help
Authors: Nikhita Joshi, Daniel Vogel
Link: http://arxiv.org/abs/2404.03108v1
Abstract: Feelings of something belonging to someone is called "psychological ownership." A common assumption is that writing with generative AI lowers psychological ownership, but the extent to which this occurs and the role of prompt length are unclear. We report on two experiments to better understand the relationship between psychological ownership and prompt length. Participants wrote short stories either completely by themselves or wrote prompts of varying lengths, enforced through word limits. Results show that when participants wrote longer prompts, they had higher levels of psychological ownership. Their comments suggest they felt encouraged to think more about their prompts and include more details about the story plot. However, these benefits plateaued when the prompt length was 75-100% of the target story length. Based on these results, we propose prompt entry interface designs that nudge users with soft and hard constraints to write longer prompts for increased psychological ownership.
Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference
Authors: Fred Hohman, Chaoqun Wang, Jinmook Lee, Jochen Görtler, Dominik Moritz, Jeffrey P Bigham, Zhile Ren, Cecile Foret, Qi Shan, Xiaoyi Zhang
Link: http://arxiv.org/abs/2404.03085v1
Abstract: On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML models, we designed and developed Talaria: a model visualization and optimization system. Talaria enables practitioners to compile models to hardware, interactively visualize model statistics, and simulate optimizations to test the impact on inference metrics. Since its internal deployment two years ago, we have evaluated Talaria using three methodologies: (1) a log analysis highlighting its growth of 800+ practitioners submitting 3,600+ models; (2) a usability survey with 26 users assessing the utility of 20 Talaria features; and (3) a qualitative interview with the 7 most active users about their experience using Talaria.
Toward Safe Evolution of Artificial Intelligence (AI) based Conversational Agents to Support Adolescent Mental and Sexual Health Knowledge Discovery
Authors: Jinkyung Park, Vivek Singh, Pamela Wisniewski
Link: http://arxiv.org/abs/2404.03023v1
Abstract: Following the recent release of various Artificial Intelligence (AI) based Conversation Agents (CAs), adolescents are increasingly using CAs for interactive knowledge discovery on sensitive topics, including mental and sexual health topics. Exploring such sensitive topics through online search has been an essential part of adolescent development, and CAs can support their knowledge discovery on such topics through human-like dialogues. Yet, unintended risks have been documented with adolescents' interactions with AI-based CAs, such as being exposed to inappropriate content, false information, and/or being given advice that is detrimental to their mental and physical well-being (e.g., to self-harm). In this position paper, we discuss the current landscape and opportunities for CAs to support adolescents' mental and sexual health knowledge discovery. We also discuss some of the challenges related to ensuring the safety of adolescents when interacting with CAs regarding sexual and mental health topics. We call for a discourse on how to set guardrails for the safe evolution of AI-based CAs for adolescents.
Generative AI in the Wild: Prospects, Challenges, and Strategies
Authors: Yuan Sun, Eunchae Jang, Fenglong Ma, Ting Wang
Link: http://arxiv.org/abs/2404.04101v1
Abstract: Propelled by their remarkable capabilities to generate novel and engaging content, Generative Artificial Intelligence (GenAI) technologies are disrupting traditional workflows in many industries. While prior research has examined GenAI from a techno-centric perspective, there is still a lack of understanding about how users perceive and utilize GenAI in real-world scenarios. To bridge this gap, we conducted semi-structured interviews with GenAI users (N=18) in creative industries, investigating the human-GenAI co-creation process within a holistic LUA (Learning, Using and Assessing) framework. Our study uncovered an intriguingly complex landscape. Prospects: GenAI greatly fosters the co-creation between human expertise and GenAI capabilities, profoundly transforming creative workflows. Challenges: Meanwhile, users face substantial uncertainties and complexities arising from resource availability, tool usability, and regulatory compliance. Strategies: In response, users actively devise various strategies to overcome many of these challenges. Our study reveals key implications for the design of future GenAI tools.
ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale
Authors: Jinbin Huang, Chen Chen, Aditi Mishra, Bum Chul Kwon, Zhicheng Liu, Chris Bryan
Link: http://arxiv.org/abs/2404.02990v1
Abstract: Generative image models have emerged as a promising technology to produce realistic images. Despite potential benefits, concerns grow about their misuse, particularly in generating deceptive images that could raise significant ethical, legal, and societal issues. Consequently, there is growing demand to empower users to effectively discern and comprehend patterns of AI-generated images. To this end, we developed ASAP, an interactive visualization system that automatically extracts distinct patterns of AI-generated images and allows users to interactively explore them via various views. To uncover fake patterns, ASAP introduces a novel image encoder, adapted from CLIP, which transforms images into compact "distilled" representations, enriched with information for differentiating authentic and fake images. These representations generate gradients that propagate back to the attention maps of CLIP's transformer block. This process quantifies the relative importance of each pixel to image authenticity or fakeness, exposing key deceptive patterns. ASAP enables interactive analysis of these patterns at scale through multiple, coordinated visualizations. This includes a representation overview with innovative cell glyphs to aid in the exploration and qualitative evaluation of fake patterns across a vast array of images, as well as a pattern view that displays authenticity-indicating patterns in images and quantifies their impact. ASAP supports the analysis of cutting-edge generative models with the latest architectures, including GAN-based models like proGAN and diffusion models like the latent diffusion model. We demonstrate ASAP's usefulness through two usage scenarios using multiple fake image detection benchmark datasets, revealing its ability to identify and understand hidden patterns in AI-generated images, especially in detecting fake human faces produced by diffusion-based techniques.
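The pixel-attribution idea can be pictured with a generic gradient-saliency sketch; here `encoder` and `fake_head` are hypothetical stand-ins for the paper's adapted CLIP encoder and authenticity score, and ASAP itself routes gradients through the attention maps of CLIP's transformer block rather than raw input pixels.

```python
# Generic gradient-saliency sketch of pixel-level "fakeness" attribution.
# `encoder` and `fake_head` are hypothetical stand-ins for the paper's
# adapted CLIP encoder and its authenticity/fakeness score.
import torch

def pixel_importance(image, encoder, fake_head):
    image = image.clone().requires_grad_(True)
    score = fake_head(encoder(image)).sum()  # scalar fakeness logit
    score.backward()
    # aggregate gradient magnitude over channels -> (batch, H, W) relevance map
    return image.grad.abs().sum(dim=1)
```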
Fragmented Moments, Balanced Choices: How Do People Make Use of Their Waiting Time?
Authors: Jian Zheng, Ge Gao
Link: http://arxiv.org/abs/2404.02880v1
Abstract: Everyone spends some time waiting every day. HCI research has developed tools for boosting productivity while waiting. However, little is known about how people naturally spend their waiting time. We conducted an experience sampling study with 21 working adults who used a mobile app to report their daily waiting time activities over two weeks. The aim of this study is to understand the activities people do while waiting and the effect of situational factors. We found that participants spent about 60% of their waiting time on leisure activities, 20% on productive activities, and 20% on maintenance activities. These choices are sensitive to situational factors, including accessible device, location, and certain routines of the day. Our study complements previous ones by demonstrating that people use waiting time for various goals beyond productivity, including maintaining work-life balance. Our findings shed light on future empirical research and system design for time management.
The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers
Authors: Hussein Mozannar, Valerie Chen, Mohammed Alsobay, Subhro Das, Sebastian Zhao, Dennis Wei, Manish Nagireddy, Prasanna Sattigeri, Ameet Talwalkar, David Sontag
Link: http://arxiv.org/abs/2404.02806v1
Abstract: Evaluation of large language models (LLMs) for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), which measure the ability of LLMs to generate complete code that passes unit tests. As LLMs are increasingly used as programmer assistants, we study whether gains on existing benchmarks translate to gains in programmer productivity when coding with LLMs, including time spent coding. In addition to static benchmarks, we investigate the utility of preference metrics that might be used as proxies to measure LLM helpfulness, such as code acceptance or copy rates. To do so, we introduce RealHumanEval, a web interface to measure the ability of LLMs to assist programmers, through either autocomplete or chat support. We conducted a user study (N=213) using RealHumanEval in which users interacted with six LLMs of varying base model performance. Despite static benchmarks not incorporating humans in the loop, we find that improvements in benchmark performance lead to increased programmer productivity; however, gaps in benchmark versus human performance are not proportional -- a trend that holds across both forms of LLM support. In contrast, we find that programmer preferences do not correlate with their actual performance, motivating the need for better, human-centric proxy signals. We also open-source RealHumanEval to enable human-centric evaluation of new models and the study data to facilitate efforts to improve code models.
AI and personalized learning: bridging the gap with modern educational goals
Authors: Kristjan-Julius Laak, Jaan Aru
Link: http://arxiv.org/abs/2404.02798v1
Abstract: Personalized learning (PL) aspires to provide an alternative to the one-size-fits-all approach in education. Technology-based PL solutions have shown notable effectiveness in enhancing learning performance. However, their alignment with the broader goals of modern education is inconsistent across technologies and research areas. In this paper, we examine the characteristics of AI-driven PL solutions in light of the OECD Learning Compass 2030 goals. Our analysis indicates a gap between the objectives of modern education and the current direction of PL. We identify areas where most present-day PL technologies could better embrace essential elements of contemporary education, such as collaboration, cognitive engagement, and the development of general competencies. While the present PL solutions are instrumental in aiding learning processes, the PL envisioned by educational experts extends beyond simple technological tools and requires a holistic change in the educational system. Finally, we explore the potential of large language models, such as ChatGPT, and propose a hybrid model that blends artificial intelligence with a collaborative, teacher-facilitated approach to personalized learning.
IEEE VIS Workshop on Visualization for Climate Action and Sustainability
Authors: Benjamin Bach, Fanny Chevalier, Helen-Nicole Kostis, Mark SubbaRao, Yvonne Jansen, Robert Soden
Link: http://arxiv.org/abs/2404.02743v1
Abstract: This first workshop on visualization for climate action and sustainability aims to explore and consolidate the role of data visualization in accelerating action towards addressing the current environmental crisis. Given the urgency and impact of the environmental crisis, we ask how our skills, research methods, and innovations can help by empowering people and organizations. We believe visualization holds an enormous power to aid understanding, decision making, communication, discussion, participation, education, and exploration of complex topics around climate action and sustainability. Hence, this workshop invites submissions and discussion around these topics with the goal of establishing a visible and actionable link between these fields and their respective stakeholders. The workshop solicits work-in-progress and research papers as well as pictorials and interactive demos from the whole range of visualization research (dashboards, interactive spaces, scientific visualization, storytelling, visual analytics, explainability etc.), within the context of environmentalism (climate science, sustainability, energy, circular economy, biodiversity, etc.) and across a range of scenarios from public awareness and understanding, visual analysis, expert decision making, science communication, personal decision making etc. After presentations of submissions, the workshop will feature dedicated discussion groups around data driven interactive experiences for the public, and tools for personal and professional decision making.
Evolving Agents: Interactive Simulation of Dynamic and Diverse Human Personalities
Authors: Jiale Li, Jiayang Li, Jiahao Chen, Yifan Li, Shijie Wang, Hugo Zhou, Minjun Ye, Yunsheng Su
Link: http://arxiv.org/abs/2404.02718v1
Abstract: Human-like agents with diverse and dynamic personalities could serve as important design probes in the process of user-centered design, thereby enabling designers to enhance the user experience of interactive applications. In this article, we introduce Evolving Agents, a novel agent architecture that consists of two systems: Personality and Behavior. The Personality system includes three modules: Cognition, Emotion and Character Growth. The Behavior system comprises two modules: Planning and Action. We also build a simulation platform that enables agents to interact with the environment and other agents. Evolving Agents can simulate the human personality evolution process. Compared to its initial state, agents' personality and behavior patterns undergo believable development after several days of simulation. Agents reflect on their behavior to reason and develop new personality traits. These traits, in turn, generate new behavior patterns, forming a feedback-loop-like personality evolution. In our experiment, we utilized the simulation platform with 10 agents for evaluation. During the evaluation, these agents experienced believable and inspirational personality evolution. Through ablation and control experiments, we demonstrated the effectiveness of agent personality evolution and showed that all modules of our agent architecture contribute to creating believable human-like agents with diverse and dynamic personalities. We also demonstrated through workshops how Evolving Agents could inspire designers.
Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM
Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Yuekai Huang, Jun Hu, Qing Wang
Link: http://arxiv.org/abs/2404.02706v1
Abstract: Mobile apps have become indispensable for accessing and participating in various environments, especially for low-vision users. Users with visual impairments can use screen readers to read the content of each screen and understand the content that needs to be operated. Screen readers need to read the hint-text attribute in the text input component to remind visually impaired users what to fill in. Unfortunately, based on our analysis of 4,501 Android apps with text inputs, over 76% of them are missing hint-text. These issues are mostly caused by developers' lack of awareness of the needs of visually impaired individuals. To overcome these challenges, we developed an LLM-based hint-text generation model called HintDroid, which analyzes the GUI information of input components and uses in-context learning to generate the hint-text. To ensure the quality of hint-text generation, we further designed a feedback-based inspection mechanism to adjust the hint-text. Automated experiments demonstrate high BLEU scores, and a user study further confirms its usefulness. HintDroid can not only help visually impaired individuals, but also help ordinary people understand the requirements of input components. HintDroid demo video: https://youtu.be/FWgfcctRbfI.
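In-context hint-text generation can be pictured roughly as below; the prompt wording, the few-shot examples, and the `query_llm` helper are hypothetical, and HintDroid additionally runs its feedback-based inspection loop over the output.

```python
# Rough flavor of few-shot hint-text generation from GUI context.
# Prompt wording, examples, and `query_llm` are hypothetical stand-ins.
EXAMPLES = [
    ("EditText near label 'Email', login screen", "Enter your email address"),
    ("EditText with search icon, toolbar", "Search products"),
]

def build_prompt(component_context: str) -> str:
    shots = "\n".join(f"Component: {c}\nHint-text: {h}" for c, h in EXAMPLES)
    return f"{shots}\nComponent: {component_context}\nHint-text:"

prompt = build_prompt("EditText near label 'Password', registration screen")
# hint = query_llm(prompt)  # hypothetical LLM call; inspect and re-prompt if poor
print(prompt)
```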
Spatial Summation of Localized Pressure for Haptic Sensory Prostheses
Authors: Sreela Kodali, Cihualpilli Camino Cruz, Thomas C. Bulea, Kevin S. Rao, Diana Bharucha-Goebel, Alexander T. Chesler, Carsten G. Bonnemann, Allison M. Okamura
Link: http://arxiv.org/abs/2404.02565v1
Abstract: A host of medical conditions, including amputations, diabetes, stroke, and genetic disease, result in loss of touch sensation. Because most types of sensory loss have no pharmacological treatment or rehabilitative therapy, we propose a haptic sensory prosthesis that provides substitutive feedback. The wrist and forearm are compelling locations for feedback due to available skin area and not occluding the hands, but have reduced mechanoreceptor density compared to the fingertips. Focusing on localized pressure as the feedback modality, we hypothesize that we can improve on prior devices by invoking a wider range of stimulus intensity using multiple points of pressure to evoke spatial summation, which is the cumulative perceptual experience from multiple points of stimuli. We conducted a preliminary perceptual test to investigate this idea and found that just noticeable difference is reduced with two points of pressure compared to one, motivating future work using spatial summation in sensory prostheses.
Cultural influence on autonomous vehicles acceptance
Authors: Chowdhury Shahriar Muzammel, Maria Spichkova, James Harland
Link: http://arxiv.org/abs/2404.03694v1
Abstract: Autonomous vehicles and other intelligent transport systems have been evolving rapidly and are being increasingly deployed worldwide. Previous work has shown that perceptions of autonomous vehicles and attitudes towards them depend on various attributes, including the respondent's age, education level and background. These findings with respect to age and educational level are generally uniform, such as showing that younger respondents are typically more accepting of autonomous vehicles, as are those with higher education levels. However, the influence of factors such as culture is much less clear-cut. In this paper we analyse the relationship between acceptance of autonomous vehicles and national culture by means of the well-known Hofstede cultural model.
PromptRPA: Generating Robotic Process Automation on Smartphones from Textual Prompts
Authors: Tian Huang, Chun Yu, Weinan Shi, Zijian Peng, David Yang, Weiqi Sun, Yuanchun Shi
Link: http://arxiv.org/abs/2404.02475v1
Abstract: Robotic Process Automation (RPA) offers a valuable solution for efficiently automating tasks on the graphical user interface (GUI), by emulating human interactions, without modifying existing code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present PromptRPA, a system designed to comprehend various task-related textual prompts (e.g., goals, procedures), thereby generating and performing corresponding RPA tasks. PromptRPA incorporates a suite of intelligent agents that mimic human cognitive functions, specializing in interpreting user intent, managing external information for RPA generation, and executing operations on smartphones. The agents can learn from user feedback and continuously improve their performance based on the accumulated knowledge. Experimental results indicated a performance jump from a 22.28% success rate in the baseline to 95.21% with PromptRPA, requiring an average of 1.66 user interventions for each new task. PromptRPA presents promising applications in fields such as tutorial creation, smart assistance, and customer service.
A neuroergonomics model for evaluating nuclear power plant operators' performance under heat stress driven by ECG time-frequency spectrums and fNIRS prefrontal cortex network: a CNN-GAT fusion model
Authors: Yan Zhang, Ming Jia, Meng Li, JianYu Wang, XiangMin Hu, ZhiHui Xu, Tao Chen
Link: http://arxiv.org/abs/2404.02439v1
Abstract: Operators experience complicated physiological and psychological states when exposed to extreme heat stress, which can impair cognitive function and decrease performance significantly, ultimately leading to severe secondary disasters. Therefore, there is an urgent need for a feasible technique to identify their abnormal states to enhance the reliability of human-cybernetics systems. With the advancement of deep learning in physiological modeling, a model for evaluating operators' performance driven by electrocardiogram (ECG) and functional near-infrared spectroscopy (fNIRS) was proposed, demonstrating high ecological validity. The model fused a convolutional neural network (CNN) backbone and a graph attention network (GAT) backbone to extract discriminative features from ECG time-frequency spectrums and the fNIRS prefrontal cortex (PFC) network respectively, with deeper neuroscience domain knowledge, and eventually achieved 0.90 AUC. Results showed that handcrafted features extracted with specialized neuroscience methods can alleviate overfitting. Inspired by the small-world nature of the brain network, the fNIRS PFC network was organized as an undirected graph and embedded by GAT, which proved to perform better in information aggregation and delivery than a simple non-linear transformation. The model provides a potential neuroergonomics application for evaluating the human state in vital human-cybernetics systems under industry 5.0 scenarios.
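At the shape level, the described fusion might look like the following sketch; layer sizes are illustrative, and the paper's actual architecture, preprocessing, and training details differ.

```python
# Shape-level sketch of a CNN + GAT fusion (illustrative sizes only).
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv  # pip install torch-geometric

class FusionSketch(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        # CNN branch over ECG time-frequency spectrograms (1 x F x T images)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(8 * 16, hidden),
        )
        # GAT branch over the fNIRS PFC connectivity graph (one node per channel)
        self.gat = GATConv(in_channels=8, out_channels=hidden, heads=1)
        self.head = nn.Linear(2 * hidden, 1)  # fused binary performance score

    def forward(self, spectrogram, node_feats, edge_index):
        # single-trial forward pass (batch of 1) for simplicity
        ecg = self.cnn(spectrogram)                                  # (1, hidden)
        fnirs = self.gat(node_feats, edge_index).mean(0, keepdim=True)  # (1, hidden)
        return torch.sigmoid(self.head(torch.cat([ecg, fnirs], dim=-1)))
```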
A Unified Editing Method for Co-Speech Gesture Generation via Diffusion Inversion
Authors: Zeyu Zhao, Nan Gao, Zhi Zeng, Guixuan Zhang, Jie Liu, Shuwu Zhang
Link: http://arxiv.org/abs/2404.02411v1
Abstract: Diffusion models have shown great success in generating high-quality co-speech gestures for interactive humanoid robots or digital avatars from noisy input with the speech audio or text as conditions. However, they rarely provide rich editing capabilities for content creators beyond high-level specialized measures like style conditioning. To resolve this, we propose a unified framework utilizing diffusion inversion that enables multi-level editing capabilities for co-speech gesture generation without re-training. The method takes advantage of two key capabilities of invertible diffusion models. The first is that through inversion, we can reconstruct the intermediate noise from gestures and regenerate new gestures from the noise. This can be used to obtain gestures with high-level similarities to the original gestures for different speech conditions. The second is that this reconstruction reduces activation caching requirements during gradient calculation, making the direct optimization on input noises possible on current hardware with limited memory. With different loss functions designed for, e.g., joint rotation or velocity, we can control various low-level details by automatically tweaking the input noises through optimization. Extensive experiments on multiple use cases show that this framework succeeds in unifying high-level and low-level co-speech gesture editing.
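The noise-optimization loop can be outlined as below; `invert` and `generate` stand in for the paper's inversion and sampling procedures, and `loss_fn` is whatever low-level property (e.g., joint velocity) one wants to control.

```python
# Outline of editing via optimization of inverted diffusion noise.
# `invert` and `generate` are hypothetical stand-ins for the paper's
# DDIM-style inversion and sampling procedures.
import torch

def edit_gesture(generate, invert, gesture, speech, loss_fn, steps=100, lr=1e-2):
    noise = invert(gesture, speech).detach().requires_grad_(True)
    opt = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        edited = generate(noise, speech)  # regenerate from the optimized noise
        loss = loss_fn(edited)            # e.g., penalize joint velocity
        opt.zero_grad()
        loss.backward()  # feasible because inversion reduces activation caching
        opt.step()
    return generate(noise.detach(), speech)
```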
2024-04-02
From Delays to Densities: Exploring Data Uncertainty through Speech, Text, and Visualization
Authors: Chase Stokes, Chelsea Sanker, Bridget Cogley, Vidya Setlur
Link: http://arxiv.org/abs/2404.02317v1
Abstract: Understanding and communicating data uncertainty is crucial for making informed decisions in sectors like finance and healthcare. Previous work has explored how to express uncertainty in various modes. For example, uncertainty can be expressed visually with quantile dot plots or linguistically with hedge words and prosody. Our research aims to systematically explore how variations within each mode contribute to communicating uncertainty to the user; this allows us to better understand each mode's affordances and limitations. We completed an exploration of the uncertainty design space based on pilot studies and ran two crowdsourced experiments examining how speech, text, and visualization modes and variants within them impact decision-making with uncertain data. Visualization and text were most effective for rational decision-making, though text resulted in lower confidence. Speech garnered the highest trust despite sometimes leading to risky decisions. Results from these studies indicate meaningful trade-offs among modes of information and encourage exploration of multimodal data representations.
A Change of Scenery: Transformative Insights from Retrospective VR Embodied Perspective-Taking of Conflict With a Close Other
Authors: Seraphina Yong, Leo Cui, Evan Suma Rosenberg, Svetlana Yarosh
Link: http://arxiv.org/abs/2404.02277v1
Abstract: Close relationships are irreplaceable social resources, yet prone to high-risk conflict. Building on findings from the fields of HCI, virtual reality, and behavioral therapy, we evaluate the unexplored potential of retrospective VR-embodied perspective-taking to fundamentally influence conflict resolution in close others. We develop a biographically-accurate Retrospective Embodied Perspective-Taking system (REPT) and conduct a mixed-methods evaluation of its influence on close others' reflection and communication, compared to video-based reflection methods currently used in therapy (treatment as usual, or TAU). Our key findings provide evidence that REPT was able to significantly improve communication skills and positive sentiment of both partners during conflict, over TAU. The qualitative data also indicated that REPT surpassed basic perspective-taking by exclusively stimulating users to embody and reflect on both their own and their partner's experiences at the same level. In light of these findings, we provide implications and an agenda for social embodiment in HCI design: conceptualizing the use of `embodied social cognition,' and envisioning socially-embodied experiences as an interactive context.
Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices
Authors: Ruiwei Xiao, Xinying Hou, John Stamper
Link: http://arxiv.org/abs/2404.02213v1
Abstract: Recent studies have integrated large language models (LLMs) into diverse educational contexts, including providing adaptive programming hints, a type of feedback that focuses on helping students move forward during problem-solving. However, most existing LLM-based hint systems are limited to a single hint type. To investigate whether and how different levels of hints can support students' problem-solving and learning, we conducted a think-aloud study with 12 novices using the LLM Hint Factory, a system providing four levels of hints, from general natural language guidance to concrete code assistance, varying in format and granularity. We discovered that high-level natural language hints alone can be unhelpful or even misleading, especially when addressing next-step or syntax-related help requests. Adding lower-level hints, like code examples with in-line comments, can better support students. The findings open up future work on customizing help responses at the content, format, and granularity levels to accurately identify and meet students' learning needs.
Harder, Better, Faster, Stronger: Interactive Visualization for Human-Centered AI Tools
Authors: Md Naimul Hoque, Sungbok Shin, Niklas Elmqvist
Link: http://arxiv.org/abs/2404.02147v1
Abstract: Human-centered AI (HCAI), rather than replacing the human, puts the human user in the driver's seat of so-called human-centered AI-infused tools (HCAI tools): interactive software tools that amplify, augment, empower, and enhance human performance using AI models; often novel generative or foundation AI ones. In this paper, we discuss how interactive visualization can be a key enabling technology for creating such human-centered AI tools. Visualization has already been shown to be a fundamental component in explainable AI models, and coupling this with data-driven, semantic, and unified interaction feedback loops will enable a human-centered approach to integrating AI models in the loop with human users. We present several examples of our past and current work on such HCAI tools, including for creative writing, temporal prediction, and user experience analysis. We then draw parallels between these tools to suggest common themes on how interactive visualization can support the design of future HCAI tools.
The Effects of Group Sanctions on Participation and Toxicity: Quasi-experimental Evidence from the Fediverse
Authors: Carl Colglazier, Nathan TeBlunthuis, Aaron Shaw
Link: http://arxiv.org/abs/2404.02109v1
Abstract: Online communities often overlap and coexist, despite incongruent norms and approaches to content moderation. When communities diverge, decentralized and federated communities may pursue group-level sanctions, including defederation (disconnection) to block communication between members of specific communities. We investigate the effects of defederation in the context of the Fediverse, a set of decentralized, interconnected social networks with independent governance. Mastodon and Pleroma, the most popular software powering the Fediverse, allow administrators on one server to defederate from another. We use a difference-in-differences approach and matched controls to estimate the effects of defederation events on participation and message toxicity among affected members of the blocked and blocking servers. We find that defederation causes a drop in activity for accounts on the blocked servers, but not on the blocking servers. Also, we find no evidence of an effect of defederation on message toxicity.
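The paper's difference-in-differences design can be sketched with a standard two-way fixed-effects regression; the panel below is synthetic and the specification is an illustration of the general technique, not the authors' exact model:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic panel: weekly activity per account, before/after a defederation event.
rng = np.random.default_rng(1)
rows = []
for acct in range(200):
    blocked = acct < 100                       # accounts on the blocked server
    base = rng.poisson(20)
    for week in range(-4, 4):
        post = int(week >= 0)
        # Build in a -3 activity effect for blocked accounts after the event.
        y = base + rng.normal(0, 2) - (3 if (blocked and post) else 0)
        rows.append(dict(account=acct, week=week, post=post,
                         treated=int(blocked), activity=y))
df = pd.DataFrame(rows)

# Two-way fixed effects DiD: the treated:post coefficient is the effect estimate.
m = smf.ols("activity ~ treated:post + C(account) + C(week)", data=df).fit()
print(m.params["treated:post"])   # should recover roughly -3
```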
Explainability in JupyterLab and Beyond: Interactive XAI Systems for Integrated and Collaborative Workflows
Authors: Grace Guo, Dustin Arendt, Alex Endert
Link: http://arxiv.org/abs/2404.02081v1
Abstract: Explainable AI (XAI) tools represent a turn to more human-centered and human-in-the-loop AI approaches that emphasize user needs and perspectives in machine learning model development workflows. However, while the majority of ML resources available today are developed for Python computational environments such as JupyterLab and Jupyter Notebook, the same has not been true of interactive XAI systems, which are often still implemented as standalone interfaces. In this paper, we address this mismatch by identifying three design patterns for embedding front-end XAI interfaces into Jupyter, namely: 1) One-way communication from Python to JavaScript, 2) Two-way data synchronization, and 3) Bi-directional callbacks. We also provide an open-source toolkit, bonXAI, that demonstrates how each design pattern might be used to build interactive XAI tools for a Pytorch text classification workflow. Finally, we conclude with a discussion of best practices and open questions. Our aims for this paper are to discuss how interactive XAI tools might be developed for computational notebooks, and how they can better integrate into existing model development workflows to support more collaborative, human-centered AI.
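As a concrete instance of the bi-directional callback pattern (pattern 3), here is a minimal example using the standard ipywidgets machinery rather than the bonXAI toolkit itself; the widget names and the re-rendering message are placeholders:

```python
# Run inside JupyterLab; requires `pip install ipywidgets`.
import ipywidgets as widgets
from IPython.display import display

slider = widgets.IntSlider(description="top-k", min=1, max=20, value=5)
out = widgets.Output()

def on_change(change):
    # Python-side callback fired when the JavaScript front end updates the model.
    with out:
        out.clear_output()
        print(f"re-rendering explanation with top-{change['new']} features")

slider.observe(on_change, names="value")
display(widgets.VBox([slider, out]))
```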
Proof of Concept of a Voicebot Conversing in Wolof
Authors: Elodie Gauthier, Papa-Séga Wade, Thierry Moudenc, Patrice Collen, Emilie De Neef, Oumar Ba, Ndeye Khoyane Cama, Cheikh Ahmadou Bamba Kebe, Ndeye Aissatou Gningue, Thomas Mendo'o Aristide
Link: http://arxiv.org/abs/2404.02009v1
Abstract: This paper presents the proof of concept of the first automatic voice assistant ever built in the Wolof language, the main vehicular language spoken in Senegal. This voicebot is the result of a collaborative research project between Orange Innovation in France, Orange Senegal (aka Sonatel), and ADNCorp, a small IT company based in Dakar, Senegal. The purpose of the voicebot is to provide information to Orange customers about the Sargal loyalty program of Orange Senegal by using the most natural means of communication: speech. The voicebot receives as input the customer's spoken request, which is then processed by a spoken language understanding (SLU) system to reply to the customer's request using audio recordings. The first results of this proof of concept are encouraging: we achieved a 22% WER on the ASR task and a 78% F1-score on the NLU task.
Cash or Non-Cash? Unveiling Ideators' Incentive Preferences in Crowdsourcing Contests
Authors: Christoph Riedl, Johann Füller, Katja Hutter, Gerard J. Tellis
Link: http://arxiv.org/abs/2404.01997v1
Abstract: Even though research has repeatedly shown that non-cash incentives can be effective, cash incentives are the de facto standard in crowdsourcing contests. In this multi-study research, we quantify ideators' preferences for non-cash incentives and investigate how allowing ideators to self-select their preferred incentive -- offering ideators a choice between cash and non-cash incentives -- affects their creative performance. We further explore whether the market context of the organization hosting the contest -- social (non-profit) or monetary (for-profit) -- moderates incentive preferences and their effectiveness. We find that individuals exhibit heterogeneous incentive preferences and often prefer non-cash incentives, even in for-profit contexts. Offering ideators a choice of incentives can enhance creative performance. Market context moderates the effect of incentives, such that ideators who receive non-cash incentives in for-profit contexts tend to exert less effort. We show that heterogeneity of ideators' preferences (and the ability to satisfy diverse preferences with suitably diverse incentive options) is a critical boundary condition to realizing benefits from offering ideators a choice of incentives. We provide managers with guidance to design effective incentives by improving incentive-preference fit for ideators.
Fast and Adaptive Questionnaires for Voting Advice Applications
Authors: Fynn Bachmann, Cristina Sarasua, Abraham Bernstein
Link: http://arxiv.org/abs/2404.01872v1
Abstract: The effectiveness of Voting Advice Applications (VAAs) is often compromised by the length of their questionnaires. To address user fatigue and incomplete responses, some applications (such as the Swiss Smartvote) offer a condensed version of their questionnaire. However, these condensed versions cannot ensure the accuracy of recommended parties or candidates, which we show to remain below 40%. To tackle these limitations, this work introduces an adaptive questionnaire approach that selects subsequent questions based on users' previous answers, aiming to enhance recommendation accuracy while reducing the number of questions posed to the voters. Our method uses an encoder and decoder module to predict missing values at any completion stage, leveraging a two-dimensional latent space reflective of political science's traditional methods for visualizing political orientations. Additionally, a selector module is proposed to determine the most informative subsequent question based on the voter's current position in the latent space and the remaining unanswered questions. We validated our approach using the Smartvote dataset from the 2019 Swiss Federal elections, testing various spatial models and selection methods to optimize the system's predictive accuracy. Our findings indicate that employing the IDEAL model as both encoder and decoder, combined with the PosteriorRMSE method for question selection, significantly improves the accuracy of recommendations, achieving 74% accuracy after asking the same number of questions as the condensed version.
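A toy sketch of the adaptive loop: a stand-in logistic encoder/decoder over a 2D latent space and a crude uncertainty-based selector, in place of the IDEAL model and PosteriorRMSE selection named in the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)
n_questions, latent_dim = 30, 2
W = rng.normal(size=(n_questions, latent_dim))   # toy item loadings (IDEAL stand-in)
b = rng.normal(size=n_questions)
theta_true = rng.normal(size=latent_dim)         # the active voter's latent position

def decode(theta):
    """Predict agreement in (0, 1) for every question from a latent position."""
    return 1.0 / (1.0 + np.exp(-(W @ theta + b)))

def encode(asked, answers):
    """Least-squares latent estimate from the questions answered so far."""
    y = np.log(answers / (1 - answers)) - b[asked]
    theta, *_ = np.linalg.lstsq(W[asked], y, rcond=None)
    return theta

asked, answers = [], np.array([])
theta_hat = np.zeros(latent_dim)
for step in range(8):
    preds = decode(theta_hat)
    remaining = [q for q in range(n_questions) if q not in asked]
    # Crude informativeness proxy: ask the item whose answer we are least sure of.
    nxt = min(remaining, key=lambda q: abs(preds[q] - 0.5))
    ans = np.clip(decode(theta_true)[nxt] + rng.normal(0, 0.05), 0.01, 0.99)
    asked.append(nxt)
    answers = np.append(answers, ans)
    theta_hat = encode(asked, answers)

print("recovered position:", theta_hat, "true:", theta_true)
```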
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Authors: Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu
Link: http://arxiv.org/abs/2404.01862v1
Abstract: Co-speech gestures, if presented in the lively form of videos, can achieve superior visual effects in human-machine interaction. While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work. There are two main challenges: 1) A suitable motion feature is needed to describe complex human movements with crucial appearance information. 2) Gestures and speech exhibit inherent dependencies and should be temporally aligned even when of arbitrary length. To solve these problems, we present a novel motion-decoupled framework to generate co-speech gesture videos. Specifically, we first introduce a well-designed nonlinear TPS transformation to obtain latent motion features that preserve essential appearance information. Then a transformer-based diffusion model is proposed to learn the temporal correlation between gestures and speech and to perform generation in the latent motion space, followed by an optimal motion selection module to produce long-term coherent and consistent gesture videos. For better visual perception, we further design a refinement network focusing on missing details of certain areas. Extensive experimental results show that our proposed framework significantly outperforms existing approaches in both motion and video-related evaluations. Our code, demos, and more resources are available at https://github.com/thuhcsi/S2G-MDDiffusion.
"That's Not Good Science!": An Argument for the Thoughtful Use of Formative Situations in Research through Design
Authors: Raquel B Robinson, Anya Osborne, Chen Ji, James Collin Fey, Ella Dagan, Katherine Isbister
Link: http://arxiv.org/abs/2404.01848v1
Abstract: Most currently accepted approaches to evaluating Research through Design (RtD) presume that design prototypes are finalized and ready for robust testing in laboratory or in-the-wild settings. However, it is also valuable to assess designs at intermediate phases with mid-fidelity prototypes, not just to inform an ongoing design process, but also to glean knowledge of broader use to the research community. We propose 'formative situations' as a frame for examining mid-fidelity prototypes-in-process in this way. We articulate a set of criteria to help the community better assess the rigor of formative situations, in the service of opening conversation about establishing formative situations as a valuable contribution type within the RtD community.
Unmasking the Nuances of Loneliness: Using Digital Biomarkers to Understand Social and Emotional Loneliness in College Students
Authors: Malik Muhammad Qirtas, Evi Zafeirid, Dirk Pesch, Eleanor Bantry White
Link: http://arxiv.org/abs/2404.01845v1
Abstract: Background: Loneliness among students is increasing across the world, with potential consequences for mental health and academic success. To address this growing problem, accurate methods of detection are needed to identify loneliness and to differentiate social and emotional loneliness so that interventions can be personalized to individual needs. Passive sensing technology provides a unique technique to capture behavioral patterns linked with distinct loneliness forms, allowing for more nuanced understanding of and interventions for loneliness. Methods: To differentiate between social and emotional loneliness using digital biomarkers, our study included statistical tests, machine learning for predictive modeling, and SHAP values for feature importance analysis, revealing important factors in loneliness classification. Results: Our analysis revealed significant behavioral differences between socially and emotionally lonely groups, particularly in terms of phone usage and location-based features, with machine learning models demonstrating substantial predictive power in classifying loneliness levels. The XGBoost model, in particular, showed high accuracy and was effective in identifying key digital biomarkers, including phone usage duration and location-based features, as significant predictors of loneliness categories. Conclusion: This study underscores the potential of passive sensing data, combined with machine learning techniques, to provide insights into the behavioral manifestations of social and emotional loneliness among students. The identification of key digital biomarkers paves the way for targeted interventions aimed at mitigating loneliness in this population.
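The modeling pipeline named in the abstract (XGBoost plus SHAP feature importance) can be sketched as follows; the features and labels here are synthetic placeholders, not the study's data:

```python
import numpy as np
import shap
import xgboost
from sklearn.datasets import make_classification

# Synthetic stand-in for passive-sensing features (phone usage, location, etc.).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
feature_names = ["phone_use_min", "unlocks", "location_entropy", "home_stay",
                 "screen_night", "travel_km", "call_min", "app_switches"]

model = xgboost.XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
model.fit(X, y)   # toy labels: 1 = "emotionally lonely", 0 = "socially lonely"

# SHAP values rank which digital biomarkers drive the classification.
explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X)
importance = np.abs(sv).mean(axis=0)
for name, imp in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
    print(f"{name:18s} {imp:.3f}")
```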
Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods
Authors: Zdravko Marinov, Moon Kim, Jens Kleesiek, Rainer Stiefelhagen
Link: http://arxiv.org/abs/2404.01816v1
Abstract: Interactive segmentation plays a crucial role in accelerating the annotation, particularly in domains requiring specialized expertise such as nuclear medicine. For example, annotating lesions in whole-body Positron Emission Tomography (PET) images can require over an hour per volume. While previous works evaluate interactive segmentation models through either real user studies or simulated annotators, both approaches present challenges. Real user studies are expensive and often limited in scale, while simulated annotators, also known as robot users, tend to overestimate model performance due to their idealized nature. To address these limitations, we introduce four evaluation metrics that quantify the user shift between real and simulated annotators. In an initial user study involving four annotators, we assess existing robot users using our proposed metrics and find that robot users significantly deviate in performance and annotation behavior compared to real annotators. Based on these findings, we propose a more realistic robot user that reduces the user shift by incorporating human factors such as click variation and inter-annotator disagreement. We validate our robot user in a second user study, involving four other annotators, and show it consistently reduces the simulated-to-real user shift compared to traditional robot users. By employing our robot user, we can conduct more large-scale and cost-efficient evaluations of interactive segmentation models, while preserving the fidelity of real user studies. Our implementation is based on MONAI Label and will be made publicly available.
Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G
Authors: Nassim Sehad, Lina Bariah, Wassim Hamidouche, Hamed Hellaoui, Riku Jäntti, Mérouane Debbah
Link: http://arxiv.org/abs/2404.01713v1
Abstract: Over the past two decades, the Internet-of-Things (IoT) has been a transformative concept, and as we approach 2030, a new paradigm known as the Internet of Senses (IoS) is emerging. Unlike conventional Virtual Reality (VR), IoS seeks to provide multi-sensory experiences, acknowledging that in our physical reality, our perception extends far beyond just sight and sound; it encompasses a range of senses. This article explores existing technologies driving immersive multi-sensory media, delving into their capabilities and potential applications. This exploration includes a comparative analysis between conventional immersive media streaming and a proposed use case that leverages semantic communication empowered by generative Artificial Intelligence (AI). The focal point of this analysis is the substantial reduction in bandwidth consumption by 99.93% in the proposed scheme. Through this comparison, we aim to underscore the practical applications of generative AI for immersive media while addressing the challenges and outlining future trajectories.
Tell and show: Combining multiple modalities to communicate manipulation tasks to a robot
Authors: Petr Vanc, Radoslav Skoviera, Karla Stepanova
Link: http://arxiv.org/abs/2404.01702v1
Abstract: As human-robot collaboration is becoming more widespread, there is a need for a more natural way of communicating with the robot. This includes combining data from several modalities together with the context of the situation and background knowledge. Current approaches to communication typically rely only on a single modality or are often very rigid and not robust to missing, misaligned, or noisy data. In this paper, we propose a novel method that takes inspiration from sensor fusion approaches to combine uncertain information from multiple modalities and enhance it with situational awareness (e.g., considering object properties or the scene setup). We first evaluate the proposed solution on simulated bimodal datasets (gestures and language) and show through several ablation experiments the importance of various components of the system and its robustness to noisy, missing, or misaligned observations. Then we implement and evaluate the model on a real setup. In human-robot interaction, we must also consider whether the selected action is probable enough to be executed or whether we should instead query humans for clarification. For these purposes, we enhance our model with adaptive entropy-based thresholding that detects appropriate thresholds for different types of interaction, showing performance similar to that of fine-tuned fixed thresholds.
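A minimal sketch of entropy-based thresholding for the execute-or-clarify decision; the fixed entropy fraction here is an illustrative setting that stands in for the paper's adaptively detected thresholds:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def decide(action_probs, max_entropy_frac=0.5):
    """Execute the top action only if the fused distribution is confident enough;
    otherwise ask the human for clarification."""
    h = entropy(action_probs)
    h_max = np.log2(len(action_probs))          # entropy of a uniform distribution
    if h <= max_entropy_frac * h_max:
        return ("execute", int(np.argmax(action_probs)))
    return ("clarify", None)

# Fused gesture+language posterior over 4 candidate actions:
print(decide([0.85, 0.05, 0.05, 0.05]))   # ('execute', 0)
print(decide([0.4, 0.3, 0.2, 0.1]))       # ('clarify', None)
```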
NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps
Authors: Kristina Gligoric, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky
Link: http://arxiv.org/abs/2404.01651v1
Abstract: The `use' of words to convey a speaker's intent is traditionally distinguished from the `mention' of words for quoting what someone said or for pointing out properties of a word. Here we show that computationally modeling this use-mention distinction is crucial for dealing with counterspeech online. Counterspeech that refutes problematic content often mentions harmful language but is not harmful itself (e.g., calling a vaccine dangerous is not the same as expressing disapproval of someone for calling vaccines dangerous). We show that even recent language models fail at distinguishing use from mention, and that this failure propagates to two key downstream tasks: misinformation and hate speech detection, resulting in censorship of counterspeech. We introduce prompting mitigations that teach the use-mention distinction, and show they reduce these errors. Our work highlights the importance of the use-mention distinction for NLP and CSS and offers ways to address it.
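A hedged illustration of what a use-mention prompting mitigation might look like; the wording below is ours, not the paper's released prompts:

```python
# Illustrative wording only; the paper's exact prompts are not reproduced here.
USE_MENTION_PREAMBLE = (
    "Before labeling, decide whether harmful language in the text is USED "
    "(the speaker endorses it) or merely MENTIONED (quoted, reported, or "
    "refuted, e.g. in counterspeech). Only USED harmful language should be "
    "labeled as hate speech or misinformation."
)

def build_prompt(text: str) -> str:
    return (
        f"{USE_MENTION_PREAMBLE}\n\n"
        f"Text: {text}\n"
        "Answer with one label: HARMFUL or NOT_HARMFUL, then a one-sentence "
        "justification citing use vs. mention."
    )

print(build_prompt('Calling vaccines "dangerous" is irresponsible fearmongering.'))
```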
InsightLens: Discovering and Exploring Insights from Conversational Contexts in Large-Language-Model-Powered Data Analysis
Authors: Luoxuan Weng, Xingbo Wang, Junyu Lu, Yingchaojie Feng, Yihan Liu, Wei Chen
Link: http://arxiv.org/abs/2404.01644v1
Abstract: The proliferation of large language models (LLMs) has revolutionized the capabilities of natural language interfaces (NLIs) for data analysis. LLMs can perform multi-step and complex reasoning to generate data insights based on users' analytic intents. However, these insights are often entangled with an abundance of context in analytic conversations, such as code, visualizations, and natural language explanations. This hinders efficient identification, verification, and interpretation of insights within the current chat-based interfaces of LLMs. In this paper, we first conduct a formative study with eight experienced data analysts to understand their general workflow and pain points during LLM-powered data analysis. Then, we propose an LLM-based multi-agent framework to automatically extract, associate, and organize insights along with the analysis process. Based on this, we introduce InsightLens, an interactive system that visualizes the intricate conversational contexts from multiple aspects to facilitate insight discovery and exploration. A user study with twelve data analysts demonstrates the effectiveness of InsightLens, showing that it significantly reduces users' manual and cognitive effort without disrupting their conversational data analysis workflow, leading to a more efficient analysis experience.
Gen4DS: Workshop on Data Storytelling in an Era of Generative AI
Authors: Xingyu Lan, Leni Yang, Zezhong Wang, Danqing Shi, Sheelagh Carpendale
Link: http://arxiv.org/abs/2404.01622v1
Abstract: Storytelling is an ancient and precious human ability that has been rejuvenated in the digital age. Over the last decade, there has been a notable surge in the recognition and application of data storytelling, both in academia and industry. Recently, the rapid development of generative AI has brought new opportunities and challenges to this field, sparking numerous new questions. These questions may not necessarily be quickly transformed into papers, but we believe it is necessary to promptly discuss them to help the community better clarify important issues and research agendas for the future. We thus invite you to join our workshop (Gen4DS) to discuss questions such as: How can generative AI facilitate the creation of data stories? How might generative AI alter the workflow of data storytellers? What are the pitfalls and risks of incorporating AI in storytelling? We have designed both paper presentations and interactive activities (including hands-on creation, group discussion pods, and debates on controversial issues) for the workshop. We hope that participants will learn about the latest advances and pioneering work in data storytelling, engage in critical conversations with each other, and have an enjoyable, unforgettable, and meaningful experience at the event.
Collaborative human-AI trust (CHAI-T): A process framework for active management of trust in human-AI collaboration
Authors: Melanie J. McGrath, Andreas Duenser, Justine Lacey, Cecile Paris
Link: http://arxiv.org/abs/2404.01615v1
Abstract: Collaborative human-AI (HAI) teaming combines the unique skills and capabilities of humans and machines in sustained teaming interactions leveraging the strengths of each. In tasks involving regular exposure to novelty and uncertainty, collaboration between adaptive, creative humans and powerful, precise artificial intelligence (AI) promises new solutions and efficiencies. User trust is essential to creating and maintaining these collaborative relationships. Established models of trust in traditional forms of AI typically recognize the contribution of three primary categories of trust antecedents: characteristics of the human user, characteristics of the technology, and environmental factors. The emergence of HAI teams, however, requires an understanding of human trust that accounts for the specificity of task contexts and goals, integrates processes of interaction, and captures how trust evolves in a teaming environment over time. Drawing on both the psychological and computer science literature, the process framework of trust in collaborative HAI teams (CHAI-T) presented in this paper adopts the tripartite structure of antecedents established by earlier models, while incorporating team processes and performance phases to capture the dynamism inherent to trust in teaming contexts. These features enable active management of trust in collaborative AI systems, with practical implications for the design and deployment of collaborative HAI teams.
Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game
Authors: Silin Du, Xiaowei Zhang
Link: http://arxiv.org/abs/2404.01602v1
Abstract: Large language models (LLMs) have exhibited memorable strategic behaviors in social deductive games. However, the significance of opinion leadership exhibited by LLM-based agents has been overlooked, which is crucial for practical applications in multi-agent and human-AI interaction settings. Opinion leaders are individuals who have a noticeable impact on the beliefs and behaviors of others within a social group. In this work, we employ the Werewolf game as a simulation platform to assess the opinion leadership of LLMs. The game features the role of the Sheriff, tasked with summarizing arguments and recommending decision options, and therefore serves as a credible proxy for an opinion leader. We develop a framework integrating the Sheriff role and devise two novel metrics for evaluation based on the critical characteristics of opinion leaders. The first metric measures the reliability of the opinion leader, and the second assesses the influence of the opinion leader on other players' decisions. We conduct extensive experiments to evaluate LLMs of different scales. In addition, we collect a Werewolf question-answering dataset (WWQA) to assess and enhance LLMs' grasp of the game rules, and we also incorporate human participants for further analysis. The results suggest that the Werewolf game is a suitable test bed to evaluate the opinion leadership of LLMs, and that only a few LLMs possess the capacity for opinion leadership.
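Toy realizations of the two metrics the abstract describes; the paper's exact definitions may differ:

```python
from typing import List

def reliability(sheriff_claims: List[bool]) -> float:
    """Fraction of the Sheriff's summarized claims matching ground truth;
    a toy realization of the first (reliability) metric."""
    return sum(sheriff_claims) / len(sheriff_claims)

def influence(votes_before: List[int], votes_after: List[int], rec: int) -> float:
    """Share of other players who switched toward the Sheriff's recommendation
    after hearing it; a toy realization of the second (influence) metric."""
    switched = sum(1 for b, a in zip(votes_before, votes_after)
                   if b != rec and a == rec)
    eligible = sum(1 for b in votes_before if b != rec)
    return switched / eligible if eligible else 0.0

print(reliability([True, True, False, True]))        # 0.75
print(influence([2, 3, 1, 2], [1, 1, 1, 2], rec=1))  # 2 of 3 eligible -> ~0.667
```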
Leveraging Digital Perceptual Technologies for Remote Perception and Analysis of Human Biomechanical Processes: A Contactless Approach for Workload and Joint Force Assessment
Authors: Jesudara Omidokun, Darlington Egeonu, Bochen Jia, Liang Yang
Link: http://arxiv.org/abs/2404.01576v1
Abstract: This study presents an innovative computer vision framework designed to analyze human movements in industrial settings, aiming to enhance biomechanical analysis by integrating seamlessly with existing software. Through a combination of advanced imaging and modeling techniques, the framework allows for comprehensive scrutiny of human motion, providing valuable insights into kinematic patterns and kinetic data. Utilizing Convolutional Neural Networks (CNNs), Direct Linear Transform (DLT), and Long Short-Term Memory (LSTM) networks, the methodology accurately detects key body points, reconstructs 3D landmarks, and generates detailed 3D body meshes. Extensive evaluations across various movements validate the framework's effectiveness, demonstrating comparable results to traditional marker-based models with minor differences in joint angle estimations and precise estimations of weight and height. Statistical analyses consistently support the framework's reliability, with joint angle estimations showing less than a 5-degree difference for hip flexion, elbow flexion, and knee angle methods. Additionally, body measurement estimation exhibits an average error of less than 6% for weight and less than 2% for height when compared to ground-truth values from 10 subjects. The integration of the Biomech-57 landmark skeleton template further enhances the robustness and reinforces the framework's credibility. This framework shows significant promise for meticulous biomechanical analysis in industrial contexts, eliminating the need for cumbersome markers and extending its utility to diverse research domains, including the study of specific exoskeleton devices' impact on facilitating the prompt return of injured workers to their tasks.
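The Direct Linear Transform (DLT) step named in the pipeline is a standard technique; here is a self-contained triangulation sketch with two toy cameras (the projection matrices and point are illustrative, not the study's calibration):

```python
import numpy as np

def triangulate_dlt(P1, P2, uv1, uv2):
    """Direct Linear Transform triangulation: recover a 3D point from its
    projections in two calibrated cameras (3x4 projection matrices P1, P2)."""
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # de-homogenize

# Two toy cameras: identity, and a camera shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 4.0, 1.0])
uv1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
uv2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate_dlt(P1, P2, uv1, uv2))   # ~ [0.3, -0.2, 4.0]
```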
2024-04-01
PlayFutures: Imagining Civic Futures with AI and Puppets
Authors: Supratim Pait, Sumita Sharma, Ashley Frith, Michael Nitsche, Noura Howell
Link: http://arxiv.org/abs/2404.01527v1
Abstract: Children are the builders of the future and crucial to how the technologies around us develop. They are not voters, but they are participants in how the public spaces in a city are used. Through a workshop designed for children aged 9-12, we investigate whether novel technologies like artificial intelligence can be integrated into existing ways of play and performance to 1) re-imagine the future of civic spaces, 2) reflect on these novel technologies in the process, and 3) build ways of civic engagement through play. We do this using a blend of AI image generation and puppet making to build future scenarios, stage debate and discussion around those futures, and reflect on AI and its role and potential in the process. We present our findings on how AI helped envision these futures and aid performances, and report some initial reflections from children about the technology.
DeLVE into Earth's Past: A Visualization-Based Exhibit Deployed Across Multiple Museum Contexts
Authors: Mara Solen, Nigar Sultana, Laura Lukes, Tamara Munzner
Link: http://arxiv.org/abs/2404.01488v1
Abstract: While previous work has found success in deploying visualizations as museum exhibits, differences in visitor behaviour across varying museum contexts are understudied. We present an interactive Deep-time Literacy Visualization Exhibit (DeLVE) to help museum visitors understand deep time (lengths of extremely long geological processes) by improving proportional reasoning skills through comparison of different time periods. DeLVE uses a new visualization idiom, Connected Multi-Tier Ranges, to visualize curated datasets of past events across multiple scales of time, relating extreme scales with concrete scales that have more familiar magnitudes and units. Museum staff at three separate museums approved the deployment of DeLVE as a digital kiosk, and devoted time to curating a unique dataset for each museum. We collect data from two sources, an observational study and system trace logs, yielding evidence of successfully meeting our requirements. We discuss the importance of context: similar museum exhibits in different contexts were received very differently by visitors. We additionally discuss how our process differs from standard design study methodology, which focuses on design studies for data analysis purposes rather than for presentation. Supplemental materials are available at: https://osf.io/z53dq/?view_only=4df33aad207144aca149982412125541
A Design Space for Visualization with Large Scale-Item Ratios
Authors: Mara Solen, Tamara Munzner
Link: http://arxiv.org/abs/2404.01485v1
Abstract: The scale-item ratio is the relationship between the largest scale and the smallest item in a visualization. Designing visualizations when this ratio is large can be challenging, and designers have developed many approaches to overcome this challenge. We present a design space for visualization with large scale-item ratios. The design space includes three dimensions, with eight total subdimensions. We demonstrate its descriptive power by using it to code approaches from a corpus we compiled of 54 examples, created by a mix of academics and practitioners. We then partition these examples into five strategies, which are shared approaches with respect to design space dimension choices. We demonstrate generative power by analyzing missed opportunities within the corpus of examples, identified through analysis of the design space, where we note how certain examples could have benefited from different choices. Supplemental materials: https://osf.io/wbrdm/?view_only=04389a2101a04e71a2c208a93bf2f7f2
Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs
Authors: Pengda Wang, Zilin Xiao, Hanjie Chen, Frederick L. Oswald
Link: http://arxiv.org/abs/2404.01461v1
Abstract: Although large language models (LLMs) have demonstrated remarkable proficiency in understanding text and generating human-like text, they may exhibit biases acquired from training data in doing so. Specifically, LLMs may be susceptible to a common cognitive trap in human decision-making called the representativeness heuristic. This is a concept in psychology that refers to judging the likelihood of an event based on how closely it resembles a well-known prototype or typical example, versus considering broader facts or statistical evidence. This work investigates the impact of the representativeness heuristic on LLM reasoning. We created REHEAT (Representativeness Heuristic AI Testing), a dataset containing a series of problems spanning six common types of representativeness heuristics. Experiments reveal that four LLMs applied to REHEAT all exhibited representativeness heuristic biases. We further identify that the models' reasoning steps are often incorrectly based on a stereotype rather than on the problem's description. Interestingly, performance improves when a hint is added to the prompt reminding the model to use its knowledge. This suggests the uniqueness of the representativeness heuristic compared to traditional biases: it can occur even when LLMs possess the correct knowledge yet still fall into the cognitive trap. This highlights the importance of future research focusing on the representativeness heuristic in model reasoning and decision-making and on developing solutions to address it.
A Preliminary Roadmap for LLMs as Assistants in Exploring, Analyzing, and Visualizing Knowledge Graphs
Authors: Harry Li, Gabriel Appleby, Ashley Suh
Link: http://arxiv.org/abs/2404.01425v1
Abstract: We present a mixed-methods study to explore how large language models (LLMs) can assist users in the visual exploration and analysis of knowledge graphs (KGs). We surveyed and interviewed 20 professionals from industry, government laboratories, and academia who regularly work with KGs and LLMs, either collaboratively or concurrently. Our findings show that participants overwhelmingly want an LLM to facilitate data retrieval from KGs through joint query construction, to identify interesting relationships in the KG through multi-turn conversation, and to create on-demand visualizations from the KG that enhance their trust in the LLM's outputs. To interact with an LLM, participants strongly prefer a chat-based 'widget,' built on top of their regular analysis workflows, with the ability to guide the LLM using their interactions with a visualization. When viewing an LLM's outputs, participants similarly prefer a combination of annotated visuals (e.g., subgraphs or tables extracted from the KG) alongside summarizing text. However, participants also expressed concerns about an LLM's ability to maintain semantic intent when translating natural language questions into KG queries, the risk of an LLM 'hallucinating' false data from the KG, and the difficulties of engineering a 'perfect prompt.' From the analysis of our interviews, we contribute a preliminary roadmap for the design of LLM-driven knowledge graph exploration systems and outline future opportunities in this emergent design space.
Towards a potential paradigm shift in health data collection and analysis
Authors: David Josef Herzog, Nitsa Judith Herzog
Link: http://arxiv.org/abs/2404.01403v1
Abstract: Industrial Revolution 4.0 transforms healthcare systems. The first three technological revolutions changed the relationship between humans and machines due to the exponential growth in the number of machines. The fourth revolution put humans into a situation where heterogeneous data is produced with unmatched quantity and quality, not only by traditional methods, reinforced by digitization, but also by ubiquitous computing, machine-to-machine interactions, and smart environments. The modern cyber-physical space underlines the role of the person in the expanding context of computerization and big data processing. In healthcare, where data collection and analysis particularly depend on human efforts, the disruptive nature of these developments is evident. Adaptation to this process requires deep scrutiny of the trends and recognition of future medical data technologies' evolution. Significant difficulties arise from discrepancies in the requirements of healthcare, administrative, and technology stakeholders. Black box and grey box decisions made in medical imaging and diagnostic Decision Support Software are often not transparent enough for the professional, social, and medico-legal requirements. While Explainable AI proposes a partial solution for AI applications in medicine, the approach has to be wider and multiplex. The potential and limitations of LLMs are also discussed. This paper lists the most significant issues in these topics and describes possible solutions.
Evaluating Privacy Perceptions, Experience, and Behavior of Software Development Teams
Authors: Maxwell Prybylo, Sara Haghighi, Sai Teja Peddinti, Sepideh Ghanavati
Link: http://arxiv.org/abs/2404.01283v1
Abstract: With the increase in the number of privacy regulations, small development teams are forced to make privacy decisions on their own. In this paper, we conduct a mixed-methods survey study, including statistical and qualitative analysis, to evaluate the privacy perceptions, practices, and knowledge of members involved in various phases of the software development life cycle (SDLC). Our survey includes 362 participants from 23 countries, encompassing roles such as product managers, developers, and testers. Our results show diverse definitions of privacy across SDLC roles, emphasizing the need for a holistic privacy approach throughout the SDLC. We find that software teams, regardless of their region, are less familiar with privacy concepts (such as anonymization), relying on self-teaching and forums. Most participants are more familiar with GDPR and HIPAA than other regulations, with multi-jurisdictional compliance being their primary concern. Our results advocate the need for role-dependent solutions to address the privacy challenges, and we highlight research directions and educational takeaways to help improve privacy-aware software development.
Information Plane Analysis Visualization in Deep Learning via Transfer Entropy
Authors: Adrian Moldovan, Angel Cataron, Razvan Andonie
Link: http://arxiv.org/abs/2404.01364v1
Abstract: In a feedforward network, Transfer Entropy (TE) can be used to measure the influence that one layer has on another by quantifying the information transfer between them during training. According to the Information Bottleneck principle, a neural model's internal representation should compress the input data as much as possible while still retaining sufficient information about the output. Information Plane analysis is a visualization technique used to understand the trade-off between compression and information preservation in the context of the Information Bottleneck method by plotting the amount of information in the input data against the compressed representation. The claim that there is a causal link between information-theoretic compression and generalization, measured by mutual information, is plausible, but results from different studies are conflicting. In contrast to mutual information, TE can capture temporal relationships between variables. To explore such links, in our novel approach we use TE to quantify information transfer between neural layers and perform Information Plane analysis. We obtained encouraging experimental results, opening the possibility for further investigations.
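A plug-in transfer entropy estimator of the kind this analysis relies on, using first-order histories and histogram discretization; the activation time series below are simulated stand-ins for layer activations:

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, bins=4):
    """Plug-in estimate of TE_{X->Y} in bits for two time series, with
    first-order history and histogram discretization."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    n = len(yd) - 1
    triples = Counter(zip(yd[1:], yd[:-1], xd[:-1]))   # (y_{t+1}, y_t, x_t)
    yx = Counter(zip(yd[:-1], xd[:-1]))
    yy = Counter(zip(yd[1:], yd[:-1]))
    y0 = Counter(yd[:-1])
    te = 0.0
    for (y1, yt, xt), c in triples.items():
        p = c / n
        p_cond_full = c / yx[(yt, xt)]                 # p(y_{t+1} | y_t, x_t)
        p_cond_self = yy[(y1, yt)] / y0[yt]            # p(y_{t+1} | y_t)
        te += p * np.log2(p_cond_full / p_cond_self)
    return te

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.roll(x, 1) + 0.3 * rng.normal(size=5000)   # y is driven by lagged x
print(transfer_entropy(x, y), transfer_entropy(y, x))  # first should be larger
```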
Image Reconstruction from Electroencephalography Using Latent Diffusion
Authors: Teng Fei, Virginia de Sa
Link: http://arxiv.org/abs/2404.01250v1
Abstract: In this work, we have adopted the diffusion-based image reconstruction pipeline previously used for fMRI image reconstruction and applied it to electroencephalography (EEG). The EEG encoding method is very simple and forms a baseline against which more sophisticated EEG encoding methods can be compared. We have also evaluated the fidelity of the generated images using the same metrics used in the previous functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) works. Our results show that while reconstruction from EEG recorded in response to rapidly presented images is not as good as reconstruction from fMRI with more slowly presented images, it holds a surprising amount of information that could be applied in specific use cases. Also, EEG-based image reconstruction works better in some categories, such as land animals and food, than in others, shedding new light on previous findings of EEG's sensitivity to those categories and revealing the potential for these methods to further our understanding of EEG responses to human visual coding. Further investigation should use longer-duration image presentations to elucidate the later components that might be salient to the different image categories.
AURORA: Navigating UI Tarpits via Automated Neural Screen Understanding
Authors: Safwat Ali Khan, Wenyu Wang, Yiran Ren, Bin Zhu, Jiangfan Shi, Alyssa McGowan, Wing Lam, Kevin Moran
Link: http://arxiv.org/abs/2404.01240v1
Abstract: Nearly a decade of research in software engineering has focused on automating mobile app testing to help engineers in overcoming the unique challenges associated with the software platform. Much of this work has come in the form of Automated Input Generation tools (AIG tools) that dynamically explore app screens. However, such tools have repeatedly been demonstrated to achieve lower-than-expected code coverage - particularly on sophisticated proprietary apps. Prior work has illustrated that a primary cause of these coverage deficiencies is related to so-called tarpits, or complex screens that are difficult to navigate. In this paper, we take a critical step toward enabling AIG tools to effectively navigate tarpits during app exploration through a new form of automated semantic screen understanding. We introduce AURORA, a technique that learns from the visual and textual patterns that exist in mobile app UIs to automatically detect common screen designs and navigate them accordingly. The key idea of AURORA is that there are a finite number of mobile app screen designs, albeit with subtle variations, such that the general patterns of different categories of UI designs can be learned. As such, AURORA employs a multi-modal, neural screen classifier that is able to recognize the most common types of UI screen designs. After recognizing a given screen, it then applies a set of flexible and generalizable heuristics to properly navigate the screen. We evaluated AURORA both on a set of 12 apps with known tarpits from prior work, and on a new set of five of the most popular apps from the Google Play store. Our results indicate that AURORA is able to effectively navigate tarpit screens, outperforming prior approaches that avoid tarpits by 19.6% in terms of method coverage. The improvements can be attributed to AURORA's UI design classification and heuristic navigation techniques.
LLM Attributor: Interactive Visual Attribution for LLM Generation
Authors: Seongmin Lee, Zijie J. Wang, Aishwarya Chakravarthy, Alec Helbling, ShengYun Peng, Mansi Phute, Duen Horng Chau, Minsuk Kahng
Link: http://arxiv.org/abs/2404.01361v1
Abstract: While large language models (LLMs) have shown remarkable capability to generate convincing text across diverse domains, concerns around their potential risks have highlighted the importance of understanding the rationale behind text generation. We present LLM Attributor, a Python library that provides interactive visualizations for training data attribution of an LLM's text generation. Our library offers a new way to quickly attribute an LLM's text generation to training data points to inspect model behaviors, enhance its trustworthiness, and compare model-generated text with user-provided text. We describe the visual and interactive design of our tool and highlight usage scenarios for LLaMA2 models fine-tuned with two different datasets: online articles about recent disasters and finance-related question-answer pairs. Thanks to LLM Attributor's broad support for computational notebooks, users can easily integrate it into their workflow to interactively visualize attributions of their models. For easier access and extensibility, we open-source LLM Attributor at https://github.com/poloclub/LLM-Attribution. The video demo is available at https://youtu.be/mIG2MDQKQxM.
Chat Modeling: Natural Language-based Procedural Modeling of Biological Structures without Training
Authors: Donggang Jia, Yunhai Wang, Ivan Viola
Link: http://arxiv.org/abs/2404.01063v1
Abstract: 3D modeling of biological structures is an inherently complex process, necessitating both biological and geometric understanding. Additionally, the complexity of user interfaces of 3D modeling tools and the associated steep learning curve further exacerbate the difficulty of authoring a 3D model. In this paper, we introduce a novel framework to address the challenge of using 3D modeling software by converting users' textual inputs into modeling actions within an interactive procedural modeling system. The framework incorporates a code generator for a novel code format and a corresponding code interpreter. The major technical innovation includes the user-refinement mechanism that captures the degree of user dissatisfaction with the modeling outcome, offers an interactive revision, and leverages this feedback for improved future 3D modeling. This entire framework is powered by large language models and eliminates the need for a traditional training process. We develop a prototype tool named Chat Modeling, offering both automatic and step-by-step 3D modeling approaches. Our evaluation of the framework with structural biologists highlights the potential of our approach being utilized in their scientific workflows. All supplemental materials are available at https://osf.io/x4qb7/.
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Authors: Haofeng Liu, Chenshu Xu, Yifei Yang, Lihua Zeng, Shengfeng He
Link: http://arxiv.org/abs/2404.01050v1
Abstract: Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present DragNoise, offering robust and accelerated editing without retracing the latent map. The core rationale of DragNoise lies in utilizing the predicted noise output of each U-Net as a semantic editor. This approach is grounded in two critical observations: firstly, the bottleneck features of U-Net inherently possess semantically rich features ideal for interactive editing; secondly, high-level semantics, established early in the denoising process, show minimal variation in subsequent stages. Leveraging these insights, DragNoise edits diffusion semantics in a single denoising step and efficiently propagates these changes, ensuring stability and efficiency in diffusion editing. Comparative experiments reveal that DragNoise achieves superior control and semantic retention, reducing the optimization time by over 50% compared to DragDiffusion. Our codes are available at https://github.com/haofengl/DragNoise.
How Can Large Language Models Enable Better Socially Assistive Human-Robot Interaction: A Brief Survey
Authors: Zhonghao Shi, Ellen Landrum, Amy O' Connell, Mina Kian, Leticia Pinto-Alva, Kaleen Shrestha, Xiaoyuan Zhu, Maja J Matarić
Link: http://arxiv.org/abs/2404.00938v1
Abstract: Socially assistive robots (SARs) have shown great success in providing personalized cognitive-affective support for user populations with special needs such as older adults, children with autism spectrum disorder (ASD), and individuals with mental health challenges. The large body of work on SAR demonstrates its potential to provide at-home support that complements clinic-based interventions delivered by mental health professionals, making these interventions more effective and accessible. However, there are still several major technical challenges that hinder SAR-mediated interactions and interventions from reaching human-level social intelligence and efficacy. With the recent advances in large language models (LLMs), there is an increased potential for novel applications within the field of SAR that can significantly expand the current capabilities of SARs. However, incorporating LLMs introduces new risks and ethical concerns that have not yet been encountered, and that must be carefully addressed to safely deploy these more advanced systems. In this work, we aim to conduct a brief survey on the use of LLMs in SAR technologies, and discuss the potential and risks of applying LLMs to the following three major technical challenges of SAR: 1) natural language dialog; 2) multimodal understanding; 3) LLMs as robot policies.
2024-03-31
Designing Human-AI Systems: Anthropomorphism and Framing Bias on Human-AI Collaboration
Authors: Samuel Aleksander Sánchez Olszewski
Link: http://arxiv.org/abs/2404.00634v1
Abstract: AI is redefining how humans interact with technology, leading to a synergetic collaboration between the two. Nevertheless, the effects of human cognition on this collaboration remain unclear. This study investigates the implications of two cognitive biases, anthropomorphism and framing effect, on human-AI collaboration within a hiring setting. Subjects were asked to select job candidates with the help of an AI-powered recommendation tool. The tool was manipulated to have either human-like or robot-like characteristics and presented its recommendations in either positive or negative frames. The results revealed that the framing of AI's recommendations had no significant influence on subjects' decisions. In contrast, anthropomorphism significantly affected subjects' agreement with AI recommendations. Contrary to expectations, subjects were less likely to agree with the AI if it had human-like characteristics. These findings demonstrate that cognitive biases can impact human-AI collaboration and highlight the need for tailored approaches to AI product design, rather than a single, universal solution.
"My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents
Authors: Yuki Hou, Haruki Tamoto, Homei Miyashita
Link: http://arxiv.org/abs/2404.00573v1
Abstract: In this study, we propose a novel human-like memory architecture designed for enhancing the cognitive abilities of large language model-based dialogue agents. Our proposed architecture enables agents to autonomously recall memories necessary for response generation, effectively addressing a limitation in the temporal cognition of LLMs. We adopt human memory cue recall as the trigger for accurate and efficient memory retrieval. Moreover, we developed a mathematical model that dynamically quantifies memory consolidation, considering factors such as contextual relevance, elapsed time, and recall frequency. The agent stores memories retrieved from the user's interaction history in a database that encapsulates each memory's content and temporal context. Thus, this strategic storage allows agents to recall specific memories and understand their significance to the user in a temporal context, similar to how humans recognize and recall past experiences.
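An illustrative composite recall score over the three factors the abstract names (contextual relevance, elapsed time, recall frequency); the weights and functional form below are our assumptions, not the paper's model:

```python
import math
import time

def consolidation_score(query_vec, memory, now=None,
                        w_rel=1.0, w_rec=0.5, w_freq=0.3, half_life_h=24.0):
    """Score a stored memory for recall: cosine relevance to the query,
    exponential recency decay, and log-damped recall frequency."""
    now = now or time.time()
    dot = sum(a * b for a, b in zip(query_vec, memory["vec"]))
    na = math.sqrt(sum(a * a for a in query_vec))
    nb = math.sqrt(sum(b * b for b in memory["vec"]))
    relevance = dot / (na * nb) if na and nb else 0.0
    hours = (now - memory["created"]) / 3600.0
    recency = 0.5 ** (hours / half_life_h)            # exponential forgetting
    frequency = math.log1p(memory["recalls"])          # diminishing returns
    return w_rel * relevance + w_rec * recency + w_freq * frequency

mem = {"vec": [0.2, 0.9, 0.1], "created": time.time() - 3 * 3600, "recalls": 4}
print(consolidation_score([0.1, 0.8, 0.0], mem))
```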
The Emotional Impact of Game Duration: A Framework for Understanding Player Emotions in Extended Gameplay Sessions
Authors: Anoop Kumar, Suresh Dodda, Navin Kamuni, Venkata Sai Mahesh Vuppalapati
Link: http://arxiv.org/abs/2404.00526v1
Abstract: Video games have played a crucial role in entertainment since their development in the 1970s, becoming even more prominent during the lockdown period, when people were looking for ways to entertain themselves. However, at that time, players were unaware of the significant impact that playtime could have on their feelings. This has made it challenging for designers and developers to create new games, since they have to control the emotional impact that these games will have on players. Thus, the purpose of this study is to look at how a player's emotions are affected by the duration of the game. In order to achieve this goal, a framework for emotion detection is created. According to the experiment's results, the emotions expressed by volunteers generally increased as playtime went from 20 to 60 minutes. In comparison to shorter gameplay sessions, the experiment found that extended gameplay sessions did significantly affect players' emotions. Based on the results, it is recommended that, to lessen the potential emotional impact of playing computer and video games, game producers consider creating shorter, entertaining games.
Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation
Authors: Rohan Chaudhury, Mihir Godbole, Aakash Garg, Jinsil Hwaryoung Seo
Link: http://arxiv.org/abs/2404.01339v1
Abstract: Contemporary conversational systems often present a significant limitation: their responses lack the emotional depth and disfluent characteristics of human interactions. This absence becomes particularly noticeable when users seek more personalized and empathetic interactions. Consequently, this makes them seem mechanical and less relatable to human users. Recognizing this gap, we embarked on a journey to humanize machine communication, to ensure AI systems not only comprehend but also resonate. To address this shortcoming, we have designed an innovative speech synthesis pipeline. Within this framework, a cutting-edge language model introduces both human-like emotion and disfluencies in a zero-shot setting. These intricacies are seamlessly integrated into the generated text by the language model during text generation, allowing the system to mirror human speech patterns better, promoting more intuitive and natural user interactions. These generated elements are then adeptly transformed into corresponding speech patterns and emotive sounds using a rule-based approach during the text-to-speech phase. Based on our experiments, our novel system produces synthesized speech that is almost indistinguishable from genuine human communication, making each interaction feel more personal and authentic.
2024-03-30
Contextual AI Journaling: Integrating LLM and Time Series Behavioral Sensing Technology to Promote Self-Reflection and Well-being using the MindScape App
Authors: Subigya Nepal, Arvind Pillai, William Campbell, Talie Massachi, Eunsol Soul Choi, Orson Xu, Joanna Kuc, Jeremy Huckins, Jason Holden, Colin Depp, Nicholas Jacobson, Mary Czerwinski, Eric Granholm, Andrew T. Campbell
Link: http://arxiv.org/abs/2404.00487v1
Abstract: MindScape aims to study the benefits of integrating time series behavioral patterns (e.g., conversational engagement, sleep, location) with Large Language Models (LLMs) to create a new form of contextual AI journaling, promoting self-reflection and well-being. We argue that integrating behavioral sensing in LLMs will likely lead to a new frontier in AI. In this Late-Breaking Work paper, we discuss the MindScape contextual journal App design that uses LLMs and behavioral sensing to generate contextual and personalized journaling prompts crafted to encourage self-reflection and emotional development. We also discuss the MindScape study of college students based on a preliminary user study and our upcoming study to assess the effectiveness of contextual AI journaling in promoting better well-being on college campuses. MindScape represents a new application class that embeds behavioral intelligence in AI.
Interactive Multi-Robot Flocking with Gesture Responsiveness and Musical Accompaniment
Authors: Catie Cuan, Kyle Jeffrey, Kim Kleiven, Adrian Li, Emre Fisher, Matt Harrison, Benjie Holson, Allison Okamura, Matt Bennice
Link: http://arxiv.org/abs/2404.00442v1
Abstract: For decades, robotics researchers have pursued various tasks for multi-robot systems, from cooperative manipulation to search and rescue. These tasks are multi-robot extensions of classical robotic tasks and often optimized on dimensions such as speed or efficiency. As robots transition from commercial and research settings into everyday environments, social task aims such as engagement or entertainment become increasingly relevant. This work presents a compelling multi-robot task, in which the main aim is to enthrall and interest. In this task, the goal is for a human to be drawn to move alongside and participate in a dynamic, expressive robot flock. Towards this aim, the research team created algorithms for robot movements and engaging interaction modes such as gestures and sound. The contributions are as follows: (1) a novel group navigation algorithm involving human and robot agents, (2) a gesture-responsive algorithm for real-time, human-robot flocking interaction, (3) a weight mode characterization system for modifying flocking behavior, and (4) a method of encoding a choreographer's preferences inside a dynamic, adaptive, learned system. An experiment was performed to understand individual human behavior while interacting with the flock under three conditions: weight modes selected by a human choreographer, a learned model, or a subset list. Results from the experiment showed that the perception of the experience was not influenced by the weight mode selection. This work elucidates how differing task aims such as engagement manifest in multi-robot system design and execution, and broadens the domain of multi-robot tasks.
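A boids-style sketch of group navigation with an added attraction term toward the human; this is an illustrative stand-in under generic flocking assumptions, not the authors' algorithm:

```python
import numpy as np

def flock_step(pos, vel, human, dt=0.1, w_coh=0.4, w_sep=1.0, w_ali=0.3, w_hum=0.6):
    """One boids-style update with an extra attraction term toward the human."""
    n = len(pos)
    new_vel = vel.copy()
    for i in range(n):
        others = np.arange(n) != i
        coh = pos[others].mean(axis=0) - pos[i]              # steer to flock center
        ali = vel[others].mean(axis=0) - vel[i]              # match neighbors' heading
        diff = pos[i] - pos[others]
        dist = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-6
        sep = (diff / dist**2).sum(axis=0)                   # avoid crowding
        hum = human - pos[i]                                  # drift toward the human
        new_vel[i] += dt * (w_coh * coh + w_sep * sep + w_ali * ali + w_hum * hum)
    speed = np.linalg.norm(new_vel, axis=1, keepdims=True)
    new_vel = np.where(speed > 1.0, new_vel / speed, new_vel)  # cap speed
    return pos + dt * new_vel, new_vel

rng = np.random.default_rng(0)
pos, vel = rng.normal(size=(6, 2)), np.zeros((6, 2))
for _ in range(100):
    pos, vel = flock_step(pos, vel, human=np.array([5.0, 5.0]))
print(pos.round(2))   # robots gather near the human while keeping spacing
```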
Visualizing Routes with AI-Discovered Street-View Patterns
Authors: Tsung Heng Wu, Md Amiruzzaman, Ye Zhao, Deepshikha Bhati, Jing Yang
Link: http://arxiv.org/abs/2404.00431v1
Abstract: Street-level visual appearances play an important role in studying social systems, such as understanding the built environment, driving routes, and associated social and economic factors. Yet street-level visual appearance has not been integrated into typical geographical visualization interfaces (e.g., map services) for planning driving routes. In this paper, we study this new visualization task and make several contributions. First, we experiment with a set of AI techniques and propose a solution that uses semantic latent vectors to quantify visual appearance features. Second, we compute image similarities among a large set of street-view images and then discover spatial imagery patterns. Third, we integrate these discovered patterns into driving route planners with new visualization techniques. Finally, we present VivaRoutes, an interactive visualization prototype, to show how visualizations leveraging these discovered patterns can help users effectively and interactively explore multiple routes. Furthermore, we conducted a user study to assess the usefulness and utility of VivaRoutes.
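As a concrete illustration of the latent-vector similarity step, here is a minimal sketch assuming each street-view image has already been embedded by some pretrained encoder; the array shapes and the cosine-similarity choice are assumptions, not details from the paper.

import numpy as np

def cosine_similarity_matrix(latents: np.ndarray) -> np.ndarray:
    """latents: (n_images, d) semantic latent vectors, one row per image."""
    normed = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    return normed @ normed.T  # (n, n) matrix of pairwise cosine similarities

rng = np.random.default_rng(0)
latents = rng.normal(size=(5, 128))  # stand-in embeddings for 5 images
print(np.round(cosine_similarity_matrix(latents), 2))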
A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration
Authors: Jie Gao, Simret Araya Gebreegziabher, Kenny Tsu Wei Choo, Toby Jia-Jun Li, Simon Tangi Perrault, Thomas W. Malone
Link: http://arxiv.org/abs/2404.00405v1
Abstract: With ChatGPT's release, conversational prompting has become the most popular form of human-LLM interaction. However, its effectiveness is limited for more complex tasks involving reasoning, creativity, and iteration. Through a systematic analysis of HCI papers published since 2021, we identified four key phases in the human-LLM interaction flow - planning, facilitating, iterating, and testing - to precisely understand the dynamics of this process. Additionally, we developed a taxonomy of four primary interaction modes: Mode 1: Standard Prompting, Mode 2: User Interface, Mode 3: Context-based, and Mode 4: Agent Facilitator. This taxonomy was further enriched using the "5W1H" guideline method, which involved a detailed examination of definitions, participant roles (Who), the phases in which each mode occurs (When), human objectives and LLM abilities (What), and the mechanics of each interaction mode (How). We anticipate this taxonomy will contribute to the future design and evaluation of human-LLM interaction.
Designing a User-centric Framework for Information Quality Ranking of Large-scale Street View Images
Authors: Tahiya Chowdhury, Ilan Mandel, Jorge Ortiz, Wendy Ju
Link: http://arxiv.org/abs/2404.00392v1
Abstract: Street view imagery (SVI), largely captured via outfitted fleets or dashcams mounted in consumer vehicles, is a rapidly growing source of geospatial data used in urban sensing and development. These datasets are often collected opportunistically, are massive in size, and vary in quality, which limits the scope and extent of their use in urban planning. Thus far there has not been much work to identify the obstacles experienced and tools needed by the users of such datasets. This severely limits the opportunities for using emerging street view images to support novel research questions that can improve the quality of urban life. This work includes a formative interview study with five expert users of large-scale street view datasets from academia, urban planning, and related professions, which identifies novel use cases, challenges, and opportunities to increase the utility of these datasets. Based on the user findings, we present a framework to evaluate the quality of information for street images across three attributes (spatial, temporal, and content) that stakeholders can use to estimate the value of a dataset and to improve it over time for their respective use cases. We then present a case study using novel street view images in which we evaluate our framework and present practical use cases for users. We discuss the implications for designing future systems to support the collection and use of street view data to assist in sensing and planning the urban environment.
On Task and in Sync: Examining the Relationship between Gaze Synchrony and Self-Reported Attention During Video Lecture Learning
Authors: Babette Bühler, Efe Bozkir, Hannah Deininger, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci
Link: http://arxiv.org/abs/2404.00333v1
Abstract: Successful learning depends on learners' ability to sustain attention, which is particularly challenging in online education due to limited teacher interaction. A potential indicator for attention is gaze synchrony, demonstrating predictive power for learning achievements in video-based learning in controlled experiments focusing on manipulating attention. This study (N=84) examines the relationship between gaze synchronization and self-reported attention of learners, using experience sampling, during realistic online video learning. Gaze synchrony was assessed through Kullback-Leibler Divergence of gaze density maps and MultiMatch algorithm scanpath comparisons. Results indicated significantly higher gaze synchronization in attentive participants for both measures and self-reported attention significantly predicted post-test scores. In contrast, synchrony measures did not correlate with learning outcomes. While supporting the hypothesis that attentive learners exhibit similar eye movements, the direct use of synchrony as an attention indicator poses challenges, requiring further research on the interplay of attention, gaze synchrony, and video content type.
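To ground the synchrony measure named above, here is a minimal sketch of a KL-divergence comparison between gaze density maps, assuming gaze points in normalized screen coordinates; the grid size and the learner-versus-group comparison are illustrative choices, not the study's parameters.

import numpy as np

def gaze_density(points, grid=(32, 32), eps=1e-9):
    """Histogram gaze points (x, y in [0,1]) into a normalized density map."""
    h, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                             bins=grid, range=[[0, 1], [0, 1]])
    h = h + eps                      # smooth to avoid log(0)
    return h / h.sum()

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
learner = gaze_density(rng.uniform(size=(300, 2)))   # one learner's gaze
group = gaze_density(rng.uniform(size=(3000, 2)))    # pooled group gaze
print(f"KL(learner || group) = {kl_divergence(learner, group):.4f}")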
Enhancing Empathy in Virtual Reality: An Embodied Approach to Mindset Modulation
Authors: Seoyeon Bae, Yoon Kyung Lee, Jungcheol Lee, Jaeheon Kim, Haeseong Jeon, Seung-Hwan Lim, Byung-Cheol Kim, Sowon Hahn
Link: http://arxiv.org/abs/2404.00300v1
Abstract: A growth mindset has shown promising outcomes for increasing empathy ability. However, stimulating a growth mindset in VR-based empathy interventions is under-explored. In the present study, we implemented prosocial VR content, Our Neighbor Hero, focusing on embodying a virtual character to modulate players' mindsets. The virtual body served as a stepping stone, enabling players to identify with the character and cultivate a growth mindset as they followed mission instructions. We considered several implementation factors to assist players in positioning within the VR experience, including positive feedback, content difficulty, background lighting, and multimodal feedback. We conducted an experiment to investigate the intervention's effectiveness in increasing empathy. Our findings revealed that the VR content and mindset training encouraged participants to improve their growth mindsets and empathic motives. This VR content was developed for college students to enhance their empathy and teamwork skills. It has the potential to improve collaboration in organizational and community environments.
Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World
Authors: Guande Wu, Chen Zhao, Claudio Silva, He He
Link: http://arxiv.org/abs/2404.00246v1
Abstract: Language agents that interact with the world on their own have great potential for automating digital tasks. While large language model (LLM) agents have made progress in understanding and executing tasks such as textual games and webpage control, many real-world tasks also require collaboration with humans or other LLMs in equal roles, which involves intent understanding, task coordination, and communication. To test LLMs' ability to collaborate, we design a blocks-world environment in which two agents, each with unique goals and skills, build a target structure together. To complete the goals, they can act in the world and communicate in natural language. Within this environment, we design increasingly challenging settings to evaluate different collaboration perspectives, from independent to more complex, dependent tasks. We further adopt chain-of-thought prompts that include intermediate reasoning steps to model the partner's state and to identify and correct execution errors. Both human-machine and machine-machine experiments show that LLM agents have strong grounding capacities, and our approach significantly improves the evaluation metric.
2024-03-29
Tools and Tasks in Sensemaking: A Visual Accessibility Perspective
Authors: Yichun Zhao, Miguel A. Nacenta
Link: http://arxiv.org/abs/2404.00192v1
Abstract: Our previous interview study explored the needs and uses of diagrammatic information by the Blind and Low Vision (BLV) community, resulting in a framework called the Ladder of Diagram Access. The framework outlines five levels of information access when interacting with a diagram. In this paper, we extend this framework to encompass the global activity of sensemaking and discuss its (in)accessibility to the BLV demographic. We also discuss the integration of this framework into the sensemaking process and explore the current sensemaking practices and strategies employed by the BLV community, the challenges they face at different levels of the ladder, and potential solutions to enhance inclusivity towards a data-driven workforce.
No Risk, No Reward: Towards An Automated Measure of Psychological Safety from Online Communication
Authors: Sharon Ferguson, Georgia Van de Zande, Alison Olechowski
Link: http://arxiv.org/abs/2404.00171v1
Abstract: The data created from virtual communication platforms presents the opportunity to explore automated measures for monitoring team performance. In this work, we explore one important characteristic of successful teams - Psychological Safety - or the belief that a team is safe for interpersonal risk-taking. To move towards an automated measure of this phenomenon, we derive virtual communication characteristics and message keywords related to elements of Psychological Safety from the literature. Using a mixed methods approach, we investigate whether these characteristics are present in the Slack messages from two design teams - one high in Psychological Safety, and one low. We find that some usage characteristics, such as replies, reactions, and user mentions, might be promising metrics to indicate higher levels of Psychological Safety, while simple keyword searches may not be nuanced enough. We present the first step towards the automated detection of this important, yet complex, team characteristic.
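As an illustration of the usage characteristics examined above, here is a minimal sketch that counts replies, reactions, and user mentions in a Slack-style message export; the field names loosely follow Slack's export format and should be treated as assumptions, not the study's actual instrumentation.

import re

def usage_metrics(messages):
    """Count reply, reaction, and mention activity across a message list."""
    replies = sum(m.get("reply_count", 0) for m in messages)
    reactions = sum(sum(r["count"] for r in m.get("reactions", []))
                    for m in messages)
    mentions = sum(len(re.findall(r"<@\w+>", m.get("text", "")))
                   for m in messages)
    return {"replies": replies, "reactions": reactions, "mentions": mentions}

demo = [{"text": "Nice work <@U123>!", "reply_count": 2,
         "reactions": [{"name": "thumbsup", "count": 3}]}]
print(usage_metrics(demo))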
Circle Back Next Week: The Effect of Meeting-Free Weeks on Distributed Workers' Unstructured Time and Attention Negotiation
Authors: Sharon Ferguson, Michael Massimi
Link: http://arxiv.org/abs/2404.00161v1
Abstract: While distributed workers rely on scheduled meetings for coordination and collaboration, these meetings can also challenge their ability to focus. Protecting worker focus has been addressed from a technical perspective, but companies are now attempting organizational interventions, such as meeting-free weeks. Recognizing distributed collaboration as a sociotechnical challenge, we first present an interview study with distributed workers participating in meeting-free weeks at an enterprise software company. We identify three orientations workers exhibit during these weeks: Focus, Collaborative, and Time-Bound, each with varying levels and uses of unstructured time. These different orientations result in challenges in attention negotiation, which may be suited to technical interventions. This motivated a follow-up study investigating attention negotiation and the compensating mechanisms workers developed during meeting-free weeks. Our framework identified tensions between attention-getting and attention-delegation strategies. We extend past work to show how workers adapt their virtual collaboration mechanisms in response to organizational interventions.
Give Text A Chance: Advocating for Equal Consideration for Language and Visualization
Authors: Chase Stokes, Marti A. Hearst
Link: http://arxiv.org/abs/2404.00131v1
Abstract: Visualization research tends to de-emphasize the textual context in which its images are placed. We argue that visualization research should consider textual representations as a primary alternative to visual options, and that when assessing designs, equal attention should be given to the construction of the language as to the visualizations. We also call for a consideration of readability when integrating visualizations with written text. By attending to these points, visualization research would become more effective and account more thoroughly for viewers' needs and responses.
Enhancing Dimension-Reduced Scatter Plots with Class and Feature Centroids
Authors: Daniel B. Hier, Tayo Obafemi-Ajayi, Gayla R. Olbricht, Devin M. Burns, Sasha Petrenko, Donald C. Wunsch II
Link: http://arxiv.org/abs/2403.20246v1
Abstract: Dimension reduction is increasingly applied to high-dimensional biomedical data to improve its interpretability. When a dataset is reduced to two dimensions, each observation is assigned x and y coordinates and is represented as a point on a scatter plot. A significant challenge lies in interpreting the meaning of the x and y axes due to the complexities inherent in dimension reduction. This study addresses this challenge by using the x and y coordinates derived from dimension reduction to calculate class and feature centroids, which can be overlaid onto the scatter plots. This method connects the low-dimensional space to the original high-dimensional space. We illustrate the utility of this approach with data derived from the phenotypes of three neurogenetic diseases and demonstrate how the addition of class and feature centroids increases the interpretability of scatter plots.
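A minimal sketch of the centroid overlay follows, assuming 2D coordinates from any reduction method are already available; the feature-centroid weighting used here (feature-value-weighted means of the 2D points) is a plausible reading of the abstract, not a confirmed detail of the method.

import numpy as np

def class_centroids(xy, labels):
    """Mean 2D position of the points belonging to each class."""
    return {c: xy[labels == c].mean(axis=0) for c in np.unique(labels)}

def feature_centroids(xy, X):
    """X: (n, p) original features; weight each 2D point by feature value."""
    w = X / X.sum(axis=0)            # column-normalized weights
    return w.T @ xy                  # (p, 2): one centroid per feature

rng = np.random.default_rng(2)
X = rng.random((100, 4))             # stand-in high-dimensional phenotypes
xy = rng.normal(size=(100, 2))       # stand-in t-SNE/UMAP coordinates
labels = rng.integers(0, 3, 100)     # three disease classes
print(class_centroids(xy, labels))
print(feature_centroids(xy, X).shape)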
Entertainment chatbot for the digital inclusion of elderly people without abstraction capabilities
Authors: Silvia García-Méndez, Francisco de Arriba-Pérez, Francisco J. González-Castaño, José A. Regueiro-Janeiro, Felipe Gil-Castiñeira
Link: http://arxiv.org/abs/2404.01327v1
Abstract: Current language processing technologies allow the creation of conversational chatbot platforms. Even though artificial intelligence is still too immature to support satisfactory user experience in many mass market domains, conversational interfaces have found their way into ad hoc applications such as call centres and online shopping assistants. However, they have not been applied so far to social inclusion of elderly people, who are particularly vulnerable to the digital divide. Many of them relieve their loneliness with traditional media such as TV and radio, which are known to create a feeling of companionship. In this paper we present the EBER chatbot, designed to reduce the digital gap for the elderly. EBER reads news in the background and adapts its responses to the user's mood. Its novelty lies in the concept of "intelligent radio", according to which, instead of simplifying a digital information system to make it accessible to the elderly, a traditional channel they find familiar -- background news -- is augmented with interactions via voice dialogues. We make it possible by combining Artificial Intelligence Modelling Language, automatic Natural Language Generation and Sentiment Analysis. The system allows accessing digital content of interest by combining words extracted from user answers to chatbot questions with keywords extracted from the news items. This approach permits defining metrics of the abstraction capabilities of the users depending on a spatial representation of the word space. To prove the suitability of the proposed solution we present results of real experiments conducted with elderly people that provided valuable insights. Our approach was considered satisfactory during the tests and improved the information search capabilities of the participants.
ITCMA: A Generative Agent Based on a Computational Consciousness Structure
Authors: Hanzhong Zhang, Jibin Yin, Haoyang Wang, Ziwei Xiang
Link: http://arxiv.org/abs/2403.20097v1
Abstract: Large Language Models (LLMs) still face challenges in tasks requiring understanding implicit instructions and applying common-sense knowledge. In such scenarios, LLMs may require multiple attempts to achieve human-level performance, potentially leading to inaccurate responses or inferences in practical environments, affecting their long-term consistency and behavior. This paper introduces the Internal Time-Consciousness Machine (ITCM), a computational consciousness structure. We further propose the ITCM-based Agent (ITCMA), which supports behavior generation and reasoning in open-world settings. ITCMA enhances LLMs' ability to understand implicit instructions and apply common-sense knowledge by considering agents' interaction and reasoning with the environment. Evaluations in the Alfworld environment show that trained ITCMA outperforms the state-of-the-art (SOTA) by 9% on the seen set. Even untrained ITCMA achieves a 96% task completion rate on the seen set, 5% higher than SOTA, indicating its superiority over traditional intelligent agents in utility and generalization. In real-world tasks with quadruped robots, the untrained ITCMA achieves an 85% task completion rate, which is close to its performance in the unseen set, demonstrating its comparable utility in real-world settings.
MindArm: Mechanized Intelligent Non-Invasive Neuro-Driven Prosthetic Arm System
Authors: Maha Nawaz, Abdul Basit, Muhammad Shafique
Link: http://arxiv.org/abs/2403.19992v1
Abstract: Currently, people with disabilities or difficulty moving their arms (referred to as "patients") have very limited technological solutions for efficiently addressing their physiological limitations. This is mainly due to two reasons: (1) non-invasive solutions like mind-controlled prosthetic devices are typically very costly and require expensive maintenance; and (2) other solutions require costly invasive brain surgery, which is high-risk to perform, expensive, and difficult to maintain. Therefore, current technological solutions are not accessible to all patients across different financial backgrounds. Toward this, we propose a low-cost technological solution called MindArm, a mechanized intelligent non-invasive neuro-driven prosthetic arm system. Our MindArm system employs a deep neural network (DNN) engine to translate brain signals into the intended prosthetic arm motion, thereby helping patients perform many activities despite their physiological limitations. MindArm uses widely accessible and low-cost surface electroencephalogram (EEG) electrodes coupled with an Open Brain Computer Interface and UDP networking to acquire brain signals and transmit them to the compute module for signal processing. In the compute module, we run a trained DNN model to interpret the normalized micro-voltage of the brain signals and translate it into a prosthetic arm action via serial communication. Experimental results on a fully working prototype demonstrate that, across the three defined actions, our MindArm system achieves positive success rates: 91% for idle/stationary, 85% for shake hand, and 84% for pick-up cup. This demonstrates that MindArm provides a novel, low-cost approach to mind-controlled prosthetic devices for all patients.
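A hedged sketch of the classification step described above: normalized EEG windows go into a small neural network that predicts one of the three actions. The window length, layer sizes, and the scikit-learn stand-in for the authors' DNN are all assumptions, and the data below are placeholders.

import numpy as np
from sklearn.neural_network import MLPClassifier

ACTIONS = ["idle", "shake_hand", "pick_up_cup"]

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 64))   # 600 windows of 64 normalized EEG samples
y = rng.integers(0, 3, 600)      # placeholder action labels

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=300, random_state=0)
clf.fit(X, y)

window = rng.normal(size=(1, 64))  # one incoming window at inference time
print("predicted action:", ACTIONS[int(clf.predict(window)[0])])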
2024-03-28
"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices
Authors: Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, Hong Shen
Link: http://arxiv.org/abs/2403.19876v1
Abstract: Large language models are increasingly applied in real-world scenarios, including research and education. These models, however, come with well-known ethical issues, which may manifest in unexpected ways in human-computer interaction research due to the extensive engagement with human subjects. This paper reports on research practices related to LLM use, drawing on 16 semi-structured interviews and a survey conducted with 50 HCI researchers. We discuss the ways in which LLMs are already being utilized throughout the entire HCI research pipeline, from ideation to system development and paper writing. While researchers described nuanced understandings of ethical issues, they were rarely or only partially able to identify and address those ethical concerns in their own projects. This lack of action and reliance on workarounds was explained through the perceived lack of control and distributed responsibility in the LLM supply chain, the conditional nature of engaging with ethics, and competing priorities. Finally, we reflect on the implications of our findings and present opportunities to shape emerging norms of engaging with large language models in HCI research.
Creating Aesthetic Sonifications on the Web with SIREN
Authors: Tristan Peng, Hongchan Choi, Jonathan Berger
Link: http://arxiv.org/abs/2403.19763v1
Abstract: SIREN is a flexible, extensible, and customizable web-based general-purpose interface for auditory data display (sonification). Designed as a digital audio workstation for sonification, synthesizers written in JavaScript using the Web Audio API facilitate intuitive mapping of data to auditory parameters for a wide range of purposes. This paper explores the breadth of sound synthesis techniques supported by SIREN, and details the structure and definition of a SIREN synthesizer module. The paper proposes further development that will increase SIREN's utility.
Leveraging Counterfactual Paths for Contrastive Explanations of POMDP Policies
Authors: Benjamin Kraske, Zakariya Laouar, Zachary Sunberg
Link: http://arxiv.org/abs/2403.19760v1
Abstract: As humans come to rely on autonomous systems more, ensuring the transparency of such systems is important to their continued adoption. Explainable Artificial Intelligence (XAI) aims to reduce confusion and foster trust in systems by providing explanations of agent behavior. Partially observable Markov decision processes (POMDPs) provide a flexible framework capable of reasoning over transition and state uncertainty, while also being amenable to explanation. This work investigates the use of user-provided counterfactuals to generate contrastive explanations of POMDP policies. Feature expectations are used as a means of contrasting the performance of these policies. We demonstrate our approach in a Search and Rescue (SAR) setting. We analyze and discuss the associated challenges through two case studies.
Collaborative Interactive Evolution of Art in the Latent Space of Deep Generative Models
Authors: Ole Hall, Anil Yaman
Link: http://arxiv.org/abs/2403.19620v1
Abstract: Generative Adversarial Networks (GANs) have shown great success in generating high quality images and are thus used as one of the main approaches to generate art images. However, usually the image generation process involves sampling from the latent space of the learned art representations, allowing little control over the output. In this work, we first employ GANs that are trained to produce creative images using an architecture known as Creative Adversarial Networks (CANs), then, we employ an evolutionary approach to navigate within the latent space of the models to discover images. We use automatic aesthetic and collaborative interactive human evaluation metrics to assess the generated images. In the human interactive evaluation case, we propose a collaborative evaluation based on the assessments of several participants. Furthermore, we also experiment with an intelligent mutation operator that aims to improve the quality of the images through local search based on an aesthetic measure. We evaluate the effectiveness of this approach by comparing the results produced by the automatic and collaborative interactive evolution. The results show that the proposed approach can generate highly attractive art images when the evolution is guided by collaborative human feedback.
Exploring Communication Dynamics: Eye-tracking Analysis in Pair Programming of Computer Science Education
Authors: Wunmin Jang, Hong Gao, Tilman Michaeli, Enkelejda Kasneci
Link: http://arxiv.org/abs/2403.19560v1
Abstract: Pair programming is widely recognized as an effective educational tool in computer science that promotes collaborative learning and mirrors real-world work dynamics. However, communication breakdowns within pairs significantly challenge this learning process. In this study, we use eye-tracking data recorded during pair programming sessions to study communication dynamics between various pair programming roles across different student, expert, and mixed group cohorts containing 19 participants. By combining eye-tracking data analysis with focus group interviews and questionnaires, we provide insights into communication's multifaceted nature in pair programming. Our findings highlight distinct eye-tracking patterns indicating changes in communication skills across group compositions, with participants prioritizing code exploration over communication, especially during challenging tasks. Further, students showed a preference for pairing with experts, emphasizing the importance of understanding group formation in pair programming scenarios. These insights emphasize the importance of understanding group dynamics and enhancing communication skills through pair programming for successful outcomes in computer science education.
LLMs as Academic Reading Companions: Extending HCI Through Synthetic Personae
Authors: Celia Chen, Alex Leitch
Link: http://arxiv.org/abs/2403.19506v1
Abstract: This position paper argues that large language models (LLMs) constitute promising yet underutilized academic reading companions capable of enhancing learning. We detail an exploratory study examining Claude.ai from Anthropic, an LLM-based interactive assistant that helps students comprehend complex qualitative literature content. The study compares quantitative survey data and qualitative interviews assessing outcomes between a control group and an experimental group leveraging Claude.ai over a semester across two graduate courses. Initial findings demonstrate tangible improvements in reading comprehension and engagement among participants using the AI agent versus unsupported independent study. However, there is potential for overreliance and ethical considerations that warrant continued investigation. By documenting an early integration of an LLM reading companion into an educational context, this work contributes pragmatic insights to guide development of synthetic personae supporting learning. Broader impacts compel policy and industry actions to uphold responsible design in order to maximize benefits of AI integration while prioritizing student wellbeing.
A theoretical framework for the design and analysis of computational thinking problems in education
Authors: Giorgia Adorni, Alberto Piatti, Engin Bumbacher, Lucio Negrini, Francesco Mondada, Dorit Assaf, Francesca Mangili, Luca Gambardella
Link: http://arxiv.org/abs/2403.19475v1
Abstract: The field of computational thinking education has grown in recent years as researchers and educators have sought to develop and assess students' computational thinking abilities. While much of the research in this area has focused on defining computational thinking, the competencies it involves and how to assess them in teaching and learning contexts, this work takes a different approach. We provide a more situated perspective on computational thinking, focusing on the types of problems that require computational thinking skills to be solved and the features that support these processes. We develop a framework for analysing existing computational thinking problems in an educational context. We conduct a comprehensive literature review to identify prototypical activities from areas where computational thinking is typically pursued in education. We identify the main components and characteristics of these activities, along with their influence on activating computational thinking competencies. The framework provides a catalogue of computational thinking skills that can be used to understand the relationship between problem features and competencies activated. This study contributes to the field of computational thinking education by offering a tool for evaluating and revising existing problems to activate specific skills and for assisting in designing new problems that target the development of particular competencies. The results of this study may be of interest to researchers and educators working in computational thinking education.
"At the end of the day, I am accountable": Gig Workers' Self-Tracking for Multi-Dimensional Accountability Management
Authors: Rie Helene Hernandez, Qiurong Song, Yubo Kou, Xinning Gui
Link: http://arxiv.org/abs/2403.19436v1
Abstract: Tracking is inherent in and central to the gig economy. Platforms track gig workers' performance through metrics such as acceptance rate and punctuality, while gig workers themselves engage in self-tracking. Although prior research has extensively examined how gig platforms track workers through metrics -- with some studies briefly acknowledging the phenomenon of self-tracking among workers -- there is a dearth of studies that explore how and why gig workers track themselves. To address this, we conducted 25 semi-structured interviews, revealing how gig workers engage in self-tracking to manage accountabilities to themselves and to external entities across three identities: the holistic self, the entrepreneurial self, and the platformized self. We connect our findings to neoliberalism, through which we contextualize gig workers' self-accountability and the invisible labor of self-tracking. We further discuss how self-tracking mitigates information and power asymmetries in gig work and offer design implications to support gig workers' multi-dimensional self-tracking.
An Interactive Human-Machine Learning Interface for Collecting and Learning from Complex Annotations
Authors: Jonathan Erskine, Matt Clifford, Alexander Hepburn, Raúl Santos-Rodríguez
Link: http://arxiv.org/abs/2403.19339v1
Abstract: Human-Computer Interaction has been shown to lead to improvements in machine learning systems by boosting model performance, accelerating learning, and building user confidence. In this work, we aim to alleviate the expectation that human annotators adapt to the constraints imposed by traditional labels by allowing extra flexibility in the form in which supervision information is collected. To this end, we propose a human-machine learning interface for binary classification tasks that enables human annotators to use counterfactual examples to complement standard binary labels as annotations for a dataset. Finally, we discuss the challenges in future extensions of this work.
CogniDot: Vasoactivity-based Cognitive Load Monitoring with a Miniature On-skin Sensor
Authors: Hongbo Lan, Yanrong Li, Shixuan Li, Xin Yi, Tengxiang Zhang
Link: http://arxiv.org/abs/2403.19206v1
Abstract: Vascular activities offer valuable signatures for psychological monitoring applications. We present CogniDot, an affordable, miniature skin sensor placed on the temporal area of the head that senses cognitive load with a single-pixel color sensor. With its energy-efficient design, bio-compatible adhesive, and compact size (22 mm diameter, 8.5 mm thickness), it is well suited for long-term monitoring of mental state. We describe the hardware design of our sensor in detail. A user study with 12 participants shows that CogniDot can accurately differentiate between three levels of cognitive load, with a within-user accuracy of 97%. We also discuss its potential for broader applications.
Algorithmic Ways of Seeing: Using Object Detection to Facilitate Art Exploration
Authors: Louie Søs Meyer, Johanne Engel Aaen, Anitamalina Regitse Tranberg, Peter Kun, Matthias Freiberger, Sebastian Risi, Anders Sundnes Løvlie
Link: http://arxiv.org/abs/2403.19174v1
Abstract: This Research through Design paper explores how object detection may be applied to a large digital art museum collection to facilitate new ways of encountering and experiencing art. We present the design and evaluation of an interactive application called SMKExplore, which allows users to explore a museum's digital collection of paintings by browsing through objects detected in the images, as a novel form of open-ended exploration. We provide three contributions. First, we show how an object detection pipeline can be integrated into a design process for visual exploration. Second, we present the design and development of an app that enables exploration of an art museum's collection. Third, we offer reflections on future possibilities for museums and HCI researchers to incorporate object detection techniques into the digitalization of museums.
Exploring Holistic HMI Design for Automated Vehicles: Insights from a Participatory Workshop to Bridge In-Vehicle and External Communication
Authors: Haoyu Dong, Tram Thi Minh Tran, Rutger Verstegen, Silvia Cazacu, Ruolin Gao, Marius Hoggenmüller, Debargha Dey, Mervyn Franssen, Markus Sasalovici, Pavlo Bazilinskyy, Marieke Martens
Link: http://arxiv.org/abs/2403.19153v1
Abstract: Human-Machine Interfaces (HMIs) for automated vehicles (AVs) are typically divided into two categories: internal HMIs for interactions within the vehicle, and external HMIs for communication with other road users. In this work, we examine the prospects of bridging these two seemingly distinct domains. Through a participatory workshop with automotive user interface researchers and practitioners, we facilitated a critical exploration of holistic HMI design by having workshop participants collaboratively develop interaction scenarios involving AVs, in-vehicle users, and external road users. The discussion offers insights into the escalation of interface elements as an HMI design strategy, the direct interactions between different users, and an expanded understanding of holistic HMI design. This work reflects a collaborative effort to understand the practical aspects of this holistic design approach, offering new perspectives and encouraging further investigation into this underexplored aspect of automotive user interfaces.
Real-time accident detection and physiological signal monitoring to enhance motorbike safety and emergency response
Authors: S. M. Kayser Mehbub Siam, Khadiza Islam Sumaiya, Md Rakib Al-Amin, Tamim Hasan Turjo, Ahsanul Islam, A. H. M. A. Rahim, Md Rakibul Hasan
Link: http://arxiv.org/abs/2403.19085v1
Abstract: Rapid urbanization and improved living standards have led to a substantial increase in the number of vehicles on the road, consequently resulting in a rise in the frequency of accidents. Among these accidents, motorbike accidents pose a particularly high risk, often resulting in serious injuries or deaths. A significant number of these fatalities occur due to delayed or inadequate medical attention. To this end, we propose a novel automatic detection and notification system specifically designed for motorbike accidents. The proposed system comprises two key components: a detection system and a physiological signal monitoring system. The detection system is integrated into the helmet and consists of a microcontroller, accelerometer, GPS, GSM, and Wi-Fi modules. The physio-monitoring system incorporates a sensor for monitoring pulse rate and SpO2 saturation. All collected data are presented on an LCD display and wirelessly transmitted to the detection system through the microcontroller of the physiological signal monitoring system. If the accelerometer readings consistently deviate from the specified threshold decided through extensive experimentation, the system identifies the event as an accident and transmits the victim's information -- including the GPS location, pulse rate, and SpO2 saturation rate -- to the designated emergency contacts. Preliminary results demonstrate the efficacy of the proposed system in accurately detecting motorbike accidents and promptly alerting emergency contacts. We firmly believe that the proposed system has the potential to significantly mitigate the risks associated with motorbike accidents and save lives.
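The thresholding logic lends itself to a short sketch: flag an accident when the accelerometer magnitude stays beyond a threshold for several consecutive readings, then package the alert payload. The threshold, window length, and field values below are illustrative, not the experimentally tuned values from the paper.

from dataclasses import dataclass

THRESHOLD_G = 4.0   # assumed impact threshold in g
CONSECUTIVE = 5     # readings that must exceed it to count as an accident

@dataclass
class Alert:
    gps: tuple
    pulse_bpm: int
    spo2_pct: int

def detect_accident(magnitudes, threshold=THRESHOLD_G, needed=CONSECUTIVE):
    """Return True once `needed` consecutive readings exceed the threshold."""
    run = 0
    for m in magnitudes:
        run = run + 1 if m > threshold else 0
        if run >= needed:
            return True
    return False

readings = [1.0, 1.2, 6.1, 6.4, 5.9, 6.8, 7.2, 0.9]
if detect_accident(readings):
    print(Alert(gps=(23.78, 90.40), pulse_bpm=112, spo2_pct=94))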
2024-03-27
Towards Human-Centered Construction Robotics: An RL-Driven Companion Robot For Contextually Assisting Carpentry Workers
Authors: Yuning Wu, Jiaying Wei, Jean Oh, Daniel Cardoso Llach
Link: http://arxiv.org/abs/2403.19060v1
Abstract: In the dynamic construction industry, traditional robotic integration has primarily focused on automating specific tasks, often overlooking the complexity and variability of human aspects in construction workflows. This paper introduces a human-centered approach with a "work companion rover" designed to assist construction workers within their existing practices, aiming to enhance safety and workflow fluency while respecting construction labor's skilled nature. We conduct an in-depth study on deploying a robotic system in carpentry formwork, showcasing a prototype that emphasizes mobility, safety, and comfortable worker-robot collaboration in dynamic environments through a contextual Reinforcement Learning (RL)-driven modular framework. Our research advances robotic applications in construction, advocating for collaborative models where adaptive robots support rather than replace humans, underscoring the potential for an interactive and collaborative human-robot workforce.
Visualizing High-Dimensional Temporal Data Using Direction-Aware t-SNE
Authors: Pavlin G. Poličar, Blaž Zupan
Link: http://arxiv.org/abs/2403.19040v1
Abstract: Many real-world data sets contain a temporal component or involve transitions from state to state. For exploratory data analysis, we can represent these high-dimensional data sets in two-dimensional maps, using embeddings of the data objects under exploration and representing their temporal relationships with directed edges. Most existing dimensionality reduction techniques, such as t-SNE and UMAP, do not take into account the temporal or relational nature of the data when constructing the embeddings, resulting in temporally cluttered visualizations that obscure potentially interesting patterns. To address this problem, we propose two complementary, direction-aware loss terms in the optimization function of t-SNE that emphasize the temporal aspects of the data, guiding the optimization and the resulting embedding to reveal temporal patterns that might otherwise go unnoticed. The Directional Coherence Loss (DCL) encourages nearby arrows connecting two adjacent time series points to point in the same direction, while the Edge Length Loss (ELL) penalizes arrows - which effectively represent time gaps in the visualized embedding - based on their length. Both loss terms are differentiable and can be easily incorporated into existing dimensionality reduction techniques. By promoting local directionality of the directed edges, our procedure produces more temporally meaningful and less cluttered visualizations. We demonstrate the effectiveness of our approach on a toy dataset and two real-world datasets.
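The two loss terms are easy to sketch from their verbal descriptions; the formulations below follow the abstract's wording (cosine alignment of consecutive arrows, penalized arrow length) and may differ from the paper's exact definitions.

import numpy as np

def directional_coherence_loss(Y):
    """Encourage consecutive displacement vectors to point the same way."""
    d = np.diff(Y, axis=0)                               # arrows between steps
    d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-9)
    cos = np.sum(d[:-1] * d[1:], axis=1)                 # cosine of turn angle
    return float(np.mean(1.0 - cos))                     # 0 when fully aligned

def edge_length_loss(Y):
    """Penalize long arrows, i.e., large jumps between adjacent time points."""
    d = np.diff(Y, axis=0)
    return float(np.mean(np.sum(d ** 2, axis=1)))

# Y stands in for a 2D embedding of one temporal trajectory.
Y = np.cumsum(np.random.default_rng(4).normal(size=(50, 2)), axis=0)
print(directional_coherence_loss(Y), edge_length_loss(Y))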
Women are less comfortable expressing opinions online than men and report heightened fears for safety: Surveying gender differences in experiences of online harms
Authors: Francesca Stevens, Florence E. Enock, Tvesha Sippy, Jonathan Bright, Miranda Cross, Pica Johansson, Judy Wajcman, Helen Z. Margetts
Link: http://arxiv.org/abs/2403.19037v1
Abstract: Online harms, such as hate speech, trolling and self-harm promotion, continue to be widespread. While some work suggests women are disproportionately affected, other studies find mixed evidence for gender differences in experiences with content of this kind. Using a nationally representative survey of UK adults (N=1992), we examine exposure to a variety of harms, fears surrounding being targeted, the psychological impact of online experiences, the use of safety tools to protect against harm, and comfort with various forms of online participation across men and women. We find that while men and women see harmful content online to a roughly similar extent, women are more at risk than men of being targeted by harms including online misogyny, cyberstalking and cyberflashing. Women are significantly more fearful of being targeted by harms overall, and report greater negative psychological impact as a result of particular experiences. Perhaps in an attempt to mitigate risk, women report higher use of a range of safety tools and less comfort with several forms of online participation, with just 23% of women comfortable expressing political views online compared to 40% of men. We also find direct associations between fears surrounding harms and comfort with online behaviours. For example, fear of being trolled significantly decreases comfort expressing opinions, and fear of being targeted by misogyny significantly decreases comfort sharing photos. Our results are important because with much public discourse happening online, we must ensure all members of society feel safe and able to participate in online spaces.
Should I Help a Delivery Robot? Cultivating Prosocial Norms through Observations
Authors: Vivienne Bihe Chi, Shashank Mehrotra, Teruhisa Misu, Kumar Akash
Link: http://arxiv.org/abs/2403.19027v1
Abstract: We propose leveraging prosocial observations to cultivate new social norms that encourage prosocial behaviors toward delivery robots. With an online experiment, we quantitatively assess updates in norm beliefs regarding human-robot prosocial behaviors through observational learning. Results demonstrate that the initially perceived normativity of helping robots is influenced by familiarity with delivery robots and perceptions of robots' social intelligence. Observing human-robot prosocial interactions notably shifts people's normative beliefs about prosocial actions, thereby changing their perceived obligations to offer help to delivery robots. Additionally, we found that observing robots offering help to humans, rather than receiving help, more significantly increased participants' feelings of obligation to help robots. Our findings provide insights into prosocial design for future mobility systems. Improved familiarity with robot capabilities and portraying robots as desirable social partners can help foster wider acceptance. Furthermore, robots need to be designed to exhibit higher levels of interactivity and reciprocal capabilities for prosocial behavior.
The Correlations of Scene Complexity, Workload, Presence, and Cybersickness in a Task-Based VR Game
Authors: Mohammadamin Sanaei, Stephen B. Gilbert, Nikoo Javadpour, Hila Sabouni, Michael C. Dorneich, Jonathan W. Kelly
Link: http://arxiv.org/abs/2403.19019v1
Abstract: This investigation examined the relationships among scene complexity, workload, presence, and cybersickness in virtual reality (VR) environments. Numerous factors can influence the overall VR experience, and existing research on this matter is not yet conclusive, warranting further investigation. In this between-subjects experimental setup, 44 participants engaged in the Pendulum Chair game, with half exposed to a simple scene with lower optic flow and lower familiarity, and the remaining half to a complex scene characterized by higher optic flow and greater familiarity. The study measured the dependent variables workload, presence, and cybersickness and analyzed their correlations. Equivalence testing was also used to compare the simple and complex environments. Results revealed that despite the visible differences between the environments, within the 10% boundaries of the maximum possible value for workload and presence, and 13.6% of the maximum SSQ value, a statistically significant equivalence was observed between the simple and complex scenes. Additionally, a moderate, negative correlation emerged between workload and SSQ scores. The findings suggest two key points: (1) the nature of the task can mitigate the impact of scene complexity factors such as optic flow and familiarity, and (2) the correlation between workload and cybersickness may vary, showing either a positive or negative relationship.
Thelxinoë: Recognizing Human Emotions Using Pupillometry and Machine Learning
Authors: Darlene Barker, Haim Levkowitz
Link: http://arxiv.org/abs/2403.19014v1
Abstract: In this study, we present a method for emotion recognition in Virtual Reality (VR) using pupillometry. We analyze pupil diameter responses to both visual and auditory stimuli via a VR headset and focus on extracting key features in the time domain, frequency domain, and time-frequency domain from VR-generated data. Our approach uses feature selection to identify the most impactful features via Maximum Relevance Minimum Redundancy (mRMR). By applying a Gradient Boosting model, an ensemble learning technique that sequentially combines decision trees, we achieve an accuracy of 98.8% with feature engineering, compared to 84.9% without it. This research contributes to the Thelxinoë framework, which aims to enhance VR experiences by integrating multiple sensor data for realistic and emotionally resonant touch interactions. Our findings open new avenues for developing more immersive and interactive VR environments, paving the way for future advancements in virtual touch technology.
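A minimal sketch of that pipeline, assuming the pupil-diameter features have already been extracted into a feature matrix; the greedy correlation-based ranking below is a simple stand-in for the exact mRMR algorithm, and all data here are placeholders.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def mrmr_rank(X, y, k):
    """Greedy max-relevance / min-redundancy ranking via |correlation|."""
    rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    chosen = [int(np.argmax(rel))]
    while len(chosen) < k:
        scores = []
        for j in range(X.shape[1]):
            if j in chosen:
                scores.append(-np.inf)
                continue
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, c])[0, 1])
                           for c in chosen])
            scores.append(rel[j] - red)   # relevance minus redundancy
        chosen.append(int(np.argmax(scores)))
    return chosen

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 20))   # placeholder pupillometry features
y = rng.integers(0, 4, 200)      # placeholder emotion labels
top = mrmr_rank(X, y, k=8)
clf = GradientBoostingClassifier().fit(X[:, top], y)
print("selected features:", top)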
SolderlessPCB: Reusing Electronic Components in PCB Prototyping through Detachable 3D Printed Housings
Authors: Zeyu Yan, Jiasheng Li, Zining Zhang, Huaishu Peng
Link: http://arxiv.org/abs/2403.18797v1
Abstract: The iterative prototyping process for printed circuit boards (PCBs) frequently employs surface-mounted device (SMD) components, which are often discarded rather than reused due to the challenges associated with desoldering, leading to unnecessary electronic waste. This paper introduces SolderlessPCB, a collection of techniques for solder-free PCB prototyping, specifically designed to promote the recycling and reuse of electronic components. Central to this approach are custom 3D-printable housings that allow SMD components to be mounted onto PCBs without soldering. We detail the design of SolderlessPCB and the experiments conducted to evaluate its design parameters, electrical performance, and durability. To illustrate the potential for reusing SMD components with SolderlessPCB, we discuss two scenarios: the reuse of components from earlier design iterations and from obsolete prototypes. We also provide examples demonstrating that SolderlessPCB can handle high-current applications and is suitable for high-speed data transmission. The paper concludes by discussing the limitations of our approach and suggesting future directions to overcome these challenges.
Teaching Introductory HRI: UChicago Course "Human-Robot Interaction: Research and Practice"
Authors: Sarah Sebo
Link: http://arxiv.org/abs/2403.18692v1
Abstract: In 2020, I designed the course CMSC 20630/30630 Human-Robot Interaction: Research and Practice as a hands-on introduction to human-robot interaction (HRI) research for both undergraduate and graduate students at the University of Chicago. Since 2020, I have taught and refined this course each academic year. Human-Robot Interaction: Research and Practice focuses on the core concepts and cutting-edge research in the field of human-robot interaction (HRI), covering topics that include: nonverbal robot behavior, verbal robot behavior, social dynamics, norms & ethics, collaboration & learning, group interactions, applications, and future challenges of HRI. Course meetings involve students leading discussions of cutting-edge peer-reviewed HRI research publications. Students also participate in a quarter-long collaborative research project, in which they pursue an HRI research question that often involves conducting their own human-subjects study, recruiting participants to interact with a robot. In this paper, I detail the structure of the course and its learning goals as well as my reflections and student feedback on the course.
An Exploratory Study on Upper-Level Computing Students' Use of Large Language Models as Tools in a Semester-Long Project
Authors: Ben Arie Tanay, Lexy Arinze, Siddhant S. Joshi, Kirsten A. Davis, James C. Davis
Link: http://arxiv.org/abs/2403.18679v1
Abstract: Background: Large Language Models (LLMs) such as ChatGPT and CoPilot are influencing software engineering practice. Software engineering educators must teach future software engineers how to use such tools well. As of yet, there have been few studies that report on the use of LLMs in the classroom. It is, therefore, important to evaluate students' perception of LLMs and possible ways of adapting the computing curriculum to these shifting paradigms. Purpose: The purpose of this study is to explore computing students' experiences and approaches to using LLMs during a semester-long software engineering project. Design/Method: We collected data from a senior-level software engineering course at Purdue University. This course uses a project-based learning (PBL) design. The students used LLMs such as ChatGPT and Copilot in their projects. A sample of these student teams was interviewed to understand (1) how they used LLMs in their projects; and (2) whether and how their perspectives on LLMs changed over the course of the semester. We analyzed the data to identify themes related to students' usage patterns and learning outcomes. Results/Discussion: When computing students use LLMs within a project, their use cases cover both technical and professional applications. These students also perceive LLMs as efficient tools for obtaining information and completing tasks. However, there were concerns about using LLMs responsibly without harming their own learning outcomes. Based on our findings, we recommend that future research investigate the use of LLMs in lower-level computer engineering courses to understand whether and how LLMs can be integrated as a learning aid without hurting learning outcomes.
Aiming for Relevance
Authors: Bar Eini Porat, Danny Eytan, Uri Shalit
Link: http://arxiv.org/abs/2403.18668v1
Abstract: Vital signs are crucial in intensive care units (ICUs). They are used to track the patient's state and to identify clinically significant changes. Predicting vital sign trajectories is valuable for early detection of adverse events. However, conventional machine learning metrics like RMSE often fail to capture the true clinical relevance of such predictions. We introduce novel vital sign prediction performance metrics that align with clinical contexts, focusing on deviations from clinical norms, overall trends, and trend deviations. These metrics are derived from empirical utility curves obtained in a previous study through interviews with ICU clinicians. We validate the metrics' usefulness using simulated and real clinical datasets (MIMIC and eICU). Furthermore, we employ these metrics as loss functions for neural networks, resulting in models that excel in predicting clinically significant events. This research paves the way for clinically relevant machine learning model evaluation and optimization, promising to improve ICU patient care.
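To make the loss-function idea concrete, here is a minimal sketch of a clinically weighted regression loss, assuming heart rate as the vital sign; the normal range and the out-of-range weight are illustrative stand-ins for the empirical utility curves the authors derive from clinician interviews.

import numpy as np

def clinical_loss(y_true, y_pred, lo=60.0, hi=100.0, out_of_range_w=5.0):
    """Squared error, up-weighted where the true heart rate is abnormal."""
    w = np.where((y_true < lo) | (y_true > hi), out_of_range_w, 1.0)
    return float(np.mean(w * (y_true - y_pred) ** 2))

y_true = np.array([72.0, 75.0, 120.0, 125.0])  # bpm; last two tachycardic
y_pred = np.array([70.0, 77.0, 110.0, 115.0])
print(clinical_loss(y_true, y_pred))  # errors on abnormal values dominate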