Causal reasoning is viewed as crucial for achieving human-level machine intelligence. Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning. In this work, we introduce Causal evaluation of Language Models (CaLM), which, to the best of our knowledge, is the first comprehensive benchmark for evaluating the causal reasoning capabilities of language models. First, we propose the CaLM framework, which establishes a foundational taxonomy consisting of four modules: causal target (i.e., what to evaluate), adaptation (i.e., how to obtain the results), metric (i.e., how to measure the results), and error (i.e., how to analyze the bad results). This taxonomy defines a broad evaluation design space while systematically selecting criteria and priorities. Second, we compose the CaLM dataset, comprising 126,334 data samples, to provide curated sets of causal targets, adaptations, metrics, and errors, offering extensive coverage for diverse research pursuits. Third, we conduct an extensive evaluation of 28 leading language models on a core set of 92 causal targets, 9 adaptations, 7 metrics, and 12 error types. Fourth, we perform detailed analyses of the evaluation results across various dimensions (e.g., adaptation, scale). Fifth, we present 50 high-level empirical findings across 9 dimensions (e.g., model), providing valuable guidance for future language model development. Finally, we develop a multifaceted platform, including a website, leaderboards, datasets, and toolkits, to support scalable and adaptable assessments. We envision CaLM as an ever-evolving benchmark for the community, systematically updated with new causal targets, adaptations, models, metrics, and error types to reflect ongoing research advancements. Project website is at https://opencausalab.github.io/CaLM.
Reconfigurable Intelligent Surface (RIS) is attracting more and more research interest because of its ability to reprogram the radio environment. Designing and implementing the RIS, however, is challenging because of limitations of printed circuit board (PCB) technology related to manufacturing of large sizes as well as the cost of switches. Thus, a low-cost manufacturing process suitable for large size and volume of devices, such as screen-printing is necessary. In this paper, for the first time, a fully screen-printed reconfigurable intelligent surface (RIS) with vanadium dioxide (VO2) switches for 5G and beyond communications is proposed. A VO2 ink has been prepared and batches of switches have been printed and integrated with the resonator elements. These switches are a fraction of the cost of commercial switches. Furthermore, the printing of these switches directly on metal patterns negates the need of any minute soldering of the switches. To avoid the complications of multilayer printing and realizing the RIS without vias, the resonators and the biasing lines are realized on a single layer. However, this introduces the challenge of interference between the biasing lines and the resonators, which is tackled in this work by designing the bias lines as part of the resonator. By adjusting the unit cell periodicity and the dimension of the H-shaped resonator, we achieve a 220 to 170{\deg} phase shift from 23.5 GHz to 29.5 GHz covering both n257 and n258 bands. Inside the wide bandwidth, the maximum ON reflection magnitude is 74%, and the maximum OFF magnitude is 94%. The RIS array comprises 20x20 unit cells (4.54x4.54{\lambda}^2 at 29.5 GHz). Each column of unit cells is serially connected to a current biasing circuit. To validate the array's performance, we conduct full-wave simulations as well as near-field and far-field measurements.
An interactive social robotic assistant must provide services in complex and crowded spaces while adapting its behavior based on real-time human language commands or feedback. In this paper, we propose a novel hybrid approach called Social Robot Planner (SRLM), which integrates Large Language Models (LLM) and Deep Reinforcement Learning (DRL) to navigate through human-filled public spaces and provide multiple social services. SRLM infers global planning from human-in-loop commands in real-time, and encodes social information into a LLM-based large navigation model (LNM) for low-level motion execution. Moreover, a DRL-based planner is designed to maintain benchmarking performance, which is blended with LNM by a large feedback model (LFM) to address the instability of current text and LLM-driven LNM. Finally, SRLM demonstrates outstanding performance in extensive experiments. More details about this work are available at: https://sites.google.com/view/navi-srlm
We present AnaMoDiff, a novel diffusion-based method for 2D motion analogies that is applied to raw, unannotated videos of articulated characters. Our goal is to accurately transfer motions from a 2D driving video onto a source character, with its identity, in terms of appearance and natural movement, well preserved, even when there may be significant discrepancies between the source and driving characters in their part proportions and movement speed and styles. Our diffusion model transfers the input motion via a latent optical flow (LOF) network operating in a noised latent space, which is spatially aware, efficient to process compared to the original RGB videos, and artifact-resistant through the diffusion denoising process even amid dense movements. To accomplish both motion analogy and identity preservation, we train our denoising model in a feature-disentangled manner, operating at two noise levels. While identity-revealing features of the source are learned via conventional noise injection, motion features are learned from LOF-warped videos by only injecting noise with large values, with the stipulation that motion properties involving pose and limbs are encoded by higher-level features. Experiments demonstrate that our method achieves the best trade-off between motion analogy and identity preservation.
Hypergraphs provide an effective modeling approach for modeling high-order relationships in many real-world datasets. To capture such complex relationships, several hypergraph neural networks have been proposed for learning hypergraph structure, which propagate information from nodes to hyperedges and then from hyperedges back to nodes. However, most existing methods focus on information propagation between hyperedges and nodes, neglecting the interactions among hyperedges themselves. In this paper, we propose HeIHNN, a hyperedge interaction-aware hypergraph neural network, which captures the interactions among hyperedges during the convolution process and introduce a novel mechanism to enhance information flow between hyperedges and nodes. Specifically, HeIHNN integrates the interactions between hyperedges into the hypergraph convolution by constructing a three-stage information propagation process. After propagating information from nodes to hyperedges, we introduce a hyperedge-level convolution to update the hyperedge embeddings. Finally, the embeddings that capture rich information from the interaction among hyperedges will be utilized to update the node embeddings. Additionally, we introduce a hyperedge outlier removal mechanism in the information propagation stages between nodes and hyperedges, which dynamically adjusts the hypergraph structure using the learned embeddings, effectively removing outliers. Extensive experiments conducted on real-world datasets show the competitive performance of HeIHNN compared with state-of-the-art methods.
This paper explores the mutual coupling in the reconfigurable intelligent surface (RIS)-aided communication. Despite the existence of several mutual coupling-aware models for RIS-aided communication, a notable gap remains due to the lack of experimental validation. This paper bridges this gap by first introducing a novel model training approach based on the 3D full-wave simulation and subsequently validating the obtained model via experimental measurements in a 1-bit quasi-passive RIS prototype operating in the mmWave band. Comparative analyses reveal precision in both the employed mutual coupling-aware model and the assessed model parameters, offering a realistic evaluation of mutual coupling in authentic RIS hardware. Utilizing the validated mutual coupling-aware communication model, we systematically examine the impact of mutual coupling on communication performance by adopting the achievable rate as a performance indicator. Our results reveal that the mutual coupling in RIS exhibits heightened significance with increased RIS amplitude gains and showcases a frequency-dependent effect.
IoT devices are increasingly the source of data for machine learning (ML) applications running on edge servers. Data transmissions from devices to servers are often over local wireless networks whose bandwidth is not just limited but, more importantly, variable. Furthermore, in cyber-physical systems interacting with the physical environment, image offloading is also commonly subject to timing constraints. It is, therefore, important to develop an adaptive approach that maximizes the inference performance of ML applications under timing constraints and the resource constraints of IoT devices. In this paper, we use image classification as our target application and propose progressive neural compression (PNC) as an efficient solution to this problem. Although neural compression has been used to compress images for different ML applications, existing solutions often produce fixed-size outputs that are unsuitable for timing-constrained offloading over variable bandwidth. To address this limitation, we train a multi-objective rateless autoencoder that optimizes for multiple compression rates via stochastic taildrop to create a compression solution that produces features ordered according to their importance to inference performance. Features are then transmitted in that order based on available bandwidth, with classification ultimately performed using the (sub)set of features received by the deadline. We demonstrate the benefits of PNC over state-of-the-art neural compression approaches and traditional compression methods on a testbed comprising an IoT device and an edge server connected over a wireless network with varying bandwidth.
Multi-human multi-robot teams (MH-MR) obtain tremendous potential in tackling intricate and massive missions by merging distinct strengths and expertise of individual members. The inherent heterogeneity of these teams necessitates advanced initial task assignment (ITA) methods that align tasks with the intrinsic capabilities of team members from the outset. While existing reinforcement learning approaches show encouraging results, they might fall short in addressing the nuances of long-horizon ITA problems, particularly in settings with large-scale MH-MR teams or multifaceted tasks. To bridge this gap, we propose an attention-enhanced hierarchical reinforcement learning approach that decomposes the complex ITA problem into structured sub-problems, facilitating more efficient allocations. To bolster sub-policy learning, we introduce a hierarchical cross-attribute attention (HCA) mechanism, encouraging each sub-policy within the hierarchy to discern and leverage the specific nuances in the state space that are crucial for its respective decision-making phase. Through an extensive environmental surveillance case study, we demonstrate the benefits of our model and the HCA inside.
Traditional Time-series Anomaly Detection (TAD) methods often struggle with the composite nature of complex time-series data and a diverse array of anomalies. We introduce TADNet, an end-to-end TAD model that leverages Seasonal-Trend Decomposition to link various types of anomalies to specific decomposition components, thereby simplifying the analysis of complex time-series and enhancing detection performance. Our training methodology, which includes pre-training on a synthetic dataset followed by fine-tuning, strikes a balance between effective decomposition and precise anomaly detection. Experimental validation on real-world datasets confirms TADNet's state-of-the-art performance across a diverse range of anomalies.
Self-driving software pipelines include components that are learned from a significant number of training examples, yet it remains challenging to evaluate the overall system's safety and generalization performance. Together with scaling up the real-world deployment of autonomous vehicles, it is of critical importance to automatically find simulation scenarios where the driving policies will fail. We propose a method that efficiently generates adversarial simulation scenarios for autonomous driving by solving an optimal control problem that aims to maximally perturb the policy from its nominal trajectory. Given an image-based driving policy, we show that we can inject new objects in a neural rendering representation of the deployment scene, and optimize their texture in order to generate adversarial sensor inputs to the policy. We demonstrate that adversarial scenarios discovered purely in the neural renderer (surrogate scene) can often be successfully transferred to the deployment scene, without further optimization. We demonstrate this transfer occurs both in simulated and real environments, provided the learned surrogate scene is sufficiently close to the deployment scene.