The widespread use of pre-trained language models (PLMs) in natural language processing (NLP) has greatly improved performance outcomes. However, these models' vulnerability to adversarial attacks (e.g., camouflaged hints from drug dealers), particularly in the Chinese language with its rich character diversity/variation and complex structures, hatches vital apprehension. In this study, we propose a novel method, CHinese vAriatioN Graph Enhancement (CHANGE), to increase the robustness of PLMs against character variation attacks in Chinese content. CHANGE presents a novel approach for incorporating a Chinese character variation graph into the PLMs. Through designing different supplementary tasks utilizing the graph structure, CHANGE essentially enhances PLMs' interpretation of adversarially manipulated text. Experiments conducted in a multitude of NLP tasks show that CHANGE outperforms current language models in combating against adversarial attacks and serves as a valuable contribution to robust language model research. These findings contribute to the groundwork on robust language models and highlight the substantial potential of graph-guided pre-training strategies for real-world applications.
Deploying machine learning (ML) on diverse computing platforms is crucial to accelerate and broaden their applications. However, it presents significant software engineering challenges due to the fast evolution of models, especially the recent Large Language Models (LLMs), and the emergence of new computing platforms. Current ML frameworks are primarily engineered for CPU and CUDA platforms, leaving a big gap in enabling emerging ones like Metal, Vulkan, and WebGPU. While a traditional bottom-up development pipeline fails to close the gap timely, we introduce TapML, a top-down approach and tooling designed to streamline the deployment of ML systems on diverse platforms, optimized for developer productivity. Unlike traditional bottom-up methods, which involve extensive manual testing and debugging, TapML automates unit testing through test carving and adopts a migration-based strategy for gradually offloading model computations from mature source platforms to emerging target platforms. By leveraging realistic inputs and remote connections for gradual target offloading, TapML accelerates the validation and minimizes debugging scopes, significantly optimizing development efforts. TapML was developed and applied through a year-long, real-world effort that successfully deployed significant emerging models and platforms. Through serious deployments of 82 emerging models in 17 distinct architectures across 5 emerging platforms, we showcase the effectiveness of TapML in enhancing developer productivity while ensuring model reliability and efficiency. Furthermore, we summarize comprehensive case studies from our real-world development, offering best practices for developing emerging ML systems.
Reliable segmentation of road lines and markings is critical to autonomous driving. Our work is motivated by the observations that road lines and markings are (1) frequently occluded in the presence of moving vehicles, shadow, and glare and (2) highly structured with low intra-class shape variance and overall high appearance consistency. To solve these issues, we propose a Homography Guided Fusion (HomoFusion) module to exploit temporally-adjacent video frames for complementary cues facilitating the correct classification of the partially occluded road lines or markings. To reduce computational complexity, a novel surface normal estimator is proposed to establish spatial correspondences between the sampled frames, allowing the HomoFusion module to perform a pixel-to-pixel attention mechanism in updating the representation of the occluded road lines or markings. Experiments on ApolloScape, a large-scale lane mark segmentation dataset, and ApolloScape Night with artificial simulated night-time road conditions, demonstrate that our method outperforms other existing SOTA lane mark segmentation models with less than 9\% of their parameters and computational complexity. We show that exploiting available camera intrinsic data and ground plane assumption for cross-frame correspondence can lead to a light-weight network with significantly improved performances in speed and accuracy. We also prove the versatility of our HomoFusion approach by applying it to the problem of water puddle segmentation and achieving SOTA performance.
Evaluating large language models (LLMs) is fundamental, particularly in the context of practical applications. Conventional evaluation methods, typically designed primarily for LLM development, yield numerical scores that ignore the user experience. Therefore, our study shifts the focus from model-centered to human-centered evaluation in the context of AI-powered writing assistance applications. Our proposed metric, termed ``Revision Distance,'' utilizes LLMs to suggest revision edits that mimic the human writing process. It is determined by counting the revision edits generated by LLMs. Benefiting from the generated revision edit details, our metric can provide a self-explained text evaluation result in a human-understandable manner beyond the context-independent score. Our results show that for the easy-writing task, ``Revision Distance'' is consistent with established metrics (ROUGE, Bert-score, and GPT-score), but offers more insightful, detailed feedback and better distinguishes between texts. Moreover, in the context of challenging academic writing tasks, our metric still delivers reliable evaluations where other metrics tend to struggle. Furthermore, our metric also holds significant potential for scenarios lacking reference texts.
6G mobile networks aim to realize ubiquitous intelligence at the network edge via distributed learning, sensing, and data analytics. Their common operation is to aggregate high-dimensional data, which causes a communication bottleneck that cannot be resolved using traditional orthogonal multi-access schemes. A promising solution, called over-the-air computation (AirComp), exploits channels' waveform superposition property to enable simultaneous access, thereby overcoming the bottleneck. Nevertheless, its reliance on uncoded linear analog modulation exposes data to perturbation by noise and interference. Hence, the traditional analog AirComp falls short of meeting the high-reliability requirement for 6G. Overcoming the limitation of analog AirComp motivates this work, which focuses on developing a framework for digital AirComp. The proposed framework features digital modulation of each data value, integrated with the bit-slicing technique to allocate its bits to multiple symbols, thereby increasing the AirComp reliability. To optimally detect the aggregated digital symbols, we derive the optimal maximum a posteriori detector that is shown to outperform the traditional maximum likelihood detector. Furthermore, a comparative performance analysis of digital AirComp with respect to its analog counterpart with repetition coding is conducted to quantify the practical signal-to-noise ratio (SNR) regime favoring the proposed scheme. On the other hand, digital AirComp is enhanced by further development to feature awareness of heterogeneous bit importance levels and its exploitation in channel adaptation. Lastly, simulation results demonstrate the achivability of substantial reliability improvement of digital AirComp over its analog counterpart given the same channel uses.
As a next-generation wireless technology, the in-band full-duplex (IBFD) transmission enables simultaneous transmission and reception of signals over the same frequency, thereby doubling spectral efficiency. Further, a continuous up-scaling of wireless network carrier frequencies arising from ever-increasing data traffic is driving research on integrated sensing and communications (ISAC) systems. In this context, we study the co-design of common waveforms, precoders, and filters for an IBFD multi-user (MU) multiple-input multiple-output (MIMO) communications with a distributed MIMO radar. This paper, along with companion papers (Part I and II), proposes a comprehensive MRMC framework that addresses all these challenges. In the companion papers, we developed signal processing and joint design algorithms for this distributed system. In this paper, we tackle multi-target detection, localization, and tracking. This co-design problem that includes practical MU-MIMO constraints on power and quality-of-service is highly non-convex. We propose a low-complexity procedure based on Barzilai-Borwein gradient algorithm to obtain the design parameters and mixed-integer linear program for distributed target localization. Numerical experiments demonstrate the feasibility and accuracy of multi-target sensing of the distributed FD ISAC system. Finally, we localize and track multiple targets by adapting the joint probabilistic data association and extended Kalman filter for this system.
We address the challenge of spectral sharing between a statistical multiple-input multiple-output (MIMO) radar and an in-band full-duplex (IBFD) multi-user MIMO (MU-MIMO) communications system operating simultaneously in the same frequency band. Existing research on joint MIMO-radar-MIMO-communications (MRMC) systems has limitations, such as focusing on colocated MIMO radars, half-duplex MIMO communications, single-user scenarios, neglecting practical constraints, or employing separate transmit/receive units for MRMC coexistence. This paper, along with companion papers (Part I and III), proposes a comprehensive MRMC framework that addresses all these challenges. In the previous companion paper (Part I), we presented signal processing techniques for a distributed IBFD MRMC system. In this paper, we introduce joint design of statistical MIMO radar codes, uplink/downlink precoders, and corresponding receive filters using a novel metric called compounded-and-weighted sum mutual information. To solve the resulting highly non-convex problem, we employ a combination of block coordinate descent (BCD) and alternating projection methods. Numerical experiments show convergence of our algorithm, mitigation of uplink interference, and stable data rates under varying noise levels, channel estimate imperfections, and self-interference. The subsequent companion paper (Part III) extends the discussion to multiple targets and evaluates the tracking performance of our MRMC system.
Understanding information cascades in networks is a fundamental issue in numerous applications. Current researches often sample cascade information into several independent paths or subgraphs to learn a simple cascade representation. However, these approaches fail to exploit the hierarchical semantic associations between different modalities, limiting their predictive performance. In this work, we propose a novel Hierarchical Information Enhancement Network (HIENet) for cascade prediction. Our approach integrates fundamental cascade sequence, user social graphs, and sub-cascade graph into a unified framework. Specifically, HIENet utilizes DeepWalk to sample cascades information into a series of sequences. It then gathers path information between users to extract the social relationships of propagators. Additionally, we employ a time-stamped graph convolutional network to aggregate sub-cascade graph information effectively. Ultimately, we introduce a Multi-modal Cascade Transformer to powerfully fuse these clues, providing a comprehensive understanding of cascading process. Extensive experiments have demonstrated the effectiveness of the proposed method.
Identifying key nodes in social networks plays a crucial role in timely blocking false information. Existing key node identification methods usually consider node influence only from the propagation structure perspective and have insufficient generalization ability to unknown scenarios. In this paper, we propose a novel Multi-perspective Memory Enhanced Network (MMEN) for identifying key nodes in social networks, which mines key nodes from multiple perspectives and utilizes memory networks to store historical information. Specifically, MMEN first constructs two propagation networks from the perspectives of user attributes and propagation structure and updates node feature representations using graph attention networks. Meanwhile, the memory network is employed to store information of similar subgraphs, enhancing the model's generalization performance in unknown scenarios. Finally, MMEN applies adaptive weights to combine the node influence of the two propagation networks to select the ultimate key nodes. Extensive experiments demonstrate that our method significantly outperforms previous methods.