Object detection methods under known single degradations have been extensively investigated. However, existing approaches require prior knowledge of the degradation type and train a separate model for each, limiting their practical applications in unpredictable environments. To address this challenge, we propose a chain-of-thought (CoT) prompted adaptive enhancer, CPA-Enhancer, for object detection under unknown degradations. Specifically, CPA-Enhancer progressively adapts its enhancement strategy under the step-by-step guidance of CoT prompts that encode degradation-related information. To the best of our knowledge, this is the first work that exploits CoT prompting for object detection tasks. Overall, CPA-Enhancer is a plug-and-play enhancement model that can be integrated into any generic detector to achieve substantial gains on degraded images without prior knowledge of the degradation type. Experimental results demonstrate that CPA-Enhancer not only sets a new state of the art for object detection but also boosts the performance of other downstream vision tasks under unknown degradations.
Large language models (LLMs) have shown outstanding ability on various tasks, including question answering. However, LLMs incur high computation and memory costs. At the same time, LLMs may leak private data when the training or prediction procedure involves sensitive information. In this paper, we propose SPA (Side Plugin Adaption), a lightweight architecture for fast on-device inference and privacy preservation under strict on-device computation and memory constraints. Compared with other on-device seq2seq generation methods, SPA performs fast and stable inference under low-resource constraints, making it cost-efficient. Our method establishes an interaction between a pretrained LLM on the cloud and additive parameters on the device, which provides knowledge from both the pretrained LLM and private personal features. Furthermore, SPA provides a framework that keeps feature-based parameters on privacy-guaranteed but low-compute devices while leaving the parameters that contain general information on high-compute devices.
Large language models (LLMs) have achieved commendable accomplishments in various natural language processing tasks. However, LLMs still encounter significant challenges when dealing with complex scenarios involving multiple entities. These challenges arise from the presence of implicit relationships that demand multi-step reasoning. In this paper, we propose a novel approach, ERA-CoT, which aids LLMs in understanding context by capturing relationships between entities and supports reasoning across diverse tasks through Chain-of-Thought (CoT). Experimental results show that ERA-CoT outperforms current CoT prompting methods, achieving an average improvement of 5.1\% on GPT3.5 over previous SOTA baselines. Our analysis indicates that ERA-CoT increases the LLM's understanding of entity relationships, significantly improves the accuracy of question answering, and enhances the reasoning ability of LLMs.
Large language models (LLMs) demonstrate exceptional performance in numerous tasks but still heavily rely on knowledge stored in their parameters. Moreover, updating this knowledge incurs high training costs. Retrieval-augmented generation (RAG) methods address this issue by integrating external knowledge. By retrieving knowledge relevant to the query, the model can answer questions it previously could not. This approach improves performance in certain scenarios for specific tasks. However, if irrelevant texts are retrieved, they may impair model performance. In this paper, we propose Retrieval Augmented Iterative Self-Feedback (RA-ISF), a framework that iteratively decomposes tasks and processes them in three submodules to enhance the model's problem-solving capabilities. Experiments show that our method outperforms existing baselines and performs well on models such as GPT3.5 and Llama2, significantly enhancing factual reasoning capabilities and reducing hallucinations.
Movable antenna (MA) is a new technology with great potential to improve communication performance by enabling local movement of antennas in pursuit of better channel conditions. In particular, the acquisition of complete channel state information (CSI) between the transmitter (Tx) and receiver (Rx) regions is an essential problem for MA systems to reap performance gains. In this paper, we propose a general channel estimation framework for MA systems by exploiting the multi-path field response channel structure. Specifically, the angles of departure (AoDs), angles of arrival (AoAs), and complex coefficients of the multi-path components (MPCs) are jointly estimated by employing the compressed sensing method, based on multiple channel measurements at designated positions of the Tx-MA and Rx-MA. Under this framework, the Tx-MA and Rx-MA measurement positions fundamentally determine the measurement matrix for compressed sensing, whose mutual coherence is analyzed from the perspective of the Fourier transform. Moreover, two criteria for MA measurement positions are provided to guarantee the successful recovery of MPCs. Then, we propose several MA measurement position setups and compare their performance. Finally, comprehensive simulation results show that the proposed framework is able to estimate the complete CSI between the Tx and Rx regions with high accuracy.
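To illustrate how measurement positions shape the mutual coherence of a compressed-sensing measurement matrix, the following is a minimal sketch and not the framework from the abstract above. It assumes a one-dimensional array, a far-field response of the form exp(j*2*pi*x*u) with position x in wavelengths and virtual angle u = cos(angle), and a small uniform grid of candidate directions. The function names, the grid, and the two position setups are all hypothetical.

```python
import cmath
import math


def measurement_matrix(positions, grid):
    """Build A[m][g] = exp(j*2*pi*x_m*u_g); x_m in wavelengths, u_g = cos(angle)."""
    return [[cmath.exp(2j * math.pi * x * u) for u in grid] for x in positions]


def mutual_coherence(matrix):
    """Return the largest normalized inner product between two distinct columns."""
    rows, cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(rows)] for c in range(cols)]
    norms = [math.sqrt(sum(abs(v) ** 2 for v in col)) for col in columns]
    worst = 0.0
    for i in range(cols):
        for j in range(i + 1, cols):
            inner = sum(a.conjugate() * b for a, b in zip(columns[i], columns[j]))
            worst = max(worst, abs(inner) / (norms[i] * norms[j]))
    return worst


# Hypothetical setup: 8 measurement positions, 8 candidate directions on [-1, 1).
grid = [-1 + 2 * g / 8 for g in range(8)]
half_wavelength = [0.5 * m for m in range(8)]  # assumed spacing of 0.5 wavelength
full_wavelength = [1.0 * m for m in range(8)]  # assumed spacing of 1 wavelength

print(mutual_coherence(measurement_matrix(half_wavelength, grid)))
print(mutual_coherence(measurement_matrix(full_wavelength, grid)))
```

Under these assumptions, half-wavelength spacing makes the columns discrete Fourier vectors, so the coherence is zero up to rounding. With one-wavelength spacing, directions separated by 1 in u produce identical columns, and the coherence reaches one, which would make sparse recovery ambiguous.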
PARAGEN is a PyTorch-based NLP toolkit for further development on parallel generation. PARAGEN provides thirteen types of customizable plugins, helping users to experiment quickly with novel ideas across model architectures, optimization, and learning strategies. We implement various features, such as unlimited data loading and automatic model selection, to enhance its industrial usage. PARAGEN is now deployed to support various research and industry applications at ByteDance. PARAGEN is available at https://github.com/bytedance/ParaGen.
Unmanned aerial vehicles (UAVs) have found widespread commercial, civilian, and military applications. Wireless communication has always been one of the core technologies for UAVs. However, communication capacity is becoming a bottleneck for UAVs in supporting more challenging application scenarios. The heavily occupied sub-6 GHz frequency band is not sufficient to meet ultra-high data traffic requirements. The utilization of the millimeter-wave (mmWave) frequency bands is a promising direction for UAV communications, where large antenna arrays can be packed in a small area on the UAV to perform three-dimensional (3D) beamforming. On the other hand, UAVs serving as aerial access points or relays can significantly enhance the coverage and quality of service of terrestrial mmWave cellular networks. In this paper, we provide a comprehensive survey of mmWave beamforming enabled UAV communications and networking. The technical potential of and challenges for mmWave-UAV communications are presented first. Then, we provide an overview of relevant mmWave antenna structures and channel modeling. Subsequently, the technologies and solutions for UAV-connected mmWave cellular networks and mmWave-UAV ad hoc networks are reviewed, respectively. Finally, we present open issues and promising directions for future research in mmWave beamforming enabled UAV communications and networking.
Orthogonal time frequency space (OTFS) modulation has been confirmed to provide significant performance advantages against Doppler in high-mobility scenarios. The core feature of OTFS is that the time-variant channel is converted into a non-fading 2D channel in the delay-Doppler (DD) domain so that all symbols experience the same channel gain. In the existing literature, the channel is assumed to be quasi-static over an OTFS frame. For more practical channels, the input-output relation will be time-variant as the environment or medium changes. In this paper, we analyze the characteristics of OTFS modulation over a more general multipath channel, where the signal of each path experiences its own rapid fading. First, we derive the explicit input-output relationship of OTFS in the DD domain for the cases of ideal and rectangular pulses. It is shown that the rapid fading produces extra Doppler dispersion without affecting the delay domain. We next demonstrate that OTFS can be interpreted as an efficient time diversity technology that combines space-time encoding and interleaving. Simulation results reveal that OTFS is insensitive to rapid fading and still outperforms orthogonal frequency-division multiplexing (OFDM) in these types of channels.
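As background for the delay-Doppler (DD) domain mentioned above, the sketch below shows the inverse symplectic finite Fourier transform (ISFFT) and its inverse (SFFT), which are commonly used in the OTFS literature to map symbols between the DD grid and the time-frequency grid. It is not taken from the abstract and does not model the rapid-fading channel analyzed there. The index convention (first index Doppler/time, second index delay/frequency), the unitary scaling, and the function names are assumptions made for illustration.

```python
import cmath
import math


def _transform(grid, sign):
    """Shared 2D symplectic Fourier kernel; sign=+1 gives ISFFT, sign=-1 gives SFFT."""
    n_size, m_size = len(grid), len(grid[0])
    scale = 1.0 / math.sqrt(n_size * m_size)  # unitary scaling (assumed convention)
    out = []
    for a in range(n_size):
        row = []
        for b in range(m_size):
            total = 0j
            for k in range(n_size):
                for l in range(m_size):
                    phase = 2 * math.pi * (a * k / n_size - b * l / m_size)
                    total += grid[k][l] * cmath.exp(sign * 1j * phase)
            row.append(scale * total)
        out.append(row)
    return out


def isfft(dd_grid):
    """Map delay-Doppler symbols dd_grid[k][l] to time-frequency samples."""
    return _transform(dd_grid, +1)


def sfft(tf_grid):
    """Map time-frequency samples back to the delay-Doppler grid."""
    return _transform(tf_grid, -1)


# Hypothetical 4 x 3 frame carrying a single DD symbol.
dd = [[0j] * 3 for _ in range(4)]
dd[1][2] = 1 + 0j
tf = isfft(dd)
recovered = sfft(tf)
print(max(abs(recovered[k][l] - dd[k][l]) for k in range(4) for l in range(3)))
```

In this sketch a single DD symbol spreads with equal magnitude over every time-frequency sample, which gives some intuition for why each symbol can see the whole channel rather than one faded resource element. The direct quadruple loop is chosen for readability; a practical implementation would use fast Fourier transforms along each axis.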
For reentry or near-space communication, owing to the influence of the time-varying plasma sheath channel environment, the received IQ baseband signals are severely rotated on the constellation. Research has shown that the electron density fluctuates at frequencies from 20 kHz to 100 kHz, which is on the same order as the symbol rate of most TT\&C communication systems, so traditional estimation would consume a large amount of bandwidth to track the time-varying channel. In this paper, motivated by principal curve analysis, we propose a deep learning (DL) algorithm called symmetric manifold network (SMN) to extract the curves on the constellation and classify the signals based on the curves. The key advantage is that SMN can achieve joint optimization of demodulation and channel estimation. In our simulation results, the new algorithm significantly reduces the symbol error rate (SER) compared to existing algorithms and enables accurate estimation of fading with extremely high bandwidth utilization.