Environmental perception in Automated Valet Parking (AVP) has been a challenging task due to severe occlusions in parking garages. Although Collaborative Perception (CP) can be applied to broaden the field of view of connected vehicles, the limited bandwidth of vehicular communications restricts its application. In this work, we propose a BEV feature-based CP network architecture for infrastructure-assisted AVP systems. The model takes the roadside camera and LiDAR as optional inputs and adaptively fuses them with onboard sensors in a unified BEV representation. Autoencoder and downsampling are applied for channel-wise and spatial-wise dimension reduction, while sparsification and quantization further compress the feature map with little loss in data precision. Combining these techniques, the size of a BEV feature map is effectively compressed to fit in the feasible data rate of the NR-V2X network. With the synthetic AVP dataset, we observe that CP can effectively increase perception performance, especially for pedestrians. Moreover, the advantage of infrastructure-assisted CP is demonstrated in two typical safety-critical scenarios in the AVP setting, increasing the maximum safe cruising speed by up to 3m/s in both scenarios.
Timely and reliable environment perception is fundamental to safe and efficient automated driving. However, the perception of standalone intelligence inevitably suffers from occlusions. A new paradigm, Cooperative Perception (CP), comes to the rescue by sharing sensor data from another perspective, i.e., from a cooperative vehicle (CoV). Due to the limited communication bandwidth, it is essential to schedule the most beneficial CoV, considering both the viewpoints and communication quality. Existing methods rely on the exchange of meta-information, such as visibility maps, to predict the perception gains from nearby vehicles, which induces extra communication and processing overhead. In this paper, we propose a new approach, learning while scheduling, for distributed scheduling of CP. The solution enables CoVs to predict the perception gains using past observations, leveraging the temporal continuity of perception gains. Specifically, we design a mobility-aware sensor scheduling (MASS) algorithm based on the restless multi-armed bandit (RMAB) theory to maximize the expected average perception gain. An upper bound on the expected average learning regret is proved, which matches the lower bound of any online algorithm up to a logarithmic factor. Extensive simulations are carried out on realistic traffic traces. The results show that the proposed MASS algorithm achieves the best average perception gain and improves recall by up to 4.2 percentage points compared to other learning-based algorithms. Finally, a case study on a trace of LiDAR frames qualitatively demonstrates the superiority of adaptive exploration, the key element of the MASS algorithm.
Vehicle-to-Everything (V2X) network has enabled collaborative perception in autonomous driving, which is a promising solution to the fundamental defect of stand-alone intelligence including blind zones and long-range perception. However, the lack of datasets has severely blocked the development of collaborative perception algorithms. In this work, we release DOLPHINS: Dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving, as a new simulated large-scale various-scenario multi-view multi-modality autonomous driving dataset, which provides a ground-breaking benchmark platform for interconnected autonomous driving. DOLPHINS outperforms current datasets in six dimensions: temporally-aligned images and point clouds from both vehicles and Road Side Units (RSUs) enabling both Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) based collaborative perception; 6 typical scenarios with dynamic weather conditions make the most various interconnected autonomous driving dataset; meticulously selected viewpoints providing full coverage of the key areas and every object; 42376 frames and 292549 objects, as well as the corresponding 3D annotations, geo-positions, and calibrations, compose the largest dataset for collaborative perception; Full-HD images and 64-line LiDARs construct high-resolution data with sufficient details; well-organized APIs and open-source codes ensure the extensibility of DOLPHINS. We also construct a benchmark of 2D detection, 3D detection, and multi-view collaborative perception tasks on DOLPHINS. The experiment results show that the raw-level fusion scheme through V2X communication can help to improve the precision as well as to reduce the necessity of expensive LiDAR equipment on vehicles when RSUs exist, which may accelerate the popularity of interconnected self-driving vehicles. DOLPHINS is now available on https://dolphins-dataset.net/.
Cooperative perception of connected vehicles comes to the rescue when the field of view restricts stand-alone intelligence. While raw-level cooperative perception preserves most information to guarantee accuracy, it is demanding in communication bandwidth and computation power. Therefore, it is important to schedule the most beneficial vehicle to share its sensor in terms of supplementary view and stable network connection. In this paper, we present a model of raw-level cooperative perception and formulate the energy minimization problem of sensor sharing scheduling as a variant of the Multi-Armed Bandit (MAB) problem. Specifically, volatility of the neighboring vehicles, heterogeneity of V2X channels, and the time-varying traffic context are taken into consideration. Then we propose an online learning-based algorithm with logarithmic performance loss, achieving a decent trade-off between exploration and exploitation. Simulation results under different scenarios indicate that the proposed algorithm quickly learns to schedule the optimal cooperative vehicle and saves more energy as compared to baseline algorithms.