Yu-Chee Tseng

Scale-Aware Crowd Count Network with Annotation Error Correction

Dec 28, 2023
Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Li Xin

Traditional crowd counting networks suffer from information loss when feature maps are downsized through pooling layers, leading to inaccuracies in counting crowds at a distance. Existing methods often assume correct annotations during training, disregarding the impact of noisy annotations, especially in crowded scenes. Furthermore, the use of a fixed Gaussian kernel fails to account for the varying pixel distribution with respect to the camera distance. To overcome these challenges, we propose a Scale-Aware Crowd Counting Network (SACC-Net) that introduces a "scale-aware" architecture capable of correcting errors in noisy annotations. For the first time, we simultaneously model labeling errors (mean) and scale variations (variance) by spatially-varying Gaussian distributions to produce fine-grained heat maps for crowd counting. Furthermore, the proposed adaptive Gaussian kernel variance enables the model to learn dynamically with a low-rank approximation, leading to improved convergence efficiency with comparable accuracy. The performance of SACC-Net is extensively evaluated on four public datasets: UCF-QNRF, UCF_CC_50, NWPU, and ShanghaiTech A-B. Experimental results demonstrate that SACC-Net outperforms all state-of-the-art methods, validating its effectiveness in achieving superior crowd counting accuracy.

* 7 pages, 6 figures. arXiv admin note: text overlap with arXiv:2211.06835
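
To make the idea of a spatially-varying Gaussian concrete, the following is a minimal sketch, not the SACC-Net implementation. It assumes each annotated head comes with a hand-specified offset (standing in for the labeling-error mean) and a per-point standard deviation (standing in for the scale-dependent variance); in the paper these quantities are modeled by the network rather than supplied by hand. The function name `density_map` and the tuple layout are hypothetical.

```python
import math


def density_map(points, height, width):
    """Build a crowd density map from point annotations.

    points: list of (x, y, dx, dy, sigma) tuples (hypothetical layout):
      (x, y)   annotated head position in pixels,
      (dx, dy) assumed correction of the annotation error (shifts the mean),
      sigma    per-point standard deviation (smaller for distant heads).
    Each kernel is normalized over the grid, so the map sums to the count.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y, dx, dy, sigma in points:
        cx, cy = x + dx, y + dy  # mean shifted by the assumed error offset
        kernel = [
            [math.exp(-((c - cx) ** 2 + (r - cy) ** 2) / (2.0 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        total = sum(map(sum, kernel))
        if total == 0.0:
            continue  # kernel lies entirely outside the grid (underflow)
        for r in range(height):
            for c in range(width):
                grid[r][c] += kernel[r][c] / total
    return grid


# One distant head (small sigma) and one near head (large sigma).
example = density_map([(5, 5, 0.5, -0.5, 1.0), (20, 12, 0.0, 0.0, 3.0)], 16, 32)
print(round(sum(map(sum, example)), 6))  # estimated count
```

With a fixed kernel, both heads would receive the same spread regardless of distance; letting sigma and the offset vary per point is what the abstract refers to as modeling scale variations and labeling errors.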

Tracking Players in a Badminton Court by Two Cameras

Aug 09, 2023
Young-Ching Chou, Shen-Ru Zhang, Bo-Wei Chen, Hong-Qi Chen, Cheng-Kuan Lin, Yu-Chee Tseng

This study proposes a simple method for multi-object tracking (MOT) of players in a badminton court. We leverage two off-the-shelf cameras, one above the court and the other at its side. The top camera tracks players' trajectories, while the side camera analyzes the pixel features of players. By computing the correlations between adjacent frames and combining the information from the two cameras, MOT of badminton players is obtained. This two-camera approach addresses the challenge of player occlusion and overlap in a badminton court, providing player trajectory tracking and multi-angle analysis. The presented system offers insights into the positions and movements of badminton players, thus serving as a coaching or self-training tool for players to improve their game strategies.
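
The abstract does not specify which correlation measure is used, so the following is only an illustrative sketch under stated assumptions: each player is described by an appearance feature vector (for example a color histogram from the side camera), and identities are carried from one frame to the next by greedily pairing the most correlated features. The Pearson correlation, the greedy matching, and the names `pearson` and `associate` are all assumptions made for this example, not the authors' method.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length feature vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant vector carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def associate(prev_feats, curr_feats):
    """Greedily match tracked players to detections in the next frame.

    prev_feats: dict mapping player id -> feature vector (previous frame).
    curr_feats: list of feature vectors detected in the current frame.
    Returns a dict mapping player id -> index into curr_feats.
    """
    scored = sorted(
        ((pearson(f, g), pid, j)
         for pid, f in prev_feats.items()
         for j, g in enumerate(curr_feats)),
        reverse=True,
    )
    assigned, used = {}, set()
    for _score, pid, j in scored:
        if pid in assigned or j in used:
            continue  # each player and each detection is matched at most once
        assigned[pid] = j
        used.add(j)
    return assigned


previous = {"A": [9, 1, 0, 2], "B": [0, 2, 8, 1]}
current = [[1, 2, 7, 1], [8, 2, 1, 2]]
print(associate(previous, current))
```

Greedy matching is enough for the two to four players on a badminton court; a larger scene would more likely call for an optimal assignment method.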

Scale-Aware Crowd Counting Using a Joint Likelihood Density Map and Synthetic Fusion Pyramid Network

Nov 13, 2022
Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Bor-Shiun Wang

We develop a Synthetic Fusion Pyramid Network (SPF-Net) with a scale-aware loss function design for accurate crowd counting. Existing crowd-counting methods assume that the training annotation points are accurate and thus ignore the fact that noisy annotations can lead to large model-learning bias and counting error, especially when counting highly dense crowds that appear far away. To the best of our knowledge, this work is the first to properly handle such noise at multiple scales in end-to-end loss design and thus push the crowd counting state-of-the-art. We model the noise of crowd annotation points as a Gaussian and derive the crowd probability density map from the input image. We then approximate the joint distribution of crowd density maps with the full covariance of multiple scales and derive a low-rank approximation for tractability and efficient implementation. The derived scale-aware loss function is used to train the SPF-Net. We show that it outperforms various loss functions on four public datasets: UCF-QNRF, UCF_CC_50, NWPU, and ShanghaiTech A-B. The proposed SPF-Net can accurately predict the locations of people in the crowd, despite being trained on noisy annotations.

* 8 pages, 8 figures, 4 tables
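
As a rough illustration of why a low-rank covariance helps tractability, the sketch below assumes the covariance has the form diag(d) + u u^T (a diagonal plus a rank-1 term) and solves the linear system that appears in a Gaussian log-likelihood in O(n) time using the Sherman-Morrison identity. The abstract does not give the exact form of the paper's approximation, so this structure and the function name `solve_diag_plus_rank1` are assumptions chosen for the example.

```python
def solve_diag_plus_rank1(d, u, r):
    """Solve (diag(d) + u u^T) x = r without forming the full matrix.

    d: positive diagonal entries, u: rank-1 factor, r: right-hand side.
    Uses Sherman-Morrison:
      x = D^-1 r - D^-1 u * (u^T D^-1 r) / (1 + u^T D^-1 u)
    """
    dinv_r = [ri / di for ri, di in zip(r, d)]
    dinv_u = [ui / di for ui, di in zip(u, d)]
    numerator = sum(ui * v for ui, v in zip(u, dinv_r))
    denominator = 1.0 + sum(ui * v for ui, v in zip(u, dinv_u))
    scale = numerator / denominator
    return [a - scale * b for a, b in zip(dinv_r, dinv_u)]


d = [2.0, 1.0, 4.0]
u = [1.0, 0.5, -1.0]
r = [1.0, 2.0, 3.0]
x = solve_diag_plus_rank1(d, u, r)

# Check the solution by multiplying back: diag(d) x + u (u . x) should equal r.
ux = sum(ui * xi for ui, xi in zip(u, x))
print([round(di * xi + ui * ux, 9) for di, ui, xi in zip(d, u, x)])
```

A full covariance over all pixels at all scales would need storage and inversion costs that grow quadratically or worse with the number of pixels; a diagonal-plus-low-rank structure keeps both linear in that number.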

CoachAI: A Project for Microscopic Badminton Match Data Collection and Tactical Analysis

Jul 12, 2019
Tzu-Han Hsu, Ching-Hsuan Chen, Nyan Ping Ju, Tsì-Uí İk, Wen-Chih Peng, Chih-Chuan Wang, Yu-Shuen Wang, Yuan-Hsiang Lin, Yu-Chee Tseng, Jiun-Long Huang, Yu-Tai Ching

Computer-vision-based object tracking has been used to annotate and augment sports video. For sports learning and training, video replay is often used in post-match review and training review for tactical analysis and movement analysis. For automatic and systematic competition data collection and tactical analysis, a project called CoachAI has been supported by the Ministry of Science and Technology, Taiwan. The proposed project also includes research on data visualization, connected training auxiliary devices, and data warehousing. Deep learning techniques will be used to develop video-based, real-time microscopic competition data collection from broadcast competition video. Machine learning techniques will be used to develop tactical analysis. To present data in more understandable forms and to help in pre-match training, AR/VR techniques will be used to visualize data, tactics, and so on. In addition, training auxiliary devices, including smart badminton rackets and connected serving machines, will be developed based on IoT technology to further utilize competition data and tactical data and boost training efficiency. In particular, the connected serving machines will be developed to perform specified tactics and to interact with players during their training.

Fusing Video and Inertial Sensor Data for Walking Person Identification

Feb 20, 2018
Yuehong Huang, Yu-Chee Tseng

An autonomous computer system (such as a robot) typically needs to identify, locate, and track persons appearing in its sight. However, most solutions have limitations regarding efficiency, practicability, or environmental constraints. In this paper, we propose an effective and practical system that combines video and inertial sensors for person identification (PID). Persons who perform different activities are easy to identify. To show the robustness and potential of our system, we propose a walking person identification (WPID) method to identify persons walking at the same time. By comparing features derived from both video and inertial sensor data, we can associate sensors in smartphones with human objects in videos. Results show that the correct identification rate of our WPID method can reach up to 76% within 2 seconds.
