Soccer Player Re-Identification through Broadcast Video Streams

In recent years, computer vision has proven useful in fields such as surveillance, entertainment, and sports analytics. One challenging problem is re-identifying soccer players in broadcast video streams. This project is a research effort to build a real-time computer vision pipeline that detects, tracks, and re-identifies soccer players in broadcast footage. It combines deep learning techniques, custom datasets, and novel methods to handle obstacles such as occlusion, low resolution, and the difficulty of reading jersey numbers.

Literature Review: Foundations of Re-Identification and Jersey Number Recognition

The literature surrounding soccer player re-identification can be broadly categorized into player detection, jersey number recognition, and tracking methodologies. Various researchers have attempted to solve these problems with a combination of classic computer vision techniques and deep learning models. The literature we reviewed included advancements in person re-identification through deep learning and object detection techniques specifically applied to sports scenarios.

One prominent challenge in soccer player identification is the loss of visual cues, such as facial features or jersey numbers, which are often blurred or occluded during fast-paced gameplay. For jersey number recognition, traditional techniques that pair handcrafted features (such as Histogram of Oriented Gradients) with Support Vector Machines generalized poorly under realistic match conditions and reached only moderate accuracy. Convolutional neural networks (CNNs), particularly YOLO ("You Only Look Once") models, emerged as a more reliable way to localize jersey numbers on players’ backs. The literature also reports that deep models such as Spatial Transformer Networks (STN), which focus the network on the jersey region, achieve higher recognition accuracy than earlier methods.

Person re-identification (ReID) builds upon feature extraction techniques to identify players across frames. Methods such as Siamese convolutional networks have been shown to effectively match visual features from different frames, thereby allowing the network to maintain individual identities despite partial occlusions. Additionally, newer approaches like Gated Siamese CNN architectures, which emphasize salient local features, achieved promising results when benchmarked on public datasets. The literature underscored that leveraging body poses and attention mechanisms significantly improved player tracking and re-identification, especially under occlusion-heavy conditions.
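To make the Siamese idea concrete, the sketch below shows one common training objective for such networks, the contrastive loss: embeddings of the same player are pulled together, and embeddings of different players are pushed at least a margin apart. This is an illustration only. The reviewed papers may use different objectives, and the function names and the margin value are assumptions, not taken from any of them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_identity, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    Same-identity pairs are penalized by their squared distance.
    Different-identity pairs are penalized only while they are closer
    than `margin` (an assumed hyperparameter).
    """
    d = euclidean(emb_a, emb_b)
    if same_identity:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Two crops of the same player should produce a small loss when close together.
loss_same = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_identity=True)
# Two different players that are already far apart incur no loss.
loss_diff = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_identity=False)
```

At inference time the two branches share weights, so matching reduces to comparing the distance between embeddings against a threshold.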

This foundation from the literature has been instrumental in guiding our approach to creating a robust soccer player identification pipeline. To successfully track and label players, we incorporated lessons from previous models and datasets to adapt the methods to the complexities present in broadcast soccer video streams.

Literature Summary Table

Technique	Purpose	Key Models/Methods	Results/Findings
Handcrafted Features + SVM	Jersey Number Recognition	HOG, SVM	Moderate accuracy; poor generalization to match conditions
Convolutional Neural Networks	Player and Jersey Localization	YOLO, Spatial Transformer	Improved accuracy in jersey localization tasks
Siamese Networks	Person Re-Identification	Siamese CNN	Effective feature matching despite occlusions
Gated Siamese CNN	Enhanced Feature Matching	Gated Siamese CNN	Improved results over traditional Siamese networks
Attention Mechanisms + Body Pose	Improved Tracking	Pose-Guided R-CNN	Significant improvement under occlusion conditions

Our Solution: A Comprehensive Vision Pipeline

Our proposed solution is a computer vision pipeline that integrates person tracking and jersey number recognition, carefully balancing accuracy and processing speed to function effectively in real time. Here’s an overview of our approach:

Dataset Creation and Fine-Tuning

To develop our player detection and jersey number recognition models, we built two private datasets. The first was designed for localizing soccer players in frames, while the second targeted jersey number localization and recognition. We also modified Google’s Street View House Numbers (SVHN) dataset so that each label is a whole number from 0 to 99, and then fine-tuned the model trained on it to recognize soccer jersey numbers in dynamic match footage. The modified SVHN dataset consists of more than 31,000 training images and over 4,000 test samples, which helped our models adapt to varying lighting conditions and jersey styles.
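The SVHN relabeling step could look roughly like the sketch below, which turns a per-image digit sequence into a single class in the range 0 to 99. The source does not describe how the conversion was done, so everything here is an assumption: the helper names, the dictionary layout of the annotations, the convention that the original SVHN files store the digit 0 as label 10, and the choice to drop sequences with a leading zero or more than two digits.

```python
def svhn_digits_to_number(digit_labels, max_digits=2):
    """Convert a left-to-right digit sequence to a whole number, or None.

    Assumes the original SVHN convention in which the digit 0 is
    stored as label 10. Sequences that cannot be a jersey number in
    0..99 are rejected by returning None.
    """
    if not 1 <= len(digit_labels) <= max_digits:
        return None
    digits = [0 if d == 10 else d for d in digit_labels]
    if any(not 0 <= d <= 9 for d in digits):
        return None
    # Assumed choice: "07" is not treated as a valid two-digit number.
    if len(digits) > 1 and digits[0] == 0:
        return None
    value = 0
    for d in digits:
        value = value * 10 + d
    return value


def build_whole_number_labels(annotations):
    """Map image name -> whole-number class, skipping unusable images.

    `annotations` is a hypothetical dict of image name -> digit labels.
    """
    labels = {}
    for image_name, digit_labels in annotations.items():
        number = svhn_digits_to_number(digit_labels)
        if number is not None:
            labels[image_name] = number
    return labels


example = build_whole_number_labels(
    {"1.png": [2, 10], "2.png": [7], "3.png": [1, 2, 3], "4.png": [10]}
)
```

Under these assumptions, the three-digit house number in `3.png` is filtered out, while `[2, 10]` becomes class 20.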

Player Detection and Tracking

Player detection used a YOLOv5-based object detector fine-tuned on a custom dataset of soccer players. This detector found players in broadcast video frames with over 90% accuracy. Detected players were then tracked by a DeepSORT model, which was pre-trained on the CrowdHuman dataset and fine-tuned on our custom dataset of 85 tracks from Premier League matches, reaching a 95.3% rank-1 recognition rate. Coupling YOLOv5 with DeepSORT gave real-time tracking with an accuracy of 85%.
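For readers unfamiliar with the rank-1 metric quoted above, the following sketch shows how it is typically computed: each query crop is compared with a gallery of known players, and the query counts as a hit when its most similar gallery entry has the same identity. The data layout and function names are hypothetical, and cosine similarity is an assumed choice of similarity measure; the source does not state the exact evaluation protocol.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank1_accuracy(queries, gallery):
    """Fraction of queries whose nearest gallery entry has the same identity.

    `queries` and `gallery` are lists of (identity, embedding) pairs.
    """
    if not queries or not gallery:
        raise ValueError("queries and gallery must be non-empty")
    hits = 0
    for query_id, query_emb in queries:
        best_id, _ = max(
            gallery, key=lambda item: cosine_similarity(query_emb, item[1])
        )
        if best_id == query_id:
            hits += 1
    return hits / len(queries)


# Toy data: two known players and three query crops, one of them mismatched.
gallery = [("player_a", [1.0, 0.0]), ("player_b", [0.0, 1.0])]
queries = [
    ("player_a", [0.9, 0.1]),
    ("player_b", [0.2, 0.8]),
    ("player_a", [0.1, 0.9]),  # looks like player_b, so this is a miss
]
score = rank1_accuracy(queries, gallery)
```

In this toy setup two of the three queries are matched correctly.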

Jersey Number Localization

Jersey number localization is a crucial component of our solution that enables player identification. For localization, we used a YOLOv5 model that was trained on a custom dataset of jersey number bounding boxes. These bounding boxes were manually annotated from 21 Premier League matches, covering different scenarios and team combinations. The YOLOv5 model achieved a mean Average Precision (mAP) of 96.3%, effectively localizing jersey numbers with high precision and recall rates.
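The mAP figure rests on two ingredients: intersection over union (IoU) to decide whether a predicted box matches an annotated one, and the area under the precision-recall curve. The sketch below computes average precision for a single class, which is what mAP reduces to when jersey numbers are the only class. The IoU threshold of 0.5, the `(x1, y1, x2, y2)` box layout, and the function names are assumptions; the source does not say which threshold its 96.3% refers to.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(predictions, ground_truth, iou_threshold=0.5):
    """All-point interpolated AP for one class.

    predictions: list of (image_id, confidence, box)
    ground_truth: dict of image_id -> list of boxes
    """
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags = []
    # Visit predictions from most to least confident.
    for image_id, _, box in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(image_id, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = (
            best_idx >= 0
            and best_iou >= iou_threshold
            and not matched[image_id][best_idx]
        )
        if is_tp:
            matched[image_id][best_idx] = True  # each annotation matches once
        tp_flags.append(is_tp)
    precisions, recalls, tp = [], [], 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precisions.append(tp / rank)
        recalls.append(tp / total_gt)
    # Make precision non-increasing as recall grows (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


ground_truth = {"f1": [(0, 0, 10, 10)], "f2": [(0, 0, 10, 10)]}
predictions = [
    ("f1", 0.9, (0, 0, 10, 10)),    # exact match
    ("f1", 0.8, (20, 20, 30, 30)),  # false positive
    ("f2", 0.7, (1, 1, 10, 10)),    # IoU 0.81, counts as a match
]
ap = average_precision(predictions, ground_truth)
```

A stricter IoU threshold would lower the score for the same predictions, so a reported mAP is only meaningful alongside its threshold.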

Jersey Number Recognition

To recognize jersey numbers, we used a ResNet18 model trained on our modified SVHN dataset, which covers all jersey numbers from 0 to 99. This model was then fine-tuned on our custom dataset of jersey numbers extracted from Premier League broadcasts, achieving an average accuracy of 88%. Training used a weighted cross-entropy loss to handle class imbalance, together with several augmentation techniques, so that the model performs well under different viewing angles and lighting conditions.

Solution Summary Table

Component	Dataset Description	Model/Technique	Accuracy/Performance
Player Detection	Custom dataset of soccer players in frames	YOLOv5	Over 90% detection accuracy
Player Tracking	Custom ReID dataset of soccer players (85 Premier League tracks)	DeepSORT + YOLOv5	95.3% rank-1 recognition; 85% tracking accuracy
Jersey Number Localization	21 Premier League matches (bounding boxes)	YOLOv5	96.3% mAP for number localization
Jersey Number Recognition	Modified SVHN, fine-tuned custom dataset	ResNet18	88% accuracy for jersey number recognition

Conclusion: Real-Time Player Identification Achieved

Our research produced a real-time computer vision pipeline capable of localizing, tracking, and re-identifying soccer players in broadcast video streams. By combining YOLOv5 models for localization, ResNet18 for jersey number recognition, and DeepSORT for player tracking, we achieved a system that is both efficient and reliable. Our solution balances the need for real-time processing with the intricacies of player identification, even under challenging conditions such as occlusion and fast motion.

This project advances the application of computer vision to sports analytics and provides a foundation for future research in player performance analysis, team strategies, and automated broadcast enhancements.