Models I used:

  • RF-DETR – a DETR-style real-time object detector. We fine-tuned it to detect players, jersey numbers, referees, the ball, and even shot types.
  • SAM2 – a segmentation and tracking model. It re-identifies players after occlusions and keeps IDs stable through contact plays.
  • SigLIP + UMAP + K-means – vision-language embeddings plus unsupervised clustering. This separates players into teams using uniform colors and textures, without manual labels.
  • SmolVLM2 – a compact vision-language model originally trained on OCR. After fine-tuning on NBA jersey crops, its accuracy jumped from 56% to 86%.
  • ResNet-32 – a classic CNN fine-tuned for jersey number classification. It reached 93% test accuracy, outperforming the fine-tuned SmolVLM2.
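
The team-assignment step (embeddings, then clustering into two teams) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the short vectors are made-up values standing in for SigLIP embeddings after UMAP reduction, and `sq_dist` and `kmeans` are hand-rolled hypothetical helpers with a simple farthest-point initialization rather than library calls.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, iters=20):
    """Plain k-means; returns one cluster label per input point."""
    # Farthest-point initialization: start from the first point, then keep
    # adding the point that is farthest from every center chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: sq_dist(pt, centers[c]))
            for pt in points
        ]
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical low-dimensional embeddings of six player crops
# (assumed values, as if already reduced by UMAP).
embeddings = [
    [0.90, 0.10, 0.10], [0.80, 0.20, 0.10], [0.85, 0.15, 0.05],
    [0.10, 0.10, 0.90], [0.20, 0.10, 0.80], [0.15, 0.20, 0.85],
]
team_ids = kmeans(embeddings, k=2)
print(team_ids)
```

Crops that land in the same cluster are treated as teammates. In practice the clustering would run on real embeddings, and a library implementation of K-means would likely replace the helper above.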



Posted by RandomForests92

12 Comments

  1. CleverNameThing on

    It’s cool when it’s sports, but this is also how we’re being tracked in public. But hey, we have the convenience of unlocking our phones a microsecond faster.

  2. Cool… but this is basic machine vision that’s been available for a decade now. YOLO made this available in package form 5 years ago.

  3. Charming-Strain-6070 on

    Wouldn’t it be better to also track the ball? Stats like velocity, hang time, bounce count, points, etc.

  4. This is so cool! I know there have been similar projects for soccer, but I would love to do it myself too. I think the biggest issue there is the lack of tactical cam footage. The moment the broadcast feed cuts over to show a single player or the refcam, or just moves the defenders out of the frame because the ball is too far forward, it becomes very messy.

    Do you have any tips?

  5. Cool! I’m assuming this isn’t real time, right? Does it need to run on recorded video? How long does it take to process?

  6. Wow, this is impressive!! I hear that tracking overlapping players, and clashes of three or more players going for a ball, is one of the main hurdles. Is that true? I also read that they still haven’t pinned this down with a high enough degree of accuracy for soccer. I’m no expert at all, but does the success rate depend on camera fidelity and how many frames per second it can handle? Or is there a number that’s considered enough, and anything past that is overkill? If there is, this means investment by all teams in a league at the same time for fair-play reasons. But hey, that’s just me rambling about what I read. If anyone has good intel on this, please share!

  7. W8kingNightmare on

    Would be cool to have, under their name, their shooting % for that location on the court when they have the ball.