
| General Information
The SMT Data Challenge is an advanced data competition in which students analyze real-world player-tracking data. Projects are open-ended, emphasizing process, relevance, creativity, and communication rather than purely quantitative analysis. The Data Challenge has become a top recruiting ground for MLB teams: more than 25% of past participants have been hired by professional teams or sports companies.
IF YOU WANT TO STAND OUT AS A JOB CANDIDATE, THIS IS WHERE TO START!
| Eligibility
- The SMT Data Challenge is open to STUDENTS ONLY.
- Participants must be 18 years or older and enrolled in both the Spring and Fall 2025 semesters.
- Students may participate individually or in teams (max: 4 students). There is also an option for individuals looking for team members.
| Project Guidelines & Submissions
PROJECT TOPIC
“Good” baseball is all about decision-making. Did events on the field go as planned? This year, we challenge you to use tracking data to infer and analyze player or team “intent.” Possible approaches include:
• Determining who should field a ball and assigning responsibility for misplays
• Evaluating when a fielder should make a play or not
• Analyzing whether a fielder or baserunner moves as intended
• Investigating what an advanced scout can deduce about a team’s strategy
MLB teams have asked to see:
• Effective use of data visualization: compelling graphics that advance storytelling
• Proficiency with “messy” data: effective cleaning and analysis, with justifications for inclusion/exclusion
SUBMISSION REQUIREMENTS | Deadline August 1
- Technical paper in PDF format (max: 2,000 words)
- Analysis code
- Results in CSV format
| Judging
Judges from academia, industry, journalism, and sports will evaluate submissions based on creativity, relevance, methodology, storytelling, and communication to both technical and non-technical audiences.
| Finalist Showcase
Three finalist teams will be invited to a Showcase Weekend at SMT headquarters in Durham, NC, to present to a panel of judges (mid-November, date TBD).
Optional analysis demo/Q&A sessions:
- May 8 – Introduction
- May 22 – Analysis
- June 5 – Data Visualization
- July 3 – Technical Writing
Virtual office hours available: May 6 – July 31
| Key Dates
April 30: Registration Deadline
May 5: Earliest date data may be available
Aug 1: Submission Deadline
Oct 6: Finalists Announced
Mid-Nov: Finalist Presentations
2025 DATA CHALLENGE
FINALIST & HONORABLE MENTIONS
We are pleased to announce the Finalists and Honorable Mentions for the 2025 SMT Data Challenge.
This annual competition gives students the opportunity to conduct original analysis using exclusive player-tracking data and then put that work in front of professional sports teams. This year’s competition was the most impressive to date, with high-caliber submissions from 50 teams composed of 114 students. We also received assistance from more than 100 judges, who brought knowledge and insight from all aspects of baseball and sports. This competition wouldn’t be possible without their time, energy, and dedication. Thanks so much to all of you!
Given the scope and quality of this year’s competition, we have expanded recognition beyond the Finalists, creating an “Honorable Mentions” category to acknowledge the excellence of our highest-level projects. Congratulations to all of you!
Finalists now advance to the next round: our Finalist Showcase Weekend (November 14–18) at SMT Headquarters in Durham, North Carolina. Expected activities include a Carolina Hurricanes game and a tour of Durham Bulls Athletic Park, and the event concludes with in-person presentations to a panel of judges. The 2025 SMT Data Challenge Winner will be announced on November 18.
| FINALISTS
TEAM 97: Alex Frederick, Jack Kalsched, Pranav Rajaram, Andrew Zaletski (UC San Diego)
The Man in the Middle
Cutoff plays are a critical component of the game, yet they remain under-analyzed in public baseball research. In this project, we develop a framework to assess cutoff play decision-making using anonymized MiLB player and ball tracking data. First, we construct a logistic regression model to estimate baserunner safe probabilities for extra-base advancement paths. Next, we combine these probabilities with RE24 run expectancies to compute expected run values for each possible cutoff action, identifying the optimal cutoff choice for every play. Using these labels, we train a Random Forest classifier with features such as runner and fielder distances, arm strength, and sprint speed to predict optimal actions. Our findings reveal that cutoff men overwhelmingly favor aggressive throws when our model recommends holding on to the ball, leading to notable run expectancy losses. Finally, we present an interactive dashboard that provides team- and player-level cutoff decision-making insights that can supplement scouting and coaching. This project offers a scalable approach to cutoff play evaluation that teams can use to improve their defensive strategy in crucial situations.
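The expected-run-value comparison in this abstract can be sketched in a few lines of Python. This is not the team's code: the `CutoffAction` structure, the action names, and every probability and run-expectancy value below are hypothetical placeholders standing in for the output of a safe-probability model and an RE24 table. Because the defense wants to allow as few runs as possible, the sketch picks the action with the lowest expected run value.

```python
from dataclasses import dataclass


@dataclass
class CutoffAction:
    """One option for the cutoff man (hypothetical structure, not the team's code)."""
    name: str
    p_safe: float      # modeled probability that the targeted runner is safe
    re_if_safe: float  # run expectancy (RE24-style) of the state if the runner is safe
    re_if_out: float   # run expectancy of the state if the runner is thrown out

    def expected_runs(self) -> float:
        # Weight each outcome's run expectancy by the probability of that outcome.
        return self.p_safe * self.re_if_safe + (1.0 - self.p_safe) * self.re_if_out


def best_action(actions: list[CutoffAction]) -> CutoffAction:
    # The defense prefers the option that leaves the fewest expected runs.
    return min(actions, key=lambda a: a.expected_runs())


# Placeholder numbers for illustration only; real values would come from a
# fitted safe-probability model and an RE24 table.
actions = [
    # Holding the ball: no play is made, so the resulting state is certain.
    CutoffAction("hold", p_safe=1.0, re_if_safe=1.10, re_if_out=0.0),
    # Throwing: if the runner is safe, a trailing runner may also advance.
    CutoffAction("throw to third", p_safe=0.80, re_if_safe=1.40, re_if_out=0.50),
]
for action in actions:
    print(f"{action.name}: {action.expected_runs():.2f} expected runs")
print(best_action(actions).name)  # prints "hold"
```

With these made-up inputs, holding comes out ahead (1.10 versus 1.22 expected runs); different probabilities or run expectancies can easily flip the recommendation toward the throw.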
TEAM 158: Timothy Jackson Millar (University of Canberra, Canberra, Australia)
The Cognitive Clock Score (CCS) Framework: Measuring Player Cognition during the Pre-Acquisition Phase
Traditional defensive metrics in baseball excel at quantifying physical outcomes but fail to capture the subconscious, split-second mental processing that precedes them. This paper introduces the Cognitive Clock Score (CCS), a metric framework designed to objectively measure a player's cognitive performance during the pre-acquisition stage of a defensive play: the critical process from bat-on-ball contact to securing the ball in the glove. The data is displayed in a dynamic, lightweight web app available at https://cognitiveclock.com.
Leveraging SMT’s tracking data, the CCS deconstructs plays into five core cognitive components: Reaction Time (RT), Path Efficiency (PE), Movement Decisiveness (MD), Adaptability (AD), and Anticipation (AN). These metrics are calculated and contextualised via a data pipeline that normalises performance against league positional benchmarks. In addition to a 20-80 score for each factor, the framework generates multi-layered Cognitive Archetypes for players. It translates complex quantitative scores into intuitive narratives, identifying players with archetypes such as “Thinkers,” “Reactors,” or “Hesitant Processors” and detailing specific trade-offs such as “Explosive but Inefficient” or “Chaotic Creator.” The result is a powerful tool that complements traditional eye tests, providing an empirical foundation for player evaluation, targeted development, and strategic team construction.
TEAM 199: Ty Albao (UC San Diego)
It’s BRI, Bro: Evaluating BaseRunning Intelligence
In my analysis, I develop a way to evaluate a player’s and a team’s baserunning choices in critical, fast-paced situations. I first compute two probabilities. The first is the breakeven probability of advancing: the probability of being safe that a runner needs for an advance attempt to be worthwhile (calculated using a run expectancy matrix). The second is the actual probability that the baserunner is safe on a given play (estimated using a model I built). I then compare these two probabilities to judge whether the runner made the right decision to stay or advance. This adds another dimension to baserunning critique, since it lets analysts assess aggressiveness and decision quality regardless of whether the runner was safe.
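The comparison described above reduces to a short calculation. As a sketch, assuming each outcome is summarized by the run expectancy of the base-out state it leads to, an attempt breaks even when p · RE_safe + (1 − p) · RE_out equals RE_stay. The function names and the run-expectancy numbers below are hypothetical placeholders, not the author's model or real run expectancy matrix values; in the author's framework, `p_safe` would come from the trained safe-probability model.

```python
def breakeven_probability(re_stay: float, re_safe: float, re_out: float) -> float:
    """Minimum safe probability at which an advance attempt pays off.

    The attempt breaks even when
        p * re_safe + (1 - p) * re_out == re_stay,
    which solves to p = (re_stay - re_out) / (re_safe - re_out).
    Each argument is the run expectancy of the resulting base-out state
    (assumed here to include any run that scores on the play).
    """
    if re_safe <= re_out:
        raise ValueError("re_safe must be greater than re_out")
    return (re_stay - re_out) / (re_safe - re_out)


def should_advance(p_safe: float, re_stay: float, re_safe: float, re_out: float) -> bool:
    # Advancing is the right call when the modeled safe probability
    # meets or exceeds the breakeven probability.
    return p_safe >= breakeven_probability(re_stay, re_safe, re_out)


# Placeholder run expectancies for illustration only (not real RE24 values).
RE_STAY, RE_SAFE, RE_OUT = 0.90, 1.20, 0.30
print(f"breakeven: {breakeven_probability(RE_STAY, RE_SAFE, RE_OUT):.3f}")  # prints "breakeven: 0.667"
print(should_advance(0.75, RE_STAY, RE_SAFE, RE_OUT))  # prints True
print(should_advance(0.55, RE_STAY, RE_SAFE, RE_OUT))  # prints False
```

Because the judgment depends only on whether the modeled safe probability clears the breakeven, a runner who is thrown out can still be credited with a good decision, which matches the point the abstract makes about evaluating choices independently of outcomes.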
TEAM 208: Tom Kim (University of Toronto), Bennett Moore (University of Guelph)
The Intent to Give it Your All
Modern player evaluation concentrates on talent metrics such as maximum pitch velocity and sprint speed because of their predictive power. However, the expression of talent depends on deliberate athlete intent. Despite its widely acknowledged importance, “hustle” remains relegated to conventional wisdom rather than quantitative analysis. In this project, we develop two quantitative metrics, Hustle+ and Safe Probability Added (SPA), to objectively measure hustle and its influence on game outcomes. We define these metrics through statistical estimation of athletes’ true capacities and their effort-driven expression. True capacity is estimated using softened maximum-a-posteriori estimation combined with Z-score-based variance reduction, while the expression of this capacity is modeled based on established sport science literature. Gradient-boosting decision tree ensembles are employed to simulate outcomes. Results indicate a linear relationship between hustle and performance. Furthermore, we identify significant discrepancies in hustle among teams, with consequential impacts on seasonal outcomes.
| HONORABLE MENTIONS
TEAM 15: Diego Issac Osborn (UC San Diego)
Bayesball in the Outfield: Quantifying Defensive Aggression With a Bayesian Hierarchical Model
TEAM 88: Kique Ruiz (Bowdoin College)
Flipping Awesome or Flipping Embarrassing?: Using Player Tracking Data to Analyze Bat Flips
TEAM 128: Atul Venkatesh, Levon Sarian, Daniel Wurzburger (Dartmouth College), Leo Feingold (Carleton College)
Being Kelly Leak - A Quantitative Analysis of Outfielder Intent in Professional Baseball
TEAM 171: Austin Ambler, Hayden Orenstein (Syracuse University)
Pickoff Decision Analysis
TEAM 180: Lucas Lodermeier, Karl Pederson, Caleb Pederson (University of Minnesota, Twin Cities)
Jump-starting Stolen Bases
TEAM 214: Jay Irby (University of Florida)
Pickoff, then Takeoff? Predicting Stolen Base Attempts After Pickoff Moves
