Formula 1 ML Analysis

Posted Dec 8, 2025 Updated Jan 7, 2026

By Rishi Vemulapalli

Overview

This project analyzes the Formula 1 Pit Stop Dataset from Kaggle to understand how pit stop strategy relates to performance across multiple Formula 1 seasons. The dataset provides detailed race strategy information, including driver and team data, race details, and pit stop metrics. The analysis combines exploratory data analysis, statistical testing, and predictive modeling to quantify the relationship between race context and pit stop performance.

Dataset & Preprocessing

Source & Scope: The dataset spans Formula 1 races from 1950 to 2024, containing pit stop data for each driver across every race round. The data includes season, round, circuit, driver, constructor, laps completed, finishing position, total pit stops, and average pit stop time.

Data Cleaning:

Original dataset: 21,184 rows. Rows with missing AvgPitStopTime values were removed, narrowing the focus to races with reliable pit stop timing data.
Outlier removal: Used the 1.5×IQR (Interquartile Range) rule to remove extreme pit stop times caused by repairs, penalties, or logging errors.
Final cleaned dataset: Covers modern Formula 1 (2011–2024), most of it in the hybrid era that began in 2014, with 35 circuits, 76 drivers, and 23 constructors.
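
The two filtering steps above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the code used for the analysis: the toy rows are invented, the helper names `iqr_bounds` and `clean_rows` are hypothetical, and only the AvgPitStopTime column name comes from the dataset description. Quartiles are computed with the standard library's linear-interpolation ("inclusive") method, which is an assumption about how the quartiles were originally calculated.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the k x IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clean_rows(rows, column="AvgPitStopTime"):
    """Drop rows with a missing time, then drop rows outside the IQR fences."""
    timed = [r for r in rows if r.get(column) is not None]
    lower, upper = iqr_bounds([r[column] for r in timed])
    return [r for r in timed if lower <= r[column] <= upper]

# Toy rows, invented purely for illustration.
rows = [
    {"Season": 1995, "AvgPitStopTime": None},   # no timing data recorded
    {"Season": 2019, "AvgPitStopTime": 23.1},
    {"Season": 2019, "AvgPitStopTime": 24.0},
    {"Season": 2020, "AvgPitStopTime": 24.6},
    {"Season": 2021, "AvgPitStopTime": 25.2},
    {"Season": 2021, "AvgPitStopTime": 22.8},
    {"Season": 2022, "AvgPitStopTime": 61.5},   # e.g. a stop that included repairs
]

cleaned = clean_rows(rows)
print(f"kept {len(cleaned)} of {len(rows)} rows")
```

Computing the fences only after the missing values are dropped keeps the quartiles from being affected by rows that carry no timing information.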

Key Statistics:

Mean pit stop time: 24.37 seconds
Standard deviation: 3.4 seconds
Interquartile range: 22–26 seconds
Typical race length: 52–66 laps
Typical pit stops per race: 1–3

This tight distribution reflects the highly optimized and consistent pit crew procedures in modern Formula 1.

Exploratory Data Analysis

After cleaning, the dataset revealed several key patterns:

Categorical Overview:

Mercedes appears most frequently (505 entries), followed by Ferrari (495) and Red Bull (488)
Circuit de Barcelona-Catalunya is the most represented track due to reliable timing data
The cleaned data predominantly reflects hybrid-era races where pit stop timing became standardized and complete

Numeric Distribution:

Pit stop times form a slightly right-skewed distribution, concentrated between 22–26 seconds
A secondary bump around 29–30 seconds likely represents stops with minor delays or complications
Finishing positions span the full field (1–24), indicating that the analysis covers front-running and midfield teams alike
Laps completed show realistic race distances for modern F1

Visualization & Key Findings

Distribution of Pit Stop Times: The histogram of average pit stop times shows a tight, roughly bell-shaped distribution centered around 24 seconds, with a slight right skew. This indicates that modern F1 pit stops are highly consistent once extreme outliers are removed, reflecting the professionalism and precision of contemporary pit crews.

Pit Stop Performance by Constructor: When comparing the top 10 constructors, there are small but noticeable differences:

Mercedes, Ferrari, and Red Bull show slightly faster average pit stops
Mid-field teams display slightly slower averages
The differences are typically less than 1 second, but can compound across multiple stops in a race

This aligns with expectations: top teams invest heavily in pit crew training and optimization, creating measurable performance advantages.

Statistical Analysis

One-Sample T-Tests (comparing constructor means to overall dataset mean of 24.37 seconds):

Constructor	n	Mean (sec)	Std Dev	t-statistic	p-value	Conclusion
Mercedes	505	23.84	3.21	−3.695	2.44e−4	Significantly faster
Ferrari	495	23.96	3.24	−2.798	5.34e−3	Significantly faster
Red Bull	488	23.78	3.29	−3.952	8.89e−5	Significantly faster

All three top constructors differ significantly from the dataset mean (p < 0.01). However, the practical significance is modest: the differences are roughly half a second, and box plots reveal substantial overlap between teams.
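
As a sanity check, the t-statistics can be approximately reproduced from the summary values in the table alone, using t = (mean − μ) / (s / √n). The sketch below assumes nothing beyond the rounded numbers printed above; the function name `one_sample_t` is just for illustration. Because the inputs are rounded to two decimals, the results land within about 0.02 of the reported values rather than matching them exactly.

```python
import math

DATASET_MEAN = 24.37  # overall mean pit stop time in seconds

# (n, mean, standard deviation, reported t) copied from the table
summary = {
    "Mercedes": (505, 23.84, 3.21, -3.695),
    "Ferrari": (495, 23.96, 3.24, -2.798),
    "Red Bull": (488, 23.78, 3.29, -3.952),
}

def one_sample_t(n, mean, sd, mu):
    """t statistic for testing whether a sample mean differs from mu."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu) / standard_error

t_stats = {
    team: one_sample_t(n, mean, sd, DATASET_MEAN)
    for team, (n, mean, sd, _) in summary.items()
}

for team, t in t_stats.items():
    print(f"{team}: t = {t:.2f} (reported {summary[team][3]})")
```

The standard error shrinks with √n, which is why a gap of about half a second becomes highly significant with roughly 500 entries per team.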

Linear Regression Model: A predictive model using Laps, Position, TotalPitStops, and constructor dummy variables achieved:

RMSE: ~2.8 seconds
R² Score: ~0.32

Top predictive features include constructor identity and total pit stops, suggesting that team and race strategy are associated with pit stop performance. With an R² of about 0.32, the model still leaves roughly two-thirds of the variance unexplained.
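
For readers who want to see the shape of such a model, here is a minimal sketch of an ordinary least squares fit with constructor dummy variables, using NumPy only. It runs on synthetic data: the team offsets, noise level, and the 80/20 hold-out split are all assumptions made for illustration, so its RMSE and R² are not expected to match the figures above. Only the feature names (Laps, Position, TotalPitStops, constructor) come from the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
constructors = ["Mercedes", "Ferrari", "Red Bull", "Williams"]

# Synthetic stand-in data; none of these numbers come from the real dataset.
team = rng.integers(0, len(constructors), size=n)
laps = rng.integers(52, 67, size=n)           # Laps
position = rng.integers(1, 21, size=n)        # Position
total_stops = rng.integers(1, 4, size=n)      # TotalPitStops
team_offset = np.array([-0.5, -0.4, -0.6, 0.6])[team]
y = 24.4 + team_offset + 0.4 * total_stops + rng.normal(0.0, 2.8, size=n)

# Design matrix: intercept, numeric features, and constructor dummies.
# The first constructor is the baseline, so it gets no dummy column.
dummies = (team[:, None] == np.arange(1, len(constructors))).astype(float)
X = np.column_stack([np.ones(n), laps, position, total_stops, dummies])

# Fit on the first 80% of rows and evaluate on the remaining 20%.
split = int(0.8 * n)
coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
pred = X[split:] @ coef

residuals = y[split:] - pred
rmse = float(np.sqrt(np.mean(residuals ** 2)))
ss_tot = float(np.sum((y[split:] - y[split:].mean()) ** 2))
r2 = 1.0 - float(np.sum(residuals ** 2)) / ss_tot
print(f"RMSE = {rmse:.2f} s, R^2 = {r2:.2f}")
```

Dropping one dummy column avoids perfect collinearity with the intercept, so each remaining constructor coefficient reads as an offset in seconds relative to the baseline team.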

Key Insights

Team Consistency: Top teams (Mercedes, Ferrari, Red Bull) demonstrate faster and more consistent pit stops, creating measurable competitive advantages.
Modern Optimization: The cleaned dataset reflects highly optimized modern F1, where pit stops are predictable and clustered tightly around 24 seconds.
Statistical vs. Practical Significance: While differences between top teams are statistically significant due to large sample sizes, the practical impact (sub-second differences) depends on race context, circuit characteristics, and stacked stop scenarios.
Predictability: Pit stop times are partially predictable from race and team features, but substantial variation remains unexplained, suggesting situational factors and execution variability matter.

Next Steps & Future Work

Incorporate race-level features (e.g., tire compound, fuel load, track position) for improved predictions.
Analyze pit stop times as a time series to identify trends within seasons or across teams.
Build machine learning models (Random Forest, Gradient Boosting) to capture non-linear relationships.
Evaluate effect sizes (Cohen’s d) to quantify practical significance alongside statistical tests.
Extend analysis to include pit stop consistency metrics for crew reliability assessment.
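
The effect-size idea can already be roughed out from the summary table. For a one-sample comparison, Cohen's d is d = (mean − μ) / s. The sketch below applies that formula to the rounded values from the t-test table; it is a back-of-the-envelope calculation, not a result from the original analysis, and the function name `cohens_d` is illustrative.

```python
DATASET_MEAN = 24.37  # overall mean pit stop time in seconds

# (mean, standard deviation) per constructor, from the t-test table
summary = {
    "Mercedes": (23.84, 3.21),
    "Ferrari": (23.96, 3.24),
    "Red Bull": (23.78, 3.29),
}

def cohens_d(mean, sd, mu):
    """One-sample Cohen's d: mean difference in units of standard deviation."""
    return (mean - mu) / sd

effect_sizes = {
    team: cohens_d(mean, sd, DATASET_MEAN)
    for team, (mean, sd) in summary.items()
}

for team, d in effect_sizes.items():
    print(f"{team}: d = {d:.2f}")
```

All three magnitudes come out below 0.2, the conventional threshold for a "small" effect, which is consistent with the modest practical significance discussed earlier.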

This post is licensed under CC BY 4.0 by the author.