Formula 1 ML Analysis
Overview
This project analyzes the Formula 1 Pit Stop Dataset from Kaggle to understand how pit stop strategy influences performance across a full Formula 1 season. The dataset provides detailed race strategy information including driver and team data, race details, and pit stop metrics. The analysis combines exploratory data analysis, statistical testing, and predictive modeling to quantify the relationship between race context and pit stop performance.
Dataset & Preprocessing
Source & Scope: The dataset spans Formula 1 races from 1950–2024, containing pit stop data for each driver across every race round. The data includes season, round, circuit, driver, constructor, laps completed, finishing position, total pit stops, and average pit stop time.
Data Cleaning:
- Original dataset: 21,184 rows with missing `AvgPitStopTime` values were removed, narrowing the focus to races with reliable pit stop timing data.
- Outlier removal: Used the 1.5×IQR (interquartile range) rule to remove extreme pit stop times caused by repairs, penalties, or logging errors.
- Final cleaned dataset: Focused on modern Formula 1 (2011–2024, most of which falls in the hybrid era that began in 2014) with 35 circuits, 76 drivers, and 23 constructors.
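The two cleaning steps above can be sketched with the Python standard library. This is a minimal illustration, not the project's actual code: the helper names `iqr_bounds` and `clean_pit_stop_times` and the sample rows are hypothetical, and the quartile method (linear interpolation, `method="inclusive"`) is an assumption, since the original analysis does not say which one it used.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the k*IQR rule."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clean_pit_stop_times(rows, key="AvgPitStopTime"):
    """Drop rows with a missing time, then rows outside the IQR fences."""
    present = [r for r in rows if r.get(key) is not None]
    lower, upper = iqr_bounds([r[key] for r in present])
    return [r for r in present if lower <= r[key] <= upper]

# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"Driver": "A", "AvgPitStopTime": 23.1},
    {"Driver": "B", "AvgPitStopTime": 24.4},
    {"Driver": "C", "AvgPitStopTime": None},  # missing value: dropped first
    {"Driver": "D", "AvgPitStopTime": 25.0},
    {"Driver": "E", "AvgPitStopTime": 22.8},
    {"Driver": "F", "AvgPitStopTime": 61.7},  # very long stop: outside the fences
    {"Driver": "G", "AvgPitStopTime": 24.0},
]

cleaned = clean_pit_stop_times(sample)
print([r["Driver"] for r in cleaned])
```

Missing values are dropped before the fences are computed so that they cannot affect the quartiles. The fences here are computed once over the whole column; computing them per circuit or per season would be a different design choice.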
Key Statistics:
- Mean pit stop time: 24.37 seconds
- Standard deviation: 3.4 seconds
- Interquartile range: 22–26 seconds
- Typical race length: 52–66 laps
- Typical pit stops per race: 1–3
This tight distribution reflects the highly optimized and consistent pit crew procedures in modern Formula 1.
Exploratory Data Analysis
After cleaning, the dataset revealed several key patterns:
Categorical Overview:
- Mercedes appears most frequently (505 entries), followed by Ferrari (495) and Red Bull (488)
- Circuit de Barcelona-Catalunya is the most represented track due to reliable timing data
- The cleaned data predominantly reflects hybrid-era races where pit stop timing became standardized and complete
Numeric Distribution:
- Pit stop times form a slightly right-skewed distribution, concentrated between 22–26 seconds
- A secondary bump around 29–30 seconds likely represents stops with minor delays or complications
- Finishing positions span the full field (1–24), indicating that the analysis covers both front-running and midfield teams
- Laps completed show realistic race distances for modern F1
Visualization & Key Findings
Distribution of Pit Stop Times: The histogram of average pit stop times shows a tight, roughly bell-shaped distribution centered around 24 seconds, with a slight right skew. This confirms that modern F1 pit stops are highly consistent once extreme outliers are removed, reflecting the professionalism and precision of contemporary pit crews.
Pit Stop Performance by Constructor: When comparing the top 10 constructors, there are small but noticeable differences:
- Mercedes, Ferrari, and Red Bull show slightly faster average pit stops
- Mid-field teams display slightly slower averages
- The differences are typically less than 1 second, but can compound across multiple stops in a race
This aligns with expectations: top teams invest heavily in pit crew training and optimization, creating measurable performance advantages.
Statistical Analysis
One-Sample T-Tests (comparing constructor means to overall dataset mean of 24.37 seconds):
| Constructor | n | Mean (sec) | Std Dev | t-statistic | p-value | Conclusion |
|---|---|---|---|---|---|---|
| Mercedes | 505 | 23.84 | 3.21 | −3.695 | 2.44e−4 | Significantly faster |
| Ferrari | 495 | 23.96 | 3.24 | −2.798 | 5.34e−3 | Significantly faster |
| Red Bull | 488 | 23.78 | 3.29 | −3.952 | 8.89e−5 | Significantly faster |
All three top constructors differ from the dataset mean by a statistically significant margin (p < 0.01). However, the practical significance is modest: the differences are fractions of a second, and box plots reveal substantial overlap between teams.
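As a sanity check, the t-statistics can be approximately recomputed from the summary columns of the table using t = (x̄ − μ₀) / (s / √n). The sketch below uses only the rounded values printed in the table, so small deviations from the tabulated t-statistics are expected. The function name `one_sample_t` is illustrative.

```python
import math

def one_sample_t(sample_mean, sample_sd, n, reference_mean):
    """One-sample t-statistic computed from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - reference_mean) / standard_error

OVERALL_MEAN = 24.37  # overall dataset mean from the text

# (n, mean, std dev) as printed in the table; these are rounded values.
summary = {
    "Mercedes": (505, 23.84, 3.21),
    "Ferrari": (495, 23.96, 3.24),
    "Red Bull": (488, 23.78, 3.29),
}

t_stats = {
    name: one_sample_t(mean, sd, n, OVERALL_MEAN)
    for name, (n, mean, sd) in summary.items()
}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
```

Turning a t-statistic into a p-value requires the t distribution with n − 1 degrees of freedom. With the raw per-stop data available, a library routine such as `scipy.stats.ttest_1samp` would give both values directly.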
Linear Regression Model: A predictive model using `Laps`, `Position`, `TotalPitStops`, and constructor dummy variables achieved:
- RMSE: ~2.8 seconds
- R² Score: ~0.32
Top predictive features include constructor identity and total pit stops, indicating that team and race strategy are meaningful drivers of pit stop performance.
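For reference, the two reported metrics can be computed as in the sketch below. The `actual` and `predicted` values are hypothetical and only exercise the functions. The final lines check that the reported figures are mutually consistent: if the evaluation data has roughly the same spread as the cleaned dataset (an assumption), then RMSE ≈ σ·√(1 − R²).

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical pit stop times in seconds, not model output from the project.
actual = [22.0, 24.0, 26.0, 28.0]
predicted = [23.0, 24.0, 25.0, 28.0]
print(f"RMSE = {rmse(actual, predicted):.3f}, R^2 = {r_squared(actual, predicted):.2f}")

# Consistency check using the reported standard deviation (3.4 s) and R^2 (0.32).
implied_rmse = 3.4 * math.sqrt(1 - 0.32)
print(f"Implied RMSE = {implied_rmse:.2f} s")
```

The implied value of about 2.8 seconds matches the reported RMSE, which suggests the two metrics were computed on data with a spread similar to the cleaned dataset.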
Key Insights
Team Consistency: Top teams (Mercedes, Ferrari, Red Bull) post faster and slightly more consistent pit stops (standard deviations of 3.21–3.29 seconds against 3.4 seconds overall), a measurable if modest competitive advantage.
Modern Optimization: The cleaned dataset reflects highly optimized modern F1, where pit stops are predictable and clustered tightly around 24 seconds.
Statistical vs. Practical Significance: While differences between top teams are statistically significant due to large sample sizes, the practical impact (sub-second differences) depends on race context, circuit characteristics, and stacked stop scenarios.
Predictability: Pit stop times are partially predictable from race and team features, but substantial variation remains unexplained, suggesting situational factors and execution variability matter.
Next Steps & Future Work
- Incorporate race-level features (e.g., tire compound, fuel load, track position) for improved predictions.
- Analyze pit stop times as a time series to identify trends within seasons or across teams.
- Build machine learning models (Random Forest, Gradient Boosting) to capture non-linear relationships.
- Evaluate effect sizes (Cohen’s d) to quantify practical significance alongside statistical tests.
- Extend analysis to include pit stop consistency metrics for crew reliability assessment.
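As a starting point for the effect-size item above, a one-sample Cohen's d can be estimated from the summary statistics already reported in the t-test table. This sketch assumes the one-sample form d = (x̄ − μ₀) / s with each constructor's own standard deviation as the scale. Other choices, such as the overall or a pooled standard deviation, are equally defensible. The inputs are the rounded table values, so the results are approximate.

```python
def cohens_d_one_sample(sample_mean, sample_sd, reference_mean):
    """One-sample Cohen's d: mean difference in units of the sample SD."""
    return (sample_mean - reference_mean) / sample_sd

OVERALL_MEAN = 24.37  # overall dataset mean from the text

# (mean, std dev) as printed in the t-test table; these are rounded values.
summary = {
    "Mercedes": (23.84, 3.21),
    "Ferrari": (23.96, 3.24),
    "Red Bull": (23.78, 3.29),
}

effect_sizes = {
    name: cohens_d_one_sample(mean, sd, OVERALL_MEAN)
    for name, (mean, sd) in summary.items()
}

for name, d in effect_sizes.items():
    print(f"{name}: d = {d:.3f}")
```

All three magnitudes fall below 0.2, the conventional threshold for a "small" effect. That is in line with the observation that the differences are statistically detectable but practically modest.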