I play Football Manager and have always wondered which stats actually matter. The game shows dozens of metrics per player, but which ones predict match performance?
I couldn't find a simple tool that lets me:
- Generate realistic player data
- Test correlations between stats
- Build a predictive model for player ratings
- Visualize everything interactively
So I built one.
Data Generation: Instead of using real data (hard to get), I wrote a generator that creates realistic synthetic data. Each position has different distributions:
- Forwards: More goals, fewer tackles
- Midfielders: High pass counts, moderate goals
- Defenders: High tackles, low shots
- Goalkeepers: Specialized stats
Analysis Pipeline:
- Descriptive stats: Means, distributions by position
- Correlation analysis: Which stats relate to each other?
- Regression: Can we predict player rating from other stats?
- Hypothesis testing: Do forwards actually score more than midfielders? (spoiler: yes)
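A minimal sketch of the first, second, and fourth pipeline steps, assuming a pandas DataFrame with one row per player-game. The column names, sample sizes, and distribution parameters are placeholders for illustration and are not taken from stats_stuff.py. The test shown is Welch's variant of the t-test, which does not assume equal variances.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical synthetic sample: 200 forward games and 200 midfielder games.
n = 200
df = pd.DataFrame({
    "position": ["Forward"] * n + ["Midfielder"] * n,
    "goals": np.concatenate([rng.poisson(0.6, n), rng.poisson(0.2, n)]),
    "pass_accuracy": np.concatenate([rng.normal(75, 6, n), rng.normal(85, 5, n)]),
})

# Descriptive stats: mean and standard deviation per position.
summary = df.groupby("position")[["goals", "pass_accuracy"]].agg(["mean", "std"])
print(summary)

# Correlation: Pearson r between two stats across all rows.
r, p_corr = stats.pearsonr(df["goals"], df["pass_accuracy"])
print(f"goals vs pass accuracy: r={r:.2f}, p={p_corr:.3g}")

# Hypothesis test: do forwards score more than midfielders?
fwd = df.loc[df["position"] == "Forward", "goals"]
mid = df.loc[df["position"] == "Midfielder", "goals"]
t_stat, p_val = stats.ttest_ind(fwd, mid, equal_var=False)  # Welch's t-test
print(f"forwards vs midfielders: t={t_stat:.2f}, p={p_val:.3g}")
```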
Dashboard: Built with Plotly Dash + Bootstrap. Interactive filtering by position, player, stat type.
The result is a fully interactive dashboard showing:
- Player performance trends over 20 games
- Position comparisons (box plots, bar charts)
- Statistical analysis (correlations, regression, hypothesis tests)
- Player categorization (star/good/average/needs improvement)
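The categorization could look roughly like this. It is only a sketch: it assumes each player has an average rating on a 0-10 scale, and the cut-offs (6, 7, 8) are hypothetical values chosen for illustration, not the ones the dashboard uses.

```python
import numpy as np
import pandas as pd


def categorize(avg_rating: pd.Series) -> pd.Series:
    """Bin average ratings into four tiers (cut-offs are hypothetical)."""
    bins = [-np.inf, 6.0, 7.0, 8.0, np.inf]
    labels = ["needs improvement", "average", "good", "star"]
    # right=False makes each bin closed on the left, so exactly 7.0 is "good".
    return pd.cut(avg_rating, bins=bins, labels=labels, right=False)


# Made-up players and ratings, just to show the call.
ratings = pd.Series({"Player A": 8.3, "Player B": 7.0, "Player C": 6.4, "Player D": 5.1})
print(categorize(ratings))
```

Using open-ended outer bins means a rating outside the expected range still gets a label instead of becoming NaN.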
Key Findings from the Synthetic Data:
- Pass accuracy correlates with rating (r ≈ 0.4)
- Forwards score significantly more than other positions (p < 0.001, obviously)
- A simple linear regression can predict rating with R² ≈ 0.6 from just 5 stats
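For reference, a regression of this kind can be sketched with scikit-learn as follows. The five predictors, their coefficients, and the noise level are all assumptions made up for this example, so the R² it prints will not match the number above exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500

# Five hypothetical predictor stats (names and scales are assumptions).
X = np.column_stack([
    rng.poisson(0.3, n),    # goals
    rng.poisson(0.2, n),    # assists
    rng.normal(80, 7, n),   # pass accuracy (%)
    rng.poisson(2.0, n),    # tackles
    rng.normal(10, 1, n),   # distance covered (km)
])

# Made-up rating: a linear signal plus noise, so the fit cannot be perfect.
true_coefs = np.array([0.8, 0.5, 0.03, 0.1, 0.1])
rating = 3.0 + X @ true_coefs + rng.normal(0, 0.5, n)

# Hold out a test set so R² measures prediction, not memorization.
X_train, X_test, y_train, y_test = train_test_split(
    X, rating, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"held-out R²: {r2:.2f}")
```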
# Setup
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Launch dashboard
python my_football_dashboard.py
# Or run statistical analysis only
python stats_stuff.py
After launching the dashboard, open http://127.0.0.1:8050 in your browser.
FootStats/
├── my_football_dashboard.py # Interactive Dash dashboard (main entry point)
├── stats_stuff.py # Statistical analysis module
├── requirements.txt # Dependencies
└── README.md # This file
Statistics:
- T-tests for comparing groups (forwards vs midfielders)
- ANOVA for comparing multiple groups
- Pearson correlation for linear relationships
- Linear regression for prediction
- Effect size matters, not just p-values
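To make the last point concrete, here is a small sketch that runs a one-way ANOVA across three positions and then computes Cohen's d for one pair. The Poisson means and sample sizes are invented for the example, and `cohens_d` is a helper written here, not a SciPy function.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical goals per game for three positions (parameters are made up).
forwards = rng.poisson(0.6, 150)
midfielders = rng.poisson(0.25, 150)
defenders = rng.poisson(0.05, 150)

# One-way ANOVA: is at least one group mean different from the others?
f_stat, p_anova = stats.f_oneway(forwards, midfielders, defenders)
print(f"ANOVA: F={f_stat:.1f}, p={p_anova:.3g}")


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Effect size: how large the forward/midfielder gap is in standard deviations.
d = cohens_d(forwards, midfielders)
print(f"Cohen's d (forwards vs midfielders): {d:.2f}")
```

With enough games, even a tiny difference produces a small p-value, so d is the number that says whether the difference is worth caring about.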
Dash/Plotly:
- Callbacks for interactivity
- Layout management with Bootstrap
- Multi-axis charts for different scales
Data Generation:
- Poisson distribution for count data (goals, assists)
- Normal distribution for continuous stats (pass %, km run)
- Position-specific parameters make data realistic
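The three ideas above fit together roughly like this. It is a sketch under assumptions: the parameter table, the column names, and the `generate_games` function are illustrative and are not copied from the project, and goalkeepers are left out to keep it short.

```python
import numpy as np
import pandas as pd

# Hypothetical per-position parameters: a Poisson mean for each count stat,
# and a (mean, std) pair for each normally distributed stat.
POSITION_PARAMS = {
    "Forward":    {"goals": 0.6,  "tackles": 0.8, "pass_pct": (75, 6), "km_run": (9.5, 0.8)},
    "Midfielder": {"goals": 0.2,  "tackles": 2.0, "pass_pct": (85, 5), "km_run": (11.0, 0.8)},
    "Defender":   {"goals": 0.05, "tackles": 3.0, "pass_pct": (82, 5), "km_run": (9.8, 0.7)},
}


def generate_games(position, n_games=20, seed=None):
    """Return one synthetic row per game for a single player."""
    rng = np.random.default_rng(seed)
    p = POSITION_PARAMS[position]
    pass_mean, pass_std = p["pass_pct"]
    km_mean, km_std = p["km_run"]
    return pd.DataFrame({
        "position": position,
        "game": np.arange(1, n_games + 1),
        # Count data: Poisson keeps values as non-negative integers.
        "goals": rng.poisson(p["goals"], n_games),
        "tackles": rng.poisson(p["tackles"], n_games),
        # Continuous data: normal draws, clipped to stay in a valid range.
        "pass_pct": np.clip(rng.normal(pass_mean, pass_std, n_games), 0, 100),
        "km_run": np.clip(rng.normal(km_mean, km_std, n_games), 0, None),
    })


print(generate_games("Forward", seed=0).head())
```

Passing a seed makes a run reproducible, which helps when comparing analysis results between runs.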
Ideas for Future Work:
- Real Data: Integrate with an API like Football-data.org or scrape real stats
- More Models: Try Random Forest or XGBoost for better prediction
- Time Series: Add trend analysis (players improving/declining over season)
- Team Analysis: Not just individual players, but team-level patterns
- Deployment: Deploy to Render or similar for public access
Requirements:
- Python 3.8+
- Dash, Plotly, scikit-learn, scipy
See requirements.txt for pinned versions.
Student project - built for learning data science and visualization.
