Predicting NHL Game Outcomes

A machine learning practice project
By Josh Crowhurst
March 2023


The challenge: create a model that predicts NHL game outcomes and performs on par with other NHL prediction models

Overview of Project 🏒

 

Input Data

Data used:

  • Source 1: Game data pulled from the NHL API
  • Source 2: Elo data from FiveThirtyEight
  • Train/test: 2017-2022. Validation: Current season
  • 6807 total games in dataset

Initial hypotheses:

  • Playing on back-to-back nights reduces a team's chance of winning
  • Teams on win/loss streaks will extend their streaks
  • “Stronger” teams are likely to beat “weaker” teams
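
The third hypothesis can be made concrete with the standard Elo expected-score formula, which turns a rating gap into a win probability. This is a minimal sketch that assumes the conventional 400-point scale. The function name and the example ratings are illustrative, and any further adjustments in FiveThirtyEight's published ratings are not reproduced here.

```python
def elo_win_probability(team_elo: float, opponent_elo: float, scale: float = 400.0) -> float:
    """Standard Elo expected score for the team, given both ratings."""
    diff = team_elo - opponent_elo
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


# Illustrative ratings, not taken from the project's data.
p_even = elo_win_probability(1500, 1500)  # equal ratings
p_fav = elo_win_probability(1550, 1500)   # team rated 50 points higher
print(round(p_even, 3), round(p_fav, 3))
```

Equal ratings give a 50% chance, and the probability rises smoothly as the gap widens, which is why the Elo differential is a natural single-number summary of "stronger" versus "weaker".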

Exploratory Data Analysis

EDA helped inform feature engineering and decide which features to include:

  • ✅ Home or away flag
  • ✅ Tired / rested flag
  • ✅ Elo differential between team and opponent
  • ✅ Recent short-term win rate (3- and 7-game windows)
  • ❌ Trends in special teams, faceoff %, shot differential
  • ❌ Blowout loss in previous game flag
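
The four included features could be derived roughly as follows. This is a sketch under stated assumptions rather than the project's actual code: the `Game` record and its field names are hypothetical, "tired" is assumed to mean the team also played on the previous calendar day, and win rates use only earlier games so the current result does not leak into its own features.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Game:
    # Hypothetical record layout for one team's game
    day: date
    is_home: bool
    team_elo: float
    opp_elo: float
    won: bool


def win_rate(prior_games, window):
    """Share of wins over the last `window` earlier games; None if there are none."""
    recent = prior_games[-window:]
    return sum(g.won for g in recent) / len(recent) if recent else None


def build_features(games):
    """One feature row per game for a single team, in chronological order."""
    games = sorted(games, key=lambda g: g.day)
    rows = []
    for i, g in enumerate(games):
        prior = games[:i]  # earlier games only
        played_yesterday = bool(prior) and g.day - prior[-1].day == timedelta(days=1)
        rows.append({
            "is_home": int(g.is_home),
            "is_tired": int(played_yesterday),  # assumed definition of "tired"
            "elo_diff": g.team_elo - g.opp_elo,
            "win_rate_3": win_rate(prior, 3),
            "win_rate_7": win_rate(prior, 7),
        })
    return rows


# Made-up schedule for illustration only
schedule = [
    Game(date(2023, 1, 10), True, 1520, 1490, True),
    Game(date(2023, 1, 12), False, 1523, 1510, False),
    Game(date(2023, 1, 13), False, 1519, 1480, True),
    Game(date(2023, 1, 15), True, 1522, 1530, True),
]
features = build_features(schedule)
for row in features:
    print(row)
```

The first game of a season has no history, so its win rates come back as `None` here; how to fill such gaps is a modelling choice this sketch leaves open.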

EDA also inspired some additional features to include in future:

  • Personnel-related features: backup goaltenders, injuries and absences
  • More robust “tiredness” proxies: kilometres travelled and number of games played over the previous 4 calendar days
  • More opposing team features: short-term win rate, tired / rested, personnel features

Modelling with tidymodels

The tidymodels framework simplifies and aligns the syntax of various popular machine learning packages. It was useful here for training and evaluating multiple algorithms with minimal configuration.
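
The idea of scoring several candidates through one common interface can be illustrated outside R as well. The sketch below is plain Python, not tidymodels, and it does not train anything: the three candidates are hypothetical fixed rules (the 0.55 home-ice probability is an arbitrary placeholder), and the rows and outcomes are made up. It only shows the shape of an evaluation loop that treats every candidate the same way.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the observed outcomes."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def accuracy(y_true, y_prob):
    """Share of games where the side given >= 50% actually won."""
    return sum((p >= 0.5) == bool(y) for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical candidates: each maps a feature row to a win probability.
candidates = {
    "coin_flip": lambda row: 0.5,
    "home_team": lambda row: 0.55 if row["is_home"] else 0.45,
    "elo_only": lambda row: 1 / (1 + 10 ** (-row["elo_diff"] / 400)),
}


def evaluate(candidates, rows, outcomes):
    """Score every candidate on the same rows with the same metrics."""
    results = {}
    for name, predict in candidates.items():
        probs = [predict(r) for r in rows]
        results[name] = {
            "accuracy": accuracy(outcomes, probs),
            "log_loss": log_loss(outcomes, probs),
        }
    return results


# Made-up evaluation data for illustration only
rows = [
    {"is_home": 1, "elo_diff": 40.0},
    {"is_home": 0, "elo_diff": -25.0},
    {"is_home": 1, "elo_diff": -10.0},
    {"is_home": 0, "elo_diff": 60.0},
]
outcomes = [1, 0, 1, 1]
results = evaluate(candidates, rows, outcomes)
for name, metrics in results.items():
    print(name, metrics)
```

Reporting log loss alongside accuracy matters for a later betting use case, because it rewards well-calibrated probabilities rather than just correct picks.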


The next natural question: can it be used for profitable sports betting?

More work needed to take this beyond a learning exercise

I used odds data from bettingdata.com to backtest a simple strategy: wagering $100 any time the model suggested a positive expected value on a bet during the season to date.

This strategy was not profitable, with a cumulative loss of 10.5% ($2,538) on 241 bets.
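
The flat-stake rule described above can be sketched as follows. This is an illustration under assumptions, not the backtest actually run: odds are assumed to be in decimal format, the function names are hypothetical, and the sample bets are made up. Return is taken as profit divided by total amount staked, which appears consistent with the figures above ($2,538 lost on 241 × $100 staked is about 10.5%).

```python
def expected_value(p_win, decimal_odds, stake=100.0):
    """Expected profit: win (odds - 1) * stake with probability p_win, else lose the stake."""
    return p_win * (decimal_odds - 1.0) * stake - (1.0 - p_win) * stake


def backtest(bets, stake=100.0):
    """Place a flat stake on every bet whose model-implied expected value is positive.

    bets: iterable of (p_win, decimal_odds, won) tuples.
    """
    n_bets, profit = 0, 0.0
    for p_win, odds, won in bets:
        if expected_value(p_win, odds, stake) > 0:
            n_bets += 1
            profit += (odds - 1.0) * stake if won else -stake
    roi = profit / (n_bets * stake) if n_bets else 0.0
    return {"n_bets": n_bets, "profit": profit, "roi": roi}


# Made-up bets for illustration only; not the project's odds data.
sample = [
    (0.60, 1.80, True),   # positive EV, bet placed, wins
    (0.45, 2.10, False),  # negative EV, skipped
    (0.55, 2.00, False),  # positive EV, bet placed, loses
    (0.40, 2.70, True),   # positive EV, bet placed, wins
]
summary = backtest(sample)
print(summary)
```

A bet only shows positive expected value when the model's probability exceeds the probability implied by the odds, so the strategy's results depend directly on how well calibrated the model is.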

A few next steps to make this usable for sports betting:

  • Improve model with additional features
  • Test additional algorithms; neural network was not used due to technical difficulties 💔
  • Try more realistic betting strategies, such as variable bet sizing or using the model as a decision support system

Thanks for reading