I have been promising some people that this (my first) post will be “coming soon” for quite some time now. It has finally arrived and the main reasons for my slowness are:
I am a slow writer. I often have all the ideas in my head, but when it comes to putting them into nice, publishable words, it has a tendency to become a bit of a slog.
The scope of this project was overly ambitious and expanded as I kept on branching off from the main mission in order provide additional background (when given free reign, this is my tendency).
The R code and analysis itself did not take too long, so the main areas I am looking to improve in the future are my writing efficiency and keeping the scope of my blog posts under control. However, it would feel like going against my style to avoid going down rabbit holes and off on tangents entirely, so there is still a balance to be struck there.
The out-of-control scope has also impacted the length of this article. I commend anyone who actually reads the whole thing, but hopefully it’s also possible to jump around and only read the parts that interest you in particular using the table of contents. I will endeavour to keep future blog posts a lot shorter going forward.
Thanks to everyone who has provided feedback and suggestions.
Fixed some grammatical, punctuation and spelling errors
Re-worded the question concerning goal kickers to include the words “for a team in a single game” instead of merely “in a single game”. I decided to change the question to suit the answer, instead of the reverse (this is because changing the answers would mean materially modifying the content of the article, which I didn’t want to do given the vast majority of the people who would ever read it, have done so already)
Prelude
Since antiquity (or at the very least living memory), sporting data has been recorded, published and analysed for almost every professional sport known to man. In the modern day of analytics and social media it has reached a point where sports statistics are constantly being recorded and opined on by teams, the press and your average punter alike.
My beloved sport of Australian Rules Football (which I will henceforth refer to as “footy”1) is no different. Indeed some have even described the AFL (Australia’s national competition) as “the most data rich sport on Earth”2, although I would suggest that certain American sports such as baseball (i.e. Moneyball) have made far better use of their data.
1 As a Western Australian this is how I define it (along with the majority of Australia) but I am aware that this term is reserved for rugby league in New South Wales and Queensland
2 Source
It is my observation that it is common for those watching or attending live footy to ask questions such as:
What is the record for the most [statistical category]?
or
When was the last time [obscure event] happened?
I believe a contributor to this is that we are emulating what we hear on the broadcast commentary. The difference is that they often have a team of computer-type boffins3 behind the scenes feeding them the answer.
3 The video linked here is unfortunately clipped from an American guy who kind of missed the point a bit, but it is also the highest quality clip I could find of this brilliant piece of commentary from BT (Brian Taylor for the uninitiated).
But what are us plebeians meant to do, bereft of such luxuries?
you might ask…
Well, a quick Google search will make short work of questions of a more trivial nature such as “which player has kicked the most career goals”4 or “which team has won the most premierships”5. The more savvy among us may find answers to slightly more edifying questions by performing pro gamer moves such as
Trawling through more obscure websites such as AFL Tables6 to answer things like “What is the most disposals Zac Dawson had in a game”7; or
Digging into the deep recesses of the AFL Live App to answer questions like “What is the record for the longest distance run in a game”8
6 the baseball/basketball reference of the footy world (but maybe not quite as extensive)
8 Tom Scully, 18.9 KM
Even so, some more sophisticated questions will still go unanswered.
However, if you are a gadget-type operator9 like myself, you will expand the number of footy stats questions you can answer immensely by accessing and manipulating the raw data yourself. There are, of course, a multitude of tools and approaches to this, but in this post, I will be using R (my preferred programming language)
9 My footy tipping username is Gadget-type Operator and I often use other BT quotes for my username on other (even non-footy-related) accounts
Target Audience Unclear
If you are a footy fan this article will likely present as some mildy interesting footy facts, combined with incomprehensible techno-babble. For R users this will likely appear to be a fairly elementary data wrangling exercise, combined with a bunch of references you don’t understand. But if you are both a footy fan and an R user, it will hopefully prove to be a quite interesting read.
A concrete example
I recently had the misfortune (as a West Coast fan) of attending a game live between the Sydney Swans and the (not so) mighty West Coast Eagles with the following score line (the equal fourth highest margin of all time):
In one-sided games like this, it seems to me that footy stats questions become more common than usual for two key reasons:
It adds something interesting to a game that otherwise lacks excitement
These games are often filled with large statistical anomalies that might set new records
As my usual footy-going companion (Saroop) and I are both actuaries by trade, footy stats questions were flying left, right, and centre on that gloomy (though not due to the weather) Saturday night at the SCG. The questions we posed did not just vanish into the aether either, I ,with the idea of writing this article in mind, decided to record the more interesting (and doable) questions.
Question List
This article aims to tackle the questions listed below:
- What is the record for the highest scoring quarter?
- What is the record for the most individual goal kickers for a team in a single game?
- What is the record for the most multiple goal kickers for a team in a single game?
- What is the record for the most players kicking five or more goals for a team in a single game (i.e. the most “bags”)?
- What is the record for the most clangers in a game?
- What is the record for the worst disposal efficiency in a game?
- Who was the youngest player to win a Norm Smith Medal?
- What is the record for the most unanswered goals in a game?*
- What is the school with the most AFL players on their list?*
- What city/town has the most AFL players relative to population?*
- Which player has the best goals to behinds ratio?*
- Have there ever been any undefeated seasons?*
- What is the worst win-loss record to make finals?*
I have put an asterisk next to questions that either:
- cannot easily be answered by the methods I discuss below; or
- are too lengthy for this blog post (these may get their own dedicated blog post in the future).
The remaining questions (which have hyperlinks to later sections of this article) will be tackled using R below.
Note that things are about to get very technical so if you are only really interested in the answers (and not the R coding), you can jump ahead by clicking here.
Technical Background
The fitzRoy
Package
The first step in analysing AFL data is obtaining the data (the so-called “collection” phase). Our first thought might be to search the web for publicly available AFL datasets and APIs or even scrape the data from websites such as the official AFL website, Footywire or AFL Tables. But there is a more straight-forward way.
While most people now know Fitzroy as a trendy inner city suburb of Melbourne, filled with terraces and over-priced croissants, it was once home10 to the mighty (and now merged out of existence) Fitzroy Lions Football Club.
10 It is also (regrettably) the place of my birth but as a WA boy I don’t like to talk about the fact that my parents happened to be in Melbourne when I was born.
We can obtain the data we need very simply using its name-sake, the fitzRoy
R package. This package abstracts away all the web scraping and API calls for us into a very helpful family of fetch_*
functions.
So let’s begin by loading the fitzRoy
package and while we’re at it, we will also load all the other packages we will be using later on.
fitzRoy
Data Sources
fitzRoy
provides access to a number of footy data sources11 including AFL Tables and the official AFL website. Each data source has its own advantages and disadvantages, for example:
11 Up-to-date information on data sources can be found on fitzRoy
’s documentation site
AFL Tables has the entirety of AFL/VFL history (1897 to present) but lacks some of the more advanced stats.
The official AFL website only has data from 2014 onwards but it also probably the most complete in terms of the advanced statistics it contains (e.g. centre bounce attendances12).
12 Centre bounce attendances (CBAs) are a commonly-used metric in AFL Fantasy, fantasy “coaches” often look at tools such as this one to help with researching their trades.
All of the different data sources are compared in the table below:
Note that each row of the table can be expanded to reveal what data is available from each source, as well at its use-case. In addition to the sources listed in this table, the following functions only come from one source:
Importing the Data
For the purposes of answering the questions above, I am most interested in the full history of the AFL and have decided to use AFL Tables as my primary data source13. I will also use Fryzigg for one small use-case where AFL Tables is missing key data (disposal efficiency) and some bespoke web scraping for Norm Smith Medallists.
13 In hindsight I somewhat regret this decision and would have probably preferred to use Fryzigg for everything (with the exception of quarter scores which it doesn’t have and AFL Tables does) but I only realised it had the full AFL/VFL history when I constructed the table comparing data sources above
The fetch_*
family of functions from the fitzRoy
package allow us to read data from the various sources. Consult the documentation site for a complete list of all the available functions.
We can fetch this data via fitzRoy
with the following code:
player_stats <- fetch_player_stats_afltables(season = 1897:2023)
results <- fetch_results_afltables(season = 1897:2023)
player_details <- fetch_player_details_afltables()
player_stats_fryzigg <- fetch_player_stats_fryzigg(season = 1897:2023)
When sourcing data from fitzRoy
, it is important to follow good data collection14 etiquette by only downloading the data you need and avoiding repeatedly downloading the same data over and over again. This prevents servers being overloaded and will mean everyone will get their data faster.
In keeping with this, for the purposes of this blog post, I have saved the data in a local RDS file. That way, I can simply use readRDS()
instead of of repeatedly calling the fetch_*
functions. The code for this is below (and the code above is not actually run but is cleaner for demonstration purposes):
if(file.exists("data/player_stats.RDS")) {
player_stats <- readRDS("data/player_stats.RDS")
} else {
player_stats <- fitzRoy::fetch_player_stats_afltables(season = 1897:2023)
saveRDS(player_stats, "data/player_stats.RDS")
}
if(file.exists("data/results.RDS")) {
results <- readRDS("data/results.RDS")
} else {
results <- fitzRoy::fetch_results_afltables(season = 1897:2023)
saveRDS(results, "data/results.RDS")
}
if(file.exists("data/player_details.RDS")) {
player_details <- readRDS("data/player_details.RDS")
} else {
player_details <- fetch_player_details_afltables()
saveRDS(player_details, "data/player_details.RDS")
}
if(file.exists("data/player_stats_fryzigg.RDS")) {
player_stats_fryzigg <- readRDS("data/player_stats_fryzigg.RDS")
} else {
player_stats_fryzigg <- fetch_player_stats_fryzigg(season = 1897:2023)
saveRDS(player_stats_fryzigg, "data/player_stats_fryzigg.RDS")
}
14 this topic is discussed on the fitzRoy
documentation site here
The data we have read in is as at round 19 of the 2023 AFL season.
Finicky Details About Other R Packages
Tidyverse Versus data.table
In the R community, there is an ongoing power struggle between using the Posit15-backed tidyverse and the heavily-optimised data.table
.
16 Note that I have not written a base
R dataframes version because I can see arguments for using both tidyverse and data.table
but base
R data.frame
s will probably cause more pain than they are worth (there is a reason that tidyverse and data.table
exist)
17 I will typically will only use data.table
if the size of data necessitates it. In this case, the data is less than a million rows so there are no problems.
As to not unsettle people who prefer either dplyr
(and the tidyverse) or data.table
, I have written code in both packages16. Where relevant, I have used a tabbed layout for the convenience of the reader. As my personal preference for readability purposes is the tidyverse17, I will place this code in the first tab.
While doing things in this way did scratch something of a perfectionist’s itch in me and was a fun learning experience, I will probably refrain from doing something like this again in future posts. I don’t think the additional time it took me to essentially write the same code twice is worth the effort.
Note that the code below is somewhat redundant as the fitzRoy
package follows the tidyverse philosophy and returns tibbles. I have used the _tb
suffix18 to distinguish tibble
/dplyr
/tidyverse from the data.table
code.
18 an abbreviation of “tibble”
Henceforth, all data.table
code will use the _dt
suffix19 as to distinguish it from the tidyverse code.
19 an acronym of “data.table”
player_stats_dt <- as.data.table(player_stats)
results_dt <- as.data.table(results)
player_details_dt <- as.data.table(player_details)
player_stats_fryzigg_dt <- as.data.table(player_stats_fryzigg)
Adoption of the Native Pipe Operator (|>
)
The so-called pipe operator (%>%
) of the magrittr
package has been a core staple of tidyverse since its inception, but since the R core team introduced the so-called native pipe (|>
) to base
R (in version 4.120), this has led to a split in adoption. There are some nuances in its usage21 but it overall behaves in a similar way to the magrittr
pipe and has less overhead (and is therefore faster). While the native pipe was initially missing some of the key features of the magrittr
pipe, new features22 have been added to it that (in my mind) mean that it might have even surpassed the magrittr
pipe.
20 another cool thing introduced in this version of R was so-called function shorthand (\()
), see help("function")
for more details
21 I may even cover these in a future blog post
22 In R version 4.2, the _
symbol was added as a placeholder character and in R version 4.3, extractions using the $
symbol are now allowed
While I have tried to appease people in both the tidyverse and data.table
camps, I will not be re-writing my code more than once with such as minor syntactic difference as the pipe I use. I will therefore be dragging all my tidyverse-using readers kicking and screaming into the R 4.1 world by adopting the native pipe (|>
) in my tidyverse code.
Note that the common RStudio shortcut, Ctrl+Shift+M
can be changed from the magrittr
pipe (%>%
), which is still the default, to the native pipe (|>
).
Webscraping package
While the majority of our data will be sourced using the fitzRoy
package, a small amount of data (namely Norm Smith medalists, which are outside of the scope of fitzRoy
) will require us to perform some bespoke web scraping. This will be performed using the rvest
package (loaded above).
Preliminary Data Wrangling
Flattening the Data
To begin with, let’s scrutinise the results data in order to figure out what we have to work with.
str(results)
tibble [16,352 × 16] (S3: tbl_df/tbl/data.frame)
$ Game : num [1:16352] 1 2 3 4 5 6 7 8 9 10 ...
$ Date : Date[1:16352], format: "1897-05-08" "1897-05-08" ...
$ Round : chr [1:16352] "R1" "R1" "R1" "R1" ...
$ Home.Team : chr [1:16352] "Fitzroy" "Collingwood" "Geelong" "Sydney" ...
$ Home.Goals : int [1:16352] 6 5 3 3 6 4 3 9 6 5 ...
$ Home.Behinds: int [1:16352] 13 11 6 9 4 6 8 10 5 9 ...
$ Home.Points : int [1:16352] 49 41 24 27 40 30 26 64 41 39 ...
$ Away.Team : chr [1:16352] "Carlton" "St Kilda" "Essendon" "Melbourne" ...
$ Away.Goals : int [1:16352] 2 2 7 6 5 8 10 3 5 7 ...
$ Away.Behinds: int [1:16352] 4 4 5 8 6 2 6 1 7 8 ...
$ Away.Points : int [1:16352] 16 16 47 44 36 50 66 19 37 50 ...
$ Venue : chr [1:16352] "Brunswick St" "Victoria Park" "Corio Oval" "Lake Oval" ...
$ Margin : int [1:16352] 33 25 -23 -17 4 -20 -40 45 4 -11 ...
$ Season : num [1:16352] 1897 1897 1897 1897 1897 ...
$ Round.Type : chr [1:16352] "Regular" "Regular" "Regular" "Regular" ...
$ Round.Number: int [1:16352] 1 1 1 1 2 2 2 2 3 3 ...
While inspecting the results
we may note that certain key match-level information (e.g. quarter-by-quarter scores) for answering some of our questions is missing from it. As it turns out, this data is actually available on the player_stats_afl_tables
data (one row per player per match) instead. Thus, we will opt to create a ‘flattened’ version of player_stats_afl_tables
with all the match-level fields available to use on both datasets and discard the results
dataset (save for some quick checks to make sure the player data ‘flattening’ worked as expected).
Now, let’s take a look at the player_stats_afl_tables
dataset to determine which fields are player-level and which are match-level.
str(player_stats)
Note that the output has been placed into another tab as it is rather long.
tibble [663,115 × 59] (S3: tbl_df/tbl/data.frame)
$ Season : num [1:663115] 1897 1897 1897 1897 1897 ...
$ Round : chr [1:663115] "1" "1" "1" "1" ...
$ Date : Date[1:663115], format: "1897-05-08" "1897-05-08" ...
$ Local.start.time : int [1:663115] 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 ...
$ Venue : chr [1:663115] "Brunswick St" "Brunswick St" "Brunswick St" "Brunswick St" ...
$ Attendance : num [1:663115] 3000 3000 3000 3000 3000 3000 3000 3000 3000 3000 ...
$ Home.team : chr [1:663115] "Fitzroy" "Fitzroy" "Fitzroy" "Fitzroy" ...
$ HQ1G : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ HQ1B : int [1:663115] 5 5 5 5 5 5 5 5 5 5 ...
$ HQ2G : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ HQ2B : int [1:663115] 11 11 11 11 11 11 11 11 11 11 ...
$ HQ3G : int [1:663115] 5 5 5 5 5 5 5 5 5 5 ...
$ HQ3B : int [1:663115] 13 13 13 13 13 13 13 13 13 13 ...
$ HQ4G : int [1:663115] 6 6 6 6 6 6 6 6 6 6 ...
$ HQ4B : int [1:663115] 13 13 13 13 13 13 13 13 13 13 ...
$ Home.score : int [1:663115] 49 49 49 49 49 49 49 49 49 49 ...
$ Away.team : chr [1:663115] "Carlton" "Carlton" "Carlton" "Carlton" ...
$ AQ1G : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ AQ1B : int [1:663115] 3 3 3 3 3 3 3 3 3 3 ...
$ AQ2G : int [1:663115] 1 1 1 1 1 1 1 1 1 1 ...
$ AQ2B : int [1:663115] 3 3 3 3 3 3 3 3 3 3 ...
$ AQ3G : int [1:663115] 2 2 2 2 2 2 2 2 2 2 ...
$ AQ3B : int [1:663115] 3 3 3 3 3 3 3 3 3 3 ...
$ AQ4G : int [1:663115] 2 2 2 2 2 2 2 2 2 2 ...
$ AQ4B : int [1:663115] 4 4 4 4 4 4 4 4 4 4 ...
$ Away.score : int [1:663115] 16 16 16 16 16 16 16 16 16 16 ...
$ First.name : chr [1:663115] "Bill" "Jimmy" "Bob" "Tom" ...
$ Surname : chr [1:663115] "Ahern" "Aitken" "Armstrong" "Blake" ...
$ ID : num [1:663115] 4415 4416 4417 4419 4421 ...
$ Jumper.No. : chr [1:663115] "0" "0" "0" "0" ...
$ Playing.for : chr [1:663115] "Carlton" "Carlton" "Carlton" "Carlton" ...
$ Kicks : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Marks : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Handballs : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Goals : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Behinds : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Hit.Outs : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Tackles : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Rebounds : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Inside.50s : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Clearances : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Clangers : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Frees.For : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Frees.Against : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Brownlow.Votes : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Contested.Possessions : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Uncontested.Possessions: num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Contested.Marks : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Marks.Inside.50 : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ One.Percenters : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Bounces : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Goal.Assists : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Time.on.Ground.. : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Substitute : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...
$ Umpire.1 : chr [1:663115] "Samuel Hood" "Samuel Hood" "Samuel Hood" "Samuel Hood" ...
$ Umpire.2 : chr [1:663115] "" "" "" "" ...
$ Umpire.3 : chr [1:663115] "" "" "" "" ...
$ Umpire.4 : chr [1:663115] "" "" "" "" ...
$ group_id : int [1:663115] 2 2 2 2 2 2 2 2 2 2 ...
Inspecting the fields and using some knowledge of the game, we can determine that the following fields are player-level:
player_level_fields <- c(
"First.name", "Surname", "ID", "Jumper.No.", "Playing.for", "Kicks", "Marks",
"Handballs", "Goals", "Behinds", "Hit.Outs", "Tackles", "Rebounds", "Inside.50s",
"Clearances", "Clangers", "Frees.For", "Frees.Against", "Brownlow.Votes",
"Contested.Possessions", "Uncontested.Possessions", "Contested.Marks",
"Marks.Inside.50", "One.Percenters", "Bounces", "Goal.Assists", "Time.on.Ground..",
"Substitute"
)
match_level_fields <- setdiff(colnames(player_stats), player_level_fields)
We can now safely group and aggregate by the match_level_fields
below:
player_stats_tb |>
mutate(
home_player = Playing.for == Home.team,
away_player = Playing.for == Away.team
) |>
group_by(pick(all_of(match_level_fields))) |>
summarise(
player_count = n(),
home_kicks = sum(Kicks * home_player),
away_kicks = sum(Kicks * away_player),
home_marks = sum(Marks * home_player),
away_marks = sum(Marks * away_player),
home_handballs = sum(Handballs * home_player),
away_handballs = sum(Handballs * away_player),
home_hit_outs = sum(Hit.Outs * home_player),
away_hit_outs = sum(Hit.Outs * away_player),
home_tackles = sum(Tackles * home_player),
away_tackles = sum(Tackles * away_player),
home_rebounds = sum(Rebounds * home_player),
away_rebounds = sum(Rebounds * away_player),
home_inside_50s = sum(Inside.50s * home_player),
away_inside_50s = sum(Inside.50s * away_player),
home_clearances = sum(Clearances * home_player),
away_clearances = sum(Clearances * away_player),
home_clangers = sum(Clangers * home_player),
away_clangers = sum(Clangers * away_player),
home_frees_for = sum(Frees.For * home_player),
away_frees_for = sum(Frees.For * away_player),
home_frees_against = sum(Frees.Against * home_player),
away_frees_against = sum(Frees.Against * away_player),
home_contested_possessions = sum(Contested.Possessions * home_player),
away_contested_possessions = sum(Contested.Possessions * away_player),
home_uncontested_possessions = sum(Uncontested.Possessions * home_player),
away_uncontested_possessions = sum(Uncontested.Possessions * away_player),
home_contested_marks = sum(Contested.Marks * home_player),
away_contested_marks = sum(Contested.Marks * away_player),
home_marks_inside_50 = sum(Marks.Inside.50 * home_player),
away_marks_inside_50 = sum(Marks.Inside.50 * away_player),
home_one_percenters = sum(One.Percenters * home_player),
away_one_percenters = sum(One.Percenters * away_player),
home_bounces = sum(Bounces * home_player),
away_bounces = sum(Bounces * away_player),
home_goal_assists = sum(Goal.Assists * home_player),
away_goal_assists = sum(Goal.Assists * away_player),
.groups = "drop"
) |>
arrange(Date, Local.start.time, Home.team) ->
match_stats_flat_tb
# verify correct number of games:
nrow(match_stats_flat_tb) == nrow(results_tb)
[1] TRUE
match_stats_flat_dt <- copy(player_stats_dt)
match_stats_flat_dt[, home_player := Playing.for == Home.team]
match_stats_flat_dt[, away_player := Playing.for == Away.team]
match_stats_flat_dt <- match_stats_flat_dt[, .(
player_count = .N,
home_kicks = sum(Kicks * home_player),
away_kicks = sum(Kicks * away_player),
home_marks = sum(Marks * home_player),
away_marks = sum(Marks * away_player),
home_handballs = sum(Handballs * home_player),
away_handballs = sum(Handballs * away_player),
home_hit_outs = sum(Hit.Outs * home_player),
away_hit_outs = sum(Hit.Outs * away_player),
home_tackles = sum(Tackles * home_player),
away_tackles = sum(Tackles * away_player),
home_rebounds = sum(Rebounds * home_player),
away_rebounds = sum(Rebounds * away_player),
home_inside_50s = sum(Inside.50s * home_player),
away_inside_50s = sum(Inside.50s * away_player),
home_clearances = sum(Clearances * home_player),
away_clearances = sum(Clearances * away_player),
home_clangers = sum(Clangers * home_player),
away_clangers = sum(Clangers * away_player),
home_frees_for = sum(Frees.For * home_player),
away_frees_for = sum(Frees.For * away_player),
home_frees_against = sum(Frees.Against * home_player),
away_frees_against = sum(Frees.Against * away_player),
home_contested_possessions = sum(Contested.Possessions * home_player),
away_contested_possessions = sum(Contested.Possessions * away_player),
home_uncontested_possessions = sum(Uncontested.Possessions * home_player),
away_uncontested_possessions = sum(Uncontested.Possessions * away_player),
home_contested_marks = sum(Contested.Marks * home_player),
away_contested_marks = sum(Contested.Marks * away_player),
home_marks_inside_50 = sum(Marks.Inside.50 * home_player),
away_marks_inside_50 = sum(Marks.Inside.50 * away_player),
home_one_percenters = sum(One.Percenters * home_player),
away_one_percenters = sum(One.Percenters * away_player),
home_bounces = sum(Bounces * home_player),
away_bounces = sum(Bounces * away_player),
home_goal_assists = sum(Goal.Assists * home_player),
away_goal_assists = sum(Goal.Assists * away_player)
), by = match_level_fields]
setorder(match_stats_flat_dt, Date, Local.start.time, Home.team)
# verify outputs match:
identical(as.data.frame(match_stats_flat_tb), as.data.frame(match_stats_flat_dt))
[1] TRUE
Henceforth, player_stats_*
and match_stats_flat_*
will be the two datasets we will use predominantly.
IDs and URLs
One thing that our match_stats_flat_*
dataset is currently lacking is a game ID for use as a primary key. In addition, being able to link directly to AFL tables when talking about a particular game or player would be handy.
Game ID and URL
Let’s tackle the game ID by writing some functions to an ID which also conveniently lines up with the way AFL Tables game URLs work (two birds with one stone).
team_code_map <- c(
"Adelaide" = "01",
"Adelaide Crows" = "01",
"Brisbane Bears" = "02",
"Carlton" = "03",
"Collingwood" = "04",
"Essendon" = "05",
"Fitzroy" = "06",
"Western Bulldogs" = "07",
"Fremantle" = "08",
"Geelong" = "09",
"Geelong Cats" = "09",
"Hawthorn" = "10",
"Melbourne" = "11",
"North Melbourne" = "12",
"Port Adelaide" = "13",
"Richmond" = "14",
"St Kilda" = "15",
"Sydney" = "16",
"Sydney Swans" = "16",
"University" = "17",
"West Coast" = "18",
"West Coast Eagles" = "18",
"Brisbane Lions" = "19",
"Gold Coast" = "20",
"Gold Coast Suns" = "20",
"Greater Western Sydney" = "21",
"GWS Giants" = "21"
)
# The three functions below are all vectorised for efficiency purposes
get_team_code <- function(team_name) {
unname(team_code_map[team_name])
}
get_game_id <- function(home_team_code, away_team_code, game_date) {
# example ID: 161820230624
game_date_string <- format(game_date, "%Y%m%d")
ifelse(
home_team_code > away_team_code,
# the smaller code is always first
paste0(away_team_code, home_team_code, game_date_string),
paste0(home_team_code, away_team_code, game_date_string)
)
}
get_game_afltables_url <- function(game_id, season) {
# example url: https://afltables.com/afl/stats/games/2023/161820230624.html
paste0("https://afltables.com/afl/stats/games/", season,"/", game_id, ".html")
}
Now let’s use these functions23 to add a primary key to our match_stats_flat_*
datasets.
23 Note that as the functions are vectorised, we need not use the slow purrr::map*()
or *apply()
family of functions to apply them to a column of our tibble
and data.table
respectively.
match_stats_flat_tb |>
mutate(
home_team_code = get_team_code(Home.team),
away_team_code = get_team_code(Away.team),
game_id = get_game_id(home_team_code, away_team_code, Date),
game_afltables_url = get_game_afltables_url(game_id, Season)
) |>
relocate(game_id, .before = Season) |>
arrange(game_id) ->
match_stats_flat_tb
match_stats_flat_dt[, home_team_code := get_team_code(Home.team)]
match_stats_flat_dt[, away_team_code := get_team_code(Away.team)]
match_stats_flat_dt[, game_id := get_game_id(home_team_code, away_team_code, Date)]
match_stats_flat_dt[, game_afltables_url := get_game_afltables_url(game_id, Season)]
setcolorder(match_stats_flat_dt, c("game_id", setdiff(names(match_stats_flat_dt), "game_id")))
setkey(match_stats_flat_dt, game_id)
# verify outputs match:
identical(as.data.frame(match_stats_flat_tb), as.data.frame(match_stats_flat_dt))
[1] TRUE
Player URLs
In a similar way we can add a player URL to our player_stats_*
datasets, we start by creating a mapping table.
# non-duplicate URL: https://afltables.com/afl/stats/players/E/Errol_Gulden.html
# duplicate URL: https://afltables.com/afl/stats/players/J/Josh_Kennedy0.html, https://afltables.com/afl/stats/players/J/Josh_Kennedy1.html
# for dealing with duplicates, for example Peter Brown (6 of the same name!) seems to have a nonsensical order
player_stats_tb |>
mutate(full_name = paste(First.name, Surname, sep = "_")) |>
distinct(ID, full_name) |>
group_by(full_name) |>
arrange(ID) |>
mutate(
instance_number = as.character(cumsum(rep(1L, n())) - 1L),
dup_count = n()
) |>
mutate(
number_suffix = if_else(dup_count == 1L, "", instance_number),
first_letter = str_sub(full_name, 1, 1),
player_afltables_url = paste0("https://afltables.com/afl/stats/players/",
first_letter, "/", full_name, number_suffix, ".html")
) |>
ungroup() |>
select(ID, player_afltables_url) ->
player_url_tb
player_url_dt <- copy(player_stats_dt)
player_url_dt[, full_name := paste(First.name, Surname, sep = "_")]
player_url_dt <- unique(player_url_dt, by = c("ID", "full_name"))
setorder(player_url_dt, ID)
player_url_dt <- player_url_dt[, `:=`(
instance_number = as.character(cumsum(rep(1L, .N)) - 1L),
dup_count = .N
), "full_name"]
player_url_dt[, number_suffix := fifelse((dup_count == 1L), "", instance_number)]
player_url_dt[, first_letter := str_sub(full_name, 1, 1)]
player_url_dt[, player_afltables_url := paste0("https://afltables.com/afl/stats/players/",
first_letter, "/", full_name, number_suffix, ".html")]
player_url_dt <- player_url_dt[, .(ID, player_afltables_url)]
# verify outputs match:
identical(as.data.frame(player_url_tb), as.data.frame(player_url_dt))
[1] TRUE
Now we can add add the game ID, game URL and player URL to the player_stats_*
dataset.
player_stats_tb <- as_tibble(player_stats) # copied from above
player_stats_tb |>
mutate(
home_team_code = get_team_code(Home.team),
away_team_code = get_team_code(Away.team),
game_id = get_game_id(home_team_code, away_team_code, Date),
player = paste0(First.name, " ", Surname, " (", Playing.for,")")
) |>
left_join(match_stats_flat_tb |> select(game_id, game_afltables_url), by = "game_id") |>
left_join(player_url_tb, by = "ID") |>
relocate(c("game_id", "player", "ID"), .before = Season) |>
arrange(game_id, Playing.for, ID) ->
player_stats_tb
player_stats_dt <- as.data.table(player_stats) # copied from above
player_stats_dt[, home_team_code := get_team_code(Home.team)]
player_stats_dt[, away_team_code := get_team_code(Away.team)]
player_stats_dt[, game_id := get_game_id(home_team_code, away_team_code, Date)]
player_stats_dt[, player := paste0(First.name, " ", Surname, " (", Playing.for,")")]
player_stats_dt <- merge(
player_stats_dt, match_stats_flat_dt[, c("game_id", "game_afltables_url")],
by = "game_id")
player_stats_dt <- merge(player_stats_dt, player_url_dt, by = "ID")
setcolorder(player_stats_dt, c(c("game_id", "player"), setdiff(names(player_stats_dt), c("game_id", "player"))))
setkey(player_stats_dt, game_id, Playing.for, ID)
# verify outputs match:
identical(as.data.frame(player_stats_tb), as.data.frame(player_stats_dt))
[1] TRUE
Finding the Infamous Game
Let’s use these new datasets to perform the simple exercise of obtaining the game ID for the aforementioned Swans versus Eagles game. We can henceforth use this game ID whenever relevant to rank the Swans in the statistical category we investigate.
(
infamous_game_id <- get_game_id(
home_team_code = get_team_code("Sydney"),
away_team_code = get_team_code("West Coast"),
game_date = as.Date("2023-06-24")
)
)
[1] "161820230624"
We can then filter the data and present it below in a table below24.
24 note that the code to format the table is omitted.
Match Stats
match_stats_flat_tb |>
filter(game_id == infamous_game_id)
match_stats_flat_dt[game_id == infamous_game_id, ]
Player Stats
player_stats_tb |>
filter(game_id == infamous_game_id)
player_stats_dt[game_id == infamous_game_id, ]
Figuring Out the Answers
In this section I will provide my working using R for each of the aforementioned questions. Where relevant, I will figure out where the aforementioned infamous game places in the history of the AFL for that particular category.
The pathways I go down are only one of many permutations of stats you can look at and angles you can approach things from. The code I have written is also probably more thorough and well-presented than how I would typically do it. When I do this type of thing with no intent on publishing it, my data manipulations will generally be far more ad-hoc and expedient (I pay far less attention to reproducibility and consistent naming conventions).
Highest Scoring Quarter
As listed above, our first question was:
What is the record for the highest scoring quarter?
Note that there is already a page on this topic on AFL Tables, but it is a good one to start with regardless.
To answer this question, we will begin by creating a reshaped version of the match_stats_flat_*
dataset that is structured around quarters.
game_level_columns <- c("game_id", "game_afltables_url", "Season", "Round", "Venue", "Home.team", "Away.team", "Home.score", "Away.score")
match_stats_flat_tb |>
select(all_of(game_level_columns), starts_with("HQ"), starts_with("AQ")) |>
pivot_longer(cols = c(starts_with("HQ"), starts_with("AQ")), names_to = "quarter_gb", values_to = "gb_count") |>
mutate(
quarter = str_extract(quarter_gb, "\\d"),
gb_label = if_else(str_detect(quarter_gb, "G$"), "goals", "behinds"),
is_home_score = str_detect(quarter_gb, "^H")
) |>
pivot_wider(id_cols = all_of(c(game_level_columns, "quarter", "is_home_score")), names_from = gb_label, values_from = gb_count) |>
arrange(game_id, is_home_score, quarter) |>
group_by(game_id, is_home_score) |>
mutate(# make quarters incremental
goals = c(head(goals, 1), diff(goals)),
behinds = c(head(behinds, 1), diff(behinds))
) |>
ungroup() |>
mutate(
score = goals * 6 + behinds,
team = if_else(is_home_score, Home.team, Away.team),
opposition = if_else(!is_home_score, Home.team, Away.team)
) |>
select(-is_home_score) -> quarter_stats_tb
game_level_columns <- c("game_id", "game_afltables_url", "Season", "Round", "Venue", "Home.team", "Away.team", "Home.score", "Away.score")
quarter_stats_dt <- copy(match_stats_flat_dt)
quarter_stats_dt <- quarter_stats_dt[, .SD, .SDcols = names(quarter_stats_dt) %like% paste(
paste(game_level_columns, collapse = "|"), "^HQ", "^AQ",
sep = "|")]
quarter_stats_dt <- melt(quarter_stats_dt, id.vars = game_level_columns, variable.name = "quarter_gb", value.name = "gb_count")
quarter_stats_dt[, quarter := str_extract(quarter_gb, "\\d")]
quarter_stats_dt[, gb_label := fifelse(str_detect(quarter_gb, "G$"), "goals", "behinds")]
quarter_stats_dt[, is_home_score := str_detect(quarter_gb, "^H")]
quarter_stats_dt[, quarter_gb:=NULL]
quarter_stats_dt <- dcast(quarter_stats_dt, ... ~ gb_label, value.var = "gb_count")
# make quarters incremental
setorder(quarter_stats_dt, game_id, is_home_score, quarter)
quarter_stats_dt[, goals := c(head(goals, 1), diff(goals)), c("game_id", "is_home_score")]
quarter_stats_dt[, behinds := c(head(behinds, 1), diff(behinds)), c("game_id", "is_home_score")]
quarter_stats_dt[, score := goals * 6 + behinds]
quarter_stats_dt[, team := fifelse(is_home_score, Home.team, Away.team)]
quarter_stats_dt[, opposition := fifelse(!is_home_score, Home.team, Away.team)]
quarter_stats_dt <- quarter_stats_dt[, .SD, .SDcols = c(game_level_columns, c("quarter", "goals", "behinds", "score", "team", "opposition"))]
# verify outputs match:
identical(as.data.frame(quarter_stats_tb), as.data.frame(quarter_stats_dt))
[1] TRUE
We will answer this question for each quarter (first, second, third and fourth), as well as overall. This means we we will be repeating the same process five times, so this calls for writing a function. The function will give us the top 5 scoring quarters, as well as ranking for the aforementioned infamous game on the all time list of quarters.
get_top_quarter_scores_tb <- function(data, quarter_selection) {
data |>
filter(quarter %in% quarter_selection) |>
arrange(desc(score)) |>
mutate(rank = seq_along(team)) |>
filter(rank %in% 1:5 | (game_id == infamous_game_id & team == "Sydney")) |>
select(rank, team, opposition, score, quarter, Season, Round, Venue, game_afltables_url, game_id)
}
get_top_quarter_scores_dt <- function(data, quarter_selection) {
top_quarters_q1_dt <- copy(quarter_stats_dt)
top_quarters_q1_dt <- top_quarters_q1_dt[
quarter %in% quarter_selection, ]
setorder(top_quarters_q1_dt, -score)
top_quarters_q1_dt[, rank := seq_along(team)]
top_quarters_q1_dt[rank %in% 1:5 | (game_id == infamous_game_id & team == "Sydney"),
.(rank, team, opposition, score, quarter, Season, Round, Venue, game_afltables_url, game_id)]
}
First Quarter
top_quarter_scores_q1_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 1L)
top_quarter_scores_q1_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 1L)
# verify outputs match:
identical(as.data.frame(top_quarter_scores_q1_tb), as.data.frame(top_quarter_scores_q1_dt))
[1] TRUE
The record for the highest-scoring first quarter occurred during the bloodbath of an encounter that was the Bombers’ first ever clash with the Gary Ablett Jr.-led Gold Coast Suns in their inaugural season in the AFL. The Bombers came out of the blocks in a flash and mercilessly obliterated the inexperienced Gold Coast side, notching up a blistering 93 point lead at quarter time. Interestingly, the Suns actually managed to win the second quarter as the Bombers appeared to take their foot off the accelerator a little to only win by 139 points when all was said and done.
Perhaps the Suns’ lethargy in the first quarter against the Dons can be explained as a hangover25 following on from their first ever win the previous week26. It is exciting to me that this is a game that I can remember watching on the television at the time, and it may have even been the first Gold Coast game I ever watched27. Footy is full of narratives and it is fun to spin one around this particular game (the context and stories make footy stats even more fun).
25 As a club with an abundance of 18 or 19 year old blokes living out of home for the first time, the Suns were known to over-indulge in the Gold Coast party culture in those days.
26 Courtesy of a (missed) shot at goal after the siren from Justin Westhoff.
27 I am glad that I didn’t give up on watching them after that (mainly due to Gary Ablett I will admit) because otherwise I would have missed unbelievable goals like this.
Second Quarter
top_quarter_scores_q2_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 2L)
top_quarter_scores_q2_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 2L)
# verify outputs match:
identical(as.data.frame(top_quarter_scores_q2_tb), as.data.frame(top_quarter_scores_q2_dt))
[1] TRUE
From one teams’ first season, to another’s last. It is quite fitting (although sad) that the highest scoring second quarter was against a floundering (aforementioned) fitzRoy
Football Club (to which we owe the ease with which we obtained this data) en-route to a wooden spoon in their final season prior to merging with Brisbane Bears to form the Brisbane Lions.
As I was not yet born, I do not remember the game, but on the video below, the commentator shrewdly points to a strong wind prevailing towards the Crows’ goal at the beginning of the second quarter which certainly didn’t bode well for the Lions.
Third Quarter
top_quarter_scores_q3_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 3L)
top_quarter_scores_q3_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 3L)
# verify outputs match:
identical(as.data.frame(top_quarter_scores_q3_tb), as.data.frame(top_quarter_scores_q3_dt))
[1] TRUE
The Swans’ third quarter appears in 28th position here, which is the best position it gets.
Fourth Quarter
top_quarter_scores_q4_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 4L)
top_quarter_scores_q4_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 4L)
# verify outputs match:
identical(as.data.frame(top_quarter_scores_q4_tb), as.data.frame(top_quarter_scores_q4_dt))
[1] TRUE
Well this was slightly unexpected, the Bloods28 came home like a freight train against the woeful Saints in a game that took place over 100 years ago. It is also the only quarter in AFL history that has notched up a ton. Upon seeing this, given its vintage, I thought that perhaps the story of this game might have been lost to time but the Swans have a most insightful article up on their website about it. The explanation in the article claims that St Kilda were undermanned through a combination of injury and player protest on account of off-field disputes. It is safe to say that the “Bloods” showed them no mercy.
28 Incorrectly labelled here as “Sydney” in the table above because at the time they resided in South Melbourne (they relocated to Sydney in 1982), they were also known as the “Bloods” prior to adopting their current Swans mascot in 1933 due to the number of Western Australians in the side (as a WA boy I couldn’t help mentioning this)
All Quarters
top_quarter_scores_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 1L:4L)
top_quarter_scores_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 1L:4L)
# verify outputs match:
identical(as.data.frame(top_quarter_scores_tb), as.data.frame(top_quarter_scores_dt))
[1] TRUE
In the infamous game, the Swans’ third quarter 29 was the only one that reached the top 100 quarters of all time. The fact that no quarter was even close to the the top indicates that the Swans were very consistent through-out the game. To use a cliche, they put in a consistent four-quarter effort and I suppose the Eagles were consistent too (consistently dismal).
29 i.e. the premiership quarter
Most Goal-kickers
Three of the aforementioned questions concern goal kickers. We can therefore write a function that can generalise our approach like we did for the previous question.
These questions were:
What is the record for the most individual goal kickers for a team in a single game?
What is the record for the most multiple goal kickers for a team in a single game?
What is the record for the most players kicking five or more goals for a team in a single game (i.e. the most “bags”)?
The Swans game appeared to have a rather even distribution of goal kickers in the infamous game, so it will be interesting to see where it places on the all time list in this category.
get_top_goal_scorers_tb <- function(data, min_goals) {
data |>
mutate(
team = Playing.for,
opposition = if_else(team == Home.team, Away.team, Home.team)
) |>
group_by(team, opposition, Season, Round, Venue, game_afltables_url, game_id, Date) |>
summarise(
goal_kickers = sum(Goals > min_goals),
.groups = "drop"
) |>
arrange(desc(goal_kickers), desc(Date)) |>
mutate(rank = seq_along(game_id)) |>
relocate(rank, .before = "team") |>
relocate(goal_kickers, .before = "Season") |>
filter(rank %in% 1:5|(game_id == infamous_game_id & team == "Sydney")) |>
select(-Date)
}
get_top_goal_scorers_dt <- function(data, min_goals) {
top_goal_scorers_dt <- copy(data)
top_goal_scorers_dt[, team := Playing.for]
top_goal_scorers_dt[, opposition := fifelse(team == Home.team, Away.team, Home.team)]
top_goal_scorers_dt <- top_goal_scorers_dt[,.(goal_kickers = sum(Goals > min_goals)),
c("team", "opposition", "Season", "Round", "Venue",
"game_afltables_url", "game_id", "Date")]
setorder(top_goal_scorers_dt, -goal_kickers, -Date)
top_goal_scorers_dt[, rank := seq_along(game_id)]
top_goal_scorers_dt[rank %in% 1:5|(game_id == infamous_game_id & team == "Sydney"),
.(rank, team, opposition, goal_kickers, Season, Round, Venue,
game_afltables_url, game_id)]
}
Individual
top_goal_scorers_single_tb <- get_top_goal_scorers_tb(player_stats_tb, 0L)
top_goal_scorers_single_dt <- get_top_goal_scorers_dt(player_stats_dt, 0L)
# verify outputs match:
identical(as.data.frame(top_goal_scorers_single_tb), as.data.frame(top_goal_scorers_single_dt))
[1] TRUE
The record for most goal kickers is actually tied by multiple teams. The most recent time this happened was in the Bulldogs’ 101 point drubbing of the Eagles last year30. The infamous game is a bit off the pace in 238th but 12 goal-kickers is still double a starting forward line.
30 Yet another example of how poorly the Eagles have performed in 2022 and 2023
Multiple
top_goal_scorers_multiple_tb <- get_top_goal_scorers_tb(player_stats_tb, 1L)
top_goal_scorers_multiple_dt <- get_top_goal_scorers_dt(player_stats_dt, 1L)
# verify outputs match:
identical(as.data.frame(top_goal_scorers_multiple_tb), as.data.frame(top_goal_scorers_multiple_dt))
[1] TRUE
The Swans game actually places equal 7th on the list of all time which is quite a notable result. It is also interesting that it was a similarly one-sided Swans game31 at the SCG that takes outright top spot. In that game (circa 1987) the human highlight reel Warwick Capper led all comers for the Swans with a bag of 6 snags.
Five or More (Bags)
top_goal_scorers_bags_tb <- get_top_goal_scorers_tb(player_stats_tb, 4L)
top_goal_scorers_bags_dt <- get_top_goal_scorers_dt(player_stats_dt, 4L)
# verify outputs match:
identical(as.data.frame(top_goal_scorers_bags_tb), as.data.frame(top_goal_scorers_bags_dt))
[1] TRUE
Four bags in one game has happened on two occasions, the most recent of which (in 1991) yet again featured the Fitzroy Lions, who were trounced by 157 points by the Hawks in North Hobart.
The list of bag-getters in this game makes for interesting reading, all were recognisable names (although one more for his family connections that his own merit). As one might expect, one of the bags was courtesy of Hawthorn spearhead Jason Duntall (6 snags), along with 7 apiece from WA boy Ben Allan32 and the three-time premiership player Darren Jarmon. Rounding out the four was a contribution of 5 snags from Paul Hudson, who is the son of Tasmanian footy legend Peter Hudson (how fitting that this game was played in Tassie) who averaged more than 5 goals a game himself (an incredible feat).
32 Sorry I couldn’t help myself, he was also a Claremont Tiger (up the mighty Tiges)
Questions About Questionable Disposal
Two of the questions concerned clangers and disposal efficiency:
What is the record for the most clangers in a game?
What is the record for the worst disposal efficiency in a game?
These statistics (which we will define below) are more advanced and have only been recorded more recently, so we will therefore have to check which data sources to use and what years they are available for.
Most Clangers
A clanger is defined as:
an absurd or embarrassing blunder.
Or in more precise football statistics terms:
An error made by a player resulting in a negative result for his side. Disposal clangers are any kick or handball that directly turns the ball over to the opposition. Frees and 50-metre penalties against, No Pressure Errors, Dropped Marks and Debits are all included in clangers.
Source: Champion Data.
Clanger data is available on the AFL Tables data from 1998 onwards, so we will have to make do with only recent memory.
player_stats_tb |>
filter(!is.na(Clangers)) |>
group_by(Home.team, Away.team, Season, Round, Venue, game_afltables_url, game_id) |>
summarise(
clangers = sum(Clangers),
.groups = "drop"
) |>
arrange(desc(clangers), Season, Round) |>
mutate(
rank = seq_along(clangers)
) |>
filter(rank %in% 1:5 | game_id == infamous_game_id) |>
select(
rank, game_id, Home.team, Away.team, clangers, Season, Round, Venue, game_afltables_url
) -> most_clangers_tb
most_clangers_dt <- copy(player_stats_dt)
most_clangers_dt <- most_clangers_dt[!is.na(Clangers),]
most_clangers_dt <- most_clangers_dt[, .(clangers = sum(Clangers)), c("Home.team", "Away.team", "Season", "Round", "Venue", "game_afltables_url", "game_id")]
setorder(most_clangers_dt, -clangers, Season, Round)
most_clangers_dt[,rank := seq_along(clangers)]
most_clangers_dt <- most_clangers_dt[rank %in% 1:5 | game_id == infamous_game_id,]
most_clangers_dt <- most_clangers_dt[, .(rank, game_id, Home.team, Away.team, clangers, Season, Round, Venue, game_afltables_url)]
# verify outputs match:
identical(as.data.table(most_clangers_tb), as.data.table(most_clangers_dt))
[1] TRUE
A lot of the games in the top 5 are from very recent years, probably highlighting the more recent trend in teams rolling the dice more with possession (high risk, high reward), as popularised by Richmond and adopted to great success of late by the Magpies. Having spot-checked a few examples, there is also a bit of a pattern of wet weather impacting these games it would seem.
Also notable in the top 5 is one of three games played in Shanghai as part of the AFL’s attempt at entering into the Chinese market between 2017 and 2019. While there was some rain about, one might presume from this that the Suns and Port didn’t put on a particularly impressive display of our game on that occasion, either that or they were putting on an entertaining show with plenty of high-risk, high-reward plays. The video highlights appear to be reasonably exciting, so I am going to assume the latter.
Worst Disposal Efficiency
Disposal efficiency is
the percentage of disposals that are effective.33
Where effective disposal is any of:
- Effective handball: a handball to a teammate that hits the intended target.
- Effective Short Kick: A kick of less than 40 metres that results in the intended target retaining possession. Does not include kicks that are spoiled by the opposition.
- Effective Long Kick: A kick of more than 40 metres to a 50/50 contest or better for the team.34
Note how the distance of the disposal is an element of how lenient the definition of “effective” is.
This statistic requires our first (and only) use of the Fryzigg data, as disposal efficiency is not present on the AFL Tables data. While the Fryzigg data has the full history of the AFL, disposal efficiency is missing for seasons prior to 2012 onwards. We will therefore have to make do with answering this question only for about the past decade.
player_stats_fryzigg_tb |>
filter(!is.na(disposal_efficiency_percentage)) |>
mutate(
home_team_code = get_team_code(match_home_team),
away_team_code = get_team_code(match_away_team),
season = str_sub(match_date, 1, 4),
afl_tables_game_id = get_game_id(home_team_code, away_team_code, as.Date(match_date)),
afl_tables_url = get_game_afltables_url(afl_tables_game_id, season)
) |>
group_by(afl_tables_game_id, match_home_team, match_away_team, venue_name, season, match_round, afl_tables_url) |>
summarise(
disposal_efficiency_game = sum(disposal_efficiency_percentage * disposals) / sum(disposals) / 100,
.groups = "drop"
) |>
arrange(disposal_efficiency_game) |>
mutate(
rank = seq_along(disposal_efficiency_game)
) |>
filter(rank %in% 1:5 | afl_tables_game_id == infamous_game_id) |>
select(
rank, afl_tables_game_id, match_home_team, match_away_team, disposal_efficiency_game, season, match_round, venue_name, afl_tables_url
) -> worst_disposal_efficiency_games_tb
worst_disposal_efficiency_games_dt <- copy(player_stats_fryzigg_dt)
worst_disposal_efficiency_games_dt <- worst_disposal_efficiency_games_dt[!is.na(disposal_efficiency_percentage), ]
worst_disposal_efficiency_games_dt[, home_team_code := get_team_code(match_home_team)]
worst_disposal_efficiency_games_dt[, away_team_code := get_team_code(match_away_team)]
worst_disposal_efficiency_games_dt[, season := str_sub(match_date, 1, 4)]
worst_disposal_efficiency_games_dt[, afl_tables_game_id := get_game_id(home_team_code, away_team_code, as.Date(match_date))]
worst_disposal_efficiency_games_dt[, afl_tables_url := get_game_afltables_url(afl_tables_game_id, season)]
worst_disposal_efficiency_games_dt <- worst_disposal_efficiency_games_dt[,
.(disposal_efficiency_game = sum(disposal_efficiency_percentage * disposals) / sum(disposals) / 100),
c("afl_tables_game_id", "match_home_team", "match_away_team", "venue_name", "season", "match_round", "afl_tables_url")
]
setorder(worst_disposal_efficiency_games_dt, disposal_efficiency_game)
worst_disposal_efficiency_games_dt[, rank := seq_along(disposal_efficiency_game)]
worst_disposal_efficiency_games_dt <- worst_disposal_efficiency_games_dt[rank %in% 1:5 | afl_tables_game_id == infamous_game_id,]
worst_disposal_efficiency_games_dt <- worst_disposal_efficiency_games_dt[, .(rank, afl_tables_game_id, match_home_team, match_away_team, disposal_efficiency_game, season, match_round, venue_name, afl_tables_url)]
# verify outputs match:
identical(as.data.table(worst_disposal_efficiency_games_tb), as.data.table(worst_disposal_efficiency_games_dt))
[1] TRUE
The only game with less than 50% disposal efficiency was played in torrid conditions up in Cairns. Looking at the video highlights, the players were running through puddles the whole game. I have however seen equal or worse conditions in the past, so it is somewhat curious that this was the worst by such a margin. For reference, Gold Coast were very poor that year, coming second last but the Roos came in a respectable ninth position (only one win and percentage outside the top 8), this also points to them probably winning this game if has been played in more favourable conditions, but they still would have missed out on finals due to the mammoth percentage of the Cats that year. The “cleanest” player on the day was Jesse Joyce, whose 8 touches came at 75% efficiency (however it was a rather low sample size).
Further disjointed musings:
- Seeing a game in 2018 is also a fun reminder of the last time the Eagles won a premiership, which feels a long way off given the current predicament the club is in, in spite of their recent win against the similarly languishing35 Roos last Sunday.
35 But far more optimistic due to a combination of Clarko (he returns from his hiatus this week) and promising talent on their list such as Harry Sheezel
In the 20th century, the use of suburban grounds where the quality of the surface was subpar I am sure led to far more games with lower disposal efficiency than this (muddy fields were far more common in those days).
We can’t entirely blame the players, given the conditions. Anyone who has kicked a footy around in the wet will know how much heavier and slipperier than usual it can get (it is often described as being like a bar of soap). By looking through old highlights packages of the top several games in this metric, all of them appear to have been impacted significantly by weather conditions.
The Fryzigg data actually has weather conditions as a field on it but it appears to be somewhat unreliable, when cross referencing the games with match reports and highlights, some of the “sunny” games turned out to be played in torrential rain.
I also checked which players have the highest career kicking efficiency and it appears to be mostly defenders who probably inflate their numbers by getting involved in switches of play and chipping the ball around the back line. So we have to take this kind of metric with a grain of salt, there is a certain difficulty level with executing certain types of disposal (e.g. a kick inside 50) that it does not fully capture.
Youngest Norm Smith Medalist
Our question regarding Norm Smith medallists from above reads:
Who was the youngest player to win a Norm Smith Medal?
The Norm Smith Medal is awarded to the player who is adjudged best on ground in the AFL grand final.
The award is named after legendary Melbourne full forward of the 1940’s and coach of the 1950’s and 1960’s, Norm Smith. In his decorated career, he won a total of 10 premierships, 4 as a player, 6 as coach and all for the Melbourne football club (given Melbourne have only won 13 in their history, quite a feat). At the back end of his playing career, he spent two years as captain-coach (yes that was a thing at the time) of the Fitzroy football club (here they pop up again).
The Norm Smith medal is usually given to a player on the winning team but very occasionally, players have managed to win the award in a losing side, the last time being 4 out of 45 times and the last time was Eagles (and Carlton) superstar Chris Judd in 2005.
The Norm Smith Medal was first instituted in 1979. Prior to this, there was no official award given, however the media and fans of the day had their opinions of who the best on ground was in prior grand finals. While this article lists some “unofficial” best on ground performances in grand finals prior to 1979, I will stick with the official list. As a Western Australian, I would have no qualms with discarding the older, exclusively Victorian seasons particularly as this data is of dubious reliability.
As previously mentioned, Norm Smith Medal data is not available on fitzRoy
, so we will have to scrape it with some of our own bespoke code36. The AFL website conveniently has a nice table, listing all the winners since the award began in 1979. We will supplement this with data from fitzRoy
to figure out the level of experience of each player.
36 This illustrates the fact that sometimes you need to stray outside of fitzRoy
but it gives most of the footy data you could ever want.
data.table
code
While all the other code has thus far been written in both tidyverse and data.table
, I decided to leave it for this one as it was rather intricate and painful to perform the same process twice.
norm_smith_url <- "https://www.afl.com.au/stats/leaders-awards/norm-smith-medal"
norm_smith_html <- read_html(norm_smith_url)
norm_smith_html |>
html_table(header = TRUE) |>
_[[1]] |>
mutate( #manually adjust to help mapping
Club = case_when(
Club == "Geelong Cats" ~ "Geelong",
Club == "West Coast Eagles" ~ "West Coast",
Club == "Sydney Swans" ~ "Sydney",
TRUE ~ Club
),
Player = case_when(
Player == "Billy Duckworth" ~ "Bill Duckworth",
Player == "Ryan O'Keefe" ~ "Ryan OKeefe",
TRUE ~ Player
),
Year = as.integer(str_sub(Year, 1, 4))
) -> norm_smith_tb
# get the date of their first game
player_stats_tb |>
group_by(
player_afltables_url, Playing.for
) |>
summarise(
first_game_date = min(Date),
.groups = "drop"
) -> more_player_details
# get games played to date
player_stats_tb |>
group_by(ID) |>
arrange(Date) |>
mutate(
games_played_to_date = seq_along(ID)
) |>
ungroup() ->
player_stats_tb_games_played
# get grand final information
player_stats_tb_games_played |>
filter(
Round == "GF",
Season >= 1979
) |>
mutate(
player_name = paste(First.name, Surname),
on_winning_team = (Home.team == Playing.for & Home.score > Away.score) | (Away.team == Playing.for & Home.score < Away.score)
) |>
select(ID, player_name, Playing.for, game_afltables_url, player_afltables_url, grand_final_season = Season, grand_final_date = Date, games_played_to_date, on_winning_team) ->
grand_final_tb
grand_final_tb |>
select(grand_final_date) |>
distinct() |>
arrange(desc(grand_final_date)) |>
pull(grand_final_date) ->
grand_final_dates
player_details_tb |>
mutate(
first_year = str_sub(Seasons, 1, 4)
) ->
player_details_tb_joinable
norm_smith_tb |>
mutate(Date = grand_final_dates) |>
left_join(
grand_final_tb, by = c("Player" = "player_name", "Club" = "Playing.for", "Date" = "grand_final_date")
) |>
left_join(
more_player_details, by = c("player_afltables_url", "Club" = "Playing.for")
) |>
mutate(
first_year = format(first_game_date, "%Y")
) |>
left_join(
player_details_tb_joinable, by = c("Player", "Club" = "Team", "first_year")
) |>
mutate(
debut_age_years = as.integer(str_sub(Debut, 1, 2)),
debut_age_days = as.integer(str_remove_all(Debut, ".*y |d")),
date_of_birth = first_game_date - years(debut_age_years) - days(debut_age_days),
age_at_grand_final = as.period(interval(start = date_of_birth, end = Date)),
age_at_grand_final_seconds = as.period(interval(start = date_of_birth, end = Date))
) |>
arrange(age_at_grand_final ) |>
select(
Player, Club, Year, on_winning_team, games_played_to_date, age_at_grand_final, age_at_grand_final_seconds, player_afltables_url, game_afltables_url) -> norm_smith_youngest_tb
The youngest player on the list is the inaugural winner, Wayne Harmes, who was only 19 years of age when he won the prestigious award. He is known for a legendary moment during this match, where, towards the end of the fourth quarter, he ran down his own (errant) kick by sliding along the ground and tapping the ball to keep it in play, sending it into the path of his team mate Ken Sheldon who ran into the open goal. The goal ended up being a decisive one as the Blues came out as victors by only 5 points.
The most inexperienced in terms of games played was Maurice Rioli (father of Maurice Jr. who is currently plying his trade at his old man’s club), member of the famous Rioli family of the Tiwi Islands which has uncannily produced a plethora of great footballers and premiership players. He was the first (and arguably the greatest) Rioli to ever play in the AFL/VFL. One caveat is that while at the time he had only played 21 VFL games, he was 24 years of age and had previously played 6 years of WAFL footy37 for South Fremantle, so he wasn’t your typical 21 game player, he was really in his prime.
37 At this time (prior to a national competition), WAFL football (as well as the SANFL in South Australia) could be viewed as being at a similar level to the VFL (although VFL did benefit from having the larger population in Victoria as a talent pool). While these days, the WAFL is the tier below the AFL (like the English Championship is the to Premier League), at that time, it could instead be viewed as a competition that was the best within its own region (like Serie A is to La Liga).
Epilogue
Closing Remarks
To mention one last record, the margin of 171 in the infamous game is actually the equal 4th highest winning margin in a AFL/VFL game. Interestingly, the all-time record in this category comes full circle, being a game between the Fitzroy Lions and the Melbourne Demons back in 1979. Fittingly, the victor of this game, was our friends (for whom we owe the greatest gratitude for helping us import data), the mighty fitzRoy
footy club by a whopping 190 points. So while they had their trials and tribulations as a club (some of which we have covered in this post), it is nice to finish with them on a high note.
Interestingly, the aforementioned infamous game isn’t the record-holder (or even in the top 5) for any of our questions38 but it is only one of 31 games where a team has scored 200 points or more which is notable enough I think, particularly given I had the (mis-)fortune of witnessing such a rare event in the flesh. Perhaps we could dig deeper to find a record it holds (every game is uniquely remarkable in some way if you look hard enough) but I somehow find more satisfaction in it being a thought-provoking enough game to coax these questions out of us without it ever being the answer.
38 the closest it came was in the most multiple goal kickers category at equal 7th
Notable AFL Stats Figures
I will conclude by listing some people who are doing interesting work with AFL stats (often with heavy use of R and the fitzRoy
package) to provide further motivation:
fitzRoy
: as outlined in this article, this R package is the de facto way of sourcing AFL data.Useless AFL Stats: a Facebook page which shares always interesting, sometimes abstract and often amusing AFL stats content. Liam Crow is their data scientist and posts some interesting content of his own on his website: https://www.crowdatascience.com.
squiggle.com.au: displays a bunch of people’s data-driven tipping models, many of which have websites and social media accounts where they do AFL stats.
Jaiden Popowski: is prominent in the AFL Fantasy community for the interesting data-driven analysis he produces.
DFS Australia: has some great data-driven tools that provide insight on advanced stats commonly used in AFL Fantasy.
Comments