load_pbp() and load_player_stats() now call nflreadr::load_pbp() and nflreadr::load_player_stats() respectively. Therefore the argument qs has been deprecated in both functions. It will be removed in a future release. Running load_player_stats() without any argument will now return player stats of the current season only (the default in nflreadr).
source and pp in the functions fast_scraper_*() and build_nflfastR_pbp() have been removed
Added racr (“Receiver Air Conversion Ratio”), target_share, air_yards_share, wopr (“Weighted Opportunity Rating”) and pacr (“Passing Air Conversion Ratio”) to the output of calculate_player_stats()
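For orientation, the ratios named above can be sketched in base R as follows. The formulas are the commonly cited public definitions (RACR as receiving yards per air yard, PACR as passing yards per air yard, WOPR as 1.5 times target share plus 0.7 times air yards share) and are an assumption here: this changelog does not state how calculate_player_stats() computes them, and all input numbers are made up.

```r
# Made-up single-game totals for one receiver and his team (illustrative only).
receiving_yards  <- 90
player_air_yards <- 120
player_targets   <- 10
team_targets     <- 40
team_air_yards   <- 300

# Commonly cited definitions; assumed, not taken from the nflfastR source code.
racr            <- receiving_yards / player_air_yards          # Receiver Air Conversion Ratio
target_share    <- player_targets / team_targets               # share of team targets
air_yards_share <- player_air_yards / team_air_yards           # share of team air yards
wopr            <- 1.5 * target_share + 0.7 * air_yards_share  # Weighted Opportunity Rating

# Passer side: passing yards gained per intended air yard (made-up numbers).
passing_yards     <- 250
passing_air_yards <- 300
pacr              <- passing_yards / passing_air_yards         # Passing Air Conversion Ratio
```

The values reported by calculate_player_stats() may differ in edge cases (for example zero air yards), which this sketch does not handle.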
report() which will be used by the maintainers to help users debug their problems (#274).
update_db()
receiver names (#270)
return_team on interception return touchdowns (#275)
wpa variables are NA on end game line
wp variables are 0, 0.5, 1, or NA on end game line
decode_player_ids() now really decodes the new variable fantasy_id (#229)
wp values depending on the first game in the data set (#183)
Added sack_yards, sack_fumbles, rushing_fumbles and receiving_fumbles to the output of the function calculate_player_stats(), thanks to Mike Filicicchia (@TheMathNinja). (#239)
calculate_player_stats() falsely counted lost fumbles on aborted snaps (#238)
Added season_type to the output of calculate_player_stats() and load_player_stats() in preparation for the extended Regular Season starting in 2021 (#240)
season_type definitions in preparation for the extended Regular Season starting in 2021 (#242)
fixed_drive where it wasn’t incrementing when there was a muffed punt followed by timeout (#244)
fixed_drive where it wasn’t incrementing following an interception with the intercepting player then losing a fumble (#247)
Added safety_player_name and safety_player_id to the play-by-play data (#252)
usethis
calculate_player_stats() that aggregates official passing, rushing, and receiving stats either at game level or overall
load_player_stats() that loads weekly player stats from 1999 to the most recent season
add_xyac() and clean_pbp() has been significantly improved
td_player_name and td_player_id to clearly identify the player who scored a touchdown (this is especially helpful for plays with multiple fumbles or laterals resulting in a touchdown)
calculate_player_stats() now adds the variable dakota, the epa + cpoe composite, for players with at least 5 pass attempts.
Added home_opening_kickoff to clean_pbp()
sack_player_id, sack_player_name, half_sack_1_player_id, half_sack_1_player_name, half_sack_2_player_id and half_sack_2_player_name which identify players that recorded sacks (or half sacks). Also updated the description of the variables qb_hit_1_player_id, qb_hit_1_player_name, qb_hit_2_player_id and qb_hit_2_player_name to make clearer that they did not record a sack. (#180)
qb_scramble was incomplete for the 2005 season because of missing scramble indicators in the play description. This has been mostly fixed courtesy of charting data from Football Outsiders (with thanks to Aaron Schatz!). Some notes on this fix: Weeks 1-16 are based on charting. Weeks 17-21 are guesses (basically every QB run except those that were a) a loss, b) no gain, or c) on 3rd/4th down with 1-2 to go). Plays nullified by penalty are not included.
name, id, rusher, and rusher_id to be the player charged with the fumble on aborted snaps when the QB is unable to make a play (i.e. pass, sack, or scramble) (#162)
clean_pbp() now standardizes the team name columns tackle_with_assist_*_team
drive that was causing incorrect overtime win probabilities (#194)
posteam was not NA on end of quarter 2 (or end of quarter 4 in overtime games) causing wrong values for fixed_drive, fixed_drive_result, series and series_result
fixed_drive and series were falsely incrementing on kickoffs recovered by the kicking team or on defensive touchdowns followed by timeouts
fixed_drive and series were falsely incrementing on muffed punts recovered by the punting team for a touchdown
add_xpass() crashed when run with data already including xpass variables.
epa when a safety is scored by the team beginning the play in possession of the ball (#186)
calculate_player_stats() forgot to clean player names by using their IDs
calculate_player_stats() (#203)
update_db() no longer falsely closes a database connection provided by the argument db_connection (#210)
yards_gained was missing yardage on plays with laterals. (#216)
fixed_drive now increments properly on onside kick recoveries (#215)
fixed_drive no longer counts a muffed kickoff as a one-play drive on its own (#217)
fixed_drive now properly increments after a safety (#219)
penalty_type and updated the description of the variable to make clearer it’s the first penalty that happened on a play. (#223)
source and pp all across the package. Using them will cause a warning. Parallel processing has to be activated by choosing an appropriate future::plan() before calling the relevant functions. For more information please see the package documentation.
build_nflfastR_pbp() will now run decode_player_ids() by default (can be deactivated with the argument decode = FALSE).
build_nflfastR_pbp() will now run add_xpass() by default and add the new variables xpass and pass_oe.
fast_scraper() and build_nflfastR_pbp() now allow the output of fast_scraper_schedules() directly as input so it’s no longer necessary to pull the game_id first.
load_pbp() that loads complete seasons into memory for fast access of the play-by-play data.
rushing_yards, lateral_rushing_yards, passing_yards, receiving_yards, lateral_receiving_yards to fix an old bug where yards_gained gets overwritten on plays with laterals (#115).
vegas_wpa and vegas_home_wpa which contain Win Probability Added from the spread-adjusted WP model
out_of_bounds
fantasy, fantasy_id, fantasy_player_name, and fantasy_player_id that indicate the rusher or receiver on the play
tackle_with_assist, tackle_with_assist_1_player_id, tackle_with_assist_1_player_name, tackle_with_assist_1_team, tackle_with_assist_2_player_id, tackle_with_assist_2_player_name, tackle_with_assist_2_team
calculate_win_probability()
vignette("field_descriptions") with a searchable list of all nflfastR variables
Added ?field_descriptions and ?stat_ids to the package
fixed_drive and series weren’t updating after muffed punt (#144)
defteam instead of the posteam (#152)
update_db() (added qs and curl to dependencies)
calculate_expected_points() and calculate_win_probability() duplicated some existing variables instead of replacing them (#170)
penalty_type wasn’t "no_play" although it should have been (#172)
penalty_team could be incorrect in games of the Jaguars in the seasons 2011 - 2015 (#174)
epa on plays before a failed pass interference challenge in a few 2019 games (#175)
NA on offsetting penalties (#44)
epa when possession team changes at end of 1st or 3rd quarter (#182)
vegas_wp is now NA on final line since there is no possession team
vegas_wp)
Added yardline_100 as an input to both win probability models (not having it included was an oversight)
series was increased on PATs
Added team_wordmark - which contains URLs to the team’s wordmarks - to the included data frame ?teams_colors_logos
update_db()
force_rebuild of the function update_db() is now of hybrid type. It can rebuild the play-by-play data table either for the whole nflfastR era (with force_rebuild = TRUE) or just for specified seasons (e.g. force_rebuild = 2019:2020). The latter is intended for ongoing seasons: the NFL fixes bugs in the play-by-play data during the week, so we recommend rebuilding the current season every Thursday.
update_db() disconnected the connection to a database provided by the argument db_connection (#102)
update_db() didn’t build a fresh database without providing the argument force_rebuild
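The hybrid force_rebuild argument described above (TRUE for everything, a numeric vector for selected seasons) can be pictured with the sketch below. The helper seasons_to_rebuild() is hypothetical and not part of nflfastR; it only illustrates one way such an argument could be resolved, not how update_db() is actually implemented.

```r
# Hypothetical helper, NOT part of nflfastR: resolve a hybrid `force_rebuild`
# value into the seasons whose rows should be rebuilt.
seasons_to_rebuild <- function(force_rebuild, all_seasons) {
  if (isTRUE(force_rebuild)) {
    return(all_seasons)                            # TRUE: rebuild the whole table
  }
  if (is.numeric(force_rebuild)) {
    return(intersect(all_seasons, force_rebuild))  # numeric: only the given seasons
  }
  integer(0)                                       # anything else: rebuild nothing
}

all_seasons <- 1999:2020                           # assumed season range, for illustration

rebuild_all  <- seasons_to_rebuild(TRUE, all_seasons)
rebuild_some <- seasons_to_rebuild(2019:2020, all_seasons)
rebuild_none <- seasons_to_rebuild(FALSE, all_seasons)
```

Resolving the argument to a vector of seasons first keeps the deletion step uniform: only rows belonging to the returned seasons need to be removed and reloaded.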
update_db() no longer removes the complete data table when a numeric argument force_rebuild is passed but only removes the rows within the table (#109)
build_nflfastR_pbp(), a convenient wrapper around multiple nflfastR functions for an easy creation of the nflfastR play-by-play data set
add_xpass(), that creates columns xpass and pass_oe
fixed_drive which was not incrementing properly on drives that began following a timeout
usethis
pass = 1)
Added the argument fast (either TRUE or FALSE) to the function decode_player_ids() to activate the highly efficient C++ decoder of the package gsisdecoder
fast_scraper_roster() is finally back! It loads the NFL roster of a given season.
decode_player_ids() to decode all player IDs to the commonly known GSIS ID format (00-00xxxxx)
Added source = "old" to fast_scraper() to enable scraping of old source. This is mostly useless as it doesn’t work for 2020 and provides less info
Added the argument db_connection to update_db() to allow advanced users to use other DBI drivers, such as RMariaDB::MariaDB(), RPostgres::Postgres() or odbc::odbc() (please see dbplyr for more information)
clean_pbp() now fixes some bugs in jersey numbers
clean_pbp(), add_qb_epa() and add_xyac() can now handle empty data frames
fast_scraper() to fail (affects multiple games of the 2020 season)
fixed_drive that counted PAT after defensive TD as its own drive
add_xyac() breaking with some old packages
add_xyac() and add_qb_epa() calculations being wrong for some failed 4th downs
vignette("examples") with the new add_xyac() function
vignette("nflfastR-models")
Added fixed_drive and fixed_drive_result to the output of fast_scraper() because the NFL-provided drive info is extremely buggy
series_result
clean_pbp() now adds 4 new variables passer_jersey_number, rusher_jersey_number, receiver_jersey_number and jersey_number. These can be used to join rosters.
timeout_team, return_team, fumble_recovery_1_team for JAX games from 2011-2015
fixed_drive and corrections to timeout_team
add_xyac() which adds the following columns associated with expected yards after the catch (xYAC): xyac_epa, xyac_success, xyac_fd, xyac_mean_yardage, xyac_median_yardage
series_success caused by bad drive information provided by NFL
special_teams_play, st_play_type, time_of_day, and order_sequence
old_game_id column (useful for merging to external data that still uses this ID: format is YYYYMMDDxx)
clean_pbp() function now adds an aborted_play column
play_type = no_play rather than pass
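As a small illustration of the old_game_id format mentioned above (YYYYMMDDxx), the sketch below splits such an ID into a date and the trailing two characters. The ID value is only an example, and the meaning of the "xx" suffix is not documented in this changelog, so it is kept as an opaque string.

```r
# Example old-style game ID in the YYYYMMDDxx format (illustrative value only).
old_game_id <- "2019090500"

# First eight characters: the calendar date encoded in the ID.
game_date <- as.Date(substr(old_game_id, 1, 8), format = "%Y%m%d")

# Last two characters: the "xx" part, kept as a string because its
# exact meaning is not stated in this changelog.
game_suffix <- substr(old_game_id, 9, 10)
```

Keeping the ID as a character string (rather than a number) avoids formatting surprises when joining to external data that still uses this ID.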
teams_colors_logos for the interim name of the ‘Washington Football Team’ and the corresponding logo URLs.
tidyselect version to be >= 1.1.0
clean_pbp() now standardizes player IDs across the old (1999-2010) and new (2011+) data sources. Player IDs once again uniquely identify players, and each unique player has one unique ID (as they did before the NFL data source change).
clean_pbp() now removes all variables it is about to create to make sure nothing unexpected can happen
Added minimum version requirements to some package dependencies because installation broke for some users with outdated packages
Made a minor bug fix to catch more out-of-order plays and fixed a bug where some plays were being incorrectly dropped in older seasons
Standardized team names (e.g. SD –> LAC) in some columns we had missed
week from Expected Points models along with an update of vignette("nflfastR-models") and vignette("examples")
update_db() which adds all completed games to a SQLite database
calculate_win_probability()
vignette("examples") demonstrating the usage of the above mentioned functions
drive_real_start_time pre and post 2011
game_ids were overwritten during the play by play parsing
fast_scraper() now loads the raw game data from a separate raw data repo
.data from the rlang package (this is a major code change that takes some getting used to but we need it in preparation for a future release)
yards_gained more precisely defined
vignette("examples") to demonstrate Expected Points calculator calculate_expected_points()
clean_pbp()
first_down_rush and return_touchdown
fast_scraper() for not yet played games
xgboost (>= 1.1) as the recent xgboost update caused a breaking change leading to failure in adding model results to data
Added new models for Expected Points, Win Probability and Completion Probability and removed nflscrapR dependency. This is a major change as we are stepping away from the well established nflscrapR models. But we believe it is a good step forward. See data-raw/MODEL-README.md for detailed model information.
Added internal functions for EPA and WPA to helper_add_ep_wp.R.
Added new function calculate_expected_points() usable for the end user.
Completely overhauled fast_scraper() to make it work with the NFL’s new server backend. The option source is still available but will be deprecated since there is only one source now. There are some changes in the output as well (please see below).
fast_scraper() now adds game data to the play by play data set courtesy of Lee Sharpe. Game data include: away_score, home_score, location, result, total, spread_line, total_line, div_game, roof, surface, temp, wind, home_coach, away_coach, stadium, stadium_id, gameday
fast_scraper_schedules() now incorporates Lee Sharpe’s games.rds.
The functions fast_scraper_clips() and fast_scraper_roster() are deactivated due to the missing data source. They might be reactivated or completely dropped in future versions.
The function fix_fumbles() has been renamed to add_qb_epa() as the new name much better describes what the function is actually doing.
Added progress information using the progressr package and removed the furrr progress bars.
clean_pbp() now adds the column id which is the id of the player in the column name. Because we have to piece together different data to cover the full span of years, player IDs are not consistent between the early (1999-2010) and recent (2011 onward) periods.
Added a NEWS.md file to track changes to the package.
Fixed several bugs inherited from nflscrapR, including one where EPA was missing when a play was followed by two timeouts (for example, a two-minute warning followed by a timeout), and another where play_type was incorrect on plays with declined penalties.
Fixed a bug where receiver_player_name and receiver didn’t name the correct players on plays with lateral passes.
The output has changed a little bit.

Dropped Variables | Description |
---|---|
game_key | RS feed game identifier. |
game_time_local | Kickoff time in local time zone. |
iso_time | Kickoff time according to ISO 8601. |
game_type | One of ‘REG’, ‘WC’, ‘DIV’, ‘CON’, ‘SB’ indicating if a game was a regular season game or one of the playoff rounds. |
site_id | RS feed id for game site. |
site_city | Game site city. |
site_state | Game site state. |
drive_possession_team_abbr | Abbreviation of the possession team in a given drive. |
scoring_team_abbr | Abbreviation of the scoring team if the play was a scoring play. |
scoring_type | String indicating the scoring type. One of ‘FG’, ‘TD’, ‘PAT’, ‘SFTY’, ‘PAT2’. |
alert_play_type | String describing the play type of a play the NFL has listed as alert play. For most of those plays there are highlight clips available through fast_scraper_clips. |
time_of_day | Local time at the beginning of the play. |
yards | Analogous to yards_gained but with the kicking team being the possession team (which means that there are many yards gained through kickoffs and punts). |
end_yardline_number | Yardline number within the above given side at the end of the given play. |
end_yardline_side | String indicating the side of the field at the end of the given play. |

Renamed Variables | Description |
---|---|
game_time_eastern -> start_time | Kickoff time in eastern time zone. |
site_fullname -> stadium | Game site name. |
drive_how_started -> drive_start_transition | String indicating how the offense got the ball. |
drive_how_ended -> drive_end_transition | String indicating how the offense lost the ball. |
drive_start_time -> drive_game_clock_start | Game time at the beginning of a given drive. |
drive_end_time -> drive_game_clock_end | Game time at the end of a given drive. |
drive_start_yardline -> drive_start_yard_line | String indicating where a given drive started consisting of team half and yard line number. |
drive_end_yardline -> drive_end_yard_line | String indicating where a given drive ended consisting of team half and yard line number. |
roof_type -> roof | One of ‘dome’, ‘outdoors’, ‘closed’, ‘open’ indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference) |

Added Variables | Description |
---|---|
vegas_wp | Estimated win probability for the posteam given the current situation at the start of the given play, incorporating pre-game Vegas line. |
vegas_home_wp | Estimated win probability for the home team incorporating pre-game Vegas line. |
weather | String describing the weather including temperature, humidity and wind (direction and speed). Doesn’t change during the game! |
nfl_api_id | UUID of the game in the new NFL API. |
play_clock | Time on the playclock when the ball was snapped. |
play_deleted | Binary indicator for deleted plays. |
end_clock_time | Game time at the end of a given play. |
end_yard_line | String indicating the yardline at the end of the given play consisting of team half and yard line number. |
drive_real_start_time | Local day time when the drive started (currently not used by the NFL and therefore mostly ‘NA’). |
drive_ended_with_score | Binary indicator for whether the drive ended with a score. |
drive_quarter_start | Numeric value indicating in which quarter the given drive has started. |
drive_quarter_end | Numeric value indicating in which quarter the given drive has ended. |
drive_play_id_started | Play_id of the first play in the given drive. |
drive_play_id_ended | Play_id of the last play in the given drive. |
away_score | Total points scored by the away team. |
home_score | Total points scored by the home team. |
location | Either ‘Home’ or ‘Neutral’ indicating if the home team played at home or at a neutral site. |
result | Equals home_score - away_score and means the game outcome from the perspective of the home team. |
total | Equals home_score + away_score and means the total points scored in the given game. |
spread_line | The closing spread line for the game. A positive number means the home team was favored by that many points, a negative number means the away team was favored by that many points. (Source: Pro-Football-Reference) |
total_line | The closing total line for the game. (Source: Pro-Football-Reference) |
div_game | Binary indicator for whether the given game was a division game. |
roof | One of ‘dome’, ‘outdoors’, ‘closed’, ‘open’ indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference) |
surface | What type of ground the game was played on. (Source: Pro-Football-Reference) |
temp | The temperature at the stadium only for ‘roof’ = ‘outdoors’ or ‘open’. (Source: Pro-Football-Reference) |
wind | The speed of the wind in miles/hour only for ‘roof’ = ‘outdoors’ or ‘open’. (Source: Pro-Football-Reference) |
home_coach | First and last name of the home team coach. (Source: Pro-Football-Reference) |
away_coach | First and last name of the away team coach. (Source: Pro-Football-Reference) |
stadium_id | ID of the stadium the game was played in. (Source: Pro-Football-Reference) |
game_stadium | Name of the stadium the game was played in. (Source: Pro-Football-Reference) |
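
The relationships between some of the added game variables (result, total and the sign convention of spread_line) can be checked with a minimal sketch like the one below. The two game rows are made up, and home_favored is a hypothetical helper column that does not exist in the nflfastR output.

```r
# Made-up game rows; column names follow the table above.
games <- data.frame(
  home_score  = c(27, 17),
  away_score  = c(20, 24),
  spread_line = c(3.5, -2.5)
)

# result: game outcome from the perspective of the home team.
games$result <- games$home_score - games$away_score

# total: all points scored in the game.
games$total <- games$home_score + games$away_score

# Hypothetical helper column (not in nflfastR): a positive spread_line
# means the home team was favored by that many points.
games$home_favored <- games$spread_line > 0
```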