We can use natural language processing to compare prospects to others in their draft class and those from the past. Then, we can tie in previously built advanced descriptive statistics to gauge how well a prospect fits within a certain NFL mold.
For this analysis, we took prospect write-ups from The Athletic's Dane Brugler, one of the best football film analysts, covering the past eight seasons (including 2022). We then used latent semantic analysis (LSA) to derive similarity scores between the text in prospects’ scouting reports.
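To make the LSA step concrete, here is a minimal sketch of one way to get similarity scores from report text. It is not the pipeline used for this analysis: the function name `lsa_similarity`, the `n_topics` parameter, the use of raw word counts rather than a weighted scheme, and the three sample reports are all assumptions made for illustration. The sample reports are invented placeholder text, not actual write-ups.

```python
import re

import numpy as np


def lsa_similarity(reports, n_topics=2):
    """Return a report-by-report cosine similarity matrix in LSA space.

    Hypothetical helper for illustration only.
    """
    # Tokenize each report into lowercase words.
    tokenized = [re.findall(r"[a-z']+", text.lower()) for text in reports]
    vocab = sorted({word for doc in tokenized for word in doc})
    index = {word: i for i, word in enumerate(vocab)}

    # Document-term count matrix (rows: reports, columns: words).
    counts = np.zeros((len(reports), len(vocab)))
    for row, doc in enumerate(tokenized):
        for word in doc:
            counts[row, index[word]] += 1

    # Truncated SVD: keep only the top n_topics latent dimensions.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    k = min(n_topics, len(s))
    doc_vectors = u[:, :k] * s[:k]

    # Cosine similarity between every pair of reduced report vectors.
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against empty reports
    unit = doc_vectors / norms
    return unit @ unit.T


# Invented placeholder reports (assumption, not real scouting text).
reports = [
    "explosive runner with elite vision and contact balance",
    "patient runner with strong vision and balance through contact",
    "rangy corner with fluid hips and ball skills in coverage",
]
sim = lsa_similarity(reports, n_topics=2)
print(np.round(sim, 2))
```

In this toy example the two running back reports land much closer to each other than either does to the cornerback report. Because cosine similarity in the reduced space can be negative, scores such as -0.3 are possible under this kind of setup.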
After building our dataset to span eight seasons, we can create a prospect's score in a number of ways. We decided to use a weighted average of similar players’ WAR (wins above replacement), using the similarity scores derived above as the weights. For example, if a player has a 0.60 similarity score with a player who has earned 7.0 WAR since being drafted and a -0.3 similarity score with someone who has earned 4.0 WAR, his overall score would be (0.60 × 7.0) + (-0.3 × 4.0) = +3.0.
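The arithmetic in that example can be sketched in a few lines. Note one assumption: although the text describes a weighted average, the +3 result corresponds to summing each similarity score multiplied by WAR without dividing by the total weight, so that is what this sketch does. The function name `similarity_weighted_score` is hypothetical.

```python
def similarity_weighted_score(comps):
    """Sum similarity * WAR over (similarity, war) pairs.

    Hypothetical helper; assumes the unnormalized form implied by the
    worked example in the text.
    """
    return sum(similarity * war for similarity, war in comps)


# The two comparable players from the example: (similarity, career WAR).
comps = [(0.60, 7.0), (-0.3, 4.0)]
score = similarity_weighted_score(comps)
print(round(score, 2))  # 3.0
```

A negative similarity score pulls the total down in proportion to the comparable player's WAR, which is why the second player subtracts 1.2 from the first player's 4.2.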
Using the analyses above, we can look at 2022 prospects in a couple of ways. First, we can examine player comparisons for notable prospects. Second, we can rank the players in each position group by the score derived above. These scores have correlated well with draft position and future WAR generated at the NFL level, although a more robust analysis using more seasons and data sources is beyond the scope of this article.
SUCCESSFUL RUNNING BACK TEXT ANALYTIC TRAITS
BAD RUNNING BACK TEXT ANALYTIC TRAITS
PLAYERS EXCEEDING THEIR DRAFT PEDIGREE