We are back with our second positional writeup of draft prospects using text analytics, this time focusing on the running back position. Our quarterback article was the first in the series.

We can use natural language processing to compare prospects to others in their draft class and those from the past. Then, we can tie in previously built advanced descriptive statistics to gauge how well a prospect fits within a certain NFL mold.

For this analysis, we used prospect write-ups from The Athletic's Dane Brugler, one of the best football film analysts, covering the past eight seasons (including 2022). We then applied latent semantic analysis (LSA) to derive similarity scores between the text of prospects' scouting reports.
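
As a rough illustration of how LSA can turn scouting reports into similarity scores, here is a minimal sketch using only NumPy. It is not the pipeline used for this article: the tokenizer, the tf-idf weighting, the number of components, and the helper name `lsa_similarity` are all assumptions made for the example, and the three sample reports are invented.

```python
import re
import numpy as np

def lsa_similarity(reports, n_components=2):
    """Return a document-by-document cosine-similarity matrix in LSA space.

    Hypothetical helper for illustration; the preprocessing choices here
    are assumptions, not the article's actual method.
    """
    # Tokenize each report into lowercase words.
    docs = [re.findall(r"[a-z']+", text.lower()) for text in reports]
    vocab = sorted({word for doc in docs for word in doc})
    column = {word: i for i, word in enumerate(vocab)}

    # Build a document-term count matrix.
    counts = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for word in doc:
            counts[row, column[word]] += 1

    # Apply a simple tf-idf weighting so rare words count for more.
    doc_freq = (counts > 0).sum(axis=0)
    tfidf = counts * (np.log(len(docs) / doc_freq) + 1.0)

    # Truncated SVD: keep only the strongest latent "topics".
    u, s, _ = np.linalg.svd(tfidf, full_matrices=False)
    k = min(n_components, len(s))
    vectors = u[:, :k] * s[:k]

    # Cosine similarity between the reduced document vectors.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    vectors = vectors / norms
    return vectors @ vectors.T

# Invented sample reports, for demonstration only.
sample_reports = [
    "explosive runner with elite burst and vision",
    "patient runner with great vision and burst",
    "powerful blocker with strong hands in pass protection",
]
similarity = lsa_similarity(sample_reports)
```

In this toy example, the first two reports share most of their vocabulary and end up far more similar to each other than either is to the third. Note that LSA similarities can be negative, which is why a prospect can have a negative similarity score with another player.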

After building our dataset to span eight seasons, we can create a prospect's score in a number of ways. We decided to use a weighted average of similar players' WAR (wins above replacement), using the similarity scores derived above as the weights. For example, if a player has a 0.60 similarity score with a player who has earned 7.0 WAR since being drafted and a -0.30 similarity score with someone who has earned 4.0 WAR, his overall score would be (0.60 × 7.0) + (-0.30 × 4.0) = +3.0.
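
The arithmetic in the example above can be reproduced in a few lines. This is a sketch under one assumption: the worked numbers match a plain sum of similarity times WAR, with no division by the total weight, so that is what the hypothetical function `similarity_weighted_score` computes. Whether the full model normalizes the result is not stated here.

```python
def similarity_weighted_score(comparisons):
    """Sum each comparable player's WAR, weighted by similarity.

    `comparisons` is an iterable of (similarity, war) pairs.
    Assumption: no normalization by the sum of the weights, which
    matches the worked example in the text.
    """
    return sum(similarity * war for similarity, war in comparisons)

# The two comparisons from the example: (similarity, WAR since drafted).
example = [(0.60, 7.0), (-0.30, 4.0)]
score = similarity_weighted_score(example)
print(round(score, 2))  # 3.0
```

A negative similarity pulls the score down in proportion to the other player's WAR, so being unlike a productive player counts against a prospect.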

Using the analyses above, we can look at 2022 prospects in a couple of ways. First, we can examine player comparisons for notable prospects. Second, we can rank the players in each position group by the score derived above. These scores have correlated well with draft position and future WAR generated at the NFL level, although a more robust analysis using more seasons and data sources is beyond the scope of this article.