Saturday, October 3, 2015


Entropy in information theory is a measure of the randomness or uncertainty of a "piece" of information. Used in this way, as the randomness of information increases, so does its entropy, which, in turn, makes that information more informative. If this sounds counterintuitive, it might help to consider Luciano Floridi's explanation of information as a reduction in ignorance. If I tell you something you already know, my message is completely uninformative. There is no reduction of your ignorance. On the other hand, if I tell you something unexpected, then you learn something, rendering my message informative.
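To make the idea concrete, here is a small illustration of Shannon entropy (in bits). The function and the example distributions are mine, not from the post; they just show that a certain outcome carries no information, while a maximally random one carries the most.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0]))        # a sure thing: 0 bits, no reduction in ignorance
print(entropy([0.5, 0.5]))   # a fair coin flip: 1 bit, maximally uncertain
print(entropy([0.9, 0.1]))   # a biased coin: ~0.469 bits, less informative on average
```

The more uniform (random) the distribution, the higher the entropy, and the more you learn on average when the outcome is revealed.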

One of the difficulties we confront in predicting a football game is that all the variables we use initially characterize a particular team playing a particular game, which makes our variables relational. That is, they don't give pure information about a team in general, but only about how team X fares against team Y. Thus, to make our predictions, we have to pry general information about a target team loose from its opponents, so that the information characterizes the team's standing in the whole NFL and not merely in relation to the particular opponents it has faced.

In the 2014 model, this occurs by holding information for a target team "steady" (admittedly a vague notion here) as the team plays various opponents throughout the season. Relative to the target team, then, opponent information becomes more random as the team plays more and more opponents. This is why, as the season progresses, our information about a particular team becomes more specifically about that team and less about that team in relation to its particular opponents, which generally increases the reliability of our predictions. Furthermore, our information about a team becomes more certain as the season progresses because the information about an opponent team starts to include information about all of its opponent teams as well.

While the students are writing the software for our upcoming network model, I have tried to find a way to accelerate the acquisition of information in rating the opponent teams by using later games in the season to redefine a team's standing earlier in the season. That is, instead of using the data only as the games were actually played, I use that data to make predictions as if all past games were being played in the present. The net result is that the opponent ratings for a target team begin to include information that is less easy to discern, i.e., more random. In fact, it is practically fictitious, since it no longer represents what actually happened, but rather what could happen if all past games were replayed given the current standings. Using this opponent information, I then determined new ratings for the 32 teams in the league to generate a new set of predictions for week four that may better represent how each team stands in relation to the whole league.
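The post doesn't spell out the actual model, but the general idea of "replaying" past games under current standings can be sketched as a fixed-point iteration: each team's rating is recomputed from its game results adjusted by its opponents' current ratings, and the process repeats until the ratings settle. The toy teams, scores, and the margin-plus-opponent-rating update rule below are all my assumptions for illustration.

```python
# Hypothetical sketch: re-rate teams by "replaying" past games with current
# ratings, iterating to a fixed point. Teams and results are fictitious.
games = [  # (home, away, home_margin)
    ("A", "B", 7),
    ("B", "C", -3),
    ("C", "A", 10),
    ("A", "C", -4),
]

teams = {t for home, away, _ in games for t in (home, away)}
ratings = {t: 0.0 for t in teams}

for _ in range(50):  # iterate until the ratings (approximately) stop changing
    new = {}
    for t in teams:
        adjusted = []
        for home, away, margin in games:
            if t == home:
                # credit the margin of victory, adjusted by the opponent's rating
                adjusted.append(margin + ratings[away])
            elif t == away:
                adjusted.append(-margin + ratings[home])
        new[t] = sum(adjusted) / len(adjusted)
    # center the ratings so they describe standing relative to the whole league
    mean = sum(new.values()) / len(new)
    ratings = {t: r - mean for t, r in new.items()}

print(ratings)
```

After convergence, each rating implicitly folds in information about opponents' opponents, which is the effect the paragraph above describes: early-season games get re-interpreted in light of everything learned since.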

In all but four cases, this method agreed with the predictions I posted earlier, and it settled on the side of WAS in the PHI @ WAS game against the crowd. Keeping in mind that this method is highly experimental, the four differences are as follows:
  • JAX over IND (Unfortunately) - against the crowd's 94%
  • HOU over ATL - against the crowd's 92%
  • SD over CLE - with the crowd's 87%
  • MIN over DEN - against the crowd's 87%
The HOU @ ATL and the MIN @ DEN predictions here strike me as very counterintuitive. But stranger things have happened. So, let's see if this accelerated method gets us anywhere. It may not.

Added at 9:33 pm on October 4th: Okay, not.
