Wednesday, October 21, 2015

2015-2016 CSML "PREDICTING THE NFL" UPDATE

Attempted Pattern Detection of One Style of Play

In an earlier post, "A Glimpse of Things to Come," I announced the general direction of our project for this academic year. Things are progressing well, and a brief update is in order.

Our general approach to improving on last year's model is an AI network model that analyzes various play strategies.

Part of this initiative is building a multi-layered feature detection network that implicitly compares the drive profiles of the various styles of play seen over the last several years.

We are currently compiling data from every game since 2000 to build a network base of more than 3,500 games that will serve as our training set. At the highest level of analysis, drive profiles are no longer tied to particular teams. Instead, we are examining which styles of play beat which other styles, regardless of team.

The preliminary feature detection network provided a coarse-grained analysis. However, as any network specialist would suspect, many games share the same features. Finer-grained analysis therefore requires us to control for information overflow caused by excessive commonalities. To do this, we are adding two layers to the detection network: one to account for information entropy, and another to pass through only those nodes that clear a dynamic threshold. The result will be a network that isolates which drive features matter most to the final result of a game ... hopefully.
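To make the entropy and threshold idea concrete, here is a minimal sketch, not our actual layers. It assumes each game is encoded as a row of binary drive features (1 if the feature occurred). A feature that shows up in nearly every game has entropy close to zero, so it says little about who wins. The helper names and the "mean plus k standard deviations" rule for the dynamic threshold are hypothetical choices made for illustration.

```python
import math

def feature_entropy(column):
    """Shannon entropy (in bits) of one binary feature across all games."""
    p = sum(column) / len(column)  # fraction of games where the feature fired
    if p in (0.0, 1.0):
        return 0.0  # a feature present in every game (or none) carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def dynamic_threshold(scores, k=0.5):
    """Hypothetical dynamic cutoff: mean score plus k population standard deviations."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    return mean + k * math.sqrt(var)

def select_features(games, k=0.5):
    """Return indices of features whose entropy clears the dynamic threshold.

    games: list of rows, one per game, each a list of 0/1 feature flags.
    """
    columns = list(zip(*games))  # transpose rows (games) into columns (features)
    scores = [feature_entropy(col) for col in columns]
    cutoff = dynamic_threshold(scores, k)
    return [i for i, s in enumerate(scores) if s > cutoff]
```

For example, with four games whose first feature is always 1, that feature scores zero entropy and is dropped. Because the cutoff is recomputed from the score distribution, it adapts as more seasons are added rather than being a fixed number.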

The feature detection network will have five layers, including its input and output layers. On top of that, we are building several decision layers to match particular play styles to particular game outcomes. This will include matching point spreads, which we hope the network will eventually be able to determine for specific games.
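The post does not give layer widths, activations, or a training procedure, so the following is only a hedged sketch of the layer count. It assumes hypothetical widths (`LAYER_SIZES`) and tanh activations to show that five layers (input, three hidden, output) are joined by four weight matrices, one between each pair of adjacent layers. The decision layers would sit on top of this output and are not shown.

```python
import math
import random

# Hypothetical widths: input, three hidden layers, output (five layers in total).
LAYER_SIZES = [40, 32, 16, 8, 4]

def init_weights(sizes, seed=0):
    """One (weights, biases) pair per adjacent layer pair: five layers -> four pairs."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(x, params):
    """Propagate a feature vector through every layer with a tanh activation."""
    for weights, biases in params:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x
```

Untrained, the output is meaningless; the point is only the shape of the stack.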

Once the network is complete, we will use each team's current-season play style to predict the final result of each game. Early results are very preliminary, but using only two weeks of current-season data, the network already picks game winners better than last year's method did.

Whether this approach will work as well as we would like remains to be seen. At a minimum, however, the network will allow a rigorous comparison of teams based on similarity measures. For any target team, we will be able to rank the other 31 teams by their similarity to it. That lets us reason that if team X beats team Y, and team Z has a drive profile very similar to X's, then Z should also be able to beat Y.
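A minimal sketch of the ranking and the "X beats Y, Z resembles X" inference follows. It assumes each team's drive profile is a numeric vector and uses cosine similarity as a stand-in measure; the post does not say which similarity measure the network uses. The `results` dictionary, the `min_similarity` cutoff, and all function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two drive-profile vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def rank_by_similarity(target, profiles):
    """All other teams, most similar to the target first (31 teams for a 32-team league)."""
    others = [t for t in profiles if t != target]
    return sorted(others,
                  key=lambda t: cosine_similarity(profiles[target], profiles[t]),
                  reverse=True)

def predict_by_analogy(z, y, results, profiles, min_similarity=0.9):
    """Guess whether team z beats team y from how z's look-alikes fared against y.

    results: {(x, y): True if x beat y}. Returns None when no similar team has played y.
    """
    votes = []
    for x in rank_by_similarity(z, profiles):
        if cosine_similarity(profiles[z], profiles[x]) < min_similarity:
            break  # ranking is sorted, so every remaining team is less similar
        if x != y and (x, y) in results:
            votes.append(results[(x, y)])
    if not votes:
        return None
    return sum(votes) > len(votes) / 2  # simple majority of look-alike outcomes
```

For instance, with profiles `{"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [0.5, 0.5]}` and `results = {("A", "C"): True}`, team B closely resembles A, so `predict_by_analogy("B", "C", results, profiles)` predicts that B beats C.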

If this does not work on its own, we still have other components to add to the network to (try to) improve results. Our target is to predict winners correctly in the 75%-80% range on a consistent basis.
