Saturday, June 18, 2016

Recollections: The CSML During Its Teenage Years

By Hiten Sonpal, Former Intern
I had been working with Tony as a work-study student in 1995 to assemble the Ecole Initiative, a web collection of links that Tony had reviewed. A key feature of the Initiative was temporal searching - users could find events that occurred around a certain date as long as they had been added to the database. Although Ecole was a useful resource, finding information about a specific subject still meant tediously browsing multiple collections and URLs. Additionally, AltaVista and Lycos, the popular search engines of that time, did not have any provisions to address our specific needs.

The summer of 1997 was about scaling up our fledgling effort and making real Tony's vision of a peer-reviewed, quality-controlled section of the internet in the Internet Applications Laboratory ("IALAB"). This fancy name referred to an empty desk in Tony's office and a $10K budget to cover all the hardware needs for the project AND the salary for my summer internship.

Today, we would create an AWS or Azure account, provision a virtual server and set up CentOS or Ubuntu. Unfortunately, none of those things were available to us. We had previously downloaded a copy of Slackware Linux and ordered a UPS, a tape drive, and two expensive dual-processor Pentium Pro towers from Gateway. Both computers were intended to be configured almost identically for redundancy, but generally meant for two different missions: one for crawling web pages and creating the search database, and the other for serving web pages and executing search queries.

On my first official day of work that summer, Tony handed me a piece of paper with the two static IP addresses that had been assigned to us, and the strange names that he had picked for the machines. I later learned "Castor" and "Pollux" were the Dioskouri twins from Greek mythology, better known in some circles as the Gemini twins. (Naming our servers after the sons of Zeus may have been a bit grand, but then computer science has never been known for meaningful names.)

Although we had a 100 Mbit switched Ethernet connection to the IALAB, 10/100 Ethernet cards were new in this era of blazingly fast 56K modems, aol.com e-mail addresses and scandalous ASCII art pin-ups. The Intel 10/100 Ethernet cards on our Pentium Pro machines needed an e100 driver which did not then ship with Slackware. Also, multi-processor support was new to Linux and many drivers were not stable in multi-processor kernels. I had recompiled the kernel for our hardware using the menu-based kernel configuration utility, but adding an unsupported driver to the kernel was a little outside my depth, and simultaneously alien and barbaric as far as the University's IT department was concerned.

So we called our emergency tech support hotline, Tony's brother Paul in Houston, to ask how to add this driver. After several long phone calls, numerous painful transfers of different driver versions by floppy disk, many failed patches, several make errors, and a bunch of kernel oopses, we found a driver that worked if we disabled multi-processor support on the machines. From there, I was able to execute my first successful ping to www.evansville.edu, and then to www.kernel.org. The rest, as they say, was just software engineering (and quite a lot of it at that).

Hiten Sonpal is a Computer Engineering graduate of the University of Evansville, and the first intern of the now-named CSML. He is currently an Engineering Director at iRobot Corporation, overseeing the Mechanical Engineering, Far East Engineering and ID/UX groups. In his spare time, Hiten enjoys rooting for the New England Patriots, and swimming and biking with his wife and two kids. This blog entry does not reflect the views of his employer.

Thursday, December 31, 2015

WEEK SEVENTEEN NFL PREDICTIONS

Below please find the 2015 week seventeen NFL predictions based on the model we built last year. This model uses a learning algorithm fed by data from the last sixteen weeks. To date, success rates this season are as follows: Week 2, 50%; Week 3, 62.5%; Week 4, 60%; Week 5, 64%; Week 6, 57%; Week 7, 50%; Week 8, 71%; Week 9, 53.8%; Week 10, 42.7%; Week 11, 57.1%; Week 12, 56.2%; Week 13, 62.5%; Week 14, 56.2%; Week 15, 68.7%; Week 16, 50%; and this week ...

Note that the model agrees with the crowd on all games but one, the MIN @ GB game in which the model goes for MIN and the crowd for GB. This should be a good game to watch.

So here goes - (The % indicated after each game represents the crowd-sourced prediction from nfl.com as of this morning. The two digit decimal numbers following team names should be interpreted relative to each other for comparison to determine, in part, how close the game will be.):

  • NYJ @ BUF
    • NYJ (.58) over BUF (.54)
    • Crowd Agrees @ 78%
  • NE @ MIA
    • NE (.61) over MIA (.42)
    • Crowd Agrees @ 92% 
  • NO @ ATL
    • ATL (.48) over NO (.48) - Close Game
    • Crowd Agrees @ 78%
  • DET @ CHI
    • DET (.55) over CHI (.45)
    • Crowd Agrees @ 51%
  • PHI @ NYG
    • NYG (.47) over PHI (.45)
    • Crowd Agrees @ 78%
  • WAS @ DAL
    • WAS (.53) over DAL (.40)
    • Crowd Agrees @ 81%
  • TEN @ IND
    • IND (.46) over TEN (.34)
    • Crowd Agrees @ 91%
  • BAL @ CIN
    • CIN (.59) over BAL (.44)
    • Crowd Agrees @ 92%
  • PIT @ CLE
    • PIT (.57) over CLE (.39)
    • Crowd Agrees @ 95%
  • JAX @ HOU
    • HOU (.59) over JAX (.49)
    • Crowd Agrees @ 86%
  • OAK @ KC
    • KC (.61) over OAK (.48)
    • Crowd Agrees @ 84%
  • SD @ DEN
    • DEN (.55) over SD (.43)
    • Crowd Agrees @ 95%
  • TB @ CAR
    • CAR (.61) over TB (.47)
    • Crowd Agrees @ 93%
  • SEA @ ARI
    • ARI (.68) over SEA (.63)
    • Crowd Agrees @ 79%
  • STL @ SF
    • STL (.48) over SF (.37)
    • Crowd Agrees @ 87%
  • MIN @ GB
    • MIN (.56) over GB (.52)
    • Crowd Disagrees @ 61%
Games to watch: MIN @ GB and really all of them this week.

Sunday, December 27, 2015

WEEK FIFTEEN NFL RESULTS

Below please find the 2015 week fifteen NFL results based on the model we built last year. This model uses a learning algorithm fed by data from the last fourteen weeks. To date, success rates this season are as follows: Week 2, 50%; Week 3, 62.5%; Week 4, 60%; Week 5, 64%; Week 6, 57%; Week 7, 50%; Week 8, 71%; Week 9, 53.8%; Week 10, 42.7%; Week 11, 57.1%; Week 12, 56.2%; Week 13, 62.5%; Week 14, 56.2%; and this week, 68.7%.

Note that the crowd was also correct on 68.7%, though there was divergence on two games: the model was incorrect on the CHI @ MIN game where the crowd was correct, and the model was correct on the DET @ NO game where the crowd was incorrect. Importantly, the weekly model percentages for the season remain fairly flat, which is a gain over last year, even though last year's overall accuracy was higher. See Week Fourteen NFL Results for more.

Due to the holiday, Week 16 Predictions and Results will not be posted. The percentage correct, however, will be posted with the Week 17 reports.

So here goes - The % indicated after each game represents the crowd-sourced prediction from nfl.com. The two digit decimal numbers following team names should be interpreted relative to each other for comparison to determine, in part, how close the game will be:

  • TB @ STL
    • TB (.47) over STL (.42)
    • STL 31 over TB 23 - Prediction Incorrect
    • Crowd Incorrect @ 57%
  • NYJ @ DAL
    • NYJ (.60) over DAL (.39)
    • NYJ 19 over DAL 16 - Prediction Correct
    • Crowd Correct @ 84% 
  • KC @ BAL
    • KC (.65) over BAL (.41)
    • KC 34 over BAL 14 - Prediction Correct
    • Crowd Correct @ 91%
  • HOU @ IND
    • HOU (.46) over IND (.40)
    • HOU 16 over IND 10 - Prediction Correct
    • Crowd Correct @ 59%
  • ATL @ JAX
    • JAX (.54) over ATL (.36)
    • ATL 23 over JAX 17 - Prediction Incorrect
    • Crowd Incorrect @ 59%
  • CHI @ MIN
    • CHI (.47) over MIN (.47) - Very Close Game
    • MIN 38 over CHI 17 - Prediction Incorrect
    • Crowd Correct @ 91%
  • TEN @ NE
    • NE (.64) over TEN (.37)
    • NE 33 over TEN 16 - Prediction Correct
    • Crowd Correct @ 98%
  • CAR @ NYG
    • CAR (.71) over NYG (.51)
    • CAR 38 over NYG 35 - Prediction Correct
    • Crowd Correct @ 79%
  • BUF @ WAS
    • BUF (.50) over WAS (.49) - Very Close Game
    • WAS 35 over BUF 25 - Prediction Incorrect
    • Crowd Incorrect @ 53%
  • GB @ OAK
    • GB (.60) over OAK (.48)
    • GB 30 over OAK 20 - Prediction Correct
    • Crowd Correct @ 85%
  • CLE @ SEA
    • SEA (.69) over CLE (.41)
    • SEA 30 over CLE 13 - Prediction Correct
    • Crowd Correct @ 98%
  • DEN @ PIT
    • PIT (.60) over DEN (.56) - Close Game
    • PIT 34 over DEN 27 - Prediction Correct
    • Crowd Correct @ 65%
  • MIA @ SD
    • MIA (.44) over SD (.33)
    • SD 30 over MIA 14 - Prediction Incorrect
    • Crowd Incorrect @ 70%
  • CIN @ SF
    • CIN (.60) over SF (.37)
    • CIN 24 over SF 14 - Prediction Correct
    • Crowd Correct @ 87%
  • ARI @ PHI
    • ARI (.62) over PHI (.48)
    • ARI 40 over PHI 17 - Prediction Correct
    • Crowd Correct @ 89%
  • DET @ NO
    • DET (.46) over NO (.46) - Very Close Game
    • DET 35 over NO 27 - Prediction Correct
    • Crowd Incorrect @ 76%
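
As a quick check on the week's percentages, here is a minimal Python sketch that tallies the results listed above. The names `RESULTS` and `success_rate` are made up for this illustration and are not part of the prediction model itself; the flags are simply transcribed from the "Prediction Correct/Incorrect" and "Crowd Correct/Incorrect" lines.

```python
# Week fifteen outcomes transcribed from the list above:
# (game, model_correct, crowd_correct)
RESULTS = [
    ("TB @ STL", False, False),
    ("NYJ @ DAL", True, True),
    ("KC @ BAL", True, True),
    ("HOU @ IND", True, True),
    ("ATL @ JAX", False, False),
    ("CHI @ MIN", False, True),
    ("TEN @ NE", True, True),
    ("CAR @ NYG", True, True),
    ("BUF @ WAS", False, False),
    ("GB @ OAK", True, True),
    ("CLE @ SEA", True, True),
    ("DEN @ PIT", True, True),
    ("MIA @ SD", False, False),
    ("CIN @ SF", True, True),
    ("ARI @ PHI", True, True),
    ("DET @ NO", True, False),
]


def success_rate(flags):
    """Return the percentage of True values in an iterable of booleans."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags)


model_rate = success_rate(m for _, m, _ in RESULTS)
crowd_rate = success_rate(c for _, _, c in RESULTS)

# Games where exactly one of the model and the crowd was right.
diverged = [game for game, m, c in RESULTS if m != c]

print(f"model: {model_rate:.2f}%  crowd: {crowd_rate:.2f}%")
print("diverged on:", ", ".join(diverged))
```

Both the model and the crowd come out at 11 of 16, or 68.75%, which appears as 68.7% in the running tally, and the two games where they diverge are CHI @ MIN and DET @ NO.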

Thursday, December 17, 2015

WEEK FIFTEEN NFL PREDICTIONS

Below please find the 2015 week fifteen NFL predictions based on the model we built last year. This model uses a learning algorithm fed by data from the last fourteen weeks. To date, success rates this season are as follows: Week 2, 50%; Week 3, 62.5%; Week 4, 60%; Week 5, 64%; Week 6, 57%; Week 7, 50%; Week 8, 71%; Week 9, 53.8%; Week 10, 42.7%; Week 11, 57.1%; Week 12, 56.2%; Week 13, 62.5%; Week 14, 56.2%.

Note that the crowd disagrees only on two predictions, CHI @ MIN and DET @ NO. In both cases, the model differentiates between the teams in the second decimal place and is rounded accordingly by the value in the third decimal place. My intuitions largely agree with the crowd on the CHI @ MIN game; and while I think NO has a very good chance against DET, that game could go either way. More important, however, is the concurrence between the model and crowd predictions. (See Predicting the Winners vs. Predicting the Crowd posted back in September.)

I am uncertain about what to think of the DEN @ PIT game. The model predicts PIT, which seems a good call, but this will be a game worth watching, no doubt.

So here goes - (The % indicated after each game represents the crowd-sourced prediction from nfl.com as of this morning. The two digit decimal numbers following team names should be interpreted relative to each other for comparison to determine, in part, how close the game will be.):

  • TB @ STL
    • TB (.47) over STL (.42)
    • Crowd Agrees @ 57%
  • NYJ @ DAL
    • NYJ (.60) over DAL (.39)
    • Crowd Agrees @ 84% 
  • KC @ BAL
    • KC (.65) over BAL (.41)
    • Crowd Agrees @ 91%
  • HOU @ IND
    • HOU (.46) over IND (.40)
    • Crowd Agrees @ 59%
  • ATL @ JAX
    • JAX (.54) over ATL (.36)
    • Crowd Agrees @ 59%
  • CHI @ MIN
    • CHI (.47) over MIN (.47) - Very Close Game
    • Crowd Disagrees @ 91%
  • TEN @ NE
    • NE (.64) over TEN (.37)
    • Crowd Agrees @ 98%
  • CAR @ NYG
    • CAR (.71) over NYG (.51)
    • Crowd Agrees @ 79%
  • BUF @ WAS
    • BUF (.50) over WAS (.49) - Very Close Game
    • Crowd Agrees @ 53%
  • GB @ OAK
    • GB (.60) over OAK (.48)
    • Crowd Agrees @ 85%
  • CLE @ SEA
    • SEA (.69) over CLE (.41)
    • Crowd Agrees @ 98%
  • DEN @ PIT
    • PIT (.60) over DEN (.56) - Close Game
    • Crowd Agrees @ 65%
  • MIA @ SD
    • MIA (.44) over SD (.33)
    • Crowd Agrees @ 70%
  • CIN @ SF
    • CIN (.60) over SF (.37)
    • Crowd Agrees @ 87%
  • ARI @ PHI
    • ARI (.62) over PHI (.48)
    • Crowd Agrees @ 89%
  • DET @ NO
    • DET (.46) over NO (.46) - Very Close Game
    • Crowd Disagrees @ 76%

Games to watch: CHI @ MIN, BUF @ WAS, DEN @ PIT, DET @ NO.
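
One way to read the two-digit numbers "relative to each other" is to rank games by the gap between the two scores. The Python sketch below does only that; it is an illustration, not the model, and the names `PREDICTIONS` and `closest_games` are hypothetical. Scores are stored as whole hundredths (.47 becomes 47) so the subtraction is exact.

```python
# Week fifteen model scores in hundredths (.47 -> 47), transcribed from the
# list above: (game, predicted winner's score, predicted loser's score).
PREDICTIONS = [
    ("TB @ STL", 47, 42),
    ("NYJ @ DAL", 60, 39),
    ("KC @ BAL", 65, 41),
    ("HOU @ IND", 46, 40),
    ("ATL @ JAX", 54, 36),
    ("CHI @ MIN", 47, 47),
    ("TEN @ NE", 64, 37),
    ("CAR @ NYG", 71, 51),
    ("BUF @ WAS", 50, 49),
    ("GB @ OAK", 60, 48),
    ("CLE @ SEA", 69, 41),
    ("DEN @ PIT", 60, 56),
    ("MIA @ SD", 44, 33),
    ("CIN @ SF", 60, 37),
    ("ARI @ PHI", 62, 48),
    ("DET @ NO", 46, 46),
]


def closest_games(predictions, count):
    """Return the `count` games with the smallest winner-minus-loser gap."""
    # sorted() is stable, so games with equal gaps keep their listed order.
    ranked = sorted(predictions, key=lambda p: p[1] - p[2])
    return [(game, winner - loser) for game, winner, loser in ranked[:count]]


for game, margin in closest_games(PREDICTIONS, 4):
    print(f"{game}: margin .{margin:02d}")
```

The four smallest gaps belong to CHI @ MIN, DET @ NO, BUF @ WAS and DEN @ PIT, the same four games flagged as games to watch. The cutoff of four is chosen here only to match that list; the posts do not state a fixed threshold for "close".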

Saturday, December 12, 2015

WEEK FOURTEEN NFL RESULTS

Below please find the 2015 week fourteen NFL prediction results based on the model we built last year. This model uses a learning algorithm fed by data from the last thirteen weeks. To date, success rates this season are as follows: Week 2, 50%; Week 3, 62.5%; Week 4, 60%; Week 5, 64%; Week 6, 57%; Week 7, 50%; Week 8, 71.4%; Week 9, 53.8%; Week 10, 42.7%; Week 11, 57.1%; Week 12, 56.2%; Week 13, 62.5%, and this week 56.2%.

Note that the model agrees with the crowd in correct and incorrect calls except in the IND @ JAX game where the model was correct and the crowd incorrect. A closer inspection is necessary, but it looks like we are close to a heuristic that accounts for the way the crowd makes decisions.

Note also that while the overall predictive success of 2014 was averaging higher than 2015, the 2015 results are less erratic than the 2014 results. We are watching this closely. If the pattern persists in this manner, we will have done something successful with a slight tweak to last year's model that will be explained in a later post.
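
To put a rough number on "erratic", one option is the standard deviation of the weekly success rates. That choice of measure is an assumption made for this sketch, not something the model or these posts define, and the name `WEEKLY_RATES` is made up here. The 2014 weekly rates are not listed in this post, so only the 2015 side is computed.

```python
from statistics import mean, pstdev

# Weekly success rates (percent) for 2015 weeks 2-14, as listed in this post.
WEEKLY_RATES = [50, 62.5, 60, 64, 57, 50, 71.4, 53.8, 42.7, 57.1, 56.2, 62.5, 56.2]

season_mean = mean(WEEKLY_RATES)
# Population standard deviation: how far a typical week sits from the mean.
season_spread = pstdev(WEEKLY_RATES)

print(f"mean: {season_mean:.1f}%  spread: {season_spread:.1f} points")
```

Running the same two calls on the 2014 weekly rates would give a like-for-like comparison: a higher mean with a larger spread would match the pattern described above.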


I expect that things will become a little more erratic during the last three weeks of the regular season as teams fight for a place in the playoffs.

So here goes - The % indicated after each game represents the crowd-sourced prediction from nfl.com as of Saturday morning. The two digit decimal numbers following team names should be interpreted relative to each other for comparison to determine, in part, how close the game will be:

  • MIN @ ARI
    • ARI (.66) over MIN (.45)
    • ARI 23 over MIN 20 - Prediction Correct
  • SEA @ BAL
    • SEA (.65) over BAL (.49)
    • SEA 35 over BAL 6 - Prediction Correct
    • Crowd Correct @ 95%
  • ATL @ CAR
    • CAR (.62) over ATL (.47)
    • CAR 38 over ATL 0 - Prediction Correct
    • Crowd Correct @ 92%
  • WAS @ CHI
    • CHI (.48) over WAS (.48)
    • WAS 24 over CHI 21 - Prediction Incorrect
    • Crowd Incorrect @ 71%
  • PIT @ CIN
    • CIN (.69) over PIT (.47)
    • PIT 33 over CIN 20 - Prediction Incorrect
    • Crowd Incorrect @ 65%
  • SF @ CLE
    • SF (.41) over CLE (.31)
    • CLE 24 over SF 10 - Prediction Incorrect
    • Crowd Incorrect @ 76%
  • IND @ JAX
    • JAX (.48) over IND (.44)
    • JAX 51 over IND 16 - Prediction Correct
    • Crowd Incorrect @ 67%
  • SD @ KC
    • KC (.62) over SD (.35)
    • KC 10 over SD 3 - Prediction Correct
    • Crowd Correct @ 94%
  • TEN @ NYJ
    • NYJ (.54) over TEN (.43)
    • NYJ 30 over TEN 8 - Prediction Correct
    • Crowd Correct @ 93%
  • BUF @ PHI
    • BUF (.51) over PHI (.47)
    • PHI 23 over BUF 20 - Prediction Incorrect
    • Crowd Incorrect @ 65%
  • DET @ STL
    • DET (.49) over STL (.34)
    • STL 21 over DET 14 - Prediction Incorrect
    • Crowd Incorrect @ 66%
  • NO @ TB
    • TB (.50) over NO (.41)
    • NO 24 over TB 17 - Prediction Incorrect
    • Crowd Incorrect @ 73%
  • OAK @ DEN
    • DEN (.62) over OAK (.45)
    • OAK 15 over DEN 12 - Prediction Incorrect
    • Crowd Incorrect @ 92%
  • DAL @ GB
    • GB (.54) over DAL (.46)
    • GB 28 over DAL 7 - Prediction Correct
    • Crowd Correct @ 92%
  • NE @ HOU
    • NE (.57) over HOU (.56) - Very Close Game ???
    • NE 27 over HOU 6 - Prediction Correct
    • Crowd Correct @ 86%
  • NYG @ MIA
    • NYG (.49) over MIA (.45) - Close Game
    • NYG 31 over MIA 24 - Prediction Correct
    • Crowd Correct @ 66%

WEEK FOURTEEN NFL PREDICTIONS

Below please find the 2015 week fourteen NFL predictions based on the model we built last year. This model uses a learning algorithm fed by data from the last thirteen weeks. To date, success rates this season are as follows: Week 2, 50%; Week 3, 62.5%; Week 4, 60%; Week 5, 64%; Week 6, 57%; Week 7, 50%; Week 8, 71%; Week 9, 53.8%; Week 10, 42.7%; Week 11, 57.1%; Week 12, 56.2%; Week 13, 62.5%.

Again, I'm sorry to say that I didn't get to this early enough to capture the crowd agreement on the Thursday night game. Otherwise, note that the model agrees with the crowd except on one game, the IND @ JAX game, where the model weighs in favor of JAX. Note also that only one game is predicted to be "very close" by the model, in which NE is predicted to win over HOU but only by a little. This strikes me as very counterintuitive. NE should win this easily, but we shall see. The model and crowd both predict a win for TB over NO. Personally, I wouldn't be surprised to see NO win this one.

So here goes - (The % indicated after each game represents the crowd-sourced prediction from nfl.com as of this morning. The two digit decimal numbers following team names should be interpreted relative to each other for comparison to determine, in part, how close the game will be.):

  • MIN @ ARI
    • ARI (.66) over MIN (.45)
  • SEA @ BAL
    • SEA (.65) over BAL (.49)
    • Crowd Agrees @ 95% 
  • ATL @ CAR
    • CAR (.62) over ATL (.47)
    • Crowd Agrees @ 92%
  • WAS @ CHI
    • CHI (.48) over WAS (.48)
    • Crowd Agrees @ 71%
  • PIT @ CIN
    • CIN (.69) over PIT (.47)
    • Crowd Agrees @ 65%
  • SF @ CLE
    • SF (.41) over CLE (.31)
    • Crowd Agrees @ 76%
  • IND @ JAX
    • JAX (.48) over IND (.44)
    • Crowd Disagrees @ 67%
  • SD @ KC
    • KC (.62) over SD (.35)
    • Crowd Agrees @ 94%
  • TEN @ NYJ
    • NYJ (.54) over TEN (.43)
    • Crowd Agrees @ 93%
  • BUF @ PHI
    • BUF (.51) over PHI (.47)
    • Crowd Agrees @ 65%
  • DET @ STL
    • DET (.49) over STL (.34)
    • Crowd Agrees @ 66%
  • NO @ TB
    • TB (.50) over NO (.41)
    • Crowd Agrees @ 73%
  • OAK @ DEN
    • DEN (.62) over OAK (.45)
    • Crowd Agrees @ 92%
  • DAL @ GB
    • GB (.54) over DAL (.46)
    • Crowd Agrees @ 92%
  • NE @ HOU
    • NE (.57) over HOU (.56) - Very Close Game ???
    • Crowd Agrees @ 86%
  • NYG @ MIA
    • NYG (.49) over MIA (.45) - Close Game
    • Crowd Agrees @ 66%

Games to watch: NE @ HOU, NYG @ MIA, NO @ TB (?)