Development Data Sets

Development data sets are provided for familiarizing yourself with the format and developing your learning model. Using them is optional, and your predictions on these data sets will not count toward determining the winner of the competition. Development data sets differ from challenge sets in that the actual student performance values for the prediction column, "Correct First Attempt", are provided for all steps—see the file ending in "_master.txt".
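
As a rough illustration of how the master file can be used, the sketch below scores a file of your own predictions against the actual "Correct First Attempt" values. Several things here are assumptions rather than facts stated on this page: that the files are tab-delimited with a header row, that a column named "Row" identifies each step, and that root mean squared error is a sensible way to compare the two (it is used purely as an example metric). The function names are hypothetical.

```python
import csv
import math

TARGET = "Correct First Attempt"  # prediction column named on this page


def read_column(path, column=TARGET, key="Row"):
    """Map each row identifier to the numeric value found in `column`.

    Assumes a tab-delimited file with a header row and an identifier
    column called "Row" (both are assumptions). Blank cells are skipped.
    """
    values = {}
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for record in reader:
            cell = (record.get(column) or "").strip()
            if cell:
                values[record[key]] = float(cell)
    return values


def rmse(predicted, actual):
    """Root mean squared error over the rows present in both mappings."""
    shared = predicted.keys() & actual.keys()
    if not shared:
        raise ValueError("no overlapping rows to score")
    squared = sum((predicted[k] - actual[k]) ** 2 for k in shared)
    return math.sqrt(squared / len(shared))
```

Only rows that appear in both files are scored, so a predictions file should contain just the steps you actually predicted; rows copied over unchanged from the master file would count as perfect predictions.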

Challenge Data Sets

Predictions on challenge data sets will count toward determining the winner of the competition. In each of these two data sets, you'll be asked to provide predictions in the column "Correct First Attempt" for a subset of the steps. For more information on which steps these will be, see the bottom of our Data Format page.

Having trouble unzipping these archives? Download a compressed tar file containing both challenge data sets, which you can decompress with the following Unix command:

tar xvzf kddcup_challenge.tar.gz

File                     Size    SHA1
kddcup_challenge.tar.gz  707 MB  bc11ac8ebbcf11dcd6b4485a193c07ad9be3853e
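
Before decompressing, it may be worth confirming that the download matches the SHA1 value listed above. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the file is read in chunks so the 707 MB archive never has to fit in memory at once.

```python
import hashlib

# SHA1 value listed for kddcup_challenge.tar.gz on this page
EXPECTED_SHA1 = "bc11ac8ebbcf11dcd6b4485a193c07ad9be3853e"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_SHA1):
    """True if the file's SHA1 digest equals the expected value."""
    return sha1_of_file(path) == expected.lower()
```

Calling archive_is_intact("kddcup_challenge.tar.gz") should return True for an uncorrupted download; a False result suggests the file is incomplete or damaged and should be downloaded again.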

About the data format

