Download the development and challenge data sets.
Note: You must be logged in to download data. If you are not, log in or create an account first.
Note: If you use any of these data sets in your research, please cite them as follows:
E.g.,
Development data sets are provided for familiarizing yourself with the format and developing your learning model. Using them is optional, and your predictions on these data sets will not count toward determining the winner of the competition. Development data sets differ from challenge sets in that the actual student performance values for the prediction column, "Correct First Attempt", are provided for all steps—see the file ending in "_master.txt".
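Because the development sets include the actual "Correct First Attempt" values, you can score your own predictions locally. The sketch below is only an illustration under several assumptions that this page does not confirm: that both files are tab-delimited with a header row, that rows can be matched on a column named "Row", and that root mean squared error is a reasonable error measure. The helper names read_column and rmse are hypothetical. Check the Data Format page before relying on any of this.

```python
import csv
import math

TARGET = "Correct First Attempt"  # prediction column named on this page
KEY = "Row"  # ASSUMPTION: name of the row-identifier column


def read_column(lines, key=KEY, target=TARGET, delimiter="\t"):
    """Map each row key to its numeric target value, skipping blank targets.

    `lines` is any iterable of text lines, such as an open file.
    ASSUMPTION: the data is tab-delimited with a header row.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    values = {}
    for row in reader:
        cell = (row.get(target) or "").strip()
        if cell:
            values[row[key]] = float(cell)
    return values


def rmse(predicted, actual):
    """Root mean squared error over the row keys present in both mappings."""
    shared = predicted.keys() & actual.keys()
    if not shared:
        raise ValueError("no rows in common")
    total = sum((predicted[k] - actual[k]) ** 2 for k in shared)
    return math.sqrt(total / len(shared))
```

To use it, open your prediction file and the file ending in "_master.txt" with open(path, newline=""), pass each to read_column, and hand the two resulting mappings to rmse.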
Predictions on challenge data sets will count toward determining the winner of the competition. In each of these two data sets, you'll be asked to provide predictions in the column "Correct First Attempt" for a subset of the steps. For more information on which steps these will be, see the bottom of our Data Format page.
Having trouble unzipping these archives? Download a compressed tar file containing both challenge data sets, which you can decompress with the following Unix command:
tar xvzf kddcup_challenge.tar.gz
File | Size | SHA1 |
---|---|---|
kddcup_challenge.tar.gz | 707 MB | bc11ac8ebbcf11dcd6b4485a193c07ad9be3853e |
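If the archive still fails to decompress, the download may be incomplete or corrupted. One way to check is to compare the file's SHA1 digest with the value in the table above. The following is a minimal sketch using Python's standard hashlib module. The function names sha1_of_file and matches are illustrative, and the script assumes the archive sits in the current directory.

```python
import hashlib
import os

# File name and digest are copied from the table above.
ARCHIVE = "kddcup_challenge.tar.gz"
EXPECTED_SHA1 = "bc11ac8ebbcf11dcd6b4485a193c07ad9be3853e"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with the published one, ignoring case."""
    return sha1_of_file(path) == expected_hex.strip().lower()


# ASSUMPTION: the archive was saved to the current working directory.
if __name__ == "__main__" and os.path.exists(ARCHIVE):
    print("OK" if matches(ARCHIVE, EXPECTED_SHA1) else "MISMATCH: download again")
```

Reading in chunks keeps memory use small even for a 707 MB file. A mismatch means the file on disk differs from the published one and should be downloaded again.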
For a description of the format of the data, see the Data Format page.