You will be allowed to train on the training portion of each data set, and will then be evaluated on how well you predict the Correct First Attempt values for the test portion. We will provide feedback on formatting errors in prediction files, but we will not reveal accuracy on test data until the end of the competition. Note that for each test file you submit, an unidentified portion will be used to score your predictions for the leaderboard, while the remaining portion will be used to determine the winner of the competition.
For a valid submission, the evaluation program will compare the predictions you provided against the undisclosed true values and report the difference as Root Mean Squared Error (RMSE). If a data set file is missing from a submission, the evaluation program will report the RMSE as 1 for that file. The total score for a submission will then be the average of the RMSE values. All data sets will receive equal weight in the final average, independent of their size.
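The scoring rule above can be sketched in a few lines. This is a minimal illustration only, not the official evaluation program: the function names `rmse` and `total_score`, and the representation of a submission as a dictionary mapping a (hypothetical) data set file name to a list of predicted values, are assumptions made for the example. It also ignores the hidden leaderboard/final split and scores every row of each file.

```python
import math


def rmse(predictions, truths):
    """Hypothetical helper: root mean squared error between predicted
    and true Correct First Attempt values for one data set file."""
    if len(predictions) != len(truths) or not predictions:
        raise ValueError("need equally sized, non-empty sequences")
    squared = [(p - t) ** 2 for p, t in zip(predictions, truths)]
    return math.sqrt(sum(squared) / len(squared))


def total_score(submission, truth_by_file):
    """Hypothetical helper: average the per-file RMSE values.

    A file missing from the submission scores 1.0, and every file
    weighs equally regardless of how many rows it contains.
    """
    scores = []
    for name, truths in truth_by_file.items():
        preds = submission.get(name)
        scores.append(1.0 if preds is None else rmse(preds, truths))
    return sum(scores) / len(scores)


# Usage: file "b" is missing, so it contributes an RMSE of 1.
truth = {"a": [1, 0, 1, 1], "b": [0, 1]}
submission = {"a": [1.0, 0.0, 0.5, 1.0]}
print(total_score(submission, truth))  # (0.25 + 1.0) / 2 = 0.625
```

Because each file counts once in the average, a small data set influences the total score as much as a large one, and omitting a file is penalized with the worst-case RMSE of 1.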
At the end of the competition, the winner will be the team with the lowest total score.
Call for participants
Registration opens at 2pm EDT, development data sets available
Competition starts at 2pm EDT, challenge data sets available
Competition ends at 11:59pm EDT
Fact sheet and team composition info due by 11:59pm EDT
Winners announced
KDD Cup Workshop
KDD Cup is the annual Data Mining and Knowledge Discovery competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), the leading professional organization of data miners.
This year's competition is hosted by PSLC DataShop.