## Rules

### The Challenge

How generally or narrowly do students learn? How quickly or slowly? Will the rate of improvement vary between students? What does it mean for one problem to be similar to another? It might depend on whether the knowledge required for one problem is the same as the knowledge required for another. But is it possible to infer the knowledge requirements of problems directly from student performance data, without human analysis of the tasks?

This year's challenge asks you to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems. This task presents interesting technical challenges, has practical importance, and is scientifically interesting.

#### Technical Challenges

In terms of technical challenges, we mention just a few:

• The data matrix is sparse: not all students are given every problem, and some problems have only 1 or 2 students who completed each item. So, the contestants need to exploit relationships among problems to bring to bear enough data to hope to learn.
• There is a strong temporal dimension to the data: students improve over the course of the school year, students must master some skills before moving on to others, and incorrect responses to some items lead to incorrect assumptions in other items. So, contestants must pay attention to temporal relationships as well as conceptual relationships among items.
• Which problems a given student sees is determined in part by student choices or past success history: e.g., students only see remedial problems if they are having trouble with the non-remedial problems. So, contestants need to pay attention to causal relationships in order to avoid selection bias.
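As a rough illustration of the first point, the sketch below measures how sparse a student-by-problem matrix is and lists the problems attempted by very few students. The log format, the identifiers, and the `min_students` threshold are all hypothetical and are not taken from the challenge data; this is a minimal sketch of the idea, not a description of the actual data sets.

```python
from collections import defaultdict

# Hypothetical interaction log: (student_id, problem_id, correct_first_attempt).
# These rows are invented for illustration only.
log = [
    ("s1", "p1", 1), ("s1", "p2", 0),
    ("s2", "p1", 1), ("s2", "p3", 1),
    ("s3", "p1", 0), ("s3", "p2", 1),
    ("s4", "p4", 1),
]

def sparsity_summary(rows, min_students=3):
    """Return (matrix density, problems seen by fewer than min_students students)."""
    students_per_problem = defaultdict(set)
    students = set()
    for student, problem, _correct in rows:
        students.add(student)
        students_per_problem[problem].add(student)
    # Number of filled (student, problem) cells in the matrix.
    observed = sum(len(s) for s in students_per_problem.values())
    density = observed / (len(students) * len(students_per_problem))
    rare = sorted(p for p, s in students_per_problem.items() if len(s) < min_students)
    return density, rare

density, rare_problems = sparsity_summary(log)
print(f"density = {density:.4f}, rarely attempted problems = {rare_problems}")
```

Problems that land in the rarely attempted list are the ones where a model would have to borrow strength from related problems rather than rely on their own few observations.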

#### Scientific and Practical Importance

From a practical perspective, improved models could save millions of hours of students' time (and effort) in learning algebra. These models should both increase achievement levels and reduce the time needed. Focusing on just the latter, for the 0.5 million students who spend about 50 hours per year with Cognitive Tutors for mathematics, let's say these optimizations can reduce time to mastery by at least 10%. One experiment showed the time reduction was about 15% (Cen et al. 2007). A 10% reduction is 5 hours per student, or 2.5 million student hours per year saved. And this 0.5 million is less than 5% of all algebra-studying students in the US. If we include all algebra students (20x) and the grades 6-11 for which there are Carnegie Learning and Assistment applications (5x), that brings our rough estimate to 250 million student hours per year saved! In that time, students can be moving on in math and science or doing other things they enjoy.
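The back-of-the-envelope estimate above can be reproduced in a few lines. The numbers are the ones quoted in the paragraph (using the conservative 10% reduction rather than the 15% experimental figure); the variable names are ours and purely illustrative.

```python
# Figures quoted in the text above.
students = 500_000            # students using Cognitive Tutors for mathematics
hours_per_year = 50           # tutor hours per student per year
time_reduction_pct = 10       # assumed reduction in time to mastery, in percent

hours_saved_per_student = hours_per_year * time_reduction_pct / 100
hours_saved_current = students * hours_saved_per_student

# Scale-up factors from the text: all US algebra students (20x)
# and grades 6-11 (5x).
scale_factor = 20 * 5
hours_saved_scaled = hours_saved_current * scale_factor

print(hours_saved_per_student)   # hours saved per student per year
print(hours_saved_current)       # student hours saved per year, current users
print(hours_saved_scaled)        # student hours saved per year, scaled estimate
```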

From a scientific viewpoint, the ability to achieve low prediction error on unseen data is evidence that the learner has accurately discovered the underlying factors which make items easier or harder for students. Knowing these factors is essential for the design of high-quality curricula and lesson plans (both for human instructors and for automated tutoring software). So you, the contestants, have the potential to influence lesson design, improving retention, increasing student engagement, reducing wasted time, and increasing transfer to future lessons.

Currently, K-12 education is extremely focused on assessment. The No Child Left Behind Act has put incredible pressure on schools to "teach to the test", meaning that a significant amount of time is spent preparing for and taking standardized tests. Much of the time spent drilling for and taking these tests is wasted from the point of view of deep learning (long-term retention, transfer, and desire for future learning), so any advances which allow us to reduce the role of standardized tests hold the promise of increasing deep learning.

To this end, a model which accurately predicts long-term future performance as a byproduct of day-to-day tutoring could augment or replace some of the current standardized tests: this idea is called "assistment", from the goal of assessing performance while simultaneously assisting learning. Previous work has suggested that assistment is indeed possible: e.g., an appropriate analysis of 8th-grade tutoring logs can predict 10th-grade standardized test performance as well as 8th-grade standardized test results can predict 10th-grade standardized test performance (Feng, Heffernan, & Koedinger, 2009). But it is far from clear what the best prediction methods are; so, the contestants' algorithms may provide insights that allow important improvements in assistment.

#### Fundamental Questions

If a student is correct at one problem (e.g., "Starting with a number, if I multiply it by 6 and then add 66, I get 81.90. What's the number?") at one time, how likely are they to be correct at another problem (e.g., "Solve for x: 6x+66=81.90") at a later time?

These questions are of both scientific interest and practical importance. Scientifically, relevant deep questions include what is the nature of human knowledge representations and how generally do humans transfer their learning from one situation to another. Human learners do not always represent and solve mathematical tasks as we might expect. You might be surprised if you thought that a student working on the second problem above, the equation 6x+66=81.90, is likely to be correct given that he was correct on the first problem, the story problem. It turns out that most students are able to solve simple story problems like this one more successfully than the matched equation (Koedinger & Nathan, 2004; Koedinger, Alibali, & Nathan, 2008). In other words, there are interesting surprises to be found in student performance data.

Cognitive Tutors for mathematics are now in use in more than 2,500 schools across the US for some 500,000 students per year. While these systems have been quite successful, surprises like the one above suggest that the models behind these systems can be much improved. More generally, a number of studies have demonstrated how detailed cognitive task analysis can result in dramatically better instruction (Clark, Feldon, van Merriënboer, Yates, & Early, 2007; Lee, 2003). However, such analysis is painstaking and requires a high level of psychological expertise. We believe it possible that machine learning on large data sets can reap many of the benefits of cognitive task analysis, but without the great effort and expertise currently required.

### Competition Rules

Conditions of participation: Anybody who complies with the rules of the challenge (KDD Cup 2010) is welcome to participate. Only the organizers are excluded from participating. The KDD Cup 2010 is part of the competition program of the Knowledge Discovery in Databases conference (KDD 2010), July 25-28 in Washington, DC. Participants are not required to attend the KDD Cup 2010 workshop, which will be held at the conference, and the workshop is open to anyone who registers. The proceedings of the competition will be published in a volume of the JMLR: Workshop and Conference Proceedings series.

Anonymity: All entrants must identify themselves by registering on the KDD Cup 2010 website. However, they may elect to remain anonymous by choosing a nickname and checking the box "Make my profile anonymous". If this box is checked, only the nickname will appear in the Leaderboard instead of the real name. Participant emails will not appear anywhere on the website and will be used only by the organizers to communicate with the participants. To be eligible for prizes, the participants will have to publicly reveal their identity and uncheck the box "Make my profile anonymous".

Teams: To register a team, register only the team leader and choose a nickname for your team. We'll let you know later how to disclose the members of your team. We limit each team to one final entry. As an individual, you cannot enter under multiple names (this would be considered cheating and would disqualify you), nor can you participate in multiple teams. Multiple teams from the same organization, however, are allowed so long as each team leader is a different person and the teams do not intersect. During the development period, each team must have a different registered team leader. To be ranked in the challenge and qualify for prizes, each registered participant (individual or team leader) will have to disclose the names of any team members before the final results of the challenge are released. Hence, at the end of the challenge, you will have to choose which team you want to belong to (only one!) before the results are publicly released. If a person participates in multiple teams, those teams will be disqualified. After the results are released, no change in team composition will be allowed. Before the end of the challenge, the team leaders will have to declare the composition of their team. This will have to correspond to the list of co-authors in the proceedings, if they decide to publish their results. Hence a professor cannot have his/her name on all his/her students' papers (but can be thanked in acknowledgments).

A team can be either a student team (eligible for student-team prizes) or not a student team (eligible for travel awards). In a student team, a professor should be cited appropriately, but in the spirit of the competition, student teams should consist primarily of student work. We will ask participants to state whether they are a student team prior to the end of the competition.

Data: Data are available for download from the Downloads page to registered participants. Each data set is available as a separate archive to facilitate downloading. To view their accuracy on the Leaderboard, participants may enter results on either or both of the development and challenge data sets, but results on the development data sets will not count toward the final evaluation.

Challenge duration: The challenge is about 2 months in duration (April 1 - June 8, 2010). To be eligible for prizes, final submissions must be received by June 8, 2010, 11:59 pm EDT (GMT-4).

On-line feedback: On-line feedback is available through the upload results page and Leaderboard.

Submission method: The method of submission is via the form on the Upload page. To be ranked, submissions must include results on only the test portion of the challenge or development data sets. Results on the development data sets will not count as part of the competition. Multiple submissions are allowed.

Evaluation and ranking: For each team, only the last valid entry made by the team leader will count towards determining the winner. Valid entries must include results on both challenge data sets. The method of scoring is described on the Evaluation page.

Reproducibility: Participation is not conditioned on delivering code or publishing methods. However, we will ask the top-ranking participants to voluntarily fill out a fact sheet about their methods, contribute papers to the proceedings, and help in reproducing their results.

Prizes: Thanks to our sponsors, Facebook, Elsevier, and IBM Research, we will be offering the following prizes to student teams:

Prize amounts increased on April 23, 2010:
First place: \$5500 (previously \$4500)
Second place: \$3000 (previously \$2500)
Third place: \$1500 (previously \$1000)

The Pittsburgh Science of Learning Center (PSLC) will provide the following travel awards to cover expenses related to attending the KDD Cup 2010 workshop at the KDD conference:

Overall first place: \$1700
Overall second place: \$1150
Overall third place: \$650

Student first place: \$1700
Student second place: \$1150
Student third place: \$650

### References

• Cen, H., Koedinger, K. R., & Junker, B. (2006). Learning Factors Analysis: A general method for cognitive model evaluation and improvement. In M. Ikeda, K. D. Ashley, & T.-W. Chan (Eds.), Proceedings of the 8th International Conference on Intelligent Tutoring Systems, 164-175. Berlin: Springer-Verlag.
• Clark, R. E., Feldon, D., van Merriënboer, J., Yates, K., & Early, S. (2007). Cognitive task analysis. In J. M. Spector, M. D. Merrill, J. J. G. van Merriënboer, & M. P. Driscoll (Eds.), Handbook of research on educational communications and technology (3rd ed., pp. 577-593). Mahwah, NJ: Lawrence Erlbaum Associates.
• Feng, M., Heffernan, N. T., & Koedinger, K. R. (2009). Addressing the assessment challenge in an online system that tutors as it assesses. User Modeling and User-Adapted Interaction: The Journal of Personalization Research (UMUAI), 19(3), 243-266.
• Koedinger, K. R., & Aleven, V. (2007). Exploring the assistance dilemma in experiments with Cognitive Tutors. Educational Psychology Review, 19(3), 239-264.
• Lee, R. L. (2003). Cognitive task analysis: A meta-analysis of comparative studies. Unpublished doctoral dissertation, University of Southern California, Los Angeles, California.
• Pavlik, P. I., Cen, H., Wu, L., & Koedinger, K. R. (2008). Using item-type performance covariance to improve the skill model of an existing tutor. In Proceedings of the First International Conference on Educational Data Mining, 77-86.