Sample Selector

Sample Selector is a tool for creating and editing samples, which are groups of data you compare across. They are not "samples" in the statistical sense; they work more like filters.

By default, a single sample exists: "All Data". With the Sample Selector, you can create new samples to organize your data.

You can use samples to:

A sample is composed of one or more filters, specific conditions that narrow down your sample.

Creating a sample

The general process for creating a sample is to:

The effect of multiple filters

DataShop interprets each filter after the first as an additional restriction on the data that is included in the sample. This is also known as a logical "AND". You can see the results of multiple filters in the sample preview as soon as all filters are "saved".
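The "AND" behavior can be illustrated with a minimal Python sketch. It is only an illustration of the logic: the row fields and the two filter conditions below are hypothetical, and this is not DataShop's actual filter implementation.

```python
# Hypothetical transaction rows; the field names are illustrative only.
rows = [
    {"student": "F", "condition": "control", "attempt": 1},
    {"student": "C", "condition": "treatment", "attempt": 1},
    {"student": "C", "condition": "treatment", "attempt": 2},
]

# Each filter is a predicate over a single row (both are made-up examples).
filters = [
    lambda row: row["condition"] == "treatment",
    lambda row: row["attempt"] == 1,
]


def apply_filters(rows, filters):
    """Keep only the rows that satisfy every filter (logical AND)."""
    return [row for row in rows if all(f(row) for f in filters)]


sample = apply_filters(rows, filters)
print(sample)
```

Each filter added to the list can only shrink the sample or leave it unchanged; with an empty filter list, every row is kept, which corresponds to "All Data".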


Learning Curve

A learning curve visualizes changes in student performance over time. The line graph displays opportunities across the x-axis, and a measure of student performance along the y-axis. A good learning curve reveals improvement in student performance as opportunity count (i.e., practice with a given knowledge component) increases. It can also "describe performance at the start of training, the rate at which learning occurs, and the flexibility with which the acquired skills can be used" (Koedinger and Mathan 2004).

See also Learning Curve Examples and Learning Curve Algorithm.

Learning Curve types

You can view a learning curve by student or by knowledge component (KC).

By Knowledge Component: View an average across all selected KCs, or view a curve for an individual KC. In the "all selected KCs" graph, each point is an average across data for all selected students and KCs. In a graph for an individual KC, each point is an average across all selected students.
By Student: View an average across all selected students, or view a curve for an individual student. In the "all selected students" graph, each point is an average across data for all selected students and KCs. In a graph for an individual student, each point is an average across all selected KCs.

Toggle the inclusion of knowledge components or students by clicking their name in the navigation boxes on the left.

Change the measure of student performance by hovering over the y-axis on the graph and clicking the new measure.

Measures of student performance are described below. Regardless of metric, each point on the graph is an average across all selected knowledge components and students.

Measure | Description
Assistance Score | The number of incorrect attempts plus hint requests for a given opportunity
Error Rate | The percentage of students that asked for a hint or were incorrect on their first attempt. For example, an error rate of 45% means that 45% of students asked for a hint or performed an incorrect action on their first attempt. Error rate differs from assistance score in that it provides data based only on the first attempt. As such, an error rate provides no distinction between a student that made multiple incorrect attempts and a student that made only one.
Number of Incorrects | The number of incorrect attempts for each opportunity
Number of Hints | The number of hints requested for each opportunity
Step Duration | The elapsed time of a step in seconds, calculated by adding all of the durations for transactions that were attributed to the step.
Correct Step Duration | The step duration if the first attempt for the step was correct. The duration of time for which students are "silent", with respect to their interaction with the tutor, before they complete the step correctly. This is often called "reaction time" (on correct trials) in the psychology literature. If the first attempt is an error (incorrect attempt or hint request), the observation is dropped.
Error Step Duration | The step duration if the first attempt for the step was an error (hint request or incorrect attempt). If the first attempt is a correct attempt, the observation is dropped.
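As a rough illustration of how two of these measures turn into points on a curve, the Python sketch below averages Error Rate and Assistance Score per opportunity. The observation records and their field names are hypothetical and do not reflect DataShop's export format or its internal code.

```python
from collections import defaultdict

# Hypothetical observations: one per student per opportunity.
# first_attempt is "correct", "incorrect", or "hint".
observations = [
    {"opportunity": 1, "first_attempt": "incorrect", "incorrects": 2, "hints": 1},
    {"opportunity": 1, "first_attempt": "correct", "incorrects": 0, "hints": 0},
    {"opportunity": 2, "first_attempt": "hint", "incorrects": 0, "hints": 1},
    {"opportunity": 2, "first_attempt": "correct", "incorrects": 0, "hints": 0},
]


def curve_points(observations):
    """Average each measure over the observations at every opportunity."""
    by_opportunity = defaultdict(list)
    for obs in observations:
        by_opportunity[obs["opportunity"]].append(obs)

    points = {}
    for opportunity, group in sorted(by_opportunity.items()):
        n = len(group)
        # Error Rate looks only at the first attempt.
        errors = sum(1 for o in group if o["first_attempt"] != "correct")
        # Assistance Score counts every incorrect attempt and hint request.
        assistance = sum(o["incorrects"] + o["hints"] for o in group)
        points[opportunity] = {
            "error_rate": 100.0 * errors / n,
            "assistance_score": assistance / n,
        }
    return points


points = curve_points(observations)
print(points)
```

At opportunity 1 the error rate is 50% while the average assistance score is 1.5, which shows the distinction drawn in the table: the error rate ignores everything after the first attempt.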

Viewing different curves

To switch between learning curve types:

  • Move your mouse pointer over the y-axis of the graph and click the new measure.

To switch between knowledge component and student views:

  • Select the desired view in the navigation sidebar.

To examine a single knowledge component or student:

  • Select its thumbnail from the gallery of available graphs on the bottom portion of the screen. The main graph then updates.

Available graphs are provided based on the selected samples, students, and knowledge components. (The default sample is titled 'All Data'.)

To compare conditions or other groups of data, you might define a number of samples that are subsets of 'All Data'. For more information on creating and modifying samples, see Sample Selector.

Viewing the details of a point on the curve

Explore a single point on the curve by clicking it. You can then navigate points on the curve by using the previous- and next-opportunity arrows beneath the graph. Change the selected line in the graph by using the sample drop-down.

Each change of the selected point updates the point information beneath the graph. This displays the point's value and observation count, as well as counts of the various units of analysis—unique KCs, problems, steps, and students—that compose the point.

Click a count to see values for observations composing a point. Values shown in the table are averaged by unit of analysis, and exclude dropped or null observations.

For views showing values by student or KC, links below the table allow you to toggle the selected items in the main navigation boxes based on the values composing the point.

There are a few things to keep in mind when comparing the values for individual observations composing a single point with the total number of observations for a point and the summary value for that point:

The number of value rows in the details box might not match the number of observations for the point. This can be observed for a few reasons. One is that multiple observations often fall under a single KC, problem, or step (but not student—there is only one observation per opportunity for a student). In this case, the number of observations would exceed the number of KCs, problems, or steps, and be averaged within items of these categories. Another reason is that multiple KCs might be attributed to a single step, showing more KCs than there are observations. In no case should the number of problems, steps, or students exceed the number of observations (although future data might invalidate this claim, such as data attributing multiple possible steps to a single student action).

The average of the values in the details box might not equal the point value. Although averaging the individual values often gets you the same number as that of the point, it may not when there are dropped observations for that point. (Dropped observations are shown in parentheses after the number of included observations, and are the result of a standard deviation cutoff.) This is because the values in the details box are themselves averages (by KC, problem, step, or student). When one or more observations are dropped, the values shown in the details box will be averages that are unevenly weighted, since at least one row (the one with the dropped observation(s)) is an average among fewer items. It then follows that you cannot just average all rows as if they had the same weight; to find the learning curve point value, each row value would need to be weighted by the number of items it includes in its average.

A short example is given below:

Step Duration by student, KC

            | KC B | KC done | KC a | Student Avg (no Std Dev cutoff) | Student Avg (Std Dev cutoff of 2.5)
Student F   | 10   | 5       | 30   | 15                              | 15
Student C   | 10   | 5       | 35   | 16.667                          | 16.667
Student 5   | 10   | 3       | 47   | 20                              | 6.5
Column avg  |      |         |      | 17.2222                         | 12.7222

In this example, nine values are shown for three students on three KCs, but the value 47 is dropped when a standard deviation cutoff of 2.5 SDs is introduced. When the value 47 is included, averaging down the student average column, (15+16.667+20)/3, yields the same number as averaging all nine observation values (17.2222). But when the value 47 is dropped, averaging down the column gives 12.7222, which is not the point value, since it's an average of averages. Such an average would need to weight each value in the last column by the number of observations behind it to get the correct result (13.5).
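The arithmetic in this example can be checked with a short Python sketch. The data are the values from the example; the function names are made up for illustration, and the dropped observation is marked by hand rather than computed from a standard deviation.

```python
# Step durations from the example, per student, in KC order (KC B, KC done, KC a).
durations = {
    "Student F": [10, 5, 30],
    "Student C": [10, 5, 35],
    "Student 5": [10, 3, 47],
}

# The observation dropped by the cutoff: Student 5's third value (47).
dropped = {("Student 5", 2)}


def student_rows(durations, dropped=frozenset()):
    """Return {student: (average, count)} over the observations that are kept."""
    rows = {}
    for student, values in durations.items():
        kept = [v for i, v in enumerate(values) if (student, i) not in dropped]
        rows[student] = (sum(kept) / len(kept), len(kept))
    return rows


def unweighted_average(rows):
    """Average the row averages as if every row had the same weight."""
    return sum(avg for avg, _ in rows.values()) / len(rows)


def weighted_average(rows):
    """Weight each row average by the number of observations behind it."""
    total = sum(count for _, count in rows.values())
    return sum(avg * count for avg, count in rows.values()) / total


no_cutoff = student_rows(durations)
with_cutoff = student_rows(durations, dropped)

print(round(unweighted_average(no_cutoff), 4), round(weighted_average(no_cutoff), 4))
print(round(unweighted_average(with_cutoff), 4), round(weighted_average(with_cutoff), 4))
```

Without the cutoff both averages agree at 17.2222, because every row covers three observations. With the 47 dropped, the unweighted column average is 12.7222, while the weighted average recovers the point value of 13.5.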

Opportunity Cutoff

When examining a learning curve, it may be useful to limit which student/knowledge component pairs are included in the graph based on the number of opportunities students had with the knowledge component. DataShop calls this the opportunity cutoff. For example, specifying an opportunity cutoff max value of 5 would remove student/knowledge component pairs where students had more than 5 opportunities with the chosen knowledge component(s). This may remove outliers from the data and provide a better means for analysis.

You can set a minimum and/or maximum opportunity cutoff by entering numbers in the learning curve navigation and pressing Refresh Graph.
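A minimal Python sketch of the idea follows. It assumes that the opportunity count of a student/knowledge component pair is simply the number of observations recorded for that pair; the field names and function name are hypothetical, and this is not DataShop's own code.

```python
from collections import Counter


def apply_opportunity_cutoff(observations, min_opps=None, max_opps=None):
    """Drop every student/KC pair whose opportunity count is out of range."""
    # Assumption: one observation per opportunity for each student/KC pair.
    counts = Counter((o["student"], o["kc"]) for o in observations)

    def in_range(pair):
        n = counts[pair]
        if min_opps is not None and n < min_opps:
            return False
        if max_opps is not None and n > max_opps:
            return False
        return True

    return [o for o in observations if in_range((o["student"], o["kc"]))]


# Student A has 6 opportunities with KC "x"; student B has 3.
observations = (
    [{"student": "A", "kc": "x", "opportunity": i} for i in range(1, 7)]
    + [{"student": "B", "kc": "x", "opportunity": i} for i in range(1, 4)]
)

kept = apply_opportunity_cutoff(observations, max_opps=5)
print(sorted({o["student"] for o in kept}))
```

With a maximum of 5, the whole pair (A, x) is removed rather than only its sixth opportunity, matching the description above of removing pairs instead of truncating them.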

Standard Deviation Cutoff

For latency curves (“Step Duration”, “Correct Step Duration”, and “Error Step Duration”), you can set a standard deviation cutoff. This is the number of standard deviations above and below the mean for which to include data points. Data points (observations) falling outside the specified standard deviation are dropped from the graph; the x-axis (number of opportunities) is not affected.

Standard deviation for an opportunity is calculated based on data for all knowledge components in the current knowledge-component model and the currently selected students. Therefore, changing the selected KCs will not affect the standard deviation values but changing the selected students may.

Note: If you set both a standard deviation cutoff and min and/or max opportunity cutoff, DataShop calculates the standard deviation before applying the opportunity cutoff(s).
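The cutoff itself can be sketched in Python as below. The durations are invented for one opportunity, and the use of the population standard deviation is an assumption; this documentation does not say whether DataShop uses the population or the sample formula.

```python
from statistics import mean, pstdev


def apply_sd_cutoff(durations, cutoff):
    """Split durations into (kept, dropped) using mean +/- cutoff * SD."""
    mu = mean(durations)
    sd = pstdev(durations)  # assumption: population standard deviation
    low, high = mu - cutoff * sd, mu + cutoff * sd
    kept = [d for d in durations if low <= d <= high]
    dropped = [d for d in durations if not (low <= d <= high)]
    return kept, dropped


# Hypothetical step durations (seconds) at a single opportunity.
durations = [10, 12, 11, 9, 10, 11, 12, 10, 9, 60]

kept, dropped = apply_sd_cutoff(durations, cutoff=2.5)
print(dropped)
```

Here only the 60-second observation falls outside the 2.5 SD band. Following the note above, a fuller version would compute the mean and standard deviation on the data before any opportunity cutoff is applied.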

Predicted Learning Curve

The empirical learning curves (average observed errors of a skill over each learning opportunity) calculated directly from the data contain lots of noise and take the form of wiggly lines. This noise comes from various places, such as recording errors, or the environment where the students worked. The predicted learning curve is much smoother. It is computed using the Learning Factors Analysis (LFA) method, which uses a set of customized Item-Response models to predict how a student will perform for each skill on each learning opportunity. The predicted learning curves are the average predicted error of a skill over each of the learning opportunities. As much of the noise is filtered out by the LFA models, the predicted learning curves are much smoother than the empirical learning curves.

While the empirical learning curve may give a visual clue as to how well a student may do over a set of learning opportunities, the predicted curves allow for a more precise prediction of a success rate at any learning opportunity.

There are several ways to use the predicted learning curves. One is to measure how much practice is needed to master a skill. When you see a learning curve that starts high and ends high, students probably finished the curriculum without mastering the skill corresponding to that learning curve. On the other hand, a learning curve that starts low and ends low with lots of learning opportunities probably implies that the skill is easy and students were over-practicing it. For a detailed example, see Is Over Practice Necessary? Improving Learning Efficiency with the Cognitive Tutor through Educational Data Mining (Cen, Koedinger, and Junker 2007).

The second use of predicted learning curves is to find a better set of skills that matches the student learning. An ideal predicted learning curve should be smooth and downward sloping. If a learning curve is too flat, goes up, or is too wiggly, the corresponding skill is probably not well chosen and is worth refining. For reference, see Learning Factors Analysis - A General Method for Cognitive Model Evaluation and Improvement (Cen, Koedinger, and Junker 2006).

To view the predicted learning curve (Error Rate learning curve only):

  • Select "View Predicted" from the learning curve navigation box.

In DataShop, LFA computes the statistics of a cognitive model including AIC, BIC, the coefficients of student proficiency, initial knowledge component difficulty, and knowledge component learning rate, generating the probability of success on each trial on different knowledge components. You can view the values of these parameters on the LFA Values report.
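To show how such parameters could produce a smooth predicted curve, here is a small Python sketch. It assumes the additive logistic form commonly associated with LFA, in which the log-odds of success are the sum of a student proficiency term, a KC intercept, and a KC learning rate multiplied by the number of prior opportunities. The exact model DataShop fits is not spelled out on this page, and the parameter values below are invented.

```python
import math


def predicted_error(student_proficiency, kc_intercept, kc_learning_rate, opportunity):
    """Predicted probability of an error on a 1-based opportunity number."""
    prior_practice = opportunity - 1
    # Assumed model: log-odds of success are additive in the three terms.
    logit = student_proficiency + kc_intercept + kc_learning_rate * prior_practice
    p_success = 1.0 / (1.0 + math.exp(-logit))
    return 1.0 - p_success


# Invented parameters: an average student, a fairly hard KC, steady learning.
curve = [predicted_error(0.0, -0.5, 0.3, t) for t in range(1, 6)]
print([round(p, 3) for p in curve])
```

With a positive learning rate the predicted error falls at every opportunity, giving the smooth, downward-sloping shape described above. A learning rate near zero would produce the flat curve that suggests a skill worth refining.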

For more information on the LFA algorithm, or for assistance interpreting the predicted learning curve, contact Hao Cen. See also Learning Factors Analysis - A General Method for Cognitive Model Evaluation and Improvement (Cen, Koedinger, and Junker 2006).

Version 3.6.9, October 28, 2009