Sample Selector

Sample Selector is a tool for creating and editing samples, which are groups of data you compare across. They're not "samples" in the statistical sense; they are more like filters.

By default, a single sample exists: "All Data". With the Sample Selector, you can create new samples to organize your data.

You can use samples to:

A sample is composed of one or more filters, specific conditions that narrow down your sample.

Creating a sample

The general process for creating a sample is to:

The effect of multiple filters

DataShop interprets each filter after the first as an additional restriction on the data that is included in the sample. This is also known as a logical "AND". You can see the results of multiple filters in the sample preview as soon as all filters are "saved".
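The "AND" behavior described above can be illustrated with a minimal Python sketch. The row fields and filter conditions here are hypothetical and are not DataShop's actual schema or implementation; the sketch only shows that each additional filter can narrow the sample and never widen it.

```python
# Hypothetical rows; the field names are illustrative, not DataShop's schema.
rows = [
    {"student": "s1", "problem": "p1", "attempt": 1},
    {"student": "s1", "problem": "p2", "attempt": 2},
    {"student": "s2", "problem": "p1", "attempt": 1},
]

# Hypothetical filters: each one is a condition a row must satisfy.
filters = [
    lambda r: r["problem"] == "p1",  # first filter
    lambda r: r["student"] == "s1",  # second filter restricts the sample further
]


def apply_filters(rows, filters):
    """Keep only the rows that satisfy every filter (logical AND)."""
    return [r for r in rows if all(f(r) for f in filters)]


sample = apply_filters(rows, filters)
print(sample)  # only the row for student s1 on problem p1 remains
```

With an empty filter list every row is kept, which mirrors the default "All Data" sample.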

Learning Curve Algorithm

To draw a learning curve, DataShop calculates individual points based on step aggregate values, which can be seen in the student-step rollup table. As the table name implies, each aggregate value is calculated for a single student on a single step. Two key values in these aggregates are the knowledge components (KCs) attributed to the student-step attempts, and the time attributed to the student-step, or step time.

Knowledge component attribution

Most attempts in the transaction data have a knowledge component (KC) associated with them. In Table 1 (below), three attempts are shown.

Table 1. Transaction sample for step X

Tx #  Time   Step    Evaluation  KC
1     11:51  Step X  Hint        KC B
2     11:52  Step X  Incorrect   KC B
3     11:53  Step X  Correct     KC A

In this example, the first two transactions are error attempts on step X. The tutor attributed them to KC B. The third transaction is a correct attempt on step X, which the tutor attributed to KC A.

DataShop treats the KC attribution on error attempts as the tutor's best guess as to which KC(s) the student was working toward. For the student-step described in Table 1, DataShop attributes KC A to the step; KC B is not attributed to the step. Because a correct attempt exists for the step, DataShop attributes only the KC from the correct attempt and assumes that the preceding attempts were really toward KC A.

KC attribution rule 1: If a correct attempt exists for a student-step, attribute all KCs on that correct attempt to the student-step.

A correct attempt may not exist for a student-step. In Table 2, no correct attempt exists for that student-step.

Table 2. Transaction sample for step Y

Tx #  Time   Step    Evaluation  KC
4     11:55  Step Y  Incorrect   KC A
5     11:56  Step Y  Hint        KC B
6     11:57  Step Y  Incorrect   KC A

If no correct attempt exists for the student-step, DataShop assigns the KCs from all error attempts to that step. For step Y, the KCs assigned are A and B.

KC attribution rule 2: If no correct attempt exists for a student-step, attribute the union of all KCs from all error attempts on that step to the student-step.
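The two attribution rules can be sketched in Python as follows. This is an illustration under stated assumptions, not DataShop's code: the transaction layout (a dict with "evaluation" and "kcs" keys) and the function name are hypothetical, and when a step has more than one correct attempt the sketch assumes the first one is used.

```python
def attribute_kcs(transactions):
    """Return the set of KCs attributed to one student-step.

    transactions: chronological list of dicts with hypothetical keys
    "evaluation" ("Correct", "Incorrect", or "Hint") and "kcs" (list of names).
    """
    # Rule 1: if a correct attempt exists, use the KCs on that attempt.
    # Assumption: with several correct attempts, the first one is used.
    for tx in transactions:
        if tx["evaluation"] == "Correct":
            return set(tx["kcs"])

    # Rule 2: no correct attempt, so take the union of KCs on all error attempts.
    kcs = set()
    for tx in transactions:
        kcs.update(tx["kcs"])
    return kcs


# The transactions from Table 1 (step X) and Table 2 (step Y).
step_x = [
    {"evaluation": "Hint", "kcs": ["KC B"]},
    {"evaluation": "Incorrect", "kcs": ["KC B"]},
    {"evaluation": "Correct", "kcs": ["KC A"]},
]
step_y = [
    {"evaluation": "Incorrect", "kcs": ["KC A"]},
    {"evaluation": "Hint", "kcs": ["KC B"]},
    {"evaluation": "Incorrect", "kcs": ["KC A"]},
]

print(sorted(attribute_kcs(step_x)))  # ['KC A']
print(sorted(attribute_kcs(step_y)))  # ['KC A', 'KC B']
```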

Step time and opportunity attribution

DataShop also needs to attribute a time to each step so that it can identify and sort opportunities.

For a given student-step in which a correct attempt exists, DataShop attributes the time of the first correct transaction as the step time. For a student-step in which no correct attempt exists, DataShop assigns the maximum time for all error attempts as the step time. The student-steps are then ordered by the step time to determine the opportunities for the KCs.

In the example in Table 1 below, a correct attempt exists.

Table 1. Transaction sample for step X

Tx #  Time   Step    Evaluation  KC
1     11:51  Step X  Hint        KC B
2     11:52  Step X  Incorrect   KC B
3     11:53  Step X  Correct     KC A

For step X, the step time is therefore "11:53", the time of the first correct attempt.

In Table 2, no correct attempt exists.

Table 2. Transaction sample for step Y

Tx #  Time   Step    Evaluation  KC
4     11:55  Step Y  Incorrect   KC A
5     11:56  Step Y  Hint        KC B
6     11:57  Step Y  Incorrect   KC A

For step Y, the step time is therefore "11:57", the maximum time of the incorrect attempts.

The resulting opportunity counts for KCs A and B across steps X and Y would be:

Table 3. Step table showing opportunity counts for KCs A and B

Step    KC          Opportunity
Step X  KC A        1
Step Y  KC A, KC B  2, 1

KC A on step X receives opportunity 1 because its step time (11:53) comes before the step time for KC A on step Y (11:57). KC B receives opportunity 1 because it only appears once, with step Y at 11:57. Step Y has KCs A and B associated with it due to KC attribution rule 2 (see above). The opportunity counts are also incremented independently for the two KCs on step Y.
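The step-time and opportunity logic described in this section can be sketched in Python. The function names and data layout are hypothetical, and times are kept as zero-padded "HH:MM" strings on the assumption that all transactions fall on the same day, so string comparison matches chronological order. The KC sets are entered by hand from the attribution rules rather than computed.

```python
from collections import defaultdict


def step_time(transactions):
    """transactions: list of (time, evaluation) tuples for one student-step."""
    correct_times = [t for t, ev in transactions if ev == "Correct"]
    if correct_times:
        return min(correct_times)  # time of the first correct attempt
    return max(t for t, _ in transactions)  # latest error attempt


def opportunity_counts(student_steps):
    """student_steps: list of (step, step_time, kcs) for one student.

    Returns {(step, kc): opportunity}, counting each KC independently.
    """
    seen = defaultdict(int)
    result = {}
    for step, _, kcs in sorted(student_steps, key=lambda s: s[1]):
        for kc in sorted(kcs):
            seen[kc] += 1
            result[(step, kc)] = seen[kc]
    return result


# Transactions from Tables 1 and 2.
tx_x = [("11:51", "Hint"), ("11:52", "Incorrect"), ("11:53", "Correct")]
tx_y = [("11:55", "Incorrect"), ("11:56", "Hint"), ("11:57", "Incorrect")]

steps = [
    ("Step Y", step_time(tx_y), {"KC A", "KC B"}),  # rule 2
    ("Step X", step_time(tx_x), {"KC A"}),          # rule 1
]
for (step, kc), opportunity in opportunity_counts(steps).items():
    print(step, kc, opportunity)
```

Even though step Y is listed first in the input, sorting by step time places step X first, which reproduces the counts in Table 3: KC A gets opportunity 1 on step X and 2 on step Y, and KC B gets opportunity 1 on step Y.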

Learning curve plotting

With the KC and step-time attribution determined, DataShop can then plot points in a learning curve.

We can draw a simple error rate learning curve for KC A based on the six transactions in Tables 1 and 2.

[Figure: error rate learning curve example]

This graph can be summarized as follows: on the first opportunity to demonstrate KC A (step X), the first attempt was a hint request (i.e., an error). On the second opportunity (step Y), the first attempt was an incorrect attempt (i.e., an error). Because the error rate for a single student-step is either 0 or 1, both points have an error rate of 1 (100%).

An assistance score (incorrect attempts plus hint requests) graph would look like the following:

[Figure: assistance score learning curve example]
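The two measures discussed above can be sketched for a single student-step as follows. This is a minimal illustration using the evaluations from Tables 1 and 2; the function names are hypothetical and the code is not taken from DataShop.

```python
def error_rate(evaluations):
    """1 if the first attempt on the step is an error (incorrect or hint), else 0."""
    return 0 if evaluations[0] == "Correct" else 1


def assistance_score(evaluations):
    """Number of incorrect attempts plus hint requests on the step."""
    return sum(1 for ev in evaluations if ev in ("Incorrect", "Hint"))


# Evaluations in chronological order, from Tables 1 and 2.
step_x = ["Hint", "Incorrect", "Correct"]
step_y = ["Incorrect", "Hint", "Incorrect"]

# Points for KC A, ordered by opportunity (step X is 1, step Y is 2).
for opportunity, evaluations in enumerate([step_x, step_y], start=1):
    print(opportunity, error_rate(evaluations), assistance_score(evaluations))
```

For KC A this gives an error rate of 1 at both opportunities, and assistance scores of 2 and 3.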

Creating aggregate curves

With data for individual student-steps stored, DataShop can create aggregate graphs (KC A across all students, for example) by simply computing an average for each opportunity. Viewing by student or KC means computing an average for a subset of all data points at each opportunity.
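Averaging per opportunity can be sketched as below. The data points are hypothetical (two invented students, one KC) and exist only to show the arithmetic; the tuple layout and function name are assumptions as well.

```python
from collections import defaultdict


def aggregate_curve(points):
    """points: iterable of (opportunity, value). Returns {opportunity: mean value}."""
    totals = defaultdict(lambda: [0.0, 0])  # opportunity -> [sum, count]
    for opportunity, value in points:
        totals[opportunity][0] += value
        totals[opportunity][1] += 1
    return {opp: s / n for opp, (s, n) in sorted(totals.items())}


# Hypothetical error-rate points for one KC: (student, opportunity, error rate).
data = [
    ("s1", 1, 1), ("s1", 2, 1),
    ("s2", 1, 1), ("s2", 2, 0),
]

# Curve across all students.
all_students = aggregate_curve((opp, v) for _, opp, v in data)
print(all_students)  # {1: 1.0, 2: 0.5}

# Viewing by student: average over the subset of points for that student.
only_s2 = aggregate_curve((opp, v) for s, opp, v in data if s == "s2")
print(only_s2)  # {1: 1.0, 2: 0.0}
```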

Edge cases in determining KC or step-time attribution

In some data, a single step may repeat for a given student. DataShop determines the boundary between opportunities by examining problem start events. A problem start event between two student actions toward the same step means that the second action is toward a new, unique opportunity.

A problem start event can be indicated in an XML log or tab-delimited file that describes the tutoring session. In XML, a problem start event is indicated by a context message with a name attribute of "START_PROBLEM". In a tab-delimited file, a problem start event is indicated by an increment of the "Problem View" column or a new value in the "Problem Start Time" column.
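One way to picture the boundary rule for tab-delimited data is to group a student's rows by step and by "Problem View" value, so that a repeated step under a new problem view becomes a separate opportunity. The sketch below assumes rows for a single student and a single problem, already in chronological order. The "Step Name" and "Outcome" keys are assumed column names for illustration, and the grouping is a simplification rather than DataShop's actual procedure.

```python
def split_student_steps(rows):
    """Group rows into student-steps keyed by (step, problem view).

    rows: chronological list of dicts for one student on one problem.
    """
    groups = {}
    for row in rows:
        key = (row["Step Name"], row["Problem View"])
        groups.setdefault(key, []).append(row)
    return groups


# Hypothetical rows: the student works on step X, then the problem is
# started again (Problem View goes from 1 to 2) and step X repeats.
rows = [
    {"Step Name": "Step X", "Problem View": 1, "Outcome": "Incorrect"},
    {"Step Name": "Step X", "Problem View": 1, "Outcome": "Correct"},
    {"Step Name": "Step X", "Problem View": 2, "Outcome": "Correct"},
]

groups = split_student_steps(rows)
for (step, view), txs in groups.items():
    print(step, view, len(txs))
```

Here the first two rows belong to one student-step and the third row forms a second one, so step X contributes two opportunities for its KCs.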