Sample Selector

Sample Selector is a tool for creating and editing samples: groups of data you compare across. They're not "samples" in the statistical sense; they're more like filters.

By default, a single sample exists: "All Data". With the Sample Selector, you can create new samples to organize your data.

You can use samples to:

A sample is composed of one or more filters, specific conditions that narrow down your sample.

Creating a sample

The general process for creating a sample is to:

The effect of multiple filters

DataShop interprets each filter after the first as an additional restriction on the data that is included in the sample. This is also known as a logical "AND". You can see the results of multiple filters in the sample preview as soon as all filters are "saved".
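The AND behavior can be sketched in a few lines of Python. This is an illustration of the semantics only, not DataShop's implementation: each filter is modeled as a hypothetical predicate over a row (a dict of column name to value), and a row stays in the sample only if every filter accepts it. With no filters at all, every row passes, which matches the default "All Data" sample.

```python
from typing import Callable, Dict, List

# A row is modeled as a dict of column name -> value (an assumption for this sketch).
Row = Dict[str, str]
Filter = Callable[[Row], bool]

def apply_sample(rows: List[Row], filters: List[Filter]) -> List[Row]:
    """Keep only rows that satisfy every filter (logical AND)."""
    return [row for row in rows if all(f(row) for f in filters)]

rows = [
    {"Anon Student Id": "Stu_01", "Problem Name": "FRAC-1", "Outcome": "CORRECT"},
    {"Anon Student Id": "Stu_01", "Problem Name": "FRAC-2", "Outcome": "INCORRECT"},
    {"Anon Student Id": "Stu_02", "Problem Name": "FRAC-1", "Outcome": "INCORRECT"},
]

# Two hypothetical filters: each one further restricts the sample.
filters = [
    lambda r: r["Problem Name"] == "FRAC-1",
    lambda r: r["Outcome"] == "INCORRECT",
]

sample = apply_sample(rows, filters)
print(sample)  # only the Stu_02 row passes both filters
```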

Export: Getting Data Out

DataShop's Export function allows you to save your log data out of DataShop and into an anonymous, tab-delimited text file. As in the other DataShop reporting tools, the sample selector allows you to filter rows based on criteria you define, and apply your own knowledge component model to the data before exporting.

Exporting data

  1. Select samples to include in your data by clicking the name of the sample(s) in the sidebar. (Bolded sample names are included.) The Export preview will update to reflect the changes you've made. If you select more than one sample to export, transactions/steps/problems that occur in more than one sample will be duplicated in the export. You can identify them with the "Sample" column.
  2. Examine the "Export File Status" table above the Export button to gauge 1) how fast the export process will be and 2) the age of the cached export file, if one exists.
    - Checkmark Icon means the sample is cached and up-to-date. DataShop will prepare the file for download quickly.
    - Hourglass Icon means the sample is not cached, so exporting may take considerably longer.
    - Alert Icon means that although the sample is cached, either recent KC models or logged data are not yet included in the sample. To find out which, hover your mouse cursor over the icon. This sample should be up-to-date within 24 hours.
    Including at least one sample that is not up-to-date means the export will take longer.
    Note: If a sample is not cached, the order of rows in the export preview table may not match the order that you will see in the export file (i.e., rows ordered by session, then time).
  3. (Export by Student-Step or Student-Problem only) Select the desired knowledge component model from the knowledge component model combobox. The Export Preview will update to display a table with the chosen knowledge component model applied.
  4. Click the Export button. When the export process is complete, you will be prompted to save a zip file containing a folder hierarchy of one or more text files, each containing a tab-delimited export.
    Note: The naming convention for the top-level zip file is: the dataset ID (a number visible in the URL of each DataShop webpage that uniquely identifies the dataset), the export type, and a timestamp. For exports that are specific to a KC model, the KC model ID is also included in the file name. Within the exported zip file, a text file is included for each selected sample. The naming convention for these files is: the sample name, sample ID, and the time stamp from when the data was cached.

Viewing exported data

You can now view and edit your file in a text or spreadsheet editor.

Important: If you save your data from a spreadsheet editor and would like to import the data into DataShop, be sure to preserve the tab-delimited text format of the file.

Microsoft Excel users: potential data loss when opening and saving exported data in Excel

Newer versions of Microsoft Excel tend to format date/time fields automatically. As a result, timestamp values may be displayed in a format that hides detail in the time data, such as seconds or fractions of a second. If you then save the file from Excel, that detail is lost!

To work around this issue and safely view your data in Microsoft Excel:

  1. Launch Microsoft Excel.
  2. Select File > Open. Do not double-click your data file or drag it into Excel; if you do, you won't be able to complete the following steps.
  3. Browse for your exported data file and click Open. A Text Import Wizard will appear.
  4. On screen 1 of 3 of the Text Import Wizard, ensure that Delimited is selected as the Original Data Type. Click Next.
  5. On screen 2 of 3 of the Text Import Wizard, ensure that Tab is selected as the only delimiter, and click Next.
  6. On screen 3 of 3 of the Text Import Wizard, select all columns (click the first column, hold SHIFT, and click the last column). With all columns selected, change the Column data format to Text. The preview should update so that the header Text appears over each column.
  7. Click Finish. You can now view your data in Excel, as Excel now knows not to automatically format any columns.

Column descriptions

Columns of the current export formats are described below.

Note: The list and order of columns in any of the export formats can change at any time. If you are writing a program that expects the columns in a certain order, be sure to verify the header of the column before assuming it's the column you expect.
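A minimal Python sketch of header-based column lookup, using only the standard csv module: columns are found by name rather than position, so a reordered export still parses, and a missing column raises an error instead of silently reading the wrong data. It assumes the export's fields are tab-separated and not quoted (hence QUOTE_NONE); the column names come from the tables below, while the sample values are made up.

```python
import csv
import io
from typing import Dict, List

def read_columns(text: str, wanted: List[str]) -> List[Dict[str, str]]:
    """Read a tab-delimited export and return only the wanted columns, located by header name."""
    # QUOTE_NONE treats quote characters as ordinary text (assumes fields are not quoted).
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = reader.fieldnames or []
    missing = [name for name in wanted if name not in header]
    if missing:
        raise ValueError(f"Export header is missing expected columns: {missing}")
    # Every value stays a string, so timestamps keep their original precision.
    return [{name: row[name] for name in wanted} for row in reader]

# Hypothetical export whose columns are not in the usual order.
export_text = (
    "Outcome\tRow\tAnon Student Id\tTime\n"
    "CORRECT\t1\tStu_01\t2010-03-01 10:00:05.250\n"
    "INCORRECT\t2\tStu_02\t2010-03-01 10:01:12.000\n"
)

for row in read_columns(export_text, ["Anon Student Id", "Outcome", "Time"]):
    print(row["Anon Student Id"], row["Outcome"], row["Time"])
```

To read an exported file instead, pass the contents of a file opened with `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption about the export.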

See the history of changes to these formats.

By Transaction

Within each sample, rows are ordered by student, then by transaction time. If two transactions for the same student have identical times, the real order in which they occurred can't be known, so DataShop uses internal database identifiers to order the rows consistently.
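The ordering rule amounts to a three-part sort key. In this sketch, `db_id` is a hypothetical stand-in for DataShop's internal identifier (it is not an export column), and the timestamp format is assumed; the point is only that the third key makes ties deterministic.

```python
from datetime import datetime

def transaction_sort_key(tx: dict):
    """Order by student, then transaction time, then a stable internal id."""
    return (
        tx["student"],
        datetime.strptime(tx["time"], "%Y-%m-%d %H:%M:%S"),  # assumed time format
        tx["db_id"],  # hypothetical internal identifier, used only to break ties
    )

transactions = [
    {"student": "Stu_02", "time": "2010-03-01 10:00:05", "db_id": 7},
    {"student": "Stu_01", "time": "2010-03-01 10:00:05", "db_id": 9},
    {"student": "Stu_01", "time": "2010-03-01 10:00:05", "db_id": 3},
    {"student": "Stu_01", "time": "2010-03-01 09:59:00", "db_id": 5},
]

ordered = sorted(transactions, key=transaction_sort_key)
print([t["db_id"] for t in ordered])  # [5, 3, 9, 7]
```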

Column Description
Row A row counter
Sample Name The sample that contains the transaction. If a transaction appears in multiple samples, the transaction will be repeated, but with a different sample name.
Transaction Id A unique ID that identifies the transaction. Currently used for annotating transactions with custom fields via web services.
Anon Student Id DataShop-generated anonymous student ID. To obtain original student identifiers or to learn more about data anonymization, see About data anonymization below.
Session Id A dataset-unique string that identifies the user's session with the tutor.
Time Time the transaction occurred. For instance, if a student types "25" and presses return, the transaction time is the point at which they press return.
Time Zone The local time zone (e.g., EST, PST, US/Eastern).
Duration (sec) Duration of the transaction in seconds. This is the time of the current transaction minus that of the preceding transaction or problem start event, whichever is closer in time to the current transaction. If this difference is greater than 10 minutes, or if the prior transaction occurred during a different user session, DataShop reports the duration as null (a dot). If the current transaction is preceded by neither another transaction nor a problem start event, the duration is also shown as null. The duration is formatted without decimal places if the two times used in the calculation lacked millisecond precision.
Student Response Type The type of attempt made by the student (e.g., "ATTEMPT" or "HINT_REQUEST"). This is logged in the semantic_event element.
Student Response Subtype A more detailed classification of the student attempt. For example, the CTAT software describes actions taken by the tutor on behalf of the student as having subtype "tutor-performed".
Tutor Response Type The type of response made by the tutor (e.g., "RESULT" or "HINT_MSG").
Tutor Response Subtype A more detailed classification of the tutor response.
Level (level_type) The problem hierarchy name (e.g., "Understanding Fractions") of the type specified in the column header (e.g., "Unit"). There may be multiple "Level" columns if the problem hierarchy is more than one level deep. Level is logged in the level element.
Problem Name The name of the problem. Two problems with the same "Problem Name" are considered different "problems" by DataShop if the following logged values are not identical: problem name, context, tutor_flag (whether or not the problem or activity is tutored) and "other" field. These fields are logged in the problem element.
Problem View The number of times the student encountered the problem so far. This counter increases with each instance of the same problem. See "Problem View" in the "By Student-Step" table below.
Problem Start Time If the problem start time is not given in the original log data, then it is set to the time of the last transaction of the prior problem. If there is no prior problem for the session, the time of the earliest transaction is used. Earliest transaction time is equivalent to the minimum transaction time for the earliest step of the problem. For more detail on how problem start time is determined, see Determining Problem Start Time.
Step Name Formed by concatenating the "selection" and "action". Also see the glossary entry for "step".
Attempt at Step As of this transaction, the current number of attempts toward the identified step.
Outcome The tutor's evaluation of the student's attempt. For example, "CORRECT", "INCORRECT", or "HINT". This is logged in the action_evaluation element.
Selection A description of the interface element(s) that the student selected or interacted with (for example, "LowestCommonDenominatorCell"). This is logged in the event_descriptor element.
Action A description of the manipulation applied to the selection.
Input The input the student submitted (e.g., the text entered, the text of a menu item or a combobox entry).
Feedback Text The body of a hint, success, or incorrect action message shown to the student. It is generally a text value, logged in the tutor_advice element.
Feedback Classification The type of error (e.g., "sign error") or type of hint.
Help Level In the case of hierarchical hints, this is the depth of the hint. "1", for example, is an initial hint, while "3" is the third hint.
Total Num Hints The total number of hints available. This is logged in the action_evaluation element.
Condition Name The name of the condition (e.g., "Unworked").
Condition Type A condition classification (e.g., "Experimental", "Control"); optional at the time of logging.
KC (model_name) The knowledge component for this transaction. It is a member of the knowledge component model named in the column header. One "KC (model_name)" column should appear in the export for each KC model in the dataset.
KC Category (model_name) The knowledge component "category" logged by some tutors. It is a member of the knowledge component model named in the column header. One "KC Category (model_name)" column should appear in the export for each KC model in the dataset.
School The name of the school where the student used the tutor to create this transaction.
Class The name of the class the student was in when he or she used the tutor to create this transaction.
CF (custom_field_name) The value of a custom field. This is usually information that did not fit into any of the other logging fields (i.e., any of the other columns), and so was logged in this special container.
Event Type Allowed values are "assess", "instruct" and "assess_instruct". Blank is also allowed. Only "instruct" and "assess_instruct" values are treated as learning opportunities. Value of "instruct" causes the Outcome column to be blank.
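The "Duration (sec)" rules above can be sketched as a small Python function. This follows the column description literally and is not DataShop's code: the reference point is the later (closer) of the previous transaction and the problem start event, and the function returns None wherever the export would show a dot. The session rule is applied whenever a previous transaction exists, and the three decimal places used for millisecond-precision times are assumptions.

```python
from datetime import datetime
from typing import Optional

MAX_GAP_SECONDS = 10 * 60  # gaps longer than 10 minutes are reported as null

def transaction_duration(
    current: datetime,
    prev_tx: Optional[datetime],
    prev_tx_same_session: bool,
    problem_start: Optional[datetime],
) -> Optional[float]:
    """Duration in seconds, or None where the export would show a dot."""
    # Reference point: whichever earlier event is closest to the current transaction.
    candidates = [t for t in (prev_tx, problem_start) if t is not None]
    if not candidates:
        return None  # preceded by neither a transaction nor a problem start event
    if prev_tx is not None and not prev_tx_same_session:
        return None  # prior transaction belongs to a different session
    seconds = (current - max(candidates)).total_seconds()
    if seconds > MAX_GAP_SECONDS:
        return None
    return seconds

def format_duration(seconds: Optional[float], has_millis: bool) -> str:
    if seconds is None:
        return "."
    if not has_millis:
        return str(int(seconds))  # whole seconds when neither time had milliseconds
    return f"{seconds:.3f}"  # assumed number of decimal places

def ts(s: str) -> datetime:
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")

d = transaction_duration(ts("2010-03-01 10:00:30"), ts("2010-03-01 10:00:00"), True, ts("2010-03-01 09:59:00"))
print(format_duration(d, has_millis=False))  # 30
```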

By Student-Step

Within each sample, rows are ordered by student, then time of the first correct attempt (“Correct Transaction Time”) or, in the absence of a correct attempt, the time of the final transaction on the step (“Step End Time”).

Column Description
Row A row counter.
Sample The sample that includes this step. If you select more than one sample to export, steps that occur in more than one sample will be duplicated in the export.
Anon Student ID The student that performed the step.
Problem Hierarchy The location in the curriculum hierarchy where this step occurs.
Problem Name The name of the problem in which the step occurs.
Problem View The number of times the student encountered the problem so far. This counter increases with each instance of the same problem. Note that problem view increases regardless of whether or not the step was encountered in previous problem views. For example, a step can have a "Problem View" of "3", indicating the problem was viewed three times by this student, but that same step need not have been encountered by that student in all instances of the problem. If this number does not increase as you expect it to, it might be that DataShop has identified similar problems as distinct: two problems with the same "Problem Name" are considered different "problems" by DataShop if the following logged values are not identical: problem name, context, tutor_flag (whether or not the problem or activity is tutored) and "other" field. For more on the logging of these fields, see the description of the "problem" element in the Guide to the Tutor Message Format. For more detail on how problem view is determined, see Determining Problem View.
Step Name Formed by concatenating the "selection" and "action". Also see the glossary entry for "step".
Step Start Time The step start time is determined in one of three ways:
  • If it's the first step of the problem, the step start time is the same as the problem start time.
  • If it's a subsequent step, the step start time is the time of the preceding transaction, provided that transaction occurred within 10 minutes of this step's first transaction.
  • If it's a subsequent step and more than 10 minutes elapsed between the preceding transaction and this step's first transaction, the step start time is set to null, because the value is considered unreliable.
For a visual example, see the Examples page.
First Transaction Time The time of the first transaction toward the step.
Correct Transaction Time The time of the correct attempt toward the step, if there was one.
Step End Time The time of the last transaction toward the step.
Step Duration (sec) The elapsed time of the step in seconds, calculated by adding all of the durations for transactions that were attributed to the step. See the glossary entry for more detail. This column was previously labeled "Assistance Time". It differs from "Assistance Time" in that its values are derived by summing transaction durations, not by finding the difference between only two points in time (step start time and the last correct attempt).
Correct Step Duration (sec) The step duration if the first attempt for the step was correct. This might also be described as "reaction time" since it's the duration of time from the previous transaction or problem start event to the correct attempt. See the glossary entry for more detail. This column was previously labeled "Correct Step Time (sec)".
Error Step Duration (sec) The step duration if the first attempt for the step was an error (incorrect attempt or hint request).
First Attempt The tutor's response to the student's first attempt on the step. Example values are "hint", "correct", and "incorrect".
Incorrects Total number of incorrect attempts by the student on the step.
Hints Total number of hints requested by the student for the step.
Corrects Total correct attempts by the student for the step. (Only increases if the step is encountered more than once.)
Condition The name and type of the condition the student is assigned to. In the case of a student assigned to multiple conditions (factors in a factorial design), condition names are separated by a comma and space. This differs from the transaction format, which optionally has "Condition Name" and "Condition Type" columns.
KC (model_name) (Only shown when the "Knowledge Components" option is selected.) Knowledge component(s) associated with the correct performance of this step. In the case of multiple KCs assigned to a single step, KC names are separated by two tildes ("~~").
Opportunity (model_name) (Only shown when the "Knowledge Components" option is selected.) An opportunity is the first chance on a step for a student to demonstrate whether he or she has learned the associated knowledge component. Opportunity number is therefore a count that increases by one each time the student encounters a step with the listed knowledge component. In the case of multiple KCs assigned to a single step, opportunity number values are separated by two tildes ("~~") and are given in the same order as the KC names. When an Event Type column is present in the transaction data, opportunity counting depends on its values; see "Event Type" in the "By Transaction" table above.
Predicted Error Rate (model_name) A hypothetical error rate based on the Additive Factor Model (AFM) algorithm. A value of "1" is a prediction that a student's first attempt will be an error (incorrect attempt or hint request); a value of "0" is a prediction that the student's first attempt will be correct. For details on how it is calculated, see "Predicted Error Rate" below. In the case of multiple KCs assigned to a single step, DataShop implements a compensatory sum across all of the KCs, so a single predicted error rate is provided (i.e., the same predicted error rate for each KC assigned to the step). For more detail on DataShop's implementation for multi-skill steps, see the Model Values page.
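To make the multi-KC columns and the compensatory sum concrete, here is a hedged Python sketch. It splits the "~~"-separated KC and Opportunity cells into pairs, then computes an error rate under one common AFM parameterization: logit(P(correct)) = θ (student proficiency) + the sum over the step's KCs of β (KC intercept) + γ (KC slope) × prior practice. Treating prior practice as Opportunity − 1 is an assumption of this sketch, and all parameter values and KC names are hypothetical, not fitted DataShop output.

```python
import math
from typing import Dict

def parse_kc_cells(kc_cell: str, opportunity_cell: str) -> Dict[str, int]:
    """Pair '~~'-separated KC names with their '~~'-separated opportunity numbers."""
    names = kc_cell.split("~~")
    counts = [int(x) for x in opportunity_cell.split("~~")]
    if len(names) != len(counts):
        raise ValueError("KC and Opportunity cells list different numbers of values")
    return dict(zip(names, counts))

def afm_predicted_error_rate(
    theta: float,
    opportunities: Dict[str, int],
    beta: Dict[str, float],
    gamma: Dict[str, float],
) -> float:
    """Predicted probability that the first attempt is an error (incorrect or hint)."""
    # Compensatory sum: every KC on the step adds its own term to a single logit.
    # Prior practice is taken as Opportunity - 1 (an assumption of this sketch).
    logit = theta + sum(
        beta[kc] + gamma[kc] * (opp - 1) for kc, opp in opportunities.items()
    )
    p_correct = 1.0 / (1.0 + math.exp(-logit))
    return 1.0 - p_correct

# Hypothetical parameters, for illustration only.
opps = parse_kc_cells("add-fractions~~find-lcd", "3~~1")
beta = {"add-fractions": -0.5, "find-lcd": 0.3}
gamma = {"add-fractions": 0.25, "find-lcd": 0.1}
rate = afm_predicted_error_rate(theta=0.0, opportunities=opps, beta=beta, gamma=gamma)
print(round(rate, 3))  # 0.426
```

Because both KCs feed one logit, the step gets a single predicted error rate, which is why the same value is reported for each KC assigned to the step.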

By Student-Problem

Within each sample, rows are ordered by student, then problem start time.

Column Description
Row A row counter.
Sample The sample that includes this problem. If you select more than one sample to export, problems that occur in more than one sample will be duplicated in the export.
Anon Student ID The student that worked on the problem.
Problem Hierarchy The location in the curriculum hierarchy where this problem occurs.
Problem Name The name of the problem.
Problem View The number of times the student encountered the problem so far. This counter increases with each instance of the same problem. See "Problem View" in the "By Student-Step" table above.
Problem Start Time If the problem start time is not given in the original log data, then it is set to the time of the last transaction of the prior problem. If there is no prior problem for the session, the time of the earliest transaction is used. Earliest transaction time is equivalent to the minimum transaction time for the earliest step of the problem. For more detail on how problem start time is determined, see Determining Problem Start Time.
Problem End Time Derived from the maximum transaction time of the latest step of the problem.
Latency (sec) The amount of time the student spent on this problem. Specifically, the difference between the problem start time and the time of the last transaction on this problem.
Steps Missing Start Times The number of steps (from the student-step table) with "Step Start Time" values of "null".
Hints Total number of hints the student requested for this problem.
Incorrects Total number of incorrect attempts the student made on this problem.
Corrects Total number of correct attempts the student made for this problem.
Avg Corrects The total number of correct attempts / total number of steps in the problem.
Steps Total number of steps the student took while working on the problem.
Avg Assistance Score Calculated as (total hints requested + total incorrect attempts) / total steps.
Correct First Attempts Total number of correct first attempts made by the student for this problem.
Condition The name and type of the condition the student is assigned to. In the case of a student assigned to multiple conditions (factors in a factorial design), condition names are separated by a comma and space. This differs from the transaction format, which optionally has "Condition Name" and "Condition Type" columns.
KCs Total number of KCs practiced by the student for this problem.
Steps without KCs Total number of steps in this problem (performed by the student) without an assigned KC.
KC List Comma-delimited list of KCs practiced by the student for this problem.
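Several student-problem columns are simple aggregates of the student-step rows for that problem. The Python sketch below applies the formulas given above for "Avg Corrects" and "Avg Assistance Score". The `StepSummary` type is hypothetical, and using the same step count for both averages and for "Steps" is an assumption.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StepSummary:
    """Hypothetical per-step counts, mirroring the student-step columns."""
    hints: int
    incorrects: int
    corrects: int
    first_attempt: str  # "correct", "incorrect", or "hint"

def problem_summary(steps: List[StepSummary]) -> dict:
    n = len(steps)
    hints = sum(s.hints for s in steps)
    incorrects = sum(s.incorrects for s in steps)
    corrects = sum(s.corrects for s in steps)
    return {
        "Steps": n,
        "Hints": hints,
        "Incorrects": incorrects,
        "Corrects": corrects,
        # Averages are undefined (None) for a problem with no steps.
        "Avg Corrects": corrects / n if n else None,
        "Avg Assistance Score": (hints + incorrects) / n if n else None,
        "Correct First Attempts": sum(1 for s in steps if s.first_attempt == "correct"),
    }

steps = [
    StepSummary(hints=0, incorrects=0, corrects=1, first_attempt="correct"),
    StepSummary(hints=2, incorrects=1, corrects=1, first_attempt="hint"),
    StepSummary(hints=0, incorrects=1, corrects=1, first_attempt="incorrect"),
]
print(problem_summary(steps))
```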

Determining Problem View

Problem View is determined in one of three ways:

  1. If the original log data was in the tutor message format XML, a problem start or restart can be indicated in the context message with a START_PROBLEM in the name attribute. See context message attributes.
  2. If the original log data came from tab-delimited files and includes the Problem View or Problem Start Time, DataShop uses that information.
  3. If no information is given about the Problem View or Problem Start Time in the original log data, then DataShop determines when a new instance of the problem occurs by looking for interleaved problems. If another problem's transactions occur between two of this problem's transactions, the problem view is incremented.
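The interleaving rule in case 3 can be sketched in Python as follows. The input is a hypothetical, time-ordered list of problem names, one per transaction, for a single student; returning to a problem after another problem's transactions starts a new view. Whether DataShop scopes this per student, per session, or both is not stated here, so treat the scoping as an assumption.

```python
from typing import Dict, List, Optional

def infer_problem_views(problem_sequence: List[str]) -> List[int]:
    """Assign a Problem View to each transaction of one student, in time order."""
    views: Dict[str, int] = {}
    previous: Optional[str] = None
    result = []
    for problem in problem_sequence:
        if problem != previous:
            # First encounter, or a return after another problem was interleaved.
            views[problem] = views.get(problem, 0) + 1
        result.append(views[problem])
        previous = problem
    return result

print(infer_problem_views(["P1", "P1", "P2", "P1", "P2", "P2"]))  # [1, 1, 1, 2, 2, 2]
```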

Determining Problem Start Time

Problem Start Time is determined in one of three ways:

  1. If the original log data was in the tutor message format XML, a problem start or restart can be indicated in the context message with a START_PROBLEM in the name attribute. The problem start time is set to the time field in the context message. See the meta element.
  2. If the original log data came from tab-delimited files and includes the Problem Start Time, DataShop uses that value.
  3. If no information is given about the Problem Start Time in the original log data, then DataShop determines the problem start time to be the time of the last transaction of the prior problem (if one exists) or the time of the earliest transaction.
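Case 3 can be sketched in Python for a single session. In this sketch, consecutive transactions on the same problem are grouped into one problem instance (matching the interleaving rule for Problem View). Each instance starts at the last transaction time of the prior problem or, for the first problem, at its own earliest transaction. Using the minimum time of the whole run as the "earliest transaction" is a simplification of the step-based definition above, and logged start times (cases 1 and 2) would take precedence over this fallback.

```python
from datetime import datetime
from itertools import groupby
from typing import List, Optional, Tuple

def infer_problem_start_times(
    transactions: List[Tuple[str, datetime]],
) -> List[Tuple[str, datetime]]:
    """One (problem_name, start_time) pair per problem instance in a session.

    transactions: (problem_name, time) pairs for one session, in time order.
    """
    starts = []
    prior_last: Optional[datetime] = None  # last transaction time of the prior problem
    for problem, group in groupby(transactions, key=lambda tx: tx[0]):
        times = [t for _, t in group]
        start = prior_last if prior_last is not None else min(times)
        starts.append((problem, start))
        prior_last = max(times)
    return starts

def ts(s: str) -> datetime:
    return datetime.strptime(s, "%H:%M")

session = [("P1", ts("09:00")), ("P1", ts("09:01")), ("P2", ts("09:03")),
           ("P2", ts("09:04")), ("P1", ts("09:06"))]
starts = infer_problem_start_times(session)
print([(p, s.strftime("%H:%M")) for p, s in starts])
# [('P1', '09:00'), ('P2', '09:01'), ('P1', '09:04')]
```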

About data anonymization

Exported data is anonymized: real student IDs are replaced with anonymous IDs during the export process. Should you wish to obtain identifiable student IDs (for example, if you are the instructor for a course or if the original data was anonymous), please contact us so we can confirm that you are authorized to view the real student IDs. We will then provide a mapping table from the anonymized IDs to the real student IDs.

Export tips for Internet Explorer

If downloading of your export file is blocked by Internet Explorer, check your browser's security settings:

  1. Select Tools > Internet Options
  2. Select the Security tab.
  3. Select the Internet icon.
  4. Click Custom Level...
  5. Under “Downloads”, ensure that both Automatic prompting for file downloads and File download settings are set to Enable.