Sample Selector

Sample Selector is a tool for creating and editing samples: groups of data that you compare across. These are not "samples" in the statistical sense; they work more like filters.

By default, a single sample exists: "All Data". With the Sample Selector, you can create new samples to organize your data.

You can use samples to:

A sample is composed of one or more filters, specific conditions that narrow down your sample.

Creating a sample

The general process for creating a sample is to:

The effect of multiple filters

DataShop interprets each filter after the first as an additional restriction on the data that is included in the sample. This is also known as a logical "AND". You can see the results of multiple filters in the sample preview as soon as all filters are "saved".
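The logical "AND" behavior can be sketched in a few lines of Python. This is only an illustration, not DataShop's implementation: the rows, column names, and the `apply_filters` helper are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Filter = Callable[[Row], bool]


def apply_filters(rows: List[Row], filters: List[Filter]) -> List[Row]:
    """Keep only the rows that satisfy every filter (logical AND)."""
    return [row for row in rows if all(f(row) for f in filters)]


# Hypothetical rows and column names, for illustration only.
rows = [
    {"student": "s1", "problem": "p1", "condition": "control"},
    {"student": "s2", "problem": "p1", "condition": "treatment"},
    {"student": "s3", "problem": "p2", "condition": "treatment"},
]

# Each additional filter further restricts the sample.
filters = [
    lambda r: r["condition"] == "treatment",
    lambda r: r["problem"] == "p1",
]

sample = apply_filters(rows, filters)
print([r["student"] for r in sample])  # ['s2']
```

With an empty filter list, the sketch returns every row, which mirrors the default "All Data" sample.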


KC Models

A KC (Knowledge Component) model is a mapping between steps and knowledge components in a dataset. In DataShop, each unique step can map to zero or more knowledge components.
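One way to picture this mapping is as a dictionary from step identifiers to sets of knowledge components. The sketch below assumes made-up step and KC names; it is not DataShop's internal representation.

```python
from typing import Dict, List, Set

# Hypothetical step identifiers and KC names, for illustration only.
kc_model: Dict[str, Set[str]] = {
    "step-1": {"find-area"},
    "step-2": {"find-area", "multiply"},  # a step with two KCs
    "step-3": set(),                      # a step with zero KCs
}


def kcs_for_step(model: Dict[str, Set[str]], step: str) -> Set[str]:
    """Return the KCs mapped to a step (an empty set if there are none)."""
    return model.get(step, set())


def steps_for_kc(model: Dict[str, Set[str]], kc: str) -> List[str]:
    """Return, in sorted order, every step that is mapped to the given KC."""
    return sorted(step for step, kcs in model.items() if kc in kcs)


print(steps_for_kc(kc_model, "find-area"))  # ['step-1', 'step-2']
```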

From the KC Models page, you can compare existing KC models, export an existing model or template for creating a new KC model, or import a new model that you've created.

Why create additional KC models and import them to DataShop?

A primary reason for creating a new KC model is that an existing model is insufficient in some way—it may model some knowledge components too coarsely, producing learning curves that spike or dip, or it may be too fine-grained (too many knowledge components), producing curves that end after one or two opportunities. Or perhaps the model fails to model the domain sufficiently or with the right terminology. In any case, you may find value in creating a new KC model.

By importing the resulting KC model that you created back into DataShop, you can use DataShop tools to assess your new model. Most reports in DataShop support analysis by knowledge component model, while some currently support comparing values from two KC models simultaneously—see the predicted values on the error rate Learning Curve, for example. We plan to create new features in DataShop that support more direct knowledge component model comparison.

We recommend these two DataShop tutorial videos that describe how to find instances where an alternative knowledge component model might be useful, and how to create such a model easily using Excel and DataShop.

Auto-generated KC models

DataShop creates two knowledge component models in addition to the model that was logged or imported when the dataset was created:

  • single-KC model: the same knowledge component is applied to every transaction in the dataset, producing a very general model
  • unique-step model: a unique knowledge component is applied to each unique step in the dataset, producing a very precise (likely too much so) model.

Note: For the unique-step model, DataShop will not create a KC for a unique step if the number of observations for that step is under a certain threshold. This threshold is currently 10% of the total number of students represented in the dataset, or 10 or more students. So in a dataset with 100 students, a step with fewer than 10 observations will not have a KC created for it in the unique-step model.
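The two auto-generated models can be sketched as follows. This is a simplified illustration, not DataShop's code: it works at the step level, the KC label "Single-KC" is a placeholder, and the threshold follows one reading of the note above (a step needs observations equal to at least 10% of the number of students).

```python
from typing import Dict, Optional


def single_kc_model(step_observations: Dict[str, int]) -> Dict[str, str]:
    """Map every step to the same KC (placeholder label 'Single-KC')."""
    return {step: "Single-KC" for step in step_observations}


def unique_step_model(
    step_observations: Dict[str, int],
    n_students: int,
    min_percent: int = 10,  # assumed reading of the threshold in the note
) -> Dict[str, Optional[str]]:
    """Give each step its own KC, or None when it has too few observations."""
    return {
        # Integer arithmetic avoids floating-point surprises at the boundary.
        step: step if n_obs * 100 >= min_percent * n_students else None
        for step, n_obs in step_observations.items()
    }


# Hypothetical observation counts per step in a dataset with 100 students.
observations = {"step-a": 42, "step-b": 9, "step-c": 10}

print(single_kc_model(observations))
print(unique_step_model(observations, n_students=100))
# {'step-a': 'step-a', 'step-b': None, 'step-c': 'step-c'}
```

Here "step-b" has fewer than 10 observations, so it gets no KC in the unique-step model, matching the 100-student example in the note.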

KC model mapping types

A mapping type describes the level of granularity of the connection between knowledge components and log data. Two mapping types currently exist:

  • correct-transaction-to-KC
  • step-to-KC

Tutors that log data with KC information produce a mapping at the transaction level where each transaction can have one or more associated knowledge components. This is the lowest level possible in DataShop's schema. For these data, DataShop creates a KC model based on correct transactions alone (the correct-transaction-to-KC mapping type). (See Knowledge component attribution for more information.)

Auto-generated KC models created by DataShop map knowledge components to steps (the step-to-KC mapping type). This is at a level coarser than that of a transaction-to-KC mapping.

KC models you create are also at the level of step-to-KC (the step-to-KC mapping type).

The primary difference between the two mapping types is that for a correct-transaction-to-KC mapping, a step can have different KCs associated with it depending on the tutoring situation, while for a step-to-KC mapping, all steps will have the same KCs for all students. Whether or not there is a practical difference between the two types depends on the tutoring system and the data it logged.

Comparing KC models

On the KC Models page, each model is described by:

  • a number of KCs
  • a number of observations labeled with KCs
  • five statistical measures of goodness of fit for the model: AIC, BIC, and three Cross Validation RMSE values. These model fit values are described in more detail on the Model Values help page.

Models are grouped by the number of observations, sorted in ascending order. The secondary sort defaults to AIC (lowest to highest; that is, from the best fit with the fewest parameters to the worst fit or the most parameters) and then model name. You can change the secondary sort order by using the drop-down list at the top of the page. The sort order chosen also affects the order of models in the KC Models drop-down list seen in the navigation area of various reports.
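The default ordering described above amounts to a three-part sort key. The sketch below uses invented model names and values, and the `KCModel` type is hypothetical.

```python
from typing import List, NamedTuple


class KCModel(NamedTuple):
    name: str
    observations: int  # number of observations labeled with KCs
    aic: float


# Hypothetical models and values, for illustration only.
models = [
    KCModel("Unique-step", 5000, 4100.0),
    KCModel("Original", 5000, 4000.0),
    KCModel("Single-KC", 5000, 4500.0),
    KCModel("Partial", 3200, 2900.0),
]


def sort_models(models: List[KCModel]) -> List[KCModel]:
    """Sort by observations (ascending), then AIC (ascending), then name."""
    return sorted(models, key=lambda m: (m.observations, m.aic, m.name))


print([m.name for m in sort_models(models)])
# ['Partial', 'Original', 'Unique-step', 'Single-KC']
```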

Note: The program that generates the statistical measures of goodness of fit (called AFM) will not run on large datasets. "Large", in this case, is a function of the number of transactions, students, and KCs—a dataset with more than 300,000 transactions, 250 students, and 300 KCs will prevent AFM from running successfully. The current workaround for this limitation is to create a smaller dataset with a subset of the data.

One general goal of KC modeling is to determine the "best" model for representing knowledge by fitting the model to the data. The "best" model would not only account for most of the data (it would have the highest number of observations labeled with KCs) and fit the data well, but it would do so with the fewest parameters (KCs). The BIC value that DataShop calculates tells you how well the model fits the data (lower values are better), and it penalizes the models for overfitting (having additional parameters). This penalty for having additional parameters is stronger than AIC's penalty, so it is used in DataShop for sorting models.
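To see why BIC's penalty is the stronger one, it helps to compare the standard textbook definitions of the two measures. The sketch below uses those definitions; whether DataShop's AFM program computes them in exactly this form is an assumption, and the log-likelihood and counts are invented.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Textbook AIC: each parameter adds a penalty of 2."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_observations: int) -> float:
    """Textbook BIC: each parameter adds a penalty of ln(n_observations)."""
    return n_params * math.log(n_observations) - 2 * log_likelihood


# Hypothetical values: two models with the same fit but different sizes.
log_lik, n_obs = -2000.0, 5000
for n_params in (10, 50):
    print(n_params, aic(log_lik, n_params), round(bic(log_lik, n_params, n_obs), 1))
```

Under these definitions the per-parameter penalty is 2 for AIC and ln(n) for BIC, so BIC penalizes extra parameters more heavily whenever there are 8 or more observations (ln 8 is about 2.08).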

Additional statistical information about a KC model can be found on the Model Values page (Learning Curve > Model Values), which is documented here.

Creating a new KC model

Step 1: Export an existing model or blank template

  • To get started, click Export at the top of the KC Models page.
  • Select one or more existing KC models to use as a template for the new one, or choose "(new only)" to download a blank template.
  • Click the Export button to download your file.

Step 2: Edit the KC model file in Excel or other text-file/spreadsheet editor

  • Define the KC model by filling in the cells in the column KC (model_name), replacing "model_name" with a name for your new model.
  • Assign multiple KCs to a step by adding additional KC (model_name) columns, placing one KC in each column. Replace "model_name" with the same model name you used for your new model; you will have multiple columns with the same header.
  • Add additional KC models by creating a new KC (model_name) column for each KC model, replacing "model_name" with the name of your new model.
  • Delete any KC model columns that duplicate existing KC models already in the dataset (unless you want to overwrite these).
  • Do not change the values or headers of any other columns.
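If you prefer to script the edit rather than use a spreadsheet, the sketch below shows one possible approach. It assumes the export is a tab-delimited text file; apart from "Step ID" and the "KC (model_name)" pattern, the column names in the sample data are hypothetical, as are the `add_kc_model` helper and the KC naming rule. Existing columns are passed through unchanged.

```python
import csv
import io
from typing import Callable

# Hypothetical excerpt of an exported file; a real export has more columns.
exported = (
    "Step ID\tProblem Name\tStep Name\tKC (Original)\n"
    "id-1\tP1\tenter-area\tarea\n"
    "id-2\tP1\tenter-width\t\n"
)


def add_kc_model(text: str, model_name: str, assign_kc: Callable[[str], str]) -> str:
    """Append a 'KC (model_name)' column without touching existing columns."""
    # csv.reader (not DictReader) is used so duplicate headers would survive.
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    step_name_index = header.index("Step Name")  # assumed column name

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header + [f"KC ({model_name})"])
    for row in reader:
        writer.writerow(row + [assign_kc(row[step_name_index])])
    return out.getvalue()


# Made-up rule: derive the KC name from the last word of the step name.
result = add_kc_model(exported, "my_model", lambda step: "geometry-" + step.split("-")[-1])
print(result)
```

To assign a second KC to the same step, the same function could be applied again with the same model name, which yields two columns with an identical header.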

Step 3: Import a KC model file

  • Start the import process by clicking Import at the top of the KC Models page.
  • Click Choose File to browse for the KC model file you edited.
  • Click Verify to start file verification. If errors are found in your file, fix them and re-verify the file. When DataShop successfully verifies the file, you can then import it by clicking the Import button.

Columns of a KC model export

The KC model export is most similar to a student-step export except that it aggregates data across students for each step. Some columns in the KC model export are described in Export. Those not covered are described in the table below.

| Column | Description |
| --- | --- |
| Step ID | A unique step identifier used for importing a KC model into DataShop. This column must remain intact for a KC Model import to work. |
| Max problem view | The maximum number of times the problem was viewed for the step. Note that problem view increases regardless of whether or not the step was encountered in previous problem views. For example, a step can have a "Max problem view" of "3", indicating the problem was viewed three times by a single student (the most of any student), but that same step need not have been encountered by that student in all instances of the problem. |
| Avg Incorrects | The average number of incorrect attempts for this step. |
| Avg Hints | The average number of hint requests for this step. |
| Avg Corrects | The average number of correct attempts for this step. |
| % First Attempt Incorrects | The percentage of first attempts that were incorrect attempts. |
| % First Attempt Hints | The percentage of first attempts that were hint requests. |
| % First Attempt Corrects | The percentage of first attempts that were correct attempts. |
| Avg Step Duration | Average step duration. |
| Avg Correct Step Duration | Average correct step duration. |
| Avg Error Step Time | Average error step duration. |
| Total Students | The count of distinct students who worked on this step. |
| Total Opportunities | The total number of times students encountered this step. Multiple encounters by a single student are counted as distinct opportunities. For example, if Student A encountered Step X two times (possibly from separate instances of the same problem) and Student B encountered the same step once, the "Total Opportunities" for Step X would be "3". |
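The difference between "Total Students" and "Total Opportunities" can be illustrated with the Step X example. The sketch below is only an illustration of the two definitions, using a hypothetical list of (student, step) encounters and a hypothetical `step_totals` helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One entry per encounter: Student A met Step X twice, Student B once.
encounters: List[Tuple[str, str]] = [
    ("Student A", "Step X"),
    ("Student A", "Step X"),
    ("Student B", "Step X"),
]


def step_totals(encounters: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count distinct students and total encounters for each step."""
    students = defaultdict(set)
    opportunities = defaultdict(int)
    for student, step in encounters:
        students[step].add(student)  # repeat encounters do not add students
        opportunities[step] += 1     # every encounter is an opportunity
    return {
        step: {
            "Total Students": len(students[step]),
            "Total Opportunities": opportunities[step],
        }
        for step in opportunities
    }


print(step_totals(encounters))
# {'Step X': {'Total Students': 2, 'Total Opportunities': 3}}
```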