DataShop @CMU, a data analysis service for the learning science community
This page provides both an overview and context for the current dataset. It may answer questions such as:
If you are a project admin for this project, you can edit some of the fields in the Overview table—click a field to edit it. You can help other researchers by describing the dataset and the context in which it was created.
Have you or someone you know published about these data? Attach a paper to this dataset on the Files tab.
You can gauge the size of the dataset by looking at the numbers in the Statistics table, particularly the Total Number of Students, Transactions, and Student Hours.
The Knowledge Component Models, or step-to-knowledge-component mappings, are listed at the bottom of the table. If you see a few Knowledge Component Models listed, researchers have likely thought about different ways of attributing skills to steps, and potentially new ways of categorizing knowledge in this domain. You can learn more about these models and create new ones by clicking the KC Models subtab.
Read more about Dataset Info / Overview
A sample is a subset of a dataset and is composed of one or more filters, specific conditions that narrow down your sample. This page lists samples shared by others, as well as those owned by you.
You can use samples to:
A new dataset can be created from an existing sample by clicking the Save as Dataset icon next to a sample. The new dataset is placed in the same project as the source dataset, so it inherits the same permissions, IRB attributes, Principal Investigator, and Data Provider as the parent project.
The general process for creating a new dataset from an existing sample is to:
The general process for creating a sample is to:
The general process for modifying a sample is to:
Once a sample has been deleted, it cannot be recovered.
DataShop interprets each filter after the first as an additional restriction on the data that is included in the sample. This is also known as a logical "AND". You can see the results of multiple filters in the sample preview as soon as all filters are "saved".
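The logical "AND" behavior described above can be sketched in a few lines of Python. This is only an illustration, not DataShop code: the row layout, the column names, and the `apply_filters` helper are all hypothetical.

```python
# Hypothetical transaction rows; the column names are illustrative only.
rows = [
    {"student": "s1", "problem": "P1", "attempt": 1},
    {"student": "s1", "problem": "P1", "attempt": 2},
    {"student": "s2", "problem": "P2", "attempt": 1},
    {"student": "s2", "problem": "P1", "attempt": 1},
]

# Each filter is a predicate on one row (both filters are made up).
filters = [
    lambda row: row["problem"] == "P1",
    lambda row: row["attempt"] == 1,
]


def apply_filters(rows, filters):
    """Keep only the rows that satisfy every filter (logical AND)."""
    return [row for row in rows if all(f(row) for f in filters)]


sample = apply_filters(rows, filters)
for row in sample:
    print(row)
```

Adding a filter can only keep the sample the same size or shrink it, which is why each additional filter acts as a further restriction.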
A KC (Knowledge Component) model is a mapping between steps and knowledge components. In DataShop, each unique step can map to zero or more knowledge components.
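As a rough mental model, a KC model can be pictured as a lookup table from steps to lists of knowledge components. The sketch below assumes a plain Python dictionary; the step identifiers, KC names, and helper functions are invented for illustration and do not reflect DataShop's internal storage.

```python
# Hypothetical KC model: each unique step maps to zero or more KCs.
kc_model = {
    ("P1", "step-a"): ["find-area"],
    ("P1", "step-b"): ["find-area", "multiply"],
    ("P2", "step-a"): [],  # a step with no KC assigned
}


def kcs_for_step(model, step):
    """Return the KCs for a step, or an empty list if it is unmapped."""
    return model.get(step, [])


def labeled_steps(model):
    """Return the steps that have at least one KC."""
    return [step for step, kcs in model.items() if kcs]


print(kcs_for_step(kc_model, ("P1", "step-b")))
print(len(labeled_steps(kc_model)), "of", len(kc_model), "steps are labeled")
```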
From the KC Models page, you can compare existing KC models, export an existing model or template for creating a new KC model, or import a new model that you've created.
On the KC Models page, each model is described by:
Models are grouped by the number of observations, sorted in ascending order. The secondary sort defaults to AIC (lowest to highest, that is, from the best fit with the fewest parameters to the worst fit or the most parameters) and then model name.
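The ordering described in this paragraph amounts to a three-level sort key. The following sketch assumes each model is a small dictionary; the field names and the numbers are made up for illustration.

```python
# Hypothetical model summaries; field names and values are illustrative.
models = [
    {"name": "Model-B", "observations": 500, "aic": 410.0},
    {"name": "Model-A", "observations": 500, "aic": 410.0},
    {"name": "Model-C", "observations": 300, "aic": 450.0},
    {"name": "Model-D", "observations": 500, "aic": 395.5},
]

# Primary key: number of observations (ascending).
# Secondary key: AIC (lowest first). Tertiary key: model name.
ordered = sorted(models, key=lambda m: (m["observations"], m["aic"], m["name"]))

names = [m["name"] for m in ordered]
print(names)
```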
One general goal of KC modeling is to determine the "best" model for representing knowledge by fitting the model to the data. The "best" model would not only account for most of the data (it would have the highest number of observations labeled with KCs) and fit the data well, but it would do so with the fewest parameters (KCs). The BIC value that DataShop calculates tells you how well the model fits the data (lower values are better), and it penalizes the models for overfitting (having additional parameters). This penalty for having additional parameters is stronger than AIC's penalty, so it is used in DataShop for sorting models.
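To see why BIC's penalty is the stronger one, it helps to compare the two criteria in their standard textbook forms: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of parameters, n the number of observations, and L the maximized likelihood. The sketch below uses those standard definitions; whether DataShop computes the values in exactly this form is an assumption, and the numbers are invented.

```python
import math


def aic(log_likelihood, num_params):
    """Standard AIC: each parameter costs a flat 2."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_observations):
    """Standard BIC: each parameter costs ln(n), which exceeds 2 once n >= 8."""
    return num_params * math.log(num_observations) - 2 * log_likelihood


# Made-up example: a larger model fits slightly better than a smaller one.
n = 5000
small = {"log_likelihood": -1000.0, "num_params": 10}
large = {"log_likelihood": -980.0, "num_params": 25}

for label, m in (("small", small), ("large", large)):
    print(
        label,
        "AIC:", round(aic(m["log_likelihood"], m["num_params"]), 1),
        "BIC:", round(bic(m["log_likelihood"], m["num_params"], n), 1),
    )
```

With these made-up numbers, AIC is lower for the larger model while BIC is lower for the smaller one, which shows how the heavier per-parameter penalty can change which model ranks first.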
A primary reason for creating a new KC model is that an existing model is insufficient in some way—it may model some knowledge components too coarsely, producing learning curves that spike or dip, or it may be too fine-grained (too many knowledge components), producing curves that end after one or two opportunities. Or perhaps the model fails to model the domain sufficiently or with the right terminology. In any case, you may find value in creating a new KC model.
By importing the resulting KC model that you created back into DataShop, you can use DataShop tools to assess your new model. Most reports in DataShop support analysis by knowledge component model, while some currently support comparing values from two KC models simultaneously—see the predicted values on the error rate Learning Curve, for example. We plan to create new features in DataShop that support more direct knowledge component model comparison.
DataShop creates two knowledge component models in addition to the model that was logged or imported when the dataset was created:
A custom field is a new column you define for annotating transaction data. DataShop currently supports adding and modifying custom fields at the transaction level.
You can add or modify a custom field's metadata from this page, but to set the data in that custom field, you need to use web services, a way to interact with DataShop through a program you write. You can also add custom fields when logging or importing new data.
A custom field has an owner, the user who created it. Users who have edit or admin permission for a project can create custom fields for a dataset in it. Only the owner or a DataShop administrator can delete or modify the custom field. Only DataShop administrators can delete custom fields that were logged with the data.
The following fields describe a custom field:
A custom field value is classified as one or more of the following data types assigned internally by DataShop:
The Custom Fields page indicates the types of custom fields, what percentage of those custom fields fall into the aforementioned categories, and what percentage of transactions are associated with each custom field.
Caveat: Very large custom fields may cause unexpected behavior in some applications. Excel correctly handles exports with very large custom field values if you import the text from within Excel. Some text editors may incorrectly wrap text values that become too large, while programs like vim, jEdit, and Notepad++ correctly handle the maximum lengths. Additionally, when you view custom fields in the web interface, the values are truncated to 255 characters to prevent issues with browsers. To get the full custom field value, use the transaction export feature.
Read more about custom fields
The Problem List page lists all problems in the dataset, grouped by problem hierarchy, which is a unique hierarchy of curriculum levels containing the problem (e.g., a problem might be contained in a Unit A, Section B hierarchy).
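Grouping by problem hierarchy can be pictured as bucketing problems under a shared hierarchy label. This is a minimal sketch with invented hierarchy strings and problem names, not a description of how DataShop stores problems.

```python
from collections import defaultdict

# Hypothetical (hierarchy, problem name) pairs.
problems = [
    ("Unit A, Section B", "P1"),
    ("Unit A, Section C", "P2"),
    ("Unit A, Section B", "P3"),
]


def group_by_hierarchy(problems):
    """Bucket problem names under the hierarchy that contains them."""
    grouped = defaultdict(list)
    for hierarchy, name in problems:
        grouped[hierarchy].append(name)
    return dict(grouped)


grouped = group_by_hierarchy(problems)
for hierarchy, names in grouped.items():
    print(hierarchy, "->", names)
```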
This page is most useful for seeing which particular problems have problem content stored: any problem name shown as a hyperlink will link to the content that students saw when they interacted with that problem. You can also filter on problems with or without problem content, and search those lists.
Download all of the problem content associated with the dataset by clicking the Download Problem Content button. The format of the download is a single .zip file containing a hierarchy of .html and web content files (e.g., images, videos, audio). The exact hierarchy of this file differs depending on the source of the problem content.
Read more about the Problem List
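Because the layout inside the downloaded problem content archive varies by source, a quick first step is to count what the archive contains by file type. The sketch below uses only Python's standard `zipfile` module; the `summarize_content` helper and the file name in the usage comment are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_content(zip_source):
    """Count archive members by lowercase file extension, skipping directories.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = [
            info.filename for info in archive.infolist() if not info.is_dir()
        ]
    return Counter(PurePosixPath(name).suffix.lower() for name in names)


# Usage (the file name is a placeholder for wherever the download was saved):
# counts = summarize_content("problem_content.zip")
# for extension, count in counts.most_common():
#     print(extension or "(no extension)", count)
```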
The Step List table lists and decomposes all of the problems in the dataset. It details the problem hierarchy (the unit, section, or other divisions that contain the problem) and composition (the steps that make up a problem).
A unique problem-solving step is shown on each row.
Export the step list table by clicking the Export button.
Read more about the Step List
This page displays dataset-specific citation guidance. This information is taken from the Dataset Info fields "Acknowledgement for Secondary Analysis" and "Preferred Citation for Secondary Analysis", which are settable by researchers who have edit access to the dataset.
More general citation guidance is available at the link below.
Read more about citing DataShop and datasets
The Problem Content tool allows admins to map problem content to datasets.
Problem content refers to a representation (text, images, HTML, etc.) of the content that students interacted with in the system that generated the dataset's data. Note that the word "problem" is used in the sense of any activity the user did that was named in the problem column of the data.
When problem content is mapped to a dataset and its problems, users can jump from DataShop reports to the problem content by clicking one of the "View Problem" buttons throughout the interface (often in tooltips on problem or step name), allowing them to better understand the activities that correspond with the data.
With problem content, you can:
Datasets with problem content are noted on the list of datasets with a problem content icon.
Please contact us, and we will consult with you on the format DataShop expects for problem content. For a faster solution, consider attaching files documenting your system on the Files tab of your dataset.
If you are a project admin for a dataset with problem content that has already been uploaded to the DataShop server, you can use the Problem Content page to map problem content to problems within the dataset. Select the Conversion Tool and Content Version to see a list of content items that can be mapped to the dataset, then click add to perform the mapping.
To see a list of all problems in the dataset and which of them have problem content, or to download all problem content for a dataset, visit the Problem List page.
Read more about problem content