Sample Selector is a tool for creating and editing samples, which are groups of data that you compare across. They are not "samples" in the statistical sense; they are more like filters.

By default, a single sample exists: "All Data". With the Sample Selector, you can create new samples to organize your data.

You can use samples to:

A sample is composed of one or more filters: specific conditions that narrow down the data included in the sample.

Creating a sample

The general process for creating a sample is to:

The effect of multiple filters

DataShop interprets each filter after the first as an additional restriction on the data that is included in the sample. This is also known as a logical "AND". You can see the results of multiple filters in the sample preview as soon as all filters are "saved".
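As a rough illustration of the logical "AND" described above, the sketch below keeps a row only if it passes every filter. It is not DataShop's implementation; the function name apply_sample, the row layout, and the column names are all hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Filter = Callable[[Row], bool]


def apply_sample(rows: List[Row], filters: List[Filter]) -> List[Row]:
    """Keep only the rows that satisfy every filter (logical AND)."""
    return [row for row in rows if all(f(row) for f in filters)]


# Hypothetical rows and column names, for illustration only.
rows = [
    {"student": "s1", "condition": "control", "attempt": 1},
    {"student": "s2", "condition": "treatment", "attempt": 1},
    {"student": "s3", "condition": "treatment", "attempt": 3},
]


def is_treatment(row: Row) -> bool:
    return row["condition"] == "treatment"


def is_first_attempt(row: Row) -> bool:
    return row["attempt"] == 1


# Each additional filter can only shrink (or preserve) the sample.
one_filter = apply_sample(rows, [is_treatment])
two_filters = apply_sample(rows, [is_treatment, is_first_attempt])
print(len(one_filter), len(two_filters))
```

With these made-up rows, the first filter alone keeps two rows, and adding the second filter narrows the sample to one row.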


Dataset Info

The Dataset Info report provides both an overview and context for the currently selected dataset. It may answer questions such as: How and when were these data collected? What is the scope of the dataset? If the owner of the dataset chose to provide other information about the dataset, it is displayed here as well.

The main components of the Dataset Info report are:

Dataset Overview and Statistics

The dataset overview table is presented at the top of the Dataset Info report. This report loads by default when you select a dataset to browse. If you don't see the overview, click the main tab titled Dataset Info, then click the Overview link below it.

In the Overview table, a number of fields describe the dataset's characteristics. If you are a project admin for this project, you can edit some of the fields in the Overview table: click a field to edit it.

The overview fields are:

Category | Description
Project | A collection of datasets with a principal investigator and a data provider (who is often the same as the principal investigator). We consider a project to be a title for your research (e.g., Perceptual Fluency in Geometry Achievement). It might be similar to the title of a grant proposal, or some other phrase that identifies your work. To change the project name, contact us.
Principal Investigator | Defined at the project level, this is the person who, along with the data provider, determines who has access to the project. To change the principal investigator, contact us.
Data Provider | Defined at the project level, the data provider is a person responsible for providing a dataset to DataShop. He or she, with the agreement of CMU legal, may specify whether project-specific terms of use should apply to a project. Most datasets in DataShop use the same person for both the data provider and principal investigator fields; in this case, data provider is not shown. The data provider, along with the principal investigator, determines who has access to the project. To change the data provider, contact us.
Curriculum | Used to describe the curriculum in which these data were collected (e.g., Algebra I).
Dates | The date range(s) for when these data were collected. This can be determined from the log data by pressing the auto-set button.
Domain/LearnLab | The Domain/LearnLab group to which this dataset belongs (e.g., Language/Chinese or Math/Algebra).
Tutor | The title of the tutor software used to collect data (e.g., Algebra 1 2005 or CTAT 2.7).
Description | A description of the dataset. This can include links to outside resources. It can be helpful to enter as much contextual information here as possible so that other researchers can attempt to make sense of the dataset. This is especially true if the dataset is part of a public project.
Has Study Data | Whether or not the dataset contains data that are the result of a research study or experiment.
Hypothesis | The hypothesis that was tested. Only displayed if "Has Study Data" is "yes".
Status | The status of the dataset (one of on-going, complete, files-only, or other).
School(s) | The school(s) where these data were collected.
Acknowledgment for Secondary Analysis | Acknowledgment that a researcher should include in a publication if they use this dataset for their research. The acknowledgment, if entered, is shown on the Citation page and in a text file included with each export.
Preferred Citation for Secondary Analysis | Citation that a researcher should include in a publication if they use this dataset for their research. The citation, if entered, is shown on the Citation page and in a text file included with each export. A citation must be for a paper attached to the dataset.
Additional Notes | Any additional information about the dataset.

The statistics table, described below, is generated from the data and is therefore not editable.

Category | Description
Number of Students | The total number of students for whom there are data.
Number of Unique Steps | The number of unique steps in the dataset, where uniqueness is defined as a step within a specific problem hierarchy (the curriculum location where the problem appears). The same step attempted by two students equals only one unique step.
Total Number of Steps | The number of steps in the dataset, where each student-step counts as one step. The same step attempted by two students equals two steps in the total number of steps. For example, if problem A has steps S1, S2, and S3, and student A does S1 and S2 while student B does S2 and S3, and there is just that problem in the dataset, then there are 3 unique steps and 4 total steps.
Total Number of Transactions | The total number of transactions in the dataset.
Total Student Hours | The number of hours of student activity in the dataset, represented by the sum of the duration of all student transactions in the dataset.
Knowledge Component Model(s) | The knowledge component models for this dataset (e.g., Default, Manual-Model). The number of unique knowledge components in the model is displayed following each model listed.
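
The difference between unique steps and total steps can be checked with a small sketch that reproduces the worked example in the statistics table (problem A, students A and B). It assumes each record is one student-step and that a step is identified by its problem hierarchy, problem, and step name. The record layout, the hierarchy label "Unit 1", and the function name count_steps are hypothetical; this is not how DataShop itself computes the statistics.

```python
# Hypothetical student-step records: (student, problem hierarchy, problem, step).
student_steps = [
    ("Student A", "Unit 1", "Problem A", "S1"),
    ("Student A", "Unit 1", "Problem A", "S2"),
    ("Student B", "Unit 1", "Problem A", "S2"),
    ("Student B", "Unit 1", "Problem A", "S3"),
]


def count_steps(records):
    """Return (unique steps, total steps) for a list of student-step records."""
    # Every student-step counts once toward the total.
    total = len(records)
    # A step is unique per (hierarchy, problem, step), regardless of student.
    unique = len({(hierarchy, problem, step)
                  for _student, hierarchy, problem, step in records})
    return unique, total


unique_steps, total_steps = count_steps(student_steps)
print(unique_steps, total_steps)  # 3 4
```

Step S2 is attempted by both students, so it contributes two student-steps to the total but only one unique step.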
