The Dataset Info report provides both an overview and context for
the currently selected dataset. It can answer questions such as: How and
when were these data collected? What is the scope of the dataset? If the
dataset's owner chose to provide other information about the dataset,
that information is displayed here as well.
The main components of the Dataset Info report are:
The dataset overview table is presented at the top of the Dataset
Info report. This report loads by default when you select a dataset to
browse. If you don't see the overview, click the Dataset Info main tab,
then click Overview.
In the Overview table, a number of fields describe the dataset's
characteristics. If you are a project admin for this dataset's project, you can edit
some of the fields in the Overview table; click a field to edit it.
The overview fields are:
Project
A collection of datasets with a principal investigator and a data provider (who
is often the same person as the principal investigator). We consider a project to be a
title for your research (e.g., Perceptual Fluency in Geometry Achievement).
It might be similar to the title of a grant proposal, or some other phrase that
identifies your work. To change the project name, contact
Principal Investigator
Defined at the project level, this is the person who, along with the data provider,
determines who has access to the project. To change the principal investigator,
Data Provider
Defined at the project level, the data provider is a person responsible for providing
a dataset to DataShop. He or she, with the agreement of CMU legal, may specify whether a
same person for both the data provider and principal investigator fields; in this case,
data provider is not shown. The data provider, along with the principal investigator,
determines who has access to the project. To change the data provider,
Curriculum
Used to describe the curriculum in which these data were collected (e.g., Algebra
Dates
The date range(s) during which these data were collected. This can be determined
automatically from the log data by clicking the auto-set button.
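How the auto-set button derives dates is not specified here. A plausible reading is that it takes the earliest and latest transaction times in the log. The minimal sketch below assumes exactly that; the function name `infer_date_range` and the sample timestamps are hypothetical, and it returns a single span rather than multiple ranges.

```python
from datetime import datetime

# Hypothetical helper: assumes auto-set uses the earliest and latest
# transaction timestamps in the log. DataShop's real logic may differ.
def infer_date_range(timestamps):
    """Return (earliest, latest) timestamps, or None if the log is empty."""
    if not timestamps:
        return None
    return min(timestamps), max(timestamps)

# Hypothetical transaction times, deliberately out of order.
log = [
    datetime(2005, 9, 12, 10, 30),
    datetime(2005, 9, 10, 8, 15),
    datetime(2005, 12, 1, 14, 0),
]
print(infer_date_range(log))
```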
Domain/LearnLab
The Domain/LearnLab group to which this dataset belongs (e.g., Language/Chinese or
Tutor
The title of the tutor software used to collect the data (e.g., Algebra 1 2005
or CTAT 2.7).
Description
A description of the dataset, which can include links to outside resources. Enter as
much contextual information here as possible so that other researchers can make sense
of the dataset. This is especially important if the dataset is part of a public project.
Has Study Data
Whether the dataset contains data that are the result of a research study.
Hypothesis
The hypothesis that was tested. Only displayed if "Has Study Data" is "yes".
Status
The status of the dataset (one of on-going, complete, files-only,
or other).
School
The school(s) where these data were collected.
Acknowledgment for Secondary Analysis
Acknowledgment that a researcher should include in a publication if they use this
dataset for their research. The acknowledgment, if entered, is shown on the Citation page
and in a text file included with each export.
Preferred Citation for Secondary Analysis
Citation that a researcher should include in a publication if they use this
dataset for their research. The citation, if entered, is shown on the Citation page
and in a text file included with each export. The citation must refer to a paper
attached to the dataset.
Additional Notes
Any additional information about the dataset.
The statistics table, described below, is generated from the data
and is therefore not editable.
Number of Students
The total number of students for whom there is data.
Number of Unique Steps
The number of unique steps in the dataset, where uniqueness is defined as a step within
a specific problem hierarchy (the curriculum location where the problem appears). The
same step attempted by two students counts as only one unique step.
Total Number of Steps
The number of steps in the dataset, where each student-step counts as one step.
The same step attempted by two students counts as two steps in the total number of steps.
For example, if problem A has steps S1, S2, and S3, student A does S1 and S2, student B
does S2 and S3, and the dataset contains only that problem, then there are
3 unique steps and 4 total steps.
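The worked example above can be checked with a short sketch of the three statistics. It assumes each record is a hypothetical `(student, problem_hierarchy, problem, step)` tuple; DataShop's real export columns differ. A unique step ignores who attempted it, while the total counts each distinct student-step pair once, so repeated attempts by one student on one step are assumed to count once.

```python
# Hypothetical student-step records: (student, problem_hierarchy, problem, step).
records = [
    ("student A", "Unit 1", "problem A", "S1"),
    ("student A", "Unit 1", "problem A", "S2"),
    ("student B", "Unit 1", "problem A", "S2"),
    ("student B", "Unit 1", "problem A", "S3"),
]

def dataset_statistics(student_steps):
    """Compute the statistics-table counts from student-step records."""
    # Distinct students who appear anywhere in the data.
    students = {student for student, *_ in student_steps}
    # A unique step is identified by its curriculum location, not by who did it.
    unique_steps = {(hierarchy, problem, step)
                    for _, hierarchy, problem, step in student_steps}
    # Each distinct (student, step) pair counts once toward the total.
    total_steps = set(student_steps)
    return {
        "Number of Students": len(students),
        "Number of Unique Steps": len(unique_steps),
        "Total Number of Steps": len(total_steps),
    }

print(dataset_statistics(records))
```

For the example above this reports 2 students, 3 unique steps, and 4 total steps.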
Sample Selector is a tool for creating and editing
samples, or groups of data you compare across. They're
not "samples" in the statistical sense, but more like filters.
By default, a single sample exists: "All Data". With the Sample
Selector, you can create new samples to organize your data.
You can use samples to:
Compare across conditions
Narrow the scope of data analysis to, for example, a specific time range,
set of students, problem category, or unit of a curriculum
A sample is composed of one or more filters, specific
conditions that narrow down your sample.
Creating a sample
The general process for creating a sample is to:
Add a filter from the categories at the left to the composition
area at the right
Modify the filter to select the subset of data you're interested
in, saving it when done
View the sample preview table to see the effect of adding your filter,
making sure you don't have an empty set (i.e., a filter or combination
of filters that excludes all transactions).
Name and describe the sample
Decide whether to share the sample with others who can view the
Save the sample
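To make the steps above concrete, here is a rough model of a sample as a named, optionally shared list of filters with a preview. Everything in it is hypothetical: `Sample`, `save_sample`, and the transaction fields are illustrative names, not DataShop's API. The source only advises avoiding empty sets; whether DataShop actually blocks saving an empty sample is an assumption, and this sketch simply raises an error.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical transaction: column name -> value.
Transaction = Dict[str, str]
Filter = Callable[[Transaction], bool]

@dataclass
class Sample:
    """Hypothetical model of a sample: a named, optionally shared set of filters."""
    name: str
    description: str = ""
    shared: bool = False
    filters: List[Filter] = field(default_factory=list)

    def preview(self, transactions: List[Transaction]) -> List[Transaction]:
        # Keep a transaction only if it passes every filter.
        return [t for t in transactions if all(f(t) for f in self.filters)]

def save_sample(sample: Sample, transactions: List[Transaction]) -> Sample:
    """Reject a sample whose filters exclude every transaction (assumed behavior)."""
    if not sample.preview(transactions):
        raise ValueError(f"Sample {sample.name!r} matches no transactions")
    return sample

data = [{"student": "s1", "unit": "Unit 1"}, {"student": "s2", "unit": "Unit 2"}]
s = Sample(name="Unit 1 only", description="Transactions from Unit 1", shared=True)
s.filters.append(lambda t: t["unit"] == "Unit 1")  # add and "save" a filter
print(len(s.preview(data)))  # check the preview before saving
save_sample(s, data)
```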
The effect of multiple filters
DataShop interprets each filter after the first as an additional
restriction on the data included in the sample; in other words, filters
are combined with a logical "AND". You can see the results of multiple filters
in the sample preview as soon as all filters are saved.
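The AND behavior means each added filter can only keep or shrink the set of included transactions, never grow it. The sketch below illustrates this with hypothetical transactions and filters; the column names `condition` and `unit` are assumptions, not DataShop's filter categories.

```python
# Hypothetical transactions; real DataShop columns and filter types differ.
transactions = [
    {"student": "s1", "condition": "control", "unit": "Unit 1"},
    {"student": "s2", "condition": "treatment", "unit": "Unit 1"},
    {"student": "s3", "condition": "treatment", "unit": "Unit 2"},
]

# Two hypothetical filters, applied in order.
filters = [
    lambda t: t["condition"] == "treatment",
    lambda t: t["unit"] == "Unit 1",
]

def apply_filters(rows, predicates):
    """Keep only rows that satisfy every predicate (logical AND)."""
    return [r for r in rows if all(p(r) for p in predicates)]

# With no filters, all rows pass; each extra filter narrows the result.
for n in range(len(filters) + 1):
    print(n, "filters:", len(apply_filters(transactions, filters[:n])), "transactions")
```

Here the counts go from 3 to 2 to 1 as filters are added.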