Sample Selector

Sample Selector is a tool for creating and editing samples: groups of data that you compare across. They are not "samples" in the statistical sense; they work more like filters.

By default, a single sample exists: "All Data". With the Sample Selector, you can create new samples to organize your data.

A sample is composed of one or more filters: specific conditions that narrow down the data included in the sample.

Samples

A sample is a subset of a dataset and is composed of one or more filters, specific conditions that narrow down the data included in the sample. A new dataset can be created from an existing sample via the Samples page in the Dataset Info report. Creating a dataset from an existing sample will place the new dataset into the same project as the source dataset, so it inherits the same permissions, IRB attributes, Principal Investigator, and Data Provider.

You can use samples to:

  • Compare across conditions
  • Narrow the scope of data analysis to, for example, a specific time range, set of students, problem category, or unit of a curriculum

Creating a new dataset from an existing sample

A new dataset can be created from an existing sample if the user has permission to do so. To request permission, contact DataShop directly or use the web request form. To create a dataset, go to the Samples subtab of the Dataset Info page and click the Save as Dataset icon next to any sample. The new dataset is placed into the same project as the source dataset, so it inherits the same permissions, IRB attributes, Principal Investigator, and Data Provider as the parent project.

The general process for creating a new dataset from an existing sample is to:

  • Choose a unique name for the new dataset
  • Decide whether or not to include user-created KC models in your new dataset. If you choose to include them, they will be copied to the new dataset. If you choose to exclude them, your new dataset will still contain the 'default' KC model, if one was included in the original data.
  • Save the Dataset
  • Your new dataset will be added to the Import Queue. The system will send an email once the new dataset has been loaded.

Samples used to create new datasets are limited to 250,000 transactions. Please contact DataShop help (datashop-help@lists.andrew.cmu.edu) if you feel that you require a larger limit for this feature.

Creating a new sample

The general process for creating a sample is to:

  • Click the edit sample icon next to the All Data sample.
  • Choose a unique sample name.
  • Add or modify an existing filter to select the subset of data you're interested in, saving the filter when done.
  • View the sample preview table to see the effect of adding your filter, making sure you don't have an empty set (i.e., a filter or combination of filters that excludes all transactions).
  • Decide whether to share the sample with others who can view the dataset
  • Save as New

Modifying an existing sample

The general process for modifying a sample is to:

  • Click the edit sample icon next to the desired sample.
  • Choose a unique sample name.
  • Add or modify an existing filter to select the subset of data you're interested in, saving the filter when done.
  • View the sample preview table to see the effect of adding your filter, making sure you don't have an empty set (i.e., a filter or combination of filters that excludes all transactions).
  • Decide whether to share the sample with others who can view the dataset
  • Save the sample

Deleting a sample

Once a sample has been deleted, it cannot be recovered.

The effect of multiple filters on samples

DataShop interprets each filter after the first as an additional restriction on the data that is included in the sample. This is also known as a logical "AND". You can see the results of multiple filters in the sample preview as soon as all filters are "saved".
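
The logical "AND" behavior can be illustrated with a short sketch. This is not DataShop's implementation; the transaction fields (student, condition, problem) and the apply_filters helper are hypothetical names chosen only for illustration. It shows how each additional filter can only shrink the sample, and how a contradictory combination yields an empty set.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a transaction is a plain dict, and a filter is
# any function that returns True when a transaction should be kept.
Transaction = Dict[str, str]
Filter = Callable[[Transaction], bool]


def apply_filters(transactions: List[Transaction],
                  filters: List[Filter]) -> List[Transaction]:
    """Keep only the transactions that satisfy every filter (logical AND)."""
    return [t for t in transactions if all(f(t) for f in filters)]


# Made-up example data, not taken from any real dataset.
transactions = [
    {"student": "s1", "condition": "control", "problem": "p1"},
    {"student": "s2", "condition": "treatment", "problem": "p1"},
    {"student": "s3", "condition": "treatment", "problem": "p2"},
]


def is_treatment(t: Transaction) -> bool:
    return t["condition"] == "treatment"


def is_control(t: Transaction) -> bool:
    return t["condition"] == "control"


def is_problem_p1(t: Transaction) -> bool:
    return t["problem"] == "p1"


# One filter: every treatment transaction is kept.
one_filter = apply_filters(transactions, [is_treatment])

# Two filters: the second further restricts the result of the first.
two_filters = apply_filters(transactions, [is_treatment, is_problem_p1])

# Contradictory filters exclude all transactions (an empty set).
empty_sample = apply_filters(transactions, [is_treatment, is_control])

print(len(one_filter), len(two_filters), len(empty_sample))
```

In this sketch the order of the filters does not change the result, since a transaction must pass all of them to be included.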