
Importing New Data

Upload process
Creating a new project
Import process

In addition to logging data directly to the DataShop logging database, you can import data to create a new dataset in DataShop. To begin the import process, upload a new dataset from the Upload a dataset page. If you've never uploaded a dataset before, DataShop will prompt you to request permission to do so.

You can upload a file in either of the following formats:

Import file format (tab-delimited)

Import file format (XML)

Before uploading, verify that each file meets the format requirements.
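A quick local pre-check can catch obvious problems in a tab-delimited file before you upload it. The Python sketch below is not DataShop's verifier, and `check_tab_delimited` and `check_file` are hypothetical names. It checks only two things: that the header row contains an "Anon Student Id" column (the only column this page names), and that every row within the first 100 lines has as many tab-separated fields as the header. The 100-line limit mirrors the scope of the quick verification described under Import process below. The complete list of required columns comes from the tab-delimited import format documentation and is not checked here.

```python
from itertools import islice
from typing import Iterable, List

# The only column name taken from this page. The complete set of required
# columns is defined by the import format specification and is NOT checked here.
REQUIRED_COLUMN = "Anon Student Id"


def check_tab_delimited(lines: Iterable[str], max_lines: int = 100) -> List[str]:
    """Return problems found in the first `max_lines` lines (header included)."""
    rows = [line.rstrip("\r\n").split("\t") for line in islice(lines, max_lines)]
    if not rows or rows[0] == [""]:
        return ["file is empty or has no header row"]

    header = [name.strip() for name in rows[0]]
    problems: List[str] = []
    if REQUIRED_COLUMN not in header:
        problems.append(f'header is missing the "{REQUIRED_COLUMN}" column')

    # Every data row should have exactly as many fields as the header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, found {len(row)}"
            )
    return problems


def check_file(path: str, max_lines: int = 100) -> List[str]:
    # Hypothetical convenience wrapper; assumes the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        return check_tab_delimited(f, max_lines)
```

For example, `check_file("my_transactions.txt")` returns an empty list when neither check finds a problem. An empty result does not mean DataShop will accept the file.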

Upload process

Specify the project that should contain this new dataset. You can choose an existing project, specify a new one, or choose one later.
Name your dataset. The name must be unique among datasets that have already been loaded. You can change the dataset name later.
(Optional) Add transaction data. See Import process below.

Transaction data

On the upload page, you will be asked to specify whether you want to upload transaction data. Transaction data is data in either of the above two formats. If you want to create a dataset that will hold file attachments (of any format), or if you want to create the dataset as a placeholder and add transaction data later, choose No transaction data now.

De-identification requirements

Data uploaded to DataShop must be de-identified. That is, the identity of human subjects referenced in the data must not be discoverable.

If your file is entirely de-identified, choose the first option, I certify that all data in this file including the content of the "Anon Student Id" column is de-identified.

If your file is de-identified except for the identifiers in the Anon Student Id column, select the second option, I certify that all data in this file except the content of the "Anon Student Id" column is de-identified. DataShop will de-identify that column for you, replacing the identifiers in that column with anonymous ones. (You can later obtain a mapping from DataShop identifiers to the original identifiers by emailing us.)
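If you would rather de-identify the "Anon Student Id" column yourself and choose the first option, the substitution can be sketched as follows. `deidentify_column` is a hypothetical helper and does not show how DataShop de-identifies data. Each distinct original identifier is replaced with a random token, the same student always receives the same token, and the original-to-anonymous mapping is returned so you can keep it privately. Random tokens are used instead of an unsalted hash because hashes of short or guessable IDs can be reversed by hashing candidate IDs.

```python
import secrets
from typing import Dict, List, Tuple


def deidentify_column(
    rows: List[List[str]], column: str = "Anon Student Id"
) -> Tuple[List[List[str]], Dict[str, str]]:
    """Replace every value in `column` with a random token, consistently per value.

    rows[0] is the header row. Returns new rows (the input is not modified) and the
    original -> anonymous mapping, which should be kept private and NOT uploaded.
    """
    header = rows[0]
    idx = header.index(column)  # raises ValueError if the column is absent
    mapping: Dict[str, str] = {}
    used = set()
    out = [list(header)]
    for row in rows[1:]:
        new_row = list(row)
        original = row[idx]
        if original not in mapping:
            token = secrets.token_hex(8)
            while token in used:  # collisions are very unlikely, but keep tokens unique
                token = secrets.token_hex(8)
            used.add(token)
            mapping[original] = token
        new_row[idx] = mapping[original]
        out.append(new_row)
    return out, mapping
```

This handles only the one column. Identifying information elsewhere in the file, such as in free-text fields, still has to be removed before you certify the file as entirely de-identified.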

Creating a new project

A project is primarily a container for a group of related datasets, and access to datasets is granted at the project level. You can create a new project from the upload page or from the Create a project page. When creating a new project, you will be asked to specify a data collection type; those options are described on our IRB page.
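The access rules on this page can be summarized in a small model: datasets live in projects, permissions are granted per project, and a dataset inherits its project's permissions and appears in the main index only once it is released (see Import process). The classes and functions below are hypothetical and do not reflect DataShop's internal design. The rule that the uploader can always see an unreleased dataset is an assumption, based on being asked to examine the dataset before releasing it.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Project:
    """A container for related datasets; access is granted at this level."""
    name: str
    users_with_access: Set[str] = field(default_factory=set)


@dataclass
class Dataset:
    name: str
    project: Project
    uploader: str          # assumption: the uploader can always see the dataset
    released: bool = False


def can_access(user: str, dataset: Dataset) -> bool:
    if user == dataset.uploader:
        return True
    # Once released, a dataset inherits its project's permissions.
    return dataset.released and user in dataset.project.users_with_access


def visible_in_index(dataset: Dataset) -> bool:
    # Only released datasets appear in the main index of datasets.
    return dataset.released
```

Because permissions are stored on the project, granting someone access to a project also grants access to every released dataset in it.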

Import process

The import process is as follows:

Upload one or more files (as a .ZIP file) to be imported as a dataset.
DataShop will perform a quick verification of the file's first 100 lines and display the results.* You will need to correct any errors that are found. If any potential issues are found, you will be asked to decide if you want to continue.
After the initial verification completes, the dataset will appear in your Import Queue as Queued for Verification, where a separate process will verify the dataset in its entirety.*
When verification is complete, you will receive an email with the verification results. The status for your dataset will update in your Import Queue. When your dataset is loaded, you will be notified via email.
After your dataset is loaded, we ask that you examine the dataset and then release it. When you release a dataset, it inherits the permissions of its project (those who can access the project can then access this dataset) and becomes visible in the main index of datasets.

* Tab-delimited files only. XML files are verified by DataShop staff.
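Several files can be uploaded together as a .ZIP file. The sketch below bundles them using Python's standard `zipfile` module. `package_for_upload` is a hypothetical helper name, and storing every file at the archive root is an assumption, not a documented DataShop requirement.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def package_for_upload(files: Iterable[str], zip_path: str) -> str:
    """Bundle one or more files into a single .zip archive and return its path."""
    paths = [Path(f) for f in files]
    names = [p.name for p in paths]
    # Files are stored at the archive root, so two files with the same name would clash.
    if len(set(names)) != len(names):
        raise ValueError("file names must be unique within the archive")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(str(p), arcname=p.name)
    return zip_path
```

For example, `package_for_upload(["part1.txt", "part2.txt"], "my_dataset.zip")` produces one archive containing both files.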

DataShop @CMU
