About / Frequently Asked Questions (FAQ)

Updated May 29, 2012


What is DataShop?
The Pittsburgh Science of Learning Center (PSLC) DataShop is both the world's preeminent central repository for data on the interactions between students and educational software and a suite of tools for analyzing that data. It provides secure data storage as well as an array of exploratory analysis and visualization tools available through a web-based interface.
How do I access DataShop?
DataShop access is free. You can access DataShop by going to: http://pslcdatashop.web.cmu.edu. DataShop supports both InCommon and Google single sign-on (SSO). On the login page, you can choose to authenticate with either your university account or your Google account. After authenticating, if you have never logged in to DataShop before, you will be asked to create a free account. No information we collect will be distributed to third parties. For more information on accessing DataShop, see our help topic on the subject.
What are the capabilities of DataShop?
DataShop can store many types of data associated with online courses and learning-science studies. The analysis and visualization tools are particularly well-suited for click-stream data from interactive learning environments such as intelligent tutoring systems and virtual labs. In addition, you can store related publications, files, presentations, or electronic artifacts.
What can DataShop do for me?
DataShop supports data representation and collection as well as exploratory analysis. To help researchers collect data in a uniform format, we have developed a standard XML logging format and two logging libraries (one in Flash ActionScript, the other in Java) that write this XML. Data can also be imported in a similar tab-delimited format. Once your data has been imported or logged to the DataShop database, the DataShop web application can help you begin exploratory data analysis with tools for common learning science analyses. You can also export data for further manipulation and analysis in other tools.
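
As a rough illustration of the tab-delimited option, the Python sketch below writes transaction records as tab-separated text with a header row. The column subset shown (Anon Student Id, Session Id, Time, Problem Name, Step Name, Outcome, KC(Default)) and the sample values are assumptions made for illustration; the Importing New Data page defines the columns DataShop actually requires.

```python
import csv
import io

# Hypothetical subset of columns, assumed for illustration only; consult the
# Importing New Data page for the headers DataShop actually requires.
COLUMNS = ["Anon Student Id", "Session Id", "Time",
           "Problem Name", "Step Name", "Outcome", "KC(Default)"]


def transactions_to_tsv(rows):
    """Serialize transaction dicts as tab-delimited text with a header row."""
    buf = io.StringIO()
    # lineterminator="\n" gives Unix line endings; the csv default is "\r\n".
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty fields
    return buf.getvalue()


# Made-up example transaction.
example = [{"Anon Student Id": "stu_01", "Session Id": "s1",
            "Time": "2012-05-29 10:15:00", "Problem Name": "EQ-1",
            "Step Name": "x = 3", "Outcome": "CORRECT",
            "KC(Default)": "solve-linear"}]
print(transactions_to_tsv(example))
```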

Researchers have used DataShop to explore learning issues in a variety of educational domains. These include, but are not limited to, collaborative problem solving in Algebra (Rummel, Spada, & Diziol, 2007), self-explanation in Physics (Hausmann & VanLehn, 2007), the effectiveness of worked examples and polite language in a Stoichiometry tutor (McLaren, Lim, Yaron, & Koedinger, 2007), and the optimization of knowledge component learning in Chinese (Pavlik, Presson, & Koedinger, 2007).

I want to do x. What dataset should I use?
Contact us with some information about your goals and we'll do our best to recommend a dataset. If your goal is to just explore DataShop, see the list of recommended datasets at the top of the dataset list after logging in.
What statistical support is available in DataShop?
The statistical support available directly in DataShop is limited to statistics on learning curves and knowledge component models. However, you can export the data to a file and analyze it in your favorite statistical software package.

DataShop integrates the results of the Additive Factor Model (AFM), a logistic regression fit to the data underlying the "error rate" learning curves. Because AFM predicts the probability of a correct response, its predictions are bounded between 0 and 1, the same range as error-rate data, and the model seeks the curve that best fits that data. The results of this model are shown as the "predicted learning curve" on each line graph of student error rate. The predicted learning curve is the average predicted error rate of a skill at each learning opportunity.
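
AFM is commonly written as ln(p_ij / (1 - p_ij)) = θ_i + Σ_k q_jk (β_k + γ_k T_ik), where p_ij is the probability that student i answers step j correctly, θ_i is the student's proficiency, q_jk is 1 if step j uses knowledge component k (else 0), β_k is the component's easiness, γ_k its learning rate, and T_ik the number of prior opportunities student i has had on k. The minimal Python sketch below evaluates that formula for given parameter values and averages the predicted error (1 - p) over students at each opportunity, mirroring the description of the predicted learning curve above. It does not fit the parameters, the example values are made up, and DataShop's own fitting and averaging details may differ.

```python
import math


def afm_p_correct(theta, kcs, opportunities):
    """P(correct) for one step under the Additive Factor Model.

    theta: proficiency of the student.
    kcs: maps each KC the step requires (q_jk = 1) to (beta, gamma),
         the KC's easiness and learning rate.
    opportunities: maps each KC to the student's prior practice count T.
    """
    logit = theta + sum(beta + gamma * opportunities[kc]
                        for kc, (beta, gamma) in kcs.items())
    return 1.0 / (1.0 + math.exp(-logit))  # inverse of the log-odds


def predicted_error_curve(thetas, beta, gamma, n_opportunities):
    """Average predicted error rate of a single KC at opportunities 1..n."""
    curve = []
    for t in range(n_opportunities):  # opportunity t+1 follows t prior ones
        errors = [1.0 - afm_p_correct(th, {"kc": (beta, gamma)}, {"kc": t})
                  for th in thetas]
        curve.append(sum(errors) / len(errors))
    return curve


# Made-up parameter values, for illustration only.
print(predicted_error_curve(thetas=[-0.5, 0.0, 0.5], beta=0.2,
                            gamma=0.3, n_opportunities=5))
```

With a positive learning rate, each student's predicted error falls with practice, so the averaged curve decreases across opportunities.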

The Model Values page in DataShop presents a quantitative analysis of how well, given the selected knowledge component model, the AFM statistical model fits the data (via AIC, BIC, log likelihood) and how well it might generalize to an independent dataset from the same tutor (via cross validation RMSE).
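
The fit statistics named above have standard definitions: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where L is the model's likelihood, k its number of parameters, and n the number of observations; lower values indicate a better trade-off between fit and complexity. The Python sketch below computes them, together with the Bernoulli log likelihood of 0/1 outcomes and the RMSE between predicted probabilities and outcomes. How DataShop counts parameters and splits data into cross-validation folds is not described here, so treat this as an illustration of the formulas rather than a reproduction of the Model Values page.

```python
import math


def bernoulli_log_likelihood(predicted, observed):
    """ln L for 0/1 outcomes given predicted P(correct) values in (0, 1)."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for p, y in zip(predicted, observed))


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_observations):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_observations) - 2 * log_likelihood


def rmse(predicted, observed):
    """Root mean squared error between predictions and 0/1 outcomes."""
    squared = [(p - y) ** 2 for p, y in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up predictions and outcomes, for illustration only.
preds, outcomes = [0.8, 0.6, 0.3], [1, 1, 0]
ll = bernoulli_log_likelihood(preds, outcomes)
print(aic(ll, n_params=2), bic(ll, n_params=2, n_observations=3),
      rmse(preds, outcomes))
```

For cross-validation RMSE, the same rmse function would be applied to predictions on held-out data that the model was not fit to.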

For more on the Additive Factor Model, see Is Over Practice Necessary? Improving Learning Efficiency with the Cognitive Tutor through Educational Data Mining (Cen, Koedinger, and Junker 2007).

What format is the data in? In what format can I get the data out?
DataShop accepts data in the Tutor Message format. Data can come in as XML or tab-delimited text. Once processed, the data is stored in a relational database. Data can be exported to a tab-delimited text file.
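
Because exports are plain tab-delimited text, they are easy to process outside DataShop. As a sketch, the Python function below reads export text and computes a simple empirical error rate at each opportunity number. The column names ("Outcome", "Opportunity (Default)") and the convention that any outcome other than "CORRECT" counts as an error are assumptions; check the header row of your own export. This simplified curve is not necessarily computed the same way as DataShop's learning curve graphs.

```python
import csv
import io
from collections import defaultdict


def error_rate_by_opportunity(tsv_text, outcome_col="Outcome",
                              opportunity_col="Opportunity (Default)"):
    """Empirical error rate at each opportunity number in export text.

    Column names and the "CORRECT" outcome value are assumptions; adjust
    them to match the header of your own export.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        opp = (row[opportunity_col] or "").strip()
        if not opp:  # skip rows with no opportunity count
            continue
        opp = int(opp)
        totals[opp] += 1
        if (row[outcome_col] or "").strip().upper() != "CORRECT":
            errors[opp] += 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}


# Made-up export text, for illustration only.
sample = ("Outcome\tOpportunity (Default)\n"
          "INCORRECT\t1\nCORRECT\t1\nCORRECT\t2\n")
print(error_rate_by_opportunity(sample))
```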
What kind of data gets logged?
Primarily, DataShop stores data on learner interactions with online course and study materials, including intelligent tutors, virtual labs, simulations, and games. We plan to store more types of data (e.g., audio and video data, writing samples) in the future.

Data is collected from the seven PSLC courses (Algebra, Chemistry, Chinese, English, French, Geometry, and Physics) and from various studies. Sources outside the PSLC also contribute to DataShop, such as middle school math data from the Assistment project at WPI.

How do I get my data into DataShop?
The best method for getting your data into DataShop depends on the state of your project.

If you are developing a course or study and have not yet collected data, then you probably want to log student-tutor transactions as they occur. The page Logging New Data describes a number of scenarios where you would log data from your course or study to DataShop.

If you are interested in storing and viewing data from a course or study that took place in the past, then you probably want to import the existing data. The page Importing New Data describes the two main formats that can be imported into DataShop: XML files and tab-delimited text files.

Can I use DataShop data for my own research purposes?
You do not need permission to view or use public data sets; they are freely accessible to any researcher in the world. For private data sets, if you are the PI or have permission from the PI, you may examine the data sets and use them in your own research.

To gain access to private data sets, first create an account (see “How do I access DataShop?” above), then visit the Other Datasets tab and click the “Request Access” button next to the name of the project you would like to access. In the dialog that appears, briefly explain why you would like access. The request will be sent to the project's principal investigator and to its data provider (if one exists). The status of your request is shown on the Access Requests page, and any projects you have been given access to appear on the My Datasets page.

If you're not sure what data you need, please contact us and we'll do our best to help.

I ran a LearnLab study. Who has access to my data, and how do I control access?
The principal investigator (PI) of a LearnLab study has full control over his or her own data. When a data set is new, only the PI has access to it. We might not know you're the PI, so please tell us! Other users of DataShop may request access to your data set; it's up to you who receives access. A user can have view access to the data set or edit access, which allows changing the data set's metadata and adding or removing papers and files.
How do I get or create custom queries, analyses, or reports?
If you have a general feature or change in mind, we encourage you to contact us. In the past, a number of reports and modifications to DataShop have started this way. If the analysis is specific to your project and unlikely to benefit others, however, you might be better off exporting the data from DataShop and performing the analysis in another program such as SPSS, R, or Excel. (For instance, many kinds of reports can be generated in Excel if you know how to use features like PivotTables and AutoFilter.) The line between these two categories of analyses isn't always clear, so don't hesitate to start a dialogue with us regarding your needs.
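
As an example of a project-specific analysis done outside DataShop, the Python sketch below produces a PivotTable-style report: the error rate for each combination of student and knowledge component in exported tab-delimited text. The column names (Anon Student Id, KC (Default), Outcome) are assumptions to adjust to your export's header, and rows whose outcome is anything other than "CORRECT" are counted as errors.

```python
import csv
import io
from collections import defaultdict


def pivot_error_rate(tsv_text, row_col="Anon Student Id",
                     col_col="KC (Default)", outcome_col="Outcome"):
    """Error rate for each (row value, column value) pair, like a PivotTable.

    Column names are assumptions; change them to match your export header.
    """
    counts = defaultdict(lambda: [0, 0])  # (row, col) -> [errors, total]
    for rec in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        key = (rec[row_col], rec[col_col])
        counts[key][1] += 1
        if rec[outcome_col].strip().upper() != "CORRECT":
            counts[key][0] += 1
    return {key: err / total for key, (err, total) in counts.items()}


# Made-up export text, for illustration only.
sample = ("Anon Student Id\tKC (Default)\tOutcome\n"
          "s1\tadd\tCORRECT\ns1\tadd\tINCORRECT\ns2\tadd\tCORRECT\n")
for (student, kc), rate in sorted(pivot_error_rate(sample).items()):
    print(student, kc, rate)
```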
What is the time frame between completing a study and getting data in/from DataShop?
The time frame varies, depending primarily on the source of the data. Data from tutors that log directly to the PSLC server is moved into DataShop's database daily. For this reason, we encourage you to develop tutors using CTAT, which can log data to the PSLC server for you.

Tutors that produce log data but do not log directly to the PSLC server, such as Andes (Physics LearnLab) or the Carnegie Learning Cognitive Tutors (Algebra and Geometry LearnLabs), must go through a collection and conversion process. The length of this process depends on the availability of personnel to collect and anonymize the data, as well as the state of the program needed to run the conversion. Also note that converting extremely large datasets can add time.

If you need a dataset urgently, please contact us and put "urgent" in the subject of the email.

What restrictions are there on publishing about another researcher's data?
As long as proper IRB rules and guidelines have been followed and you have access to the data through DataShop, you may publish an analysis you have conducted on another researcher's data. You must acknowledge the source of the data in your publication. Additional information is available on our Citing DataShop and Datasets help page.
What is the relationship between DataShop, Cognitive Tutor Authoring Tools (CTAT), and the Open Learning Initiative (OLI)?
The three projects (DataShop, CTAT, and OLI) are often in communication with one another and in some cases build on each other's technology. CTAT is a research project at CMU that creates tools for building intelligent tutors. OLI, also a CMU project, researches and builds open and free online courses. Any tutor created with CTAT has logging functionality built in and can produce data in the format DataShop accepts, so we often recommend CTAT if you're developing a new intelligent tutor or application. CTAT tutors can also log directly to DataShop, shortening the time between when your students use the tutors and when you can view your data in DataShop.
I'm testing a CTAT tutor that should be logging but I don't see any log data in DataShop. Why not?
Although troubleshooting depends on many specifics, here are some general things to check:
  • Is logging turned off? Can you confirm it's explicitly turned on?
  • Is the log server (the location that should be receiving logs from CTAT) the same server as the one you're viewing in DataShop? Note that DataShop runs on two separate servers, QA and production. Each server must run a log conversion process before data becomes available through its DataShop web application. On QA, this process runs at 2am and 2pm daily; on production, it runs at 3am daily.
  • Have you set a dataset name to go along with the logs? If you haven't, your data will fall into an "Unclassified" bucket, making it hard to find your log data.
  • Are you logging to disk? If so, we need to obtain the log files and import them. We might not know about your project or be aware of your schedule, so please ask us about your data.
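
When logs don't appear, it can help to work out when the next conversion run on a server will happen. The Python sketch below encodes the schedule stated in the checklist (QA at 2am and 2pm, production at 3am) and returns the first run strictly after a given time. It assumes the times are in the server's local time zone, which this FAQ does not state, and that the schedule has not changed; the names CONVERSION_HOURS and next_conversion are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical encoding of the conversion schedule described in this FAQ.
# Times are assumed to be in the server's local time zone.
CONVERSION_HOURS = {"qa": (2, 14), "production": (3,)}


def next_conversion(server, now):
    """Return the first scheduled log-conversion run strictly after `now`."""
    for day_offset in (0, 1):  # every day has a run, so tomorrow suffices
        day = (now + timedelta(days=day_offset)).date()
        for hour in sorted(CONVERSION_HOURS[server]):
            run = datetime(day.year, day.month, day.day, hour)
            if run > now:
                return run
    raise ValueError("no run found")  # unreachable with a daily schedule


print(next_conversion("qa", datetime(2012, 5, 29, 13, 0)))
```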

For troubleshooting logging from CTAT tutors, see a few pages on the CTAT website: Troubleshooting logging from Flash tutors and Logging from Java. Also, don't hesitate to contact the CTAT team.

Where can I get more help?
DataShop documentation is online at http://pslcdatashop.web.cmu.edu/help

You can also subscribe to the DataShop users email list or email the DataShop team.