News Archive - 2009
Tuesday, 15 December 2009
Book Chapter in Press
Koedinger, K.R., Baker, R.S.J.d., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J. (in press) A Data Repository for the EDM Community: The PSLC DataShop. To appear in Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S.J.d. (Eds.) Handbook of Educational Data Mining. Boca Raton, FL: CRC Press.
Pre-prints are available upon request.
Monday, 20 November 2009
DataShop User Meeting
Please join the DataShop Team for our upcoming user meeting, to be held on Monday, 7 December 2009 in Gates Center Room 6115, Carnegie Mellon Campus, Pittsburgh, PA.
Below you will find the finalized schedule of events for the meeting:
9:30 - 10:00 am | Breakfast, Meet and Greet
10:00 - 10:15 am | DataShop Accomplishments to Date (John Stamper)
10:15 - 11:00 am | Improvements since 3.0 Release (Brett Leber)
11:00 - 11:15 am | Break
11:15 am - 12:00 pm | Future of DataShop (John Stamper)
12:00 - 1:30 pm | Lunch and Poster Session
1:30 - 2:15 pm | Advanced Methods to Discover Cognitive Models (Geoff Gordon or John Stamper)
2:15 - 3:00 pm | Detecting Metacognitive States in the Data / Exploring Motivation using Data Mining (Ryan Baker and Ben Shih)
3:00 - 3:15 pm | Break
3:15 - 4:00 pm | Analyzing Patterns in Student Errors (Ken Koedinger)
4:00 pm | Closing Remarks, Survey and Feedback
Download a copy of the schedule
Friday, 23 October 2009
DataShop v3.6 Released
Performance improvements, a new report for checking a study's logging activity, and the start of DataShop web services
Today, we are rolling out some fairly big changes to DataShop, all requested by researchers. One is an improvement under the hood that will affect how fast DataShop generates the samples you create or modify (and how fast logged data is made available to the web application in general). Another change is a new report to help you (the researcher or programmer) tell if the tutors in your study or course are logging. We call this new page "Logging Activity" as it gives you an overview of all logging activity on the production log server. We've also taken some big first steps for introducing DataShop web services, which enable you to query DataShop and retrieve data programmatically.
Logging Activity
How do you know if your course or study is logging? You might ask us to verify that DataShop is receiving log data from your study site, but we rarely know how much data is "enough". That approach also requires us to be in the loop, which isn't scalable. A better solution is to get diagnostics directly from DataShop: try your tutor before students use it, verify the data is being received by DataShop, and then monitor data collection as your study or course progresses.
We created a new page for this purpose. It shows you recent logging activity at the logging server end--it displays counts of all recent log messages we received, organized by dataset and student session.
As we're not 100% sure how this page will be used or how its use will affect server performance, we're asking that you first click a button to request access to the report. (We've given many of you access already.) Try it out and tell us what you think.
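The kind of overview the Logging Activity page provides can be sketched as a simple tally of messages grouped by dataset and session. This is only an illustration; the field names (`dataset`, `session`) are hypothetical and not DataShop's actual log schema:

```python
from collections import Counter

def summarize_logging_activity(messages):
    """Tally log messages by (dataset, session) pair,
    mimicking the overview the Logging Activity page shows."""
    return dict(Counter((m["dataset"], m["session"]) for m in messages))

# Example: three messages from one session, one from another
messages = [
    {"dataset": "Geometry Study", "session": "s1"},
    {"dataset": "Geometry Study", "session": "s1"},
    {"dataset": "Geometry Study", "session": "s2"},
    {"dataset": "Geometry Study", "session": "s1"},
]
summary = summarize_logging_activity(messages)
```

A glance at such a summary tells you whether a given study site is producing messages at all, and roughly how many per session.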
Web Services
The goal of DataShop web services is to let researchers with programming experience have their program or web site retrieve DataShop data and (eventually) insert data back into the central repository. We've created the start of such a service: right now, it allows you to authenticate with DataShop and retrieve metadata about datasets and samples. Coming next will be the ability to retrieve transaction and step-level data.
The service follows the REST guidelines, which means that requests to web services are done over HTTP using URLs that represent resources.
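In the REST style, each dataset or sample is addressed by its own URL. As a rough sketch of what such resource URLs might look like (the base URL and path segments here are assumptions for illustration, not the documented API):

```python
# Hypothetical service base URL, for illustration only
BASE_URL = "https://pslcdatashop.web.cmu.edu/services"

def resource_url(*segments):
    """Build a REST-style resource URL: each path segment
    names a resource type or an identifier."""
    return BASE_URL + "/" + "/".join(str(s) for s in segments)

datasets_url = resource_url("datasets")                 # all datasets
samples_url = resource_url("datasets", 42, "samples")   # samples of dataset 42
```

An authenticated HTTP GET to such URLs would then return the corresponding metadata.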
It's a work in progress, and documentation will be available here in the next few days.
As with Logging Activity, we're asking that you first request access before using web services. Once we grant you access, you'll be able to retrieve access credentials for making web-service requests.
Wednesday, 8 September 2009
DataShop v3.5 Released: Changes to measures of latency
In this release of DataShop, we've made some significant changes to latency curves and how DataShop determines latency.
A "latency curve", as we defined it, is a type of learning curve that graphs a duration of time at each opportunity to learn a knowledge component. When we first implemented latency learning curves in June 2008, we introduced two dependent variables of latency, "Assistance Time" and "Correct Step Time", which are essentially the time it took a student to reach a correct attempt on a step, and the time it took a student to reach a correct attempt when no errors preceded that correct attempt, respectively. Based on some researcher feedback, we learned a few things about these measures: 1) the names of the variables were confusing, and 2) we were measuring latency as the time between two events, regardless of what happened in between the two events (such as a student working on other steps).
To address these issues, we started by making our measures of latency more precise. To do this, we began by calculating a duration for every transaction. This enabled us to determine a step's duration by summing the durations of transactions that were toward that step, ignoring the rest. You can see the results of these changes in the latency curves and in the new "duration" column in the transaction export.
After making these changes, we renamed the variables. "Assistance Time" is now "Step Duration", since it's really the time spent on the step regardless of assistance sought. "Correct Step Time" became "Correct Step Duration", and we added another variable, "Error Step Duration". To simplify things, "Correct Step Duration" is now just the step duration when the first attempt was correct; "Error Step Duration" is just the step duration when the first attempt was an error. We propagated these changes through the learning curves and student-step rollup.
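The summation and naming rules above can be sketched in a few lines. This is our reading of the rules, with hypothetical field names (`step`, `duration`, `outcome`), not DataShop's internal implementation:

```python
def step_duration(transactions, step):
    """Sum durations (seconds) of only those transactions
    attributed to the given step, ignoring all others."""
    return sum(t["duration"] for t in transactions if t["step"] == step)

def classify_step(transactions, step):
    """Label the step duration by the outcome of the first attempt:
    'Correct Step Duration' if the first attempt was correct,
    'Error Step Duration' otherwise."""
    attempts = [t for t in transactions if t["step"] == step]
    label = ("Correct Step Duration"
             if attempts and attempts[0]["outcome"] == "CORRECT"
             else "Error Step Duration")
    return label, step_duration(transactions, step)

# A step interleaved with work on another step: only the
# transactions on "find-x" count toward its duration.
txns = [
    {"step": "find-x", "duration": 12.0, "outcome": "ERROR"},
    {"step": "find-y", "duration": 5.0, "outcome": "CORRECT"},
    {"step": "find-x", "duration": 8.0, "outcome": "CORRECT"},
]
# classify_step(txns, "find-x") -> ("Error Step Duration", 20.0)
```

Note how the 5 seconds spent on "find-y" in the middle are excluded from the "find-x" step duration, which is exactly the imprecision the new calculation removes.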
We hope these changes are useful for you in exploring data, and save you time when doing latency-based analyses outside of DataShop. We encourage you to explore the new learning curves and changes to the various table formats, and tell us what you think.
- History of changes to the student-step rollup and transaction formats
- See an example of how we now calculate Step Duration and Correct Step Duration
Wednesday, 24 June 2009
DataShop v3.4 Released
Ah, another summer, another release. The one you've been waiting for.
- Exporting new samples is faster than ever
- Learning Curve Point Info Details
You can now drill down on the point of a learning curve to find out what problems, steps, students or knowledge components make up that point. Take a look at our 30-second video that demonstrates this new feature:
Exporting by transaction for a brand-new, very large sample is no longer slow, as we reported before. For example, a sample with 344,530 transactions previously took 2 hours to export; now it takes only 7 minutes. That's about 17 times faster.
Tuesday, 19 May 2009
Case Studies and EDM paper
On a new Case Studies page, you'll find stories of DataShop use—what were some research goals and how was DataShop used to approach them? The first, presented as a 7-minute video, illustrates the use of DataShop to perform exploratory analysis of DataShop data, generate a theory for optimizing a cognitive model, and test that theory both visually and statistically within DataShop. We hope these are helpful to others who have used DataShop, are considering using DataShop, or want to learn more about the project.
We also created a publications page and posted our Educational Data Mining 2008 paper there. We think it serves as a good introduction to the project and web application.
Friday, 1 May 2009
New DataShop FAQ
It turned out we had an out-of-date FAQ on learnlab.org and a few other sources of similar information, so we've revised and combined them into this new FAQ. Going forward, we'll keep this one updated with answers to real frequently asked questions.
Thursday, 2 April 2009
DataShop v3.3 Released
We have good news for you! There is a new version of DataShop that does two things faster:
- Creating new samples
- Exporting by transaction
You can now create samples faster than before, and the bugs associated with creating big samples on large datasets have been fixed as well.
Also, we are now caching the transaction export files before you ask for them, making the download of these files "infinitely" faster than before. For example, the 'Algebra I 2006-2007 (6 schools)' dataset, which has 5.4 million transactions, has been cached.
But we still have work to do: exporting by transaction for a brand new, very large sample is still slow -- for example, a sample with 344,530 transactions takes 16 minutes to create and 2 hours to export by transaction. For samples larger than this, you should opt to wait a day to retrieve it so that DataShop has time to cache it, or contact us if it's urgent. The fix for this problem is coming in our next release in June.
Monday, 16 February 2009
DataShop v3.2 Released
We released DataShop 3.2 this afternoon, which introduces a faster way to cache the Transaction Export files. We are optimistic that we'll finally be able to cache them all now.
We also fixed bugs, but there are still some known issues.