News Archive - 2010

Friday, 17 December 2010

DataShop 4.4 released - Cross validation, unique file name on export

Today's release of DataShop includes both features and bug fixes.

Cross Validation

We've added a cross validation metric to the "KC Models" and "Model Values" (formerly called "LFA Values") pages. Alongside AIC and BIC, cross validation is another metric for assessing how well a statistical model fits the data. Here, the statistical model is the logistic regression that DataShop runs for each KC model, and a typical goal is to compare several KC models. Cross validation reports the root mean square error (RMSE) of a model's predictions on held-out data; a lower RMSE indicates a better fit between the model's predictions and the observed data. You can read more about cross validation on our help page.
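To make the idea concrete, here is a minimal Python sketch of k-fold cross-validated RMSE. It is a generic illustration, not DataShop's implementation: the round-robin fold assignment, the `fit` callback, and the toy "predict the training mean" model are assumptions made for the example. Each fold is held out in turn, the model is fit on the remaining rows, and the squared errors of all held-out predictions are pooled into one RMSE.

```python
import math
from typing import Callable, Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_rmse(
    observed: Sequence[float],
    fit: Callable[[list[int]], Callable[[int], float]],
    k: int = 3,
) -> float:
    """Hold out each fold in turn, fit on the rest, and pool held-out predictions.

    `fit` (hypothetical) receives the training row indices and returns a
    predictor mapping a row index to a predicted probability of success.
    """
    n = len(observed)
    folds = [list(range(i, n, k)) for i in range(k)]  # simple round-robin folds
    predicted, actual = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        predict = fit(train)
        for i in held_out:
            predicted.append(predict(i))
            actual.append(observed[i])
    return rmse(predicted, actual)


# Toy data: 1 = correct first attempt, 0 = error (made-up values).
outcomes = [1, 0, 1, 1, 0, 1]


def mean_model(train: list[int]) -> Callable[[int], float]:
    """Baseline 'model' that predicts the training-set success rate for every row."""
    p = sum(outcomes[i] for i in train) / len(train)
    return lambda i: p


print(k_fold_rmse(outcomes, mean_model, k=3))  # about 0.707 (sqrt(0.5)) for this toy data
```

Comparing KC models this way means plugging each model's fitting routine into `fit` and preferring the one with the lower pooled RMSE.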

Unique file name on export

In the past, when you exported a dataset, you were prompted to save either a zip file named export.zip or a text file named export.txt, depending on the export format. This made managing more than one of these files an error-prone process. DataShop now names these files uniquely so it's possible to tell a lot more from the file name before you open it.

Here are some example file names:

  • ds5_tx_2010_1217_105717.zip: transaction export. Includes the dataset ID ("5"), the format ("tx" for transaction), and the date and time of the export ("2010_1217_105717"). Inside that zip file, you'll find a folder named after the dataset, and within that folder another zip file whose name includes the sample name and the time the sample was last cached.
  • ds5_student_step_2010_1217_105441.txt: student-step export. Similar to the transaction export file name, but not zipped.
  • ds123_kcm8112_2010_1011_093245.txt: KC model export. Includes the dataset ID ("123") and the KC model ID ("8112").
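If you keep many exports on disk, the naming scheme lends itself to scripting. The Python sketch below parses the three example names above; the pattern `ds<dataset id>_<type>_<YYYY>_<MMDD>_<HHMMSS>.<zip|txt>` is inferred from those examples rather than from a published specification, and `parse_export_name` and `ExportName` are hypothetical helpers.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed pattern, inferred from the example file names only.
EXPORT_NAME = re.compile(
    r"^ds(?P<dataset_id>\d+)_(?P<kind>.+)_(?P<stamp>\d{4}_\d{4}_\d{6})\.(?P<ext>zip|txt)$"
)


@dataclass
class ExportName:
    dataset_id: int
    kind: str  # e.g. "tx", "student_step", or "kcm8112"
    created: datetime
    zipped: bool

    @property
    def kc_model_id(self) -> Optional[int]:
        """KC model ID for KC model exports ("kcm<id>"), otherwise None."""
        return int(self.kind[3:]) if self.kind.startswith("kcm") else None


def parse_export_name(name: str) -> ExportName:
    match = EXPORT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized export file name: {name!r}")
    return ExportName(
        dataset_id=int(match["dataset_id"]),
        kind=match["kind"],
        created=datetime.strptime(match["stamp"], "%Y_%m%d_%H%M%S"),
        zipped=match["ext"] == "zip",
    )
```

Because the type segment can itself contain underscores ("student_step"), the pattern anchors on the fixed-width timestamp at the end of the name instead of splitting on "_".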

Database merge tool

This is an internal tool, but our use of it will benefit many of you. With it, we can load Carnegie Learning datasets into DataShop much faster than before, shortening the delay between the time a course or study ends and the time its data are available for analysis.

Remove the term "LFA" from the interface

When our interface used the acronym "LFA" (learning factors analysis), we really meant "AFM" (additive factor model). Because acronyms can be confusing, we've made some labeling changes: "LFA Values" is now "Model Values", and "LFA status" is now "Logistic regression model status". You won't find "LFA" in our help or interface anymore, but you will see "AFM" in the help; it is the more accurate name for the logistic regression model in DataShop that uses each KC model to generate predictions of student learning. More information on AFM can be found on our Model Values help page.
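For reference, AFM as usually written in the learning-science literature models the log-odds of a correct first attempt as a student intercept plus, for each KC on the step, a KC intercept and a KC slope multiplied by the student's number of prior opportunities on that KC. The Python sketch below computes that prediction from made-up parameter values; the function and parameter names are hypothetical, and the Model Values help page remains the authoritative description of how DataShop fits the model.

```python
import math
from typing import Mapping, Sequence


def afm_probability(
    student_intercept: float,
    kc_intercepts: Mapping[str, float],
    kc_slopes: Mapping[str, float],
    step_kcs: Sequence[str],
    prior_opportunities: Mapping[str, int],
) -> float:
    """Predicted probability of a correct first attempt on a step under AFM.

    logit(p) = student intercept
               + sum over KCs k on the step of (KC intercept_k + KC slope_k * T_k),
    where T_k is the student's number of prior opportunities on KC k.
    """
    logit = student_intercept
    for kc in step_kcs:
        logit += kc_intercepts[kc] + kc_slopes[kc] * prior_opportunities.get(kc, 0)
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up parameters for one student and a step that uses two KCs.
kc_intercepts = {"add": 0.4, "carry": -0.8}
kc_slopes = {"add": 0.1, "carry": 0.25}
p_first = afm_probability(-0.2, kc_intercepts, kc_slopes, ["add", "carry"], {})
p_later = afm_probability(-0.2, kc_intercepts, kc_slopes, ["add", "carry"], {"add": 5, "carry": 5})
print(f"{p_first:.3f} -> {p_later:.3f}")
```

A positive KC slope means the predicted probability of success rises with practice, which is how the model expresses learning; the predicted error rate is one minus this probability.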

Other notable changes and bug fixes:

  • On the Model Values page, we now display 3 decimal places for KC Slope and do not truncate any of the values in the export file for that page.
  • Fixed a bug where the number of student hours on the Dataset Info page showed a negative number.
  • Web Services schema for KC models changed in the following ways:
    • added new number_of_parameters and log_likelihood elements, both of which will be omitted if the AFM model is unable to run
    • lfa_status is now called logistic_regression_model_status
    • added 4 new cross validation elements: cross_validation_status, cross_validation_rmse, cross_validation_number_of_observations, and cross_validation_number_of_parameters. If cross validation is unable to run on a KC model, the latter 3 elements will be omitted.
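Because several of these elements are omitted when AFM or cross validation cannot run, client code should treat them as optional. The Python sketch below uses the standard library's `xml.etree.ElementTree`; the element names come from the list above, but the enclosing `<kc_model>` root, the assumption that the elements are its direct children, and the placeholder values are all hypothetical. Check the Web Services page for the actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def optional_number(parent: ET.Element, tag: str, cast: Callable[[str], float] = float) -> Optional[float]:
    """Return the child element's text converted by `cast`, or None if the element is omitted."""
    text = parent.findtext(tag)
    return None if text is None else cast(text)


def summarize_kc_model(xml_text: str) -> dict:
    model = ET.fromstring(xml_text)
    return {
        "lr_status": model.findtext("logistic_regression_model_status"),
        "number_of_parameters": optional_number(model, "number_of_parameters", int),
        "log_likelihood": optional_number(model, "log_likelihood"),
        "cv_status": model.findtext("cross_validation_status"),
        "cv_rmse": optional_number(model, "cross_validation_rmse"),
        "cv_observations": optional_number(model, "cross_validation_number_of_observations", int),
        "cv_parameters": optional_number(model, "cross_validation_number_of_parameters", int),
    }


# Hypothetical response fragment in which cross validation could not run,
# so its three numeric elements are omitted. Values are placeholders.
sample = """<kc_model>
  <logistic_regression_model_status>placeholder status</logistic_regression_model_status>
  <number_of_parameters>42</number_of_parameters>
  <log_likelihood>-1234.5</log_likelihood>
  <cross_validation_status>placeholder status</cross_validation_status>
</kc_model>"""

print(summarize_kc_model(sample))
```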
Posted by Alida at 4:57 PM

Tuesday, 23 November 2010

Introduction to DataShop Workshop

We are writing to invite you to our Introduction to PSLC DataShop Workshop.

When: Friday, December 10, 2010, 10am-1:30pm (lunch will be provided)
Where: CMU, Gates-Hillman Center (GHC) 6115

During this interactive half-day event, you'll have the opportunity to learn how you can use PSLC DataShop for exploratory data analysis, get an update on what we are developing for future releases, and talk with DataShop developers and other users/researchers. You'll also hear a DataShop case study from CMU Associate Teaching Professor of Psychology Marsha Lovett.

10:00 - 10:15 Introduction
10:15 - 11:00 DataShop Hands On
11:00 - 11:15 Break
11:15 - 12:00 Case study with Marsha Lovett
12:00 - 13:30 Q&A and lunch

Be sure to bring a laptop so that you can participate in the hands-on portion. If you already know how to use DataShop, please feel free to attend the case study and lunch. If you are interested in attending, please RSVP via the Doodle poll.

Posted by Alida at 11:07 AM

Wednesday, 9 September 2010

DataShop 4.3 - Metrics Report student hours bug fixed

Bug fix release.

This release consists of several bug fixes. See the fixed issues link below to find out more.

Posted by Alida at 8:57 AM

Thursday, 19 August 2010

DataShop 4.2 - New Metrics Report

Metrics Report

This release includes the new Metrics Report, which provides an overview of the quantity of data in DataShop, organized by domain and PSLC LearnLab.

All of the fields in this report are shown on the Dataset Info / Overview page for each dataset. From the Dataset Info / Overview and Papers and Files pages, you can set the Domain/LearnLab and add files or papers. The remaining fields are calculated by examining the data contained in each dataset.

If a dataset does not have a Domain/LearnLab set for it, then it is excluded from this report.
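The grouping rule described above can be sketched in a few lines of Python. This is an illustration, not DataShop's code: the `DatasetSummary` record, its fields (transactions and student hours as example quantities), and the sample values are hypothetical, and Domain/LearnLab is modeled as a single optional pair. Datasets without a Domain/LearnLab are skipped, as the post describes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class DatasetSummary:
    name: str
    domain_learnlab: Optional[Tuple[str, str]]  # (domain, LearnLab); None when not set
    transactions: int
    student_hours: float


def metrics_report(datasets: Iterable[DatasetSummary]) -> Dict[Tuple[str, str], dict]:
    """Total each quantity per (domain, LearnLab), excluding datasets with none set."""
    totals: Dict[Tuple[str, str], dict] = defaultdict(
        lambda: {"datasets": 0, "transactions": 0, "student_hours": 0.0}
    )
    for ds in datasets:
        if ds.domain_learnlab is None:
            continue  # no Domain/LearnLab set: excluded from the report
        row = totals[ds.domain_learnlab]
        row["datasets"] += 1
        row["transactions"] += ds.transactions
        row["student_hours"] += ds.student_hours
    return dict(totals)
```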

fixed issues |  known issues

Posted by Alida at 2:08 PM

Tuesday, 25 May 2010

DataShop 4.1 - Bugs fixed

Bug fix release.

This release consists of several bug fixes. See the fixed issues link below to find out more.

fixed issues |  known issues

Posted by Jim at 1:00 PM

Wednesday, 17 February 2010

DataShop 4.0 Released

New DataShop Web Services features, plus more

There seemed to be a lot of interest in DataShop Web Services at the DataShop User Meeting in November. At the time of the meeting, we could only demo what was in development. We're now happy to release the services we previewed. We hope these two new features—Get Transactions and Get Student-Step Records—will make Web Services a useful approach for researchers who want to automate data retrieval and analysis.

Get Transactions

https://pslcdatashop.web.cmu.edu/services/datasets/[id]/[?samples/id]/transactions

  • Get a tab-delimited response (optionally zipped) of transactions for a given dataset or sample, based on your request parameters.
  • If a sample ID is not provided, transactions for the "All Data" sample will be returned.
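As a small illustration of the URL pattern above, the Python helper below builds the transactions URL with and without a sample ID. It is a hypothetical convenience function: it does not add request parameters or the authentication that Web Services requests require, both of which are documented on the Web Services page.

```python
from typing import Optional

BASE_URL = "https://pslcdatashop.web.cmu.edu/services"


def transactions_url(dataset_id: int, sample_id: Optional[int] = None) -> str:
    """Build the Get Transactions URL; omitting sample_id means the "All Data" sample."""
    parts = [BASE_URL, "datasets", str(dataset_id)]
    if sample_id is not None:
        parts += ["samples", str(sample_id)]
    parts.append("transactions")
    return "/".join(parts)
```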

Get Student-Step Records

https://pslcdatashop.web.cmu.edu/services/datasets/[id]/samples/[?id]/steps

  • Get a tab-delimited response (can be zipped as well) of student-step records for a given dataset or sample and your request parameters.
  • If a sample ID is not provided, student-step records for the "All Data" sample will be returned.
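Both services return tab-delimited text, optionally zipped, so the same reader works for either. The Python sketch below uses only the standard library; the assumptions that a zipped response holds a single text file and that the text is UTF-8 are ours, as is the function name `read_records`. It returns one dictionary per row, keyed by column header.

```python
import csv
import io
import zipfile
from typing import Dict, List


def read_records(body: bytes) -> List[Dict[str, str]]:
    """Parse a tab-delimited response body, unzipping it first if it is a zip archive."""
    if zipfile.is_zipfile(io.BytesIO(body)):
        with zipfile.ZipFile(io.BytesIO(body)) as archive:
            body = archive.read(archive.namelist()[0])  # assumes a single member
    text = body.decode("utf-8")  # assumed encoding
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
```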

Learn more about these new services on the Web Services page.

We've also released the following tweaks and improvements:

  • Project announcements. On the home page that lists the datasets in DataShop, you'll see a small box with the title "Announcements" that shows recent news about the project, with links to the full news posts.
  • Learning curve point info "Obs" column. When you click a point in a learning curve, you can now see the frequency of the items going into that point, broken down by KCs/Problems/Steps/Students. Previously, you could tell only that data for, say, 13 steps contributed to an aggregate point in the learning curve; you could see each step's error rate, for example, but not how much each step contributed to the aggregate. Now an "Obs" (Observation) column displays the frequency of each item in the aggregate, so you can tell which step contributes most to that error rate.
  • "#" column header renamed to "Row", "Total # Hints" renamed to "Total Num Hints". In all of the export formats, the "#" header of the first column, which holds the row number, is now "Row". In the transaction export format, the column header "Total # Hints" is now "Total Num Hints". We made these changes because "#" is a comment character in analysis programs such as R, so opening a DataShop export file directly in those programs was problematic.
  • The DataShop import file verification tool was also changed to expect a column titled "Row" instead of "#" and "Total Num Hints" instead of "Total # Hints". If you plan to import data into DataShop, you will need to make these changes to your file(s).
  • Study "Condition" in student-step export. You'll now see a "Condition" column in the student-step rollup. This new column appears as the last column in the table. In the case of a student assigned to multiple conditions (factors in a factorial design), condition names are separated by a comma and space. This differs from the transaction format, which optionally has "Condition Name" and "Condition Type" columns.
  • Cached export file status. With the DataShop release in April 2009, we started caching transaction export files, resulting in less wait time and faster downloading of these files. Caching, however, is done on a sample-by-sample basis, and it wasn't clear from the DataShop interface which samples were cached or when they were created. We're now displaying a small table on the transaction export page that shows the cache status of each sample and when that cached file was created. This tells you which samples can be downloaded most quickly and which will take longer (but will be cached once you request them). The date and time of the cached file tell you the cutoff for data included in the file, which is useful if you're running a study that's logging to DataShop. To learn more about the various states of a cached export file, visit our help topic on exporting.
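The "Obs" column from the list above is what lets you weigh each item's error rate. Assuming the aggregate point pools errors over all observations (an observation-weighted mean, which is our assumption here), the Python sketch below recovers the aggregate error rate and each item's share of the errors. The item names and numbers are made up; note that the step with the highest error rate is not necessarily the one contributing most.

```python
from typing import Dict, Tuple

# Hypothetical breakdown of one learning-curve point: item -> (Obs, error rate).
Breakdown = Dict[str, Tuple[int, float]]


def aggregate_error_rate(items: Breakdown) -> float:
    """Observation-weighted mean of the items' error rates."""
    total_obs = sum(obs for obs, _ in items.values())
    total_errors = sum(obs * rate for obs, rate in items.values())
    return total_errors / total_obs


def error_shares(items: Breakdown) -> Dict[str, float]:
    """Fraction of the point's errors contributed by each item (assumes at least one error)."""
    total_errors = sum(obs * rate for obs, rate in items.values())
    return {name: obs * rate / total_errors for name, (obs, rate) in items.items()}


point = {"step A": (10, 0.5), "step B": (30, 0.1), "step C": (2, 1.0)}
print(aggregate_error_rate(point))  # 10 errors over 42 observations
print(error_shares(point))  # step C has the highest error rate but the smallest share
```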

fixed issues |  known issues

Posted by Kyle at 2:30 PM

Archived news: 2011, 2010, 2009, 2008, 2007, 2006