Sample Selector

Sample Selector is a tool for creating and editing samples: groups of data that you can compare across. These are not "samples" in the statistical sense; they are closer to filters.

By default, a single sample exists: "All Data". With the Sample Selector, you can create new samples to organize your data.

You can use samples to:

A sample is composed of one or more filters, specific conditions that narrow down your sample.

Creating a sample

The general process for creating a sample is to:

The effect of multiple filters

DataShop interprets each filter after the first as an additional restriction on the data that is included in the sample. This is also known as a logical "AND". You can see the results of multiple filters in the sample preview as soon as all filters are "saved".
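The logical "AND" behavior can be illustrated outside DataShop with a minimal R sketch. The data frame and its column names (student, condition, attempt) are hypothetical and chosen only for illustration; they are not DataShop's schema or its filter implementation.

```r
# Hypothetical toy data; column names are illustrative only.
toy <- data.frame(
  student   = c("s1", "s1", "s2", "s3"),
  condition = c("A", "B", "A", "A"),
  attempt   = c(1, 2, 1, 3),
  stringsAsFactors = FALSE
)

# Filter 1: keep only rows in condition "A".
filter1 <- toy$condition == "A"
# Filter 2: keep only first attempts.
filter2 <- toy$attempt == 1

# A row is in the sample only if every filter holds (logical AND).
sample_rows <- toy[filter1 & filter2, ]
print(sample_rows)
```

Each added filter can only keep the sample the same size or shrink it, never grow it.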

R (software)

R is a language and free software environment for statistical computing and graphics. If you're familiar with R, you can easily open a DataShop student-step export or transaction export stored on your computer. The command to do so is shown below:

R> ds = read.delim("export.txt", header = TRUE, quote="\"", dec=".", fill = TRUE, comment.char="")

The above command reads the tab-delimited file and stores it in a "data frame" object called ds (you could use any variable name here). The function read.delim() is shorthand for read.table() with some default values for the various arguments to read.table(). This command works because the student-step and transaction export files have a form that R expects for data frames:

  • the first line of the file has a name for each variable in the data frame
  • each additional line of the file has a row label as its first item, followed by the values for each variable.

You can graphically view the data frame with the command edit(ds).
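As a quick check that a file has loaded as expected, the sketch below writes a tiny tab-delimited file and reads it back with the same read.delim() arguments. The file contents are made up for illustration, and only three columns are included; a real export has many more.

```r
# Write a tiny tab-delimited file that mimics an export header (made-up values).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Row\tAnon Student Id\tFirst Attempt",
  "1\tstu_01\tcorrect",
  "2\tstu_02\thint"
), tmp)

toy <- read.delim(tmp, header = TRUE, quote = "\"", dec = ".",
                  fill = TRUE, comment.char = "")

# By default, read.delim() turns header names into valid R names,
# so spaces become dots (e.g. "Anon Student Id" -> "Anon.Student.Id").
print(names(toy))
str(toy)
```

This name conversion is why columns are referred to as, for example, ds$Anon.Student.Id in R.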

Note: In versions of R before 4.0.0, the above command implicitly tells R to treat every column of strings as a factor (R's type for categorical variables; this is unrelated to factor analysis). Although convenient, this can be problematic for some types of analysis. To avoid this issue, add the argument stringsAsFactors=FALSE to the command (from R 4.0.0 onward, this is already the default). Then identify individual columns as factors using the R syntax dat$col = factor(dat$col), where dat is the data frame and col is the column name.
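A minimal sketch of this workflow follows, using a made-up two-column file. The column name Outcome and its values are assumptions for illustration. Note that FALSE is R's logical constant and is written without quotes.

```r
# Made-up tab-delimited data for illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Row\tOutcome", "1\tcorrect", "2\thint", "3\tcorrect"), tmp)

# Read strings as plain character vectors rather than factors.
dat <- read.delim(tmp, header = TRUE, stringsAsFactors = FALSE)
print(is.character(dat$Outcome))

# Convert only the column that should be categorical.
dat$Outcome = factor(dat$Outcome)
print(levels(dat$Outcome))
```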

At this point, you can work with the data in R to perform any analysis you'd like.

Using R to replicate the AFM model

Using R notation, the AFM model (applied to a modified student-step export file called "ds") can be approximately* represented as:

R> L = length(ds$Anon.Student.Id)
R> success = vector(mode="numeric", length=L)   # starts as all zeros
R> success[ds$First.Attempt=="correct"]=1
R> library(lme4)   # provides glmer(); lmer() with a family argument is deprecated
R> model1.lmer <- glmer(success~knowledge_component+
   knowledge_component:opportunity+(1|anon_student_id),data=ds,family=binomial())

Note: The success variable must be 0 or 1. The first three R commands simply convert the "First Attempt" values (in the student-step export) of "incorrect" and "hint" to 0, and "correct" to 1.
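The 0/1 conversion can be tried on a small made-up vector before running it on a full export. The sketch below repeats the three-command approach and compares it with a one-line equivalent; the toy data frame is an assumption for illustration.

```r
# Made-up "First Attempt" values for illustration.
toy <- data.frame(
  First.Attempt = c("correct", "hint", "incorrect", "correct"),
  stringsAsFactors = FALSE
)

# Three-step approach: start with zeros, then mark correct attempts as 1.
L = length(toy$First.Attempt)
success = vector(mode = "numeric", length = L)
success[toy$First.Attempt == "correct"] = 1

# Equivalent one-liner: the comparison gives TRUE/FALSE, which converts to 1/0.
success_short <- as.numeric(toy$First.Attempt == "correct")

print(success)
print(identical(success, success_short))
```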

* The AFM code is different from the R expression above in two ways:

  1. To reduce over-fitting the data, AFM assumes learning cannot be negative and thus constrains the "slope" estimates of the knowledge_component:opportunity parameters to be greater than or equal to 0.
  2. The optimization applies a penalty to estimates of the student parameters (anon_student_id) for deviating from 0, essentially treating anon_student_id as a random effect.
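Read off the R formula above, the model can be written out as an equation. This is a sketch under the assumption that each step is tagged with a single knowledge component; the symbols (θ for the student effect, β for the knowledge component intercept, γ for the slope, T for the opportunity count) are notation chosen here for illustration and are not taken from the AFM code.

```latex
% Student i, knowledge component k, opportunity count T_{ik}
\operatorname{logit} P(\text{success}_{ik} = 1)
  = \theta_i + \beta_k + \gamma_k \, T_{ik},
\qquad \theta_i \sim \mathcal{N}(0, \sigma^2)

% Additional constraint applied by AFM (point 1 above), not by the R expression:
\gamma_k \ge 0
```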

The above R code, with additional analyses, is available for download here.

For more about the AFM model, see the Model Values help page.

Using DataShop Web Services to access DataShop data in R

The following R commands show how you can access DataShop data via web services from R. Windows paths are shown in the example. A few prerequisites are needed for this script to run successfully:

  • Get access to web services, including your access keys, at the Web Services Credentials page.
  • Download and extract the DataShop sample web services client to C:\ws
  • Enter your access key ID in the file C:\ws\webservices.properties in the line that starts api.token=, and enter your secret access key in the line that starts secret=. Save the file.
  • Edit the script below by specifying a datasetid for a dataset you have access to. This is the number that appears in the URL of DataShop when browsing a dataset (e.g., https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=123). Also set datasetlength to a number higher than the number of transaction rows in the dataset, preferably a multiple of 5000 (the batch size).

This code will retrieve transaction data for only the columns specified in colsneeded. It retrieves the transaction data in batches of 5000. For a full reference of possible columns and data formats, see our Web Services API.

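setwd('C:/ws/')
df<-data.frame()   # accumulates the rows from every batch

colsneeded <- "row,anon_student_id,session_id,time,duration,problem_name,attempt_at_step,outcome,condition"
datasetid <- 123
datasetlength <- 40000
batchsize <- 5000
# Round up so that a partial final batch is still requested
nbatches <- ceiling(datasetlength/batchsize)

for (i in 0:(nbatches-1)) {
  # format() keeps large offsets out of scientific notation (e.g. "1e+05")
  offset <- format(i*batchsize, scientific=FALSE)
  callval <- paste("java -jar C:/ws/dist/datashop-webservices.jar \"https://pslcdatashop.web.cmu.edu/services/datasets/",
    datasetid,"/transactions?offset=",
    offset,"&limit=",batchsize,"&cols=",colsneeded,"\"",sep="")

  # Run the client and parse its tab-delimited output
  tablines <- read.delim(pipe(callval),  header = TRUE, sep = "\t")

  df <- rbind(df,tablines)
  print(paste("Group",i,"completed."))
}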
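The batching arithmetic can be checked on its own, without calling the web service. The sketch below computes the offsets that would be requested for a hypothetical row count that is not a multiple of the batch size; the variable names nbatches, offsets, and offset_strings are introduced here for illustration.

```r
datasetlength <- 42000   # hypothetical row count, not a multiple of the batch size
batchsize <- 5000

# Round up so the final, partial batch is included.
nbatches <- ceiling(datasetlength / batchsize)
offsets <- (0:(nbatches - 1)) * batchsize

# Convert to strings without scientific notation, ready to paste into a URL.
offset_strings <- format(offsets, scientific = FALSE, trim = TRUE)
print(offset_strings)
```

Rounding up means the last request may ask for rows past the end of the dataset, which is why datasetlength only needs to be an upper bound.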

Useful Links