R (software)
R is a language and free software environment for statistical computing and graphics. If you're familiar with R, you can easily open a DataShop student-step export or transaction export stored on your computer. The command to do so is shown below:
R> ds = read.delim("export.txt", header = TRUE, quote="\"", dec=".", fill = TRUE, comment.char="")
The above command reads the tab-delimited file and stores it in a "data frame" object called ds (you could use any variable name here). The function read.delim() is shorthand for read.table() with some default values for the various arguments to read.table().
This command works because the student-step and transaction export files have a form that R expects
for data frames:
- the first line of the file has a name for each variable in the data frame
- each additional line of the file has as its first item a row label and the values for each variable.
You can graphically view the data frame with the command edit(ds).
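As a minimal sketch of this workflow, the example below writes a tiny tab-delimited file, reads it back with the same read.delim() arguments, and inspects the result in the console rather than in the interactive editor. The column names and values are made up for illustration and are not taken from a real DataShop export.

```r
# Write a small tab-delimited file to a temporary location.
# The column names and values are made up for illustration only.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Row\tAnon Student Id\tFirst Attempt",
  "1\tstu_a\tcorrect",
  "2\tstu_b\thint",
  "3\tstu_a\tincorrect"
), tmp)

# Same arguments as the command above, pointed at the temporary file.
demo <- read.delim(tmp, header = TRUE, quote = "\"", dec = ".",
                   fill = TRUE, comment.char = "")

# By default read.delim() makes column names syntactically valid,
# so spaces in the header are replaced with dots.
names(demo)
str(demo)    # compact summary of the column types
head(demo)   # first rows, printed to the console
```

Checking names() right after loading is useful because the names R assigns can differ from the header text in the export file.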
Note: In R versions before 4.0.0, the above command implicitly tells R that all columns that appear as strings should be considered factors (R's type for categorical variables). Although convenient, this can be problematic for some types of analysis. To avoid this issue, append the parameter stringsAsFactors=FALSE to the above command (this is already the default from R 4.0.0 onward). Then identify individual columns as factors using the R syntax dat$col = factor(dat$col), where dat is the data frame and col is the column name.
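A minimal sketch of this pattern follows, using a made-up two-column file. The names dat and col follow the note above; the student column and all values are hypothetical.

```r
# Made-up data for illustration; not a real DataShop export.
tmp <- tempfile(fileext = ".txt")
writeLines(c("student\tcol", "stu_a\tx", "stu_b\ty", "stu_a\tx"), tmp)

# Keep string columns as plain character vectors when loading.
dat <- read.delim(tmp, header = TRUE, stringsAsFactors = FALSE)
is.character(dat$col)

# Convert only the column that should be treated as categorical.
dat$col <- factor(dat$col)
levels(dat$col)
```

Columns that are not converted, such as student here, remain character vectors and can be handled with ordinary string functions.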
At this point, you can work with the data in R to perform any analysis you'd like.
Using R to replicate the AFM model
Using R notation, the AFM model (applied to a modified student-step export file called "ds") can be approximately* represented as:
R> L = length(ds$Anon.Student.Id)
R> success = vector(mode="numeric", length=L)
R> success[ds$First.Attempt=="correct"]=1
R> library(lme4)
R> model1.lmer <- glmer(success~knowledge_component+ knowledge_component:opportunity+(1|anon_student_id),data=ds,family=binomial())
Note: The success variable must be 0 or 1. The first three R commands simply convert the "First Attempt" values (in the student-step export) of "incorrect" and "hint" to 0, and "correct" to 1.
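The same conversion can also be written as a single vectorized comparison. The sketch below uses a made-up vector named first_attempt in place of the export column; it is an alternative formulation, not the code distributed with DataShop.

```r
# Made-up "First Attempt" values for illustration.
first_attempt <- c("correct", "hint", "incorrect", "correct")

# The comparison yields TRUE/FALSE, which as.numeric() maps to 1/0,
# so "hint" and "incorrect" both become 0.
success <- as.numeric(first_attempt == "correct")
success
```

One difference to be aware of: a missing value in the input stays missing in this version, whereas the three-command version leaves it at 0.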
* The AFM code is different from the R expression above in two ways:
- To reduce over-fitting the data, AFM assumes learning cannot be negative and thus constrains the "slope" estimates of the knowledge_component:opportunity parameters to be greater than or equal to 0.
- The optimization applies a penalty to estimates of the student parameters (anon_student_id) for deviating from 0, essentially treating anon_student_id as a random effect.
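Read as an equation, the R formula above corresponds roughly to the logistic model below, for a step tagged with a single knowledge component. The symbols are introduced here for illustration and are not taken from the DataShop documentation: p_{ik} is the probability that student i answers a step on knowledge component k correctly on the first attempt, \theta_i is the student intercept (anon_student_id), \beta_k is the knowledge-component intercept, \gamma_k is the knowledge_component:opportunity slope, and T_{ik} is the opportunity count.

```latex
\log\frac{p_{ik}}{1 - p_{ik}} = \theta_i + \beta_k + \gamma_k \, T_{ik},
\qquad \gamma_k \ge 0 \quad \text{(constraint applied by AFM, not by the R formula)}
```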
The above R code, with additional analyses, is available for download here.
For more about the AFM model, see the Model Values help page.
Using DataShop Web Services to access DataShop data in R
The following R commands show how you can access DataShop data via web services from R. Windows paths are shown in the example. A few prerequisites are needed for this script to run successfully:
- Get access to web services, including your access keys, at the Web Services Credentials page.
- Download and extract the DataShop sample web services client to C:\ws
- Enter your access key ID in the file C:\ws\webservices.properties in the line that starts api.token=, and enter your secret access key in the line that starts secret=. Save the file.
- Edit the script below by specifying a datasetid for a dataset you have access to. This is the number that appears in the URL of DataShop when browsing a dataset (e.g., https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=123). Also set the datasetlength to a number higher than the number of transaction rows you want to retrieve.
This code will retrieve transaction data for only the columns specified in colsneeded. It retrieves the transaction data in batches of 5000. For a full reference of possible columns and data formats, see our Web Services API.
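setwd('C:/ws/')
df <- data.frame()
# Columns to retrieve for each transaction
colsneeded <- "row,anon_student_id,session_id,time,duration,problem_name,attempt_at_step,outcome,condition"
datasetid <- 123          # dataset ID from the DataShop URL
datasetlength <- 40000    # upper bound on the number of rows to retrieve
# Number of 5000-row batches, rounded up so a partial final batch is not skipped
nbatches <- ceiling(datasetlength / 5000)
for (i in 0:(nbatches - 1)) {
  # format() keeps large offsets such as 100000 from being written as 1e+05
  callval <- paste("java -jar C:/ws/dist/datashop-webservices.jar \"https://pslcdatashop.web.cmu.edu/services/datasets/",
                   datasetid, "/transactions?offset=", format(i * 5000, scientific = FALSE),
                   "&limit=5000&cols=", colsneeded, "\"", sep = "")
  # Run the client and parse its tab-delimited output
  tablines <- read.delim(pipe(callval), header = TRUE, sep = "\t")
  df <- rbind(df, tablines)
  print(paste("Group", i, "completed."))
}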
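The batching arithmetic in the script can be checked without calling the web services client. The sketch below uses two hypothetical helpers, batch_offsets() and transactions_url(), which are not part of DataShop or its client; they only reproduce the offset calculation and the URL pattern used in the script.

```r
# Hypothetical helper: offsets needed to page through total_rows rows
# in batches of batch_size. ceiling() keeps a partial final batch.
batch_offsets <- function(total_rows, batch_size = 5000) {
  n_batches <- ceiling(total_rows / batch_size)
  (seq_len(n_batches) - 1) * batch_size
}

# Hypothetical helper: build the request URL for one batch, following
# the URL pattern used in the script above.
transactions_url <- function(datasetid, offset, cols, limit = 5000) {
  paste0("https://pslcdatashop.web.cmu.edu/services/datasets/", datasetid,
         "/transactions?offset=", format(offset, scientific = FALSE),
         "&limit=", limit, "&cols=", cols)
}

# Offsets and URLs for a made-up dataset of 12000 rows.
offsets <- batch_offsets(12000)
urls <- vapply(offsets,
               function(o) transactions_url(123, o, "row,outcome"),
               character(1))
urls
```

Printing the URLs before running the full loop is a cheap way to confirm that the last batch is included and that large offsets are not written in scientific notation.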