## R (software)

R is a language and free software environment for statistical computing and graphics. If you're familiar with R, you can easily open a DataShop student-step export or transaction export stored on your computer. The command to do so is shown below:

`R> ds = read.delim("export.txt", header = TRUE, quote="\"", dec=".", fill = TRUE, comment.char="")`

The above command reads the tab-delimited file and stores it in a "data frame" object called `ds` (you could use any variable name here). The function `read.delim()` is shorthand for `read.table()` with some default values for the various arguments to `read.table()`. This command works because the student-step and transaction export files have a form that R expects for data frames:

• the first line of the file has a name for each variable in the data frame
• each additional line of the file has as its first item a row label and the values for each variable.
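The sketch below reads a tiny, hypothetical three-row export. The data is passed inline through the `text` argument rather than from a file. It shows two things: `read.delim()` returns the same data frame as `read.table()` with tab-delimited settings, and R rewrites header names that contain spaces, such as "Anon Student Id", into syntactic names such as `Anon.Student.Id`. The column names and values are illustrative only; a real export has many more columns.

```r
# Hypothetical minimal tab-delimited export (real exports have many more columns)
export_text <- paste(
  "Row\tAnon Student Id\tFirst Attempt",
  "1\tstu_A\tcorrect",
  "2\tstu_A\thint",
  "3\tstu_B\tincorrect",
  sep = "\n"
)

# read.delim() is read.table() with tab-delimited defaults
ds <- read.delim(text = export_text, header = TRUE, quote = "\"", dec = ".",
                 fill = TRUE, comment.char = "")
ds_table <- read.table(text = export_text, header = TRUE, sep = "\t", quote = "\"",
                       dec = ".", fill = TRUE, comment.char = "")

# Spaces in header names become dots because check.names = TRUE by default
print(names(ds))               # "Row" "Anon.Student.Id" "First.Attempt"
print(identical(ds, ds_table)) # TRUE
```

This renaming is why the later examples refer to columns as `ds$Anon.Student.Id` and `ds$First.Attempt`.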

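You can view the data frame in a spreadsheet-style window with `View(ds)`, or open it in R's data editor with `edit(ds)`. Because `edit()` returns an edited copy, use `ds <- edit(ds)` if you want to keep your changes.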

Note: In R versions before 4.0.0, `read.delim()` converts every column of strings to a factor by default. A factor is R's type for categorical variables. Starting with R 4.0.0, the default is `stringsAsFactors = FALSE`. Automatic conversion is convenient, but it can cause problems for some types of analysis. To avoid it, add the argument `stringsAsFactors = FALSE` to the `read.delim()` command above. Then convert individual columns to factors with `dat$col <- factor(dat$col)`, where `dat` is the data frame and `col` is the column name.
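The following minimal sketch loads strings as character vectors and then turns only the student ID column into a factor. It uses a hypothetical inline export; the column names are illustrative.

```r
# Hypothetical minimal export, read from a string instead of a file
export_text <- paste(
  "Row\tAnon Student Id\tFirst Attempt",
  "1\tstu_A\tcorrect",
  "2\tstu_A\thint",
  "3\tstu_B\tincorrect",
  sep = "\n"
)

# Keep string columns as character vectors (the default only from R 4.0.0 on)
ds <- read.delim(text = export_text, stringsAsFactors = FALSE)
print(class(ds$Anon.Student.Id))   # "character"

# Convert only the column that should be treated as categorical
ds$Anon.Student.Id <- factor(ds$Anon.Student.Id)
print(levels(ds$Anon.Student.Id))  # "stu_A" "stu_B"
print(class(ds$First.Attempt))     # still "character"
```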

At this point, you can work with the data in R to perform any analysis you'd like.

### Using R to replicate the AFM model

Using R notation, the Additive Factors Model (AFM), applied to a data frame `ds` loaded from a modified student-step export, can be approximately* represented as:

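```
L = length(ds$Anon.Student.Id)
success = vector(mode="numeric", length=L)
success[ds$First.Attempt=="correct"] = 1
library(lme4)  # glmer() fits generalized linear mixed models; lmer() is for linear ones
# knowledge_component, opportunity and anon_student_id are columns of the modified export
model1.glmer <- glmer(success ~ knowledge_component +
    knowledge_component:opportunity + (1|anon_student_id),
    data = ds, family = binomial())
```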

Note: The `success` variable must be 0 or 1. The first three R commands convert the "First Attempt" values in the student-step export so that "incorrect" and "hint" become 0 and "correct" becomes 1.
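The following sketch runs the three conversion commands on a few hypothetical "First Attempt" values. It also checks them against an equivalent one-line form, in which the logical comparison is coerced directly to 1/0.

```r
# Hypothetical First Attempt values as they appear in a student-step export
first_attempt <- c("correct", "hint", "incorrect", "correct")

# Three-step version: start with all zeros, then set "correct" rows to 1
success <- vector(mode = "numeric", length = length(first_attempt))
success[first_attempt == "correct"] <- 1

# Equivalent one-liner: TRUE/FALSE coerced to 1/0
success2 <- as.numeric(first_attempt == "correct")

print(success)                       # 1 0 0 1
print(identical(success, success2))  # TRUE
```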

\* The AFM code differs from the R expression above in two ways:

1. To reduce over-fitting, AFM assumes that learning cannot be negative. It therefore constrains the "slope" estimates of the `knowledge_component:opportunity` parameters to be greater than or equal to 0.
2. The optimization penalizes estimates of the student parameters (`anon_student_id`) for deviating from 0, which essentially treats `anon_student_id` as a random effect.
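For reference, AFM is usually written as a logistic regression. The equation below sketches that standard form; the symbol names are chosen here for illustration and are not taken from DataShop's code.

$$
\ln\frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_{k} q_{jk}\,\beta_k + \sum_{k} q_{jk}\,\gamma_k\,T_{ik}
$$

The symbols are defined as follows:

- $p_{ij}$ is the probability that student $i$ gets step $j$ correct on the first attempt.
- $\theta_i$ is the proficiency of student $i$ (the `anon_student_id` term).
- $q_{jk}$ is 1 if step $j$ involves knowledge component $k$, and 0 otherwise.
- $\beta_k$ is the easiness of knowledge component $k$ (the `knowledge_component` term). The R formula's intercept is absorbed into these terms.
- $\gamma_k$ is the learning rate of knowledge component $k$ (the `knowledge_component:opportunity` slope).
- $T_{ik}$ is the opportunity count for student $i$ on knowledge component $k$ (the `opportunity` column).

In this notation, the two differences listed above are the constraint $\gamma_k \ge 0$ and the shrinkage of each $\theta_i$ toward 0. A random effect expresses that shrinkage as, for example, $\theta_i \sim N(0, \sigma^2)$.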

The above R code, with additional analyses, is available for download here.

For more about the AFM model, see the Model Values help page.

### Using DataShop Web Services to access DataShop data in R

The following R commands show how you can access DataShop data via web services from R. Windows paths are shown in the example. A few prerequisites are needed for this script to run successfully:

• Get access to web services, including your access keys, at the Web Services Credentials page.
• Download and extract the DataShop sample web services client to C:\ws
• Enter your access key ID in the file C:\ws\webservices.properties in the line that starts `api.token=`, and enter your secret access key in the line that starts `secret=`. Save the file.
• Edit the script below to set `datasetid` to the ID of a dataset you have access to. This is the number that appears in the DataShop URL when you browse a dataset (e.g., https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=123). Also set `datasetlength` to a number at least as large as the number of transaction rows in the dataset.
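Steps three and four involve manual edits. Step three, filling in the two credential lines, can also be done from R. The sketch below assumes that the properties file has exactly one line starting with `api.token=` and one starting with `secret=`. The helper `set_ws_credentials()` is hypothetical and is not part of the DataShop client. The demonstration runs on a temporary file, not on `C:\ws\webservices.properties`.

```r
# Hypothetical helper: fill in the api.token= and secret= lines of a
# webservices.properties file (assumes one line for each key)
set_ws_credentials <- function(path, token, secret) {
  lines <- readLines(path)
  lines[startsWith(lines, "api.token=")] <- paste0("api.token=", token)
  lines[startsWith(lines, "secret=")] <- paste0("secret=", secret)
  writeLines(lines, path)
  invisible(lines)
}

# Demonstration on a temporary file rather than the real properties file
props <- tempfile(fileext = ".properties")
writeLines(c("# DataShop web services", "api.token=", "secret="), props)
set_ws_credentials(props, "MY_ACCESS_KEY_ID", "MY_SECRET_KEY")
print(readLines(props))
```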

This code retrieves transaction data for only the columns listed in `colsneeded`, in batches of 5000 rows. For a full reference of possible columns and data formats, see our Web Services API.
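The batching amounts to building one request URL per block of 5000 rows. The sketch below builds those URLs without contacting the server. `batch_urls()` is a hypothetical helper; the URL pattern is the one used in the script below. It rounds the batch count up so that a final partial batch is not skipped. It also keeps offsets as integers, because `paste0()` writes a double such as `100000` as `"1e+05"`, which would produce a malformed offset.

```r
# Hypothetical helper: one request URL per batch of transactions
batch_urls <- function(datasetid, datasetlength, cols, batchsize = 5000L) {
  batchsize <- as.integer(batchsize)
  # Round up so a final partial batch is not skipped
  nbatches <- ceiling(datasetlength / batchsize)
  # Integer offsets: a double 100000 would be pasted as "1e+05"
  offsets <- (seq_len(nbatches) - 1L) * batchsize
  paste0("https://pslcdatashop.web.cmu.edu/services/datasets/", datasetid,
         "/transactions?offset=", offsets,
         "&limit=", batchsize, "&cols=", cols)
}

urls <- batch_urls(123, 105000, "row,anon_student_id,outcome")
print(length(urls))  # 21 batches, offsets 0 through 100000
print(urls[21])
```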

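```
setwd('C:/ws/')
df <- data.frame()

# Columns to request; see the Web Services API for the full list
colsneeded <- "row,anon_student_id,session_id,time,duration,problem_name,attempt_at_step,outcome,condition"
datasetid <- 123          # replace with the ID of a dataset you can access
datasetlength <- 40000    # must be at least the number of transaction rows
batchsize <- 5000L        # integer, so offsets such as 100000 are not pasted as "1e+05"
nbatches <- ceiling(datasetlength / batchsize)  # round up to keep a final partial batch

for (i in 0:(nbatches - 1)) {
  # Build the command that asks the Java client for one batch of transactions
  callval <- paste0("java -jar C:/ws/dist/datashop-webservices.jar \"https://pslcdatashop.web.cmu.edu/services/datasets/",
                    datasetid, "/transactions?offset=", i * batchsize,
                    "&limit=", batchsize, "&cols=", colsneeded, "\"")

  # Read the command's tab-delimited output as a data frame
  tablines <- read.delim(pipe(callval), header = TRUE, sep = "\t")

  df <- rbind(df, tablines)
  print(paste("Group", i, "completed."))

  # A short batch means the end of the dataset has been reached
  if (nrow(tablines) < batchsize) break
}
```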

Version 11.1.2 July 26, 2024