Sample Selector

Sample Selector is a tool for creating and editing samples, which are groups of data you compare across. They are not "samples" in the statistical sense, but more like filters.

By default, a single sample exists: "All Data". With the Sample Selector, you can create new samples to organize your data.

You can use samples to:

A sample is composed of one or more filters, specific conditions that narrow down your sample.

Creating a sample

The general process for creating a sample is to:

The effect of multiple filters

DataShop interprets each filter after the first as an additional restriction on the data that is included in the sample. This is also known as a logical "AND". You can see the results of multiple filters in the sample preview as soon as all filters are "saved".
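
The logical "AND" described above can be sketched in a few lines of Python. This is only an illustration of the idea, not DataShop's implementation; the row fields (`student`, `condition`, `problem`) and the `apply_filters` helper are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Filter = Callable[[Row], bool]


def apply_filters(rows: List[Row], filters: List[Filter]) -> List[Row]:
    """Keep only the rows that satisfy every filter (logical AND)."""
    return [row for row in rows if all(f(row) for f in filters)]


# Hypothetical rows and field names, for illustration only.
rows = [
    {"student": "s1", "condition": "control", "problem": "p1"},
    {"student": "s2", "condition": "treatment", "problem": "p1"},
    {"student": "s3", "condition": "treatment", "problem": "p2"},
]

# Each filter after the first narrows the sample further.
filters = [
    lambda r: r["condition"] == "treatment",
    lambda r: r["problem"] == "p1",
]

sample = apply_filters(rows, filters)
print([r["student"] for r in sample])
```

With an empty filter list every row is kept, which mirrors the default "All Data" sample.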


What can I do with DataShop?


Analyze process data from an experiment

Many hypotheses about learning are tested through in vivo experimentation, with the resulting data stored in DataShop. Within DataShop, users can create samples from subsets of the data and compare different conditions within the data. When a separate sample is created for each experimental condition, selecting all of those samples yields a learning curve and a Performance Profiler chart for each condition.

You can see examples of the kinds of analyses that researchers have performed by clicking on the "Show related datasets and papers" link below and reading one of those papers. For example, McLaren et al. (2008) show results of analyzing process data to see if experimental conditions produce different patterns of hint requests (Table 5) or produce different amounts of example study or problem solving (Table 6). One way to do such an analysis is to export the dataset from the Export tab. You may want to export one of the smaller "rollup" exports, like the student-problem rollup or the student-step rollup, which give you higher-level summary data. You can open the export in your favorite tool, such as R or Excel (e.g., use a pivot table with condition in the rows, Knowledge Component in the columns, and average of hints in the cells).
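
If you prefer a script to a spreadsheet, the pivot table described above (condition in rows, Knowledge Component in columns, average hints in cells) can be sketched with the Python standard library. This is a minimal sketch under assumptions: the export is taken to be tab-delimited, and the column names `Condition`, `KC`, and `Hints` are placeholders. Check the header row of your own export and pass the names it actually uses.

```python
import csv
import io
from collections import defaultdict


def mean_hints_by_condition_and_kc(lines, condition_col, kc_col, hints_col):
    """Return {condition: {kc: mean hints}} from tab-delimited text lines."""
    # totals[condition][kc] holds [sum of hints, number of rows]
    totals = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for row in csv.DictReader(lines, delimiter="\t"):
        cell = totals[row[condition_col]][row[kc_col]]
        cell[0] += float(row[hints_col])
        cell[1] += 1
    return {
        cond: {kc: total / count for kc, (total, count) in kcs.items()}
        for cond, kcs in totals.items()
    }


# Tiny made-up export; a real one would be opened with open(path, newline="").
example = io.StringIO(
    "Condition\tKC\tHints\n"
    "control\tarea\t2\n"
    "control\tarea\t0\n"
    "treatment\tarea\t1\n"
    "treatment\tperimeter\t3\n"
)

table = mean_hints_by_condition_and_kc(example, "Condition", "KC", "Hints")
for condition, kcs in table.items():
    print(condition, kcs)
```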

Error rates, times, and hints can also be viewed by condition in learning curves or the performance profiler by creating samples for each condition and selecting those. An example dataset that has condition samples is Digital Games for Improving Number Sense - Study 1 (on the Learning Curve tab, inspect the two existing samples and try turning them on and off).

Show related datasets and papers

Improve student learning in my system

There are many ways DataShop can help you analyze your dataset to try to discover ways you might improve student learning from your system. First, a simple strategy is to inspect learning curves to see if any are "low and flat", implying students are getting asked to do easy tasks repeatedly, potentially wasting their valuable learning time (see Cen et al., 2007).
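
One rough way to screen for "low and flat" curves outside DataShop is sketched below. The input format (pairs of opportunity number and whether the first attempt was an error) and both thresholds are assumptions made for illustration; they are not values used by DataShop or recommended by Cen et al. (2007).

```python
def error_rate_by_opportunity(observations):
    """Compute the error rate at each opportunity for one KC.

    observations: iterable of (opportunity_number, is_error) pairs.
    Returns a list of error rates ordered by opportunity number.
    """
    counts = {}
    for opportunity, is_error in observations:
        errors, total = counts.get(opportunity, (0, 0))
        counts[opportunity] = (errors + int(is_error), total + 1)
    return [counts[o][0] / counts[o][1] for o in sorted(counts)]


def is_low_and_flat(curve, max_error=0.2, max_spread=0.05):
    """Flag a curve whose error rate is always low and barely changes.

    Both thresholds are illustrative assumptions; tune them for your data.
    """
    if not curve:
        return False
    return max(curve) <= max_error and (max(curve) - min(curve)) <= max_spread


curve = error_rate_by_opportunity([(1, True), (1, False), (2, False), (2, False)])
print(curve)
print(is_low_and_flat([0.10, 0.08, 0.09]))
print(is_low_and_flat([0.60, 0.40, 0.20]))
```

A curve that is flagged this way is only a candidate for review; whether the practice is actually wasted still requires looking at the tasks themselves.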

A second, more sophisticated approach is to inspect your learning curves to identify opportunities for improving your knowledge component (KC) model. See Stamper et al. (2011) and watch either of these two videos. Koedinger et al. (2013) describes how an improved KC model was used to redesign a tutor and describes an experiment showing that students learn faster and better from this redesigned tutor than they do from the original tutor. Koedinger & McLaughlin (2010) provides a similar result, with both showing how KC model improvements can inspire the design of novel instructional tasks.

A third, automated approach is to employ Learning Factors Analysis (LFA; see Koedinger et al., 2012). If you would like us to apply LFA to your dataset, contact us. There are many other ways researchers have improved their systems and run experiments demonstrating that these improvements work. See the topic Test an instructional principle.

Watch a video in which researchers Martina Rau and Richard Scheines discuss sense-making before fluency. Using data collected from fourth and fifth grade students who used an intelligent tutoring system for fractions learning, Rau et al. were able to determine that an instructional model that emphasizes making sense of a fractions concept using graphical representation before demonstrating fluency in using graphical representations produces significantly enhanced learning gains. For their award-winning EDM 2013 conference paper, see Rau et al. (2013).

Show related datasets and papers

Discovering knowledge component/skill/cognitive/student models

You can test and improve a knowledge component model in DataShop, or even add a model to a dataset that doesn't have one. Models of student learning map the actions students take in a set of instruction to the knowledge components (KCs) or skills that students are expected to be practicing or learning. DataShop has a simple interface for allowing the creation and editing of knowledge component models. You can export and edit an existing model (or blank template if no such model exists) and then import the new model you create. The new models are then validated and ranked in a "leaderboard" (the KC Models page). You can then use the Learning Curve analysis tool to further explore the individual KCs in the model. A video example of exploring an alternative skill model is available here.

Show related datasets and papers

Predicting student performance

One of the most common uses of educational data mining is prediction. You might want to predict whether a student will get a question correct or incorrect, or whether a student is proficient in a certain skill, task, or knowledge component (KC). You might also build predictive models of which students need intervention to avoid failing a course. These models can then be put back into the systems in which the data was collected. The Learning Curve analysis tool in DataShop shows a blue predicted line for each KC based on the Additive Factors Model (AFM). Stamper & Koedinger (2011) used the learning curve tool to improve knowledge component models that were then updated in the intelligent tutor that produced the original data. Also, DataShop hosted KDD Cup 2010, a challenge based on predicting student performance. The KDD Cup website is still functional and researchers are still submitting predictions.
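
To make the AFM prediction concrete, here is a minimal sketch of the model's usual form from the literature: the log-odds of a correct response is the student's proficiency plus, for each KC on the step, that KC's easiness plus its learning rate times the student's prior opportunities. The function name and all parameter values are made up for illustration, and this is not DataShop's own fitting code.

```python
import math


def afm_probability(student_proficiency, kc_params, opportunities):
    """Probability of a correct response under the Additive Factors Model.

    student_proficiency: theta for this student
    kc_params: {kc: (beta, gamma)} for the KCs on this step, where beta is
               KC easiness and gamma is the KC learning rate
    opportunities: {kc: number of prior practice opportunities}
    """
    logit = student_proficiency
    for kc, (beta, gamma) in kc_params.items():
        logit += beta + gamma * opportunities.get(kc, 0)
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up parameters: predicted error rate at opportunities 0..4 for one KC.
params = {"area": (-0.5, 0.3)}
predicted_error = [
    1.0 - afm_probability(0.2, params, {"area": t}) for t in range(5)
]
print([round(e, 3) for e in predicted_error])
```

Because learning curves plot error rate, the sketch reports one minus the predicted probability of a correct response. With a positive learning rate the predicted error falls with each opportunity.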

Show related datasets and papers

Test a model of metacognition

A number of researchers have built models of aspects of metacognitive (or "learning to learn") strategies that have been influenced by and/or tested with datasets in DataShop. If you are considering creating a new model of a metacognitive strategy, take a look at Aleven & Koedinger (2002) for an example of analyzing data to identify limitations in students' metacognitive (or self-regulatory learning) behaviors. Such limitations suggest opportunities for modeling desired metacognitive behavior (e.g., Aleven et al., 2004) and for developing tutoring support at the metacognitive level (e.g., Roll et al., 2007). Looking across multiple DataShop datasets, one can test for long-term effects of a metacognitive intervention (e.g., Roll et al., 2011).

In addition to exploring metacognitive help-seeking strategies (as in references above), other metacognitive strategies have also been explored including self-explanation (e.g., Shi et al., 2008), error self-correction (e.g., Mathan & Koedinger, 2005), self-assessment (e.g., Long & Aleven, 2013), and collaboration skills (e.g., Walker et al., 2011). Many more are possible!

Show related datasets and papers

Test a theory of performance or learning

If, for example, you want to test whether a power law or exponential function better fits learning data, you might use DataShop datasets to do so as follows. You might export data from a dataset (e.g., Geometry Area, 1996-1997), open it in a software package like Matlab or R, and use modeling programs, such as generalized linear regression, to compare alternate versions of your theory. You can find instructions on how to read an exported file into R here.
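
As a rough illustration of such a comparison, the sketch below fits both functional forms to an error-rate curve and reports the sum of squared errors of each. Note the simplification: it uses ordinary least squares on log-transformed values rather than generalized linear regression, so treat it as a first look, not a substitute for a proper model comparison. It also assumes every error rate is strictly greater than zero. All function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sum_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions))


def compare_power_and_exponential(opportunities, error_rates):
    """Fit both forms on the log scale; return each fit's SSE on the raw scale."""
    log_y = [math.log(y) for y in error_rates]  # requires error rates > 0

    # Power law: y = a * x**b, so log y = log a + b * log x
    log_a, b = fit_line([math.log(x) for x in opportunities], log_y)
    power_pred = [math.exp(log_a) * x ** b for x in opportunities]

    # Exponential: y = a * exp(b * x), so log y = log a + b * x
    log_a, b = fit_line(opportunities, log_y)
    exp_pred = [math.exp(log_a + b * x) for x in opportunities]

    return {
        "power": sum_squared_error(error_rates, power_pred),
        "exponential": sum_squared_error(error_rates, exp_pred),
    }


# Synthetic curve generated from an exponential, for illustration only.
xs = list(range(1, 9))
ys = [0.5 * math.exp(-0.3 * x) for x in xs]
result = compare_power_and_exponential(xs, ys)
print(result)
```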

Show related datasets and papers

Determine the grain size of transfer of learning

One way you may improve the tracking of skills or knowledge within a set of educational instruction is to change the grain size of the skills or knowledge components (KCs). If you have an existing model of student procedural knowledge (a KC model) in DataShop, you can try "merging" or "splitting" KCs, or try wholly different KC models, to see which one better tracks the data. For more information, see the topic Discovering knowledge component/skill/cognitive/student models.
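
The effect of "merging" KCs on practice opportunities can be sketched as follows. Merging two KCs means that practice on either one counts as an opportunity for the combined KC, which changes the x-axis position of every step on the learning curve. The step names, KC names, and helper functions here are hypothetical, and the sketch assumes one KC per step.

```python
from collections import defaultdict


def merge_kcs(step_to_kc, merge_map):
    """Relabel KCs: any KC listed in merge_map is replaced by its merged name."""
    return {step: merge_map.get(kc, kc) for step, kc in step_to_kc.items()}


def opportunity_counts(student_steps, step_to_kc):
    """Number each student's successive encounters with each KC.

    student_steps: list of (student, step) pairs in chronological order.
    Returns a list of (student, step, kc, opportunity) tuples.
    """
    seen = defaultdict(int)
    numbered = []
    for student, step in student_steps:
        kc = step_to_kc[step]
        seen[(student, kc)] += 1
        numbered.append((student, step, kc, seen[(student, kc)]))
    return numbered


# Hypothetical fine-grained model and one student's step sequence.
fine = {"sq-area": "square-area", "rect-area": "rectangle-area"}
coarse = merge_kcs(fine, {"square-area": "area", "rectangle-area": "area"})
steps = [("s1", "sq-area"), ("s1", "rect-area"), ("s1", "sq-area")]

print([opp for *_, opp in opportunity_counts(steps, fine)])
print([opp for *_, opp in opportunity_counts(steps, coarse)])
```

Splitting is the reverse relabeling: one KC name is mapped to several, depending on some feature of the step.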

Show related datasets and papers

Test a theory of motivation

A number of researchers have found clever ways to detect student motivational or affective states from log data. See papers by Baker and associated datasets, below. Others have run experiments comparing different instructional treatments designed to enhance student engagement or motivation (e.g., see papers by McLaren et al.). Many interesting open questions remain, for example, whether timing gaps in data are indications of thoughtfulness or disengagement.

Show related datasets and papers

Test an instructional principle

The best way to test an instructional principle is to run a randomized controlled experiment with a control condition that does not employ the principle and an otherwise-identical treatment condition that does employ that principle. Many such studies have been run, as illustrated in many of the associated papers listed below. One benefit of log data is that it provides information on process in addition to the outcome data present in post-tests. This data can enhance explanations of results (e.g., one potential benefit of worked examples is that students can process them faster than matched problems. Do they? Is it too fast?). If you are interested in a particular principle, a study may have already been done that you can use as a jumping-off point.

Show related datasets and papers

Detecting motivation or engagement

A number of researchers in the fields of educational data mining and learning analytics focus on affective states of students. Using DataShop, it is possible to detect a student's level of motivation or engagement by looking at patterns in the data. Baker and colleagues have built models for detecting behaviors such as when students were gaming the system (Baker et al., 2008) or off task (Cocea et al., 2009). Often the additional data needed to create these tutors is human tagged at first and then built into a model using EDM and machine learning techniques (Baker & Carvalho, 2008). This additional information can easily be imported into and stored in DataShop using custom fields.

Show related datasets and papers

Computer-based assessment, build or test a model for

Educational technology data can be used for accurate assessment of student proficiency, both conceptual and procedural. Feng et al. (2009) provide a great example of how accurate assessment can be achieved while students are learning from an on-line tutor and, in fact, dynamic learning data enhances prediction. A model built from on-line interactions predicts standardized test scores with a correlation of over 0.8!

There are plenty of further opportunities for exploring the quality of online interaction for assessment. Datasets that also have attached pre- or post-test data are particularly good candidates for this goal (see below).

Projects with pre- and post-test data attached

Perfetti - Read Write Integration
Perceptual Fluency in Geometry Achievement
Robust learning with a Meta-Cognitive Tutor
Intelligent Writing Tutor
Geometry Cognitive Model Discovery Closing-the-Loop
Teachable Peer Learner

Show related datasets and papers

Explore student collaboration data

While much of the data in DataShop is from individual use of tutors, online courses, games, etc., there are a number of datasets that include or involve some form of student collaboration. We not only encourage the addition of more such datasets, but also more secondary analyses of existing datasets. Two projects in DataShop with such data are Fractions Collaboration and Individual Data and Rummel - Improving Algebra Learning and Collaboration.

Note: If you are not finding what you are looking for, do not hesitate to ask us.

Show related datasets and papers

Modeling the rate of learning

Some fundamental cognitive and educational psychology questions that analysis of DataShop data could help answer are:

1) How "fast" do humans learn?
2) What is the shape of the "learning curve" (e.g., Chi et al., 2011)?
3) Are there individual student differences in the rate of learning (cf., Yudelson et al., 2013)?

There has been significant research on question 2 using reaction time as the measure of performance (e.g., Heathcote et al., 2000, cited below), but insufficient investigation of the shape of the learning curve when the performance measure is error rate (an arguably more relevant variable for educational goals).

Pursuing any of these questions in the near term is quite likely to lead to a valuable (and publishable!) scientific contribution.

Heathcote, A., Brown, S., and Mewhort, D.J.K. 2000. The power law repealed: The case for an exponential law of practice. Psychonomic Bulletin and Review 7, 2, 185-207.

Show related datasets and papers

Applications of Bayesian modeling

There are a number of places where Bayesian modeling can be observed or enhanced using datasets in DataShop. Bayesian Knowledge Tracing (BKT) has been studied extensively in the context of cognitive tutors. Notable works include Baker's work on estimating slip and guess parameters (Baker et al., 2008) and work by Koedinger et al. (2011) exploring student thrashing in knowledge component mastery. DataShop includes an external tool, provided by Michael Yudelson, with a BKT fitting algorithm that is described in this video tutorial.
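
For readers new to BKT, the standard update for a single KC can be sketched as below. This shows only how the model revises its estimate after each observed response, given fixed parameters. It is not the fitting algorithm in the external tool mentioned above, and the parameter values are arbitrary. The sketch assumes all probabilities lie strictly between 0 and 1.

```python
def bkt_predict_correct(p_known, p_slip, p_guess):
    """Probability that the next response is correct."""
    return p_known * (1 - p_slip) + (1 - p_known) * p_guess


def bkt_update(p_known, correct, p_slip, p_guess, p_learn):
    """Revise P(known) after one observed response, then apply learning."""
    if correct:
        evidence_known = p_known * (1 - p_slip)
        evidence_unknown = (1 - p_known) * p_guess
    else:
        evidence_known = p_known * p_slip
        evidence_unknown = (1 - p_known) * (1 - p_guess)
    # Bayes' rule: posterior probability the KC was already known
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # Chance of learning the KC at this opportunity
    return posterior + (1 - posterior) * p_learn


# Arbitrary illustrative parameters.
p = 0.5
for response in [True, True, False, True]:
    p = bkt_update(p, response, p_slip=0.1, p_guess=0.2, p_learn=0.1)
    print(round(p, 3))
```

The slip and guess parameters control how much a single response moves the estimate: a high guess rate makes correct answers weak evidence of knowledge, and a high slip rate makes errors weak evidence of its absence.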

Show related datasets and papers

Data-driven improvement in hints & instruction

Show related datasets and papers

Test my data mining method on multiple data sets

If you built a model for detecting a specific affective state, you can test your detector across multiple datasets. Baker et al. (2006) applied a detector of gaming the system to a number of DataShop datasets. Koedinger et al. (2012) (EDM 2012 Best Paper) showed an automated technique for improving knowledge component models across 11 different datasets. You can also use web services to connect to DataShop, which facilitates running your own analyses on multiple datasets.

Show related datasets and papers

Multiple skills

You might have data from problems or activities where the student skills or knowledge components (KCs) are tagged in multiples, such that a single student answer or response may require multiple skills or KCs. This is not an unusual situation, and it is one that DataShop can handle. When building and importing a KC model, you may add columns with the same KC heading to show multiple KCs on a student step.
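
When you work with such a file in your own scripts, repeated column headings need some care: Python's `csv.DictReader`, for instance, keeps only one value per heading, so a positional reader is safer. The sketch below collects every KC on each step from a tab-delimited file with a repeated KC heading. The headings `Step Name` and `KC (MyModel)` and the file contents are assumptions for illustration; use the headings from your own exported model file.

```python
import csv
import io


def kcs_per_step(lines, step_col, kc_col):
    """Return {step: [kc, ...]} from tab-delimited lines with repeated KC columns."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    step_index = header.index(step_col)
    # Every column whose heading matches kc_col holds one KC for the step.
    kc_indexes = [i for i, name in enumerate(header) if name == kc_col]
    result = {}
    for row in reader:
        result[row[step_index]] = [
            row[i] for i in kc_indexes if i < len(row) and row[i]
        ]
    return result


# Made-up model file: step1 needs two KCs, step2 needs one.
example = io.StringIO(
    "Step Name\tKC (MyModel)\tKC (MyModel)\n"
    "step1\tadd\tcarry\n"
    "step2\tadd\t\n"
)

mapping = kcs_per_step(example, "Step Name", "KC (MyModel)")
print(mapping)
```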

Multiple skills can present challenges when trying to track individual skills, as blame assignment becomes an issue. Such is the case in Koedinger et al. (2011), where multi-skill-assigned steps led to a problem selection thrashing issue in which some students could not get past a set of problems because of incorrect blame assignment in the skill model. VanLehn et al. (2005) discussed issues related to multiple skills in the Andes physics intelligent tutor. They found it better to isolate individual skills. This area of investigation (multiple skills) deserves more attention and is ripe for scientific investigation and progress.

Show related datasets and papers

Analyze data from another system to get ideas

Datasets provide examples of different kinds of activities and instructional methods. Analyzing data that is related to your interests (e.g., similar content or similar technology) may give you ideas for better instructional development. Similarly, analysis of datasets may inspire research ideas. Try out exploratory data analysis techniques on a dataset, including using DataShop tools like the Performance Profiler or the Error Report as well as exporting a dataset (transaction level is most detailed) and using your favorite tool(s) for exploratory data analysis (e.g., pivot tables in Excel).

Show related datasets and papers
Version 10.12.6 June 22, 2023