SelfCode 2.0: Annotated Corpus of Student Self-Explanations to Introductory JAVA Programs in Computer Science

Arun Balajiee Lekshmi Narayanan

Data Provider

Description

Assessing student responses is a critical task in adaptive educational systems. More specifically, automatically evaluating students' self-explanations contributes to understanding their knowledge state, which is needed for personalized instruction, the crux of adaptive educational systems. Annotated datasets are essential for developing Artificial Intelligence (AI) and Machine Learning models that automatically assess learners' self-explanations. In response to this need, we developed the SelfCode 2.0 corpus, which consists of 3,019 pairs of student and expert explanations of Java code snippets, each annotated by experts with semantic similarity, correctness, and completeness scores. Alongside the dataset, we also provide performance results obtained with several baseline models based on TF-IDF and Sentence-BERT vector representations. This work aims to make automated assessment tools in programming education more effective and to contribute to better understanding and support of students learning to program.
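As a rough illustration of what a TF-IDF similarity baseline over student/expert explanation pairs might look like, the sketch below computes TF-IDF vectors and cosine similarity using only the Python standard library. It is not the authors' baseline: the tokenizer, the smoothed IDF formula, and the three example sentences are all assumptions made for illustration, and none of the text is taken from the corpus.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character (an assumed, simple tokenizer).
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption): terms found in every document keep weight 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical explanations, invented for this sketch (not from SelfCode 2.0).
expert = "The loop adds each element of the array to the variable sum."
student = "It goes through the array and adds every element to sum."
unrelated = "The program prints a greeting to the console."

vecs = tfidf_vectors([expert, student, unrelated])
sim_student = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(f"expert vs. student:   {sim_student:.3f}")
print(f"expert vs. unrelated: {sim_unrelated:.3f}")
```

In a setup like this, the similarity computed for each pair would be compared against the expert-annotated semantic similarity score. A Sentence-BERT baseline would replace the TF-IDF vectors with sentence embeddings and keep the same cosine comparison.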

Tags

Natural Language Processing, Introductory Programming, JAVA, Self Explanations, LLMs, BERT

External Links

Zenodo

Datasets

Dataset	Area/Subject	Dates	Data Last Modified	Transactions	KC Models	Status	Papers
SelfCode 2.0 Student Self-Explanations	Computer Science/Introductory Programming: Java	Aug 1, 2018 - Dec 31, 2023	-	0	0	files-only	1