DataShop @CMU: a data analysis service for the learning science community
SelfCode 2.0: Annotated Corpus of Student Self-Explanations to Introductory JAVA Programs in Computer Science
PI
Arun Balajiee Lekshmi Narayanan
Data Provider
Description
Assessing student responses is a critical task in adaptive educational systems. More specifically, automatically evaluating students’ self-explanations contributes to understanding their knowledge state, which is needed for personalized instruction, the crux of adaptive educational systems. To facilitate the development of Artificial Intelligence (AI) and Machine Learning models for automated assessment of learners’ self-explanations, annotated datasets are essential. In response to this need, we developed the SelfCode 2.0 corpus, which consists of 3,019 pairs of student and expert explanations of Java code snippets, each annotated with semantic similarity, correctness, and completeness scores provided by experts. Alongside the dataset, we also provide performance results obtained with several baseline models based on TF-IDF and Sentence-BERT vectorial representations. This work aims to enhance the effectiveness of automated assessment tools in programming education and to contribute to better understanding and support of student learning of programming.
Tags
Natural Language Processing, Introductory Programming, JAVA, Self Explanations, LLMs, BERT
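The description mentions baseline models built on TF-IDF vector representations. As a rough illustration of that idea only, and not the authors' implementation, the sketch below scores a student explanation against an expert explanation with TF-IDF weights and cosine similarity. The tokenizer, the smoothed IDF formula, the function names, and the example sentences are all assumptions made for this sketch; none of them come from the corpus or its accompanying results.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumed weighting: raw term count times a smoothed IDF, which stays
        # positive even for terms that occur in every document.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sentences written for this sketch, not taken from the corpus.
expert = "The loop adds each element of the array to the variable sum."
student = "It goes through the array and adds every element to sum."
unrelated = "The program prints a greeting to the console."

vec_expert, vec_student, vec_unrelated = tfidf_vectors([expert, student, unrelated])
sim_student = cosine_similarity(vec_expert, vec_student)
sim_unrelated = cosine_similarity(vec_expert, vec_unrelated)
print(f"expert vs. student:   {sim_student:.3f}")
print(f"expert vs. unrelated: {sim_unrelated:.3f}")
```

A purely lexical score like this rewards shared wording, so a paraphrase that uses different vocabulary can score low even when it is correct. That limitation is one plausible reason to compare it against sentence-embedding representations such as Sentence-BERT.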