The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts
- URL: http://arxiv.org/abs/2211.11772v2
- Date: Wed, 24 May 2023 18:41:18 GMT
- Title: The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts
- Authors: Dorottya Demszky and Heather Hill
- Abstract summary: We introduce the largest dataset of mathematics classroom transcripts available to researchers.
The dataset consists of 1,660 4th and 5th grade elementary mathematics observations, each 45-60 minutes long.
The anonymized transcripts represent data from 317 teachers across 4 school districts that serve largely historically marginalized students.
- Score: 4.931378519409227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classroom discourse is a core medium of instruction; analyzing it can
provide a window into teaching and learning and can drive the development
of new tools for improving instruction. We introduce the largest dataset of
mathematics classroom transcripts available to researchers, and demonstrate how
this data can help improve instruction. The dataset consists of 1,660
4th and 5th grade elementary mathematics observations, each 45-60 minutes long,
collected by the National Center for Teacher Effectiveness (NCTE) between 2010 and 2013. The
anonymized transcripts represent data from 317 teachers across 4 school
districts that serve largely historically marginalized students. The
transcripts come with rich metadata, including turn-level annotations for
dialogic discourse moves, classroom observation scores, demographic
information, survey responses and student test scores. We demonstrate that our
natural language processing model, trained on our turn-level annotations, can
learn to identify dialogic discourse moves, and that these moves are correlated with
better classroom observation scores and learning outcomes. This dataset opens
up several possibilities for researchers, educators and policymakers to learn
about and improve K-12 instruction. The dataset can be found at
https://github.com/ddemszky/classroom-transcript-analysis.
Related papers
- Towards Student Actions in Classroom Scenes: New Dataset and Baseline [43.268586725768465]
We present a new multi-label student action video (SAV) dataset for complex classroom scenes.
The dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, each labeled with 15 different actions displayed by students in classrooms.
arXiv Detail & Related papers (2024-09-02T03:44:24Z)
- Measuring Five Accountable Talk Moves to Improve Instruction at Scale [1.4549461207028445]
We fine-tune models to identify five instructional talk moves inspired by accountable talk theory.
We correlate the instructors' use of each talk move with indicators of student engagement and satisfaction.
These results corroborate previous research on the effectiveness of accountable talk moves.
arXiv Detail & Related papers (2023-11-02T03:04:50Z)
- MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems [74.73881579517055]
We propose a framework to generate such dialogues by pairing human teachers with a Large Language Model prompted to represent common student errors.
We describe how we use this framework to collect MathDial, a dataset of 3k one-to-one teacher-student tutoring dialogues.
arXiv Detail & Related papers (2023-05-23T21:44:56Z)
- Selective Annotation Makes Language Models Better Few-Shot Learners [97.07544941620367]
Large language models can perform in-context learning, where they learn a new task from a few task demonstrations.
This work examines the implications of in-context learning for the creation of datasets for new natural language tasks.
We propose an unsupervised, graph-based selective annotation method, vote-k, to select diverse, representative examples to annotate.
arXiv Detail & Related papers (2022-09-05T14:01:15Z)
- The TalkMoves Dataset: K-12 Mathematics Lesson Transcripts Annotated for Teacher and Student Discursive Moves [8.090330715662962]
This paper describes the TalkMoves dataset, composed of 567 human-annotated K-12 mathematics lesson transcripts.
The dataset can be used by educators, policymakers, and researchers to understand the nature of teacher and student discourse in K-12 math classrooms.
arXiv Detail & Related papers (2022-04-06T18:12:30Z)
- A Semi-Supervised Learning Approach with Two Teachers to Improve Breakdown Identification in Dialogues [25.499578161686355]
We propose a novel semi-supervised teacher-student learning framework to tackle this task.
We introduce two teachers which are trained on labeled data and perturbed labeled data respectively.
We leverage unlabeled data to improve classification in student training where we employ two teachers to refine the labeling of unlabeled data.
arXiv Detail & Related papers (2022-02-22T14:39:51Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming in the beginning, mentally taxing, and induce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z)
- SLADE: A Self-Training Framework For Distance Metric Learning [75.54078592084217]
We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data.
We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data.
We then train a student model on both labels and pseudo labels to generate final feature embeddings.
arXiv Detail & Related papers (2020-11-20T08:26:10Z)
- Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset, ApartmenTour, that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
- Teaching to Learn: Sequential Teaching of Agents with Inner States [20.556373950863247]
We introduce a multi-agent formulation in which learners' inner state may change with the teaching interaction.
In order to teach such learners, we propose an optimal control approach that takes the future performance of the learner after teaching into account.
arXiv Detail & Related papers (2020-09-14T07:03:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.