Evaluation of mathematical questioning strategies using data collected
through weak supervision
- URL: http://arxiv.org/abs/2112.00985v1
- Date: Thu, 2 Dec 2021 05:12:36 GMT
- Title: Evaluation of mathematical questioning strategies using data collected
through weak supervision
- Authors: Debajyoti Datta, Maria Phillips, James P. Bywater, Jennifer Chiu,
Ginger S. Watson, Laura E. Barnes, Donald E. Brown
- Abstract summary: This paper presents a high-fidelity, AI-based classroom simulator to help teachers rehearse research-based mathematical questioning skills.
Using a human-in-the-loop approach, we collected a high-quality training dataset for a mathematical questioning scenario.
- Score: 1.794107419334178
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A large body of research demonstrates how teachers' questioning strategies
can improve student learning outcomes. However, developing new scenarios is
challenging because of the lack of training data for a specific scenario and
the costs associated with labeling. This paper presents a high-fidelity,
AI-based classroom simulator to help teachers rehearse research-based
mathematical questioning skills. Using a human-in-the-loop approach, we
collected a high-quality training dataset for a mathematical questioning
scenario. Using recent advances in uncertainty quantification, we evaluated
our conversational agent for usability and analyzed the practicality of
incorporating a human-in-the-loop approach for data collection and system
evaluation in this scenario.
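The abstract describes uncertainty quantification and human-in-the-loop data collection only at a high level. Below is a minimal sketch of one common realization, in which utterances the classifier is unsure about are routed to human annotators; the function names and the entropy threshold are illustrative assumptions, not details from the paper.

```python
import math

ENTROPY_THRESHOLD = 1.0  # hypothetical cutoff in nats; tuned on held-out data

def predictive_entropy(probs):
    """Entropy of the classifier's softmax output; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def route_for_labeling(utterance, probs, human_queue, auto_labeled):
    """Send uncertain predictions to human annotators, keep confident ones."""
    if predictive_entropy(probs) > ENTROPY_THRESHOLD:
        human_queue.append(utterance)  # a human supplies the label later
    else:
        label = max(range(len(probs)), key=probs.__getitem__)
        auto_labeled.append((utterance, label))  # accept the model's label
```

In such a loop, the human-labeled examples are periodically folded back into training, so annotation effort concentrates on the inputs the agent handles worst.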
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
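The SIaM summary above leaves the critic unspecified. A minimal sketch, assuming the critic simply executes a candidate solution program and compares its printed output with a reference answer (the function and its contract are hypothetical):

```python
import io
import contextlib

def code_critic(candidate_code: str, reference_answer: str) -> bool:
    """Execute a model-generated solution and check its printed output
    against the reference answer; failing programs are filtered out."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(candidate_code, {})  # use a real sandbox in practice
    except Exception:
        return False  # non-executable code fails quality control
    return buffer.getvalue().strip() == reference_answer.strip()
```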
- Multi-Faceted Question Complexity Estimation Targeting Topic Domain-Specificity [0.0]
This paper presents a novel framework for domain-specific question difficulty estimation, leveraging a suite of NLP techniques and knowledge graph analysis.
We introduce four key parameters: Topic Retrieval Cost, Topic Salience, Topic Coherence, and Topic Superficiality.
A model trained on these features demonstrates the efficacy of our approach in predicting question difficulty.
arXiv Detail & Related papers (2024-08-23T05:40:35Z)
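The four parameters above are named but not operationalized in the summary. A toy sketch of how such features could feed a difficulty model; the dataclass and the linear weights are illustrative stand-ins for the trained model, not the paper's specification:

```python
from dataclasses import dataclass, astuple

@dataclass
class ComplexityFeatures:
    """The four parameters named in the paper; values are assumed
    normalized to [0, 1] for this sketch."""
    topic_retrieval_cost: float
    topic_salience: float
    topic_coherence: float
    topic_superficiality: float

def predict_difficulty(f: ComplexityFeatures,
                       weights=(0.4, 0.2, 0.2, 0.2)) -> float:
    """Toy linear stand-in for the trained difficulty estimator."""
    return sum(w * x for w, x in zip(weights, astuple(f)))
```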
- Evaluating Mathematical Reasoning Beyond Accuracy [50.09931172314218]
We introduce ReasonEval, a new methodology for evaluating the quality of reasoning steps.
We show that ReasonEval consistently outperforms baseline methods in the meta-evaluation datasets.
We observe that ReasonEval can play a significant role in data selection.
arXiv Detail & Related papers (2024-04-08T17:18:04Z)
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial process for generating in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Human annotators judge our DACO-RL algorithm to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
- Benchmarking Data Science Agents [11.582116078653968]
Large Language Models (LLMs) have emerged as promising data science agents, assisting humans in data analysis and processing.
Yet their practical efficacy remains constrained by the varied demands of real-world applications and complicated analytical processes.
We introduce DSEval -- a novel evaluation paradigm, as well as a series of innovative benchmarks tailored for assessing the performance of these agents.
arXiv Detail & Related papers (2024-02-27T03:03:06Z)
- Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence [0.0]
Large language models (LLMs) have been shown to present an unprecedented opportunity to scale up data analytics in the humanities and social sciences.
We build on mixed methods quantitizing and converting design principles, and feature analysis from linguistics, to transparently integrate human expertise and machine scalability.
The approach is discussed and demonstrated in over a dozen LLM-assisted case studies, covering 9 diverse languages, multiple disciplines and tasks.
arXiv Detail & Related papers (2023-09-24T14:21:50Z)
- Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo.
We first design two-level question augmentations, content-level and structure-level, which generate question pairs that differ in surface form but share the same purpose.
Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy.
arXiv Detail & Related papers (2023-01-18T14:23:29Z)
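The QuesCo entry above describes its training signal only at a high level. A minimal sketch of the standard InfoNCE-style objective that such contrastive pre-training typically builds on, written for a batch of augmented question pairs (this is the generic formulation, not necessarily the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: [batch, dim] embeddings of two augmentations of the same
    questions; row i of z1 should match row i of z2 and nothing else."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature           # cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)      # pull matching pairs together
```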
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample-efficient than naively combining expert data with data collected from a sub-optimal agent.
arXiv Detail & Related papers (2022-12-16T01:41:59Z)
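The entry above leaves the trigger mechanism abstract. A minimal sketch, assuming disagreement across an ensemble of Q-functions serves as the uncertainty signal that decides which buffer to draw the next batch from (the threshold and buffer interfaces are hypothetical):

```python
import statistics

UNCERTAINTY_THRESHOLD = 0.5  # hypothetical; tuned per task

def sample_batch(state, action, q_ensemble, expert_buffer, agent_buffer):
    """Draw from expert demonstrations when the Q-ensemble (>= 2 members)
    disagrees, otherwise from the sub-optimal agent's offline data."""
    q_values = [q(state, action) for q in q_ensemble]
    if statistics.stdev(q_values) > UNCERTAINTY_THRESHOLD:
        return expert_buffer.sample()  # inject human demonstration data
    return agent_buffer.sample()
```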
- Improving Imbalanced Text Classification with Dynamic Curriculum Learning [32.731900584216724]
We propose a novel self-paced dynamic curriculum learning method for imbalanced text classification.
Our SPDCL reorders and resamples training data by a difficulty criterion, with an adaptive easy-to-hard pace.
Experiments on several classification tasks show the effectiveness of the SPDCL strategy, especially on imbalanced datasets.
arXiv Detail & Related papers (2022-10-25T07:57:59Z)
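A minimal sketch of an easy-to-hard self-paced sampler in the spirit of the entry above; the difficulty criterion and pacing schedule are assumptions for illustration, not the paper's:

```python
def curriculum_subset(examples, difficulty, epoch, total_epochs):
    """Expose progressively harder examples: early epochs see only the
    easiest fraction, the final epoch sees the full dataset."""
    ranked = sorted(examples, key=difficulty)  # easy -> hard
    frac = min(1.0, 0.2 + 0.8 * epoch / max(1, total_epochs - 1))
    return ranked[: max(1, int(len(ranked) * frac))]
```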
- Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z)
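For orientation, the textbook per-trajectory importance-sampling estimator that off-policy evaluation builds on, given as background rather than the paper's own estimator:

```python
def ips_value_estimate(trajectories, target_prob, behavior_prob):
    """Importance-sampling estimate of the target policy's value.
    Each trajectory is a list of (state, action, reward) tuples;
    target_prob and behavior_prob give each policy's action probability."""
    total = 0.0
    for trajectory in trajectories:
        weight, ret = 1.0, 0.0
        for state, action, reward in trajectory:
            weight *= target_prob(state, action) / behavior_prob(state, action)
            ret += reward
        total += weight * ret
    return total / len(trajectories)
```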
- Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports [66.39150945184683]
We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches.
Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
arXiv Detail & Related papers (2020-10-27T19:48:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.