Stan: An LLM-based thermodynamics course assistant
- URL: http://arxiv.org/abs/2603.04657v1
- Date: Wed, 04 Mar 2026 22:44:50 GMT
- Title: Stan: An LLM-based thermodynamics course assistant
- Authors: Eric M. Furst, Vasudevan Venkateshwaran
- Abstract summary: Stan is a suite of tools for an undergraduate chemical engineering thermodynamics course built on a data pipeline that we develop and deploy in dual roles. On the student side, a retrieval-augmented generation (RAG) pipeline answers natural-language queries by extracting technical terms. On the instructor side, the same transcript corpus is processed through structured analysis pipelines that produce per-lecture summaries.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discussions of AI in education focus predominantly on student-facing tools -- chatbots, tutors, and problem generators -- while the potential for the same infrastructure to support instructors remains largely unexplored. We describe Stan, a suite of tools for an undergraduate chemical engineering thermodynamics course built on a data pipeline that we develop and deploy in dual roles: serving students and supporting instructors from a shared foundation of lecture transcripts and a structured textbook index. On the student side, a retrieval-augmented generation (RAG) pipeline answers natural-language queries by extracting technical terms, matching them against the textbook index, and synthesizing grounded responses with specific chapter and page references. On the instructor side, the same transcript corpus is processed through structured analysis pipelines that produce per-lecture summaries, identify student questions and moments of confusion, and catalog the anecdotes and analogies used to motivate difficult material -- providing a searchable, semester-scale record of teaching that supports course reflection, reminders, and improvement. All components, including speech-to-text transcription, structured content extraction, and interactive query answering, run entirely on locally controlled hardware using open-weight models (Whisper large-v3, Llama 3.1 8B) with no dependence on cloud APIs, ensuring predictable costs, full data privacy, and reproducibility independent of third-party services. We describe the design, implementation, and practical failure modes encountered when deploying 7--8 billion parameter models for structured extraction over long lecture transcripts, including context truncation, bimodal output distributions, and schema drift, along with the mitigations that resolved them.
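The student-facing retrieval step described in the abstract (extract technical terms from a query, match them against the structured textbook index, and attach chapter/page references for grounding) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `TEXTBOOK_INDEX`, `extract_terms`, and `retrieve_references` are hypothetical, and the paper uses an LLM for term extraction where this sketch uses simple substring matching.

```python
# Toy stand-in for the structured textbook index: term -> (chapter, page).
# Entries are illustrative, not drawn from the actual course textbook.
TEXTBOOK_INDEX = {
    "fugacity": ("Chapter 7", 245),
    "gibbs free energy": ("Chapter 6", 198),
    "raoult's law": ("Chapter 10", 347),
}

def extract_terms(query: str) -> list[str]:
    """Naive term extraction: keep index phrases contained in the query.
    The paper delegates this step to an LLM; substring matching is a placeholder."""
    q = query.lower()
    return [term for term in TEXTBOOK_INDEX if term in q]

def retrieve_references(query: str) -> list[str]:
    """Return human-readable chapter/page citations for each matched term,
    which a generation step could then cite when synthesizing an answer."""
    refs = []
    for term in extract_terms(query):
        chapter, page = TEXTBOOK_INDEX[term]
        refs.append(f"{term}: {chapter}, p. {page}")
    return refs

print(retrieve_references("How is fugacity related to Gibbs free energy?"))
```

In the deployed system the matched references would be passed, together with the retrieved passages, into the local Llama 3.1 8B model to synthesize a grounded response; keeping the index lookup deterministic is one way to guarantee the chapter and page citations are never hallucinated.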
Related papers
- ScholarGym: Benchmarking Deep Research Workflows on Academic Literature Retrieval [11.41528830724814]
We present ScholarGym, a simulation environment for reproducible evaluation of deep research on academic literature. Built on a static corpus of 570K papers with deterministic retrieval, ScholarGym provides 2,536 queries with expert-annotated ground truth.
arXiv Detail & Related papers (2026-01-29T12:51:44Z)
- From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition [46.36937947958481]
We introduce a novel explicit compression framework designed to preserve both global structure and fine-grained details. Our approach reformulates structural context compression as a structure-then-select process. Our method achieves state-of-the-art structural prediction accuracy and significantly outperforms frontier LLMs.
arXiv Detail & Related papers (2025-12-16T09:52:58Z) - Thinking Like a Student: AI-Supported Reflective Planning in a Theory-Intensive Computer Science Course [1.5229257192293202]
In the aftermath of COVID-19, many universities implemented supplementary "reinforcement" roles to support students in demanding courses. This paper reports on the redesign of reinforcement sessions in a challenging undergraduate course on formal methods and computational models. The intervention received positive student feedback, indicating increased confidence, reduced anxiety, and improved clarity.
arXiv Detail & Related papers (2025-10-31T12:35:18Z) - AI-driven formative assessment and adaptive learning in data-science education: Evaluating an LLM-powered virtual teaching assistant [6.874351093155318]
VITA (Virtual Teaching Assistants) is an adaptive distributed learning platform that embeds a large language model (LLM)-powered bot (BotCaptain). The paper describes an end-to-end data pipeline that transforms chat logs into Experience API (xAPI) statements, and instructor dashboards that surface outliers for just-in-time intervention. Future work will refine the platform's adaptive intelligence and examine applicability across varied educational settings.
arXiv Detail & Related papers (2025-09-17T11:27:45Z) - Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases [78.62158923194153]
Text-rich Graph Knowledge Bases (TG-KBs) have become increasingly crucial for answering queries by providing textual and structural knowledge. We propose a Mixture of Structural-and-Textual Retrieval (MoR) to retrieve these two types of knowledge via a Planning-Reasoning-Organizing framework.
arXiv Detail & Related papers (2025-02-27T17:42:52Z) - Learning Disentangled Speech Representations [0.412484724941528]
SynSpeech is a novel large-scale synthetic speech dataset designed to enable research on disentangled speech representations. We present a framework to evaluate disentangled representation learning techniques, applying both linear probing and established supervised disentanglement metrics. We find that SynSpeech facilitates benchmarking across a range of factors, achieving promising disentanglement of simpler features like gender and speaking style, while highlighting challenges in isolating complex attributes like speaker identity.
arXiv Detail & Related papers (2023-11-04T04:54:17Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP [77.817293104436]
We propose a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM.
We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings.
arXiv Detail & Related papers (2022-12-28T18:52:44Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.