Related papers: Automating the Information Extraction from Semi-Structured Interview Transcripts

Related papers

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey [59.3507264893654]
Issue resolution is a complex Software Engineering task integral to real-world development.<n> benchmarks like SWE-bench revealed this task as profoundly difficult for large language models.<n>This paper presents a systematic survey of this emerging domain.
arXiv Detail & Related papers (2026-01-15T18:55:03Z)
AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research [81.04845910798387]
Generating natural language explanations for threat detections remains an open problem in cybersecurity research.<n>We present AutoMalDesc, an automated static analysis summarization framework that operates independently at scale.<n>We publish our complete dataset of more than 100K script samples, including annotated seed (0.9K) datasets, along with our methodology and evaluation framework.
arXiv Detail & Related papers (2025-11-17T13:05:25Z)
The Cultural Mapping and Pattern Analysis (CMAP) Visualization Toolkit: Open Source Text Analysis for Qualitative and Computational Social Science [5.761643793915607]
The CMAP visualization toolkit is an open-source suite for analyzing and visualizing text data.<n>The toolkit is designed for scholars integrating pattern analysis, data visualization, and explanation in social science.
arXiv Detail & Related papers (2025-10-17T18:24:47Z)
Data Therapist: Eliciting Domain Knowledge from Subject Matter Experts Using Large Language Models [17.006423792670414]
We present the Data Therapist, a web-based tool that helps domain experts externalize implicit knowledge through a mixed-initiative process. The resulting structured knowledge base can inform both human and automated visualization design.
arXiv Detail & Related papers (2025-05-01T11:10:17Z)
The AI Co-Ethnographer: How Far Can Automation Take Qualitative Research? [51.40252017262535]
The AI Co-Ethnographer (AICoE) is a novel end-to-end pipeline developed for qualitative research.<n>AICoE organizes the entire process, encompassing open coding, code consolidation, code application, and even pattern discovery.
arXiv Detail & Related papers (2025-04-21T21:31:28Z)
Retrieval Augmented Generation for Topic Modeling in Organizational Research: An Introduction with Empirical Demonstration [0.0]
This paper introduces Agentic Retrieval-Augmented Generation (Agentic RAG) as a method for topic modeling with LLMs. It integrates three key components: (1) retrieval, enabling automatized access to external data beyond an LLM's pre-trained knowledge; (2) generation, leveraging LLM capabilities for text synthesis; and (3) agent-driven learning, iteratively refining retrieval and query formulation processes. Our findings demonstrate that the approach is more efficient, interpretable and at the same time achieves higher reliability and validity in comparison to the standard machine learning approach.
arXiv Detail & Related papers (2025-02-28T11:25:11Z)
Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges. We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow. We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, as an approach for label name supervised topic modeling. EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
generAItor: Tree-in-the-Loop Text Generation for Language Model Explainability and Adaptation [28.715001906405362]
Large language models (LLMs) are widely deployed in various downstream tasks, e.g., auto-completion, aided writing, or chat-based text generation. We tackle this shortcoming by proposing a tree-in-the-loop approach, where a visual representation of the beam search tree is the central component for analyzing, explaining, and adapting the generated outputs. We present generAItor, a visual analytics technique, augmenting the central beam search tree with various task-specific widgets, providing targeted visualizations and interaction possibilities.
arXiv Detail & Related papers (2024-03-12T13:09:15Z)
Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as textscLlama-2 and textscMistral. This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted Approach for Qualitative Data Analysis [6.592797748561459]
Large Language Models (LLMs) have enabled collaborative human-bot interactions in Software Engineering (SE) We introduce a new dimension of scalability and accuracy in qualitative research, potentially transforming data interpretation methodologies in SE.
arXiv Detail & Related papers (2024-02-02T13:10:46Z)
Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements [55.2480439325792]
This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques. We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models. A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
arXiv Detail & Related papers (2023-11-22T02:45:01Z)
Leveraging Contextual Information for Effective Entity Salience Detection [21.30389576465761]
We show that fine-tuning medium-sized language models with a cross-encoder style architecture yields substantial performance gains over feature engineering approaches. We also show that zero-shot prompting of instruction-tuned language models yields inferior results, indicating the task's uniqueness and complexity.
arXiv Detail & Related papers (2023-09-14T19:04:40Z)
Accelerated materials language processing enabled by GPT [5.518792725397679]
We develop generative transformer (GPT)-enabled pipelines for materials language processing. First, we develop a GPT-enabled document classification method for screening relevant documents. Secondly, for NER task, we design an entity-centric prompts, and learning few-shot of them improved the performance. Finally, we develop an GPT-enabled extractive QA model, which provides improved performance and shows the possibility of automatically correcting annotations.
arXiv Detail & Related papers (2023-08-18T07:31:13Z)
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts. We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub. We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms. Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications. By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
Text Mining for Processing Interview Data in Computational Social Science [0.6820436130599382]
We use commercially available text analysis technology to process interview text data from a computational social science study. We find that topical clustering and terminological enrichment provide for convenient exploration and quantification of the responses. We encourage studies in social science to use text analysis, especially for exploratory open-ended studies.
arXiv Detail & Related papers (2020-11-28T00:44:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.