Text Mining for Processing Interview Data in Computational Social
Science
- URL: http://arxiv.org/abs/2011.14037v1
- Date: Sat, 28 Nov 2020 00:44:35 GMT
- Title: Text Mining for Processing Interview Data in Computational Social
Science
- Authors: Jussi Karlgren, Renee Li, Eva M Meyersson Milgrom
- Abstract summary: We use commercially available text analysis technology to process interview text data from a computational social science study.
We find that topical clustering and terminological enrichment provide for convenient exploration and quantification of the responses.
We encourage studies in social science to use text analysis, especially for exploratory open-ended studies.
- Score: 0.6820436130599382
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We use commercially available text analysis technology to process interview
text data from a computational social science study. We find that topical
clustering and terminological enrichment provide for convenient exploration and
quantification of the responses. This makes it possible to generate and test
hypotheses and to compare textual and non-textual variables, and saves analyst
effort. We encourage studies in social science to use text analysis, especially
for exploratory open-ended studies. We discuss how replicability requirements
are met by text analysis technology. We note that the most recent learning
models are not designed with transparency in mind, and that research requires a
model to be editable and its decisions to be explainable. The tools available
today, such as the one used in the present study, are not built for processing
interview texts. While many of the variables under consideration are
quantifiable using lexical statistics, we find that some interesting and
potentially valuable features are difficult or impossible to automatise
reliably at present. We note that there are some potentially interesting
applications for traditional natural language processing mechanisms such as
named entity recognition and anaphora resolution in this application area. We
conclude with a suggestion for language technologists to investigate the
challenge of processing interview data comprehensively, especially the
interplay between question and response, and we encourage social science
researchers not to hesitate to use text analysis tools, especially for the
exploratory phase of processing interview data.
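The abstract's two core operations, topical clustering and terminological enrichment, can be illustrated with a minimal open-source sketch. The commercial tool used in the study is not public, so the snippet below stands in with standard-library TF-IDF weighting, greedy cosine-similarity clustering, and top-weighted terms per cluster; the sample responses and the similarity threshold are invented for illustration only.

```python
# Minimal sketch of topical clustering and terminological enrichment
# for free-text interview responses. Pure standard library; not the
# commercial system used in the paper.
import math
from collections import Counter

def tokenize(text):
    # Lowercase, keep alphanumeric runs, drop very short tokens.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return [w for w in cleaned.split() if len(w) > 2]

def tfidf_vectors(docs):
    # Sparse TF-IDF vectors as {term: weight} dicts.
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log((1 + n) / (1 + df[t]))
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, threshold=0.05):
    # Greedy single-pass clustering: join the first cluster whose
    # representative (its first member) is similar enough, else start a new one.
    clusters = []  # each cluster is a list of document indices
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], v) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def enriched_terms(vectors, members, k=3):
    # Terminological enrichment proxy: highest summed TF-IDF terms in a cluster.
    total = Counter()
    for i in members:
        total.update(vectors[i])
    return [t for t, _ in total.most_common(k)]

# Invented sample responses, standing in for interview data.
responses = [
    "I enjoy the flexible working hours at this job",
    "Flexible hours make the job much easier to manage",
    "My salary has not changed in three years",
    "The pay and salary negotiations were frustrating",
]
vecs = tfidf_vectors(responses)
groups = cluster(vecs)
for members in groups:
    print(members, enriched_terms(vecs, members))
```

Once responses are grouped this way, cluster sizes become countable quantities that can be compared against non-textual variables, which is the kind of quantification the abstract describes.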
Related papers
- Automating the Information Extraction from Semi-Structured Interview
Transcripts [0.0]
This paper explores the development and application of an automated system designed to extract information from semi-structured interview transcripts.
We present a user-friendly software prototype that enables researchers to efficiently process and visualize the thematic structure of interview data.
arXiv Detail & Related papers (2024-03-07T13:53:03Z)
- Artificial intelligence to automate the systematic review of scientific literature [0.0]
We present a survey of AI techniques proposed in the last 15 years to help researchers conduct systematic analyses of scientific literature.
We describe the tasks currently supported, the types of algorithms applied, and available tools proposed in 34 primary studies.
arXiv Detail & Related papers (2024-01-13T19:12:49Z)
- Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence [0.0]
Large language models (LLMs) have been shown to present an unprecedented opportunity to scale up data analytics in the humanities and social sciences.
We build on mixed methods quantitizing and converting design principles, and feature analysis from linguistics, to transparently integrate human expertise and machine scalability.
The approach is discussed and demonstrated in over a dozen LLM-assisted case studies, covering 9 diverse languages, multiple disciplines and tasks.
arXiv Detail & Related papers (2023-09-24T14:21:50Z)
- Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps [59.648646222905235]
We propose Chat2Brain, a method that combines LLMs with a basic text-to-image model, known as Text2Brain, to map semantic queries to brain activation maps.
We demonstrate that Chat2Brain can synthesize plausible neural activation patterns for more complex tasks of text queries.
arXiv Detail & Related papers (2023-09-10T13:06:45Z)
- Lexical Complexity Prediction: An Overview [13.224233182417636]
The occurrence of unknown words in texts significantly hinders reading comprehension.
Computational modelling has been applied to identify complex words in texts and replace them with simpler alternatives.
We present an overview of computational approaches to lexical complexity prediction focusing on the work carried out on English data.
arXiv Detail & Related papers (2023-03-08T19:35:08Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Revisiting the Roles of "Text" in Text Games [102.22750109468652]
This paper investigates the roles of text in the face of different reinforcement learning challenges.
We propose a simple scheme to extract relevant contextual information into an approximate state hash.
Such a lightweight plug-in achieves competitive performance with state-of-the-art text agents.
arXiv Detail & Related papers (2022-10-15T21:52:39Z)
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
- Polling Latent Opinions: A Method for Computational Sociolinguistics Using Transformer Language Models [4.874780144224057]
We use the capacity for memorization and extrapolation of Transformer Language Models to learn the linguistic behaviors of a subgroup within larger corpora of Yelp reviews.
We show that even in cases where a specific keyphrase is limited or not present at all in the training corpora, the GPT is able to accurately generate large volumes of text that have the correct sentiment.
arXiv Detail & Related papers (2022-04-15T14:33:58Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches, relaying task impacts across various generation tasks such as storytelling, summarization, and translation.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding and the key challenges outstanding in the field in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.