Text Mining for Processing Interview Data in Computational Social
Science
- URL: http://arxiv.org/abs/2011.14037v1
- Date: Sat, 28 Nov 2020 00:44:35 GMT
- Title: Text Mining for Processing Interview Data in Computational Social
Science
- Authors: Jussi Karlgren, Renee Li, Eva M Meyersson Milgrom
- Abstract summary: We use commercially available text analysis technology to process interview text data from a computational social science study.
We find that topical clustering and terminological enrichment provide for convenient exploration and quantification of the responses.
We encourage studies in social science to use text analysis, especially for exploratory open-ended studies.
- Score: 0.6820436130599382
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We use commercially available text analysis technology to process interview
text data from a computational social science study. We find that topical
clustering and terminological enrichment provide for convenient exploration and
quantification of the responses. This makes it possible to generate and test
hypotheses and to compare textual and non-textual variables, and saves analyst
effort. We encourage studies in social science to use text analysis, especially
for exploratory open-ended studies. We discuss how replicability requirements
are met by text analysis technology. We note that the most recent learning
models are not designed with transparency in mind, and that research requires a
model to be editable and its decisions to be explainable. The tools available
today, such as the one used in the present study, are not built for processing
interview texts. While many of the variables under consideration are
quantifiable using lexical statistics, we find that some interesting and
potentially valuable features are difficult or impossible to automatise
reliably at present. We note that there are some potentially interesting
applications for traditional natural language processing mechanisms such as
named entity recognition and anaphora resolution in this application area. We
conclude with a suggestion for language technologists to investigate the
challenge of processing interview data comprehensively, especially the
interplay between question and response, and we encourage social science
researchers not to hesitate to use text analysis tools, especially for the
exploratory phase of processing interview data.
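As a rough illustration of the topical clustering and term-level quantification described above, the sketch below groups a handful of open-ended responses and lists the most characteristic terms per cluster. It uses open-source scikit-learn as a stand-in for the commercial text analysis system used in the study; the interview snippets, cluster count, and parameter settings are illustrative assumptions only.

    # Illustrative sketch only: open-source stand-in (scikit-learn) for the
    # commercial text analysis system used in the study. Responses, cluster
    # count, and parameters are hypothetical.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Hypothetical open-ended interview responses (placeholders, not study data).
    responses = [
        "I felt my manager listened to my concerns about workload.",
        "Promotion criteria were never explained clearly to our team.",
        "Workload kept growing while staffing stayed the same.",
        "The promotion process seemed to favour people with visible projects.",
    ]

    # Represent each response as a TF-IDF vector over unigrams and bigrams.
    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vectorizer.fit_transform(responses)

    # Group responses into topical clusters (k chosen arbitrarily here).
    k = 2
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

    # The highest-weighted terms per cluster centroid give a rough analogue of
    # terminological enrichment for exploring what each topic is about.
    terms = vectorizer.get_feature_names_out()
    for c in range(k):
        top = np.argsort(km.cluster_centers_[c])[::-1][:5]
        print(f"cluster {c}:", [terms[i] for i in top])

Cluster memberships obtained this way can then be cross-tabulated against non-textual variables, along the lines of the quantification the abstract describes.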
Related papers
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
- A Primer on Word Embeddings: AI Techniques for Text Analysis in Social Work [0.0]
This paper introduces word embeddings to social work researchers.
We discuss fundamental concepts, technical foundations, and practical applications.
We conclude that successfully implementing embedding technologies in social work requires developing domain-specific models, creating accessible tools, and establishing best practices aligned with social work's ethical principles.
arXiv Detail & Related papers (2024-11-11T17:33:51Z)
- Automating the Information Extraction from Semi-Structured Interview Transcripts [0.0]
This paper explores the development and application of an automated system designed to extract information from semi-structured interview transcripts.
We present a user-friendly software prototype that enables researchers to efficiently process and visualize the thematic structure of interview data.
arXiv Detail & Related papers (2024-03-07T13:53:03Z)
- Artificial intelligence to automate the systematic review of scientific literature [0.0]
We present a survey of AI techniques proposed in the last 15 years to help researchers conduct systematic analyses of scientific literature.
We describe the tasks currently supported, the types of algorithms applied, and available tools proposed in 34 primary studies.
arXiv Detail & Related papers (2024-01-13T19:12:49Z)
- Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence [0.0]
Large language models (LLMs) have been shown to present an unprecedented opportunity to scale up data analytics in the humanities and social sciences.
We build on mixed methods quantitizing and converting design principles, and feature analysis from linguistics, to transparently integrate human expertise and machine scalability.
The approach is discussed and demonstrated in over a dozen LLM-assisted case studies, covering 9 diverse languages, multiple disciplines and tasks.
arXiv Detail & Related papers (2023-09-24T14:21:50Z)
- Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps [59.648646222905235]
We propose a method called Chat2Brain that combines LLMs with a basic text-to-image model, known as Text2Brain, to map semantic queries to brain activation maps.
We demonstrate that Chat2Brain can synthesize plausible neural activation patterns for more complex tasks of text queries.
arXiv Detail & Related papers (2023-09-10T13:06:45Z)
- Revisiting the Roles of "Text" in Text Games [102.22750109468652]
This paper investigates the roles of text in the face of different reinforcement learning challenges.
We propose a simple scheme to extract relevant contextual information into an approximate state hash.
Such a lightweight plug-in achieves competitive performance with state-of-the-art text agents.
arXiv Detail & Related papers (2022-10-15T21:52:39Z)
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses great challenges in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
- Polling Latent Opinions: A Method for Computational Sociolinguistics Using Transformer Language Models [4.874780144224057]
We use the capacity for memorization and extrapolation of Transformer Language Models to learn the linguistic behaviors of a subgroup within larger corpora of Yelp reviews.
We show that even in cases where a specific keyphrase is limited or not present at all in the training corpora, the GPT is able to accurately generate large volumes of text that have the correct sentiment.
arXiv Detail & Related papers (2022-04-15T14:33:58Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches relating task impacts across various generation tasks such as storytelling, summarization, and translation.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding and the key challenges outstanding in the field in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)