Detecting The Corruption Of Online Questionnaires By Artificial
Intelligence
- URL: http://arxiv.org/abs/2308.07499v1
- Date: Mon, 14 Aug 2023 23:47:56 GMT
- Authors: Benjamin Lebrun, Sharon Temtsin, Andrew Vonasch, Christoph Bartneck
- Abstract summary: This study tested whether text generated by an AI for the purpose of an online study can be detected by both humans and automatic AI detection systems. Humans were able to correctly identify the authorship of text above chance level, but their performance was still below what would be required to ensure satisfactory data quality.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Online questionnaires that use crowd-sourcing platforms to recruit
participants have become commonplace, due to their ease of use and low costs.
Artificial Intelligence (AI) based Large Language Models (LLM) have made it
easy for bad actors to automatically fill in online forms, including generating
meaningful text for open-ended tasks. These technological advances threaten the
data quality for studies that use online questionnaires. This study tested
whether text generated by an AI for the purpose of an online study can be
detected by both humans and automatic AI detection systems. While humans were able to
correctly identify authorship of text above chance level (76 percent accuracy),
their performance was still below what would be required to ensure satisfactory
data quality. Researchers currently have to rely on the disinterest of bad
actors to successfully use open-ended responses as a useful tool for ensuring
data quality. Automatic AI detection systems are currently completely unusable.
If AIs become too prevalent in submitting responses, then the costs associated
with detecting fraudulent submissions will outweigh the benefits of online
questionnaires. Individual attention checks will no longer be a sufficient tool
to ensure good data quality. This problem can only be systematically addressed
by crowd-sourcing platforms. They cannot rely on automatic AI detection systems
and it is unclear how they can ensure data quality for their paying clients.
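The claim that 76 percent accuracy is "above chance level" can be illustrated with an exact binomial tail test against random guessing (p = 0.5). The sketch below assumes a hypothetical sample of 100 judged texts for illustration; the actual number of items raters judged is not stated in this summary.

```python
from math import comb

def binom_sf(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): exact upper-tail sum."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical illustration: if raters judged 100 texts and got 76 right,
# how surprising would that be under pure guessing (p = 0.5)?
n, k = 100, 76
p_value = binom_sf(k, n)
print(f"P(>= {k}/{n} correct by chance) = {p_value:.2e}")  # far below 0.001
```

The point of the sketch is that "above chance" is statistically easy to establish, while the paper's conclusion is about a different, practical threshold: 76 percent accuracy still leaves roughly one in four judgments wrong, too many to guarantee data quality.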
Related papers
- Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts [0.0]
Numerous detectors and datasets containing AI-generated fragments have emerged, and several detection methods report recognition quality of up to 99.9%.
Are detectors actually highly trustworthy or do their high benchmark scores come from the poor quality of evaluation datasets?
We present a systematic review of datasets from competitions dedicated to AI-generated content detection and propose methods for evaluating the quality of datasets containing AI-generated fragments.
arXiv Detail & Related papers (2024-10-18T17:59:57Z)
- Personhood credentials: Artificial intelligence and the value of privacy-preserving tools to distinguish who is real online [5.365346373228897]
Malicious actors have long used misleading identities to conduct fraud, spread disinformation, and carry out other deceptive schemes.
With the advent of increasingly capable AI, bad actors can amplify the potential scale and effectiveness of their operations.
We analyze the value of a new tool to address this challenge: "personhood credentials" (PHCs)
PHCs empower users to demonstrate that they are real people -- not AIs -- to online services, without disclosing any personal information.
arXiv Detail & Related papers (2024-08-15T02:41:25Z)
- Data Readiness for AI: A 360-Degree Survey [0.9343816282846432]
Poor quality data produces inaccurate and ineffective AI models.
Numerous R&D efforts have been spent on improving data quality.
We propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets.
arXiv Detail & Related papers (2024-04-08T15:19:57Z)
- Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z)
- Who Said That? Benchmarking Social Media AI Detection [12.862865254507177]
This paper introduces SAID (Social media AI Detection), a novel benchmark developed to assess AI-text detection models' capabilities in real social media platforms.
It incorporates real AI-generated text from popular social media platforms like Zhihu and Quora.
A notable finding of our study, based on the Zhihu dataset, reveals that annotators can distinguish between AI-generated and human-generated texts with an average accuracy rate of 96.5%.
arXiv Detail & Related papers (2023-10-12T11:35:24Z)
- Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
- The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
- A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often 'remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
- LioNets: A Neural-Specific Local Interpretation Technique Exploiting Penultimate Layer Information [6.570220157893279]
Interpretable machine learning (IML) is an urgent topic of research.
This paper focuses on a local-based, neural-specific interpretation process applied to textual and time-series data.
arXiv Detail & Related papers (2021-04-13T09:39:33Z)
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, the inability to explain decisions, and bias in the training data are some of the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
- Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371]
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases.
Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
arXiv Detail & Related papers (2020-04-15T15:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.