Related papers: AINL-Eval 2025 Shared Task: Detection of AI-Generated Scientific Abstracts in Russian

URL: http://arxiv.org/abs/2508.09622v1
Date: Wed, 13 Aug 2025 08:53:17 GMT
Title: AINL-Eval 2025 Shared Task: Detection of AI-Generated Scientific Abstracts in Russian
Authors: Tatiana Batura, Elena Bruches, Milana Shvenk, Valentin Malykh
Abstract summary: Large language models (LLMs) have revolutionized text generation, making it increasingly difficult to distinguish between human- and AI-generated content. To address this critical gap, we introduce the AINL-Eval 2025 Shared Task, specifically focused on the detection of AI-generated scientific abstracts in Russian. We present a novel, large-scale dataset comprising 52,305 samples, including human-written abstracts across 12 diverse scientific domains and AI-generated counterparts from five state-of-the-art LLMs.
Score: 4.819285818808181
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid advancement of large language models (LLMs) has revolutionized text generation, making it increasingly difficult to distinguish between human- and AI-generated content. This poses a significant challenge to academic integrity, particularly in scientific publishing and multilingual contexts where detection resources are often limited. To address this critical gap, we introduce the AINL-Eval 2025 Shared Task, specifically focused on the detection of AI-generated scientific abstracts in Russian. We present a novel, large-scale dataset comprising 52,305 samples, including human-written abstracts across 12 diverse scientific domains and AI-generated counterparts from five state-of-the-art LLMs (GPT-4-Turbo, Gemma2-27B, Llama3.3-70B, Deepseek-V3, and GigaChat-Lite). A core objective of the task is to challenge participants to develop robust solutions capable of generalizing to both (i) previously unseen scientific domains and (ii) models not included in the training data. The task was organized in two phases, attracting 10 teams and 159 submissions, with top systems demonstrating strong performance in identifying AI-generated content. We also establish a continuous shared task platform to foster ongoing research and long-term progress in this important area. The dataset and platform are publicly available at https://github.com/iis-research-team/AINL-Eval-2025.

Related papers

HybridQuestion: Human-AI Collaboration for Identifying High-Impact Research Questions [48.1029746371619]
The "AI Scientist" paradigm is transforming scientific research by automating key stages of the research process. A key question remains unclear: can AI scientists identify meaningful research questions? We propose a human-AI hybrid solution that integrates the scalable data processing capabilities of AI with the value judgment of human experts.
arXiv Detail & Related papers (2025-12-18T15:10:38Z)
M-DAIGT: A Shared Task on Multi-Domain Detection of AI-Generated Text [3.91352287996586]
We introduce the Multi-Domain Detection of AI-Generated Text (M-DAIGT) shared task. M-DAIGT comprises two binary classification subtasks: News Article Detection (NAD) and Academic Writing Detection (AWD). A total of 46 unique teams registered for the shared task, of which four submitted final results.
arXiv Detail & Related papers (2025-11-14T14:26:31Z)
A Comprehensive Dataset for Human vs. AI Generated Text Detection [23.0218614564443]
We present a comprehensive dataset comprising over 58,000 text samples from authentic New York Times articles. The dataset provides original article abstracts as prompts and full human-authored narratives. We establish baseline results for two key tasks: distinguishing human-written from AI-generated text, and attributing AI texts to their generating models with an accuracy of 8.92%.
arXiv Detail & Related papers (2025-10-26T23:50:52Z)
AI-generated Text Detection: A Multifaceted Approach to Binary and Multiclass Classification [0.13392361199400257]
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing. Such capabilities are open to misuse, including fake news generation, spam email creation, and dishonest use in academic assignments. We propose two neural architectures: an optimized model and a simpler variant. For Task A, the optimized neural architecture placed fifth with an $F_1$ score of 0.994, and for Task B, the simpler neural architecture also placed fifth with an $F_1$ score of 0.627.
arXiv Detail & Related papers (2025-05-15T09:28:06Z)
A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI [70.06771291117965]
We introduce Biomedica, an open-source dataset derived from the PubMed Central Open Access subset. Biomedica contains over 6 million scientific articles and 24 million image-text pairs. We provide scalable streaming and search APIs through a web server, facilitating seamless integration with AI systems.
arXiv Detail & Related papers (2025-03-26T05:56:46Z)
Towards Global AI Inclusivity: A Large-Scale Multilingual Terminology Dataset (GIST) [19.91873751674613]
GIST is a large-scale multilingual AI terminology dataset containing 5K terms extracted from top AI conference papers spanning 2000 to 2023. The terms are translated into Arabic, Chinese, French, Japanese, and Russian using a hybrid framework that combines LLMs for extraction with human expertise for translation. The dataset's quality is benchmarked against existing resources, demonstrating superior translation accuracy through crowdsourced evaluation.
arXiv Detail & Related papers (2024-12-24T11:50:18Z)
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories. Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 subproblems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development. We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z)
A Survey on Vision-Language-Action Models for Embodied AI [71.16123093739932]
Embodied AI is widely recognized as a key element of artificial general intelligence. A new category of multimodal models has emerged to address language-conditioned robotic tasks in embodied AI. We present the first survey on vision-language-action models for embodied AI.
arXiv Detail & Related papers (2024-05-23T01:43:54Z)
Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work [0.38850145898707145]
Large language models (LLMs) and generative AI tools present challenges in identifying and addressing biases. Generative AI tools are susceptible to goal misgeneralization, hallucinations, and adversarial attacks such as red teaming prompts. We find that incorporating generative AI in the process of writing research manuscripts introduces a new type of context-induced algorithmic bias.
arXiv Detail & Related papers (2023-12-04T04:05:04Z)
Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses. Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs. To address these concerns, the research community has largely reached a consensus on developing algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z)
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT [63.58711128819828]
ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC). The goal of AIGC is to make the content creation process more efficient and accessible, enabling the production of high-quality content at a faster pace.
arXiv Detail & Related papers (2023-03-07T20:36:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.