Related papers: EasyECR: A Library for Easy Implementation and Evaluation of Event Coreference Resolution Models

URL: http://arxiv.org/abs/2406.14106v1
Date: Thu, 20 Jun 2024 08:40:21 GMT
Title: EasyECR: A Library for Easy Implementation and Evaluation of Event Coreference Resolution Models
Authors: Yuncong Li, Tianhua Xu, Sheng-hua Zhong, Haiqin Yang
Abstract summary: Event Coreference Resolution (ECR) is the task of clustering event mentions that refer to the same real-world event. EasyECR is the first open-source library designed to standardize data structures and abstract ECR pipelines.
Score: 9.773388073690326
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Event Coreference Resolution (ECR) is the task of clustering event mentions that refer to the same real-world event. Despite significant advancements, ECR research faces two main challenges: limited generalizability across domains due to narrow dataset evaluations, and difficulties in comparing models within diverse ECR pipelines. To address these issues, we develop EasyECR, the first open-source library designed to standardize data structures and abstract ECR pipelines for easy implementation and fair evaluation. More specifically, EasyECR integrates seven representative pipelines and ten popular benchmark datasets, enabling model evaluations on various datasets and promoting the development of robust ECR pipelines. By conducting extensive evaluation via our EasyECR, we find that (i) the representative ECR pipelines cannot generalize across multiple datasets, hence evaluating ECR pipelines on multiple datasets is necessary, and (ii) all models in ECR pipelines have a great effect on pipeline performance; therefore, when one model in an ECR pipeline is compared, it is essential to ensure that the other models remain consistent. Additionally, reproducing ECR results is not trivial, and the developed library can help reduce this discrepancy. The experimental results provide valuable baselines for future research.
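To make the abstract's two ideas concrete (ECR as clustering of event mentions, and pipelines whose components should be swapped one at a time), here is a minimal Python sketch. It is not EasyECR's API: the `Mention` schema, `cluster_mentions`, and the toy `trigger_match` scorer are hypothetical names chosen for illustration. The sketch assumes a common pipeline shape in which a pairwise scorer decides whether two mentions corefer and a fixed clustering step (transitive closure via union-find) turns those decisions into clusters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mention:
    """One event mention: a trigger span inside a document (hypothetical schema)."""
    mention_id: str
    doc_id: str
    trigger: str
    start: int
    end: int


# A pairwise scorer maps two mentions to a coreference score in [0, 1].
PairScorer = Callable[[Mention, Mention], float]


def cluster_mentions(mentions: List[Mention], scorer: PairScorer,
                     threshold: float = 0.5) -> List[List[str]]:
    """Link every pair scored at or above `threshold`, then take the
    transitive closure of those links with union-find."""
    parent = {m.mention_id: m.mention_id for m in mentions}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Score all unordered pairs once.
    for i, a in enumerate(mentions):
        for b in mentions[i + 1:]:
            if scorer(a, b) >= threshold:
                union(a.mention_id, b.mention_id)

    # Group mention ids by their union-find root.
    clusters: Dict[str, List[str]] = {}
    for m in mentions:
        clusters.setdefault(find(m.mention_id), []).append(m.mention_id)
    return sorted(clusters.values())


def trigger_match(a: Mention, b: Mention) -> float:
    """Toy baseline scorer: identical lowercased trigger words are coreferent."""
    return 1.0 if a.trigger.lower() == b.trigger.lower() else 0.0


mentions = [
    Mention("m1", "d1", "attack", 3, 4),
    Mention("m2", "d1", "Attack", 10, 11),
    Mention("m3", "d2", "election", 0, 1),
    Mention("m4", "d2", "attack", 5, 6),
]
print(cluster_mentions(mentions, trigger_match))  # [['m1', 'm2', 'm4'], ['m3']]
```

Because the scorer is passed in while the clustering step stays fixed, comparing two scorers with this layout changes only one component, which is the kind of controlled comparison finding (ii) calls for.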

Related papers

LongWeave: A Long-Form Generation Benchmark Bridging Real-World Relevance and Verifiability [60.451734326001564]
We introduce LongWeave, which balances real-world and verifiable assessment with Constraint-Verifier Evaluation (CoV-Eval). LongWeave supports customizable input/output lengths (up to 64K/8K tokens) across seven distinct tasks. Evaluation on 23 Large Language Models shows that even state-of-the-art models encounter significant challenges in long-form generation as real-world complexity and output length increase.
arXiv Detail & Related papers (2025-10-28T12:11:12Z)
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation [72.34977512403643]
Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for enhancing large language models (LLMs) by retrieving relevant documents from an external corpus. Existing RAG systems primarily focus on unimodal text documents and often fall short in real-world scenarios where both queries and documents may contain mixed modalities (such as text and images). We propose Nyx, a unified mixed-modal to mixed-modal retriever tailored for Universal Retrieval-Augmented Generation scenarios.
arXiv Detail & Related papers (2025-10-20T09:56:43Z)
CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects [23.9752442213364]
We introduce CodeFuse-CR-Bench, the first comprehensiveness-aware benchmark for repository-level code review (CR) evaluation. CodeFuse-CR-Bench comprises 601 high-quality instances from 70 Python projects covering nine Pull-Request (PR) problem domains. We present the first large-scale assessment of state-of-the-art Large Language Models (LLMs) on this comprehensive CR task.
arXiv Detail & Related papers (2025-09-18T11:24:09Z)
MSRS: Evaluating Multi-Source Retrieval-Augmented Generation [51.717139132190574]
Many real-world applications demand the ability to integrate and summarize information scattered across multiple sources. We present a scalable framework for constructing evaluation benchmarks that challenge RAG systems to integrate information across distinct sources.
arXiv Detail & Related papers (2025-08-28T14:59:55Z)
VISTA-OCR: Towards generative and interactive end to end OCR models [3.7548609506798494]
VISTA-OCR is a lightweight architecture that unifies text detection and recognition within a single generative model. Built on an encoder-decoder architecture, VISTA-OCR is progressively trained, starting with the visual feature extraction phase. To enhance the model's capabilities, we built a new dataset composed of real-world examples enriched with bounding box annotations and synthetic samples.
arXiv Detail & Related papers (2025-04-04T17:39:53Z)
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation [27.897982337072335]
Retrieval-augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external knowledge to reduce hallucinations and incorporate up-to-date information without retraining. As an essential part of RAG, external knowledge bases are commonly built by extracting structured data from unstructured PDF documents using Optical Character Recognition (OCR). In this paper, we introduce OHRBench, the first benchmark for understanding the cascading impact of OCR on RAG systems.
arXiv Detail & Related papers (2024-12-03T17:23:47Z)
Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs [67.54302101989542]
Legal case retrieval aims to provide similar cases as references for a given fact description. Existing works mainly focus on case-to-case retrieval using lengthy queries. Data scale is insufficient to satisfy the training requirements of existing data-hungry neural models.
arXiv Detail & Related papers (2024-10-09T06:26:39Z)
EBES: Easy Benchmarking for Event Sequences [17.277513178760348]
Event Sequences (EvS) refer to sequential data characterized by irregular sampling intervals and a mix of categorical and numerical features. EBES is a comprehensive benchmark for EvS classification with sequence-level targets. It features standardized evaluation scenarios and protocols, along with an open-source PyTorch library that implements 9 modern models.
arXiv Detail & Related papers (2024-10-04T13:03:43Z)
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios. With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance. Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which are encoded by embedding vectors, resulting in excessively large embedding tables. Despite the prosperity of lightweight embedding-based RSs (LERSs), a wide diversity is seen in evaluation protocols. This study investigates the performance, efficiency, and cross-task transferability of various LERSs via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z)
Contextualization with SPLADE for High Recall Retrieval [5.973857434357868]
High Recall Retrieval (HRR) is a search problem that optimizes the cost of retrieving most of the relevant documents in a given collection. In this work, we leverage SPLADE, an efficient retrieval model that transforms documents into contextualized sparse vectors. It reduces the review cost by 10% and 18% in two HRR evaluation collections under a one-phase review workflow with a target recall of 80%.
arXiv Detail & Related papers (2024-05-07T03:05:37Z)
Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR). It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
Data Roaming and Quality Assessment for Composed Image Retrieval [25.452015862927766]
Composed Image Retrieval (CoIR) involves queries that combine image and text modalities, allowing users to express their intent more effectively. We introduce the Large Scale Composed Image Retrieval (LaSCo) dataset, a new CoIR dataset which is ten times larger than existing ones. We also introduce a new CoIR baseline, the Cross-Attention driven Shift Encoder (CASE).
arXiv Detail & Related papers (2023-03-16T16:02:24Z)
Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant. To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z)
Sound Event Classification in an Industrial Environment: Pipe Leakage Detection Use Case [3.9414768019101682]
A multi-stage Machine Learning pipeline is proposed for pipe leakage detection in an industrial environment. The proposed pipeline applies multiple steps, each addressing the environment's challenges. The results show that the model produces excellent results with 99% accuracy and an F1-score of 0.93 and 0.9 for the respective datasets.
arXiv Detail & Related papers (2022-05-05T15:26:22Z)
On Continual Model Refinement in Out-of-Distribution Data Streams [64.62569873799096]
Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams. Existing continual learning (CL) problem setups cannot cover such a realistic and complex scenario. We propose a new CL problem formulation dubbed continual model refinement (CMR).
arXiv Detail & Related papers (2022-05-04T11:54:44Z)
Donut: Document Understanding Transformer without OCR [17.397447819420695]
We propose a novel visual document understanding (VDU) model that is end-to-end trainable without relying on an underlying OCR framework. Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets.
arXiv Detail & Related papers (2021-11-30T18:55:19Z)
Data Augmentation for Abstractive Query-Focused Multi-Document Summarization [129.96147867496205]
We present two QMDS training datasets, which we construct using two data augmentation methods. These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries. We build end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets.
arXiv Detail & Related papers (2021-03-02T16:57:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.