Related papers: A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution

A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution

URL: http://arxiv.org/abs/2305.05079v1
Date: Mon, 8 May 2023 22:37:30 GMT
Title: A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution
Authors: Neeraj Varshney, Himanshu Gupta, Eric Robertson, Bing Liu, Chitta Baral
Abstract summary: We introduce 'NoveltyTask', a multi-stage task to evaluate a system's performance on pipelined novelty 'detection' and 'accommodation' tasks. We use Amazon reviews corpus and compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask.
Score: 25.52598351435189
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: State-of-the-art natural language processing models have been shown to achieve remarkable performance in 'closed-world' settings where all the labels in the evaluation set are known at training time. However, in real-world settings, 'novel' instances that do not belong to any known class are often observed. This renders the ability to deal with novelties crucial. To initiate a systematic research in this important area of 'dealing with novelties', we introduce 'NoveltyTask', a multi-stage task to evaluate a system's performance on pipelined novelty 'detection' and 'accommodation' tasks. We provide mathematical formulation of NoveltyTask and instantiate it with the authorship attribution task that pertains to identifying the correct author of a given text. We use Amazon reviews corpus and compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask. We conduct comprehensive experiments and explore several baseline methods for the task. Our results show that the methods achieve considerably low performance making the task challenging and leaving sufficient room for improvement. Finally, we believe our work will encourage research in this underexplored area of dealing with novelties, an important step en route to developing robust systems.

Related papers

Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation [54.945281159783896]
We present a scalable pipeline for automatically generating high-quality training data for web agents.<n>We introduce a novel constraint-based evaluation framework that provides fine-grained assessment of progress towards task completion.
arXiv Detail & Related papers (2026-02-13T02:52:18Z)
NoveltyRank: Estimating Conceptual Novelty of AI Papers [8.218640708170119]
This project aims to develop a model that estimates and ranks conceptual novelty of AI papers.<n>Our approach evaluates novelty primarily through a paper's title, abstract, and semantic similarity to prior literature.<n>We fine-tune Qwen3-4B-Instruct-2507 and SciBERT on both tasks, benchmarking against GPT-5.1 to analyze how task formulation and modeling choices affect performance.
arXiv Detail & Related papers (2025-12-12T03:33:32Z)
Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
Movie2Story: A framework for understanding videos and telling stories in the form of novel text [0.0]
We propose a novel benchmark to evaluate text generation capabilities in scenarios enriched with auxiliary information. Our work introduces an innovative automatic dataset generation method to ensure the availability of accurate auxiliary information. Our experiments reveal that current Multi-modal Large Language Models (MLLMs) perform suboptimally under the proposed evaluation metrics.
arXiv Detail & Related papers (2024-12-19T15:44:04Z)
Artificial Intuition: Efficient Classification of Scientific Abstracts [42.299140272218274]
Short scientific texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation. To address this gap, we have developed a novel approach to generate and appropriately assign coarse domain-specific labels. We show that a Large Language Model (LLM) can provide metadata essential to the task, in a process akin to the augmentation of supplemental knowledge.
arXiv Detail & Related papers (2024-07-08T16:34:47Z)
Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks. We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement. We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
arXiv Detail & Related papers (2024-07-04T16:41:08Z)
Graded Relevance Scoring of Written Essays with Dense Retrieval [4.021352247826289]
We propose a novel approach for graded relevance scoring of written essays that employs dense retrieval encoders. We leverage Contriever, which is pre-trained with contrastive learning and demonstrated comparable performance to supervised dense retrieval models. Our method establishes a new state-of-the-art performance in the task-specific scenario, while its extension for the cross-task scenario exhibited a performance that is on par with the state-of-the-art model for that scenario.
arXiv Detail & Related papers (2024-05-08T16:37:58Z)
Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as textscLlama-2 and textscMistral. This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task [53.163534619649866]
This paper focuses on assessing the effectiveness of prompt-based techniques to empower Large Language Models to handle the task of quality estimation. We conducted systematic experiments with various prompting techniques, including standard prompting, prompts informed by annotator instructions, and innovative chain-of-thought prompting. Our work reveals that combining these approaches using a "small", open source model (orca_mini_v3_7B) yields competitive results.
arXiv Detail & Related papers (2023-11-01T17:44:35Z)
NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks. Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth. Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z)
Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification. Our key idea is to represent each task using gradient information from a base model. Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
Evaluating Document Coherence Modelling [37.287725949616934]
We examine the performance of a broad range of pretrained LMs on a sentence intrusion detection task for English. Our experiments show that pretrained LMs perform impressively in in-domain evaluation, but experience a substantial drop in the cross-domain setting.
arXiv Detail & Related papers (2021-03-18T10:05:06Z)
Curriculum-Meta Learning for Order-Robust Continual Relation Extraction [12.494209368988253]
We propose a novel curriculum-meta learning method to tackle the challenges of continual relation extraction. We combine meta learning and curriculum learning to quickly adapt model parameters to a new task. We present novel difficulty-based metrics to quantitatively measure the extent of order-sensitivity of a given model.
arXiv Detail & Related papers (2021-01-06T08:52:34Z)
Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in computer vision field. This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation. Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z)
Exploring Bottom-up and Top-down Cues with Attentive Learning for Webly Supervised Object Detection [76.9756607002489]
We propose a novel webly supervised object detection (WebSOD) method for novel classes. Our proposed method combines bottom-up and top-down cues for novel class detection. We demonstrate our proposed method on PASCAL VOC dataset with three different novel/base splits.
arXiv Detail & Related papers (2020-03-22T03:11:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.