A request for clarity over the End of Sequence token in the
Self-Critical Sequence Training
- URL: http://arxiv.org/abs/2305.12254v1
- Date: Sat, 20 May 2023 18:01:47 GMT
- Title: A request for clarity over the End of Sequence token in the
Self-Critical Sequence Training
- Authors: Jia Cheng Hu, Roberto Cavicchioli and Alessandro Capotondi
- Abstract summary: This work proposes to solve the problem by spreading awareness of the issue itself.
In particular, we invite future works to share a simple and informative signature with the help of a library called SacreEOS.
- Score: 69.3939291118954
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Image Captioning research field is currently compromised by the lack of
transparency and awareness over the End-of-Sequence token (<Eos>) in the
Self-Critical Sequence Training. If the <Eos> token is omitted, a model can
boost its performance up to +4.1 CIDEr-D using trivial sentence fragments.
While this phenomenon poses an obstacle to a fair evaluation and comparison of
established works, people involved in new projects are given the arduous choice
between lower scores and unsatisfactory descriptions due to the competitive
nature of the research. This work proposes to solve the problem by spreading
awareness of the issue itself. In particular, we invite future works to share a
simple and informative signature with the help of a library called SacreEOS.
Code available at
\emph{\href{https://github.com/jchenghu/sacreeos}{https://github.com/jchenghu/sacreeos}}
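The core issue the abstract describes is whether the <Eos> token is included in the captions that the reward metric sees during Self-Critical Sequence Training. Below is a minimal pure-Python sketch of a generic REINFORCE-with-baseline SCST setup illustrating that choice; the helper names are illustrative assumptions, not taken from the paper or the SacreEOS library.

```python
# Minimal sketch of the SCST loss and the <eos> choice in the reward
# computation. Helper names are hypothetical, for illustration only.

def prepare_for_reward(caption_tokens, eos="<eos>", keep_eos=True):
    """Return the token list the reward metric (e.g. CIDEr-D) will see.

    keep_eos=True  -> the terminator participates in n-gram matching;
    keep_eos=False -> the variant the paper warns about: the metric never
    sees the terminator, so trailing trivial fragments go unpenalized.
    """
    if keep_eos:
        return list(caption_tokens)
    return [t for t in caption_tokens if t != eos]


def scst_loss(sampled_log_prob_sum, sampled_reward, greedy_reward):
    """SCST policy-gradient loss: negative advantage times log-likelihood.

    The greedy-decoded caption's reward serves as the baseline, so only
    sampled captions that beat greedy decoding are reinforced.
    """
    advantage = sampled_reward - greedy_reward
    return -advantage * sampled_log_prob_sum
```

Whichever branch a project takes, sampled and greedy captions must be treated consistently before scoring; recording that convention is exactly what a shared SacreEOS signature is meant to make explicit.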
Related papers
- Keypoint Promptable Re-Identification [76.31113049256375]
Occluded Person Re-Identification (ReID) is a metric learning task that involves matching occluded individuals based on their appearance.
We introduce Keypoint Promptable ReID (KPR), a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints.
We release custom keypoint labels for four popular ReID benchmarks. Experiments on person retrieval, as well as on pose tracking, demonstrate that our method systematically surpasses previous state-of-the-art approaches.
arXiv Detail & Related papers (2024-07-25T15:20:58Z) - DenoSent: A Denoising Objective for Self-Supervised Sentence
Representation Learning [59.4644086610381]
We propose a novel denoising objective that takes a different perspective, namely the intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form.
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z) - Divide & Bind Your Attention for Improved Generative Semantic Nursing [19.67265541441422]
We propose Divide & Bind to address the challenges posed by complex prompts and scenarios involving multiple entities.
Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts.
arXiv Detail & Related papers (2023-07-20T13:33:28Z) - Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning [60.501201259732625]
We introduce task-adaptive saliency for EFCIL and propose a new framework, which we call Task-Adaptive Saliency Supervision (TASS).
Our experiments demonstrate that our method can better preserve saliency maps across tasks and achieve state-of-the-art results on the CIFAR-100, Tiny-ImageNet, and ImageNet-Subset EFCIL benchmarks.
arXiv Detail & Related papers (2022-12-16T02:43:52Z) - A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive
Learning Framework for Sentence Embeddings [28.046786376565123]
We propose a semantics-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT).
We exploit the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax.
Our model outperforms the state-of-the-art baselines on six standard semantic textual similarity (STS) tasks.
arXiv Detail & Related papers (2022-03-11T12:29:22Z) - Consensus Synergizes with Memory: A Simple Approach for Anomaly
Segmentation in Urban Scenes [132.16748656557013]
Anomaly segmentation is a crucial task for safety-critical applications, such as autonomous driving in urban scenes.
We propose a novel and simple approach named Consensus Synergizes with Memory (CosMe) to address this challenge.
Experimental results on several urban scene anomaly segmentation datasets show that CosMe outperforms previous approaches by large margins.
arXiv Detail & Related papers (2021-11-24T10:01:20Z) - Egocentric Action Recognition by Video Attention and Temporal Context [83.57475598382146]
We present the submission of Samsung AI Centre Cambridge to the CVPR 2020 EPIC-Kitchens Action Recognition Challenge.
In this challenge, action recognition is posed as the problem of simultaneously predicting a single 'verb' and 'noun' class label given an input trimmed video clip.
Our solution achieves strong performance on the challenge metrics without using object-specific reasoning or extra training data.
arXiv Detail & Related papers (2020-07-03T18:00:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.