The ACL OCL Corpus: Advancing Open Science in Computational Linguistics
- URL: http://arxiv.org/abs/2305.14996v2
- Date: Tue, 24 Oct 2023 05:18:32 GMT
- Title: The ACL OCL Corpus: Advancing Open Science in Computational Linguistics
- Authors: Shaurya Rohatgi, Yanxia Qin, Benjamin Aw, Niranjana Unnithan, Min-Yen Kan
- Abstract summary: The ACL OCL spans seven decades, containing 73K papers, alongside 210K figures.
By detecting paper topics with a supervised neural model, we note that interest in "Syntax: Tagging, Chunking and Parsing" is waning and "Natural Language Generation" is resurging.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ACL OCL, a scholarly corpus derived from the ACL Anthology to
assist open scientific research in the computational linguistics domain.
Integrating and enhancing the previous versions of the ACL Anthology, the ACL
OCL contributes metadata, PDF files, citation graphs and additional structured
full texts with sections, figures, and links to a large knowledge resource
(Semantic Scholar). The ACL OCL spans seven decades, containing 73K papers,
alongside 210K figures.
We spotlight how the ACL OCL can be used to observe trends in computational
linguistics. By detecting paper topics with a supervised neural model, we note
that interest in "Syntax: Tagging, Chunking and Parsing" is waning and "Natural
Language Generation" is resurging. Our dataset is available from HuggingFace
(https://huggingface.co/datasets/WINGNUS/ACL-OCL).
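For quick programmatic access, the corpus can be pulled straight from the HuggingFace Hub. A minimal sketch, assuming the `datasets` library is installed; the split name and column names are assumptions, so inspect the schema before relying on any field:

```python
from datasets import load_dataset

# Dataset ID taken from the paper's HuggingFace link; split name assumed.
ocl = load_dataset("WINGNUS/ACL-OCL", split="train")

print(ocl)            # row count and schema of the corpus
print(ocl[0].keys())  # inspect available fields before use
```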
Related papers
- LATEX-GCL: Large Language Models (LLMs)-Based Data Augmentation for Text-Attributed Graph Contrastive Learning
Graph contrastive learning (GCL) for learning on Text-Attributed Graphs (TAGs) has yet to be explored.
A naive strategy for applying GCL to TAGs is to encode the textual attributes into feature embeddings via a language model.
We propose a novel GCL framework named LATEX-GCL to utilize Large Language Models (LLMs) to produce textual augmentations.
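The "naive strategy" above amounts to running each node's text through a frozen language model and taking a pooled embedding as the node feature. A minimal sketch with HuggingFace `transformers`; the MiniLM checkpoint is an arbitrary choice, not one the paper prescribes:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
lm = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

node_texts = ["A survey on in-context learning.",
              "Rethinking homophily in graph contrastive learning."]
batch = tok(node_texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = lm(**batch).last_hidden_state         # (N, T, D) token states
mask = batch["attention_mask"].unsqueeze(-1)       # ignore padding when pooling
node_feats = (hidden * mask).sum(1) / mask.sum(1)  # (N, D) one vector per node
```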
arXiv Detail & Related papers (2024-09-02T10:30:55Z)
- ACL Anthology Helper: A Tool to Retrieve and Manage Literature from ACL Anthology
ACL Anthology Helper automates the process of parsing and downloading papers along with their meta-information.
This allows for efficient management of the local papers using a wide range of operations, including "where," "group," "order," and more.
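The summary names the operations but not the tool's actual command syntax, so the pandas sketch below only mirrors what "where," "group," and "order" plausibly do over a local paper table; it is not the ACL Anthology Helper API:

```python
import pandas as pd

papers = pd.DataFrame([
    {"title": "Paper A", "year": 2021, "venue": "ACL"},
    {"title": "Paper B", "year": 2023, "venue": "EMNLP"},
    {"title": "Paper C", "year": 2023, "venue": "ACL"},
])

recent = papers[papers["year"] >= 2022]                # "where": filter rows
counts = recent.groupby("venue").size()                # "group": aggregate
ordered = recent.sort_values("year", ascending=False)  # "order": sort
print(counts, ordered, sep="\n")
```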
arXiv Detail & Related papers (2023-10-31T13:59:05Z)
- In-Context Learning Learns Label Relationships but Is Not Conventional Learning
There is currently no consensus about how the in-context learning (ICL) ability of Large Language Models works.
We provide novel insights into how ICL leverages label information, revealing both capabilities and limitations.
Our experiments show that ICL predictions almost always depend on in-context labels and that ICL can learn truly novel tasks in-context.
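One way to see why label dependence matters: if flipping the demonstration labels flips the model's predictions, ICL is genuinely reading the in-context labels rather than relying only on pre-trained priors. A minimal, model-agnostic sketch of such a probe (the prompt format is an illustrative assumption):

```python
def make_prompt(demos, query, flip=False):
    """Build a few-shot sentiment prompt, optionally with flipped labels."""
    flipped = {"positive": "negative", "negative": "positive"}
    lines = []
    for text, label in demos:
        lines.append(f"Review: {text}\nSentiment: {flipped[label] if flip else label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

demos = [("A delightful film.", "positive"), ("A tedious mess.", "negative")]
print(make_prompt(demos, "An instant classic.", flip=False))
# Feed both variants to an LLM: predictions that track the flipped labels
# indicate the model is using in-context label information.
print(make_prompt(demos, "An instant classic.", flip=True))
```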
arXiv Detail & Related papers (2023-07-23T16:54:41Z)
- HomoGCL: Rethinking Homophily in Graph Contrastive Learning
HomoGCL is a model-agnostic framework to expand the positive set using neighbor nodes with neighbor-specific significances.
We show that HomoGCL yields multiple state-of-the-art results across six public datasets.
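Only the high-level idea is given above, so the sketch below is a loose reading, not the paper's exact objective: an InfoNCE-style loss where each node's graph neighbors count as additional positives, down-weighted by a per-neighbor significance score:

```python
import numpy as np

def neighbor_weighted_infonce(z1, z2, adj, sig, tau=0.5):
    """z1, z2: (N, D) node embeddings from two views; adj: (N, N) 0/1
    adjacency; sig: (N, N) neighbor-specific significance in [0, 1]."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = np.exp(z1 @ z2.T / tau)          # cross-view similarities
    anchor = np.diag(sim)                  # same node in the other view
    neigh = (adj * sig * sim).sum(axis=1)  # neighbors as weighted positives
    return float(-np.log((anchor + neigh) / sim.sum(axis=1)).mean())

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
adj = (rng.random((8, 8)) < 0.3).astype(float)
print(neighbor_weighted_infonce(z1, z2, adj, sig=adj * 0.5))
```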
arXiv Detail & Related papers (2023-06-16T04:06:52Z)
- Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
In-context learning (ICL) has emerged as a promising capability of large language models (LLMs).
In this paper, we investigate the working mechanism of ICL through an information flow lens.
We introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL.
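A small, self-contained way to eyeball the "label words as anchors" effect is to check how much last-layer attention the final (prediction) position pays to each context token. This sketch uses the public `gpt2` checkpoint and an illustrative prompt, not the paper's exact setup:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = ("Review: great fun. Sentiment: positive\n"
          "Review: dull plot. Sentiment: negative\n"
          "Review: loved it. Sentiment:")
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_attentions=True)

# Last layer, head-averaged attention from the final position to each token.
att = out.attentions[-1][0].mean(0)[-1]
for token, a in zip(tok.convert_ids_to_tokens(ids[0]), att.tolist()):
    print(f"{token!r}\t{a:.3f}")  # the label tokens tend to draw high mass
```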
arXiv Detail & Related papers (2023-05-23T15:26:20Z)
- On Codex Prompt Engineering for OCL Generation: An Empirical Study
The Object Constraint Language (OCL) is a declarative language that adds constraints and object query expressions to MOF models.
Recent advancements in LLMs, such as GPT-3, have shown their capability in many NLP tasks.
We investigate the reliability of OCL constraints generated by Codex from natural language specifications.
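For readers unfamiliar with OCL, a constraint like `context Account inv: self.balance >= 0` is the kind of output being generated. The study's exact prompt templates are not reproduced here; this sketch only shows the general shape of a natural-language-to-OCL prompt one might send to a code LLM:

```python
uml_context = """class Account
  attributes: balance : Integer, owner : String"""

nl_spec = "The balance of an account must never be negative."

prompt = (
    "Translate the specification into an OCL invariant.\n"
    f"Model:\n{uml_context}\n"
    f"Specification: {nl_spec}\n"
    "OCL:"
)
print(prompt)
# A correct completion would be:
#   context Account inv: self.balance >= 0
```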
arXiv Detail & Related papers (2023-03-28T18:50:51Z)
- OpenICL: An Open-Source Framework for In-context Learning
We introduce OpenICL, an open-source toolkit for In-context Learning (ICL) and large language model evaluation.
OpenICL is research-friendly, with a highly flexible architecture that lets users easily combine different components to suit their needs.
The effectiveness of OpenICL has been validated on a wide range of NLP tasks, including classification, QA, machine translation, and semantic parsing.
arXiv Detail & Related papers (2023-03-06T06:20:25Z)
- ACL-Fig: A Dataset for Scientific Figure Classification
We develop a pipeline that extracts figures and tables from the scientific literature and a deep-learning-based framework that classifies scientific figures using visual features.
We build the first large-scale automatically annotated corpus, ACL-Fig, consisting of 112,052 scientific figures extracted from 56K research papers in the ACL Anthology.
The ACL-Fig-Pilot dataset contains 1,671 manually labeled scientific figures belonging to 19 categories.
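As a rough picture of the "visual features" side, one could fine-tune a standard CNN backbone on the 19 ACL-Fig-Pilot categories. The backbone choice below is an assumption for illustration, not the paper's reported architecture:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone with a fresh 19-way head for the figure categories.
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
net.fc = nn.Linear(net.fc.in_features, 19)

figures = torch.randn(4, 3, 224, 224)  # stand-in batch of rendered figures
logits = net(figures)                  # (4, 19) class scores
print(logits.argmax(dim=1))            # predicted category per figure
```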
arXiv Detail & Related papers (2023-01-28T20:27:35Z) - A Survey on In-context Learning [77.78614055956365]
In-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP).
We first present a formal definition of ICL and clarify its relation to related studies.
We then organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis.
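A common formalization consistent with the survey's framing (the notation here is ours, not quoted from the paper): a frozen model conditions on the demonstrations and the query, and no parameters are updated.

```latex
% Given demonstrations C = ((x_1, y_1), ..., (x_k, y_k)) and a query x,
% the frozen model M scores each candidate answer in the label space Y:
\[
  \hat{y} \;=\; \operatorname*{arg\,max}_{y \in \mathcal{Y}}
  P_{\mathcal{M}}\bigl(y \mid x_1, y_1, \ldots, x_k, y_k, x\bigr)
\]
% Learning happens purely through conditioning: the parameters of M stay fixed.
```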
arXiv Detail & Related papers (2022-12-31T15:57:09Z) - DenseCLIP: Extract Free Dense Labels from CLIP [130.3830819077699]
Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition.
DenseCLIP+ surpasses SOTA transductive zero-shot semantic segmentation methods by large margins.
Our findings suggest that DenseCLIP can serve as a new, reliable source of supervision for dense prediction tasks.
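DenseCLIP extends CLIP's image-level zero-shot scoring to dense, per-pixel predictions; the sketch below shows only that image-level baseline with the public `openai/clip-vit-base-patch32` checkpoint, as a reference point rather than the DenseCLIP method itself:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # any RGB image
classes = ["cat", "grass", "sky"]
inputs = proc(text=[f"a photo of a {c}" for c in classes],
              images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
probs = out.logits_per_image.softmax(dim=-1)  # (1, num_classes)
print(dict(zip(classes, probs[0].tolist())))  # zero-shot class scores
```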
arXiv Detail & Related papers (2021-12-02T09:23:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.