EPIE Dataset: A Corpus For Possible Idiomatic Expressions
- URL: http://arxiv.org/abs/2006.09479v1
- Date: Tue, 16 Jun 2020 19:43:30 GMT
- Title: EPIE Dataset: A Corpus For Possible Idiomatic Expressions
- Authors: Prateek Saxena and Soma Paul
- Abstract summary: We present our English Possible Idiomatic Expressions (EPIE) corpus containing 25206 sentences labelled with lexical instances of 717 idiomatic expressions.
We also demonstrate the utility of our dataset by using it to train a sequence labelling module and testing it on three independent datasets, achieving high accuracy, precision and recall scores.
- Score: 11.891511657648941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Idiomatic expressions have always been a bottleneck for language
comprehension and natural language understanding, specifically for tasks like
Machine Translation (MT). MT systems predominantly produce literal translations
of idiomatic expressions as they do not exhibit generic and linguistically
deterministic patterns which can be exploited for comprehension of the
non-compositional meaning of the expressions. These expressions occur in
parallel corpora used for training, but due to the comparatively high
occurrences of the constituent words of idiomatic expressions in literal
context, the idiomatic meaning gets overpowered by the compositional meaning of
the expression. State-of-the-art Metaphor Detection Systems are able to detect
non-compositional usage at word level but miss out on idiosyncratic phrasal
idiomatic expressions. This creates a dire need for a dataset with wider
coverage and a higher occurrence of commonly used idiomatic expressions, the
spans of which can be used for Metaphor Detection. With this in mind, we
present our English Possible Idiomatic Expressions (EPIE) corpus containing
25206 sentences labelled with lexical instances of 717 idiomatic expressions.
These spans also cover literal usages for the given set of idiomatic
expressions. We also demonstrate the utility of our dataset by using it to
train a sequence labelling module and testing it on three independent datasets,
achieving high accuracy, precision and recall scores.
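The abstract does not pin down the architecture of the sequence labelling module, so the following is only a minimal sketch: it assumes a BIO tagging scheme over EPIE-style span annotations and a generic Transformer token classifier from the Hugging Face transformers library. The example sentence, label names, and model choice are illustrative, not taken from the paper or the corpus.

```python
# Minimal sketch of BIO-style idiom-span labelling (assumed scheme); the exact
# module trained on EPIE may differ in architecture and hyperparameters.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-IDIOM", "I-IDIOM"]          # BIO scheme (assumption)
label2id = {label: i for i, label in enumerate(LABELS)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(LABELS)
)

# One hypothetical EPIE-style example: words plus word-level BIO tags.
words = ["He", "kicked", "the", "bucket", "yesterday", "."]
tags  = ["O", "B-IDIOM", "I-IDIOM", "I-IDIOM", "O", "O"]

# Tokenize and align word-level tags to subword tokens.
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
aligned = []
for word_id in enc.word_ids(batch_index=0):
    if word_id is None:                        # special tokens ([CLS], [SEP])
        aligned.append(-100)                   # ignored by the loss
    else:
        aligned.append(label2id[tags[word_id]])
labels = torch.tensor([aligned])

# A single training step; in practice this loops over the full EPIE corpus.
outputs = model(**enc, labels=labels)
outputs.loss.backward()
print(float(outputs.loss))
```

At inference time, per-subword predictions can be mapped back to word-level spans via the same word-id alignment, and the recovered spans scored against gold annotations for the reported accuracy, precision, and recall.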
Related papers
- Investigating Idiomaticity in Word Representations [9.208145117062339]
We focus on noun compounds of varying levels of idiomaticity in two languages (English and Portuguese).
We present a dataset of minimal pairs containing human idiomaticity judgments for each noun compound at both type and token levels.
We define a set of fine-grained metrics of Affinity and Scaled Similarity to determine how sensitive the models are to perturbations that may lead to changes in idiomaticity.
arXiv Detail & Related papers (2024-11-04T21:05:01Z) - Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z) - Cross-Lingual Phrase Retrieval [49.919180978902915]
Cross-lingual retrieval aims to retrieve relevant text across languages.
Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level.
We propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences.
arXiv Detail & Related papers (2022-04-19T13:35:50Z) - HIT at SemEval-2022 Task 2: Pre-trained Language Model for Idioms Detection [23.576133853110324]
The same multi-word expression may have different meanings in different sentences.
These usages fall into two categories: literal and idiomatic.
We use a pre-trained language model, which can provide a context-aware sentence embedding (a minimal sketch of this idea appears after the list of related papers).
arXiv Detail & Related papers (2022-04-13T02:45:04Z) - Idiomatic Expression Identification using Semantic Compatibility [8.355785779504869]
We study the task of detecting whether a sentence has an idiomatic expression and localizing it.
We propose a multi-stage neural architecture with the attention flow mechanism for identifying these expressions.
A salient feature of the model is its ability to identify idioms unseen during training with gains from 1.4% to 30.8% over competitive baselines.
arXiv Detail & Related papers (2021-10-19T15:44:28Z) - On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z) - Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - MICE: Mining Idioms with Contextual Embeddings [0.0]
Idiomatic expressions can be problematic for natural language processing applications.
We present an approach that uses contextual embeddings for that purpose.
We show that deep neural networks using both types of contextual embeddings perform much better than existing approaches.
arXiv Detail & Related papers (2020-08-13T08:56:40Z) - Graph-Structured Referring Expression Reasoning in The Wild [105.95488002374158]
Grounding referring expressions aims to locate in an image an object referred to by a natural language expression.
We propose a scene graph guided modular network (SGMN) to perform reasoning over a semantic graph and a scene graph.
We also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning.
arXiv Detail & Related papers (2020-04-19T11:00:30Z)
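The HIT at SemEval-2022 entry above states only that a pre-trained language model supplies a context-aware sentence embedding. The sketch below shows one common way to obtain such an embedding (mean-pooling the last hidden states of a multilingual BERT); the model choice, pooling strategy, and example sentences are assumptions for illustration, not details taken from that system.

```python
# Minimal sketch of a context-aware sentence embedding from a pretrained
# language model (mean-pooled last hidden states); the SemEval-2022 system
# itself may pool, fine-tune, or choose its backbone differently.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

def sentence_embedding(sentence: str) -> torch.Tensor:
    """Mean-pool token embeddings, weighting by the attention mask."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state     # (1, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)      # (1, seq_len, 1)
    return (hidden * mask).sum(1) / mask.sum(1)     # (1, dim)

# The same expression in a literal and an idiomatic context yields different
# embeddings, which a downstream classifier can learn to separate.
literal = sentence_embedding("The old bucket had a hole, so he kicked the bucket aside.")
idiom   = sentence_embedding("Sadly, the old farmer kicked the bucket last winter.")
print(torch.cosine_similarity(literal, idiom))
```

Because the embedding depends on the whole sentence rather than the expression alone, a simple classifier on top of it can distinguish literal from idiomatic usages of the same multi-word expression.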