Pragmatic information in translation: a corpus-based study of tense and
mood in English and German
- URL: http://arxiv.org/abs/2007.05234v1
- Date: Fri, 10 Jul 2020 08:15:59 GMT
- Title: Pragmatic information in translation: a corpus-based study of tense and
mood in English and German
- Authors: Anita Ramm, Ekaterina Lapshinova-Koltunski, Alexander Fraser
- Abstract summary: Grammatical tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research.
We consider the correspondence between English and German tense and mood in translation.
Of particular importance is the challenge of modeling tense and mood in rule-based, phrase-based statistical and neural machine translation.
- Score: 70.3497683558609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Grammatical tense and mood are important linguistic phenomena to consider in
natural language processing (NLP) research. We consider the correspondence
between English and German tense and mood in translation. Human translators do
not find this correspondence easy, and as we will show through careful
analysis, there are no simplistic ways to map tense and mood from one language
to another. Our observations about the challenges of human translation of tense
and mood have important implications for multilingual NLP. Of particular
importance is the challenge of modeling tense and mood in rule-based,
phrase-based statistical and neural machine translation.
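
As a concrete illustration of the kind of corpus-based correspondence analysis the abstract describes, the sketch below tallies a tense contingency table from a sentence-aligned English-German corpus. It assumes the corpus has already been morphologically annotated in CoNLL-U format; the file names, the "first finite verb per sentence" simplification, and the label set are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch (not the paper's pipeline): tally English-German tense
# correspondences from a sentence-aligned, CoNLL-U-annotated parallel corpus.
# Assumptions: file names, one finite verb per sentence, tense read from FEATS.
from collections import Counter

def sentence_tenses(conllu_path):
    """Yield one coarse tense label per sentence (Tense of the first finite verb)."""
    tense, in_sentence = None, False
    with open(conllu_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                      # blank line closes a sentence
                if in_sentence:
                    yield tense or "none"
                tense, in_sentence = None, False
                continue
            if line.startswith("#"):          # sentence-level comments
                continue
            in_sentence = True
            cols = line.split("\t")
            if len(cols) < 6:
                continue
            feats = cols[5]                   # FEATS, e.g. "Mood=Ind|Tense=Past|VerbForm=Fin"
            if tense is None and "VerbForm=Fin" in feats:
                for feat in feats.split("|"):
                    if feat.startswith("Tense="):
                        tense = feat.split("=", 1)[1]
    if in_sentence:                           # file without a trailing blank line
        yield tense or "none"

def tense_correspondence(en_path, de_path):
    """Count (English tense, German tense) pairs over aligned sentences."""
    return Counter(zip(sentence_tenses(en_path), sentence_tenses(de_path)))

if __name__ == "__main__":
    table = tense_correspondence("corpus.en.conllu", "corpus.de.conllu")
    for (en_tense, de_tense), count in table.most_common():
        print(f"EN {en_tense:>5} -> DE {de_tense:>5}: {count}")
```

A table of this kind makes it easy to see which tense pairings dominate and where the English-German mapping is genuinely one-to-many rather than a simple substitution.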
Related papers
- Limpeh ga li gong: Challenges in Singlish Annotations [1.3812010983144802]
We work on a fundamental Natural Language Processing task: Parts-Of-Speech (POS) tagging of Singlish sentences.
For our analysis, we build a parallel Singlish dataset containing direct English translations and POS tags, with translation and POS annotation done by native Singlish speakers.
Experiments show that automatic transition- and transformer-based taggers perform with only about 80% accuracy when evaluated against human-annotated POS labels.
arXiv Detail & Related papers (2024-10-21T16:21:45Z)
- Language-Agnostic Analysis of Speech Depression Detection [2.5764071253486636]
This work analyzes automatic speech-based depression detection across two languages, English and Malayalam.
A CNN model is trained to identify acoustic features associated with depression in speech, focusing on both languages.
Our findings and collected data could contribute to the development of language-agnostic speech-based depression detection systems.
arXiv Detail & Related papers (2024-09-23T07:35:56Z)
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
- A Linguistic Investigation of Machine Learning based Contradiction Detection Models: An Empirical Analysis and Future Perspectives [0.34998703934432673]
We analyze two Natural Language Inference data sets with respect to their linguistic features.
The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model.
arXiv Detail & Related papers (2022-10-19T10:06:03Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Representing `how you say' with `what you say': English corpus of focused speech and text reflecting corresponding implications [10.103202030679844]
In speech communication, how something is said (paralinguistic information) is as crucial as what is said (linguistic information).
Current speech translation systems return the same translations if the utterances are linguistically identical.
We propose mapping paralinguistic information into the linguistic domain within the source language using lexical and grammatical devices.
arXiv Detail & Related papers (2022-03-29T12:29:22Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
- On the Integration of Linguistic Features into Statistical and Neural Machine Translation [2.132096006921048]
We investigate the discrepancies between the strengths of statistical approaches to machine translation and the way humans translate.
We identify linguistic information that is lacking in order for automatic translation systems to produce more accurate translations.
We identify overgeneralization or 'algorithmic bias' as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
arXiv Detail & Related papers (2020-03-31T16:03:38Z)