TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments
- URL: http://arxiv.org/abs/2302.07322v1
- Date: Tue, 14 Feb 2023 20:07:31 GMT
- Title: TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments
- Authors: Changye Li, Trevor Cohen, Martin Michalowski, and Serguei Pakhomov
- Abstract summary: We present TRESTLE, an open source platform that focuses on two datasets from the TalkBank repository with dementia detection as an illustrative domain.
TRESTLE provides a precise digital blueprint of the data pre-processing and selection strategies that other researchers can reuse.
- Score: 8.329520728240677
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The evidence is growing that machine and deep learning methods can learn the
subtle differences between the language produced by people with various forms
of cognitive impairment such as dementia and cognitively healthy individuals.
Valuable public data repositories such as TalkBank have made it possible for
researchers in the computational community to join forces and learn from each
other to make significant advances in this area. However, due to variability in
approaches and data selection strategies used by various researchers, results
obtained by different groups have been difficult to compare directly. In this
paper, we present TRESTLE (Toolkit for Reproducible Execution of Speech, Text and Language Experiments), an open source platform that focuses on two datasets
from the TalkBank repository with dementia detection as an illustrative domain.
Successfully deployed in the hackallenge (Hackathon/Challenge) of the
International Workshop on Health Intelligence at AAAI 2022, TRESTLE provides a
precise digital blueprint of the data pre-processing and selection strategies
that can be reused via TRESTLE by other researchers seeking comparable results
with their peers and current state-of-the-art (SOTA) approaches.
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- A Simple yet Efficient Ensemble Approach for AI-generated Text Detection [0.5840089113969194]
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing.
It is essential to build automated approaches capable of distinguishing between artificially generated text and human-authored text.
We propose a simple yet efficient solution by ensembling predictions from multiple constituent LLMs.
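The ensembling idea described in this abstract can be sketched as follows; the constituent detectors here are hypothetical stand-ins (simple callables returning a probability), not the paper's actual LLM-based detectors:

```python
# Minimal sketch of ensembling detector predictions: average each
# constituent detector's P(machine-generated) and threshold the mean.
# The lambda "detectors" below are toy stand-ins for illustration only.

def ensemble_detect(text, detectors, threshold=0.5):
    """Average each detector's machine-generated probability and threshold it."""
    probs = [detector(text) for detector in detectors]
    mean_prob = sum(probs) / len(probs)
    return mean_prob >= threshold, mean_prob

# Toy detectors returning fixed scores for demonstration.
detectors = [lambda t: 0.9, lambda t: 0.6, lambda t: 0.3]
is_machine, score = ensemble_detect("some input text", detectors)
print(is_machine, round(score, 2))  # prints: True 0.6
```

Averaging probabilities (soft voting) is one simple way to combine detectors; majority voting over binary decisions would be an equally plausible variant.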
arXiv Detail & Related papers (2023-11-06T13:11:02Z)
- A deep Natural Language Inference predictor without language-specific training data [44.26507854087991]
We present an NLP technique to tackle natural language inference (NLI) between pairs of sentences in a target language of choice without a language-specific training dataset.
We exploit a generic translation dataset, manually translated, along with two instances of the same pre-trained model.
The model was evaluated on the machine-translated Stanford NLI test set, the machine-translated Multi-Genre NLI test set, and the manually translated RTE3-ITA test set.
arXiv Detail & Related papers (2023-09-06T10:20:59Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating its feasibility in application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- Text and author-level political inference using heterogeneous knowledge representations [0.0]
Inference of politically-charged information from text data is a popular research topic in Natural Language Processing (NLP).
The present work describes a series of experiments to compare alternative model configurations for political inference from text in both English and Portuguese languages.
Results suggest certain text representations may outperform the alternatives across multiple experimental settings.
arXiv Detail & Related papers (2022-06-24T13:45:36Z)
- A Dual-Contrastive Framework for Low-Resource Cross-Lingual Named Entity Recognition [5.030581940990434]
Cross-lingual Named Entity Recognition (NER) has recently become a research hotspot because it can alleviate the data-hungry problem for low-resource languages.
In this paper, we describe our novel dual-contrastive framework ConCNER for cross-lingual NER under the scenario of limited source-language labeled data.
arXiv Detail & Related papers (2022-04-02T07:59:13Z)
- CL-XABSA: Contrastive Learning for Cross-lingual Aspect-based Sentiment Analysis [4.60495447017298]
We propose a novel framework, CL-XABSA: Contrastive Learning for Cross-lingual Aspect-Based Sentiment Analysis.
Specifically, we design two contrastive strategies: token-level contrastive learning of token embeddings (TL-CTE) and sentiment-level contrastive learning of token embeddings (SL-CTE).
Since our framework can receive datasets in multiple languages during training, it can be adapted not only to the XABSA task but also to multilingual aspect-based sentiment analysis (MABSA).
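The abstract does not give the exact form of the TL-CTE and SL-CTE losses, but a generic token-level contrastive objective (InfoNCE-style) conveys the underlying idea: pull embeddings of corresponding tokens across languages together while pushing other tokens in the batch apart. This is a rough illustration under that assumption, not the paper's actual loss:

```python
import numpy as np

# Generic InfoNCE-style contrastive loss over L2-normalized token embeddings.
# Row i of `anchors` and row i of `positives` form a positive pair (e.g. the
# same token in source and target language); all other rows act as negatives.

def info_nce(anchors, positives, temperature=0.1):
    """anchors, positives: (n, d) L2-normalized embeddings; returns mean loss."""
    sims = anchors @ positives.T / temperature        # (n, n) similarity matrix
    logits = sims - sims.max(axis=1, keepdims=True)   # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # diagonal = positive pairs

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 16))
a /= np.linalg.norm(a, axis=1, keepdims=True)
p = a + 0.05 * rng.normal(size=(4, 16))               # noisy "translations"
p /= np.linalg.norm(p, axis=1, keepdims=True)
loss = info_nce(a, p)
print(round(float(loss), 4))                          # small loss: pairs are similar
```

The loss shrinks as positive pairs become more similar than the in-batch negatives, which is the behavior both token-level and sentiment-level contrastive strategies would rely on.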
arXiv Detail & Related papers (2022-04-02T07:40:03Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
GCNs struggle to model words with long-range dependencies or words that are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
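The self-attention mechanism the abstract refers to is standard scaled dot-product attention, which lets every token attend to every other token regardless of syntactic distance. A minimal NumPy sketch (shapes and weights are illustrative, not from the paper):

```python
import numpy as np

# Scaled dot-product self-attention: each token's output is a softmax-weighted
# mix of all tokens' value vectors, so distant words can interact directly.

def self_attention(x, wq, wk, wv):
    """x: (seq_len, d_model); wq/wk/wv: (d_model, d_k) projection matrices."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])           # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over tokens
    return weights @ v                                # weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                           # 5 tokens, 8-dim embeddings
w = [rng.normal(size=(8, 4)) for _ in range(3)]       # query/key/value projections
out = self_attention(x, *w)
print(out.shape)                                      # prints: (5, 4)
```

Unlike a GCN layer, where information flows only along dependency-tree edges, this attention matrix is dense, which is why it can capture dependencies between words at any syntactic distance.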
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)