Design of an Open-Source Architecture for Neural Machine Translation
- URL: http://arxiv.org/abs/2403.03582v1
- Date: Wed, 6 Mar 2024 09:57:52 GMT
- Title: Design of an Open-Source Architecture for Neural Machine Translation
- Authors: Séamus Lankford, Haithem Afli and Andy Way
- Abstract summary: adaptNMT is an open-source application that offers a streamlined approach to the development and deployment of Recurrent Neural Networks and Transformer models.
The application is built upon the widely-adopted OpenNMT ecosystem.
- Score: 2.648836772989769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: adaptNMT is an open-source application that offers a streamlined approach to
the development and deployment of Recurrent Neural Networks and Transformer
models. This application is built upon the widely-adopted OpenNMT ecosystem,
and is particularly useful for new entrants to the field, as it simplifies the
setup of the development environment and creation of train, validation, and
test splits. The application offers a graphing feature that illustrates the
progress of model training, and employs SentencePiece for creating subword
segmentation models. Furthermore, the application provides an intuitive user
interface that facilitates hyperparameter customization. Notably, a
single-click model development approach has been implemented, and models
developed by adaptNMT can be evaluated using a range of metrics. To encourage
eco-friendly research, adaptNMT incorporates a green report that flags the
power consumption and kgCO$_2$ emissions generated during model development.
The application is freely available.
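As an illustration of the workflow the abstract describes, the following is a minimal Python sketch, not adaptNMT's actual source: it creates train/validation/test splits from a parallel corpus, trains a SentencePiece subword model, and computes a rough green-report style kgCO$_2$ figure. File names, split sizes, and the carbon-intensity constant are illustrative assumptions.

```python
# Illustrative sketch only -- not adaptNMT's implementation.
import random
import sentencepiece as spm

def make_splits(src_lines, tgt_lines, valid_size=1000, test_size=1000, seed=42):
    """Shuffle a parallel corpus and carve out train/validation/test splits."""
    pairs = list(zip(src_lines, tgt_lines))
    random.Random(seed).shuffle(pairs)
    test = pairs[:test_size]
    valid = pairs[test_size:test_size + valid_size]
    train = pairs[test_size + valid_size:]
    return train, valid, test

def train_subword_model(corpus_file, prefix="subword", vocab_size=8000):
    """Train a SentencePiece unigram model and return a processor for it."""
    spm.SentencePieceTrainer.train(
        input=corpus_file, model_prefix=prefix,
        vocab_size=vocab_size, model_type="unigram",
    )
    return spm.SentencePieceProcessor(model_file=f"{prefix}.model")

def kg_co2(kwh, kg_per_kwh=0.35):
    """Green-report style estimate: energy used (kWh) x assumed grid carbon intensity."""
    return kwh * kg_per_kwh

# Example usage (assumes parallel files train.en / train.ga exist):
# src = open("train.en", encoding="utf-8").read().splitlines()
# tgt = open("train.ga", encoding="utf-8").read().splitlines()
# train, valid, test = make_splits(src, tgt)
# sp = train_subword_model("train.en")
# print(sp.encode("adaptNMT simplifies NMT development", out_type=str))
# print(f"Estimated emissions: {kg_co2(12.5):.2f} kg CO2 for 12.5 kWh")
```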
Related papers
- Adaptable Embeddings Network (AEN) [49.1574468325115]
We introduce Adaptable Embeddings Networks (AEN), a novel dual-encoder architecture using Kernel Density Estimation (KDE).
AEN allows for runtime adaptation of classification criteria without retraining and is non-autoregressive.
The architecture's ability to preprocess and cache condition embeddings makes it well suited to edge computing applications and real-time monitoring systems (a hedged sketch of this idea appears after the list below).
arXiv Detail & Related papers (2024-11-21T02:15:52Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Structural Pruning of Pre-trained Language Models via Neural Architecture Search [7.833790713816726]
Pre-trained language models (PLMs) mark the state of the art for natural language understanding tasks when fine-tuned on labeled data.
This paper explores neural architecture search (NAS) for structural pruning to find sub-parts of the fine-tuned network that offer an optimal efficiency trade-off.
arXiv Detail & Related papers (2024-05-03T17:34:57Z) - A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) [57.30228361181045]
This survey connects key advancements in recommender systems using Generative Models (Gen-RecSys).
It covers: interaction-driven generative models; the use of large language models (LLMs) and textual data for natural language recommendation; and the integration of multimodal models for generating and processing images/videos in recommender systems.
Our work highlights necessary paradigms for evaluating the impact and harm of Gen-RecSys and identifies open challenges.
arXiv Detail & Related papers (2024-03-31T06:57:57Z) - adaptNMT: an open-source, language-agnostic development environment for Neural Machine Translation [2.648836772989769]
adaptNMT is designed for both technical and non-technical users who work in the field of machine translation.
The application is built upon the widely-adopted OpenNMT ecosystem.
To support eco-friendly research in the NLP space, a green report also flags the power consumption and kgCO$_2$ emissions generated during model development.
arXiv Detail & Related papers (2024-03-04T12:10:17Z) - Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks [53.09649785009528]
In this paper, we explore a paradigm that does not require training to obtain new models.
Inspired by how receptive fields in the biological visual system gave rise to CNNs, we propose Model Disassembling and Assembling.
For model assembling, we present the alignment padding strategy and parameter scaling strategy to construct a new model tailored for a specific task.
arXiv Detail & Related papers (2022-03-25T05:27:28Z) - Assemble Foundation Models for Automatic Code Summarization [9.53949558569201]
We propose a flexible and robust approach for automatic code summarization based on neural networks.
We assemble available foundation models, such as CodeBERT and GPT-2, into a single model named AdaMo.
We introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate finetuning.
arXiv Detail & Related papers (2022-01-13T21:38:33Z) - Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z) - Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We also present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines like libFuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
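To make the AEN-style idea referenced above more concrete, here is a minimal sketch of a classifier that scores queries against KDEs fit over cached condition embeddings. It is not the paper's code: the HashingVectorizer stands in for a learned dual encoder, and all names and parameters are illustrative. The point it demonstrates is that the classification criteria live entirely in the cached condition embeddings and their densities, so they can be swapped at runtime without retraining.

```python
# Illustrative sketch of a KDE-over-condition-embeddings classifier (not the AEN code).
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.neighbors import KernelDensity

encoder = HashingVectorizer(n_features=64)  # stand-in for a learned text encoder

class ConditionKDEClassifier:
    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth
        self.kdes = {}

    def set_conditions(self, conditions):
        """conditions: {label: [example texts]}. Embeddings are computed once and can
        be cached; swapping this dictionary changes the criteria without retraining."""
        self.kdes = {}
        for label, texts in conditions.items():
            emb = encoder.transform(texts).toarray()
            self.kdes[label] = KernelDensity(bandwidth=self.bandwidth).fit(emb)

    def predict(self, text):
        """Score the query under each class's density and return the best-scoring label."""
        q = encoder.transform([text]).toarray()
        scores = {label: kde.score_samples(q)[0] for label, kde in self.kdes.items()}
        return max(scores, key=scores.get)

# Example usage with illustrative conditions:
# clf = ConditionKDEClassifier()
# clf.set_conditions({"alert": ["cpu usage critical", "disk nearly full"],
#                     "normal": ["heartbeat ok", "service healthy"]})
# print(clf.predict("disk space running low"))
```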