PyMarian: Fast Neural Machine Translation and Evaluation in Python
- URL: http://arxiv.org/abs/2408.11853v1
- Date: Thu, 15 Aug 2024 01:41:21 GMT
- Title: PyMarian: Fast Neural Machine Translation and Evaluation in Python
- Authors: Thamme Gowda, Roman Grundkiewicz, Elijah Rippeth, Matt Post, Marcin Junczys-Dowmunt
- Abstract summary: We describe a Python interface to Marian NMT, a C++-based training and inference toolkit for sequence-to-sequence models.
This interface enables models trained with Marian to be connected to the rich, wide range of tools available in Python.
- Score: 11.291502854418098
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The deep learning language of choice these days is Python; measured by factors such as available libraries and technical support, it is hard to beat. At the same time, software written in lower-level programming languages like C++ retains advantages in speed. We describe a Python interface to Marian NMT, a C++-based training and inference toolkit for sequence-to-sequence models, focusing on machine translation. This interface enables models trained with Marian to be connected to the rich, wide range of tools available in Python. A highlight of the interface is the ability to compute state-of-the-art COMET metrics from Python but using Marian's inference engine, with a speedup factor of up to 7.8× over the existing implementations. We also briefly spotlight a number of other integrations, including Jupyter notebooks, connection with prebuilt models, and a web app interface provided with the package. PyMarian is available on PyPI via pip install pymarian.
Related papers
- depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers [92.13613958373628]
depyf is a tool designed to demystify the inner workings of the PyTorch compiler.
depyf decompiles bytecode generated by PyTorch back into equivalent source code.
arXiv Detail & Related papers (2024-03-14T16:17:14Z) - pyvene: A Library for Understanding and Improving PyTorch Models via
Interventions [79.72930339711478]
pyvene is an open-source library that supports customizable interventions on a range of different PyTorch modules.
We show how pyvene provides a unified framework for performing interventions on neural models and sharing the intervened upon models with others.
arXiv Detail & Related papers (2024-03-12T16:46:54Z) - PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time
Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series.
It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z) - PyGOD: A Python Library for Graph Outlier Detection [56.33769221859135]
PyGOD is an open-source library for detecting outliers in graph data.
It supports a wide array of leading graph-based methods for outlier detection.
PyGOD is released under a BSD 2-Clause license at https://pygod.org and at the Python Package Index (PyPI).
arXiv Detail & Related papers (2022-04-26T06:15:21Z) - PyHHMM: A Python Library for Heterogeneous Hidden Markov Models [63.01207205641885]
PyHHMM is an object-oriented Python implementation of Heterogeneous-Hidden Markov Models (HHMMs).
PyHHMM emphasizes features not supported in similar available frameworks: a heterogeneous observation model, missing data inference, different model order selection criteria, and semi-supervised training.
PyHHMM relies on the numpy, scipy, scikit-learn, and seaborn Python packages, and is distributed under the Apache-2.0 License.
arXiv Detail & Related papers (2022-01-12T07:32:36Z) - OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI
Libraries on HPC Systems [1.066106854070245]
OMB-Py is the first communication benchmark suite for parallel Python applications.
OMB-Py consists of a variety of point-to-point and collective communication benchmark tests.
We report up to 106x speedup on 224 CPU cores compared to sequential execution.
arXiv Detail & Related papers (2021-10-20T16:59:14Z) - Extending Python for Quantum-Classical Computing via Quantum
Just-in-Time Compilation [78.8942067357231]
Python is a popular programming language known for its flexibility, usability, readability, and focus on developer productivity.
We present a language extension to Python that enables heterogeneous quantum-classical computing via a robust C++ infrastructure for quantum just-in-time compilation.
arXiv Detail & Related papers (2021-05-10T21:11:21Z) - Using Python for Model Inference in Deep Learning [0.6027358520885614]
We show how it is possible to meet performance and packaging constraints while performing inference in Python.
We present a way of using multiple Python interpreters within a single process to achieve scalable inference.
arXiv Detail & Related papers (2021-04-01T04:48:52Z) - Python Workflows on HPC Systems [2.1485350418225244]
The recent successes and widespread application of compute-intensive machine learning and data analytics methods have been boosting the usage of the Python programming language on HPC systems.
While Python provides many advantages for the users, it has not been designed with a focus on multi-user environments or parallel programming.
In this paper, we analyze the key problems induced by the usage of Python on HPC clusters and sketch appropriate workarounds.
arXiv Detail & Related papers (2020-12-01T09:51:12Z) - OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.