Semantic Text Transmission via Prediction with Small Language Models:
Cost-Similarity Trade-off
- URL: http://arxiv.org/abs/2403.00290v1
- Date: Fri, 1 Mar 2024 05:20:16 GMT
- Title: Semantic Text Transmission via Prediction with Small Language Models:
Cost-Similarity Trade-off
- Authors: Bhavani A Madhabhavi, Gangadhar Karevvanavar, Rajshekhar V Bhat and
Nikolaos Pappas
- Abstract summary: We exploit language's inherent correlations and predictability to constrain transmission costs by allowing the destination to predict or complete words.
We obtain $(\bar{c}, \bar{s})$ pairs for neural language and first-order Markov chain-based small language models.
We demonstrate that, when communication occurs over a noiseless channel, the threshold policy achieves a higher $\bar{s}$ for a given $\bar{c}$ than the periodic policy.
- Score: 7.666188363531336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the communication of natural language text from a source to a
destination over noiseless and character-erasure channels. We exploit
language's inherent correlations and predictability to constrain transmission
costs by allowing the destination to predict or complete words with potential
dissimilarity with the source text. Concretely, our objective is to obtain
achievable $(\bar{c}, \bar{s})$ pairs, where $\bar{c}$ is the average
transmission cost at the source and $\bar{s}$ is the average semantic
similarity measured via cosine similarity between vector embedding of words at
the source and those predicted/completed at the destination. We obtain
$(\bar{c}, \bar{s})$ pairs for neural language and first-order Markov
chain-based small language models (SLMs) used for prediction, under both a
threshold policy, which transmits a word whenever its cosine similarity with
the word predicted/completed at the destination falls below a threshold, and a
periodic policy, which transmits words at a fixed interval and lets the
destination predict/complete the words in between. We adopt an SLM for word
completion.
We demonstrate that, when communication occurs over a noiseless channel, the
threshold policy achieves a higher $\bar{s}$ for a given $\bar{c}$ than the
periodic policy and that the $\bar{s}$ achieved with the neural SLM is greater
than or equal to that of the Markov chain-based algorithm for the same
$\bar{c}$. The improved performance comes with a higher complexity in terms of
time and computing requirements. However, when communication occurs over a
character-erasure channel, all prediction algorithms and scheduling policies
perform poorly. Furthermore, if character-level Huffman coding is used, the
required $\bar{c}$ to achieve a given $\bar{s}$ is reduced, but the above
observations still apply.
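
As a concrete illustration, the following is a minimal Python sketch, under stated assumptions, of how the two scheduling policies could be simulated over a noiseless channel. Here `predict_next` (a destination-side SLM), `embed` (a word-embedding lookup), and all parameter values are hypothetical placeholders, not the authors' neural or Markov-chain models.

```python
# Illustrative sketch only: simulates the threshold and periodic
# transmission policies described in the abstract, over a noiseless
# channel. `predict_next` and `embed` are hypothetical stand-ins for
# the paper's small language models and word embeddings.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def threshold_policy(words, predict_next, embed, threshold=0.9):
    """Transmit a word only when the destination's prediction is too
    dissimilar; over a noiseless channel the source can run an identical
    copy of the destination's predictor to make this decision."""
    received, sims, cost = [], [], 0
    for w in words:
        guess = predict_next(received)        # destination-side prediction
        sim = cosine(embed(w), embed(guess))
        if sim < threshold:
            received.append(w)                # transmit the true word
            cost += 1
            sims.append(1.0)                  # destination holds the exact word
        else:
            received.append(guess)            # keep the prediction, zero cost
            sims.append(sim)
    return cost / len(words), float(np.mean(sims))    # (c_bar, s_bar)

def periodic_policy(words, predict_next, embed, period=3):
    """Transmit every `period`-th word; predict the rest at the destination."""
    received, sims, cost = [], [], 0
    for i, w in enumerate(words):
        if i % period == 0:
            received.append(w)                # scheduled transmission
            cost += 1
            sims.append(1.0)
        else:
            guess = predict_next(received)    # fill the gap by prediction
            received.append(guess)
            sims.append(cosine(embed(w), embed(guess)))
    return cost / len(words), float(np.mean(sims))
```

Sweeping `threshold` (or `period`) traces out achievable $(\bar{c}, \bar{s})$ pairs. The noiseless channel is what lets the source mirror the destination's predictor exactly, which is why the threshold rule is implementable at the source.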
Related papers
- Coupling without Communication and Drafter-Invariant Speculative Decoding [21.19028671377106]
We show that communication-free protocols yield a variant of speculative decoding that we call Drafter-Invariant Speculative Decoding.
arXiv Detail & Related papers (2024-08-15T06:52:24Z)
- $\textit{Swap and Predict}$ -- Predicting the Semantic Changes in Words across Corpora by Context Swapping [36.10628959436778]
We consider the problem of predicting whether a given target word, $w$, changes its meaning between two different text corpora.
We propose an unsupervised method that randomly swaps contexts between the two corpora $\mathcal{C}_1$ and $\mathcal{C}_2$.
Our method achieves significant performance improvements compared to strong baselines for the English semantic change prediction task.
arXiv Detail & Related papers (2023-10-16T13:39:44Z)
- Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation [77.09836892653176]
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes.
We propose a provably efficient algorithm based on value iteration that enables asynchronous communication.
We show that a minimal $\Omega(dM)$ communication complexity is required to improve the performance through collaboration.
arXiv Detail & Related papers (2023-05-10T20:29:29Z)
- Truncation Sampling as Language Model Desmoothing [115.28983143361681]
Long samples of text from neural language models can be of poor quality.
Truncation sampling algorithms set some words' probabilities to zero at each step.
We introduce $\eta$-sampling, which truncates words below an entropy-dependent probability threshold (a minimal sketch follows this list).
arXiv Detail & Related papers (2022-10-27T05:52:35Z)
- Supervised Training of Conditional Monge Maps [107.78770597815242]
Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another.
We introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable.
We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells.
arXiv Detail & Related papers (2022-06-28T19:34:44Z)
- Quantum Resources Required to Block-Encode a Matrix of Classical Data [56.508135743727934]
We provide circuit-level implementations and resource estimates for several methods of block-encoding a dense $N \times N$ matrix of classical data to precision $\epsilon$.
We examine resource tradeoffs between the different approaches and explore implementations of two separate models of quantum random access memory (QRAM).
Our results go beyond simple query complexity and provide a clear picture into the resource costs when large amounts of classical data are assumed to be accessible to quantum algorithms.
arXiv Detail & Related papers (2022-06-07T18:00:01Z)
- Communication-efficient SGD: From Local SGD to One-Shot Averaging [16.00658606157781]
We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers.
We suggest a Local SGD scheme that communicates less overall by communicating less frequently as the number of iterations grows.
arXiv Detail & Related papers (2021-06-09T01:10:34Z)
- Sparse Normal Means Estimation with Sublinear Communication [13.257075075199781]
We consider the problem of normal means estimation in a distributed setting with communication constraints.
We show that once the signal-to-noise ratio (SNR) is slightly higher, the support of $\mu$ can be correctly recovered with much less communication.
arXiv Detail & Related papers (2021-02-05T08:52:25Z)
- Improved Sample Complexity for Incremental Autonomous Exploration in MDPs [132.88757893161699]
We learn the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem.
arXiv Detail & Related papers (2020-12-29T14:06:09Z)
- List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
- Capacity Approaching Coding for Low Noise Interactive Quantum Communication, Part I: Large Alphabets [15.078027648304115]
We consider the problem of implementing two-party interactive quantum communication over noisy channels.
For a noiseless qudit channel over a $\mathrm{poly}(n)$ size alphabet, our main result is a simulation method that fails with probability less than $2^{-\Theta(n\epsilon)}$.
We conjecture that it is optimal up to a constant factor in the $\sqrt{\epsilon}$ term.
arXiv Detail & Related papers (2020-01-09T02:48:43Z)
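
For the truncation-sampling entry above, here is a minimal sketch of $\eta$-sampling as that abstract describes it: a word survives only if its probability exceeds an entropy-dependent threshold. The threshold form $\eta = \min(\epsilon, \sqrt{\epsilon}\, e^{-h})$, with $h$ the entropy of the next-word distribution, follows the rule stated in that paper; treat it and the $\epsilon$ value as illustrative.

```python
# Hedged sketch of eta-sampling from "Truncation Sampling as Language
# Model Desmoothing": truncate words whose probability falls below the
# entropy-dependent threshold eta = min(eps, sqrt(eps) * exp(-entropy)).
# The epsilon value below is illustrative, not prescriptive.
import numpy as np

def eta_sample(probs, epsilon=4e-4, rng=None):
    """Sample one token index from `probs` after eta-truncation."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12))      # in nats
    eta = min(epsilon, np.sqrt(epsilon) * np.exp(-entropy))
    keep = probs >= eta
    if not keep.any():                    # degenerate case: keep the argmax
        keep = probs == probs.max()
    truncated = np.where(keep, probs, 0.0)
    truncated /= truncated.sum()          # renormalize the surviving mass
    return int(rng.choice(len(probs), p=truncated))
```

For a peaked, low-entropy distribution $\eta$ approaches $\epsilon$ and prunes the long tail aggressively; for a near-uniform distribution $\eta$ shrinks and almost every word survives.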