Related papers: LAGO: Few-shot Crosslingual Embedding Inversion Attacks via Language Similarity-Aware Graph Optimization

LAGO: Few-shot Crosslingual Embedding Inversion Attacks via Language Similarity-Aware Graph Optimization

URL: http://arxiv.org/abs/2505.16008v1
Date: Wed, 21 May 2025 20:48:24 GMT
Title: LAGO: Few-shot Crosslingual Embedding Inversion Attacks via Language Similarity-Aware Graph Optimization
Authors: Wenrui Yu, Yiyi Chen, Johannes Bjerva, Sokol Kosta, Qiongxiu Li,
Abstract summary: LAGO is a novel approach for few-shot cross-lingual embedding inversion attacks.<n>It explicitly models linguistic relationships through a graph-based constrained distributed optimization framework.<n>Experiments show it substantially improves the transferability of attacks with 10-20% increase in Rouge-L score over baselines.
Score: 4.274520108617021
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose LAGO - Language Similarity-Aware Graph Optimization - a novel approach for few-shot cross-lingual embedding inversion attacks, addressing critical privacy vulnerabilities in multilingual NLP systems. Unlike prior work in embedding inversion attacks that treat languages independently, LAGO explicitly models linguistic relationships through a graph-based constrained distributed optimization framework. By integrating syntactic and lexical similarity as edge constraints, our method enables collaborative parameter learning across related languages. Theoretically, we show this formulation generalizes prior approaches, such as ALGEN, which emerges as a special case when similarity constraints are relaxed. Our framework uniquely combines Frobenius-norm regularization with linear inequality or total variation constraints, ensuring robust alignment of cross-lingual embedding spaces even with extremely limited data (as few as 10 samples per language). Extensive experiments across multiple languages and embedding models demonstrate that LAGO substantially improves the transferability of attacks with 10-20% increase in Rouge-L score over baselines. This work establishes language similarity as a critical factor in inversion attack transferability, urging renewed focus on language-aware privacy-preserving multilingual embeddings.

Related papers

Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models [55.14276067678253]
This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in Large Language Models (LLMs)<n>We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models.<n>Further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns.
arXiv Detail & Related papers (2025-05-24T12:31:27Z)
Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention [71.12193680015622]
Large Language Models (LLMs) have shown remarkable capabilities in natural language processing. LLMs exhibit significant performance gaps among different languages. We propose Inference-Time Cross-Lingual Intervention (INCLINE) to overcome these limitations without incurring significant costs.
arXiv Detail & Related papers (2024-10-16T11:23:03Z)
Probing the Emergence of Cross-lingual Alignment during LLM Training [10.053333786023089]
Multilingual Large Language Models (LLMs) achieve remarkable levels of zero-shot cross-lingual transfer performance. We study how such cross-lingual alignment emerges during pre-training of LLMs. We observe a high correlation between neuron overlap and downstream performance.
arXiv Detail & Related papers (2024-06-19T05:31:59Z)
Vicinal Risk Minimization for Few-Shot Cross-lingual Transfer in Abusive Language Detection [19.399281609371258]
Cross-lingual transfer learning from high-resource to medium and low-resource languages has shown encouraging results. We resort to data augmentation and continual pre-training for domain adaptation to improve cross-lingual abusive language detection.
arXiv Detail & Related papers (2023-11-03T16:51:07Z)
Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data. We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model VECO2.0 based on contrastive learning with multi-granularity alignments. Specifically, the sequence-to-sequence alignment is induced to maximize the similarity of the parallel pairs and minimize the non-parallel pairs. token-to-token alignment is integrated to bridge the gap between synonymous tokens excavated via the thesaurus dictionary from the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning [6.329304732560936]
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries. We propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss.
arXiv Detail & Related papers (2022-10-18T15:36:53Z)
Multi-Level Contrastive Learning for Cross-Lingual Alignment [35.33431650608965]
Cross-language pre-trained models such as multilingual BERT (mBERT) have achieved significant performance in various cross-lingual downstream NLP tasks. This paper proposes a multi-level contrastive learning framework to further improve the cross-lingual ability of pre-trained models.
arXiv Detail & Related papers (2022-02-26T07:14:20Z)
Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings [41.148892848434585]
We propose a novel framework to align contextual embeddings at the sense level by leveraging cross-lingual signal from bilingual dictionaries only. We operationalize our framework by first proposing a novel sense-aware cross entropy loss to model word senses explicitly. We then propose a sense alignment objective on top of the sense-aware cross entropy loss for cross-lingual model pretraining, and pretrain cross-lingual models for several language pairs.
arXiv Detail & Related papers (2021-03-11T04:55:35Z)
GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations. GCNs struggle to model words with long-range dependencies or are not directly connected in the dependency tree. We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
Inducing Language-Agnostic Multilingual Representations [61.97381112847459]
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world. We examine three approaches for this: (i) re-aligning the vector spaces of target languages to a pivot source language; (ii) removing language-specific means and variances, which yields better discriminativeness of embeddings as a by-product; and (iii) increasing input similarity across languages by removing morphological contractions and sentence reordering.
arXiv Detail & Related papers (2020-08-20T17:58:56Z)
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages. We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.