Some Like It Small: Czech Semantic Embedding Models for Industry
Applications
- URL: http://arxiv.org/abs/2311.13921v1
- Date: Thu, 23 Nov 2023 11:14:13 GMT
- Title: Some Like It Small: Czech Semantic Embedding Models for Industry
Applications
- Authors: Jiří Bednář, Jakub Náplava, Petra Barančíková, Ondřej Lisický
- Abstract summary: This article focuses on the development and evaluation of Small-sized Czech sentence embedding models.
Small models are important components for real-time industry applications in resource-constrained environments.
Ultimately, this article presents practical applications of the developed sentence embedding models in Seznam.cz, the Czech search engine.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article focuses on the development and evaluation of Small-sized Czech
sentence embedding models. Small models are important components for real-time
industry applications in resource-constrained environments. Given the limited
availability of labeled Czech data, alternative approaches, including
pre-training, knowledge distillation, and unsupervised contrastive fine-tuning,
are investigated. Comprehensive intrinsic and extrinsic analyses are conducted,
showcasing the competitive performance of our models compared to significantly
larger counterparts, with approximately 8 times smaller size and 5 times faster
speed than conventional Base-sized models. To promote cooperation and
reproducibility, both the models and the evaluation pipeline are made publicly
accessible. Ultimately, this article presents practical applications of the
developed sentence embedding models in Seznam.cz, the Czech search engine.
These models have effectively replaced previous counterparts, enhancing the
overall search experience, for instance, in organic search, featured snippets,
and image search. This transition has yielded improved performance.
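The abstract names pre-training, knowledge distillation, and unsupervised contrastive fine-tuning as the strategies used to train the small models, but this listing contains no code. The sketch below is only a minimal PyTorch illustration of the latter two ideas; the checkpoint paths, the mean-pooling helper, the projection layer, and all hyperparameters are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch (not the authors' code): training a small sentence encoder with
# (1) embedding-level knowledge distillation from a larger frozen teacher and
# (2) an unsupervised contrastive (SimCSE-style) objective built from dropout noise.
# Checkpoint paths, hyperparameters, and the projection layer are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

TEACHER = "path/to/large-czech-sentence-encoder"   # placeholder checkpoint
STUDENT = "path/to/small-czech-encoder"            # placeholder checkpoint

teacher_tok = AutoTokenizer.from_pretrained(TEACHER)
student_tok = AutoTokenizer.from_pretrained(STUDENT)
teacher = AutoModel.from_pretrained(TEACHER).eval()   # frozen teacher
student = AutoModel.from_pretrained(STUDENT).train()  # dropout active for the contrastive term

# Map student embeddings into the teacher's embedding space for the MSE loss.
proj = torch.nn.Linear(student.config.hidden_size, teacher.config.hidden_size)
optimizer = torch.optim.AdamW(list(student.parameters()) + list(proj.parameters()), lr=2e-5)


def embed(model, batch):
    """Mean-pool token states into one sentence vector per input."""
    hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)       # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1).clamp(min=1)


def training_step(sentences, temperature=0.05):
    t_batch = teacher_tok(sentences, padding=True, truncation=True, return_tensors="pt")
    s_batch = student_tok(sentences, padding=True, truncation=True, return_tensors="pt")

    # Knowledge distillation: student embeddings regress onto frozen teacher embeddings.
    with torch.no_grad():
        target = embed(teacher, t_batch)
    kd_loss = F.mse_loss(proj(embed(student, s_batch)), target)

    # Unsupervised contrastive fine-tuning: two dropout-perturbed passes of the same
    # sentence form a positive pair; other in-batch sentences act as negatives.
    z1, z2 = embed(student, s_batch), embed(student, s_batch)
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    ctr_loss = F.cross_entropy(sim, torch.arange(sim.size(0)))

    loss = kd_loss + ctr_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```
In this pattern the teacher is frozen and only supplies target embeddings, while the SimCSE-style term builds positive pairs from dropout noise without labeled pairs, which matches the limited availability of labeled Czech data mentioned in the abstract.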
Related papers
- Affordances Enable Partial World Modeling with LLMs [68.52975612311575]
We show that agents achieving task-agnostic, language-conditioned intents possess predictive partial-world models informed by affordances. In the multi-task setting, we introduce distribution-robust affordances and show that partial models can be extracted to significantly improve search efficiency.
arXiv Detail & Related papers (2026-02-11T00:25:25Z) - LOCUS: Low-Dimensional Model Embeddings for Efficient Model Exploration, Comparison, and Selection [15.182368486530128]
We propose LOCUS, a method that produces low-dimensional vector embeddings that compactly represent a language model's capabilities across queries. LOCUS is an attention-based approach that generates embeddings by a deterministic forward pass over query encodings and evaluation scores via an encoder model. We train a correctness predictor that uses model embeddings and query encodings to achieve state-of-the-art routing accuracy on unseen queries.
arXiv Detail & Related papers (2026-01-28T22:09:42Z) - KCM: KAN-Based Collaboration Models Enhance Pretrained Large Models [62.658961779827145]
We propose a KAN-based Collaborative Model (KCM) as an improved approach to large-small model collaboration. KAN offers superior visualizability and interpretability while mitigating catastrophic forgetting.
arXiv Detail & Related papers (2025-10-23T07:06:21Z) - Diffusion Model Quantization: A Review [36.22019054372206]
Recent success of large text-to-image models has underscored the exceptional performance of diffusion models in generative tasks. Diffusion model quantization has emerged as a pivotal technique for both compression and acceleration.
arXiv Detail & Related papers (2025-05-08T13:09:34Z) - EfficientLLaVA:Generalizable Auto-Pruning for Large Vision-language Models [64.18350535770357]
We propose an automatic pruning method for large vision-language models to enhance the efficiency of multimodal reasoning. Our approach only leverages a small number of samples to search for the desired pruning policy. We conduct extensive experiments on the ScienceQA, Vizwiz, MM-vet, and LLaVA-Bench datasets for the task of visual question answering.
arXiv Detail & Related papers (2025-03-19T16:07:04Z) - A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z) - Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models [0.0]
The growing number of connected devices around the world warrants compressed models that can be easily deployed on local devices with limited compute capacity and power.
We implement both quantization and pruning compression techniques on popular deep learning models used in image classification, object detection, language modeling, and generative modeling problem statements.
arXiv Detail & Related papers (2024-07-22T14:20:53Z) - Benchmarking for Deep Uplift Modeling in Online Marketing [17.70084353772874]
Deep uplift modeling (DUM), a promising technique, has attracted increasing research interest from academia and industry.
Current DUM research still lacks standardized benchmarks and unified evaluation protocols.
We provide an open benchmark for DUM and present comparison results of existing models in a reproducible and uniform manner.
arXiv Detail & Related papers (2024-06-01T07:23:37Z) - Tiny Models are the Computational Saver for Large Models [1.8350044465969415]
This paper introduces TinySaver, an early-exit-like dynamic model compression approach that adaptively substitutes tiny models for large ones; a minimal sketch of this tiny-to-large fallback pattern follows the list below.
Our evaluation of this approach on ImageNet-1k classification demonstrates its potential to reduce the number of compute operations by up to 90%, with only negligible losses in performance.
arXiv Detail & Related papers (2024-03-26T14:14:30Z) - RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z) - Compressing Sentence Representation with maximum Coding Rate Reduction [0.0]
In most natural language inference problems, sentence representations are needed for semantic retrieval tasks.
Due to hardware limitations on space and time, there is a need to attain comparable results when using a smaller model.
We demonstrate that the new language model with reduced complexity and sentence embedding size can achieve comparable results on semantic retrieval benchmarks.
arXiv Detail & Related papers (2023-04-25T09:23:43Z) - Investigating Ensemble Methods for Model Robustness Improvement of Text
Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z) - A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained
Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z) - Model Reuse with Reduced Kernel Mean Embedding Specification [70.044322798187]
We present a two-phase framework for finding helpful models for a current application.
In the upload phase, when a model is uploaded into the pool, we construct a reduced kernel mean embedding (RKME) as a specification for the model.
Then, in the deployment phase, the relatedness of the current task and the pre-trained models is measured based on the value of the RKME specification.
arXiv Detail & Related papers (2020-01-20T15:15:07Z) - Rethinking Generalization of Neural Models: A Named Entity Recognition
Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
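The TinySaver entry above describes letting a tiny model answer most inputs and deferring to a large model only when the tiny model is unsure. The sketch below is a minimal, hedged Python illustration of that routing idea; the confidence threshold and the assumption that both models are callables returning classification logits are illustrative, not taken from the paper.
```python
# Minimal sketch (not the paper's code) of a TinySaver-style cascade: a tiny
# model answers when its softmax confidence clears a threshold, otherwise the
# example is deferred to the large model. Threshold and model interfaces are assumptions.
import torch

@torch.no_grad()
def cascade_predict(x, tiny_model, large_model, threshold=0.9):
    """Return (predictions, deferred_mask) for one classification batch."""
    tiny_logits = tiny_model(x)                  # (B, num_classes)
    probs = torch.softmax(tiny_logits, dim=-1)
    conf, pred = probs.max(dim=-1)               # per-example confidence and label

    # Defer only the low-confidence examples to the large model.
    deferred = conf < threshold
    if deferred.any():
        pred[deferred] = large_model(x[deferred]).argmax(dim=-1)
    return pred, deferred
```
When most examples clear the threshold, the expensive model runs only on the residual fraction, which is where the compute savings reported in that entry would come from.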