Building a Macedonian Recipe Dataset: Collection, Parsing, and Comparative Analysis
- URL: http://arxiv.org/abs/2510.14128v1
- Date: Wed, 15 Oct 2025 21:54:23 GMT
- Title: Building a Macedonian Recipe Dataset: Collection, Parsing, and Comparative Analysis
- Authors: Darko Sasanski, Dimitar Peshevski, Riste Stojanov, Dimitar Trajanov
- Abstract summary: We present the first systematic effort to construct a Macedonian recipe dataset through web scraping and structured parsing. An exploratory analysis of ingredient frequency and co-occurrence patterns, using measures such as Pointwise Mutual Information and Lift score, highlights distinctive ingredient combinations that characterize Macedonian cuisine.
- Score: 0.0538441598991272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computational gastronomy increasingly relies on diverse, high-quality recipe datasets to capture regional culinary traditions. Although there are large-scale collections for major languages, Macedonian recipes remain under-represented in digital research. In this work, we present the first systematic effort to construct a Macedonian recipe dataset through web scraping and structured parsing. We address challenges in processing heterogeneous ingredient descriptions, including unit, quantity, and descriptor normalization. An exploratory analysis of ingredient frequency and co-occurrence patterns, using measures such as Pointwise Mutual Information and Lift score, highlights distinctive ingredient combinations that characterize Macedonian cuisine. The resulting dataset contributes a new resource for studying food culture in underrepresented languages and offers insights into the unique patterns of Macedonian culinary tradition.
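The abstract names Pointwise Mutual Information and Lift but this listing does not give the formulas the authors used. As a minimal sketch, assuming the standard definitions Lift(a, b) = P(a, b) / (P(a) P(b)) and PMI(a, b) = log2 Lift(a, b), with probabilities estimated as the fraction of recipes containing an ingredient or a pair, the two measures could be computed as follows. The function name `cooccurrence_scores` and the toy recipes are hypothetical and are not taken from the paper or its dataset.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_scores(recipes):
    """Return {(a, b): (pmi, lift)} for every ingredient pair that co-occurs.

    Assumption: probabilities are recipe-level fractions, so an ingredient
    repeated within one recipe is counted once.
    """
    n = len(recipes)
    single = Counter()
    pair = Counter()
    for ingredients in recipes:
        unique = sorted(set(ingredients))  # sorted so each pair has one key
        single.update(unique)
        pair.update(combinations(unique, 2))

    scores = {}
    for (a, b), count in pair.items():
        lift = (count / n) / ((single[a] / n) * (single[b] / n))
        scores[(a, b)] = (math.log2(lift), lift)
    return scores


# Toy data invented for illustration only.
recipes = [
    ["pepper", "tomato", "onion"],
    ["pepper", "tomato"],
    ["onion", "flour"],
    ["flour", "egg"],
]
scores = cooccurrence_scores(recipes)
pmi, lift = scores[("pepper", "tomato")]
print(f"pepper+tomato: PMI={pmi:.2f}, Lift={lift:.2f}")
```

A Lift above 1 (PMI above 0) means the pair appears together more often than independence would predict. Both measures tend to inflate scores for rare ingredients, so a minimum-count filter is a common addition in practice.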
Related papers
- Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe Retrieval [33.17028372962136]
Cross-modal representations to bridge the modality gap between images and recipes tend to ignore subtle recipe-specific details. This paper proposes a novel causal approach that predicts the culinary elements potentially overlooked in images. Experiments are conducted on the standard monolingual Recipe1M dataset and a newly curated multilingual multicultural cuisine dataset.
arXiv Detail & Related papers (2025-10-23T09:43:43Z)
- Retrieval Augmented Recipe Generation [96.43285670458803]
We propose a retrieval augmented large multimodal model for recipe generation. It retrieves recipes semantically related to the image from an existing datastore as a supplement. It calculates the consistency among generated recipe candidates, which use different retrieval recipes as context for generation.
arXiv Detail & Related papers (2024-11-13T15:58:50Z)
- A topological analysis of the space of recipes [0.0]
We introduce the use of topological data analysis, especially persistent homology, in order to study the space of culinary recipes.
In particular, persistent homology analysis provides a set of recipes surrounding the multiscale "holes" in the space of existing recipes.
arXiv Detail & Related papers (2024-06-12T01:28:16Z)
- CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions [23.21190348451355]
CookingSense is a descriptive collection of knowledge assertions in the culinary domain extracted from various sources.
CookingSense is constructed through a series of dictionary-based filtering and language model-based semantic filtering techniques.
We present FoodBench, a novel benchmark to evaluate culinary decision support systems.
arXiv Detail & Related papers (2024-05-01T13:58:09Z)
- Counterfactual Recipe Generation: Exploring Compositional Generalization in a Realistic Scenario [60.20197771545983]
We design the counterfactual recipe generation task, which asks models to modify a base recipe according to the change of an ingredient.
We collect a large-scale recipe dataset in Chinese for models to learn culinary knowledge.
Results show that existing models have difficulties in modifying the ingredients while preserving the original text style, and often miss actions that need to be adjusted.
arXiv Detail & Related papers (2022-10-20T17:21:46Z)
- Cross-lingual Adaptation for Recipe Retrieval with Mixup [56.79360103639741]
Cross-modal recipe retrieval has attracted research attention in recent years, thanks to the availability of large-scale paired data for training.
This paper studies unsupervised domain adaptation for image-to-recipe retrieval, where recipes in source and target domains are in different languages.
A novel recipe mixup method is proposed to learn transferable embedding features between the two domains.
arXiv Detail & Related papers (2022-05-08T15:04:39Z)
- Assistive Recipe Editing through Critiquing [34.1050269670062]
RecipeCrit is a hierarchical denoising auto-encoder that edits recipes given ingredient-level critiques.
Our work's main innovation is our unsupervised critiquing module that allows users to edit recipes by interacting with the predicted ingredients.
arXiv Detail & Related papers (2022-05-05T05:52:27Z)
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [17.42688184238741]
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives.
We propose a simplified end-to-end model based on well established and high performing encoders for text and images.
Our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset.
arXiv Detail & Related papers (2021-03-24T10:17:09Z)
- Multi-modal Cooking Workflow Construction for Food Recipes [147.4435186953995]
We build MM-ReS, the first large-scale dataset for cooking workflow construction.
We propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow.
arXiv Detail & Related papers (2020-08-20T18:31:25Z)
- Decomposing Generation Networks with Structure Prediction for Recipe Generation [142.047662926209]
We propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction.
Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase.
Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure.
arXiv Detail & Related papers (2020-07-27T08:47:50Z)
- Classification of Cuisines from Sequentially Structured Recipes [8.696042114987966]
Classification of cuisines based on their culinary features is an outstanding problem.
We have implemented a range of classification techniques by accounting for this information on the RecipeDB dataset.
The state-of-the-art RoBERTa model presented the highest accuracy of 73.30% among a range of classification models.
arXiv Detail & Related papers (2020-04-26T05:40:36Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.