Fugu-MT Paper Translation (Abstract): Tourism Question Answer System in Indian Language using Domain-Adapted Foundation Models

Paper overview: Tourism Question Answer System in Indian Language using Domain-Adapted Foundation Models

arxiv url: http://arxiv.org/abs/2511.23235v1
Date: Fri, 28 Nov 2025 14:44:16 GMT
Status: translation complete
Last updated in system: 2025-12-01 19:47:55.937931
Title: Tourism Question Answer System in Indian Language using Domain-Adapted Foundation Models
Authors: Praveen Gatla, Anushka, Nikita Kanwar, Gouri Sahoo, Rajesh Kumar Mundotiya
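Abstract summary: This paper presents a comprehensive study on designing a baseline extractive question-answering (QA) system for the Hindi tourism domain. It targets ten tourism-centric subdomains: Ganga Aarti, Cruise, Food Court, Public Toilet, Kund, Museum, General, Ashram, Temple, and Travel. To optimize parameter efficiency and task performance, the authors propose a framework that uses the foundation models BERT and RoBERTa, fine-tuned with Supervised Fine-Tuning (SFT) and Low-Rank Adaptation (LoRA).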
Reference score (independently computed attention score): 0.6524460254566904
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This article presents the first comprehensive study on designing a baseline extractive question-answering (QA) system for the Hindi tourism domain, with a specialized focus on Varanasi, a cultural and spiritual hub renowned for its Bhakti-Bhaav (devotional ethos). Targeting ten tourism-centric subdomains (Ganga Aarti, Cruise, Food Court, Public Toilet, Kund, Museum, General, Ashram, Temple, and Travel), the work addresses the absence of language-specific QA resources in Hindi for culturally nuanced applications. A dataset of 7,715 Hindi QA pairs pertaining to Varanasi tourism was constructed and subsequently augmented with 27,455 pairs generated via Llama zero-shot prompting. We propose a framework that leverages two foundation models, BERT and RoBERTa, fine-tuned using Supervised Fine-Tuning (SFT) and Low-Rank Adaptation (LoRA), to optimize parameter efficiency and task performance. Multiple variants of BERT, including language-specific pre-trained models (e.g., Hindi-BERT), are evaluated to assess their suitability for low-resource, domain-specific QA. The evaluation metrics (F1, BLEU, and ROUGE-L) highlight trade-offs between answer precision and linguistic fluency. Experiments demonstrate that LoRA-based fine-tuning achieves competitive performance (85.3% F1) while reducing trainable parameters by 98% compared to SFT, striking a balance between efficiency and accuracy. Comparative analysis across models reveals that RoBERTa with SFT outperforms the BERT variants in capturing contextual nuances, particularly for culturally embedded terms (e.g., Aarti, Kund). This work establishes a foundational baseline for Hindi tourism QA systems, emphasizing the role of LoRA in low-resource settings and underscoring the need for culturally contextualized NLP frameworks in the tourism domain.
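The parameter savings that the abstract attributes to LoRA can be illustrated with simple arithmetic. The sketch below compares the number of trainable parameters for one weight matrix under full fine-tuning against a rank-r LoRA update, where the update is factored into two small matrices. The 768x768 shape and the rank of 8 are illustrative assumptions, not values reported in the abstract, and the paper's 98% figure is measured over a whole model rather than one matrix.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update W + B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in);
    the original matrix W stays frozen.
    """
    return rank * (d_out + d_in)


def reduction(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of trainable parameters saved relative to full fine-tuning."""
    full = full_finetune_params(d_out, d_in)
    return 1.0 - lora_params(d_out, d_in, rank) / full


if __name__ == "__main__":
    # Assumed, illustrative sizes: a 768x768 projection and rank 8.
    d, r = 768, 8
    print(full_finetune_params(d, d))    # 589824
    print(lora_params(d, d, r))          # 12288
    print(f"{reduction(d, d, r):.1%}")   # 97.9%
```

Because the LoRA count grows with the sum of the two dimensions rather than their product, the saving becomes larger as the matrix grows or the rank shrinks.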
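The abstract reports F1 for extractive QA but does not say how it is computed. A common convention for this task is token-overlap F1 between the predicted and reference answer spans; the sketch below assumes that convention and simple whitespace tokenization, which may differ from the authors' evaluation setup. The Hindi strings in the usage example are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span.

    Assumes whitespace tokenization; no normalization is applied.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical answer spans: the prediction has two extra tokens.
    print(token_f1("गंगा आरती शाम को", "गंगा आरती"))
```

In the usage example the prediction covers the reference fully (recall 1.0) but half of its tokens are extra (precision 0.5), which is the kind of precision trade-off that a span-level F1 exposes.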
