Building Domain-Specific LLMs Faithful To The Islamic Worldview: Mirage
or Technical Possibility?
- URL: http://arxiv.org/abs/2312.06652v1
- Date: Mon, 11 Dec 2023 18:59:09 GMT
- Authors: Shabaz Patel, Hassan Kane, Rayhan Patel
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable performance across numerous natural language understanding use cases.
In the context of Islam and its representation, accurate and factual representation of its beliefs and teachings rooted in the Quran and Sunnah is key.
This work focuses on the challenge of building domain-specific LLMs faithful to the Islamic worldview.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across
numerous natural language understanding use cases. However, this impressive
performance comes with inherent limitations, such as the tendency to perpetuate
stereotypical biases or fabricate non-existent facts. In the context of Islam
and its representation, accurate and factual representation of its beliefs and
teachings rooted in the Quran and Sunnah is key. This work focuses on the
challenge of building domain-specific LLMs faithful to the Islamic worldview
and proposes ways to build and evaluate such systems. Firstly, we define this
open-ended goal as a technical problem and propose various solutions.
Subsequently, we critically examine known challenges inherent to each approach
and highlight evaluation methodologies that can be used to assess such systems.
This work highlights the need for high-quality datasets, evaluations, and
interdisciplinary work blending machine learning with Islamic scholarship.
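The evaluation methodologies the abstract points to could, at their simplest, take the form of a reference-based scoring harness that compares model answers against scholar-validated reference answers. The sketch below is illustrative only: the `answer_fn` stub, the sample question, and the token-overlap F1 scorer are assumptions for the sake of a runnable example, not the authors' proposed method.

```python
# Minimal sketch of a reference-based faithfulness evaluation:
# score a model's answers against scholar-validated reference
# answers with a simple token-overlap F1. Illustrative only.

def token_f1(prediction: str, reference: str) -> float:
    """F1 overlap between whitespace-tokenized prediction and reference."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate(answer_fn, dataset):
    """Average F1 of answer_fn over (question, reference_answer) pairs."""
    scores = [token_f1(answer_fn(q), ref) for q, ref in dataset]
    return sum(scores) / len(scores)

# Stand-in for a real LLM call; a scholar-validated dataset of
# question/answer pairs would replace this toy example.
dataset = [("How many daily prayers are obligatory in Islam?",
            "Five daily prayers are obligatory.")]
echo_model = lambda q: "Five daily prayers are obligatory."

print(round(evaluate(echo_model, dataset), 2))  # identical answer -> 1.0
```

In practice, token overlap is a weak proxy for doctrinal faithfulness; the abstract's call for interdisciplinary work suggests replacing the scorer with human review by qualified scholars, with automatic metrics used only for coarse filtering.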
Related papers
- LLM-Crowdsourced: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models [13.713870642186254]
Large language models (LLMs) demonstrate remarkable capabilities across various tasks. Existing evaluation methods suffer from issues such as data contamination, black-box operation, and subjective preference. We propose a novel benchmark-free evaluation paradigm, LLM-Crowdsourced.
arXiv Detail & Related papers (2025-07-30T03:50:46Z) - Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models [10.1080193179562]
Current understanding models excel at recognizing "what" but fall short in high-level cognitive tasks like causal reasoning and future prediction. We propose a novel framework that fuses a powerful Vision Foundation Model for deep visual perception with a Large Language Model (LLM) serving as a knowledge-driven reasoning core.
arXiv Detail & Related papers (2025-07-08T09:43:17Z) - Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation [75.26829371493189]
Large language models (LLMs) have demonstrated impressive reasoning capacities that mirror human-like thinking. Existing reasoning benchmarks either focus on domain-specific knowledge (crystallized intelligence) or lack interpretability. We propose DRE-Bench, a dynamic reasoning evaluation benchmark grounded in a hierarchical cognitive framework.
arXiv Detail & Related papers (2025-06-03T09:01:08Z) - A Call for New Recipes to Enhance Spatial Reasoning in MLLMs [85.67171333213301]
Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks.
Recent studies have exposed critical limitations in their spatial reasoning capabilities.
This deficiency in spatial reasoning significantly constrains MLLMs' ability to interact effectively with the physical world.
arXiv Detail & Related papers (2025-04-21T11:48:39Z) - ChineseSimpleVQA -- "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models [38.921977141721605]
We introduce the first factuality-based visual question-answering benchmark in Chinese, named ChineseSimpleVQA.
Key features of this benchmark include a focus on the Chinese language, diverse knowledge types, a multi-hop question construction, high-quality data, static consistency, and easy-to-evaluate through short answers.
arXiv Detail & Related papers (2025-02-17T12:02:23Z) - Challenges in Guardrailing Large Language Models for Science [0.21990652930491852]
We provide guidelines for deploying large language models (LLMs) in the scientific domain.
We identify specific challenges -- including time sensitivity, knowledge contextualization, conflict resolution, and intellectual property concerns.
These guardrail dimensions include trustworthiness, ethics & bias, safety, and legal aspects.
arXiv Detail & Related papers (2024-11-12T20:57:12Z) - BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts [59.83547898874152]
BloomWise is a cognitively-inspired prompting technique for large language models (LLMs). It is designed to enhance LLMs' performance on mathematical problem solving while making their solutions more explainable.
arXiv Detail & Related papers (2024-10-05T09:27:52Z) - FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" [74.7488607599921]
FaithEval is a benchmark to evaluate the faithfulness of large language models (LLMs) in contextual scenarios.
FaithEval comprises 4.9K high-quality problems in total, validated through a rigorous four-stage context construction and validation framework.
arXiv Detail & Related papers (2024-09-30T06:27:53Z) - A Benchmark Dataset with Larger Context for Non-Factoid Question Answering over Islamic Text [0.16385815610837165]
We introduce a comprehensive dataset meticulously crafted for Question-Answering purposes within the domain of Quranic Tafsir and Ahadith.
This dataset comprises a robust collection of over 73,000 question-answer pairs, standing as the largest reported dataset in this specialized domain.
While this paper highlights the dataset's contributions, our subsequent human evaluation uncovered critical insights regarding the limitations of existing automatic evaluation techniques.
arXiv Detail & Related papers (2024-09-15T19:50:00Z) - Towards Few-Shot Learning in the Open World: A Review and Beyond [52.41344813375177]
Few-shot learning (FSL) aims to mimic human intelligence by enabling significant generalization and transferability.
This paper presents a review of recent advancements designed to adapt FSL for use in open-world settings.
We categorize existing methods into three distinct types of open-world few-shot learning: those involving varying instances, varying classes, and varying distributions.
arXiv Detail & Related papers (2024-08-19T06:23:21Z) - Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs).
MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task.
We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z) - Standards for Belief Representations in LLMs [0.0]
We argue that a representation must meet certain adequacy conditions to count as belief-like.
We establish four criteria that balance theoretical considerations with practical constraints.
Our proposed criteria include accuracy, coherence, uniformity, and use.
arXiv Detail & Related papers (2024-05-31T17:21:52Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - From Understanding to Utilization: A Survey on Explainability for Large
Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z) - AesBench: An Expert Benchmark for Multimodal Large Language Models on
Image Aesthetics Perception [64.25808552299905]
AesBench is an expert benchmark aiming to comprehensively evaluate the aesthetic perception capacities of MLLMs.
We construct an Expert-labeled Aesthetics Perception Database (EAPD), which features diversified image contents and high-quality annotations provided by professional aesthetic experts.
We propose a set of integrative criteria to measure the aesthetic perception abilities of MLLMs from four perspectives, including Perception (AesP), Empathy (AesE), Assessment (AesA), and Interpretation (AesI).
arXiv Detail & Related papers (2024-01-16T10:58:07Z) - Brain in a Vat: On Missing Pieces Towards Artificial General
Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents.
We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations.
We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.