Palestine-RAG

Retrieval-Augmented Generation for contextually grounded and factually verified knowledge about Palestine

Abstract

This paper presents Palestine-RAG, a domain-specific Retrieval-Augmented Generation (RAG) framework developed to counter the underrepresentation and mischaracterization of Palestinian history, legal discourse, and current events in mainstream language models. We construct a high-quality, culturally informed dataset by aggregating content from authoritative sources including Palquest.org, United Nations resolutions, International Court of Justice (ICJ) rulings, historical archives, and reputable news outlets. To evaluate model performance, we introduce the first multiple-choice question (MCQ) benchmarking dataset for this domain, comprising 222 manually crafted questions systematically categorized according to Bloom’s Taxonomy to capture varying levels of cognitive complexity. We benchmark 26 language models and demonstrate that retrieval-augmented approaches consistently outperform non-retrieval large language models in both factual accuracy and depth of reasoning, particularly within politically nuanced and historically complex contexts.

🌐 Vision

Palestine-RAG is a context-grounded, retrieval-driven AI system that aims to combine knowledge integrity, transparency, and scalability. It is a step toward AI models that reason over history responsibly, guided by verifiable data and contextual awareness.

The long-term goal is to expand Palestine-RAG into a multi-domain retrieval ecosystem that integrates dynamic updates from news sources, research repositories, and archival databases, so that generated responses remain current, factual, and contextually precise.

Palestine-RAG: Retrieval-Augmented Generation for Contextually Grounded and Factually Verified Knowledge

Palestine-RAG is a large language model (LLM)-powered Retrieval-Augmented Generation (RAG) system designed to provide accurate, contextually grounded, and evidence-backed answers to questions about Palestine’s history, culture, politics, and legal discourse.

The system addresses a key limitation of general-purpose LLMs, namely hallucination and bias on politically sensitive topics, by tightly coupling generative reasoning with document-level factual retrieval from trusted and verified sources.

⚙️ Core Concept

At its foundation, Palestine-RAG uses a retriever–generator pipeline:

  1. The retriever searches a Palestine-specific vector database containing semantically encoded documents from curated sources.
  2. The generator (a fine-tuned local LLM) synthesizes the retrieved context into coherent, well-structured responses.

Each answer is built from verifiable passages, ensuring transparency and traceability.
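
The retriever–generator flow described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `Passage` and `Answer` types, the `answer_question` function, and the toy retriever and generator are all hypothetical stand-ins (keyword overlap replaces semantic search, and a fixed string replaces the LLM call). The point it shows is that the answer object carries the identifiers of the passages it was built from, which is what makes a response traceable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Passage:
    doc_id: str  # hypothetical identifier of the source document
    text: str


@dataclass
class Answer:
    text: str
    sources: List[str]  # doc_ids of the passages the answer was built from


def answer_question(
    question: str,
    retrieve: Callable[[str], List[Passage]],
    generate: Callable[[str, List[Passage]], str],
) -> Answer:
    """Run retrieval first, then generation, and keep the passage ids."""
    passages = retrieve(question)
    text = generate(question, passages)
    return Answer(text=text, sources=[p.doc_id for p in passages])


# Toy stand-ins so the sketch runs without a vector database or an LLM.
CORPUS = [
    Passage("doc-a", "Placeholder passage about topic alpha."),
    Passage("doc-b", "Placeholder passage about topic beta."),
]


def toy_retrieve(question: str) -> List[Passage]:
    # Keyword overlap stands in for semantic search (assumption).
    words = set(question.lower().split())
    return [
        p for p in CORPUS
        if words & set(p.text.lower().rstrip(".").split())
    ]


def toy_generate(question: str, passages: List[Passage]) -> str:
    # A real generator would call the LLM with the passages as context.
    return f"Answer drawn from {len(passages)} passage(s)."


result = answer_question("what is alpha", toy_retrieve, toy_generate)
print(result.text, result.sources)
```

Passing the retriever and generator in as callables keeps the sketch independent of any particular vector database or model.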

🔍 Objectives

  • Deliver factual, context-aware answers to user queries related to Palestine.
  • Combat misinformation by grounding outputs in authenticated, domain-specific data.
  • Demonstrate the practical application of retrieval-augmented generation in a culturally and politically nuanced domain.
  • Serve as an open research platform for evaluating truthful reasoning in RAG-based systems.

🧠 Key Features

1. Retrieval-Augmented Knowledge Grounding

  • All model responses are built from a curated knowledge base containing:

    • United Nations resolutions related to Palestine.
    • International Court of Justice (ICJ) and International Criminal Court (ICC) documents.
    • Historical archives and academic publications.
    • Credible news sources such as Reuters, Al Jazeera, and Middle East Eye.
    • Specialized open-source repositories on Palestinian history, law, and geopolitics.
  • The retriever employs semantic search via dense embeddings to extract contextually relevant content with high precision.
  • Retrieved passages are appended to the model prompt to ground the final response.
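
As a rough illustration of how retrieved passages might be appended to the model prompt, the sketch below numbers each passage so the generated answer can refer back to it. The function name `build_prompt`, the instruction wording, and the source labels are assumptions made for this example; the project's actual prompt template is not described here.

```python
from typing import List, Tuple


def build_prompt(question: str, passages: List[Tuple[str, str]]) -> str:
    """Append numbered passages to the prompt so the answer can cite them.

    `passages` holds (source_label, text) pairs. The labels and the
    instruction wording are illustrative, not the project's real prompt.
    """
    lines = [
        "Answer the question using only the sources below.",
        "Cite sources by their bracketed number.",
        "",
        "Sources:",
    ]
    for i, (label, text) in enumerate(passages, start=1):
        lines.append(f"[{i}] ({label}) {text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


prompt = build_prompt(
    "What does source one say?",
    [
        ("example-source-1", "First placeholder passage."),
        ("example-source-2", "Second placeholder passage."),
    ],
)
print(prompt)
```

Numbering the passages gives the reference panel a simple way to map a citation in the answer back to the document that supplied it.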

2. Factually Verified Question Answering

  • The model generates detailed, contextually rich answers that include:

    • Explanations supported by retrieved evidence.
    • Optional source citations or snippets showing which documents informed the response.
    • Hierarchical context awareness, ensuring coherence across multiple exchanges within the same conversation.

3. Interactive Web Interface

  • A modern and seamless chat interface built with React, TypeScript, and Tailwind CSS.
  • Features include:

    • Real-time streaming of model outputs.
    • Chat history displayed in a sidebar with timestamps and preview snippets.
    • Integrated user feedback system (rating or thumbs up/down).
    • Reference panel showing retrieved documents or excerpts.
  • The design prioritizes clarity, responsiveness, and user focus, ensuring an immersive experience without visual clutter.

🧩 System Architecture

Palestine-RAG operates through four main stages:

1. Data Curation and Processing

  • Aggregates thousands of authoritative documents from legal, academic, and media domains.
  • Performs:

    • Text normalization
    • Deduplication
    • Sentence segmentation
    • Embedding generation using transformer-based encoders
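
The first three processing steps can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions: normalization is taken to mean Unicode NFKC plus whitespace collapsing, deduplication is exact-match after normalization and lowercasing, and sentence segmentation uses a naive punctuation rule. The function names are hypothetical, and embedding generation is left out because the encoder is not named in this description.

```python
import hashlib
import re
import unicodedata
from typing import Iterable, List


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(docs: Iterable[str]) -> List[str]:
    """Drop exact duplicates after normalization, keeping first occurrences.

    Comparison is case-insensitive, which is an assumption of this sketch.
    """
    seen = set()
    unique = []
    for doc in docs:
        clean = normalize(doc)
        digest = hashlib.sha256(clean.lower().encode("utf-8")).hexdigest()
        if clean and digest not in seen:
            seen.add(digest)
            unique.append(clean)
    return unique


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


raw_docs = [
    "First  sentence.\nSecond sentence!",
    "first sentence. second sentence!",  # duplicate up to case and spacing
    "A different document.",
]
docs = deduplicate(raw_docs)
sentences = [s for d in docs for s in split_sentences(d)]
print(sentences)
```

A production pipeline would likely need near-duplicate detection and a language-aware sentence splitter; the regular expression here will mis-split abbreviations such as "U.N." when they are followed by a space.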

2. Vector Database Layer

  • All processed text chunks are stored in a vector database optimized for high-dimensional similarity search.
  • Queries are embedded and compared via cosine similarity to identify top-k relevant segments.
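
For reference, cosine similarity between a query vector q and a chunk vector d is (q · d) / (‖q‖ ‖d‖). The sketch below applies it to a tiny in-memory index and returns the top-k chunks. The three-dimensional vectors, the chunk identifiers, and the brute-force scan are placeholders for illustration; a vector database would normally use an approximate nearest-neighbour index rather than scoring every chunk.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (|a| |b|), or 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every chunk against the query and keep the k best."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: real embeddings have hundreds of dimensions.
index = [
    ("chunk-1", [1.0, 0.0, 0.0]),
    ("chunk-2", [0.0, 1.0, 0.0]),
    ("chunk-3", [1.0, 1.0, 0.0]),
]
hits = top_k([1.0, 0.0, 0.0], index, k=2)
print(hits)
```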

3. Retrieval and Context Assembly

  • Selected documents are concatenated and structured into a context prompt.
  • A context-window optimization step keeps the retrieved passages relevant and non-redundant, which matters most for long queries.
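
The optimization algorithm itself is not specified here, so the following is only one plausible sketch of the idea: walk the passages in order of relevance, skip any that would exceed a size budget, and skip any that overlap heavily with a passage already kept. The word-count budget (a crude proxy for tokens), the Jaccard overlap measure, the 0.8 threshold, and the function names are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap between two passages."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def select_context(
    ranked: List[Tuple[str, float]],
    max_words: int,
    max_overlap: float = 0.8,
) -> List[str]:
    """Greedily keep high-scoring passages that fit the budget and add new text.

    `ranked` holds (passage, relevance_score) pairs. The word budget stands in
    for a token budget, and both thresholds are illustrative values.
    """
    selected: List[str] = []
    used = 0
    for passage, _score in sorted(ranked, key=lambda p: p[1], reverse=True):
        length = len(passage.split())
        if used + length > max_words:
            continue  # does not fit the remaining budget; try later passages
        if any(jaccard(passage, kept) > max_overlap for kept in selected):
            continue  # near-duplicate of a passage already kept
        selected.append(passage)
        used += length
    return selected


ranked = [
    ("alpha beta gamma delta", 0.9),
    ("alpha beta gamma delta", 0.8),       # redundant copy, skipped
    ("epsilon zeta eta theta iota", 0.7),  # new content, fits the budget
    ("kappa lambda", 0.6),                 # would exceed the budget, skipped
]
context = select_context(ranked, max_words=10)
print("\n".join(context))
```

A greedy pass like this is simple to reason about but not optimal; it can pass over a slightly lower-scoring passage that would have used the budget better.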

4. Generative Response Synthesis

  • The model generates a structured, human-readable answer while referencing retrieved evidence.
  • Outputs are filtered, formatted, and streamed back to the user through the frontend interface.

📊 Evaluation Framework

Palestine-RAG includes a custom cognitive benchmarking framework that evaluates performance using a multiple-choice question (MCQ) dataset grounded in Bloom’s Taxonomy. This benchmark assesses:

  • Factual recall
  • Conceptual understanding
  • Analytical reasoning
  • Judgment and synthesis capabilities

The system has been tested against 26 open-source LLM configurations and shows up to 80% higher factual accuracy than non-retrieval models. The best-performing version, Palestine-RAG (Qwen3-4B), achieved an accuracy of over 92% on domain-specific benchmarks.
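
To make the scoring concrete, the sketch below shows one way accuracy on an MCQ benchmark could be broken down by cognitive level. The record layout, the level labels (taken from the list above), and the function name `accuracy_by_level` are assumptions for this example, and the four records are made-up placeholders, not benchmark data or results.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each record: (cognitive_level, gold_option, predicted_option)
Record = Tuple[str, str, str]


def is_correct(gold: str, predicted: str) -> bool:
    """Compare option letters, ignoring case and surrounding whitespace."""
    return predicted.strip().upper() == gold.strip().upper()


def accuracy_by_level(records: List[Record]) -> Dict[str, float]:
    """Return the fraction of correctly answered questions per level."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for level, gold, predicted in records:
        total[level] += 1
        if is_correct(gold, predicted):
            correct[level] += 1
    return {level: correct[level] / total[level] for level in total}


# Placeholder records for illustration only.
records = [
    ("factual_recall", "A", "A"),
    ("factual_recall", "C", "b"),
    ("analytical_reasoning", "D", "d"),
    ("judgment_and_synthesis", "B", "B"),
]
per_level = accuracy_by_level(records)
overall = sum(is_correct(g, p) for _, g, p in records) / len(records)
print(per_level, overall)
```

Reporting accuracy per level as well as overall shows whether retrieval helps mainly with recall questions or also with those requiring analysis and judgment.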

🧭 Impact

Palestine-RAG aims to:

  • Establish a transparent and auditable AI framework for complex geopolitical and historical domains.
  • Promote evidence-based public understanding of Palestinian history and rights.
  • Serve as a research and educational tool for policy analysts, students, and journalists seeking reliable, AI-assisted insights.
  • Demonstrate how domain-specific RAG architectures can outperform generic large-scale models in factuality and ethical alignment.

Open this project page

The project page can be opened directly in a browser:

  • View: https://llm-lab.qcri.org/palestine/