Build a RAG Pipeline with Python

This guide provides a step-by-step walkthrough of building a complete Retrieval-Augmented Generation (RAG) system. By leveraging FoxNose as your knowledge layer, you will create a working Q&A application that delivers accurate, grounded answers sourced directly from your FoxNose knowledge base.

What you'll build: A Python application that:

  1. Takes a user question.
  2. Searches your FoxNose knowledge base for relevant context.
  3. Sends the context and question to an LLM (e.g., GPT-4o).
  4. Returns a grounded answer with clear source references.

Prerequisites:

  • A FoxNose environment with some content already present (see our Quick Start guide to set this up if needed).
  • A configured Flux API with at least one folder connected and accessible.
  • Python 3.9 or higher.
  • An OpenAI API key.

Step 1: Set Up Your Project

Begin by creating a new project directory and setting up a virtual environment. Then, install the necessary Python libraries:

mkdir foxnose-rag && cd foxnose-rag
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

pip install foxnose-sdk openai python-dotenv

  • foxnose-sdk: The official FoxNose Python SDK for interacting with the Flux API.
  • openai: The official client library for interacting with OpenAI's LLMs.
  • python-dotenv: To load environment variables from a .env file for secure credential management.

Next, create a .env file in your project's root directory to securely store your credentials:

OPENAI_API_KEY=sk-...
FOXNOSE_ENV_KEY=your-environment-key            # Your FoxNose Environment Key
FOXNOSE_PUBLIC_KEY=your-public-key              # Your Flux API public key
FOXNOSE_SECRET_KEY=your-secret-key              # Your Flux API secret key
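
A missing or empty credential is the most common setup mistake. As an optional sanity check, you can verify that all four variables are present after calling load_dotenv(). Note that check_env is a hypothetical helper for this guide, not part of the FoxNose SDK:

```python
import os

# The four variables this guide's .env file defines.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "FOXNOSE_ENV_KEY",
    "FOXNOSE_PUBLIC_KEY",
    "FOXNOSE_SECRET_KEY",
]

def check_env(environ=os.environ) -> list:
    """Return the names of any required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Call it once at startup and fail fast with a clear message if the returned list is non-empty.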

Step 2: Initialize the FoxNose SDK

Create a file named foxnose_client.py. This module initializes the FoxNose SDK and provides a flexible search function.

import os
from dataclasses import dataclass
from typing import List, Optional
from dotenv import load_dotenv
from foxnose_sdk.flux import FluxClient
from foxnose_sdk.auth import SimpleKeyAuth

load_dotenv()


@dataclass
class SearchResult:
    """A simple data structure to hold each search result from FoxNose."""
    key: str
    title: str
    content: str
    url: Optional[str] = None

    def __str__(self):
        return f"[{self.title}]\n{self.content}"


def get_flux_client(api_prefix: str) -> FluxClient:
    """Creates a FluxClient for the specified API prefix."""
    return FluxClient(
        base_url=f"https://{os.getenv('FOXNOSE_ENV_KEY')}.fxns.io",
        api_prefix=api_prefix,
        auth=SimpleKeyAuth(
            os.getenv("FOXNOSE_PUBLIC_KEY"),
            os.getenv("FOXNOSE_SECRET_KEY"),
        ),
    )


def search_knowledge_base(
    api_prefix: str,
    folder_path: str,
    query: str,
    *,
    mode: str = "hybrid",
    top_k: int = 5,
    similarity_threshold: float = 0.7,
    filters: Optional[dict] = None,
    content_field: str = "body",
    title_field: str = "title",
    url_field: str = "url",
) -> List[SearchResult]:
    """
    Searches a FoxNose folder for relevant content.

    Args:
        api_prefix: The Flux API prefix (e.g., 'my_api').
        folder_path: Path to the folder to search (e.g., 'knowledge-base').
        query: The natural language search query.
        mode: Search mode - 'vector', 'hybrid', or 'text'.
        top_k: Maximum number of results to return.
        similarity_threshold: Minimum similarity score (0-1) for vector results.
        filters: Optional structured filters (e.g., {"status__eq": "published"}).
        content_field: Name of the field containing main content.
        title_field: Name of the field containing the title.
        url_field: Name of the field containing the source URL.
    """
    client = get_flux_client(api_prefix)

    # Build the search request body
    body = {"limit": top_k}

    if mode == "vector":
        body["search_mode"] = "vector"
        body["vector_search"] = {
            "query": query,
            "top_k": top_k,
            "similarity_threshold": similarity_threshold,
        }
    elif mode == "hybrid":
        body["search_mode"] = "hybrid"
        body["find_text"] = {"query": query}
        body["vector_search"] = {
            "query": query,
            "top_k": top_k,
            "similarity_threshold": similarity_threshold,
        }
    else:  # text mode
        body["find_text"] = {"query": query}

    # Add structured filters if provided
    if filters:
        body["where"] = {"$": {"all_of": [{k: v} for k, v in filters.items()]}}

    # Execute the search, ensuring the client is closed even if the request fails
    try:
        response = client.search(folder_path, body=body)
    finally:
        client.close()

    # Parse results into SearchResult objects
    results = []
    for resource in response.get("results", []):
        data = resource.get("data", {})
        results.append(SearchResult(
            key=resource.get("_sys", {}).get("key", ""),
            title=data.get(title_field, "Untitled"),
            content=data.get(content_field, ""),
            url=data.get(url_field),
        ))

    return results

This code returns structured SearchResult objects with optional source URLs for citations. Field names (content_field, title_field, url_field) are configurable per call.
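
To make the request shapes concrete, here is a standalone sketch that mirrors the body-building logic from search_knowledge_base for each mode. It requires no SDK, so you can inspect what each mode sends:

```python
def build_search_body(query: str, mode: str, top_k: int = 5,
                      similarity_threshold: float = 0.7) -> dict:
    """Mirror of the body-building logic in search_knowledge_base."""
    body = {"limit": top_k}
    vector_part = {
        "query": query,
        "top_k": top_k,
        "similarity_threshold": similarity_threshold,
    }
    if mode == "vector":
        body["search_mode"] = "vector"
        body["vector_search"] = vector_part
    elif mode == "hybrid":
        # Hybrid sends both a text query and a vector query
        body["search_mode"] = "hybrid"
        body["find_text"] = {"query": query}
        body["vector_search"] = vector_part
    else:  # text mode
        body["find_text"] = {"query": query}
    return body

hybrid_body = build_search_body("reset password", "hybrid", top_k=3)
```

Note that text mode omits search_mode entirely and sends only find_text, while the two vector-capable modes set search_mode explicitly.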


Step 3: Build the RAG Function

Create rag.py. This file will contain the core logic for your RAG pipeline, which orchestrates the calls to FoxNose and your LLM.

from typing import List, Optional

from openai import OpenAI

from foxnose_client import search_knowledge_base, SearchResult

openai_client = OpenAI()

# Configure which API and folder to search (adjust for your setup)
API_PREFIX = "my_api"
FOLDER_PATH = "path/to/knowledge-base"


def build_prompt_context(results: List[SearchResult]) -> str:
    """Formats FoxNose search results into a string for the LLM context."""
    if not results:
        return "No relevant information was found in the knowledge base."

    context_parts = []
    for i, result in enumerate(results, 1):
        source_info = f"[Source {i}: {result.title}]"
        if result.url:
            source_info += f" ({result.url})"
        context_parts.append(f"{source_info}\n{result.content}")

    return "\n\n---\n\n".join(context_parts)


def ask(
    question: str,
    top_k: int = 5,
    similarity_threshold: float = 0.7,
    filters: Optional[dict] = None,
) -> dict:
    """
    Asks a question, retrieves context from FoxNose, and generates an answer using an LLM.

    Returns a dictionary containing the answer and the sources used.
    """
    # 1. Retrieve relevant context from your FoxNose knowledge base
    print(f"Searching for context related to: '{question}'...")
    results = search_knowledge_base(
        API_PREFIX,
        FOLDER_PATH,
        question,
        mode="hybrid",  # Combines semantic + text search for best results
        top_k=top_k,
        similarity_threshold=similarity_threshold,
        filters=filters,
    )

    # 2. Build the context string to be injected into the LLM prompt
    context = build_prompt_context(results)

    # 3. Generate a grounded answer using the LLM and the retrieved context
    system_prompt = """You are a helpful assistant that answers questions based ONLY on the provided context.
- If the context doesn't contain the answer, state that you don't have enough information.
- Be concise and direct.
- When possible, cite the source title and include the URL so users can read more."""

    user_prompt = f"""Context:
{context}

Question: {question}

Answer:"""

    print("Generating answer with LLM...")
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0,
    )

    answer = response.choices[0].message.content

    # 4. Return the final answer and the source documents for verification
    return {
        "answer": answer,
        "sources": [
            {"key": r.key, "title": r.title, "url": r.url}
            for r in results
        ],
    }

Key design decisions in this RAG function:

  • API_PREFIX and FOLDER_PATH are defined as constants here for simplicity, but you can make them parameters if your agent needs to search multiple sources.
  • Uses hybrid search mode by default for the best balance of semantic understanding and keyword matching.
  • The system_prompt strictly instructs the LLM to only use the provided context, reducing hallucinations.
  • Source URLs are included in the context, allowing the LLM to cite them in responses.
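
To see exactly what context the LLM receives, here is a self-contained sketch that reproduces SearchResult and build_prompt_context from the two modules above, with illustrative sample data (not from a real knowledge base):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SearchResult:
    key: str
    title: str
    content: str
    url: Optional[str] = None

def build_prompt_context(results: List[SearchResult]) -> str:
    """Format search results into a source-labeled context string."""
    if not results:
        return "No relevant information was found in the knowledge base."
    parts = []
    for i, r in enumerate(results, 1):
        source_info = f"[Source {i}: {r.title}]"
        if r.url:
            source_info += f" ({r.url})"
        parts.append(f"{source_info}\n{r.content}")
    return "\n\n---\n\n".join(parts)

# Illustrative sample results
sample = [
    SearchResult("abc123", "Password Reset", "Go to Settings > Security.",
                 "https://docs.example.com/reset"),
    SearchResult("def456", "Account FAQ", "Accounts are free to create."),
]
print(build_prompt_context(sample))
```

Each result becomes a numbered, titled block separated by `---`, so the LLM can attribute statements to a specific source when it cites.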

Step 4: Create the Main Application Loop

Create a final file, main.py, to run your Q&A application from the command line.

from rag import ask

def main():
    print("FoxNose RAG Q&A System")
    print("Type 'quit' or 'exit' to stop.\n")

    while True:
        question = input("Question: ").strip()
        if question.lower() in ['quit', 'exit']:
            break
        if not question:
            continue

        result = ask(question)

        print(f"\nAnswer: {result['answer']}\n")

        if result['sources']:
            print("Sources:")
            for source in result['sources']:
                if source.get('url'):
                    print(f"  - {source['title']}: {source['url']}")
                else:
                    print(f"  - {source['title']} (ID: {source['key']})")
        print("-" * 40)

if __name__ == "__main__":
    main()

Run your application from the terminal:

python main.py

Step 5 (Optional): Add Structured Filters

For questions that include specific constraints (e.g., "What was our revenue in Q4 2024?"), you can combine semantic search with structured filters.

Modify the ask() function in rag.py to support filtering by category:

# In rag.py, modify the ask() function signature:

def ask(
    question: str,
    category: Optional[str] = None,  # New parameter for filtering
    top_k: int = 5,
    similarity_threshold: float = 0.7,
) -> dict:

    # Build a filter dictionary if a category is provided
    filters = None
    if category:
        filters = {"category__eq": category}

    results = search_knowledge_base(
        API_PREFIX,
        FOLDER_PATH,
        question,
        mode="hybrid",
        top_k=top_k,
        similarity_threshold=similarity_threshold,
        filters=filters,
    )

    # ... the rest of the function remains the same

Now your application can handle both general and constrained questions:

# A general question
ask("How do I reset my password?")

# A constrained question filtered by category
ask("What's our return policy?", category="policies")

The filters parameter supports all FoxNose filter operators. See the Search & Filtering guide for the full list.
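
Multiple filter entries combine with AND semantics: each key/value pair in the dict becomes one condition inside the all_of clause that search_knowledge_base builds. Here is a standalone sketch of that transformation, using only the `__eq` operator shown in this guide:

```python
def build_where_clause(filters: dict) -> dict:
    """Mirror of the filter transformation in search_knowledge_base."""
    return {"$": {"all_of": [{k: v} for k, v in filters.items()]}}

# Two conditions: both must match for a resource to be returned
clause = build_where_clause({"category__eq": "policies", "status__eq": "published"})
```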


Step 6 (Optional): Integrate with LangChain

If you're using LangChain, the official langchain-foxnose package provides a ready-made FoxNoseRetriever that plugs straight into any LangChain chain.

Install the package:

pip install langchain-foxnose langchain-openai

Create retriever.py:

import os
from dotenv import load_dotenv
from langchain_foxnose import FoxNoseRetriever
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

load_dotenv()

retriever = FoxNoseRetriever.from_client_params(
    base_url=f"https://{os.getenv('FOXNOSE_ENV_KEY')}.fxns.io",
    api_prefix="my_api",
    public_key=os.getenv("FOXNOSE_PUBLIC_KEY"),
    secret_key=os.getenv("FOXNOSE_SECRET_KEY"),
    folder="knowledge-base",
    search_mode="hybrid",
    content_field="body",
)

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
    retriever=retriever,
    return_source_documents=True,
)

# Example usage
if __name__ == "__main__":
    result = qa_chain.invoke({"query": "How do I reset my password?"})
    print(result["result"])

    print("\nSources:")
    for doc in result["source_documents"]:
        print(f"  - {doc.metadata['title']}")

See the LangChain Integration page for more examples — vector search, hybrid with custom weights, filtered retrieval, and async support. For broader RAG architecture patterns, see the LLM Integrations guide.


Troubleshooting

  • No results returned? Check that your folder has content with vectorizable fields and try lowering the similarity_threshold (e.g., to 0.5). Also, verify the folder is connected to your Flux API with get_many access.
  • Irrelevant results? Ensure content_field matches your schema's vectorizable field. Consider adding filters to narrow down the context.
  • Authentication errors? Verify your FOXNOSE_PUBLIC_KEY and FOXNOSE_SECRET_KEY are correct. See the Flux Authentication guide for details.
  • Connection errors? Check that your FOXNOSE_ENV_KEY is correct and the api_prefix matches an existing Flux API.
