Build a RAG Pipeline with Python
This guide provides a step-by-step walkthrough of building a complete Retrieval-Augmented Generation (RAG) system. Using FoxNose as your knowledge layer, you will create a working Q&A application that delivers accurate, grounded answers sourced directly from your FoxNose knowledge base.
Looking for the JavaScript version? See Build a RAG Pipeline with JavaScript / Node.js.
What you'll build: A Python application that:
- Takes a user question.
- Searches your FoxNose knowledge base for relevant context.
- Sends the context and question to an LLM (e.g., GPT-4o).
- Returns a grounded answer with clear source references.
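Before diving into the full implementation, the four steps above can be sketched as a single function, with the retriever and the LLM passed in as stand-ins (both parameter names here are illustrative, not part of the FoxNose SDK):

```python
def answer_question(question, retrieve, generate):
    """Minimal RAG skeleton: retrieve context, then generate a grounded answer.

    retrieve(question) -> list of context strings
    generate(prompt)   -> answer string
    """
    chunks = retrieve(question)                                   # search the knowledge base
    context = "\n\n".join(chunks) or "No relevant context found."
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return {"answer": generate(prompt), "sources": chunks}        # grounded answer + sources
```

The rest of this guide fills in retrieve with a FoxNose search and generate with an OpenAI chat completion.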
Prerequisites:
- A FoxNose environment with some content already present (see our Quick Start guide to set this up if needed).
- A configured Flux API with at least one folder connected and accessible.
- Python 3.9 or higher.
- An OpenAI API key.
Step 1: Set Up Your Project
Begin by creating a new project directory and setting up a virtual environment. Then, install the necessary Python libraries:
mkdir foxnose-rag && cd foxnose-rag
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
pip install foxnose-sdk openai python-dotenv
- foxnose-sdk: The official FoxNose Python SDK for interacting with the Flux API.
- openai: The official client library for interacting with OpenAI's LLMs.
- python-dotenv: Loads environment variables from a .env file for secure credential management.
Next, create a .env file in your project's root directory to securely store your credentials:
OPENAI_API_KEY=sk-...
FOXNOSE_ENV_KEY=your-environment-key # Your FoxNose Environment Key
FOXNOSE_PUBLIC_KEY=your-public-key # Your Flux API public key
FOXNOSE_SECRET_KEY=your-secret-key # Your Flux API secret key
You can find your FOXNOSE_ENV_KEY in the FoxNose dashboard under Environment → Settings. The API keys are generated when you create a Flux API key.
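A missing credential usually surfaces later as an opaque authentication error, so it can help to fail fast at startup. Here is a minimal, stdlib-only sketch (the helper name is ours, not part of any SDK):

```python
import os

REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "FOXNOSE_ENV_KEY",
    "FOXNOSE_PUBLIC_KEY",
    "FOXNOSE_SECRET_KEY",
)

def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not environ.get(name)]

# Example: call after load_dotenv() and abort with a readable message
# missing = missing_env_vars(REQUIRED_VARS)
# if missing:
#     raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```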
Step 2: Initialize the FoxNose SDK
Create a file named foxnose_client.py. This module initializes the FoxNose SDK and provides a flexible search function.
import os
from dataclasses import dataclass
from typing import List, Optional
from dotenv import load_dotenv
from foxnose_sdk.flux import FluxClient
from foxnose_sdk.auth import SimpleKeyAuth
load_dotenv()
@dataclass
class SearchResult:
"""A simple data structure to hold each search result from FoxNose."""
key: str
title: str
content: str
url: Optional[str] = None
def __str__(self):
return f"[{self.title}]\n{self.content}"
def get_flux_client(api_prefix: str) -> FluxClient:
"""Creates a FluxClient for the specified API prefix."""
return FluxClient(
base_url=f"https://{os.getenv('FOXNOSE_ENV_KEY')}.fxns.io",
api_prefix=api_prefix,
auth=SimpleKeyAuth(
os.getenv("FOXNOSE_PUBLIC_KEY"),
os.getenv("FOXNOSE_SECRET_KEY"),
),
)
def search_knowledge_base(
api_prefix: str,
folder_path: str,
query: str,
*,
mode: str = "hybrid",
top_k: int = 5,
similarity_threshold: float = 0.7,
filters: Optional[dict] = None,
content_field: str = "body",
title_field: str = "title",
url_field: str = "url",
) -> List[SearchResult]:
"""
Searches a FoxNose folder for relevant content.
Args:
api_prefix: The Flux API prefix (e.g., 'my_api').
folder_path: Path to the folder to search (e.g., 'knowledge-base').
query: The natural language search query.
mode: Search mode - 'vector', 'hybrid', or 'text'.
top_k: Maximum number of results to return.
similarity_threshold: Minimum similarity score (0-1) for vector results.
filters: Optional structured filters (e.g., {"status__eq": "published"}).
content_field: Name of the field containing main content.
title_field: Name of the field containing the title.
url_field: Name of the field containing the source URL.
"""
client = get_flux_client(api_prefix)
# Build the search request body
body = {"limit": top_k}
if mode == "vector":
body["search_mode"] = "vector"
body["vector_search"] = {
"query": query,
"top_k": top_k,
"similarity_threshold": similarity_threshold,
}
elif mode == "hybrid":
body["search_mode"] = "hybrid"
body["find_text"] = {"query": query}
body["vector_search"] = {
"query": query,
"top_k": top_k,
"similarity_threshold": similarity_threshold,
}
else: # text mode
body["find_text"] = {"query": query}
# Add structured filters if provided
if filters:
body["where"] = {"$": {"all_of": [{k: v} for k, v in filters.items()]}}
# Execute the search
response = client.search(folder_path, body=body)
client.close()
# Parse results into SearchResult objects
results = []
for resource in response.get("results", []):
data = resource.get("data", {})
results.append(SearchResult(
key=resource.get("_sys", {}).get("key", ""),
title=data.get(title_field, "Untitled"),
content=data.get(content_field, ""),
url=data.get(url_field),
))
return results
This code returns structured SearchResult objects with optional source URLs for citations. Field names (content_field, title_field, url_field) are configurable per call.
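To make the request shape concrete, here is the body-building logic from search_knowledge_base pulled out into a standalone function you can experiment with (an illustration of the structure shown above, not an SDK helper):

```python
def build_search_body(query, *, mode="hybrid", top_k=5,
                      similarity_threshold=0.7, filters=None):
    """Assemble the search request body used in search_knowledge_base."""
    body = {"limit": top_k}
    vector = {"query": query, "top_k": top_k,
              "similarity_threshold": similarity_threshold}
    if mode == "vector":
        body["search_mode"] = "vector"
        body["vector_search"] = vector
    elif mode == "hybrid":
        body["search_mode"] = "hybrid"
        body["find_text"] = {"query": query}
        body["vector_search"] = vector
    else:  # plain text search
        body["find_text"] = {"query": query}
    if filters:
        body["where"] = {"$": {"all_of": [{k: v} for k, v in filters.items()]}}
    return body
```

For example, build_search_body("reset password", mode="text", filters={"status__eq": "published"}) produces a body with only find_text plus a where clause, while the default hybrid mode includes both find_text and vector_search.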
Step 3: Build the RAG Function
Create rag.py. This file will contain the core logic for your RAG pipeline, which orchestrates the calls to FoxNose and your LLM.
from openai import OpenAI
from foxnose_client import search_knowledge_base, SearchResult
from typing import List, Optional
openai_client = OpenAI()
# Configure which API and folder to search (adjust for your setup)
API_PREFIX = "my_api"
FOLDER_PATH = "path/to/knowledge-base"
def build_prompt_context(results: List[SearchResult]) -> str:
"""Formats FoxNose search results into a string for the LLM context."""
if not results:
return "No relevant information was found in the knowledge base."
context_parts = []
for i, result in enumerate(results, 1):
source_info = f"[Source {i}: {result.title}]"
if result.url:
source_info += f" ({result.url})"
context_parts.append(f"{source_info}\n{result.content}")
return "\n\n---\n\n".join(context_parts)
def ask(
question: str,
top_k: int = 5,
similarity_threshold: float = 0.7,
filters: Optional[dict] = None,
) -> dict:
"""
Asks a question, retrieves context from FoxNose, and generates an answer using an LLM.
Returns a dictionary containing the answer and the sources used.
"""
# 1. Retrieve relevant context from your FoxNose knowledge base
print(f"Searching for context related to: '{question}'...")
results = search_knowledge_base(
API_PREFIX,
FOLDER_PATH,
question,
mode="hybrid", # Combines semantic + text search for best results
top_k=top_k,
similarity_threshold=similarity_threshold,
filters=filters,
)
# 2. Build the context string to be injected into the LLM prompt
context = build_prompt_context(results)
# 3. Generate a grounded answer using the LLM and the retrieved context
system_prompt = """You are a helpful assistant that answers questions based ONLY on the provided context.
- If the context doesn't contain the answer, state that you don't have enough information.
- Be concise and direct.
- When possible, cite the source title and include the URL so users can read more."""
user_prompt = f"""Context:
{context}
Question: {question}
Answer:"""
print("Generating answer with LLM...")
response = openai_client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
temperature=0,
)
answer = response.choices[0].message.content
# 4. Return the final answer and the source documents for verification
return {
"answer": answer,
"sources": [
{"key": r.key, "title": r.title, "url": r.url}
for r in results
],
}
Key design decisions in this RAG function:
- API_PREFIX and FOLDER_PATH are defined as constants here for simplicity, but you can make them parameters if your agent needs to search multiple sources.
- Uses hybrid search mode by default for the best balance of semantic understanding and keyword matching.
- The system_prompt strictly instructs the LLM to only use the provided context, reducing hallucinations.
- Source URLs are included in the context, allowing the LLM to cite them in responses.
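One refinement worth considering: with a large top_k, the concatenated context can exceed the model's context window. A minimal sketch of a character-budget guard you could apply to the retrieved chunks before building the prompt (the helper and its default budget are our own, not part of the guide's code):

```python
def trim_context(chunks, max_chars=8000, separator="\n\n---\n\n"):
    """Keep whole chunks, in ranked order, until the character budget is spent."""
    kept, used = [], 0
    for chunk in chunks:
        cost = len(chunk) + (len(separator) if kept else 0)
        if used + cost > max_chars:
            break  # drop this chunk and everything ranked below it
        kept.append(chunk)
        used += cost
    return separator.join(kept)
```

Because results arrive ranked by relevance, truncating from the tail discards the least relevant context first.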
Step 4: Create the Main Application Loop
Create a final file, main.py, to run your Q&A application from the command line.
from rag import ask
def main():
print("FoxNose RAG Q&A System")
print("Type 'quit' or 'exit' to stop.\n")
while True:
question = input("Question: ").strip()
if question.lower() in ['quit', 'exit']:
break
if not question:
continue
result = ask(question)
print(f"\nAnswer: {result['answer']}\n")
if result['sources']:
print("Sources:")
for source in result['sources']:
if source.get('url'):
print(f" - {source['title']}: {source['url']}")
else:
print(f" - {source['title']} (ID: {source['key']})")
print("-" * 40)
if __name__ == "__main__":
main()
Run your application from the terminal:
python main.py
Step 5 (Optional): Add Structured Filters
For questions that include specific constraints (e.g., "What was our revenue in Q4 2024?"), you can combine semantic search with structured filters.
Modify the ask() function in rag.py to support filtering by category:
# In rag.py, modify the ask() function signature:
def ask(
question: str,
category: Optional[str] = None, # New parameter for filtering
top_k: int = 5,
similarity_threshold: float = 0.7,
) -> dict:
# Build a filter dictionary if a category is provided
filters = None
if category:
filters = {"category__eq": category}
results = search_knowledge_base(
API_PREFIX,
FOLDER_PATH,
question,
mode="hybrid",
top_k=top_k,
similarity_threshold=similarity_threshold,
filters=filters,
)
# ... the rest of the function remains the same
Now your application can handle both general and constrained questions:
# A general question
ask("How do I reset my password?")
# A constrained question filtered by category
ask("What's our return policy?", category="policies")
The filters parameter supports all FoxNose filter operators. See the Search & Filtering guide for the full list.
Step 6 (Optional): Integrate with LangChain
If you're using LangChain, the official langchain-foxnose package provides a ready-made FoxNoseRetriever that plugs straight into any LangChain chain.
Install the package:
pip install langchain-foxnose langchain-openai
Create retriever.py:
import os
from dotenv import load_dotenv
from langchain_foxnose import FoxNoseRetriever
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
load_dotenv()
retriever = FoxNoseRetriever.from_client_params(
base_url=f"https://{os.getenv('FOXNOSE_ENV_KEY')}.fxns.io",
api_prefix="my_api",
public_key=os.getenv("FOXNOSE_PUBLIC_KEY"),
secret_key=os.getenv("FOXNOSE_SECRET_KEY"),
folder="knowledge-base",
search_mode="hybrid",
content_field="body",
)
qa_chain = RetrievalQA.from_chain_type(
llm=ChatOpenAI(model="gpt-4o", temperature=0),
retriever=retriever,
return_source_documents=True,
)
# Example usage
if __name__ == "__main__":
result = qa_chain.invoke({"query": "How do I reset my password?"})
print(result["result"])
print("\nSources:")
for doc in result["source_documents"]:
print(f" - {doc.metadata['title']}")
See the LangChain Integration page for more examples — vector search, hybrid with custom weights, filtered retrieval, and async support. For broader RAG architecture patterns, see the LLM Integrations guide.
Troubleshooting
- No results returned? Check that your folder has content with vectorizable fields and try lowering the similarity_threshold (e.g., to 0.5). Also, verify the folder is connected to your Flux API with get_many access.
- Irrelevant results? Ensure content_field matches your schema's vectorizable field. Consider adding filters to narrow down the context.
- Authentication errors? Verify your FOXNOSE_PUBLIC_KEY and FOXNOSE_SECRET_KEY are correct. See the Flux Authentication guide for details.
- Connection errors? Check that your FOXNOSE_ENV_KEY is correct and the api_prefix matches an existing Flux API.
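Transient connection errors can also be smoothed over with a small retry wrapper around the search call. A generic sketch (the helper is illustrative; adjust the exception types to whatever your client actually raises):

```python
import time

def call_with_retries(fn, *, attempts=3, base_delay=0.5,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call fn(), retrying with exponential backoff on transient errors."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))

# Example (hypothetical): wrap the FoxNose search
# results = call_with_retries(
#     lambda: search_knowledge_base(API_PREFIX, FOLDER_PATH, question)
# )
```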
Next Steps & API Reference
- LLM Integrations Overview → Review the high-level architecture and best practices for RAG systems.
- Search & Filtering Guide → Master all search modes, including filters, joins, and pagination.
- Flux API Reference → Get full technical details on all API endpoints.