FoxNose: The Knowledge Layer for AI

Building modern AI applications, such as chatbots, search assistants, or autonomous agents, requires more than just a powerful LLM. These models often need access to accurate, up-to-date external knowledge that isn't included in their training data.

FoxNose serves as this essential knowledge layer, a unified platform that automatically stores your content, generates embeddings, and returns semantically relevant results through a simple API. It is specifically designed to streamline the Retrieval-Augmented Generation (RAG) process for your AI applications.


The FoxNose RAG Architecture: A Simplified Flow

Building a RAG application involves several stages. FoxNose is designed to simplify the "Retrieval" part of RAG, handling the complex knowledge management and search, so your application can focus on the "Augmented Generation."

Here's how FoxNose streamlines the typical RAG workflow:

User Question
     ↓
Your Application (orchestrates the flow)
     ↓
┌─────────────────────────────────────┐
│  **FoxNose Flux API**               │
│  _Your Knowledge Layer_             │
│  • Semantic search (vector)         │
│  • Hybrid search (vector + filters) │
│  • Structured filtering             │
│  • Content Storage & Embeddings     │
│  • Localization                     │
└─────────────────────────────────────┘
     ↓
Relevant Context (grounded, accurate, up-to-date)
     ↓
LLM (GPT-4, Claude, etc.)
     ↓
Grounded Response (powered by FoxNose knowledge)

FoxNose handles the heavy lifting of knowledge retrieval:

  • Content Management: Stores and versions your knowledge content.
  • Automated Embeddings: Automatically generates and updates vector embeddings for semantic search.
  • Advanced Search: Provides semantic, hybrid, and keyword search, along with structured filtering, all through a single API.
  • Multilingual Support: Handles localization for global knowledge bases.

Your application focuses on orchestration and generation:

  • Querying FoxNose: Sends user questions to FoxNose to retrieve relevant context.
  • Prompt Engineering: Passes the retrieved context to your LLM as part of the prompt.
  • Response Formatting: Processes the LLM's output for the user.
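Put together, the division of labor looks like this. The sketch below is plain Python with stubbed `retrieve` and `generate` functions standing in for the FoxNose search call and the LLM call (both shown in full in the examples that follow); it only illustrates the orchestration shape:

```python
# Minimal RAG orchestration sketch. `retrieve` and `generate` are stand-ins
# for the FoxNose search call and the LLM call shown later on this page.

def retrieve(question: str) -> list[dict]:
    # In a real app this would call foxnose.search(...) and return resources.
    return [{"title": "Password reset", "body": "Use the 'Forgot password' link."}]

def build_prompt(question: str, resources: list[dict]) -> str:
    # Join retrieved resources into a context block for the LLM.
    context = "\n\n".join(
        f"Title: {r['title']}\nContent: {r['body']}" for r in resources
    )
    return f"Context:\n{context}\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    # In a real app this would call your LLM provider.
    return f"(LLM answer based on a prompt of {len(prompt)} characters)"

def answer(question: str) -> str:
    resources = retrieve(question)              # 1. Query FoxNose
    prompt = build_prompt(question, resources)  # 2. Prompt engineering
    return generate(prompt)                     # 3. Response formatting

print(answer("How do I reset my password?"))
```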

Integration Patterns: Connecting FoxNose to Your AI Apps

FoxNose offers flexible ways to integrate its knowledge layer into your AI applications, adapting to your architecture and preferred LLM frameworks. The examples below use the official FoxNose Python SDK and the LangChain integration.

Installation

pip install foxnose-sdk

1. Direct API Calls

When to use: This is the simplest and most flexible approach, giving you full control over the retrieval and generation process. It works with any LLM provider or custom application.

from foxnose_sdk.flux import FluxClient
from foxnose_sdk.auth import SimpleKeyAuth
from openai import OpenAI

# Initialize the FoxNose Flux client
foxnose = FluxClient(
    base_url="https://<env_key>.fxns.io",
    api_prefix="my_api",
    auth=SimpleKeyAuth("YOUR_PUBLIC_KEY", "YOUR_SECRET_KEY"),
)

# 1. Search FoxNose for relevant context using hybrid search
user_question = "How do I reset my password?"
results = foxnose.search(
    "path/to/your/knowledge-base",  # folder path
    body={
        "search_mode": "hybrid",
        "find_text": {"query": user_question},
        "vector_search": {
            "query": user_question,
            "top_k": 5,
            "similarity_threshold": 0.7,
        },
        "limit": 5,
    },
)

# 2. Format context from search results
context = "\n\n".join([
    f"Title: {resource['data']['title']}\nContent: {resource['data']['body']}"
    for resource in results["results"]
])

# 3. Pass retrieved context and question to your LLM
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer based on the provided context. Cite sources."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_question}"},
    ],
)

print(response.choices[0].message.content)
foxnose.close()

This pattern provides maximum control and is suitable for custom implementations.

2. Framework Retriever (LangChain)

When to use: If you are building your RAG application with LangChain, the official langchain-foxnose package provides a ready-made retriever with support for all search modes, content mapping, metadata control, and async.

pip install langchain-foxnose langchain-openai

from langchain_foxnose import FoxNoseRetriever
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

retriever = FoxNoseRetriever.from_client_params(
    base_url="https://<env_key>.fxns.io",
    api_prefix="my_api",
    public_key="YOUR_PUBLIC_KEY",
    secret_key="YOUR_SECRET_KEY",
    folder="path/to/your/knowledge-base",
    search_mode="hybrid",
    content_field="body",
)

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=retriever,
    return_source_documents=True,
)

result = qa_chain.invoke({"query": "How do I reset my password?"})
print(result["result"])

See the LangChain Integration page for more examples — vector search, hybrid with custom weights, filtered retrieval, and async support.

3. Agent Tool

When to use: For autonomous AI agents, FoxNose can be exposed as a tool that the agent can intelligently decide to use when it needs to retrieve external knowledge.

from langchain.tools import tool
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from foxnose_sdk.flux import FluxClient
from foxnose_sdk.auth import SimpleKeyAuth

# Initialize the FoxNose client
foxnose = FluxClient(
    base_url="https://<env_key>.fxns.io",
    api_prefix="my_api",
    auth=SimpleKeyAuth("YOUR_PUBLIC_KEY", "YOUR_SECRET_KEY"),
)


@tool
def search_knowledge_base(query: str) -> str:
    """Searches the company knowledge base for relevant information.
    Use this tool when you need to find answers about company policies,
    product documentation, or FAQs. Input should be a natural language
    question or keywords."""
    results = foxnose.search(
        "path/to/your/knowledge-base",
        body={
            "search_mode": "hybrid",
            "find_text": {"query": query},
            "vector_search": {
                "query": query,
                "top_k": 5,
                "similarity_threshold": 0.7,
            },
            "limit": 5,
        },
    )

    if not results["results"]:
        return "No relevant information found in the knowledge base."

    # Format results for the agent
    formatted = []
    for resource in results["results"]:
        title = resource["data"].get("title", "Untitled")
        content = resource["data"].get("body", resource["data"].get("content", ""))
        formatted.append(f"**{title}**\n{content}")

    return "\n\n---\n\n".join(formatted)


# Create the agent
llm = ChatOpenAI(model="gpt-4", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the knowledge base tool to answer questions."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_openai_functions_agent(llm, [search_knowledge_base], prompt)
agent_executor = AgentExecutor(agent=agent, tools=[search_knowledge_base], verbose=True)

# Run the agent
response = agent_executor.invoke({"input": "What is the return policy?"})
print(response["output"])

This pattern lets your agent decide for itself when to consult FoxNose, retrieving external knowledge only when a question calls for it.


Choosing Your Search Strategy: When to Use Each Mode

Selecting the optimal search mode for your RAG application depends on the user's intent and the nature of the query. FoxNose's flexible Flux API allows you to choose the best approach for each scenario:

| User Intent / Query Type | Search Mode | Why Choose This Mode |
| --- | --- | --- |
| Natural Language Questions ("How do I...", "Explain X") | vector | Understands meaning and intent beyond keywords. |
| Questions with Constraints ("Reports about X in 2024") | hybrid | Combines deep semantic understanding with precise structured filters. |
| Finding Specific Items (Known ID, exact name) | text or filters | Provides fast, exact matches when keywords or field values are known. |
| Finding Similar Content ("Find other articles like this one") | vector | Excels at retrieving conceptually similar documents. |

For most RAG applications, starting with hybrid search is recommended. It effectively handles a wide range of queries by combining semantic understanding with precise filtering, giving you the best of both worlds.
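The request bodies for these modes differ only in which keys are present. As a convenience, a small helper (hypothetical, not part of the FoxNose SDK) can build the right body per mode, following the body shapes used in the examples on this page:

```python
# Hypothetical helper (not part of the FoxNose SDK): builds a Flux search
# request body for a given mode, mirroring the shapes used on this page.

def build_search_body(query: str, mode: str = "hybrid",
                      top_k: int = 5, similarity_threshold: float = 0.7) -> dict:
    body = {"search_mode": mode, "limit": top_k}
    if mode in ("hybrid", "text"):
        body["find_text"] = {"query": query}   # keyword component
    if mode in ("hybrid", "vector"):
        body["vector_search"] = {              # semantic component
            "query": query,
            "top_k": top_k,
            "similarity_threshold": similarity_threshold,
        }
    return body

# Usage with the client from the earlier example:
# results = foxnose.search("path/to/your/knowledge-base",
#                          body=build_search_body("reset password"))
```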


Best Practices for Building RAG Systems with FoxNose

Building a high-quality RAG system requires a thoughtful approach to both content modeling and query strategy. Here are some key recommendations:

Content Modeling

  • One Concept per Resource: Instead of large, monolithic documents, structure your knowledge into smaller, focused resources (e.g., one Q&A pair per resource). This dramatically improves retrieval precision.
  • Strategic Vectorization: Mark only the fields containing rich, semantic content (like a body or summary) as vectorizable. Avoid vectorizing purely structural or code-oriented data.
  • Chunking Strategy: For very long documents, consider splitting them into smaller, logically-chunked resources. This allows the LLM to receive more focused and relevant context for a given query.
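As a sketch of the chunking idea (plain Python, independent of any FoxNose API), one simple approach is to split on paragraph boundaries and pack paragraphs into chunks under a size cap, so each stored resource stays focused:

```python
# Paragraph-based chunking sketch: pack paragraphs into chunks no longer
# than max_chars, so each chunk can be stored as its own focused resource.
# A single paragraph longer than max_chars becomes its own chunk unsplit.

def chunk_document(text: str, max_chars: int = 1500) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate          # paragraph still fits in this chunk
        else:
            if current:
                chunks.append(current)   # close the filled chunk
            current = para               # start a new chunk with this paragraph
    if current:
        chunks.append(current)
    return chunks
```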

Query Strategy

  • Set Appropriate top_k: Start with a small top_k (e.g., 3-5) to provide focused context for your LLM. More results are not always better and can dilute relevance.
  • Use similarity_threshold: Filter out low-relevance semantic results by setting a similarity_threshold (e.g., 0.7). This prevents irrelevant content from being passed to your LLM.
  • Combine with Filters: Whenever possible, use structured filters (e.g., by category, date, or tags) to narrow down the search space before semantic ranking is applied. This improves both speed and accuracy.

Response Generation

  • Include Source Links: Store a url or slug field in your resources pointing to the original source (e.g., documentation article, help center page). Include these links in LLM responses so users can read the full context: "For more details, see our Guide (https://docs.example.com/articles/42)." This builds trust and lets users verify information.
  • Handle Empty Results Gracefully: If FoxNose returns no relevant context for a query, instruct your LLM to respond accordingly (e.g., "I don't have information about that") rather than hallucinating an answer.
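Both practices can be combined in the context-formatting step. The sketch below assumes each resource's data carries optional title, body, and url fields (title and body appear in the examples above; url is an assumed field name you would add to your own schema):

```python
# Sketch: format FoxNose results into LLM context, appending source links
# and handling the empty-results case explicitly. The "url" field is an
# assumed field name on your resources, not something FoxNose mandates.

NO_CONTEXT = "NO_RELEVANT_CONTEXT"

def format_context(results: dict) -> str:
    resources = results.get("results", [])
    if not resources:
        # Signal the LLM that nothing relevant was found, so your system
        # prompt can tell it to answer "I don't have information about that."
        return NO_CONTEXT
    blocks = []
    for resource in resources:
        data = resource.get("data", {})
        block = f"Title: {data.get('title', 'Untitled')}\nContent: {data.get('body', '')}"
        if data.get("url"):
            block += f"\nSource: {data['url']}"  # lets the LLM cite the source
        blocks.append(block)
    return "\n\n".join(blocks)
```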

Practical Guides

Ready to build your FoxNose-powered RAG application? Follow our step-by-step guides:


Further API Details

For comprehensive technical documentation on our API endpoints and their capabilities, refer to:

  • Flux Search API → Learn about the POST endpoint with all query options, including hybrid search, filtering, and joins.
  • List Resources API → Explore the GET endpoint with query parameters for listing resources.
  • Vector Search Reference → Deep dive into pure semantic and hybrid search modes.
