Multilingual Semantic Search

Building a search that understands meaning is hard. Building one that understands meaning across multiple languages is harder still: it often requires managing a separate search index for each language, complex query routing, and inconsistent fallback logic.

FoxNose is designed to remove this complexity. Its semantic search and localization systems work together, so you can build multilingual search without maintaining a separate index per language.


How it Works: The Single-Embedding Model

The key to FoxNose's approach is its single-embedding model. Instead of creating and storing separate vectors for every language (which is expensive and complex), FoxNose does the following:

  1. Identify a "Vectorizable" Field: You mark a text or string field in your schema as vectorizable. This field must also be localizable.
  2. Use Default Locale as the Source: FoxNose uses the content from your default locale (e.g., English) as the canonical source for vectorization.
  3. Generate a Multilingual Embedding: This default-locale text is run through a multilingual embedding model. Because the model maps text in different languages into a shared vector space, the single resulting vector can be compared against queries written in any language the model supports.
  4. Store One Vector: Only this one vector is stored per field of each entry; translations in other locales do not get vectors of their own. This keeps your storage costs low and your data pipeline simple.
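
The four steps above can be sketched in Python. This is a minimal model of the idea, not FoxNose's implementation: the `Entry` class, the `vectorize_entry` function, and the `embed` callable (standing in for the multilingual embedding model) are hypothetical names introduced for illustration, and `toy_embed` is a placeholder rather than a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a multilingual embedding model: text -> vector.
Embedder = Callable[[str], List[float]]


@dataclass
class Entry:
    # Localized field values: field name -> {locale: text}.
    fields: Dict[str, Dict[str, str]]
    # At most one stored vector per vectorizable field, never one per locale.
    vectors: Dict[str, List[float]] = field(default_factory=dict)


def vectorize_entry(entry: Entry, vectorizable: List[str],
                    default_locale: str, embed: Embedder) -> None:
    """Embed each vectorizable field using only its default-locale text."""
    for name in vectorizable:
        source_text = entry.fields.get(name, {}).get(default_locale)
        if not source_text:
            # Translations are never embedded, so without default-locale
            # content there is nothing to vectorize for this field.
            continue
        entry.vectors[name] = embed(source_text)


def toy_embed(text: str) -> List[float]:
    # Placeholder only; a real model returns a dense semantic vector.
    return [float(len(text)), float(text.count(" "))]


entry = Entry(fields={
    "summary": {
        "en": "Electric cars are becoming cheaper",
        "fr": "Les voitures électriques deviennent moins chères",
    },
})
vectorize_entry(entry, ["summary"], default_locale="en", embed=toy_embed)
# The entry now holds a single vector for "summary", computed from the English text.
print(entry.vectors)
```

However many translations the entry has, it ends up with exactly one vector per vectorizable field, and an entry whose default-locale value is empty gets no vector at all.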

Because the embedding model is multilingual, a search query in French for "voitures électriques" can successfully find an English document about "electric cars," as their vectors will be close together in the "map of meaning."
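
To make the "map of meaning" concrete: vector search ranks documents by how close their vectors are to the query vector, and cosine similarity is a common way to measure that closeness. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings come from the model and have far more dimensions, and the exact distance metric FoxNose uses is not stated here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented vectors, for illustration only (not real model output).
query_fr = [0.82, 0.51, 0.10]            # query: "voitures électriques"
doc_electric_cars = [0.80, 0.55, 0.12]   # English document about electric cars
doc_baking = [0.05, 0.20, 0.97]          # English document about baking bread

docs: List[Tuple[str, List[float]]] = [
    ("baking", doc_baking),
    ("electric cars", doc_electric_cars),
]
ranked = sorted(docs, key=lambda d: cosine_similarity(query_fr, d[1]), reverse=True)
print([name for name, _ in ranked])  # ['electric cars', 'baking']
```

The French query never has to be translated: it lands near the English "electric cars" document because both were embedded by the same multilingual model.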


Querying Across Languages

When you send a search request to the Flux API, you have several parameters to control the multilingual behavior:

  • search_locale: This parameter primarily governs the language-specific parts of a query: keyword search, where filters, and sorting. When sorting localized fields, search_locale ensures correct alphabetical order for that language. If your hybrid query includes a keyword component, search_locale tells FoxNose which language's text to analyze for relevance. The vector search component remains language-agnostic because the embedding model is multilingual.
  • return_locales: This specifies which language versions of the content are included in the response. To request several, pass a comma-separated list, such as fr,en.
  • fallback_locales=true: If a translation for a locale in return_locales is missing, this tells FoxNose to fill the gap with content from your default locale. This is crucial for avoiding empty fields in your UI.
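
The combined effect of return_locales and fallback_locales on a single localized field can be modeled as follows. This is a hypothetical sketch of the behavior described above, not FoxNose's code; `resolve_locales` is an illustrative name.

```python
from typing import Dict, List, Optional


def resolve_locales(values: Dict[str, str], return_locales: List[str],
                    default_locale: str, fallback: bool) -> Dict[str, Optional[str]]:
    """Return one value per requested locale, optionally falling back to the default locale."""
    resolved: Dict[str, Optional[str]] = {}
    for locale in return_locales:
        text = values.get(locale)
        if text is None and fallback:
            # Missing translation: reuse the default-locale content.
            text = values.get(default_locale)
        resolved[locale] = text
    return resolved


# A summary that has not been translated into French yet.
summary = {"en": "How generative AI changes software security"}

print(resolve_locales(summary, ["fr", "en"], "en", fallback=True))
print(resolve_locales(summary, ["fr", "en"], "en", fallback=False))
```

With the fallback enabled, the French slot receives the English text; without it, the French slot is None, which is exactly the empty field a UI would otherwise have to handle.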

Example Hybrid Search Query

This hybrid query combines semantic search with language-specific keyword search and filtering in a single request.

The goal: find articles whose summary is semantically related to "IA générative" (generative AI), but only those that also contain the exact keyword "sécurité" (security) in their French title and belong to the "Tech" category in the French version.

POST .../articles/_search?return_locales=fr,en&fallback_locales=true
{
  "search_mode": "hybrid",
  "search_locale": "fr",
  "vector_search": {
    "query": "IA générative",
    "fields": ["summary"]
  },
  "text_search": {
    "query": "sécurité",
    "fields": ["title"]
  },
  "where": {
    "all_of": [
      { "category": "Tech" }
    ]
  }
}
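
If you build this request from code, a small helper keeps the parameters in one place. The sketch below only assembles the JSON body and the query string with Python's standard library; `build_hybrid_search` is a hypothetical helper, the `...` prefix of the path stays elided exactly as in the request above, and actually sending the request (base URL, authentication, HTTP client) is not shown.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode


def build_hybrid_search(semantic_query: str, semantic_fields: List[str],
                        keyword: str, keyword_fields: List[str],
                        filters: List[Dict[str, Any]],
                        search_locale: str) -> Dict[str, Any]:
    """Assemble a hybrid search body with the same shape as the example request."""
    return {
        "search_mode": "hybrid",
        "search_locale": search_locale,
        "vector_search": {"query": semantic_query, "fields": semantic_fields},
        "text_search": {"query": keyword, "fields": keyword_fields},
        "where": {"all_of": filters},
    }


body = build_hybrid_search(
    semantic_query="IA générative", semantic_fields=["summary"],
    keyword="sécurité", keyword_fields=["title"],
    filters=[{"category": "Tech"}], search_locale="fr",
)

# safe="," keeps the comma in return_locales literal (fr,en) instead of %2C.
query_string = urlencode({"return_locales": "fr,en", "fallback_locales": "true"}, safe=",")
path = ".../articles/_search?" + query_string  # base path left elided
# ensure_ascii=False plus UTF-8 keeps the accented French text readable in the payload.
payload = json.dumps(body, ensure_ascii=False).encode("utf-8")

print(path)
print(payload.decode("utf-8"))
```

Note that fallback_locales is passed as the string "true": Python's bool would otherwise be rendered as "True".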

How this query works:

  • "search_locale": "fr" tells FoxNose that the text_search and where clauses should operate on the French (fr) content.
  • vector_search finds articles that are semantically similar to "IA générative" by searching the summary field.
  • text_search then filters those results, keeping only the ones that contain the exact keyword "sécurité" in their French title.
  • where further refines the results, ensuring the article's category is "Tech" in the French version.

The result is a cross-language search that combines broad semantic matching on the summary with precise, language-specific filtering on the title and category fields, all in a single API call.
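
The three steps described above can be modeled on an in-memory list of articles. This is a conceptual sketch of the described behavior, not how FoxNose executes hybrid queries internally (its scoring and merging may differ). Toy two-dimensional vectors stand in for real embeddings, the article layout is hypothetical, and "contains the exact keyword" is interpreted here as a case-insensitive whole-word match.

```python
import math
import re
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def hybrid_search(articles: List[Dict], query_vec: List[float], keyword: str,
                  category: str, locale: str, top_k: int = 10) -> List[str]:
    # 1. vector_search: rank by similarity to the single stored summary vector.
    #    This step ignores the locale entirely.
    ranked = sorted(articles, key=lambda a: cosine(query_vec, a["summary_vec"]),
                    reverse=True)[:top_k]
    # 2. text_search: keep articles whose title in `locale` contains the keyword
    #    as a whole word (case-insensitive; \w also matches accented letters).
    kw = keyword.casefold()
    with_keyword = [
        a for a in ranked
        if kw in {w.casefold() for w in re.findall(r"\w+", a["title"].get(locale, ""))}
    ]
    # 3. where: compare the category value in the same locale.
    return [a["id"] for a in with_keyword if a["category"].get(locale) == category]


articles = [
    {"id": "a1", "summary_vec": [0.9, 0.1],
     "title": {"fr": "La sécurité de l'IA générative", "en": "Generative AI security"},
     "category": {"fr": "Tech", "en": "Tech"}},
    {"id": "a2", "summary_vec": [0.8, 0.3],
     "title": {"fr": "L'IA générative en entreprise", "en": "Generative AI at work"},
     "category": {"fr": "Tech", "en": "Tech"}},
    {"id": "a3", "summary_vec": [0.1, 0.9],
     "title": {"fr": "Sécurité routière", "en": "Road safety"},
     "category": {"fr": "Société", "en": "Society"}},
]

# Pretend this is the embedding of the query "IA générative".
query_vec = [1.0, 0.0]
print(hybrid_search(articles, query_vec, "sécurité", "Tech", locale="fr"))  # ['a1']
```

Article a2 is semantically close but lacks the keyword, and a3 contains "Sécurité" but fails the category filter, so only a1 survives all three steps.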


Your Step-by-Step Workflow

  1. Define Locales: Ensure all required languages are set up in your environment, with one designated as the default. See the Localization Guide.
  2. Enable Flags: In your schema, mark the relevant text or string fields as both localizable and vectorizable.
  3. Populate Content: Make sure the default locale for your vectorizable fields contains high-quality content, as this is the source for your embeddings.
  4. Query with Context: Use search_locale, return_locales, and fallback_locales=true in your Flux API calls to control the multilingual search and response behavior.

By following this workflow, you can build sophisticated, multilingual semantic search without ever managing a separate vector database or synchronization pipeline.
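
Steps 1 to 3 of this workflow can also be checked automatically before you start querying. The sketch below validates a hypothetical, simplified description of your locales, schema, and entries; the dictionary layout and the `localizable`/`vectorizable` keys are illustrative assumptions, not FoxNose's schema format, and `check_setup` is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical, simplified setup description (not FoxNose's schema format).
LOCALES = ["en", "fr", "de"]
DEFAULT_LOCALE = "en"
SCHEMA = {
    "title": {"type": "string", "localizable": True, "vectorizable": False},
    "summary": {"type": "text", "localizable": True, "vectorizable": True},
}


def check_setup(schema: Dict[str, Dict], locales: List[str], default_locale: str,
                entries: List[Dict[str, Dict[str, str]]]) -> List[str]:
    """Return a list of problems that would weaken multilingual semantic search."""
    problems: List[str] = []
    # Step 1: the default locale must be one of the configured locales.
    if default_locale not in locales:
        problems.append(f"default locale {default_locale!r} is not configured")
    for name, spec in schema.items():
        if not spec.get("vectorizable"):
            continue
        # Step 2: only localizable text or string fields can be vectorizable.
        if spec.get("type") not in ("text", "string"):
            problems.append(f"{name}: only text or string fields can be vectorizable")
        if not spec.get("localizable"):
            problems.append(f"{name}: vectorizable fields must also be localizable")
        # Step 3: the default-locale value is the embedding source, so it must exist.
        for i, entry in enumerate(entries):
            if not entry.get(name, {}).get(default_locale, "").strip():
                problems.append(f"entry {i}: {name} has no {default_locale!r} text to embed")
    return problems


entries = [
    {"summary": {"en": "Generative AI and security", "fr": "IA générative et sécurité"}},
    {"summary": {"fr": "Traduction sans version anglaise"}},
]
print(check_setup(SCHEMA, LOCALES, DEFAULT_LOCALE, entries))
```

Here the second entry is flagged: it has a French summary but no English one, so under the single-embedding model it would have no vector to match against.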
