Multilingual Semantic Search
Building a search that understands meaning is hard. Building one that understands meaning across multiple languages is considerably harder. It often means maintaining a separate search index for each language, routing queries between those indices, and patching over gaps with ad hoc fallback logic.
FoxNose is designed to solve this complexity. Its semantic search and localization systems work together seamlessly, allowing you to build powerful, multilingual search experiences with minimal effort.
How it Works: The Single-Embedding Model
The key to FoxNose's approach is its single-embedding model. Instead of creating and storing separate vectors for every language (which is expensive and complex), FoxNose does the following:
- Identify a "Vectorizable" Field: You mark a text or string field in your schema as vectorizable. This field must also be localizable.
- Use Default Locale as the Source: FoxNose uses the content from your default locale (e.g., English) as the canonical source for vectorization.
- Generate a Multilingual Embedding: The default-locale text (English in this example) is run through a multilingual embedding model. The resulting single vector places the content in a semantic space shared across many languages, so queries written in other languages can still match it.
- Store One Vector: Only this one vector is stored per field, keeping your storage costs low and your data pipeline simple.
Because the embedding model is multilingual, a search query in French for "voitures électriques" can successfully find an English document about "electric cars," as their vectors will be close together in the "map of meaning."
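The idea of vectors being "close together" can be made concrete with cosine similarity. The following is a minimal sketch that assumes hand-written three-dimensional toy vectors; they are not output from FoxNose or from any real embedding model, and the function and variable names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors written by hand for illustration, NOT real model output.
# One stored vector per document, built from the default-locale text.
stored_vectors = {
    "electric cars": [0.9, 0.1, 0.2],
    "sourdough baking": [0.1, 0.9, 0.3],
}

# Stand-in for the embedding of the French query "voitures électriques".
query_vector = [0.85, 0.15, 0.25]

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    stored_vectors,
    key=lambda name: cosine_similarity(query_vector, stored_vectors[name]),
    reverse=True,
)
print(ranked[0])  # electric cars
```

A real multilingual model produces vectors with hundreds of dimensions, but the ranking step works the same way: the query language does not matter as long as the query and the stored text land near each other.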
Querying Across Languages
When you send a search request to the Flux API, you have several parameters to control the multilingual behavior:
- search_locale: This parameter primarily influences keyword search and sorting. When sorting localized fields, search_locale ensures correct alphabetical order and relevance for that specific language. If your hybrid query includes a keyword component, search_locale tells FoxNose which language's text to analyze. The vector search component remains language-agnostic due to the multilingual model.
- return_locales: This specifies which language versions of the content should be included in the response. You can request multiple, like fr,en.
- fallback_locales=true: If a translation for a locale in return_locales is missing, this tells FoxNose to fill the gap with content from your default locale. This is crucial for avoiding empty fields in your UI.
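To illustrate what fallback_locales=true means for a single field, here is a minimal sketch of the described behavior. It is not FoxNose's implementation: the resolve_locales function, the shape of the translations dictionary, and the use of None for a missing translation are all assumptions made for the example.

```python
def resolve_locales(translations, return_locales, default_locale, fallback=True):
    """Pick one value per requested locale.

    When a translation is missing and fallback is enabled, the value
    from the default locale is used instead.
    """
    resolved = {}
    for locale in return_locales:
        value = translations.get(locale)
        if value is None and fallback:
            value = translations.get(default_locale)
        resolved[locale] = value
    return resolved


# Hypothetical field with an English value and no French translation yet.
title = {"en": "Electric cars", "fr": None}

with_fallback = resolve_locales(title, ["fr", "en"], default_locale="en")
without_fallback = resolve_locales(title, ["fr", "en"], default_locale="en", fallback=False)

print(with_fallback)     # {'fr': 'Electric cars', 'en': 'Electric cars'}
print(without_fallback)  # {'fr': None, 'en': 'Electric cars'}
```

Without the fallback, the French slot stays empty and the UI has to handle the gap itself.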
Example Hybrid Search Query
This hybrid query combines semantic search with language-specific keyword search and filtering in a single request.
The goal: Find articles semantically related to "IA générative" in their summary, but only those that also contain the exact keyword "sécurité" in their French title and belong to the "Tech" category in French.
POST .../articles/_search?return_locales=fr,en&fallback_locales=true
{
  "search_mode": "hybrid",
  "search_locale": "fr",
  "vector_search": {
    "query": "IA générative",
    "fields": ["summary"]
  },
  "text_search": {
    "query": "sécurité",
    "fields": ["title"]
  },
  "where": {
    "all_of": [
      { "category": "Tech" }
    ]
  }
}
How this query works:
- "search_locale": "fr" tells FoxNose that the text_search and where clauses should operate on the French (fr) content.
- vector_search finds articles that are semantically similar to "IA générative" by searching the summary field.
- text_search then filters those results, keeping only the ones that contain the exact keyword "sécurité" in their French title.
- where further refines the results, ensuring the article's category is "Tech" in the French version.
The final result is a highly relevant, cross-language search that combines broad semantic understanding of the summary with precise, language-specific filtering on the title and category, all in a single API call.
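As a rough sketch of how a client might assemble this request, the snippet below builds the URL and JSON body in Python without sending anything. The base URL is elided in the example above and is left as a placeholder here; the build_search_request helper and its parameter names are hypothetical and are not part of any FoxNose SDK.

```python
import json
from urllib.parse import urlencode

# Elided in the example above; substitute your own Flux API base URL.
BASE_URL = "..."


def build_search_request(base_url, collection, return_locales, fallback_locales, body):
    """Return (url, json_payload) for a search request. Hypothetical helper."""
    params = {"return_locales": ",".join(return_locales)}
    if fallback_locales:
        params["fallback_locales"] = "true"
    # safe="," keeps the comma-separated locale list readable in the URL.
    query = urlencode(params, safe=",")
    url = f"{base_url}/{collection}/_search?{query}"
    # ensure_ascii=False keeps accented characters such as "é" as-is.
    return url, json.dumps(body, ensure_ascii=False)


# Same body as the hybrid query shown above.
search_body = {
    "search_mode": "hybrid",
    "search_locale": "fr",
    "vector_search": {"query": "IA générative", "fields": ["summary"]},
    "text_search": {"query": "sécurité", "fields": ["title"]},
    "where": {"all_of": [{"category": "Tech"}]},
}

url, payload = build_search_request(BASE_URL, "articles", ["fr", "en"], True, search_body)
print(url)
print(payload)
```

The resulting url and payload can be passed to whichever HTTP client you already use, together with the authentication your environment requires.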
Your Step-by-Step Workflow
- Define Locales: Ensure all required languages are set up in your environment, with one designated as the default. See the Localization Guide.
- Enable Flags: In your schema, mark the relevant text or string fields as both localized and vectorizable.
- Populate Content: Make sure the default locale for your vectorizable fields contains high-quality content, as this is the source for your embeddings.
- Query with Context: Use search_locale, return_locales, and fallback_locales=true in your Flux API calls to control the multilingual search and response behavior.
By following this workflow, you can build sophisticated, multilingual semantic search without ever managing a separate vector database or synchronization pipeline.