Metadata Filtering with Spring AI: A WHERE Clause for Your Vector Store

In the last post we fixed context pollution by giving every document domain its own VectorStore. One bucket for FAQ, one for legal, one for tech, one for HR — done. The router picks one and the LLM gets a clean top-K.

That works. It also gets old fast. The moment you have more than four dimensions you care about — products, versions, categories, years, tenants, languages — spinning up a new store per combination turns your config into a wall of @PostConstruct methods. You don’t want sixteen vector stores. You want one store that knows how to slice itself.

That’s metadata filtering. This post maps to Demo 9: Metadata Filtering in the rag-spring-ai project.

1. Why You Want a `WHERE` Clause on Your Vector Search

Pure similarity search has one job: find the K chunks closest to the query in embedding space. It does not care whether those chunks belong to the customer you’re answering for, the product version they’re using, or the document category they asked about. It just returns whatever is nearest.

Most of the time that’s fine. The problems show up when nearest ≠ relevant:

Multi-tenant SaaS. Acme Corp asks a question. Your top-K returns the most semantically similar chunks across all tenants. Now you’re leaking another customer’s onboarding doc into Acme’s answer. This is a security incident, not a relevance bug.
Versioned docs. A user on CloudFlow v2.0 asks “what features are available?”. The top-K is dominated by v2.5 release notes because they’re more recent and more verbose. The user gets confidently told about features they don’t have.
Mixed categories in one corpus. “What are the rate limits?” pulls in three API docs, one security best-practices chunk that mentions throttling, and one release note that mentions a perf improvement. Two of those five chunks are noise.

You can fix all three by attaching metadata at ingestion time and filtering before the similarity ranking happens. Same store, same embeddings, much sharper retrieval.

Figure: No filter vs. product == 'cloudflow'. Same store, same query, same embeddings — just a tighter candidate set before ranking.

2. Metadata Lives on the Document

Metadata isn’t a separate concept in Spring AI — it’s just a Map<String, Object> you hang off a Document when you create it:

new Document(
    "CloudFlow v2.5 added AI-powered document summarization, smart search, and automated workflow templates. Performance improved by 40%.",
    Map.of(
        "product",  "cloudflow",
        "version",  "2.5",
        "category", "release-notes",
        "year",     "2026"
    )
);

Two practical things to know:

Pick your keys before you ingest. Adding metadata later means re-embedding, because most stores don’t let you mutate a document’s metadata in place. Sit down for ten minutes and decide on a schema: tenant, product, version, category, language, accessLevel — whatever your queries will need to slice on. Fewer, well-named keys beat a sprawling bag.
Use string-friendly types. Some stores happily take ints and dates; others stringify them anyway. To stay portable, prefer strings for things like version and year unless you genuinely need numeric comparisons (>=, <). I’m using "2026" as a string above for that exact reason.

Once your documents have metadata, every single one of them is searchable and filterable. The store doesn’t need any extra setup.

3. `FilterExpressionBuilder` — The Type-Safe Way

You could write filter strings by hand — most stores accept something like product == 'cloudflow' && version >= '2.0' — but Spring AI gives you a builder so you don’t have to remember each backend’s exact dialect. It’s also what travels with you when you swap pgvector for Pinecone six months from now.

var b = new FilterExpressionBuilder();

// Equality
b.eq("product", "cloudflow").build();

// Membership
b.in("category", "release-notes", "security").build();

// Comparison
b.gte("year", "2025").build();           // year >= 2025

// Boolean composition
b.and(
    b.eq("product", "cloudflow"),
    b.in("category", "release-notes", "security")
).build();

// Negation
b.not(b.eq("category", "deprecated")).build();

The full operator set is roughly what you’d expect from a query language: eq, ne, gt, gte, lt, lte, in, nin, and, or, not. Build the expression once, hand the result to a SearchRequest, and you’re done.

Cheatsheet diagram of the Spring AI FilterExpressionBuilder. Three columns. Left: comparison operators (eq, ne, gt, gte, lt, lte). Middle: set operators (in, nin) and logical composition (and, or, not). Right: a worked example combining several operators into a complete filter expression for tenant, product, version range, and category. — **Figure:** The operator surface of `FilterExpressionBuilder` — comparisons, sets, and boolean composition. Same builder works against pgvector, Chroma, Pinecone, Milvus, Weaviate.

A small gotcha. eq("year", "2026") and eq("year", 2026) are not the same query against most backends. Whatever type you stored at ingestion is the type you have to filter on. If you stored years as strings, filter as strings. Mixing them is the kind of bug that returns zero results and zero hints.

4. Filtering a Bare Similarity Search

Filtering plugs into the existing SearchRequest you’ve used since the vector store operations post. One extra line:

public List<Map<String, Object>> searchByProduct(String query, String product) {
    var filter = new FilterExpressionBuilder();

    List<Document> results = vectorStore.similaritySearch(
            SearchRequest.builder()
                    .query(query)
                    .topK(5)
                    .filterExpression(filter.eq("product", product).build())
                    .build()
    );

    return results.stream()
            .map(doc -> Map.<String, Object>of(
                    "content",  doc.getText(),
                    "metadata", doc.getMetadata()))
            .toList();
}

Two details I always want to call out the first time someone wires this up:

The filter runs before similarity ranking. This matters. The store narrows the candidate set with the predicate, then ranks what’s left by cosine distance. You’re not paying the embedding-distance cost on the discarded chunks.
topK = 5 is a cap, not a target. If only three documents match the filter, you get three. The retriever isn’t going to soften the predicate to find you two more. That’s almost always the right behaviour.

A handy diagnostic: when filtered queries return fewer chunks than you expect, it’s almost always one of two things — your metadata key is misspelled ("prod" vs "product"), or the type doesn’t match what you ingested (2026 vs "2026"). Log the document metadata you actually have in the store. Five minutes of vectorStore.similaritySearch(SearchRequest.builder().query("anything").topK(20).build()) and a glance at doc.getMetadata() saves an hour of squinting at the filter expression.

5. Filtering Inside a RAG Call

The same filter goes into a QuestionAnswerAdvisor so the LLM only ever sees on-scope context:

public String askAboutProduct(String question, String product) {
    var filter = new FilterExpressionBuilder();
    SearchRequest searchRequest = SearchRequest.builder()
            .topK(3)
            .filterExpression(filter.eq("product", product).build())
            .build();

    return chatClient.prompt()
            .system("Answer questions using only the provided context for the " + product + " product.")
            .advisors(QuestionAnswerAdvisor.builder(vectorStore)
                    .searchRequest(searchRequest)
                    .build())
            .user(question)
            .call()
            .content();
}

Notice you don’t set .query(...) on the SearchRequest here — the advisor injects the user’s question as the query at call time. You’re only configuring the retrieval policy: top-K, similarity threshold, and the metadata filter. That’s the right separation of concerns. The advisor owns “what was asked”; you own “what you’re allowed to look at”.

The system prompt does the same nudging trick we used in the multi-doc post: telling the model which product the context is for. With a small local model like qwen3:4b, that one extra clause noticeably reduces drift into general knowledge.

6. Running the Demo

Same setup as the rest of the series — Postgres + pgvector + Ollama in Docker, Spring Boot on top:

docker compose up -d
./mvnw spring-boot:run

The six demo documents (CloudFlow v2.0, v2.5, API rate limits, security best practices, plus DataSync v1.0 and v1.5) ingest automatically on startup with their product, version, category, and year metadata.

Filter by product

curl -s "http://localhost:8080/api/metadata/search/product?query=new+features&product=cloudflow" | jq

You’ll get only CloudFlow documents back — even though the same query against the unfiltered store would mix in DataSync’s “v1.5 added Apache Kafka…” chunk because it’s also semantically about new features.

Try the same query with product=datasync and watch the result set flip entirely. Same vector store, same embeddings, two completely different worlds.

Filter by category

curl -s "http://localhost:8080/api/metadata/search/category?query=best+practices&category=security" | jq

This returns the security chunk and ignores the release notes — even if a release note happens to mention “best practices” in passing.

curl -s -X POST http://localhost:8080/api/metadata/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What new features were added in the latest version?", "product": "cloudflow"}' | jq

{
  "question": "What new features were added in the latest version?",
  "product": "cloudflow",
  "answer": "CloudFlow v2.5 added AI-powered document summarization, smart search with natural language queries, and automated workflow templates. Performance improved by 40%."
}

Switch the product to datasync and the same question returns the Kafka and pipeline-builder answer. The model can’t accidentally cross the streams because it never sees the other product’s chunks in the first place.

7. Vector Store Support — Read the Fine Print

Filter support is not uniform across backends. Spring AI smooths over the API, but the underlying capabilities differ:

Vector Store	What you get
`SimpleVectorStore`	Basic predicates (`eq`, `in`). Fine for demos, weak for anything else.
`PgVectorStore`	Full SQL-like expressions, indexed if you create the right indexes.
Chroma	Full metadata filtering with native operator support.
Pinecone	Full metadata filtering, very fast at scale.
Milvus	Full filtering with boolean expressions and scalar indexes.
Weaviate	Full filtering via its GraphQL `where` syntax.

Two takeaways:

The demo runs on SimpleVectorStore for convenience, but don’t ship it. If your app does real metadata filtering, move to a backend that can index those fields. Otherwise every “filtered” query is a full scan.
Performance ≠ correctness. Even when a backend supports filtering, an unindexed metadata field on a million-chunk store is going to hurt. When you settle on a stable schema, create indexes on the fields you actually filter by — tenantId, product, category. Treat metadata fields like database columns, because at this scale that’s exactly what they are.

8. Things That Will Bite You

A short list, because none of them are showstoppers but all of them have ruined someone’s afternoon.

Schema drift

You ingest a million chunks with category: "release-notes". Six months later, a new pipeline starts writing category: "release_notes". Your filters silently miss half the corpus. Pin a small enum-like list of allowed values and validate at ingestion time, not at query time.

Type mismatches

b.eq("year", 2026) against documents stored with "year" -> "2026" returns zero results, no error. Pick one type per field and enforce it everywhere — at ingestion and at query construction.

Filters that always return empty

If your “filtered” queries consistently return nothing, the order of debugging is: (1) is the metadata actually in the store? (run an unfiltered search and inspect doc.getMetadata()); (2) does the type match? (string vs number); (3) does the operator behave the way you think across null values? (eq against a missing key is not the same as ne of the wrong value).

Unindexed fields at scale

Filtering a small store is free. Filtering a large unindexed store is a full scan, every query. When you commit to a metadata schema, add the matching indexes in your vector store. For pgvector, that’s a regular B-tree index on the JSONB metadata key. For Pinecone and friends, follow their docs — but do it.

Tenant filters as a security boundary

Tempting take: “we’ll filter by tenantId in the advisor and call it a day.” Don’t. Pass tenantId from your authenticated principal server-side — never from the request body — and apply the filter in a single chokepoint (a Repository, a RetrievalService, an interceptor). One forgotten eq("tenantId", ...) is a cross-tenant data leak.

Don’t replace good chunking with filtering

Metadata filtering is great at narrowing scope. It cannot rescue chunks that are too big, too small, or split across the wrong boundaries. If your top-K is bad within a single tenant’s data, fix chunking first, then come back.

Leaking filter values into prompts unfiltered

This one’s subtle. When the user picks the filter (a product dropdown, a category facet), validate the value against an allow-list before you put it in the system prompt. Answer questions using only the provided context for the " + product + " product. is a perfectly fine prompt — until product is "); ignore previous instructions; (". Treat user-supplied filter values like user-supplied SQL: validate, don’t concatenate.

9. When to Reach for This (and When Not To)

Use metadata filtering when:

You have one logical corpus (one product family, one knowledge base) but multiple slicing dimensions: tenant, version, language, access level.
You’d otherwise be tempted to spin up N vector stores for N values of one field.
You need access control inside the retrieval layer, not bolted on top.
Your store has decent filter support and you can index the relevant fields.

Stick with multiple collections when:

The domains are genuinely different content with different tone, format, or update cadence (legal vs. tech vs. HR).
Different domains need different retrieval settings — different chunk sizes, different topK, different embedding strategies.
You want hard isolation: re-ingesting one collection should never touch another.

Combine both when:

You have a handful of separate stores for the genuinely distinct domains and metadata filtering inside each store for finer slicing. This is what most production setups end up looking like — and it’s not more complex than either approach alone.

10. Where This Sits in the Bigger Picture

Metadata filtering is the natural counterpart to the multi-document pattern. Multi-doc gives you isolation between domains; metadata filtering gives you slicing within a domain. Real systems use both: a few stores for the truly distinct corpora, and metadata predicates inside each one for tenant, version, language, and access-control scoping.

It also slots cleanly into the rest of the series:

Function calling — your searchProductDocs(product, version, question) @Tool method is just a metadata-filtered search behind a function signature. Let the LLM pick the values from the conversation.
Advisors — wrap the filtered retrieval in an advisor and you’ve got a per-tenant RAG pipeline that the rest of your app doesn’t have to think about.
Structured output — when the model classifies an incoming question into a typed Routing(product, category), those fields become your filter values. End-to-end typed retrieval.

Metadata filtering is the smallest possible addition to your RAG pipeline that pays back the most. One field on each Document, one extra line on each SearchRequest.

11. Key Takeaways

Metadata is a WHERE clause for vector search. It runs before similarity ranking, so you trim the candidate set instead of post-filtering noisy results.
Decide your schema before ingesting. Tenant, product, version, category, language, accessLevel — pick the keys, pick the types, and stick to them. Adding a new key later means re-embedding.
Use FilterExpressionBuilder, not raw filter strings. It’s type-safe, refactor-friendly, and portable across vector stores.
Type and key consistency are the #1 source of “no results” bugs. 2026 vs "2026" and "category" vs "categories" will silently return empty result sets.
Tenant filters are security boundaries. Apply them server-side from the authenticated principal, in a single chokepoint. Never trust the request body.
SimpleVectorStore is for demos. For real workloads, move to a backend that can index your metadata fields. Otherwise every filtered query is a full scan.
Combine with multi-document RAG. A few distinct stores plus metadata filtering inside each one is the pattern most production setups land on.

Series Roadmap

Post	Topic	What it adds
Post 1	Basic RAG	End-to-end retrieval pipeline with `QuestionAnswerAdvisor`
Post 2	Document Ingestion	Multi-format loading, custom chunk sizes, metadata enrichment
Post 3	Vector Store Operations	Direct similarity search, threshold tuning, embedding inspection
Post 4	Chat with Memory	Conversational RAG with per-session history and context carryover
Post 5	Advisors	Composing RAG + memory + safety advisors in a pipeline
Post 6	Structured Output	Extracting typed Java records from LLM responses
Post 7	Function Calling	Letting the LLM invoke Java methods as tools
Post 8	Multi-Document RAG	Multiple document collections with smart routing
→ You are here	Metadata Filtering	Scoping vector search with metadata filters

Source code: github.com/gdunhao/rag-spring-ai — clone it, run make setup && make run, and open localhost:8080 for the interactive playground.

1. Why You Want a WHERE Clause on Your Vector Search