Recent Posts
Metadata Filtering with Spring AI: A WHERE Clause for Your Vector Store
In the last post we fixed context pollution by giving every document domain its own VectorStore. One bucket for FAQ, one for legal, one for tech, one for HR — done. The router picks one and the LLM gets a...
Multi-Document RAG with Spring AI: Multiple Collections, Smart Routing, and Cleaner Top-K
Every demo in this series so far has lived inside one cosy little vector store. We dumped a CloudFlow FAQ into it, asked some questions, and got nice answers. That’s also pretty much how every “your first RAG app” tutorial...
Function Calling in Spring AI: Letting the LLM Press the Buttons
So far in this series the LLM has been a very polite librarian — we ask it questions, it goes to the vector store, it reads us a nicely worded answer. That’s RAG. It’s great. It’s also, eventually, not enough....
Structured Output in Spring AI: Turning LLM Prose into Typed Java Records
Up to now, every demo in this series has happily returned a String from the LLM and called it a day. That’s fine when you’re building a chatbot — humans are great at reading prose. It is not fine when...
Advisors in Spring AI: Composing RAG, Memory, and Safety as a Pipeline
So far in this series we’ve quietly been using a feature without ever really stopping to look at it. Every demo — basic RAG, ingestion, vector store ops, chat memory — has been built around ChatClient and these little things...
Agent-to-Agent with Spring AI: Two Agents, One Conversation, Zero Magic
I’ve spent a lot of the last few posts on single-agent stuff — RAG, memory, advisors, the whole “one ChatClient does everything” pipeline. That works great until your one agent starts to look like a kitchen drawer: every tool jammed...
Chat with Memory in Spring AI: Conversational RAG That Actually Remembers
So far in this series we’ve built a basic RAG pipeline, loaded a few different document formats, and poked at the vector store directly to understand what retrieval actually returns.
Vector Store Operations with Spring AI: Similarity Search, Thresholds, and Embedding Inspection
In the first post we built a basic RAG pipeline, and in the second post we explored different ways to ingest documents. Both times, we let QuestionAnswerAdvisor handle the retrieval for us — it searched the vector store, grabbed the...
Document Ingestion with Spring AI: Loading Text, JSON, and Custom Chunks into Your RAG Pipeline
In the first post we built a basic RAG system — one text file, default chunking, done. It worked great for a quick demo, but real-world documents don’t come in neat .txt files. You’ll deal with JSON exports, PDFs, maybe...
Basic RAG with Spring AI: Build a Grounded Q&A System from Scratch
Large language models are impressive, but they have a fundamental limitation: they can only work with what they learned during training. Ask about your company’s internal docs, last week’s release notes, or anything after the training cutoff — and the...
Deep Dive: Hierarchical Agent Systems — Supervisors, Workers, Delegation, and Quality Control
The single ReAct agent excels at single-domain tasks. Multi-agent systems extend this to multi-domain work by coordinating specialized agents. But there’s a specific multi-agent topology that deserves its own deep dive: the hierarchical supervisor pattern — where a supervisor agent...
Deep Dive: Multi-Agent Systems — Architectures, Coordination Patterns, Best Practices, and Pitfalls
The single ReAct agent handles an impressive range of tasks — but it has a ceiling. When a task spans multiple domains, requires different expertise for different phases, or benefits from verification and review, a single LLM with a single...
Deep Dive: The Single ReAct Agent — Architecture, Best Practices, and Pitfalls
The single ReAct agent is the most fundamental — and most widely deployed — agent architecture in production today. It places one LLM in a reasoning loop with access to tools, iterating through Thought → Action → Observation cycles until...
AI Agents Best Practices: Building Reliable, Safe, and Effective Agent Systems
AI agents — systems that use an LLM to reason, plan, and act in a loop — are among the most powerful patterns in modern AI engineering. They are also among the most fragile. A well-designed agent can autonomously resolve...
AI Agents: Autonomous Systems That Reason, Plan, and Act
Large Language Models are impressive text generators, but on their own they are stateless, passive, and confined to the information in their context window. An AI agent breaks all three constraints: it perceives its environment, reasons about a goal, selects...
Augmenting vs Training Large Language Models
One of the most consequential decisions in any AI project is whether to augment an existing Large Language Model or train (or retrain) one. The wrong choice can cost months of engineering effort, hundreds of thousands of dollars in compute,...
Augmenting Large Language Models
Large Language Models are remarkably capable out of the box, but they have well-known limitations — stale training data, hallucinations, no access to private knowledge, inability to take actions in the real world, and lack of domain depth. Augmentation is...
Java 26: Stable Values (JEP 526 — Preview)
What if you could have a lazily-initialized field that the JIT compiler treats exactly like final? JEP 526 introduces Stable Values — a new API for constants that are computed once and then permanently trusted by the JVM, enabling the...
Java 26: G1 GC Throughput via Reduced Synchronization (JEP 522)
Invisible to your application code, but measurable in your throughput metrics. JEP 522 refactors internal synchronization in the G1 garbage collector to reduce thread contention, delivering higher application throughput on multi-core servers — no JVM flags, no code changes required....
Java 26: Ahead-of-Time Object Caching with Any GC (JEP 516)
Faster startup and lower initial GC pressure, without changing a line of application code. JEP 516 changes how Java’s Ahead-of-Time (AOT) cache stores heap objects so that the cache works with any garbage collector, lifting the restriction that previously ruled out ZGC. In JDK 26,...
Java 26: Remove the Applet API (JEP 504)
After 28 years, the java.applet package is gone. JEP 504 removes the Applet API entirely from JDK 26, completing a deprecation journey that started in JDK 9.
Java 26: Prepare to Make Final Mean Final (JEP 500)
The final keyword is about to mean something again. JEP 500 introduces runtime warnings in JDK 26 when reflection mutates final fields — the first step toward making final truly immutable. In a future JDK, those mutations will be blocked...
Java 26: Vector API — SIMD for Java (JEP 529 — Incubator)
Want 4x–16x speedups on data-parallel workloads? JEP 529 brings the Vector API to its 11th incubator round in JDK 26, enabling explicit SIMD (Single Instruction, Multiple Data) computations that compile to optimal hardware vector instructions.
Java 26: PEM Encodings of Cryptographic Objects (JEP 524 — Preview)
No more manual Base64 wrapping or Bouncy Castle dependency. JEP 524 introduces a built-in API for encoding and decoding cryptographic objects in PEM format — the ubiquitous text format used by OpenSSL, SSH, TLS, and virtually every crypto tool.
Java 26: Structured Concurrency (JEP 525 — Preview)
Concurrent programming in Java just got a lot safer. JEP 525 continues Structured Concurrency as a preview API that treats groups of related concurrent tasks as a single unit of work with automatic lifecycle management.
Java 26: Primitive Types in Patterns (JEP 530 — Preview)
Pattern matching in Java is now complete. JEP 530 extends patterns to support primitive types in switch, instanceof, and record patterns — closing the last gap in Java’s pattern matching story.
Java 26: HTTP/3 for the HTTP Client API (JEP 517)
Zero-RTT connection setup, no head-of-line blocking, and connection migration across networks — Java’s built-in HttpClient now speaks HTTP/3. JEP 517 finalizes HTTP/3 support in JDK 26, making it the first JDK version with zero-dependency QUIC-based HTTP.