Vector Databases and RAG Architecture for Intelligent Search

Semantic Search with Zilliz/Milvus Vector Database

Vector databases have created a major revolution for modern applications looking to go beyond traditional keyword searches. Zilliz/Milvus is a high-performance vector database that can semantically search billions of texts in milliseconds.

How does it work?

Each document is converted into a high-dimensional vector using an embedding model and stored in Milvus/Zilliz.
The user’s question is also converted into an embedding.
Milvus quickly finds the closest (most similar) documents to this vector.

Advantage:
Meaning-based search across millions of records happens in milliseconds.

Example collection schema:

id: VARCHAR
embedding: FLOAT[]  # vector
text_content: STRING  # original text
date_info: DATETIME
content_type: STRING

LLM and Embedding with OpenAI/OpenRouter

Large Language Models (LLM) and embedding services form the foundation of natural language processing applications.

Goals:

Convert user text into vectors (embedding)
Summarize results and generate natural, concise responses (LLM)

How does it work?

User message is sent to OpenAI or OpenRouter API.
The embedding model (e.g. text-embedding-3-small) vectorizes the text.
After finding the most relevant results, the best response is generated using LLM (GPT-3.5, GPT-4, OpenRouter, etc.).

Advantage of OpenRouter:

Combines different LLM providers under a single API.
Strong Turkish language support, up-to-date models available.
Both embedding and response generation can be done with the same API.

RAG (Retrieval-Augmented Generation) Flow with LangChain/LangGraph

Instead of relying solely on the LLM, the core of RAG architecture is to pull relevant documents from the knowledge base (vector DB) and have the LLM generate responses based on this information.

RAG Flow:

Receive question from user.
Convert question into embedding.
Pull the most relevant documents from Milvus/Zilliz.
Pass these documents and the original question as a prompt to the LLM.
LLM generates a concise response.
Sources and scores are also shown to the user.

Advantages:

Reduces LLM hallucination, produces responses based on real information.
The process is modular and easily customizable.

Summary

Zilliz/Milvus: Fast and scalable semantic search infrastructure.
OpenAI/OpenRouter: Converts texts to vectors and generates natural language responses.
LangChain/LangGraph: Manages the entire process from user input to response generation, implements RAG architecture.

Using these technologies together in modern AI applications significantly improves both efficiency and user experience.

Stay tuned to our blog for more technical content and sample applications!