VECMAN: Build a RAG Without a Vector Database Using Only Python
What is VECMAN?
Built by an Egyptian engineer, Loaii Eldeen Abdalslam, VECMAN is a high-performance vector database based on Vector Quantized Variational Autoencoders (VQ-VAE), designed to compress, store, and retrieve text embeddings with strong memory efficiency and fast retrieval.
Why VECMAN?
Modern LLM applications demand scalable, low-latency retrieval systems, but traditional vector databases often struggle with:
- High memory usage
- Slow similarity search at scale
- Lack of transparency in retrieval quality
VECMAN reimagines how we handle embeddings: rather than just storing them, it learns a compressed, meaningful representation through deep generative modeling.
✅ Think of it as "lossy compression for embeddings", like MP3 for vectors, but smarter.
Let’s be real: most vector databases today are just glorified key-value stores for embeddings. They scale, sure, but they don’t understand the data they’re storing. And when you're running RAG pipelines at scale, that lack of intelligence starts to hurt: bloated memory usage, brittle retrieval, and zero insight into why a result was returned.
Features
1- It Learns Structure, Not Just Stores Vectors
Instead of dumping raw embeddings into an index, VECMAN uses a trained VQ-VAE model to map them into a compressed, discrete latent space. Think of it like PNG compression for images, but for semantic vectors.
It optimizes the encoder architecture and training process so that similar embeddings naturally cluster together in the codebook. The result? Faster search, better generalization, and fewer “weirdly off” matches.
And here’s the kicker: it uses encoder-to-encoder similarity during retrieval instead of comparing full decoded outputs. That means lower latency without sacrificing accuracy.
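To make the idea concrete, here is a minimal sketch of VQ-VAE-style quantization: project an embedding into a smaller latent space and replace it with the index of its nearest codebook entry. The shapes, the random projection, and the variable names are purely illustrative; VECMAN's actual encoder and codebook are learned, not random.

import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 96))      # 512 code vectors, 96 dims each (illustrative)
projection = rng.normal(size=(384, 96))    # stand-in for the trained encoder's projection

def quantize(latents: np.ndarray) -> np.ndarray:
    """Map each latent vector to the index of its nearest codebook entry."""
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)            # one discrete code per input vector

embeddings = rng.normal(size=(10, 384))    # e.g. raw all-MiniLM-L6-v2 outputs
codes = quantize(embeddings @ projection)  # store these ints instead of the raw floats
print(codes.shape, codes.dtype)            # (10,) int64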
2- 4x Less Memory (Seriously)
Your all-MiniLM-L6-v2 embeddings are 384-dimensional. VECMAN compresses them down to just 96 learned dimensions using the codebook indices, a 4:1 compression ratio.
That means:
- Smaller indexes
- Lower RAM/VRAM usage
- Cheaper hosting
- Easier deployment on edge devices or containers with tight limits
All while keeping >95% of retrieval quality (measured on standard benchmarks). If you’re tired of paying $200/month for a Pinecone pod just to store 100k vectors, this is game-changing.
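The arithmetic behind that ratio is easy to check. A back-of-the-envelope comparison for 100k documents, assuming float32 storage for both representations:

import numpy as np

n_docs = 100_000
bytes_per_float = np.dtype(np.float32).itemsize
raw = n_docs * 384 * bytes_per_float        # full MiniLM embeddings
compressed = n_docs * 96 * bytes_per_float  # 96 learned dimensions
print(f"raw: {raw / 1e6:.0f} MB, compressed: {compressed / 1e6:.0f} MB, "
      f"ratio: {raw / compressed:.0f}x")
# raw: 154 MB, compressed: 38 MB, ratio: 4x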
3- You Actually Know How Good the Results Are
Ever wonder if your retrieved chunk is a solid match… or just the least-worst option? With most systems, you get a distance score and a prayer.
VECMAN gives you real similarity scores, cosine and Euclidean, computed in the original embedding space after decoding.
No black box. You can set confidence thresholds, trigger fallback logic, or log low-score queries for review. Transparency matters, especially when debugging production issues.
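As a sketch of what those scores look like in practice, the snippet below computes cosine and Euclidean measures between a query embedding and a decoded candidate, then applies a confidence threshold. The threshold value and the decoded vector here are placeholders, not VECMAN defaults.

import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_dist(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

rng = np.random.default_rng(0)
query_emb = rng.normal(size=384)     # your encoded query
decoded_doc = rng.normal(size=384)   # decoder output for a stored candidate

score = cosine_sim(query_emb, decoded_doc)
if score < 0.35:                     # arbitrary example threshold
    print(f"low-confidence match ({score:.2f}), trigger fallback or log for review")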
4- Smart Retrieval, Not Dumb Search
VECMAN uses hybrid scoring: it combines codebook proximity, post-decoding similarity, and reconstruction quality.
Top candidate has a low confidence score? VECMAN automatically expands the search radius or falls back to brute-force comparison. This makes retrieval robust to edge cases, like rare terms or domain shifts.
No more writing custom reranking glue code.
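Conceptually, the fallback behaviour looks something like the wrapper below. VECMAN handles this internally; the min_score value, the widened k, the result shape, and the brute_force_search name are assumptions used only to illustrate the pattern.

def robust_search(db, query_emb, k=5, min_score=0.3):
    """Illustrative fallback loop: widen the search, then go exact if confidence stays low."""
    results = db.search(query_emb, k=k)
    top_score = results[0]["score"] if results else 0.0
    if top_score < min_score:
        results = db.search(query_emb, k=k * 4)          # expand the search radius
        top_score = results[0]["score"] if results else 0.0
    if top_score < min_score and hasattr(db, "brute_force_search"):
        results = db.brute_force_search(query_emb, k=k)  # hypothetical exact fallback
    return results[:k]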
5- Plug It Into Your Stack, No Rewrites Needed
It works seamlessly with tools you already use:
- ✅ sentence-transformers (obviously)
- ✅ Google Gemini Pro / Vertex AI embedding APIs
- ✅ Hugging Face pipelines and custom models
Just encode your text, pass the vectors in, and go. Minimal integration overhead — maximum impact.
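Because the database only ever sees vectors, switching embedding providers is a small change. Here is a minimal sketch of a pluggable encoder function, shown with sentence-transformers; a Gemini / Vertex AI or Hugging Face call would slot into the same embed signature, as long as its output dimension matches the one the database was built with.

from typing import Callable, Sequence

import numpy as np
from sentence_transformers import SentenceTransformer

# Any callable that turns texts into fixed-size vectors will do.
EmbedFn = Callable[[Sequence[str]], np.ndarray]

_st_model = SentenceTransformer("all-MiniLM-L6-v2")

def st_embed(texts: Sequence[str]) -> np.ndarray:
    return _st_model.encode(list(texts))

embed: EmbedFn = st_embed                       # swap in a Gemini/HF-backed function here
vectors = embed(["any text you want to index"])
print(vectors.shape)                            # (1, 384)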
6- Built for Evaluation, Not Just Deployment
VECMAN has first-class support for RAGAS, so you can run evaluations on datasets like WebQuestions or your own internal QA pairs, all without leaving Python.
Measure precision, recall, faithfulness, and see how compression affects each.
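A rough sketch of what a RAGAS run looks like is below. The dataset columns and metric imports follow RAGAS's documented evaluate API, but exact names vary between RAGAS versions, the retrieved contexts and answers must come from your own pipeline, and RAGAS metrics call an LLM judge under the hood, so an LLM backend or API key has to be configured.

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, context_precision, context_recall

# One row per query from your RAG pipeline (values here are placeholders).
eval_data = Dataset.from_dict({
    "question": ["Why is AI so powerful?"],
    "answer": ["Because modern models learn rich representations from data."],
    "contexts": [["Machine learning is amazing", "Vectors power modern AI"]],
    "ground_truth": ["AI is powerful because models learn from large datasets."],
})

result = evaluate(eval_data, metrics=[faithfulness, context_precision, context_recall])
print(result)   # per-metric scores; compare runs with and without compression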
7- Actually Production-Ready
This isn’t a research prototype.
It includes:
- Batch indexing with progress tracking
- Error handling around malformed inputs and OOM scenarios
- Disk persistence and load/resume support
- Type hints, logging, and clean exceptions
It’s tested, typed, and designed to run reliably, whether you're prototyping locally or deploying in Kubernetes.
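To picture the batch-indexing and persistence flow, here is a rough sketch. The save call and the MemoryError handling are assumptions about the persistence and error-handling API rather than a verbatim VECMAN recipe; add matches the quickstart below.

import numpy as np

def index_in_batches(db, encoder, docs, batch_size=256, checkpoint="vecman.idx"):
    """Index documents in batches, checkpointing after each batch so runs can resume."""
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        try:
            embeddings = np.asarray(encoder.encode(batch))
            db.add(embeddings, batch)
        except MemoryError:
            # Fall back to one-at-a-time indexing instead of losing the whole run.
            for doc in batch:
                db.add(np.asarray(encoder.encode([doc])), [doc])
        db.save(checkpoint)   # hypothetical persistence call; resume from here later
        print(f"indexed {min(start + batch_size, len(docs))}/{len(docs)} documents")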
Install
Install VECMAN with pip:
pip install vecman
How to use VECMAN in your Python app?
from vecman import VecmanDB
from sentence_transformers import SentenceTransformer
# Initialize models
encoder = SentenceTransformer('all-MiniLM-L6-v2')
db = VecmanDB(dim=384, codebook_size=512)
# Add documents
docs = ["Machine learning is amazing", "Vectors power modern AI"]
embeddings = encoder.encode(docs)
db.add(embeddings, docs)
# Retrieve closest match
query = "Why is AI so powerful?"
query_emb = encoder.encode(query)
results = db.search(query_emb, k=1)
print(results) # Returns top doc + similarity score!
License
This project is licensed under the MIT License.