VECMAN: Build a RAG Without a Vector Database Using Only Python
What is VECMAN?
Built by an Egyptian engineer, Loaii Eldeen Abdalslam, VECMAN is a high-performance vector database based on Vector Quantized Variational Autoencoders (VQ-VAE), designed to compress, store, and retrieve text embeddings with strong memory efficiency and fast retrieval.
Why VECMAN?
Modern LLM applications demand scalable, low-latency retrieval systems, but traditional vector databases often struggle with:
- High memory usage
- Slow similarity search at scale
- Lack of transparency in retrieval quality
VECMAN reimagines how we handle embeddings: rather than just storing them, it learns a compressed, meaningful representation through deep generative modeling.
✅ Think of it as "lossy compression for embeddings", like MP3 for vectors, but smarter.
Let’s be real: most vector databases today are just glorified key-value stores for embeddings. They scale, sure, but they don’t understand the data they’re storing. And when you're running RAG pipelines at scale, that lack of intelligence starts to hurt: bloated memory usage, brittle retrieval, and zero insight into why a result was returned.
Features
1- It Learns Structure, Not Just Stores Vectors
Instead of dumping raw embeddings into an index, VECMAN uses a trained VQ-VAE model to map them into a compressed, discrete latent space. Think of it like PNG compression for images, but for semantic vectors.
It optimizes the encoder architecture and training process so that similar embeddings naturally cluster together in the codebook. The result? Faster search, better generalization, and fewer “weirdly off” matches.
And here’s the kicker: it uses encoder-to-encoder similarity during retrieval instead of comparing full decoded outputs. That means lower latency without sacrificing accuracy.
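To make the idea concrete, here is a minimal sketch of VQ-VAE-style quantization: project an embedding into a smaller latent space and replace it with the index of its nearest codebook entry. The shapes, the random projection, and the variable names are purely illustrative; VECMAN's actual encoder and codebook are learned, not random.

import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 96))      # 512 code vectors, 96 dims each (illustrative)
projection = rng.normal(size=(384, 96))    # stand-in for the trained encoder's projection

def quantize(latents: np.ndarray) -> np.ndarray:
    """Map each latent vector to the index of its nearest codebook entry."""
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)            # one discrete code per input vector

embeddings = rng.normal(size=(10, 384))    # e.g. raw all-MiniLM-L6-v2 outputs
codes = quantize(embeddings @ projection)  # store these ints instead of the raw floats
print(codes.shape, codes.dtype)            # (10,) int64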
2- 4x Less Memory (Seriously)
Your all-MiniLM-L6-v2 embeddings are 384-dimensional. VECMAN compresses them down to just 96 learned dimensions using the codebook indices, a 4:1 compression ratio.
That means:
- Smaller indexes
- Lower RAM/VRAM usage
- Cheaper hosting
- Easier deployment on edge devices or containers with tight limits
All while keeping >95% of retrieval quality (measured on standard benchmarks). If you’re tired of paying $200/month for a Pinecone pod just to store 100k vectors, this is game-changing.
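The arithmetic behind that ratio is easy to check. A back-of-the-envelope comparison for 100k documents, assuming float32 storage for both representations:

import numpy as np

n_docs = 100_000
bytes_per_float = np.dtype(np.float32).itemsize
raw = n_docs * 384 * bytes_per_float        # full MiniLM embeddings
compressed = n_docs * 96 * bytes_per_float  # 96 learned dimensions
print(f"raw: {raw / 1e6:.0f} MB, compressed: {compressed / 1e6:.0f} MB, "
      f"ratio: {raw / compressed:.0f}x")
# raw: 154 MB, compressed: 38 MB, ratio: 4x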
3- You Actually Know How Good the Results Are
Ever wonder if your retrieved chunk is a solid match… or just the least-worst option? With most systems, you get a distance score and a prayer.
VECMAN gives you real similarity scores, cosine and Euclidean, computed in the original embedding space after decoding.
No black box. You can set confidence thresholds, trigger fallback logic, or log low-score queries for review. Transparency matters, especially when debugging production issues.
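As a sketch of what those scores look like in practice, the snippet below computes cosine and Euclidean measures between a query embedding and a decoded candidate, then applies a confidence threshold. The threshold value and the decoded vector here are placeholders, not VECMAN defaults.

import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_dist(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

rng = np.random.default_rng(0)
query_emb = rng.normal(size=384)     # your encoded query
decoded_doc = rng.normal(size=384)   # decoder output for a stored candidate

score = cosine_sim(query_emb, decoded_doc)
if score < 0.35:                     # arbitrary example threshold
    print(f"low-confidence match ({score:.2f}), trigger fallback or log for review")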
4- Smart Retrieval, Not Dumb Search
VECMAN uses hybrid scoring: it combines codebook proximity, post-decoding similarity, and reconstruction quality.
Top candidate has a low confidence score? VECMAN automatically expands the search radius or falls back to brute-force comparison. This makes retrieval robust to edge cases, like rare terms or domain shifts.
No more writing custom reranking glue code.
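Conceptually, the fallback behaviour looks something like the wrapper below. VECMAN handles this internally; the min_score value, the widened k, the result shape, and the brute_force_search name are assumptions used only to illustrate the pattern.

def robust_search(db, query_emb, k=5, min_score=0.3):
    """Illustrative fallback loop: widen the search, then go exact if confidence stays low."""
    results = db.search(query_emb, k=k)
    top_score = results[0]["score"] if results else 0.0
    if top_score < min_score:
        results = db.search(query_emb, k=k * 4)          # expand the search radius
        top_score = results[0]["score"] if results else 0.0
    if top_score < min_score and hasattr(db, "brute_force_search"):
        results = db.brute_force_search(query_emb, k=k)  # hypothetical exact fallback
    return results[:k]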
5- Plug It Into Your Stack, No Rewrites Needed
It works seamlessly with tools you already use:
- ✅ sentence-transformers (obviously)
- ✅ Google Gemini Pro / Vertex AI embedding APIs
- ✅ Hugging Face pipelines and custom models
Just encode your text, pass the vectors in, and go. Minimal integration overhead — maximum impact.
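Because the database only ever sees vectors, switching embedding providers is a small change. Here is a minimal sketch of a pluggable encoder function, shown with sentence-transformers; a Gemini / Vertex AI or Hugging Face call would slot into the same embed signature, as long as its output dimension matches the one the database was built with.

from typing import Callable, Sequence

import numpy as np
from sentence_transformers import SentenceTransformer

# Any callable that turns texts into fixed-size vectors will do.
EmbedFn = Callable[[Sequence[str]], np.ndarray]

_st_model = SentenceTransformer("all-MiniLM-L6-v2")

def st_embed(texts: Sequence[str]) -> np.ndarray:
    return _st_model.encode(list(texts))

embed: EmbedFn = st_embed                       # swap in a Gemini/HF-backed function here
vectors = embed(["any text you want to index"])
print(vectors.shape)                            # (1, 384)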
6- Built for Evaluation, Not Just Deployment
VECMAN has first-class support for RAGAS, so you can run evaluations on datasets like WebQuestions or your own internal QA pairs, all without leaving Python.
Measure precision, recall, faithfulness, and see how compression affects each.
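A rough sketch of what a RAGAS run looks like is below. The dataset columns and metric imports follow RAGAS's documented evaluate API, but exact names vary between RAGAS versions, the retrieved contexts and answers must come from your own pipeline, and RAGAS metrics call an LLM judge under the hood, so an LLM backend or API key has to be configured.

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, context_precision, context_recall

# One row per query from your RAG pipeline (values here are placeholders).
eval_data = Dataset.from_dict({
    "question": ["Why is AI so powerful?"],
    "answer": ["Because modern models learn rich representations from data."],
    "contexts": [["Machine learning is amazing", "Vectors power modern AI"]],
    "ground_truth": ["AI is powerful because models learn from large datasets."],
})

result = evaluate(eval_data, metrics=[faithfulness, context_precision, context_recall])
print(result)   # per-metric scores; compare runs with and without compression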
7- Actually Production-Ready
This isn’t a research prototype.
It includes:
- Batch indexing with progress tracking
- Error handling around malformed inputs and OOM scenarios
- Disk persistence and load/resume support
- Type hints, logging, and clean exceptions
It’s tested, typed, and designed to run reliably, whether you're prototyping locally or deploying in Kubernetes.
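To picture the batch-indexing and persistence flow, here is a rough sketch. The save call and the MemoryError handling are assumptions about the persistence and error-handling API rather than a verbatim VECMAN recipe; add matches the quickstart below.

import numpy as np

def index_in_batches(db, encoder, docs, batch_size=256, checkpoint="vecman.idx"):
    """Index documents in batches, checkpointing after each batch so runs can resume."""
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        try:
            embeddings = np.asarray(encoder.encode(batch))
            db.add(embeddings, batch)
        except MemoryError:
            # Fall back to one-at-a-time indexing instead of losing the whole run.
            for doc in batch:
                db.add(np.asarray(encoder.encode([doc])), [doc])
        db.save(checkpoint)   # hypothetical persistence call; resume from here later
        print(f"indexed {min(start + batch_size, len(docs))}/{len(docs)} documents")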
Install
Install VECMAN with pip:
pip install vecman
How to use VECMAN in your Python app?
from vecman import VecmanDB
from sentence_transformers import SentenceTransformer
# Initialize models
encoder = SentenceTransformer('all-MiniLM-L6-v2')
db = VecmanDB(dim=384, codebook_size=512)
# Add documents
docs = ["Machine learning is amazing", "Vectors power modern AI"]
embeddings = encoder.encode(docs)
db.add(embeddings, docs)
# Retrieve closest match
query = "Why is AI so powerful?"
query_emb = encoder.encode(query)
results = db.search(query_emb, k=1)
print(results) # Returns top doc + similarity score!
License
This project is licensed under the MIT License.