VECMAN: Build a RAG Without a Vector Database Using Only Python


What is VECMAN?

Built by Egyptian engineer Loaii Eldeen Abdalslam, VECMAN is a high-performance vector database based on Vector Quantized Variational Autoencoders (VQ-VAE), designed to compress, store, and retrieve text embeddings with exceptional memory efficiency and blazing-fast performance.

Why VECMAN?

Modern LLM applications demand scalable, low-latency retrieval systems. But traditional vector databases often struggle with:

  • High memory usage
  • Slow similarity search at scale
  • Lack of transparency in retrieval quality

VECMAN reimagines how we handle embeddings: it doesn't just store them, it learns a compressed, meaningful representation through deep generative modeling.

✅ Think of it as "lossy compression for embeddings", like MP3 for vectors, but smarter.

Let’s be real: most vector databases today are just glorified key-value stores for embeddings. They scale, sure, but they don’t understand the data they’re storing. And when you're running RAG pipelines at scale, that lack of intelligence starts to hurt: bloated memory usage, brittle retrieval, and zero insight into why a result was returned.

Features

1- It Learns Structure, Not Just Stores Vectors

Instead of dumping raw embeddings into an index, VECMAN uses a trained VQ-VAE model to map them into a compressed, discrete latent space. Think of it like JPEG compression for images, but for semantic vectors.

It optimizes the encoder architecture and training process so that similar embeddings naturally cluster together in the codebook. The result? Faster search, better generalization, and fewer “weirdly off” matches.

And here’s the kicker: it uses encoder-to-encoder similarity during retrieval instead of comparing full decoded outputs. That means lower latency without sacrificing accuracy.
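
To make that concrete, here is a minimal NumPy sketch of VQ-style nearest-codebook quantization. The codebook size, latent width, and one-code-per-vector setup are illustrative assumptions on my part, not VECMAN's actual internals:

import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 96))   # 512 learned code vectors (sizes illustrative)
latents = rng.normal(size=(100, 96))    # encoder outputs for 100 documents

# Quantize: snap each latent to the index of its nearest codebook entry
dists = np.linalg.norm(latents[:, None, :] - codebook[None, :, :], axis=-1)
codes = dists.argmin(axis=1)            # shape (100,), one discrete code per document

# Retrieval can then compare compact codes (or their codebook vectors)
# instead of full decoded embeddings, which is what keeps latency low.
print(codes[:10])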

2- 4x Less Memory (Seriously)

Your all-MiniLM-L6-v2 embeddings are 384-dimensional. VECMAN compresses them down to just 96 learned dimensions using the codebook indices, a 4:1 compression ratio.

That means:

  • Smaller indexes
  • Lower RAM/VRAM usage
  • Cheaper hosting
  • Easier deployment on edge devices or containers with tight limits

All while keeping >95% of retrieval quality (measured on standard benchmarks). If you’re tired of paying $200/month for a Pinecone pod just to store 100k vectors, this is game-changing.
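
The arithmetic is easy to verify yourself. The sketch below assumes both the raw embeddings and the compressed latents are stored as float32, which is my assumption about the storage layout rather than a documented VECMAN detail:

import numpy as np

n_vectors = 100_000
raw = np.zeros((n_vectors, 384), dtype=np.float32)        # original MiniLM embeddings
compressed = np.zeros((n_vectors, 96), dtype=np.float32)  # 96-dim latents (assumed float32)

print(f"raw:        {raw.nbytes / 1e6:.1f} MB")       # 153.6 MB
print(f"compressed: {compressed.nbytes / 1e6:.1f} MB")  # 38.4 MB, exactly 4:1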

3- You Actually Know How Good the Results Are

Ever wonder if your retrieved chunk is a solid match… or just the least-worst option? With most systems, you get a distance score and a prayer.

VECMAN gives you real similarity scores, cosine and Euclidean, computed in the original embedding space after decoding.

No black box. You can set confidence thresholds, trigger fallback logic, or log low-score queries for review. Transparency matters, especially when debugging production issues.
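
For example, a confidence gate on top of retrieval could look like the sketch below. The (document, score) result format and the 0.6 threshold are assumptions for illustration, using the db, encoder, and query from the quick-start example further down:

CONFIDENCE_THRESHOLD = 0.6  # tune per domain; an illustrative value, not a VECMAN default

results = db.search(query_emb, k=3)  # db and query_emb as in the quick-start below
doc, score = results[0]              # assuming (document, cosine score) pairs

if score >= CONFIDENCE_THRESHOLD:
    context = doc                    # confident match: feed it to the LLM
else:
    # Low confidence: log the query for review and trigger fallback logic
    print(f"Low-confidence retrieval ({score:.2f}) for query: {query!r}")
    context = None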

4- Hybrid Scoring with Automatic Fallbacks

VECMAN uses hybrid scoring: combining codebook proximity, post-decoding similarity, and reconstruction quality.

Top candidate has a low confidence score? It automatically expands the search radius or falls back to brute-force comparison. This makes retrieval robust to edge cases, like rare terms or domain shifts.

No more writing custom reranking glue code.
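
In spirit, the hybrid score is a weighted blend of those three signals. Here is a toy version; the weights and the assumption that all three inputs are normalized to [0, 1] are mine, not VECMAN's trained values:

def hybrid_score(code_proximity, decoded_similarity, reconstruction_quality,
                 weights=(0.3, 0.5, 0.2)):
    """Blend three retrieval signals into one confidence score.

    Inputs are assumed normalized to [0, 1]; weights are illustrative.
    """
    w1, w2, w3 = weights
    return w1 * code_proximity + w2 * decoded_similarity + w3 * reconstruction_quality

score = hybrid_score(0.9, 0.8, 0.7)  # -> 0.81
if score < 0.5:
    pass  # expand the search radius or brute-force over raw embeddings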

5- Plug It Into Your Stack, No Rewrites Needed

It works seamlessly with tools you already use:

  • ✅ sentence-transformers (obviously)
  • ✅ Google Gemini Pro / Vertex AI embedding APIs
  • ✅ Hugging Face pipelines and custom models

Just encode your text, pass the vectors in, and go. Minimal integration overhead, maximum impact.
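
Any embedder that produces fixed-size float vectors slots in the same way. For example, swapping in a raw Hugging Face feature-extraction pipeline only adds a mean-pooling step; the pooling is my assumption about how you'd adapt token-level outputs, not an official VECMAN recipe:

import numpy as np
from transformers import pipeline

extractor = pipeline("feature-extraction",
                     model="sentence-transformers/all-MiniLM-L6-v2")

text = "Vectors power modern AI"
token_embeddings = np.squeeze(np.array(extractor(text)))  # (seq_len, 384)
sentence_embedding = token_embeddings.mean(axis=0)        # mean-pool to (384,)

db.add(sentence_embedding[np.newaxis, :], [text])  # same VecmanDB as in the quick-start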

6- Built for Evaluation, Not Just Deployment

VECMAN has first-class support for RAGAS, so you can run evaluations on datasets like WebQuestions or your own internal QA pairs, all without leaving Python.

Measure precision, recall, faithfulness, and see how compression affects each.
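
A minimal evaluation loop might look like the sketch below, assuming the classic ragas.evaluate interface and the column names RAGAS documents (question, answer, contexts, ground_truth). The sample row is made up; the answers would come from your own RAG chain:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

eval_data = Dataset.from_dict({
    "question":     ["Why is AI so powerful?"],
    "answer":       ["AI learns patterns from data at scale."],  # your RAG chain's output
    "contexts":     [["Machine learning is amazing"]],           # chunks VECMAN retrieved
    "ground_truth": ["AI's power comes from learned representations."],
})

scores = evaluate(eval_data, metrics=[faithfulness, answer_relevancy, context_precision])
print(scores)  # per-metric averages over the dataset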

7- Actually Production-Ready

This isn’t a research prototype.

It includes:

  • Batch indexing with progress tracking
  • Error handling around malformed inputs and OOM scenarios
  • Disk persistence and load/resume support
  • Type hints, logging, and clean exceptions

It’s tested, typed, and designed to run reliably, whether you're prototyping locally or deploying in Kubernetes.
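
A batch-index-then-persist loop might look like the sketch below. The save and load method names are hypothetical; check the repository's README for the actual persistence API:

from vecman import VecmanDB
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
db = VecmanDB(dim=384, codebook_size=512)

corpus = [f"document number {i}" for i in range(1000)]  # stand-in corpus
batch_size = 256

# Index in batches with simple progress tracking
for i in range(0, len(corpus), batch_size):
    batch = corpus[i:i + batch_size]
    db.add(encoder.encode(batch), batch)
    print(f"indexed {min(i + batch_size, len(corpus))}/{len(corpus)}")

db.save("vecman_index")              # hypothetical persistence call
db = VecmanDB.load("vecman_index")   # hypothetical load/resume call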

Install

Install VECMAN with pip:

pip install vecman

How to use VECMAN in your Python app?

from vecman import VecmanDB
from sentence_transformers import SentenceTransformer

# Initialize models
encoder = SentenceTransformer('all-MiniLM-L6-v2')
db = VecmanDB(dim=384, codebook_size=512)

# Add documents
docs = ["Machine learning is amazing", "Vectors power modern AI"]
embeddings = encoder.encode(docs)
db.add(embeddings, docs)

# Retrieve closest match
query = "Why is AI so powerful?"
query_emb = encoder.encode(query)
results = db.search(query_emb, k=1)

print(results)  # Returns top doc + similarity score!

License

This project is licensed under the MIT License

Resources & Downloads

GitHub - Vec1man/vecman: VECMAN (Vector Manager), a VQ-VAE based vector database for efficient text embeddings and retrieval.
