
RAGVersion

Async-first version tracking system for RAG applications

RAGVersion solves the critical problem of keeping vector databases synchronized with changing source documents in Retrieval-Augmented Generation (RAG) applications.

🎉 New in v0.10.0: Chunk-Level Versioning

Chunk-level tracking cuts embedding costs by 80-95% in typical use: only the parts of a document that actually changed are re-embedded.

Learn about Chunk Versioning

Why RAGVersion?

When building RAG applications, you face a challenge: documents change, but vector databases don't update automatically. RAGVersion provides:

  • Automatic change detection - Know exactly which documents changed
  • Chunk-level versioning - Track changes at chunk granularity (80-95% cost savings) 🆕
  • Version history - Complete audit trail of all changes
  • Cost optimization - Only re-index changed documents and chunks
  • Production-ready - Resilient error handling and async architecture
  • Framework integrations - Works with LangChain, LlamaIndex, and custom pipelines

Quick Start

Install RAGVersion:

pip install ragversion

Track your documents:

import asyncio
from ragversion import AsyncVersionTracker
from ragversion.storage import SupabaseStorage

async def main():
    tracker = AsyncVersionTracker(
        storage=SupabaseStorage.from_env()
    )

    # Track a directory
    result = await tracker.track_directory(
        "./documents",
        patterns=["*.pdf", "*.docx"],
        recursive=True
    )

    print(f"Changes detected: {result.success_count}")

asyncio.run(main())

Key Features

🚀 Async-First Architecture

Built from the ground up for Python's async/await patterns, enabling efficient concurrent processing.

📊 Change Detection

Automatic content-based change detection using hashing - no manual tracking needed.
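As a rough illustration of content-based change detection, the sketch below hashes file contents and compares two snapshots. It is not RAGVersion's implementation: the choice of SHA-256, the `{path: hash}` snapshot shape, and the names `hash_file` and `detect_changes` are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes (hash choice is an assumption)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in blocks so large files are not loaded into memory at once.
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def detect_changes(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {path: hash} snapshots and classify every path."""
    return {
        "created": sorted(p for p in current if p not in previous),
        "modified": sorted(
            p for p in current if p in previous and current[p] != previous[p]
        ),
        "deleted": sorted(p for p in previous if p not in current),
    }
```

Because the comparison uses content hashes rather than modification times, a file that is touched or re-saved without any content change is not reported as modified.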

🔄 Batch Processing

Process thousands of documents efficiently with parallel workers and resilient error handling.
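One common way to combine parallel workers with resilient error handling in asyncio is a semaphore-bounded `gather` that records failures instead of raising them. The sketch below shows that pattern only; `BatchOutcome`, `process_batch`, and the `max_workers` parameter are hypothetical names for this example and not part of RAGVersion's API.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class BatchOutcome:
    """Hypothetical result container, not RAGVersion's own type."""
    succeeded: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)


async def process_batch(paths, worker, max_workers: int = 4) -> BatchOutcome:
    """Run `worker(path)` for every path with at most `max_workers` in flight."""
    semaphore = asyncio.Semaphore(max_workers)
    outcome = BatchOutcome()

    async def guarded(path: str) -> None:
        async with semaphore:
            try:
                await worker(path)
                outcome.succeeded.append(path)
            except Exception as exc:
                # Record the failure and continue; one bad file does not abort the batch.
                outcome.failed[path] = str(exc)

    await asyncio.gather(*(guarded(p) for p in paths))
    return outcome
```

Here `worker` is any coroutine function that takes a path, for example one that parses and hashes a single document. The caller can inspect `outcome.failed` afterwards and retry only those paths.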

🗄️ Supabase Integration

Reliable PostgreSQL-backed storage with Supabase for production deployments.

🔗 Framework Integrations

Ready-to-use helpers for:

  • LangChain - Sync with LangChain vector stores
  • LlamaIndex - Sync with LlamaIndex indexes
  • Custom - Build your own integrations
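For a custom integration, the core job is usually to turn a set of detected changes into upserts and deletes against your vector store. The sketch below is a minimal illustration under stated assumptions: the `VectorStore` protocol, `InMemoryStore`, `sync_store`, and the change-set shape (a dict with `created`, `modified`, and `deleted` lists of document IDs) are all hypothetical and do not reflect RAGVersion's actual integration helpers.

```python
from typing import Callable, Protocol


class VectorStore(Protocol):
    """Hypothetical minimal interface; real stores expose richer APIs."""
    def upsert(self, doc_id: str, text: str) -> None: ...
    def delete(self, doc_id: str) -> None: ...


class InMemoryStore:
    """Stand-in store for the example; it keeps raw text instead of embeddings."""
    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)


def sync_store(
    store: VectorStore,
    changes: dict[str, list[str]],
    load_text: Callable[[str], str],
) -> int:
    """Apply a change set to the store; return how many documents were (re)indexed."""
    for doc_id in changes.get("deleted", []):
        store.delete(doc_id)
    to_index = changes.get("created", []) + changes.get("modified", [])
    for doc_id in to_index:
        # Only created or modified documents are loaded and re-indexed.
        store.upsert(doc_id, load_text(doc_id))
    return len(to_index)
```

Unchanged documents never reach `load_text` or `upsert`, which is where the cost saving comes from in a real pipeline that embeds on upsert.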

📝 Complete Documentation

15,000+ words of comprehensive documentation covering:

  • Installation and setup
  • Core concepts
  • API reference
  • Integration guides
  • Best practices
  • Troubleshooting

The Problem RAGVersion Solves

Without RAGVersion ❌

Documents change → Don't know which ones → Re-index everything →
Expensive API calls → Slow updates → Or risk serving stale data

With RAGVersion ✅

Documents change → Automatic detection → Only re-index changed docs →
Up to 99% cost savings → Fast updates → Always fresh data

Real-World Impact

Document-Level Tracking

| Metric | Without RAGVersion | With RAGVersion |
| --- | --- | --- |
| Cost | $50 per update | $0.50 per update |
| Time | 33 minutes | 20 seconds |
| Files processed | 1,000 (all) | 10 (only changed) |
| Savings | - | 99% reduction |
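The figures in this table follow from simple proportional arithmetic. The sketch below reproduces them under the assumption that cost and time scale linearly with the number of files; the per-file values ($0.05 and roughly 2 seconds) are inferred from the table rather than stated by the source, and `reindex_estimate` is a hypothetical helper.

```python
def reindex_estimate(files: int, cost_per_file: float, seconds_per_file: float) -> tuple[float, float]:
    """Return (cost in dollars, time in seconds), assuming linear scaling with file count."""
    return files * cost_per_file, files * seconds_per_file


# Per-file figures inferred from the table: $50 / 1,000 files and roughly 2 s per file.
COST_PER_FILE = 0.05
SECONDS_PER_FILE = 2.0

full_cost, full_seconds = reindex_estimate(1_000, COST_PER_FILE, SECONDS_PER_FILE)
changed_cost, changed_seconds = reindex_estimate(10, COST_PER_FILE, SECONDS_PER_FILE)
savings = 1 - changed_cost / full_cost

print(f"full re-index: ${full_cost:.2f} in {full_seconds / 60:.0f} min")
print(f"incremental:   ${changed_cost:.2f} in {changed_seconds:.0f} s")
print(f"savings:       {savings:.0%}")
```

The savings fraction is just one minus the share of files that changed, so the 99% figure applies to this scenario (10 of 1,000 files changed) and will be lower when a larger share of the corpus changes between updates.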

Chunk-Level Tracking (v0.10.0+) 🆕

| Scenario | Without Chunks | With Chunks | Savings |
| --- | --- | --- | --- |
| Documentation Update (1 paragraph in 100-page doc) | $2.50 (500 chunks) | $0.01 (2 chunks) | 99.6% |
| Code Repository (10 modified files out of 50) | $5.00 (1,000 chunks) | $0.15 (30 chunks) | 97% |
| Average Use Case | Full re-embedding | Smart chunk updates | 80-95% |
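To make the idea behind chunk-level tracking concrete, the sketch below hashes each chunk of a document and selects only the chunks whose hash was not seen in the previous version. It is an illustration, not RAGVersion's algorithm: the paragraph-based chunker, the SHA-256 hash, and the function names are assumptions for this example.

```python
import hashlib


def split_into_chunks(text: str) -> list[str]:
    """Naive chunker for illustration: one chunk per blank-line-separated paragraph."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def chunk_hash(chunk: str) -> str:
    """Content hash of a single chunk (hash choice is an assumption)."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def chunks_to_embed(old_text: str, new_text: str) -> list[str]:
    """Return the chunks of new_text whose hash does not appear in old_text."""
    known = {chunk_hash(chunk) for chunk in split_into_chunks(old_text)}
    return [
        chunk for chunk in split_into_chunks(new_text)
        if chunk_hash(chunk) not in known
    ]
```

With this approach, editing one paragraph in a long document yields a single chunk to re-embed, while the unchanged chunks keep their existing embeddings. Matching by hash rather than by position also means that inserting a paragraph near the top does not invalidate every chunk after it.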

Use Cases

  • 📚 Documentation Sites - Keep docs in sync with latest changes
  • 💬 Customer Support - Always use up-to-date product information
  • 🏢 Enterprise Knowledge Bases - Track document changes for compliance
  • 🔬 Research Systems - Version control for research papers and datasets
  • 📊 Content Management - Track changes across large content libraries

Installation Options

# Basic installation
pip install ragversion

# With document parsers (PDF, DOCX, etc.)
pip install "ragversion[parsers]"

# With LangChain integration
pip install "ragversion[langchain]"

# With LlamaIndex integration
pip install "ragversion[llamaindex]"

# Everything (recommended)
pip install "ragversion[all]"

License

RAGVersion is licensed under the MIT License.


Built with ❤️ for the RAG community