Skip to content

LangChain Integration

RAGVersion provides seamless integration with LangChain, making it easy to keep your vector stores in sync with document changes.

NEW in v0.11.0: One-line setup!

from ragversion.integrations.langchain import quick_start

# That's it! 🚀
sync = await quick_start("./documents")

# Ready to query
results = await sync.vectorstore.asimilarity_search("query", k=5)

What this does: - ✅ Creates RAGVersion tracker with smart defaults - ✅ Initializes FAISS vector store - ✅ Sets up OpenAI embeddings - ✅ Configures text splitter - ✅ Syncs your documents - ✅ Enables chunk-level tracking for 80-95% cost savings

See the Quick Start Guide for more details.

Customization

Vector Store Options

# FAISS (in-memory, fast)
sync = await quick_start(
    directory="./documents",
    vectorstore_type="faiss",
    vectorstore_path="./vectorstore",  # Save/load location
)

# Chroma (persistent)
sync = await quick_start(
    directory="./documents",
    vectorstore_type="chroma",
    vectorstore_path="./chroma_db",
)

Custom Configuration

sync = await quick_start(
    directory="./documents",
    vectorstore_type="faiss",
    storage_backend="supabase",      # or "sqlite", "auto"
    chunk_size=500,                  # Custom chunk size
    chunk_overlap=100,               # Custom overlap
    enable_chunk_tracking=True,      # Smart updates (default)
)

Custom Embeddings

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
sync = await quick_start("./documents", embeddings=embeddings)

Advanced Usage

Manual Setup (For Full Control)

If you need complete control over the setup:

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from ragversion import AsyncVersionTracker
from ragversion.storage import SQLiteStorage
from ragversion.integrations.langchain import LangChainSync

# Setup storage
storage = SQLiteStorage()
tracker = AsyncVersionTracker(storage=storage)
await tracker.initialize()

# Setup LangChain components
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["placeholder"], embeddings)

# Create sync
sync = LangChainSync(
    tracker=tracker,
    text_splitter=text_splitter,
    embeddings=embeddings,
    vectorstore=vectorstore,
    enable_chunk_tracking=True,  # Enable smart updates
)

# Sync directory
await sync.sync_directory("./documents")

Using the Sync Integration

# Query vector store
results = await sync.vectorstore.asimilarity_search("query", k=5)

# Track new files (auto-updates vector store)
await sync.tracker.track("./new_doc.pdf")

# Custom metadata extraction
def extract_metadata(file_path):
    return {"source": "internal", "department": "engineering"}

sync = LangChainSync(
    tracker=tracker,
    text_splitter=text_splitter,
    embeddings=embeddings,
    vectorstore=vectorstore,
    metadata_extractor=extract_metadata,
)

Smart Chunk-Level Tracking

When enable_chunk_tracking=True (default), RAGVersion only re-embeds changed chunks on document updates, achieving 80-95% cost savings compared to full re-embedding.

sync = await quick_start(
    directory="./documents",
    enable_chunk_tracking=True,  # Default
)

# Update a document
await sync.tracker.track("./documents/updated.pdf")
# Only changed chunks are re-embedded!

Examples

API Reference

quick_start()

One-line setup for LangChain integration.

Parameters: - directory (str): Directory to track and sync - vectorstore_type (str): "faiss" or "chroma" (default: "faiss") - vectorstore_path (str, optional): Path for persistent storage - embeddings (Embeddings, optional): Custom embeddings model - storage_backend (str): "auto", "sqlite", or "supabase" (default: "auto") - chunk_size (int): Text splitter chunk size (default: 1000) - chunk_overlap (int): Text splitter overlap (default: 200) - file_patterns (List[str]): File patterns to track (default: [".txt", ".md", "*.pdf"]) - enable_chunk_tracking (bool): Enable smart chunk updates (default: True)

Returns: - LangChainSync: Initialized sync instance with documents synced

LangChainSync

Main integration class for automatic synchronization.

Methods: - sync_directory(dir_path, patterns, recursive): Sync all files in directory - _handle_creation(event): Handle document creation events - _handle_modification(event): Handle document modification events - _handle_deletion(event): Handle document deletion events

See the complete documentation for detailed API reference.