Troubleshooting Guide¶

Common issues and solutions for RAGVersion.

Table of Contents¶

Installation Issues
Runtime Errors
Performance Issues
Integration Issues
Database Issues
File Tracking Issues

Installation Issues¶

"No module named ragversion"¶

Error:

ModuleNotFoundError: No module named 'ragversion'

Cause: Package not installed

Solutions:

# Install basic package
pip install ragversion

# Or with all features
pip install ragversion[all]

# Verify installation
python -c "import ragversion; print(ragversion.__version__)"

"Failed to install pypdf"¶

Error:

ERROR: Failed building wheel for pypdf

Cause: Missing optional dependencies or system libraries

Solutions:

Option 1: Install all parsers

pip install ragversion[parsers]

Option 2: Install specific parser

# For PDF only
pip install pypdf

# For DOCX only
pip install python-docx

# For Excel only
pip install openpyxl

Option 3: Use without parsers (text files only)

pip install ragversion  # Basic installation

"Microsoft Visual C++ required" (Windows)¶

Error:

error: Microsoft Visual C++ 14.0 or greater is required

Cause: Missing C++ build tools on Windows

Solutions:

Install Microsoft C++ Build Tools:
Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/
Install "Desktop development with C++" workload

Or use pre-built wheels:

pip install --only-binary :all: ragversion[parsers]

Runtime Errors¶

"Tracker not initialized"¶

Error:

RAGVersionError: Tracker not initialized. Call initialize() first.

Cause: Forgot to initialize tracker

Solutions:

Option 1: Use factory method (recommended)

# This auto-initializes
tracker = await AsyncVersionTracker.create()

Option 2: Use context manager

async with await AsyncVersionTracker.create() as tracker:
    result = await tracker.track("file.pdf")

Option 3: Call initialize() manually

tracker = AsyncVersionTracker(storage=storage)
await tracker.initialize()  # Don't forget this!

"File not found"¶

Error:

FileNotFoundError: File not found: document.pdf

Troubleshooting:
  • Check file path is correct
  • Use absolute path: /full/path/to/document.pdf

Cause: File doesn't exist or incorrect path

Solutions:

Check file exists:

from pathlib import Path
file = Path("document.pdf")
print(f"Exists: {file.exists()}")
print(f"Absolute: {file.absolute()}")

Use absolute paths:

import os
file_path = os.path.abspath("document.pdf")
result = await tracker.track(file_path)

Check working directory:

import os
print(f"Working directory: {os.getcwd()}")

"Storage connection failed"¶

Error (SQLite):

StorageError: Storage error: unable to open database file

Troubleshooting:
  • Check file path and permissions

Solutions:

Check permissions:

ls -la ragversion.db
chmod 644 ragversion.db

Check directory exists:
```
mkdir -p ./data
```

Use absolute path:

tracker = await AsyncVersionTracker.create(
    db_path="/full/path/to/ragversion.db"
)

Error (Supabase):

StorageError: Storage error: connection refused

Troubleshooting:
  • Check database connection and credentials
  • Verify SUPABASE_URL and SUPABASE_SERVICE_KEY

Solutions:

Verify environment variables:

import os
print(f"URL: {os.getenv('SUPABASE_URL')}")
print(f"Key: {'set' if os.getenv('SUPABASE_SERVICE_KEY') else 'NOT SET'}")

Test connection manually:

from supabase import create_client

url = os.getenv("SUPABASE_URL")
key = os.getenv("SUPABASE_SERVICE_KEY")
client = create_client(url, key)

# Test query
result = client.table("documents").select("*").limit(1).execute()
print(f"Connection works: {bool(result.data)}")

Check network:

ping your-project.supabase.co
curl https://your-project.supabase.co

"Table not found" (Supabase)¶

Error:

StorageError: Storage error: relation "documents" does not exist

Troubleshooting:
  • Run database migrations
  • Verify tables exist: documents, versions, version_content

Cause: Migrations not run

Solutions:

Run migrations:
```
ragversion migrate
```
Manually run SQL in Supabase console:
Go to: https://supabase.com/dashboard/project/_/sql
Copy SQL from: ragversion/storage/migrations/001_initial_schema.sql
Run the script

Verify tables exist:

SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public';

Performance Issues¶

"track_directory() is very slow"¶

Symptoms: - Taking minutes to process hundreds of files - High CPU/memory usage - Slow embedding API calls

Causes & Solutions:

1. Too many files

# Solution: Use file patterns to limit scope
result = await tracker.track_directory(
    "docs",
    patterns=["*.md", "*.txt"]  # Only track specific types
)

2. Low concurrency

# Solution: Increase max_workers
result = await tracker.track_directory(
    "docs",
    max_workers=8  # Default is 4, increase for more parallelism
)

3. Large files

# Solution: Increase file size limit or filter
tracker = await AsyncVersionTracker.create(
    max_file_size_mb=100  # Increase from default 50MB
)

# Or skip large files
result = await tracker.track_directory(
    "docs",
    patterns=["*.md"]  # Exclude large PDFs/videos
)

4. Network latency (Supabase)

# Solution: Use SQLite for development
tracker = await AsyncVersionTracker.create("sqlite")  # Faster locally

5. Embeddings API rate limits

# Solution: Add delays between API calls (for integrations)
from ragversion.integrations.langchain import quick_start

sync = await quick_start(
    "docs",
    max_workers=2  # Reduce concurrency to avoid rate limits
)

"High memory usage"¶

Symptoms: - Python process using 1GB+ memory - Out of memory errors on large directories

Causes & Solutions:

1. Storing content in memory

# Solution: Disable content storage
tracker = await AsyncVersionTracker.create(
    store_content=False  # Only track hashes
)

2. Large files

# Solution: Reduce max file size
tracker = await AsyncVersionTracker.create(
    max_file_size_mb=10  # Limit to 10MB files
)

3. Too many files processed at once

# Solution: Reduce concurrency
result = await tracker.track_directory(
    "docs",
    max_workers=2  # Lower concurrency = less memory
)

Integration Issues¶

"LangChain integration not working"¶

Error:

ModuleNotFoundError: No module named 'langchain'

Solution:

pip install ragversion[langchain]

"Embeddings API rate limit"¶

Error:

openai.error.RateLimitError: Rate limit exceeded

Solutions:

1. Add delays

import asyncio

async def track_with_delays():
    async with await AsyncVersionTracker.create() as tracker:
        files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]

        for file in files:
            result = await tracker.track(file)
            await asyncio.sleep(1)  # Wait 1 second between files

2. Reduce concurrency

result = await tracker.track_directory(
    "docs",
    max_workers=1  # Process one file at a time
)

3. Use chunk tracking (80-95% fewer embeddings)

sync = await quick_start(
    "docs",
    enable_chunk_tracking=True  # Default, but make sure it's enabled
)

"Vector store out of sync"¶

Symptoms: - Query returns outdated results - Deleted documents still appear

Solutions:

1. Resync from scratch

# Clear and rebuild
sync.vectorstore.delete(delete_all=True)
await sync.sync_directory("./docs")

2. Check callbacks are registered

# Ensure auto-sync is enabled
def verify_callbacks():
    print(f"Callbacks registered: {len(tracker._callbacks)}")

    if not tracker._callbacks:
        print("WARNING: No callbacks registered!")
        tracker.on_change(sync.on_change)

Database Issues¶

"SQLite database locked"¶

Error:

sqlite3.OperationalError: database is locked

Cause: Multiple processes accessing same database

Solutions:

Use Supabase for multi-process scenarios:

tracker = await AsyncVersionTracker.create("supabase")

Ensure only one process at a time:

# Use file-based locking
import filelock

lock = filelock.FileLock("ragversion.lock")
with lock:
    async with await AsyncVersionTracker.create() as tracker:
        result = await tracker.track_directory("docs")

Increase timeout:

from ragversion.storage import SQLiteStorage

storage = SQLiteStorage(timeout=60)  # Default is 30s
tracker = AsyncVersionTracker(storage=storage)

"Supabase quota exceeded"¶

Error:

StorageError: Database quota exceeded

Solutions:

Check your plan limits in Supabase dashboard

Clean up old versions:

# Keep only last 10 versions per document
for doc in await tracker.list_documents():
    deleted = await tracker.cleanup_old_versions(doc.id, keep_count=10)
    print(f"Deleted {deleted} old versions for {doc.file_name}")

Disable content storage:

tracker = await AsyncVersionTracker.create(
    store_content=False  # Only store hashes
)

File Tracking Issues¶

"No changes detected" when file changed¶

Symptoms: - File changed but track() returns changed=False - Expected MODIFIED but got nothing

Causes & Solutions:

1. Content hasn't changed (only metadata/timestamp)

# RAGVersion tracks content, not timestamps
# If only the file modified time changed, no new version is created

2. File is being parsed differently

# Check what content is being parsed
result = await tracker.track("file.pdf")

if not result.changed:
    # Get current content hash
    doc = await tracker.storage.get_document_by_path(str(Path("file.pdf").absolute()))
    print(f"Current hash: {doc.content_hash}")

    # Compare with parsed content
    from ragversion.parsers import ParserRegistry
    parser = ParserRegistry.get_parser("file.pdf")
    content = await parser.parse("file.pdf")
    import hashlib
    new_hash = hashlib.sha256(content.encode()).hexdigest()
    print(f"New hash: {new_hash}")
    print(f"Match: {doc.content_hash == new_hash}")

"Changes detected too frequently"¶

Symptoms: - New version created on every run - FileWatcher creates duplicate events

Causes & Solutions:

1. Debouncing not working (FileWatcher)

# FileWatcher has built-in 1-second debounce
# If still seeing duplicates, increase it:

from ragversion.watcher import DocumentEventHandler

handler = DocumentEventHandler()
handler._debounce_seconds = 2.0  # Increase to 2 seconds

2. Temp files being tracked

# Solution: Add to ignore patterns
watcher = FileWatcher(
    tracker,
    paths=["./docs"],
    ignore_patterns=[
        "*.tmp",
        "*.swp",
        "*~",
        ".git/*",
        "__pycache__/*"
    ]
)

Getting More Help¶

Still stuck? Here's how to get help:

Check examples: examples/
Read the tutorial: 5-Minute Tutorial
Search issues: GitHub Issues
Ask a question: GitHub Discussions
Report a bug: New Issue

When reporting issues, please include: - RAGVersion version: python -c "import ragversion; print(ragversion.__version__)" - Python version: python --version - Operating system - Full error traceback - Minimal code to reproduce