Troubleshooting Guide¶
Common issues and solutions for RAGVersion.
Table of Contents¶
- Installation Issues
- Runtime Errors
- Performance Issues
- Integration Issues
- Database Issues
- File Tracking Issues
Installation Issues¶
"No module named ragversion"¶
Error:
ModuleNotFoundError: No module named 'ragversion'
Cause: The package is not installed in the active Python environment
Solutions:
# Install basic package
pip install ragversion
# Or with all features (quote the extras so shells such as zsh do not expand the brackets)
pip install "ragversion[all]"
# Verify installation
python -c "import ragversion; print(ragversion.__version__)"
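One common reason the import still fails after installation is that pip installed the package into a different interpreter than the one running your code. The following minimal sketch uses only the standard library to report which interpreter is active and whether a module is visible to it. The helper name `module_is_importable` is illustrative and not part of RAGVersion.

```python
import importlib.util
import sys


def module_is_importable(name: str) -> bool:
    """Return True if the active interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


# Show which interpreter is running, then check whether it can see the package.
print(f"Interpreter: {sys.executable}")
print(f"ragversion importable: {module_is_importable('ragversion')}")
```

If the second line reports False, installing with `python -m pip install ragversion` from that same interpreter ensures the package lands in the environment you are actually using.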
"Failed to install pypdf"¶
Error:
Cause: Missing optional dependencies or system libraries
Solutions:
Option 1: Install all parsers
Option 2: Install specific parser
# For PDF only
pip install pypdf
# For DOCX only
pip install python-docx
# For Excel only
pip install openpyxl
Option 3: Use without parsers (text files only)
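To see which of the optional parser packages listed above are present, a short standard-library check can help. This is a sketch, not a RAGVersion API: `OPTIONAL_PARSERS` and `installed_parsers` are hypothetical names, and the only assumption is the usual import name for each package (python-docx is imported as `docx`).

```python
import importlib.util

# Keys are pip package names, values are the module names used at import time.
OPTIONAL_PARSERS = {
    "pypdf": "pypdf",
    "python-docx": "docx",
    "openpyxl": "openpyxl",
}


def installed_parsers() -> dict[str, bool]:
    """Map each pip package name to whether its module can be found."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in OPTIONAL_PARSERS.items()
    }


for package, available in installed_parsers().items():
    print(f"{package}: {'installed' if available else 'missing'}")
```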
"Microsoft Visual C++ required" (Windows)¶
Error:
Cause: Missing C++ build tools on Windows
Solutions:
- Install Microsoft C++ Build Tools:
  - Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/
  - Install the "Desktop development with C++" workload
- Or install pre-built wheels, which do not require a compiler
Runtime Errors¶
"Tracker not initialized"¶
Error:
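Cause: The tracker was used before it was initialized
Solutions:
Option 1: Use the factory method, AsyncVersionTracker.create() (recommended)
Option 2: Use the tracker as an async context manager (async with)
Option 3: Call initialize() manually before using the tracker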
"File not found"¶
Error:
FileNotFoundError: File not found: document.pdf
Troubleshooting:
• Check file path is correct
• Use absolute path: /full/path/to/document.pdf
Cause: The file does not exist or the path is incorrect
Solutions:
- Check that the file exists
- Use absolute paths
- Check the working directory, since relative paths are resolved against it
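The three checks above can be sketched with `pathlib` from the standard library. The function name `diagnose_path` is illustrative only; `document.pdf` is the file name from the error message.

```python
from pathlib import Path


def diagnose_path(path_str: str) -> dict:
    """Collect the facts needed to debug a 'file not found' error."""
    path = Path(path_str)
    return {
        "cwd": str(Path.cwd()),           # relative paths are resolved against this
        "absolute": str(path.resolve()),  # the location that will actually be opened
        "exists": path.exists(),
        "is_file": path.is_file(),
    }


print(diagnose_path("document.pdf"))
```

If `exists` is False, compare `absolute` with where the file really lives; a mismatch usually means the script was started from a different working directory than expected.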
"Storage connection failed"¶
Error (SQLite):
StorageError: Storage error: unable to open database file
Troubleshooting:
• Check file path and permissions
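Solutions:
- Check that you have read and write permissions on the database file and its directory
- Check that the directory containing the database file exists
- Use an absolute path to the database file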
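These SQLite checks can be combined into one helper using the standard `sqlite3` module. This is a sketch of the underlying checks, not RAGVersion's own storage code: `check_sqlite_path` is a hypothetical name, and the database file name `ragversion.db` is an assumption made for the demo.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def check_sqlite_path(db_path: str) -> sqlite3.Connection:
    """Ensure the parent directory exists and is writable, then open the database."""
    path = Path(db_path).expanduser().resolve()  # work with an absolute path
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing directories
    if not os.access(path.parent, os.W_OK):
        raise PermissionError(f"Directory is not writable: {path.parent}")
    return sqlite3.connect(str(path))


if __name__ == "__main__":
    # Demo in a throwaway directory; the file name is an assumption.
    with tempfile.TemporaryDirectory() as tmp:
        conn = check_sqlite_path(os.path.join(tmp, "data", "ragversion.db"))
        print(conn.execute("SELECT 1").fetchone())
        conn.close()
```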
Error (Supabase):
StorageError: Storage error: connection refused
Troubleshooting:
• Check database connection and credentials
• Verify SUPABASE_URL and SUPABASE_SERVICE_KEY
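Solutions:
- Verify that the SUPABASE_URL and SUPABASE_SERVICE_KEY environment variables are set
- Test the connection manually
- Check that the network allows connections to your Supabase project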
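A minimal standard-library sketch of the environment and network checks follows. It only confirms that the two variables named in the error message are set and that a TCP connection to the host can be opened; it does not validate credentials. The function names are illustrative, not part of RAGVersion or the Supabase client.

```python
import os
import socket
from urllib.parse import urlparse

REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_SERVICE_KEY")


def missing_env_vars(env=os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def host_reachable(url: str, timeout: float = 5.0) -> bool:
    """Try a plain TCP connection to the URL's host (no credentials are sent)."""
    parsed = urlparse(url)
    if not parsed.hostname:
        return False  # not a full URL such as https://example.supabase.co
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print(f"Host reachable: {host_reachable(os.environ['SUPABASE_URL'])}")
```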
"Table not found" (Supabase)¶
Error:
StorageError: Storage error: relation "documents" does not exist
Troubleshooting:
• Run database migrations
• Verify tables exist: documents, versions, version_content
Cause: The database migrations have not been run
Solutions:
- Run the migrations
- Or run the SQL manually in the Supabase console:
  - Go to: https://supabase.com/dashboard/project/_/sql
  - Copy the SQL from: ragversion/storage/migrations/001_initial_schema.sql
  - Run the script
- Verify that the tables exist: documents, versions, version_content
Performance Issues¶
"track_directory() is very slow"¶
Symptoms:
- Taking minutes to process hundreds of files
- High CPU/memory usage
- Slow embedding API calls
Causes & Solutions:
1. Too many files
# Solution: Use file patterns to limit scope
result = await tracker.track_directory(
    "docs",
    patterns=["*.md", "*.txt"]  # Only track specific types
)
2. Low concurrency
# Solution: Increase max_workers
result = await tracker.track_directory(
    "docs",
    max_workers=8  # Default is 4, increase for more parallelism
)
3. Large files
# Solution: Increase file size limit or filter
tracker = await AsyncVersionTracker.create(
    max_file_size_mb=100  # Increase from default 50MB
)
# Or skip large files
result = await tracker.track_directory(
    "docs",
    patterns=["*.md"]  # Exclude large PDFs/videos
)
4. Network latency (Supabase)
# Solution: Use SQLite for development
tracker = await AsyncVersionTracker.create("sqlite") # Faster locally
5. Embeddings API rate limits
# Solution: Reduce concurrency to stay under API rate limits (for integrations)
from ragversion.integrations.langchain import quick_start
sync = await quick_start(
    "docs",
    max_workers=2  # Reduce concurrency to avoid rate limits
)
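The trade-off behind max_workers (more parallelism versus more load on memory and rate-limited APIs) can be illustrated with a generic bounded-concurrency sketch. This is not RAGVersion's implementation, which is not shown in this guide; `run_bounded` and `demo` are hypothetical names, and the sleep stands in for parsing or an API call.

```python
import asyncio


async def run_bounded(items, worker, max_workers=4):
    """Run worker(item) for every item with at most max_workers in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def guarded(item):
        async with semaphore:  # blocks while max_workers tasks are already running
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))


async def demo(max_workers=2, n_items=6):
    """Return the peak number of workers that ran at the same time."""
    active = peak = 0

    async def worker(_item):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for parsing or an API call
        active -= 1

    await run_bounded(range(n_items), worker, max_workers)
    return peak


print(f"Peak concurrent workers: {asyncio.run(demo())}")
```

Raising the limit shortens wall-clock time only while the bottleneck is waiting (disk, network); once an external API starts rate limiting, a lower limit is often faster overall.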
"High memory usage"¶
Symptoms:
- Python process using 1GB+ memory
- Out-of-memory errors on large directories
Causes & Solutions:
1. Storing content in memory
# Solution: Disable content storage
tracker = await AsyncVersionTracker.create(
    store_content=False  # Only track hashes
)
2. Large files
# Solution: Reduce max file size
tracker = await AsyncVersionTracker.create(
    max_file_size_mb=10  # Limit to 10MB files
)
3. Too many files processed at once
# Solution: Reduce concurrency
result = await tracker.track_directory(
    "docs",
    max_workers=2  # Lower concurrency = less memory
)
Integration Issues¶
"LangChain integration not working"¶
Error:
Solution:
"Embeddings API rate limit"¶
Error:
Solutions:
1. Add delays
import asyncio
async def track_with_delays():
    async with await AsyncVersionTracker.create() as tracker:
        files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
        for file in files:
            result = await tracker.track(file)
            await asyncio.sleep(1)  # Wait 1 second between files
2. Reduce concurrency (for example, lower max_workers)
3. Use chunk tracking (80-95% fewer embeddings)
sync = await quick_start(
    "docs",
    enable_chunk_tracking=True  # Default, but make sure it's enabled
)
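Fixed delays work, but when rate-limit errors still occur it can help to retry with exponential backoff. The sketch below is a generic standard-library pattern, not a RAGVersion feature: `retry_with_backoff` and `demo` are hypothetical names, and the broad `except Exception` should be narrowed to whatever rate-limit error your embeddings provider raises.

```python
import asyncio
import random


async def retry_with_backoff(operation, max_attempts=5, base_delay=1.0):
    """Await operation(); after a failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return await operation()
        except Exception:  # assumption: narrow this to your provider's rate-limit error
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))


async def demo():
    """Simulate an operation that is rate limited twice before succeeding."""
    calls = 0

    async def flaky():
        nonlocal calls
        calls += 1
        if calls < 3:
            raise RuntimeError("simulated rate limit")
        return calls

    return await retry_with_backoff(flaky, base_delay=0.01)


print(f"Succeeded on attempt {asyncio.run(demo())}")
```

With a tracker, the call could be wrapped as `await retry_with_backoff(lambda: tracker.track(file))`.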
"Vector store out of sync"¶
Symptoms:
- Queries return outdated results
- Deleted documents still appear
Solutions:
1. Resync from scratch
2. Check callbacks are registered
# Ensure auto-sync is enabled
def verify_callbacks():
    print(f"Callbacks registered: {len(tracker._callbacks)}")
    if not tracker._callbacks:
        print("WARNING: No callbacks registered!")
        tracker.on_change(sync.on_change)
Database Issues¶
"SQLite database locked"¶
Error:
Cause: Multiple processes are accessing the same database file
Solutions:
- Use Supabase for multi-process scenarios
- Ensure that only one process accesses the database at a time
- Increase the lock timeout
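How RAGVersion exposes the SQLite timeout is not shown in this guide, so the sketch below illustrates the underlying mechanism with the standard `sqlite3` module instead. `open_with_timeout` and `demo_journal_mode` are hypothetical names. The `timeout` argument controls how long a connection waits for a lock before raising "database is locked" (the module default is 5 seconds), and WAL journal mode lets readers continue while one writer is active.

```python
import os
import sqlite3
import tempfile


def open_with_timeout(db_path: str, timeout_seconds: float = 30.0) -> sqlite3.Connection:
    """Open a SQLite database that waits for locks instead of failing quickly."""
    conn = sqlite3.connect(db_path, timeout=timeout_seconds)
    # WAL mode allows concurrent readers alongside a single writer.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def demo_journal_mode() -> str:
    """Open a throwaway database and report the journal mode in effect."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_with_timeout(os.path.join(tmp, "demo.db"))
        try:
            return conn.execute("PRAGMA journal_mode").fetchone()[0]
        finally:
            conn.close()


print(f"Journal mode: {demo_journal_mode()}")
```

A longer timeout only hides contention; if several processes write continuously, a server-backed store remains the more robust choice.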
"Supabase quota exceeded"¶
Error:
Solutions:
- Check your plan limits in the Supabase dashboard
- Clean up old versions
- Disable content storage (store_content=False), so that only hashes are tracked
File Tracking Issues¶
"No changes detected" when file changed¶
Symptoms:
- File changed but track() returns changed=False
- Expected MODIFIED but got nothing
Causes & Solutions:
1. Content hasn't changed (only metadata/timestamp)
# RAGVersion tracks content, not timestamps
# If only the file modified time changed, no new version is created
2. File is being parsed differently
# Check what content is being parsed
import hashlib
from pathlib import Path
from ragversion.parsers import ParserRegistry

result = await tracker.track("file.pdf")
if not result.changed:
    # Get current content hash
    doc = await tracker.storage.get_document_by_path(str(Path("file.pdf").absolute()))
    print(f"Current hash: {doc.content_hash}")
    # Compare with parsed content
    parser = ParserRegistry.get_parser("file.pdf")
    content = await parser.parse("file.pdf")
    new_hash = hashlib.sha256(content.encode()).hexdigest()
    print(f"New hash: {new_hash}")
    print(f"Match: {doc.content_hash == new_hash}")
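The first cause, a timestamp change without a content change, can be reproduced with the standard library alone. The sketch hashes raw file bytes with SHA-256 as a stand-in; RAGVersion hashes the parsed content, as in the snippet above, so this shows the principle rather than the library's exact computation. The function names are illustrative.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 of the file's bytes; the modification time plays no part."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def demo_touch_keeps_hash() -> bool:
    """Change only the timestamps of a file and compare hashes before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "note.txt"
        path.write_text("hello")
        before = content_hash(path)
        # Set access and modification times to a fixed past moment, like `touch -t`.
        os.utime(path, (1_000_000_000, 1_000_000_000))
        return content_hash(path) == before


print(f"Hash unchanged after touching the file: {demo_touch_keeps_hash()}")
```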
"Changes detected too frequently"¶
Symptoms:
- New version created on every run
- FileWatcher creates duplicate events
Causes & Solutions:
1. Debouncing not working (FileWatcher)
# FileWatcher has built-in 1-second debounce
# If still seeing duplicates, increase it:
from ragversion.watcher import DocumentEventHandler
handler = DocumentEventHandler()
handler._debounce_seconds = 2.0 # Increase to 2 seconds
2. Temp files being tracked
# Solution: Add to ignore patterns
watcher = FileWatcher(
    tracker,
    paths=["./docs"],
    ignore_patterns=[
        "*.tmp",
        "*.swp",
        "*~",
        ".git/*",
        "__pycache__/*"
    ]
)
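To preview which paths a set of glob-style patterns would exclude, the patterns can be tried against sample paths with `fnmatch` from the standard library. How FileWatcher applies ignore_patterns internally is an assumption here (for example, whether it matches full paths or file names), so treat this as a way to sanity-check patterns rather than a description of the watcher. `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

IGNORE_PATTERNS = ["*.tmp", "*.swp", "*~", ".git/*", "__pycache__/*"]


def is_ignored(relative_path: str, patterns=IGNORE_PATTERNS) -> bool:
    """Return True if the path matches any glob-style ignore pattern."""
    return any(fnmatchcase(relative_path, pattern) for pattern in patterns)


for sample in ["draft.tmp", "notes.md~", ".git/config", "docs/guide.md"]:
    print(f"{sample}: {'ignored' if is_ignored(sample) else 'tracked'}")
```

Under this matching, a pattern such as ".git/*" only matches paths that begin with ".git/"; a nested repository like "docs/.git/config" would need a pattern such as "*/.git/*".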
Getting More Help¶
Still stuck? Here's how to get help:
- Check examples: examples/
- Read the tutorial: 5-Minute Tutorial
- Search issues: GitHub Issues
- Ask a question: GitHub Discussions
- Report a bug: New Issue
When reporting issues, please include:
- RAGVersion version: python -c "import ragversion; print(ragversion.__version__)"
- Python version: python --version
- Operating system
- Full error traceback
- Minimal code to reproduce
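The version details in the list above can be gathered in one step with the standard library. This sketch assumes the installed distribution is named `ragversion`, as in the pip commands earlier in this guide; `environment_report` is a hypothetical helper, not part of the package.

```python
import platform
from importlib import metadata


def environment_report(package: str = "ragversion") -> dict[str, str]:
    """Gather the version details requested for a bug report."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        f"{package} version": package_version,
        "Python version": platform.python_version(),
        "Operating system": platform.platform(),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```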