Document Management
Manage the documents within your knowledge base collections. Learn how to add, update, delete, and reindex documents to keep your RAG system accurate and up to date.

- Add: Upload files, fetch URLs, or paste raw text into collections
- Update: Modify document metadata and content with automatic reindexing
- Reindex: Reprocess documents when chunk settings or embedding models change
- Delete: Remove outdated or incorrect documents from collections
Adding Documents
Documents are the individual files or text entries within a collection. Each document goes through a processing pipeline: text extraction, chunking, embedding generation, and vector indexing. The document status updates as it moves through each stage.
| Status | Description | Typical Duration |
|---|---|---|
| Pending | Document received and queued for processing | Instant |
| Processing | Text extraction, chunking, and embedding in progress | Few seconds to minutes |
| Ready | Successfully indexed and available for retrieval | - |
| Failed | Processing error (corrupt file, unsupported format, etc.) | - |
| Archived | Disabled but retained; not used for retrieval | - |
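The status lifecycle in the table can be modeled on the client side as a small enum. This is a minimal sketch: the lowercase string values and the helper names are assumptions for illustration, not confirmed API values.

```python
from enum import Enum


class DocumentStatus(Enum):
    # String values are assumed; check what the API actually returns.
    PENDING = "pending"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"
    ARCHIVED = "archived"


# Statuses in which the pipeline is still working on the document.
IN_PROGRESS = {DocumentStatus.PENDING, DocumentStatus.PROCESSING}


def is_retrievable(status: DocumentStatus) -> bool:
    """Only Ready documents are available for retrieval."""
    return status is DocumentStatus.READY


def should_keep_polling(status: DocumentStatus) -> bool:
    """Keep polling while the document is queued or being processed."""
    return status in IN_PROGRESS
```

A client that polls for status would stop as soon as `should_keep_polling` returns `False`, then branch on Ready, Failed, or Archived.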
Multi-File Upload
You can upload multiple files in a single request, up to the batch limit of 50 files. Each file is processed independently and added to the collection. The API accepts an array of files or a zip archive.
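When you have more files than one request allows, you can split them into batches before uploading. The sketch below assumes the 50-file batch limit listed under Document Limits. The function name `batch_files` is hypothetical, and the actual upload call is left out because the endpoint is not shown here.

```python
from typing import Iterator, List, Sequence

MAX_FILES_PER_REQUEST = 50  # batch upload limit from the Document Limits table


def batch_files(
    paths: Sequence[str], batch_size: int = MAX_FILES_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size file paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


if __name__ == "__main__":
    files = [f"doc_{i}.pdf" for i in range(120)]
    for number, batch in enumerate(batch_files(files), start=1):
        # Each batch would be sent as one upload request.
        print(f"request {number}: {len(batch)} files")
```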
Updating Documents
Keeping documents current is critical for accurate RAG responses. When source content changes, update the document so that it is reindexed and its vector embeddings are refreshed.
Updating Document Content
You can replace the content of an existing document by uploading a new file, changing the source URL, or providing updated raw text. The system automatically queues a reindex after the content update.
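A client-side helper can validate the replacement content before sending an update. The sketch below assumes that an update carries exactly one of the three content sources; that rule, the function name, and the field names `file_path`, `source_url`, and `raw_text` are illustrative assumptions rather than documented API parameters.

```python
from typing import Dict, Optional


def build_content_update(
    file_path: Optional[str] = None,
    source_url: Optional[str] = None,
    raw_text: Optional[str] = None,
) -> Dict[str, str]:
    """Return a payload holding the single content source that was provided."""
    provided = {
        "file_path": file_path,
        "source_url": source_url,
        "raw_text": raw_text,
    }
    chosen = {key: value for key, value in provided.items() if value is not None}
    if len(chosen) != 1:
        # Assumed rule: one update replaces the content from one source only.
        raise ValueError("provide exactly one of file_path, source_url, or raw_text")
    return chosen
```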
Deleting Documents
Remove documents that are no longer relevant, contain errors, or were added by mistake. Deletion removes both the document record and its vector embeddings from the index.
Single Document Deletion
Delete a single document by its document ID. The system removes the document metadata, all chunks, and their vector embeddings from the search index.
No Undo
Reindexing Documents
Reindexing reprocesses a document through the entire pipeline: re-chunking with current settings, regenerating embeddings, and updating the vector index. This is necessary when you change chunk configuration, update the embedding model, or want to refresh the indexed content from the source.
When to Reindex
- After changing the chunk size or overlap configuration for a collection
- When switching to a different embedding model
- When the source URL content has been updated (for URL-sourced documents)
- If vector corruption or retrieval quality degradation is detected
- After restoring a document from an archived state
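Several of the triggers above can be detected automatically by comparing the settings a document was indexed with against the collection's current settings. This is a minimal sketch; the `IndexSettings` fields and the `reindex_reasons` helper are hypothetical names, and how you obtain the stored settings depends on your setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IndexSettings:
    chunk_size: int
    chunk_overlap: int
    embedding_model: str


def reindex_reasons(
    indexed_with: IndexSettings,
    current: IndexSettings,
    source_changed: bool = False,
) -> List[str]:
    """Return the reasons a document should be reindexed (empty if none)."""
    reasons = []
    old_chunking = (indexed_with.chunk_size, indexed_with.chunk_overlap)
    new_chunking = (current.chunk_size, current.chunk_overlap)
    if old_chunking != new_chunking:
        reasons.append("chunk configuration changed")
    if indexed_with.embedding_model != current.embedding_model:
        reasons.append("embedding model changed")
    if source_changed:
        reasons.append("source content updated")
    return reasons
```

An empty list means none of these triggers apply. Retrieval quality degradation and restores from the archived state still need to be checked separately.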
Reindex Individual Document
Reindex Costs
Document Limits
Understanding the platform limits helps you plan your knowledge base strategy and avoid hitting constraints during operation.
| Limit | Value | Notes |
|---|---|---|
| Documents per collection | 1,000 | Total documents across all statuses |
| File size per document | 50 MB | Larger files must be split manually |
| Chunks per document | 10,000 | Depends on chunk size; larger chunks produce fewer chunks per document |
| Characters per document | 1,000,000 | Approximately 250K tokens |
| Batch upload size | 50 files | Per API request |
| Concurrent reindexes | 5 per collection | Documents are processed in parallel |
| Tags per document | 20 | Used for filtering and organization |
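The per-document limits in the table can be checked before upload. The sketch below makes several assumptions that you should verify: 1 MB is taken as 1024 × 1024 bytes, chunking is treated as fixed-size and character-based with a configurable overlap, and the function names are hypothetical.

```python
import math
from typing import List

MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
MAX_CHARACTERS = 1_000_000
MAX_CHUNKS = 10_000
MAX_TAGS = 20


def estimate_chunk_count(num_chars: int, chunk_size: int, overlap: int = 0) -> int:
    """Estimate chunks for fixed-size character chunking with overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if num_chars <= 0:
        return 0
    if num_chars <= chunk_size:
        return 1
    step = chunk_size - overlap  # how far each new chunk advances
    return 1 + math.ceil((num_chars - chunk_size) / step)


def limit_violations(
    size_bytes: int, num_chars: int, num_tags: int, chunk_size: int, overlap: int = 0
) -> List[str]:
    """Return a description of every per-document limit that is exceeded."""
    problems = []
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file size exceeds 50 MB")
    if num_chars > MAX_CHARACTERS:
        problems.append("character count exceeds 1,000,000")
    if estimate_chunk_count(num_chars, chunk_size, overlap) > MAX_CHUNKS:
        problems.append("estimated chunk count exceeds 10,000")
    if num_tags > MAX_TAGS:
        problems.append("tag count exceeds 20")
    return problems
```

Under these assumptions, a document at the 1,000,000-character limit with 1,000-character chunks and a 200-character overlap produces about 1,250 chunks, well below the chunk limit.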
Best Practices for Limits
- Split large documents into logical sections (max 50 MB each)
- Use multiple collections for different domains to stay under document limits
- Remove archived or unnecessary documents to free up capacity
- Monitor document count via the dashboard or API
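Because the per-collection document limit counts documents in every status, archived and failed documents still consume capacity. A minimal sketch of a capacity check follows; the `status_counts` mapping and its keys are hypothetical, and would come from the dashboard or API in practice.

```python
from typing import Dict

MAX_DOCUMENTS_PER_COLLECTION = 1_000  # counts documents across all statuses


def remaining_capacity(status_counts: Dict[str, int]) -> int:
    """Return how many more documents the collection can hold."""
    total = sum(status_counts.values())
    return max(MAX_DOCUMENTS_PER_COLLECTION - total, 0)


if __name__ == "__main__":
    counts = {"ready": 900, "archived": 80, "failed": 15}
    print(remaining_capacity(counts))
```

In this example, deleting the 80 archived documents would raise the remaining capacity from 5 to 85.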
Exceeding Limits
When a request exceeds one of these limits, the API returns a 429 status code. Review your collection strategy, or contact support to discuss a plan upgrade with higher limits.