Document Management

Manage the documents within your knowledge base collections. Learn how to add, update, delete, and reindex documents to keep your RAG system accurate and up to date.


Add

Upload files, fetch URLs, or paste raw text into collections

Update

Modify document metadata and content with automatic reindexing

Reindex

Reprocess documents when chunk settings or embedding models change

Delete

Remove outdated or incorrect documents from collections

Adding Documents

Documents are the individual files or text entries within a collection. Each document goes through a processing pipeline: text extraction, chunking, embedding generation, and vector indexing. The document status updates as it moves through each stage.
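
To illustrate the chunking stage, the sketch below splits text into fixed-size, overlapping character windows. It assumes chunks are measured in characters with a configurable size and overlap. This page does not document the platform's actual chunker, which may count tokens or respect sentence boundaries instead. The function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`.

    Assumption: chunk_size and overlap count characters; the platform may count tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last `overlap` characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.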

Status	Description	Typical Duration
Pending	Document received and queued for processing	Instant
Processing	Text extraction, chunking, and embedding in progress	A few seconds to several minutes
Ready	Successfully indexed and available for retrieval	-
Failed	Processing error (corrupt file, unsupported format, etc.)	-
Archived	Disabled but retained; not used for retrieval	-
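
Because processing runs asynchronously, a client usually checks a new document's status until it leaves Pending and Processing. The following is a minimal polling sketch. Here, get_status is a hypothetical stand-in for whatever API call returns a document's status, since this page does not describe that endpoint. The sketch treats Ready and Failed as the end of processing.

```python
import time
from typing import Callable

# Statuses from the table above that end the processing pipeline.
TERMINAL_STATUSES = {"Ready", "Failed"}


def wait_for_processing(
    get_status: Callable[[str], str],
    document_id: str,
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the document is Ready or Failed, or raise after `timeout` seconds.

    get_status is a hypothetical callable wrapping the real status lookup.
    """
    waited = 0.0
    while True:
        status = get_status(document_id)
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"document {document_id} still {status} after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval


# Usage with a fake status source that walks through the pipeline.
statuses = iter(["Pending", "Processing", "Processing", "Ready"])
result = wait_for_processing(lambda _id: next(statuses), "doc-123", sleep=lambda s: None)
print(result)  # Ready
```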

Multi-File Upload

You can upload multiple files in a single request, either as an array of files or as a zip archive. Each file is processed independently and added to the collection. A single request can include up to 50 files (see Document Limits).
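
A client uploading a large folder must group files to stay within the limits listed under Document Limits: 50 files per request and 50 MB per file. The sketch below only plans the batches and does not perform the upload. The helper plan_upload_batches is hypothetical. The sketch also assumes that 1 MB means 1,048,576 bytes, which the limits table does not specify.

```python
MAX_FILES_PER_REQUEST = 50  # "Batch upload size" limit
# Assumption: "50 MB" means 50 * 1,048,576 bytes; the limits table does not say.
MAX_FILE_BYTES = 50 * 1024 * 1024


def plan_upload_batches(files: list[tuple[str, int]]) -> tuple[list[list[str]], list[str]]:
    """Group (name, size_in_bytes) pairs into batches of at most 50 files.

    Files over the size limit are returned separately, because they must be
    split manually before upload. No API is called here.
    """
    accepted = [name for name, size in files if size <= MAX_FILE_BYTES]
    rejected = [name for name, size in files if size > MAX_FILE_BYTES]
    batches = [
        accepted[i:i + MAX_FILES_PER_REQUEST]
        for i in range(0, len(accepted), MAX_FILES_PER_REQUEST)
    ]
    return batches, rejected


files = [(f"policy-{i}.pdf", 2_000_000) for i in range(120)] + [("scan.tiff", 80 * 1024 * 1024)]
batches, rejected = plan_upload_batches(files)
print([len(b) for b in batches], rejected)  # [50, 50, 20] ['scan.tiff']
```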

Updating Documents

Keeping documents current is critical for accurate RAG responses. When the source content changes, update the document. The update queues a reindex that refreshes its vector embeddings.

Updating Document Content

You can replace the content of an existing document by uploading a new file, changing the source URL, or providing updated raw text. The system automatically queues a reindex after the content update.
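
Every content update queues a reindex, and reindexing consumes API credits. To avoid uploading unchanged content, you can keep a fingerprint of what you last uploaded and compare it with the current source. This is a client-side sketch that assumes you store the fingerprint yourself, because this page does not say that the platform exposes content hashes. Both function names are hypothetical.

```python
import hashlib
from typing import Optional


def content_fingerprint(text: str) -> str:
    """Stable SHA-256 fingerprint of document text (UTF-8 encoded)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_content_update(stored_fingerprint: Optional[str], current_text: str) -> bool:
    """True when the source text differs from what was last uploaded.

    Assumption: you persist the fingerprint yourself after each upload. A missing
    fingerprint (None) means "never uploaded", so an update is needed.
    """
    return stored_fingerprint != content_fingerprint(current_text)


last_uploaded = content_fingerprint("Refund window: 30 days.")
print(needs_content_update(last_uploaded, "Refund window: 30 days."))  # False
print(needs_content_update(last_uploaded, "Refund window: 14 days."))  # True
```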

Deleting Documents

Remove documents that are no longer relevant, contain errors, or were added by mistake. Deletion removes both the document record and its vector embeddings from the index.

Single Document Deletion

Delete a single document by its document ID. The system removes the document metadata, all of its chunks, and their vector embeddings from the search index.

No Undo

Document deletion is permanent. If you might need a document again later, consider archiving it instead of deleting it. Archived documents still consume storage and count toward the per-collection document limit, but they are not used for retrieval.

Reindexing Documents

Reindexing reprocesses a document through the entire pipeline: re-chunking with current settings, regenerating embeddings, and updating the vector index. This is necessary when you change chunk configuration, update the embedding model, or want to refresh the indexed content from the source.

When to Reindex

After changing the chunk size or overlap configuration for a collection
When switching to a different embedding model
When the source URL content has been updated (for URL-sourced documents)
When you detect vector corruption or degraded retrieval quality
After restoring a document from an archived state
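
The first two triggers in the list above are configuration changes. You can detect them by comparing the settings a document was indexed with against the collection's current settings. The IndexConfig type and the model names below are hypothetical. The remaining triggers (an updated source URL, detected quality problems, restoring from Archived) are events rather than setting differences, so this check does not cover them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexConfig:
    """Hypothetical snapshot of the settings a document was indexed with."""
    chunk_size: int
    chunk_overlap: int
    embedding_model: str


def reindex_reasons(indexed_with: IndexConfig, current: IndexConfig) -> list[str]:
    """List configuration-based reasons to reindex; an empty list means none apply."""
    reasons = []
    if (indexed_with.chunk_size, indexed_with.chunk_overlap) != (current.chunk_size, current.chunk_overlap):
        reasons.append("chunk configuration changed")
    # Embeddings from different models live in different vector spaces,
    # so old and new vectors cannot be meaningfully compared.
    if indexed_with.embedding_model != current.embedding_model:
        reasons.append("embedding model changed")
    return reasons


# Placeholder model names, not real platform identifiers.
old = IndexConfig(chunk_size=1000, chunk_overlap=200, embedding_model="embed-model-a")
new = IndexConfig(chunk_size=800, chunk_overlap=200, embedding_model="embed-model-b")
print(reindex_reasons(old, new))  # ['chunk configuration changed', 'embedding model changed']
```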

Reindex Individual Document

Reindex Costs

Reindexing consumes API credits for embedding generation. Large collections may take several minutes to fully reindex. To minimize downtime, the system processes up to 5 documents per collection in parallel.
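
If you trigger reindexes yourself from a script, you can cap in-flight requests at the documented limit of 5 per collection. This prevents the client from queuing more work than the collection processes at once. The asyncio sketch below uses a semaphore to enforce the cap. Here, reindex_one is a hypothetical coroutine that you would write around the actual reindex request, which this page does not show.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Documented concurrency limit: 5 reindexes per collection.
MAX_CONCURRENT_REINDEXES = 5


async def reindex_all(
    document_ids: list[str],
    reindex_one: Callable[[str], Awaitable[None]],
    limit: int = MAX_CONCURRENT_REINDEXES,
) -> dict[str, Optional[Exception]]:
    """Reindex every document with at most `limit` requests in flight.

    reindex_one is a hypothetical coroutine wrapping the real reindex call.
    Returns a map of document ID to None on success, or to the exception raised.
    """
    semaphore = asyncio.Semaphore(limit)
    results: dict[str, Optional[Exception]] = {}

    async def run(doc_id: str) -> None:
        async with semaphore:  # waits here while `limit` reindexes are running
            try:
                await reindex_one(doc_id)
                results[doc_id] = None
            except Exception as exc:  # record the failure and keep going
                results[doc_id] = exc

    await asyncio.gather(*(run(doc_id) for doc_id in document_ids))
    return results
```

Usage: call asyncio.run(reindex_all(ids, my_reindex_call)), where my_reindex_call is your own coroutine. Failures are collected per document instead of aborting the run, so one corrupt file does not block reindexing of the rest of the collection.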

Document Limits

Understanding the platform limits helps you plan your knowledge base strategy and avoid hitting constraints during operation.

Limit	Value	Notes
Documents per collection	1,000	Total documents across all statuses
File size per document	50 MB	Larger files must be split manually
Chunks per document	10,000	Depends on chunk size; larger chunks mean fewer chunks per document
Characters per document	1,000,000	Approximately 250K tokens
Batch upload size	50 files	Per API request
Concurrent reindexes	5 per collection	Documents are processed in parallel
Tags per document	20	Used for filtering and organization
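
You can check the per-document limits before uploading. This sketch assumes that chunk size and overlap are measured in characters and that 1 MB is 1,048,576 bytes; the table states neither. The chunk estimate uses a sliding-window count: the first chunk covers chunk_size characters, and each later chunk adds chunk_size minus overlap new ones. DocumentDraft, estimated_chunks, and limit_violations are hypothetical names.

```python
import math
from dataclasses import dataclass, field

# Limits from the table above. Assumptions: "MB" = 1,048,576 bytes, and
# chunk size / overlap are measured in characters (the table does not say).
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_CHARACTERS = 1_000_000
MAX_CHUNKS = 10_000
MAX_TAGS = 20


@dataclass
class DocumentDraft:
    """Hypothetical description of a document you are about to upload."""
    size_bytes: int
    characters: int
    tags: list[str] = field(default_factory=list)


def estimated_chunks(characters: int, chunk_size: int, overlap: int) -> int:
    """First chunk covers chunk_size chars; each later chunk adds chunk_size - overlap."""
    if characters <= 0:
        return 0
    if characters <= chunk_size:
        return 1
    return 1 + math.ceil((characters - chunk_size) / (chunk_size - overlap))


def limit_violations(doc: DocumentDraft, chunk_size: int, overlap: int) -> list[str]:
    """Describe every per-document limit the draft would break."""
    problems = []
    if doc.size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 50 MB; split it manually")
    if doc.characters > MAX_CHARACTERS:
        problems.append("more than 1,000,000 characters")
    if estimated_chunks(doc.characters, chunk_size, overlap) > MAX_CHUNKS:
        problems.append("more than 10,000 chunks; use larger chunks or split the document")
    if len(doc.tags) > MAX_TAGS:
        problems.append("more than 20 tags")
    return problems


draft = DocumentDraft(size_bytes=3_000_000, characters=900_000, tags=["hr", "policy"])
print(estimated_chunks(draft.characters, 1000, 200))  # 1125
print(limit_violations(draft, chunk_size=80, overlap=10))  # ['more than 10,000 chunks; use larger chunks or split the document']
```

For example, a 900,000-character document with 1,000-character chunks and 200 characters of overlap needs about 1,125 chunks, well under the cap. The same document with 80-character chunks and 10 characters of overlap would need 12,857 chunks and is flagged.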

Best Practices for Limits

Split large documents into logical sections that stay under the 50 MB and 1,000,000-character limits
Use multiple collections for different domains to stay under document limits
Remove archived or unnecessary documents to free up capacity
Monitor document count via the dashboard or API
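
One way to follow the first practice is to split extracted text at paragraph boundaries into sections that each stay under the 1,000,000-character limit. You would then upload each section as a raw-text document. This sketch assumes that paragraphs are separated by blank lines. The helper split_into_sections is hypothetical. A paragraph longer than the limit is cut at fixed character offsets, which may break it mid-sentence.

```python
def split_into_sections(text: str, max_chars: int = 1_000_000) -> list[str]:
    """Greedily pack paragraphs (separated by a blank line) into sections of at most max_chars.

    Assumption: a blank line separates paragraphs. A paragraph longer than
    max_chars is cut into fixed-size pieces, which may split mid-sentence.
    """
    sections: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        if not para.strip():
            continue  # skip empty paragraphs
        # Hard-split oversized paragraphs so no piece exceeds max_chars.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate  # piece still fits in the open section
            else:
                sections.append(current)  # close the section, start a new one
                current = piece
    if current:
        sections.append(current)
    return sections


parts = split_into_sections("Intro paragraph.\n\nPolicy details.\n\nAppendix.", max_chars=35)
print(parts)  # ['Intro paragraph.\n\nPolicy details.', 'Appendix.']
```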

Exceeding Limits

When a request would exceed a limit, the API returns a 429 status code. Review your collection strategy, or contact support to discuss a plan upgrade with higher limits.
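
This page does not say whether a 429 is transient, for example too many concurrent reindexes, or reflects a hard cap such as 1,000 documents per collection, which waiting will not clear. The sketch below therefore retries only a few times with exponential backoff and then raises an error, so the caller can revisit its collection strategy. The send callable is hypothetical: it wraps a single API request and returns (status_code, body). LimitExceededError is likewise illustrative.

```python
import time
from typing import Callable, Tuple


class LimitExceededError(Exception):
    """Illustrative error raised when 429 responses persist after retrying."""


def call_with_limit_handling(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` and back off on 429 responses.

    `send` is a hypothetical wrapper around one API request returning
    (status_code, body). Non-429 responses are returned unchanged for the
    caller to handle. Delays grow as base_delay * 2**attempt: 1 s, 2 s, 4 s, ...
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    # Still limited after retrying: likely a hard cap rather than a burst.
    raise LimitExceededError(f"HTTP 429 after {max_retries} retries: {body}")
```

Keeping max_retries small matters here. Retrying cannot help if the collection is already at its document limit, so the error should reach the caller quickly.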