Document Management

Manage the documents within your knowledge base collections. Learn how to add, update, delete, and reindex documents to keep your RAG system accurate and up to date.

Add

Upload files, fetch URLs, or paste raw text into collections

Update

Modify document metadata and content with automatic reindexing

Reindex

Reprocess documents when chunk settings or embedding models change

Delete

Remove outdated or incorrect documents from collections

Adding Documents

Documents are the individual files or text entries within a collection. Each document goes through a processing pipeline: text extraction, chunking, embedding generation, and vector indexing. The document status updates as it moves through each stage.

| Status | Description | Typical Duration |
| --- | --- | --- |
| Pending | Document received and queued for processing | Instant |
| Processing | Text extraction, chunking, and embedding in progress | Few seconds to minutes |
| Ready | Successfully indexed and available for retrieval | - |
| Failed | Processing error (corrupt file, unsupported format, etc.) | - |
| Archived | Disabled but retained; not used for retrieval | - |
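
The statuses in the table above can be modeled on the client side as a small enumeration. The following is a minimal sketch: the lowercase string values and the helper names `is_retrievable` and `is_settled` are assumptions for illustration, not part of the platform's API.

```python
from enum import Enum


class DocumentStatus(Enum):
    # Names mirror the status table; the string values are assumptions.
    PENDING = "pending"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"
    ARCHIVED = "archived"


# Statuses in which the processing pipeline is still running.
IN_PROGRESS = {DocumentStatus.PENDING, DocumentStatus.PROCESSING}


def is_retrievable(status: DocumentStatus) -> bool:
    """Only Ready documents are available for retrieval."""
    return status is DocumentStatus.READY


def is_settled(status: DocumentStatus) -> bool:
    """True once the document is no longer moving through the pipeline."""
    return status not in IN_PROGRESS
```

A polling loop can stop as soon as `is_settled` returns True, then branch on whether the document ended up Ready, Failed, or Archived.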

Multi-File Upload

You can upload multiple files simultaneously. Each file is processed independently and added to the collection. The API accepts an array of files or a zip archive.
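
Because this page lists a batch limit of 50 files per API request and a 50 MB limit per file, a client may want to group files before uploading. The sketch below only plans the batches and sends nothing. The function name `plan_batches` is hypothetical, and it assumes 1 MB means 1024 * 1024 bytes.

```python
from typing import Iterator, Sequence

MAX_FILES_PER_REQUEST = 50          # "Batch upload size" limit from this page
MAX_FILE_BYTES = 50 * 1024 * 1024   # 50 MB; assumes 1 MB = 1024 * 1024 bytes


def plan_batches(files: Sequence[tuple[str, int]]) -> Iterator[list[str]]:
    """Yield lists of file names, each small enough for one upload request.

    `files` is a sequence of (name, size_in_bytes) pairs.
    """
    oversized = [name for name, size in files if size > MAX_FILE_BYTES]
    if oversized:
        raise ValueError(f"files exceed 50 MB and must be split: {oversized}")
    names = [name for name, _ in files]
    for start in range(0, len(names), MAX_FILES_PER_REQUEST):
        yield names[start:start + MAX_FILES_PER_REQUEST]
```

For 120 small files this yields three batches of 50, 50, and 20 names, each of which could be sent as one array of files.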

Updating Documents

Keeping documents current is critical for accurate RAG responses. When source content changes, update the document and trigger a reindex to refresh the vector embeddings.

Updating Document Content

You can replace the content of an existing document by uploading a new file, changing the source URL, or providing updated raw text. The system automatically queues a reindex after the content update.
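
A content update names one new source: a file, a URL, or raw text. The following sketch validates that exactly one is supplied before a request is built. The payload keys `type` and `value` and the function name are hypothetical and do not describe the platform's actual schema.

```python
from typing import Optional


def build_content_update(
    file_path: Optional[str] = None,
    source_url: Optional[str] = None,
    raw_text: Optional[str] = None,
) -> dict:
    """Return a payload describing exactly one replacement content source.

    The keys "type" and "value" are hypothetical, not the platform's schema.
    """
    candidates = {"file": file_path, "url": source_url, "text": raw_text}
    provided = {kind: value for kind, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError("provide exactly one of file_path, source_url, raw_text")
    kind, value = next(iter(provided.items()))
    return {"type": kind, "value": value}
```

Rejecting ambiguous input on the client avoids queuing a reindex for an update whose intended source is unclear.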

Deleting Documents

Remove documents that are no longer relevant, contain errors, or were added by mistake. Deletion removes both the document record and its vector embeddings from the index.

Single Document Deletion

Delete a single document by its document ID. The system removes the document metadata, all chunks, and their vector embeddings from the search index.

No Undo

Document deletion is permanent. If you might need to restore a document later, consider archiving (disabling) it instead of deleting it. Archived documents still consume storage and count toward the per-collection document limit, but they are not used for retrieval.
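
As an illustration of single-document deletion, the sketch below builds a DELETE request without sending it. The base URL, path layout, and bearer-token header are assumptions made for this example. Consult the API reference for the real endpoint.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical base URL and path layout; check the API reference for the real ones.
BASE_URL = "https://api.example.com/v1"


def build_delete_request(collection_id: str, document_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one document."""
    url = (
        f"{BASE_URL}/collections/{quote(collection_id, safe='')}"
        f"/documents/{quote(document_id, safe='')}"
    )
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing the result to `urllib.request.urlopen` would send the request. Since deletion cannot be undone, keeping request construction separate from sending makes it easy to log or confirm the target first.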

Reindexing Documents

Reindexing reprocesses a document through the entire pipeline: re-chunking with current settings, regenerating embeddings, and updating the vector index. This is necessary when you change chunk configuration, update the embedding model, or want to refresh the indexed content from the source.

When to Reindex

  • After changing the chunk size or overlap configuration for a collection
  • When switching to a different embedding model
  • When the source URL content has been updated (for URL-sourced documents)
  • If vector corruption or retrieval quality degradation is detected
  • After restoring a document from an archived state
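
Several of the conditions above can be checked mechanically by comparing the settings a document was indexed with against the collection's current settings. This is a minimal sketch: the `IndexSettings` fields and the `reindex_reasons` helper are illustrative names, not platform API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexSettings:
    # Field names are illustrative, not the platform's schema.
    chunk_size: int
    chunk_overlap: int
    embedding_model: str


def reindex_reasons(
    indexed_with: IndexSettings,
    current: IndexSettings,
    source_changed: bool = False,
    restored_from_archive: bool = False,
) -> list[str]:
    """List the reasons a document should be reindexed (empty if none)."""
    reasons = []
    old_chunking = (indexed_with.chunk_size, indexed_with.chunk_overlap)
    new_chunking = (current.chunk_size, current.chunk_overlap)
    if old_chunking != new_chunking:
        reasons.append("chunk settings changed")
    if indexed_with.embedding_model != current.embedding_model:
        reasons.append("embedding model changed")
    if source_changed:
        reasons.append("source content updated")
    if restored_from_archive:
        reasons.append("restored from archive")
    return reasons
```

An empty list means none of these triggers applies. Retrieval quality problems still need to be judged separately.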

Reindex Individual Document
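
A single-document reindex typically means triggering the job and then waiting for the status to settle. The sketch below assumes you supply two callables of your own, `trigger_reindex` and `get_status`, that wrap the real API calls. Both names, and the status strings compared against, are assumptions based on the status names on this page.

```python
import time
from typing import Callable


def reindex_and_wait(
    document_id: str,
    trigger_reindex: Callable[[str], None],
    get_status: Callable[[str], str],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
) -> str:
    """Trigger a reindex, then poll until the document is Ready or Failed.

    `trigger_reindex` and `get_status` are placeholders for your own API calls.
    """
    trigger_reindex(document_id)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(document_id)
        if status in ("Ready", "Failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{document_id} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

Injecting the two callables keeps the waiting logic independent of any particular HTTP client and makes it easy to test with fakes.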

Reindex Costs

Reindexing consumes API credits for embedding generation. Large collections may take several minutes to fully reindex. The system processes documents in parallel (up to 5 concurrent) to minimize downtime.
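
If you trigger reindexes from your own script, you may want to cap client-side concurrency at the same figure so that requests are not queued or rejected. This is a sketch under that assumption. `reindex_one` is a placeholder for a call that reindexes one document and returns its final status.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

MAX_CONCURRENT_REINDEXES = 5  # concurrency figure stated on this page


def reindex_all(
    document_ids: Iterable[str],
    reindex_one: Callable[[str], str],
) -> dict[str, str]:
    """Run `reindex_one` for every ID, at most five at a time.

    `reindex_one` is a placeholder for your own API call.
    """
    ids = list(document_ids)
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_REINDEXES) as pool:
        # pool.map preserves input order, so results line up with ids.
        return dict(zip(ids, pool.map(reindex_one, ids)))
```

The returned mapping of document ID to final status makes it straightforward to list any documents that ended in Failed and retry only those.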

Document Limits

Understanding the platform limits helps you plan your knowledge base strategy and avoid hitting constraints during operation.

| Limit | Value | Notes |
| --- | --- | --- |
| Documents per collection | 1,000 | Total documents across all statuses |
| File size per document | 50 MB | Larger files must be split manually |
| Chunks per document | 10,000 | Based on chunk size; larger chunks = fewer total |
| Characters per document | 1,000,000 | Approximately 250K tokens |
| Batch upload size | 50 files | Per API request |
| Concurrent reindexes | 5 per collection | Documents are processed in parallel |
| Tags per document | 20 | Used for filtering and organization |
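
The character and chunk limits interact: small chunks with large overlap can reach the 10,000-chunk limit before the 1,000,000-character limit. The sketch below estimates both. It assumes chunk size and overlap are measured in characters with a simple sliding window, which may differ from how the platform actually chunks, and it uses the ratio of about 4 characters per token implied by the table.

```python
import math

MAX_CHARS_PER_DOCUMENT = 1_000_000
MAX_CHUNKS_PER_DOCUMENT = 10_000
CHARS_PER_TOKEN = 4  # rough ratio implied by "1,000,000 characters ~ 250K tokens"


def estimate_tokens(char_count: int) -> int:
    """Rough token estimate from a character count."""
    return char_count // CHARS_PER_TOKEN


def estimate_chunks(char_count: int, chunk_size: int, overlap: int = 0) -> int:
    """Estimate chunk count for a sliding window measured in characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if char_count <= chunk_size:
        return 1 if char_count > 0 else 0
    stride = chunk_size - overlap  # new characters covered by each extra chunk
    return 1 + math.ceil((char_count - chunk_size) / stride)


def fits_limits(char_count: int, chunk_size: int, overlap: int = 0) -> bool:
    """Check the character and chunk limits together."""
    return (
        char_count <= MAX_CHARS_PER_DOCUMENT
        and estimate_chunks(char_count, chunk_size, overlap) <= MAX_CHUNKS_PER_DOCUMENT
    )
```

Under these assumptions, a 1,000,000-character document with 1,000-character chunks and 200 characters of overlap produces an estimated 1,250 chunks, while 100-character chunks with 50 characters of overlap produce 19,999 and exceed the chunk limit.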

Best Practices for Limits

  • Split large documents into logical sections (max 50 MB each)
  • Use multiple collections for different domains to stay under document limits
  • Remove archived or unnecessary documents to free up capacity
  • Monitor document count via the dashboard or API

Exceeding Limits

When you approach or exceed limits, the API returns a 429 (Too Many Requests) status code. Review your collection strategy or contact support to discuss plan upgrades for higher limits.
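
A client can handle 429 responses by retrying with exponential backoff. The following is a minimal sketch: `send` is a placeholder for a function that performs one request and returns a `(status_code, body)` pair, and the retry count and delays are arbitrary example values.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-429 status or attempts run out.

    `send` is a placeholder that performs one request and returns
    (status_code, body). Delays double after each 429: 1s, 2s, 4s, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Every attempt returned 429; hand the last response back to the caller.
    return status, body
```

Retrying only helps when the limit is transient, such as too many concurrent reindexes. A 429 caused by a full collection will persist until documents are removed or the plan limit is raised.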