Document Formatting
Best practices for preparing documents before uploading them to your bot's knowledge base.
How you format your documents directly affects how well the bot retrieves and answers from them. Well-structured documents lead to more accurate, relevant responses.
Quick checklist
Before uploading any document, verify:
- Headings use actual heading styles (not just bold or large text)
- Content is organized with a clear hierarchy
- One main topic per document
- No critical information trapped inside images
- File is in a supported format (PDF, DOCX, TXT, MD, CSV)
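The format check in the last item is easy to automate before a bulk upload. The following is a minimal sketch, not part of the product: the function name `unsupported_files` and the sample file names are hypothetical, and the extension list simply mirrors the formats named in the checklist.

```python
from pathlib import Path

# Extensions taken from the checklist above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv"}


def unsupported_files(paths):
    """Return the paths whose extension is not in the supported list."""
    return [p for p in paths if Path(p).suffix.lower() not in SUPPORTED_EXTENSIONS]


# Hypothetical file names, used only to illustrate the check.
candidates = ["faq.csv", "handbook.DOCX", "notes.pages", "scan.png"]
print(unsupported_files(candidates))
```

The comparison is case-insensitive, so `handbook.DOCX` passes. A matching extension only suggests the format; it does not prove the file content is valid.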
Word documents
Word documents work best when you use the built-in heading styles.
The critical difference
- Correct: Use Word's Heading 1, Heading 2, Heading 3 styles from the Styles panel
- Incorrect: Making text bold and larger manually to look like a heading
The bot relies on heading styles to understand document structure. Manually formatted "headings" look the same to human readers, but the bot cannot distinguish them from regular text.
Recommended structure
- Title — The document title (use Title style)
- Heading 1 — Major sections
- Heading 2 — Subsections
- Heading 3 — Detailed topics within subsections
- Body text — Regular paragraphs
Use Word's built-in list features (numbered and bullet lists) rather than typing numbers manually.
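To see whether a Word file really uses heading styles, you can inspect the file itself. A .docx file is a ZIP archive, and the paragraph styles are recorded in `word/document.xml`. The sketch below lists the paragraphs tagged with a Title or Heading style. It assumes the style IDs `Title` and `Heading1` to `Heading9`, which English-language Word normally writes; localized installations or custom templates may use other IDs. The function name `heading_outline` is hypothetical, and this is not a description of how the bot parses documents.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def heading_outline(docx_file):
    """Return (style_id, text) for each paragraph with a Title or Heading style."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    outline = []
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        if style is None:
            continue  # no named style: body text or manually formatted text
        style_id = style.get(f"{{{W_NS}}}val", "")
        # Assumption: English style IDs; other templates may differ.
        if style_id == "Title" or re.fullmatch(r"Heading[1-9]", style_id):
            text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
            outline.append((style_id, text))
    return outline
```

If the returned list is empty for a document that visibly has section titles, those titles are probably bold or enlarged body text rather than real headings.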
PDF files
Native PDFs (recommended)
Native PDFs are created from digital documents (exported from Word, Google Docs, etc.). You can select and copy text from them. These process quickly and accurately.
Scanned PDFs (limited support)
Scanned PDFs are images of paper documents. The system processes them using OCR (optical character recognition), but accuracy varies:
- Handwritten text is poorly recognized
- Low-resolution scans produce errors
- Complex layouts (multi-column, tables with borders) may be misread
When possible, use the original digital document instead of a scan.
Plain text and Markdown
Both formats work well. For best results:
- Use clear section headers
- Separate topics with blank lines
- Use consistent formatting for lists
- Markdown headers (#, ##, ###) are recognized and used for document structure
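To illustrate why headers matter, here is a simplified sketch of header-based splitting. It is an assumption about the general technique, not the bot's actual implementation, and the function name `split_by_headers` is hypothetical. It breaks a Markdown file into sections at level 1 to 3 headers and ignores edge cases such as `#` lines inside code blocks.

```python
import re

# One to three '#' characters, whitespace, then the header title.
HEADER_RE = re.compile(r"^(#{1,3})\s+(.*\S)\s*$")


def split_by_headers(markdown_text):
    """Split Markdown into (level, title, body) sections at #, ##, ### headers."""
    sections = []
    level, title, body = 0, "", []

    def flush():
        # Keep a section only if it has a title or some non-blank body text.
        if title or any(line.strip() for line in body):
            sections.append((level, title, "\n".join(body).strip()))

    for line in markdown_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level, title, body = len(match.group(1)), match.group(2), []
        else:
            body.append(line)
    flush()
    return sections
```

A file with no headers comes back as a single untitled section, which is the situation clear section headers are meant to avoid.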
Q&A pairs (CSV format)
For structured question-and-answer content:
- Use exactly two columns: question and answer
- Include the column headers in the first row
- One Q&A pair per row
- Keep answers concise and complete
question,answer
What is the return policy?,Items can be returned within 30 days of purchase with the original receipt.
Do you offer international shipping?,Yes. We ship to all GCC countries. Delivery takes 3-7 business days.
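The rules above can be checked before upload with Python's standard `csv` module. This is a minimal sketch under stated assumptions: the function name `validate_qa_csv` is hypothetical, and the exact lowercase header match is an assumption about how strict the upload is.

```python
import csv
import io

EXPECTED_HEADER = ["question", "answer"]


def validate_qa_csv(text):
    """Return a list of problems in a Q&A CSV; an empty list means none found."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [cell.strip() for cell in rows[0]]
    if header != EXPECTED_HEADER:
        problems.append("row 1: header should be question,answer")

    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            problems.append(f"row {number}: expected 2 columns, found {len(row)}")
        elif not row[0].strip() or not row[1].strip():
            problems.append(f"row {number}: question or answer is empty")
    return problems
```

The most common failure is an unquoted comma inside an answer, which splits the row into three columns. Wrapping the answer in double quotes keeps it in one column.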
Images in documents
The bot cannot read text inside images. This includes:
- Screenshots of text or tables
- Infographics with text labels
- Scanned handwritten notes
- Diagrams with text annotations
If important information is in an image, add the same information as regular text in the document.
Images also significantly slow down processing. Remove decorative images (logos, backgrounds, stock photos) before uploading.
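For Word files, you can list embedded media without opening the document, because a .docx file is a ZIP archive that stores pictures under `word/media/`. The sketch below is a hypothetical helper (`embedded_media` is not a product feature) for spotting files that still contain images to remove or transcribe.

```python
import zipfile


def embedded_media(docx_file):
    """List the media files (usually images) stored inside a .docx archive."""
    with zipfile.ZipFile(docx_file) as archive:
        return sorted(
            name for name in archive.namelist() if name.startswith("word/media/")
        )
```

An empty list means the document has no embedded pictures. A long list suggests reviewing the document for decorative images, or for information that exists only as a picture.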
Common mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Bold text instead of heading styles | Bot cannot identify sections | Apply Heading styles in Word |
| Multiple topics in one large document | Irrelevant content retrieved alongside answers | Split into separate, focused documents |
| Critical info only in images | Bot cannot access the information | Add text version alongside images |
| Scanned PDF when digital exists | Lower accuracy, slower processing | Upload the original digital file |
| Outdated documents not removed | Bot gives incorrect answers | Remove or replace outdated files |
| No clear document structure | Poor retrieval accuracy | Add headings and organize content |