
What Types of Documents Are Supported in TeamAI's Data Hubs?

Written by Muhammad Jawad
Updated over a week ago

Overview

AI Data Hubs accept three kinds of content for building a knowledge base: uploaded files, individual webpages, and entire websites. Knowing which formats are supported, and how each kind counts toward your plan limits, helps you organize your data and manage costs.

Learning Objectives:

  • Identify supported file formats for upload

  • Understand how webpages are treated as documents

  • Learn website import capabilities and limitations

  • Understand billing implications for different document types

  • Plan data hub organization based on document type support


Supported Document Types

Files

AI Data Hubs support the following file formats:

Text and Data Files

  • CSV (Comma-Separated Values) - Structured data and spreadsheets

  • TXT (Plain Text) - Simple text documents without formatting

  • MD (Markdown) - Formatted documentation with headers and styling

  • JSON (JavaScript Object Notation) - Structured data and API responses

Microsoft Office Documents

  • DOCX (Word Document) - Modern Word files (.docx)

  • DOC (Word Document) - Legacy Word files (.doc)

  • XLSX (Excel Spreadsheet) - Modern Excel files (.xlsx)

  • XLS (Excel Spreadsheet) - Legacy Excel files (.xls)

  • PPTX (PowerPoint Presentation) - Modern PowerPoint files (.pptx)

  • PPT (PowerPoint Presentation) - Legacy PowerPoint files (.ppt)

Web, Code, and PDF Files

  • HTML (Hypertext Markup Language) - Web pages and formatted content

  • CSS (Cascading Style Sheets) - Stylesheets and design documentation

  • JS (JavaScript) - Code files and scripts

  • PDF (Portable Document Format) - Universal document format with complex layouts
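
Before a bulk upload, it can help to sort local files by whether their extension appears in the lists above. The following is a minimal sketch, not part of TeamAI: the helper name `split_by_support` is hypothetical, and it checks only the file extension, not the file's actual contents.

```python
from pathlib import Path

# Extensions copied from the supported-format lists above.
SUPPORTED_EXTENSIONS = {
    ".csv", ".txt", ".md", ".json",                        # text and data
    ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt",     # Microsoft Office
    ".html", ".css", ".js", ".pdf",                        # web, code, and PDF
}


def split_by_support(filenames):
    """Split filenames into (supported, unsupported) by extension.

    Hypothetical helper for illustration; the comparison is case-insensitive.
    """
    supported, unsupported = [], []
    for name in filenames:
        extension = Path(name).suffix.lower()
        if extension in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported
```

For example, `split_by_support(["report.PDF", "notes.md", "photo.png"])` returns `(["report.PDF", "notes.md"], ["photo.png"])`, so the PNG can be converted or left out before uploading.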

Webpages

Webpages are treated as individual documents within data hubs:

  • Single webpage import - Store individual URLs as separate documents

  • Content extraction - System extracts text content from the page

  • Knowledge base creation - Archive important web resources for offline access

  • Search integration - Webpage content becomes searchable within your data hub

Result: Webpage content is indexed and available as a searchable document alongside your other files.

Tip: Use webpage imports to capture reference documentation, articles, and online resources that support your knowledge base.

Websites

Websites consist of multiple webpages that can be imported:

  • Website mode import - Import entire website content as a single unit

  • Individual pages - Each webpage is stored as a separate document within the website collection

  • Crawling limitations - Some dynamic or protected content may not be accessible

  • Organization - Website content is grouped together for easy management

Result: Complete website content becomes part of your data hub structure.

Note: Website import works only on publicly accessible sites, and the crawl respects each site's crawling permissions (robots.txt).
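
To get a rough idea of whether a page is open to crawlers before importing it, you can evaluate the site's robots.txt rules yourself. The sketch below uses Python's standard `urllib.robotparser`. It is an illustration only: the helper name `is_crawl_allowed` and the example rules are assumptions, the user agent TeamAI's crawler sends is not documented here (so the wildcard `*` is used), and the actual import may apply additional checks.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper; pass in the text of the site's /robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules (made up for illustration): everything under /private/ is blocked.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

With these example rules, `is_crawl_allowed(EXAMPLE_ROBOTS, "https://example.com/docs/intro")` is `True` and `is_crawl_allowed(EXAMPLE_ROBOTS, "https://example.com/private/page")` is `False`.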


Billing Considerations

Understanding how different document types impact your billing is critical for cost management.

Document Count Billing (Files and Webpages)

These count toward your monthly document limit:

  • Individual files - Each uploaded file counts as one document

  • Webpages - Each imported URL counts as one document

  • High-volume impact - Large quantities of files can quickly increase document count

Example: Uploading 50 PDFs + 30 CSV files = 80 documents toward your limit

Website Count Billing (Websites)

Website mode has separate billing:

  • Website as single unit - Entire website import counts as one website (not multiple documents)

  • Doesn't affect document limit - Website imports don't add to your monthly document count

  • Separate website limit - Websites have their own monthly limit based on your plan

Example: Importing a 100-page website in website mode = 1 unit toward website limit, 0 documents toward document limit
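
The two counting rules can be expressed as a small calculation. This is a sketch of the rules as described in this article, not TeamAI's billing code; the `Usage` class and `tally_usage` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    documents: int = 0  # counts toward the monthly document limit
    websites: int = 0   # counts toward the separate website limit


def tally_usage(files=0, webpages=0, websites=0):
    """Apply the counting rules described above.

    Each file and each webpage import is one document. Each website-mode
    import is one website unit, regardless of how many pages it contains.
    """
    return Usage(documents=files + webpages, websites=websites)


# 50 PDFs + 30 CSV files
print(tally_usage(files=50 + 30))   # Usage(documents=80, websites=0)
# One 100-page site imported in website mode
print(tally_usage(websites=1))      # Usage(documents=0, websites=1)
```

Note that the page count of a website never enters the calculation, which is why website mode is the cheaper option for large sites.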


Best Practices

  1. Use website mode for large sites: When archiving an entire documentation site or knowledge base, use website import to avoid consuming document limits.

  2. Upload files for controlled content: For documents you edit and version, use file uploads to maintain control over content.

  3. Import select webpages: For a few important articles, import as webpages rather than full websites to organize them alongside files.

  4. Plan around limits: Track your document and website counts in workspace settings so that you don't hit a limit or incur overage charges unexpectedly.

  5. Organize by document type: Create separate data hub collections for files vs. web content to maintain clean organization.

  6. Test website imports: Before importing large websites, test with a small subsection to verify content extraction quality.

  7. Use supported formats: Convert documents to supported formats (e.g., DOCX, PDF) before upload to ensure proper indexing.

  8. Monitor billing dashboard: Regularly check your workspace usage statistics to understand consumption patterns.

Common Questions

Q: Are files and webpages counted the same for billing?
A: Yes. Each uploaded file and each imported webpage counts as one document toward your monthly document limit.

Q: What's the difference between webpage and website imports?
A: A webpage import adds one document per URL to your document count. A website import adds one website unit (containing many pages) toward a separate website limit and does not affect your document count.

Q: Do deleted documents or websites count toward billing?
A: No. Deleted content is removed from your active count. Only currently stored documents and websites count toward monthly limits.

Q: What happens if I exceed my document limit?
A: Most plans block further uploads once the limit is reached. To continue, archive unused content or contact your administrator to upgrade your plan.

Q: Are there file size limits for uploads?
A: Yes. Individual files have size limits (typically 50 MB to 100 MB, depending on plan). Large PDFs or presentations may need to be split into smaller files.
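
A pre-upload size check could be sketched as follows. The helper `oversized_files` is hypothetical, the default of 50 MB is simply the lower end of the typical range mentioned above, and treating 1 MB as 1,048,576 bytes is an assumption; check your plan for the actual limit.

```python
from pathlib import Path

BYTES_PER_MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def oversized_files(paths, limit_mb=50):
    """Return the paths whose size on disk exceeds limit_mb megabytes.

    Hypothetical helper; limit_mb should match your plan's upload limit.
    """
    limit_bytes = limit_mb * BYTES_PER_MB
    return [path for path in paths if Path(path).stat().st_size > limit_bytes]
```

Any file it returns is a candidate for splitting before upload.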

Q: Can I import password-protected websites?
A: No. The system can only access publicly available content. For secured resources, download the content and upload as files instead.

Q: Do HTML files import differently than webpages?
A: Yes. HTML files are treated as static documents. Webpage imports actively fetch and extract content from live URLs, which may change over time.
