Interface DocumentExtractInput

DocumentExtractInput represents the input for document extraction.

interface DocumentExtractInput {
    chunkDocument?: boolean;
    chunkSize?: number;
    embedImages?: boolean;
    enableOCR?: boolean;
    file: string | File;
    injection?: boolean;
    outputFormat?: string;
    pii?: string;
    replaceMethod?: string;
    toxicity?: boolean;
}
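The declaration above makes `file` the only required property. The sketch below copies the interface locally so that it compiles on its own, then builds one minimal input and one input with several options set. The path, the `chunkSize` value, and the `outputFormat` value `"markdown"` are illustrative assumptions. They are not documented defaults or confirmed accepted values.

```typescript
// Local copy of the interface so this sketch compiles on its own.
interface DocumentExtractInput {
  chunkDocument?: boolean;
  chunkSize?: number;
  embedImages?: boolean;
  enableOCR?: boolean;
  file: string | File;
  injection?: boolean;
  outputFormat?: string;
  pii?: string;
  replaceMethod?: string;
  toxicity?: boolean;
}

// Only `file` is required; every other property may be omitted.
const minimal: DocumentExtractInput = {
  file: "./reports/sample.pdf", // hypothetical path
};

// All option values below are illustrative placeholders (assumptions).
const detailed: DocumentExtractInput = {
  file: "./reports/sample.pdf",
  chunkDocument: true,
  chunkSize: 1000, // the unit (characters, tokens, ...) is not specified here
  embedImages: false,
  enableOCR: true,
  outputFormat: "markdown", // assumed value; check which formats are accepted
  injection: true,
  toxicity: true,
};

console.log(Object.keys(minimal)); // [ 'file' ]
console.log(Object.keys(detailed).length); // 8
```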

Properties

chunkDocument?: boolean

chunkDocument represents whether to separate the document into chunks.

chunkSize?: number

chunkSize represents the size of each chunk when the document is split into chunks.

embedImages?: boolean

embedImages represents whether to embed images from the document.

enableOCR?: boolean

enableOCR represents whether to use optical character recognition (OCR) when parsing the document.

file: string | File

file represents the document file to upload, given either as a File object or as a path string.
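Because `file` is typed `string | File`, any code that consumes it has to distinguish the two forms. The following is a minimal sketch of that narrowing using a `typeof` guard. `describeFile` and the `DocumentFile` alias are hypothetical names introduced for illustration and are not part of the documented API.

```typescript
// The same union type as the `file` property.
type DocumentFile = string | File;

// Hypothetical helper (not part of the documented API): returns a
// human-readable label for either form of `file`.
function describeFile(file: DocumentFile): string {
  if (typeof file === "string") {
    // A string is treated as a path to the document.
    return `path: ${file}`;
  }
  // Here TypeScript narrows `file` to File, which exposes `name` and `size`.
  return `File object: ${file.name} (${file.size} bytes)`;
}

console.log(describeFile("./reports/sample.pdf")); // path: ./reports/sample.pdf
```

In environments that provide a global `File` (browsers, Node.js 20+), `describeFile(new File(["hello"], "note.txt"))` returns `"File object: note.txt (5 bytes)"`.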

injection?: boolean

injection represents whether to check the output for prompt injection.

outputFormat?: string

outputFormat represents the output format for the content of the document.

pii?: string

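pii represents how to check the output for personally identifiable information (PII). Unlike injection and toxicity, its type is string rather than boolean.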

replaceMethod?: string

replaceMethod represents the method used to replace any PII that is found.
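pii and replaceMethod work together: one selects the PII handling and the other selects how found PII is replaced. The sketch below assumes that `"replace"` is an accepted `pii` value and that `replaceMethod` only matters in that mode. Both assumptions, the helper `piiOptions`, and the values `"block"` and `"mask"` are hypothetical and should be checked against the service's documentation.

```typescript
// Only the two PII-related properties of DocumentExtractInput.
interface PiiOptions {
  pii?: string;
  replaceMethod?: string;
}

// Hypothetical helper. ASSUMPTION: "replace" is a valid `pii` value and
// `replaceMethod` is only meaningful in that mode; confirm against the API.
function piiOptions(mode: string, method?: string): PiiOptions {
  const options: PiiOptions = { pii: mode };
  if (mode === "replace" && method !== undefined) {
    options.replaceMethod = method;
  }
  return options;
}

console.log(piiOptions("replace", "mask")); // { pii: 'replace', replaceMethod: 'mask' }
console.log(piiOptions("block", "mask")); // { pii: 'block' }
```

The resulting object can be spread into a full input, for example `{ file: "./reports/sample.pdf", ...piiOptions("replace", "mask") }`.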

toxicity?: boolean

toxicity represents whether to check the output for toxicity.