MyBotBox

@yarlisai/file-parsers

Extension-routed document parsing — PDF, CSV, DOCX, XLSX, PPTX, HTML, JSON, YAML, Markdown and plain text, with lazy-loaded optional vendor adapters.

Install

npm install @yarlisai/file-parsers

Vendor libraries are optional peer dependencies — install only the ones for the formats you parse (pdf-parse, csv-parse, mammoth, officeparser, exceljs, cheerio, js-yaml). Each adapter loads its vendor library on the first parse of that format; JSON, Markdown and plain text need no vendor at all.
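The load-on-first-parse behaviour described above can be illustrated with a small memoised loader. This is a minimal sketch, not the package's implementation: `lazy`, `loadFakeVendor`, `VendorModule` and `parseLines` are hypothetical names, and the fake loader stands in for a dynamic `import()` of a real vendor library so the example runs without any peer dependency installed.

```typescript
// Stand-in for a vendor module such as js-yaml. A real adapter would load
// the actual library with a dynamic import() instead (assumption).
type VendorModule = { parse(text: string): unknown }

let loadCount = 0
const loadFakeVendor = async (): Promise<VendorModule> => {
  loadCount += 1
  return {
    parse: (text: string) => text.split('\n').filter((line) => line.length > 0),
  }
}

// Hypothetical helper: defers the load until first use and caches the
// promise, so later calls reuse the same module instead of loading again.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => {
    if (cached === undefined) cached = load()
    return cached
  }
}

const getVendor = lazy(loadFakeVendor)

// Hypothetical adapter function: the vendor is requested only inside the
// parse call, never at module import time.
async function parseLines(text: string): Promise<unknown> {
  const vendor = await getVendor()
  return vendor.parse(text)
}
```

Caching the promise rather than the resolved module means two parses that start at the same time still trigger a single load.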

Why

@yarlisai/file-parsers follows the port/adapter contract: consumers depend on a port (the FileParser interface) and the createParserRegistry() factory routes a file extension to the right adapter at runtime. Adding a format is one new adapter file plus one registry entry.
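The port/adapter routing described above can be sketched as follows. The real shapes of `FileParser` and `createParserRegistry()` are not shown on this page, so `FileParserSketch`, `createRegistrySketch` and `normalizeExtension` are hypothetical stand-ins with assumed signatures, and the two toy adapters take already-decoded text to keep the example free of vendor libraries.

```typescript
// Assumed port shape; the package's real FileParser interface may differ.
interface FileParserSketch {
  parse(text: string): Promise<{ content: string }>
}

// Two toy adapters that need no vendor library.
const textAdapter: FileParserSketch = {
  parse: async (text: string) => ({ content: text }),
}
const jsonAdapter: FileParserSketch = {
  // Re-serialises the JSON so the output has consistent formatting.
  parse: async (text: string) => ({
    content: JSON.stringify(JSON.parse(text), null, 2),
  }),
}

// Treat 'JSON', '.json' and ' json ' as the same key.
function normalizeExtension(ext: string): string {
  return ext.trim().replace(/^\./, '').toLowerCase()
}

// Hypothetical factory: builds the extension-to-adapter table once.
function createRegistrySketch(entries: Array<[string, FileParserSketch]>) {
  const table = new Map<string, FileParserSketch>()
  for (const [ext, adapter] of entries) {
    table.set(normalizeExtension(ext), adapter)
  }
  return {
    supports: (ext: string): boolean => table.has(normalizeExtension(ext)),
    get(ext: string): FileParserSketch {
      const adapter = table.get(normalizeExtension(ext))
      if (adapter === undefined) {
        throw new Error(`Unsupported file type: ${ext}`)
      }
      return adapter
    },
  }
}

// Adding a format is one adapter plus one entry in this list.
const registry = createRegistrySketch([
  ['txt', textAdapter],
  ['md', textAdapter],
  ['json', jsonAdapter],
])
```

Consumers in this sketch only ever see the `FileParserSketch` port, so swapping the adapter behind an extension does not change calling code.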

Usage

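import { isSupportedFileType, parseBuffer, parseFile } from '@yarlisai/file-parsers'

// Parse a file from disk; the extension in the path selects the adapter.
if (isSupportedFileType('pdf')) {
  const fromDisk = await parseFile('/tmp/report.pdf')
}

// Parse in-memory bytes; the second argument names the format.
const buffer = Buffer.from('name,score\nAda,42\n') // sample CSV for illustration
if (isSupportedFileType('csv')) {
  const fromMemory = await parseBuffer(buffer, 'csv')
}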

The package's README ships a complete quickstart. mybotbox-platform itself is the reference consumer — apps/sat/lib/file-parsers/ is a thin shim re-exporting this package for the file-parse API route and the knowledge-base document processor.
