Document Format Guidelines

Native Files

Catalyst will attempt to index the text in a native file, so no text file is required. If this indexing is successful, search hit highlighting will be available on the preview of the native.

Compressed files must be extracted and processed prior to uploading. System and container files cannot be indexed or viewed through the site.
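
As a rough pre-upload check for the extraction rule above, the sketch below flags files in a delivery folder that still carry a common archive extension. The function name and the extension list are illustrative assumptions, not a list published by Catalyst; adjust them to the formats in your collection.

```python
from pathlib import Path

# Illustrative assumption: a few common archive extensions, not Catalyst's list.
ARCHIVE_EXTENSIONS = {".zip", ".rar", ".7z", ".tar", ".gz"}


def find_compressed_files(folder):
    """Return files under `folder` whose extension suggests a compressed archive."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        # Compare extensions case-insensitively so .ZIP and .zip are both caught.
        if path.is_file() and path.suffix.lower() in ARCHIVE_EXTENSIONS
    )
```

Any file this returns would need to be extracted and processed before the folder is uploaded. The check looks at file names only, so an archive with a renamed extension would not be caught.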

Metadata should be extracted from the native files and submitted with a corresponding load file.

OCR or Text Files

When delivering files that are searchable in their native format (.DOC, .XLS, etc.), extracted text files are not required because Catalyst will index the native file. If the files do not contain any extractable text (.TIF, .JPG, etc.), OCR text files are needed to make the documents searchable within Insight. Requirements for delivering text files are as follows:

TIFF Files

Catalyst Insight accepts single-page TIFF files. For TIFF files to be searchable, OCR text files must also be delivered. Single-page TIFF files must be loaded manually and cannot be loaded via the Automated system. These files must be accompanied by an additional load file, either an IPRO .LFP file or an Opticon .OPT file, to indicate the document breaks.

Multi-page text files must be delivered with the single-page TIFF files (single-page text cannot be loaded). Each text file should be named to match the first page of its document, such as ABC001.TIF.TXT. If a text file name does not consist of the full TIFF file name (including the .TIF extension) followed by .TXT, the file will be indexed but will not be visible on the site. Regardless of file naming convention, the text files must be delivered in the same folder as the image files. The associated text should not be included in the load file.
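
The naming rule above can be checked before delivery. The following is a minimal sketch, assuming you already have the path of the first-page TIFF for each document (for example, from the load file); the function names are hypothetical, and the comparison is an exact match on the file name, including case.

```python
from pathlib import Path


def expected_text_name(first_page_tiff):
    """Return the text file name expected for a document's first-page TIFF."""
    # Full image file name, extension included, plus ".TXT":
    # ABC001.TIF becomes ABC001.TIF.TXT.
    return Path(first_page_tiff).name + ".TXT"


def missing_text_files(first_page_tiffs):
    """List first-page TIFFs that have no matching text file in the same folder."""
    missing = []
    for tiff in map(Path, first_page_tiffs):
        # The text file must sit in the same folder as the image file.
        if not (tiff.parent / expected_text_name(tiff)).is_file():
            missing.append(tiff)
    return missing
```

A text file named ABC001.TXT would be reported as missing here, because the name drops the .TIF extension.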

PDF Files

All PDF files must be optimized for fast web viewing or “linearized.” There are three types of PDF files, with unique instructions for each:

Multi-Language Documents

Multi-language documents must also be in UTF-8 format.
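
One way to confirm the encoding before delivery is to try decoding each file as UTF-8. This is a minimal sketch, assuming the requirement applies to the delivered text files; the function names and the default `*.TXT` pattern are hypothetical choices for illustration.

```python
from pathlib import Path


def is_valid_utf8(data):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_files(folder, pattern="*.TXT"):
    """Return the names of files in `folder` that are not valid UTF-8."""
    return sorted(
        path.name
        for path in Path(folder).glob(pattern)
        if path.is_file() and not is_valid_utf8(path.read_bytes())
    )
```

Plain ASCII text passes this check, since ASCII is a subset of UTF-8. Files saved in a legacy code page with accented or non-Latin characters will typically fail and would need to be re-encoded.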

Delivery of Coding for Documents

When delivering document coding to be loaded to the site, there are specific formatting requirements for fields where the data will be mapped to radio buttons, checkboxes, drop-down lists, or multi-select fields. These requirements, which apply only to editable fields, are as follows: