Load File Specifications
This section outlines the requirements of the automated data upload program and the general MDB, ACCDB, or DAT load file specifications for Catalyst Insight. Catalyst’s ability to ingest load files depends heavily on the consistency and format of the data. If nonconforming data is submitted, Catalyst will not attempt to correct the load file. Instead, Catalyst will wait until a corrected load file is submitted or a specific request to correct the load file is received. All manual work will be billed at the standard rate.
Automated Requirements
Here are the basic requirements:
Load files must be in Microsoft Access database format (MDB or ACCDB) or DAT format.
Documents must be compressed in a ZIP or RAR format.
ZIP/RAR files must be smaller than 25 GB.
The MDB and ZIP/RAR files must have exactly the same file name, and file names are case sensitive (for example, vol001.mdb and vol001.rar).
Use a dash (-) or underscore (_) if a separator is needed. Do not use spaces or any other special characters in the file names of the load files. Do not use periods other than the one before the file extension (for example, use very_large_load.rar instead of very.large.load.rar).
The file extension for the MDB and ZIP/RAR files should be either all lowercase or all uppercase, not a mix of cases.
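The naming rules above can be checked before a delivery is sent. The following is a minimal sketch, not Catalyst's own validation: the function names are hypothetical, "25 GB" is read as decimal gigabytes, and DAT load files are assumed to follow the same naming rules as MDB/ACCDB files.

```python
import re

MAX_ARCHIVE_BYTES = 25 * 10**9           # assumption: "25 GB" read as decimal gigabytes
LOAD_EXTS = {"mdb", "accdb", "dat"}      # assumption: DAT follows the same naming rules
ARCHIVE_EXTS = {"zip", "rar"}
STEM_RE = re.compile(r"[A-Za-z0-9_-]+")  # letters, digits, dash, underscore only


def split_name(name):
    """Split a file name at its last period into (stem, extension)."""
    stem, dot, ext = name.rpartition(".")
    return (stem, ext) if dot else (name, "")


def check_pair(load_name, archive_name, archive_size):
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    load_stem, load_ext = split_name(load_name)
    arch_stem, arch_ext = split_name(archive_name)

    for name, stem, ext, allowed in (
        (load_name, load_stem, load_ext, LOAD_EXTS),
        (archive_name, arch_stem, arch_ext, ARCHIVE_EXTS),
    ):
        if ext.lower() not in allowed:
            problems.append(f"{name}: unsupported extension")
        if ext not in (ext.lower(), ext.upper()):
            problems.append(f"{name}: mixed-case extension")
        # Extra periods, spaces, and special characters all fail this match.
        if not STEM_RE.fullmatch(stem):
            problems.append(f"{name}: invalid characters in file name")

    if load_stem != arch_stem:
        problems.append("load file and archive names differ (case sensitive)")
    if archive_size >= MAX_ARCHIVE_BYTES:
        problems.append("archive is not smaller than 25 GB")
    return problems
```

For example, check_pair("vol001.mdb", "vol001.rar", 10**9) returns an empty list, while a pair named very.large.load.mdb and very.large.load.rar is reported for invalid characters.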
Beg/End Control (or Beg/End Bates) Numbers
Beginning and ending control numbers (or Bates values) should be less than 20 characters. If suffixes are required (e.g., -001, -002), the Begcontrol/Bates values should not exceed 16 characters, leaving room for four suffix characters (one separator, such as a dash, underscore, or period, followed by three digits).
Beginning and ending control numbers (or Bates values) should not contain any special characters, including spaces, underscores or dashes. This is because these characters can interfere with the ability to perform accurate range searches on the site.
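As an illustration, the length and character rules for control numbers could be checked as follows. This is a sketch under stated assumptions: the function name and the reserve_suffix flag are hypothetical, and "special characters" is taken to mean anything other than ASCII letters and digits.

```python
import re

# Assumption: only ASCII letters and digits are acceptable in a control number.
ALNUM_RE = re.compile(r"[A-Za-z0-9]+")


def check_control_number(value, reserve_suffix=False):
    """Return a list of problems for one control number (without its suffix)."""
    problems = []
    if reserve_suffix:
        # Leave room for a four-character suffix such as "-001".
        if len(value) > 16:
            problems.append("longer than 16 characters with suffixes in use")
    elif len(value) >= 20:
        problems.append("not shorter than 20 characters")
    if not ALNUM_RE.fullmatch(value):
        problems.append("contains characters other than letters and digits")
    return problems
```

Here check_control_number("ABC0000001") returns an empty list, while check_control_number("ABC-0000001") reports the dash.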
Parent/Child Relationships
Parent documents and their Child documents (attachments) should be linked using the begattach and endattach fields. The begcontrol of the Parent document should be in the begattach field, and the endcontrol of the last Child document should be in the endattach field. All documents within the same attachment range (from the Parent to the last Child) must have exactly the same begattach and endattach values.
Fields in Load Files
It is important that the same fields are provided in the load file from upload to upload. For the first delivery, Catalyst will create a mapping that relates the fields in the load file to the fields on the site. Whenever a load file is provided in a different format from previous deliveries, Catalyst must create a new mapping. If Catalyst does not receive instructions detailing how to handle the new fields, the upload process will be halted until that information is provided. Please provide only fields that exist on the site; if a field does not already exist on the site, the mapping will fail. We recommend having as few mappings as possible to promote consistency and reduce errors.
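Before sending a new delivery, the field list can be compared against the one used in the previous delivery. The sketch below is only an illustration: compare_fields is a hypothetical helper, and it assumes field names must match exactly, including case.

```python
def compare_fields(mapped_fields, new_fields):
    """Return (missing, unexpected) field names relative to the earlier delivery."""
    mapped, new = set(mapped_fields), set(new_fields)
    missing = sorted(mapped - new)      # fields in the earlier delivery but absent now
    unexpected = sorted(new - mapped)   # fields that would require a new mapping
    return missing, unexpected
```

If both lists come back empty, the new load file has the same fields as the earlier one.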
One Load File per Volume
When providing documents for upload to Insight, we require two files per upload: an MDB or ACCDB load file and a ZIP or RAR file containing the documents to be loaded. There should always be one load file provided per ZIP/RAR volume.
DAT file delivery works like MDB/ACCDB delivery, except that the DAT file uses delimiters around each data value and delimiters separating the columns:
A map must define which Insight field each DAT field loads into; the map also defines the delimiters. The first row must contain the column names (as is standard with DAT files).
We usually recommend the standard Concordance delimiters, but we can handle other standard delimiters as well, provided that the delimiter characters never appear in the metadata itself.
The DAT file should be UTF-8 encoded; if it is not, we will try to convert to UTF-8 before handling the file.
There must be path fields, just as with MDB/ACCDB files; the data in the path fields should be the same as in an MDB/ACCDB file.
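To make the DAT layout concrete, here is a minimal sketch of reading such a file. It assumes the standard Concordance delimiters (ASCII 20 between columns and the þ character, U+00FE, around each value); the function names are hypothetical, and the Latin-1 fallback for files that are not UTF-8 is an assumption, since the conversion Catalyst performs is not specified here.

```python
import csv
import io

FIELD_SEP = "\x14"  # assumption: Concordance column delimiter (ASCII 20)
QUOTE = "\u00fe"    # assumption: Concordance text qualifier (the thorn character)


def decode_dat(raw):
    """Decode DAT bytes as UTF-8, falling back to Latin-1 (an assumption)."""
    try:
        return raw.decode("utf-8-sig")  # also drops a leading byte order mark
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def read_dat(text):
    """Parse DAT text into a list of dicts keyed by the header row."""
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter=FIELD_SEP, quotechar=QUOTE
    )
    rows = list(reader)
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    return [dict(zip(header, record)) for record in records]
```

A file on disk could be loaded with read_dat(decode_dat(open(path, "rb").read())), where path is the location of the DAT file.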