BAM-PFA Art data mapping and migration

Background

Data will be migrated from the BAMCollection FileMaker Pro (v7) database currently in use at BAM/PFA for the art collection

Note: Export data directly from FileMaker in tab-delimited format as UTF8 in order to preserve data.  For example, DOB_Year and DOD_Year are declared in the database as datatype Date.  However, the UI allows users to enter "c. 1485" (see Urs Graf).  When query by JDBC or export CSV, this ends up as ".1485".  When export tab-delimited, we get "c. 1485".  We need to do more work to confirm that this is the appropriate way to get data from FileMaker to Talend.

Data mapping documents and other table and schema information are managed in Google Drive (access limited to team members).

ETL Process and Notes

While several other approaches were tested (e.g., JDBC queries directly against the FMPro database), the following process appears to be the best approach

  • Ensure you are viewing all records for the table or layout that you want to export.
  • Export table (all fields or only fields needed) from FileMaker in tab-delimited, UTF-8 character set. (Exclude fields (Image and TR_No_Sort in this case) and ensure field order is the same as what you have used before.)
  • Also export same fields into Excel in order to get the field names in row one.
  • Edit the tab-delimited file and paste in the row of field names from the Excel file.  You might need to enter a blank row after the column heading in order to ensure that the newline is recognized after the field names.
  • Use the resulting file as a Talend input.

Also, FileMaker allows the developer to designate a field as repeating.  The field is still a single field in the table, but repeating values are stored with a vertical-tab character as a separator that is interpreted in the UI.  More information at http://help.filemaker.com/app/answers/detail/a_id/949/.  We developed a procedure to parse this in Talend using character replacement, and we learned that it is possible to query these values using FileMaker's JDBC driver which has an extension to support querying values in the repeating fields.

Authorities

Persons and Organizations

  • Artist (will become person and/or organization vocabulary records; note need org vocabulary for Artists Collective probably) (Lam)
  • Artist Collective values will become their own organization vocabulary ('collective') (Chris).  Note: Because there are only 9 values in the data (11 in the final frozen FM Pro data), these are added by hand.
  • Engraver values will become values in Artist vocabulary (Lam)
  • Publisher (Asian Collection) might need to become a Production Organization, TBD

Storage Locations

  • Value list RepositoryNames will become Storage Location vocabulary
  • Storage Location terms are being mined directly from the data. (John)

Controlled lists

    • Collection (will become collection dynamic controlled list)

Cataloging

Main collection item

  • Collection Items (will become cataloging records including item-specific acquisition values) (Chris)
    • ETL notes:  I do need to check that the columns are in the right order (as expected by the header row) by importing into Excel.  !!NOTE: I'm excluding the Image field (which breaks the export) and the TR_No_SORT field (which I thought was breaking the output file).  Copy output to Data/collectionitems.tab

Merge groups

  • Artist Lineitems (will become repeating Production Person and possibly Production Organization on cataloging, likely in new custom field group with Qualifier). CollectionItem:Engravers would also become Production Person values with role Engraver. (Chris)
  • Collections Lineitems (will become repeating Collection values on cataloging) (Chris)
  • Measurements (will become repeating measurements information on cataloging) (Lam)
  • Collection Texts (will become repeating text-labels on cataloging)
  • CollectionItem/Period-Style (FileMaker repeating PeriodOrStyle field will reuse repeating Style field on cataloging; tie to new dynamic controlled list) (Chris)
  • CollectionItem/Subjects-Themes (FileMaker repeating subdescription field will become new repeating SubjectsThemes field, but need to decide how to parse data entered where BAMPFA created an implicit hierarchy) (Chris)
  • Titles: Main Title and Alternate Title (Chris)
  • Current value field group: Primary current value, source, date.  Current value old to pos=1. (Chris)

Procedures

Location Movement

  • Location: Will become CSpace Location-Movement records and LMI-Cataloging relationships

Acquisition

  • Several acquisition-related fields will become CSpace Acquisition records (John)
  • Acquisition-Cataloging relationships (Chris)
  • Item-Specific Acquisition overrides on collection items (Chris)

Media Handling

  • Parse from filenames.

Tables to skip

  • Collection Lineitems (digitization project?  do not migrate?)
  • Inventory (one-time inventory effort; do not migrate?)