UCBG-CollectionSpace data mapping, v2.0

With this iteration, we are targeting the data migration mappings for the migration of SAGE data into a CollectionSpace instance, based on the version 2.0 release of CollectionSpace. This is an interim release Changes from the previous mapping are in blue.

Person

Field mapping

SQL

View: cspace_person_2_0

Notes

  •  Temporarily excluding shortIdentifers 0, 60397 because they are not unique in the view. They are associated with multiple Orgs (which are going into the Groups field). Break into a repeating group and merge in separately.

Talend/ETL

bgimportperson1 0.2: Revising earlier talend job (based on 1.8) to change refname format, remove ampersand handling, and include collectionspace_core:uri element

Organization

Field mapping

SQL

View:

Notes

  • Skipping for now
  • Multiple records per org because many orgs have multiple contacts
  • Probably need to extend Org to have repeating PersonContacts information

Talend/ETL

Taxonomy

Field mapping

SQL

View: cspace_taxonomy

Notes

  • Not getting author shortIDs in view, just names. For now putting them in Taxon Notes. Adjust view or get them separately in SQL.
  • Do we need to adjust the format of scientific names?
  • Lots of other Sage data for Names to add

Talend/ETL

bgtaxonimport2 0.2

Accessions

Field mapping

SQL

View: xdb_accession_qv

Notes

  • Need dropdown values for form
  • Dead Flag Using expression (row1.dead_flag==0) ? "no" : "yes" but should be "alive" "dead"
  • Need to add dead_date, structured
  • Need to add accession date
  • Need hybrid flag
  • Need requested by
  • Is Source from a vocab? Need field (wait for Donor block being added for UCJEPS)
  • Add More Notes to Accession Notes
  • Geog location is often Earth so we know geography needs some work
  • For now, putting Source in Brief Description along with Material type
  • Put Accession Notes in Comments: Rename field
  • What is Plant Status?
  • What is Data Quality?
  • record_status drop down "1"
  • Do we need to fix the form of taxonomic names?

Talend/ETL

bgaccessionimport 0.1

Garden Locations

Field mapping

SQL

View: cspace_location

Notes

  • Type (storage location type) field expects a dynamic controlled list with URN-formatted values. For now just loading the raw data fields from the locationType field. Nothing will display.

Talend/ETL

bgimportgardenlocation 0.1

Plant Locations (Movement and Location procedure)

Field mapping

SQL

View: xdb_plant_location_qv

Notes

  • xdb_plant_location_qv.action_code map to Reason For Move, and rename? Right now it's a numeric code in Sage. What are decode values? Ignoring for now.

Talend/ETL

bgplantlocations 0.1

Relationships between Accessions (Cataloging) and Plant Locations (Movement)

Field mapping

SQL

Text file: plantlocids.txt, produced in Talend bgplantlocations job

Notes

  • Text file has CSID from Plant Location record and CSID from Accession record

Talend/ETL

bgplantlocations 0.1

  • Talend job creates two XML outputs (cataloging-to-movement, movement-to-cataloging)

Propagation

Field mapping

SQL

Control lists:

  • Pot Type: select pot_type from pot_type_code order by pot_type;
  • Pot Size: select pot_size from pot_size_code order by pot_size;
  • Treatment Type: select treatment_type_name from treatment_type order by treatment_type_name;

Notes

Prop Type:

  • lookup table: prop_type_code
  • propagation.prop_type = pro_type_code.prop_type
  • 21 values in prop_type_code

Reason for Prop: propagation.purpose

  • database has a rule that limits the values to: 'class use', 'research', 'garden collection',
    'distribution', 'conservation'
  • 5 distinct values in propagation.purpose

Extra Seeds: propagation.extra_seed_flag

  • boolean (0,1)

Activity Type:

  • lookup table: activity_type_code
  • activity_type_code.activity_type where growth_history.activity_type_id = activity_type_code.activity_type_id
  • 24 values in activity_type_code

Medium:

  • lookup table: medium
  • growth_history.medium_name = medium.medium_name
  • 1302 values in medium

Pot Type:

  • lookup table: pot_type
  • growth_history.pot_type = pot_type_code.pot_type
  • 35 values in pot_type_code

Pot Size:

  • lookup table: pot_size_code
  • pot_size_code.pot_size where growth_history.pot_size_id = pot_size_code.pot_size_id
  • 69 values in pot_size_code

Treatment Type:

  • lookup table: treatment_type
  • treatment_type.treatment_type_name where treatment.treatment_type_id = treatment_type.treatment_type_id
  • 91 values in treatment_type

Concentration: treatment.concentration

  • varchar(10) field
  • no rules or constraints on this field
  • 80 distinct values in treatment.concentration