Botanical propagation data analysis

Major Questions

Goals for data parsing:

  • If subsequent treatment record is a location only and matches the date of a preceding treatment or activity record, then merge the location into a new field.  The new UI has a field for Nursery Location.
  • In the new CSpace UI design, concentration from the SAGE treatment table is moving up to CSpace propagation concentration and only being used for hormone treatments.  However, there are propagations that have multiple treatments that have different concentration values and also different hormones.  Do we need to add hormone and concentration back into the CSpace Event (activity) record?  Or maybe the hormone remains the Event Type (which means we can't merge those treatment types).  Do we have concentration values for treatments that are not hormones (e.g., fungicide)?
  • The new model proposes merging Number Started from Propagation records and Propagule Count in Activity (growh_history) records.  It seems like we need a field at the Propagation level to hold the legacy Number Started information (because there are 2884 conflicting records), but data analysis might help make sense of this.
  • Can Treatment Types be merged by their Treatment Category so that we don't have so many Event Types in CSpace?  We might have to ask UCBG about this.
  • Can we derive the new top-level propagation value for the seeds/scar-strat dropdown?  Use UDF? Need data analysis.
  • Can we derive the new top-level propagation value for the cuttings/cutting type dropdown?  Use UDF? Need data analysis.
  • Can we derive the new top-level propagation value for the cuttings/hormone dropdown?  Use UDF? Need data analysis.
  • Can we derive the new top-level propagation value for the cuttings/concentration dropdown?  Use UDF? Need data analysis.

Also: Search for "ASKUCBG" in notes.

propType drop down - Susan wants to prune the drop down list to three values: seeds, cuttings, live plants, unsure what the mapping is for the rest of the values, may go to new fields we're adding (cutting type?).  FINDING: Several other values might map to seeds, cuttings, live plants.  Or we can concatenate into propagation comments.  SOme values might go into Cutting type.  Also, if we can make inactive terms not appear in drop down lists, then we are in good shape.  

** = legacy read-only field might be needed

** numberStarted in SAGE propagation table is moving down to CSpace activity propaguleCount (merged where possible). FINDING: There are 2884 situations where propagation.number_started differs from growth_history.propagule_count.  What do we do?  Presumably we take the growth_history.propagule_count value, but we then lose propagation.number_started.

propaguleCount in SAGE treatment table is moving up to CSpace activity propaguleCount (merged where possible)
order in SAGE treatment table is moving up to CSpace activity order (collated). FINDING: Ignore treatment.propagule_count.

type in SAGE treatment table is moving up to CSpace activity type - though values from both of these might move up to new fields 'scar/stratification' in propagation, 'hormone' in propagation, 'location in nursery' in activity, or 'fungicide' in activity. FINDING: Might merge treatment types by Treatment Category.

notes in SAGE treatment table is moving up to CSpace activity comments. Finding: We're OK since we are getting the union of SAGE activity records and SAGE treatment records.

germination/rooting date in SAGE activity table is moving up to CSpace propagation germination/rooting date - will have to do data analysis to make sure there is one per propagation.  FINDING: Always null so can ignore!

successRate in SAGE activity table is moving up to CSpace propagation successRate - will have to do data analysis to make sure there is one per propagation. FINDING: Only 3 propagations have different growth_history.percent_success values (416 for 92.1029; 746 for 81.0829; 2836 for 2002.0330)  If we can clean those up (set one value), then we'll be OK

treatment dates in SAGE treatment table are moving up to CSpace activity dates. Finding: We're OK since we are getting the union of SAGE activity records and SAGE treatment records..

** concentration in SAGE treatment table is potentially moving up to CSpace propagation concentration - will have to do data analysis to make sure there is one per propagation - or we can keep this as a hidden field in the activity group for historical data. FINDING: There are approximately 20 propagation records that have treatments with different concentration values.  Might need a legacy field if these can't be "cleaned up" or if it doesn't make sense to clean them up (examples: 567 for 90.0231; 1551 for 77.0717). Does this break the model? ASKUCBG.

OTHER MAJOR QUESTIONS:

- If treatment type is one of the locations (e.g., lath house), we want to move that to the Location in Nursery field. What are all the location treatment types?  Ideally, we would merge these with the matching activity-treatment record, but these might be hard to find.

- How do we derive the new top-level propagation value for the seeds/scar-strat dropdown?  Use UDF? Need data analysis.

- How do we derive the new top-level propagation value for the cuttings/cutting type dropdown?  Use UDF? Need data analysis.

- How do we derive the new top-level propagation value for the cuttings/hormone dropdown?  Use UDF? Need data analysis.

- How do we derive the new top-level propagation value for the cuttings/concentration dropdown?  Use UDF? Need data analysis.

Notes from analysis

spent some quality time this afternoon analyzing data in the propagation tables for the Botanical Garden.  We have a situation where they have requested a design that is significantly different from what they use now in SAGE.  Effectively they are going from a situation where three core entities support the information (propagation efforts can have multiple activities; activities can have multiple treatments) to one where they have two entities (propagation efforts can have multiple activities).  So the treatment data is getting moved around in different places.  This is certainly a simpler model for new data going into the system (the old one was too complex), but we needed to do some quick data analysis to see what this would mean for legacy data.  

The simplest approach (though still hard) is to combine records in activity and treatment together and map those fields to new CSpace fields.  But we knew of several cases where they wanted to do some more involved mapping to combine multiple legacy activity-treatment records into one CSpace activity record.  In my notes below I basically conclude that it is not feasible to clean up data like this, that in fact it does not buy much and is a very expensive operation.  

Lam and Amy, you should take a look at this and see if you concur or think we should look at anything else.  

Amy, I am certain we'll have to add some of those fields to hold the legacy treatment_type and such.  As discussed I think it might be good to make those read-only and clearly labeled ("SAGE Treatment" or something like that).  

John, Glen, and Ray, I'd definitely be interested in your thoughts here, though I'm not expecting you to spend serious time on this.

Thanks,
Chris
--------

SAGE schema (all tables have create and update info too)

propagation: 6738 recs
prop_id
accession_number
prop_date
early_prop_date
late_prop_date
prop_type
purpose
number_started
prop_comments
extra_seed_flag
gh_sequence

growth_history: 19929
prop_id
growth_id
activity_date
activity_begin_date
activity_end_date
activity_type_id
propagule_count
pot_type
pot_size_id
percent_success
growth_comments
medium_name
germination_date

treatment: 20451
prop_id
growth_id
treatment_type_id
concentration
begin_date
end_date
propagule_count
treatment_notes
display_sequence

This query gets the activity and treatment display names too:

NOTE: outside joins needed because there are activities that don't have treatments attached

select gh.prop_id, gh.growth_id, gh.activity_type_id, atc.activity_type,
 t.display_sequence treatment_seq, t.treatment_type_id, tt.treatment_type_name
from growth_history gh
join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id)
left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)
order by gh.prop_id, gh.growth_id, t.display_sequence

with data like:
25    1    4    started    1    16    benlate
25    1    4    started    2    25    lesan
25    1    4    started    3    15    prop house
25    1    4    started    4    1    bottom heat
25    2    14    results    (null)    (null)    (null)
25    3    2    changed pot    1    15    prop house
25    4    14    results    1    15    prop house
25    5    2    changed pot    1    14    holding area
25    6    3    planted out    (null)    (null)    (null)

showing 6 activities, the first of which has 4 treatments.  two activities have no joined treatment records.  Without hard work this would end up as 9 activities in CSpace for this activity.  And we would be adding the read-only legacy treatment type fields, etc.

How are activity_types distributed?

select atc.activity_type, count\(*)
from growth_history gh, activity_type_code atc
where gh.activity_type_id=atc.activity_type_id
group by activity_type
order by count\(*) desc

potted    4359
started    3620
changed pot    3256
results    3139
pretreat    2210
planted out    1120
changed location    1001
applied treatment    544
dead    358
discarded    145
changed medium    62
repotted    61
checked pot    21
divided    10
plant sale    7
moved    4
sow    3
cold strat    2
cutting taken    2
surplus    1
damped off    1
distributed    1
deaccessioned    1
returned to nursery    1

Treatment types, interesting, there is a treatment category field that identifies treatment types in several categories: null, disinfectant, environment, fertilizer, fungicide, hormone, none, pesticide, physical, scarification, stratification

How are treatment types distributed?

select tt.treatment_type_name, count\(*)
from treatment t, treatment_type tt
where t.treatment_type_id=tt.treatment_type_id
group by tt.treatment_type_name
order by order by count\(*) desc

lath house    4346
prop house    3119
intermittent mist    1677
cold stratification    1338
bottom heat    1309
prop house 1    1128
holding area    914
benlate    845
physan wash    813
lights    747
dip and grow    630
physan    590
IBA    543
soak in water    431
benlate & captan    261
IBA talc    250
benlate & lesan    211
warm stratification    162
research greenhouse    135
Hormex    97
sulphuric acid    95
water tray    92
prop house shade    90
smoke plus    79
subdue    58
shade house    51
boiling water on soil surface    50
scarify    40
GA3    32
lesan    31
Rootone    25
Truban    25
prop house 2 shade    24
hormone    18
benlate & physan    17
benlate & subdue    17
glass cover    16
physan soak    12
nursery    11
wardian case    11
lath house shade    10
Nurs    7
wound cuttings    7
boiling water    6
captan    5
fungicide    5
stratification    5
2-4-2    4
freeze    4
microwave oven    4
osmocote    4
pericarp removed    4
outside prop house    3
refrigerator    3
central prop house    2
fire    2
heat    2
liquid fertilizer    2
pericarp trimmed    2
physan dip    2
rear prop house    2
Serenade    2
bleach    1
bright location    1
cactus house    1
chipped    1
clorox soak    1
darkness    1
drenched in hot water    1
dusted with sulphur    1
force dormant    1
gravel mulch    1
green house #2    1
iron sulphate    1
metal ponds    1
NAA    1
pericarp ruptured    1
prop house 2    1
seed coat removed    1
shade    1
sterilized    1
sulphur    1
under mist bench    1
wash & clean    1
water & let go dry    1
water basin    1

There must be 4 unused treatments (91 in the table).  Many of the top ones are related to location (lath house, prop house)

OK, so how are counts of combined activity-treatment records distributed?

select atc.activity_type, tt.treatment_type_name, count\(*)
from growth_history gh
join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id)
left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)
group by atc.activity_type, tt.treatment_type_name
order by count\(*) desc

275 rows, but here they are. See Excel attachment too.

results    (null)    3089
potted    lath house    1822
potted    intermittent mist    1572
changed pot    lath house    1377
planted out    (null)    1116
started    cold stratification    1102
started    prop house    996
potted    prop house    858
changed pot    prop house    857
pretreat    physan wash    792
started    bottom heat    697
started    benlate    613
pretreat    dip and grow    613
started    lath house    567
pretreat    IBA    527
started    physan    526
started    lights    497
changed location    lath house    484
started    prop house 1    473
potted    holding area    447
pretreat    soak in water    428
changed pot    (null)    361
dead    (null)    355
potted    (null)    353
changed location    prop house    320
potted    bottom heat    316
changed pot    holding area    308
potted    prop house 1    255
changed pot    prop house 1    248
pretreat    IBA talc    247
changed location    bottom heat    226
applied treatment    cold stratification    215
discarded    (null)    145
started    benlate & lesan    133
changed location    prop house 1    130
started    warm stratification    128
potted    lights    125
started    holding area    119
pretreat    benlate    111
applied treatment    benlate & captan    108
pretreat    sulphuric acid    95
pretreat    Hormex    94
pretreat    smoke plus    77
changed pot    benlate & captan    68
started    prop house shade    64
started    research greenhouse    64
changed location    lights    58
changed pot    lights    57
started    water tray    56
started    intermittent mist    54
started    (null)    53
pretreat    physan    53
started    benlate & captan    53
potted    benlate    52
applied treatment    bottom heat    50
started    boiling water on soil surface    48
changed pot    research greenhouse    40
changed pot    shade house    38
applied treatment    benlate    37
applied treatment    prop house    37
pretreat    scarify    34
applied treatment    benlate & lesan    33
pretreat    GA3    31
potted    water tray    31
changed location    holding area    30
applied treatment    warm stratification    30
changed pot    benlate    29
applied treatment    subdue    29
pretreat    Rootone    25
changed medium    (null)    24
potted    benlate & captan    24
changed medium    lath house    24
potted    research greenhouse    24
results    lath house    22
potted    prop house shade    22
changed pot    benlate & lesan    21
started    lesan    19
results    prop house    19
changed location    (null)    18
started    Truban    18
pretreat    hormone    18
repotted    lath house    18
changed pot    intermittent mist    18
repotted    prop house    17
started    benlate & physan    17
started    prop house 2 shade    16
pretreat    (null)    15
started    glass cover    15
applied treatment    IBA    15
pretreat    intermittent mist    15
changed pot    subdue    14
changed pot    bottom heat    14
potted    benlate & subdue    13
potted    cold stratification    13
checked pot    (null)    12
potted    shade house    12
pretreat    benlate & lesan    12
repotted    (null)    11
repotted    prop house 1    11
applied treatment    lath house    11
applied treatment    physan wash    11
potted    subdue    10
changed pot    nursery    10
potted    benlate & lesan    10
applied treatment    dip and grow    9
started    physan wash    8
checked pot    lath house    8
applied treatment    lesan    8
changed medium    prop house    8
plant sale    (null)    7
pretreat    lath house    7
pretreat    physan soak    7
changed location    wardian case    7
applied treatment    (null)    6
started    dip and grow    6
potted    lath house shade    6
applied treatment    scarify    6
pretreat    benlate & captan    6
changed location    intermittent mist    6
potted    physan    5
changed pot    Nurs    5
pretreat    stratification    5
applied treatment    lights    5
repotted    intermittent mist    5
changed location    prop house 2 shade    5
changed location    research greenhouse    5
moved    (null)    4
divided    (null)    4
potted    Truban    4
repotted    subdue    4
started    physan soak    4
results    holding area    4
started    boiling water    4
pretreat    microwave oven    4
pretreat    wound cuttings    4
applied treatment    captan    4
applied treatment    osmocote    4
pretreat    cold stratification    4
applied treatment    intermittent mist    4
divided    prop house 1    3
changed pot    water tray    3
started    wound cuttings    3
applied treatment    physan    3
pretreat    pericarp removed    3
potted    warm stratification    3
changed medium    holding area    3
sow    (null)    2
cold strat    (null)    2
cutting taken    (null)    2
potted    lesan    2
pretreat    fire    2
pretreat    heat    2
started    freeze    2
dead    prop house    2
pretreat    lights    2
repotted    benlate    2
changed pot    2-4-2    2
divided    lath house    2
potted    physan wash    2
potted    dip and grow    2
potted    wardian case    2
pretreat    physan dip    2
pretreat    prop house    2
pretreat    bottom heat    2
results    prop house 1    2
started    refrigerator    2
started    wardian case    2
planted out    lath house    2
applied treatment    2-4-2    2
applied treatment    Hormex    2
started    lath house shade    2
pretreat    pericarp trimmed    2
repotted    benlate & captan    2
results    intermittent mist    2
applied treatment    IBA talc    2
applied treatment    Serenade    2
started    outside prop house    2
applied treatment    fungicide    2
changed location    water tray    2
changed medium    prop house 1    2
applied treatment    smoke plus    2
applied treatment    holding area    2
applied treatment    prop house 1    2
changed pot    prop house 2 shade    2
applied treatment    boiling water    2
applied treatment    soak in water    2
changed location    lath house shade    2
changed location    prop house shade    2
applied treatment    benlate & subdue    2
applied treatment    liquid fertilizer    2
applied treatment    central prop house    2
changed location    cold stratification    2
surplus    (null)    1
damped off    (null)    1
distributed    (null)    1
deaccessioned    (null)    1
potted    Nurs    1
results    IBA    1
pretreat    NAA    1
potted    Hormex    1
started    shade    1
divided    lights    1
pretreat    lesan    1
results    lights    1
started    captan    1
started    subdue    1
dead    lath house    1
potted    IBA talc    1
pretreat    Truban    1
pretreat    bleach    1
pretreat    freeze    1
results    benlate    1
started    sulphur    1
dead    bottom heat    1
potted    fungicide    1
pretreat    chipped    1
sow    prop house 1    1
changed pot    lesan    1
changed pot    Truban    1
changed pot    physan    1
potted    metal ponds    1
potted    water basin    1
pretreat    fungicide    1
repotted    fungicide    1
started    sterilized    1
potted    cactus house    1
potted    gravel mulch    1
results    bottom heat    1
started    clorox soak    1
potted    force dormant    1
applied treatment    GA3    1
changed location    Nurs    1
changed medium    lights    1
pretreat    wash & clean    1
repotted    holding area    1
started    soak in water    1
checked pot    prop house    1
dead    intermittent mist    1
planted out    prop house    1
potted    rear prop house    1
pretreat    iron sulphate    1
potted    under mist bench    1
results    benlate & lesan    1
started    bright location    1
started    rear prop house    1
applied treatment    Truban    1
applied treatment    freeze    1
changed location    nursery    1
started    benlate & subdue    1
potted    outside prop house    1
potted    water & let go dry    1
applied treatment    darkness    1
changed medium    bottom heat    1
changed pot    green house #2    1
potted    dusted with sulphur    1
pretreat    pericarp ruptured    1
pretreat    seed coat removed    1
results    prop house 2 shade    1
divided    research greenhouse    1
changed location    shade house    1
changed pot    benlate & subdue    1
changed pot    prop house shade    1
repotted    research greenhouse    1
applied treatment    glass cover    1
applied treatment    physan soak    1
changed location    prop house 2    1
changed location    refrigerator    1
started    drenched in hot water    1
returned to nursery    lath house    1
planted out    cold stratification    1
changed location    benlate & lesan    1
applied treatment    pericarp removed    1
applied treatment    prop house shade    1
pretreat    boiling water on soil surface    1
applied treatment    boiling water on soil surface    1

What we'd be really interested in finding out is if we can identify the situations where an activity has multiple treatment records, but the second is just a location.  Then we could fold those together.  But what if there are more than two?  Use the date?  What if there are only two but they have different dates?  Then maybe they should be separate CSpace records.  Ugh, too hard.  Maybe just find all combinations of activity and treatment.  Are there activities that don't have treatment records?  Yes!  Queries adjusted to do outside join to treatment table.

FYI, the number of records produced by getting all combinations of activity and treatment:

select count\(*)
from growth_history gh
join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id)
left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)

26027
growth_history has 19.9K and treatment has 20.5K so by loading all combinations we are not growing the number of records by very much beyond the ideal.  Maybe you could argue that this is 5500K extra records.

---

Next day:

Propagation type

SELECT
    prop_type,
   count(*)
FROM
    dbo.propagation
group by prop_type

seed    3630
cutting    1704
living plant    1104
division    111
bulbs    88
seedlings    33
corms    17
bulb    12
transplant    11
corm    6
rooted pieces    5
spores    5
seedling    3
bulblets    3
rooted cuttings    2
bulb scales    2
bare root    1
tuber    1

ASKUCBG: Susan wants to reduce prop type to three values.  Can we map these existing values to those three?  That is, if we can't figure out how to not display inactive terms?

Germination Date

Good news: Germination date is always null!

Success rate

successRate in SAGE activity table is moving up to CSpace propagation successRate - will have to do data analysis to make sure there is one per propagation

SELECT
    p.prop_id,
    accession_number,
    prop_date,
    prop_type,
    gh.growth_id,
    gh.activity_type_id,
    atc.activity_type,
    gh.percent_success
FROM
    dbo.propagation p
left outer join growth_history gh on (p.prop_id = gh.prop_id)
left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
order by prop_id, growth_id

When there is a non-null percent_success value it looks like the activity type is results.  However, success_value can still be null even if activity type is "results"
The question is: Can there be different activity.percent_success values for a single propagation record?

SELECT
    p.prop_id, accession_number,
    atc.activity_type,
    gh.percent_success
FROM
    dbo.propagation p
left outer join growth_history gh on (p.prop_id = gh.prop_id)
left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
where percent_success is not null
order by prop_id, growth_id

Only 3 propagations have different results in growth_history records.  

416    92.1029       results    5
416    92.1029       results    26

746    81.0829       results    71
746    81.0829       changed pot    99
746    81.0829       changed pot    99

2836    2002.0330     potted    5
2836    2002.0330     results    89

Concentration

** concentration in SAGE treatment table is potentially moving up to CSpace propagation concentration - will have to do data analysis to make sure there is one per propagation - or we can keep this as a hidden field in the activity group for historical data

SELECT
    p.prop_id, accession_number,
    atc.activity_type,
    t.display_sequence treatment_seq,
    tt.treatment_type_name,
    t.concentration
FROM
    dbo.propagation p
left outer join growth_history gh on (p.prop_id = gh.prop_id)
left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id)
left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)
where t.concentration is not null
order by prop_id, gh.growth_id, display_sequence

NOTE: Might use these queries to create merge fragments to merge with main propagation SQL.  Otherwise have to find a way to get the non-null values, etc., in the main query.

NOTE: Might need to have the treatment_type come along in this query

567    90.0231       pretreat    2    dip and grow    1:5
567    90.0231       pretreat    3    IBA    0.8

610    83.1172       pretreat    2    dip and grow    1:3
610    83.1172       potted    2    bottom heat    1:3
610    83.1172       potted    3    intermittent mist    1:3

670    93.1051       pretreat    1    dip and grow    1:5
670    93.1051       pretreat    2    IBA talc    4.5

1402    94.0310       pretreat    2    dip and grow    1:10
1402    94.0310       pretreat    3    IBA talc    4.5

1478    91.1474       pretreat    1    dip and grow    1:5
1478    91.1474       potted    1    dip and grow    1:5

1551    77.0717       pretreat    2    dip and grow    1:5
1551    77.0717       pretreat    3    IBA    4.5

So I'm only about a 3rd through the 1375 records and there are 6 propagations that have different concentration values (and different treatment_types too)

Are the concentrations unique within an activity?

Here's an interesting query:
SELECT
    p.prop_id,
    tt.treatment_type_name,
    t.concentration, count(*)
FROM
    dbo.propagation p
left outer join growth_history gh on (p.prop_id = gh.prop_id)
left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id)
left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)
group by p.prop_id,
    tt.treatment_type_name,
    t.concentration
having count(*) > 1
order by prop_id

---
10/9

Combined activity-treatment records

select gh.prop_id,
gh.growth_id,
gh.activity_date,
gh.activity_begin_date,
gh.activity_end_date,
gh.activity_type_id,
atc.activity_type,
gh.propagule_count,
gh.pot_type,
gh.pot_size_id,
gh.percent_success,
gh.growth_comments,
gh.medium_name,
gh.germination_date,
gh.date_entered,
gh.entered_staff_id,
gh.last_change_staff_id,
gh.last_change_date,
t.display_sequence,
t.treatment_type_id,
tt.treatment_type_name,
t.concentration,
t.begin_date,
t.end_date,
t.propagule_count,
t.treatment_notes,
t.date_entered,
t.entered_staff_id,
t.last_change_date,
t.last_change_staff_id
from growth_history gh
join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id)
left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)
order by gh.prop_id, gh.growth_id, t.display_sequence

ASKUCBG: Activity date in CSpace is calendar, but there are fuzzy dates in SAGE.  1058 dates of 18.5K are fuzzy.

Propagule count GH and T

Reconcile propagule count in GH and T:

SELECT
    p.prop_id, accession_number,
    atc.activity_type, gh.propagule_count,
    t.display_sequence treatment_seq,
    tt.treatment_type_name,
    t.propagule_count
FROM
    dbo.propagation p
left outer join growth_history gh on (p.prop_id = gh.prop_id)
left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id)
left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)
where gh.propagule_count is not null or t.propagule_count is not null
order by prop_id, gh.growth_id, display_sequence

> 5000 records returned, mostly GH by the looks of it.

where gh.propagule_count is not null or t.propagule_count is not null: 16216
where gh.propagule_count is not null and t.propagule_count is not null: 2
where gh.propagule_count is not null: 16216
where 2.propagule_count is not null: 2

For the 2 where t.propagule_count is not null, the values are identical in GH.  Therefore use GH only.

Reconcile propagation.number_started with gh.propagule_count

SELECT
    p.prop_id,
    accession_number,
    prop_date,
    prop_type,
    number_started,
    gh.activity_type_id,
    atc.activity_type,
    gh.propagule_count
FROM
    dbo.propagation p
left outer join growth_history gh on (p.prop_id = gh.prop_id)
left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
order by prop_id, growth_id

where p.number_started is not null or gh.propagule_count is not null: 13556
where p.number_started is not null and gh.propagule_count is not null: 10136
where p.number_started is not null: 11011
where  gh.propagule_count is not null: 12681
where p.number_started = gh.propagule_count: 7252
where p.number_started <> gh.propagule_count: 2884

ASKUCBG: What to do about 2284 where the values are not equal? e.g.,

22    85.1541       November 20 1985    seed    15    2    changed pot    1
22    85.1541       November 20 1985    seed    15    6    changed location    1
22    85.1541       November 20 1985    seed    15    9    dead    1

6771    96.1050       Oct 19, 2010    cutting    20    13    potted    11
6771    96.1050       Oct 19, 2010    cutting    20    13    potted    1
6773    96.1051       Oct 26, 2010    cutting    20    13    potted    16
6773    96.1051       Oct 26, 2010    cutting    20    13    potted    1

Do we need to keep number started in top level propagation record?

Instead? Get the union of activity records and treatment records?  Try this out SQL.

March 5, 2013

Here's the SQL we are currently using to bring together the activity and treatment records:

-- activity records with good dates
select
p.accession_number,
p.prop_type,
'activity' record_type,
gh.prop_id prop_id,
gh.growth_id growth_id,
0 treatment_id,
atc.activity_type type_name,
null treatment_category,
gh.activity_date event_date,
convert(char(12), dateadd(dd,(gh.activity_begin_date - 2415021),
  convert(datetime, '1900-01-01', 103)), 101) begin_date,
convert(char(12), dateadd(dd,(gh.activity_end_date - 2415021),
  convert(datetime, '1900-01-01', 103)), 101) end_date,
gh.propagule_count propagule_count,
gh.growth_comments comments,
null concentration,
gh.pot_type pot_type,
gh.pot_size_id pot_size_id,
gh.percent_success percent_success,
gh.medium_name medium_name,
gh.germination_date germination_date,
gh.date_entered date_entered,
gh.entered_staff_id entered_staff_id,
gh.last_change_date last_change_date,
gh.last_change_staff_id last_change_staff_id
from growth_history gh, activity_type_code atc, propagation p
where gh.activity_type_id=atc.activity_type_id
and gh.prop_id=p.prop_id
union
-- treatment records look OK too:
select
p.accession_number,
p.prop_type,
'treatment' record_type,
t.prop_id prop_id,
t.growth_id growth_id,
t.display_sequence treatment_id,
tt.treatment_type_name type_name,
tt.treatment_category treatment_category,
null event_date,
convert(char(12), t.begin_date, 101) begin_date,
convert(char(12), t.end_date, 101) end_date,
t.propagule_count propagule_count,
t.treatment_notes comments,
t.concentration concentration,
null pot_type,
null pot_size_id,
null percent_success,
null medium_name,
null germination_date,
t.date_entered date_entered,
t.entered_staff_id entered_staff_id,
t.last_change_date last_change_date,
t.last_change_staff_id last_change_staff_id
from treatment t, treatment_type tt, propagation p
where t.treatment_type_id=tt.treatment_type_id
and t.prop_id=p.prop_id
order by prop_id, growth_id, treatment_id