Botanical propagation data analysis
Major Questions
Goals for data parsing:
- If subsequent treatment record is a location only and matches the date of a preceding treatment or activity record, then merge the location into a new field. The new UI has a field for Nursery Location.
- In the new CSpace UI design, concentration from the SAGE treatment table is moving up to CSpace propagation concentration and only being used for hormone treatments. However, there are propagations that have multiple treatments that have different concentration values and also different hormones. Do we need to add hormone and concentration back into the CSpace Event (activity) record? Or maybe the hormone remains the Event Type (which means we can't merge those treatment types). Do we have concentration values for treatments that are not hormones (e.g., fungicide)?
- The new model proposes merging Number Started from Propagation records and Propagule Count in Activity (growh_history) records. It seems like we need a field at the Propagation level to hold the legacy Number Started information (because there are 2884 conflicting records), but data analysis might help make sense of this.
- Can Treatment Types be merged by their Treatment Category so that we don't have so many Event Types in CSpace? We might have to ask UCBG about this.
- Can we derive the new top-level propagation value for the seeds/scar-strat dropdown? Use UDF? Need data analysis.
- Can we derive the new top-level propagation value for the cuttings/cutting type dropdown? Use UDF? Need data analysis.
- Can we derive the new top-level propagation value for the cuttings/hormone dropdown? Use UDF? Need data analysis.
- Can we derive the new top-level propagation value for the cuttings/concentration dropdown? Use UDF? Need data analysis.
Also: Search for "ASKUCBG" in notes.
propType drop down - Susan wants to prune the drop down list to three values: seeds, cuttings, live plants, unsure what the mapping is for the rest of the values, may go to new fields we're adding (cutting type?). FINDING: Several other values might map to seeds, cuttings, live plants. Or we can concatenate into propagation comments. SOme values might go into Cutting type. Also, if we can make inactive terms not appear in drop down lists, then we are in good shape.
** = legacy read-only field might be needed
** numberStarted in SAGE propagation table is moving down to CSpace activity propaguleCount (merged where possible). FINDING: There are 2884 situations where propagation.number_started differs from growth_history.propagule_count. What do we do? Presumably we take the growth_history.propagule_count value, but we then lose propagation.number_started.
propaguleCount in SAGE treatment table is moving up to CSpace activity propaguleCount (merged where possible)
order in SAGE treatment table is moving up to CSpace activity order (collated). FINDING: Ignore treatment.propagule_count.
type in SAGE treatment table is moving up to CSpace activity type - though values from both of these might move up to new fields 'scar/stratification' in propagation, 'hormone' in propagation, 'location in nursery' in activity, or 'fungicide' in activity. FINDING: Might merge treatment types by Treatment Category.
notes in SAGE treatment table is moving up to CSpace activity comments. Finding: We're OK since we are getting the union of SAGE activity records and SAGE treatment records.
germination/rooting date in SAGE activity table is moving up to CSpace propagation germination/rooting date - will have to do data analysis to make sure there is one per propagation. FINDING: Always null so can ignore!
successRate in SAGE activity table is moving up to CSpace propagation successRate - will have to do data analysis to make sure there is one per propagation. FINDING: Only 3 propagations have different growth_history.percent_success values (416 for 92.1029; 746 for 81.0829; 2836 for 2002.0330) If we can clean those up (set one value), then we'll be OK
treatment dates in SAGE treatment table are moving up to CSpace activity dates. Finding: We're OK since we are getting the union of SAGE activity records and SAGE treatment records..
** concentration in SAGE treatment table is potentially moving up to CSpace propagation concentration - will have to do data analysis to make sure there is one per propagation - or we can keep this as a hidden field in the activity group for historical data. FINDING: There are approximately 20 propagation records that have treatments with different concentration values. Might need a legacy field if these can't be "cleaned up" or if it doesn't make sense to clean them up (examples: 567 for 90.0231; 1551 for 77.0717). Does this break the model? ASKUCBG.
OTHER MAJOR QUESTIONS:
- If treatment type is one of the locations (e.g., lath house), we want to move that to the Location in Nursery field. What are all the location treatment types? Ideally, we would merge these with the matching activity-treatment record, but these might be hard to find.
- How do we derive the new top-level propagation value for the seeds/scar-strat dropdown? Use UDF? Need data analysis.
- How do we derive the new top-level propagation value for the cuttings/cutting type dropdown? Use UDF? Need data analysis.
- How do we derive the new top-level propagation value for the cuttings/hormone dropdown? Use UDF? Need data analysis.
- How do we derive the new top-level propagation value for the cuttings/concentration dropdown? Use UDF? Need data analysis.
Notes from analysis
spent some quality time this afternoon analyzing data in the propagation tables for the Botanical Garden. We have a situation where they have requested a design that is significantly different from what they use now in SAGE. Effectively they are going from a situation where three core entities support the information (propagation efforts can have multiple activities; activities can have multiple treatments) to one where they have two entities (propagation efforts can have multiple activities). So the treatment data is getting moved around in different places. This is certainly a simpler model for new data going into the system (the old one was too complex), but we needed to do some quick data analysis to see what this would mean for legacy data.
The simplest approach (though still hard) is to combine records in activity and treatment together and map those fields to new CSpace fields. But we knew of several cases where they wanted to do some more involved mapping to combine multiple legacy activity-treatment records into one CSpace activity record. In my notes below I basically conclude that it is not feasible to clean up data like this, that in fact it does not buy much and is a very expensive operation.
Lam and Amy, you should take a look at this and see if you concur or think we should look at anything else.
Amy, I am certain we'll have to add some of those fields to hold the legacy treatment_type and such. As discussed I think it might be good to make those read-only and clearly labeled ("SAGE Treatment" or something like that).
John, Glen, and Ray, I'd definitely be interested in your thoughts here, though I'm not expecting you to spend serious time on this.
Thanks,
Chris
--------
SAGE schema (all tables have create and update info too)
propagation: 6738 recs
prop_id
accession_number
prop_date
early_prop_date
late_prop_date
prop_type
purpose
number_started
prop_comments
extra_seed_flag
gh_sequence
growth_history: 19929
prop_id
growth_id
activity_date
activity_begin_date
activity_end_date
activity_type_id
propagule_count
pot_type
pot_size_id
percent_success
growth_comments
medium_name
germination_date
treatment: 20451
prop_id
growth_id
treatment_type_id
concentration
begin_date
end_date
propagule_count
treatment_notes
display_sequence
This query gets the activity and treatment display names too:
NOTE: outside joins needed because there are activities that don't have treatments attached
select gh.prop_id, gh.growth_id, gh.activity_type_id, atc.activity_type, t.display_sequence treatment_seq, t.treatment_type_id, tt.treatment_type_name from growth_history gh join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id) order by gh.prop_id, gh.growth_id, t.display_sequence
with data like:
25 1 4 started 1 16 benlate
25 1 4 started 2 25 lesan
25 1 4 started 3 15 prop house
25 1 4 started 4 1 bottom heat
25 2 14 results (null) (null) (null)
25 3 2 changed pot 1 15 prop house
25 4 14 results 1 15 prop house
25 5 2 changed pot 1 14 holding area
25 6 3 planted out (null) (null) (null)
showing 6 activities, the first of which has 4 treatments. two activities have no joined treatment records. Without hard work this would end up as 9 activities in CSpace for this activity. And we would be adding the read-only legacy treatment type fields, etc.
How are activity_types distributed?
select atc.activity_type, count\(*) from growth_history gh, activity_type_code atc where gh.activity_type_id=atc.activity_type_id group by activity_type order by count\(*) desc
potted 4359
started 3620
changed pot 3256
results 3139
pretreat 2210
planted out 1120
changed location 1001
applied treatment 544
dead 358
discarded 145
changed medium 62
repotted 61
checked pot 21
divided 10
plant sale 7
moved 4
sow 3
cold strat 2
cutting taken 2
surplus 1
damped off 1
distributed 1
deaccessioned 1
returned to nursery 1
Treatment types, interesting, there is a treatment category field that identifies treatment types in several categories: null, disinfectant, environment, fertilizer, fungicide, hormone, none, pesticide, physical, scarification, stratification
How are treatment types distributed?
select tt.treatment_type_name, count\(*) from treatment t, treatment_type tt where t.treatment_type_id=tt.treatment_type_id group by tt.treatment_type_name order by order by count\(*) desc
lath house 4346
prop house 3119
intermittent mist 1677
cold stratification 1338
bottom heat 1309
prop house 1 1128
holding area 914
benlate 845
physan wash 813
lights 747
dip and grow 630
physan 590
IBA 543
soak in water 431
benlate & captan 261
IBA talc 250
benlate & lesan 211
warm stratification 162
research greenhouse 135
Hormex 97
sulphuric acid 95
water tray 92
prop house shade 90
smoke plus 79
subdue 58
shade house 51
boiling water on soil surface 50
scarify 40
GA3 32
lesan 31
Rootone 25
Truban 25
prop house 2 shade 24
hormone 18
benlate & physan 17
benlate & subdue 17
glass cover 16
physan soak 12
nursery 11
wardian case 11
lath house shade 10
Nurs 7
wound cuttings 7
boiling water 6
captan 5
fungicide 5
stratification 5
2-4-2 4
freeze 4
microwave oven 4
osmocote 4
pericarp removed 4
outside prop house 3
refrigerator 3
central prop house 2
fire 2
heat 2
liquid fertilizer 2
pericarp trimmed 2
physan dip 2
rear prop house 2
Serenade 2
bleach 1
bright location 1
cactus house 1
chipped 1
clorox soak 1
darkness 1
drenched in hot water 1
dusted with sulphur 1
force dormant 1
gravel mulch 1
green house #2 1
iron sulphate 1
metal ponds 1
NAA 1
pericarp ruptured 1
prop house 2 1
seed coat removed 1
shade 1
sterilized 1
sulphur 1
under mist bench 1
wash & clean 1
water & let go dry 1
water basin 1
There must be 4 unused treatments (91 in the table). Many of the top ones are related to location (lath house, prop house)
OK, so how are counts of combined activity-treatment records distributed?
select atc.activity_type, tt.treatment_type_name, count\(*) from growth_history gh join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id) group by atc.activity_type, tt.treatment_type_name order by count\(*) desc
275 rows, but here they are. See Excel attachment too.
results (null) 3089
potted lath house 1822
potted intermittent mist 1572
changed pot lath house 1377
planted out (null) 1116
started cold stratification 1102
started prop house 996
potted prop house 858
changed pot prop house 857
pretreat physan wash 792
started bottom heat 697
started benlate 613
pretreat dip and grow 613
started lath house 567
pretreat IBA 527
started physan 526
started lights 497
changed location lath house 484
started prop house 1 473
potted holding area 447
pretreat soak in water 428
changed pot (null) 361
dead (null) 355
potted (null) 353
changed location prop house 320
potted bottom heat 316
changed pot holding area 308
potted prop house 1 255
changed pot prop house 1 248
pretreat IBA talc 247
changed location bottom heat 226
applied treatment cold stratification 215
discarded (null) 145
started benlate & lesan 133
changed location prop house 1 130
started warm stratification 128
potted lights 125
started holding area 119
pretreat benlate 111
applied treatment benlate & captan 108
pretreat sulphuric acid 95
pretreat Hormex 94
pretreat smoke plus 77
changed pot benlate & captan 68
started prop house shade 64
started research greenhouse 64
changed location lights 58
changed pot lights 57
started water tray 56
started intermittent mist 54
started (null) 53
pretreat physan 53
started benlate & captan 53
potted benlate 52
applied treatment bottom heat 50
started boiling water on soil surface 48
changed pot research greenhouse 40
changed pot shade house 38
applied treatment benlate 37
applied treatment prop house 37
pretreat scarify 34
applied treatment benlate & lesan 33
pretreat GA3 31
potted water tray 31
changed location holding area 30
applied treatment warm stratification 30
changed pot benlate 29
applied treatment subdue 29
pretreat Rootone 25
changed medium (null) 24
potted benlate & captan 24
changed medium lath house 24
potted research greenhouse 24
results lath house 22
potted prop house shade 22
changed pot benlate & lesan 21
started lesan 19
results prop house 19
changed location (null) 18
started Truban 18
pretreat hormone 18
repotted lath house 18
changed pot intermittent mist 18
repotted prop house 17
started benlate & physan 17
started prop house 2 shade 16
pretreat (null) 15
started glass cover 15
applied treatment IBA 15
pretreat intermittent mist 15
changed pot subdue 14
changed pot bottom heat 14
potted benlate & subdue 13
potted cold stratification 13
checked pot (null) 12
potted shade house 12
pretreat benlate & lesan 12
repotted (null) 11
repotted prop house 1 11
applied treatment lath house 11
applied treatment physan wash 11
potted subdue 10
changed pot nursery 10
potted benlate & lesan 10
applied treatment dip and grow 9
started physan wash 8
checked pot lath house 8
applied treatment lesan 8
changed medium prop house 8
plant sale (null) 7
pretreat lath house 7
pretreat physan soak 7
changed location wardian case 7
applied treatment (null) 6
started dip and grow 6
potted lath house shade 6
applied treatment scarify 6
pretreat benlate & captan 6
changed location intermittent mist 6
potted physan 5
changed pot Nurs 5
pretreat stratification 5
applied treatment lights 5
repotted intermittent mist 5
changed location prop house 2 shade 5
changed location research greenhouse 5
moved (null) 4
divided (null) 4
potted Truban 4
repotted subdue 4
started physan soak 4
results holding area 4
started boiling water 4
pretreat microwave oven 4
pretreat wound cuttings 4
applied treatment captan 4
applied treatment osmocote 4
pretreat cold stratification 4
applied treatment intermittent mist 4
divided prop house 1 3
changed pot water tray 3
started wound cuttings 3
applied treatment physan 3
pretreat pericarp removed 3
potted warm stratification 3
changed medium holding area 3
sow (null) 2
cold strat (null) 2
cutting taken (null) 2
potted lesan 2
pretreat fire 2
pretreat heat 2
started freeze 2
dead prop house 2
pretreat lights 2
repotted benlate 2
changed pot 2-4-2 2
divided lath house 2
potted physan wash 2
potted dip and grow 2
potted wardian case 2
pretreat physan dip 2
pretreat prop house 2
pretreat bottom heat 2
results prop house 1 2
started refrigerator 2
started wardian case 2
planted out lath house 2
applied treatment 2-4-2 2
applied treatment Hormex 2
started lath house shade 2
pretreat pericarp trimmed 2
repotted benlate & captan 2
results intermittent mist 2
applied treatment IBA talc 2
applied treatment Serenade 2
started outside prop house 2
applied treatment fungicide 2
changed location water tray 2
changed medium prop house 1 2
applied treatment smoke plus 2
applied treatment holding area 2
applied treatment prop house 1 2
changed pot prop house 2 shade 2
applied treatment boiling water 2
applied treatment soak in water 2
changed location lath house shade 2
changed location prop house shade 2
applied treatment benlate & subdue 2
applied treatment liquid fertilizer 2
applied treatment central prop house 2
changed location cold stratification 2
surplus (null) 1
damped off (null) 1
distributed (null) 1
deaccessioned (null) 1
potted Nurs 1
results IBA 1
pretreat NAA 1
potted Hormex 1
started shade 1
divided lights 1
pretreat lesan 1
results lights 1
started captan 1
started subdue 1
dead lath house 1
potted IBA talc 1
pretreat Truban 1
pretreat bleach 1
pretreat freeze 1
results benlate 1
started sulphur 1
dead bottom heat 1
potted fungicide 1
pretreat chipped 1
sow prop house 1 1
changed pot lesan 1
changed pot Truban 1
changed pot physan 1
potted metal ponds 1
potted water basin 1
pretreat fungicide 1
repotted fungicide 1
started sterilized 1
potted cactus house 1
potted gravel mulch 1
results bottom heat 1
started clorox soak 1
potted force dormant 1
applied treatment GA3 1
changed location Nurs 1
changed medium lights 1
pretreat wash & clean 1
repotted holding area 1
started soak in water 1
checked pot prop house 1
dead intermittent mist 1
planted out prop house 1
potted rear prop house 1
pretreat iron sulphate 1
potted under mist bench 1
results benlate & lesan 1
started bright location 1
started rear prop house 1
applied treatment Truban 1
applied treatment freeze 1
changed location nursery 1
started benlate & subdue 1
potted outside prop house 1
potted water & let go dry 1
applied treatment darkness 1
changed medium bottom heat 1
changed pot green house #2 1
potted dusted with sulphur 1
pretreat pericarp ruptured 1
pretreat seed coat removed 1
results prop house 2 shade 1
divided research greenhouse 1
changed location shade house 1
changed pot benlate & subdue 1
changed pot prop house shade 1
repotted research greenhouse 1
applied treatment glass cover 1
applied treatment physan soak 1
changed location prop house 2 1
changed location refrigerator 1
started drenched in hot water 1
returned to nursery lath house 1
planted out cold stratification 1
changed location benlate & lesan 1
applied treatment pericarp removed 1
applied treatment prop house shade 1
pretreat boiling water on soil surface 1
applied treatment boiling water on soil surface 1
What we'd be really interested in finding out is if we can identify the situations where an activity has multiple treatment records, but the second is just a location. Then we could fold those together. But what if there are more than two? Use the date? What if there are only two but they have different dates? Then maybe they should be separate CSpace records. Ugh, too hard. Maybe just find all combinations of activity and treatment. Are there activities that don't have treatment records? Yes! Queries adjusted to do outside join to treatment table.
FYI, the number of records produced by getting all combinations of activity and treatment:
select count\(*) from growth_history gh join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)
26027
growth_history has 19.9K and treatment has 20.5K so by loading all combinations we are not growing the number of records by very much beyond the ideal. Maybe you could argue that this is 5500K extra records.
---
Next day:
Propagation type
SELECT prop_type, count(*) FROM dbo.propagation group by prop_type
seed 3630
cutting 1704
living plant 1104
division 111
bulbs 88
seedlings 33
corms 17
bulb 12
transplant 11
corm 6
rooted pieces 5
spores 5
seedling 3
bulblets 3
rooted cuttings 2
bulb scales 2
bare root 1
tuber 1
ASKUCBG: Susan wants to reduce prop type to three values. Can we map these existing values to those three? That is, if we can't figure out how to not display inactive terms?
Germination Date
Good news: Germination date is always null!
Success rate
successRate in SAGE activity table is moving up to CSpace propagation successRate - will have to do data analysis to make sure there is one per propagation
SELECT p.prop_id, accession_number, prop_date, prop_type, gh.growth_id, gh.activity_type_id, atc.activity_type, gh.percent_success FROM dbo.propagation p left outer join growth_history gh on (p.prop_id = gh.prop_id) left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) order by prop_id, growth_id
When there is a non-null percent_success value it looks like the activity type is results. However, success_value can still be null even if activity type is "results"
The question is: Can there be different activity.percent_success values for a single propagation record?
SELECT p.prop_id, accession_number, atc.activity_type, gh.percent_success FROM dbo.propagation p left outer join growth_history gh on (p.prop_id = gh.prop_id) left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) where percent_success is not null order by prop_id, growth_id
Only 3 propagations have different results in growth_history records.
416 92.1029 results 5
416 92.1029 results 26
746 81.0829 results 71
746 81.0829 changed pot 99
746 81.0829 changed pot 99
2836 2002.0330 potted 5
2836 2002.0330 results 89
Concentration
** concentration in SAGE treatment table is potentially moving up to CSpace propagation concentration - will have to do data analysis to make sure there is one per propagation - or we can keep this as a hidden field in the activity group for historical data
SELECT p.prop_id, accession_number, atc.activity_type, t.display_sequence treatment_seq, tt.treatment_type_name, t.concentration FROM dbo.propagation p left outer join growth_history gh on (p.prop_id = gh.prop_id) left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id) where t.concentration is not null order by prop_id, gh.growth_id, display_sequence
NOTE: Might use these queries to create merge fragments to merge with main propagation SQL. Otherwise have to find a way to get the non-null values, etc., in the main query.
NOTE: Might need to have the treatment_type come along in this query
567 90.0231 pretreat 2 dip and grow 1:5
567 90.0231 pretreat 3 IBA 0.8
610 83.1172 pretreat 2 dip and grow 1:3
610 83.1172 potted 2 bottom heat 1:3
610 83.1172 potted 3 intermittent mist 1:3
670 93.1051 pretreat 1 dip and grow 1:5
670 93.1051 pretreat 2 IBA talc 4.5
1402 94.0310 pretreat 2 dip and grow 1:10
1402 94.0310 pretreat 3 IBA talc 4.5
1478 91.1474 pretreat 1 dip and grow 1:5
1478 91.1474 potted 1 dip and grow 1:5
1551 77.0717 pretreat 2 dip and grow 1:5
1551 77.0717 pretreat 3 IBA 4.5
So I'm only about a 3rd through the 1375 records and there are 6 propagations that have different concentration values (and different treatment_types too)
Are the concentrations unique within an activity?
Here's an interesting query:
SELECT
p.prop_id,
tt.treatment_type_name,
t.concentration, count(*)
FROM
dbo.propagation p
left outer join growth_history gh on (p.prop_id = gh.prop_id)
left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id)
left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id)
left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id)
group by p.prop_id,
tt.treatment_type_name,
t.concentration
having count(*) > 1
order by prop_id
---
10/9
Combined activity-treatment records
select gh.prop_id, gh.growth_id, gh.activity_date, gh.activity_begin_date, gh.activity_end_date, gh.activity_type_id, atc.activity_type, gh.propagule_count, gh.pot_type, gh.pot_size_id, gh.percent_success, gh.growth_comments, gh.medium_name, gh.germination_date, gh.date_entered, gh.entered_staff_id, gh.last_change_staff_id, gh.last_change_date, t.display_sequence, t.treatment_type_id, tt.treatment_type_name, t.concentration, t.begin_date, t.end_date, t.propagule_count, t.treatment_notes, t.date_entered, t.entered_staff_id, t.last_change_date, t.last_change_staff_id from growth_history gh join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id) order by gh.prop_id, gh.growth_id, t.display_sequence
ASKUCBG: Activity date in CSpace is calendar, but there are fuzzy dates in SAGE. 1058 dates of 18.5K are fuzzy.
Propagule count GH and T
Reconcile propagule count in GH and T:
SELECT p.prop_id, accession_number, atc.activity_type, gh.propagule_count, t.display_sequence treatment_seq, tt.treatment_type_name, t.propagule_count FROM dbo.propagation p left outer join growth_history gh on (p.prop_id = gh.prop_id) left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) left outer join treatment t on (gh.prop_id=t.prop_id and gh.growth_id=t.growth_id) left outer join treatment_type tt on (t.treatment_type_id=tt.treatment_type_id) where gh.propagule_count is not null or t.propagule_count is not null order by prop_id, gh.growth_id, display_sequence
> 5000 records returned, mostly GH by the looks of it.
where gh.propagule_count is not null or t.propagule_count is not null: 16216
where gh.propagule_count is not null and t.propagule_count is not null: 2
where gh.propagule_count is not null: 16216
where 2.propagule_count is not null: 2
For the 2 where t.propagule_count is not null, the values are identical in GH. Therefore use GH only.
Reconcile propagation.number_started with gh.propagule_count
SELECT p.prop_id, accession_number, prop_date, prop_type, number_started, gh.activity_type_id, atc.activity_type, gh.propagule_count FROM dbo.propagation p left outer join growth_history gh on (p.prop_id = gh.prop_id) left outer join activity_type_code atc on (gh.activity_type_id=atc.activity_type_id) order by prop_id, growth_id
where p.number_started is not null or gh.propagule_count is not null: 13556
where p.number_started is not null and gh.propagule_count is not null: 10136
where p.number_started is not null: 11011
where gh.propagule_count is not null: 12681
where p.number_started = gh.propagule_count: 7252
where p.number_started <> gh.propagule_count: 2884
ASKUCBG: What to do about 2284 where the values are not equal? e.g.,
22 85.1541 November 20 1985 seed 15 2 changed pot 1
22 85.1541 November 20 1985 seed 15 6 changed location 1
22 85.1541 November 20 1985 seed 15 9 dead 1
6771 96.1050 Oct 19, 2010 cutting 20 13 potted 11
6771 96.1050 Oct 19, 2010 cutting 20 13 potted 1
6773 96.1051 Oct 26, 2010 cutting 20 13 potted 16
6773 96.1051 Oct 26, 2010 cutting 20 13 potted 1
Do we need to keep number started in top level propagation record?
Instead? Get the union of activity records and treatment records? Try this out SQL.
March 5, 2013
Here's the SQL we are currently using to bring together the activity and treatment records:
-- activity records with good dates select p.accession_number, p.prop_type, 'activity' record_type, gh.prop_id prop_id, gh.growth_id growth_id, 0 treatment_id, atc.activity_type type_name, null treatment_category, gh.activity_date event_date, convert(char(12), dateadd(dd,(gh.activity_begin_date - 2415021), convert(datetime, '1900-01-01', 103)), 101) begin_date, convert(char(12), dateadd(dd,(gh.activity_end_date - 2415021), convert(datetime, '1900-01-01', 103)), 101) end_date, gh.propagule_count propagule_count, gh.growth_comments comments, null concentration, gh.pot_type pot_type, gh.pot_size_id pot_size_id, gh.percent_success percent_success, gh.medium_name medium_name, gh.germination_date germination_date, gh.date_entered date_entered, gh.entered_staff_id entered_staff_id, gh.last_change_date last_change_date, gh.last_change_staff_id last_change_staff_id from growth_history gh, activity_type_code atc, propagation p where gh.activity_type_id=atc.activity_type_id and gh.prop_id=p.prop_id union -- treatment records look OK too: select p.accession_number, p.prop_type, 'treatment' record_type, t.prop_id prop_id, t.growth_id growth_id, t.display_sequence treatment_id, tt.treatment_type_name type_name, tt.treatment_category treatment_category, null event_date, convert(char(12), t.begin_date, 101) begin_date, convert(char(12), t.end_date, 101) end_date, t.propagule_count propagule_count, t.treatment_notes comments, t.concentration concentration, null pot_type, null pot_size_id, null percent_success, null medium_name, null germination_date, t.date_entered date_entered, t.entered_staff_id entered_staff_id, t.last_change_date last_change_date, t.last_change_staff_id last_change_staff_id from treatment t, treatment_type tt, propagation p where t.treatment_type_id=tt.treatment_type_id and t.prop_id=p.prop_id order by prop_id, growth_id, treatment_id