Friday Developer Meeting notes - ADR

Rough notes from Developer Meeting sessions on Friday, 2012-02-10

The following notes for this session are sparse - randomly selective

Review of the Roadmap to 3.0 Development Priorities page on the wiki

Work on vocabulary and authority control
Chris P: need to inform that work so that it incorporates support for multiple languages

Software wizards
Talk with Chris and Nate about scripts used at the Walker

Hovering over a term in term completion - see its context in the tree

edit section on proposed additions for media handling
3rd bullet on thumbnails
to limit scope to cataloging record

At Berkeley, want to propose structured objects soon

Can't do this just at Berkeley; need to consult with, and even get help from, App and UI

Perhaps do some remote paired programming

Challenges on funding and resources for work not Mellon grant funded

In principle, we can spend IMLS funds at Berkeley to work on these

Version 2.2 Place Authority at least should be far enough along

Makes sense that core dev team should be able to do reviews on contributions to core
Overhead on the project

Chris M and Yura can consult, give us feedback on approaches
but we can't assign tasks to them

Perfectly appropriate for the community to suggest substitutions, or reordering
of priorities on the development roadmap

Patrick gives Megan and Angela some idea of difficulty, time estimates

As more people know how to do more across the system, like Chris and Richard, Yura and Amy ...

More and more, Chris is going to be telling us where to go and how to do stuff
Chris will be working on restructuring of internal app code
UISpec stuff
Eliminating at least part of artificial distinction between procedures and authorities
Will help introduce hierarchy into objects, for example
Talk to Chris if you plan to do app work, because of these changes

Taking a contribution in

As an example for Place:

Chris:
I'm usually asked to integrate near the end of the sprint
That means that I'm busier at the end
Better opportunity to review early on

Would require payloads in XML and JSON
Changes to tests
Resources provided for running tests

Wiki page
On what we each need, in our layers, before reviewing a contribution
Things we want ready before we consider this

Yura:
Similar to Chris
UI is entry point for people to identify problems, new features
Even if issue isn't in the UI itself
That means that I'm busier at the end of the sprint
Better opportunity to review early on

Not much impact on code itself
Addition of some templates, message bundle, configuration file for that record
Presence of the 'fake' pseudo-app layer UISpecs and UISchemas in the 'test' directory
so can run UI app locally without any developer
Just requires grabbing it from the running app
Those are the same files that Chris needs, as well
UISpec, UISchema, dummy payloads (XML, JSON)

Code submission process with GitHub

GitHub will help with this
A true queue of pull requests

I submit a pull request, someone makes some changes, do I need to cancel and re-submit pull request?
Just merge into the branch
It's not like submitting a patch
More like a merge path between the branch the pull request was submitted from and the main branch in the repo

Can add in-line comments

If you're not happy as a reviewer, can close it and ask for rework and resubmission

We should let people know, especially outside of our immediate group, that it was received and some idea of when we might look at it

merging into a branch is work

We could treat it as a workflow, 'these six steps need to be checked off ...'

There is an ordering issue
The app can't accept anything that the services haven't committed

Functionality test in this contribution review?

My understanding, you start on the Talk list, wireframes, etc.

Different from actually using it.

Scripted test during QA

The stuff running on a server may have a lot of local extensions

In theory create another tenant

Start out by sticking into a sandbox tenancy, play with it a little on Nightly

What if the contributions aren't self contained? If they need contributions from the core, like special searching?

Will need to be raised early - an up-front thing

With the first several contributions, we'll be playing it by ear
Many contributions may be relatively low-risk

If Nightly is effectively where the core dev is going on, may need a second slice for contributions
for internal team to throw up contributions and do first QA

Contributing team has its own server
Assuming they can open up a server for that
First functionality - look at it in core tenant, or a sandbox tenant

Chris would also like to see performance logs, to ensure no fanout

Jesse: generic VM image in which to test out

Chris at Walker: how far out is it that I can give you a directory

If the core is going to own it, it needs to be a pull request with the changed files
on one, two, or three repositories on the collectionspace project

the mini-build could include some convenience targets to merge these files
back into a tree

queue of submitted contributions should be public
so everyone can see it

broader community won't care about pull requests, and instead
look at wireframes, schemas, etc.

announce on Talk list, put up on a wiki - doesn't matter where it is,
can link to it from email, some page

Use git to manage contributions
branch on own repo
If testing an in-process contribution

Will be a contrib area for things we don't own
What does a community contribution area look like?

May be another repo, within collectionspace the project
Or can just be a public repo (anywhere), "just fetch it from here"

Tension between seeing what's been contributed and having them all in one place
and not having a maintenance headache for the core team

If we had a community repo under collectionspace
can we break it down into sections?

Yura: organization concept, like collectionspace,
then repos within the organization, that have teams assigned

If a trusted organization, could all be in the collectionspace org
each in their own repos

Community repo might not need to be a branch of the core repo

Org:CSpace
  UI
  App
  Services
  Contrib

Org:CSpace-UCB
  UI
  App
  Services
  Contrib

Can't have partial pull requests; it's all or nothing.
And the file structures at some level, no matter how deeply nested, need to match

Contrib can be links to repos

Main master of forked repo won't have any contributions in there
In your best interests to keep it up to date with main collectionspace master
All the rest will be documented in the branch

If it's a heavyweight contribution, you could have a branch in each layer in your repo.

Create a GitHub project
Your master is kept synced with master on the collectionspace repo
On the contributor's repo, there should be a feature branch
in ui, app, svcs

In the main collectionspace repo,
we create corresponding feature branches for each layer

Org:CSpace
  UI
    master
    Place (forked from Org:Contributor's Place branch)
    ...

Org:Contributor
  UI
    master
    Place
    ...

Retire feature branches once pulled fully into core and accepted

On the wiki
there's a contribs area
links to contributors' project pages on GitHub
or instructions

Could tag place
Could later contribute an update to Place

Chris from Walker:
If minibuild were part of each tree
Could conceivably use submodule init

other things beyond services/app/UI layer change:
some free structure on this
import files
ETL template
Perl script to do data cleaning on AAT
etc.

Will evolve over time

Wiki page for contributions
Not by organization, necessarily

Each needs minimal metadata
Description
Institution
Contact info (probably at least email)
Version last tested with
Special system requirements, prerequisites, dependencies, install instructions
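
A minimal sketch of the metadata listed above as a simple record, in Python purely for illustration. The class name, field names, and sample values are hypothetical placeholders, not an agreed-upon schema for the wiki page.

```python
from dataclasses import dataclass, field


@dataclass
class ContributionEntry:
    """Hypothetical wiki entry describing one community contribution."""
    description: str
    institution: str
    contact_email: str
    version_last_tested: str
    # System requirements, prerequisites, dependencies, install instructions.
    requirements: str = ""
    # Optional tags ("tagging would be nice").
    tags: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of required fields that were left blank."""
        required = ("description", "institution", "contact_email", "version_last_tested")
        return [name for name in required if not getattr(self, name).strip()]


entry = ContributionEntry(
    description="Place authority",      # placeholder value
    institution="Example Museum",       # placeholder value
    contact_email="",                   # left blank on purpose
    version_last_tested="2.2",          # placeholder value
)
print(entry.missing_fields())
```

A check like `missing_fields` could be used to flag incomplete entries before they are listed.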

Tagging would be nice

Feedback comments go in the Comments on the page

Patrick has an image of the whiteboard which can be attached to this page

Field level permissions

An idea for field-level perms

issues with it

at UI layer, being able to hide or expose fields with templates
least secure but one way to approach it
- may require association of roles with templates
- this is the easiest

can have map of field level perms that the UI could use to hide the data, regardless of template
- here's what this user is allowed to save
- services can enforce whether a record can be saved
- no-read-access fields have to be hidden altogether in the UI
- involves real work, but do-able
- real problem is with search, if I can do a search and some records come back, I know the data is there
- may not scale, particularly with large cataloging record and n extensions
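
A rough sketch of the permission-map approach, assuming a per-role map from field name to "none", "read", or "write". The field names, the map, and both function names are made up for illustration, and treating unlisted fields as "none" is an assumption; this is not how the services or UI currently work.

```python
# Hypothetical permission map for one role: field name -> "none" | "read" | "write".
STUDENT_PERMS = {
    "objectNumber": "read",
    "briefDescription": "write",
    "valuation": "none",
}


def visible_fields(record, perms):
    """Drop fields the role may not read, so they are hidden altogether."""
    return {k: v for k, v in record.items() if perms.get(k, "none") != "none"}


def savable_fields(update, perms):
    """Keep only the fields the role is allowed to save."""
    return {k: v for k, v in update.items() if perms.get(k, "none") == "write"}


record = {"objectNumber": "1-100", "briefDescription": "Basket", "valuation": "5000"}
print(visible_fields(record, STUDENT_PERMS))
print(savable_fields({"objectNumber": "1-101", "briefDescription": "Woven basket"}, STUDENT_PERMS))
```

This does nothing about the search problem noted above: a hit on a hidden field still reveals that the data is there.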

proposing that fields come in classes
- by default, are in a non-sensitive class
- could have financial, cultural sensitivity classes, entirely user defined, including their names
- would be configured, with no UI management
- would have associated permissions with them
  - could say, "this role is allowed to view financially sensitive fields", etc.
- could create keyword indexes for each class
- at search time, we can say, which indexes does this user have the rights to; can search just the indexes to which they have access
- as an alternative, run a post filter on the results, makes searches brutally slow, requires fetching objects or at least their metadata
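
A small sketch of the field-class proposal, assuming one keyword index per class and a configured mapping of roles to the classes they may view. All field names, class names, role names, and functions here are hypothetical; the real indexes would live in the services layer.

```python
# Hypothetical configuration: field -> sensitivity class. Unlisted fields are "default".
FIELD_CLASSES = {
    "valuation": "financial",
    "insuranceCoverage": "financial",
}

# Hypothetical role -> classes the role may view, in addition to "default".
ROLE_CLASSES = {
    "registrar": {"financial"},
    "student": set(),
}


def class_of(field_name):
    return FIELD_CLASSES.get(field_name, "default")


def build_indexes(records):
    """Build one keyword index per class: class -> term -> set of record ids."""
    indexes = {}
    for rec_id, fields in records.items():
        for name, value in fields.items():
            for term in str(value).lower().split():
                indexes.setdefault(class_of(name), {}).setdefault(term, set()).add(rec_id)
    return indexes


def search(indexes, role, term):
    """Search only the indexes this role has rights to."""
    hits = set()
    for cls in {"default"} | ROLE_CLASSES.get(role, set()):
        hits |= indexes.get(cls, {}).get(term, set())
    return hits


records = {
    "obj1": {"title": "Basket", "valuation": "5000"},
    "obj2": {"title": "Basket lid"},
}
indexes = build_indexes(records)
print(search(indexes, "student", "5000"))
print(search(indexes, "registrar", "5000"))
```

Because the student role never touches the financial index, a search on a valuation returns nothing for that role, with no post-filtering of results.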

Chris M:
Where within the services schema would you envision marking this against fields?

If services derived from app, would be changing core configuration

Another possibility, a separate section where we declare classes, and refer to the paths (XPaths?) to these fields

Chris H:
With multiple tenants?

One of the changes we're making for SaaS support is to separate repositories for each tenant
to help facilitate backup and restore
also solves the problem where, at the report level, tenants can see each other's data

Susan:
Can you still have field-level write permission?

The idea is to move away from the unit being the field, no read for some roles, read-only for other roles
No access - no read - problem is the hardest
The read-only problem is not as big a problem - the services already have support for filtering fields out of payloads when they come in

In the UI, using the same fields, but applying a disabled decorator
to make read-only templates

could traverse and make selected fields read only

currently, already have templates where a field is disabled via autocomplete if the user doesn't have rights to the authority

Chris H
One of the main use cases is the student who is supposed to enter data for selected fields in a collectionobject
could have a template with grayed-out fields

first step, "you should only use this template"
second step, "associate roles with templates", restricts use on the UI level

TMS doesn't enforce 'no read' access via search

Nate:
if that's what we launched with, at the beginning, we'd be fine

our use case on this is the valuation problem
on valuations or insurance coverage

Chris H:
most important for us is to restrict students to data entry in selected fields

when app layer fetches an object, will also need info from the services on which template to fetch for the current user, based on their role
relatively straightforward

template based on the login status of the individual, which is available first even before the object is fetched

one of the things people do want to be able to say, is that I'd like the 'coins' template for that object

personalization issue, track for each person, which template did they use to enter the object last
will need some means of fetching full template
and that would need to be restriction-checked

if multiple templates are legal for the logged-in user, need to ask which one they wish to use
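
A sketch of how role-to-template association and template choice might fit together, assuming a configured list of templates per role. The role names, template names (other than 'coins', mentioned above), and function names are hypothetical.

```python
# Hypothetical mapping of roles to permitted templates; the first entry is the default.
ROLE_TEMPLATES = {
    "student": ["student-entry"],
    "cataloger": ["full", "coins"],
}


def allowed_templates(roles):
    """Union of templates permitted by any of the user's roles, in a stable order."""
    allowed = []
    for role in roles:
        for template in ROLE_TEMPLATES.get(role, []):
            if template not in allowed:
                allowed.append(template)
    return allowed


def choose_template(roles, requested=None, last_used=None):
    """Pick a template: explicit request first, then last used, then the default.

    Both the request and the remembered choice are restriction-checked.
    """
    allowed = allowed_templates(roles)
    if not allowed:
        raise PermissionError("no template permitted for these roles")
    for candidate in (requested, last_used):
        if candidate in allowed:
            return candidate
    return allowed[0]


print(choose_template(["cataloger"], requested="coins"))
print(choose_template(["student"], requested="coins"))
```

A student asking for the 'coins' template falls back to the only template their role permits.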

parametrized permissions, how to return the correct data
services also don't support setting permissions on individual objects

Nuxeo is capable of doing that now, but it slows search down a lot

Want people to go away and 'grind on' the idea of having classes of fields
rather than per-field permissions

Chris M: think this sounds good
Knock-on issue of whether one of these fields might be a name or number field for search results summary

Chris P: this will meet our needs

Import

Susan:
After Richard's fixes, import is now accepting 5K imports routinely
though Tomcat needs to be restarted from time to time

Would like to get to 10K and perhaps more in imports

Maybe some issues with macro substitution, ampersand substitutions, etc.

Had seen these issues even with the Java Client Library in the past

Chris P:
Using Talend
Data imported directly from a MSSQL database via a Talend module
Go through transforms, merging together

For output part, were using 'Advanced XML'
Problem with not supporting multiple loops in a tree
Instead, using Java, using JAXB-generated classes
if I have repeated fields, turn them into an object list
lots of modules with code, inside of Talend
write code to map objects

originally was using services APIs to import
now using import service
just using JAXB bindings to create the XML
if anything's changed in the schema, I get a compile error
which helps me find out what to change
just import new libraries in, and see what doesn't compile

is on the main CollectionSpace wiki

Yuteh:
Generating individual files for main record and repeatable structures
Previously using Susan's 10K record merge
Doing merging
XMLMerge works for smaller files like Person
CollectionObject is too big
Richard: it probably can be made to work

Am now splitting every 3K (78 MB, not including any repeating)
have hundreds of files
if I can keep my delta in one file
can run this over and over again on the same delta file
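
A minimal sketch of splitting a record stream into fixed-size batches, with the default of 3,000 taken from the figure above. The function name is hypothetical, and writing each batch out as an import file is left out.

```python
from itertools import islice


def batches(records, size=3000):
    """Yield successive lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Toy usage: 7 placeholder records split into batches of 3.
sizes = [len(chunk) for chunk in batches(range(7), size=3)]
print(sizes)
```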

Susan:
Have 5K, but my objects at MMI aren't as large as PAHMA's objects

Patrick:
Pulled all 600K records, denormalized into 1 million rows, for Delphi, 270 MB

Yuteh:
Wrote Java code to strip off 'easy' empty records

Chris P:
Last did this in 1.7, don't recall how many batches, wasn't too bad
Have 46K object records

Chris H:
Talend XML generator has a 'create elements even if empty' checkbox, which is checked ('on') by default

Susan:
Required in groups and lists, perhaps in repeatables

Relations are difficult with custom extensions

Patrick:
Should be able to have generic doctype in there
Richard will think about this
Already marked with a tenant
We don't need that in the doctype

When filtering relations, could do a stem search

Richard:
Nuxeo shouldn't care, due to its derivation model, if derived from the common doctype

Susan:
When custom tenant isn't there

Richard:
The fix will mean that you won't need to re-import the relation records; you can leave the tenant-qualified doctypes in there

Susan:
Display predicate name in relation not used; different in app layer?
Doesn't appear to be used at all

Richard:
Dan asked for this some time ago

Chris at Walker:
Hooking up Talend right now

Nate:
Sending payloads now using the services
The downside is you can't run it again, without querying whether the object already exists
May not necessarily be a bad fit for us
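
A sketch of the check-before-create pattern that makes a services-based load safe to re-run. `FakeService` is an in-memory stand-in invented for this example, not the CollectionSpace services API, and looking records up by object number is an assumption.

```python
class FakeService:
    """In-memory stand-in for a record service (hypothetical)."""

    def __init__(self):
        self.records = {}

    def find_by_object_number(self, number):
        return self.records.get(number)

    def create(self, number, payload):
        self.records[number] = payload


def send_once(service, number, payload):
    """Create the record only if no record with this object number exists yet."""
    if service.find_by_object_number(number) is not None:
        return "skipped"
    service.create(number, payload)
    return "created"


service = FakeService()
first = send_once(service, "2012.1", {"title": "Basket"})
second = send_once(service, "2012.1", {"title": "Basket"})
print(first, second)
```

The extra query per record is the cost of being able to run the load again.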

There was a set of tools that could take various data sources, transform, spit out uniform
Kettle

Susan:
I assemble the XML myself in JavaScript
Kettle lets you make fragments and assemble them in JavaScript
Quick and easy

Patrick:
Talend can import a schema and generate XML in that schema

Nate:
Pre-populate CSIDs with GUIDs?

Susan:
Yes

Richard:
Easier for creating relations

Chris H:
Simple Java method to get a GUID/UUID, which you can put in your CSID in Talend
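
A short sketch of pre-assigning CSIDs so that relations can reference them before anything is imported. It is shown in Python using `uuid.uuid4()`; `java.util.UUID.randomUUID()` is the Java counterpart. The record and relation keys below are hypothetical and do not reflect the actual relation schema.

```python
import uuid


def new_csid():
    """Return a random UUID string to use as a pre-assigned CSID."""
    return str(uuid.uuid4())


# Hypothetical record and relation shapes, for illustration only.
obj = {"csid": new_csid(), "objectNumber": "2012.1"}
movement = {"csid": new_csid(), "location": "Storage A"}
relation = {"subjectCsid": obj["csid"], "objectCsid": movement["csid"]}
print(relation)
```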

Nate:
Collection we're importing is 11K objects
Even if we have to do it again, talking to services is appealing to us
Might look again at Talend, Kettle
Our starting data is in CSV files from FileMaker Pro, I can generate good CDWA Lite data from that

Chris P:
Relations, movements ... not just objects

Susan:
By sheer number, relations are the most numerous

Patrick:
If use import, you can prepopulate with CSIDs, with all relations using those, etc.
If use services, you will need to retrieve what you imported to get their CSIDs
Speed difference using import - close to an order of magnitude advantage in speed over services
If you're fiddling, that speed difference can be important

A Talend script importing from CDWA Lite would be interesting to many people

Can export a job from Talend, and someone else can look at what you've done

Chris H:
Talend is great, but has its own mindset

Not always clear about what should be shared

Yuteh has been creating some great documentation; e.g. on creating relationships

Chris P:
Has a page on the main wiki about what he did in 1.7

What would be really good is a standalone output module
You can do whatever you need on the import side
But the maintenance is quite high on that, while the schemas are changing

Might be a significant benefit in a monthly implementers' call
Problems go out on the Work or Talk list
But successes don't always get reported or discussed

Richard:
It's possible I could get the Nuxeo shell and/or Webapp installed

If you get the Nuxeo DM webapp and configure it to point to the right repository settings used in CollectionSpace now
You can run it in its own container; it doesn't need to be in Tomcat or the same Tomcat
The configuration settings that are in Tomcat might be enough to figure this out

The worst case might be that you need to shut down / undeploy CollectionSpace while using the console or shell for an export, but you might not need to.

Most valuable?

Nate:
When Richard working with Chris
Ray working with me
When Jesse worked with our implementer in the past

Could be done in Skype
with shared screens

Susan:
Badly need to get rid of old docs

Chris M:
Search on the wiki is very difficult - sometimes need exact titles
e.g. services APIs

Chris H:
Peer sessions are vital
Monthly sessions?

Chris M:
Anyone can use the Adobe Connect space
sultan @ caret maintains it

Nate:
IRC logs valuable
searchable on the wiki