
See attachments for whiteboard photos and audio of meeting discussion.

Discussion Notes

Some topics discussed - questions to answer:

  • Ways in which this service co-exists with other services in the Services Layer.
  • Tenant-based usage.
    • Tenant context at start-up and shutdown.
  • Whether the schemas for CollectionObject, etc., are compatible with ID Service usage.
    • Where binding occurs.
  • How service is configured.
  • How ID generators are associated with other services; what type of context?
  • Migration from one server to another. (Follow-up: And one tenant to another? Which ID generator instances may be shared and which are exclusive to a single tenant?)
  • If we plan to cache state, how data is reinitialized on restart.
  • How to deal with data sent in responses that doesn't reach the requestor (e.g. due to network errors, requestor going away).
    • Write the response on a stream, then persist the change only on confirmation of receipt? Possible conclusion: may be overkill.
  • How to deal with rapid, near-simultaneous, possibly overlapping requests (e.g., non-safe requests such as creates).
    • IDResource as singleton class.
    • Use SELECT FOR UPDATE, supported by all major databases.
    • (Discussion of lock period, even a short one, with pros and cons ...)

Database table schemas

ID Service binding

  • Tenant ID
  • ID Generator ID
  • Text (e.g. tag or equivalent)
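A binding row could be modeled roughly as below. This is a minimal sketch, not an agreed design: the names IdServiceBinding and BindingRegistry, and the assumption that a (tenant ID, text/tag) pair selects at most one ID generator, are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** One row of a hypothetical ID Service binding table: (tenant ID, ID generator ID, text/tag). */
record IdServiceBinding(String tenantId, String idGeneratorId, String tag) {}

/** Hypothetical in-memory index of bindings, keyed by tenant and tag. */
class BindingRegistry {
    private final Map<String, IdServiceBinding> byTenantAndTag = new ConcurrentHashMap<>();

    private static String key(String tenantId, String tag) {
        return tenantId + "\u0000" + tag; // NUL separator keeps composite keys unambiguous
    }

    void add(IdServiceBinding binding) {
        byTenantAndTag.put(key(binding.tenantId(), binding.tag()), binding);
    }

    /** Returns the ID generator bound to this tag for this tenant, if any. */
    Optional<String> generatorFor(String tenantId, String tag) {
        return Optional.ofNullable(byTenantAndTag.get(key(tenantId, tag)))
                       .map(IdServiceBinding::idGeneratorId);
    }
}
```

With this keying, two tenants can bind the same tag to different generators, which bears on the open question above about which generator instances are shared and which are exclusive to one tenant.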

ID Generator definitions

  • Display name
  • Description
  • ID (can even be simple integers)
  • Class
  • Algorithm
  • Version
  • Parts (can be normalized into their own table)
    • Display name
    • Description
    • Class
    • Algorithm
    • Version
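If parts are normalized into their own table, each part row would carry a reference back to its generator plus an ordering column. The sketch below assumes that layout; the record names and the generatorId/position columns are hypothetical, and the query that reassembles a generator's parts is shown as an in-memory filter standing in for a SQL join.

```java
import java.util.Comparator;
import java.util.List;

/** Row in a hypothetical ID generator definitions table. */
record IdGeneratorDefinition(int id, String displayName, String description,
                             String className, String algorithm, String version) {}

/**
 * Row in a hypothetical, normalized ID parts table. generatorId refers back to
 * IdGeneratorDefinition.id; position orders the parts within one generator.
 */
record IdPartDefinition(int generatorId, int position, String displayName, String description,
                        String className, String algorithm, String version) {}

class IdGeneratorDefinitions {
    /** Returns one generator's parts in position order (equivalent to a join plus ORDER BY). */
    static List<IdPartDefinition> partsOf(IdGeneratorDefinition gen, List<IdPartDefinition> allParts) {
        return allParts.stream()
                .filter(p -> p.generatorId() == gen.id())
                .sorted(Comparator.comparingInt(IdPartDefinition::position))
                .toList();
    }
}
```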

For Further Discussion

Other table schemas

  • ID Part values
  • Tracking of IDs within a deployed system

Near-term Follow-ups

JIRAs to add

  • Move hard-coding of database configuration for ID Service to external configuration. (Where feasible, share at least part of this configuration with common services layer configuration.)
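One possible shape for externalized configuration, assuming a Java properties file: ID Service keys are read first, and each falls back to a key shared with the common services layer configuration. All property names here are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical ID Service database settings, read from external configuration. */
record IdServiceDbConfig(String jdbcUrl, String user, String password) {

    /** Builds the config, preferring ID Service keys and falling back to shared ones. */
    static IdServiceDbConfig from(Properties props) {
        String url = props.getProperty("idservice.jdbc.url",
                props.getProperty("services.jdbc.url")); // shared with common services config
        if (url == null) {
            throw new IllegalArgumentException("No JDBC URL configured for the ID Service");
        }
        return new IdServiceDbConfig(url,
                props.getProperty("idservice.jdbc.user", props.getProperty("services.jdbc.user", "")),
                props.getProperty("idservice.jdbc.password", props.getProperty("services.jdbc.password", "")));
    }

    /** Loads settings from a properties stream, e.g., a file outside the deployed application. */
    static IdServiceDbConfig load(InputStream in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return from(props);
    }
}
```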

4 Comments

  1. All,

    Some further thoughts from yesterday (preceded by some too-tedious context setting...sorry).

    -Steve

    Context

    In yesterday's meeting we began to discuss a current design for ID Generators that serializes IDGenerator instances to a database blob column, including the generator's state (i.e., data that enables the generator to track the last-issued ID and to derive the next one when requested). One issue that came up was concurrency: how to lock an ID generator so that it does not issue duplicate IDs in response to concurrent requests, which can happen when a second ID is issued before the incremented state from the first has been committed back to the database.

    Proposed solutions raised the question of when in the request-response cycle to lock and release an IDGenerator: e.g., locking only until a response is issued to the requestor; locking until the requestor confirms receipt of the generated ID; or locking until the record that includes the ID is saved by a user. We all agreed the latter two cases would be unworkable because they would inappropriately block the application.

    Sanjay suggested using SELECT FOR UPDATE on IDGenerator rows to delegate responsibility for locking to the database; the row would be released when the appropriate class successfully updated the data that includes the incremented IDGenerator state, ensuring that multiple IDs would not be generated/issued from the same state. The sequence of events:

    1. SELECT FOR UPDATE
    2. Generate ID
    3. Commit incremented serialization of IDGenerator to the database

    would be quick, reducing the concern that the application would be unworkably blocked.
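    The three steps could look roughly like this in JDBC. It is a sketch under several assumptions: the table and column names are hypothetical, the generator's state is stored as a readable last-issued number rather than a serialized blob, the next ID is a simple increment, and the database accepts SELECT ... FOR UPDATE (PostgreSQL and MySQL/InnoDB do). The row lock is held only from step 1 until commit().

    ```java
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    /** Hypothetical issuer using a database row lock; table/column names are illustrative. */
    class LockingIdIssuer {
        static final String SELECT_SQL =
                "SELECT last_generated_id FROM id_generators WHERE id = ? FOR UPDATE";
        static final String UPDATE_SQL =
                "UPDATE id_generators SET last_generated_id = ? WHERE id = ?";

        /** Derives the next ID from the last one; a plain increment is assumed here. */
        static long nextFrom(long lastId) {
            return lastId + 1;
        }

        /** Locks the generator row, derives the next ID, writes it back, and commits. */
        static long issue(Connection conn, int generatorId) throws SQLException {
            boolean oldAutoCommit = conn.getAutoCommit();
            conn.setAutoCommit(false); // the row lock lasts until commit or rollback
            try {
                long next;
                // 1. SELECT FOR UPDATE: concurrent callers for this row wait here
                try (PreparedStatement select = conn.prepareStatement(SELECT_SQL)) {
                    select.setInt(1, generatorId);
                    try (ResultSet rs = select.executeQuery()) {
                        if (!rs.next()) {
                            throw new SQLException("No ID generator with id " + generatorId);
                        }
                        next = nextFrom(rs.getLong(1)); // 2. Generate ID
                    }
                }
                // 3. Persist the incremented state, then commit to release the lock
                try (PreparedStatement update = conn.prepareStatement(UPDATE_SQL)) {
                    update.setLong(1, next);
                    update.setInt(2, generatorId);
                    update.executeUpdate();
                }
                conn.commit();
                return next;
            } catch (SQLException e) {
                conn.rollback(); // releases the lock without changing state
                throw e;
            } finally {
                conn.setAutoCommit(oldAutoCommit);
            }
        }
    }
    ```

    Because the lock is released at commit, a second request for the same generator waits only for one read and one update, which is the short lock period discussed in the meeting.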

    Further thoughts

    I'm thinking that there are a couple of ways to improve this scheme and further streamline the time taken to generate IDs. The core idea is to bypass synchronous reliance on the database altogether. (I think this may well be where Sanjay hoped we were heading, and I hope this post doesn't preempt collaborative arrival at a new paradigm – apologies if I'm blowing the process here, Sanjay!)

    I wonder whether the following are the classes needed:

    • a handler class for ID generation for each tenant; requests for ID generation would be dispatched to the proper handler based on tenant context.
    • an IDGenerator for each tenant/generator pair, to which the above handler would dispatch requests based on the type of ID to be generated
      • these are singleton classes, and would maintain state (including last ID generated, from which next ID can be algorithmically derived)
      • these classes would queue requests, but would be able to process requests without database calls, because:
        • these classes would persist their (changing) state asynchronously to the generation of IDs ... that is, they could persist themselves each time an ID is generated; after N IDs are generated; on a schedule (every N seconds); on destruction of the class instance (e.g., on shutdown of the instance)
        • on instantiation, these classes could be deserialized, as in the current scheme; or, perhaps more helpfully, could instantiate themselves from retrieved database rows that describe last-generated-id and next-id algorithm(s) [more helpful because an administrator could look at the data and see something meaningful rather than a blob of serialized Java class]
      • incrementing state without database calls (e.g., via a Map of some kind) will be faster than incrementing state via synchronous database selects and updates
      • IDGenerator classes functioning in this way could be enabled to retain a 'memory' of generated but not-yet-used IDs, so that after a user-initiated cancellation or a timeout period those IDs could be re-issued – if that behavior is desired (this could be a configurable IDGenerator behavior)
    • handler and IDGenerator classes could be destroyed if the class has been idle for some long time – e.g., in order to conserve memory in multitenant deployments where some tenant goes dormant for a day, or a week, or a month
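    A rough sketch of this arrangement follows. All class names are hypothetical, persistence is reduced to a callback standing in for a write to a readable database row, and only the "after N IDs" and "on destruction" persistence policies are shown. Re-issuing released IDs is included as the optional behavior mentioned above; idle eviction of handlers is omitted.

    ```java
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;
    import java.util.function.LongConsumer;

    /** Hypothetical per-tenant, per-ID-type generator holding its state in memory. */
    class InMemoryIdGenerator {
        private final int persistEvery;               // persist after this many new IDs
        private final LongConsumer persister;         // stand-in for a database write
        private final Deque<Long> reissuable = new ArrayDeque<>(); // released, unused IDs
        private long lastIssued;
        private int sinceLastPersist;

        /** restoredLastIssued would come from a readable database row on instantiation. */
        InMemoryIdGenerator(long restoredLastIssued, int persistEvery, LongConsumer persister) {
            this.lastIssued = restoredLastIssued;
            this.persistEvery = persistEvery;
            this.persister = persister;
        }

        /** synchronized serializes overlapping requests in memory; no database read is needed. */
        synchronized long next() {
            Long released = reissuable.pollFirst();
            if (released != null) {
                return released;                      // optional: re-issue a released ID first
            }
            lastIssued++;
            if (++sinceLastPersist >= persistEvery) { // "after N IDs are generated" policy
                persister.accept(lastIssued);
                sinceLastPersist = 0;
            }
            return lastIssued;
        }

        /** Records that an issued ID went unused (e.g., cancellation or timeout). */
        synchronized void release(long id) {
            reissuable.addLast(id);
        }

        /** "On destruction" policy: write the latest state, e.g., at shutdown. */
        synchronized void flush() {
            persister.accept(lastIssued);
            sinceLastPersist = 0;
        }
    }

    /** Hypothetical per-tenant handler: dispatches by ID type to lazily created generators. */
    class TenantIdHandler {
        private final Map<String, InMemoryIdGenerator> generators = new ConcurrentHashMap<>();
        private final Function<String, InMemoryIdGenerator> loader;

        TenantIdHandler(Function<String, InMemoryIdGenerator> loader) {
            this.loader = loader;
        }

        long nextId(String idType) {
            return generators.computeIfAbsent(idType, loader).next();
        }
    }

    /** Hypothetical entry point: one handler per tenant, chosen from tenant context. */
    class IdServiceDispatcher {
        private final Map<String, TenantIdHandler> handlers = new ConcurrentHashMap<>();
        private final Function<String, TenantIdHandler> handlerFactory;

        IdServiceDispatcher(Function<String, TenantIdHandler> handlerFactory) {
            this.handlerFactory = handlerFactory;
        }

        long nextId(String tenantId, String idType) {
            return handlers.computeIfAbsent(tenantId, handlerFactory).nextId(idType);
        }
    }
    ```

    Note that persistEvery > 1 trades durability for speed: after a crash, up to N-1 already-issued IDs could be issued again once the generator is restored from the last persisted value.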
    1. Steve, could you elaborate on what you mean by "persist their state asynchronously to the generation of IDs"? Secondly, what are the implications for recoverability with this approach? What happens if the server crashes and the state of the ID generator was not persisted? Considering CollectionSpace as an enterprise-class application, I would favor correctness over performance at this stage in the project.

      1. "persist their state asynchronously to the generation of IDs": e.g.,

        1. receive request
        2. generate ID
        3. respond to request
        4. persist changed ID generation state
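        That ordering might look like the following minimal sketch. The class name and the persister callback (standing in for the database write) are hypothetical. A single-threaded executor keeps state writes in the order the IDs were issued; the window of exposure to a crash is the time between returning the ID and the queued write completing.

        ```java
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.TimeUnit;
        import java.util.function.LongConsumer;

        /** Hypothetical generator that responds first and persists state in the background. */
        class AsyncPersistingGenerator {
            // One worker thread, so queued writes run in the order IDs were issued.
            private final ExecutorService persistQueue = Executors.newSingleThreadExecutor();
            private final LongConsumer persister; // stand-in for the database write
            private long lastIssued;

            AsyncPersistingGenerator(long lastIssued, LongConsumer persister) {
                this.lastIssued = lastIssued;
                this.persister = persister;
            }

            synchronized long handleRequest() {                  // 1. receive request
                long id = ++lastIssued;                           // 2. generate ID
                persistQueue.execute(() -> persister.accept(id)); // 4. queue the state write
                return id;                                        // 3. respond without waiting for step 4
            }

            /** Lets queued writes finish, e.g., at shutdown. */
            void shutdown() {
                persistQueue.shutdown();
                try {
                    persistQueue.awaitTermination(5, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        ```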

        "What happens if the server crashes and the state of the ID generator was not persisted."

        If the server crashes between steps 3 and 4 of the above scenario (a window of a few milliseconds, I think), any transaction in which a user participates and in which the generated ID might be used would be lost as well, wouldn't it? The user would not have time to receive the ID and then submit the record that contains it. If there are use cases in which an ID is generated and used without user action (e.g., clicking "Submit") between those steps, then protecting against a server crash may well be more important than performance.

        In our discussion a couple of weeks ago, concern seemed weighted more toward performance (esp. in multiuser scenarios), but I may be missing the larger picture. Even if the order of steps were reversed, however, to ensure recoverability at the cost of a single database call between request and response:

        1. receive request
        2. generate ID
        3. persist changed ID generation state
        4. respond to request

        ... this scenario would involve one database call per request rather than the two (fetch state from db, increment state, write state to db) we were discussing the week before last.
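        For comparison, a minimal sketch of the reversed order, assuming the state is also cached in memory so that no database read is needed per request, and assuming a single service instance owns each generator (otherwise the cached value could go stale). Names are hypothetical, and persister again stands in for the single UPDATE. If the write fails, no ID is returned and the in-memory state is left unchanged.

        ```java
        import java.util.function.LongConsumer;

        /** Hypothetical write-through generator: persist (step 3) before responding (step 4). */
        class WriteThroughGenerator {
            private final LongConsumer persister; // stand-in for one UPDATE per request
            private long lastIssued;              // cached in memory, so no SELECT is needed

            WriteThroughGenerator(long lastIssued, LongConsumer persister) {
                this.lastIssued = lastIssued;
                this.persister = persister;
            }

            synchronized long handleRequest() {   // 1. receive request
                long candidate = lastIssued + 1;   // 2. generate ID
                persister.accept(candidate);       // 3. persist; an exception here aborts the request
                lastIssued = candidate;            //    only then advance the in-memory state
                return candidate;                  // 4. respond
            }

            synchronized long lastIssued() {
                return lastIssued;
            }
        }
        ```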