
work in progress

One of the core deployment requirements of CollectionSpace is that it should host collection management services for one or more institutions (see Tech Thinking for CSpace.org).

One approach to satisfying this requirement and achieving economies of scale would be to adopt a multi-tenant architecture. According to Wikipedia, multitenancy "refers to a principle in software architecture where a single instance of the software runs on a software-as-a-service (SaaS) vendor's servers, serving multiple client organizations (tenants). With a multitenant architecture, a software application is designed to virtually partition its data and configuration so that each client organization works with a customized virtual application instance."

Supporting multiple tenants requires a top-down evaluation of the software components in a typical enterprise software stack such as the CollectionSpace service architecture. This document examines the CollectionSpace architecture with a view to supporting multiple tenants.

Schema Extension


Schema extension in CollectionSpace documents the implementation approach in detail.

Storage

CollectionSpace is expected to use the three kinds of storage systems listed below. To isolate data per tenant, the CollectionSpace runtime would use the tenant-specific information available in the runtime context while performing I/O operations. How the runtime context is populated with tenant-specific information is addressed in the section on security; here we assume that this information is available whenever storage operations are performed.

  1. Document repository (backed by a SQL datastore)
  2. SQL datastore
  3. File system

Document Repository

As described earlier, Nuxeo is used in CollectionSpace predominantly for entity object management. The Nuxeo repository is the core component providing this functionality. A repository stores documents representing CollectionSpace entities in a tree-like structure that lets documents be grouped hierarchically inside folders, as shown in the figure below.

[Diagram: nuxeo_repository_hierarchy]

The top-level node in this structure is called a Nuxeo domain (which is distinct from a CollectionSpace domain). By default, each Nuxeo repository instance creates a Nuxeo domain. The diagram above shows workspaces for the Collection Object and Location entities of the CollectionSpace system.

Workspaces are child components of a Nuxeo domain. A workspace is used to store documents of a particular document type.

In the Nuxeo repository, document types are generally shared among all organizations using the repository. In a multi-tenant environment, however, this is useful only for document types that are shared among the tenants. In CollectionSpace, some document types could be specific to a tenant or a group of tenants because of the schema extension feature. See Tenant-aware repository binding for more details.

The following sections describe and discuss several approaches to accommodating the multi-tenant scenario in the Nuxeo repository.

Approach #1 Repository per tenant

In this approach, each tenant is assigned its own Nuxeo repository. CollectionSpace would use a Nuxeo repository backed by a SQL datastore, so there would be a separate J2EE datasource per tenant. Such a datasource should be created at the time of tenant provisioning.

[Diagram: repository_per_tenant]
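To make Approach #1 concrete, here is a minimal Python sketch (not CollectionSpace or Nuxeo code) of a service-layer registry that provisions one repository and one datasource per tenant and resolves them from the tenant ID at runtime. The class names and the `repo-<tenant>` and `java:/nuxeo-ds-<tenant>` naming schemes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryConfig:
    """Settings for one tenant's dedicated repository (hypothetical fields)."""
    repository_name: str
    datasource_name: str


class RepositoryPerTenantRegistry:
    """Approach #1: exactly one repository and one datasource per tenant."""

    def __init__(self) -> None:
        self._by_tenant: dict[str, RepositoryConfig] = {}

    def provision(self, tenant_id: str) -> RepositoryConfig:
        """Create the tenant's repository and datasource entries at provisioning time."""
        if tenant_id in self._by_tenant:
            raise ValueError(f"tenant {tenant_id!r} is already provisioned")
        config = RepositoryConfig(
            repository_name=f"repo-{tenant_id}",            # hypothetical naming scheme
            datasource_name=f"java:/nuxeo-ds-{tenant_id}",  # hypothetical JNDI-style name
        )
        self._by_tenant[tenant_id] = config
        return config

    def lookup(self, tenant_id: str) -> RepositoryConfig:
        """Resolve the repository for the tenant found in the runtime context."""
        config = self._by_tenant.get(tenant_id)
        if config is None:
            raise LookupError(f"no repository provisioned for {tenant_id!r}")
        return config
```

Because every tenant gets its own entry, the cost and configuration-proliferation concerns listed under Cons grow with the number of tenants.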

Pros:

  1. This approach offers very clean isolation between tenants. Repositories are not shared, so documents and document types are kept in completely separate storage areas.
  2. Backup and restore at the repository level (that is, per tenant) is possible.
  3. Better fault isolation. If one repository is unavailable, only the services for a single tenant are affected; tenants using other repositories can continue to work.

Cons:

  1. A major drawback is that no two tenants would be able to share document types, or the component schemas of a document type. Document types and schemas would need to be replicated across Nuxeo domains (is this true?).
  2. This could be a very expensive configuration, as it dedicates a repository, a relatively heavyweight and expensive resource, to a single tenant.
  3. Repository-specific configuration proliferates, making the system hard to manage and upgrade.

Issues:

  1. Does this mean a separate Nuxeo database (in MySQL) is required per repository?
  2. Would Nuxeo recognize a new repository configuration dynamically?

Approach #2 Nuxeo domain per tenant (also multiple tenants per repository)

In this approach, each tenant is assigned a Nuxeo domain, as shown in the figure below. Multiple Nuxeo domains can reside in the same physical Nuxeo repository. The CollectionSpace service layer would keep an association between the domain ID (or name) and the tenant ID (or name). A Nuxeo domain could be created at the time of tenant provisioning.

A Nuxeo domain is just a core document type. In the Nuxeo APIs (repository session), the get, update and delete operations on a document do not require a fully qualified ID (domain/workspace/document ID), because document IDs are globally unique. Only the create operation accepts a parent ID. We should explore whether PathRef could be used instead of IdRef for the get, update and delete operations, so that each operation addresses the document by a fully qualified path.

[Diagram: domain per tenant]
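Since document IDs are globally unique, the tenant boundary in Approach #2 has to be enforced by the service layer. The Python sketch below does not use the Nuxeo API; it assumes a `/<domain>/<workspace>/<document>` path layout and shows the tenant-to-domain association plus the kind of fully qualified path string that a PathRef could wrap. `TenantDomainResolver` and its methods are hypothetical.

```python
class TenantDomainResolver:
    """Service-layer association between tenants and their Nuxeo domains."""

    def __init__(self) -> None:
        self._domain_by_tenant: dict[str, str] = {}

    def register(self, tenant_id: str, domain_name: str) -> None:
        """Record the tenant's domain, e.g. when the domain is created at provisioning."""
        self._domain_by_tenant[tenant_id] = domain_name

    def document_path(self, tenant_id: str, workspace: str, document: str) -> str:
        """Build the assumed /<domain>/<workspace>/<document> path for a tenant."""
        domain = self._domain_by_tenant[tenant_id]  # KeyError if the tenant is unknown
        for segment in (workspace, document):
            if not segment or "/" in segment:
                raise ValueError(f"invalid path segment: {segment!r}")
        return "/" + "/".join((domain, workspace, document))

    def owns(self, tenant_id: str, path: str) -> bool:
        """True only if the path lies inside the tenant's own domain."""
        domain = self._domain_by_tenant.get(tenant_id)
        # The trailing slash stops "Domain-Tenant-1" from matching "Domain-Tenant-10".
        return domain is not None and path.startswith(f"/{domain}/")
```

Checking `owns` before a get, update or delete means a tenant can never reach a document under another tenant's domain, even though the document ID alone would not reveal its owner.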

Pros:

  1. There would be clean isolation between document instances that belong to two different tenants.
  2. All document types within the same repository are shared among tenants, so no replication is needed.

Cons:

  1. A single repository instance could become a single point of failure: the more tenants a repository hosts, the more tenants are affected when it fails, which reduces service availability. This could be mitigated by scaling out.
  2. Nuxeo's hierarchy table, which is accessed for any browsing of the repository, becomes a hot spot.
  3. No isolation of document types. All document types (or CollectionSpace schema templates) are available to all tenants sharing the repository, even where they are not relevant.
  4. Additional routing and lookup logic would be required in the service layer to identify, at runtime, the repository hosting the tenant found in the runtime context.

Issues:

  1. Use PathRef instead of IdRef - Resolved 11/02/09

Tenant-aware repository binding

As discussed earlier, various elements of the Nuxeo repository might need tenant-based isolation. These include:

  1. Nuxeo Domain
  2. Workspaces
  3. Document types
  4. Schemas

A tenant-aware binding would include these repository-specific components. The elements of a tenant-aware binding are explained in detail in the Tenant-aware Bindings section.

SQL datastore

In CollectionSpace v1.0, we plan to use MySQL 5.x as the SQL datastore for several kinds of data. These could include the out-of-the-box user registry, organization registry, audit trail logs, CollectionSpace ID space, and any other information that requires reliable persistence and recoverability.

Each table that may hold tenant-specific information should include a tenant ID column and an index involving that column. For example, the following diagram shows a users table in which each user is qualified by tenant.

[Diagram: basic_table_tenant]
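The tenant-qualified users table can be sketched with Python's built-in `sqlite3` module, used here only as a self-contained stand-in for MySQL. The table, column and index names are assumptions; the point is that the tenant ID column takes part in an index and that every query filters on it.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a users table qualified by tenant (SQLite standing in for MySQL)."""
    conn.executescript(
        """
        CREATE TABLE users (
            id        INTEGER PRIMARY KEY,
            tenant_id TEXT NOT NULL,  -- tenant ID column on every tenant-specific table
            username  TEXT NOT NULL,
            email     TEXT
        );
        -- Index involving the tenant column; also lets two tenants reuse a username.
        CREATE UNIQUE INDEX idx_users_tenant_username ON users (tenant_id, username);
        """
    )


def add_user(conn: sqlite3.Connection, tenant_id: str, username: str,
             email: Optional[str]) -> None:
    conn.execute(
        "INSERT INTO users (tenant_id, username, email) VALUES (?, ?, ?)",
        (tenant_id, username, email),
    )


def users_for_tenant(conn: sqlite3.Connection, tenant_id: str) -> list[str]:
    """Every query is filtered by the tenant ID taken from the runtime context."""
    rows = conn.execute(
        "SELECT username FROM users WHERE tenant_id = ? ORDER BY username",
        (tenant_id,),
    )
    return [username for (username,) in rows]
```

Putting `tenant_id` first in the composite unique index lets the same username exist under two tenants while still serving tenant-filtered lookups.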

File system

Various configuration-related artifacts are stored on the file system, both by CollectionSpace and by the infrastructure used by the CollectionSpace service layer. These include XML schema files, various properties files, log files, and connection-related configuration (for databases, FTP servers, identity providers, third-party web services, etc.).

Some files would be tenant-specific. For example, CollectionSpace might have to use a specific identity provider (e.g. the Berkeley CalNet server) to host Berkeley museums. Such files would need to be isolated per tenant, which could be achieved in either of the following ways.

  1. Create separate directories per tenant
  2. Qualify each file name with tenant id

The former might be the better approach from a maintenance, backup and management perspective. However, the appropriate mechanism would be chosen for each artifact depending on its expected extension requirements.
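The two file isolation options above can be compared with a small Python sketch; the `/opt/cspace/config` root and the `tenants/` subdirectory are hypothetical, not CollectionSpace's actual layout.

```python
from pathlib import PurePosixPath

CONFIG_ROOT = PurePosixPath("/opt/cspace/config")  # hypothetical root directory


def _checked(tenant_id: str) -> str:
    # Reject IDs that could escape the tenant's directory or break the naming scheme.
    if not tenant_id or "/" in tenant_id or tenant_id in (".", ".."):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return tenant_id


def per_tenant_path(tenant_id: str, filename: str) -> PurePosixPath:
    """Option 1: a separate directory per tenant."""
    return CONFIG_ROOT / "tenants" / _checked(tenant_id) / filename


def tenant_qualified_path(tenant_id: str, filename: str) -> PurePosixPath:
    """Option 2: a shared directory, with each file name qualified by the tenant ID."""
    return CONFIG_ROOT / f"{_checked(tenant_id)}-{filename}"
```

With option 1, backing up or removing one tenant's files means handling a single directory, which is one reason the former layout is easier to maintain.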

Search and Query

Provisioning a tenant

Before using CollectionSpace to manage collections, a prospective tenant must complete a few steps. The following list captures the kinds of information required at the time of provisioning.

  1. Organization-related information, including name, address, recognized registration code(s), email, web address, etc. The system would create an entry in its organization database. (Required)
  2. Selection of a museum domain from the list of available museum domains (Required)
  3. Registration of a user in the role of "tenant administrator" (Required)
  4. Roles and permissions for managing the tenant's collection in CollectionSpace (Optional). An out-of-the-box set of roles and permissions for a collection management system would be available.
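The provisioning information above can be modelled as a request that is validated before a tenant is created. This Python sketch assumes the museum domains named in the Scaling out section and two made-up default roles; the field and function names are hypothetical.

```python
from dataclasses import dataclass, field

# Museum domains named elsewhere in this document; the real list is not defined here.
MUSEUM_DOMAINS = {"life sciences", "art history", "anthropology and archeology", "architecture"}
# Hypothetical out-of-the-box roles applied when the tenant supplies none.
DEFAULT_ROLES = ("tenant administrator", "collection manager")


@dataclass
class ProvisioningRequest:
    organization_name: str  # 1. organization information (required)
    museum_domain: str      # 2. selected museum domain (required)
    admin_email: str        # 3. user registered as tenant administrator (required)
    roles: list[str] = field(default_factory=list)  # 4. custom roles (optional)


def validate(request: ProvisioningRequest) -> list[str]:
    """Return a list of problems; an empty list means the request can proceed."""
    errors = []
    if not request.organization_name.strip():
        errors.append("organization name is required")
    if request.museum_domain not in MUSEUM_DOMAINS:
        errors.append(f"unknown museum domain: {request.museum_domain!r}")
    if "@" not in request.admin_email:
        errors.append("a tenant administrator email address is required")
    return errors


def effective_roles(request: ProvisioningRequest) -> list[str]:
    """Fall back to the out-of-the-box roles when step 4 is skipped."""
    return list(request.roles) if request.roles else list(DEFAULT_ROLES)
```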

In addition to the above steps, a museum might choose to customize the schemas of various entities and relationships in CollectionSpace, as described in Schema Extension. This would require additional steps, including creating new document types, optionally creating workflows, generating and deploying tenant-aware bindings, and testing. Among other things, deploying tenant-aware bindings would involve creating the necessary domain and workspaces in the repository.

Who has the authority to provision a tenant?

Not every user in a prospective tenant organization has the authority, or the knowledge, needed to provide the information required for provisioning a tenant in CollectionSpace. A designated user from the tenant's organization (e.g. the tenant administrator, admin@hearstmuseum.berkeley.edu) should first provision the tenant and then register her/himself as a user with tenant administration privileges. S/he can then invite co-workers to register with the appropriate tenant name or ID. How the tenant organization chooses its designated user is not a CollectionSpace concern.

This approach also works when a user is associated with more than one tenant.

Security

In this section, we discuss the impact of a multi-tenant environment on the security architecture. The following topics are covered.

  1. Account provisioning
  2. Authentication Process
  3. Authorization
  4. Audit trail
  5. Callout

Account provisioning


Once the registration is complete, the system would know how to associate a user with a tenant at the time of login.
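One way the system could associate a user with a tenant at login is to key the user registry by tenant and username and return a tenant-qualified security context. The sketch below assumes the user supplies the tenant name or ID when logging in; the class names are hypothetical, and PBKDF2 from Python's `hashlib` merely stands in for whatever authentication provider is actually used.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecurityContext:
    """Tenant-qualified identity handed to the rest of the runtime."""
    username: str
    tenant_id: str


class UserRegistry:
    """Users are keyed by (tenant ID, username), so one person may belong to several tenants."""

    def __init__(self) -> None:
        self._users: dict[tuple[str, str], tuple[bytes, bytes]] = {}

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, tenant_id: str, username: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[(tenant_id, username)] = (salt, self._hash(password, salt))

    def tenants_of(self, username: str) -> list[str]:
        return sorted(tenant for (tenant, user) in self._users if user == username)

    def login(self, tenant_id: str, username: str, password: str) -> Optional[SecurityContext]:
        """Authenticate against the given tenant only; return None on any failure."""
        record = self._users.get((tenant_id, username))
        if record is None:
            return None
        salt, expected = record
        if not hmac.compare_digest(expected, self._hash(password, salt)):
            return None
        return SecurityContext(username=username, tenant_id=tenant_id)
```

In this sketch, a user associated with more than one tenant simply has one registration per tenant, and the tenant ID in the returned context is what the storage layer would use for isolation.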

Authentication Process


Multi-tenancy related implications on authentication


Tenant-qualified security context


Single sign-on

Authorization


Multi-tenancy related implications on authorization


Audit trail

Callout

Impersonation

Trusted subsystem account

Runtime

Metadata services

Configuration

Tenant-aware Bindings

CollectionSpace will have tenant-specific bindings for various components. A binding may include the information listed below, among other things.

  1. Tenant identification (tenant ID, name, display name, etc.)
  2. Repository domain ID assigned to the tenant
  3. Services used by the tenant
  4. Service bindings including
    1. CollectionSpace entity schemas
    2. Repository workspace where objects are stored for the tenant
    3. Document types, etc.
  5. Roles and policies for access control
  6. Vocabularies used
  7. Application configuration in XML
  8. UI specific artifacts such as stylesheets, icons, images, etc.

Such a binding would be created or generated at the time of tenant provisioning. The following describes some of the major components of a tenant binding. The schema for the tenant binding is here.

TenantBindingConfig

[Schema: tenant config]

TenantBindingConfig contains one or more tenant bindings of type TenantBindingType.

TenantBindingType

A TenantBindingType consists of the following.

  1. Each tenant binding consists of one or more service bindings. A service binding represents a CollectionSpace service used by the tenant.
  2. A system-generated unique tenant ID. This identifier is generated when a tenant is provisioned in the system.
  3. A name for the tenant. Usually this would be a domain name without the TLD suffix. It should also include the subdomain, if applicable, e.g. pahma.berkeley.
  4. A displayName, the human-readable English name of the tenant, for example "Museum of Moving Images".
  5. A version, indicating the version of the tenant binding.
  6. Each tenant is assigned a separate space in the repository, called the repositoryDomain.

The following is the XML data type for the tenant binding.

[Schema: tenant binding]
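As an illustration of how a service might read tenant bindings at startup, the Python sketch below parses a hypothetical binding instance with `xml.etree.ElementTree`. The element and attribute names follow the descriptions above, but the real schema's namespaces and nesting are not reproduced here, so treat the layout as an assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical binding instance; the real schema's namespaces and nesting may differ.
SAMPLE = """
<TenantBindingConfig>
  <tenantBinding id="1" name="pahma.berkeley"
                 displayName="Phoebe A. Hearst Museum of Anthropology" version="0.1">
    <repositoryDomain name="Domain-Tenant-1"/>
    <serviceBinding name="CollectionObjects" version="0.1"/>
    <serviceBinding name="Intakes" version="0.1"/>
  </tenantBinding>
</TenantBindingConfig>
"""


@dataclass
class TenantBinding:
    tenant_id: str
    name: str
    display_name: str
    version: str
    repository_domain: str
    services: list[tuple[str, str]]  # (service name, service version)


def parse_tenant_bindings(xml_text: str) -> list[TenantBinding]:
    root = ET.fromstring(xml_text.strip())
    bindings = []
    for tb in root.findall("tenantBinding"):
        services = [(sb.get("name"), sb.get("version")) for sb in tb.findall("serviceBinding")]
        if not services:
            # Each tenant binding consists of one or more service bindings.
            raise ValueError(f"tenant {tb.get('name')!r} has no service bindings")
        domain = tb.find("repositoryDomain")
        bindings.append(TenantBinding(
            tenant_id=tb.get("id"),
            name=tb.get("name"),
            display_name=tb.get("displayName"),
            version=tb.get("version"),
            repository_domain=domain.get("name") if domain is not None else "",
            services=services,
        ))
    return bindings
```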

ServiceBindingType

A ServiceBindingType describes a tenant-specific binding for a CollectionSpace service. It includes the following.

  1. The metadata for the object exchanged with the service consumer. The object could be a CollectionSpace entity object such as CollectionObject, Loan, Intake, etc. Such an object could consist of one or more parts, depending on how the common schema is extended.
  2. The repositoryClient indicates the name of the repository client the service uses to store the object data.
  3. The name and version indicate the name and version of the CollectionSpace service.

The following is the schema for ServiceBindingType.

[Schema: service binding]

ServiceObjectType

ServiceObjectType describes the tenant-specific metadata for each CollectionSpace entity served by a distinct service. This part of the binding is created by the tenant administrator. It contains the following elements.

  1. One or more tenant-defined properties.
  2. One or more parts the object is made up of. For example, a collection object could consist of parts such as system data, common collection object data, museum domain specific collection object data and tenant-specific collection object data.
  3. A serviceHandler, an application-defined handler used to process the object data. This handler should implement the ServiceHandler interface (TBD) and follow the packaging instructions (TBD) to be deployed at runtime.
  4. Each service object is uniquely identified using ID, name and version.
  5. A tenant administrator could update the timestamp to indicate when this service binding was updated. A tenant provisioning service/tool (TBD) could update this field as well.
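The serviceHandler element implies that the service resolves an application-defined handler from its name in the binding at runtime. Because the ServiceHandler interface and its packaging are still TBD, the Python sketch below uses a hypothetical `handle` method and a decorator-based registry; a Java implementation would more likely load the class by name.

```python
from typing import Callable, Protocol


class ServiceHandler(Protocol):
    """Hypothetical stand-in for the ServiceHandler interface (TBD)."""

    def handle(self, parts: dict[str, str]) -> dict[str, str]: ...


# Registry from the name used in a serviceHandler binding to a handler class.
_HANDLERS: dict[str, type] = {}


def service_handler(name: str) -> Callable[[type], type]:
    """Class decorator that registers a handler under its binding name."""
    def register(cls: type) -> type:
        _HANDLERS[name] = cls
        return cls
    return register


def load_handler(binding_value: str) -> ServiceHandler:
    """Instantiate the handler named by a ServiceObjectType binding."""
    cls = _HANDLERS.get(binding_value)
    if cls is None:
        raise LookupError(f"no service handler registered as {binding_value!r}")
    return cls()


@service_handler("CollectionObjectHandler")
class CollectionObjectHandler:
    def handle(self, parts: dict[str, str]) -> dict[str, str]:
        # Example behavior only: trim surrounding whitespace from each part.
        return {label: content.strip() for label, content in parts.items()}
```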

The following snippet shows the schema for ServiceObjectType.

[Schema: service object type]

ObjectPartType

An ObjectPartType represents the metadata for a part of a CollectionSpace entity. The following describes the elements of the object part metadata.

  1. Properties (name-value pairs) predefined by the tenant.
  2. Content metadata of the part, as described by ObjectPartContentType. Various types of content are possible, for example XML content, quoted-printable content, base64-encoded content, etc. The binding contains only the metadata for the content; the content itself is marshalled or unmarshalled at runtime by the service.
  3. The control group indicates to the service where the content resides. This is similar to Fedora control groups. The following control groups are supported by CollectionSpace.
    1. External indicates to CollectionSpace that the content resides in a third-party data store, e.g. MediaVault or CalPhoto.
    2. Managed content is actively managed by CollectionSpace. A collection object stored in the CollectionSpace repository is marked as managed content.
    3. Inline content is provided by the tenant when the service is provisioned. It is static content inserted by the service.
  4. If part content must be versioned, it should be marked as such using the versionable attribute. By default, a part is not versionable.
  5. Changes to part content can be audited. While a global audit option would be available at the tenant level (TBD), auditing can be turned on or off for an individual part using the auditable attribute.
  6. The label is an application-defined label that uniquely identifies the part in service operations (CREATE, READ, UPDATE and INDEX). This label is used to identify and retrieve the part content from a multipart MIME message exchanged with the consumer.
  7. A tenant administrator could update the timestamp to indicate when this part metadata binding was updated. A tenant provisioning service/tool (TBD) could update this field as well.
  8. The order attribute indicates in what order this part should be processed by the CollectionSpace service.

[Schema: object part type]
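The order, label and control group attributes together determine how a service assembles an object from a multipart request. The Python sketch below assumes that External parts arrive as references (for example URLs) and that Inline content is stored in the binding itself; the `ObjectPart` type and `resolve_parts` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectPart:
    label: str            # matches a part of the multipart MIME message
    order: int            # processing order
    control_group: str    # "External", "Managed" or "Inline"
    inline_content: Optional[str] = None  # only used for Inline parts


def resolve_parts(parts: list[ObjectPart],
                  message_parts: dict[str, str]) -> list[tuple[str, str]]:
    """Return (label, content) pairs in the order the service should process them."""
    resolved = []
    for part in sorted(parts, key=lambda p: p.order):
        if part.control_group == "Inline":
            # Static content supplied by the tenant when the service was provisioned.
            if part.inline_content is None:
                raise ValueError(f"inline part {part.label!r} has no content in the binding")
            content = part.inline_content
        elif part.control_group in ("Managed", "External"):
            # Managed: the content itself. External: assumed to be a reference
            # into the third-party store.
            if part.label not in message_parts:
                raise ValueError(f"request is missing part {part.label!r}")
            content = message_parts[part.label]
        else:
            raise ValueError(f"unknown control group {part.control_group!r}")
        resolved.append((part.label, content))
    return resolved
```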

ObjectPartContentType

ObjectPartContentType describes the metadata for the object's content. The part content metadata consists of the following.

  1. An optional content digest. The digest could be created using any of the supported digest algorithms, as shown in ContentDigestType in the XML schema below.
  2. The type of the content. It could be XML content, base64-encoded binary content, or a reference to content located outside CollectionSpace. The XML data type XmlContentType, shown below, includes the schemaLocation and namespace URI to be used when validating the XML content.
  3. A tenant administrator could implement the PartHandler interface (TBD) and configure it as the partHandler. The CollectionSpace service would then use this handler to process the part content at runtime. The tenant administrator should follow the packaging instructions (TBD) to deploy the handler successfully at runtime.
  4. The contentType indicates the MIME media type for the part content.

[Schema: object part content type]
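To illustrate the optional content digest, the Python sketch below decodes a part according to its content type and checks it against the digest recorded for it. The algorithm names in `SUPPORTED_DIGESTS` are assumptions, since the list in ContentDigestType is not reproduced here; the standard-library `hashlib`, `hmac` and `base64` modules do the actual work.

```python
import base64
import hashlib
import hmac

# Assumed binding names mapped to hashlib algorithm names.
SUPPORTED_DIGESTS = {"MD5": "md5", "SHA1": "sha1", "SHA256": "sha256"}


def compute_digest(algorithm: str, content: bytes) -> str:
    """Hex digest of the content using a supported algorithm."""
    hash_name = SUPPORTED_DIGESTS.get(algorithm)
    if hash_name is None:
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    return hashlib.new(hash_name, content).hexdigest()


def decode_content(kind: str, payload: str) -> bytes:
    """Turn the transmitted payload into the bytes that were digested."""
    if kind == "xml":
        return payload.encode("utf-8")
    if kind == "base64":
        return base64.b64decode(payload, validate=True)
    raise ValueError(f"digest not applicable to content kind {kind!r}")


def verify_digest(algorithm: str, expected_hex: str, kind: str, payload: str) -> bool:
    """True if the decoded part content matches the expected digest."""
    actual = compute_digest(algorithm, decode_content(kind, payload))
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_hex.lower())
```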

Customization

Extension points or plug-ins

Access control policies, resources and roles

Scalability

Scaling out

To scale out the document repository described in Approach #2, we can use a hybrid of Approach #1 and Approach #2. Here, tenants would be divided into separate repositories by CollectionSpace museum domain (e.g. life sciences, art history, anthropology and archeology, architecture, etc.). Each Nuxeo domain (e.g. Domain-Tenant-1) in a repository still represents a tenant (e.g. Tenant-1); that is, one repository would host multiple tenants from the same CollectionSpace museum domain (e.g. anthropology museums). The CollectionSpace server could also be scaled out: each CollectionSpace server could use one or more Nuxeo repositories, and there could be one or more CollectionSpace servers.

[Diagram: multi-tenant scale out]
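The hybrid layout can be sketched as a two-level lookup: museum domain to repository, then tenant to Nuxeo domain within that repository. The repository names and the `Domain-<tenant>` convention in this Python sketch are hypothetical, chosen to match the example names in the diagram.

```python
from dataclasses import dataclass

# Hypothetical deployment: one repository per museum domain.
REPOSITORIES = {
    "anthropology and archeology": "repo-anthropology",
    "art history": "repo-art-history",
}


@dataclass(frozen=True)
class Placement:
    museum_domain: str
    nuxeo_domain: str


class TenantRouter:
    def __init__(self, repositories: dict[str, str]) -> None:
        self._repositories = dict(repositories)
        self._placements: dict[str, Placement] = {}

    def place(self, tenant_id: str, museum_domain: str) -> Placement:
        """At provisioning: put the tenant in its museum domain's repository."""
        if museum_domain not in self._repositories:
            raise ValueError(f"no repository serves museum domain {museum_domain!r}")
        placement = Placement(museum_domain, f"Domain-{tenant_id}")
        self._placements[tenant_id] = placement
        return placement

    def route(self, tenant_id: str) -> tuple[str, str]:
        """At runtime: (repository, Nuxeo domain) for the tenant in the context."""
        placement = self._placements[tenant_id]
        return self._repositories[placement.museum_domain], placement.nuxeo_domain
```

Adding capacity for a museum domain then means pointing its entry at a new repository, without changing how tenants are mapped to Nuxeo domains.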

Pros:

Cons:

Issues:

CollectionSpace ID space and service

Administration

Delegated Administration

Monitoring

Backup and restore

Import/Export

Hot deployment

References

  1. Nuxeo multi-tenancy issues
  2. Multi-tenancy project
  3. Organization Service Home
  4. Person Service Home
  5. Contact Service Home
  6. Nuxeo documentation
  7. Java Security Overview
  8. Java Authentication and Authorization Service JAAS
  9. A brief introduction to XACML
  10. eXtensible Access Control Markup Language (XACML) specifications