Application Deployment Profiles
We currently see two core deployment profiles, or ways of installing and delivering the CollectionSpace application functionality. These two profiles reflect the requirements that emerged from the institutional profiles identified earlier by the project management staff:
- CollectionSpace as a standalone installation, locally and independently maintained for a given institution;
- CollectionSpace as a hosted service that is centrally maintained and in which different institutions/collections are managed as additional hosted instances.
Both of these will use the same core service logic and will share most if not all of the same end-user experience, although there may be some differences in the integration logic.
Looking forward, this means that the architectures and designs we develop must support both scenarios reasonably well. This may pose some challenges as we consider different models and technology tools; however, there are strong and clear needs for both profiles, so we will have to find a way to accommodate both of them.
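To illustrate what supporting both profiles might mean in code, here is a minimal Java sketch, assuming that the only difference between the profiles is how a request is mapped to an institution's instance. The names `TenantResolver`, `StandaloneResolver`, `HostedResolver`, and `CatalogService`, and the example host names, are hypothetical and do not come from any CollectionSpace design.

```java
import java.util.Map;
import java.util.Optional;

class DeploymentProfilesDemo {
    public static void main(String[] args) {
        // Standalone profile: one institution, maintained locally.
        CatalogService local = new CatalogService(new StandaloneResolver("museum-a"));
        // Hosted profile: several institutions served as hosted instances.
        CatalogService hosted = new CatalogService(new HostedResolver(Map.of(
                "museum-a.example.org", "museum-a",
                "archive-b.example.org", "archive-b")));
        System.out.println(local.describe("localhost"));
        System.out.println(hosted.describe("archive-b.example.org"));
        System.out.println(hosted.describe("unknown.example.org"));
    }
}

// Hypothetical hook: decides which institution's instance a request belongs to.
interface TenantResolver {
    Optional<String> resolve(String hostName);
}

// Standalone profile: every request belongs to the single local institution.
final class StandaloneResolver implements TenantResolver {
    private final String institutionId;

    StandaloneResolver(String institutionId) {
        this.institutionId = institutionId;
    }

    @Override
    public Optional<String> resolve(String hostName) {
        return Optional.of(institutionId);
    }
}

// Hosted profile: the request's host name selects one of several hosted instances.
final class HostedResolver implements TenantResolver {
    private final Map<String, String> hostToInstitution;

    HostedResolver(Map<String, String> hostToInstitution) {
        this.hostToInstitution = Map.copyOf(hostToInstitution);
    }

    @Override
    public Optional<String> resolve(String hostName) {
        return Optional.ofNullable(hostToInstitution.get(hostName));
    }
}

// Core service logic, written once; it never checks which profile it runs under.
final class CatalogService {
    private final TenantResolver resolver;

    CatalogService(TenantResolver resolver) {
        this.resolver = resolver;
    }

    String describe(String hostName) {
        return resolver.resolve(hostName)
                .map(id -> "serving collections for " + id)
                .orElse("unknown instance");
    }
}
```

Because the service receives its resolver from outside, the same core logic could be packaged for either profile, with any differences in integration logic confined to the pluggable pieces.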
Technology Platform Recommendations
The tech team is still in the midst of some basic analysis tasks, and has yet to get very far into the architecture of the system we're building. Nevertheless, we have begun to see certain themes emerge, and we want to share our thinking with the community. At this point, we're discussing a set of Technology Platform Recommendations. These are based upon a number of factors, including:
- An analysis of the functionality and capabilities that we believe will be required in the CollectionSpace applications, and what will be required to support this. This is guided by a number of sources, including the needs expressed in the Community Design Workshops, the functionality implied by evaluation standards like the CHIN framework and Spectrum, and the functionality that is available in existing solutions like OpenCollection.
- An analysis of the systems and projects we've found that lend themselves to re-use. We do not have the resources to build the entire application from scratch, nor do we think this would be wise even if we did have the resources. To the extent reasonable, we are trying to find open source platforms, packages or other projects that we can either build directly upon, or from which we can repurpose significant functionality for CollectionSpace.
- Standards, tools, and principles that are used by other projects and services and that make sense for CollectionSpace to align with. This includes some institution-specific initiatives like the Kuali projects for various university services, and a variety of projects that share funding sources (e.g., Mellon).
- The requirements and constraints of the deployment profiles (described above). This includes certain real-world factors such as the available expertise (for a given technology) within museums and archives, or among technical service providers that support museums and archives. It may also reflect common technology environments for institutions likely to deploy CollectionSpace (in either of the two deployment profiles).
We have identified a number of areas of analysis. In some cases we have eliminated certain approaches but are still considering several alternatives. In other cases, some consensus is emerging on the approach we are recommending.
Content and Metadata Storage/Asset Management: This is still somewhat open, and we are looking for existing platforms that can provide much of this for us. We have explored several approaches, including triple stores and XML databases, and have not found a good reason to favor them over more traditional relational models: we found performance issues with the use cases we expect to see, and do not yet see a significant need for the few advantages that triple stores or XML databases provide. We are nevertheless still investigating several large projects in this space, including 1) Fedora (Fedora Commons, not the Linux distribution), 2) Alfresco, and 3) Nuxeo. At this point we like what we have seen in Nuxeo, but we have not yet completed our analysis.
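Since the storage platform is still open, one way to keep the choice reversible is to have service code depend only on a narrow repository interface. The Java sketch below (Java 16 or later, for records) assumes that approach; `ObjectRepository`, `CollectionObject`, and the in-memory implementation are hypothetical, and none of this reflects the actual APIs of Fedora, Alfresco, or Nuxeo. A relational or Nuxeo-backed implementation would sit behind the same interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class RepositoryDemo {
    public static void main(String[] args) {
        ObjectRepository repo = new InMemoryObjectRepository();
        String id = repo.create(Map.of("title", "Glazed vase"));
        repo.update(id, Map.of("title", "Glazed vase (restored)"));
        System.out.println(repo.read(id));
        System.out.println(repo.delete(id) + " " + repo.read(id).isPresent());
    }
}

// Hypothetical record: an identifier plus simple field/value metadata (immutable copy).
record CollectionObject(String id, Map<String, String> fields) {
    CollectionObject {
        fields = Map.copyOf(fields);
    }
}

// Hypothetical storage contract that service code depends on; backends are swappable.
interface ObjectRepository {
    String create(Map<String, String> fields);
    Optional<CollectionObject> read(String id);
    boolean update(String id, Map<String, String> fields);
    boolean delete(String id);
}

// In-memory stand-in for tests and prototyping; not thread-safe, nothing is persisted.
final class InMemoryObjectRepository implements ObjectRepository {
    private final Map<String, CollectionObject> store = new LinkedHashMap<>();
    private long nextId = 1;

    @Override
    public String create(Map<String, String> fields) {
        String id = "obj-" + nextId++;
        store.put(id, new CollectionObject(id, fields));
        return id;
    }

    @Override
    public Optional<CollectionObject> read(String id) {
        return Optional.ofNullable(store.get(id));
    }

    // Replaces the object's metadata; returns false if the id is unknown.
    @Override
    public boolean update(String id, Map<String, String> fields) {
        if (!store.containsKey(id)) {
            return false;
        }
        store.put(id, new CollectionObject(id, fields));
        return true;
    }

    @Override
    public boolean delete(String id) {
        return store.remove(id) != null;
    }
}
```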
Indexing, Search and Retrieval Mechanisms: This is somewhat related to the above, especially where technologies like SPARQL and XQuery are tied to the underlying repository. However, we are trying to separate this for the purposes of our analyses. At this point, we are leaning towards Lucene for reasons of performance, stability, wide deployment and flexible language bindings.
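Lucene is built around an inverted index, which maps each term to the documents that contain it. The Java sketch below shows only that underlying idea, with a crude tokenizer and an AND query; it does not use or imitate Lucene's API, and `InvertedIndex` and the sample records are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class InvertedIndexDemo {
    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add("obj-1", "Blue glazed ceramic vase");
        index.add("obj-2", "Ceramic bowl, unglazed");
        index.add("obj-3", "Bronze vase with handles");
        System.out.println(index.search("ceramic vase")); // [obj-1]
        System.out.println(index.search("vase"));         // [obj-1, obj-3]
    }
}

// Maps each normalized term to the sorted set of document ids containing it.
final class InvertedIndex {
    private final Map<String, SortedSet<String>> postings = new HashMap<>();

    // Very simple analyzer: lower-case, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    void add(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: ids of documents that contain every query term.
    SortedSet<String> search(String query) {
        List<String> terms = tokenize(query);
        SortedSet<String> result = null;
        for (String term : terms) {
            SortedSet<String> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

Lucene adds what this sketch omits, such as configurable analyzers (including stemming), relevance scoring, and persistent on-disk index segments.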
Enterprise Service Bus (ESB) and web-service support layers: We are trying to defer a final decision on any specific ESB; however, there is a strong incentive to go with a Java solution. Many, perhaps most, of the open source projects focused on services seem to use a Java platform for middleware and orchestration support, and the projects in this domain seem to be maturing fairly rapidly. JBoss is widely used across a range of enterprises and seems to be fairly full-featured. Mule is getting strong reviews from business analysts and is also feature-rich. JBoss, Mule, and ServiceMix were identified as strong candidates in a Mellon-funded analysis by a group of higher-education institutions. All of these have enough industry adoption that commercial support is available in addition to community support.
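One core ESB function is content-based routing: inspecting each message and delivering it to the endpoints whose rules match. The plain-Java sketch below (Java 16 or later) illustrates that pattern only; `Message`, `MessageRouter`, and the event type names are hypothetical, and this is not the API of JBoss, Mule, or ServiceMix, which layer transports, transformation, and delivery guarantees on top of the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class MessageRouterDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        MessageRouter router = new MessageRouter()
                .route(m -> m.type().equals("object.created"), m -> log.add("index " + m.body()))
                .route(m -> m.type().startsWith("loan."), m -> log.add("notify loans desk: " + m.body()));
        router.dispatch(new Message("object.created", "obj-1"));
        router.dispatch(new Message("loan.requested", "obj-2"));
        router.dispatch(new Message("unknown.event", "obj-3"));
        log.forEach(System.out::println);
        System.out.println("unrouted: " + router.deadLetters());
    }
}

// Hypothetical message: a type used for routing plus an opaque body.
record Message(String type, String body) {}

// Hypothetical minimal content-based router: each rule pairs a predicate with an endpoint.
final class MessageRouter {
    private record Rule(Predicate<Message> matches, Consumer<Message> endpoint) {}

    private final List<Rule> rules = new ArrayList<>();
    private final List<Message> deadLetters = new ArrayList<>();

    MessageRouter route(Predicate<Message> matches, Consumer<Message> endpoint) {
        rules.add(new Rule(matches, endpoint));
        return this;
    }

    // Delivers to every matching endpoint and returns the delivery count;
    // messages that match no rule are kept on a dead-letter list.
    int dispatch(Message message) {
        int delivered = 0;
        for (Rule rule : rules) {
            if (rule.matches().test(message)) {
                rule.endpoint().accept(message);
                delivered++;
            }
        }
        if (delivered == 0) {
            deadLetters.add(message);
        }
        return delivered;
    }

    List<Message> deadLetters() {
        return List.copyOf(deadLetters);
    }
}
```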
Programming Languages: For the back-end functionality and services, there is a strong motivation to adopt Java. This is based partly on the same considerations as for the ESB, and also on the large (and growing) community of SOA developers familiar with Java and the relevant Java tools. There may be room for Python in some orchestration and integration tasks, especially given its popularity with many developers. For the front end (application layer), we are currently leaning towards either PHP or Ruby on Rails. Both are in wide use for web development, enjoy substantial uptake among web developers (with PHP holding a slight edge here), and offer fairly solid functionality. We do not see a strong reason to choose one over the other, so the decision may rest on a wider analysis of the technical staff in, and supporting, museums and archives. Finally, in the commercial enterprise space many applications are written end to end in Java, but we have not seriously considered this approach up to now, mostly because of the constraints described above.
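With a Java back end and a PHP or Ruby front end, the two tiers would most likely communicate over HTTP using a language-neutral format. The sketch below assumes JSON over HTTP and uses only the JDK's built-in `com.sun.net.httpserver` server and the `java.net.http` client (Java 11 or later); the `/status` endpoint and its payload are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class BackEndServiceDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        HttpServer server = startService(0); // port 0: let the OS pick a free port
        try {
            HttpResponse<String> response = get(server.getAddress().getPort(), "/status");
            System.out.println(response.statusCode() + " " + response.body());
        } finally {
            server.stop(0);
        }
    }

    // Hypothetical service endpoint returning a small JSON document
    // that a front end in any language could parse.
    static HttpServer startService(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/status", exchange -> {
            byte[] body = "{\"service\":\"collectionspace-demo\",\"ok\":true}"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Plays the role of the front-end tier: an ordinary HTTP GET.
    static HttpResponse<String> get(int port, String path) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + port + path)).GET().build();
        return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

A PHP or Ruby front end would issue the same GET request and parse the JSON, so neither tier needs to know the other's implementation language.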
Presentation Frameworks: A server-side presentation framework will provide much of the web tier functionality for CollectionSpace, including HTML page generation, handling the request/response lifecycle of the application, management of URL space, and RESTful views on the service layer. The choice of a specific web tier technology is directly tied to the choice of programming language for the application layer. As a result, presentation frameworks will be evaluated alongside programming languages, so that we can select a web tier technology that is simple, quick, and well-factored. A primary consideration for presentation frameworks is the ability to easily deliver a naturally "webbish" experience, including statelessness, bookmarkability, user-friendly URLs, and RESTfulness. The ability to reuse markup-generating components without writing a lot of controller glue code is essential. Technologies slated for evaluation include Django (Python); Rails or Merb (Ruby); Grails (Java/Groovy); and RSF, Tapestry, or Spring MVC+WebFlow (Java).
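Much of what "user-friendly URLs" and "management of URL space" mean in practice is mapping URI templates such as `/objects/{id}` onto handlers. The Java sketch below (Java 16 or later) shows that mapping in miniature; `UrlRouter`, the templates, and the handlers are hypothetical and do not reflect the routing API of Rails, Django, Grails, or Spring MVC. Each `{name}` variable matches exactly one path segment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlRouterDemo {
    public static void main(String[] args) {
        UrlRouter router = new UrlRouter()
                .get("/objects", p -> "list objects")
                .get("/objects/{id}", p -> "show object " + p.get("id"))
                .get("/objects/{id}/media/{mediaId}", p -> "media " + p.get("mediaId") + " of " + p.get("id"));
        System.out.println(router.handle("GET", "/objects/obj-42"));
        System.out.println(router.handle("GET", "/objects/obj-42/media/7"));
        System.out.println(router.handle("GET", "/nowhere"));
    }
}

// Hypothetical router: each URI template is compiled to a regex with one capture group per variable.
final class UrlRouter {
    private record Route(String method, Pattern pattern, List<String> names,
                         Function<Map<String, String>, String> handler) {}

    private static final Pattern VARIABLE = Pattern.compile("\\{([A-Za-z][A-Za-z0-9]*)\\}");
    private final List<Route> routes = new ArrayList<>();

    UrlRouter get(String template, Function<Map<String, String>, String> handler) {
        List<String> names = new ArrayList<>();
        StringBuilder regex = new StringBuilder();
        Matcher m = VARIABLE.matcher(template);
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(template.substring(last, m.start()))); // literal text
            regex.append("([^/]+)");                                          // one path segment
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        routes.add(new Route("GET", Pattern.compile(regex.toString()), names, handler));
        return this;
    }

    // Returns the first matching handler's output, or "404 Not Found" when no route matches.
    String handle(String method, String path) {
        for (Route route : routes) {
            Matcher m = route.pattern().matcher(path);
            if (route.method().equals(method) && m.matches()) {
                Map<String, String> params = new HashMap<>();
                for (int i = 0; i < route.names().size(); i++) {
                    params.put(route.names().get(i), m.group(i + 1));
                }
                return route.handler().apply(params);
            }
        }
        return "404 Not Found";
    }
}
```

Keeping identifiers in the URL path rather than in server-side session state is what lets such views stay stateless and bookmarkable.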
Client-Side Toolkits: On the client side, the Fluid framework will power a rich user experience through the creation of fully accessible components that cooperate across client and server boundaries. The Fluid framework will likely be extended to support validation logic that can be reused on both the client and server.
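One way to reuse validation logic on both client and server is to express the rules as plain data that each side evaluates. The Java sketch below (Java 16 or later) assumes that design for the server side and serializes the same rules to JSON for a client-side component; `FieldRule`, the rule kinds, and the object-number format are hypothetical and are not part of the Fluid framework. A real scheme would also need to restrict patterns to syntax that Java and JavaScript interpret identically, and the client would have to anchor them, since Java's `Pattern.matches` requires a full match.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SharedValidationDemo {
    public static void main(String[] args) {
        List<FieldRule> rules = List.of(
                FieldRule.required("objectNumber"),
                FieldRule.pattern("objectNumber", "[A-Z]{2,4}\\.\\d{4}\\.\\d+"),
                FieldRule.maxLength("title", 80));
        System.out.println(FieldRule.validate(rules, Map.of("title", "Vase")));
        System.out.println(FieldRule.toJson(rules)); // what would be shipped to the client
    }
}

// Hypothetical rule: plain data, so the same definitions can be evaluated on the
// server and serialized for a client-side component to evaluate as well.
record FieldRule(String field, String kind, String argument) {
    static FieldRule required(String field) { return new FieldRule(field, "required", ""); }
    static FieldRule maxLength(String field, int max) { return new FieldRule(field, "maxLength", Integer.toString(max)); }
    static FieldRule pattern(String field, String regex) { return new FieldRule(field, "pattern", regex); }

    // Empty when the value passes; otherwise a short error message.
    // Only "required" rejects a missing value; the other kinds skip it.
    Optional<String> check(String value) {
        boolean ok = switch (kind) {
            case "required" -> value != null && !value.isBlank();
            case "maxLength" -> value == null || value.length() <= Integer.parseInt(argument);
            case "pattern" -> value == null || Pattern.matches(argument, value);
            default -> throw new IllegalArgumentException("unknown rule kind: " + kind);
        };
        return ok ? Optional.empty() : Optional.of(field + " fails " + kind);
    }

    // Server-side evaluation of all rules against a submitted record, in rule order.
    static List<String> validate(List<FieldRule> rules, Map<String, String> record) {
        return rules.stream()
                .map(r -> r.check(record.get(r.field())))
                .flatMap(Optional::stream)
                .collect(Collectors.toList());
    }

    // Serializes the rules as a JSON array of {field, kind, argument} objects.
    static String toJson(List<FieldRule> rules) {
        return rules.stream()
                .map(r -> "{\"field\":\"" + esc(r.field()) + "\",\"kind\":\"" + esc(r.kind())
                        + "\",\"argument\":\"" + esc(r.argument()) + "\"}")
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Minimal JSON string escaping for backslashes and quotes.
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```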