UC Berkeley Deployments Platform Strategy

May 2013 status

With three CollectionSpace deployments now in production at UC Berkeley, the team is analyzing how those instances are operated.  What is the target platform and set of practices that are allow us to operate CollectionSpace so that it is:

  • efficient
  • cost-effective
  • stable
  • performs well

There are several dimensions to this question:

  • system administration
  • database administration
  • CollectionSpace application administration
  • tools and utilities

What can we do in May and June to move this forward?

Do we want to try update all servers to be on the same versions of Red Hat, Postgres, and other software?

What is the problem with our current operations?

  • Upgrading servers is time-consuming because each one was set up at different times
  • Deploying servers is time-consuming
  • Tacit knowledge and practices are held in one person's head
  • Too much risk
  • Programmer performing administration tasks should be spending time using collections expertise
  • Dedicated database administrators and system administrators should be able to perform these tasks taking advantage of their specialized expertise

Database administration questions

Questions about database administration (related to having another team manage our Postgres instances):

  • If we need to restart the database after standard support hours, how will that be done? We are in the process of signing a contract with an outside vendor who will supply 24 x 7 break/fix support for PostgreSQL and MySQL databases.
  • Can we run SQL DDL statements on the production server? Yes. This will require a bit more discussion; while we want to embrace the idea of self service, we also want to make sure that Production environments are treated in a way to maintain stability and consistency. 
  • Can we import Postgres snapshots (pgdump files) from other server that have the same schema? Yes.
  • Can we create pgdump files that we have access to? Yes.
  • Can we choose specific versions of Postgres? Yes.
  • What tools or utilities are available to us to manage the Postgres instance? We do not provide any tools to manage the instance. Again, we should have more detailed discussions regarding roles and responsibilities, and expectations in terms of System DBA functions vs Application DBA functions. 
  • Can we kill queries? We can write a function that can give you permission to kill queries.
  • Who tweaks Postgres configuration files? The DBA team.
  • Can we get detailed help on performance questions that require deeper analysis of the queries? Yes. We can supply assistance on performance, or we can also leverage the outside vendor to supply in depth performance tuning.
  • If we separate the database onto a different VM, to what degree would we need the application server to be on the same physical machine? As a best practice, we advise you to put the application server on a separate machine.
  • Can we have a login account with the Create DB and Create role attributes? We should discuss why you would want to do this.
  • Can we customize template1? We should discuss why you would want to do this.
  • Are there restrictions on the network access settings in pg_hba.conf? Yes.
  • If we needed to temporarily add an IP address to pg_hba.conf, what would be the turn-around time? 4 hours is our standard turnaround time.
  • If we wanted to experiment with a different setting in postgresql.conf, what would be the turn-around time? 4 hours is our standard turnaround time.
  • Some of the pg_catalog tables/views have restricted columns, in particular pg_stat_activity restricts the "query" column. Could we arrange to have access to that information during performance testing? Yes.

System administration questions

Questions about system administration (related to having another team manage our virtual machines and operating systems):

  • We will need to have Apache2 and Tomcat installed, Apache will be the standard version from the software channel, but Tomcat will be a non-standard version, installed manually.  Can we do that and can we have full access No. to the tomcat account?
  • Apache will be configured for HTTPS with a comodo certificate.  Who will be responsible for installing the CA certificate?
  • Can we have sudo access for a list of No. init.d scripts?  e.g. tomcat, apache (httpd), postgresql? 
  • If we are running postgresql locally on the VM, can we have sudo access to the postgres account and configuration files? No.
  • Collectionspace No. does not work with openJava, it requires the Oracle version of java7.  Can we have jdk1.7 installed (as the system default) and updated regularly?
  • Can we manage the host firewall ruleset? No.
  • We have a set of working tools that will need to be installed (git, maven, ant) - who will handle that?
  • Can we have crontabs  for tomcat and postgres? Not for postgres.
  • Can we have sudo access to manage the logrotate service?
  • We typically have a global configuration file "/etc/profile.d/cspace.sh". Will that be okay and can we own it (be able to edit it)?
  • Currently, our VMs have sudo rules allowing some operations to be executed with no password - such as allowing members of a designated group to restart tomcat. Will this be possible?
  • Depending on the OS version, Django may need to be installed manually. Is that okay?
  • Can we reboot the system when needed?
  • Can we manage users and groups?

Keep in mind these moving targets

  • We might want to adjust our storage model. Right now, all storage is mounted through LVM and can not be mounted to another host.
  • For reporting and/or web apps, we might want a separate database snapshot.
  • Web apps might or might not reside on the same VM as the CollectionSpace application server.
  • Image processing might move off to a different server.