Draft.  The project team should verify these initial impressions with a set of users.  More precise information about industry standards for things like page load times can probably be gathered by more knowledgeable experts!

CollectionSpace is subject to a set of performance expectations characterized by two overlapping drivers:

  • As a web-based system, CollectionSpace will be expected by most users to perform as well as modern, well-developed websites for most operations (i.e., page loads for most pages should be fast).
  • As a highly transactional collection management system, CollectionSpace will likely be used most heavily by professionals working with collections, and those users have a basic set of expectations tied to their roles as data custodians and system users (i.e., they will usually understand that data-intensive operations such as massive updates and extracts can take time).

The grid below captures the areas of system performance we should be aware of. Each area is given a ranking of relative importance and a general sense of the performance expected.

User stories based on this grid are at:

Top Level Story: CSpace 0.7a Roadmap
Detailed Stories: Performance Optimization

Area | Priority | Expectation and comments
-----|----------|--------------------------
Time to login and get to dashboard | High | Industry standard
Displaying any detail screen with data in it (especially a system with 600K to millions of records) | High | Industry standard
Displaying lists of objects or records, and pagination (especially a system with 600K to millions of records) | High | Industry standard
General navigation (e.g., moving from tab to tab) | High | Industry standard
Searching (especially a system with 600K to millions of records) | High | Industry standard, though museum professional users might be more forgiving
Data entry on data entry screens (object, procedural, authorities, admin) | High | Industry standard, including fast tabbing from field to field
Saving data entered on data entry screens | High | Industry standard
Updating and saving data on data entry screens | High | Industry standard
Entering, updating, and saving relationships among things | High | Industry standard
Uploading media, including any processing done by the system | Lower | Museum standard? Experience of museum users with current systems might encourage patience
Importing data | Lower | Museum standard? Experience of museum users with current systems might encourage patience
Exporting data | Lower | Museum standard? Experience of museum users with current systems might encourage patience
Running a report | Lower | Museum standard? Experience of museum users with current systems might encourage patience
Response to RESTful interoperability calls | Lower | Museum standard? Experience of museum users with current systems might encourage patience
Bulk media upload | Lower | Museum standard? Experience of museum users with current systems might encourage patience
Displaying media housed in other systems | Lower | Museum standard? Experience of museum users with current systems might encourage patience
Performing batch updates | Lower | Museum standard? Experience of museum users with current systems might encourage patience

Regarding industry standard expectations for basic page loads, Aron provided this useful information:

In 2009, the IT and marketing consultancy Forrester Consulting found that, for websites aimed at mass-market consumers, 2 seconds is the expected wait time for a page to load, and that load times exceeding 3 seconds will adversely affect the experience of at least a large minority of consumers.

"Two seconds is the new threshold in terms of an average online shopper's expectation for a webpage to load, and 40% of shoppers will wait no more than three seconds before abandoning a retail site, according to a new study conducted by Forrester Consulting on behalf of Akamai Technologies."

Performance is also heavily impacted by data volumes.
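
To make that threshold concrete, a page load can be spot-checked with a few lines of scripting. The sketch below (Python) is illustrative only: the URL is a hypothetical placeholder for a local CollectionSpace instance, and the 2-second target is simply the Forrester figure quoted above, not a formally adopted requirement.

# Spot-check a page load against the 2-second figure cited above.
# The URL and port are hypothetical placeholders, not documented CollectionSpace endpoints.
import time
import urllib.request

URL = "http://localhost:8180/collectionspace/"  # placeholder for a local instance
THRESHOLD_SECONDS = 2.0                         # Forrester's expected page-load time

start = time.monotonic()
with urllib.request.urlopen(URL, timeout=10) as response:
    response.read()                             # include full transfer time, not just time to first byte
elapsed = time.monotonic() - start

status = "within" if elapsed <= THRESHOLD_SECONDS else "over"
print(f"Page loaded in {elapsed:.2f}s ({status} the {THRESHOLD_SECONDS:.0f}s target)")

Note that this measures only server response and transfer time; in-browser rendering adds to what a user actually experiences.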

Use Cases

2 Comments

  1. Moving Image CMS – performance stats (unofficial)

    Login: 2s

    Search returning <10,000 results: 2s
    Search returning 10,000-20,000 results: 4s
    Search returning 20,000-30,000 results: 6s
    Search for all (star): 25s

    Search result to object record: 1s
    Object record to next object record: 1s
    Tabs w/in object record: 1s

    Search for authority term w/in object record: 1s

    Upload media 5MB: 3s
    Upload media 25MB: 17s
    Upload media 100+MB: 75s

    (Larger images double for processing time)

  2. Hearst Museum CMS — performance stats (unofficial)

    All tests were run on 8/4/2010 on a Windows XP machine (Core 2 Duo E6750, 2.67 GHz, 3.25 GB RAM).
    Gigabit Cat 5 connectivity. Network speed during the testing periods averaged 65.98 Mb/s download and 39.01 Mb/s upload (range 42.47–89.33 download, 26.42–52.64 upload).

    Tests were repeated at three times during the day (high CMS usage, low-moderate CMS usage, and zero-low CMS usage); all values are included in an effort to capture typical performance of the CMS.

    I'll post the raw data in a subsequent post if I can't figure out how to attach an Excel file to this post.

    The queries were entered into the GUI of the CMS under "Find Object Number:" using wildcards. In all cases (even 'find all', where only a single wildcard character was entered in the search field), the CMS interpreted and executed the query as a simple single-table query:

    SELECT DISTINCT a0108.ObjectID, a0108.ObjectNumber, a0108.DepartmentID SecurityCategoryID, a0108.SortNumber
    FROM [TMS]..Objects a0108
    WHERE ((a0108.ObjectID > -1 AND ((a0108.ObjectNumber LIKE '1-%')) AND a0108.IsVirtual=0) AND a0108.IsTemplate=0)
    ORDER BY a0108.SortNumber ASC

    In the case of 'find all', the query seems to have been essentially the same:

    SELECT DISTINCT a0108.ObjectID, a0108.ObjectNumber, a0108.DepartmentID SecurityCategoryID, a0108.SortNumber
    FROM [TMS]..Objects a0108
    WHERE ((a0108.ObjectID > -1 AND ((a0108.ObjectNumber LIKE '%')) AND a0108.IsVirtual=0) AND a0108.IsTemplate=0)
    ORDER BY a0108.SortNumber ASC

    I'd like to also do more complex queries to see what the times are for those. I'll post when/if I do.

    The best fit for (X = number of records in result set) versus (Y = query-to-results-rendered time) is a second-degree polynomial (r^2 = 0.98509): y = -1E-11x^2 + 3E-05x + 1.1711 (or (-.00000000001x^2) + .00003x + 1.1711).

    A simple linear regression fits slightly less well (r^2 = 0.9785): y = 2E-05x + 1.3065 (or (.00002x) + 1.3065).

    In either case, the constants (1.1711 or 1.3065) probably reflect the reality of having a certain amount of [non-scaling] overhead required for any search, regardless of number of results returned (about 1.2–1.3 seconds).

    Anyhow, here are interpolated results from the polynomial and linear regressions (both fits are also re-evaluated in the short sketch after the comments):

    Polynomial regression
    Seconds Records
    1.17 1
    1.17 10
    1.17 100
    1.20 1000
    1.32 5000
    1.47 10000
    1.62 15000
    1.77 20000
    1.91 25000
    2.06 30000
    2.36 40000
    2.65 50000
    2.94 60000
    3.22 70000
    3.51 80000
    3.79 90000
    4.07 100000
    4.76 125000
    5.45 150000
    6.11 175000
    6.77 200000
    8.05 250000
    9.27 300000
    10.45 350000
    11.57 400000
    13.67 500000
    15.57 600000
    17.27 700000

    Linear regression
    Seconds Records
    1.31 1
    1.31 10
    1.31 100
    1.33 1000
    1.41 5000
    1.51 10000
    1.61 15000
    1.71 20000
    1.81 25000
    1.91 30000
    2.11 40000
    2.31 50000
    2.51 60000
    2.71 70000
    2.91 80000
    3.11 90000
    3.31 100000
    3.81 125000
    4.31 150000
    4.81 175000
    5.31 200000
    6.31 250000
    7.31 300000
    8.31 350000
    9.31 400000
    11.31 500000
    13.31 600000
    15.31 700000
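
For reference, the interpolated tables in the comment above can be reproduced directly from the reported coefficients. The short sketch below (Python) simply re-evaluates the two fitted models; the coefficients are copied verbatim from the comment, and the function names are illustrative only.

# Re-evaluate the reported fits (coefficients copied from the comment above).
def polynomial_seconds(records):
    # y = -1E-11*x^2 + 3E-05*x + 1.1711
    return -1e-11 * records**2 + 3e-05 * records + 1.1711

def linear_seconds(records):
    # y = 2E-05*x + 1.3065
    return 2e-05 * records + 1.3065

for n in (1, 1000, 10000, 100000, 700000):
    print(f"{n:>7} records: polynomial {polynomial_seconds(n):.2f}s, linear {linear_seconds(n):.2f}s")

The two fits agree closely for small result sets and drift roughly two seconds apart by 700,000 records, so the choice of model matters little over the range most searches will cover.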