Saturday, May 28, 2011

Rosetta

On Friday I heard a presentation by Ex Libris staff about their Rosetta long-term digital preservation system. Marketing presentations generally do not interest me, but the presenter was the project manager and could actually answer questions about technical issues.

Bitstream Integrity

Rosetta does not address this problem directly, but its designers do not deny its importance, and they have in fact talked with David Rosenthal about it. The system separates bit management from the other layers and allows multiple solutions, including ones that do active integrity checking. When Rosetta must manage the storage directly, it uses checksums and periodically tests the stored copy against them. If a copy's checksum no longer matches the stored checksum, however, the system can only ask someone else to supply a fresh copy, which could be troublesome 100 years from now.
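The kind of periodic checksum test described above can be sketched in a few lines of Python. This is not Rosetta's code: the manifest layout (file path mapped to a SHA-256 hex digest), the choice of SHA-256, and the hypothetical `repair_from_replicas` step are assumptions made for illustration. The repair function shows why an independent source of good copies matters: without a matching replica, the audit can only report the damage.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest):
    """Compare each stored file against the checksum recorded at ingest.

    `manifest` (an assumed layout) maps file path -> expected hex digest.
    Returns the paths that are missing or whose digest no longer matches.
    """
    damaged = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            damaged.append(path)
    return damaged


def repair_from_replicas(path, expected, replicas):
    """Hypothetical repair step: overwrite a damaged file with the first
    replica whose digest still matches the recorded one.

    Returns False when no good replica exists; then the damage can only
    be reported, not fixed.
    """
    for replica in replicas:
        if Path(replica).is_file() and sha256_of(replica) == expected:
            shutil.copyfile(replica, path)
            return True
    return False
```

In a real deployment the audit would run on a schedule, and the replicas would live on independent storage held by other parties rather than alongside the original.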

We talked about whether LOCKSS might integrate with Rosetta at this level. The general answer seemed to be yes, or at least that it might be worth a try.

Authenticity

Rosetta does maintain provenance information, but it has no way to link back to the original and check that the preserved copy remains authentic. This problem is not unique to Rosetta. The digital preservation field really needs to develop reasonable criteria for authenticity testing.

Access

Here Rosetta seems to do a good job of making various access copies and controlling the access rules.

Risk Manager

This feature appears to function somewhat like the migration manager in koLibRI. It uses a database to keep track of technical metadata about formats and versions. Rosetta has a knowledge base that allows institutions "to share their formats, related risks and applications", and a workgroup to enhance that knowledge base. [Thanks to Ido Peled for this addition.]

We also talked more generally about the risk posed by format change. The risk is not really 0 or 1; it is more likely a gradual reduction in access to certain formats. Clearly more thinking is needed about when to trigger migration and what kind of migration (on-the-fly or preventative) makes sense.

Load Speeds

I was pleased to see that Rosetta has tested its performance loading different sizes of data and that the results are publicly accessible (see figure 3). I have recently talked with a number of publishers who are concerned about whether archiving systems can load their content in a timely manner, and I think other systems should test their ingest times too.
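A minimal ingest benchmark of the kind suggested here might look like the following. The `ingest` callable is a hypothetical stand-in for whatever load interface an archiving system exposes; the payload sizes and the best-of-N timing are arbitrary choices for illustration, not the method behind Rosetta's published figures.

```python
import os
import time


def time_ingest(ingest, sizes_bytes, repeats=3):
    """Time a hypothetical `ingest(payload)` call at several payload sizes.

    Returns {size: (best_seconds, megabytes_per_second)}, keeping the best
    of `repeats` runs to reduce noise from other activity on the machine.
    """
    results = {}
    for size in sizes_bytes:
        # Random bytes, so compression or deduplication cannot skew the result.
        payload = os.urandom(size)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            ingest(payload)
            best = min(best, time.perf_counter() - start)
        throughput = (size / 1e6) / best if best > 0 else float("inf")
        results[size] = (best, throughput)
    return results


if __name__ == "__main__":
    # Example: "ingest" into an in-memory list; replace with a real loader.
    store = []
    for size, (secs, mbps) in time_ingest(store.append, [10**3, 10**6, 10**7]).items():
        print(f"{size:>10} bytes  {secs:.6f} s  {mbps:.1f} MB/s")
```

Publishing numbers like these for a range of object sizes, as Rosetta has done, would let publishers compare systems directly.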

Conclusion

The session ended with our agreeing to talk more about potential collaboration in the research arena.


1 comment:

  1. Michael,

    My larger group at the Stanford Libraries has been working on digital preservation for some years (separate from LOCKSS) - you might want to get in touch with Tom Cramer. We've got a whole infrastructure and have been steadily improving our ingest times to be respectable, too.

    - Naomi
