Thursday, June 23, 2011

Archiving in the Networked World: Metrics for Testing (abstract)

This article will appear in Library Hi Tech, v29, no. 3, which should be available in August 2011 in preprint form. The abstract is below.

Purpose: This column looks at how long term digital archiving systems are tested and what benchmarks and other metrics are necessary for that testing to produce data that the community can use to make decisions.

Methodology: The article reviews recent literature about digital archiving systems involving public and semi-public tests. It then looks specifically at the rules and metrics needed for doing public or semi-public testing for three specific issues: 1) triggering migration; 2) ingest rates; and 3) storage capacity measurement.

Findings: Important literature on testing exists but common metrics do not, and too little data is available at this point to establish them reliably. Metrics are needed to judge the quality and timeliness of an archive’s migration services. Archives should offer benchmarks for the speed of ingest, but that will happen only once they come to agreement about starting and ending points. Storage capacity is another area where librarians are raising questions, but without proxy measures and agreement about data amounts, such testing cannot proceed.

Implications: Testing is necessary to develop useful metrics and benchmarks about performance. At present the archiving community has too little data on which to make decisions about long term digital archiving, and as long as that is the case, the decisions may well be flawed.

Saturday, June 18, 2011

German Library Conference in Berlin

The German Library Conference (Bibliothekartag in German) took place last week in Berlin at the Estrel Conference Center in the (far) southeast corner of the city. The theme of the conference was “Libraries for the Future; the Future for Libraries” and, as the theme implies, German libraries are aware that the information world is changing in ways that they cannot simply ignore. A friend describes the conference as a purely incestuous association meeting. The Bibliothekartag is certainly more like the American Library Association meeting than purely scholarly conferences like JCDL or TPDL (formerly ECDL). Nonetheless such meetings are important both to measure the readiness of ordinary libraries to make changes and as an opportunity to educate the profession about topics that they approach with considerable reserve.

I attended only a few sessions because of concurrent meetings at my University. One was by Lynn Connaway from OCLC Research. Lynn was one of relatively few speakers who spoke in English, which the German audience understood without any apparent problems. She spoke about a JISC-funded project in which her task was to find common results among a number of user studies. A point that she passed over quickly in the talk (but which we spoke about in greater detail privately) was the difficulty in finding out exactly how some of the studies were done: how the subjects were chosen, how exactly the data were gathered, or how they were analyzed. Among the common conclusions that she reported were:

  • Virtual Help. Users sometimes prefer online help even in the library because they do not want to get out of their chairs.
  • Squirreling instead of reading. Many users squirrel away information and spend relatively little time working actively with contents.
  • Libraries = books. Many people think of libraries primarily as collections of physical books and often do not realize the library's role in providing electronic resources. They also criticize the physical library and its traditional collections.

I also attended a session that was entitled “Networked Libraries: Service providers for networked data.” Many of the talks discussed linked data or linked open data. Jakob Voss gave the initial lecture and used a visual metaphor of bridges to make the point both about the need for connections and their fragility (one of his slides showed a bridge that had collapsed). The final presentations in this session focused on digital archiving. The first looked at research data with the idea that “data is the new oil”. One major step forward is that DFG and NSF both now require data management plans for data from supported projects. A serious issue is the long term costs for archiving research data, which both nestor and MIT are beginning to examine. The second archiving talk was mine on the LuKII Project (LOCKSS und KOPAL: Infrastruktur und Interoperabilität). In my overview I mentioned the need to understand cultural as well as technical migration -- that is, our cultural understanding of information changes over time, just as do the formats. This evoked some interest during the discussion.

Wednesday, June 1, 2011

AnthroLib map

Nancy Foster, one of the editors of "Studying Students: the Undergraduate Research Project at the University of Rochester" has published a map of anthropologically based library studies that not only shows the geographic distribution, but also shows an impressive number of projects.

In Berlin's "winter" semester (that begins in October 2011), I plan to offer a seminar with a colleague where some students work with her using psychological experiments to evaluate digital libraries, and I use ethnographic methods. Many of the projects in Nancy's map seem to address physical library spaces, but my interest is in digital space.

In talking about cultural anthropology with students I often quote Clifford Geertz:
The "essential task of theory building here is … not to generalize across cases but to generalize within them. ...The diagnostician doesn't predict measles; he decides that someone has them…" (Geertz, 1973, p. 26)
In looking at a particular digital resource the goal should not be to generalize about all users, but to understand the culture and quirks of those who use (or do not use) that particular source. This is very much like the goal of those studying students at particular libraries, except that the users are not necessarily physically there -- which can be a problem.

For the seminar we may well use some portion of the digital presence of our own university library. One possibility might be to repeat some experiments from Studying Students (for example, the one in which students redesign the library website), but to use culturally different sets of users (humanists and natural scientists? German students and Germanists in the US?) to understand different design preferences.

If anyone has done ethnographic experiments in digital space, I would be interested in hearing about them.

Geertz, C. (1973) Thick Description: Toward an Interpretive Theory of Culture. In: The Interpretation of Cultures: Selected Essays. New York, Basic Books, pp. 3-32.