This blog discusses research issues involving digital libraries, digital archiving, and almost anything else that pertains to the management of digital information.
Tuesday, November 1, 2011
eReading
Now a more extensive study at the University of Mainz has reached similar conclusions: "Almost all participants stated that reading from paper was more comfortable than from an e-ink reader despite the fact that the study actually showed that there was no difference in terms of reading performance between reading from paper and from an e-ink reader." The study also found that "the older participants exhibited faster reading times when using the tablet PC." (Source)
The general assumption in Germany is that a strong cultural preference for print on paper is likely to persist. It may, of course, but if the US experience offers any indication, the resistance may give way to the convenience of having multiple works on a single device. On my iPhone right now I have a dissertation and 4 novels. The iPhone is a bit small for dissertation reading, but I never read scholarly works on paper any longer because I want to be able to search them and to look up references simultaneously. I also buy fewer and fewer novels in paper form because I do not want to have to carry one more object with me.
A few years ago I thought that an interesting study would be to sit in the Berlin S-Bahn (elevated train) and count the number of people using eReader devices. Now such a study would be harder, because such a large number of people spend their transit time doing something on their smart phones, but whether that is reading, playing games, or sending email is hard to say. Or perhaps what they are doing is irrelevant. Whatever they are doing, it seems to involve reading on an electronic device.
Thursday, October 27, 2011
CyberAnthropology
The session began with a critique of the presenters' paper in which the speaker noted that the presenters had done no original data collection of their own. In my own part of the academic world that criticism would probably be grounds for rejection from any serious scholarly symposium or conference.
It further became clear that the presenters had almost no background in contemporary scholarly anthropology. Their approach was to throw out the whole empirical basis of contemporary anthropology as too narrow and to replace it with a cloud of philosophers starting with Plato and Aristotle and ending with Paul Ricoeur and Derrida. (Note: I was at the University of Chicago during the years when Ricoeur was there -- his concepts are not entirely new to me.) The presenters believe that "philosophical anthropology" and their understanding of hermeneutics eliminate the need for standards for evidence and rules for persuasion. At least that is what I heard them say over and over again. Nonetheless it is interesting that they welcome data from others. I wonder why?
True to their law faculty roots, the presenters have already acquired the rights to http://www.cyberanthropology.de. Others have already captured the name http://www.cyber-anthro.com/, so the brand is not exclusive.
My objection to this symposium paper is partly that I regard it as an embarrassment to my own Philosophical Faculty 1, which houses the university's departments of philosophy and European Ethnography (cultural anthropology). It does not reflect our standards.
Thursday, October 20, 2011
AnthroLib moves
The map is a typical Google map with the odd quirk that it can start moving and be difficult to stop without reloading the location. The flaw may lie in how I touch the map screen with my cursor. It is mildly annoying. The map shows that most of the AnthroLib projects are US-based and generally east-coast, but perhaps we can get some started in Berlin.
Tuesday, October 18, 2011
O'Reilly Media Ebook report
While I read a great deal on the screen and insist that I only read student papers in electronic form, I admit that I take pleasure in Berlin's excellent book stores with their intelligent selections and recommendations. As physical places, they are a delight. I wish they offered eBooks in house the way Barnes & Noble does, though. Then they would be perfect.
Sunday, September 11, 2011
Archiving in the Networked World: Preserving Plagiarized Works (Abstract)
Methodology: The article looks first at the ingest process (called the Submission Information Package or SIP), then at storage management in the archive (the AIP or Archival Information Package), and finally at the retrieval process (the DIP or Dissemination Information Package).
Findings: The chief argument of this article is that works of plagiarism and the evidence exposing them are complex objects: technically, legally, and culturally. Merely treating them like any other work needing preservation runs the risk of encountering problems on one of those three fronts.
Implications: This is a problem, since currently many public preservation strategies focus on ingesting large amounts of self-contained content that resembles print on paper, rather than on online works that need special handling. Archival systems also often deliberately ignore the cultural issues that affect future usability.
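The OAIS package flow the abstract walks through (SIP at ingest, AIP in storage, DIP at retrieval) can be sketched in a few lines. This is a minimal illustration, not any real archive's schema: the class fields and metadata keys are invented, and a production system would of course record far richer provenance, rights, and fixity information.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the OAIS package flow: SIP -> AIP -> DIP.
# Field names are illustrative, not taken from any real archive's schema.

@dataclass
class SIP:
    content: bytes              # the submitted work itself
    descriptive_metadata: dict  # e.g. title, author, evidence of the plagiarism case

@dataclass
class AIP:
    content: bytes
    descriptive_metadata: dict
    preservation_metadata: dict = field(default_factory=dict)  # fixity, provenance, rights notes

def ingest(sip: SIP) -> AIP:
    """Turn a submission into an archival package, recording provenance."""
    pm = {"ingested_from": "SIP", "fixity": hash(sip.content)}
    return AIP(sip.content, sip.descriptive_metadata, pm)

def disseminate(aip: AIP) -> dict:
    """Build a DIP: what a future user actually receives on retrieval."""
    return {"content": aip.content, "metadata": aip.descriptive_metadata}

sip = SIP(b"text of a contested work", {"title": "Example"})
aip = ingest(sip)
dip = disseminate(aip)
```

The article's point maps onto the sketch directly: for a plagiarized work, the legal and cultural complexity lives in the metadata that must travel from SIP to AIP to DIP, and a system that only moves the content bytes loses exactly what makes the object worth preserving.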
Saturday, July 23, 2011
Microsoft Research Summit 2011 - day 3
The dinner cruise turned out not to be especially cold, since the ship had large indoor areas where we ate. Microsoft also provided an open bar at this and in fact at all the dinners. Often the wine at such functions is dubious, but even the wine was good and the quality of the selection of micro-brew beer was equally impressive.
Of course the goal of the dinner was not food or drink or even the scenery along the lake, but the conversation among colleagues. I ate with fellow deans from Illinois, Michigan, and Carnegie Mellon, and even if no great research comes from our discussions, collegial discourse is an important social component in the efficient functioning of organizations and projects.
Day 3
I should remember more of day 3 than I do, but jet lag had not quite lost its hold and the morning presentations, while good, left little permanent impression. The main event of day three was in any case the iSchool meeting with Lee Dirks and Alex Wade from Microsoft Research. Lee and Alex gave some sense of the projects they are working on. Lee especially has an interest in long term digital archiving that includes involvement in projects like PLANETS. While testing is an official component of PLANETS, I find that it puts less emphasis on testing than on planning and organization. Testing is, however, what is really needed, and that is what I tried to suggest in the meeting -- not, I think, with great success.
The other research aspect that I tried to sell, without much obvious resonance, was ethnographic research on what digital tools people really use and what they really want. Microsoft builds tools and we saw a lot of them that are oriented toward research, but I wonder how well some of them will do in the academic marketplace in the long run. Ethnographic research gives deeper insights into what people understand and misunderstand than do surveys.
Just before I left we held the oral defense for one of my very best MA students [1], whose thesis looked at how a group of literature professors at Humboldt-Universität zu Berlin (which actively supports Open Access) regard Open Access. It was striking how much they misunderstand Open Access and how little they know about it. Most of them would never have filled out a survey. This information would just have slipped away or remained as an anomaly. Microsoft could profit from research like this and had an interest in it in years past. It is less clear that it does today.
[1] Name available on request, with the student's permission.
Tuesday, July 19, 2011
Microsoft Research Summit 2011 - day 2
In the discussion someone said that we aren't doing science if we can't go back to the data. This suggests a clear separation of data and processing, which speakers on very large data sets said is no longer really possible, since the data are usable only after a degree of processing. The speaker was doing fundamentally social science research, whose data are more human-readable than Big Science data.
Other comments of interest:
- Do we need to take the “good” into account in our interpretation of the “natural”?
- It's not a machine that we are interfacing to any more, though the speaker is not sure what it is exactly. There is no machine, but a task.
Microsoft clearly has a strong interest in image management, particularly three-dimensional images such as are used in medical imaging (doubtless a good market) or gaming. They divide a picture into quadrants and create mathematical representations of the edges in each quadrant to build a hash for finding similar photos. Photosynth.net was also demoed -- it allows the creation of three dimensional images from multiple photos.
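The quadrant-and-edges idea can be sketched roughly as follows. The actual Microsoft method was not spelled out in the demo, so the edge measure, bucket size, and similarity function below are invented for illustration only.

```python
# Rough sketch of quadrant-based edge hashing for finding similar photos.
# The real method was not described in detail; the edge measure and the
# quantization threshold (// 50, capped at 15) are invented here.

def edge_energy(block):
    """Sum of absolute differences between horizontal neighbors (a crude edge measure)."""
    return sum(abs(row[i + 1] - row[i]) for row in block for i in range(len(row) - 1))

def quadrant_hash(image):
    """image: 2D list of grayscale values. Returns a 4-tuple of coarse edge buckets."""
    h, w = len(image), len(image[0])
    quads = [
        [row[:w // 2] for row in image[:h // 2]],   # top-left
        [row[w // 2:] for row in image[:h // 2]],   # top-right
        [row[:w // 2] for row in image[h // 2:]],   # bottom-left
        [row[w // 2:] for row in image[h // 2:]],   # bottom-right
    ]
    return tuple(min(edge_energy(q) // 50, 15) for q in quads)  # quantize to 0..15

def similarity(h1, h2):
    """Smaller means more alike; compare the hashes bucket by bucket."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Top half has sharp vertical edges, bottom half is flat.
h = quadrant_hash([[0, 255, 0, 255], [0, 255, 0, 255], [0, 0, 0, 0], [0, 0, 0, 0]])
```

The appeal of any scheme in this family is that visually similar images land on nearby hash values, so candidate matches can be found by comparing short hashes rather than full pixel data.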
Evening Cruise
Microsoft has planned a dinner cruise for the evening. It should be pleasant (I will comment tomorrow), but many of us wanted to go back to the hotel to leave computers, etc., and to change clothes because it is fairly chilly out (despite the heat wave in the rest of the US).
Monday, July 18, 2011
Microsoft Research Summit 2011 - day 1
Monday, July 11, 2011
Computational Thinking
Sunday, July 3, 2011
ICE Forum and Bloomsbury Conference
- In 2011, 88% of scholarly reading in the UK came from an electronic source (94% of those readings from a library).
- In 2005, 54% of the scholarly reading in the US was from an electronic source.
- In 2011, 45% of the scholarly reading was done on the computer screen and 55% of scholars printed a copy.
- In 2005, 19% of the scholarly reading was done on the computer screen.
Thursday, June 23, 2011
Archiving in the Networked World: Metrics for Testing (abstract)
Purpose: This column looks at how long term digital archiving systems are tested and what benchmarks and other metrics are necessary for that testing to produce data that the community can use to make decisions.
Methodology: The article reviews recent literature about digital archiving systems involving public and semi-public tests. It then looks specifically at the rules and metrics needed for doing public or semi-public testing for three specific issues: 1) triggering migration; 2) ingest rates; and 3) storage capacity measurement.
Findings: Important literature on testing exists but common metrics do not, and too little data is available at this point to establish them reliably. Metrics are needed to judge the quality and timeliness of an archive’s migration services. Archives should offer benchmarks for the speed of ingest, but that will happen only once they come to agreement about starting and ending points. Storage capacity is another area where librarians are raising questions, but without proxy measures and agreement about data amounts, such testing cannot proceed.
Implications: Testing is necessary to develop useful metrics and benchmarks about performance. At present the archiving community has too little data on which to make decisions about long term digital archiving, and as long as that is the case, the decisions may well be flawed.
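The ingest-rate question in particular turns on the agreement about starting and ending points that the findings say is still missing. A minimal benchmark might look like the sketch below; the timing boundaries chosen here (batch handed to the archive, all packages ingested) are one possible convention, not an established standard.

```python
import time

# Sketch of an ingest-rate benchmark with explicit start and end points --
# the agreement the column argues is still missing. The boundaries here
# (batch handed over -> all packages ingested) are one possible convention.

def benchmark_ingest(packages, ingest_fn):
    """Return objects/sec and MB/sec for a batch of submission packages."""
    total_bytes = sum(len(p) for p in packages)
    start = time.perf_counter()            # agreed starting point
    for p in packages:
        ingest_fn(p)
    elapsed = time.perf_counter() - start  # agreed ending point
    return {
        "objects_per_sec": len(packages) / elapsed,
        "mb_per_sec": total_bytes / (1024 * 1024) / elapsed,
    }

# A trivial no-op ingest function stands in for a real archive here.
stats = benchmark_ingest([b"x" * 1024] * 100, lambda p: None)
```

Without community agreement on what "start" and "end" mean (does ingest end when the bytes arrive, or when the AIP is written and verified?), two archives running this same measurement would still produce incomparable numbers, which is the column's point.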
Saturday, June 18, 2011
German Library Conference in Berlin
I attended only a few sessions because of concurrent meetings at my University. One was by Lynn Connaway from OCLC Research. Lynn was one of relatively few speakers who spoke in English – which the German audience understood without any apparent problems. She spoke about a JISC funded project in which her task was to find common results among a number of user-studies. A point that she passed over quickly in the talk (but which we spoke about in greater detail privately) was the difficulty in finding exactly how some of the studies were done: how the subjects were chosen, how exactly the data were gathered, or how they were analyzed. Among the common conclusions that she reported were:
- Virtual Help. Users sometimes prefer online help even in the library because they do not want to get out of their chairs.
- Squirreling instead of reading. Many users squirrel away information and spend relatively little time working actively with contents.
- Libraries = books. Many people think of libraries primarily as collections of physical books and often do not realize the library's role in providing electronic resources. They also criticize the physical library and its traditional collections.
Wednesday, June 1, 2011
AnthroLib map
In Berlin's "winter" semester (that begins in October 2011), I plan to offer a seminar with a colleague where some students work with her using psychological experiments to evaluate digital libraries, and I use ethnographic methods. Many of the projects in Nancy's map seem to address physical library spaces, but my interest is in digital space.
In talking about cultural anthropology with students I often quote Clifford Geertz:
The "essential task of theory building here is … not to generalize across cases but to generalize within them. ...The diagnostician doesn't predict measles; he decides that someone has them…" (Geertz, 1973, p. 26)

In looking at a particular digital resource, the goal is not to generalize about all users, but to understand the culture and quirks of those who use (or do not use) that particular source. This is very much like the goal of those studying students at particular libraries, except that the users are not necessarily physically there -- which can be a problem.
For the seminar we may well use some portion of the digital presence of our own university library. One possibility might be to repeat some experiments from Studying Students (for example, the one in which students redesign the library website), but to use culturally different sets of users (humanists and natural scientists? German students and Germanists in the US?) to understand different design preferences.
If anyone has done ethnographic experiments in digital space, I would be interested in hearing about them.
References
Geertz, C. (1973) "Thick Description: Toward an Interpretive Theory of Culture." In: The Interpretation of Cultures: Selected Essays. New York: Basic Books, pp. 3-32.
Saturday, May 28, 2011
Rosetta
We talked about the risk problem generally with format change. It is not really 0 or 1, but more likely a scaled reduction of access to certain formats. Clearly there needs to be more thinking about when to trigger migration and what kind of migration (on-the-fly or preventative) makes sense.
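The point that format risk is "not really 0 or 1" can be made concrete with a scaled score: access to a format degrades gradually, and migration is triggered once the score crosses a threshold. The factors, weights, and threshold below are invented purely to illustrate the shape of such a decision, not drawn from Rosetta or any other system.

```python
# Illustrative only: a scaled (not 0-or-1) view of format access risk.
# The factors, weights, and 0.6 threshold are invented for this sketch.

def access_score(tool_support, community_use, spec_openness):
    """Each factor in [0, 1]; 1.0 = fully accessible, 0.0 = unreadable."""
    return 0.5 * tool_support + 0.3 * community_use + 0.2 * spec_openness

def migration_decision(score, threshold=0.6):
    """Preventative migration before access is lost; otherwise defer."""
    if score < threshold:
        return "trigger preventative migration"
    return "defer; consider on-the-fly migration at retrieval"

# A format with waning tool support but an open specification:
decision = migration_decision(access_score(0.4, 0.5, 0.9))
```

Framing the trigger this way also separates the two questions raised above: the score answers *when* to act, while the preventative-versus-on-the-fly choice is about *what kind* of migration the remaining access level still permits.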
Tuesday, May 24, 2011
ANADP in Tallinn - day 3
Monday, May 23, 2011
ANADP in Tallinn - day 2
Keynote
The keynote speaker at the ANADP conference for the second day was Gunnar Sahlin from the National Library of Sweden. One of the National Library's explicit tasks is to support university libraries. Open access and e-publishing are key initiatives, pursued together with the other four Nordic countries. Linked open data is a more problematic topic because of resistance from publishers, but the National Library strongly supports Europeana's efforts in this area. There is close cooperation with the public sector, especially Swedish radio and television. The Swedish Parliament is considering a new copyright law that may clarify some issues.
Standards panel
The standards panel began with the idea that we have both too many standards and too few. Standards can be seen as a sign of maturity in a field. Digital preservation has not only its own standards, but many from other areas -- a Chinese menu of choices. Information security standards are for preserving confidentiality, integrity, and the availability of information. Many memory institutions have to comply with these standards. The issue was especially important for Estonia because of internet attacks, especially denial of service attacks. In general information security is well integrated into plans in the Baltic countries, but long term digital preservation is not. Only 12% have an offsite disaster recovery plan.
The UK Data Archive has been an archive for social science and humanities data since 1967. "A standard is an agreed and repeatable way of doing something -- a specification of precise criteria designed to be used consistently and appropriately." In fact many standards are impractical, with unnecessary detail (8 [?] pages to explain options for gender in humans). Cal Lee spoke about 10 fundamental assertions, including that no particular level of preservation is canonically correct. Context is the set of symbolic and social relationships. With best practices and standards, trust is a key issue. PLANETS is concerned about quality standards, and such standards begin with testing. Trust consists of audits, peer review, self-assessment, and certification. The process moves from awareness to evidence to learning. The biggest technology challenge comes from de facto standards from industry, and we have little control there. Good standards have metrics and measurement systems. Within our lifetime everything that we have as a preservation standard now will be superseded, but the principles will remain.
Copyright panel
Digital legal deposit is a key element, but not a form of alignment. In the UK, for example, legal deposit is still just for print material. In the Netherlands there is a voluntary agreement that works well. Territoriality is a problem -- how to define the venue in which publishing takes place in the digital world, what is unlawful, what is protected, etc. The variance in legal deposit between countries leads to gaps. The rules for diligent search for orphan works are so complicated that they are too expensive to use. Even within the context of Europeana, cross-border access to orphan works is a problem. In US law, contract law takes precedence over copyright law. Too many licenses could undermine the ability to preserve materials.
To a question about Google a speaker said that Google's original defense of the scanning project was "fair use" (17 USC 107), and they had a good chance there. It then changed to a class-action suit, which is more complicated. The breakout session on copyright went into further depth about what problems exist in dealing with copyright across national borders. Apparently a feature of Irish copyright law is that copyright takes precedence over private contracts; generally, contracts take priority.
Panel chairs gave a summary of their sessions and breakout sessions. For the technical group I spoke about the need for testing, trust (or distrust) and metrics and argued that we are really just beginning to address these issues.