Sunday, May 5, 2013

Finding a research question


Students often ask me about finding a research question, since I require one for theses of all sorts, and this blog post is an attempt to provide an answer.

Many students start with a topic that they would like to research. This is natural, but in some ways secondary to the process of scholarly writing. I recommend that students start with: 1) a method based on some well-established discipline, and 2) a source of data. Let me explain.


A method is the tool you use for research. If you were a painter, you might choose a variety of scenes to paint, but the brush and oils would be an essential part of how you approached it. As scholars we build up skills using certain discipline-based methods. In effect, we learn how to paint with a particular set of intellectual tools. If we ignore the tools or never learn how to use them, the chances of painting a satisfying scene are significantly less.

In graduate school I settled on a set of ethnographic tools, which I have used and reused over the decades. This does not mean that my approach has been unchanging, but it has had a consistency based on long practice. I am not a trained ethnographer in the German sense of having a degree in it or even having taken classes, but cultural anthropology was part of the atmosphere of my graduate school environment, and I just kept reading and practicing it in my dissertation and beyond. Having a method means absorbing a way of thinking. This is essential to formulating a research question.


Access to data is the equivalent of providing a painter with a scene to paint. If the painter has no one who will sit for a portrait, it becomes much harder to paint a portrait. People try sometimes with varying success, but without sufficient experience with real subjects, it is hard for a painter to create a portrait in the abstract.

Some students want to rely entirely on existing published results and to comment on them. This is more like copying a painting than creating a new one. It can be a reasonable approach if they can do a new analysis, but for a beginner merely to comment on other people's work without actually analysing the data anew risks superficial results.

Data are hard to get. Many desirable sources are closed to the public, and many public sources are overworked or unreliable. Data do not have to be perfect to be used in a scholarly study, but they do need to be available and the author does need to understand and be able to explain their imperfections.

The question

Once the method and a source of data are clear, the student can then reasonably begin to formulate the research question. It needs a grounding in the scholarly discourse in the field to explain why that particular question is interesting. Many students want a completely new question so that they can do something original, but wise students often take a well-researched existing research question and approach it with new data or a new method. The advantage of an existing research question is that its importance is already clear. 

The best research questions for a thesis are ones with a straightforward answer. I generally recommend a yes/no question, or one that has a quantitative answer, or one that is a choice among reasonable alternatives. These are not the only possible research questions, but questions involving complex issues about "why" or even "how" tend to be beyond the scope and experience of even the cleverest doctoral students. The virtue of a yes/no type question is that the student can make a clear choice. A thesis with a vague answer is not a contribution to knowledge, while even a very narrowly stated and highly qualified yes/no answer can be a reasonable step forward.

Choosing a research question is hard, but it is probably the most important step in writing a thesis. The topic matters only in so far as data are available and the research method can reasonably apply. Topics are temporary and can change with the seasons. Good research questions grow ultimately out of the intersection of scholarly methods and quality data. 

Saturday, December 22, 2012

Prize Selection

I am involved with a number of paper prize awards and find myself wondering how effectively the selection process works. In the end the selection comes down to a small group of people with both personal and cultural preferences. In some fields the quality of the mathematics makes for a fairly even playing field, but information science today has little clarity about its core topics or methods, and a cultural diversity that makes consensus hard. Do we really know how well the process works? Several questions come to mind that a master's student could answer in a thesis.

Is there evidence that papers that get an award are more influential over time than other papers? Influence might be measured via the number of citations. The population of prizes should be restricted to awards with a long enough history to allow for publication and public reaction. The ASIS&T / ProQuest awards, for example, list 15 years of winners. The JCDL student paper award is only 8 years old, but the Vannevar Bush best paper award goes back to 1998. The iConference awards are newer and there is no single list of winners. Nonetheless these data are generally available. Citations could be counted in a number of databases, or tested via Google Scholar, which would then include open source citations.

A related question is whether authors who win awards also get more citations on other papers, regardless of the success of the winning paper, and whether the authors become notable figures in the field. I recognized four of the 15 winners of the ASIS&T award immediately, and they are certainly active in the field. 

A number of other research questions revolve around factors that influence reviewers. I see a lot of reviewer comments in my work, and so many reviewers make errors in their comments on statistical analyses that I wonder whether a moderately complex statistical analysis actually hurts a paper's chances of winning prizes. A related issue is the use of popular buzzwords. There are years when certain topics generate intense interest that is not sustained over time. Buzzwords associated with these topics may give an impression of cutting-edge work and give these papers an edge. Finding measurable answers to both of these research questions would be harder than doing a simple citation analysis, but it would give useful information both to applicants and to prize committees.

Tuesday, October 16, 2012

Do not track...

According to a New York Times article by Natasha Singer (13 October 2012), 9 members of the US House of Representatives questioned the Federal Trade Commission's "involvement with an international group called the World Wide Web Consortium, or W3C, which is trying to work out global standards for the don’t-track-me features." What apparently has them distressed is that "do not track" may become the default in, for example, Microsoft's new version of its Internet Explorer browser.

Defaults are important, as Richard Thaler and Cass Sunstein explain in their book Nudge. People tend to accept defaults rather than change them, and this is true for a wide variety of topics including pension plans, health care, and privacy. Those who control the choice of the default arguably determine what a majority will decide. This is not surprising, since we accept cultural defaults all the time in matters as basic as food and clothes.

After years of working on operating systems and on network applications, I am perhaps less concerned about privacy than many of my colleagues for quite contradictory reasons: first, because I realize that anyone with the right technical skills can break ordinary privacy protections on the Internet, and second, because I realize that it is a lot of work and mostly not worth the trouble. Nonetheless some regard for privacy seems basic to how free and democratic societies operate in the HTTP environment. Microsoft seems to have consumer interests more at heart than the nine US congressmen.

Sunday, October 7, 2012

Information Technology history

Some while ago I began to put together an historical timeline on digital library developments. The timeline began relatively informally, but lately I have started to add references to source materials. It is very much a work in progress, but I would be happy to have suggestions for more entries. Anyone may view the timeline, but only I can update it at the moment.

Digital libraries and, in a broader sense, the world of information technology are relatively young, but the field has become old enough that some attention to its history seems increasingly warranted. ASIS&T has, for example, a webpage devoted to the history of information science and technology. Professional historians are starting to take an interest as well, including colleagues at Humboldt-Universität zu Berlin.

The social and legal issues are complex and interesting, and increasingly students need enough historical background in the history of technology to discuss topics like copyright or censorship or even the effect of technology on elections (such as the current US presidential election). We also have an imperfect understanding about the interaction between innovation and users, except that in some cases users quickly adopted new developments (HTML, for example) and in other cases innovations like the mouse sat fallow for years. Questions about the innovation/demand cycle play a key role in discussions about the industrial revolution. Whether the dynamics are similar or not I have too little evidence to judge.

This blog has been quiet for some time, but I plan to use it more regularly to discuss issues about the history of information technology precisely because I hope for comments from readers.

Thursday, October 4, 2012

30 Years of Information Technology

Library Hi Tech has been celebrating its thirtieth year with a number of special issues, and the latest issue looks back on the development of information technology for libraries. Below is the structured abstract for my editorial. I will include a link when the issue is available online.

Purpose: This issue of Library Hi Tech offers a retrospective over the last thirty years of information technology as used in libraries and other memory institutions, particularly archives and museums. This editorial will add the editors’ reflections.
Method: The method uses historical documentation and relies heavily on personal recollection. 
Findings: Thirty years ago information technology in libraries largely had to do with ways in which libraries could make their ordinary operations more efficient. Today the information science frontier has broken out of the comfortable institutional paradigm of the past and made libraries aware that they need to redefine themselves in a world where their buildings no longer represent a storehouse of knowledge unavailable elsewhere.
Implications: Information technology advances have not made libraries obsolete, but they have made it imperative that libraries redefine their role to be digital information managers and service providers for their readers.

Wednesday, July 11, 2012

ReCAPTCHA - a post by Estelle Shumann

Note: I have removed this post by Estelle Shumann after a number of negative comments and requests. The topic was interesting and it seemed harmless enough. Recently I received the following message:

 You currently have a link on your site pointing to our website.  We have recently received warning from Google that they are suspicious of link trading schemes surrounding this, and we want to make sure that you are taking the necessary precautionary measures so that your site is not adversely affected.

We are requesting that you remove the link back to our site. 
I do not know that her post was part of this effort, but I am removing it as a precaution.

Wednesday, February 1, 2012

Writing on the iPad

This blog entry is an experiment, as was my writing a full-scale article (3500 words) on the iPad.

The article was on measuring reliability in long term digital archiving. I based it on talks given in Tallinn, Estonia, and at a workshop here in Berlin, and I copied the text from the slides onto the iPad using Dropbox, though I could easily have mailed them to myself as well. Then I purchased the Apple Pages app and imported the text into Pages so that I had a ready-made outline. Actually I almost never write from an outline, so in some ways this was a bad idea, but not one for which the iPad bears any guilt.

The Pages app is very easy to use once one recognizes where one has to tap to change styles, get fonts, or send backups in the form of email copies. The backups may not have been necessary, since iCloud is sharing copies among my various Apple devices, but it seemed like useful extra protection, since I could not be sure that iCloud would not instantly and automatically change every copy if I accidentally deleted a key portion of text. That is not a theoretical but a real issue. It is easy to tap the UNDO button a bit too often. Only later did I learn that I could REDO an UNDO by holding down the UNDO longer. I found it was also surprisingly easy to highlight far too large a segment of text and then brush a key that deleted it accidentally. UNDO and REDO are really valuable options.

The touchpad keyboard as such gave me no particular problems. I am a decent multiple-finger typist, but not especially fast. Nonetheless I do find that I often hit one of the keys in the bottom row rather than the space bar. The spelling checker and word-suggestion system is unexpectedly good, but the price for corrections in multiple languages is that I must constantly switch keyboards, since the spelling and keyboard choices are linked. With the English and German keyboards this is fairly simple, since only the Y and Z keys shift, but I am very accustomed to the German keyboard (all of my other devices have German keyboards) and sometimes my finger strays. The correction facility is fairly good at catching and fixing this.

One advantage that I had hoped for with the iPad was that I could carry the machine with me easily and write in even quite short blocks of time, such as in the S-Bahn train (six minutes from my home station to the office). I found that that worked fairly well as long as I worked on the article so regularly that I had it mostly in mind and did not have to search back to find the thread of what I wrote. Generally I write in landscape mode because the keyboard is bigger and mistakes are therefore fewer, but portrait mode gives a far better sense of the virtual page. By the end of the article I tended to use portrait more often, especially when I was revising what I wrote.

Only very recently have I returned to using Microsoft Word on my other computers, and I confess that it is really good. Mostly I don't want its features though. My articles require little fancy formatting or inserts. The Pages app is definitely no substitute for Word, but as a basic word processing tool on the iPad, I found it met my needs well.