Information retrieval: a view from the legal library.

Information retrieval: what is it? According to Baeza-Yates and Ribeiro-Neto (1999) ‘the key goal of an IR system is to retrieve information which might be useful or relevant to the user.’ Three core elements of information retrieval can be extracted from this statement: systems, relevance and users. Information retrieval using digital technologies revolves around the interplay between these three elements, and IR models have usually been viewed with an emphasis either on relevance in relation to users (user-centred) or on relevance in relation to systems (system-centred). Chowdhury (2004). The two views are quite different: from a user’s point of view, what is relevant may be determined by a broad range of external factors, whereas from a system’s point of view, relevance is determined by performing calculations. However different they are, both are important for successful information retrieval. In this essay I will therefore look at some of the system models for information retrieval and evaluate them from the point of view of a user in the legal profession. I shall then look at a few of the issues involved in using information technology to manage digital information.

Evaluating:

The Boolean model allows users to submit queries to a database, using the operators AND, OR and NOT to refine a search. Chowdhury (2004) highlights one limitation of Boolean searching: it selects documents simply by matching the terms of a query against index terms, so a document either matches or it does not. The system therefore provides no relevance ranking of the documents retrieved, and the user may have to order the results in some other way, e.g. chronologically or alphabetically. For users conducting legal research, however, this may not be a limitation, as a broad Boolean search (for example, one that combines synonyms with OR) can provide high recall, which is useful for a legal professional wanting a 360° view of a topic. Rosenfeld and Morville (2002). In addition, legal practitioners need the most recent commentary, so precision can matter less than retrieving many potentially relevant results that can be displayed in date order.
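
Chowdhury’s point about exact matching, and the practice of sorting unranked results by date, can be illustrated with a minimal sketch in Python. It assumes a small, hypothetical collection of case documents with decision dates; the document ids, dates and helper names (`build_index`, `boolean_search`, `newest_first`) are invented for illustration and do not reflect how any real legal database is implemented.

```python
from datetime import date

# Hypothetical toy collection: document id -> (decision date, text).
DOCS = {
    "case_a": (date(2001, 5, 3), "negligence duty of care employer"),
    "case_b": (date(2009, 11, 12), "negligence contributory fault"),
    "case_c": (date(1995, 2, 20), "contract breach employer"),
}

def build_index(docs):
    """Build an inverted index: term -> set of document ids containing it."""
    index = {}
    for doc_id, (_, text) in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index

def boolean_search(index, all_ids, must=(), any_of=(), exclude=()):
    """Exact-match retrieval: AND over `must`, OR over `any_of`, NOT over `exclude`."""
    result = set(all_ids)
    for term in must:        # AND: keep documents containing every term
        result &= index.get(term, set())
    if any_of:               # OR: keep documents containing at least one term
        result &= set().union(*(index.get(t, set()) for t in any_of))
    for term in exclude:     # NOT: drop documents containing the term
        result -= index.get(term, set())
    return result

def newest_first(docs, ids):
    """Boolean matching gives no relevance score, so order matches by date instead."""
    return sorted(ids, key=lambda d: docs[d][0], reverse=True)

index = build_index(DOCS)
narrow = boolean_search(index, DOCS, must=["negligence"], exclude=["contributory"])
broad = boolean_search(index, DOCS, any_of=["negligence", "employer"])
print(newest_first(DOCS, narrow))  # → ['case_a']
print(newest_first(DOCS, broad))   # → ['case_b', 'case_a', 'case_c']
```

The OR query retrieves every document in the toy collection, which is the high-recall behaviour described above; the only ordering available is the one the user (here, `newest_first`) imposes.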

Another IR model is best match. Best match searching involves weighting each term according to its importance in a query or document, coupled with a means of using these term weights to calculate the similarity between a document and a query. Chowdhury (2004). The documents with the highest similarity to the query are ranked as the most relevant. Natural language querying can be used in this type of IR system. Westlaw UK and Lexis Library (online legal databases) both offer this kind of natural language searching in addition to the Boolean method, which allows greater flexibility in searching. Baeza-Yates and Ribeiro-Neto (1999).
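
Best match searching can be sketched in Python under one common set of assumptions: terms are weighted by TF-IDF (term frequency multiplied by inverse document frequency) and similarity is measured as the cosine of the angle between the query and document vectors. Chowdhury’s description does not fix a particular weighting scheme, so TF-IDF with cosine similarity is an illustrative choice here, and the documents and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection; real legal databases are far larger.
DOCS = {
    "case_a": "employer owes duty of care to employee negligence",
    "case_b": "negligence contributory fault reduces damages",
    "case_c": "breach of contract by employer damages",
}

def tokenize(text):
    return text.lower().split()

def idf_table(docs):
    """Inverse document frequency: terms found in fewer documents get higher weights."""
    n = len(docs)
    df = Counter(term for text in docs.values() for term in set(tokenize(text)))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf_vector(tokens, idf):
    """Weight each term by frequency times IDF; terms unknown to the collection weigh 0."""
    tf = Counter(tokens)
    return {t: f * idf.get(t, 0.0) for t, f in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def best_match(query, docs):
    """Rank every document by its similarity to a natural-language query."""
    idf = idf_table(docs)
    q = tfidf_vector(tokenize(query), idf)
    scores = {d: cosine(q, tfidf_vector(tokenize(text), idf)) for d, text in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

for doc_id, score in best_match("does an employer owe a duty of care", DOCS):
    print(f"{doc_id}: {score:.3f}")
```

Unlike the Boolean sketch, every document receives a score: `case_a` shares the distinctive terms `duty` and `care` with the query and ranks first, `case_c` shares only the commoner terms `employer` and `of`, and `case_b` shares no terms and scores zero. Without stemming, `owe` does not match `owes`.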

Hypertext browsing is a slightly different approach to information retrieval, as it relies not on search software but on good hypertext design that allows easy navigation within an online database. Rosenfeld and Morville (2002). The success of retrieving information by browsing can depend on the user’s prior knowledge, both of the structure of the database and of the details of the information sought (for example, the dates and citations of cases). If the user lacks this knowledge, a mixture of searching and browsing is an excellent way to retrieve information: searching first to locate the case in the database, then browsing to establish its relevance.

Browsing has a significance that is unique to legal research because of the way the legal system works. Staying with the example of case law, hypertext provides a relevance ranking almost by accident. In the English legal system, the importance of a case is reflected in how many later cases cite it as a precedent. In online legal databases such as Lexis Library and Westlaw UK, the cases citing a given case are listed within that case as hypertext links. Therefore, the more citing-case links a case document contains, the more important that case is likely to be, and the more relevant it is likely to be to that area of law.[1] Browsing in online legal databases can thus provide a relevance ranking of sorts. Although there is no direct technological connection, this is a neat parallel with the way a large search engine such as Google judges relevance (speaking simplistically): by analysing the links pointing to a website, using link-analysis algorithms such as PageRank and HITS, and presenting the most linked-to websites as the most relevant to a given search. Langville and Meyer (2006).
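
The parallel between citation counts and link analysis can be made concrete with a small Python sketch. It assumes a hypothetical citation graph in which each case lists the earlier cases it cites, and it computes two rankings: a raw count of citing cases, and a basic PageRank power iteration (as described by Langville and Meyer) in which a citation from an important case is worth more than one from an obscure case. The case ids, the damping factor of 0.85 and the iteration count are illustrative assumptions; this is not how Westlaw UK, Lexis Library or Google actually rank results.

```python
from collections import Counter

# Hypothetical citation graph: each case maps to the cases it cites.
CITES = {
    "case_a": [],
    "case_b": ["case_a"],
    "case_c": ["case_a", "case_b"],
    "case_d": ["case_a", "case_c"],
}

def citation_counts(cites):
    """How many cases in the collection cite each case (its in-degree)."""
    counts = Counter({case: 0 for case in cites})
    for cited_list in cites.values():
        counts.update(cited_list)
    return counts

def rank_by_citations(cites):
    """Most-cited first; ties broken alphabetically for a stable order."""
    counts = citation_counts(cites)
    return sorted(cites, key=lambda c: (-counts[c], c))

def pagerank(cites, damping=0.85, iterations=50):
    """Basic PageRank by power iteration: rank flows from citing to cited cases."""
    cases = list(cites)
    n = len(cases)
    rank = {c: 1.0 / n for c in cases}
    for _ in range(iterations):
        new = {c: (1.0 - damping) / n for c in cases}
        for case, cited in cites.items():
            if cited:
                share = damping * rank[case] / len(cited)
                for target in cited:
                    new[target] += share
            else:
                # A case citing nothing spreads its rank evenly over all cases.
                for target in cases:
                    new[target] += damping * rank[case] / n
        rank = new
    return rank

print(rank_by_citations(CITES))  # → ['case_a', 'case_b', 'case_c', 'case_d']
scores = pagerank(CITES)
print(sorted(scores, key=scores.get, reverse=True))  # → ['case_a', 'case_b', 'case_c', 'case_d']
```

Here `case_b` and `case_c` are each cited once, so a raw count cannot separate them, whereas PageRank places `case_b` higher because it is cited by `case_c`, which is itself cited.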

Managing:

To use information technologies effectively in managing digital information, user-centred and system-centred approaches must work together. In practice, this means that the information needs of a legal researcher should inform how the digital information is organised behind the scenes.

The user interface is a good example of this, especially where different search fields are provided. It is vital to offer the fields that the website’s audience will need, and the choice of search fields can also affect the way inverted files are managed. According to Rowley and Hartley (2008), “Inverted files are often created for author names, title words, subject-indexing terms, and author-title acronyms.” Because an inverted file lists, for each keyword, the addresses of the documents containing it, keeping a separate file for each search field is a good way to use technology to manage digital documents. In large online full-text databases such as Westlaw UK and Lexis Library, this could extend to dividing indexes into categories of legal information, such as cases, legislation or journals, and then into subdivisions such as party names, subject and date.
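
The idea of one inverted file per search field, subdivided by category of legal information, can be sketched in Python. The sketch assumes hypothetical records with `parties`, `title`, `subject` and `year` fields; each (category, field) pair gets its own inverted file mapping a term to the ids (the “addresses”) of the records containing it. The record layout and the function names are illustrative only and are not the internal structure of Westlaw UK or Lexis Library.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "category": "cases", "parties": "Smith v Jones", "subject": "negligence", "year": "2004"},
    {"id": 2, "category": "cases", "parties": "Brown v Smith", "subject": "contract", "year": "2010"},
    {"id": 3, "category": "journals", "title": "Negligence after Smith", "subject": "negligence", "year": "2011"},
]

def build_field_indexes(records):
    """One inverted file per (category, field): term -> sorted list of record ids."""
    indexes = defaultdict(lambda: defaultdict(set))
    for rec in records:
        for field, value in rec.items():
            if field in ("id", "category"):
                continue  # these are not searchable text fields
            for term in str(value).lower().split():
                indexes[(rec["category"], field)][term].add(rec["id"])
    return {key: {term: sorted(ids) for term, ids in postings.items()}
            for key, postings in indexes.items()}

def field_search(indexes, category, field, term):
    """Look up one term in a single field's inverted file, e.g. party names in cases."""
    return indexes.get((category, field), {}).get(term.lower(), [])

indexes = build_field_indexes(RECORDS)
print(field_search(indexes, "cases", "parties", "Smith"))   # → [1, 2]
print(field_search(indexes, "journals", "title", "Smith"))  # → [3]
print(field_search(indexes, "cases", "subject", "Smith"))   # → []
```

Because each field has its own file, a party-name search for “Smith” does not return records where the word appears only in a journal title, and each lookup is a single dictionary access. The cost is that a term appearing in several fields is stored once per field.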

Bibliography.

Baeza-Yates, R. and Ribeiro-Neto, B. (1999) Modern Information Retrieval, Essex, Pearson Education Ltd.

Chowdhury, G.G. (2004) Introduction to Modern Information Retrieval, London, Facet Publishing.

Cooke, A. (1999) A Guide to Finding Quality Information on the Internet, London, Library Association Publishing.

Langville, A. and Meyer, C. (2006) Google’s PageRank and Beyond, Oxfordshire, Princeton University Press.

Rosenfeld, L. and Morville, P. (2002) Information Architecture for the World Wide Web, CA, O’Reilly & Associates, Inc.

Rowley, J. and Hartley (2008) Organizing Knowledge: An Introduction to Managing Access to Information, Hampshire, Ashgate Publishing.

[1] JustCite (another online legal database) has represented this by creating a visual precedent map.

Information is Beautiful

Sunday, 30 October 2011

Coursework 1:
