Information retrieval: a view from the legal library.
Information retrieval: what is it? According to Baeza-Yates
and Ribeiro-Neto (1999) ‘the key goal of an IR system is to retrieve
information which might be useful or relevant to the user.’ It is possible to
extract three core properties of information retrieval from this statement:
systems, relevance and users. Information retrieval using digital technologies
revolves around the interplay between these three elements and IR models have
usually been viewed with an emphasis on relevance in relation to users
(user-centric) or relevance in relation to systems (system-centred); see
Chowdhury (2004). The two concepts are quite different: from the point of view
of a user, what is relevant might be defined by a broad range of external
factors, while from a system’s point of view what is relevant is determined by
performing calculations. However different these two approaches are, both are
important for successful information retrieval.
Therefore, in this essay I am going to look at some of the system models for
information retrieval, and evaluate them from the point of view of a user in
the legal profession. After that I shall look at a few of the issues
surrounding managing digital information using information technology.
Evaluating:
The Boolean model works by allowing users to submit
queries to a database using the AND, OR and NOT operators to refine a search.
Chowdhury (2004) describes one limitation of the Boolean method of searching:
it selects documents simply by matching a query against index terms. The system
therefore does not rank the retrieved documents by relevance, and a user may
have to order the results some other way, e.g. chronologically or
alphabetically. For users conducting legal research, however, this may not be a
limitation, as this kind of search can provide high recall, which is useful for
a legal professional wanting to gain a 360° view of a topic (Rosenfeld and
Morville, 2002). In addition, legal practitioners need recent commentary, so
precision becomes less important than having many results that might be
relevant and that can be displayed in date order.
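To make the Boolean model concrete, here is a minimal sketch in Python. It is not how Westlaw UK or Lexis Library are implemented; the documents, dates and function names are all hypothetical, invented for illustration. It shows the two points made above: matching is all-or-nothing against index terms, and any ordering (here, newest first) has to be applied separately.

```python
from datetime import date

# Hypothetical mini-collection: (document id, publication date, text).
DOCS = [
    ("case-a", date(2009, 3, 1), "negligence duty of care employer"),
    ("case-b", date(2011, 6, 15), "negligence contract damages"),
    ("journal-c", date(2012, 1, 10), "duty of care and negligence commentary"),
]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, _, text in docs:
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(index, *terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def boolean_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return all_ids - index.get(term.lower(), set())


def newest_first(docs, ids):
    """Boolean matching gives no ranking, so order the hits by date instead."""
    dates = {doc_id: published for doc_id, published, _ in docs}
    return sorted(ids, key=lambda doc_id: dates[doc_id], reverse=True)


INDEX = build_index(DOCS)
ALL_IDS = {doc_id for doc_id, _, _ in DOCS}

# Query: negligence AND duty NOT contract
hits = boolean_and(INDEX, "negligence", "duty") & boolean_not(INDEX, ALL_IDS, "contract")
print(newest_first(DOCS, hits))  # ['journal-c', 'case-a']
```

Every document either matches or does not; nothing in the result says which hit is the better one, which is why the date ordering is added at the end.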
Another IR model is Best Match. Best match searching
involves weighting the importance of each term in a query or
document, coupled with a means of using these term weightings to calculate the
similarity between a document and a query (Chowdhury, 2004). The documents most
similar to the query are ranked as most relevant. Natural
language querying can be used in this type of IR system. Westlaw UK and Lexis
Library (online legal databases) both offer this kind of natural language searching in
addition to the Boolean method, which allows greater flexibility in searching
(Baeza-Yates and Ribeiro-Neto, 1999).
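The idea of term weighting plus a similarity calculation can be sketched as follows. The essay's sources do not prescribe a particular formula, so this example assumes one common choice: TF-IDF weights compared by cosine similarity. The documents, query and function names are hypothetical, and a real natural language system would do far more (stemming, stop words, phrase handling).

```python
import math
from collections import Counter

# Hypothetical documents, invented for illustration.
DOCS = {
    "d1": "landlord tenant dispute over unpaid rent",
    "d2": "tenant injured by landlord negligence",
    "d3": "company merger approved by regulator",
}


def tokenize(text):
    return text.lower().split()


def idf_table(docs):
    """Rarer terms get a higher weight (inverse document frequency)."""
    n = len(docs)
    df = Counter(term for text in docs.values() for term in set(tokenize(text)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def weight(text, idf):
    """Weight each term by its frequency times its IDF; unknown terms are dropped."""
    tf = Counter(tokenize(text))
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document ids ordered from most to least similar to the query."""
    idf = idf_table(docs)
    q = weight(query, idf)
    scores = {doc_id: cosine(q, weight(text, idf)) for doc_id, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(rank("tenant negligence injury", DOCS))  # ['d2', 'd1', 'd3']
```

Unlike the Boolean model, a document that matches only part of the query (d1) is still retrieved, but it is placed below the document that matches more of the heavily weighted terms (d2).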
Hypertext browsing is a slightly different approach to
information retrieval as it does not rely on search software; rather it relies
upon good hypertext design, which allows easy navigation in an online database
(Rosenfeld and Morville, 2002). The success of retrieving information through
browsing can depend in some circumstances on the prior knowledge of the user,
both in terms of the structure of the database and in terms of details of the
information sought (dates and citations of case law, for example). If the user
lacks this knowledge, then a mixture of searching and browsing is an excellent
way to retrieve information: searching initially to locate the case in the
database, then browsing to find out its relevance.
Browsing has a significance which is unique to legal
research because of the way that the legal system works. If we stay with the
example of case law for now, we can see that hypertext provides a relevance
ranking quite by accident. This is because in the English legal system, the importance
of a case will be reflected by how many other cases use it as a precedent. In
online legal databases like Lexis Library and Westlaw UK, cases citing a given
case are represented within that case as hypertext links. The more
hypertext links in a case law document, the more important that case is, and also
the more relevant that case is to that topic of law.[1] Browsing as a form of IR
in online legal databases can therefore provide
a relevance ranking of sorts. Although not directly linked technologically,
this provides a neat parallel with the way that a big search engine like Google
detects relevance (speaking simplistically) by assessing the number of links to a
website, and presenting websites with the most links to them as the most highly
relevant to a given search (Langville and Meyer, 2006).
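The "accidental" ranking described above amounts to counting, for each case, how many other cases cite it. A minimal sketch follows, using hypothetical case names and citation data. It is a plain count of incoming citations, which is much simpler than PageRank: PageRank also weighs each link by the importance of the page it comes from.

```python
from collections import Counter

# Hypothetical citation data: each case maps to the earlier cases it cites.
CITES = {
    "Case D": ["Case A", "Case B"],
    "Case C": ["Case A"],
    "Case B": ["Case A"],
    "Case A": [],
}


def cited_by_counts(cites):
    """Count how many distinct cases cite each case."""
    counts = Counter({case: 0 for case in cites})
    for cited_list in cites.values():
        for cited in set(cited_list):  # count each citing case only once
            counts[cited] += 1
    return counts


def rank_by_citations(cites):
    """Most-cited cases first; ties are broken alphabetically."""
    counts = cited_by_counts(cites)
    return sorted(counts, key=lambda case: (-counts[case], case))


print(rank_by_citations(CITES))  # ['Case A', 'Case B', 'Case C', 'Case D']
```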
Managing:
In order to effectively use information technologies to
facilitate management of digital information it is important that user-centric
and system-centred approaches work harmoniously together. This means that the
information needs of a legal researcher must inform the organisation of the
digital information behind the scenes.
The user interface is a good example of this, especially
when it provides different search fields. Not only is it vital to choose the
fields that the website’s audience will need, but the choice of search fields
also affects the way inverted files are managed. According to Rowley and
Hartley (2008) “Inverted files are
often created for author names, title words, subject-indexing terms, and
author-title acronyms.” As inverted
files contain the addresses of the documents in which each keyword appears, having
separate files for different search fields is a good way to use technology to
manage digital documents. In big online full-text databases like Westlaw UK and
Lexis Library this could also extend to dividing indexes into categories of
legal information, such as cases, legislation or journals, and then into
subdivisions such as party names, subject and date.
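The arrangement of separate inverted files per category and search field can be sketched like this. The records, field names and functions are hypothetical, chosen only to mirror the categories mentioned above; a production database would hold these files on disk and in a far more compact form.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration.
RECORDS = [
    {"id": 1, "category": "cases", "party": "Smith v Jones", "subject": "negligence"},
    {"id": 2, "category": "cases", "party": "Brown v Smith", "subject": "contract"},
    {"id": 3, "category": "journals", "party": "", "subject": "negligence"},
]
FIELDS = ("party", "subject")


def build_field_indexes(records, fields):
    """Build one inverted file per (category, field) pair.

    Each inverted file maps a term to the ids ("addresses") of the
    records whose field contains that term.
    """
    indexes = defaultdict(lambda: defaultdict(set))
    for record in records:
        for field in fields:
            for term in record[field].lower().split():
                indexes[(record["category"], field)][term].add(record["id"])
    return indexes


def search(indexes, category, field, term):
    """Look a term up in the inverted file for one category and field only."""
    postings = indexes.get((category, field), {})
    return sorted(postings.get(term.lower(), set()))


INDEXES = build_field_indexes(RECORDS, FIELDS)
print(search(INDEXES, "cases", "party", "smith"))          # [1, 2]
print(search(INDEXES, "cases", "subject", "negligence"))   # [1]
print(search(INDEXES, "journals", "subject", "negligence"))  # [3]
```

Because each search field on the interface maps to its own file, a party-name search never has to scan subject terms, and a search restricted to journals never touches the case law indexes.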
Bibliography.
Baeza-Yates, R. and Ribeiro-Neto, B. (1999) Modern Information Retrieval, Essex,
Pearson Education Ltd.
Chowdhury, G.G. (2004) Introduction to Modern Information Retrieval, London,
Facet Publishing.
Cooke, A. (1999) A Guide to Finding Quality Information on the Internet, London,
Library Association Publishing.
Langville, A. and Meyer, C. (2006) Google’s PageRank and Beyond, Oxfordshire,
Princeton University Press.
Rosenfeld, L. and Morville, P. (2002) Information Architecture for the World
Wide Web, CA, O’Reilly & Associates, Inc.
Rowley, J. and Hartley, R. (2008) Organizing Knowledge: An Introduction to
Managing Access to Information, Hampshire, Ashgate Publishing.
[1] JustCite (another online legal database) has represented this by creating a
visual precedent map.