
Mary E. Brown, Ph.D., Professor
Information Science

Southern Connecticut State University
501 Crescent Street, New Haven, CT 06515

Department of Information and Library Science
Fax: 1.203.392-5780 / Phone: 1.203.392-5781
Toll Free: 1-888-500-SCSU, then press 4


Supporting digital libraries

In this unit we ask three questions: What are the problems in merging systems? What are some of the more exciting developments? What are the problems of collaboration in DLs? In answering these questions, we consider the merging of search results, non-uniform query languages, the DLITE task environment, web-based collaborative library research, and complex information environments.

 

WHAT ARE THE PROBLEMS IN MERGING SYSTEMS?

Merging results

Voorhees and Tong (1997) examine the problem of searching multiple independent text databases, each with its own search engine. Specifically, a single cohesive search across these databases must accommodate their range of topics, their diverse vocabularies, and their differences in depth of coverage, and it must combine the results of the multiple independent searches into a single, cohesive response. The problem of combining retrieval results from multiple databases into a single result with the best possible effectiveness is termed database merging or metasearching (Voorhees & Tong, 1997). The need for metasearching is illustrated by the following problem: if five databases are searched, each database may rank its retrieved items by probable relevance to the query. However, the most relevant document in one database could be of low relevance when compared with the documents from all five databases together. Therefore, the items from all five databases need to be ranked against one another for probable relevance to the query.
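To make the merging step concrete, here is a minimal Python sketch of one common heuristic. It is not the method Voorhees and Tong propose: each database's raw scores are rescaled to the range 0 to 1 and then multiplied by a per-database weight expressing how likely that database is to hold relevant material, after which all items are sorted into a single list. The input format, the function names, and the `weights` values are hypothetical. Normalization alone would give the top item of every database the same score of 1.0; the weight is what lets a database's best hit fall below middle-ranked hits from a stronger database.

```python
from typing import Dict, List, Tuple

# Hypothetical input: database name -> list of (document_id, raw_score)
# pairs, as ranked by that database's own search engine.
Results = Dict[str, List[Tuple[str, float]]]


def min_max(scores: List[float]) -> List[float]:
    """Rescale one database's scores to [0, 1]; raw scales differ between engines."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every document as equally strong.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def merge(results: Results, weights: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Rank the items of all databases against one another.

    weights: an assumed estimate (0..1) of how useful each database is for
    this query; a database missing from the dict gets a neutral 0.5.
    """
    merged = []
    for db, hits in results.items():
        if not hits:
            continue
        normalized = min_max([score for _, score in hits])
        w = weights.get(db, 0.5)
        for (doc, _), norm in zip(hits, normalized):
            merged.append((db, doc, w * norm))
    # Highest combined score first; ties broken by database, then document id.
    merged.sort(key=lambda item: (-item[2], item[0], item[1]))
    return merged


if __name__ == "__main__":
    results = {
        "db_a": [("a1", 12.0), ("a2", 8.0), ("a3", 2.0)],
        "db_b": [("b1", 0.9), ("b2", 0.3)],
    }
    for db, doc, score in merge(results, {"db_a": 0.9, "db_b": 0.3}):
        print(f"{score:.2f}  {db}:{doc}")
```

In this example, b1, the top hit of db_b, ranks below a2, the second hit of db_a, because db_b carries the lower weight.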

One approach is to use published metadata (information or statistics on each database, such as its range of topics, depth of coverage, and probability of relevance). The drawback of this approach is that the user does not control the search; the work is done at the server end. One alternative is to use agents that collect a history of user behavior (historical metadata on the user's long-term, stable information needs and on relevance patterns among previously retrieved documents). The benefit of an agent-assisted approach is that the system does not impose a cost on the user; rather, the agent is supported at the user end. This is an area of high interest in research and development.
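The agent-assisted approach can be illustrated with a small Python sketch. It rests on our own assumptions and is not a system described by Voorhees and Tong: a user-side agent keeps a hypothetical log of which retrieved documents the user judged relevant, and turns that log into a per-database weight. Smoothing toward a neutral prior keeps a database the user has rarely searched from being over- or under-rated on the strength of a few judgments. Such weights could then scale each database's normalized scores when results are merged.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical log kept by a user-side agent: one entry per retrieved
# document the user has judged, recorded as (database_name, was_relevant).
Judgment = Tuple[str, bool]


def database_weights(
    history: Iterable[Judgment], prior: float = 0.5, strength: float = 2.0
) -> Dict[str, float]:
    """Estimate, per database, the share of its documents this user found relevant.

    The proportion is smoothed toward `prior`, as if each database had
    `strength` extra judgments at that rate:
        weight = (relevant + prior * strength) / (judged + strength)
    so a database with only a few judgments stays close to the prior.
    """
    relevant: Dict[str, int] = defaultdict(int)
    judged: Dict[str, int] = defaultdict(int)
    for db, was_relevant in history:
        judged[db] += 1
        if was_relevant:
            relevant[db] += 1
    return {
        db: (relevant[db] + prior * strength) / (judged[db] + strength)
        for db in judged
    }


if __name__ == "__main__":
    log = [("db_a", True), ("db_a", True), ("db_a", True), ("db_a", False),
           ("db_b", False), ("db_b", False)]
    print(database_weights(log))
```

With the sample log, db_a receives about 0.67 and db_b receives 0.25. A database with no judgments is simply absent from the result, so a caller would fall back to the prior for it.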

Non-uniform query languages

Chang and Garcia-Molina (1997) address non-uniform query languages, which make searching over heterogeneous information sources difficult. Their solution allows the user to compose Boolean queries in one rich front-end language; the system then translates each user query into a subsuming query that the information source supports and submits it to that source. Because the subsuming query may retrieve more documents than match the original query, the results are processed by a filter query, and the final results are passed to the user. The drawback of this two-step solution is that the cost of post-filtering may be significant. Their study evaluates whether the cost of this method is acceptable.
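The mapping-plus-filtering idea can be illustrated with a small Python sketch. It assumes, purely for illustration, a source that cannot evaluate NOT; Chang and Garcia-Molina's mapping deals with richer differences between query languages, and the class and function names here are hypothetical. The front end replaces every NOT subexpression with "no constraint" to obtain a subsuming query the source can run (safe because AND and OR never shrink a result when one of their arguments is loosened), then re-applies the original query to the returned documents. The number of candidates the filter discards is one rough indication of the post-filtering cost described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Term:
    word: str


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Not:
    operand: "Query"


Query = Union[Term, And, Or, Not]
Document = FrozenSet[str]  # a document modelled as its set of words


def matches(q: Query, doc: Document) -> bool:
    """Evaluate a front-end query against one document."""
    if isinstance(q, Term):
        return q.word in doc
    if isinstance(q, And):
        return matches(q.left, doc) and matches(q.right, doc)
    if isinstance(q, Or):
        return matches(q.left, doc) or matches(q.right, doc)
    return not matches(q.operand, doc)  # Not


def subsume(q: Query) -> Optional[Query]:
    """Map q to a NOT-free query matching at least every document q matches.

    None stands for "no constraint" (match everything).
    """
    if isinstance(q, Term):
        return q
    if isinstance(q, Not):
        return None  # the hypothetical source cannot express NOT
    left, right = subsume(q.left), subsume(q.right)
    if isinstance(q, And):
        if left is None:
            return right
        if right is None:
            return left
        return And(left, right)
    # Or: if either side is unconstrained, so is the whole disjunction.
    if left is None or right is None:
        return None
    return Or(left, right)


def search(q: Query, source: List[Document]) -> Tuple[List[Document], int]:
    """Run the subsuming query at the source, then post-filter with the original.

    Returns the final hits and how many candidates the source sent back.
    """
    mapped = subsume(q)
    candidates = source if mapped is None else [d for d in source if matches(mapped, d)]
    return [d for d in candidates if matches(q, d)], len(candidates)
```

For the query `And(Term("library"), Not(Term("music")))`, the source receives only `Term("library")`, and any retrieved document containing "music" is transferred and then discarded by the filter.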

Several meta-searchers on the internet share the goals of Chang and Garcia-Molina; among them are SavvySearch, accessible at http://www.savvysearch.com/, and Meta-Crawler. However, these meta-searchers are only slightly more powerful than the least common denominator of the query features supported by the sources they search (Chang & Garcia-Molina, 1997).

 

WHAT ARE SOME OF THE MORE EXCITING DEVELOPMENTS?

DLITE

One of the most exciting lines of research is DLITE by Cousins, Paepcke, Winograd, Bier, and Pier (1997) of Stanford University and Xerox PARC. Drawing on prior research on library use, this group developed a metaphor of workcenters, each customized for the user's task. Workcenters, loosely comparable to work areas such as a kitchen or a hobby room, contain the tools needed to accomplish a given task. The tools are represented as icons and are used through drag-and-drop manipulation. For example, a library user selects the bibliography-creation workcenter by clicking its icon. In the create-query dialogue box, the user enters a topic, which causes a query icon to appear on the screen. The query icon can then be dragged to and dropped onto the icons of the databases to be searched. The results are returned in a separate document for each search service. Dragging these documents into a common folder merges the results. Dragging the merged result onto another icon formats the retrieved items as a bibliography that can be appended to a paper or saved as a personal collection; future written work can then be passed through this collection to locate the complete bibliographic references for the citations within the work.

The workcenter metaphor contains five kinds of components (tools): documents, collections, queries, services, and representations of people. Documents can range from a simple citation to a complex entity with hundreds of information fields, and they include text and graphics documents uploaded from the user's local disks. Collections are containers for other components, such as documents, and they appear to retain a memory of the activity that retrieved their contents. (In the case of a database search, this would allow the user to reactivate the search to add further references to those already found.) Queries are expressions of the user's information need; they range from simple lists of keywords to complex Boolean search strategies. One of the services available through the workcenter is a query translator that converts a query into the form needed by each database. Queries can be stored for reuse and shared with other users. Services, available through a component called the InfoBus, include summarization facilities, optical character recognition, query expansion, format translation, and bibliography processing. Representations of people (icons) are components that manage access control, communications, payments, and authorizations, and they contain the information needed to carry out these management tasks. For example, when a user logs on, an icon appears. The icon contains information about that user that will be passed on to a vendor as charges are incurred, for example during an online search.
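The idea that a collection remembers the activity that filled it can be sketched in a few lines of Python. This is a hypothetical model, not DLITE's actual implementation: search services are stood in for by plain functions from a query string to document identifiers, and a collection stores its originating query together with the services it was dropped on. Reactivating the collection re-runs the remembered search and adds only documents it does not already hold; because it deduplicates across services, the same method also merges the results of several databases into one collection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a search service: query string -> document ids.
SearchService = Callable[[str], List[str]]


@dataclass
class Collection:
    """A container that remembers the activity (query plus services) that filled it."""

    query: str
    services: Dict[str, SearchService]
    documents: List[str] = field(default_factory=list)

    def refresh(self) -> List[str]:
        """Re-run the remembered query and add only documents not already held.

        Returns the newly added document ids, in the order they were found.
        """
        added = []
        for service in self.services.values():
            for doc in service(self.query):
                if doc not in self.documents:
                    self.documents.append(doc)
                    added.append(doc)
        return added
```

A first call to `refresh` fills the collection from every service; later calls return only material that has appeared since, which corresponds to "adding additional references to those already found."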

Documents in the DLITE environment need not represent digital materials. Through an abstract, a bibliographic record, or a similar surrogate, a DLITE document may also represent materials that exist on paper, film, or other media. In these cases, we would expect as much information as is available to be presented to the user on gaining access to the materials.

Because DLITE is being designed to work across the internet, its developers have used web browsers, such as Netscape Navigator, as a space in which to process and display retrieved documents. From the browser, documents may be saved to disk or transferred to a word processing application.

While DLITE's mechanisms appear likely to leave users wiggling in delight, its graphic component may leave something to be desired. The picto-cabulary (the set of icons) used in DLITE does not seem intuitive and lacks the coherence of the component mechanisms. This is a marvelous piece of work in need of an equally marvelous interface designer.

InterBib, which provides bibliography-related services, can be found at http://www-interbib.stanford.edu/~testbed/interbib.

Web-based collaborative library research

Robertson, Jitan, and Reese (1997) report on a design for web-based collaborative library research. A corporate research library has moved many of its resources and services to the organization's intranet. (Roughly, an intranet can be thought of as an internet limited to a small, select group, such as the members of a business organization.) The corporate research library conducts information searches and research analyses for the organization's employees. The described web-based system makes library research activities highly visible and collects usage statistics in the background (for billing and to make research visible). When a research request is submitted, it generates its own interactive web page through which researchers and clients chat, share files, manage their activities, provide feedback, and learn about one another. Automatically generated lists and indexes of research requests (built from the collected statistics) are viewable, which may permit the future formation of interest communities among employees who share common research interests.

Robertson et al. (1997) point out that while the major perceived advantage of digitization is finding information, observations of work in libraries show that interaction and collaboration are central to information gathering. Research shows that library users intermingle their various activities (social, technical, data gathering, analytic) while working in the library. From these observations emerges a new vision of technology that brings together otherwise dispersed knowledge workers to share knowledge and special expertise in an asynchronous collaborative workspace.

In this system, each client has a homepage that is generated when he or she first enters the system. The client's homepage contains links to initiate a research request, view pending and archived research requests, and view statistics about his or her own usage of research services, as well as a link to the group page of the client's business unit within the organization. There is also a mailto link that lets anyone send email to the client without having to know the client's email address.

When the client selects the initiate-a-research-request link, a new request form (a webpage customized to the client) appears, and the client types in a description of his or her information need. This request page serves as a record of all interactions between the client and the researchers regarding the request. It records the request number, date received, date needed, status of the request, billing time, the client's name and group, and the assigned researcher's name. The page is divided into three sections: Comments, from the client, the researcher, or other employees who know of relevant sources; Actions, listing the researcher's activities and the billing time spent; and Files, consisting of hotlinks to digital copies annotated with their source, the topic covered, and copyright and permissions information.
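The request page's record fields and three sections map naturally onto a small data structure. The Python sketch below is hypothetical: Robertson et al. describe the page, not its implementation, so the class and field names, the status values, the overdue check, and the choice to derive total billing time from per-action minutes are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Comment:
    author: str  # the client, the researcher, or another employee
    text: str


@dataclass
class Action:
    description: str
    minutes: int  # billing time spent on this action (assumed unit)


@dataclass
class FileLink:
    url: str  # hotlink to the digital copy
    source: str
    topic: str
    rights: str  # copyright and permissions notation


@dataclass
class ResearchRequest:
    number: int
    received: date
    needed: date
    client_name: str
    client_group: str
    researcher: str = ""
    status: str = "pending"  # hypothetical values: "pending", "completed"
    comments: List[Comment] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)
    files: List[FileLink] = field(default_factory=list)

    def billing_minutes(self) -> int:
        """Total billing time, summed over the researcher's recorded actions."""
        return sum(a.minutes for a in self.actions)

    def is_overdue(self, today: date) -> bool:
        """True if the request is unfinished and its needed-by date has passed."""
        return self.status != "completed" and today > self.needed


if __name__ == "__main__":
    req = ResearchRequest(101, date(1997, 3, 1), date(1997, 3, 15),
                          "J. Doe", "Marketing", researcher="A. Smith")
    req.actions.append(Action("Searched trade databases", 45))
    req.actions.append(Action("Summarized findings", 30))
    print(req.billing_minutes(), req.is_overdue(date(1997, 3, 20)))
```

Keeping billing time on each action, rather than as a single number on the request, lets the Actions section and the billing total never disagree; whether the original system worked this way is not stated.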

Each researcher also has a personal page, giving each a "sense of 'place'" on the web. Personal information on the researcher and on the client is carried automatically from the personal page to each action page created from it. The system has automated many management and clerical functions and is an outstanding example of how technology with a relatively low technical barrier for users can greatly benefit and enhance the research function in an organization. It is a must-see for anyone working in a research library environment.

 

WHAT ARE THE PROBLEMS OF COLLABORATION IN DLs?

Complex information environments

Schiff, Van House, and Butler (1997) approach digital libraries as a social artifact, that is, as an artifact of human social activity. Specifically, Schiff et al. report on negotiating information needs within the context of restrictive social relations during a project in which they developed a system to support collaborative watershed planning. The collaborative planning took place among federated concerns, and there was no value-neutral position. Schiff et al. concluded that investigating and understanding the context and social situatedness of a digital library is important for developing a social informatics of digital libraries. As a platform for their initial work on watershed planning, they used the work of the French sociologist Pierre Bourdieu, which includes a theoretical framework and a suite of analytical tools. The specific works of Bourdieu cited by Schiff et al. are:

Outline of a Theory of Practice. Cambridge University Press, 1977.
Distinction: A Social Critique of the Judgement of Taste. Harvard University Press, 1984.
The forms of capital. In John G. Richardson, editor, Handbook of Theory and Research for the Sociology of Education. Greenwood Press, 1986.
The Logic of Practice. Stanford University Press, 1990.

 

REFERENCES

Chang, C.-C. & Garcia-Molina, H. (1997). Evaluating the cost of Boolean query mapping. Proceedings of the 2nd ACM International Conference on Digital Libraries, 103-111.

Cousins, S. B., Paepcke, A., Winograd, T., Bier, E. Z., & Pier, K. (1997). The digital library integrated task environment (DLITE). Proceedings of the 2nd ACM International Conference on Digital Libraries, 142-151.

Robertson, S., Jitan, S., & Reese, K. (1997). Web-based collaborative library research. Proceedings of the 2nd ACM International Conference on Digital Libraries, 152-160.

Voorhees, E. M. & Tong, R. M. (1997). Multiple search engines in database merging. Proceedings of the 2nd ACM International Conference on Digital Libraries, 93-102.


           

                       

    Last Modified Thursday, July 7, 2005

This site is maintained by Mary E. Brown, Ph.D. Art work by Valerie Samandar from photograph of sculpture on Southern's campus.