Making a Web Search Feel Like a Stroll in the Library

 

August 19, 2004, NYTimes, G5

 By ANNE EISENBERG

 

A VISIT to the school library was once a necessity before
writing term papers or reports. But nowadays many students
use the Internet as their library.

 

However convenient it may be to search the Web from home or
a dorm room, the Internet cannot replace many of the
built-in benefits of the library, like browsing the stacks
for related information that could add spark and depth to
an essay or a report.

 

But researchers are working on more flexible approaches to
searching for digital information not only on the Web, but
on one's own hard drive, where elusive details may be
scattered through photos, e-mail and other files.

 

At the University of California, Berkeley, a professor and
her students have created a search program called Flamenco
that lets users browse a digitized collection in ways that
are similar to a stroll among the shelves of a library.

 

"It's for when you are not quite sure what you want," said
Marti Hearst, an associate professor at the School of
Information Management and Systems, who led the research.
"It's meant to help people find things, in part, by
serendipity."

 

To create Flamenco, Dr. Hearst started with one archived
collection of art at the Fine Arts Museums of San
Francisco, which included 35,000 images that were
identified by written descriptions. She used the
descriptions to classify the items in a variety of ways,
including the medium, the date, the artist and the content
of the image.

 

The categories were then cross-linked so that when people
clicked on a category, they immediately saw not only the
images within it - say, of landscapes - but those in
related categories, like other artists working on
landscapes at the same time in the Netherlands.

 

The effect, she said, is very much like walking down a
library aisle and finding related books on a subject.

 

The search program is also intended to let people look at
multiple subcategories at once, she said. For example, a
student doing research for an essay on the depiction of
flowers in the 18th century can click on the "flowers"
category. The system can immediately group the flowers in
the collection by subcategories like the kind of flower and
show thumbnail-size images of them.

 

It can then group the irises or chrysanthemums by medium,
for instance, listing all the ceramics pieces showing these
flowers or all of the prints or drawings that include them.
It can group the images by decade - showing, for example,
how flowers were portrayed in 1740 compared to 1780. "This
way," Dr. Hearst said, "people can compare and contrast,
discovering new categories and relationships."
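
The grouping described above can be illustrated with a short
Python sketch. It is not Flamenco's code: the records, the
field names ("subject", "kind", "medium", "year") and the
helper functions are all hypothetical, chosen only to show
how one set of items can be narrowed by one category and
then regrouped by another.

```python
from collections import defaultdict

# Hypothetical catalog records; titles and field values are made up
# for illustration and do not come from the museum collection.
ITEMS = [
    {"title": "Iris study", "subject": "flowers", "kind": "iris",
     "medium": "print", "year": 1742},
    {"title": "Chrysanthemum vase", "subject": "flowers",
     "kind": "chrysanthemum", "medium": "ceramic", "year": 1785},
    {"title": "Iris plate", "subject": "flowers", "kind": "iris",
     "medium": "ceramic", "year": 1781},
    {"title": "River view", "subject": "landscapes", "kind": "river",
     "medium": "drawing", "year": 1749},
]


def select(items, **facets):
    """Keep only the items whose fields match every given facet value."""
    return [it for it in items
            if all(it.get(name) == value for name, value in facets.items())]


def group_by(items, key):
    """Group item titles under the value that key(item) returns."""
    groups = defaultdict(list)
    for it in items:
        groups[key(it)].append(it["title"])
    return dict(groups)


def decade(item):
    """Round an item's year down to the start of its decade."""
    return item["year"] // 10 * 10


# Click on "flowers", then regroup the same items three ways.
flowers = select(ITEMS, subject="flowers")
by_kind = group_by(flowers, lambda it: it["kind"])
by_medium = group_by(select(flowers, kind="iris"), lambda it: it["medium"])
by_decade = group_by(flowers, decade)

print(by_kind)
print(by_medium)
print(by_decade)
```

Because every item carries all of its categories at once, the
same selection can be regrouped by kind, medium or decade
without searching again.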

 

Dr. Hearst has been working for 10 years on ways to browse
digital collections, inspired in part by her own
frustration in searching the Web. Flamenco, financed in
part by the National Science Foundation, is still a
prototype; she will be testing it this month with students.

 

The Web is not the only place where searches are made.
Often, necessary details are scattered across a computer
hard drive, making them hard to find. To address this
problem, Bruce Horn, the founder of Ingenuity Software in
Mammoth Lakes, Calif., has created an information
management system, now being tested, that lets people
individually tailor and cross-index all kinds of files.

 

Dr. Horn, one of the members of the original Macintosh team
at Apple Computer, has added another layer of organization
beyond folders to his desktop system. The layer is called
"collections" because the system collects and cross-links
all references to any subject that the user specifies. For
example, someone researching John Adams and his presidency
could make a collection by telling the program to find any
mention of him and related historical events.

 

While some current software uses a "collection" system to
keep track of one kind of file - digital photos, for
instance - Dr. Horn's software can handle many kinds of
files.

 

The collection does not copy the actual items, a move that
could multiply storage demands and possibly lead to changes
in original documents. "The items remain in their original
folders," he said, "and are referenced by the collection."

 

There are many ways to put objects into collections.
"People can drag and drop them in," he said, "or use an
annotation to classify items one by one, for instance, in a
group photo." Items can also be put into collections
automatically by using key phrases.
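
The idea of a collection that refers to items rather than
copying them can be sketched in a few lines of Python. This
is only an illustration, not Dr. Horn's software: the
Collection class, its methods, the file paths and the sample
text are all hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Collection:
    """A named set of references to files that stay where they are."""
    name: str
    key_phrases: list = field(default_factory=list)
    refs: set = field(default_factory=set)

    def add(self, path):
        """Add an item by hand; only its path is stored, not a copy."""
        self.refs.add(PurePosixPath(path))

    def auto_add(self, path, text):
        """Add the item if its text mentions any key phrase."""
        lowered = text.lower()
        if any(phrase.lower() in lowered for phrase in self.key_phrases):
            self.add(path)
            return True
        return False


# Hypothetical files of different kinds, mapped to their searchable text.
FILES = {
    "mail/inbox/0142.eml": "Notes on John Adams and his presidency",
    "docs/essay-draft.txt": "Outline for a term paper on early presidents",
    "photos/quincy-trip.jpg": "",
}

adams = Collection("John Adams", key_phrases=["john adams"])
for path, text in FILES.items():
    adams.auto_add(path, text)           # automatic, by key phrase
adams.add("photos/quincy-trip.jpg")      # manual, like drag and drop

members = sorted(str(p) for p in adams.refs)
print(members)
```

Since the collection holds only paths, an item can belong to
several collections at once while a single original remains
in its folder.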

 

Dr. Horn and Dr. Hearst both presented their work at a
conference at the I.B.M. Almaden Research Center in
California, organized by Daniel Russell, senior scientist
there, to discuss new approaches to dealing with the
ever-increasing mass of the Web. "Too much information was
our topic this year," Dr. Russell said. "Way too much
information."

 

New types of information are constantly evolving, he added,
citing moblogs - Web pages filled with photos from
cellphones - as one of the latest examples. Video, too, is
being stored at a ferocious rate, he said, as are radio
shows.

 

And all of it has to be made searchable, he said.

 
