August 19, 2004, NYTimes, G5
By ANNE EISENBERG
A VISIT to the school library was once a necessity before writing term papers or reports. But nowadays many students use the Internet as their library.
However convenient it may be to search the Web from home or a dorm room, the Internet cannot replace many of the built-in benefits of the library, like browsing the stacks for related information that could add spark and depth to an essay or a report.
But researchers are working on more flexible approaches to searching for digital information not only on the Web, but on one's own hard drive, where elusive details may be scattered through photos, e-mail and other files.
At the University of California, Berkeley, a professor and her students have created a search program called Flamenco that lets users browse a digitized collection in ways that are similar to a stroll among the shelves of a library.
"It's for when you are not quite sure what you want," said Marti Hearst, an associate professor at the School of Information Management and Systems, who led the research. "It's meant to help people find things, in part, by serendipity."
To create Flamenco, Dr. Hearst started with one archived collection of art at the Fine Arts Museums of San Francisco, which included 35,000 images that were identified by written descriptions. She used the descriptions to classify the items in a variety of ways, including the medium, the date, the artist and the content of the image.
The categories were then cross-linked so that when people clicked on a category, they immediately saw not only the images within it - say, of landscapes - but those in related categories, like other artists working on landscapes at the same time in the Netherlands.
The effect, she said, is very much like walking down a library aisle and finding related books on a subject.
The search program is also intended to let people look at multiple subcategories at once, she said. For example, a student doing research for an essay on the depiction of flowers in the 18th century can click on the "flowers" category. The system can immediately group the flowers in the collection by subcategories like the kind of flower and show thumbnail-size images of them.
It can then group the irises or chrysanthemums by medium, for instance, listing all the ceramics pieces showing these flowers or all of the prints or drawings that include them. It can group the images by decade - showing, for example, how flowers were portrayed in 1740 compared to 1780. "This way," Dr. Hearst said, "people can compare and contrast, discovering new categories and relationships."
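The grouping described above can be sketched in a few lines of Python. This is only an illustration of the general idea of browsing by category, not Flamenco's actual code: the records, field names and the helper functions `select`, `group_by` and `decade` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only,
# not taken from the museum collection or from Flamenco itself.
ITEMS = [
    {"title": "Iris study", "subject": "iris", "medium": "print", "year": 1742},
    {"title": "Iris vase", "subject": "iris", "medium": "ceramic", "year": 1781},
    {"title": "Chrysanthemum plate", "subject": "chrysanthemum", "medium": "ceramic", "year": 1745},
    {"title": "Chrysanthemum sketch", "subject": "chrysanthemum", "medium": "drawing", "year": 1788},
]


def decade(item):
    """Map an item to the first year of its decade, e.g. 1742 -> 1740."""
    return item["year"] // 10 * 10


def select(items, **facets):
    """Keep only the items that match every requested category value."""
    return [it for it in items if all(it[k] == v for k, v in facets.items())]


def group_by(items, key):
    """Group item titles by a field name or by a function of the item."""
    groups = defaultdict(list)
    for it in items:
        value = key(it) if callable(key) else it[key]
        groups[value].append(it["title"])
    return dict(groups)


if __name__ == "__main__":
    print(group_by(ITEMS, "subject"))                           # kinds of flower
    print(group_by(select(ITEMS, subject="iris"), "medium"))    # irises by medium
    print(group_by(ITEMS, decade))                              # everything by decade
```

Because every item carries several category values at once, the same small set of records can be regrouped along any of them, which is what lets a reader compare, say, the 1740's with the 1780's.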
Dr. Hearst has been working for 10 years on ways to browse digital collections, inspired in part by her own frustration in searching the Web. Flamenco, financed in part by the National Science Foundation, is still a prototype; she will be testing it this month with students.
The Web is not the only place where searches are made. Often, necessary details are scattered across a computer hard drive, making them hard to find. To address this problem, Bruce Horn, the founder of Ingenuity Software in Mammoth Lakes, Calif., has created an information management system, now being tested, that lets people individually tailor and cross-index all kinds of files.
Dr. Horn, one of the members of the original Macintosh team at Apple Computer, has added another layer of organization beyond folders to his desktop system. The layer is called "collections" because the system collects and cross-links all references to any subject that the user specifies. For example, someone researching John Adams and his presidency could make a collection by telling the program to find any mention of him and related historical events.
While some current software uses a "collection" system to keep track of one kind of file - digital photos, for instance - Dr. Horn's software can handle many kinds of files.
The collection does not copy the actual items, a move that could multiply storage demands and possibly lead to changes in original documents. "The items remain in their original folders," he said, "and are referenced by the collection."
There are many ways to put objects into collections. "People can drag and drop them in," he said, "or use an annotation to classify items one by one, for instance, in a group photo." Items can also be put into collections automatically by using key phrases.
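A collection that points to files rather than copying them might look something like the Python sketch below. It is not Dr. Horn's software: the `Collection` class, its method names and the sample file paths are assumptions made for illustration, and a plain dictionary of path-to-text stands in for the hard drive.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A named set of references (paths) to items; the items are never copied."""
    name: str
    key_phrases: list = field(default_factory=list)
    refs: set = field(default_factory=set)

    def add(self, path):
        """Manual route, comparable to dragging an item into the collection."""
        self.refs.add(path)

    def auto_collect(self, files):
        """Automatic route: reference every file whose text has a key phrase.

        `files` maps a path to that file's text (a stand-in for a real disk).
        """
        for path, text in files.items():
            lowered = text.lower()
            if any(phrase.lower() in lowered for phrase in self.key_phrases):
                self.refs.add(path)


if __name__ == "__main__":
    # Hypothetical files; only the paths end up in the collection.
    files = {
        "mail/note.txt": "Reading about John Adams today",
        "docs/essay.txt": "Notes on the Alien and Sedition Acts",
        "misc/todo.txt": "Buy milk",
    }
    adams = Collection("John Adams", key_phrases=["john adams", "sedition acts"])
    adams.auto_collect(files)
    adams.add("photos/portrait.jpg")
    print(sorted(adams.refs))
```

Since the collection holds only paths, the same file can appear in any number of collections while staying in its original folder.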
Dr. Horn and Dr. Hearst both presented their work at a conference at the I.B.M. Almaden Research Center in California, organized by Daniel Russell, senior scientist there, to discuss new approaches to dealing with the ever-increasing mass of the Web. "Too much information was our topic this year," Dr. Russell said. "Way too much information."
New types of information are constantly evolving, he added, citing moblogs - Web pages filled with photos from cellphones - as one of the latest examples. Video, too, is being stored at a ferocious rate, he said, as are radio shows.
And all of it has to be made searchable, he said.
*****