Old Search Engine, the Library, Tries to Fit Into a Google World

 

NYTimes, A1, A14, June 21, 2004

 By KATIE HAFNER

 

SAN FRANCISCO, June 20 - Katarina Maxianova, who received

her bachelor's degree in comparative literature from

Columbia University in May, took a seminar last year in

which the professor assigned two articles from New Left

Review magazine. She found one immediately through Google;

for the other, she had to trek to the library stacks.

 

"Everyone in class tried to get those articles online," she

said, "and some people didn't even bother to go to the

stacks when they couldn't Google them."

 

For the last few years, librarians have increasingly seen

people use online search sites not to supplement research

libraries but to replace them. Yet only recently have

librarians stopped lamenting the trend and started working

to close the gap between traditional scholarly research and

the incomplete, often random results of a Google search.

 

"We can't pretend people will go back to walking into a

library and talking to a reference librarian," said Kate

Wittenberg, director of the Electronic Publishing

Initiative at Columbia University.

 

Ms. Wittenberg's group recently finished a three-year study

of research habits, including surveys of 1,233 students

across the country, that concluded that electronic

resources have become the main tool for information

gathering, particularly among undergraduates.

 

"We have to respond to these new ways," Ms. Wittenberg

said, calling for a way to make better research

material available online.

 

That means working with commercial search engines like

Google and Yahoo to make ever more digital-research

materials searchable.

 

Undergraduates like Ms. Maxianova and her classmates are

not the only ones conducting research from their computers.

Faculty members also do it.

 

"One of the rarest things to find is a member of the

faculty in the library stacks," said Paul Duguid, an

information researcher who will teach a class this fall at

the University of California, Berkeley, on judging the

authenticity of information found on the Web.

 

In the Columbia survey, 90 percent of the faculty members

who responded said they used electronic resources in their

research several times a week or more. Nearly all said they

were a valuable resource.

 

While the accuracy of online information is notoriously

uneven, the ubiquity of the Web means that a trip to the

stacks is no longer the way most academic research begins.

 

"The nature of discovery is changing," said Joseph Janes,

associate professor and chairman of library and information

science at the University of Washington. "I think the

digital revolution and the use of digital resources in

general is really the beginning of a change in the way

humanity thinks and presents itself."

 

A few research librarians say Google could eventually take

on more of the role of a universal library.

 

"If you could use Google to just look across digital

libraries, into any digital library collection, now that

would be cool," said Daniel Greenstein, university

librarian of the California Digital Library, the digital

branch of the University of California library system.

 

"It would help libraries achieve something that we haven't

yet been able to achieve by ourselves," Dr. Greenstein

said, "which is to place all of our publicly accessible

digital library collections in a common pool."

 

The biggest problem is that search engines like Google skim

only the thinnest layers of information that has been

digitized. Most have no access to the so-called deep Web,

where information is contained in isolated databases like

online library catalogs.

 

Search engines seek so-called static Web pages, which

generally do not have search functions of their own.

Information on the deep Web, on the other hand, comes to

the surface only as the result of a database query from

within a particular site.

 

Use Google, for instance, to research Upton Sinclair's 1934

campaign for governor of California, and you will miss an

entire collection of pamphlets accessible only from the

University of California at Los Angeles's archive of

digitized campaign literature.

 

"Google searches an index at the first layers of any Web

site it goes to, and as you delve beneath the surface, it

starts to miss stuff," said Mr. Duguid, co-author of "The

Social Life of Information." "When you go deeper, the

number of pages just becomes absolutely mind-boggling."

 

Some estimates put the number of Web pages that are hidden

from the view of most search engines at 500 billion.

 

Reference librarians are trying to bring material from the

deep Web to the surface. In recent months, dozens of

research libraries began working with Google and other

search engines to help put their collections within reach

of a broader public.

 

Carnegie Mellon University, for instance, has digitally

scanned 1.6 million pages of archival material from the

papers of Carnegie Mellon scientists like Herbert Simon, a

Nobel Prize winner for economics and a computer chess

expert. Now, a Google search for "Herbert Simon and

Carnegie Mellon" turns up the Simon papers.

 

Google has also indexed two million book titles through the

Online Computer Library Center, which manages a database of

catalogs from 12,000 libraries around the world.

 

Other search sites are striking similar deals. Yahoo

recently signed an agreement with the online library center

to index its catalogs, and four months ago, it started

carrying out a plan to make more of the deep Web reachable

through Yahoo.

 

Yahoo has also signed agreements with the University of

Michigan to make searchable the university's compendium of

academic collections from more than 250 institutions. And

it has indexed a digital repository at Northwestern

University of more than 2,000 hours of Supreme Court oral

arguments.

 

Yet for every archive that has become searchable by

commercial Web engines, scores are not accessible. "There's

lots of great stuff that isn't available digitally and

likely never will be," Dr. Janes said. Most books published

before 1995 fit into this category, he said, as do many

older magazines, newspapers and journals, as well as

historical maps, archives, letters, diaries, older census

statistics and genealogical materials.

 

"We have to figure out how to adapt to a world where people

will prefer digital stuff," Dr. Janes said, "yet not forgo

the investment in print and analog collections and the work

involved in mapping and maintaining those collections."

 

Research institutions are investing heavily in combining

the new with the old. At Columbia's Butler Library, the

stacks are not only alive and well, Ms. Wittenberg said,

but have been modernized to allow for better physical

access to the seven million volumes in the collection.

 

During the renovation, work areas with network connections

were placed throughout the library.

 

"A student or faculty member could work for a whole day in

what looks and feels like a very traditional library, while

accessing either the print collection or the large and

rapidly growing collection of electronic resources," Ms.

Wittenberg said.

 

Many experts, even those who specialize in digital

material, say that losing the tactile experience of books

and relying too heavily on electronic resources is certain

to exact a price.

 

"How do you know it's the appropriate universe from which

to draw your research materials?" said Dr. Greenstein. "It

has huge ramifications for the nature of instruction and

scholarship."

 

At the same time, many research librarians say that the new

reliance on electronic resources is making their role as

guides to undiscovered material more important than ever.

 

Thomas Mann, a reference librarian in the main reading room

of the Library of Congress, was reminded of this recently

while helping a visitor who was researching a famine in

Greece that occurred in 1942. A Google search had yielded

little useful information.

 

"While he was looking at newspaper articles from the 1940's

that we have digitized," Dr. Mann said, "I set up a search

on the terminal next to him in another database of

historical abstracts and history journals."

 

In less than a minute, he pulled up citations for five

scholarly articles about the famine and helped the visitor

put in requests for the paper copies from the stacks. "We

can show people things they don't ask for," Dr. Mann said.

"The historical database I got into hit it right on the

button."

 

Some library experts welcome the change with few

reservations.

 

"Although it seems like an apocalyptic change now, over

time we'll see that young people will grow up using many

ways of finding information," said Abby Smith, director of

programs at the Council on Library and Information

Resources, a nonprofit group in Washington.

 

"We'll see the current generation we accuse of doing

research in their pajamas develop highly sophisticated

searching strategies to find high-quality information on

the Web," Dr. Smith said. "It's this transition period

we're in, when not all high-quality information is

available on the Web - that's what we lament."

 

Dr. Janes said that, like many others, he occasionally

pined for the days spent in musty library stacks, where one

could chance upon scholarly gems by browsing the shelves.

 

"You can think of electronic research as a more

impoverished experience," Dr. Janes said. "But in some ways

it's a richer one, because you have so much more access to

so much more information. The potential is there for this

to be a real bonus to humanity, because we can see more and

read more and do more with it. But it is going to be very

different in lots of ways."