Making Intelligence a Bit Less Artificial
May 1, 2003
By LISA GUERNSEY
WHEN Mike Kellogg of Chicago shops at Amazon.com, he can
view a list of recommended products that match his taste in
music with uncanny precision. The store knows he loves
Wilco, a folksy rock band, and offers him CD's with a
similar sound.
When Anne Heilemann of Iowa City visits Amazon, she is
immediately reminded how clueless the recommendations can
be. A few months ago she bought a book about flower girls
for a friend's 7-year-old daughter, who will be in her
wedding this month. Now Amazon offers her more flower-girl
books (one is more than enough, Ms. Heilemann said), and
assumes that she must also need a book about bargains on
baby products.
"I would freak if I needed a baby-bargains book right now,"
she said.
Mr. Kellogg's experience is a dream come true for
developers of recommender systems - software that analyzes
patterns in a customer's choices to predict what else that
person might want or need. In addition to Amazon's system,
the better-known examples include TiVo, the digital
television recorder, and NetFlix, the online DVD rental
service.
But Ms. Heilemann's experience may ring truer to most
people. Only 7.4 percent of online consumers who noticed
these systems said they often purchased recommended
products, according to a report issued in February by
Forrester Research. About 22 percent said they found the
recommendations valuable, and about 42 percent said the
products listed were not of interest.
To improve the recommendations, many software developers
are doing an about-face from the mid-1990's, when they put
their energy into getting computers to do all the work.
Today they say that automated programs that look for
patterns in customer data are not smart enough to detect a
gaffe. Something more sophisticated is required: the human
mind.
People are becoming a critical component: analysts who
understand why a particular type of music appeals to some
people, categorization experts who know how to
cross-reference material, retail executives who tweak the
system to improve the bottom line and reviewers who check
for nonsensical or offensive results.
"The holy grail is to be able to capture all the customer's
interactions in detail and get smarter about what not to
recommend," said Usama Fayyad, chief executive and
president of digiMine, the software company behind the
online recommendations of J. Crew and Barnes & Noble. "We
can recommend very well. Knowing when not to bother someone
is much harder."
Odd pitches and poor matches have led to an outpouring of
anecdotes. A discussion on a bulletin board at Salon.com
this year titled "When Customer Profiling Goes Wrong"
described people's befuddlement upon receiving off-the-wall
recommendations from Amazon. Someone named Molly wrote that
she bought "a single trashy romance novel" and is now
"branded for life."
"The best results are achieved from powerful technology and
human intervention," said Matt Turck, president of
TripleHop Technologies, a company that has built
recommendation engines for USA Today's online travel
section and SkiMatcher, which advises travelers on ski
resorts.
All this talk of human intervention sounds very different
from the hype of the dot-com boom, when startup companies
spun visions of computer programs that could help people
discover their yet-to-be-revealed tastes in books and
music. Imaginative online software was more coveted than
gold, and recommendation systems looked like a step toward
the creation of artificial intelligence. It was hard not to
be intrigued by the idea that with the right data and the
right mathematical formula, a computer might be able to
grasp a person's preferences better than friends and
family, suggesting books or movies that the consumer would
not have discovered otherwise.
"There is a sense that with people you are perceived
stereotypically, but that the system might give you a
totally different chance," said Rashmi Sinha, the founder
of Uzanto Consulting, a company that focuses on end-user
experiences with technology. She is also a cognitive
psychologist who has conducted studies of how people
respond to recommender systems.
In the mid-1990's, much attention was drawn to
collaborative filtering, a technique that matches a user to
a group of others who have purchased or praised similar
products, then analyzes the group's data to predict what
else the user might like. Pattie Maes, an expert in the
field and a professor at the Media Lab of the Massachusetts
Institute of Technology, called it "automating
word-of-mouth." Firefly, a company she helped start, became
the symbol of the technology's promise. The New York Times
Magazine ran a 4,200-word article in 1997 about Firefly,
which was then valued at $100 million. Microsoft bought the
company the following year.
It turned out that Microsoft hadn't bought Firefly for its
collaborative filter. It wanted the software that kept
track of user profiles, which Firefly had called
"passport," for what became Microsoft's own Passport
software for the quick transfer of personal data.
"They weren't so much interested in the recommendation
engine," Dr. Maes said in an interview last month. "It
wasn't because they didn't believe in it, but it wasn't as
good a match for their strategy."
Collaborative filtering is only a piece of today's
recommendation technology. "There was this great
expectation that it was going to be this killer app, and it
didn't meet people's expectations," said Jack Aaronson, who
helped to design the technology for a recommendation
company called Open Sesame and who now runs the Aaronson
Group, a consulting practice.
Some of the problems with collaborative filtering are
common enough that they have earned nicknames, like the
cold start problem and the popularity effect.
To make interesting matches, a company needs a large number
of people who have rated or purchased a large number of
products. The cold start phenomenon arises when a Web site
opens but neither of those criteria has been met. Joe Smith
might buy the same book as Jane Doe, but without more data,
it would be a stretch to predict that the next book Joe
buys is one that Jane might want.
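The cold start problem can be sketched in a few lines of
Python. This is a minimal illustration of the matching idea
described above, not any company's actual system; the
function name recommend, the min_overlap threshold and the
sample shoppers and book titles are all assumptions made for
the example.

```python
from collections import defaultdict


def recommend(purchases, user, min_overlap=2):
    """Suggest items bought by shoppers whose baskets overlap with `user`'s.

    `purchases` maps a shopper name to the set of items that shopper bought.
    `min_overlap` is a hypothetical threshold: neighbors who share fewer
    items than this are ignored as too weak a signal.
    """
    mine = purchases.get(user, set())
    scores = defaultdict(int)
    for other, theirs in purchases.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        # Cold start guard: one shared purchase is too thin to call shared taste.
        if overlap < min_overlap:
            continue
        for item in theirs - mine:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Invented data: a brand-new store versus one with some purchase history.
sparse = {"joe": {"book_a"}, "jane": {"book_a", "book_b"}}
dense = {
    "joe": {"book_a", "book_c", "book_d"},
    "jane": {"book_a", "book_c", "book_b"},
    "ann": {"book_c", "book_d", "book_e"},
}

print(recommend(sparse, "joe"))  # nothing to suggest yet
print(recommend(dense, "joe"))
```

With only one shared purchase, the sketch declines to
recommend anything to Joe; once shoppers have several
purchases in common, suggestions begin to appear.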
To counter that, some companies employ human editors to
make the first connections between products and likely
purchase patterns. At Barnesandnoble.com, an editorial
staff makes recommendations. Choicestream, the company that
creates the MyBestBets technology used by America Online,
has brought in analysts to distill the defining attributes
of television programs ("thought-provoking" is one example)
and uses the computer to match them with other programs
that have been similarly categorized.
"If it is not vetted and monitored by humans and not
complemented by actual hand-selling, as we say in the book
industry, it doesn't feel like there is anybody there,"
said Daniel Blackman, vice president for books, video and
music for Barnesandnoble.com.
The popularity effect is at work when results delivered by
the computer are boring and obvious. MediaUnbound, the
company that develops the recommendation engine for the MP3
service PressPlay, has been analyzing the more than four
million MP3 collections that are open for browsing through
Napster, KaZaA and other file-sharing services. Michael S.
Papish, the company's chief executive, said that if he were
to run that data through a collaborative filter to predict
musical taste, one band would be at the top of the list for
every single person: the Beatles.
Because of the Beatles' name recognition and popularity,
they are likely to be in anyone's collection, regardless of
that person's taste. But if MediaUnbound were to put the
Beatles at the top of every recommendation list, its service
would seem stale and uninspired.
To address that problem, the company built some new rules
into the software. But it also hired music analysts to
scout the music scene for new bands, seed the databases
with interesting acts and build genre maps to show how
musical tastes are connected.
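The popularity effect, and one possible rule for countering
it, can be sketched as follows. The article does not say
which rules MediaUnbound added, so the lift calculation here
is an assumption chosen for illustration, and the six sample
collections are invented.

```python
from collections import Counter


def rank_by_count(collections, seed):
    """Rank artists by how many collections containing `seed` also hold them."""
    counts = Counter()
    for c in collections:
        if seed in c:
            counts.update(c - {seed})
    return sorted(counts, key=lambda a: (-counts[a], a))


def rank_by_lift(collections, seed):
    """Rank by an artist's share among seed's fans divided by the artist's
    share among all listeners (an assumed correction, not a documented one)."""
    total = len(collections)
    overall = Counter(a for c in collections for a in c)
    fans = [c for c in collections if seed in c]
    with_seed = Counter(a for c in fans for a in c if a != seed)
    lift = {a: (n / len(fans)) / (overall[a] / total) for a, n in with_seed.items()}
    return sorted(lift, key=lambda a: (-lift[a], a))


# Invented collections: everyone owns the Beatles.
collections = [
    {"Beatles", "Flatlanders", "Palace Brothers"},
    {"Beatles", "Flatlanders", "Palace Brothers"},
    {"Beatles", "Flatlanders"},
    {"Beatles", "Squirrel Bait"},
    {"Beatles", "Squirrel Bait"},
    {"Beatles"},
]

print(rank_by_count(collections, "Flatlanders"))
print(rank_by_lift(collections, "Flatlanders"))
```

Counting alone puts the Beatles first because they are in
every collection. Dividing by overall popularity moves the
less common act to the top, though a rule like this is no
substitute for the judgment calls the analysts make.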
"The computer is good at averaging things; it rounds it
out, sands it down," Mr. Papish said. "And then we use the
humans to bring back the exciting rough edges."
Colin Wambsgans, an analyst at MediaUnbound, said he was
working last week on a recommendation list generated for
consumers who like the Flatlanders, a 1970's band from
Texas that recently released a new album. He moved Squirrel
Bait, a rock band with a harder sound, down a few notches
and pushed another band, the Palace Brothers, up. "The
style of music wasn't quite the same," he said. "The Palace
Brothers have more of a folksy sound."
Even Mr. Kellogg's experience with helpful taste-matching
at Amazon was a case of a human's coming to the rescue. To
make sure that he gets recommendations that follow his
tastes, Mr. Kellogg, 29, a product manager in Chicago, has
spent a good deal of time editing his online profile. He
wants Amazon to know that he loves Wilco, so he has
given high ratings to those CD's and dozens of other
favorites. When he makes a purchase for someone else - like
a karaoke CD for his 7-year-old cousin - he checks a box
that tells Amazon to ignore it.
"I actively work at my recommendations," Mr. Kellogg said.
Jason Kilar, vice president of worldwide software
operations for Amazon, would not comment on how many people
use the editing feature; Christopher M. Kelley, an analyst
at Forrester, said he figured that most people never take
the time to use it. But it has become an important tool for
taking into account the complexity of human interactions.
Mr. Kilar said that customers need to be able to
communicate, for example, "Please exclude this one because
I was temporarily out of my mind and I thought I liked
Sade."
Business sense is the latest layer to be added to
recommendation technology. Some companies want to be able
to weight the results that appear on a recommendation list
so that products they want to clear out appear at the top
and out-of-stock items are suppressed. "The technology
needs to be able to support this," said Dr. Fayyad, whose
digiMine software offers such options. But Dr. Maes and
some software developers warn that if companies allow the
bottom line to dictate their recommendations, shoppers may
distrust their systems.
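The kind of weighting described above can be sketched as a
re-ranking step applied after the recommendation engine has
scored its candidates. This is a hypothetical illustration,
not digiMine's software; the function name rerank, the boost
parameter and the sample products are assumptions.

```python
def rerank(scored, in_stock, clearance, boost=1.5):
    """Apply business rules to (item, relevance) pairs from a recommender.

    Out-of-stock items are dropped; items the retailer wants to clear
    have their relevance multiplied by `boost`, a made-up weighting factor.
    """
    kept = [
        (item, score * (boost if item in clearance else 1.0))
        for item, score in scored
        if item in in_stock
    ]
    # Highest adjusted score first; ties broken alphabetically.
    return [item for item, _ in sorted(kept, key=lambda p: (-p[1], p[0]))]


# Invented relevance scores from an imagined recommendation engine.
scored = [("cd_a", 0.9), ("cd_b", 0.8), ("cd_c", 0.7)]
in_stock = {"cd_b", "cd_c"}
clearance = {"cd_c"}

print(rerank(scored, in_stock, clearance))
print(rerank(scored, in_stock, clearance, boost=1.0))
```

Setting boost to 1.0 leaves the engine's own ordering intact
apart from removing out-of-stock items. The larger the boost,
the more the list reflects the retailer's priorities rather
than the shopper's tastes.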
Amazon says it is holding out against such tweaking. "We
let the recommendation engine do its magic," Mr. Kilar
said. "We are extremely pure."
That is giving Ms. Heilemann, a 29-year-old development
director for a university in Iowa, a good laugh. "Am I
missing something in my life?" she asked. "Maybe I really
do need this booster bundle thing for my computer or this
Wiggles 'Yummy Yummy' DVD," a video for the toddler set.
"But they do think that I need a total body yoga workout,"
she added, "and they might be right about that."