Cheating Probabilistic Systems

Shamus Young makes some interesting comments regarding last week's post on probabilistic systems. He makes an important distinction between weblogs, which have no central point of control ("The weblog system is spontaneous and naturally occurring."), and the other systems I mentioned, which do. Systems like the ones used by Google or Amazon are centrally controlled and usually reside on a particular set of servers. Shamus then makes the observation that such centralization lends itself to "cheating." He uses Amazon as an example:
You’re a company like Amazon.com. You buy a million red widgets and a million blue widgets. You make a better margin on the blue ones, but it turns out that the red widgets are just a little better in quality. So the feedback for red is a little better. Which leads to red being recommended more often than blue, which leads to better sales, more feedback, and even more recommendations. Now you’re down to your last 100,000 red but you still have 500,000 blue.

Now comes the moment of truth: Do you cheat? You’d rather sell blue. You see that you could “nudge” the numbers in the feedback system. You own the software, pay the programmers who maintain it, and control the servers on which the system is run. You could easily adjust things so that blue recommendations appear more often, even though they are less popular. When Amazon comes up with “You might also enjoy… A blue widget” a customer has no idea of the numbers behind it. You could have the system try to even things out between the more popular red and the more profitable blue.
His post focuses mostly on malicious uses of the system by its owners. This is certainly a worry, but one thing I think I need to note is that no one really thinks that these systems should be all that trustworthy. The reason the system works is that we all hold a certain degree of skepticism about it. Wikipedia, for instance, works best when you use it as a starting point. If you use it as the final authority, you're going to get burned at some point. The whole point of a probabilistic system is that the results are less consistent than traditional systems, and so people trust them less. The reason people still use such systems is that they can scale to handle the massive amounts of information being thrown at them (which is where traditional systems begin to break down).
Today Wikipedia offers 860,000 articles in English - compared with Britannica's 80,000 and Encarta's 4,500. Tomorrow the gap will be far larger.
You're much more likely to find what you're looking for at Wikipedia, even though the quality of any individual entry at Wikipedia ranges from poor and inaccurate to excellent and helpful. As I mentioned in my post, this lack of trustworthiness isn't necessarily bad, so long as it's disclosed up front. For instance, the problems that Wikipedia is facing are related to the fact that some people consider everything they read there to be very trustworthy. Wikipedia's policy of writing entries from a neutral point of view tends to exacerbate this (which is why the policy is a controversial one). Weblogs do not suffer from this problem because they are written in overtly subjective terms, and thus it is blatantly obvious that you're getting a biased view that should be taken with a grain of salt. Of course, that also makes it more difficult to glean useful information from weblogs, which is why Wikipedia's policy of writing entries from a neutral point of view isn't necessarily wrong (once again, it's all about tradeoffs).

Personally, Amazon's recommendations rarely convince me to buy something. Generally, I make the decision independently. For instance, in my last post I mentioned that Amazon recommended the DVD set of the Firefly TV series based on my previous purchases. At that point, I'd already determined that I wanted to buy that set and thus Amazon's recommendation wasn't so much convincing as it was convenient. Which is the point. By tailoring their featured offerings towards a customer's preferences, Amazon stands to make more sales. They use the term "recommendations," but that's probably a bit of a misnomer. Chances are, they're things we already know about and want to buy, hence it makes more sense to promote those items... When I look at my recommendations page, many items are things I already know I want to watch or read (and sometimes even buy, which is the point).

So is Amazon cheating with its recommendations? I don't know, but it doesn't really matter that much because I don't use their recommendations as an absolute guide. Also, if Amazon is cheating, all that really means is that Amazon is leaving room for a competitor to step up and provide better recommendations (and from my personal experience working on such a site, retail websites are definitely moving towards personalized product offerings).

One other thing to consider, though, is that it isn't just Amazon or Google that could be cheating. Gaming Google's search algorithms has actually become a bit of an industry. Wikipedia is under constant assault from spammers who abuse the openness of the system for their own gain. Amazon may have set their system up to favor items that give them a higher margin (as Shamus notes), but it's also quite possible that companies go on Amazon and write glowing reviews for their own products in an effort to get them recommended.

The whole point is that these systems aren't trustworthy. That doesn't mean they're not useful; it just means that we shouldn't totally trust them. You aren't supposed to trust them. Ironically, acknowledging that fact makes them more useful.

In response to Chris Anderson's The Probabilistic Age post, Nicholas Carr takes a skeptical view of these systems and wonders what the broader implications are:
By providing a free, easily and universally accessible information source at an average quality level of 5, will Wikipedia slowly erode the economic incentives to produce an alternative source with a quality level of 9 or 8 or 7? Will blogging do the same for the dissemination of news? Does Google-surfing, in the end, make us smarter or dumber, broader or narrower? Can we really put our trust in an alien logic's ability to create a world to our liking? Do we want to be optimized?
These are great questions, but I think it's worth noting that these new systems aren't really meant to replace the old ones. In Neal Stephenson's The System of the World, the character Daniel Waterhouse ponders how new systems supplant older systems:
"It has been my view for some years that a new System of the World is being created around us. I used to suppose that it would drive out and annihilate any older Systems. But things I have seen recently ... have convinced me that new Systems never replace old ones, but only surround and encapsulate them, even as, under a microscope, we may see that living within our bodies are animalcules, smaller and simpler than us, and yet thriving even as we thrive. ... And so I say that Alchemy shall not vanish, as I always hoped. Rather, it shall be encapsulated within the new System of the World, and become a familiar and even comforting presence there, though its name may change and its practitioners speak no more about the Philosopher's Stone." (page 639)
And so these new probabilistic systems will never replace the old ones, but only surround and encapsulate them...