April 27, 2005
MSN's capitalistic search engine bias for Microsoft-driven web sites: damned statistics and the truth.
Slashdot readers have brains, despite what I've heard otherwise. However, the human mind doesn't always work as well as we think it should: humans interpret their surroundings with a strong bias towards what we expect and what we want to hear.
On the one hand, it's a good thing. It's one of the things that makes us creative individuals as opposed to a big grey mass, where every intelligent person thinks the same about, say, a news story. Our bias towards the expected also helps us to perceive our world - for example, to hear a complicated continuous audio signal as speech.
Then again, it's not a great idea to confuse a biased opinion with truth. Let's look at the Slashdot crowd. The readers of one of the biggest geek websites on the planet have a very particular bias when it comes to software producers. Open Source people are good. Microsoft is bad. Apple is good, despite being somewhat closed-source.
Thankfully, we have the mathematical means to overcome our own tinted glasses that make us wine-goggle in the bars on Saturday nights. They're called statistics. A hated subject for those that believe in believing. I guess I'd simply say: Believing is fun, Knowing is fun-ner!
So, the other day, Slashdot posted a news story claiming that Microsoft's own search engine (MSN Search) tends to rank sites higher if they are delivered by a Microsoft web server. That means, if you run a web site on Microsoft's (known-to-be inferior) web server IIS, it will turn up earlier than sites run on the Unix-based Apache server. So, the Microsoft search engine has a bias in favor of MS-driven sites. At least that's what the Slashdot story suggested.
(Diagrams courtesy of Ivor.it)
Looking at the original article, I noticed that it listed only some 20 search queries that were used to verify this disturbing hypothesis. It showed that the MS-engine shows a slightly higher proportion of IIS-run sites than Google does.
A good analysis should ask: is this pure chance, or is there a real correlation? My comment on Slashdot about the result: Is it significant? That is, can we exclude (to a reasonable certainty, that is, at the 95% confidence level, p<0.05) the possibility that the effect seen can be attributed to chance or some other criterion MSN uses?
Later on, the author of the article decided to give us more (better) data, and a fellow Slashdot reader ran a quick Chi-square significance test and showed that there is a correlation beyond reasonable doubt. (What I mean is that the independent variable "Search Engine" (=MSN or Google) and the dependent variable "IIS proportion" are related according to the results.)
So, I rest my case, it's a proven thing - Microsoft is evil, they manipulate their search results big-time, right?
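For the curious, here is roughly what such a Chi-square test looks like. This is a minimal sketch, not the Slashdot reader's actual calculation: the function name `chi_square_2x2` is my own, and the counts are invented for illustration (they are NOT the numbers from the article). It runs a plain Pearson test on a 2x2 table of search engine versus web server, without a continuity correction.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table
    [[a, b], [c, d]]. Returns (chi2, p_value) for 1 degree of freedom,
    without continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, chi2 is the square of a standard normal
    # variable, so the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, made up for this example (not the article's data):
# how many top results were served by IIS vs. anything else.
msn_iis, msn_other = 60, 140
google_iis, google_other = 40, 160

chi2, p = chi_square_2x2(msn_iis, msn_other, google_iis, google_other)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
print("significant at the 5% level" if p < 0.05 else "could well be chance")
```

A small p only says that the two engines differ in their IIS share by more than chance would explain. It says nothing about why they differ.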
Couldn't be further from the truth. Apart from the fact that the effect - that is the bias - might not be very strong, the correlation doesn't mean that M$ introduced this bias. Data need to be approached with more caution.
Maybe it's just that sites with that particular web server are of higher quality? Maybe there are so many junk sites out there run with Apache simply because it's cheaper to install Apache? (see also this comment.)
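To make that point concrete, here is a toy sketch with entirely invented sites and quality scores (the names `sites`, `iis_share` and `rank_by_quality` are hypothetical, and nobody outside Microsoft knows how MSN really ranks). The ranking below never looks at the web server, only at quality. If the IIS sites in the pool happen to be better on average, they still end up over-represented at the top.

```python
# Hypothetical pool of sites: (web server, quality score).
# All values are invented for illustration.
sites = [
    ("IIS", 9), ("IIS", 8), ("IIS", 7), ("IIS", 3),
    ("Apache", 8), ("Apache", 6), ("Apache", 5), ("Apache", 4),
    ("Apache", 3), ("Apache", 2), ("Apache", 2), ("Apache", 1),
]

def iis_share(entries):
    """Fraction of entries that are served by IIS."""
    return sum(1 for server, _ in entries if server == "IIS") / len(entries)

def rank_by_quality(entries):
    """Sort by quality only; the server field is never consulted."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)

top_results = rank_by_quality(sites)[:4]

print(f"IIS share in the whole pool: {iis_share(sites):.2f}")
print(f"IIS share in the top 4:      {iis_share(top_results):.2f}")
```

The top of the list is IIS-heavy although the ranking is blind to the server. A correlation between "search engine" and "IIS proportion" would look the same whether the cause is deliberate bias or a hidden third variable like this one.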
This is just a little episode, and I have the feeling we encounter situations like this one all the time when dealing with scientific data. The mind plays tricks on us and hides the truth. Then, our statistical analysis plays more tricks, suggesting maybe what the statistician wanted to see. Oh well, life's a bitch, everything is relative and there is no truth anyways. Over and out.
Posted by dr at April 27, 2005 2:14 PM