February 18, 2005
Blog spam: nonsense in the blogosphere
A new form of spam is on the horizon: blog spam. For a while now, I have noticed items popping up as new, time and again, in my Technorati RSS feeds, which show me new entries for certain keywords. "Cheap Web Hosting" is one of them. As you can easily see, these entries are just collections of keywords that obviously attract visitors.
In principle, spammers are following a beaten path here: search engine spam is an old phenomenon. People just put up junk on the web, stuffed with some highly sought-after keywords. Thousands of junk sites link to a few online-commerce sites, only to make those sites turn up first in Google and other search engines. What is new is that the spammers now try to use content-rich blog platforms: Blogger and the like potentially have a high site ranking, or, if you want to call it that, a high credibility among the search engines. Of course, this depends on the ranking algorithm used by the search engine.
Comment spam essentially does the same: millions of junk comments on innocent people's blogs, which contain links to just a few targets that deal with online gambling. An alliance of Google and the makers of content management systems like Movable Type has developed a response: the blog systems now mark every link in a comment in a general way (assigning rel="nofollow" to the link), so that search engines can simply ignore those links. Maybe that's why a few spammers are trying to get their feet wet in even muddier waters. They produce not just comments, but whole blogs, with dozens of nonsense entries in just one day.
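To make the rel="nofollow" response a bit more concrete, here is a minimal sketch in Python of how a blog system might rewrite the links in a submitted comment. The function name add_nofollow and the regular expressions are my own illustration, not the code Movable Type or any other system actually uses, and a simple pattern like this is no substitute for a real HTML parser.

```python
import re

# Hypothetical helper, for illustration only: a deliberately simple pattern
# that matches an opening <a ...> tag. It is not a full HTML parser.
_ANCHOR = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)

# Matches an existing rel attribute (with its leading whitespace) so that a
# commenter cannot override the value we set.
_REL = re.compile(r"\s+rel\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s>]+)", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" set on every <a> tag."""

    def rewrite(match: re.Match) -> str:
        attrs = match.group(1)
        # Drop any rel attribute supplied by the commenter, then add our own.
        attrs = _REL.sub("", attrs).rstrip()
        return f'<a{attrs} rel="nofollow">'

    return _ANCHOR.sub(rewrite, comment_html)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://example.com">click</a>'))
```

A search engine that honours the attribute would then leave such links out of its ranking calculation, which removes the incentive for the comment spammer.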
Often, these guys just mark all of their entries as updated, every single day. This brings up their nonsense entries as top search results on blog search engines like Technorati. Of course, the spammer can freely link to some target sites from there; however, that wouldn't be his main concern, as he could do so with any web site.
To tackle this problem, free blogspace providers such as Blogger will have to do something about people who create blogs automatically, and maybe they will have to aggressively delete blogs that aren't filled with proper, editorial content. Search engines, on the other hand, will need to employ more sophisticated techniques to determine the value, and thus the ranking, of content. It would be interesting to investigate technical methods for determining which content is actually edited and which is just junk. Techniques from natural language processing and models from information theory might be a good starting point for this endeavour.
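As a rough illustration of what an information-theoretic starting point could look like, here is a small Python sketch that computes two crude signals for a piece of text: the Shannon entropy of its word distribution and its zlib compression ratio. The function names and the choice of these two measures are my own assumptions; this is not a tested spam classifier, and any thresholds would have to be found by experiment.

```python
import math
import re
import zlib
from collections import Counter


def word_entropy(text: str) -> float:
    """Shannon entropy, in bits per word, of the word distribution in text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    # H = sum over words of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more repetitive."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw)) / len(raw)


if __name__ == "__main__":
    spam = "cheap web hosting " * 50
    prose = ("Search engines will need more sophisticated techniques "
             "to determine the value, and thus the ranking, of content.")
    for label, sample in (("spam", spam), ("prose", prose)):
        print(label, word_entropy(sample), compression_ratio(sample))
```

The idea is that an entry which repeats the same handful of keywords has low word entropy and compresses extremely well, while edited text tends not to. A spammer could of course defeat such simple measures with randomly generated text, so they could only ever be one signal among several.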
Posted by dr at February 18, 2005 3:47 PM
Comments
I believe "spam that pollutes the blogosphere" should be referred to as "SMOG"
Jim Parham (Swing Trader/Creative Thinker)
Yuba City, CA
stockmaverick@gmail.com
Posted by: Jim Parham at August 20, 2005 11:39 PM