Woffling On

Sunday, August 28, 2005

Potential Trap in Drupal and Thoughts on Search Engines

This is just a small point that may be worth tucking away in the back of one's mind, assuming you use Drupal of course.

One of the strengths of Drupal and applications like it is strong support for semantics. Basically it facilitates what's important in web publishing, namely, communicating meaning.

A good illustration of this in Drupal is the capacity to assign a node (fancy word for a piece of content, like a web page) to more than one conceptual category. For instance, I wrote a piece that was meaningfully related to both vitamins and herbs. Since these are two separate categories, serving independent RSS feeds, it was both easy and semantically sensible to assign the content to both.
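To make the idea concrete, here is a minimal sketch in Python of one piece of content belonging to two categories, each with its own feed. It is not Drupal's actual data model or API. The `Node` class, the `feed_items` function and the article titles are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A piece of content, loosely following the post's description of a node."""
    title: str
    categories: set[str] = field(default_factory=set)


def feed_items(nodes: list[Node], category: str) -> list[str]:
    """Return the titles that would appear in one category's feed."""
    return [n.title for n in nodes if category in n.categories]


# Hypothetical content: the first node belongs to both categories.
nodes = [
    Node("Herbal sources of vitamin C", {"vitamins", "herbs"}),
    Node("Growing basil indoors", {"herbs"}),
]

vitamin_feed = feed_items(nodes, "vitamins")
herb_feed = feed_items(nodes, "herbs")
```

The first article turns up in both feeds without being written twice, which is the whole attraction.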

So far so good. This means that the content is readily made available to people with an interest in either vitamins or herbs, or indeed both. This is good for publishing meaning-rich content, good for people searching for that content, and you might imagine simply good all round.

Unfortunately, Drupal actually creates two separate paths leading to the same content. This serves the ideal arrangement extolled above, but the search engines don't like it! You might think they would love it. After all, they're about providing meaningful content to searchers, aren't they?

The trouble is, their supposedly sophisticated, semantically aware algorithms can't cope. You see, they are still bruised by spammers who simply replicate content on multiple sites (or paths) in the hope of snaring more space in the search engines and thereby increasing their likely exposure to searchers. They can't tell the difference between a spammer's attempt to trick them into indexing multiple copies of identical content and the perfectly sound, semantically driven, multiple paths to appropriate content illustrated above.
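I don't know how the search engines really detect duplicates, so the following is only a rough sketch of the general idea, assuming a naive checker that hashes the text found at each path. The paths and page text are invented and are not Drupal's real URL scheme. `fingerprint` and `find_duplicates` are names I made up for the example.

```python
import hashlib
from collections import defaultdict


def fingerprint(body: str) -> str:
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group paths whose text shares a fingerprint; keep groups of two or more."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, body in pages.items():
        groups[fingerprint(body)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Hypothetical paths: the same article reached through two categories.
pages = {
    "/vitamins/my-article": "An article about vitamins and herbs.",
    "/herbs/my-article": "An article about  vitamins and herbs.",
    "/herbs/basil": "Growing basil indoors.",
}

duplicates = find_duplicates(pages)
```

A checker like this sees only paths and text. Nothing in its input says whether the second path is a spammer's copy or a sensible second category, which is exactly the problem.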

I discovered this in my Drupal logs (an excellent system for monitoring activity on the site, incidentally) when I saw red error messages about duplicate content after a spider visited. So what to do, hey?

Normally, when what is sensible conflicts with the dictates of the search engines, I go with what makes sense and forget about the search engines. Why? Simply because I am one of those old duffers with principles. Many would say I'm crazy, but my way of thinking goes like this.

Within reason it is a great idea to be flexible and to adapt to fit in with the search engine expectations. It must be noted that overwhelmingly they are sensible folk who want to do the right thing by everyone, so they don't set out to do any harm. Following Google's webmaster guidelines, for example, makes sense because the ideas themselves make sense.

However, when they get it wrong or are unreasonable, I go with what I think is right and ignore the search engines. You see, my view is that it is their job to fit in with what makes sense. In essence, they have to follow me, rather than me following them. Not because I'm personally able to lead them, but because I'm following sound principles. It's the principles they need to get on board with. I do not believe in letting the search engines become dictators.

In this case, though, I'm inclined to avoid dual paths. I am not sufficiently experienced in the management of a database-driven website, so I will err on the side of caution for now. However, I'll be keeping an eye on this issue. It is one area in which the search engines need to improve their analysis. I won't hold my breath though.
