Nearly all search engines use spiders, also known as robots, to crawl the Web looking for pages. These spiders then bring the data back to be indexed by the engine.
Since roughly 1996, meta commands have existed that can be placed on individual Web pages to modify how these search engine spiders behave. The most useful of these commands are fairly universal and respected by almost all search engines. What follows is a list of some of the more popular spider commands and instances in which you might want to use them.
meta name="robots" content="index"
This meta command is one of the most commonly used - and it is also the least necessary. It tells search engine spiders to come on in and put the page in their index. However, search engines do this by default anyway.
meta name="robots" content="follow"
The follow command is different from the index command. It requests that search engine spiders follow the links on a particular page. Again, however, this piece of code is unnecessary because search engines follow the links on a page by default.
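To make the syntax concrete, here is a minimal sketch of where a robots meta tag sits in a page. The title and body text are placeholders invented for illustration. Because both values simply restate the default behavior described above, leaving the tag out should have the same effect.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Example page (placeholder)</title>
  <!-- Restates the default: index this page and follow its links. -->
  <meta name="robots" content="index,follow">
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```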
meta name="robots" content="noindex"
The noindex command, the opposite of the index command, tells search engine spiders not to index the content of a page. It’s important to note, however, that search engine spiders will still follow the links on a page that uses only this command. When not used for legitimate purposes, this tag can be dangerous because it can put you at risk of being penalized by most, if not all, search engines. This is because a noindex tag can be used to hide pages full of links that you don’t want visitors to see but that you do want search engine spiders to follow. This would be known as a "black hat" SEO technique. There are, however, some legitimate uses for the noindex command. For example, if you have a dynamic site and you’ve created static pages to replace some of your dynamic pages, which can make them easier for search engine spiders to access, you could put a noindex tag on the dynamic version.
As Google mentions in its Webmaster Help Center:
"Consider creating static copies of dynamic pages. Although the Google index includes dynamic pages, they comprise a small portion of our index. If you suspect that your dynamically generated pages (such as URLs containing question marks) are causing problems for our crawler, you might create static copies of these pages."
In cases like these, it is acceptable to use the noindex command on the dynamic version of the page, so that your content will not be treated as duplicate. You are not tricking the search engines; you’re just redirecting them.
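As a rough sketch of the dynamic-versus-static case, the head of the dynamic version might look like the following. The URL in the comment and the page title are hypothetical; the static copy would simply carry no robots tag, so it is indexed as usual.

```html
<!-- Hypothetical dynamic URL: /product.php?id=42 -->
<head>
  <title>Blue Widget (placeholder)</title>
  <!-- Keep this dynamic version out of the index.
       Links on the page are still followed by default. -->
  <meta name="robots" content="noindex">
</head>
```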
meta name="robots" content="nofollow"
This tag tells search engine spiders that it’s OK to go ahead and index a page and list it, but that they shouldn’t follow any of the links on the page. This could be used on a gateway page to secure content, help files, etc. The nofollow command in effect tells search engines that this is the end of the line.
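A gateway page of the kind described above might be marked up roughly as follows. The page title and link targets are made up for illustration.

```html
<head>
  <title>Member Login (placeholder)</title>
  <!-- The page itself may be indexed and listed,
       but spiders are asked not to follow any of its links. -->
  <meta name="robots" content="nofollow">
</head>
<body>
  <!-- Hypothetical links that spiders should not follow -->
  <a href="/members/">Members area</a>
  <a href="/members/help.html">Help files</a>
</body>
```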
meta name="robots" content="noindex,nofollow"
Obviously, noindex and nofollow are powerful tags - and in combination, they can make a page, and any pages that spiders can reach only through its links, invisible to nearly all search engines. This combination command tells search engine spiders, "Do not include this page in your index, and do not follow any of the links on this page."
This command has its beneficial uses. For example, it can be placed on pages that carry duplicate content for legitimate reasons. A Web site might have both a page for the company and a page for a distributor that cover the same product with exactly the same content. Nearly all search engines would see this as duplicate content and could devalue both pages. Placing this command on one of them means that search engine spiders will walk on by and you won’t be penalized.
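For the duplicate-content case, one way this might look is to leave the company page untouched and tag only the distributor copy. The page title is a placeholder, and which of the two pages gets the tag is a choice the site owner would make.

```html
<!-- Distributor copy of the product page (hypothetical) -->
<head>
  <title>Blue Widget - Distributor Page (placeholder)</title>
  <!-- Both directives in one comma-separated content value:
       do not index this page, do not follow its links. -->
  <meta name="robots" content="noindex,nofollow">
</head>
```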
meta name="robots" content="noarchive"
Finally, almost all search engines today, including Google and Yahoo, offer a cached version of a page alongside its listing, which provides a snapshot of what the page looked like when it was last crawled. The noarchive tag is meant for content that is time-sensitive and that you might not want search engine spiders to cache for people to access later. By using the noarchive tag, you are telling search engine spiders, in effect, "This page is subject to frequent changes, and I don’t want my visitors to have access to some of this content at a later time."
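A page with time-sensitive content could carry the tag along these lines. The title and the offer text are invented for the example. The tag only concerns the cached copy, so the page should still be indexed and its links followed as usual.

```html
<head>
  <title>This Week's Specials (placeholder)</title>
  <!-- Ask search engines not to offer a cached snapshot of this page. -->
  <meta name="robots" content="noarchive">
</head>
<body>
  <p>Offer valid this week only.</p>
</body>
```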
Conclusion
The commands discussed above are just a few of the ones in existence, and new ones are being added frequently. While nearly all search engines support these commands, there are still some that don’t. The ones above, however, are fairly universally understood by search engine spiders, no matter where they originate.