As many of you may have noticed, the duplicate content penalty applied by Google is the subject of many online discussions.
This is what Google officials advise webmasters:
It’s also important to keep in mind that our crawlers don’t index duplicate content, so creating identical sites at several domains will likely not result in their returning for many country restricts. If you do create duplicate domains, we suggest using a robots.txt file to block our crawler from accessing all but your preferred one.
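You can check the robots.txt approach Google recommends here with Python's standard `urllib.robotparser`. This is a minimal sketch under stated assumptions. It uses two hypothetical domains: `example.com` is the preferred site and `example.net` is a duplicate copy of it. Each domain serves its own robots.txt, and the duplicate blocks every path for all crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical domains: example.com is the preferred site,
# example.net is a duplicate copy that should not be crawled.
ROBOTS_TXT = {
    "example.com": "User-agent: *\nDisallow:\n",    # empty Disallow allows everything
    "example.net": "User-agent: *\nDisallow: /\n",  # block the whole duplicate domain
}

def parser_for(domain: str) -> RobotFileParser:
    """Build a parser from the robots.txt that `domain` would serve."""
    rp = RobotFileParser()
    rp.parse(ROBOTS_TXT[domain].splitlines())
    return rp

def crawlable(domain: str, path: str, agent: str = "Googlebot") -> bool:
    """True if `agent` may fetch `path` on `domain` according to its robots.txt."""
    return parser_for(domain).can_fetch(agent, f"http://{domain}{path}")

for domain in ROBOTS_TXT:
    print(domain, crawlable(domain, "/some-post/"))
```

Crawlers read robots.txt from the root of each host separately. The blocking file therefore lives only on the duplicate domains, and the preferred domain stays fully crawlable.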
What is duplicate content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it’s unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and — worse yet — linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries.
You can find more on what is and what is not duplicate content here.
DupPrevent WordPress Plugin
You probably already know that your blog contains pages with identical content. Each post can be reached in several ways: through its permalink, the archives, the category pages, or a feed.
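The sketch below makes the problem concrete. It classifies the URLs under which a single post can show up, then picks a robots meta tag for each HTML view so that only the permalink is indexed. Several details are assumptions for illustration: the URL patterns (a `/YYYY/MM/DD/post-slug/` permalink structure), the example paths, and the rule of marking archive and category pages `noindex,follow`. This is not DupPrevent's actual logic.

```python
import re
from typing import Optional

# Hypothetical URL layout for a blog whose permalinks look like
# /YYYY/MM/DD/post-slug/; WordPress lets you configure this differently.
PATTERNS = [
    ("feed", re.compile(r"/feed/?$")),  # checked first: a feed URL can extend any other URL
    ("permalink", re.compile(r"^/\d{4}/\d{2}/\d{2}/[^/]+/?$")),
    ("archive", re.compile(r"^/\d{4}(/\d{2}){0,2}/?$")),  # year, month or day archive
    ("category", re.compile(r"^/category/[^/]+/?$")),
]

def page_type(path: str) -> str:
    """Return the first page type whose pattern matches the URL path."""
    for name, pattern in PATTERNS:
        if pattern.search(path):
            return name
    return "other"

def robots_meta(path: str) -> Optional[str]:
    """Return the robots meta tag an HTML view of `path` should carry."""
    kind = page_type(path)
    if kind == "feed":
        return None  # feeds are XML, so an HTML meta tag cannot go in them
    if kind in ("archive", "category"):
        # Keep the duplicate listing out of the index, but let crawlers
        # follow its links through to the posts themselves.
        return '<meta name="robots" content="noindex,follow" />'
    return '<meta name="robots" content="index,follow" />'

for path in ["/2007/05/14/duplicate-content/", "/2007/05/", "/category/seo/",
             "/2007/05/14/duplicate-content/feed/"]:
    print(f"{path:40} {page_type(path):10} {robots_meta(path)}")
```

The `noindex,follow` combination keeps the posts reachable through archive and category pages, while only the permalink copy is indexed. Feeds cannot carry a meta tag, so they are usually handled in robots.txt instead.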
The DupPrevent plugin manages NOINDEX meta tags to help you avoid the duplicate content penalty. It also includes a robots.txt file that keeps search engines from spidering feeds, trackbacks and the wp- directories. This way, each post should be crawled and indexed under only one URL.
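A robots.txt along the lines described here might look like the hypothetical file in this sketch. The post does not reproduce the file DupPrevent actually ships. Per-post feed and trackback URLs (for example `/2007/05/14/post-slug/feed/`) need the `*` wildcard. Googlebot honors this wildcard and RFC 9309 standardizes it, but Python's `urllib.robotparser` does not evaluate it inside paths. The sketch therefore includes a small matcher of its own, following the RFC 9309 rule that the longest matching pattern wins.

```python
from __future__ import annotations

import re

# Hypothetical robots.txt in the spirit of the one the plugin includes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /*/feed/
Disallow: /trackback/
Disallow: /*/trackback/
"""

def rule_to_regex(path_pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex: '*' matches any
    run of characters, and a trailing '$' anchors the end of the path."""
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    body = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def parse_rules(text: str) -> list[tuple[bool, str]]:
    """Collect (allowed, pattern) pairs. Assumes one 'User-agent: *' group."""
    rules = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.split("#")[0].strip()
        if field in ("allow", "disallow") and value:
            rules.append((field == "allow", value))
    return rules

def is_allowed(path: str, rules: list[tuple[bool, str]]) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for rule_allows, pattern in rules:
        if rule_to_regex(pattern).match(path):  # match() anchors at the path start
            if len(pattern) > best_len or (len(pattern) == best_len and rule_allows):
                best_len, allowed = len(pattern), rule_allows
    return allowed

rules = parse_rules(ROBOTS_TXT)
for path in ["/2007/05/14/post-slug/", "/2007/05/14/post-slug/feed/",
             "/2007/05/14/post-slug/trackback/", "/wp-admin/", "/category/seo/"]:
    print(path, is_allowed(path, rules))
```

robots.txt controls crawling rather than indexing. A URL blocked here can still show up in results if other pages link to it. A crawler that is not allowed to fetch a page also never sees a NOINDEX tag on it. For these reasons it helps to apply the two mechanisms to different URLs rather than to the same ones.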
However, DupPrevent cannot stop people who steal your content and republish it on their own sites, and such a plugin is unlikely ever to be written.



