Robots.txt is a text file that helps search engine bots crawl your site or blog effectively. With a robots.txt file, you can control which parts of your blog search engine bots may crawl and which they may not. Fixing duplicate content issues helps webmasters and bloggers rank their sites higher in search engines. If a large amount of your content is indexed and much of it consists of duplicate posts and pages, there is a chance of being penalized by search engines.

Some common duplicate content issues on a WordPress blog

  • Duplicate content in the post feed.
  • Permalinks with and without a trailing slash.
  • The same content on the www and non-www versions of your blog (see the sketch after this list).
  • The same content on your category pages and post pages.
  • The same content on your index pages and post pages.
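The sketch below is a rough illustration of the trailing-slash and www/non-www items above: a small Python snippet (using a placeholder example.com domain, not any real blog) that normalizes those URL variants to one canonical form, showing that they all point at the same post.

from urllib.parse import urlparse, urlunparse

def canonical(url):
    # Normalize to a non-www host and a trailing-slash path so that all
    # variants of the same post collapse to one URL.
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunparse(("https", host, path, "", "", ""))

variants = [
    "http://example.com/my-first-post",       # no trailing slash
    "http://example.com/my-first-post/",      # trailing slash
    "http://www.example.com/my-first-post/",  # www version
]

for url in variants:
    print(url, "->", canonical(url))

# All three variants print the same canonical URL, which is exactly the kind
# of duplication search engines may otherwise index as separate pages.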

How to avoid WordPress duplicate content issue

You can avoid WordPress duplicate content issues by using a robots.txt file such as the one below.

User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /comments/

This robots.txt file prevents search engine bots from crawling your admin folders, trackbacks, feeds, comments, comments feed, and paginated archive pages. You can also check the LineToWeb robots.txt file for reference.
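If you want to verify which URLs these rules block before uploading the file, the minimal sketch below uses Python's standard urllib.robotparser module; the example.com domain and the sample URLs are placeholders, not taken from any particular blog.

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Sample URLs: admin area, feed, a paginated archive page, and a normal post.
urls = [
    "https://example.com/wp-admin/",
    "https://example.com/feed/",
    "https://example.com/page/2/",
    "https://example.com/my-first-post/",
]

for url in urls:
    allowed = parser.can_fetch("*", url)
    print(url, "->", "allowed" if allowed else "blocked")

# The first three URLs are reported as blocked; only the post URL is allowed,
# matching what the Disallow rules are meant to do.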