When the same content is posted on multiple sites, how does Google determine which is the original?


The same content can appear at multiple URLs on the web for a variety of reasons. If you are afraid that someone else is copying your content and that your traffic will go to the copy, learn how to deal with it. Read this article to find out what happens when someone copies your content and what you can do about it.

Duplicate content, reproduced content and plagiarism have always been challenges for webmasters as well as search engines. Content developers and bloggers spend a lot of time researching various topics to provide the best content to their readers. However, a plagiarist needs only a minute to copy and reproduce the same content on another site and then walk away with the fruits of your hard work.



How can Google find out which content is original and which is copied?


In most cases, search engines like Google can figure out which content is original based on a number of signals, such as the date and time of indexing and the authority of the websites involved. If your website is an authority website, has good rankings and posted the content first, then there is a higher chance that Google will consider your article the original post and ignore the reproduced one in the search results. However, if you are new to blogging and your blog post is reproduced by another website that Google indexes before it finds your post, then you will probably lose the game. In all likelihood, Google would think you copied from the other site, and you would lose the benefits of all your hard work.

Exactly how Google determines which content is original is a secret, and Google is unlikely ever to disclose it.

Even though Google will not tell us how it determines which content is original and which to ignore in search results, we can reasonably infer a few points from what we observe happening in real search results. Some factors that Google might consider when choosing the original content over the reproduced content are:
  1. Date and time the content was indexed by Google: In most cases, the content that Google indexed first would be treated as the original and all other copies would be treated as reproduced content. However, this has a problem. New sites and blogs are not crawled often, so if an established blogger reproduces content from a new site, Google could conclude that the original author is the copycat. Fortunately, Google does not depend on this factor alone to determine the original content.

  2. Authority of the website: Google appears to keep some internal rating of how authoritative each website is. Content on an authority site would be given more weight in the analysis of original vs. reproduced content. For example, if you reproduce an article from Wikipedia (or vice versa), it is almost impossible to make Google believe that your article is the original, because of the high ranking and authoritative nature of the Wikipedia website.

  3. Links from other websites: Google gives a lot of weight to incoming links from other websites. If your article has multiple links pointing to it from highly ranked websites, then your article is more likely to be treated as the original, and the article with fewer incoming links is more likely to be treated as reproduced content.

  4. Reference links: Sometimes the reproduced article gives a link to the original source. This is a strong signal to Google that the article being linked to is the original source and that the article containing the link is reproduced content.
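
Google does not publish how it weighs these signals, so the following is only a toy sketch of how the four points above could be combined into a single score. The `Page` fields, the `guess_original` function, the example URLs and the equal weights are all hypothetical, chosen purely for illustration; this is not Google's method.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Page:
    """One copy of the same article. All fields are hypothetical inputs."""
    url: str
    first_indexed: datetime   # signal 1: when this copy was first indexed
    authority: float          # signal 2: made-up site authority in [0, 1]
    inbound_links: int        # signal 3: links from other websites
    cites: tuple = ()         # signal 4: URLs this copy credits as its source


def guess_original(pages):
    """Return the copy with the highest toy score (equal, arbitrary weights)."""
    earliest = min(p.first_indexed for p in pages)
    most_links = max(p.inbound_links for p in pages) or 1  # avoid dividing by zero
    credited = {url for p in pages for url in p.cites}

    def score(p):
        s = 1.0 if p.first_indexed == earliest else 0.0  # indexed first
        s += p.authority                                  # site authority
        s += p.inbound_links / most_links                 # share of inbound links
        s += 1.0 if p.url in credited else 0.0            # credited by another copy
        return s

    return max(pages, key=score)


# A new blog that was indexed one day after a copy on a bigger site.
blog = Page("https://newblog.example/post", datetime(2013, 3, 2), 0.2, 5)
copy = Page("https://bigsite.example/copy", datetime(2013, 3, 1), 0.4, 1,
            cites=("https://newblog.example/post",))
print(guess_original([blog, copy]).url)
```

In this made-up example the new blog still wins even though it was indexed later, because it has more incoming links and the copy links back to it. That is the reason no single factor, such as indexing date, should be read as decisive.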

If someone asks whether they can reproduce your content, it is a good idea to ask them to reproduce only a short summary and then link to your original article. This will earn you some valuable incoming links. It is also an opportunity to build reputation and authority for your site with Google and other search engines.
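
If you do allow a site to republish a summary, you may want to confirm that the page really links back to you. The sketch below shows one way to check this using only Python's standard library; the `links_back` helper and the example URL are assumptions for illustration, and the comparison is deliberately simple (an exact match, ignoring a trailing slash).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back(html, original_url):
    """Return True if the page's HTML contains a link to original_url."""
    collector = LinkCollector()
    collector.feed(html)
    target = original_url.rstrip("/")
    return any(href.rstrip("/") == target for href in collector.hrefs)


# Hypothetical republished summary that credits the original post.
page = '<p>Short summary. <a href="https://example.com/my-post/">Read more</a></p>'
print(links_back(page, "https://example.com/my-post"))
```

In practice you would first download the republished page and pass its HTML to `links_back`; fetching is left out here to keep the example self-contained.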



Google is moving towards semantic search, and this could bring another major challenge for content developers and SEO specialists. In addition to dealing with plagiarism and reproduced content, they will have to convince Google of the semantic meaning of their content, unless the search engine is smart enough to detect exactly what the content means. If you are not familiar with semantic search, take a look at my article on what semantic search is.


Article by Tony John
Tony John is a professional blogger from India, who started his first Weblog in 1998 at Tripod.com. Tony switched to blogging as a passion blended business in the year 2000 and currently operates several popular web properties including IndiaStudyChannel.com, Techulator.com, dotnetspider.com and many more.

