Tuesday, November 20, 2012

Is Copying HTML a Feature of Google?

A couple of weeks ago I visited a blog that explained how search engines react when content is copied from a small website to a high-PR website.

I would like to thank Dejan SEO team for publishing this post on their website.
  • We all know about SEO, and we are also aware of the myths and mysteries around search engines.
Google changes its algorithm again and again to provide top-quality search results to its users.

  1. On one side, all of us SEOs (search engine optimizers) put in effort to get good organic results through different on-page and off-page activities.
  2. But these updates by Google create lots of doubts in our minds about the future of our website rankings.
  • Today we are going to discuss search engine result hijacking. This case study was done by the Dejan SEO team after taking permission from three authors:
  1. Jim Munro
  2. Robb Mass
  3. Rand Fishkin
  • Introduction: It’s not a bug, it is a feature of Google. We all know that Google always prefers quality content when ranking pages. If you are copying content from another website, then your future is not so good.

  1. After these regular updates by Google, we try to keep ourselves happy because we have not been affected by the update.
  2. We also know that higher-PR or higher-authority websites are overtaking the results of small business websites.
  • Search Theory: As the image below illustrates, in the past it was OK to copy content from another website (even from a high-PR website); you still appeared in the search results.
But now, if we copy content from any website, we will no longer appear in the organic search results.
Today’s key part: search engine crawlers visit each website, and if they find the same content on multiple URLs, they will display only the higher-PR website in the search results.
  • Case studies: They took four different websites to analyze this case.

Dejanseo.com.au - [PR7]
  1. Marketbizz.nl – [PR4]
  2. Dumbseoquestions.com – [PR2]
  3. Shopsafe.com.au – [PR4]
  4. Moz.com – [PR0]

[Case #1]:- (Marketbizz.nl – 0 +1) and (rob.dejanseo.com.au – 7 +1)

Marketbizz.nl/(en)/ReferentiEN.htm was copied to the Dejan SEO subdomain (rob.dejanseo.com.au/(en)/ReferentiEN.htm).
  • The whole HTML and the images were copied on 26 Oct. 2012.

  • The search result was hijacked on 30 Oct. 2012, and not only in the organic results: the result of the info: command was replaced as well, and it now shows the new URL.

  • Even the PR was replaced.
  • They also tried searching with (“-“) of the description, but only the replaced result was shown. Even a search for the exact URL did not show the original result.


[Case #2]:- (Dumbseoquestions.com copied to dsq.dejan.seo.com.au)

The whole content was copied, but without any media, on 30 Oct. 2012.

When searching with the info: command, the result was replaced, but when searching by brand name or by the title “Dumb SEO Questions”, the result was not replaced.


  • Interesting discovery:
  1. An exact content search shows the original site
  2. A sidebar content search shows the duplicate copy

In this case no canonical tag was in use to stop the hijacking.

[Case #3] shopsafe.com.au to shop.safe.dejanseo.com.au


This time they selected a domain that has a rel=”canonical” tag in its HTML.
  • The site was chosen because it uses rel=”canonical”.

The organic search results were replaced by the copied page, but a search with the “info:( )” command shows the original website.
All +1s were removed intentionally, though the outcome is still unclear to us.
  • Possible defence mechanisms:
  1. rel=”canonical”
  2. Authorship mark/link from a Google+ profile
  3. +1s

[Case #4] In this case study they chose a “higher-authority domain” to check how the search engine would react, so they picked a hard target: the domain of Mr. Rand Fishkin.

They made minor edits for this test (rel next/prev, authorship, canonical).
  • They did manage to hijack Rand’s search result for “Rand Fishkin” and for one of his articles, “Yellow Shoes and Personal Vs. Company Branding”, but only for Australian searches.

  • This is probably because of “geo-locking”, for two reasons:
  1. An Australian domain (.au) hosts the copy
  2. Links point towards the copied (original) page from Australian domains.
  • Not a full hijack:
They failed to replace the URL (a search with the “info:” command still shows the copied (original) page).
  • Due to the code being slightly different from the original.
  • Rand’s authorship link (which was removed)
  • More quality social signals and quality inbound links.

  • Interesting observation:
+1s are shared between Rand’s blog and the duplicate page.
This means that if someone +1s either of these two pages, the other page immediately gets the +1 credit as well.
In the full hijacks no +1 movement could be seen, but with Rand’s blog, after the “author tag” was removed, the two pages share +1s.
  • Security:
We all need to protect our website results and documents, even against an authority website (like DejanSEO) copying our HTML.
  • Canonicalization:
Proper use of the rel=”canonical” tag can secure our documents, and it also tells Google which page is the original. But in [Case #3] we saw that Google still replaced the URL even though the canonical tag was in use.
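To check whether a page actually declares a canonical URL, the tag can be read straight from the HTML. The following is a minimal Python sketch using only the standard library; the `CanonicalFinder` class, the `find_canonicals` helper, and the example.com markup are hypothetical names made up for illustration, not part of the Dejan SEO experiment.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attributes without a value arrive as None, so normalise to "".
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens.
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return the list of canonical URLs declared in the given HTML text."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page markup; example.com is a placeholder domain.
sample_page = """
<html><head>
<title>Sample</title>
<link rel="canonical" href="http://www.example.com/original-page.htm">
</head><body>Content</body></html>
"""

print(find_canonicals(sample_page))
```

A copied page that still carries the original canonical URL points back to the source, while a copy with the tag removed or rewritten returns an empty list or a different URL.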
There is also a way to protect our documents, including PDFs: “HTTP header canonicalization”.
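For files such as PDFs, which have no HTML head to put a tag in, the canonical URL is sent as an HTTP `Link` response header with `rel="canonical"`. Here is a small Python sketch that builds such a header; the `canonical_link_header` function and the example.com PDF URL are assumptions for illustration, and how the header is attached to a response depends on your web server.

```python
def canonical_link_header(canonical_url):
    """Return a (name, value) pair for an HTTP Link header with rel="canonical"."""
    # The canonical URL should be absolute so it is unambiguous on any copy.
    if not canonical_url.startswith(("http://", "https://")):
        raise ValueError("canonical URL should be absolute")
    return ("Link", '<{}>; rel="canonical"'.format(canonical_url))


# Hypothetical PDF location on a placeholder domain.
name, value = canonical_link_header(
    "http://www.example.com/downloads/white-paper.pdf"
)
print("{}: {}".format(name, value))
# prints: Link: <http://www.example.com/downloads/white-paper.pdf>; rel="canonical"
```

Note that this header is sent by the server of the original document, so it does not travel with a file that someone downloads and re-hosts elsewhere.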


We cannot say that authorship really plays a role as an organic ranking factor.

Rand’s blog uses full URLs for its internal links, and these could be observed on the copied page. Someone who copies a page may keep the same URLs, or may replace the internal links with links to their own newly created pages. If they make no changes, PageRank is still passed through those links.
  • Content monitoring:
Content monitoring is the best way to stay protected. You can monitor your content with a tool such as CopyScape. You can also protect it by filing a complaint through Webmaster Tools. If a high-authority website copies your content, a quick reaction with the Webmaster Tools link removal tool can help, even if the copy passes a back-link advantage.
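As a rough idea of what a content monitoring check can look like, the Python sketch below compares two texts by their overlapping word sequences ("shingles") and returns a score between 0 and 1. This is only an illustration under simple assumptions: it is not how CopyScape works, and the `shingles` and `similarity` helpers and the shingle size of 5 words are hypothetical choices.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=5):
    """Jaccard similarity of the two shingle sets: 0.0 (no overlap) to 1.0 (identical)."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical texts standing in for your page and a suspected copy.
original = "Google changes its algorithm again and again to provide quality results"
suspect = "Google changes its algorithm again and again to provide quality results"
unrelated = "A completely separate paragraph about cooking pasta at home tonight"

print(similarity(original, suspect))
print(similarity(original, unrelated))
```

A score close to 1.0 suggests the suspected page is a near copy and is worth a closer look; where to set the alert threshold is a judgment call for your own content.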