*Figure: Spamdexing a.k.a. Black Hat SEO [8]*
Aditi writes about SEO in her blog; I would like to add to it by writing about Spamdexing, a.k.a. Black Hat SEO, an unethical means of poisoning the results of a search engine. At the end of her blog she briefly introduces some methods that should not be used in SEO, such as keyword stuffing and hidden text. Do you know why? These methods actually fall under spamdexing (specifically under content spamdexing), which is why search engines penalize websites they find using them. I'd like to give more details about these and various other spamdexing methods, and I'll conclude with some of the techniques search engines employ to fight them.
At some point, we have all come across web-search results that are completely irrelevant to our information need and to the query, yet are still ranked high. These spam pages employ multiple methods to trick search engines into giving them a high rank.
One example would be web pages with a lot of keywords embedded in them that are not visible to the viewer (this can be achieved, for instance, by giving the keyword text the same color as the page background). Since the search engine indexes all the keywords on a web page, these hidden keywords get indexed too, and the page gets ranked high because of the many terms used as keywords, even when they are irrelevant to the main content.
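To make this concrete, here is a minimal sketch of how a crawler might flag such hidden text by comparing an element's inline text color to the page background. This is my own toy illustration, not any engine's actual implementation; the sample page and the `HiddenTextFlagger` class are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical spam page: the second paragraph hides keywords by
# using the same color as the (white) page background.
PAGE = """
<body>
  <p>Genuine article text about cooking.</p>
  <p style="color:#ffffff">cheap flights hotels insurance loans</p>
</body>
"""

class HiddenTextFlagger(HTMLParser):
    """Flags text whose inline CSS color matches the background color."""
    def __init__(self, background="#ffffff"):
        super().__init__()
        self.background = background
        self.stack = []      # one bool per enclosing tag: is it "invisible"?
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "")
        m = re.search(r"color\s*:\s*(#[0-9a-fA-F]{6})", style)
        self.stack.append(bool(m) and m.group(1).lower() == self.background)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if any(self.stack) and data.strip():
            self.flagged.append(data.strip())

flagger = HiddenTextFlagger()
flagger.feed(PAGE)
print("Suspicious hidden text:", flagger.flagged)
# Suspicious hidden text: ['cheap flights hotels insurance loans']
```

A real detector would of course have to resolve external stylesheets, computed styles, off-screen positioning and so on; this only catches the crudest inline-style trick.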
Another instance of spam would be someone creating a lot of spurious websites, all linking to his main website, in order to raise the rank of his main website in search results. This can work in some search engines because they give more importance to the number of incoming links (in-degree) when ranking a website.
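As a toy illustration of why this works, here is a small power-iteration PageRank over a made-up link graph (my own sketch; the site names and graph are hypothetical). In the honest graph, `main` and `other` are tied; once three spurious sites link to `main`, it pulls ahead of the equally honest `other`.

```python
# Toy PageRank via power iteration, showing how spurious in-links
# inflate a page's score relative to its honest peers.
def pagerank(links, damping=0.85, iters=50):
    pages = sorted(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs) if outs else 0.0
            for q in outs:
                new[q] += damping * share
        rank = new
    return rank

honest = {"main": ["other"], "other": ["main"]}
# The same graph plus three spurious sites that only link to "main".
spammed = dict(honest, **{f"spam{i}": ["main"] for i in range(3)})

for name, graph in [("honest", honest), ("spammed", spammed)]:
    r = pagerank(graph)
    print(name, {p: round(r[p], 3) for p in ("main", "other")})
# honest  {'main': 0.5,   'other': 0.5}   -- tied
# spammed {'main': 0.476, 'other': 0.434} -- "main" now outranks "other"
```

(The absolute scores shrink in the spammed graph simply because the probability mass is spread over more pages; what matters is that `main` now beats `other`.)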
*Figure: Keyword Stuffing [9]*
Spamdexing, which I mentioned before as the practice of unethically poisoning search results to gain a higher rank illegitimately, is classified into two types: -
- Content Spamdexing (a.k.a. Term Spamdexing)
- Link Spamdexing
1. Content Spamdexing – These methods manipulate the textual content of a page itself to game term-based ranking. Some of them are given below: -
- Scraper Sites – There are various sites that employ web scraping to lift content from other, credible websites and republish it. Since they offer information drawn from many web resources, they can get ranked high. If they go unidentified for a long time, other blogs and sites may even start linking to them for the information they offer, increasing their in-degree and raising their rank further. Because the information and content these sites scrape may be copyrighted, this practice falls under spamdexing. Sometimes a scraper site even gets ranked above the original site it scraped from.
- Gateway (a.k.a. Doorway) pages – These web pages usually don't contain any real information; they are stuffed with many related keywords (usually invisible to the user) and contain a single link to another page. Thanks to the heavy keyword usage they may rank high in results, and a user who opens one has nowhere to go but the lone link into the main site. They thus act as a doorway or gateway to the main site.
- Article Spinning – Here the content is not scraped verbatim; instead, pages are completely rewritten from another site's material using synonyms and altered grammar, either manually or programmatically (for instance with neural rewriting models). Since some search engines check for duplicate information across sites and penalize the rank of the less popular one, spinning often goes undetected, and such sites still frequently get a high rank. A toy sketch of naive synonym-based spinning follows this list.
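The following is a deliberately naive spinner sketch of my own (the synonym table and `spin` function are hypothetical); real spinning tools use grammar rules or neural rewriting, but the goal is the same: producing text that evades duplicate-content checks.

```python
import random

# Toy article "spinner": naive synonym substitution. The synonym
# table is made up; real tools are far more sophisticated.
SYNONYMS = {
    "cheap": ["affordable", "budget", "low-cost"],
    "buy": ["purchase", "order", "get"],
    "great": ["excellent", "fantastic", "superb"],
}

def spin(text, rng=random.Random(0)):
    words = []
    for word in text.split():
        key = word.lower()
        words.append(rng.choice(SYNONYMS[key]) if key in SYNONYMS else word)
    return " ".join(words)

print(spin("buy cheap laptops at great prices"))
# e.g. "order low-cost laptops at superb prices"
```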
2. Link Spamdexing – Various ranking algorithms used by search engines, such as PageRank, consider 1) the incoming links to a page (in-degree) and 2) the outgoing links from the page (out-degree) as important signals when calculating the rank of a website: roughly speaking, the better connected a site is, the higher it is ranked. Link spamdexing exploits this property by introducing links to one site from various other sites in order to increase its in-degree, which results in a higher rank in the search results. The second example given at the beginning of this post is an exploit of this type. There are various methods of link spamdexing. Some of them are given below: -
- Link Farming – Consider a complete graph whose nodes are sites and whose edges are the links between those sites. A link farm is such a collection of sites, all linking to each other. Since many search-engine algorithms treat in-degree and out-degree as important ranking signals, a site in a link farm can get ranked very high. A toy construction of such a farm is sketched after this list.
- Hidden hyperlinks – Just as hidden keywords are used in content spamdexing to increase rank, hidden hyperlinks are used in link spamdexing, and they help increase rank for the same reason mentioned above.
- Sybil Attacks – In general, a Sybil attack refers to creating many identities for some malicious purpose. In terms of link spamdexing, an attacker creates multiple sites under different domains that all point to the specific site whose rank is to be increased. They may also all link to each other, creating a network of spam blogs whose structure resembles a link farm.
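Here is a toy construction of a link farm (my own sketch; the site names are made up): `n` farm pages form a complete directed graph, and every one of them also links to the target page whose rank is to be boosted.

```python
# Toy link farm: every farm page links to every other farm page,
# and all of them link to the target page being boosted.
def build_link_farm(n, target="target"):
    farm = [f"farm{i}" for i in range(n)]
    links = {target: []}
    for p in farm:
        links[p] = [q for q in farm if q != p] + [target]
    return links

links = build_link_farm(4)
in_degree = {p: 0 for p in links}
for outs in links.values():
    for q in outs:
        in_degree[q] += 1
print(in_degree)
# {'target': 4, 'farm0': 3, 'farm1': 3, 'farm2': 3, 'farm3': 3}
```

Feeding this graph to the toy `pagerank` function sketched earlier would show the target outranking every individual farm page, since it collects links from all of them.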
Various other methods are also used for spamdexing. Please
check the references for more information.
Now we come to the final part: what steps do popular search engines take to counter such spam attempts?
Google Penalty [3] – Google enforces a penalty on sites practicing spamdexing techniques, reducing their search rankings. This is done either manually (if such a site is reported) or automatically, when the algorithms employed for spotting spam sites identify one.
Google has introduced various updates to its search algorithm specifically to identify spam sites and lower their rank. Some of them are listed below: -
- Google Panda [7] – This update to the search algorithm was released in February 2011; its main purpose was to lower the rank of sites whose content was of poor quality. After this update, it was observed that news sites gained higher rankings while advertisement-heavy sites were penalized.
- Google Penguin [4] – Google publishes webmaster guidelines that explain how to do search engine optimization in legitimate ways. Google Penguin was an update to the search algorithm, released in April 2012, aimed at decreasing the ranking of sites that violate those guidelines. The latest Penguin update (Penguin 7, a.k.a. Penguin 4.0) was rolled out in September 2016.
Apart from Google's proprietary algorithms, there are public algorithms such as TrustRank [5,6] that search engines can employ for the sole purpose of finding spam pages and lowering their rank. A minimal sketch of the idea follows.
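TrustRank [5,6] is commonly described as a biased PageRank: instead of teleporting uniformly to all pages, the random surfer teleports only to a small, manually vetted seed set of trusted pages, so trust flows outward along links and pages that trusted sites never (even indirectly) link to end up with low scores. The sketch below is my own reading of that idea; the graph and seed set are made up.

```python
# Minimal TrustRank sketch: PageRank whose teleport distribution is
# concentrated on a hand-vetted seed set of trusted pages.
def trustrank(links, seeds, damping=0.85, iters=50):
    pages = sorted(links)
    teleport = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    score = dict(teleport)
    for _ in range(iters):
        new = {p: (1 - damping) * teleport[p] for p in pages}
        for p, outs in links.items():
            share = score[p] / len(outs) if outs else 0.0
            for q in outs:
                new[q] += damping * share
        score = new
    return score

links = {
    "trusted_news": ["good_blog"],
    "good_blog": ["trusted_news"],
    "spam1": ["spam2"],   # spam cluster: no in-links from trusted pages
    "spam2": ["spam1"],
}
print({p: round(s, 3) for p, s in trustrank(links, {"trusted_news"}).items()})
# {'good_blog': 0.459, 'spam1': 0.0, 'spam2': 0.0, 'trusted_news': 0.541}
```

Because no trusted page ever links into the spam cluster, its trust score stays at zero, however densely the spam pages interlink.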
Kindly give your feedback on the blog!
References: -
Image Source: -