*Figure: Spamdexing a.k.a. Black Hat SEO [8]*
Aditi writes about SEO in her blog; I would like to add to it by writing about Spamdexing, a.k.a. Black Hat SEO, an unethical means of poisoning the results of a search engine. At the end of her blog she briefly introduces some methods that should not be used in SEO, such as keyword stuffing and hidden text. Do you know why? These methods actually fall under spamdexing (specifically under content spamdexing), which is why search engines penalize websites they find using them. I'd like to give more details about these and various other spamdexing methods, and I'll conclude with some of the techniques search engines employ to fight them.
At some point, we have all come across web-search results that are completely irrelevant to our information need and to the query, yet are still ranked high. These spam pages employ multiple methods to trick search engines into giving them a high rank.
One example would be web pages with a lot of keywords embedded in them that are not visible to the viewer (this can be achieved, for instance, by giving the keyword text the same color as the page background). Since the search engine indexes all the keywords on a web page, these hidden keywords get indexed too, and the page gets ranked high because of the many terms used as keywords, even when they are irrelevant to the main content.
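To make this concrete, here is a minimal sketch of how a crawler might flag such hidden text by comparing an element's inline text color to the page background. This is my own toy illustration, not any engine's actual implementation; the sample page and the `HiddenTextFlagger` class are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical spam page: the second paragraph hides keywords by
# using the same color as the (white) page background.
PAGE = """
<body>
  <p>Genuine article text about cooking.</p>
  <p style="color:#ffffff">cheap flights hotels insurance loans</p>
</body>
"""

class HiddenTextFlagger(HTMLParser):
    """Flags text whose inline CSS color matches the background color."""
    def __init__(self, background="#ffffff"):
        super().__init__()
        self.background = background
        self.stack = []      # one bool per enclosing tag: is it "invisible"?
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "")
        m = re.search(r"color\s*:\s*(#[0-9a-fA-F]{6})", style)
        self.stack.append(bool(m) and m.group(1).lower() == self.background)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if any(self.stack) and data.strip():
            self.flagged.append(data.strip())

flagger = HiddenTextFlagger()
flagger.feed(PAGE)
print("Suspicious hidden text:", flagger.flagged)
# Suspicious hidden text: ['cheap flights hotels insurance loans']
```

A real detector would of course have to resolve external stylesheets, computed styles, off-screen positioning and so on; this only catches the crudest inline-style trick.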
Another instance of spam would be someone creating a lot of spurious websites, all linking to his main website, in order to raise the rank of his main website in search results. This can work in some search engines because they give more importance to the number of incoming links (in-degree) when ranking a website.
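As a toy illustration of why this works, here is a small power-iteration PageRank over a made-up link graph (my own sketch; the site names and graph are hypothetical). In the honest graph, `main` and `other` are tied; once three spurious sites link to `main`, it pulls ahead of the equally honest `other`.

```python
# Toy PageRank via power iteration, showing how spurious in-links
# inflate a page's score relative to its honest peers.
def pagerank(links, damping=0.85, iters=50):
    pages = sorted(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs) if outs else 0.0
            for q in outs:
                new[q] += damping * share
        rank = new
    return rank

honest = {"main": ["other"], "other": ["main"]}
# The same graph plus three spurious sites that only link to "main".
spammed = dict(honest, **{f"spam{i}": ["main"] for i in range(3)})

for name, graph in [("honest", honest), ("spammed", spammed)]:
    r = pagerank(graph)
    print(name, {p: round(r[p], 3) for p in ("main", "other")})
# honest  {'main': 0.5,   'other': 0.5}   -- tied
# spammed {'main': 0.476, 'other': 0.434} -- "main" now outranks "other"
```

(The absolute scores shrink in the spammed graph simply because the probability mass is spread over more pages; what matters is that `main` now beats `other`.)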
*Figure: Keyword Stuffing [9]*
Spamdexing, which I mentioned before as the practice of unethically poisoning search results to gain a higher rank illegitimately, is classified into two types: -
- Content Spamdexing (a.k.a. Term Spamdexing)
- Link Spamdexing
1. Content Spamdexing – These methods manipulate the textual content of a page itself to game term-based ranking. Some of them are given below: -
- Scraper Sites – There are various sites that employ web scraping to lift content from other, credible websites and republish it. Since they offer information drawn from many web resources, they can get ranked high. If they go unidentified for a long time, other blogs and sites may even start linking to them for the information they offer, increasing their in-degree and raising their rank further. Because the information and content these sites scrape may be copyrighted, this practice falls under spamdexing. Sometimes a scraper site even gets ranked above the original site it scraped from.
- Gateway (a.k.a. Doorway) pages – These web pages usually don't contain any real information; they are stuffed with many related keywords (usually invisible to the user) and contain a single link to another page. Thanks to the heavy keyword usage they may rank high in results, and a user who opens one has nowhere to go but the lone link into the main site. They thus act as a doorway or gateway to the main site.
- Article Spinning – Here the content is not scraped verbatim; instead, pages are completely rewritten from another site's material using synonyms and altered grammar, either manually or programmatically (for instance with neural rewriting models). Since some search engines check for duplicate information across sites and penalize the rank of the less popular one, spinning often goes undetected, and such sites still frequently get a high rank. A toy sketch of naive synonym-based spinning follows this list.
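The following is a deliberately naive spinner sketch of my own (the synonym table and `spin` function are hypothetical); real spinning tools use grammar rules or neural rewriting, but the goal is the same: producing text that evades duplicate-content checks.

```python
import random

# Toy article "spinner": naive synonym substitution. The synonym
# table is made up; real tools are far more sophisticated.
SYNONYMS = {
    "cheap": ["affordable", "budget", "low-cost"],
    "buy": ["purchase", "order", "get"],
    "great": ["excellent", "fantastic", "superb"],
}

def spin(text, rng=random.Random(0)):
    words = []
    for word in text.split():
        key = word.lower()
        words.append(rng.choice(SYNONYMS[key]) if key in SYNONYMS else word)
    return " ".join(words)

print(spin("buy cheap laptops at great prices"))
# e.g. "order low-cost laptops at superb prices"
```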
2. Link Spamdexing – Various ranking algorithms used by search engines, such as PageRank, consider 1) the incoming links to a page (in-degree) and 2) the outgoing links from the page (out-degree) as important signals when calculating the rank of a website: roughly speaking, the better connected a site is, the higher it is ranked. Link spamdexing exploits this property by introducing links to one site from various other sites in order to increase its in-degree, which results in a higher rank in the search results. The second example given at the beginning of this post is an exploit of this type. There are various methods of link spamdexing. Some of them are given below: -
- Link Farming – Consider a complete graph whose nodes are sites and whose edges are the links between those sites. A link farm is such a collection of sites, all linking to each other. Since many search-engine algorithms treat in-degree and out-degree as important ranking signals, a site in a link farm can get ranked very high. A toy construction of such a farm is sketched after this list.
- Hidden hyperlinks – Just as hidden keywords are used in content spamdexing to increase rank, hidden hyperlinks are used in link spamdexing, and they help increase rank for the same reason mentioned above.
- Sybil Attacks – In general, a Sybil attack refers to creating many identities for some malicious purpose. In terms of link spamdexing, an attacker creates multiple sites under different domains that all point to the specific site whose rank is to be increased. They may also all link to each other, creating a network of spam blogs whose structure resembles a link farm.
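Here is a toy construction of a link farm (my own sketch; the site names are made up): `n` farm pages form a complete directed graph, and every one of them also links to the target page whose rank is to be boosted.

```python
# Toy link farm: every farm page links to every other farm page,
# and all of them link to the target page being boosted.
def build_link_farm(n, target="target"):
    farm = [f"farm{i}" for i in range(n)]
    links = {target: []}
    for p in farm:
        links[p] = [q for q in farm if q != p] + [target]
    return links

links = build_link_farm(4)
in_degree = {p: 0 for p in links}
for outs in links.values():
    for q in outs:
        in_degree[q] += 1
print(in_degree)
# {'target': 4, 'farm0': 3, 'farm1': 3, 'farm2': 3, 'farm3': 3}
```

Feeding this graph to the toy `pagerank` function sketched earlier would show the target outranking every individual farm page, since it collects links from all of them.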
Various other methods are also used for spamdexing. Please
check the references for more information.
Now we come to the final part: what steps do popular search engines take to counter such spam attempts?
Google Penalty [3] – Google enforces a penalty on sites practicing spamdexing techniques, reducing their search rankings. This is done either manually (if such a site is reported) or automatically, when the algorithms employed for spotting spam sites identify one.
Google has introduced various updates to its search algorithm specifically to identify spam sites and lower their rank. Some of them are listed below: -
- Google Panda [7] – This update to the search algorithm was released in February 2011; its main purpose was to lower the rank of sites whose content was of poor quality. After this update, it was observed that news sites gained higher rankings while advertisement-heavy sites were penalized.
- Google Penguin [4] – Google publishes webmaster guidelines that explain how to do search engine optimization in legitimate ways. Google Penguin was an update to the search algorithm, released in April 2012, aimed at decreasing the ranking of sites that violate those guidelines. The latest Penguin update (Penguin 7, a.k.a. Penguin 4.0) was rolled out in September 2016.
Apart from Google's proprietary algorithms, there are public algorithms such as TrustRank [5,6] that search engines can employ for the sole purpose of finding spam pages and lowering their rank. A minimal sketch of the idea follows.
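TrustRank [5,6] is commonly described as a biased PageRank: instead of teleporting uniformly to all pages, the random surfer teleports only to a small, manually vetted seed set of trusted pages, so trust flows outward along links and pages that trusted sites never (even indirectly) link to end up with low scores. The sketch below is my own reading of that idea; the graph and seed set are made up.

```python
# Minimal TrustRank sketch: PageRank whose teleport distribution is
# concentrated on a hand-vetted seed set of trusted pages.
def trustrank(links, seeds, damping=0.85, iters=50):
    pages = sorted(links)
    teleport = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    score = dict(teleport)
    for _ in range(iters):
        new = {p: (1 - damping) * teleport[p] for p in pages}
        for p, outs in links.items():
            share = score[p] / len(outs) if outs else 0.0
            for q in outs:
                new[q] += damping * share
        score = new
    return score

links = {
    "trusted_news": ["good_blog"],
    "good_blog": ["trusted_news"],
    "spam1": ["spam2"],   # spam cluster: no in-links from trusted pages
    "spam2": ["spam1"],
}
print({p: round(s, 3) for p, s in trustrank(links, {"trusted_news"}).items()})
# {'good_blog': 0.459, 'spam1': 0.0, 'spam2': 0.0, 'trusted_news': 0.541}
```

Because no trusted page ever links into the spam cluster, its trust score stays at zero, however densely the spam pages interlink.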
Kindly give your feedback on the blog!
References: -
Image Source: -