Google Search Overwhelmed By Massive Spam Attack

Google’s search results have been hit over the past few days by a spam attack that can only be described as completely out of control. Many domains are ranking for hundreds of thousands of keywords each, an indication that the scale of this attack could easily reach into the millions of keyword phrases.

The spam was initially discovered by Lily Ray.

Surprisingly, many of the domains have only been registered within the past 24-48 hours.

This came to my attention through a series of posts by Bill Hartzer, in which he published a link graph generated by the Majestic backlinks tool that exposed the link networks of several of the spam sites.

The link graph that he posted showed scores of websites tightly interlinking with each other, which is a fairly typical pattern for spammy link networks.

Screenshot Of Tightly Interlinked Network

Image by Bill Hartzer via Majestic

Bill and I talked about the spam sites over Facebook Messenger and we both agreed that although the spammers put a lot of work into creating a backlink network, the links weren’t actually responsible for the high rankings.

Bill said:

“This, in my opinion, is partly the fault of Google, who appears to be putting more emphasis on content rather than links.”

I agree 100% that Google is putting more emphasis on content than links. But my view is that the spam links are there so that Googlebot can discover the spam pages and index them, even if just for one or two days.

Once indexed, the spam pages are likely exploiting what I consider to be two loopholes in Google’s algorithms, which I discuss next.

Out of Control Spam in Google SERPs

Multiple sites are ranking for longtail phrases that are somewhat easy to rank for, as well as phrases with a local search component, which are also easy to rank for.

Longtail phrases are keyword phrases that people do search for, but exceedingly rarely. The longtail concept has been around for almost twenty years and was popularized by a 2006 book called The Long Tail: Why the Future of Business is Selling Less of More.

Spammers are able to rank for these rarely searched phrases because there is little competition for them.

So if a spammer creates millions of pages targeting longtail phrases, those pages can collectively rank for hundreds of thousands of keywords in a short period of time.

Companies like Amazon use the principle of the longtail to sell hundreds of thousands of individual products a day, which is different from selling one product hundreds of thousands of times per day.

That’s what the spammers are exploiting: the ease of ranking for longtail phrases.

The second thing that the spammers are exploiting is the loophole that’s inherent in Local Search.

The local search algorithm is not the same as the algorithm for ranking non-local keywords.

The examples that have come to light are variations of Craigslist and related keywords.

Examples are phrases like Craigslist auto parts, Craigslist rooms to rent and Craigslist for sale by owner, plus thousands of other keywords, most of which don’t use the word Craigslist.

The scale of the spam is huge and it goes far beyond keywords that contain the word “Craigslist.”

What The Spam Page Looks Like

It is impossible to see what the spam page looks like by visiting it with a browser.

I tried to see the source code of the sites that rank in Google but all of the spam sites automatically redirect to another domain.

I next entered the spam URL into the W3C link checker to visit the website but the W3C bot couldn’t see the site either.

So I changed my browser user agent to identify itself as Googlebot but the spam site still redirected me.

That indicated that the site was not checking if the user agent was Googlebot.

The spam site was checking for Googlebot IP addresses. If the visitor’s IP address matched as belonging to Google then the spam page displayed content to Googlebot.

All other visitors got a redirect to other domains that displayed sketchy content.
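The behavior described above, where the user agent is ignored and only the visitor’s IP address decides what gets served, can be sketched roughly as follows. This is a minimal illustration and not the spammers’ actual code: the function name `choose_response`, the return labels and the IP range are all assumptions. The range used here is a block reserved for documentation, standing in for real Googlebot ranges.

```python
import ipaddress

# Placeholder network standing in for crawler IP ranges (assumption).
# 192.0.2.0/24 is reserved for documentation; real Googlebot ranges differ.
ASSUMED_CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def choose_response(visitor_ip: str, user_agent: str) -> str:
    """Return which version of the page a cloaking site would serve.

    The user_agent argument is deliberately ignored, which mirrors why
    spoofing the Googlebot user agent in a browser still got redirected.
    """
    address = ipaddress.ip_address(visitor_ip)
    if any(address in network for network in ASSUMED_CRAWLER_NETWORKS):
        return "spam_content"  # page shown only to the crawler
    return "redirect"  # everyone else is sent to another domain


# A browser claiming to be Googlebot from an ordinary IP is still redirected.
print(choose_response("198.51.100.7", "Googlebot/2.1"))
# A request from inside the assumed crawler range gets the spam page.
print(choose_response("192.0.2.10", "Mozilla/5.0"))
```

Because the decision depends only on the IP address, no amount of header spoofing from a regular connection would reveal the hidden page.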

In order to see the HTML of the website I had to visit with a Google IP address. So I used Google’s Rich Results tester to visit the spam site and record the HTML of the page.

I showed Bill Hartzer how to extract the HTML by using the Rich Results tester and he immediately went off to tweet about it, lol. Dang!

The Rich Results Tester has an option to show the HTML of a webpage. So I copied the HTML, pasted it into a text file, then saved it as an HTML file.

Screenshot Of HTML Provided By Rich Results Tool

I next edited the HTML file to remove any JavaScript then saved the file again.
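Removing the JavaScript by hand works, but the same step could be scripted. Below is a rough sketch using only Python’s standard library. The function name `strip_scripts` and the sample HTML are hypothetical, and a regular expression is a crude tool for HTML that is only reasonable here because the goal is a quick visual inspection of a saved file.

```python
import re

# Matches a <script ...>...</script> element, including any inline code.
SCRIPT_ELEMENT = re.compile(
    r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL
)


def strip_scripts(html: str) -> str:
    """Remove script elements so scripts in a saved page cannot run when it is opened."""
    return SCRIPT_ELEMENT.sub("", html)


# Hypothetical sample standing in for HTML copied from the testing tool.
sample = (
    "<html><body><h1>Listing</h1>"
    '<script>location.href="https://example.com";</script>'
    "</body></html>"
)
print(strip_scripts(sample))
```

The cleaned string can then be written to an `.html` file and opened locally.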

I was now able to see what the webpage looks like to Google:

Screenshot Of Spam Webpage

Screenshot of a spam webpage that ranks in Google

One Domain Ranks For 300,000+ Keywords

Bill sent me a spreadsheet containing a list of keyword phrases that just one of the spam sites ranked for. One spam site, just one of them, ranked for over 300,000 keyword phrases.

Screenshot Showing Keywords For One Domain

Image showing a closeup of a spreadsheet with keyword phrases on it

There were a lot of Craigslist keyword phrases but there were also other longtail phrases, many of which contained a local search element. As I mentioned, it’s easy to rank for longtail phrases and easy to rank for local search phrases. Combine the two kinds of phrases and it becomes really easy to rank.

Why Does This Spam Technique Work?

Local search uses a different algorithm than non-local search. For example, a local site, in general, doesn’t need a lot of links to rank for a query. The pages just need the right kinds of keywords to trigger the local search algorithm and rank for a geographic area.

So if you search for “Craigslist auto parts” that’s going to trigger the local search algorithm and, because it’s longtail, it’s not going to take much to rank for it.

This has been an ongoing problem for many years. Several years ago a website was able to rank for “Rhinoplasty Plano, Texas” with a site that contained Latin content and headings in English. Rhinoplasty is a longtail local search and Plano, Texas is a relatively small town. Ranking for that Rhinoplasty keyword phrase was so easy that the Latin-language website was able to rank for it.

Google has known about this spam problem since at least December 19th, as acknowledged in a tweet by Danny Sullivan.

It will be interesting to see if Google, after all this time, finally figures out a way to combat this kind of spam.

Featured Image by Shutterstock/Kateryna Onyshchuk
