Smartproxy


December 22, 2021
10-minute read

How To Scrape Google Search Results, Or Rising To The Google Challenge [VIDEO]

Whenever you want to find an answer to a tricky question or dig out some advice, who (or what) do you approach first? Let’s be honest, it’s Google. Market research, competitor analysis, latest news, exclusive deals on designer clothing – whichever you’re after, 9 times out of 10, you’ll google it.

Being the richest encyclopedia in the world, Google is also the most protective of all search engines, so extracting data from it can be pretty hellish. On the bright side, there’s a way out. This tutorial will demonstrate how you can successfully scrape the world’s largest library by using Smartproxy’s SERP Scraping API.

Smartproxy scraper analyzes Google search results page (SERP)

Why people scrape Google

We all know Bing, Yahoo, Yandex, and other search engines. Compared to Google, though, they’re small. Just like sprinkles on a cupcake. Google holds almost 90% of the market, which is why ranking high in their search pages is a stairway to heaven for practically every single business.

Amidst the myriad of reasons why businesses scrape Google, the most typical use cases are:

  • Researching competitors and prices on the market
  • Monitoring search engine optimization (SEO) 
  • Building URL lists for specific keywords
  • Getting insights into keyword rankings  
  • Retrieving paid and organic data
  • Analyzing ads

Google data properties for scraping

Which elements of a Google SERP are scrapable

Before learning how to scrape Google, it’s crucial to understand how it displays your searches. And the display of search results has changed a lot! The days when you saw only an index of pages with URLs, the so-called organic search, are long gone.

Although the primary purpose of Google is to enable you to answer your queries quickly and efficiently, it’s not just that. To make you choose Google over others, it also seeks to display your search results in a way that’ll be attractive and easy on the eyes.

That’s why, depending on the complexity and type of search, you’ll see different content on Google. As an illustration, type “data collection” and see which components make up a Google SERP. Below you’ll see what pops up in our browser.

Organic search:

Google SERP

Featured snippet:

Google featured snippet for the data collection entity

People also ask section:

People also ask (PAA) section on a Google SERP

Top stories:

Top stories section on a Google SERP

Related searches:

Related searches section on a Google SERP

Quite a list! And that’s not even the full list of elements that a Google results page can have. As already mentioned, everything depends on the keyword that you’re googling. Below is a list of the elements that might appear on Google and can be easily scraped with Smartproxy’s scraping API:

  • Paid and organic search
  • Travel industry
  • Popular products & listings
  • Videos
  • Ads (real-time)
  • Images
  • Google Shopping
  • Related questions
  • Related searches
  • Featured snippets
  • Local pack
  • Top stories
  • Hotel availability
  • Restaurant data
  • Recipes
  • Jobs
  • Knowledge panel
  • … and anything else

What is Smartproxy’s SERP Scraping API?

If you want to scrape all major search engines at scale, including the almighty Google, your best bet is our SERP Scraping API. It’s truly an all-in-one scraping tool: a network of 40M+ residential and datacenter proxies, a web scraper, and a data parser.

With this tool, you won’t need any extras such as a separate scraper or crawler to gather SERP info from Google. Our SERP Scraping API functions as a full-stack solution that combines a proxy network, a scraper, and a parser in a single product.

Features of our SERP API

Our SERP Scraping API is specifically designed for successful data scraping, so it’s highly scalable and flexible. What we mean by that is that our scraping API is:

  • Easy to use

All you need to do is specify your query or URL, and we’ll return well-formatted data in JSON. Trust us – after reading the tutorial below, you’ll see that it’s not that scary.

  • 100% successful 

We guarantee that we’ll show you search results for any device or browser. If our first request fails, we keep sending requests until we can return the desired result to you. We swear – you’ll get structured data from Google by sending just one API request.

The best part is that you won’t overpay because we never charge for failed requests. One request from your side equals one successful request count. All you need to do is wait for that data to arrive.

If you’re wondering how we do that, let us just tell you that we filter the best proxies from Smartproxy’s 40M+ proxy pool and choose only those that’ll bring you a 100% success rate. Smart, right?

100% success rate for Smartproxy
  • Rotating

Our network uses advanced rotation that automates proxy changes, so you don’t have to worry every time a proxy dies.

  • Extremely flexible

Our scraping API will let you view results from any country, state, or city in the world. Using different endpoints to access real IP addresses from any place on earth also means that you’ll avoid IP blocking and flagging.

  • Scalable

We support high volumes of requests, even when Google is asking for CAPTCHA. The exact number of maximum requests varies from time to time and depends on your scraping activities. Yet, what we know for sure is that using our API, you’ll defo forget about CAPTCHAs!

Tutorial on scraping Google with Smartproxy’s API


Step 1

Sign up to get access to our dashboard:

Smartproxy website

Step 2 

Once you’ve signed up, wait for a verification email to arrive in your inbox. When you receive it, click on the link to confirm your email address, and hooray! You can now explore our dashboard! 

Step 3 

In the menu on your left, select “SERP” (under the “Scraping APIs” section) and then “Pricing”. Over there, pick the plan that suits your needs best:

Smartproxy dashboard

With Smartproxy, you never have to worry about your payment. All our monthly subscription plans are recurrent. We charge you automatically, so you don’t have to manage anything. Of course, you can cancel, upgrade, or downgrade your subscription at any time. 

The cherry on the cake is that if you’re not satisfied with our SERP API, we’ll refund you! We offer a 3-day money-back option on all proxy purchases on our website, except for cases when you’ve used over 20% of your subscription traffic and for all crypto payments.

Step 4

That’s where some coding comes into place. There, there… It’s easier than it sounds.

Let’s have a look at the code that you’d need to write in a Python interpreter. Why Python? Well, it’s widely known for its simple syntax, so a lot of programmers opt for it. However, if Python isn’t your type, you can also use other programming languages or simply write the code in the Terminal (Mac or Linux users) or the Command Prompt (Windows fans). Keep in mind, though, that you’ll need different code then – for more info on that, look at our API documentation.

Below is the code for Python:

import requests

url = "https://scrape.smartproxy.com/v1/tasks"

payload = {
    "target": "google_search",
    "query": "proxy faq",
    "parse": True,
    "locale": "en-GB",
    "google_results_language": "en",
    "geo": "London,England,United Kingdom"
}

headers = {
    "Authorization": "Basic cHJdeHl3YXv6U1Bwcm94rXdhtTE=",
    "Accept": "application/json",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.text)

Here, your URL would always be the same – https://scrape.smartproxy.com/v1/tasks. Don’t make any changes here. Just copy and paste. Period.

If you want to scrape organic search results on Google, write “google_search” in the target line. Just as we said before, a Google search page is multifaceted so there are many available targets. If you’re after hotel deals on Google, write “google_hotels”. If you’re into book results, hit “google_books”. A list of all supported targets for our proxies is available in our documentation.

The query parameter indicates what you’d write in the Google search bar. In this case – “proxy faq”. Moving on, if you set the parse parameter to True, your results will be parsed into JSON. Leave it out for raw HTML.

The locale parameter enables you to change the interface language of a Google search page (not the results). On the other hand, the results language parameter selects in which language you’d like to get your results from Google. You can access a complete list of available languages here. The geo variable pinpoints which region you want to target.

As for the remaining part of the code above (from headers onwards), copy and paste it as-is. Sure, another type of integration might require some minor changes. Yet, in general, the idea will be the same.
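If you plan to send more than one kind of query, it can be handy to collect the parameter choices above into a small helper. The function below is purely a convenience sketch of ours, not part of the API – only the payload keys (“target”, “query”, “parse”, “locale”, “google_results_language”, “geo”) come from the tutorial:

```python
# Hypothetical helper that assembles the request body described above.
# Only the payload keys are taken from the tutorial; the function itself
# is a convenience sketch, not part of Smartproxy's API.
def build_payload(query, target="google_search", geo=None,
                  locale="en-GB", results_language="en", parse=True):
    payload = {
        "target": target,        # e.g. "google_search", "google_hotels"
        "query": query,          # what you'd type into the search bar
        "locale": locale,        # interface language of the results page
        "google_results_language": results_language,
    }
    if parse:
        payload["parse"] = True  # parsed JSON; omit for raw HTML
    if geo:
        payload["geo"] = geo     # e.g. "London,England,United Kingdom"
    return payload

# Two sample payloads: an organic search and a (hypothetical) hotels query.
search_payload = build_payload("proxy faq", geo="London,England,United Kingdom")
hotels_payload = build_payload("cheap hotels", target="google_hotels", parse=False)
```

You’d then pass the resulting dictionary as the json= argument of the POST request, exactly as in the code above.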

Step 5

The code above will return a huge set of beautifully parsed data. But what to do if you don’t want to see all the results but only some specific ones? Depending on what you want to filter, add these few simple lines in your Python interpreter:

parsed = response.json()
my_list = parsed["results"][0]["content"]["results"]["organic"]
for k in my_list:
    print(k["pos"], k["url"])

We narrowed down the results so that only the URLs from the organic search of the keyphrase “proxy faq” would be visible. The code also extracts the position which a particular URL holds on Google.

Spot that tiny “0” in the code. It picks the first results object in the response, and since we didn’t specify how many results we wanted, you’ll get ten by default – the length of a standard Google results page.

So, the added lines will return the parsed data below, including the position (from 1 to 10) for each URL:

1 http://www.freeproxy.ru/en/free_proxy/faq/index.htm 

2 https://brightdata.com/blog/proxy-101/common-proxy-questions

3 https://wonderproxy.com/faq

4 https://smartproxy.com/faq

5 https://www.tue.nl/en/our-university/library/practical-information/faq/faq-proxy-server/

6 https://desk.zoho.com/portal/iplayerproxy4u/en/kb/articles/proxy-faq

7 https://limeproxies.netlify.app/blog/top-10-proxy-faqs-smarter-proxy-analysis

8 https://duo.com/docs/authproxy-faq

9 https://uncw.edu/reg/students-proxyfaq.html

10 https://geekflare.com/best-proxy-browsers/
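If you’d like to try out the filtering logic without spending a single request, you can run it against a mock response shaped like the parsed output. The key path below follows the tutorial; the sample positions and URLs are made up:

```python
# A made-up response fragment mirroring the key path used above:
# parsed["results"][0]["content"]["results"]["organic"]
parsed = {
    "results": [
        {
            "content": {
                "results": {
                    "organic": [
                        {"pos": 1, "url": "https://example.com/proxy-faq"},
                        {"pos": 2, "url": "https://example.org/faq"},
                    ]
                }
            }
        }
    ]
}

# Identical filtering logic to the tutorial's snippet.
my_list = parsed["results"][0]["content"]["results"]["organic"]
for k in my_list:
    print(k["pos"], k["url"])
```

Once the shape of the data is clear from the mock, swapping in the real response.json() output is a one-line change.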

One final note

Sure, if you’re not a computer geek, you may not be as snug as a bug in a rug after reading this blog, but you should be feeling more comfortable with our scraping API now.

Don’t forget that our comprehensive documentation will give you more info on how to get started with Smartproxy’s API for gathering SERP data from Google. Oh yeah, I almost forgot! If you bump into a problem with the setup, contact our customer support team straight away.


Ella Moore

Ella’s here to help you untangle the anonymous world of residential proxies to make your virtual life make sense. She believes there’s nothing better than taking some time to share knowledge in this crazy fast-paced world.

Frequently asked questions

How does SERP Scraping API differ from proxies?

Our all-inclusive scraping tool is more than just a pool of proxies! Here, we’re talking about a complete API for scraping all major search engines. It’s a full-stack solution: a network of 40M+ residential and datacenter proxies together with a web scraper and data parser. It’s not only an easier but also a cheaper way to gather all the data, and it spares you the headache of juggling extra tools.

What can I use the scraping API for?

Our API will certainly help you build up your muscles in SEO and market research. Here, we’re talking not only about getting insights into keyword rankings, viewing real-time and up-to-date results, or retrieving paid and organic data. On top of all that, our scraping API will help you create lists of product names, prices, descriptions, discounts, and availability on the market with ease.

Which search engines can I scrape with Smartproxy’s SERP Scraping API?

With our API, you can target not only Google but also other major search engines, including Bing, Yandex, and Baidu.

What are the requirements for connection when scraping Google?

We need an open connection to return the requested data. The data will come back with the HTTP status code 200, and it’ll be parsed in JSON or contain raw HTML. If your connection is closed before the task is completed, the data will be lost. Note that the timeout limit for open connections is 150 seconds. In rare cases of heavy load, we may not be able to get the data for you.
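On the client side, it makes sense to keep your own timeout in step with that 150-second limit and to check for the 200 status code before parsing. The helper below is our own illustrative sketch (fetch_serp is not part of the API); the post function is passed in so the logic can be exercised without a live connection – in real use you’d pass requests.post:

```python
# Hypothetical client-side wrapper. The 150-second timeout mirrors the
# documented open-connection limit; post_fn is injectable (e.g. requests.post)
# so the status-check logic can be tested offline.
def fetch_serp(post_fn, url, payload, headers, timeout=150):
    response = post_fn(url, json=payload, headers=headers, timeout=timeout)
    if response.status_code != 200:
        # Anything other than 200 means we didn't get the requested data.
        raise RuntimeError(f"unexpected status: {response.status_code}")
    return response.json()
```

With a real session you’d call it as fetch_serp(requests.post, url, payload, headers), reusing the url, payload, and headers from the tutorial above.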

What is the Google SERP structure?

Multifaceted, with as many layers as a lasagne. The most common elements are: organic and paid search, featured snippets, top stories, people also ask sections, and related searches.

Which programming languages work with our SERP Scraping API?

Any. Use Python or any other programming language. Alternatively, type in your code in the Terminal (Mac or Linux users) or the Command Prompt (Windows users). For more details on integration, review our documentation.