How To Scrape Google Search Results, Or Rising To The Google Challenge [VIDEO]
Whenever you want to find an answer to a tricky question or dig out some advice, who (or what) do you approach first? Let’s be honest, it’s Google. Market research, competitor analysis, the latest news, exclusive deals on designer clothing – whatever you’re after, 9 times out of 10, you’ll google it.
Being the richest encyclopedia in the world, Google is also the most protective of all search engines, so extracting data from it can be pretty hellish. On the bright side, there’s a way out. This tutorial will demonstrate how you can successfully scrape the world’s largest library by using Smartproxy’s SERP Scraping API.
Why people scrape Google
We all know Bing, Yahoo, Yandex, and other search engines. Compared to Google, though, they’re small. Just like sprinkles on a cupcake. Google holds almost 90% of the market, which is why ranking high in its search pages is a stairway to heaven for practically every single business.
Amidst the myriad of reasons why businesses scrape Google, the most typical use cases are:
- Researching competitors and prices on the market
- Monitoring search engine optimization (SEO)
- Building URL lists for specific keywords
- Getting insights into keyword rankings
- Retrieving paid and organic data
- Analyzing ads
Which elements of a Google SERP are scrapable
Before learning how to scrape Google, it’s crucial to understand how it displays your searches. The days when you saw only an index of pages with URLs – the so-called organic search – are gone.
Although the primary purpose of Google is to enable you to answer your queries quickly and efficiently, it’s not just that. To make you choose Google over others, it also seeks to display your search results in a way that’ll be attractive and easy on the eyes.
That’s why the display of search results has changed significantly over time. Depending on the complexity and type of search, you’ll see different content on Google. As an illustration, type “data collection” and see which components make up a Google SERP. Below you’ll see what pops up in our browser.
Organic search:
Featured snippet:
People also ask section:
Top stories:
Related searches:
Quite a list! And that’s not even the full list of elements that a Google results page can have. As already mentioned, everything depends on the keyword that you’re googling. Below is a list of the elements that might appear on Google and can be easily scraped with Smartproxy’s scraping API:
- Paid and organic search
- Travel industry
- Popular products & listings
- Videos
- Ads (real-time)
- Images
- Google Shopping
- Related questions
- Related searches
- Featured snippets
- Local pack
- Top stories
- Hotel availability
- Restaurant data
- Recipes
- Jobs
- Knowledge panel
- … and anything else
What is Smartproxy’s SERP Scraping API?
If you want to scrape all major search engines at scale, including the almighty Google, your only choice will be our SERP Scraping API. It’s truly an all-in-one scraping tool: a network of 65M+ residential, mobile, and datacenter proxies, a web scraper, and a data parser.
With this tool, you won’t need any extras such as a separate scraper or crawler to gather SERP info from Google. Our SERP Scraping API is a full-stack solution that combines a proxy network, a web scraper, and a data parser in a single product.
Features of our SERP API
Our SERP Scraping API is specifically designed for successful data scraping, so it’s highly scalable and flexible. What we mean by that is that our scraping API is:
- Easy to use
- 100% successful
All you need to do is specify your query or URL, and we’ll return well-formatted data in JSON. Trust us – after reading the tutorial below, you’ll see that it’s not that scary.
We guarantee that we’ll show you search results for any device or browser. If our first request fails, we keep sending requests until we can return the desired result to you. From your side, it takes just one API request to get structured data from Google.
The best part is that you won’t overpay because we never charge for failed requests. One request from your side equals one successful request count. All you need to do is wait for that data to arrive.
If you’re wondering how we do that, let us just tell you that we filter the best proxies from Smartproxy’s 65M+ proxy pool and choose only those that’ll bring you a 100% success rate. Smart, right?
- Rotating
- Extremely flexible
- Scalable
Our network uses advanced rotation, which automates proxy changes, so you don’t have to worry every time a proxy dies.
Our scraping API will let you view results from any country, state, or city in the world. Using different endpoints to access real IP addresses from any place on earth also means that you’ll avoid IP blocking and flagging.
We support high volumes of requests, even when Google asks for a CAPTCHA. The exact maximum number of requests varies over time and depends on your scraping activities. Yet, what we know for sure is that using our API, you’ll defo forget about CAPTCHAs!
Step 1
Sign up to get access to our dashboard:
Step 2
Once you’ve signed up, wait for a verification email to arrive in your inbox. When you receive it, click on the link to confirm your email address, and hooray! You can now explore our dashboard!
Step 3
In the menu on your left, select “SERP” (under the “Scraping APIs” section) and then “Pricing”. Over there, pick the plan that suits your needs best:
With Smartproxy, you never have to worry about your payment. All our monthly subscription plans are recurrent. We charge you automatically, so you don’t have to manage anything. Of course, you can cancel, upgrade, or downgrade your subscription at any time.
The cherry on the cake is that if you’re not satisfied with our SERP API, we’ll refund you! We offer a 14-day money-back option to all proxy purchases on our website (terms apply).
Step 4
That’s where some coding comes into play. There, there… It’s easier than it sounds.
Let’s have a look at the code that you’d need to write in a Python interpreter. Why Python? Well, it’s widely known for its simple syntax, so a lot of programmers opt for it. However, if Python isn’t your type, you can also use other programming languages or simply run the request from the Terminal (Mac or Linux users) or the Command Prompt (Windows fans). Keep in mind, though, that you’ll need different code then – for more info on that, look at our API documentation.
Below is the code for Python:

import requests

url = "https://scrape.smartproxy.com/v1/tasks"

payload = {
    "target": "google_search",
    "query": "proxy faq",
    "parse": True,
    "locale": "en-GB",
    "google_results_language": "en",
    "geo": "London,England,United Kingdom"
}

headers = {
    # Replace with your own Base64-encoded credentials
    "Authorization": "Basic cHJdeHl3YXv6U1Bwcm94rXdhtTE=",
    "Accept": "application/json",
    "Content-Type": "application/json"
}

response = requests.request("POST", url, json=payload, headers=headers)
print(response.text)
Here, your URL would always be the same – https://scrape.smartproxy.com/v1/tasks. Don’t make any changes here. Just copy and paste. Period.
If you want to scrape organic search results on Google, write “google_search” in the target line. Just as we said before, a Google search page is multifaceted, so there are many available targets. If you’re after hotel deals on Google, write “google_hotels”. If you’re into book results, hit “google_books”. A list of all supported targets is available in our documentation.
The query parameter indicates what you’d write in the Google search bar. In this case – “proxy faq”. Moving on, setting “parse” to True means your results will come back parsed as JSON. Leave it out to get raw HTML.
The locale parameter enables you to change the interface language of a Google search page (not the results). On the other hand, the results language parameter selects in which language you’d like to get your results from Google. A complete list of available languages is in our documentation. The geo parameter pinpoints which region you want to target.
Regarding the remaining part of the code above (from headers), you should copy-paste it. Sure, another type of integration might require some minor changes. Yet, in general, the idea will be the same.
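Putting those parameters together, switching to a different SERP vertical is just a matter of changing the target line. Here’s a minimal sketch of a payload aimed at Google Hotels instead of organic search – the query and geo values here are made up for illustration, so adjust them to your own search:

```python
# Same request body shape as before, but pointed at Google Hotels.
# "google_hotels" is one of the supported targets mentioned above.
payload = {
    "target": "google_hotels",        # swapped from "google_search"
    "query": "hotels in london",      # what you'd type into Google
    "parse": True,                    # parsed JSON instead of raw HTML
    "locale": "en-GB",
    "geo": "London,England,United Kingdom"
}

print(payload["target"])
```

The headers and the POST request itself stay exactly the same – only the payload changes.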
Step 5
The code above will return a huge set of beautifully parsed data. But what to do if you don’t want to see all the results but only some specific ones? Depending on what you want to filter, add a few short lines in your Python interpreter:
import json

# Parse the raw response into a Python dictionary first
parsed = json.loads(response.text)

my_list = parsed["results"][0]["content"]["results"]["organic"]
for k in my_list:
    print(k["pos"], k["url"])
We narrowed down the results so that only the URLs from the organic search for the keyword “proxy faq” would be visible. The code also extracts the position that each URL holds on Google.
Spot that tiny “0” in the code. It doesn’t limit how many URLs you get – it simply selects the first (and in this case only) results object in the response. By default, you’ll get ten organic results; for more, check the pagination options in our documentation.
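To make that indexing clearer, here’s a sketch of the response shape the filtering lines assume. The field names come from the example above, but the sample values are made up:

```python
import json

# A made-up response in the shape the filtering code expects:
# "results" is a list, so [0] picks its first (and here only) entry.
sample = json.loads("""
{
  "results": [
    {
      "content": {
        "results": {
          "organic": [
            {"pos": 1, "url": "https://smartproxy.com/faq"},
            {"pos": 2, "url": "https://wonderproxy.com/faq"}
          ]
        }
      }
    }
  ]
}
""")

# Walks the same path as the filtering code above
for k in sample["results"][0]["content"]["results"]["organic"]:
    print(k["pos"], k["url"])
```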
So, those added lines will return the parsed data below, including the position (from 1 to 10) for each URL:
1 http://www.freeproxy.ru/en/free_proxy/faq/index.htm
2 https://brightdata.com/blog/proxy-101/common-proxy-questions
3 https://wonderproxy.com/faq
4 https://smartproxy.com/faq
5 https://www.tue.nl/en/our-university/library/practical-information/faq/faq-proxy-server/
6 https://desk.zoho.com/portal/iplayerproxy4u/en/kb/articles/proxy-faq
7 https://limeproxies.netlify.app/blog/top-10-proxy-faqs-smarter-proxy-analysis
8 https://duo.com/docs/authproxy-faq
9 https://uncw.edu/reg/students-proxyfaq.html
10 https://geekflare.com/best-proxy-browsers/
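If you’d rather keep those results than just print them, a few extra lines will dump them to a file. Here’s a minimal sketch using Python’s standard csv module, with the position–URL pairs hard-coded for illustration (in practice you’d build the list from the filtering loop above):

```python
import csv

# Position-URL pairs, as produced by the filtering snippet above
rows = [
    (1, "http://www.freeproxy.ru/en/free_proxy/faq/index.htm"),
    (2, "https://brightdata.com/blog/proxy-101/common-proxy-questions"),
    (3, "https://wonderproxy.com/faq"),
]

# Write the results to a CSV file with a header row
with open("serp_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["pos", "url"])
    writer.writerows(rows)
```

That gives you a spreadsheet-friendly file you can open in Excel or feed into further analysis.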
One final note
Sure, if you’re not a computer geek, you may not be as snug as a bug in a rug after reading this blog, but you should be feeling more comfortable with our scraping API now.
Don’t forget that our comprehensive documentation will give you more info on how to get started with Smartproxy’s API for gathering SERP data from Google. Oh yeah, I almost forgot! If you bump into a problem with the setup, contact our customer support team straight away.
About the author
Ella Moore
Ella’s here to help you untangle the anonymous world of residential proxies to make your virtual life make sense. She believes there’s nothing better than taking some time to share knowledge in this crazy fast-paced world.
All information on Smartproxy Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Smartproxy Blog or any third-party websites that may be linked therein.