Python Tutorial: Scraping Telegram with Datacenter Proxies
Social media scraping can look like a tough nut to crack due to strong anti-bot systems. Luckily, that isn't always the case, at least not with Telegram. The platform supports bot automation, which makes the scraping process easier.
There are plenty of ready-built solutions for that, but you can easily make one yourself with a bit of coding and the Telegram API. Yes, this platform even has its own API! Dope, innit?
Telegram and its unlocked potential
Let's start from the beginning. Telegram is quite a popular messaging platform that offers users better security than many alternatives due to encryption. It's perfect for chatting, sending photos or videos, and sharing files in doc, zip, mp3, or many other formats. This platform isn't only about getting in touch with your friends. Its network's megagroups support up to 200K members, and you can engage with all of them! Or you can enjoy themed Telegram channels, where admins share their content with an unlimited number of people.
Wondering how this platform can be beneficial for your business? Telegram is known for hosting niche communities that share similar interests. Because of that, it became the holy grail for marketing gurus. Full of useful contact information, this messaging service opens the opportunity to send automated direct messages to very niche audiences. And we're not talkin' about spamming here: Telegram takes it seriously, so steer clear of such activities if you don't want to get restricted.
We're excited to show you the way to bear fruit from this platform, but first, let's look at why you would need Telegram automation.
What is Telegram automation?
Building contact lists could be a headache if you don't use Telegram bot automation. Usually, those tiny robots are configured to send automated messages; however, it's not the only possible use case. Telegram automation can be used for automated reminders, video downloading, converting files, etc.
Another broad use case is data collection. Solutions like datacenter proxies empower you to scrape Telegram for whatever reason you want. For example, a Telegram bot coupled with datacenter IPs can automatically scrape group members in bulk without forcing you to spend donkey's years in front of a computer.
The magic behind Telegram proxies
Data scraping isn't always smooth: you can run into a bad connection or a restricted location. However, both problems can be solved with Telegram proxies. Routing your traffic through a proxy server instead of your original IP address can also give you stronger digital privacy. The best part is that you can easily add a proxy to the app through Settings > Data and Storage > Proxy settings > Add proxy.
That's not all. The platform also supports proxies for automated Telegram tasks, making life easier for marketers, product developers, and others who need to scrape in bulk. They can use the scraped information to perform rock-solid research or benefit in other ways. While scraped usernames can help with lead generation, extracting chats could provide brilliant insights for creating new products or updating old systems.
This platform has its own API with detailed documentation, so scraping Telegram is a legit use case with a lot of coverage.
Extracting information from Telegram groups and channels
Before scraping any information on Telegram, a good starting point is to understand how channels differ from groups. Telegram channels have a broadcasting function where only admins can share messages, leaving other members as viewers. Groups are different since they serve like chats where everybody's interaction is welcome.
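In Telethon, the library used later in this tutorial, that difference shows up as flags on the chat objects: broadcast channels carry a broadcast flag and large groups carry a megagroup flag. Here's a minimal sketch, assuming the objects expose those two attributes. The classify_chat helper is a hypothetical name, and SimpleNamespace objects stand in for real Telegram entities.

```python
from types import SimpleNamespace


def classify_chat(entity):
    """Return a rough label for a chat-like object based on its flags."""
    if getattr(entity, 'broadcast', False):
        return 'channel'    # only admins post, everyone else reads
    if getattr(entity, 'megagroup', False):
        return 'megagroup'  # large group where every member can write
    return 'other'          # private chats, small groups, bots, ...


# Stand-ins for real entities, for illustration only
news = SimpleNamespace(title='News', broadcast=True, megagroup=False)
club = SimpleNamespace(title='Club', broadcast=False, megagroup=True)
print(classify_chat(news), classify_chat(club))
```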
Scraping Telegram channel subscribers
Channels could sound like a promising source of contact information due to the unlimited number of members. However, scraping Telegram channel members from the subscriber's standpoint isn't possible since only admins can access such information. So, we're not going to dig deeper into this case.
What about scraping messages from Telegram channels?
Alternative data like Telegram posts can provide great insights for your research. The good news is that, unlike the list of subscribers, you can easily scrape messages from Telegram channels using Python and the Telegram API. And by adding Telegram proxies, you can extract content that might be unavailable with your original IP.
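As a rough idea of what that could look like, here's a sketch built around Telethon's iter_messages method. It assumes client is a connected TelegramClient from telethon.sync, like the one created in Step 4 of the tutorial. The fetch_recent_posts helper and the FakeClient stand-in are hypothetical and only there so the example runs without a Telegram account.

```python
from types import SimpleNamespace


def fetch_recent_posts(client, channel, limit=50):
    """Collect the newest text posts of a channel as plain dictionaries."""
    posts = []
    for message in client.iter_messages(channel, limit=limit):
        if not message.text:
            continue  # skip service messages and media without a caption
        posts.append({'id': message.id, 'date': message.date, 'text': message.text})
    return posts


class FakeClient:
    """Hypothetical stand-in that mimics the one method used above."""

    def iter_messages(self, channel, limit=None):
        sample = [
            SimpleNamespace(id=2, date='2023-01-02', text='Second post'),
            SimpleNamespace(id=1, date='2023-01-01', text=''),
        ]
        return iter(sample[:limit])


posts = fetch_recent_posts(FakeClient(), 'some_channel')
print(posts)
```

With a real client, date would be a datetime object rather than the plain string used in the stand-in.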
How to scrape Telegram group members?
Thanks to the data collection gods, extracting group members is more than possible. Telegram is quite chill about people scraping its content, so you don't need specific tools, only the previously mentioned API and a bit of Python. And there's no need to be an experienced coder since the tutorial we prepared for you is beginner-friendly. Yep, you guessed right: we will be scraping Telegram group members.
Unlocking restricted Telegram content with datacenter proxies
In this tutorial, we use datacenter proxies, so we recommend using them as well. Why on earth do I need them, you may ask? No doubt you can extract data without them, but there are several reasons why you might need to change your IP:
- Additional protection for your privacy. Highly anonymous proxies, like the ones Smartproxy offers, don't disclose your home IP or the fact you're using a proxy.
- Geo-restricted content. Telegram is restricted or banned in some countries, for example, China. If you're connecting from a restricted area, datacenter IPs with geo-targeting worldwide are a must.
Additionally, you can always use residential proxies and lift your targeting game to another level. We offer datacenter proxies since Telegram anti-bot systems aren't a thing you need to worry about while scraping this platform. They're faster, cheaper, and in this case, can deliver great results.
Scraping Telegram with Python
We’ll be scraping Telegram group members. For that, you’ll need:
- Your Telegram credentials (we’ll explain how to get them);
- Datacenter proxies;
- Python 3;
- The Telethon library.
Now, let’s get down to business.
Step 1 – Get your Telegram credentials
Before scraping Telegram group members and using the Telegram API, you should get your credentials. For that, go to my.telegram.org, log in to your existing account, and click “API development tools.”
Give your app a title and fill in the other required fields. Make sure you’re not connected to a VPN or proxy while creating your application.
In the next window, you’ll receive api_id and api_hash, which you'll need for scraping with the Telegram API. Save them somewhere because you’ll need them later.
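One way to keep those values out of your script is to read them from environment variables. Here's a minimal sketch of that idea. The variable names TG_API_ID, TG_API_HASH, and TG_PHONE are our own assumptions, not something Telegram or Telethon prescribes, and load_credentials is a hypothetical helper.

```python
import os


def load_credentials(env=os.environ):
    """Read Telegram API credentials from a mapping of environment variables."""
    keys = ('TG_API_ID', 'TG_API_HASH', 'TG_PHONE')  # assumed variable names
    missing = [key for key in keys if not env.get(key)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    # api_id is numeric, the other two stay strings
    return int(env['TG_API_ID']), env['TG_API_HASH'], env['TG_PHONE']


# Example with an explicit mapping instead of the real environment
api_id, api_hash, phone = load_credentials(
    {'TG_API_ID': '123456', 'TG_API_HASH': 'YOUR_API_HASH', 'TG_PHONE': '+111111111111'}
)
print(api_id, phone)
```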
Step 2 – Set up proxies
Datacenter proxies are more than enough for scraping Telegram, so we’ll stick with them in this tutorial. Here’s how to authenticate datacenter IPs:
- Log in to the Smartproxy dashboard.
- Press Datacenter on the left menu, choose Pricing, and pick the plan that best suits your needs.
- Then go to the Authentication method. For datacenter proxies, only the user:pass method is available.
Don’t forget to change the address, port, username, and password values.
proxy = {
    'proxy_type': 'http',
    'addr': '1.1.1.1',
    'port': 5555,
    'username': 'your_username',
    'password': 'your_pass',
}
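If your proxy details come as a single user:pass URL, you could build the same dictionary with a small helper. This is just a sketch: proxy_from_url is a hypothetical function, it assumes the URL has the form scheme://user:pass@host:port, and it doesn't decode percent-encoded characters in the username or password.

```python
from urllib.parse import urlsplit


def proxy_from_url(url):
    """Turn 'http://user:pass@host:port' into a proxy dictionary."""
    parts = urlsplit(url)
    if not parts.hostname or not parts.port:
        raise ValueError('Proxy URL needs a host and a port')
    return {
        'proxy_type': parts.scheme or 'http',
        'addr': parts.hostname,
        'port': parts.port,
        'username': parts.username,
        'password': parts.password,
    }


proxy = proxy_from_url('http://your_username:your_pass@1.1.1.1:5555')
print(proxy['addr'], proxy['port'])
```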
Step 3 – Install the Telethon library
Telethon is a Python 3 MTProto library that works with the Telegram API. Install it with pip:
pip install telethon
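To double-check that everything is in place before moving on, you could run a quick availability test. The sketch below looks for telethon and for python_socks, the module the Step 4 snippet references for its proxy type. The missing_packages helper is a hypothetical name.

```python
import importlib.util


def missing_packages(names=('telethon', 'python_socks')):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print('Please install:', ', '.join(missing))
else:
    print('All set.')
```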
Step 4 – Create a database file and log in
Now, we need to import the sync module from our chosen library and add the credentials from Step 1. Don’t forget to change the api_id, api_hash, and phone values as well as the proxy information.
import python_socks
from telethon.sync import TelegramClient

api_id = 123456
api_hash = 'YOUR_API_HASH'
phone = '+111111111111'

# The first argument is the session name; here we reuse the phone number
client = TelegramClient(
    phone,
    api_id,
    api_hash,
    proxy={
        'proxy_type': python_socks.ProxyType.HTTP,
        'addr': '1.1.1.1',
        'port': 5555,
        'username': 'your_username',
        'password': 'your_pass',
    },
)
We recommend you log in to your Telegram account again and check if you’ve been authorized properly. In case you’re not, issue a request for an OTP code and enter the received code:
client.connect()
if not client.is_user_authorized():
    # Ask Telegram to send a login code, then sign in with it
    client.send_code_request(phone)
    client.sign_in(phone, input('Enter the code: '))
After successfully logging in, you’ll see a .session file created. This is your permanent database file, where Telethon keeps your login session.
Step 5 – Create a list for results
Now it’s time to create an empty chat list and fill it with the information you’ll receive from GetDialogsRequest. For that, you also need to import the InputPeerEmpty type. Here’s what this part of the code looks like:
from telethon.tl.functions.messages import GetDialogsRequest
from telethon.tl.types import InputPeerEmpty

chats = []
last_date = None
chunk_size = 200
groups = []

# Request up to chunk_size of the most recent dialogs
result = client(GetDialogsRequest(
    offset_date=last_date,
    offset_id=0,
    offset_peer=InputPeerEmpty(),
    limit=chunk_size,
    hash=0,
))
chats.extend(result.chats)
Note that conversations are filtered using offset_date and offset_peer. We provide empty values to these parameters for the API to return all the chats. We also set offset_id and limit for pagination. In this case, you will end up with the last 200 chats of the user.
We assume you’re only interested in megagroups, so to check if this attribute is true, add this part of the code:
for chat in chats:
    try:
        if chat.megagroup:
            groups.append(chat)
    except AttributeError:
        # Some chat types have no megagroup attribute at all
        continue
Of course, not all chats have a megagroup attribute. That's why the except part ensures that everything will work regardless.
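Telethon also offers a higher-level get_dialogs method, so the same filtering could be sketched without building the raw request yourself. The example assumes client is the connected client from Step 4. The collect_megagroups helper and the FakeClient stand-in are hypothetical and exist only to make the snippet runnable on its own.

```python
from types import SimpleNamespace


def collect_megagroups(client):
    """Return the entities of all dialogs that are flagged as megagroups."""
    groups = []
    for dialog in client.get_dialogs():
        entity = dialog.entity
        # getattr with a default replaces the try/except around chat.megagroup
        if getattr(entity, 'megagroup', False):
            groups.append(entity)
    return groups


class FakeClient:
    """Hypothetical stand-in returning three dialog-like objects."""

    def get_dialogs(self):
        return [
            SimpleNamespace(entity=SimpleNamespace(title='Python devs', megagroup=True)),
            SimpleNamespace(entity=SimpleNamespace(title='News feed', megagroup=False)),
            SimpleNamespace(entity=SimpleNamespace(title='A friend')),
        ]


groups = collect_megagroups(FakeClient())
print([g.title for g in groups])
```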
Step 6 – Select a group to scrape members
Now that you have all the groups listed, it’s time to pick the group whose members’ details you wanna scrape. The code iterates through each group you stored in the previous step and prints its name prefixed with a number. This number is the group's index in the list.
print('Choose a group to scrape members from:')
i = 0
for g in groups:
    print(str(i) + '- ' + g.title)
    i += 1
Enter the number associated with a particular group as an index:
g_index = input("Enter a Number: ")
target_group = groups[int(g_index)]
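The lines above crash if you type something that isn't a number or an index that's out of range, and a negative number would silently pick a group from the end of the list. A small validation helper could guard against that. This is only a sketch, and parse_choice is a hypothetical name.

```python
def parse_choice(raw, count):
    """Return a valid list index from user input, or None if it is unusable."""
    try:
        index = int(raw.strip())
    except ValueError:
        return None  # not a whole number
    return index if 0 <= index < count else None


print(parse_choice('1', 3))
print(parse_choice('7', 3))
print(parse_choice('abc', 3))
```

In the tutorial's flow, you could keep asking for input until parse_choice(g_index, len(groups)) returns something other than None.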
Step 7 – Export all the members’ details
To export all the members of your chosen Telegram group, create an empty list and fill it using the get_participants function.
print('Fetching Members...')
all_participants = []
all_participants = client.get_participants(target_group, aggressive=True)
Note that aggressive is set to True, which makes it possible to extract more than 10K group members. With this parameter enabled, Telethon usually exports more than 90% of all the contacts in the list.
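If you're curious how close you got to the full list, you could compare the number of fetched members with the total the result reports. The sketch below assumes the returned list exposes a total attribute, as Telethon's list results generally do. The coverage helper and the FakeTotalList class are hypothetical stand-ins.

```python
def coverage(participants):
    """Share of the reported member count that was actually fetched."""
    total = getattr(participants, 'total', 0) or len(participants)
    return len(participants) / total if total else 0.0


class FakeTotalList(list):
    """Hypothetical stand-in for a result list that carries a total."""
    total = 0


sample = FakeTotalList(['a', 'b', 'c'])
sample.total = 4  # pretend the group reports four members
print(f'{coverage(sample):.0%}')
```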
Step 8 – Store the exported data in a .csv file
Finally, it’s time to put everything in a more readable format for further analysis. We'll use UTF-8 encoding because members’ names often contain non-ASCII characters (which is very common for Telegram users, no cap).
So first, open the .csv file in write mode and write the header row. Then, write each item in the all_participants list to the CSV file using a loop.
import csv

print('Saving in file...')
with open("members.csv", "w", encoding='UTF-8') as f:
    writer = csv.writer(f, delimiter=",", lineterminator="\n")
    writer.writerow(['username', 'user id', 'access hash', 'name', 'group', 'group id'])
    for user in all_participants:
        # Missing fields come back as None, so fall back to empty strings
        username = user.username or ""
        first_name = user.first_name or ""
        last_name = user.last_name or ""
        name = (first_name + ' ' + last_name).strip()
        writer.writerow([username, user.id, user.access_hash, name,
                         target_group.title, target_group.id])
print('Members scraped successfully.')
Just bear in mind that not all users have usernames, names, or surnames. For those who don’t, the script writes an empty value instead of “None.”
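To see how missing fields end up in the file without touching Telegram at all, you could run the same row-building logic against a stand-in user and an in-memory buffer. This is a sketch: user_to_row and write_members are hypothetical helpers, and the SimpleNamespace object only imitates the attributes the tutorial's loop reads.

```python
import csv
import io
from types import SimpleNamespace


def user_to_row(user, group_title, group_id):
    """Build one CSV row, replacing missing fields with empty strings."""
    username = user.username or ''
    name = ' '.join(part for part in (user.first_name, user.last_name) if part)
    return [username, user.id, user.access_hash, name, group_title, group_id]


def write_members(users, group_title, group_id, handle):
    """Write a header plus one row per user to any file-like object."""
    writer = csv.writer(handle, delimiter=',', lineterminator='\n')
    writer.writerow(['username', 'user id', 'access hash', 'name', 'group', 'group id'])
    for user in users:
        writer.writerow(user_to_row(user, group_title, group_id))


# A user without a username or last name, for illustration only
users = [SimpleNamespace(username=None, id=1, access_hash=11,
                         first_name='Ada', last_name=None)]
buffer = io.StringIO()
write_members(users, 'Demo group', 99, buffer)
print(buffer.getvalue())
```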
The whole exporting process can take up to a few minutes for some groups, but after the work is done, you should receive a note “Members scraped successfully.” Voila, your job is done here!
On a final note
Since Telegram is a proxy-friendly messaging platform, you can use this edge for your scraping needs. If you need to jumpstart your scraping process with Telegram proxies but have some hems and haws about starting, contact our proxy heroes. They would be happy to chat with you about how to rock Telegram with Smartproxy datacenter IPs.
About the author
Ella Moore
Ella’s here to help you untangle the anonymous world of residential proxies to make your virtual life make sense. She believes there’s nothing better than taking some time to share knowledge in this crazy fast-paced world.
All information on Smartproxy Blog is provided on an as is basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Smartproxy Blog or any third-party websites that may be linked therein.