Collecting huge amounts of data with WhatsApp


Creating a database of the phone numbers, profile pictures and status information of almost all WhatsApp users turns out to be very easy. The users don’t even have to be in your contacts. This should raise at least some privacy concerns, and hopefully a lot more. Let me explain how it works.

Are you tech savvy? You can download a Chrome extension here to try my script for yourself.

A few years ago WhatsApp made it possible to use WhatsApp in your web browser. That is good for the user experience, because composing a message on a keyboard is a lot easier than using those tiny touch screen buttons. It also makes copying and pasting and adding attachments easier. So much for the good news. The bad news is that it’s technically possible to use the WhatsApp Web interface to create a huge database of nearly all WhatsApp users. Only a small group of users is not affected: those who have changed their privacy settings. Unfortunately, most users don’t change those settings, and WhatsApp doesn’t do much to encourage it. Together, these facts make it possible to collect huge amounts of interesting data, as I’m going to show you now.

Explanation for normal users

Web WhatsApp connects to the WhatsApp servers through your phone. In a nutshell, the browser asks the server to send back all the information for a certain phone number. The information that is sent back includes the following:

  • The profile picture
  • The status text or about text; the default text is the famous ‘Hey there! I am using WhatsApp’
  • The online/offline status of the user

It turns out that the above information can be requested for every phone number. As said, the phone number doesn’t have to be in your contact list. And because there is no such restriction, it’s possible to create a complete database of phone numbers, profile pictures, about texts and online/offline statuses. The database can be set up in such a way that a complete timeline can be reconstructed for each phone number. That answers questions like: when was the user with phone number xxx-xxxxxx online and offline?

A fictional timeline that can be reconstructed for almost all users from the database described above
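
To make the timeline idea concrete, here is a minimal sketch in JavaScript of how stored samples could be collapsed into a timeline per phone number. The record layout (`phone`, `at`, `online`), the function name `buildTimeline` and the sample rows are all made up for illustration; none of them is taken from WhatsApp itself.

```javascript
// Fictional samples: one row per lookup of a phone number.
// The field names (phone, at, online) are assumptions for this sketch.
const samples = [
  { phone: "xxx-000001", at: "2017-05-15T08:00:00Z", online: false },
  { phone: "xxx-000001", at: "2017-05-15T08:05:00Z", online: true },
  { phone: "xxx-000001", at: "2017-05-15T08:10:00Z", online: true },
  { phone: "xxx-000001", at: "2017-05-15T08:15:00Z", online: false },
  { phone: "xxx-000002", at: "2017-05-15T08:00:00Z", online: true },
];

// Collapse the samples into one list of status changes per phone number.
function buildTimeline(rows) {
  // ISO 8601 timestamps in the same format sort correctly as plain strings.
  const sorted = [...rows].sort((a, b) => (a.at < b.at ? -1 : a.at > b.at ? 1 : 0));
  const timelines = new Map();
  for (const row of sorted) {
    const changes = timelines.get(row.phone) || [];
    const last = changes[changes.length - 1];
    // Keep a row only when the status differs from the previous one.
    if (!last || last.online !== row.online) {
      changes.push({ at: row.at, online: row.online });
    }
    timelines.set(row.phone, changes);
  }
  return timelines;
}

const timelines = buildTimeline(samples);
console.log(timelines.get("xxx-000001"));
```

For the first fictional number this leaves three status changes: offline at 08:00, online at 08:05 and offline again at 08:15. The repeated online sample at 08:10 is dropped because nothing changed.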

Almost all web sites that are sent to your browser contain software that determines how the web site functions in your browser. Such software is written in JavaScript. It determines what happens when you click a button or move the mouse, but it may also connect to a server to request certain kinds of information. The software in Web WhatsApp does that too: it sends a phone number to the WhatsApp server and within a few milliseconds it receives the information about that phone number. One of the nice things about that software is that everybody can take a look at the source code. That’s not all: you can also call parts of it on their own. I’ve used that possibility to develop a script that requests information for a huge range of phone numbers. That information contains the profile pictures, about texts and online/offline statuses. Anyone can create such a script.

My script in action where I requested the information for 400 random phone numbers

Privacy concerns

So, what can anyone do with all this information? First of all, again, imagine that anyone can create a database with the above information that contains all phone numbers for a certain country, together with the profile pictures, about texts and online/offline statuses. This is within reach for a country like the Netherlands. The database can be queried in such a way that it tells me when a phone number was online and which profile picture belongs to that phone number. After a few months it can tell me how often you have changed your profile picture, and to which pictures. And how about facial recognition? Those techniques have improved a lot over the last few years. Imagine this: I go for a walk and take a picture of some stranger. I feed that picture to the database and within a few minutes it tells me which phone number belongs to it. Now that is quite scary, isn’t it?

Response from WhatsApp

I’m a fan of responsible disclosure. So when I found out about this possibility of collecting huge amounts of data through WhatsApp, I contacted them. Or rather, I contacted Facebook, because they own WhatsApp. In short, they are aware that data can be collected on this scale, but they don’t see it as a problem or as a privacy concern. Take a moment to think about it before you agree…

Response from Facebook after I disclosed the data collection possibility to them

Technical explanation

The following is a technical explanation that may be a bit difficult to follow for non-technical users.

Web WhatsApp makes use of an undocumented API. That’s an API that you can use, but you’ll have to find out for yourself how it works. The JavaScript API communicates with the WhatsApp servers over a WebSocket.

Part of the JavaScript API of Web WhatsApp

There are three API calls that I use in my script. The first one is Store.ProfilePicThumb.find(<phone number>) and it’s used to collect profile pictures. You can use it as follows:

Example of requesting a profile picture for a phone number. Keep in mind that you can only request the URLs in the same tab in which Web WhatsApp is running. You’ll have to add an <img> element to the DOM.

The second API call is Store.Wap.statusFind(<phone number>) and it’s used to request the about text of a phone number. An example:

Requesting the about text of a phone number

The last API call is Store.Presence.find(<phone number>) and it’s used to request the online/offline status. Use it as follows:

Requesting the online/offline status of a phone number

By putting all these API calls in a loop, you can request this information for every phone number you can think of.
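
As a rough sketch, such a loop could look like the code below. Only the three method names come from the calls above. The `fakeStore` stand-in, the assumption that each call returns a Promise, the returned values and the `collect` helper are all hypothetical, so this runs outside Web WhatsApp and says nothing definitive about what the real calls return.

```javascript
// Stand-in for the Store object of Web WhatsApp, so the sketch runs anywhere.
// Assumption: each call returns a Promise. The resolved values are invented.
const fakeStore = {
  ProfilePicThumb: { find: async (phone) => `picture-of-${phone}` },
  Wap: { statusFind: async (phone) => "Hey there! I am using WhatsApp" },
  Presence: { find: async (phone) => "offline" },
};

// Hypothetical helper: look up each phone number and collect one row per number.
async function collect(store, phones) {
  const rows = [];
  for (const phone of phones) {
    const picture = await store.ProfilePicThumb.find(phone);
    const about = await store.Wap.statusFind(phone);
    const presence = await store.Presence.find(phone);
    rows.push({ phone, picture, about, presence });
  }
  return rows;
}

// Placeholder numbers only; print the collected rows as a table.
collect(fakeStore, ["xxx-000001", "xxx-000002"]).then((rows) => console.table(rows));
```

Passing the store in as a parameter keeps the loop independent of the page it runs in, which is why the stand-in can take its place here.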

At the beginning of this article you saw a UI that I’ve created. It uses the above API calls. You can find that script here. Drop the script into the developer console of your Web WhatsApp instance and the UI pops up. Please, use it wisely!

Update 15-05-2017 04:20 PM

Several people have remarked on this finding. I’d like to respond to those remarks as follows:

Remark 1:  ‘I don’t see how this is such news, i can simply add any number and have the same information’
Yes, you can do that and you would have the same information. But the difference is that I use an automated way to collect that information. Are you able to select 100 numbers, add them to your contacts and have that information in a nice table using your phone? It’s the scale of the collection and it’s not about revealing secret information. It’s not a security related issue.

Remark 2: ‘This information isn’t secret or private. How is this news?’ 
Haha, yes I get that. But think about this: I can create a huge database containing profile pictures connected to a phone number. That way I can use facial recognition to find out what someone’s phone number is just by taking a picture of them. Again, as with remark 1: it’s all about the potential scale of doing this that makes it an issue.

Remark 3: ‘I already discovered this years ago, how is this news?’
I don’t know. It probably didn’t get the attention it deserved at the time. Now it is getting that attention, so be happy with it: a lot of people have changed their privacy settings in WhatsApp because of this blog/script. 🙂

Update 16-05-2017 02:00 PM

Andreas Buchenscheit has pointed out to me the dangers of knowing the presence timeline of WhatsApp users. Take a look at his paper here.

Update 16-05-2017 09:45 PM

I’ve created a very simple Chrome extension that brings up the UI.
