OSINT blog: Exposed PII

“Information doesn’t need to be secret to hold value. Whether from the blogs we visit, the broadcasts we watch, or the specialized journals we read, there is an endless stream of information that shapes our understanding of the world. The Intelligence Community commonly refers to this type of information as Open Source Intelligence (OSINT).”

  • Central Intelligence Agency, United States of America

The story in this blog is based on a real OSINT investigation I undertook. The mission involved two companies, one software developer, and over 100 employees whose personally identifiable information (PII) had been exposed online.

The investigation was carried out on behalf of a large company that has been a victim of Emotet attack campaigns and has been targeted by state-sponsored threat actors, known as advanced persistent threats (APTs).

For the purpose of this blog, let us call this large company “Company A”.

Part of my work involves monitoring for threats facing Company A and creating alerts so they can be mitigated. The threat in this story was that over a hundred of its employees had their PII exposed in a project on a development platform such as Bitbucket, GitHub, SourceForge, or Firebase.

The PII I found included a long list of individuals’ full names, job roles, departments, locations, email addresses, mobile numbers, and work telephone numbers. I had to find out what kind of data this was. Was it a contact list from a mailbox? Was it from a marketing campaign? Was it from an exposed database? Was the data meant to be there at all? Was it there by accident, or maliciously? Most of these questions needed answering to provide intelligence on this threat.

After inspecting the project, I was able to learn more about the code hosted on the development platform. I paid close attention to the project’s name, how long it had been online, and who was likely responsible for it.
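The “how long has this been online” question can usually be answered from repository metadata. A minimal sketch, assuming a GitHub-style repository record with ISO 8601 `created_at` and `pushed_at` timestamps (the platform in this story is deliberately unnamed, so these field names are an assumption, and the record below is invented):

```python
from datetime import datetime, timezone
from typing import Optional


def parse_github_time(value: str) -> datetime:
    """Parse a UTC timestamp such as '2019-03-01T09:00:00Z' into an aware datetime."""
    # Older Pythons' fromisoformat() rejects a trailing 'Z', so swap it for an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def exposure_window(repo: dict, now: Optional[datetime] = None) -> dict:
    """Whole days since the repository was created and since it was last pushed to."""
    now = now or datetime.now(timezone.utc)
    created = parse_github_time(repo["created_at"])
    pushed = parse_github_time(repo["pushed_at"])
    return {
        "days_public": (now - created).days,
        "days_since_last_push": (now - pushed).days,
    }


# Hypothetical record shaped like a GitHub API repository object.
repo = {
    "name": "CompanyB-version-1",
    "created_at": "2019-03-01T09:00:00Z",
    "pushed_at": "2019-03-20T17:30:00Z",
}
print(exposure_window(repo, now=datetime(2019, 4, 1, tzinfo=timezone.utc)))
# {'days_public': 30, 'days_since_last_push': 11}
```

Age alone says nothing about how many people viewed or copied the code; it only bounds how long the data was available.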

The project’s owner was the first thing I investigated. Luckily for me, they had used their real name. By typing “Developer Name” (in quotes) into Google, I found their Twitter and LinkedIn accounts, as well as their other social networks, within seconds. I saved this for later. I could also see how long the project had been active, but had no way to see how many people had viewed the code, which would have been useful intelligence. As developers often do, they had named the project after another company.

For the purpose of this blog, let us call this second company “Company B”.

Conveniently, the project was named something like “CompanyB-version-1”. With this, I could pivot off the name, find out more, and begin to understand how the information ended up exposed via this project.

By typing exactly “Company B” (in quotes) into Google, I found that Company B was a legitimate firm with a presence in the same country as Company A, another stroke of luck that made this investigation easier.

Next, I typed exactly “Company B”+“Company A” into Google, which showed that both companies had attended the same conference in 2018. The evidence was a PDF of the conference’s floor plan that Google had indexed. Because it was a large conference with hundreds of other firms present, I used CTRL + F to search the floor plan for Company A and Company B and confirm that both were there. Company A had co-hosted the event, which led me to believe that Company B had been invited and already had some kind of relationship with Company A. These were not two unrelated firms; they even turned out to be in the same industry.
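The CTRL + F check on the floor plan can also be scripted once the PDF has been converted to plain text. In this sketch the conversion step is out of scope and the floor-plan text is invented for illustration; it reports the line numbers where each company name appears (case-insensitively) and whether every name was found:

```python
import re


def find_mentions(text: str, names: list[str]) -> dict[str, list[int]]:
    """Map each name to the 1-based line numbers where it appears, ignoring case."""
    hits: dict[str, list[int]] = {name: [] for name in names}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name in names:
            # re.escape treats the name as literal text, like typing it into CTRL + F.
            if re.search(re.escape(name), line, flags=re.IGNORECASE):
                hits[name].append(lineno)
    return hits


def co_present(hits: dict[str, list[int]]) -> bool:
    """True only if every name was found at least once."""
    return all(hits.values())


# Hypothetical text extracted from a conference floor plan.
floor_plan = """Hall 1
Stand A01  Company A (co-host)
Stand B14  Company B
Stand C02  Some Other Firm"""

hits = find_mentions(floor_plan, ["Company A", "Company B"])
print(hits)              # {'Company A': [2], 'Company B': [3]}
print(co_present(hits))  # True
```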

Having established that A and B had a relationship, I continued through the results for “Company B”+“Company A”. This led me to a Sales Manager at Company B on LinkedIn who had previously worked at, or closely with, Company A. Finding this individual helped establish a clear link between A and B, and this link may also have been the cause of the data exposure. Perhaps they had kept their old colleagues’ or connections’ contact details for later marketing use? That raises the question of whether they had consent to do so, and of how this information ended up in the hands of a third-party developer who then exposed it online for anyone to see and misuse.

I then circled back to the developer who owned the project on the development platform, as I had been tasked with finding out what they intended to do with this data and with getting it taken offline.

Looking at their developer account on the platform, I could see their other projects and what else they were working on, including some kind of travel app they had mentioned in their tweets a year earlier. Their LinkedIn, Twitter, and other developer-focused social networks showed who they worked for. This led me to think that the developer was a third party, possibly doing freelance or contract work for Company B. They also described themselves as a PHP specialist and a front-end and back-end developer.
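If the platform is GitHub (an assumption; the story does not name it), a developer’s other public projects can be listed through the REST API endpoint `GET /users/{username}/repos`. A minimal sketch using only the standard library; the username in the usage comment is hypothetical and the summary format is my own choice:

```python
import json
import urllib.parse
import urllib.request


def fetch_public_repos(username: str) -> list[dict]:
    """Fetch up to 100 public repositories for a user via GitHub's REST API (unauthenticated)."""
    url = (
        "https://api.github.com/users/"
        + urllib.parse.quote(username)
        + "/repos?per_page=100&sort=pushed"
    )
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


def summarise_repos(repos: list[dict]) -> list[str]:
    """One line per repository, oldest first: name, creation date, description."""
    lines = []
    for repo in sorted(repos, key=lambda r: r["created_at"]):
        description = repo.get("description") or "(no description)"
        lines.append(f'{repo["name"]}  created {repo["created_at"][:10]}  {description}')
    return lines


# Usage (hits the network; the username is hypothetical):
# for line in summarise_repos(fetch_public_repos("example-developer")):
#     print(line)
```

Unauthenticated API requests are heavily rate-limited. Keeping the parsing separate from the network call means the summary can be rerun on saved JSON without querying the platform again.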

Digging deeper into the developer’s online presence, I could see where they were from, as they had made it public on several accounts. Several of their Twitter followers listed the same country in their profiles, and this was also where Company A and Company B were based.

When analysing social media profiles for intelligence, it is always important to see what information you can scrape from these accounts, such as names, jobs, email addresses, projects, contact details, associated companies, followers, what is set to private or public, and mentions in other posts or tweets. It is also worth noting what information the person tries to hide, or has kept to one account but not the others, as this can reveal what they consider important.
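Keeping scraped profile data in a structured form makes the cross-account comparison mechanical. A minimal sketch; the platform names, field names, and values below are invented for illustration:

```python
from collections import defaultdict
from typing import Dict, List, Optional

Profile = Dict[str, Optional[str]]


def where_published(profiles: Dict[str, Profile]) -> Dict[str, List[str]]:
    """Map each field to the sorted list of platforms that show it publicly."""
    seen = defaultdict(set)
    for platform, fields in profiles.items():
        for field, value in fields.items():
            if value:  # None or empty means hidden or not filled in on this platform
                seen[field].add(platform)
    return {field: sorted(platforms) for field, platforms in seen.items()}


def single_account_fields(profiles: Dict[str, Profile]) -> List[str]:
    """Fields published on exactly one account: often the ones worth a closer look."""
    return sorted(f for f, platforms in where_published(profiles).items() if len(platforms) == 1)


profiles = {
    "twitter": {"name": "Dev Name", "location": "Country X", "email": None},
    "linkedin": {"name": "Dev Name", "location": "Country X", "employer": "Example Ltd"},
    "dev_platform": {"name": "Dev Name", "email": "dev@example.com"},
}
print(single_account_fields(profiles))  # ['email', 'employer']
```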

At this point I had collected the PII, identified the developer and their accounts, found evidence that Company A and Company B were connected through the conference and through individuals, and formed some idea of how the exposure had happened.


I collated what I found and put all this information neatly into a report and sent it to Company A’s head of security.

I was then tasked with getting the PII of over a hundred employees, including full names, email addresses, and phone numbers, taken offline. On Company A’s side, the team began working out which employees had been targeted in phishing attacks since the PII had become available online, although these employees may well have been targeted before.

To get the project offline, I first contacted the development platform directly through its complaint form, but all I got back was an automated response saying they were ‘looking into it’. This information needed to come down as soon as possible, so I was going to have to contact the developer and ask them to remove the code.

The best approach in this situation was to contact them professionally and ask, as politely as possible, that they remove the information I had found, since it could have damaging ramifications (as it turned out, it already had).

I could have reached out to the developer via Twitter, but their direct messages (DMs) were set to private. Mentioning them in a public tweet was possible, and can be an effective way to get something removed quickly, but it would more than likely expose the PII to more people. Instead, I chose to contact them via LinkedIn, using ‘sock puppet’ accounts to send a connection request with a note attached. In the note, I simply and politely explained that I had found their project online, that it contained potentially exposing data, and that it would be a good idea to remove it to prevent others from misusing it.

Two or three hours later, the developer responded and promised to remove the code from the development platform immediately. Mission accomplished: the code was offline and no longer publicly accessible to anyone with the project’s URL.
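Once a takedown is promised, it is worth confirming it from the outside. A minimal sketch that records the HTTP status of the project URL; the `opener` parameter is my own addition so the check can be exercised without a network connection, and the URL in the usage comment is hypothetical:

```python
import urllib.error
import urllib.request


def http_status(url: str, opener=urllib.request.urlopen) -> int:
    """Return the HTTP status of a GET request, including 4xx/5xx codes."""
    try:
        with opener(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is what we want to record.
        return err.code


def looks_taken_down(status: int) -> bool:
    """Treat 404 Not Found and 410 Gone as 'no longer publicly served'."""
    return status in (404, 410)


# Usage (hits the network; the URL is hypothetical):
# print(looks_taken_down(http_status("https://github.com/example-developer/CompanyB-version-1")))
```

A 404 only shows that the page is no longer publicly served. Some platforms return the same status for repositories that were made private rather than deleted, and it says nothing about copies already taken.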

Company A’s security operations center (SOC) told me they had found that over 50 of the email addresses had been targeted in phishing attacks in the last month, roughly the same length of time the project had been online. It was therefore safe to assume that attackers had already discovered this data and were using it in attacks. But with the source identified and removed, the threat actors who already have the PII are potentially the only ones with it, and other provisions can be made for the affected mailboxes (e.g. notifying those affected, training, updating the email gateway).
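The SOC’s check amounts to a set intersection between the exposed addresses and the recipients of known phishing emails within a time window. A minimal sketch, assuming the mail gateway’s logs can be exported as (recipient, timestamp) pairs; all addresses and dates below are invented:

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def normalise(address: str) -> str:
    """Compare addresses case-insensitively and ignore stray whitespace."""
    return address.strip().lower()


def exposed_and_targeted(
    exposed: Iterable[str],
    phishing_events: Iterable[Tuple[str, datetime]],
    since: datetime,
) -> List[str]:
    """Exposed addresses that received a known phishing email on or after `since`."""
    exposed_set = {normalise(a) for a in exposed}
    targeted = {normalise(rcpt) for rcpt, sent in phishing_events if sent >= since}
    return sorted(exposed_set & targeted)


exposed = ["Alice@companya.example", "bob@companya.example", "carol@companya.example"]
events = [
    ("alice@companya.example", datetime(2019, 3, 10)),
    ("dave@companya.example", datetime(2019, 3, 12)),  # targeted, but not in the leak
    ("carol@companya.example", datetime(2019, 1, 5)),  # in the leak, but before the window
]
print(exposed_and_targeted(exposed, events, since=datetime(2019, 3, 1)))
# ['alice@companya.example']
```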

Bloopers:

An OSINT investigation can have hairy moments, which are to be expected. The only blooper in this one came when I had to contact the developer and get the project offline. There was no easy, direct way to contact them, so the simplest option was LinkedIn’s ‘attach a note to a connection request’ feature (see Figure 1). LinkedIn Premium does let you message people directly, but it is quite costly and was not really an option here.

(Figure 1)

The mistake I made was leaving the note as a draft in the browser, then noticing the developer’s company website on their profile and clicking through. At the same time, the connection request was fired off without the all-important note. Oops. I started to sweat: now the developer just had a random account requesting to connect. This was not the end of the world, but to me it felt like giving my position away, like a scout sniper rustling the bushes next to their target. I had to fall back, line up another request from another sock puppet account, and fire again. By taking care and being polite, the note reached its target and the mission objective was accomplished after all. Phew.

Conclusion:

The thing is, online development platforms have become hugely popular with software developers over roughly the last decade. They have taken the industry by storm and are used in some way by nearly every major organisation. Each platform hosts hundreds of millions of lines of code, ranging from students’ projects to Fortune 500 firms’ backend infrastructure. That makes them a potential gold mine for cybercriminals, who can scour code submissions for credentials to critical online services, SSL keys, API access tokens, database connection strings, PII, and more, any of which could be disastrous for an organisation. The problem usually stems from a lack of awareness on the developer’s part and relaxed code auditing. Responsibility also lies with security teams, who must make sure business operations are carried out with confidentiality, integrity, and secure accessibility.
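A rough way to check your own repositories for the kinds of data listed above is a pattern scan before code is pushed. This is a deliberately simplified sketch: the four regular expressions are illustrative assumptions rather than a complete rule set, and dedicated secret scanners such as gitleaks or TruffleHog go much further. The sample key is AWS’s documented placeholder, not a real credential.

```python
import re
from pathlib import Path

# Illustrative patterns only; expect both misses and false positives.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "db_connection_string": re.compile(r"\b(?:mysql|postgres(?:ql)?|mongodb(?:\+srv)?)://\S+"),
}


def scan_text(text: str) -> list[tuple[str, int, str]]:
    """Return (pattern label, 1-based line number, matched text) for every hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((label, lineno, match.group(0)))
    return findings


def scan_tree(root: str) -> dict[str, list[tuple[str, int, str]]]:
    """Scan every UTF-8 text file under `root`, skipping binaries and unreadable files."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue
            if hits := scan_text(text):
                results[str(path)] = hits
    return results


sample = 'contact = "jane.doe@companya.example"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
for finding in scan_text(sample):
    print(finding)
# ('email', 1, 'jane.doe@companya.example')
# ('aws_access_key_id', 2, 'AKIAIOSFODNN7EXAMPLE')
```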

Notes:
– I would like to thank @Ph055a, who created this great collection of OSINT-specific tools and is the author of osint.team, an awesome place to hang out and learn about OSINT.

– The method of Googling described in this investigation is known as ‘Google Dorks’ or ‘Google Hacking’. It is a fun way of finding glaring security issues that are freely available because the world’s largest search engine has indexed them.
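The queries used in this investigation can be generated programmatically. A minimal sketch: double quotes force an exact-phrase match, and `site:` and `filetype:` are standard Google search operators; the helper names are my own, and the phrases are the placeholders used throughout this blog.

```python
from typing import Optional
from urllib.parse import urlencode


def exact(phrase: str) -> str:
    """Wrap a phrase in double quotes for an exact-phrase match (stray quotes removed)."""
    return '"' + phrase.replace('"', "") + '"'


def build_dork(*phrases: str, site: Optional[str] = None, filetype: Optional[str] = None) -> str:
    """Combine exact phrases with optional site: and filetype: operators."""
    parts = [exact(p) for p in phrases]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query: str) -> str:
    """Encode a query as a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_dork("Company B", "Company A", filetype="pdf")
print(query)              # "Company B" "Company A" filetype:pdf
print(search_url(query))  # https://www.google.com/search?q=%22Company+B%22+%22Company+A%22+filetype%3Apdf
```

For example, `build_dork("Developer Name", site="linkedin.com")` narrows a name search to LinkedIn profiles.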