Finding celebrity locations via Twitter — Celebrity Exif data mining

:: UPDATE ::

Nate Beck gave a presentation on this topic at Ignite Seattle 13; here’s the video.


Last week one of my favorite video podcasts, Hak5, had a segment on Exif data mining (Episode 721). In the episode, Rob Fuller (a.k.a. Mubix) shared how his own images turned out to contain unwanted GPS information embedded in the Exif headers.

While mining Exif information from images is nothing new (Version 2.1 of the specification is dated June 12, 1998)… most consumers don’t realize the kind of information attached to their images by default.

In February of this year, Johannes Ullrich over at ISC published a paper titled Twitpic, EXIF and GPS: I Know Where You Did it Last Summer. Johannes explains:

“Modern cell phones frequently include a camera and a GPS. Even if a GPS is not included, cell phone towers can be used to establish the location of the phone. Image formats include special headers that can be used to store this information, so called EXIF tags.”
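To make the quote concrete: Exif stores a GPS position as three rational numbers (degrees, minutes, seconds) plus a reference letter (N/S for latitude, E/W for longitude). The Python sketch below converts that representation to the signed decimal degrees most mapping tools expect. The function name and the sample coordinates are made up for illustration; they do not come from Seeker or from any photo in this post.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert an Exif (degrees, minutes, seconds) triple to decimal degrees.

    `dms` is three (numerator, denominator) pairs, the way Exif RATIONAL
    values are stored; `ref` is the matching GPSLatitudeRef or
    GPSLongitudeRef letter ("N", "S", "E" or "W").
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Made-up sample values, not taken from any real image.
lat = dms_to_decimal([(47, 1), (36, 1), (2250, 100)], "N")
lon = dms_to_decimal([(122, 1), (19, 1), (4800, 100)], "W")
print(lat, lon)
```

Using `Fraction` keeps the arithmetic exact until the final conversion, which avoids rounding surprises when the seconds value has a large denominator.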

So after watching the segment, I was curious how widespread the issue is and decided to conduct my own investigation, codenamed Seeker.

Harvesting images

I decided to write a quick Ruby script to harvest images from multiple image sites. The sites that I targeted were Twitpic, Twitgoo, Tweetphoto, and yfrog. Some were easier to harvest than others, mostly because their APIs were easily accessible.

After running the script on myself, a few of my friends, and the team over at Hak5, I decided to widen my search to include public figures and celebrities.

Finding celebrities

At first I just scoured the internet looking for verified accounts for celebrities. I found a few sites that have lists of Twitter accounts for celebrities. My favorite one is WeFollow, which is owned by Digg.

So I wrote a script to collect Twitter account names from WeFollow. All I needed to provide the script was a category and a number of pages to scrape.
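A scraper with that shape (a category plus a page count in, a list of handles out) might look like the Python sketch below. It is not the script used for this post. It assumes, without having verified it against WeFollow, that each directory page links to profiles as `twitter.com/<handle>`. The `fetch_page` callable is hypothetical and stands in for whatever HTTP request builds and downloads the page URL.

```python
import re

# Twitter handles are 1-15 characters: letters, digits and underscore.
# Assumption: the directory pages link to profiles as twitter.com/<handle>.
HANDLE_LINK = re.compile(r"twitter\.com/([A-Za-z0-9_]{1,15})\b")


def collect_handles(fetch_page, category, pages):
    """Return unique handles found on the first `pages` pages of a category.

    `fetch_page(category, page_number)` is a hypothetical callable that
    returns the HTML of one directory page as a string.
    """
    handles = {}  # lowercase handle -> first spelling seen, in discovery order
    for page in range(1, pages + 1):
        for handle in HANDLE_LINK.findall(fetch_page(category, page)):
            handles.setdefault(handle.lower(), handle)
    return list(handles.values())


# Stand-in pages so the sketch runs without touching the network.
fake_pages = {
    1: '<a href="http://twitter.com/donttrythis">A</a> <a href="http://twitter.com/tomhanks">T</a>',
    2: '<a href="http://twitter.com/TomHanks">T</a> <a href="http://twitter.com/hak5darren">D</a>',
}
print(collect_handles(lambda category, page: fake_pages[page], "celebrity", 2))
```

A pattern this loose also picks up non-profile links such as `twitter.com/share`, so the output still needs a manual pass.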

I edited the results and tossed in a few people that weren’t on WeFollow, and I came up with the following list of 147 celebrities.

Below is a glimpse of the log file while the script is running.

And in about 42 minutes… I had 11,688 photos from 147 Twitter handles. Not every celebrity on the list had images on those services. In fact I couldn’t detect images for 22 of the celebrities.

Processing 147 users
Starting twitpic
Finished twitpic -- 985.437336 seconds
Starting yfrog
Finished yfrog -- 980.306637 seconds
Starting tweetphoto
Finished tweetphoto -- 475.657377 seconds
Starting twitgoo
Finished twitgoo -- 85.571865 seconds
Total Time : 2526.98922 seconds
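The log format above suggests a simple loop: for each service, harvest every user, then report the elapsed time. Here is a minimal Python sketch of that structure. It is not the Seeker script. The per-service harvesters are hypothetical callables that take a Twitter handle and return a list of image URLs; the real API calls for each service are deliberately left out.

```python
import time


def harvest_all(users, services, log=print):
    """Run each service's harvester over every user and log elapsed time.

    `services` maps a service name to a hypothetical callable that takes a
    Twitter handle and returns a list of image URLs for that user.
    """
    results = {}  # (service, user) -> list of image URLs
    log(f"Processing {len(users)} users")
    run_start = time.perf_counter()
    for name, harvester in services.items():
        log(f"Starting {name}")
        service_start = time.perf_counter()
        for user in users:
            try:
                results[(name, user)] = harvester(user)
            except Exception as error:
                # One failing account should not stop the whole run.
                log(f"  {name}/{user} failed: {error}")
                results[(name, user)] = []
        elapsed = time.perf_counter() - service_start
        log(f"Finished {name} -- {elapsed:.6f} seconds")
    log(f"Total Time : {time.perf_counter() - run_start:.6f} seconds")
    return results


# Stub harvesters so the sketch runs offline; the URLs are placeholders.
stub_services = {
    "twitpic": lambda user: [f"http://example.invalid/twitpic/{user}/1.jpg"],
    "yfrog": lambda user: [],
}
harvest_all(["hak5darren", "donttrythis"], stub_services)
```

Running the services one after another keeps the log readable. Since the work is mostly waiting on the network, running them in parallel would be the obvious next step if run time mattered.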

Extracting the Exif data

Now that I had 11,688 images, it was time to go through the images and see what kind of gems were hidden in the metadata.

So I wrote yet another Ruby script, which goes through each image and dumps all of the Exif data into a text file.
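For readers curious what such a pass involves, here is a rough Python sketch using only the standard library. It is not the script used for this post. It finds the Exif APP1 segment in a JPEG, walks the first image directory (IFD0), and reports only the Make, Model and Software strings plus whether a GPS directory pointer (tag 0x8825) is present. The function names are invented for this example, and the parser assumes well-formed files; a real tool would need to handle truncated or malformed data.

```python
import struct
from pathlib import Path

ASCII_TAGS = {0x010F: "Make", 0x0110: "Model", 0x0131: "Software"}
GPS_IFD_POINTER = 0x8825


def find_exif_tiff(jpeg):
    """Return the TIFF-formatted Exif payload of a JPEG, or None."""
    if jpeg[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: no more metadata segments
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload[:6] == b"Exif\x00\x00":
            return payload[6:]
        pos += 2 + length
    return None


def summarize_exif(jpeg):
    """Return a few IFD0 string tags plus a HasGPS flag ({} if no Exif)."""
    tiff = find_exif_tiff(jpeg)
    if tiff is None or tiff[:2] not in (b"II", b"MM"):
        return {}
    endian = "<" if tiff[:2] == b"II" else ">"
    (ifd_offset,) = struct.unpack(endian + "I", tiff[4:8])
    (entries,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    summary = {"HasGPS": False}
    for i in range(entries):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, kind, count = struct.unpack(endian + "HHI", tiff[start:start + 8])
        if tag == GPS_IFD_POINTER:
            # The file points at a GPS directory; coordinates are likely there.
            summary["HasGPS"] = True
        elif tag in ASCII_TAGS and kind == 2:  # type 2 = ASCII string
            if count <= 4:  # short values are stored inline
                raw = tiff[start + 8:start + 8 + count]
            else:  # longer values live at an offset from the TIFF header
                (offset,) = struct.unpack(endian + "I", tiff[start + 8:start + 12])
                raw = tiff[offset:offset + count]
            summary[ASCII_TAGS[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace")
    return summary


def dump_folder(folder, report_path):
    """Write one summary line per .jpg file in `folder` to a text report."""
    with open(report_path, "w", encoding="utf-8") as report:
        for path in sorted(Path(folder).glob("*.jpg")):
            report.write(f"{path.name}\t{summarize_exif(path.read_bytes())}\n")
```

Calling `dump_folder("images", "exif_report.txt")` (both paths are placeholders) would produce one line per image. An established Exif library would expose far more tags than this sketch does.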

And the result…

44 users affected
125 users total
GPS count: 878
Total count: 11688
Percentage: 7.51%

Success! 878 images out of 11,688 have GPS data.
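The summary figures are straightforward to reproduce once each image has been flagged as having GPS data or not. The Python sketch below shows one way to compute them and cross-checks the percentages from the numbers reported above. The function name and the input shape (a handle mapped to a list of per-image booleans) are assumptions made for this example.

```python
def gps_summary(gps_flags_by_user):
    """Summarize GPS exposure across users.

    `gps_flags_by_user` maps a handle to a list of booleans, one per image,
    True when that image carried GPS data. Users with no images are left out.
    """
    flags_lists = list(gps_flags_by_user.values())
    gps_count = sum(sum(flags) for flags in flags_lists)
    total_count = sum(len(flags) for flags in flags_lists)
    return {
        "users_affected": sum(1 for flags in flags_lists if any(flags)),
        "users_total": len(flags_lists),
        "gps_count": gps_count,
        "total_count": total_count,
        "percentage": 100.0 * gps_count / total_count if total_count else 0.0,
    }


# Cross-check using only the totals reported in this post.
print(f"Percentage of images with GPS: {100.0 * 878 / 11688:.2f}%")
print(f"Percentage of users affected: {100.0 * 44 / 125:.1f}%")
```

The second figure is arguably the more striking one: 44 of the 125 users with images, about 35%, had at least one geotagged photo.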

Visualizing the information

Now by day I’m an Adobe AIR developer, so naturally I decided to write a simple Flex 4 interface to help me visualize the information I collected, codenamed SeekAIR.

[note]
I was able to find personal addresses for some of the celebrities, but have opted not to share that information. The images I have chosen to share are public places where the location is obvious.
[/note]

Since it was Darren’s show that sent me on this investigation… let’s take a look at Darren’s images.

Darren Kitchen (hak5darren)

First I checked Darren’s TwitPic “Places I’ve Been” page to see if he has GPS enabled on purpose…

It doesn’t seem so.

Let’s open the photos up in SeekAIR and see what we can find…

These are just a couple of the images that I found from Darren which had GPS data encoded in them… I also learned that Darren took these photos on his Droid phone because that information was in the Exif data.

Not that Darren Kitchen isn’t a celebrity in my life… but let’s take a look at someone a little more interesting.

Adam Savage (donttrythis)

Another show that I love is MythBusters, so naturally Adam Savage was on my list of people. Again I checked Adam’s “Places I’ve Been” on TwitPic…

Same thing, apparently no images have locations. Again this is misleading, because many of Adam’s photos have GPS data in them… for example:

You may argue that Adam Savage isn’t a celebrity, and I’d have to fight you on that account. But in any case, let’s move on to another example.

Tom Hanks (tomhanks)

Once again to show I have nothing up my sleeve, let’s check Tom’s “Places I’ve Been” page.

Now let’s open the photos in SeekAIR…

This one shows Google Street View of the location where the image was shot.

And this one shows Tom at Pixar.

Once again we have found GPS data hidden within these images. But… GPS data is not the only information that is included in Exif headers.

Britney Spears (britneyspears)

Britney didn’t have any GPS data in her photos, but nonetheless other information can be found in the Exif data.

This one made me laugh…

I mean, we all know that almost all celebrity photos have been Photoshopped, but this photo has the proof embedded right inside of it.

Collection Statistics

Now, you may be asking yourself exactly how many images are affected, so let’s take a look at the statistics.

Breakdown by Device

The following chart is a bit startling. I’m not going to draw any conclusions about it… perhaps the Apple iPhone is the most popular device among celebrities.

The article from ISC has a better chart showing the cross section by device for the general public.

Affected Files By Site

As you can see in the chart below, the majority of images came from TwitPic. It seems to be the most popular image service.

Affected Users By Site

You’ll notice the total users on this chart is 214; this is because some users had pictures on multiple image services. The blue bar represents the affected users out of the total users for the site.
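The double counting is easy to see with sets. In the Python sketch below the handles are placeholders, not the real per-site data: adding up the per-site user counts gives a larger number than the count of distinct users whenever someone posts to more than one service.

```python
def user_totals(users_by_site):
    """Return (sum of per-site user counts, number of distinct users)."""
    per_site_sum = sum(len(users) for users in users_by_site.values())
    distinct = len(set().union(*users_by_site.values()))
    return per_site_sum, distinct


# Placeholder handles for illustration only.
sample = {
    "twitpic": {"user_a", "user_b", "user_c"},
    "yfrog": {"user_a", "user_c"},
    "twitgoo": {"user_c"},
}
per_site_sum, distinct = user_totals(sample)
print(per_site_sum, distinct)
```

In the post's data the same effect shows up as 214 per-site users against 125 distinct users with images.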

Where to go from here

So what can we do to protect ourselves going forward? This issue affects everyone, not only celebrities. Consumers should be aware of what information is leaving their mobile devices.

Remove previous images

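The thing about Twitter is that tweets drop out of search after a while, so the tweet that links to a particular image may no longer be findable even though the image itself is still sitting on the image service. That means old photos have to be tracked down and deleted on the image services themselves. According to the Twitter API Wiki: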

“We also restrict the size of the search index by placing a date limit on the updates we allow you to search. This limit is currently around 1.5 weeks but is dynamic and subject to shrink as the number of tweets per day continues to grow.”

Turn off location services for the camera

Thankfully, the latest release of Apple iOS 4 has the ability to turn off “Location Services” specifically for the camera application.

If you’re not an iPhone user, your device should have similar settings.

Scrub your images before you upload them

Since Seeker was just a weekend project, I haven’t gotten around to this yet… but ZaaLabs will be releasing a free Adobe AIR application to scrub Exif data from images before uploading them to these image services. Stay tuned.
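In the meantime, the core of a scrubber is small. The Python sketch below is not the ZaaLabs application and is only a minimal illustration: it copies a JPEG segment by segment and drops any APP1 segment that starts with the Exif identifier. It assumes a well-formed file. Other metadata containers, such as XMP or IPTC, are left untouched. The function name is invented for this example.

```python
import struct


def strip_exif(jpeg):
    """Return a copy of the JPEG bytes with Exif APP1 segments removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    out = bytearray(jpeg[:2])  # keep the SOI marker
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: image data follows
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]  # copy the compressed image data unchanged
    return bytes(out)
```

Typical use would be to read the file as bytes, call `strip_exif`, and write the result to a new file before uploading. One side effect to be aware of: the Exif Orientation tag goes too, so photos taken in portrait mode may display rotated afterwards.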

Hey, where can I get Seeker?

I will not be releasing the Seeker or SeekAIR code or applications to the public.

A note to the users mentioned in this post

ZaaLabs is willing to assist in identifying and removing affected images… Contact us.


16 Responses to “Finding celebrity locations via Twitter — Celebrity Exif data mining”

  1. Adam July 12, 2010 at 4:22 am #

    That is some high tech stalking you have here. I just wish photo services like Picasa used this EXIF data to put my iPhone photos on the map. Or maybe they get stripped out in the upload process?

  2. Clint July 12, 2010 at 8:41 am #

    Very cool stuff!

  3. James July 12, 2010 at 9:46 am #

    You hi-tech stalker you. I wonder if this sort of exposé would cause sites like TwitPic to consider removing the EXIF metadata from their users images?

    But then, I suppose some of the EXIF metadata is quite useful…

  4. Garth Braithwaite July 12, 2010 at 10:27 am #

    I heard my twitpic account was used for some of the testing. I was sad to find I didn’t make the celebrity cut. I guess I have to take down Justin Bieber.

  5. polyGeek July 12, 2010 at 12:18 pm #

    I love it. This app should be enough to get you into the Evil League of Evil. 🙂

  6. merc July 13, 2010 at 3:26 pm #

    this is brilliant! 🙂

  7. Albatross July 23, 2010 at 7:07 am #

    When the last Harry Potter book was posted as page-by-page jpgs on the Internet, the first thing I did was bust open the exif data. No joy, but I could tell you the model number and ID of the camera he used.

  8. Joshua September 17, 2010 at 4:09 pm #

    Great post. Thanks for doing all of the interesting work to show us the cool stuff.

  9. SecBoyUK April 16, 2012 at 3:25 pm #

    Hi

    If you’re not going to release this app do you know of any similar apps out there that do the same?

    Thanks

