Violent Python: A Cookbook for Hackers, Forensic Analysts, Penetration Testers and Security Engineers

Using Python to Recover Deleted Items in the Recycle Bin

On Microsoft operating systems, the Recycle Bin serves as a special folder that contains deleted files. When a user deletes files via Windows Explorer, the operating system places the files in this special folder, marking them for deletion but not actually removing them. On Windows 98 and earlier systems using the FAT file system, the Recycle Bin resides in the C:\Recycled\ directory. Operating systems that support NTFS, including Windows NT, 2000, and XP, store the Recycle Bin in the C:\Recycler\ directory. Windows Vista and 7 store the directory at C:\$Recycle.Bin.

Using the OS Module to Find Deleted Items

To allow our script to remain independent of the operating system, let’s write a function to test each of the possible candidate directories and return the first one that exists on the system.

 import os

 def returnDir():
     dirs = ['C:\\Recycler\\', 'C:\\Recycled\\', 'C:\\$Recycle.Bin\\']
     for recycleDir in dirs:
         if os.path.isdir(recycleDir):
             return recycleDir
     return None

After discovering the Recycle Bin directory, we will need to inspect its contents. Notice the two subdirectories. Both contain the string S-1-5-21-1275210071-1715567821-725345543- and terminate with 1005 or 500. This string is a security identifier (SID), which corresponds to a unique user account on the machine.

 C:\RECYCLER>dir /a
  Volume in drive C has no label.
  Volume Serial Number is 882A-6E93

  Directory of C:\RECYCLER

 04/12/2011 09:24 AM    <DIR>    .
 04/12/2011 09:24 AM    <DIR>    ..
 04/12/2011 09:56 AM    <DIR>    S-1-5-21-1275210071-1715567821-725345543-1005
 04/12/2011 09:20 AM    <DIR>    S-1-5-21-1275210071-1715567821-725345543-500
     0 File(s) 0 bytes
     4 Dir(s) 30,700,670,976 bytes free
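The structure of a SID string itself carries useful information. As a quick aside, not part of the book's script, a few lines of Python can split a SID into its components; the final subauthority is the relative identifier (RID), which distinguishes account types: 500 is the built-in Administrator, while values of 1000 and above typically denote locally created user accounts. The helper name parseSid is our own.

```python
def parseSid(sid):
    # Split a SID such as S-1-5-21-...-1005 into its numeric components.
    parts = sid.split('-')
    return {
        'revision': int(parts[1]),           # SID revision level (1)
        'authority': int(parts[2]),          # identifier authority (5 = NT Authority)
        'subauthorities': [int(p) for p in parts[3:-1]],
        'rid': int(parts[-1]),               # relative identifier for the account
    }

info = parseSid('S-1-5-21-1275210071-1715567821-725345543-1005')
print(info['rid'])   # 1005: an ordinary local user account
```

Applied to the two subdirectories above, RIDs 1005 and 500 immediately suggest a local user and the built-in Administrator, even before we consult the registry.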

Python to Correlate SID to User

We will use the Windows registry to translate this SID into an exact username. Querying the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList\<SID> for the value ProfileImagePath returns a path of the form %SystemDrive%\Documents and Settings\<username>. In the following output, we see that this allows us to translate the SID S-1-5-21-1275210071-1715567821-725345543-1005 directly to the username "alex".

 C:\RECYCLER>reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList\S-1-5-21-1275210071-1715567821-725345543-1005" /v ProfileImagePath

 ! REG.EXE VERSION 3.0

 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList\S-1-5-21-1275210071-1715567821-725345543-1005
     ProfileImagePath    REG_EXPAND_SZ    %SystemDrive%\Documents and Settings\alex

As we will want to know who deleted which files in the Recycle Bin, let's write a small function to translate each SID into a username. This will allow us to print more useful output when we recover deleted items from the Recycle Bin. The function opens the registry, examines the ProfileImagePath key, and returns the name located after the last backslash in the user path.

 from _winreg import *

 def sid2user(sid):
     try:
         key = OpenKey(HKEY_LOCAL_MACHINE,
             'SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\ProfileList'
             + '\\' + sid)
         (value, type) = QueryValueEx(key, 'ProfileImagePath')
         user = value.split('\\')[-1]
         return user
     except:
         return sid

Finally, we will put all of our code together to create a script that will print the deleted files still in the Recycle Bin.

 import os
 from _winreg import *

 def sid2user(sid):
     try:
         key = OpenKey(HKEY_LOCAL_MACHINE,
             'SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\ProfileList'
             + '\\' + sid)
         (value, type) = QueryValueEx(key, 'ProfileImagePath')
         user = value.split('\\')[-1]
         return user
     except:
         return sid

 def returnDir():
     dirs = ['C:\\Recycler\\', 'C:\\Recycled\\', 'C:\\$Recycle.Bin\\']
     for recycleDir in dirs:
         if os.path.isdir(recycleDir):
             return recycleDir
     return None

 def findRecycled(recycleDir):
     dirList = os.listdir(recycleDir)
     for sid in dirList:
         files = os.listdir(recycleDir + sid)
         user = sid2user(sid)
         print '\n[*] Listing Files For User: ' + str(user)
         for file in files:
             print '[+] Found File: ' + str(file)

 def main():
     recycledDir = returnDir()
     if recycledDir:
         findRecycled(recycledDir)
     else:
         print '[-] Recycle Bin directory not found.'

 if __name__ == '__main__':
     main()

Running our script against a target, we see that it discovers two users, alex and Administrator, and lists the files contained in the Recycle Bin of each. In the next section, we will examine a method for inspecting some of the content inside those files that may prove useful in an investigation.

 Microsoft Windows XP [Version 5.1.2600]
 (C) Copyright 1985-2001 Microsoft Corp.

 C:\>python dumpRecycleBin.py

 [*] Listing Files For User: alex
 [+] Found File: Notes_on_removing_MetaData.pdf
 [+] Found File: ANONOPS_The_Press_Release.pdf

 [*] Listing Files For User: Administrator
 [+] Found File: 192.168.13.1-router-config.txt
 [+] Found File: Room_Combinations.xls

 C:\Documents and Settings\john\Desktop>

Metadata

In this section, we will write scripts to extract metadata from files. Although not clearly visible when a file is opened, metadata can exist in documents, spreadsheets, images, audio, and video file types. The authoring application may store details such as the file's authors, creation and modification times, revisions, and comments. For example, a camera phone may imprint the GPS location of a photo, and Microsoft Word may record the author of a document. While checking every individual file appears an arduous task, we can automate it using Python.

From The Trenches
Anonymous’ Metadata Fail

On December 10, 2010, the hacker group Anonymous posted a press release outlining the motivations behind a recent attack named Operation Payback (Prefect, 2010). Angry with the companies that had dropped support for the Web site WikiLeaks, Anonymous called for retaliation by performing a distributed denial-of-service (DDoS) attack against some of the parties involved. The group posted the press release unsigned and without attribution. Distributed as a Portable Document Format (PDF) file, the press release contained metadata. In addition to the program used to create the document, the PDF metadata contained the name of the author, Mr. Alex Tapanaris. Within days, Greek police arrested Mr. Tapanaris (Leyden, 2010).

Using PyPDF to Parse PDF Metadata

Let's use Python to quickly recreate the forensic investigation of a document that proved useful in the arrest of a member of the hacker group Anonymous. Wired.com still mirrors the document ANONOPS_The_Press_Release.pdf. We can start by downloading the document using the wget utility.

 forensic:~# wget http://www.wired.com/images_blogs/threatlevel/2010/12/ANONOPS_The_Press_Release.pdf
 --2012-01-19 11:43:36-- http://www.wired.com/images_blogs/threatlevel/2010/12/ANONOPS_The_Press_Release.pdf
 Resolving www.wired.com... 64.145.92.35, 64.145.92.34
 Connecting to www.wired.com|64.145.92.35|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 70214 (69K) [application/pdf]
 Saving to: 'ANONOPS_The_Press_Release.pdf.1'

 100%[==================================================================================>] 70,214 364K/s in 0.2s

 2012-01-19 11:43:39 (364 KB/s) - 'ANONOPS_The_Press_Release.pdf' saved [70214/70214]

PyPDF is an excellent third-party utility for managing PDF documents and is available for download from http://pybrary.net/pyPdf/. It offers the ability to extract document information and to split, merge, crop, encrypt, and decrypt documents. To extract metadata, we use the method .getDocumentInfo(). This method returns a dictionary-like object that maps each metadata element to its value; iterating through its keys prints out the entire metadata of the PDF document.

 from pyPdf import PdfFileReader

 def printMeta(fileName):
     pdfFile = PdfFileReader(file(fileName, 'rb'))
     docInfo = pdfFile.getDocumentInfo()
     print '[*] PDF MetaData For: ' + str(fileName)
     for metaItem in docInfo:
         print '[+] ' + metaItem + ':' + docInfo[metaItem]

Adding an option parser to select a specific file gives us a tool that can identify the metadata embedded in any PDF document. Similarly, we could modify our script to test for specific metadata, such as a particular author. Certainly, it might be helpful for Greek law enforcement officials to search for other files that also list Alex Tapanaris as the author.

 import optparse
 from pyPdf import PdfFileReader

 def printMeta(fileName):
     pdfFile = PdfFileReader(file(fileName, 'rb'))
     docInfo = pdfFile.getDocumentInfo()
     print '[*] PDF MetaData For: ' + str(fileName)
     for metaItem in docInfo:
         print '[+] ' + metaItem + ':' + docInfo[metaItem]

 def main():
     parser = optparse.OptionParser('usage %prog -F <PDF file name>')
     parser.add_option('-F', dest='fileName', type='string',
         help='specify PDF file name')
     (options, args) = parser.parse_args()
     fileName = options.fileName
     if fileName == None:
         print parser.usage
         exit(0)
     else:
         printMeta(fileName)

 if __name__ == '__main__':
     main()

Running our pdfReader script against the Anonymous Press Release, we see the same metadata that led Greek authorities to arrest Mr. Tapanaris.

 forensic:~# python pdfRead.py -F ANONOPS_The_Press_Release.pdf
 [*] PDF MetaData For: ANONOPS_The_Press_Release.pdf
 [+] /Author:Alex Tapanaris
 [+] /Producer:OpenOffice.org 3.2
 [+] /Creator:Writer
 [+] /CreationDate:D:20101210031827+02'00'
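The /CreationDate value uses the PDF date syntax (D:YYYYMMDDHHmmSS followed by an offset from UTC) rather than a standard timestamp. As a sketch, not part of the original script, a small helper can decode it into a Python datetime; the function name parsePdfDate is our own.

```python
from datetime import datetime

def parsePdfDate(raw):
    # Strip the leading 'D:' marker if present.
    if raw.startswith('D:'):
        raw = raw[2:]
    # The first 14 digits encode YYYYMMDDHHMMSS.
    stamp = datetime.strptime(raw[:14], '%Y%m%d%H%M%S')
    # The remainder (e.g. +02'00') is the offset from UTC.
    offset = raw[14:].replace("'", '')
    return stamp, offset

stamp, offset = parsePdfDate("D:20101210031827+02'00'")
print(stamp, offset)   # 2010-12-10 03:18:27 +0200
```

For the press release, this yields December 10, 2010 at 03:18 local time, in a UTC+2 time zone consistent with Greece.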

Understanding Exif Metadata

The exchangeable image file format (Exif) standard defines how devices such as digital cameras, smartphones, and scanners store image and audio files. The Exif standard contains several tags useful in a forensic investigation. Phil Harvey wrote a tool aptly named exiftool (available from http://www.sno.phy.queensu.ca/~phil/exiftool/) that can parse these tags. Examining all the Exif tags in a photo could produce several pages of information, so let's examine a snipped version of the output. Notice that the Exif tags contain the camera model name, iPhone 4S, as well as the GPS latitude and longitude coordinates of the actual image. Such information can prove helpful in organizing images. For example, the Mac OS X application iPhoto uses the location information to neatly arrange photos on a world map. However, this information also has plenty of malicious uses. Imagine a soldier placing Exif-tagged photos on a blog or a Web site: the enemy could download entire sets of photos and know all of that soldier's movements in seconds. In the following section, we will build a script to connect to a Web site, download all the images on the site, and then check them for Exif metadata.

 investigator$ exiftool photo.JPG
 ExifTool Version Number     : 8.76
 File Name                   : photo.JPG
 Directory                   : /home/investigator/photo.JPG
 File Size                   : 1626 kB
 File Modification Date/Time : 2012:02:01 08:25:37-07:00
 File Permissions            : rw-r--r--
 File Type                   : JPEG
 MIME Type                   : image/jpeg
 Exif Byte Order             : Big-endian (Motorola, MM)
 Make                        : Apple
 Camera Model Name           : iPhone 4S
 Orientation                 : Rotate 90 CW
 <..SNIPPED..>
 GPS Altitude                : 10 m Above Sea Level
 GPS Latitude                : 89 deg 59' 59.97" N
 GPS Longitude               : 36 deg 26' 58.57" W
 <..SNIPPED..>
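Exiftool displays the coordinates in degrees, minutes, and seconds, but mapping tools usually expect decimal degrees. A minimal sketch of the conversion follows (the function name dmsToDecimal is our own); southern latitudes and western longitudes become negative.

```python
def dmsToDecimal(degrees, minutes, seconds, ref):
    # Convert degree/minute/second components to decimal degrees.
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    # South and West references indicate negative coordinates.
    return -decimal if ref in ('S', 'W') else decimal

# The coordinates from the exiftool output above:
lat = dmsToDecimal(89, 59, 59.97, 'N')
lon = dmsToDecimal(36, 26, 58.57, 'W')
print(round(lat, 6), round(lon, 6))
```

Pasting the resulting decimal pair into a mapping application pinpoints exactly where the photo was taken.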

Downloading Images with BeautifulSoup

Available from http://www.crummy.com/software/BeautifulSoup/, Beautiful Soup allows us to quickly parse HTML and XML documents. Leonard Richardson released the latest version of Beautiful Soup on May 29, 2012. To update to the latest version on BackTrack, use easy_install to fetch and install the beautifulsoup4 library.

 investigator:~# easy_install beautifulsoup4
 Searching for beautifulsoup4
 Reading http://pypi.python.org/simple/beautifulsoup4/
 <..SNIPPED..>
 Installed /usr/local/lib/python2.6/dist-packages/beautifulsoup4-4.1.0-py2.6.egg
 Processing dependencies for beautifulsoup4
 Finished processing dependencies for beautifulsoup4

In this section, we will use Beautiful Soup to scrape an HTML document for all the images it contains. Notice that we use the urllib2 library to open the document and read its contents. Next, we create a Beautiful Soup object, a parse tree containing the different objects of the HTML document. From that object, we extract all the image tags using the method .findAll('img'). This method returns an array of all the image tags, which our function returns.

 import urllib2
 from bs4 import BeautifulSoup

 def findImages(url):
     print '[+] Finding images on ' + url
     urlContent = urllib2.urlopen(url).read()
     soup = BeautifulSoup(urlContent)
     imgTags = soup.findAll('img')
     return imgTags

Next, we need to download each image from the site so that we can examine it in a separate function. To download an image, we will use functionality from the urllib2, urlparse, and os libraries. First, we extract the source address from the image tag. Next, we read the binary contents of the image into a variable. Finally, we open a file in write-binary mode and write the contents of the image to it.

 import urllib2
 from urlparse import urlsplit
 from os.path import basename

 def downloadImage(imgTag):
     try:
         print '[+] Downloading image...'
         imgSrc = imgTag['src']
         imgContent = urllib2.urlopen(imgSrc).read()
         imgFileName = basename(urlsplit(imgSrc)[2])
         imgFile = open(imgFileName, 'wb')
         imgFile.write(imgContent)
         imgFile.close()
         return imgFileName
     except:
         return ''
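One caveat: downloadImage() assumes each src attribute holds an absolute URL, but many pages use relative paths instead. A safer sketch, not part of the original script, resolves the src against the page URL first using urljoin; the try/except import keeps it working under both Python 2, which this chapter uses, and Python 3. The helper name resolveImgSrc is our own.

```python
try:
    from urlparse import urljoin          # Python 2, as used in this chapter
except ImportError:
    from urllib.parse import urljoin      # Python 3

def resolveImgSrc(pageUrl, imgSrc):
    # Resolve a possibly relative src attribute against the page URL;
    # absolute URLs pass through unchanged.
    return urljoin(pageUrl, imgSrc)

print(resolveImgSrc('http://www.example.com/photos/index.html', 'img/photo.JPG'))
# http://www.example.com/photos/img/photo.JPG
```

Calling downloadImage() with the resolved address instead of the raw src value would let the script handle pages that reference images relatively.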

Reading Exif Metadata from Images with the Python Imaging Library

To test the contents of an image file for Exif metadata, we will process the file using the Python Imaging Library (PIL). PIL, available from http://www.pythonware.com/products/pil/, adds image-processing capabilities to Python and allows us to quickly extract the metadata associated with geolocation information. To test a file, we open it as a PIL Image and call the method ._getexif(). Next, we parse the Exif data into a dictionary indexed by metadata type. With the dictionary complete, we can check whether it contains an entry for the GPSInfo tag. If it does, we know the file contains GPS metadata and we print a message to the screen.

 from PIL import Image
 from PIL.ExifTags import TAGS

 def testForExif(imgFileName):
     try:
         exifData = {}
         imgFile = Image.open(imgFileName)
         info = imgFile._getexif()
         if info:
             for (tag, value) in info.items():
                 decoded = TAGS.get(tag, tag)
                 exifData[decoded] = value
             exifGPS = exifData['GPSInfo']
             if exifGPS:
                 print '[*] ' + imgFileName + \
                     ' contains GPS MetaData'
     except:
         pass

Wrapping everything together, our script is now able to connect to a URL, parse and download all the image files, and test each one for Exif metadata. Notice that in the main function we first fetch a list of all the images on the site. Then, for each image in the array, we download the file and test it for GPS metadata.

 import urllib2
 import optparse
 from urlparse import urlsplit
 from os.path import basename
 from bs4 import BeautifulSoup
 from PIL import Image
 from PIL.ExifTags import TAGS

 def findImages(url):
     print '[+] Finding images on ' + url
     urlContent = urllib2.urlopen(url).read()
     soup = BeautifulSoup(urlContent)
     imgTags = soup.findAll('img')
     return imgTags

 def downloadImage(imgTag):
     try:
         print '[+] Downloading image...'
         imgSrc = imgTag['src']
         imgContent = urllib2.urlopen(imgSrc).read()
         imgFileName = basename(urlsplit(imgSrc)[2])
         imgFile = open(imgFileName, 'wb')
         imgFile.write(imgContent)
         imgFile.close()
         return imgFileName
     except:
         return ''

 def testForExif(imgFileName):
     try:
         exifData = {}
         imgFile = Image.open(imgFileName)
         info = imgFile._getexif()
         if info:
             for (tag, value) in info.items():
                 decoded = TAGS.get(tag, tag)
                 exifData[decoded] = value
             exifGPS = exifData['GPSInfo']
             if exifGPS:
                 print '[*] ' + imgFileName + \
                     ' contains GPS MetaData'
     except:
         pass

 def main():
     parser = optparse.OptionParser('usage %prog -u <target url>')
     parser.add_option('-u', dest='url', type='string',
         help='specify url address')
     (options, args) = parser.parse_args()
     url = options.url
     if url == None:
         print parser.usage
         exit(0)
     else:
         imgTags = findImages(url)
         for imgTag in imgTags:
             imgFileName = downloadImage(imgTag)
             testForExif(imgFileName)

 if __name__ == '__main__':
     main()

Testing the newly created script against a target address, we see that one of the images on the target contains GPS metadata. While this technique can be used offensively to perform reconnaissance against individuals, we can also use the script in a completely benign way: to identify our own vulnerabilities before attackers do.

 forensics:~# python exifFetch.py -u http://www.flickr.com/photos/dvids/4999001925/sizes/o
 [+] Finding images on http://www.flickr.com/photos/dvids/4999001925/sizes/o
 [+] Downloading image...
 [+] Downloading image...
 [+] Downloading image...
 [+] Downloading image...
 [+] Downloading image...
 [*] 4999001925_ab6da92710_o.jpg contains GPS MetaData
 [+] Downloading image...
 [+] Downloading image...
 [+] Downloading image...
 [+] Downloading image...
