Documents

Searching and Retrieving Documents

- Google Searching (google.com)

Use the "filetype:" (or the shorter "ext:") search operator. Bing requires "filetype", while Google seems to prefer "ext".

File types to look for:

Microsoft Word: DOC, DOCX

Microsoft Excel: XLS, XLSX, CSV

Microsoft PowerPoint: PPT, PPTX

Adobe Acrobat: PDF

Text File: TXT, RTF

Open Office: ODT, ODS, ODG, ODP

Word Perfect: WPD

To search several of these file types at once:

"OSINT" filetype:pdf OR filetype:doc OR filetype:xls OR filetype:xlsx OR filetype:docx OR filetype:ppt OR filetype:pptx OR filetype:wpd OR filetype:txt

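A combined query like the one above is tedious to type by hand. The following is a minimal sketch of how it could be generated; the function name `build_filetype_query` and the `EXTENSIONS` list are illustrative assumptions, not part of any tool referenced in these notes.

```python
# Hypothetical helper: builds a multi-extension query string.
EXTENSIONS = ["pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "wpd", "txt"]


def build_filetype_query(term, extensions, operator="filetype"):
    """Return '"term" filetype:a OR filetype:b ...' for the given extensions."""
    # One clause per extension, using either "filetype" or "ext" as the operator.
    clauses = [f"{operator}:{ext.lower()}" for ext in extensions]
    # The term is quoted so the search engine treats it as an exact phrase.
    return f'"{term}" ' + " OR ".join(clauses)


print(build_filetype_query("OSINT", EXTENSIONS))
print(build_filetype_query("OSINT", EXTENSIONS, operator="ext"))
```

Passing `operator="ext"` produces the shorter form of the same query.
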
- Google Docs (docs.google.com)

Once the document is finished and no longer needed, the user may forget to remove it from public view.

site:docs.google.com

site:docs.google.com/presentation/d --> PowerPoint presentations

site:docs.google.com/drawings/d --> Google flowchart drawings

site:docs.google.com/file/d --> images, videos, PDF files, and documents

site:docs.google.com/folder/d --> collections of files inside folders

site:docs.google.com/open --> external documents, folders, and files

In 2013, Google began placing some user-generated documents on the "drive.google.com" domain. Therefore, any search that you conduct with the method described previously should be repeated with "drive" in place of "docs".

site:drive.google.com
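
Repeating every search on both domains is easy to forget. As a rough sketch, the variants could be generated in one pass; the helper name `google_docs_queries` is hypothetical, and the code assumes the path suffixes listed for "docs" are worth trying unchanged on "drive", as the note above suggests.

```python
# Path suffixes taken from the site: operators listed in these notes.
DOCS_PATHS = ["presentation/d", "drawings/d", "file/d", "folder/d", "open"]


def google_docs_queries(term):
    """Return site: queries for a term on both the docs and drive domains."""
    queries = []
    for host in ("docs.google.com", "drive.google.com"):
        # Broad query against the whole domain first.
        queries.append(f'site:{host} "{term}"')
        # Then one narrower query per content-type path.
        for path in DOCS_PATHS:
            queries.append(f'site:{host}/{path} "{term}"')
    return queries


for query in google_docs_queries("osint"):
    print(query)
```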

- Microsoft Docs (docs.microsoft.com)

site:docs.microsoft.com

- Amazon Web Services (amazonaws.com)

site:amazonaws.com

Examples:

site:amazonaws.com ext:xls "password"

site:amazonaws.com ext:pdf "osint"

Another option is the Amazon CloudFront servers:

site:cloudfront.net OSINT
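
Queries such as these can also be turned into direct search URLs, which is convenient for bookmarking or scripting. The sketch below is an illustration only: `search_url` and `SEARCH_BASES` are made-up names, and the rewrite of "ext:" to "filetype:" for Bing simply follows the operator note at the top of this page.

```python
import re
from urllib.parse import urlencode

# Standard search endpoints; the "q" parameter carries the query text.
SEARCH_BASES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
}


def search_url(query, engine="google"):
    """Return a URL that runs the query on the chosen search engine."""
    if engine == "bing":
        # Bing expects "filetype:", so translate any "ext:" operators.
        query = re.sub(r"\bext:", "filetype:", query)
    # urlencode percent-escapes quotes and colons and turns spaces into "+".
    return SEARCH_BASES[engine] + "?" + urlencode({"q": query})


print(search_url('site:amazonaws.com ext:xls "password"'))
print(search_url('site:amazonaws.com ext:pdf "osint"', engine="bing"))
```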

- Gray Hat Warfare (buckets.grayhatwarfare.com)

A searchable database of over one billion files, all publicly stored on AWS servers.

- Google Cloud Storage (cloud.google.com)

Examples:

site:storage.googleapis.com ext:xlsx OR ext:xls

site:storage.googleapis.com "confidential"

- Presentation Repositories

Many people choose to store PowerPoint and other types of presentations in the cloud.

  • Slide Share (slideshare.net)

  • ISSUU (issuu.com)

  • Slidebean (slidebean.com)

  • Prezi (prezi.com) --> site:prezi.com "osint"

  • PPT Search (pptsearchengine.net)

  • Power Show (powershow.com)

- Scribd (scribd.com)

Scribd was a leading cloud storage document service for several years.

The plethora of stored documents is still accessible. This can be valuable for historical content posted, and likely forgotten, by the target.

Run a search, then click "Documents" to display the matching documents.

- PDF Drive (pdfdrive.com)

This service scans the internet for new PDF files and archives them. This can be helpful when the original source removes the content.

- WikiLeaks (search.wikileaks.org)

Its sole purpose is leaking sensitive and classified documents.

- Cryptome (cryptome.org)

Another site that strives to release sensitive and classified information to the public.

Cryptome does not provide a search for its site, so we must rely on Google or Bing to find the documents:

site:cryptome.org "osint"

- Paste Sites

  • Pastebin (pastebin.com), the most popular paste site --> site:pastebin.com "osint"

  • 0bin.net

  • cl1p.net

  • codepad.org

  • controlc.com

  • doxbin.org

  • dpaste.com

  • dpaste.de

  • dpaste.org

  • dumpz.org

  • friendpaste.com

  • gist.github.com

  • hastebin.com

  • heypasteit.com

  • ideone.com

  • ivpaste.com

  • jsbin.com

  • justpaste.it

  • justpaste.me

  • paste.debian.net

  • paste.ee

  • paste.centos.org

  • paste.frubar.net

  • paste.lisp.org

  • paste.opensuse.org

  • paste.org

  • paste.org.ru

  • paste.ubuntu.com

  • paste2.org

  • pastebin.ca

  • pastebin.com

  • pastebin.fr

  • pastebin.gr

  • pastefs.com

  • pastehtml.com

  • pastelink.net

  • pastie.org

  • p.ip.fi

  • privatebin.net

  • slexy.org

  • snipplr.com

  • sprunge.us

  • textsnip.com

  • tidypub.org

  • wordle.net

  • zerobin.net
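
Searching each paste site individually is slow, and a single query naming every site may be too long for a search engine to accept. One possible approach, sketched below, is to split the list into smaller groups and build one query per group. The function name `paste_queries`, the default group size, and the shortened `SITES` list are assumptions for illustration.

```python
# A short subset of the paste sites listed above; extend as needed.
SITES = ["pastebin.com", "paste.ee", "justpaste.it", "dpaste.org"]


def paste_queries(term, sites, chunk_size=8):
    """Return one 'term site:a OR site:b ...' query per group of sites."""
    queries = []
    for start in range(0, len(sites), chunk_size):
        group = sites[start:start + chunk_size]
        # Join the sites in this group with OR so any of them may match.
        site_clause = " OR ".join(f"site:{site}" for site in group)
        queries.append(f'"{term}" {site_clause}')
    return queries


for query in paste_queries("osint", SITES, chunk_size=2):
    print(query)
```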

Document Metadata

General metadata procedures and tools are explained in the "Metadata" section. However, here are specific tools and techniques for metadata in documents.

This process should be done locally where possible, but here is a list of websites for quickly viewing the metadata inside documents:

  • Extract Metadata (extractmetadata.com)

  • Jeffrey's Viewer (exif.regex.info/exif.cgi)

  • ExifInfo (exifinfo.org)

  • Get Metadata (get-metadata.com)

- Document Metadata Applications

We could use either exiftool, explained in the "Metadata" section, or FOCA (github.com/ElevenPaths/FOCA), explained below.

FOCA is a Windows-based solution which possesses a user-friendly interface.

To download and install:

  • Click the hyperlink for the most recent "zip" file.

  • Double-click the zip file and extract the contents.

  • Launch FOCA.exe from the "bin" folder within the "FOCAPro" folder.

  • If prompted, select "Download and install this feature" to install the .NET Framework.

To use it:

  • Open FOCA and click the Metadata folder in the left menu.

  • Drag and drop the documents into the FOCA window.

  • Right-click any of the documents and choose "Extract all metadata".

  • Right-click any of the documents and choose "Analyze metadata".

- Manual Metadata Extraction

Some documents have hidden metadata, most commonly PowerPoint files.

Once downloaded, change the file extension from .pptx to .zip, then decompress the zip file. This will present dozens of new files.

They include all of the images inside the presentation, which can then easily be analyzed for their own metadata, and a text extraction of all words in the slides. The "app.xml" file confirms that the author was using PowerPoint from Microsoft Office 2016 (AppVersion 16.0000), and several files include unique identifiers for this user. Comparing these to other downloaded documents could prove that the authors of each were the same.
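
The same inspection can be scripted without renaming the file, since Python's standard `zipfile` module opens Office Open XML documents directly. The sketch below is not a replacement for exiftool or FOCA; the function names `read_office_metadata` and `list_media` are hypothetical, and it assumes the usual layout in which properties live in "docProps/app.xml" and "docProps/core.xml" and slide images sit under "ppt/media/".

```python
import zipfile
import xml.etree.ElementTree as ET

# Property files normally present in .pptx, .docx, and .xlsx archives.
PROPERTY_PARTS = ("docProps/app.xml", "docProps/core.xml")


def read_office_metadata(path):
    """Return a dict of property name -> text from the docProps XML files."""
    metadata = {}
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        for part in PROPERTY_PARTS:
            if part not in names:
                continue  # Some documents omit one of the property files.
            root = ET.fromstring(archive.read(part))
            for element in root.iter():
                # Drop the "{namespace}" prefix ElementTree puts on tag names.
                tag = element.tag.split("}")[-1]
                if element.text and element.text.strip():
                    metadata[tag] = element.text.strip()
    return metadata


def list_media(path):
    """Return the names of embedded media files in a presentation."""
    with zipfile.ZipFile(path) as archive:
        return [n for n in archive.namelist() if n.startswith("ppt/media/")]


# Example call (the file name is a placeholder):
# print(read_office_metadata("presentation.pptx"))
```

Fields such as "AppVersion", "creator", and "lastModifiedBy", when present, can then be compared across several downloaded documents.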

OCR and Text Archives

- Free OCR (free-ocr.com)

You may occasionally locate a PDF file that has not been indexed for the text content. These types of PDF files will not allow you to copy and paste any of the text. This could be due to poor scanning techniques or to purposely prohibit outside use of the content. You may desire to capture this text for a summary report. These files can be uploaded to Free OCR and converted to text documents.

- Text Archives (archive.org)

The Internet Archive possesses massive collections of text files, books, and other documents.

https://archive.org/search.php?query=inteltechniques&sin=TXT

Books

- Google Books (books.google.com)

Google has scanned most books printed within the past decade and the index is searchable. Many of these scans also include a digital preview of the content. A direct URL query appears as follows.

https://www.google.com/search?tbm=bks&q=inteltechniques

- Pirated Books (annas-archive.org)

Many people use a service called Library Genesis to download illegal pirated e-books. Library Genesis does not offer any type of indexing of content, but Anna's Archive acquires all of their content and provides a search interface. This allows us to search within the content of millions of pirated books without actually downloading any content, and to find references to specific targets within publications.

https://annas-archive.org/search?q=michael%20bazzell

- Book Sales (amazon-asin.com)

Amazon offers a lookup tool which displays details about the number of copies being sold of any book on their site. First, you must identify the ASIN assigned to the book.

https://amazon-asin.com/asincheck/?product_id=B09PHL6Q4G
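
The direct URLs shown for the book services follow a fixed pattern, so they can be filled in programmatically. This is only a sketch based on the three example URLs in this section; the names `URL_TEMPLATES` and `direct_url` are assumptions, and the services may change their URL formats at any time.

```python
from urllib.parse import quote

# Templates copied from the example URLs in this section.
URL_TEMPLATES = {
    "google_books": "https://www.google.com/search?tbm=bks&q={q}",
    "annas_archive": "https://annas-archive.org/search?q={q}",
    "amazon_asin": "https://amazon-asin.com/asincheck/?product_id={q}",
}


def direct_url(service, value):
    """Return the direct query URL for a service and a search term or ASIN."""
    # quote() percent-encodes spaces as %20, matching the examples above.
    return URL_TEMPLATES[service].format(q=quote(value, safe=""))


print(direct_url("google_books", "inteltechniques"))
print(direct_url("annas_archive", "michael bazzell"))
print(direct_url("amazon_asin", "B09PHL6Q4G"))
```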

Rental Vehicle Records

Several vehicle rental companies offer an option to access your receipts online.

While the processes to retrieve these documents are designed to only obtain your own records, it is easy to view those of others.

- Enterprise (enterprise.com)

At the bottom of every Enterprise web page is an option to "Get a receipt". Clicking this will present a form that must be completed before display of any details. Enterprise will need the driver's license number and last name. Providing this information will display the user's entire rental history for the past six months to three years.

- Hertz (hertz.com/rentacar/receipts/request-receipts.do)

Similar to Enterprise.

- Alamo (alamo.com)

Alamo titles their receipt retrieval link "Past Trips/Receipts" and it is located in the bottom portion of every page. The process is identical to the previous two examples.

- Thrifty (thrifty.com/Reservations/OnlineReceipts.aspx)

This service requires a last name and either a driver's license number or credit card number.

- Dollar (dollar.com/Reservations/Receipt.aspx)

Similar to Thrifty, this service requires a last name and either a driver's license number or credit card number.

IntelTechniques Documents and Pastes Tools

The first section queries documents by file types.

The second section allows entry of any terms or operators in order to locate files stored within Google Docs, Google Drive, Microsoft Docs, Amazon AWS, CloudFront, SlideShare, Prezi, ISSUU, Scribd, PDF Drive, and others.

The "Pastes" search tool presents a Google custom search engine (CSE) which queries all paste sites mentioned previously.

Code in Documents.html and Pastes.html.
