Websites & Domains

Website Content Exfiltration Tools

EyeWitness

- Installation

cd ~/Downloads/Programs

git clone https://github.com/FortyNorthSecurity/EyeWitness.git

cd EyeWitness/Python/setup && sudo ./setup.sh

cd ~/Documents/scripts

sed -i 's/ChrisTruncer/FortyNorthSecurity/g' updates.sh

cd ~/Downloads/Programs

wget https://github.com/mozilla/geckodriver/releases/download/v0.32.0/geckodriver-v0.32.0-linux-aarch64.tar.gz

tar -xvzf geckodriver* && chmod +x geckodriver

sudo mv geckodriver /usr/local/bin

- Usage

Open your Applications menu and launch Text Editor.

Type or paste URLs, one per line, and save the file to your Desktop as "sites.txt".

Open Terminal and enter the following commands:

cd ~/Downloads/Programs/EyeWitness/Python

./EyeWitness.py -f ~/Desktop/sites.txt --web -d ~/Documents/Eyewitness/

The results include screen captures of each target website and detailed information including the server IP address, page title, modification date, and full source code of the page.

- Custom captures.sh script

#!/usr/bin/env bash
# Zenity menu for running EyeWitness against a single URL or a file of URLs.
opt1="Single URL"
opt2="Multiple URLs (File)"
eyewitness=$(zenity --list --title "EyeWitness" --radiolist --column "" --column "" TRUE "$opt1" FALSE "$opt2" --height=400 --width=300)
case $eyewitness in
$opt1 )
domain=$(zenity --entry --title "EyeWitness" --text "Enter Target URL")
cd ~/Downloads/Programs/EyeWitness/Python
./EyeWitness.py --web --single "$domain" -d ~/Documents/EyeWitness/
exit;;
$opt2 )
eyewitness_file=$(zenity --file-selection --title "URL List")
cd ~/Downloads/Programs/EyeWitness/Python
./EyeWitness.py --web -f "$eyewitness_file" -d ~/Documents/EyeWitness/
exit;;
esac

To make a desktop shortcut of the previous script:

[Desktop Entry]
Type=Application
Name=Captures Tool
Categories=Network;OSINT
Exec=/home/osint/Documents/scripts/captures.sh
Icon=/home/osint/Documents/icons/captures.png
Terminal=true

The Harvester

Searches a supplied domain with the intent of providing email addresses associated with the target.

Installation:

cd ~/Downloads/Programs

git clone https://github.com/laramies/theHarvester.git

cd theHarvester

python3 -m venv theHarvesterEnvironment

source theHarvesterEnvironment/bin/activate

pip install -r requirements.txt

deactivate

Usage:

theHarvester -h

theHarvester -d <domain>

We can specify which source to query for data by using the -b switch, such as Baidu, Bing, Bing API, Certspotter, CRTSH, DNSdumpster, Dogpile, and many others.

If you want to use all of these sources, simply pass "-b all" on the command line.

In some cases, you will want to use a service's API (application programming interface). To do so, open the file at /etc/theHarvester/api-keys.yaml in any text editor and add your keys.
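A representative sketch of the api-keys.yaml layout (an assumption for illustration; the exact list of services and field names varies by theHarvester version, so compare against the file shipped with your install):

apikeys:
  bing:
    key: yourkeyhere
  censys:
    id: yourid
    secret: yoursecret
  hunter:
    key: yourkeyhere
  shodan:
    key: yourkeyhere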

Example:

theHarvester -d tesla.com -b all -f /home/kali/tesla_results2

python3 theHarvester.py -d inteltechniques.com -b bing

Carbon14

It searches for any images hosted within the page and analyzes the metadata for creation dates.

Installation:

cd ~/Downloads/Programs

git clone https://github.com/Lazza/Carbon14

cd Carbon14

python3 -m venv Carbon14Environment

source Carbon14Environment/bin/activate

pip install -r requirements.txt

deactivate

Usage:

python3 carbon14.py https://inteltechniques.com

Metagoofil

Locate and download documents.

Installation

cd ~/Downloads/Programs

git clone https://github.com/opsdisk/metagoofil.git

cd metagoofil

python3 -m venv metagoofilEnvironment

source metagoofilEnvironment/bin/activate

pip install -r requirements.txt

deactivate

Basic usage:

metagoofil -d sans.org -t doc,pdf -l 20 -n 10 -o sans -f html

python3 metagoofil.py -d cisco.com -t pdf -o ~/Desktop/cisco/

python3 metagoofil.py -d cisco.com -t docx,xlsx -o ~/Desktop/cisco/

If I already possess numerous documents on my computer, I create a metadata CSV spreadsheet using ExifTool. I then analyze this spreadsheet.

If my target website possesses few documents, I download them manually through Google or Bing within a web browser.

If my target website possesses hundreds of documents, I use Metagoofil, but only download one file type at a time. If my target were cisco.com, I would execute the following commands in Terminal:

python3 metagoofil.py -d cisco.com -t pdf -o ~/Desktop/cisco/

python3 metagoofil.py -d cisco.com -t doc -o ~/Desktop/cisco/

python3 metagoofil.py -d cisco.com -t xls -o ~/Desktop/cisco/

python3 metagoofil.py -d cisco.com -t ppt -o ~/Desktop/cisco/

python3 metagoofil.py -d cisco.com -t xlsx -o ~/Desktop/cisco/

python3 metagoofil.py -d cisco.com -t pptx -o ~/Desktop/cisco/

Custom domains.sh script

*This script also contains tools from the "Subdomains and Directories" section.

#!/usr/bin/env bash
opt1="Amass"
opt2="Sublist3r"
opt3="Photon"
opt4="TheHarvester"
opt5="Carbon14"
timestamp=$(date +%Y-%m-%d:%H:%M)
domainmenu=$(zenity  --list  --title "Domain Tool" --radiolist  --column "" --column "" TRUE "$opt1" FALSE "$opt2" FALSE "$opt3" FALSE "$opt4" FALSE "$opt5" --height=400 --width=300)
case $domainmenu in
$opt1 )
domain=$(zenity --entry --title "Amass" --text "Enter target domain name")
mkdir ~/Documents/Amass/
amass intel -whois -ip -src -active -d $domain -o ~/Documents/Amass/$timestamp-$domain.1.txt
amass enum -src -passive -d $domain -o ~/Documents/Amass/$timestamp-$domain.2.txt
open ~/Documents/Amass/
exit;;
$opt2 )
domain=$(zenity --entry --title "Sublist3r" --text "Enter target domain name")
mkdir ~/Documents/Sublist3r/
cd ~/Downloads/Programs/Sublist3r
python3 sublist3r.py -d $domain -o ~/Documents/Sublist3r/sublist3r_$domain.txt
open ~/Documents/Sublist3r/
exit;;
$opt3 ) 
domain=$(zenity --entry --title "Photon" --text "Enter target domain name")
mkdir ~/Documents/Photon/
cd ~/Downloads/Programs/Photon/
python3 photon.py -u $domain -l 3 -t 100 -o ~/Documents/Photon/$timestamp-$domain
open ~/Documents/Photon/$timestamp-$domain
exit;;
$opt4 )
domain=$(zenity --entry --title "TheHarvester" --text "Enter target domain name")
mkdir ~/Documents/theHarvester/
cd ~/Downloads/Programs/theHarvester/
python3 theHarvester.py -d $domain -b bing,duckduckgo,yahoo,qwant -f ~/Documents/theHarvester/$timestamp-$domain
open ~/Documents/theHarvester/
exit;;
$opt5 )
domain=$(zenity --entry --title "Carbon14" --text "Domain name (WITHOUT 'HTTPS://')")
mkdir ~/Documents/Carbon14/
cd ~/Downloads/Programs/Carbon14/
python3 carbon14.py https://$domain > ~/Documents/Carbon14/$domain.txt
open ~/Documents/Carbon14/
exit;;
esac

To make a desktop shortcut of the previous script:

[Desktop Entry]
Type=Application
Name=Domains Tool
Categories=Network;OSINT
Exec=/home/osint/Documents/scripts/domains.sh
Icon=/home/osint/Documents/icons/domains.png
Terminal=true

HTTrack

Makes an exact copy of a static website.

Installation:

sudo apt install -y httrack webhttrack

GUI: webhttrack

Terminal version: httrack

Subdomains and Directories

censys-subdomain-finder

https://github.com/christophetd/censys-subdomain-finder

It should return any subdomain that has ever been issued an SSL certificate by a public CA, which makes it a very interesting option for passive scanning.

To get API credentials, register at https://censys.io/register, then go to https://censys.io/register/account/api/, copy the API ID and Secret values, and run the following commands to make the script work:

export CENSYS_API_ID=...

export CENSYS_API_SECRET=...

To perform the queries:

python censys-subdomain-finder.py example.com -o subdomains.txt

Robots.txt

Practically every professional website has a robots.txt file at the "root" of the website. This file is not visible from any of the web pages at the site. It is present in order to provide instructions to search engines that crawl the website looking for keywords. These instructions identify files and folders within the website that should not be indexed by the search engine.

They usually provide insight into which areas of the site are considered sensitive by the owner.

To query:

site:cnn.com robots ext:txt

We can also query the Wayback Machine to display changes of this file over time.
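A quick way to pull the current file and list its archived versions from Terminal (this uses the same Wayback Machine CDX endpoint referenced later in this section; cnn.com is only a placeholder target):

curl -s https://www.cnn.com/robots.txt

curl -s "https://web.archive.org/cdx/search/cdx?url=cnn.com/robots.txt&output=text&fl=timestamp,original&collapse=digest" | head -25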

Other useful tools

- PentestTools (pentest-tools.com/information-gathering/find-subdomains-of-domain)

This unique tool performs several tasks that attempt to locate hidden pages on a domain. First, it performs a DNS zone transfer, which will often fail. It then uses a list of numerous common subdomain names and attempts to identify any that are present. If any are located, it notes the IP address assigned to that subdomain and scans all 254 IP addresses in that range.

- Columbus Project (columbus.elmasy.com)

curl -H "Accept: text/plain" "https://columbus.elmasy.com/lookup/cnn.com"

If these options do not provide the results you need, consider SubDomain Finder (subdomainfinder.c99.nl) and DNS Dumpster (dnsdumpster.com). These services rely on host records from the domain registrar to display potential subdomains.

- Amass

This is a brute-force option.

Installation:

sudo snap install amass

Useful commands:

amass intel -whois -ip -src -active -d inteltechniques.com

amass enum -src -ip -passive -d inteltechniques.com

To check your configuration, list the available data sources:

amass enum -list

Enable any free APIs that are NOT yet enabled (this is needed for the tool to be effective):

amass enum -list | grep -v "\*"

For this tool, the best paid APIs are:

  • SecurityTrails

  • SpiderFootHX

And the best free APIs (put them all in ~/.config/amass/config.ini; a sample snippet follows this list):

  • FacebookCT

  • PassiveTotal

  • Shodan
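A sample config.ini snippet in the Amass 3.x format (verify against the example config shipped with your release, as Amass 4.x moved to a YAML configuration; all keys below are placeholders):

[data_sources.Shodan]
[data_sources.Shodan.Credentials]
apikey = YOUR_SHODAN_KEY

[data_sources.PassiveTotal]
[data_sources.PassiveTotal.Credentials]
username = YOUR_PT_USERNAME
apikey = YOUR_PT_KEY

[data_sources.FacebookCT]
[data_sources.FacebookCT.app1]
apikey = YOUR_FB_APP_ID
secret = YOUR_FB_APP_SECRET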

- sublist3r

This tool only finds common subdomains.

Installation:

cd ~/Downloads/Programs

git clone https://github.com/aboul3la/Sublist3r.git

cd Sublist3r

python3 -m venv Sublist3rEnvironment

source Sublist3rEnvironment/bin/activate

pip install -r requirements.txt

deactivate

Usage:

python3 sublist3r.py -d inteltechniques.com

- Photon

This tool will search for internal pages.

Installation:

cd ~/Downloads/Programs

git clone https://github.com/s0md3v/Photon.git

cd Photon

python3 -m venv PhotonEnvironment

source PhotonEnvironment/bin/activate

pip install -r requirements.txt

deactivate

Usage:

python3 photon.py -u inteltechniques.com -l 3 -t 100

- httpx

Useful for probing a list of subdomains for live web servers and getting working URLs with the https:// prefix:

cat subdomains.txt | httpx -silent

httpx -l subdomains.txt -ports 80,8080,8000,8443,8888,10000 -threads 200 > subdomains_alive.txt

- subfinder

A fast passive subdomain enumeration tool.

subfinder -d example.com

echo example.com | subfinder -silent | httpx -silent

subfinder -dL domains.txt -all -recursive -o subdomains.txt

- katana

A crawler that finds directories.

cat domains.txt | httpx | katana --silent

- gau

gau (getallurls) fetches known URLs from the providers named below (the Wayback Machine, Common Crawl, OTX, and URLScan):

gau --mt text/html,application/json --providers wayback,commoncrawl,otx,urlscan --verbose example.com

- censys-api.nse (Nmap script)

cp censys-api.nse /usr/share/nmap/scripts/

export CENSYS_API_ID=…

export CENSYS_API_SECRET=…

nmap -sn -Pn -n --script censys-api scanme.nmap.org

- censys_search.py

https://github.com/sparcflow/HackLikeALegend/blob/master/py_scripts/censys_search.py

! Deprecated: it uses v1 of the Censys API, but could be updated with minor changes.

- chaos.projectdiscovery.io

API to obtain subdomains for a given domain.

To obtain subdomains through Shodan.

- crt.sh

curl -s "https://crt.sh/?q=example.com&output=json" | jq -r '.[].name_value' | grep -Po '(\w+\.\w+\.\w+)$'

- anew

Once new subdomains have been obtained, anew merges them into the existing list without duplicates:

https://github.com/tomnomnom/anew

cat new-subdomains | anew subdomains.txt

Current Domain Registration and Hosting

- dig command

We use the dig command to return the IP address of a website:

dig +short www.example.com

The +short flag shortens the output.

- whois command

We use a whois lookup to figure out who hosts the main website.

whois {IP obtained before}

- query_whois.py

Some of these websites might be hosted by third parties, and others by the company we are investigating.

We can expose the site hosts for a list of domains with the following script:

This script loops through multiple whois calls and extracts relevant information into a readable CSV file.

python query_whois.py domains.txt | column -s "," -t
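If the script is not available, a rough bash equivalent can approximate it (a minimal sketch; whois field names differ between registries, so the grep patterns may need adjusting):

#!/usr/bin/env bash
# Loop a list of domains through whois and emit a simple CSV (domain,registrar,org,country).
echo "domain,registrar,org,country"
while read -r domain; do
  record=$(whois "$domain")
  registrar=$(echo "$record" | grep -im1 'Registrar:' | cut -d: -f2- | xargs)
  org=$(echo "$record" | grep -im1 -E 'OrgName|Registrant Organization' | cut -d: -f2- | xargs)
  country=$(echo "$record" | grep -im1 'Country:' | cut -d: -f2- | xargs)
  echo "$domain,$registrar,$org,$country"
done < domains.txt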

Then we could try getting reachable services and their versions with nmap:

nmap -sV -p- 1.1.1.1-254 (Here we can select the range of IPs discovered)

- SubW0iScan.py

For a list of subdomains, this obtains the active domains, hosting name, domain IP, IP range, and country.

https://github.com/Sergio-F20/SubW0iScan

python SubWh0iScan.py -d subdomains-list.txt -o subdomains-info.csv

- ViewDNS Whois (viewdns.info/whois)

This service provides numerous online searches related to domain and IP address lookups.

ViewDNS will occasionally block my connection if I am connected to a VPN. An alternative Whois research tool is who.is.

https://viewdns.info/whois/?domain=cnn.com

- ViewDNS Reverse IP (viewdns.info/reverseip)

Next, you should translate the domain name into the IP address of the website. ViewDNS will do this, and display additional domains hosted on the same server.

If the results had included domains from websites all over the world without a common theme, it would have indicated that this was a shared server, which is very common.

https://viewdns.info/reverseip/?host=cnn.com&t=1

- ViewDNS Reverse Whois (viewdns.info/reversewhois)

This utility attempts to search the domain in order to locate other domains owned by the same registrant.

If the domain possessed private registration, this technique would fail.

https://viewdns.info/reversewhois/?q=cnn.com

- ViewDNS Port Scanner (viewdns.info/portscan)

This online port scanner looks for common ports that may be open.

https://viewdns.info/portscan/?host=cnn.com

- ViewDNS IP History (viewdns.info/iphistory)

This tool translates a domain name to IP address and identifies previous IP addresses used by that domain.

https://viewdns.info/iphistory/?domain=cnn.com

- ViewDNS DNS Report (viewdns.info/dnsreport)

This option presents a complete report on the DNS settings for the target domain.

https://viewdns.info/dnsreport/?domain=cnn.com

Historical Domain Registration

Many domains now possess private registration. If you query a domain and see a name entry such as "WhoisGuard Protected", you know that the domain is protected.

One way to reveal the owner is through historical domain records. If the domain has been around a while, there is a very good chance that it was not always private.

- Whoxy (whoxy.com)

This is one of the very few premium services which offer a decent free tier.

https://www.whoxy.com/inteltechniques.com

The search option in the upper right allows us to query email addresses, names, and keywords. This can be extremely valuable when you do not know which domain names your target has owned.

It allows a free API demo at https://www.whoxy.com/whois-history/demo.php, but you will be rate-limited if it detects abuse. They do not offer a free trial of their API, but the fees are minimal. The current price is $2.00 for 400 queries. The following is the URL structure:

curl 'https://api.whoxy.com/?key=XXXX&history=inteltechniques.com' | python3 -m json.tool

- Whoisology (whoisology.com)

Like Whoxy, it provides historical domain records as a reverse-domain search utility.

Search field requests a domain or email address.

Once logged in as a free user, you receive much more detail within your searches.

It has the ability to immediately search for additional domains associated within any field of this data.

This type of cross-reference search has not been found through many other services. Another powerful feature of Whoisology is the historical archives. This service constantly scans for updates to domain registrations. When new content is located, it documents the change and allows you to search the previous data.

- WhoisXMLAPI (whois.whoisxmlapi.com)

Sign up for a free account and confirm your email address. Click on profile and select the "My Products" option and make note of your API key.

To make a query (output JSON format):

curl 'https://whois-history.whoisxmlapi.com/api/v1?apiKey=at_0vPfsSUdf1ZpiCxc5&domainName=inteltechniques.com&mode=purchase' | python3 -m json.tool

or

https://whois-history.whoisxmlapi.com/api/v1?apiKey=at_0vPfsSUdf1ZpiCxc5&domainName=inteltechniques.com&mode=purchase

- Archive.org Domain Registration Data (web.archive.org)

We can query the Wayback Machine for the exact historical URL of a domain registration.

The following URLs display any results:

https://web.archive.org/web/http://www.who.is/whois/cnn.com/

https://web.archive.org/web/https://whois.domaintools.com/cnn.com

https://web.archive.org/web/https://www.whoxy.com/cnn.com

https://web.archive.org/web/https://domainbigdata.com/cnn.com

https://web.archive.org/web/https://whoisology.com/cnn.com

- SecurityTrails (https://securitytrails.com/)

Historical Content Archives

- Archive Box (github.com/ArchiveBox/ArchiveBox)

To install and initialize the application:

  • mkdir ~/Downloads/Programs/archivebox

  • cd ~/Downloads/Programs/archivebox

  • python3 -m venv archiveboxEnvironment

  • source archiveboxEnvironment/bin/activate

  • pip install archivebox

  • deactivate

  • mkdir ~/Documents/archivebox

  • cd ~/Documents/archivebox

  • archivebox init

Once installed, add the target website, launch the server, and open the database within your browser:

  • cd ~/Documents/archivebox

  • archivebox add 'https://notla.com'

  • archivebox server 0.0.0.0:8000

  • firefox http://0.0.0.0:8000

Archive Box captured the target with SingleFile through Chrome; generated a PDF and screenshot; performed a WGET of the live page; extracted the page from Archive.org; fetched all HTML code; and downloaded any media files.

- Custom Internet Archive Tool

This script includes the tool from the previous "Archive Box" section and the "Change Detection" tool. It essentially takes advantage of a Python script called "waybackpy".

Executing the script conducts the following tasks:

  • Make a directory in the Documents folder for data and enter it.

  • Download all known URLs indexed by Internet Archive into a text file.

  • Download the oldest known archive URL into a text file.

  • Append the file with the newest archive URL.

  • Append the file with URLs from the past ten years.

  • Remove duplicates and sort by date.

  • Generate screen captures of all unique links with only one thread (slower).

  • Download source code of the oldest and newest archives.

The output will be text and HTML files, each of which is an archived home page of the target website from a different date.

Archives.sh script:

#!/usr/bin/env bash
opt1="Launch HTTrack"
opt2="Launch WebHTTrack"
opt3="Launch ChangeDetection"
opt4="Internet Archive Tool"
opt5="Archive Box"
timestamp=$(date +%Y-%m-%d:%H:%M)
domainmenu=$(zenity  --list  --title "Archives Tool" --radiolist  --column "" --column "" TRUE "$opt1" FALSE "$opt2" FALSE "$opt3" FALSE "$opt4" FALSE "$opt5" --height=400 --width=300)
case $domainmenu in
$opt1 )
httrack
exit;;
$opt2 )
webhttrack
exit;;
$opt3 )
mkdir ~/Documents/ChangeDetection
changedetection.io -d ~/Documents/ChangeDetection -p 5000 & firefox http://127.0.0.1:5000
exit;;
$opt4 )
url=$(zenity --entry --title "Internet Archive Tool" --text "Enter Target URL")
mkdir ~/Documents/waybackpy
mkdir ~/Documents/waybackpy/$url
cd ~/Documents/waybackpy/$url
waybackpy --url "$url" --known_urls
waybackpy --url "$url" --oldest >> $url.txt
waybackpy --url "$url" --newest >> $url.txt
waybackpy --url "$url" --near --year 2010 >> $url.txt
waybackpy --url "$url" --near --year 2011 >> $url.txt
waybackpy --url "$url" --near --year 2012 >> $url.txt
waybackpy --url "$url" --near --year 2013 >> $url.txt
waybackpy --url "$url" --near --year 2014 >> $url.txt
waybackpy --url "$url" --near --year 2015 >> $url.txt
waybackpy --url "$url" --near --year 2016 >> $url.txt
waybackpy --url "$url" --near --year 2017 >> $url.txt
waybackpy --url "$url" --near --year 2018 >> $url.txt
waybackpy --url "$url" --near --year 2019 >> $url.txt
waybackpy --url "$url" --near --year 2020 >> $url.txt
waybackpy --url "$url" --near --year 2021 >> $url.txt
sort -u -i $url.txt -o $url.sorted.txt
webscreenshot -r chrome -q 100 -i $url.sorted.txt -w 1
waybackpy --url "$url" -o > oldest.html
waybackpy --url "$url" -n > newest.html
open ~/Documents/waybackpy/$url >/dev/null
exit;;
$opt5 )
url=$(zenity --entry --title "Archive Box" --text "Enter FULL Target URL")
cd ~/Documents/archivebox
archivebox add "$url"
archivebox server 0.0.0.0:8000 &
sleep 3
firefox http://0.0.0.0:8000/
exit;;
esac

- The Internet Archive

More advanced tools have been explained; however, to query through a URL:

https://web.archive.org/cdx/search/cdx?url=cnn.com/*&output=text&fl=original&collapse=urlkey
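The same CDX endpoint accepts extra parameters to narrow the output, for example by HTTP status and date range:

https://web.archive.org/cdx/search/cdx?url=cnn.com/*&output=text&fl=timestamp,original&collapse=urlkey&filter=statuscode:200&from=2015&to=2020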

- Archive Today (archive.is/archive.fo/archive.md)

This service also collects copies of websites and ignores all requests for deletion.

https://archive.is/*.inteltechniques.com

- Mementoweb (mementoweb.org)

This service offers a "Time Travel" option which presents archives of a domain from several third-party providers.

http://timetravel.mementoweb.org/list/19991212110000/http://inteltechniques.com

- Library of Congress (webarchive.loc.gov)

This option allows you to search by domain to discover all publicly available content in the Library of Congress Web Archives.

https://webarchive.loc.gov/all/*/http://inteltechniques.com

- Portuguese Web Archive (arquivo.pt)

https://arquivo.pt/page/search?hitsPerPage=100&query=site%3Ainteltechniques.com

Screen Captures & Monitoring

Historical Screen Captures

"Custom Internet Archive Tool" also generates screenshoots, below is a list of other tools that can generate screenshots of previous versions of a website.

- Search Engine Cache

This should be conducted first in order to identify any recent cached copies. Google is going to possess the most recent cache.

https://webcache.googleusercontent.com/search?q=cache:inteltechniques.com

- Website Informer (website.informer.com)

Screen capture available to the right of a search result.

- URLScan (urlscan.io)

Similar to the previous option, but the screen captures are often unique.

- Easy Counter (easycounter.com)

The screen capture presented here was very similar to Website Informer, but it was cropped slightly differently.

- Domain Tools (whois.domaintools.com)

These screen captures are in high resolution and current.

- Domains App (dmns.app)

This service offers the highest resolution image.

https://files.dmns.app/screenshots/inteltechniques.com.jpg

- Hype Stat (hypestat.com)

The lowest-quality option, but typically shows an older image.

- Carbon Dating (carbondate.cs.odu.edu)

This free service provides a summary of available online caches of a website, and displays an estimated site creation date based on the first available capture.

http://carbondate.cs.odu.edu/#inteltechniques.com

Current Screen Captures

- webscreenshot Python tool

For a list of URLs, we can use the webscreenshot tool, which grabs a screenshot of the main page of each URL for a quick manual review:

pip install webscreenshot

webscreenshot -i urls.txt

Monitoring Through Screen Captures and Change Detection Tools

Once you locate a website of interest, it can be time consuming to continually visit the site looking for any changes. With large sites, it is easy to miss the changes due to the enormous amount of content to analyze.

- Follow That Page (followthatpage.com)

Enter the address of the target page of interest, as well as an email address where you can be reached. This service will monitor the page and send you an email if anything changes. Anything highlighted is either new or modified content. Anything that has been stricken through indicates deleted text.

It does not work well on some social networks.

- Visual Ping (visualping.io)

Robust options.

Allows you to select a target domain for monitoring.

Visual Ping will generate a current snapshot of the site and you can choose the level of monitoring. I recommend hourly monitoring and notification of any "tiny change". It will then check the domain hourly and email you if anything changes. If you are watching a website that contains advertisements or any dynamic data that changes often, you can choose to exclude that portion of the page.

- Change Detection (github.com/dgtlmoon/changedetection.io)

The services above can be inappropriate for sensitive investigations. This is where Change Detection can help. It is locally installed and available only to you.

The following configures the application:

  • mkdir ~/Downloads/Programs/changedetection

  • cd ~/Downloads/Programs/changedetection

  • python3 -m venv changedetectionEnvironment

  • source changedetectionEnvironment/bin/activate

  • pip install changedetection.io

  • deactivate

Then to launch it:

  • mkdir ~/Documents/ChangeDetection

  • changedetection.io -d ~/Documents/ChangeDetection -p 5000 & firefox http://127.0.0.1:5000

This tool provides the same service as the online options, if not better, but gives you full control within your Linux machine.

Email Address Identification

Identify any email addresses associated with a specific domain. This can lead to the discovery of employees and can be used for further breach data queries.

- Hunter (hunter.io)

This tool can also accept a domain name as a search term, and provides any email addresses that have been scraped from public web pages.

https://hunter.io/try/search/cnn.com?locale=en

It also includes the online source of the information which was scraped.

- Website Informer (website.informer.com)

Once Hunter is exhausted, switch to Website Informer. The results are typically fewer, but they are not redacted.

https://website.informer.com/cnn.com/emails

- SkyMem (skymem.info)

This service also displays full email addresses which are not redacted.

https://www.skymem.info/srch?q=cnn.com

These services will likely never provide the same results. This is why it is so important to exhaust all options in order to acquire the most data possible.

Corporate Mail Service Discovery

Once we have a domain name, it is likely that, if it belongs to an organization, it uses a hosted mail service such as Office 365, Google Workspace (G Suite), or Amazon WorkMail. The MX and CNAME checks below can also be run locally with dig; a sketch follows this list:

- Office 365 (O365):

MX Record Check:

  • Use MXToolbox. (https://mxtoolbox.com/)

  • Enter the domain and check for MX records pointing to outlook.com, office365.com, or exchange.microsoft.com.

Login Page Check:

- Google Suite (G Suite):

MX Record Check:

  • Use MXToolbox. (https://mxtoolbox.com/)

  • Enter the domain and look for MX records typically pointing to aspmx.l.google.com.

Login Page Check:

- Amazon WorkMail:

MX Record Check:

  • Use MXToolbox. (https://mxtoolbox.com/)

  • Enter the domain and search for MX records related to amazonses.com or workmail.

CNAME Record Check:

  • Look for CNAME records pointing to workmail.awsapps.com. This can be done with MXToolbox or dig yourdomain.com CNAME
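A minimal dig-based sketch of the same checks (example.com is a placeholder; the match patterns are the provider records listed above):

#!/usr/bin/env bash
# Guess the hosted mail provider of a domain from its MX records.
domain="example.com"
mx=$(dig +short MX "$domain")
echo "$mx"
case "$mx" in
  *outlook.com*|*office365*)         echo "Likely Office 365";;
  *aspmx.l.google.com*|*google*)     echo "Likely Google Workspace (G Suite)";;
  *amazonses*|*awsapps*|*amazonaws*) echo "Likely Amazon WorkMail / SES";;
  *)                                 echo "No obvious hosted provider in MX records";;
esac
# CNAME check for Amazon WorkMail, as described above
dig +short CNAME "$domain"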

Domain Analytics

Domain analytics are commonly installed on websites in order to track usage information. This data often identifies the city and state from which a visitor connects; details about the web browser the person is using; and keywords that were searched to find the site. Only the owner of the website can view this analytic data. Analytics search services determine the specific number assigned to the analytics of a website. If the owner of this website uses analytics to monitor other websites, the analytics number will probably be the same.

Additionally, it will try to identify user specific advertisements stored on one site that are visible on others. It will reverse search this to identify even more websites that are associated with each other.
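Before turning to the services below, the identifiers themselves can often be pulled straight from the target's source code (a minimal sketch; the regex covers common Google Analytics, GA4, and AdSense ID formats, and inteltechniques.com is only a placeholder):

curl -s https://inteltechniques.com | grep -Eo 'UA-[0-9]+-[0-9]+|G-[A-Z0-9]{8,}|pub-[0-9]{10,}' | sort -u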

- Blacklight (themarkup.org/blacklight)

To know about any malicious activity embedded into a target website.

- Spy On Web (spyonweb.com)

Will search a domain name and identify the web server IP address and location. It identifies and cross-references website analytic data that it locates on a target domain.

- Analyze ID (analyzeid.com)

Analyze ID performs the same type of query and attempts to locate any other domains that share the same analytics or advertisement user numbers as your target.

- Hacker Target (hackertarget.com/reverse-analytics-search)

This service stands out due to its immediate availability of historic analytics IDs across multiple subdomains.

- DNSLytics (dnslytics.com/reverse-analytics)

Enter the analytics ID found with any of the previous techniques and you may find associations not present within other options.

https://dnslytics.com/reverse-analytics/inteltechniques.com

https://dnslytics.com/reverse-adsense/inteltechniques.com

SSL Certificates

- CRT.sh (crt.sh)

https://crt.sh/?q=inteltechniques.com

Most of the data identifies various certificate updates which do not provide any valuable information. However, the history also shows the certificates which were actually purchased. Clicking on any of the entries displays the other domains secured with that option, including dates of activity. This identifies other domains to investigate.

- Censys

Very valuable when searching for SSL Certificates.

Website Source Code

There are other scanners included in the Enumeration section of “Web Pentesting”.

- Nerdy Data (search.nerdydata.com)

Nerdy Data is a search engine that indexes the source code of websites. I use this to locate websites which steal the source code of my search tools and present them as their own.

If you have located a Google Analytics ID, AdSense ID, or Amazon ID of a website using the previous methods, you should consider searching this number through Nerdy Data.

- Built With (builtwith.com)

Entering a domain into the Built With search immediately identifies the web server operating system (Linux), email provider (DreamHost), web framework (PHP, WordPress), WordPress plugins, website analytics, video services, mailing list provider, blog environment, and website code functions.

- Stats Crop (statscrop.com)

SEO & SEM

Search Engine Optimization (SEO) applies various techniques affecting the visibility of a website or a web page in a search engine's results.

Search Engine Marketing (SEM) websites provide details valuable to those responsible for optimizing their own websites. SEM services usually provide the overall ranking of a website; the keywords that are often searched; backlinks; and referrals from other websites. SEO specialists use this data to determine potential advertisement relationships and to study their competition. Online investigators can use this to collect important details that are never visible on the target websites.

- Similar Web (similarweb.com)

The most comprehensive of the free options. Much of this data is "guessed" based on many factors.

- Moon Search (moonsearch.com)

It provides a recent screen capture of the target domain, plus its ranking, backlinks, IP address, server technologies, and analytics identifiers.

Additional websites which provide a similar service to Moon Search include Search Metrics (suite.searchmetrics.com), SpyFu (spyfu.com), and Majestic (majestic.com).

- Shared Count (sharedcount.com)

It searches your target domain and identifies its popularity on social networks such as Facebook and Twitter.

- Reddit Domains (reddit.com)

If your target website has ever been posted on Reddit, you can retrieve a listing of the incidents.

- Host.io Backlinks (host.io)

A backlink checker. It offers many additional backlinks which are not present within the previous option.

https://host.io/backlinks/inteltechniques.com

- Host.io Redirects (host.io)

This option displays any URLs which are forwarding their traffic to your target site.

https://host.io/redirects/inteltechniques.com

A summary of all details about a domain stored with Host.io can be found via the following direct URL.

https://host.io/inteltechniques.com

- Small SEO Tools: Plagiarism Checker (smallseotools.com/plagiarism-checker)

To make sure the content is original.

You can use this tool by copying any questionable text from a website and pasting it into this free tool. It will analyze the text and display other websites that possess the same words.

The benefit of using this tool instead of Google directly is that it will structure several queries based on the supplied content and return variations of the found text.

Another option for this type of search is Copy Scape (copyscape.com).

- Visual Site Mapper (visualsitemapper.com)

This service analyzes the domain in real time, looking for linked pages within that domain. It provides an interactive graph that shows whether a domain has a lot of internal links that you may have missed. Highlighting any page will display the internal pages that connect to the selected page. This helps identify pages that are most "linked”.

- XML Sitemaps (xml-sitemaps.com)

This service "crawls" a domain and creates an XML text file of all public pages.

This is a great companion to visual site mappers, as the text can be easily imported into reporting systems. This often presents previously unknown content.
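If the target already publishes its own sitemap at /sitemap.xml, a first pass can be pulled directly (large sites often use nested sitemap indexes, so this only scratches the surface; cnn.com is a placeholder):

curl -s https://www.cnn.com/sitemap.xml | grep -Eo '<loc>[^<]+</loc>' | sed -e 's/<[^>]*>//g' | head -25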

- github-subdomains

https://github.com/gwen001/github-subdomains

To obtain subdomains of the target referenced on GitHub.

The GitHub search API returns results inconsistently (effectively at random), so many API keys and multiple runs are needed for complete coverage.

Domain Reputation

https://www.spam.org/search?type=domain&convert_block=1&group_ips=1&data=inteltechniques.com

https://spamdb.org/blacklists?q=inteltechniques.com

https://www.mywot.com/en/scorecard/inteltechniques.com

Data Breaches and Leaks

Domains can possess valuable breach data.

- Dehashed (dehashed.com)

https://dehashed.com/search?query=inteltechniques.com

- IntelX (intelx.io)

Presents partial Pastebin files which include your target domain.

https://intelx.io/?s=inteltechniques.com

A free trial is required to see all results.

- Leakpeek (leakpeek.com)

Requires a free account to search domains.

- Phonebook (phonebook.cz)

Searches a domain for any email addresses which exist within publicly available breaches.

Shortened URLs

Social networking sites have driven the popularity of shortened URLs.

These services create a new URL, and simply point anyone to the original source when clicked.

Bitly allows access to metadata by including a "+" after the URL.

Tiny.cc adds a "~" to the end of a link to display metadata.

Google uses the "+" at the end.

Bit.do provides the most extensive data. They use a "-" after the URL.
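For example, with a hypothetical Bitly short code (the code below is a placeholder, not a real link):

https://bit.ly/3abcXYZ (forwards to the destination)

https://bit.ly/3abcXYZ+ (displays the Bitly metadata page instead)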

- CheckShortURL (checkshorturl.com)

Catch-all service.

- UNFURL (dfir.blog/unfurl/)

See forwarding details, timestamps, and unique identifiers without visiting the site.

Very beneficial for the investigation of magnet Torrent links.

WordPress Data

WordPress enumeration and exploitation steps are described in "Web Pentesting > CMS's > WordPress". Below are additional OSINT resources to use when facing a WordPress target without needing to install any software.

https://gf.dev/wordpress-security-scanner

https://hackertarget.com/wordpress-security-scan/

Cloudflare

While identifying web hosts and IP addresses behind your target domain, you are likely to encounter sites hiding behind Cloudflare. This company provides security for websites which often prevents online attacks and outages. They also help keep web host and owner details hidden.

Investigative options:

Threat Data

- Virus Total (virustotal.com)

The "Details" menu provides the most public data.

The Whois and DNS records should be similar to other sites.

The "Categories area provides the general topics of the target site.

The "HITPS Certificate" section can become interesting very quickly.

The "Subject Alternative Name" portion of this section identifies additional domains and subdomains associated with the SSL certificate of your target site.

The "Relations" tab identifies many new subdomains.

The "Files" section displays unique content from practically any other resource. It identifies files downloaded from the target site for analvsis and files which have a reference to the target site.

The “Community” section is where members of the VirusTotal community can leave comments or experiences in reference to the target.

Always use a virtual machine without network access if you plan to download or open anything found here.

- Threat Intelligence (threatintelligenceplatform.com)

The "Connected Domains" area identifies any external domains which are linked from your source, often display hidden links to third-party services otherwise unknown.

"Potentially dangerous content" and "Malware detection" sections. Both of these offer a historical view into an malicious content hosted on the target domain.

- Threat Crowd (threatcrowd.org)

This service provides a unique view of the domains associated with your target.

- Censys (censys.io)

The moment a certificate is issued, it is provided in real-time to Censys.

Click the "Details button on the summary page, search "alt_name within the results.

HTTP body text information is stored within the "Details" page of the HTTP and HTTPS sections.

Capture this data in the event of a modified or removed target web page.

Advanced DNS Tools

- Domains App (dmns.app)

We can also use this resource to see much more DNS details.

https://dmns.app/domains/michaelbazzell.com/dns-records

For many domains which apply an extra level of email security and verification, you will find a legitimate email address which may have escaped your other analysis.

- OSINT.sh (osint.sh)

- IntelTechniques Domain Tool

It can automate queries across the most beneficial options. The final section provides easy access to shortened URL metadata.

Code at Domain.html.

- dnsrecon.py

python3 dnsrecon.py -d example-test.com

ASNs and Cloud Assets

An autonomous system (AS) is a very large network or group of networks with a single routing policy. Each AS is assigned a unique ASN, which is a number that identifies the AS.

Autonomous System Numbers are given to sufficiently large networks.

ASNs help track down an entity's IT infrastructure.

! These do not always give a complete picture of a network, due to rogue assets and cloud hosting.

To locate ASNs:

https://bgp.he.net/

Validate it against ARIN: https://whois.arin.net/rest/asn/AS{id}

Search for IPs associated with ASN: https://raw.githubusercontent.com/nitefood/asn/master/asn
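As a local alternative, the RADb whois server can list the prefixes announced by an ASN (AS14421 is reused here as a placeholder, matching the Naabu example below):

whois -h whois.radb.net -- '-i origin AS14421' | grep -Eo '([0-9.]+){4}/[0-9]+' | sort -u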

ARIN and RIPE are regional internet registries which allow full-text searches of their address space.

US -> https://whois.arin.net/ui/query.do

EU, Central Asia -> https://apps.db.ripe.net/db-web-ui/#/fulltextsearch

Validate that the IP range is owned by the target (using ARIN or the automated script https://github.com/Mr-Un1k0d3r/SearchIPOwner).

We can also check a list of IP addresses against cloud provider IP space with ip2provider (https://github.com/oldrho/ip2provider)

- Karma v2

ASNs can also be located through Karma v2 (Shodan API utility):

bash karma_v2 -d <DOMAIN.TLD> -l <INTEGER> -asn

- Naabu

Then, to scan ports for a given ASN:

echo AS14421 | naabu -p 80,443

To enumerate cloud instances associated with a domain:

./cloud_enum.py -k example-company -k example-company.com -k example-product-name

To scan the entire cloud range within two hours.

To get IP ranges: https://github.com/lord-alfred/ipranges/blob/main/all/ipv4_merged.txt
