Websites & Domains
Website Content Exfiltration Tools
EyeWitness
- Installation
cd ~/Downloads/Programs
git clone https://github.com/FortyNorthSecurity/EyeWitness.git
cd EyeWitness/Python/setup && sudo ./setup.sh
cd ~/Documents/scripts
sed -i 's/ChrisTruncer/FortyNorthSecurity/g' updates.sh
cd ~/Downloads/Programs
wget https://github.com/mozilla/geckodriver/releases/download/v0.32.0/geckodriver-v0.32.0-linux-aarch64.tar.gz
tar -xvzf geckodriver* && chmod +x geckodriver
sudo mv geckodriver /usr/local/bin
- Usage
Open your Applications menu and launch Text Editor.
Type or paste URLs, one per line, and save the file to your Desktop as "sites.txt".
Open Terminal and enter the following commands:
cd ~/Downloads/Programs/EyeWitness/Python
./EyeWitness.py -f ~/Desktop/sites.txt --web -d ~/Documents/Eyewitness/
The results include screen captures of each target website and detailed information including the server IP address, page title, modification date, and full source code of the page.
- Custom captures.sh script
To make a desktop shortcut of the previous script:
GoWitness (https://github.com/sensepost/gowitness)
The Harvester
Searches a supplied domain with the intent of providing email addresses associated with the target.
Installation:
cd ~/Downloads/Programs
git clone https://github.com/laramies/theHarvester.git
cd theHarvester
python3 -m venv theHarvesterEnvironment
source theHarvesterEnvironment/bin/activate
sudo pip install -r requirements.txt
deactivate
Usage:
theHarvester -h
theHarvester -d <domain>
We can specify which source we want to access for data by using the -b switch, such as Baidu, Bing, Bing API, Certspotter, CRTSH, DNSdumpster, Dogpile, and many others.
If you want to use all of these resources, you can simply pass "-b all" on the command line.
In some cases, you will want to use a service's API (application programming interface). To do so, open /etc/theHarvester/api-keys.yaml in any text editor.
Example:
theHarvester -d tesla.com -b all -f /home/kali/tesla_results2
python3 theHarvester.py -d inteltechniques.com -b bing
Carbon14
It searches for any images hosted within the page and analyzes the metadata for creation dates.
Installation:
cd ~/Downloads/Programs
git clone https://github.com/Lazza/Carbon14
cd Carbon14
python3 -m venv Carbon14Environment
source Carbon14Environment/bin/activate
sudo pip install -r requirements.txt
deactivate
Usage:
python3 carbon14.py https://inteltechniques.com
Metagoofil
Locates and downloads documents publicly hosted on a target domain.
Installation
cd ~/Downloads/Programs
git clone https://github.com/opsdisk/metagoofil.git
cd metagoofil
python3 -m venv metagoofilEnvironment
source metagoofilEnvironment/bin/activate
sudo pip install -r requirements.txt
deactivate
Basic usage:
metagoofil -d sans.org -t doc,pdf -l 20 -n 10 -o sans -f html
python3 metagoofil.py -d cisco.com -t pdf -o ~/Desktop/cisco/
python3 metagoofil.py -d cisco.com -t docx,xlsx -o ~/Desktop/cisco/
If I already possess numerous documents on my computer, I create a metadata CSV spreadsheet using ExifTool and then analyze that file (see the ExifTool sketch after the commands below).
If my target website possesses few documents, I download them manually through Google or Bing within a web browser.
If my target website possesses hundreds of documents, I use Metagoofil, but only download one file type at a time. If my target were cisco.com, I would execute the following commands in Terminal:
python3 metagoofil.py -d cisco.com -t pdf -o ~/Desktop/cisco/
python3 metagoofil.py -d cisco.com -t doc -o ~/Desktop/cisco/
python3 metagoofil.py -d cisco.com -t xls -o ~/Desktop/cisco/
python3 metagoofil.py -d cisco.com -t ppt -o ~/Desktop/cisco/
python3 metagoofil.py -d cisco.com -t xlsx -o ~/Desktop/cisco/
python3 metagoofil.py -d cisco.com -t pptx -o ~/Desktop/cisco/
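For the ExifTool step mentioned above, a minimal sketch, assuming the downloaded documents sit in ~/Desktop/cisco/ (the output filename is arbitrary):
# export all metadata from every document into one CSV for review
exiftool -csv -r ~/Desktop/cisco/ > ~/Desktop/cisco_metadata.csv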
Custom domains.sh script
*This script also contains tools from the "Subdomains and Directories" section.
To make a desktop shortcut of the previous script:
HTTrack
Make an exact copy of a static website
Installation:
sudo apt install -y httrack webhttrack
GUI: webhttrack
Terminal version: httrack
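A minimal terminal sketch, assuming the target is inteltechniques.com and the mirror should land under ~/Documents/httrack/ (both values are placeholders):
# mirror the site into a local folder for offline review
httrack "https://inteltechniques.com" -O ~/Documents/httrack/inteltechniques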
Subdomains and Directories
censys-subdomain-finder
https://github.com/christophetd/censys-subdomain-finder
It should return any subdomain that has ever been issued an SSL certificate by a public CA, which makes it a very interesting option for passive scanning.
To get the API credentials, register at https://censys.io/register, then go to https://censys.io/register/account/api/, copy the API ID and Secret values, and run the following commands so the script works:
export CENSYS_API_ID=...
export CENSYS_API_SECRET=...
To perform the queries:
python censys-subdomain-finder.py example.com -o subdomains.txt
Robots.txt
Practically every professional website has a robots.txt file at the "root" of the website. This file is not visible from any of the web pages at the site. It is present in order to provide instructions to search engines that crawl the website looking for keywords. These instructions identify files and folders within the website that should not be indexed by the search engine.
They usually provide insight into which areas of the site are considered sensitive by the owner.
To query:
site:cnn.com robots ext:txt
We can also query the Wayback Machine to display changes of this file over time.
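Beyond the search operator above, the file can also be pulled directly with curl, and the Wayback Machine CDX API can list its historical captures; a sketch using cnn.com as the running example:
# fetch the current robots.txt directly
curl -s https://cnn.com/robots.txt
# list archived captures of robots.txt over time (Wayback CDX API)
curl -s "https://web.archive.org/cdx/search/cdx?url=cnn.com/robots.txt&output=text&fl=timestamp,original&collapse=digest"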
Other useful tools
- PentestTools (pentest-tools.com/information-gathering/find-subdomains-of-domain)
This unique tool performs several tasks that will attempt to locate hidden pages on a domain. First, it performs a DNS zone transfer, which will often fail. It will then use a list of numerous common subdomain names and attempt to identify any that are present. If any are located, it will note the IP address assigned to that subdomain and will scan all 254 IP addresses in that range.
- Columbus Project (columbus.elmasy.com)
curl -H "Accept: text/plain" "https://columbus.elmasy.com/lookup/cnn.com"
If these options do not provide the results you need, consider SubDomain Finder (subdomainfinder.c99.nl) and DNS Dumpster (dnsdumpster.com). These services rely on Host Records from the domain registrar to display potential subdomains.
- amass (https://github.com/OWASP/Amass)
This is a brute force option.
Installation:
sudo snap install amass
Usefull commands:
amass intel -whois -ip -src -active -d inteltechniques.com
amass enum -src -ip -passive -d inteltechniques.com
For a correct config:
amass enum -list
Enable any free API that is NOT yet enabled (needed for the tool to be effective):
amass enum -list | grep -v "\*"
The best paid APIs for this tool:
SecurityTrails
SpiderFootHX
And the best free APIs (put them all in ~/.config/amass/config.ini):
FacebookCT
PassiveTotal
Shodan
- sublist3r
This tool only finds common subdomains.
Installation:
cd ~/Downloads/Programs
git clone https://github.com/aboul3la/Sublist3r.git
cd Sublist3r
python3 -m venv Sublist3rEnvironment
source Sublist3rEnvironment/bin/activate
sudo pip install -r requirements.txt
deactivate
Usage:
python3 sublist3r.py -d inteltechniques.com
- Photon
This tool will search for internal pages.
Installation:
cd ~/Downloads/Programs
git clone https://github.com/s0md3v/Photon.git
cd Photon
python3 -m venv PhotonEnvironment
source PhotonEnvironment/bin/activate
sudo pip install -r requirements.txt
deactivate
Usage:
python3 photon.py -u inteltechniques.com -l 3 -t 100
Useful for getting working URLs (with the https:// prefix) from a list of subdomains:
cat subdomains.txt | httpx -silent
httpx -l subdomains.txt -ports 80,8080,8000,8443,8888,10000 -threads 200 > subdomains_alive.txt
- Subfinder (https://github.com/projectdiscovery/subfinder)
Fast passive subdomain enumeration tool.
subfinder -d example.com
echo example.com | subfinder -silent | httpx -silent
subfinder -dL domains.txt -all -recursive -o subdomains.txt
- Assetfinder (https://github.com/tomnomnom/assetfinder)
- Katana (https://github.com/projectdiscovery/katana)
Crawler that finds directories.
cat domains.txt | httpx | katana --silent
- gau (https://github.com/lc/gau)
gau --mt text/html,application/json --providers wayback,commoncrawl,otx,urlscan --verbose example.com
- Censys nmap script (https://github.com/censys/nmap-censys):
cp censys-api.nse /usr/share/nmap/scripts/
export CENSYS_API_ID=…
export CENSYS_API_SECRET=…
nmap -sn -Pn -n --script censys-api scanme.nmap.org
- censys_search.py
https://github.com/sparcflow/HackLikeALegend/blob/master/py_scripts/censys_search.py
! Deprecated, using v1 of the API, could be updated with minor changes
- chaos.projectdiscovery.io
API to obtain subdomains for a given domain.
- Shosubgo (https://github.com/incogbyte/shosubgo)
To obtain subdomains through Shodan.
- crt.sh
curl -s "https://crt.sh/?q=%.example.com&output=json" | jq -r '.[].name_value' | grep -Po '(\w+\.\w+\.\w+)$'
- anew
Once new subdomains are obtained, append only the previously unseen ones to the master list:
https://github.com/tomnomnom/anew
cat new-subdomains | anew subdomains.txt
Current Domain Registration and Hosting
- dig command
We use the dig command to return the IP address of a website:
dig +short www.example.com
The +short flag shortens the output.
- whois command
We use whois lookup to figure out who hosts the main website.
whois {IP obtained before}
- query_whois.py
Some of these websites might be hosted by third parties and others by the company we are scanning itself.
We can expose the site hosts for a list of domains with the following script:
This script loops through multiple whois calls and extracts relevant information into a readable CSV file.
python query_whois.py domains.txt | column -s "," -t
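If the script itself is not at hand, a rough shell equivalent of the same idea (loop dig and whois over a list of domains and print CSV) might look like this; whois field names vary by registry, so the grep pattern is only a guess:
# domains.txt: one domain per line
while read -r domain; do
  ip=$(dig +short "$domain" | head -n1)
  org=$(whois "$ip" | grep -iE '^(OrgName|org-name|netname):' | head -n1 | cut -d: -f2- | xargs)
  echo "$domain,$ip,$org"
done < domains.txt | column -s "," -t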
Then we could try getting reachable services and their versions with nmap:
nmap -p- -sV 1.1.1.1-254
(Here we can select the range of IPs discovered)
- SubW0iScan.py
For a list of subdomains, obtain Active Domains, Hosting name, Domain IP, IP Range and Country.
https://github.com/Sergio-F20/SubW0iScan
python SubWh0iScan.py -d subdomains-list.txt -o subdomains-info.csv
- ViewDNS Whois (viewdns.info/whois)
This service provides numerous online searches related to domain and IP address lookups.
ViewDNS will occasionally block my connection if I am connected to a VPN. An alternative Whois research tool is who.is.
https://viewdns.info/whois/?domain=cnn.com
- ViewDNS Reverse IP (viewdns.info/reverseip)
Next, you should translate the domain name into the IP address of the website. ViewDNS will do this, and display additional domains hosted on the same server.
If the results had included domains from websites all over the world without a common theme, it would have indicated that this was a shared server, which is very common.
https://viewdns.info/reverseip/?host=cnn.com&t=1
- ViewDNS Reverse Whois (viewdns.info/reversewhois)
This utility attempts to search the domain in order to locate other domains owned by the same registrant.
If the domain possessed private registration, this technique would fail.
https://viewdns.info/reversewhois/?q=cnn.com
- ViewDNS Port Scanner (viewdns.info/portscan)
This online port scanner looks for common ports that may be open.
https://viewdns.info/portscan/?host=cnn.com
- ViewDNS IP History (viewdns.info/iphistory)
This tool translates a domain name to IP address and identifies previous IP addresses used by that domain.
https://viewdns.info/iphistory/?domain=cnn.com
- ViewDNS DNS Report (viewdns.info/dnsreport)
This option presents a complete report on the DNS settings for the target domain.
https://viewdns.info/dnsreport/?domain=cnn.com
Historical Domain Registration
Many domains now possess private registration. If you query a domain and see a name entry such as "WhoisGuard Protected", you know that the domain is protected.
One way to reveal the owner is through historical domain records. If the domain has been around a while, there is a very good chance that it was not always private.
- Whoxy (whoxy.com)
This is one of the very few premium services which offer a decent free tier.
https://www.whoxy.com/inteltechniques.com
The search option in the upper right allows us to query email addresses, names, and keywords. This can be extremely valuable when you do not know which domain names your target has owned.
It allows a free API demo at https://www.whoxy.com/whois-history/demo.php, but you will be rate-limited if it detects abuse. They do not offer a free trial of their API, but the fees are minimal. The current price is $2.00 for 400 queries. The following is the URL structure:
curl "https://api.whoxy.com/?key=XXXX&history=inteltechniques.com" | python3 -m json.tool
- Whoisology (whoisology.com)
Like Whoxy, it provides historical domain records as a reverse-domain search utility.
Search field requests a domain or email address.
Once logged in as a free user, you receive much more detail within your searches.
It has the ability to immediately search for additional domains associated within any field of this data.
This type of cross-reference search has not been found through many other services. Another powerful feature of Whoisology is the historical archives. This service constantly scans for updates to domain registrations. When new content is located, it documents the change and allows you to search the previous data.
- WhoisXMLAPI (whois.whoisxmlapi.com)
Sign up for a free account and confirm your email address. Click on profile and select the "My Products" option and make note of your API key.
To make a query (output JSON format):
curl 'https://whois-history.whoisxmlapi.com/api/v1?apiKey=at_0vPfsSUdf1ZpiCxc5&domainName=inteltechniques.com&mode=purchase' | python3 -m json.tool
- Archive.org Domain Registration Data (web.archive.org)
We can query the Wayback Machine for the exact historical URL of a domain registration.
The following URLs display any results:
https://web.archive.org/web/http://www.who.is/whois/cnn.com/
https://web.archive.org/web/https://whois.domaintools.com/cnn.com
https://web.archive.org/web/https://www.whoxy.com/cnn.com
https://web.archive.org/web/https://domainbigdata.com/cnn.com
https://web.archive.org/web/https://whoisology.com/cnn.com
- SecurityTrails (https://securitytrails.com/)
Historical Content Archives
- Archive Box (github.com/ArchiveBox/ArchiveBox)
To install and initialize the application:
mkdir ~/Downloads/Programs/archivebox
cd ~/Downloads/Programs/archivebox
python3 -m venv archiveboxEnvironment
source archiveboxEnvironment/bin/activate
sudo pip install archivebox
deactivate
mkdir ~/Documents/archivebox
cd ~/Documents/archivebox
archivebox init
Once installed, add the target website, launch the server, and open the database within our browser:
cd ~/Documents/archivebox
archivebox add 'https://notla.com'
archivebox server 0.0.0.0:8000
firefox http://0.0.0.0:8000
Archive Box captured the target with SingleFile through Chrome; generated a PDF and screenshot; performed a WGET of the live page; extracted the page from Archive.org; fetched all HTML code; and downloaded any media files.
- Custom Internet Archive Tool
This script includes the tool from the previous section "Archive Box" and the "Change Detection" tool. It essentially takes advantage of a Python script called "waybackpy".
Executing the script conducts the following tasks:
Make a directory in the Documents folder for data and enter it.
Download all known URLs indexed by Internet Archive into a text file.
Download the oldest known archive URL into a text file.
Append the file with the newest archive URL.
Append the file with URLs from the past ten years.
Remove duplicates and sort by date.
Generate screen captures of all unique links with only one thread (slower).
Download source code of the oldest and newest archives.
The output will be text and HTML files; each of these is an archived home page of the target website from a different date.
Archives.sh script:
- The Internet Archive
More advanced tools have been explained above; however, to query directly through a URL:
https://web.archive.org/cdx/search/cdx?url=cnn.com/*&output=text&fl=original&collapse=urlkey
- Archive Today (archive.is/archive.fo/archive.md)
This service also collects copies of websites and ignores all requests for deletion.
https://archive.is/*.inteltechniques.com
- Mementoweb (mementoweb.org)
This service offers a "Time Travel" option which presents archives of a domain from several third-party providers.
http://timetravel.mementoweb.org/list/19991212110000/http://inteltechniques.com
- Library of Congress (webarchive.loc.gov)
This option allows you to search by domain to discover all publicly available content in the Library of Congress Web Archives.
https://webarchive.loc.gov/all/*/http://inteltechniques.com
- Portuguese Web Archive (arquivo.pt)
https://arquivo.pt/page/search?hitsPerPage=100&query=site%3Ainteltechniques.com
Screen Captures & Monitoring
Historical Screen Captures
"Custom Internet Archive Tool" also generates screenshoots, below is a list of other tools that can generate screenshots of previous versions of a website.
- Search Engine Cache
This should be conducted first in order to identify any recent cached copies. Google is going to possess the most recent cache.
https://webcache.googleusercontent.com/search?q=cache:inteltechniques.com
- Website Informer (website.informer.com)
Screen capture available to the right of a search result.
- URLScan (urlscan.io)
Similar to the previous option, but the screen captures are often unique.
- Easy Counter (easycounter.com)
The screen capture presented here was very similar to Website Informer, but it was cropped slightly differently.
- Domain Tools (whois.domaintools.com)
These screen captures are in high resolution and current.
- Domains App (dmns.app)
This service offers the highest resolution image.
https://files.dmns.app/screenshots/inteltechniques.com.jpg
- Hype Stat (hypestat.com)
The lowest-quality option, but typically shows an older image.
- Carbon Dating (carbondate.cs.odu.edu)
This free service provides a summary of available online caches of a website, and displays an estimated site creation date based on the first available capture.
http://carbondate.cs.odu.edu/#inteltechniques.com
Current Screen Captures
- webscreenshot python tool
For a list of URLs, we can use the webscreenshot tool, which grabs a screenshot of the main page of each URL for a quick manual review:
pip install webscreenshot
webscreenshot -i urls.txt
Monitoring Through Screen Captures and Change Detection Tools
Once you locate a website of interest, it can be time consuming to continually visit the site looking for any changes. With large sites, it is easy to miss the changes due to an enormous amount of content to analyze.
- Follow That Page (followthatpage.com)
Enter the address of the target page of interest, as well as an email address where you can be reached. This service will monitor the page and send you an email if anything changes. Anything highlighted is either new or modified content. Anything that has been stricken through indicates deleted text.
It does not work well on some social networks.
- Visual Ping (visualping.io)
Robust options.
Allows you to select a target domain for monitoring.
Visual Ping will generate a current snapshot of the site and you can choose the level of monitoring. I recommend hourly monitoring and notification of any "tiny change". It will then check the domain hourly and email you if anything changes. If you are watching a website that contains advertisements or any dynamic data that changes often, you can select to avoid that portion of the page.
- Change Detection (github.com/dgtlmoon/changedetection.io)
The services above can be inappropriate for sensitive investigations. This is where Change Detection can help. It is locally installed and available only to you.
The following configures the application:
mkdir ~/Downloads/Programs/changedetection
cd ~/Downloads/Programs/changedetection
python3 -m venv changedetectionEnvironment
source changedetectionEnvironment/bin/activate
sudo pip install changedetection.io
deactivate
Then to launch it:
mkdir ~/Documents/ChangeDetection
changedetection.io -d ~/Documents/ChangeDetection -p 5000 & firefox http://127.0.0.1:5000
This tool provides the same service as the online options, if not better, but gives you full control within your Linux machine.
Email Address Identification
Identify any email addresses associated with a specific domain. This can lead to the discovery of employees and can be used for further breach data queries.
- Hunter (hunter.io)
This tool can also accept a domain name as a search term, and provides any email addresses that have been scraped from public web pages.
https://hunter.io/try/search/cnn.com?locale=en
It also includes the online source of the information which was scraped.
- Website Informer (website.informer.com)
Once Hunter is exhausted, switch to Website Informer. The results are typically fewer, but they are not redacted.
https://website.informer.com/cnn.com/emails
- SkyMem (skymem.info)
This service also displays full email addresses which are not redacted.
https://www.skymem.info/srch?q=cnn.com
These services will likely never provide the same results. This is why it is so important to exhaust all options in order to acquire the most data possible.
Corporate Mail Service Discovery
Once we have a domain name, it is likely that, if it belongs to an organization, it is using a mail service such as Office 365, Google Suite, or Amazon WorkMail (a dig sketch follows the checks below):
- Office 365 (O365):
MX Record Check:
Use MXToolbox. (https://mxtoolbox.com/)
Enter the domain and check for MX records pointing to outlook.com, office365.com, or exchange.microsoft.com.
Login Page Check:
A redirect to a branded login page indicates Office 365 usage.
- Google Suite (G Suite):
MX Record Check:
Use MXToolbox. (https://mxtoolbox.com/)
Enter the domain and look for MX records typically pointing to aspmx.l.google.com.
Login Page Check:
A redirect to a Google login page suggests G Suite usage.
- Amazon WorkMail:
MX Record Check:
Use MXToolbox. (https://mxtoolbox.com/)
Enter the domain and search for MX records related to amazonses.com or workmail.
CNAME Record Check:
Look for CNAME records pointing to workmail.awsapps.com. This can be done with MXToolbox or dig yourdomain.com CNAME
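The same MX and CNAME checks can be done locally with dig instead of MXToolbox; a sketch, where example.com is a placeholder:
# MX records reveal the mail provider (outlook.com -> Office 365, google.com -> Google, awsapps/amazonses -> WorkMail)
dig +short MX example.com
# CNAME check (the autodiscover subdomain is an assumption; adjust to the host you suspect)
dig +short CNAME autodiscover.example.com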
Domain Analytics
Domain analytics are commonly installed on websites in order to track usage information. This data often identifies the city and state from where a visitor connected; details about the web browser the person is using; and keywords that were searched to find the site. Only the owner of the website can view this analytic data. Analytics search services determine the specific number assigned to the analytics of a website. If the owner of this website uses analytics to monitor other websites, the analytic number will probably be the same.
Additionally, it will try to identify user specific advertisements stored on one site that are visible on others. It will reverse search this to identify even more websites that are associated with each other.
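Before using the reverse-search services below, the analytics and AdSense IDs can often be pulled straight from the target's source code; a minimal sketch, assuming the IDs follow the usual UA-/G-/pub- formats:
# grab the homepage and extract Google Analytics / AdSense identifiers
curl -s https://inteltechniques.com | grep -Eo 'UA-[0-9]+-[0-9]+|G-[A-Z0-9]{6,}|pub-[0-9]{10,}' | sort -u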
- Blacklight (themarkup.org/blacklight)
To know about any malicious activity embedded into a target website.
- Spy On Web (spyonweb.com)
Will search a domain name and identify the web server IP address and location. It identifies and cross-references website analytic data that it locates on a target domain.
- Analyze ID (analyzeid.com)
Analyze ID performs the same type of query and attempts to locate any other domains that share the same analytics or advertisement user numbers as your target.
- Hacker Target (hackertarget.com/reverse-analytics-search)
This service stands out due to their immediate availability of historic analytics IDs across multiple subdomains.
- DNSLytics (dnslytics.com/reverse-analytics)
Enter the analytics ID found with any of the previous techniques and you may find associations not present within other options.
https://dnslytics.com/reverse-analytics/inteltechniques.com
https://dnslytics.com/reverse-adsense/inteltechniques.com
SSL Certificates
- CRT.sh (crt.sh)
https://crt.sh/?q=inteltechniques.com
Most of the data identifies various certificate updates which do not provide any valuable information. However, the history may reveal certificates purchased by the owner. Clicking on any of the entries displays the other domains secured with that certificate, including dates of activity. This identifies other domains to investigate.
- Censys
Very valuable when searching for SSL Certificates.
Website Source Code
There are other scanners included in the Enumeration section of “Web Pentesting”.
- Nerdy Data (search.nerdydata.com)
Nerdy Data is a search engine that indexes the source code of websites. I use this to locate websites which steal the source code of my search tools and present them as their own.
If you have located a Google Analytics ID, AdSense ID, or Amazon ID of a website using the previous methods, you should consider searching this number through Nerdy Data.
- Built With (builtwith.com)
Entering a domain into the Built With search immediately identifies the web server operating system (Linux), email provider (DreamHost), web framework (PHP, WordPress), WordPress plugins, website analytics, video services, mailing list provider, blog environment, and website code functions.
- Stats Crop (statscrop.com)
SEO & SEM
Search Engine Optimization (SEO) applies various techniques affecting the visibility of a website or a web page in a search engine's results.
Search Engine Marketing (SEM) websites provide details valuable to those responsible for optimizing their own websites. SEM services usually provide the overall ranking of a website; the keywords that are often searched; backlinks; and referrals from other websites. SEO specialists use this data to determine potential advertisement relationships and to study their competition. Online investigators can use this to collect important details that are never visible on the target websites.
- Similar Web (similarweb.com)
The most comprehensive of the free options. Much of this data is "guessed" based on many factors.
- Moon Search (moonsearch.com)
It provides a recent screen capture of the target domain, plus its ranking, backlinks, IP address, server technologies, and analytics identifiers.
Additional websites which provide a similar service to Moon Search include Search Metrics (suite.searchmetrics.com), SpyFu (spyfu.com), and Majestic (majestic.com).
- Shared Count (sharedcount.com)
It searches your target domain and identifies its popularity on social networks such as Facebook and Twitter.
- Reddit Domains (reddit.com)
If your target website has ever been posted on Reddit, you can retrieve a listing of the incidents.
- Small SEO Tools: Backlinks (smallseotools.com/backlink-checker)
Backlink checker.
- Host.io Backlinks (host.io)
Offers many additional backlinks which are not present within the previous option.
https://host.io/backlinks/inteltechniques.com
- Host.io Redirects (host.io)
This option displays any URLs which are forwarding their traffic to your target site.
https://host.io/redirects/inteltechniques.com
A summary of all details about a domain stored by Host.io can be found via the following direct URL:
https://host.io/inteltechniques.com
- Small SEO Tools: Plagiarism Checker (smallseotools.com/plagiarism-checker)
To make sure the content is original.
You can use this tool by copying any questionable text from a website and pasting it into this free tool. It will analyze the text and display other websites that possess the same words.
The benefit of using this tool instead of Google directly is that it will structure several queries based on the supplied content and return variations of the found text.
Another option for this type of search is Copy Scape (copyscape.com).
- Visual Site Mapper (visualsitemapper.com)
This service analyzes the domain in real time, looking for linked pages within that domain. It provides an interactive graph that shows whether a domain has a lot of internal links that you may have missed. Highlighting any page will display the internal pages that connect to the selected page. This helps identify pages that are most "linked”.
- XML Sitemaps (xml-sitemaps.com)
This service "crawls" a domain and creates an XML text file of all public pages.
This is a great companion to visual site mappers, as the text can be easily imported into reporting systems. This often presents previously unknown content.
- github-subdomains
https://github.com/gwen001/github-subdomains
Obtains subdomains of the target that are referenced on GitHub.
The GitHub API returns results in a non-deterministic order, so multiple API keys and several runs are needed for complete coverage.
Domain Reputation
https://www.spam.org/search?type=domain&convert_block=1&group_ips=1&data=inteltechniques.com
https://spamdb.org/blacklists?q=inteltechniques.com
https://www.mywot.com/en/scorecard/inteltechniques.com
Data Breaches and Leaks
Domains can possess valuable breach data.
- Dehashed (dehashed.com)
https://dehashed.com/search?query=inteltechniques.com
- IntelX (intelx.io)
Presents partial Pastebin files which include your target domain.
https://intelx.io/?s=inteltechniques.com
A free trial is required to see all results.
- Leakpeek (leakpeek.com)
Requires a free account to search domains.
- Phonebook (phonebook.cz)
Searches a domain for any email addresses which exist within publicly available breaches.
Shortened URLs
Social networking sites have popularized shortened URLs.
These services create a new URL, and simply point anyone to the original source when clicked.
Bitly allows access to metadata by including a "+" after the URL.
Tiny.cc adds a "~" to the end of a link to display metadata.
Google (goo.gl) uses the "+" at the end.
Bit.do provides the most extensive data. They use a "-" after the URL.
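When a service offers no metadata trick, the redirect chain can still be inspected without opening the link in a browser; a sketch using curl (the bit.ly code is hypothetical):
# show each hop's Location header instead of visiting the destination
curl -sIL https://bit.ly/3xAmPlE | grep -i '^location:'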
- CheckShortURL (checkshorturl.com)
Catch-all service.
- UNFURL (dfir.blog/unfurl/)
See forwarding details, timestamps, and unique identifiers without visiting the site.
Very beneficial for the investigation of magnet Torrent links.
WordPress Data
WordPress enumeration and exploitation steps are described in "Web Pentesting > CMS's > WordPress". Below are additional OSINT resources to use when facing a WordPress target without needing to install any software.
https://gf.dev/wordpress-security-scanner
https://hackertarget.com/wordpress-security-scan/
Cloudflare
While identifying web hosts and IP addresses behind your target domain, you are likely to encounter sites hiding behind Cloudflare. This company provides security for websites which often prevents online attacks and outages. They also help keep web host and owner details hidden.
Investigative options:
Historical: Use the previous methods to search historical domain registration records.
Censys (censys.io): Searching a domain name through the Censys "Certificates" search may identify historical SSL ownership records (https://censys.io/certificates?q=inteltechniques.com). A hosts search should also be done; then navigate to the direct IP.
Shodan: Search the domain through the SSL search (see the CLI sketch after this list).
Third-Party Tracking: Some websites will hide their domain registration and host, but continue to use analytics, tracking, and Google services. Consider the previous methods of searching Analytics.
Cloudmare: https://github.com/mrh0wl/Cloudmare
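For the Shodan option above, a sketch using the official Shodan CLI (assumes "pip install shodan", "shodan init <API key>", and an account with search credits):
# list hosts whose SSL certificates mention the target domain
shodan search --fields ip_str,port,hostnames 'ssl:"inteltechniques.com"'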
Threat Data
- Virus Total (virustotal.com)
The "Details" menu provides the most public data.
The Whois and DNS records should be similar to other sites.
The "Categories area provides the general topics of the target site.
The "HITPS Certificate" section can become interesting very quickly.
The "Subject Alternative Name" portion of this section identifies additional domains and subdomains associated with the SSL certificate of your target site.
The "Relations" tab identifies many new subdomains.
The "Files" section displays unique content from practically any other resource. It identifies files downloaded from the target site for analvsis and files which have a reference to the target site.
The “Community” section is where members of the VirusTotal community can leave comments or experiences in reference to the target.
Always use a virtual machine without network access if you plan to download or open anything found here.
- Threat Intelligence (threatintelligenceplatform.com)
The "Connected Domains" area identifies any external domains which are linked from your source, often display hidden links to third-party services otherwise unknown.
"Potentially dangerous content" and "Malware detection" sections. Both of these offer a historical view into an malicious content hosted on the target domain.
- Threat Crowd (threatcrowd.org)
This service provides a unique view of the domains associated with your target.
- Censys (censys.io)
The moment a certificate is issued, it is provided in real-time to Censys.
Click the "Details button on the summary page, search "alt_name within the results.
HTTP Body text information stored within the "Details" page of the HTTP and HTTPS sections.
Capture this data in the event of a modified or removed target web page.
Advanced DNS Tools
- Domains App (dmns.app)
We can also use this resource to see much more DNS details.
https://dmns.app/domains/michaelbazzell.com/dns-records
For many domains which apply an extra level of email security and verification, you will find a legitimate email address which may have escaped your other analysis.
- OSINT.sh (osint.sh)
https://osint.sh/subdomain: Display all subdomains of a target domain
https://osint.sh/stack: Display all technologies in use by a target domain
https://osint.sh/email: Display all email addresses publicly associated with a target domain
https://osint.sh/ssl: Display all SSL certificates associated with a target domain
https://osint.sh/whoishistory: Display historic registrations associated with a target domain
https://osint.sh/analytics: Display all domains associated with a Google Analytics ID
https://osint.sh/adsense: Display all domains associated with a Google AdSense ID
https://osint.sh/domain: Display all domains associated with keywords
https://osint.sh/reversewhois: Display all domains publicly associated with an email address
https://osint.sh/ocr: Extract text found within a document stored at a target domain URL
- IntelTechniques Domain Tool
It can automate queries across the most beneficial options. The final section provides easy access to shortened URL metadata.
Code at Domain.html.
- dnsrecon.py
python3 dnsrecon.py -d example-test.com
ASNs and Cloud Assets
An autonomous system (AS) is a very large network or group of networks with a single routing policy. Each AS is assigned a unique ASN, which is a number that identifies the AS.
Autonomous System Numbers are given to large enough networks.
ASNs help track down an entity's IT infrastructure.
! These are not always a complete picture of a network, thanks to rogue assets and cloud hosting.
To locate ASNs (a manual whois sketch follows this list):
Validate it against ARIN: https://whois.arin.net/rest/asn/AS{id}
Search for IPs associated with ASN: https://raw.githubusercontent.com/nitefood/asn/master/asn
ARIN and RIPE are regional registries that allow full-text searches of address space.
US -> https://whois.arin.net/ui/query.do
EU, Central Asia -> https://apps.db.ripe.net/db-web-ui/#/fulltextsearch
Validating that the IP range is owned by the target (using ARIN or automated script https://github.com/Mr-Un1k0d3r/SearchIPOwner)
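As a quick manual alternative to the resources above, ASNs and their announced prefixes can also be queried directly over whois; a sketch using the Team Cymru and RADb servers (1.1.1.1 and AS13335 are only illustrative values):
# map a known target IP to its origin ASN (Team Cymru)
whois -h whois.cymru.com " -v 1.1.1.1"
# list prefixes originated by an ASN (RADb)
whois -h whois.radb.net -- '-i origin AS13335' | grep route: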
We can also check a list of IP addresses against cloud provider IP space with ip2provider (https://github.com/oldrho/ip2provider)
- Karma v2
ASNs can also be located through Karma v2 (Shodan API utility):
bash karma_v2 -d <DOMAIN.TLD> -l <INTEGER> -asn
- Naabu
Then, to scan ports for a given ASN:
echo AS14421 | naabu -p 80,443
- cloud_enum.py (https://github.com/initstring/cloud_enum)
To enumerate cloud instances associated with a domain:
./cloud_enum.py -k example-company -k example-company.com -k example-product-name
- Cloud Recon (https://github.com/g0ldencybersec/CloudRecon)
To scan the entire cloud range within two hours.
To get IP ranges: https://github.com/lord-alfred/ipranges/blob/main/all/ipv4_merged.txt