Leaks, Breaches, Logs and Ransomware
Hardware and Environment
This type of data can fill your drives quickly. Because of this, there are many hardware considerations.
A virtual machine is not recommended for this type of work. Downloading large files within a VM will be slightly slower than on a host machine, decompressing large files within a VM will be much slower, and VM disk space will be depleted quickly. You could connect an external drive to the VM for storage, but the data transfer speed would be a huge bottleneck to the overall efficiency of data queries.
Never use your primary personal or investigations host computer.
Never download breach data onto a personal machine.
Never process breach data within the host operating system of your primary OSINT investigations computer.
This type of data is full of bad actors, malicious files, viruses, and other shady concerns.
You should possess a machine used solely for breaches, leaks, logs, and ransomware: a Linux machine with minimal applications and fast hardware. When you are not actively downloading data, keep it offline. Use a desktop without Wi-Fi connectivity so you can disconnect the Ethernet cable whenever you want to be sure it is offline.
Use, at minimum, an NVMe PCIe 4 drive inserted into a slot directly on the motherboard. I am currently using a System76 Thelio desktop with an 8 TB NVMe PCIe 4 drive which can read data at over 7,000 MB/s, roughly three times faster than most SATA SSDs. It has 32 GB of RAM, which helps with fast data processing. The desktop also possesses 20 TB of slower storage on 2.5" 5 TB drives for long-term data and relies only on integrated graphics.
This small desktop has 28 TB of storage and is faster than any laptop.
The data you download is toxic, but that is acceptable on a dedicated machine.
If you encounter a rare Linux threat, you know that there is no personal information on this machine which could expose you. If the machine becomes infected, it is no big deal to wipe it and reinstall Linux, since all of the data is backed up to an external 20 TB drive.
If you are not ready to commit to new hardware for these tasks, rely on portable SSDs made by SanDisk, such as the SanDisk Extreme Portable SSD (amzn.to/3vPcMI2) and the SanDisk Extreme PRO Portable SSD (amzn.to/3MKbvZI). The SanDisk 2 TB Extreme Portable SSD (amzn.to/3yPcM]2) is a great starter drive.
Once you have your desired computer, make sure you have Ripgrep installed:
sudo apt-get install ripgrep
Never consider a Windows host, if only due to the common presence of Windows-targeted viruses within the data sets we will acquire.
Data Parsing
Cat
is a command which allows us to combine multiple files into one large file.
Ripgrep
is a utility which allows us to search through large data files as fast as possible. It eliminates the need to possess a database with an index in order to conduct keyword searches.
Sed
is a command which allows us to substitute or remove portions of data within a file.
Cut
is a command which allows us to extract only the desired columns from a CSV file.
Sort
is a command which will remove duplicate lines and alphabetize our data within a file.
LC_ALL=C
can be used when unusual characters prevent a command from completing.
JQ
can be used to extract JSON data with ease, without the need for the previous utilities. A combined example of these utilities follows the definitions.
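The following one-line examples are only a sketch of each utility in isolation; the file names are hypothetical.
cat part1.txt part2.txt > combined.txt
--> Merge two files into one
sed -i 's/;/:/g' combined.txt
--> Replace every semicolon with a colon, editing the file in place
cut -d, -f1,3 combined.txt > trimmed.txt
--> Keep only columns 1 and 3 of a comma-delimited file
LC_ALL=C sort -u -b -i -f trimmed.txt > final.txt
--> Remove duplicate lines and sort, ignoring case and unusual characters
jq --raw-output '.email' records.json > emails.txt
--> Extract the "email" field from a file of JSON records (one object per line)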
- Basic commands
We will use Ripgrep to peek into large files.
rg -aFiN Test@Gmail.com
--> Search the EXACT string test@gmail.com in any case (e.g., TEST@GMAIL.COM)
rg -aiN Test@Gmail.com
--> Search as a case-insensitive regular expression; without -F, the periods match any character
rg -aFN Test@Gmail.com
--> Search ONLY Test@Gmail.com and not test@gmail.com
rg -aFi Test@Gmail.com
--> Search EXACTLY test@gmail.com and show line #
rg --help
--> Show Ripgrep help menu
To search two specific pieces of data within a single file:
rg -aFiN "Michael" | rg -aFiN "Bazzell" Voter-FL-2018.txt
To search two potential pieces of data within a single file:
rg -aFiN "Bazel|Bazzell" Voter-FL-2018. txt
- Text Files
To combine all of the downloaded files into one single file and title it appropriately:
cd ~/Downloads/USVoterData_BF/data/Florida/2018/Voters
cat * > Voter-FL-2018.txt
This new large text file may take a long time to open, which is quite common with these datasets.
To replace unnecessary tabs with colons, then collapse any doubled colons:
sed -i 's/ /:/g' Voter-FL-2018.txt
sed -i 's/::/:/g' Voter-FL-2018.txt
--> Edit the file in place (-i) and replace every occurrence of :: with : throughout the file ('s/::/:/g').
To remove any lines which are exact duplicates and sort the data alphabetically:
sort -u -b -i -f Voter-FL-All.txt > Voter-FL-All-Cleaned.txt
To combine two files into one file while removing all duplicate lines:
sort -u -f Giganews2.txt UsenetHistorical2.txt > UsenetFinal.txt
- CSV Files
Some data sets are stored within comma-separated value (CSV) files, which can simply be opened with any spreadsheet program.
If we have text files in this format, the following converts them into a single CSV containing only the desired columns:
LC_ALL=C cut -d, -f4,5,20,23,29,37 *.txt > Voter-CO-2021.csv
To remove unnecessary quotation marks:
LC_ALL=C sed -i "s/[\"]//g" Voter-CO-2021.txt
To combine every CSV file into a single text file and only extract the columns of data most valuable to us:
find . -type f -name \*.csv -print0 | xargs -0 cut -f1,3,4,5 > Giganews2.txt
To eliminate duplicate entries in text files that were previously CSV files:
LC_ALL=C sort -u -b -i -f data-cleaned.txt > Data-Final.txt (use gsort on macOS with GNU coreutils installed)
- JSON Files
When I encounter JSON data, I rely on jq (stedolan.github.io/jq/).
sudo apt-get install jq
To make a selective extraction:
jq --raw-output '"\(.data.first_name),\(.data.last_name),\(.data.gender),\(.data.birth_year),\(.data.birth_date),\(.data.linkedin_username),\(.data.facebook_id),\(.data.twitter_username),\(.data.work_email),\(.data.mobile_phone)"' people.json > people.txt
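Before building a long extraction string like the one above, it can help to inspect a single record to confirm which fields exist. A minimal check, assuming the file stores one JSON object per line (common for these dumps):
head -n 1 people.json | jq '.'
--> Pretty-print the first record and its field names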
Methodology
To conduct queries across the open and dark web, you could use the IntelTechniques search tools and the Tor Browser.
Download data using the OSINT tools and skills covered in the previous OSINT sections.
As a start, you may consider focusing on public credential leaks on Pastebin (Pastebin.com). When you search an email address on Pastebin via Google, the results often include paste dumps. Clicking on these links will take you to the original paste document, which likely has many additional compromised credentials. A query of test@test.com on this site will likely present hundreds of data sets ready for download.
Any time I see on HIBP that a new public leak has surfaced, I search for that specific data with a custom Google search. If I saw that Myspace had been exposed, I would use the following query.
"myspace" ext:rar OR ext:zip OR ext:7z OR ext:txt OR ext:sql
Now, assuming you possess the databases, the first step is to query the email address, username, name, or whatever identifier you have through all of the databases you have acquired.
If we notice that the target used a password on a site, we should next conduct a search of that password to see if it is used anywhere else.
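As a hedged example, a recursive Ripgrep query of a discovered password across an entire collection might look like the following; the password and directory are hypothetical.
rg -aFiN "Winter2020!" ~/Downloads/breaches/
--> Search the literal password across every file in the collection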
Data Hunting
Most of these sites come and go very frequently.
Known Data Breaches
- COMB
"Compilation Of Many Breaches", otherwise known as "COMB".
There are numerous other combo lists released in Anti-Public, Exploit.in, and others.
site:anonfiles.com "CompilationOfManyBreaches.7z"
https://twitter.com/BubbaMustafa/status/1370376039583657985 —> Password: "+w/P3PRqQQoJ6g"
COMB includes a fast search option:
./query michaelbazzell@gmail.com
This presents limitations. You cannot use this tool to search a specific domain or password. For that we will once again rely on Ripgrep.
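A minimal sketch of those queries, assuming the decompressed COMB files use the common email:password line format and reside in the current directory (the domain and password below are hypothetical):
rg -aFiN "@inteltechniques.com" .
--> Search every file recursively for a target domain
rg -aFiN ":password1234" .
--> Search for a specific password; the leading colon reduces false positives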
- hashes.org
The entire archive of hashes and passwords which previously existed on hashes.org is available as a torrent file.
90 GB compressed download:
https://pastebin.com/pS5AQNVO
https://old.reddit.com/r/DataHoarder/comments/ohlcye/hashesorg_archives_of_all_cracked_hash_lists_up
- HashMob (hashmob.net)
Paid search service and free downloads of lists similar to those previously found at Hashes.org.
These files are very similar to the Hashes.org data, but updated often.
To download all:
wget --content-disposition -i https://inteltechniques.com/data/hashlists.txt
- BTDig (btdig.com)
Go to this site and search for "public_".
241 GB of data titled "MyCloud".
- Archive.org Breach Data
Search on their site for "nulled io" and click the "nulled.io_database_dump" result.
10 GB database from the Nulled hacking forum.
You could spend weeks on Archive.org identifying breach data.
Open Databases
- Elasticsearch Databases
The best tool to find this data is Shodan (shodan.io); you will need to be logged in to a free or paid account to conduct these queries.
Elasticsearch databases are extremely easy to access.
product:elastic port:9200 [target data]
Clicking the red square with the arrow next to the redacted IP address connects to the database within your browser in a new tab. This confirms that the database is online and open.
http://34.80.1.1:9200/_cat/indices?v
http://34.80.1.1:9200/bank/_search?size=10000
A Python script which parses through all results and stores every record is most appropriate: Elasticsearch Crawler (github.com/AmIJesse/Elasticsearch-Crawler). It will present several user prompts to enter the target IP address, index name, port number, and fields to obtain.
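If you only need a quick keyword check before committing to a full crawl, the same open endpoint can be queried from the terminal. A sketch assuming curl is installed and using the example index shown above:
curl "http://34.80.1.1:9200/bank/_search?q=test@test.com&size=100"
--> Return up to 100 records from the "bank" index which mention the keyword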
Online Data Search Sites
Remember that all of these sites come and go quickly.
- Underground Forums
BreachForums (breached.to / breached.vc / bf.hn)
- Resources to begin a leak search
People Data Labs: I mentioned this phenomenal resource previously in this book. In 2019, a security researcher claimed to have scraped an open database of over 1.2 billion PDL records including names, locations, and social network content. This data set is still floating around, and it is valuable.
Pipl: Similar to the previous example, a researcher claimed to extract 50 million profiles in 2019 from the people search site Pipl, containing full names, email addresses, vehicles owned, phone numbers and home addresses. Every week, I see someone offering a full copy for download.
IntelligenceX: I discussed this paste-scraping service in previous chapters. In 2021, a disgruntled customer "scraped their scrapes". Today, the 80,000 copied documents have been merged into one file possessing sensitive data associated with 46 million email addresses.
LiveRamp: In 2021, data allegedly sourced from LiveRamp (formerly Acxiom) was leaked. The data contains extensive information on most people living in the U.S., including home addresses, cellular telephone numbers, email addresses, names and more. It is commonly traded on Telegram, but the original source is unknown.
White Pages: Someone breached the White Pages site and copied 11 million profiles with users' names, email addresses, and passwords. It has been repackaged for years and offered as "new".
Verifications.io: This "enterprise email validation" service exposed its MongoDB database containing 763 million records including email addresses, names, genders, IP addresses, phone numbers, and other personal information. It is widely available as a 150 GB download.
- Breach Search Resources
Each specific section of this book explains the breach data resources relevant to its own task. However, here is a summary with direct URLs (a small scripted example follows the list):
Email Address (test@test.com):
https://haveibeenpwned.com/unifiedsearch/test@test.com
https://dehashed.com/search?query=test@test.com
https://psbdmp.ws/api/search/test@test.com
https://portal.spycloud.com/endpoint/enriched-stats/test@test.com
https://check.cybernews.com/chk/?lang=en_US&e=test@test.com
https://intelx.io/?s=test@test.com
Username (test):
https://haveibeenpwned.com/unifiedsearch/test
https://dehashed.com/search?query=test
https://psbdmp.ws/api/search/test
Domain (inteltechniques.com):
https://dehashed.com/search?query=inteltechniques.com
https://psbdmp.ws/api/search/inteltechniques.com
https://intelx.io/?s=inteltechniques.com
Telephone (6185551212):
https://dehashed.com/search?query=6185551212
IP Address (1.1.1.1):
https://dehashed.com/search?query=1.1.1.1
https://psbdmp.ws/api/search/1.1.1.1
Name (Michael Bazzell):
https://dehashed.com/search?query=michaelBazzell
https://psbdmp.ws/api/search/Michael%20Bazzell
Password (password1234):
https://dehashed.com/search?query=password1234
https://psbdmp.ws/api/search/password1234
https://www.google.com/search?q=password1234
Hash (BDC87B9C894DA5168059E00EBFFB9077):
https://hash.ziggi.org/api/dehash.get?hash=BDC87B9C894DA5168059E00EBFFB9077&include_external_db
https://decrypt.tools/client-server/decrypt?type=md5&string=BDC87B9C894DA5168059E00EBFFB9077
https://md5.gromweb.com/?md5=BDC87B9C894DA5168059E00EBFFB9077
https://www.nitrxgen.net/md5db/BDC87B9C894DA5168059E00EBFFB9077
https://dehashed.com/search?query=BDC87B9C894DA5168059E00EBFFB9077
https://www.google.com/search?q=BDC87B9C894DA5168059E00EBFFB9077
Miscellaneous Sites:
LeakPeek (leakpeek.com)
Breach Directory (breachdirectory.org)
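The scripted example mentioned above is a minimal sketch which assumes curl is installed and that the free endpoints still accept unauthenticated requests; several of the other services require an account or API key.
curl -s "https://psbdmp.ws/api/search/test@test.com"
--> Query Psbdmp for pastes mentioning the address
curl -s "https://check.cybernews.com/chk/?lang=en_US&e=test@test.com"
--> Query the Cybernews checker for the same address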
Advanced Breach Tools
- h8mail
h8mail combines queries to many of the breach services.
It should never take the place of a full manual review.
To install it:
mkdir -p ~/Downloads/Programs/h8mail
cd ~/Downloads/Programs/h8mail
python3 -m venv h8mailEnvironment
source h8mailEnvironment/bin/activate
pip install -U h8mail
cd ~/Downloads && h8mail -g
sed -i 's/\;leak\-lookup\_pub/leak\-lookup\_pub/g' h8mail_config.ini
deactivate
To use it (after activating the virtual environment created above):
h8mail -t <target email address>
Next, providing API keys from services such as Snusbase, WeLeakInfo, Leak-Lookup, HaveIBeenPwned, EmailRep, Dehashed, and Hunter.io will provide MANY more results, but these services can be quite expensive.
To create a configuration file in which to provide API keys for the databases we want to search, enter: h8mail -g
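Once the configuration file holds your keys, a hedged example which uses it and writes the results to a CSV file (flag names per the h8mail documentation; the target is hypothetical):
h8mail -t test@test.com -c h8mail_config.ini -o results.csv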
- Custom breaches-leaks.sh script
This script combines h8mail, Elasticsearch queries, and hash searches.
- IntelTechniques Breaches & Leaks Tool
This tool combines most of the online search options.
Code at Breaches.html.
- Git Leaks
To enumerate leaked credentials for AWS or GCP within GitHub repositories:
gitleaks --repo-url=https://github.com/example/production -v
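Note that recent gitleaks releases (v8 and later) replaced the --repo-url flag with a detect subcommand run against a local clone. A hedged equivalent against the same hypothetical repository:
git clone https://github.com/example/production
gitleaks detect --source production -v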
Similar tools for GitHub leaks:
https://github.com/gitleaks/gitleaks
https://github.com/eth0izzle/shhgit
https://github.com/michenriksen/gitrob
- JS Leaks
To find leaked credentials within the target website's JavaScript files:
https://github.com/m4ll0k/SecretFinder/tree/master
cat subdomains_alive.txt | gau > params.txt
cat params.txt | uro -o filtered-params.txt
cat filtered-params.txt | grep "\.js$" > jsfiles.txt
cat jsfiles.txt | uro | anew jsfiles.txt
cat jsfiles.txt | while read url; do python3 SecretFinder.py -i $url -o cli >> secret.txt; done
- TruffleHog
Monitor Git, Jira, Slack, Confluence, Microsoft Teams, SharePoint, and more for credentials:
https://github.com/trufflesecurity/trufflehog
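A hedged usage example which scans the full commit history of a single repository with TruffleHog v3 (the repository URL is hypothetical):
trufflehog git https://github.com/example/production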
Stealer Logs
The most common stealer logs we find are labeled as Raccoon Stealer, Redline Stealer, and Vidar Stealer. The number of groups which offer data stolen via these programs is limitless; a new group pops up every week.
Your browser has likely asked you if you would like to store information which was entered into an online form. If you allow your browser to store this data, stealer malware easily collects it into its logs.
These logs typically contain the cookies from the victim's browser, which offer immediate access to the priority domains present in the overall record; data parsed from stored form fields; a list of all installed browsers and their versions; a list of all applications installed on the machine; all of the passwords stored within every browser; screen captures of the victim's machine at the time of infection; and general details about the system, including the victim's IP address, hardware, location, and date.
The following Google queries might present interesting information, but most results lead to shady criminal marketplaces:
"stealer logs" "download"
"stealer logs" "Azorult"
"stealer logs" "Vidar"
"stealer logs" "Redline"
"stealer logs" "Raccoon"
- Telegram (telegram.org)
Telegram is quite possibly the best resource for Stealer Logs.
sudo apt install telegram-desktop
There are dozens of Telegram channels which offer free stealer logs in order to promote their paid services. We can find them with the services presented in the "Social Media > Telegram" section, with the official Telegram search field, or through Google searches such as the following examples:
site:breached.vc "mega.nz" "stealer log"
site:breached.vc "anonfiles.com" "stealer log"
site:anonfiles.com "logs" site:mediafire.com "logs"
Known Telegram stealer log rooms:
"rubancloudfree"
"Ruban Private @rubanowner.rar"
Another Telegram channel is called "redlogscloud”.
Other groups: Logs Cloud, Keeper Cloud, Luffich Cloud, Crypton Logs, Bugatti Cloud, Ruban Cloud, Luxury Logs, Wild Logs, Eternal Logs, Logs Arthouse, HUBLOGS, Redlogs Cloud, OnelOgs, Bank, Observer Cloud, Expert Logs, BradMax, and Cloud Logs.
- Data Cleansing
Most people find the email addresses, usernames, and passwords to be the most valuable part of this data. While the screen captures, documents, and cookies are beneficial, they take up the most space.
Instead of saving the password files, let's merge them all into one file.
To navigate to each "passwords.txt" file, extract all of the text data, compile it into one file, and ignore case in order to capture both "passwords.txt" and "Passwords.txt":
find . -iname 'passwords.txt' -exec cat {} \; > ~/Downloads/Passwords.txt
A broader variation captures any file whose name begins with "passwords":
find . -iname 'passwords*' -exec cat {} \; > ~/Downloads/Passwords.txt
You could replicate this for the "Important Autofills" files with the following:
find . -iname 'importantautofills.txt' -exec cat {} \; > ~/Downloads/ImportantAutofills.txt
We can also create an ever-growing, date-stamped set of Passwords.txt files which can be quickly queried at any time:
timestamp=$(date +%Y-%m-%d) && find . -iname 'passwords.txt' -exec cat {} \; > ~/Downloads/Passwords.$timestamp.txt
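Querying these merged files is then a single Ripgrep command; the target below is hypothetical.
rg -aFiN "target@company.com" ~/Downloads/Passwords*.txt
--> Search every merged password file, including the date-stamped copies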
Ransomware
The first step is to identify the current URLs for the various ransomware groups.
Some resources:
http://ransomwr3tsydeii4q43vazm/wofla5ujdajquitomtd47cxjtfgwyyd.onion/
https://github.com/fastfire/deepdarkCTI/blob/main/ransomware_gang.md
https://darkfeed.io/ransomgroups/
Next, Google should help you find any other pages valuable to ransomware investigations. Queries:
"onion" "Ragnar" "ransomware" "url"
"onion" "REvil" "ransomware" "url"
"onion" "Conti" "ransomware" "url"
"onion" "Vice Society" "ransomware" "url"
"onion" "Clop" "ransomware" "url"
"onion" "Nefilim" "ransomware" "url"
"onion" "Everest" "ransomware" "url"
Hack Notice often announces ransomware publications:
site:https://app.hacknotice.com "onion"
Clicking the title of the article presents the Tor URL which may display the stolen data. Clicking "View Original Source” on this new page will attempt to open the Tor URL in your browser.
Hashes
Most websites and many data sets store passwords in hashed format.
To identify and crack these hashes, check the "Password Cracking" section.
If we possess a database with hashed passwords and we want to search for a specific password, we can invert the process: convert the password into the same hash format used by the database and search for that hash.
Some resources to convert a password into common hashes:
https://passwordsgenerator.net/md5-hash-generator/
https://passwordsgenerator.net/sha1-hash-generator/
https://passwordsgenerator.net/sha256-hash-generator/
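If you prefer to stay offline, the same conversion can be done with standard Linux utilities; the -n flag prevents a trailing newline from changing the hash.
echo -n 'password1234' | md5sum
echo -n 'password1234' | sha1sum
echo -n 'password1234' | sha256sum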
Data Cleansing
It will not take long for your own data collection to fill your drive space.
- Cheatsheet of cleansing commands
Replace "OLD" with "NEW": sed -i 's/OLD/NEW/g' data.txt
Replace all commas with a hyphen: sed -i 's/\,/\-/g' data.txt
Replace all tabs with a comma: sed -i 's/[strike ctrl-v-tab]/\,/g' data.txt
Remove all data until the first comma: sed -i 's/^\([^,]*,\)//g' data.txt
Remove all data until the first colon: sed -i 's/^[^:]*://g' data.txt
Remove all single quotes: sed -i "s/'//g" data.txt
Remove all double quotes: sed -i 's/\"//g' data.txt
Remove "junk": sed -i 's/junk//g' data.txt
Remove all between "FIRST" & "THIRD": sed -i 's/\(FIRST\).*\(THIRD\)/\1\2/' data.txt
Remove all digits between commas: sed -i 's/\,[0-9]*\,//g' data.txt
Remove any line beginning with "TEST": sed -i '/^TEST/d' data.txt
Remove any line not containing "@": awk '/@/' data.txt > newfile.txt
Remove empty lines: sed -i '/^$/d' data.txt
Remove first 10 lines: sed -i '1,10d' data.txt
Remove first ten characters: sed -i 's/^.\{10\}//' data.txt
Remove everything after the last "_": sed -i "s/_[^_]*$//" data.txt
Remove 0000-00-00 00:00:00 : sed -i 's/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]//g' data.txt
Remove duplicate rows: awk '!seen[$0]++' data.txt > newfile.txt
Remove duplicate rows and sort: LC_ALL=C sort -u -b -i -f data.txt > newfile.txt
Remove data between "{" and "}": sed -i 's/{[^}]*}//g' data.txt
Extract Emails: grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" < data.txt > newfile.txt
Split large file into multiple files: split -l 200000000 data.txt 1
Display total lines in a file: wc -l data.txt
Cut columns 1, 2, and 6: LANG=C cut -d, -f1,2,6 data.txt > newfile.txt
Remove hyphens from phone numbers: sed -i 's/\([0-9]\{3,\}\)-/\1/g' data.txt
Cut columns from JSON: jq --raw-output '"\(.email),\(.password)"' data.json > newfile.txt
- Duplicate Files
If you collect enough breach data, you will likely find many duplicate files with different names.
Install fdupes:
sudo apt-get install fdupes
The next command launches fdupes, recursively scans files within the Downloads directory, and prompts you to select which duplicate files to keep:
fdupes -r -d ~/Downloads
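If you would rather review the duplicates before deleting anything, a safer first pass writes the report to a file:
fdupes -r ~/Downloads > duplicate-report.txt
--> List every set of identical files without deleting them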