Leaks, Breaches, Logs and Ransomware

Hardware and Environment

This type of data can fill your drives quickly. Because of this, there are many hardware considerations.

A virtual machine is not recommended for this type of work. Downloading large files within a VM will be slightly slower than on a host machine. Decompressing large files within a VM will be much slower, and VM disk space will be depleted quickly. You could connect an external drive to the VM for storage, but the data transfer speed would be a huge bottleneck to the overall efficiency of data queries.

Never use your primary personal or investigations host computer.

Never download breach data onto the same personal machine.

Never process breach data within the host operating system of your primary OSINT investigations computer.

This type of data comes from bad actors and is full of malicious files, viruses, and other shady concerns.

You must possess a machine solely for breaches, leaks, logs, and ransomware. It should be a Linux machine with minimal applications and blazing-fast hardware. When you are not actively downloading data, keep it offline. Use a desktop without Wi-Fi connectivity so you can disconnect the ethernet cable whenever you want to be sure it is offline.

Use, at minimum, an NVMe PCIe 4.0 drive inserted into a slot directly on the motherboard. I am currently using a System76 Thelio desktop with an 8 TB NVMe PCIe 4.0 drive which can read data at over 7000 MB/s, three times faster than most SATA SSDs. It has 32 GB of RAM, which helps with fast data processing. The desktop also possesses 20 TB of slower 2.5" drives (5 TB each) for long-term storage of data and relies only on integrated graphics.

This small desktop has 28 TB of storage and is faster than any laptop.

The data you download is toxic, but on a dedicated machine that does not matter.

If you encounter a rare Linux threat, you know that there is no personal information on this machine which could expose you. If the machine becomes infected, it is no big deal to wipe it out, reinstall Linux, and restore the data from an external 20 TB backup drive.

If you are not ready to commit to new hardware for these tasks, rely on portable SSDs made by SanDisk, such as the SanDisk Extreme Portable SSD (amn.to/3vPcMI2) and the SanDisk Extreme PRO Portable SSD (amzn.to/3MKbvZI). The SanDisk 2 TB Extreme Portable SSD (amzn.to/3yPcM]2) is a great starter drive.

Once you have your desired computer, make sure you have Ripgrep installed:

sudo apt-get install ripgrep

Never consider a Windows OS host simply due to the common presence of Windows-targeted viruses within the data sets we will acquire.

Data Parsing

  • Cat is a command which allows us to combine multiple files into one large file.

  • Ripgrep is a utility which allows us to search through large data files as fast as possible. It eliminates the need to possess a database with an index in order to conduct keyword searches.

  • Sed is a command which allows us to substitute or remove portions of data within a file.

  • Cut is a command which allows us to extract only the desired columns from a CSV file.

  • Sort is a command which will alphabetize our data within a file and, with the -u flag, remove duplicate lines.

  • LC_ALL=C can be used when unusual characters prevent a command from completing.

  • jq can be used to extract JSON data with ease, without the need for the previous utilities. A combined example of several of these utilities follows below.
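
As a minimal sketch of how these utilities chain together (the file names and column numbers are hypothetical), a typical cleanup might look like this:

cat *.txt > combined.txt                           # merge every text file in the directory into one
cut -d: -f1,2 combined.txt > trimmed.txt           # keep only the first two colon-delimited columns
sed -i 's/"//g' trimmed.txt                        # strip all double quotes
LC_ALL=C sort -u -b -i -f trimmed.txt > final.txt  # alphabetize and remove duplicate lines
rg -aFiN "test@test.com" final.txt                 # query the cleaned file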

- Basic commands

We will use Ripgrep to peek into large files.

rg -aFiN Test@Gmail.com --> Literal search, ignoring case: matches test@gmail.com or TEST@GMAIL.COM

rg -aiN Test@Gmail.com --> Regex search, ignoring case: the periods act as wildcards

rg -aFN Test@Gmail.com --> Literal, case-sensitive search: matches Test@Gmail.com but not test@gmail.com

rg -aFi Test@Gmail.com --> Literal search, ignoring case, and display line numbers

rg --help --> Show Ripgrep help menu

To search two specific pieces of data within a single file:

rg -aFiN "Michael" | rg -aFiN "Bazzell" Voter-FL-2018.txt

To search two potential pieces of data within a single file:

rg -aFiN "Bazel|Bazzell" Voter-FL-2018. txt

- Text Files

To combine all of the downloaded files into one single file and title it appropriately:

cd ~/Downloads/USVoterData_BF/data/Florida/2018/Voters

cat * > Voter-FL-2018.txt

This new large text file may take a long time to open, which is quite common with these datasets.

To replace tabs with colons, and then collapse the double colons which result:

sed -i 's/\t/:/g' Voter-FL-2018.txt --> Save the change in real-time (-i) and replace every tab with : throughout the file.

sed -i 's/::/:/g' Voter-FL-2018.txt --> Save the change in real-time (-i) and replace every occurrence of :: with : throughout the file ('s/::/:/g').

To remove any lines which are exact duplicates and sort the data alphabetically:

sort -u -b -i -f Voter-FL-All.txt > Voter-FL-All-Cleaned.txt

To combine two files into one file while removing all duplicate lines:

sort -u -f Giganews2.txt UsenetHistorical2.txt > UsenetFinal.txt

- CSV Files

Many data sets are stored within comma-separated value (CSV) files, which can simply be opened with any spreadsheet program.

If we have text files in this format, we can convert them into a CSV and keep only the desired columns:

LC_ALL=C cut -d, -f4,5,20,23,29,37 *.txt > Voter-CO-2021.csv

To remove unnecessary quotation marks:

LC_ALL=C sed -i "s/[\"]//g" Voter-CO-2021.txt

To combine every CSV file into a single text file and only extract the columns of data most valuable to us:

find . -type f -name \*.csv -print0 | xargs -0 cut -f1,3,4,5 > Giganews2.txt

To eliminate duplicate entries in text files which were previously CSV files (gsort is GNU sort on macOS; on Linux, use sort):

LC_ALL=C gsort -u -b -i -f data-cleaned.txt > Data-Final.txt

- JSON Files

When I encounter JSON data, I rely on jq (stedolan.github.io/jq/).

sudo apt-get install jq

To make a selective extraction:

jq --raw-output '"\(.data.first_name),\(.data.last_name),\(.data.gender),\(.data.birth_year),\(.data.birth_date),\(.data.linkedin_username),\(.data.facebook_id),\(.data.twitter_username),\(.data.work_email),\(.data.mobile_phone)"' people.json > people.txt

Methodology

To conduct queries across the open and dark web, you can use the IntelTechniques Search Tools and the Tor Browser.

Download data using the OSINT tools and skills covered throughout the OSINT sections.

As a start, you may consider focusing on public credential leaks on Pastebin (Pastebin.com). When you search an email address on Pastebin via Google, the results often include paste dumps. Clicking on these links will take you to the original paste document, which likely has many additional compromised credentials. A query of test@test.com on this site will likely present hundreds of data sets ready for download.
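
As an illustration (test@test.com is the same placeholder address used above), a Google query similar to the following can surface pastes which mention a target address:

site:pastebin.com "test@test.com"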

Any time I see on HIBP that a new public leak has surfaced, I search for that specific data with a custom Google query. If I saw that Myspace had been exposed, I would use the following:

"myspace" ext:rar OR ext:zip OR ext:7z OR ext:txt OR ext:sql

Now, assuming that you possess the databases, the first step is to query the email address, username, name, or whatever identifier you have through all of the databases you have acquired.

If we notice that the target used a password on a site, we should next conduct a search of that password to see if it was used anywhere else.
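
A minimal sketch of this methodology with Ripgrep, assuming the acquired databases sit under a hypothetical ~/Breaches directory and the target values are placeholders:

rg -aFiN "target@example.com" ~/Breaches/ > ~/Documents/email-hits.txt
# review the hits for exposed passwords, then pivot on each one
rg -aFiN "Password1234!" ~/Breaches/ > ~/Documents/password-hits.txt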

Data Hunting

Most of these sites come and go very frequently.

Known Data Breaches

- COMB

"Compilation Of Many Breaches", otherwise known as "COMB".

Numerous other combo lists have been released, such as Anti-Public, Exploit.in, and others.

site:anonfiles.com "CompilationOfManyBreaches.7z"

https://twitter.com/BubbaMustafa/status/1370376039583657985 —> Password: "+w/P3PRqQQoJ6g"

COMB includes a fast search option:

./query michaelbazzell@gmail.com

This presents limitations. You cannot use this tool to search a specific domain or password. For that we will once again rely on Ripgrep.
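
A sketch of such Ripgrep queries, assuming the extracted COMB data sits under a hypothetical ~/Downloads/COMB/data directory (COMB records are generally email:password pairs, so a leading colon narrows a password query):

rg -aiN "@inteltechniques.com" ~/Downloads/COMB/data/ > domain-hits.txt
rg -aFiN ":Password1234" ~/Downloads/COMB/data/ > password-hits.txt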

- hashes.org

The entire archive of hashes and passwords which previously existed on hashes.org is available as a torrent file.

90 GB compressed download:

https://pastebin.com/pS5AQNVO

https://old.reddit.com/r/DataHoarder/comments/ohlcye/hashesorg_archives_of_all_cracked_hash_lists_up

- HashMob (hashmob.net)

Paid search service and free downloads of lists similar to those previously found at Hashes.org.

These files are very similar to the Hashes.org data, but updated often.

To download all:

wget --content-disposition -i https://inteltechniques.com/data/hashlists.txt

- BTDig (btdig.com)

Go to this site and search for "public_".

One of the results is 241 GB of data titled "MyCloud".

- Archive.org Breach Data

Search their site for "nulled io" and click the "nulled.io_database_dump" result.

This is a 10 GB database from the Nulled hacking forum.

You could spend weeks on Archive.org identifying breach data.

Open Databases

- Elasticsearch Databases

The best tool to find this data is Shodan (shodan.io). You will need to be logged in to a free or paid account to conduct these queries.

Elasticsearch databases are extremely easy to access.

product:elastic port:9200 [target data]

Clicking the red square with an arrow next to the redacted IP address connects to the database within your browser in a new tab. This confirms that the database is online and open.

http://34.80.1.1:9200/_cat/indices?v

http://34.80.1.1:9200/bank/_search?size=10000

A Python script which parses through all results and stores every record is most appropriate: Elasticsearch Crawler (github.com/AmIJesse/Elasticsearch-Crawler). It will present several user prompts to enter the target IP address, index name, port number, and fields to obtain.
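
As an illustration only (the IP address, index name, and field names below are placeholders), the crawler can also be driven with arguments, mirroring the invocation used in the custom script later in this section:

python3 crawl.py 34.80.1.1 9200 bank email password name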

Online Data Search Sites

Remember that all of these sites come and go quickly.

- Underground Forums

  • BreachForums (breached.to / breached.vc / bf.hn)

  • People Data Labs: I mentioned this phenomenal resource previously in this book. In 2019, a security researcher claimed to have scraped an open database of over 1.2 billion PDL records including names, locations, and social network content. This data set is still floating around, and it is valuable.

  • Pipl: Similar to the previous example, a researcher claimed to extract 50 million profiles in 2019 from the people search site Pipl, containing full names, email addresses, vehicles owned, phone numbers and home addresses. Every week, I see someone offering a full copy for download.

  • IntelligenceX: I discussed this paste-scraping service in previous chapters. In 2021, a disgruntled customer "scraped their scrapes". Today, the 80,000 copied documents have been merged into one file possessing sensitive data associated with 46 million email addresses.

  • LiveRamp: In 2021, data allegedly sourced from LiveRamp (formerly Acxiom) was leaked. The data contains extensive information on most people living in the U.S., including home addresses, cellular telephone numbers, email addresses, names and more. It is commonly traded on Telegram, but the original source is unknown.

  • White Pages: Someone breached the White Pages site and copied 11 million profiles with users' names, email addresses, and passwords. It has been repackaged for years and offered as "new".

  • Verifications.io: This "enterprise email validation" service exposed their MongoDB database containing 763 million records including email addresses, names, genders, IP addresses, phone numbers and other personal information. It is widely available as a 150 GB download.

- Breach Search Resources

Breach data resources are explained within each task-specific section. However, here is a summary with direct URLs:

Email Address (test@test.com):

https://haveibeenpwned.com/unifiedsearch/test@test.com

https://dehashed.com/search?query=test@test.com

https://psbdmp.ws/api/search/test@test.com

https://portal.spycloud.com/endpoint/enriched-stats/test@test.com

https://check.cybernews.com/chk/?lang=en_US&e=test@test.com

https://intelx.io/?s=test@test.com

Username (test):

https://haveibeenpwned.com/unifiedsearch/test

https://dehashed.com/search?query=test

https://psbdmp.ws/api/search/test

Domain (inteltechniques.com):

https://dehashed.com/search?query=inteltechniques.com

https://psbdmp.ws/api/search/inteltechniques.com

https://intelx.io/?s=inteltechniques.com

Telephone (6185551212):

https://dehashed.com/search?query=6185551212

IP Address (1.1.1.1):

https://dehashed.com/search?query=1.1.1.1

https://psbdmp.ws/api/search/1.1.1.1

https://intelx.io/?s=1.1.1.1

Name (Michael Bazzell):

https://dehashed.com/search?query=michaelBazzell

https://psbdmp.ws/api/search/Michael%20Bazzell

Password (password1234):

https://dehashed.com/search?query=password1234

https://psbdmp.ws/api/search/password1234

https://www.google.com/search?q=password1234

Hash (BDC87B9C894DA5168059E00EBFFB9077):

https://hash.ziggi.org/api/dehash.get?hash=BDC87B9C894DA5168059E00EBFFB9077&include_external_db

https://decrypt.tools/client-server/decrypt?type=md5&string=BDC87B9C894DA5168059E00EBFFB9077

https://md5.gromweb.com/?md5=BDC87B9C894DA5168059E00EBFFB9077

https://www.nitrxgen.net/md5db/BDC87B9C894DA5168059E00EBFFB9077

https://dehashed.com/search?query=BDC87B9C894DA5168059E00EBFFB9077

https://www.google.com/search?q=BDC87B9C894DA5168059E00EBFFB9077

Miscellaneous Sites:

LeakPeek (leakpeek.com)

Breach Directory (breachdirectory.org)

Advanced Breach Tools

- h8mail

h8mail combines many of the breach search services.

It should never take the place of a full manual review.

To install it:

  • mkdir ~/Downloads/Programs/h8mail

  • cd ~/Downloads/Programs/h8mail

  • python3 -m venv h8mailEnvironment

  • source h8mailEnvironment/bin/activate

  • pip install -U h8mail

  • deactivate

  • cd ~/Downloads && h8mail -g

  • sed -i 's/\;leak\-lookup\_pub/leak\-lookup\_pub/g' h8mail_config.ini

To use it:

h8mail -t <target email address>

Providing API keys from services such as Snusbase, WeLeakInfo, Leak-Lookup, HaveIBeenPwned, EmailRep, Dehashed, and Hunter.io will return MANY more results, but these services can be quite expensive.

To provide those keys, create a configuration file by entering h8mail -g, then edit the generated h8mail_config.ini and add the API keys for the databases you want to search.
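
A hedged usage sketch (the target address and paths are placeholders; the -c and -o flags mirror the custom script below):

h8mail -t target@example.com -c ~/Downloads/h8mail_config.ini -o ~/Documents/target-results.txt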

- Custom breaches-leaks.sh script

It combines h8mail, Elasticsearch queries, and hash searches.

#!/usr/bin/env bash
opt1="Elasticsearch Tool"
opt2="Search That Hash"
opt3="Name That Hash"
opt4="H8Mail"
leaksmenu=$(zenity  --list  --title "Breaches/Leaks Tool" --radiolist  --column "" --column "" TRUE "$opt1" FALSE "$opt2" FALSE "$opt3" FALSE "$opt4" --height=400 --width=300) 
case $leaksmenu in
$opt1 )
mkdir -p ~/Documents/Elasticsearch/
cd ~/Downloads/Programs/Elasticsearch-Crawler/
ip=$(zenity --entry --title "IP Address" --text "Enter target IP address")
index=$(zenity --entry --title "Index" --text "Enter target index" )
fields=$(zenity --entry --title "Fields" --text "Enter desired data fields (separated by space)")
python3 crawl.py "$ip" 9200 "$index" $fields > ~/Documents/Elasticsearch/"$ip".txt
xdg-open ~/Documents/Elasticsearch/
exit;;
$opt2 )
hash=$(zenity --entry --title "Hash" --text "Enter Hash")
sth --text $hash
read -rsp $'Press enter to continue...\n'
exit;;
$opt3 )
hash=$(zenity --entry --title "Hash" --text "Enter Hash")
nth --text $hash
read -rsp $'Press enter to continue...\n'
exit;;
$opt4 )
mkdir -p ~/Documents/H8Mail/
email=$(zenity --entry --title "Email Address" --text "Enter target email address")
h8mail -t "$email" -c ~/Downloads/h8mail_config.ini -o ~/Documents/H8Mail/"$email".txt
xdg-open ~/Documents/H8Mail/"$email".txt
exit;;
esac

- IntelTechniques Breaches & Leaks Tool

This tool combines most of the online search options.

Code at Breaches.html.

- Git Leaks

To enumerate leaked credentials for AWS or GCP within GitHub repositories:

gitleaks --repo-url=https://github.com/example/production -v

Similar tools for GitHub leaks:

https://github.com/gitleaks/gitleaks

https://github.com/eth0izzle/shhgit

https://github.com/michenriksen/gitrob

- JS Leaks

To find leaked credentials within the target website's JavaScript files:

https://github.com/m4ll0k/SecretFinder/tree/master

cat subdomains_alive.txt | gau > params.txt

cat params.txt | uro -o filtered-params.txt

cat filtered-params.txt | grep ".js$" > jsfiles.txt

cat jsfiles.txt | uro | anew jsfiles.txt

cat jsfiles.txt | while read url; do python3 SecretFinder.py -i $url -o cli >> secret.txt; done

- TruffleHog

Monitor Git, Jira, Slack, Confluence, Microsoft Teams, SharePoint, and more for credentials:

https://github.com/trufflesecurity/trufflehog

Stealer Logs

The most common stealer logs we find are labeled as Raccoon Stealer, Redline Stealer, and Vidar Stealer. The groups which offer data stolen via these programs are countless. A new group pops up every week.

Your browser has likely asked whether you would like to store information which was entered into an online form. If you allow your browser to store this data, stealer malware easily collects it into these logs.

A typical stealer log contains:

  • The cookies from the victim's browser, which offer immediate access to the priority domains which exist in the overall record.

  • Data parsed from the stored form fields.

  • All installed browsers and their versions.

  • All applications installed on the machine.

  • All of the passwords stored within all browsers.

  • Screen captures of the victim's machine at the time of infection.

  • General details about the system, including the victim's IP address, hardware, location, and date.

The following Google queries might present interesting information, but most results lead to shady criminal marketplaces:

"stealer logs" "download"

"stealer logs" "Azorult"

"stealer logs" "Vidar"

"stealer logs" "Redline"

"stealer logs" "Raccoon"

- Telegram (telegram.org)

Telegram is quite possibly the best resource for Stealer Logs.

sudo apt install telegram-desktop

There are dozens of Telegram channels which offer free stealer logs in order to promote their paid services. We can search for them with the services presented in the "Social Media > Telegram" section, with the official Telegram search field, or through Google searches. Examples:

site:breached.vc "mega.nz" "stealer log"

site:breached.vc "anonfiles.com" "stealer log"

site:anonfiles.com "logs" site:mediafire.com "logs"

Known Telegram stealer log rooms:

"rubancloudfree"

"Ruban Private @rubanowner.rar"

Another Telegram channel is called "redlogscloud".

Other groups: Logs Cloud, Keeper Cloud, Luffich Cloud, Crypton Logs, Bugatti Cloud, Ruban Cloud, Luxury Logs, Wild Logs, Eternal Logs, Logs Arthouse, HUBLOGS, Redlogs Cloud, OnelOgs, Bank, Observer Cloud, Expert Logs, BradMax, and Cloud Logs.

- Data Cleansing

Most people find the email addresses, usernames, and passwords to be the most valuable part of this data. While the screen captures, documents, and cookies are beneficial, they take up the most space.

Instead of saving the password files, let's merge them all into one file.

To navigate to each "passwords.txt" file, extract all of the text data, and compile it into one file, ignoring the case of the file name in order to extract data from both "passwords.txt" and "Passwords.txt" (the second command is a broader alternative which also catches files such as "passwords2.txt"):

find . -iname 'passwords.txt' -exec cat {} \; > ~/Downloads/Passwords.txt

find . -iname 'passwords*' -exec cat {} \; > ~/Downloads/Passwords.txt

You could replicate this for the "ImportantAutofills.txt" files with the following:

find . -iname 'importantautofills.txt' -exec cat {} \; > ~/Downloads/ImportantAutofills.txt

We can also create an ever-growing set of date-stamped Passwords.txt files which can be quickly queried at any time:

timestamp=$(date +%Y-%m-%d) && find . -iname 'passwords.txt' -exec cat {} \; > ~/Downloads/Passwords.$timestamp.txt
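
A quick query sketch against these merged files (the target address is a placeholder):

rg -aFiN "target@example.com" ~/Downloads/Passwords*.txt
rg -aFiN "target@example.com" ~/Downloads/ImportantAutofills.txt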

Ransomware

The first step is to identify the current URLs for the various ransomware groups.

Some resources:

http://ransomwr3tsydeii4q43vazm/wofla5ujdajquitomtd47cxjtfgwyyd.onion/

https://github.com/fastfire/deepdarkCTI/blob/main/ransomware_gang.md

https://darkfeed.io/ransomgroups/

Next, Google should help you find any other pages valuable to ransomware investigations. Example queries:

"onion" "Ragnar" "ransomware" "url"

"onion" "REvil" "ransomware" "url"

"onion" "Conti" "ransomware" "url"

"onion" "Vice Society" "ransomware" "url"

"onion" "Clop" "ransomware" "url"

"onion" "Nefilim" "ransomware" "url"

"onion" "Everest" "ransomware" "url"

Hack Notice often announces ransomware publications:

site:https://app.hacknotice.com "onion"

Clicking the title of the article presents the Tor URL which may display the stolen data. Clicking "View Original Source” on this new page will attempt to open the Tor URL in your browser.

Hashes

Most websites and many data sets store passwords in a hashed format.

To identify and crack these hashes, see the "Password Cracking" section.

If we possess a database with hashed passwords and we want to search for a known password, we can invert the process and convert that password into the same hash format used by the database.

Some resources to convert a password into common hashes:

https://passwordsgenerator.net/md5-hash-generator/

https://passwordsgenerator.net/sha1-hash-generator/

https://passwordsgenerator.net/sha256-hash-generator/
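
As a local alternative to the sites above, a minimal sketch using standard Linux utilities (the candidate password and file name are placeholders):

echo -n 'password1234' | md5sum       # generate the MD5 hash
echo -n 'password1234' | sha1sum      # generate the SHA-1 hash
echo -n 'password1234' | sha256sum    # generate the SHA-256 hash
rg -aiN "<resulting hash>" database.txt   # search a dump for the hash, ignoring case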

Data Cleansing

It will not take long for your own data collection to fill your drive space.

- Cheatsheet of cleansing commands

Replace "OLD" with "NEW": sed -i 's/OLD/NEW/g' data.txt

Replace all commas with a hyphen: sed -i 's/\,/\-/g' data.txt

Replace all tabs with a comma: sed -i 's/\t/\,/g' data.txt (GNU sed; alternatively insert a literal tab by pressing Ctrl+V then Tab)

Remove all data until the first comma: sed -i 's/^\([^,]*,\)//g' data.txt

Remove all data until the first colon: sed -i 's/^[^:]*://g' data.txt

Remove all single quotes: sed -i "s/'//g" data.txt

Remove all double quotes: sed -i 's/\"//g' data.txt

Remove "junk": sed -i 's/junk//g' data.txt

Remove all between "FIRST" & "THIRD": sed -i 's/\(FIRST\).*\(THIRD\)/\1\2/' data.txt

Remove all digits between commas: sed -i 's/\,[0-9]*\,//g' data.txt

Remove any line beginning with "TEST": sed -i '/^TEST/d' data.txt

Remove any line not containing "@": awk '/@/' data.txt > newfile.txt

Remove empty lines: sed -i '/^$/d' data.txt

Remove first 10 lines: sed -i '1,10d' data.txt

Remove first ten characters: sed -i 's/^.\{10\}//' data.txt

Remove everything after the last "_": sed -i "s/_[^_]*$//" data.txt

Remove 0000-00-00 00:00:00 : sed -i 's/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]//g' data.txt

Remove duplicate rows: awk '!seen[$0]++' data.txt > newfile.txt

Remove duplicate rows and sort: LC_ALL=C sort -u -b -i -f data.txt > newfile.txt

Remove data between "{" and "}": sed -i 's/{[^}]*}//g' data.txt

Extract Emails: grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" < data.txt > newfile.txt

Split large file into multiple files: split -l 200000000 data.txt 1

Display total lines in a file: wc -l data.txt

Cut columns 1, 2, and 6: LANG=C cut -d, -f1,2,6 data.txt > newfile.txt

Remove hyphens from phone numbers: sed -i 's/\([0-9]\{3,\}\)-/\1/g' data.txt

Cut columns from JSON: jq --raw-output '"\(.email),\(.password)"' data.json > newfile.txt

- Duplicate Files

If you collect enough breach data, you will likely find many duplicate files with different names.

Install fdupes:

sudo apt-get install fdupes

The next command launches fdupes, recursively scans files within the Downloads directory, and prompts you to select which duplicate file to keep:

fdupes -r -d ~/Downloads
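
To merely list duplicate sets without deleting anything, omit the -d flag:

fdupes -r ~/Downloads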
