Wordlist Building
We'll start with a base of common dictionary words. We need to look for wordlists containing names, places, seasons, bands, cities, and the like, as well as words related to the target.
Crackstation (https://crackstation.net/) has one of the most complete and diverse wordlists available for free online.
We can use this wordlist as a base, and adapt it to our current target.
We run a crawler script (wordcollector.py in this example) against the target site to extract words for our wordlist, then remove duplicates:
python wordcollector.py {target website} > wordlist.txt
sort -u wordlist.txt | tee scrapped_wordlist.txt | wc -l
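The source of wordcollector.py is not shown here, so the following is only a minimal sketch of the word-extraction step such a script might perform, assuming the page HTML has already been downloaded. The names `_TextExtractor`, `extract_words` and the `min_len` parameter are hypothetical and chosen for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_words(html, min_len=4):
    """Return the sorted, de-duplicated lowercase words found in an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    words = re.findall(r"[A-Za-z]+", text)
    return sorted({w.lower() for w in words if len(w) >= min_len})


if __name__ == "__main__":
    # Hypothetical sample page; a real run would fetch the target site's HTML.
    sample = (
        "<html><body><h1>Acme Widgets</h1>"
        "<script>var x = 1;</script><p>Welcome to Acme</p></body></html>"
    )
    for word in extract_words(sample):
        print(word)
```

Returning a sorted set means the output is already unique, so the sort -u step only matters when several pages or runs are concatenated into the same file.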
CeWL https://digi.ninja/projects/cewl.php
A typical CeWL command line looks like the following, where -m 8 restricts the output to words of at least 8 characters:
cewl -m 8 http://www.google.com
kwprocessor https://github.com/hashcat/kwprocessor
This is a utility for generating key-walk passwords, which are based on adjacent keys such as qwerty, 1q2w3e4r, 6yHnMjU7 and so on.
kwp64.exe basechars\custom.base keymaps\uk.keymap routes\2-to-10-max-3-direction-changes.route -o keywalk.txt
./kwp -z basechars/full.base keymaps/en-us.keymap routes/2-to-16-max-3-direction-changes.route > kwp3.txt
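To make the idea of a key-walk concrete, here is a simplified sketch that enumerates walks over adjacent keys. It is not kwprocessor's algorithm: it assumes a reduced US QWERTY layout without shifted characters, models the row stagger with fixed offsets, and ignores direction-change limits. `ROWS`, `build_adjacency` and `key_walks` are hypothetical names.

```python
# Simplified, unshifted US QWERTY rows (an assumption, not a kwprocessor keymap).
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Neighbour offsets as (row delta, column delta). Each row is treated as shifted
# half a key to the right of the row above it, so a key touches two keys above
# and two keys below.
OFFSETS = ((0, -1), (0, 1), (-1, 0), (-1, 1), (1, -1), (1, 0))


def build_adjacency(rows):
    """Map every key to the list of keys that physically touch it."""
    adjacency = {}
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            neighbours = []
            for dr, dc in OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < len(rows) and 0 <= cc < len(rows[rr]):
                    neighbours.append(rows[rr][cc])
            adjacency[key] = neighbours
    return adjacency


def key_walks(start, length, adjacency):
    """Yield every walk of `length` keys from `start` that never revisits a key."""

    def extend(path):
        if len(path) == length:
            yield "".join(path)
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:
                yield from extend(path + [nxt])

    yield from extend([start])


if __name__ == "__main__":
    adjacency = build_adjacency(ROWS)
    for walk in key_walks("q", 3, adjacency):
        print(walk)
```

Even this toy model grows quickly with walk length, which is why route files that cap the length and the number of direction changes are useful in practice.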
Some candidates will be generated multiple times, so de-duplicate the list before using it for maximum efficiency. To check how often a given candidate appears, search for it with an anchored, case-sensitive pattern (Select-String -Pattern "^qwerty$" -Path keywalk.txt -CaseSensitive).
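One way to remove repeated candidates while keeping the generation order is sketched below. This is an illustration rather than part of kwprocessor; the function names are hypothetical, and the file paths are whatever the generator wrote (keywalk.txt above).

```python
from collections import Counter


def dedupe_preserving_order(candidates):
    """Drop repeated candidates, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order.
    return list(dict.fromkeys(candidates))


def duplicate_counts(candidates):
    """Return {candidate: occurrences} for candidates seen more than once."""
    return {word: n for word, n in Counter(candidates).items() if n > 1}


def dedupe_file(src_path, dst_path):
    """Write the unique lines of src_path to dst_path and return how many were kept."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        unique = dedupe_preserving_order(line.rstrip("\r\n") for line in src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(word + "\n" for word in unique)
    return len(unique)
```

The comparison is case-sensitive, so qwerty and QWERTY are kept as separate candidates, matching the -CaseSensitive search above.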
Other tools: RSMangler, Mentalist
The RockYou ruleset rockyou-30000.rule from hashcat (https://github.com/hashcat/hashcat/blob/master/rules/rockyou-30000.rule) is a very good start, but it doesn't account for obvious combinations such as leet letters followed by numbers, special characters followed by numbers, capitalized first letters, and prepended numbers or special characters.
We can use this custom ruleset: https://github.com/sparcflow/HackLikeALegend/blob/master/corporate.rule
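To illustrate what such rules do to a base word, the sketch below interprets a small subset of hashcat rule functions: `:` (no change), `c` (capitalize the first letter, lowercase the rest), `$X` (append X), `^X` (prepend X) and `sXY` (replace every X with Y). It is a toy interpreter for illustration only, not hashcat's engine, and the example rules are made up rather than taken from corporate.rule.

```python
def apply_rule(word, rule):
    """Apply one rule line (a sequence of rule functions) to a single word."""
    i = 0
    while i < len(rule):
        op = rule[i]
        if op in (" ", ":"):
            # A space separates functions; ':' leaves the word unchanged.
            i += 1
        elif op == "c":
            word = word[:1].upper() + word[1:].lower()
            i += 1
        elif op == "$":
            word = word + rule[i + 1]   # append the next character
            i += 2
        elif op == "^":
            word = rule[i + 1] + word   # prepend the next character
            i += 2
        elif op == "s":
            word = word.replace(rule[i + 1], rule[i + 2])
            i += 3
        else:
            raise ValueError(f"unsupported rule function: {op!r}")
    return word


def mangle(words, rules):
    """Yield every word transformed by every rule."""
    for word in words:
        for rule in rules:
            yield apply_rule(word, rule)


if __name__ == "__main__":
    # Hypothetical rules covering the combinations mentioned above.
    rules = [":", "c $1 $2 $3", "sa@ so0 $!", "c ^1"]
    for candidate in mangle(["password", "summer"], rules):
        print(candidate)
```

Because each rule multiplies the number of candidates, a short targeted ruleset applied to a relevant wordlist is usually cheaper than a huge generic one.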
We can also harvest passwords leaked in previous major breaches to create a wordlist that needs no rule processing. These are, after all, real passwords that probably respected similar password complexity rules. This saves us a huge amount of computing power. Berzerk0's Real-Passwords GitHub repo (https://github.com/berzerk0/Probable-Wordlists/) lists close to 2 billion leaked passwords.