Wordlist Building

We'll start with a base of common dictionary words. We need to look for wordlists containing names, places, seasons, bands, cities, and the like, as well as words related to the target.

Crackstation (https://crackstation.net/) has one of the most complete and diverse wordlists available for free online.

We can use this wordlist as a base and adapt it to our current target.

- wordcollector.py

We run this script to crawl the target site and extract words for our wordlist, then remove duplicates and count the remaining entries:

python wordcollector.py {target website} > wordlist.txt

LC_ALL=C sort -u wordlist.txt -o scraped_wordlist.txt && wc -l scraped_wordlist.txt
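The original wordcollector.py is not reproduced here. To give a rough idea of its word-extraction step, here is a minimal sketch that assumes the page HTML has already been downloaded (crawling and fetching are left out). The names `TextExtractor`, `extract_words` and `min_len` are hypothetical; `min_len` plays a role similar to CeWL's -m flag, and only ASCII letters and digits are treated as word characters.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


# Assumption: a "word" is a run of ASCII letters and digits.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def extract_words(html, min_len=3):
    """Return the unique words of at least min_len characters found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join chunks with spaces so text from adjacent tags does not merge into one word.
    text = " ".join(parser.chunks)
    return sorted({w for w in WORD_RE.findall(text) if len(w) >= min_len})
```

For example, feeding it a company page with min_len=4 keeps names such as "Acme" and "Springfield" and drops short words like "in", while ignoring JavaScript and CSS.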

- Other tools to generate a wordlist

CeWL https://digi.ninja/projects/cewl.php

A typical CeWL command looks like the following, where -m 8 sets the minimum word length to 8 characters:

cewl -m 8 http://www.google.com

kwprocessor https://github.com/hashcat/kwprocessor

This utility generates key-walk passwords, which follow sequences of adjacent keys, such as qwerty, 1q2w3e4r, 6yHnMjU7, and so on.
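To illustrate the idea of routes and direction changes, here is a toy generator. It is a simplified sketch, not kwprocessor's algorithm: it assumes the unshifted US QWERTY rows form a plain grid (no row stagger), allows only four directions (east, west, north, south) and ignores shift/altgr variants, so it cannot produce zigzags like 1q2w3e4r. `ROWS`, `keywalks` and `max_changes` are hypothetical names; kwprocessor's basechars, keymap and route files model the keyboard far more completely.

```python
# Toy key-walk generator (illustration only, NOT kwprocessor's algorithm).
# Assumptions: unshifted US QWERTY rows treated as a plain grid, four directions only.
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
DIRECTIONS = {"E": (0, 1), "W": (0, -1), "N": (-1, 0), "S": (1, 0)}


def key_at(row, col):
    """Return the key at (row, col), or None if that position is off the keyboard."""
    if 0 <= row < len(ROWS) and 0 <= col < len(ROWS[row]):
        return ROWS[row][col]
    return None


def keywalks(length, max_changes):
    """Return every walk of `length` keys with at most `max_changes` direction changes."""
    results = set()

    def walk(row, col, path, last_dir, changes):
        if len(path) == length:
            results.add(path)
            return
        for name, (d_row, d_col) in DIRECTIONS.items():
            # Moving in a different direction than the previous step counts as a change.
            turned = last_dir is not None and name != last_dir
            if changes + turned > max_changes:
                continue
            nxt = key_at(row + d_row, col + d_col)
            if nxt is not None:
                walk(row + d_row, col + d_col, path + nxt, name, changes + turned)

    # Start a walk from every key on the grid.
    for r, keys in enumerate(ROWS):
        for c, key in enumerate(keys):
            walk(r, c, key, None, 0)
    return sorted(results)
```

With max_changes=0 it only yields straight lines such as qwerty or 1qaz; raising max_changes to 2 adds turns such as qwsa (east, south, west).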

On Windows:

kwp64.exe basechars\custom.base keymaps\uk.keymap routes\2-to-10-max-3-direction-changes.route -o keywalk.txt

On Linux:

./kwp -z basechars/full.base keymaps/en-us.keymap routes/2-to-16-max-3-direction-changes.route > kwp3.txt

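Some candidates get generated multiple times, so de-dup the list before using it for maximum efficiency. On Linux:

LC_ALL=C sort -u kwp3.txt -o kwp3.txt

In PowerShell, Select-String -Pattern "^qwerty$" -Path keywalk.txt -CaseSensitive prints every line that is exactly qwerty, which is a quick way to check whether a candidate appears more than once.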
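Note that sort -u reorders the list alphabetically. If the original order matters (for example, a list ranked most-likely-first), an order-preserving de-dup is an alternative. The sketch below is one way to do it; `dedup_lines` and `dedup_file` are hypothetical helpers, matching is case-sensitive like the -CaseSensitive check, and lines are handled as raw bytes so non-UTF-8 entries survive unchanged. It keeps every unique line in memory, so for multi-gigabyte lists sort -u remains the more practical choice.

```python
def dedup_lines(lines):
    """Yield each line the first time it is seen, keeping the original order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def dedup_file(src_path, dst_path):
    """Order-preserving, case-sensitive de-dup of a wordlist, read and written as bytes."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Strip line endings so "qwerty\n" and a final "qwerty" without newline match.
        stripped = (raw.rstrip(b"\r\n") for raw in src)
        for line in dedup_lines(stripped):
            dst.write(line + b"\n")
```

For instance, dedup_file("keywalk.txt", "keywalk-dedup.txt") writes each candidate once, in the order kwprocessor first generated it.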

Other tools: RSMangler, Mentalist

- Ruleset

The hashcat RockYou ruleset, rockyou-30000.rule (https://github.com/hashcat/hashcat/blob/master/rules/rockyou-30000.rule), is a very good start, but it doesn't account for obvious combinations such as leetspeak substitutions followed by numbers, special characters followed by numbers, capitalized first letters, and prepended numbers or special characters.

To cover these combinations, we can use this custom ruleset: https://github.com/sparcflow/HackLikeALegend/blob/master/corporate.rule
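To show what a line in a rule file does to each base word, here is a toy interpreter for a handful of hashcat rule functions: : (do nothing), l (lowercase), u (uppercase), c (capitalize the first letter and lowercase the rest), $X (append X), ^X (prepend X) and sXY (replace every X with Y). It is a sketch for illustration, not hashcat's rule engine; real rule files use many more functions, and `apply_rule`/`apply_rules` are hypothetical names.

```python
# Toy interpreter for a small subset of hashcat rule functions (illustration only).
# Number of parameter characters each supported function takes.
ARITY = {":": 0, "l": 0, "u": 0, "c": 0, "$": 1, "^": 1, "s": 2}


def apply_rule(word, rule):
    """Apply one rule line (a sequence of functions) to a single word."""
    i = 0
    while i < len(rule):
        op = rule[i]
        if op == " ":  # spaces between functions are ignored
            i += 1
            continue
        if op not in ARITY:
            raise ValueError(f"unsupported rule function: {op!r}")
        args = rule[i + 1 : i + 1 + ARITY[op]]
        if len(args) != ARITY[op]:
            raise ValueError(f"missing parameter for {op!r} in {rule!r}")
        i += 1 + ARITY[op]
        if op == "l":
            word = word.lower()
        elif op == "u":
            word = word.upper()
        elif op == "c":
            word = word[:1].upper() + word[1:].lower()
        elif op == "$":
            word = word + args
        elif op == "^":
            word = args + word
        elif op == "s":
            word = word.replace(args[0], args[1])
        # ":" leaves the word unchanged.
    return word


def apply_rules(words, rules):
    """Apply every rule to every word; return unique candidates in first-seen order."""
    out = {}
    for word in words:
        for rule in rules:
            out.setdefault(apply_rule(word, rule), None)
    return list(out)
```

For example, the rule c $1 turns password into Password1, sa@ so0 $! turns it into p@ssw0rd!, and ^1 ^! turns summer into !1summer (each ^ prepends, so the last one ends up first). These are exactly the capitalization, leetspeak, appended-number and prepended-character patterns mentioned above.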

We can also harvest passwords leaked during major breaches to build a wordlist that needs no rule processing. These are, after all, real passwords that probably respected similar password complexity rules. Skipping rule processing saves us a huge amount of computing power. The Real-Passwords lists in berzerk0's Probable-Wordlists GitHub repo (https://github.com/berzerk0/Probable-Wordlists/) contain close to 2 billion leaked passwords.
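Because leaked passwords are only useful if they could also satisfy the target's complexity rules, it can help to filter the list against that policy first. The sketch below assumes a hypothetical policy (minimum length 8, at least 3 of 4 character classes: lowercase, uppercase, digit, other); adjust `min_len` and `min_classes` to whatever the target actually enforces. `meets_policy` and `filter_leaked` are illustrative names, and undecodable bytes are round-tripped with surrogateescape so entries are not altered.

```python
def meets_policy(password, min_len=8, min_classes=3):
    """Check a candidate against a hypothetical length + character-class policy."""
    classes = (
        any(ch.islower() for ch in password),
        any(ch.isupper() for ch in password),
        any(ch.isdigit() for ch in password),
        any(not ch.isalnum() for ch in password),  # anything else counts as "special"
    )
    return len(password) >= min_len and sum(classes) >= min_classes


def filter_leaked(src_path, dst_path, min_len=8, min_classes=3):
    """Copy only the policy-compliant lines of a leaked-password list to dst_path."""
    with open(src_path, encoding="utf-8", errors="surrogateescape") as src, \
         open(dst_path, "w", encoding="utf-8", errors="surrogateescape") as dst:
        for line in src:
            candidate = line.rstrip("\r\n")
            if meets_policy(candidate, min_len, min_classes):
                dst.write(candidate + "\n")
```

Under these assumptions, Summer2019! and Password1 are kept, while password and summer2019 are dropped because they use fewer than three character classes.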
