Visualizing dataset of 2 million+ passwords:

I found a dataset of passwords on DataScienceCentral, "Password and hijacked email dataset for you to test your data science skills", and for fun I played with it for an hour or so:

1) Password Length vs Frequency

[Figure: password length vs. frequency]
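For readers who want to reproduce a length-vs-frequency count, here is a minimal Python sketch. It is not the method used for the chart above (that was done with SQL Server and Excel); the function name `length_frequencies` and the sample list are made up for illustration.

```python
from collections import Counter


def length_frequencies(passwords):
    """Map each password length to the number of passwords with that length."""
    return Counter(len(p) for p in passwords)


# Made-up sample, not taken from the real dataset.
sample = ["123456", "password", "qwerty", "letmein1", "abc"]
freq = length_frequencies(sample)

# Print one "length count" pair per line, shortest length first.
for length in sorted(freq):
    print(length, freq[length])
```

The resulting counts can be pasted into any charting tool to draw the bar chart.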

2) Percentage of passwords having at least one special character vs passwords having no special character:

[Figure: passwords that have a special character vs. the ones that don't]

3) Percentage of passwords that have at least one number, one letter and one special character AND a length of 8 or more.

Answer: 1.4856%

Let’s compare passwords of length 8 or more (69.302%) with passwords of length 8 or more that combine letters, numbers and special characters (1.4856%):

[Figure: passwords of length 8 or more vs. passwords of length 8 or more combining letters, numbers and special characters]
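The two percentages in this comparison could be computed roughly as follows. This is a hedged Python sketch, not the original T-SQL; the helper names (`is_long`, `is_long_and_mixed`, `percentage`) and the sample passwords are assumptions made for illustration.

```python
import re


def is_long(password, min_length=8):
    """True if the password has at least min_length characters."""
    return len(password) >= min_length


def is_long_and_mixed(password, min_length=8):
    """True if the password is long enough AND contains a letter, a digit and a special character."""
    return (
        len(password) >= min_length
        and re.search(r"[a-zA-Z]", password) is not None
        and re.search(r"[0-9]", password) is not None
        and re.search(r"[^a-zA-Z0-9]", password) is not None
    )


def percentage(passwords, predicate):
    """Percentage (0-100) of passwords for which predicate returns True."""
    passwords = list(passwords)
    if not passwords:
        return 0.0
    matches = sum(1 for p in passwords if predicate(p))
    return 100.0 * matches / len(passwords)


# Made-up sample, not taken from the real dataset.
sample = ["password", "abc", "Passw0rd!", "12345678"]
share_long = percentage(sample, is_long)
share_mixed = percentage(sample, is_long_and_mixed)
```

On the real file the same two calls would give the pair of numbers being compared, assuming one password per line.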

That’s about it for now – it was fun!

 

And for those interested, here are a few behind-the-scenes technical details:

Tools I used:

1. Excel & 2. SQL Server

Note: I first tried using Google Refine to augment the data, but it crashed on me, so I switched to SQL Server and T-SQL. If Excel 2010 supported 2+ million rows, I would not have needed SQL Server. Anyhow, the tool used is not important here.

Initial state:

2 million passwords in a .txt file.

Information I appended to the data-set using TSQL:

1. Length of password

2. Has letters?

[a-zA-Z]

3. Has Numbers?

[0-9]

4. Has special Characters?

[^a-zA-Z0-9]

Plus a few others derived from #2, #3 & #4, like “has letters + numbers + special characters?”
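The derived columns listed above can be sketched in a few lines. The original work was done in T-SQL; the version below is a Python stand-in that reuses the same three character classes, and the helper name `password_features` and the dictionary keys are hypothetical. Note that with these ASCII-only classes, any non-ASCII letter counts as a special character.

```python
import re

# The same character classes listed above, compiled once for reuse.
HAS_LETTER = re.compile(r"[a-zA-Z]")
HAS_DIGIT = re.compile(r"[0-9]")
HAS_SPECIAL = re.compile(r"[^a-zA-Z0-9]")


def password_features(password):
    """Return the derived columns for one password (hypothetical helper)."""
    has_letter = HAS_LETTER.search(password) is not None
    has_digit = HAS_DIGIT.search(password) is not None
    has_special = HAS_SPECIAL.search(password) is not None
    return {
        "length": len(password),
        "has_letter": has_letter,
        "has_digit": has_digit,
        "has_special": has_special,
        # Derived flag: all three character types are present.
        "has_all_three": has_letter and has_digit and has_special,
    }


# Made-up example password, not taken from the real dataset.
features = password_features("abc123!")
```

Applying the helper to every line of the .txt file would give one row of flags per password, which is the shape needed for the charts.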

That’s about it for the technical details. Ping me if interested!

 

5 thoughts on “Visualizing dataset of 2 million+ passwords:”

  1. Ben Littenberg

    Were these hacked passwords or just passwords in general? If hacked, then the idea that the common characteristics in these strings are to be avoided is a good idea. But if these are passwords that served well, then we should emulate them.
