Cracking Caesar Cipher

Visualising frequency analysis in encrypted messages using Caesar and Vigenère ciphers

Juan Rinconada
4 min read · Oct 21, 2019

Introduction

The Caesar cipher is a method of message encryption that is easily cracked using frequency analysis. To evade this analysis, our secrets are safer with the Vigenère cipher. This is the advantage of a polyalphabetic cipher over an affine monoalphabetic substitution cipher such as Caesar. In other words: the same letter is not always encrypted the same way.

To make sense of all these strange words, I made a Python script that encrypts a message using both the Caesar and Vigenère ciphers and performs a letter frequency analysis, plotting some pretty graphs to illustrate it all.

This article tries to explain the whole process by going through the code and the output. The complete project with the full code and usage instructions can be found here: https://github.com/jrinconada/cracking-caesar-cipher
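As a minimal sketch of why the Caesar cipher is so easy to crack, the snippet below guesses the shift by assuming that the most frequent letter in the encrypted text stands for `e`, the most common letter in English. The names `caesar_encrypt` and `guess_shift` are illustrative and are not taken from the project, and the guess is only reliable for reasonably long English messages.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase


def caesar_encrypt(message, shift):
    """Shift every letter by `shift` positions; other characters are kept."""
    result = []
    for char in message.lower():
        if char in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(char) + shift) % 26])
        else:
            result.append(char)
    return "".join(result)


def guess_shift(ciphertext):
    """Assume the most common ciphertext letter is an encrypted 'e'."""
    letters = [c for c in ciphertext.lower() if c in ALPHABET]
    most_common = Counter(letters).most_common(1)[0][0]
    return (ALPHABET.index(most_common) - ALPHABET.index("e")) % 26


secret = caesar_encrypt("meet me here when the eleven bells end", 3)
shift = guess_shift(secret)
plain = caesar_encrypt(secret, -shift)  # shifting back decrypts
```

Because every `e` becomes the same encrypted letter, counting letters is enough to recover the shift. This is exactly the weakness a polyalphabetic cipher avoids.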

Letter counting

Going through every letter in a message, counting how many times it appears and saving the counts as a list of integers.
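The original listing is not reproduced here, so the following is only a sketch of how such a count could look. It assumes the plain 26-letter English alphabet, ignores case and skips anything that is not a letter; the name `count_letters` is hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase


def count_letters(message):
    """Return a list of 26 integers: how often each letter a-z appears."""
    counts = [0] * len(ALPHABET)
    for char in message.lower():
        if char in ALPHABET:  # skip spaces, digits and punctuation
            counts[ALPHABET.index(char)] += 1
    return counts


counts = count_letters("Hello, World!")
```

The position in the list identifies the letter, so `counts[ALPHABET.index("l")]` is the number of times `l` appears.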

Definitions

Usage

Output

Plotting letter count

Plotting a line with the letter count as the y coordinate and the alphabet as the x coordinate.
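A possible way to draw that line is sketched below. It assumes matplotlib as the plotting library, which the article does not confirm, and the function names are hypothetical. The counting helper is repeated so the snippet runs on its own.

```python
import string

ALPHABET = string.ascii_lowercase


def count_letters(message):
    """Return a list of 26 integers: how often each letter a-z appears."""
    counts = [0] * len(ALPHABET)
    for char in message.lower():
        if char in ALPHABET:
            counts[ALPHABET.index(char)] += 1
    return counts


def plot_letter_count(message):
    """Draw the letter count as a line: alphabet on x, count on y."""
    # Imported here so the counting helper works without matplotlib installed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(list(ALPHABET), count_letters(message), label="Plain text")
    ax.set_xlabel("Letter")
    ax.set_ylabel("Count")
    ax.legend()
    return fig
```

Calling `plot_letter_count("some message").savefig("count.png")` would write the chart to a file, while `plt.show()` would open it in a window instead.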

Definition

Usage

Output

Adding Caesar cipher

Adding another line to represent the letter count in a message encrypted using the Caesar cipher.

Note that the previous definitions are needed for this code to work. They are not repeated, to avoid redundancy and keep the code cleaner.
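The sketch below shows one way this step could be written. Unlike the original listing, it repeats the small counting helper so that it runs on its own. The function names, the default shift of 3 and the use of matplotlib are assumptions.

```python
import string

ALPHABET = string.ascii_lowercase


def count_letters(message):
    """Return a list of 26 integers: how often each letter a-z appears."""
    counts = [0] * len(ALPHABET)
    for char in message.lower():
        if char in ALPHABET:
            counts[ALPHABET.index(char)] += 1
    return counts


def caesar(message, shift):
    """Replace each letter with the one `shift` positions later, wrapping at z."""
    shift %= len(ALPHABET)
    shifted = ALPHABET[shift:] + ALPHABET[:shift]
    return message.lower().translate(str.maketrans(ALPHABET, shifted))


def plot_plain_vs_caesar(message, shift=3):
    """Draw one line for the plain text and one for its Caesar encryption."""
    import matplotlib.pyplot as plt  # assumed plotting library

    letters = list(ALPHABET)
    fig, ax = plt.subplots()
    ax.plot(letters, count_letters(message), label="Plain text")
    ax.plot(letters, count_letters(caesar(message, shift)),
            label=f"Caesar (shift {shift})")
    ax.legend()
    return fig
```

The Caesar line is the plain-text line moved sideways by `shift` positions: the shape of the curve survives encryption, which is what frequency analysis exploits.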

Definition

Usage

Output

Adding Vigenère cipher

Adding yet another line to represent the letter count in a message encrypted using the Vigenère cipher.
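A minimal Vigenère sketch follows, under a few assumptions: the key contains only the letters a-z, the key advances only when a letter is encrypted, and non-letters are copied unchanged. The name `vigenere` and the sample message and key are illustrative, not taken from the project.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_lowercase


def vigenere(message, key):
    """Shift each letter by the alphabet position of the next key letter."""
    key_shifts = cycle(ALPHABET.index(k) for k in key.lower())
    result = []
    for char in message.lower():
        if char in ALPHABET:
            shift = next(key_shifts)  # the key advances only on letters
            result.append(ALPHABET[(ALPHABET.index(char) + shift) % 26])
        else:
            result.append(char)
    return "".join(result)


message = "attack at dawn"
secret = vigenere(message, "lemon")
# Collect what each plain-text 'a' turned into
encrypted_a = [s for p, s in zip(message, secret) if p == "a"]
```

Here the four occurrences of `a` are encrypted as four different letters, so the letter count of the encrypted message no longer mirrors that of the plain text.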

Definition

Usage

Output

Plotting theoretical frequencies

Using a pie chart and a bar chart to display theoretical letter frequency in a pleasant way.
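The article does not say which frequency table or language the project uses. The sketch below assumes English and the commonly cited letter percentages, and again assumes matplotlib; all names are hypothetical.

```python
# Commonly cited English letter frequencies, in percent (assumed table)
ENGLISH_FREQ = {
    "a": 8.167, "b": 1.492, "c": 2.782, "d": 4.253, "e": 12.702,
    "f": 2.228, "g": 2.015, "h": 6.094, "i": 6.966, "j": 0.153,
    "k": 0.772, "l": 4.025, "m": 2.406, "n": 6.749, "o": 7.507,
    "p": 1.929, "q": 0.095, "r": 5.987, "s": 6.327, "t": 9.056,
    "u": 2.758, "v": 0.978, "w": 2.360, "x": 0.150, "y": 1.974,
    "z": 0.074,
}


def letters_by_frequency(freq=ENGLISH_FREQ):
    """Return the letters sorted from most to least frequent."""
    return sorted(freq, key=freq.get, reverse=True)


def plot_theoretical(freq=ENGLISH_FREQ):
    """Show the same frequencies as a pie chart and as a bar chart."""
    import matplotlib.pyplot as plt  # assumed plotting library

    letters = letters_by_frequency(freq)
    values = [freq[letter] for letter in letters]
    fig, (pie_ax, bar_ax) = plt.subplots(1, 2, figsize=(12, 5))
    pie_ax.pie(values, labels=letters)
    bar_ax.bar(letters, values)
    bar_ax.set_ylabel("Frequency (%)")
    return fig
```

Sorting before plotting puts `e`, `t` and `a` first, which makes both charts easier to read than alphabetical order.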

Definition

Usage

Output

Comparing theoretical vs actual frequencies

Plotting lines for the theoretical and actual frequencies of a sample message and highlighting the differences as green or red fills.
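One way to produce such fills is matplotlib's `fill_between`, sketched below. The article does not say which color means what, so this sketch assumes green where the message uses a letter more often than expected and red where it uses it less. The English frequency table and all names are assumptions as well.

```python
import string

ALPHABET = string.ascii_lowercase
# Commonly cited English letter frequencies in percent, a to z (assumed table)
THEORETICAL = [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094,
               6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929,
               0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150,
               1.974, 0.074]


def actual_frequencies(message):
    """Letter frequencies of `message` as percentages of its letters."""
    letters = [c for c in message.lower() if c in ALPHABET]
    if not letters:
        return [0.0] * len(ALPHABET)
    return [100 * letters.count(letter) / len(letters) for letter in ALPHABET]


def plot_comparison(message):
    """Plot both lines and fill the gap between them in green or red."""
    import matplotlib.pyplot as plt  # assumed plotting library

    x = list(range(len(ALPHABET)))
    actual = actual_frequencies(message)
    above = [a >= t for a, t in zip(actual, THEORETICAL)]
    below = [not flag for flag in above]
    fig, ax = plt.subplots()
    ax.plot(x, THEORETICAL, label="Theoretical")
    ax.plot(x, actual, label="Actual")
    # Green: letter used more than expected; red: used less (assumed meaning)
    ax.fill_between(x, actual, THEORETICAL, where=above,
                    color="green", alpha=0.3, interpolate=True)
    ax.fill_between(x, actual, THEORETICAL, where=below,
                    color="red", alpha=0.3, interpolate=True)
    ax.set_xticks(x)
    ax.set_xticklabels(list(ALPHABET))
    ax.legend()
    return fig
```

Numeric x positions are used instead of letter strings so that `interpolate=True` can cut the fill exactly where the two lines cross.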

Definition

Usage

Output

Usage with a longer message

This illustrates how a longer message gives a letter count closer to the theoretical letter frequency.

Output

Now the differences (red and green regions) are smaller.
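The size of the red and green regions can also be put into a single number. The sketch below sums the absolute differences between the actual and theoretical percentages, so a lower score means a closer match. The metric, the assumed English frequency table and the name `total_difference` are illustrative and not part of the project.

```python
import string

ALPHABET = string.ascii_lowercase
# Commonly cited English letter frequencies in percent, a to z (assumed table)
THEORETICAL = [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094,
               6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929,
               0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150,
               1.974, 0.074]


def total_difference(message):
    """Sum of absolute gaps between actual and theoretical percentages."""
    letters = [c for c in message.lower() if c in ALPHABET]
    if not letters:
        raise ValueError("message contains no letters")
    actual = [100 * letters.count(letter) / len(letters) for letter in ALPHABET]
    return sum(abs(a - t) for a, t in zip(actual, THEORETICAL))


# A message made of a single rare letter is about as far off as possible
worst_case = total_difference("zzzz")
sentence = total_difference("The quick brown fox jumps over the lazy dog")
```

Running `total_difference` on a short and a long English text gives a rough way to check that the longer one tends to score lower.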

The first figure shows the letter frequency for a short message and the second for a long one.

Showing letter frequency for every letter

Every letter is represented as a square in a grid and the frequency is the color intensity of the square.
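A grid like this can be drawn with matplotlib's `imshow`, as in the sketch below. The layout of 2 rows by 13 columns, the `Blues` color map and the function names are assumptions, since the original figure is not reproduced here.

```python
import string

ALPHABET = string.ascii_lowercase
COLUMNS = 13  # hypothetical layout; must divide 26 so every row is full


def frequency_grid(message, columns=COLUMNS):
    """Return rows of relative letter frequencies, `columns` letters per row."""
    letters = [c for c in message.lower() if c in ALPHABET]
    total = len(letters) or 1  # avoid dividing by zero on empty input
    freqs = [letters.count(letter) / total for letter in ALPHABET]
    return [freqs[i:i + columns] for i in range(0, len(freqs), columns)]


def plot_grid(message, columns=COLUMNS):
    """Draw one square per letter; darker squares mean higher frequency."""
    import matplotlib.pyplot as plt  # assumed plotting library

    fig, ax = plt.subplots()
    image = ax.imshow(frequency_grid(message, columns), cmap="Blues")
    for index, letter in enumerate(ALPHABET):
        row, col = divmod(index, columns)
        ax.text(col, row, letter, ha="center", va="center")
    ax.set_xticks([])
    ax.set_yticks([])
    fig.colorbar(image, ax=ax, label="Relative frequency")
    return fig
```

Each cell holds a value between 0 and 1, and the color bar maps that value to the intensity of the square.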

Definition

Usage with the longer message

The same long message as before is used; it is omitted here for clarity.

Output

Juan Rinconada

Software developer and teacher experienced in Android, iOS, C++, Raspberry Pi, Arduino, Python, Cobol…