
Empirical Entropy Measurement

Why Does Krux Say My Fifty Dice Rolls Do Not Contain 128 Bits of Entropy?

This question, frequently raised in Krux chat groups, highlights the need to clarify the concepts and tools Krux uses to help users detect possible issues in the mnemonic creation procedure. Krux's tools were designed to help users understand the concepts involved, present statistics and indicators, and encourage experimentation and evaluation of results, so that users learn best practices in key generation. Below, we dive deeper into entropy concepts to support the fundamental requirement for sovereign self-custody: building up knowledge.

Entropy in Dice Rolls

Rolling dice and collecting the resulting values can be an effective method for generating cryptographic keys due to the inherent randomness and unpredictability of each roll. Each roll of a die produces a random number within a specific range, and when multiple rolls are combined, they create a sequence that is difficult to predict or reproduce. This sequence can be used to generate cryptographic keys that are robust against attacks. By ensuring that the dice rolls are conducted in a controlled and secure environment, and by using a sufficient number of rolls to achieve the desired level of randomness, one can create cryptographic keys that are highly secure and resistant to brute-force attacks or other forms of cryptanalysis.

Entropy Definitions

Entropy, a fundamental concept in various scientific disciplines, measures the degree of disorder or uncertainty within a system. This notion is interpreted differently across fields, leading to distinct types of entropy: mechanical entropy, Shannon's entropy, and cryptographic entropy.

Mechanical entropy, rooted in thermodynamics and statistical mechanics, quantifies the disorder in a physical system. It describes how energy is distributed among the particles in a system, reflecting the system's tendency towards equilibrium and maximum disorder.

Shannon's entropy, from information theory, measures the uncertainty or information content in a message or data source. Introduced by Claude Shannon, it quantifies the average amount of information produced by a stochastic source of data, indicating how unpredictable the data is.

Cryptographic entropy, crucial in security, refers to the unpredictability and randomness required for secure cryptographic keys and processes. High cryptographic entropy ensures that keys are difficult to predict or reproduce, providing robustness against attacks.

While mechanical entropy deals with physical systems, Shannon's entropy focuses on information content, and cryptographic entropy emphasizes security through randomness.

Measuring Dice Rolls Entropy

Entropy is a theoretical measure and is not directly measurable from a single roll but rather from the probability distribution of outcomes over many rolls. We can use Shannon's formula for theoretical and empirical calculations. Entropy S can be quantified with:

S = -\sum_{i=1}^{n} p_i \log(p_i)

where:

  • p_i is the probability of each possible outcome (or state) of the system.
  • n is the number of possible outcomes.

The entropy can be obtained in two ways:

  1. Empirical Measurement: Roll the dice a large number of times to observe the frequency of each outcome, then estimate the probabilities p_i from the observed frequencies (a short sketch follows below).
  2. Theoretical Calculation: Use the uniform distribution assumption (equal probability for all outcomes).
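
As an illustration of the empirical measurement, the sketch below (plain Python, not Krux's actual code) estimates the probabilities from a list of rolls and computes Shannon's entropy in bits per roll:

```python
import math
from collections import Counter

def shannon_entropy_bits(rolls):
    """Estimate Shannon's entropy, in bits per roll, from observed dice rolls."""
    counts = Counter(rolls)
    total = len(rolls)
    entropy = 0.0
    for count in counts.values():
        p = count / total          # empirical probability of this face
        entropy -= p * math.log2(p)
    return entropy

# Example: a short sample of rolls from a six-sided die
sample = [1, 4, 2, 6, 3, 5, 2, 4, 6, 1, 3, 5]
print(f"{shannon_entropy_bits(sample):.3f} bits per roll")
```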

Empirical (Real) vs. Theoretical Entropy in Dice Rolls

When calculating the entropy of dice rolls, the difference between real and theoretical results arises from the assumption of perfect fairness and uniformity versus the inherent imperfections in real-world experiments.

Theoretical Entropy

The theoretical entropy calculation assumes that the dice are perfectly fair, meaning each face has an equal probability of landing face up.

Consider a fair six-sided die. The possible outcomes when rolling one die are {1, 2, 3, 4, 5, 6}, each with an equal probability of \frac{1}{6}.

  1. Single Die Roll: Each outcome has a probability p_i = \frac{1}{6}, and the entropy S for one die roll is calculated as:

     S = - \sum_{i=1}^{6} \left( \frac{1}{6} \log_2 \left( \frac{1}{6} \right) \right)

     Since \log_2(1/6) = -\log_2(6):

     S = -6 \left( \frac{1}{6} \times -\log_2(6) \right) = \log_2(6) \approx 2.585 \text{ bits}

  2. Multiple Dice Rolls: For multiple dice, the entropy increases as the number of possible outcomes increases. For k fair dice, the number of possible outcomes is 6^k, and the entropy S is:

     S = \log_2(6^k) = k \log_2(6) \approx 2.585k \text{ bits}

     For example, the entropy for a roll of 50 fair dice is:

     S = \log_2(6^{50}) = 50 \log_2(6) \approx 2.585 \times 50 \approx 129.25 \text{ bits}

This calculation assumes that every outcome (each face of the die) has an equal likelihood, leading to a uniform distribution.
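
For reference, the theoretical figures above can be reproduced with a few lines of Python; this is an illustrative sketch, not code taken from Krux:

```python
import math

def theoretical_entropy_bits(sides=6, rolls=1):
    """Theoretical entropy of `rolls` independent fair dice with `sides` faces each."""
    return rolls * math.log2(sides)

print(theoretical_entropy_bits())          # one die: ~2.585 bits
print(theoretical_entropy_bits(rolls=50))  # fifty dice: ~129.25 bits
```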

Empirical Entropy

In a real sample of dice rolls, several factors can cause deviations from the perfect uniform distribution:

  1. Imperfect Dice: Real dice may not be perfectly balanced. Small manufacturing defects can make certain faces slightly heavier or lighter, causing biases.
  2. Rolling Conditions: The way the dice are rolled, the surface they land on, and even air currents can introduce slight biases.
  3. Finite Sample Size: When rolling dice a finite number of times, the observed frequencies of each face will naturally deviate from the expected uniform distribution due to random variations. This phenomenon is more pronounced with smaller sample sizes.

When you roll a die multiple times and observe the outcomes, you can calculate the empirical probabilities p_i of each face. Using these probabilities, the entropy is calculated as:

S = - \sum_{i=1}^{6} p_i \log_2(p_i)

Example

Suppose you roll a six-sided die 50 times and get the following results:

  • 1: 4 times
  • 2: 9 times
  • 3: 7 times
  • 4: 10 times
  • 5: 12 times
  • 6: 8 times

We can calculate Shannon's entropy as follows:

Step 1: Calculate Probabilities

Total number of rolls:

N = 4 + 9 + 7 + 10 + 12 + 8 = 50

Probabilities for each outcome:

p_1 = \frac{4}{50} = 0.08
p_2 = \frac{9}{50} = 0.18
p_3 = \frac{7}{50} = 0.14
p_4 = \frac{10}{50} = 0.2
p_5 = \frac{12}{50} = 0.24
p_6 = \frac{8}{50} = 0.16

Step 2: Compute Entropy

Using Shannon's entropy formula:

S = -\sum_{i=1}^{n} p_i \log_2(p_i)

Calculate each term:

S_1 = -p_1 \log_2(p_1) = -0.08 \log_2(0.08) = -0.08 \times (-3.64386) = 0.291509
S_2 = -p_2 \log_2(p_2) = -0.18 \log_2(0.18) = -0.18 \times (-2.47393) = 0.445307
S_3 = -p_3 \log_2(p_3) = -0.14 \log_2(0.14) = -0.14 \times (-2.8365) = 0.39711
S_4 = -p_4 \log_2(p_4) = -0.2 \log_2(0.2) = -0.2 \times (-2.32193) = 0.464386
S_5 = -p_5 \log_2(p_5) = -0.24 \log_2(0.24) = -0.24 \times (-2.05889) = 0.494132
S_6 = -p_6 \log_2(p_6) = -0.16 \log_2(0.16) = -0.16 \times (-2.64386) = 0.423018

Sum the contributions:

S = S_1 + S_2 + S_3 + S_4 + S_5 + S_6
S = 0.291509 + 0.445307 + 0.39711 + 0.464386 + 0.494132 + 0.423018 = 2.515462

Thus, Shannon's entropy for the given distribution of dice rolls is approximately 2.52 bits per roll.

This is lower than the theoretical maximum of \log_2(6) \approx 2.585 bits because of the deviations in the empirical probabilities.

The total entropy for the N = 50 rolls is:

S_{total} = S \times N = 2.515 \times 50 \approx 125.8 \text{ bits}
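
The numbers in this example can be checked with a short script that simply applies Shannon's formula to the counts listed above; this is an illustration, not Krux's implementation:

```python
import math

# Observed counts from the example above (50 rolls of a six-sided die)
counts = {1: 4, 2: 9, 3: 7, 4: 10, 5: 12, 6: 8}
total = sum(counts.values())  # 50

# Per-roll empirical entropy, using Shannon's formula
entropy_per_roll = -sum((c / total) * math.log2(c / total) for c in counts.values())
total_entropy = entropy_per_roll * total

print(f"Entropy per roll: {entropy_per_roll:.4f} bits")              # ~2.5155
print(f"Total entropy for {total} rolls: {total_entropy:.1f} bits")  # ~125.8
```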

Shannon's Entropy in Practice

Calculating Shannon's entropy on a real sample of dice rolls provides insights into the actual randomness and fairness of the dice and rolling conditions. Deviations from the theoretical entropy reflect the natural imperfections and variances inherent in real-world scenarios. This understanding helps in evaluating and improving the fairness and randomness of dice or similar systems.

Shannon's entropy evaluates the statistical probability distribution of samples of a dice roll. An even distribution results in higher entropy, closer to the theoretical maximum entropy, which assumes perfectly distributed rolls. An uneven distribution, created, for example, by a biased die, will result in lower Shannon's entropy. In an extreme case, with a terribly biased die that always lands on the same side, Shannon's entropy will be zero.
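
A small illustration of this behavior, again in plain Python rather than Krux's code: a nearly even sample stays close to \log_2(6) bits per roll, while a die that always lands on the same face yields zero bits.

```python
import math
from collections import Counter

def entropy_per_roll(rolls):
    """Shannon entropy, in bits per roll, of a sequence of dice outcomes."""
    counts = Counter(rolls)
    entropy = 0.0
    for count in counts.values():
        p = count / len(rolls)
        entropy -= p * math.log2(p)
    return entropy

fair_like = [1, 2, 3, 4, 5, 6] * 8 + [1, 2]  # 50 rolls, nearly even distribution
always_three = [3] * 50                      # extreme bias: the same face every time

print(entropy_per_roll(fair_like))     # ~2.58 bits per roll, close to log2(6)
print(entropy_per_roll(always_three))  # 0.0 bits per roll: no uncertainty at all
```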

Cryptographic Entropy

Shannon's entropy, while a powerful measure of information content and uncertainty in a statistical distribution for natural samples, is not considered cryptographic entropy due to its inability to detect patterns or other sources of predictability within data. Shannon's formula quantifies the average information produced by a stochastic process, essentially measuring the expected surprise in a sequence of symbols based on their probabilities. However, it does not account for potential structure, correlations, or regularities within the data that could be inserted by a user and exploited by an attacker.

Cryptographic entropy, on the other hand, requires a higher standard of unpredictability. It must ensure that every bit of the cryptographic key is as random and independent as possible, making it resilient against any form of analysis that could reveal patterns or reduce the effective randomness. While Shannon's entropy can evaluate the statistical distribution of symbols, it falls short in guaranteeing the absence of patterns or dependencies, which are crucial for maintaining the security of cryptographic systems. Thus, cryptographic entropy encompasses a broader concept of randomness, ensuring that the generated keys are not only statistically random but also free from any detectable structure or predictability.

Pattern Detection

It is possible to have dice rolls with an even distribution but poor cryptographic entropy. This issue arises when patterns are present in the sequences. Examples include sequences like 123456123456123..., 111122223333..., and 654321654321..., which exhibit poor cryptographic entropy despite having even distribution and high Shannon's entropy.

To mitigate this issue, Krux has implemented a pattern detection algorithm that evaluates the Shannon's entropy of the rolls' derivatives. In practice, this algorithm detects arithmetic progression components in the dice rolls and raises a warning if a certain threshold is crossed.
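
Krux's exact implementation is not reproduced here, but the idea can be sketched as follows: take the successive differences (the "derivative") of the roll sequence and measure their Shannon entropy. Arithmetic-progression-like sequences collapse to a nearly constant derivative, which has very low entropy. The function names and the threshold below are illustrative assumptions, not Krux's actual code.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy, in bits per symbol, of a sequence."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_patterned(rolls, threshold=1.0):
    """Flag sequences whose successive differences have very low entropy.
    The threshold value here is an illustrative assumption, not Krux's."""
    derivative = [b - a for a, b in zip(rolls, rolls[1:])]
    return entropy_bits(derivative) < threshold

print(looks_patterned([1, 2, 3, 4, 5, 6] * 8))  # True: nearly constant +1 steps
print(looks_patterned([2, 6, 1, 4, 4, 3, 5, 1, 6, 2,
                       5, 3, 6, 2, 4, 1, 5, 3, 2, 6]))  # False: differences vary widely
```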

What Does Krux Do?

  • Krux requires a minimum number of rolls based on theoretical entropy.
  • Krux warns the user if low Shannon's entropy, calculated with the actual rolls, is detected.
  • Krux warns the user if it suspects there are patterns within the actual rolls.

Conclusion

While Krux cannot ensure rolls have good or bad cryptographic entropy, it does provide indicators to help users detect issues and learn about the concepts involved in mnemonic generation.