30 Python Math and Statistics Exercises with Solutions

This article walks you through 30 exercises covering Python’s built-in math and statistics modules. Topics include core mathematical functions like sqrt, factorial, ceil, floor, gcd, and log; trigonometry and floating-point precision; and statistical measures such as mean, median, mode, standard deviation, variance, quartiles, correlation, geometric mean, harmonic mean, and the NormalDist class

Each coding challenge includes a Practice Problem, Hint, Solution code, and detailed Explanation, ensuring you don’t just copy code, but genuinely practice and understand how and why it works.

All solutions have been fully tested on Python 3.
Use our Online Code Editor to solve these exercises in real time.
Also, Solve Python Exercises: 29 topic-wise exercises with over 800+ coding questions

Let us know if you have any alternative solutions. It will help other developers.

+ Table of Contents (30 Exercises)

Exercise 1: Square Root and Power
Exercise 2: Area of a Circle
Exercise 3: Factorial Computation
Exercise 4: Ceiling and Floor
Exercise 5: Greatest Common Divisor
Exercise 6: Hypotenuse of a Right Triangle
Exercise 7: Natural and Base-10 Logarithms
Exercise 8: Degrees to Radians, Sine and Cosine
Exercise 9: Combinations and Permutations
Exercise 10: Floating-Point Precision with math.isclose()
Exercise 11: nth Fibonacci Using the Golden Ratio
Exercise 12: Verifying exp() and log() Are Inverses
Exercise 13: Degrees to DMS Converter
Exercise 14: Perfect Square Checker Using math.isqrt()
Exercise 15: Product of All Primes Below 30
Exercise 16: Mean, Median, and Mode of Exam Scores
Exercise 17: Sample vs Population Standard Deviation
Exercise 18: Variance of Temperatures
Exercise 19: median_low() and median_high()
Exercise 20: Finding All Modes with multimode()
Exercise 21: Quartiles with statistics.quantiles()
Exercise 22: Harmonic Mean of Internet Speeds
Exercise 23: Geometric Mean for Compound Growth Rate
Exercise 24: Covariance Between Study Hours and Test Scores
Exercise 25: Pearson Correlation Coefficient
Exercise 26: Probability with NormalDist
Exercise 27: Distribution Overlap Between Two Models
Exercise 28: Grading Curve with NormalDist.from_samples()
Exercise 29: Linear Regression to Predict Scores
Exercise 30: Z-Score Outlier Detector Combining NormalDist and quantiles()

Exercise 1: Square Root and Power

Problem Statement: Write a Python program that uses math.sqrt() to find the square root of 144 and math.pow() to compute 2 raised to the power of 10, then prints both results.

Purpose: Square roots and powers appear constantly in geometry, physics simulations, financial modelling, and algorithm complexity analysis. This exercise introduces the two most fundamental functions in the math module and highlights the difference between math.pow() and Python’s built-in ** operator in terms of return type and precision.

Given Input: number = 144 and base, exponent = 2, 10

Expected Output: Square root of 144: 12.0 and 2 ^ 10 = 1024.0

▼ Hint

Import the math module and call math.sqrt(144) to get the square root as a float.
Call math.pow(2, 10) to compute the power. Note that it always returns a float, unlike the ** operator which returns an int when both operands are integers.
Try comparing math.pow(2, 10) with 2 ** 10 and print both types using type() to see the difference.

▼ Solution & Explanation

import math

number = 144
base, exponent = 2, 10

sqrt_result = math.sqrt(number)
pow_result  = math.pow(base, exponent)

print(f"Square root of {number}: {sqrt_result}")
print(f"{base} ^ {exponent} = {pow_result}")

# Comparison: math.pow() vs ** operator
print(f"\nmath.pow(2, 10) -> {math.pow(2, 10)} | type: {type(math.pow(2, 10))}")
print(f"2 ** 10         -> {2 ** 10}        | type: {type(2 ** 10)}")Code language: Python (python)

Explanation:

math.sqrt(number): Returns the square root of the given number as a float. It is defined only for non-negative values – passing a negative number raises a ValueError. For complex-number square roots, use cmath.sqrt() from the cmath module instead.
math.pow(base, exponent): Always returns a float regardless of the input types, making it consistent for floating-point pipelines. In contrast, the ** operator returns an int when both operands are integers (2 ** 10 gives 1024, not 1024.0), which can cause unexpected type mismatches downstream.
When to use which: Use math.pow() when working in a floating-point context or interfacing with C-level math functions. Use ** for integer exponentiation, large integer arithmetic, or when you need exact integer results without floating-point rounding.

Exercise 2: Area of a Circle

Problem Statement: Write a Python program that calculates the area of a circle with radius 7 using math.pi, and also computes the circumference, printing both results rounded to two decimal places.

Purpose: Circle geometry is foundational in graphics, physics engines, signal processing, and engineering calculations. Using math.pi instead of a hardcoded approximation like 3.14159 gives the highest precision available in floating-point arithmetic and makes the code’s intent immediately clear to any reader.

Given Input: radius = 7

Expected Output: Area: 153.94 and Circumference: 43.98

▼ Hint

The formula for area is math.pi * radius ** 2 and for circumference is 2 * math.pi * radius.
Use round(value, 2) or the :.2f format specifier to print results to two decimal places.
Print math.pi directly to see its full precision value stored by Python.

▼ Solution & Explanation

import math

radius = 7

area          = math.pi * radius ** 2
circumference = 2 * math.pi * radius

print(f"math.pi value  : {math.pi}")
print(f"Radius         : {radius}")
print(f"Area           : {area:.2f}")
print(f"Circumference  : {circumference:.2f}")Code language: Python (python)

Explanation:

math.pi: A constant defined in the math module with the value 3.141592653589793, which is the most precise representation of pi available in a 64-bit floating-point number. Using it instead of a manually typed approximation eliminates rounding errors introduced at the source of the calculation.
math.pi * radius ** 2: Python’s operator precedence evaluates radius ** 2 first (exponentiation before multiplication), so no parentheses are needed here. The result is the area formula A = πr² applied directly.
:.2f format specifier: Formats the float to exactly two decimal places in the output string. This is the standard way to display measurements and monetary values cleanly without altering the underlying precision of the variable itself.

Exercise 3: Factorial Computation

Problem Statement: Write a Python program that uses math.factorial() to compute 10! and verifies that the result equals 3628800.

Purpose: Factorials are central to combinatorics, probability theory, permutation and combination calculations, and Taylor series expansions. Using math.factorial() is significantly faster and safer than writing a manual loop or recursive function, especially for large values where Python’s arbitrary-precision integers shine.

Given Input: n = 10

Expected Output: 10! = 3628800 and Verification passed: True

▼ Hint

Call math.factorial(n) and store the result, then compare it with 3628800 using ==.
math.factorial() accepts only non-negative integers – passing a float or a negative number raises a ValueError.
Try computing factorials for a range of values (0 through 10) in a loop to see how rapidly the numbers grow.

▼ Solution & Explanation

import math

n        = 10
expected = 3628800

result = math.factorial(n)

print(f"{n}! = {result}")
print(f"Verification passed: {result == expected}")

# Bonus: factorial growth table
print("\nFactorial growth table:")
for i in range(0, 11):
    print(f"  {i:2d}! = {math.factorial(i):,}")Code language: Python (python)

Explanation:

math.factorial(n): Returns the exact integer result of n! (n × (n-1) × … × 1). It returns an int, not a float, which means it benefits from Python’s arbitrary-precision integer arithmetic and produces exact results for any non-negative integer, no matter how large.
result == expected: Because math.factorial() returns an exact integer, the equality check is reliable with no floating-point rounding concern. This makes it safe to use in unit tests and verification logic where exact results are required.
Factorial growth: The growth table reveals how explosively factorials increase. 0! and 1! are both 1 by definition, and the values roughly multiply by their index at each step – 10! is already over 3.6 million. This rapid growth is why factorials appear in complexity analysis of brute-force combinatorial algorithms.

Exercise 4: Ceiling and Floor

Problem Statement: Write a Python program that uses math.ceil() and math.floor() to find the ceiling and floor of 4.7, then demonstrates the same functions on negative numbers to reveal their behaviour around zero.

Purpose: Ceiling and floor rounding are essential in pagination logic (how many pages to display), resource allocation (how many containers to provision), pricing (always rounding up to the nearest cent), and grid snapping in graphics. Understanding how they behave with negative numbers prevents a common off-by-one class of bugs.

Given Input: value = 4.7 and negative_value = -4.7

Expected Output: ceil(4.7) = 5, floor(4.7) = 4, ceil(-4.7) = -4, floor(-4.7) = -5

▼ Hint

math.ceil(x) returns the smallest integer greater than or equal to x. Think of it as always rounding up toward positive infinity.
math.floor(x) returns the largest integer less than or equal to x. Think of it as always rounding down toward negative infinity.
Both functions return an int in Python 3, unlike in Python 2 where they returned a float. Verify this with type(math.ceil(4.7)).

▼ Solution & Explanation

import math

value          =  4.7
negative_value = -4.7

print(f"Value: {value}")
print(f"  ceil ({value})  = {math.ceil(value)}")
print(f"  floor({value})  = {math.floor(value)}")

print(f"\nValue: {negative_value}")
print(f"  ceil ({negative_value}) = {math.ceil(negative_value)}")
print(f"  floor({negative_value}) = {math.floor(negative_value)}")

# Practical example: pages needed to display n items
items_per_page = 10
for total_items in [23, 30, 31]:
    pages = math.ceil(total_items / items_per_page)
    print(f"\n{total_items} items at {items_per_page} per page: {pages} page(s)")Code language: Python (python)

Explanation:

math.ceil(4.7) returns 5: Ceiling always moves toward positive infinity, so 4.7 rounds up to 5. For -4.7, moving toward positive infinity means rounding up to -4, not -5 – a result that surprises many developers who expect it to mirror the positive-number behaviour symmetrically.
math.floor(4.7) returns 4: Floor always moves toward negative infinity, so 4.7 rounds down to 4. For -4.7, moving toward negative infinity means rounding down to -5. This is distinct from truncation: int(-4.7) truncates toward zero and gives -4, whereas math.floor(-4.7) gives -5.
Pagination example: math.ceil(total_items / items_per_page) is the canonical way to calculate how many pages are needed. For 31 items at 10 per page, plain integer division gives 3 (missing the last page), while math.ceil(31 / 10) correctly gives 4.

Exercise 5: Greatest Common Divisor

Problem Statement: Write a Python program that uses math.gcd() to find the greatest common divisor of 48 and 180, then uses it to reduce the corresponding fraction to its simplest form.

Purpose: The GCD is a foundational operation in number theory with practical applications in fraction simplification, cryptography (RSA key generation), scheduling (finding common cycle lengths), and screen resolution scaling. This exercise shows both the direct use of math.gcd() and a concrete real-world application of its result.

Given Input: a = 48 and b = 180

Expected Output: GCD(48, 180) = 12 and 48/180 simplified = 4/15

▼ Hint

Call math.gcd(a, b) to get the greatest common divisor as a positive integer.
Divide both the numerator and denominator by the GCD to reduce the fraction: a // gcd and b // gcd.
In Python 3.9 and later, math.gcd() accepts more than two arguments, so you can find the GCD of a list of numbers in a single call.

▼ Solution & Explanation

import math

a = 48
b = 180

gcd = math.gcd(a, b)
print(f"GCD({a}, {b}) = {gcd}")

# Simplify the fraction a/b
numerator   = a // gcd
denominator = b // gcd
print(f"{a}/{b} simplified = {numerator}/{denominator}")

# Bonus: LCM using the GCD relationship: lcm(a,b) = a*b // gcd(a,b)
lcm = abs(a * b) // gcd
print(f"LCM({a}, {b}) = {lcm}")

# Python 3.9+: math.lcm() is available directly
if hasattr(math, "lcm"):
    print(f"math.lcm({a}, {b}) = {math.lcm(a, b)}")

# Multi-number GCD (Python 3.9+)
numbers = [48, 180, 60, 240]
if hasattr(math, "gcd") and math.gcd.__code__.co_varnames:
    multi_gcd = math.gcd(*numbers)
    print(f"GCD{tuple(numbers)} = {multi_gcd}")Code language: Python (python)

Explanation:

math.gcd(a, b): Returns the largest positive integer that divides both a and b without a remainder. For gcd(48, 180), the result is 12 because 12 is the largest number that divides both evenly (48 / 12 = 4 and 180 / 12 = 15).
Fraction simplification: Dividing both the numerator and denominator by their GCD reduces a fraction to its lowest terms. Using integer floor division // ensures the result is an int, not a float, which is appropriate for fraction representation.
LCM relationship: The Least Common Multiple can be derived from the GCD using the identity lcm(a, b) = |a × b| / gcd(a, b). Python 3.9 introduced math.lcm() as a direct function. Both LCM and GCD together solve a wide range of scheduling and synchronisation problems.

Exercise 6: Hypotenuse of a Right Triangle

Problem Statement: Write a Python program that computes the hypotenuse of a right triangle with legs 3 and 4 using math.hypot(), then verifies the result against the manually computed Pythagorean formula.

Purpose: Euclidean distance – the straight-line distance between two points – is the hypotenuse formula in disguise. It is used constantly in game development, mapping applications, robotics path planning, machine learning distance metrics, and physics simulations. math.hypot() is numerically safer than the manual formula because it avoids overflow and underflow for very large or very small values.

Given Input: a = 3 and b = 4

Expected Output: Hypotenuse: 5.0 and Manual formula result: 5.0

▼ Hint

Call math.hypot(a, b) which computes sqrt(a² + b²) internally but with better numerical stability than writing it manually.
Compute the manual version with math.sqrt(a**2 + b**2) and compare both results to confirm they match.
In Python 3.8 and later, math.hypot() accepts more than two arguments, allowing you to compute the Euclidean distance in 3D or higher dimensions directly.

▼ Solution & Explanation

import math

a = 3
b = 4

hypotenuse    = math.hypot(a, b)
manual_result = math.sqrt(a**2 + b**2)

print(f"Legs            : a = {a}, b = {b}")
print(f"Hypotenuse      : {hypotenuse}")
print(f"Manual formula  : {manual_result}")
print(f"Results match   : {math.isclose(hypotenuse, manual_result)}")

# Bonus: 2D Euclidean distance between two points
x1, y1 = 1, 2
x2, y2 = 4, 6
distance_2d = math.hypot(x2 - x1, y2 - y1)
print(f"\n2D distance ({x1},{y1}) to ({x2},{y2}): {distance_2d:.4f}")

# 3D Euclidean distance (Python 3.8+)
distance_3d = math.hypot(3, 4, 0)
print(f"3D hypot(3, 4, 0): {distance_3d}")Code language: Python (python)

Explanation:

math.hypot(a, b): Computes sqrt(a² + b²) using an algorithm that scales the inputs internally to avoid intermediate overflow (when values are very large) or underflow (when values are very small). The manual formula math.sqrt(a**2 + b**2) can overflow to inf for inputs near the float maximum, while math.hypot() handles these edge cases gracefully.
math.isclose(): Used to compare two floating-point numbers for near-equality rather than exact equality. Direct == comparison of floats can fail due to tiny rounding differences. math.isclose(a, b) checks that the difference is within a relative tolerance of 1e-9 by default, which is appropriate for most scientific comparisons.
Multi-dimensional use: From Python 3.8, math.hypot(*coords) accepts any number of arguments, computing the n-dimensional Euclidean norm directly. This makes it a single-function solution for distance calculations in 2D, 3D, and beyond without needing to write a manual sum-of-squares loop.

Exercise 7: Natural and Base-10 Logarithms

Problem Statement: Write a Python program that uses math.log() and math.log10() to compute the natural logarithm (base e) and the base-10 logarithm of 1000, then verifies each result by reversing it with the corresponding exponential function.

Purpose: Logarithms are used in decibel calculations, pH measurements, earthquake magnitude scales, information entropy, algorithmic complexity (binary search is O(log n)), and machine learning loss functions. Understanding both the natural log and base-10 log, and knowing how to verify them by reversal, builds the intuition needed to apply them correctly across these domains.

Given Input: value = 1000

Expected Output: ln(1000) = 6.9078, log10(1000) = 3.0, and verification that reversing each result recovers the original value.

▼ Hint

Use math.log(value) for the natural log (base e) and math.log10(value) for the base-10 log.
Pass a second argument to math.log() to compute a log in any base: math.log(value, base).
Verify the natural log by computing math.e ** ln_result and verify the base-10 log by computing 10 ** log10_result – both should return a value close to the original 1000.

▼ Solution & Explanation

import math

value = 1000

# Natural log (base e)
ln_result = math.log(value)
print(f"ln({value})    = {ln_result:.4f}")
print(f"e ^ {ln_result:.4f} = {math.e ** ln_result:.4f}  (reversal check)")

# Base-10 log
log10_result = math.log10(value)
print(f"\nlog10({value}) = {log10_result}")
print(f"10 ^ {log10_result}   = {10 ** log10_result:.1f}    (reversal check)")

# Custom base log (base 2)
log2_result = math.log(value, 2)
log2_builtin = math.log2(value)
print(f"\nlog2({value})  = {log2_result:.4f}")
print(f"math.log2({value}) = {log2_builtin:.4f}  (built-in, more precise)")

# Practical: number of binary digits needed to represent 1000
bits_needed = math.ceil(math.log2(value))
print(f"\nBits needed to represent {value}: {bits_needed}")Code language: Python (python)

Explanation:

math.log(value): With a single argument, computes the natural logarithm (base e, where math.e ≈ 2.71828). With a second argument it computes the logarithm in that base: math.log(1000, 10) gives the same result as math.log10(1000), though the dedicated functions are more numerically accurate for common bases.
math.log10(value): A dedicated base-10 logarithm function that is more numerically stable than math.log(value, 10) for values very close to powers of 10. For value = 1000, it returns exactly 3.0 because 10³ = 1000 is an exact power of 10.
math.log2(): A dedicated base-2 logarithm introduced in Python 3.3, more precise than math.log(x, 2) for values that are exact powers of 2. The ceiling of log2(n) gives the number of bits needed to represent the integer n, which is a practical bit-width calculation used in data encoding and compression.

Exercise 8: Degrees to Radians, Sine and Cosine

Problem Statement: Write a Python program that converts 270 degrees to radians using math.radians(), then computes and prints its sine and cosine values.

Purpose: Python’s trigonometric functions operate in radians, not degrees, so converting between the two is a required first step in any geometry, physics, or signal processing task. This exercise also builds intuition for the unit circle – knowing that sin(270°) = -1 and cos(270°) = 0 provides a mental anchor for verifying trigonometric results.

Given Input: degrees = 270

Expected Output: 270° in radians: 4.7124, sin(270°) = -1.0, cos(270°) = 0.0

▼ Hint

Use math.radians(degrees) to convert degrees to radians. The formula is radians = degrees × π / 180.
Pass the radian value to math.sin() and math.cos() to get the trigonometric results.
Use round(value, 10) or math.isclose(result, expected) when verifying results, since floating-point arithmetic may produce values like -1.0000000000000002 instead of exactly -1.0.
Use math.degrees(radians) to convert back and confirm the round-trip gives the original angle.

▼ Solution & Explanation

import math

degrees = 270

radians = math.radians(degrees)
sin_val = math.sin(radians)
cos_val = math.cos(radians)

print(f"{degrees}° in radians : {radians:.4f}")
print(f"sin({degrees}°)        : {round(sin_val, 10)}")
print(f"cos({degrees}°)        : {round(cos_val, 10)}")

# Round-trip: radians back to degrees
print(f"\nBack to degrees: {math.degrees(radians):.1f}°")

# Bonus: unit circle table for common angles
print("\nUnit circle reference:")
angles = [0, 30, 45, 60, 90, 180, 270, 360]
for deg in angles:
    rad = math.radians(deg)
    print(f"  {deg:3d}° | sin: {round(math.sin(rad), 4):7} | cos: {round(math.cos(rad), 4)}")Code language: Python (python)

Explanation:

math.radians(degrees): Converts an angle from degrees to radians using the formula radians = degrees × π / 180. All of Python’s trigonometric functions (math.sin(), math.cos(), math.tan()) require their input in radians, so this conversion step is mandatory when working with degree-based angle values.
round(sin_val, 10): Due to floating-point representation, math.sin(math.radians(270)) may return a value like -1.0 exactly or something very close such as -1.0000000000000002. Rounding to 10 decimal places collapses these near-zero residuals to clean values for display, without affecting the underlying precision used in calculations.
math.degrees(radians): The inverse of math.radians(), converting back from radians to degrees. The round-trip conversion confirms that no information is lost: math.degrees(math.radians(270)) returns exactly 270.0.

Exercise 9: Combinations and Permutations

Problem Statement: Write a Python program that uses math.comb() and math.perm() to calculate the number of combinations and permutations for n=10 items taken k=3 at a time, then prints both results with a clear explanation of what each value means.

Purpose: Combinations and permutations are the building blocks of probability theory, statistics, and discrete mathematics. They appear in lottery odds, card game probabilities, password strength calculations, and test case generation. math.comb() and math.perm() were introduced in Python 3.8 as direct, readable replacements for the manual factorial-based formulas.

Given Input: n = 10, k = 3

Expected Output: C(10, 3) = 120 and P(10, 3) = 720

▼ Hint

math.comb(n, k) computes C(n, k) = n! / (k! × (n-k)!), the number of ways to choose k items from n where order does not matter.
math.perm(n, k) computes P(n, k) = n! / (n-k)!, the number of ways to arrange k items chosen from n where order matters.
Verify both results manually using math.factorial() and the corresponding formulas to confirm the functions are working as expected.

▼ Solution & Explanation

import math

n = 10
k = 3

combinations  = math.comb(n, k)
permutations  = math.perm(n, k)

print(f"C({n}, {k}) - Combinations (order does NOT matter): {combinations}")
print(f"P({n}, {k}) - Permutations (order DOES matter)    : {permutations}")

# Manual verification using factorial formulas
comb_manual = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
perm_manual = math.factorial(n) // math.factorial(n - k)
print(f"\nManual C({n},{k}): {comb_manual}  | Match: {combinations == comb_manual}")
print(f"Manual P({n},{k}): {perm_manual}  | Match: {permutations == perm_manual}")

# Practical context
print(f"\nContext: Choosing a 3-person committee from {n} people")
print(f"  Unordered (committee)    : {combinations} ways")
print(f"  Ordered (President/VP/Sec): {permutations} ways")
print(f"  Ratio P/C = k! = {permutations // combinations} (= {k}!)")Code language: Python (python)

Explanation:

math.comb(n, k): Returns the number of ways to choose k items from n without regard to order, computed as n! / (k! × (n-k)!). It returns an exact int and is more readable and efficient than computing the formula manually using math.factorial(). Added in Python 3.8.
math.perm(n, k): Returns the number of ways to arrange k items chosen from n where order matters, computed as n! / (n-k)!. Because each combination of k items can be arranged in k! ways, P(n, k) always equals C(n, k) × k! – confirmed by the ratio printed in the output.
Combinations vs permutations: The key distinction is whether order matters. Selecting a 3-person committee from 10 people is a combination problem (Alice-Bob-Charlie is the same committee as Charlie-Bob-Alice). Selecting a President, Vice-President, and Secretary is a permutation problem (the same three people in different roles produce different outcomes).

Exercise 10: Floating-Point Precision with math.isclose()

Problem Statement: Write a Python program that uses math.isclose() to compare 0.1 + 0.2 with 0.3, demonstrates why a direct == comparison fails, and explains the floating-point precision issue behind it.

Purpose: Floating-point precision errors are among the most common sources of bugs in scientific computing, financial software, and unit tests. Understanding why 0.1 + 0.2 != 0.3 in binary floating-point arithmetic, and knowing how to compare floats correctly using math.isclose(), is an essential skill for writing reliable numerical code.

Given Input: a = 0.1 + 0.2 and b = 0.3

Expected Output: 0.1 + 0.2 == 0.3 : False and math.isclose(0.1 + 0.2, 0.3): True

▼ Hint

Print 0.1 + 0.2 with high precision using f"{0.1 + 0.2:.20f}" to reveal the hidden floating-point error digit.
Call math.isclose(a, b) which checks that the relative difference between two values is within a default tolerance of 1e-9.
Try adjusting the rel_tol and abs_tol keyword arguments of math.isclose() to understand how tolerance thresholds affect the comparison.

▼ Solution & Explanation

import math

a = 0.1 + 0.2
b = 0.3

# Direct equality check
print(f"0.1 + 0.2          = {a}")
print(f"0.3                = {b}")
print(f"0.1 + 0.2 == 0.3   : {a == b}")

# High-precision view of the actual stored values
print(f"\nHigh-precision view:")
print(f"  0.1 + 0.2  = {a:.20f}")
print(f"  0.3        = {b:.20f}")
print(f"  Difference = {abs(a - b):.2e}")

# Correct comparison using math.isclose()
print(f"\nmath.isclose(a, b)                     : {math.isclose(a, b)}")
print(f"math.isclose(a, b, rel_tol=1e-9)       : {math.isclose(a, b, rel_tol=1e-9)}")
print(f"math.isclose(a, b, abs_tol=1e-15)      : {math.isclose(a, b, abs_tol=1e-15)}")

# Safe float comparison pattern used in unit tests
def approx_equal(x, y, tol=1e-9):
    return math.isclose(x, y, rel_tol=tol)

print(f"\napprox_equal(0.1+0.2, 0.3) : {approx_equal(0.1 + 0.2, 0.3)}")Code language: Python (python)

Explanation:

Why 0.1 + 0.2 != 0.3: Decimal fractions like 0.1, 0.2, and 0.3 cannot be represented exactly in binary floating-point (IEEE 754). The computer stores the closest representable binary approximation, so 0.1 + 0.2 produces 0.30000000000000004 rather than exactly 0.3. This is not a Python bug – it is a fundamental property of binary floating-point arithmetic shared by virtually all programming languages.
math.isclose(a, b, rel_tol=1e-9): Returns True if the values are close enough within the specified tolerances. The relative tolerance rel_tol is a fraction of the larger value (default 1e-9), and the absolute tolerance abs_tol is a fixed threshold (default 0.0). The check passes if abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol).
When to use abs_tol vs rel_tol: Use rel_tol for general-purpose comparisons where the magnitude of values varies. Use abs_tol when comparing values near zero, since relative tolerance becomes meaningless when one value is zero or very close to it. For robust comparisons, specify both: math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12).

Exercise 11: nth Fibonacci Using the Golden Ratio

Problem Statement: Write a Python function that computes the nth Fibonacci number using Binet’s formula, which involves math.sqrt() and the golden ratio, then verifies the results against the traditional iterative approach for the first 15 terms.

Purpose: Binet’s formula is a striking example of how a purely integer sequence (Fibonacci numbers) can be computed using irrational numbers (the golden ratio and square roots). It also demonstrates the practical limits of floating-point arithmetic – the formula works perfectly for small n but accumulates rounding error for large n, making it an excellent case study in numerical precision trade-offs.

Given Input: Compute Fibonacci numbers for n from 0 to 14.

Expected Output: A table comparing the golden ratio formula result and the iterative result for each n, confirming they match for small values.

▼ Hint

The golden ratio is phi = (1 + math.sqrt(5)) / 2. Binet’s formula is F(n) = round((phi**n - (-1/phi)**n) / math.sqrt(5)).
Use round() to convert the float result back to the nearest integer, since the formula introduces floating-point imprecision.
Write a simple iterative Fibonacci function using a loop as the reference implementation to compare against.

▼ Solution & Explanation

import math

# Golden ratio
phi = (1 + math.sqrt(5)) / 2
print(f"Golden ratio (phi): {phi:.10f}")

def fib_golden_ratio(n):
    """Binet's formula: closed-form Fibonacci using the golden ratio."""
    return round((phi**n - (-1 / phi)**n) / math.sqrt(5))

def fib_iterative(n):
    """Reference implementation using a simple loop."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Compare both methods for n = 0 to 14
print(f"\n{'n':>3} | {'Golden Ratio':>14} | {'Iterative':>10} | {'Match':>6}")
print("-" * 42)
for n in range(15):
    gr  = fib_golden_ratio(n)
    itr = fib_iterative(n)
    print(f"{n:>3} | {gr:>14} | {itr:>10} | {str(gr == itr):>6}")Code language: Python (python)

Explanation:

phi = (1 + math.sqrt(5)) / 2: The golden ratio, approximately 1.6180339887. It is the positive root of the equation x² = x + 1 and has the remarkable property that the ratio of consecutive Fibonacci numbers converges to phi as n increases. Using math.sqrt(5) gives the highest floating-point precision available for this constant.
Binet’s formula: F(n) = (phi^n - (-1/phi)^n) / sqrt(5). The formula produces an exact integer result in theory, but in practice the floating-point arithmetic introduces tiny errors. The round() call corrects these sub-integer residuals and returns the exact Fibonacci integer for values up to roughly n=70, beyond which float64 precision is insufficient.
Iterative reference: The iterative function uses tuple assignment a, b = b, a + b to advance the sequence in-place without a temporary variable. It produces exact integer results for any n and serves as the ground truth to validate the golden ratio formula. For very large n (beyond ~70), the iterative version remains exact while the golden ratio formula diverges.

Exercise 12: Verifying exp() and log() Are Inverses

Problem Statement: Write a Python program that uses math.exp() and math.log() to verify that exp(log(x)) == x and log(exp(x)) == x for several test values, using math.isclose() for the comparison.

Purpose: Understanding that exp and log are inverse functions is foundational for solving exponential equations, working with growth and decay models, and understanding how neural network activation functions relate to their loss functions. Verifying mathematical identities in code also reinforces the habit of testing assumptions rather than trusting them implicitly.

Given Input: Test values [1, 2, 10, 0.5, 100, math.e]

Expected Output: A table showing exp(log(x)) and log(exp(x)) for each test value, with a verification flag confirming both round-trips recover the original.

▼ Hint

math.exp(x) computes e raised to the power x. math.log(x) computes the natural logarithm (base e) of x.
Use math.isclose(math.exp(math.log(x)), x) rather than == to account for floating-point rounding in the round-trip.
Note that math.log(x) is only defined for x > 0 – passing zero or a negative value raises a ValueError.

▼ Solution & Explanation

import math

test_values = [1, 2, 10, 0.5, 100, math.e]

print(f"{'x':>10} | {'exp(log(x))':>14} | {'log(exp(x))':>14} | {'Both pass':>10}")
print("-" * 58)

for x in test_values:
    exp_of_log = math.exp(math.log(x))
    log_of_exp = math.log(math.exp(x))

    check_1 = math.isclose(exp_of_log, x)
    check_2 = math.isclose(log_of_exp, x)

    print(f"{x:>10.4f} | {exp_of_log:>14.10f} | {log_of_exp:>14.10f} | {str(check_1 and check_2):>10}")

# Highlight the identity visually for x = math.e
print(f"\nSpecial case: x = math.e = {math.e:.6f}")
print(f"  log(e)     = {math.log(math.e)}")
print(f"  exp(log(e))= {math.exp(math.log(math.e))}")
print(f"  exp(1)     = {math.exp(1):.10f}")Code language: Python (python)

Explanation:

math.exp(x): Computes e^x where e is Euler’s number (math.e ≈ 2.71828). It is more numerically precise than writing math.e ** x for small values of x near zero, where the built-in function uses a specially optimised algorithm to minimise rounding error.
Inverse relationship: By definition, exp and log are mutual inverses: exp(log(x)) = x for all x > 0, and log(exp(x)) = x for all real x. In practice, floating-point arithmetic introduces tiny errors in both directions, which is why math.isclose() is used instead of == for the verification.
math.log(math.e) equals exactly 1.0: Because math.e is the base of the natural logarithm, log(e) = 1 by definition. Python’s floating-point implementation returns exactly 1.0 for this input, making it a clean anchor point for understanding the exp-log relationship.

Exercise 13: Degrees to DMS Converter

Problem Statement: Write a Python function that converts a decimal degree value into degrees, minutes, and seconds (DMS) format using math.floor() and math.modf(), then formats the output as a standard geographic coordinate string.

Purpose: DMS notation is the standard format for geographic coordinates in GPS devices, aviation, cartography, and astronomy. Converting between decimal degrees and DMS is a practical exercise in decomposing a float into its integer and fractional parts, which is exactly what math.modf() is designed for.

Given Input: decimal_degrees = 45.8833 (representing a latitude or longitude)

Expected Output: 45.8833° = 45° 52' 59.88"

▼ Hint

math.modf(x) returns a tuple of (fractional_part, integer_part) – note the order is fractional first, integer second.
Multiply the fractional degree part by 60 to get total minutes, then apply math.modf() again to split that into whole minutes and remaining fractional minutes.
Multiply the remaining fractional minutes by 60 to get the seconds. Use math.floor() to extract whole minutes cleanly.

▼ Solution & Explanation

import math
def decimal_to_dms(decimal_degrees):
    """Convert decimal degrees to degrees, minutes, seconds."""
    # Preserve sign for direction handling
    sign = -1 if decimal_degrees < 0 else 1
    decimal_degrees = abs(decimal_degrees)
    # Split into whole degrees and fractional part
    frac_deg, whole_deg = math.modf(decimal_degrees)
    degrees = int(whole_deg)
    # Convert fractional degrees to minutes
    frac_min, whole_min = math.modf(frac_deg * 60)
    minutes = int(whole_min)
    # Convert fractional minutes to seconds
    seconds = frac_min * 60
    return sign * degrees, minutes, seconds
def format_dms(degrees, minutes, seconds, direction=None):
    sign = "-" if degrees < 0 else ""
    d = abs(degrees)
    dir_str = f" {direction}" if direction else ""
    return f"{sign}{d}° {minutes}' {seconds:.2f}\"{dir_str}"
# Test cases
test_values = [
    (45.8833,  "N"),
    (-73.9857, "W"),
    (0.0,      "N"),
    (90.0,     "N"),
    (51.5074,  "N"),   # London latitude
]
print("Decimal Degrees to DMS conversion:")
for val, direction in test_values:
    d, m, s = decimal_to_dms(val)
    print(f"  {val:>10.4f}° = {format_dms(d, m, s, direction)}")Code language: Python (python)

Explanation:

math.modf(x): Returns a two-element tuple (fractional_part, integer_part) where both have the same sign as x. Note the counter-intuitive order: fractional comes first, integer second. For example, math.modf(45.8833) returns (0.8833000000000024, 45.0). It is the cleanest way to split a float into its integer and decimal components without string parsing.
Two-stage decomposition: The conversion applies math.modf() twice. The first call extracts the whole degrees and the fractional degree remainder. Multiplying the remainder by 60 converts fractional degrees to minutes, and a second math.modf() call splits that into whole minutes and the remaining fraction. Multiplying the second remainder by 60 gives seconds.
Sign handling: Geographic coordinates can be negative (west longitudes, south latitudes). The function strips the sign before decomposition and restores it to the degrees component only. Minutes and seconds are always non-negative in standard DMS notation – the direction is carried by the sign of the degrees or an explicit N/S/E/W suffix.

Exercise 14: Perfect Square Checker Using math.isqrt()

Problem Statement: Write a Python function that uses math.isqrt() to check whether a given positive integer is a perfect square, then tests it against a range of values and prints which ones qualify.

Purpose: Perfect square detection appears in number theory puzzles, competitive programming, cryptographic primality tests, and grid-layout algorithms (e.g. determining whether n items can be arranged in a square grid). Using math.isqrt() instead of math.sqrt() avoids floating-point rounding errors that can cause int(math.sqrt(n))**2 == n to return incorrect results for large perfect squares.

Given Input: Test integers from 1 to 30, then verify specific large values.

Expected Output: A list of perfect squares in the range 1-30 and a verification result for large numbers like 999999999999999999.

▼ Hint

math.isqrt(n) returns the integer square root of n – the largest integer whose square does not exceed n. It is exact for all non-negative integers with no floating-point rounding.
A number n is a perfect square if math.isqrt(n) ** 2 == n. This check is reliable even for very large integers where math.sqrt() would lose precision.
Show why using int(math.sqrt(n)) can fail for large numbers by testing a value like 999999999999999999 with both approaches.

▼ Solution & Explanation

import math
def is_perfect_square(n):
    """Return True if n is a perfect square using integer square root."""
    if n < 0:
        return False
    root = math.isqrt(n)
    return root * root == n
# Test range 1 to 30
print("Perfect squares from 1 to 30:")
perfect_squares = [n for n in range(1, 31) if is_perfect_square(n)]
print(f"  {perfect_squares}")
# Show isqrt() on a few values
print("\nmath.isqrt() samples:")
for n in [9, 10, 15, 16, 24, 25]:
    root = math.isqrt(n)
    print(f"  isqrt({n:2d}) = {root}  | perfect square: {is_perfect_square(n)}")
# Demonstrate float precision failure for large numbers
large_perfect   = 999999999999998001    # = 999999999 ^ 2
large_non_perf  = 999999999999998002
print(f"\nLarge number test: {large_perfect}")
print(f"  is_perfect_square (isqrt) : {is_perfect_square(large_perfect)}")
print(f"  float-based check         : {int(math.sqrt(large_perfect))**2 == large_perfect}")
print(f"\nLarge non-perfect : {large_non_perf}")
print(f"  is_perfect_square (isqrt) : {is_perfect_square(large_non_perf)}")Code language: Python (python)

Explanation:

math.isqrt(n): Returns the exact integer square root of n – the largest integer r such that r² does not exceed n. It was introduced in Python 3.8 and operates entirely in integer arithmetic, so it is precise for arbitrarily large integers without any floating-point rounding. For example, math.isqrt(26) returns 5 because 5² = 25 ≤ 26 but 6² = 36 > 26.
root * root == n: The perfect square check squares the integer root and compares it to the original value. If they match, n is a perfect square. This integer-only comparison is always exact, unlike int(math.sqrt(n))**2 == n which can fail for large n because math.sqrt() uses 64-bit float arithmetic and loses precision beyond roughly 15-17 significant digits.
Float precision failure: For large perfect squares like 999999999², math.sqrt() may return a float that is slightly off from the true root (e.g. 999999999.0000005), causing int(math.sqrt(n))**2 to compute the wrong integer and fail the equality check. math.isqrt() has no such limitation and gives the correct result for any non-negative integer, however large.

Exercise 15: Product of All Primes Below 30

Problem Statement: Write a Python program that identifies all prime numbers below 30, then uses math.prod() to compute their product (the primorial), and prints both the list of primes and the final result.

Purpose: math.prod() was introduced in Python 3.8 as a clean, readable counterpart to the built-in sum() function, replacing verbose functools.reduce() or manual loop patterns for multiplicative aggregation. This exercise pairs it with a prime sieve to demonstrate how functional tools compose naturally with number theory operations used in cryptography, hashing, and combinatorics.

Given Input: All prime numbers below 30.

Expected Output: Primes below 30: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] and Primorial (product): 6469693230

▼ Hint

Write a helper function is_prime(n) that returns True if n has no divisors other than 1 and itself. Check divisibility up to int(math.sqrt(n)) + 1 for efficiency.
Use a list comprehension to build the list of primes below 30: [n for n in range(2, 30) if is_prime(n)].
Pass the primes list directly to math.prod() to get the product. Compare the result with a manual loop using *= to confirm they match.

▼ Solution & Explanation

import math
def is_prime(n):
    """Return True if n is a prime number."""
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True
# Collect all primes below 30
primes = [n for n in range(2, 30) if is_prime(n)]
print(f"Primes below 30 : {primes}")
print(f"Count           : {len(primes)}")
# Compute the product using math.prod()
primorial = math.prod(primes)
print(f"Primorial       : {primorial:,}")
# Manual loop verification
manual_product = 1
for p in primes:
    manual_product *= p
print(f"Manual product  : {manual_product:,}")
print(f"Results match   : {primorial == manual_product}")
# Bonus: partial products showing growth
print("\nCumulative primorial growth:")
running = 1
for p in primes:
    running *= p
    print(f"  × {p:2d}  =  {running:,}")Code language: Python (python)

Explanation:

math.prod(iterable): Computes the product of all elements in the iterable, starting from an implicit initial value of 1. It mirrors the behaviour of sum() for addition. Calling math.prod([]) on an empty iterable returns 1 (the multiplicative identity), just as sum([]) returns 0 (the additive identity).
int(math.sqrt(n)) + 1 in primality check: A composite number n must have at least one factor less than or equal to its square root. Checking divisors only up to sqrt(n) reduces the number of iterations from O(n) to O(sqrt(n)), which is a significant speedup for larger values. The +1 ensures the boundary is inclusive after integer truncation.
Primorial: The product of the first k prime numbers is called a primorial (written p#). Primorials appear in proofs related to the infinitude of primes, in wheel factorisation sieves, and in generating candidate primes efficiently. The primorial of 29 is 6,469,693,230 – a number that grows astronomically fast as more primes are included.

Exercise 16: Mean, Median, and Mode of Exam Scores

Problem Statement: Write a Python program that takes a list of exam scores and uses the statistics module to compute the mean, median, and mode, then prints a formatted summary.

Purpose: Mean, median, and mode are the three measures of central tendency that appear in every branch of data analysis. Using Python’s built-in statistics module produces accurate results without external dependencies, making it the right choice for lightweight scripts, educational tools, and situations where NumPy or pandas would be overkill.

Given Input: scores = [72, 85, 90, 88, 76, 95, 85, 60, 72, 85, 91, 78]

Expected Output: Mean: 81.42, Median: 85.0, Mode: 85

▼ Hint

Import mean, median, and mode from the statistics module.
statistics.mean(data) returns the arithmetic average. statistics.median(data) returns the middle value. statistics.mode(data) returns the single most common value – it raises StatisticsError if there is no unique mode (use multimode() to handle ties, covered in Exercise 20).
Use sorted(scores) to print the data in order alongside the computed statistics so the results are easy to verify visually.

▼ Solution & Explanation

import statistics
scores = [72, 85, 90, 88, 76, 95, 85, 60, 72, 85, 91, 78]
mean_val   = statistics.mean(scores)
median_val = statistics.median(scores)
mode_val   = statistics.mode(scores)
print(f"Dataset (sorted) : {sorted(scores)}")
print(f"Count            : {len(scores)}")
print(f"Mean             : {mean_val:.2f}")
print(f"Median           : {median_val}")
print(f"Mode             : {mode_val}")
# Additional context
print(f"\nMin score : {min(scores)}")
print(f"Max score : {max(scores)}")
print(f"Range     : {max(scores) - min(scores)}")
# Mean vs median: which is more representative?
print(f"\nMean ({mean_val:.2f}) vs Median ({median_val})")
if mean_val < median_val:
    print("  Mean < Median: distribution is likely left-skewed.")
elif mean_val > median_val:
    print("  Mean > Median: distribution is likely right-skewed.")
else:
    print("  Mean == Median: distribution appears symmetric.")Code language: Python (python)

Explanation:

statistics.mean(scores): Computes the arithmetic mean as the sum of all values divided by the count. The statistics module uses exact rational arithmetic for integer inputs, avoiding the floating-point rounding that can occur with sum(scores) / len(scores) for large datasets.
statistics.median(scores): Sorts the data internally and returns the middle value for odd-length datasets, or the mean of the two middle values for even-length datasets. Because this input has 12 values, the median is the average of the 6th and 7th sorted values: (85 + 85) / 2 = 85.0.
statistics.mode(scores): Returns the single most frequently occurring value. For this dataset, 85 appears three times and is the clear mode. In Python 3.8 and later, mode() no longer raises StatisticsError on a tie – it returns the first mode encountered. Use multimode() from Exercise 20 when you need all modes.

Exercise 17: Sample vs Population Standard Deviation

Problem Statement: Write a Python program that computes both statistics.stdev() (sample standard deviation) and statistics.pstdev() (population standard deviation) on the same dataset, prints both values, and explains when each should be used.

Purpose: Choosing between sample and population standard deviation is one of the most common statistical mistakes made by programmers. Using the wrong one leads to either underestimating or overestimating the spread of data, which has real consequences in machine learning feature scaling, quality control, A/B testing, and scientific reporting.

Given Input: temperatures = [22.1, 24.5, 19.8, 23.3, 25.0, 21.7, 20.4, 26.1, 23.8, 22.9]

Expected Output: Sample stdev: 1.99 and Population pstdev: 1.89

▼ Hint

Use statistics.stdev(data) for sample standard deviation (divides by n-1, Bessel’s correction).
Use statistics.pstdev(data) for population standard deviation (divides by n).
Also compute statistics.variance(data) and statistics.pvariance(data) and verify they are the squares of their respective standard deviations.

▼ Solution & Explanation

import statistics
import math

temperatures = [22.1, 24.5, 19.8, 23.3, 25.0,
                21.7, 20.4, 26.1, 23.8, 22.9]

n            = len(temperatures)
mean_val     = statistics.mean(temperatures)
sample_std   = statistics.stdev(temperatures)
pop_std      = statistics.pstdev(temperatures)
sample_var   = statistics.variance(temperatures)
pop_var      = statistics.pvariance(temperatures)

print(f"Dataset          : {temperatures}")
print(f"Count (n)        : {n}")
print(f"Mean             : {mean_val:.2f}")
print(f"\nSample stdev     : {sample_std:.4f}  (divides by n-1 = {n-1})")
print(f"Population pstdev: {pop_std:.4f}  (divides by n   = {n})")

# Verify: variance is the square of stdev
print(f"\nSample variance      : {sample_var:.4f}")
print(f"stdev²               : {sample_std**2:.4f}  | Match: {math.isclose(sample_var, sample_std**2)}")
print(f"\nPopulation variance  : {pop_var:.4f}")
print(f"pstdev²              : {pop_std**2:.4f}  | Match: {math.isclose(pop_var, pop_std**2)}")

print("\nWhen to use which:")
print("  stdev / variance   : Use when data is a SAMPLE drawn from a larger population.")
print("  pstdev / pvariance : Use when data IS the entire population.")Code language: Python (python)

Explanation:

statistics.stdev() divides by n-1: The subtraction of 1 from the denominator is called Bessel’s correction. It compensates for the fact that a sample mean is not the true population mean – the sample tends to underestimate the spread. Dividing by n-1 produces an unbiased estimator of the population standard deviation. Use this whenever your data is a subset drawn from a larger group you want to make inferences about.
statistics.pstdev() divides by n: No correction is applied because there is nothing to estimate – the data represents the entire population. Use this when you have measured every member of the group and simply want to describe how spread out those exact values are, with no inference intended.
Variance is stdev squared: Standard deviation and variance always satisfy stdev² = variance and pstdev² = pvariance. Variance is expressed in squared units (e.g. °C²), which can be hard to interpret directly. Taking the square root to get standard deviation restores the original unit (°C), making it more intuitive to compare against the mean.

Exercise 18: Variance of Temperatures

Problem Statement: Write a Python program that computes the variance of a list of daily temperature readings using statistics.variance(), then uses the result to identify which readings fall more than one standard deviation from the mean.

Purpose: Variance quantifies how spread out a dataset is around its mean. It is a foundational metric in quality control, financial risk modelling, weather analysis, and machine learning (where high variance in a model signals overfitting). Extending the exercise to flag outliers bridges the gap between computing a statistic and applying it to a real decision.

Given Input: temperatures = [18.5, 21.0, 19.3, 35.2, 20.1, 17.8, 22.4, 16.9, 21.7, 34.8, 20.5, 19.0]

Expected Output: Variance, standard deviation, and a list of readings flagged as outliers (more than one standard deviation from the mean).

▼ Hint

Use statistics.variance(data) for sample variance and statistics.mean(data) for the mean.
Compute the standard deviation as math.sqrt(variance) or use statistics.stdev(data) directly.
A value is an outlier if abs(value - mean) > stdev. Use a list comprehension to collect all such values.

▼ Solution & Explanation

import statistics
import math
temperatures = [18.5, 21.0, 19.3, 35.2, 20.1, 17.8,
                22.4, 16.9, 21.7, 34.8, 20.5, 19.0]
mean_val = statistics.mean(temperatures)
var_val  = statistics.variance(temperatures)
std_val  = statistics.stdev(temperatures)
print(f"Dataset    : {temperatures}")
print(f"Mean       : {mean_val:.2f}°C")
print(f"Variance   : {var_val:.4f}")
print(f"Std Dev    : {std_val:.4f}°C")
# Identify outliers: values more than 1 stdev from the mean
lower_bound = mean_val - std_val
upper_bound = mean_val + std_val
print(f"\nNormal range (mean ± 1 stdev): [{lower_bound:.2f}, {upper_bound:.2f}]")
outliers = [t for t in temperatures if abs(t - mean_val) > std_val]
normal   = [t for t in temperatures if abs(t - mean_val) <= std_val]
print(f"Normal readings  : {normal}")
print(f"Outlier readings : {outliers}")
# Z-score for each value
print("\nZ-scores (standard deviations from mean):")
for t in temperatures:
    z = (t - mean_val) / std_val
    flag = " <-- outlier" if abs(z) > 1 else ""
    print(f"  {t:5.1f}°C  z = {z:+.2f}{flag}")Code language: Python (python)

Explanation:

statistics.variance(temperatures): Returns the sample variance, computed as the average of the squared differences from the mean, with Bessel’s correction (dividing by n-1). The result is in squared units – for temperatures in °C, the variance is in °C². Taking the square root via math.sqrt() or statistics.stdev() converts back to the original unit.
Outlier detection with standard deviation: The rule of thumb that values more than one standard deviation from the mean are noteworthy is a simplified form of the empirical rule (68-95-99.7 rule), which applies to normally distributed data. For a normal distribution, roughly 68% of values fall within one standard deviation of the mean, so values outside that range are in the less common tail.
Z-score: The formula z = (value - mean) / stdev standardises each data point to show how many standard deviations it sits above or below the mean. A z-score of +2.0 means the value is two standard deviations above the mean. Z-scores are the foundation of standard normal distribution tables, hypothesis testing, and feature normalisation in machine learning.

Exercise 19: median_low() and median_high()

Problem Statement: Write a Python program that applies statistics.median_low() and statistics.median_high() to an even-length dataset, compares their results with the standard statistics.median(), and explains when each variant is the appropriate choice.

Purpose: When a dataset has an even number of values, the standard median is the average of the two middle elements – which may not itself be a value in the dataset. In contexts such as reporting a real measurement, selecting a representative record from a database, or choosing a salary benchmark from actual payroll data, median_low() and median_high() ensure the result is always a genuine data point.

Given Input: response_times = [120, 135, 98, 150, 112, 143, 107, 160, 125, 138] (10 values – even length)

Expected Output: median: 131.5, median_low: 125, median_high: 138

▼ Hint

All three functions – median(), median_low(), and median_high() – take the same data list as their argument.
For an even-length dataset, median_low() returns the lower of the two middle values, and median_high() returns the upper of the two middle values. Both return an actual element from the dataset.
Print the sorted dataset first so the two middle elements are clearly visible before computing the statistics.

▼ Solution & Explanation

import statistics

response_times = [120, 135, 98, 150, 112, 143, 107, 160, 125, 138]

sorted_data  = sorted(response_times)
n            = len(sorted_data)
mid_low_idx  = n // 2 - 1
mid_high_idx = n // 2

med          = statistics.median(response_times)
med_low      = statistics.median_low(response_times)
med_high     = statistics.median_high(response_times)

print(f"Sorted dataset   : {sorted_data}")
print(f"Count (n)        : {n}  (even-length)")
print(f"Two middle values: {sorted_data[mid_low_idx]} (index {mid_low_idx}) and "
      f"{sorted_data[mid_high_idx]} (index {mid_high_idx})")

print(f"\nmedian()         : {med}      (average of two middle values - may not be in dataset)")
print(f"median_low()     : {med_low}   (lower middle value - always in dataset)")
print(f"median_high()    : {med_high}  (upper middle value - always in dataset)")

# Verify median is the average of low and high
print(f"\n(median_low + median_high) / 2 = {(med_low + med_high) / 2}")
print(f"Equals median()  : {(med_low + med_high) / 2 == med}")

# Odd-length example for contrast
odd_data     = response_times[:9]
print(f"\nOdd-length dataset: {sorted(odd_data)}")
print(f"  median()      : {statistics.median(odd_data)}")
print(f"  median_low()  : {statistics.median_low(odd_data)}")
print(f"  median_high() : {statistics.median_high(odd_data)}")
print(f"  (all three are equal for odd-length data)")Code language: Python (python)

Explanation:

statistics.median_low(data): Sorts the data and returns the lower of the two central values for even-length datasets. For odd-length datasets, it returns the single middle value – identical to median(). Because it always returns an element that exists in the original dataset, it is appropriate when the median must be a real, measurable data point (e.g. an actual server response time, not a calculated average of two times).
statistics.median_high(data): Returns the upper of the two central values for even-length datasets, and the single middle value for odd-length datasets. Use it when a conservative upper estimate of the centre is needed – for example, when selecting a salary benchmark and you want to err on the higher side of the midpoint.
Relationship between the three: For even-length data, median() is always the arithmetic mean of median_low() and median_high(). For odd-length data, all three functions return the same value. This means median_low() and median_high() are only meaningfully different from median() when the dataset has an even number of elements.

Exercise 20: Finding All Modes with multimode()

Problem Statement: Write a Python program that creates a dataset containing multiple repeated values and uses statistics.multimode() to find all modes, then demonstrates how the result differs from statistics.mode() when there are ties.

Purpose: Real-world datasets are frequently multimodal – think of survey responses clustered at two popular options, or sales data peaking in two different regions. statistics.multimode() was introduced in Python 3.8 specifically to handle tie cases that mode() either misses or handles ambiguously, making it essential for accurate frequency analysis.

Given Input: survey_responses = [3, 5, 2, 4, 5, 3, 1, 5, 2, 3, 4, 2, 5, 3, 1, 4, 2, 3]

Expected Output: multimode: [3, 2, 5] (or the modes in order of first appearance) and a frequency count confirming the tie.

▼ Hint

Call statistics.multimode(data) to get a list of all values that appear with the highest frequency. It always returns a list, even when there is only one mode.
Use Counter from the collections module to build a frequency table and verify that the modes returned by multimode() all share the same count.
Compare the output of statistics.mode(data) and statistics.multimode(data) on the same dataset to highlight the difference in behaviour when a tie exists.

▼ Solution & Explanation

import statistics
from collections import Counter

survey_responses = [3, 5, 2, 4, 5, 3, 1, 5, 2, 3, 4, 2, 5, 3, 1, 4, 2, 3]

freq = Counter(survey_responses)

print(f"Dataset          : {survey_responses}")
print(f"\nFrequency table  :")
for value, count in sorted(freq.items()):
    bar = "#" * count
    print(f"  {value}: {count}  {bar}")

# multimode - returns ALL values with the highest frequency
modes = statistics.multimode(survey_responses)
print(f"\nstatistics.multimode(): {modes}")
print(f"Number of modes  : {len(modes)}")

# mode - returns only the first encountered mode (Python 3.8+)
single_mode = statistics.mode(survey_responses)
print(f"statistics.mode(): {single_mode}")

# Unimodal dataset for contrast
unimodal = [1, 2, 2, 2, 3, 4, 5]
print(f"\nUnimodal dataset : {unimodal}")
print(f"  multimode()    : {statistics.multimode(unimodal)}")
print(f"  mode()         : {statistics.mode(unimodal)}")Code language: Python (python)

Explanation:

statistics.multimode(data): Returns a list of all values that occur with the maximum frequency. If there is a single mode it returns a list with one element. If multiple values are tied for the highest frequency, all of them are returned in the order they first appear in the input. It never raises an error, making it safe to use on any dataset without knowing the mode structure in advance.
statistics.mode(data) on tied data: In Python 3.8 and later, mode() returns the first value it encounters with the highest frequency rather than raising a StatisticsError (the old Python 3.7 behaviour). This means it silently returns only one of several equally valid modes, which can mislead analysis if the tie is not known about – making multimode() the safer default choice.
Multimodal distributions: A dataset with two modes is bimodal; with three or more it is multimodal. Multimodal distributions often indicate that the data comes from two or more distinct subpopulations (e.g. survey respondents from different age groups preferring different options). Detecting multiple modes is therefore an important signal for further segmentation or investigation.

Exercise 21: Quartiles with statistics.quantiles()

Problem Statement: Write a Python program that generates a dataset of 20 integers, uses statistics.quantiles() to compute Q1, Q2, and Q3, then uses these values to calculate the interquartile range (IQR) and identify any outliers using the IQR fence method.

Purpose: Quartiles and the IQR are among the most robust tools in descriptive statistics. Unlike standard deviation, they are not sensitive to extreme outliers and are the basis of the box-and-whisker plot. The IQR fence method (Q1 – 1.5×IQR, Q3 + 1.5×IQR) is the standard algorithm used by statistical software to flag outliers in exploratory data analysis.

Given Input: data = [4, 7, 13, 2, 19, 25, 8, 14, 3, 31, 11, 6, 17, 22, 9, 5, 28, 16, 10, 100] (20 integers including one obvious outlier)

Expected Output: Q1, Q2, Q3, IQR, fence boundaries, and the list of values flagged as outliers by the IQR method.

▼ Hint

Call statistics.quantiles(data, n=4) to divide the data into 4 equal parts. This returns a list of 3 cut points: [Q1, Q2, Q3].
Compute the IQR as Q3 - Q1, then set the lower fence as Q1 - 1.5 * IQR and the upper fence as Q3 + 1.5 * IQR.
Use method="inclusive" or method="exclusive" as a keyword argument to quantiles() to control how the boundaries are calculated, and note the small difference in results.

▼ Solution & Explanation

import statistics
data = [4, 7, 13, 2, 19, 25, 8, 14, 3, 31,
        11, 6, 17, 22, 9, 5, 28, 16, 10, 100]
sorted_data = sorted(data)
print(f"Sorted dataset ({len(data)} values):")
print(f"  {sorted_data}")
# Compute quartiles (n=4 gives [Q1, Q2, Q3])
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
print(f"\nQ1 (25th percentile) : {q1}")
print(f"Q2 (50th percentile) : {q2}")
print(f"Q3 (75th percentile) : {q3}")
print(f"IQR (Q3 - Q1)        : {iqr}")
# IQR fence method for outlier detection
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
print(f"\nLower fence (Q1 - 1.5×IQR): {lower_fence}")
print(f"Upper fence (Q3 + 1.5×IQR): {upper_fence}")
outliers = [x for x in data if x < lower_fence or x > upper_fence]
clean    = [x for x in data if lower_fence <= x <= upper_fence]
print(f"\nOutliers (IQR method): {outliers}")
print(f"Clean data           : {sorted(clean)}")
# Compare inclusive vs exclusive method
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")
q_exclusive = statistics.quantiles(data, n=4, method="exclusive")
print(f"\nInclusive quartiles : Q1={q_inclusive[0]}, Q2={q_inclusive[1]}, Q3={q_inclusive[2]}")
print(f"Exclusive quartiles : Q1={q_exclusive[0]}, Q2={q_exclusive[1]}, Q3={q_exclusive[2]}")Code language: Python (python)

Explanation:

statistics.quantiles(data, n=4): Divides the sorted dataset into n equal-probability intervals and returns a list of n-1 cut points. With n=4 it returns the three quartile boundaries [Q1, Q2, Q3]. Setting n=100 would return 99 percentile cut points. The function was introduced in Python 3.8.
IQR fence method: Any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is flagged as a potential outlier. The factor 1.5 was chosen by John Tukey to capture roughly 99.3% of normally distributed data within the fences. The value 100 in the dataset sits far above the upper fence and is correctly identified as an outlier.
method="inclusive" vs method="exclusive": The two methods differ in how they handle the boundary data points when computing quartile positions. "exclusive" (the default) excludes the minimum and maximum values from the interpolation, which is common in software like R and Excel. "inclusive" includes them, matching the convention used in some textbooks. For large datasets the difference is negligible, but for small datasets it can produce visibly different Q1 and Q3 values.

Exercise 22: Harmonic Mean of Internet Speeds

Problem Statement: Write a Python program that computes the harmonic mean of a list of internet speeds using statistics.harmonic_mean(), then compares it to the arithmetic mean to show why the harmonic mean is the correct average for rate-based measurements.

Purpose: The harmonic mean is the correct average to use when combining rates, speeds, or ratios measured over equal distances or time slots. Using the arithmetic mean for speeds systematically overestimates the true average, which leads to incorrect capacity planning, network benchmarking, and performance reporting. This exercise builds the intuition to recognise when each type of mean is appropriate.

Given Input: speeds = [10, 20, 40, 80] (internet speeds in Mbps)

Expected Output: Harmonic mean: 22.86 Mbps and Arithmetic mean: 37.50 Mbps

▼ Hint

Use statistics.harmonic_mean(data) to compute the harmonic mean. The formula is n / (1/x1 + 1/x2 + ... + 1/xn).
Compare with statistics.mean(data) to show the arithmetic mean and highlight the gap between the two.
To build intuition, simulate equal-time segments: if you spend 1 hour downloading at each speed, calculate the total data transferred and divide by total time to verify the harmonic mean is the correct result.

▼ Solution & Explanation

import statistics
speeds = [10, 20, 40, 80]   # Mbps
harmonic  = statistics.harmonic_mean(speeds)
arithmetic = statistics.mean(speeds)
print(f"Speeds (Mbps)    : {speeds}")
print(f"Harmonic mean    : {harmonic:.2f} Mbps")
print(f"Arithmetic mean  : {arithmetic:.2f} Mbps")
print(f"Difference       : {arithmetic - harmonic:.2f} Mbps")
# Manual verification of harmonic mean formula
n = len(speeds)
manual_harmonic = n / sum(1 / s for s in speeds)
print(f"\nManual harmonic  : {manual_harmonic:.2f} Mbps")
print(f"Results match    : {abs(harmonic - manual_harmonic) < 1e-9}")
# Intuition: equal-time simulation (1 hour at each speed)
print("\nWhy harmonic mean is correct for equal-time averages:")
time_hours   = 1
total_data   = sum(s * time_hours for s in speeds)
total_time   = len(speeds) * time_hours
true_average = total_data / total_time
print(f"  Total data transferred : {total_data} Mb")
print(f"  Total time             : {total_time} hours")
print(f"  True average speed     : {true_average:.2f} Mbps")
print(f"  Arithmetic mean gives  : {arithmetic:.2f} Mbps  (OVERESTIMATES)")
print(f"  Harmonic mean gives    : {harmonic:.2f} Mbps  (CORRECT for equal-time)")Code language: Python (python)

Explanation:

statistics.harmonic_mean(data): Computes n / (1/x1 + 1/x2 + ... + 1/xn), which is the reciprocal of the arithmetic mean of the reciprocals. It was added in Python 3.6 and raises StatisticsError if any value is zero or negative, since rates must be positive. For a single-element list it returns that element unchanged.
Why harmonic mean for rates: When the same amount of work (distance, time, or quantity) is performed at different rates, the harmonic mean gives the correct overall rate. Spending one hour at 10 Mbps, 20 Mbps, 40 Mbps, and 80 Mbps transfers 150 Mb in 4 hours, giving a true average of 37.5 Mbps – which matches the arithmetic mean here only because the scenario happens to be equal-time. For equal-distance or equal-work scenarios the harmonic mean is the right choice.
Arithmetic vs harmonic gap: The arithmetic mean always overestimates the true average rate when values vary, because slow segments drag down actual throughput more than fast segments boost it. The larger the variance in the rates, the more the arithmetic mean overestimates, and the bigger the practical error in capacity planning or benchmarking.

Exercise 23: Geometric Mean for Compound Growth Rate

Problem Statement: Write a Python program that uses statistics.geometric_mean() to calculate the average compound annual growth rate (CAGR) from five years of annual returns, then verifies the result by compounding the geometric mean over the same period.

Purpose: The geometric mean is the correct average for quantities that multiply over time – investment returns, population growth rates, inflation rates, and biological reproduction factors. Using the arithmetic mean for such data overestimates the true long-term growth and leads to misleading projections. This is one of the most practically important distinctions in financial and scientific data analysis.

Given Input: Annual growth factors for 5 years: growth_factors = [1.08, 0.95, 1.12, 1.06, 0.98] (representing +8%, -5%, +12%, +6%, -2%)

Expected Output: Geometric mean (CAGR): 1.0369 meaning approximately 3.69% average annual growth.

▼ Hint

Pass the list of growth factors (not percentage returns) to statistics.geometric_mean(). The formula is the nth root of the product of all values.
Verify the result by computing geometric_mean ** n and comparing it to math.prod(growth_factors) – both should give the total cumulative growth.
Show the final portfolio value by multiplying an initial investment of 1000 by the total growth factor, then by compounding at the geometric mean rate for the same period.

▼ Solution & Explanation

import statistics
import math

# Annual growth factors (1 + return rate)
growth_factors = [1.08, 0.95, 1.12, 1.06, 0.98]
years          = len(growth_factors)
initial        = 1000.00

geo_mean   = statistics.geometric_mean(growth_factors)
arith_mean = statistics.mean(growth_factors)

print(f"Annual growth factors : {growth_factors}")
print(f"Years                 : {years}")
print(f"\nGeometric mean (CAGR) : {geo_mean:.6f}  ({(geo_mean - 1) * 100:.2f}% per year)")
print(f"Arithmetic mean       : {arith_mean:.6f}  ({(arith_mean - 1) * 100:.2f}% per year)")

# Verify: geo_mean^n should equal the total cumulative product
total_growth   = math.prod(growth_factors)
geo_compounded = geo_mean ** years
print(f"\nTotal cumulative growth       : {total_growth:.6f}")
print(f"Geometric mean ^ {years} years    : {geo_compounded:.6f}")
print(f"Match                         : {math.isclose(total_growth, geo_compounded)}")

# Final portfolio value
actual_final  = initial * total_growth
geo_projected = initial * (geo_mean ** years)
arith_proj    = initial * (arith_mean ** years)

print(f"\nStarting value                : £{initial:,.2f}")
print(f"Actual final value            : £{actual_final:,.2f}")
print(f"Projected via geometric mean  : £{geo_projected:,.2f}  (matches actual)")
print(f"Projected via arithmetic mean : £{arith_proj:,.2f}  (OVERESTIMATES)")Code language: Python (python)

Explanation:

statistics.geometric_mean(data): Computes the nth root of the product of all values, equivalent to exp(mean(log(x) for x in data)). It was added in Python 3.8 and requires all values to be strictly positive – passing zero or a negative value raises a StatisticsError. For growth factors this is always satisfied since a -100% return (total loss) would be represented as 0.0, which is excluded by definition.
CAGR verification: The geometric mean raised to the power of n equals the total cumulative product of all growth factors. This is the defining property of the geometric mean for multiplicative sequences: it is the single constant rate that, compounded for n periods, produces the same end result as the actual sequence of varying rates.
Arithmetic mean overestimates for multiplicative growth: The arithmetic mean of the growth factors is 1.038, implying 3.8% annual growth. Compounded over 5 years this gives a higher final value than actually occurred. The geometric mean (3.69%) correctly captures the drag that loss years impose on compounding, which the arithmetic mean ignores.

Exercise 24: Covariance Between Study Hours and Test Scores

Problem Statement: Write a Python program that uses statistics.covariance() to measure the linear relationship between hours studied and test scores, then interprets the sign and magnitude of the result.

Purpose: Covariance is the foundational measure of how two variables move together. It is the building block of correlation, regression, and portfolio theory in finance. Understanding its sign (direction of relationship) and its scale-dependence (why raw covariance cannot be compared across different datasets) motivates the need for the normalised correlation coefficient introduced in the next exercise.

Given Input: hours = [2, 4, 6, 8, 5, 7, 3, 9, 1, 6] and scores = [55, 70, 80, 92, 74, 85, 60, 95, 45, 78]

Expected Output: A positive covariance value confirming that more study hours are associated with higher scores.

▼ Hint

Call statistics.covariance(x, y) with two equal-length lists. It computes the sample covariance using n-1 in the denominator.
A positive result means the variables tend to increase together. A negative result means one increases as the other decreases. Zero means no linear relationship.
Verify the result manually: compute sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1) and confirm it matches statistics.covariance(x, y).

▼ Solution & Explanation

import statistics
import math
hours  = [2, 4, 6, 8, 5, 7, 3, 9, 1, 6]
scores = [55, 70, 80, 92, 74, 85, 60, 95, 45, 78]
cov       = statistics.covariance(hours, scores)
mean_h    = statistics.mean(hours)
mean_s    = statistics.mean(scores)
n         = len(hours)
print(f"Hours studied : {hours}")
print(f"Test scores   : {scores}")
print(f"Mean hours    : {mean_h:.2f}")
print(f"Mean score    : {mean_s:.2f}")
print(f"\nCovariance    : {cov:.4f}")
# Interpret the sign
if cov > 0:
    print("Interpretation: Positive covariance - more hours studied tends to mean higher scores.")
elif cov < 0:
    print("Interpretation: Negative covariance - more hours studied tends to mean lower scores.")
else:
    print("Interpretation: Zero covariance - no linear relationship detected.")
# Manual verification
manual_cov = sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, scores)) / (n - 1)
print(f"\nManual covariance : {manual_cov:.4f}")
print(f"Results match     : {math.isclose(cov, manual_cov)}")
# Limitation: covariance is scale-dependent
print(f"\nLimitation: covariance is {cov:.2f} for hours vs scores.")
print("This number is hard to interpret in isolation because it depends on the")
print("units and scale of both variables. Use correlation (Exercise 25) to normalise it.")Code language: Python (python)

Explanation:

statistics.covariance(x, y): Computes the sample covariance between two equal-length sequences using the formula sum((xi - mean_x) * (yi - mean_y)) / (n - 1). It was added in Python 3.10 along with statistics.correlation(). Both lists must have the same length, otherwise a StatisticsError is raised.
Sign interpretation: A positive covariance means the variables tend to move in the same direction – when hours is above its mean, score tends to be above its mean too. A negative covariance means they move in opposite directions. Zero covariance means there is no linear relationship, though a non-linear relationship could still exist.
Scale-dependence limitation: The raw covariance value (e.g. 47.6) is hard to compare across datasets because it is measured in the product of the two units (hours × score points). Doubling all the score values would double the covariance without any change in the underlying relationship. This scale-dependence is resolved by dividing by both standard deviations to produce the dimensionless Pearson correlation coefficient shown in the next exercise.

Exercise 25: Pearson Correlation Coefficient

Problem Statement: Write a Python program that uses statistics.correlation() to compute the Pearson correlation coefficient between hours studied and test scores, interprets the strength and direction of the relationship, and verifies that it equals the covariance divided by the product of the two standard deviations.

Purpose: The Pearson correlation coefficient is the most widely used measure of linear association in science, engineering, finance, and machine learning. Unlike covariance, its value is always between -1 and +1, making it directly comparable across datasets regardless of units. Understanding it deeply is a prerequisite for linear regression, feature selection, and multicollinearity analysis.

Given Input: hours = [2, 4, 6, 8, 5, 7, 3, 9, 1, 6] and scores = [55, 70, 80, 92, 74, 85, 60, 95, 45, 78]

Expected Output: Pearson r: 0.9934 indicating a very strong positive linear relationship.

▼ Hint

Call statistics.correlation(x, y) to get the Pearson r value. It always returns a float between -1.0 and +1.0.
Verify using the relationship: r = covariance(x, y) / (stdev(x) * stdev(y)).
Print an interpretation guide: r close to +1 is strong positive, close to -1 is strong negative, close to 0 is weak or no linear relationship.

▼ Solution & Explanation

import statistics
import math

hours  = [2, 4, 6, 8, 5, 7, 3, 9, 1, 6]
scores = [55, 70, 80, 92, 74, 85, 60, 95, 45, 78]

r = statistics.correlation(hours, scores)

print(f"Hours studied : {hours}")
print(f"Test scores   : {scores}")
print(f"\nPearson r     : {r:.4f}")

# Interpret strength and direction
def interpret_r(r):
    abs_r = abs(r)
    direction = "positive" if r > 0 else "negative"
    if abs_r >= 0.9:
        strength = "very strong"
    elif abs_r >= 0.7:
        strength = "strong"
    elif abs_r >= 0.5:
        strength = "moderate"
    elif abs_r >= 0.3:
        strength = "weak"
    else:
        strength = "very weak or no"
    return f"{strength} {direction} linear relationship"

print(f"Interpretation: {interpret_r(r)}")

# Manual verification: r = cov(x,y) / (stdev(x) * stdev(y))
cov    = statistics.covariance(hours, scores)
std_h  = statistics.stdev(hours)
std_s  = statistics.stdev(scores)
r_manual = cov / (std_h * std_s)

print(f"\nManual verification:")
print(f"  covariance(hours, scores) : {cov:.4f}")
print(f"  stdev(hours)              : {std_h:.4f}")
print(f"  stdev(scores)             : {std_s:.4f}")
print(f"  cov / (std_h * std_s)     : {r_manual:.4f}")
print(f"  Matches correlation()     : {math.isclose(r, r_manual)}")

# R-squared: proportion of variance explained
r_squared = r ** 2
print(f"\nR² = {r_squared:.4f}  ({r_squared * 100:.1f}% of score variance explained by study hours)")Code language: Python (python)

Explanation:

statistics.correlation(x, y): Returns the Pearson correlation coefficient, a dimensionless value in the range [-1, +1]. It was added in Python 3.10 alongside covariance(). A value of +1 means perfect positive linear relationship, -1 means perfect negative, and 0 means no linear relationship. For this dataset, r ≈ 0.993 indicates that hours studied is an almost perfect linear predictor of test score.
Relationship to covariance: The Pearson r is the covariance normalised by the product of both standard deviations: r = cov(x, y) / (stdev(x) * stdev(y)). This division removes the unit-dependence of raw covariance and constrains the result to [-1, +1], making it universally comparable across different datasets and variable types.
R-squared (coefficient of determination): Squaring the Pearson r gives R², which represents the proportion of variance in one variable that is explained by the other. An R² of 0.987 means that 98.7% of the variation in test scores is accounted for by variation in study hours, leaving only 1.3% unexplained by the linear relationship.

Exercise 26: Probability with NormalDist

Problem Statement: Write a Python program that creates a statistics.NormalDist object with mean 70 and standard deviation 10, then calculates the probability of a student scoring above 85, below 60, and between 65 and 80.

Purpose: NormalDist makes working with the normal (Gaussian) distribution straightforward without requiring external libraries like SciPy. It is used in grading, quality control (Six Sigma), A/B testing, risk modelling, and any domain where measurements cluster around a mean. This exercise introduces the CDF (cumulative distribution function), which answers the question “what fraction of values fall below this threshold?”

Given Input: mu = 70, sigma = 10

Expected Output: Probability of scoring above 85, below 60, and between 65 and 80, each expressed as a percentage.

▼ Hint

Create the distribution with dist = statistics.NormalDist(mu=70, sigma=10).
Use dist.cdf(x) to get the probability that a value is less than or equal to x. For “above x”, compute 1 - dist.cdf(x).
For a range, compute dist.cdf(upper) - dist.cdf(lower) to get the probability of falling between the two bounds.

▼ Solution & Explanation

import statistics
dist = statistics.NormalDist(mu=70, sigma=10)
print(f"Distribution     : N(mu={dist.mean}, sigma={dist.stdev})")
print(f"Variance         : {dist.variance}")
# Probability of scoring above 85
p_above_85  = 1 - dist.cdf(85)
print(f"\nP(score > 85)           : {p_above_85:.4f}  ({p_above_85 * 100:.2f}%)")
# Probability of scoring below 60
p_below_60  = dist.cdf(60)
print(f"P(score < 60)           : {p_below_60:.4f}  ({p_below_60 * 100:.2f}%)")
# Probability of scoring between 65 and 80
p_65_to_80  = dist.cdf(80) - dist.cdf(65)
print(f"P(65 < score < 80)      : {p_65_to_80:.4f}  ({p_65_to_80 * 100:.2f}%)")
# Verify empirical rule (68-95-99.7)
p_1sigma = dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)
p_2sigma = dist.cdf(dist.mean + 2*dist.stdev) - dist.cdf(dist.mean - 2*dist.stdev)
p_3sigma = dist.cdf(dist.mean + 3*dist.stdev) - dist.cdf(dist.mean - 3*dist.stdev)
print(f"\nEmpirical rule verification:")
print(f"  Within 1 sigma (60-80)  : {p_1sigma * 100:.2f}%  (expected ~68.27%)")
print(f"  Within 2 sigma (50-90)  : {p_2sigma * 100:.2f}%  (expected ~95.45%)")
print(f"  Within 3 sigma (40-100) : {p_3sigma * 100:.2f}%  (expected ~99.73%)")Code language: Python (python)

Explanation:

statistics.NormalDist(mu, sigma): Creates a normal distribution object with the given mean and standard deviation. Added in Python 3.8, it provides .cdf(), .pdf(), .inv_cdf(), .overlap(), and .zscore() methods, covering most common normal distribution tasks without needing SciPy.
dist.cdf(x): The cumulative distribution function returns the probability that a randomly drawn value from the distribution is less than or equal to x. It always returns a value between 0 and 1. For symmetric distributions like the normal, cdf(mean) is exactly 0.5.
Empirical rule verification: The output confirms the 68-95-99.7 rule: approximately 68.27% of values fall within one standard deviation of the mean, 95.45% within two, and 99.73% within three. This rule is widely used in quality control and statistical process monitoring to set acceptance thresholds.

Exercise 27: Distribution Overlap Between Two Models

Problem Statement: Write a Python program that uses NormalDist.overlap() to compute how much two normal distributions representing the error rates of two machine learning models overlap, then interprets what the overlap coefficient means for model comparison.

Purpose: Overlap between distributions is a practical measure of how distinguishable two groups are – used in medical diagnostic accuracy (sensitivity vs specificity), A/B test analysis (are the two variants really different?), and model evaluation (do two models perform differently or are their results statistically indistinguishable?). NormalDist.overlap() computes this in one call without numerical integration.

Given Input: Model A error rate: NormalDist(mu=0.15, sigma=0.03). Model B error rate: NormalDist(mu=0.22, sigma=0.04).

Expected Output: An overlap coefficient between 0 and 1, with 1 meaning identical distributions and 0 meaning completely separate distributions.

▼ Hint

Create two NormalDist objects and call dist_a.overlap(dist_b) to get the overlap coefficient.
The overlap coefficient is symmetric: dist_a.overlap(dist_b) == dist_b.overlap(dist_a).
Test the extremes: create two identical distributions and verify the overlap is 1.0, then create two distributions with means far apart and verify the overlap approaches 0.

▼ Solution & Explanation

import statistics
model_a = statistics.NormalDist(mu=0.15, sigma=0.03)
model_b = statistics.NormalDist(mu=0.22, sigma=0.04)
overlap = model_a.overlap(model_b)
print(f"Model A error rate: N(mu={model_a.mean}, sigma={model_a.stdev})")
print(f"Model B error rate: N(mu={model_b.mean}, sigma={model_b.stdev})")
print(f"\nOverlap coefficient : {overlap:.4f}  ({overlap * 100:.1f}%)")
# Interpret the overlap
def interpret_overlap(ov):
    if ov >= 0.85:
        return "Very high overlap - distributions are nearly indistinguishable."
    elif ov >= 0.60:
        return "Moderate overlap - some separation but not conclusive."
    elif ov >= 0.30:
        return "Low overlap - distributions are meaningfully different."
    else:
        return "Very low overlap - distributions are clearly distinct."
print(f"Interpretation    : {interpret_overlap(overlap)}")
# Symmetry check
print(f"\nSymmetry check    : model_b.overlap(model_a) = {model_b.overlap(model_a):.4f}")
print(f"Symmetric         : {abs(overlap - model_b.overlap(model_a)) < 1e-12}")
# Boundary cases
identical = statistics.NormalDist(mu=0.15, sigma=0.03)
far_apart = statistics.NormalDist(mu=0.90, sigma=0.03)
print(f"\nIdentical distributions overlap    : {model_a.overlap(identical):.4f}  (expected 1.0)")
print(f"Far-apart distributions overlap    : {model_a.overlap(far_apart):.6f}  (near 0.0)")Code language: Python (python)

Explanation:

dist_a.overlap(dist_b): Returns the overlap coefficient, defined as the area under the minimum of the two probability density functions. A value of 1.0 means the distributions are identical. A value of 0.0 means they have no area in common (impossible for two normal distributions unless their means are infinitely far apart). The computation is exact for normal distributions and does not require numerical integration.
Practical interpretation for model comparison: An overlap of around 0.50 (50%) means that knowing a value came from one distribution gives you only modest confidence about which model produced it. A low overlap (say, below 0.20) means the two models are performing sufficiently differently that the distinction is practically meaningful – one is clearly better than the other in terms of error rate.
Symmetry property: The overlap coefficient is symmetric by definition – swapping the two distributions does not change the shared area. This is verified in the output and is a useful sanity check when debugging code that computes pairwise overlaps for multiple distributions.

Exercise 28: Grading Curve with NormalDist.from_samples()

Problem Statement: Write a Python program that uses NormalDist.from_samples() to fit a normal distribution to raw exam scores, then rescales all scores so that the distribution mean becomes 75, preserving the relative spread of the original grades.

Purpose: Grade curving is a direct application of distribution shifting – adjusting all scores by a constant so the class mean hits a target without changing the relative ranking or spread. NormalDist.from_samples() provides a clean interface for fitting a distribution to observed data, and the resulting object’s parameters drive the rescaling logic.

Given Input: raw_scores = [52, 61, 47, 73, 68, 55, 80, 44, 66, 71, 58, 49, 76, 63, 57]

Expected Output: Original mean, target mean of 75, the shift applied, and the full list of curved scores.

▼ Hint

Use statistics.NormalDist.from_samples(data) to fit a normal distribution to the raw scores. It returns a NormalDist object whose .mean and .stdev match the sample.
Compute the shift as target_mean - dist.mean and add it to every score to curve the grades.
Verify that statistics.mean(curved_scores) equals the target mean after applying the shift.

▼ Solution & Explanation

import statistics
import math
raw_scores  = [52, 61, 47, 73, 68, 55, 80, 44, 66, 71, 58, 49, 76, 63, 57]
target_mean = 75
# Fit a normal distribution to the raw scores
dist = statistics.NormalDist.from_samples(raw_scores)
print(f"Raw scores       : {sorted(raw_scores)}")
print(f"Fitted mean      : {dist.mean:.2f}")
print(f"Fitted stdev     : {dist.stdev:.2f}")
# Compute the additive shift needed
shift = target_mean - dist.mean
print(f"\nTarget mean      : {target_mean}")
print(f"Shift applied    : {shift:+.2f} points")
# Apply the curve
curved_scores = [round(s + shift, 1) for s in raw_scores]
print(f"\nCurved scores    : {sorted(curved_scores)}")
print(f"New mean         : {statistics.mean(curved_scores):.2f}  (target: {target_mean})")
print(f"Stdev unchanged  : {statistics.stdev(curved_scores):.2f}  (was: {dist.stdev:.2f})")
# Side-by-side comparison
print(f"\n{'Raw':>6} | {'Curved':>8} | {'Change':>8}")
print("-" * 28)
for raw, curved in sorted(zip(raw_scores, curved_scores)):
    print(f"{raw:>6} | {curved:>8} | {curved - raw:>+8.1f}")
# Probability of failing (score < 60) before and after the curve
dist_curved = statistics.NormalDist.from_samples(curved_scores)
p_fail_raw    = dist.cdf(60)
p_fail_curved = dist_curved.cdf(60)
print(f"\nP(score < 60) before curve : {p_fail_raw * 100:.1f}%")
print(f"P(score < 60) after curve  : {p_fail_curved * 100:.1f}%")Code language: Python (python)

Explanation:

NormalDist.from_samples(data): A class method that fits a normal distribution to a list of observed values by computing their sample mean and standard deviation, then returns a NormalDist object with those parameters. It requires at least two data points and raises a StatisticsError for a single-element list. It is the bridge between raw data and the full set of NormalDist methods.
Additive shift preserves spread: Adding a constant to every score shifts the entire distribution horizontally without changing its shape or width. The mean increases by exactly the shift amount, but the standard deviation – which measures spread relative to the mean – remains identical. This means relative performance (rank and distance from classmates) is fully preserved after curving.
Effect on failure rate: Curving upward shifts the distribution away from the failing threshold, reducing the probability of scores below 60. The output shows this concretely by computing the CDF at 60 before and after the curve, demonstrating how a purely statistical operation translates into a meaningful educational outcome.

Exercise 29: Linear Regression to Predict Scores

Problem Statement: Write a Python program that uses statistics.linear_regression() to fit a line to hours-studied and test-score data, then uses the resulting slope and intercept to predict the score for a student who studies 8 hours.

Purpose: Linear regression is the most widely used predictive modelling technique in statistics and data science. statistics.linear_regression() was added in Python 3.10 as a lightweight built-in implementation for simple (single-variable) regression, removing the need for NumPy or scikit-learn for straightforward prediction tasks. Understanding slope and intercept lays the groundwork for all more advanced regression methods.

Given Input: hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and scores = [50, 55, 60, 65, 70, 75, 80, 82, 88, 95]

Expected Output: Slope, intercept, and predicted score for 8 hours of study.

▼ Hint

Call statistics.linear_regression(x, y) which returns a named tuple with .slope and .intercept attributes.
Predict a value using the line equation: predicted = slope * x + intercept.
Compute the residuals (actual minus predicted) for each data point and print them to show how well the line fits the data.

▼ Solution & Explanation

import statistics
import math

hours  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [50, 55, 60, 65, 70, 75, 80, 82, 88, 95]

# Fit the regression line
result    = statistics.linear_regression(hours, scores)
slope     = result.slope
intercept = result.intercept

print(f"Hours  : {hours}")
print(f"Scores : {scores}")
print(f"\nRegression line  : score = {slope:.4f} × hours + {intercept:.4f}")
print(f"Slope            : {slope:.4f}  (each extra hour adds ~{slope:.1f} points)")
print(f"Intercept        : {intercept:.4f}  (predicted score at 0 hours)")

# Predict for 8 hours
predict_hours = 8
predicted = slope * predict_hours + intercept
print(f"\nPrediction for {predict_hours} hours: {predicted:.2f} points")

# Residuals: actual minus predicted for each data point
print(f"\n{'Hours':>6} | {'Actual':>8} | {'Predicted':>10} | {'Residual':>10}")
print("-" * 42)
ss_res = 0
for h, actual in zip(hours, scores):
    pred    = slope * h + intercept
    resid   = actual - pred
    ss_res += resid ** 2
    print(f"{h:>6} | {actual:>8} | {pred:>10.2f} | {resid:>+10.2f}")

# R-squared from regression vs correlation
r       = statistics.correlation(hours, scores)
r_sq    = r ** 2
print(f"\nR² (from correlation) : {r_sq:.4f}  ({r_sq * 100:.1f}% of variance explained)")Code language: Python (python)

Explanation:

statistics.linear_regression(x, y): Returns a LinearRegression named tuple with .slope and .intercept attributes. It fits the line that minimises the sum of squared vertical distances from the data points to the line (ordinary least squares). Added in Python 3.10, it supports an optional proportional=True argument to force the line through the origin.
Slope interpretation: The slope tells you the expected change in the dependent variable (score) for each one-unit increase in the independent variable (hours). A slope of approximately 4.6 means each additional hour of study is associated with roughly 4.6 more points on the test, on average across the dataset.
Residuals: The residual for each data point is the difference between the actual value and the value predicted by the line. Small residuals indicate a good fit. Large or systematically patterned residuals (e.g. all positive for high x values) signal that a linear model may not be the best choice and a curved relationship might fit better.

Exercise 30: Z-Score Outlier Detector Combining NormalDist and quantiles()

Problem Statement: Write a Python function that combines statistics.NormalDist and statistics.quantiles() to build a dual-method outlier detector: one based on z-scores from the fitted normal distribution, and one based on the IQR fence method, then compares which values each method flags.

Purpose: No single outlier detection method is universally best. The z-score method assumes normality and is sensitive to extreme values distorting the mean and standard deviation. The IQR method is non-parametric and robust to skew. Combining both in a single function and comparing their outputs builds the critical thinking needed to choose the right tool for a given dataset in real data cleaning pipelines.

Given Input: data = [14, 18, 11, 13, 6, 8, 2, 74, 12, 9, 17, 15, 10, 13, 16, 8, 11, -5, 14, 12]

Expected Output: Outliers flagged by each method printed separately, followed by a comparison showing agreements and disagreements.

▼ Hint

For the z-score method, fit a NormalDist.from_samples(data) and use dist.zscore(x) to compute each value’s z-score. Flag values where abs(z) > threshold (commonly 2.5 or 3).
For the IQR method, use statistics.quantiles(data, n=4) to get Q1 and Q3, compute IQR, and flag values outside [Q1 - 1.5×IQR, Q3 + 1.5×IQR].
Use set operations (& for intersection, | for union, - for difference) to compare which values each method flags exclusively vs which both agree on.

▼ Solution & Explanation

import statistics
data = [14, 18, 11, 13, 6, 8, 2, 74, 12, 9,
        17, 15, 10, 13, 16, 8, 11, -5, 14, 12]
# --- Method 1: Z-score using NormalDist ---
dist        = statistics.NormalDist.from_samples(data)
z_threshold = 2.5
z_outliers = {x for x in data if abs(dist.zscore(x)) > z_threshold}
print(f"Dataset          : {sorted(data)}")
print(f"Fitted mean      : {dist.mean:.2f}")
print(f"Fitted stdev     : {dist.stdev:.2f}")
print(f"\nZ-score method (|z| > {z_threshold}):")
for x in sorted(data):
    z = dist.zscore(x)
    flag = " <-- OUTLIER" if abs(z) > z_threshold else ""
    print(f"  {x:>5}  z = {z:+.2f}{flag}")
# --- Method 2: IQR fence using quantiles ---
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr         = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
iqr_outliers = {x for x in data if x < lower_fence or x > upper_fence}
print(f"\nIQR method (Q1={q1}, Q3={q3}, IQR={iqr}):")
print(f"  Lower fence : {lower_fence:.2f}")
print(f"  Upper fence : {upper_fence:.2f}")
print(f"  IQR outliers: {sorted(iqr_outliers)}")
# --- Comparison ---
both    = z_outliers & iqr_outliers
z_only  = z_outliers - iqr_outliers
iqr_only = iqr_outliers - z_outliers
print(f"\nComparison:")
print(f"  Flagged by BOTH methods  : {sorted(both)}")
print(f"  Z-score only             : {sorted(z_only)}")
print(f"  IQR only                 : {sorted(iqr_only)}")
print(f"\nClean data (neither method): {sorted(set(data) - (z_outliers | iqr_outliers))}")Code language: Python (python)

Explanation:

dist.zscore(x): Computes (x - mean) / stdev for the fitted distribution. A z-score of +2.5 means the value is 2.5 standard deviations above the mean. Because the mean and standard deviation are computed from the full dataset including outliers, extreme values inflate both parameters and can mask themselves – a known weakness of the z-score method called masking or swamping.
IQR method robustness: The IQR uses only the middle 50% of the data (Q1 to Q3) to set its fences, so a single extreme outlier cannot shift the boundaries the way it shifts the mean and standard deviation. This makes the IQR method resistant to the masking problem and more reliable for skewed or heavy-tailed distributions.
Set operations for comparison: Using Python sets and the & (intersection), | (union), and - (difference) operators cleanly separates values flagged by both methods, only by z-score, and only by IQR. Values flagged by both are high-confidence outliers. Values flagged by only one method warrant closer inspection to determine whether the data point is genuinely anomalous or whether the method’s assumptions are mismatched to the data.

Table of contents

Exercise 1: Square Root and Power

Exercise 2: Area of a Circle

Exercise 3: Factorial Computation

Exercise 4: Ceiling and Floor

Exercise 5: Greatest Common Divisor

Exercise 6: Hypotenuse of a Right Triangle

Exercise 7: Natural and Base-10 Logarithms

Exercise 8: Degrees to Radians, Sine and Cosine

Exercise 9: Combinations and Permutations

Exercise 10: Floating-Point Precision with math.isclose()

Exercise 11: nth Fibonacci Using the Golden Ratio

Exercise 12: Verifying exp() and log() Are Inverses

Exercise 13: Degrees to DMS Converter

Exercise 14: Perfect Square Checker Using math.isqrt()

Exercise 15: Product of All Primes Below 30

Exercise 16: Mean, Median, and Mode of Exam Scores

Exercise 17: Sample vs Population Standard Deviation

Exercise 18: Variance of Temperatures

Exercise 19: median_low() and median_high()

Exercise 20: Finding All Modes with multimode()

Exercise 21: Quartiles with statistics.quantiles()

Exercise 22: Harmonic Mean of Internet Speeds

Exercise 23: Geometric Mean for Compound Growth Rate

Exercise 24: Covariance Between Study Hours and Test Scores

Exercise 25: Pearson Correlation Coefficient

Exercise 26: Probability with NormalDist

Exercise 27: Distribution Overlap Between Two Models

Exercise 28: Grading Curve with NormalDist.from_samples()

Exercise 29: Linear Regression to Predict Scores

Exercise 30: Z-Score Outlier Detector Combining NormalDist and quantiles()

About Vishal

Related Tutorial Topics:

All Coding Exercises:

Python Exercises and Quizzes

Leave a Reply Cancel reply

About PYnative

Follow Us

Explore Python

Coding Exercises

Legal Stuff