This lesson demonstrates ways to randomly choose single or multiple elements from a list when each element has a different probability of being selected. Use the random.choices() function to get weighted random samples in Python.
Let’s take the following example for a better understanding of the requirement.
    import random

    sampleList = [10, 20, 30, 40]
    x = random.choice(sampleList)
    print(x)

If you run choice() in the code above, it returns 10, 20, 30, or 40 with equal probability. But what if you want to pick elements from the list with different probabilities? For example, you may want to choose items from a sequence so that each element has its own probability of being selected.
In other words, choose 4 elements from the list randomly with different probabilities. For example:
- Choose 10 – 10% of the time
- Choose 20 – 25% of the time
- Choose 30 – 50% of the time
- Choose 40 – 15% of the time
There are two ways to make weighted random choices in Python:
- If you are using Python 3.6 or above, use the random.choices() function.
- Otherwise, use numpy.random.choice().
We will see how to use both one by one.
Table of contents
- Relative weights to choose elements from the list with different probability
- Cumulative weights to choose items from the list with different probability
- Probability of getting 6 or more heads from 10 spins
- Generate weighted random numbers
- Points to remember before implementing weighted random choices
- Numpy’s random.choice() to choose elements from the list with different probability
- Next Steps
Python 3.6 introduced a new function, random.choices(), in the random module. By using the choices() function, we can make a weighted random choice with replacement. You can also call it a weighted random sample with replacement.
Let’s have a look at the syntax of this function.
random.choices(population, weights=None, *, cum_weights=None, k=1)
It returns a k sized list of elements chosen from the population with replacement.
population: The sequence or data structure from which you want to choose data.
weights or cum_weights: Define the selection probability for each element.
weights: If a weights sequence is specified, random selections are made according to the relative weights.
cum_weights: Alternatively, if a cum_weights sequence is given, the random selections are made according to the cumulative weights.
k: The number of samples you want from the population.
Note: You cannot specify both weights and cum_weights at the same time.
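As a minimal sketch of the signature above, the snippet below applies the 10/25/50/15 percent split from the introduction and counts how often each value comes up. The seed value, the sample size, and the variable names are arbitrary choices made for this illustration.

```python
import random
from collections import Counter

random.seed(42)  # arbitrary seed, only to make the run repeatable

sample_list = [10, 20, 30, 40]
percentages = [10, 25, 50, 15]  # relative weights; they happen to sum to 100

# Draw many samples with replacement and count how often each value appears
draws = random.choices(sample_list, weights=percentages, k=10_000)
counts = Counter(draws)

# Print the observed share of each value next to the value itself
for value in sample_list:
    print(value, round(counts[value] / len(draws), 3))
```

With a sample this large, the observed shares should land close to 0.10, 0.25, 0.50, and 0.15, although any single run will differ slightly.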
As mentioned above, we can define the weights sequence in the following two ways:
- Relative weights
- Cumulative weights
Relative weights to choose elements from the list with different probability
First, define the probability for each element. If you specify the probability using relative weights, the selections are made according to those relative weights. You can set relative weights using the weights parameter.
Example: Choose 5 elements from the list with different probability
    import random

    numberList = [111, 222, 333, 444, 555]
    print(random.choices(numberList, weights=(10, 20, 30, 40, 50), k=5))
    # Output [555, 222, 555, 222, 555]

- As you can see in the output, we received the item ‘555’ three times. We assigned it the highest weight, so it has the highest probability of being selected.
- The weights do not sum to 100 because they are relative weights, not percentages.
The following rule determines the weighted probability of selecting each element.
Probability = element_weight/ sum of all weights
In the above example, the probability of each element being selected is determined as follows.
The total weight is 10+20+30+40+50 = 150
List is [111, 222, 333, 444, 555]
It returns 111 with probability 0.07 (10/150)
It returns 222 with probability 0.13 (20/150)
It returns 333 with probability 0.20 (30/150)
It returns 444 with probability 0.27 (40/150)
It returns 555 with probability 0.33 (50/150)
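The rule above can be checked with a short sketch that computes each probability from the weights and compares it with the frequencies observed in a large sample. The sample size and variable names are assumptions made for this illustration.

```python
import random
from collections import Counter

number_list = [111, 222, 333, 444, 555]
weights = [10, 20, 30, 40, 50]

# Probability = element_weight / sum of all weights
total = sum(weights)
probabilities = [w / total for w in weights]

# Compare with observed frequencies from a large sample
k = 100_000  # sample size is an arbitrary choice for this sketch
counts = Counter(random.choices(number_list, weights=weights, k=k))

for item, p in zip(number_list, probabilities):
    print(item, "expected", round(p, 3), "observed", round(counts[item] / k, 3))
```

The observed values vary from run to run, but they should stay close to the expected ones.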
Cumulative weights to choose items from the list with different probability
To make selections according to the cumulative weights, use the cum_weights parameter.
Note: Python converts the relative weights to cumulative weights before making selections. So, I suggest you pass cumulative weights to save time and extra work.
The cumulative weight of each element is determined by using the following formula.
cum_weight = cumulative weight of previous element + own weight
For example, the relative weights [5, 10, 15, 20] are equivalent to the cumulative weights [5, 15, 30, 50].
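One way to build cumulative weights from relative ones is itertools.accumulate from the standard library. The sketch below is only an illustration: the letters population and the seed are hypothetical, and the equality check relies on the conversion that CPython performs internally.

```python
import random
from itertools import accumulate

relative_weights = [5, 10, 15, 20]

# Each cumulative weight is the running total of the relative weights so far
cumulative_weights = list(accumulate(relative_weights))
print(cumulative_weights)  # [5, 15, 30, 50]

letters = ["a", "b", "c", "d"]  # hypothetical population for this sketch

# With the same seed, both forms are expected to produce the same picks
random.seed(7)  # arbitrary seed
from_relative = random.choices(letters, weights=relative_weights, k=5)
random.seed(7)
from_cumulative = random.choices(letters, cum_weights=cumulative_weights, k=5)
print(from_relative == from_cumulative)
```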
Let’s see how to use cumulative weights to choose 4 elements from a list with different probability.
    import random

    nameList = ["Kelly", "Scott", "Emma", "Jon"]
    print(random.choices(nameList, cum_weights=(5, 15, 30, 50), k=4))
    # Output ['Jon', 'Kelly', 'Jon', 'Scott']
Choose a single element from a list with different probability

    import random

    names = ["Kelly", "Scott", "Emma", "Jon"]
    for i in range(3):
        # choices() always returns a list, so take its first item
        item = random.choices(names, cum_weights=(5, 15, 30, 50), k=1)[0]
        print("Iteration:", i, "Weighted Random choice is", item)

Output:

    Iteration: 0 Weighted Random choice is Jon
    Iteration: 1 Weighted Random choice is Kelly
    Iteration: 2 Weighted Random choice is Jon

Note: we got “Jon” 2 times out of 3 in the result because it has the highest probability of being selected.
Probability of getting 6 or more heads from 10 spins
Use cumulative weights to set the probability of getting heads to 0.61 and the probability of getting tails to 0.39 (1 – 0.61 = 0.39).

    import random

    # The heads and tails of a coin are given as the characters of a string
    coin = "HT"
    # Run 3 times to see how often we get 6 or more heads in 10 spins
    for i in range(3):
        print(random.choices(coin, cum_weights=(0.61, 1.00), k=10))

Output:

    ['H', 'H', 'H', 'H', 'H', 'H', 'H', 'T', 'H', 'T']
    ['H', 'T', 'H', 'H', 'H', 'T', 'H', 'H', 'H', 'H']
    ['H', 'T', 'T', 'T', 'H', 'T', 'H', 'H', 'H', 'H']
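Three runs only hint at the answer. As a rough sketch, the probability of 6 or more heads can be computed exactly from the binomial distribution and compared with a simulation based on random.choices(). The number of trials is an arbitrary choice for this illustration, and math.comb assumes Python 3.8 or later.

```python
import random
from math import comb

p_head = 0.61
spins = 10

# Exact probability from the binomial distribution:
# P(X >= 6) = sum over h = 6..10 of C(10, h) * p^h * (1 - p)^(10 - h)
exact = sum(
    comb(spins, h) * p_head**h * (1 - p_head) ** (spins - h)
    for h in range(6, spins + 1)
)

# Estimate the same probability by simulation
trials = 20_000  # arbitrary number of repetitions for this sketch
hits = 0
for _ in range(trials):
    result = random.choices("HT", cum_weights=(0.61, 1.00), k=spins)
    if result.count("H") >= 6:
        hits += 1

print("exact:", round(exact, 3))
print("simulated:", round(hits / trials, 3))
```

The simulated value changes between runs but should stay close to the exact one.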
Generate weighted random numbers
Given a range of integers, we want to generate six random numbers based on weights. We need to specify the probability/weight for each number to be selected. Let’s see how to generate random numbers from a given numerical distribution, where each number has a different probability.

    import random

    # Generate 6 random numbers from a given range with weighted probability.
    # These are relative weights; cumulative weights would have to be non-decreasing.
    numbers = random.choices(range(10, 40, 5), weights=(5, 15, 10, 25, 40, 65), k=6)
    print(numbers)
    # Output [35, 35, 15, 10, 35, 35]
Points to remember before implementing weighted random choices
- If you don’t specify relative or cumulative weights, random.choices() will choose elements with equal probability.
- The specified weights sequence must be of the same length as the population sequence.
- Don’t specify relative weights and cumulative weights at the same time, to avoid a TypeError (TypeError: Cannot specify both weights and cumulative weights).
- You can specify the weights or cum_weights as integers, floats, or fractions, but not as decimals.
- Weights must be non-negative.
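The points above can be tried out with a small sketch. The population and the weight values here are hypothetical, and the exact wording of the error messages may differ between Python versions.

```python
import random

population = ["a", "b", "c"]  # hypothetical sample data
errors = []  # names of the exceptions we run into

# Passing both weights and cum_weights raises TypeError
try:
    random.choices(population, weights=[1, 2, 3], cum_weights=[1, 3, 6])
except TypeError as error:
    errors.append(type(error).__name__)
    print("TypeError:", error)

# A weights sequence shorter than the population raises ValueError
try:
    random.choices(population, weights=[1, 2])
except ValueError as error:
    errors.append(type(error).__name__)
    print("ValueError:", error)

# Omitting both weight arguments gives every element the same chance
equal_picks = random.choices(population, k=5)
print(equal_picks)
```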
Numpy’s random.choice() to choose elements from the list with different probability
If you are using a Python version older than 3.6, you can use the NumPy library to make weighted random choices. Install NumPy using pip install numpy.
Using numpy.random.choice(), you can specify the probability distribution.
numpy.random.choice(a, size=None, replace=True, p=None)
a: The population from which you want to choose elements, for example, a list.
size: The number of elements you want to choose.
p: Used to specify the probability of each element being selected.
Note: Probabilities must sum to 1, i.e., when you specify probability weights for each element, the sum of all weights must be 1.
    import numpy as np

    numberList = [100, 200, 300, 400]
    # Choose elements with different probabilities
    sampleNumbers = np.random.choice(numberList, 4, p=[0.10, 0.20, 0.30, 0.40])
    print(sampleNumbers)
    # Output [300 200 300 300]
I want to hear from you. What do you think of this article? Or maybe I missed one of the ways to generate weighted random choices? Either way, let me know by leaving a comment below.