The goal of the numpy exercises is to serve as a reference as well as to get you to apply numpy beyond the basics. The questions are of 4 levels of difficulties with L1 being the easiest to L4 being the hardest.
Numpy Tutorial Part 2: Vital Functions for Data Analysis. Photo by Ana Justin Luebke.
If you want a quick refresher on numpy, the numpy basics and the advanced numpy tutorials might be what you are looking for.
1. Import numpy as np and see the version
Difficulty Level: L1
Q. Import numpy as `np` and print the version number.
Show Solution
<code>import numpy as np print(np.__version__) #> 1.13.3 </code>
You must import numpy
as np
for the rest of the codes in this exercise to work.
To install numpy its recommended to use the installation provided by anaconda.
2. How to create a 1D array?
Difficulty Level: L1
Q. Create a 1D array of numbers from 0 to 9
Desired output:
<code>#> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) </code>
Show Solution
<code>arr = np.arange(10) arr #> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) </code>
3. How to create a boolean array?
Difficulty Level: L1
Q. Create a 3×3 numpy array of all True’s
Show Solution
<code>np.full((3, 3), True, dtype=bool) #> array([[ True, True, True], #> [ True, True, True], #> [ True, True, True]], dtype=bool) # Alternate method: np.ones((3,3), dtype=bool) </code>
4. How to extract items that satisfy a given condition from 1D array?
Difficulty Level: L1
Q. Extract all odd numbers from arr
Input:
<code>arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])` </code>
Desired output:
<code>#> array([1, 3, 5, 7, 9]) </code>
Show Solution
<code># Input arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) # Solution arr[arr % 2 == 1] #> array([1, 3, 5, 7, 9]) </code>
5. How to replace items that satisfy a condition with another value in numpy array?
Difficulty Level: L1
Q. Replace all odd numbers in arr
with -1
Input:
<code>arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) </code>
Desired Output:
<code>#> array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1]) </code>
Show Solution
<code>arr[arr % 2 == 1] = -1 arr #> array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1]) </code>
6. How to replace items that satisfy a condition without affecting the original array?
Difficulty Level: L2
Q. Replace all odd numbers in arr with -1 without changing arr
Input:
<code>arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) </code>
Desired Output:
<code>out #> array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1]) arr #> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) </code>
Show Solution
<code>arr = np.arange(10) out = np.where(arr % 2 == 1, -1, arr) print(arr) out #> [0 1 2 3 4 5 6 7 8 9] array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1]) </code>
7. How to reshape an array?
Difficulty Level: L1
Q. Convert a 1D array to a 2D array with 2 rows
Input:
<code>np.arange(10) #> array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1]) </code>
Desired Output:
<code>#> array([[0, 1, 2, 3, 4], #> [5, 6, 7, 8, 9]]) </code>
Show Solution
<code>arr = np.arange(10) arr.reshape(2, -1) # Setting to -1 automatically decides the number of cols #> array([[0, 1, 2, 3, 4], #> [5, 6, 7, 8, 9]]) </code>
8. How to stack two arrays vertically?
Difficulty Level: L2
Q. Stack arrays a
and b
vertically
Input
<code>a = np.arange(10).reshape(2,-1) b = np.repeat(1, 10).reshape(2,-1) </code>
Desired Output:
<code>#> array([[0, 1, 2, 3, 4], #> [5, 6, 7, 8, 9], #> [1, 1, 1, 1, 1], #> [1, 1, 1, 1, 1]])` </code>
Show Solution
<code>a = np.arange(10).reshape(2,-1) b = np.repeat(1, 10).reshape(2,-1) # Answers # Method 1: np.concatenate([a, b], axis=0) # Method 2: np.vstack([a, b]) # Method 3: np.r_[a, b] #> array([[0, 1, 2, 3, 4], #> [5, 6, 7, 8, 9], #> [1, 1, 1, 1, 1], #> [1, 1, 1, 1, 1]]) </code>
9. How to stack two arrays horizontally?
Difficulty Level: L2
Q. Stack the arrays a
and b
horizontally.
Input
<code>a = np.arange(10).reshape(2,-1) b = np.repeat(1, 10).reshape(2,-1) </code>
Desired Output:
<code>#> array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1], #> [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]]) </code>
Show Solution
<code>a = np.arange(10).reshape(2,-1) b = np.repeat(1, 10).reshape(2,-1) # Answers # Method 1: np.concatenate([a, b], axis=1) # Method 2: np.hstack([a, b]) # Method 3: np.c_[a, b] #> array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1], #> [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]]) </code>
10. How to generate custom sequences in numpy without hardcoding?
Difficulty Level: L2
Q. Create the following pattern without hardcoding. Use only numpy functions and the below input array a
.
Input:
<code>a = np.array([1,2,3])` </code>
Desired Output:
<code>#> array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]) </code>
Show Solution
<code>np.r_[np.repeat(a, 3), np.tile(a, 3)] #> array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]) </code>
11. How to get the common items between two python numpy arrays?
Difficulty Level: L2
Q. Get the common items between a
and b
Input:
<code>a = np.array([1,2,3,2,3,4,3,4,5,6]) b = np.array([7,2,10,2,7,4,9,4,9,8])</code>
Desired Output:
<code>array([2, 4])</code>
Show Solution
<code>a = np.array([1,2,3,2,3,4,3,4,5,6]) b = np.array([7,2,10,2,7,4,9,4,9,8]) np.intersect1d(a,b) #> array([2, 4]) </code>
12. How to remove from one array those items that exist in another?
Difficulty Level: L2
Q. From array a
remove all items present in array b
Input:
<code>a = np.array([1,2,3,4,5]) b = np.array([5,6,7,8,9])</code>
Desired Output:
<code>array([1,2,3,4])</code>
Show Solution
<code>a = np.array([1,2,3,4,5]) b = np.array([5,6,7,8,9]) # From 'a' remove all of 'b' np.setdiff1d(a,b) #> array([1, 2, 3, 4]) </code>
13. How to get the positions where elements of two arrays match?
Difficulty Level: L2
Q. Get the positions where elements of a
and b
match
Input:
<code>a = np.array([1,2,3,2,3,4,3,4,5,6]) b = np.array([7,2,10,2,7,4,9,4,9,8]) </code>
Desired Output:
<code>#> (array([1, 3, 5, 7]),) </code>
Show Solution
<code>a = np.array([1,2,3,2,3,4,3,4,5,6]) b = np.array([7,2,10,2,7,4,9,4,9,8]) np.where(a == b) #> (array([1, 3, 5, 7]),) </code>
14. How to extract all numbers between a given range from a numpy array?
Difficulty Level: L2
Q. Get all items between 5 and 10 from a
.
Input:
<code>a = np.arange(15)</code>
Desired Output:
<code>(array([ 5, 6, 7, 8, 9, 10]),)</code>
Show Solution
<code>a = np.arange(15) # Method 1 index = np.where((a >= 5) & (a <= 10)) a[index] # Method 2: index = np.where(np.logical_and(a>=5, a<=10)) a[index] #> (array([ 5, 6, 7, 8, 9, 10]),) </code>
# Method 3: (thanks loganzk!) a[(a >= 5) & (a <= 10)] [/expand]
15. How to make a python function that handles scalars to work on numpy arrays?
Difficulty Level: L2
Q. Convert the function maxx
that works on two scalars, to work on two arrays.
Input:
<code>def maxx(x, y): """Get the maximum of two items""" if x >= y: return x else: return y maxx(1, 5) #> 5 </code>
Desired Output:
<code>a = np.array([5, 7, 9, 8, 6, 4, 5]) b = np.array([6, 3, 4, 8, 9, 7, 1]) pair_max(a, b) #> array([ 6., 7., 9., 8., 9., 7., 5.]) </code>
Show Solution
<code>def maxx(x, y): """Get the maximum of two items""" if x >= y: return x else: return y pair_max = np.vectorize(maxx, otypes=[float]) a = np.array([5, 7, 9, 8, 6, 4, 5]) b = np.array([6, 3, 4, 8, 9, 7, 1]) pair_max(a, b) #> array([ 6., 7., 9., 8., 9., 7., 5.]) </code>
16. How to swap two columns in a 2d numpy array?
Difficulty Level: L2
Q. Swap columns 1 and 2 in the array arr
.
<code>arr = np.arange(9).reshape(3,3) arr </code>
Show Solution
<code># Input arr = np.arange(9).reshape(3,3) arr # Solution arr[:, [1,0,2]] #> array([[1, 0, 2], #> [4, 3, 5], #> [7, 6, 8]]) </code>
17. How to swap two rows in a 2d numpy array?
Difficulty Level: L2
Q. Swap rows 1 and 2 in the array arr
:
<code>arr = np.arange(9).reshape(3,3) arr </code>
Show Solution
<code># Input arr = np.arange(9).reshape(3,3) # Solution arr[[1,0,2], :] #> array([[3, 4, 5], #> [0, 1, 2], #> [6, 7, 8]]) </code>
18. How to reverse the rows of a 2D array?
Difficulty Level: L2
Q. Reverse the rows of a 2D array arr
.
<code># Input arr = np.arange(9).reshape(3,3) </code>
Show Solution
<code># Input arr = np.arange(9).reshape(3,3) # Solution arr[::-1] </code>
<code>array([[6, 7, 8], [3, 4, 5], [0, 1, 2]]) </code>
19. How to reverse the columns of a 2D array?
Difficulty Level: L2
Q. Reverse the columns of a 2D array arr
.
<code># Input arr = np.arange(9).reshape(3,3) </code>
Show Solution
<code># Input arr = np.arange(9).reshape(3,3) # Solution arr[:, ::-1] #> array([[2, 1, 0], #> [5, 4, 3], #> [8, 7, 6]]) </code>
20. How to create a 2D array containing random floats between 5 and 10?
Difficulty Level: L2
Q. Create a 2D array of shape 5×3 to contain random decimal numbers between 5 and 10.
Show Solution
<code># Input arr = np.arange(9).reshape(3,3) # Solution Method 1: rand_arr = np.random.randint(low=5, high=10, size=(5,3)) + np.random.random((5,3)) # print(rand_arr) # Solution Method 2: rand_arr = np.random.uniform(5,10, size=(5,3)) print(rand_arr) #> [[ 8.50061025 9.10531502 6.85867783] #> [ 9.76262069 9.87717411 7.13466701] #> [ 7.48966403 8.33409158 6.16808631] #> [ 7.75010551 9.94535696 5.27373226] #> [ 8.0850361 5.56165518 7.31244004]] </code>
21. How to print only 3 decimal places in python numpy array?
Difficulty Level: L1
Q. Print or show only 3 decimal places of the numpy array rand_arr
.
Input:
<code>rand_arr = np.random.random((5,3)) </code>
Show Solution
<code># Input rand_arr = np.random.random((5,3)) # Create the random array rand_arr = np.random.random([5,3]) # Limit to 3 decimal places np.set_printoptions(precision=3) rand_arr[:4] #> array([[ 0.443, 0.109, 0.97 ], #> [ 0.388, 0.447, 0.191], #> [ 0.891, 0.474, 0.212], #> [ 0.609, 0.518, 0.403]]) </code>
22. How to pretty print a numpy array by suppressing the scientific notation (like 1e10)?
Difficulty Level: L1
Q. Pretty print rand_arr
by suppressing the scientific notation (like 1e10)
Input:
<code># Create the random array np.random.seed(100) rand_arr = np.random.random([3,3])/1e3 rand_arr #> array([[ 5.434049e-04, 2.783694e-04, 4.245176e-04], #> [ 8.447761e-04, 4.718856e-06, 1.215691e-04], #> [ 6.707491e-04, 8.258528e-04, 1.367066e-04]]) </code>
Desired Output:
<code>#> array([[ 0.000543, 0.000278, 0.000425], #> [ 0.000845, 0.000005, 0.000122], #> [ 0.000671, 0.000826, 0.000137]]) </code>
Show Solution
<code># Reset printoptions to default np.set_printoptions(suppress=False) # Create the random array np.random.seed(100) rand_arr = np.random.random([3,3])/1e3 rand_arr #> array([[ 5.434049e-04, 2.783694e-04, 4.245176e-04], #> [ 8.447761e-04, 4.718856e-06, 1.215691e-04], #> [ 6.707491e-04, 8.258528e-04, 1.367066e-04]]) </code>
<code>np.set_printoptions(suppress=True, precision=6) # precision is optional rand_arr #> array([[ 0.000543, 0.000278, 0.000425], #> [ 0.000845, 0.000005, 0.000122], #> [ 0.000671, 0.000826, 0.000137]]) </code>
23. How to limit the number of items printed in output of numpy array?
Difficulty Level: L1
Q. Limit the number of items printed in python numpy array a
to a maximum of 6 elements.
Input:
<code>a = np.arange(15) #> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]) </code>
Desired Output:
<code>#> array([ 0, 1, 2, ..., 12, 13, 14]) </code>
Show Solution
<code>np.set_printoptions(threshold=6) a = np.arange(15) a #> array([ 0, 1, 2, ..., 12, 13, 14]) </code>
24. How to print the full numpy array without truncating
Difficulty Level: L1
Q. Print the full numpy array a
without truncating.
Input:
<code>np.set_printoptions(threshold=6) a = np.arange(15) a #> array([ 0, 1, 2, ..., 12, 13, 14]) </code>
Desired Output:
<code>a #> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]) </code>
Show Solution
<code># Input np.set_printoptions(threshold=6) a = np.arange(15) # Solution np.set_printoptions(threshold=np.nan) a #> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]) </code>
25. How to import a dataset with numbers and texts keeping the text intact in python numpy?
Difficulty Level: L2
Q. Import the iris dataset keeping the text intact.
Show Solution
<code># Solution url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') # Print the first 3 rows iris[:3] #> array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'], #> [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'], #> [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object) </code>
Since we want to retain the species, a text field, I have set the dtype
to object
. Had I set dtype=None
, a 1d array of tuples would have been returned.
26. How to extract a particular column from 1D array of tuples?
Difficulty Level: L2
Q. Extract the text column species
from the 1D iris
imported in previous question.
Input:
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_1d = np.genfromtxt(url, delimiter=',', dtype=None) </code>
Show Solution
<code># Input: url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_1d = np.genfromtxt(url, delimiter=',', dtype=None) print(iris_1d.shape) # Solution: species = np.array([row[4] for row in iris_1d]) species[:5] #> (150,) #> array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', #> b'Iris-setosa'], #> dtype='|S18') </code>
27. How to convert a 1d array of tuples to a 2d numpy array?
Difficulty Level: L2
Q. Convert the 1D iris
to 2D array iris_2d
by omitting the species
text field.
Input:
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_1d = np.genfromtxt(url, delimiter=',', dtype=None) </code>
Show Solution
<code># Input: url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_1d = np.genfromtxt(url, delimiter=',', dtype=None) # Solution: # Method 1: Convert each row to a list and get the first 4 items iris_2d = np.array([row.tolist()[:4] for row in iris_1d]) iris_2d[:4] # Alt Method 2: Import only the first 4 columns from source url iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) iris_2d[:4] #> array([[ 5.1, 3.5, 1.4, 0.2], #> [ 4.9, 3. , 1.4, 0.2], #> [ 4.7, 3.2, 1.3, 0.2], #> [ 4.6, 3.1, 1.5, 0.2]]) </code>
28. How to compute the mean, median, standard deviation of a numpy array?
Difficulty: L1
Q. Find the mean, median, standard deviation of iris’s sepallength
(1st column)
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0]) # Solution mu, med, sd = np.mean(sepallength), np.median(sepallength), np.std(sepallength) print(mu, med, sd) #> 5.84333333333 5.8 0.825301291785 </code>
29. How to normalize an array so the values range exactly between 0 and 1?
Difficulty: L2
Q. Create a normalized form of iris
‘s sepallength
whose values range exactly between 0 and 1 so that the minimum has value 0 and maximum has value 1.
Input:
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0]) </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0]) # Solution Smax, Smin = sepallength.max(), sepallength.min() S = (sepallength - Smin)/(Smax - Smin) print(S) #> [ 0.222 0.167 0.111 0.083 0.194 0.306 0.083 0.194 0.028 0.167 #> 0.306 0.139 0.139 0. 0.417 0.389 0.306 0.222 0.389 0.222 #> 0.306 0.222 0.083 0.222 0.139 0.194 0.194 0.25 0.25 0.111 #> 0.139 0.306 0.25 0.333 0.167 0.194 0.333 0.167 0.028 0.222 #> 0.194 0.056 0.028 0.194 0.222 0.139 0.222 0.083 0.278 0.194 #> 0.75 0.583 0.722 0.333 0.611 0.389 0.556 0.167 0.639 0.25 #> 0.194 0.444 0.472 0.5 0.361 0.667 0.361 0.417 0.528 0.361 #> 0.444 0.5 0.556 0.5 0.583 0.639 0.694 0.667 0.472 0.389 #> 0.333 0.333 0.417 0.472 0.306 0.472 0.667 0.556 0.361 0.333 #> 0.333 0.5 0.417 0.194 0.361 0.389 0.389 0.528 0.222 0.389 #> 0.556 0.417 0.778 0.556 0.611 0.917 0.167 0.833 0.667 0.806 #> 0.611 0.583 0.694 0.389 0.417 0.583 0.611 0.944 0.944 0.472 #> 0.722 0.361 0.944 0.556 0.667 0.806 0.528 0.5 0.583 0.806 #> 0.861 1. 0.583 0.556 0.5 0.944 0.556 0.583 0.472 0.722 #> 0.667 0.722 0.417 0.694 0.667 0.667 0.556 0.611 0.528 0.444] </code>
30. How to compute the softmax score?
Difficulty Level: L3
Q. Compute the softmax score of sepallength
.
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0]) </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') sepallength = np.array([float(row[0]) for row in iris]) # Solution def softmax(x): """Compute softmax values for each sets of scores in x. https://stackoverflow.com/questions/34968722/how-to-implement-the-softmax-function-in-python""" e_x = np.exp(x - np.max(x)) return e_x / e_x.sum(axis=0) print(softmax(sepallength)) #> [ 0.002 0.002 0.001 0.001 0.002 0.003 0.001 0.002 0.001 0.002 #> 0.003 0.002 0.002 0.001 0.004 0.004 0.003 0.002 0.004 0.002 #> 0.003 0.002 0.001 0.002 0.002 0.002 0.002 0.002 0.002 0.001 #> 0.002 0.003 0.002 0.003 0.002 0.002 0.003 0.002 0.001 0.002 #> 0.002 0.001 0.001 0.002 0.002 0.002 0.002 0.001 0.003 0.002 #> 0.015 0.008 0.013 0.003 0.009 0.004 0.007 0.002 0.01 0.002 #> 0.002 0.005 0.005 0.006 0.004 0.011 0.004 0.004 0.007 0.004 #> 0.005 0.006 0.007 0.006 0.008 0.01 0.012 0.011 0.005 0.004 #> 0.003 0.003 0.004 0.005 0.003 0.005 0.011 0.007 0.004 0.003 #> 0.003 0.006 0.004 0.002 0.004 0.004 0.004 0.007 0.002 0.004 #> 0.007 0.004 0.016 0.007 0.009 0.027 0.002 0.02 0.011 0.018 #> 0.009 0.008 0.012 0.004 0.004 0.008 0.009 0.03 0.03 0.005 #> 0.013 0.004 0.03 0.007 0.011 0.018 0.007 0.006 0.008 0.018 #> 0.022 0.037 0.008 0.007 0.006 0.03 0.007 0.008 0.005 0.013 #> 0.011 0.013 0.004 0.012 0.011 0.011 0.007 0.009 0.007 0.005] </code>
31. How to find the percentile scores of a numpy array?
Difficulty Level: L1
Q. Find the 5th and 95th percentile of iris’s sepallength
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0]) </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0]) # Solution np.percentile(sepallength, q=[5, 95]) #> array([ 4.6 , 7.255]) </code>
32. How to insert values at random positions in an array?
Difficulty Level: L2
Q. Insert np.nan
values at 20 random positions in iris_2d
dataset
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='object') </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='object') # Method 1 i, j = np.where(iris_2d) # i, j contain the row numbers and column numbers of 600 elements of iris_x np.random.seed(100) iris_2d[np.random.choice((i), 20), np.random.choice((j), 20)] = np.nan # Method 2 np.random.seed(100) iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan # Print first 10 rows print(iris_2d[:10]) #> [[b'5.1' b'3.5' b'1.4' b'0.2' b'Iris-setosa'] #> [b'4.9' b'3.0' b'1.4' b'0.2' b'Iris-setosa'] #> [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa'] #> [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa'] #> [b'5.0' b'3.6' b'1.4' b'0.2' b'Iris-setosa'] #> [b'5.4' b'3.9' b'1.7' b'0.4' b'Iris-setosa'] #> [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa'] #> [b'5.0' b'3.4' b'1.5' b'0.2' b'Iris-setosa'] #> [b'4.4' nan b'1.4' b'0.2' b'Iris-setosa'] #> [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']] </code>
33. How to find the position of missing values in numpy array?
Difficulty Level: L2
Q. Find the number and position of missing values in iris_2d
‘s sepallength
(1st column)
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float') iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan # Solution print("Number of missing values: \n", np.isnan(iris_2d[:, 0]).sum()) print("Position of missing values: \n", np.where(np.isnan(iris_2d[:, 0]))) #> Number of missing values: #> 5 #> Position of missing values: #> (array([ 39, 88, 99, 130, 147]),) </code>
34. How to filter a numpy array based on two or more conditions?
Difficulty Level: L3
Q. Filter the rows of iris_2d
that has petallength (3rd column) > 1.5
and sepallength (1st column) < 5.0
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) # Solution condition = (iris_2d[:, 2] > 1.5) & (iris_2d[:, 0] < 5.0) iris_2d[condition] #> array([[ 4.8, 3.4, 1.6, 0.2], #> [ 4.8, 3.4, 1.9, 0.2], #> [ 4.7, 3.2, 1.6, 0.2], #> [ 4.8, 3.1, 1.6, 0.2], #> [ 4.9, 2.4, 3.3, 1. ], #> [ 4.9, 2.5, 4.5, 1.7]]) </code>
35. How to drop rows that contain a missing value from a numpy array?
Difficulty Level: L3:
Q. Select the rows of iris_2d that does not have any nan
value.
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan # Solution # No direct numpy function for this. any_nan_in_row = np.array([~np.any(np.isnan(row)) for row in iris_2d]) iris_2d[any_nan_in_row][:5] #> array([[ 4.9, 3. , 1.4, 0.2], #> [ 4.7, 3.2, 1.3, 0.2], #> [ 4.6, 3.1, 1.5, 0.2], #> [ 5. , 3.6, 1.4, 0.2], #> [ 5.4, 3.9, 1.7, 0.4]]) </code>
36. How to find the correlation between two columns of a numpy array?
Difficulty Level: L2
Q. Find the correlation between SepalLength(1st column) and PetalLength(3rd column) in iris_2d
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) # Solution 1 np.corrcoef(iris[:, 0], iris[:, 2])[0, 1] # Solution 2 from scipy.stats.stats import pearsonr corr, p_value = pearsonr(iris[:, 0], iris[:, 2]) print(corr) # Correlation coef indicates the degree of linear relationship between two numeric variables. # It can range between -1 to +1. # The p-value roughly indicates the probability of an uncorrelated system producing # datasets that have a correlation at least as extreme as the one computed. # The lower the p-value (<0.01), stronger is the significance of the relationship. # It is not an indicator of the strength. #> 0.871754157305 </code>
37. How to find if a given array has any null values?
Difficulty Level: L2
Q. Find out if iris_2d
has any missing values.
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) np.isnan(iris_2d).any() #> False </code>
38. How to replace all missing values with 0 in a numpy array?
Difficulty Level: L2
Q. Replace all ccurrences of nan
with 0 in numpy array
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]) iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan # Solution iris_2d[np.isnan(iris_2d)] = 0 iris_2d[:4] #> array([[ 5.1, 3.5, 1.4, 0. ], #> [ 4.9, 3. , 1.4, 0.2], #> [ 4.7, 3.2, 1.3, 0.2], #> [ 4.6, 3.1, 1.5, 0.2]]) </code>
39. How to find the count of unique values in a numpy array?
Difficulty Level: L2
Q. Find the unique values and the count of unique values in iris’s species
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') </code>
Show Solution
<code># Import iris keeping the text column intact url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') # Solution # Extract the species column as an array species = np.array([row.tolist()[4] for row in iris]) # Get the unique values and the counts np.unique(species, return_counts=True) #> (array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'], #> dtype='|S15'), array([50, 50, 50])) </code>
40. How to convert a numeric to a categorical (text) array?
Difficulty Level: L2
Q. Bin the petal length (3rd) column of iris_2d to form a text array, such that if petal length is:
- Less than 3 –> ‘small’
- 3-5 –> ‘medium’
- ‘>=5 –> ‘large’
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') # Bin petallength petal_length_bin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10]) # Map it to respective category label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan} petal_length_cat = [label_map[x] for x in petal_length_bin] # View petal_length_cat[:4] <#> ['small', 'small', 'small', 'small'] </code>
41. How to create a new column from existing columns of a numpy array?
Difficulty Level: L2
Q. Create a new column for volume in iris_2d, where volume is (pi x petallength x sepal_length^2)/3
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='object') # Solution # Compute volume sepallength = iris_2d[:, 0].astype('float') petallength = iris_2d[:, 2].astype('float') volume = (np.pi * petallength * (sepallength**2))/3 # Introduce new dimension to match iris_2d's volume = volume[:, np.newaxis] # Add the new column out = np.hstack([iris_2d, volume]) # View out[:4] #> array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa', 38.13265162927291], #> [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa', 35.200498485922445], #> [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa', 30.0723720777127], #> [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa', 33.238050274980004]], dtype=object) </code>
42. How to do probabilistic sampling in numpy?
Difficulty Level: L3
Q. Randomly sample iris
‘s species
such that setose
is twice the number of versicolor
and virginica
<code># Import iris keeping the text column intact url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') </code>
Show Solution
<code># Import iris keeping the text column intact url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') # Solution # Get the species column species = iris[:, 4] # Approach 1: Generate Probablistically np.random.seed(100) a = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']) species_out = np.random.choice(a, 150, p=[0.5, 0.25, 0.25]) # Approach 2: Probablistic Sampling (preferred) np.random.seed(100) probs = np.r_[np.linspace(0, 0.500, num=50), np.linspace(0.501, .750, num=50), np.linspace(.751, 1.0, num=50)] index = np.searchsorted(probs, np.random.random(150)) species_out = species[index] print(np.unique(species_out, return_counts=True)) #> (array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'], dtype=object), array([77, 37, 36])) </code>
Approach 2 is preferred because it creates an index variable that can be used to sample 2d tabular data.
43. How to get the second largest value of an array when grouped by another array?
Difficulty Level: L2
Q. What is the value of second longest petallength
of species setosa
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') </code>
Show Solution
<code># Import iris keeping the text column intact url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') # Solution # Get the species and petal length columns petal_len_setosa = iris[iris[:, 4] == b'Iris-setosa', [2]].astype('float') # Get the second last value np.unique(np.sort(petal_len_setosa))[-2] #> 1.7</code>
44. How to sort a 2D array by a column
Difficulty Level: L2
Q. Sort the iris dataset based on sepallength
column.
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') </code>
Show Solution
<code># Sort by column position 0: SepalLength print(iris[iris[:,0].argsort()][:20]) #> [[b'4.3' b'3.0' b'1.1' b'0.1' b'Iris-setosa'] #> [b'4.4' b'3.2' b'1.3' b'0.2' b'Iris-setosa'] #> [b'4.4' b'3.0' b'1.3' b'0.2' b'Iris-setosa'] #> [b'4.4' b'2.9' b'1.4' b'0.2' b'Iris-setosa'] #> [b'4.5' b'2.3' b'1.3' b'0.3' b'Iris-setosa'] #> [b'4.6' b'3.6' b'1.0' b'0.2' b'Iris-setosa'] #> [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa'] #> [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa'] #> [b'4.6' b'3.2' b'1.4' b'0.2' b'Iris-setosa'] #> [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa'] #> [b'4.7' b'3.2' b'1.6' b'0.2' b'Iris-setosa'] #> [b'4.8' b'3.0' b'1.4' b'0.1' b'Iris-setosa'] #> [b'4.8' b'3.0' b'1.4' b'0.3' b'Iris-setosa'] #> [b'4.8' b'3.4' b'1.9' b'0.2' b'Iris-setosa'] #> [b'4.8' b'3.4' b'1.6' b'0.2' b'Iris-setosa'] #> [b'4.8' b'3.1' b'1.6' b'0.2' b'Iris-setosa'] #> [b'4.9' b'2.4' b'3.3' b'1.0' b'Iris-versicolor'] #> [b'4.9' b'2.5' b'4.5' b'1.7' b'Iris-virginica'] #> [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa'] #> [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']] </code>
45. How to find the most frequent value in a numpy array?
Difficulty Level: L1
Q. Find the most frequent value of petal length (3rd column) in iris dataset.
Input:
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') </code>
Show Solution
<code># Input: url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') # Solution: vals, counts = np.unique(iris[:, 3], return_counts=True) print(vals[np.argmax(counts)]) #> b'0.2' </code>
46. How to find the position of the first occurrence of a value greater than a given value?
Difficulty Level: L2
Q. Find the position of the first occurrence of a value greater than 1.0 in petalwidth 4th column of iris dataset.
<code># Input: url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') </code>
Show Solution
<code># Input: url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') # Solution: print(np.argmax(iris[:, 3].astype('float') > 1.0)) #> 50 </code>
47. How to replace all values greater than a given value to a given cutoff?
Difficulty Level: L2
Q. From the array a
, replace all values greater than 30 to 30 and less than 10 to 10.
Input:
<code>np.random.seed(100) np.random.uniform(1,50, 20) </code>
Show Solution
<code># Input np.set_printoptions(precision=2) np.random.seed(100) a = np.random.uniform(1,50, 20) # Solution 1: Using np.clip np.clip(a, a_min=10, a_max=30) # Solution 2: Using np.where print(np.where(a < 10, 10, np.where(a > 30, 30, a))) #> [ 27.63 14.64 21.8 30. 10. 10. 30. 30. 10. 29.18 30. #> 11.25 10.08 10. 11.77 30. 30. 10. 30. 14.43] </code>
48. How to get the positions of top n
values from a numpy array?
Difficulty Level: L2
Q. Get the positions of top 5 maximum values in a given array a
.
<code>np.random.seed(100) a = np.random.uniform(1,50, 20) </code>
Show Solution
<code># Input np.random.seed(100) a = np.random.uniform(1,50, 20) # Solution: print(a[a.argsort()][-5:]) #> [ 41. 41.47 42.39 44.67 48.95] # Method 2: print(np.sort(a)[-5:]) # Mthod 3: np.partition(a, kth=-5)[-5:] </code>
49. How to compute the row wise counts of all possible values in an array?
Difficulty Level: L4
Q. Compute the counts of unique values row-wise.
Input:
<code>np.random.seed(100) arr = np.random.randint(1,11,size=(6, 10)) arr > array([[ 9, 9, 4, 8, 8, 1, 5, 3, 6, 3], > [ 3, 3, 2, 1, 9, 5, 1, 10, 7, 3], > [ 5, 2, 6, 4, 5, 5, 4, 8, 2, 2], > [ 8, 8, 1, 3, 10, 10, 4, 3, 6, 9], > [ 2, 1, 8, 7, 3, 1, 9, 3, 6, 2], > [ 9, 2, 6, 5, 3, 9, 4, 6, 1, 10]]) </code>
Desired Output:
<code>> [[1, 0, 2, 1, 1, 1, 0, 2, 2, 0], > [2, 1, 3, 0, 1, 0, 1, 0, 1, 1], > [0, 3, 0, 2, 3, 1, 0, 1, 0, 0], > [1, 0, 2, 1, 0, 1, 0, 2, 1, 2], > [2, 2, 2, 0, 0, 1, 1, 1, 1, 0], > [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]] </code>
Output contains 10 columns representing numbers from 1 to 10. The values are the counts of the numbers in the respective rows. For example, Cell(0,2) has the value 2, which means, the number 3 occurs exactly 2 times in the 1st row.
Show Solution
<code># Input: np.random.seed(100) arr = np.random.randint(1,11,size=(6, 10)) arr #> array([[ 9, 9, 4, 8, 8, 1, 5, 3, 6, 3], #> [ 3, 3, 2, 1, 9, 5, 1, 10, 7, 3], #> [ 5, 2, 6, 4, 5, 5, 4, 8, 2, 2], #> [ 8, 8, 1, 3, 10, 10, 4, 3, 6, 9], #> [ 2, 1, 8, 7, 3, 1, 9, 3, 6, 2], #> [ 9, 2, 6, 5, 3, 9, 4, 6, 1, 10]]) </code>
<code># Solution def counts_of_all_values_rowwise(arr2d): # Unique values and its counts row wise num_counts_array = [np.unique(row, return_counts=True) for row in arr2d] # Counts of all values row wise return([[int(b[a==i]) if i in a else 0 for i in np.unique(arr2d)] for a, b in num_counts_array]) # Print print(np.arange(1,11)) counts_of_all_values_rowwise(arr) #> [ 1 2 3 4 5 6 7 8 9 10] #> [[1, 0, 2, 1, 1, 1, 0, 2, 2, 0], #> [2, 1, 3, 0, 1, 0, 1, 0, 1, 1], #> [0, 3, 0, 2, 3, 1, 0, 1, 0, 0], #> [1, 0, 2, 1, 0, 1, 0, 2, 1, 2], #> [2, 2, 2, 0, 0, 1, 1, 1, 1, 0], #> [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]] </code>
<code># Example 2: arr = np.array([np.array(list('bill clinton')), np.array(list('narendramodi')), np.array(list('jjayalalitha'))]) print(np.unique(arr)) counts_of_all_values_rowwise(arr) #> [' ' 'a' 'b' 'c' 'd' 'e' 'h' 'i' 'j' 'l' 'm' 'n' 'o' 'r' 't' 'y'] #> [[1, 0, 1, 1, 0, 0, 0, 2, 0, 3, 0, 2, 1, 0, 1, 0], #> [0, 2, 0, 0, 2, 1, 0, 1, 0, 0, 1, 2, 1, 2, 0, 0], #> [0, 4, 0, 0, 0, 0, 1, 1, 2, 2, 0, 0, 0, 0, 1, 1]] </code>
50. How to convert an array of arrays into a flat 1d array?
Difficulty Level: 2
Q. Convert array_of_arrays
into a flat linear 1d array.
Input:
<code># Input: arr1 = np.arange(3) arr2 = np.arange(3,7) arr3 = np.arange(7,10) array_of_arrays = np.array([arr1, arr2, arr3]) array_of_arrays #> array([array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])], dtype=object) </code>
Desired Output:
<code>#> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) </code>
Show Solution
<code> # Input: arr1 = np.arange(3) arr2 = np.arange(3,7) arr3 = np.arange(7,10) array_of_arrays = np.array([arr1, arr2, arr3]) print('array_of_arrays: ', array_of_arrays) # Solution 1 arr_2d = np.array([a for arr in array_of_arrays for a in arr]) # Solution 2: arr_2d = np.concatenate(array_of_arrays) print(arr_2d) #> array_of_arrays: [array([0, 1, 2]) array([3, 4, 5, 6]) array([7, 8, 9])] #> [0 1 2 3 4 5 6 7 8 9] </code>
51. How to generate one-hot encodings for an array in numpy?
Difficulty Level L4
Q. Compute the one-hot encodings (dummy binary variables for each unique value in the array)
Input:
<code>np.random.seed(101) arr = np.random.randint(1,4, size=6) arr #> array([2, 3, 2, 2, 2, 1]) </code>
Output:
<code>#> array([[ 0., 1., 0.], #> [ 0., 0., 1.], #> [ 0., 1., 0.], #> [ 0., 1., 0.], #> [ 0., 1., 0.], #> [ 1., 0., 0.]]) </code>
Show Solution
<code># Input: np.random.seed(101) arr = np.random.randint(1,4, size=6) arr #> array([2, 3, 2, 2, 2, 1]) # Solution: def one_hot_encodings(arr): uniqs = np.unique(arr) out = np.zeros((arr.shape[0], uniqs.shape[0])) for i, k in enumerate(arr): out[i, k-1] = 1 return out one_hot_encodings(arr) #> array([[ 0., 1., 0.], #> [ 0., 0., 1.], #> [ 0., 1., 0.], #> [ 0., 1., 0.], #> [ 0., 1., 0.], #> [ 1., 0., 0.]]) # Method 2: (arr[:, None] == np.unique(arr)).view(np.int8) </code>
52. How to create row numbers grouped by a categorical variable?
Difficulty Level: L3
Q. Create row numbers grouped by a categorical variable. Use the following sample from iris
species
as input.
Input:
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4) species_small = np.sort(np.random.choice(species, size=20)) species_small #> array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', #> 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', #> 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', #> 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', #> 'Iris-virginica', 'Iris-virginica', 'Iris-virginica', #> 'Iris-virginica', 'Iris-virginica', 'Iris-virginica'], #> dtype='<U15') </code>
Desired Output:
<code>#> [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6, 7] </code>
Show Solution
<code># Input: url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4) np.random.seed(100) species_small = np.sort(np.random.choice(species, size=20)) species_small #> array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', #> 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', #> 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', #> 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', #> 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', #> 'Iris-virginica', 'Iris-virginica', 'Iris-virginica', #> 'Iris-virginica'], #> dtype='<U15') </code>
<code>print([i for val in np.unique(species_small) for i, grp in enumerate(species_small[species_small==val])]) </code>
<code>[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5] </code>
53. How to create groud ids based on a given categorical variable?
Difficulty Level: L4
Q. Create group ids based on a given categorical variable. Use the following sample from iris species
as input.
Input:
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4) species_small = np.sort(np.random.choice(species, size=20)) species_small #> array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', #> 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', #> 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', #> 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', #> 'Iris-virginica', 'Iris-virginica', 'Iris-virginica', #> 'Iris-virginica', 'Iris-virginica', 'Iris-virginica'], #> dtype='<U15') </code>
Desired Output:
<code>#> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]</code>
Show Solution
<code># Input: url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4) np.random.seed(100) species_small = np.sort(np.random.choice(species, size=20)) species_small #> array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', #> 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', #> 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', #> 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', #> 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', #> 'Iris-virginica', 'Iris-virginica', 'Iris-virginica', #> 'Iris-virginica'], #> dtype='<U15') </code>
<code># Solution: output = [np.argwhere(np.unique(species_small) == s).tolist()[0][0] for val in np.unique(species_small) for s in species_small[species_small==val]] # Solution: For Loop version output = [] uniqs = np.unique(species_small) for val in uniqs: # uniq values in group for s in species_small[species_small==val]: # each element in group groupid = np.argwhere(uniqs == s).tolist()[0][0] # groupid output.append(groupid) print(output) #> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2] </code>
54. How to rank items in an array using numpy?
Difficulty Level: L2
Q. Create the ranks for the given numeric array a
.
Input:
<code>np.random.seed(10) a = np.random.randint(20, size=10) print(a) #> [ 9 4 15 0 17 16 17 8 9 0] </code>
Desired output:
<code>[4 2 6 0 8 7 9 3 5 1]</code>
Show Solution
<code>np.random.seed(10) a = np.random.randint(20, size=10) print('Array: ', a) # Solution print(a.argsort().argsort()) print('Array: ', a) #> Array: [ 9 4 15 0 17 16 17 8 9 0] #> [4 2 6 0 8 7 9 3 5 1] #> Array: [ 9 4 15 0 17 16 17 8 9 0] </code>
55. How to rank items in a multidimensional array using numpy?
Difficulty Level: L3
Q. Create a rank array of the same shape as a given numeric array a
.
Input:
<code>np.random.seed(10) a = np.random.randint(20, size=[2,5]) print(a) #> [[ 9 4 15 0 17] #> [16 17 8 9 0]] </code>
Desired output:
<code>#> [[4 2 6 0 8] #> [7 9 3 5 1]] </code>
Show Solution
<code># Input: np.random.seed(10) a = np.random.randint(20, size=[2,5]) print(a) # Solution print(a.ravel().argsort().argsort().reshape(a.shape)) #> [[ 9 4 15 0 17] #> [16 17 8 9 0]] #> [[4 2 6 0 8] #> [7 9 3 5 1]] </code>
56. How to find the maximum value in each row of a numpy array 2d?
DifficultyLevel: L2
Q. Compute the maximum for each row in the given array.
<code>np.random.seed(100) a = np.random.randint(1,10, [5,3]) a #> array([[9, 9, 4], #> [8, 8, 1], #> [5, 3, 6], #> [3, 3, 3], #> [2, 1, 9]]) </code>
Show Solution
<code># Input np.random.seed(100) a = np.random.randint(1,10, [5,3]) a # Solution 1 np.amax(a, axis=1) # Solution 2 np.apply_along_axis(np.max, arr=a, axis=1) #> array([9, 8, 6, 3, 9]) </code>
57. How to compute the min-by-max for each row for a numpy array 2d?
DifficultyLevel: L3
Q. Compute the min-by-max for each row for given 2d numpy array.
<code>np.random.seed(100) a = np.random.randint(1,10, [5,3]) a #> array([[9, 9, 4], #> [8, 8, 1], #> [5, 3, 6], #> [3, 3, 3], #> [2, 1, 9]]) </code>
Show Solution
<code># Input np.random.seed(100) a = np.random.randint(1,10, [5,3]) a # Solution np.apply_along_axis(lambda x: np.min(x)/np.max(x), arr=a, axis=1) #> array([ 0.44444444, 0.125 , 0.5 , 1. , 0.11111111]) </code>
58. How to find the duplicate records in a numpy array?
Difficulty Level: L3
Q. Find the duplicate entries (2nd occurrence onwards) in the given numpy array and mark them as True
. First time occurrences should be False
.
<code># Input np.random.seed(100) a = np.random.randint(0, 5, 10) print('Array: ', a) #> Array: [0 0 3 0 2 4 2 2 2 2]</code>
Desired Output:
<code>#> [False True False True False False True True True True]</code>
Show Solution
<code># Input np.random.seed(100) a = np.random.randint(0, 5, 10) ## Solution # There is no direct function to do this as of 1.13.3 # Create an all True array out = np.full(a.shape[0], True) # Find the index positions of unique elements unique_positions = np.unique(a, return_index=True)[1] # Mark those positions as False out[unique_positions] = False print(out) #> [False True False True False False True True True True] </code>
59. How to find the grouped mean in numpy?
Difficulty Level L3
Q. Find the mean of a numeric column grouped by a categorical column in a 2D numpy array
Input:
<code>url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') </code>
Desired Solution:
<code>#> [[b'Iris-setosa', 3.418], #> [b'Iris-versicolor', 2.770], #> [b'Iris-virginica', 2.974]] </code>
Show Solution
<code># Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') # Solution # No direct way to implement this. Just a version of a workaround. numeric_column = iris[:, 1].astype('float') # sepalwidth grouping_column = iris[:, 4] # species # List comprehension version [[group_val, numeric_column[grouping_column==group_val].mean()] for group_val in np.unique(grouping_column)] # For Loop version output = [] for group_val in np.unique(grouping_column): output.append([group_val, numeric_column[grouping_column==group_val].mean()]) output #> [[b'Iris-setosa', 3.418], #> [b'Iris-versicolor', 2.770], #> [b'Iris-virginica', 2.974]] </code>
60. How to convert a PIL image to numpy array?
Difficulty Level: L3
Q. Import the image from the following URL and convert it to a numpy array.
URL = ‘https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg’
Show Solution
<code>from io import BytesIO from PIL import Image import PIL, requests # Import image from URL URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg' response = requests.get(URL) # Read it as Image I = Image.open(BytesIO(response.content)) # Optionally resize I = I.resize([150,150]) # Convert to numpy array arr = np.asarray(I) # Optionaly Convert it back to an image and show im = PIL.Image.fromarray(np.uint8(arr)) Image.Image.show(im) </code>
61. How to drop all missing values from a numpy array?
Difficulty Level: L2
Q. Drop all nan
values from a 1D numpy array
Input:
np.array([1,2,3,np.nan,5,6,7,np.nan])
Desired Output:
<code>array([ 1., 2., 3., 5., 6., 7.])</code>
Show Solution
<code>a = np.array([1,2,3,np.nan,5,6,7,np.nan]) a[~np.isnan(a)] #> array([ 1., 2., 3., 5., 6., 7.]) </code>
62. How to compute the euclidean distance between two arrays?
Difficulty Level: L3
Q. Compute the euclidean distance between two arrays a
and b
.
Input:
<code>a = np.array([1,2,3,4,5]) b = np.array([4,5,6,7,8]) </code>
Show Solution
<code># Input a = np.array([1,2,3,4,5]) b = np.array([4,5,6,7,8]) # Solution dist = np.linalg.norm(a-b) dist #> 6.7082039324993694 </code>
63. How to find all the local maxima (or peaks) in a 1d array?
Difficulty Level: L4
Q. Find all the peaks in a 1D numpy array a
. Peaks are points surrounded by smaller values on both sides.
Input:
a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
Desired Output:
<code>#> array([2, 5])</code>
where, 2 and 5 are the positions of peak values 7 and 6.
Show Solution
<code>a = np.array([1, 3, 7, 1, 2, 6, 0, 1]) doublediff = np.diff(np.sign(np.diff(a))) peak_locations = np.where(doublediff == -2)[0] + 1 peak_locations #> array([2, 5]) </code>
64. How to subtract a 1d array from a 2d array, where each item of 1d array subtracts from respective row?
Difficulty Level: L2
Q. Subtract the 1d array b_1d
from the 2d array a_2d
, such that each item of b_1d
subtracts from respective row of a_2d
.
<code>a_2d = np.array([[3,3,3],[4,4,4],[5,5,5]]) b_1d = np.array([1,1,1] </code>
Desired Output:
<code>#> [[2 2 2] #> [2 2 2] #> [2 2 2]] </code>
Show Solution
<code># Input a_2d = np.array([[3,3,3],[4,4,4],[5,5,5]]) b_1d = np.array([1,2,3]) # Solution print(a_2d - b_1d[:,None]) #> [[2 2 2] #> [2 2 2] #> [2 2 2]] </code>
65. How to find the index of n’th repetition of an item in an array
Difficulty Level L2
Q. Find the index of 5th repetition of number 1 in x
.
<code>x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])</code>
Show Solution
<code>x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2]) n = 5 # Solution 1: List comprehension [i for i, v in enumerate(x) if v == 1][n-1] # Solution 2: Numpy version np.where(x == 1)[0][n-1] #> 8 </code>
66. How to convert numpy’s datetime64
object to datetime’s datetime
object?
Difficulty Level: L2
Q. Convert numpy’s datetime64
object to datetime’s datetime
object
<code># Input: a numpy datetime64 object dt64 = np.datetime64('2018-02-25 22:10:10')</code>
Show Solution
<code># Input: a numpy datetime64 object dt64 = np.datetime64('2018-02-25 22:10:10') # Solution from datetime import datetime dt64.tolist() # or dt64.astype(datetime) #> datetime.datetime(2018, 2, 25, 22, 10, 10) </code>
67. How to compute the moving average of a numpy array?
Difficulty Level: L3
Q. Compute the moving average of window size 3, for the given 1D array.
Input:
<code>np.random.seed(100) Z = np.random.randint(10, size=10) </code>
Show Solution
<code># Solution # Source: https://stackoverflow.com/questions/14313510/how-to-calculate-moving-average-using-numpy def moving_average(a, n=3) : ret = np.cumsum(a, dtype=float) ret[n:] = ret[n:] - ret[:-n] return ret[n - 1:] / n np.random.seed(100) Z = np.random.randint(10, size=10) print('array: ', Z) print('moving average: ', moving_average(Z, n=3).round(2)) #> array: [8 8 3 7 7 0 4 2 5 2] #> moving average: [ 6.33 6. 5.67 4.67 3.67 2. 3.67 3. ] </code>
68. How to create a numpy array sequence given only the starting point, length and the step?
Difficulty Level: L2
Q. Create a numpy array of length 10, starting from 5 and has a step of 3 between consecutive numbers
Show Solution
<code>length = 10 start = 5 step = 3 def seq(start, length, step): end = start + (step*length) return np.arange(start, end, step) seq(start, length, step) #> array([ 5, 8, 11, 14, 17, 20, 23, 26, 29, 32]) </code>
69. How to fill in missing dates in an irregular series of numpy dates?
Difficulty Level: L3
Q. Given an array of a non-continuous sequence of dates. Make it a continuous sequence of dates, by filling in the missing dates.
Input:
<code># Input dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2) print(dates) #> ['2018-02-01' '2018-02-03' '2018-02-05' '2018-02-07' '2018-02-09' #> '2018-02-11' '2018-02-13' '2018-02-15' '2018-02-17' '2018-02-19' #> '2018-02-21' '2018-02-23'] </code>
Show Solution
<code># Input dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2) print(dates) # Solution --------------- filled_in = np.array([np.arange(date, (date+d)) for date, d in zip(dates, np.diff(dates))]).reshape(-1) # add the last day output = np.hstack([filled_in, dates[-1]]) output # For loop version ------- out = [] for date, d in zip(dates, np.diff(dates)): out.append(np.arange(date, (date+d))) filled_in = np.array(out).reshape(-1) # add the last day output = np.hstack([filled_in, dates[-1]]) output #> ['2018-02-01' '2018-02-03' '2018-02-05' '2018-02-07' '2018-02-09' #> '2018-02-11' '2018-02-13' '2018-02-15' '2018-02-17' '2018-02-19' #> '2018-02-21' '2018-02-23'] #> array(['2018-02-01', '2018-02-02', '2018-02-03', '2018-02-04', #> '2018-02-05', '2018-02-06', '2018-02-07', '2018-02-08', #> '2018-02-09', '2018-02-10', '2018-02-11', '2018-02-12', #> '2018-02-13', '2018-02-14', '2018-02-15', '2018-02-16', #> '2018-02-17', '2018-02-18', '2018-02-19', '2018-02-20', #> '2018-02-21', '2018-02-22', '2018-02-23'], dtype='datetime64[D]') </code>
70. How to create strides from a given 1D array?
Difficulty Level: L4
Q. From the given 1d array arr
, generate a 2d matrix using strides, with a window length of 4 and strides of 2, like [[0,1,2,3], [2,3,4,5], [4,5,6,7]..]
Input:
<code>arr = np.arange(15) arr #> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]) </code>
Desired Output:
<code>#> [[ 0 1 2 3] #> [ 2 3 4 5] #> [ 4 5 6 7] #> [ 6 7 8 9] #> [ 8 9 10 11] #> [10 11 12 13]] </code>
Show Solution
<code>def gen_strides(arr, stride_len=5, window_len=5): n_strides = ((a.size-window_len)//stride_len) + 1 # return np.array([a[s:(s+window_len)] for s in np.arange(0, a.size, stride_len)[:n_strides]]) return np.array([a[s:(s+window_len)] for s in np.arange(0, n_strides*stride_len, stride_len)]) print(gen_strides(np.arange(15), stride_len=2, window_len=4)) #> [[ 0 1 2 3] #> [ 2 3 4 5] #> [ 4 5 6 7] #> [ 6 7 8 9] #> [ 8 9 10 11] #> [10 11 12 13]] </code>
#>
To be continued . .
转载自https://juejin.im/entry/5a9619fb5188257a7262e526?utm_source=gold_browser_extension