36 Handy Numpy Features

Tip

Make sure you run the code cell below before any of the others. This will import the numpy package, and it will be remembered until you move to a different page or refresh this page.

36.1 Empty arrays

To create a new empty NumPy array, we can use the empty() function and specify the shape (length of each dimension) of the array we want.

Note

“But wait!” you exclaim, “That doesn’t look empty to me!”. The way it actually works is it creates a new array and dumps in some garbage values that are very close to 0 (note the e to the minus at the end of the numbers) based on stuff that’s in memory at the time.

Just think of it as empty. If you actually want 0s, we use something else (which you’ll see in a moment). But empty() is a little more computationally efficient.

36.2 Creating an Array of Evenly-spaced Values

NumPy has a neat function called arange() which will create a NumPy array with evenly spaced values at an interval of our choosing.

Example : let’s say we want to create a NumPy array with first value 0 and counting up in intervals of 5 up to 100.

36.3 Creating an Array of Zeros

numpy.zeros() is a function that creates a NumPy array of given dimensions, filled with zeros. This can be useful if we want to create a placeholder array that we will fill in / update with data as we go.

eg an array to hold data for five departments, each with 4 groups, each with 3 people.

36.4 Shape and ndim

The shape attribute stores the length (number of elements) in each dimension.

The ndim attribute stores the number of dimensions in the array.

36.5 Mathematical Operations

One of the key advantages of NumPy arrays is that you can apply mathematical operations at scale to large numbers of values much more efficiently than you would be able to otherwise.

It’s also really easy to do! Let’s imagine we have an array, and we want to double every single value in it.

We can also combine mathematical operations with slicing to only apply our operation to certain bits of the array.

This says “for each of the second lists in both groups, replace them with the values doubled”.

We could grab these out as a separate array if we wanted.

36.6 Statistical Operations

We can also perform statistical operations on NumPy arrays easily and efficiently. We can find a single mean value across the whole array (the same principle works for slices) :

Or find means across dimensions of the array :

axis = 0 means give the mean across the rows - ie the two lists. So, the mean of element 0 in both, mean of element 1 in both etc

axis = 1 means give the mean across the columns - ie the mean of each row (list). So, the mean of list 0, the mean of list 1 etc

Axis values of 2+ are used for third dimensional columns+ (don’t worry if your head’s starting to hurt, that’s normal!)

36.7 Dot Product

The dot product of two arrays of identical length multiplies the nth element of array a with the nth element of array b, and adds all of these multiplications together to give a single answer.

Example :

a = [1, 2, 3]
b = [2, 4, 6]

a · b = (1 x 2) + (2 x 4) + (3 x 6) = 2 + 8 + 18 = 28

To do this in NumPy, we use the dot() function.

a = [1, 2, 3]
b = [2, 4, 6]

dp = np.dot(a,b)

print(dp)

Tip

Dot Product calculations are useful when we want to weight one set of values by another set of values.

For example, in a geographic model, we may want to weight travel times by the number of people coming from a location (so that we don’t treat 1 person having to travel a longer distance the same way as 100 people having to do this).

36.8 Removing Duplicate Data

The unique() function allows us to easily remove duplicate values from a NumPy array.

c = np.array([[1,2,3,4,5], [5,6,7,8,9]])
c_unique = np.unique(c)

print(c_unique)

We can also use it to remove duplicate rows or columns.

d = np.array([[1,2,3], [4,5,6],[1,2,3]])
d_unique = np.unique(d, axis=0)

print(d_unique)

e = np.array([[1,2,1], [2,4,2],[3,5,3]])
e_unique = np.unique(e, axis=0)

print(e_unique)