NumPy is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrices.
NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely. NumPy stands for Numerical Python.
Why Use NumPy?
In Python, lists serve the purpose of arrays, but they are slow to process. NumPy aims to provide an array object that is up to 50x faster than traditional Python lists. The array object in NumPy is called ndarray, and it comes with many supporting functions that make it easy to work with. Arrays are used very frequently in data science, where speed and resources are very important.
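The speed difference can be measured rather than taken on faith. The following is a minimal sketch, assuming an arbitrary array size of 200,000 elements and a simple squaring workload (both chosen only for illustration). The actual speed-up depends on the machine, the operation, and the array size, so no specific figure is claimed here.

```python
import timeit

import numpy as np

n = 200_000  # arbitrary size, chosen only for illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Square every element: a list comprehension vs. one array expression
list_time = timeit.timeit(lambda: [v * v for v in py_list], number=5)
array_time = timeit.timeit(lambda: np_arr * np_arr, number=5)

print(f"list:     {list_time:.4f} s")
print(f"ndarray:  {array_time:.4f} s")
print(f"speed-up: {list_time / array_time:.1f}x")
```

Running this a few times with different values of n gives a feel for how the gap changes with the amount of data.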
Why is NumPy Faster Than Lists?
NumPy arrays are stored in one contiguous block of memory, unlike lists, so processes can access and manipulate them very efficiently. This behavior is called locality of reference in computer science, and it is the main reason why NumPy is faster than lists. NumPy is also optimized to work with the latest CPU architectures.
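The memory layout described above can be inspected directly on an array. This is a small sketch, assuming an explicit 64-bit integer dtype so that the sizes are the same on every platform. It uses the ndarray attributes flags, itemsize, strides, and nbytes.

```python
import numpy as np

# Fix the dtype so each element takes exactly 8 bytes
arr = np.array([1, 2, 3, 4, 5], dtype=np.int64)

print(arr.flags["C_CONTIGUOUS"])  # True: elements sit in one block
print(arr.itemsize)               # 8 bytes per element
print(arr.strides)                # (8,): step 8 bytes to reach the next element
print(arr.nbytes)                 # 40 bytes for the whole data buffer
```

A Python list, by contrast, stores references to separate objects, so its elements are not guaranteed to sit next to each other in memory.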
Installation of NumPy
If you already have Python and pip installed on a system, then installing NumPy is very easy.
Install it using this command:
C:\Users\Your Name>pip install numpy
If this command fails, then use a Python distribution that already has NumPy installed, such as Anaconda or Spyder.
Importing NumPy
Once NumPy is installed, import it in your applications by adding the import keyword:
import numpy
Example:
import numpy
arr = numpy.array([1, 2, 3, 4, 5])
print(arr)
NumPy as np
NumPy is usually imported under the np alias.
Create an alias with the as keyword while importing:
import numpy as np
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Checking NumPy Version
The version string is stored under the __version__ attribute.
import numpy as np
print(np.__version__)
NumPy Arrays
NumPy arrays are great alternatives to Python lists. Some key advantages of NumPy arrays are that they are fast, easy to work with, and let you perform calculations across entire arrays.
In the following example, you will import the numpy package, create two Python lists, and then create NumPy arrays out of the newly created lists.
You can then perform element-wise calculations on height and weight. For example, you can take all 6 of the height and weight observations below and calculate the BMI for each observation with a single expression. These operations are very fast and computationally efficient. They are particularly helpful when you have thousands of observations in your data.
# Import the numpy package as np
import numpy as np
# Create 2 new lists height and weight
height = [1.87, 1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]
# Create 2 numpy arrays from height and weight
np_height = np.array(height)
np_weight = np.array(weight)
# Print out the type of np_height
print(type(np_height))
# Calculate bmi
bmi = np_weight / np_height ** 2
# Print the result
print(bmi)
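To see why the conversion to arrays matters, it helps to try the same calculation on the plain lists. The sketch below reuses the height and weight data from the example. The names bmi_list and bmi_array are introduced here only for illustration.

```python
import numpy as np

height = [1.87, 1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]

# Plain lists do not support element-wise arithmetic
try:
    weight / height ** 2
except TypeError as err:
    print("Lists fail:", err)

# Pure-Python equivalent: an explicit loop over paired elements
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]

# NumPy equivalent: one expression over whole arrays
bmi_array = np.array(weight) / np.array(height) ** 2

print(np.allclose(bmi_list, bmi_array))  # True: both give the same values
```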
Subsetting
Another great feature of NumPy arrays is the ability to subset. For instance, if you want to know which observations in the BMI array are above 25, you can quickly subset it to find out.
# Import the numpy package as np
import numpy as np
# Create 2 new lists height and weight
height = [1.87, 1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]
# Create 2 numpy arrays from height and weight
np_height = np.array(height)
np_weight = np.array(weight)
# Print out the type of np_height
print(type(np_height))
# Calculate bmi
bmi = np_weight / np_height ** 2
# Print only bmi > 25
print(bmi[bmi > 25])
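The expression bmi > 25 is itself an array of True and False values, often called a boolean mask. The following sketch, using the same data, stores that mask in a variable (the name mask is chosen here for illustration) and reuses it to count matches and to subset a second array.

```python
import numpy as np

np_height = np.array([1.87, 1.87, 1.82, 1.91, 1.90, 1.85])
np_weight = np.array([81.65, 97.52, 95.25, 92.98, 86.18, 88.45])
bmi = np_weight / np_height ** 2

# The comparison produces one boolean per observation
mask = bmi > 25
print(mask)

# True counts as 1, so the sum is the number of matches
print(mask.sum())  # 4

# The same mask can subset any array of the same length
print(np_height[mask])
```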
Exercise
First, convert the list of weights from a list to a NumPy array. Then, convert all of the weights from kilograms to pounds. Use the scalar conversion of 2.2 lbs per kilogram to make your conversion. Lastly, print the resulting array of weights in pounds.
weight_kg = [35, 40, 45, 50, 55, 60, 65]
import numpy as np
# Create a numpy array np_weight_kg from weight_kg
# Create np_weight_lbs from np_weight_kg
# Print out np_weight_lbs
Solution:
weight_kg = [35, 40, 45, 50, 55, 60, 65]
import numpy as np
# Create a numpy array np_weight_kg from weight_kg
np_weight_kg = np.array(weight_kg)
# Create np_weight_lbs from np_weight_kg
np_weight_lbs = np_weight_kg * 2.2
# Print out np_weight_lbs
print(np_weight_lbs)
NumPy ufuncs
What are ufuncs?
ufuncs stands for "Universal Functions". They are NumPy functions that operate on the ndarray object.
Why use ufuncs?
ufuncs are used to implement vectorization in NumPy, which is much faster than iterating over elements. They also provide broadcasting and additional methods like reduce and accumulate that are very helpful for computation.
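As a brief illustration of broadcasting and of the reduce and accumulate methods mentioned above, here is a minimal sketch using the add and multiply ufuncs on a small array:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Broadcasting: the scalar 10 is applied to every element
print(np.add(x, 10))          # values 11, 12, 13, 14

# reduce: collapse the array by applying the ufunc repeatedly
print(np.add.reduce(x))       # 10, the same as 1 + 2 + 3 + 4
print(np.multiply.reduce(x))  # 24, the same as 1 * 2 * 3 * 4

# accumulate: like reduce, but keeps every intermediate result
print(np.add.accumulate(x))   # running totals 1, 3, 6, 10
```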
ufuncs also take additional arguments, like:
- where - boolean array or condition defining where the operations should take place.
- dtype - defining the return type of elements.
- out - output array where the return value should be copied.
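The three arguments listed above can be tried out with np.add. This is a short sketch. The variable names (as_float, result, masked, mask) are chosen here for illustration, and the dtypes are set explicitly so the behavior is the same on every platform.

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.int64)
y = np.array([4, 5, 6, 7], dtype=np.int64)

# dtype: request floating-point results from integer inputs
as_float = np.add(x, y, dtype=np.float64)
print(as_float, as_float.dtype)

# out: write the result into an existing array instead of a new one
result = np.zeros(4, dtype=np.int64)
np.add(x, y, out=result)
print(result)

# where: only compute at positions where the mask is True;
# other positions keep whatever value the out array already held
mask = np.array([True, False, True, False])
masked = np.zeros(4, dtype=np.int64)
np.add(x, y, out=masked, where=mask)
print(masked)  # sums at positions 0 and 2, zeros elsewhere
```

When where is used, it is safest to pass an out array as well, because positions that are skipped are otherwise left uninitialized.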
What is Vectorization?
Converting iterative statements into a vector-based operation is called vectorization.
It is faster because modern CPUs are optimized for such operations.
Add the Elements of Two Lists
list 1: [1, 2, 3, 4]
list 2: [4, 5, 6, 7]
One way of doing it is to iterate over both lists and sum each pair of elements.
Example Without ufunc:
x = [1, 2, 3, 4]
y = [4, 5, 6, 7]
z = []
# Pair up the elements with zip() and add them one by one
for i, j in zip(x, y):
    z.append(i + j)
print(z)
NumPy has a ufunc for this, called add(x, y), that will produce the same values.
With ufunc, we can use the add() function:
import numpy as np
x = [1, 2, 3, 4]
y = [4, 5, 6, 7]
z = np.add(x, y)
print(z)
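A few follow-up checks make the relationship between add(), ufuncs, and the + operator more concrete. The sketch below assumes the same x and y values, converted to arrays. The name from_lists is introduced only for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([4, 5, 6, 7])

# np.add is an instance of the ufunc type
print(isinstance(np.add, np.ufunc))  # True

# On arrays, the + operator gives the same values as np.add
print(np.array_equal(np.add(x, y), x + y))  # True

# np.add also accepts plain lists, but always returns an ndarray
from_lists = np.add([1, 2, 3, 4], [4, 5, 6, 7])
print(type(from_lists).__name__)  # ndarray
```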