Numerical Python

Numerical Python#

You can write books on Numerical Python alone. There is quite some interesting history behind this topic, along with the careers of some wonderful and outstanding people who have driven the development in this area. Usually a set of packages is important for the Numerical Python stack, all worth having a closer look. To begin with, we will focus on some core functionality of these packages, just to get things going.

The first package to mention here is numpy. It was written to allow array and matrix like handling of numbers together with basic mathematical operations on it more efficiently. In a way known before from the Fortran and Matlab world. This package was created with a similar intention in mind as for the development of Python itself. There was a lack of easy to understand, yet powerful programming languages and data handling environments in the scientific world. Many scientists struggled with creating data evaluation procedures due to the lack of programming skills in languages such as Fortran and C, or faced huge licensing costs for applications such as Matlab to perform seamingly simple tasks.

Python was a rising star in these days and people notices how simple the language was although it was pretty complete in functionality and quite powerful already. The missing part was lightning fast algorithms for solving equations, curve fitting and other matrix related operations necessary to model an algorithm following a mathmatical solution. These algorithms existed though in Fortran, and Python was already capable of being extended by wrappers to get Fortran and C based solutions into the Python world by creating suitable APIs for them. It was Travis Oliphant back then who started to work on creating the first Numerical Python package to add all the necessary high performance math functionality. Today this package is known as numpy.

travis_oliphant — Fig. 2 Travis Oliphant (Courtesy unknown)#

numpy started with creating interfaces to well known and largely used fortran libraries such as linpack, lapack, atlas and blas which had been developed over decades already by very skilled scientists mainly in the fields of astronomy and particle physics. The very same packages have been used in commercial environments such as Matlab at that time. As a consequence, within a very short time frame, Python jumped on the stage for high performance data processing because of this. Today numpy is build upon the various high performance math libraries provided by the CPU vendors to offer the highest performance possible wherever available. A fallback still exists though for the good old fortran libraries to be used where no CPU related libraries exist.

A key aspect when creating the library was to create a Pythonic interface for the already existing libraries, so that common strategies to organize and handle data can be re-used in the numpy world. This results in a steep learning curve when dealing with the numpy functionality. So we can expect iterables for data we can access using square brackets or by looping over the contents and the same operators and function names known from builtins and the math module to perform calculus work.

There are some key difference in using the offered containers and operators. Operators work on whole container objects. So we don’t have to iterate over the contents of an array to calculate a sine value for each element for example. We can just use the whole array as an argument to numpys sine function. And containers are usually typed. So an array can usually not consist of different element types, but has to be based on just float values for example. We’ll see that later on. Let’s just directly dive into it by importing the numpy package. The usual way to do this is like this.

import numpy as np

Now every functionality offered by the numpy package is available by preceeding np. to the desired function call.

Tip

Use our well known dir function to get an idea what numpy has to offer. Do you recognize some functions from previous chapters?

Let’s start with the creation of containers. An array is the simplest container available and can be used to create a numpy type variable out of a Python one. By calling

a = np.array([])
a

array([], dtype=float64)

for example we create an array out of an empty Python list. The type used for the elements in this array would be a float64 type, which indicates a 64 bit precision for floating point numbers here. The default used is dependent on the CPU architecture of the host running the Python session. On a 32-bit architecture the default type will be a float32.

Running dir on our array shows the object orientated approach used for numpy. Our empty array has quite a lot of functionality to offer already.

dir(a)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__class_getitem__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__dlpack__',
 '__dlpack_device__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmatmul__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__setitem__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__xor__',
 'all',
 'any',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'astype',
 'base',
 'byteswap',
 'choose',
 'clip',
 'compress',
 'conj',
 'conjugate',
 'copy',
 'ctypes',
 'cumprod',
 'cumsum',
 'data',
 'diagonal',
 'dot',
 'dtype',
 'dump',
 'dumps',
 'fill',
 'flags',
 'flat',
 'flatten',
 'getfield',
 'imag',
 'item',
 'itemset',
 'itemsize',
 'max',
 'mean',
 'min',
 'nbytes',
 'ndim',
 'newbyteorder',
 'nonzero',
 'partition',
 'prod',
 'ptp',
 'put',
 'ravel',
 'real',
 'repeat',
 'reshape',
 'resize',
 'round',
 'searchsorted',
 'setfield',
 'setflags',
 'shape',
 'size',
 'sort',
 'squeeze',
 'std',
 'strides',
 'sum',
 'swapaxes',
 'take',
 'tobytes',
 'tofile',
 'tolist',
 'tostring',
 'trace',
 'transpose',
 'var',
 'view']

A few of them we already recognize from previous chapters, such as the __add__ dunder. So there must be some way of addition implemented for arrays, and the __mul__ dunder for multiplication. There’s also a __len__ dunder. So a len function call on the array should reveal the number of items in it. Let’s quickly check that on our empty array.

len(a)

Yes. That matches our expectations. An empty array has no elements. Fine! There are some helper functions to create more exciting arrays. And in contrast to our Python containers, these array can be multi-dimensional. The following call for example creates a 3x3 array of zeros

z = np.zeros((3,3))
z

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

Calling len on it probably gives us the number of elements in it again. Let’s check that.

len(z)

Oh wait. We probably expected 9 elements here, whilst len returned us 3 elements here. This is because of the array being two dimensional in this case, consisting of 3 vectors having 3 elements each. Look at the output of the array very closely. The square brackets indicate just that.

But the array itself has some properties to guide you here. The size property can be used to show the real number of elements here.

z.size

And the shape property can be used to show the dimensionality. It returns a tuple with the number of rows and columns.

z.shape

(3, 3)

You can use the dtype property to check the used type for the elements.

z.dtype

dtype('float64')

There is a similar helper function to create multidimensional arrays of just ones.

o = np.ones((3,3))
o

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

Creates a 3x3 array with the value 1 in each possible location.

The eye function can be used to create a square array with ones placed into a diaginal and zeros at all other locations.

e = np.eye(3)
e

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

numpy also offers submodules for specific functionality, as for random numbers for example. The following call generates a 3x3 array of random numbers in the interval [0,1[.

np.random.random((3,3))

array([[0.4950095 , 0.05346328, 0.68209593],
       [0.13455126, 0.87754019, 0.90470143],
       [0.83374753, 0.69382415, 0.90959472]])

These methods already allow to create almost every necessary kind of multidemensional arrays, as these arrays can be added and multiplicated, both with each other - when the dimensions match - or with scalar values. For example.

o * 5

array([[5., 5., 5.],
       [5., 5., 5.],
       [5., 5., 5.]])

creates a 3x3 array of just fives.

5 * o - 2 * e

array([[3., 5., 5.],
       [5., 3., 5.],
       [5., 5., 3.]])

scales the two former arrays o and e by a scalar value and substracts the latter result from the first one.

numpy also offers some handy helper functions to create arrays with evenly spaced values in it. For example

a1 = np.linspace(0, 6, 9)
a1

array([0.  , 0.75, 1.5 , 2.25, 3.  , 3.75, 4.5 , 5.25, 6.  ])

creates a one dimensional array of 9 numbers evenly spaced starting at 0 and ending with a 6. In this case the number of values is fixed and set to 9, whilst the step size is calculated from the parameters.

On the other hand

a2 = np.arange(0, 10, .5)
a2

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])

creates an array of numbers from 0 to 10 with 10 exclusive with a step size of .5, where the number of elements is not explicitely set but the stepsize is fixed.

numpy arrays can be transformed by changing the dimensionality for example as long as the number of elements does not change. To re-arrange the one dimensional array a1 to being a two dimensional 3x3 array, we can use the reshape method.

m1 = a1.reshape((3,3))
m1

array([[0.  , 0.75, 1.5 ],
       [2.25, 3.  , 3.75],
       [4.5 , 5.25, 6.  ]])

This is a very common procedure when working with numpy arrays to create the desired shape and contents, and it almost takes no effort and calculation time, as internally the order of elements does not change at all. Only the properties describing the order visible to the outside is adjusted, which is respected in all further array handling. Note that this creates a copy of the original array a1. So the original array is not even touched. The reshape operation basically takes the effort and time of creating a copy of a1.

We can however trigger a modification of a1 as well. One trivial way would be to assign the result of the reshape operation on a1 back to a1, which would create a copy, modify its properties, assigns the result to the same variabe a1 again, essentially dropping the original a1.

The more efficient way would be to directly change the properties of a1 by assigning a proper new tuple to its shape like so.

a1.shape = (3, 3)

a1

array([[0.  , 0.75, 1.5 ],
       [2.25, 3.  , 3.75],
       [4.5 , 5.25, 6.  ]])