1.3.1. The NumPy array object

1.3.1.1. What are NumPy and NumPy arrays?

NumPy arrays

Python objects:
  • high-level number objects: integers, floating point

  • containers: lists (fast append, flexible insertion), dictionaries (fast lookup)

NumPy provides:
  • extension package to Python for multi-dimensional arrays

  • closer to hardware (efficiency)

  • designed for scientific computation (convenience)

  • also known as array-oriented computing


>>> import numpy as np
>>> a = np.array([0, 1, 2, 3])
>>> a
array([0, 1, 2, 3])

Tip

For example, an array can contain:

  • values of an experiment/simulation at discrete time steps

  • signal recorded by a measurement device, e.g. sound wave

  • pixels of an image, grey-level or colour

  • 3-D data measured at different X-Y-Z positions, e.g. MRI scan

Why it is useful: Memory-efficient container that provides fast numerical operations.

In [1]: L = range(1000)
In [2]: %timeit [i**2 for i in L]
42.6 µs ± 522 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [3]: a = np.arange(1000)
In [4]: %timeit a**2
892 ns ± 4.71 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
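Outside IPython, the same comparison can be sketched with the standard-library timeit module. This is a minimal sketch, not the benchmark used above: the variable names are illustrative, and the absolute timings depend on your machine, so no numbers are quoted here.

```python
import timeit

import numpy as np

L = range(1000)
a = np.arange(1000)

# Both expressions compute the squares of 0..999
squares_list = [i**2 for i in L]
squares_array = a**2

# Time each version over 1000 repetitions (results vary by machine)
t_list = timeit.timeit(lambda: [i**2 for i in L], number=1000)
t_array = timeit.timeit(lambda: a**2, number=1000)

print(f"list comprehension: {t_list / 1000 * 1e6:.1f} µs per loop")
print(f"NumPy array:        {t_array / 1000 * 1e6:.1f} µs per loop")
```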

NumPy Reference documentation

  • On the web: https://numpy.org/doc/

  • Interactive help:

    In [5]: np.array?
    
    Docstring:
    array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
          like=None)

    Create an array.

    Parameters
    ----------
    object : array_like
        An array, any object exposing the array interface, an object whose
        ``__array__`` method returns an array, or any (nested) sequence.
        If object is a scalar, a 0-dimensional array containing object is
        returned.
    dtype : data-type, optional
        The desired data-type for the array. If not given, NumPy will try to use
        a default ``dtype`` that can represent the values (by applying promotion
        rules when necessary.)
    copy : bool, optional
        If true (default), then the object is copied. Otherwise, a copy will
        only be made if ``__array__`` returns a copy, if obj is a nested
        sequence, or if a copy is needed to satisfy any of the other
        requirements (``dtype``, ``order``, etc.).
    order : {'K', 'A', 'C', 'F'}, optional
        Specify the memory layout of the array. If object is not an array, the
        newly created array will be in C order (row major) unless 'F' is
        specified, in which case it will be in Fortran order (column major).
        If object is an array the following holds.

        ===== ========= ===================================================
        order no copy   copy=True
        ===== ========= ===================================================
        'K'   unchanged F & C order preserved, otherwise most similar order
        'A'   unchanged F order if input is F and not C, otherwise C order
        'C'   C order   C order
        'F'   F order   F order
        ===== ========= ===================================================

        When ``copy=False`` and a copy is made for other reasons, the result is
        the same as if ``copy=True``, with some exceptions for 'A', see the
        Notes section. The default order is 'K'.
    subok : bool, optional
        If True, then sub-classes will be passed-through, otherwise
        the returned array will be forced to be a base-class array (default).
    ndmin : int, optional
        Specifies the minimum number of dimensions that the resulting
        array should have. Ones will be prepended to the shape as
        needed to meet this requirement.
    like : array_like, optional
        Reference object to allow the creation of arrays which are not
        NumPy arrays. If an array-like passed in as ``like`` supports
        the ``__array_function__`` protocol, the result will be defined
        by it. In this case, it ensures the creation of an array object
        compatible with that passed in via this argument.

        .. versionadded:: 1.20.0

    Returns
    -------
    out : ndarray
        An array object satisfying the specified requirements.

    See Also
    --------
    empty_like : Return an empty array with shape and type of input.
    ones_like : Return an array of ones with shape and type of input.
    zeros_like : Return an array of zeros with shape and type of input.
    full_like : Return a new array with shape of input filled with value.
    empty : Return a new uninitialized array.
    ones : Return a new array setting values to one.
    zeros : Return a new array setting values to zero.
    full : Return a new array of given shape filled with value.

    Notes
    -----
    When order is 'A' and ``object`` is an array in neither 'C' nor 'F' order,
    and a copy is forced by a change in dtype, then the order of the result is
    not necessarily 'C' as expected. This is likely a bug.

    Examples
    --------
    >>> np.array([1, 2, 3])
    array([1, 2, 3])

    Upcasting:

    >>> np.array([1, 2, 3.0])
    array([ 1.,  2.,  3.])

    More than one dimension:

    >>> np.array([[1, 2], [3, 4]])
    array([[1, 2],
           [3, 4]])

    Minimum dimensions 2:

    >>> np.array([1, 2, 3], ndmin=2)
    array([[1, 2, 3]])

    Type provided:

    >>> np.array([1, 2, 3], dtype=complex)
    array([ 1.+0.j,  2.+0.j,  3.+0.j])

    Data-type consisting of more than one element:

    >>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
    >>> x['a']
    array([1, 3])

    Creating an array from sub-classes:

    >>> np.array(np.mat('1 2; 3 4'))
    array([[1, 2],
           [3, 4]])

    >>> np.array(np.mat('1 2; 3 4'), subok=True)
    matrix([[1, 2],
            [3, 4]])
    Type: builtin_function_or_method

  • Looking for something (np.lookfor() was removed in NumPy 2.0, so this only works with older versions):

    >>> np.lookfor('create array')

    Search results for 'create array'
    ---------------------------------
    numpy.array
        Create an array.
    numpy.memmap
        Create a memory-map to an array stored in a *binary* file on disk.

    In [6]: np.con*?

    np.concatenate
    np.conj
    np.conjugate
    np.convolve

Import conventions

The recommended convention to import NumPy is:

>>> import numpy as np

1.3.1.2. Creating arrays

Manual construction of arrays

  • 1-D:

    >>> a = np.array([0, 1, 2, 3])
    >>> a
    array([0, 1, 2, 3])
    >>> a.ndim
    1
    >>> a.shape
    (4,)
    >>> len(a)
    4

  • 2-D, 3-D, …:

    >>> b = np.array([[0, 1, 2], [3, 4, 5]])    # 2 x 3 array
    >>> b
    array([[0, 1, 2],
           [3, 4, 5]])
    >>> b.ndim
    2
    >>> b.shape
    (2, 3)
    >>> len(b) # returns the size of the first dimension
    2
    >>> c = np.array([[[1], [2]], [[3], [4]]])
    >>> c
    array([[[1],
            [2]],

           [[3],
            [4]]])
    >>> c.shape
    (2, 2, 1)
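To tie these attributes together, here is a short sketch using the same 3-D array c. The size attribute is not used above; it gives the total number of elements.

```python
import numpy as np

# The 3-D array from above: 2 blocks of 2 rows with 1 column each
c = np.array([[[1], [2]], [[3], [4]]])

print(c.ndim)    # 3: number of dimensions
print(c.shape)   # (2, 2, 1): length along each dimension
print(c.size)    # 4: total number of elements, the product of the shape
print(len(c))    # 2: size of the first dimension only

# ndim is always the length of the shape tuple
assert c.ndim == len(c.shape)
```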

Functions for creating arrays

Tip

In practice, we rarely enter items one by one…

  • Evenly spaced:

    >>> a = np.arange(10) # 0 .. n-1  (!)
    >>> a
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> b = np.arange(1, 9, 2) # start, end (exclusive), step
    >>> b
    array([1, 3, 5, 7])

  • or by number of points:

    >>> c = np.linspace(0, 1, 6)   # start, end, num-points
    >>> c
    array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
    >>> d = np.linspace(0, 1, 5, endpoint=False)
    >>> d
    array([0. , 0.2, 0.4, 0.6, 0.8])

  • Common arrays:

    >>> a = np.ones((3, 3))  # reminder: (3, 3) is a tuple
    >>> a
    array([[1., 1., 1.],
           [1., 1., 1.],
           [1., 1., 1.]])
    >>> b = np.zeros((2, 2))
    >>> b
    array([[0., 0.],
           [0., 0.]])
    >>> c = np.eye(3)
    >>> c
    array([[1., 0., 0.],
           [0., 1., 0.],
           [0., 0., 1.]])
    >>> d = np.diag(np.array([1, 2, 3, 4]))
    >>> d
    array([[1, 0, 0, 0],
           [0, 2, 0, 0],
           [0, 0, 3, 0],
           [0, 0, 0, 4]])

  • np.random: random numbers (default_rng() uses the PCG64 bit generator):

    >>> rng = np.random.default_rng(27446968)
    >>> a = rng.random(4) # uniform in [0, 1)
    >>> a
    array([0.64613018, 0.48984931, 0.50851229, 0.22563948])
    >>> b = rng.standard_normal(4) # Gaussian
    >>> b
    array([-0.38250769, -0.61536465,  0.98131732,  0.59353096])
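The sketch below contrasts the two evenly spaced constructors and shows that seeding default_rng() makes random draws reproducible. The retstep parameter of np.linspace() is an extra not used above; the step values are chosen as exact powers of two so the outputs are exact.

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default
a = np.arange(0, 1, 0.25)     # [0.   0.25 0.5  0.75]
c = np.linspace(0, 1, 5)      # [0.   0.25 0.5  0.75 1.  ]

# linspace can also report the spacing it used
c2, step = np.linspace(0, 1, 5, retstep=True)

# Two generators created with the same seed produce the same numbers
rng1 = np.random.default_rng(27446968)
rng2 = np.random.default_rng(27446968)
same = np.array_equal(rng1.random(4), rng2.random(4))

print(a, c, step, same)
print(type(rng1.bit_generator).__name__)  # name of the underlying bit generator
```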

1.3.1.3. Basic data types

You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. 2. vs 2). This is due to a difference in the data-type used:

>>> a = np.array([1, 2, 3])
>>> a.dtype
dtype('int64')
>>> b = np.array([1., 2., 3.])
>>> b.dtype
dtype('float64')

Tip

Different data-types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the data-type from the input.


You can explicitly specify which data-type you want:

>>> c = np.array([1, 2, 3], dtype=float)
>>> c.dtype
dtype('float64')
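To make the memory claim in the tip concrete, a small sketch: the same 1000 values stored with three different data-types. itemsize is the number of bytes per element and nbytes the total size of the data buffer.

```python
import numpy as np

# The same 1000 values stored with different data-types
x64 = np.ones(1000, dtype=np.float64)
x32 = np.ones(1000, dtype=np.float32)
i8 = np.ones(1000, dtype=np.int8)

# Print dtype, bytes per element and total bytes of the data buffer
for arr in (x64, x32, i8):
    print(arr.dtype, arr.itemsize, arr.nbytes)
```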

The default data type of array-creation functions such as np.ones() and np.zeros() is floating point:

>>> a = np.ones((3, 3))
>>> a.dtype
dtype('float64')

There are also other types:

Complex:
>>> d = np.array([1+2j, 3+4j, 5+6*1j])
>>> d.dtype
dtype('complex128')
Bool:
>>> e = np.array([True, False, False, True])
>>> e.dtype
dtype('bool')
Strings:
>>> f = np.array(['Bonjour', 'Hello', 'Hallo'])
>>> f.dtype # <--- Unicode strings of at most 7 characters
dtype('<U7')
Much more:
  • int32

  • int64

  • uint32

  • uint64
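The integer types above differ in their representable range. A short sketch using np.iinfo() to inspect those ranges, and astype() to convert an array to another data-type (note that float to int conversion truncates toward zero):

```python
import numpy as np

# iinfo reports the representable range of an integer data-type
for t in (np.int32, np.int64, np.uint32, np.uint64):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# astype converts to another data-type and returns a new array
f = np.array([1.7, -2.3])
i = f.astype(np.int32)   # truncation toward zero: [ 1 -2]
u = np.array([1, 2, 3], dtype=np.uint32)
print(i, i.dtype, u.dtype)
```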

1.3.1.4. Basic visualization

Now that we have our first data arrays, we are going to visualize them.

Start by launching IPython:

$ ipython # or ipython3 depending on your install

Or the notebook:

$ jupyter notebook

Once IPython has started, enable interactive plots:

>>> %matplotlib  

Or, from the notebook, enable plots in the notebook:

>>> %matplotlib inline 

The inline argument is important in the notebook: it makes plots display inside the notebook rather than in a new window.

Matplotlib is a 2D plotting package. We can import its functions as below:

>>> import matplotlib.pyplot as plt  # the tidy way

And then use (note that you have to use show explicitly if you have not enabled interactive plots with %matplotlib):

>>> plt.plot(x, y)       # line plot    
>>> plt.show() # <-- shows the plot (not needed with interactive plots)

Or, if you have enabled interactive plots with %matplotlib:

>>> plt.plot(x, y)       # line plot

  • 1D plotting:

    >>> x = np.linspace(0, 3, 20)
    >>> y = np.linspace(0, 9, 20)
    >>> plt.plot(x, y) # line plot
    [<matplotlib.lines.Line2D object at ...>]
    >>> plt.plot(x, y, 'o') # dot plot
    [<matplotlib.lines.Line2D object at ...>]

    [Figure: basic 1D plot of y against x, as a line and as dots]

  • 2D arrays (such as images):

    >>> rng = np.random.default_rng(27446968)
    >>> image = rng.random((30, 30))
    >>> plt.imshow(image, cmap=plt.cm.hot)
    <matplotlib.image.AxesImage object at ...>
    >>> plt.colorbar()
    <matplotlib.colorbar.Colorbar object at ...>

    [Figure: basic 2D plot of the random image with a colorbar]

See also

More in the matplotlib chapter.

1.3.1.5. Indexing and slicing

The items of an array can be accessed and assigned to the same way as other Python sequences (e.g. lists):

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[0], a[2], a[-1]
(0, 2, 9)

Warning

Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran or Matlab, indices begin at 1.

The usual Python idiom for reversing a sequence is supported:

>>> a[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

For multidimensional arrays, indices are tuples of integers:

>>> a = np.diag(np.arange(3))
>>> a
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 2]])
>>> a[1, 1]
1
>>> a[2, 1] = 10 # third row, second column
>>> a
array([[ 0,  0,  0],
       [ 0,  1,  0],
       [ 0, 10,  2]])
>>> a[1]
array([0, 1, 0])

Note

  • In 2D, the first dimension corresponds to rows, the second to columns.

  • For multidimensional a, a[0] is interpreted by taking all elements in the unspecified dimensions.
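A small sketch of the note above on the same 3 x 3 array: a[1] selects a whole row, while a column needs an explicit ':' (a slice over the whole first dimension). It also shows that an integer index drops a dimension whereas a slice keeps it.

```python
import numpy as np

a = np.diag(np.arange(3))
a[2, 1] = 10

row = a[1]       # same as a[1, :]: the second row
col = a[:, 1]    # the second column: ':' keeps the whole first dimension

print(row)       # [0 1 0]
print(col)       # [ 0  1 10]
print(a[1].shape, a[1:2].shape)  # (3,) (1, 3): integer index drops a dimension
```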

Slicing: arrays, like other Python sequences, can also be sliced:

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[2:9:3] # [start:end:step]
array([2, 5, 8])

Note that the last index is not included:

>>> a[:4]
array([0, 1, 2, 3])

Not all three slice components are required: by default, start is 0, end is the end of the array (so the last element is included), and step is 1:

>>> a[1:3]
array([1, 2])
>>> a[::2]
array([0, 2, 4, 6, 8])
>>> a[3:]
array([3, 4, 5, 6, 7, 8, 9])
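A few more slicing cases, as a sketch: the [start:end:step] syntax is equivalent to indexing with a slice object, negative values count from the end, and on a 2-D array one slice can be given per dimension.

```python
import numpy as np

a = np.arange(10)

# a[2:9:3] is shorthand for indexing with a slice object
assert np.array_equal(a[2:9:3], a[slice(2, 9, 3)])

# Negative values count from the end
print(a[-3:])     # last three elements: [7 8 9]
print(a[::-2])    # every second element, starting from the end: [9 7 5 3 1]

# Slices combine per dimension on 2-D arrays
b = np.arange(12).reshape(3, 4)
print(b[1:, ::2])  # rows 1 and 2, columns 0 and 2
```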

A small illustrated summary of NumPy indexing and slicing…

[Figure: illustrated summary of NumPy indexing and slicing]

You can also combine assignment and slicing:

>>> a = np.arange(10)
>>> a[5:] = 10
>>> a
array([ 0,  1,  2,  3,  4, 10, 10, 10, 10, 10])
>>> b = np.arange(5)
>>> a[5:] = b[::-1]
>>> a
array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])
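When assigning to a slice, the right-hand side must be a scalar or have a shape compatible with the slice. A minimal sketch of what happens otherwise (the exact error message may vary between NumPy versions, so it is not checked):

```python
import numpy as np

a = np.arange(10)

# The right-hand side must fit the slice (here 5 elements): 3 values do not
raised = False
try:
    a[5:] = np.arange(3)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# A scalar or an array of matching length both work
a[5:] = 0
a[:5] = np.arange(5)[::-1]
print(a)   # [4 3 2 1 0 0 0 0 0 0]
```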

1.3.1.6. Copies and views

A slicing operation creates a view on the original array, which is just a way of accessing array data. Thus the original array is not copied in memory. You can use np.may_share_memory() to check whether two arrays share the same memory block. Note, however, that this uses heuristics and may give false positives.

When modifying the view, the original array is modified as well:

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = a[::2]
>>> b
array([0, 2, 4, 6, 8])
>>> np.may_share_memory(a, b)
True
>>> b[0] = 12
>>> b
array([12,  2,  4,  6,  8])
>>> a # (!)
array([12,  1,  2,  3,  4,  5,  6,  7,  8,  9])
>>> a = np.arange(10)
>>> c = a[::2].copy() # force a copy
>>> c[0] = 12
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.may_share_memory(a, c)
False

This behavior can be surprising at first sight… but it allows NumPy to save both memory and time.
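Two further tools help tell views from copies: the base attribute points to the array that owns the memory, and np.shares_memory() gives an exact answer where np.may_share_memory() only compares memory bounds. The last two lines sketch such a false positive: two interleaved views whose bounds overlap but which share no element.

```python
import numpy as np

a = np.arange(10)
b = a[::2]            # view
c = a[::2].copy()     # independent copy

# .base points to the array that owns the memory (None for an owner)
print(b.base is a)    # True
print(c.base is None) # True

# shares_memory gives an exact answer
print(np.shares_memory(a, b), np.shares_memory(a, c))  # True False

# Two interleaved views of the same buffer
evens, odds = a[::2], a[1::2]
print(np.may_share_memory(evens, odds))  # True: the memory bounds overlap
print(np.shares_memory(evens, odds))     # False: no element is shared
```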

1.3.1.7. Fancy indexing

Tip

NumPy arrays can be indexed with slices, but also with boolean arrays (masks) or integer arrays. This method is called fancy indexing. It creates copies, not views.

Using boolean masks

>>> rng = np.random.default_rng(27446968)
>>> a = rng.integers(0, 21, 15)
>>> a
array([ 3, 13, 12, 10, 10, 10, 18,  4,  8,  5,  6, 11, 12, 17,  3])
>>> (a % 3 == 0)
array([ True, False,  True, False, False, False,  True, False, False,
       False,  True, False,  True, False,  True])
>>> mask = (a % 3 == 0)
>>> extract_from_a = a[mask] # or, a[a%3==0]
>>> extract_from_a # extract a sub-array with the mask
array([ 3, 12, 18,  6, 12,  3])

Indexing with a mask can be very useful to assign a new value to a sub-array:

>>> a[a % 3 == 0] = -1
>>> a
array([-1, 13, -1, 10, 10, 10, -1,  4,  8,  5, -1, 11, -1, 17, -1])
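The difference between extracting with a mask (which returns a copy) and assigning through a mask (which writes into the original) can be sketched on a smaller, hand-picked array:

```python
import numpy as np

a = np.array([3, 13, 12, 10, 18, 4])
mask = a % 3 == 0

# Extracting with a mask gives a new array (a copy)
extracted = a[mask]
extracted[0] = 99
print(a)          # unchanged: [ 3 13 12 10 18  4]

# Assigning through the mask writes into the original array
a[mask] = 0
print(a)          # [ 0 13  0 10  0  4]

print(np.shares_memory(a, extracted))  # False
```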

Indexing with an array of integers

>>> a = np.arange(0, 100, 10)
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Indexing can be done with an array of integers, where the same index is repeated several times:

>>> a[[2, 3, 2, 4, 2]]  # note: [2, 3, 2, 4, 2] is a Python list
array([20, 30, 20, 40, 20])

New values can be assigned with this kind of indexing:

>>> a[[9, 7]] = -100
>>> a
array([   0,   10,   20,   30,   40,   50,   60, -100,   80, -100])
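Repeated indices interact with in-place operations in a way that can be surprising: with a[idx] += 1, the selected values are read once, incremented, and written back, so a repeated index is only incremented once. The sketch below contrasts this with np.add.at(), which performs an unbuffered operation and accumulates repeats.

```python
import numpy as np

a = np.zeros(3, dtype=int)
idx = [0, 0, 1]          # index 0 appears twice

# Buffered: index 0 ends up incremented only once
a[idx] += 1
print(a)                 # [1 1 0]

# np.add.at accumulates the repeated index
b = np.zeros(3, dtype=int)
np.add.at(b, idx, 1)
print(b)                 # [2 1 0]
```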

Tip

When a new array is created by indexing with an array of integers, the new array has the same shape as the array of integers:

>>> a = np.arange(10)
>>> idx = np.array([[3, 4], [9, 7]])
>>> idx.shape
(2, 2)
>>> a[idx]
array([[3, 4],
       [9, 7]])

The figure below illustrates various fancy indexing applications.

[Figure: fancy indexing applications]
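As a code companion, here is a sketch of fancy indexing on a 2-D array. The 6 x 6 array is built so that each value encodes its position (value = 10 * row + column), which makes it easy to check which elements are selected; it is an illustrative choice, not necessarily the array used in the figure.

```python
import numpy as np

# A 6 x 6 array whose entries encode their position: value = 10 * row + col
a = np.arange(6) + 10 * np.arange(6)[:, np.newaxis]

# Pairs of integer arrays select individual elements: (0, 1), (1, 2), ...
diag_above = a[[0, 1, 2, 3, 4], [1, 2, 3, 4, 5]]   # [ 1 12 23 34 45]

# Mix a slice with an integer array: rows 3..5, columns 0, 2 and 5
block = a[3:, [0, 2, 5]]                             # shape (3, 3)

# A boolean mask on one axis combined with an integer on another
mask = np.array([1, 0, 1, 0, 0, 1], dtype=bool)
picked = a[mask, 2]                                  # [ 2 22 52]

print(diag_above, block, picked, sep="\n")
```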