.. _debugging_chapter:

==============
Debugging code
==============

**Author**: *Gaël Varoquaux*

This section explores tools to understand better your code base:
debugging, to find and fix bugs.

It is not specific to the scientific Python community, but the strategies
that we will employ are tailored to its needs.

.. topic:: Prerequisites

    * NumPy
    * IPython
    * `nosetests <https://nose.readthedocs.io/en/latest/>`__
    * `pyflakes <https://pypi.org/project/pyflakes>`__
    * gdb for the C-debugging part.

.. contents:: Chapter contents
   :local:
   :depth: 2


Avoiding bugs
=============

Coding best practices to avoid getting in trouble
--------------------------------------------------

.. sidebar:: Brian Kernighan

   *“Everyone knows that debugging is twice as hard as writing a
   program in the first place. So if you're as clever as you can be
   when you write it, how will you ever debug it?”*

* We all write buggy code.  Accept it.  Deal with it.
* Write your code with testing and debugging in mind.
* Keep It Simple, Stupid (KISS).

  * What is the simplest thing that could possibly work?

* Don't Repeat Yourself (DRY).

  * Every piece of knowledge must have a single, unambiguous,
    authoritative representation within a system.
  * Constants, algorithms, etc...

* Try to limit interdependencies of your code. (Loose Coupling)
* Give your variables, functions and modules meaningful names (not
  mathematics names)

pyflakes: fast static analysis
-------------------------------

They are several static analysis tools in Python; to name a few:

* `pylint <https://pylint.pycqa.org/en/latest/>`_
* `pychecker <https://pychecker.sourceforge.net/>`_
* `pyflakes <https://pypi.org/project/pyflakes>`_
* `flake8 <https://pypi.org/project/flake8>`_

Here we focus on `pyflakes`, which is the simplest tool.

    * **Fast, simple**

    * Detects syntax errors, missing imports, typos on names.

Another good recommendation is the `flake8` tool which is a combination of
pyflakes and pep8. Thus, in addition to the types of errors that pyflakes
catches, flake8 detects violations of the recommendation in `PEP8
<https://peps.python.org/pep-0008/>`_ style guide.

Integrating pyflakes (or flake8) in your editor or IDE is highly
recommended, it **does yield productivity gains**.

Running pyflakes on the current edited file
............................................

You can bind a key to run pyflakes in the current buffer.

* **In kate**
  Menu: 'settings -> configure kate

    * In plugins enable 'external tools'

    * In external Tools', add `pyflakes`::

        kdialog --title "pyflakes %filename" --msgbox "$(pyflakes %filename)"

* **In TextMate**

  Menu: TextMate -> Preferences -> Advanced -> Shell variables, add a
  shell variable::

    TM_PYCHECKER = /Library/Frameworks/Python.framework/Versions/Current/bin/pyflakes

  Then `Ctrl-Shift-V` is binded to a pyflakes report


* **In vim**
  In your `.vimrc` (binds F5 to `pyflakes`)::

    autocmd FileType python let &mp = 'echo "*** running % ***" ; pyflakes %'
    autocmd FileType tex,mp,rst,python imap <Esc>[15~ <C-O>:make!^M
    autocmd FileType tex,mp,rst,python map  <Esc>[15~ :make!^M
    autocmd FileType tex,mp,rst,python set autowrite

* **In emacs**
  In your `.emacs` (binds F5 to `pyflakes`)::

    (defun pyflakes-thisfile () (interactive)
           (compile (format "pyflakes %s" (buffer-file-name)))
    )

    (define-minor-mode pyflakes-mode
        "Toggle pyflakes mode.
        With no argument, this command toggles the mode.
        Non-null prefix argument turns on the mode.
        Null prefix argument turns off the mode."
        ;; The initial value.
        nil
        ;; The indicator for the mode line.
        " Pyflakes"
        ;; The minor mode bindings.
        '( ([f5] . pyflakes-thisfile) )
    )

    (add-hook 'python-mode-hook (lambda () (pyflakes-mode t)))

A type-as-go spell-checker like integration
............................................

* **In vim**

  * Use the pyflakes.vim plugin:

    #. download the zip file from
       https://www.vim.org/scripts/script.php?script_id=2441

    #. extract the files in ``~/.vim/ftplugin/python``

    #. make sure your vimrc has ``filetype plugin indent on``

    .. image:: vim_pyflakes.png

  * Alternatively: use the `syntastic
    <https://github.com/vim-syntastic/syntastic>`_
    plugin. This can be configured to use ``flake8`` too and also handles
    on-the-fly checking for many other languages.

    .. image:: vim_syntastic.png

* **In emacs**

  Use the flymake mode with pyflakes, documented on
  https://www.emacswiki.org/emacs/FlyMake and included in Emacs 26 and
  more recent. To activate it, use ``M-x`` (meta-key then x) and enter
  `flymake-mode` at the prompt. To enable it automatically when
  opening a Python file, add the following line to your .emacs file::

    (add-hook 'python-mode-hook '(lambda () (flymake-mode)))


Debugging workflow
===================

If you do have a non trivial bug, this is when debugging strategies kick
in. There is no silver bullet. Yet, strategies help:

   **For debugging a given problem, the favorable situation is when the
   problem is isolated in a small number of lines of code, outside
   framework or application code, with short modify-run-fail cycles**

#. Make it fail reliably.  Find a test case that makes the code fail
   every time.
#. Divide and Conquer.  Once you have a failing test case, isolate the
   failing code.

   * Which module.
   * Which function.
   * Which line of code.

   => isolate a small reproducible failure: a test case

#. Change one thing at a time and re-run the failing test case.
#. Use the debugger to understand what is going wrong.
#. Take notes and be patient.  It may take a while.

.. note::

   Once you have gone through this process: isolated a tight piece of
   code reproducing the bug and fix the bug using this piece of code, add
   the corresponding code to your test suite.

Using the Python debugger
=========================

The python debugger, ``pdb``: https://docs.python.org/3/library/pdb.html,
allows you to inspect your code interactively.

Specifically it allows you to:

  * View the source code.
  * Walk up and down the call stack.
  * Inspect values of variables.
  * Modify values of variables.
  * Set breakpoints.

.. topic:: **print**

    Yes, ``print`` statements do work as a debugging tool. However to
    inspect runtime, it is often more efficient to use the debugger.

Invoking the debugger
-----------------------

Ways to launch the debugger:

#. Postmortem, launch debugger after module errors.
#. Launch the module with the debugger.
#. Call the debugger inside the module


Postmortem
...........

**Situation**: You're working in IPython and you get a traceback.

Here we debug the file :download:`index_error.py`. When running it, an
:class:`IndexError` is raised. Type ``%debug`` and drop into the debugger.

.. code-block:: ipython

    In [1]: %run index_error.py
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    File ~/src/scientific-python-lectures/advanced/debugging/index_error.py:10
          6     print(lst[len(lst)])
          9 if __name__ == "__main__":
    ---> 10     index_error()

    File ~/src/scientific-python-lectures/advanced/debugging/index_error.py:6, in index_error()
          4 def index_error():
          5     lst = list("foobar")
    ----> 6     print(lst[len(lst)])

    IndexError: list index out of range

    In [2]: %debug
    > /home/jarrod/src/scientific-python-lectures/advanced/debugging/index_error.py(6)index_error()
          4 def index_error():
          5     lst = list("foobar")
    ----> 6     print(lst[len(lst)])
          7
          8

    ipdb> list
          1 """Small snippet to raise an IndexError."""
          2
          3
          4 def index_error():
          5     lst = list("foobar")
    ----> 6     print(lst[len(lst)])
          7
          8
          9 if __name__ == "__main__":
         10     index_error()

    ipdb> len(lst)
    6
    ipdb> print(lst[len(lst) - 1])
    r
    ipdb> quit

.. topic:: Post-mortem debugging without IPython

   In some situations you cannot use IPython, for instance to debug a
   script that wants to be called from the command line. In this case,
   you can call the script with ``python -m pdb script.py``::

    $ python -m pdb index_error.py
    > /home/jarrod/src/scientific-python-lectures/advanced/debugging/index_error.py(1)<module>()
    -> """Small snippet to raise an IndexError."""
    (Pdb) continue
    Traceback (most recent call last):
      File "/usr/lib64/python3.11/pdb.py", line 1793, in main
        pdb._run(target)
      File "/usr/lib64/python3.11/pdb.py", line 1659, in _run
        self.run(target.code)
      File "/usr/lib64/python3.11/bdb.py", line 600, in run
        exec(cmd, globals, locals)
      File "<string>", line 1, in <module>
      File "/home/jarrod/src/scientific-python-lectures/advanced/debugging/index_error.py", line 10, in <module>
        index_error()
      File "/home/jarrod/src/scientific-python-lectures/advanced/debugging/index_error.py", line 6, in index_error
        print(lst[len(lst)])
              ~~~^^^^^^^^^^
    IndexError: list index out of range
    Uncaught exception. Entering post mortem debugging
    Running 'cont' or 'step' will restart the program
    > /home/jarrod/src/scientific-python-lectures/advanced/debugging/index_error.py(6)index_error()
    -> print(lst[len(lst)])
    (Pdb)

Step-by-step execution
.......................

**Situation**: You believe a bug exists in a module but are not sure where.

For instance we are trying to debug :download:`wiener_filtering.py`.
Indeed the code runs, but the filtering does not work well.

* Run the script in IPython with the debugger using ``%run -d
  wiener_filtering.py`` :

  .. code-block:: ipython

    In [1]: %run -d wiener_filtering.py
    *** Blank or comment
    *** Blank or comment
    *** Blank or comment
    NOTE: Enter 'c' at the ipdb>  prompt to continue execution.
    > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(1)<module>()
    ----> 1 """Wiener filtering a noisy raccoon face: this module is buggy"""
          2
          3 import numpy as np
          4 import scipy as sp
          5 import matplotlib.pyplot as plt

* Set a break point at line 29 using ``b 29``:

  .. code-block:: ipython

    ipdb> n
    > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(3)<module>()
          1 """Wiener filtering a noisy raccoon face: this module is buggy"""
          2
    ----> 3 import numpy as np
          4 import scipy as sp
          5 import matplotlib.pyplot as plt

    ipdb> b 29
    Breakpoint 1 at /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py:29

* Continue execution to next breakpoint with ``c(ont(inue))``:

  .. code-block:: ipython

    ipdb> c
    > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(29)iterated_wiener()
         27     Do not use this: this is crappy code to demo bugs!
         28     """
    1--> 29     noisy_img = noisy_img
         30     denoised_img = local_mean(noisy_img, size=size)
         31     l_var = local_var(noisy_img, size=size)

* Step into code with ``n(ext)`` and ``s(tep)``: ``next`` jumps to the next
  statement in the current execution context, while ``step`` will go across
  execution contexts, i.e. enable exploring inside function calls:

  .. code-block:: ipython

    ipdb> s
    > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(30)iterated_wiener()
         28     """
    1    29     noisy_img = noisy_img
    ---> 30     denoised_img = local_mean(noisy_img, size=size)
         31     l_var = local_var(noisy_img, size=size)
         32     for i in range(3):

    ipdb> n
    > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(31)iterated_wiener()
    1    29     noisy_img = noisy_img
         30     denoised_img = local_mean(noisy_img, size=size)
    ---> 31     l_var = local_var(noisy_img, size=size)
         32     for i in range(3):
         33         res = noisy_img - denoised_img

* Step a few lines and explore the local variables:

  .. code-block:: ipython

    ipdb> n
    > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(32)iterated_wiener()
         30     denoised_img = local_mean(noisy_img, size=size)
         31     l_var = local_var(noisy_img, size=size)
    ---> 32     for i in range(3):
         33         res = noisy_img - denoised_img
         34         noise = (res**2).sum() / res.size

    ipdb> print(l_var)
    [[2571 2782 3474 ... 3008 2922 3141]
     [2105  708  475 ...  469  354 2884]
     [1697  420  645 ...  273  236 2517]
     ...
     [2437  345  432 ...  413  387 4188]
     [2598  179  247 ...  367  441 3909]
     [2808 2525 3117 ... 4413 4454 4385]]
    ipdb> print(l_var.min())
    0

Oh dear, nothing but integers, and 0 variation. Here is our bug, we are
doing integer arithmetic.

.. topic:: Raising exception on numerical errors

    When we run the :download:`wiener_filtering.py` file, the following
    warnings are raised:

    .. code-block:: ipython

        In [2]: %run wiener_filtering.py
        /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py:35: RuntimeWarning: divide by zero encountered in divide
          noise_level = 1 - noise / l_var

    We can turn these warnings in exception, which enables us to do
    post-mortem debugging on them, and find our problem more quickly:

    .. code-block:: ipython

        In [3]: np.seterr(all='raise')
        Out[3]: {'divide': 'warn', 'over': 'warn', 'under': 'ignore', 'invalid': 'warn'}

        In [4]:  %run wiener_filtering.py
        ---------------------------------------------------------------------------
        FloatingPointError                        Traceback (most recent call last)
        File ~/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py:52
             49 plt.matshow(face[cut], cmap=plt.cm.gray)
             50 plt.matshow(noisy_face[cut], cmap=plt.cm.gray)
        ---> 52 denoised_face = iterated_wiener(noisy_face)
             53 plt.matshow(denoised_face[cut], cmap=plt.cm.gray)
             55 plt.show()

        File ~/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py:35, in iterated_wiener(noisy_img, size)
             33 res = noisy_img - denoised_img
             34 noise = (res**2).sum() / res.size
        ---> 35 noise_level = 1 - noise / l_var
             36 noise_level[noise_level < 0] = 0
             37 denoised_img = np.int64(noise_level * res)

        FloatingPointError: divide by zero encountered in divide


Other ways of starting a debugger
....................................

* **Raising an exception as a poor man break point**

  If you find it tedious to note the line number to set a break point,
  you can simply raise an exception at the point that you want to
  inspect and use IPython's ``%debug``. Note that in this case you cannot
  step or continue the execution.

* **Debugging test failures using nosetests**

  You can run ``nosetests --pdb`` to drop in post-mortem debugging on
  exceptions, and ``nosetests --pdb-failure`` to inspect test failures
  using the debugger.

  In addition, you can use the IPython interface for the debugger in nose
  by installing the nose plugin
  `ipdbplugin <https://pypi.org/project/ipdbplugin>`_. You can than
  pass ``--ipdb`` and ``--ipdb-failure`` options to nosetests.

* **Calling the debugger explicitly**

  Insert the following line where you want to drop in the debugger::

    import pdb; pdb.set_trace()

.. warning::

    When running ``nosetests``, the output is captured, and thus it seems
    that the debugger does not work. Simply run the nosetests with the ``-s``
    flag.


.. topic:: Graphical debuggers and alternatives

    * `pudb <https://pypi.org/project/pudb>`_ is a good semi-graphical
      debugger with a text user interface in the console.

    * The `Visual Studio Code <https://code.visualstudio.com/>`_ integrated
      development environment includes a debugging mode.

    * The `Mu editor <https://codewith.mu/>`_ is a simple Python editor that
      includes a debugging mode.


Debugger commands and interaction
----------------------------------

============ ======================================================================
``l(list)``   Lists the code at the current position
``u(p)``      Walk up the call stack
``d(own)``    Walk down the call stack
``n(ext)``    Execute the next line (does not go down in new functions)
``s(tep)``    Execute the next statement (goes down in new functions)
``bt``        Print the call stack
``a``         Print the local variables
``!command``  Execute the given **Python** command (by opposition to pdb commands
============ ======================================================================

.. warning:: **Debugger commands are not Python code**

    You cannot name the variables the way you want. For instance, if in
    you cannot override the variables in the current frame with the same
    name: **use different names than your local variable when typing code
    in the debugger**.

Getting help when in the debugger
.................................

Type ``h`` or ``help`` to access the interactive help:

.. sourcecode:: pycon

    ipdb> help

    Documented commands (type help <topic>):
    ========================================
    EOF    commands   enable      ll        pp       s                until
    a      condition  exceptions  longlist  psource  skip_hidden      up
    alias  cont       exit        n         q        skip_predicates  w
    args   context    h           next      quit     source           whatis
    b      continue   help        p         r        step             where
    break  d          ignore      pdef      restart  tbreak
    bt     debug      j           pdoc      return   u
    c      disable    jump        pfile     retval   unalias
    cl     display    l           pinfo     run      undisplay
    clear  down       list        pinfo2    rv       unt

    Miscellaneous help topics:
    ==========================
    exec  pdb

    Undocumented commands:
    ======================
    interact

Debugging segmentation faults using gdb
==========================================

If you have a segmentation fault, you cannot debug it with pdb, as it
crashes the Python interpreter before it can drop in the debugger.
Similarly, if you have a bug in C code embedded in Python, pdb is
useless. For this we turn to the gnu debugger,
`gdb <https://www.gnu.org/software/gdb/>`_, available on Linux.

Before we start with gdb, let us add a few Python-specific tools to it.
For this we add a few macros to our ``~/.gdbinit``. The optimal choice of
macro depends on your Python version and your gdb version. I have added a
simplified version in :download:`gdbinit`, but feel free to read
`DebuggingWithGdb <https://wiki.python.org/moin/DebuggingWithGdb>`_.

To debug with gdb the Python script :download:`segfault.py`, we can run the
script in gdb as follows

.. sourcecode:: console

    $ gdb python
    ...
    (gdb) run segfault.py
    Starting program: /usr/bin/python segfault.py
    [Thread debugging using libthread_db enabled]

    Program received signal SIGSEGV, Segmentation fault.
    _strided_byte_copy (dst=0x8537478 "\360\343G", outstrides=4, src=
        0x86c0690 <Address 0x86c0690 out of bounds>, instrides=32, N=3,
        elsize=4)
            at numpy/core/src/multiarray/ctors.c:365
    365            _FAST_MOVE(Int32);
    (gdb)

We get a segfault, and gdb captures it for post-mortem debugging in the C
level stack (not the Python call stack). We can debug the C call stack
using gdb's commands:

.. sourcecode:: console

    (gdb) up
    #1  0x004af4f5 in _copy_from_same_shape (dest=<value optimized out>,
        src=<value optimized out>, myfunc=0x496780 <_strided_byte_copy>,
        swap=0)
    at numpy/core/src/multiarray/ctors.c:748
    748         myfunc(dit->dataptr, dest->strides[maxaxis],

As you can see, right now, we are in the C code of numpy. We would like
to know what is the Python code that triggers this segfault, so we go up
the stack until we hit the Python execution loop:

.. sourcecode:: console

    (gdb) up
    #8  0x080ddd23 in call_function (f=
        Frame 0x85371ec, for file /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py, line 156, in _leading_trailing (a=<numpy.ndarray at remote 0x85371b0>, _nc=<module at remote 0xb7f93a64>), throwflag=0)
        at ../Python/ceval.c:3750
    3750    ../Python/ceval.c: No such file or directory.
            in ../Python/ceval.c

    (gdb) up
    #9  PyEval_EvalFrameEx (f=
        Frame 0x85371ec, for file /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py, line 156, in _leading_trailing (a=<numpy.ndarray at remote 0x85371b0>, _nc=<module at remote 0xb7f93a64>), throwflag=0)
        at ../Python/ceval.c:2412
    2412    in ../Python/ceval.c
    (gdb)

Once we are in the Python execution loop, we can use our special Python
helper function. For instance we can find the corresponding Python code:

.. sourcecode:: console

    (gdb) pyframe
    /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py (158): _leading_trailing
    (gdb)

This is numpy code, we need to go up until we find code that we have
written:

.. sourcecode:: console

    (gdb) up
    ...
    (gdb) up
    #34 0x080dc97a in PyEval_EvalFrameEx (f=
        Frame 0x82f064c, for file segfault.py, line 11, in print_big_array (small_array=<numpy.ndarray at remote 0x853ecf0>, big_array=<numpy.ndarray at remote 0x853ed20>), throwflag=0) at ../Python/ceval.c:1630
    1630    ../Python/ceval.c: No such file or directory.
            in ../Python/ceval.c
    (gdb) pyframe
    segfault.py (12): print_big_array

The corresponding code is:

.. literalinclude:: segfault.py
   :language: py
   :lines: 8-14

Thus the segfault happens when printing ``big_array[-10:]``. The reason is
simply that ``big_array`` has been allocated with its end outside the
program memory.

.. note::

    For a list of Python-specific commands defined in the `gdbinit`, read
    the source of this file.


____

.. topic:: **Wrap up exercise**
    :class: green

    The following script is well documented and hopefully legible. It
    seeks to answer a problem of actual interest for numerical computing,
    but it does not work... Can you debug it?

    **Python source code:** :download:`to_debug.py <to_debug.py>`

    .. only:: html

        .. literalinclude:: to_debug.py