.. _debugging_chapter: ============== Debugging code ============== **Author**: *Gaël Varoquaux* This section explores tools to understand better your code base: debugging, to find and fix bugs. It is not specific to the scientific Python community, but the strategies that we will employ are tailored to its needs. .. topic:: Prerequisites * NumPy * IPython * `nosetests `__ * `pyflakes `__ * gdb for the C-debugging part. .. contents:: Chapter contents :local: :depth: 2 Avoiding bugs ============= Coding best practices to avoid getting in trouble -------------------------------------------------- .. sidebar:: Brian Kernighan *“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?”* * We all write buggy code. Accept it. Deal with it. * Write your code with testing and debugging in mind. * Keep It Simple, Stupid (KISS). * What is the simplest thing that could possibly work? * Don't Repeat Yourself (DRY). * Every piece of knowledge must have a single, unambiguous, authoritative representation within a system. * Constants, algorithms, etc... * Try to limit interdependencies of your code. (Loose Coupling) * Give your variables, functions and modules meaningful names (not mathematics names) pyflakes: fast static analysis ------------------------------- They are several static analysis tools in Python; to name a few: * `pylint `_ * `pychecker `_ * `pyflakes `_ * `flake8 `_ Here we focus on `pyflakes`, which is the simplest tool. * **Fast, simple** * Detects syntax errors, missing imports, typos on names. Another good recommendation is the `flake8` tool which is a combination of pyflakes and pep8. Thus, in addition to the types of errors that pyflakes catches, flake8 detects violations of the recommendation in `PEP8 `_ style guide. Integrating pyflakes (or flake8) in your editor or IDE is highly recommended, it **does yield productivity gains**. Running pyflakes on the current edited file ............................................ You can bind a key to run pyflakes in the current buffer. * **In kate** Menu: 'settings -> configure kate * In plugins enable 'external tools' * In external Tools', add `pyflakes`:: kdialog --title "pyflakes %filename" --msgbox "$(pyflakes %filename)" * **In TextMate** Menu: TextMate -> Preferences -> Advanced -> Shell variables, add a shell variable:: TM_PYCHECKER = /Library/Frameworks/Python.framework/Versions/Current/bin/pyflakes Then `Ctrl-Shift-V` is binded to a pyflakes report * **In vim** In your `.vimrc` (binds F5 to `pyflakes`):: autocmd FileType python let &mp = 'echo "*** running % ***" ; pyflakes %' autocmd FileType tex,mp,rst,python imap [15~ :make!^M autocmd FileType tex,mp,rst,python map [15~ :make!^M autocmd FileType tex,mp,rst,python set autowrite * **In emacs** In your `.emacs` (binds F5 to `pyflakes`):: (defun pyflakes-thisfile () (interactive) (compile (format "pyflakes %s" (buffer-file-name))) ) (define-minor-mode pyflakes-mode "Toggle pyflakes mode. With no argument, this command toggles the mode. Non-null prefix argument turns on the mode. Null prefix argument turns off the mode." ;; The initial value. nil ;; The indicator for the mode line. " Pyflakes" ;; The minor mode bindings. '( ([f5] . pyflakes-thisfile) ) ) (add-hook 'python-mode-hook (lambda () (pyflakes-mode t))) A type-as-go spell-checker like integration ............................................ * **In vim** * Use the pyflakes.vim plugin: #. download the zip file from https://www.vim.org/scripts/script.php?script_id=2441 #. extract the files in ``~/.vim/ftplugin/python`` #. make sure your vimrc has ``filetype plugin indent on`` .. image:: vim_pyflakes.png * Alternatively: use the `syntastic `_ plugin. This can be configured to use ``flake8`` too and also handles on-the-fly checking for many other languages. .. image:: vim_syntastic.png * **In emacs** Use the flymake mode with pyflakes, documented on https://www.emacswiki.org/emacs/FlyMake and included in Emacs 26 and more recent. To activate it, use ``M-x`` (meta-key then x) and enter `flymake-mode` at the prompt. To enable it automatically when opening a Python file, add the following line to your .emacs file:: (add-hook 'python-mode-hook '(lambda () (flymake-mode))) Debugging workflow =================== If you do have a non trivial bug, this is when debugging strategies kick in. There is no silver bullet. Yet, strategies help: **For debugging a given problem, the favorable situation is when the problem is isolated in a small number of lines of code, outside framework or application code, with short modify-run-fail cycles** #. Make it fail reliably. Find a test case that makes the code fail every time. #. Divide and Conquer. Once you have a failing test case, isolate the failing code. * Which module. * Which function. * Which line of code. => isolate a small reproducible failure: a test case #. Change one thing at a time and re-run the failing test case. #. Use the debugger to understand what is going wrong. #. Take notes and be patient. It may take a while. .. note:: Once you have gone through this process: isolated a tight piece of code reproducing the bug and fix the bug using this piece of code, add the corresponding code to your test suite. Using the Python debugger ========================= The python debugger, ``pdb``: https://docs.python.org/3/library/pdb.html, allows you to inspect your code interactively. Specifically it allows you to: * View the source code. * Walk up and down the call stack. * Inspect values of variables. * Modify values of variables. * Set breakpoints. .. topic:: **print** Yes, ``print`` statements do work as a debugging tool. However to inspect runtime, it is often more efficient to use the debugger. Invoking the debugger ----------------------- Ways to launch the debugger: #. Postmortem, launch debugger after module errors. #. Launch the module with the debugger. #. Call the debugger inside the module Postmortem ........... **Situation**: You're working in IPython and you get a traceback. Here we debug the file :download:`index_error.py`. When running it, an :class:`IndexError` is raised. Type ``%debug`` and drop into the debugger. .. code-block:: ipython In [1]: %run index_error.py --------------------------------------------------------------------------- IndexError Traceback (most recent call last) File ~/src/scientific-python-lectures/advanced/debugging/index_error.py:10 6 print(lst[len(lst)]) 9 if __name__ == "__main__": ---> 10 index_error() File ~/src/scientific-python-lectures/advanced/debugging/index_error.py:6, in index_error() 4 def index_error(): 5 lst = list("foobar") ----> 6 print(lst[len(lst)]) IndexError: list index out of range In [2]: %debug > /home/jarrod/src/scientific-python-lectures/advanced/debugging/index_error.py(6)index_error() 4 def index_error(): 5 lst = list("foobar") ----> 6 print(lst[len(lst)]) 7 8 ipdb> list 1 """Small snippet to raise an IndexError.""" 2 3 4 def index_error(): 5 lst = list("foobar") ----> 6 print(lst[len(lst)]) 7 8 9 if __name__ == "__main__": 10 index_error() ipdb> len(lst) 6 ipdb> print(lst[len(lst) - 1]) r ipdb> quit .. topic:: Post-mortem debugging without IPython In some situations you cannot use IPython, for instance to debug a script that wants to be called from the command line. In this case, you can call the script with ``python -m pdb script.py``:: $ python -m pdb index_error.py > /home/jarrod/src/scientific-python-lectures/advanced/debugging/index_error.py(1)() -> """Small snippet to raise an IndexError.""" (Pdb) continue Traceback (most recent call last): File "/usr/lib64/python3.11/pdb.py", line 1793, in main pdb._run(target) File "/usr/lib64/python3.11/pdb.py", line 1659, in _run self.run(target.code) File "/usr/lib64/python3.11/bdb.py", line 600, in run exec(cmd, globals, locals) File "", line 1, in File "/home/jarrod/src/scientific-python-lectures/advanced/debugging/index_error.py", line 10, in index_error() File "/home/jarrod/src/scientific-python-lectures/advanced/debugging/index_error.py", line 6, in index_error print(lst[len(lst)]) ~~~^^^^^^^^^^ IndexError: list index out of range Uncaught exception. Entering post mortem debugging Running 'cont' or 'step' will restart the program > /home/jarrod/src/scientific-python-lectures/advanced/debugging/index_error.py(6)index_error() -> print(lst[len(lst)]) (Pdb) Step-by-step execution ....................... **Situation**: You believe a bug exists in a module but are not sure where. For instance we are trying to debug :download:`wiener_filtering.py`. Indeed the code runs, but the filtering does not work well. * Run the script in IPython with the debugger using ``%run -d wiener_filtering.py`` : .. code-block:: ipython In [1]: %run -d wiener_filtering.py *** Blank or comment *** Blank or comment *** Blank or comment NOTE: Enter 'c' at the ipdb> prompt to continue execution. > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(1)() ----> 1 """Wiener filtering a noisy raccoon face: this module is buggy""" 2 3 import numpy as np 4 import scipy as sp 5 import matplotlib.pyplot as plt * Set a break point at line 29 using ``b 29``: .. code-block:: ipython ipdb> n > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(3)() 1 """Wiener filtering a noisy raccoon face: this module is buggy""" 2 ----> 3 import numpy as np 4 import scipy as sp 5 import matplotlib.pyplot as plt ipdb> b 29 Breakpoint 1 at /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py:29 * Continue execution to next breakpoint with ``c(ont(inue))``: .. code-block:: ipython ipdb> c > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(29)iterated_wiener() 27 Do not use this: this is crappy code to demo bugs! 28 """ 1--> 29 noisy_img = noisy_img 30 denoised_img = local_mean(noisy_img, size=size) 31 l_var = local_var(noisy_img, size=size) * Step into code with ``n(ext)`` and ``s(tep)``: ``next`` jumps to the next statement in the current execution context, while ``step`` will go across execution contexts, i.e. enable exploring inside function calls: .. code-block:: ipython ipdb> s > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(30)iterated_wiener() 28 """ 1 29 noisy_img = noisy_img ---> 30 denoised_img = local_mean(noisy_img, size=size) 31 l_var = local_var(noisy_img, size=size) 32 for i in range(3): ipdb> n > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(31)iterated_wiener() 1 29 noisy_img = noisy_img 30 denoised_img = local_mean(noisy_img, size=size) ---> 31 l_var = local_var(noisy_img, size=size) 32 for i in range(3): 33 res = noisy_img - denoised_img * Step a few lines and explore the local variables: .. code-block:: ipython ipdb> n > /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py(32)iterated_wiener() 30 denoised_img = local_mean(noisy_img, size=size) 31 l_var = local_var(noisy_img, size=size) ---> 32 for i in range(3): 33 res = noisy_img - denoised_img 34 noise = (res**2).sum() / res.size ipdb> print(l_var) [[2571 2782 3474 ... 3008 2922 3141] [2105 708 475 ... 469 354 2884] [1697 420 645 ... 273 236 2517] ... [2437 345 432 ... 413 387 4188] [2598 179 247 ... 367 441 3909] [2808 2525 3117 ... 4413 4454 4385]] ipdb> print(l_var.min()) 0 Oh dear, nothing but integers, and 0 variation. Here is our bug, we are doing integer arithmetic. .. topic:: Raising exception on numerical errors When we run the :download:`wiener_filtering.py` file, the following warnings are raised: .. code-block:: ipython In [2]: %run wiener_filtering.py /home/jarrod/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py:35: RuntimeWarning: divide by zero encountered in divide noise_level = 1 - noise / l_var We can turn these warnings in exception, which enables us to do post-mortem debugging on them, and find our problem more quickly: .. code-block:: ipython In [3]: np.seterr(all='raise') Out[3]: {'divide': 'warn', 'over': 'warn', 'under': 'ignore', 'invalid': 'warn'} In [4]: %run wiener_filtering.py --------------------------------------------------------------------------- FloatingPointError Traceback (most recent call last) File ~/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py:52 49 plt.matshow(face[cut], cmap=plt.cm.gray) 50 plt.matshow(noisy_face[cut], cmap=plt.cm.gray) ---> 52 denoised_face = iterated_wiener(noisy_face) 53 plt.matshow(denoised_face[cut], cmap=plt.cm.gray) 55 plt.show() File ~/src/scientific-python-lectures/advanced/debugging/wiener_filtering.py:35, in iterated_wiener(noisy_img, size) 33 res = noisy_img - denoised_img 34 noise = (res**2).sum() / res.size ---> 35 noise_level = 1 - noise / l_var 36 noise_level[noise_level < 0] = 0 37 denoised_img = np.int64(noise_level * res) FloatingPointError: divide by zero encountered in divide Other ways of starting a debugger .................................... * **Raising an exception as a poor man break point** If you find it tedious to note the line number to set a break point, you can simply raise an exception at the point that you want to inspect and use IPython's ``%debug``. Note that in this case you cannot step or continue the execution. * **Debugging test failures using nosetests** You can run ``nosetests --pdb`` to drop in post-mortem debugging on exceptions, and ``nosetests --pdb-failure`` to inspect test failures using the debugger. In addition, you can use the IPython interface for the debugger in nose by installing the nose plugin `ipdbplugin `_. You can than pass ``--ipdb`` and ``--ipdb-failure`` options to nosetests. * **Calling the debugger explicitly** Insert the following line where you want to drop in the debugger:: import pdb; pdb.set_trace() .. warning:: When running ``nosetests``, the output is captured, and thus it seems that the debugger does not work. Simply run the nosetests with the ``-s`` flag. .. topic:: Graphical debuggers and alternatives * `pudb `_ is a good semi-graphical debugger with a text user interface in the console. * The `Visual Studio Code `_ integrated development environment includes a debugging mode. * The `Mu editor `_ is a simple Python editor that includes a debugging mode. Debugger commands and interaction ---------------------------------- ============ ====================================================================== ``l(list)`` Lists the code at the current position ``u(p)`` Walk up the call stack ``d(own)`` Walk down the call stack ``n(ext)`` Execute the next line (does not go down in new functions) ``s(tep)`` Execute the next statement (goes down in new functions) ``bt`` Print the call stack ``a`` Print the local variables ``!command`` Execute the given **Python** command (by opposition to pdb commands ============ ====================================================================== .. warning:: **Debugger commands are not Python code** You cannot name the variables the way you want. For instance, if in you cannot override the variables in the current frame with the same name: **use different names than your local variable when typing code in the debugger**. Getting help when in the debugger ................................. Type ``h`` or ``help`` to access the interactive help: .. sourcecode:: pycon ipdb> help Documented commands (type help ): ======================================== EOF commands enable ll pp s until a condition exceptions longlist psource skip_hidden up alias cont exit n q skip_predicates w args context h next quit source whatis b continue help p r step where break d ignore pdef restart tbreak bt debug j pdoc return u c disable jump pfile retval unalias cl display l pinfo run undisplay clear down list pinfo2 rv unt Miscellaneous help topics: ========================== exec pdb Undocumented commands: ====================== interact Debugging segmentation faults using gdb ========================================== If you have a segmentation fault, you cannot debug it with pdb, as it crashes the Python interpreter before it can drop in the debugger. Similarly, if you have a bug in C code embedded in Python, pdb is useless. For this we turn to the gnu debugger, `gdb `_, available on Linux. Before we start with gdb, let us add a few Python-specific tools to it. For this we add a few macros to our ``~/.gdbinit``. The optimal choice of macro depends on your Python version and your gdb version. I have added a simplified version in :download:`gdbinit`, but feel free to read `DebuggingWithGdb `_. To debug with gdb the Python script :download:`segfault.py`, we can run the script in gdb as follows .. sourcecode:: console $ gdb python ... (gdb) run segfault.py Starting program: /usr/bin/python segfault.py [Thread debugging using libthread_db enabled] Program received signal SIGSEGV, Segmentation fault. _strided_byte_copy (dst=0x8537478 "\360\343G", outstrides=4, src= 0x86c0690
, instrides=32, N=3, elsize=4) at numpy/core/src/multiarray/ctors.c:365 365 _FAST_MOVE(Int32); (gdb) We get a segfault, and gdb captures it for post-mortem debugging in the C level stack (not the Python call stack). We can debug the C call stack using gdb's commands: .. sourcecode:: console (gdb) up #1 0x004af4f5 in _copy_from_same_shape (dest=, src=, myfunc=0x496780 <_strided_byte_copy>, swap=0) at numpy/core/src/multiarray/ctors.c:748 748 myfunc(dit->dataptr, dest->strides[maxaxis], As you can see, right now, we are in the C code of numpy. We would like to know what is the Python code that triggers this segfault, so we go up the stack until we hit the Python execution loop: .. sourcecode:: console (gdb) up #8 0x080ddd23 in call_function (f= Frame 0x85371ec, for file /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py, line 156, in _leading_trailing (a=, _nc=), throwflag=0) at ../Python/ceval.c:3750 3750 ../Python/ceval.c: No such file or directory. in ../Python/ceval.c (gdb) up #9 PyEval_EvalFrameEx (f= Frame 0x85371ec, for file /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py, line 156, in _leading_trailing (a=, _nc=), throwflag=0) at ../Python/ceval.c:2412 2412 in ../Python/ceval.c (gdb) Once we are in the Python execution loop, we can use our special Python helper function. For instance we can find the corresponding Python code: .. sourcecode:: console (gdb) pyframe /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py (158): _leading_trailing (gdb) This is numpy code, we need to go up until we find code that we have written: .. sourcecode:: console (gdb) up ... (gdb) up #34 0x080dc97a in PyEval_EvalFrameEx (f= Frame 0x82f064c, for file segfault.py, line 11, in print_big_array (small_array=, big_array=), throwflag=0) at ../Python/ceval.c:1630 1630 ../Python/ceval.c: No such file or directory. in ../Python/ceval.c (gdb) pyframe segfault.py (12): print_big_array The corresponding code is: .. literalinclude:: segfault.py :language: py :lines: 8-14 Thus the segfault happens when printing ``big_array[-10:]``. The reason is simply that ``big_array`` has been allocated with its end outside the program memory. .. note:: For a list of Python-specific commands defined in the `gdbinit`, read the source of this file. ____ .. topic:: **Wrap up exercise** :class: green The following script is well documented and hopefully legible. It seeks to answer a problem of actual interest for numerical computing, but it does not work... Can you debug it? **Python source code:** :download:`to_debug.py ` .. only:: html .. literalinclude:: to_debug.py