Prototype for Profiling Python
The last post covered some technical background on using vmprof to profile Python with compiled or JIT'ed extensions. Now I've created a prototype that converts the vmprof output to callgrind format so it can be viewed with KCachegrind.
To install the prototype using the Anaconda distribution:
- Create a new environment (if you do not use a new environment, these packages may conflict with an existing Numba install):
conda create -n profiling python numpy
- Switch to the new environment:
source activate profiling
- Install prototype versions of Numba and llvmlite:
conda install -c https://conda.anaconda.org/mdewing numba-profiling
- Install prototype version of vmprof:
conda install -c https://conda.anaconda.org/mdewing vmprof-numba
- Make sure libunwind is installed. On Ubuntu:
apt-get install libunwind8-dev
(On Ubuntu, it must be the -dev version. If it is not installed, the error message when trying to run vmprof is: ImportError: libunwind.so.8: cannot open shared object file: No such file or directory)
- Install KCachegrind. On Ubuntu:
apt-get install kcachegrind
There is a wrapper (vmprofrun) that automates the running and processing steps. To use it, run:
vmprofrun <target python script> [arguments to the python script]
(No need to specify python; that gets added to the command line automatically.)
By default it will output vmprof-<pid>.out, which can be viewed in KCachegrind.
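To give a feel for what KCachegrind reads, here is a rough sketch of turning sampled call stacks into callgrind-format text. This is not the prototype's converter: the function name stacks_to_callgrind and the input shape (a list of stacks, outermost frame first) are assumptions made for illustration. Since there is no line information, every cost is attached to line 0, and the call count is a placeholder because sampling does not record how many calls happened.

```python
from collections import defaultdict


def stacks_to_callgrind(stacks):
    """Convert sampled stacks (outermost frame first) to callgrind text.

    Hypothetical helper, for illustration only. Each sample adds one to the
    self cost of its innermost frame and one to the inclusive cost of every
    distinct caller -> callee edge on that stack.
    """
    self_cost = defaultdict(int)
    edge_cost = defaultdict(int)  # (caller, callee) -> inclusive samples
    for stack in stacks:
        if not stack:
            continue
        self_cost[stack[-1]] += 1
        # set() keeps a recursive edge from being counted twice per sample.
        for caller, callee in set(zip(stack, stack[1:])):
            edge_cost[(caller, callee)] += 1

    lines = ["# callgrind format", "events: Samples", ""]
    for fn in sorted({frame for stack in stacks for frame in stack}):
        lines.append("fn=" + fn)
        lines.append("0 %d" % self_cost[fn])  # line 0: no line info
        for (caller, callee), cost in sorted(edge_cost.items()):
            if caller == fn:
                lines.append("cfn=" + callee)
                lines.append("calls=1 0")  # real call count is unknown
                lines.append("0 %d" % cost)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    samples = [["main", "work"], ["main", "work"], ["main"]]
    print(stacks_to_callgrind(samples))
```

Writing the returned text to a file should be enough to try opening it in KCachegrind.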
Underneath, the vmprofrun tool saves the vmprof output during the run to out.vmprof. After the run, it automatically copies the /tmp/perf-<pid>.map file to the current directory (if running under Numba). It moves out.vmprof to out-<pid>.vmprof. Finally, it runs vmproftocallgrind using these files as input.
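The file handling after the run can be sketched roughly as follows. This is a minimal illustration of the steps just described, not the actual vmprofrun code: the function name collect_profile_files and its parameters are assumptions, and the final vmproftocallgrind step is left out because its command-line arguments are not shown here.

```python
import os
import shutil


def collect_profile_files(pid, workdir=".", tmpdir="/tmp"):
    """Gather the per-run files using the names described in the text.

    Hypothetical helper, for illustration only. Returns
    (vmprof_path, map_path); map_path is None when no perf map exists
    (for example, when the script did not run under Numba).
    """
    # Copy the perf map of JIT-compiled symbols, if one was written.
    map_name = "perf-%d.map" % pid
    map_source = os.path.join(tmpdir, map_name)
    map_path = None
    if os.path.exists(map_source):
        map_path = os.path.join(workdir, map_name)
        shutil.copy(map_source, map_path)

    # Rename the raw output so later runs do not overwrite it.
    raw_path = os.path.join(workdir, "out.vmprof")
    vmprof_path = os.path.join(workdir, "out-%d.vmprof" % pid)
    shutil.move(raw_path, vmprof_path)

    return vmprof_path, map_path
```

Both returned paths would then be the inputs to the conversion step.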
Limitations
- Only works on 64-bit Linux
- Function-level profiles only - no line information (for either Python or native code)
- Sometimes the profiling hangs during the run - kill the process and try again.
- Works with Python 2.7 or 3.4
- Not well validated or tested yet
- It does not work well yet with the existing vmprof web visualization and CLI tools.
Other notes:
- The stack dump tool will process the stacks to remove the Python interpreter frames.
- By default the Numba Dispatcher_call level is removed. Otherwise the call graph in KCachegrind gets tangled by all the call paths running through this function.
- It should work with C extensions and Cython as well.
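The frame filtering mentioned in these notes can be pictured with a small sketch. It is an assumption-laden illustration, not the tool's code: the set of interpreter frame names below is only an example (the tool may strip a different set), and clean_stack and keep_dispatcher are hypothetical names.

```python
# Example CPython interpreter frames; the real tool may remove others.
INTERPRETER_FRAMES = {"PyEval_EvalFrameEx", "PyObject_Call"}
# Frame name taken from the note above.
DISPATCHER_FRAME = "Dispatcher_call"


def clean_stack(stack, keep_dispatcher=False):
    """Return the stack without interpreter frames.

    The Numba dispatcher frame is also dropped unless keep_dispatcher
    is True, mirroring the default behaviour described in the notes.
    """
    skip = set(INTERPRETER_FRAMES)
    if not keep_dispatcher:
        skip.add(DISPATCHER_FRAME)
    return [frame for frame in stack if frame not in skip]


if __name__ == "__main__":
    raw = ["main", "PyEval_EvalFrameEx", "Dispatcher_call", "jitted_kernel"]
    print(clean_stack(raw))
    print(clean_stack(raw, keep_dispatcher=True))
```

With the dispatcher frame removed, callers link directly to the compiled functions they invoke, which is what keeps the call graph readable.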