The last post covered some technical background on using vmprof to profile Python with compiled or JIT'ed extensions. Now I've created a prototype that converts the output to callgrind format so it can be viewed with KCachegrind.
To install the prototype using the Anaconda distribution:
- Create a new environment (if you do not use a new environment, these packages may conflict with an existing Numba install):
conda create -n profiling python numpy
- Switch to the new environment:
source activate profiling
- Install prototype versions of Numba and llvmlite:
conda install -c https://conda.anaconda.org/mdewing numba-profiling
- Install prototype version of vmprof:
conda install -c https://conda.anaconda.org/mdewing vmprof-numba
- Make sure libunwind is installed. On Ubuntu:
apt-get install libunwind8-dev
(On Ubuntu, it must be the -dev version. If it is not installed, trying to run vmprof fails with the error message
ImportError: libunwind.so.8: cannot open shared object file: No such file or directory)
- Install KCachegrind. On Ubuntu:
apt-get install kcachegrind
There is a wrapper (vmprofrun) that automates the running and processing steps. To use it, run
vmprofrun <target python script> [arguments to the python script]
(No need to specify python; it gets added to the command line automatically.)
By default it will output vmprof-<pid>.out, which can be viewed in KCachegrind.
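To give a feel for what the converter has to produce, here is a minimal sketch of writing sampled call stacks in callgrind format. It is not the code from vmproftocallgrind. The function name write_callgrind, the "Samples" event name, and the input representation (a list of stacks, outermost frame first, one sample per stack) are all assumptions made for illustration. Sampling gives no real call counts, so the sketch reuses the sample count in the calls= line, and every position is written as line 0.

```python
import io
from collections import defaultdict


def write_callgrind(stacks, out):
    """Write sampled stacks to `out` as a minimal callgrind profile.

    `stacks` is an iterable of lists of function names, outermost first.
    Each stack counts as one sample (an assumption of this sketch).
    """
    self_cost = defaultdict(int)   # samples where the function is innermost
    edge_cost = defaultdict(int)   # samples passing through (caller, callee)
    for stack in stacks:
        if not stack:
            continue
        self_cost[stack[-1]] += 1
        # Use a set so a recursive stack counts each edge once per sample.
        for edge in set(zip(stack, stack[1:])):
            edge_cost[edge] += 1

    functions = set(self_cost)
    for caller, callee in edge_cost:
        functions.update((caller, callee))

    out.write("# callgrind format\n")
    out.write("events: Samples\n")
    for name in sorted(functions):
        out.write("\nfn=%s\n" % name)
        # No line information is available, so the position is always 0.
        out.write("0 %d\n" % self_cost.get(name, 0))
        for (caller, callee), cost in sorted(edge_cost.items()):
            if caller == name:
                out.write("cfn=%s\n" % callee)
                out.write("calls=%d 0\n" % cost)  # sample count stands in for call count
                out.write("0 %d\n" % cost)        # inclusive cost of the callee


if __name__ == "__main__":
    buf = io.StringIO()
    write_callgrind([["main", "work"], ["main", "work"], ["main"]], buf)
    print(buf.getvalue())
```

Writing the result to a file should give something KCachegrind can open, though I have only sketched the subset of the format needed for a function-level profile.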
The vmprofrun tool saves the vmprof output during the run to out.vmprof. After the run, it automatically copies the /tmp/perf-<pid>.map file to the current directory (if running under Numba). Finally, it runs vmproftocallgrind using these files as input.
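The perf map step can be sketched roughly as follows. This is an illustration, not the vmprofrun source: the helper names copy_perf_map and parse_perf_map are made up here, and the parser assumes the usual Linux perf map layout of one symbol per line (hex start address, hex size, symbol name).

```python
import os
import shutil


def copy_perf_map(pid, dest_dir=".", tmp_dir="/tmp"):
    """Copy perf-<pid>.map from tmp_dir into dest_dir.

    Returns the path of the copy, or None if no map file exists
    (for example, when the run did not go through Numba).
    """
    name = "perf-%d.map" % pid
    src = os.path.join(tmp_dir, name)
    if not os.path.exists(src):
        return None
    dst = os.path.join(dest_dir, name)
    shutil.copy(src, dst)
    return dst


def parse_perf_map(path):
    """Return a list of (start_address, size, symbol_name) tuples."""
    symbols = []
    with open(path) as f:
        for line in f:
            # Split at most twice so symbol names may contain spaces.
            parts = line.split(None, 2)
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            start, size, symbol = parts
            symbols.append((int(start, 16), int(size, 16), symbol.strip()))
    return symbols
```

With the symbols loaded, an address from a sampled stack can be resolved by finding the entry whose range start <= address < start + size contains it.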
- Only works on 64-bit Linux
- Function-level profiles only - no line information (for either python or native code)
- Sometimes the profiling hangs during the run - kill the process and try again.
- Works with Python 2.7 or 3.4
- Not well validated or tested yet
- It does not work well yet with the existing vmprof web visualization and CLI tools.
- The stack dump tool will process the stacks to remove the Python interpreter frames.
- By default the Numba Dispatcher_call level is removed. Otherwise the call graph in KCachegrind gets tangled by all the call paths running through this function.
- It should work with C extensions and Cython as well.
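The frame filtering described in the notes above could look something like the sketch below. The function filter_stack and the list of interpreter prefixes are assumptions for illustration; the names actually matched by the prototype may differ. PyEval_EvalFrameEx is used in the example only because it is a well-known CPython interpreter function.

```python
# Assumed prefixes of CPython interpreter functions to hide (illustrative only).
INTERPRETER_PREFIXES = ("PyEval_", "PyObject_Call")


def filter_stack(frames, drop_names=("Dispatcher_call",),
                 drop_prefixes=INTERPRETER_PREFIXES):
    """Return `frames` without interpreter frames and without dropped names.

    `frames` is a list of function names, outermost first. Removing a frame
    joins its caller directly to its callee, which is what untangles the
    call graph when many paths run through one dispatch function.
    """
    return [
        name for name in frames
        if name not in drop_names and not name.startswith(drop_prefixes)
    ]


if __name__ == "__main__":
    stack = ["main", "PyEval_EvalFrameEx", "Dispatcher_call",
             "PyEval_EvalFrameEx", "jit_kernel"]
    print(filter_stack(stack))
```

Passing drop_names=() keeps the Dispatcher_call level, for cases where seeing the dispatch overhead matters more than a clean graph.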