Prototype for Profiling Python
The last post covered some technical background on using vmprof to profile Python with compiled or JIT'ed extensions. Now I've created a prototype that converts the vmprof output to callgrind format so it can be viewed with KCachegrind.
To install the prototype using the Anaconda distribution:
- Create a new environment (if you do not use a new environment, these packages may conflict with an existing Numba install):
    conda create -n profiling python numpy
- Switch to the new environment:
    source activate profiling
- Install prototype versions of Numba and llvmlite:
    conda install -c https://conda.anaconda.org/mdewing numba-profiling
- Install prototype version of vmprof:
    conda install -c https://conda.anaconda.org/mdewing vmprof-numba
- Make sure libunwind is installed. On Ubuntu:
    apt-get install libunwind8-dev
  (On Ubuntu, it must be the -dev version. If it is not installed, the error message when trying to run vmprof is: ImportError: libunwind.so.8: cannot open shared object file: No such file or directory)
- Install KCachegrind. On Ubuntu:
    apt-get install kcachegrind
There is a wrapper (vmprofrun) that automates the running and processing steps.
To use it, run vmprofrun <target python script> [arguments to the python script].
(No need to specify python - that gets added to the command line automatically.)
By default it will output vmprof-<pid>.out, which can be viewed in KCachegrind.
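For readers who have not seen the callgrind format, here is a minimal sketch of writing function-level costs in it. This is not the prototype's converter: the function `write_callgrind`, the event name `Samples`, the placeholder file name, and the sample data are all made up for illustration. Because the profiles here are function-level only, every line number is written as 0.

```python
import io

def write_callgrind(out, self_cost, calls, filename="unknown"):
    """Write function-level sample counts in callgrind format.

    self_cost: {function: samples attributed to the function itself}
    calls:     {(caller, callee): (call_count, inclusive_samples)}
    filename:  placeholder source file name (hypothetical, no real file needed)
    """
    out.write("# callgrind format\n")
    out.write("events: Samples\n")
    out.write("fl=%s\n" % filename)
    for fn in sorted(self_cost):
        out.write("\n")
        out.write("fn=%s\n" % fn)
        # Cost line: <line number> <cost>. Line number 0 means "unknown".
        out.write("0 %d\n" % self_cost[fn])
        for (caller, callee), (count, inclusive) in sorted(calls.items()):
            if caller != fn:
                continue
            out.write("cfn=%s\n" % callee)
            # calls=<count> <target line>, then the inclusive cost of those calls.
            out.write("calls=%d 0\n" % count)
            out.write("0 %d\n" % inclusive)

# Made-up data: main spends 2 samples itself and 8 inside work.
buf = io.StringIO()
write_callgrind(buf, {"main": 2, "work": 8}, {("main", "work"): (1, 8)})
text = buf.getvalue()
```

The `fn=` lines name each function, and the `cfn=`/`calls=` pairs are what let KCachegrind draw the call graph.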
Underneath, the vmprofrun tool saves the vmprof output during the run to out.vmprof. After the run, it automatically copies the /tmp/perf-<pid>.map file to the current directory (if running under Numba).
It moves out.vmprof to out-<pid>.vmprof.
Finally it runs vmproftocallgrind using these files as input.
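The steps above can be sketched roughly as follows. This is an illustration, not the actual vmprofrun source: the helpers `profile_command`, `collect_outputs`, and `run` are hypothetical, and the use of `python -m vmprof -o` assumes the prototype accepts the same output option as stock vmprof. The final vmproftocallgrind call is left out because the post does not show its arguments.

```python
import os
import shutil
import subprocess
import sys

def profile_command(script, script_args, output="out.vmprof"):
    """Command line that runs `script` under vmprof, saving samples to `output`."""
    return [sys.executable, "-m", "vmprof", "-o", output, script] + list(script_args)

def collect_outputs(pid, workdir=".", tmpdir="/tmp"):
    """Gather the files from a finished run and return their paths.

    Copies <tmpdir>/perf-<pid>.map into workdir if it exists, then
    renames out.vmprof to out-<pid>.vmprof.
    """
    inputs = []
    perf_map = os.path.join(tmpdir, "perf-%d.map" % pid)
    if os.path.exists(perf_map):  # only present when JIT code was running
        local_map = os.path.join(workdir, os.path.basename(perf_map))
        shutil.copy(perf_map, local_map)
        inputs.append(local_map)
    renamed = os.path.join(workdir, "out-%d.vmprof" % pid)
    shutil.move(os.path.join(workdir, "out.vmprof"), renamed)
    inputs.append(renamed)
    return inputs

def run(script, script_args):
    """Profile the script, then collect the inputs for the conversion step."""
    proc = subprocess.Popen(profile_command(script, script_args))
    proc.wait()
    return collect_outputs(proc.pid)
```

Tagging the files with the process id keeps the results of repeated runs from overwriting each other.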
Limitations
- Only works on 64-bit Linux
- Function-level profiles only - no line information (for either python or native code)
- Sometimes the profiling hangs during the run - kill the process and try again.
- Works with Python 2.7 or 3.4
- Not well validated or tested yet
- It does not work well yet with the existing vmprof web visualization and CLI tools.
Other notes:
- The stack dump tool will process the stacks to remove the Python interpreter frames.
- By default the Numba Dispatcher_call level is removed. Otherwise the call graph in KCachegrind gets tangled by all the call paths running through this function.
- It should work with C extensions and Cython as well.
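To illustrate the stack clean-up described in the notes above, here is a minimal sketch. It is not the prototype's code: `clean_stack` and the set of interpreter frame names are assumptions chosen for the example (`PyEval_EvalFrameEx` and `PyObject_Call` are CPython functions, but the real tool may identify interpreter frames differently).

```python
# Names treated as interpreter frames (an illustrative, incomplete set).
INTERPRETER_FRAMES = {"PyEval_EvalFrameEx", "PyEval_EvalCodeEx", "PyObject_Call"}

def clean_stack(stack, drop_dispatcher=True):
    """Return `stack` (outermost frame first) without interpreter frames.

    With drop_dispatcher=True the Numba Dispatcher_call frame is removed
    as well, so each caller links directly to the compiled function.
    """
    skip = set(INTERPRETER_FRAMES)
    if drop_dispatcher:
        skip.add("Dispatcher_call")
    return [frame for frame in stack if frame not in skip]

# Made-up sample stack mixing Python-level and native frames.
stack = ["main", "PyEval_EvalFrameEx", "run_model", "PyObject_Call",
         "Dispatcher_call", "jitted_kernel"]
cleaned = clean_stack(stack)
with_dispatcher = clean_stack(stack, drop_dispatcher=False)
```

Without the removal, every JIT-compiled function would appear as a child of the single `Dispatcher_call` node, which is what tangles the call graph.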