Profiling in Python

Here at HumanGeo we use a lot of Python, and it is tons of fun. Python is a great language for writing beautiful and functional code amazingly fast, and it is most definitely my favorite language to use both privately and professionally. However, even though it is a wonderful language, Python can be painfully slow. Luckily, there are some amazing tools to help profile your code so that you can keep your beautiful code fast.

When I started working here at HumanGeo, I was tasked with taking a program that took many hours to run, finding the bottlenecks, and then doing whatever I could to make it run faster. I used many tools, including cProfile, PyCallGraph, and even PyPy (an alternate, fast interpreter for Python), to determine the best ways to optimize the program. I will go through how I used all of these tools, except for PyPy (which I ruled out to maintain interpreter consistency in production), and how they can help even the most seasoned developers find ways to better optimize their code.

Disclaimer: do not prematurely optimize! I’ll just leave this here.


Let's talk about some of the handy tools that you can use to profile Python code.


cProfile

The CPython distribution comes with two profiling tools, profile and cProfile. Both share the same API and should behave the same; however, the former has greater runtime overhead, so we will stick with cProfile for this post.

cProfile is a handy tool for getting a nice, greppable profile of your code and a good idea of where its hot parts are. Let's look at some sample slow code:

-> % cat
import time

def main():
    total = 0
    for i in range(10):
        total += expensive(i // 2)
    return total

def expensive(t):
    time.sleep(t)
    return t

if __name__ == '__main__':
    print(main())
Here we are simulating a long-running program by calling time.sleep and pretending that the result matters. Let's profile this and see what we find:

-> % python -m cProfile
         34 function calls in 20.030 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 (<module>)
        1    0.000    0.000    0.000    0.000
        7    0.000    0.000    0.000    0.000
       10    0.000    0.000   20.027    2.003 (expensive)
        1    0.002    0.002   20.030   20.030 (<module>)
        1    0.000    0.000   20.027   20.027 (main)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {print}
        1    0.000    0.000    0.000    0.000 {range}
       10   20.027    2.003   20.027    2.003 {time.sleep}

Now, this is very trivial code, but I find this output not as helpful as it could be. The list of calls is sorted alphabetically, which has no importance to us; I would much rather see the list sorted by number of calls or by cumulative run time. Luckily, the -s argument exists, and we can sort the list and see the hot parts of our code right now!

-> % python -m cProfile -s calls
         34 function calls in 20.028 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10    0.000    0.000   20.025    2.003 (expensive)
       10   20.025    2.003   20.025    2.003 {time.sleep}
        7    0.000    0.000    0.000    0.000
        1    0.000    0.000   20.026   20.026 (main)
        1    0.000    0.000    0.000    0.000
        1    0.000    0.000    0.000    0.000 {print}
        1    0.000    0.000    0.000    0.000 (<module>)
        1    0.003    0.003   20.028   20.028 (<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {range}

Ah! Now we see that the hot code is in our expensive function, which ends up calling time.sleep enough times to cause an annoying slowdown.

The list of valid arguments to the -s parameter can be found in the Python documentation.  Make sure to use the output option, -o, if you want to save these results to a file!
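For example, a profile saved with -o can be loaded and re-sorted later with the standard library's pstats module. Here is a minimal sketch (the work function and the profile.out filename are just examples):

```python
import cProfile
import pstats

def work():
    # A stand-in workload to profile
    return sum(i * i for i in range(100000))

# Collect a profile and dump it to a file, like -o does
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
profiler.dump_stats('profile.out')

# Load it back and sort however we like, just like -s on the command line
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(5)  # top 5 by cumulative time
```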

With the basics down, let's look at some other ways that we can find hot code using profiling tools.


PyCallGraph

PyCallGraph can be seen as a visual extension of cProfile, letting us follow the flow of the code through a nice Graphviz image. PyCallGraph is not part of the standard Python installation, but it can be simply installed with:

-> % pip install pycallgraph

We can run this graphical application with the following command:

-> % pycallgraph graphviz -- python

A pycallgraph.png file should be created in the directory where you ran the script, and it should give you some familiar results if you have already run cProfile. The numbers should be the same as the data we got from cProfile; however, the benefit of PyCallGraph is its ability to show the relationships between the functions being called.

Let us look at what that graph looks like:

This is so handy! It shows the flow of the program and nicely labels each function, module, and file that the program runs through, along with its run time and number of calls. Running this on a big application generates a large image, but with the coloring it is quite easy to find the code that matters. Here is a graph from the PyCallGraph documentation, showing the flow of code involving complex regular expression calls:

source code of graph

What can we do with this information?

Once we determine the cause of the slow code, we can properly choose a course of action to speed it up. Let's talk about some possible solutions, depending on the issue.


I/O Bound Code

If you find your code is heavily input/output dependent, such as when sending many web requests, then you may be able to solve the problem with Python's standard threading module. Threading that is not I/O related is not well suited to Python, due to CPython's GIL, which prevents the interpreter from using more than one core at a time for CPU-bound tasks.
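For instance, a small pool of threads can overlap the waiting. Below is a minimal sketch using concurrent.futures (which is built on the threading module), with time.sleep standing in for network latency and made-up URLs:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for a web request; the GIL is released while sleeping,
    # just as it is while waiting on a socket
    time.sleep(0.1)
    return url.upper()

urls = ['http://example.com/page%d' % i for i in range(8)]

# The eight waits overlap, so this takes roughly 0.1s rather than 0.8s
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))
```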


Regular Expressions

You know what they say: once you decide to use regular expressions to fix a problem, you now have two problems. Regular expressions are hard to get right and hard to maintain. I could write a whole separate blog post on this (but I will not; regexes are hard, and there are much better posts than one I could write), so I will just give a few quick tips:

  1. Avoid .*! Greedy catch-alls are slow, and using character classes as much as possible can help with this.
  2. Do not use a regex at all! Many regexes can be replaced with simple string methods, such as str.startswith and str.endswith. Check out the str documentation for tons of great info.
  3. Use re.VERBOSE! It lets you whitespace-format and comment your patterns, and Python's regex engine is great and super helpful.
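To illustrate the last two tips, here is a small sketch (the filename and pattern are made up for the example):

```python
import re

filename = 'report-2014.csv'

# Tip 2: simple prefix/suffix checks do not need a regex at all
if filename.startswith('report-') and filename.endswith('.csv'):
    print('looks like a report')

# Tip 3: re.VERBOSE lets you comment the regexes you do need
pattern = re.compile(r"""
    report-         # literal prefix
    (?P<year>\d{4}) # four-digit year
    \.csv$          # literal suffix
""", re.VERBOSE)

match = pattern.match(filename)
```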

That's all I will say on regexes; there are some great posts all over the internet if you want more information.

Python Code

In the case of the code I was profiling, we were running a Python function tens of thousands of times in order to stem English words. The best part about finding that this was the culprit was that this kind of operation is easily cacheable. We were able to save the results of the function, and in the end the code ran 10 times as fast as it did before. Creating a cache in Python is super easy:

from functools import wraps

def memoize(f):
    cache = {}
    @wraps(f)  # preserve the wrapped function's name and docstring
    def inner(arg):
        if arg not in cache:
            cache[arg] = f(arg)
        return cache[arg]
    return inner

This technique is called memoization, shown here implemented as a decorator, which can be easily applied to a Python function like so:

import time

@memoize
def slow(you):
    time.sleep(3)
    print("Hello after 3 seconds, {}!".format(you))
    return 3

Now if we run this function multiple times, the result will only be computed once per argument.

>>> slow("Davis")
Hello after 3 seconds, Davis!
3
>>> slow("Davis")
3
>>> slow("Visitor")
Hello after 3 seconds, Visitor!
3
>>> slow("Visitor")
3

This was a great speedup for the project, and the code runs without a hitch.

Disclaimer: make sure memoization is only used on pure functions! If it is used on functions with side effects such as I/O, the caching can have unintended results.
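As an aside, if you are on Python 3.2 or newer, the standard library already ships this pattern as functools.lru_cache, so you do not have to roll your own memoize:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, like the memoize decorator above
def square(n):
    # Imagine an expensive, pure computation here
    return n * n

square(4)  # computed
square(4)  # served from the cache
print(square.cache_info())
```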

Other cases

If your code is not readily memoizable, your algorithm is not something crazy like O(n!), and your profile is 'flat' (that is, there are no obvious hot sections of code), then maybe you should look into another runtime or language. PyPy is a great option, as is writing a C extension for the meat of your algorithm. Luckily, what I was working on did not come to this, but the option is there if it is needed.


Conclusion

Profiling code can help you understand the flow of a project, where the hot code is, and what you can do as a developer to speed it up. Python's profiling tools are great, super easy to use, and in-depth enough to help you get to the root of the cause fast. Python is not meant to be a fast language, but that does not mean you should be writing slow code! Take charge of your algorithms, do not forget to profile, and never prematurely optimize.

We’re hiring!  If you’re interested in geospatial, big data, social media analytics, Amazon Web Services (AWS), visualization, and/or the latest UI and server technologies, drop us an e-mail at

No Soup for You: When Beautiful Soup Doesn’t Like Your XML

Over the years, Beautiful Soup has probably saved us more hours on scraping, data collection, and other projects than we can count. Crummy’s landing page for the library even says:

Beautiful Soup is here to help. Since 2004, it’s been saving programmers hours or days of work on quick-turnaround screen scraping projects.

You can plot me directly in the “days” column. When I’m starting a Python project that requires me to parse through HTML data, the first dependency I’ll pull is BeautifulSoup. It turns what would normally be a nasty Perl-esque mess into something nice and Pythonic, keeping sanity intact. But what about structured data other than HTML? I’ve also learned that BS can be a huge boon for XML data, but not without a couple of speed bumps.
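As a quick refresher, this is the kind of thing that makes BS such a time-saver for HTML (a tiny made-up document):

```python
from bs4 import BeautifulSoup

html = '<html><body><a href="http://example.com">Example</a></body></html>'

# html.parser is the built-in fallback, so no extra dependencies are needed
soup = BeautifulSoup(html, 'html.parser')
anchor = soup.find('a')
print(anchor['href'], anchor.get_text())
```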

Enter data from the client (not real data, but play along for a moment):

  <sport name="baseball">
    <team name="Braves" city="Atlanta">
      <link>http://example.com/braves</link>
    </team>
  </sport>
Seems well-structured! Our customer just needs all the links that we have in the data. We fire up our editor of choice and roll up our sleeves.

from __future__ import print_function
from bs4 import BeautifulSoup

# Make our soup
with open('data.xml') as infile:
    blob = infile.read()

# Use LXML for blazing speed
soup = BeautifulSoup(blob, 'lxml')
print(soup.find_all('link'))

We get in return:

[<link/>]

Wait, what? What happened to our content? This is a pretty basic BS use case, but something strange is happening. Well, I’ll come back to this and start working with their other very hypothetical data sets, where <link> tags become <resource> tags but the rest of the data is structured exactly the same. This time around:

  <sport name="baseball">
    <team name="Braves" city="Atlanta">
      <resource>http://example.com/braves</resource><!-- Changed from link -->
    </team>
  </sport>

…and corresponding result…

>>> print(soup.find_all('resource'))
[<resource>http://example.com/braves</resource>]
Interesting! To compound our problem, we’re on a customer site where we don’t have internet access to grab that sweet, sweet documentation we crave. After toiling on a Stack Overflow dump in all the wrong places, I was reminded of one of my favorite blog posts by SO’s founder, Jeff Atwood. Read the Source. But what was I looking for? Well, let’s dig around for <link> tags and see what turns up.

Sure enough, after some quick searches, we find what I believe to be the smoking gun (for those following along at home, lines 228/229 in v4.3.2).

empty_element_tags = set(['br', 'hr', 'input', 'img', 'meta',
                          'spacer', 'link', 'frame', 'base'])

We have a seemingly harmless word with “link” in our XML, but it means something very different in HTML, and more specifically to the TreeBuilder implementation that lxml is using. As a test, if I change our <link>-turned-<resource> tags into <base> tags, we get the same result – no content. It also turns out that if you have lxml installed, BeautifulSoup4 will prefer it for parsing. Uninstalling it grants us the results we want – tags with content. The stricter (but faster) TreeBuilder implementations from lxml take precedence over the built-in HTMLParser or html5lib (if you have it installed). How do we know that? Back to the source code!

bs4/builder/, lines 304:321

# Builders are registered in reverse order of priority, so that custom
# builder registrations will take precedence. In general, we want lxml
# to take precedence over html5lib, because it's faster. And we only
# want to use HTMLParser as a last result.
from . import _htmlparser
register_treebuilders_from(_htmlparser)
try:
    from . import _html5lib
    register_treebuilders_from(_html5lib)
except ImportError:
    # They don't have html5lib installed.
    pass
try:
    from . import _lxml
    register_treebuilders_from(_lxml)
except ImportError:
    # They don't have lxml installed.
    pass
As it turns out, when creating your soup, ‘lxml’ != ‘xml’. Changing the soup creation gets us the results we’re looking for (UPDATE: a Reddit commenter “helpfully” pointed out the corresponding doc). BeautifulSoup was still falling back to the HTML tree builders, which is why we were seeing the results we were when specifying ‘lxml’.

# Parse as XML for sanity
soup = BeautifulSoup(blob, 'xml')
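To see the difference side by side, here is a minimal sketch (made-up data; the ‘xml’ builder requires lxml to be installed):

```python
from bs4 import BeautifulSoup

blob = '<team name="Braves"><link>http://example.com/braves</link></team>'

# The HTML tree builders treat <link> as a void element, so the URL
# ends up outside the tag...
html_soup = BeautifulSoup(blob, 'html.parser')
print(html_soup.find('link'))

# ...while the XML builder keeps the content where it belongs
xml_soup = BeautifulSoup(blob, 'xml')
print(xml_soup.find('link').get_text())
```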

While I didn’t find that magic code snippet to fix everything at first (UPDATE: thanks, Reddit), we found our problem, even if we went really roundabout to get there. Understanding why it was happening made me feel a lot better in the end. It’s easy to get frustrated when coding, but always remember: read the docs, and Read the Source, Luke. It might help you understand the problem.
