Tools I use: matplotlib


I get a lot of questions along the lines of “Hey Diwaker, what do you use for blah?” (insert your requirement there). Apparently, I have a talent for finding “smart” tools that people like using. So I figured I should blog about some of the tools I use. Maybe others can benefit.

I’ll skip a couple of obvious ones here: my editor of choice is [[http://floatingsun.net/blog/tags/vim|vim]], and I use [[http://floatingsun.net/blog/tags/wordpress|wordpress]] for my blogs.

Let me instead come to something that //every// grad student ends up doing a lot of: making graphs (well, almost every; my friends in theory hardly draw graphs). And frequently, even people outside of research need to make pretty-looking graphs and plots. Within academia and research, [[http://www.gnuplot.info|gnuplot]] has been the de facto plotting tool for as long as I can remember (I’m pretty sure it goes back at least a decade, if not more). Outside the research community, most people tend to use the plotting tools that come with Office software: M$ Excel, Powerpoint and the like.

Don’t even get me started on the Excel/Powerpoint crap. Maybe they are good enough for quick and dirty work. But for anything more than that, for doing any //real// analysis/visualization, they are pretty much useless. For me, a good tool must meet the following requirements:

* It must be scriptable: I don’t want to have to open a bloated GUI, click 100 buttons and drag-and-select columns to get a plot out. When dealing with large amounts of data stored in myriads of files, it is **critical** to be able to script/automate the process.
* It must support multiple output formats: EPS, PS, PDF, PNG, JPG, SVG are the ones I usually need. (E)PS/PDF for embedding in papers. PNG/JPG for viewing/emailing. SVG is just cool :-)
* It must support a variety of graph types: bar charts, pie charts, histograms, error bars.
* It should be **highly** customizable: tick size, label fonts, colors, line styles, thicknesses, positioning, subplots, grids, log scales, transparency, marker styles — EVERYTHING.
* Easy things should be **really** easy, and complicated things must be possible.

GNUPlot has served us quite well over the years. At least in CSE, I can confidently say that close to 80% of all graphs in papers are done using GNUPlot. In rare cases it’s OpenOffice/Excel. But GNUPlot is showing its age now: it can only deal with very simplistic input formats, it’s not very customizable, and it supports a very limited number of graph types (AFAIK, it //still// doesn’t support bar charts natively). But for me, the biggest gripe is that it forces me to split my data analysis into two phases: in the first phase I write some scripts (typically in Python) to process the raw data into a form that can be consumed by GNUPlot; in the second phase I write another script (in GNUPlot) to do the actual plotting.

Enter [[http://matplotlib.sourceforge.net/|Matplotlib]]: this is easily the **best** plotting library I have ever used. Endlessly customizable, Matplotlib can do almost [[http://matplotlib.sourceforge.net/screenshots.html|any kind of plot you can imagine]] and some more. Apart from the traditional object-oriented interface, Matplotlib also gives a very simple MATLAB (R) like interface for easy plotting. The API maintains a high degree of compatibility with the MATLAB API, so MATLAB users will feel right at home.
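To give a feel for the object-oriented side, here is a minimal sketch of a small line plot built from explicit figure and axes objects. It assumes a reasonably current Matplotlib where matplotlib.pyplot and plt.subplots are available (much newer than the release discussed in this post); the helper name make_line_figure and the output file name line.png are made up for illustration, and the Agg backend is picked only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt


def make_line_figure(values):
    """Build a labelled line plot using explicit Figure and Axes objects."""
    fig, ax = plt.subplots()
    (line,) = ax.plot(values)  # x defaults to 0, 1, 2, ...
    ax.set_xlabel("time")
    ax.set_ylabel("volts")
    ax.set_title("A line")
    return fig, ax, line


# Hypothetical usage: build the figure, then write it to a file.
fig, ax, line = make_line_figure([1, 2, 3])
fig.savefig("line.png")
```

Every piece of the plot is an object you can hold on to and tweak later, which is what makes the deeper customization possible.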

Furthermore, since it is written in Python, I can unify my data analysis, and the possibilities are endless. I can feed all kinds of data directly to Matplotlib. I can process, analyze and plot in the same script. I can make my scripts highly generic (since they are in Python, I can pass command line parameters and what not; none of this is possible with GNUPlot).
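As a rough sketch of what "process, analyze and plot in the same script" can look like, the example below averages some raw measurements and plots the result, with the output file chosen on the command line. The data, the axis labels, the --out option and the function names are all hypothetical, and it assumes a current Matplotlib with matplotlib.pyplot.

```python
import argparse
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering, no display required
import matplotlib.pyplot as plt


def mean_by_key(rows):
    """Group (key, value) pairs and return the sorted keys with their means."""
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    keys = sorted(groups)
    means = [sum(groups[k]) / len(groups[k]) for k in keys]
    return keys, means


def plot_means(keys, means, out_path):
    """Plot the means and save them; the format follows the file extension."""
    fig, ax = plt.subplots()
    ax.plot(keys, means, marker="o")
    ax.set_xlabel("threads")            # hypothetical labels
    ax.set_ylabel("mean latency (ms)")
    fig.savefig(out_path)
    plt.close(fig)
    return out_path


def main(argv=None):
    default_out = os.path.join(tempfile.gettempdir(), "latency.png")
    parser = argparse.ArgumentParser(description="Process and plot in one script")
    parser.add_argument("--out", default=default_out, help="output image path")
    args = parser.parse_args(argv)
    # Hypothetical raw measurements: (thread count, latency in ms)
    raw = [(1, 10.0), (1, 12.0), (2, 7.0), (2, 9.0), (4, 5.0)]
    keys, means = mean_by_key(raw)
    return plot_means(keys, means, args.out)


if __name__ == "__main__":
    print(main())
```

There is no intermediate data file and no second plotting language: the numbers go straight from the analysis step into the plot.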

Here’s a code fragment to make a really simple plot (stolen from [[http://matplotlib.sourceforge.net/tutorial.html|the excellent tutorial]]):


from pylab import *
plot([1,2,3])
xlabel('time')
ylabel('volts')
title('A line')
show()

And, best of all, you get a fabulous, interactive user interface for free! Yes, I know GNUPlot has an interactive mode too, but this is beyond comparison. You can pan, zoom, go back-forward in view history, save and what not. Here’s a screenshot:
{{ http://matplotlib.sourceforge.net/tut/navcontrols2.png?300×200|Toolbar2}}

Finally, since it’s Python, it’s very portable. It supports a variety of [[http://matplotlib.sourceforge.net/backends.html|backends]], [[http://matplotlib.sourceforge.net/interactive.html|interfaces with ipython]] and is under active development (at version 0.86 currently). So next time you have to do a plot, consider doing it in style: do it with matplotlib!
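To make the backend point concrete, here is a small sketch that selects a non-interactive backend and writes the same figure in several formats. It assumes a current Matplotlib with matplotlib.pyplot; the helper save_in_formats and the file names are invented for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # select the backend before pyplot is imported
import matplotlib.pyplot as plt


def save_in_formats(fig, basename, formats=("png", "pdf", "svg", "eps")):
    """Save one figure once per format and return the written paths."""
    paths = []
    for ext in formats:
        path = f"{basename}.{ext}"
        fig.savefig(path)  # savefig infers the format from the extension
        paths.append(path)
    return paths


fig, ax = plt.subplots()
ax.plot([1, 2, 3])
out_dir = tempfile.mkdtemp()  # hypothetical output location
written = save_in_formats(fig, os.path.join(out_dir, "line"))
plt.close(fig)
```

Because nothing here opens a window, a script like this should also run from a plain console or a cron job.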

14 comments

  1. diwaker

    @nikhil: No. As I mentioned in my post, the author(s) have deliberately tried to provide a Matlab compatible interface so that users find it easy to transition. While Matlab is vastly more powerful than matplotlib, I don’t like using Matlab for several reasons:

    * closed source. No bindings for scripting languages such as Python.
    * _huge_: the installation takes over a gig. You need more than a gig of RAM to do anything worthwhile with it.
    * Licensing: I can use it at school (and from home using VPN). But the licensing is still cumbersome.

    That said, I think Matlab is really great software with some of the best documentation I have ever seen. Totally worth its price for people who need that kind of functionality.

  2. Bernardo Torres

    The bad thing about pylab is that it DEMANDS you have your display open; otherwise, it doesn’t run. Does anybody have a solution for putting pylab output in a file? Mail me. Thanks!

  3. diwaker

    *@bernardo*: It depends on which backend you are using. If you use the Agg backend, you can run everything from console. In fact, even with the GtkAgg backend, my scripts run perfectly fine from console. So I’m pretty sure it’s just a configuration issue; try changing your backend.

  4. iztok

    from pylab import *
    matplotlib.use('PS')

    plot(…)

    savefig("myfile.eps")

    this does not need any display but only generates an .eps file.

  5. Bernardo Torres

    Yeah!
    Except that:
    import matplotlib
    matplotlib.use(WhateverOutput)

    needs to be done first, at least in Debian Sid, since it uses GTK by default :\

  6. diwaker

    *@greg*: Thanks for the tutorial! Yeah, weird about the trackbacks not appearing. I’ll check up on my site config to see if anything’s broken. I’ve recently upgraded some stuff so it might very well be the case.

