I get a lot of questions along the lines of “Hey Diwaker, what do you use for blah?” (insert your requirement there). Apparently I have a talent for finding “smart” tools that people like using, so I figured I should blog about some of the tools I use. Maybe others can benefit.
I’ll skip a couple of obvious ones here: my editor of choice is [[http://floatingsun.net/blog/tags/vim|vim]], and I use [[http://floatingsun.net/blog/tags/wordpress|wordpress]] for my blogs.
Let me instead come to something that //every// grad student ends up doing a lot of: making graphs (well, almost every; my friends in theory hardly draw graphs). And frequently, even people outside of research need to make pretty-looking graphs and plots. Among academics and researchers, [[http://www.gnuplot.info|gnuplot]] has been the de facto plotting tool for as long as I can remember (I’m pretty sure it goes back at least a decade, if not more). Outside the research community, most people tend to use the plotting tools that come with office software: M$ Excel, Powerpoint and the like.
Don’t even get me started on the Excel/Powerpoint crap. Maybe they are good enough for quick and dirty work. But for anything more than that, for doing any //real// analysis/visualization, they are pretty much useless. For me, a good tool must meet the following requirements:
* It must be scriptable: I don’t want to have to open a bloated GUI, click a hundred buttons and drag-and-select columns to get a plot out. When dealing with large amounts of data stored in a myriad of files, it is **critical** to be able to script/automate the process.
* It must support multiple output formats: EPS, PS, PDF, PNG, JPG, SVG are the ones I usually need. (E)PS/PDF for embedding in papers. PNG/JPG for viewing/emailing. SVG is just cool :-)
* It must support a variety of graph types: bar charts, pie charts, histograms, error bars.
* It should be **highly** customizable: tick size, label fonts, colors, line styles, thicknesses, positioning, subplots, grids, log scales, transparency, marker styles — EVERYTHING.
* Easy things should be **really** easy, and complicated things must be possible.
GNUPlot has served us quite well over the years. At least in CSE, I can confidently say that close to 80% of all graphs in papers are done using GNUPlot. In rare cases it’s OpenOffice/Excel. But GNUPlot is showing its age now: it can only deal with very simplistic input formats, it’s not very customizable, and it supports a very limited number of graph types (AFAIK, it //still// doesn’t support bar charts natively). But for me, the biggest gripe is that it forces me to break my data analysis into two phases: in the first phase I write some scripts (typically in Python) to process the raw data into a form that can be consumed by GNUPlot; in the second phase I write another script (in GNUPlot) to do the actual plotting.
Enter [[http://matplotlib.sourceforge.net/|Matplotlib]]: this is easily the **best** plotting library I have ever used. Endlessly customizable, Matplotlib can do almost [[http://matplotlib.sourceforge.net/screenshots.html|any kind of plot you can imagine]] and some more. Apart from the traditional object-oriented interface, Matplotlib also gives a very simple MATLAB(R)-like interface for easy plotting. The API maintains a high degree of compatibility with the MATLAB API, so MATLAB users will feel right at home.
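To make the two interfaces concrete, here is a minimal sketch that draws the same curve twice, once in the MATLAB-like style and once in the object-oriented style. It assumes a reasonably current Matplotlib release (`plt.subplots` is not available in very old versions), and the data and labels are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

xs = [0, 1, 2, 3, 4]
ys = [x * x for x in xs]  # illustrative data

# MATLAB-like interface: every call acts on an implicit "current" figure.
plt.figure()
plt.plot(xs, ys, "ro-")
plt.xlabel("x")
plt.ylabel("x squared")
plt.title("MATLAB-like style")
state_fig = plt.gcf()

# Object-oriented interface: the figure and axes are explicit objects.
oo_fig, ax = plt.subplots()
ax.plot(xs, ys, "ro-")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("Object-oriented style")
```

The first style is handy for quick one-off plots; the second tends to scale better once a script manages several figures or subplots at a time.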
Furthermore, since it is written in Python, I can unify my data analysis, and the possibilities are endless. I can feed all kinds of data directly to Matplotlib. I can process, analyze and plot in the same script. I can make my scripts highly generic (since they are in Python, I can pass command line parameters and what not; none of this is possible with GNUPlot).
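As a rough sketch of what "process, analyze and plot in the same script" can look like, the example below averages some raw measurements and draws a bar chart, with the output file and title taken from the command line. Everything named here (`mean_per_group`, `plot_means`, the `--out` and `--title` flags, the sample data) is hypothetical and only meant to illustrate the idea.

```python
import argparse
import statistics

import matplotlib
matplotlib.use("Agg")  # assumption: write files only, no GUI window
import matplotlib.pyplot as plt


def mean_per_group(rows):
    """Group (label, value) pairs and return {label: mean value}."""
    groups = {}
    for label, value in rows:
        groups.setdefault(label, []).append(float(value))
    return {label: statistics.mean(values) for label, values in groups.items()}


def plot_means(means, out_path, title):
    """Draw one bar per group and write the figure to out_path."""
    labels = sorted(means)
    positions = range(len(labels))
    fig, ax = plt.subplots()
    ax.bar(positions, [means[name] for name in labels])
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_title(title)
    fig.savefig(out_path)  # output format follows the file extension
    plt.close(fig)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Analyze and plot in one step")
    parser.add_argument("--out", default="means.png")
    parser.add_argument("--title", default="Mean value per run")
    args = parser.parse_args(argv)

    # Made-up raw data standing in for whatever the experiment logged.
    raw = [("run-a", "1.0"), ("run-a", "3.0"), ("run-b", "4.0")]
    plot_means(mean_per_group(raw), args.out, args.title)


if __name__ == "__main__":
    main()
```

Saved as, say, `plot_means.py`, this could be run as `python plot_means.py --out means.pdf --title "Latency"` to get a PDF instead of a PNG without touching the code.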
Here’s a code fragment to make a really simple plot (stolen from [[http://matplotlib.sourceforge.net/tutorial.html|the excellent tutorial]]):
from pylab import *
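For readers on a current Matplotlib release, a comparably simple plot might look like the sketch below. It uses `matplotlib.pyplot` rather than the star import from `pylab`, and the data points and the output file name are made up for illustration; this is not the tutorial's own code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use an interactive backend
import matplotlib.pyplot as plt

values = [1, 4, 9, 16]          # illustrative data
plt.plot(values)                # x defaults to 0, 1, 2, 3
plt.ylabel("some numbers")
plt.savefig("simple_plot.png")  # with an interactive backend, plt.show() would open a window instead
```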
And, best of all, you get a fabulous, interactive user interface for free! Yes, I know GNUPlot has an interactive mode too, but this is beyond comparison. You can pan, zoom, go back and forward in the view history, save and what not. Here’s a screenshot:
Finally, since it’s Python, it’s very portable. It supports a variety of [[http://matplotlib.sourceforge.net/backends.html|backends]], [[http://matplotlib.sourceforge.net/interactive.html|interfaces with ipython]] and is under active development (at version 0.86 currently). So next time you have to do a plot, consider doing it in style: do it with matplotlib!
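To make the backend and output-format points a bit more concrete, here is a small sketch that picks the non-GUI `Agg` backend and writes one figure out as PNG, PDF, SVG and EPS. It assumes a current Matplotlib release; the data, the `latency` file name and the temporary output directory are invented for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # select a non-GUI backend before importing pyplot
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Illustrative measurements with error bars on a log-scaled y axis.
ax.errorbar([1, 2, 3], [2.0, 3.5, 3.0], yerr=[0.2, 0.4, 0.3], fmt="o-")
ax.set_yscale("log")
ax.grid(True)

out_dir = tempfile.mkdtemp()
written = []
for ext in ("png", "pdf", "svg", "eps"):
    path = os.path.join(out_dir, "latency." + ext)
    fig.savefig(path)  # the format is inferred from the file extension
    written.append(path)
```

The same script should work unchanged on a headless server or a desktop, which is much of the appeal of keeping the choice of backend separate from the plotting code.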