Tools for the savvy grad student

As a grad student, I was always looking out for tools to make my life simple (read: I’m quite lazy). Here are some of the tools I think every savvy grad student must know.

A Good Plotting Library

Don’t even mention gnuplot. Not only is it old school (how many times have you looked at a graph in a paper and just known that it was produced using gnuplot?), but it is extremely limited in its feature set. My biggest gripe with gnuplot, however, is that it forced me to separate my data collection/analysis from the actual plotting of the data. I personally am a huge fan of matplotlib, an uber-plotting library written in Python. It can produce high-quality graphics in dozens of formats and supports interactive plotting. It has an object-oriented API as well as an imperative API along the lines of Matlab (hence the name). You can create amazingly rich plots and, best of all, you can combine your data collection and analysis (which I was doing in Python anyways) with your plotting.
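
To make the two API styles concrete, here is a minimal sketch that draws the same line plot both ways and returns the PNG bytes. The function names and the sample labels are mine, picked purely for illustration; the matplotlib calls themselves are the standard pyplot and Axes ones.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def plot_imperative(xs, ys):
    """Matlab-style: pyplot functions act on an implicit current figure."""
    plt.figure()
    plt.plot(xs, ys, marker="o")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("imperative API")
    buf = io.BytesIO()
    plt.savefig(buf, format="png")
    plt.close()  # release the implicit current figure
    return buf.getvalue()


def plot_object_oriented(xs, ys):
    """Object-oriented: work with explicit Figure and Axes objects."""
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("object-oriented API")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # release this specific figure
    return buf.getvalue()
```

Passing format="pdf" or format="svg" to savefig should give you the same figure in a different output format. The imperative style is shorter for quick one-off plots, while the object-oriented style tends to be easier to manage once a script juggles several figures.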

If you are more of a Ruby person, check out gruff.

Bibliography Management

There are two aspects of bibliography management. First is the context of a specific paper: you are working on a paper and you want to collect all the relevant bibliographic information for citing in the paper. BibTeX is the tool that is most commonly used for this, in combination with LaTeX. However, BibTeX is buggy, the syntax is inconsistent across implementations, and it lacks simple features like variables and the ability to “import” other BibTeX files. Enter CrossTeX, a drop-in replacement for BibTeX. CrossTeX is written in Python. It has an object-oriented model for representing citations, so once you define an object for author “Foo Bar” aliased as foobar, you can simply use foobar wherever you would like to cite “Foo Bar”. CrossTeX also makes it trivial to define new formatting styles for your citations, for instance if you want to change the capitalization of the titles or abbreviate “Proceedings” to “Proc.” everywhere. Finally, CrossTeX was built by some nice folks at Cornell, so they know exactly what the pain points of BibTeX were.

The second aspect of bibliography management is simply keeping track of all the papers you read and review. These will come in handy when you are writing a paper or a dissertation, preparing for a talk or an interview, or simply trying to recall prior work in a given field. I highly recommend using CiteULike, an online bibliography management portal. Some features I really like: CiteULike has a really nice bookmarklet that lets you add new items to your bibliography with a single click from various sites such as ACM, USENIX, IEEE, PubMed, arXiv and so on; it has some really nice social features as well, such as tagging, groups, watch lists etc.; you can download selected citations in multiple formats; you can search easily by keyword, tag, author, area, year etc.

A Text Editor

I don’t mean an IDE (like Eclipse) or a word processor (like MS Word). I mean a text editor and only a text editor. AFAIC, that means Vim or Emacs (if that works for you). The bottom line is, learn a text editor and become really really good at it. You will be amazed at how much time it will save you and how much it can impact your productivity. Some features that are essential: syntax highlighting, regular expression support, spell check, support for snippets etc.

On that note, learn to write in LaTeX. I’m horrified by the fact that so many people are still using Word-like tools to write papers. I don’t have anything against Word, but it is the wrong tool for writing papers. Reference management, formatting, including figures etc. are all so much easier in LaTeX. And if you are struggling to find the code for the right symbol in LaTeX, you’ll love detexify (hat tip: Nate)!

Version Control

I can’t stress this enough: you must get in the habit of versioning everything. Not just code, but your notes, write-ups and obviously papers. Having some version control has saved me from disasters many a time. And if you are collaborating on papers, I can’t imagine how people do it without some kind of version control system. Now there are a lot of choices out there. But if you are really savvy, you must use git :) Basically use any reasonable distributed VCS (Mercurial and Bazaar are also ok), but avoid Subversion and absolutely refuse to use CVS at all costs. CVS has lived a good life, but its time is now past and we must let it go.

Information Management

And by that, I mean staying on top of the news and research in your research area and/or academic community. I’ve found it very useful to add all the relevant blogs to a ‘research’ tag in my Google Reader (yes, the blogging bug has bitten academia). Likewise, you can find a lot of current information on Twitter. I’m sure people have already started live-blogging and twittering from academic conferences as well!

Of course, for more conventional searches, DBLP and Google Scholar are invaluable. CiteSeer used to be the go-to website a few years ago, but I personally find Google Scholar much nicer to use and with just as much information, if not more.

16 comments

  1. Patrick Verkaik

    Re: “My biggest gripe with gnuplot, however, is that it forced me to separate my data collection/analysis from the actual plotting of the data.”

    Care to give an example and how matplotlib solved it for you? Thanks.

    • Diwaker Gupta

      @Patrick Verkaik: So before matplotlib, my workflow used to be like this:

      1. Collect the logs and extract the data to be plotted. For instance, I might want to extract the throughput numbers from httperf logs. Or analyze tcpdump output to extract the timestamp for each packet. I would usually do this in Python.
      2. Plot the data using gnuplot.

      With matplotlib, I can combine the two steps together.
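
A rough sketch of what the combined workflow can look like in one script. Note that the log format here ("t=<seconds> throughput=<KB/s>") is a made-up stand-in and not real httperf output, and the function names are hypothetical too.

```python
import re

# Hypothetical log format, one sample per line: "t=<seconds> throughput=<KB/s>".
# This is an illustrative assumption, not the actual httperf log format.
SAMPLE = re.compile(
    r"t=(?P<t>\d+(?:\.\d+)?)\s+throughput=(?P<rate>\d+(?:\.\d+)?)"
)


def extract_throughput(lines):
    """Step 1: pull (time, throughput) samples out of raw log lines."""
    times, rates = [], []
    for line in lines:
        match = SAMPLE.search(line)
        if match:  # silently skip lines that are not samples
            times.append(float(match.group("t")))
            rates.append(float(match.group("rate")))
    return times, rates


def plot_throughput(times, rates, out_path):
    """Step 2: plot the extracted samples, in the same script."""
    # Imported here so the extraction step has no plotting dependency.
    import matplotlib

    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(times, rates, marker="o")
    ax.set_xlabel("time (s)")
    ax.set_ylabel("throughput (KB/s)")
    fig.savefig(out_path)  # output format follows the file extension
    plt.close(fig)
```

With this in place, something like plot_throughput(*extract_throughput(open("run.log")), "throughput.png") goes from raw log to figure in one step (run.log being a placeholder file name), with no intermediate data file to hand over to a separate plotting tool.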

  2. Will

    Thanks for this list. I can’t believe I did not know about Google Scholar. One of the things I write about a lot is Ethical Food. Often I feel like I am doing you know what into the wind as I voice opinion and reasoning on the topic. It is discussed much in the UK, but here in the US, it is almost ignored. Or at least it seemed that way until I just used GS after reading this post to refine a search and found about a day’s worth of reading!

    Your take on the importance of version control is also right on. I work on keeping several databases up to date and often write papers that have several people contributing. It is amazing to me that so many otherwise very smart people are in the dark ages with versioning. And you would not believe how many times I get called things like “a..l” for wanting to track changes exactly.

    I use Google Reader as you suggest. The only issue there is the sheer amount of information. I get so overwhelmed by the volume that I tend to ignore it out of fear for days on end. That only makes it worse. I probably have several lifetimes’ worth of reading bookmarked somewhere.

    Thanks again for a great post!

    • Diwaker Gupta

      @Will: Glad you found the post useful! There are several useful resources out there on managing information in Google Reader. You might want to try the Inbox Zero approach. Just delete all your feeds and start fresh, adding only what you really read. I find the “Trends” section in Reader quite useful — it can help you identify what you do read and what you don’t, what data sources get updated regularly and which ones are dead etc.

  3. Aanjhan Ranganathan

    What do you use to draw figures? Like block diagrams and stuff?
    OoDraw works but converting to pdf or eps is plain sucky. Do you have any workflow for diagrams? Like from drawing it to \includegraphics :-) Will be helpful.
    xfig is SO oldschool and Inkscape seems overwhelming for me.

  4. matt

    great post! i took much the same journey.

    i use emacs because there’s a module for it (auctex). i found crosstex too late for my thesis, but i use it and citeulike for my manuscripts now. i made the mistake of using svn. but i want to learn mercurial.

    on the math side, i use PyXPlot, R, and mysql. give those a try if you haven’t.

    also, you didn’t mention powerpoint presentations/posters which you probably will end up doing a lot. give pstricks and powerdot a go.

    • Diwaker Gupta

      Glad you found this useful. For version control, I highly recommend you use Git; bzr and mercurial don’t even come close. I’ve used R in the past; it’s nice. I’m not sure how you are using MySQL for math?

      For presentations, I used LaTeX Beamer for a while (I blogged about it http://floatingsun.net/2006/05/15/tools-i-use-beamer/). Later in my grad school days I gave in and used OpenOffice and MS Powerpoint. These days I just use Google Docs where possible — as long as you don’t need fancy animations, it should work just fine.

  5. Nate R

    I would add TikZ (http://www.texample.net/tikz/) to this list for any figure drawing. It is a large, complex package (the manual is over 700 pages long!) but once you get the hang of it you can create complex figures right inside of your tex document. You get all of the power of latex in a figure markup environment, meaning for example that the appropriate fonts will be used when you’re inserting $math mode markup$ in a figure (never again will your PDF come back from the publisher because R outputs PDFs w/ missing fonts!).

      • Nate R

        The TikZ examples are great; often when I can’t figure out how to do something I come across an example of somebody doing _exactly that thing_ as part of one of the examples.
