Floating Sun » Research

reCAPTCHA: Stop Spam, Read Books

Diwaker Gupta — Tue, 08 Jan 2008 03:09:54 +0000

A brilliant idea, and well executed too. The kind of thing that makes me think, “I wish I had thought of that!” The best part? CMU is behind it. reCAPTCHA: Stop Spam, Read Books

STL is Slow Template Library

Diwaker Gupta — Thu, 19 Oct 2006 17:33:01 +0000

The C++ Standard Template Library (STL) can be pretty intimidating. I used to think that there was a lot of magic under the covers to make things go really fast. It turns out that while the STL is convenient for prototyping, it’s not really built for performance.

A few days back, we needed to do some //compare-by-hash// operations on two files. In all we were doing around 32 million hash table lookups (plus of course the overhead of computing the hash values themselves). On a reasonably fast machine, one would expect that 32 million operations shouldn’t take very long. However, this particular program ran for one **whole day**.

Then Amin suggested that we rip out the STL stuff and just work with a statically allocated hash table, since we weren’t really concerned about memory management at this stage. And guess what: the runtime fell to around **10 minutes**. A day is 1,440 minutes, so that’s roughly a 144x speedup, or //two orders of magnitude// performance improvement!! I had never imagined that the STL could be //soo// slow.
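To make "statically allocated hash table" a bit more concrete, here is a minimal sketch of the general idea. It is not the code we actually ran: the class name ''StaticHashSet'', the 64-bit key type, the power-of-two capacity, and the choice of linear probing are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity set of 64-bit hash values (not the original code).
// Open addressing with linear probing: all slots live in two flat arrays,
// so insert and lookup never allocate memory.
template <std::size_t Capacity>
class StaticHashSet {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Inserts key; returns true if it was new, false if already present.
    bool insert(std::uint64_t key) {
        // Keep one slot free so probe loops always terminate.
        assert(size_ + 1 < Capacity && "table is full");
        std::size_t i = slot(key);
        while (used_[i]) {
            if (keys_[i] == key) return false;
            i = (i + 1) & (Capacity - 1);  // wrap around to the next slot
        }
        used_[i] = true;
        keys_[i] = key;
        ++size_;
        return true;
    }

    // Returns true if key has been inserted before.
    bool contains(std::uint64_t key) const {
        std::size_t i = slot(key);
        while (used_[i]) {
            if (keys_[i] == key) return true;
            i = (i + 1) & (Capacity - 1);
        }
        return false;
    }

    std::size_t size() const { return size_; }

private:
    // Keys are assumed to be well-mixed hashes already, so masking suffices.
    static std::size_t slot(std::uint64_t key) {
        return static_cast<std::size_t>(key & (Capacity - 1));
    }

    std::array<std::uint64_t, Capacity> keys_{};
    std::array<bool, Capacity> used_{};
    std::size_t size_ = 0;
};
```

Declaring an instance with static storage duration (for example ''static StaticHashSet<1 << 20> seen;'') keeps the whole table in one fixed block of memory. The trade-off is that the capacity has to be chosen up front and the table can never grow, which was acceptable for us since memory management wasn’t a concern at that stage.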

The essential workspace

Diwaker Gupta — Fri, 07 Jul 2006 00:31:21 +0000

Whenever I switch workspaces (which, thankfully, is not often) I spend a huge amount of time just setting up my working environment, typically 1-2 days. It’s critical for your productivity, efficiency and general well-being at your workplace that you are familiar and comfortable with your workspace. In my context, workspace typically means your computer/laptop, your desk, keyboard and pointing devices, and your chair. For people in different industries, workspaces may be different.

Anyways, so I was saying that it takes me quite some effort to set up the workspace just the way I like it. Oftentimes it’s impossible to get it //exactly// the way I want it (if, for instance, my employer “recommends” that I use Windows only). I’m so used to running cutting-edge stuff that even when I’m using Linux systems in a corporate environment, I usually don’t immediately feel at home. I mean, most of these systems are running really old “stable” software (like RHEL 4WS, or Debian 3 etc).

Some of the things I go after right away are:

* browser (Flock)
* editor (Vim 7)
* email (kmail or gmail)
* scripting language (Python)
* version control (Mercurial)

That pretty much covers the basic necessities. I avoid instant messengers except from my laptop or my lab machine at UCSD, so that’s not a big concern. Although I did give Meebo a shot, and it’s pretty cool. In fact, these days it can even store your chat logs (across all messengers, of course). The only problem was that running Meebo in Flock pretty much killed my system’s memory, so that’s a no go for now.

But the point of this post is that it is //still// incredibly hard to //quickly// set up your workspace to your liking. A solution that immediately jumps out is to use VMs: each user could carry his/her customized profile on a USB key, plug it into the base station, and voila, you’re ready to go. Microsoft already has a [[http://research.microsoft.com/research/sv/keychain/|prototype for the desktop on a keychain]], but several kinks need to be worked out for this to fly. The most important one is security. What is the threat model here? Do I trust the base station? Does the base station trust my VM image? What if the organization wants to impose some restrictions on the things VMs can do? Sounds like we need a policy-based architecture for this :-D

So, how long do you take to set up your workspace?

On blind reviews

Diwaker Gupta — Fri, 23 Jun 2006 06:34:40 +0000

This probably won’t make much sense to people out of academia. But anyways.

As you are probably aware, peer-reviewed publications go through an elaborate reviewing process. In Computer Science (and perhaps in other fields as well), conferences typically apply some kind of anonymization to the review process. These are called blind reviews. Depending on the degree of anonymity, one has the following three kinds of review processes:

* Zero blind: the reviewers know who the authors are; the authors know who wrote their reviews.
* Single blind: the reviewers know who the authors are; the authors don’t know the reviewers.
* Double blind: neither the authors nor the reviewers know each other’s identities.

I don’t really know of any (good) conference that is zero blind, but there are several under the single and double blind categories. Now time and again, people get into debates on which system is the best. The debate, naturally, is about the anonymity of the authors — there seems to be consensus on the anonymity of the reviewers.

Advocates of the single blind process argue that sometimes a weak paper (such as one with a good idea but backed by a not-so-good implementation/evaluation) might get accepted if the reviewers knew the authors and were convinced (from their reputation/past record/whatever) that they will do a good job by the camera-ready deadline. On the flip side, of course, there is the danger that “well known” authors may get an unfair advantage and the potential of the underdogs and small fish will be undermined.

Meanwhile, proponents of the double-blind process claim that not knowing authors’ identities makes the reviewing process more fair. Critics, however, argue (sometimes correctly) that usually these research communities are so tightly knit that practically everyone knows the authors anyways. So the whole double blind thing doesn’t really work; besides, it unnecessarily inconveniences the authors since they have to put in some extra effort to anonymize their submissions.

For some conferences (such as SIGCOMM), double blind seems to work fairly well: every year there’s at least one “surprise” paper. For SOSP, on the other hand, it just seems to be a pain: there are far fewer submissions than at SIGCOMM, and pretty much everyone knows who has written which papers. I guess in the end it’s up to each community to figure out what works best for them. But what really pisses me off is people’s carelessness: if you are submitting to a double-blind conference, you //must// honor the guidelines. Some of the papers I’ve reviewed have been just unbelievably callous.

So how is it with other fields?

Women in Science

Diwaker Gupta — Fri, 03 Mar 2006 08:13:17 +0000

Funny essay. WARNING: if you’re a graduate student pursuing a PhD in science/engineering, this might be depressing. Also, the essay isn’t really about women. Women in Science

FeedTree: collaborative RSS and Atom delivery

Diwaker Gupta — Mon, 20 Feb 2006 22:59:31 +0000

Dan Sandler, a CS grad student at Rice, gives me another one of those “why didn’t I think of this” moments! FeedTree: collaborative RSS and Atom delivery

NSDI

Diwaker Gupta — Sat, 14 Jan 2006 20:11:23 +0000

I got the official confirmation yesterday: my paper on the time dilation stuff has been accepted for NSDI ’06! I’m happy, because it’s my first first-authored paper in a respected systems conference :-) Sadly, though, the conference is in San Jose (unlike the last two conferences I attended, which were in Europe!).

New tag: research

Diwaker Gupta — Tue, 11 Oct 2005 07:20:30 +0000

I’ve realized that if I spent as much time reading “research news” as I do reading “tech news”, I’d probably be doing much better in research (in terms of having new ideas, getting inspired with creative thoughts, and just generally knowing what’s going on elsewhere). So I’ve decided to add a new tag “research” and try to regularly post items that are relevant to my research (or just interesting from a research point of view).

So to start off this tag, let me just mention today’s faculty recruit talk. This was a talk by [[http://www-math.mit.edu/~vempala/|Santosh Vempala]]. He’s a faculty member in the Math department at MIT, and is now interviewing at some schools for a Computer Science position.

His talk was interesting and impressive in a number of aspects. For one, he did not use PowerPoint. In fact, he did not use a computer at all! He did it the old-fashioned way, using transparencies and an overhead projector. However, that doesn’t mean his presentation was not good. Quite the opposite: the quality of his slides was exceptional. Each slide was extremely well thought out, colorful (imagine all the hard work! all slides were done manually) and brought out the relevant points without going into too much detail.

He had a diverse audience, so it was also very nice that he was able to reach out to almost everyone in the audience without losing people in technical details. The talk was on spectral methods and their applications in clustering. The idea was simple, the applications far-reaching. To top it all off, he had a [[http://eigencluster.csail.mit.edu/|cool demo]] (try the query ‘jaguar’) and data from some real applications. Works that are based in theory and have some real, practical applications are the most attractive to me.

Forrest

Diwaker Gupta — Sun, 05 Jun 2005 07:26:24 +0000

For the last two days I’ve been pretty active with [[http://forrest.apache.org | Apache Forrest]], primarily with the development and enhancement of the new theme mechanism (skins or views).

Just today I have updated my website with a completely redesigned theme using the new views mechanism. I also wrote the entire CSS from scratch, using the colors from the publicly available KDE, GNOME and Ubuntu color palettes. So far the feedback has been nice, and this theme might actually make its way to the default Forrest theme for the next version!

I’m still tweaking the theme so things might break unexpectedly. If you find something, do drop me a note!

On the academic front, I have read some more papers on virtualization, and it’s kind of disappointing that a lot of the challenges were very clearly identified and laid out almost three decades ago, and the worst part is that we are **still** fighting those very same issues today. I have written up some more stuff that I have to go over with Amin tomorrow.

The house hunt for Palo Alto is coming along pathetically; I just have the worst luck ever. [sigh] :-(

Yearly PhD evaluation

Diwaker Gupta — Tue, 10 May 2005 21:48:11 +0000

I had my yearly PhD evaluation on Thursday. My first, so I was kind of nervous. Fortunately and unfortunately, Amin takes the evaluation seriously. Unfortunately because he focused mainly on my weaknesses (since he said strengths are goody goody anyways). Fortunately because I found the discussion very valuable, and he gave some very constructive comments. I mean Ragesh’s advisor just left his evaluation blank! Now what help is that to anyone?!

I just hope that I’m able to address some, if not all, of the issues that Amin pointed out that day. And next time, I’ll be prepared with some feedback of my own :-D This time I didn’t even know students were allowed to give feedback on their advisors :) The only awkward thing is that the whole thing has to be done face-to-face, with consent. So if I disagree with what he says, or he disagrees with what I say, then it can’t be put down.