Tagged: Research

January 7th, 2008

reCAPTCHA: Stop Spam, Read Books

A brilliant idea, and well executed too. The kind of thing that makes me think, “I wish I had thought of that!” The best part? CMU is behind it. reCAPTCHA: Stop Spam, Read Books

October 19th, 2006

STL is Slow Template Library

The C++ Standard Template Library (STL) can be pretty intimidating. I used to think that there’s a lot of magic under the covers to make things go really fast. It turns out that while the STL is convenient for prototyping, it’s not really built for performance.

A few days back, we needed to do some //compare-by-hash// operations on two files. In all we were doing around 32 million hash table lookups (plus of course the overhead of computing the hash values themselves). On a reasonably fast machine, one would expect that 32 million operations shouldn’t take very long. However, this particular program ran for one **whole day**.

Then Amin suggested that we rip out the STL stuff and just work with a statically allocated hash table, since we weren’t really concerned about memory management at this stage. And guess what, the runtime fell to around **10 minutes**. That’s a //two orders of magnitude// performance improvement! I had never imagined that the STL could be //soo// slow.
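For the curious, here is a minimal sketch of what a statically sized hash table can look like. This is not the code we actually used; the class name ''StaticHashSet'', the 64-bit key type, the linear probing scheme and the multiplicative mixing constant are all assumptions made for illustration. The idea is simply that all storage is a fixed array inside the object, so inserts and lookups never touch the allocator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity hash set for 64-bit keys (e.g. precomputed
// hash values). Open addressing with linear probing; all storage lives
// inside the object, so there is no per-element heap allocation.
template <std::size_t Capacity>
class StaticHashSet {
    static_assert(Capacity > 1 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Returns true if key was newly added, false if it was already present.
    bool insert(std::uint64_t key) {
        std::size_t i = slot(key);
        while (used_[i]) {
            if (keys_[i] == key) return false;
            i = (i + 1) & (Capacity - 1);  // linear probe, wrapping around
        }
        // One slot is always kept empty so that probing terminates.
        // Note: this check disappears if NDEBUG is defined.
        assert(size_ + 1 < Capacity && "table is full; pick a larger Capacity");
        used_[i] = true;
        keys_[i] = key;
        ++size_;
        return true;
    }

    // Returns true if key has been inserted before.
    bool contains(std::uint64_t key) const {
        std::size_t i = slot(key);
        while (used_[i]) {
            if (keys_[i] == key) return true;
            i = (i + 1) & (Capacity - 1);
        }
        return false;
    }

    std::size_t size() const { return size_; }

private:
    // Multiplicative mixing (assumed constant), then mask down to a slot index.
    static std::size_t slot(std::uint64_t key) {
        const std::uint64_t mixed = key * 0x9E3779B97F4A7C15ULL;
        return static_cast<std::size_t>(mixed >> 32) & (Capacity - 1);
    }

    std::array<std::uint64_t, Capacity> keys_{};
    std::array<bool, Capacity> used_{};
    std::size_t size_ = 0;
};
```

A large table like this should be declared with static storage duration (for example ''static StaticHashSet<1024> seen;'') rather than on the stack. The trade-off is the one mentioned above: the capacity is fixed up front and nothing is ever freed, which is fine when memory management is not a concern.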

July 6th, 2006

The essential workspace

Whenever I switch workspaces (which, thankfully, is not often), I spend a huge amount of time just setting up my working environment, typically 1-2 days. It’s critical for your productivity, efficiency and general well-being at your workplace that you are familiar and comfortable with your workspace. In my context, workspace typically means your computer/laptop, your desk, keyboard and pointing devices, and your chair. For people in different industries, workspaces may be different.

Anyways, so I was saying that it takes me quite some effort to set up the workspace just the way I like it. Oftentimes it’s impossible to get it //exactly// the way I want it (if, for instance, my employer “recommends” that I use Windows only). I’m so used to running cutting edge stuff that even when I’m using Linux systems in a corporate environment, I usually don’t immediately feel at home. I mean, most of these systems are running really old “stable” software (like RHEL 4WS, or Debian 3 etc).

Some of the things I go after right away are:

* browser (Flock)
* editor (Vim 7)
* email (kmail or gmail)
* scripting language (Python)
* version control (Mercurial)

That pretty much covers the basic necessities. I avoid instant messengers except from my laptop or my lab machine at UCSD, so that’s not a big concern. Although I did give Meebo a shot, and it’s pretty cool. In fact, these days it can even store your chat logs (across all messengers, of course). The only problem was that running Meebo in Flock pretty much killed my system’s memory, so that’s a no go for now.

But the point of this post is that it is //still// incredibly hard to //quickly// set up your workspace to your liking. A solution that immediately jumps out is to use VMs: each user could carry his/her customized profile on a USB key, plug it into the base station, and voila, you’re ready to go. Microsoft already has a [[http://research.microsoft.com/research/sv/keychain/|prototype for the desktop on a keychain]], but several kinks need to be worked out for this to fly. The most important one is security. What is the threat model here? Do I trust the base station? Does the base station trust my VM image? What if the organization wants to impose some restrictions on the things VMs can do? Sounds like we need a policy-based architecture for this :-D

So, how long do you take to set up your workspace?

June 22nd, 2006

On blind reviews

This probably won’t make much sense to people outside academia. But anyways.

As you are probably aware, peer reviewed publications go through an elaborate reviewing process. In Computer Science (and perhaps in other fields as well), conferences typically apply some kind of anonymization to the review process. These are called blind reviews. Depending on the degree of anonymity, one has the following three kinds of review processes:

* Zero blind: the reviewers know who the authors are; the authors know who wrote their reviews.
* Single blind: the reviewers know who the authors are; the authors don’t know the reviewers.
* Double blind: neither the authors nor the reviewers know each other’s identities.

I don’t really know of any (good) conference that is zero blind, but there are several under the single and double blind categories. Now time and again, people get into debates on which system is the best. The debate, naturally, is about the anonymity of the authors — there seems to be consensus on the anonymity of the reviewers.

Advocates of the single blind process argue that sometimes a weak paper (such as one with a good idea but backed by a not-so-good implementation/evaluation) might get accepted if the reviewers knew the authors and were convinced (from their reputation/past record/whatever) that they will do a good job by the camera-ready deadline. On the flip side, of course, there is the danger that “well known” authors may get an unfair advantage while the potential of the underdogs and small fish gets overlooked.

Meanwhile, proponents of the double-blind process claim that not knowing authors’ identities makes the reviewing process more fair. Critics, however, argue (sometimes correctly) that usually these research communities are so tightly knit that practically everyone knows the authors anyways. So the whole double blind thing doesn’t really work; besides, it unnecessarily inconveniences the authors since they have to put in some extra effort to anonymize their submissions.

For some conferences (such as SIGCOMM), double blind seems to work fairly well: every year there’s at least one “surprise” paper. For SOSP, on the other hand, it just seems to be a pain: there are far fewer submissions than at SIGCOMM and pretty much everyone knows who has written which papers. I guess in the end it’s up to the community to figure out what works best for them. But what really pisses me off is people’s carelessness: if you are submitting to a double-blind conference, you //must// honor the guidelines. Some of the papers I’ve reviewed have been just unbelievably sloppy about this.

So how is it with other fields?

March 3rd, 2006

Women in Science

Funny essay. WARNING: if you’re a graduate student pursuing a PhD in science/engineering, this might be depressing. Also, the essay isn’t really about women. Women in Science