Tagged: search

Source code search engines

Lazy people like me are often looking for reusable components and libraries to do the hard work for them. While a combination of Google, Freshmeat and Sourceforge usually gets the job done, the process can be made far less painful with a dedicated source code search engine.

The very first search engine that I ran into was [[http://koders.com|Koders.com]]. I haven’t used it very much, but it looks pretty cool. Searching for “java hash map” will return a list of Java files having a map or a hashmap, //and// show the relevant code snippet for context. Each search result also has links to the corresponding project and the source file.

//Edit: I had written up a whole bunch when I accidentally pressed Ctrl-W and closed the window. Dang! Can’t wait for some kind of drafts feature in WordPress.//

A new entrant in this arena is [[http://krugle.com|Krugle.com]]. You can think of it as the Web 2.0 version of Koders.com. Krugle is not open to the public yet, so I couldn’t test it out. You can get some idea of the functionality by [[http://krugle.com/product/|looking at the screenshots]]. Functionally, I can’t find too many differentiating features; the one exception is the ability to save/share searches.

Finally, there is [[http://gonzui.sourceforge.net/|Gonzui]], which is essentially a desktop version of a source code search engine. Instead of going to some website, you download Gonzui and run it on your own machine. That seems like a waste of resources, but the advantage is that you can choose exactly which projects you want to search in. This might be particularly useful if you frequently need to search the code of a small number of projects. But then why wouldn’t I use something like cscope/ctags instead?

It seems like the [[http://raa.ruby-lang.org/|Ruby Application Archive]] is using Gonzui to provide a [[http://raa.ruby-lang.org/gonzui/|search interface]] for the projects it hosts/lists. I guess that’s one way to use Gonzui, especially if RAA denies services like Koders or Krugle the right to index its code. Though given that a lot of this code is freely downloadable, it seems unlikely that RAA could legally do that.

Yahoo lied?

A few days back [[http://www.ysearchblog.com/archives/000172.html|Yahoo! announced]] that their search index had grown to more than //twice// the size of Google’s index (which, of course, [[http://battellemedia.com/archives/001790.php|Google refuted]]).

So some folks from NCSA went ahead and did a little testing, and the conclusion is that Yahoo’s claims [[http://vburton.ncsa.uiuc.edu/indexsize.html|might be suspicious]]. Are we entering a new world of corporate dishonesty?

To be fair, the NCSA experiment was very very simplistic. I mean, you could do it from your home computer, if you wanted. They just took the standard ispell dictionary file, created around 10,000 random searches consisting of two words and fed them to both Yahoo and Google. Then they compared the size of the result set.

A few points to note: they only compare queries for which the number of results is less than 1000. This can bias the result of their experiment if Google is simply //better// than Yahoo at indexing documents. So it’s still not a concrete measure of the size of the index itself. Also, their experiments cover regular queries only; specialized queries for images, audio/video files, blogs etc. are not covered.
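To make the method concrete, here is a rough Python sketch of that kind of experiment. This is my own reconstruction from the description, not NCSA’s actual code. The names ''make_queries'' and ''compare_engines'' are made up, and the two "engines" are stand-in functions that return a fake result count, since I am not assuming any real search API here. I am also assuming a query is kept only when //both// engines report fewer than 1000 results; the write-up may apply the cutoff differently.

```python
import random


def make_queries(words, n, seed=0):
    """Build n random two-word queries from a word list (e.g. an ispell dictionary)."""
    rng = random.Random(seed)
    # rng.sample picks two distinct words for each query
    return [" ".join(rng.sample(words, 2)) for _ in range(n)]


def compare_engines(queries, count_a, count_b, cap=1000):
    """Total up result counts, keeping only queries where both counts are under cap.

    count_a and count_b are hypothetical callables: query string -> result count.
    Returns (queries kept, total for engine A, total for engine B).
    """
    kept = total_a = total_b = 0
    for query in queries:
        a, b = count_a(query), count_b(query)
        if a < cap and b < cap:  # assumed filtering rule, see the note above
            kept += 1
            total_a += a
            total_b += b
    return kept, total_a, total_b


# Stand-in engines so the sketch runs offline; a real run would query the
# search engines and read the reported number of results.
def fake_engine_a(query):
    return len(query) * 10


def fake_engine_b(query):
    return len(query) * 20
```

With real engines plugged in, the ratio of the two totals is what you would read as a rough comparison. As noted above, it says more about how many results each engine returns for obscure queries than about the true size of either index.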

But this is certainly something that Yahoo! is going to note and, hopefully, respond to in the next few days.