
Reclaim the TV

Do you:

  • wonder who watches all of the 700+ channels that you get from your cable provider?
  • wonder why you are paying more than $50/month for all those channels, when you only watch a handful?
  • find yourself channel surfing, just because you can, not because you know what you want to watch?
  • feel like you are not getting the most out of your TV?
  • feel like you can’t wait for Google TV or the Boxee box to improve your television experience?

If you answered yes to any of the above questions, I have a few tips for you. There have been times when I spent an hour flipping through channels, not really watching anything in particular, and later felt like an idiot for having wasted my time. All this while, I had just accepted cable TV as a given, an inevitable companion to our internet service… until I met a few friends who were perfectly happy without any cable service.

With that inspiration, I started looking around for a better TV experience. There is so much content available online (and good content!) that I wasn’t really worried about a lack of material to watch. Both Google TV and the Boxee box seemed very promising (not to mention the new Apple TV). There were two little problems:

  1. Neither the Boxee box nor Google TV is here yet, and
  2. All of these devices have very limited functionality. They address the problem of Internet TV, but I don’t want to use my TV to just watch TV. I want it to become a hub for all my media. I want to access my local music, photos, videos and more.

So I did what any self-respecting geek would do — I built my own media center PC. It runs Ubuntu; it hosts all my media in a single location; it serves as media server and storage server; and it serves as a compute server when I have to transcode media. You can find all the gory details in this guide: HOWTO Build an Ubuntu based media center.

And just like that, I’m a GNOME user

When I first started using Linux (more than a decade ago), I did my share of playing around with various desktop environments: the classic FVWM, GNOME, KDE, Enlightenment and so on. I eventually settled on KDE. Over the years, I kept coming back to GNOME to check it out, but somehow KDE always felt like home.

Well, guess what: not any more. As of a few days ago, I’m (mostly) a GNOME user.

I still love KDE (the desktop) and KDE-based applications (KMail, Amarok etc.). KDE is still infinitely more configurable than anything comparable in GNOME (Evolution and Thunderbird are still fairly limited in comparison), and over the years I’ve tweaked it to just the way I like it. But GNOME has something the KDE project does not: Canonical.

That’s right, I switched to GNOME because of Canonical, the company that drives Ubuntu development. Sure, there is a lot of effort behind the various Ubuntu variants such as Kubuntu, Xubuntu etc. But make no mistake: none of these variants are first-class citizens in the Ubuntu ecosystem.

The switch was a result of my recent experience setting up Ubuntu on my home theater PC. The effort Canonical has put into making the Ubuntu experience seamless and pleasant is clearly visible. Pretty much everything works out of the box: folders that I share show up on other computers in my home network; Bluetooth, my webcam and the like all work just fine; setting up remote desktop is a breeze; Avahi/Bonjour works like a charm; and I can set up a DAAP server to share my music and it shows up in iTunes just like that.
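
To give a flavor of how simple this is: a DAAP server like mt-daapd (Firefly) needs only a handful of lines in /etc/mt-daapd.conf to share a music library (the paths and server name below are just placeholders; mt-daapd itself is one choice among several DAAP servers):

# /etc/mt-daapd.conf: share a music directory over DAAP (port 3689)
web_root = /usr/share/mt-daapd/admin-root
port = 3689
admin_pw = secret
db_type = sqlite3
db_parms = /var/cache/mt-daapd
# Point this at the music library:
mp3_dir = /srv/media/music
servername = Media Center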

Note that none of these things are limited to Ubuntu in any way. But the user experience in Ubuntu is unparalleled in comparison with Kubuntu and the rest. Subtle niceties like the notifications (the Ayatana project), the Me menu, the messaging menu, the “light” themes etc. come together in a very cohesive way to deliver an experience that rivals that of Mac OS. But beyond the subtleties, Canonical is shaping the future of Linux on the desktop, laptop and mobile devices: the Unity interface, multi-touch support for mobile devices and more. Bottom line: having a company put its weight behind a desktop has ramifications.

So as much as I love thee, KDE, for now we must part ways. I’m still using some KDE apps (like digiKam), but until Canonical decides to officially adopt Kubuntu, GNOME it is.

Toying with node.js

A commenter rightly complained that despite my claims of “playing around” with node.js, all I could come up with was the example from the man page. I replied saying that I did intend to post something I wrote from scratch, and as promised, here is my first toy node.js program:

var http = require('http');

function search() {
  var stdin = process.openStdin();
  stdin.setEncoding('utf8');
  stdin.on('data', function(term) {
    // stdin includes the trailing newline from the Enter key; strip it,
    // and escape the term so that spaces etc. don't yield a malformed URL.
    term = encodeURIComponent(term.trim());
    var google = http.createClient(80, 'ajax.googleapis.com');
    var search_url = "/ajax/services/search/web?v=1.0&q=" + term;
    var request = google.request('GET', search_url, {
      'host': 'ajax.googleapis.com',
      'Referer': 'http://floatingsun.net',
      'User-Agent': 'NodeJS HTTP client',
      'Accept': '*/*'});
    request.on('response', function(response) {
      response.setEncoding('utf8');
      // The body arrives in chunks; accumulate them and parse on 'end'.
      var body = "";
      response.on('data', function(chunk) {
        body += chunk;
      });
      response.on('end', function() {
        var searchResults = JSON.parse(body);
        var results = searchResults["responseData"]["results"];
        for (var i = 0; i < results.length; i++) {
          console.log(results[i]["url"]);
        }
      });
    });
    request.end();
  });
}

search();

This program (also available as a gist) reads in search terms on standard input, and does a Google search on those terms, printing the URLs of the search results.

I was quite surprised (and a bit embarrassed) at how long it took me to get this simple program working. For instance, it took me the better part of an hour to realize that when I read something from stdin, the input includes the trailing newline (from the user hitting ‘Enter’). Earlier, I was using the input as-is for the search term, and that was leading to a 404 error because the resulting URL was malformed.
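
In hindsight, a tiny experiment would have made the behavior obvious; JSON.stringify renders the trailing newline visibly:

var stdin = process.openStdin();
stdin.setEncoding('utf8');
stdin.on('data', function(chunk) {
  // Typing "foo" and hitting Enter prints "foo\n": the newline is part of the input.
  console.log(JSON.stringify(chunk));
});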

Debugging was also harder, as expected. Syntax errors are easily caught by V8, but everything else is still obscure. I’m sure some of the difficulty is because of my lack of expertise with Javascript. But at one point, I got this error:

events:12
        throw arguments[1];
                       ^
Error: Parse Error
    at Client.ondata (http:881:22)
    at IOWatcher.callback (net:517:29)
    at node.js:270:9

I still haven’t figured out exactly where that error was coming from. Nonetheless, it was an interesting exercise. I’m looking forward to writing some non-trivial code with node.js now.

Some thoughts on dbShards

I heard about dbShards via two recent blog posts — one by Curt Monash and the other by Todd Hoff. It seemed like an interesting product, so I spent some time digging around on their website.


As the name suggests, dbShards is all about sharding. Sharding, also known as partitioning, is the process of distributing a given dataset into smaller chunks based on some policy. AFAIK, the term “shard” was popularized recently by Google, even though the concept of partitioning is at least a few decades old. Most distributed data management systems implement some form of sharding out of necessity, since the entire data set will not fit in memory on a single node (if it would, you should not be using a distributed system). And therein lies the USP of dbShards — it brings sharding (and with it, performance and scalability) to commodity, single-node databases such as MySQL and Postgres.

So how does it work? Well, dbShards acts as a transparent layer sitting in front of multiple nodes running, say, MySQL. Transparent, because they want to work with legacy code, meaning no or minimal client-side modifications. Inserting new data is pretty simple: dbShards uses a “sharding key” to route an incoming tuple to the appropriate destination, as in the sketch below. Queries are a bit more complex, and here the website is skimpy on details. Monash’s post mentions that join performance is good when the sharding keys are the same — this is not a surprise. I’d be curious what other kinds of query optimizations are in place. When data is partitioned, you really need a sophisticated query planner and optimizer that can minimize data movement and aggregation, and push down as much computation as possible to individual nodes.
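
To make the routing step concrete, here is a minimal sketch of hash-based routing on a sharding key. This is just my illustration of the general technique; the site doesn’t document dbShards’ actual policy, and the hash function, shard count and key below are made up:

// Route a tuple to one of N shards based on its sharding key.
// A toy hash; a real system would use something with better distribution.
function hashKey(key) {
  var h = 0;
  key = String(key);
  for (var i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) % 0x7fffffff;
  }
  return h;
}

var NUM_SHARDS = 4; // e.g., four MySQL nodes behind the routing layer

function shardFor(tuple, shardingKey) {
  return hashKey(tuple[shardingKey]) % NUM_SHARDS;
}

// All rows with the same customer_id land on the same shard, which is
// why joins on the sharding key can be executed locally on each node.
console.log(shardFor({customer_id: 42, amount: 99}, 'customer_id'));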

I found the page on replication intriguing. I’m guessing that when they say “reliable replication”, they mean what is more commonly called “consistent replication” (alternatively, that dbShards supports strong consistency, as opposed to eventual or lazy consistency). This particular bit in the first paragraph caught my eye: “deliver high performance yet reliable multi-threaded replication for High Availability (HA)”. I’m not sure how to read this. Are they implying that multi-threaded replication is typically not performant? And usually you do NOT want to rely on threading for high availability, because a single thread can still take the entire process down. The actual mechanism for replication seems like a straightforward application of the replicated state machine approach.
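
The replicated state machine idea itself is simple: if every replica applies the same deterministic operations in the same order, all replicas converge to the same state. A toy sketch (again, my illustration of the general approach, not dbShards’ implementation):

// Each replica applies the same deterministic log in the same order,
// so all replicas end up with identical state.
function Replica() {
  this.state = {};
}
Replica.prototype.apply = function(op) {
  if (op.type === 'set') this.state[op.key] = op.value;
  if (op.type === 'del') delete this.state[op.key];
};

var log = [
  {type: 'set', key: 'x', value: 1},
  {type: 'set', key: 'y', value: 2},
  {type: 'del', key: 'x'}
];

var a = new Replica(), b = new Replica();
log.forEach(function(op) { a.apply(op); b.apply(op); });
console.log(a.state, b.state); // identical: { y: 2 } { y: 2 }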

But making a replicated state machine based system scale requires very careful engineering; otherwise it is easy to hit performance bottlenecks. I’d be very interested in knowing a bit more about the transaction model in dbShards and how it performs on larger systems (tens to hundreds of nodes).

The pricing model is also quite interesting. dbShards is the first vendor I know of that prices on compute rather than storage (their pricing is $5,000 per year per server, so a ten-node cluster would run $50,000 a year). I think this is indicative of the target customer segment as well — I would imagine dbShards works well with a few TBs of data on machines with a lot of CPU and memory.

What is node.js?


If you follow the world of Javascript and/or high-performance networking, you have probably heard of node.js. If you already grok Node, then this post is not for you; move along. If, however, you are a bit confused as to exactly what Node.js is and how it works, then you should read on.

The node.js website doesn’t mince words in describing the software: “Evented I/O for V8 JavaScript.” While that statement is precise and captures the essence of node.js succinctly, at first glance it did not tell me much about node.js. I did what anyone interested in node.js should do: downloaded the source and started playing around with it.

So what exactly is node.js? Well, first and foremost, it is a Javascript runtime. Think of your web browser: how does it run Javascript? It implements a Javascript runtime and supports APIs that make sense in the browser, such as DOM manipulation. Javascript as a language is fairly browser agnostic. So node.js is yet another runtime for Javascript, implemented primarily in C++.

Because node.js focuses on networking, it does not support the standard APIs available in a browser. Instead, it provides a different set of APIs (with fantastic documentation). Thus, for instance, HTTP support is built into node.js — it is not an external library.

The other salient feature of node.js is that it is event driven. If you are familiar with event-driven programming (a la Python’s Twisted, Ruby’s EventMachine, the event loop in Qt etc.), you know what I’m talking about. The key difference is that, unlike all those systems, you never explicitly invoke a blocking call to start the event loop — node.js automatically enters the event loop as soon as it has finished loading the program. A corollary is that you can only write event-driven programs in node.js; no other programming models are supported. Another consequence of this design choice is that node.js is single-threaded: to exploit CPU parallelism, you need to run multiple node.js instances. Of course, there are already several node.js modules and projects that address this very issue.
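
A trivial example makes this concrete: there is no run() or start() call anywhere; registering a callback is all it takes, and node.js exits once no callbacks remain pending.

// No explicit event loop: node.js enters the loop once the script
// has finished loading, and exits when nothing is left pending.
setTimeout(function() {
  console.log('fired after one second; node.js exits after this');
}, 1000);
console.log('end of script: the event loop takes over from here');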

To implement a runtime for Javascript, node.js first needs to parse the input Javascript. It leverages Google’s V8 Javascript engine to do this. V8 takes care of interpreting the Javascript, so node.js need not worry about syntactic issues; it only needs to implement the appropriate hooks and callbacks for V8.

node.js claims to be extremely memory efficient and scalable. This is possible because node.js does not expose any blocking APIs; as a result, programs are completely callback driven. Of course, any kind of I/O (disk or network) has to block somewhere. node.js does all blocking I/O in an internal thread pool — thus even though the application executes in a single thread, internally there are multiple threads that node.js manages.
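
This is easy to observe with a short sketch: the read is handed off behind the scenes, the statement after the call runs first, and the callback fires from the event loop once the data is ready.

var fs = require('fs');

// fs.readFile is non-blocking: the read happens on an internal thread,
// and the callback is invoked from the event loop when the data is ready.
fs.readFile('/etc/hostname', 'utf8', function(err, data) {
  if (err) throw err;
  console.log('file contents: ' + data);
});
console.log('this line prints before the file contents');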

Overall, node.js is very refreshing. The community seems great and there is a lot of buzz around the project right now, with some big companies like Yahoo starting to experiment with node.js. node.js is also driving the “server side Javascript” movement. For instance, Joyent’s Smart platform allows you to write your server code in Javascript, which they can then execute on their hosted platforms.

Finally, no blog post about node.js is complete without an example of node.js code. Here is a simple web server:

[gist id=485001]
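
In case the gist doesn’t render, here is a minimal web server in the style of the node.js documentation of the time (the gist’s exact contents may differ):

var http = require('http');

// Respond to every request with a plain-text greeting.
http.createServer(function(req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124);

console.log('Server running at http://127.0.0.1:8124/');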