Category: Internet

Some thoughts on iCloud

Sorry, all the sensationalist headlines were taken, so I had to pick something boring.

As we all know by now (read: probably 1% of the world’s population), at WWDC earlier this week, Apple spilled the beans on the upcoming iCloud, among other things. In this post, I want to share some of my thoughts on the much-hyped iCloud (not that there is any dearth of opinions and articles on the subject, thanks to the echo chamber that is the Twitterverse and the blogosphere).

iCloud

First off, some quick bullets summarizing what it is:

  • iCloud aims to make cloud storage painless, the idea being that your data should be available to you from all your devices, all the time.
  • It’s automatic and transparent. Apple is baking iCloud support deep into 9 different applications: iTunes, Photo Stream, Apps, Books, Documents, Backup, Contacts, Calendar and Mail. And that’s just the beginning.
  • It’s free. Up to 5 GB, excluding purchased music, books, apps and photo stream.
  • Sync over the air: iCloud can sync across devices over wireless. As a concrete example, you’ll no longer need a cable to sync and back up your iPhone with your laptop.

Here are some cool things about iCloud:

  • Scan and skip upload (iTunes only): when dealing with large data sets (such as your movie and music collections), one of the main impediments to using cloud storage is the overhead of the initial import. With a 1 Mbps uplink, a 10 GB music collection takes nearly a full day (roughly 22 hours) to upload. Of course, if the file you are trying to upload already exists somewhere in the cloud, you don’t need to upload it, and this is exactly what iCloud does. Because of the iTunes Store, Apple already has a library of 18 million songs (and counting), and detecting whether two files are the same song is a lot easier than for many other media types (say, images or movies).
  • Storage APIs for developers: APIs are all the rage these days. By exposing the right set of APIs, Apple could attract developers to build iCloud functionality on other platforms (Android, for example). Unfortunately, the API is fairly limited at this point (key-value store or documents).
  • HP, Teradata and maybe EMC are rumored to have supplied the bulk of the hardware in the spanking-new datacenter that will be the backbone for iCloud.
  • Despite all the hoopla around “cloud” recently, the idea has so far stayed firmly within tech circles. Apple has the ability, experience and motivation to take cloud computing truly mainstream with iCloud.
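
The upload-time figure in the first bullet above is simple arithmetic. Here is a quick sketch, assuming decimal units (1 GB = 8,000 megabits) and a link that sustains its full rated speed; the function name `uploadHours` is made up for this illustration:

```javascript
// Hypothetical helper: hours needed to push `sizeGB` gigabytes
// through an uplink of `uplinkMbps` megabits per second.
// Assumes decimal units and ignores protocol overhead.
function uploadHours(sizeGB, uplinkMbps) {
  const megabits = sizeGB * 8 * 1000; // gigabytes -> megabits
  const seconds = megabits / uplinkMbps;
  return seconds / 3600;
}

console.log(uploadHours(10, 1).toFixed(1) + ' hours'); // 22.2 hours
```

In practice the real number would be somewhat worse, since no home uplink runs flat out for 22 hours straight.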

What is NOT so cool:

  • Apple has a habit of exaggerating the novelty and efficacy of their features (remember Spaces?). Scan and skip upload is nothing new: it is just deduplication under the hood, a well-known technique in storage systems. Videos and photos will still have to be uploaded, though; there’s no real shortcut for those. Of course, there are techniques to dedup arbitrary data and I hope Apple is leveraging them.
  • In the same vein, syncing of Mail, Calendar and Contacts is just catch-up. Ever used Google? Likewise for Docs and Books. The delivery model is different: Apple apps work with the local data and sync when there’s connectivity. They haven’t touched upon conflict resolution, disconnected clients, etc.
  • Implications for Dropbox: transparent, automatic sync across multiple devices is a phenomenally hard problem. Apple makes it sound like they’ve nailed it. It took Dropbox several years to address all the performance and security concerns. I’d wager Apple will run into its share of snags along the way.
  • Apples all the way: despite their claims, iCloud is designed to lock you in. Sure, you may be able to leverage some of the features by installing additional software on a PC. But unless you are using an Apple device, you won’t get the full experience or service. Want your “reading list” available on Android (or Chrome, for that matter)? Tough luck. Want your music available to other music players (open source players like Banshee and Amarok, god forbid)? How about your photo stream in Picasa?
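
For the curious, the deduplication idea from the first bullet above can be sketched in a few lines. Apple has not said how it matches songs, so this is not their method: exact content hashing is simply the most basic form of dedup, swapped in here for illustration. The function name `uploadIfMissing` and the in-memory `Map` standing in for the cloud are both assumptions of the sketch:

```javascript
const { createHash } = require('crypto');

// Hypothetical stand-in for the cloud store: content hash -> bytes.
// Returns the hash and whether the bytes actually had to be "uploaded".
function uploadIfMissing(store, data) {
  const key = createHash('sha256').update(data).digest('hex');
  if (store.has(key)) {
    return { key: key, uploaded: false }; // already stored, skip the upload
  }
  store.set(key, data); // pretend this is the expensive network transfer
  return { key: key, uploaded: true };
}

const cloud = new Map();
console.log(uploadIfMissing(cloud, Buffer.from('same song')).uploaded); // true
console.log(uploadIfMissing(cloud, Buffer.from('same song')).uploaded); // false
```

A real client would send only the hash first and transfer the bytes on a miss. Exact hashing also only catches byte-identical files, which is why re-encoded music, photos and home videos are the harder cases.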

Finally, there’s no doubt that iCloud will drastically alter the cloud landscape. However, Apple is focused mainly on the personal cloud, which is a good thing: they are playing to their strengths. That also leaves a great opportunity, because the enterprise cloud market is still wide open. The requirements, challenges and “killer apps” in that market are very different from those of the personal/consumer cloud market. Should be fun!

How do you use Twitter/Buzz/Facebook?

No no, I’m not late to the party, and I’m not literally asking how one uses the above-mentioned services. Rather, I’m asking how one puts these various services to use. When do you post something on Twitter but not on Buzz, or on Facebook but not on Twitter? Or do you post everything everywhere (ping.fm style)? I’m not a heavy hitter by any means and my usage of social networks is mediocre at best. Yet I find myself confounded by all of the various services and their accompanying warts and virtues. Don’t you?

To help sort out my thoughts, I drew a picture (don’t you dare judge me for my lack of creativity!):

Twitter/Facebook/Buzz

Below I elaborate more on how I currently use each of the services.

Twitter

  • I tend to use it for technical and/or non-personal content. Things that I would want to publicize.
  • Unlike Buzz/Facebook, I don’t pay too much attention to who is following me. Most tweets are public anyways.
  • The 140 character limit is sometimes amusing, but often irritating. Are people still using regular SMS with Twitter?
  • That multiple startups are devoted to managing Twitter “noise” is not encouraging.
  • @ replies are a band-aid. Twitter is a broadcast-and-forget medium: I can’t have (or follow) a conversation on it.

Facebook

  • Use it for sharing random, personal updates (or things I find interesting :p)
  • Mostly on because of network effect (read: don’t want to be left off the social bandwagon).
  • Like that I can “Like” most things and actually follow the conversation via comments.
  • Always worried if my privacy settings are working and if there’s a new “default” I need to worry about.
  • Pay more attention to who I friend. The noise level is still quite high despite that.

Buzz

  • Usage domain similar to that of Facebook. Unlike Facebook, can choose to make posts Public.
  • Love the email integration. Conversely, API/clients still have to catch up to Twitter.
  • Supports likes, comments and “resharing”.
  • Privacy is modeled around my contacts (chat or otherwise), which seems natural.

I’m fine with using Twitter for all of my public posts. The main confusion lies between Buzz and Facebook. Facebook obviously has more social traction. That said, Buzz is just more convenient to use (because of the email integration mostly). Of course, all of the various connectors available (Twitter <-> Buzz, Twitter <-> Facebook, multicast via ping.fm or Chromedeck etc) make the whole thing even more confusing. At the end of the day, I might just go back to not using anything on a regular basis.

How are you using Twitter, Buzz and Facebook?

Observations from The Social Network

The Social Network is rather like a fast-paced documentary. The content, production value and background score were great. I really enjoyed the bit around the Harvard boat race, a nice piece of whitespace in the movie :) But this post is not about these aspects; rather, I wanted to make a few observations about the several tidbits of open source sprinkled throughout the movie.

  • wget makes several appearances in a short segment of the movie where Mark is scraping the Harvard intranet for the seed data for various precursors to Facebook. To my relief, everything I saw seemed very real and plausible, unlike, say, the hackery mumbo-jumbo in The Matrix or (gasp) Swordfish. Nonetheless, I did not see (and have not seen) any evidence that Mark Zuckerberg is the programming genius that most reviews and synopses claim. Of course, programming genius has no correlation with being successful (read: being the youngest billionaire).
  • The usage of Emacs, Perl and curl was also faithful. The emphasis should be on Zuck’s intuition about the idea and his ability to prototype quickly. The technology itself was something any script kiddie could have come up with.
  • Zuck is shown running KDE 3 on his workstation. Again, the attention to detail is impressive. KDE 3 was around the same time as the early years of Facebook development.

The Social Network

There were a few more things, but I saw the movie several weeks ago and the details are fuzzy in my head. Meanwhile, if you are interested in the veracity of the movie’s substance, I found this Gigaom post useful.

Toying with node.js

A commenter rightly complained that despite my claims of “playing around” with node.js, all I could come up with was the example in the man page. I replied saying that I did intend to post something that I wrote from scratch, and as promised, here is my first toy node.js program:

// Uses the early node.js API (process.openStdin, http.createClient).
var http = require('http');

function search() {
  // Read search terms from standard input, one per line.
  var stdin = process.openStdin();
  stdin.setEncoding('utf8');
  stdin.on('data', function(term) {
    // Strip the trailing newline left by 'Enter', then escape the term
    // so that spaces and other special characters do not break the URL.
    term = encodeURIComponent(term.replace(/\r?\n$/, ''));
    var google = http.createClient(80, 'ajax.googleapis.com');
    var search_url = "/ajax/services/search/web?v=1.0&q=" + term;
    var request = google.request('GET', search_url, {
      'host': 'ajax.googleapis.com',
      'Referer': 'http://floatingsun.net',
      'User-Agent': 'NodeJS HTTP client',
      'Accept': '*/*'});
    request.on('response', function(response) {
      response.setEncoding('utf8');
      // Accumulate the response body; it may arrive in several chunks.
      var body = "";
      response.on('data', function(chunk) {
        body += chunk;
      });
      response.on('end', function() {
        // The body is JSON; print the URL of each search result.
        var searchResults = JSON.parse(body);
        var results = searchResults["responseData"]["results"];
        for (var i = 0; i < results.length; i++) {
          console.log(results[i]["url"]);
        }
      });
    });
    request.end();
  });
}

search();

This program (also available as a gist) reads in search terms on standard input, and does a Google search on those terms, printing the URLs of the search results.

I was quite surprised (and a bit embarrassed) at how long it took me to get this simple program working. For instance, it took me the better part of an hour to realize that when I read something from stdin, it includes the trailing newline (as the user hits ‘Enter’). Earlier, I was using the input as-is for the search term, and that was leading to a 404 error, because the resulting URL was malformed.
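
In hindsight, the newline problem is easy to isolate. The sketch below builds the request path for the program above from one raw line of input; the helper name `buildSearchPath` is made up for this illustration, and the `encodeURIComponent` call is an extra precaution I am assuming is wanted so that spaces in the term do not break the URL either:

```javascript
// Hypothetical helper: turn a raw line read from stdin into the
// request path used by the search program.
function buildSearchPath(rawLine) {
  const term = rawLine.trim(); // drops the trailing newline and stray spaces
  return '/ajax/services/search/web?v=1.0&q=' + encodeURIComponent(term);
}

console.log(buildSearchPath('node js\n'));
// /ajax/services/search/web?v=1.0&q=node%20js
```

Keeping this step in its own function also makes it possible to check the URL by eye before any network code gets involved.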

Debugging was also harder, as expected. Syntax errors are easily caught by V8, but everything else is still obscure. I’m sure some of the difficulty is because of my lack of expertise with JavaScript. But at one point, I got this error:

events:12
        throw arguments[1];
                       ^
Error: Parse Error
    at Client.ondata (http:881:22)
    at IOWatcher.callback (net:517:29)
    at node.js:270:9

I still haven’t figured out exactly where that error was coming from. Nonetheless, it was an interesting exercise. I’m looking forward to writing some non-trivial code with node.js now.

What is node.js?

If you follow the world of JavaScript and/or high-performance networking, you have probably heard of node.js. If you already grok Node, then this post is not for you; move along. If, however, you are a bit confused as to exactly what node.js is and how it works, then you should read on.

The node.js website doesn’t mince words in describing the software: “Evented I/O for V8 JavaScript.” While that statement is precise and captures the essence of node.js succinctly, at first glance it did not tell me much about node.js. I did what anyone interested in node.js should do: downloaded the source and started playing around with it.

So what exactly is node.js? Well, first and foremost it is a JavaScript runtime. Think of your web browser; how does it run JavaScript? It implements a JavaScript runtime and supports APIs that make sense in the browser, such as DOM manipulation. JavaScript as a language is fairly browser agnostic. So node.js is yet another runtime for JavaScript, implemented primarily in C++.

Because node.js focuses on networking, it does not support the standard APIs available in a browser. Instead, it provides a different set of APIs (with fantastic documentation). Thus, for instance, HTTP support is built into node.js — it is not an external library.

The other salient feature of node.js is that it is event driven. If you are familiar with event driven programming (à la Python’s Twisted, Ruby’s EventMachine, the event loop in Qt, etc.), you know what I’m talking about. The key difference, though, is that unlike all these systems, you never explicitly invoke a blocking call to start the event loop: node.js automatically enters the event loop as soon as it has finished loading the program. A corollary is that you can only write event driven programs in node.js; no other programming models are supported. Another consequence of this design choice is that node.js is single-threaded. To exploit CPU parallelism, you need to run multiple node.js instances. Of course, there are several node.js modules and projects already available to address this very issue.
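
To make the implicit event loop concrete, here is a small sketch, assuming a current Node.js release; the `order` array and the 10 ms delay are arbitrary choices for the illustration. Note that the script never calls anything like a "run loop" function:

```javascript
const order = [];

order.push('script start');

// Neither callback runs yet; each one is only registered for later.
setTimeout(function () {
  order.push('timer fired');
  console.log(order.join(' -> '));
}, 10);

process.nextTick(function () {
  order.push('next tick');
});

order.push('script end');
// Once the script has finished loading, node enters the event loop on
// its own and runs the callbacks registered above.
// Prints: script start -> script end -> next tick -> timer fired
```

Both callbacks run only after the last line of the script, on the same single thread that executed the script itself.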

To implement a runtime for JavaScript, node.js first needs to parse the input JavaScript. node.js leverages Google’s V8 JavaScript engine to do this. V8 takes care of parsing and executing the JavaScript, so node.js need not worry about syntactical issues; it only needs to implement the appropriate hooks and callbacks for V8.

node.js claims to be extremely memory efficient and scalable. This is possible because node.js does not expose any blocking APIs. As a result, the program is completely callback driven. Of course, some I/O (disk access in particular) will eventually block. node.js pushes such blocking work onto an internal thread pool and relies on the operating system’s non-blocking facilities for sockets; thus even though the application executes in a single thread, internally there are multiple threads that node.js manages.
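
Here is what that looks like from the application’s side. This is a minimal sketch, assuming a current Node.js release; listing the current directory is just a convenient piece of disk I/O, and the `steps` array exists only to record the order of events:

```javascript
const fs = require('fs');

const steps = [];

// readdir returns immediately; the directory is read off the main
// thread and the callback runs later, from the event loop.
fs.readdir('.', function (err, names) {
  if (err) throw err;
  steps.push('callback ran');
  console.log(steps.join(' -> '), '(' + names.length + ' entries)');
});

steps.push('readdir returned');
```

The call to `fs.readdir` always returns before its callback runs, so the application thread is never stuck waiting on the disk.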

Overall, node.js is very refreshing. The community seems great and there is a lot of buzz around the project right now, with some big companies like Yahoo starting to experiment with node.js. node.js is also driving the “server side JavaScript” movement. For instance, Joyent’s Smart platform allows you to write your server code in JavaScript, which they can then execute on their hosted platforms.

Finally, no blog post about node.js is complete without an example of node.js code. Here is a simple web server:

[gist id=485001]
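
The gist itself is not reproduced here. As a stand-in, here is a minimal sketch of a web server on a current Node.js release using the built-in `http` module; the handler name, the greeting text and the command-line port argument are choices made for this illustration, not taken from the original gist:

```javascript
const http = require('http');

// Hypothetical handler: answers every request with the same plain-text body.
function handleRequest(request, response) {
  response.writeHead(200, { 'Content-Type': 'text/plain' });
  response.end('Hello from node.js\n');
}

const server = http.createServer(handleRequest);

// Start listening only when a port is given, e.g.: node server.js 8000
const port = Number(process.argv[2]);
if (port) {
  server.listen(port, '127.0.0.1', function () {
    console.log('Listening on http://127.0.0.1:' + port + '/');
  });
}
```

Save it as `server.js`, run `node server.js 8000` and point a browser at the printed address. Note that there is again no explicit event loop: `listen` registers the server and node keeps running for as long as it is open.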