Floating Sun » Aster http://floatingsun.net Mon, 07 Jan 2013 02:53:26 +0000 en-US hourly 1 http://wordpress.org/?v=3.5.1 Indian Sweets in the Kitchen http://floatingsun.net/2011/04/20/indian-sweets-in-the-kitchen/?utm_source=rss&utm_medium=rss&utm_campaign=indian-sweets-in-the-kitchen http://floatingsun.net/2011/04/20/indian-sweets-in-the-kitchen/#comments Thu, 21 Apr 2011 05:38:54 +0000 Diwaker Gupta http://floatingsun.net/?p=1760 Related posts:
  1. The Hypocrisy of Indian Politics – The India Uncut Blog – India Uncut
  2. The new vanguards of Indian democracy
  3. Directory of Indian ads
]]>
JohnC wrote a cute post over on his blog that filled me with nostalgia. I figured I’d respond with a counter-point and shamelessly stole the title from the original post :) So go read his post first.

Welcome back. I’ve been meaning to write a series of posts about my startup experience(s) and John’s post is good inspiration. If nothing else, I’ll try to post something to supplement or respond to his posts.

Back to Kaju Barfi (also known as Kaju Katli by many). First, a picture:

Kaju Barfi
Image Courtesy: slimwithyoga.com

John makes a good point about being in a diverse environment surrounded by people who come from different cultures and backgrounds. I have seen myself grow personally and intellectually in similar situations. While I agree that US immigration law needs serious reform, let there be no doubt that there are very few countries that are as immigrant-friendly as the United States. Countless people from all over the world have come to the US, made it their home and contributed to all walks of society. I cannot imagine Americans having the same kind of success in India as Indians have had in the US.

Another experience probably most Indians in the US would share is this: only when you are outside India do you realize how little you know about your country. Over the years I’ve been asked all sorts of questions about India — from naive ones about elephants and snake-charmers to hard-hitting ones about religion, freedom and corruption.

While we are on the subject of Kaju Barfi, do you see the silver material coating the surface of the barfis in the picture above? Here’s a piece of advice: do NOT microwave that thing! Or any other Indian sweet that has the silver foil. It is a question that comes up often: what is it? What is its purpose? The silver foil is commonly called “vark” in India and yes, traditionally it is meant to be a super-thin silver foil. It serves no particular purpose other than to give a grandiose look to the sweets; it is edible and does not modify the taste of the sweets.

You may worry about consuming metal with your sweets, and if you think that many people likely don’t use silver anymore, you’d probably be right. But relative to the challenges our world faces, I’d say it’s a minor concern. Hundreds of millions who are eating the silver-foiled sweets daily are doing just fine.

John, I’ll bring you a box next time I’m in India!

]]>
http://floatingsun.net/2011/04/20/indian-sweets-in-the-kitchen/feed/ 1
Big Data Analytics http://floatingsun.net/2010/03/31/big-data-analytics/?utm_source=rss&utm_medium=rss&utm_campaign=big-data-analytics http://floatingsun.net/2010/03/31/big-data-analytics/#comments Thu, 01 Apr 2010 06:49:01 +0000 Diwaker Gupta http://floatingsun.net/?p=1251 Related posts:
  1. Big Data Summit
  2. Interesting Analytics
  3. Google Analytics — the deep web
]]>
DISCLAIMER: As with all other material on this blog, these are my thoughts and do NOT reflect the opinions of my employer.

I really like the tagline on our logo: big data. fast insights.

But leaving the marketing aside, what does it mean really? What is all the hoopla about big data analytics?

The way I look at things, a few key observations here are:

  1. Data is increasing. This is almost self-evident, so I won’t bother with presenting any evidence.
  2. Data is driving businesses more than ever. Whether it is search, advertising, insurance, finance, health care, governance — data is becoming an integral part of more and more business processes.
  3. Finally, data movement is slow. And I mean really, really slow, compared to our processing and memory speeds. Once you go into the range of hundreds of terabytes or petabytes of data, you really don’t want to keep moving that data around into isolated silos for doing analytics.

Clearly, none of these observations is particularly new or insightful. However, I do think some of the implications of these observations are quite powerful and were new at least for me. For instance, (3) implies that once you have accumulated a lot of data in one place (imagine hundreds of TB or more), it is extremely difficult and time-consuming to move that data around. This, in turn, means that more often than not, data is likely to reside in a single place.

Traditionally, it was not uncommon to have a large data warehouse that would be the repository of all data. Smaller data sets (also known as data marts) could then be carved out from this master data set as required. This approach is becoming increasingly infeasible. Carving out 100TB data marts from a 1PB data warehouse is simply not going to scale.
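
To put rough numbers on how slow data movement is, here is a back-of-the-envelope sketch in Python. The link speeds (1 and 10 Gbps) and the assumption of a fully utilized link are illustrative choices of mine, not measurements; real transfers would also pay for disk reads, protocol overhead and loading on the other side.

```python
# Back-of-the-envelope transfer times. Link speeds are illustrative assumptions.
TB = 10**12  # bytes in a (decimal) terabyte


def transfer_hours(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_bytes over a link of link_gbps gigabits per second."""
    bytes_per_second = link_gbps * 1e9 / 8 * efficiency
    return size_bytes / bytes_per_second / 3600


for gbps in (1, 10):
    hours = transfer_hours(100 * TB, gbps)
    print(f"100 TB over {gbps} Gbps: {hours:.1f} hours ({hours / 24:.1f} days)")
```

Under these assumptions a single 100TB data mart ties up a dedicated 1 Gbps link for about nine days, and a 10 Gbps link for most of a day, before any analysis has started.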

At the same time, it is clear that a one-size-fits-all approach to data storage and analysis is not practical either. Some data sets naturally lend themselves to a relational data model, while others might be more suited to unstructured processing (Hadoop) or document-oriented processing (CouchDB or MarkLogic) or graph analysis (Neo4j) and so on. Forcing a single model or access mechanism down all customers’ throats is not tenable.

So what would the ideal platform for big data analytics look like? One that allows you to store and access data in various ways, seamlessly.

]]>
http://floatingsun.net/2010/03/31/big-data-analytics/feed/ 0
Big Data Summit http://floatingsun.net/2009/10/04/big-data-summit/?utm_source=rss&utm_medium=rss&utm_campaign=big-data-summit http://floatingsun.net/2009/10/04/big-data-summit/#comments Sun, 04 Oct 2009 12:58:03 +0000 Diwaker Gupta http://floatingsun.net/?p=1154 Related posts:
  1. Big Data Analytics
  2. Lousy reporting at TechCrunch
]]>
DISCLAIMER: I work at Aster.

If you are interested in Big Data, you should check out the Big Data Summit.

Big Data Summit

What is it about?

The informal evening event, colocated with Hadoop World: NYC, will highlight advancements in
data warehousing and big data management. Similar to ScaleCamp, Big Data Summit will showcase
some of the most innovative uses of MPP data warehousing and other complementary solutions like
Hadoop to harness the power of Big Data.

]]>
http://floatingsun.net/2009/10/04/big-data-summit/feed/ 0
Aster is hiring! http://floatingsun.net/2009/03/30/aster-is-hiring/?utm_source=rss&utm_medium=rss&utm_campaign=aster-is-hiring http://floatingsun.net/2009/03/30/aster-is-hiring/#comments Mon, 30 Mar 2009 18:36:00 +0000 Diwaker Gupta http://floatingsun.net/?p=1079 Related posts:
  1. Aster @ MySpace
  2. The Aster Flower
]]>
These days pretty much all the news you hear is bad news. Layoffs and cutbacks everywhere.

Aster Data Systems

I’m therefore very happy to note that we are still hiring at Aster! We have a couple of positions open — you can find all the details on our website. So if you are looking for an exciting place to work at in these challenging times, drop us a note. Or if you know someone who is looking for a job, point them our way. Thanks!

]]>
http://floatingsun.net/2009/03/30/aster-is-hiring/feed/ 0
The Aster Flower http://floatingsun.net/2009/03/24/the-aster-flower/?utm_source=rss&utm_medium=rss&utm_campaign=the-aster-flower http://floatingsun.net/2009/03/24/the-aster-flower/#comments Wed, 25 Mar 2009 04:55:32 +0000 Diwaker Gupta http://floatingsun.net/2009/03/24/the-aster-flower/ Related posts:
  1. Aster is hiring!
  2. Aster @ MySpace
  3. Mother’s Day
]]>
DISCLAIMER: These are just my opinions and should not be construed as official in any sense.

A few weeks back I was thinking about the name “Aster” and I realized that I didn’t really know what it meant. So I did some quick Googling and landed up at good old Wikipedia. It turns out that Aster is a genus of flowers.

On some more reading, it was evident that one could draw parallels between some of the characteristics of this genus and the features of Aster’s products. Here are some quotes from the article on Asteraceae:

The most evident characteristic of Asteraceae is perhaps their inflorescence: a specialised capitulum, technically called a calathid or calathidium, but generally referred to as flower head or, alternatively, simply capitulum. The capitulum is a contracted raceme composed of numerous individual sessile flowers, called the florets, all sharing the same receptacle.

In particular, a pseudanthium (Greek for “false flower”) or flower head is a special type of inflorescence, in which several flowers are grouped together to form a flower-like structure.

In the same spirit, the Aster nCluster database is composed of many commodity components but exposes itself as a single, unified database.

]]>
http://floatingsun.net/2009/03/24/the-aster-flower/feed/ 2
Diving into nPath http://floatingsun.net/2009/03/17/diving-into-npath/?utm_source=rss&utm_medium=rss&utm_campaign=diving-into-npath http://floatingsun.net/2009/03/17/diving-into-npath/#comments Wed, 18 Mar 2009 06:36:42 +0000 Diwaker Gupta http://floatingsun.net/?p=1012 No related posts.
]]>
A few days ago, Steve posted a concrete example of Aster nPath. Though the example in that post was called “straightforward”, I think that for people unfamiliar with nPath syntax, it could be a little difficult to digest at a glance. It certainly wasn’t immediately obvious to me. So I’ll try to break down the query in that post into little bits and hopefully clarify the syntax further.

First, some context. Here is the problem that Steve posed:

For example, suppose we are interested in the optimization of our website flow in order to retain and engage visitors driven to us by SEO/SEM. We want to answer the question: for SEO/SEM-driven traffic that stay on our site only for 5 or less pageviews and then leave our site and never return in the same session, what are the top referring search queries and what are the top path of navigated pages on our site? In traditional data warehouse solutions, this problem would require a five-way self-join of granular weblog data, which is simply unfeasible for large sites such as Myspace.

And here is the nPath query that answers the above problem (taken from Steve’s post):


SELECT entry_refquerystring, entry_page || ',' || onsite_pagepath as onsite_pagepath, count(*) as session_count
FROM nPath(
    ON ( select * from clicks where year = 2009 )
    PARTITION BY customerid, sessionid
    ORDER BY timestamp
    PATTERN ( 'Entry.OnSite+.OffSite+$' )
    SYMBOLS (
        domain ilike 'mysite.com' and refdomain ~* 'yahoo.com|google.com|msn.com|live.com' as Entry,
        domain ilike 'mysite.com' as OnSite,
        domain not ilike 'mysite.com' as OffSite
    )
    MODE( NONOVERLAPPING )
    RESULT(
        first(page of Entry) as entry_page,
        first(refquerystring of Entry) as entry_refquerystring,
        accumulate(page of OnSite) as onsite_pagepath,
        count(* of OnSite) as onsitecount_minus1
    )
)
WHERE onsitecount_minus1 < 4
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 1000;

Alright, so let’s see what’s going on here. It is important to always keep the big picture in mind: nPath scans groups of (sequential) rows at a time, searching for user-specified patterns. Thus, the first thing that we need to specify is exactly which rows nPath will operate on. This looks pretty much like a regular SQL query:


ON ( select * from clicks where year = 2009 )
PARTITION BY customerid, sessionid
ORDER BY timestamp
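
For readers who think more easily in a general-purpose language, here is a rough Python analogue of these three clauses. It is only a sketch of the semantics as described in this post, not of how nCluster executes them; the `clicks` rows and the `partition_rows` helper are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical click rows; column names mirror the query, values are made up.
clicks = [
    {"customerid": 1, "sessionid": 10, "timestamp": 3, "year": 2009, "page": "/c"},
    {"customerid": 1, "sessionid": 10, "timestamp": 1, "year": 2009, "page": "/a"},
    {"customerid": 2, "sessionid": 20, "timestamp": 2, "year": 2008, "page": "/x"},
    {"customerid": 1, "sessionid": 10, "timestamp": 2, "year": 2009, "page": "/b"},
    {"customerid": 2, "sessionid": 21, "timestamp": 5, "year": 2009, "page": "/y"},
]


def partition_rows(rows):
    """Return {(customerid, sessionid): rows ordered by timestamp} for 2009 clicks."""
    key = itemgetter("customerid", "sessionid")
    # ON ( select * from clicks where year = 2009 )
    selected = [r for r in rows if r["year"] == 2009]
    # ORDER BY timestamp, applied within each partition
    selected.sort(key=lambda r: (key(r), r["timestamp"]))
    # PARTITION BY customerid, sessionid
    return {k: list(group) for k, group in groupby(selected, key=key)}


partitions = partition_rows(clicks)
for session, rows in partitions.items():
    print(session, [r["page"] for r in rows])
```

Each value in `partitions` is one time-ordered session, which is the unit the pattern matching works on.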

Next, we must specify what we are looking for, that is, the search pattern. The search pattern is unimaginatively specified via the PATTERN clause:


PATTERN ( 'Entry.OnSite+.OffSite+$' )

The pattern description looks very much like a regular expression. The above pattern will find groups of rows where the first row matches “Entry”, followed by one or more “OnSite” rows and ending with one or more “OffSite” rows. Next, we need to define what “Entry”, “OnSite” and “OffSite” mean. This is done via the SYMBOLS clause:


SYMBOLS (
    domain ilike 'mysite.com' and refdomain ~* 'yahoo.com|google.com|msn.com|live.com' as Entry,
    domain ilike 'mysite.com' as OnSite,
    domain not ilike 'mysite.com' as OffSite
)

A row denotes an “Entry” if the corresponding page is on the domain mysite.com and the referer domain was one of the popular search engines. This makes sure we are not counting clicks from other links on mysite.com. “OnSite” and “OffSite” have similar descriptions.
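
To make the interplay between the pattern and the symbols concrete, here is a small Python sketch that emulates both with the standard `re` module. It assumes that `ilike` without wildcards acts as a case-insensitive equality test and that `~*` behaves like PostgreSQL's case-insensitive regex match. The one-letter codes, the `encode` and `find_match` helpers and the sample session are all hypothetical.

```python
import re

# refdomain ~* '...': case-insensitive regular expression match (assumed semantics).
SEARCH_ENGINES = re.compile(r"yahoo.com|google.com|msn.com|live.com", re.IGNORECASE)


def encode(row):
    """Map a click row to a one-letter code: E (Entry), N (other OnSite), F (OffSite)."""
    if row["domain"].lower() != "mysite.com":    # domain not ilike 'mysite.com'
        return "F"
    if SEARCH_ENGINES.search(row["refdomain"]):  # on-site and referred by a search engine
        return "E"
    return "N"


# Entry.OnSite+.OffSite+$ : an Entry row also satisfies OnSite, so OnSite accepts E or N.
SESSION_PATTERN = re.compile(r"E[EN]+F+$")


def find_match(session_rows):
    """Return the rows matching the pattern at the end of the session, or None."""
    encoded = "".join(encode(r) for r in session_rows)
    m = SESSION_PATTERN.search(encoded)
    return session_rows[m.start():m.end()] if m else None


session = [
    {"domain": "mysite.com", "refdomain": "www.google.com", "page": "/home"},
    {"domain": "MySite.com", "refdomain": "mysite.com", "page": "/photos"},
    {"domain": "example.org", "refdomain": "mysite.com", "page": "/"},
]
matched = find_match(session)
print([r["page"] for r in matched] if matched else None)
```

Because of the trailing `$`, a session in which the visitor comes back to mysite.com after leaving does not match, which is the "never return in the same session" part of the question.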

Once we have identified the pattern and the symbols that make up the pattern, we can do some processing with them. Among other things, nPath allows you to dynamically control the columns of the output. These are specified via the RESULT clause. Columns can be specified via existing column attributes, or using SQL/nPath aggregates:


RESULT(
    first(page of Entry) as entry_page,
    first(refquerystring of Entry) as entry_refquerystring,
    accumulate(page of OnSite) as onsite_pagepath,
    count(* of OnSite) as onsitecount_minus1
)
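
As a rough mental model, each aggregate in the RESULT clause collapses the rows matched by one symbol into a single output value. The Python sketch below shows that idea; the `result_row` helper and the sample rows are hypothetical, and joining pages with commas is my assumption about how accumulate formats its output.

```python
def result_row(entry_rows, onsite_rows):
    """Build one output row from the rows matched by Entry and by OnSite."""
    return {
        # first(page of Entry)
        "entry_page": entry_rows[0]["page"],
        # first(refquerystring of Entry)
        "entry_refquerystring": entry_rows[0]["refquerystring"],
        # accumulate(page of OnSite); the comma-joined format is an assumption
        "onsite_pagepath": ",".join(r["page"] for r in onsite_rows),
        # count(* of OnSite)
        "onsitecount_minus1": len(onsite_rows),
    }


entry = [{"page": "/home", "refquerystring": "q=kaju+barfi"}]
onsite = [{"page": "/photos"}, {"page": "/recipes"}]
row = result_row(entry, onsite)
print(row)
```

The Entry row is matched separately from the OnSite rows, which is presumably why the count is named onsitecount_minus1.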

Finally, we can specify exactly what columns from the output we want to view, and how we want to view them. This is specified just like a regular SELECT clause, which operates on the output of an nPath query instead of regular SQL tables:


SELECT entry_refquerystring, entry_page || ',' || onsite_pagepath as onsite_pagepath, count(*) as session_count

And that’s about it as far as the functionality of the query goes. There is of course some more syntactic sugar for specifying other constraints such as GROUP BY and ORDER BY. There is a LOT more to nPath than this, but hopefully it gives you some idea of the capabilities of this extremely powerful construct.
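
The outer part of the query (WHERE, GROUP BY, ORDER BY, LIMIT) is ordinary SQL applied to the rows that nPath emits. A minimal Python sketch of that last step follows; the `npath_output` rows and the `top_paths` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical rows as they might come out of the nPath call; values are made up.
npath_output = [
    {"entry_refquerystring": "q=sweets", "entry_page": "/home",
     "onsite_pagepath": "/photos", "onsitecount_minus1": 1},
    {"entry_refquerystring": "q=sweets", "entry_page": "/home",
     "onsite_pagepath": "/photos", "onsitecount_minus1": 1},
    {"entry_refquerystring": "q=barfi", "entry_page": "/recipes",
     "onsite_pagepath": "/a,/b,/c,/d", "onsitecount_minus1": 4},
    {"entry_refquerystring": "q=barfi", "entry_page": "/recipes",
     "onsite_pagepath": "/a", "onsitecount_minus1": 1},
]


def top_paths(rows, limit=1000):
    """Count sessions per (query string, page path), most frequent first."""
    counts = Counter(
        # SELECT entry_refquerystring, entry_page || ',' || onsite_pagepath ... GROUP BY 1,2
        (r["entry_refquerystring"], r["entry_page"] + "," + r["onsite_pagepath"])
        for r in rows
        if r["onsitecount_minus1"] < 4  # WHERE onsitecount_minus1 < 4
    )
    return counts.most_common(limit)  # ORDER BY 3 DESC LIMIT 1000


for (query, path), session_count in top_paths(npath_output):
    print(query, path, session_count)
```

The session with four on-site pages after the entry page is filtered out before counting, so only the short visits contribute to the ranking.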

]]>
http://floatingsun.net/2009/03/17/diving-into-npath/feed/ 1
Aster @ MySpace http://floatingsun.net/2009/03/05/aster-myspace/?utm_source=rss&utm_medium=rss&utm_campaign=aster-myspace http://floatingsun.net/2009/03/05/aster-myspace/#comments Thu, 05 Mar 2009 19:32:55 +0000 Diwaker Gupta http://floatingsun.net/?p=1034 Related posts:
  1. Aster is hiring!
  2. The Aster Flower
  3. Diving into nPath
]]>
[myspace]http://vids.myspace.com/index.cfm?fuseaction=vids.individual&videoid=53516936[/myspace]

For more details, check out Chris’s post over at the Aster blog.

]]>
http://floatingsun.net/2009/03/05/aster-myspace/feed/ 0