Tagged: Aster

Aster is hiring!

These days all the news you hear is pretty much bad news. Layoffs and cutbacks everywhere.

Aster Data Systems

I’m therefore very happy to note that we are still hiring at Aster! We have a couple of positions open — you can find all the details on our website. So if you are looking for an exciting place to work at in these challenging times, drop us a note. Or if you know someone who is looking for a job, point them our way. Thanks!

The Aster Flower

DISCLAIMER: These are just my opinions and should not be construed as official in any sense.

A few weeks back I was thinking about the name “Aster” and I realized that I didn’t really know what it meant. So I did some quick Googling and landed up at good old Wikipedia. It turns out that Aster is a genus of flowers.

On some more reading, it was evident that one could draw parallels between some of the characteristics of this genus and the features of Aster’s products. Here are some quotes from the article on Asteraceae:

The most evident characteristic of Asteraceae is perhaps their inflorescence: a specialised capitulum, technically called a calathid or calathidium, but generally referred to as flower head or, alternatively, simply capitulum. The capitulum is a contracted raceme composed of numerous individual sessile flowers, called the florets, all sharing the same receptacle.

In particular, a pseudanthium (Greek for “false flower”) or flower head is a special type of inflorescence, in which several flowers are grouped together to form a flower-like structure.

In the same spirit, the Aster nCluster database is composed of many commodity components but exposes itself as a single, unified database.

Diving into nPath

A few days ago, Steve posted a concrete example of Aster nPath. Though the example in that post was called “straightforward”, I think that for people unfamiliar with nPath syntax, it could be a little difficult to digest at a glance. It certainly wasn’t immediately obvious to me. So I’ll try to break the query in that post down into little bits and hopefully clarify the syntax further.

First, some context. Here is the problem that Steve posed:

For example, suppose we are interested in the optimization of our website flow in order to retain and engage visitors driven to us by SEO/SEM. We want to answer the question: for SEO/SEM-driven traffic that stay on our site only for 5 or less pageviews and then leave our site and never return in the same session, what are the top referring search queries and what are the top path of navigated pages on our site? In traditional data warehouse solutions, this problem would require a five-way self-join of granular weblog data, which is simply unfeasible for large sites such as Myspace.

And here is the nPath query that answers the above problem (taken from Steve’s post):


SELECT entry_refquerystring, entry_page || "," || onsite_pagepath as onsite_pagepath, count(*) as session_count

FROM nPath(
ON ( select * from clicks where year = 2009 )
PARTITION BY customerid, sessionid
ORDER BY timestamp
PATTERN ( 'Entry.Onsite+.OffSite+$' )
SYMBOLS (
domain ilike "mysite.com" and refdomain ~* "yahoo.com|google.com|msn.com|live.com" as Entry,
domain ilike "mysite.com" as OnSite,
domain not ilike "mysite.com" as OffSite
)
MODE( NONOVERLAPPING )
RESULT(
first(page of Entry) as entry_page,
first(refquerystring of Entry) as entry_refquerystring,
accumulate(page of Onsite) as onsite_pagepath,
count(* of Onsite) as onsitecount_minus1
)
)
WHERE onsitecount_minus1 < 4
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 1000;

Alright, so let’s see what’s going on here. It is important to always keep the big picture in mind: nPath scans groups of (sequential) rows at a time, searching for user-specified patterns. Thus, the first thing that we need to specify is exactly which rows nPath will operate on. This looks pretty much like a regular SQL query:


ON ( select * from clicks where year = 2009 )
PARTITION BY customerid, sessionid
ORDER BY timestamp
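
To make this step concrete, here is a rough Python sketch of what the three clauses amount to: filter the rows, group them by (customerid, sessionid), and sort each group by timestamp. This only emulates the semantics and says nothing about how nCluster actually executes it. The column names come from the query; the sample values and the helper name partition_clicks are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical click rows: column names follow the query, values are invented.
clicks = [
    {"customerid": 1, "sessionid": 10, "timestamp": 3, "page": "/c", "year": 2009},
    {"customerid": 1, "sessionid": 10, "timestamp": 1, "page": "/a", "year": 2009},
    {"customerid": 2, "sessionid": 20, "timestamp": 2, "page": "/x", "year": 2009},
    {"customerid": 1, "sessionid": 10, "timestamp": 2, "page": "/b", "year": 2008},
]

def partition_clicks(rows, year):
    """Emulate ON (select ... where year = ...) PARTITION BY ... ORDER BY timestamp."""
    selected = [r for r in rows if r["year"] == year]
    key = itemgetter("customerid", "sessionid")
    # Sorting by (partition key, timestamp) keeps each session together, in time order.
    selected.sort(key=lambda r: (key(r), r["timestamp"]))
    return {k: list(group) for k, group in groupby(selected, key=key)}

sessions = partition_clicks(clicks, 2009)
```

Each value in sessions is one ordered run of rows, which is the unit the pattern is later matched against.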

Next, we must specify what it is that we are looking for, that is, the search pattern. The search pattern is unimaginatively specified via the PATTERN clause:


PATTERN ( 'Entry.Onsite+.OffSite+$' )

The pattern description looks very much like a regular expression. The above pattern will find groups of rows where the first row matches “Entry”, followed by one or more “OnSite” rows and ending with one or more “OffSite” rows. Next, we need to define what “Entry”, “OnSite” and “OffSite” mean. This is done via the SYMBOLS clause:


SYMBOLS (
domain ilike "mysite.com" and refdomain ~* "yahoo.com|google.com|msn.com|live.com" as Entry,
domain ilike "mysite.com" as OnSite,
domain not ilike "mysite.com" as OffSite
)

A row denotes an “Entry” if the corresponding page is on the domain mysite.com and the referrer domain was one of the popular search engines. This makes sure we are not counting clicks from other links on mysite.com. “OnSite” and “OffSite” have similar descriptions.
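
One way I found helpful to think about symbols and patterns together is to label every row of a session with a letter and then run an ordinary regular expression over the resulting string. The Python sketch below does that for this particular query. It is an analogy, not nPath's real matching engine: the letter codes, the helper names and the sample rows are all my own invention. Note that an Entry row also satisfies the OnSite predicate, so the OnSite+ part of the pattern is written as the character class [EN].

```python
import re

# Same alternation as the refdomain ~* "..." predicate (case-insensitive, unanchored).
SEARCH_ENGINES = re.compile(r"yahoo.com|google.com|msn.com|live.com", re.IGNORECASE)

def symbol_code(row):
    """Label a row: E = satisfies Entry (and OnSite), N = OnSite only, F = OffSite."""
    on_site = row["domain"].lower() == "mysite.com"  # ilike without wildcards
    if on_site and SEARCH_ENGINES.search(row["refdomain"]):
        return "E"
    return "N" if on_site else "F"

# 'Entry.Onsite+.OffSite+$' rewritten over the one-letter codes.
PATTERN = re.compile(r"E[EN]+F+$")

def find_match(session_rows):
    """Return the rows covered by the pattern in one ordered session, or None."""
    codes = "".join(symbol_code(r) for r in session_rows)
    m = PATTERN.search(codes)
    return session_rows[m.start():m.end()] if m else None

# Hypothetical session, already ordered by timestamp.
session = [
    {"domain": "mysite.com", "refdomain": "www.google.com"},
    {"domain": "MySite.com", "refdomain": "mysite.com"},
    {"domain": "mysite.com", "refdomain": "mysite.com"},
    {"domain": "elsewhere.org", "refdomain": "mysite.com"},
]
matched_rows = find_match(session)
```

A session that never leaves the site (no trailing OffSite row) yields no match, which is how the query restricts itself to visitors who leave.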

Once we have identified the pattern and the symbols that make up the pattern, we can do some processing with them. Among other things, nPath allows you to dynamically control the columns of the output. These are specified via the RESULT clause. Columns can be specified via existing column attributes, or using SQL/nPath aggregates:


RESULT(
first(page of Entry) as entry_page,
first(refquerystring of Entry) as entry_refquerystring,
accumulate(page of Onsite) as onsite_pagepath,
count(* of Onsite) as onsitecount_minus1
)
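
To show what these aggregates compute, here is a small Python sketch that builds one output row from one pattern match. It assumes the match is available as a list of (symbol, row) pairs, which is my own representation, and it renders the accumulated path as a comma-joined string; the actual output format of accumulate in nPath may well differ. The function name build_result and the sample data are hypothetical.

```python
def build_result(matched):
    """matched: (symbol, row) pairs for one pattern match, in timestamp order."""
    entry_rows = [row for sym, row in matched if sym == "Entry"]
    onsite_rows = [row for sym, row in matched if sym == "OnSite"]
    return {
        # first(page of Entry), first(refquerystring of Entry)
        "entry_page": entry_rows[0]["page"],
        "entry_refquerystring": entry_rows[0]["refquerystring"],
        # accumulate(page of Onsite): assumed here to be a comma-joined string
        "onsite_pagepath": ",".join(r["page"] for r in onsite_rows),
        # count(* of Onsite)
        "onsitecount_minus1": len(onsite_rows),
    }

# Hypothetical match: one Entry row, two OnSite rows, one OffSite row.
match = [
    ("Entry", {"page": "/home", "refquerystring": "q=aster"}),
    ("OnSite", {"page": "/docs", "refquerystring": ""}),
    ("OnSite", {"page": "/pricing", "refquerystring": ""}),
    ("OffSite", {"page": "/elsewhere", "refquerystring": ""}),
]
result = build_result(match)
```

So every match collapses into a single row, and the name onsitecount_minus1 makes sense: the Entry page is itself on the site but is not included in the OnSite count.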

Finally, we can specify exactly what columns from the output we want to view, and how we want to view them. This is specified just like a regular SELECT clause, which operates on the output of an nPath query instead of regular SQL tables:


SELECT entry_refquerystring, entry_page || "," || onsite_pagepath as onsite_pagepath, count(*) as session_count
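
For completeness, the outer query (the SELECT together with the WHERE, GROUP BY, ORDER BY and LIMIT from the full query) can be sketched in Python as a filter followed by a count per (query string, path) pair. The rows below stand in for nPath's output and are invented; top_paths is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical nPath output rows (one per matched session).
npath_rows = [
    {"entry_refquerystring": "q=aster", "entry_page": "/home",
     "onsite_pagepath": "/docs", "onsitecount_minus1": 1},
    {"entry_refquerystring": "q=aster", "entry_page": "/home",
     "onsite_pagepath": "/docs", "onsitecount_minus1": 1},
    {"entry_refquerystring": "q=npath", "entry_page": "/blog",
     "onsite_pagepath": "/a,/b,/c,/d", "onsitecount_minus1": 4},
]

def top_paths(rows, limit=1000):
    """WHERE onsitecount_minus1 < 4, GROUP BY 1,2, ORDER BY count DESC, LIMIT."""
    counts = Counter(
        (r["entry_refquerystring"], r["entry_page"] + "," + r["onsite_pagepath"])
        for r in rows
        if r["onsitecount_minus1"] < 4
    )
    return [(query, path, n) for (query, path), n in counts.most_common(limit)]

top = top_paths(npath_rows)
```

The third sample row is dropped by the filter because its OnSite count is not below 4.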

And that’s about it as far as the functionality of the query goes. The remaining clauses (WHERE, GROUP BY, ORDER BY and LIMIT) are regular SQL applied to the output of nPath. There is a LOT more to nPath than this, but hopefully it gives you some idea of the capabilities of this extremely powerful construct.

Milestones

I have been away from this blog for quite a while now and much has happened since then. The past few months have been extremely busy but also quite eventful. 2008 was a very important year for me as I hit several big milestones. Each of these warrants a post of its own, but these one-liners will do for now.

First, I successfully defended my PhD on October 31st. I have since finished all the paperwork and filed my dissertation. I am now, officially, Dr. Gupta! :-)

PhD

Second, I had a blast doing job interviews, and was in the difficult but fortunate position of having to choose among several excellent offers, especially given the current economy. I will be joining Aster Data in a few days!

Work

And finally, I got married to this most wonderful woman! While our wedding was crazy and intense and hectic and a ton of fun, I have come to believe that the reason marriages are more successful in India is that just one marriage drains you so much that you can’t even think about ever getting married again :-D

Diwaker and Surabhi