Comics Grabber


Ever feel tired of visiting dozens of web pages every day to read all your favorite comic strips? In the spirit of [[http://dailystrips.sf.net|dailystrips]] and [[http://www.orson.it/~domine/komics/|komics]], comics-grabber is yet another one-stop solution for viewing comics on the web.


Keep in mind that this program is for personal use only: making the output publicly available on the internet without permission from the strips’ authors constitutes copyright infringement. If you’re running it on a personal web server that can be reached from the internet (even if it’s not specifically public), make sure you set up restrictions so that only you have access to it.

===== Motivation =====
I was driven to write comics-grabber for the following reasons:

* I wanted to learn [[http://python.org|Python]] and so I wanted to take up some small but exciting project — comics-grabber seemed like the perfect idea!
* I was using komics earlier, but then I realized it has a fundamental limitation that severely restricted the number of comics I could read. In particular, I wasn’t able to access **any** comics from the [[http://kingfeatures.com|King Features]] website. Have you ever gotten that irritating “no referrer found” image in place of the comic strip you were expecting? There is a similar problem with some modes of dailystrips. So I thought, what the heck, let me just write my own solution.
* Above all, just for fun!

===== Download =====

The latest version is 0.4: [[http://floatingsun.net/data/code/comics-grabber-0.4.tar.gz|Download here]]

===== Usage =====


usage: comics-grabber.py [options]

options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -q, --quiet           supress messages
  -V, --verbose         verbose output
  -d OUTPUTDIR, --output-dir=OUTPUTDIR
                        output directory for comics
  -c CONFIG, --config=CONFIG
                        configuration file for comics
  -l, --list            print a list of available comics and exit
  -b, --browser         launch the generated HTML file in a browser
  -g GROUP, --group=GROUP
                        comic group to download
  -r, --rss             create rss instead of page

===== Configuration =====

==== Comic definitions ====

(no, the definition is //not// comic!)

Let us begin by taking an example, and dissecting it step by step. Later in the section I will walk you through the steps needed to write your own comic definition.


[dilbert]
name = Dilbert
type = comic
url = http://www.dilbert.com
prefix = http://www.dilbert.com
attr = alt
value = Today's Dilbert Comic
regex =

Okay, now let’s go through this line by line. The first line of the section definition, ''[dilbert]'', is just the section header. The section header is used to name the final image file and is also used in the group definitions. By convention, I like to keep section names short, with no spaces or special characters. If you need a more descriptive name to appear in the HTML file, you can use the name attribute.

The next line, ''name = Dilbert'', gives the descriptive name for the comic. This is a free-form string, so you can put anything in here. It is used to construct the heading for the comic when generating the HTML.

The ''type = comic'' line is crucial. This is the only line that distinguishes a group definition from a comic definition, so you must ensure that a valid type line is present for each section in your definitions file. Currently only two values, “comic” and “group”, are supported, and comics-grabber will exit with an error if it finds anything else in the definition.
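As a rough illustration of how the type field separates the two kinds of section, here is a minimal sketch using Python 3's standard ''configparser'' module. The helper name ''split_sections'' and the error message are hypothetical, and the actual comics-grabber code may well be organized differently.

```python
import configparser

# Sample definitions, reusing the Dilbert example from above plus a tiny group.
SAMPLE = """
[dilbert]
name = Dilbert
type = comic
url = http://www.dilbert.com
prefix = http://www.dilbert.com
attr = alt
value = Today's Dilbert Comic
regex =

[diwaker]
type = group
name = Diwaker
include = dilbert
"""

def split_sections(text):
    """Return (comics, groups) as lists of section names, based on the type field.

    Hypothetical helper: raises ValueError for a missing or unsupported type,
    mirroring the "exit with an error" behaviour described in the text.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    comics, groups = [], []
    for section in parser.sections():
        kind = parser.get(section, "type", fallback=None)
        if kind == "comic":
            comics.append(section)
        elif kind == "group":
            groups.append(section)
        else:
            raise ValueError(f"section [{section}] has unsupported type: {kind!r}")
    return comics, groups

comics, groups = split_sections(SAMPLE)
print(comics, groups)  # ['dilbert'] ['diwaker']
```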

Now comes the part where you actually describe where and how to fetch the comic. The url field specifies the web page on which the image will be found. The page is then parsed to extract the image URL. If the image URL is relative, the prefix field should contain the base URL, which is prepended to the relative URL obtained from the page to form the complete URL. If the image URL is already absolute, this field should be left empty.
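The prefix rule can be sketched in a few lines of Python. The function name ''build_image_url'' and the image paths are made up for illustration. Normalizing the slash between the two parts is my own assumption; the real program may simply concatenate the strings.

```python
def build_image_url(prefix, src):
    """Combine the configured prefix with the src found on the page.

    An empty prefix means the page already gives an absolute URL,
    so src is returned unchanged.
    """
    if not prefix:
        return src
    # Assumption: join with exactly one slash, whatever the inputs look like.
    return prefix.rstrip("/") + "/" + src.lstrip("/")

# Relative image path (hypothetical) combined with the Dilbert prefix.
print(build_image_url("http://www.dilbert.com", "/images/strip.gif"))
# Absolute image URL (hypothetical) with the prefix left empty.
print(build_image_url("", "http://example.com/images/strip.gif"))
```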

comics-grabber offers two flexible mechanisms to locate the image corresponding to the comic strip inside a web page. The first is to specify an attribute and its expected value. In Dilbert’s case above, the comic strip’s image always carries the alt attribute “Today’s Dilbert Comic”, which we make use of.
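To make the attribute mechanism concrete, here is a small sketch built on Python 3's standard ''html.parser'' module. The class name ''ImageFinder'' and the sample page are invented for this example. It is only meant to show the idea of matching an image by attribute and value, not how comics-grabber itself parses pages.

```python
from html.parser import HTMLParser

class ImageFinder(HTMLParser):
    """Collect the src of every <img> whose given attribute equals the expected value."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr = attr
        self.value = value
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src and attributes.get(self.attr) == self.value:
            self.matches.append(src)

# Hypothetical page content; the real Dilbert page will look different.
PAGE = """
<html><body>
<img src="/images/logo.gif" alt="Site logo">
<img src="/images/strip20040212.gif" alt="Today's Dilbert Comic">
</body></html>
"""

finder = ImageFinder("alt", "Today's Dilbert Comic")
finder.feed(PAGE)
print(finder.matches)  # ['/images/strip20040212.gif']
```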

The second, and perhaps more powerful, mechanism is to specify a regular expression for the image file name itself. Often, comic strip images have file names like ''somecomic20040212.gif''; a regular expression like ''\d+\.(gif|jpg)'' will match this. The regular expression need only be precise enough to match the comic image file name and nothing else on that page. Both mechanisms can be used independently or together to provide an unambiguous description that accurately locates the comic image.
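Here is a quick way to try a pattern before putting it in a definition, assuming the example pattern is meant to read ''\d+\.(gif|jpg)'' (digits, a literal dot, then the extension). The helper ''find_by_regex'' and the list of image paths are hypothetical.

```python
import re

def find_by_regex(srcs, pattern):
    """Return the image paths that the pattern matches anywhere in the string."""
    regex = re.compile(pattern)
    return [src for src in srcs if regex.search(src)]

# Image paths as they might appear on a comic's page (made up for this example).
srcs = [
    "/images/logo.gif",
    "/images/somecomic20040212.gif",
    "/ads/banner.jpg",
]

print(find_by_regex(srcs, r"\d+\.(gif|jpg)"))  # ['/images/somecomic20040212.gif']
```

If more than one image matches, the pattern is too loose; tightening it (for instance by including part of the comic's name) or adding an attribute and value should narrow it down.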

==== Defining Groups ====

[diwaker]
type = group
name = Diwaker
include = dilbert garfield babyblues marvin calvinandhobbes bornloser

The group definition format is extremely simple. It consists of a name for the group, followed by the comics to be included. The list of comics should be whitespace-separated and fit on a **single** line: the line can be as long as it needs to be, but it must not contain any line breaks. Spaces, tabs, or other forms of white space are all fine for separating comic names. Each comic name should match the section header of the corresponding comic definition.
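The whitespace rule maps directly onto Python's ''str.split()'' with no arguments. The sketch below also checks each name against the known comic sections. That check, and the name ''resolve_include'', are my own assumptions about how one might validate a group rather than a description of what comics-grabber does.

```python
def resolve_include(include, known_comics):
    """Split an include line on any whitespace and verify every name is a known comic."""
    # str.split() with no argument treats runs of spaces and tabs alike.
    names = include.split()
    unknown = [name for name in names if name not in known_comics]
    if unknown:
        raise ValueError("unknown comics in group: " + ", ".join(unknown))
    return names

known = {"dilbert", "garfield", "babyblues", "marvin", "calvinandhobbes", "bornloser"}
print(resolve_include("dilbert garfield\tbabyblues  marvin", known))
# ['dilbert', 'garfield', 'babyblues', 'marvin']
```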

Often the number of comics you want to group together will be too large to fit comfortably on one line. No problem at all! Just follow this example to first define “sub-groups” and then use them to define your final grouping:


[diwaker]
type = group
name = Diwaker
group1 = dilbert babyblues hagar marvin lockhorn bbailey bethalf bfriends
group2 = redeye peanuts phdcomics looseparts animalcrackers wizardofid
group3 = calvinandhobbes boundandgagged bottomliners bornloser penny-arcade
group4 = zits garfield graffiti committed willynethel pcnpixel forbetter
group5 = dennis astropix
include = %(group1)s %(group2)s %(group3)s %(group4)s %(group5)s

In the above example, we first define group1 through group5, and then use them to define the final include list. This is a feature supported by Python’s configuration parser itself, so do make use of it!

The names group1 etc. can be arbitrary strings. Just make sure you use the matching name in the substitution string.
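You can see the substitution at work with a few lines of Python. This sketch uses Python 3's ''configparser'' (the module was called ''ConfigParser'' in Python 2), whose default interpolation expands ''%(name)s'' references within a section. The group is shortened to two sub-groups to keep the example small.

```python
import configparser

SAMPLE = """
[diwaker]
type = group
name = Diwaker
group1 = dilbert babyblues
group2 = redeye peanuts
include = %(group1)s %(group2)s
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# get() expands the %(group1)s and %(group2)s references before returning.
included = parser.get("diwaker", "include").split()
print(included)  # ['dilbert', 'babyblues', 'redeye', 'peanuts']
```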

===== ChangeLog =====

* 2005-01-23 version 0.4
  * comics-grabber can now generate an RSS feed! (thanks Brad Miller)
* 2004-11-06 version 0.3.4
  * fixed bug in downloading User Friendly
  * added generic referer field capability
  * some sanity checks
* 2004-08-13 version 0.3.2
  * fixed KingFeatures bug, everything downloads fine now
  * added 4 new comics
* 2004-08-10 version 0.3.1
  * display sorted list of available comics with total count (-l option)
  * added 8 more comics
  * add default extension if image type cannot be determined
  * progress gracefully on exception
* 2004-07-29 version 0.3
  * added support for groups
  * comics can now have better names
  * updated comics.conf to new format
* 2004-07-27 version 0.2
  * initial release


5 comments

  1. Jose

    Ucomics just started publishing its comics as SWFs, not images. Bummer, now I can’t download comics from them.
