Libcoffee.net on GitHub

Posted by Jerry Wed, Oct 21 '09

Libcoffee.net's source code is now released at http://github.com/zanglang/libcoffee under the Creative Commons Attribution-Share Alike license; the end result is what you are reading right now. ;) The codebase is by no means complete, and by extension I'm pretty darn sure it's neither bug-free, stable, nor razor efficient yet. However, it's being continuously updated, so if by any chance you spot an error feel free to send a shout. So, to summarize:

Libcoffee.net is the name of the blogging engine [1] hosting this site. It's written using the Django framework, and is intended to be run on a Google App Engine account. Some of its features:

  • Basic blog authoring and publishing with plain URLs e.g. 'blog/2009/12/31/slug'
  • Comments with Google, OpenID and Facebook identities
  • Content markup with reStructuredText, Markdown and Textile
  • XMPP and email notifications for new comments
  • Clean default template with Blueprint CSS framework, hAtom and hCard microformats
  • Code block syntax highlighting
  • Trackbacks and pingbacks
  • Automated datastore backup via email
  • And of course, other miscellaneous benefits brought by running on Google's distributed computing platform.
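
For the URL scheme in the first bullet, a pattern along these lines would do the matching. This is a minimal sketch in plain Python, not the actual URL configuration from the repository; the group names and the `parse_post_url` helper are made up for illustration.

```python
import re

# Hypothetical pattern for URLs shaped like 'blog/2009/12/31/slug'.
POST_URL = re.compile(
    r'^blog/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<slug>[-\w]+)/?$'
)


def parse_post_url(path):
    """Return (year, month, day, slug) for a post URL, or None if it doesn't match."""
    match = POST_URL.match(path)
    if match is None:
        return None
    return (int(match.group('year')), int(match.group('month')),
            int(match.group('day')), match.group('slug'))
```

In a Django URLconf the same named groups would arrive as keyword arguments to the view, which is the main reason to name them.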

There's a fair bunch of library dependencies used, so refer to README.rst for a list, as well as installation instructions. (So, uh, why yet another Django blog engine [2]? Answer: because it's not cool if you don't write one while learning Django. Amirite? ^^) All in all, a fun experience. Lessons learned from this mini project:

  • Resource management on a distributed computing environment is very textbook. Disk space and CPU are cheap. Cloud computing, a buzzword it may be, could very well be the future of software. Memory is a lot cheaper than disk reads/writes too, so it pays to cache the hell out of everything.
  • I have mixed feelings about running Django on App Engine, via app-engine-patch. Django itself is great, but BigTable imposes so many restrictions that half of Django doesn't work the same anymore, not to mention there's a massive request-response latency when the app has to cold start [3], and then load the 3 MB Django zip file, then crawl along as compiled Python bytecode is not allowed. In the first week, I was seeing ridiculously high response times averaging 25 seconds each, but that seems to have improved by now.
  • I clearly have not reached the point where I require the benefits of BigTable yet, so I frequently wonder whether it would have been much less trouble if I'd just coughed up the money for a Python web hosting account... :P
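
The "cache everything" lesson boils down to a read-through helper, roughly like the sketch below. It is an illustration and not code from the repository: `FakeCache` is a made-up in-memory stand-in with `get`/`set` methods shaped like a memcache client, and `cached` is a hypothetical helper name.

```python
class FakeCache(object):
    """In-memory stand-in with memcache-style get/set (assumption, for illustration)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like memcache, a missing key comes back as None.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached(cache, key, compute):
    """Return the cached value for key, computing and storing it on a miss.

    A computed value of None is never stored, so it is recomputed every time.
    """
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value)
    return value
```

The expensive part (a datastore query, a rendered template) goes in `compute`, so it only runs when the key is cold.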

And stuff on the to-do list:

  • Rewrite the comments/feedback modules. They were originally ported from Django apps, but a lot of unused cruft is still left over. I still don't know if trackback/pingbacks work properly, for instance.
  • Content caching is still very inconsistent. As it is, memcache keys are pretty much generated ad hoc without a proper consistent namespace policy, and purged en masse every time a post is created, so it's pretty wasteful. Need to get ideas from Nick Johnson's recent posts on implementing a static content generator.
  • Use more of that leftover App Engine quota. I guess that requires me to blog more. :)
  • I really need to look at how to further cut down response times too. Perhaps it'll be yet another fun project - replace bits and pieces with newer and more interesting bits and pieces.
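
One possible shape for the namespace policy mentioned in the caching item is to put a per-namespace version number into every key, so that invalidating one namespace leaves the others alone. The sketch below is a generic key-versioning trick, not what Libcoffee.net does and not Nick Johnson's static generator; `FakeCache`, `namespaced_key`, `invalidate` and the `nsver:` prefix are all hypothetical names.

```python
class FakeCache(object):
    """In-memory stand-in with memcache-style get/set (assumption, for illustration)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def namespaced_key(cache, namespace, name):
    """Build a key like 'posts:v1:index' from the namespace's current version."""
    version = cache.get('nsver:' + namespace) or 1
    return '%s:v%d:%s' % (namespace, version, name)


def invalidate(cache, namespace):
    """Bump the namespace version so keys built from the old one are never read again."""
    version = cache.get('nsver:' + namespace) or 1
    cache.set('nsver:' + namespace, version + 1)
```

Creating a post would then call `invalidate(cache, 'posts')` instead of flushing the whole cache; entries under the old version are simply never looked up again and are left for the cache to evict. The cost is one extra cache read per key to fetch the version.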

Related Links:

[1] I originally wanted to call it Nii, as it's a *typo* of Ni, which was an utterance of the Knights from Monty Python, which inspired Python, which I would be using to write it with - uh, it was supposed to be all connected and funny, okay?
[2] In fact, there's probably already a whole rainbow spectrum of blogs named Yet another Django blog in a dozen alphabetical variants, too.
[3] A proper analogy would be: after a period of inactivity an App Engine app goes cold (deactivated?), until the next request is received, where Google picks a new CPU slice for us to run on again. A popular app will probably stay warm quite well.
# Posted in 6 years ago comments
 

Fetching Twitter by App Engine Cron

Posted by Jerry Wed, Oct 21 '09

Snippet for fetching a Twitter feed to monitor for updates on AppEngine/Django.

import re

import feedparser  # http://www.feedparser.org/
from django.conf import settings
from django.http import HttpResponse

# The 'Tweet' model (previously saved tweets) is assumed to be imported
# from the app's own models module.

# Captures the status id at the end of an entry's guid URL.
GUID_PATTERN = re.compile(r'^http://twitter\.com/.*?/statuses/(\d+)$')

# cron job function
def update_twitter(request):
    twitter_url = ('http://twitter.com/statuses/user_timeline/%s.rss'
                   % settings.TWITTER_USERNAME)  # my username
    data = feedparser.parse(twitter_url)
    # feedparser sets 'bozo_exception' when the feed could not be parsed cleanly
    if 'bozo_exception' in data:
        return HttpResponse('OK')

    # map status id -> feed entry, skipping entries with an unexpected guid
    tweets = {}
    for d in data['entries']:
        match = GUID_PATTERN.search(d['id'])
        if match:
            tweets[match.group(1)] = d
    # a single datastore query for every guid that has already been saved
    tweeted = [t.guid for t in Tweet.all().filter('guid IN', list(tweets))]
    not_tweeted = [t for t in tweets if t not in tweeted]  # there we go.
    if not not_tweeted:
        return HttpResponse('Nothing to save')

    for t in not_tweeted:
        ... # save t
    return HttpResponse('OK')

And schedule it in cron.yaml:

- description: Download twitter
  url: /lifestream/tasks/update_twitter/
  schedule: every 10 minutes

Have I ever said I love Python's libraries? I love Python's libraries.

Seems that Twitter's RSS feed doesn't always generate successfully either (not unimaginable I guess, considering their concurrent hits), so it's important to catch errors... although they're not always very helpful (I've got several very generic 'Application Error: 2' so far, what on earth does that mean?). I suspect I'm not doing it very efficiently either, with 3 for loops and a dictionary, but there aren't too many choices as I only want to hit the AppEngine Datastore once for the entire query.
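
On the efficiency worry: the "which tweets are new" step can be done with a set difference once the saved guids are in hand. The sketch below is pure Python with the datastore left out; `index_by_guid` and `unsaved_guids` are hypothetical helper names, and `saved_guids` stands in for whatever that single datastore query returns.

```python
import re

# Captures the status id at the end of an entry's guid URL.
GUID_PATTERN = re.compile(r'^http://twitter\.com/.*?/statuses/(\d+)$')


def index_by_guid(entries):
    """Map status id -> feed entry, skipping entries whose id doesn't match."""
    tweets = {}
    for entry in entries:
        match = GUID_PATTERN.search(entry['id'])
        if match:
            tweets[match.group(1)] = entry
    return tweets


def unsaved_guids(tweets, saved_guids):
    """Return the guids in tweets that are not in saved_guids, sorted as strings."""
    return sorted(set(tweets) - set(saved_guids))
```

This still takes one pass over the feed and one query, but the membership test no longer scans a list for every tweet.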

P.S - just noticed Markdown + syntax highlighting is still fairly dodgy with detecting whitespaces / code block boundaries. On to the TODO list it goes.

Edit: Attached a more complete code snippet. May have been a bad idea to blog at 1am.


Reopening

Posted by Jerry Fri, Oct 16 '09

Well, it certainly has been a while... The last post dates back to 2 years ago, which in internet time is A Very Long Time (tm) indeed. Can't remember why my blog went on an almost indefinite hiatus - perhaps studying for my honours year exams (which I am glad to report went brilliantly well) as well as last minute hacking on my thesis (which was relatively underwhelming - I like my Advanced Security Project report a lot more due to topical interests... but I digress), and after that a fairly long period of WoW. Anyway, this post is not to reminisce about the past, but a quick memory jog for myself... Meanwhile, I'll try not to go on a Catcher-in-the-Rye-esque stream of consciousness rant too much. :P

So, for a very long time I've wanted to sit down, dump my Ruby on Rails blog (using Typo), and rewrite it with something. (I admit that my knowledge of Ruby/Rails only very barely skims the surface, so hosting one without understanding what it does was quite a peculiar experience which I suspect most true hackers would not endorse :P) I was severely tempted to just get a vanilla Wordpress install too, but that wouldn't have been a cool thing to do, of course. Since April this year I've also no longer had my own server, which also meant Libcoffee.net went down for several months...

About a month ago, on a Friday night with a very free weekend ahead, I decided to rewrite the site with Django until it matched the old one's functionality, and then chuck it on Google App Engine for free hosting, so over the next 4 weeks of accumulated after-work hacking, I did. :) Technical details and source code will have to wait, but I'll chuck it on GitHub along with a detailed writeup later. (I promise.)

So, hi again blog. ただいま~!


Test post

Posted by Jerry Sat, Oct 10 '09

Test post, 1 2 3. Ping, pong!

    # testing code syntax highlighting
    def test(self):
        print 'lorem ipsum'
  • This is a list

Backlog July 2nd

Posted by Jerry Fri, Jul 03 '09

Reposted from Tomboy Notes on 16 October 2009.

  • Long discussion with Chee Hong about version control, and the mess that is Panton's design and code organization. Curiously, he's quite eager to learn, but I can't vouch for the rest of the company. May have to write a proposal and jump through hoops to convince them to use version control properly to save themselves, but then, I'm not supposed to be Code Jesus, am I?
  • Playing around with the new Gnome Do 0.8.2. Docks are still rather pointless without a 22" screen. Sorry Deskbar, but I'm removing you for the new kid on the block.

Going down to JB on Friday, so probably have to use Conboy for writing.
