A snippet for polling a Twitter feed for updates on AppEngine/Django:

    import re

    import feedparser  # (http://www.feedparser.org/)
    from django.conf import settings
    from django.http import HttpResponse
    # the 'Tweet' datastore model also needs importing from the app's models

    # cron job function
    def update_twitter(request):
        TWITTER_URL = 'http://twitter.com/statuses/user_timeline/%s.rss' % \
            settings.TWITTER_USERNAME  # my username
        data = feedparser.parse(TWITTER_URL)
        # feedparser sets 'bozo_exception' when the feed could not be parsed
        if 'bozo_exception' in data:
            return HttpResponse('OK')
        pattern = re.compile(r'^http://twitter.com/.*?/statuses/(\d+)$')
        tweets = dict()
        for d in data['entries']:
            # group(1) is the status ID string; groups() would give a tuple
            tweets[pattern.search(d['id']).group(1)] = d
        # there's a 'Tweet' model for previously saved tweets
        tweeted = [t.guid for t in Tweet.all().filter('guid IN', list(tweets))]
        not_tweeted = [t for t in tweets if t not in tweeted]
        # there we go.
        if not not_tweeted:
            return HttpResponse('Nothing to save')
        for t in not_tweeted:
            ... # save t
        return HttpResponse('OK')

And schedule it in cron.yaml:

    - description: Download twitter
      url: /lifestream/tasks/update_twitter/
      schedule: every 10 minutes

Have I ever said I love Python's libraries? I love Python's libraries.
It seems Twitter's RSS feed doesn't always generate successfully either (not unimaginable, I guess, considering the number of concurrent hits they get), so it's important to catch errors... although the errors aren't always very helpful (I've had several very generic 'Application Error: 2' responses so far; what on earth does that mean?). I suspect I'm not doing it very efficiently either, with 3 for loops and a dictionary, but there aren't many choices as I only want to hit the AppEngine Datastore once for the entire query.
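For what it's worth, here is a rough sketch of how the "which tweets are new?" step might be tightened up with a set difference while still doing a single lookup. It is not the code running on this site: `index_entries`, `unsaved_ids` and the `fetch_saved_guids` callable are names made up for illustration, and the sketch assumes the entries behave like dictionaries with an `id` key, as in the snippet above. It also skips entries whose `id` doesn't match the pattern instead of blowing up on them.

```python
import re

# Same pattern as in the snippet above: captures the numeric status ID.
STATUS_ID = re.compile(r'^http://twitter.com/.*?/statuses/(\d+)$')


def index_entries(entries):
    """Map status ID -> entry, skipping entries whose id does not match."""
    indexed = {}
    for entry in entries:
        match = STATUS_ID.search(entry.get('id', ''))
        if match:
            indexed[match.group(1)] = entry
    return indexed


def unsaved_ids(indexed, fetch_saved_guids):
    """Return the IDs in `indexed` that have not been stored yet.

    `fetch_saved_guids` is a hypothetical callable: it takes a list of IDs
    and returns the ones already saved. It is called at most once.
    """
    if not indexed:
        return []  # nothing to look up, so skip the lookup entirely
    saved = set(fetch_saved_guids(list(indexed)))
    # Set difference replaces the nested "t not in tweeted" scan.
    return sorted(set(indexed) - saved, key=int)
```

In the view, `fetch_saved_guids` could be something like `lambda ids: [t.guid for t in Tweet.all().filter('guid IN', ids)]`, which keeps the datastore call in one place and makes the rest easy to test without AppEngine.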
P.S. Just noticed that Markdown + syntax highlighting is still fairly dodgy at detecting whitespace / code block boundaries. On to the TODO list it goes.
Edit: Attached a more complete code snippet. May have been a bad idea to blog at 1am.