Fetching Twitter with App Engine Cron

Posted by Jerry Wed, Oct 21 '09

A snippet for polling a Twitter RSS feed for new updates on App Engine/Django.

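import re

from django.conf import settings
from django.http import HttpResponse
import feedparser  # (http://www.feedparser.org/)

from lifestream.models import Tweet  # hypothetical location of the 'Tweet' model

# matches entry IDs like http://twitter.com/<user>/statuses/<digits>
STATUS_PATTERN = re.compile(r'^http://twitter\.com/.*?/statuses/(\d+)$')

# cron job view
def update_twitter(request):
    twitter_url = 'http://twitter.com/statuses/user_timeline/%s.rss' % \
                  settings.TWITTER_USERNAME  # my username
    data = feedparser.parse(twitter_url)
    if 'bozo_exception' in data:
        # the feed didn't parse; just try again on the next cron run
        return HttpResponse('OK')

    # map status ID -> feed entry, skipping IDs in an unexpected format
    tweets = dict()
    for d in data['entries']:
        match = STATUS_PATTERN.search(d['id'])
        if match:
            tweets[match.group(1)] = d
    if not tweets:
        return HttpResponse('Nothing to save')

    # there's a 'Tweet' model for previously saved tweets;
    # one query finds which of these IDs are already stored
    # (the db API accepts at most 30 values in an IN filter)
    tweeted = [t.guid for t in Tweet.all().filter('guid IN', tweets.keys())]
    not_tweeted = [t for t in tweets.keys() if t not in tweeted]  # there we go.
    if not not_tweeted:
        return HttpResponse('Nothing to save')

    for t in not_tweeted:
        pass  # ... save tweets[t] (omitted in the original post)
    return HttpResponse('OK')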

And schedule it in cron.yaml:

cron:
- description: Download twitter
  url: /lifestream/tasks/update_twitter/
  schedule: every 10 minutes
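Stripped of the Datastore, the bookkeeping in update_twitter (build a dict keyed by status ID, look up which IDs are already saved, keep the rest) is just a set difference. A minimal sketch under that reading: `find_unsaved` is a hypothetical helper, plain dicts stand in for feedparser entries, and a list of IDs stands in for the result of the single `Tweet` query.

```python
def find_unsaved(tweets, saved_guids):
    """Return entries from `tweets` whose status ID isn't in `saved_guids`.

    tweets:      dict mapping status ID (a digit string) -> feed entry
    saved_guids: status IDs already stored; a stand-in for the single
                 Datastore query in update_twitter (hypothetical name)
    """
    unsaved_ids = set(tweets) - set(saved_guids)
    # Assumes larger status IDs are newer, so this returns oldest first.
    return [tweets[i] for i in sorted(unsaved_ids, key=int)]


# Hypothetical sample data: three entries, one already saved.
tweets = {
    "4975563523": {"title": "older tweet"},
    "4981234567": {"title": "newer tweet"},
    "4990000000": {"title": "newest tweet"},
}
saved = ["4975563523"]
print([e["title"] for e in find_unsaved(tweets, saved)])
# prints ['newer tweet', 'newest tweet']
```

Using sets makes each membership test constant-time instead of a scan over the `tweeted` list, though with one page of tweets that hardly matters. Sorting by numeric ID is an extra design choice (the post doesn't order the saves) so tweets are stored in the order they were posted.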

Have I ever said I love Python's libraries? I love Python's libraries.

Seems Twitter's RSS feed doesn't always generate successfully either (not hard to imagine, given how many concurrent hits they get), so it's important to catch errors, although the errors aren't always very helpful: so far I've had several very generic 'Application Error: 2' responses, and what on earth does that mean? I suspect I'm not doing this very efficiently either, with three loops (counting the list comprehensions) and a dictionary, but there aren't many alternatives, since I only want to hit the App Engine Datastore once for the whole lookup.
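The snippet treats any `bozo_exception` as fatal. feedparser also sets `bozo` for problems it recovers from (for example, when it has to override the declared character encoding), so one possible refinement, not what the post does, is to give up only when no entries came through at all. A minimal sketch: `usable_entries` is a hypothetical helper, and plain dicts stand in for feedparser's result so it runs without the library.

```python
import logging


def usable_entries(parsed):
    """Return the entries of a feedparser-style result, or [] if none are usable.

    `parsed` is any dict-like object with 'bozo', 'bozo_exception' and
    'entries' keys, the shape feedparser.parse() returns.
    """
    entries = parsed.get("entries", [])
    if parsed.get("bozo"):
        # Something was wrong with the feed; record what feedparser reported.
        logging.warning("feed problem: %r", parsed.get("bozo_exception"))
        if not entries:
            # Nothing salvaged: skip this run and let cron retry later.
            return []
    return entries


# Hypothetical results: a broken fetch, and a feed with a minor problem.
broken = {"bozo": 1, "bozo_exception": ValueError("bad XML"), "entries": []}
minor = {"bozo": 1, "bozo_exception": ValueError("encoding"),
         "entries": [{"id": "http://twitter.com/someuser/statuses/1"}]}
print(len(usable_entries(broken)), len(usable_entries(minor)))
```

Logging the exception at least leaves something more specific than 'Application Error: 2' in the App Engine logs when a fetch goes wrong.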

P.S. Just noticed Markdown + syntax highlighting is still fairly dodgy at detecting whitespace / code block boundaries. On to the TODO list it goes.

Edit: Attached a more complete code snippet. May have been a bad idea to blog at 1am.
