Mike Saffitz

Enumerating S3 Directory Structures With AWS SDK for Ruby

A few months ago, Amazon released an official Ruby gem for use with Amazon Web Services, and I’ve slowly been moving my projects over to it as the syntax is extremely friendly and powerful.

Today, I hit an unexpected gotcha. When enumerating objects stored in a directory structure using the standard bucket.objects.each ... syntax, both leaf and non-leaf keys are returned. This means that if you have a bucket with a single object with a key of directory/object, your block will be called twice: once with an object with key directory/ and once with an object with key directory/object.

Unfortunately, the gem doesn’t include a leaf? method for S3Objects, but there are two easy ways to address this.

First, the Tree classes allow for navigating a bucket as a tree and selecting only the leaf objects. If you go this route, however, you’ll have to walk the tree yourself, enumerating the leaves at each branch, because children only returns the immediate descendants of a given node.
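
A minimal sketch of that traversal is below. It assumes every node responds to children and every child responds to leaf?, which is how the gem's tree, branch, and leaf nodes appear to behave; each_leaf is a hypothetical helper name, not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): recursively yields every
# leaf node found beneath the given node.
# Assumption: node responds to #children, and each child responds to #leaf?.
def each_leaf(node, &block)
  node.children.each do |child|
    if child.leaf?
      block.call(child)        # a leaf: hand it to the caller
    else
      each_leaf(child, &block) # a branch: descend one level
    end
  end
end
```

With a real bucket, this would presumably be called as each_leaf(bucket.as_tree) { |leaf| puts leaf.key }, assuming as_tree returns the root of the tree.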

Alternatively, in the block enumerating the objects, you can check for a trailing slash on the object key, and when present, skip the object:

bucket.objects.each do |obj|
  next if obj.key.end_with?('/') # skip non-leaf "directory" keys
  # work with the leaf object here
end

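Since the gem has no leaf? method, the same check can be pulled out into a small predicate. The sketch below works on plain key strings so it can be tried without a bucket; leaf_key? is a hypothetical name, and it assumes non-leaf keys always end in a slash.

```ruby
# Hypothetical helper: treats any key ending in "/" as a non-leaf key.
def leaf_key?(key)
  !key.end_with?('/')
end

# The two keys from the single-object example above.
keys = ['directory/', 'directory/object']
leaf_keys = keys.select { |key| leaf_key?(key) }
# leaf_keys => ["directory/object"]
```
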
I’m curious if anyone has a scenario where enumerating objects should return the branch nodes interspersed with the leaf nodes?