Using boto3? Think pagination!

2018-01-09

Boto is a name for river dolphins of the Amazon.

This is a problem I've seen several times over the past few years.

When using boto3 to talk to AWS the APIs are pleasantly consistent, so it’s easy to write code to, for example, ‘do something’ with every object in an S3 bucket:

import boto3

s3_client = boto3.client("s3")

result = s3_client.list_objects(Bucket="my-bucket")
for obj in result["Contents"]:
    do_something(obj)

However, there’s one giant flaw with this code, and you won’t spot it until you know one detail of the S3 API: every endpoint that returns a list is paginated. This is the norm across the AWS APIs that return lists of things.

This code will work just fine while the bucket has 1000 objects or fewer (1000 being the default page size), but once there are 1001 or more it will silently fail to do_something() on the objects beyond the first page. I’ve found this kind of latent flaw several times across different projects using different services, normally in my own code! Very often one writes a feature at the start of a project, when the number of ‘things’ on the service is small, and it’s only after weeks or months of growth that the page limit gets hit.
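One way to make this failure loud rather than silent is to check the IsTruncated flag that S3 includes in the list response. Here’s a minimal sketch of that idea. The helper name list_objects_or_fail is my own invention, not part of boto3, and it assumes you would rather crash than quietly process a partial listing:

```python
def list_objects_or_fail(s3_client, bucket):
    """Return the objects from one list_objects call, or raise if S3 truncated it.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    This helper is hypothetical, not part of boto3.
    """
    result = s3_client.list_objects(Bucket=bucket)
    # S3 sets IsTruncated to True when more objects exist beyond this page.
    if result.get("IsTruncated"):
        raise RuntimeError(
            f"Listing of {bucket!r} was truncated; use a paginator to see every object"
        )
    # An empty bucket returns no "Contents" key at all.
    return result.get("Contents", [])
```

This doesn’t fix anything by itself, but it turns a silent partial result into an error you will notice on the day the bucket outgrows a single page.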

To write code that survives any number of objects, we need to use a Paginator. The official user guide is a little verbose for my taste. Here’s a learn-by-example rewrite of the above code to use a paginator, which can be adapted for other services or endpoints:

import boto3

s3_client = boto3.client("s3")

paginator = s3_client.get_paginator("list_objects")
pages = paginator.paginate(Bucket="my-bucket")
for page in pages:
    # An empty bucket yields a page with no "Contents" key.
    for obj in page.get("Contents", []):
        do_something(obj)
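Since the same pattern applies to other services and endpoints, it can be wrapped in a small generator. This is only a sketch: iter_items is a hypothetical helper of my own, not a boto3 function, and you have to tell it which response key holds the list (for example "Contents" for list_objects). The client is passed in, so the helper itself doesn’t need to import boto3:

```python
def iter_items(client, operation_name, result_key, **kwargs):
    """Yield every item from a paginated boto3 operation, across all pages.

    `iter_items` is a hypothetical helper, not part of boto3. `result_key`
    is the response key that holds the list, e.g. "Contents" for list_objects.
    """
    if not client.can_paginate(operation_name):
        raise ValueError(f"{operation_name!r} has no paginator on this client")
    paginator = client.get_paginator(operation_name)
    for page in paginator.paginate(**kwargs):
        # The key can be missing entirely when a page has no items.
        yield from page.get(result_key, [])


# Usage, assuming boto3 is installed and AWS credentials are configured:
#
#   import boto3
#   s3_client = boto3.client("s3")
#   for obj in iter_items(s3_client, "list_objects", "Contents", Bucket="my-bucket"):
#       do_something(obj)
```

Because it is a generator, nothing is requested from AWS until you start iterating, and pages are fetched one at a time as you go.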

Overall I think the boto3 design is back-to-front here: the most common operation, “do X with all Y”, is tied up with complicated pagination, while the less common control of the pages is the default. Making all-the-things pagination the default API would make more sense, and indeed this is the approach of the AWS CLI. Maybe there’s an edge case I’m not thinking of though, and it’s probably too late to change the behaviour of boto3 at such a fundamental level (maybe in ‘boto4’?).
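For the less common case where you do want control over the pages, paginators accept a PaginationConfig argument with MaxItems and PageSize keys. A minimal sketch, assuming you only want the first few objects from a bucket; iter_some_objects is a hypothetical name, and the default page size of 100 is an arbitrary choice for illustration:

```python
def iter_some_objects(s3_client, bucket, max_items, page_size=100):
    """Yield at most `max_items` objects from `bucket`, fetched `page_size` at a time.

    Hypothetical helper, not part of boto3. `s3_client` is expected to be
    a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects")
    pages = paginator.paginate(
        Bucket=bucket,
        # MaxItems caps the total returned; PageSize sets the size of each request.
        PaginationConfig={"MaxItems": max_items, "PageSize": page_size},
    )
    for page in pages:
        yield from page.get("Contents", [])
```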

Anyway, you have been warned: think pagination!

Update 2018-01-28: some redditors have shared useful code snippets for pagination in the comments there.

Tags: aws, python