Using boto3? Think pagination!
2018-01-09
This is a problem I’ve seen several times over the past few years.
When using boto3 to talk to AWS the APIs are pleasantly consistent, so it’s easy to write code to, for example, ‘do something’ with every object in an S3 bucket:
However there’s one giant flaw with this code, and you won’t spot it until you know one detail of the S3 API: every endpoint is paginated. This is the standard across all of the AWS APIs that return lists of things.
This code will work just fine whilst the bucket has 1,000 or fewer objects (the default page size), but once there are 1,001 or more it will silently fail to process the objects beyond the first page. I’ve found such a latent flaw several times across different projects using different services, normally in my own code! Very often one writes a feature at the start of a project, when the number of ‘things’ on the service is small, and it’s only weeks or months of growth later when the page limit gets hit.
To write code that survives any number of objects, we need to use a Paginator. The official user guide is a little verbose for my taste. Here’s a learn-by-example rewrite of the above code to use a paginator, which can be adapted for other services or endpoints:
Overall I think the boto3 design is back-to-front here - the most common operation “do X with all Y” is tied up with complicated pagination, whilst the less common per-page control is the default. Making all-the-things pagination the default API would make more sense; and indeed this is the approach of the AWS CLI. Maybe there’s an edge case I’m not thinking of though, and it’s probably too late to change the behaviour of boto3 at such a fundamental level (maybe in ‘boto4’?).
Anyway, you have been warned: think pagination!
Tags: aws, python