Getting the sizes of Top level Directories in an AWS S3 Bucket with Boto3

I was recently asked to create a report showing the total size of the files within each top level folder (and all the subdirectories under it) in our S3 buckets.

S3 bucket ‘files’ are really objects, and each object has a key containing the full path where it is stored within the bucket.
I came up with this function to take a bucket name and iterate over the objects within that bucket. For each object, the key is examined and the object’s size is added to a running total kept in a dictionary, keyed by top level directory.
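
For example, splitting a made-up key like 'reports/2020/january.csv' on '/' gives you the top level folder as the first element:

keylist = 'reports/2020/january.csv'.split('/')
print(keylist[0])    # 'reports' -- the top level folder
print(len(keylist))  # 3 -- a length of 1 would mean the object sits at the bucket root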

Here’s what I ended up with.

import boto3
from botocore.exceptions import ClientError


def get_top_dir_size_summary(bucket_to_search):
    """
    This function takes in the name of an s3 bucket and returns a dictionary
    containing the top level dirs as keys and the total file size as values.
    :param bucket_to_search: a string containing the name of the bucket
    """
    # Setup the output dictionary for running totals
    dirsizedict = {}
    # Create 1 entry for '.' to represent the root folder instead of the default.
    dirsizedict['.'] = 0

    # ------------
    # Setup the AWS Res. and Clients
    s3 = boto3.resource('s3')
    s3client = boto3.client('s3')

    # This is a check to ensure a bad bucket name wasn't passed in. I'm sure there is a better
    # way to check this. If you have a better method, please comment on the article.
    try:
        s3client.head_bucket(Bucket=bucket_to_search)
    except ClientError:
        print('Bucket ' + bucket_to_search + ' does not exist or is unavailable. - Exiting')
        quit()

    # Since buckets can contain more than 1000 objects, use a paginator to iterate through them 1000 at a time.
    paginator = s3client.get_paginator('list_objects')
    pageresponse = paginator.paginate(Bucket=bucket_to_search)

    # Iterate through each page of results returned by the paginator.
    for pageobject in pageresponse:

        # Check to see if the bucket has contents; without this, an empty bucket would throw an error.
        if 'Contents' in pageobject.keys():

            # if there are contents, then iterate through each 'file'.
            for file in pageobject['Contents']:
                itemtocheck = s3.ObjectSummary(bucket_to_search, file['Key'])

                # Get Top level directory from the file by splitting the key. 
                keylist = file['Key'].split('/')

                # If the key splits into a single item, the file sits at the bucket root with no directories.
                if len(keylist) == 1:
                    dirsizedict['.'] += itemtocheck.size
                else:
                    # Not root: if the top level dir already exists in the dictionary, add to its
                    # running total; otherwise create the entry with this file's size.
                    if keylist[0] in dirsizedict:
                        dirsizedict[keylist[0]] += itemtocheck.size
                    else:
                        dirsizedict[keylist[0]] = itemtocheck.size

    return dirsizedict
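
Calling it is straightforward; something like this prints the totals (the bucket name below is just a placeholder):

if __name__ == '__main__':
    sizes = get_top_dir_size_summary('my-example-bucket')
    for folder, size in sizes.items():
        print(folder + ': ' + str(size) + ' bytes')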

That script is probably a little rough to an elite coder, so if you have any thoughts on improvement, let me hear them.


6 Responses to Getting the sizes of Top level Directories in an AWS S3 Bucket with Boto3

  1. siva says:

    Hi,
    Your work in this is awesome, helping a lot to move forward.
    Can you please give a hint on how to extract “security group ID whose cidrIP is 0.0.0.0/0 in IpRanges in IpPermissions, from a CloudTrail log which is in JSON format using boto3 and python”. I tried all the ways but am unable to move forward. Thanks in advance.

    • mike says:

      I think you are trying to find sec groups with an allow all using 0.0.0.0/0. Why not iterate over all groups in the account and check each rule in each group for a cidr of 0.0.0.0/0?
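
      Something along these lines (a rough, untested sketch; it doesn't handle pagination) should get you started:

      import boto3

      ec2 = boto3.client('ec2')
      for group in ec2.describe_security_groups()['SecurityGroups']:
          for rule in group.get('IpPermissions', []):
              for ip_range in rule.get('IpRanges', []):
                  if ip_range.get('CidrIp') == '0.0.0.0/0':
                      print(group['GroupId'], 'allows 0.0.0.0/0')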

  2. Mapes says:

    Hey thanks, I know this is kinda old but it helped me.

  3. Lydon says:

    2020 and still great. Consider updating if you ever get the chance 🙂

  4. jagan reddy says:

    Script is not working
