I was recently asked to create a report showing the total size of the files in each top-level folder of our S3 buckets, including everything in the subfolders under it.
Files in an S3 bucket are stored as objects, and each object has a key that contains the full path where it is stored within the bucket. S3 has no real folders; a “folder” is just the part of the key before a ‘/’.
I came up with this function to take a bucket and iterate over the objects within it. For each object, the key is split to find its top-level folder, and the object’s size is added to that folder’s running total, which is kept in a dictionary.
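To make the key-to-folder idea concrete, here is a minimal sketch of the rule the function in this post applies: take everything before the first ‘/’ as the top-level folder, or ‘.’ if the key has no ‘/’ at all. The helper name `top_level_dir` and the example keys are hypothetical.

```python
def top_level_dir(key):
    """Return the top-level "folder" of an S3 key, or '.' for objects at the bucket root."""
    # S3 has no real directories: the folder is just the part of the key before the first '/'.
    first, sep, _rest = key.partition('/')
    return first if sep else '.'


# Example keys as they might appear in a bucket listing (hypothetical names).
for key in ['report.csv', 'logs/2020/01/app.log', 'images/', 'images/cat.png']:
    print(key, '->', top_level_dir(key))
```

Note that a zero-byte “folder placeholder” key such as `images/` is grouped under its folder, just like the objects inside it.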
Here’s what I ended up with.
import sys

import boto3
from botocore.exceptions import ClientError


def get_top_dir_size_summary(bucket_to_search):
    """
    This function takes in the name of an S3 bucket and returns a dictionary
    containing the top level dirs as keys and their total file size (in bytes) as values.
    :param bucket_to_search: a String containing the name of the bucket
    """
    # Setup the output dictionary for running totals
    dirsizedict = {}
    # Create 1 entry for '.' to represent the root folder instead of the default.
    dirsizedict['.'] = 0
    # ------------
    # Setup the AWS client
    s3client = boto3.client('s3')

    # This is a check to ensure a bad bucket name wasn't passed in. I'm sure there is a better
    # way to check this. If you have a better method, please comment on the article.
    try:
        s3client.head_bucket(Bucket=bucket_to_search)
    except ClientError:
        print('Bucket ' + bucket_to_search + ' does not exist or is unavailable. - Exiting')
        sys.exit(1)

    # since buckets could have more than 1000 items, have to use paginator to iterate 1000 at a time
    paginator = s3client.get_paginator('list_objects')
    pageresponse = paginator.paginate(Bucket=bucket_to_search)

    # iterate through each object in the bucket through the paginator.
    for pageobject in pageresponse:
        # Check to see if the page has contents; without this an empty bucket would throw an error.
        if 'Contents' in pageobject:
            # if there are contents, then iterate through each 'file'.
            for file in pageobject['Contents']:
                # The listing already includes each object's size, so no extra request per object is needed.
                size = file['Size']
                # Get Top level directory from the file by splitting the key.
                keylist = file['Key'].split('/')
                # See if file is on root, if keylist has 1 item (root dir), there are no dirs on item
                if len(keylist) == 1:
                    dirsizedict['.'] += size
                else:
                    # Not root: create the key if needed, otherwise just add to the running total
                    dirsizedict[keylist[0]] = dirsizedict.get(keylist[0], 0) + size
    return dirsizedict
That script is probably a little rough by an elite coder’s standards, so if you have any thoughts on improving it, let me hear them.
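One possible improvement, offered as a hedged sketch rather than a tested replacement: the paginator’s listing already contains each object’s `Size`, so no per-object lookup is needed, and since the report asked about total files, the same pass can count objects as well as bytes. Splitting the listing from the aggregation also lets the counting be tested without AWS. The function names are hypothetical, and `list_objects_v2` is used in place of `list_objects`, since AWS recommends the newer ListObjectsV2 API for new code.

```python
from collections import defaultdict


def summarize_top_level(objects):
    """Aggregate object count and total bytes per top-level folder.

    `objects` is an iterable of dicts shaped like the entries in a
    list_objects_v2 'Contents' list (each has 'Key' and 'Size').
    Objects at the bucket root are grouped under '.'.
    """
    summary = defaultdict(lambda: {'files': 0, 'bytes': 0})
    for obj in objects:
        first, sep, _rest = obj['Key'].partition('/')
        folder = first if sep else '.'
        summary[folder]['files'] += 1
        summary[folder]['bytes'] += obj['Size']
    return dict(summary)


def list_bucket_objects(bucket_name):
    """Yield every object in the bucket, up to 1000 per page, via the list_objects_v2 paginator."""
    import boto3  # imported here so summarize_top_level can be used without AWS access

    paginator = boto3.client('s3').get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name):
        # Pages with no objects (e.g. an empty bucket) have no 'Contents' key.
        yield from page.get('Contents', [])


# Usage (requires AWS credentials; 'my-bucket' is a placeholder):
# print(summarize_top_level(list_bucket_objects('my-bucket')))
```

Zero-byte “folder placeholder” objects (keys ending in ‘/’) are counted as files here; filter them out in `summarize_top_level` if they should not be.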
Hi,
Your work on this is awesome and is helping me a lot.
Can you please give a hint on how to extract the security group ID whose CidrIp is 0.0.0.0/0 (in IpRanges under IpPermissions) from a CloudTrail log in JSON format, using boto3 and Python? I have tried everything but can’t make any progress. Thanks in advance.
I think you are trying to find security groups with an allow-all rule using 0.0.0.0/0. Why not iterate over all the groups in the account and check each rule in each group for a CIDR of 0.0.0.0/0?
Hey, thanks. I know this is kinda old, but it helped me.
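A minimal sketch of that suggestion, assuming credentials that allow the EC2 DescribeSecurityGroups call. It checks the live security group configuration rather than parsing CloudTrail logs. The function names are hypothetical; because security groups are regional, it needs to be run once per region, and it only inspects IPv4 ranges (`IpRanges`), since an IPv6 allow-all (`::/0`) appears under `Ipv6Ranges` instead.

```python
def find_open_groups(security_groups, cidr='0.0.0.0/0'):
    """Return IDs of security groups with an inbound rule allowing `cidr`.

    `security_groups` is an iterable of dicts shaped like the 'SecurityGroups'
    entries returned by EC2 describe_security_groups.
    """
    open_ids = []
    for group in security_groups:
        # Each inbound rule lists its IPv4 ranges under 'IpRanges'.
        for permission in group.get('IpPermissions', []):
            if any(r.get('CidrIp') == cidr for r in permission.get('IpRanges', [])):
                open_ids.append(group['GroupId'])
                break  # one matching rule is enough for this group
    return open_ids


def list_security_groups(region_name=None):
    """Yield every security group in one region using the EC2 paginator."""
    import boto3  # imported here so find_open_groups can be tested without AWS access

    ec2 = boto3.client('ec2', region_name=region_name)
    for page in ec2.get_paginator('describe_security_groups').paginate():
        yield from page['SecurityGroups']


# Usage (requires AWS credentials; the region is a placeholder):
# print(find_open_groups(list_security_groups('us-east-1')))
```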
2020 and still great. Consider updating if you ever get the chance 🙂
Script is not working
Script worked in my test VM. What error are you seeing?