In this example I want to open a file directly from an S3 bucket without having to download the file from S3 to the local file system. This is a way to stream the body of a file into a Python variable, also known as a ‘Lazy Read’.
import boto3

s3client = boto3.client(
    's3',
    region_name='us-east-1'
)

# These define the bucket and object to read.
# Both values must be quoted strings, and the key has no leading slash.
bucketname = 'mybucket'
file_to_read = 'dir1/filename'

# Request the object using the bucket and object key.
fileobj = s3client.get_object(
    Bucket=bucketname,
    Key=file_to_read
)

# Read the whole response body into the variable filedata.
filedata = fileobj['Body'].read()

# filedata is a bytes object, so decode it to get text.
contents = filedata.decode('utf-8')

# Once decoded, you can treat the contents as plain text if appropriate.
print(contents)
And that is all there is to it. Be careful when reading very large files, because read() loads the entire body into memory. This example works well with text files. I use it a lot when saving and reading JSON data from an S3 bucket.
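For very large files, one option is to read the body in fixed-size chunks instead of calling read() with no arguments. The following is a minimal sketch rather than a definitive implementation: the helper name count_lines_in_chunks and the chunk size are assumptions made for illustration, and io.BytesIO stands in for fileobj['Body'] so the example runs without AWS credentials. It assumes the body supports read(amt), as botocore's StreamingBody does.

```python
import io


def count_lines_in_chunks(body, chunk_size=1024 * 1024):
    """Count newline bytes in a file-like body without loading it all at once.

    count_lines_in_chunks is a hypothetical helper, not part of boto3.
    """
    total = 0
    while True:
        chunk = body.read(chunk_size)  # read at most chunk_size bytes
        if not chunk:                  # an empty bytes object means end of stream
            break
        total += chunk.count(b"\n")
    return total


# io.BytesIO is a stand-in for fileobj['Body'] in this sketch.
fake_body = io.BytesIO(b"line one\nline two\nline three\n")
line_count = count_lines_in_chunks(fake_body, chunk_size=8)
print(line_count)
```

With a real object, the same helper could be called as count_lines_in_chunks(fileobj['Body']), so that only one chunk is held in memory at a time.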
Good Luck.
This worked for me when I replaced mybucket with ‘mybucket’ and the same for the filename.
This code was very helpful to me.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8c in position 7: invalid start byte
I am getting this error message when I try to read a Parquet file.
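Parquet is a binary format, so calling decode('utf-8') on its bytes will generally fail in this way. A minimal sketch of one way around it, assuming the goal is to hand the raw bytes to a Parquet reader: skip the decode step and wrap the bytes in an in-memory buffer. The helper names looks_like_parquet and text_or_buffer are hypothetical, and the sample bytes below only imitate the Parquet magic markers; they are not a valid Parquet file.

```python
import io

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these four bytes


def looks_like_parquet(data):
    """Return True if the bytes carry the Parquet magic marker at both ends."""
    return (
        len(data) >= 8
        and data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
    )


def text_or_buffer(data):
    """Return a BytesIO buffer for Parquet bytes, or decoded text otherwise.

    Hypothetical helper for illustration; 'data' plays the role of filedata.
    """
    if looks_like_parquet(data):
        return io.BytesIO(data)   # keep binary data as bytes
    return data.decode("utf-8")   # only decode data that is really text


# Stand-in bytes: magic markers around a few non-UTF-8 bytes.
fake_parquet = PARQUET_MAGIC + b"\x15\x00\x8c\xff" + PARQUET_MAGIC
result = text_or_buffer(fake_parquet)
text_result = text_or_buffer(b'{"a": 1}')
```

The resulting buffer can then be passed to a reader that accepts file-like objects, for example pandas.read_parquet, assuming pandas and a Parquet engine such as pyarrow are installed.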
Thanks! Solved my problem easily
You have an error on the line:
contents = filedata.decode('utf-8'))
Should be:
contents = filedata.decode('utf-8')
Thanks for catching that – I corrected the typo.
filedata = fileobj['Body'].read()
This line always throws an error for me:
file_object = self.client.get_object(Bucket=self.bucket_name, Key=self.get_mnp_checksum_file())
log.info(f"File object : {file_object}, it's type: {type(file_object)}")
file_content = file_object['Body']
> file_content = file_content.read().decode()('utf-8')
E TypeError: 'str' object is not callable
Please help
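The TypeError above most likely comes from the extra pair of parentheses: .decode() already returns a str, and the trailing ('utf-8') then tries to call that str as if it were a function. A minimal sketch of the corrected call, using io.BytesIO as a stand-in for file_object['Body'] so that it runs without AWS access (the sample content is made up for illustration):

```python
import io

# Stand-in for file_object['Body']; the content here is invented.
body = io.BytesIO("abc123\n".encode("utf-8"))

# Pass the encoding as the argument to decode() instead of
# calling the decoded string: .decode('utf-8'), not .decode()('utf-8').
file_content = body.read().decode("utf-8")
print(file_content)
```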
Aside from quoting the bucket name and input file path values, you MUST NOT include the leading slash in the input S3 file path.
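As a small illustration of that point, a path can be normalized before it is used as the Key argument. This is only a sketch: normalize_key is a hypothetical helper, and it assumes the object was uploaded without a leading slash. S3 keys can technically begin with a slash, in which case stripping it would point at a different object.

```python
def normalize_key(path):
    """Strip leading slashes from a path before using it as an S3 key.

    Hypothetical helper; assumes the stored key has no leading slash.
    """
    return path.lstrip("/")


bucketname = "mybucket"                 # quoted string, as in the post
file_to_read = normalize_key("/dir1/filename")
print(file_to_read)
```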