Tag Archives: boto3

How to save an S3 object to a file using boto3

Question: How to save an S3 object to a file using boto3

I’m trying to do a “hello world” with the new boto3 client for AWS.

The use-case I have is fairly simple: get an object from S3 and save it to a file.

In boto 2.X I would do it like this:

import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')

In boto 3, I can’t find a clean way to do the same thing, so I’m manually iterating over the “Streaming” object:

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:
    chunk = key['Body'].read(1024*8)
    while chunk:
        f.write(chunk)
        chunk = key['Body'].read(1024*8)

or

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:
    for chunk in iter(lambda: key['Body'].read(4096), b''):
        f.write(chunk)

And it works fine. I was wondering: is there any “native” boto3 function that will do the same task?


Answer 0

There is a customization that went into Boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client, and can be used like this:

s3_client = boto3.client('s3')
open('hello.txt', 'w').write('Hello, world!')

# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')

# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
print(open('hello2.txt').read())

These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.
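
For large files, the multipart behaviour can be tuned through boto3.s3.transfer.TransferConfig; here is a minimal sketch (the threshold, concurrency, and file names are arbitrary placeholders):

import boto3
from boto3.s3.transfer import TransferConfig

# Switch to multipart above 8 MB and use up to 10 threads in parallel
config = TransferConfig(multipart_threshold=8 * 1024 * 1024, max_concurrency=10)

s3_client = boto3.client('s3')
s3_client.upload_file('large.bin', 'MyBucket', 'large.bin', Config=config)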

Note that s3_client.download_file won’t create a directory. It can be created with pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).
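
Put together, a download that first makes sure the target directory exists might look like this sketch (the paths and names are placeholders):

import pathlib

import boto3

local_path = pathlib.Path('/path/to/file.txt')
local_path.parent.mkdir(parents=True, exist_ok=True)  # create missing directories
boto3.client('s3').download_file('MyBucket', 'hello-remote.txt', str(local_path))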


Answer 1

boto3 now has a nicer interface than the client:

resource = boto3.resource('s3')
my_bucket = resource.Bucket('MyBucket')
my_bucket.download_file(key, local_filename)

This by itself isn’t tremendously better than the client in the accepted answer (although the docs say that it does a better job retrying uploads and downloads on failure) but considering that resources are generally more ergonomic (for example, the s3 bucket and object resources are nicer than the client methods) this does allow you to stay at the resource layer without having to drop down.

Resources generally can be created in the same way as clients, and they take all or most of the same arguments and just forward them to their internal clients.
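
For example, the same region and credential arguments a client accepts can be passed when creating a resource (the values here are placeholders):

import boto3

# Both take the same keyword arguments; the resource forwards them to its internal client
resource = boto3.resource('s3', region_name='us-east-1')
client = boto3.client('s3', region_name='us-east-1')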


Answer 2

For those of you who would like to simulate the boto2-style set_contents_from_string method, you can try:

import boto3
from cStringIO import StringIO

s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)

# notice if you do fake_handle.read() it reads like a file handle
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())

For Python3:

In Python 3, both StringIO and cStringIO are gone. Use the StringIO import like:

from io import StringIO

To support both versions:

try:
   from StringIO import StringIO
except ImportError:
   from io import StringIO
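
In Python 3 the same idea looks like this minimal sketch (the bucket and key names are the placeholders from above):

import boto3
from io import StringIO

s3c = boto3.client('s3')
fake_handle = StringIO('My string to save to S3 object')

# put_object accepts bytes or a string for Body
s3c.put_object(Bucket='hello-world.by.vor', Key='data/hello.txt', Body=fake_handle.read())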

Answer 3

# Preface: File is json with contents: {'name': 'Android', 'status': 'ERROR'}

import io
import json

import boto3

s3 = boto3.resource('s3')

obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)

# data now holds the raw bytes; convert them to a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))

print(new_dict['status'])
# Should print "ERROR"

Answer 4

When you want to download a file with a different configuration than the default one, feel free to use mpu.aws.s3_download(s3path, destination) directly or the copy-pasted code:

import os

import boto3


def s3_download(source, destination,
                exists_strategy='raise',
                profile_name=None):
    """
    Copy a file from an S3 source to a local destination.

    Parameters
    ----------
    source : str
        Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
    destination : str
    exists_strategy : {'raise', 'replace', 'abort'}
        What is done when the destination already exists?
    profile_name : str, optional
        AWS profile

    Raises
    ------
    botocore.exceptions.NoCredentialsError
        Botocore is not able to find your credentials. Either specify
        profile_name or add the environment variables AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
        See https://boto3.readthedocs.io/en/latest/guide/configuration.html
    """
    exists_strategies = ['raise', 'replace', 'abort']
    if exists_strategy not in exists_strategies:
        raise ValueError('exists_strategy \'{}\' is not in {}'
                         .format(exists_strategy, exists_strategies))
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    bucket_name, key = _s3_path_split(source)
    if os.path.isfile(destination):
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)

from collections import namedtuple

S3Path = namedtuple("S3Path", ["bucket_name", "key"])


def _s3_path_split(s3_path):
    """
    Split an S3 path into bucket and key.

    Parameters
    ----------
    s3_path : str

    Returns
    -------
    splitted : (str, str)
        (bucket, key)

    Examples
    --------
    >>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
    S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(
            "s3_path is expected to start with 's3://', " "but was {}"
            .format(s3_path)
        )
    bucket_key = s3_path[len("s3://"):]
    bucket_name, key = bucket_key.split("/", 1)
    return S3Path(bucket_name, key)

Answer 5

Note: I’m assuming you have configured authentication separately. The code below downloads a single object from an S3 bucket.

import boto3

# initiate the S3 resource
s3 = boto3.resource('s3')

# Download the object to a local file
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')

How to choose an AWS profile when connecting to CloudFront using boto3

Question: How to choose an AWS profile when connecting to CloudFront using boto3

I am using the Boto 3 python library, and want to connect to AWS CloudFront. I need to specify the correct AWS Profile (AWS Credentials), but looking at the official documentation, I see no way to specify it.

I am initializing the client using the code: client = boto3.client('cloudfront')

However, this results in it using the default profile to connect. I couldn’t find a method where I can specify which profile to use.


Answer 0

I think the docs aren’t wonderful at exposing how to do this. It has been a supported feature for some time, however, and there are some details in this pull request.

So there are three different ways to do this:

Option A) Create a new session with the profile

    dev = boto3.session.Session(profile_name='dev')

Option B) Change the profile of the default session in code

    boto3.setup_default_session(profile_name='dev')

Option C) Change the profile of the default session with an environment variable

    $ AWS_PROFILE=dev ipython
    >>> import boto3
    >>> s3dev = boto3.resource('s3')

Answer 1

Do this to use a profile with name ‘dev’:

session = boto3.session.Session(profile_name='dev')
s3 = session.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)

Answer 2

This section of the boto3 documentation is helpful.

Here’s what worked for me:

session = boto3.Session(profile_name='dev')
client = session.client('cloudfront')

Answer 3

Just add profile to session configuration before client call. boto3.session.Session(profile_name='YOUR_PROFILE_NAME').client('cloudwatch')


Save Dataframe to csv directly to s3 Python

Question: Save Dataframe to csv directly to s3 Python

I have a pandas DataFrame that I want to upload to a new CSV file. The problem is that I don’t want to save the file locally before transferring it to s3. Is there any method like to_csv for writing the dataframe to s3 directly? I am using boto3.
Here is what I have so far:

import boto3
s3 = boto3.client('s3', aws_access_key_id='key', aws_secret_access_key='secret_key')
read_file = s3.get_object(Bucket=Bucket, Key=Key)  # client methods require keyword arguments
df = pd.read_csv(read_file['Body'])

# Make alterations to DataFrame

# Then export DataFrame to CSV through direct transfer to s3

Answer 0

You can use:

from io import StringIO # python3; python2: BytesIO 
import boto3

bucket = 'my_bucket_name' # already created on S3
csv_buffer = StringIO()
df.to_csv(csv_buffer)
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())

Answer 1

You can directly use the S3 path. I am using Pandas 0.24.1

In [1]: import pandas as pd

In [2]: df = pd.DataFrame( [ [1, 1, 1], [2, 2, 2] ], columns=['a', 'b', 'c'])

In [3]: df
Out[3]:
   a  b  c
0  1  1  1
1  2  2  2

In [4]: df.to_csv('s3://experimental/playground/temp_csv/dummy.csv', index=False)

In [5]: pd.__version__
Out[5]: '0.24.1'

In [6]: new_df = pd.read_csv('s3://experimental/playground/temp_csv/dummy.csv')

In [7]: new_df
Out[7]:
   a  b  c
0  1  1  1
1  2  2  2

Release Note:

S3 File Handling

pandas now uses s3fs for handling S3 connections. This shouldn’t break any code. However, since s3fs is not a required dependency, you will need to install it separately, like boto in prior versions of pandas. GH11915.


Answer 2

I like s3fs, which lets you use s3 (almost) like a local filesystem.

You can do this:

import s3fs

bytes_to_write = df.to_csv(None).encode()
fs = s3fs.S3FileSystem(key=key, secret=secret)
with fs.open('s3://bucket/path/to/file.csv', 'wb') as f:
    f.write(bytes_to_write)

s3fs supports only rb and wb modes of opening the file, that’s why I did this bytes_to_write stuff.


Answer 3

This is a more up-to-date answer:

import s3fs

s3 = s3fs.S3FileSystem(anon=False)

# Use 'w' for py3, 'wb' for py2
with s3.open('<bucket-name>/<filename>.csv','w') as f:
    df.to_csv(f)

The problem with StringIO is that it will eat away at your memory. With this method, you are streaming the file to s3, rather than converting it to string, then writing it into s3. Holding the pandas dataframe and its string copy in memory seems very inefficient.

If you are working on an EC2 instance, you can give it an IAM role to enable writing to S3, so you don’t need to pass in credentials directly. However, you can also connect to a bucket by passing credentials to the S3FileSystem() function. See the documentation: https://s3fs.readthedocs.io/en/latest/


Answer 4

If you pass None as the first argument to to_csv() the data will be returned as a string. From there it’s an easy step to upload that to S3 in one go.

It should also be possible to pass a StringIO object to to_csv(), but using a string will be easier.
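
For example, a minimal sketch of that approach (the bucket and key names are placeholders, and df is the DataFrame from the question):

import boto3

csv_string = df.to_csv(None)  # with None, to_csv returns the CSV as a string
boto3.client('s3').put_object(Bucket='my-bucket', Key='df.csv', Body=csv_string)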


Answer 5

You can also use the AWS Data Wrangler:

import awswrangler as wr
    
wr.s3.to_csv(
    df=df,
    path="s3://...",
)

Note that it will handle multipart upload for you to make the upload faster.


Answer 6

I found that this can be done using the client as well, not just the resource.

from io import StringIO
import boto3
s3 = boto3.client("s3",\
                  region_name=region_name,\
                  aws_access_key_id=aws_access_key_id,\
                  aws_secret_access_key=aws_secret_access_key)
csv_buf = StringIO()
df.to_csv(csv_buf, header=True, index=False)
csv_buf.seek(0)
s3.put_object(Bucket=bucket, Body=csv_buf.getvalue(), Key='path/test.csv')

Answer 7

Since you are using boto3.client(), try:

import boto3
from io import StringIO #python3 
s3 = boto3.client('s3', aws_access_key_id='key', aws_secret_access_key='secret_key')
def copy_to_s3(client, df, bucket, filepath):
    csv_buf = StringIO()
    df.to_csv(csv_buf, header=True, index=False)
    csv_buf.seek(0)
    client.put_object(Bucket=bucket, Body=csv_buf.getvalue(), Key=filepath)
    print(f'Copy {df.shape[0]} rows to S3 Bucket {bucket} at {filepath}, Done!')

copy_to_s3(client=s3, df=df_to_upload, bucket='abc', filepath='def/test.csv')

Answer 8

I found a very simple solution that seems to be working:

s3 = boto3.client("s3")

s3.put_object(
    Body=open("filename.csv").read(),
    Bucket="your-bucket",
    Key="your-key"
)

Hope that helps!


Answer 9

I read a csv with two columns from an s3 bucket, and put the contents of the csv file into a pandas dataframe.

Example:

config.json

{
  "credential": {
    "access_key":"xxxxxx",
    "secret_key":"xxxxxx"
}
,
"s3":{
       "bucket":"mybucket",
       "key":"csv/user.csv"
   }
}

cls_config.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import json

class cls_config(object):

    def __init__(self,filename):

        self.filename = filename


    def getConfig(self):

        fileName = os.path.join(os.path.dirname(__file__), self.filename)
        with open(fileName) as f:
            config = json.load(f)
        return config

cls_pandas.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd
import io

class cls_pandas(object):

    def __init__(self):
        pass

    def read(self,stream):

        df = pd.read_csv(io.StringIO(stream), sep = ",")
        return df

cls_s3.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import boto3
import json

class cls_s3(object):

    def  __init__(self,access_key,secret_key):

        self.s3 = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key)

    def getObject(self,bucket,key):

        read_file = self.s3.get_object(Bucket=bucket, Key=key)
        body = read_file['Body'].read().decode('utf-8')
        return body

test.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from cls_config import *
from cls_s3 import *
from cls_pandas import *

class test(object):

    def __init__(self):
        self.conf = cls_config('config.json')

    def process(self):

        conf = self.conf.getConfig()

        bucket = conf['s3']['bucket']
        key = conf['s3']['key']

        access_key = conf['credential']['access_key']
        secret_key = conf['credential']['secret_key']

        s3 = cls_s3(access_key,secret_key)
        ob = s3.getObject(bucket,key)

        pa = cls_pandas()
        df = pa.read(ob)

        print(df)

if __name__ == '__main__':
    test = test()
    test.process()

How to specify credentials when connecting to boto3 S3?

Question: How to specify credentials when connecting to boto3 S3?

On boto I used to specify my credentials when connecting to S3 in such a way:

import boto
from boto.s3.connection import Key, S3Connection
S3 = S3Connection( settings.AWS_SERVER_PUBLIC_KEY, settings.AWS_SERVER_SECRET_KEY )

I could then use S3 to perform my operations (in my case deleting an object from a bucket).

With boto3 all the examples I found are such:

import boto3
S3 = boto3.resource( 's3' )
S3.Object( bucket_name, key_name ).delete()

I couldn’t specify my credentials and thus all attempts fail with InvalidAccessKeyId error.

How can I specify credentials with boto3?


Answer 0

You can create a session:

import boto3
session = boto3.Session(
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
)

Then use that session to get an S3 resource:

s3 = session.resource('s3')

Answer 1

You can get a client with a new session directly, like below:

s3_client = boto3.client('s3',
                         aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
                         aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
                         region_name=REGION_NAME)

Answer 2

This is older, but I’m placing it here for my reference too. boto3.resource just implements the default Session; you can pass session details through to boto3.resource.

Help on function resource in module boto3:

resource(*args, **kwargs)
    Create a resource service client by name using the default session.

    See :py:meth:`boto3.session.Session.resource`.

https://github.com/boto/boto3/blob/86392b5ca26da57ce6a776365a52d3cab8487d60/boto3/session.py#L265

You can see that it just takes the same arguments as boto3.Session.

import boto3
S3 = boto3.resource('s3', region_name='us-west-2', aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY, aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY)
S3.Object( bucket_name, key_name ).delete()

Answer 3

I’d like to expand on @JustAGuy’s answer. The method I prefer is to use the AWS CLI to create a config file. The reason is that, with the config file, the CLI or the SDK will automatically look for credentials in the ~/.aws folder. And the good thing is that the AWS CLI is written in python.

You can get the CLI from PyPI if you don’t have it already. Here are the steps to get the CLI set up from the terminal:

$> pip install awscli  #can add user flag 
$> aws configure
AWS Access Key ID [****************ABCD]:[enter your key here]
AWS Secret Access Key [****************xyz]:[enter your secret key here]
Default region name [us-west-2]:[enter your region here]
Default output format [None]:

After this you can access boto and any of the api without having to specify keys (unless you want to use a different credentials).
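
For example, once aws configure has been run, a call like this should work without passing any keys explicitly:

import boto3

# Credentials are picked up automatically from ~/.aws
s3 = boto3.client('s3')
print(s3.list_buckets()['Buckets'])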


Answer 4

There are numerous ways to store credentials while still using boto3.resource(). I’m using the AWS CLI method myself. It works perfectly.

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
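
For example, one of those ways is the environment-variable method (the values here are placeholders):

import os

import boto3

# boto3 reads these automatically if they are set
os.environ['AWS_ACCESS_KEY_ID'] = 'YOUR_ACCESS_KEY'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'YOUR_SECRET_KEY'

s3 = boto3.resource('s3')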


How to write a file or data to an S3 object using boto3

Question: How to write a file or data to an S3 object using boto3

In boto 2, you can write to an S3 object using methods such as Key.set_contents_from_string(), Key.set_contents_from_file(), Key.set_contents_from_filename(), and Key.set_contents_from_stream().

Is there a boto 3 equivalent? What is the boto3 method for saving data to an object stored on S3?


Answer 0

In boto 3, the ‘Key.set_contents_from_…’ methods were replaced by Object.put() and Client.put_object().

For example:

import boto3

some_binary_data = b'Here we have some data'
more_binary_data = b'Here we have some more data'

# Method 1: Object.put()
s3 = boto3.resource('s3')
object = s3.Object('my_bucket_name', 'my/key/including/filename.txt')
object.put(Body=some_binary_data)

# Method 2: Client.put_object()
client = boto3.client('s3')
client.put_object(Body=more_binary_data, Bucket='my_bucket_name', Key='my/key/including/anotherfilename.txt')

Alternatively, the binary data can come from reading a file, as described in the official docs comparing boto 2 and boto 3:

Storing Data

Storing data from a file, stream, or string is easy:

# Boto 2.x
from boto.s3.key import Key
key = Key('hello.txt')
key.set_contents_from_file('/tmp/hello.txt')

# Boto 3
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'))

Answer 1

boto3 also has a method for uploading a file directly:

s3.Bucket('bucketname').upload_file('/local/file/here.txt','folder/sub/path/to/s3key')

http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Bucket.upload_file


Answer 2

You no longer have to convert the contents to binary before writing to the file in S3. The following example creates a new text file (called newfile.txt) in an S3 bucket with string contents:

import boto3

s3 = boto3.resource(
    's3',
    region_name='us-east-1',
    aws_access_key_id=KEY_ID,
    aws_secret_access_key=ACCESS_KEY
)
content="String content to write to a new S3 file"
s3.Object('my-bucket-name', 'newfile.txt').put(Body=content)

Answer 3

Here’s a nice trick to read JSON from s3:

import json, boto3
s3 = boto3.resource("s3").Bucket("bucket")
json.load_s3 = lambda f: json.load(s3.Object(key=f).get()["Body"])
json.dump_s3 = lambda obj, f: s3.Object(key=f).put(Body=json.dumps(obj))

Now you can use json.load_s3 and json.dump_s3 with the same API as load and dump

data = {"test":0}
json.dump_s3(data, "key") # saves json to s3://bucket/key
data = json.load_s3("key") # read json from s3://bucket/key

Answer 4

A clean and concise version that I use to upload files on the fly to a given S3 bucket and sub-folder:

import boto3

BUCKET_NAME = 'sample_bucket_name'
PREFIX = 'sub-folder/'

s3 = boto3.resource('s3')

# Creating an empty file called "_DONE" and putting it in the S3 bucket
s3.Object(BUCKET_NAME, PREFIX + '_DONE').put(Body="")

Note: You should ALWAYS put your AWS credentials (aws_access_key_id and aws_secret_access_key) in a separate file, for example ~/.aws/credentials.
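
For reference, the ~/.aws/credentials file uses the standard INI format (the key values are placeholders):

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY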


Answer 5

It is worth mentioning smart-open, which uses boto3 as a back-end.

smart-open is a drop-in replacement for python’s open that can open files from s3, as well as ftp, http and many other protocols.

For example:

from smart_open import open
import json
with open("s3://your_bucket/your_key.json", 'r') as f:
    data = json.load(f)

The aws credentials are loaded via boto3 credentials, usually a file in the ~/.aws/ dir or an environment variable.


Answer 6

You may use the code below to write, for example, an image to S3 in 2019. To be able to connect to S3 you will have to install the AWS CLI using the command pip install awscli, then enter a few credentials using the command aws configure:

import boto3
import urllib3
import uuid
from pathlib import Path
from io import BytesIO
from errors import custom_exceptions as cex

BUCKET_NAME = "xxx.yyy.zzz"
POSTERS_BASE_PATH = "assets/wallcontent"
CLOUDFRONT_BASE_URL = "https://xxx.cloudfront.net/"


class S3(object):
    def __init__(self):
        self.client = boto3.client('s3')
        self.bucket_name = BUCKET_NAME
        self.posters_base_path = POSTERS_BASE_PATH

    def __download_image(self, url):
        manager = urllib3.PoolManager()
        try:
            res = manager.request('GET', url)
        except Exception:
            print("Could not download the image from URL: ", url)
            raise cex.ImageDownloadFailed
        return BytesIO(res.data)  # any file-like object that implements read()

    def upload_image(self, url):
        try:
            image_file = self.__download_image(url)
        except cex.ImageDownloadFailed:
            raise cex.ImageUploadFailed

        extension = Path(url).suffix
        id = uuid.uuid1().hex + extension
        final_path = self.posters_base_path + "/" + id
        try:
            self.client.upload_fileobj(image_file,
                                       self.bucket_name,
                                       final_path
                                       )
        except Exception:
            print("Image Upload Error for URL: ", url)
            raise cex.ImageUploadFailed

        return CLOUDFRONT_BASE_URL + id

Check if a key exists in a bucket in s3 using boto3

Question: Check if a key exists in a bucket in s3 using boto3

I would like to know if a key exists in boto3. I can loop the bucket contents and check the key if it matches.

But that seems longer and an overkill. Boto3 official docs explicitly state how to do this.

Maybe I am missing the obvious. Can anybody point me to how I can achieve this?


Answer 0

Boto 2’s boto.s3.key.Key object used to have an exists method that checked if the key existed on S3 by doing a HEAD request and looking at the result, but it seems that no longer exists. You have to do it yourself:

import boto3
import botocore

s3 = boto3.resource('s3')

try:
    s3.Object('my-bucket', 'dootdoot.jpg').load()
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        # The object does not exist.
        ...
    else:
        # Something else has gone wrong.
        raise
else:
    # The object does exist.
    ...

load() does a HEAD request for a single key, which is fast, even if the object in question is large or you have many objects in your bucket.

Of course, you might be checking if the object exists because you’re planning on using it. If that is the case, you can just forget about the load() and do a get() or download_file() directly, then handle the error case there.
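
For instance, a minimal sketch of that pattern (the bucket and key are the placeholders from above):

import boto3
import botocore

s3 = boto3.resource('s3')

try:
    body = s3.Object('my-bucket', 'dootdoot.jpg').get()['Body'].read()
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == 'NoSuchKey':
        body = None  # The object does not exist; handle that case here.
    else:
        raise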


Answer 1

I’m not a big fan of using exceptions for control flow. This is an alternative approach that works in boto3:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
key = 'dootdoot.jpg'
objs = list(bucket.objects.filter(Prefix=key))
if any([w.key == key for w in objs]):
    print("Exists!")
else:
    print("Doesn't exist")

Answer 2

The easiest way I found (and probably the most efficient) is this:

import boto3
from botocore.errorfactory import ClientError

s3 = boto3.client('s3')
try:
    s3.head_object(Bucket='bucket_name', Key='file_path')
except ClientError:
    # Not found
    pass

Answer 3

In Boto3, if you’re checking for either a folder (prefix) or a file using list_objects, you can use the existence of ‘Contents’ in the response dict as a check for whether the object exists. It’s another way to avoid the try/except catches, as @EvilPuppetMaster suggests:

import boto3

def key_exists():
    client = boto3.client('s3')
    results = client.list_objects(Bucket='my-bucket', Prefix='dootdoot.jpg')
    # 'Contents' is only present when at least one object matches the prefix
    return 'Contents' in results

Answer 4

Not only the client but the bucket too:

import boto3
import botocore
bucket = boto3.resource('s3', region_name='eu-west-1').Bucket('my-bucket')

try:
  bucket.Object('my-file').get()
except botocore.exceptions.ClientError as ex:
  if ex.response['Error']['Code'] == 'NoSuchKey':
    print('NoSuchKey')

Answer 5

You can use S3Fs, which is essentially a wrapper around boto3 that exposes typical file-system style operations:

import s3fs
s3 = s3fs.S3FileSystem()
s3.exists('mybucket/myfile.txt')  # the path includes the bucket name

Answer 6

import boto3
client = boto3.client('s3')
s3_key = 'Your file without bucket name e.g. abc/bcd.txt'
bucket = 'your bucket name'
content = client.head_object(Bucket=bucket, Key=s3_key)
if content.get('ResponseMetadata', None) is not None:
    print("File exists - s3://%s/%s" % (bucket, s3_key))
else:
    print("File does not exist - s3://%s/%s" % (bucket, s3_key))

Answer 7

FWIW, here are the very simple functions that I am using

import os

import boto3

def get_resource(config: dict={}):
    """Loads the s3 resource.

    Expects AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be in the environment
    or in a config dictionary.
    Looks in the environment first."""

    s3 = boto3.resource('s3',
                        aws_access_key_id=os.environ.get(
                            "AWS_ACCESS_KEY_ID", config.get("AWS_ACCESS_KEY_ID")),
                        aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", config.get("AWS_SECRET_ACCESS_KEY")))
    return s3


def get_bucket(s3, s3_uri: str):
    """Get the bucket from the resource.
    A thin wrapper, use with caution.

    Example usage:

    >> bucket = get_bucket(get_resource(), s3_uri_prod)"""
    return s3.Bucket(s3_uri)


def isfile_s3(bucket, key: str) -> bool:
    """Returns T/F whether the file exists."""
    objs = list(bucket.objects.filter(Prefix=key))
    return len(objs) == 1 and objs[0].key == key


def isdir_s3(bucket, key: str) -> bool:
    """Returns T/F whether the directory exists."""
    objs = list(bucket.objects.filter(Prefix=key))
    return len(objs) > 1

Answer 8

Assuming you just want to check if a key exists (instead of quietly over-writing it), do this check first:

import boto3

def key_exists(mykey, mybucket):
  s3_client = boto3.client('s3')
  response = s3_client.list_objects_v2(Bucket=mybucket, Prefix=mykey)
  # 'Contents' is absent from the response when nothing matches the prefix
  for obj in response.get('Contents', []):
      if mykey == obj['Key']:
          return True
  return False

if key_exists('someprefix/myfile-abc123', 'my-bucket-name'):
    print("key exists")
else:
    print("safe to put new bucket object")
    # try:
    #     resp = s3_client.put_object(Body="Your string or file-like object",
    #                                 Bucket=mybucket,Key=mykey)
    # ...check resp success and ClientError exception for errors...

Answer 9

Try this simple approach:

import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket_name') # just Bucket name
file_name = 'A/B/filename.txt'      # full file path
obj = list(bucket.objects.filter(Prefix=file_name))
if len(obj) > 0:
    print("Exists")
else:
    print("Not Exists")

Answer 10

This checks both prefix and key, and fetches at most 1 key.

import boto3

def prefix_exists(bucket, prefix):
    s3_client = boto3.client('s3')
    res = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return 'Contents' in res

Answer 11

If you have fewer than 1000 objects in a directory or bucket, you can get a set of them and then check whether such a key is in the set:

import boto3

s3_client = boto3.client('s3')
files_in_dir = {d['Key'].split('/')[-1]
                for d in s3_client.list_objects_v2(
                    Bucket='mybucket',
                    Prefix='my/dir').get('Contents') or []}

This code works even if my/dir does not exist.

http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2


Answer 12

S3_REGION = "eu-central-1"
bucket = "mybucket1"
name = "objectname"

import boto3
from botocore.client import Config
client = boto3.client('s3', region_name=S3_REGION, config=Config(signature_version='s3v4'))

def key_exists():
    # 'Contents' is missing from the response when there is no match
    response = client.list_objects_v2(Bucket=bucket, Prefix=name)
    for obj in response.get('Contents', []):
        if obj['Key'] == name:
            return True
    return False

Answer 13

For boto3, ObjectSummary can be used to check if an object exists.

Contains the summary of an object stored in an Amazon S3 bucket. This object doesn’t contain the object’s full metadata or any of its contents.

import boto3
from botocore.errorfactory import ClientError
def path_exists(path, bucket_name):
    """Check to see if an object exists on S3"""
    s3 = boto3.resource('s3')
    try:
        s3.ObjectSummary(bucket_name=bucket_name, key=path).load()
    except ClientError as e:
        if e.response['Error']['Code'] == "404":
            return False
        else:
            raise e
    return True

path_exists('path/to/file.html', 'my-bucket')

In ObjectSummary.load

Calls s3.Client.head_object to update the attributes of the ObjectSummary resource.

This shows that you can use ObjectSummary instead of Object if you are planning on not using get(). The load() function does not retrieve the object; it only obtains the summary.


Answer 14

Here is a solution that works for me. One caveat is that I know the exact format of the key ahead of time, so I am only listing the single file:

import boto3

# The s3 base class to interact with S3
class S3(object):
  def __init__(self):
    self.s3_client = boto3.client('s3')

  def check_if_object_exists(self, s3_bucket, s3_key):
    response = self.s3_client.list_objects(
      Bucket = s3_bucket,
      Prefix = s3_key
      )
    if 'ETag' in str(response):
      return True
    else:
      return False

if __name__ == '__main__':
  s3 = S3()
  # bucket and key are assumed to be defined elsewhere
  if s3.check_if_object_exists(bucket, key):
    print("Found S3 object.")
  else:
    print("No object found.")

Answer 15

You can use Boto3 for this:

import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
objs = list(bucket.objects.filter(Prefix=key))
if(len(objs)>0):
    print("key exists!!")
else:
    print("key doesn't exist!")

Here, key is the path whose existence you want to check.


Answer 16

It’s really simple with the get() method:

import botocore
from boto3.session import Session
session = Session(aws_access_key_id='AWS_ACCESS_KEY',
                aws_secret_access_key='AWS_SECRET_ACCESS_KEY')
s3 = session.resource('s3')
bucket_s3 = s3.Bucket('bucket_name')

def not_exist(file_key):
    try:
        file_details = bucket_s3.Object(file_key).get()
        # print(file_details) # This line prints the file details
        return False
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "NoSuchKey": # or you can check with e.response['HTTPStatusCode'] == '404'
            return True
        return False # For any other error it's hard to determine whether it exists or not. so based on the requirement feel free to change it to True/ False / raise Exception

print(not_exist('hello_world.txt')) 

Answer 17

There is one simple way to check whether a file exists in an S3 bucket. We do not need to use an exception for this:

session = boto3.Session(aws_access_key_id, aws_secret_access_key)
s3 = session.client('s3')

object_name = 'filename'
bucket = 'bucketname'
obj_status = s3.list_objects(Bucket = bucket, Prefix = object_name)
if obj_status.get('Contents'):
    print("File exists")
else:
    print("File does not exists")

回答 18

如果您寻找与目录等效的键,则可能需要这种方法

session = boto3.session.Session()
resource = session.resource("s3")
bucket = resource.Bucket('mybucket')

key = 'dir-like-or-file-like-key'
objects = [o for o in bucket.objects.filter(Prefix=key).limit(1)]    
has_key = len(objects) > 0

这适用于类似目录的父键、等同于文件的键或不存在的键。我尝试了上面受推崇的方法,但它在父键上失败了。

If you seek a key that is equivalent to a directory then you might want this approach

session = boto3.session.Session()
resource = session.resource("s3")
bucket = resource.Bucket('mybucket')

key = 'dir-like-or-file-like-key'
objects = [o for o in bucket.objects.filter(Prefix=key).limit(1)]    
has_key = len(objects) > 0

This works for a parent key, a key that equates to a file, or a key that does not exist. I tried the favored approach above and it failed on parent keys.


回答 19

我注意到,仅仅为了使用botocore.exceptions.ClientError捕获异常,我们就需要安装botocore,它占用36M的磁盘空间。如果我们使用AWS Lambda函数,这一点的影响尤其大。取而代之,如果我们只使用普通异常,就可以跳过这个额外的库!

  • 我正在验证文件扩展名为“ .csv”
  • 如果存储桶不存在,则不会抛出异常!
  • 如果存储桶存在但对象不存在,则不会引发异常!
  • 如果存储桶为空,则会抛出异常!
  • 如果存储桶没有权限,则会抛出异常!

代码看起来像这样。请分享您的想法:

import boto3
import traceback

def download4mS3(s3bucket, s3Path, localPath):
    s3 = boto3.resource('s3')

    print('Looking for the csv data file ending with .csv in bucket: ' + s3bucket + ' path: ' + s3Path)
    if s3Path.endswith('.csv') and s3Path != '':
        try:
            s3.Bucket(s3bucket).download_file(s3Path, localPath)
        except Exception as e:
            print(e)
            print(traceback.format_exc())
            # a plain Exception has no .response attribute, so guard the lookup
            if getattr(e, 'response', {}).get('Error', {}).get('Code') == "404":
                print("Downloading the file from: [", s3Path, "] failed")
                exit(12)
            else:
                raise
        print("Downloading the file from: [", s3Path, "] succeeded")
    else:
        print("csv file not found in: [", s3Path, "]")
        exit(12)

I noticed that just to catch the exception using botocore.exceptions.ClientError we need to install botocore, which takes up 36M of disk space. This is particularly significant if we use AWS Lambda functions. If we just use a plain exception instead, we can skip the extra library!

  • I am validating for the file extension to be ‘.csv’
  • This will not throw an exception if the bucket does not exist!
  • This will not throw an exception if the bucket exists but object does not exist!
  • This throws an exception if the bucket is empty!
  • This throws an exception if you do not have permissions on the bucket!

The code looks like this. Please share your thoughts:

import boto3
import traceback

def download4mS3(s3bucket, s3Path, localPath):
    s3 = boto3.resource('s3')

    print('Looking for the csv data file ending with .csv in bucket: ' + s3bucket + ' path: ' + s3Path)
    if s3Path.endswith('.csv') and s3Path != '':
        try:
            s3.Bucket(s3bucket).download_file(s3Path, localPath)
        except Exception as e:
            print(e)
            print(traceback.format_exc())
            # a plain Exception has no .response attribute, so guard the lookup
            if getattr(e, 'response', {}).get('Error', {}).get('Code') == "404":
                print("Downloading the file from: [", s3Path, "] failed")
                exit(12)
            else:
                raise
        print("Downloading the file from: [", s3Path, "] succeeded")
    else:
        print("csv file not found in: [", s3Path, "]")
        exit(12)

回答 20

顺着这个讨论串,有人可以得出结论,哪种是检查S3中对象是否存在的最有效方法吗?

我认为head_object可能会胜出,因为它只检查元数据,这比获取实际对象本身更轻量

Just following the thread, can someone conclude which one is the most efficient way to check if an object exists in S3?

I think head_object might win, as it just checks the metadata, which is lighter than fetching the actual object itself
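
For reference, a minimal sketch of such a head_object existence check (the bucket and key names are hypothetical):

import botocore
import boto3

s3_client = boto3.client('s3')

def object_exists(bucket, key):
    # HEAD fetches only the object's metadata, never its body
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == '404':
            return False
        raise

print(object_exists('my-bucket', 'hello_world.txt'))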


回答 21

根据 https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3,这是最快的方法:

import boto3

boto3_session = boto3.session.Session()
s3_session_client = boto3_session.client("s3")

def key_exists(bucket_name, s3_key):
    response = s3_session_client.list_objects_v2(
        Bucket=bucket_name, Prefix=s3_key
    )
    # compare keys exactly so that a mere prefix match does not count
    for obj in response.get("Contents", []):
        if obj["Key"] == s3_key:
            return True
    return False

According to https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3, this is the fastest method:

import boto3

boto3_session = boto3.session.Session()
s3_session_client = boto3_session.client("s3")

def key_exists(bucket_name, s3_key):
    response = s3_session_client.list_objects_v2(
        Bucket=bucket_name, Prefix=s3_key
    )
    # compare keys exactly so that a mere prefix match does not count
    for obj in response.get("Contents", []):
        if obj["Key"] == s3_key:
            return True
    return False

回答 22

请查看

bucket.get_key(
    key_name, 
    headers=None, 
    version_id=None, 
    response_headers=None, 
    validate=True
)

检查存储桶中是否存在特定密钥。此方法使用HEAD请求检查密钥是否存在。返回:Key对象的实例或None

来自Boto S3 Docs

您可以只调用bucket.get_key(keyname)并检查返回的对象是否为None。

Check out

bucket.get_key(
    key_name, 
    headers=None, 
    version_id=None, 
    response_headers=None, 
    validate=True
)

Check to see if a particular key exists within the bucket. This method uses a HEAD request to check for the existence of the key. Returns: An instance of a Key object or None

from Boto S3 Docs

You can just call bucket.get_key(keyname) and check if the returned object is None.
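
A minimal usage sketch of that boto 2 pattern, with hypothetical bucket and key names:

import boto

bucket = boto.connect_s3().get_bucket('mybucket')
key = bucket.get_key('path/to/file.txt')  # issues a HEAD request under the hood
print("exists" if key is not None else "missing")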


使用Boto3将S3对象作为字符串打开

问题:使用Boto3将S3对象作为字符串打开

我知道,使用Boto 2,可以使用以下命令将S3对象作为字符串打开: get_contents_as_string()

boto3中有等效功能吗?

I’m aware that with Boto 2 it’s possible to open an S3 object as a string with: get_contents_as_string()

Is there an equivalent function in boto3 ?


回答 0

read将返回字节。至少对于Python 3,如果要返回字符串,则必须使用正确的编码进行解码:

import boto3

s3 = boto3.resource('s3')

obj = s3.Object(bucket, key)
obj.get()['Body'].read().decode('utf-8') 

read will return bytes. At least for Python 3, if you want to return a string, you have to decode using the right encoding:

import boto3

s3 = boto3.resource('s3')

obj = s3.Object(bucket, key)
obj.get()['Body'].read().decode('utf-8') 

回答 1

在AWS Lambda中使用Python 2.7时,由于.get()的原因,我无法从S3读取/解析对象。

我在示例中添加了json以表明它可解析:)

import boto3
import json

s3 = boto3.client('s3')

obj = s3.get_object(Bucket=bucket, Key=key)
j = json.loads(obj['Body'].read())

注意(对于python 2.7):我的对象都是ascii,所以我不需要 .decode('utf-8')

注意(对于python 3.6及更高版本):我们迁移到python 3.6后发现read()现在返回bytes,因此,如果要从中获取字符串,则必须使用:

j = json.loads(obj['Body'].read().decode('utf-8'))

I had a problem reading/parsing the object from S3 with .get() using Python 2.7 inside an AWS Lambda.

I added json to the example to show it became parsable :)

import boto3
import json

s3 = boto3.client('s3')

obj = s3.get_object(Bucket=bucket, Key=key)
j = json.loads(obj['Body'].read())

NOTE (for python 2.7): My object is all ascii, so I don’t need .decode('utf-8')

NOTE (for python 3.6+): We moved to python 3.6 and discovered that read() now returns bytes so if you want to get a string out of it, you must use:

j = json.loads(obj['Body'].read().decode('utf-8'))


回答 2

boto3文档中没有此内容。这对我有用:

object.get()["Body"].read()

object是一个s3对象:http://boto3.readthedocs.org/en/latest/reference/services/s3.html#object

This isn’t in the boto3 documentation. This worked for me:

object.get()["Body"].read()

object being an s3 object: http://boto3.readthedocs.org/en/latest/reference/services/s3.html#object


回答 3

Python3 +使用boto3 API方法。

通过使用S3.Client.download_fileobj API和类似Python文件的对象,可以将S3对象的内容检索到内存中。

由于检索到的内容是字节,因此为了转换为str,需要对其进行解码。

import io
import boto3

client = boto3.client('s3')
bytes_buffer = io.BytesIO()
client.download_fileobj(Bucket=bucket_name, Key=object_key, Fileobj=bytes_buffer)
byte_value = bytes_buffer.getvalue()
str_value = byte_value.decode() #python3, default decoding is utf-8

Python3 + Using boto3 API approach.

By using S3.Client.download_fileobj API and Python file-like object, S3 Object content can be retrieved to memory.

Since the retrieved content is bytes, in order to convert it to str, it needs to be decoded.

import io
import boto3

client = boto3.client('s3')
bytes_buffer = io.BytesIO()
client.download_fileobj(Bucket=bucket_name, Key=object_key, Fileobj=bytes_buffer)
byte_value = bytes_buffer.getvalue()
str_value = byte_value.decode() #python3, default decoding is utf-8

回答 4

如果body包含io.StringIO,则必须执行以下操作:

object.get()['Body'].getvalue()

If the body contains an io.StringIO, you have to do it like below:

object.get()['Body'].getvalue()

boto3客户端NoRegionError:仅在某些情况下必须指定区域错误

问题:boto3客户端NoRegionError:仅在某些情况下必须指定区域错误

我有一个boto3客户端:

boto3.client('kms')

但是这个错误发生在新机器上;它们是动态启动和关闭的。

    if endpoint is None:
        if region_name is None:
            # Raise a more specific error message that will give
            # better guidance to the user what needs to happen.
            raise NoRegionError()

为什么会这样呢?为什么只有部分时间呢?

I have a boto3 client :

boto3.client('kms')

But it happens on new machines, which open and close dynamically.

    if endpoint is None:
        if region_name is None:
            # Raise a more specific error message that will give
            # better guidance to the user what needs to happen.
            raise NoRegionError()

Why is this happening? And why only part of the time?


回答 0

您必须以某种方式告诉boto3您希望在哪个区域创建kms客户端。可以使用region_name参数显式地完成此操作,如下所示:

kms = boto3.client('kms', region_name='us-west-2')

或者,您可以在~/.aws/config文件中为您的配置文件设置一个默认区域,如下所示:

[default]
region=us-west-2

或者您可以使用如下环境变量:

export AWS_DEFAULT_REGION=us-west-2

但您确实需要告诉boto3使用哪个区域。

One way or another you must tell boto3 in which region you wish the kms client to be created. This could be done explicitly using the region_name parameter as in:

kms = boto3.client('kms', region_name='us-west-2')

or you can have a default region associated with your profile in your ~/.aws/config file as in:

[default]
region=us-west-2

or you can use an environment variable as in:

export AWS_DEFAULT_REGION=us-west-2

but you do need to tell boto3 which region to use.


回答 1

os.environ['AWS_DEFAULT_REGION'] = 'your_region_name'

就我而言,大小写很重要。

os.environ['AWS_DEFAULT_REGION'] = 'your_region_name'

In my case, case sensitivity mattered.


回答 2

我相信默认情况下,boto会选择在aws cli中设置的区域。您可以运行aws configure命令并按两次Enter键(它会显示您在aws cli中设置的凭据和区域)来确认您的区域。

I believe that, by default, boto picks the region that is set in the aws cli. You can run the command aws configure and press Enter twice (it shows the credentials and region you have set in the aws cli) to confirm your region.
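
As a quick sanity check, a minimal sketch that prints the region boto3 has actually resolved from your configuration; None here means a call like boto3.client('kms') would raise NoRegionError:

import boto3

# region_name reflects the profile, config file, and environment variables
print(boto3.session.Session().region_name)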


回答 3

您还可以在脚本本身中设置环境变量,而不是传递region_name参数

os.environ['AWS_DEFAULT_REGION'] = 'your_region_name'

区分大小写可能很重要。

You can also set the environment variable in the script itself, rather than passing the region_name parameter:

os.environ['AWS_DEFAULT_REGION'] = 'your_region_name'

Case sensitivity may matter.


回答 4

对于Python 2,我发现如果区域定义在default以外的配置文件中,boto3库不会从~/.aws/config中获取区域。因此,您必须在创建会话时定义它。

session = boto3.Session(
    profile_name='NotDefault',
    region_name='ap-southeast-2'
)

print(session.available_profiles)

client = session.client(
    'ec2'
)

我的~/.aws/config文件如下所示:

[default]
region=ap-southeast-2

[NotDefault]
region=ap-southeast-2

我之所以这样做,是因为我为不同的AWS登录(个人和工作)使用不同的配置文件。

For Python 2 I have found that the boto3 library does not source the region from ~/.aws/config if the region is defined in a profile other than default. So you have to define it during session creation.

session = boto3.Session(
    profile_name='NotDefault',
    region_name='ap-southeast-2'
)

print(session.available_profiles)

client = session.client(
    'ec2'
)

Where my ~/.aws/config file looks like this:

[default]
region=ap-southeast-2

[NotDefault]
region=ap-southeast-2

I do this because I use different profiles for different logins to AWS, Personal and Work.


回答 5

对于那些使用CloudFormation模板的人:您可以使用UserData和AWS::Region来设置AWS_DEFAULT_REGION环境变量。例如,

MyInstance1:
    Type: AWS::EC2::Instance                
    Properties:                           
        ImageId: ami-04b9e92b5572fa0d1 #ubuntu
        InstanceType: t2.micro
        UserData: 
            Fn::Base64: !Sub |
                    #!/bin/bash -x

                    echo "export AWS_DEFAULT_REGION=${AWS::Region}" >> /etc/profile

For those using a CloudFormation template: you can set the AWS_DEFAULT_REGION environment variable using UserData and AWS::Region. For example,

MyInstance1:
    Type: AWS::EC2::Instance                
    Properties:                           
        ImageId: ami-04b9e92b5572fa0d1 #ubuntu
        InstanceType: t2.micro
        UserData: 
            Fn::Base64: !Sub |
                    #!/bin/bash -x

                    echo "export AWS_DEFAULT_REGION=${AWS::Region}" >> /etc/profile

AWS boto和boto3有什么区别[关闭]

问题:AWS boto和boto3有什么区别[关闭]

我是使用Python的AWS新手,正在尝试学习boto API,但我注意到Python有两个主要的版本/软件包:boto和boto3。

AWS boto库和boto3库之间有什么区别?

I’m new to AWS using Python and I’m trying to learn the boto API; however, I noticed that there are two major versions/packages for Python: boto and boto3.

What is the difference between the AWS boto and boto3 libraries?


回答 0

boto包是自2006年以来就存在的手工编写的Python库。它非常流行,并得到AWS的全面支持,但由于它是手工编写的,而且可用的服务如此之多(而且还在不断增加),因此很难维护。

因此,boto3是基于botocore的boto库的新版本。AWS的所有低级接口均由JSON服务描述驱动,而这些描述是根据服务的规范描述自动生成的。因此,这些接口始终正确且始终是最新的。在客户端层之上有一个资源层,它提供了更好、更具Python风格的接口。

AWS正在积极开发boto3库,如果要开始新的开发,我建议使用它。

The boto package is the hand-coded Python library that has been around since 2006. It is very popular and fully supported by AWS, but because it is hand-coded and there are so many services available (with more appearing all the time), it is difficult to maintain.

So, boto3 is a new version of the boto library based on botocore. All of the low-level interfaces to AWS are driven from JSON service descriptions that are generated automatically from the canonical descriptions of the services. So, the interfaces are always correct and always up to date. There is a resource layer on top of the client-layer that provides a nicer, more Pythonic interface.

The boto3 library is being actively developed by AWS and is the one I would recommend people use if they are starting new development.
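
To make the contrast concrete, a minimal side-by-side sketch (the bucket name is a hypothetical example):

import boto    # boto 2.x: hand-coded library
import boto3   # boto3: generated from JSON service descriptions

# boto 2.x style
old_bucket = boto.connect_s3().get_bucket('example-bucket')

# boto3 low-level client and higher-level resource layers
client = boto3.client('s3')
new_bucket = boto3.resource('s3').Bucket('example-bucket')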


资源,客户端和会话之间的boto3差异?

问题:资源,客户端和会话之间的boto3差异?

我在Ubuntu 16.04 LTS中使用Python 2.7.12,正在通过以下链接学习如何使用boto3:https://boto3.readthedocs.io/en/latest/guide/quickstart.html#using-boto-3。我的疑问是何时使用resource、client或session,以及它们各自的功能。

I am using Python 2.7.12 in Ubuntu 16.04 LTS. I’m learning how to use boto3 from the following link: https://boto3.readthedocs.io/en/latest/guide/quickstart.html#using-boto-3. My doubt is when to use resource, client, or session, and their respective functionality.


回答 0

这里是有关Client、Resource和Session的更详细的信息。

客户端:

  • 低级AWS服务访问
  • 从AWS 服务描述生成
  • 向开发人员展示botocore客户端
  • 通常与AWS服务API进行1:1映射
  • 客户端支持所有AWS服务操作
  • 蛇形方法名称(例如ListBuckets API => list_buckets方法)

以下是客户端级别访问S3存储桶的对象(最多1000 **)的示例:

import boto3

client = boto3.client('s3')
response = client.list_objects_v2(Bucket='mybucket')
for content in response['Contents']:
    obj_dict = client.get_object(Bucket='mybucket', Key=content['Key'])
    print(content['Key'], obj_dict['LastModified'])

**如果对象数量超过1000,您必须使用分页器,或实现自己的循环,使用连续标记重复调用list_objects_v2()。

资源:

  • 更高级别的,面向对象的API
  • 根据资源描述生成
  • 使用标识符和属性
  • 有行动(对资源的操作)
  • 公开子资源和AWS资源集合
  • 不提供AWS服务的100%API覆盖率

以下是使用资源级别访问S3存储桶的对象(全部)的等效示例:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
for obj in bucket.objects.all():
    print(obj.key, obj.last_modified)

请注意,在这种情况下,您无需进行第二次API调用即可获取对象。您可以将其作为存储桶中的集合使用。这些子资源集合是延迟加载的。

您可以看到Resource版本的代码更简单、更紧凑,功能也更多(它会为您处理分页)。如果要包括分页,Client版本的代码实际上会比上面显示的更复杂。

会话:

  • 存储配置信息(主要是凭据和所选区域)
  • 允许您创建服务客户端和资源
  • boto3在需要时为您创建一个默认会话

入门性的re:Invent视频是了解这些boto3概念的有用资源。

Here’s some more detailed information on what Client, Resource, and Session are all about.

Client:

  • low-level AWS service access
  • generated from AWS service description
  • exposes botocore client to the developer
  • typically maps 1:1 with the AWS service API
  • all AWS service operations are supported by clients
  • snake-cased method names (e.g. ListBuckets API => list_buckets method)

Here’s an example of client-level access to an S3 bucket’s objects (at most 1000**):

import boto3

client = boto3.client('s3')
response = client.list_objects_v2(Bucket='mybucket')
for content in response['Contents']:
    obj_dict = client.get_object(Bucket='mybucket', Key=content['Key'])
    print(content['Key'], obj_dict['LastModified'])

** you would have to use a paginator, or implement your own loop calling list_objects_v2() repeatedly with a continuation token, if there were more than 1000 objects.
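
A minimal sketch of the paginator variant, which hides that continuation-token loop (bucket name as above):

import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')
# each page holds up to 1000 objects; the paginator follows the
# continuation token for you
for page in paginator.paginate(Bucket='mybucket'):
    for content in page.get('Contents', []):
        print(content['Key'])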

Resource:

  • higher-level, object-oriented API
  • generated from resource description
  • uses identifiers and attributes
  • has actions (operations on resources)
  • exposes subresources and collections of AWS resources
  • does not provide 100% API coverage of AWS services

Here’s the equivalent example using resource-level access to an S3 bucket’s objects (all):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
for obj in bucket.objects.all():
    print(obj.key, obj.last_modified)

Note that in this case you do not have to make a second API call to get the objects; they’re available to you as a collection on the bucket. These collections of subresources are lazily-loaded.

You can see that the Resource version of the code is much simpler, more compact, and has more capability (it does pagination for you). The Client version of the code would actually be more complicated than shown above if you wanted to include pagination.

Session:

  • stores configuration information (primarily credentials and selected region)
  • allows you to create service clients and resources
  • boto3 creates a default session for you when needed

A useful resource to learn more about these boto3 concepts is the introductory re:Invent video.
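
A minimal sketch of working with an explicit session rather than the default one (the region is a hypothetical choice):

import boto3

session = boto3.session.Session(region_name='us-west-2')
# clients and resources created from the session inherit its configuration
s3_client = session.client('s3')
s3_resource = session.resource('s3')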


回答 1

我会尽量简单地解释,因此不保证实际术语的准确性。

会话(Session)是启动与AWS服务连接的地方。例如,以下是使用默认凭据配置文件(例如~/.aws/credentials,或假设您的EC2使用IAM实例配置文件)的默认会话

sqs = boto3.client('sqs')
s3 = boto3.resource('s3')

由于默认会话仅限于使用的配置文件或实例配置文件,因此有时您需要使用自定义会话来覆盖默认会话配置(例如region_name,endpoint_url等),例如

# custom resource session must use boto3.Session to do the override
my_west_session = boto3.Session(region_name = 'us-west-2')
my_east_session = boto3.Session(region_name = 'us-east-1')
backup_s3 = my_west_session.resource('s3')
video_s3 = my_east_session.resource('s3')

# you have two choices of create custom client session. 
backup_s3c = my_west_session.client('s3')
video_s3c = boto3.client("s3", region_name = 'us-east-1')

资源(Resource):这是建议使用的高级服务类。它使您可以绑定特定的AWS资源并将其传递下去,这样您只需使用这个抽象,而不必担心指向的是哪个目标服务。正如您从会话部分注意到的那样,如果您有自定义会话,则只需传递这个抽象对象,而不必担心传递所有自定义区域等。以下是一个较复杂的示例:

import boto3 
my_west_session = boto3.Session(region_name = 'us-west-2')
my_east_session = boto3.Session(region_name = 'us-east-1')
backup_s3 = my_west_session.resource("s3")
video_s3 = my_east_session.resource("s3")
backup_bucket = backup_s3.Bucket('backupbucket') 
video_bucket = video_s3.Bucket('videobucket')

# just pass the instantiated bucket object
def list_bucket_contents(bucket):
   for obj in bucket.objects.all():
      print(obj.key)

list_bucket_contents(backup_bucket)
list_bucket_contents(video_bucket)

客户端(Client)是低级别的类对象。对于每个客户端调用,您都需要显式指定目标资源,并一并传入指定的服务目标名称。您将失去抽象能力。

例如,如果仅处理默认会话,则该外观类似于boto3.resource。

import boto3
s3 = boto3.client('s3')

def list_bucket_contents(bucket_name):
   # the client returns a plain dict; object summaries live under 'Contents'
   for obj in s3.list_objects_v2(Bucket=bucket_name).get('Contents', []):
      print(obj['Key'])

list_bucket_contents('Mybucket')

但是,如果要列出不同区域中存储桶中的对象,则需要指定客户端所需的显式存储桶参数。

import boto3 
backup_s3 = my_west_session.client('s3',region_name = 'us-west-2')
video_s3 = my_east_session.client('s3',region_name = 'us-east-1')

# you must pass boto3.Session.client and the bucket name 
def list_bucket_contents(s3session, bucket_name):
   response = s3session.list_objects_v2(Bucket=bucket_name)
   if 'Contents' in response:
     for obj in response['Contents']:
        print(obj['Key'])

list_bucket_contents(backup_s3, 'backupbucket')
list_bucket_contents(video_s3 , 'videobucket') 

I’ll try to explain it as simply as possible, so there is no guarantee of the accuracy of the actual terms.

Session is where you initiate the connectivity to AWS services. E.g. the following is a default session that uses the default credential profile (e.g. ~/.aws/credentials, or, on EC2, an assumed IAM instance profile):

sqs = boto3.client('sqs')
s3 = boto3.resource('s3')

Because the default session is limited to the profile or instance profile used, sometimes you need a custom session to override the default session configuration (e.g. region_name, endpoint_url, etc.), e.g.

# custom resource session must use boto3.Session to do the override
my_west_session = boto3.Session(region_name = 'us-west-2')
my_east_session = boto3.Session(region_name = 'us-east-1')
backup_s3 = my_west_session.resource('s3')
video_s3 = my_east_session.resource('s3')

# you have two choices of create custom client session. 
backup_s3c = my_west_session.client('s3')
video_s3c = boto3.client("s3", region_name = 'us-east-1')

Resource: This is the high-level service class recommended for use. It allows you to tie in particular AWS resources and pass them along, so you just use this abstraction rather than worrying about which target services are pointed to. As you will notice from the session part, if you have a custom session, you just pass this abstract object rather than worrying about passing along all the custom regions, etc. Following is a more complicated example:

import boto3 
my_west_session = boto3.Session(region_name = 'us-west-2')
my_east_session = boto3.Session(region_name = 'us-east-1')
backup_s3 = my_west_session.resource("s3")
video_s3 = my_east_session.resource("s3")
backup_bucket = backup_s3.Bucket('backupbucket') 
video_bucket = video_s3.Bucket('videobucket')

# just pass the instantiated bucket object
def list_bucket_contents(bucket):
   for obj in bucket.objects.all():
      print(obj.key)

list_bucket_contents(backup_bucket)
list_bucket_contents(video_bucket)

Client is a low-level class object. For each client call, you need to explicitly specify the targeted resources, and the designated service target name must be passed along. You will lose the abstraction ability.

For example, if you only deal with the default session, this looks similar to boto3.resource.

import boto3
s3 = boto3.client('s3')

def list_bucket_contents(bucket_name):
   # the client returns a plain dict; object summaries live under 'Contents'
   for obj in s3.list_objects_v2(Bucket=bucket_name).get('Contents', []):
      print(obj['Key'])

list_bucket_contents('Mybucket')

However, if you want to list objects from a bucket in different regions, you need to specify the explicit bucket parameter required for the client.

import boto3 
backup_s3 = my_west_session.client('s3',region_name = 'us-west-2')
video_s3 = my_east_session.client('s3',region_name = 'us-east-1')

# you must pass boto3.Session.client and the bucket name 
def list_bucket_contents(s3session, bucket_name):
   response = s3session.list_objects_v2(Bucket=bucket_name)
   if 'Contents' in response:
     for obj in response['Contents']:
        print(obj['Key'])

list_bucket_contents(backup_s3, 'backupbucket')
list_bucket_contents(video_s3 , 'videobucket')