标签归档:file-upload

python请求文件上传

问题:python请求文件上传

我正在执行一个使用Python请求库上传文件的简单任务。我搜索了Stack Overflow,似乎没有人遇到相同的问题,即服务器未收到该文件:

import requests
url='http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi/post'
files={'files': open('file.txt','rb')}
values={'upload_file' : 'file.txt' , 'DB':'photcat' , 'OUT':'csv' , 'SHORT':'short'}
r=requests.post(url,files=files,data=values)

我用文件名填充了’upload_file’关键字的值,因为如果我将其保留为空白,则表示

Error - You must select a file to upload!

现在我明白了

File  file.txt  of size    bytes is  uploaded successfully!
Query service results:  There were 0 lines.

仅当文件为空时才会出现。因此,我对如何成功发送文件感到困惑。我知道该文件有效,因为如果我访问此网站并手动填写表格,它将返回一个很好的匹配对象列表,这就是我想要的。我非常感谢所有提示。

其他一些相关的线程(但不能回答我的问题):

I’m performing a simple task of uploading a file using Python requests library. I searched Stack Overflow and no one seemed to have the same problem, namely, that the file is not received by the server:

import requests
url='http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi/post'
files={'files': open('file.txt','rb')}
values={'upload_file' : 'file.txt' , 'DB':'photcat' , 'OUT':'csv' , 'SHORT':'short'}
r=requests.post(url,files=files,data=values)

I’m filling the value of ‘upload_file’ keyword with my filename, because if I leave it blank, it says

Error - You must select a file to upload!

And now I get

File  file.txt  of size    bytes is  uploaded successfully!
Query service results:  There were 0 lines.

Which comes up only if the file is empty. So I’m stuck as to how to send my file successfully. I know that the file works because if I go to this website and manually fill in the form it returns a nice list of matched objects, which is what I’m after. I’d really appreciate all hints.

Some other threads related (but not answering my problem):


回答 0

如果upload_file要作为文件,请使用:

files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}

r = requests.post(url, files=files, data=values)

并且requests将派遣一个多部分表单POST体与upload_file字段设置为内容file.txt的文件。

文件名将包含在特定字段的mime标头中:

>>> import requests
>>> open('file.txt', 'wb')  # create an empty demo file
<_io.BufferedWriter name='file.txt'>
>>> files = {'upload_file': open('file.txt', 'rb')}
>>> print(requests.Request('POST', 'http://example.com', files=files).prepare().body.decode('ascii'))
--c226ce13d09842658ffbd31e0563c6bd
Content-Disposition: form-data; name="upload_file"; filename="file.txt"


--c226ce13d09842658ffbd31e0563c6bd--

注意filename="file.txt"参数。

files如果需要更多控制,则可以使用元组作为映射值,其中包含2到4个元素。第一个元素是文件名,其后是内容,以及可选的content-type标头值和可选的附加标头映射:

files = {'upload_file': ('foobar.txt', open('file.txt','rb'), 'text/x-spam')}

这将设置备用文件名和内容类型,而忽略可选的标题。

如果您要从文件中提取整个POST正文(未指定其他字段),则不要使用files参数,只需将文件直接发布为即可data。然后,您可能还需要设置Content-Type标头,否则将不会设置任何标头。请参阅Python请求-文件中的POST数据

If upload_file is meant to be the file, use:

files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}

r = requests.post(url, files=files, data=values)

and requests will send a multi-part form POST body with the upload_file field set to the contents of the file.txt file.

The filename will be included in the mime header for the specific field:

>>> import requests
>>> open('file.txt', 'wb')  # create an empty demo file
<_io.BufferedWriter name='file.txt'>
>>> files = {'upload_file': open('file.txt', 'rb')}
>>> print(requests.Request('POST', 'http://example.com', files=files).prepare().body.decode('ascii'))
--c226ce13d09842658ffbd31e0563c6bd
Content-Disposition: form-data; name="upload_file"; filename="file.txt"


--c226ce13d09842658ffbd31e0563c6bd--

Note the filename="file.txt" parameter.

You can use a tuple for the files mapping value, with between 2 and 4 elements, if you need more control. The first element is the filename, followed by the contents, and an optional content-type header value and an optional mapping of additional headers:

files = {'upload_file': ('foobar.txt', open('file.txt','rb'), 'text/x-spam')}

This sets an alternative filename and content type, leaving out the optional headers.

If you are meaning the whole POST body to be taken from a file (with no other fields specified), then don’t use the files parameter, just post the file directly as data. You then may want to set a Content-Type header too, as none will be set otherwise. See Python requests – POST data from a file.


回答 1

(2018)新的python请求库简化了此过程,我们可以使用’files’变量表示我们要上传经过多部分编码的文件

url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}

r = requests.post(url, files=files)
r.text

(2018) the new python requests library has simplified this process, we can use the ‘files’ variable to signal that we want to upload a multipart-encoded file

url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}

r = requests.post(url, files=files)
r.text

回答 2

客户上传

如果要使用Python requests库上传单个文件,则请求lib 支持流上传,这使您无需读取内存即可发送大文件或流。

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

服务器端

然后将文件存储在server.py侧面,这样就可以将流保存到文件中而不加载到内存中。以下是使用Flask文件上传的示例。

@app.route("/upload", methods=['POST'])
def upload_file():
    from werkzeug.datastructures import FileStorage
    FileStorage(request.stream).save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
    return 'OK', 200

或使用修复程序中提到的werkzeug表单数据解析来解决“ 大文件上传占用内存 ”的问题,以避免在大文件上传时(约60秒内无效使用 st 22 GiB文件。) 13 MiB。)。

@app.route("/upload", methods=['POST'])
def upload_file():
    def custom_stream_factory(total_content_length, filename, content_type, content_length=None):
        import tempfile
        tmpfile = tempfile.NamedTemporaryFile('wb+', prefix='flaskapp', suffix='.nc')
        app.logger.info("start receiving file ... filename => " + str(tmpfile.name))
        return tmpfile

    import werkzeug, flask
    stream, form, files = werkzeug.formparser.parse_form_data(flask.request.environ, stream_factory=custom_stream_factory)
    for fil in files.values():
        app.logger.info(" ".join(["saved form name", fil.name, "submitted as", fil.filename, "to temporary file", fil.stream.name]))
        # Do whatever with stored file at `fil.stream.name`
    return 'OK', 200

Client Upload

If you want to upload a single file with Python requests library, then requests lib supports streaming uploads, which allow you to send large files or streams without reading into memory.

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

Server Side

Then store the file on the server.py side such that save the stream into file without loading into the memory. Following is an example with using Flask file uploads.

@app.route("/upload", methods=['POST'])
def upload_file():
    from werkzeug.datastructures import FileStorage
    FileStorage(request.stream).save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
    return 'OK', 200

Or use werkzeug Form Data Parsing as mentioned in a fix for the issue of “large file uploads eating up memory” in order to avoid using memory inefficiently on large files upload (s.t. 22 GiB file in ~60 seconds. Memory usage is constant at about 13 MiB.).

@app.route("/upload", methods=['POST'])
def upload_file():
    def custom_stream_factory(total_content_length, filename, content_type, content_length=None):
        import tempfile
        tmpfile = tempfile.NamedTemporaryFile('wb+', prefix='flaskapp', suffix='.nc')
        app.logger.info("start receiving file ... filename => " + str(tmpfile.name))
        return tmpfile

    import werkzeug, flask
    stream, form, files = werkzeug.formparser.parse_form_data(flask.request.environ, stream_factory=custom_stream_factory)
    for fil in files.values():
        app.logger.info(" ".join(["saved form name", fil.name, "submitted as", fil.filename, "to temporary file", fil.stream.name]))
        # Do whatever with stored file at `fil.stream.name`
    return 'OK', 200

回答 3

在Ubuntu中,您可以采用这种方式,

将文件保存在某个位置(临时),然后打开并发送给API

      path = default_storage.save('static/tmp/' + f1.name, ContentFile(f1.read()))
      path12 = os.path.join(os.getcwd(), "static/tmp/" + f1.name)
      data={} #can be anything u want to pass along with File
      file1 = open(path12, 'rb')
      header = {"Content-Disposition": "attachment; filename=" + f1.name, "Authorization": "JWT " + token}
       res= requests.post(url,data,header)

In Ubuntu you can apply this way,

to save file at some location (temporary) and then open and send it to API

      path = default_storage.save('static/tmp/' + f1.name, ContentFile(f1.read()))
      path12 = os.path.join(os.getcwd(), "static/tmp/" + f1.name)
      data={} #can be anything u want to pass along with File
      file1 = open(path12, 'rb')
      header = {"Content-Disposition": "attachment; filename=" + f1.name, "Authorization": "JWT " + token}
       res= requests.post(url,data,header)

使用POST从Python脚本发送文件

问题:使用POST从Python脚本发送文件

有没有一种方法可以使用Python脚本中的POST发送文件?

Is there a way to send a file using POST from a Python script?


回答 0

来自:https : //requests.readthedocs.io/en/latest/user/quickstart/#post-a-multipart-encoded-file

通过请求,上传多部分编码的文件变得非常简单:

with open('report.xls', 'rb') as f:
    r = requests.post('http://httpbin.org/post', files={'report.xls': f})

而已。我不是在开玩笑-这是一行代码。文件已发送。让我们检查:

>>> r.text
{
  "origin": "179.13.100.4",
  "files": {
    "report.xls": "<censored...binary...data>"
  },
  "form": {},
  "url": "http://httpbin.org/post",
  "args": {},
  "headers": {
    "Content-Length": "3196",
    "Accept-Encoding": "identity, deflate, compress, gzip",
    "Accept": "*/*",
    "User-Agent": "python-requests/0.8.0",
    "Host": "httpbin.org:80",
    "Content-Type": "multipart/form-data; boundary=127.0.0.1.502.21746.1321131593.786.1"
  },
  "data": ""
}

From: https://requests.readthedocs.io/en/latest/user/quickstart/#post-a-multipart-encoded-file

Requests makes it very simple to upload Multipart-encoded files:

with open('report.xls', 'rb') as f:
    r = requests.post('http://httpbin.org/post', files={'report.xls': f})

That’s it. I’m not joking – this is one line of code. The file was sent. Let’s check:

>>> r.text
{
  "origin": "179.13.100.4",
  "files": {
    "report.xls": "<censored...binary...data>"
  },
  "form": {},
  "url": "http://httpbin.org/post",
  "args": {},
  "headers": {
    "Content-Length": "3196",
    "Accept-Encoding": "identity, deflate, compress, gzip",
    "Accept": "*/*",
    "User-Agent": "python-requests/0.8.0",
    "Host": "httpbin.org:80",
    "Content-Type": "multipart/form-data; boundary=127.0.0.1.502.21746.1321131593.786.1"
  },
  "data": ""
}

回答 1

是。您将使用urllib2模块,并使用multipart/form-data内容类型进行编码。这是一些入门示例代码-不仅仅是文件上传,但您应该能够阅读它并了解其工作方式:

user_agent = "image uploader"
default_message = "Image $current of $total"

import logging
import os
from os.path import abspath, isabs, isdir, isfile, join
import random
import string
import sys
import mimetypes
import urllib2
import httplib
import time
import re

def random_string (length):
    return ''.join (random.choice (string.letters) for ii in range (length + 1))

def encode_multipart_data (data, files):
    boundary = random_string (30)

    def get_content_type (filename):
        return mimetypes.guess_type (filename)[0] or 'application/octet-stream'

    def encode_field (field_name):
        return ('--' + boundary,
                'Content-Disposition: form-data; name="%s"' % field_name,
                '', str (data [field_name]))

    def encode_file (field_name):
        filename = files [field_name]
        return ('--' + boundary,
                'Content-Disposition: form-data; name="%s"; filename="%s"' % (field_name, filename),
                'Content-Type: %s' % get_content_type(filename),
                '', open (filename, 'rb').read ())

    lines = []
    for name in data:
        lines.extend (encode_field (name))
    for name in files:
        lines.extend (encode_file (name))
    lines.extend (('--%s--' % boundary, ''))
    body = '\r\n'.join (lines)

    headers = {'content-type': 'multipart/form-data; boundary=' + boundary,
               'content-length': str (len (body))}

    return body, headers

def send_post (url, data, files):
    req = urllib2.Request (url)
    connection = httplib.HTTPConnection (req.get_host ())
    connection.request ('POST', req.get_selector (),
                        *encode_multipart_data (data, files))
    response = connection.getresponse ()
    logging.debug ('response = %s', response.read ())
    logging.debug ('Code: %s %s', response.status, response.reason)

def make_upload_file (server, thread, delay = 15, message = None,
                      username = None, email = None, password = None):

    delay = max (int (delay or '0'), 15)

    def upload_file (path, current, total):
        assert isabs (path)
        assert isfile (path)

        logging.debug ('Uploading %r to %r', path, server)
        message_template = string.Template (message or default_message)

        data = {'MAX_FILE_SIZE': '3145728',
                'sub': '',
                'mode': 'regist',
                'com': message_template.safe_substitute (current = current, total = total),
                'resto': thread,
                'name': username or '',
                'email': email or '',
                'pwd': password or random_string (20),}
        files = {'upfile': path}

        send_post (server, data, files)

        logging.info ('Uploaded %r', path)
        rand_delay = random.randint (delay, delay + 5)
        logging.debug ('Sleeping for %.2f seconds------------------------------\n\n', rand_delay)
        time.sleep (rand_delay)

    return upload_file

def upload_directory (path, upload_file):
    assert isabs (path)
    assert isdir (path)

    matching_filenames = []
    file_matcher = re.compile (r'\.(?:jpe?g|gif|png)$', re.IGNORECASE)

    for dirpath, dirnames, filenames in os.walk (path):
        for name in filenames:
            file_path = join (dirpath, name)
            logging.debug ('Testing file_path %r', file_path)
            if file_matcher.search (file_path):
                matching_filenames.append (file_path)
            else:
                logging.info ('Ignoring non-image file %r', path)

    total_count = len (matching_filenames)
    for index, file_path in enumerate (matching_filenames):
        upload_file (file_path, index + 1, total_count)

def run_upload (options, paths):
    upload_file = make_upload_file (**options)

    for arg in paths:
        path = abspath (arg)
        if isdir (path):
            upload_directory (path, upload_file)
        elif isfile (path):
            upload_file (path)
        else:
            logging.error ('No such path: %r' % path)

    logging.info ('Done!')

Yes. You’d use the urllib2 module, and encode using the multipart/form-data content type. Here is some sample code to get you started — it’s a bit more than just file uploading, but you should be able to read through it and see how it works:

user_agent = "image uploader"
default_message = "Image $current of $total"

import logging
import os
from os.path import abspath, isabs, isdir, isfile, join
import random
import string
import sys
import mimetypes
import urllib2
import httplib
import time
import re

def random_string (length):
    return ''.join (random.choice (string.letters) for ii in range (length + 1))

def encode_multipart_data (data, files):
    boundary = random_string (30)

    def get_content_type (filename):
        return mimetypes.guess_type (filename)[0] or 'application/octet-stream'

    def encode_field (field_name):
        return ('--' + boundary,
                'Content-Disposition: form-data; name="%s"' % field_name,
                '', str (data [field_name]))

    def encode_file (field_name):
        filename = files [field_name]
        return ('--' + boundary,
                'Content-Disposition: form-data; name="%s"; filename="%s"' % (field_name, filename),
                'Content-Type: %s' % get_content_type(filename),
                '', open (filename, 'rb').read ())

    lines = []
    for name in data:
        lines.extend (encode_field (name))
    for name in files:
        lines.extend (encode_file (name))
    lines.extend (('--%s--' % boundary, ''))
    body = '\r\n'.join (lines)

    headers = {'content-type': 'multipart/form-data; boundary=' + boundary,
               'content-length': str (len (body))}

    return body, headers

def send_post (url, data, files):
    req = urllib2.Request (url)
    connection = httplib.HTTPConnection (req.get_host ())
    connection.request ('POST', req.get_selector (),
                        *encode_multipart_data (data, files))
    response = connection.getresponse ()
    logging.debug ('response = %s', response.read ())
    logging.debug ('Code: %s %s', response.status, response.reason)

def make_upload_file (server, thread, delay = 15, message = None,
                      username = None, email = None, password = None):

    delay = max (int (delay or '0'), 15)

    def upload_file (path, current, total):
        assert isabs (path)
        assert isfile (path)

        logging.debug ('Uploading %r to %r', path, server)
        message_template = string.Template (message or default_message)

        data = {'MAX_FILE_SIZE': '3145728',
                'sub': '',
                'mode': 'regist',
                'com': message_template.safe_substitute (current = current, total = total),
                'resto': thread,
                'name': username or '',
                'email': email or '',
                'pwd': password or random_string (20),}
        files = {'upfile': path}

        send_post (server, data, files)

        logging.info ('Uploaded %r', path)
        rand_delay = random.randint (delay, delay + 5)
        logging.debug ('Sleeping for %.2f seconds------------------------------\n\n', rand_delay)
        time.sleep (rand_delay)

    return upload_file

def upload_directory (path, upload_file):
    assert isabs (path)
    assert isdir (path)

    matching_filenames = []
    file_matcher = re.compile (r'\.(?:jpe?g|gif|png)$', re.IGNORECASE)

    for dirpath, dirnames, filenames in os.walk (path):
        for name in filenames:
            file_path = join (dirpath, name)
            logging.debug ('Testing file_path %r', file_path)
            if file_matcher.search (file_path):
                matching_filenames.append (file_path)
            else:
                logging.info ('Ignoring non-image file %r', path)

    total_count = len (matching_filenames)
    for index, file_path in enumerate (matching_filenames):
        upload_file (file_path, index + 1, total_count)

def run_upload (options, paths):
    upload_file = make_upload_file (**options)

    for arg in paths:
        path = abspath (arg)
        if isdir (path):
            upload_directory (path, upload_file)
        elif isfile (path):
            upload_file (path)
        else:
            logging.error ('No such path: %r' % path)

    logging.info ('Done!')

回答 2

阻止您直接在文件对象上直接使用urlopen的唯一原因是内置文件对象缺少len定义。一种简单的方法是创建一个子类,该子类为urlopen提供正确的文件。我还修改了下面文件中的Content-Type标头。

import os
import urllib2
class EnhancedFile(file):
    def __init__(self, *args, **keyws):
        file.__init__(self, *args, **keyws)

    def __len__(self):
        return int(os.fstat(self.fileno())[6])

theFile = EnhancedFile('a.xml', 'r')
theUrl = "http://example.com/abcde"
theHeaders= {'Content-Type': 'text/xml'}

theRequest = urllib2.Request(theUrl, theFile, theHeaders)

response = urllib2.urlopen(theRequest)

theFile.close()


for line in response:
    print line

The only thing that stops you from using urlopen directly on a file object is the fact that the builtin file object lacks a len definition. A simple way is to create a subclass, which provides urlopen with the correct file. I have also modified the Content-Type header in the file below.

import os
import urllib2
class EnhancedFile(file):
    def __init__(self, *args, **keyws):
        file.__init__(self, *args, **keyws)

    def __len__(self):
        return int(os.fstat(self.fileno())[6])

theFile = EnhancedFile('a.xml', 'r')
theUrl = "http://example.com/abcde"
theHeaders= {'Content-Type': 'text/xml'}

theRequest = urllib2.Request(theUrl, theFile, theHeaders)

response = urllib2.urlopen(theRequest)

theFile.close()


for line in response:
    print line

回答 3

看起来python请求无法处理非常大的多部分文件。

该文档建议您仔细阅读requests-toolbelt

这是他们文档中的相关页面

Looks like python requests does not handle extremely large multi-part files.

The documentation recommends you look into requests-toolbelt.

Here’s the pertinent page from their documentation.


回答 4

克里斯·阿特利(Chris Atlee)的海报库在此方面非常有效(尤其是便捷功能poster.encode.multipart_encode())。另外,它支持流大文件而无需将整个文件加载到内存中。另请参阅Python问题3244

Chris Atlee’s poster library works really well for this (particularly the convenience function poster.encode.multipart_encode()). As a bonus, it supports streaming of large files without loading an entire file into memory. See also Python issue 3244.


回答 5

我正在尝试测试Django Rest API及其对我的工作:

def test_upload_file(self):
        filename = "/Users/Ranvijay/tests/test_price_matrix.csv"
        data = {'file': open(filename, 'rb')}
        client = APIClient()
        # client.credentials(HTTP_AUTHORIZATION='Token ' + token.key)
        response = client.post(reverse('price-matrix-csv'), data, format='multipart')

        print response
        self.assertEqual(response.status_code, status.HTTP_200_OK)

I am trying to test django rest api and its working for me:

def test_upload_file(self):
        filename = "/Users/Ranvijay/tests/test_price_matrix.csv"
        data = {'file': open(filename, 'rb')}
        client = APIClient()
        # client.credentials(HTTP_AUTHORIZATION='Token ' + token.key)
        response = client.post(reverse('price-matrix-csv'), data, format='multipart')

        print response
        self.assertEqual(response.status_code, status.HTTP_200_OK)

回答 6

您可能还想看看带有示例的httplib2。我发现使用httplib2比使用内置HTTP模块更为简洁。

You may also want to have a look at httplib2, with examples. I find using httplib2 is more concise than using the built-in HTTP modules.


回答 7

def visit_v2(device_code, camera_code):
    image1 = MultipartParam.from_file("files", "/home/yuzx/1.txt")
    image2 = MultipartParam.from_file("files", "/home/yuzx/2.txt")
    datagen, headers = multipart_encode([('device_code', device_code), ('position', 3), ('person_data', person_data), image1, image2])
    print "".join(datagen)
    if server_port == 80:
        port_str = ""
    else:
        port_str = ":%s" % (server_port,)
    url_str = "http://" + server_ip + port_str + "/adopen/device/visit_v2"
    headers['nothing'] = 'nothing'
    request = urllib2.Request(url_str, datagen, headers)
    try:
        response = urllib2.urlopen(request)
        resp = response.read()
        print "http_status =", response.code
        result = json.loads(resp)
        print resp
        return result
    except urllib2.HTTPError, e:
        print "http_status =", e.code
        print e.read()
def visit_v2(device_code, camera_code):
    image1 = MultipartParam.from_file("files", "/home/yuzx/1.txt")
    image2 = MultipartParam.from_file("files", "/home/yuzx/2.txt")
    datagen, headers = multipart_encode([('device_code', device_code), ('position', 3), ('person_data', person_data), image1, image2])
    print "".join(datagen)
    if server_port == 80:
        port_str = ""
    else:
        port_str = ":%s" % (server_port,)
    url_str = "http://" + server_ip + port_str + "/adopen/device/visit_v2"
    headers['nothing'] = 'nothing'
    request = urllib2.Request(url_str, datagen, headers)
    try:
        response = urllib2.urlopen(request)
        resp = response.read()
        print "http_status =", response.code
        result = json.loads(resp)
        print resp
        return result
    except urllib2.HTTPError, e:
        print "http_status =", e.code
        print e.read()