Question: Given the URL of a text file, what is the simplest way to read the file's contents?

In Python, when given the URL for a text file, what is the simplest way to access the contents of the text file and print them locally, line by line, without saving a local copy of the file?

TargetURL = "http://www.myhost.com/SomeFile.txt"
#read the file
#print first line
#print second line
#etc

Answer 0

Edit 09/2016: In Python 3 and up, use urllib.request instead of urllib2.

Actually the simplest way is:

import urllib2  # the lib that handles the url stuff

data = urllib2.urlopen(target_url) # it's a file like object and works just like a file
for line in data: # files are iterable
    print line

You don’t even need “readlines”, as Will suggested. You could even shorten it to: *

import urllib2

for line in urllib2.urlopen(target_url):
    print line

But remember that in Python, readability matters.

However, while this is the simplest way, it is not the safest: in network programming you usually cannot be sure the server will send only the amount of data you expect. It is generally better to read a fixed, reasonable amount of data, enough for what you expect but capped so that your script cannot be flooded:

import urllib2

data = urllib2.urlopen("http://www.google.com").read(20000) # read at most 20,000 bytes
data = data.split("\n") # then split it into lines

for line in data:
    print line

* Second example in Python 3:

import urllib.request  # the lib that handles the url stuff

for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8')) #utf-8 or iso8859-1 or whatever the page encoding scheme is
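
The size-capped read described above can also be sketched for Python 3. This is only an illustration, not part of the original answer: the function name read_lines_capped, the MAX_BYTES constant, the 10-second timeout, and the choice to replace undecodable bytes are all assumptions made for the example.

```python
from urllib.request import urlopen

MAX_BYTES = 20000  # illustrative cap, matching the 20,000 used above


def read_lines_capped(url, max_bytes=MAX_BYTES, encoding="utf-8", timeout=10):
    """Fetch at most max_bytes bytes from url and return them as decoded lines.

    Hypothetical helper: the name, defaults, and timeout are assumptions.
    """
    # The with-block closes the connection even if reading fails.
    with urlopen(url, timeout=timeout) as response:
        raw = response.read(max_bytes)  # bytes in Python 3
    # errors="replace" avoids an exception if the cap cuts a multi-byte
    # character in half at the end of the data.
    return raw.decode(encoding, errors="replace").splitlines()


# Example call (performs a network request, so it is left commented out):
# for line in read_lines_capped("http://www.myhost.com/SomeFile.txt"):
#     print(line)
```

Note that the last line returned may be incomplete if the file is larger than the cap.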

Answer 1

I’m new to Python, and the offhand comment about Python 3 in the accepted answer was confusing. For posterity, the code to do this in Python 3 is:

import urllib.request
data = urllib.request.urlopen(target_url)

for line in data:
    ...

or alternatively

from urllib.request import urlopen
data = urlopen(target_url)

Note that import urllib on its own does not work, because it does not load the urllib.request submodule.
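
One detail worth knowing in Python 3: iterating over the response yields bytes objects, not str, so each line usually needs decoding before printing. A minimal sketch, assuming the server either declares a charset in its Content-Type header or the text is UTF-8; the helper name iter_text_lines and the default_encoding parameter are made up for this example.

```python
from urllib.request import urlopen


def iter_text_lines(url, default_encoding="utf-8"):
    """Yield the lines of the resource at url as str, without line endings.

    Hypothetical helper: the name and the UTF-8 fallback are assumptions.
    """
    with urlopen(url) as response:
        # Use the charset from the Content-Type header when one is declared.
        encoding = response.headers.get_content_charset() or default_encoding
        for raw_line in response:  # each raw_line is a bytes object
            yield raw_line.decode(encoding).rstrip("\r\n")


# Example call (performs a network request, so it is left commented out):
# for line in iter_text_lines("http://www.myhost.com/SomeFile.txt"):
#     print(line)
```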


Answer 2

There’s really no need to read line by line. In Python 2, you can get the whole thing like this:

import urllib
txt = urllib.urlopen(target_url).read()

Answer 3

The requests library has a simpler interface and works with both Python 2 and 3.

import requests

response = requests.get(target_url)
data = response.text

Answer 4

import urllib2
for line in urllib2.urlopen("http://www.myhost.com/SomeFile.txt"):
    print line

Answer 5

import urllib2

f = urllib2.urlopen(target_url)
for l in f.readlines():
    print l

Answer 6

Another way in Python 3 is to use the urllib3 package.

import urllib3

http = urllib3.PoolManager()
response = http.request('GET', target_url)
data = response.data.decode('utf-8')

This can be a better option than urllib, since urllib3 offers:

  • Thread safety.
  • Connection pooling.
  • Client-side SSL/TLS verification.
  • File uploads with multipart encoding.
  • Helpers for retrying requests and dealing with HTTP redirects.
  • Support for gzip and deflate encoding.
  • Proxy support for HTTP and SOCKS.
  • 100% test coverage.

Answer 7

For me, none of the above answers worked out of the box. Instead, I had to do the following (Python 3):

from urllib.request import urlopen

data = urlopen("[your url goes here]").read().decode('utf-8')

# Do what you need to do with the data.

Answer 8

Just updating the Python 2 solution suggested by @ken-kinder so that it works in Python 3:

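import urllib.request  # a bare "import urllib" does not load urllib.request
urllib.request.urlopen(target_url).read()  # returns bytes; call .decode() to get text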
