Why does csvwriter.writerow() put a comma after each character?

Question: Why does csvwriter.writerow() put a comma after each character?

This code appends each of the /names to the url, opens the resulting page, and writes the matched string to test1.csv:

import urllib2
import re
import csv

url = ("http://www.example.com")
bios = [u'/name1', u'/name2', u'/name3']
csvwriter = csv.writer(open("/test1.csv", "a"))

for l in bios:
    OpenThisLink = url + l
    response = urllib2.urlopen(OpenThisLink)
    html = response.read()
    item = re.search('(JD)(.*?)(\d+)', html)
    if item:
        JD = item.group()
        csvwriter.writerow(JD)
    else:
        NoJD = "NoJD"
        csvwriter.writerow(NoJD)

But I get this result:

J,D,",", ,C,o,l,u,m,b,i,a, ,L,a,w, ,S,c,h,o,o,l,....

If I change the string to ("JD", "Columbia Law School" ….), then I get:

JD, Columbia Law School...)

I couldn't find in the documentation how to specify the delimiter.

If I try to pass delimeter as a keyword argument, I get this error:

TypeError: 'delimeter' is an invalid keyword argument for this function

Thanks for the help.


Answer 0

writerow() expects a sequence of strings (e.g. a list or tuple). You're giving it a single string. A string is also a sequence of strings, but it's a sequence of one-character strings, which isn't what you want.

If you just want one string per row you could do something like this:

csvwriter.writerow([JD])

This wraps JD (a string) in a list, so the whole string becomes a single field.
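
To make the difference concrete, here is a minimal sketch that writes the same value both ways into an in-memory buffer. The sample value and the write_rows helper are hypothetical, and the sketch assumes Python 3 (io.StringIO) rather than the Python 2 code in the question:

```python
import csv
import io


def write_rows(rows):
    """Pass each item of rows to writerow() and return the resulting CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


jd = "JD, Columbia Law School"  # hypothetical sample value

# Bare string: writerow() iterates over it, so every character is a field.
as_string = write_rows([jd])

# String wrapped in a list: the whole string is one field.
as_list = write_rows([[jd]])

print(as_string)
print(as_list)
```

In the second case the field is quoted in the output because it contains the delimiter itself.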


Answer 1

The writer object returned by csv.writer takes an iterable as its argument to writerow. Since strings in Python are iterable character by character, they are an acceptable argument to writerow, but you get the output above.

To correct this, you could split the value on whitespace (I'm assuming that's what you want):

csvwriter.writerow(JD.split())
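
As a rough illustration of what split() produces here, the sketch below uses made-up sample values and Python 3's io.StringIO in place of a real file. It also shows splitting on commas instead, which is an assumption about what the fields might be, not something the question states:

```python
import csv
import io


def row_as_text(fields):
    """Write a single row to an in-memory buffer and return the CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()


# Hypothetical value, split on whitespace: every word becomes its own field.
words = "JD Columbia Law School 2005".split()
by_whitespace = row_as_text(words)

# Hypothetical alternative: split on commas and trim the surrounding spaces,
# so multi-word values stay together in one field.
parts = [p.strip() for p in "JD, Columbia Law School, 2005".split(",")]
by_comma = row_as_text(parts)

print(by_whitespace)
print(by_comma)
```

Whitespace splitting breaks "Columbia Law School" into three fields, so which separator fits depends on how the matched text is laid out.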

Answer 2

This happens because the group() method of a MatchObject instance returns a string when it yields only a single value, as it does when called with no arguments. When it yields multiple values (several group numbers are passed), they are returned as a tuple of strings.

When you write a row, csv.writer iterates over the object you pass to it. If you pass a single string (which is iterable), it iterates over its characters, producing the result you are observing. If you pass a tuple of strings, it gets a whole string, not a single character, on every iteration.
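
The sketch below shows the string-versus-tuple distinction with the regex from the question. The html value is a made-up stand-in for a fetched page, and the code assumes Python 3 with an in-memory buffer instead of test1.csv:

```python
import csv
import io
import re

html = "<p>JD, Columbia Law School 1999</p>"  # hypothetical page content

match = re.search(r"(JD)(.*?)(\d+)", html)

whole = match.group()       # no arguments: the entire match, as one string
parts = match.groups()      # all captured groups, as a tuple of strings
picked = match.group(1, 3)  # several group numbers: also a tuple

buf = io.StringIO()
csv.writer(buf).writerow(parts)  # tuple in, one field per captured group
row_text = buf.getvalue()

print(repr(whole))
print(parts)
print(row_text)
```

Passing parts (or picked) to writerow() gives one field per group, whereas passing whole reproduces the one-character-per-field output from the question.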