标签归档:concatenation

在python中连接字符串和整数

问题:在python中连接字符串和整数

用python说你有

s = "string"
i = 0
print s+i

会给你错误,所以你写

print s+str(i) 

不会出错。

我认为这是处理int和字符串连接的笨拙方式。甚至Java也不需要显式转换为String来进行这种连接。有没有更好的方法来进行这种串联,即在Python中无需显式转换?

In python say you have

s = "string"
i = 0
print s+i

will give you error so you write

print s+str(i) 

to not get error.

I think this is quite a clumsy way to handle int and string concatenation. Even Java does not need explicit casting to String to do this sort of concatenation. Is there a better way to do this sort of concatenation i.e without explicit casting in Python?


回答 0

现代字符串格式:

"{} and {}".format("string", 1)

Modern string formatting:

"{} and {}".format("string", 1)

回答 1

没有字符串格式:

>> print 'Foo',0
Foo 0

No string formatting:

>> print 'Foo',0
Foo 0

回答 2

使用新样式.format()方法(默认使用.format()提供)进行字符串格式化:

 '{}{}'.format(s, i)

或更旧的,但“仍然存在”,- %格式化:

 '%s%d' %(s, i)

在以上两个示例中,串联的两个项目之间没有空格。如果需要空间,可以简单地将其添加到格式字符串中。

这些提供了 如何串联项目,它们之间的空间等很多控制和灵活性。有关的详细信息格式规范的请参见此

String formatting, using the new-style .format() method (with the defaults .format() provides):

 '{}{}'.format(s, i)

Or the older, but “still sticking around”, %-formatting:

 '%s%d' %(s, i)

In both examples above there’s no space between the two items concatenated. If space is needed, it can simply be added in the format strings.

These provide a lot of control and flexibility about how to concatenate items, the space between them etc. For details about format specifications see this.


回答 3

Python是一种有趣的语言,尽管通常有一种(或两种)“显而易见”的方式来完成任何给定的任务,但灵活性仍然存在。

s = "string"
i = 0

print (s + repr(i))

上面的代码段使用Python3语法编写,但始终允许打印后的括号(可选),直到版本3强制使用为止。

希望这可以帮助。

凯特琳

Python is an interesting language in that while there is usually one (or two) “obvious” ways to accomplish any given task, flexibility still exists.

s = "string"
i = 0

print (s + repr(i))

The above code snippet is written in Python3 syntax but the parentheses after print were always allowed (optional) until version 3 made them mandatory.

Hope this helps.

Caitlin


回答 4

format()方法可用于连接字符串和整数

print(s+"{}".format(i))

format() method can be used to concatenate string and integer

print(s+"{}".format(i))

回答 5

如果您只想打印哟,可以这样做:

print(s , i)

if you only want to print yo can do this:

print(s , i)

回答 6

在python 3.6及更高版本中,您可以像这样格式化它:

new_string = f'{s} {i}'
print(new_string)

要不就:

print(f'{s} {i}')

in python 3.6 and newer, you can format it just like this:

new_string = f'{s} {i}'
print(new_string)

or just:

print(f'{s} {i}')

在Python中,“。append()”和“ + = []”之间有什么区别?

问题:在Python中,“。append()”和“ + = []”之间有什么区别?

之间有什么区别?

some_list1 = []
some_list1.append("something")

some_list2 = []
some_list2 += ["something"]

What is the difference between:

some_list1 = []
some_list1.append("something")

and

some_list2 = []
some_list2 += ["something"]

回答 0

对于您而言,唯一的区别是性能:append是两倍的速度。

Python 3.0 (r30:67507, Dec  3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.20177424499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.41192320500000079

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.23079359499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.44208112500000141

通常情况下,append会将一个项目添加到列表中,而+=将右侧列表的所有元素复制到左侧列表中。

更新:性能分析

比较字节码,我们可以假设appendversion在LOAD_ATTR+ CALL_FUNCTION和+ = version-中浪费了周期BUILD_LIST。显然BUILD_LIST大于LOAD_ATTR+ CALL_FUNCTION

>>> import dis
>>> dis.dis(compile("s = []; s.append('spam')", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_ATTR                1 (append)
             12 LOAD_CONST               0 ('spam')
             15 CALL_FUNCTION            1
             18 POP_TOP
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE
>>> dis.dis(compile("s = []; s += ['spam']", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_CONST               0 ('spam')
             12 BUILD_LIST               1
             15 INPLACE_ADD
             16 STORE_NAME               0 (s)
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE

我们可以通过减少LOAD_ATTR开销来进一步提高性能:

>>> timeit.Timer('a("something")', 's = []; a = s.append').timeit()
0.15924410999923566

For your case the only difference is performance: append is twice as fast.

Python 3.0 (r30:67507, Dec  3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.20177424499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.41192320500000079

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.23079359499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.44208112500000141

In general case append will add one item to the list, while += will copy all elements of right-hand-side list into the left-hand-side list.

Update: perf analysis

Comparing bytecodes we can assume that append version wastes cycles in LOAD_ATTR + CALL_FUNCTION, and += version — in BUILD_LIST. Apparently BUILD_LIST outweighs LOAD_ATTR + CALL_FUNCTION.

>>> import dis
>>> dis.dis(compile("s = []; s.append('spam')", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_ATTR                1 (append)
             12 LOAD_CONST               0 ('spam')
             15 CALL_FUNCTION            1
             18 POP_TOP
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE
>>> dis.dis(compile("s = []; s += ['spam']", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_CONST               0 ('spam')
             12 BUILD_LIST               1
             15 INPLACE_ADD
             16 STORE_NAME               0 (s)
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE

We can improve performance even more by removing LOAD_ATTR overhead:

>>> timeit.Timer('a("something")', 's = []; a = s.append').timeit()
0.15924410999923566

回答 1

在您给出的示例中,append和之间在输出方面没有区别+=。但是append和之间+(这是最初询问的问题)之间存在区别。

>>> a = []
>>> id(a)
11814312
>>> a.append("hello")
>>> id(a)
11814312

>>> b = []
>>> id(b)
11828720
>>> c = b + ["hello"]
>>> id(c)
11833752
>>> b += ["hello"]
>>> id(b)
11828720

如您所见,append+=具有相同的结果;他们将项目添加到列表中,而不生成新列表。使用+添加两个列表并生成一个新列表。

In the example you gave, there is no difference, in terms of output, between append and +=. But there is a difference between append and + (which the question originally asked about).

>>> a = []
>>> id(a)
11814312
>>> a.append("hello")
>>> id(a)
11814312

>>> b = []
>>> id(b)
11828720
>>> c = b + ["hello"]
>>> id(c)
11833752
>>> b += ["hello"]
>>> id(b)
11828720

As you can see, append and += have the same result; they add the item to the list, without producing a new list. Using + adds the two lists and produces a new list.


回答 2

>>> a=[]
>>> a.append([1,2])
>>> a
[[1, 2]]
>>> a=[]
>>> a+=[1,2]
>>> a
[1, 2]

看到append将单个元素添加到列表,可以是任何元素。+=[]加入列表。

>>> a=[]
>>> a.append([1,2])
>>> a
[[1, 2]]
>>> a=[]
>>> a+=[1,2]
>>> a
[1, 2]

See that append adds a single element to the list, which may be anything. +=[] joins the lists.


回答 3

+ =是一个分配。使用它时,您实际上是在说“ some_list2 = some_list2 + [‘something’]”。分配涉及重新绑定,因此:

l= []

def a1(x):
    l.append(x) # works

def a2(x):
    l= l+[x] # assign to l, makes l local
             # so attempt to read l for addition gives UnboundLocalError

def a3(x):
    l+= [x]  # fails for the same reason

+ =运算符通常还应该像list + list通常那样创建一个新的列表对象:

>>> l1= []
>>> l2= l1

>>> l1.append('x')
>>> l1 is l2
True

>>> l1= l1+['x']
>>> l1 is l2
False

但是实际上:

>>> l2= l1
>>> l1+= ['x']
>>> l1 is l2
True

这是因为Python列表实现了__iadd __()来使+ =扩展分配短路,而调用list.extend()。(这有点奇怪:它通常按照您的意思进行,但出于令人困惑的原因。)

通常,如果要添加/扩展现有列表,并且希望保留对同一列表的引用(而不是创建新列表),则最好明确并坚持使用append()/ extend()方法。

+= is an assignment. When you use it you’re really saying ‘some_list2= some_list2+[‘something’]’. Assignments involve rebinding, so:

l= []

def a1(x):
    l.append(x) # works

def a2(x):
    l= l+[x] # assign to l, makes l local
             # so attempt to read l for addition gives UnboundLocalError

def a3(x):
    l+= [x]  # fails for the same reason

The += operator should also normally create a new list object like list+list normally does:

>>> l1= []
>>> l2= l1

>>> l1.append('x')
>>> l1 is l2
True

>>> l1= l1+['x']
>>> l1 is l2
False

However in reality:

>>> l2= l1
>>> l1+= ['x']
>>> l1 is l2
True

This is because Python lists implement __iadd__() to make a += augmented assignment short-circuit and call list.extend() instead. (It’s a bit of a strange wart this: it usually does what you meant, but for confusing reasons.)

In general, if you’re appending/extended an existing list, and you want to keep the reference to the same list (instead of making a new one), it’s best to be explicit and stick with the append()/extend() methods.


回答 4

 some_list2 += ["something"]

实际上是

 some_list2.extend(["something"])

对于一个值,没有区别。文档指出:

s.append(x) 与… s[len(s):len(s)] = [x]
s.extend(x) 相同s[len(s):len(s)] = x

因此显然s.append(x)s.extend([x])

 some_list2 += ["something"]

is actually

 some_list2.extend(["something"])

for one value, there is no difference. Documentation states, that:

s.append(x) same as s[len(s):len(s)] = [x]
s.extend(x) same as s[len(s):len(s)] = x

Thus obviously s.append(x) is same as s.extend([x])


回答 5

区别在于,连接将使结果列表变平,而附加将使级别保持完整:

因此,例如:

myList = [ ]
listA = [1,2,3]
listB = ["a","b","c"]

使用append,最终得到一个列表列表:

>> myList.append(listA)
>> myList.append(listB)
>> myList
[[1,2,3],['a',b','c']]

而是使用串联,最终得到一个平面列表:

>> myList += listA + listB
>> myList
[1,2,3,"a","b","c"]

The difference is that concatenate will flatten the resulting list, whereas append will keep the levels intact:

So for example with:

myList = [ ]
listA = [1,2,3]
listB = ["a","b","c"]

Using append, you end up with a list of lists:

>> myList.append(listA)
>> myList.append(listB)
>> myList
[[1,2,3],['a',b','c']]

Using concatenate instead, you end up with a flat list:

>> myList += listA + listB
>> myList
[1,2,3,"a","b","c"]

回答 6

这里的性能测试不正确:

  1. 您不应只运行一次配置文件。
  2. 如果比较append与+ = []的次数,则应将append声明为局部函数。
  3. 时间结果在不同的python版本上是不同的:64位和32位

例如

timeit.Timer(’for xrange(100)中的i:app(i)’,’s = []; app = s.append’)。timeit()

好的测试可以在这里找到:http : //markandclick.com/1/post/2012/01/python-list-append-vs.html

The performance tests here are not correct:

  1. You shouldn’t run the profile only once.
  2. If comparing append vs. += [] number of times you should declare append as a local function.
  3. time results are different on different python versions: 64 and 32 bit

e.g.

timeit.Timer(‘for i in xrange(100): app(i)’, ‘s = [] ; app = s.append’).timeit()

good tests can be found here: http://markandclick.com/1/post/2012/01/python-list-append-vs.html


回答 7

除了其他答案中描述的方面之外,当您尝试构建列表列表时,append和+ []的行为也非常不同。

>>> list1=[[1,2],[3,4]]
>>> list2=[5,6]
>>> list3=list1+list2
>>> list3
[[1, 2], [3, 4], 5, 6]
>>> list1.append(list2)
>>> list1
[[1, 2], [3, 4], [5, 6]]

list1 + [‘5’,’6’]将“ 5”和“ 6”添加到list1作为单独的元素。list1.append([‘5’,’6’])将列表[‘5’,’6’]作为单个元素添加到list1。

In addition to the aspects described in the other answers, append and +[] have very different behaviors when you’re trying to build a list of lists.

>>> list1=[[1,2],[3,4]]
>>> list2=[5,6]
>>> list3=list1+list2
>>> list3
[[1, 2], [3, 4], 5, 6]
>>> list1.append(list2)
>>> list1
[[1, 2], [3, 4], [5, 6]]

list1+[‘5′,’6’] adds ‘5’ and ‘6’ to the list1 as individual elements. list1.append([‘5′,’6’]) adds the list [‘5′,’6’] to the list1 as a single element.


回答 8

其他答案中提到的重新绑定行为在某些情况下确实很重要:

>>> a = ([],[])
>>> a[0].append(1)
>>> a
([1], [])
>>> a[1] += [1]
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

这是因为即使对象已就地突变,增强分配也会始终重新绑定。这里的绑定恰好是a[1] = *mutated list*,不适用于元组。

The rebinding behaviour mentioned in other answers does matter in certain circumstances:

>>> a = ([],[])
>>> a[0].append(1)
>>> a
([1], [])
>>> a[1] += [1]
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

That’s because augmented assignment always rebinds, even if the object was mutated in-place. The rebinding here happens to be a[1] = *mutated list*, which doesn’t work for tuples.


回答 9

让我们先举一个例子

list1=[1,2,3,4]
list2=list1     (that means they points to same object)

if we do 
list1=list1+[5]    it will create a new object of list
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4]

but if we append  then 
list1.append(5)     no new object of list created
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4,5]

extend(list) also do the same work as append it just append a list instead of a 
single variable 

let’s take an example first

list1=[1,2,3,4]
list2=list1     (that means they points to same object)

if we do 
list1=list1+[5]    it will create a new object of list
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4]

but if we append  then 
list1.append(5)     no new object of list created
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4,5]

extend(list) also do the same work as append it just append a list instead of a 
single variable 

回答 10

append()方法将单个项目添加到现有列表中

some_list1 = []
some_list1.append("something")

因此,这里some_list1将被修改。

更新:

而使用+组合现有列表中列表的元素(多个元素),类似于扩展(由Flux纠正)。

some_list2 = []
some_list2 += ["something"]

因此,这里some_list2和[“ something”]是结合在一起的两个列表。

The append() method adds a single item to the existing list

some_list1 = []
some_list1.append("something")

So here the some_list1 will get modified.

Updated:

Whereas using + to combine the elements of lists (more than one element) in the existing list similar to the extend (as corrected by Flux).

some_list2 = []
some_list2 += ["something"]

So here the some_list2 and [“something”] are the two lists that are combined.


回答 11

“ +” 不会改变列表

.append()改变旧列表

“+” does not mutate the list

.append() mutates the old list


字符串如何串联?

问题:字符串如何串联?

如何在python中连接字符串?

例如:

Section = 'C_type'

将其与Sec_形成字符串:

Sec_C_type

How to concatenate strings in python?

For example:

Section = 'C_type'

Concatenate it with Sec_ to form the string:

Sec_C_type

回答 0

最简单的方法是

Section = 'Sec_' + Section

但为了提高效率,请参阅:https : //waymoot.org/home/python_string/

The easiest way would be

Section = 'Sec_' + Section

But for efficiency, see: https://waymoot.org/home/python_string/


回答 1

您也可以这样做:

section = "C_type"
new_section = "Sec_%s" % section

这样,您不仅可以追加,还可以在字符串中的任意位置插入:

section = "C_type"
new_section = "Sec_%s_blah" % section

you can also do this:

section = "C_type"
new_section = "Sec_%s" % section

This allows you not only append, but also insert wherever in the string:

section = "C_type"
new_section = "Sec_%s_blah" % section

回答 2

只是一条评论,就像有人可能会发现它很有用-您可以一次连接多个字符串:

>>> a='rabbit'
>>> b='fox'
>>> print '%s and %s' %(a,b)
rabbit and fox

Just a comment, as someone may find it useful – you can concatenate more than one string in one go:

>>> a='rabbit'
>>> b='fox'
>>> print '%s and %s' %(a,b)
rabbit and fox

回答 3

连接字符串的更有效方法是:

加入():

效率很高,但有点难读。

>>> Section = 'C_type'  
>>> new_str = ''.join(['Sec_', Section]) # inserting a list of strings 
>>> print new_str 
>>> 'Sec_C_type'

字符串格式:

易于阅读,在大多数情况下比“ +”级联更快

>>> Section = 'C_type'
>>> print 'Sec_%s' % Section
>>> 'Sec_C_type'

More efficient ways of concatenating strings are:

join():

Very efficent, but a bit hard to read.

>>> Section = 'C_type'  
>>> new_str = ''.join(['Sec_', Section]) # inserting a list of strings 
>>> print new_str 
>>> 'Sec_C_type'

String formatting:

Easy to read and in most cases faster than ‘+’ concatenating

>>> Section = 'C_type'
>>> print 'Sec_%s' % Section
>>> 'Sec_C_type'

回答 4

使用+字符串连接为:

section = 'C_type'
new_section = 'Sec_' + section

Use + for string concatenation as:

section = 'C_type'
new_section = 'Sec_' + section

回答 5

要在python中连接字符串,请使用“ +”号

参考:http : //www.gidnetwork.com/b-40.html

To concatenate strings in python you use the “+” sign

ref: http://www.gidnetwork.com/b-40.html


回答 6

对于附加到现有字符串末尾的情况:

string = "Sec_"
string += "C_type"
print(string)

结果是

Sec_C_type

For cases of appending to end of existing string:

string = "Sec_"
string += "C_type"
print(string)

results in

Sec_C_type

Python连接文本文件

问题:Python连接文本文件

我列出了20个文件名,例如['file1.txt', 'file2.txt', ...]。我想编写一个Python脚本将这些文件连接成一个新文件。我可以通过打开每个文件f = open(...),通过调用逐行读取f.readline(),然后将每一行写入该新文件。在我看来,这并不是很“优雅”,尤其是我必须逐行读取/写入的部分。

在Python中是否有更“优雅”的方式来做到这一点?

I have a list of 20 file names, like ['file1.txt', 'file2.txt', ...]. I want to write a Python script to concatenate these files into a new file. I could open each file by f = open(...), read line by line by calling f.readline(), and write each line into that new file. It doesn’t seem very “elegant” to me, especially the part where I have to read//write line by line.

Is there a more “elegant” way to do this in Python?


回答 0

这应该做

对于大文件:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

对于小文件:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

……还有我想到的另一个有趣的东西

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for line in itertools.chain.from_iterable(itertools.imap(open, filnames)):
        outfile.write(line)

遗憾的是,这最后一个方法留下了一些打开的文件描述符,GC还是应该照顾这些文件描述符。我只是觉得很有趣

This should do it

For large files:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

For small files:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

… and another interesting one that I thought of:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for line in itertools.chain.from_iterable(itertools.imap(open, filnames)):
        outfile.write(line)

Sadly, this last method leaves a few open file descriptors, which the GC should take care of anyway. I just thought it was interesting


回答 1

使用shutil.copyfileobj

它会自动为您逐块读取输入文件,这样效率更高,并且可以读取输入文件,即使某些输入文件太大而无法装入内存也可以正常工作:

import shutil

with open('output_file.txt','wb') as wfd:
    for f in ['seg1.txt','seg2.txt','seg3.txt']:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)

Use shutil.copyfileobj.

It automatically reads the input files chunk by chunk for you, which is more more efficient and reading the input files in and will work even if some of the input files are too large to fit into memory:

import shutil

with open('output_file.txt','wb') as wfd:
    for f in ['seg1.txt','seg2.txt','seg3.txt']:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)

回答 2

这正是fileinput的目的:

import fileinput
with open(outfilename, 'w') as fout, fileinput.input(filenames) as fin:
    for line in fin:
        fout.write(line)

对于这种用例,它不仅比手动遍历文件简单得多,而且在其他情况下,拥有一个遍历所有文件就好像它们是单个文件一样的单个迭代器非常方便。(此外,事实是fileinput,一旦完成就关闭每个文件,这意味着不需要withclose每个文件,但这只是节省一行,而不是什么大不了的。)

中还有其他一些漂亮的功能fileinput,例如仅通过过滤每一行就可以对文件进行就地修改的功能。


正如评论中所述,并在另一篇文章中讨论的那样,fileinputPython 2.7将无法按指示工作。在这里稍作修改以使代码与Python 2.7兼容

with open('outfilename', 'w') as fout:
    fin = fileinput.input(filenames)
    for line in fin:
        fout.write(line)
    fin.close()

That’s exactly what fileinput is for:

import fileinput
with open(outfilename, 'w') as fout, fileinput.input(filenames) as fin:
    for line in fin:
        fout.write(line)

For this use case, it’s really not much simpler than just iterating over the files manually, but in other cases, having a single iterator that iterates over all of the files as if they were a single file is very handy. (Also, the fact that fileinput closes each file as soon as it’s done means there’s no need to with or close each one, but that’s just a one-line savings, not that big of a deal.)

There are some other nifty features in fileinput, like the ability to do in-place modifications of files just by filtering each line.


As noted in the comments, and discussed in another post, fileinput for Python 2.7 will not work as indicated. Here slight modification to make the code Python 2.7 compliant

with open('outfilename', 'w') as fout:
    fin = fileinput.input(filenames)
    for line in fin:
        fout.write(line)
    fin.close()

回答 3

我对优雅并不了解,但这可行:

    import glob
    import os
    for f in glob.glob("file*.txt"):
         os.system("cat "+f+" >> OutFile.txt")

I don’t know about elegance, but this works:

    import glob
    import os
    for f in glob.glob("file*.txt"):
         os.system("cat "+f+" >> OutFile.txt")

回答 4

UNIX命令怎么了?(假设您不在Windows上工作):

ls | xargs cat | tee output.txt 做这项工作(如果需要,您可以使用子进程从python调用它)

What’s wrong with UNIX commands ? (given you’re not working on Windows) :

ls | xargs cat | tee output.txt does the job ( you can call it from python with subprocess if you want)


回答 5

outfile.write(infile.read()) # time: 2.1085190773010254s
shutil.copyfileobj(fd, wfd, 1024*1024*10) # time: 0.60599684715271s

一个简单的基准表明,shutil性能更好。

outfile.write(infile.read()) # time: 2.1085190773010254s
shutil.copyfileobj(fd, wfd, 1024*1024*10) # time: 0.60599684715271s

A simple benchmark shows that the shutil performs better.


回答 6

@ inspectorG4dget答案的替代方法(最佳答案日期29-03-2016)。我测试了436MB的3个文件。

@ inspectorG4dget解决方案:162秒

解决方法:125秒

from subprocess import Popen
filenames = ['file1.txt', 'file2.txt', 'file3.txt']
fbatch = open('batch.bat','w')
str ="type "
for f in filenames:
    str+= f + " "
fbatch.write(str + " > file4results.txt")
fbatch.close()
p = Popen("batch.bat", cwd=r"Drive:\Path\to\folder")
stdout, stderr = p.communicate()

这个想法是利用“旧的好技术”来创建并执行一个批处理文件。它是半Python,但运行速度更快。适用于Windows。

An alternative to @inspectorG4dget answer (best answer to date 29-03-2016). I tested with 3 files of 436MB.

@inspectorG4dget solution: 162 seconds

The following solution : 125 seconds

from subprocess import Popen
filenames = ['file1.txt', 'file2.txt', 'file3.txt']
fbatch = open('batch.bat','w')
str ="type "
for f in filenames:
    str+= f + " "
fbatch.write(str + " > file4results.txt")
fbatch.close()
p = Popen("batch.bat", cwd=r"Drive:\Path\to\folder")
stdout, stderr = p.communicate()

The idea is to create a batch file and execute it, taking advantage of “old good technology”. Its semi-python but works faster. Works for windows.


回答 7

如果目录中有很多文件,则glob2最好是生成文件名列表,而不是手工编写文件名。

import glob2

filenames = glob2.glob('*.txt')  # list of all .txt files in the directory

with open('outfile.txt', 'w') as f:
    for file in filenames:
        with open(file) as infile:
            f.write(infile.read()+'\n')

If you have a lot of files in the directory then glob2 might be a better option to generate a list of filenames rather than writing them by hand.

import glob2

filenames = glob2.glob('*.txt')  # list of all .txt files in the directory

with open('outfile.txt', 'w') as f:
    for file in filenames:
        with open(file) as infile:
            f.write(infile.read()+'\n')

回答 8

检出File对象的.read()方法:

http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects

您可以执行以下操作:

concat = ""
for file in files:
    concat += open(file).read()

或更“优雅”的python-way:

concat = ''.join([open(f).read() for f in files])

根据这篇文章:http : //www.skymind.com/~ocrow/python_string/也是最快的。

Check out the .read() method of the File object:

http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects

You could do something like:

concat = ""
for file in files:
    concat += open(file).read()

or a more ‘elegant’ python-way:

concat = ''.join([open(f).read() for f in files])

which, according to this article: http://www.skymind.com/~ocrow/python_string/ would also be the fastest.


回答 9

如果文件不是巨大的:

with open('newfile.txt','wb') as newf:
    for filename in list_of_files:
        with open(filename,'rb') as hf:
            newf.write(hf.read())
            # newf.write('\n\n\n')   if you want to introduce
            # some blank lines between the contents of the copied files

如果文件太大而无法完全读取并保存在RAM中,则算法必须有所不同read(10000),例如使用固定长度的块读取要循环复制的每个文件。

If the files are not gigantic:

with open('newfile.txt','wb') as newf:
    for filename in list_of_files:
        with open(filename,'rb') as hf:
            newf.write(hf.read())
            # newf.write('\n\n\n')   if you want to introduce
            # some blank lines between the contents of the copied files

If the files are too big to be entirely read and held in RAM, the algorithm must be a little different to read each file to be copied in a loop by chunks of fixed length, using read(10000) for example.


回答 10

def concatFiles():
    path = 'input/'
    files = os.listdir(path)
    for idx, infile in enumerate(files):
        print ("File #" + str(idx) + "  " + infile)
    concat = ''.join([open(path + f).read() for f in files])
    with open("output_concatFile.txt", "w") as fo:
        fo.write(path + concat)

if __name__ == "__main__":
    concatFiles()
def concatFiles():
    path = 'input/'
    files = os.listdir(path)
    for idx, infile in enumerate(files):
        print ("File #" + str(idx) + "  " + infile)
    concat = ''.join([open(path + f).read() for f in files])
    with open("output_concatFile.txt", "w") as fo:
        fo.write(path + concat)

if __name__ == "__main__":
    concatFiles()

回答 11

  import os
  files=os.listdir()
  print(files)
  print('#',tuple(files))
  name=input('Enter the inclusive file name: ')
  exten=input('Enter the type(extension): ')
  filename=name+'.'+exten
  output_file=open(filename,'w+')
  for i in files:
    print(i)
    j=files.index(i)
    f_j=open(i,'r')
    print(f_j.read())
    for x in f_j:
      outfile.write(x)
  import os
  files=os.listdir()
  print(files)
  print('#',tuple(files))
  name=input('Enter the inclusive file name: ')
  exten=input('Enter the type(extension): ')
  filename=name+'.'+exten
  output_file=open(filename,'w+')
  for i in files:
    print(i)
    j=files.index(i)
    f_j=open(i,'r')
    print(f_j.read())
    for x in f_j:
      outfile.write(x)

连接两个一维NumPy数组

问题:连接两个一维NumPy数组

我在NumPy中有两个简单的一维数组。我应该能够使用numpy.concatenate将它们连接起来。但是我收到以下代码的错误:

TypeError:只有length-1数组可以转换为Python标量

import numpy
a = numpy.array([1, 2, 3])
b = numpy.array([5, 6])
numpy.concatenate(a, b)

为什么?

I have two simple one-dimensional arrays in NumPy. I should be able to concatenate them using numpy.concatenate. But I get this error for the code below:

TypeError: only length-1 arrays can be converted to Python scalars

Code

import numpy
a = numpy.array([1, 2, 3])
b = numpy.array([5, 6])
numpy.concatenate(a, b)

Why?


回答 0

该行应为:

numpy.concatenate([a,b])

要连接的数组需要作为一个序列而不是作为单独的参数传递。

NumPy文档中

numpy.concatenate((a1, a2, ...), axis=0)

将一系列数组连接在一起。

它试图将您解释b为axis参数,这就是为什么它抱怨无法将其转换为标量。

The line should be:

numpy.concatenate([a,b])

The arrays you want to concatenate need to be passed in as a sequence, not as separate arguments.

From the NumPy documentation:

numpy.concatenate((a1, a2, ...), axis=0)

Join a sequence of arrays together.

It was trying to interpret your b as the axis parameter, which is why it complained it couldn’t convert it into a scalar.


回答 1

连接一维数组有多种可能性,例如,

numpy.r_[a, a],
numpy.stack([a, a]).reshape(-1),
numpy.hstack([a, a]),
numpy.concatenate([a, a])

对于大型阵列,所有这些选项都同样快。对于小型的,concatenate有一点优势:

该图是使用perfplot创建的:

import numpy
import perfplot

perfplot.show(
    setup=lambda n: numpy.random.rand(n),
    kernels=[
        lambda a: numpy.r_[a, a],
        lambda a: numpy.stack([a, a]).reshape(-1),
        lambda a: numpy.hstack([a, a]),
        lambda a: numpy.concatenate([a, a]),
    ],
    labels=["r_", "stack+reshape", "hstack", "concatenate"],
    n_range=[2 ** k for k in range(19)],
    xlabel="len(a)",
)

There are several possibilities for concatenating 1D arrays, e.g.,

numpy.r_[a, a],
numpy.stack([a, a]).reshape(-1),
numpy.hstack([a, a]),
numpy.concatenate([a, a])

All those options are equally fast for large arrays; for small ones, concatenate has a slight edge:

The plot was created with perfplot:

import numpy
import perfplot

perfplot.show(
    setup=lambda n: numpy.random.rand(n),
    kernels=[
        lambda a: numpy.r_[a, a],
        lambda a: numpy.stack([a, a]).reshape(-1),
        lambda a: numpy.hstack([a, a]),
        lambda a: numpy.concatenate([a, a]),
    ],
    labels=["r_", "stack+reshape", "hstack", "concatenate"],
    n_range=[2 ** k for k in range(19)],
    xlabel="len(a)",
)

回答 2

的第一个参数concatenate本身应该是要串联的数组序列

numpy.concatenate((a,b)) # Note the extra parentheses.

The first parameter to concatenate should itself be a sequence of arrays to concatenate:

numpy.concatenate((a,b)) # Note the extra parentheses.

回答 3

另一种方法是使用“ concatenate”的缩写形式,即“ r _ […]”或“ c _ […]”,如下面的示例代码所示(请参见http://wiki.scipy.org / NumPy_for_Matlab_Users以获取更多信息):

%pylab
vector_a = r_[0.:10.] #short form of "arange"
vector_b = array([1,1,1,1])
vector_c = r_[vector_a,vector_b]
print vector_a
print vector_b
print vector_c, '\n\n'

a = ones((3,4))*4
print a, '\n'
c = array([1,1,1])
b = c_[a,c]
print b, '\n\n'

a = ones((4,3))*4
print a, '\n'
c = array([[1,1,1]])
b = r_[a,c]
print b

print type(vector_b)

结果是:

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
[1 1 1 1]
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.  1.  1.  1.  1.] 


[[ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]] 

[[ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]] 


[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]] 

[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 1.  1.  1.]]

An alternative ist to use the short form of “concatenate” which is either “r_[…]” or “c_[…]” as shown in the example code beneath (see http://wiki.scipy.org/NumPy_for_Matlab_Users for additional information):

%pylab
vector_a = r_[0.:10.] #short form of "arange"
vector_b = array([1,1,1,1])
vector_c = r_[vector_a,vector_b]
print vector_a
print vector_b
print vector_c, '\n\n'

a = ones((3,4))*4
print a, '\n'
c = array([1,1,1])
b = c_[a,c]
print b, '\n\n'

a = ones((4,3))*4
print a, '\n'
c = array([[1,1,1]])
b = r_[a,c]
print b

print type(vector_b)

Which results in:

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
[1 1 1 1]
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.  1.  1.  1.  1.] 


[[ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]] 

[[ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]] 


[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]] 

[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 1.  1.  1.]]

回答 4

以下是使用numpy.ravel(),的更多方法:numpy.array()利用一维数组可以解包为普通元素的事实:

# we'll utilize the concept of unpacking
In [15]: (*a, *b)
Out[15]: (1, 2, 3, 5, 6)

# using `numpy.ravel()`
In [14]: np.ravel((*a, *b))
Out[14]: array([1, 2, 3, 5, 6])

# wrap the unpacked elements in `numpy.array()`
In [16]: np.array((*a, *b))
Out[16]: array([1, 2, 3, 5, 6])

Here are more approaches for doing this by using numpy.ravel(), numpy.array(), utilizing the fact that 1D arrays can be unpacked into plain elements:

# we'll utilize the concept of unpacking
In [15]: (*a, *b)
Out[15]: (1, 2, 3, 5, 6)

# using `numpy.ravel()`
In [14]: np.ravel((*a, *b))
Out[14]: array([1, 2, 3, 5, 6])

# wrap the unpacked elements in `numpy.array()`
In [16]: np.array((*a, *b))
Out[16]: array([1, 2, 3, 5, 6])

回答 5

来自numpy 文档的更多事实:

语法为 numpy.concatenate((a1, a2, ...), axis=0, out=None)

轴= 0用于行连接轴= 1用于列连接

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])

# Appending below last row
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])

# Appending after last column
>>> np.concatenate((a, b.T), axis=1)    # Notice the transpose
array([[1, 2, 5],
       [3, 4, 6]])

# Flattening the final array
>>> np.concatenate((a, b), axis=None)
array([1, 2, 3, 4, 5, 6])

希望对您有所帮助!

Some more facts from the numpy docs :

With syntax as numpy.concatenate((a1, a2, ...), axis=0, out=None)

axis = 0 for row-wise concatenation axis = 1 for column-wise concatenation

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])

# Appending below last row
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])

# Appending after last column
>>> np.concatenate((a, b.T), axis=1)    # Notice the transpose
array([[1, 2, 5],
       [3, 4, 6]])

# Flattening the final array
>>> np.concatenate((a, b), axis=None)
array([1, 2, 3, 4, 5, 6])

I hope it helps !


将列表中的项目连接到字符串

问题:将列表中的项目连接到字符串

有没有更简单的方法将列表中的字符串项连接为单个字符串?我可以使用该str.join()功能吗?

例如,这是输入['this','is','a','sentence'],这是所需的输出this-is-a-sentence

sentence = ['this','is','a','sentence']
sent_str = ""
for i in sentence:
    sent_str += str(i) + "-"
sent_str = sent_str[:-1]
print sent_str

Is there a simpler way to concatenate string items in a list into a single string? Can I use the str.join() function?

E.g. this is the input ['this','is','a','sentence'] and this is the desired output this-is-a-sentence

sentence = ['this','is','a','sentence']
sent_str = ""
for i in sentence:
    sent_str += str(i) + "-"
sent_str = sent_str[:-1]
print sent_str

回答 0

用途join

>>> sentence = ['this','is','a','sentence']
>>> '-'.join(sentence)
'this-is-a-sentence'

Use join:

>>> sentence = ['this','is','a','sentence']
>>> '-'.join(sentence)
'this-is-a-sentence'

回答 1

将python列表转换为字符串的更通用的方法是:

>>> my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> my_lst_str = ''.join(map(str, my_lst))
>>> print(my_lst_str)
'12345678910'

A more generic way to convert python lists to strings would be:

>>> my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> my_lst_str = ''.join(map(str, my_lst))
>>> print(my_lst_str)
'12345678910'

回答 2

对于初学者来说,了解join为什么是字符串方法非常有用 。

一开始很奇怪,但此后非常有用。

连接的结果始终是一个字符串,但是要连接的对象可以有多种类型(生成器,列表,元组等)。

.join更快,因为它只分配一次内存。比经典串联更好(请参阅扩展说明)。

一旦学习了它,它就会非常舒适,您可以执行以下技巧来添加括号。

>>> ",".join("12345").join(("(",")"))
Out:
'(1,2,3,4,5)'

>>> list = ["(",")"]
>>> ",".join("12345").join(list)
Out:
'(1,2,3,4,5)'

It’s very useful for beginners to know why join is a string method.

It’s very strange at the beginning, but very useful after this.

The result of join is always a string, but the object to be joined can be of many types (generators, list, tuples, etc).

.join is faster because it allocates memory only once. Better than classical concatenation (see, extended explanation).

Once you learn it, it’s very comfortable and you can do tricks like this to add parentheses.

>>> ",".join("12345").join(("(",")"))
Out:
'(1,2,3,4,5)'

>>> list = ["(",")"]
>>> ",".join("12345").join(list)
Out:
'(1,2,3,4,5)'

回答 3

尽管@Burhan Khalid的回答很好,但我认为这样更容易理解:

from str import join

sentence = ['this','is','a','sentence']

join(sentence, "-") 

join()的第二个参数是可选的,默认为“”。

编辑:此功能已在Python 3中删除

Although @Burhan Khalid’s answer is good, I think it’s more understandable like this:

from str import join

sentence = ['this','is','a','sentence']

join(sentence, "-") 

The second argument to join() is optional and defaults to ” “.

EDIT: This function was removed in Python 3


回答 4

我们可以指定如何连接字符串。除了使用’-‘,我们还可以使用”

sentence = ['this','is','a','sentence']
s=(" ".join(sentence))
print(s)

We can specify how we have to join the string. Instead of ‘-‘, we can use ‘ ‘

sentence = ['this','is','a','sentence']
s=(" ".join(sentence))
print(s)

回答 5

我们还可以使用Python的reduce函数:

from functools import reduce

sentence = ['this','is','a','sentence']
out_str = str(reduce(lambda x,y: x+"-"+y, sentence))
print(out_str)

We can also use Python’s reduce function:

from functools import reduce

sentence = ['this','is','a','sentence']
out_str = str(reduce(lambda x,y: x+"-"+y, sentence))
print(out_str)

回答 6

def eggs(someParameter):
    del spam[3]
    someParameter.insert(3, ' and cats.')


spam = ['apples', 'bananas', 'tofu', 'cats']
eggs(spam)
spam =(','.join(spam))
print(spam)
def eggs(someParameter):
    del spam[3]
    someParameter.insert(3, ' and cats.')


spam = ['apples', 'bananas', 'tofu', 'cats']
eggs(spam)
spam =(','.join(spam))
print(spam)

将多个csv文件导入到pandas中并串联到一个DataFrame中

问题:将多个csv文件导入到pandas中并串联到一个DataFrame中

我想将目录中的多个csv文件读入pandas,并将它们连接成一个大的DataFrame。我还无法弄清楚。这是我到目前为止的内容:

import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))

# Concatenate all data into one DataFrame
big_frame = pd.concat(dfs, ignore_index=True)

我想我在for循环中需要一些帮助吗???

I would like to read several csv files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far:

import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))

# Concatenate all data into one DataFrame
big_frame = pd.concat(dfs, ignore_index=True)

I guess I need some help within the for loop???


回答 0

如果所有csv文件中的列均相同,则可以尝试以下代码。我已添加,header=0以便在读取后csv可以将第一行分配为列名。

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

If you have same columns in all your csv files then you can try the code below. I have added header=0 so that after reading csv first row can be assigned as the column names.

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

回答 1

替代darindaCoder的答案

path = r'C:\DRO\DCL_rawdata_files'                     # use your path
all_files = glob.glob(os.path.join(path, "*.csv"))     # advisable to use os.path.join as this makes concatenation OS independent

df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)
# doesn't create a list, nor does it append to one

An alternative to darindaCoder’s answer:

path = r'C:\DRO\DCL_rawdata_files'                     # use your path
all_files = glob.glob(os.path.join(path, "*.csv"))     # advisable to use os.path.join as this makes concatenation OS independent

df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)
# doesn't create a list, nor does it append to one

回答 2

import glob, os    
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "my_files*.csv"))))
import glob, os    
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "my_files*.csv"))))

回答 3

Dask库可以从多个文件读取数据帧:

>>> import dask.dataframe as dd
>>> df = dd.read_csv('data*.csv')

(来源:http : //dask.pydata.org/en/latest/examples/dataframe-csv.html

Dask数据框实现了Pandas数据框API的子集。如果所有数据都适合内存,则可以调用df.compute()将数据框转换为Pandas数据框。

The Dask library can read a dataframe from multiple files:

>>> import dask.dataframe as dd
>>> df = dd.read_csv('data*.csv')

(Source: http://dask.pydata.org/en/latest/examples/dataframe-csv.html)

The Dask dataframes implement a subset of the Pandas dataframe API. If all the data fits into memory, you can call df.compute() to convert the dataframe into a Pandas dataframe.


回答 4

这里几乎所有答案都是不必要的复杂(全局模式匹配)或依赖于其他第三方库。您可以使用已内置的Pandas和python(所有版本)在2行中执行此操作。

对于一些文件-1个衬纸:

df = pd.concat(map(pd.read_csv, ['data/d1.csv', 'data/d2.csv','data/d3.csv']))

对于许多文件:

from os import listdir

filepaths = [f for f in listdir("./data") if f.endswith('.csv')]
df = pd.concat(map(pd.read_csv, filepaths))

设置df的这条熊猫线利用了3件事:

  1. Python的地图(函数,可迭代)发送到函数( pd.read_csv()可迭代(我们的列表)(是文件路径中的每个csv元素)。
  2. 熊猫的read_csv()函数可以正常读取每个CSV文件。
  3. 熊猫的concat()将所有这些都放在一个df变量下。

Almost all of the answers here are either unnecessarily complex (glob pattern matching) or rely on additional 3rd party libraries. You can do this in 2 lines using everything Pandas and python (all versions) already have built in.

For a few files – 1 liner:

df = pd.concat(map(pd.read_csv, ['data/d1.csv', 'data/d2.csv','data/d3.csv']))

For many files:

from os import listdir

filepaths = [f for f in listdir("./data") if f.endswith('.csv')]
df = pd.concat(map(pd.read_csv, filepaths))

This pandas line which sets the df utilizes 3 things:

  1. Python’s map (function, iterable) sends to the function (the pd.read_csv()) the iterable (our list) which is every csv element in filepaths).
  2. Panda’s read_csv() function reads in each CSV file as normal.
  3. Panda’s concat() brings all these under one df variable.

回答 5

编辑:我用谷歌搜索https://stackoverflow.com/a/21232849/186078。但是,最近我发现使用numpy进行任何操作,然后将其分配给数据框一次,而不是在迭代的基础上操纵数据框本身,这样更快,并且似乎也可以在此解决方案中工作。

我确实希望任何访问此页面的人都考虑采用这种方法,但又不想将这段巨大的代码作为注释并使其可读性降低。

您可以利用numpy真正加快数据帧的连接速度。

import os
import glob
import pandas as pd
import numpy as np

path = "my_dir_full_path"
allFiles = glob.glob(os.path.join(path,"*.csv"))


np_array_list = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=0)
    np_array_list.append(df.as_matrix())

comb_np_array = np.vstack(np_array_list)
big_frame = pd.DataFrame(comb_np_array)

big_frame.columns = ["col1","col2"....]

时间统计:

total files :192
avg lines per file :8492
--approach 1 without numpy -- 8.248656988143921 seconds ---
total records old :1630571
--approach 2 with numpy -- 2.289292573928833 seconds ---

Edit: I googled my way into https://stackoverflow.com/a/21232849/186078. However of late I am finding it faster to do any manipulation using numpy and then assigning it once to dataframe rather than manipulating the dataframe itself on an iterative basis and it seems to work in this solution too.

I do sincerely want anyone hitting this page to consider this approach, but don’t want to attach this huge piece of code as a comment and making it less readable.

You can leverage numpy to really speed up the dataframe concatenation.

import os
import glob
import pandas as pd
import numpy as np

path = "my_dir_full_path"
allFiles = glob.glob(os.path.join(path,"*.csv"))


np_array_list = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=0)
    np_array_list.append(df.as_matrix())

comb_np_array = np.vstack(np_array_list)
big_frame = pd.DataFrame(comb_np_array)

big_frame.columns = ["col1","col2"....]

Timing stats:

total files :192
avg lines per file :8492
--approach 1 without numpy -- 8.248656988143921 seconds ---
total records old :1630571
--approach 2 with numpy -- 2.289292573928833 seconds ---

回答 6

如果要递归搜索Python 3.5或更高版本),则可以执行以下操作:

from glob import iglob
import pandas as pd

path = r'C:\user\your\path\**\*.csv'

all_rec = iglob(path, recursive=True)     
dataframes = (pd.read_csv(f) for f in all_rec)
big_dataframe = pd.concat(dataframes, ignore_index=True)

请注意,最后三行可以用一行表示:

df = pd.concat((pd.read_csv(f) for f in iglob(path, recursive=True)), ignore_index=True)

您可以在** 此处找到文档。另外,我用iglob代替glob,因为它返回一个迭代器而不是列表。



编辑:多平台递归函数:

您可以将以上内容包装到一个多平台功能(Linux,Windows,Mac)中,因此可以执行以下操作:

df = read_df_rec('C:\user\your\path', *.csv)

这是函数:

from glob import iglob
from os.path import join
import pandas as pd

def read_df_rec(path, fn_regex=r'*.csv'):
    return pd.concat((pd.read_csv(f) for f in iglob(
        join(path, '**', fn_regex), recursive=True)), ignore_index=True)

If you want to search recursively (Python 3.5 or above), you can do the following:

from glob import iglob
import pandas as pd

path = r'C:\user\your\path\**\*.csv'

all_rec = iglob(path, recursive=True)     
dataframes = (pd.read_csv(f) for f in all_rec)
big_dataframe = pd.concat(dataframes, ignore_index=True)

Note that the three last lines can be expressed in one single line:

df = pd.concat((pd.read_csv(f) for f in iglob(path, recursive=True)), ignore_index=True)

You can find the documentation of ** here. Also, I used iglobinstead of glob, as it returns an iterator instead of a list.



EDIT: Multiplatform recursive function:

You can wrap the above into a multiplatform function (Linux, Windows, Mac), so you can do:

df = read_df_rec('C:\user\your\path', *.csv)

Here is the function:

from glob import iglob
from os.path import join
import pandas as pd

def read_df_rec(path, fn_regex=r'*.csv'):
    return pd.concat((pd.read_csv(f) for f in iglob(
        join(path, '**', fn_regex), recursive=True)), ignore_index=True)

回答 7

方便快捷

导入两个或多个csv而不需要列出名称。

import glob

df = pd.concat(map(pd.read_csv, glob.glob('data/*.csv')))

Easy and Fast

Import two or more csv‘s without having to make a list of names.

import glob

df = pd.concat(map(pd.read_csv, glob.glob('data/*.csv')))

回答 8

一个衬里使用map,但是如果您要指定其他参数,则可以执行以下操作:

import pandas as pd
import glob
import functools

df = pd.concat(map(functools.partial(pd.read_csv, sep='|', compression=None), 
                    glob.glob("data/*.csv")))

注意:map本身不允许您提供其他参数。

one liner using map, but if you’d like to specify additional args, you could do:

import pandas as pd
import glob
import functools

df = pd.concat(map(functools.partial(pd.read_csv, sep='|', compression=None), 
                    glob.glob("data/*.csv")))

Note: map by itself does not let you supply additional args.


回答 9

如果压缩了多个csv文件,则可以使用zipfile读取全部内容并进行如下连接:

import zipfile
import numpy as np
import pandas as pd

ziptrain = zipfile.ZipFile('yourpath/yourfile.zip')

train=[]

for f in range(0,len(ziptrain.namelist())):
    if (f == 0):
        train = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
    else:
        my_df = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
        train = (pd.DataFrame(np.concatenate((train,my_df),axis=0), 
                          columns=list(my_df.columns.values)))

If the multiple csv files are zipped, you may use zipfile to read all and concatenate as below:

import zipfile
import numpy as np
import pandas as pd

ziptrain = zipfile.ZipFile('yourpath/yourfile.zip')

train=[]

for f in range(0,len(ziptrain.namelist())):
    if (f == 0):
        train = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
    else:
        my_df = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
        train = (pd.DataFrame(np.concatenate((train,my_df),axis=0), 
                          columns=list(my_df.columns.values)))

回答 10

另一个具有列表理解功能的内联函数,它允许将参数与read_csv一起使用。

df = pd.concat([pd.read_csv(f'dir/{f}') for f in os.listdir('dir') if f.endswith('.csv')])

Another on-liner with list comprehension which allows to use arguments with read_csv.

df = pd.concat([pd.read_csv(f'dir/{f}') for f in os.listdir('dir') if f.endswith('.csv')])

回答 11

基于@Sid的正确答案。

串联之前,您可以将csv文件加载到中间字典中,该字典可以根据文件名(格式为dict_of_df['filename.csv'])访问每个数据集。例如,当列名未对齐时,此类词典可帮助您识别异构数据格式的问题。

导入模块并找到文件路径:

import os
import glob
import pandas
from collections import OrderedDict
path =r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

注意:OrderedDict不是必需的,但是它将保留文件顺序,这可能对分析有用。

将csv文件加载到字典中。然后连接:

dict_of_df = OrderedDict((f, pandas.read_csv(f)) for f in filenames)
pandas.concat(dict_of_df, sort=True)

键是文件名f,值是csv文件的数据帧内容。除了f用作字典键之外,还可以使用os.path.basename(f)或其他os.path方法将字典中键的大小减小到仅相关的较小部分。

Based on @Sid’s good answer.

Before concatenating, you can load csv files into an intermediate dictionary which gives access to each data set based on the file name (in the form dict_of_df['filename.csv']). Such a dictionary can help you identify issues with heterogeneous data formats, when column names are not aligned for example.

Import modules and locate file paths:

import os
import glob
import pandas
from collections import OrderedDict
path =r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

Note: OrderedDict is not necessary, but it’ll keep the order of files which might be useful for analysis.

Load csv files into a dictionary. Then concatenate:

dict_of_df = OrderedDict((f, pandas.read_csv(f)) for f in filenames)
pandas.concat(dict_of_df, sort=True)

Keys are file names f and values are the data frame content of csv files. Instead of using f as a dictionary key, you can also use os.path.basename(f) or other os.path methods to reduce the size of the key in the dictionary to only the smaller part that is relevant.


回答 12

使用pathlib库的替代方法(通常首选而不是os.path)。

此方法避免了pandas concat()/的迭代使用apped()

从pandas文档中:
值得注意的是,concat()(因此,append())会完整复制数据,并且不断重用此函数可能会对性能产生重大影响。如果需要对多个数据集使用该操作,请使用列表推导。

import pandas as pd
from pathlib import Path

dir = Path("../relevant_directory")

df = (pd.read_csv(f) for f in dir.glob("*.csv"))
df = pd.concat(df)

Alternative using the pathlib library (often preferred over os.path).

This method avoids iterative use of pandas concat()/apped().

From the pandas documentation:
It is worth noting that concat() (and therefore append()) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.

import pandas as pd
from pathlib import Path

dir = Path("../relevant_directory")

df = (pd.read_csv(f) for f in dir.glob("*.csv"))
df = pd.concat(df)

回答 13

这是在Google云端硬盘上使用Colab的方式

import pandas as pd
import glob

path = r'/content/drive/My Drive/data/actual/comments_only' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True,sort=True)
frame.to_csv('/content/drive/onefile.csv')

This is how you can do using Colab on Google Drive

import pandas as pd
import glob

path = r'/content/drive/My Drive/data/actual/comments_only' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True,sort=True)
frame.to_csv('/content/drive/onefile.csv')

回答 14

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
file_path_list = glob.glob(path + "/*.csv")

file_iter = iter(file_path_list)

list_df_csv = []
list_df_csv.append(pd.read_csv(next(file_iter)))

for file in file_iter:
    lsit_df_csv.append(pd.read_csv(file, header=0))
df = pd.concat(lsit_df_csv, ignore_index=True)
import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
file_path_list = glob.glob(path + "/*.csv")

file_iter = iter(file_path_list)

list_df_csv = []
list_df_csv.append(pd.read_csv(next(file_iter)))

for file in file_iter:
    lsit_df_csv.append(pd.read_csv(file, header=0))
df = pd.concat(lsit_df_csv, ignore_index=True)

如何在Python中串联两个列表?

问题:如何在Python中串联两个列表?

如何在Python中串联两个列表?

例:

listone = [1, 2, 3]
listtwo = [4, 5, 6]

预期结果:

>>> joinedlist
[1, 2, 3, 4, 5, 6]

How do I concatenate two lists in Python?

Example:

listone = [1, 2, 3]
listtwo = [4, 5, 6]

Expected outcome:

>>> joinedlist
[1, 2, 3, 4, 5, 6]

回答 0

您可以使用+运算符来组合它们:

listone = [1,2,3]
listtwo = [4,5,6]

joinedlist = listone + listtwo

输出:

>>> joinedlist
[1,2,3,4,5,6]

You can use the + operator to combine them:

listone = [1,2,3]
listtwo = [4,5,6]

joinedlist = listone + listtwo

Output:

>>> joinedlist
[1,2,3,4,5,6]

回答 1

也可以创建一个生成器,使用来简单地遍历两个列表中的项目itertools.chain()。这使您可以将列表(或任何可迭代的)链接在一起进行处理,而无需将项目复制到新列表中:

import itertools
for item in itertools.chain(listone, listtwo):
    # Do something with each list item

It’s also possible to create a generator that simply iterates over the items in both lists using itertools.chain(). This allows you to chain lists (or any iterable) together for processing without copying the items to a new list:

import itertools
for item in itertools.chain(listone, listtwo):
    # Do something with each list item

回答 2

Python >= 3.5替代品:[*l1, *l2]

通过接受已经引入了另一种选择PEP 448,值得一提。

当在Python中使用加星标的表达式时,PEP的标题为Additional Unpacking Generalizations,通常会减少一些语法限制*。使用它,现在还可以使用以下方法来加入两个列表(适用于任何可迭代对象):

>>> l1 = [1, 2, 3]
>>> l2 = [4, 5, 6]
>>> joined_list = [*l1, *l2]  # unpack both iterables in a list literal
>>> print(joined_list)
[1, 2, 3, 4, 5, 6]

此功能是为Python定义的,3.5尚未回迁到该3.x系列的先前版本中。在不受支持的版本SyntaxError中,将被提出。

与其他方法一样,这也会在相应列表中创建元素的浅表副本


这种方法的好处是,您实际上不需要列表即可执行它,任何可迭代的操作都可以。如PEP中所述:

这对于将可迭代项求和成一个列表(如my_list + list(my_tuple) + list(my_range)现在等同于just)的可读性更高,也很有用[*my_list, *my_tuple, *my_range]

因此,虽然加上+与会TypeError由于类型不匹配而引发:

l = [1, 2, 3]
r = range(4, 7)
res = l + r

以下内容不会:

res = [*l, *r]

因为它首先将可迭代对象的内容解包,然后list仅从内容中创建一个即可。

Python >= 3.5 alternative: [*l1, *l2]

Another alternative has been introduced via the acceptance of PEP 448 which deserves mentioning.

The PEP, titled Additional Unpacking Generalizations, generally reduced some syntactic restrictions when using the starred * expression in Python; with it, joining two lists (applies to any iterable) can now also be done with:

>>> l1 = [1, 2, 3]
>>> l2 = [4, 5, 6]
>>> joined_list = [*l1, *l2]  # unpack both iterables in a list literal
>>> print(joined_list)
[1, 2, 3, 4, 5, 6]

This functionality was defined for Python 3.5 it hasn’t been backported to previous versions in the 3.x family. In unsupported versions a SyntaxError is going to be raised.

As with the other approaches, this too creates as shallow copy of the elements in the corresponding lists.


The upside to this approach is that you really don’t need lists in order to perform it, anything that is iterable will do. As stated in the PEP:

This is also useful as a more readable way of summing iterables into a list, such as my_list + list(my_tuple) + list(my_range) which is now equivalent to just [*my_list, *my_tuple, *my_range].

So while addition with + would raise a TypeError due to type mismatch:

l = [1, 2, 3]
r = range(4, 7)
res = l + r

The following won’t:

res = [*l, *r]

because it will first unpack the contents of the iterables and then simply create a list from the contents.


回答 3

您可以使用集合来获取唯一值的合并列表

mergedlist = list(set(listone + listtwo))

You can use sets to obtain merged list of unique values

mergedlist = list(set(listone + listtwo))

回答 4

您也可以使用list.extend()方法将a添加list到另一个的结尾:

listone = [1,2,3]
listtwo = [4,5,6]

listone.extend(listtwo)

如果要保持原始列表完整无缺,则可以创建一个新list对象,并且extend两个列表都指向该对象:

mergedlist = []
mergedlist.extend(listone)
mergedlist.extend(listtwo)

You could also use the list.extend() method in order to add a list to the end of another one:

listone = [1,2,3]
listtwo = [4,5,6]

listone.extend(listtwo)

If you want to keep the original list intact, you can create a new list object, and extend both lists to it:

mergedlist = []
mergedlist.extend(listone)
mergedlist.extend(listtwo)

回答 5

如何在Python中串联两个列表?

从3.7开始,这些是在python中串联两个(或多个)列表的最受欢迎的stdlib方法。

脚注

  1. 由于它的简洁性,这是一个不错的解决方案。但是sum以成对方式执行串联操作,这意味着这是二次操作,因为必须为每个步骤分配内存。如果您的列表很大,请不要使用。

  2. 查看chain 和阅读 chain.from_iterable 文档。您需要import itertools先。串联在内存中是线性的,因此这在性能和版本兼容性方面是最佳的。chain.from_iterable在2.6中引入。

  3. 此方法使用“ 其他解包概述”(PEP 448),但除非您手动手动解压缩每个列表,否则无法将其归纳为N个列表。

  4. a += ba.extend(b)在所有实际用途上大致相同。+=当在列表上调用时list.__iadd__,将内部调用 ,从而将第一个列表扩展到第二个列表。


性能

2列表串联1

这些方法之间没有太大区别,但是鉴于它们都具有相同的复杂度顺序(线性),这是有道理的。除了风格之外,没有特别的理由比一个更喜欢一个。

N列表串联

使用perfplot模块已生成图。代码,供您参考。

1. iadd+=)和extend方法就地操作,因此每次测试前都必须生成一个副本。为了公平起见,所有方法在左侧列表中都有一个预复制步骤,可以忽略。


对其他解决方案的评论

  • 请勿list.__add__以任何方式,形状或形式直接使用DUNDER方法。实际上,请避免使用dunder方法,并使用operator设计用于它们的运算符和函数。Python具有仔细的语义,这些语义比直接调用dunder更复杂。这是一个例子。因此,总而言之,a.__add__(b)=>差;a + b=>好。

  • 此处提供reduce(operator.add, [a, b])了成对连接的一些答案-这与sum([a, b], [])更多单词一样。

  • 使用的任何方法都set将删除重复项并失去顺序。请谨慎使用。

  • for i in b: a.append(i)a.extend(b)单一功能调用和惯用语言更加罗word,并且速度更慢。append之所以变慢,是因为为列表分配和增长了内存的语义。参见此处进行类似的讨论。

  • heapq.merge可以使用,但是它的用例是在线性时间内合并排序列表。在任何其他情况下使用它都是一种反模式。

  • yield从函数中列出列表元素是一种可以接受的方法,但是chain这样做更快,更好(它在C中具有代码路径,因此速度很快)。

  • operator.add(a, b)是可以接受的等效功能a + b。它的用例主要用于动态方法分派。否则,在我看来,最好选择更a + b短,更易读的格式。YMMV。

How do I concatenate two lists in Python?

As of 3.7, these are the most popular stdlib methods for concatenating two (or more) lists in python.

Footnotes

  1. This is a slick solution because of its succinctness. But sum performs concatenation in a pairwise fashion, which means this is a quadratic operation as memory has to be allocated for each step. DO NOT USE if your lists are large.

  2. See chain and chain.from_iterable from the docs. You will need to import itertools first. Concatenation is linear in memory, so this is the best in terms of performance and version compatibility. chain.from_iterable was introduced in 2.6.

  3. This method uses Additional Unpacking Generalizations (PEP 448), but cannot generalize to N lists unless you manually unpack each one yourself.

  4. a += b and a.extend(b) are more or less equivalent for all practical purposes. += when called on a list will internally call list.__iadd__, which extends the first list by the second.


Performance

2-List Concatenation1

There’s not much difference between these methods but that makes sense given they all have the same order of complexity (linear). There’s no particular reason to prefer one over the other except as a matter of style.

N-List Concatenation

Plots have been generated using the perfplot module. Code, for your reference.

1. The iadd (+=) and extend methods operate in-place, so a copy has to be generated each time before testing. To keep things fair, all methods have a pre-copy step for the left-hand list which can be ignored.


Comments on Other Solutions

  • DO NOT USE THE DUNDER METHOD list.__add__ directly in any way, shape or form. In fact, stay clear of dunder methods, and use the operators and operator functions like they were designed for. Python has careful semantics baked into these which are more complicated than just calling the dunder directly. Here is an example. So, to summarise, a.__add__(b) => BAD; a + b => GOOD.

  • Some answers here offer reduce(operator.add, [a, b]) for pairwise concatenation — this is the same as sum([a, b], []) only more wordy.

  • Any method that uses set will drop duplicates and lose ordering. Use with caution.

  • for i in b: a.append(i) is more wordy, and slower than a.extend(b), which is single function call and more idiomatic. append is slower because of the semantics with which memory is allocated and grown for lists. See here for a similar discussion.

  • heapq.merge will work, but its use case is for merging sorted lists in linear time. Using it in any other situation is an anti-pattern.

  • yielding list elements from a function is an acceptable method, but chain does this faster and better (it has a code path in C, so it is fast).

  • operator.add(a, b) is an acceptable functional equivalent to a + b. It’s use cases are mainly for dynamic method dispatch. Otherwise, prefer a + b which is shorter and more readable, in my opinion. YMMV.


回答 6

这很简单,我认为它甚至在本教程中已显示:

>>> listone = [1,2,3]
>>> listtwo = [4,5,6]
>>>
>>> listone + listtwo
[1, 2, 3, 4, 5, 6]

This is quite simple, and I think it was even shown in the tutorial:

>>> listone = [1,2,3]
>>> listtwo = [4,5,6]
>>>
>>> listone + listtwo
[1, 2, 3, 4, 5, 6]

回答 7

这个问题直接询问有关加入两个列表的问题。但是,即使您正在寻找加入许多列表的方式(包括加入零列表的情况),其搜索量也很高。

我认为最好的选择是使用列表推导:

>>> a = [[1,2,3], [4,5,6], [7,8,9]]
>>> [x for xs in a for x in xs]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

您还可以创建生成器:

>>> map(str, (x for xs in a for x in xs))
['1', '2', '3', '4', '5', '6', '7', '8', '9']

旧答案

考虑这种更通用的方法:

a = [[1,2,3], [4,5,6], [7,8,9]]
reduce(lambda c, x: c + x, a, [])

将输出:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

请注意,这在ais []或时也可以正常使用[[1,2,3]]

但是,可以使用以下命令更有效地完成此操作itertools

a = [[1,2,3], [4,5,6], [7,8,9]]
list(itertools.chain(*a))

如果不需要a list,而只是需要迭代,请省略list()

更新资料

Patrick Collins在评论中建议的替代方法也可能对您有用:

sum(a, [])

This question directly asks about joining two lists. However it’s pretty high in search even when you are looking for a way of joining many lists (including the case when you joining zero lists).

I think the best option is to use list comprehensions:

>>> a = [[1,2,3], [4,5,6], [7,8,9]]
>>> [x for xs in a for x in xs]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

You can create generators as well:

>>> map(str, (x for xs in a for x in xs))
['1', '2', '3', '4', '5', '6', '7', '8', '9']

Old Answer

Consider this more generic approach:

a = [[1,2,3], [4,5,6], [7,8,9]]
reduce(lambda c, x: c + x, a, [])

Will output:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Note, this also works correctly when a is [] or [[1,2,3]].

However, this can be done more efficiently with itertools:

a = [[1,2,3], [4,5,6], [7,8,9]]
list(itertools.chain(*a))

If you don’t need a list, but just an iterable, omit list().

Update

Alternative suggested by Patrick Collins in the comments could also work for you:

sum(a, [])

回答 8

您可以简单地使用+or +=运算符,如下所示:

a = [1, 2, 3]
b = [4, 5, 6]

c = a + b

要么:

c = []
a = [1, 2, 3]
b = [4, 5, 6]

c += (a + b)

另外,如果您希望合并列表中的值唯一,则可以执行以下操作:

c = list(set(a + b))

You could simply use the + or += operator as follows:

a = [1, 2, 3]
b = [4, 5, 6]

c = a + b

Or:

c = []
a = [1, 2, 3]
b = [4, 5, 6]

c += (a + b)

Also, if you want the values in the merged list to be unique you can do:

c = list(set(a + b))

回答 9

值得注意的是,该itertools.chain函数接受可变数量的参数:

>>> l1 = ['a']; l2 = ['b', 'c']; l3 = ['d', 'e', 'f']
>>> [i for i in itertools.chain(l1, l2)]
['a', 'b', 'c']
>>> [i for i in itertools.chain(l1, l2, l3)]
['a', 'b', 'c', 'd', 'e', 'f']

如果输入一个可迭代的(元组,列表,生成器等),from_iterable则可以使用class方法:

>>> il = [['a'], ['b', 'c'], ['d', 'e', 'f']]
>>> [i for i in itertools.chain.from_iterable(il)]
['a', 'b', 'c', 'd', 'e', 'f']

It’s worth noting that the itertools.chain function accepts variable number of arguments:

>>> l1 = ['a']; l2 = ['b', 'c']; l3 = ['d', 'e', 'f']
>>> [i for i in itertools.chain(l1, l2)]
['a', 'b', 'c']
>>> [i for i in itertools.chain(l1, l2, l3)]
['a', 'b', 'c', 'd', 'e', 'f']

If an iterable (tuple, list, generator, etc.) is the input, the from_iterable class method may be used:

>>> il = [['a'], ['b', 'c'], ['d', 'e', 'f']]
>>> [i for i in itertools.chain.from_iterable(il)]
['a', 'b', 'c', 'd', 'e', 'f']

回答 10

使用Python 3.3+,您可以使用yield from

listone = [1,2,3]
listtwo = [4,5,6]

def merge(l1, l2):
    yield from l1
    yield from l2

>>> list(merge(listone, listtwo))
[1, 2, 3, 4, 5, 6]

或者,如果您想支持任意数量的迭代器:

def merge(*iters):
    for it in iters:
        yield from it

>>> list(merge(listone, listtwo, 'abcd', [20, 21, 22]))
[1, 2, 3, 4, 5, 6, 'a', 'b', 'c', 'd', 20, 21, 22]

With Python 3.3+ you can use yield from:

listone = [1,2,3]
listtwo = [4,5,6]

def merge(l1, l2):
    yield from l1
    yield from l2

>>> list(merge(listone, listtwo))
[1, 2, 3, 4, 5, 6]

Or, if you want to support an arbitrary number of iterators:

def merge(*iters):
    for it in iters:
        yield from it

>>> list(merge(listone, listtwo, 'abcd', [20, 21, 22]))
[1, 2, 3, 4, 5, 6, 'a', 'b', 'c', 'd', 20, 21, 22]

回答 11

如果你想在排序的形式两个列表合并,您可以使用merge从函数heapq库。

from heapq import merge

a = [1, 2, 4]
b = [2, 4, 6, 7]

print list(merge(a, b))

If you want to merge the two lists in sorted form, you can use the merge function from the heapq library.

from heapq import merge

a = [1, 2, 4]
b = [2, 4, 6, 7]

print list(merge(a, b))

回答 12

如果您不能使用加号(+),则可以使用operator导入:

import operator

listone = [1,2,3]
listtwo = [4,5,6]

result = operator.add(listone, listtwo)
print(result)

>>> [1, 2, 3, 4, 5, 6]

另外,您也可以使用__add__ dunder函数:

listone = [1,2,3]
listtwo = [4,5,6]

result = list.__add__(listone, listtwo)
print(result)

>>> [1, 2, 3, 4, 5, 6]

If you can’t use the plus operator (+), you can use the operator import:

import operator

listone = [1,2,3]
listtwo = [4,5,6]

result = operator.add(listone, listtwo)
print(result)

>>> [1, 2, 3, 4, 5, 6]

Alternatively, you could also use the __add__ dunder function:

listone = [1,2,3]
listtwo = [4,5,6]

result = list.__add__(listone, listtwo)
print(result)

>>> [1, 2, 3, 4, 5, 6]

回答 13

作为更多列表的更通用方法,您可以将它们放在列表中并使用itertools.chain.from_iterable()1函数,该函数基于答案是扁平化嵌套列表的最佳方法:

>>> l=[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> import itertools
>>> list(itertools.chain.from_iterable(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

1.请注意,这chain.from_iterable()在Python 2.6和更高版本中可用。在其他版本中,请使用chain(*l)

As a more general way for more lists you can put them within a list and use the itertools.chain.from_iterable()1 function which based on this answer is the best way for flatting a nested list:

>>> l=[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> import itertools
>>> list(itertools.chain.from_iterable(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

1. Note that chain.from_iterable() is available in Python 2.6 and later. In other versions, use chain(*l).


回答 14

如果您需要使用复杂的排序规则合并两个有序列表,则可能需要像下面的代码一样滚动它(使用简单的排序规则以提高可读性:-))。

list1 = [1,2,5]
list2 = [2,3,4]
newlist = []

while list1 and list2:
    if list1[0] == list2[0]:
        newlist.append(list1.pop(0))
        list2.pop(0)
    elif list1[0] < list2[0]:
        newlist.append(list1.pop(0))
    else:
        newlist.append(list2.pop(0))

if list1:
    newlist.extend(list1)
if list2:
    newlist.extend(list2)

assert(newlist == [1, 2, 3, 4, 5])

If you need to merge two ordered lists with complicated sorting rules, you might have to roll it yourself like in the following code (using a simple sorting rule for readability :-) ).

list1 = [1,2,5]
list2 = [2,3,4]
newlist = []

while list1 and list2:
    if list1[0] == list2[0]:
        newlist.append(list1.pop(0))
        list2.pop(0)
    elif list1[0] < list2[0]:
        newlist.append(list1.pop(0))
    else:
        newlist.append(list2.pop(0))

if list1:
    newlist.extend(list1)
if list2:
    newlist.extend(list2)

assert(newlist == [1, 2, 3, 4, 5])

回答 15

您可以使用append()list对象上定义的方法:

mergedlist =[]
for elem in listone:
    mergedlist.append(elem)
for elem in listtwo:
    mergedlist.append(elem)

You could use the append() method defined on list objects:

mergedlist =[]
for elem in listone:
    mergedlist.append(elem)
for elem in listtwo:
    mergedlist.append(elem)

回答 16

list(set(listone) | set(listtwo))

上面的代码不保留顺序,而是从每个列表中删除重复项(但不从串联列表中删除)

list(set(listone) | set(listtwo))

The above code, does not preserve order, removes duplicate from each list (but not from the concatenated list)


回答 17

正如许多人已经指出itertools.chain()的那样,如果一个人需要对两个列表应用完全相同的处理方法,那该走的路。就我而言,我有一个标签和一个标志,它们与一个列表彼此不同,因此我需要稍微复杂一些的东西。事实证明,在幕后itertools.chain()只需执行以下操作即可:

for it in iterables:
    for element in it:
        yield element

(请参阅https://docs.python.org/2/library/itertools.html),所以我从这里汲取了灵感,并根据以下内容写了一些东西:

for iterable, header, flag in ( (newList, 'New', ''), (modList, 'Modified', '-f')):
    print header + ':'
    for path in iterable:
        [...]
        command = 'cp -r' if os.path.isdir(srcPath) else 'cp'
        print >> SCRIPT , command, flag, srcPath, mergedDirPath
        [...]

这里要理解的要点是,列表只是可迭代的特例,它们是与其他对象一样的对象。并且for ... inpython 中的循环可以使用元组变量,因此同时循环多个变量很简单。

As already pointed out by many, itertools.chain() is the way to go if one needs to apply exactly the same treatment to both lists. In my case, I had a label and a flag which were different from one list to the other, so I needed something slightly more complex. As it turns out, behind the scenes itertools.chain() simply does the following:

for it in iterables:
    for element in it:
        yield element

(see https://docs.python.org/2/library/itertools.html), so I took inspiration from here and wrote something along these lines:

for iterable, header, flag in ( (newList, 'New', ''), (modList, 'Modified', '-f')):
    print header + ':'
    for path in iterable:
        [...]
        command = 'cp -r' if os.path.isdir(srcPath) else 'cp'
        print >> SCRIPT , command, flag, srcPath, mergedDirPath
        [...]

The main points to understand here are that lists are just a special case of iterable, which are objects like any other; and that for ... in loops in python can work with tuple variables, so it is simple to loop on multiple variables at the same time.


回答 18

使用简单的列表理解:

joined_list = [item for list_ in [list_one, list_two] for item in list_]

它具有使用附加解包概括的最新方法的所有优点-即您可以以这种方式连接任意数量的不同可迭代对象(例如,列表,元组,范围和生成器)-而且不限于Python 3.5或更高版本。

Use a simple list comprehension:

joined_list = [item for list_ in [list_one, list_two] for item in list_]

It has all the advantages of the newest approach of using Additional Unpacking Generalizations – i.e. you can concatenate an arbitrary number of different iterables (for example, lists, tuples, ranges, and generators) that way – and it’s not limited to Python 3.5 or later.


回答 19

合并列表列表的一种非常简洁的方法是

list_of_lists = [[1,2,3], [4,5,6], [7,8,9]]
reduce(list.__add__, list_of_lists)

这给了我们

[1, 2, 3, 4, 5, 6, 7, 8, 9]

A really concise way to combine a list of lists is

list_of_lists = [[1,2,3], [4,5,6], [7,8,9]]
reduce(list.__add__, list_of_lists)

which gives us

[1, 2, 3, 4, 5, 6, 7, 8, 9]

回答 20

在Python中,您可以使用此命令来连接两个兼容维度的数组

numpy.concatenate([a,b])

In Python you can concatenate two arrays of compatible dimensions with this command

numpy.concatenate([a,b])

回答 21

因此,有两种简单的方法。

  1. 使用+:它从提供的列表中创建一个新列表

例:

In [1]: a = [1, 2, 3]

In [2]: b = [4, 5, 6]

In [3]: a + b
Out[3]: [1, 2, 3, 4, 5, 6]

In [4]: %timeit a + b
10000000 loops, best of 3: 126 ns per loop
  1. 使用extend:将新列表追加到现有列表。这意味着它不会创建单独的列表。

例:

In [1]: a = [1, 2, 3]

In [2]: b = [4, 5, 6]

In [3]: %timeit a.extend(b)
10000000 loops, best of 3: 91.1 ns per loop

因此,我们看到在两种最流行的方法中,它extend是有效的。

So there are two easy ways.

  1. Using +: It creates a new list from provided lists

Example:

In [1]: a = [1, 2, 3]

In [2]: b = [4, 5, 6]

In [3]: a + b
Out[3]: [1, 2, 3, 4, 5, 6]

In [4]: %timeit a + b
10000000 loops, best of 3: 126 ns per loop
  1. Using extend: It appends new list to existing list. That means it does not create a separate list.

Example:

In [1]: a = [1, 2, 3]

In [2]: b = [4, 5, 6]

In [3]: %timeit a.extend(b)
10000000 loops, best of 3: 91.1 ns per loop

Thus we see that out of two of most popular methods, extend is efficient.


回答 22

有多种方法可以在python中串联列表。

l1 = [1,2,3,4]
l2 = [3,4,5,6]

1. new_list = l1.extend(l2)
2. new_list = l1 + l2
3. new_list = [*l1, *l2]

There are multiple ways to concatenete lists in python.

l1 = [1,2,3,4]
l2 = [3,4,5,6]

1. new_list = l1.extend(l2)
2. new_list = l1 + l2
3. new_list = [*l1, *l2]

回答 23

import itertools

A = list(zip([1,3,5,7,9],[2,4,6,8,10]))
B = [1,3,5,7,9]+[2,4,6,8,10]
C = list(set([1,3,5,7,9] + [2,4,6,8,10]))

D = [1,3,5,7,9]
D.append([2,4,6,8,10])

E = [1,3,5,7,9]
E.extend([2,4,6,8,10])

F = []
for a in itertools.chain([1,3,5,7,9], [2,4,6,8,10]):
    F.append(a)


print ("A: " + str(A))
print ("B: " + str(B))
print ("C: " + str(C))
print ("D: " + str(D))
print ("E: " + str(E))
print ("F: " + str(F))

输出:

A: [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
B: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
C: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
D: [1, 3, 5, 7, 9, [2, 4, 6, 8, 10]]
E: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
F: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
import itertools

A = list(zip([1,3,5,7,9],[2,4,6,8,10]))
B = [1,3,5,7,9]+[2,4,6,8,10]
C = list(set([1,3,5,7,9] + [2,4,6,8,10]))

D = [1,3,5,7,9]
D.append([2,4,6,8,10])

E = [1,3,5,7,9]
E.extend([2,4,6,8,10])

F = []
for a in itertools.chain([1,3,5,7,9], [2,4,6,8,10]):
    F.append(a)


print ("A: " + str(A))
print ("B: " + str(B))
print ("C: " + str(C))
print ("D: " + str(D))
print ("E: " + str(E))
print ("F: " + str(F))

Output:

A: [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
B: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
C: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
D: [1, 3, 5, 7, 9, [2, 4, 6, 8, 10]]
E: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
F: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]

回答 24

如果您想要一个新列表,同时保留两个旧列表:

def concatenate_list(listOne, listTwo):
    joinedList = []
    for i in listOne:
        joinedList.append(i)
    for j in listTwo:
        joinedList.append(j)

    sorted(joinedList)

    return joinedList

If you wanted a new list whilst keeping the two old lists:

def concatenate_list(listOne, listTwo):
    joinedList = []
    for i in listOne:
        joinedList.append(i)
    for j in listTwo:
        joinedList.append(j)

    sorted(joinedList)

    return joinedList

回答 25

lst1 = [1,2]

lst2 = [3,4]

def list_combinationer(Bushisms, are_funny):

    for item in lst1:
        lst2.append(item)
        lst1n2 = sorted(lst2)
        print lst1n2

list_combinationer(lst1, lst2)

[1,2,3,4]
lst1 = [1,2]

lst2 = [3,4]

def list_combinationer(Bushisms, are_funny):

    for item in lst1:
        lst2.append(item)
        lst1n2 = sorted(lst2)
        print lst1n2

list_combinationer(lst1, lst2)

[1,2,3,4]

回答 26

您可以按照代码进行操作

listone = [1, 2, 3]
listtwo = [4, 5, 6]

for i in listone:
    listtwo.append(i)
print(listtwo)

[1,2,3,4,5,6]

You may follow the code

listone = [1, 2, 3]
listtwo = [4, 5, 6]

for i in listone:
    listtwo.append(i)
print(listtwo)

[1,2,3,4,5,6]