I think this is quite a clumsy way to handle int and string concatenation.
Even Java does not need explicit casting to String to do this sort
of concatenation.
Is there a better way to do this sort of concatenation, i.e. without explicit casting, in Python?
String formatting, using the new-style .format() method (with the defaults .format() provides):
'{}{}'.format(s, i)
Or the older, but “still sticking around”, %-formatting:
'%s%d' % (s, i)
In both examples above there’s no space between the two items concatenated. If space is needed, it can simply be added in the format strings.
These provide a lot of control and flexibility about how to concatenate items, the space between them etc. For details about format specifications see this.
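Since Python 3.6 there is also a third option, f-strings, which the answers above predate; a minimal sketch:

s = "string"
i = 0
print(f"{s}{i}")   # -> string0
print(f"{s} {i}")  # a space in the template adds a space in the output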
Python is an interesting language in that while there is usually one (or two) “obvious” ways to accomplish any given task, flexibility still exists.
s = "string"
i = 0
print(s + repr(i))
The above snippet is written in Python 3 syntax, but parentheses after print were always allowed (optional) until version 3 made them mandatory.
For your case the only difference is performance: append is twice as fast.
Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.20177424499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.41192320500000079
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.23079359499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.44208112500000141
In the general case, append will add one item to the list, while += will copy all elements of the right-hand-side list into the left-hand-side list.

Update: performance analysis

Comparing bytecodes, we can assume that the append version wastes cycles in LOAD_ATTR + CALL_FUNCTION, and the += version in BUILD_LIST. Apparently BUILD_LIST outweighs LOAD_ATTR + CALL_FUNCTION.
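To reproduce that comparison yourself, a small sketch using the standard dis module (the exact opcode names vary across Python versions):

import dis

def use_append(s):
    s.append("something")   # compiles to an attribute lookup plus a call

def use_iadd(s):
    s += ["something"]      # compiles to BUILD_LIST plus an in-place add

dis.dis(use_append)
dis.dis(use_iadd)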
In the example you gave, there is no difference, in terms of output, between append and +=. But there is a difference between append and + (which the question originally asked about).
>>> a = []
>>> id(a)
11814312
>>> a.append("hello")
>>> id(a)
11814312
>>> b = []
>>> id(b)
11828720
>>> c = b + ["hello"]
>>> id(c)
11833752
>>> b += ["hello"]
>>> id(b)
11828720
As you can see, append and += have the same result; they add the item to the list, without producing a new list. Using + adds the two lists and produces a new list.
Answer 2
>>> a = []
>>> a.append([1, 2])
>>> a
[[1, 2]]
>>> a = []
>>> a += [1, 2]
>>> a
[1, 2]
+= is an assignment. When you use it you're really saying some_list2 = some_list2 + ['something']. Assignments involve rebinding, so:
l = []

def a1(x):
    l.append(x)  # works

def a2(x):
    l = l + [x]  # assignment makes l local, so the attempt to read l
                 # for the addition raises UnboundLocalError

def a3(x):
    l += [x]  # fails for the same reason
The += operator should also normally create a new list object, like list + list normally does:
>>> l1 = []
>>> l2 = l1
>>> l1.append('x')
>>> l1 is l2
True
>>> l1 = l1 + ['x']
>>> l1 is l2
False
This is because Python lists implement __iadd__() to make a += augmented assignment short-circuit and call list.extend() instead. (This is a bit of a strange wart: it usually does what you meant, but for confusing reasons.)
In general, if you're appending to or extending an existing list and you want to keep the reference to the same list (instead of making a new one), it's best to be explicit and stick with the append()/extend() methods.
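To round out the identity demo above, += itself keeps the identity too, precisely because of that __iadd__ short-circuit (a small sketch):

l1 = []
l2 = l1
l1 += ['x']        # extends in place via __iadd__
print(l1 is l2)    # True -- unlike l1 = l1 + ['x'] above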
Answer 4
some_list2 += ["something"]
is actually equivalent to
some_list2.extend(["something"])
For a single value there is no difference. The documentation states:
s.append(x) — same as s[len(s):len(s)] = [x]
s.extend(x) — same as s[len(s):len(s)] = x
list1 + ['5', '6'] adds '5' and '6' to list1 as individual elements. list1.append(['5', '6']) adds the list ['5', '6'] to list1 as a single element.
Answer 8
The rebinding behaviour mentioned in other answers does matter in certain circumstances:
>>> a = ([],[])
>>> a[0].append(1)
>>> a
([1], [])
>>> a[1] += [1]
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
That’s because augmented assignment always rebinds, even if the object was mutated in-place. The rebinding here happens to be a[1] = *mutated list*, which doesn’t work for tuples.
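A related gotcha worth knowing (a small sketch; not part of the answer above): the in-place mutation still happens before the failed rebinding, so the list inside the tuple ends up extended even though the statement raised:

a = ([], [])
try:
    a[1] += [1]    # extends the inner list, then fails to rebind a[1]
except TypeError:
    pass
print(a)           # ([], [1]) -- the mutation survived the exception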
Answer 9
Let's take an example first:
list1 = [1, 2, 3, 4]
list2 = list1  # both names now point to the same object
If we do
list1 = list1 + [5]  # a new list object is created
print(list1)  # output: [1, 2, 3, 4, 5]
print(list2)  # output: [1, 2, 3, 4]
but if we append:
list1.append(5)  # no new list object is created
print(list1)  # output: [1, 2, 3, 4, 5]
print(list2)  # output: [1, 2, 3, 4, 5]
extend(list) does the same work as append; it just appends a list instead of a single value.
I have a list of 20 file names, like ['file1.txt', 'file2.txt', ...]. I want to write a Python script to concatenate these files into a new file. I could open each file with f = open(...), read line by line by calling f.readline(), and write each line into the new file. It doesn't seem very "elegant" to me, especially the part where I have to read/write line by line.
Is there a more “elegant” way to do this in Python?
Answer 0
This should do it.

For large files:
filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
For small files:
filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
… and another interesting one that I thought of:
import itertools

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for line in itertools.chain.from_iterable(itertools.imap(open, filenames)):
        outfile.write(line)
# note: itertools.imap is Python 2 only; on Python 3 use the built-in map()
Sadly, this last method leaves a few open file descriptors, which the GC should take care of anyway. I just thought it was interesting.
Use shutil.copyfileobj. It automatically reads the input files chunk by chunk for you, which is more efficient than reading whole files into memory, and it will work even if some of the input files are too large to fit into memory:
import shutil

with open('output_file.txt', 'wb') as wfd:
    for f in ['seg1.txt', 'seg2.txt', 'seg3.txt']:
        with open(f, 'rb') as fd:
            shutil.copyfileobj(fd, wfd)
import fileinput

with open(outfilename, 'w') as fout, fileinput.input(filenames) as fin:
    for line in fin:
        fout.write(line)
For this use case, it’s really not much simpler than just iterating over the files manually, but in other cases, having a single iterator that iterates over all of the files as if they were a single file is very handy. (Also, the fact that fileinput closes each file as soon as it’s done means there’s no need to with or close each one, but that’s just a one-line savings, not that big of a deal.)
There are some other nifty features in fileinput, like the ability to do in-place modifications of files just by filtering each line.
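For instance, a minimal sketch of that in-place feature, assuming a hypothetical example.txt (with inplace=True, stdout is redirected into the file being processed):

import fileinput

for line in fileinput.input("example.txt", inplace=True):
    print(line.rstrip("\n").upper())  # each printed line replaces the original line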
As noted in the comments, and discussed in another post, fileinput for Python 2.7 will not work as indicated. Here is a slight modification to make the code Python 2.7 compliant:
import fileinput

with open('outfilename', 'w') as fout:
    fin = fileinput.input(filenames)
    for line in fin:
        fout.write(line)
    fin.close()
Answer 3
I don't know about elegant, but this works:
import glob
import os

for f in glob.glob("file*.txt"):
    os.system("cat " + f + " >> OutFile.txt")
An alternative to @inspectorG4dget's answer (the best answer to date, 29-03-2016). I tested with 3 files of 436 MB each.
@inspectorG4dget's solution: 162 seconds
The following solution: 125 seconds
from subprocess import Popen

filenames = ['file1.txt', 'file2.txt', 'file3.txt']
fbatch = open('batch.bat', 'w')
str = "type "
for f in filenames:
    str += f + " "
fbatch.write(str + " > file4results.txt")
fbatch.close()
p = Popen("batch.bat", cwd=r"Drive:\Path\to\folder")
stdout, stderr = p.communicate()
The idea is to create a batch file and execute it, taking advantage of "good old technology". It's semi-Python, but it works faster. It works on Windows.
Answer 7
If you have a lot of files in the directory then glob2 might be a better option to generate a list of filenames rather than writing them by hand.
import glob2

filenames = glob2.glob('*.txt')  # list of all .txt files in the directory
with open('outfile.txt', 'w') as f:
    for file in filenames:
        with open(file) as infile:
            f.write(infile.read() + '\n')
with open('newfile.txt', 'wb') as newf:
    for filename in list_of_files:
        with open(filename, 'rb') as hf:
            newf.write(hf.read())
            # newf.write(b'\n\n\n') if you want to introduce
            # some blank lines between the contents of the copied files
If the files are too big to be entirely read and held in RAM, the algorithm must be a little different: read each file in a loop in fixed-length chunks, using read(10000) for example.
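A minimal sketch of that chunked variant, reusing the list_of_files name from above and the suggested read(10000):

with open('newfile.txt', 'wb') as newf:
    for filename in list_of_files:
        with open(filename, 'rb') as hf:
            while True:
                chunk = hf.read(10000)  # fixed-size chunks keep memory bounded
                if not chunk:
                    break
                newf.write(chunk)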
Answer 10
import os

def concatFiles():
    path = 'input/'
    files = os.listdir(path)
    for idx, infile in enumerate(files):
        print("File #" + str(idx) + " " + infile)
    concat = ''.join([open(path + f).read() for f in files])
    with open("output_concatFile.txt", "w") as fo:
        fo.write(concat)

if __name__ == "__main__":
    concatFiles()
Answer 11
import os

files = os.listdir()
print(files)
print('#', tuple(files))
name = input('Enter the inclusive file name: ')
exten = input('Enter the type(extension): ')
filename = name + '.' + exten
output_file = open(filename, 'w+')
for i in files:
    print(i)
    f_j = open(i, 'r')
    print(f_j.read())
    f_j.seek(0)  # rewind; read() above consumed the whole file
    for x in f_j:
        output_file.write(x)
I have two simple one-dimensional arrays in NumPy. I should be able to concatenate them using numpy.concatenate. But I get this error for the code below:
TypeError: only length-1 arrays can be converted to Python scalars
Code
import numpy
a = numpy.array([1, 2, 3])
b = numpy.array([5, 6])
numpy.concatenate(a, b)
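The fix (not shown in the question) is to pass both arrays as a single sequence, since numpy.concatenate takes a sequence of arrays as its first argument:

import numpy

a = numpy.array([1, 2, 3])
b = numpy.array([5, 6])
numpy.concatenate((a, b))  # -> array([1, 2, 3, 5, 6])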
An alternative is to use the short form of "concatenate", which is either r_[...] or c_[...], as shown in the example code beneath (see http://wiki.scipy.org/NumPy_for_Matlab_Users for additional information):
%pylab
vector_a = r_[0.:10.] #short form of "arange"
vector_b = array([1,1,1,1])
vector_c = r_[vector_a,vector_b]
print vector_a
print vector_b
print vector_c, '\n\n'
a = ones((3,4))*4
print a, '\n'
c = array([1,1,1])
b = c_[a,c]
print b, '\n\n'
a = ones((4,3))*4
print a, '\n'
c = array([[1,1,1]])
b = r_[a,c]
print b
print type(vector_b)
# we'll utilize the concept of unpacking
In [15]: (*a, *b)
Out[15]: (1, 2, 3, 5, 6)

# using `numpy.ravel()`
In [14]: np.ravel((*a, *b))
Out[14]: array([1, 2, 3, 5, 6])

# wrap the unpacked elements in `numpy.array()`
In [16]: np.array((*a, *b))
Out[16]: array([1, 2, 3, 5, 6])

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])

# Appending below the last row
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])

# Appending after the last column
>>> np.concatenate((a, b.T), axis=1)  # Notice the transpose
array([[1, 2, 5],
       [3, 4, 6]])

# Flattening the final array
>>> np.concatenate((a, b), axis=None)
array([1, 2, 3, 4, 5, 6])
I would like to read several csv files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far:
import glob
import pandas as pd

# get data file names
path = r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))

# Concatenate all data into one DataFrame
big_frame = pd.concat(dfs, ignore_index=True)
If you have the same columns in all your csv files, you can try the code below.
I have added header=0 so that after reading the csv, the first row can be assigned as the column names.
import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")

li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
import os
import glob
import pandas as pd

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(os.path.join(path, "*.csv")) # advisable to use os.path.join as this makes concatenation OS independent
df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
# doesn't create a list, nor does it append to one
Answer 2
import glob, os
import pandas as pd

df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "my_files*.csv"))))
The Dask dataframes implement a subset of the Pandas dataframe API. If all the data fits into memory, you can call df.compute() to convert the dataframe into a Pandas dataframe.
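The Dask code itself is not shown above; a minimal sketch, assuming the third-party dask library is installed and a hypothetical data*.csv glob:

import dask.dataframe as dd

df = dd.read_csv('data*.csv')  # lazy, out-of-core read of all matching files
pandas_df = df.compute()       # materialize as a Pandas DataFrame if it fits in memory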
Almost all of the answers here are either unnecessarily complex (glob pattern matching) or rely on additional 3rd party libraries. You can do this in 2 lines using everything Pandas and python (all versions) already have built in.
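The two lines themselves are not included above; a plausible sketch of the built-in approach being described, with hypothetical file names:

import pandas as pd

df = pd.concat(map(pd.read_csv, ['d1.csv', 'd2.csv', 'd3.csv']))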
Edit: I googled my way into https://stackoverflow.com/a/21232849/186078.
However, of late I am finding it faster to do any manipulation using numpy and then assigning it to a dataframe once, rather than manipulating the dataframe itself iteratively, and it seems to work in this solution too.
I sincerely want anyone hitting this page to consider this approach, but I don't want to attach this huge piece of code as a comment and make it less readable.
You can leverage numpy to really speed up the dataframe concatenation.
import os
import glob
import pandas as pd
import numpy as np

path = "my_dir_full_path"
allFiles = glob.glob(os.path.join(path, "*.csv"))

np_array_list = []
for file_ in allFiles:
    df = pd.read_csv(file_, index_col=None, header=0)
    np_array_list.append(df.as_matrix())

comb_np_array = np.vstack(np_array_list)
big_frame = pd.DataFrame(comb_np_array)
big_frame.columns = ["col1", "col2"....]
Timing stats:
total files: 192
avg lines per file: 8492
-- approach 1 without numpy -- 8.248656988143921 seconds ---
total records old: 1630571
-- approach 2 with numpy -- 2.289292573928833 seconds ---
Answer 6
If you want to search recursively (Python 3.5 or above), you can do the following:
from glob import iglob
import pandas as pd
path = r'C:\user\your\path\**\*.csv'
all_rec = iglob(path, recursive=True)
dataframes = (pd.read_csv(f) for f in all_rec)
big_dataframe = pd.concat(dataframes, ignore_index=True)
Note that the three last lines can be expressed in one single line:
df = pd.concat((pd.read_csv(f) for f in iglob(path, recursive=True)), ignore_index=True)
You can find the documentation of ** here. Also, I used iglob instead of glob, as it returns an iterator instead of a list.
EDIT: Multiplatform recursive function:
You can wrap the above into a multiplatform function (Linux, Windows, Mac), so you can do:
df = read_df_rec(r'C:\user\your\path', '*.csv')
Here is the function:
from glob import iglob
from os.path import join
import pandas as pd
def read_df_rec(path, fn_regex=r'*.csv'):
    return pd.concat((pd.read_csv(f) for f in iglob(
        join(path, '**', fn_regex), recursive=True)), ignore_index=True)
Before concatenating, you can load csv files into an intermediate dictionary which gives access to each data set based on the file name (in the form dict_of_df['filename.csv']). Such a dictionary can help you identify issues with heterogeneous data formats, when column names are not aligned for example.
Import modules and locate file paths:
import os
import glob
import pandas
from collections import OrderedDict
path = r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")
Note: OrderedDict is not necessary,
but it’ll keep the order of files which might be useful for analysis.
Load csv files into a dictionary. Then concatenate:
dict_of_df = OrderedDict((f, pandas.read_csv(f)) for f in filenames)
pandas.concat(dict_of_df, sort=True)
Keys are file names f and values are the data frame content of csv files.
Instead of using f as a dictionary key, you can also use os.path.basename(f) or other os.path methods to reduce the size of the key in the dictionary to only the smaller part that is relevant.
Alternative using the pathlib library (often preferred over os.path).
This method avoids iterative use of pandas concat()/append().
From the pandas documentation: It is worth noting that concat() (and therefore append()) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
import pandas as pd
from pathlib import Path
dir = Path("../relevant_directory")
df = (pd.read_csv(f) for f in dir.glob("*.csv"))
df = pd.concat(df)
Answer 13
Here is how to do it using Colab on Google Drive:
import pandas as pd
import glob

path = r'/content/drive/My Drive/data/actual/comments_only' # use your path
all_files = glob.glob(path + "/*.csv")

li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True, sort=True)
frame.to_csv('/content/drive/onefile.csv')
It’s also possible to create a generator that simply iterates over the items in both lists using itertools.chain(). This allows you to chain lists (or any iterable) together for processing without copying the items to a new list:
import itertools

for item in itertools.chain(listone, listtwo):
    ...  # Do something with each list item
Another alternative has been introduced via the acceptance of PEP 448 which deserves mentioning.
The PEP, titled Additional Unpacking Generalizations, generally reduced some syntactic restrictions when using the starred * expression in Python; with it, joining two lists (applies to any iterable) can now also be done with:
>>> l1 = [1, 2, 3]
>>> l2 = [4, 5, 6]
>>> joined_list = [*l1, *l2] # unpack both iterables in a list literal
>>> print(joined_list)
[1, 2, 3, 4, 5, 6]
This functionality was defined for Python 3.5; it hasn't been backported to previous versions in the 3.x family. In unsupported versions a SyntaxError will be raised.
As with the other approaches, this too creates a shallow copy of the elements in the corresponding lists.
The upside to this approach is that you really don’t need lists in order to perform it, anything that is iterable will do. As stated in the PEP:
This is also useful as a more readable way of summing iterables into a
list, such as my_list + list(my_tuple) + list(my_range) which is now
equivalent to just [*my_list, *my_tuple, *my_range].
So while addition with + would raise a TypeError due to type mismatch:
l = [1, 2, 3]
r = range(4, 7)
res = l + r
The following won’t:
res = [*l, *r]
because it will first unpack the contents of the iterables and then simply create a list from the contents.
As of 3.7, these are the most popular stdlib methods for concatenating two (or more) lists in Python.
Footnotes
This is a slick solution because of its succinctness. But sum performs concatenation in a pairwise fashion, which means this is a quadratic operation, as memory has to be allocated for each step. DO NOT USE if your lists are large.
See chain and chain.from_iterable from the docs. You will need to import itertools first. Concatenation is linear in memory, so this is the best in terms of performance and version compatibility. chain.from_iterable was introduced in 2.6.
a += b and a.extend(b) are more or less equivalent for all practical purposes. += when called on a list will internally call
list.__iadd__, which extends the first list by the second.
Performance
2-List Concatenation¹ [benchmark plot]
There’s not much difference between these methods but that makes sense given they all have the same order of complexity (linear). There’s no particular reason to prefer one over the other except as a matter of style.
1. The iadd (+=) and extend methods operate in-place, so a copy has to be generated each time before testing. To keep things fair, all methods have a pre-copy step for the left-hand list which can be ignored.
Comments on Other Solutions
DO NOT USE THE DUNDER METHOD list.__add__ directly in any way, shape or form. In fact, stay clear of dunder methods, and use the operators and operator functions like they were designed for. Python has careful semantics baked into these which are more complicated than just calling the dunder directly. Here is an example. So, to summarise, a.__add__(b) => BAD; a + b => GOOD.
Some answers here offer reduce(operator.add, [a, b]) for pairwise concatenation — this is the same as sum([a, b], []) only more wordy.
Any method that uses set will drop duplicates and lose ordering. Use with caution.
for i in b: a.append(i) is more wordy and slower than a.extend(b), which is a single function call and more idiomatic. append is slower because of the semantics with which memory is allocated and grown for lists. See here for a similar discussion.
heapq.merge will work, but its use case is for merging sorted lists in linear time. Using it in any other situation is an anti-pattern.
yielding list elements from a function is an acceptable method, but chain does this faster and better (it has a code path in C, so it is fast).
operator.add(a, b) is an acceptable functional equivalent to a + b. Its use cases are mainly for dynamic method dispatch. Otherwise, prefer a + b, which is shorter and more readable, in my opinion. YMMV.
This question directly asks about joining two lists. However, it's pretty high in search results even when you are looking for a way of joining many lists (including the case when you're joining zero lists).
I think the best option is to use list comprehensions:
>>> a = [[1,2,3], [4,5,6], [7,8,9]]
>>> [x for xs in a for x in xs]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
You can create generators as well:
>>> list(map(str, (x for xs in a for x in xs)))
['1', '2', '3', '4', '5', '6', '7', '8', '9']
Old Answer
Consider this more generic approach:
from functools import reduce  # built in in Python 2

a = [[1,2,3], [4,5,6], [7,8,9]]
reduce(lambda c, x: c + x, a, [])
Will output:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Note, this also works correctly when a is [] or [[1,2,3]].
However, this can be done more efficiently with itertools:
import itertools

a = [[1,2,3], [4,5,6], [7,8,9]]
list(itertools.chain(*a))
If you don’t need a list, but just an iterable, omit list().
Update
An alternative suggested by Patrick Collins in the comments could also work for you.
You could simply use the + or += operator as follows:
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b
Or:
c = []
a = [1, 2, 3]
b = [4, 5, 6]
c += (a + b)
Also, if you want the values in the merged list to be unique you can do:
c = list(set(a + b))
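Note that set drops ordering as well as duplicates. If you want unique values while preserving order, a common sketch (relying on dicts preserving insertion order, guaranteed since Python 3.7) is:

c = list(dict.fromkeys(a + b))  # order-preserving de-duplication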
Answer 9
It is worth noting that the itertools.chain function accepts a variable number of arguments:

>>> l1 = ['a']; l2 = ['b', 'c']; l3 = ['d', 'e', 'f']
>>> [i for i in itertools.chain(l1, l2)]
['a', 'b', 'c']
>>> [i for i in itertools.chain(l1, l2, l3)]
['a', 'b', 'c', 'd', 'e', 'f']

If the input is a single iterable of iterables (a tuple, list, generator, etc.), use the chain.from_iterable classmethod:

>>> il = [['a'], ['b', 'c'], ['d', 'e', 'f']]
>>> [i for i in itertools.chain.from_iterable(il)]
['a', 'b', 'c', 'd', 'e', 'f']
As a more general way to handle more lists, you can put them within a list and use the itertools.chain.from_iterable()¹ function, which, based on this answer, is the best way to flatten a nested list.
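The code that originally followed is not shown; a minimal sketch of the approach just described:

import itertools

lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # put the lists within a list
merged = list(itertools.chain.from_iterable(lists))
# merged -> [1, 2, 3, 4, 5, 6, 7, 8, 9]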
If you need to merge two ordered lists with complicated sorting rules, you might have to roll it yourself like in the following code (using a simple sorting rule for readability :-) ).
list1 = [1, 2, 5]
list2 = [2, 3, 4]
newlist = []

while list1 and list2:
    if list1[0] == list2[0]:
        newlist.append(list1.pop(0))
        list2.pop(0)
    elif list1[0] < list2[0]:
        newlist.append(list1.pop(0))
    else:
        newlist.append(list2.pop(0))

if list1:
    newlist.extend(list1)
if list2:
    newlist.extend(list2)

assert(newlist == [1, 2, 3, 4, 5])
Answer 15
You can use the append() method defined on list objects:
mergedlist = []
for elem in listone:
    mergedlist.append(elem)
for elem in listtwo:
    mergedlist.append(elem)
As already pointed out by many, itertools.chain() is the way to go if one needs to apply exactly the same treatment to both lists. In my case, I had a label and a flag which were different from one list to the other, so I needed something slightly more complex. As it turns out, behind the scenes itertools.chain() simply does the following:
for it in iterables:
    for element in it:
        yield element
for iterable, header, flag in ((newList, 'New', ''), (modList, 'Modified', '-f')):
    print header + ':'
    for path in iterable:
        [...]
        command = 'cp -r' if os.path.isdir(srcPath) else 'cp'
        print >> SCRIPT, command, flag, srcPath, mergedDirPath
        [...]
The main points to understand here are that lists are just a special case of iterable, which are objects like any other; and that for ... in loops in python can work with tuple variables, so it is simple to loop on multiple variables at the same time.
Answer 18
Using a simple list comprehension:
joined_list = [item for list_ in [list_one, list_two] for item in list_]
It has all the advantages of the newest approach of using Additional Unpacking Generalizations – i.e. you can concatenate an arbitrary number of different iterables (for example, lists, tuples, ranges, and generators) that way – and it’s not limited to Python 3.5 or later.
import itertools

A = list(zip([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]))
B = [1, 3, 5, 7, 9] + [2, 4, 6, 8, 10]
C = list(set([1, 3, 5, 7, 9] + [2, 4, 6, 8, 10]))
D = [1, 3, 5, 7, 9]
D.append([2, 4, 6, 8, 10])
E = [1, 3, 5, 7, 9]
E.extend([2, 4, 6, 8, 10])
F = []
for a in itertools.chain([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]):
    F.append(a)

print("A: " + str(A))
print("B: " + str(B))
print("C: " + str(C))
print("D: " + str(D))
print("E: " + str(E))
print("F: " + str(F))
If you wanted a new list whilst keeping the two old lists:
def concatenate_list(listOne, listTwo):
    joinedList = []
    for i in listOne:
        joinedList.append(i)
    for j in listTwo:
        joinedList.append(j)
    joinedList.sort()  # sorted() alone would discard its result
    return joinedList