The + operation adds the array elements to the original array. The array.append operation inserts the array (or any object) into the end of the original array, which results in a reference to self in that spot (hence the infinite recursion).
The difference here is that the + operation acts specific when you add an array (it’s overloaded like others, see this chapter on sequences) by concatenating the element. The append-method however does literally what you ask: append the object on the right-hand side that you give it (the array or any other object), instead of taking its elements.
An alternative
Use extend() if you want to use a function that acts similar to the + operator (as others have shown here as well). It’s not wise to do the opposite: to try to mimic append with the + operator for lists (see my earlier link on why).
Little history
For fun, a little history: the birth of the array module in Python in February 1993. it might surprise you, but arrays were added way after sequences and lists came into existence.
The concatenation operator + is a binary infix operator which, when applied to lists, returns a new list containing all the elements of each of its two operands. The list.append() method is a mutator on list which appends its single object argument (in your specific example the list c) to the subject list. In your example this results in c appending a reference to itself (hence the infinite recursion).
An alternative to ‘+’ concatenation
The list.extend() method is also a mutator method which concatenates its sequence argument with the subject list. Specifically, it appends each of the elements of sequence in iteration order.
An aside
Being an operator, + returns the result of the expression as a new value. Being a non-chaining mutator method, list.extend() modifies the subject list in-place and returns nothing.
Arrays
I’ve added this due to the potential confusion which the Abel’s answer above may cause by mixing the discussion of lists, sequences and arrays.Arrays were added to Python after sequences and lists, as a more efficient way of storing arrays of integral data types. Do not confuse arrays with lists. They are not the same.
Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained. The type is specified at object creation time by using a type code, which is a single character.
Python lists are heterogeneous that is the elements in the same list can be any type of object. The expression: c.append(c) appends the object c what ever it may be to the list. In the case it makes the list itself a member of the list.
The expression c += c adds two lists together and assigns the result to the variable c. The overloaded + operator is defined on lists to create a new list whose contents are the elements in the first list and the elements in the second list.
So these are really just different expressions used to do different things by design.
list.append(x)Add an item to the end of the list; equivalent to a[len(a):]=[x].
list.extend(L)Extend the list by appending all the items in the given list; equivalent to a[len(a):]= L.
list.insert(i, x)Insert an item at a given position.The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list,and a.insert(len(a), x)is equivalent to a.append(x).
The method you’re looking for is extend(). From the Python documentation:
list.append(x)
Add an item to the end of the list; equivalent to a[len(a):] = [x].
list.extend(L)
Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.
list.insert(i, x)
Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x).
So I have initialized an empty pandas DataFrame and I would like to iteratively append lists (or Series) as rows in this DataFrame. What is the best way of doing this?
回答 0
有时,在熊猫之外进行所有附加操作会更容易,然后只需创建DataFrame即可。
>>>import pandas as pd
>>> simple_list=[['a','b']]>>> simple_list.append(['e','f'])>>> df=pd.DataFrame(simple_list,columns=['col1','col2'])
col1 col2
0 a b
1 e f
Sometimes it’s easier to do all the appending outside of pandas, then, just create the DataFrame in one shot.
>>> import pandas as pd
>>> simple_list=[['a','b']]
>>> simple_list.append(['e','f'])
>>> df=pd.DataFrame(simple_list,columns=['col1','col2'])
col1 col2
0 a b
1 e f
In[1]:import pandas as pd
In[2]: df = pd.DataFrame()In[3]: row=pd.Series([1,2,3],["A","B","C"])In[4]: row
Out[4]:
A 1
B 2
C 3
dtype: int64
In[5]: df.append([row],ignore_index=True)Out[5]:
A B C
0123[1 rows x 3 columns]
If you want to add a Series and use the Series’ index as columns of the DataFrame, you only need to append the Series between brackets:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame()
In [3]: row=pd.Series([1,2,3],["A","B","C"])
In [4]: row
Out[4]:
A 1
B 2
C 3
dtype: int64
In [5]: df.append([row],ignore_index=True)
Out[5]:
A B C
0 1 2 3
[1 rows x 3 columns]
Whitout the ignore_index=True you don’t get proper index.
import pandas as pd
import numpy as np
def addRow(df,ls):"""
Given a dataframe and a list, append the list as a new row to the dataframe.
:param df: <DataFrame> The original dataframe
:param ls: <list> The new row to be added
:return: <DataFrame> The dataframe with the newly appended row
"""
numEl = len(ls)
newRow = pd.DataFrame(np.array(ls).reshape(1,numEl), columns = list(df.columns))
df = df.append(newRow, ignore_index=True)return df
Here’s a function that, given an already created dataframe, will append a list as a new row. This should probably have error catchers thrown in, but if you know exactly what you’re adding then it shouldn’t be an issue.
import pandas as pd
import numpy as np
def addRow(df,ls):
"""
Given a dataframe and a list, append the list as a new row to the dataframe.
:param df: <DataFrame> The original dataframe
:param ls: <list> The new row to be added
:return: <DataFrame> The dataframe with the newly appended row
"""
numEl = len(ls)
newRow = pd.DataFrame(np.array(ls).reshape(1,numEl), columns = list(df.columns))
df = df.append(newRow, ignore_index=True)
return df
open() returns a file object, and is most commonly used with two arguments: open(filename, mode).
>>> f = open('workfile', 'w')
>>> print f <open file 'workfile', mode 'w' at 80a0960>
The first argument is a string containing the filename. The second argument is
another string containing a few characters describing the way in which
the file will be used. mode can be ‘r’ when the file will only be
read, ‘w’ for only writing (an existing file with the same name will
be erased), and ‘a’ opens the file for appending; any data written to
the file is automatically added to the end. ‘r+’ opens the file for
both reading and writing. The mode argument is optional; ‘r’ will be
assumed if it’s omitted.
On Windows, ‘b’ appended to the mode opens the file in binary mode, so
there are also modes like ‘rb’, ‘wb’, and ‘r+b’. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a ‘b’ to
the mode, so you can use it platform-independently for all binary
files.
If the file exists and contains data, then it is possible to generate the fieldname parameter for csv.DictWriter automatically:
# read header automatically
with open(myFile, "r") as f:
reader = csv.reader(f)
for header in reader:
break
# add row to CSV file
with open(myFile, "a", newline='') as f:
writer = csv.DictWriter(f, fieldnames=header)
writer.writerow(myDict)
回答 5
# I like using the codecs opening in a with
field_names =['latitude','longitude','date','user','text']with codecs.open(filename,"ab", encoding='utf-8')as logfile:
logger = csv.DictWriter(logfile, fieldnames=field_names)
logger.writeheader()# some more code stuff for video in aList:
video_result ={}
video_result['date']= video['snippet']['publishedAt']
video_result['user']= video['id']
video_result['text']= video['snippet']['description'].encode('utf8')
logger.writerow(video_result)
# I like using the codecs opening in a with
field_names = ['latitude', 'longitude', 'date', 'user', 'text']
with codecs.open(filename,"ab", encoding='utf-8') as logfile:
logger = csv.DictWriter(logfile, fieldnames=field_names)
logger.writeheader()
# some more code stuff
for video in aList:
video_result = {}
video_result['date'] = video['snippet']['publishedAt']
video_result['user'] = video['id']
video_result['text'] = video['snippet']['description'].encode('utf8')
logger.writerow(video_result)
If we just do x.append(y), y gets referenced into x such that any changes made to y will affect appended x as well. So if we need to insert only elements, we should do following:
Use update to add elements from tuples, sets, lists or frozen-sets
a.update([3,4])
>> print(a)
{1, 2, 3, 4}
If you want to add a tuple or frozen-set itself, use add
a.add((5, 6))
>> print(a)
{1, 2, 3, 4, (5, 6)}
Note: Since set elements must be hashable, and lists are considered mutable, you cannot add a list to a set. You also cannot add other sets to a set. You can however, add the elements from lists and sets as demonstrated with the “.update” method.
This question is the first one that shows up on Google when one looks up “Python how to add elements to set”, so it’s worth noting explicitly that, if you want to add a whole string to a set, it should be added with .add(), not .update().
Say you have a string foo_str whose contents are 'this is a sentence', and you have some set bar_set equal to set().
If you do
bar_set.update(foo_str), the contents of your set will be {'t', 'a', ' ', 'e', 's', 'n', 'h', 'c', 'i'}.
If you do bar_set.add(foo_str), the contents of your set will be {'this is a sentence'}.
The way I like to do this is to convert both the original set and the values I’d like to add into lists, add them, and then convert them back into a set, like this:
This way I can also easily add multiple sets using the same logic, which gets me an TypeError: unhashable type: 'set' if I try doing it with the .update() method.
voidPyBytes_ConcatAndDel(registerPyObject**pv,registerPyObject*w){PyBytes_Concat(pv, w);Py_XDECREF(w);}/* The following function breaks the notion that strings are immutable:
it changes the size of a string. We get away with this only if there
is only one module referencing the object. You can also think of it
as creating a new string object and destroying the old one, only
more efficiently. In any case, don't use this if the string may
already be known to some other part of the code...
Note that if there's not enough memory to resize the string, the original
string object at *pv is deallocated, *pv is set to NULL, an "out of
memory" exception is set, and -1 is returned. Else (on success) 0 is
returned, and the value in *pv may or may not be the same as on input.
As always, an extra byte is allocated for a trailing \0 byte (newsize
does *not* include that), and a trailing \0 byte is stored.
*/int_PyBytes_Resize(PyObject**pv,Py_ssize_t newsize){registerPyObject*v;registerPyBytesObject*sv;
v =*pv;if(!PyBytes_Check(v)||Py_REFCNT(v)!=1|| newsize <0){*pv =0;Py_DECREF(v);PyErr_BadInternalCall();return-1;}/* XXX UNREF/NEWREF interface should be more symmetrical */_Py_DEC_REFTOTAL;_Py_ForgetReference(v);*pv =(PyObject*)PyObject_REALLOC((char*)v,PyBytesObject_SIZE+ newsize);if(*pv == NULL){PyObject_Del(v);PyErr_NoMemory();return-1;}_Py_NewReference(*pv);
sv =(PyBytesObject*)*pv;Py_SIZE(sv)= newsize;
sv->ob_sval[newsize]='\0';
sv->ob_shash =-1;/* invalidate cached hash value */return0;}
凭经验进行验证很容易。
$ python -m timeit -s“ s =”“”对于xrange(10):s + ='a'
1000000次循环,每循环3:1.85最佳
$ python -m timeit -s“ s =”“”对于xrange(100):s + ='a'
10000次循环,最佳为3次:每个循环16.8微秒
$ python -m timeit -s“ s =”“”对于xrange(1000)中的我来说:s + ='a'“
10000次循环,最佳为3次:每个循环158微秒
$ python -m timeit -s“ s =”“”对于xrange(10000):s + ='a'
1000次循环,每循环3:1.71毫秒最佳
$ python -m timeit -s“ s =”“”对于xrange(100000):s + ='a'
10个循环,每循环最好3:14.6毫秒
$ python -m timeit -s“ s =”“”对于xrange(1000000):s + ='a'
10个循环,最佳3:每个循环173毫秒
If you only have one reference to a string and you concatenate another string to the end, CPython now special cases this and tries to extend the string in place.
The end result is that the operation is amortized O(n).
e.g.
s = ""
for i in range(n):
s+=str(i)
used to be O(n^2), but now it is O(n).
From the source (bytesobject.c):
void
PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
{
PyBytes_Concat(pv, w);
Py_XDECREF(w);
}
/* The following function breaks the notion that strings are immutable:
it changes the size of a string. We get away with this only if there
is only one module referencing the object. You can also think of it
as creating a new string object and destroying the old one, only
more efficiently. In any case, don't use this if the string may
already be known to some other part of the code...
Note that if there's not enough memory to resize the string, the original
string object at *pv is deallocated, *pv is set to NULL, an "out of
memory" exception is set, and -1 is returned. Else (on success) 0 is
returned, and the value in *pv may or may not be the same as on input.
As always, an extra byte is allocated for a trailing \0 byte (newsize
does *not* include that), and a trailing \0 byte is stored.
*/
int
_PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
{
register PyObject *v;
register PyBytesObject *sv;
v = *pv;
if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
*pv = 0;
Py_DECREF(v);
PyErr_BadInternalCall();
return -1;
}
/* XXX UNREF/NEWREF interface should be more symmetrical */
_Py_DEC_REFTOTAL;
_Py_ForgetReference(v);
*pv = (PyObject *)
PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
if (*pv == NULL) {
PyObject_Del(v);
PyErr_NoMemory();
return -1;
}
_Py_NewReference(*pv);
sv = (PyBytesObject *) *pv;
Py_SIZE(sv) = newsize;
sv->ob_sval[newsize] = '\0';
sv->ob_shash = -1; /* invalidate cached hash value */
return 0;
}
It’s easy enough to verify empirically.
$ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
1000000 loops, best of 3: 1.85 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
10000 loops, best of 3: 16.8 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
10000 loops, best of 3: 158 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 14.6 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
10 loops, best of 3: 173 msec per loop
It’s important however to note that this optimisation isn’t part of the Python spec. It’s only in the cPython implementation as far as I know. The same empirical testing on pypy or jython for example might show the older O(n**2) performance .
$ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
10000 loops, best of 3: 90.8 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
1000 loops, best of 3: 896 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
100 loops, best of 3: 9.03 msec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
10 loops, best of 3: 89.5 msec per loop
So far so good, but then,
$ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 12.8 sec per loop
ouch even worse than quadratic. So pypy is doing something that works well with short strings, but performs poorly for larger strings.
Don’t prematurely optimize. If you have no reason to believe there’s a speed bottleneck caused by string concatenations then just stick with + and +=:
s = 'foo'
s += 'bar'
s += 'baz'
That said, if you’re aiming for something like Java’s StringBuilder, the canonical Python idiom is to add items to a list and then use str.join to concatenate them all at the end:
l = []
l.append('foo')
l.append('bar')
l.append('baz')
s = ''.join(l)
That joins str1 and str2 with a space as separators. You can also do "".join(str1, str2, ...). str.join() takes an iterable, so you’d have to put the strings in a list or a tuple.
That’s about as efficient as it gets for a builtin method.
If you need to do many append operations to build a large string, you can use StringIO or cStringIO. The interface is like a file. ie: you write to append text to it.
If you’re just appending two strings then just use +.
it really depends on your application. If you’re looping through hundreds of words and want to append them all into a list, .join() is better. But if you’re putting together a long sentence, you’re better off using +=.
回答 7
基本上没有区别。唯一一致的趋势是,每个版本的Python似乎都变得越来越慢… :(
清单
%%timeit
x =[]for i in range(100000000):# xrange on Python 2.7
x.append('a')
x =''.join(x)
Python 2.7
1个循环,每循环3:7.34 s 最佳
Python 3.4
1个循环,每个循环最好3:7.99 s
Python 3.5
1次循环,每循环3:8.48 s 最佳
Python 3.6
1次循环,每循环3:9.93 s 最佳
串
%%timeit
x =''for i in range(100000000):# xrange on Python 2.7
x +='a'
I understand that pandas is designed to load fully populated DataFrame but I need to create an empty DataFrame then add rows, one by one.
What is the best way to do this ?
I successfully created an empty DataFrame with :
res = DataFrame(columns=('lib', 'qty1', 'qty2'))
Then I can add a new row and fill a field with :
res = res.set_value(len(res), 'qty1', 10.0)
It works but seems very odd :-/ (it fails for adding string value)
How can I add a new row to my DataFrame (with different columns type) ?
In case you can get all data for the data frame upfront, there is a much faster approach than appending to a data frame:
Create a list of dictionaries in which each dictionary corresponds to an input data row.
Create a data frame from this list.
I had a similar task for which appending to a data frame row by row took 30 min, and creating a data frame from a list of dictionaries completed within seconds.
rows_list = []
for row in input_rows:
dict1 = {}
# get input row in dictionary format
# key = col_name
dict1.update(blah..)
rows_list.append(dict1)
df = pd.DataFrame(rows_list)
Also thanks to @krassowski for useful comment – I updated the code.
So I use addition through the dictionary for myself.
Code:
import pandas as pd
import numpy as np
import time
del df1, df2, df3, df4
numOfRows = 1000
# append
startTime = time.perf_counter()
df1 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows-4):
df1 = df1.append( dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']), ignore_index=True)
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df1.shape)
# .loc w/o prealloc
startTime = time.perf_counter()
df2 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows):
df2.loc[i] = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df2.shape)
# .loc with prealloc
df3 = pd.DataFrame(index=np.arange(0, numOfRows), columns=['A', 'B', 'C', 'D', 'E'] )
startTime = time.perf_counter()
for i in range( 1,numOfRows):
df3.loc[i] = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df3.shape)
# dict
startTime = time.perf_counter()
row_list = []
for i in range (0,5):
row_list.append(dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']))
for i in range( 1,numOfRows-4):
dict1 = dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E'])
row_list.append(dict1)
df4 = pd.DataFrame(row_list, columns=['A','B','C','D','E'])
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df4.shape)
P.S. I believe, my realization isn’t perfect, and maybe there is some optimization.
回答 4
如果事先知道条目数,则应该通过提供索引来预分配空间(从另一个答案中获取数据示例):
import pandas as pd
import numpy as np
# we know we're gonna have 5 rows of data
numberOfRows =5# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib','qty1','qty2'))# now fill it up row by rowfor x in np.arange(0, numberOfRows):#loc or iloc both work here since the index is natural numbers
df.loc[x]=[np.random.randint(-1,1)for n in range(3)]In[23]: df
Out[23]:
lib qty1 qty2
0-1-1-110002-10-130-104-100
速度比较
In[30]:%timeit tryThis()# function wrapper for this answerIn[31]:%timeit tryOther()# function wrapper without index (see, for example, @fred)1000 loops, best of 3:1.23 ms per loop
100 loops, best of 3:2.31 ms per loop
If you know the number of entries ex ante, you should preallocate the space by also providing the index (taking the data example from a different answer):
import pandas as pd
import numpy as np
# we know we're gonna have 5 rows of data
numberOfRows = 5
# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2') )
# now fill it up row by row
for x in np.arange(0, numberOfRows):
#loc or iloc both work here since the index is natural numbers
df.loc[x] = [np.random.randint(-1,1) for n in range(3)]
In[23]: df
Out[23]:
lib qty1 qty2
0 -1 -1 -1
1 0 0 0
2 -1 0 -1
3 0 -1 0
4 -1 0 0
Speed comparison
In[30]: %timeit tryThis() # function wrapper for this answer
In[31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
1000 loops, best of 3: 1.23 ms per loop
100 loops, best of 3: 2.31 ms per loop
And – as from the comments – with a size of 6000, the speed difference becomes even larger:
Increasing the size of the array (12) and the number of rows (500) makes
the speed difference more striking: 313ms vs 2.29s
In[1]: se = pd.Series([1,2,3])In[2]: se
Out[2]:011223
dtype: int64
In[3]: se[5]=5.In[4]: se
Out[4]:01.012.023.055.0
dtype: float64
要么:
In[1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),.....: columns=['A','B']).....:In[2]: dfi
Out[2]:
A B
001123245In[3]: dfi.loc[:,'C']= dfi.loc[:,'A']In[4]: dfi
Out[4]:
A B C
001012322454In[5]: dfi.loc[3]=5In[6]: dfi
Out[6]:
A B C
0010123224543555
Add rows through loc/ix on non existing key index data. e.g. :
In [1]: se = pd.Series([1,2,3])
In [2]: se
Out[2]:
0 1
1 2
2 3
dtype: int64
In [3]: se[5] = 5.
In [4]: se
Out[4]:
0 1.0
1 2.0
2 3.0
5 5.0
dtype: float64
Or:
In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),
.....: columns=['A','B'])
.....:
In [2]: dfi
Out[2]:
A B
0 0 1
1 2 3
2 4 5
In [3]: dfi.loc[:,'C'] = dfi.loc[:,'A']
In [4]: dfi
Out[4]:
A B C
0 0 1 0
1 2 3 2
2 4 5 4
In [5]: dfi.loc[3] = 5
In [6]: dfi
Out[6]:
A B C
0 0 1 0
1 2 3 2
2 4 5 4
3 5 5 5
回答 7
您可以使用ignore_index选项将单行附加为字典。
>>> f = pandas.DataFrame(data ={'Animal':['cow','horse'],'Color':['blue','red']})>>> f
AnimalColor0 cow blue
1 horse red
>>> f.append({'Animal':'mouse','Color':'black'}, ignore_index=True)AnimalColor0 cow blue
1 horse red
2 mouse black
You can append a single row as a dictionary using the ignore_index option.
>>> f = pandas.DataFrame(data = {'Animal':['cow','horse'], 'Color':['blue', 'red']})
>>> f
Animal Color
0 cow blue
1 horse red
>>> f.append({'Animal':'mouse', 'Color':'black'}, ignore_index=True)
Animal Color
0 cow blue
1 horse red
2 mouse black
回答 8
为了Python的方式,在这里添加我的答案:
res = pd.DataFrame(columns=('lib','qty1','qty2'))
res = res.append([{'qty1':10.0}], ignore_index=True)print(res.head())
lib qty1 qty2
0NaN10.0NaN
res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
res = res.append([{'qty1':10.0}], ignore_index=True)
print(res.head())
lib qty1 qty2
0 NaN 10.0 NaN
回答 9
您还可以建立列表列表,并将其转换为数据框-
import pandas as pd
columns =['i','double','square']
rows =[]for i in range(6):
row =[i, i*2, i*i]
rows.append(row)
df = pd.DataFrame(rows, columns=columns)
This is not an answer to the OP question but a toy example to illustrate the answer of @ShikharDua above which I found very useful.
While this fragment is trivial, in the actual data I had 1,000s of rows, and many columns, and I wished to be able to group by different columns and then perform the stats below for more than one taget column. So having a reliable method for building the data frame one row at a time was a great convenience. Thank you @ShikharDua !
import pandas as pd
BaseData = pd.DataFrame({ 'Customer' : ['Acme','Mega','Acme','Acme','Mega','Acme'],
'Territory' : ['West','East','South','West','East','South'],
'Product' : ['Econ','Luxe','Econ','Std','Std','Econ']})
BaseData
columns = ['Customer','Num Unique Products', 'List Unique Products']
rows_list=[]
for name, group in BaseData.groupby('Customer'):
RecordtoAdd={} #initialise an empty dict
RecordtoAdd.update({'Customer' : name}) #
RecordtoAdd.update({'Num Unique Products' : len(pd.unique(group['Product']))})
RecordtoAdd.update({'List Unique Products' : pd.unique(group['Product'])})
rows_list.append(RecordtoAdd)
AnalysedData = pd.DataFrame(rows_list)
print('Base Data : \n',BaseData,'\n\n Analysed Data : \n',AnalysedData)
回答 11
想出了一种简单而又不错的方法:
>>> df
A B C
one 123>>> df.loc["two"]=[4,5,6]>>> df
A B C
one 123
two 456
>>> df
A B C
one 1 2 3
>>> df.loc["two"] = [4,5,6]
>>> df
A B C
one 1 2 3
two 4 5 6
回答 12
您可以使用生成器对象创建Dataframe,这将在列表上提高内存效率。
num =10# Generator function to generate generator objectdef numgen_func(num):for i in range(num):yield('name_{}'.format(i),(i*i),(i*i*i))# Generator expression to generate generator object (Only once data get populated, can not be re used)
numgen_expression =(('name_{}'.format(i),(i*i),(i*i*i))for i in range(num))
df = pd.DataFrame(data=numgen_func(num), columns=('lib','qty1','qty2'))
You can use generator object to create Dataframe, which will be more memory efficient over the list.
num = 10
# Generator function to generate generator object
def numgen_func(num):
for i in range(num):
yield ('name_{}'.format(i), (i*i), (i*i*i))
# Generator expression to generate generator object (Only once data get populated, can not be re used)
numgen_expression = (('name_{}'.format(i), (i*i), (i*i*i)) for i in range(num) )
df = pd.DataFrame(data=numgen_func(num), columns=('lib', 'qty1', 'qty2'))
To add raw to existing DataFrame you can use append method.
# current data
data ={"Animal":["cow","horse"],"Color":["blue","red"]}# adding a new row (be careful to ensure every column gets another value)
data["Animal"].append("mouse")
data["Color"].append("black")# at the end, construct our DataFrame
df = pd.DataFrame(data)# Animal Color# 0 cow blue# 1 horse red# 2 mouse black
Instead of a list of dictionaries as in ShikharDua’s answer, we can also represent our table as a dictionary of lists, where each list stores one column in row-order, given we know our columns beforehand. At the end we construct our DataFrame once.
For c columns and n rows, this uses 1 dictionary and c lists, versus 1 list and n dictionaries. The list of dictionaries method has each dictionary storing all keys and requires creating a new dictionary for every row. Here we only append to lists, which is constant time and theoretically very fast.
# current data
data = {"Animal":["cow", "horse"], "Color":["blue", "red"]}
# adding a new row (be careful to ensure every column gets another value)
data["Animal"].append("mouse")
data["Color"].append("black")
# at the end, construct our DataFrame
df = pd.DataFrame(data)
# Animal Color
# 0 cow blue
# 1 horse red
# 2 mouse black
回答 16
如果要在行末添加行,请将其添加为列表
valuestoappend =[va1,val2,val3]
res = res.append(pd.Series(valuestoappend,index =['lib','qty1','qty2']),ignore_index =True)
# add a rowdef add_row(df, row):
colnames = list(df.columns)
ncol = len(colnames)assert ncol == len(row),"Length of row must be the same as width of DataFrame: %s"% row
return df.append(pd.DataFrame([row], columns=colnames))
Another way to do it (probably not very performant):
# add a row
def add_row(df, row):
colnames = list(df.columns)
ncol = len(colnames)
assert ncol == len(row), "Length of row must be the same as width of DataFrame: %s" % row
return df.append(pd.DataFrame([row], columns=colnames))
You can also enhance the DataFrame class like this:
import pandas as pd
res = pd.DataFrame(columns=('lib','qty1','qty2'))for i in range(5):
res_list = list(map(int, input().split()))
res = res.append(pd.Series(res_list,index=['lib','qty1','qty2']), ignore_index=True)
Make it simple. By taking list as input which will be appended as row in data-frame:-
import pandas as pd
res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
for i in range(5):
res_list = list(map(int, input().split()))
res = res.append(pd.Series(res_list,index=['lib','qty1','qty2']), ignore_index=True)
回答 19
您需要的是loc[df.shape[0]]或loc[len(df)]
# Assuming your df has 4 columns (str, int, str, bool)
df.loc[df.shape[0]]=['col1Value',100,'col3Value',False]
We often see the construct df.loc[subscript] = … to assign to one DataFrame row. Mikhail_Sam posted benchmarks containing, among others, this construct as well as the method using dict and create DataFrame in the end. He found the latter to be the fastest by far. But if we replace the df3.loc[i] = … (with preallocated DataFrame) in his code with df3.values[i] = …, the outcome changes significantly, in that that method performs similar to the one using dict. So we should more often take the use of df.values[subscript] = … into consideration. However note that .values takes a zero-based subscript, which may be different from the DataFrame.index.
df2=df.to_dict()
values=["s_101","hyderabad",10,20,16,13,15,12,12,13,25,26,25,27,"good","bad"]#this is total row that we are going to add
i=0for x in df.columns:#here df.columns gives us the main dictionary key
df2[x][101]=values[i]#here the 101 is our index number it is also key of sub dictionary
i+=1
before going to add a row, we have to convert the dataframe to dictionary there you can see the keys as columns in dataframe and values of the columns are again stored in the dictionary but there key for every column is the index number in dataframe. That idea make me to write the below code.
df2=df.to_dict()
values=["s_101","hyderabad",10,20,16,13,15,12,12,13,25,26,25,27,"good","bad"] #this is total row that we are going to add
i=0
for x in df.columns: #here df.columns gives us the main dictionary key
df2[x][101]=values[i] #here the 101 is our index number it is also key of sub dictionary
i+=1
new_dict ={put input for new row here}
new_list =[put your index here]
new_df = pd.DataFrame(data=new_dict, index=new_list)
df = pd.concat([existing_df, new_df])
You can concatenate two DataFrames for this. I basically came across this problem to add a new row to an existing DataFrame with a character index(not numeric).
So, I input the data for a new row in a duct() and index in a list.
new_dict = {put input for new row here}
new_list = [put your index here]
new_df = pd.DataFrame(data=new_dict, index=new_list)
df = pd.concat([existing_df, new_df])
df = pd.DataFrame(columns=['timeMS','accelX','accelY','accelZ','gyroX','gyroY','gyroZ'])
df.loc[0if math.isnan(df.index.max())else df.index.max()+1]=[x for x in range(7)]
>>> f = open('test','a+')# Not using 'with' just to simplify the example REPL session>>> f.write('hi')>>> f.seek(0)>>> f.read()'hi'>>> f.seek(0)>>> f.write('bye')# Will still append despite the seek(0)!>>> f.seek(0)>>> f.read()'hibye'
You need to open the file in append mode, by setting “a” or “ab” as the mode. See open().
When you open with “a” mode, the write position will always be at the end of the file (an append). You can open with “a+” to allow reading, seek backwards and read (but all writes will still be at the end of the file!).
Example:
>>> with open('test1','wb') as f:
f.write('test')
>>> with open('test1','ab') as f:
f.write('koko')
>>> with open('test1','rb') as f:
f.read()
'testkoko'
Note: Using ‘a’ is not the same as opening with ‘w’ and seeking to the end of the file – consider what might happen if another program opened the file and started writing between the seek and the write. On some operating systems, opening the file with ‘a’ guarantees that all your following writes will be appended atomically to the end of the file (even as the file grows by other writes).
A few more details about how the “a” mode operates (tested on Linux only). Even if you seek back, every write will append to the end of the file:
>>> f = open('test','a+') # Not using 'with' just to simplify the example REPL session
>>> f.write('hi')
>>> f.seek(0)
>>> f.read()
'hi'
>>> f.seek(0)
>>> f.write('bye') # Will still append despite the seek(0)!
>>> f.seek(0)
>>> f.read()
'hibye'
Opening a file in append mode (a as the first character of mode)
causes all subsequent write operations to this stream to occur at
end-of-file, as if preceded the call:
fseek(stream, 0, SEEK_END);
Old simplified answer (not using with):
Example: (in a real program use with to close the file – see the documentation)
with open("test.txt", "a") as myfile:
myfile.write("append me")
We declared the variable myfile to open a file named test.txt. Open takes 2 arguments, the file that we want to open and a string that represents the kinds of permission or operation we want to do on the file
here is file mode options
Mode Description
'r' This is the default mode. It Opens file for reading.
'w' This Mode Opens file for writing.
If file does not exist, it creates a new file.
If file exists it truncates the file.
'x' Creates a new file. If file already exists, the operation fails.
'a' Open file in append mode.
If file does not exist, it creates a new file.
't' This is the default mode. It opens in text mode.
'b' This opens in binary mode.
'+' This will open a file for reading and writing (updating)
The 'a' parameter signifies append mode. If you don’t want to use with open each time, you can easily write a function to do it for you:
def append(txt='\nFunction Successfully Executed', file):
with open(file, 'a') as f:
f.write(txt)
If you want to write somewhere else other than the end, you can use 'r+'†:
import os
with open(file, 'r+') as f:
f.seek(0, os.SEEK_END)
f.write("text to add")
Finally, the 'w+' parameter grants even more freedom. Specifically, it allows you to create the file if it doesn’t exist, as well as empty the contents of a file that currently exists.
The simplest way to append more text to the end of a file would be to use:
with open('/path/to/file', 'a+') as file:
file.write("Additions to file")
file.close()
The a+ in the open(...) statement instructs to open the file in append mode and allows read and write access.
It is also always good practice to use file.close() to close any files that you have opened once you are done using them.
回答 11
这是我的脚本,基本上计算行数,然后追加,然后再对它们进行计数,这样您就可以证明它起作用了。
shortPath ="../file_to_be_appended"
short = open(shortPath,'r')## this counts how many line are originally in the file:
long_path ="../file_to_be_appended_to"
long = open(long_path,'r')for i,l in enumerate(long):passprint"%s has %i lines initially"%(long_path,i)
long.close()
long = open(long_path,'a')## now open long file to append
l =True## will be a line
c =0## count the number of lines you writewhile l:try:
l = short.next()## when you run out of lines, this breaks and the except statement is run
c +=1
long.write(l)except:
l =None
long.close()print"Done!, wrote %s lines"%c
## finally, count how many lines are left.
long = open(long_path,'r')for i,l in enumerate(long):passprint"%s has %i lines after appending new lines"%(long_path, i)
long.close()
Here’s my script, which basically counts the number of lines, then appends, then counts them again so you have evidence it worked.
shortPath = "../file_to_be_appended"
short = open(shortPath, 'r')
## this counts how many line are originally in the file:
long_path = "../file_to_be_appended_to"
long = open(long_path, 'r')
for i,l in enumerate(long):
pass
print "%s has %i lines initially" %(long_path,i)
long.close()
long = open(long_path, 'a') ## now open long file to append
l = True ## will be a line
c = 0 ## count the number of lines you write
while l:
try:
l = short.next() ## when you run out of lines, this breaks and the except statement is run
c += 1
long.write(l)
except:
l = None
long.close()
print "Done!, wrote %s lines" %c
## finally, count how many lines are left.
long = open(long_path, 'r')
for i,l in enumerate(long):
pass
print "%s has %i lines after appending new lines" %(long_path, i)
long.close()
>>> li =['a','b','mpilgrim','z','example']>>> li
['a','b','mpilgrim','z','example']>>> li.append("new")>>> li
['a','b','mpilgrim','z','example','new']>>> li.append(["new",2])>>> li
['a','b','mpilgrim','z','example','new',['new',2]]>>> li.insert(2,"new")>>> li
['a','b','new','mpilgrim','z','example','new',['new',2]]>>> li.extend(["two","elements"])>>> li
['a','b','new','mpilgrim','z','example','new',['new',2],'two','elements']
def append_one(a_list, element):
a_list.append(element)def extend_one(a_list, element):"""creating a new list is semantically the most direct
way to create an iterable to give to extend"""
a_list.extend([element])import timeit
What is the difference between the list methods append and extend?
append adds its argument as a single element to the end of a list. The length of the list itself will increase by one.
extend iterates over its argument adding each element to the list, extending the list. The length of the list will increase by however many elements were in the iterable argument.
append
The list.append method appends an object to the end of the list.
my_list.append(object)
Whatever the object is, whether a number, a string, another list, or something else, it gets added onto the end of my_list as a single entry on the list.
So keep in mind that a list is an object. If you append another list onto a list, the first list will be a single object at the end of the list (which may not be what you want):
>>> another_list = [1, 2, 3]
>>> my_list.append(another_list)
>>> my_list
['foo', 'bar', 'baz', [1, 2, 3]]
#^^^^^^^^^--- single item at the end of the list.
extend
The list.extend method extends a list by appending elements from an iterable:
my_list.extend(iterable)
So with extend, each element of the iterable gets appended onto the list. For example:
Keep in mind that a string is an iterable, so if you extend a list with a string, you’ll append each character as you iterate over the string (which may not be what you want):
Both + and += operators are defined for list. They are semantically similar to extend.
my_list + another_list creates a third list in memory, so you can return the result of it, but it requires that the second iterable be a list.
my_list += another_list modifies the list in-place (it is the in-place operator, and lists are mutable objects, as we’ve seen) so it does not create a new list. It also works like extend, in that the second iterable can be any kind of iterable.
Don’t get confused – my_list = my_list + another_list is not equivalent to += – it gives you a brand new list assigned to my_list.
Iterating through the multiple calls to append adds to the complexity, making it equivalent to that of extend, and since extend’s iteration is implemented in C, it will always be faster if you intend to append successive items from an iterable onto a list.
Performance
You may wonder what is more performant, since append can be used to achieve the same outcome as extend. The following functions do the same thing:
def append(alist, iterable):
for item in iterable:
alist.append(item)
def extend(alist, iterable):
alist.extend(iterable)
Perfect answer, I just miss the timing of comparing adding only one element
Do the semantically correct thing. If you want to append all elements in an iterable, use extend. If you’re just adding one element, use append.
Ok, so let’s create an experiment to see how this works out in time:
def append_one(a_list, element):
a_list.append(element)
def extend_one(a_list, element):
"""creating a new list is semantically the most direct
way to create an iterable to give to extend"""
a_list.extend([element])
import timeit
And we see that going out of our way to create an iterable just to use extend is a (minor) waste of time:
We learn from this that there’s nothing gained from using extend when we have only one element to append.
Also, these timings are not that important. I am just showing them to make the point that, in Python, doing the semantically correct thing is doing things the Right Way™.
It’s conceivable that you might test timings on two comparable operations and get an ambiguous or inverse result. Just focus on doing the semantically correct thing.
Conclusion
We see that extend is semantically clearer, and that it can run much faster than append, when you intend to append each element in an iterable to a list.
If you only have a single element (not in an iterable) to add to the list, use append.
回答 3
append追加一个元素。extend追加元素列表。
请注意,如果您传递要追加的列表,它仍会添加一个元素:
>>> a =[1,2,3]>>> a.append([4,5,6])>>> a
[1,2,3,[4,5,6]]
With append you can append a single element that will extend the list:
>>> a = [1,2]
>>> a.append(3)
>>> a
[1,2,3]
If you want to extend more than one element you should use extend, because you can only append one elment or one list of element:
>>> a.append([4,5])
>>> a
>>> [1,2,3,[4,5]]
So that you get a nested list
Instead with extend, you can extend a single element like this
>>> a = [1,2]
>>> a.extend([3])
>>> a
[1,2,3]
Or, differently, from append, extend more elements in one time without nesting the list into the original one (that’s the reason of the name extend)
>>> a.extend([4,5,6])
>>> a
[1,2,3,4,5,6]
Adding one element with both methods
Both append and extend can add one element to the end of the list, though append is simpler.
append 1 element
>>> x = [1,2]
>>> x.append(3)
>>> x
[1,2,3]
extend one element
>>> x = [1,2]
>>> x.extend([3])
>>> x
[1,2,3]
Adding more elements… with different results
If you use append for more than one element, you have to pass a list of elements as arguments and you will obtain a NESTED list!
>>> x = [1,2]
>>> x.append([3,4])
>>> x
[1,2,[3,4]]
With extend, instead, you pass a list as an argument, but you will obtain a list with the new element that is not nested in the old one.
>>> z = [1,2]
>>> z.extend([3,4])
>>> z
[1,2,3,4]
So, with more elements, you will use extend to get a list with more items.
However, appending a list will not add more elements to the list, but one element that is a nested list as you can clearly see in the output of the code.
The append() method adds a single item to the end of the list.
x = [1, 2, 3]
x.append([4, 5])
x.append('abc')
print(x)
# gives you
[1, 2, 3, [4, 5], 'abc']
The extend() method takes one argument, a list, and appends each of the items of the argument to the original list. (Lists are implemented as classes. “Creating” a list is really instantiating a class. As such, a list has methods that operate on it.)
x = [1, 2, 3]
x.extend([4, 5])
x.extend('abc')
print(x)
# gives you
[1, 2, 3, 4, 5, 'a', 'b', 'c']
Similarly += for in place behavior, but with slight differences from append & extend. One of the biggest differences of += from append and extend is when it is used in function scopes, see this blog post.
回答 8
append(object) -通过将对象添加到列表来更新列表。
x =[20]# List passed to the append(object) method is treated as a single object.
x.append([21,22,23])# Hence the resultant list length will be 2print(x)-->[20,[21,22,23]]
extend(list) -本质上是串联两个列表。
x =[20]# The parameter passed to extend(list) method is treated as a list.# Eventually it is two lists being concatenated.
x.extend([21,22,23])# Here the resultant list's length is 4print(x)[20,21,22,23]
append(object) – Updates the list by adding an object to the list.
x = [20]
# List passed to the append(object) method is treated as a single object.
x.append([21, 22, 23])
# Hence the resultant list length will be 2
print(x)
--> [20, [21, 22, 23]]
extend(list) – Essentially concatenates two lists.
x = [20]
# The parameter passed to extend(list) method is treated as a list.
# Eventually it is two lists being concatenated.
x.extend([21, 22, 23])
# Here the resultant list's length is 4
print(x)
[20, 21, 22, 23]
An interesting point that has been hinted, but not explained, is that extend is faster than append. For any loop that has append inside should be considered to be replaced by list.extend(processed_elements).
Bear in mind that apprending new elements might result in the realloaction of the whole list to a better location in memory. If this is done several times because we are appending 1 element at a time, overall performance suffers. In this sense, list.extend is analogous to “”.join(stringlist).
Append adds the entire data at once. The whole data will be added to the newly created index. On the other hand, extend, as it name suggests, extends the current array.
lis =[1,2,3]# 'extend' is equivalent to this
lis = lis + list(iterable)# 'append' simply appends its argument as the last element to the list# as long as the argument is a valid Python object
list.append(object)
An English dictionary defines the words append and extend as:
append: add (something) to the end of a written document. extend: make larger. Enlarge or expand
With that knowledge, now let’s understand
1) The difference between append and extend
append:
Appends any Python object as-is to the end of the list (i.e. as a
the last element in the list).
The resulting list may be nested and contain heterogeneous elements (i.e. list, string, tuple, dictionary, set, etc.)
extend:
Accepts any iterable as its argument and makes the list larger.
The resulting list is always one-dimensional list (i.e. no nesting) and it may contain heterogeneous elements in it (e.g. characters, integers, float) as a result of applying list(iterable).
2) Similarity between append and extend
Both take exactly one argument.
Both modify the list in-place.
As a result, both returns None.
Example
lis = [1, 2, 3]
# 'extend' is equivalent to this
lis = lis + list(iterable)
# 'append' simply appends its argument as the last element to the list
# as long as the argument is a valid Python object
list.append(object)
I hope I can make a useful supplement to this question. If your list stores a specific type object, for example Info, here is a situation that extend method is not suitable: In a for loop and and generating an Info object every time and using extend to store it into your list, it will fail. The exception is like below:
TypeError: ‘Info’ object is not iterable
But if you use the append method, the result is OK. Because every time using the extend method, it will always treat it as a list or any other collection type, iterate it, and place it after the previous list. A specific object can not be iterated, obviously.
def append_o(a_list, element):
a_list.append(element)print('append:', end =' ')for item in a_list:print(item, end =',')print()def extend_o(a_list, element):
a_list.extend(element)print('extend:', end =' ')for item in a_list:print(item, end =',')print()
append_o(['ab'],'cd')
extend_o(['ab'],'cd')
append_o(['ab'],['cd','ef'])
extend_o(['ab'],['cd','ef'])
append_o(['ab'],['cd'])
extend_o(['ab'],['cd'])
append “extends” the list (in place) by only one item, the single object passed (as argument).
extend “extends” the list (in place) by as many items as the object passed (as argument) contains.
This may be slightly confusing for str objects.
If you pass a string as argument:
append will add a single string item at the end but
extend will add as many “single” ‘str’ items as the length of that string.
If you pass a list of strings as argument:
append will still add a single ‘list’ item at the end and
extend will add as many ‘list’ items as the length of the passed list.
def append_o(a_list, element):
a_list.append(element)
print('append:', end = ' ')
for item in a_list:
print(item, end = ',')
print()
def extend_o(a_list, element):
a_list.extend(element)
print('extend:', end = ' ')
for item in a_list:
print(item, end = ',')
print()
append_o(['ab'],'cd')
extend_o(['ab'],'cd')
append_o(['ab'],['cd', 'ef'])
extend_o(['ab'],['cd', 'ef'])
append_o(['ab'],['cd'])
extend_o(['ab'],['cd'])
Append and extend are one of the extensibility mechanisms in python.
Append: Adds an element to the end of the list.
my_list = [1,2,3,4]
To add a new element to the list, we can use append method in the following way.
my_list.append(5)
The default location that the new element will be added is always in the (length+1) position.
Insert: The insert method was used to overcome the limitations of append. With insert, we can explicitly define the exact position we want our new element to be inserted at.
Method descriptor of insert(index, object). It takes two arguments, first being the index we want to insert our element and second the element itself.
Extend: This is very useful when we want to join two or more lists into a single list. Without extend, if we want to join two lists, the resulting object will contain a list of lists.
a = [1,2]
b = [3]
a.append(b)
print (a)
[1,2,[3]]
If we try to access the element at pos 2, we get a list ([3]), instead of the element. To join two lists, we’ll have to use append.
a = [1,2]
b = [3]
a.extend(b)
print (a)
[1,2,3]
To join multiple lists
a = [1]
b = [2]
c = [3]
a.extend(b+c)
print (a)
[1,2,3]