Category Archives: Q&A

sprintf-like functionality in Python

Question: sprintf-like functionality in Python


I would like to create a string buffer to do lots of processing and formatting, and finally write the buffer to a text file using C-style sprintf functionality in Python. Because of conditional statements, I can’t write the values directly to the file.

e.g. pseudocode:

sprintf(buf,"A = %d\n , B= %s\n",A,B)
/* some processing */
sprintf(buf,"C=%d\n",c)
....
...
fprintf(file,buf)

So in the output file we would have this kind of output:

A= foo B= bar
C= ded
etc...

Edit, to clarify my question:
buf is a big buffer that contains all of these strings, formatted using sprintf. Going by your examples, buf would only contain the current values, not the older ones. E.g., I first wrote A= something, B= something into buf, and later C= something was appended to the same buf, but in your Python answers buf contains only the last value, which is not what I want. I want buf to contain all the printfs I have done since the beginning, like in C.


Answer 0


Python has a % operator for this.

>>> a = 5
>>> b = "hello"
>>> buf = "A = %d\n , B = %s\n" % (a, b)
>>> print buf
A = 5
 , B = hello

>>> c = 10
>>> buf = "C = %d\n" % c
>>> print buf
C = 10

See this reference for all supported format specifiers.

You could as well use format:

>>> print "This is the {}th tome of {}".format(5, "knowledge")
This is the 5th tome of knowledge

Answer 1


If I understand your question correctly, format() is what you are looking for, along with its mini-language.

Silly example for python 2.7 and up:

>>> print "{} ...\r\n {}!".format("Hello", "world")
Hello ...
 world!

For earlier python versions: (tested with 2.6.2)

>>> print "{0} ...\r\n {1}!".format("Hello", "world")
Hello ...
 world!

Answer 2


I’m not completely certain that I understand your goal, but you can use a StringIO instance as a buffer:

>>> import StringIO 
>>> buf = StringIO.StringIO()
>>> buf.write("A = %d, B = %s\n" % (3, "bar"))
>>> buf.write("C=%d\n" % 5)
>>> print(buf.getvalue())
A = 3, B = bar
C=5

Unlike sprintf, you just pass a string to buf.write, formatting it with the % operator or the format method of strings.

You could of course define a function to get the sprintf interface you’re hoping for:

def sprintf(buf, fmt, *args):
    buf.write(fmt % args)

which would be used like this:

>>> buf = StringIO.StringIO()
>>> sprintf(buf, "A = %d, B = %s\n", 3, "foo")
>>> sprintf(buf, "C = %d\n", 5)
>>> print(buf.getvalue())
A = 3, B = foo
C = 5

Answer 3


Use the formatting operator %:

buf = "A = %d\n , B= %s\n" % (a, b)
print >>f, buf

Answer 4


You can use string formatting:

>>> a=42
>>> b="bar"
>>> "The number is %d and the word is %s" % (a,b)
'The number is 42 and the word is bar'

But this old style of formatting is discouraged in Python 3 (though not actually removed); you should use str.format():

>>> a=42
>>> b="bar"
>>> "The number is {0} and the word is {1}".format(a,b)
'The number is 42 and the word is bar'

Answer 5


To insert into a very long string it is nice to use names for the different arguments, instead of hoping they are in the right positions. This also makes it easier to replace multiple occurrences.

>>> 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
'Coordinates: 37.24N, -115.81W'

Taken from Format examples, where all the other Format-related answers are also shown.


Answer 6


This is probably the closest translation from your C code to Python code.

A = 1
B = "hello"
buf = "A = %d\n , B= %s\n" % (A, B)

c = 2
buf += "C=%d\n" % c

f = open('output.txt', 'w')
print >> f, buf
f.close()

The % operator in Python does almost exactly the same thing as C’s sprintf. You can also print the string to a file directly. If there are lots of these formatted string fragments involved, it might be wise to use a StringIO object to speed up processing time.

So instead of doing +=, do this:

import cStringIO
buf = cStringIO.StringIO()

...

print >> buf, "A = %d\n , B= %s\n" % (A, B)

...

print >> buf, "C=%d\n" % c

...

print >> f, buf.getvalue()

Answer 7


If you want something like the python3 print function but to a string:

import io

def sprint(*args, **kwargs):
    sio = io.StringIO()
    print(*args, **kwargs, file=sio)
    return sio.getvalue()
>>> x = sprint('abc', 10, ['one', 'two'], {'a': 1, 'b': 2}, {1, 2, 3})
>>> x
"abc 10 ['one', 'two'] {'a': 1, 'b': 2} {1, 2, 3}\n"

or without the '\n' at the end:

def sprint(*args, end='', **kwargs):
    sio = io.StringIO()
    print(*args, **kwargs, end=end, file=sio)
    return sio.getvalue()
>>> x = sprint('abc', 10, ['one', 'two'], {'a': 1, 'b': 2}, {1, 2, 3})
>>> x
"abc 10 ['one', 'two'] {'a': 1, 'b': 2} {1, 2, 3}"

Answer 8


Something like…

greetings = 'Hello {name}'.format(name = 'John')

Hello John

Answer 9

Take a look at “Literal String Interpolation” https://www.python.org/dev/peps/pep-0498/

I found it through the http://www.malemburg.com/
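Since PEP 498 landed in Python 3.6, the OP’s buffer can be built with f-strings; a minimal sketch using the question’s variable names:

A, B, c = 1, "hello", 2

buf = f"A = {A}\n , B= {B}\n"
# some processing
buf += f"C={c}\n"

with open('output.txt', 'w') as f:
    f.write(buf)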


Answer 10


Two approaches are to write to a string buffer or to write lines to a list and join them later. I think the StringIO approach is more pythonic, but it didn’t work before Python 2.6.

from io import StringIO

with StringIO() as s:
   print("Hello", file=s)
   print("Goodbye", file=s)
   # And later...
   with open('myfile', 'w') as f:
       f.write(s.getvalue())

You can also use these without a context manager (s = StringIO()). Currently, I’m using a context manager class with a print function. This fragment might be useful for inserting debugging or odd paging requirements:

class Report:
    ... usual init/enter/exit
    def print(self, *args, **kwargs):
        with StringIO() as s:
            print(*args, **kwargs, file=s)
            out = s.getvalue()
        ... stuff with out

with Report() as r:
   r.print(f"This is {datetime.date.today()}!", 'Yikes!', end=':')

Pandas: convert a DataFrame to an array of tuples

Question: Pandas: convert a DataFrame to an array of tuples


I have manipulated some data using pandas and now I want to carry out a batch save back to the database. This requires me to convert the dataframe into an array of tuples, with each tuple corresponding to a “row” of the dataframe.

My DataFrame looks something like:

In [182]: data_set
Out[182]: 
  index data_date   data_1  data_2
0  14303 2012-02-17  24.75   25.03 
1  12009 2012-02-16  25.00   25.07 
2  11830 2012-02-15  24.99   25.15 
3  6274  2012-02-14  24.68   25.05 
4  2302  2012-02-13  24.62   24.77 
5  14085 2012-02-10  24.38   24.61 

I want to convert it to an array of tuples like:

[(datetime.date(2012,2,17),24.75,25.03),
(datetime.date(2012,2,16),25.00,25.07),
...etc. ]

Any suggestion on how I can efficiently do this?


Answer 0


How about:

subset = data_set[['data_date', 'data_1', 'data_2']]
tuples = [tuple(x) for x in subset.to_numpy()]

For pandas < 0.24, use:

tuples = [tuple(x) for x in subset.values]

Answer 1

list(data_set.itertuples(index=False))

As of pandas 0.17.1, the above will return a list of namedtuples.

If you want a list of ordinary tuples, pass name=None as an argument:

list(data_set.itertuples(index=False, name=None))

Answer 2


A generic way:

[tuple(x) for x in data_set.to_records(index=False)]

Answer 3


Motivation
Many data sets are large enough that we need to concern ourselves with speed/efficiency. So I offer this solution in that spirit. It happens to also be succinct.

For the sake of comparison, let’s drop the index column

df = data_set.drop('index', 1)

Solution
I’ll propose the use of zip and map

list(zip(*map(df.get, df)))

[('2012-02-17', 24.75, 25.03),
 ('2012-02-16', 25.0, 25.07),
 ('2012-02-15', 24.99, 25.15),
 ('2012-02-14', 24.68, 25.05),
 ('2012-02-13', 24.62, 24.77),
 ('2012-02-10', 24.38, 24.61)]

It also happens to be flexible if we want to deal with a specific subset of columns. We’ll assume the columns we’ve already displayed are the subset we want.

list(zip(*map(df.get, ['data_date', 'data_1', 'data_2'])))

[('2012-02-17', 24.75, 25.03),
 ('2012-02-16', 25.0, 25.07),
 ('2012-02-15', 24.99, 25.15),
 ('2012-02-14', 24.68, 25.05),
 ('2012-02-13', 24.62, 24.77),
 ('2012-02-10', 24.38, 24.61)]

What is Quicker?

Turns out records is quickest, followed by the asymptotically converging zipmap and iter_tuples.

I’ll use the simple_benchmark library that I got from this post:

from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()

import pandas as pd
import numpy as np

def tuple_comp(df): return [tuple(x) for x in df.to_numpy()]
def iter_namedtuples(df): return list(df.itertuples(index=False))
def iter_tuples(df): return list(df.itertuples(index=False, name=None))
def records(df): return df.to_records(index=False).tolist()
def zipmap(df): return list(zip(*map(df.get, df)))

funcs = [tuple_comp, iter_namedtuples, iter_tuples, records, zipmap]
for func in funcs:
    b.add_function()(func)

def creator(n):
    return pd.DataFrame({"A": np.random.randint(n, size=n), "B": np.random.randint(n, size=n)})

@b.add_arguments('Rows in DataFrame')
def argument_provider():
    for n in (10 ** (np.arange(4, 11) / 2)).astype(int):
        yield n, creator(n)

r = b.run()

Check the results

r.to_pandas_dataframe().pipe(lambda d: d.div(d.min(1), 0))

        tuple_comp  iter_namedtuples  iter_tuples   records    zipmap
100       2.905662          6.626308     3.450741  1.469471  1.000000
316       4.612692          4.814433     2.375874  1.096352  1.000000
1000      6.513121          4.106426     1.958293  1.000000  1.316303
3162      8.446138          4.082161     1.808339  1.000000  1.533605
10000     8.424483          3.621461     1.651831  1.000000  1.558592
31622     7.813803          3.386592     1.586483  1.000000  1.515478
100000    7.050572          3.162426     1.499977  1.000000  1.480131

r.plot()


Answer 4


Here’s a vectorized approach (assuming the dataframe data_set is named df instead) that returns a list of tuples as shown:

>>> df.set_index(['data_date'])[['data_1', 'data_2']].to_records().tolist()

produces:

[(datetime.datetime(2012, 2, 17, 0, 0), 24.75, 25.03),
 (datetime.datetime(2012, 2, 16, 0, 0), 25.0, 25.07),
 (datetime.datetime(2012, 2, 15, 0, 0), 24.99, 25.15),
 (datetime.datetime(2012, 2, 14, 0, 0), 24.68, 25.05),
 (datetime.datetime(2012, 2, 13, 0, 0), 24.62, 24.77),
 (datetime.datetime(2012, 2, 10, 0, 0), 24.38, 24.61)]

The idea of setting the datetime column as the index axis is to aid in the conversion of the Timestamp values to their corresponding datetime.datetime equivalents by making use of the convert_datetime64 argument of DF.to_records, which does so for a DateTimeIndex dataframe.

This returns a recarray, which can then be turned into a list using .tolist().


A more generalized solution, depending on the use case, would be:

df.to_records().tolist()                              # Supply index=False to exclude index

Answer 5


The most efficient and easy way:

list(data_set.to_records())

You can filter the columns you need before this call.
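For instance, with the OP’s frame that could look like the following sketch; note that without index=False the frame’s index is included in each tuple:

subset = data_set[['data_date', 'data_1', 'data_2']]
tuples = list(subset.to_records(index=False))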


Answer 6


This answer doesn’t add anything that isn’t already discussed, but here are some speed results. I think this should resolve questions that came up in the comments. All of these look like they are O(n), based on these three values.

TL;DR: tuples = list(df.itertuples(index=False, name=None)) and tuples = list(zip(*[df[c].values.tolist() for c in df])) are tied for the fastest.

I did a quick speed test on results for three suggestions here:

  1. The zip answer from @pirsquared: tuples = list(zip(*[df[c].values.tolist() for c in df]))
  2. The accepted answer from @wes-mckinney: tuples = [tuple(x) for x in df.values]
  3. The itertuples answer from @ksindi with the name=None suggestion from @Axel: tuples = list(df.itertuples(index=False, name=None))
from numpy import random
import pandas as pd


def create_random_df(n):
    return pd.DataFrame({"A": random.randint(n, size=n), "B": random.randint(n, size=n)})

Small size:

df = create_random_df(10000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

1.66 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
15.5 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.74 ms ± 75.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Larger:

df = create_random_df(1000000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

202 ms ± 5.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.52 s ± 98.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
209 ms ± 11.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

As much patience as I have:

df = create_random_df(10000000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

1.78 s ± 118 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 s ± 222 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.68 s ± 96.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The zip version and the itertuples version are within each other’s confidence intervals. I suspect that they are doing the same thing under the hood.

These speed tests are probably irrelevant though. Pushing the limits of my computer’s memory doesn’t take a huge amount of time, and you really shouldn’t be doing this on a large data set. Working with those tuples after doing this will end up being really inefficient. It’s unlikely to be a major bottleneck in your code, so just stick with the version you think is most readable.


Answer 7

# Try this one:

tuples = list(zip(data_set["data_date"], data_set["data_1"],data_set["data_2"]))
print (tuples)

Answer 8


Changing the data frame into a list of tuples:

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
print(df)
OUTPUT
   col1  col2
0     1     4
1     2     5
2     3     6

records = df.to_records(index=False)
result = list(records)
print(result)
OUTPUT
[(1, 4), (2, 5), (3, 6)]

Answer 9


More pythonic way:

df = data_set[['data_date', 'data_1', 'data_2']]
map(tuple,df.values)

How to save an S3 object to a file using boto3

Question: How to save an S3 object to a file using boto3


I’m trying to do a “hello world” with the new boto3 client for AWS.

The use-case I have is fairly simple: get an object from S3 and save it to a file.

In boto 2.X I would do it like this:

import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')

In boto 3, I can’t find a clean way to do the same thing, so I’m manually iterating over the “Streaming” object:

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'w') as f:
    chunk = key['Body'].read(1024*8)
    while chunk:
        f.write(chunk)
        chunk = key['Body'].read(1024*8)

or

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'w') as f:
    for chunk in iter(lambda: key['Body'].read(4096), b''):
        f.write(chunk)

And it works fine. I was wondering: is there any “native” boto3 function that will do the same task?


Answer 0


There is a customization that went into Boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client, and can be used like this:

s3_client = boto3.client('s3')
open('hello.txt', 'w').write('Hello, world!')

# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')

# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
print(open('hello2.txt').read())

These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.

Note that s3_client.download_file won’t create a directory. It can be created as pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).
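Putting those two notes together, a download that first ensures the target directory exists might look like this sketch (bucket, key, and path are placeholders):

import pathlib

import boto3

target = pathlib.Path('/tmp/downloads/hello2.txt')
target.parent.mkdir(parents=True, exist_ok=True)  # download_file won't create this

s3_client = boto3.client('s3')
s3_client.download_file('MyBucket', 'hello-remote.txt', str(target))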


Answer 1


boto3 now has a nicer interface than the client:

resource = boto3.resource('s3')
my_bucket = resource.Bucket('MyBucket')
my_bucket.download_file(key, local_filename)

This by itself isn’t tremendously better than the client in the accepted answer (although the docs say that it does a better job retrying uploads and downloads on failure) but considering that resources are generally more ergonomic (for example, the s3 bucket and object resources are nicer than the client methods) this does allow you to stay at the resource layer without having to drop down.

Resources generally can be created in the same way as clients, and they take all or most of the same arguments and just forward them to their internal clients.
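For example, a resource accepts the same connection arguments a client does; a sketch with placeholder values:

import boto3

# the same keyword arguments work for boto3.client('s3', ...)
resource = boto3.resource(
    's3',
    region_name='us-east-1',             # placeholder region
    aws_access_key_id='KEY_ID',          # placeholder credentials
    aws_secret_access_key='SECRET_KEY',
)
my_bucket = resource.Bucket('MyBucket')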


Answer 2


For those of you who would like to simulate the boto2 set_contents_from_string method, you can try:

import boto3
from cStringIO import StringIO

s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)

# notice if you do fake_handle.read() it reads like a file handle
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())

For Python3:

In Python 3, both StringIO and cStringIO are gone. Import StringIO like this:

from io import StringIO

To support both versions:

try:
   from StringIO import StringIO
except ImportError:
   from io import StringIO
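Alternatively, on Python 3 the Body argument accepts bytes directly, so a sketch that skips StringIO entirely (reusing the bucket and key from above):

import boto3

s3c = boto3.client('s3')
contents = 'My string to save to S3 object'

# put_object accepts bytes, so encode instead of wrapping in a fake file handle
s3c.put_object(Bucket='hello-world.by.vor', Key='data/hello.txt',
               Body=contents.encode('utf-8'))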

Answer 3

# Preface: File is json with contents: {'name': 'Android', 'status': 'ERROR'}

import boto3
import io
import json

s3 = boto3.resource('s3')

obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)

# data now holds the file's bytes; convert them to a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))

print(new_dict['status']) 
# Should print "ERROR"

Answer 4


When you want to read a file with a different configuration than the default one, feel free to use either mpu.aws.s3_download(s3path, destination) directly or the copy-pasted code:

import os

import boto3


def s3_download(source, destination,
                exists_strategy='raise',
                profile_name=None):
    """
    Copy a file from an S3 source to a local destination.

    Parameters
    ----------
    source : str
        Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
    destination : str
    exists_strategy : {'raise', 'replace', 'abort'}
        What is done when the destination already exists?
    profile_name : str, optional
        AWS profile

    Raises
    ------
    botocore.exceptions.NoCredentialsError
        Botocore is not able to find your credentials. Either specify
        profile_name or add the environment variables AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
        See https://boto3.readthedocs.io/en/latest/guide/configuration.html
    """
    exists_strategies = ['raise', 'replace', 'abort']
    if exists_strategy not in exists_strategies:
        raise ValueError('exists_strategy \'{}\' is not in {}'
                         .format(exists_strategy, exists_strategies))
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    bucket_name, key = _s3_path_split(source)
    if os.path.isfile(destination):
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)

from collections import namedtuple

S3Path = namedtuple("S3Path", ["bucket_name", "key"])


def _s3_path_split(s3_path):
    """
    Split an S3 path into bucket and key.

    Parameters
    ----------
    s3_path : str

    Returns
    -------
    splitted : (str, str)
        (bucket, key)

    Examples
    --------
    >>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
    S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(
            "s3_path is expected to start with 's3://', " "but was {}"
            .format(s3_path)
        )
    bucket_key = s3_path[len("s3://"):]
    bucket_name, key = bucket_key.split("/", 1)
    return S3Path(bucket_name, key)

Answer 5


Note: I’m assuming you have configured authentication separately. The code below downloads a single object from an S3 bucket.

import boto3

# initiate the s3 resource
s3 = boto3.resource('s3')

# download the object to a file
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')

Assert that a function/method was not called using Mock

Question: Assert that a function/method was not called using Mock


I’m using the Mock library to test my application, but I want to assert that some function was not called. Mock docs talk about methods like mock.assert_called_with and mock.assert_called_once_with, but I didn’t find anything like mock.assert_not_called or something related to verify mock was NOT called.

I could go with something like the following, though it doesn’t seem cool nor pythonic:

def test_something:
    # some actions
    with patch('something') as my_var:
        try:
            # args are not important. func should never be called in this test
            my_var.assert_called_with(some, args)
        except AssertionError:
            pass  # this error being raised means it's ok
    # other stuff

Any ideas how to accomplish this?


Answer 0


This should work for your case:

assert not my_var.called, 'method should not have been called'

Sample:

>>> mock=Mock()
>>> mock.a()
<Mock name='mock.a()' id='4349129872'>
>>> assert not mock.b.called, 'b was called and should not have been'
>>> assert not mock.a.called, 'a was called and should not have been'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: a was called and should not have been

Answer 1


Though this is an old question, I would like to add that the current mock library (the backport of unittest.mock) supports the assert_not_called method.

Just upgrade yours:

pip install mock --upgrade


Answer 2


You can check the called attribute, but if your assertion fails, the next thing you’ll want to know is something about the unexpected call, so you may as well arrange for that information to be displayed from the start. Using unittest, you can check the contents of call_args_list instead:

self.assertItemsEqual(my_var.call_args_list, [])

When it fails, it gives a message like this:

AssertionError: Element counts were not equal:
First has 0, Second has 1:  call('first argument', 4)
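On Python 3, assertItemsEqual was renamed to assertCountEqual; a minimal sketch of the same idea, with os.remove standing in for whatever you patched:

import unittest
from unittest.mock import patch

class MyTest(unittest.TestCase):
    def test_something(self):
        with patch('os.remove') as my_var:
            # ... exercise code that must NOT call os.remove ...
            self.assertCountEqual(my_var.call_args_list, [])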

Answer 3


When your test class inherits from unittest.TestCase, you can simply use methods like:

  • assertTrue
  • assertFalse
  • assertEqual

and similar (you can find the rest in the Python documentation).

In your example, we can simply assert that the mock_method.called property is False, which means that the method was not called.

import unittest
from unittest import mock

import my_module

class A(unittest.TestCase):
    def setUp(self):
        self.message = "Method should not be called. Called {times} times!"

    @mock.patch("my_module.method_to_mock")
    def test(self, mock_method):
        my_module.method_to_mock()

        self.assertFalse(mock_method.called,
                         self.message.format(times=mock_method.call_count))

Answer 4


With Python >= 3.5 you can use mock_object.assert_not_called().
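A minimal sketch of that method in use:

from unittest.mock import Mock

m = Mock()
m.assert_not_called()  # passes: the mock has not been called yet

m(42)
try:
    m.assert_not_called()
except AssertionError as e:
    print(e)  # e.g. "Expected 'mock' to not have been called. Called 1 times." (wording varies by version)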


Answer 5


Judging from other answers, no one except @rob-kennedy talked about the call_args_list.

It’s a powerful tool with which you can implement the exact contrary of MagicMock.assert_called_with().

call_args_list is a list of call objects. Each call object represents a call made on a mocked callable.

>>> from unittest.mock import MagicMock
>>> m = MagicMock()
>>> m.call_args_list
[]
>>> m(42)
<MagicMock name='mock()' id='139675158423872'>
>>> m.call_args_list
[call(42)]
>>> m(42, 30)
<MagicMock name='mock()' id='139675158423872'>
>>> m.call_args_list
[call(42), call(42, 30)]

Consuming a call object is easy, since you can compare it with a tuple of length 2 where the first component is a tuple containing all the positional arguments of the related call, while the second component is a dictionary of the keyword arguments.

>>> ((42,),) in m.call_args_list
True
>>> m(42, foo='bar')
<MagicMock name='mock()' id='139675158423872'>
>>> ((42,), {'foo': 'bar'}) in m.call_args_list
True
>>> m(foo='bar')
<MagicMock name='mock()' id='139675158423872'>
>>> ((), {'foo': 'bar'}) in m.call_args_list
True

So, a way to address the specific problem of the OP is

def test_something():
    with patch('something') as my_var:
        assert ((some, args),) not in my_var.call_args_list

Note that this way, instead of just checking if a mocked callable has been called, via MagicMock.called, you can now check if it has been called with a specific set of arguments.

That’s useful. Say you want to test a function that takes a list and calls another function, compute(), for each value of the list, but only if the value satisfies a specific condition.

You can now mock compute and test whether it has been called on some values but not on others.
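A sketch of that kind of test; my_module, process, and compute are hypothetical names here:

from unittest.mock import patch

import my_module  # hypothetical module: process(values) calls compute(v) only for v > 0

def test_process_skips_non_positive():
    with patch('my_module.compute') as mock_compute:
        my_module.process([3, -1, 5])
        assert ((3,),) in mock_compute.call_args_list
        assert ((5,),) in mock_compute.call_args_list
        assert ((-1,),) not in mock_compute.call_args_list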


Iterate a list in (current, next) pairs in Python

Question: Iterate a list in (current, next) pairs in Python


I sometimes need to iterate a list in Python looking at the “current” element and the “next” element. I have, till now, done so with code like:

for current, next in zip(the_list, the_list[1:]):
    # Do something

This works and does what I expect, but is there a more idiomatic or efficient way to do the same thing?


Answer 0


Here’s a relevant example from the itertools module docs:

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)   

For Python 2, you need itertools.izip instead of zip:

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.izip(a, b)

How this works:

First, two parallel iterators, a and b, are created (the tee() call), both pointing to the first element of the original iterable. The second iterator, b, is moved 1 step forward (the next(b, None) call). At this point a points to s0 and b points to s1. Both a and b can traverse the original iterator independently – the izip function takes the two iterators and makes pairs of the returned elements, advancing both iterators at the same pace.

One caveat: the tee() function produces two iterators that can advance independently of each other, but it comes at a cost. If one of the iterators advances further than the other, then tee() needs to keep the consumed elements in memory until the second iterator consumes them too (it cannot ‘rewind’ the original iterator). Here it doesn’t matter because one iterator is only 1 step ahead of the other, but in general it’s easy to use a lot of memory this way.

And since tee() can take an n parameter, this can also be used for more than two parallel iterators:

def threes(iterator):
    "s -> (s0,s1,s2), (s1,s2,s3), (s2, s3,4), ..."
    a, b, c = itertools.tee(iterator, 3)
    next(b, None)
    next(c, None)
    next(c, None)
    return zip(a, b, c)
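A quick check of both helpers, using the definitions above:

>>> list(pairwise([1, 2, 3, 4]))
[(1, 2), (2, 3), (3, 4)]
>>> list(threes([1, 2, 3, 4, 5]))
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]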

Answer 1


Roll your own!

def pairwise(iterable):
    it = iter(iterable)
    a = next(it, None)

    for b in it:
        yield (a, b)
        a = b

Answer 2


Since the_list[1:] actually creates a copy of the whole list (excluding its first element), and zip() creates a list of tuples immediately when called, in total three copies of your list are created. If your list is very large, you might prefer

from itertools import izip, islice
for current_item, next_item in izip(the_list, islice(the_list, 1, None)):
    print(current_item, next_item)

which does not copy the list at all.
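On Python 3, izip is gone because zip is already lazy, so the equivalent is:

from itertools import islice

for current_item, next_item in zip(the_list, islice(the_list, 1, None)):
    print(current_item, next_item)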


Answer 3


I’m just putting this out there; I’m very surprised no one has thought of enumerate().

for (index, thing) in enumerate(the_list):
    if index < len(the_list) - 1:
        current, next_ = thing, the_list[index + 1]
        #do something

Answer 4


Iterating by index can do the same thing:

#!/usr/bin/python
the_list = [1, 2, 3, 4]
for i in xrange(len(the_list) - 1):
    current_item, next_item = the_list[i], the_list[i + 1]
    print(current_item, next_item)

Output:

(1, 2)
(2, 3)
(3, 4)

Answer 5


As of 16th May 2020, this is now a simple import:

from more_itertools import pairwise
for current, nxt in pairwise(your_iterable):
  print(f'Current = {current}, next = {nxt}')

Docs for more-itertools. Under the hood, this code is the same as that in the other answers, but I much prefer imports when available.

If you don’t already have it installed then: pip install more-itertools

Example

For instance, if you had the Fibonacci sequence, you could calculate the ratios of subsequent pairs as:

from more_itertools import pairwise
fib= [1,1,2,3,5,8,13]
for current, nxt in pairwise(fib):
    ratio=current/nxt
    print(f'Current = {current}, next = {nxt}, ratio = {ratio}')

Answer 6


Pairs from a list using a list comprehension

the_list = [1, 2, 3, 4]
pairs = [[the_list[i], the_list[i + 1]] for i in range(len(the_list) - 1)]
for [current_item, next_item] in pairs:
    print(current_item, next_item)

Output:

(1, 2)
(2, 3)
(3, 4)

Answer 7


I am really surprised nobody has mentioned the shorter, simpler and, most importantly, more general solution:

Python 3:

from itertools import islice

def n_wise(iterable, n):
    return zip(*(islice(iterable, i, None) for i in range(n)))

Python 2:

from itertools import izip, islice

def n_wise(iterable, n):
    return izip(*(islice(iterable, i, None) for i in xrange(n)))

It works for pairwise iteration by passing n=2, but can handle any higher number:

>>> for a, b in n_wise('Hello!', 2):
>>>     print(a, b)
H e
e l
l l
l o
o !

>>> for a, b, c, d in n_wise('Hello World!', 4):
>>>     print(a, b, c, d)
H e l l
e l l o
l l o
l o   W
o   W o
  W o r
W o r l
o r l d
r l d !

Answer 8


A basic solution:

def neighbors(seq):
  i = 0
  while i + 1 < len(seq):
    yield (seq[i], seq[i + 1])
    i += 1

for (x, y) in neighbors(the_list):
  print(x, y)

Answer 9

code = '0016364ee0942aa7cc04a8189ef3'
# Getting the current and next item
print  [code[idx]+code[idx+1] for idx in range(len(code)-1)]
# Getting the pair
print  [code[idx*2]+code[idx*2+1] for idx in range(len(code)/2)]

Python: assigning multiple variables to the same value? List behavior

Question: Python: assigning multiple variables to the same value? List behavior


I tried to use multiple assignment as shown below to initialize variables, but I got confused by the behavior; I expected the lists to be reassigned separately, i.e. that b[0] and c[0] would equal 0 as before.

a=b=c=[0,3,5]
a[0]=1
print(a)
print(b)
print(c)

Result is: [1, 3, 5] [1, 3, 5] [1, 3, 5]

Is that correct? What should I use for multiple assignment? And how is the following different?

d=e=f=3
e=4
print('f:',f)
print('e:',e)

result: (‘f:’, 3) (‘e:’, 4)


Answer 0


If you’re coming to Python from a language in the C/Java/etc. family, it may help you to stop thinking about a as a “variable”, and start thinking of it as a “name”.

a, b, and c aren’t different variables with equal values; they’re different names for the same identical value. Variables have types, identities, addresses, and all kinds of stuff like that.

Names don’t have any of that. Values do, of course, and you can have lots of names for the same value.

If you give Notorious B.I.G. a hot dog,* Biggie Smalls and Chris Wallace have a hot dog. If you change the first element of a to 1, the first elements of b and c are 1.

If you want to know if two names are naming the same object, use the is operator:

>>> a=b=c=[0,3,5]
>>> a is b
True

You then ask:

what is different from this?

d=e=f=3
e=4
print('f:',f)
print('e:',e)

Here, you’re rebinding the name e to the value 4. That doesn’t affect the names d and f in any way.

In your previous version, you were assigning to a[0], not to a. So, from the point of view of a[0], you’re rebinding a[0], but from the point of view of a, you’re changing it in-place.

You can use the id function, which gives you some unique number representing the identity of an object, to see exactly which object is which even when is can’t help:

>>> a=b=c=[0,3,5]
>>> id(a)
4473392520
>>> id(b)
4473392520
>>> id(a[0])
4297261120
>>> id(b[0])
4297261120

>>> a[0] = 1
>>> id(a)
4473392520
>>> id(b)
4473392520
>>> id(a[0])
4297261216
>>> id(b[0])
4297261216

Notice that a[0] has changed from 4297261120 to 4297261216—it’s now a name for a different value. And b[0] is also now a name for that same new value. That’s because a and b are still naming the same object.


Under the covers, a[0]=1 is actually calling a method on the list object. (It’s equivalent to a.__setitem__(0, 1).) So, it’s not really rebinding anything at all. It’s like calling my_object.set_something(1). Sure, likely the object is rebinding an instance attribute in order to implement this method, but that’s not what’s important; what’s important is that you’re not assigning anything, you’re just mutating the object. And it’s the same with a[0]=1.


user570826 asked:

What if we have, a = b = c = 10

That’s exactly the same situation as a = b = c = [1, 2, 3]: you have three names for the same value.

But in this case, the value is an int, and ints are immutable. In either case, you can rebind a to a different value (e.g., a = "Now I'm a string!"), but that won’t affect the original value, which b and c will still be names for. The difference is that with a list, you can change the value [1, 2, 3] into [1, 2, 3, 4] by doing, e.g., a.append(4); since that’s actually changing the value that b and c are names for, b will now be [1, 2, 3, 4]. There’s no way to change the value 10 into anything else. 10 is 10 forever, just like Claudia the vampire is 5 forever (at least until she’s replaced by Kirsten Dunst).


* Warning: Do not give Notorious B.I.G. a hot dog. Gangsta rap zombies should never be fed after midnight.


回答 1

咳咳

>>> a,b,c = (1,2,3)
>>> a
1
>>> b
2
>>> c
3
>>> a,b,c = ({'test':'a'},{'test':'b'},{'test':'c'})
>>> a
{'test': 'a'}
>>> b
{'test': 'b'}
>>> c
{'test': 'c'}
>>> 

Cough cough

>>> a,b,c = (1,2,3)
>>> a
1
>>> b
2
>>> c
3
>>> a,b,c = ({'test':'a'},{'test':'b'},{'test':'c'})
>>> a
{'test': 'a'}
>>> b
{'test': 'b'}
>>> c
{'test': 'c'}
>>> 

回答 2

是的,这是预期的行为。a、b 和 c 都被设置为同一个列表的标签。如果需要三个不同的列表,则需要分别赋值。你可以重复写出显式的列表,也可以使用多种复制列表的方式之一:

b = a[:] # this does a shallow copy, which is good enough for this case
import copy
c = copy.deepcopy(a) # this does a deep copy, which matters if the list contains mutable objects

Python 中的赋值语句不会复制对象——它们把名称绑定到对象上,而一个对象可以有任意多个标签。在第一次编辑中,更改 a[0] 时,你更新的是 a、b、c 共同引用的那个列表中的一个元素。在第二次编辑中,更改 e 时,你是把 e 切换为另一个对象(4 而不是 3)的标签。

Yes, that’s the expected behavior. a, b and c are all set as labels for the same list. If you want three different lists, you need to assign them individually. You can either repeat the explicit list, or use one of the numerous ways to copy a list:

b = a[:] # this does a shallow copy, which is good enough for this case
import copy
c = copy.deepcopy(a) # this does a deep copy, which matters if the list contains mutable objects

Assignment statements in Python do not copy objects – they bind the name to an object, and an object can have as many labels as you set. In your first edit, changing a[0], you’re updating one element of the single list that a, b, and c all refer to. In your second, changing e, you’re switching e to be a label for a different object (4 instead of 3).
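
A brief sketch of why the shallow-versus-deep distinction in the snippet above matters once the list contains mutable objects:

import copy

a = [[1, 2], [3, 4]]      # a list of mutable objects (inner lists)
b = a[:]                  # shallow copy: new outer list, shared inner lists
c = copy.deepcopy(a)      # deep copy: the inner lists are copied too

a[0].append(99)
print(b[0])               # [1, 2, 99] -- the shallow copy shares the inner list
print(c[0])               # [1, 2]     -- the deep copy is unaffected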


回答 3

在 Python 中,一切都是对象,包括“简单”的变量类型(int、float 等)。

当你更改变量的值时,实际上是在更改它的指针;而当你比较两个变量时,比较的是它们的指针。(明确地说,指针就是变量在计算机物理内存中存储的地址。)

因此,当你就地更改一个值时,你更改的是它在内存中的值,这会影响所有指向该地址的变量。

例如,当您这样做时:

a = b =  5 

这意味着a和b指向内存中包含值5的相同地址,但是当您这样做时:

a = 6

它不会影响b,因为a现在指向另一个包含6的存储位置,而b仍然指向包含5的存储地址。

但是,当您这样做时:

a = b = [1,2,3]

a和b再次指向相同的位置,但是不同之处在于,如果更改列表值之一:

a[0] = 2

它会更改a所指向的内存的值,但是a仍然指向与b相同的地址,结果b也将更改。

In Python, everything is an object, including the “simple” variable types (int, float, etc.).

When you change a variable’s value, you actually change its pointer, and when you compare two variables, you compare their pointers. (To be clear, a pointer is the address in physical computer memory where a variable is stored.)

As a result, when you change a value in place, you change it in memory, and that affects all the variables that point to that address.

For your example, when you do:

a = b =  5 

This means that a and b point to the same address in memory, which contains the value 5, but when you do:

a = 6

It doesn’t affect b, because a now points to another memory location that contains 6, while b still points to the memory address that contains 5.

But, when you do:

a = b = [1,2,3]

a and b again point to the same location, but the difference is that if you change one of the list’s values:

a[0] = 2

it changes the value in the memory that a points to; but a still points to the same address as b, and as a result, b changes as well.


回答 4

您可以用 id(name) 来检查两个名称是否代表同一个对象:

>>> a = b = c = [0, 3, 5]
>>> print(id(a), id(b), id(c))
46268488 46268488 46268488

列表是可变的;这意味着您可以在不创建新对象的情况下就地更改值。但是,这取决于您如何更改值:

>>> a[0] = 1
>>> print(id(a), id(b), id(c))
46268488 46268488 46268488
>>> print(a, b, c)
[1, 3, 5] [1, 3, 5] [1, 3, 5]

如果您将新列表分配给a,则其ID将更改,因此不会影响bc的值:

>>> a = [1, 8, 5]
>>> print(id(a), id(b), id(c))
139423880 46268488 46268488
>>> print(a, b, c)
[1, 8, 5] [1, 3, 5] [1, 3, 5]

整数是不可变的,因此您不能在不创建新对象的情况下更改值:

>>> x = y = z = 1
>>> print(id(x), id(y), id(z))
507081216 507081216 507081216
>>> x = 2
>>> print(id(x), id(y), id(z))
507081248 507081216 507081216
>>> print(x, y, z)
2 1 1

You can use id(name) to check if two names represent the same object:

>>> a = b = c = [0, 3, 5]
>>> print(id(a), id(b), id(c))
46268488 46268488 46268488

Lists are mutable; that means you can change the value in place without creating a new object. However, it depends on how you change the value:

>>> a[0] = 1
>>> print(id(a), id(b), id(c))
46268488 46268488 46268488
>>> print(a, b, c)
[1, 3, 5] [1, 3, 5] [1, 3, 5]

If you assign a new list to a, then its id will change, so it won’t affect b and c‘s values:

>>> a = [1, 8, 5]
>>> print(id(a), id(b), id(c))
139423880 46268488 46268488
>>> print(a, b, c)
[1, 8, 5] [1, 3, 5] [1, 3, 5]

Integers are immutable, so you cannot change the value without creating a new object:

>>> x = y = z = 1
>>> print(id(x), id(y), id(z))
507081216 507081216 507081216
>>> x = 2
>>> print(id(x), id(y), id(z))
507081248 507081216 507081216
>>> print(x, y, z)
2 1 1

回答 5

在第一个示例 a = b = c = [1, 2, 3] 中,您实际上是在说:

 'a' is the same as 'b', is the same as 'c' and they are all [1, 2, 3]

如果要将“ a”设置为1,将“ b”设置为“ 2”,将“ c”设置为3,请尝试以下操作:

a, b, c = [1, 2, 3]

print(a)
--> 1
print(b)
--> 2
print(c)
--> 3

希望这可以帮助!

In your first example, a = b = c = [1, 2, 3], you are really saying:

 'a' is the same as 'b', is the same as 'c' and they are all [1, 2, 3]

If you want to set ‘a’ equal to 1, ‘b’ equal to ‘2’ and ‘c’ equal to 3, try this:

a, b, c = [1, 2, 3]

print(a)
--> 1
print(b)
--> 2
print(c)
--> 3

Hope this helps!


回答 6

简而言之,在第一种情况下,你是在为一个 list 分配多个名称。内存中只创建了该列表的一个副本,所有名称都指向那个位置。因此,用其中任何一个名称更改列表,实际上都会修改内存中的那个列表。

在第二种情况下,将在内存中创建相同值的多个副本。因此,每个副本彼此独立。

Simply put, in the first case, you are assigning multiple names to a list. Only one copy of list is created in memory and all names refer to that location. So changing the list using any of the names will actually modify the list in memory.

In the second case, multiple copies of same value are created in memory. So each copy is independent of one another.
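
A quick sketch contrasting the two situations this answer describes, using is to test object identity:

a = b = [1, 2, 3]      # one list, two names
print(a is b)          # True -- the same object
a.append(4)
print(b)               # [1, 2, 3, 4]

x = [1, 2, 3]
y = [1, 2, 3]          # two independent lists with equal contents
print(x is y)          # False -- equal values, different objects
x.append(4)
print(y)               # [1, 2, 3] -- y is unaffected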


回答 7

您需要的是:

a, b, c = [0,3,5] # Unpack the list, now a, b, and c are ints
a = 1             # `a` did equal 0, not [0,3,5]
print(a)
print(b)
print(c)

What you need is this:

a, b, c = [0,3,5] # Unpack the list, now a, b, and c are ints
a = 1             # `a` did equal 0, not [0,3,5]
print(a)
print(b)
print(c)

回答 8

执行我需要的代码可能是这样的:

# test

aux=[[0 for n in range(3)] for i in range(4)]
print('aux:',aux)

# initialization

a,b,c,d=[[0 for n in range(3)] for i in range(4)]

# changing values

a[0]=1
d[2]=5
print('a:',a)
print('b:',b)
print('c:',c)
print('d:',d)

结果:

('aux:', [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]])
('a:', [1, 0, 0])
('b:', [0, 0, 0])
('c:', [0, 0, 0])
('d:', [0, 0, 5])

The code that does what I need could be this:

# test

aux=[[0 for n in range(3)] for i in range(4)]
print('aux:',aux)

# initialization

a,b,c,d=[[0 for n in range(3)] for i in range(4)]

# changing values

a[0]=1
d[2]=5
print('a:',a)
print('b:',b)
print('c:',c)
print('d:',d)

Result:

('aux:', [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]])
('a:', [1, 0, 0])
('b:', [0, 0, 0])
('c:', [0, 0, 0])
('d:', [0, 0, 5])

Matplotlib透明线图

问题:Matplotlib透明线图

我在matplotlib中绘制了两个相似的轨迹,我想以部分透明的方式绘制每条线,以使红色(绘制的第二个)不会遮盖蓝色。

编辑:这是带有透明线的图像。

I am plotting two similar trajectories in matplotlib and I’d like to plot each of the lines with partial transparency so that the red (plotted second) doesn’t obscure the blue.

EDIT: Here’s the image with transparent lines.


回答 0

干净利落:

plt.plot(x, y, 'r-', alpha=0.7)

(我知道我没有添加任何新内容,但是简单的答案应该可见)。

Plain and simple:

plt.plot(x, y, 'r-', alpha=0.7)

(I know I add nothing new, but the straightforward answer should be visible).


回答 1

绘制完所有线条后,可以如下设置所有线条的透明度:

for l in fig_field.gca().lines:
    l.set_alpha(.7)

编辑:请查看评论中 Joe 的回答。

After I plotted all the lines, I was able to set the transparency of all of them as follows:

for l in fig_field.gca().lines:
    l.set_alpha(.7)

EDIT: please see Joe’s answer in the comments.


回答 2

这实际上取决于你用哪些函数来绘制线条,但请先看看你所用的函数是否接受 alpha 参数,并把它设置为 0.5 之类的值。如果那不起作用,请尝试获取线条对象并直接设置其 alpha 值。

It really depends on what functions you’re using to plot the lines, but try to see if the one you’re using takes an alpha value and set it to something like 0.5. If that doesn’t work, try getting the line objects and setting their alpha values directly.
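
A minimal sketch of that fallback, assuming a plain matplotlib line plot (the data here is made up for illustration):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 0], 'b-')
ax.plot([0, 1, 2], [0.2, 1.2, 0.2], 'r-')

# If the plotting call can't take alpha directly, grab the
# Line2D objects from the axes and set it afterwards:
for line in ax.lines:
    line.set_alpha(0.5)

plt.show()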


词形化与词干的区别是什么?

问题:词形化与词干的区别是什么?

这两者分别应该在什么时候使用?

另外……NLTK 的词形还原是否依赖于词性?如果依赖词性,它会不会更准确?

When do I use each ?

Also…is the NLTK lemmatization dependent upon Parts of Speech? Wouldn’t it be more accurate if it was?


回答 0

简短而密集:http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

词干提取和词形还原的目的,都是把单词的屈折形式(有时还有派生相关的形式)归约为一个共同的基本形式。

但是,两者的风格不同。词干提取通常指一种粗糙的启发式过程:直接砍掉单词的词尾,希望在大多数情况下能正确达到目标,并且通常包括去除派生词缀。词形还原通常指借助词汇表和对单词的形态学分析来规范地完成这件事,通常只去除屈折词尾,返回单词的基本形式或词典形式,即所谓的词元(lemma)。

从NLTK文档:

词形还原和词干提取是规范化的特殊情况。它们为一组相关的词形确定一个规范的代表。

Short and dense: http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form.

However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes. Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.

From the NLTK docs:

Lemmatization and stemming are special cases of normalization. They identify a canonical representative for a set of related word forms.


回答 1

词形还原(lemmatisation)与词干提取(stemming)密切相关。不同之处在于,词干分析器在不了解上下文的情况下对单个单词进行操作,因此无法区分那些含义随词性不同而不同的单词。不过,词干分析器通常更容易实现、运行更快,而且对某些应用来说,精度的降低可能无关紧要。

例如:

  1. 单词 “better” 的词元(lemma)是 “good”。词干提取会漏掉这一联系,因为这需要查词典。

  2. 单词 “walk” 是 “walking” 的基本形式,因此它在词干提取和词形还原中都能匹配。

  3. 根据上下文,单词 “meeting” 既可以是名词的基本形式,也可以是动词(“to meet”)的一种形式,例如 “in our last meeting” 或 “We are meeting again tomorrow”。与词干提取不同,词形还原原则上可以根据上下文选择恰当的词元。

资料来源:https://en.wikipedia.org/wiki/Lemmatisation

Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster, and the reduced accuracy may not matter for some applications.

For instance:

  1. The word “better” has “good” as its lemma. This link is missed by stemming, as it requires a dictionary look-up.

  2. The word “walk” is the base form for word “walking”, and hence this is matched in both stemming and lemmatisation.

  3. The word “meeting” can be either the base form of a noun or a form of a verb (“to meet”) depending on the context, e.g., “in our last meeting” or “We are meeting again tomorrow”. Unlike stemming, lemmatisation can in principle select the appropriate lemma depending on the context.

Source: https://en.wikipedia.org/wiki/Lemmatisation


回答 2

有两个方面可以显示它们的差异:

  1. 词干分析器返回的是单词的词干,它不必与该词的形态学词根完全相同。通常只要相关的词都映射到同一个词干就足够了,即使这个词干本身不是一个有效的词根;而词形还原返回的是单词的词典形式,它必须是一个有效的单词。

  2. 在词形还原中,应当首先确定单词的词性,不同词性的规范化规则各不相同;而词干分析器在不了解上下文的情况下对单个单词进行操作,因此无法区分含义随词性不同而不同的单词。

参考:http://textminingonline.com/dive-into-nltk-part-iv-stemming-and-lemmatization

There are two aspects to show their differences:

  1. A stemmer will return the stem of a word, which needn’t be identical to the morphological root of the word. It is usually sufficient that related words map to the same stem, even if the stem is not in itself a valid root; in lemmatisation, by contrast, it will return the dictionary form of a word, which must be a valid word.

  2. In lemmatisation, the part of speech of a word should first be determined, and the normalisation rules differ for different parts of speech; the stemmer, on the other hand, operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech.

Reference http://textminingonline.com/dive-into-nltk-part-iv-stemming-and-lemmatization


回答 3

词干提取和词形还原的目的都是减少形态变体。这与更通用的“术语合并”(term conflation)过程形成对比,后者还可以处理词汇语义、句法或正字法上的变体。

词干提取和词形还原的真正区别有三点:

  1. 词干提取把词形简化为(伪)词干,而词形还原把词形简化为语言学上有效的词元。这种差异在形态更复杂的语言中很明显,但对许多 IR 应用来说可能无关紧要;

  2. 词形还原只处理屈折变化,而词干提取还可以处理派生变化;

  3. 在实现方面,词形还原通常更复杂(尤其对形态复杂的语言),并且通常需要某种词典。而令人满意的词干提取可以用相当简单的基于规则的方法实现。

词形还原还可以借助词性标注器来消除同形异义词的歧义。

The purpose of both stemming and lemmatization is to reduce morphological variation. This is in contrast to the more general “term conflation” procedures, which may also address lexico-semantic, syntactic, or orthographic variations.

The real difference between stemming and lemmatization is threefold:

  1. Stemming reduces word-forms to (pseudo)stems, whereas lemmatization reduces the word-forms to linguistically valid lemmas. This difference is apparent in languages with more complex morphology, but may be irrelevant for many IR applications;

  2. Lemmatization deals only with inflectional variance, whereas stemming may also deal with derivational variance;

  3. In terms of implementation, lemmatization is usually more sophisticated (especially for morphologically complex languages) and usually requires some sort of lexica. Satisfactory stemming, on the other hand, can be achieved with rather simple rule-based approaches.

Lemmatization may also be backed up by a part-of-speech tagger in order to disambiguate homonyms.
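
As a rough illustration of that last point, here is a sketch of POS-backed lemmatization with NLTK, assuming the punkt, averaged_perceptron_tagger, and wordnet data sets have been fetched via nltk.download(); penn_to_wordnet is a hypothetical helper, not an NLTK function:

import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

# Hypothetical helper: map Penn Treebank tags to WordNet POS constants.
def penn_to_wordnet(tag):
    if tag.startswith('V'):
        return wordnet.VERB
    if tag.startswith('J'):
        return wordnet.ADJ
    if tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN

lemmatizer = WordNetLemmatizer()
tokens = nltk.word_tokenize("We are meeting again tomorrow")
for word, tag in nltk.pos_tag(tokens):
    print(word, '->', lemmatizer.lemmatize(word, penn_to_wordnet(tag)))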


回答 4

正如 MYYN 指出的那样,词干提取是去除屈折词缀(有时还有派生词缀)、得到一个所有原词大概都与之相关的基本形式的过程。词形还原关心的是得到那个能把一堆屈折形式归在一起的单词。这比词干提取更难,因为它需要考虑上下文(也就是单词的含义),而词干提取会忽略上下文。

至于什么时候用哪一个,这取决于你的应用在多大程度上依赖于正确理解单词在上下文中的含义。如果你在做机器翻译,你可能需要词形还原,以避免误译单词。如果你在对十亿份文档做信息检索,而 99% 的查询只有 1-3 个词,那么用词干提取就足够了。

至于 NLTK,WordNetLemmatizer 确实会使用词性,不过你必须自己提供(否则默认为名词)。传入 “dove” 和 “v” 会得到 “dive”,而传入 “dove” 和 “n” 会得到 “dove”。

As MYYN pointed out, stemming is the process of removing inflectional and sometimes derivational affixes to a base form that all of the original words are probably related to. Lemmatization is concerned with obtaining the single word that allows you to group together a bunch of inflected forms. This is harder than stemming because it requires taking the context into account (and thus the meaning of the word), while stemming ignores context.

As for when you would use one or the other, it’s a matter of how much your application depends on getting the meaning of a word in context correct. If you’re doing machine translation, you probably want lemmatization to avoid mistranslating a word. If you’re doing information retrieval over a billion documents with 99% of your queries ranging from 1-3 words, you can settle for stemming.

As for NLTK, the WordNetLemmatizer does use the part of speech, though you have to provide it (otherwise it defaults to nouns). Passing it “dove” and “v” yields “dive” while “dove” and “n” yields “dove”.
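
A minimal sketch of that POS-dependent behavior, assuming the WordNet data has been fetched with nltk.download('wordnet'); the expected outputs are the ones the answer states:

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize("dove", pos="v"))  # 'dive' -- verb reading
print(lemmatizer.lemmatize("dove", pos="n"))  # 'dove' -- noun reading
print(lemmatizer.lemmatize("dove"))           # pos defaults to noun: 'dove'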


回答 5

一个由示例驱动的解释,说明词形还原与词干提取之间的区别:

词形还原既能把 “car” 与 “cars” 匹配,也能把 “car” 与 “automobile” 匹配。

词干提取只处理把 “car” 匹配到 “cars”。

词形还原意味着更大范围的模糊单词匹配,但仍由同一套子系统处理。它暗示了引擎内部某些底层处理技术,也可能反映出工程上对术语的偏好。

[…] 以 FAST 为例,他们的词形还原引擎不仅处理单数/复数这类基本的单词变体,还处理诸如让 “hot” 匹配 “warm” 之类的同义词库运算符。

这并不是说其他引擎不处理同义词——它们当然处理,只是其底层实现可能位于与处理基础词干提取不同的子系统中。

http://www.ideaeng.com/stemming-lemmatization-0601

An example-driven explanation on the differenes between lemmatization and stemming:

Lemmatization handles matching “car” to “cars” along with matching “car” to “automobile”.

Stemming handles matching “car” to “cars”.

Lemmatization implies a broader scope of fuzzy word matching that is still handled by the same subsystems. It implies certain techniques for low level processing within the engine, and may also reflect an engineering preference for terminology.

[…] Taking FAST as an example, their lemmatization engine handles not only basic word variations like singular vs. plural, but also thesaurus operators like having “hot” match “warm”.

This is not to say that other engines don’t handle synonyms, of course they do, but the low level implementation may be in a different subsystem than those that handle base stemming.

http://www.ideaeng.com/stemming-lemmatization-0601


回答 6

IANACL(我不是计算语言学家),
但我认为词干提取是一种粗糙的权宜之计:人们用它把同一个单词的各种不同形式归约为一个基本形式,而这个基本形式本身不必是一个合法的单词。
像 Porter Stemmer 这样的工具可以用简单的正则表达式去掉常见的单词后缀。

词形还原则把单词还原为真正的基本形式;对不规则动词来说,结果可能与输入单词看起来完全不同。
像 Morpha 这样的工具就使用 FST(有限状态转换器)把名词和动词还原为基本形式。

IANACL (I am not a computational linguist),
but I think stemming is a rough hack people use to get all the different forms of the same word down to a base form, which need not be a legit word on its own.
Something like the Porter Stemmer can use simple regexes to eliminate common word suffixes.

Lemmatization brings a word down to its actual base form which, in the case of irregular verbs, might look nothing like the input word.
Something like Morpha, which uses FSTs to bring nouns and verbs to their base form.


回答 7

词干提取只是去掉或截断单词的最后几个字符,常常导致错误的含义和拼写。词形还原则会考虑上下文,把单词转换为有意义的基本形式,即词元(Lemma)。有时,同一个单词可以有多个不同的词元,因此我们应当在具体上下文中识别该词的词性(POS)标签。以下示例说明了全部差异和用例:

  1. 如果对 “Caring” 做词形还原,会得到 “Care”;如果做词干提取,会得到 “Car”,这是错误的。
  2. 如果在动词语境中对 “Stripes” 做词形还原,会得到 “Strip”;在名词语境中则会得到 “Stripe”;如果只做词干提取,则只会得到 “Strip”。
  3. 对 walking、running、swimming 这类词,无论做词干提取还是词形还原,都会得到 walk、run、swim 等相同的结果。
  4. 词形还原在计算上开销较大,因为它涉及查找表之类的东西。如果数据集很大且性能是个问题,就用词干提取;记住,你还可以给词干提取添加自己的规则。如果准确性至关重要且数据集不算庞大,就用词形还原。

Stemming just removes or stems the last few characters of a word, often leading to incorrect meanings and spellings. Lemmatization considers the context and converts the word to its meaningful base form, which is called the lemma. Sometimes the same word can have multiple different lemmas, so we should identify the part-of-speech (POS) tag for the word in that specific context. Here are some examples to illustrate the differences and use cases:

  1. If you lemmatize the word ‘Caring‘, it would return ‘Care‘. If you stem, it would return ‘Car‘ and this is erroneous.
  2. If you lemmatize the word ‘Stripes‘ in verb context, it would return ‘Strip‘. If you lemmatize it in noun context, it would return ‘Stripe‘. If you just stem it, it would just return ‘Strip‘.
  3. You would get same results whether you lemmatize or stem words such as walking, running, swimming… to walk, run, swim etc.
  4. Lemmatization is computationally expensive since it involves look-up tables and what not. If you have a large dataset and performance is an issue, go with stemming. Remember you can also add your own rules to stemming. If accuracy is paramount and the dataset isn’t humongous, go with lemmatization.
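
A small comparison sketch along those lines, assuming NLTK and its WordNet data (nltk.download('wordnet')) are available; exact stemmer output depends on the algorithm chosen, so the results are printed rather than asserted:

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["caring", "stripes", "walking", "running", "swimming"]:
    # Compare the crude stem with the POS-aware lemma (verb context).
    print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word, pos="v"))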

回答 8

词干处理是删除给定单词的最后几个字符以获得较短形式的过程,即使该形式没有任何意义。

例子,

"beautiful" -> "beauti"
"corpora" -> "corpora"

词干提取可以非常快速地完成。

反之,词形还原是根据单词的词典含义,把给定单词转换为其基本形式的过程。

例子,

"beautiful" -> "beauty"
"corpora" -> "corpus"

词形还原比词干提取花费更多的时间。

Stemming is the process of removing the last few characters of a given word, to obtain a shorter form, even if that form doesn’t have any meaning.

Examples,

"beautiful" -> "beauti"
"corpora" -> "corpora"

Stemming can be done very quickly.

Lemmatization, on the other hand, is the process of converting the given word into its base form according to the dictionary meaning of the word.

Examples,

"beautiful" -> "beauty"
"corpora" -> "corpus"

Lemmatization takes more time than stemming.


对于Django 2.0,在urls.py中使用path()或url()更好吗?

问题:对于Django 2.0,在urls.py中使用path()或url()更好吗?

在一门 Django 在线课程中,讲师让我们使用 url() 函数来调用视图,并在 urlpatterns 列表中使用正则表达式。我在 YouTube 上也看到过其他这样的示例,例如:

from django.contrib import admin
from django.urls import include, path
from django.conf.urls import url

urlpatterns = [
    path('admin/', admin.site.urls),
    url(r'^polls/', include('polls.urls')),
]


#and in polls/urls.py

urlpatterns = [        
    url(r'^$', views.index, name="index"),
]

但是,在阅读 Django 官方教程时,他们改用了 path(),例如:

from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name="index"),        
]

此外,正则表达式似乎在 path() 函数中不起作用,因为使用 path(r'^$', views.index, name="index") 将找不到 mysite.com/polls/ 视图。

今后不带正则匹配地使用 path() 是正确的做法吗?还是说 url() 更强大但更复杂,所以教程先用 path() 带我们入门?抑或这是不同的工作使用不同工具的情况?

In a django online course, the instructor has us use the url() function to call views and utilize regular expressions in the urlpatterns list. I’ve seen other examples on youtube of this. e.g.

from django.contrib import admin
from django.urls import include, path
from django.conf.urls import url

urlpatterns = [
    path('admin/', admin.site.urls),
    url(r'^polls/', include('polls.urls')),
]


#and in polls/urls.py

urlpatterns = [        
    url(r'^$', views.index, name="index"),
]

However, in going through the Django tutorial, they use path() instead e.g.:

from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name="index"),        
]

Furthermore regular expressions don’t seem to work with the path() function as using a path(r'^$', views.index, name="index") won’t find the mysite.com/polls/ view.

Is using path() without regex matching the proper way going forward? Is url() more powerful but more complicated so they’re using path() to start us out with? Or is it a case of different tools for different jobs?


回答 0

来自 Django 文档中关于 url 的说明:

url(regex, view, kwargs=None, name=None):此函数是 django.urls.re_path() 的别名,在将来的版本中很可能会被弃用。

path 与 re_path 的主要区别在于:path 使用的路由不带正则表达式。

复杂的正则匹配可以用 re_path,更简单的查找则直接用 path。

From Django documentation for url

url(regex, view, kwargs=None, name=None) This function is an alias to django.urls.re_path(). It’s likely to be deprecated in a future release.

The key difference between path and re_path is that path uses a route without regex.

You can use re_path for complex regex calls, and just path for simpler lookups.
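
A small sketch of that split, reusing the year_archive pattern that appears later in this thread (views.year_archive is assumed to exist in the app):

from django.urls import path, re_path
from . import views

urlpatterns = [
    path('articles/<int:year>/', views.year_archive),                  # simple converter syntax
    re_path(r'^articles/(?P<year>[0-9]{4})/$', views.year_archive),    # equivalent regex route
]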


回答 1

新的 django.urls.path() 函数允许使用更简单、更易读的 URL 路由语法。例如,下面这个来自以前 Django 版本的示例:

url(r'^articles/(?P<year>[0-9]{4})/$', views.year_archive)

可以写成:

path('articles/<int:year>/', views.year_archive)

以前版本中的 django.conf.urls.url() 函数现在以 django.urls.re_path() 的形式提供。旧位置出于向后兼容而保留,不会很快被弃用。旧的 django.conf.urls.include() 函数现在可以从 django.urls 导入,因此你可以使用:

from django.urls import include, path, re_path

在 URLconf 中这样写。进一步阅读请参见 Django 文档。

The new django.urls.path() function allows a simpler, more readable URL routing syntax. For example, this example from previous Django releases:

url(r'^articles/(?P<year>[0-9]{4})/$', views.year_archive)

could be written as:

path('articles/<int:year>/', views.year_archive)

The django.conf.urls.url() function from previous versions is now available as django.urls.re_path(). The old location remains for backwards compatibility, without an imminent deprecation. The old django.conf.urls.include() function is now importable from django.urls so you can use:

from django.urls import include, path, re_path

in the URLconfs. For further reading, see the Django docs.


回答 2

path 是 Django 2.0 的新特性,而 Django 2.0 几周前才刚发布。大多数教程还没有针对新语法进行更新。

当然,这本来就应该是一种更简单的做法。不过我不会说 url() 更强大;两种格式应该都能表达同样的模式。

path is simply new in Django 2.0, which was only released a couple of weeks ago. Most tutorials won’t have been updated for the new syntax.

It was certainly supposed to be a simpler way of doing things; I wouldn’t say that url() is more powerful though, you should be able to express patterns in either format.


回答 3

正则表达式在带有以下参数的 path() 函数中似乎不起作用:path(r'^$', views.index, name="index")。

应该写成这样:path('', views.index, name="index")。

第一个参数必须是空字符串 '',而不是正则表达式。

Regular expressions don’t seem to work with the path() function with the following arguments: path(r'^$', views.index, name="index").

It should be like this: path('', views.index, name="index").

The first argument must be the blank string '', not a regular expression.


回答 4

Path 是 Django 2.0 的新特性。官方说明见:https://docs.djangoproject.com/en/2.0/releases/2.0/#whats-new-2-0

这看起来是更 Pythonic 的方式,并且可以在传给视图的参数中不使用正则表达式……例如,你可以使用 int() 函数。

Path is a new feature of Django 2.0. It is explained here: https://docs.djangoproject.com/en/2.0/releases/2.0/#whats-new-2-0

It looks like a more Pythonic way, and it lets you avoid regular expressions in the arguments you pass to the view… you can use the int() function, for example.


回答 5

从 v2.0 开始,许多用户都在使用 path,但我们既可以用 path,也可以用 url。例如,在 Django 2.1.1 中,可以像下面这样通过 url 映射到函数:

from django.contrib import admin
from django.urls import path

from django.contrib.auth import login
from posts.views import post_home
from django.conf.urls import url

urlpatterns = [
    path('admin/', admin.site.urls),
    url(r'^posts/$', post_home, name='post_home'),

]

其中posts是应用程序,而post_home是views.py中的函数

From v2.0, many users are using path, but we can use either path or url. For example, in Django 2.1.1, mapping to functions through url can be done as follows:

from django.contrib import admin
from django.urls import path

from django.contrib.auth import login
from posts.views import post_home
from django.conf.urls import url

urlpatterns = [
    path('admin/', admin.site.urls),
    url(r'^posts/$', post_home, name='post_home'),

]

where posts is an application & post_home is a function in views.py


如何获取父目录位置

问题:如何获取父目录位置

这段代码在 b.py 中获取 templates/blog1/page.html:

path = os.path.join(os.path.dirname(__file__), os.path.join('templates', 'blog1/page.html'))

但我想获取父目录位置:

aParent
   |--a
   |  |---b.py
   |      |---templates
   |              |--------blog1
   |                         |-------page.html
   |--templates
          |--------blog1
                     |-------page.html

以及如何获取 aParent 的位置?

谢谢

更新:

这是对的:

dirname=os.path.dirname
path = os.path.join(dirname(dirname(__file__)), os.path.join('templates', 'blog1/page.html'))

要么

path = os.path.abspath(os.path.join(os.path.dirname(__file__),".."))

this code gets templates/blog1/page.html in b.py:

path = os.path.join(os.path.dirname(__file__), os.path.join('templates', 'blog1/page.html'))

but i want to get the parent dir location:

aParent
   |--a
   |  |---b.py
   |      |---templates
   |              |--------blog1
   |                         |-------page.html
   |--templates
          |--------blog1
                     |-------page.html

and how to get the aParent location

thanks

updated:

this is right:

dirname=os.path.dirname
path = os.path.join(dirname(dirname(__file__)), os.path.join('templates', 'blog1/page.html'))

or

path = os.path.abspath(os.path.join(os.path.dirname(__file__),".."))

回答 0

您可以重复应用 dirname 来向上爬:dirname(dirname(__file__))。但是,这最多只能到达顶层包。如果这是个问题,请使用 os.path.abspath:dirname(dirname(abspath(__file__)))。

You can apply dirname repeatedly to climb higher: dirname(dirname(__file__)). This can only go as far as the root package, however. If this is a problem, use os.path.abspath: dirname(dirname(abspath(__file__))).


回答 1

os.path.abspath 不会验证任何内容,因此,如果我们反正已经在向 __file__ 追加字符串,就没必要费心 dirname 或 join 之类的东西。只需把 __file__ 当作目录,开始向上爬:

# climb to __file__'s parent's parent:
os.path.abspath(__file__ + "/../../")

这远没有 os.path.abspath(os.path.join(os.path.dirname(__file__),"..")) 那么绕,而且和 dirname(dirname(__file__)) 差不多易于管理。向上爬超过两层就开始显得荒谬了。

但是,由于我们知道要爬多少层,我们可以用一个简单的小函数来清理它:

uppath = lambda _path, n: os.sep.join(_path.split(os.sep)[:-n])

# __file__ = "/aParent/templates/blog1/page.html"
>>> uppath(__file__, 1)
'/aParent/templates/blog1'
>>> uppath(__file__, 2)
'/aParent/templates'
>>> uppath(__file__, 3)
'/aParent'

os.path.abspath doesn’t validate anything, so if we’re already appending strings to __file__ there’s no need to bother with dirname or joining or any of that. Just treat __file__ as a directory and start climbing:

# climb to __file__'s parent's parent:
os.path.abspath(__file__ + "/../../")

That’s far less convoluted than os.path.abspath(os.path.join(os.path.dirname(__file__),"..")) and about as manageable as dirname(dirname(__file__)). Climbing more than two levels starts to get ridiculous.

But, since we know how many levels to climb, we could clean this up with a simple little function:

uppath = lambda _path, n: os.sep.join(_path.split(os.sep)[:-n])

# __file__ = "/aParent/templates/blog1/page.html"
>>> uppath(__file__, 1)
'/aParent/templates/blog1'
>>> uppath(__file__, 2)
'/aParent/templates'
>>> uppath(__file__, 3)
'/aParent'

回答 2

在 Python 3.4+ 中,配合 pathlib 模块使用相对路径:

from pathlib import Path

Path(__file__).parent

您可以多次调用 parent,在路径中继续向上:

Path(__file__).parent.parent

作为指定两次 parent 的替代方法,可以使用:

Path(__file__).parents[1]

Use relative paths with the pathlib module in Python 3.4+:

from pathlib import Path

Path(__file__).parent

You can use multiple calls to parent to go further in the path:

Path(__file__).parent.parent

As an alternative to specifying parent twice, you can use:

Path(__file__).parents[1]

回答 3

os.path.dirname(os.path.abspath(__file__))

应该会给你 a 的路径。

但如果 b.py 是当前正在执行的文件,那么只需下面这样就能达到同样的效果:

os.path.abspath(os.path.join('templates', 'blog1', 'page.html'))

os.path.dirname(os.path.abspath(__file__))

Should give you the path to a.

But if b.py is the file that is currently executed, then you can achieve the same by just doing

os.path.abspath(os.path.join('templates', 'blog1', 'page.html'))

回答 4

os.pardir 是比 ../ 更好的方式,可读性更强。

import os
print os.path.abspath(os.path.join(given_path, os.pardir))  

这将返回给定路径的父路径

os.pardir is a better way to say ../ and is more readable.

import os
print os.path.abspath(os.path.join(given_path, os.pardir))  

This will return the parent path of the given_path


回答 5

一种简单的方法可以是:

import os
current_dir =  os.path.abspath(os.path.dirname(__file__))
parent_dir = os.path.abspath(current_dir + "/../")
print parent_dir

A simple way can be:

import os
current_dir =  os.path.abspath(os.path.dirname(__file__))
parent_dir = os.path.abspath(current_dir + "/../")
print parent_dir

回答 6

也许可以连接两个 ..,来获取父文件夹的父文件夹?

path = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(__file__)),"..",".."))

Maybe join two .. entries, to get the parent of the parent folder?

path = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(__file__)),"..",".."))

回答 7

使用以下命令跳到上一个文件夹:

os.chdir(os.pardir)

如果您需要多次跳转,那么在这种情况下,使用简单的装饰器是一个好而简单的解决方案。

Use the following to jump to previous folder:

os.chdir(os.pardir)

If you need multiple jumps, a good and easy solution in this case is to use a simple decorator.
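
The answer doesn’t show the decorator it alludes to; here is one hypothetical sketch of what it could look like (run_in_parent is an invented name, and note that changing the process-wide working directory is a side effect to use with care):

import os
from functools import wraps

def run_in_parent(levels=1):
    """Hypothetical decorator: run the wrapped function with the
    working directory moved up `levels` folders, then restore it."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            original = os.getcwd()
            try:
                for _ in range(levels):
                    os.chdir(os.pardir)   # one jump per level
                return func(*args, **kwargs)
            finally:
                os.chdir(original)        # always restore the original CWD
        return wrapper
    return decorator

@run_in_parent(levels=2)
def show_grandparent():
    print(os.getcwd())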


回答 8

这是另一个相对简单的解决方案:

  • 不使用 dirname()(它在 “file.txt” 这样的单层参数或 “..” 这样的相对父级路径上不能按预期工作)
  • 不使用 abspath()(避免对当前工作目录做任何假设),而是保留路径的相对性质

它只是使用normpathjoin

def parent(p):
    return os.path.normpath(os.path.join(p, os.path.pardir))

# Example:
for p in ['foo', 'foo/bar/baz', 'with/trailing/slash/', 
        'dir/file.txt', '../up/', '/abs/path']:
    print parent(p)

结果:

.
foo/bar
with/trailing
dir
..
/abs

Here is another relatively simple solution that:

  • does not use dirname() (which does not work as expected on one level arguments like “file.txt” or relative parents like “..”)
  • does not use abspath() (avoiding any assumptions about the current working directory) but instead preserves the relative character of paths

it just uses normpath and join:

def parent(p):
    return os.path.normpath(os.path.join(p, os.path.pardir))

# Example:
for p in ['foo', 'foo/bar/baz', 'with/trailing/slash/', 
        'dir/file.txt', '../up/', '/abs/path']:
    print parent(p)

Result:

.
foo/bar
with/trailing
dir
..
/abs

回答 9

我认为用这个更好:

os.path.realpath(__file__).rsplit('/', X)[0]


In [1]: __file__ = "/aParent/templates/blog1/page.html"

In [2]: os.path.realpath(__file__).rsplit('/', 3)[0]
Out[3]: '/aParent'

In [4]: __file__ = "/aParent/templates/blog1/page.html"

In [5]: os.path.realpath(__file__).rsplit('/', 1)[0]
Out[6]: '/aParent/templates/blog1'

In [7]: os.path.realpath(__file__).rsplit('/', 2)[0]
Out[8]: '/aParent/templates'

In [9]: os.path.realpath(__file__).rsplit('/', 3)[0]
Out[10]: '/aParent'

I think it’s better to use this:

os.path.realpath(__file__).rsplit('/', X)[0]


In [1]: __file__ = "/aParent/templates/blog1/page.html"

In [2]: os.path.realpath(__file__).rsplit('/', 3)[0]
Out[3]: '/aParent'

In [4]: __file__ = "/aParent/templates/blog1/page.html"

In [5]: os.path.realpath(__file__).rsplit('/', 1)[0]
Out[6]: '/aParent/templates/blog1'

In [7]: os.path.realpath(__file__).rsplit('/', 2)[0]
Out[8]: '/aParent/templates'

In [9]: os.path.realpath(__file__).rsplit('/', 3)[0]
Out[10]: '/aParent'

回答 10

我试过了:

import os
import inspect
os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))), os.pardir))

I tried:

import os
import inspect
os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))), os.pardir))