How to save traceback / sys.exc_info() values in a variable?

Question: How to save traceback / sys.exc_info() values in a variable?

I want to save the name of the error and the traceback details into a variable. Here is my attempt.

import sys
try:
    try:
        print x
    except Exception, ex:
        raise NameError
except Exception, er:
    print "0", sys.exc_info()[0]
    print "1", sys.exc_info()[1]
    print "2", sys.exc_info()[2]

Output:

0 <type 'exceptions.NameError'>
1 
2 <traceback object at 0xbd5fc8>

Desired Output:

0 NameError
1
2 Traceback (most recent call last):
  File "exception.py", line 6, in <module>
    raise NameError

P.S. I know this can be done easily using the traceback module, but I want to know how to use the sys.exc_info()[2] object here.


Answer 0

This is how I do it:

>>> import traceback
>>> try:
...   int('k')
... except:
...   var = traceback.format_exc()
... 
>>> print var
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: invalid literal for int() with base 10: 'k'

You should, however, take a look at the traceback documentation, as you might find more suitable methods there, depending on how you want to process your variable afterwards…
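For readers who want to work with the sys.exc_info()[2] object directly, as the question asks, here is a minimal Python 3 sketch. The helper failing() is a made-up stand-in for any code that raises; traceback.extract_tb() and traceback.format_tb() are standard-library functions that accept the traceback object.

```python
import sys
import traceback


def failing():
    # Hypothetical helper: any code that raises would do.
    return int("k")


try:
    failing()
except ValueError:
    tb = sys.exc_info()[2]                       # the raw traceback object
    frames = traceback.extract_tb(tb)            # structured entries, one per frame
    tb_text = "".join(traceback.format_tb(tb))   # the "File ..., line ..." lines as one string

# The last entry describes the innermost frame, where the error was raised.
innermost = frames[-1]
print(innermost.name, innermost.lineno)
print(tb_text)
```

format_tb() gives only the stack lines; traceback.format_exc() from the answer above additionally includes the "Traceback (most recent call last):" header and the final exception line.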


Answer 1

sys.exc_info() returns a tuple with three values (type, value, traceback).

  1. type is the class of the exception being handled
  2. value is the exception instance itself; the arguments that were passed to the constructor of the exception class are available as value.args
  3. traceback is a traceback object that holds the stack information, such as where the exception occurred

For example, in the following program

import sys

try:
    a = 1/0
except Exception as e:
    exc_tuple = sys.exc_info()

Now if we print the tuple, the values will be as follows.

  1. exc_tuple[0] will be the class ZeroDivisionError
  2. exc_tuple[1] will be the exception instance, which prints as “integer division or modulo by zero” (the string passed as a parameter to the exception class; Python 3 words it “division by zero”)
  3. exc_tuple[2] will be a “traceback object at (some memory address)”

The message alone can also be fetched inside the handler by simply printing the exception in string format.

print(str(e))
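As a rough Python 3 sketch of how the three tuple members can be turned into the strings the question asks for: the function name describe_current_exception and the dictionary keys are assumptions made for illustration, while traceback.format_exception() is the standard-library call doing the formatting.

```python
import sys
import traceback


def describe_current_exception():
    """Return the name, message and formatted traceback of the exception being handled."""
    exc_type, exc_value, tb = sys.exc_info()
    return {
        "name": exc_type.__name__,    # e.g. "ZeroDivisionError" rather than the class repr
        "message": str(exc_value),    # the exception instance rendered as text
        "traceback": "".join(traceback.format_exception(exc_type, exc_value, tb)),
    }


try:
    1 / 0
except ZeroDivisionError:
    details = describe_current_exception()

print(details["name"])
print(details["traceback"])
```

The helper only makes sense when called from inside an except block, because sys.exc_info() returns (None, None, None) when no exception is being handled.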

Answer 2

Use traceback.extract_stack() if you want convenient access to module and function names and line numbers.

Use ''.join(traceback.format_stack()) if you just want a string that looks like the traceback.print_stack() output.

Notice that even with ''.join() you will get a multi-line string, since the elements of format_stack() contain \n. See output below.

Remember to import traceback.

Here’s the output from traceback.extract_stack(). Formatting added for readability.

>>> traceback.extract_stack()
[
   ('<string>', 1, '<module>', None),
   ('C:\\Python\\lib\\idlelib\\run.py', 126, 'main', 'ret = method(*args, **kwargs)'),
   ('C:\\Python\\lib\\idlelib\\run.py', 353, 'runcode', 'exec(code, self.locals)'),
   ('<pyshell#1>', 1, '<module>', None)
]

Here’s the output from ''.join(traceback.format_stack()). Formatting added for readability.

>>> ''.join(traceback.format_stack())
'  File "<string>", line 1, in <module>\n
   File "C:\\Python\\lib\\idlelib\\run.py", line 126, in main\n
       ret = method(*args, **kwargs)\n
   File "C:\\Python\\lib\\idlelib\\run.py", line 353, in runcode\n
       exec(code, self.locals)\n  File "<pyshell#2>", line 1, in <module>\n'
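A small Python 3 sketch of both calls used from inside a function may help. The function name where_am_i is made up for illustration. In Python 3 the entries returned by extract_stack() are FrameSummary objects that expose filename, lineno, name and line attributes, and can still be unpacked like the tuples shown above.

```python
import traceback


def where_am_i():
    # Structured view: the last entry is the frame that called extract_stack().
    stack = traceback.extract_stack()
    # Text view: one multi-line string, like traceback.print_stack() would print.
    text = "".join(traceback.format_stack())
    return stack[-1].name, text


name, text = where_am_i()
print(name)
print(text)
```

Note that these functions describe the current call stack, not the stack of an exception; for an exception being handled, traceback.format_exc() or the traceback object from sys.exc_info() is the relevant source.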

Answer 3

Be careful when you take the exception object or the traceback object out of the exception handler, since this causes circular references that gc.collect() will fail to collect. This appears to be a particular problem in the IPython/Jupyter notebook environment, where the traceback object doesn’t get cleared at the right time and even an explicit call to gc.collect() in the finally section does nothing. That’s a huge problem if you have some huge objects whose memory doesn’t get reclaimed because of it (e.g. CUDA out-of-memory exceptions, which without this solution require a complete kernel restart to recover from).

In general, if you want to save the traceback object, you need to clear its references to the frames’ locals(), like so:

import sys, traceback, gc
# use exc_type rather than type, which would shadow the built-in
exc_type, val, tb = None, None, None
try:
    myfunc()
except:
    exc_type, val, tb = sys.exc_info()
    traceback.clear_frames(tb)
# some cleanup code
gc.collect()
# and then use the tb:
if tb:
    raise val.with_traceback(tb)

In the case of a Jupyter notebook, you have to do that at the very least inside the exception handler:

try:
    myfunc()
except:
    exc_type, val, tb = sys.exc_info()
    traceback.clear_frames(tb)
    raise val.with_traceback(tb)
finally:
    # cleanup code in here
    gc.collect()

Tested with Python 3.7.

P.S. The problem with the IPython or Jupyter notebook environment is that it has the %tb magic, which saves the traceback and makes it available at any point later. As a result, the locals() of all frames participating in the traceback will not be freed until the notebook exits or another exception overwrites the previously stored traceback. This is very problematic. It should not store the traceback without cleaning its frames. Fix submitted here.
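The effect of traceback.clear_frames() can be observed with a weak reference. The following is a minimal sketch that assumes CPython's reference counting; the class Big, the list refs and the function myfunc are all hypothetical stand-ins for a function that holds a large local object when it fails.

```python
import gc
import sys
import traceback
import weakref


class Big:
    """Stand-in for a large object, e.g. a big tensor (hypothetical)."""


refs = []


def myfunc():
    big = Big()                      # a local that lives in myfunc's frame
    refs.append(weakref.ref(big))    # lets us observe whether it is still alive
    raise RuntimeError("boom")


saved_tb = None
try:
    myfunc()
except RuntimeError:
    saved_tb = sys.exc_info()[2]     # keep the traceback beyond the handler

# The saved traceback references myfunc's frame, which still holds `big`.
alive_before = refs[0]() is not None

traceback.clear_frames(saved_tb)     # drop the locals of the finished frames
gc.collect()
alive_after = refs[0]() is not None

print(alive_before, alive_after)
```

The traceback object itself stays usable for formatting after the call; only the local variables of the frames it refers to are released.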


Answer 4

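The traceback object can be passed as the argument to the exception’s with_traceback() method. The method returns the exception itself, so print() shows only its message:

import sys
try:
    1 / 0  # placeholder for any failing statement
except Exception as e:
    tb = sys.exc_info()
    print(e.with_traceback(tb[2]))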

Demystifying Flask app.secret_key

Question: Demystifying Flask app.secret_key

If app.secret_key isn’t set, Flask will not allow you to set or access the session dictionary.

This is all that the Flask user guide has to say on the subject.

I am very new to web development and I have no idea how/why any security stuff works. I would like to understand what Flask is doing under the hood.

  • Why does Flask force us to set this secret_key property?
  • How does Flask use the secret_key property?

Answer 0

Anything that requires cryptographic protection (for safe-keeping against tampering by attackers) requires the secret key to be set. For just Flask itself, that ‘anything’ is the Session object, but other extensions can make use of the same secret.

secret_key is merely the value set for the SECRET_KEY configuration key, or you can set it directly.

The Sessions section in the Quickstart has good, sane advice on what kind of server-side secret you should set.

Cryptographic signing relies on secrets; if you didn’t set a server-side secret for the signature to use, anyone would be able to forge it; it’s like the password to your computer. The secret plus the data-to-sign are used to create a signature string, a hard-to-recreate value produced with a cryptographic hashing algorithm; only if you have the exact same secret and the original data can you recreate this value, letting Flask detect if anything has been altered without permission. Since the secret is never included with the data Flask sends to the client, a client cannot tamper with session data and hope to produce a new, valid signature.

Flask uses the itsdangerous library to do all the hard work; sessions use the itsdangerous.URLSafeTimedSerializer class with a customized JSON serializer.


Answer 1

The answer below pertains primarily to Signed Cookies, an implementation of the concept of sessions (as used in web applications). Flask offers both normal (unsigned) cookies (via request.cookies and response.set_cookie()) and signed cookies (via flask.session). The answer has two parts: the first describes how a Signed Cookie is generated, and the second is presented as a Q&A that addresses different aspects of the scheme. The syntax used for the examples is Python 3, but the concepts also apply to previous versions.

What is SECRET_KEY (or how to create a Signed Cookie)?

Signing cookies is a preventive measure against cookie tampering. During the process of signing a cookie, the SECRET_KEY is used in a way similar to how a “salt” would be used to muddle a password before hashing it. Here’s a (wildly) simplified description of the concept. The code in the examples is meant to be illustrative: many of the steps have been omitted and not all of the functions actually exist. The goal here is to provide an understanding of the general idea; actual implementations will be a bit more involved. Also, keep in mind that Flask does most of this for you in the background. So, besides setting values on your cookie (via the session API) and providing a SECRET_KEY, it’s not only ill-advised to reimplement this yourself, but there’s no need to do so:

A poor man’s cookie signature

Before sending a Response to the browser:

( 1 ) First a SECRET_KEY is established. It should only be known to the application and should be kept relatively constant during the application’s life cycle, including through application restarts.

# choose a salt, a secret string of bytes
>>> SECRET_KEY = 'my super secret key'.encode('utf8')

( 2 ) Create a cookie.

>>> cookie = make_cookie(
...     name='_profile', 
...     content='uid=382|membership=regular',
...     ...
...     expires='July 1 2030...'
... )

>>> print(cookie)
name: _profile
content: uid=382|membership=regular...
    ...
    ...
expires: July 1 2030, 1:20:40 AM UTC

( 3 ) To create a signature, append (or prepend) the SECRET_KEY to the cookie byte string, then generate a hash from that combination.

# encode and salt the cookie, then hash the result
>>> cookie_bytes = str(cookie).encode('utf8')
>>> signature = sha1(cookie_bytes+SECRET_KEY).hexdigest()
>>> print(signature)
7ae0e9e033b5fa53aa....

( 4 ) Now affix the signature at one end of the content field of the original cookie.

# include signature as part of the cookie
>>> cookie.content = cookie.content + '|' + signature
>>> print(cookie)
name: _profile
content: uid=382|membership=regular|7ae0e9...  <--- signature
domain: .example.com
path: /
send for: Encrypted connections only
expires: July 1 2030, 1:20:40 AM UTC

And that’s what’s sent to the client.

# add cookie to response
>>> response.set_cookie(cookie)
# send to browser --> 

Upon receiving the cookie from the browser:

( 5 ) When the browser returns this cookie back to the server, strip the signature from the cookie’s content field to get back the original cookie.

# Upon receiving the cookie from browser
>>> cookie = request.get_cookie()
# pop the signature out of the cookie
>>> (cookie.content, popped_signature) = cookie.content.rsplit('|', 1)

( 6 ) Use the original cookie with the application’s SECRET_KEY to recalculate the signature using the same method as in step 3.

# recalculate signature using SECRET_KEY and original cookie
>>> cookie_bytes = str(cookie).encode('utf8')
>>> calculated_signature = sha1(cookie_bytes+SECRET_KEY).hexdigest()

( 7 ) Compare the calculated result with the signature previously popped out of the just received cookie. If they match, we know that the cookie has not been messed with. But if even just a space has been added to the cookie, the signatures won’t match.

# if both signatures match, your cookie has not been modified
>>> good_cookie = popped_signature==calculated_signature

( 8 ) If they don’t match, you may respond with any number of actions: log the event, discard the cookie, issue a fresh one, redirect to a login page, etc.

>>> if not good_cookie:
...     security_log(cookie)
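The eight steps above can be condensed into a runnable Python 3 sketch. It keeps the deliberately naive "hash of content plus secret" scheme from the walkthrough, so it is an illustration of the idea and not how Flask actually signs sessions. As a simplification, it signs only the content string instead of the whole cookie, and the names sign and verify are made up for this example.

```python
import hashlib
import hmac

SECRET_KEY = b"my super secret key"   # illustrative value only


def sign(content: str) -> str:
    """Steps 3 and 4: hash content + secret and append the digest to the content."""
    digest = hashlib.sha1(content.encode("utf8") + SECRET_KEY).hexdigest()
    return content + "|" + digest


def verify(signed: str):
    """Steps 5 to 7: pop the signature, recompute it, and compare."""
    content, _, popped = signed.rpartition("|")
    expected = hashlib.sha1(content.encode("utf8") + SECRET_KEY).hexdigest()
    # compare_digest avoids leaking, through timing, how many characters matched
    if hmac.compare_digest(popped.encode("utf8"), expected.encode("utf8")):
        return content
    return None   # step 8: the caller decides how to react


signed = sign("uid=382|membership=regular")
tampered = signed.replace("regular", "premium")

print(verify(signed))     # the original content
print(verify(tampered))   # None, because the signature no longer matches
```

Changing even one character of the content without knowing SECRET_KEY leaves the attacker unable to produce the matching digest.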

Hash-based Message Authentication Code (HMAC)

The type of signature generated above, which requires a secret key to ensure the integrity of some content, is called a Message Authentication Code, or MAC, in cryptography.

I specified earlier that the example above is an oversimplification of that concept and that it wasn’t a good idea to implement your own signing. That’s because the algorithm used to sign cookies in Flask is called HMAC and is a bit more involved than the simple step-by-step above. The general idea is the same, but for reasons beyond the scope of this discussion, the series of computations is a bit more complex. If you’re still interested in crafting a DIY version, as is usually the case, Python has some modules to help you get started :) Here’s a starting block:

import hmac
import hashlib

def create_signature(secret_key, msg, digestmod=None):
    if digestmod is None:
        digestmod = hashlib.sha1
    mac = hmac.new(secret_key, msg=msg, digestmod=digestmod)
    return mac.digest()

See the documentation for hmac and hashlib.
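To round off the starting block, here is a hedged sketch of how such a signature might be checked on the way back in. The helper is_valid and the sample key and message are assumptions for illustration; hmac.new() and hmac.compare_digest() are standard-library calls.

```python
import hashlib
import hmac


def create_signature(secret_key: bytes, msg: bytes) -> str:
    """Return the hex-encoded HMAC-SHA1 of msg under secret_key."""
    return hmac.new(secret_key, msg=msg, digestmod=hashlib.sha1).hexdigest()


def is_valid(secret_key: bytes, msg: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = create_signature(secret_key, msg)
    return hmac.compare_digest(expected, signature)


key = b"illustrative secret"              # made-up key for the example
message = b"uid=382|membership=regular"
signature = create_signature(key, message)

print(is_valid(key, message, signature))                         # unmodified message
print(is_valid(key, b"uid=382|membership=premium", signature))   # modified message
```

Using compare_digest() instead of == is a common precaution so that the comparison time does not reveal how much of a forged signature was correct.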


The “Demystification” of SECRET_KEY :)

What’s a “signature” in this context?

It’s a method to ensure that some content has not been modified by anyone other than a person or an entity authorized to do so.

One of the simplest forms of signature is the “checksum”, which simply verifies that two pieces of data are the same. For example, when installing software from source, it’s important to first confirm that your copy of the source code is identical to the author’s. A common approach is to run the source through a cryptographic hash function and compare the output with the checksum published on the project’s home page.

Let’s say for instance that you’re about to download a project’s source in a gzipped file from a web mirror. The SHA1 checksum published on the project’s web page is ‘eb84e8da7ca23e9f83….’

# so you get the code from the mirror
download https://mirror.example-codedump.com/source_code.tar.gz
# you calculate the hash as instructed
sha1(source_code.tar.gz)
> eb84e8da7c....

If both hashes are the same, you know that you have an identical copy.
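In Python, the checksum comparison described above could look roughly like the sketch below. The function name sha1_of_file is made up, and a temporary file with arbitrary bytes stands in for the downloaded archive, so the "published" checksum here is computed locally purely for demonstration.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=65536):
    """Hash a file in chunks so that large archives need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# A throwaway file stands in for source_code.tar.gz (illustrative only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example archive bytes")

# In real use this value would be copied from the project's web page.
published_checksum = hashlib.sha1(b"example archive bytes").hexdigest()

matches = sha1_of_file(tmp.name) == published_checksum
os.remove(tmp.name)
print(matches)
```

A plain checksum like this needs no secret because the reference value is published by a party you already trust.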

What’s a cookie?

An extensive discussion on cookies would go beyond the scope of this question. I provide an overview here since a minimal grasp of them helps in understanding how and why SECRET_KEY is useful. I highly encourage you to follow up with some personal reading on HTTP Cookies.

A common practice in web applications is to use the client (web browser) as a lightweight cache. Cookies are one implementation of this practice. A cookie is typically some data added by the server to an HTTP response by way of its headers. It’s kept by the browser which subsequently sends it back to the server when issuing requests, also by way of HTTP headers. The data contained in a cookie can be used to emulate what’s called statefulness, the illusion that the server is maintaining an ongoing connection with the client. Only, in this case, instead of a wire to keep the connection “alive”, you simply have snapshots of the state of the application after it has handled a client’s request. These snapshots are carried back and forth between client and server. Upon receiving a request, the server first reads the content of the cookie to reestablish the context of its conversation with the client. It then handles the request within that context and before returning the response to the client, updates the cookie. The illusion of an ongoing session is thus maintained.

What does a cookie look like?

A typical cookie would look like this:

name: _profile
content: uid=382|status=genie
domain: .example.com
path: /
send for: Encrypted connections only
expires: July 1 2030, 1:20:40 AM UTC

Cookies are trivial to peruse from any modern browser. On Firefox for example go to Preferences > Privacy > History > remove individual cookies.

The content field is the most relevant to the application. Other fields carry mostly meta instructions to specify various scopes of influence.
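For a feel of how those fields travel in an HTTP header, here is a small sketch using the standard-library http.cookies module. The cookie name and values are taken from the example above; everything else is just one possible way of building the header, not what Flask does internally.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["_profile"] = "uid=382|status=genie"      # the content field
cookie["_profile"]["domain"] = ".example.com"    # which hosts receive the cookie
cookie["_profile"]["path"] = "/"                 # which URL paths receive it
cookie["_profile"]["secure"] = True              # send over encrypted connections only

# The server adds this line to the HTTP response headers.
header = cookie.output()
print(header)
```

The browser stores the value and sends it back in a Cookie request header on later requests that match the domain and path.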

Why use cookies at all?

The short answer is performance. Using cookies minimizes the need to look things up in various data stores (memory caches, files, databases, etc.), thus speeding things up on the server application’s side. Keep in mind that the bigger the cookie, the heavier the payload over the network, so what you save in database lookups on the server you might lose over the network. Consider carefully what to include in your cookies.

Why would cookies need to be signed?

Cookies are used to keep all sorts of information, some of which can be very sensitive. They’re also by nature not safe and require that a number of auxiliary precautions be taken to be considered secure in any way for both parties, client and server. Signing cookies specifically addresses the problem that they can be tinkered with in attempts to fool server applications. There are other measures to mitigate other types of vulnerabilities; I encourage you to read up more on cookies.

How can a cookie be tampered with?

Cookies reside on the client in text form and can be edited with no effort. A cookie received by your server application could have been modified for a number of reasons, some of which may not be innocent. Imagine a web application that keeps permission information about its users on cookies and grants privileges based on that information. If the cookie is not tinker-proof, anyone could modify theirs to elevate their status from “role=visitor” to “role=admin” and the application would be none the wiser.

Why is a SECRET_KEY necessary to sign cookies?

Verifying cookies is a tad bit different than verifying source code the way it’s described earlier. In the case of the source code, the original author is the trustee and owner of the reference fingerprint (the checksum), which will be kept public. What you don’t trust is the source code, but you trust the public signature. So to verify your copy of the source you simply want your calculated hash to match the public hash.

In the case of a cookie however the application doesn’t keep track of the signature, it keeps track of its SECRET_KEY. The SECRET_KEY is the reference fingerprint. Cookies travel with a signature that they claim to be legit. Legitimacy here means that the signature was issued by the owner of the cookie, that is the application, and in this case, it’s that claim that you don’t trust and you need to check the signature for validity. To do that you need to include an element in the signature that is only known to you, that’s the SECRET_KEY. Someone may change a cookie, but since they don’t have the secret ingredient to properly calculate a valid signature they cannot spoof it. As stated a bit earlier this type of fingerprinting, where on top of the checksum one also provides a secret key, is called a Message Authentication Code.

What about Sessions?

Sessions in their classical implementation are cookies that carry only an ID in the content field, the session_id. The purpose of sessions is exactly the same as signed cookies, i.e. to prevent cookie tampering. Classical sessions have a different approach though. Upon receiving a session cookie the server uses the ID to look up the session data in its own local storage, which could be a database, a file, or sometimes a cache in memory. The session cookie is typically set to expire when the browser is closed. Because of the local storage lookup step, this implementation of sessions typically incurs a performance hit. Signed cookies are becoming a preferred alternative and that’s how Flask’s sessions are implemented. In other words, Flask sessions are signed cookies, and to use signed cookies in Flask just use its Session API.
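The classical, server-side approach described above can be sketched in a few lines. This is an assumption-laden toy: SESSION_STORE is an in-memory dict standing in for a database or cache, and create_session and load_session are hypothetical names.

```python
import secrets

# Server-side storage; in practice a database, file, or cache (illustrative).
SESSION_STORE = {}


def create_session(data):
    """Store the data on the server and return the ID to place in the cookie."""
    session_id = secrets.token_urlsafe(16)   # hard-to-guess random ID
    SESSION_STORE[session_id] = dict(data)
    return session_id


def load_session(session_id):
    """Look the session up by the ID the browser sent back; None if unknown."""
    return SESSION_STORE.get(session_id)


sid = create_session({"uid": 382, "membership": "regular"})
print(load_session(sid))
print(load_session("forged-id"))   # an ID the server never issued
```

Here the cookie carries nothing worth tampering with, at the cost of one storage lookup per request; a signed cookie carries the data itself and replaces the lookup with a signature check.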

Why not also encrypt the cookies?

Sometimes the contents of cookies can be encrypted before also being signed. This is done if they’re deemed too sensitive to be visible from the browser (encryption hides the contents). Simply signing cookies, however, addresses a different need, one where there’s a desire to maintain a degree of visibility and usability of cookies in the browser while preventing them from being meddled with.

What happens if I change the SECRET_KEY?

By changing the SECRET_KEY you’re invalidating all cookies signed with the previous key. When the application receives a request with a cookie that was signed with a previous SECRET_KEY, it will try to calculate the signature with the new SECRET_KEY, and the two signatures won’t match. The cookie and all its data will be rejected, as if the browser were connecting to the server for the first time. Users will be logged out and their old cookie will be forgotten, along with anything stored inside. Note that this is different from the way an expired cookie is handled. An expired cookie may have its lease extended if its signature checks out. An invalid signature just implies a plain invalid cookie.

So unless you want to invalidate all signed cookies, try to keep the SECRET_KEY the same for extended periods.
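The invalidation effect is easy to reproduce with the standard-library hmac module. In this sketch the two keys and the payload are made-up values, and HMAC-SHA256 is chosen only for illustration.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


old_key = b"previous secret"     # key in use when the cookie was issued
new_key = b"rotated secret"      # key after the change
payload = b"uid=382|membership=regular"

issued_signature = sign(old_key, payload)

# After the change, the server recomputes the signature with the new key.
still_valid = hmac.compare_digest(issued_signature, sign(new_key, payload))
print(still_valid)
```

The payload is untouched, yet verification fails, which is why every previously issued cookie is rejected after a key change.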

What’s a good SECRET_KEY?

A secret key should be hard to guess. The documentation on Sessions has a good recipe for random key generation:

>>> import os
>>> os.urandom(24)
b'\xfd{H\xe5<\x95\xf9\xe3\x96.5\xd1\x01O<!\xd5\xa2\xa0\x9fR"\xa1\xa8'

You copy the key and paste it in your configuration file as the value of SECRET_KEY.

Short of using a key that was randomly generated, you could use a complex assortment of words, numbers, and symbols, perhaps arranged in a sentence known only to you, encoded in byte form.

Do not set the SECRET_KEY directly with a function that generates a different key each time it’s called. For example, don’t do this:

# this is not good
SECRET_KEY = random_key_generator()

Each time your application is restarted it will be given a new key, thus invalidating the previous.

Instead, open an interactive Python shell and call the function to generate the key, then copy and paste it into the config.
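One possible variation on "generate once, then store" is sketched below. It is not taken from the Flask documentation: the environment variable name APP_SECRET_KEY and the function load_secret_key are assumptions, and reading the key from the environment is simply an alternative to pasting it into a config file.

```python
import os
import secrets


def load_secret_key(env_var="APP_SECRET_KEY"):
    """Read a previously generated key; fail loudly instead of inventing a new one."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; generate a key once and store it")
    return value.encode("utf8")


# Run once, by hand, and store the printed value outside the code base.
generated = secrets.token_hex(24)   # 24 random bytes, hex-encoded
print(generated)
```

Because the key is only read at startup and never regenerated, restarts keep existing signed cookies valid.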


How to toggle a value in Python

Question: How to toggle a value in Python

What is the most efficient way to toggle between 0 and 1?


Answer 0

使用字典的解决方案

如果值是可哈希的,则可以使用字典:

>>> A = 'xyz'
>>> B = 'pdq'
>>> d = {A:B, B:A}
>>> x = A
>>> x = d[x]         # toggle
>>> x
'pdq'
>>> x = d[x]         # toggle
>>> x
'xyz'
>>> x = d[x]         # toggle
>>> x
'pdq'

使用条件表达式的解决方案

最慢的方法是使用条件表达式

>>> A = [1,2,3]
>>> B = [4,5,6]
>>> x = A
>>> x = B if x == A else A
>>> x
[4, 5, 6]
>>> x = B if x == A else A
>>> x
[1, 2, 3]
>>> x = B if x == A else A
>>> x
[4, 5, 6]

使用itertools的解决方案

如果您有两个以上的值,则itertools.cycle()函数提供了一种通用的快速方法来在连续的值之间进行切换:

>>> import itertools
>>> toggle = itertools.cycle(['red', 'green', 'blue']).next
>>> toggle()
'red'
>>> toggle()
'green'
>>> toggle()
'blue'
>>> toggle()
'red'
>>> toggle()
'green'
>>> toggle()
'blue'

请注意,在Python 3中,next()方法已更改为__next__(),因此第一行现在将写为toggle = itertools.cycle(['red', 'green', 'blue']).__next__

Solution using NOT

If the values are boolean, the fastest approach is to use the not operator:

>>> x = True
>>> x = not x        # toggle
>>> x
False
>>> x = not x        # toggle
>>> x
True
>>> x = not x        # toggle
>>> x
False

Solution using subtraction

If the values are numerical, then subtraction from the total is a simple and fast way to toggle values:

>>> A = 5
>>> B = 3
>>> total = A + B
>>> x = A
>>> x = total - x    # toggle
>>> x
3
>>> x = total - x    # toggle
>>> x
5
>>> x = total - x    # toggle
>>> x
3

Solution using XOR

If the value toggles between 0 and 1, you can use a bitwise exclusive-or:

>>> x = 1
>>> x ^= 1
>>> x
0
>>> x ^= 1
>>> x
1

The technique generalizes to any pair of integers. The xor-by-one step is replaced with a xor-by-precomputed-constant:

>>> A = 205
>>> B = -117
>>> t = A ^ B        # precomputed toggle constant
>>> x = A
>>> x ^= t           # toggle
>>> x
-117
>>> x ^= t           # toggle
>>> x
205
>>> x ^= t           # toggle
>>> x
-117

(This idea was submitted by Nick Coghlan and later generalized by @zxxc.)

Solution using a dictionary

If the values are hashable, you can use a dictionary:

>>> A = 'xyz'
>>> B = 'pdq'
>>> d = {A:B, B:A}
>>> x = A
>>> x = d[x]         # toggle
>>> x
'pdq'
>>> x = d[x]         # toggle
>>> x
'xyz'
>>> x = d[x]         # toggle
>>> x
'pdq'

Solution using a conditional expression

The slowest way is to use a conditional expression:

>>> A = [1,2,3]
>>> B = [4,5,6]
>>> x = A
>>> x = B if x == A else A
>>> x
[4, 5, 6]
>>> x = B if x == A else A
>>> x
[1, 2, 3]
>>> x = B if x == A else A
>>> x
[4, 5, 6]

Solution using itertools

If you have more than two values, the itertools.cycle() function provides a generic fast way to toggle between successive values:

>>> import itertools
>>> toggle = itertools.cycle(['red', 'green', 'blue']).next
>>> toggle()
'red'
>>> toggle()
'green'
>>> toggle()
'blue'
>>> toggle()
'red'
>>> toggle()
'green'
>>> toggle()
'blue'

Note that in Python 3 the next() method was changed to __next__(), so the first line would be now written as toggle = itertools.cycle(['red', 'green', 'blue']).__next__
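
A version that avoids the method name entirely wraps the built-in next() instead. This is a sketch assuming Python 3; the names colors and first_four are illustrative only.

```python
import itertools
from functools import partial

colors = itertools.cycle(['red', 'green', 'blue'])
# partial(next, colors) builds a zero-argument callable around the built-in next().
toggle = partial(next, colors)

first_four = [toggle() for _ in range(4)]
print(first_four)  # ['red', 'green', 'blue', 'red']
```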


回答 1

我一直使用:

p^=True

如果p是布尔值,则在true和false之间切换。

I always use:

p^=True

If p is a boolean, this switches between true and false.


回答 2

这是另一种不直观的方法。优点是您可以循环多个值,而不仅仅是两个[0,1]

对于两个值(切换)

>>> x=[1,0]
>>> toggle=x[toggle]

对于多个值(例如4)

>>> x=[1,2,3,0]
>>> toggle=x[toggle]

我没想到这个解决方案也几乎是最快的

>>> stmt1="""
toggle=0
for i in xrange(0,100):
    toggle = 1 if toggle == 0 else 0
"""
>>> stmt2="""
x=[1,0]
toggle=0
for i in xrange(0,100):
    toggle=x[toggle]
"""
>>> t1=timeit.Timer(stmt=stmt1)
>>> t2=timeit.Timer(stmt=stmt2)
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=100000)/100000)
7.07 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=100000)/100000)
6.19 usec/pass
>>> stmt3="""
toggle = False
for i in xrange(0,100):
    toggle = (not toggle) & 1
"""
>>> t3=timeit.Timer(stmt=stmt3)
>>> print "%.2f usec/pass" % (1000000 * t3.timeit(number=100000)/100000)
9.84 usec/pass
>>> stmt4="""
x=0
for i in xrange(0,100):
    x=x-1
"""
>>> t4=timeit.Timer(stmt=stmt4)
>>> print "%.2f usec/pass" % (1000000 * t4.timeit(number=100000)/100000)
6.32 usec/pass

Here is another non-intuitive way. The beauty is that you can cycle over multiple values, not just the two values [0,1].

For Two values (toggling)

>>> x=[1,0]
>>> toggle=x[toggle]

For Multiple Values (say 4)

>>> x=[1,2,3,0]
>>> toggle=x[toggle]

I didn’t expect this solution to be almost the fastest too:

>>> import timeit
>>> stmt1="""
toggle=0
for i in xrange(0,100):
    toggle = 1 if toggle == 0 else 0
"""
>>> stmt2="""
x=[1,0]
toggle=0
for i in xrange(0,100):
    toggle=x[toggle]
"""
>>> t1=timeit.Timer(stmt=stmt1)
>>> t2=timeit.Timer(stmt=stmt2)
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=100000)/100000)
7.07 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=100000)/100000)
6.19 usec/pass
>>> stmt3="""
toggle = False
for i in xrange(0,100):
    toggle = (not toggle) & 1
"""
>>> t3=timeit.Timer(stmt=stmt3)
>>> print "%.2f usec/pass" % (1000000 * t3.timeit(number=100000)/100000)
9.84 usec/pass
>>> stmt4="""
x=0
for i in xrange(0,100):
    x=x-1
"""
>>> t4=timeit.Timer(stmt=stmt4)
>>> print "%.2f usec/pass" % (1000000 * t4.timeit(number=100000)/100000)
6.32 usec/pass
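
The session above is Python 2 (xrange, print statement). A rough Python 3 sketch of the same comparison follows; the helper usec_per_pass and the variable names are made up for illustration, and the numbers it prints will differ from machine to machine.

```python
import timeit

# Each statement toggles a value 100 times, mirroring stmt1 and stmt2 above.
conditional = """
toggle = 0
for i in range(100):
    toggle = 1 if toggle == 0 else 0
"""
list_lookup = """
x = [1, 0]
toggle = 0
for i in range(100):
    toggle = x[toggle]
"""

def usec_per_pass(stmt, number=10000):
    """Average time of one execution of stmt, in microseconds."""
    return 1e6 * timeit.timeit(stmt, number=number) / number

for name, stmt in [("conditional", conditional), ("list lookup", list_lookup)]:
    print("%s: %.2f usec/pass" % (name, usec_per_pass(stmt)))
```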

回答 3

not运营商否定你的变量(将其转换成一个布尔值,如果它是不是已经一个)。您可以或许1,并0与互换TrueFalse,所以就否定它:

toggle = not toggle

但是,如果您使用两个任意值,请使用inline if

toggle = 'a' if toggle == 'b' else 'b'

The not operator negates your variable (converting it into a boolean if it isn’t already one). You can probably use 1 and 0 interchangeably with True and False, so just negate it:

toggle = not toggle

But if you are using two arbitrary values, use an inline if:

toggle = 'a' if toggle == 'b' else 'b'

回答 4

在1到0之间执行此操作

1-x 

x可以取1或0

Just between 1 and 0, do this

1-x 

x can take 1 or 0


回答 5

三角法,仅仅是因为sincos函数很酷。

>>> import math
>>> def generator01():
...     n=0
...     while True:
...         yield abs( int( math.cos( n * 0.5 * math.pi  ) ) )
...         n+=1
... 
>>> g=generator01() 
>>> g.next()
1
>>> g.next()
0
>>> g.next()
1
>>> g.next()
0

Trigonometric approach, just because sin and cos functions are cool.

>>> import math
>>> def generator01():
...     n=0
...     while True:
...         yield abs( int( math.cos( n * 0.5 * math.pi  ) ) )
...         n+=1
... 
>>> g=generator01() 
>>> g.next()
1
>>> g.next()
0
>>> g.next()
1
>>> g.next()
0

回答 6

令人惊讶的是,没有人提到好的旧除法模2:

In : x = (x + 1)  % 2 ; x
Out: 1

In : x = (x + 1)  % 2 ; x
Out: 0

In : x = (x + 1)  % 2 ; x
Out: 1

In : x = (x + 1)  % 2 ; x
Out: 0

请注意,它等效于x = x - 1,但是取模技术的优点是组的大小或间隔的长度可以大于2个元素,从而为循环提供了类似于轮询交错的方案。

现在只需要2,切换就可以短一些(使用按位运算符):

x = x ^ 1

Surprisingly, nobody has mentioned good old division modulo 2:

In : x = (x + 1)  % 2 ; x
Out: 1

In : x = (x + 1)  % 2 ; x
Out: 0

In : x = (x + 1)  % 2 ; x
Out: 1

In : x = (x + 1)  % 2 ; x
Out: 0

Note that for 0 and 1 it is equivalent to x = 1 - x, but the advantage of the modulo technique is that the size of the group or the length of the interval can be bigger than just 2 elements, which gives you something similar to a round-robin scheme to loop over.

For just two values, toggling can be a bit shorter (using the bitwise XOR operator):

x = x ^ 1
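
To illustrate the round-robin remark, here is a small sketch with three states instead of two. The function name next_state and its parameter n are made up for this example.

```python
def next_state(x, n=2):
    """Advance x through 0, 1, ..., n-1 and wrap back to 0."""
    return (x + 1) % n

x = 0
seen = []
for _ in range(5):
    x = next_state(x, 3)
    seen.append(x)
print(seen)  # [1, 2, 0, 1, 2]
```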

回答 7

一种切换方式是使用多重分配

>>> a = 5
>>> b = 3

>>> t = a, b = b, a
>>> t[0]
3

>>> t = a, b = b, a
>>> t[0]
5

使用itertools:

In [12]: foo = itertools.cycle([1, 2, 3])

In [13]: next(foo)
Out[13]: 1

In [14]: next(foo)
Out[14]: 2

In [15]: next(foo)
Out[15]: 3

In [16]: next(foo)
Out[16]: 1

In [17]: next(foo)
Out[17]: 2

One way to toggle is by using multiple assignment:

>>> a = 5
>>> b = 3

>>> t = a, b = b, a
>>> t[0]
3

>>> t = a, b = b, a
>>> t[0]
5

Using itertools:

In [12]: foo = itertools.cycle([1, 2, 3])

In [13]: next(foo)
Out[13]: 1

In [14]: next(foo)
Out[14]: 2

In [15]: next(foo)
Out[15]: 3

In [16]: next(foo)
Out[16]: 1

In [17]: next(foo)
Out[17]: 2

回答 8

在1和0之间切换的最简单方法是从1减去。

def toggle(value):
    return 1 - value

The easiest way to toggle between 1 and 0 is to subtract from 1.

def toggle(value):
    return 1 - value

回答 9

使用异常处理程序

>>> def toogle(x):
...     try:
...         return x/x-x/x
...     except  ZeroDivisionError:
...         return 1
... 
>>> x=0
>>> x=toogle(x)
>>> x
1
>>> x=toogle(x)
>>> x
0
>>> x=toogle(x)
>>> x
1
>>> x=toogle(x)
>>> x
0

好吧,我是最糟糕的:

import math
import sys

d={1:0,0:1}
l=[1,0]

def exception_approach(x):
    try:
        return x/x-x/x
    except  ZeroDivisionError:
        return 1

def cosinus_approach(x):
    return abs( int( math.cos( x * 0.5 * math.pi  ) ) )

def module_approach(x):
    return  (x + 1)  % 2

def subs_approach(x):
    return  1 - x

def if_approach(x):
    return 0 if x == 1 else 1

def list_approach(x):
    global l
    return l[x]

def dict_approach(x):
    global d
    return d[x]

def xor_approach(x):
    return x^1

def not_approach(x):
    b=bool(x)
    p=not b
    return int(p)

funcs=[ exception_approach, cosinus_approach, dict_approach, module_approach, subs_approach, if_approach, list_approach, xor_approach, not_approach ]

f=funcs[int(sys.argv[1])]
print "\n\n\n", f.func_name
x=0
for _ in range(0,100000000):
    x=f(x)

Using exception handler

>>> def toogle(x):
...     try:
...         return x/x-x/x
...     except  ZeroDivisionError:
...         return 1
... 
>>> x=0
>>> x=toogle(x)
>>> x
1
>>> x=toogle(x)
>>> x
0
>>> x=toogle(x)
>>> x
1
>>> x=toogle(x)
>>> x
0

Ok, I’m the worst:

import math
import sys

d={1:0,0:1}
l=[1,0]

def exception_approach(x):
    try:
        return x/x-x/x
    except  ZeroDivisionError:
        return 1

def cosinus_approach(x):
    return abs( int( math.cos( x * 0.5 * math.pi  ) ) )

def module_approach(x):
    return  (x + 1)  % 2

def subs_approach(x):
    return  1 - x

def if_approach(x):
    return 0 if x == 1 else 1

def list_approach(x):
    global l
    return l[x]

def dict_approach(x):
    global d
    return d[x]

def xor_approach(x):
    return x^1

def not_approach(x):
    b=bool(x)
    p=not b
    return int(p)

funcs=[ exception_approach, cosinus_approach, dict_approach, module_approach, subs_approach, if_approach, list_approach, xor_approach, not_approach ]

f=funcs[int(sys.argv[1])]
print "\n\n\n", f.func_name
x=0
for _ in range(0,100000000):
    x=f(x)

回答 10

怎么样一个假想的切换,存储不仅是当前切换,但与之相关的其他几个值?

toggle = complex.conjugate

在左侧存储任何+或-值,在右侧存储任何无符号的值:

>>> x = 2 - 3j
>>> toggle(x)
(2+3j)

零也起作用:

>>> y = -2 - 0j
>>> toggle(y)
(-2+0j)

轻松检索当前的切换值(TrueFalse表示+和-),LHS(实数)值或RHS(虚数)值:

>>> import math
>>> curr = lambda i: math.atan2(i.imag, -abs(i.imag)) > 0
>>> lhs = lambda i: i.real
>>> rhs = lambda i: abs(i.imag)
>>> x = toggle(x)
>>> curr(x)
True
>>> lhs(x)
2.0
>>> rhs(x)
3.0

轻松交换LHS和RHS(但请注意,两个值的符号一定不重要):

>>> swap = lambda i: i/-1j
>>> swap(2+0j)
2j
>>> swap(3+2j)
(-2+3j)

轻松交换LHS和RHS 并同时切换:

>>> swaggle = lambda i: i/1j
>>> swaggle(2+0j)
-2j
>>> swaggle(3+2j)
(2-3j)

防止错误:

>>> toggle(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'conjugate' requires a 'complex' object but received a 'int'

对LHS和RHS进行更改:

>>> x += 1+2j
>>> x
(3+5j)

…但是要小心操作RHS:

>>> z = 1-1j
>>> z += 2j
>>> z
(1+1j) # whoops! toggled it!

How about an imaginary toggle that stores not only the current toggle, but a couple other values associated with it?

toggle = complex.conjugate

Store any + or – value on the left, and any unsigned value on the right:

>>> x = 2 - 3j
>>> toggle(x)
(2+3j)

Zero works, too:

>>> y = -2 - 0j
>>> toggle(y)
(-2+0j)

Easily retrieve the current toggle value (True and False represent + and -), LHS (real) value, or RHS (imaginary) value:

>>> import math
>>> curr = lambda i: math.atan2(i.imag, -abs(i.imag)) > 0
>>> lhs = lambda i: i.real
>>> rhs = lambda i: abs(i.imag)
>>> x = toggle(x)
>>> curr(x)
True
>>> lhs(x)
2.0
>>> rhs(x)
3.0

Easily swap LHS and RHS (but note that the signs of both values must not matter, since swapping can change them):

>>> swap = lambda i: i/-1j
>>> swap(2+0j)
2j
>>> swap(3+2j)
(-2+3j)

Easily swap LHS and RHS and also toggle at the same time:

>>> swaggle = lambda i: i/1j
>>> swaggle(2+0j)
-2j
>>> swaggle(3+2j)
(2-3j)

Guards against errors:

>>> toggle(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'conjugate' requires a 'complex' object but received a 'int'

Perform changes to LHS and RHS:

>>> x += 1+2j
>>> x
(3+5j)

…but be careful manipulating the RHS:

>>> z = 1-1j
>>> z += 2j
>>> z
(1+1j) # whoops! toggled it!

回答 11

变量a和b可以是任意两个值,例如0和1,或者117和711,或“ heads”和“ tails”。不使用数学运算,每次需要切换时仅快速交换值。

a = True   
b = False   

a,b = b,a   # a is now False
a,b = b,a   # a is now True

Variables a and b can be ANY two values, like 0 and 1, or 117 and 711, or “heads” and “tails”. No math is used, just a quick swap of the values each time a toggle is desired.

a = True   
b = False   

a,b = b,a   # a is now False
a,b = b,a   # a is now True

回答 12

我使用abs函数,在循环中非常有用

x = 1
for y in range(0, 3):
    x = abs(x - 1)

x将为0。

I use the abs function, which is very useful in loops:

x = 1
for y in range(0, 3):
    x = abs(x - 1)

x will be 0.


回答 13

让我们做一些框架黑客。按名称切换变量。注意:这可能不适用于每个Python运行时。

假设您有一个变量“ x”

>>> import inspect
>>> def toggle(var_name):
>>>     frame = inspect.currentframe().f_back
>>>     vars = frame.f_locals
>>>     vars[var_name] = 0 if vars[var_name] == 1 else 1

>>> x = 0
>>> toggle('x')
>>> x
1
>>> toggle('x')
>>> x
0

Let’s do some frame hacking. Toggle a variable by name. Note: This may not work with every Python runtime.

Say you have a variable “x”

>>> import inspect
>>> def toggle(var_name):
...     frame = inspect.currentframe().f_back
...     vars = frame.f_locals
...     vars[var_name] = 0 if vars[var_name] == 1 else 1
...

>>> x = 0
>>> toggle('x')
>>> x
1
>>> toggle('x')
>>> x
0

回答 14

如果要处理整数变量,则可以递增1并将其限制为0和1(mod)

X = 0  # or X = 1
X = (X + 1)%2

If you are dealing with an integer variable, you can increment 1 and limit your set to 0 and 1 (mod)

X = 0  # or X = 1
X = (X + 1)%2

回答 15

可以通过内联乘法在-1和+1之间切换。用于以“ Leibniz”方式(或类似方式)计算pi:

sign = 1
result = 0
for i in range(100000):
    result += 1 / (2*i + 1) * sign
    sign *= -1
print("pi (estimate): ", result*4)

Switching between -1 and +1 can be obtained by inline multiplication; used for calculation of pi the ‘Leibniz’ way (or similar):

sign = 1
result = 0
for i in range(100000):
    result += 1 / (2*i + 1) * sign
    sign *= -1
print("pi (estimate): ", result*4)
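
The same alternating sign can also come from itertools.cycle, which keeps the loop body free of bookkeeping. This is a sketch assuming Python 3 (true division); the function name leibniz_pi is made up for illustration.

```python
import itertools

def leibniz_pi(terms):
    """Estimate pi from the first `terms` terms of the Leibniz series."""
    signs = itertools.cycle([1, -1])          # +1, -1, +1, ...
    total = sum(sign / (2 * i + 1) for i, sign in zip(range(terms), signs))
    return 4 * total

print(leibniz_pi(100000))
```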

回答 16

您可以使用的indexlist秒。

def toggleValues(values, currentValue):
    return values[(values.index(currentValue) + 1) % len(values)]

> toggleValues( [0,1] , 1 )
> 0
> toggleValues( ["one","two","three"] , "one" )
> "two"
> toggleValues( ["one","two","three"] , "three")
> "one"

优点:无需其他库,自我解释代码并可以处理任意数据类型。

缺点:不重复保存。 toggleValues(["one","two","duped", "three", "duped", "four"], "duped") 永远会回来"three"

You can make use of the index of lists.

def toggleValues(values, currentValue):
    return values[(values.index(currentValue) + 1) % len(values)]

> toggleValues( [0,1] , 1 )
> 0
> toggleValues( ["one","two","three"] , "one" )
> "two"
> toggleValues( ["one","two","three"] , "three")
> "one"

Pros: No additional libraries, self-explanatory code, and it works with arbitrary data types.

Cons: not duplicate-safe. toggleValues(["one","two","duped", "three", "duped", "four"], "duped") will always return "three", because list.index() finds the first match.
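
One way around the duplicate problem is to track the position instead of the value. This is only a sketch of that idea; the name toggle_index is made up for this example.

```python
def toggle_index(values, current_index):
    """Return the index of the next item, wrapping to 0 after the last one."""
    return (current_index + 1) % len(values)

values = ["one", "two", "duped", "three", "duped", "four"]
i = 2                      # the first "duped"
i = toggle_index(values, i)
print(values[i])           # three
i = toggle_index(values, i)
i = toggle_index(values, i)
print(values[i])           # four
```

The caller has to remember an index rather than a value, which is the price for handling repeated entries.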


使用简单对话框在Python中选择文件

问题:使用简单对话框在Python中选择文件

我想在我的Python控制台应用程序中获取文件路径作为输入。

目前,我只能要求完整路径作为控制台中的输入。

有没有一种触发简单用户界面的方法,用户可以在其中选择文件而不是键入完整路径?

I would like to get file path as input in my Python console application.

Currently I can only ask for full path as an input in the console.

Is there a way to trigger a simple user interface where users can select file instead of typing the full path?


回答 0

使用tkinter怎么样?

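from tkinter import Tk     # Python 3.x; on Python 2 use: from Tkinter import Tk
from tkinter.filedialog import askopenfilename   # Python 2: from tkFileDialog import askopenfilename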

Tk().withdraw() # we don't want a full GUI, so keep the root window from appearing
filename = askopenfilename() # show an "Open" dialog box and return the path to the selected file
print(filename)

做完了!

How about using tkinter?

from tkinter import Tk     # Python 3.x; on Python 2 use: from Tkinter import Tk
from tkinter.filedialog import askopenfilename   # Python 2: from tkFileDialog import askopenfilename

Tk().withdraw() # we don't want a full GUI, so keep the root window from appearing
filename = askopenfilename() # show an "Open" dialog box and return the path to the selected file
print(filename)

Done!


回答 1

Etaoin答案的Python 3.x版本的完整性:

from tkinter.filedialog import askopenfilename
filename = askopenfilename()

Python 3.x version of Etaoin’s answer for completeness:

from tkinter.filedialog import askopenfilename
filename = askopenfilename()

回答 2

使用EasyGui(由pydocepydoc生成的0.96版文档):

import easygui
print(easygui.fileopenbox())

安装:

pip install easygui

演示:

import easygui
easygui.egdemo()

With EasyGui:

import easygui
print(easygui.fileopenbox())

To install:

pip install easygui

Demo:

import easygui
easygui.egdemo()

回答 3

在Python 2中,使用tkFileDialog模块。

import tkFileDialog

tkFileDialog.askopenfilename()

在Python 3中,使用tkinter.filedialog模块。

import tkinter.filedialog

tkinter.filedialog.askopenfilename()

In Python 2 use the tkFileDialog module.

import tkFileDialog

tkFileDialog.askopenfilename()

In Python 3 use the tkinter.filedialog module.

import tkinter.filedialog

tkinter.filedialog.askopenfilename()

回答 4

另一个要考虑的选项是Zenity:http://freecode.com/projects/zenity

我遇到的情况是我正在开发Python服务器应用程序(没有GUI组件),因此不想引入对任何python GUI工具包的依赖关系,但是我希望我的一些调试脚本可以通过输入文件进行参数化,并且想要如果用户未在命令行上指定文件,则以可视方式提示用户输入文件。Zenity非常适合。为此,请使用子流程模块调用“ zenity –file-selection”并捕获标准输出。当然,此解决方案不是特定于Python的。

Zenity支持多种平台,并且恰好已经安装在我们的开发服务器上,因此它方便了我们的调试/开发,而没有引入不必要的依赖性。

Another option to consider is Zenity: http://freecode.com/projects/zenity.

I had a situation where I was developing a Python server application (no GUI component) and hence didn’t want to introduce a dependency on any Python GUI toolkits, but I wanted some of my debug scripts to be parameterized by input files and wanted to visually prompt the user for a file if they didn’t specify one on the command line. Zenity was a perfect fit. To achieve this, invoke "zenity --file-selection" using the subprocess module and capture the stdout. Of course this solution isn’t Python-specific.

Zenity supports multiple platforms and happened to already be installed on our dev servers so it facilitated our debugging/development without introducing an unwanted dependency.
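
A minimal sketch of that subprocess call, assuming Python 3.5+ and that zenity is on the PATH. The function name ask_for_file is made up for illustration, and treating any nonzero exit status as "no file chosen" is an assumption rather than documented behavior.

```python
import subprocess

def ask_for_file(command=("zenity", "--file-selection")):
    """Run a file-chooser command and return the path it prints, or None."""
    result = subprocess.run(command, stdout=subprocess.PIPE, universal_newlines=True)
    # Assumption: a nonzero exit status means the dialog was cancelled or failed.
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

A debug script could then fall back to the dialog only when no path was given on the command line, for example path = sys.argv[1] if len(sys.argv) > 1 else ask_for_file().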


回答 5

我用wxPython获得的结果比tkinter更好,这是对以后重复的问题的回答:

https://stackoverflow.com/a/9319832

wxPython版本通过xfce桌面在我的OpenSUSE Tumbleweed安装中几乎通过任何其他应用程序生成的文件对话框看上去都与打开文件对话框相同,而tkinter却产生了局促且难以理解的内容,并且使用了不熟悉的侧滚动界面。

I obtained much better results with wxPython than tkinter, as suggested in this answer to a later duplicate question:

https://stackoverflow.com/a/9319832

The wxPython version produced the file dialog that looked the same as the open file dialog from just about any other application on my OpenSUSE Tumbleweed installation with the xfce desktop, whereas tkinter produced something cramped and hard to read with an unfamiliar side-scrolling interface.


如何对字符串列表进行数字排序?

问题:如何对字符串列表进行数字排序?

我知道这听起来微不足道,但是我没有意识到sort()Python 的功能很奇怪。我有一个实际上是字符串形式的“数字”列表,因此我首先将它们转换为整数,然后尝试进行排序。

list1=["1","10","3","22","23","4","2","200"]
for item in list1:
    item=int(item)

list1.sort()
print list1

给我:

['1', '10', '2', '200', '22', '23', '3', '4']

我想要的是

['1','2','3','4','10','22','23','200']

我四处寻找与排序数字集相关的算法,但是我发现所有算法都涉及对字母数字集进行排序。

我知道这可能是个没有脑子的问题,但是google和我的教科书没有提供比该.sort()功能有用的功能。

I know that this sounds trivial but I did not realize that the sort() function of Python was weird. I have a list of “numbers” that are actually in string form, so I first convert them to ints, then attempt a sort.

list1=["1","10","3","22","23","4","2","200"]
for item in list1:
    item=int(item)

list1.sort()
print list1

Gives me:

['1', '10', '2', '200', '22', '23', '3', '4']

What I want is

['1','2','3','4','10','22','23','200']

I’ve looked around for some of the algorithms associated with sorting numeric sets, but the ones I found all involve sorting alphanumeric sets.

I know this is probably a no brainer problem but google and my textbook don’t offer anything more or less useful than the .sort() function.


回答 0

您实际上尚未将字符串转换为int。或更确切地说,您做了,但是随后您对结果什么也没做。您想要的是:

list1 = ["1","10","3","22","23","4","2","200"]
list1 = [int(x) for x in list1]
list1.sort()

如果由于某种原因需要保留字符串而不是整数(通常是一个坏主意,但是可能需要保留前导零或其他东西),则可以使用函数。sort接受一个命名参数,key该参数是在比较每个元素之前对其进行调用的函数。比较键函数的返回值,而不是直接比较列表元素:

list1 = ["1","10","3","22","23","4","2","200"]
# call int(x) on each element before comparing it
list1.sort(key=int)

You haven’t actually converted your strings to ints. Or rather, you did, but then you didn’t do anything with the results. What you want is:

list1 = ["1","10","3","22","23","4","2","200"]
list1 = [int(x) for x in list1]
list1.sort()

If for some reason you need to keep strings instead of ints (usually a bad idea, but maybe you need to preserve leading zeros or something), you can use a key function. sort takes a named parameter, key, which is a function that is called on each element before it is compared. The key function’s return values are compared instead of comparing the list elements directly:

list1 = ["1","10","3","22","23","4","2","200"]
# call int(x) on each element before comparing it
list1.sort(key=int)
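
To make the leading-zeros case concrete, here is a small sketch with made-up data. The strings keep their original form and only the comparison uses int.

```python
codes = ["007", "10", "3", "0042"]

# sorted() leaves the input alone; key=int compares by numeric value only.
by_value = sorted(codes, key=int)
print(by_value)   # ['3', '007', '10', '0042']
```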

回答 1

你可以传递一个函数的key参数.sort方法。这样,系统将按key(x)而不是x进行排序。

list1.sort(key=int)

顺便说一句,到列表中,以永久的整数转换,使用map功能

list1 = list(map(int, list1))   # you don't need to call list() in Python 2.x

或列表理解

list1 = [int(x) for x in list1]

You could pass a function to the key parameter to the .sort method. With this, the system will sort by key(x) instead of x.

list1.sort(key=int)

BTW, to convert the list to integers permanently, use the map function

list1 = list(map(int, list1))   # you don't need to call list() in Python 2.x

or list comprehension

list1 = [int(x) for x in list1]

回答 2

如果要使用sorted()功能:sorted(list1, key=int)

它返回一个新的排序列表。

In case you want to use sorted() function: sorted(list1, key=int)

It returns a new sorted list.


回答 3

Python的排序并不奇怪。只是这段代码:

for item in list1:
   item=int(item)

没有按照您的想法去做- item不会被替换回列表中,只是被丢弃了。

无论如何,正确的解决方案是使用key=int其他人向您展示的方法。

Python’s sort isn’t weird. It’s just that this code:

for item in list1:
   item=int(item)

isn’t doing what you think it is – item is not replaced back into the list, it is simply thrown away.

Anyway, the correct solution is to use key=int as others have shown you.


回答 4

您还可以使用:

import re

def sort_human(l):
    convert = lambda text: float(text) if text.isdigit() else text
    alphanum = lambda key: [convert(c) for c in re.split('([-+]?[0-9]*\.?[0-9]*)', key)]
    l.sort(key=alphanum)
    return l

这与您可以在互联网上找到的其他内容非常相似,但也适用于字母数字[abc0.1, abc0.2, ...]

You can also use:

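import re

def sort_human(l):
    # split on (optionally signed) decimal numbers; with one capture group,
    # re.split returns text chunks at even indices and numbers at odd indices
    number = re.compile(r'([-+]?\d+\.?\d*)')
    alphanum = lambda key: [float(c) if i % 2 else c for i, c in enumerate(number.split(key))]
    l.sort(key=alphanum)
    return l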

This is very similar to other stuff that you can find on the internet but also works for alphanumericals like [abc0.1, abc0.2, ...].


回答 5

Seamus Campbell的答案不适用于python2.x。
list1 = sorted(list1, key=lambda e: int(e))使用lambda功能效果很好。

Seamus Campbell’s answer does not work on Python 2.x.
list1 = sorted(list1, key=lambda e: int(e)) using a lambda function works well.


回答 6

昨天我也遇到了同样的问题,并找到了一个名为[natsort] [1]的模块,它可以解决您的问题。用:

from natsort import natsorted # pip install natsort

# Example list of strings
a = ['1', '10', '2', '3', '11']

[In]  sorted(a)
[Out] ['1', '10', '11', '2', '3']

[In]  natsorted(a)
[Out] ['1', '2', '3', '10', '11']

# Your array may contain strings
[In]  natsorted(['string11', 'string3', 'string1', 'string10', 'string100'])
[Out] ['string1', 'string3', 'string10', 'string11', 'string100']

它也适用于字典sorted。[1]:https//pypi.org/project/natsort/

I ran into the same problem yesterday and found a module called natsort (https://pypi.org/project/natsort/), which solves your problem. Use:

from natsort import natsorted # pip install natsort

# Example list of strings
a = ['1', '10', '2', '3', '11']

[In]  sorted(a)
[Out] ['1', '10', '11', '2', '3']

[In]  natsorted(a)
[Out] ['1', '2', '3', '10', '11']

# Your array may contain strings
[In]  natsorted(['string11', 'string3', 'string1', 'string10', 'string100'])
[Out] ['string1', 'string3', 'string10', 'string11', 'string100']

It also works for dictionaries as an equivalent of sorted.


回答 7

尝试此操作,它将按降序对列表进行排序(在这种情况下,无需指定键):

处理

listB = [24, 13, -15, -36, 8, 22, 48, 25, 46, -9]
listC = sorted(listB, reverse=True) # listB remains untouched
print listC

输出:

 [48, 46, 25, 24, 22, 13, 8, -9, -15, -36]

Try this; it returns a new list sorted in descending order and leaves the original untouched (there’s no need to specify a key in this case, because the items are already numbers):

Process

listB = [24, 13, -15, -36, 8, 22, 48, 25, 46, -9]
listC = sorted(listB, reverse=True) # listB remains untouched
print listC

output:

 [48, 46, 25, 24, 22, 13, 8, -9, -15, -36]

回答 8

最新的解决方案是正确的。您正在以字符串形式读取解决方案,在这种情况下,顺序为1、100、104、2、21、20010010010、3,依此类推。

您必须将输入的内容转换为int:

排序的字符串:

stringList = (1, 10, 2, 21, 3)

排序的整数:

intList = (1, 2, 3, 10, 21)

要进行转换,只需将stringList放入int(blahblah)中。

再次:

stringList = (1, 10, 2, 21, 3)

newList = int (stringList)

print newList

=> returns (1, 2, 3, 10, 21) 

The most recent solution is right. You are reading the values as strings, in which case the order is 1, then 100, then 104, followed by 2, then 21, then 2001001010, then 3, and so forth.

You have to CAST your input to int instead:

sorted strings:

stringList = ("1", "10", "2", "21", "3")

sorted ints:

intList = (1, 2, 3, 10, 21)

To cast, apply int to each element of stringList; calling int on the whole sequence at once raises a TypeError.

Again:

stringList = ("1", "10", "2", "21", "3")

newList = sorted(int(s) for s in stringList)

print newList

=> prints [1, 2, 3, 10, 21]

回答 9

如果您想使用数字字符串更好地采用我的代码中所示的另一个列表,它将很好地工作。

list1=["1","10","3","22","23","4","2","200"]

k=[]    
for item in list1:    
    k.append(int(item))

k.sort()
print(k)
# [1, 2, 3, 4, 10, 22, 23, 200]

If you start from strings of numbers, it is better to build a second list of ints, as shown in my code; that list will sort correctly.

list1=["1","10","3","22","23","4","2","200"]

k=[]    
for item in list1:    
    k.append(int(item))

k.sort()
print(k)
# [1, 2, 3, 4, 10, 22, 23, 200]

回答 10

排序数字列表的简单方法

numlists = ["5","50","7","51","87","97","53"]
results = list(map(int, numlists))
results.sort(reverse=False)
print(results)

Simple way to sort a numerical list

numlists = ["5","50","7","51","87","97","53"]
results = list(map(int, numlists))
results.sort(reverse=False)
print(results)

回答 11

真正的问题是按字母数字排序。因此,如果您有一个列表[‘1’,’2’,’10’,’19’]并进行排序,则您会得到[‘1’,’10’。’19’,’2’]。即10排在2之前,因为它着眼于第一个字符并以此排序。似乎python中的大多数方法都按此顺序返回事物。例如,如果您有一个名为abc的目录,且文件标记为1.jpg,2.jpg等,最多说15.jpg,并且您执行file_list = os.listdir(abc),则file_list的顺序不是您期望的那样,而是file_list = [‘1.jpg’,’11 .jpg’—’15.jpg’,’2.jpg]。如果处理文件的顺序很重要(大概是您用数字命名它们的原因),那么该顺序就不是您认为的那样。您可以通过使用“零”填充来避免这种情况。例如,如果您有一个列表alist = [’01’,’03’,’05’,’10’,’02’,’04’,’06],然后对其进行排序,则会得到所需的顺序。alist = [’01’,’02’等],因为第一个字符是1之前的0。您需要填充的零个数由列表中的最大值确定。例如,如果最大的值介于100和1000,您需要填充001、002 — 010,011–100、101等个位数。

The real problem is that sort compares strings character by character. So if you have a list ['1', '2', '10', '19'] and sort it, you get ['1', '10', '19', '2']: '10' comes before '2' because the comparison starts from the first character.

File listings show the same issue. For example, if you have a directory named abc with files labelled 1.jpg, 2.jpg and so on up to 15.jpg, and you call file_list = os.listdir(abc), the result is not guaranteed to be in numeric order; sorting it as strings gives ['1.jpg', '10.jpg', '11.jpg', ..., '15.jpg', '2.jpg', ...]. If the order in which files are processed is important (presumably that’s why you named them numerically), that order is not what you want.

You can avoid this by padding with zeros. For example, if you have a list alist = ['01', '03', '05', '10', '02', '04', '06'] and sort it, you get the order you wanted, ['01', '02', ...], because the character 0 sorts before 1. The amount of padding you need is determined by the largest value in the list. For example, if the largest value is between 100 and 999, you need three digits: 001, 002, ..., 010, 011, ..., 100, 101 and so on.
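
The padding idea can be sketched with str.zfill. The file names below are made-up sample data, and the helper padded is hypothetical.

```python
names = ["1.jpg", "11.jpg", "2.jpg", "15.jpg", "10.jpg"]

def padded(name, width=3):
    """Zero-pad the numeric stem of a name such as '2.jpg' to the given width."""
    stem, dot, ext = name.partition(".")
    return stem.zfill(width) + dot + ext

padded_names = sorted(padded(n) for n in names)
print(padded_names)
# ['001.jpg', '002.jpg', '010.jpg', '011.jpg', '015.jpg']
```

Once every stem has the same width, plain string sorting and numeric sorting agree.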


回答 12

scores = ['91','89','87','86','85']
scores.sort()
print (scores)

这在python版本3中对我有用,尽管在版本2中没有。

scores = ['91','89','87','86','85']
scores.sort()
print (scores)

This worked for me using python version 3, though it didn’t in version 2.


如何在Python中编写“标签”?

问题:如何在Python中编写“标签”?

假设我有一个文件。如何写“你好” TAB“ alex”?

Let’s say I have a file. How do I write “hello” TAB “alex”?


回答 0

这是代码:

f = open(filename, 'w')
f.write("hello\talex")

\t字符串的内部是水平制表符的转义序列。

This is the code:

f = open(filename, 'w')
f.write("hello\talex")

The \t inside the string is the escape sequence for the horizontal tabulation.
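
A slightly fuller sketch that also closes the file and reads the line back. The temporary path is only a stand-in for the question's file.

```python
import os
import tempfile

# A temporary path stands in for the question's file.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# The with block closes the file, which flushes the write to disk.
with open(path, "w") as f:
    f.write("hello\talex")

with open(path) as f:
    fields = f.read().split("\t")
print(fields)   # ['hello', 'alex']
```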


回答 1

Python 参考手册包括几个可以在字符串中使用的字符串文字。这些特殊的字符序列被转义序列的预期含义代替。

这是一些更有用的转义序列的表格,并描述了它们的输出。

Escape Sequence       Meaning
\t                    Tab
\\                    Inserts a back slash (\)
\'                    Inserts a single quote (')
\"                    Inserts a double quote (")
\n                    Inserts a ASCII Linefeed (a new line)

基本范例

如果我想打印一些由制表符分隔的数据点,则可以打印此字符串。

DataString = "0\t12\t24"
print (DataString)

退货

0    12    24

清单范例

这是另一个示例,其中我们正在打印列表项,并且希望通过TAB来分隔项目。

DataPoints = [0,12,24]
print (str(DataPoints[0]) + "\t" + str(DataPoints[1]) + "\t" + str(DataPoints[2]))

退货

0    12    24

原始字符串

请注意,原始字符串(包含前缀“ r”的字符串),字符串文字将被忽略。这允许将这些特殊字符序列包含在字符串中而无需更改。

DataString = r"0\t12\t24"
print (DataString)

退货

0\t12\t24

这可能是不希望的输出

弦长

还应注意,字符串文字长度仅为一个字符。

DataString = "0\t12\t24"
print (len(DataString))

退货

7

原始字符串的长度为9。

The Python reference manual lists several escape sequences that can be used in string literals. These special sequences of characters are replaced by the intended meaning of the escape sequence.

Here is a table of some of the more useful escape sequences and a description of the output from them.

Escape Sequence       Meaning
\t                    Tab
\\                    Inserts a back slash (\)
\'                    Inserts a single quote (')
\"                    Inserts a double quote (")
\n                    Inserts a ASCII Linefeed (a new line)

Basic Example

If I wanted to print some data points separated by tabs, I could print this string.

DataString = "0\t12\t24"
print (DataString)

Returns

0    12    24

Example for Lists

Here is another example where we are printing the items of a list and we want to separate the items by a TAB.

DataPoints = [0,12,24]
print (str(DataPoints[0]) + "\t" + str(DataPoints[1]) + "\t" + str(DataPoints[2]))

Returns

0    12    24
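
For longer lists, indexing each item by hand gets tedious. A sketch of the same output using str.join, with variable names that are illustrative only:

```python
data_points = [0, 12, 24]

# str.join needs strings, so convert each item first.
line = "\t".join(str(p) for p in data_points)
print(line)
```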

Raw Strings

Note that in raw strings (string literals with the prefix “r”), escape sequences are not processed. This allows these special sequences of characters to be included in strings without being changed.

DataString = r"0\t12\t24"
print (DataString)

Returns

0\t12\t24

This may be undesired output.

String Lengths

It should also be noted that an escape sequence such as \t counts as only one character in the resulting string.

DataString = "0\t12\t24"
print (len(DataString))

Returns

7

The raw string has a length of 9.


回答 2

您可以在字符串文字中使用\ t:

"hello\talex"

You can use \t in a string literal:

"hello\talex"


回答 3

它通常\t在命令行界面中,它将把char \t转换为空白制表符。

例如,hello\talex-> hello--->alex

It’s usually \t in command-line interfaces, which will convert the char \t into the whitespace tab character.

For example, hello\talex -> hello--->alex.


回答 4

正如未在任何答案中提到的那样,以防万一您想要对齐和间隔文本时,可以使用字符串格式功能。(在python 2.5之上)当然\t是一个TAB令牌,而所描述的方法会生成空格。

例:

print "{0:30} {1}".format("hi", "yes")
> hi                             yes

另一个示例,左对齐:

print("{0:<10} {1:<10} {2:<10}".format(1.0, 2.2, 4.4))
>1.0        2.2        4.4 

As it wasn’t mentioned in any of the answers: in case you want to align and space your text, you can use the string format features (Python 2.6 and later). Of course, \t is an actual TAB character, whereas the described method generates spaces.

Example:

print "{0:30} {1}".format("hi", "yes")
> hi                             yes

Another Example, left aligned:

print("{0:<10} {1:<10} {2:<10}".format(1.0, 2.2, 4.4))
>1.0        2.2        4.4 
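
The difference between padding with spaces and a real TAB is easiest to see with repr. The sketch below assumes Python 3; the names padded, tabbed, and expanded are illustrative only.

```python
# Padding with format() produces spaces, not a TAB character
padded = "{0:<10}{1}".format("hi", "yes")
tabbed = "hi\tyes"

print(repr(padded))  # the 10-character field is filled with spaces
print(repr(tabbed))  # contains a single \t character

# expandtabs() replaces each tab with spaces up to the next tab stop
expanded = tabbed.expandtabs(10)
print(repr(expanded))
```

With a tab stop of 10, the expanded string matches the padded one, which shows that the format approach is a fixed-width layout rather than a tab.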

回答 5

以下是一些获取“ hello” TAB“ alex”(使用Python 3.6.10测试)的更奇特的Python 3方法:

"hello\N{TAB}alex"

"hello\N{tab}alex"

"hello\N{TaB}alex"

"hello\N{HT}alex"

"hello\N{CHARACTER TABULATION}alex"

"hello\N{HORIZONTAL TABULATION}alex"

"hello\x09alex"

"hello\u0009alex"

"hello\U00000009alex"

实际上,代替使用转义序列,可以将制表符直接插入字符串文字中。这是带有制表符的代码,可用于复制和尝试:

"hello alex"

如果在复制字符串期间在上方字符串中的选项卡不会丢失,则“ print(repr(<上方字符串>)”应打印’hello \ talex’。

请参阅相应的Python文档以获取参考。

Here are some more exotic Python 3 ways to get “hello” TAB “alex” (tested with Python 3.6.10):

"hello\N{TAB}alex"

"hello\N{tab}alex"

"hello\N{TaB}alex"

"hello\N{HT}alex"

"hello\N{CHARACTER TABULATION}alex"

"hello\N{HORIZONTAL TABULATION}alex"

"hello\x09alex"

"hello\u0009alex"

"hello\U00000009alex"

Actually, instead of using an escape sequence, it is possible to insert a tab character directly into the string literal. Here is the code with a tab character to copy and try:

"hello alex"

If the tab in the string above isn’t lost anywhere while copying the string, then print(repr(<string from above>)) should print ‘hello\talex’.

See respective Python documentation for reference.
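
A quick way to check that several spellings really produce the same character is to compare them against "\t". This sketch sticks to numeric escapes and chr(), and the names TAB, candidates, and all_equal are illustrative.

```python
TAB = "\t"

# Hex, 16-bit Unicode, 32-bit Unicode, and octal escapes, plus chr()
candidates = ["\x09", "\u0009", "\U00000009", "\011", chr(9)]

all_equal = all(candidate == TAB for candidate in candidates)
print(all_equal)

print(repr("hello" + chr(9) + "alex"))
```

Every entry in the list is a one-character string holding code point 9.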


将浮点数向下舍入到最接近的整数?

问题:将浮点数向下舍入到最接近的整数?

如标题所示,我想取一个浮点数并将其四舍五入为最接近的整数。但是,如果它不是一个整数,那么我总是想舍入该变量,而不管它与下一个整数有多接近。有没有办法做到这一点?

As the title suggests, I want to take a floating point number and round it down to the nearest integer. However, if it’s not a whole number, I ALWAYS want to round the variable down, regardless of how close it is to the next integer up. Is there a way to do this?


回答 0

简单

print int(x)

也会工作。

Simple

print int(x)

will work as well.


回答 1

其中之一应起作用:

import math
math.trunc(1.5)
> 1
math.trunc(-1.5)
> -1
math.floor(1.5)
> 1
math.floor(-1.5)
> -2

One of these should work:

import math
math.trunc(1.5)
> 1
math.trunc(-1.5)
> -1
math.floor(1.5)
> 1
math.floor(-1.5)
> -2
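
The two functions only differ for negative inputs, which a side-by-side comparison makes visible. This is a small sketch assuming Python 3, where math.floor and math.trunc return int; the variable names are illustrative.

```python
import math

# Compare the common candidates on positive, negative, and whole inputs
for x in (1.5, -1.5, 2.0):
    print(x, int(x), math.trunc(x), math.floor(x), x // 1)

rounded_down = math.floor(-1.5)  # floor goes toward negative infinity
truncated = math.trunc(-1.5)     # trunc drops the fraction, toward zero
```

If "round down" means toward negative infinity, math.floor (or x // 1, which keeps the float type) is the match; int(x) behaves like math.trunc.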

回答 2

x//1

//运算符返回师的地板上。由于除以1不会更改您的数字,所以这等于下限,但不需要导入。笔记:

  1. 这将返回一个浮点数
  2. 向-∞取整
x//1

The // operator returns the floor of the division. Since dividing by 1 doesn’t change your number, this is equivalent to floor but no import is needed. Notes:

  1. This returns a float
  2. This rounds towards -∞

回答 3

要获取浮点结果,只需使用:

round(x-0.5)

它也适用于负数。

To get a floating point result, simply use:

round(x-0.5)

It works for negative numbers as well.
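
Whether this trick matches a true floor depends on how round treats ties. The check below is a sketch that assumes Python 3, where round uses round-half-to-even and returns an int; the sample values are chosen only for illustration.

```python
import math

values = [0.0, 1.0, 2.0, 3.0, 1.7, -1.2]

# Collect the inputs where round(x - 0.5) disagrees with math.floor(x)
mismatches = [x for x in values if round(x - 0.5) != math.floor(x)]
print(mismatches)
```

Under that assumption, odd whole numbers land on a tie (for example 0.5 or 2.5) and are rounded to the even neighbour, so math.floor is the safer choice when inputs can be exact integers.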


回答 4

我认为您需要一个下限功能:

math.floor(x)

I think you need a floor function:

math.floor(x)


回答 5

很多人说可以使用int(x),并且在大多数情况下都可以使用,但是存在一些问题。如果OP的结果是:

x = 1.9999999999999999

它会四舍五入

x = 2

9月16日之后,它会四舍五入。如果您确定您永远不会遇到这种事情,那么这并不是什么大不了的事情。但这是要牢记的。

A lot of people say to use int(x), and this works OK for most cases, but there is a little problem. If OP’s result is:

x = 1.9999999999999999

it will round to

x = 2

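Once there are 16 or more 9s, the literal is rounded to the nearest representable float, which is 2.0, so int() never sees a value below 2. This is not a big deal if you are sure you will never come across such a thing, but it’s something to keep in mind.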
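
The effect can be reproduced directly, and the decimal module is one way around it when the digits arrive as text. This is a sketch under that assumption; the names exact and floored are illustrative.

```python
import math
from decimal import Decimal

x = 1.9999999999999999  # this literal is already stored as 2.0
print(x == 2.0, int(x))

# Decimal keeps the value digit for digit when built from a string
exact = Decimal("1.9999999999999999")
floored = math.floor(exact)
print(floored)
```

The rounding happens when the float is created, not inside int(), so no rounding function applied afterwards can recover the lost digits.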


回答 6

如果您不想导入数学,则可以使用:

int(round(x))

这是一个文档:

>>> help(round)
Help on built-in function round in module __builtin__:

round(...)
    round(number[, ndigits]) -> floating point number

    Round a number to a given precision in decimal digits (default 0 digits).
    This always returns a floating point number.  Precision may be negative.

If you don’t want to import math, you could use:

int(round(x))

Here’s a piece of documentation:

>>> help(round)
Help on built-in function round in module __builtin__:

round(...)
    round(number[, ndigits]) -> floating point number

    Round a number to a given precision in decimal digits (default 0 digits).
    This always returns a floating point number.  Precision may be negative.

回答 7

如果您使用numpy,则可以使用以下解决方案,该解决方案也适用于负数(它也适用于数组)

import numpy as np
def round_down(num):
    if num < 0:
        return -np.ceil(abs(num))
    else:
        return np.int32(num)
round_down = np.vectorize(round_down)

round_down([-1.1, -1.5, -1.6, 0, 1.1, 1.5, 1.6])
> array([-2., -2., -2.,  0.,  1.,  1.,  1.])

我认为如果仅使用math模块而不是numpy模块,它也将起作用。

If you are working with numpy, you can use the following solution, which also works with negative numbers (it also works on arrays):

import numpy as np
def round_down(num):
    if num < 0:
        return -np.ceil(abs(num))
    else:
        return np.int32(num)
round_down = np.vectorize(round_down)

round_down([-1.1, -1.5, -1.6, 0, 1.1, 1.5, 1.6])
> array([-2., -2., -2.,  0.,  1.,  1.,  1.])

I think it will also work if you just use the math module instead of numpy module.


回答 8

不知道您是否解决了这个问题,但我偶然发现了这个问题。如果要去除小数点,可以使用int(x),它将消除所有十进制数字。无需使用round(x)。

Don’t know if you solved this, but I just stumbled upon this question. If you want to get rid of the decimal part, you could use int(x) and it will eliminate all decimal digits. There’s no need to use round(x).


回答 9

只需取整(x-0.5),这将始终返回您的Float的下一个四舍五入的Integer值。您也可以通过do round(x + 0.5)轻松地四舍五入

Just use round(x-0.5); this returns the next rounded-down integer value of your float. You can also easily round up by using round(x+0.5).


回答 10

这可能很简单,但是您难道不可以将其舍去然后减去1吗?例如:

number=1.5
round(number)-1
> 1

It may be very simple, but couldn’t you just round it up and then subtract 1? For example:

number=1.5
round(number)-1
> 1

回答 11

我用此代码从数字中减去0.5,然后将其四舍五入,即原始数字四舍五入。

圆(a-0.5)

I used this code where you subtract 0.5 from the number and when you round it, it is the original number rounded down.

round(a-0.5)


是否应将conda或conda-forge用于Python环境?

问题:是否应将conda或conda-forge用于Python环境?

Conda并且conda-forge都是Python软件包管理器。当两个存储库中都存在一个程序包时,合适的选择是什么?例如,Django可以安装其中之一,但是两者之间的区别是几个依赖项(conda-forge还有更多)。对于这些差异没有任何解释,甚至没有简单的自述文件。

应该使用哪一个?康达或康达伪造?有关系吗?

Conda and conda-forge are both Python package managers. What is the appropriate choice when a package exists in both repositories? Django, for example, can be installed with either, but the difference between the two is several dependencies (conda-forge has many more). There is no explanation for these differences, not even a simple README.

Which one should be used? Conda or conda-forge? Does it matter?


回答 0

简短的回答是,根据我的经验,通常使用哪种都无关紧要。

长答案:

所以conda-forge是可以从其中安装的软件包的附加通道。从这个意义上讲,它没有比默认频道更特别,也没有其他任何人将软件包发布到的频道(数千个)中的任何一个。如果您在https://anaconda.org上注册并上传自己的Conda软件包,则可以添加自己的频道。

在这里,我们需要进行区分,我认为您对问题的措辞不清楚conda,即跨平台的程序包管理器和conda-forge程序包通道之间。该conda软件的主要开发人员Anaconda Inc.(以前称为Continuum IO)也维护一个单独的软件包频道,这是您在conda install packagename不更改任何选项的情况下键入的默认软件包。

有三种方法可以更改频道选项。每次安装软件包时,前两个步骤都会完成,而后一个则是持久性的。第一个是在每次安装软件包时指定一个通道:

conda install -c some-channel packagename

当然,该程序包必须存在于该通道上。这样将从进行安装packagename及其所有依赖项some-channel。或者,您可以指定:

conda install some-channel::packagename

该程序包仍然必须存在some-channel,但现在只能packagename从中提取some-channel。可以从您的默认频道列表中搜索满足依赖关系所需的任何其他软件包。

要查看您的频道配置,您可以编写:

conda config --show channels

您可以使用来控制搜索频道的顺序conda config。你可以写:

conda config --add channels some-channel

将通道添加some-channelchannels配置列表的顶部。这具有some-channel最高的优先级。当一个以上通道具有特定程序包时,优先级(部分)确定选择哪个通道。要将频道添加到列表的末尾并赋予其最低的优先级,请输入

conda config --append channels some-channel

如果您想删除添加的频道,可以通过以下方式删除

conda config --remove channels some-channel

看到

conda config -h

有关更多选项。

综上所述,使用conda-forge频道而不是defaultsAnaconda维护频道的主要原因有四个:

  1. 上的软件包conda-forge 可能defaults频道上的软件包最新
  2. conda-forge频道上的某些软件包无法从defaults
  3. 您可能希望使用诸如openblas(from conda-forge)而不是mkl(from defaults)的依赖项。
  4. 如果要安装需要编译库的软件包(例如,C扩展名或C库的包装器),则由于二进制原因,如果从单个通道在环境中安装所有软件包,则可能会减少不兼容的可能性。基本C库的兼容性(但是此建议可能会过时/将来会更改)。

The short answer is that, in my experience generally, it doesn’t matter which you use.

The long answer:

So conda-forge is an additional channel from which packages may be installed. In this sense, it is not any more special than the default channel, or any of the other hundreds (thousands?) of channels that people have posted packages to. You can add your own channel if you sign up at https://anaconda.org and upload your own Conda packages.

Here we need to make the distinction, which I think you’re not clear about from your phrasing in the question, between conda, the cross-platform package manager, and conda-forge, the package channel. Anaconda Inc. (formerly Continuum IO), the main developers of the conda software, also maintain a separate channel of packages, which is the default when you type conda install packagename without changing any options.

There are three ways to change the options for channels. The first two are done every time you install a package and the last one is persistent. The first one is to specify a channel every time you install a package:

conda install -c some-channel packagename

Of course, the package has to exist on that channel. This way will install packagename and all its dependencies from some-channel. Alternately, you can specify:

conda install some-channel::packagename

The package still has to exist on some-channel, but now, only packagename will be pulled from some-channel. Any other packages that are needed to satisfy dependencies will be searched for from your default list of channels.

To see your channel configuration, you can write:

conda config --show channels

You can control the order that channels are searched with conda config. You can write:

conda config --add channels some-channel

to add the channel some-channel to the top of the channels configuration list. This gives some-channel the highest priority. Priority determines (in part) which channel is selected when more than one channel has a particular package. To add the channel to the end of the list and give it the lowest priority, type

conda config --append channels some-channel

If you would like to remove the channel that you added, you can do so by writing

conda config --remove channels some-channel

See

conda config -h

for more options.

With all of that said, there are four main reasons to use the conda-forge channel instead of the defaults channel maintained by Anaconda:

  1. Packages on conda-forge may be more up-to-date than those on the defaults channel
  2. There are packages on the conda-forge channel that aren’t available from defaults
  3. You would prefer to use a dependency such as openblas (from conda-forge) instead of mkl (from defaults).
  4. If you are installing a package that requires a compiled library (e.g., a C extension or a wrapper around a C library), it may reduce the chance of incompatibilities if you install all of the packages in an environment from a single channel due to binary compatibility of the base C library (but this advice may be out of date/change in the future).

回答 1

Anaconda更改了服务条款,以使“大量商业用户”需要付费,其中不包括conda-forge渠道。

conda-forge如果您不想为使用付费,则可能要坚持。如文档所述

conda config --add channels conda-forge
conda config --set channel_priority strict
conda install <package-name>

你也可以使用miniforge具有conda-forge作为默认的通道,并支持ppc64le和aarch64平台,以及其他常用的。

Anaconda has changed its Terms of Service so that “heavy commercial users” have to pay; this does not apply to the conda-forge channel.

You probably would want to stick to conda-forge if you don’t want to pay for the usage. As stated in the docs:

conda config --add channels conda-forge
conda config --set channel_priority strict
conda install <package-name>

You could also use miniforge which has conda-forge as the default channel, and supports ppc64le and aarch64 platforms as well as the other usual ones.


回答 2

在conda-forge渠道中,您可以找到针对conda构建的软件包,但尚未成为Anaconda官方发行版的一部分。

通常,您可以使用其中任何一个。

The conda-forge channel is where you can find packages that have been built for conda but are yet to be part of the official Anaconda distribution.

Generally, you can use any of them.


回答 3

有些Python库无法简单安装,conda install因为除非应用conda-forge,否则它们的通道不可用。根据我的经验,与conda相比,pip更通用于研究不同的渠道来源。例如,如果要安装python-constraint,可以通过,pip install但可以通过** cond **进行安装。您必须指定频道- conda-forge

conda install -c conda-forge python-constraint  # works

但不是

conda install python-constraint

There are some Python libraries that you cannot install with a plain conda install, because they are not available on the default channel; you have to add conda-forge. From my experience, pip is more generic than conda when it comes to looking into different channel sources. For instance, if you want to install python-constraint, you can do it via pip install, but to install it via conda you have to specify the channel, conda-forge.

conda install -c conda-forge python-constraint  # works

but not

conda install python-constraint

如何将新列添加到Spark DataFrame(使用PySpark)?

问题:如何将新列添加到Spark DataFrame(使用PySpark)?

我有一个Spark DataFrame(使用PySpark 1.5.1),想添加一个新列。

我已经尝试了以下方法,但没有成功:

type(randomed_hours) # => list

# Create in Python and transform to RDD

new_col = pd.DataFrame(randomed_hours, columns=['new_col'])

spark_new_col = sqlContext.createDataFrame(new_col)

my_df_spark.withColumn("hours", spark_new_col["new_col"])

使用此命令也出错:

my_df_spark.withColumn("hours",  sc.parallelize(randomed_hours))

那么,如何使用PySpark将新列(基于Python向量)添加到现有DataFrame中?

I have a Spark DataFrame (using PySpark 1.5.1) and would like to add a new column.

I’ve tried the following without any success:

type(randomed_hours) # => list

# Create in Python and transform to RDD

new_col = pd.DataFrame(randomed_hours, columns=['new_col'])

spark_new_col = sqlContext.createDataFrame(new_col)

my_df_spark.withColumn("hours", spark_new_col["new_col"])

Also got an error using this:

my_df_spark.withColumn("hours",  sc.parallelize(randomed_hours))

So how do I add a new column (based on Python vector) to an existing DataFrame with PySpark?


回答 0

您不能将任意列添加到DataFrameSpark中。只能通过使用文字来创建新列(其他文字类型在如何在Spark DataFrame中添加常量列中进行了描述)。

from pyspark.sql.functions import lit

df = sqlContext.createDataFrame(
    [(1, "a", 23.0), (3, "B", -23.0)], ("x1", "x2", "x3"))

df_with_x4 = df.withColumn("x4", lit(0))
df_with_x4.show()

## +---+---+-----+---+
## | x1| x2|   x3| x4|
## +---+---+-----+---+
## |  1|  a| 23.0|  0|
## |  3|  B|-23.0|  0|
## +---+---+-----+---+

转换现有列:

from pyspark.sql.functions import exp

df_with_x5 = df_with_x4.withColumn("x5", exp("x3"))
df_with_x5.show()

## +---+---+-----+---+--------------------+
## | x1| x2|   x3| x4|                  x5|
## +---+---+-----+---+--------------------+
## |  1|  a| 23.0|  0| 9.744803446248903E9|
## |  3|  B|-23.0|  0|1.026187963170189...|
## +---+---+-----+---+--------------------+

包括使用join

from pyspark.sql.functions import col

lookup = sqlContext.createDataFrame([(1, "foo"), (2, "bar")], ("k", "v"))
df_with_x6 = (df_with_x5
    .join(lookup, col("x1") == col("k"), "leftouter")
    .drop("k")
    .withColumnRenamed("v", "x6"))

## +---+---+-----+---+--------------------+----+
## | x1| x2|   x3| x4|                  x5|  x6|
## +---+---+-----+---+--------------------+----+
## |  1|  a| 23.0|  0| 9.744803446248903E9| foo|
## |  3|  B|-23.0|  0|1.026187963170189...|null|
## +---+---+-----+---+--------------------+----+

或使用函数/ udf生成:

from pyspark.sql.functions import rand

df_with_x7 = df_with_x6.withColumn("x7", rand())
df_with_x7.show()

## +---+---+-----+---+--------------------+----+-------------------+
## | x1| x2|   x3| x4|                  x5|  x6|                 x7|
## +---+---+-----+---+--------------------+----+-------------------+
## |  1|  a| 23.0|  0| 9.744803446248903E9| foo|0.41930610446846617|
## |  3|  B|-23.0|  0|1.026187963170189...|null|0.37801881545497873|
## +---+---+-----+---+--------------------+----+-------------------+

在性能方面,pyspark.sql.functions映射到Catalyst表达式的内置函数()通常优于Python用户定义的函数。

如果要添加任意RDD的内容作为列,则可以

  • 行号添加到现有数据框
  • 调用zipWithIndexRDD并将其转换为数据帧
  • 使用索引作为连接键来连接两者

You cannot add an arbitrary column to a DataFrame in Spark. New columns can be created only by using literals (other literal types are described in How to add a constant column in a Spark DataFrame?):

from pyspark.sql.functions import lit

df = sqlContext.createDataFrame(
    [(1, "a", 23.0), (3, "B", -23.0)], ("x1", "x2", "x3"))

df_with_x4 = df.withColumn("x4", lit(0))
df_with_x4.show()

## +---+---+-----+---+
## | x1| x2|   x3| x4|
## +---+---+-----+---+
## |  1|  a| 23.0|  0|
## |  3|  B|-23.0|  0|
## +---+---+-----+---+

transforming an existing column:

from pyspark.sql.functions import exp

df_with_x5 = df_with_x4.withColumn("x5", exp("x3"))
df_with_x5.show()

## +---+---+-----+---+--------------------+
## | x1| x2|   x3| x4|                  x5|
## +---+---+-----+---+--------------------+
## |  1|  a| 23.0|  0| 9.744803446248903E9|
## |  3|  B|-23.0|  0|1.026187963170189...|
## +---+---+-----+---+--------------------+

included using join:

from pyspark.sql.functions import col

lookup = sqlContext.createDataFrame([(1, "foo"), (2, "bar")], ("k", "v"))
df_with_x6 = (df_with_x5
    .join(lookup, col("x1") == col("k"), "leftouter")
    .drop("k")
    .withColumnRenamed("v", "x6"))

## +---+---+-----+---+--------------------+----+
## | x1| x2|   x3| x4|                  x5|  x6|
## +---+---+-----+---+--------------------+----+
## |  1|  a| 23.0|  0| 9.744803446248903E9| foo|
## |  3|  B|-23.0|  0|1.026187963170189...|null|
## +---+---+-----+---+--------------------+----+

or generated with function / udf:

from pyspark.sql.functions import rand

df_with_x7 = df_with_x6.withColumn("x7", rand())
df_with_x7.show()

## +---+---+-----+---+--------------------+----+-------------------+
## | x1| x2|   x3| x4|                  x5|  x6|                 x7|
## +---+---+-----+---+--------------------+----+-------------------+
## |  1|  a| 23.0|  0| 9.744803446248903E9| foo|0.41930610446846617|
## |  3|  B|-23.0|  0|1.026187963170189...|null|0.37801881545497873|
## +---+---+-----+---+--------------------+----+-------------------+

Performance-wise, built-in functions (pyspark.sql.functions), which map to Catalyst expressions, are usually preferred over Python user-defined functions.

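If you want to add the content of an arbitrary RDD as a column, you can:

  • add row numbers to the existing data frame
  • call zipWithIndex on the RDD and convert it to a data frame
  • join both using the index as the join key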


回答 1

要使用UDF添加列:

df = sqlContext.createDataFrame(
    [(1, "a", 23.0), (3, "B", -23.0)], ("x1", "x2", "x3"))

from pyspark.sql.functions import udf
from pyspark.sql.types import *

def valueToCategory(value):
   if   value == 1: return 'cat1'
   elif value == 2: return 'cat2'
   ...
   else: return 'n/a'

# NOTE: it seems that calls to udf() must be after SparkContext() is called
udfValueToCategory = udf(valueToCategory, StringType())
df_with_cat = df.withColumn("category", udfValueToCategory("x1"))
df_with_cat.show()

## +---+---+-----+---------+
## | x1| x2|   x3| category|
## +---+---+-----+---------+
## |  1|  a| 23.0|     cat1|
## |  3|  B|-23.0|      n/a|
## +---+---+-----+---------+

To add a column using a UDF:

df = sqlContext.createDataFrame(
    [(1, "a", 23.0), (3, "B", -23.0)], ("x1", "x2", "x3"))

from pyspark.sql.functions import udf
from pyspark.sql.types import *

def valueToCategory(value):
    if value == 1:
        return 'cat1'
    elif value == 2:
        return 'cat2'
    # ... (further cases elided in the original)
    else:
        return 'n/a'

# NOTE: it seems that calls to udf() must be after SparkContext() is called
udfValueToCategory = udf(valueToCategory, StringType())
df_with_cat = df.withColumn("category", udfValueToCategory("x1"))
df_with_cat.show()

## +---+---+-----+---------+
## | x1| x2|   x3| category|
## +---+---+-----+---------+
## |  1|  a| 23.0|     cat1|
## |  3|  B|-23.0|      n/a|
## +---+---+-----+---------+

回答 2

对于Spark 2.0

# assumes schema has 'age' column 
df.select('*', (df.age + 10).alias('agePlusTen'))

For Spark 2.0

# assumes schema has 'age' column 
df.select('*', (df.age + 10).alias('agePlusTen'))

回答 3

我们可以通过多种方式在pySpark中添加新列。

让我们首先创建一个简单的DataFrame。

date = [27, 28, 29, None, 30, 31]
df = spark.createDataFrame(date, IntegerType())

现在,让我们尝试将列值加倍并将其存储在新列中。PFB很少有不同的方法可以实现相同。

# Approach - 1 : using withColumn function
df.withColumn("double", df.value * 2).show()

# Approach - 2 : using select with alias function.
df.select("*", (df.value * 2).alias("double")).show()

# Approach - 3 : using selectExpr function with as clause.
df.selectExpr("*", "value * 2 as double").show()

# Approach - 4 : Using as clause in SQL statement.
df.createTempView("temp")
spark.sql("select *, value * 2 as double from temp").show()

有关Spark DataFrame函数的更多示例和说明,请访问我的博客

我希望这有帮助。

There are multiple ways we can add a new column in pySpark.

Let’s first create a simple DataFrame.

from pyspark.sql.types import IntegerType

date = [27, 28, 29, None, 30, 31]
df = spark.createDataFrame(date, IntegerType())

Now let’s try to double the column value and store it in a new column. Below are a few different approaches to achieve the same.

# Approach - 1 : using withColumn function
df.withColumn("double", df.value * 2).show()

# Approach - 2 : using select with alias function.
df.select("*", (df.value * 2).alias("double")).show()

# Approach - 3 : using selectExpr function with as clause.
df.selectExpr("*", "value * 2 as double").show()

# Approach - 4 : Using as clause in SQL statement.
df.createTempView("temp")
spark.sql("select *, value * 2 as double from temp").show()

For more examples and explanation on spark DataFrame functions, you can visit my blog.

I hope this helps.


回答 4

您可以udf在添加时定义一个新的column_name

from pyspark.sql import functions as F
from pyspark.sql.types import StringType
u_f = F.udf(lambda: yourstring, StringType())
a.select(u_f().alias('column_name'))

You can define a new udf when adding a column_name:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType
u_f = F.udf(lambda: yourstring, StringType())
a.select(u_f().alias('column_name'))

回答 5

from pyspark.sql.functions import udf
from pyspark.sql.types import *
func_name = udf(
    lambda val: val, # do sth to val
    StringType()
)
df.withColumn('new_col', func_name(df.old_col))
from pyspark.sql.functions import udf
from pyspark.sql.types import *
func_name = udf(
    lambda val: val, # do sth to val
    StringType()
)
df.withColumn('new_col', func_name(df.old_col))

回答 6

我想提供一个非常相似的用例的通用示例:

用例:我的csv包含:

First|Third|Fifth
data|data|data
data|data|data
...billion more lines

我需要执行一些转换,最终的csv需要看起来像

First|Second|Third|Fourth|Fifth
data|null|data|null|data
data|null|data|null|data
...billion more lines

我需要执行此操作,因为这是某些模型定义的架构,并且我需要最终数据与SQL Bulk Inserts等具有互操作性。

所以:

1)我使用spark.read读取原始的csv,并将其称为“ df”。

2)我对数据做了一些处理。

3)我使用此脚本添加空列:

outcols = []
for column in MY_COLUMN_LIST:
    if column in df.columns:
        outcols.append(column)
    else:
        outcols.append(lit(None).cast(StringType()).alias('{0}'.format(column)))

df = df.select(outcols)

这样,您可以在加载csv之后构造架构(如果必须对许多表执行此操作,也可以对列进行重新排序)。

I would like to offer a generalized example for a very similar use case:

Use Case: I have a csv consisting of:

First|Third|Fifth
data|data|data
data|data|data
...billion more lines

I need to perform some transformations and the final csv needs to look like

First|Second|Third|Fourth|Fifth
data|null|data|null|data
data|null|data|null|data
...billion more lines

I need to do this because this is the schema defined by some model and I need for my final data to be interoperable with SQL Bulk Inserts and such things.

so:

1) I read the original csv using spark.read and call it “df”.

2) I do something to the data.

3) I add the null columns using this script:

from pyspark.sql.functions import lit
from pyspark.sql.types import StringType

outcols = []
for column in MY_COLUMN_LIST:
    if column in df.columns:
        outcols.append(column)
    else:
        outcols.append(lit(None).cast(StringType()).alias('{0}'.format(column)))

df = df.select(outcols)

In this way, you can structure your schema after loading a csv (would also work for reordering columns if you have to do this for many tables).


回答 7

添加列的最简单方法是使用“ withColumn”。由于数据框是使用sqlContext创建的,因此您必须指定架构或默认情况下可以在数据集中使用。如果指定了架构,则每次更改时工作量都会变得很乏味。

您可以考虑以下示例:

from pyspark.sql import SQLContext
from pyspark.sql.types import *
sqlContext = SQLContext(sc) # SparkContext will be sc by default 

# Read the dataset of your choice (Already loaded with schema)
Data = sqlContext.read.csv("/path", header = True/False, schema = "infer", sep = "delimiter")

# For instance the data has 30 columns from col1, col2, ... col30. If you want to add a 31st column, you can do so by the following:
Data = Data.withColumn("col31", "Code goes here")

# Check the change 
Data.printSchema()

The simplest way to add a column is to use “withColumn”. Since the dataframe is created using sqlContext, you have to either specify the schema or let it be inferred from the dataset. Specifying the schema by hand becomes tedious if it has to change often.

Below is an example that you can consider:

from pyspark.sql import SQLContext
from pyspark.sql.types import *
sqlContext = SQLContext(sc) # SparkContext will be sc by default 

# Read the dataset of your choice (Already loaded with schema)
Data = sqlContext.read.csv("/path", header = True/False, schema = "infer", sep = "delimiter")

# For instance the data has 30 columns from col1, col2, ... col30. If you want to add a 31st column, you can do so by the following:
Data = Data.withColumn("col31", "Code goes here")

# Check the change 
Data.printSchema()

回答 8

我们可以通过以下步骤直接向DataFrame添加其他列:

from pyspark.sql.functions import when
df = spark.createDataFrame([["amit", 30], ["rohit", 45], ["sameer", 50]], ["name", "age"])
df = df.withColumn("profile", when(df.age >= 40, "Senior").otherwise("Executive"))
df.show()

We can add additional columns to a DataFrame directly with the steps below:

from pyspark.sql.functions import when
df = spark.createDataFrame([["amit", 30], ["rohit", 45], ["sameer", 50]], ["name", "age"])
df = df.withColumn("profile", when(df.age >= 40, "Senior").otherwise("Executive"))
df.show()

安全地从字典中删除多个键

问题:安全地从字典中删除多个键

我知道d安全地从字典中删除条目“键” ,您可以这样做:

if d.has_key('key'):
    del d['key']

但是,我需要安全地从字典中删除多个条目。我正在考虑在元组中定义条目,因为我将需要多次执行此操作。

entitiesToRemove = ('a', 'b', 'c')
for x in entitiesToRemove:
    if d.has_key(x):
        del d[x]

但是,我想知道是否有更聪明的方法来做到这一点?

I know how to remove an entry, 'key' from my dictionary d, safely. You do:

if d.has_key('key'):
    del d['key']

However, I need to remove multiple entries from a dictionary safely. I was thinking of defining the entries in a tuple as I will need to do this more than once.

entities_to_remove = ('a', 'b', 'c')
for x in entities_to_remove:
    if x in d:
        del d[x]

However, I was wondering if there is a smarter way to do this?


回答 0

为什么不这样:

entries = ('a', 'b', 'c')
the_dict = {'b': 'foo'}

def entries_to_remove(entries, the_dict):
    for key in entries:
        if key in the_dict:
            del the_dict[key]

mattbornski使用dict.pop()提供了一个更紧凑的版本

Why not like this:

entries = ('a', 'b', 'c')
the_dict = {'b': 'foo'}

def entries_to_remove(entries, the_dict):
    for key in entries:
        if key in the_dict:
            del the_dict[key]

A more compact version was provided by mattbornski using dict.pop()


回答 1

d = {'some':'data'}
entriesToRemove = ('any', 'iterable')
for k in entriesToRemove:
    d.pop(k, None)

Using dict.pop:

d = {'some': 'data'}
entries_to_remove = ('any', 'iterable')
for k in entries_to_remove:
    d.pop(k, None)
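
Since this needs to be done more than once, the loop can be wrapped in a small helper. The sketch below is one possible shape; the function name remove_keys and the sample dictionary config are hypothetical.

```python
def remove_keys(d, keys):
    """Delete each key from d in place; keys that are absent are skipped."""
    for key in keys:
        d.pop(key, None)  # the default value suppresses KeyError
    return d


config = {"a": 1, "b": 2, "c": 3}
remove_keys(config, ("a", "c", "missing"))
print(config)
```

Returning the dictionary is only a convenience for chaining; the caller's object is modified either way.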

回答 2

使用词典理解

final_dict = {key: t[key] for key in t if key not in [key1, key2]}

其中key1key2将被删除。

在下面的示例中,将删除键“ b”和“ c”并将其保存在键列表中。

>>> a
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> keys = ["b", "c"]
>>> print {key: a[key] for key in a if key not in keys}
{'a': 1, 'd': 4}
>>> 

Using Dict Comprehensions

final_dict = {key: t[key] for key in t if key not in [key1, key2]}

where key1 and key2 are to be removed.

In the example below, keys “b” and “c” are to be removed; they are kept in a keys list.

>>> a
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> keys = ["b", "c"]
>>> print {key: a[key] for key in a if key not in keys}
{'a': 1, 'd': 4}
>>> 
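
The same idea in Python 3 syntax, as a reusable function that leaves the original dictionary untouched. The name without_keys is hypothetical, and converting the keys to a set is an optional choice that helps when many keys are excluded.

```python
def without_keys(mapping, keys):
    """Return a new dict that omits the given keys; the input is not modified."""
    excluded = set(keys)  # set membership checks are O(1) on average
    return {k: v for k, v in mapping.items() if k not in excluded}


a = {"a": 1, "c": 3, "b": 2, "d": 4}
result = without_keys(a, ["b", "c"])
print(result)
```

Because a new dictionary is built, this suits cases where other code still holds a reference to the original.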

回答 3

解决方案正在使用mapfilter起作用

Python2

d={"a":1,"b":2,"c":3}
l=("a","b","d")
map(d.__delitem__, filter(d.__contains__,l))
print(d)

Python3

d={"a":1,"b":2,"c":3}
l=("a","b","d")
list(map(d.__delitem__, filter(d.__contains__,l)))
print(d)

你得到:

{'c': 3}

A solution is to use the map and filter functions.

Python 2

d={"a":1,"b":2,"c":3}
l=("a","b","d")
map(d.__delitem__, filter(d.__contains__,l))
print(d)

Python 3

d={"a":1,"b":2,"c":3}
l=("a","b","d")
list(map(d.__delitem__, filter(d.__contains__,l)))
print(d)

You get:

{'c': 3}

回答 4

如果还需要检索要删除的键的值,这将是一个很好的方法:

valuesRemoved = [d.pop(k, None) for k in entitiesToRemove]

当然,您仍然可以仅从中删除键来执行此操作d,但是您将不必要使用列表理解来创建值列表。只是为了函数的副作用而使用列表理解也有点不清楚。

If you also need to retrieve the values for the keys you are removing, this would be a pretty good way to do it:

values_removed = [d.pop(k, None) for k in entities_to_remove]

You could of course still do this just for the removal of the keys from d, but you would be unnecessarily creating the list of values with the list comprehension. It is also a little unclear to use a list comprehension just for the function’s side effect.
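
If it matters which value belonged to which key, or whether a key was present at all, a plain loop that fills a dictionary may be clearer than a list of values padded with None. This is a sketch with illustrative names and sample data.

```python
d = {"a": 1, "b": 2, "c": 3}
entities_to_remove = ("a", "c", "zzz")

removed = {}
for k in entities_to_remove:
    if k in d:
        removed[k] = d.pop(k)  # record the value under its key

print(removed)
print(d)
```

Keys that were never in d simply do not appear in removed.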


回答 5

发现用溶液popmap

d = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'b', 'c']
list(map(d.pop, keys))
print(d)

此输出:

{'d': 'valueD'}

我这么晚才回答了这个问题,只是因为我认为如果有人进行搜索,将来会有所帮助。这可能会有所帮助。

更新资料

如果字典中不存在键,则以上代码将引发错误。

DICTIONARY = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'l', 'c']

def remove_key(key):
    try:
        DICTIONARY.pop(key, None)
    except:
        pass  # or do any action

list(map(remove_key, keys))
print(DICTIONARY)

输出:

DICTIONARY = {'b': 'valueB', 'd': 'valueD'}

Found a solution with pop and map

d = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'b', 'c']
list(map(d.pop, keys))
print(d)

The output of this:

{'d': 'valueD'}

I am answering this question late because I think it may help anyone who searches for the same thing in the future.

Update

The above code will throw an error if a key does not exist in the dict. Passing a default value to pop avoids that:

DICTIONARY = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'l', 'c']

def remove_key(key):
    try:
        # pop with a default does not raise for a missing key
        DICTIONARY.pop(key, None)
    except KeyError:
        pass  # or do any action

list(map(remove_key, keys))
print(DICTIONARY)

output:

{'b': 'valueB', 'd': 'valueD'}

回答 6

任何现有的答案我都没有问题,但是我很惊讶没有找到这个解决方案:

keys_to_remove = ['a', 'b', 'c']
my_dict = {k: v for k, v in zip("a b c d e f g".split(' '), [0, 1, 2, 3, 4, 5, 6])}

for k in keys_to_remove:
    try:
        del my_dict[k]
    except KeyError:
        pass

assert my_dict == {'d': 3, 'e': 4, 'f': 5, 'g': 6}

注:我碰到这个问题,从跌跌撞撞来这里。我的答案与此答案有关

I have no problem with any of the existing answers, but I was surprised to not find this solution:

keys_to_remove = ['a', 'b', 'c']
my_dict = {k: v for k, v in zip("a b c d e f g".split(' '), [0, 1, 2, 3, 4, 5, 6])}

for k in keys_to_remove:
    try:
        del my_dict[k]
    except KeyError:
        pass

assert my_dict == {'d': 3, 'e': 4, 'f': 5, 'g': 6}

Note: I stumbled across this question coming from here. And my answer is related to this answer.


回答 7

为什么不:

entriestoremove = (2,5,1)
for e in entriestoremove:
    if d.has_key(e):
        del d[e]

我不知道您所说的“更聪明的方式”。当然,还有其他方法,也许是对字典的理解:

entriestoremove = (2,5,1)
newdict = {k: v for k, v in d.items() if k not in entriestoremove}

Why not:

entriestoremove = (2,5,1)
for e in entriestoremove:
    if d.has_key(e):
        del d[e]

I don’t know what you mean by “smarter way”. Surely there are other ways, maybe with dictionary comprehensions:

entriestoremove = (2,5,1)
newdict = {k: v for k, v in d.items() if k not in entriestoremove}

回答 8

排队

import functools

#: not key(c) in d
d = {"a": "avalue", "b": "bvalue", "d": "dvalue"}

entitiesToREmove = ('a', 'b', 'c')

#: python2
map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove)

#: python3

list(map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove))

print(d)
# output: {'d': 'dvalue'}

inline

import functools

#: not key(c) in d
d = {"a": "avalue", "b": "bvalue", "d": "dvalue"}

entitiesToREmove = ('a', 'b', 'c')

#: python2
map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove)

#: python3

list(map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove))

print(d)
# output: {'d': 'dvalue'}

回答 9

对cpython 3的一些计时测试表明,简单的for循环是最快的方法,并且可读性强。添加一个函数也不会导致太多开销:

timeit结果(10000次迭代):

  • all(x.pop(v) for v in r) # 0.85
  • all(map(x.pop, r)) # 0.60
  • list(map(x.pop, r)) # 0.70
  • all(map(x.__delitem__, r)) # 0.44
  • del_all(x, r) # 0.40
  • <inline for loop>(x, r) # 0.35

def del_all(mapping, to_remove):
    """Remove list of elements from mapping."""
    for key in to_remove:
        del mapping[key]

对于小迭代,由于函数调用的开销,执行“内联”要快一些。但是,del_all它比所有python理解和映射结构都更安全,可重用并且运行速度更快。

Some timing tests on CPython 3 show that a simple for loop is the fastest way, and it’s quite readable. Wrapping it in a function doesn’t add much overhead either:

timeit results (10k iterations):

  • all(x.pop(v) for v in r) # 0.85
  • all(map(x.pop, r)) # 0.60
  • list(map(x.pop, r)) # 0.70
  • all(map(x.__delitem__, r)) # 0.44
  • del_all(x, r) # 0.40
  • <inline for loop>(x, r) # 0.35

def del_all(mapping, to_remove):
    """Remove list of elements from mapping."""
    for key in to_remove:
        del mapping[key]

For a small number of keys, the inline loop was a bit faster because it avoids the overhead of a function call. But del_all is lint-safe, reusable, and faster than all the Python comprehension and map constructs. Note that all() stops at the first falsy result, so the all(...) variants only remove every key if each returned value is truthy; x.__delitem__ always returns None, so that variant stops after deleting the first key.
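
A rough harness for reproducing this kind of comparison might look like the sketch below. The workload size, the helper names `pop_all` and `run`, and the repeat count are assumptions, not the setup used for the numbers above, and each measurement includes the cost of copying the dict.

```python
import timeit


def del_all(mapping, to_remove):
    """Remove every key in to_remove from mapping (KeyError if one is missing)."""
    for key in to_remove:
        del mapping[key]


def pop_all(mapping, to_remove):
    """Same effect via map(); list() forces the lazy map to run to completion."""
    list(map(mapping.pop, to_remove))


# Hypothetical workload: 1,000 keys, of which the first 100 are removed.
source = {i: str(i) for i in range(1000)}
to_remove = list(range(100))


def run(remover):
    fresh = dict(source)  # copy so that every run starts from the same data
    remover(fresh, to_remove)
    return fresh


for remover in (del_all, pop_all):
    seconds = timeit.timeit(lambda: run(remover), number=1000)
    print(f"{remover.__name__}: {seconds:.3f}s")
```

Absolute timings will vary by machine and Python version, so only the relative ordering is worth reading from the output.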


Answer 10


I think the nicest way, if you’re on Python 3, is to use the fact that a dict’s keys can be treated as a set:

def remove_keys(d, keys):
    to_remove = set(keys)
    filtered_keys = d.keys() - to_remove
    filtered_values = map(d.get, filtered_keys)
    return dict(zip(filtered_keys, filtered_values))

Example:

>>> remove_keys({'k1': 1, 'k3': 3}, ['k1', 'k2'])
{'k3': 3}
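
One caveat with the set-difference approach: the result of `d.keys() - to_remove` is a set, so the insertion order of the original dict is not guaranteed to carry over. If order matters, a comprehension over `d.items()` is one alternative. A minimal sketch, where the function name `remove_keys_ordered` is made up for illustration:

```python
def remove_keys_ordered(d, keys):
    """Return a new dict without the given keys, keeping d's insertion order."""
    to_remove = set(keys)  # set membership tests are O(1) on average
    return {k: v for k, v in d.items() if k not in to_remove}


original = {"k1": 1, "k2": 2, "k3": 3, "k4": 4}
result = remove_keys_ordered(original, ["k2", "missing"])
print(result)  # {'k1': 1, 'k3': 3, 'k4': 4}
```

This relies on dicts preserving insertion order, which is guaranteed from Python 3.7 onward.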

Answer 11


It would be nice to have full support for set methods for dictionaries (and not the unholy mess we’re getting with Python 3.9) so that you could simply “remove” a set of keys. However, as long as that’s not the case, and you have a large dictionary with potentially a large number of keys to remove, you might want to know about the performance. So I’ve written some code that creates something large enough for meaningful comparisons: a 100,000 x 1000 matrix, so roughly 100,000,000 items in total.

from itertools import product
from time import perf_counter

# make a complete worksheet 100000 * 1000
start = perf_counter()
prod = product(range(1, 100000), range(1, 1000))
cells = {(x,y):x for x,y in prod}
print(len(cells))

print(f"Create time {perf_counter()-start:.2f}s")
clock = perf_counter()
# remove everything above row 50,000

keys = product(range(50000, 100000), range(1, 100))

# for x,y in keys:
#     del cells[x, y]

for n in map(cells.pop, keys):
    pass

print(len(cells))
stop = perf_counter()
print(f"Removal time {stop-clock:.2f}s")

Ten million items or more is not unusual in some settings. Comparing the two methods on my local machine, I see a slight improvement when using map and pop, presumably because of fewer function calls, but both take around 2.5s. This pales in comparison to the time required to create the dictionary in the first place (55s), or to the cost of including existence checks within the loop. If some of the keys may be missing, it’s best to first create a set that is the intersection of the dictionary keys and your filter:

keys = cells.keys() & keys

In summary: del is already heavily optimised, so don’t worry about using it.
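
The intersection idea can be tried on a much smaller grid. The sketch below uses hypothetical sizes (20 rows by 10 columns) and illustrative names (`candidates`, `present`); it is not the benchmark above, only a demonstration that a keys view can be intersected with any iterable before deleting.

```python
from itertools import product

# Hypothetical, much smaller grid: 20 rows x 10 columns.
cells = {(x, y): x for x, y in product(range(1, 21), range(1, 11))}

# Candidate keys; rows 21-29 do not exist in cells.
candidates = product(range(15, 30), range(1, 11))

# A keys view supports set operations with any iterable, so this keeps
# only the candidates that are really present in the dict.
present = cells.keys() & candidates

for key in present:
    del cells[key]  # safe: every key in present exists

print(len(cells))  # 140
```

Because `present` is a separate set, deleting from `cells` while looping over it does not run into the "dictionary changed size during iteration" error.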


Answer 12


I’m late to this discussion, but for anyone else who finds it: one solution is to create a list of the keys to remove.

k = ['a','b','c','d']

Then use pop() in a list comprehension, or a for loop, to iterate over the keys and pop them one at a time.

removed_values = [dictionary.pop(x, 'n/a') for x in k]

The 'n/a' is the default value returned when a key does not exist; without it, pop() would raise a KeyError. Note that the list comprehension returns the popped values, not a new dictionary; the keys are removed from dictionary in place.
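
If the removed entries are needed afterwards, one variation is to collect them as key/value pairs rather than as a bare list of values. A small sketch with made-up sample data (the name `removed` is illustrative):

```python
# Hypothetical sample data for illustration.
dictionary = {"a": 1, "b": 2, "e": 5}
k = ["a", "b", "c", "d"]

# Pop only the keys that exist, remembering what was removed.
removed = {key: dictionary.pop(key) for key in k if key in dictionary}

print(removed)     # {'a': 1, 'b': 2}
print(dictionary)  # {'e': 5}
```

This avoids placeholder defaults such as 'n/a' in the result, at the cost of one extra membership test per key.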

