Tag archive: Python

Remove the first x characters from a string?

Question: Remove the first x characters from a string?

How might one remove the first x characters from a string? For example, given the string lipsum, how would one remove the first 3 characters and get sum as the result?


Answer 0

>>> text = 'lipsum'
>>> text[3:]
'sum'

See the official documentation on strings for more information, and this SO answer for a concise summary of slice notation.
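A minimal sketch of the slicing behaviour this answer relies on. The helper name drop_first is made up for illustration and is not part of the original answer. It shows that slicing past the end of a string returns an empty string instead of raising.

```python
def drop_first(text: str, n: int) -> str:
    """Return text without its first n characters (n must be >= 0)."""
    if n < 0:
        # A negative start index would count from the end, which is
        # not what "drop the first n characters" means.
        raise ValueError("n must be non-negative")
    return text[n:]


print(drop_first("lipsum", 3))         # sum
print(drop_first("lipsum", 0))         # lipsum
print(repr(drop_first("lipsum", 10)))  # ''
```

Strings are immutable, so the slice builds a new string and leaves the original untouched.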


Answer 1

Another way (depending on your actual needs): if you want to pop the first n characters and keep both the popped characters and the remaining string:

s = 'lipsum'
n = 3
a, s = s[:n], s[n:]
print(a)
# lip
print(s)
# sum

Answer 2

>>> x = 'lipsum'
>>> x.replace(x[:3], '', 1)
'sum'

Answer 3

Use del.

Example:

>>> text = 'lipsum'
>>> l = list(text)
>>> del l[:3]
>>> ''.join(l)
'sum'

Answer 4

An example that shows the last 3 digits of an account number:

x = '1234567890'
x.replace(x[:7], '', 1)

Output: '890'
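As a hedged alternative to the replace-based approach, the last digits can be taken with a negative slice. The variable names and the masking step below are assumptions added for illustration. The sketch also shows why str.replace can surprise here: without a count argument it removes every occurrence of the prefix, not only the leading one.

```python
account = "1234567890"

# A negative start index counts from the end of the string.
last_three = account[-3:]
masked = "*" * (len(account) - 3) + last_three

print(last_three)  # 890
print(masked)      # *******890

# Pitfall: replace() without a count removes ALL occurrences of the prefix.
repeated = "123123123456"
print(repeated.replace(repeated[:3], ""))  # 456
print(repeated[3:])                        # 123123456
```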

“Too many values to unpack” exception

Question: “Too many values to unpack” exception

I’m working on a project in Django and I’ve just started trying to extend the User model in order to make user profiles.

Unfortunately, I’ve run into a problem: Every time I try to get the user’s profile inside of a template (user.get_template.lastIP, for example), I get the following error:

Environment:

Request Method: GET
Request URL: http://localhost:8000/
Django Version: 1.1
Python Version: 2.6.1

Template error:
In template /path/to/base.tpl, error at line 19
   Caught an exception while rendering: too many values to unpack

19 :                Hello, {{user.username}} ({{ user.get_profile.rep}}). How's it goin? Logout


Exception Type: TemplateSyntaxError at /
Exception Value: Caught an exception while rendering: too many values to unpack

Any ideas as to what’s going on or what I’m doing wrong?


Answer 0

That exception means that you are trying to unpack a tuple, but the tuple has too many values for the number of target variables. For example, this works, and prints 1, then 2, then 3:

def returnATupleWithThreeValues():
    return (1,2,3)
a,b,c = returnATupleWithThreeValues()
print a
print b
print c

But this raises your error:

def returnATupleWithThreeValues():
    return (1,2,3)
a,b = returnATupleWithThreeValues()
print a
print b

It raises:

Traceback (most recent call last):
  File "c.py", line 3, in ?
    a,b = returnATupleWithThreeValues()
ValueError: too many values to unpack

Now, why this happens in your case I don’t know, but maybe this answer will point you in the right direction.
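For readers on Python 3, here is a small sketch of the same failure. The function name three_values is illustrative only. The exact wording after "too many values to unpack" varies between Python versions, so the sketch only checks the start of the message. It also shows starred unpacking, which collects the surplus values instead of raising.

```python
def three_values():
    return (1, 2, 3)


# Matching counts: works.
a, b, c = three_values()

# Two targets, three values: raises ValueError.
try:
    a, b = three_values()
except ValueError as exc:
    message = str(exc)
    print(message)  # begins with "too many values to unpack"

# Starred unpacking gathers the remaining values into a list.
first, *rest = three_values()
print(first, rest)  # 1 [2, 3]
```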


Answer 1

Try assigning the result to a single variable. Python keeps it as a tuple, and you can then pick the values out by index:

def returnATupleWithThreeValues():
    return (1,2,3)
a = returnATupleWithThreeValues() # a is the tuple (1, 2, 3)
print a[0] # a[0] = 1
print a[1] # a[1] = 2
print a[2] # a[2] = 3

Answer 2

This problem looked familiar so I thought I’d see if I could replicate from the limited amount of information.

A quick search turned up an entry in James Bennett’s blog which mentions that, when using a UserProfile to extend the User model, a common mistake in settings.py can cause Django to throw this error.

To quote the blog entry:

The value of the setting is not “appname.models.modelname”, it’s just “appname.modelname”. The reason is that Django is not using this to do a direct import; instead, it’s using an internal model-loading function which only wants the name of the app and the name of the model. Trying to do things like “appname.models.modelname” or “projectname.appname.models.modelname” in the AUTH_PROFILE_MODULE setting will cause Django to blow up with the dreaded “too many values to unpack” error, so make sure you’ve put “appname.modelname”, and nothing else, in the value of AUTH_PROFILE_MODULE.

If the OP had copied more of the traceback I would expect to see something like the one below which I was able to duplicate by adding “models” to my AUTH_PROFILE_MODULE setting.

TemplateSyntaxError at /

Caught an exception while rendering: too many values to unpack

Original Traceback (most recent call last):
  File "/home/brandon/Development/DJANGO_VERSIONS/Django-1.0/django/template/debug.py", line 71, in render_node
    result = node.render(context)
  File "/home/brandon/Development/DJANGO_VERSIONS/Django-1.0/django/template/debug.py", line 87, in render
    output = force_unicode(self.filter_expression.resolve(context))
  File "/home/brandon/Development/DJANGO_VERSIONS/Django-1.0/django/template/__init__.py", line 535, in resolve
    obj = self.var.resolve(context)
  File "/home/brandon/Development/DJANGO_VERSIONS/Django-1.0/django/template/__init__.py", line 676, in resolve
    value = self._resolve_lookup(context)
  File "/home/brandon/Development/DJANGO_VERSIONS/Django-1.0/django/template/__init__.py", line 711, in _resolve_lookup
    current = current()
  File "/home/brandon/Development/DJANGO_VERSIONS/Django-1.0/django/contrib/auth/models.py", line 291, in get_profile
    app_label, model_name = settings.AUTH_PROFILE_MODULE.split('.')
ValueError: too many values to unpack

I think this is one of the few cases where Django still has a bit of import magic that tends to cause confusion when a small error doesn’t throw the expected exception.

You can see at the end of the traceback I posted how using anything other than the form “appname.modelname” for AUTH_PROFILE_MODULE causes the line “app_label, model_name = settings.AUTH_PROFILE_MODULE.split('.')” to throw the “too many values to unpack” error.

I’m 99% sure that this was the original problem encountered here.
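The failing line from the traceback can be reproduced without Django. In this sketch the values "accounts.UserProfile" and "accounts.models.UserProfile" are hypothetical settings, and parse_profile_module is an illustrative helper rather than anything Django provides. It shows one way a clearer error could be raised for a malformed value.

```python
def parse_profile_module(value):
    """Split an 'appname.modelname' string, with a clearer error message."""
    parts = value.split(".")
    if len(parts) != 2:
        raise ValueError(
            "expected 'appname.modelname', got %r" % value
        )
    app_label, model_name = parts
    return app_label, model_name


# Correct form: exactly one dot.
good = parse_profile_module("accounts.UserProfile")
print(good)  # ('accounts', 'UserProfile')

# The bare two-target unpacking, as in the traceback, fails on three parts.
try:
    app_label, model_name = "accounts.models.UserProfile".split(".")
except ValueError as exc:
    unpack_error = str(exc)
    print(unpack_error)  # begins with "too many values to unpack"
```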


Answer 3

Most likely there is an error somewhere in the get_profile() call. In your view, before you return the response, put this line:

request.user.get_profile()

It should raise the error and give you a more detailed traceback, which you can then use for further debugging.


Answer 4

This happens to me when I’m using Jinja2 for templates. The problem can be solved by running the development server using the runserver_plus command from django_extensions.

It uses the werkzeug debugger, which is also a lot better and has a very nice interactive debugging console. It does some ajax magic to launch a python shell at any frame in the call stack so you can debug.


Python function as a function argument?

Question: Python function as a function argument?

Can a Python function be an argument of another function?

Say:

def myfunc(anotherfunc, extraArgs):
    # run anotherfunc and also pass the values from extraArgs to it
    pass

So this is basically two questions:

  1. Is it allowed at all?
  2. And if it is, how do I use the function inside the other function? Would I need to use exec(), eval() or something like that? Never needed to mess with them.

BTW, extraArgs is a list/tuple of anotherfunc’s arguments.


Answer 0

Can a Python function be an argument of another function?

Yes.

def myfunc(anotherfunc, extraArgs):
    anotherfunc(*extraArgs)

To be more specific, with various arguments:

>>> def x(a, b):
...     print("param 1 %s param 2 %s" % (a, b))
...
>>> def y(z, t):
...     z(*t)
...
>>> y(x, ("hello", "manuel"))
param 1 hello param 2 manuel
>>>

Answer 1

Here’s another way, using *args (and, optionally, **kwargs):

def a(x, y):
  print(x, y)

def b(other, function, *args, **kwargs):
  function(*args, **kwargs)
  print(other)

b('world', a, 'hello', 'dude')

Output

hello dude
world

Note that function, *args, **kwargs have to be in that order and have to be the last parameters of the function that calls the function.


Answer 2

Functions in Python are first-class objects. But your function definition is a bit off.

def myfunc(anotherfunc, extraArgs, extraKwArgs):
  return anotherfunc(*extraArgs, **extraKwArgs)
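A short usage sketch for this pattern. The names call_with and greet are invented for illustration, and the default of an empty argument tuple and no keyword arguments is an assumption, not part of the answer.

```python
def call_with(func, args=(), kwargs=None):
    """Call func with the given positional and keyword arguments."""
    return func(*args, **(kwargs or {}))


def greet(greeting, name, punctuation="!"):
    return f"{greeting}, {name}{punctuation}"


# Positional arguments only.
print(call_with(greet, ("Hello", "Manuel")))  # Hello, Manuel!

# Mixing positional and keyword arguments.
print(call_with(greet, ("Hello",), {"name": "Manuel", "punctuation": "?"}))  # Hello, Manuel?
```

No exec() or eval() is involved: the function object is passed like any other value and called with ordinary call syntax.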

Answer 3

Sure, that is why Python provides the following built-ins whose first parameter is a function:

  • map(function, iterable, …) – Apply function to every item of iterable and return the results (a list in Python 2, a lazy iterator in Python 3).
  • filter(function, iterable) – Keep those elements of iterable for which function returns true (a list in Python 2, a lazy iterator in Python 3).
  • reduce(function, iterable[, initializer]) – Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value (available as functools.reduce in Python 3).
  • lambdas
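A brief Python 3 sketch of the three functions listed above, each taking a lambda as its first argument. The sample list is arbitrary. In Python 3, map and filter return iterators, so they are wrapped in list() here, and reduce is imported from functools.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# map: apply a function to every item.
squares = list(map(lambda n: n * n, numbers))

# filter: keep the items for which the function returns a true value.
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce: fold the items into a single value, starting from 0.
total = reduce(lambda acc, n: acc + n, numbers, 0)

print(squares)  # [1, 4, 9, 16, 25]
print(evens)    # [2, 4]
print(total)    # 15
```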

Answer 4

  1. Yes, it’s allowed.
  2. You use the function as you would any other: anotherfunc(*extraArgs)

Answer 5

  1. Yes. By including the function call in your input argument(s), you can call two (or more) functions at once.

For example:

def anotherfunc(inputarg1, inputarg2):
    pass
def myfunc(func = anotherfunc):
    print(func)

When you call myfunc, you do this:

myfunc(anotherfunc(inputarg1, inputarg2))

This will print the return value of anotherfunc. Note that it is the result of the call, not the function object itself, that gets passed to myfunc.

Hope this helps!


Answer 6

Function inside a function: we can use a function as a parameter too.

In other words, the output of a function can itself be a reference to a function object. See below how the inner function returned by the outer function still refers to the outer function’s argument:

def out_func(a):

  def in_func(b):
       return a + b + b + 3
  return in_func

obj = out_func(1)
print(obj(5))

The result will be 14.

Hope this helps.


Answer 7

def x(a):
    print(a)
    return a

def y(a):
    return a

y(x(1))

Answer 8

def x(a):
    print(a)
    return a

def y(func_to_run, a):
    return func_to_run(a)

y(x, 1)

I think that would be a more proper sample. What I wonder now is whether there is a way to write the function inline, within the argument passed to another function. I believe there is in C++, but in Python I am not sure.
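One reading of the open question above is "can the function be written inline at the call site?". Under that assumption, Python's lambda expressions cover the single-expression case, and a nested def covers anything longer. The names make_adder and doubled are illustrative only.

```python
def y(func_to_run, a):
    return func_to_run(a)


# Inline anonymous function, defined right in the argument list.
doubled = y(lambda value: value * 2, 21)
print(doubled)  # 42


# For logic that needs more than one expression, build a function first.
def make_adder(n):
    def add(value):
        return value + n
    return add


print(y(make_adder(10), 1))  # 11
```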


Answer 9

Decorators are very powerful in Python because they let programmers pass a function as an argument and define a function inside another function.

def decorator(func):
    def insideFunction():
        print("This is inside function before execution")
        func()
    return insideFunction

def func():
    print("I am argument function")

func_obj = decorator(func)
func_obj()

Output

  • This is inside function before execution
  • I am argument function

How to convert SQL Query result to PANDAS Data Structure?

Question: How to convert SQL Query result to PANDAS Data Structure?

Any help on this problem will be greatly appreciated.

So basically I want to run a query to my SQL database and store the returned data as Pandas data structure.

I have attached code for query.

I am reading the documentation on Pandas, but I have problem to identify the return type of my query.

I tried to print the query result, but it doesn’t give any useful information.

Thanks!!!!

from sqlalchemy import create_engine

engine2 = create_engine('mysql://THE DATABASE I AM ACCESSING')
connection2 = engine2.connect()
dataid = 1022
resoverall = connection2.execute("""
  SELECT
      sum(BLABLA) AS BLA,
      sum(BLABLABLA2) AS BLABLABLA2,
      sum(SOME_INT) AS SOME_INT,
      sum(SOME_INT2) AS SOME_INT2,
      100*sum(SOME_INT2)/sum(SOME_INT) AS ctr,
      sum(SOME_INT2)/sum(SOME_INT) AS cpc
   FROM daily_report_cooked
   WHERE campaign_id = '%s'""" % dataid)

So I sort of want to understand what’s the format/datatype of my variable “resoverall” and how to put it with PANDAS data structure.
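A side note on the query itself: building SQL with % string formatting is fragile and open to injection, and most drivers accept bound parameters instead. The sketch below uses the standard-library sqlite3 module with an in-memory database as a stand-in for the MySQL server in the question. The table contents are invented, and the placeholder style is driver-specific (? for sqlite3, %s for MySQLdb).

```python
import sqlite3

# In-memory stand-in for the real database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_report_cooked "
    "(campaign_id INTEGER, some_int INTEGER, some_int2 INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_report_cooked VALUES (?, ?, ?)",
    [(1022, 10, 2), (1022, 30, 6), (7, 5, 1)],
)

dataid = 1022
# The value is passed separately; the driver handles quoting.
cur = conn.execute(
    "SELECT SUM(some_int) AS some_int, SUM(some_int2) AS some_int2 "
    "FROM daily_report_cooked WHERE campaign_id = ?",
    (dataid,),
)
row = cur.fetchone()
print(row)  # (40, 8)
conn.close()
```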


Answer 0

Here’s the shortest code that will do the job:

from pandas import DataFrame
df = DataFrame(resoverall.fetchall())
df.columns = resoverall.keys()

You can go fancier and parse the types, as in Paul’s answer.
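The rows-plus-column-names idea also works with a plain DB-API cursor. This is a sketch under stated assumptions: sqlite3 with an in-memory table stands in for the real database, the table t and its contents are invented, and the pandas step is guarded so the rest runs even where pandas is not installed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", 2)])

cur = conn.execute("SELECT name, clicks FROM t ORDER BY name")
# cursor.description holds one 7-item entry per column; item 0 is the name.
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
conn.close()

# One dict per row, handy for inspection and accepted by pandas too.
records = [dict(zip(columns, r)) for r in rows]
print(records)

try:
    import pandas as pd
except ImportError:
    pd = None

if pd is not None:
    df = pd.DataFrame(rows, columns=columns)
    print(df)
```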


Answer 1

Edit: Mar. 2015

As noted below, pandas now uses SQLAlchemy to both read from (read_sql) and insert into (to_sql) a database. The following should work:

import pandas as pd

df = pd.read_sql(sql, cnxn)

Previous answer, via mikebmassey from a similar question:

import pyodbc
import pandas.io.sql as psql

cnxn = pyodbc.connect(connection_info)
cursor = cnxn.cursor()
sql = "SELECT * FROM TABLE"

df = psql.frame_query(sql, cnxn)
cnxn.close()

Answer 2

If you are using SQLAlchemy’s ORM rather than the expression language, you might find yourself wanting to convert an object of type sqlalchemy.orm.query.Query to a pandas data frame.

The cleanest approach is to get the generated SQL from the query’s statement attribute, and then execute it with pandas’s read_sql() method. For example, starting with a Query object called query:

df = pd.read_sql(query.statement, query.session.bind)

Answer 3

Edit 2014-09-30:

pandas now has a read_sql function. You definitely want to use that instead.

Original answer:

I can’t help you with SQLAlchemy; I always use pyodbc, MySQLdb, or psycopg2 as needed. But when doing so, a function as simple as the one below tends to suit my needs:

import datetime
import decimal

import pyodbc
import numpy as np
import pandas

# Usage, once __processCursor (defined below) exists:
# cnn, cur = myConnectToDBfunction()
# cmd = "SELECT * FROM myTable"
# cur.execute(cmd)
# dataframe = __processCursor(cur, dataframe=True)

def __processCursor(cur, dataframe=False, index=None):
    '''
    Processes a database cursor with data on it into either
    a structured numpy array or a pandas dataframe.

    input:
    cur - a pyodbc cursor that has just received data
    dataframe - bool. if false, a numpy record array is returned
                if true, return a pandas dataframe
    index - list of column(s) to use as index in a pandas dataframe
    '''
    datatypes = []
    colinfo = cur.description
    for col in colinfo:
        if col[1] == unicode:
            datatypes.append((col[0], 'U%d' % col[3]))
        elif col[1] == str:
            datatypes.append((col[0], 'S%d' % col[3]))
        elif col[1] in [float, decimal.Decimal]:
            datatypes.append((col[0], 'f4'))
        elif col[1] == datetime.datetime:
            datatypes.append((col[0], 'O4'))
        elif col[1] == int:
            datatypes.append((col[0], 'i4'))

    data = []
    for row in cur:
        data.append(tuple(row))

    array = np.array(data, dtype=datatypes)
    if dataframe:
        output = pandas.DataFrame.from_records(array)

        if index is not None:
            output = output.set_index(index)

    else:
        output = array

    return output

Answer 4

MySQL Connector

For those who work with the MySQL connector, this code can serve as a starting point. (Thanks to @Daniel Velkov.)

import pandas as pd
import mysql.connector

# Setup MySQL connection
db = mysql.connector.connect(
    host="<IP>",              # your host, usually localhost
    user="<USER>",            # your username
    password="<PASS>",        # your password
    database="<DATABASE>"     # name of the data base
)   

# You must create a Cursor object. It will let you execute all the queries you need
cur = db.cursor()

# Use all the SQL you like
cur.execute("SELECT * FROM <TABLE>")

# Put it all to a data frame
sql_data = pd.DataFrame(cur.fetchall())
sql_data.columns = cur.column_names

# Close the session
db.close()

# Show the data
print(sql_data.head())

Answer 5

Here’s the code I use. Hope this helps.

import pandas as pd
from sqlalchemy import create_engine

def getData():
  # Parameters
  ServerName = "my_server"
  Database = "my_db"
  UserPwd = "user:pwd"
  Driver = "driver=SQL Server Native Client 11.0"

  # Create the connection
  engine = create_engine('mssql+pyodbc://' + UserPwd + '@' + ServerName + '/' + Database + "?" + Driver)

  sql = "select * from mytable"
  df = pd.read_sql(sql, engine)
  return df

df2 = getData()
print(df2)

Answer 6

This is a short and crisp answer to your problem:

from __future__ import print_function
import MySQLdb
import numpy as np
import pandas as pd
import xlrd

# Connecting to MySQL Database
connection = MySQLdb.connect(
             host="hostname",
             port=0000,
             user="userID",
             passwd="password",
             db="table_documents",
             charset='utf8'
           )
print(connection)
#getting data from database into a dataframe
sql_for_df = 'select * from tabledata'
df_from_database = pd.read_sql(sql_for_df , connection)

Answer 7

1. Using MySQL-connector-python

# pip install mysql-connector-python

import mysql.connector
import pandas as pd

mydb = mysql.connector.connect(
    host = 'host',
    user = 'username',
    passwd = 'pass',
    database = 'db_name'
)
query = 'select * from table_name'
df = pd.read_sql(query, con = mydb)
print(df)

2. Using SQLAlchemy

# pip install pymysql
# pip install sqlalchemy

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('mysql+pymysql://username:password@localhost:3306/db_name')

query = '''
select * from table_name
'''
df = pd.read_sql_query(query, engine)
print(df)

Answer 8

Like Nathan, I often want to dump the results of a sqlalchemy or sqlsoup Query into a pandas data frame. My own solution for this is:

query = session.query(tbl.Field1, tbl.Field2)
DataFrame(query.all(), columns=[column['name'] for column in query.column_descriptions])

Answer 9

resoverall is a sqlalchemy ResultProxy object. You can read more about it in the sqlalchemy docs, which explain the basic usage of Engines and Connections. The important point here is that resoverall is dict-like.

pandas likes dict-like objects for creating its data structures; see the online docs.

Good luck with sqlalchemy and pandas.


Answer 10

Simply use pandas and pyodbc together. You’ll have to modify your connection string (connstr) according to your database specifications.

import pyodbc
import pandas as pd

# MSSQL Connection String Example
connstr = "Server=myServerAddress;Database=myDB;User Id=myUsername;Password=myPass;"

# Query Database and Create DataFrame Using Results
df = pd.read_sql("select * from myTable", pyodbc.connect(connstr))

I’ve used pyodbc with several enterprise databases (e.g. SQL Server, MySQL, MariaDB, IBM).


Answer 11

This question is old, but I wanted to add my two cents. I read the question as “I want to run a query to my [my]SQL database and store the returned data as a pandas data structure [DataFrame].”

From the code it looks like you mean a MySQL database, and I assume you mean a pandas DataFrame.

import MySQLdb as mdb
import pandas.io.sql as sql
from pandas import *

conn = mdb.connect('<server>','<user>','<pass>','<db>')
df = sql.read_frame('<query>', conn)

For example,

conn = mdb.connect('localhost','myname','mypass','testdb')
df = sql.read_frame('select * from testTable', conn)

This will import all rows of testTable into a DataFrame.


Answer 12

Here is mine, in case you are using “pymysql”:

import pymysql
from pandas import DataFrame

host   = 'localhost'
port   = 3306
user   = 'yourUserName'
passwd = 'yourPassword'
db     = 'yourDatabase'

cnx    = pymysql.connect(host=host, port=port, user=user, passwd=passwd, db=db)
cur    = cnx.cursor()

query  = """ SELECT * FROM yourTable LIMIT 10"""
cur.execute(query)

field_names = [i[0] for i in cur.description]
get_data = [xx for xx in cur]

cur.close()
cnx.close()

df = DataFrame(get_data)
df.columns = field_names

Answer 13

pandas.io.sql.write_frame is DEPRECATED. https://pandas.pydata.org/pandas-docs/version/0.15.2/generated/pandas.io.sql.write_frame.html

Should change to use pandas.DataFrame.to_sql https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

There is another solution; see "PYODBC to Pandas – DataFrame not working – Shape of passed values is (x,y), indices imply (w,z)".

As of Pandas 0.12 (I believe) you can do:

import pandas
import pyodbc

sql = 'select * from table'
cnn = pyodbc.connect(...)

data = pandas.read_sql(sql, cnn)

Prior to 0.12, you could do:

import pandas
from pandas.io.sql import read_frame
import pyodbc

sql = 'select * from table'
cnn = pyodbc.connect(...)

data = read_frame(sql, cnn)
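
For readers who want to try the read_sql route without a database server, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the real connection; the table name test_table and its rows are made up for illustration. Other DBAPI connections (pyodbc, pymysql) follow the same calling pattern, though recent pandas releases may warn and recommend a SQLAlchemy connectable for them.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    [(1, "alpha"), (2, "beta")],
)
conn.commit()

# read_sql_query runs the SELECT and uses the cursor's column names
# as the DataFrame columns.
df = pd.read_sql_query("SELECT * FROM test_table", conn)
conn.close()

print(df.columns.tolist())
print(len(df))
```

The column names come from the query result itself, so there is no need to read cursor.description by hand.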

Answer 14

It has been a long time since the last post, but maybe this helps someone…

A shorter way than Paul H's:

my_dic = session.query(query.all())
my_df = pandas.DataFrame.from_dict(my_dic)

Answer 15

The way I prefer to do this, where db = db_class() is an instance of my database class:

db.execute(query)
mydata = [x for x in db.fetchall()]
df = pd.DataFrame(data=mydata)

Answer 16

If the result type is ResultSet, you should convert it to dictionary first. Then the DataFrame columns will be collected automatically.

This works in my case:

df = pd.DataFrame([dict(r) for r in resoverall])

Efficient way to remove keys with empty strings from a dict

Question: Efficient way to remove keys with empty strings from a dict

I have a dict and would like to remove all the keys for which there are empty value strings.

metadata = {u'Composite:PreviewImage': u'(Binary data 101973 bytes)',
            u'EXIF:CFAPattern2': u''}

What is the best way to do this?


Answer 0

Python 2.X

dict((k, v) for k, v in metadata.iteritems() if v)

Python 2.7 – 3.X

{k: v for k, v in metadata.items() if v}

Note that all of your keys have values. It’s just that some of those values are the empty string. There’s no such thing as a key in a dict without a value; if it didn’t have a value, it wouldn’t be in the dict.
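
The choice of condition matters here, so a small comparison may help. This is only a sketch: the EXIF:Orientation and EXIF:Comment entries are hypothetical additions to the question's dictionary, included to show how non-string values behave under each test.

```python
metadata = {
    "Composite:PreviewImage": "(Binary data 101973 bytes)",
    "EXIF:CFAPattern2": "",
    "EXIF:Orientation": 0,     # hypothetical falsy, non-string value
    "EXIF:Comment": None,      # hypothetical missing value
}

# Truthiness test: drops '', 0 and None alike.
truthy_only = {k: v for k, v in metadata.items() if v}

# Identity test: drops only None, so the empty string survives.
not_none = {k: v for k, v in metadata.items() if v is not None}

# Explicit test: drops only empty strings.
non_empty = {k: v for k, v in metadata.items() if v != ""}

print(sorted(truthy_only))
print(sorted(not_none))
print(sorted(non_empty))
```

If the values are guaranteed to be strings, the truthiness test is enough; otherwise the explicit comparison is the safer reading of "empty string".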


Answer 1

It can get even shorter than BrenBarn’s solution (and more readable I think)

{k: v for k, v in metadata.items() if v}

Tested with Python 2.7.3.


Answer 2

If you really need to modify the original dictionary:

empty_keys = [k for k,v in metadata.iteritems() if not v]
for k in empty_keys:
    del metadata[k]

Note that we have to make a list of the empty keys because we can’t modify a dictionary while iterating through it (as you may have noticed). This is less expensive (memory-wise) than creating a brand-new dictionary, though, unless there are a lot of entries with empty values.
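
The snippet above is Python 2 (iteritems). A rough Python 3 equivalent, using the dictionary from the question, could look like this:

```python
metadata = {
    "Composite:PreviewImage": "(Binary data 101973 bytes)",
    "EXIF:CFAPattern2": "",
}

# Collect the keys first; deleting while iterating over the live view
# raises "RuntimeError: dictionary changed size during iteration".
empty_keys = [k for k, v in metadata.items() if not v]
for k in empty_keys:
    del metadata[k]

print(metadata)
```

Because metadata is modified in place, every other reference to the same dictionary sees the change as well.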


Answer 3

BrenBarn’s solution is ideal (and pythonic, I might add). Here is another (fp) solution, however:

from operator import itemgetter
dict(filter(itemgetter(1), metadata.items()))

Answer 4

If you want a full-featured, yet succinct approach to handling real-world data structures which are often nested, and can even contain cycles, I recommend looking at the remap utility from the boltons utility package.

After pip install boltons or copying iterutils.py into your project, just do:

from boltons.iterutils import remap

drop_falsey = lambda path, key, value: bool(value)
clean = remap(metadata, visit=drop_falsey)

This page has many more examples, including ones working with much larger objects from Github’s API.

It’s pure-Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly cases like this, so if you find a case it doesn’t handle, you can bug me to fix it right here.


Answer 5

Based on Ryan’s solution, if you also have lists and nested dictionaries:

For Python 2:

def remove_empty_from_dict(d):
    if type(d) is dict:
        return dict((k, remove_empty_from_dict(v)) for k, v in d.iteritems() if v and remove_empty_from_dict(v))
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if v and remove_empty_from_dict(v)]
    else:
        return d

For Python 3:

def remove_empty_from_dict(d):
    if type(d) is dict:
        return dict((k, remove_empty_from_dict(v)) for k, v in d.items() if v and remove_empty_from_dict(v))
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if v and remove_empty_from_dict(v)]
    else:
        return d

Answer 6

If you have a nested dictionary, and you want this to work even for empty sub-elements, you can use a recursive variant of BrenBarn’s suggestion:

def scrub_dict(d):
    if type(d) is dict:
        return dict((k, scrub_dict(v)) for k, v in d.iteritems() if v and scrub_dict(v))
    else:
        return d

Answer 7

Quick Answer (TL;DR)

Example01

### example01 -------------------

mydict  =   { "alpha":0,
              "bravo":"0",
              "charlie":"three",
              "delta":[],
              "echo":False,
              "foxy":"False",
              "golf":"",
              "hotel":"   ",                        
            }
newdict =   dict([(vkey, vdata) for vkey, vdata in mydict.iteritems() if(vdata) ])
print newdict

### result01 -------------------
result01 ='''
{'foxy': 'False', 'charlie': 'three', 'bravo': '0'}
'''

Detailed Answer

Problem

  • Context: Python 2.x
  • Scenario: Developer wishes to modify a dictionary to exclude blank values
    • aka remove empty values from a dictionary
    • aka delete keys with blank values
    • aka filter dictionary for non-blank values over each key-value pair

Solution

  • example01 uses Python list-comprehension syntax with a simple conditional to remove “empty” values

Pitfalls

  • example01 only operates on a copy of the original dictionary (does not modify in place)
  • example01 may produce unexpected results depending on what developer means by “empty”
    • Does developer mean to keep values that are falsy?
    • If the values in the dictionary are not guaranteed to be strings, the developer may experience unexpected data loss.
    • result01 shows that only three key-value pairs were preserved from the original set

Alternate example

  • example02 helps deal with potential pitfalls
  • The approach is to use a more precise definition of “empty” by changing the conditional.
  • Here we only want to filter out values that evaluate to blank strings.
  • Here we also use .strip() to filter out values that consist of only whitespace.

Example02

### example02 -------------------

mydict  =   { "alpha":0,
              "bravo":"0",
              "charlie":"three",
              "delta":[],
              "echo":False,
              "foxy":"False",
              "golf":"",
              "hotel":"   ",
            }
newdict =   dict([(vkey, vdata) for vkey, vdata in mydict.iteritems() if(str(vdata).strip()) ])
print newdict

### result02 -------------------
result02 ='''
{'alpha': 0,
  'bravo': '0', 
  'charlie': 'three', 
  'delta': [],
  'echo': False,
  'foxy': 'False'
  }
'''
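
The examples above target Python 2.x. As a sketch of the same idea in Python 3, the variant below treats only strings as candidates for "blank", so non-string values are never converted with str(). The helper name is_blank is made up for this illustration.

```python
def is_blank(value):
    """True only for strings that are empty or contain nothing but whitespace."""
    return isinstance(value, str) and not value.strip()


mydict = {
    "alpha": 0,
    "bravo": "0",
    "charlie": "three",
    "delta": [],
    "echo": False,
    "foxy": "False",
    "golf": "",
    "hotel": "   ",
}

# Non-string values are never passed through str(), so 0, [] and False survive.
newdict = {key: value for key, value in mydict.items() if not is_blank(value)}
print(newdict)
```

This keeps the same six entries as result02 while making the definition of "empty" explicit in one place.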



Answer 8

For python 3

dict((k, v) for k, v in metadata.items() if v)

Answer 9

Building on the answers from patriciasz and nneonneo, and accounting for the possibility that you might want to delete keys that have only certain falsy things (e.g. '') but not others (e.g. 0), or perhaps you even want to include some truthy things (e.g. 'SPAM'), then you could make a highly specific hitlist:

unwanted = ['', u'', None, False, [], 'SPAM']

Unfortunately, this doesn’t quite work, because for example 0 in unwanted evaluates to True. We need to discriminate between 0 and other falsy things, so we have to use is:

any([0 is i for i in unwanted])

…evaluates to False.

Now use it to del the unwanted things:

unwanted_keys = [k for k, v in metadata.items() if any([v is i for i in unwanted])]
for k in unwanted_keys: del metadata[k]

If you want a new dictionary, instead of modifying metadata in place:

newdict = {k: v for k, v in metadata.items() if not any([v is i for i in unwanted])}
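
One caveat with the identity-based test: whether v is '' holds depends on string interning, a freshly created [] is never the same object as the [] stored in the hitlist, and recent CPython versions emit a SyntaxWarning for is comparisons against literals. A possible alternative, offered only as a sketch, is to compare type and value together. The helper name is_unwanted and the sample metadata are assumptions for this example.

```python
unwanted = ["", None, False, [], "SPAM"]


def is_unwanted(value):
    # Match on both type and equality so that 0 does not match False
    # and a freshly built [] still matches the [] in the hitlist.
    return any(type(value) is type(u) and value == u for u in unwanted)


metadata = {
    "a": "",
    "b": 0,
    "c": [],
    "d": "SPAM",
    "e": "keep",
    "f": None,
    "g": False,
}

newdict = {k: v for k, v in metadata.items() if not is_unwanted(v)}
print(newdict)
```

Here 0 is kept because its type is int while False is a bool, even though 0 == False evaluates to True.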

Answer 10

I read all the replies in this thread, and some also referred to this thread: Remove empty dicts in nested dictionary with recursive function

I originally used the solution here and it worked great:

Attempt 1: Too Hot (not performant or future-proof):

def scrub_dict(d):
    if type(d) is dict:
        return dict((k, scrub_dict(v)) for k, v in d.iteritems() if v and scrub_dict(v))
    else:
        return d

But some performance and compatibility concerns were raised in Python 2.7 world:

  1. use isinstance instead of type
  2. unroll the list comp into for loop for efficiency
  3. use python3 safe items instead of iteritems

Attempt 2: Too Cold (Lacks Memoization):

def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v,dict):
            v = scrub_dict(v)
        if not v in (u'', None, {}):
            new_dict[k] = v
    return new_dict

DOH! This is not recursive and not at all memoizant.

Attempt 3: Just Right (so far):

def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v,dict):
            v = scrub_dict(v)
        if not v in (u'', None, {}):
            new_dict[k] = v
    return new_dict

Answer 11

Dicts mixed with Arrays

  • Attempt 3: Just Right (so far) from BlissRage’s answer does not properly handle array elements. I’m including a patch in case anyone needs it. The if isinstance(v, list): statement block handles lists by calling scrub_list, which scrubs each dict element using the original scrub_dict(d) implementation.
    # Defined as plain module-level functions so the mutual calls resolve;
    # inside a class they would need to be qualified with the class name.
    def scrub_dict(d):
        new_dict = {}
        for k, v in d.items():
            if isinstance(v, dict):
                v = scrub_dict(v)
            if isinstance(v, list):
                v = scrub_list(v)
            if v not in (u'', None, {}):
                new_dict[k] = v
        return new_dict

    def scrub_list(d):
        scrubbed_list = []
        for i in d:
            if isinstance(i, dict):
                i = scrub_dict(i)
            scrubbed_list.append(i)
        return scrubbed_list

Answer 12

An alternative way to do this is to use a dictionary comprehension. This should be compatible with Python 2.7+:

result = {
    key: value for key, value in
    {"foo": "bar", "lorem": None}.items()
    if value
}

Answer 13

Here is an option if you are using pandas:

import pandas as pd

d = dict.fromkeys(['a', 'b', 'c', 'd'])
d['b'] = 'not null'
d['c'] = ''  # empty string

print(d)

# convert `dict` to `Series` and replace any blank strings with `None`;
# use the `.dropna()` method and
# then convert back to a `dict`
d_ = pd.Series(d).replace('', None).dropna().to_dict()

print(d_)

Answer 14

Some of the methods mentioned above also drop integers and floats with the values 0 and 0.0.

If you want to avoid that, you can use the code below (it removes empty strings and None values from nested dictionaries and nested lists):

def remove_empty_from_dict(d):
    if type(d) is dict:
        _temp = {}
        for k,v in d.items():
            if v == None or v == "":
                pass
            elif type(v) is int or type(v) is float:
                _temp[k] = remove_empty_from_dict(v)
            elif (v or remove_empty_from_dict(v)):
                _temp[k] = remove_empty_from_dict(v)
        return _temp
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if( (str(v).strip() or str(remove_empty_from_dict(v)).strip()) and (v != None or remove_empty_from_dict(v) != None))]
    else:
        return d

Answer 15

As I am currently writing a desktop application in Python for my work, I found that data-entry applications often have many entries, some of which are not mandatory, so the user can leave them blank. For validation purposes, it is easy to grab all the entries and then discard the empty keys or values of the dictionary. The code below shows how to take them out easily using a dictionary comprehension, keeping only the dictionary elements whose value is not blank. I use Python 3.8.3.

data = {'':'', '20':'', '50':'', '100':'1.1', '200':'1.2'}

dic = {key:value for key,value in data.items() if value != ''}

print(dic)

{'100': '1.1', '200': '1.2'}

Answer 16

Some benchmarking:

1. Dict comprehension to recreate the dict

In [7]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
   ...: dic = {k: v for k, v in dic.items() if v is not None} 
   1000000 loops, best of 7: 375 ns per loop

2. Generator expression with dict() to recreate the dict

In [8]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
   ...: dic = dict((k, v) for k, v in dic.items() if v is not None)
1000000 loops, best of 7: 681 ns per loop

3. Loop and delete key if v is None

In [10]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
    ...: for k, v in dic.items():
    ...:   if v is None:
    ...:     del dic[k]
    ...: 
10000000 loops, best of 7: 160 ns per loop

So loop-and-delete is the fastest at 160 ns, the dict comprehension takes roughly twice as long at ~375 ns, and the version with a call to dict() takes roughly twice as long again at ~680 ns.

Wrapping 3 into a function brings it back down again to about 275ns. Also for me PyPy was about twice as fast as neet python.
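
The timings above were taken under Python 2 (xrange). To repeat the comparison under Python 3, something like the sketch below could be used. The function names are made up for this example, and no timings are quoted because they depend on the machine and interpreter. One possible caveat with the original measurements, stated here as an assumption about how the setup line is executed: if it runs once per batch rather than once per loop, every variant works on an already-cleaned dictionary after the first pass. The sketch therefore rebuilds the dictionary on every call.

```python
import timeit


def make_dict():
    # Same construction as in the benchmarks above.
    dic = {str(i): i for i in range(10)}
    dic["10"] = None
    dic["5"] = None
    return dic


def comprehension(dic):
    return {k: v for k, v in dic.items() if v is not None}


def dict_call(dic):
    return dict((k, v) for k, v in dic.items() if v is not None)


def loop_delete(dic):
    # list() takes a snapshot so that deleting during the loop is safe in Python 3.
    for k, v in list(dic.items()):
        if v is None:
            del dic[k]
    return dic


for func in (comprehension, dict_call, loop_delete):
    # A fresh dictionary is built for every call so that loop_delete always
    # has None values left to remove; the construction cost is the same
    # for every variant.
    seconds = min(timeit.repeat(lambda: func(make_dict()), number=2000, repeat=3))
    print(func.__name__, seconds / 2000)
```

Since building the dictionary is included in each measurement, only the differences between the three numbers are meaningful.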


How to convert an array of strings to an array of floats in numpy?

Question: How to convert an array of strings to an array of floats in numpy?

How to convert

["1.1", "2.2", "3.2"]

to

[1.1, 2.2, 3.2]

in NumPy?


Answer 0

Well, if you’re reading the data in as a list, just do np.array(map(float, list_of_strings)) (or equivalently, use a list comprehension). (In Python 3, you’ll need to call list on the map return value if you use map, since map returns an iterator now.)

However, if it’s already a numpy array of strings, there’s a better way. Use astype().

import numpy as np
x = np.array(['1.1', '2.2', '3.3'])
y = x.astype(float)  # the np.float alias was removed in NumPy 1.24
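
As a sketch of both routes on a current NumPy, using the built-in float as the dtype (the np.float alias was deprecated in NumPy 1.20 and removed in 1.24). The variable names follow the answer; from_list is an added name for illustration.

```python
import numpy as np

list_of_strings = ["1.1", "2.2", "3.2"]

# From a plain list: let NumPy parse the strings while building the array.
from_list = np.array(list_of_strings, dtype=float)

# From an existing string array: astype returns a converted copy.
x = np.array(list_of_strings)
y = x.astype(float)

print(from_list.dtype, y.dtype)
print(y)
```

astype leaves x untouched, so the original string array remains available if the conversion result needs to be checked against it.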

Answer 1

You can use this as well (note that np.asfarray was removed in NumPy 2.0, so this applies to older versions):

import numpy as np
x=np.array(['1.1', '2.2', '3.3'])
x=np.asfarray(x,float)

Answer 2

Another option might be numpy.asarray:

import numpy as np
a = ["1.1", "2.2", "3.2"]
b = np.asarray(a, dtype=np.float64, order='C')

For Python 2*:

print a, type(a), type(a[0])
print b, type(b), type(b[0])

resulting in:

['1.1', '2.2', '3.2'] <type 'list'> <type 'str'>
[1.1 2.2 3.2] <type 'numpy.ndarray'> <type 'numpy.float64'>

Answer 3

If you have (or create) a single string, you can use np.fromstring:

import numpy as np
x = ["1.1", "2.2", "3.2"]
x = ','.join(x)
x = np.fromstring(x, dtype=float, sep=',')

Note that x = ','.join(x) transforms the list x into the string '1.1,2.2,3.2'. If you read a line from a text file, each line is already a string.


Mocking a class: Mock() or patch()?

Question: Mocking a class: Mock() or patch()?

I am using mock with Python and was wondering which of those two approaches is better (read: more pythonic).

Method one: Just create a mock object and use that. The code looks like:

def test_one (self):
    mock = Mock()
    mock.method.return_value = True 
    self.sut.something(mock)  # This should call mock.method and check the result.
    self.assertTrue(mock.method.called)

Method two: Use patch to create a mock. The code looks like:

@patch("MyClass")
def test_two (self, mock):
    instance = mock.return_value
    instance.method.return_value = True
    self.sut.something(instance)  # This should call instance.method and check the result.
    self.assertTrue(instance.method.called)

Both methods do the same thing. I am unsure of the differences.

Could anyone enlighten me?


Answer 0

mock.patch is a very very different critter than mock.Mock. patch replaces the class with a mock object and lets you work with the mock instance. Take a look at this snippet:

>>> class MyClass(object):
...   def __init__(self):
...     print 'Created MyClass@{0}'.format(id(self))
... 
>>> def create_instance():
...   return MyClass()
... 
>>> x = create_instance()
Created MyClass@4299548304
>>> 
>>> @mock.patch('__main__.MyClass')
... def create_instance2(MyClass):
...   MyClass.return_value = 'foo'
...   return create_instance()
... 
>>> i = create_instance2()
>>> i
'foo'
>>> def create_instance():
...   print MyClass
...   return MyClass()
...
>>> create_instance2()
<mock.Mock object at 0x100505d90>
'foo'
>>> create_instance()
<class '__main__.MyClass'>
Created MyClass@4300234128
<__main__.MyClass object at 0x100505d90>

patch replaces MyClass in a way that allows you to control the usage of the class in functions that you call. Once you patch a class, references to the class are completely replaced by the mock instance.

mock.patch is usually used when you are testing something that creates a new instance of a class inside of the test. mock.Mock instances are clearer and are preferred. If your self.sut.something method created an instance of MyClass instead of receiving an instance as a parameter, then mock.patch would be appropriate here.


Answer 1

I’ve got a YouTube video on this.

Short answer: Use mock when you’re passing in the thing that you want mocked, and patch if you’re not. Of the two, mock is strongly preferred because it means you’re writing code with proper dependency injection.

Silly example:

# Use a mock to test this.
def my_custom_tweeter(twitter_api, sentence):
    # str.replace returns a new string, so the result must be reassigned.
    sentence = sentence.replace('cks','x')   # We're cool and hip.
    twitter_api.send(sentence)

# Use a patch to mock out twitter_api. You have to patch the Twitter() module/class 
# and have it return a mock. Much uglier, but sometimes necessary.
def my_badly_written_tweeter(sentence):
    twitter_api = Twitter(user="XXX", password="YYY")
    sentence = sentence.replace('cks','x')
    twitter_api.send(sentence)
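
To make the difference concrete, here is a runnable sketch of both test styles using unittest.mock from the Python 3 standard library. The Twitter class and the clients namespace are hypothetical stand-ins: clients plays the role of the module in which the code under test looks up Twitter, which is the name that has to be patched.

```python
from types import SimpleNamespace
from unittest.mock import Mock, patch


class Twitter:
    """Hypothetical client; a real one would talk to the network."""

    def __init__(self, user, password):
        self.user = user
        self.password = password

    def send(self, sentence):
        raise RuntimeError("no network access in tests")


# Stand-in for the module that owns the Twitter name (assumption).
clients = SimpleNamespace(Twitter=Twitter)


def my_custom_tweeter(twitter_api, sentence):
    # The dependency is injected, so a plain Mock is enough.
    twitter_api.send(sentence.replace("cks", "x"))


def my_badly_written_tweeter(sentence):
    # The dependency is created internally, so the class must be patched.
    twitter_api = clients.Twitter(user="XXX", password="YYY")
    twitter_api.send(sentence.replace("cks", "x"))


# Case 1: pass a Mock in directly.
api = Mock()
my_custom_tweeter(api, "socks rocks")
api.send.assert_called_once_with("sox rox")

# Case 2: patch the class where it is looked up.
with patch.object(clients, "Twitter") as FakeTwitter:
    my_badly_written_tweeter("socks rocks")
    FakeTwitter.assert_called_once_with(user="XXX", password="YYY")
    FakeTwitter.return_value.send.assert_called_once_with("sox rox")
```

In the second case the instance is FakeTwitter.return_value, because calling the patched class returns that mock; once the with block exits, the original class is restored.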

Replace first occurrence of string in Python

Question: Replace first occurrence of string in Python

I have some sample string. How can I replace first occurrence of this string in a longer string with empty string?

regex = re.compile('text')
match = regex.match(url)
if match:
    url = url.replace(regex, '')

Answer 0

string replace() function perfectly solves this problem:

string.replace(s, old, new[, maxreplace])

Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.

>>> u'longlongTESTstringTEST'.replace('TEST', '?', 1)
u'longlong?stringTEST'
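
The documentation quoted above describes the Python 2 string module function. In Python 3 the same behavior is available as a method on str, with the optional third argument limiting the number of replacements. A minimal sketch (the variable names are chosen for illustration):

```python
s = "longlongTESTstringTEST"

# The third argument caps the number of replacements, counted from the left.
first_only = s.replace("TEST", "?", 1)
all_of_them = s.replace("TEST", "?")

print(first_only)   # longlong?stringTEST
print(all_of_them)  # longlong?string?
```

No regular expression is needed as long as the text to remove is a fixed string.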

Answer 1

Use re.sub directly; this allows you to specify a count:

regex.sub('', url, 1)

(Note that the order of arguments is replacement, original not the opposite, as might be suspected.)
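
A short sketch of the pattern-object form with an explicit count. The sample url value is an assumption made up for this example.

```python
import re

url = "text/some/text/path"  # hypothetical input
regex = re.compile("text")

# Pattern.sub(repl, string, count): replace at most `count` matches, left to right.
result = regex.sub("", url, count=1)

print(result)  # /some/text/path
```

Unlike regex.match in the question, sub scans the whole string, so the first occurrence does not have to be at the start. If the text to remove contains regex metacharacters, passing it through re.escape first keeps it literal.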


SQLAlchemy: cascade delete

Question: SQLAlchemy: cascade delete

I must be missing something trivial with SQLAlchemy’s cascade options because I cannot get a simple cascade delete to operate correctly: if a parent element is deleted, the children persist, with null foreign keys.

I’ve put a concise test case here:

from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.orm import relationship

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key = True)

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key = True)
    parentid = Column(Integer, ForeignKey(Parent.id))
    parent = relationship(Parent, cascade = "all,delete", backref = "children")

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()

parent = Parent()
parent.children.append(Child())
parent.children.append(Child())
parent.children.append(Child())

session.add(parent)
session.commit()

print("Before delete, children = {0}".format(session.query(Child).count()))
print("Before delete, parent = {0}".format(session.query(Parent).count()))

session.delete(parent)
session.commit()

print("After delete, children = {0}".format(session.query(Child).count()))
print("After delete parent = {0}".format(session.query(Parent).count()))

session.close()

Output:

Before delete, children = 3
Before delete, parent = 1
After delete, children = 3
After delete parent = 0

There is a simple, one-to-many relationship between Parent and Child. The script creates a parent, adds 3 children, then commits. Next, it deletes the parent, but the children persist. Why? How do I make the children cascade delete?


Answer 0

The problem is that sqlalchemy considers Child as the parent, because that is where you defined your relationship (it doesn’t care that you called it “Child” of course).

If you define the relationship on the Parent class instead, it will work:

children = relationship("Child", cascade="all,delete", backref="parent")

(note "Child" as a string: this is allowed when using the declarative style, so that you are able to refer to a class that is not yet defined)

You might want to add delete-orphan as well: delete causes children to be deleted when the parent gets deleted, while delete-orphan also deletes any children that were “removed” from the parent, even if the parent itself is not deleted.

EDIT: I just found out that if you really want to define the relationship on the Child class, you can do so, but you will have to define the cascade on the backref (by creating the backref explicitly), like this:

parent = relationship(Parent, backref=backref("children", cascade="all,delete"))

(this requires from sqlalchemy.orm import backref)
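As a rough end-to-end check of this answer, the sketch below reruns the question's scenario with the relationship moved onto Parent. It assumes SQLAlchemy 1.4 or later (for sqlalchemy.orm.declarative_base) and an in-memory SQLite database. The variable names children_before, children_after and parents_after are only illustrative.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # The cascade is declared on the Parent side, so deleting a Parent
    # through the session also deletes the Child objects in this collection.
    children = relationship("Child", cascade="all, delete", backref="parent")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parentid = Column(Integer, ForeignKey("parent.id"))


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

parent = Parent(children=[Child(), Child(), Child()])
session.add(parent)
session.commit()
children_before = session.query(Child).count()

# session.delete() goes through the ORM, so the cascade rule is applied.
session.delete(parent)
session.commit()
children_after = session.query(Child).count()
parents_after = session.query(Parent).count()

print(children_before, children_after, parents_after)
session.close()
```

With this layout the child count should drop to zero together with the parent, unlike the output shown in the question.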


Answer 1

@Steven’s answer is good when you are deleting through session.delete(), which never happens in my case. Most of the time I delete through session.query().filter().delete(), which doesn’t load the objects into memory and deletes directly from the database. With this method, SQLAlchemy’s cascade='all, delete' doesn’t work. There is a solution, though: ON DELETE CASCADE at the database level (note: not all databases support it).

class Child(Base):
    __tablename__ = "children"

    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.id", ondelete='CASCADE'))

class Parent(Base):
    __tablename__ = "parents"

    id = Column(Integer, primary_key=True)
    child = relationship(Child, backref="parent", passive_deletes=True)
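Here is a minimal sketch of the bulk-delete case, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. SQLite only honors ON DELETE CASCADE when foreign key enforcement is switched on per connection, so the sketch adds a connect listener for that; other databases would not need it. The relationship is named children here, rather than child as above, purely for readability. Passing synchronize_session=False is a choice made for this sketch so the ORM does not try to reconcile in-memory objects.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    # passive_deletes=True leaves the child rows to the database.
    children = relationship("Child", backref="parent", passive_deletes=True)


class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.id", ondelete="CASCADE"))


engine = create_engine("sqlite:///:memory:")


@event.listens_for(engine, "connect")
def enable_sqlite_foreign_keys(dbapi_connection, connection_record):
    # SQLite-specific: foreign key enforcement is off unless enabled.
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()


Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Parent(children=[Child(), Child()]))
session.commit()

# Bulk delete: a single DELETE statement, no objects loaded, no ORM cascade.
session.query(Parent).filter(Parent.id == 1).delete(synchronize_session=False)
session.commit()

remaining_children = session.query(Child).count()
remaining_parents = session.query(Parent).count()
print(remaining_children, remaining_parents)
session.close()
```

Because the child rows are removed by the database's own foreign key action, the ORM-level cascade setting plays no part in this path.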

Answer 2

Pretty old post, but I just spent an hour or two on this, so I wanted to share my findings, especially since some of the other comments listed aren’t quite right.

TL;DR

Give the child table a foreign key, or modify the existing one, adding ondelete='CASCADE':

parent_id = db.Column(db.Integer, db.ForeignKey('parent.id', ondelete='CASCADE'))

And one of the following relationships:

a) This on the parent table:

children = db.relationship('Child', backref='parent', passive_deletes=True)

b) Or this on the child table:

parent = db.relationship('Parent', backref=backref('children', passive_deletes=True))

Details

First off, despite what the accepted answer says, the parent/child relationship is not established by using relationship; it’s established by using ForeignKey. You can put the relationship on either the parent or the child table and it will work fine. Although, apparently, on the child table you have to use the backref function in addition to the keyword argument.

Option 1 (preferred)

Second, SqlAlchemy supports two different kinds of cascading. The first, and the one I recommend, is built into your database and usually takes the form of a constraint on the foreign key declaration. In PostgreSQL it looks like this:

CONSTRAINT child_parent_id_fkey FOREIGN KEY (parent_id)
REFERENCES parent_table(id) MATCH SIMPLE
ON DELETE CASCADE

This means that when you delete a record from parent_table, then all the corresponding rows in child_table will be deleted for you by the database. It’s fast and reliable and probably your best bet. You set this up in SqlAlchemy through ForeignKey like this (part of the child table definition):

parent_id = db.Column(db.Integer, db.ForeignKey('parent.id', ondelete='CASCADE'))
parent = db.relationship('Parent', backref=backref('children', passive_deletes=True))

The ondelete='CASCADE' is the part that creates the ON DELETE CASCADE on the table.
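One way to see this for yourself is to print the DDL that SQLAlchemy would emit for the child table. The sketch below assumes SQLAlchemy 1.4 or later. It uses plain Column and ForeignKey instead of the db.* aliases (which come from Flask-SQLAlchemy) so that it runs without a Flask app. The name child_ddl is just illustrative.

```python
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import declarative_base
from sqlalchemy.schema import CreateTable

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id", ondelete="CASCADE"))


# Render the CREATE TABLE statement without connecting to any database.
child_ddl = str(CreateTable(Child.__table__))
print(child_ddl)
```

The printed statement should contain an ON DELETE CASCADE clause on the foreign key constraint. Note that this only affects tables created after the change; an existing table needs a migration.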

Gotcha!

There’s an important caveat here. Notice how I have a relationship specified with passive_deletes=True? If you don’t have that, the entire thing will not work. This is because by default when you delete a parent record SqlAlchemy does something really weird. It sets the foreign keys of all child rows to NULL. So if you delete a row from parent_table where id = 5, then it will basically execute

UPDATE child_table SET parent_id = NULL WHERE parent_id = 5

Why you would want this I have no idea. I’d be surprised if many database engines even allowed you to set a valid foreign key to NULL, creating an orphan. Seems like a bad idea, but maybe there’s a use case. Anyway, if you let SqlAlchemy do this, you will prevent the database from being able to clean up the children using the ON DELETE CASCADE that you set up. This is because it relies on those foreign keys to know which child rows to delete. Once SqlAlchemy has set them all to NULL, the database can’t delete them. Setting passive_deletes=True prevents SqlAlchemy from NULLing out the foreign keys.

You can read more about passive deletes in the SqlAlchemy docs.
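The default behavior described in this section can be reproduced with a small script. This is a sketch assuming SQLAlchemy 1.4 or later and in-memory SQLite. The relationship deliberately has neither cascade nor passive_deletes, and orphaned_fks is an illustrative name.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # No cascade and no passive_deletes: the ORM's default handling applies.
    children = relationship("Child", backref="parent")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id", ondelete="CASCADE"))


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

parent = Parent(children=[Child(), Child()])
session.add(parent)
session.commit()

# The ORM first nulls out parent_id on the children, then deletes the parent.
session.delete(parent)
session.commit()

orphaned_fks = [child.parent_id for child in session.query(Child).all()]
print(orphaned_fks)
session.close()
```

Both children survive with parent_id set to None, even though the column was declared with ondelete="CASCADE": by the time the parent row is deleted, no child row references it any more.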

Option 2

The other way you can do it is to let SqlAlchemy do it for you. This is set up using the cascade argument of the relationship. If you have the relationship defined on the parent table, it looks like this:

children = relationship('Child', cascade='all,delete', backref='parent')

If the relationship is on the child, you do it like this:

parent = relationship('Parent', backref=backref('children', cascade='all,delete'))

Again, this is the child, so you have to call a function called backref and put the cascade data in there.

With this in place, when you delete a parent row, SqlAlchemy will actually run DELETE statements for you to clean up the child rows. This will likely not be as efficient as letting the database handle it for you, so I don’t recommend it.

Here are the SqlAlchemy docs on the cascading features it supports.


Answer 3

Steven is correct in that you need to explicitly create the backref; this results in the cascade being applied on the parent (as opposed to it being applied to the child, as in the test scenario).

However, defining the relationship on the Child does NOT make SQLAlchemy consider Child the parent. It doesn’t matter where the relationship is defined (child or parent); it’s the foreign key linking the two tables that determines which is the parent and which is the child.

It makes sense to stick to one convention though, and based on Steven’s response, I’m defining all my child relationships on the parent.


Answer 4

I struggled with the documentation as well, but found that the docstrings themselves tend to be easier than the manual. For example, if you import relationship from sqlalchemy.orm and do help(relationship), it will give you all the options you can specify for cascade. The bullet for delete-orphan says:

if an item of the child’s type with no parent is detected, mark it for deletion.
Note that this option prevents a pending item of the child’s class from being persisted without a parent present.

I realize your issue was more with the way the documentation describes defining parent-child relationships. But it seemed that you might also be having a problem with the cascade options, because "all" already includes "delete"; "delete-orphan" is the only option that’s not included in "all".
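To make the delete-orphan description concrete, here is a small sketch, assuming SQLAlchemy 1.4 or later and in-memory SQLite. It detaches one child from its parent without deleting the parent. The names children_left and parents_left are illustrative.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # "delete-orphan" has to be listed explicitly; "all" does not include it.
    children = relationship(
        "Child", cascade="all, delete-orphan", backref="parent"
    )


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

parent = Parent(children=[Child(), Child()])
session.add(parent)
session.commit()

# Detach one child from the collection; the parent itself is kept.
parent.children.pop()
session.commit()

children_left = session.query(Child).count()
parents_left = session.query(Parent).count()
print(children_left, parents_left)
session.close()
```

The detached child is deleted at the next flush instead of being left behind with a NULL foreign key, while the parent and the other child remain.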


Answer 5

Steven’s answer is solid. I’d like to point out an additional implication.

By using relationship, you’re making the app layer (Flask) responsible for referential integrity. That means other processes that access the database not through Flask, like a database utility or a person connecting to the database directly, will not experience those constraints and could change your data in a way that breaks the logical data model you worked so hard to design.

Whenever possible, use the ForeignKey approach described by d512 and Alex. The DB engine is very good at truly enforcing constraints (in an unavoidable way), so this is by far the best strategy for maintaining data integrity. The only time you need to rely on an app to handle data integrity is when the database can’t handle it, e.g. versions of SQLite that don’t support foreign keys.

If you need to create further linkage among entities to enable app behaviors like navigating parent-child object relationships, use backref in conjunction with ForeignKey.


Answer 6

Steven’s answer is perfect. But if you are still getting the error, another thing to try on top of that is described here:

http://vincentaudebert.github.io/python/sql/2015/10/09/cascade-delete-sqlalchemy/

Copied from the link:

Quick tip if you get in trouble with a foreign key dependency even if you have specified a cascade delete in your models.

Using SQLAlchemy, to specify a cascade delete you should have cascade='all, delete' on your parent table. Ok but then when you execute something like:

session.query(models.yourmodule.YourParentTable).filter(conditions).delete()

It actually triggers an error about a foreign key used in your children tables.

The solution I used was to query the object and then delete it:

session = models.DBSession()
your_db_object = session.query(models.yourmodule.YourParentTable).filter(conditions).first()
if your_db_object is not None:
    session.delete(your_db_object)

This should delete your parent record AND all the children associated with it.


Answer 7

Alex Okrushko’s answer worked almost best for me: I used ondelete='CASCADE' and passive_deletes=True combined. But I had to do something extra to make it work for SQLite.

Base = declarative_base()
ROOM_TABLE = "roomdata"
FURNITURE_TABLE = "furnituredata"

class DBFurniture(Base):
    __tablename__ = FURNITURE_TABLE
    id = Column(Integer, primary_key=True)
    room_id = Column(Integer, ForeignKey('roomdata.id', ondelete='CASCADE'))


class DBRoom(Base):
    __tablename__ = ROOM_TABLE
    id = Column(Integer, primary_key=True)
    furniture = relationship("DBFurniture", backref="room", passive_deletes=True)

Make sure to add this code so that it works for SQLite.

from sqlalchemy import event
from sqlalchemy.engine import Engine
from sqlite3 import Connection as SQLite3Connection

@event.listens_for(Engine, "connect")
def _set_sqlite_pragma(dbapi_connection, connection_record):
    if isinstance(dbapi_connection, SQLite3Connection):
        cursor = dbapi_connection.cursor()
        cursor.execute("PRAGMA foreign_keys=ON;")
        cursor.close()

Stolen from here: SQLAlchemy expression language and SQLite’s on delete cascade
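To see why the pragma matters, the sketch below uses only the standard-library sqlite3 module, independent of SQLAlchemy. It runs the same delete with and without foreign key enforcement. The table names mirror the ones above. The helper children_left_after_parent_delete is a hypothetical name used only for this illustration.

```python
import sqlite3


def children_left_after_parent_delete(enable_foreign_keys):
    """Delete a room and report how many furniture rows are left."""
    conn = sqlite3.connect(":memory:")
    if enable_foreign_keys:
        # Must be set per connection, outside of a transaction.
        conn.execute("PRAGMA foreign_keys=ON")
    conn.execute("CREATE TABLE roomdata (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE furnituredata ("
        "id INTEGER PRIMARY KEY, "
        "room_id INTEGER REFERENCES roomdata(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO roomdata (id) VALUES (1)")
    conn.executemany(
        "INSERT INTO furnituredata (room_id) VALUES (?)", [(1,), (1,)]
    )
    conn.execute("DELETE FROM roomdata WHERE id = 1")
    (count,) = conn.execute("SELECT COUNT(*) FROM furnituredata").fetchone()
    conn.close()
    return count


with_pragma = children_left_after_parent_delete(True)
without_pragma = children_left_after_parent_delete(False)
print(with_pragma, without_pragma)
```

With the pragma enabled no furniture rows remain. On a default SQLite build, where enforcement is off unless requested, the ON DELETE CASCADE clause is silently ignored and both rows survive. The connect listener above takes care of this for every connection SQLAlchemy opens.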


Answer 8

TLDR: If the above solutions don’t work, try adding nullable=False to your column.

I’d like to add a small point here for some people who may not get the cascade function to work with the existing solutions (which are great). The main difference between my work and the example was that I used automap. I do not know exactly how that might interfere with the setup of cascades, but I want to note that I used it. I am also working with a SQLite database.

I tried every solution described here, but rows in my child table continued to have their foreign key set to null when the parent row was deleted. However, the cascade worked once I set the child column with the foreign key to nullable=False.

On the child table, I added:

Column('parent_id', Integer(), ForeignKey('parent.id', ondelete="CASCADE"), nullable=False)
Child.parent = relationship("parent", backref=backref("children", passive_deletes=True))

With this setup, the cascade functioned as expected.