SQLAlchemy: Difference between engine, connection and session

Question: SQLAlchemy: Difference between engine, connection and session

I use SQLAlchemy and there are at least three entities: engine, session and connection, each of which has an execute method. So if I want, for example, to select all records from a table, I can do this:

engine.execute(select([table])).fetchall()

and this

connection.execute(select([table])).fetchall()

and even this

session.execute(select([table])).fetchall()

The results will be the same.

As I understand it, if someone uses engine.execute, it creates a connection, opens a session (Alchemy takes care of it for you) and executes the query. But is there a global difference between these three ways of performing such a task?


Answer 0

A one-line overview:

The behavior of execute() is the same in all cases, but they are three different methods, on the Engine, Connection, and Session classes.

What exactly is execute():

To understand the behavior of execute() we need to look into the Executable class. Executable is a superclass for all "statement" types of objects, including select(), delete(), update(), insert(), text(). In the simplest words possible, an Executable is a SQL expression construct supported in SQLAlchemy.

In all cases the execute() method takes SQL text or a constructed SQL expression (any of the variety of SQL expression constructs supported in SQLAlchemy) and returns the query results as a ResultProxy, which wraps a DB-API cursor object to provide easier access to row columns.


To clarify it further (only for conceptual clarification, not a recommended approach):

In addition to Engine.execute() (connectionless execution), Connection.execute(), and Session.execute(), it is also possible to call execute() directly on any Executable construct. The Executable class has its own implementation of execute(); the official documentation describes it in one line as "Compile and execute this Executable". In this case we need to explicitly bind the Executable (SQL expression construct) to a Connection object or an Engine object (which implicitly obtains a Connection), so that execute() knows where to run the SQL.

The following example demonstrates it well – Given a table as below:

from sqlalchemy import MetaData, Table, Column, Integer, String, create_engine

meta = MetaData()
users_table = Table('users', meta,
    Column('id', Integer, primary_key=True),
    Column('name', String(50)))

Explicit execution, i.e. Connection.execute(): pass the SQL text or constructed SQL expression to the execute() method of Connection:

engine = create_engine('sqlite:///file.db')
connection = engine.connect()
result = connection.execute(users_table.select())
for row in result:
    print(row)  # process each row here
connection.close()

Explicit connectionless execution, i.e. Engine.execute(): pass the SQL text or constructed SQL expression directly to the execute() method of Engine:

engine = create_engine('sqlite:///file.db')
result = engine.execute(users_table.select())
for row in result:
    print(row)  # process each row here
result.close()

Implicit execution, i.e. Executable.execute(), is also connectionless. It calls the execute() method directly on the SQL expression construct (an instance of Executable) itself:

engine = create_engine('sqlite:///file.db')
meta.bind = engine
result = users_table.select().execute()
for row in result:
    print(row)  # process each row here
result.close()

Note: the implicit execution example is given only for clarification. This way of executing statements is strongly discouraged, as per the docs:

“implicit execution” is a very old usage pattern that in most cases is more confusing than it is helpful, and its usage is discouraged. Both patterns seem to encourage the overuse of expedient “short cuts” in application design which lead to problems later on.


Your questions:

As I understand it, if someone uses engine.execute, it creates a connection, opens a session (Alchemy takes care of it for you) and executes the query.

You're right about the part "if someone uses engine.execute it creates a connection", but not about "opens a session (Alchemy takes care of it for you) and executes the query". Using Engine.execute() and Connection.execute() is (almost) the same thing: in the former, the Connection object is created implicitly, and in the latter we instantiate it explicitly. What really happens in this case is:

`Engine` object (instantiated via `create_engine()`) -> `Connection` object (instantiated via `engine_instance.connect()`) -> `connection.execute({*SQL expression*})`

But is there a global difference between these three ways of performing such a task?

At the DB layer it's exactly the same thing: all of them execute SQL (text expressions or various SQL expression constructs). From the application's point of view there are two options:

  • Direct execution: using Engine.execute() or Connection.execute()
  • Using sessions: a session handles a transaction as a single unit of work, with session.add(), session.rollback(), session.commit() and session.close(). It is the way to interact with the DB when using the ORM, i.e. mapped tables. It provides an identity_map for instantly getting already accessed or newly created/added objects during a single request.

Session.execute() ultimately uses the Connection.execute() statement execution method to execute the SQL statement. Using the Session object is SQLAlchemy ORM's recommended way for an application to interact with the database.

An excerpt from the docs:

Its important to note that when using the SQLAlchemy ORM, these objects are not generally accessed; instead, the Session object is used as the interface to the database. However, for applications that are built around direct usage of textual SQL statements and/or SQL expression constructs without involvement by the ORM’s higher level management services, the Engine and Connection are king (and queen?) – read on.
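As a rough sketch of the comparison above, the snippet below runs the same SELECT through an explicit Connection and through a Session and shows that both return the same rows. It assumes SQLAlchemy 1.4 or later (where select() accepts the table directly, and where Engine.execute() is deprecated and later removed in 2.0) and an in-memory SQLite database. The sample rows are made up for illustration.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, insert, select)
from sqlalchemy.orm import Session

# Assumption: an in-memory SQLite database, used only for illustration.
engine = create_engine("sqlite://")

meta = MetaData()
users_table = Table(
    "users", meta,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
meta.create_all(engine)

# engine.begin() checks out a Connection and commits when the block exits.
with engine.begin() as conn:
    conn.execute(insert(users_table), [{"name": "alice"}, {"name": "bob"}])

stmt = select(users_table).order_by(users_table.c.id)

# Explicit Connection: Core-level execution.
with engine.connect() as conn:
    rows_via_connection = [tuple(row) for row in conn.execute(stmt)]

# Session: the ORM facade, which runs the statement on a Connection it manages.
with Session(engine) as session:
    rows_via_session = [tuple(row) for row in session.execute(stmt)]

print(rows_via_connection)
print(rows_via_session)
```

Both lists hold the same rows. The practical difference lies in who manages the connection and the transaction, not in the SQL that reaches the database.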


Answer 1

Nabeel’s answer covers a lot of details and is helpful, but I found it confusing to follow. Since this is currently the first Google result for this issue, adding my understanding of it for future people that find this question:

Running .execute()

As OP and Nabeel Ahmed both note, when executing a plain SELECT * FROM tablename, there's no difference in the result provided.

The differences between these three objects become important depending on the context in which the SELECT statement is used or, more commonly, when you want to do other things like INSERT, DELETE, etc.

When to use Engine, Connection, Session generally

  • Engine is the lowest level object used by SQLAlchemy. It maintains a pool of connections available for use whenever the application needs to talk to the database. .execute() is a convenience method that first calls conn = engine.connect(close_with_result=True) and then conn.execute(). The close_with_result parameter means the connection is closed automatically. (I'm slightly paraphrasing the source code, but this is essentially true.) edit: Here's the source code for engine.execute

    You can use engine to execute raw SQL.

    result = engine.execute('SELECT * FROM tablename;')
    #what engine.execute() is doing under the hood
    conn = engine.connect(close_with_result=True)
    result = conn.execute('SELECT * FROM tablename;')
    
    #after you iterate over the results, the result and connection get closed
    for row in result:
        print(row['columnname'])
    
    #or you can explicitly close the result, which also closes the connection
    result.close()
    

    This is covered in the docs under basic usage.

  • Connection is (as we saw above) the thing that actually does the work of executing a SQL query. You should use it directly whenever you want greater control over attributes of the connection, when it gets closed, etc. A very important example of this is a Transaction, which lets you decide when to commit your changes to the database. In normal use, changes are autocommitted. With transactions, you could (for example) run several different SQL statements and, if something goes wrong with one of them, undo all the changes at once.

    connection = engine.connect()
    trans = connection.begin()
    try:
        connection.execute("INSERT INTO films VALUES ('Comedy', '82 minutes');")
        connection.execute("INSERT INTO datalog VALUES ('added a comedy');")
        trans.commit()
    except:
        trans.rollback()
        raise
    

    This would let you undo both changes if one failed, for example if you forgot to create the datalog table.

    So if you're executing raw SQL code and need control, use connections.

  • Sessions are used for the Object Relational Mapping (ORM) aspect of SQLAlchemy (in fact you can see this from how they're imported: from sqlalchemy.orm import sessionmaker). They use connections and transactions under the hood to run their automatically generated SQL statements. .execute() is a convenience function that passes through to whatever the session is bound to (usually an engine, but it can be a connection).

    If you’re using the ORM functionality, use session; if you’re only doing straight SQL queries not bound to objects, you’re probably better off using connections directly.
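The all-or-nothing behaviour of a transaction on a Connection can be sketched as follows. This is an illustration under stated assumptions, not the answer's exact code: it assumes SQLAlchemy 1.4 or later, an in-memory SQLite database, a hypothetical two-column films table, and a helper named add_film introduced only for this example. The datalog table is deliberately left uncreated so that the second INSERT fails.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.exc import SQLAlchemyError

# Assumption: an in-memory SQLite database, used only for illustration.
engine = create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE films (genre TEXT, length TEXT)"))
    # The datalog table is deliberately NOT created.


def add_film(engine):
    """Insert a film and a log entry as one unit; return True on success."""
    try:
        # engine.begin() commits if the block succeeds and rolls back if it raises.
        with engine.begin() as conn:
            conn.execute(text("INSERT INTO films VALUES ('Comedy', '82 minutes')"))
            conn.execute(text("INSERT INTO datalog VALUES ('added a comedy')"))
        return True
    except SQLAlchemyError:
        return False


succeeded = add_film(engine)

with engine.connect() as conn:
    film_count = conn.execute(text("SELECT COUNT(*) FROM films")).scalar()

print(succeeded, film_count)
```

Because the second statement fails, the first INSERT is rolled back as well, so the films table stays empty.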


Answer 2

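Here is an example of running a DCL (Data Control Language) statement such as GRANT:

def grantAccess(db, tb, user):
  # NOTE: username, password, host and port are not defined in this snippet;
  # they are assumed to be settings defined elsewhere in the module.
  import sqlalchemy as SA
  import psycopg2  # DB-API driver named in the URL below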

  url = "{d}+{driver}://{u}:{p}@{h}:{port}/{db}".\
            format(d="redshift",
            driver='psycopg2',
            u=username,
            p=password,
            h=host,
            port=port,
            db=db)
  engine = SA.create_engine(url)
  cnn = engine.connect()
  trans = cnn.begin()
  strSQL = "GRANT SELECT on table " + tb + " to " + user + " ;"
  try:
      cnn.execute(strSQL)
      trans.commit()
  except:
      trans.rollback()
      raise

Bulk insert with SQLAlchemy ORM

Question: Bulk insert with SQLAlchemy ORM

Is there any way to get SQLAlchemy to do a bulk insert rather than inserting each individual object? That is,

doing:

INSERT INTO `foo` (`bar`) VALUES (1), (2), (3)

rather than:

INSERT INTO `foo` (`bar`) VALUES (1)
INSERT INTO `foo` (`bar`) VALUES (2)
INSERT INTO `foo` (`bar`) VALUES (3)

I've just converted some code to use sqlalchemy rather than raw SQL, and although it is now much nicer to work with, it seems to be slower (up to a factor of 10). I'm wondering if this is the reason.

Maybe I could improve the situation by using sessions more efficiently. At the moment I have autoCommit=False and do a session.commit() after I've added some stuff. However, this seems to cause the data to go stale if the DB is changed elsewhere: even if I do a new query, I still get old results back?

Thanks for your help!


Answer 0

SQLAlchemy introduced that in version 1.0.0:

Bulk operations – SQLAlchemy docs

With these operations, you can now do bulk inserts or updates!

For instance, you can do:

s = Session()
objects = [
    User(name="u1"),
    User(name="u2"),
    User(name="u3")
]
s.bulk_save_objects(objects)
s.commit()

Here, a bulk insert will be made.


Answer 1

The sqlalchemy docs have a writeup on the performance of various techniques that can be used for bulk inserts:

ORMs are basically not intended for high-performance bulk inserts – this is the whole reason SQLAlchemy offers the Core in addition to the ORM as a first-class component.

For the use case of fast bulk inserts, the SQL generation and execution system that the ORM builds on top of is part of the Core. Using this system directly, we can produce an INSERT that is competitive with using the raw database API directly.

Alternatively, the SQLAlchemy ORM offers the Bulk Operations suite of methods, which provide hooks into subsections of the unit of work process in order to emit Core-level INSERT and UPDATE constructs with a small degree of ORM-based automation.

The example below illustrates time-based tests for several different methods of inserting rows, going from the most automated to the least. With CPython 2.7, the observed runtimes were:

classics-MacBook-Pro:sqlalchemy classic$ python test.py
SQLAlchemy ORM: Total time for 100000 records 12.0471920967 secs
SQLAlchemy ORM pk given: Total time for 100000 records 7.06283402443 secs
SQLAlchemy ORM bulk_save_objects(): Total time for 100000 records 0.856323003769 secs
SQLAlchemy Core: Total time for 100000 records 0.485800027847 secs
sqlite3: Total time for 100000 records 0.487842082977 sec

Script:

import time
import sqlite3

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String,  create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

Base = declarative_base()
DBSession = scoped_session(sessionmaker())
engine = None


class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


def init_sqlalchemy(dbname='sqlite:///sqlalchemy.db'):
    global engine
    engine = create_engine(dbname, echo=False)
    DBSession.remove()
    DBSession.configure(bind=engine, autoflush=False, expire_on_commit=False)
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)


def test_sqlalchemy_orm(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    for i in xrange(n):
        customer = Customer()
        customer.name = 'NAME ' + str(i)
        DBSession.add(customer)
        if i % 1000 == 0:
            DBSession.flush()
    DBSession.commit()
    print(
        "SQLAlchemy ORM: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")


def test_sqlalchemy_orm_pk_given(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    for i in xrange(n):
        customer = Customer(id=i+1, name="NAME " + str(i))
        DBSession.add(customer)
        if i % 1000 == 0:
            DBSession.flush()
    DBSession.commit()
    print(
        "SQLAlchemy ORM pk given: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")


def test_sqlalchemy_orm_bulk_insert(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    n1 = n
    while n1 > 0:
        n1 = n1 - 10000
        DBSession.bulk_insert_mappings(
            Customer,
            [
                dict(name="NAME " + str(i))
                for i in xrange(min(10000, n1))
            ]
        )
    DBSession.commit()
    print(
        "SQLAlchemy ORM bulk_save_objects(): Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")


def test_sqlalchemy_core(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    engine.execute(
        Customer.__table__.insert(),
        [{"name": 'NAME ' + str(i)} for i in xrange(n)]
    )
    print(
        "SQLAlchemy Core: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")


def init_sqlite3(dbname):
    conn = sqlite3.connect(dbname)
    c = conn.cursor()
    c.execute("DROP TABLE IF EXISTS customer")
    c.execute(
        "CREATE TABLE customer (id INTEGER NOT NULL, "
        "name VARCHAR(255), PRIMARY KEY(id))")
    conn.commit()
    return conn


def test_sqlite3(n=100000, dbname='sqlite3.db'):
    conn = init_sqlite3(dbname)
    c = conn.cursor()
    t0 = time.time()
    for i in xrange(n):
        row = ('NAME ' + str(i),)
        c.execute("INSERT INTO customer (name) VALUES (?)", row)
    conn.commit()
    print(
        "sqlite3: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " sec")

if __name__ == '__main__':
    test_sqlalchemy_orm(100000)
    test_sqlalchemy_orm_pk_given(100000)
    test_sqlalchemy_orm_bulk_insert(100000)
    test_sqlalchemy_core(100000)
    test_sqlite3(100000)

Answer 2

As far as I know, there is no way to get the ORM to issue bulk inserts. I believe the underlying reason is that SQLAlchemy needs to keep track of each object’s identity (i.e., new primary keys), and bulk inserts interfere with that. For example, assuming your foo table contains an id column and is mapped to a Foo class:

x = Foo(bar=1)
print x.id
# None
session.add(x)
session.flush()
# BEGIN
# INSERT INTO foo (bar) VALUES(1)
# COMMIT
print x.id
# 1

Since SQLAlchemy picked up the value for x.id without issuing another query, we can infer that it got the value directly from the INSERT statement. If you don’t need subsequent access to the created objects via the same instances, you can skip the ORM layer for your insert:

Foo.__table__.insert().execute([{'bar': 1}, {'bar': 2}, {'bar': 3}])
# INSERT INTO foo (bar) VALUES ((1,), (2,), (3,))

SQLAlchemy can’t match these new rows with any existing objects, so you’ll have to query them anew for any subsequent operations.

As far as stale data is concerned, it’s helpful to remember that the session has no built-in way to know when the database is changed outside of the session. In order to access externally modified data through existing instances, the instances must be marked as expired. This happens by default on session.commit(), but can be done manually by calling session.expire_all() or session.expire(instance). An example (SQL omitted):

x = Foo(bar=1)
session.add(x)
session.commit()
print x.bar
# 1
foo.update().execute(bar=42)
print x.bar
# 1
session.expire(x)
print x.bar
# 42

session.commit() expires x, so the first print statement implicitly opens a new transaction and re-queries x's attributes. If you comment out the first print statement, you'll notice that the second one now picks up the correct value, because the new query isn't emitted until after the update.

This makes sense from the point of view of transactional isolation – you should only pick up external modifications between transactions. If this is causing you trouble, I’d suggest clarifying or re-thinking your application’s transaction boundaries instead of immediately reaching for session.expire_all().
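The expiry behaviour described above can be reproduced with a small script. This is a sketch under stated assumptions: SQLAlchemy 1.4 or later, a throwaway SQLite file (so that the "external" change can go through its own connection), and a hypothetical Foo model with a single bar column.

```python
import os
import tempfile

from sqlalchemy import Column, Integer, create_engine, update
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    bar = Column(Integer)


# Assumption: a temporary SQLite file, used only for illustration.
db_path = os.path.join(tempfile.mkdtemp(), "stale_demo.db")
engine = create_engine("sqlite:///" + db_path)
Base.metadata.create_all(engine)

seen = []
with Session(engine) as session:
    x = Foo(bar=1)
    session.add(x)
    session.commit()        # commit expires x
    seen.append(x.bar)      # attribute access reloads x from the database

    # Modify the row outside the session, on a separate connection.
    with engine.begin() as other:
        other.execute(update(Foo.__table__).values(bar=42))

    seen.append(x.bar)      # x is not expired, so the cached value is returned
    session.expire(x)
    seen.append(x.bar)      # expired again, so the value is re-queried

print(seen)
engine.dispose()
```

The middle read returns the stale value because the session has no way of knowing that the row changed. Only expiring the instance (or ending the transaction) triggers a fresh query.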


Answer 3

I usually do it using add_all.

from app import session
from models import User

objects = [User(name="u1"), User(name="u2"), User(name="u3")]
session.add_all(objects)
session.commit()

Answer 4

Direct support was added to SQLAlchemy as of version 0.8

As per the docs, connection.execute(table.insert().values(data)) should do the trick. (Note that this is not the same as connection.execute(table.insert(), data) which results in many individual row inserts via a call to executemany). On anything but a local connection the difference in performance can be enormous.
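The two forms mentioned above can be compared side by side. The sketch below assumes SQLAlchemy 1.4 or later, an in-memory SQLite database, and a hypothetical foo table. It compiles the multi-row form against the SQLite dialect to show the generated statement, then executes both forms.

```python
from sqlalchemy import (Column, Integer, MetaData, Table, create_engine,
                        func, insert, select)

# Assumption: an in-memory SQLite database, used only for illustration.
engine = create_engine("sqlite://")
meta = MetaData()
foo = Table(
    "foo", meta,
    Column("id", Integer, primary_key=True),
    Column("bar", Integer),
)
meta.create_all(engine)

data = [{"bar": 1}, {"bar": 2}, {"bar": 3}]

# Form 1: a single INSERT statement carrying several VALUES groups.
multi_values_stmt = insert(foo).values(data)
multi_values_sql = str(multi_values_stmt.compile(dialect=engine.dialect))

with engine.begin() as conn:
    conn.execute(multi_values_stmt)
    # Form 2: a single-row INSERT plus a list of parameter sets,
    # which is passed to the driver's executemany().
    conn.execute(insert(foo), data)
    total = conn.execute(select(func.count()).select_from(foo)).scalar()

print(multi_values_sql)
print(total)
```

Both forms insert the same rows. They differ in how many statements the driver sends, which is where the network round-trip cost mentioned above comes from.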


Answer 5

SQLAlchemy introduced that in version 1.0.0:

Bulk operations – SQLAlchemy docs

With these operations, you can now do bulk inserts or updates!

For instance (if you want the lowest overhead for simple table INSERTs), you can use Session.bulk_insert_mappings():

loadme = [(1, 'a'),
          (2, 'b'),
          (3, 'c')]
dicts = [dict(bar=t[0], fly=t[1]) for t in loadme]

s = Session()
s.bulk_insert_mappings(Foo, dicts)
s.commit()

Or, if you want, skip the loadme tuples and write the dictionaries directly into dicts (but I find it easier to leave all the wordiness out of the data and load up a list of dictionaries in a loop).


Answer 6

Piere’s answer is correct but one issue is that bulk_save_objects by default does not return the primary keys of the objects, if that is of concern to you. Set return_defaults to True to get this behavior.

The documentation is here.

foos = [Foo(bar='a',), Foo(bar='b'), Foo(bar='c')]
session.bulk_save_objects(foos, return_defaults=True)
for foo in foos:
    assert foo.id is not None
session.commit()
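To see the effect of return_defaults in isolation, the following self-contained sketch saves two batches, one without the flag and one with it. It assumes SQLAlchemy 1.4 or later, an in-memory SQLite database, and a hypothetical Foo model. The variable names are illustrative.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    bar = Column(String(10))


# Assumption: an in-memory SQLite database, used only for illustration.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Default behaviour: rows are inserted, but the objects keep id = None.
    plain = [Foo(bar="a"), Foo(bar="b")]
    session.bulk_save_objects(plain)
    ids_without = [foo.id for foo in plain]

    # With return_defaults=True the generated primary keys are fetched back.
    fetched = [Foo(bar="c"), Foo(bar="d")]
    session.bulk_save_objects(fetched, return_defaults=True)
    ids_with = [foo.id for foo in fetched]

    session.commit()

print(ids_without)
print(ids_with)
```

Fetching the keys back costs some of the speed that the bulk methods are meant to provide, so it is worth enabling only when the primary keys are actually needed afterwards.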

Answer 7

All roads lead to Rome, but some of them cross mountains and require ferries. If you want to get there quickly, just take the motorway.


In this case the motorway is to use the execute_batch() feature of psycopg2. The documentation says it the best:

The current implementation of executemany() is (using an extremely charitable understatement) not particularly performing. These functions can be used to speed up the repeated execution of a statement against a set of parameters. By reducing the number of server roundtrips the performance can be orders of magnitude better than using executemany().

In my own test execute_batch() is approximately twice as fast as executemany(), and gives the option to configure the page_size for further tweaking (if you want to squeeze the last 2-3% of performance out of the driver).

If you are using SQLAlchemy, the same feature can easily be enabled by passing use_batch_mode=True as a parameter to create_engine() when you instantiate the engine.
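
To see why paging cuts down round trips, here is a small sketch that only counts trips. It is not psycopg2's code and never touches a database; the paginate helper, the sample parameters and the page size of 100 are assumptions chosen for illustration.

```python
def paginate(params, page_size):
    """Yield consecutive slices of `params`, each at most `page_size` long."""
    for start in range(0, len(params), page_size):
        yield params[start:start + page_size]

# 1050 hypothetical parameter tuples for one INSERT statement.
params = [(i, "name %d" % i) for i in range(1050)]

# Sending each parameter set on its own costs one round trip per row.
round_trips_one_by_one = len(params)

# Sending one page per round trip costs one trip per page instead.
pages = list(paginate(params, 100))
round_trips_batched = len(pages)
last_page_size = len(pages[-1])
```

With these numbers the batched variant needs 11 trips instead of 1050, which is the kind of reduction the quoted documentation is referring to; larger pages trade fewer trips for bigger statements.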


回答 8

这是一种方法:

values = [1, 2, 3]
Foo.__table__.insert().execute([{'bar': x} for x in values])

这样插入:

INSERT INTO `foo` (`bar`) VALUES (1), (2), (3)

参考:SQLAlchemy FAQ包含各种提交方法的基准。

This is a way:

values = [1, 2, 3]
Foo.__table__.insert().execute([{'bar': x} for x in values])

This will insert like this:

INSERT INTO `foo` (`bar`) VALUES (1), (2), (3)

Reference: The SQLAlchemy FAQ includes benchmarks for various commit methods.


回答 9

到目前为止,我发现的最佳答案是在sqlalchemy文档中:

http://docs.sqlalchemy.org/en/latest/faq/performance.html#im-inserting-400-000-rows-with-the-orm-and-it-s-really-slow

有一个完整示例说明了可能的解决方案基准。

如文档所示:

bulk_save_objects不是最佳解决方案,但其性能是正确的。

就可读性而言,第二好的实现是我认为使用SQLAlchemy Core:

def test_sqlalchemy_core(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    engine.execute(
        Customer.__table__.insert(),
            [{"name": 'NAME ' + str(i)} for i in xrange(n)]
    )

文档文章中提供了此功能的上下文。

The best answer I found so far was in sqlalchemy documentation:

http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow

There is a complete example of a benchmark of possible solutions.

As shown in the documentation:

bulk_save_objects is not the fastest solution, but its performance is acceptable.

In terms of readability, I think the second-best implementation is the one using SQLAlchemy Core:

def test_sqlalchemy_core(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    engine.execute(
        Customer.__table__.insert(),
        [{"name": 'NAME ' + str(i)} for i in range(n)]
    )

The context of this function is given in the documentation article.


使用SQLAlchemy ORM高效地更新数据库

问题:使用SQLAlchemy ORM高效地更新数据库

我正在启动一个新应用程序,并考虑使用ORM,尤其是SQLAlchemy。

假设我的数据库中有一列“ foo”,我想增加它。在直通sqlite中,这很容易:

db = sqlite3.connect('mydata.sqlitedb')
cur = db.cursor()
cur.execute('update stuff set foo = foo + 1')

我弄清楚了SQLAlchemy SQL-builder等效项:

engine = sqlalchemy.create_engine('sqlite:///mydata.sqlitedb')
md = sqlalchemy.MetaData(engine)
table = sqlalchemy.Table('stuff', md, autoload=True)
upd = table.update(values={table.c.foo:table.c.foo+1})
engine.execute(upd)

这稍微慢一点,但是没有太多。

这是我对SQLAlchemy ORM方法的最佳猜测:

# snip definition of Stuff class made using declarative_base
# snip creation of session object
for c in session.query(Stuff):
    c.foo = c.foo + 1
session.flush()
session.commit()

这样做是正确的,但所需的时间是其他两种方法的近50倍。我认为这是因为它必须先将所有数据带入内存,然后才能使用它。

有什么方法可以使用SQLAlchemy的ORM生成高效的SQL?还是使用其他任何Python ORM?还是我应该回到手工编写SQL?

I’m starting a new application and looking at using an ORM — in particular, SQLAlchemy.

Say I’ve got a column ‘foo’ in my database and I want to increment it. In straight sqlite, this is easy:

db = sqlite3.connect('mydata.sqlitedb')
cur = db.cursor()
cur.execute('update stuff set foo = foo + 1')

I figured out the SQLAlchemy SQL-builder equivalent:

engine = sqlalchemy.create_engine('sqlite:///mydata.sqlitedb')
md = sqlalchemy.MetaData(engine)
table = sqlalchemy.Table('stuff', md, autoload=True)
upd = table.update(values={table.c.foo:table.c.foo+1})
engine.execute(upd)

This is slightly slower, but there’s not much in it.

Here’s my best guess for a SQLAlchemy ORM approach:

# snip definition of Stuff class made using declarative_base
# snip creation of session object
for c in session.query(Stuff):
    c.foo = c.foo + 1
session.flush()
session.commit()

This does the right thing, but it takes just under fifty times as long as the other two approaches. I presume that’s because it has to bring all the data into memory before it can work with it.

Is there any way to generate the efficient SQL using SQLAlchemy’s ORM? Or using any other python ORM? Or should I just go back to writing the SQL by hand?


回答 0

SQLAlchemy的ORM旨在与SQL层一起使用,而不是将其隐藏。但是,在同一事务中使用ORM和纯SQL时,您必须牢记一两件事。基本上,从一方面讲,仅当您从会话中清除更改时,ORM数据修改才会命中数据库。另一方面,SQL数据操作语句不会影响会话中的对象。

所以如果你说

for c in session.query(Stuff).all():
    c.foo = c.foo+1
session.commit()

它会按照说的去做,从数据库中获取所有对象,修改所有对象,然后在需要时将更改刷新到数据库中,一行一行地更新。

相反,您应该这样做:

session.execute(update(stuff_table, values={stuff_table.c.foo: stuff_table.c.foo + 1}))
session.commit()

这将像您期望的那样作为一个查询执行,并且因为至少默认会话配置在提交时使会话中的所有数据失效,所以您没有任何过时的数据问题。

在即将发布的0.5系列中,您还可以使用以下方法进行更新:

session.query(Stuff).update({Stuff.foo: Stuff.foo + 1})
session.commit()

基本上,它将运行与上一片段相同的SQL语句,但还会选择更改的行并使会话中的所有过时数据过期。如果您知道更新后没有使用任何会话数据,则也可以synchronize_session=False将其添加到update语句中并摆脱该选择。

SQLAlchemy’s ORM is meant to be used together with the SQL layer, not hide it. But you do have to keep one or two things in mind when using the ORM and plain SQL in the same transaction. Basically, from one side, ORM data modifications will only hit the database when you flush the changes from your session. From the other side, SQL data manipulation statements don’t affect the objects that are in your session.

So if you say

for c in session.query(Stuff).all():
    c.foo = c.foo+1
session.commit()

it will do what it says, go fetch all the objects from the database, modify all the objects and then when it’s time to flush the changes to the database, update the rows one by one.

Instead you should do this:

session.execute(update(stuff_table, values={stuff_table.c.foo: stuff_table.c.foo + 1}))
session.commit()

This will execute as one query as you would expect, and because at least the default session configuration expires all data in the session on commit you don’t have any stale data issues.

In the almost-released 0.5 series you could also use this method for updating:

session.query(Stuff).update({Stuff.foo: Stuff.foo + 1})
session.commit()

That will basically run the same SQL statement as the previous snippet, but also select the changed rows and expire any stale data in the session. If you know you aren’t using any session data after the update you could also add synchronize_session=False to the update statement and get rid of that select.
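
To make the difference concrete, here is a minimal sketch that uses only the standard-library sqlite3 module rather than SQLAlchemy itself. The table name stuff and the column foo come from the question; the helper function names and the in-memory database are assumptions made for illustration. It contrasts the row-by-row pattern that the ORM loop ends up issuing with the single set-based UPDATE.

```python
import sqlite3

def make_db(n=1000):
    """Create an in-memory table 'stuff' with n rows, all with foo = 0."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, foo INTEGER)")
    db.executemany("INSERT INTO stuff (foo) VALUES (?)", [(0,)] * n)
    db.commit()
    return db

def increment_row_by_row(db):
    """Fetch every row into Python, then issue one UPDATE per row."""
    rows = db.execute("SELECT id, foo FROM stuff").fetchall()
    for row_id, foo in rows:
        db.execute("UPDATE stuff SET foo = ? WHERE id = ?", (foo + 1, row_id))
    db.commit()
    return len(rows)  # number of UPDATE statements sent

def increment_in_one_statement(db):
    """Let the database do the arithmetic in a single UPDATE."""
    cur = db.execute("UPDATE stuff SET foo = foo + 1")
    db.commit()
    return cur.rowcount  # rows touched by that one statement

db = make_db()
statements_sent = increment_row_by_row(db)
rows_touched = increment_in_one_statement(db)
total = db.execute("SELECT SUM(foo) FROM stuff").fetchone()[0]
```

Both functions leave the data in the same state, but the first sends one statement per row while the second sends exactly one, which is where most of the time difference reported in the question is likely to come from.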


回答 1

session.query(Clients).filter(Clients.id.in_(client_id_list)).update({'status': status}, synchronize_session=False)
session.commit()

试试这个=)

session.query(Clients).filter(Clients.id.in_(client_id_list)).update({'status': status}, synchronize_session=False)
session.commit()

Try this =)


回答 2

有几种使用sqlalchemy进行更新的方法

1) for c in session.query(Stuff).all():
       c.foo += 1
   session.commit()

2) session.query(Stuff).\
       update({"foo": (Stuff.foo + 1)})
   session.commit()

3) conn = engine.connect()
   stmt = Stuff.__table__.update().\
       values(foo=(Stuff.__table__.c.foo + 1))
   conn.execute(stmt)

There are several ways to UPDATE using sqlalchemy

1) for c in session.query(Stuff).all():
       c.foo += 1
   session.commit()

2) session.query(Stuff).\
       update({"foo": (Stuff.foo + 1)})
   session.commit()

3) conn = engine.connect()
   stmt = Stuff.__table__.update().\
       values(foo=(Stuff.__table__.c.foo + 1))
   conn.execute(stmt)

回答 3

这是一个无需手动映射字段即可解决相同问题的示例:

from sqlalchemy import Column, ForeignKey, Integer, String, Date, DateTime, text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.attributes import InstrumentedAttribute

engine = create_engine('postgres://postgres@localhost:5432/database')
session = sessionmaker()
session.configure(bind=engine)

Base = declarative_base()


class Media(Base):
  __tablename__ = 'media'
  id = Column(Integer, primary_key=True)
  title = Column(String, nullable=False)
  slug = Column(String, nullable=False)
  type = Column(String, nullable=False)

  def update(self):
    s = session()
    mapped_values = {}
    for item in Media.__dict__.iteritems():
      field_name = item[0]
      field_type = item[1]
      is_column = isinstance(field_type, InstrumentedAttribute)
      if is_column:
        mapped_values[field_name] = getattr(self, field_name)

    s.query(Media).filter(Media.id == self.id).update(mapped_values)
    s.commit()

因此,要更新Media实例,您可以执行以下操作:

media = Media(id=123, title="Titular Line", slug="titular-line", type="movie")
media.update()

Here’s an example of how to solve the same problem without having to map the fields manually:

from sqlalchemy import Column, ForeignKey, Integer, String, Date, DateTime, text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.attributes import InstrumentedAttribute

engine = create_engine('postgresql://postgres@localhost:5432/database')
session = sessionmaker()
session.configure(bind=engine)

Base = declarative_base()


class Media(Base):
  __tablename__ = 'media'
  id = Column(Integer, primary_key=True)
  title = Column(String, nullable=False)
  slug = Column(String, nullable=False)
  type = Column(String, nullable=False)

  def update(self):
    s = session()
    mapped_values = {}
    for field_name, field_type in Media.__dict__.items():
      # Only mapped columns are InstrumentedAttribute instances
      is_column = isinstance(field_type, InstrumentedAttribute)
      if is_column:
        mapped_values[field_name] = getattr(self, field_name)

    s.query(Media).filter(Media.id == self.id).update(mapped_values)
    s.commit()

So to update a Media instance, you can do something like this:

media = Media(id=123, title="Titular Line", slug="titular-line", type="movie")
media.update()

回答 4

经过足够的测试,我会尝试:

for c in session.query(Stuff).all():
     c.foo = c.foo+1
session.commit()

(IIRC,commit()在不使用flush()的情况下工作)。

我发现有时执行大型查询然后在python中进行迭代比许多查询快2个数量级。我假设遍历查询对象的效率不及遍历查询对象的all()方法生成的列表的效率。

[请注意下面的评论-这根本没有加快速度]。

Without testing, I’d try:

for c in session.query(Stuff).all():
     c.foo = c.foo+1
session.commit()

(IIRC, commit() works without flush()).

I’ve found that at times doing a large query and then iterating in python can be up to 2 orders of magnitude faster than lots of queries. I assume that iterating over the query object is less efficient than iterating over a list generated by the all() method of the query object.

[Please note comment below – this did not speed things up at all].


回答 5

如果是由于创建对象方面的开销,那么使用SA可能根本无法加速。

如果是因为它正在加载相关对象,那么您可以通过延迟加载来执行某些操作。是否存在大量由于引用而创建的对象?(即,获取Company对象也将获取所有相关的People对象)。

If it is because of the overhead in terms of creating objects, then it probably can’t be sped up at all with SA.

If it is because it is loading up related objects, then you might be able to do something with lazy loading. Are there lots of objects being created due to references? (i.e., getting a Company object also gets all of the related People objects).


小马(ORM)如何发挥作用?

问题:小马(ORM)如何发挥作用?

Pony ORM很好地把生成器表达式转换成SQL。例:

>>> select(p for p in Person if p.name.startswith('Paul'))
        .order_by(Person.name)[:2]

SELECT "p"."id", "p"."name", "p"."age"
FROM "Person" "p"
WHERE "p"."name" LIKE "Paul%"
ORDER BY "p"."name"
LIMIT 2

[Person[3], Person[1]]
>>>

我知道Python具有出色的自省和内置元编程功能,但是该库如何能够在不进行预处理的情况下转换生成器表达式?看起来像魔术。

[更新]

搅拌器写道:

这是您要查找的文件。似乎可以使用一些自省向导来重构生成器。我不确定它是否支持100%的Python语法,但这很酷。- 搅拌机

我以为他们正在研究生成器表达协议中的某些功能,但正在查看此文件并看到其中ast涉及的模块…不,他们不是在动态检查程序源,是吗?令人振奋…

@BrenBarn:如果我尝试在select函数调用之外调用生成器,则结果为:

>>> x = (p for p in Person if p.age > 20)
>>> x.next()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "<interactive input>", line 1, in <genexpr>
  File "C:\Python27\lib\site-packages\pony\orm\core.py", line 1822, in next
    % self.entity.__name__)
  File "C:\Python27\lib\site-packages\pony\utils.py", line 92, in throw
    raise exc
TypeError: Use select(...) function or Person.select(...) method for iteration
>>>

好像他们在做更多不可思议的事情,例如检查select函数调用和动态处理Python抽象语法语法树。

我仍然希望看到有人对此进行解释,其来源远远超出了我的巫术水平。

Pony ORM does the nice trick of converting a generator expression into SQL. Example:

>>> select(p for p in Person if p.name.startswith('Paul'))
        .order_by(Person.name)[:2]

SELECT "p"."id", "p"."name", "p"."age"
FROM "Person" "p"
WHERE "p"."name" LIKE "Paul%"
ORDER BY "p"."name"
LIMIT 2

[Person[3], Person[1]]
>>>

I know Python has wonderful introspection and metaprogramming built in, but how is this library able to translate the generator expression without preprocessing? It looks like magic.

[update]

Blender wrote:

Here is the file that you’re after. It seems to reconstruct the generator using some introspection wizardry. I’m not sure if it supports 100% of Python’s syntax, but this is pretty cool. – Blender

I was thinking they were exploring some feature of the generator expression protocol, but looking at this file and seeing the ast module involved… No, they are not inspecting the program source on the fly, are they? Mind-blowing…

@BrenBarn: If I try to call the generator outside the select function call, the result is:

>>> x = (p for p in Person if p.age > 20)
>>> x.next()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "<interactive input>", line 1, in <genexpr>
  File "C:\Python27\lib\site-packages\pony\orm\core.py", line 1822, in next
    % self.entity.__name__)
  File "C:\Python27\lib\site-packages\pony\utils.py", line 92, in throw
    raise exc
TypeError: Use select(...) function or Person.select(...) method for iteration
>>>

Seems like they are doing more arcane incantations like inspecting the select function call and processing the Python abstract syntax grammar tree on the fly.

I still would like to see someone explaining it, the source is way beyond my wizardry level.


回答 0

小马ORM作者在这里。

Pony通过三个步骤将Python生成器转换为SQL查询:

  1. 反编译生成器字节码并重建生成器AST(抽象语法树)
  2. 将Python AST转换为“抽象SQL”-SQL查询的基于列表的通用表示形式
  3. 将抽象SQL表示转换为特定于数据库的SQL方言

最复杂的部分是第二步,其中Pony必须了解Python表达式的“含义”。似乎您对第一步最感兴趣,所以让我解释一下反编译的工作原理。

让我们考虑以下查询:

>>> from pony.orm.examples.estore import *
>>> select(c for c in Customer if c.country == 'USA').show()

将其转换为以下SQL:

SELECT "c"."id", "c"."email", "c"."password", "c"."name", "c"."country", "c"."address"
FROM "Customer" "c"
WHERE "c"."country" = 'USA'

下面是该查询的结果,将其打印出来:

id|email              |password|name          |country|address  
--+-------------------+--------+--------------+-------+---------
1 |john@example.com   |***     |John Smith    |USA    |address 1
2 |matthew@example.com|***     |Matthew Reed  |USA    |address 2
4 |rebecca@example.com|***     |Rebecca Lawson|USA    |address 4

select()函数接受python生成器作为参数,然后分析其字节码。我们可以使用标准的python dis模块获取此生成器的字节码指令:

>>> gen = (c for c in Customer if c.country == 'USA')
>>> import dis
>>> dis.dis(gen.gi_frame.f_code)
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                26 (to 32)
              6 STORE_FAST               1 (c)
              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)
             21 POP_JUMP_IF_FALSE        3
             24 LOAD_FAST                1 (c)
             27 YIELD_VALUE         
             28 POP_TOP             
             29 JUMP_ABSOLUTE            3
        >>   32 LOAD_CONST               1 (None)
             35 RETURN_VALUE

Pony ORM decompile()在模块内pony.orm.decompiling具有可以从字节码恢复AST 的功能:

>>> from pony.orm.decompiling import decompile
>>> ast, external_names = decompile(gen)

在这里,我们可以看到AST节点的文本表示形式:

>>> ast
GenExpr(GenExprInner(Name('c'), [GenExprFor(AssName('c', 'OP_ASSIGN'), Name('.0'),
[GenExprIf(Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]))])]))

现在让我们看看该decompile()函数是如何工作的。

decompile()函数创建一个Decompiler对象,该对象实现了Visitor模式。反编译器实例一一获取字节码指令。对于每条指令,反编译器对象都会调用其自己的方法。该方法的名称等于当前字节码指令的名称。

Python计算表达式时,它使用堆栈,该堆栈存储中间的计算结果。反编译器对象也有自己的堆栈,但是该堆栈不存储表达式计算的结果,而是存储表达式的AST节点。

当调用下一个字节码指令的反编译器方法时,它将从堆栈中取出AST节点,将它们组合成一个新的AST节点,然后将该节点放在堆栈的顶部。

例如,让我们看看如何c.country == 'USA'计算子表达式。相应的字节码片段为:

              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)

因此,反编译器对象执行以下操作:

  1. 来电decompiler.LOAD_FAST('c')。此方法将Name('c')节点放在反编译器堆栈的顶部。
  2. 来电decompiler.LOAD_ATTR('country')。此方法Name('c')从堆栈中取出节点,创建该Geattr(Name('c'), 'country')节点并将其放在堆栈顶部。
  3. 来电decompiler.LOAD_CONST('USA')。此方法将Const('USA')节点放在堆栈顶部。
  4. 来电decompiler.COMPARE_OP('==')。此方法从堆栈中获取两个节点(Getattr和Const),然后将其Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]) 放在堆栈的顶部。

在处理完所有字节码指令之后,反编译器堆栈将包含一个与整个生成器表达式相对应的AST节点。

由于Pony ORM仅需要反编译生成器和lambda,因此并没有那么复杂,因为生成器的指令流相对简单-它只是一堆嵌套循环。

目前,Pony ORM涵盖了整个生成器指令集,但以下两点除外:

  1. 内联if表达式: a if b else c
  2. 复合比较: a < b < c

如果Pony遇到此类表达,则会引发NotImplementedError异常。但是即使在这种情况下,您也可以通过将生成器表达式作为字符串传递来使其工作。当您将生成器作为字符串传递时,Pony不使用反编译器模块。相反,它使用标准Python compiler.parse函数获取AST 。

希望这能回答您的问题。

Pony ORM author here.

Pony translates a Python generator into a SQL query in three steps:

  1. Decompiling of generator bytecode and rebuilding generator AST (abstract syntax tree)
  2. Translation of Python AST into “abstract SQL” — universal list-based representation of a SQL query
  3. Converting abstract SQL representation into specific database-dependent SQL dialect

The most complex part is the second step, where Pony must understand the “meaning” of Python expressions. Seems you are most interested in the first step, so let me explain how decompiling works.

Let’s consider this query:

>>> from pony.orm.examples.estore import *
>>> select(c for c in Customer if c.country == 'USA').show()

Which will be translated into the following SQL:

SELECT "c"."id", "c"."email", "c"."password", "c"."name", "c"."country", "c"."address"
FROM "Customer" "c"
WHERE "c"."country" = 'USA'

And below is the result of this query which will be printed out:

id|email              |password|name          |country|address  
--+-------------------+--------+--------------+-------+---------
1 |john@example.com   |***     |John Smith    |USA    |address 1
2 |matthew@example.com|***     |Matthew Reed  |USA    |address 2
4 |rebecca@example.com|***     |Rebecca Lawson|USA    |address 4

The select() function accepts a Python generator as an argument and then analyzes its bytecode. We can get the bytecode instructions of this generator using the standard Python dis module:

>>> gen = (c for c in Customer if c.country == 'USA')
>>> import dis
>>> dis.dis(gen.gi_frame.f_code)
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                26 (to 32)
              6 STORE_FAST               1 (c)
              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)
             21 POP_JUMP_IF_FALSE        3
             24 LOAD_FAST                1 (c)
             27 YIELD_VALUE         
             28 POP_TOP             
             29 JUMP_ABSOLUTE            3
        >>   32 LOAD_CONST               1 (None)
             35 RETURN_VALUE

Pony ORM has the function decompile() within module pony.orm.decompiling which can restore an AST from the bytecode:

>>> from pony.orm.decompiling import decompile
>>> ast, external_names = decompile(gen)

Here, we can see the textual representation of the AST nodes:

>>> ast
GenExpr(GenExprInner(Name('c'), [GenExprFor(AssName('c', 'OP_ASSIGN'), Name('.0'),
[GenExprIf(Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]))])]))

Let’s now see how the decompile() function works.

The decompile() function creates a Decompiler object, which implements the Visitor pattern. The decompiler instance receives bytecode instructions one by one. For each instruction, the decompiler object calls one of its own methods, whose name is the same as the name of the current bytecode instruction.

When Python evaluates an expression, it uses a stack that stores the intermediate results of the calculation. The decompiler object also has its own stack, but this stack stores the AST nodes for the expression rather than the computed values.

When the decompiler method for the next bytecode instruction is called, it takes AST nodes from the stack, combines them into a new AST node, and then puts this node on top of the stack.

For example, let’s see how the subexpression c.country == 'USA' is calculated. The corresponding bytecode fragment is:

              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)

So, the decompiler object does the following:

  1. Calls decompiler.LOAD_FAST('c'). This method puts the Name('c') node on the top of the decompiler stack.
  2. Calls decompiler.LOAD_ATTR('country'). This method takes the Name('c') node from the stack, creates the Getattr(Name('c'), 'country') node and puts it on the top of the stack.
  3. Calls decompiler.LOAD_CONST('USA'). This method puts the Const('USA') node on top of the stack.
  4. Calls decompiler.COMPARE_OP('=='). This method takes two nodes (Getattr and Const) from the stack, and then puts Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]) on the top of the stack.

After all bytecode instructions are processed, the decompiler stack contains a single AST node which corresponds to the whole generator expression.
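
The four steps above can be sketched as a toy model. This is not Pony's actual Decompiler class: the ToyDecompiler name, the tuple-based node shapes and the hard-coded instruction list are all assumptions made for illustration. The instructions are written out by hand instead of being read from real bytecode, because opcode names differ between Python versions.

```python
class ToyDecompiler:
    """Rebuilds an expression tree from a list of (opname, argument) pairs."""

    def __init__(self):
        self.stack = []  # holds AST-like tuples, not computed values

    def run(self, instructions):
        for opname, arg in instructions:
            # Dispatch to the method named after the instruction.
            getattr(self, opname)(arg)
        return self.stack

    def LOAD_FAST(self, name):
        # Push a node for a local variable.
        self.stack.append(("Name", name))

    def LOAD_ATTR(self, attr):
        # Replace the top node with an attribute access on it.
        obj = self.stack.pop()
        self.stack.append(("Getattr", obj, attr))

    def LOAD_CONST(self, value):
        # Push a node for a literal value.
        self.stack.append(("Const", value))

    def COMPARE_OP(self, op):
        # Pop both operands (right one first) and push one comparison node.
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(("Compare", left, [(op, right)]))

# The four instructions for the subexpression: c.country == 'USA'
fragment = [
    ("LOAD_FAST", "c"),
    ("LOAD_ATTR", "country"),
    ("LOAD_CONST", "USA"),
    ("COMPARE_OP", "=="),
]
result = ToyDecompiler().run(fragment)
```

After the run the stack holds a single Compare node whose nesting mirrors the textual AST shown earlier, which is the point of keeping nodes rather than values on the stack.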

Since Pony ORM needs to decompile generators and lambdas only, this is not that complex, because the instruction flow for a generator is relatively straightforward – it is just a bunch of nested loops.

Currently Pony ORM covers the whole generator instruction set except for two things:

  1. Inline if expressions: a if b else c
  2. Compound comparisons: a < b < c

If Pony encounters such an expression, it raises a NotImplementedError exception. But even in this case you can make it work by passing the generator expression as a string. When you pass a generator as a string, Pony doesn’t use the decompiler module. Instead, it gets the AST using the standard Python compiler.parse function.
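
As a rough illustration of the string route: the compiler module mentioned above exists only in Python 2, so this sketch uses the ast module from the Python 3 standard library instead. It is not Pony's code and produces different node classes from the ones printed earlier; it only shows that the same pieces of the generator expression can be recovered directly from source text.

```python
import ast

source = "(c for c in Customer if c.country == 'USA')"

# mode="eval" parses a single expression; the generator is the tree's body.
tree = ast.parse(source, mode="eval")

genexp = tree.body                     # the generator expression node
comprehension = genexp.generators[0]   # the single 'for c in Customer' clause
condition = comprehension.ifs[0]       # the comparison c.country == 'USA'

loop_variable = comprehension.target.id                    # name bound by the loop
iterable_name = comprehension.iter.id                      # name being iterated
attribute = condition.left.attr                            # attribute on the left side
operator = type(condition.ops[0]).__name__                 # comparison operator class
constant = ast.literal_eval(condition.comparators[0])      # right-hand literal
```

Nothing here needs the Customer class to exist, since the source is only parsed and never evaluated.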

Hope this answers your question.


按属性筛选

问题:按属性筛选

是否可以通过模型属性过滤Django查询集?

我的模型中有一个方法:

@property
def myproperty(self):
    [..]

现在我想按此属性进行过滤,例如:

MyModel.objects.filter(myproperty=[..])

这有可能吗?

Is it possible to filter a Django queryset by model property?

I have a method in my model:

@property
def myproperty(self):
    [..]

and now I want to filter by this property like:

MyModel.objects.filter(myproperty=[..])

Is this somehow possible?


回答 0

不。Django过滤器在数据库级别运行,生成SQL。要基于Python属性进行过滤,您必须将对象加载到Python中以评估该属性-到那时,您已经完成了加载该对象的所有工作。

Nope. Django filters operate at the database level, generating SQL. To filter based on Python properties, you have to load the object into Python to evaluate the property–and at that point, you’ve already done all the work to load it.


回答 1

我可能会误解您的原始问题,但是python中内置了一个过滤器

filtered = filter(lambda x: x.myproperty, MyModel.objects.all())

但是最好使用列表理解

filtered = [x for x in MyModel.objects.all() if x.myproperty]

甚至更好的是生成器表达式

filtered = (x for x in MyModel.objects.all() if x.myproperty)

I might be misunderstanding your original question, but there is a filter builtin in Python.

filtered = filter(lambda x: x.myproperty, MyModel.objects.all())

But it’s better to use a list comprehension:

filtered = [x for x in MyModel.objects.all() if x.myproperty]

or even better, a generator expression:

filtered = (x for x in MyModel.objects.all() if x.myproperty)

回答 2

摆脱@TheGrimmScientist建议的解决方法,您可以通过在Manager或QuerySet上定义这些“ sql属性”,然后重新使用/链接/组成它们,来制成这些“ sql属性”:

与经理一起:

class CompanyManager(models.Manager):
    def with_chairs_needed(self):
        return self.annotate(chairs_needed=F('num_employees') - F('num_chairs'))

class Company(models.Model):
    # ...
    objects = CompanyManager()

Company.objects.with_chairs_needed().filter(chairs_needed__lt=4)

使用QuerySet:

class CompanyQuerySet(models.QuerySet):
    def many_employees(self, n=50):
        return self.filter(num_employees__gte=n)

    def needs_fewer_chairs_than(self, n=5):
        return self.with_chairs_needed().filter(chairs_needed__lt=n)

    def with_chairs_needed(self):
        return self.annotate(chairs_needed=F('num_employees') - F('num_chairs'))

class Company(models.Model):
    # ...
    objects = CompanyQuerySet.as_manager()

Company.objects.needs_fewer_chairs_than(4).many_employees()

有关更多信息,请参见https://docs.djangoproject.com/en/1.9/topics/db/managers/。请注意,我将关闭文档,并且尚未测试以上内容。

Riffing off @TheGrimmScientist’s suggested workaround, you can make these “sql properties” by defining them on the Manager or the QuerySet, and reuse/chain/compose them:

With a Manager:

class CompanyManager(models.Manager):
    def with_chairs_needed(self):
        return self.annotate(chairs_needed=F('num_employees') - F('num_chairs'))

class Company(models.Model):
    # ...
    objects = CompanyManager()

Company.objects.with_chairs_needed().filter(chairs_needed__lt=4)

With a QuerySet:

class CompanyQuerySet(models.QuerySet):
    def many_employees(self, n=50):
        return self.filter(num_employees__gte=n)

    def needs_fewer_chairs_than(self, n=5):
        return self.with_chairs_needed().filter(chairs_needed__lt=n)

    def with_chairs_needed(self):
        return self.annotate(chairs_needed=F('num_employees') - F('num_chairs'))

class Company(models.Model):
    # ...
    objects = CompanyQuerySet.as_manager()

Company.objects.needs_fewer_chairs_than(4).many_employees()

See https://docs.djangoproject.com/en/1.9/topics/db/managers/ for more. Note that I am going off the documentation and have not tested the above.


回答 3

看起来将F()与批注一起使用将是我的解决方案。

它不会被过滤@property,因为F在将对象带入python之前会与数据库进行对话。但是仍然把它作为答案,因为我想要按属性过滤的原因实际上是希望通过对两个不同字段进行简单算术运算的结果来过滤对象。

因此,类似以下内容:

companies = Company.objects\
    .annotate(chairs_needed=F('num_employees') - F('num_chairs'))\
    .filter(chairs_needed__lt=4)

而不是将属性定义为:

@property
def chairs_needed(self):
    return self.num_employees - self.num_chairs

然后对所有对象进行列表理解。

Looks like using F() with annotations will be my solution to this.

It’s not going to filter by @property, since F talks to the database before objects are brought into Python. But I’m still putting it here as an answer, since my reason for wanting to filter by property was really wanting to filter objects by the result of simple arithmetic on two different fields.

so, something along the lines of:

companies = Company.objects\
    .annotate(chairs_needed=F('num_employees') - F('num_chairs'))\
    .filter(chairs_needed__lt=4)

rather than defining the property to be:

@property
def chairs_needed(self):
    return self.num_employees - self.num_chairs

then doing a list comprehension across all objects.
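
The two routes can be compared side by side in a small sketch. This is not Django: it uses the standard-library sqlite3 module as a stand-in, and the WHERE clause is only roughly what the annotate/filter call is expected to produce. The table name, the sample companies and the plain Company class are assumptions made for illustration.

```python
import sqlite3

class Company:
    """Plain-Python stand-in for the model, with the property from the answer."""

    def __init__(self, name, num_employees, num_chairs):
        self.name = name
        self.num_employees = num_employees
        self.num_chairs = num_chairs

    @property
    def chairs_needed(self):
        return self.num_employees - self.num_chairs

# Hypothetical sample data: (name, num_employees, num_chairs).
rows = [("Acme", 10, 8), ("Globex", 50, 20), ("Initech", 5, 5)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE company (name TEXT, num_employees INTEGER, num_chairs INTEGER)")
db.executemany("INSERT INTO company VALUES (?, ?, ?)", rows)

# Database side: the arithmetic is part of the WHERE clause,
# so only matching rows ever leave the database.
in_sql = [r[0] for r in db.execute(
    "SELECT name FROM company "
    "WHERE num_employees - num_chairs < 4 ORDER BY name")]

# Python side: every row is loaded first, then the property is evaluated.
companies = [Company(*r) for r in db.execute(
    "SELECT name, num_employees, num_chairs FROM company ORDER BY name")]
in_python = [c.name for c in companies if c.chairs_needed < 4]
```

Both lists come out the same; the difference is that the second approach has to build an object for every row before it can discard any of them.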


回答 4

我遇到了同样的问题,并开发了这个简单的解决方案:

objects_id = [x.id for x in MyModel.objects.all() if x.myProperty == [...]]
MyModel.objects.filter(id__in=objects_id)

我知道这不是最有效的解决方案,但在像我这样的简单情况下可能会有所帮助

I had the same problem, and I developed this simple solution:

objects_id = [x.id for x in MyModel.objects.all() if x.myProperty == [...]]
MyModel.objects.filter(id__in=objects_id)

I know it’s not the most performant solution, but it may help in simple cases like mine.


回答 5

请有人纠正我,但我想至少在我自己的情况下,我已经找到了解决方案。

我想处理所有属性完全等于…的元素。

但是我有几个模型,这个例程应该适用于所有模型。它确实:

def selectByProperties(modelType, specify):
    clause = "SELECT * from %s" % modelType._meta.db_table

    if len(specify) > 0:
        clause += " WHERE "
        for field, eqvalue in specify.items():
            clause += "%s = '%s' AND " % (field, eqvalue)
        clause = clause [:-5]  # remove last AND

    print clause
    return modelType.objects.raw(clause)

通过这个通用子例程,我可以选择与我的“ specify”(属性名称,属性值)组合的字典完全相等的所有元素。

第一个参数采用(models.Model),

第二个字典,例如:{“ property1”:“ 77”,“ property2”:“ 12”}

它创建了一条SQL语句,例如

SELECT * from appname_modelname WHERE property1 = '77' AND property2 = '12'

并在这些元素上返回QuerySet。

这是一个测试功能:

from myApp.models import myModel

def testSelectByProperties ():

    specify = {"property1" : "77" , "property2" : "12"}
    subset = selectByProperties(myModel, specify)

    nameField = "property0"
    ## checking if that is what I expected:
    for i in subset:
        print i.__dict__[nameField], 
        for j in specify.keys():
             print i.__dict__[j], 
        print 

和?你怎么看?

PLEASE someone correct me, but I guess I have found a solution, at least for my own case.

I want to work on all those elements whose properties are exactly equal to … whatever.

But I have several models, and this routine should work for all models. And it does:

def selectByProperties(modelType, specify):
    clause = "SELECT * from %s" % modelType._meta.db_table

    if len(specify) > 0:
        clause += " WHERE "
        for field, eqvalue in specify.items():
            clause += "%s = '%s' AND " % (field, eqvalue)
        clause = clause [:-5]  # remove last AND

    print(clause)
    return modelType.objects.raw(clause)

With this universal subroutine, I can select all those elements which exactly equal my dictionary of ‘specify’ (propertyname,propertyvalue) combinations.

The first parameter takes a (models.Model),

the second a dictionary like: {“property1” : “77” , “property2” : “12”}

And it creates an SQL statement like

SELECT * from appname_modelname WHERE property1 = '77' AND property2 = '12'

and returns a QuerySet on those elements.

This is a test function:

from myApp.models import myModel

def testSelectByProperties ():

    specify = {"property1" : "77" , "property2" : "12"}
    subset = selectByProperties(myModel, specify)

    nameField = "property0"
    ## checking if that is what I expected:
    for i in subset:
        print(i.__dict__[nameField], end=" ")
        for j in specify.keys():
            print(i.__dict__[j], end=" ")
        print()

And? What do you think?
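
One thing worth correcting in the routine above: it pastes the values straight into the SQL text, so a value containing a quote breaks the statement or opens it to SQL injection. Below is a minimal sketch of the same idea with the values passed separately. It uses the standard-library sqlite3 module, whose placeholder is ?, as a stand-in; Django's raw() takes a params argument and uses %s placeholders instead. The build_select helper, the allowed_fields check and the sample rows are assumptions made for illustration.

```python
import sqlite3

def build_select(table, specify, allowed_fields):
    """Return (sql, params) with values kept out of the SQL text.

    Column names cannot be sent as parameters, so they are checked
    against a known set instead. `table` is assumed to be trusted.
    """
    unknown = set(specify) - set(allowed_fields)
    if unknown:
        raise ValueError("unknown fields: %s" % sorted(unknown))
    sql = "SELECT * FROM %s" % table
    fields = sorted(specify)  # fixed order keeps SQL and params aligned
    if fields:
        sql += " WHERE " + " AND ".join("%s = ?" % f for f in fields)
    return sql, [specify[f] for f in fields]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE appname_modelname "
           "(property0 TEXT, property1 TEXT, property2 TEXT)")
db.executemany("INSERT INTO appname_modelname VALUES (?, ?, ?)",
               [("x", "77", "12"), ("y", "77", "13"), ("z", "1", "12")])

sql, params = build_select(
    "appname_modelname",
    {"property1": "77", "property2": "12"},
    allowed_fields={"property0", "property1", "property2"},
)
matches = db.execute(sql, params).fetchall()
```

For plain equality on real model fields, MyModel.objects.filter(**specify) covers the same ground without any hand-written SQL.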


回答 6

我知道这是一个古老的问题,但是对于那些跳到这里的人来说,我认为阅读下面的问题和相对答案很有用:

如何在Django 1.4中自定义管理过滤器

I know it is an old question, but for the sake of those landing here, I think it is useful to read the question below and the related answer:

How to customize admin filter in Django 1.4


回答 7

它也可能通过使用查询集批注复制属性get / set-逻辑,建议如@rattray@thegrimmscientist会同property。这可能会产生在Python级别数据库级别都可以使用的东西。

但是,不确定其缺点:请参阅此SO问题作为示例。

It may also be possible to use queryset annotations that duplicate the property get/set-logic, as suggested e.g. by @rattray and @thegrimmscientist, in conjunction with the property. This could yield something that works both on the Python level and on the database level.

Not sure about the drawbacks, however: see this SO question for an example.


SQLAlchemy:如何过滤日期字段?

问题:SQLAlchemy:如何过滤日期字段?

这是模型:

class User(Base):
    ...
    birthday = Column(Date, index=True)   #in database it's like '1987-01-17'
    ...

我想在两个日期之间进行过滤,例如选择间隔18-30年的所有用户。

如何用SQLAlchemy实现它?

我想:

query = DBSession.query(User).filter(
    and_(User.birthday >= '1988-01-17', User.birthday <= '1985-01-17')
) 

# means age >= 24 and age <= 27

我知道这是不正确的,但是该怎么做正确呢?

Here is the model:

class User(Base):
    ...
    birthday = Column(Date, index=True)   #in database it's like '1987-01-17'
    ...

I want to filter between two dates, for example to select all users aged 18 to 30.

How can I implement this with SQLAlchemy?

I think of:

query = DBSession.query(User).filter(
    and_(User.birthday >= '1988-01-17', User.birthday <= '1985-01-17')
) 

# means age >= 24 and age <= 27

I know this is not correct, but what is the correct way to do it?


Answer 0

In fact, your query is almost right. The only problem is that the comparison operators are swapped, so the filter excludes every record. Change the <= to >= and vice versa:

qry = DBSession.query(User).filter(
        and_(User.birthday <= '1988-01-17', User.birthday >= '1985-01-17'))
# or same:
qry = DBSession.query(User).filter(User.birthday <= '1988-01-17').\
        filter(User.birthday >= '1985-01-17')

You can also use between:

qry = DBSession.query(User).filter(User.birthday.between('1985-01-17', '1988-01-17'))
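
The question was really about an age range, so the two date literals have to be derived from today's date. A minimal sketch, assuming ages are counted in whole years. The helper names years_ago and birthday_bounds are hypothetical and not part of SQLAlchemy:

```python
from datetime import date, timedelta


def years_ago(years, today):
    """Return the date `years` years before `today` (Feb 29 falls back to Feb 28)."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:  # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - years, day=28)


def birthday_bounds(min_age, max_age, today=None):
    """Return (earliest, latest) birthdays of people aged min_age..max_age today."""
    today = today or date.today()
    # The oldest allowed person turns max_age + 1 tomorrow at the earliest.
    earliest = years_ago(max_age + 1, today) + timedelta(days=1)
    # The youngest allowed person turned min_age today at the latest.
    latest = years_ago(min_age, today)
    return earliest, latest


earliest, latest = birthday_bounds(18, 30, today=date(2012, 1, 17))
print(earliest, latest)
```

The resulting pair can then be handed to the between filter from the answer, for example User.birthday.between(earliest, latest).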

Answer 1

from app import SQLAlchemyDB as db

Chance.query.filter(Chance.repo_id==repo_id, 
                    Chance.status=="1", 
                    db.func.date(Chance.apply_time)<=end, 
                    db.func.date(Chance.apply_time)>=start).count()

This is roughly equivalent to:

select
   count(id)
from
   Chance
where
   repo_id = :repo_id
   and status = '1'
   and date(apply_time) <= :end
   and date(apply_time) >= :start

Hope this helps.


Answer 2

If you want to cover the whole period:

    from sqlalchemy import and_, func

    query = DBSession.query(User).filter(
        and_(func.date(User.birthday) >= '1985-01-17',
             func.date(User.birthday) <= '1988-01-17'))

That covers the range 1985-01-17 00:00 to 1988-01-17 23:59.
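
Why wrapping the column in date() changes the result is easiest to see when the column holds timestamps rather than plain dates. The following is only a sketch using the standard-library sqlite3 module instead of SQLAlchemy. The events table and its rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, happened_at TEXT)")
conn.executemany(
    "INSERT INTO events (happened_at) VALUES (?)",
    [("1985-01-17 00:00:00",), ("1988-01-17 15:30:00",), ("1988-01-18 00:00:00",)],
)


def count_between(where_clause):
    """Count rows matching the clause, with the same bounds bound as parameters."""
    sql = "SELECT count(*) FROM events WHERE " + where_clause
    bounds = {"start": "1985-01-17", "end": "1988-01-17"}
    return conn.execute(sql, bounds).fetchone()[0]


# Comparing the raw timestamp drops the row from the afternoon of the last day.
raw = count_between("happened_at >= :start AND happened_at <= :end")
# Truncating to the date first keeps every row that falls on the last day.
by_date = count_between("date(happened_at) >= :start AND date(happened_at) <= :end")
print(raw, by_date)
```

For a column that is already a pure Date, as in the question's model, both forms select the same rows.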


What is choice_set in this Django app tutorial?

Question: What is choice_set in this Django app tutorial?

There is this line in the Django tutorial, Writing your first Django app, part 1:

p.choice_set.create(choice='Not much', votes=0)

How is choice_set called into existence and what is it?

I suppose the choice part is the lowercase version of the model Choice used in the tutorial, but what is choice_set? Can you elaborate?

UPDATE: Based on Ben’s answer, I located this documentation: Following relationships “backward”.


Answer 0

You created a foreign key on Choice which relates each one to a Question.

So, each Choice explicitly has a question field, which you declared in the model.

Django’s ORM follows the relationship backwards from Question too, automatically generating an attribute on each Question instance called foo_set, where Foo is the model that has a ForeignKey field pointing to Question.

choice_set is a RelatedManager which can create querysets of Choice objects which relate to the Question instance, e.g. q.choice_set.all()

If you don’t like the foo_set naming which Django chooses automatically, or if you have more than one foreign key to the same model and need to distinguish them, you can choose your own overriding name using the related_name argument to ForeignKey.


What are some good Python ORM solutions? [closed]

Question: What are some good Python ORM solutions? [closed]

I’m evaluating and looking at using CherryPy for a project that’s basically a JavaScript front-end from the client-side (browser) that talks to a Python web service on the back-end. So, I really need something fast and lightweight on the back-end that I can implement using Python that then speaks to the PostgreSQL DB via an ORM (JSON to the browser).

I’m also looking at Django, which I like, since its ORM is built-in. However, I think Django might be a little more than I really need (i.e. more features than I really need == slower?).

Anyone have any experience with different Python ORM solutions that can compare and contrast their features and functionality, speed, efficiency, etc.?


Answer 0

SQLAlchemy is more full-featured and powerful (uses the DataMapper pattern). Django ORM has a cleaner syntax and is easier to write for (ActiveRecord pattern). I don’t know about performance differences.

SQLAlchemy also has a declarative layer that hides some complexity and gives it an ActiveRecord-style syntax more similar to the Django ORM.

I wouldn’t worry about Django being “too heavy.” It’s decoupled enough that you can use the ORM if you want without having to import the rest.

That said, if I were already using CherryPy for the web layer and just needed an ORM, I’d probably opt for SQLAlchemy.


Answer 1

If you’re looking for lightweight and are already familiar with django-style declarative models, check out peewee: https://github.com/coleifer/peewee

Example:

import datetime
from peewee import *

class Blog(Model):
    name = CharField()

class Entry(Model):
    blog = ForeignKeyField(Blog)
    title = CharField()
    body = TextField()
    pub_date = DateTimeField(default=datetime.datetime.now)

# query it like django
Entry.filter(blog__name='Some great blog')

# or programmatically for finer-grained control
Entry.select().join(Blog).where(Blog.name == 'Some awesome blog')

Check the docs for more examples.


Answer 2

Storm has arguably the simplest API:

from storm.locals import *

class Foo:
    __storm_table__ = 'foos'
    id = Int(primary=True)


class Thing:
    __storm_table__ = 'things'
    id = Int(primary=True)
    name = Unicode()
    description = Unicode()
    foo_id = Int()
    foo = Reference(foo_id, Foo.id)

db = create_database('sqlite:')
store = Store(db)

foo = Foo()
store.add(foo)
thing = Thing()
thing.foo = foo
store.add(thing)
store.commit()

And it makes it painless to drop down into raw SQL when you need to:

store.execute('UPDATE bars SET bar_name=? WHERE bar_id like ?', []) 
store.commit()

Answer 3

I usually use SQLAlchemy. It’s pretty powerful and is probably the most mature Python ORM.

If you’re planning on using CherryPy, you might also look into dejavu, as it’s by Robert Brewer (the current CherryPy project leader). I personally haven’t used it, but I do know some people who love it.

SQLObject is a slightly easier ORM to use than SQLAlchemy, but it’s not quite as powerful.

Personally, I wouldn’t use the Django ORM unless I was planning on writing the entire project in Django, but that’s just me.


Answer 4

SQLAlchemy’s declarative extension, which is becoming standard in 0.5, provides an all in one interface very much like that of Django or Storm. It also integrates seamlessly with classes/tables configured using the datamapper style:

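from sqlalchemy import Column, ForeignKey, Integer, Unicode, create_engine
# SQLAlchemy 1.4+; older releases import declarative_base from sqlalchemy.ext.declarative
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foos'
    id = Column(Integer, primary_key=True)

class Thing(Base):
    __tablename__ = 'things'

    id = Column(Integer, primary_key=True)
    name = Column(Unicode)
    description = Column(Unicode)
    foo_id = Column(Integer, ForeignKey('foos.id'))
    foo = relationship(Foo)  # relationship() is the current name for relation()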

engine = create_engine('sqlite://')

Base.metadata.create_all(engine)  # issues DDL to create tables

session = sessionmaker(bind=engine)()

foo = Foo()
session.add(foo)
thing = Thing(name='thing1', description='some thing')
thing.foo = foo  # also adds Thing to session
session.commit()

Answer 5

We use Elixir alongside SQLAlchemy and have liked it so far. Elixir puts a layer on top of SQLAlchemy that makes it look more like its “ActiveRecord pattern” counterparts.


Answer 6

This seems to be the canonical reference point for high-level database interaction in Python: http://wiki.python.org/moin/HigherLevelDatabaseProgramming

From there, it looks like Dejavu implements Martin Fowler’s DataMapper pattern fairly abstractly in Python.


Answer 7

I think you might look at:

Autumn

Storm


Answer 8

There is no conceivable way that the unused features in Django will give a performance penalty. Might just come in handy if you ever decide to upscale the project.


Answer 9

I used Storm + SQLite for a small project, and was pretty happy with it until I added multiprocessing. Trying to use the database from multiple processes resulted in a “Database is locked” exception. I switched to SQLAlchemy, and the same code worked with no problems.


Answer 10

SQLAlchemy is very, very powerful. However, it is not thread safe, so make sure you keep that in mind when working with CherryPy in thread-pool mode.
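
A common way to live with that constraint is to give every worker thread its own connection or session instead of sharing one. The sketch below shows the idea with the standard-library sqlite3 and threading modules standing in for SQLAlchemy. The names get_connection and worker are hypothetical, and SQLAlchemy itself offers scoped_session for the same purpose:

```python
import sqlite3
import threading

_local = threading.local()


def get_connection():
    """Return the calling thread's own connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(":memory:")
        _local.conn = conn
    return conn


def worker(results, index):
    # Each worker thread records the connection it was handed.
    results[index] = get_connection()


results = [None, None]
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Two threads never see the same connection object.
print(results[0] is results[1])
```

Within a single thread, repeated calls to get_connection() hand back the same object, so request-handling code does not need to pass it around.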


Answer 11

I’d check out SQLAlchemy

It’s really easy to use and the models you work with aren’t bad at all. Django has its own built-in ORM rather than using SQLAlchemy, so using SQLAlchemy by itself lets you use its full power.

Here’s a small example of creating and selecting ORM objects

>>> ed_user = User('ed', 'Ed Jones', 'edspassword')
>>> session.add(ed_user)
>>> our_user = session.query(User).filter_by(name='ed').first() 
>>> our_user
    <User('ed','Ed Jones', 'edspassword')>

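Records: SQL for Humans

Records: SQL for Humans™

Records is a very simple but powerful library for making raw SQL queries to most relational databases.

Just write SQL. No bells, no whistles. This common task can be surprisingly difficult with the standard tools available. This library strives to make this workflow as simple as possible, while providing an elegant interface to work with your query results.

Database support includes RedShift, Postgres, MySQL, SQLite, Oracle, and MS-SQL (drivers not included).


☤ The Basics

We know how to write SQL, so let's send some to our database: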

import records

db = records.Database('postgres://...')
rows = db.query('select * from active_users')    # or db.query_file('sqls/active-users.sql')

Grab one row at a time:

>>> rows[0]
<Record {"username": "model-t", "active": true, "name": "Henry Ford", "user_email": "model-t@gmail.com", "timezone": "2016-02-06 22:28:23.894202"}>

Or iterate over them:

for r in rows:
    print(r.name, r.user_email)

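Values can be accessed in many ways: row.user_email, row['user_email'], or row[3].

Fields with non-alphanumeric characters (like spaces) are also fully supported.

Or store a copy of your record collection for later reference: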

>>> rows.all()
[<Record {"username": ...}>, <Record {"username": ...}>, <Record {"username": ...}>, ...]

If you're only expecting one result:

>>> rows.first()
<Record {"username": ...}>

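Other options include rows.as_dict() and rows.as_dict(ordered=True).

☤ Features

  • Iterated rows are cached for future reference.
  • $DATABASE_URL environment variable support.
  • Convenience Database.get_table_names method.
  • Command-line records tool for exporting queries.
  • Safe parameterization: Database.query('life=:everything', everything=42).
  • Queries can be passed as strings or filenames, with parameters supported.
  • Transactions: t = Database.transaction(); t.commit().
  • Bulk actions: Database.bulk_query() & Database.bulk_query_file().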
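
The "safe parameterization" item is the one most worth seeing in action. The sketch below illustrates the same named-placeholder idea with the standard-library sqlite3 module rather than Records itself. The find_user helper and the sample rows are assumptions made for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE active_users (username TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO active_users VALUES (?, ?)",
    [("model-t", "Henry Ford"), ("tesla", "Nikola Tesla")],
)


def find_user(connection, username):
    """Look up one user; the value is bound by the driver, never pasted into the SQL."""
    cursor = connection.execute(
        "SELECT username, name FROM active_users WHERE username = :username",
        {"username": username},
    )
    return cursor.fetchone()


row = find_user(conn, "model-t")
# An injection attempt is treated as a literal (non-matching) username.
malicious = find_user(conn, "model-t' OR '1'='1")
print(row, malicious)
```

Because the value travels separately from the SQL text, quoting tricks in user input cannot change the shape of the query.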

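Records is proudly powered by SQLAlchemy and Tablib.

☤ Data Export Functionality

Records also features full Tablib integration, and allows you to export your results to CSV, XLS, JSON, HTML Tables, YAML, or Pandas DataFrames with a single line of code. Excellent for sharing data with friends, or generating reports.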

>>> print(rows.dataset)
username|active|name      |user_email       |timezone
--------|------|----------|-----------------|--------------------------
model-t |True  |Henry Ford|model-t@gmail.com|2016-02-06 22:28:23.894202
...

Comma Separated Values (CSV)

>>> print(rows.export('csv'))
username,active,name,user_email,timezone
model-t,True,Henry Ford,model-t@gmail.com,2016-02-06 22:28:23.894202
...

YAML Ain't Markup Language (YAML)

>>> print(rows.export('yaml'))
- {active: true, name: Henry Ford, timezone: '2016-02-06 22:28:23.894202', user_email: model-t@gmail.com, username: model-t}
...

JavaScript Object Notation (JSON)

>>> print(rows.export('json'))
[{"username": "model-t", "active": true, "name": "Henry Ford", "user_email": "model-t@gmail.com", "timezone": "2016-02-06 22:28:23.894202"}, ...]

Microsoft Excel (xls, xlsx)

with open('report.xls', 'wb') as f:
    f.write(rows.export('xls'))

Pandas DataFrame

>>> rows.export('df')
    username  active       name        user_email                   timezone
0    model-t    True Henry Ford model-t@gmail.com 2016-02-06 22:28:23.894202

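You get the point. All other features of Tablib are also available, so you can sort results, add or remove columns and rows, remove duplicates, transpose the table, add separators, slice data by column, and more.

See the Tablib Documentation for more details.

☤ Installation

Of course, the recommended installation method is pipenv: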

$ pipenv install records[pandas]
✨🍰✨

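☤ Command-Line Tool

As an added bonus, a records command-line tool is automatically included.

☤ Thank You

Thanks for checking this library out! I hope you find it useful.

Of course, there's always room for improvement. Feel free to open an issue so we can make Records better, stronger, faster.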

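Django: the Web framework for perfectionists with deadlines

Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design. Thanks for checking it out.

All documentation is in the "docs" directory and online at https://docs.djangoproject.com/en/stable/. If you're just getting started, here's how we recommend you read the docs:

  • First, read docs/intro/install.txt for instructions on installing Django.
  • Next, work through the tutorials in order (docs/intro/tutorial01.txt, docs/intro/tutorial02.txt, etc.).
  • If you want to set up an actual deployment server, read docs/howto/deployment/index.txt for instructions.
  • You'll probably want to read through the topical guides (in docs/topics) next; from there you can jump to the HOWTOs (in docs/howto) for specific problems, and check out the reference (docs/ref) for gory details.
  • See docs/README for instructions on building an HTML version of the docs.

Docs are updated rigorously. If you find any problems in the docs, or think they should be clarified in any way, please take 30 seconds to fill out a ticket here: https://code.djangoproject.com/newticket

To get more help:

  • Join the #django channel on irc.libera.chat. Lots of helpful people hang out there. See https://web.libera.chat if you're new to IRC.
  • Join the django-users mailing list, or read the archives, at https://groups.google.com/group/django-users.

To contribute to Django: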

  • Check out https://docs.djangoproject.com/en/dev/internals/contributing/ for
    information about getting involved.

To run Django's test suite:

  • Follow the instructions in the “Unit tests” section of
    docs/internals/contributing/writing-code/unit-tests.txt, published online at
    https://docs.djangoproject.com/en/dev/internals/contributing/writing-code/unit-tests/#running-the-unit-tests

Supporting the Development of Django

Django's development depends on your contributions.

If you depend on Django, remember to support the Django Software Foundation: https://www.djangoproject.com/fundraising/