Question: Efficiently updating a database using the SQLAlchemy ORM

I’m starting a new application and looking at using an ORM — in particular, SQLAlchemy.

Say I’ve got a column ‘foo’ in my database and I want to increment it. In straight sqlite, this is easy:

import sqlite3

db = sqlite3.connect('mydata.sqlitedb')
cur = db.cursor()
cur.execute('update stuff set foo = foo + 1')

I figured out the SQLAlchemy SQL-builder equivalent:

import sqlalchemy

engine = sqlalchemy.create_engine('sqlite:///mydata.sqlitedb')
md = sqlalchemy.MetaData(engine)
table = sqlalchemy.Table('stuff', md, autoload=True)
upd = table.update(values={table.c.foo: table.c.foo + 1})
engine.execute(upd)

This is slightly slower, but there’s not much in it.

Here’s my best guess for a SQLAlchemy ORM approach:

# snip definition of Stuff class made using declarative_base
# snip creation of session object
for c in session.query(Stuff):
    c.foo = c.foo + 1
session.flush()
session.commit()

This does the right thing, but it takes just under fifty times as long as the other two approaches. I presume that’s because it has to bring all the data into memory before it can work with it.
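
One quick way to verify that (assuming the same engine as in the snippets above) is to turn on SQL echoing and compare what each approach emits:

# echo=True makes SQLAlchemy log every statement: the ORM loop issues one
# SELECT that loads every row plus one UPDATE per modified row, while the
# SQL-builder version emits a single UPDATE.
engine = sqlalchemy.create_engine('sqlite:///mydata.sqlitedb', echo=True)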

Is there any way to generate the efficient SQL using SQLAlchemy’s ORM? Or using any other python ORM? Or should I just go back to writing the SQL by hand?


Answer 0

SQLAlchemy’s ORM is meant to be used together with the SQL layer, not to hide it. But you do have to keep one or two things in mind when using the ORM and plain SQL in the same transaction. Basically, on the one hand, ORM data modifications only hit the database when you flush the changes from your session; on the other hand, SQL data manipulation statements don’t affect the objects that are in your session.

So if you say

for c in session.query(Stuff).all():
    c.foo = c.foo+1
session.commit()

it will do what it says: fetch all the objects from the database, modify them all, and then, when it’s time to flush the changes to the database, update the rows one by one.

Instead you should do this:

session.execute(update(stuff_table, values={stuff_table.c.foo: stuff_table.c.foo + 1}))
session.commit()

This will execute as one query, as you would expect, and because, at least with the default session configuration, all data in the session is expired on commit, you don’t have any stale-data issues.
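
As a small illustrative sketch (the engine, Stuff and stuff_table names are assumed from the snippets above), the default expire-on-commit behaviour works like this:

from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)  # expire_on_commit=True is the default
session = Session()
obj = session.query(Stuff).first()
session.execute(stuff_table.update().values(foo=stuff_table.c.foo + 1))
session.commit()
# attribute access after commit re-selects the row, so the incremented
# value is visible rather than a stale copy
print(obj.foo)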

In the almost-released 0.5 series you could also use this method for updating:

session.query(Stuff).update({Stuff.foo: Stuff.foo + 1})
session.commit()

That will basically run the same SQL statement as the previous snippet, but also select the changed rows and expire any stale data in the session. If you know you aren’t using any session data after the update you could also add synchronize_session=False to the update statement and get rid of that select.
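
For example, a minimal sketch of that variant, reusing the Stuff model from the question:

# skip the synchronizing SELECT; only safe if no Stuff objects from this
# session are read after the update
session.query(Stuff).update({Stuff.foo: Stuff.foo + 1}, synchronize_session=False)
session.commit()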


Answer 1

# bulk-update every matching row in a single UPDATE statement
session.query(Clients).filter(Clients.id.in_(client_id_list)).update(
    {'status': status}, synchronize_session=False)
session.commit()

Try this =)


Answer 2

There are several ways to UPDATE using SQLAlchemy:

1) for c in session.query(Stuff).all():
       c.foo += 1
   session.commit()

2) session.query(Stuff).\
       update({"foo": Stuff.foo + 1})
   session.commit()

3) conn = engine.connect()
   stmt = Stuff.__table__.update().\
       values(foo=Stuff.foo + 1)
   conn.execute(stmt)

Answer 3

Here’s an example of how to solve the same problem without having to map the fields manually:

from sqlalchemy import Column, ForeignKey, Integer, String, Date, DateTime, text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.attributes import InstrumentedAttribute

engine = create_engine('postgresql://postgres@localhost:5432/database')
session = sessionmaker()
session.configure(bind=engine)

Base = declarative_base()


class Media(Base):
  __tablename__ = 'media'
  id = Column(Integer, primary_key=True)
  title = Column(String, nullable=False)
  slug = Column(String, nullable=False)
  type = Column(String, nullable=False)

  def update(self):
    s = session()
    mapped_values = {}
    # collect every mapped column (InstrumentedAttribute) and its current value
    for field_name, field_type in Media.__dict__.items():
      is_column = isinstance(field_type, InstrumentedAttribute)
      if is_column:
        mapped_values[field_name] = getattr(self, field_name)

    s.query(Media).filter(Media.id == self.id).update(mapped_values)
    s.commit()

So to update a Media instance, you can do something like this:

media = Media(id=123, title="Titular Line", slug="titular-line", type="movie")
media.update()

Answer 4

Without testing, I’d try:

for c in session.query(Stuff).all():
    c.foo = c.foo + 1
session.commit()

(IIRC, commit() works without flush()).

I’ve found that at times doing a large query and then iterating in Python can be up to two orders of magnitude faster than lots of queries. I assume that iterating over the query object is less efficient than iterating over the list generated by the query object’s all() method.

[Please note comment below – this did not speed things up at all].


Answer 5

If it is because of the overhead in terms of creating objects, then it probably can’t be sped up at all with SA.

If it is because it is loading related objects, then you might be able to do something with lazy loading. Are there lots of objects being created due to references? (i.e., getting a Company object also gets all of the related People objects.)
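
For instance, a hypothetical sketch of that Company/People case (the models below are illustrative, not taken from the question); the relationship’s lazy setting decides whether related rows are fetched up front or only on first access:

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Company(Base):
    __tablename__ = 'company'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # lazy='select' (the default) defers loading people until first access;
    # lazy='joined' would eagerly pull them in with every Company query
    people = relationship('Person', lazy='select')

class Person(Base):
    __tablename__ = 'person'
    id = Column(Integer, primary_key=True)
    company_id = Column(Integer, ForeignKey('company.id'))
    name = Column(String)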

