Is it pythonic to import inside functions?

Question: Is it pythonic to import inside functions?

PEP 8 says:

  • Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

On occasion, I violate PEP 8. Sometimes I import stuff inside functions. As a general rule, I do this if there is an import that is only used within a single function.

Any opinions?

EDIT (the reason I feel importing in functions can be a good idea):

Main reason: It can make the code clearer.

  • When looking at the code of a function I might ask myself: “What is function/class xxx?” (xxx being used inside the function). If I have all my imports at the top of the module, I have to go look there to determine what xxx is. This is more of an issue when using from m import xxx. Seeing m.xxx in the function probably tells me more. Depending on what m is: Is it a well-known top-level module/package (import m)? Or is it a sub-module/package (from a.b.c import m)?
  • In some cases having that extra information (“What is xxx?”) close to where xxx is used can make the function easier to understand.
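
The readability argument can be illustrated with a small sketch. The function names here are made up for illustration; the first version keeps the import next to its only use, the second uses a top-level import plus a qualified name, which also tells the reader where the name comes from.

```python
import datetime as dt  # top-level import, used by the qualified variant below


def parse_date_local(text):
    # Local import: the origin of `datetime` is visible right where it is used.
    from datetime import datetime
    return datetime.strptime(text, "%Y-%m-%d")


def parse_date_qualified(text):
    # Top-level import with a qualified name: `dt.datetime` still shows
    # which module the class belongs to, without a function-level import.
    return dt.datetime.strptime(text, "%Y-%m-%d")
```

Both functions behave identically; the difference is only in where a reader has to look to find out what `datetime` is.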

Answer 0

In the long run I think you’ll appreciate having most of your imports at the top of the file. That way you can tell at a glance how complicated your module is by what it needs to import.

If I’m adding new code to an existing file I’ll usually do the import where it’s needed and then if the code stays I’ll make things more permanent by moving the import line to the top of the file.

One other point: I prefer to get an ImportError exception before any code is run, as a sanity check, so that’s another reason to import at the top.
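
The sanity-check point can be seen in a minimal sketch. The module name `no_such_templating_lib_xyz` is deliberately made up so that the import fails; with the import inside the function, nothing goes wrong until the function is actually called.

```python
def render(template_text):
    # Hypothetical dependency that does not exist. Because the import is
    # inside the function, defining render() succeeds and the error is
    # deferred until the first call.
    import no_such_templating_lib_xyz
    return no_such_templating_lib_xyz.render(template_text)


def first_call_fails():
    """Return True if calling render() raises ImportError."""
    try:
        render("hello")
    except ImportError:
        return True
    return False
```

Had the same import been at the top of the file, the module would have failed to load at all, which is the early warning described above.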

I use pyChecker to check for unused modules.


Answer 1

There are two occasions where I violate PEP 8 in this regard:

  • Circular imports: module A imports module B, but something in module B needs module A (though this is often a sign that I need to refactor the modules to eliminate the circular dependency)
  • Inserting a pdb breakpoint: import pdb; pdb.set_trace(). This is handy because I don’t want to put import pdb at the top of every module I might want to debug, and it is easy to remember to remove the import when I remove the breakpoint.

Outside of these two cases, it’s a good idea to put everything at the top. It makes the dependencies clearer.
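
The circular-import case can be sketched as follows. The module names `cycle_a` and `cycle_b` are hypothetical, and they are written to a temporary directory only so that the example is self-contained. `cycle_b` needs something from `cycle_a`, so it defers that import into the function that uses it.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Source of two hypothetical modules that depend on each other.
MOD_A = textwrap.dedent('''
    import cycle_b

    def greet():
        return "hello from a"

    def call_b():
        return cycle_b.shout()
''')

MOD_B = textwrap.dedent('''
    def shout():
        # Deferred import: cycle_a is only looked up when shout() runs,
        # by which time cycle_a has finished initialising.
        from cycle_a import greet
        return greet().upper()
''')


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "cycle_a.py").write_text(MOD_A)
        Path(tmp, "cycle_b.py").write_text(MOD_B)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            cycle_a = importlib.import_module("cycle_a")
            return cycle_a.call_b()
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop("cycle_a", None)
            sys.modules.pop("cycle_b", None)
```

As the answer notes, this works around the cycle rather than removing it; refactoring the shared code into a third module is usually the cleaner fix.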


Answer 2

Here are the four import use cases that we use

  1. import (and from x import y and import x as y) at the top

  2. Choices for Import. At the top.

    import settings
    if settings.something:
        import this as foo
    else:
        import that as foo
    
  3. Conditional Import. Used with JSON, XML libraries and the like. At the top.

    try:
        import this as foo
    except ImportError:
        import that as foo
    
  4. Dynamic Import. So far, we only have one example of this.

    import importlib
    import settings
    # settings.some_module is assumed to hold the module's name as a string.
    module = importlib.import_module(settings.some_module)
    x = module.x
    

    Note that this dynamic import doesn’t bring in code, but brings in complex data structures written in Python. It’s kind of like a pickled piece of data except we pickled it by hand.

    This is also, more-or-less, at the top of a module
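
Use cases 3 and 4 can be combined in one small sketch. The helper name `first_available` is made up for illustration; `simplejson` is a real third-party package that may or may not be installed, and the standard-library `json` serves as the fallback.

```python
import importlib


def first_available(*names):
    """Import and return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError("none of these modules could be imported: %r" % (names,))


# Prefer simplejson when it is installed, otherwise fall back to the stdlib.
json_backend = first_available("simplejson", "json")
```

Because this runs at module level, the choice is still made once, at the top of the module, in keeping with the conventions listed above.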


Here’s what we do to make the code clearer:

  • Keep the modules short.

  • If I have all my imports at the top of the module, I have to go look there to determine what a name is. If the module is short, that’s easy to do.

  • In some cases having that extra information close to where a name is used can make the function easier to understand. If the module is short, that’s easy to do.


Answer 3

One thing to bear in mind: needless imports can cause performance problems. So if this is a function that will be called frequently, you’re better off just putting the import at the top. Of course this is an optimization, so if there’s a valid case to be made that importing inside a function is more clear than importing at the top of a file, that trumps performance in most cases.
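
A rough way to measure this cost is sketched below. The function names are illustrative, and the absolute numbers depend heavily on the interpreter and the machine, so no particular result is claimed here.

```python
import math
import timeit


def top_level(x):
    # Uses the module imported once at the top of the file.
    return math.sqrt(x)


def local_import(x):
    # The import statement is executed on every call. After the first
    # import it resolves through the sys.modules cache, but that lookup
    # and the name binding still cost something each time.
    import math
    return math.sqrt(x)


def compare(number=100_000):
    """Return (seconds for top_level, seconds for local_import)."""
    t_top = timeit.timeit(lambda: top_level(2.0), number=number)
    t_local = timeit.timeit(lambda: local_import(2.0), number=number)
    return t_top, t_local
```

Calling `compare()` and looking at the two timings gives a feel for whether the difference matters for a given hot path.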

If you’re doing IronPython, I’m told that it’s better to import inside functions (since compiling code in IronPython can be slow). Thus, you may be able to get away with importing inside functions then. But other than that, I’d argue that it’s just not worth it to fight convention.

You wrote: “As a general rule, I do this if there is an import that is only used within a single function.”

Another point I’d like to make is that this may be a potential maintenance problem. What happens if you add a function that uses a module that was previously used by only one function? Are you going to remember to add the import to the top of the file? Or are you going to scan each and every function for imports?

FWIW, there are cases where it makes sense to import inside a function. For example, if you want to set the language in cx_Oracle, you need to set an NLS_LANG environment variable before it is imported. Thus, you may see code like this:

import os

oracle = None

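def InitializeOracle(lang):
    global oracle
    # NLS_LANG must be set before cx_Oracle is imported for the first time.
    os.environ['NLS_LANG'] = lang
    import cx_Oracle
    # Expose the module at module level for the rest of the code.
    oracle = cx_Oracle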

Answer 4

I’ve broken this rule before for modules that are self-testing. That is, they are normally just used for support, but I define a main for them so that if you run them by themselves you can test their functionality. In that case I sometimes import getopt and cmd just in main, because I want it to be clear to someone reading the code that these modules have nothing to do with the normal operation of the module and are only being included for testing.


Answer 5

Coming from the question about loading the module twice – Why not both?

An import at the top of the script will indicate the dependencies, and another import in the function will make this function more atomic, while seemingly not causing any performance disadvantage, since a consecutive import is cheap.
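
A minimal sketch of the "both" pattern, with illustrative function names: the second import does not load the module again, it only fetches the already loaded module object from the sys.modules cache.

```python
import json  # top-level import documents the dependency
import sys


def dumps_sorted(obj):
    # Repeated import inside the function: the function is readable on
    # its own, and the module is not loaded a second time.
    import json as local_json
    return local_json.dumps(obj, sort_keys=True)


def same_module_object():
    # Both imports refer to the one cached module object.
    import json as local_json
    return local_json is json and local_json is sys.modules["json"]
```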


Answer 6

As long as it’s import and not from x import *, you should put them at the top. It adds just one name to the global namespace, and you stick to PEP 8. Plus, if you later need it somewhere else, you don’t have to move anything around.

It’s no big deal, but since there’s almost no difference I’d suggest doing what PEP 8 says.


Answer 7

Have a look at the alternative approach used in SQLAlchemy, dependency injection:

@util.dependencies("sqlalchemy.orm.query")
def merge_result(query, *args):
    #...
    query.Query(...)

Notice how the imported library is declared in a decorator, and passed as an argument to the function!
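
The idea can be sketched with a small stand-alone decorator. This is not SQLAlchemy's implementation; the names `dependencies` and `to_json` are made up here, and the sketch only shows the general technique of resolving module names once and passing the modules in as leading arguments.

```python
import functools
import importlib


def dependencies(*module_names):
    """Pass the named modules to the decorated function as leading arguments."""
    def decorate(fn):
        resolved = []  # filled on the first call, reused afterwards

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not resolved:
                resolved.extend(
                    importlib.import_module(name) for name in module_names
                )
            return fn(*resolved, *args, **kwargs)

        return wrapper
    return decorate


@dependencies("json")
def to_json(json, obj):
    # `json` is the module object injected by the decorator.
    return json.dumps(obj)
```

Callers use `to_json([1, 2])` without supplying the module; the import happens lazily on the first call, and later calls reuse the cached module objects.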

This approach makes the code cleaner and, according to the benchmark linked below, runs 4.5 times faster than an import statement!

Benchmark: https://gist.github.com/kolypto/589e84fbcfb6312532658df2fabdb796


Answer 8

In modules that are both ‘normal’ modules and can be executed (i.e. have an if __name__ == '__main__': section), I usually import the modules that are only needed when the module is executed inside the main function.

Example:

def really_useful_function(data):
    ...


def main():
    from pathlib import Path
    from argparse import ArgumentParser
    from dataloader import load_data_from_directory

    parser = ArgumentParser()
    parser.add_argument('directory')
    args = parser.parse_args()
    data = load_data_from_directory(Path(args.directory))
    print(really_useful_function(data))


if __name__ == '__main__':
    main()

Answer 9

There’s another (probably “corner”) case where it may be beneficial to import inside rarely used functions: shortening startup time.

I hit that wall once with a rather complex program running on a small IoT server that accepted commands from a serial line and performed operations, some of them very complex.

Placing the import statements at the top of the files meant that all imports were processed before the server started. Since the import list included jinja2, lxml, signxml and other “heavyweights” (and the SoC was not very powerful), it took minutes before the first instruction was actually executed.

On the other hand, by placing most imports inside functions I was able to have the server “alive” on the serial line in seconds. Of course, I had to pay the price when the modules were actually needed (note: this can also be mitigated by spawning a background task that does the imports during idle time).
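
The mitigation mentioned at the end can be sketched with a background thread. The helper names `preload` and `is_loaded` are hypothetical, and the standard-library modules used in the usage note stand in for the heavy packages named above.

```python
import importlib
import sys
import threading


def preload(module_names):
    """Import the given modules in a background thread and return the thread."""
    def worker():
        for name in module_names:
            try:
                importlib.import_module(name)
            except ImportError:
                # A missing optional module should not kill the preloader;
                # the error will surface when the module is really needed.
                pass

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def is_loaded(name):
    # True once the module is present in the import cache.
    return name in sys.modules
```

A server could call `preload(["decimal", "fractions"])` right after it starts listening; a later function-level import of those modules then finds them already in the cache.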