标签归档:Python

如何仅展平numpy数组的某些尺寸

问题:如何仅展平numpy数组的某些尺寸

是否有一种快速的方法来“子展平”或仅展平numpy数组中的某些第一维?

例如,给定一维的numpy数组(50,100,25),结果尺寸为(5000,25)

Is there a quick way to “sub-flatten” or flatten only some of the first dimensions in a numpy array?

For example, given a numpy array of dimensions (50,100,25), the resultant dimensions would be (5000,25)


回答 0

看看numpy.reshape

>>> arr = numpy.zeros((50,100,25))
>>> arr.shape
# (50, 100, 25)

>>> new_arr = arr.reshape(5000,25)
>>> new_arr.shape   
# (5000, 25)

# One shape dimension can be -1. 
# In this case, the value is inferred from 
# the length of the array and remaining dimensions.
>>> another_arr = arr.reshape(-1, arr.shape[-1])
>>> another_arr.shape
# (5000, 25)

Take a look at numpy.reshape .

>>> arr = numpy.zeros((50,100,25))
>>> arr.shape
# (50, 100, 25)

>>> new_arr = arr.reshape(5000,25)
>>> new_arr.shape   
# (5000, 25)

# One shape dimension can be -1. 
# In this case, the value is inferred from 
# the length of the array and remaining dimensions.
>>> another_arr = arr.reshape(-1, arr.shape[-1])
>>> another_arr.shape
# (5000, 25)

回答 1

亚历山大答案的一个概括-np.reshape可以采用-1作为参数,表示“总数组大小除以所有其他列出的维数的乘积”:

例如,将除最后一个尺寸以外的所有尺寸展平:

>>> arr = numpy.zeros((50,100,25))
>>> new_arr = arr.reshape(-1, arr.shape[-1])
>>> new_arr.shape
# (5000, 25)

A slight generalization to Alexander’s answer – np.reshape can take -1 as an argument, meaning “total array size divided by product of all other listed dimensions”:

e.g. to flatten all but the last dimension:

>>> arr = numpy.zeros((50,100,25))
>>> new_arr = arr.reshape(-1, arr.shape[-1])
>>> new_arr.shape
# (5000, 25)

回答 2

彼得的答案略有概括-如果您想超越三维阵列,则可以在原始阵列的形状上指定一个范围。

例如,将除最后两个维度之外的所有维度展平:

arr = numpy.zeros((3, 4, 5, 6))
new_arr = arr.reshape(-1, *arr.shape[-2:])
new_arr.shape
# (12, 5, 6)

编辑:对我之前的回答有一个概括-当然,您也可以在重塑的开始处指定一个范围:

arr = numpy.zeros((3, 4, 5, 6, 7, 8))
new_arr = arr.reshape(*arr.shape[:2], -1, *arr.shape[-2:])
new_arr.shape
# (3, 4, 30, 7, 8)

A slight generalization to Peter’s answer — you can specify a range over the original array’s shape if you want to go beyond three dimensional arrays.

e.g. to flatten all but the last two dimensions:

arr = numpy.zeros((3, 4, 5, 6))
new_arr = arr.reshape(-1, *arr.shape[-2:])
new_arr.shape
# (12, 5, 6)

EDIT: A slight generalization to my earlier answer — you can, of course, also specify a range at the beginning of the of the reshape too:

arr = numpy.zeros((3, 4, 5, 6, 7, 8))
new_arr = arr.reshape(*arr.shape[:2], -1, *arr.shape[-2:])
new_arr.shape
# (3, 4, 30, 7, 8)

回答 3

一种替代方法是按numpy.resize()如下方式使用:

In [37]: shp = (50,100,25)
In [38]: arr = np.random.random_sample(shp)
In [45]: resized_arr = np.resize(arr, (np.prod(shp[:2]), shp[-1]))
In [46]: resized_arr.shape
Out[46]: (5000, 25)

# sanity check with other solutions
In [47]: resized = np.reshape(arr, (-1, shp[-1]))
In [48]: np.allclose(resized_arr, resized)
Out[48]: True

An alternative approach is to use numpy.resize() as in:

In [37]: shp = (50,100,25)
In [38]: arr = np.random.random_sample(shp)
In [45]: resized_arr = np.resize(arr, (np.prod(shp[:2]), shp[-1]))
In [46]: resized_arr.shape
Out[46]: (5000, 25)

# sanity check with other solutions
In [47]: resized = np.reshape(arr, (-1, shp[-1]))
In [48]: np.allclose(resized_arr, resized)
Out[48]: True

python标准库中的装饰器(@deprecated)

问题:python标准库中的装饰器(@deprecated)

我需要将例程标记为已弃用,但显然没有用于弃用的标准库装饰器。我知道它的配方和警告模块,但是我的问题是:为什么没有用于此(常见)任务的标准库装饰器?

附加问题:标准库中是否有标准装饰器?

I need to mark routines as deprecated, but apparently there’s no standard library decorator for deprecation. I am aware of recipes for it and the warnings module, but my question is: why is there no standard library decorator for this (common) task ?

Additional question: are there standard decorators in the standard library at all ?


回答 0

这是从Leandro引用的那些代码中修改而来的一些代码片段:

import warnings
import functools

def deprecated(func):
    """This is a decorator which can be used to mark functions
    as deprecated. It will result in a warning being emitted
    when the function is used."""
    @functools.wraps(func)
    def new_func(*args, **kwargs):
        warnings.simplefilter('always', DeprecationWarning)  # turn off filter
        warnings.warn("Call to deprecated function {}.".format(func.__name__),
                      category=DeprecationWarning,
                      stacklevel=2)
        warnings.simplefilter('default', DeprecationWarning)  # reset filter
        return func(*args, **kwargs)
    return new_func

# Examples

@deprecated
def some_old_function(x, y):
    return x + y

class SomeClass:
    @deprecated
    def some_old_method(self, x, y):
        return x + y

因为在某些口译员中,第一个暴露的解决方案(无过滤器处理)可能导致警告抑制。

Here’s some snippet, modified from those cited by Leandro:

import warnings
import functools

def deprecated(func):
    """This is a decorator which can be used to mark functions
    as deprecated. It will result in a warning being emitted
    when the function is used."""
    @functools.wraps(func)
    def new_func(*args, **kwargs):
        warnings.simplefilter('always', DeprecationWarning)  # turn off filter
        warnings.warn("Call to deprecated function {}.".format(func.__name__),
                      category=DeprecationWarning,
                      stacklevel=2)
        warnings.simplefilter('default', DeprecationWarning)  # reset filter
        return func(*args, **kwargs)
    return new_func

# Examples

@deprecated
def some_old_function(x, y):
    return x + y

class SomeClass:
    @deprecated
    def some_old_method(self, x, y):
        return x + y

Because in some interpreters the first solution exposed (without filter handling) may result in a warning suppression.


回答 1

这是另一种解决方案:

该装饰器(实际上是装饰器工厂)允许您提供原因消息。通过提供源文件名行号来帮助开发人员诊断问题也更有用。

编辑:此代码使用Zero的建议:用替换warnings.warn_explicitwarnings.warn(msg, category=DeprecationWarning, stacklevel=2),它打印函数调用站点而不是函数定义站点。它使调试更加容易。

EDIT2:此版本允许开发人员指定可选的“原因”消息。

import functools
import inspect
import warnings

string_types = (type(b''), type(u''))


def deprecated(reason):
    """
    This is a decorator which can be used to mark functions
    as deprecated. It will result in a warning being emitted
    when the function is used.
    """

    if isinstance(reason, string_types):

        # The @deprecated is used with a 'reason'.
        #
        # .. code-block:: python
        #
        #    @deprecated("please, use another function")
        #    def old_function(x, y):
        #      pass

        def decorator(func1):

            if inspect.isclass(func1):
                fmt1 = "Call to deprecated class {name} ({reason})."
            else:
                fmt1 = "Call to deprecated function {name} ({reason})."

            @functools.wraps(func1)
            def new_func1(*args, **kwargs):
                warnings.simplefilter('always', DeprecationWarning)
                warnings.warn(
                    fmt1.format(name=func1.__name__, reason=reason),
                    category=DeprecationWarning,
                    stacklevel=2
                )
                warnings.simplefilter('default', DeprecationWarning)
                return func1(*args, **kwargs)

            return new_func1

        return decorator

    elif inspect.isclass(reason) or inspect.isfunction(reason):

        # The @deprecated is used without any 'reason'.
        #
        # .. code-block:: python
        #
        #    @deprecated
        #    def old_function(x, y):
        #      pass

        func2 = reason

        if inspect.isclass(func2):
            fmt2 = "Call to deprecated class {name}."
        else:
            fmt2 = "Call to deprecated function {name}."

        @functools.wraps(func2)
        def new_func2(*args, **kwargs):
            warnings.simplefilter('always', DeprecationWarning)
            warnings.warn(
                fmt2.format(name=func2.__name__),
                category=DeprecationWarning,
                stacklevel=2
            )
            warnings.simplefilter('default', DeprecationWarning)
            return func2(*args, **kwargs)

        return new_func2

    else:
        raise TypeError(repr(type(reason)))

您可以将此修饰符用于函数方法

这是一个简单的例子:

@deprecated("use another function")
def some_old_function(x, y):
    return x + y


class SomeClass(object):
    @deprecated("use another method")
    def some_old_method(self, x, y):
        return x + y


@deprecated("use another class")
class SomeOldClass(object):
    pass


some_old_function(5, 3)
SomeClass().some_old_method(8, 9)
SomeOldClass()

你会得到:

deprecated_example.py:59: DeprecationWarning: Call to deprecated function or method some_old_function (use another function).
  some_old_function(5, 3)
deprecated_example.py:60: DeprecationWarning: Call to deprecated function or method some_old_method (use another method).
  SomeClass().some_old_method(8, 9)
deprecated_example.py:61: DeprecationWarning: Call to deprecated class SomeOldClass (use another class).
  SomeOldClass()

EDIT3:现在,此装饰器已成为不推荐使用的库的一部分:

新的稳定版本v1.2.10🎉

Here is another solution:

This decorator (a decorator factory in fact) allow you to give a reason message. It is also more useful to help the developer to diagnose the problem by giving the source filename and line number.

EDIT: This code use Zero’s recommendation: it replace warnings.warn_explicit line by warnings.warn(msg, category=DeprecationWarning, stacklevel=2), which prints the function call site rather than the function definition site. It makes debugging easier.

EDIT2: This version allow the developper to specify an optional “reason” message.

import functools
import inspect
import warnings

string_types = (type(b''), type(u''))


def deprecated(reason):
    """
    This is a decorator which can be used to mark functions
    as deprecated. It will result in a warning being emitted
    when the function is used.
    """

    if isinstance(reason, string_types):

        # The @deprecated is used with a 'reason'.
        #
        # .. code-block:: python
        #
        #    @deprecated("please, use another function")
        #    def old_function(x, y):
        #      pass

        def decorator(func1):

            if inspect.isclass(func1):
                fmt1 = "Call to deprecated class {name} ({reason})."
            else:
                fmt1 = "Call to deprecated function {name} ({reason})."

            @functools.wraps(func1)
            def new_func1(*args, **kwargs):
                warnings.simplefilter('always', DeprecationWarning)
                warnings.warn(
                    fmt1.format(name=func1.__name__, reason=reason),
                    category=DeprecationWarning,
                    stacklevel=2
                )
                warnings.simplefilter('default', DeprecationWarning)
                return func1(*args, **kwargs)

            return new_func1

        return decorator

    elif inspect.isclass(reason) or inspect.isfunction(reason):

        # The @deprecated is used without any 'reason'.
        #
        # .. code-block:: python
        #
        #    @deprecated
        #    def old_function(x, y):
        #      pass

        func2 = reason

        if inspect.isclass(func2):
            fmt2 = "Call to deprecated class {name}."
        else:
            fmt2 = "Call to deprecated function {name}."

        @functools.wraps(func2)
        def new_func2(*args, **kwargs):
            warnings.simplefilter('always', DeprecationWarning)
            warnings.warn(
                fmt2.format(name=func2.__name__),
                category=DeprecationWarning,
                stacklevel=2
            )
            warnings.simplefilter('default', DeprecationWarning)
            return func2(*args, **kwargs)

        return new_func2

    else:
        raise TypeError(repr(type(reason)))

You can use this decorator for functions, methods and classes.

Here is a simple example:

@deprecated("use another function")
def some_old_function(x, y):
    return x + y


class SomeClass(object):
    @deprecated("use another method")
    def some_old_method(self, x, y):
        return x + y


@deprecated("use another class")
class SomeOldClass(object):
    pass


some_old_function(5, 3)
SomeClass().some_old_method(8, 9)
SomeOldClass()

You’ll get:

deprecated_example.py:59: DeprecationWarning: Call to deprecated function or method some_old_function (use another function).
  some_old_function(5, 3)
deprecated_example.py:60: DeprecationWarning: Call to deprecated function or method some_old_method (use another method).
  SomeClass().some_old_method(8, 9)
deprecated_example.py:61: DeprecationWarning: Call to deprecated class SomeOldClass (use another class).
  SomeOldClass()

EDIT3: This decorator is now part of the Deprecated library:

New stable release v1.2.10 🎉


回答 2

如muon所建议,您可以deprecation为此安装软件包。

deprecation库为您的测试提供了一个deprecated装饰器和一个fail_if_not_removed装饰器。

安装

pip install deprecation

用法示例

import deprecation

@deprecation.deprecated(deprecated_in="1.0", removed_in="2.0",
                        current_version=__version__,
                        details="Use the bar function instead")
def foo():
    """Do some stuff"""
    return 1

有关完整文档,请参见http://deprecation.readthedocs.io/

As muon suggested, you can install the deprecation package for this.

The deprecation library provides a deprecated decorator and a fail_if_not_removed decorator for your tests.

Installation

pip install deprecation

Example Usage

import deprecation

@deprecation.deprecated(deprecated_in="1.0", removed_in="2.0",
                        current_version=__version__,
                        details="Use the bar function instead")
def foo():
    """Do some stuff"""
    return 1

See http://deprecation.readthedocs.io/ for the full documentation.


回答 3

我猜想原因是Python代码无法静态处理(就像C ++编译器所做的那样),您在实际使用某些东西之前不会得到警告。我认为用一堆消息“警告:此脚本的开发人员正在使用已弃用的API”来向脚本用户发送垃圾邮件不是一个好主意。

更新:但是您可以创建装饰器,它将原始功能转换为另一个。新功能将标记/检查开关,告知该功能已被调用,并且仅在将开关置于打开状态时显示消息。和/或在退出时,它可以打印程序中使用的所有不赞成使用的功能的列表。

I guess the reason is that Python code can’t be processed statically (as it done for C++ compilers), you can’t get warning about using some things before actually using it. I don’t think that it’s a good idea to spam user of your script with a bunch of messages “Warning: this developer of this script is using deprecated API”.

Update: but you can create decorator which will transform original function into another. New function will mark/check switch telling that this function was called already and will show message only on turning switch into on state. And/or at exit it may print list of all deprecated functions used in program.


回答 4

您可以创建一个utils文件

import warnings

def deprecated(message):
  def deprecated_decorator(func):
      def deprecated_func(*args, **kwargs):
          warnings.warn("{} is a deprecated function. {}".format(func.__name__, message),
                        category=DeprecationWarning,
                        stacklevel=2)
          warnings.simplefilter('default', DeprecationWarning)
          return func(*args, **kwargs)
      return deprecated_func
  return deprecated_decorator

然后按如下所示导入弃用装饰器:

from .utils import deprecated

@deprecated("Use method yyy instead")
def some_method()"
 pass

You can create a utils file

import warnings

def deprecated(message):
  def deprecated_decorator(func):
      def deprecated_func(*args, **kwargs):
          warnings.warn("{} is a deprecated function. {}".format(func.__name__, message),
                        category=DeprecationWarning,
                        stacklevel=2)
          warnings.simplefilter('default', DeprecationWarning)
          return func(*args, **kwargs)
      return deprecated_func
  return deprecated_decorator

And then import the deprecation decorator as follows:

from .utils import deprecated

@deprecated("Use method yyy instead")
def some_method()"
 pass

回答 5

更新:我认为更好,当我们只为每行代码第一次显示DeprecationWarning时,以及当我们可以发送一些消息时:

import inspect
import traceback
import warnings
import functools

import time


def deprecated(message: str = ''):
    """
    This is a decorator which can be used to mark functions
    as deprecated. It will result in a warning being emitted
    when the function is used first time and filter is set for show DeprecationWarning.
    """
    def decorator_wrapper(func):
        @functools.wraps(func)
        def function_wrapper(*args, **kwargs):
            current_call_source = '|'.join(traceback.format_stack(inspect.currentframe()))
            if current_call_source not in function_wrapper.last_call_source:
                warnings.warn("Function {} is now deprecated! {}".format(func.__name__, message),
                              category=DeprecationWarning, stacklevel=2)
                function_wrapper.last_call_source.add(current_call_source)

            return func(*args, **kwargs)

        function_wrapper.last_call_source = set()

        return function_wrapper
    return decorator_wrapper


@deprecated('You must use my_func2!')
def my_func():
    time.sleep(.1)
    print('aaa')
    time.sleep(.1)


def my_func2():
    print('bbb')


warnings.simplefilter('always', DeprecationWarning)  # turn off filter
print('before cycle')
for i in range(5):
    my_func()
print('after cycle')
my_func()
my_func()
my_func()

结果:

before cycle
C:/Users/adr-0/OneDrive/Projects/Python/test/unit1.py:45: DeprecationWarning: Function my_func is now deprecated! You must use my_func2!
aaa
aaa
aaa
aaa
aaa
after cycle
C:/Users/adr-0/OneDrive/Projects/Python/test/unit1.py:47: DeprecationWarning: Function my_func is now deprecated! You must use my_func2!
aaa
C:/Users/adr-0/OneDrive/Projects/Python/test/unit1.py:48: DeprecationWarning: Function my_func is now deprecated! You must use my_func2!
aaa
C:/Users/adr-0/OneDrive/Projects/Python/test/unit1.py:49: DeprecationWarning: Function my_func is now deprecated! You must use my_func2!
aaa

Process finished with exit code 0

我们只需单击警告路径,然后转到PyCharm中的行。

UPDATE: I think is better, when we show DeprecationWarning only first time for each code line and when we can send some message:

import inspect
import traceback
import warnings
import functools

import time


def deprecated(message: str = ''):
    """
    This is a decorator which can be used to mark functions
    as deprecated. It will result in a warning being emitted
    when the function is used first time and filter is set for show DeprecationWarning.
    """
    def decorator_wrapper(func):
        @functools.wraps(func)
        def function_wrapper(*args, **kwargs):
            current_call_source = '|'.join(traceback.format_stack(inspect.currentframe()))
            if current_call_source not in function_wrapper.last_call_source:
                warnings.warn("Function {} is now deprecated! {}".format(func.__name__, message),
                              category=DeprecationWarning, stacklevel=2)
                function_wrapper.last_call_source.add(current_call_source)

            return func(*args, **kwargs)

        function_wrapper.last_call_source = set()

        return function_wrapper
    return decorator_wrapper


@deprecated('You must use my_func2!')
def my_func():
    time.sleep(.1)
    print('aaa')
    time.sleep(.1)


def my_func2():
    print('bbb')


warnings.simplefilter('always', DeprecationWarning)  # turn off filter
print('before cycle')
for i in range(5):
    my_func()
print('after cycle')
my_func()
my_func()
my_func()

Result:

before cycle
C:/Users/adr-0/OneDrive/Projects/Python/test/unit1.py:45: DeprecationWarning: Function my_func is now deprecated! You must use my_func2!
aaa
aaa
aaa
aaa
aaa
after cycle
C:/Users/adr-0/OneDrive/Projects/Python/test/unit1.py:47: DeprecationWarning: Function my_func is now deprecated! You must use my_func2!
aaa
C:/Users/adr-0/OneDrive/Projects/Python/test/unit1.py:48: DeprecationWarning: Function my_func is now deprecated! You must use my_func2!
aaa
C:/Users/adr-0/OneDrive/Projects/Python/test/unit1.py:49: DeprecationWarning: Function my_func is now deprecated! You must use my_func2!
aaa

Process finished with exit code 0

We can just click on the warning path and go to the line in PyCharm.


回答 6

增强此答案由史蒂芬Vascellaro

如果使用Anaconda,请先安装deprecation软件包:

conda install -c conda-forge deprecation 

然后将以下内容粘贴到文件顶部

import deprecation

@deprecation.deprecated(deprecated_in="1.0", removed_in="2.0",
                    current_version=__version__,
                    details="Use the bar function instead")
def foo():
    """Do some stuff"""
    return 1

有关完整文档,请参见http://deprecation.readthedocs.io/

Augmenting this answer by Steven Vascellaro:

If you use Anaconda, first install deprecation package:

conda install -c conda-forge deprecation 

Then paste the following on the top of the file

import deprecation

@deprecation.deprecated(deprecated_in="1.0", removed_in="2.0",
                    current_version=__version__,
                    details="Use the bar function instead")
def foo():
    """Do some stuff"""
    return 1

See http://deprecation.readthedocs.io/ for the full documentation.


如何在Flask-SQLAlchemy中按ID删除记录

问题:如何在Flask-SQLAlchemy中按ID删除记录

users在MySql数据库中有表。这个表有idname而且age领域。

如何删除某些记录id

现在,我使用以下代码:

user = User.query.get(id)
db.session.delete(user)
db.session.commit()

但是我不想在删除操作之前进行任何查询。有什么办法吗?我知道,我可以使用db.engine.execute("delete from users where id=..."),但是我想使用delete()方法。

I have users table in my MySql database. This table has id, name and age fields.

How can I delete some record by id?

Now I use the following code:

user = User.query.get(id)
db.session.delete(user)
db.session.commit()

But I don’t want to make any query before delete operation. Is there any way to do this? I know, I can use db.engine.execute("delete from users where id=..."), but I would like to use delete() method.


回答 0

你可以这样做,

User.query.filter_by(id=123).delete()

要么

User.query.filter(User.id == 123).delete()

确保作出commitdelete()生效。

You can do this,

User.query.filter_by(id=123).delete()

or

User.query.filter(User.id == 123).delete()

Make sure to commit for delete() to take effect.


回答 1

只想分享另一个选择:

# mark two objects to be deleted
session.delete(obj1)
session.delete(obj2)

# commit (or flush)
session.commit()

http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#deleting

在此示例中,以下代码可以正常工作:

obj = User.query.filter_by(id=123).one()
session.delete(obj)
session.commit()

Just want to share another option:

# mark two objects to be deleted
session.delete(obj1)
session.delete(obj2)

# commit (or flush)
session.commit()

http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#deleting

In this example, the following codes shall works fine:

obj = User.query.filter_by(id=123).one()
session.delete(obj)
session.commit()

回答 2

另一个可能的解决方案,特别是如果要批量删除

deleted_objects = User.__table__.delete().where(User.id.in_([1, 2, 3]))
session.execute(deleted_objects)
session.commit()

Another possible solution specially if you want batch delete

deleted_objects = User.__table__.delete().where(User.id.in_([1, 2, 3]))
session.execute(deleted_objects)
session.commit()

用None替换Pandas或Numpy Nan以与MysqlDB一起使用

问题:用None替换Pandas或Numpy Nan以与MysqlDB一起使用

我正在尝试使用MysqlDB将Pandas数据帧(或可以使用numpy数组)写入mysql数据库。MysqlDB似乎不理解’nan’,我的数据库抛出一个错误,说nan不在字段列表中。我需要找到一种将’nan’转换为NoneType的方法。

有任何想法吗?

I am trying to write a Pandas dataframe (or can use a numpy array) to a mysql database using MysqlDB . MysqlDB doesn’t seem understand ‘nan’ and my database throws out an error saying nan is not in the field list. I need to find a way to convert the ‘nan’ into a NoneType.

Any ideas?


回答 0

@bogatron正确,您可以使用where,值得注意的是您可以在熊猫本机执行此操作:

df1 = df.where(pd.notnull(df), None)

注意:这会将所有列的dtype更改为object

例:

In [1]: df = pd.DataFrame([1, np.nan])

In [2]: df
Out[2]: 
    0
0   1
1 NaN

In [3]: df1 = df.where(pd.notnull(df), None)

In [4]: df1
Out[4]: 
      0
0     1
1  None

注意:您不能执行的操作dtype是使用astype,然后使用DataFrame fillna方法来重铸DataFrame 以允许所有数据类型,请执行以下操作:

df1 = df.astype(object).replace(np.nan, 'None')

遗憾的是这个没有,也没有使用replace,用作品None这个(关闭)的问题


顺便说一句,值得注意的是,对于大多数用例,您不需要将NaN替换为None,请参阅有关熊猫中NaN和None之间的区别的问题。

但是,在这种特定情况下,您似乎可以这样做(至少在回答此问题时)。

@bogatron has it right, you can use where, it’s worth noting that you can do this natively in pandas:

df1 = df.where(pd.notnull(df), None)

Note: this changes the dtype of all columns to object.

Example:

In [1]: df = pd.DataFrame([1, np.nan])

In [2]: df
Out[2]: 
    0
0   1
1 NaN

In [3]: df1 = df.where(pd.notnull(df), None)

In [4]: df1
Out[4]: 
      0
0     1
1  None

Note: what you cannot do recast the DataFrames dtype to allow all datatypes types, using astype, and then the DataFrame fillna method:

df1 = df.astype(object).replace(np.nan, 'None')

Unfortunately neither this, nor using replace, works with None see this (closed) issue.


As an aside, it’s worth noting that for most use cases you don’t need to replace NaN with None, see this question about the difference between NaN and None in pandas.

However, in this specific case it seems you do (at least at the time of this answer).


回答 1

df = df.replace({np.nan: None})

这个Github问题归功于这个家伙。

df = df.replace({np.nan: None})

Credit goes to this guy here on this Github issue.


回答 2

您可以在numpy数组中替换nanNone

>>> x = np.array([1, np.nan, 3])
>>> y = np.where(np.isnan(x), None, x)
>>> print y
[1.0 None 3.0]
>>> print type(y[1])
<type 'NoneType'>

You can replace nan with None in your numpy array:

>>> x = np.array([1, np.nan, 3])
>>> y = np.where(np.isnan(x), None, x)
>>> print y
[1.0 None 3.0]
>>> print type(y[1])
<type 'NoneType'>

回答 3

经过绊脚,这对我有用:

df = df.astype(object).where(pd.notnull(df),None)

After stumbling around, this worked for me:

df = df.astype(object).where(pd.notnull(df),None)

回答 4

只是@Andy Hayden的答案的补充:

由于DataFrame.mask是的相对孪生子DataFrame.where,因此它们具有完全相同的签名,但含义相反:

  • DataFrame.where对于替换条件为False的很有用
  • DataFrame.mask用于替换条件为True的值。

所以在这个问题上,使用df.mask(df.isna(), other=None, inplace=True)可能会更直观。

Just an addition to @Andy Hayden’s answer:

Since DataFrame.mask is the opposite twin of DataFrame.where, they have the exactly same signature but with opposite meaning:

  • DataFrame.where is useful for Replacing values where the condition is False.
  • DataFrame.mask is used for Replacing values where the condition is True.

So in this question, using df.mask(df.isna(), other=None, inplace=True) might be more intuitive.


回答 5

另外除了:更换倍数和转换从柱背面的类型时要小心对象浮动。如果您想确定自己None的不会退回到np.NaN‘s’,请使用@ andy-hayden的建议pd.where。替换仍然会出错的说明:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({"a": [1, np.NAN, np.inf]})

In [4]: df
Out[4]:
     a
0  1.0
1  NaN
2  inf

In [5]: df.replace({np.NAN: None})
Out[5]:
      a
0     1
1  None
2   inf

In [6]: df.replace({np.NAN: None, np.inf: None})
Out[6]:
     a
0  1.0
1  NaN
2  NaN

In [7]: df.where((pd.notnull(df)), None).replace({np.inf: None})
Out[7]:
     a
0  1.0
1  NaN
2  NaN

Another addition: be careful when replacing multiples and converting the type of the column back from object to float. If you want to be certain that your None‘s won’t flip back to np.NaN‘s apply @andy-hayden’s suggestion with using pd.where. Illustration of how replace can still go ‘wrong’:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({"a": [1, np.NAN, np.inf]})

In [4]: df
Out[4]:
     a
0  1.0
1  NaN
2  inf

In [5]: df.replace({np.NAN: None})
Out[5]:
      a
0     1
1  None
2   inf

In [6]: df.replace({np.NAN: None, np.inf: None})
Out[6]:
     a
0  1.0
1  NaN
2  NaN

In [7]: df.where((pd.notnull(df)), None).replace({np.inf: None})
Out[7]:
     a
0  1.0
1  NaN
2  NaN

回答 6

很老,但我偶然发现了同样的问题。尝试这样做:

df['col_replaced'] = df['col_with_npnans'].apply(lambda x: None if np.isnan(x) else x)

Quite old, yet I stumbled upon the very same issue. Try doing this:

df['col_replaced'] = df['col_with_npnans'].apply(lambda x: None if np.isnan(x) else x)

使用.corr获取两列之间的相关性

问题:使用.corr获取两列之间的相关性

我有以下熊猫数据框Top15

我创建了一个估计每人可引用文件数量的列:

Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['PopEst']

我想知道人均引用文件数量与人均能源供应之间的相关性。因此,我使用了.corr()方法(皮尔逊相关性):

data = Top15[['Citable docs per Capita','Energy Supply per Capita']]
correlation = data.corr(method='pearson')

我想返回一个数字,但是结果是:

I have the following pandas dataframe Top15:

I create a column that estimates the number of citable documents per person:

Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['PopEst']

I want to know the correlation between the number of citable documents per capita and the energy supply per capita. So I use the .corr() method (Pearson’s correlation):

data = Top15[['Citable docs per Capita','Energy Supply per Capita']]
correlation = data.corr(method='pearson')

I want to return a single number, but the result is:


回答 0

没有实际数据,很难回答这个问题,但是我想您正在寻找这样的东西:

Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])

这样就可以计算出两列 'Citable docs per Capita'之间的相关性'Energy Supply per Capita'

举个例子:

import pandas as pd

df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})

   A  B
0  0  0
1  1  2
2  2  4
3  3  6

然后

df['A'].corr(df['B'])

给出1预期。

现在,如果您更改一个值,例如

df.loc[2, 'B'] = 4.5

   A    B
0  0  0.0
1  1  2.0
2  2  4.5
3  3  6.0

命令

df['A'].corr(df['B'])

退货

0.99586

仍接近预期的1。

如果.corr直接应用于数据框,它将返回列之间的所有成对关联;这就是为什么您然后1s在矩阵的对角线处进行观察的原因(每列与自身完全相关)。

df.corr()

因此将返回

          A         B
A  1.000000  0.995862
B  0.995862  1.000000

在您显示的图形中,仅表示相关矩阵的左上角(我假设)。

在某些情况下,您可以NaN在解决方案中找到s-请查看示例。

如果要过滤高于或低于特定阈值的条目,可以检查此问题。如果要绘制相关系数的热图,可以检查该答案,如果然后遇到轴标签重叠的问题,请检查以下文章

Without actual data it is hard to answer the question but I guess you are looking for something like this:

Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])

That calculates the correlation between your two columns 'Citable docs per Capita' and 'Energy Supply per Capita'.

To give an example:

import pandas as pd

df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})

   A  B
0  0  0
1  1  2
2  2  4
3  3  6

Then

df['A'].corr(df['B'])

gives 1 as expected.

Now, if you change a value, e.g.

df.loc[2, 'B'] = 4.5

   A    B
0  0  0.0
1  1  2.0
2  2  4.5
3  3  6.0

the command

df['A'].corr(df['B'])

returns

0.99586

which is still close to 1, as expected.

If you apply .corr directly to your dataframe, it will return all pairwise correlations between your columns; that’s why you then observe 1s at the diagonal of your matrix (each column is perfectly correlated with itself).

df.corr()

will therefore return

          A         B
A  1.000000  0.995862
B  0.995862  1.000000

In the graphic you show, only the upper left corner of the correlation matrix is represented (I assume).

There can be cases, where you get NaNs in your solution – check this post for an example.

If you want to filter entries above/below a certain threshold, you can check this question. If you want to plot a heatmap of the correlation coefficients, you can check this answer and if you then run into the issue with overlapping axis-labels check the following post.


回答 1

我遇到了同样的问题。它似乎Citable Documents per Person是一个浮点数,Python默认以某种方式跳过它。我数据框的所有其他列均为numpy格式,因此我通过将columnt转换为np.float64

Top15['Citable Documents per Person']=np.float64(Top15['Citable Documents per Person'])

请记住,这正是您自己计算的列

I ran into the same issue. It appeared Citable Documents per Person was a float, and python skips it somehow by default. All the other columns of my dataframe were in numpy-formats, so I solved it by converting the columnt to np.float64

Top15['Citable Documents per Person']=np.float64(Top15['Citable Documents per Person'])

Remember it’s exactly the column you calculated yourself


回答 2

我的解决方案是将数据转换为数值类型后:

Top15[['Citable docs per Capita','Energy Supply per Capita']].corr()

My solution would be after converting data to numerical type:

Top15[['Citable docs per Capita','Energy Supply per Capita']].corr()

回答 3

如果要在所有成对的列之间建立关联,可以执行以下操作:

import pandas as pd
import numpy as np

def get_corrs(df):
    col_correlations = df.corr()
    col_correlations.loc[:, :] = np.tril(col_correlations, k=-1)
    cor_pairs = col_correlations.stack()
    return cor_pairs.to_dict()

my_corrs = get_corrs(df)
# and the following line to retrieve the single correlation
print(my_corrs[('Citable docs per Capita','Energy Supply per Capita')])

If you want the correlations between all pairs of columns, you could do something like this:

import pandas as pd
import numpy as np

def get_corrs(df):
    col_correlations = df.corr()
    col_correlations.loc[:, :] = np.tril(col_correlations, k=-1)
    cor_pairs = col_correlations.stack()
    return cor_pairs.to_dict()

my_corrs = get_corrs(df)
# and the following line to retrieve the single correlation
print(my_corrs[('Citable docs per Capita','Energy Supply per Capita')])

回答 4

当您调用:

data = Top15[['Citable docs per Capita','Energy Supply per Capita']]
correlation = data.corr(method='pearson')

由于DataFrame.corr()函数执行成对关联,因此您需要从两个变量中获得四对。因此,基本上,您会得到对角线值作为自动相关性(与自身相关,两个值,因为您有两个变量),而其他两个值作为一个对另一个的互相关,反之亦然。

在两个序列之间执行相关以获得单个值:

from scipy.stats.stats import pearsonr
docs_col = Top15['Citable docs per Capita'].values
energy_col = Top15['Energy Supply per Capita'].values
corr , _ = pearsonr(docs_col, energy_col)

或者,如果您想从同一函数(DataFrame的corr)中获得一个值:

single_value = correlation[0][1] 

希望这可以帮助。

When you call this:

data = Top15[['Citable docs per Capita','Energy Supply per Capita']]
correlation = data.corr(method='pearson')

Since, DataFrame.corr() function performs pair-wise correlations, you have four pair from two variables. So, basically you are getting diagonal values as auto correlation (correlation with itself, two values since you have two variables), and other two values as cross correlations of one vs another and vice versa.

Either perform correlation between two series to get a single value:

from scipy.stats.stats import pearsonr
docs_col = Top15['Citable docs per Capita'].values
energy_col = Top15['Energy Supply per Capita'].values
corr , _ = pearsonr(docs_col, energy_col)

or, if you want a single value from the same function (DataFrame’s corr):

single_value = correlation[0][1] 

Hope this helps.


回答 5

它是这样的:

Top15['Citable docs per Capita']=np.float64(Top15['Citable docs per Capita'])

Top15['Energy Supply per Capita']=np.float64(Top15['Energy Supply per Capita'])

Top15['Energy Supply per Capita'].corr(Top15['Citable docs per Capita'])

It works like this:

Top15['Citable docs per Capita']=np.float64(Top15['Citable docs per Capita'])

Top15['Energy Supply per Capita']=np.float64(Top15['Energy Supply per Capita'])

Top15['Energy Supply per Capita'].corr(Top15['Citable docs per Capita'])

回答 6

我通过更改数据类型解决了这个问题。如果您看到“人均能源供应”是数字类型,而“人均城市文档”则是对象类型。我使用astype将列转换为float。我曾与一些NP功能相同的问题:count_nonzerosum合作,同时meanstd没有。

I solved this problem by changing the data type. If you see the ‘Energy Supply per Capita’ is a numerical type while the ‘Citable docs per Capita’ is an object type. I converted the column to float using astype. I had the same problem with some np functions: count_nonzero and sum worked while mean and std didn’t.


回答 7

在关联之前将“人均Citable docs”更改为数字可以解决该问题。

    Top15['Citable docs per Capita'] = pd.to_numeric(Top15['Citable docs per Capita'])
    data = Top15[['Citable docs per Capita','Energy Supply per Capita']]
    correlation = data.corr(method='pearson')

changing ‘Citable docs per Capita’ to numeric before correlation will solve the problem.

    Top15['Citable docs per Capita'] = pd.to_numeric(Top15['Citable docs per Capita'])
    data = Top15[['Citable docs per Capita','Energy Supply per Capita']]
    correlation = data.corr(method='pearson')

sklearn错误ValueError:输入包含NaN,无穷大或对于dtype(’float64’)而言太大的值

问题:sklearn错误ValueError:输入包含NaN,无穷大或对于dtype(’float64’)而言太大的值

我正在使用sklearn,并且亲和力传播存在问题。我建立了一个输入矩阵,并且不断收到以下错误。

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

我跑了

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

我尝试使用

mat[np.isfinite(mat) == True] = 0

删除无限值,但这也不起作用。我该怎么做才能摆脱矩阵中的无限值,以便可以使用亲和力传播算法?

我正在使用anaconda和python 2.7.9。

I am using sklearn and having a problem with the affinity propagation. I have built an input matrix and I keep getting the following error.

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I have run

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

I tried using

mat[np.isfinite(mat) == True] = 0

to remove the infinite values but this did not work either. What can I do to get rid of the infinite values in my matrix, so that I can use the affinity propagation algorithm?

I am using anaconda and python 2.7.9.


回答 0

这可能会在scikit内部发生,并且取决于您在做什么。我建议您阅读所用功能的文档。您可能正在使用一种方法,例如,这取决于您的矩阵是正定的且不满足该条件。

编辑:我怎么会错过:

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

显然是错误的。正确的是:

np.any(np.isnan(mat))

np.all(np.isfinite(mat))

您想检查任何元素是否为NaN,而不是该any函数的返回值是否为数字…

This might happen inside scikit, and it depends on what you’re doing. I recommend reading the documentation for the functions you’re using. You might be using one which depends e.g. on your matrix being positive definite and not fulfilling that criteria.

EDIT: How could I miss that:

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

is obviously wrong. Right would be:

np.any(np.isnan(mat))

and

np.all(np.isfinite(mat))

You want to check wheter any of the element is NaN, and not whether the return value of the any function is a number…


回答 1

sklearnpandas一起使用时,出现相同的错误消息。我的解决方案是df在运行任何sklearn代码之前重置数据帧的索引:

df = df.reset_index()

当我删除自己的某些条目时,我多次遇到此问题df,例如

df = df[df.label=='desired_one']

I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

df = df.reset_index()

I encountered this issue many times when I removed some entries in my df, such as

df = df[df.label=='desired_one']

回答 2

这是我的功能(基于)清洁的数据集nanInf和缺少细胞(偏斜数据集):

import pandas as pd

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(1)
    return df[indices_to_keep].astype(np.float64)

This is my function (based on this) to clean the dataset of nan, Inf, and missing cells (for skewed datasets):

import pandas as pd

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(1)
    return df[indices_to_keep].astype(np.float64)

回答 3

输入数组的维度倾斜,因为我的输入csv有空白。

The Dimensions of my input array were skewed, as my input csv had empty spaces.


回答 4

这是失败的检查:

哪说

def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)

因此,请确保输入中没有非NaN值。所有这些值实际上都是浮点值。两个值都不应该是Inf。

This is the check on which it fails:

Which says

def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)

So make sure that you have non NaN values in your input. And all those values are actually float values. None of the values should be Inf either.


回答 5

使用此版本的python 3:

/opt/anaconda3/bin/python --version
Python 3.6.0 :: Anaconda 4.3.0 (64-bit)

查看错误的详细信息,我发现导致失败的代码行:

/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
     56             and not np.isfinite(X).all()):
     57         raise ValueError("Input contains NaN, infinity"
---> 58                          " or a value too large for %r." % X.dtype)
     59 
     60 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

由此,我能够使用错误消息给出的相同测试来提取正确的方法来测试数据的处理方式: np.isfinite(X)

然后通过快速而肮脏的循环,我发现我的数据确实包含nans

print(p[:,0].shape)
index = 0
for i in p[:,0]:
    if not np.isfinite(i):
        print(index, i)
    index +=1

(367340,)
4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan
...

现在,我要做的就是删除这些索引中的值。

With this version of python 3:

/opt/anaconda3/bin/python --version
Python 3.6.0 :: Anaconda 4.3.0 (64-bit)

Looking at the details of the error, I found the lines of codes causing the failure:

/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
     56             and not np.isfinite(X).all()):
     57         raise ValueError("Input contains NaN, infinity"
---> 58                          " or a value too large for %r." % X.dtype)
     59 
     60 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

From this, I was able to extract the correct way to test what was going on with my data using the same test which fails given by the error message: np.isfinite(X)

Then with a quick and dirty loop, I was able to find that my data indeed contains nans:

print(p[:,0].shape)
index = 0
for i in p[:,0]:
    if not np.isfinite(i):
        print(index, i)
    index +=1

(367340,)
4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan
...

Now all I have to do is remove the values at these indexes.


回答 6

尝试选择行的子集后出现错误:

df = df.reindex(index=my_index)

原来my_index包含的值不包含在其中df.index,因此reindex函数插入了一些新行并将其填充为nan

I had the error after trying to select a subset of rows:

df = df.reindex(index=my_index)

Turns out that my_index contained values that were not contained in df.index, so the reindex function inserted some new rows and filled them with nan.


回答 7

在大多数情况下,消除无穷和空值可以解决此问题。

摆脱无限的价值。

df.replace([np.inf, -np.inf], np.nan, inplace=True)

以您喜欢的方式消除空值,特定值(例如999),均值,或创建自己的函数来估算缺失值

df.fillna(999, inplace=True)

In most cases getting rid of infinite and null values solve this problem.

get rid of infinite values.

df.replace([np.inf, -np.inf], np.nan, inplace=True)

get rid of null values the way you like, specific value such as 999, mean, or create your own function to impute missing values

df.fillna(999, inplace=True)

回答 8

我有同样的错误,在我的案例中,X和y是数据帧,因此我必须先将它们转换为矩阵:

X = X.values.astype(np.float)
y = y.values.astype(np.float)

编辑:建议使用最初建议的X.as_matrix()

I had the same error, and in my case X and y were dataframes so I had to convert them to matrices first:

X = X.values.astype(np.float)
y = y.values.astype(np.float)

Edit: The originally suggested X.as_matrix() is Deprecated


回答 9

我有同样的错误。它曾与df.fillna(-99999, inplace=True)做任何替换之前,替换等

i got the same error. it worked with df.fillna(-99999, inplace=True) before doing any replacement, substitution etc


回答 10

就我而言,问题是许多scikit函数返回的numpy数组没有熊猫索引。因此,当我使用这些numpy数组构建新的DataFrames,然后尝试将它们与原始数据混合时,索引不匹配。

In my case the problem was that many scikit functions return numpy arrays, which are devoid of pandas index. So there was an index mismatch when I used those numpy arrays to build new DataFrames and then I tried to mix them with the original data.


回答 11

删除所有无限值:

(并用该列的min或max代替)

# find min and max values for each column, ignoring nan, -inf, and inf
mins = [np.nanmin(matrix[:, i][matrix[:, i] != -np.inf]) for i in range(matrix.shape[1])]
maxs = [np.nanmax(matrix[:, i][matrix[:, i] != np.inf]) for i in range(matrix.shape[1])]

# go through matrix one column at a time and replace  + and -infinity 
# with the max or min for that column
for i in range(log_train_arr.shape[1]):
    matrix[:, i][matrix[:, i] == -np.inf] = mins[i]
    matrix[:, i][matrix[:, i] == np.inf] = maxs[i]

Remove all infinite values:

(and replace with min or max for that column)

# find min and max values for each column, ignoring nan, -inf, and inf
mins = [np.nanmin(matrix[:, i][matrix[:, i] != -np.inf]) for i in range(matrix.shape[1])]
maxs = [np.nanmax(matrix[:, i][matrix[:, i] != np.inf]) for i in range(matrix.shape[1])]

# go through matrix one column at a time and replace  + and -infinity 
# with the max or min for that column
for i in range(log_train_arr.shape[1]):
    matrix[:, i][matrix[:, i] == -np.inf] = mins[i]
    matrix[:, i][matrix[:, i] == np.inf] = maxs[i]

回答 12

尝试

mat.sum()

如果您的数据总和为无穷大(最大浮动值大于3.402823e + 38),则会收到该错误。

请参阅scikit源代码中validation.py中的_assert_all_finite函数:

if is_float and np.isfinite(X.sum()):
    pass
elif is_float:
    msg_err = "Input contains {} or a value too large for {!r}."
    if (allow_nan and np.isinf(X).any() or
            not allow_nan and not np.isfinite(X).all()):
        type_err = 'infinity' if allow_nan else 'NaN, infinity'
        # print(X.sum())
        raise ValueError(msg_err.format(type_err, X.dtype))

try

mat.sum()

If the sum of your data is infinity (greater that the max float value which is 3.402823e+38) you will get that error.

see the _assert_all_finite function in validation.py from the scikit source code:

if is_float and np.isfinite(X.sum()):
    pass
elif is_float:
    msg_err = "Input contains {} or a value too large for {!r}."
    if (allow_nan and np.isinf(X).any() or
            not allow_nan and not np.isfinite(X).all()):
        type_err = 'infinity' if allow_nan else 'NaN, infinity'
        # print(X.sum())
        raise ValueError(msg_err.format(type_err, X.dtype))

如何使用Python请求伪造浏览器访问?

问题:如何使用Python请求伪造浏览器访问?

我想从下面的网站获取内容。如果使用Firefox或Chrome这样的浏览器,则可以获取所需的真实网站页面,但是如果使用Python request软件包(或wget命令)进行获取,则它将返回完全不同的HTML页面。我以为网站的开发人员为此做了一些阻碍,所以问题是:

如何使用python请求或命令wget伪造浏览器访问?

http://www.ichangtou.com/#company:data_000008.html

I want to get the content from the below website. If I use a browser like Firefox or Chrome I could get the real website page I want, but if I use the Python requests package (or wget command) to get it, it returns a totally different HTML page. I thought the developer of the website had made some blocks for this, so the question is:

How do I fake a browser visit by using python requests or command wget?

http://www.ichangtou.com/#company:data_000008.html


回答 0

提供User-Agent标题

import requests

url = 'http://www.ichangtou.com/#company:data_000008.html'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response.content)

仅供参考,这是不同浏览器的用户代理字符串的列表:


附带说明一下,有一个非常有用的第三方程序包,称为fake-useragent,它在用户代理上提供了一个不错的抽象层:

假用户代理

最新的简单useragent伪造者与实际数据库

演示:

>>> from fake_useragent import UserAgent
>>> ua = UserAgent()
>>> ua.chrome
u'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
>>> ua.random
u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'

Provide a User-Agent header:

import requests

url = 'http://www.ichangtou.com/#company:data_000008.html'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response.content)

FYI, here is a list of User-Agent strings for different browsers:


As a side note, there is a pretty useful third-party package called fake-useragent that provides a nice abstraction layer over user agents:

fake-useragent

Up to date simple useragent faker with real world database

Demo:

>>> from fake_useragent import UserAgent
>>> ua = UserAgent()
>>> ua.chrome
u'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
>>> ua.random
u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'

回答 1

如果这个问题仍然有效

我使用了伪造的UserAgent

如何使用:

from fake_useragent import UserAgent
import requests


ua = UserAgent()
print(ua.chrome)
header = {'User-Agent':str(ua.chrome)}
print(header)
url = "https://www.hybrid-analysis.com/recent-submissions?filter=file&sort=^timestamp"
htmlContent = requests.get(url, headers=header)
print(htmlContent)

输出:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17
{'User-Agent': 'Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
<Response [200]>

if this question is still valid

I used fake UserAgent

How to use:

from fake_useragent import UserAgent
import requests


ua = UserAgent()
print(ua.chrome)
header = {'User-Agent':str(ua.chrome)}
print(header)
url = "https://www.hybrid-analysis.com/recent-submissions?filter=file&sort=^timestamp"
htmlContent = requests.get(url, headers=header)
print(htmlContent)

outPut:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17
{'User-Agent': 'Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
<Response [200]>

回答 2

尝试使用Firefox作为伪造的用户代理来执行此操作(此外,这是使用Cookie进行网络抓取的良好启动脚本):

#!/usr/bin/env python2
# -*- coding: utf8 -*-
# vim:ts=4:sw=4


import cookielib, urllib2, sys

def doIt(uri):
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    page = opener.open(uri)
    page.addheaders = [('User-agent', 'Mozilla/5.0')]
    print page.read()

for i in sys.argv[1:]:
    doIt(i)

用法:

python script.py "http://www.ichangtou.com/#company:data_000008.html"

Try doing this, using firefox as fake user agent (moreover, it’s a good startup script for web scraping with the use of cookies):

#!/usr/bin/env python2
# -*- coding: utf8 -*-
# vim:ts=4:sw=4


import cookielib, urllib2, sys

def doIt(uri):
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    page = opener.open(uri)
    page.addheaders = [('User-agent', 'Mozilla/5.0')]
    print page.read()

for i in sys.argv[1:]:
    doIt(i)

USAGE:

python script.py "http://www.ichangtou.com/#company:data_000008.html"

回答 3

答案的根源是,提出问题的人需要有一个JavaScript解释器才能获得所要查找的内容。我发现我可以在JSON网站上获取想要的所有信息,然后再用JavaScript对其进行解释。这为我节省了很多时间来解析html,希望每个网页都采用相同的格式。

因此,当您从网站收到使用请求的响应时,请真正查看html / text,因为您可能会在页脚中找到可解析的javascripts JSON。

The root of the answer is that the person asking the question needs to have a JavaScript interpreter to get what they are after. What I have found is I am able to get all of the information I wanted on a website in json before it was interpreted by JavaScript. This has saved me a ton of time in what would be parsing html hoping each webpage is in the same format.

So when you get a response from a website using requests really look at the html/text because you might find the javascripts JSON in the footer ready to be parsed.


使用pandas GroupBy.agg()对同一列进行多次聚合

问题:使用pandas GroupBy.agg()对同一列进行多次聚合

是否有熊猫内置的方法将两个不同的聚合函数f1, f2应用于同一列df["returns"],而无需agg()多次调用?

示例数据框:

import pandas as pd
import datetime as dt

pd.np.random.seed(0)
df = pd.DataFrame({
         "date"    :  [dt.date(2012, x, 1) for x in range(1, 11)], 
         "returns" :  0.05 * np.random.randn(10), 
         "dummy"   :  np.repeat(1, 10)
}) 

语法上错误但直观上正确的方法是:

# Assume `f1` and `f2` are defined for aggregating.
df.groupby("dummy").agg({"returns": f1, "returns": f2})

显然,Python不允许重复的键。还有其他表达方式agg()吗?也许元组列表[(column, function)]可以更好地工作,以允许将多个函数应用于同一列?但agg()似乎它只接受字典。

除了定义仅在其中应用这两个功能的辅助功能之外,还有其他解决方法吗?(无论如何,这如何与聚合一起使用?)

Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times?

Example dataframe:

import pandas as pd
import datetime as dt

pd.np.random.seed(0)
df = pd.DataFrame({
         "date"    :  [dt.date(2012, x, 1) for x in range(1, 11)], 
         "returns" :  0.05 * np.random.randn(10), 
         "dummy"   :  np.repeat(1, 10)
}) 

The syntactically wrong, but intuitively right, way to do it would be:

# Assume `f1` and `f2` are defined for aggregating.
df.groupby("dummy").agg({"returns": f1, "returns": f2})

Obviously, Python doesn’t allow duplicate keys. Is there any other manner for expressing the input to agg()? Perhaps a list of tuples [(column, function)] would work better, to allow multiple functions applied to the same column? But agg() seems like it only accepts a dictionary.

Is there a workaround for this besides defining an auxiliary function that just applies both of the functions inside of it? (How would this work with aggregation anyway?)


回答 0

您可以简单地将函数作为列表传递:

In [20]: df.groupby("dummy").agg({"returns": [np.mean, np.sum]})
Out[20]:         
           mean       sum
dummy                    
1      0.036901  0.369012

或作为字典:

In [21]: df.groupby('dummy').agg({'returns':
                                  {'Mean': np.mean, 'Sum': np.sum}})
Out[21]: 
        returns          
           Mean       Sum
dummy                    
1      0.036901  0.369012

You can simply pass the functions as a list:

In [20]: df.groupby("dummy").agg({"returns": [np.mean, np.sum]})
Out[20]:         
           mean       sum
dummy                    
1      0.036901  0.369012

or as a dictionary:

In [21]: df.groupby('dummy').agg({'returns':
                                  {'Mean': np.mean, 'Sum': np.sum}})
Out[21]: 
        returns          
           Mean       Sum
dummy                    
1      0.036901  0.369012

回答 1

TLDR;Pandas groupby.agg具有一种新的,更简单的语法,用于指定(1)多列聚合,以及(2)一列多个聚合。因此,要对大于等于0.25的熊猫执行此操作,请使用

df.groupby('dummy').agg(Mean=('returns', 'mean'), Sum=('returns', 'sum'))

           Mean       Sum
dummy                    
1      0.036901  0.369012

要么

df.groupby('dummy')['returns'].agg(Mean='mean', Sum='sum')

           Mean       Sum
dummy                    
1      0.036901  0.369012

大熊猫> = 0.25:命名汇总

熊猫已经改变了行为,GroupBy.agg转而使用更直观的语法来指定命名聚合。请参阅0.25文档部分中的增强功能以及相关的GitHub问题GH18366GH26512

从文档中

为了支持特定于列的聚合并控制输出列名称,pandas接受特殊的语法GroupBy.agg(),称为“命名聚合”,其中

  • 关键字是输出列名称
  • 值是元组,其第一个元素是要选择的列,第二个元素是要应用于该列的聚合。Pandas为pandas.NamedAgg namedtuple提供了字段[‘column’,’aggfunc’],以使参数更清晰。像往常一样,聚合可以是可调用的或字符串别名。

您现在可以通过关键字参数传递一个元组。元组遵循的格式(<colName>, <aggFunc>)

import pandas as pd

pd.__version__                                                                                                                            
# '0.25.0.dev0+840.g989f912ee'

# Setup
df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0],
                   'weight': [7.9, 7.5, 9.9, 198.0]
})

df.groupby('kind').agg(
    max_height=('height', 'max'), min_weight=('weight', 'min'),)

      max_height  min_weight
kind                        
cat          9.5         7.9
dog         34.0         7.5

另外,您可以使用pd.NamedAgg(本质上是namedtuple)使事情更明确。

df.groupby('kind').agg(
    max_height=pd.NamedAgg(column='height', aggfunc='max'), 
    min_weight=pd.NamedAgg(column='weight', aggfunc='min')
)

      max_height  min_weight
kind                        
cat          9.5         7.9
dog         34.0         7.5

对于Series来说甚至更简单,只需将aggfunc传递给关键字参数即可。

df.groupby('kind')['height'].agg(max_height='max', min_height='min')    

      max_height  min_height
kind                        
cat          9.5         9.1
dog         34.0         6.0       

最后,如果您的列名不是有效的python标识符,请使用带有解包功能的字典:

df.groupby('kind')['height'].agg(**{'max height': 'max', ...})

熊猫<0.25

在最新版本的熊猫(最高可达0.24)中,如果使用字典为聚合输出指定列名,则会得到FutureWarning

df.groupby('dummy').agg({'returns': {'Mean': 'mean', 'Sum': 'sum'}})
# FutureWarning: using a dict with renaming is deprecated and will be removed 
# in a future version

v0.20中不推荐使用字典重命名列。在较新版本的熊猫上,可以通过传递元组列表来更简单地指定它。如果以这种方式指定函数,则该列的所有函数都必须指定为(名称,函数)对的元组。

df.groupby("dummy").agg({'returns': [('op1', 'sum'), ('op2', 'mean')]})

        returns          
            op1       op2
dummy                    
1      0.328953  0.032895

要么,

df.groupby("dummy")['returns'].agg([('op1', 'sum'), ('op2', 'mean')])

            op1       op2
dummy                    
1      0.328953  0.032895

TLDR; Pandas groupby.agg has a new, easier syntax for specifying (1) aggregations on multiple columns, and (2) multiple aggregations on a column. So, to do this for pandas >= 0.25, use

df.groupby('dummy').agg(Mean=('returns', 'mean'), Sum=('returns', 'sum'))

           Mean       Sum
dummy                    
1      0.036901  0.369012

OR

df.groupby('dummy')['returns'].agg(Mean='mean', Sum='sum')

           Mean       Sum
dummy                    
1      0.036901  0.369012

Pandas >= 0.25: Named Aggregation

Pandas has changed the behavior of GroupBy.agg in favour of a more intuitive syntax for specifying named aggregations. See the 0.25 docs section on Enhancements as well as relevant GitHub issues GH18366 and GH26512.

From the documentation,

To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in GroupBy.agg(), known as “named aggregation”, where

  • The keywords are the output column names
  • The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. Pandas provides the pandas.NamedAgg namedtuple with the fields [‘column’, ‘aggfunc’] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.

You can now pass a tuple via keyword arguments. The tuples follow the format of (<colName>, <aggFunc>).

import pandas as pd

pd.__version__                                                                                                                            
# '0.25.0.dev0+840.g989f912ee'

# Setup
df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0],
                   'weight': [7.9, 7.5, 9.9, 198.0]
})

df.groupby('kind').agg(
    max_height=('height', 'max'), min_weight=('weight', 'min'),)

      max_height  min_weight
kind                        
cat          9.5         7.9
dog         34.0         7.5

Alternatively, you can use pd.NamedAgg (essentially a namedtuple) which makes things more explicit.

df.groupby('kind').agg(
    max_height=pd.NamedAgg(column='height', aggfunc='max'), 
    min_weight=pd.NamedAgg(column='weight', aggfunc='min')
)

      max_height  min_weight
kind                        
cat          9.5         7.9
dog         34.0         7.5

It is even simpler for Series, just pass the aggfunc to a keyword argument.

df.groupby('kind')['height'].agg(max_height='max', min_height='min')    

      max_height  min_height
kind                        
cat          9.5         9.1
dog         34.0         6.0       

Lastly, if your column names aren’t valid python identifiers, use a dictionary with unpacking:

df.groupby('kind')['height'].agg(**{'max height': 'max', ...})

Pandas < 0.25

In more recent versions of pandas leading upto 0.24, if using a dictionary for specifying column names for the aggregation output, you will get a FutureWarning:

df.groupby('dummy').agg({'returns': {'Mean': 'mean', 'Sum': 'sum'}})
# FutureWarning: using a dict with renaming is deprecated and will be removed 
# in a future version

Using a dictionary for renaming columns is deprecated in v0.20. On more recent versions of pandas, this can be specified more simply by passing a list of tuples. If specifying the functions this way, all functions for that column need to be specified as tuples of (name, function) pairs.

df.groupby("dummy").agg({'returns': [('op1', 'sum'), ('op2', 'mean')]})

        returns          
            op1       op2
dummy                    
1      0.328953  0.032895

Or,

df.groupby("dummy")['returns'].agg([('op1', 'sum'), ('op2', 'mean')])

            op1       op2
dummy                    
1      0.328953  0.032895

回答 2

这样的事情会做:

In [7]: df.groupby('dummy').returns.agg({'func1' : lambda x: x.sum(), 'func2' : lambda x: x.prod()})
Out[7]: 
              func2     func1
dummy                        
1     -4.263768e-16 -0.188565

Would something like this work:

In [7]: df.groupby('dummy').returns.agg({'func1' : lambda x: x.sum(), 'func2' : lambda x: x.prod()})
Out[7]: 
              func2     func1
dummy                        
1     -4.263768e-16 -0.188565

使用boto3连接到CloudFront时如何选择AWS配置文件

问题:使用boto3连接到CloudFront时如何选择AWS配置文件

我正在使用Boto 3 python库,并想连接到AWS CloudFront。我需要指定正确的AWS Profile(AWS凭证),但是在查看官方文档时,我看不到指定它的方法。

我正在使用代码初始化客户端: client = boto3.client('cloudfront')

但是,这导致它使用默认配置文件进行连接。我找不到可以指定要使用的配置文件的方法。

I am using the Boto 3 python library, and want to connect to AWS CloudFront. I need to specify the correct AWS Profile (AWS Credentials), but looking at the official documentation, I see no way to specify it.

I am initializing the client using the code: client = boto3.client('cloudfront')

However, this results in it using the default profile to connect. I couldn’t find a method where I can specify which profile to use.


回答 0

我认为文档在展示如何执行此操作方面并不出色。一段时间以来,它一直是受支持的功能,并且此pull request中有一些细节。

因此,有三种不同的方法可以做到这一点:

选项A)使用配置文件创建新会话

    dev = boto3.session.Session(profile_name='dev')

选项B)在代码中更改默认会话的配置文件

    boto3.setup_default_session(profile_name='dev')

选项C)使用环境变量更改默认会话的配置文件

    $ AWS_PROFILE=dev ipython
    >>> import boto3
    >>> s3dev = boto3.resource('s3')

I think the docs aren’t wonderful at exposing how to do this. It has been a supported feature for some time, however, and there are some details in this pull request.

So there are three different ways to do this:

Option A) Create a new session with the profile

    dev = boto3.session.Session(profile_name='dev')

Option B) Change the profile of the default session in code

    boto3.setup_default_session(profile_name='dev')

Option C) Change the profile of the default session with an environment variable

    $ AWS_PROFILE=dev ipython
    >>> import boto3
    >>> s3dev = boto3.resource('s3')

回答 1

这样做以使用名称为’dev’的配置文件:

session = boto3.session.Session(profile_name='dev')
s3 = session.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)

Do this to use a profile with name ‘dev’:

session = boto3.session.Session(profile_name='dev')
s3 = session.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)

回答 2

boto3文档的本部分非常有用。

这对我有用:

session = boto3.Session(profile_name='dev')
client = session.client('cloudfront')

This section of the boto3 documentation is helpful.

Here’s what worked for me:

session = boto3.Session(profile_name='dev')
client = session.client('cloudfront')

回答 3

只需在客户端调用之前将配置文件添加到会话配置即可。 boto3.session.Session(profile_name='YOUR_PROFILE_NAME').client('cloudwatch')

Just add profile to session configuration before client call. boto3.session.Session(profile_name='YOUR_PROFILE_NAME').client('cloudwatch')


Python中是否有一个//运算符的上限?

问题:Python中是否有一个//运算符的上限?

我发现了//Python中的运算符,在Python 3中该运算符与下限相除。

是否有一个运算符与ceil分开?(我知道/在Python 3中执行浮点除法的运算符。)

I found out about the // operator in Python which in Python 3 does division with floor.

Is there an operator which divides with ceil instead? (I know about the / operator which in Python 3 does floating point division.)


回答 0

没有运算符与ceil分开。您需要import math使用math.ceil

There is no operator which divides with ceil. You need to import math and use math.ceil


回答 1

您可以做上下颠倒的楼层划分:

def ceildiv(a, b):
    return -(-a // b)

之所以有效,是因为Python的除法运算符执行地板除法(与C语言不同,整数除法会截断小数部分)。

这也适用于Python的大整数,因为没有(有损的)浮点转换。

这是一个示范:

>>> from __future__ import division   # a/b is float division
>>> from math import ceil
>>> b = 3
>>> for a in range(-7, 8):
...     print(["%d/%d" % (a, b), int(ceil(a / b)), -(-a // b)])
... 
['-7/3', -2, -2]
['-6/3', -2, -2]
['-5/3', -1, -1]
['-4/3', -1, -1]
['-3/3', -1, -1]
['-2/3', 0, 0]
['-1/3', 0, 0]
['0/3', 0, 0]
['1/3', 1, 1]
['2/3', 1, 1]
['3/3', 1, 1]
['4/3', 2, 2]
['5/3', 2, 2]
['6/3', 2, 2]
['7/3', 3, 3]

You can just do upside-down floor division:

def ceildiv(a, b):
    return -(-a // b)

This works because Python’s division operator does floor division (unlike in C, where integer division truncates the fractional part).

This also works with Python’s big integers, because there’s no (lossy) floating-point conversion.

Here’s a demonstration:

>>> from __future__ import division   # a/b is float division
>>> from math import ceil
>>> b = 3
>>> for a in range(-7, 8):
...     print(["%d/%d" % (a, b), int(ceil(a / b)), -(-a // b)])
... 
['-7/3', -2, -2]
['-6/3', -2, -2]
['-5/3', -1, -1]
['-4/3', -1, -1]
['-3/3', -1, -1]
['-2/3', 0, 0]
['-1/3', 0, 0]
['0/3', 0, 0]
['1/3', 1, 1]
['2/3', 1, 1]
['3/3', 1, 1]
['4/3', 2, 2]
['5/3', 2, 2]
['6/3', 2, 2]
['7/3', 3, 3]

回答 2

你可以做(x + (d-1)) // d划分时x通过d,即(x + 4) // 5

You could do (x + (d-1)) // d when dividing x by d, i.e. (x + 4) // 5.


回答 3

解决方案1:通过求反将地板转换为天花板

def ceiling_division(n, d):
    return -(n // -d)

让人联想到Penn&Teller的悬浮技巧,“将世界颠倒(带负号),使用普通地板分隔(天花板和地板已互换),然后使世界朝上(带负号)。 ”

解决方案2:让divmod()完成工作

def ceiling_division(n, d):
    q, r = divmod(n, d)
    return q + bool(r)

所述divmod()函数给出(a // b, a % b)为整数(这可能是用浮漂较不可靠,由于舍入误差)。bool(r)每当存在非零余数时,带有的步骤会将商加1。

解决方案3:在除法之前调整分子

def ceiling_division(n, d):
    return (n + d - 1) // d

向上平移分子,以便将地板划分向下舍入到所需的上限。注意,这仅适用于整数。

解决方案4:转换为浮点数以使用math.ceil()

def ceiling_division(n, d):
    return math.ceil(n / d)

math.ceil()代码很容易理解,但它从整数到彩车和背部转换。这不是很快,并且可能存在舍入问题。而且,它依赖于Python 3语义,其中“真除法”产生浮点,而ceil()函数返回整数。

Solution 1: Convert floor to ceiling with negation

def ceiling_division(n, d):
    return -(n // -d)

Reminiscent of the Penn & Teller levitation trick, this “turns the world upside down (with negation), uses plain floor division (where the ceiling and floor have been swapped), and then turns the world right-side up (with negation again)”

Solution 2: Let divmod() do the work

def ceiling_division(n, d):
    q, r = divmod(n, d)
    return q + bool(r)

The divmod() function gives (a // b, a % b) for integers (this may be less reliable with floats due to round-off error). The step with bool(r) adds one to the quotient whenever there is a non-zero remainder.

Solution 3: Adjust the numerator before the division

def ceiling_division(n, d):
    return (n + d - 1) // d

Translate the numerator upwards so that floor division rounds down to the intended ceiling. Note, this only works for integers.

Solution 4: Convert to floats to use math.ceil()

def ceiling_division(n, d):
    return math.ceil(n / d)

The math.ceil() code is easy to understand, but it converts from ints to floats and back. This isn’t very fast and it may have rounding issues. Also, it relies on Python 3 semantics where “true division” produces a float and where the ceil() function returns an integer.


回答 4

您也可以随时内联进行

((foo - 1) // bar) + 1

在python3中,只要您关心速度,这比强制进行float除法和调用ceil()快一个数量级。除非您已经通过使用证明,否则您不应该这样做。

>>> timeit.timeit("((5 - 1) // 4) + 1", number = 100000000)
1.7249219375662506
>>> timeit.timeit("ceil(5/4)", setup="from math import ceil", number = 100000000)
12.096064013894647

You can always just do it inline as well

((foo - 1) // bar) + 1

In python3, this is just shy of an order of magnitude faster than forcing the float division and calling ceil(), provided you care about the speed. Which you shouldn’t, unless you’ve proven through usage that you need to.

>>> timeit.timeit("((5 - 1) // 4) + 1", number = 100000000)
1.7249219375662506
>>> timeit.timeit("ceil(5/4)", setup="from math import ceil", number = 100000000)
12.096064013894647

回答 5

请注意math.ceil限制为53位精度。如果使用大整数,则可能无法获得准确的结果。

gmpy2 libary提供c_div它采用天花板的舍入函数。

免责声明:我维护gmpy2。

Note that math.ceil is limited to 53 bits of precision. If you are working with large integers, you may not get exact results.

The gmpy2 libary provides a c_div function which uses ceiling rounding.

Disclaimer: I maintain gmpy2.


回答 6

简单的解决方案:a // b + 1

Simple solution: a // b + 1