Getting the name of a variable as a string

Question: Getting the name of a variable as a string

This thread discusses how to get the name of a function as a string in Python: How to get a function name as a string?

How can I do the same for a variable? As opposed to functions, Python variables do not have the __name__ attribute.

In other words, if I have a variable such as:

foo = dict()
foo['bar'] = 2

I am looking for a function/attribute, e.g. retrieve_name() in order to create a DataFrame in Pandas from this list, where the column names are given by the names of the actual dictionaries:

# List of dictionaries for my DataFrame
list_of_dicts = [n_jobs, users, queues, priorities]
columns = [retrieve_name(d) for d in list_of_dicts] 

Answer 0

Using the python-varname package, you can easily retrieve the names of variables:

https://github.com/pwwang/python-varname

In your case, you can do:

from varname import Wrapper

foo = Wrapper(dict())

# foo.name == 'foo'
# foo.value == {}
foo.value['bar'] = 2

For the list comprehension part, you can do:

n_jobs = Wrapper(<original_value>) 
users = Wrapper(<original_value>) 
queues = Wrapper(<original_value>) 
priorities = Wrapper(<original_value>) 

list_of_dicts = [n_jobs, users, queues, priorities]
columns = [d.name for d in list_of_dicts]
# ['n_jobs', 'users', 'queues', 'priorities']
# REMEMBER that you have to access the <original_value> by d.value

You can also try to retrieve the variable name DIRECTLY:

from varname import nameof

foo = dict()

fooname = nameof(foo)
# fooname == 'foo'

Note that nameof returns the name of the variable passed directly to it, which may not be what you expect in a case like this:

n_jobs = <original_value>
d = n_jobs

nameof(d) # will return d, instead of n_jobs
# nameof only works directly with the variable

I am the author of this package. Please let me know if you have any questions or you can submit issues on Github.


Answer 1

The only objects in Python that have canonical names are modules, functions, and classes, and of course there is no guarantee that this canonical name has any meaning in any namespace after the function or class has been defined or the module imported. These names can also be modified after the objects are created so they may not always be particularly trustworthy.

What you want to do is not possible without recursively walking the tree of named objects; a name is a one-way reference to an object. A common or garden-variety Python object contains no references to its names. Imagine if every integer, every dict, every list, every Boolean needed to maintain a list of strings that represented names that referred to it! It would be an implementation nightmare, with little benefit to the programmer.
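To make the one-way nature of names concrete, here is a minimal sketch. The helper names_bound_to and the sample namespace are illustrative only, not part of any library. A namespace behaves like a dictionary from names to objects, so going from an object back to its names means scanning that dictionary, and the scan may find several names or none.

```python
def names_bound_to(obj, namespace):
    """Return every name in `namespace` that is bound to exactly this object."""
    return sorted(name for name, value in namespace.items() if value is obj)


shared = []
# A namespace is essentially a mapping from names to objects.
namespace = {"a": shared, "b": shared, "c": []}

print(names_bound_to(shared, namespace))    # two names refer to the same list
print(names_bound_to(object(), namespace))  # an object that no name refers to
```

The object itself stores nothing about "a" or "b"; only the namespace does.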


Answer 2

Even if variable values don’t point back to the name, you have access to the list of every assigned variable and its value, so I’m astounded that only one person suggested looping through there to look for your var name.

Someone mentioned on that answer that you might have to walk the stack and check everyone’s locals and globals to find foo, but if foo is assigned in the scope where you’re calling this retrieve_name function, you can use inspect’s current frame to get you all of those local variables.

My explanation might be a little bit too wordy (maybe I should’ve used a “foo” less words), but here’s how it would look in code (Note that if there is more than one variable assigned to the same value, you will get both of those variable names):

import inspect

x,y,z = 1,2,3

def retrieve_name(var):
    callers_local_vars = inspect.currentframe().f_back.f_locals.items()
    return [var_name for var_name, var_val in callers_local_vars if var_val is var]

print(retrieve_name(y))

If you’re calling this function from another function, something like:

def foo(bar):
    return retrieve_name(bar)

foo(baz)

And you want baz instead of bar, you’ll just need to go back a scope further. This can be done by adding an extra .f_back in the callers_local_vars initialization.

See an example here: ideone
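As a rough sketch of the "go back one scope further" idea, the variant below takes a depth parameter instead of hard-coding the number of .f_back steps. The depth parameter and the demo function are assumptions added for illustration, and the approach relies on inspect.currentframe() being available, as it is in CPython.

```python
import inspect


def retrieve_name(var, depth=1):
    """Return the names bound to `var` in the frame `depth` levels above this one."""
    frame = inspect.currentframe()
    for _ in range(depth):
        frame = frame.f_back  # step out one calling scope per iteration
    return [name for name, value in frame.f_locals.items() if value is var]


def foo(bar):
    # depth=2 skips foo's own frame (where the object is called `bar`)
    # and looks in the frame of whoever called foo.
    return retrieve_name(bar, depth=2)


def demo():
    baz = object()
    return foo(baz)


print(demo())
```

With the default depth=1, the same call from inside foo would report bar, because that is the only name the object has in foo's frame.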


Answer 3

With Python 3.8 or later, one can simply use the f-string debugging feature:

>>> foo = dict()
>>> f'{foo=}'.split('=')[0]
'foo' 
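A slightly longer sketch of the same trick (Python 3.8 or later; the sample dictionary is made up). It shows what the `=` specifier actually produces. The text before the equals sign is whatever expression was typed in the source, so this cannot recover the name of an object that was passed into a function or stored in a list.

```python
foo = {"bar": 2}

# "{foo=}" expands to the expression text, an equals sign, and repr(value).
debug_text = f"{foo=}"
name = debug_text.partition("=")[0]

# The text before "=" is the expression as written, not necessarily a bare name.
expr = f"{foo['bar']=}".partition("=")[0]

print(debug_text)
print(name)
print(expr)
```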

Answer 4

On Python 3, this function returns the outermost name in the stack:

import inspect


def retrieve_name(var):
    """
    Gets the name of var, searching from the outermost frame inwards.
    :param var: variable to get name from.
    :return: string
    """
    for fi in reversed(inspect.stack()):
        names = [var_name for var_name, var_val in fi.frame.f_locals.items() if var_val is var]
        if len(names) > 0:
            return names[0]

It is useful anywhere in the code. It traverses the reversed stack looking for the first match.


Answer 5

I don’t believe this is possible. Consider the following example:

>>> a = []
>>> b = a
>>> id(a)
140031712435664
>>> id(b)
140031712435664

The a and b point to the same object, but the object can’t know what variables point to it.


Answer 6

def name(**variables):
    return [x for x in variables]

It’s used like this:

name(variable=variable)
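Applied to the dictionaries from the question, a minimal sketch looks like this. The dictionary contents are made up for illustration. The caller types each name twice, once as the keyword and once as the value, and the keywords come back in the order they were passed (Python 3.6 or later).

```python
def name(**variables):
    # Keyword arguments arrive as a dict keyed by the names the caller typed.
    return [x for x in variables]


n_jobs = {"a": 1}
users = {"b": 2}

columns = name(n_jobs=n_jobs, users=users)
print(columns)
```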

Answer 7

Here’s one approach. I wouldn’t recommend this for anything important, because it’ll be quite brittle. But it can be done.

Create a function that uses the inspect module to find the source code that called it. Then you can parse the source code to identify the variable names that you want to retrieve. For example, here’s a function called autodict that takes a list of variables and returns a dictionary mapping variable names to their values. E.g.:

x = 'foo'
y = 'bar'
d = autodict(x, y)
print(d)

Would give:

{'x': 'foo', 'y': 'bar'}

Inspecting the source code itself is better than searching through the locals() or globals() because the latter approach doesn’t tell you which of the variables are the ones you want.

At any rate, here’s the code:

import inspect

def autodict(*args):
    get_rid_of = ['autodict(', ',', ')', '\n']
    calling_code = inspect.getouterframes(inspect.currentframe())[1][4][0]
    calling_code = calling_code[calling_code.index('autodict'):]
    for garbage in get_rid_of:
        calling_code = calling_code.replace(garbage, '')
    var_names, var_values = calling_code.split(), args
    dyn_dict = {var_name: var_value for var_name, var_value in
                zip(var_names, var_values)}
    return dyn_dict

The action happens in the line with inspect.getouterframes, which returns the string within the code that called autodict.

The obvious downside to this sort of magic is that it makes assumptions about how the source code is structured. And of course, it won’t work at all if it’s run inside the interpreter.


Answer 8

I wrote the package sorcery to do this kind of magic robustly. You can write:

from sorcery import dict_of

columns = dict_of(n_jobs, users, queues, priorities)

and pass that to the dataframe constructor. It’s equivalent to:

columns = dict(n_jobs=n_jobs, users=users, queues=queues, priorities=priorities)

Answer 9

>>> locals()['foo']
{}
>>> globals()['foo']
{}

If you wanted to write your own function, it could be done such that you could check for a variable defined in locals then check globals. If nothing is found you could compare on id() to see if the variable points to the same location in memory.

If your variable is in a class, you could use className.__dict__.keys() or vars(self) to see if your variable has been defined.


Answer 10

In Python, the def and class keywords will bind a specific name to the object they define (function or class). Similarly, modules are given a name by virtue of being called something specific in the filesystem. In all three cases, there’s an obvious way to assign a “canonical” name to the object in question.

However, for other kinds of objects, such a canonical name may simply not exist. For example, consider the elements of a list. The elements in the list are not individually named, and it is entirely possible that the only way to refer to them in a program is by using list indices on the containing list. If such a list of objects was passed into your function, you could not possibly assign meaningful identifiers to the values.

Python doesn’t save the name on the left hand side of an assignment into the assigned object because:

  1. It would require figuring out which name was “canonical” among multiple conflicting objects,
  2. It would make no sense for objects which are never assigned to an explicit variable name,
  3. It would be extremely inefficient,
  4. Literally no other language in existence does that.

So, for example, functions defined using lambda will always have the “name” <lambda>, rather than a specific function name.

The best approach would be simply to ask the caller to pass in an (optional) list of names. If typing the '...','...' is too cumbersome, you could accept e.g. a single string containing a comma-separated list of names (like namedtuple does).
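One way the "let the caller pass the names" suggestion could look is sketched below. The function label_values is a made-up helper. It accepts either a list of names or a single string of names separated by commas and/or whitespace, similar to what collections.namedtuple accepts for its field names.

```python
def label_values(values, names):
    """Pair each value with a caller-supplied name.

    `names` may be a list of strings or one string of names separated by
    commas and/or whitespace.
    """
    if isinstance(names, str):
        names = names.replace(",", " ").split()
    if len(names) != len(values):
        raise ValueError("expected one name per value")
    return dict(zip(names, values))


n_jobs, users = {"a": 1}, {"b": 2}
labelled = label_values([n_jobs, users], "n_jobs, users")
print(list(labelled))
```

Nothing is inferred here: the names are exactly what the caller typed, so the result stays correct however the values were produced.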


Answer 11

>>> my_var = 5
>>> my_var_name = [k for k, v in locals().items() if v == my_var][0]
>>> my_var_name
'my_var'

locals() returns a dictionary containing the current scope’s local variables. By iterating through this dictionary, we can find the key whose value equals that of the defined variable; extracting that key gives us the variable’s name as a string.

Adapted (with a few changes) from https://www.tutorialspoint.com/How-to-get-a-variable-name-as-a-string-in-Python
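One caveat with comparing by value: several unrelated variables can compare equal. The sketch below uses an explicit dictionary as a stand-in for locals(), with made-up entries, and contrasts an equality test with an identity test.

```python
count = 1
# Explicit stand-in for a namespace such as locals(); the entries are made up.
namespace = {"flag": True, "count": count, "ratio": 1.0}

equal_names = [k for k, v in namespace.items() if v == count]
same_object_names = [k for k, v in namespace.items() if v is count]

print(equal_names)        # every name whose value compares equal to 1
print(same_object_names)  # only names bound to the very same object
```

Because True == 1 and 1.0 == 1 in Python, the equality version reports all three names, so taking element [0] of such a list may return a name other than the one intended.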


Answer 12

I think it’s so difficult to do this in Python because of the simple fact that you will never not know the name of the variable you’re using. So, in the asker’s example, you could do the following.

Instead of:

list_of_dicts = [n_jobs, users, queues, priorities]

use:

dict_of_dicts = {"n_jobs" : n_jobs, "users" : users, "queues" : queues, "priorities" : priorities}
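A short sketch of how the dictionary then supplies both the names and the values. The sample data is made up, and only two of the four dictionaries are shown. A dictionary of dictionaries like this can also be handed to the pandas DataFrame constructor, where the outer keys become the column names.

```python
n_jobs = {"mon": 4, "tue": 7}  # made-up sample data
users = {"mon": 2, "tue": 3}

dict_of_dicts = {"n_jobs": n_jobs, "users": users}

columns = list(dict_of_dicts)                 # the names, in insertion order
list_of_dicts = list(dict_of_dicts.values())  # the original list, if still needed
print(columns)
```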

Answer 13

Here is just another way to do this, based on the content of the input variable:

(It returns the name of the first variable that matches the input variable, otherwise None. It can be modified to return the names of all variables that have the same content as the input variable.)

def retrieve_name(x, Vars=vars()):
    for k in Vars:
        if type(x) == type(Vars[k]):
            if x is Vars[k]:
                return k
    return None

Answer 14

This function prints the variable name along with its value:

import inspect

def print_this(var):
    callers_local_vars = inspect.currentframe().f_back.f_locals.items()
    print(str([k for k, v in callers_local_vars if v is var][0])+': '+str(var))

Input and function call:

my_var = 10

print_this(my_var)

Output:

my_var: 10

Answer 15

I have a method, and while not the most efficient…it works! (and it doesn’t involve any fancy modules).

Basically it compares your Variable’s ID to globals() Variables’ IDs, then returns the match’s name.

def getVariableName(variable, globalVariables=globals().copy()):
    """ Get Variable Name as String by comparing its ID to globals() Variables' IDs

        args:
            variable(var): Variable to find name for (Obviously this variable has to exist)

        kwargs:
            globalVariables(dict): Copy of the globals() dict (Adding to Kwargs allows this function to work properly when imported from another .py)
    """
    for globalVariable in globalVariables:
        if id(variable) == id(globalVariables[globalVariable]): # If our Variable's ID matches this Global Variable's ID...
            return globalVariable # Return its name from the Globals() dict
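One thing to keep in mind with a default such as globals().copy(): it is evaluated once, when the def statement runs, so it is a snapshot. The sketch below illustrates this with a plain dictionary standing in for globals(); the names registry and lookup are hypothetical.

```python
registry = {"early": object()}  # stand-in for a module's globals()


def lookup(variable, namespace=registry.copy()):
    # The default copy is made once, when `def` runs, so it is a snapshot.
    for name, value in namespace.items():
        if value is variable:
            return name
    return None


late = object()
registry["late"] = late  # added after the function was defined

print(lookup(late))            # the snapshot has no entry for this object
print(lookup(late, registry))  # passing the live namespace finds it
```

In practice this means variables created after the function definition are only found when the caller passes the current namespace explicitly.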

Answer 16

If the goal is to help you keep track of your variables, you can write a simple function that labels the variable and returns its value and type. For example, suppose i_f=3.01 and you round it to an integer called i_n to use in a code, and then need a string i_s that will go into a report.

def whatis(string, x):
    print(string+' value=',repr(x),type(x))
    return string+' value='+repr(x)+repr(type(x))
i_f=3.01
i_n=int(i_f)
i_s=str(i_n)
i_l=[i_f, i_n, i_s]
i_u=(i_f, i_n, i_s)

## make report that identifies all types
report='\n'+20*'#'+'\nThis is the report:\n'
report+= whatis('i_f ',i_f)+'\n'
report+=whatis('i_n ',i_n)+'\n'
report+=whatis('i_s ',i_s)+'\n'
report+=whatis('i_l ',i_l)+'\n'
report+=whatis('i_u ',i_u)+'\n'
print(report)

This prints to the window at each call for debugging purposes and also yields a string for the written report. The only downside is that you have to type the variable twice each time you call the function.

I am a Python newbie and found this very useful way to log my efforts as I program and try to cope with all the objects in Python. One flaw is that whatis() fails if it calls a function described outside the procedure where it is used. For example, int(i_f) was a valid function call only because the int function is known to Python. You could call whatis() using int(i_f**2), but if for some strange reason you choose to define a function called int_squared it must be declared inside the procedure where whatis() is used.


Answer 17

Maybe this could be useful:

def Retriever(bar):
    return (list(globals().keys()))[list(map(lambda x: id(x), list(globals().values()))).index(id(bar))]

The function goes through the list of IDs of values from the global scope (the namespace could be edited), finds the index of the wanted/required var or function based on its ID, and then returns the name from the list of global names based on the acquired index.


Answer 18

The following method will not return the name of a variable, but with it you can easily create a data frame if the variables are available in the global scope.

class CustomDict(dict):
    def __add__(self, other):
        return CustomDict({**self, **other})

class GlobalBase(type):
    def __getattr__(cls, key):
        return CustomDict({key: globals()[key]})

    def __getitem__(cls, keys):
        return CustomDict({key: globals()[key] for key in keys})

class G(metaclass=GlobalBase):
    pass

x, y, z = 0, 1, 2

print('method 1:', G['x', 'y', 'z']) # Outcome: method 1: {'x': 0, 'y': 1, 'z': 2}
print('method 2:', G.x + G.y + G.z) # Outcome: method 2: {'x': 0, 'y': 1, 'z': 2}

import pandas as pd

A = [0, 1]
B = [1, 2]
pd.DataFrame(G.A + G.B) # It will return a data frame with A and B columns

Answer 19

For constants, you can use an enum, which supports retrieving its name.
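A minimal sketch using the standard library's enum module. The Priority class and its members are made up for illustration. Each member carries its own name, so no namespace lookup is needed.

```python
from enum import Enum


class Priority(Enum):
    LOW = 1
    HIGH = 2


print(Priority.HIGH.name)  # the member's name as a string
print(Priority(1).name)    # look a member up by value, then read its name
```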


Answer 20

I tried to get the name from the locals via inspect, but that cannot handle expressions such as a[1] or b.val. Then I had a new idea: read the variable name from the calling code itself, and it worked. The code is below:

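import inspect

# Read the argument text directly from the calling line of source code.
def retrieve_name_ex(var):
    stacks = inspect.stack()
    try:
        func = stacks[0].function
        code = stacks[1].code_context[0]
        s = code.index(func)
        s = code.index("(", s + len(func)) + 1
        e = code.index(")", s)
        return code[s:e].strip()
    except (IndexError, TypeError, ValueError):
        # No source line available, or the call could not be located in it.
        return ""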

Answer 21

You can try the following to retrieve the name of a function you defined (does not work for built-in functions though):

import re
def retrieve_name(func):
    return re.match(r"<function\s+(\w+)\s+at.*", str(func)).group(1)

def foo(x):
    return x**2

print(retrieve_name(foo))
# foo
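For comparison, functions already carry their defined name in the __name__ attribute mentioned in the question, so it can be read without parsing the repr. A short sketch (the alias square is added for illustration):

```python
def foo(x):
    return x ** 2


square = foo  # a second name bound to the same function

print(foo.__name__)
print(square.__name__)  # still the name given in the def statement
print(len.__name__)     # built-in functions carry __name__ too
```

Note that __name__ reflects the name used in the def statement, not whichever variable currently refers to the function.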

Is there something like RStudio for Python? [closed]

Question: Is there something like RStudio for Python? [closed]

In RStudio, you can run parts of code in the code editing window, and the results appear in the console.

You can also do cool stuff like selecting whether you want everything up to the cursor to run, or everything after the cursor, or just the part that you selected, and so on. And there are hot keys for all that stuff.

It’s like a step above the interactive shell in Python — there you can use readline to go back to previous individual lines, but it doesn’t have any “concept” of what a function is, a section of code, etc.

Is there a tool like that for Python? Or, do you have some sort of similar workaround that you use, say, in vim?


Answer 0

IPython Notebooks are awesome. Here’s another, newer browser-based tool I’ve recently discovered: Rodeo. My impression is that it seems to better support an RStudio-like workflow.


Answer 1

Jupyter Notebook (previously known as IPython notebook) is a really cool project for interactive data manipulation in Python (and other languages, including R). It basically allows you to interactively code and document what you’re doing in one interface and later on save it as a:

  • notebook (.ipynb)
  • script (a .py file including only the source code)
  • static html (and therefore pdf as well)

You can even share your notebooks online with others using the nbviewer service, where people publish whole books. Furthermore, GitHub renders your .ipynb files. You can publish your Jupyter Notebooks as reproducible research articles on Authorea. For collaborative editing by multiple users, check out Google Colab built on top of Jupyter.

The default Jupyter Notebook version starts a web application locally (or you deploy it to a server) and you use it from your browser. As Ryan also mentioned in his answer, Rodeo is an interface more similar to RStudio built on top of the Jupyter kernel.

JupyterLab is a newer take on the UI allowing for more flexibility in how you edit your notebooks, control interactive widgets and even run commands in terminal emulators.

There’s also a Qt console for IPython, a similar project with inline plots, which is a desktop application.

Jupyter is a normal Python package and can be installed using pip install jupyter. To get all the scientific libraries running on your computer, however, it might be easier to try the official Jupyter Docker containers. For example, assuming your notebooks are in ~/code/jupyter, you can run the container as:

docker run -it --rm -p 8888:8888 -v ~/code/jupyter:/home/jovyan/work jupyter/datascience-notebook

Answer 2

Try Spyder, or install Python(x,y). It is great.

If you are new to Python, you can install the free Anaconda distribution (http://continuum.io/downloads.html), which will install Spyder for you, as well as Python 2.7 and IPython. Spyder is very similar to RStudio.


Answer 3

Check out Rodeo from Yhat if you’re looking for something like RStudio for Python.

Rodeo has:

  • text editor (uses Atom under the hood)
  • Vim / Emacs mode
  • an IPython console
  • autocomplete
  • docstrings
  • ability to see plots, dataframes, variables

Answer 4

You might want to look into JupyterLab (the next generation of Jupyter Notebooks): https://github.com/jupyter/jupyterlab.

JupyterLab aims to create a more desktop-like experience on the Web.

Update: As of March 2018 JupyterLab is in beta. “The beta releases are suitable for general usage. For JupyterLab extension developers, the extension APIs will continue to evolve until the 1.0 release. Eventually, JupyterLab will replace the classic Jupyter Notebook after JupyterLab reaches 1.0.”

To run Jupyter Lab as a Desktop Application, see christopherroach.com/articles/jupyterlab-desktop-app (Thanks to PatrickT).

Here’s a quick preview:

You can arrange a notebook next to a graphical console atop a terminal that is monitoring the system, while keeping the file manager on the left:

For more details see: https://blog.jupyter.org/2016/07/14/jupyter-lab-alpha/ and here: http://www.techatbloomberg.com/blog/inside-the-collaboration-that-built-the-open-source-jupyterlab-project/.


Answer 5

PyCharm is a really decent IDE. From what I have seen so far, it is the most similar to RStudio. Another nice feature is that it allows you to install new Python libraries in a fashion similar to RStudio (which otherwise can be a nightmare). There is now a free ‘community’ edition.


Answer 6

I think it is worthwhile to mention that the RStudio v1.1.359 Preview has been released. It has a terminal feature that can be used for Python.

Download is available here

Documentation is available here


Answer 7

Spyder is what you need! https://code.google.com/p/spyderlib/
Spyder (previously known as Pydee) is a powerful interactive development environment for the Python language with advanced editing, interactive testing, debugging and introspection features.


Answer 8

For a nicer interactive shell for Python, have a look at DreamPie. It’s not really an IDE though (as RStudio seems to be?)


Answer 9

Wing IDE, and probably also other Python IDEs like PyCharm and PyDev, have features like this. In Wing you can either select and execute code in the integrated Python Shell or, if you’re debugging something, interact with the paused debug program in a shell (called the Debug Probe). There is also special support for matplotlib, in case you’re using that, so that you can work with plots interactively.


Python mysqldb: Library not loaded: libmysqlclient.18.dylib

Question: Python mysqldb: Library not loaded: libmysqlclient.18.dylib

I just compiled and installed mysqldb for python 2.7 on my mac os 10.6. I created a simple test file that imports

import MySQLdb as mysql

Firstly, this command is red underlined and the info tells me “Unresolved import”. Then I tried to run the following simple python code

import MySQLdb as mysql

def main():
    conn = mysql.connect( charset="utf8", use_unicode=True, host="localhost",user="root", passwd="",db="" )

if __name__ == '__main__':
    main()

When executing it I get the following error message

Traceback (most recent call last):
  File "/path/to/project/Python/src/cvdv/TestMySQLdb.py", line 4, in <module>
    import MySQLdb as mysql
  File "build/bdist.macosx-10.6-intel/egg/MySQLdb/__init__.py", line 19, in <module>
    \namespace cvdv
  File "build/bdist.macosx-10.6-intel/egg/_mysql.py", line 7, in <module>
  File "build/bdist.macosx-10.6-intel/egg/_mysql.py", line 6, in __bootstrap__
ImportError: dlopen(/Users/toom/.python-eggs/MySQL_python-1.2.3-py2.7-macosx-10.6-intel.egg-tmp/_mysql.so, 2): Library not loaded: libmysqlclient.18.dylib
  Referenced from: /Users/toom/.python-eggs/MySQL_python-1.2.3-py2.7-macosx-10.6-intel.egg-tmp/_mysql.so
  Reason: image not found

What might be the solution to my problem?

EDIT: Actually I found out that the library lies in /usr/local/mysql/lib. So I need to tell my pydev eclipse version where to find it. Where do I set this?


Answer 0

I solved the problem by creating a symbolic link to the library. I.e.

The actual library resides in

/usr/local/mysql/lib

And then I created a symbolic link in

/usr/lib

Using the command:

sudo ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/lib/libmysqlclient.18.dylib

so that I have the following mapping:

ls -l libmysqlclient.18.dylib 
lrwxr-xr-x  1 root  wheel  44 16 Jul 14:01 libmysqlclient.18.dylib -> /usr/local/mysql/lib/libmysqlclient.18.dylib

That was it. After that everything worked fine.

EDIT:

Note that since Mac OS X El Capitan, System Integrity Protection (SIP, also known as “rootless”) prevents you from creating links in /usr/lib/. You could disable SIP, but you can create the link in /usr/local/lib/ instead:

sudo ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/local/lib/libmysqlclient.18.dylib

What is the maximum float in Python?

Question: What is the maximum float in Python?

I think the maximum integer in python is available by calling sys.maxint.

What is the maximum float or long in Python?


Answer 0

For float have a look at sys.float_info:

>>> import sys
>>> sys.float_info
sys.floatinfo(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308,
min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15,
mant_dig=53, epsilon=2.2204460492503131e-16, radix=2, rounds=1)

Specifically, sys.float_info.max:

>>> sys.float_info.max
1.7976931348623157e+308

If that’s not big enough, there’s always positive infinity:

>>> infinity = float("inf")
>>> infinity
inf
>>> infinity / 10000
inf

The long type has unlimited precision, so I think you’re only limited by available memory.
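
To make the points above concrete, here is a minimal sketch (the variable names are only for illustration). It assumes Python 3, where the separate long type is gone and plain int is unbounded: going past sys.float_info.max overflows to infinity, while integers simply keep growing.

```python
import sys

largest = sys.float_info.max

# Multiplying past the largest finite float overflows to positive infinity.
overflowed = largest * 2
print(overflowed == float("inf"))

# Integers are not bounded by the float range; they grow as memory allows.
big_int = 10 ** 400
print(len(str(big_int)))

# Converting such an integer to float raises instead of silently giving inf.
try:
    float(big_int)
    converted = True
except OverflowError:
    converted = False
print(converted)
```

So an integer can hold values far beyond what a float can represent, but it cannot always be turned back into a float.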


Answer 1

sys.maxint is not the largest integer supported by python. It’s the largest integer supported by python’s regular integer type.
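
For readers on Python 3, where sys.maxint no longer exists, the closest counterpart is sys.maxsize. The sketch below (assuming Python 3) shows that it is not a ceiling on integers either:

```python
import sys

# sys.maxsize is the largest value a container index can take on this build.
limit = sys.maxsize

# Going past it still yields an ordinary int; nothing overflows.
beyond = limit + 1
print(type(beyond).__name__, beyond > limit)
```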


Answer 2

If you are using numpy, you can use the dtype float128 (on platforms where numpy provides it) and get a max float of roughly 1.19e+4932:

>>> np.finfo(np.float128)
finfo(resolution=1e-18, min=-1.18973149536e+4932, max=1.18973149536e+4932, dtype=float128)

Sorting a Python list by two fields

Question: Sorting a Python list by two fields

I have the following list created from a sorted csv

list1 = sorted(csv1, key=operator.itemgetter(1))

I would actually like to sort the list by two criteria: first by the value in field 1 and then by the value in field 2. How do I do this?


Answer 0

like this:

import operator
list1 = sorted(csv1, key=operator.itemgetter(1, 2))
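
A small self-contained run of the same call may help. The rows below are made up for illustration and stand in for csv1:

```python
import operator

# Hypothetical rows standing in for csv1: (id, field 1, field 2)
csv1 = [
    ("a", "x", 2),
    ("b", "w", 9),
    ("c", "x", 1),
]

# itemgetter(1, 2) returns the tuple (row[1], row[2]) for each row,
# so rows are compared on field 1 first and on field 2 to break ties.
list1 = sorted(csv1, key=operator.itemgetter(1, 2))
print(list1)
```

Keep in mind that rows read with the csv module are strings, so numeric fields may need converting before they sort numerically.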

Answer 1

No need to import anything when using lambda functions.
The following sorts list by the first element in ascending order, then by the second element in descending order (because of the negation).

sorted(list, key=lambda x: (x[0], -x[1]))
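
To see what the minus sign does, here is a short sketch with made-up pairs. It only works because the second element is numeric:

```python
pairs = [(2, 5), (1, 3), (2, 9), (1, 7)]

# Ascending on the first element, descending on the second (note the minus sign).
result = sorted(pairs, key=lambda x: (x[0], -x[1]))
print(result)
```

Dropping the minus sign gives a plain ascending sort on both elements.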

Answer 2

Python has a stable sort, so provided that performance isn’t an issue the simplest way is to sort it by field 2 and then sort it again by field 1.

That will give you the result you want, the only catch is that if it is a big list (or you want to sort it often) calling sort twice might be an unacceptable overhead.

list1 = sorted(csv1, key=operator.itemgetter(2))
list1 = sorted(list1, key=operator.itemgetter(1))

Doing it this way also makes it easy to handle the situation where you want some of the columns reverse sorted, just include the ‘reverse=True’ parameter when necessary.

Otherwise you can pass multiple parameters to itemgetter or manually build a tuple. That is probably going to be faster, but has the problem that it doesn’t generalise well if some of the columns want to be reverse sorted (numeric columns can still be reversed by negating them but that stops the sort being stable).

So if you don’t need any columns reverse sorted, pass multiple arguments to itemgetter. If you might, and the columns aren’t numeric or you want to keep the sort stable, use multiple consecutive sorts.

Edit: For the commenters who have problems understanding how this answers the original question, here is an example that shows exactly how the stable nature of the sorting ensures we can do separate sorts on each key and end up with data sorted on multiple criteria:

DATA = [
    ('Jones', 'Jane', 58),
    ('Smith', 'Anne', 30),
    ('Jones', 'Fred', 30),
    ('Smith', 'John', 60),
    ('Smith', 'Fred', 30),
    ('Jones', 'Anne', 30),
    ('Smith', 'Jane', 58),
    ('Smith', 'Twin2', 3),
    ('Jones', 'John', 60),
    ('Smith', 'Twin1', 3),
    ('Jones', 'Twin1', 3),
    ('Jones', 'Twin2', 3)
]

# Sort by Surname, Age DESCENDING, Firstname
print("Initial data in random order")
for d in DATA:
    print("{:10s} {:10s} {}".format(*d))

print('''
First we sort by first name, after this pass all
Twin1 come before Twin2 and Anne comes before Fred''')
DATA.sort(key=lambda row: row[1])

for d in DATA:
    print("{:10s} {:10s} {}".format(*d))

print('''
Second pass: sort by age in descending order.
Note that after this pass rows are sorted by age but
Twin1/Twin2 and Anne/Fred pairs are still in correct
firstname order.''')
DATA.sort(key=lambda row: row[2], reverse=True)
for d in DATA:
    print("{:10s} {:10s} {}".format(*d))

print('''
Final pass sorts the Jones from the Smiths.
Within each family members are sorted by age but equal
age members are sorted by first name.
''')
DATA.sort(key=lambda row: row[0])
for d in DATA:
    print("{:10s} {:10s} {}".format(*d))

This is a runnable example, but to save people running it the output is:

Initial data in random order
Jones      Jane       58
Smith      Anne       30
Jones      Fred       30
Smith      John       60
Smith      Fred       30
Jones      Anne       30
Smith      Jane       58
Smith      Twin2      3
Jones      John       60
Smith      Twin1      3
Jones      Twin1      3
Jones      Twin2      3

First we sort by first name, after this pass all
Twin1 come before Twin2 and Anne comes before Fred
Smith      Anne       30
Jones      Anne       30
Jones      Fred       30
Smith      Fred       30
Jones      Jane       58
Smith      Jane       58
Smith      John       60
Jones      John       60
Smith      Twin1      3
Jones      Twin1      3
Smith      Twin2      3
Jones      Twin2      3

Second pass: sort by age in descending order.
Note that after this pass rows are sorted by age but
Twin1/Twin2 and Anne/Fred pairs are still in correct
firstname order.
Smith      John       60
Jones      John       60
Jones      Jane       58
Smith      Jane       58
Smith      Anne       30
Jones      Anne       30
Jones      Fred       30
Smith      Fred       30
Smith      Twin1      3
Jones      Twin1      3
Smith      Twin2      3
Jones      Twin2      3

Final pass sorts the Jones from the Smiths.
Within each family members are sorted by age but equal
age members are sorted by first name.

Jones      John       60
Jones      Jane       58
Jones      Anne       30
Jones      Fred       30
Jones      Twin1      3
Jones      Twin2      3
Smith      John       60
Smith      Jane       58
Smith      Anne       30
Smith      Fred       30
Smith      Twin1      3
Smith      Twin2      3

Note in particular how in the second step the reverse=True parameter keeps the firstnames in order whereas simply sorting then reversing the list would lose the desired order for the third sort key.
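
The repeated passes can be wrapped in a small helper. This is only a sketch: multisort and its specs argument are names made up here, not part of the standard library. It applies the lowest-priority key first and relies on the stability described above.

```python
from operator import itemgetter

def multisort(rows, specs):
    """Sort rows in place by several keys.

    specs is a list of (index, reverse) pairs, highest priority first.
    """
    # Apply the lowest-priority key first; stability preserves earlier passes.
    for index, reverse in reversed(specs):
        rows.sort(key=itemgetter(index), reverse=reverse)
    return rows

people = [
    ('Smith', 'Fred', 30),
    ('Jones', 'Jane', 58),
    ('Smith', 'Anne', 30),
    ('Jones', 'John', 60),
]

# Surname ascending, age descending, first name ascending.
multisort(people, [(0, False), (2, True), (1, False)])
print(people)
```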


Answer 3

list1 = sorted(csv1, key=lambda x: (x[1], x[2]) )

Answer 4

employees.sort(key = lambda x:x[1])
employees.sort(key = lambda x:x[0])

We can also call .sort with a lambda twice, because Python’s sort is in place and stable. This first sorts the list by the second element, x[1]. Then it sorts by the first element, x[0] (highest priority).

employees[0] = Employee's Name
employees[1] = Employee's Salary

This is equivalent to doing the following: employees.sort(key = lambda x:(x[0], x[1]))
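
A quick check of that equivalence, using made-up (name, salary) tuples:

```python
employees = [("Bob", 300), ("Alice", 200), ("Bob", 100), ("Alice", 50)]

two_pass = list(employees)
two_pass.sort(key=lambda x: x[1])  # secondary key first
two_pass.sort(key=lambda x: x[0])  # primary key last

one_pass = sorted(employees, key=lambda x: (x[0], x[1]))
print(two_pass == one_pass)
```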


Answer 5

In ascending order you can use:

sorted_data= sorted(non_sorted_data, key=lambda k: (k[1],k[0]))

or in descending order you can use:

sorted_data= sorted(non_sorted_data, key=lambda k: (k[1],k[0]),reverse=True)

Answer 6

The following sorts a list of dicts in descending order, using salary as the first key and age as the second:

d=[{'salary':123,'age':23},{'salary':123,'age':25}]
d=sorted(d, key=lambda i: (i['salary'], i['age']),reverse=True)

Output: [{'salary': 123, 'age': 25}, {'salary': 123, 'age': 23}]
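
Note that reverse=True flips every key at once. If only one key should be descending and the values are numeric, negating that key is one option. A sketch with made-up records:

```python
staff = [
    {'salary': 123, 'age': 25},
    {'salary': 150, 'age': 40},
    {'salary': 123, 'age': 23},
]

# Negating salary sorts it high-to-low while age stays low-to-high.
ordered = sorted(staff, key=lambda i: (-i['salary'], i['age']))
print(ordered)
```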


Python Git Module experiences? [closed]

Question: Python Git Module experiences? [closed]

What are people’s experiences with any of the Git modules for Python? (I know of GitPython, PyGit, and Dulwich – feel free to mention others if you know of them.)

I am writing a program which will have to interact (add, delete, commit) with a Git repository, but have no experience with Git, so one of the things I’m looking for is ease of use/understanding with regards to Git.

The other things I’m primarily interested in are maturity and completeness of the library, a reasonable lack of bugs, continued development, and helpfulness of the documentation and developers.

If you think of something else I might want/need to know, please feel free to mention it.


Answer 0

While this question was asked a while ago and I don’t know the state of the libraries at that point, it is worth mentioning for searchers that GitPython does a good job of abstracting the command line tools so that you don’t need to use subprocess. There are some useful built in abstractions that you can use, but for everything else you can do things like:

import git
repo = git.Repo('/home/me/repodir')
print(repo.git.status())
# checkout and track a remote branch
print(repo.git.checkout('origin/somebranch', b='somebranch'))
# add a file
print(repo.git.add('somefile'))
# commit
print(repo.git.commit(m='my commit message'))
# now we are one commit ahead
print(repo.git.status())

Everything else in GitPython just makes it easier to navigate. I’m fairly well satisfied with this library and appreciate that it is a wrapper on the underlying git tools.

UPDATE: I’ve switched to using the sh module for not just git but most commandline utilities I need in python. To replicate the above I would do this instead:

import sh
git = sh.git.bake(_cwd='/home/me/repodir')
print(git.status())
# checkout and track a remote branch
print(git.checkout('-b', 'somebranch'))
# add a file
print(git.add('somefile'))
# commit
print(git.commit(m='my commit message'))
# now we are one commit ahead
print(git.status())

Answer 1

I thought I would answer my own question, since I’m taking a different path than suggested in the answers. Nonetheless, thanks to those who answered.

First, a brief synopsis of my experiences with GitPython, PyGit, and Dulwich:

  • GitPython: After downloading, I got this imported and the appropriate object initialized. However, trying to do what was suggested in the tutorial led to errors. Lacking more documentation, I turned elsewhere.
  • PyGit: This would not even import, and I could find no documentation.
  • Dulwich: Seems to be the most promising (at least for what I wanted and saw). I made some progress with it, more than with GitPython, since its egg comes with Python source. However, after a while, I decided it may just be easier to try what I did.

Also, StGit looks interesting, but I would need the functionality extracted into a separate module and do not want to wait for that to happen right now.

In (much) less time than I spent trying to get the three modules above working, I managed to get git commands working via the subprocess module, e.g.

import subprocess

def gitAdd(fileName, repoDir):
    # Run "git add <fileName>" inside the repository directory and wait for it.
    cmd = ['git', 'add', fileName]
    p = subprocess.Popen(cmd, cwd=repoDir)
    p.wait()

gitAdd('exampleFile.txt', '/usr/local/example_git_repo_dir')

This isn’t fully incorporated into my program yet, but I’m not anticipating a problem, except maybe speed (since I’ll be processing hundreds or even thousands of files at times).

Maybe I just didn’t have the patience to get things going with Dulwich or GitPython. That said, I’m hopeful the modules will get more development and be more useful soon.
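
On the speed concern, one possible variation is sketched below. It assumes Python 3.7+; run_git and its runner parameter are hypothetical names (runner exists only so the function can be exercised without a real repository). Since git add accepts several paths at once, many files can go through a single process, and check=True surfaces failures as exceptions.

```python
import subprocess

def run_git(args, repo_dir, runner=subprocess.run):
    """Run `git <args>` inside repo_dir and return its standard output."""
    cmd = ['git'] + list(args)
    # check=True raises CalledProcessError when git exits with a non-zero status.
    completed = runner(cmd, cwd=repo_dir, check=True,
                       capture_output=True, text=True)
    return completed.stdout

# Example call (needs a real repository, so it is left commented out):
# run_git(['add', 'a.txt', 'b.txt'], '/usr/local/example_git_repo_dir')
```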


Answer 2

I’d recommend pygit2 – it uses the excellent libgit2 bindings


Answer 3

This is a pretty old question, and while looking for Git libraries, I found one that was made this year (2013) called Gittle.

It worked great for me (where the others I tried were flaky), and seems to cover most of the common actions.

Some examples from the README:

from gittle import Gittle

# Clone a repository
repo_path = '/tmp/gittle_bare'
repo_url = 'git://github.com/FriendCode/gittle.git'
repo = Gittle.clone(repo_url, repo_path)

# Stage multiple files
repo.stage(['other1.txt', 'other2.txt'])

# Do the commit
repo.commit(name="Samy Pesse", email="samy@friendco.de", message="This is a commit")

# Authentication with RSA private key
key_file = open('/Users/Me/keys/rsa/private_rsa')
repo.auth(pkey=key_file)

# Do push
repo.push()

Answer 4

This may help: Bazaar and Mercurial both use Dulwich for their Git interoperability.

Dulwich is probably different from the others in the sense that it’s a reimplementation of Git in Python. The others might just be wrappers around Git’s commands (so they could be simpler to use from a high-level point of view: commit/add/delete), which probably means their API is very close to Git’s command line, so you’ll need to gain experience with Git.


Answer 5

An updated answer reflecting changed times:

GitPython currently is the easiest to use. It supports wrapping of many git plumbing commands and has pluggable object database (dulwich being one of them), and if a command isn’t implemented, provides an easy api for shelling out to the command line. For example:

from git import Repo

repo = Repo('.')
# Commands without a dedicated wrapper are reached through repo.git
repo.git.checkout(b='new_branch')

This calls:

bash$ git checkout -b new_branch

Dulwich is also good but much lower level. It’s somewhat of a pain to use because it requires operating on git objects at the plumbing level and doesn’t have the nice porcelain you’d normally want. However, if you plan on modifying any parts of git, or use git-receive-pack and git-upload-pack, you need to use Dulwich.


Answer 6

For the sake of completeness, http://github.com/alex/pyvcs/ is an abstraction layer for all dvcs’s. It uses dulwich, but provides interop with the other dvcs’s.


Answer 7

Here’s a really quick implementation of “git status”:

import os
from subprocess import Popen, PIPE

repoDir = '/Users/foo/project'

def command(x):
    # Run the command and return its standard output decoded to text.
    return Popen(x.split(' '), stdout=PIPE).communicate()[0].decode()

def rm_empty(L): return [l for l in L if l]

def getUntracked():
    # Relies on the "#"-prefixed status output of older git versions
    # (newer versions omit the prefix by default).
    os.chdir(repoDir)
    status = command("git status")
    if "# Untracked files:" in status:
        untf = status.split("# Untracked files:")[1][1:].split("\n")
        return rm_empty([x[2:] for x in untf if x.strip() != "#" and x.startswith("#\t")])
    else:
        return []

def getNew():
    os.chdir(repoDir)
    status = command("git status").split("\n")
    return [x[14:] for x in status if x.startswith("#\tnew file:   ")]

def getModified():
    os.chdir(repoDir)
    status = command("git status").split("\n")
    return [x[14:] for x in status if x.startswith("#\tmodified:   ")]

print("Untracked:")
print( getUntracked() )
print("New:")
print( getNew() )
print("Modified:")
print( getModified() )
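
Because the human-readable status text varies between git versions, a variation worth considering is to parse `git status --porcelain` instead, which prints a two-character state code, a space, and the path on each line. The sketch below is not part of the answer above: parse_porcelain is a made-up helper that handles only untracked, newly added and modified entries, and it is fed a hand-written sample rather than live git output.

```python
def parse_porcelain(text):
    """Group paths from `git status --porcelain` output by state."""
    result = {'untracked': [], 'new': [], 'modified': []}
    for line in text.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY path"
        code, path = line[:2], line[3:]
        if code == '??':
            result['untracked'].append(path)
        elif code[0] == 'A':
            result['new'].append(path)
        elif 'M' in code:
            result['modified'].append(path)
    return result

# Hand-written sample in porcelain format.
sample = "?? notes.txt\nA  added.py\n M changed.py\nM  staged.py\n"
print(parse_porcelain(sample))
```

Renames, deletions and merge conflicts use other codes and are simply ignored here.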

Answer 8

PTBNL’s answer works perfectly for me. I extended it a little for Windows users.

import time
import subprocess
def gitAdd(fileName, repoDir):
    cmd = 'git add ' + fileName
    pipe = subprocess.Popen(cmd, shell=True, cwd=repoDir,stdout = subprocess.PIPE,stderr = subprocess.PIPE )
    (out, error) = pipe.communicate()
    print(out, error)
    pipe.wait()
    return 

def gitCommit(commitMessage, repoDir):
    cmd = 'git commit -am "%s"'%commitMessage
    pipe = subprocess.Popen(cmd, shell=True, cwd=repoDir,stdout = subprocess.PIPE,stderr = subprocess.PIPE )
    (out, error) = pipe.communicate()
    print(out, error)
    pipe.wait()
    return 
def gitPush(repoDir):
    cmd = 'git push '
    pipe = subprocess.Popen(cmd, shell=True, cwd=repoDir,stdout = subprocess.PIPE,stderr = subprocess.PIPE )
    (out, error) = pipe.communicate()
    pipe.wait()
    return 

temp=time.localtime(time.time())
uploaddate= str(temp[0])+'_'+str(temp[1])+'_'+str(temp[2])+'_'+str(temp[3])+'_'+str(temp[4])

repoDir='d:\\c_Billy\\vfat\\Programming\\Projector\\billyccm' # your git repository; on Windows, use double backslashes in the path
gitAdd('.',repoDir )
gitCommit(uploaddate, repoDir)
gitPush(repoDir)
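
Two small refinements are possible here, shown as a sketch with made-up helper names (upload_stamp, commit_command). time.strftime builds the timestamp in one call (zero-padded, unlike the string concatenation above), and passing the command as a list without shell=True avoids quoting problems if the commit message contains quotes or spaces.

```python
import time

def upload_stamp(t=None):
    """Return a year_month_day_hour_minute stamp, zero-padded."""
    return time.strftime('%Y_%m_%d_%H_%M', t if t is not None else time.localtime())

def commit_command(message):
    # A list of arguments is handed to git as-is; no shell parsing is involved.
    return ['git', 'commit', '-am', message]

print(commit_command(upload_stamp()))
```

The resulting list can be passed straight to subprocess.Popen together with cwd=repoDir.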

Answer 9

The git interaction library part of StGit is actually pretty good. It isn’t broken out as a separate package, but if there is sufficient interest, I’m sure that can be fixed.

It has very nice abstractions for representing commits, trees etc, and for creating new commits and trees.


Answer 10

For the record, none of the aforementioned Git Python libraries seem to contain a “git status” equivalent, which is really the only thing I would want since dealing with the rest of the git commands via subprocess is so easy.


Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

Question: Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?


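Answer 0

Here is a small kmeans that uses any of the 20-odd distances in scipy.spatial.distance, or a user function.
Comments would be welcome (this has had only one user so far, not enough); in particular, what are your N, dim, k, metric?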

#!/usr/bin/env python
# kmeans.py using any of the 20-odd metrics in scipy.spatial.distance
# kmeanssample 2 pass, first sample sqrt(N)

from __future__ import division
import random
import numpy as np
from scipy.spatial.distance import cdist  # $scipy/spatial/distance.py
    # http://docs.scipy.org/doc/scipy/reference/spatial.html
from scipy.sparse import issparse  # $scipy/sparse/csr.py

__date__ = "2011-11-17 Nov denis"
    # X sparse, any cdist metric: real app ?
    # centres get dense rapidly, metrics in high dim hit distance whiteout
    # vs unsupervised / semi-supervised svm

#...............................................................................
def kmeans( X, centres, delta=.001, maxiter=10, metric="euclidean", p=2, verbose=1 ):
    """ centres, Xtocentre, distances = kmeans( X, initial centres ... )
    in:
        X N x dim  may be sparse
        centres k x dim: initial centres, e.g. random.sample( X, k )
        delta: relative error, iterate until the average distance to centres
            is within delta of the previous average distance
        maxiter
        metric: any of the 20-odd in scipy.spatial.distance
            "chebyshev" = max, "cityblock" = L1, "minkowski" with p=
            or a function( Xvec, centrevec ), e.g. Lqmetric below
        p: for minkowski metric -- local mod cdist for 0 < p < 1 too
        verbose: 0 silent, 2 prints running distances
    out:
        centres, k x dim
        Xtocentre: each X -> its nearest centre, ints N -> k
        distances, N
    see also: kmeanssample below, class Kmeans below.
    """
    if not issparse(X):
        X = np.asanyarray(X)  # ?
    centres = centres.todense() if issparse(centres) \
        else centres.copy()
    N, dim = X.shape
    k, cdim = centres.shape
    if dim != cdim:
        raise ValueError( "kmeans: X %s and centres %s must have the same number of columns" % (
            X.shape, centres.shape ))
    if verbose:
        print "kmeans: X %s  centres %s  delta=%.2g  maxiter=%d  metric=%s" % (
            X.shape, centres.shape, delta, maxiter, metric)
    allx = np.arange(N)
    prevdist = 0
    for jiter in range( 1, maxiter+1 ):
        D = cdist_sparse( X, centres, metric=metric, p=p )  # |X| x |centres|
        xtoc = D.argmin(axis=1)  # X -> nearest centre
        distances = D[allx,xtoc]
        avdist = distances.mean()  # median ?
        if verbose >= 2:
            print "kmeans: av |X - nearest centre| = %.4g" % avdist
        if (1 - delta) * prevdist <= avdist <= prevdist \
        or jiter == maxiter:
            break
        prevdist = avdist
        for jc in range(k):  # (1 pass in C)
            c = np.where( xtoc == jc )[0]
            if len(c) > 0:
                centres[jc] = X[c].mean( axis=0 )
    if verbose:
        print "kmeans: %d iterations  cluster sizes:" % jiter, np.bincount(xtoc)
    if verbose >= 2:
        r50 = np.zeros(k)
        r90 = np.zeros(k)
        for j in range(k):
            dist = distances[ xtoc == j ]
            if len(dist) > 0:
                r50[j], r90[j] = np.percentile( dist, (50, 90) )
        print "kmeans: cluster 50 % radius", r50.astype(int)
        print "kmeans: cluster 90 % radius", r90.astype(int)
            # scale L1 / dim, L2 / sqrt(dim) ?
    return centres, xtoc, distances

#...............................................................................
def kmeanssample( X, k, nsample=0, **kwargs ):
    """ 2-pass kmeans, fast for large N:
        1) kmeans a random sample of nsample ~ sqrt(N) from X
        2) full kmeans, starting from those centres
    """
        # merge w kmeans ? mttiw
        # v large N: sample N^1/2, N^1/2 of that
        # seed like sklearn ?
    N, dim = X.shape
    if nsample == 0:
        nsample = max( 2*np.sqrt(N), 10*k )
    Xsample = randomsample( X, int(nsample) )
    pass1centres = randomsample( X, int(k) )
    samplecentres = kmeans( Xsample, pass1centres, **kwargs )[0]
    return kmeans( X, samplecentres, **kwargs )

def cdist_sparse( X, Y, **kwargs ):
    """ -> |X| x |Y| cdist array, any cdist metric
        X or Y may be sparse -- best csr
    """
        # todense row at a time, v slow if both v sparse
    sxy = 2*issparse(X) + issparse(Y)
    if sxy == 0:
        return cdist( X, Y, **kwargs )
    d = np.empty( (X.shape[0], Y.shape[0]), np.float64 )
    if sxy == 2:
        for j, x in enumerate(X):
            d[j] = cdist( x.todense(), Y, **kwargs ) [0]
    elif sxy == 1:
        for k, y in enumerate(Y):
            d[:,k] = cdist( X, y.todense(), **kwargs ) [0]
    else:
        for j, x in enumerate(X):
            for k, y in enumerate(Y):
                d[j,k] = cdist( x.todense(), y.todense(), **kwargs ) [0]
    return d

def randomsample( X, n ):
    """ random.sample of the rows of X
        X may be sparse -- best csr
    """
    sampleix = random.sample( xrange( X.shape[0] ), int(n) )
    return X[sampleix]

def nearestcentres( X, centres, metric="euclidean", p=2 ):
    """ each X -> nearest centre, any metric
            euclidean2 (~ withinss) is more sensitive to outliers,
            cityblock (manhattan, L1) less sensitive
    """
    D = cdist( X, centres, metric=metric, p=p )  # |X| x |centres|
    return D.argmin(axis=1)

def Lqmetric( x, y=None, q=.5 ):
    # yes a metric, may increase weight of near matches; see ...
    return (np.abs(x - y) ** q) .mean() if y is not None \
        else (np.abs(x) ** q) .mean()

#...............................................................................
class Kmeans:
    """ km = Kmeans( X, k= or centres=, ... )
        in: either initial centres= for kmeans
            or k= [nsample=] for kmeanssample
        out: km.centres, km.Xtocentre, km.distances
        iterator:
            for jcentre, J in km:
                clustercentre = centres[jcentre]
                J indexes e.g. X[J], classes[J]
    """
    def __init__( self, X, k=0, centres=None, nsample=0, **kwargs ):
        self.X = X
        if centres is None:
            self.centres, self.Xtocentre, self.distances = kmeanssample(
                X, k=k, nsample=nsample, **kwargs )
        else:
            self.centres, self.Xtocentre, self.distances = kmeans(
                X, centres, **kwargs )

    def __iter__(self):
        for jc in range(len(self.centres)):
            yield jc, (self.Xtocentre == jc)

#...............................................................................
if __name__ == "__main__":
    import random
    import sys
    from time import time

    N = 10000
    dim = 10
    ncluster = 10
    kmsample = 100  # 0: random centres, > 0: kmeanssample
    kmdelta = .001
    kmiter = 10
    metric = "cityblock"  # "chebyshev" = max, "cityblock" L1,  Lqmetric
    seed = 1

    exec( "\n".join( sys.argv[1:] ))  # run this.py N= ...
    np.set_printoptions( 1, threshold=200, edgeitems=5, suppress=True )
    np.random.seed(seed)
    random.seed(seed)

    print "N %d  dim %d  ncluster %d  kmsample %d  metric %s" % (
        N, dim, ncluster, kmsample, metric)
    X = np.random.exponential( size=(N,dim) )
        # cf scikits-learn datasets/
    t0 = time()
    if kmsample > 0:
        centres, xtoc, dist = kmeanssample( X, ncluster, nsample=kmsample,
            delta=kmdelta, maxiter=kmiter, metric=metric, verbose=2 )
    else:
        randomcentres = randomsample( X, ncluster )
        centres, xtoc, dist = kmeans( X, randomcentres,
            delta=kmdelta, maxiter=kmiter, metric=metric, verbose=2 )
    print "%.0f msec" % ((time() - t0) * 1000)

    # also ~/py/np/kmeans/test-kmeans.py

2012年3月26日添加了一些注意事项:

1)对于余弦距离,首先将所有数据向量归一化为| X | = 1; 然后

cosinedistance( X, Y ) = 1 - X . Y = Euclidean distance |X - Y|^2 / 2

很快 对于位向量,请将规范与向量分开,而不是扩展为浮点数(尽管某些程序可能会为您扩展)。对于稀疏向量,说N,X的1%。Y应该花费时间O(2%N),空间O(N); 但我不知道哪个程序可以做到这一点。

2) Scikit学习集群 很好地概述了k均值,mini-batch-k均值…以及适用于scipy.sparse矩阵的代码。

3)务必在k均值之后检查群集大小。如果您期望群集大小大致相等,但它们出来了 [44 37 9 5 5] %……(令人头疼的声音)。

Here’s a small kmeans that uses any of the 20-odd distances in scipy.spatial.distance, or a user function.
Comments would be welcome (this has had only one user so far, which is not enough); in particular, what are your N, dim, k, and metric?

#!/usr/bin/env python
# kmeans.py using any of the 20-odd metrics in scipy.spatial.distance
# kmeanssample 2 pass, first sample sqrt(N)

from __future__ import division
import random
import numpy as np
from scipy.spatial.distance import cdist  # $scipy/spatial/distance.py
    # http://docs.scipy.org/doc/scipy/reference/spatial.html
from scipy.sparse import issparse  # $scipy/sparse/csr.py

__date__ = "2011-11-17 Nov denis"
    # X sparse, any cdist metric: real app ?
    # centres get dense rapidly, metrics in high dim hit distance whiteout
    # vs unsupervised / semi-supervised svm

#...............................................................................
def kmeans( X, centres, delta=.001, maxiter=10, metric="euclidean", p=2, verbose=1 ):
    """ centres, Xtocentre, distances = kmeans( X, initial centres ... )
    in:
        X N x dim  may be sparse
        centres k x dim: initial centres, e.g. random.sample( X, k )
        delta: relative error, iterate until the average distance to centres
            is within delta of the previous average distance
        maxiter
        metric: any of the 20-odd in scipy.spatial.distance
            "chebyshev" = max, "cityblock" = L1, "minkowski" with p=
            or a function( Xvec, centrevec ), e.g. Lqmetric below
        p: for minkowski metric -- local mod cdist for 0 < p < 1 too
        verbose: 0 silent, 2 prints running distances
    out:
        centres, k x dim
        Xtocentre: each X -> its nearest centre, ints N -> k
        distances, N
    see also: kmeanssample below, class Kmeans below.
    """
    if not issparse(X):
        X = np.asanyarray(X)  # ?
    centres = centres.todense() if issparse(centres) \
        else centres.copy()
    N, dim = X.shape
    k, cdim = centres.shape
    if dim != cdim:
        raise ValueError( "kmeans: X %s and centres %s must have the same number of columns" % (
            X.shape, centres.shape ))
    if verbose:
        print "kmeans: X %s  centres %s  delta=%.2g  maxiter=%d  metric=%s" % (
            X.shape, centres.shape, delta, maxiter, metric)
    allx = np.arange(N)
    prevdist = 0
    for jiter in range( 1, maxiter+1 ):
        D = cdist_sparse( X, centres, metric=metric, p=p )  # |X| x |centres|
        xtoc = D.argmin(axis=1)  # X -> nearest centre
        distances = D[allx,xtoc]
        avdist = distances.mean()  # median ?
        if verbose >= 2:
            print "kmeans: av |X - nearest centre| = %.4g" % avdist
        if (1 - delta) * prevdist <= avdist <= prevdist \
        or jiter == maxiter:
            break
        prevdist = avdist
        for jc in range(k):  # (1 pass in C)
            c = np.where( xtoc == jc )[0]
            if len(c) > 0:
                centres[jc] = X[c].mean( axis=0 )
    if verbose:
        print "kmeans: %d iterations  cluster sizes:" % jiter, np.bincount(xtoc)
    if verbose >= 2:
        r50 = np.zeros(k)
        r90 = np.zeros(k)
        for j in range(k):
            dist = distances[ xtoc == j ]
            if len(dist) > 0:
                r50[j], r90[j] = np.percentile( dist, (50, 90) )
        print "kmeans: cluster 50 % radius", r50.astype(int)
        print "kmeans: cluster 90 % radius", r90.astype(int)
            # scale L1 / dim, L2 / sqrt(dim) ?
    return centres, xtoc, distances

#...............................................................................
def kmeanssample( X, k, nsample=0, **kwargs ):
    """ 2-pass kmeans, fast for large N:
        1) kmeans a random sample of nsample ~ sqrt(N) from X
        2) full kmeans, starting from those centres
    """
        # merge w kmeans ? mttiw
        # v large N: sample N^1/2, N^1/2 of that
        # seed like sklearn ?
    N, dim = X.shape
    if nsample == 0:
        nsample = max( 2*np.sqrt(N), 10*k )
    Xsample = randomsample( X, int(nsample) )
    pass1centres = randomsample( X, int(k) )
    samplecentres = kmeans( Xsample, pass1centres, **kwargs )[0]
    return kmeans( X, samplecentres, **kwargs )

def cdist_sparse( X, Y, **kwargs ):
    """ -> |X| x |Y| cdist array, any cdist metric
        X or Y may be sparse -- best csr
    """
        # todense row at a time, v slow if both v sparse
    sxy = 2*issparse(X) + issparse(Y)
    if sxy == 0:
        return cdist( X, Y, **kwargs )
    d = np.empty( (X.shape[0], Y.shape[0]), np.float64 )
    if sxy == 2:
        for j, x in enumerate(X):
            d[j] = cdist( x.todense(), Y, **kwargs ) [0]
    elif sxy == 1:
        for k, y in enumerate(Y):
            d[:,k] = cdist( X, y.todense(), **kwargs ) [0]
    else:
        for j, x in enumerate(X):
            for k, y in enumerate(Y):
                d[j,k] = cdist( x.todense(), y.todense(), **kwargs ) [0]
    return d

def randomsample( X, n ):
    """ random.sample of the rows of X
        X may be sparse -- best csr
    """
    sampleix = random.sample( xrange( X.shape[0] ), int(n) )
    return X[sampleix]

def nearestcentres( X, centres, metric="euclidean", p=2 ):
    """ each X -> nearest centre, any metric
            euclidean2 (~ withinss) is more sensitive to outliers,
            cityblock (manhattan, L1) less sensitive
    """
    D = cdist( X, centres, metric=metric, p=p )  # |X| x |centres|
    return D.argmin(axis=1)

def Lqmetric( x, y=None, q=.5 ):
    # yes a metric, may increase weight of near matches; see ...
    return (np.abs(x - y) ** q) .mean() if y is not None \
        else (np.abs(x) ** q) .mean()

#...............................................................................
class Kmeans:
    """ km = Kmeans( X, k= or centres=, ... )
        in: either initial centres= for kmeans
            or k= [nsample=] for kmeanssample
        out: km.centres, km.Xtocentre, km.distances
        iterator:
            for jcentre, J in km:
                clustercentre = centres[jcentre]
                J indexes e.g. X[J], classes[J]
    """
    def __init__( self, X, k=0, centres=None, nsample=0, **kwargs ):
        self.X = X
        if centres is None:
            self.centres, self.Xtocentre, self.distances = kmeanssample(
                X, k=k, nsample=nsample, **kwargs )
        else:
            self.centres, self.Xtocentre, self.distances = kmeans(
                X, centres, **kwargs )

    def __iter__(self):
        for jc in range(len(self.centres)):
            yield jc, (self.Xtocentre == jc)

#...............................................................................
if __name__ == "__main__":
    import random
    import sys
    from time import time

    N = 10000
    dim = 10
    ncluster = 10
    kmsample = 100  # 0: random centres, > 0: kmeanssample
    kmdelta = .001
    kmiter = 10
    metric = "cityblock"  # "chebyshev" = max, "cityblock" L1,  Lqmetric
    seed = 1

    exec( "\n".join( sys.argv[1:] ))  # run this.py N= ...
    np.set_printoptions( 1, threshold=200, edgeitems=5, suppress=True )
    np.random.seed(seed)
    random.seed(seed)

    print "N %d  dim %d  ncluster %d  kmsample %d  metric %s" % (
        N, dim, ncluster, kmsample, metric)
    X = np.random.exponential( size=(N,dim) )
        # cf scikits-learn datasets/
    t0 = time()
    if kmsample > 0:
        centres, xtoc, dist = kmeanssample( X, ncluster, nsample=kmsample,
            delta=kmdelta, maxiter=kmiter, metric=metric, verbose=2 )
    else:
        randomcentres = randomsample( X, ncluster )
        centres, xtoc, dist = kmeans( X, randomcentres,
            delta=kmdelta, maxiter=kmiter, metric=metric, verbose=2 )
    print "%.0f msec" % ((time() - t0) * 1000)

    # also ~/py/np/kmeans/test-kmeans.py

Some notes added 26 Mar 2012:

1) for cosine distance, first normalize all the data vectors to |X| = 1; then

cosinedistance( X, Y ) = 1 - X . Y = Euclidean distance |X - Y|^2 / 2

is fast. For bit vectors, keep the norms separately from the vectors instead of expanding out to floats (although some programs may expand for you). For sparse vectors with, say, 1 % of N entries nonzero, X . Y should take time O( 2 % N ) and space O(N); but I don’t know which programs do that.
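
The identity in note 1 can be checked numerically. The following is a minimal pure-Python sketch; the helper names normalize, cosine_distance and half_squared_euclidean are made up for illustration and are not part of kmeans.py above.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def cosine_distance(x, y):
    """1 - x . y; this equals the cosine distance only for unit vectors."""
    return 1.0 - sum(a * b for a, b in zip(x, y))

def half_squared_euclidean(x, y):
    """|x - y|^2 / 2."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0

x = normalize([3.0, 4.0, 0.0])
y = normalize([1.0, 2.0, 2.0])

# The two values agree up to floating-point rounding.
print(cosine_distance(x, y), half_squared_euclidean(x, y))
```

Because one quantity is a monotone function of the other, picking the nearest centre by Euclidean distance on normalized vectors gives the same assignment as cosine distance, provided the centres are normalized as well.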

2) Scikit-learn clustering gives an excellent overview of k-means, mini-batch-k-means … with code that works on scipy.sparse matrices.

3) Always check cluster sizes after k-means. If you’re expecting roughly equal-sized clusters, but they come out [44 37 9 5 5] % … (sound of head-scratching).


Answer 1

Unfortunately no: scikit-learn's current implementation of k-means only uses Euclidean distances.

It is not trivial to extend k-means to other distances, and denis’ answer above is not the correct way to implement k-means for other metrics.
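
One possible illustration of why this is not trivial (a sketch, not necessarily the point this answer had in mind): the update step of k-means replaces each centre by the mean of its cluster, and the mean minimizes the sum of squared Euclidean distances. For the L1 distance, for example, the minimizer is the median instead. The data and function names below are made up for the example.

```python
import statistics

# One-dimensional cluster with an outlier.
points = [0.0, 1.0, 2.0, 3.0, 100.0]

def l1_cost(centre):
    """Sum of absolute (L1) distances from the points to centre."""
    return sum(abs(p - centre) for p in points)

def l2_cost(centre):
    """Sum of squared Euclidean distances from the points to centre."""
    return sum((p - centre) ** 2 for p in points)

mean = statistics.mean(points)      # 21.2
median = statistics.median(points)  # 2.0

# The mean is the better centre for squared Euclidean cost,
# the median is the better centre for L1 cost.
print(l2_cost(mean) < l2_cost(median))
print(l1_cost(median) < l1_cost(mean))
```

So a loop that assigns points with an arbitrary metric but still averages them to update the centres is no longer guaranteed to decrease the cost it measures.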


Answer 2

Just use nltk instead, where you can do this, e.g.

from nltk.cluster.kmeans import KMeansClusterer
from nltk.cluster.util import cosine_distance
NUM_CLUSTERS = <choose a value>
data = <sparse matrix that you would normally give to scikit>.toarray()

kclusterer = KMeansClusterer(NUM_CLUSTERS, distance=cosine_distance, repeats=25)
assigned_clusters = kclusterer.cluster(data, assign_clusters=True)

Answer 3

Yes, you can use a different metric function; however, by definition, the k-means clustering algorithm relies on the Euclidean distance from the mean of each cluster.

You could use a different metric, so even though you are still calculating the mean, you could use something like the Mahalanobis distance.


Answer 4

There is pyclustering, which is Python/C++ (so it's fast!) and lets you specify a custom metric function:

from pyclustering.cluster.kmeans import kmeans
from pyclustering.utils.metric import type_metric, distance_metric

user_function = lambda point1, point2: point1[0] + point2[0] + 2
metric = distance_metric(type_metric.USER_DEFINED, func=user_function)

# create K-Means algorithm with specific distance metric
# (sample is your input data, a list of points; it is not defined in this snippet)
start_centers = [[4.7, 5.9], [5.7, 6.5]]
kmeans_instance = kmeans(sample, start_centers, metric=metric)

# run cluster analysis and obtain results
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()

Actually, I haven't tested this code; I cobbled it together from a ticket and example code.


Answer 5

The k-means in Spectral Python allows the use of the L1 (Manhattan) distance.


Answer 6

Sklearn KMeans uses the Euclidean distance. It has no metric parameter. That said, if you're clustering time series, you can use the tslearn Python package, where you can specify a metric (dtw, softdtw, euclidean).


Usage of the sys.stdout.flush() method

Question: Usage of the sys.stdout.flush() method

What does sys.stdout.flush() do?


Answer 0

Python's standard output is buffered (meaning that it collects some of the data "written" to standard out before it writes it to the terminal). Calling sys.stdout.flush() forces it to "flush" the buffer, meaning that it will write everything in the buffer to the terminal, even if normally it would wait before doing so.

Here's some good information about (un)buffered I/O and why it's useful:
http://en.wikipedia.org/wiki/Data_buffer
Buffered vs unbuffered IO
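
The effect can be observed without a terminal. The following is a minimal sketch that assumes an in-memory io.BytesIO as a stand-in for the terminal and wraps it in the same kind of layers CPython typically uses for sys.stdout (a text wrapper over a buffered binary stream); the variable names are made up for the example.

```python
import io

raw = io.BytesIO()                                      # stands in for the terminal
buffered = io.BufferedWriter(raw)                       # binary buffer layer
stream = io.TextIOWrapper(buffered, encoding="utf-8")   # text layer, like sys.stdout

stream.write("hello")
before_flush = raw.getvalue()   # the data is still held in the buffers

stream.flush()
after_flush = raw.getvalue()    # the data has now been written through

print(before_flush, after_flush)
```

Nothing reaches the underlying object until flush() is called (or until a buffer fills up or the stream is closed), which is the behavior described above.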


Answer 1

Consider the following simple Python script:

import time
import sys

for i in range(5):
    print(i, end=" ")  # Python 3; the original Python 2 form was: print(i),
    #sys.stdout.flush()
    time.sleep(1)

This is designed to print one number every second for five seconds, but if you run it as it is now (depending on your default system buffering) you may not see any output until the script completes, and then all at once you will see 0 1 2 3 4 printed to the screen.

This is because the output is being buffered, and unless you flush sys.stdout after each print you won't see the output immediately. Remove the comment from the sys.stdout.flush() line to see the difference.
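
In Python 3 (3.3 and later), print itself accepts a flush keyword, so the explicit sys.stdout.flush() call can be folded into the print call. A minimal sketch, where the function name tick and its parameters are made up for illustration:

```python
import sys
import time

def tick(n, delay=1.0, out=sys.stdout):
    """Print 0 .. n-1 on one line, flushing after every number."""
    for i in range(n):
        # flush=True has the same effect as calling out.flush() after the print
        print(i, end=" ", file=out, flush=True)
        time.sleep(delay)

if __name__ == "__main__":
    tick(5)
```

Run as a script, this should show one number per second instead of all five at the end.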


Answer 2

As I understand it, whenever we execute print statements, the output is written to a buffer, and we see the output on screen when the buffer gets flushed (cleared). By default, the buffer is flushed when the program exits, but we can also flush the buffer manually by calling sys.stdout.flush() in the program. In the code below, the buffer is flushed when the value of i reaches 5.

You can see this by executing the code below (written for Python 2).

chiru@online:~$ cat flush.py
import time
import sys

for i in range(10):
    print i,
    if i == 5:
        print "Flushing buffer"
        sys.stdout.flush()
    time.sleep(1)

for i in range(10):
    print i,
    if i == 5:
        print "Flushing buffer"
        sys.stdout.flush()
chiru@online:~$ python flush.py 
0 1 2 3 4 5 Flushing buffer
6 7 8 9 0 1 2 3 4 5 Flushing buffer
6 7 8 9

Answer 3

import sys
for x in range(10000):
    # "\r" returns the cursor to the start of the line, so each value overwrites the last
    print("HAPPY >> %s <<" % x, end="\r")
    sys.stdout.flush()

Answer 4

As I understand it, sys.stdout.flush() pushes out all the data that has been buffered up to that point to the file object. When using stdout, data is stored in buffer memory (for some time, or until the buffer fills up) before it gets written to the terminal. Calling flush() forces the buffer to be emptied and written to the terminal even before the buffer is full.


What is the difference between the json.load() and json.loads() functions

Question: What is the difference between the json.load() and json.loads() functions

In Python, what is the difference between json.load() and json.loads()?

I guess that the load() function must be used with a file object (so I need to use a context manager), while the loads() function takes the path to the file as a string. It is a bit confusing.

Does the letter "s" in json.loads() stand for string?

Thanks a lot for your answers!


Answer 0

Yes, s stands for string. The json.loads function does not take the file path, but the file contents as a string. Look at the documentation at https://docs.python.org/2/library/json.html


Answer 1

Just adding a simple example to what everyone has explained.

json.load()

json.load can deserialize a file itself, i.e. it accepts a file object. For example,

# open a json file for reading and print content using json.load
import json
with open("/xyz/json_data.json", "r") as content:
  print(json.load(content))

will output (shown here under Python 2)

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

If I use json.loads on the file object instead,

# you cannot use json.loads on file object
with open("json_data.json", "r") as content:
  print(json.loads(content))

I get this error:

TypeError: expected string or buffer

json.loads()

json.loads() deserializes a string.

So in order to use json.loads, I have to pass the content of the file using the read() function. Using content.read() with json.loads() returns the content of the file:

with open("json_data.json", "r") as content:
  print(json.loads(content.read()))

Output:

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

That's because the type of content.read() is string, i.e. <type 'str'>.

If I use json.load() with content.read(), I get an error:

with open("json_data.json", "r") as content:
  print(json.load(content.read()))

gives

AttributeError: 'str' object has no attribute 'read'

So now you know: json.load deserializes a file and json.loads deserializes a string.

Another example:

sys.stdin returns a file object, so if I do print(json.load(sys.stdin)), I get the actual JSON data:

cat json_data.json | ./test.py

{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}

If I want to use json.loads(), I would do print(json.loads(sys.stdin.read())) instead.


Answer 2

Documentation is quite clear: https://docs.python.org/2/library/json.html

json.load(fp[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

Deserialize fp (a .read()-supporting file-like object containing a JSON document) to a Python object using this conversion table.

json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

Deserialize s (a str or unicode instance containing a JSON document) to a Python object using this conversion table.

So load is for a file, loads for a string
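
The phrase "a .read()-supporting file-like object" means json.load does not insist on a real file. A minimal sketch, where the class FakeFile and the sample document are made up for illustration:

```python
import json

class FakeFile:
    """Hypothetical stand-in for a file: json.load only needs a read() method."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text

document = '{"key_1": 1, "key_2": "foo", "Key_3": null}'

from_object = json.load(FakeFile(document))   # goes through .read()
from_string = json.loads(document)            # takes the string directly

print(from_object == from_string)
```

Both calls produce the same Python dictionary; only the way the JSON text is supplied differs.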


Answer 3

QUICK ANSWER (very simplified!)

json.load() takes a FILE

json.load() expects a file (file object), e.g. a file you previously opened from a file path like 'files/example.json'.


json.loads() takes a STRING

json.loads() expects a (valid) JSON string – i.e. {"foo": "bar"}


EXAMPLES

Assuming you have a file example.json with this content: { "key_1": 1, "key_2": "foo", "Key_3": null }

>>> import json
>>> file = open("example.json")

>>> type(file)
<class '_io.TextIOWrapper'>

>>> file
<_io.TextIOWrapper name='example.json' mode='r' encoding='UTF-8'>

>>> json.load(file)
{'key_1': 1, 'key_2': 'foo', 'Key_3': None}

>>> json.loads(file)
Traceback (most recent call last):
  File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 341, in loads
TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper


>>> string = '{"foo": "bar"}'

>>> type(string)
<class 'str'>

>>> string
'{"foo": "bar"}'

>>> json.loads(string)
{'foo': 'bar'}

>>> json.load(string)
Traceback (most recent call last):
  File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 293, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'

Answer 4

The json.load() method (without “s” in “load”) can read a file directly:

import json
with open('strings.json') as f:
    d = json.load(f)
    print(d)

The json.loads() method is used for string arguments only.

import json

person = '{"name": "Bob", "languages": ["English", "French"]}'
print(type(person))
# Output : <class 'str'>

person_dict = json.loads(person)
print(person_dict)
# Output: {'name': 'Bob', 'languages': ['English', 'French']}

print(type(person_dict))
# Output : <class 'dict'>

Here we can see that loads() takes a string (type str) as input and returns a dictionary.


Answer 5

In Python 3.7.7, json.load is defined as follows in the CPython source code:

def load(fp, *, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):

    return loads(fp.read(),
        cls=cls, object_hook=object_hook,
        parse_float=parse_float, parse_int=parse_int,
        parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

json.load actually calls json.loads and uses fp.read() as the first argument.

So if your code is:

with open (file) as fp:
    s = fp.read()
    json.loads(s)

it is equivalent to this:

with open (file) as fp:
    json.load(fp)

But if you need to control how much is read from the file, as in fp.read(10), or if the string/bytes you want to deserialize do not come from a file, you should use json.loads().

As for json.loads(), it deserializes not only strings but also bytes. If s is bytes or bytearray, it is decoded to a string first. You can see this in the source code as well:

def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    """Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
    containing a JSON document) to a Python object.

    ...

    """
    if isinstance(s, str):
        if s.startswith('\ufeff'):
            raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
                                  s, 0)
    else:
        if not isinstance(s, (bytes, bytearray)):
            raise TypeError(f'the JSON object must be str, bytes or bytearray, '
                            f'not {s.__class__.__name__}')
        s = s.decode(detect_encoding(s), 'surrogatepass')
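
The equivalence described above can be checked with a small sketch. It uses io.StringIO as a stand-in for an open file (an assumption for illustration; any object with a read() method behaves the same way), and the sample document is made up.

```python
import io
import json

# Hypothetical sample document, used only for illustration.
text = '{"name": "example", "values": [1, 2, 3]}'

# json.load reads the whole stream and hands the text to json.loads.
from_load = json.load(io.StringIO(text))

# json.loads works directly on a str ...
from_loads_str = json.loads(text)

# ... and also on bytes or bytearray, which it decodes to str first.
from_loads_bytes = json.loads(text.encode("utf-8"))
from_loads_bytearray = json.loads(bytearray(text, "utf-8"))

print(from_load == from_loads_str == from_loads_bytes == from_loads_bytearray)
```

All four calls produce the same Python object, so the choice between json.load and json.loads only depends on whether you already hold the text or bytes yourself.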


在matplotlib中删除已保存图像周围的空白

问题:在matplotlib中删除已保存图像周围的空白

我需要拍摄图像并经过一些处理将其保存。显示该图形时,它看起来不错,但是保存该图形后,在保存的图像周围有一些空白。我尝试过方法的'tight'选项savefig,也没有用。代码:

  import matplotlib.image as mpimg
  import matplotlib.pyplot as plt

  fig = plt.figure(1)
  img = mpimg.imread(path)
  plt.imshow(img)
  ax=fig.add_subplot(1,1,1)

  extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
  plt.savefig('1.png', bbox_inches=extent)

  plt.axis('off') 
  plt.show()

我正在尝试通过在图上使用NetworkX绘制基本图形并将其保存。我意识到没有图就可以,但是当添加图时,保存的图像周围会有空白;

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()
G.add_node(1)
G.add_node(2)
G.add_node(3)
G.add_edge(1,3)
G.add_edge(1,2)
pos = {1:[100,120], 2:[200,300], 3:[50,75]}

fig = plt.figure(1)
img = mpimg.imread("C:\\images\\1.jpg")
plt.imshow(img)
ax=fig.add_subplot(1,1,1)

nx.draw(G, pos=pos)

extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
plt.savefig('1.png', bbox_inches = extent)

plt.axis('off') 
plt.show()

I need to load an image and save it after some processing. The figure looks fine when I display it, but the saved image has white space around it. I have tried the 'tight' option of the savefig method, but that did not work either. The code:

  import matplotlib.image as mpimg
  import matplotlib.pyplot as plt

  fig = plt.figure(1)
  img = mpimg.imread(path)
  plt.imshow(img)
  ax=fig.add_subplot(1,1,1)

  extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
  plt.savefig('1.png', bbox_inches=extent)

  plt.axis('off') 
  plt.show()

I am trying to draw a basic graph on a figure using NetworkX and save it. Without the graph it works, but once I add the graph I get white space around the saved image:

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()
G.add_node(1)
G.add_node(2)
G.add_node(3)
G.add_edge(1,3)
G.add_edge(1,2)
pos = {1:[100,120], 2:[200,300], 3:[50,75]}

fig = plt.figure(1)
img = mpimg.imread("C:\\images\\1.jpg")
plt.imshow(img)
ax=fig.add_subplot(1,1,1)

nx.draw(G, pos=pos)

extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
plt.savefig('1.png', bbox_inches = extent)

plt.axis('off') 
plt.show()

回答 0

我不能说我确切知道我的“解决方案”为什么起作用或如何起作用,但是当我想将几个翼型截面的轮廓(没有白色边距)绘制到PDF文件时,这就是我要做的。(请注意,我在带有-pylab标志的IPython笔记本中使用了matplotlib。)

plt.gca().set_axis_off()
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, 
            hspace = 0, wspace = 0)
plt.margins(0,0)
plt.gca().xaxis.set_major_locator(plt.NullLocator())
plt.gca().yaxis.set_major_locator(plt.NullLocator())
plt.savefig("filename.pdf", bbox_inches = 'tight',
    pad_inches = 0)

我尝试停用此功能的不同部分,但这总是在某处导致空白。您甚至可以对此进行修改,以防止由于缺乏边距而使图形附近的粗线被刮掉。

I cannot claim I know exactly why or how my “solution” works, but this is what I had to do when I wanted to plot the outline of a couple of aerofoil sections, without white margins, to a PDF file. (Note that I used matplotlib inside an IPython notebook, with the -pylab flag.)

plt.gca().set_axis_off()
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, 
            hspace = 0, wspace = 0)
plt.margins(0,0)
plt.gca().xaxis.set_major_locator(plt.NullLocator())
plt.gca().yaxis.set_major_locator(plt.NullLocator())
plt.savefig("filename.pdf", bbox_inches = 'tight',
    pad_inches = 0)

I have tried deactivating different parts of this, but that always led to a white margin somewhere. You may even have to modify this to keep fat lines near the limits of the figure from being shaved off by the lack of margins.


回答 1

您可以通过bbox_inches="tight"在中设置来删除空白填充savefig

plt.savefig("test.png",bbox_inches='tight')

您必须将参数bbox_inches作为字符串输入,也许这就是为什么它对您较早不起作用的原因。


可能重复:

Matplotlib图:删除轴,图例和空白

如何设置matplotlib图形的边距?

减少matplotlib图中的左右边距

You can remove the white space padding by setting bbox_inches="tight" in savefig:

plt.savefig("test.png",bbox_inches='tight')

You have to pass the argument to bbox_inches as a string; perhaps this is why it did not work for you earlier.


Possible duplicates:

Matplotlib plots: removing axis, legends and white spaces

How to set the margins for a matplotlib figure?

Reduce left and right margins in matplotlib plot
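
To see what bbox_inches='tight' changes, the following sketch saves the same figure twice and compares the pixel dimensions of the two files. The random test image, the temporary output directory, and the added pad_inches=0 (which removes the small default padding) are assumptions made for illustration, not part of the answer above.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 50x80 test image, used only for illustration.
img = np.random.rand(50, 80)

fig, ax = plt.subplots(figsize=(4, 4), dpi=100)
ax.imshow(img)
ax.axis("off")

out_dir = tempfile.mkdtemp()
plain_path = os.path.join(out_dir, "plain.png")
tight_path = os.path.join(out_dir, "tight.png")

# Default save: the whole 4x4 inch canvas, margins included.
fig.savefig(plain_path)
# Tight save: the canvas is cropped to the drawn content.
fig.savefig(tight_path, bbox_inches="tight", pad_inches=0)
plt.close(fig)

plain_shape = plt.imread(plain_path).shape
tight_shape = plt.imread(tight_path).shape
print("plain:", plain_shape[:2], "tight:", tight_shape[:2])
```

The tight file should come out smaller in both dimensions, because the margins around the axes are cropped away.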


回答 2

在尝试了上述答案但没有成功(以及许多其他堆栈文章)之后,最终对我有用的只是

plt.gca().set_axis_off()
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, 
            hspace = 0, wspace = 0)
plt.margins(0,0)
plt.savefig("myfig.pdf")

重要的是,这不包括bbox或padding参数。

After trying the above answers (and a slew of other Stack Overflow posts) with no success, what finally worked for me was just:

plt.gca().set_axis_off()
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, 
            hspace = 0, wspace = 0)
plt.margins(0,0)
plt.savefig("myfig.pdf")

Importantly, this does not pass the bbox_inches or pad_inches arguments to savefig.
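
A minimal sketch of why this works: subplots_adjust stretches the axes over the whole figure, so nothing is left for margins. The line data and figure size below are hypothetical, and the sketch calls the method on the figure object rather than through plt.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2], [0, 1, 0])  # hypothetical data

# Axes position as (x0, y0, width, height) in figure fractions.
before = ax.get_position().bounds

ax.set_axis_off()
fig.subplots_adjust(top=1, bottom=0, right=1, left=0, hspace=0, wspace=0)
ax.margins(0, 0)

after = ax.get_position().bounds
print("before:", before)
print("after: ", after)
plt.close(fig)
```

Before the adjustment the axes occupy only part of the figure; afterwards they span it completely.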


回答 3

我从Arvind Pereira(http://robotics.usc.edu/~ampereir/wordpress/?p=626)找到了一些东西,似乎对我有用:

plt.savefig(filename, transparent = True, bbox_inches = 'tight', pad_inches = 0)

I found this in a post by Arvind Pereira (http://robotics.usc.edu/~ampereir/wordpress/?p=626), and it seems to work for me:

plt.savefig(filename, transparent = True, bbox_inches = 'tight', pad_inches = 0)

回答 4

以下功能合并了上面的johannes-s答案。我有测试过plt.figure,并plt.subplots()与多个轴,它工作得很好。

def save(filepath, fig=None):
    r'''Save the given figure (default: the current one) with no whitespace
    Example filepath: "myfig.png" or r"C:\myfig.pdf"
    '''
    import matplotlib.pyplot as plt
    if fig is None:
        fig = plt.gcf()

    # Adjust the figure being saved, not whichever figure happens to be current
    fig.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0, hspace=0)
    for ax in fig.axes:
        ax.axis('off')
        ax.margins(0,0)
        ax.xaxis.set_major_locator(plt.NullLocator())
        ax.yaxis.set_major_locator(plt.NullLocator())
    fig.savefig(filepath, pad_inches = 0, bbox_inches='tight')

The following function incorporates johannes-s's answer above. I have tested it with plt.figure and with plt.subplots() with multiple axes, and it works nicely.

def save(filepath, fig=None):
    r'''Save the given figure (default: the current one) with no whitespace
    Example filepath: "myfig.png" or r"C:\myfig.pdf"
    '''
    import matplotlib.pyplot as plt
    if fig is None:
        fig = plt.gcf()

    # Adjust the figure being saved, not whichever figure happens to be current
    fig.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0, hspace=0)
    for ax in fig.axes:
        ax.axis('off')
        ax.margins(0,0)
        ax.xaxis.set_major_locator(plt.NullLocator())
        ax.yaxis.set_major_locator(plt.NullLocator())
    fig.savefig(filepath, pad_inches = 0, bbox_inches='tight')

回答 5

我发现以下代码非常适合这项工作。

fig = plt.figure(figsize=[6,6])
ax = fig.add_subplot(111)
ax.imshow(data)
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
ax.set_frame_on(False)
plt.savefig('data.png', dpi=400, bbox_inches='tight',pad_inches=0)

I found that the following code works perfectly for the job.

fig = plt.figure(figsize=[6,6])
ax = fig.add_subplot(111)
ax.imshow(data)
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
ax.set_frame_on(False)
plt.savefig('data.png', dpi=400, bbox_inches='tight',pad_inches=0)

回答 6

我遵循了这个顺序,它就像一个魅力。

plt.axis("off")
fig=plt.imshow(image_array,interpolation='nearest')
fig.axes.get_xaxis().set_visible(False)
fig.axes.get_yaxis().set_visible(False)
plt.savefig('destination_path.pdf',
    bbox_inches='tight', pad_inches=0, format='pdf', dpi=1200)

I followed this sequence and it worked like a charm.

plt.axis("off")
fig=plt.imshow(image_array,interpolation='nearest')
fig.axes.get_xaxis().set_visible(False)
fig.axes.get_yaxis().set_visible(False)
plt.savefig('destination_path.pdf',
    bbox_inches='tight', pad_inches=0, format='pdf', dpi=1200)

回答 7

对于任何想以像素而不是英寸为单位的人,都可以使用。

加上平时您还需要

from matplotlib.transforms import Bbox

然后,您可以使用以下命令:

my_dpi = 100 # Good default - doesn't really matter

# Size of output in pixels
h = 224
w = 224

fig, ax = plt.subplots(1, figsize=(w/my_dpi, h/my_dpi), dpi=my_dpi)

ax.set_position([0, 0, 1, 1]) # Critical!

# Do some stuff
ax.imshow(img)
ax.imshow(heatmap) # 4-channel RGBA
ax.plot([50, 100, 150], [50, 100, 150], color="red")

ax.axis("off")

fig.savefig("saved_img.png",
            bbox_inches=Bbox([[0, 0], [w/my_dpi, h/my_dpi]]),
            dpi=my_dpi)


For anyone who wants to work in pixels rather than inches, this will work.

In addition to the usual imports, you will also need:

from matplotlib.transforms import Bbox

Then you can use the following:

my_dpi = 100 # Good default - doesn't really matter

# Size of output in pixels
h = 224
w = 224

fig, ax = plt.subplots(1, figsize=(w/my_dpi, h/my_dpi), dpi=my_dpi)

ax.set_position([0, 0, 1, 1]) # Critical!

# Do some stuff
ax.imshow(img)
ax.imshow(heatmap) # 4-channel RGBA
ax.plot([50, 100, 150], [50, 100, 150], color="red")

ax.axis("off")

fig.savefig("saved_img.png",
            bbox_inches=Bbox([[0, 0], [w/my_dpi, h/my_dpi]]),
            dpi=my_dpi)
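
As a quick check of the pixel-based approach, the sketch below saves a figure the same way and reads the file back to confirm its size in pixels. The random image and the temporary output path are assumptions made for illustration; the heatmap and line overlay from the snippet above are left out to keep it short.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.transforms import Bbox

my_dpi = 100
h, w = 224, 224  # desired output size in pixels

# Hypothetical RGB image standing in for real data.
img = np.random.rand(h, w, 3)

fig, ax = plt.subplots(1, figsize=(w / my_dpi, h / my_dpi), dpi=my_dpi)
ax.set_position([0, 0, 1, 1])  # axes cover the whole figure
ax.imshow(img)
ax.axis("off")

out_path = os.path.join(tempfile.mkdtemp(), "saved_img.png")
fig.savefig(out_path,
            bbox_inches=Bbox([[0, 0], [w / my_dpi, h / my_dpi]]),
            dpi=my_dpi)
plt.close(fig)

# Read the file back and report its height and width in pixels.
saved = plt.imread(out_path)
print(saved.shape[:2])
```

The figure size in inches times the dpi gives the pixel size, which is why figsize is computed as w/my_dpi by h/my_dpi.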



回答 8

我发现一种更简单的方法是使用plt.imsave

    import matplotlib.pyplot as plt
    arr = plt.imread(path)
    plt.imsave('test.png', arr)

A much simpler approach I found is to use plt.imsave:

    import matplotlib.pyplot as plt
    arr = plt.imread(path)
    plt.imsave('test.png', arr)
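
The reason plt.imsave avoids the margin problem is that it writes the array directly as an image, one pixel per array element, without creating a figure or axes. A minimal sketch, assuming a made-up random RGB array and a temporary output path:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 60x90 RGB image with float values in [0, 1].
arr = np.random.rand(60, 90, 3)

out_path = os.path.join(tempfile.mkdtemp(), "test.png")
plt.imsave(out_path, arr)

# Read the file back: height and width match the input array.
saved = plt.imread(out_path)
print(arr.shape[:2], saved.shape[:2])
```

Note that this only saves the image data itself; anything drawn on top of the image in a figure (lines, graphs, annotations) is not included.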

回答 9

您可以尝试一下。它解决了我的问题。

import matplotlib.image as mpimg
img = mpimg.imread("src.png")
mpimg.imsave("out.png", img)  # add cmap=... only for 2-D (grayscale) arrays

You may try this. It solved my issue.

import matplotlib.image as mpimg
img = mpimg.imread("src.png")
mpimg.imsave("out.png", img)  # add cmap=... only for 2-D (grayscale) arrays

回答 10

如果要显示要保存的内容,我建议您使用plt.tight_layout转换,因为它在使用时不会进行不必要的裁剪,因此实际上更可取plt.savefig

import matplotlib.pyplot as plt
plt.plot([1,2,3], [1,2,3])
plt.tight_layout(pad=0)
plt.savefig('plot.png')

The most straightforward method is to use plt.tight_layout, which is arguably preferable because it does not do unnecessary cropping when saving with plt.savefig:

import matplotlib.pyplot as plt
plt.plot([1,2,3], [1,2,3])
plt.tight_layout(pad=0)
plt.savefig('plot.png')

However, this may not be suitable for complex plots that modify the figure. If that is the case, refer to the top answers that use plt.subplots_adjust.


回答 11

这对我有用,将用imshow绘制的numpy数组保存到文件

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10,10))
plt.imshow(img) # your image here
plt.axis("off")
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, 
        hspace = 0, wspace = 0)
plt.savefig("example2.png", bbox_inches='tight', pad_inches=0, dpi=100)
plt.show()

This works for me when saving a NumPy array plotted with imshow to a file:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10,10))
plt.imshow(img) # your image here
plt.axis("off")
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, 
        hspace = 0, wspace = 0)
plt.savefig("example2.png", bbox_inches='tight', pad_inches=0, dpi=100)
plt.show()
