Numpy arrays have an astype method. Just do y.astype(int).
Note that it might not even be necessary to do this, depending on what you’re using the array for. Bool will be autopromoted to int in many cases, so you can add it to int arrays without having to explicitly convert it:
>>> x
array([ True, False, True], dtype=bool)
>>> x + [1, 2, 3]
array([2, 2, 4])
回答 1
该1*y方法也适用于Numpy:
>>>import numpy as np
>>> x = np.array([4,3,2,1])>>> y =2>= x
>>> y
array([False,False,True,True], dtype=bool)>>>1*y # Method 1
array([0,0,1,1])>>> y.astype(int)# Method 2
array([0,0,1,1])
如果您正在寻求一种将Python列表从Boolean转换为int的方法,则可以使用以下map方法:
>>> testList =[False,False,True,True]>>> map(lambda x:1if x else0, testList)[0,0,1,1]>>> map(int, testList)[0,0,1,1]
或使用列表推导:
>>> testList
[False,False,True,True]>>>[int(elem)for elem in testList][0,0,1,1]
Using the new Enum feature (via backport enum34) with python 2.7.6.
Given the following definition, how can I convert an int to the corresponding Enum value?
from enum import Enum
class Fruit(Enum):
Apple = 4
Orange = 5
Pear = 6
I know I can hand craft a series of if-statements to do the conversion but is there an easy pythonic way to convert? Basically, I’d like a function ConvertIntToFruit(int) that returns an enum value.
My use case is I have a csv file of records where I’m reading each record into an object. One of the file fields is an integer field that represents an enumeration. As I’m populating the object I’d like to convert that integer field from the file into the corresponding Enum value in the object.
回答 0
您“打电话”Enum上课:
Fruit(5)
轮到5为Fruit.Orange:
>>> from enum import Enum
>>> classFruit(Enum):... Apple = 4... Orange = 5... Pear = 6... >>> Fruit(5)
<Fruit.Orange: 5>
Sometimes it’s useful to access members in enumerations
programmatically (i.e. situations where Color.red won’t do because the
exact color is not known at program-writing time). Enum allows such
access:
The failure throws an exception as would be expected.
A more robust version:
#!/usr/bin/env python3
class EnumDemo():
enumeration = (
'ENUM_ZERO', # 0.
'ENUM_ONE', # 1.
'ENUM_TWO', # 2.
'ENUM_THREE', # 3.
'ENUM_INVALID' # 4.
)
def name(self, val):
try:
name = self.enumeration[val]
except IndexError:
# Always return last tuple.
name = self.enumeration[len(self.enumeration) - 1]
return name
def number(self, val):
try:
index = self.enumeration.index(val)
except (TypeError, ValueError):
# Always return last tuple.
index = (len(self.enumeration) - 1)
return index
#endclass.
print('Passes')
print('1) %d'%(EnumDemo().number('ENUM_TWO')))
print('2) %s'%(EnumDemo().number('ENUM_TWO')))
print('3) %s'%(EnumDemo().name(1)))
print('4) %s'%(EnumDemo().enumeration[1]))
print()
print('Fails')
print('1) %d'%(EnumDemo().number('ENUM_THREEa')))
print('2) %s'%(EnumDemo().number('ENUM_THREEa')))
print('3) %s'%(EnumDemo().name(11)))
print('4) %s'%(EnumDemo().enumeration[-1]))
When not used correctly this avoids creating an exception and, instead, passes back a fault indication. A more Pythonic way to do this would be to pass back “None” but my particular application uses the text directly.
because I have to store it as string in a database. After the conversion the order is not important anymore, so I can spare the ordered feature anyway.
Even though this is a year old question, I would like to say that using dict will not help if you have an ordered dict within the ordered dict. The simplest way that could convert those recursive ordered dict will be
If you have to store it as a string in your database, using JSON is the way to go. That is also quite simple, and you don’t even have to worry about converting to a regular dict:
Is there a preferred way to keep the data type of a numpy array fixed as int (or int64 or whatever), while still having an element inside listed as numpy.NaN?
In particular, I am converting an in-house data structure to a Pandas DataFrame. In our structure, we have integer-type columns that still have NaN’s (but the dtype of the column is int). It seems to recast everything as a float if we make this a DataFrame, but we’d really like to be int.
Thoughts?
Things tried:
I tried using the from_records() function under pandas.DataFrame, with coerce_float=False and this did not help. I also tried using NumPy masked arrays, with NaN fill_value, which also did not work. All of these caused the column data type to become a float.
NaN can’t be stored in an integer array. This is a known limitation of pandas at the moment; I have been waiting for progress to be made with NA values in NumPy (similar to NAs in R), but it will be at least 6 months to a year before NumPy gets these features, it seems:
Then you can mix then with NaN as much as you want. If you really want to have integers, depending on your application, you can use -1, or 0, or 1234567890, or some other dedicated value to represent NaN.
You can also temporarily duplicate the columns: one as you have, with floats; the other one experimental, with ints or strings. Then inserts asserts in every reasonable place checking that the two are in sync. After enough testing you can let go of the floats.
Functionality to support NaN in integer series will be available in v0.24 upwards. There’s information on this in the v0.24 “What’s New” section, and more details under Nullable Integer Data Type.
Pandas v0.23 and earlier
In general, it’s best to work with float series where possible, even when the series is upcast from int to float due to inclusion of NaN values. This enables vectorised NumPy-based calculations where, otherwise, Python-level loops would be processed.
The docs do suggest : “One possibility is to use dtype=object arrays instead.” For example:
s = pd.Series([1, 2, 3, np.nan])
print(s.astype(object))
0 1
1 2
2 3
3 NaN
dtype: object
For cosmetic reasons, e.g. output to a file, this may be preferable.
In the absence of high performance NA support being built into NumPy
from the ground up, the primary casualty is the ability to represent
NAs in integer arrays.
This trade-off is made largely for memory and performance reasons, and
also so that the resulting Series continues to be “numeric”.
The docs also provide rules for upcasting due to NaN inclusion:
Typeclass Promotion dtype for storing NAs
floating no change
object no change
integer cast to float64
boolean cast to object
s1 = pd.Series([1.434,2.343, np.nan])#without round() the next line returns an error
s1.astype('Int64')#cannot safely cast non-equivalent float64 to int64##with round() it works
s1.round().astype('Int64')01122NaN
dtype:Int64
Just wanted to add that in case you are trying to convert a float (1.143) vector to integer (1) that has NA converting to the new ‘Int64’ dtype will give you an error. In order to solve this you have to round the numbers and then do “.astype(‘Int64’)”
s1 = pd.Series([1.434, 2.343, np.nan])
#without round() the next line returns an error
s1.astype('Int64')
#cannot safely cast non-equivalent float64 to int64
##with round() it works
s1.round().astype('Int64')
0 1
1 2
2 NaN
dtype: Int64
My use case is that I have a float series that I want to round to int, but when you do .round() a ‘*.0’ at the end of the number remains, so you can drop that 0 from the end by converting to int.
import pandas as pd
import numpy as np
#show datatypes before transformation
mydf.dtypes
for c in mydf.select_dtypes(np.number).columns:try:
mydf[c]= mydf[c].astype('Int64')print('casted {} as Int64'.format(c))except:print('could not cast {} to Int64'.format(c))#show datatypes after transformation
mydf.dtypes
If there are blanks in the text data, columns that would normally be integers will be cast to floats as float64 dtype because int64 dtype cannot handle nulls. This can cause inconsistent schema if you are loading multiple files some with blanks (which will end up as float64 and others without which will end up as int64
This code will attempt to convert any number type columns to Int64 (as opposed to int64) since Int64 can handle nulls
import pandas as pd
import numpy as np
#show datatypes before transformation
mydf.dtypes
for c in mydf.select_dtypes(np.number).columns:
try:
mydf[c] = mydf[c].astype('Int64')
print('casted {} as Int64'.format(c))
except:
print('could not cast {} to Int64'.format(c))
#show datatypes after transformation
mydf.dtypes
I wonder what’s the correct way of converting (deserializing) a string to a Python’s Enum class. Seems like getattr(YourEnumType, str) does the job, but I’m not sure if it’s safe enough.
Just to be more specific, I would like to convert a 'debug'string to an Enum object like this:
Another alternative (especially useful if your strings don’t map 1-1 to your enum cases) is to add a staticmethod to your Enum, e.g.:
class QuestionType(enum.Enum):
MULTI_SELECT = "multi"
SINGLE_SELECT = "single"
@staticmethod
def from_str(label):
if label in ('single', 'singleSelect'):
return QuestionType.SINGLE_SELECT
elif label in ('multi', 'multiSelect'):
return QuestionType.MULTI_SELECT
else:
raise NotImplementedError
Then you can do question_type = QuestionType.from_str('singleSelect')
回答 2
def custom_enum(typename, items_dict):
class_definition ="""
from enum import Enum
class {}(Enum):
{}""".format(typename,'\n '.join(['{} = {}'.format(k, v)for k, v in items_dict.items()]))
namespace = dict(__name__='enum_%s'% typename)exec(class_definition, namespace)
result = namespace[typename]
result._source = class_definition
return result
MyEnum= custom_enum('MyEnum',{'a':123,'b':321})print(MyEnum.a,MyEnum.b)
还是需要将字符串转换为已知的 Enum?
classMyEnum(Enum):
a ='aaa'
b =123print(MyEnum('aaa'),MyEnum(123))
要么:
classBuildType(Enum):
debug =200
release =400print(BuildType.__dict__['debug'])print(eval('BuildType.debug'))print(type(eval('BuildType.debug')))print(eval(BuildType.__name__ +'.debug'))# for work with code refactoring
def custom_enum(typename, items_dict):
class_definition = """
from enum import Enum
class {}(Enum):
{}""".format(typename, '\n '.join(['{} = {}'.format(k, v) for k, v in items_dict.items()]))
namespace = dict(__name__='enum_%s' % typename)
exec(class_definition, namespace)
result = namespace[typename]
result._source = class_definition
return result
MyEnum = custom_enum('MyEnum', {'a': 123, 'b': 321})
print(MyEnum.a, MyEnum.b)
Or you need to convert string to known Enum?
class MyEnum(Enum):
a = 'aaa'
b = 123
print(MyEnum('aaa'), MyEnum(123))
Or:
class BuildType(Enum):
debug = 200
release = 400
print(BuildType.__dict__['debug'])
print(eval('BuildType.debug'))
print(type(eval('BuildType.debug')))
print(eval(BuildType.__name__ + '.debug')) # for work with code refactoring
回答 3
我的类Java解决方案。希望它可以帮助某人…
from enum importEnum, auto
classSignInMethod(Enum):
EMAIL = auto(),
GOOGLE = auto()@staticmethoddef value_of(value)->Enum:for m, mm inSignInMethod.__members__.items():if m == value.upper():return mm
sim =SignInMethod.value_of('EMAIL')print("""TEST
1). {0}
2). {1}
3). {2}
""".format(sim, sim.name, isinstance(sim,SignInMethod)))
My Java-like solution to the problem. Hope it helps someone…
from enum import Enum, auto
class SignInMethod(Enum):
EMAIL = auto(),
GOOGLE = auto()
@staticmethod
def value_of(value) -> Enum:
for m, mm in SignInMethod.__members__.items():
if m == value.upper():
return mm
sim = SignInMethod.value_of('EMAIL')
print("""TEST
1). {0}
2). {1}
3). {2}
""".format(sim, sim.name, isinstance(sim, SignInMethod)))
I’ve got some Python code that runs through a list of strings and converts them to integers or floating point numbers if possible. Doing this for integers is pretty easy
if element.isdigit():
newelement = int(element)
Floating point numbers are more difficult. Right now I’m using partition('.') to split the string and checking to make sure that one or both sides are digits.
partition = element.partition('.')
if (partition[0].isdigit() and partition[1] == '.' and partition[2].isdigit())
or (partition[0] == '' and partition[1] == '.' and partition[2].isdigit())
or (partition[0].isdigit() and partition[1] == '.' and partition[2] == ''):
newelement = float(element)
This works, but obviously the if statement for that is a bit of a bear. The other solution I considered is to just wrap the conversion in a try/catch block and see if it succeeds, as described in this question.
Anyone have any other ideas? Opinions on the relative merits of the partition and try/catch approaches?
回答 0
我会用..
try:
float(element)exceptValueError:print"Not a float"
..它很简单,并且可以正常工作
另一个选择是正则表达式:
import re
if re.match(r'^-?\d+(?:\.\d+)?$', element)isNone:print"Not float"
Command to parse Is it a float?Comment-----------------------------------------------------------------print(isfloat(""))Falseprint(isfloat("1234567"))Trueprint(isfloat("NaN"))True nan is also float
print(isfloat("NaNananana BATMAN"))Falseprint(isfloat("123.456"))Trueprint(isfloat("123.E4"))Trueprint(isfloat(".1"))Trueprint(isfloat("1,234"))Falseprint(isfloat("NULL"))False case insensitive
print(isfloat(",1"))Falseprint(isfloat("123.EE4"))Falseprint(isfloat("6.523537535629999e-07"))Trueprint(isfloat("6e777777"))TrueThisis same asInfprint(isfloat("-iNF"))Trueprint(isfloat("1.797693e+308"))Trueprint(isfloat("infinity"))Trueprint(isfloat("infinity and BEYOND"))Falseprint(isfloat("12.34.56"))FalseTwo dots not allowed.print(isfloat("#56"))Falseprint(isfloat("56%"))Falseprint(isfloat("0E0"))Trueprint(isfloat("x86E0"))Falseprint(isfloat("86-5"))Falseprint(isfloat("True"))FalseBooleanisnot a float.print(isfloat(True))TrueBooleanis a float
print(isfloat("+1e1^5"))Falseprint(isfloat("+1e1"))Trueprint(isfloat("+1e1.3"))Falseprint(isfloat("+1.3P1"))Falseprint(isfloat("-+1"))Falseprint(isfloat("(1)"))False brackets not interpreted
Don’t get bit by the goblins hiding in the float boat! DO UNIT TESTING!
What is, and is not a float may surprise you:
Command to parse Is it a float? Comment
-------------------------------------- --------------- ------------
print(isfloat("")) False
print(isfloat("1234567")) True
print(isfloat("NaN")) True nan is also float
print(isfloat("NaNananana BATMAN")) False
print(isfloat("123.456")) True
print(isfloat("123.E4")) True
print(isfloat(".1")) True
print(isfloat("1,234")) False
print(isfloat("NULL")) False case insensitive
print(isfloat(",1")) False
print(isfloat("123.EE4")) False
print(isfloat("6.523537535629999e-07")) True
print(isfloat("6e777777")) True This is same as Inf
print(isfloat("-iNF")) True
print(isfloat("1.797693e+308")) True
print(isfloat("infinity")) True
print(isfloat("infinity and BEYOND")) False
print(isfloat("12.34.56")) False Two dots not allowed.
print(isfloat("#56")) False
print(isfloat("56%")) False
print(isfloat("0E0")) True
print(isfloat("x86E0")) False
print(isfloat("86-5")) False
print(isfloat("True")) False Boolean is not a float.
print(isfloat(True)) True Boolean is a float
print(isfloat("+1e1^5")) False
print(isfloat("+1e1")) True
print(isfloat("+1e1.3")) False
print(isfloat("+1.3P1")) False
print(isfloat("-+1")) False
print(isfloat("(1)")) False brackets not interpreted
fn sad:0.220988988876
fn happy:0.212214946747.
part sad:1.2219619751
part happy:0.754667043686.
re sad:1.50515985489
re happy:1.01107215881.try sad:2.40243887901try happy:0.425730228424.----------------------------------------------------------------------Ran4 tests in7.761s
OK
If your input is mostly strings that can be converted to floats, the try: except: method is the best native Python method.
If your input is mostly strings that cannot be converted to floats, regular expressions or the partition method will be better.
If you are 1) unsure of your input or need more speed and 2) don’t mind and can install a third-party C-extension, fastnumbers works very well.
There is another method available via a third-party module called fastnumbers (disclosure, I am the author); it provides a function called isfloat. I have taken the unittest example outlined by Jacob Gabrielson in this answer, but added the fastnumbers.isfloat method. I should also note that Jacob’s example did not do justice to the regex option because most of the time in that example was spent in global lookups because of the dot operator… I have modified that function to give a fairer comparison to try: except:.
def is_float_try(str):
try:
float(str)
return True
except ValueError:
return False
import re
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$").match
def is_float_re(str):
return True if _float_regexp(str) else False
def is_float_partition(element):
partition=element.partition('.')
if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and partition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
return True
else:
return False
from fastnumbers import isfloat
if __name__ == '__main__':
import unittest
import timeit
class ConvertTests(unittest.TestCase):
def test_re_perf(self):
print
print 're sad:', timeit.Timer('ttest.is_float_re("12.2x")', "import ttest").timeit()
print 're happy:', timeit.Timer('ttest.is_float_re("12.2")', "import ttest").timeit()
def test_try_perf(self):
print
print 'try sad:', timeit.Timer('ttest.is_float_try("12.2x")', "import ttest").timeit()
print 'try happy:', timeit.Timer('ttest.is_float_try("12.2")', "import ttest").timeit()
def test_fn_perf(self):
print
print 'fn sad:', timeit.Timer('ttest.isfloat("12.2x")', "import ttest").timeit()
print 'fn happy:', timeit.Timer('ttest.isfloat("12.2")', "import ttest").timeit()
def test_part_perf(self):
print
print 'part sad:', timeit.Timer('ttest.is_float_partition("12.2x")', "import ttest").timeit()
print 'part happy:', timeit.Timer('ttest.is_float_partition("12.2")', "import ttest").timeit()
unittest.main()
On my machine, the output is:
fn sad: 0.220988988876
fn happy: 0.212214946747
.
part sad: 1.2219619751
part happy: 0.754667043686
.
re sad: 1.50515985489
re happy: 1.01107215881
.
try sad: 2.40243887901
try happy: 0.425730228424
.
----------------------------------------------------------------------
Ran 4 tests in 7.761s
OK
As you can see, regex is actually not as bad as it originally seemed, and if you have a real need for speed, the fastnumbers method is quite good.
If you cared about performance (and I’m not suggesting you should), the try-based approach is the clear winner (compared with your partition-based approach or the regexp approach), as long as you don’t expect a lot of invalid strings, in which case it’s potentially slower (presumably due to the cost of exception handling).
Again, I’m not suggesting you care about performance, just giving you the data in case you’re doing this 10 billion times a second, or something. Also, the partition-based code doesn’t handle at least one valid string.
$ ./floatstr.py
F..
partition sad: 3.1102449894
partition happy: 2.09208488464
..
re sad: 7.76906108856
re happy: 7.09421992302
..
try sad: 12.1525540352
try happy: 1.44165301323
.
======================================================================
FAIL: test_partition (__main__.ConvertTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./floatstr.py", line 48, in test_partition
self.failUnless(is_float_partition("20e2"))
AssertionError
----------------------------------------------------------------------
Ran 8 tests in 33.670s
FAILED (failures=1)
Here’s the code (Python 2.6, regexp taken from John Gietzen’s answer):
def is_float_try(str):
try:
float(str)
return True
except ValueError:
return False
import re
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$")
def is_float_re(str):
return re.match(_float_regexp, str)
def is_float_partition(element):
partition=element.partition('.')
if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and pa\
rtition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
return True
if __name__ == '__main__':
import unittest
import timeit
class ConvertTests(unittest.TestCase):
def test_re(self):
self.failUnless(is_float_re("20e2"))
def test_try(self):
self.failUnless(is_float_try("20e2"))
def test_re_perf(self):
print
print 're sad:', timeit.Timer('floatstr.is_float_re("12.2x")', "import floatstr").timeit()
print 're happy:', timeit.Timer('floatstr.is_float_re("12.2")', "import floatstr").timeit()
def test_try_perf(self):
print
print 'try sad:', timeit.Timer('floatstr.is_float_try("12.2x")', "import floatstr").timeit()
print 'try happy:', timeit.Timer('floatstr.is_float_try("12.2")', "import floatstr").timeit()
def test_partition_perf(self):
print
print 'partition sad:', timeit.Timer('floatstr.is_float_partition("12.2x")', "import floatstr").timeit()
print 'partition happy:', timeit.Timer('floatstr.is_float_partition("12.2")', "import floatstr").timeit()
def test_partition(self):
self.failUnless(is_float_partition("20e2"))
def test_partition2(self):
self.failUnless(is_float_partition(".2"))
def test_partition3(self):
self.failIf(is_float_partition("1234x.2"))
unittest.main()
回答 5
仅出于多样性,这是另一种方法。
>>> all([i.isnumeric()for i in'1.2'.split('.',1)])True>>> all([i.isnumeric()for i in'2'.split('.',1)])True>>> all([i.isnumeric()for i in'2.f'.split('.',1)])False
>>>def isfloat(val):...return all([[any([i.isnumeric(), i in['.','e']])for i in val], len(val.split('.'))==2])...>>> isfloat('1')False>>> isfloat('1.2')True>>> isfloat('1.2e3')True>>> isfloat('12e3')False
>>> all([i.isnumeric() for i in '1.2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2.f'.split('.',1)])
False
Edit: Im sure it will not hold up to all cases of float though especially when there is an exponent. To solve that it looks like this. This will return True only val is a float and False for int but is probably less performant than regex.
>>> def isfloat(val):
... return all([ [any([i.isnumeric(), i in ['.','e']]) for i in val], len(val.split('.')) == 2] )
...
>>> isfloat('1')
False
>>> isfloat('1.2')
True
>>> isfloat('1.2e3')
True
>>> isfloat('12e3')
False
However, I believe that your best bet is to use the parser in a try.
回答 7
如果您不必担心数字的科学表达式或其他表达式,而只使用可能是带或不带句点的数字的字符串,则:
功能
def is_float(s):
result =Falseif s.count(".")==1:if s.replace(".","").isdigit():
result =Truereturn result
Lambda版本
is_float =lambda x: x.replace('.','',1).isdigit()and"."in x
例
if is_float(some_string):
some_string = float(some_string)elif some_string.isdigit():
some_string = int(some_string)else:print"Does not convert to int or float."
If you don’t need to worry about scientific or other expressions of numbers and are only working with strings that could be numbers with or without a period:
Function
def is_float(s):
result = False
if s.count(".") == 1:
if s.replace(".", "").isdigit():
result = True
return result
Lambda version
is_float = lambda x: x.replace('.','',1).isdigit() and "." in x
Example
if is_float(some_string):
some_string = float(some_string)
elif some_string.isdigit():
some_string = int(some_string)
else:
print "Does not convert to int or float."
This way you aren’t accidentally converting what should be an int, into a float.
I used the function already mentioned, but soon I notice that strings as “Nan”, “Inf” and it’s variation are considered as number. So I propose you improved version of the function, that will return false on those type of input and will not fail “1e3” variants:
def is_float(text):
# check for nan/infinity etc.
if text.isalpha():
return False
try:
float(text)
return True
except ValueError:
return False
回答 10
尝试转换为浮点数。如果有错误,请打印ValueError异常。
try:
x = float('1.23')print('val=',x)
y = float('abc')print('val=',y)exceptValueErroras err:print('floatErr;',err)
输出:
val=1.23
floatErr: could not convert string to float:'abc'
def cleanInput(question,retry=False):
inputValue = input("\n\nOnly positive numbers can be entered, please re-enter the value.\n\n{}".format(question))if retry else input(question)try:if float(inputValue)<=0:raiseValueError()else:return(float(inputValue))exceptValueError:return(cleanInput(question,retry=True))
willbefloat = cleanInput("Give me the number: ")
I was looking for some similar code, but it looks like using try/excepts is the best way.
Here is the code I’m using. It includes a retry function if the input is invalid. I needed to check if the input was greater than 0 and if so convert it to a float.
def cleanInput(question,retry=False):
inputValue = input("\n\nOnly positive numbers can be entered, please re-enter the value.\n\n{}".format(question)) if retry else input(question)
try:
if float(inputValue) <= 0 : raise ValueError()
else : return(float(inputValue))
except ValueError : return(cleanInput(question,retry=True))
willbefloat = cleanInput("Give me the number: ")
回答 13
def try_parse_float(item):
result =Nonetry:
float(item)except:passelse:
result = float(item)return result
How do you convert a Unicode string (containing extra characters like £ $, etc.) into a Python string?
回答 0
title = u"Klüft skräms inför på fédéral électoral große"import unicodedata
unicodedata.normalize('NFKD', title).encode('ascii','ignore')'Kluft skrams infor pa federal electoral groe'
title = u"Klüft skräms inför på fédéral électoral große"
import unicodedata
unicodedata.normalize('NFKD', title).encode('ascii','ignore')
'Kluft skrams infor pa federal electoral groe'
If you have a Unicode string, and you want to write this to a file, or other serialised form, you must first encode it into a particular representation that can be stored. There are several common Unicode encodings, such as UTF-16 (uses two bytes for most Unicode characters) or UTF-8 (1-4 bytes / codepoint depending on the character), etc. To convert that string into a particular encoding, you can use:
This raw string of bytes can be written to a file. However, note that when reading it back, you must know what encoding it is in and decode it using that same encoding.
When writing to files, you can get rid of this manual encode/decode process by using the codecs module. So, to open a file that encodes all Unicode strings into UTF-8, use:
import codecs
f = codecs.open('path/to/file.txt','w','utf8')
f.write(my_unicode_string) # Stored on disk as UTF-8
Do note that anything else that is using these files must understand what encoding the file is in if they want to read them. If you are the only one doing the reading/writing this isn’t a problem, otherwise make sure that you write in a form understandable by whatever else uses the files.
In Python 3, this form of file access is the default, and the built-in open function will take an encoding parameter and always translate to/from Unicode strings (the default string object in Python 3) for files opened in text mode.
回答 4
这是一个例子:
>>> u = u'€€€'>>> s = u.encode('utf8')>>> s
'\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'
Well, if you’re willing/ready to switch to Python 3 (which you may not be due to the backwards incompatibility with some Python 2 code), you don’t have to do any converting; all text in Python 3 is represented with Unicode strings, which also means that there’s no more usage of the u'<text>' syntax. You also have what are, in effect, strings of bytes, which are used to represent data (which may be an encoded string).
python3
>>>print("no me llama mucho la atenci\u00f3n")
输出正确:
output: no me llama mucho la atención
但是使用脚本加载此字符串变量无法正常工作。
这是对我的案例起作用的,以防万一:
string_to_convert ="no me llama mucho la atenci\u00f3n"print(json.dumps(json.loads(r'"%s"'% string_to_convert), ensure_ascii=False))
output: no me llama mucho la atención
No answere worked for my case, where I had a string variable containing unicode chars, and no encode-decode explained here did the work.
If I do in a Terminal
echo "no me llama mucho la atenci\u00f3n"
or
python3
>>> print("no me llama mucho la atenci\u00f3n")
The output is correct:
output: no me llama mucho la atención
But working with scripts loading this string variable didn’t work.
This is what worked on my case, in case helps anybody:
string_to_convert = "no me llama mucho la atenci\u00f3n"
print(json.dumps(json.loads(r'"%s"' % string_to_convert), ensure_ascii=False))
output: no me llama mucho la atención
val is_float(val)Note--------------------------------------------------------------""FalseBlank string
"127"TruePassed string
TrueTruePure sweet Truth"True"FalseVile contemptible lie
FalseTrueSo false it becomes true
"123.456"TrueDecimal" -127 "TrueSpaces trimmed
"\t\n12\r\n"True whitespace ignored
"NaN"TrueNot a number
"NaNanananaBATMAN"False I am Batman"-iNF"TrueNegative infinity
"123.E4"TrueExponential notation
".1"True mantissa only
"1,234"FalseCommas gtfo
u'\x30'TrueUnicodeis fine."NULL"FalseNullisnot special
0x3fadeTrueHexadecimal"6e7777777777777"TrueShrunk to infinity
"1.797693e+308"TrueThisis max value
"infinity"TrueSameas inf
"infinityandBEYOND"FalseExtra characters wreck it
"12.34.56"FalseOnly one dot allowed
u'四'FalseJapanese'4'isnot a float."#56"FalsePound sign
"56%"FalsePercent of what?"0E0"TrueExponential, move dot 0 places
0**0True0___0 Exponentiation"-5e-5"TrueRaise to a negative number
"+1e1"TruePlusis OK with exponent
"+1e1^5"FalseFancy exponent not interpreted
"+1e1.3"FalseNo decimals in exponent
"-+1"FalseMake up your mind
"(1)"FalseParenthesisis bad
A longer and more accurate name for this function could be: is_convertible_to_float(value)
What is, and is not a float in Python may surprise you:
val is_float(val) Note
-------------------- ---------- --------------------------------
"" False Blank string
"127" True Passed string
True True Pure sweet Truth
"True" False Vile contemptible lie
False True So false it becomes true
"123.456" True Decimal
" -127 " True Spaces trimmed
"\t\n12\r\n" True whitespace ignored
"NaN" True Not a number
"NaNanananaBATMAN" False I am Batman
"-iNF" True Negative infinity
"123.E4" True Exponential notation
".1" True mantissa only
"1,234" False Commas gtfo
u'\x30' True Unicode is fine.
"NULL" False Null is not special
0x3fade True Hexadecimal
"6e7777777777777" True Shrunk to infinity
"1.797693e+308" True This is max value
"infinity" True Same as inf
"infinityandBEYOND" False Extra characters wreck it
"12.34.56" False Only one dot allowed
u'四' False Japanese '4' is not a float.
"#56" False Pound sign
"56%" False Percent of what?
"0E0" True Exponential, move dot 0 places
0**0 True 0___0 Exponentiation
"-5e-5" True Raise to a negative number
"+1e1" True Plus is OK with exponent
"+1e1^5" False Fancy exponent not interpreted
"+1e1.3" False No decimals in exponent
"-+1" False Make up your mind
"(1)" False Parenthesis is bad
You think you know what numbers are? You are not so good as you think! Not big surprise.
Don’t use this code on life-critical software!
Catching broad exceptions this way, killing canaries and gobbling the exception creates a tiny chance that a valid float as string will return false. The float(...) line of code can failed for any of a thousand reasons that have nothing to do with the contents of the string. But if you’re writing life-critical software in a duck-typing prototype language like Python, then you’ve got much larger problems.
You should consider the possibility of commas in the string representation of a number, for cases like float("545,545.2222") which throws an exception. Instead, use methods in locale to convert the strings to numbers and interpret commas correctly. The locale.atof method converts to a float in one step once the locale has been set for the desired number convention.
Example 1 — United States number conventions
In the United States and the UK, commas can be used as a thousands separator. In this example with American locale, the comma is handled properly as a separator:
In the majority of countries of the world, commas are used for decimal marks instead of periods. In this example with French locale, the comma is correctly handled as a decimal mark:
If you aren’t averse to third-party modules, you could check out the fastnumbers module. It provides a function called fast_real that does exactly what this question is asking for and does it faster than a pure-Python implementation:
>>> from fastnumbers import fast_real
>>> fast_real("545.2222")
545.2222
>>> type(fast_real("545.2222"))
float
>>> fast_real("31")
31
>>> type(fast_real("31"))
int
Users codelogic and harley are correct, but keep in mind if you know the string is an integer (for example, 545) you can call int(“545”) without first casting to float.
If your strings are in a list, you could use the map function as well.
>>>0b10101# binary flags21>>>0o755# read, write, execute perms for owner, read & ex for group & others493>>>0xffffff# the color, white, max values for red, green, and blue16777215
使模棱两可的Python 2八进制与Python 3兼容
如果您在Python 2中看到一个以0开头的整数,则这是(不建议使用的)八进制语法。
>>>03731
这很糟糕,因为看起来值应该是37。因此,在Python 3中,它现在引发了SyntaxError:
>>>037File"<stdin>", line 1037^SyntaxError: invalid token
In Python, how can I parse a numeric string like “545.2222” to its corresponding float value, 542.2222? Or parse the string “31” to an integer, 31?
I just want to know how to parse a float string to a float, and (separately) an int string to an int.
It’s good that you ask to do these separately. If you’re mixing them, you may be setting yourself up for problems later. The simple answer is:
"545.2222" to float:
>>> float("545.2222")
545.2222
"31" to an integer:
>>> int("31")
31
Other conversions, ints to and from strings and literals:
Conversions from various bases, and you should know the base in advance (10 is the default). Note you can prefix them with what Python expects for its literals (see below) or remove the prefix:
Non-Decimal (i.e. Integer) Literals from other Bases
If your motivation is to have your own code clearly represent hard-coded specific values, however, you may not need to convert from the bases – you can let Python do it for you automatically with the correct syntax.
You can use the apropos prefixes to get automatic conversion to integers with the following literals. These are valid for Python 2 and 3:
Binary, prefix 0b
>>> 0b11111
31
Octal, prefix 0o
>>> 0o37
31
Hexadecimal, prefix 0x
>>> 0x1f
31
This can be useful when describing binary flags, file permissions in code, or hex values for colors – for example, note no quotes:
>>> 0b10101 # binary flags
21
>>> 0o755 # read, write, execute perms for owner, read & ex for group & others
493
>>> 0xffffff # the color, white, max values for red, green, and blue
16777215
Making ambiguous Python 2 octals compatible with Python 3
If you see an integer that starts with a 0, in Python 2, this is (deprecated) octal syntax.
>>> 037
31
It is bad because it looks like the value should be 37. So in Python 3, it now raises a SyntaxError:
The question seems a little bit old. But let me suggest a function, parseStr, which makes something similar, that is, returns integer or float and if a given ASCII string cannot be converted to none of them it returns it untouched. The code of course might be adjusted to do only what you want:
>>> import string
>>> parseStr = lambda x: x.isalpha() and x or x.isdigit() and \
... int(x) or x.isalnum() and x or \
... len(set(string.punctuation).intersection(x)) == 1 and \
... x.count('.') == 1 and float(x) or x
>>> parseStr('123')
123
>>> parseStr('123.3')
123.3
>>> parseStr('3HC1')
'3HC1'
>>> parseStr('12.e5')
1200000.0
>>> parseStr('12$5')
'12$5'
>>> parseStr('12.2.2')
'12.2.2'
>>>import yaml
>>> a ="545.2222">>> result = yaml.load(a)>>> result
545.22220000000004>>> type(result)<type 'float'>>>> b ="31">>> result = yaml.load(b)>>> result
31>>> type(result)<type 'int'>>>> c ="HI">>> result = yaml.load(c)>>> result
'HI'>>> type(result)<type 'str'>
def num(s):"""num(s)
num(3),num(3.7)-->3
num('3')-->3, num('3.7')-->3.7
num('3,700')-->ValueError
num('3a'),num('a3'),-->ValueError
num('3e4') --> 30000.0
"""try:return int(s)exceptValueError:try:return float(s)exceptValueError:raiseValueError('argument is not a string of number')
import re
def parseNumber(value, as_int=False):try:
number = float(re.sub('[^.\-\d]','', value))if as_int:return int(number +0.5)else:return number
exceptValueError:return float('nan')# or None if you wish
I am surprised nobody mentioned regex because sometimes string must be prepared and normalized before casting to number
import re
def parseNumber(value, as_int=False):
try:
number = float(re.sub('[^.\-\d]', '', value))
if as_int:
return int(number + 0.5)
else:
return number
except ValueError:
return float('nan') # or None if you wish
To typecast in python use the constructor funtions of the type, passing the string (or whatever value you are trying to cast) as a parameter.
For example:
>>>float("23.333")
23.333
Behind the scenes, python is calling the objects __float__ method, which should return a float representation of the parameter. This is especially powerful, as you can define your own types (using classes) with a __float__ method so that it can be casted into a float using float(myobject).
This will try to parse a string and return either int or float depending on what the string represents.
It might rise parsing exceptions or have some unexpected behaviour.
def num(s):
try:
for each in s:
yield int(each)
except ValueError:
yield float(each)
a = num(["123.55","345","44"])
print a.next()
print a.next()
This is the most Pythonic way I could come up with.
回答 21
处理十六进制,八进制,二进制,十进制和浮点数
该解决方案将处理数字的所有字符串约定(我所知道的全部)。
def to_number(n):''' Convert any number representation to a number
This covers: float, decimal, hex, and octal numbers.
'''try:return int(str(n),0)except:try:# python 3 doesn't accept "010" as a valid octal. You must use the# '0o' prefixreturn int('0o'+ n,0)except:return float(n)
This solution will handle all of the string conventions for numbers (all that I know about).
def to_number(n):
''' Convert any number representation to a number
This covers: float, decimal, hex, and octal numbers.
'''
try:
return int(str(n), 0)
except:
try:
# python 3 doesn't accept "010" as a valid octal. You must use the
# '0o' prefix
return int('0o' + n, 0)
except:
return float(n)
This test case output illustrates what I’m talking about.
def conv_to_num(x, num_type='asis'):'''Converts an object to a number if possible.
num_type: int, float, 'asis'
Defaults to floating point in case of ambiguity.
'''import numbers
is_num, is_str, is_other =[False]*3if isinstance(x, numbers.Number):
is_num =Trueelif isinstance(x, str):
is_str =True
is_other =not any([is_num, is_str])if is_num:
res = x
elif is_str:
is_float, is_int, is_char =[False]*3try:
res = float(x)if'.'in x:
is_float =Trueelse:
is_int =TrueexceptValueError:
res = x
is_char =Trueelse:if num_type =='asis':
funcs =[int, float]else:
funcs =[num_type]for func in funcs:try:
res = func(x)breakexceptTypeError:continueelse:
res = x
This is a function which will convert any object (not just str) to int or float, based on if the actual string supplied looks likeint or float. Further if it’s an object which has both __float and __int__ methods, it defaults to using __float__
def conv_to_num(x, num_type='asis'):
'''Converts an object to a number if possible.
num_type: int, float, 'asis'
Defaults to floating point in case of ambiguity.
'''
import numbers
is_num, is_str, is_other = [False]*3
if isinstance(x, numbers.Number):
is_num = True
elif isinstance(x, str):
is_str = True
is_other = not any([is_num, is_str])
if is_num:
res = x
elif is_str:
is_float, is_int, is_char = [False]*3
try:
res = float(x)
if '.' in x:
is_float = True
else:
is_int = True
except ValueError:
res = x
is_char = True
else:
if num_type == 'asis':
funcs = [int, float]
else:
funcs = [num_type]
for func in funcs:
try:
res = func(x)
break
except TypeError:
continue
else:
res = x
eval() is a very good solution to this question. It doesn’t need to check if the number is int or float, it just gives the corresponding equivalent.
If other methods are required, try
if '.' in string:
print(float(string))
else:
print(int(string))
try-except can also be used as an alternative. Try converting string to int inside the try block. If the string would be a float value, it will throw an error which will be catched in the except block, like this
Theoretically, there’s an injection vulnerability. The string could, for example be "import os; os.abort()". Without any background on where the string comes from, however, the possibility is theoretical speculation. Since the question is vague, it’s not at all clear if this vulnerability actually exists or not.
A regex or other string parsing method would be uglier and slower.
I’m not sure that anything much could be faster than the above. It calls the function and returns. Try/Catch doesn’t introduce much overhead because the most common exception is caught without an extensive search of stack frames.
The issue is that any numeric conversion function has two kinds of results
A number, if the number is valid
A status code (e.g., via errno) or exception to show that no valid number could be parsed.
C (as an example) hacks around this a number of ways. Python lays it out clearly and explicitly.
def is_number_tryexcept(s):""" Returns True is string is a number. """try:
float(s)returnTrueexceptValueError:returnFalseimport re
def is_number_regex(s):""" Returns True is string is a number. """if re.match("^\d+?\.\d+?$", s)isNone:return s.isdigit()returnTruedef is_number_repl_isdigit(s):""" Returns True is string is a number. """return s.replace('.','',1).isdigit()
funcs =[
is_number_tryexcept,
is_number_regex,
is_number_repl_isdigit
]
a_float ='.1234'print('Float notation ".1234" is not supported by:')for f in funcs:ifnot f(a_float):print('\t -', f.__name__)
浮点符号“ .1234”不受以下支持:
-is_number_regex
scientific1 ='1.000000e+50'
scientific2 ='1e50'print('Scientific notation "1.000000e+50" is not supported by:')for f in funcs:ifnot f(scientific1):print('\t -', f.__name__)print('Scientific notation "1e50" is not supported by:')for f in funcs:ifnot f(scientific2):print('\t -', f.__name__)
import timeit
test_cases =['1.12345','1.12.345','abc12345','12345']
times_n ={f.__name__:[]for f in funcs}for t in test_cases:for f in funcs:
f = f.__name__
times_n[f].append(min(timeit.Timer('%s(t)'%f,'from __main__ import %s, t'%f).repeat(repeat=3, number=1000000)))
测试以下功能的地方
from re import match as re_match
from re import compile as re_compile
def is_number_tryexcept(s):""" Returns True is string is a number. """try:
float(s)returnTrueexceptValueError:returnFalsedef is_number_regex(s):""" Returns True is string is a number. """if re_match("^\d+?\.\d+?$", s)isNone:return s.isdigit()returnTrue
comp = re_compile("^\d+?\.\d+?$")def compiled_regex(s):""" Returns True is string is a number. """if comp.match(s)isNone:return s.isdigit()returnTruedef is_number_repl_isdigit(s):""" Returns True is string is a number. """return s.replace('.','',1).isdigit()
TL;DR The best solution is s.replace('.','',1).isdigit()
I did some benchmarks comparing the different approaches
def is_number_tryexcept(s):
""" Returns True is string is a number. """
try:
float(s)
return True
except ValueError:
return False
import re
def is_number_regex(s):
""" Returns True is string is a number. """
if re.match("^\d+?\.\d+?$", s) is None:
return s.isdigit()
return True
def is_number_repl_isdigit(s):
""" Returns True is string is a number. """
return s.replace('.','',1).isdigit()
If the string is not a number, the except-block is quite slow. But more importantly, the try-except method is the only approach that handles scientific notations correctly.
funcs = [
is_number_tryexcept,
is_number_regex,
is_number_repl_isdigit
]
a_float = '.1234'
print('Float notation ".1234" is not supported by:')
for f in funcs:
if not f(a_float):
print('\t -', f.__name__)
Float notation “.1234” is not supported by:
– is_number_regex
scientific1 = '1.000000e+50'
scientific2 = '1e50'
print('Scientific notation "1.000000e+50" is not supported by:')
for f in funcs:
if not f(scientific1):
print('\t -', f.__name__)
print('Scientific notation "1e50" is not supported by:')
for f in funcs:
if not f(scientific2):
print('\t -', f.__name__)
Scientific notation “1.000000e+50” is not supported by:
– is_number_regex
– is_number_repl_isdigit
Scientific notation “1e50” is not supported by:
– is_number_regex
– is_number_repl_isdigit
EDIT: The benchmark results
import timeit
test_cases = ['1.12345', '1.12.345', 'abc12345', '12345']
times_n = {f.__name__:[] for f in funcs}
for t in test_cases:
for f in funcs:
f = f.__name__
times_n[f].append(min(timeit.Timer('%s(t)' %f,
'from __main__ import %s, t' %f)
.repeat(repeat=3, number=1000000)))
where the following functions were tested
from re import match as re_match
from re import compile as re_compile
def is_number_tryexcept(s):
""" Returns True is string is a number. """
try:
float(s)
return True
except ValueError:
return False
def is_number_regex(s):
""" Returns True is string is a number. """
if re_match("^\d+?\.\d+?$", s) is None:
return s.isdigit()
return True
comp = re_compile("^\d+?\.\d+?$")
def compiled_regex(s):
""" Returns True is string is a number. """
if comp.match(s) is None:
return s.isdigit()
return True
def is_number_repl_isdigit(s):
""" Returns True is string is a number. """
return s.replace('.','',1).isdigit()
There is one exception that you may want to take into account: the string ‘NaN’
If you want is_number to return FALSE for ‘NaN’ this code will not work as Python converts it to its representation of a number that is not a number (talk about identity issues):
>>> float('NaN')
nan
Otherwise, I should actually thank you for the piece of code I now use extensively. :)
which will return true only if there is one or no ‘.’ in the string of digits.
'3.14.5'.replace('.','',1).isdigit()
will return false
edit: just saw another comment …
adding a .replace(badstuff,'',maxnum_badstuff) for other cases can be done. if you are passing salt and not arbitrary condiments (ref:xkcd#974) this will do fine :P
It may take some getting used to, but this is the pythonic way of doing it. As has been already pointed out, the alternatives are worse. But there is one other advantage of doing things this way: polymorphism.
The central idea behind duck typing is that “if it walks and talks like a duck, then it’s a duck.” What if you decide that you need to subclass string so that you can change how you determine if something can be converted into a float? Or what if you decide to test some other object entirely? You can do these things without having to change the above code.
Other languages solve these problems by using interfaces. I’ll save the analysis of which solution is better for another thread. The point, though, is that python is decidedly on the duck typing side of the equation, and you’re probably going to have to get used to syntax like this if you plan on doing much programming in Python (but that doesn’t mean you have to like it of course).
One other thing you might want to take into consideration: Python is pretty fast in throwing and catching exceptions compared to a lot of other languages (30x faster than .Net for instance). Heck, the language itself even throws exceptions to communicate non-exceptional, normal program conditions (every time you use a for loop). Thus, I wouldn’t worry too much about the performance aspects of this code until you notice a significant problem.
回答 6
在Alfe指出您不需要单独检查float之后进行了更新,因为这两种情况都比较复杂:
def is_number(s):try:
complex(s)# for int, long, float and complexexceptValueError:returnFalsereturnTrue
先前曾说过:在极少数情况下,您可能还需要检查复数(例如1 + 2i),而复数不能用浮点数表示:
def is_number(s):try:
float(s)# for int, long and floatexceptValueError:try:
complex(s)# for complexexceptValueError:returnFalsereturnTrue
float.parse('giggity')// throws TypeException
float.parse('54.3')// returns the scalar value 54.3
float.tryParse('twank')// returns None
float.tryParse('32.2')// returns the scalar value 32.2
Note: You don’t want to return the boolean ‘False’ because that’s still a value type. None is better because it indicates failure. Of course, if you want something different you can change the fail parameter to whatever you want.
To extend float to include the ‘parse()’ and ‘try_parse()’ you’ll need to monkeypatch the ‘float’ class to add these methods.
If you want respect pre-existing functions the code should be something like:
For strings of non-numbers, try: except: is actually slower than regular expressions. For strings of valid numbers, regex is slower. So, the appropriate method depends on your input.
If you find that you are in a performance bind, you can use a new third-party module called fastnumbers that provides a function called isfloat. Full disclosure, I am the author. I have included its results in the timings below.
from __future__ import print_function
import timeit
prep_base = '''\
x = 'invalid'
y = '5402'
z = '4.754e3'
'''
prep_try_method = '''\
def is_number_try(val):
try:
float(val)
return True
except ValueError:
return False
'''
prep_re_method = '''\
import re
float_match = re.compile(r'[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?$').match
def is_number_re(val):
return bool(float_match(val))
'''
fn_method = '''\
from fastnumbers import isfloat
'''
print('Try with non-number strings', timeit.timeit('is_number_try(x)',
prep_base + prep_try_method), 'seconds')
print('Try with integer strings', timeit.timeit('is_number_try(y)',
prep_base + prep_try_method), 'seconds')
print('Try with float strings', timeit.timeit('is_number_try(z)',
prep_base + prep_try_method), 'seconds')
print()
print('Regex with non-number strings', timeit.timeit('is_number_re(x)',
prep_base + prep_re_method), 'seconds')
print('Regex with integer strings', timeit.timeit('is_number_re(y)',
prep_base + prep_re_method), 'seconds')
print('Regex with float strings', timeit.timeit('is_number_re(z)',
prep_base + prep_re_method), 'seconds')
print()
print('fastnumbers with non-number strings', timeit.timeit('isfloat(x)',
prep_base + 'from fastnumbers import isfloat'), 'seconds')
print('fastnumbers with integer strings', timeit.timeit('isfloat(y)',
prep_base + 'from fastnumbers import isfloat'), 'seconds')
print('fastnumbers with float strings', timeit.timeit('isfloat(z)',
prep_base + 'from fastnumbers import isfloat'), 'seconds')
print()
Try with non-number strings 2.39108395576 seconds
Try with integer strings 0.375686168671 seconds
Try with float strings 0.369210958481 seconds
Regex with non-number strings 0.748660802841 seconds
Regex with integer strings 1.02021503448 seconds
Regex with float strings 1.08564686775 seconds
fastnumbers with non-number strings 0.174362897873 seconds
fastnumbers with integer strings 0.179651021957 seconds
fastnumbers with float strings 0.20222902298 seconds
As you can see
try: except: was fast for numeric input but very slow for an invalid input
I know this is particularly old but I would add an answer I believe covers the information missing from the highest voted answer that could be very valuable to any who find this:
For each of the following methods connect them with a count if you need any input to be accepted. (Assuming we are using vocal definitions of integers rather than 0-255, etc.)
x.isdigit()
works well for checking if x is an integer.
x.replace('-','').isdigit()
works well for checking if x is a negative.(Check – in first position)
x.replace('.','').isdigit()
works well for checking if x is a decimal.
x.replace(':','').isdigit()
works well for checking if x is a ratio.
x.replace('/','',1).isdigit()
works well for checking if x is a fraction.
def is_number(n):try:
float(n)# Type-casting the string to `float`.# If string is not a valid `float`, # it'll raise `ValueError` exceptionexceptValueError:returnFalsereturnTrue
# `nan_num` variable is taken from above example>>> nan_num == nan_num
False
因此,上述功能is_number可以更新,返回False的"NaN"是:
def is_number(n):
is_number =Truetry:
num = float(n)# check for "nan" floats
is_number = num == num # or use `math.isnan(num)`exceptValueError:
is_number =Falsereturn is_number
样品运行:
>>> is_number('Nan')# not a number "Nan" stringFalse>>> is_number('nan')# not a number string "nan" with all lower casedFalse>>> is_number('123')# positive integerTrue>>> is_number('-123')# negative integerTrue>>> is_number('-1.12')# negative `float`True>>> is_number('abc')# "some random" stringFalse
This answer provides step by step guide having function with examples to find the string is:
Positive integer
Positive/negative – integer/float
How to discard “NaN” (not a number) strings while checking for number?
Check if string is positive integer
You may use str.isdigit() to check whether given string is positive integer.
Sample Results:
# For digit
>>> '1'.isdigit()
True
>>> '1'.isalpha()
False
Check for string as positive/negative – integer/float
str.isdigit() returns False if the string is a negative number or a float number. For example:
# returns `False` for float
>>> '123.3'.isdigit()
False
# returns `False` for negative number
>>> '-123'.isdigit()
False
If you want to also check for the negative integers and float, then you may write a custom function to check for it as:
def is_number(n):
try:
float(n) # Type-casting the string to `float`.
# If string is not a valid `float`,
# it'll raise `ValueError` exception
except ValueError:
return False
return True
Sample Run:
>>> is_number('123') # positive integer number
True
>>> is_number('123.4') # positive float number
True
>>> is_number('-123') # negative integer number
True
>>> is_number('-123.4') # negative `float` number
True
>>> is_number('abc') # `False` for "some random" string
False
Discard “NaN” (not a number) strings while checking for number
The above functions will return True for the “NAN” (Not a number) string because for Python it is valid float representing it is not a number. For example:
>>> is_number('NaN')
True
In order to check whether the number is “NaN”, you may use math.isnan() as:
>>> import math
>>> nan_num = float('nan')
>>> math.isnan(nan_num)
True
Or if you don’t want to import additional library to check this, then you may simply check it via comparing it with itself using ==. Python returns False when nan float is compared with itself. For example:
# `nan_num` variable is taken from above example
>>> nan_num == nan_num
False
Hence, above function is_number can be updated to return False for "NaN" as:
def is_number(n):
is_number = True
try:
num = float(n)
# check for "nan" floats
is_number = num == num # or use `math.isnan(num)`
except ValueError:
is_number = False
return is_number
Sample Run:
>>> is_number('Nan') # not a number "Nan" string
False
>>> is_number('nan') # not a number string "nan" with all lower cased
False
>>> is_number('123') # positive integer
True
>>> is_number('-123') # negative integer
True
>>> is_number('-1.12') # negative `float`
True
>>> is_number('abc') # "some random" string
False
PS: Each operation for each check depending on the type of number comes with additional overhead. Choose the version of is_number function which fits your requirement.
Casting to float and catching ValueError is probably the fastest way, since float() is specifically meant for just that. Anything else that requires string parsing (regex, etc) will likely be slower due to the fact that it’s not tuned for this operation. My $0.02.
回答 13
您可以使用Unicode字符串,它们有一种方法可以执行您想要的操作:
>>> s = u"345">>> s.isnumeric()True
要么:
>>> s ="345">>> u = unicode(s)>>> u.isnumeric()True
import time, re, random, string
ITERATIONS =10000000classTimer:def __enter__(self):
self.start = time.clock()return self
def __exit__(self,*args):
self.end = time.clock()
self.interval = self.end - self.start
def check_regexp(x):return re.compile("^\d*\.?\d*$").match(x)isnotNonedef check_replace(x):return x.replace('.','',1).isdigit()def check_exception(s):try:
float(s)returnTrueexceptValueError:returnFalse
to_check =[check_regexp, check_replace, check_exception]print('preparing data...')
good_numbers =[
str(random.random()/ random.random())for x in range(ITERATIONS)]
bad_numbers =['.'+ x for x in good_numbers]
strings =[''.join(random.choice(string.ascii_uppercase + string.digits)for _ in range(random.randint(1,10)))for x in range(ITERATIONS)]print('running test...')for func in to_check:withTimer()as t:for x in good_numbers:
res = func(x)print('%s with good floats: %s'%(func.__name__, t.interval))withTimer()as t:for x in bad_numbers:
res = func(x)print('%s with bad floats: %s'%(func.__name__, t.interval))withTimer()as t:for x in strings:
res = func(x)print('%s with strings: %s'%(func.__name__, t.interval))
以下是2017年MacBook Pro 13上Python 2.7.10的结果:
check_regexp with good floats:12.688639
check_regexp with bad floats:11.624862
check_regexp with strings:11.349414
check_replace with good floats:4.419841
check_replace with bad floats:4.294909
check_replace with strings:4.086358
check_exception with good floats:3.276668
check_exception with bad floats:13.843092
check_exception with strings:15.786169
以下是2017年MacBook Pro 13上Python 3.6.5的结果:
check_regexp with good floats:13.472906000000009
check_regexp with bad floats:12.977665000000016
check_regexp with strings:12.417542999999995
check_replace with good floats:6.011045999999993
check_replace with bad floats:4.849356
check_replace with strings:4.282754000000011
check_exception with good floats:6.039081999999979
check_exception with bad floats:9.322753000000006
check_exception with strings:9.952595000000002
以下是2017年MacBook Pro 13上PyPy 2.7.13的结果:
check_regexp with good floats:2.693217
check_regexp with bad floats:2.744819
check_regexp with strings:2.532414
check_replace with good floats:0.604367
check_replace with bad floats:0.538169
check_replace with strings:0.598664
check_exception with good floats:1.944103
check_exception with bad floats:2.449182
check_exception with strings:2.200056
I wanted to see which method is fastest. Overall the best and most consistent results were given by the check_replace function. The fastest results were given by the check_exception function, but only if there was no exception fired – meaning its code is the most efficient, but the overhead of throwing an exception is quite large.
Please note that checking for a successful cast is the only method which is accurate, for example, this works with check_exception but the other two test functions will return False for a valid float:
huge_number = float('1e+100')
Here is the benchmark code:
import time, re, random, string
ITERATIONS = 10000000
class Timer:
def __enter__(self):
self.start = time.clock()
return self
def __exit__(self, *args):
self.end = time.clock()
self.interval = self.end - self.start
def check_regexp(x):
return re.compile("^\d*\.?\d*$").match(x) is not None
def check_replace(x):
return x.replace('.','',1).isdigit()
def check_exception(s):
try:
float(s)
return True
except ValueError:
return False
to_check = [check_regexp, check_replace, check_exception]
print('preparing data...')
good_numbers = [
str(random.random() / random.random())
for x in range(ITERATIONS)]
bad_numbers = ['.' + x for x in good_numbers]
strings = [
''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(random.randint(1,10)))
for x in range(ITERATIONS)]
print('running test...')
for func in to_check:
with Timer() as t:
for x in good_numbers:
res = func(x)
print('%s with good floats: %s' % (func.__name__, t.interval))
with Timer() as t:
for x in bad_numbers:
res = func(x)
print('%s with bad floats: %s' % (func.__name__, t.interval))
with Timer() as t:
for x in strings:
res = func(x)
print('%s with strings: %s' % (func.__name__, t.interval))
Here are the results with Python 2.7.10 on a 2017 MacBook Pro 13:
check_regexp with good floats: 12.688639
check_regexp with bad floats: 11.624862
check_regexp with strings: 11.349414
check_replace with good floats: 4.419841
check_replace with bad floats: 4.294909
check_replace with strings: 4.086358
check_exception with good floats: 3.276668
check_exception with bad floats: 13.843092
check_exception with strings: 15.786169
Here are the results with Python 3.6.5 on a 2017 MacBook Pro 13:
check_regexp with good floats: 13.472906000000009
check_regexp with bad floats: 12.977665000000016
check_regexp with strings: 12.417542999999995
check_replace with good floats: 6.011045999999993
check_replace with bad floats: 4.849356
check_replace with strings: 4.282754000000011
check_exception with good floats: 6.039081999999979
check_exception with bad floats: 9.322753000000006
check_exception with strings: 9.952595000000002
Here are the results with PyPy 2.7.13 on a 2017 MacBook Pro 13:
check_regexp with good floats: 2.693217
check_regexp with bad floats: 2.744819
check_regexp with strings: 2.532414
check_replace with good floats: 0.604367
check_replace with bad floats: 0.538169
check_replace with strings: 0.598664
check_exception with good floats: 1.944103
check_exception with bad floats: 2.449182
check_exception with strings: 2.200056
def is_number(s):try:
n=str(float(s))if n =="nan"or n=="inf"or n=="-inf":returnFalseexceptValueError:try:
complex(s)# for complexexceptValueError:returnFalsereturnTrue
>>> a=454>>> a.isdigit()Traceback(most recent call last):File"<stdin>", line 1,in<module>AttributeError:'int' object has no attribute 'isdigit'>>> a="454">>> a.isdigit()True
Finds whether the given variable is numeric. Numeric strings consist of optional sign, any number of digits, optional decimal part and optional exponential part. Thus +0123.45e6 is a valid numeric value. Hexadecimal (e.g. 0xf4c3b00c) and binary (e.g. 0b10100111001) notation is not allowed.
is_numeric function
import ast
import numbers
def is_numeric(obj):
if isinstance(obj, numbers.Number):
return True
elif isinstance(obj, str):
nodes = list(ast.walk(ast.parse(obj)))[1:]
if not isinstance(nodes[0], ast.Expr):
return False
if not isinstance(nodes[-1], ast.Num):
return False
nodes = nodes[1:-1]
for i in range(len(nodes)):
#if used + or - in digit :
if i % 2 == 0:
if not isinstance(nodes[i], ast.UnaryOp):
return False
else:
if not isinstance(nodes[i], (ast.USub, ast.UAdd)):
return False
return True
else:
return False
Finds whether the given variable is float. float strings consist of optional sign, any number of digits, …
import ast
def is_float(obj):
if isinstance(obj, float):
return True
if isinstance(obj, int):
return False
elif isinstance(obj, str):
nodes = list(ast.walk(ast.parse(obj)))[1:]
if not isinstance(nodes[0], ast.Expr):
return False
if not isinstance(nodes[-1], ast.Num):
return False
if not isinstance(nodes[-1].n, float):
return False
nodes = nodes[1:-1]
for i in range(len(nodes)):
if i % 2 == 0:
if not isinstance(nodes[i], ast.UnaryOp):
return False
else:
if not isinstance(nodes[i], (ast.USub, ast.UAdd)):
return False
return True
else:
return False
I did some speed test. Lets say that if the string is likely to be a number the try/except strategy is the fastest possible.If the string is not likely to be a number and you are interested in Integer check, it worths to do some test (isdigit plus heading ‘-‘).
If you are interested to check float number, you have to use the try/except code whitout escape.
def str_to_type (s):""" Get possible cast type for a string
Parameters
----------
s : string
Returns
-------
float,int,str,bool : type
Depending on what it can be cast to
"""try:
f = float(s)if"."notin s:return int
return float
exceptValueError:
value = s.upper()if value =="TRUE"or value =="FALSE":return bool
return type(s)
例
str_to_type("true")# bool
str_to_type("6.0")# float
str_to_type("6")# int
str_to_type("6abc")# str
str_to_type(u"6abc")# unicode
您可以捕获类型并使用它
s ="6.0"
type_ = str_to_type(s)# float
f = type_(s)
I needed to determine if a string cast into basic types (float,int,str,bool). After not finding anything on the internet I created this:
def str_to_type (s):
""" Get possible cast type for a string
Parameters
----------
s : string
Returns
-------
float,int,str,bool : type
Depending on what it can be cast to
"""
try:
f = float(s)
if "." not in s:
return int
return float
except ValueError:
value = s.upper()
if value == "TRUE" or value == "FALSE":
return bool
return type(s)
If you want to return False for a NaN and Inf, change line to x = float(s); return (x == x) and (x – 1 != x). This should return True for all floats except Inf and NaN
But this doesn’t quite work, because for sufficiently large floats, x-1 == x returns true. For example, 2.0**54 - 1 == 2.0**54
check_regexp with good floats:18.001921
check_regexp with bad floats:17.861423
check_regexp with strings:17.558862
check_correct_regexp with good floats:11.04428
check_correct_regexp with bad floats:8.71211
check_correct_regexp with strings:8.144161
check_replace with good floats:6.020597
check_replace with bad floats:5.343049
check_replace with strings:5.091642
check_exception with good floats:5.201605
check_exception with bad floats:23.921864
check_exception with strings:23.755481
I think your solution is fine, but there is a correct regexp implementation.
There does seem to be a lot of regexp hate towards these answers which I think is unjustified, regexps can be reasonably clean and correct and fast. It really depends on what you’re trying to do. The original question was how can you “check if a string can be represented as a number (float)” (as per your title). Presumably you would want to use the numeric/float value once you’ve checked that it’s valid, in which case your try/except makes a lot of sense. But if, for some reason, you just want to validate that a string is a number then a regex also works fine, but it’s hard to get correct. I think most of the regex answers so far, for example, do not properly parse strings without an integer part (such as “.7”) which is a float as far as python is concerned. And that’s slightly tricky to check for in a single regex where the fractional portion is not required. I’ve included two regex to show this.
It does raise the interesting question as to what a “number” is. Do you include “inf” which is valid as a float in python? Or do you include numbers that are “numbers” but maybe can’t be represented in python (such as numbers that are larger than the float max).
There’s also ambiguities in how you parse numbers. For example, what about “–20”? Is this a “number”? Is this a legal way to represent “20”? Python will let you do “var = –20” and set it to 20 (though really this is because it treats it as an expression), but float(“–20”) does not work.
Anyways, without more info, here’s a regex that I believe covers all the ints and floats as python parses them.
# Doesn't properly handle floats missing the integer part, such as ".7"
SIMPLE_FLOAT_REGEXP = re.compile(r'^[-+]?[0-9]+\.?[0-9]+([eE][-+]?[0-9]+)?$')
# Example "-12.34E+56" # sign (-)
# integer (12)
# mantissa (34)
# exponent (E+56)
# Should handle all floats
FLOAT_REGEXP = re.compile(r'^[-+]?([0-9]+|[0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?$')
# Example "-12.34E+56" # sign (-)
# integer (12)
# OR
# int/mantissa (12.34)
# exponent (E+56)
def is_float(str):
return True if FLOAT_REGEXP.match(str) else False
Running the benchmarking code in @ron-reiter’s answer shows that this regex is actually faster than the normal regex and is much faster at handling bad values than the exception, which makes some sense. Results:
check_regexp with good floats: 18.001921
check_regexp with bad floats: 17.861423
check_regexp with strings: 17.558862
check_correct_regexp with good floats: 11.04428
check_correct_regexp with bad floats: 8.71211
check_correct_regexp with strings: 8.144161
check_replace with good floats: 6.020597
check_replace with bad floats: 5.343049
check_replace with strings: 5.091642
check_exception with good floats: 5.201605
check_exception with bad floats: 23.921864
check_exception with strings: 23.755481
回答 21
import re
def is_number(num):
pattern = re.compile(r'^[-+]?[-0-9]\d*\.\d*|[-+]?\.?[0-9]\d*$')
result = pattern.match(num)if result:returnTrueelse:returnFalse>>>: is_number('1')True>>>: is_number('111')True>>>: is_number('11.1')True>>>: is_number('-11.1')True>>>: is_number('inf')False>>>: is_number('-inf')False
Replace the myvar.apppend with whatever operation you want to do with the string if it turns out to be a number. The idea is to try to use a float() operation and use the returned error to determine whether or not the string is a number.
I also used the function you mentioned, but soon I notice that strings as “Nan”, “Inf” and it’s variation are considered as number. So I propose you improved version of your function, that will return false on those type of input and will not fail “1e3” variants:
def is_float(text):
try:
float(text)
# check for nan/infinity etc.
if text.isalpha():
return False
return True
except ValueError:
return False
import sys
def fix_quotes(s):try:
float(s)return s
exceptValueError:return'"{0}"'.format(s)for line in sys.stdin:
input = line.split()print input[0],'<- c(',','.join(fix_quotes(c)for c in input[1:]),')'
You can generalize the exception technique in a useful way by returning more useful values than True and False. For example this function puts quotes round strings but leaves numbers alone. Which is just what I needed for a quick and dirty filter to make some variable definitions for R.
import sys
def fix_quotes(s):
try:
float(s)
return s
except ValueError:
return '"{0}"'.format(s)
for line in sys.stdin:
input = line.split()
print input[0], '<- c(', ','.join(fix_quotes(c) for c in input[1:]), ')'
I was working on a problem that led me to this thread, namely how to convert a collection of data to strings and numbers in the most intuitive way. I realized after reading the original code that what I needed was different in two ways:
1 – I wanted an integer result if the string represented an integer
2 – I wanted a number or a string result to stick into a data structure
so I adapted the original code to produce this derivative:
def string_or_number(s):
try:
z = int(s)
return z
except ValueError:
try:
z = float(s)
return z
except ValueError:
return s
回答 28
尝试这个。
def is_number(var):try:if var == int(var):returnTrueexceptException:returnFalse
def is_number(var):
try:
if var == int(var):
return True
except Exception:
return False
回答 29
def is_float(s):if s isNone:returnFalseif len(s)==0:returnFalse
digits_count =0
dots_count =0
signs_count =0for c in s:if'0'<= c <='9':
digits_count +=1elif c =='.':
dots_count +=1elif c =='-'or c =='+':
signs_count +=1else:returnFalseif digits_count ==0:returnFalseif dots_count >1:returnFalseif signs_count >1:returnFalsereturnTrue
def is_float(s):
if s is None:
return False
if len(s) == 0:
return False
digits_count = 0
dots_count = 0
signs_count = 0
for c in s:
if '0' <= c <= '9':
digits_count += 1
elif c == '.':
dots_count += 1
elif c == '-' or c == '+':
signs_count += 1
else:
return False
if digits_count == 0:
return False
if dots_count > 1:
return False
if signs_count > 1:
return False
return True