Tag archive: string

How to get a function name as a string?

Question: How to get a function name as a string?

In Python, how do I get a function's name as a string without calling the function?

def my_function():
    pass

print(get_function_name_as_string(my_function))  # my_function is not in quotes

should output "my_function".

Is such a function available in Python? If not, any ideas on how to implement get_function_name_as_string in Python?


Answer 0

my_function.__name__

Using __name__ is the preferred method as it applies uniformly. Unlike func_name (a Python 2-only attribute), it works on built-in functions as well:

>>> import time
>>> time.time.func_name
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'builtin_function_or_method' object has no attribute 'func_name'
>>> time.time.__name__
'time'

Also, the double underscores indicate to the reader that this is a special attribute. As a bonus, classes and modules have a __name__ attribute too, so you only have to remember one special name.
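
As a quick check of the claim that __name__ applies uniformly, here is a minimal Python 3 sketch. The helper get_function_name_as_string is the hypothetical name from the question, not something Python ships with, and MyClass is made up for illustration:

```python
import time


def get_function_name_as_string(obj):
    """Return the name of a function, class, or module without calling it."""
    return obj.__name__


def my_function():
    pass


class MyClass:
    pass


print(get_function_name_as_string(my_function))  # my_function
print(get_function_name_as_string(time.time))    # time (a built-in function)
print(get_function_name_as_string(MyClass))      # MyClass
print(get_function_name_as_string(time))         # time (the module)
```

Note that the name comes from the definition, not from the variable you happen to pass in, so an alias such as f = my_function still reports my_function.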


Answer 1

To get the current function’s or method’s name from inside it, consider:

import inspect

this_function_name = inspect.currentframe().f_code.co_name

sys._getframe also works instead of inspect.currentframe, although the latter avoids accessing a private function.

To get the calling function’s name instead, consider f_back, as in inspect.currentframe().f_back.f_code.co_name.


If you also use mypy, it may complain:

error: Item "None" of "Optional[FrameType]" has no attribute "f_code"

To suppress the above error, consider:

import inspect
import types
from typing import cast

this_function_name = cast(types.FrameType, inspect.currentframe()).f_code.co_name
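
An alternative to cast, if you would rather handle the None case than hide it from the type checker, is an explicit check. This is only a sketch; current_function_name is a hypothetical helper name, and it assumes an implementation such as CPython where frames are available:

```python
import inspect
from typing import Optional


def current_function_name() -> Optional[str]:
    """Name of the function that called this helper, or None if frames are unavailable."""
    frame = inspect.currentframe()
    if frame is None or frame.f_back is None:
        return None
    # f_back is the frame of whoever called current_function_name()
    return frame.f_back.f_code.co_name


def my_function():
    return current_function_name()


print(my_function())  # my_function (on CPython)
```

Because the helper adds one frame of its own, it reads f_back rather than the current frame.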

Answer 2

my_function.func_name

There are also other fun properties of functions. Type dir(my_function) to list them. my_function.func_code.co_code is the compiled function, stored as a string. (These are the Python 2 spellings; in Python 3 the attributes are my_function.__name__ and my_function.__code__.)

import dis
dis.dis(my_function)

will display the code in an almost human-readable format. :)


Answer 3

This function will return the caller’s function name.

def func_name():
    import traceback
    return traceback.extract_stack(None, 2)[0][2]

It is like Albert Vonpupp’s answer with a friendly wrapper.


Answer 4

If you’re interested in class methods too, Python 3.3+ has __qualname__ in addition to __name__.

def my_function():
    pass

class MyClass(object):
    def method(self):
        pass

print(my_function.__name__)         # gives "my_function"
print(MyClass.method.__name__)      # gives "method"

print(my_function.__qualname__)     # gives "my_function"
print(MyClass.method.__qualname__)  # gives "MyClass.method"
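
The difference between the two attributes shows up most clearly with nested functions. A small sketch (outer and inner are made-up names):

```python
def outer():
    def inner():
        pass
    return inner


class MyClass:
    def method(self):
        pass


inner = outer()
print(inner.__name__)                 # inner
print(inner.__qualname__)             # outer.<locals>.inner
print(MyClass().method.__qualname__)  # MyClass.method (bound methods report it too)
```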

Answer 5

I like using a function decorator. I added a class that also times the function. Assume gLog is a standard Python logger:

import datetime

class EnterExitLog():
    def __init__(self, funcName):
        self.funcName = funcName

    def __enter__(self):
        gLog.debug('Started: %s' % self.funcName)
        self.init_time = datetime.datetime.now()
        return self

    def __exit__(self, type, value, tb):
        # total_seconds() turns the timedelta into a plain number of seconds
        elapsed = (datetime.datetime.now() - self.init_time).total_seconds()
        gLog.debug('Finished: %s in: %s seconds' % (self.funcName, elapsed))

def func_timer_decorator(func):
    def func_wrapper(*args, **kwargs):
        with EnterExitLog(func.__name__):
            return func(*args, **kwargs)

    return func_wrapper

So now all you have to do with your function is decorate it, and voilà:

@func_timer_decorator
def my_func():
    pass
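
One thing to watch with decorators in the context of this question: a plain wrapper replaces the decorated function's __name__ with the wrapper's own. A minimal sketch of the effect and of the usual remedy, functools.wraps (plain_decorator, wrapping_decorator, first and second are illustrative names only):

```python
import functools


def plain_decorator(func):
    def func_wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return func_wrapper


def wrapping_decorator(func):
    @functools.wraps(func)  # copies __name__, __qualname__, __doc__, ... onto the wrapper
    def func_wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return func_wrapper


@plain_decorator
def first():
    pass


@wrapping_decorator
def second():
    pass


print(first.__name__)   # func_wrapper
print(second.__name__)  # second
```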

Answer 6

sys._getframe() is not guaranteed to be available in all implementations of Python (see ref); you can use the traceback module to do the same thing, e.g.:

import traceback
def who_am_i():
   stack = traceback.extract_stack()
   filename, codeline, funcName, text = stack[-2]

   return funcName

stack[-2] is the entry for the caller of who_am_i; stack[-1] would instead describe the current function, who_am_i itself.


Answer 7

import inspect

def foo():
   print(inspect.stack()[0][3])

where

  • stack()[0] is the frame record for the current function (stack()[1] would be its caller)

  • stack()[0][3] is the name of that function as a string
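
In Python 3.5 and later the frame records are named tuples, so the index 3 can be spelled .function. A small sketch contrasting the current frame with the caller's frame (current_name and caller_name are hypothetical helper names):

```python
import inspect


def current_name():
    # stack()[0] is this function's own frame record
    return inspect.stack()[0].function


def caller_name():
    # stack()[1] is the frame record of whoever called caller_name()
    return inspect.stack()[1].function


def foo():
    return caller_name()


print(current_name())  # current_name
print(foo())           # foo
```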


Answer 8

As an extension of @Demyn’s answer, I created some utility functions which print the current function’s name and current function’s arguments:

import inspect
import logging
import traceback

def get_function_name():
    return traceback.extract_stack(None, 2)[0][2]

def get_function_parameters_and_values():
    frame = inspect.currentframe().f_back
    args, _, _, values = inspect.getargvalues(frame)
    return ([(i, values[i]) for i in args])

def my_func(a, b, c=None):
    logging.info('Running ' + get_function_name() + '(' + str(get_function_parameters_and_values()) +')')
    pass

logger = logging.getLogger()
handler = logging.StreamHandler()
formatter = logging.Formatter(
    '%(asctime)s [%(levelname)s] -> %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

my_func(1, 3) # 2016-03-25 17:16:06,927 [INFO] -> Running my_func([('a', 1), ('b', 3), ('c', None)])

Answer 9

If you just want the name of a function, here is some simple code for that. Let's say you have these functions defined:

def function1():
    print("function1")

def function2():
    print("function2")

def function3():
    print("function3")

print(function1.__name__)

The output will be function1.

Now let's say you have these functions in a list:

a = [function1, function2, function3]

To get the names of the functions:

for i in a:
    print(i.__name__)

The output will be:

function1
function2
function3


Answer 10

I’ve seen a few answers that used decorators, though I felt a few were a bit verbose. Here’s something I use for logging function names as well as their respective input and output values. I’ve adapted it here to just print the info rather than create a log file, and to apply to the OP’s specific example.

def debug(func=None):
    def wrapper(*args, **kwargs):
        try:
            # bound methods expose the underlying function as __func__
            function_name = func.__func__.__qualname__
        except AttributeError:
            function_name = func.__qualname__
        return func(*args, **kwargs, function_name=function_name)
    return wrapper

@debug
def my_function(**kwargs):
    print(kwargs)

my_function()

Output:

{'function_name': 'my_function'}

How to convert a string to uppercase

Question: How to convert a string to uppercase

I have a problem changing a string to uppercase in Python. In my research, I found string.ascii_uppercase, but it doesn’t work.

The following code:

 >>> s = 'sdsd'
 >>> s.ascii_uppercase

gives this error message:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'str' object has no attribute 'ascii_uppercase'

My question is: how can I convert a string to uppercase in Python?


Answer 0

>>> s = 'sdsd'
>>> s.upper()
'SDSD'

See String Methods.


Answer 1

To get the uppercase version of a string, you can use str.upper:

s = 'sdsd'
s.upper()
#=> 'SDSD'

On the other hand string.ascii_uppercase is a string containing all ASCII letters in upper case:

import string
string.ascii_uppercase
#=> 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
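
To make the distinction concrete, here is a short Python 3 sketch. The 'straße' example is my own addition, to show that upper() follows Unicode case rules rather than only ASCII ones:

```python
import string

s = 'sdsd'
print(s.upper())                   # SDSD
print(s)                           # sdsd (strings are immutable; upper() returns a new string)
print('straße'.upper())            # STRASSE (the result can be longer than the input)
print(string.ascii_uppercase[:3])  # ABC (a constant in the string module, not a method)
```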

Answer 2

To make the string uppercase, simply type

s.upper()

Simple and easy! You can do the same to make it lowercase:

s.lower()

etc.


Answer 3

s = 'sdsd'
print(s.upper())
upper = input('type in something lowercase.')
lower = input('type in the same thing caps lock.')
print(upper.upper())
print(lower.lower())

Answer 4

To convert a lowercase string to uppercase, just use

"string".upper()

where "string" is the string you want to convert to uppercase.

For this question, it would look like this:

s.upper()

To convert an uppercase string to lowercase, just use

"string".lower()

where "string" is the string you want to convert to lowercase.

For this question, it would look like this:

s.lower()

If you want to update the string variable itself, reassign it:

s = "sadf"
# sadf

s = s.upper()
# SADF

Answer 5

For questions on simple string manipulation the dir built-in function comes in handy. It gives you, among others, a list of methods of the argument, e.g., dir(s) returns a list containing upper.
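
For instance, a rough way to narrow dir(s) down to the case-related methods (the filter condition and the name case_methods are just my illustration):

```python
s = 'sdsd'

# Public attribute names of the string that mention "upper" or "lower"
case_methods = [name for name in dir(s)
                if not name.startswith('_') and ('upper' in name or 'lower' in name)]
print(case_methods)
```

From there, help(str.upper) prints the documentation for any method that looks promising.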


Remove empty strings from a list of strings

Question: Remove empty strings from a list of strings

I want to remove all empty strings from a list of strings in Python.

My idea looks like this:

while '' in str_list:
    str_list.remove('')

Is there any more pythonic way to do this?


Answer 0

I would use filter (any one of the following):

str_list = filter(None, str_list)
str_list = filter(bool, str_list)
str_list = filter(len, str_list)
str_list = filter(lambda item: item, str_list)

In Python 3, filter returns an iterator, so the call should be wrapped in list():

str_list = list(filter(None, str_list))
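
A short Python 3 sketch of why the list() call matters (the sample data is made up):

```python
str_list = ['a', '', 'b', '', 'c']

lazy = filter(None, str_list)   # a filter object, not a list
print(isinstance(lazy, list))   # False
cleaned = list(lazy)            # consume the iterator once
print(cleaned)                  # ['a', 'b', 'c']
print(list(lazy))               # [] (the iterator is already exhausted)
```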

Answer 1

Using a list comprehension is the most Pythonic way:

>>> strings = ["first", "", "second"]
>>> [x for x in strings if x]
['first', 'second']

If the list must be modified in-place, because there are other references which must see the updated data, then use a slice assignment:

strings[:] = [x for x in strings if x]
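
To see the difference between slice assignment and plain assignment, consider this small sketch (alias, other and other_alias are illustrative names):

```python
strings = ["first", "", "second"]
alias = strings                          # second reference to the same list

strings[:] = [x for x in strings if x]   # slice assignment mutates in place
print(alias)                             # ['first', 'second']
print(alias is strings)                  # True

other = ["first", "", "second"]
other_alias = other
other = [x for x in other if x]          # plain assignment rebinds the name
print(other_alias)                       # ['first', '', 'second']
```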

Answer 2

filter actually has a special option for this:

filter(None, sequence)

It will filter out all elements that evaluate to False. No need to use an actual callable here such as bool, len and so on.

It’s equally fast as map(bool, …)


Answer 3

>>> lstr = ['hello', '', ' ', 'world', ' ']
>>> lstr
['hello', '', ' ', 'world', ' ']

>>> ' '.join(lstr).split()
['hello', 'world']

>>> filter(None, lstr)
['hello', ' ', 'world', ' ']

Compare time

>>> from timeit import timeit
>>> timeit('" ".join(lstr).split()', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
4.226747989654541
>>> timeit('filter(None, lstr)', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
3.0278358459472656

Notice that filter(None, lstr) does not remove empty strings with a space ' ', it only prunes away '' while ' '.join(lstr).split() removes both.

To use filter() with white space strings removed, it takes a lot more time:

>>> timeit('filter(None, [l.replace(" ", "") for l in lstr])', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
18.101892948150635

Answer 4

The reply from @Ib33X is awesome. If you also want to remove strings that are empty after stripping, you need to use the strip method too. Otherwise, whitespace-only strings such as " " are kept, because they count as non-empty for that answer. This can be achieved with:

strings = ["first", "", "second ", " "]
[x.strip() for x in strings if x.strip()]

The result will be ["first", "second"].
If you want to use filter instead, you can write
list(filter(lambda item: item.strip(), strings)). This keeps the same items but does not strip them, so the result is ["first", "second "].


Answer 5

Instead of if x, I would use if x != '' in order to eliminate only empty strings. Like this:

str_list = [x for x in str_list if x != '']

This will preserve None values within your list. Also, in case your list contains integers and 0 is one of them, it will be preserved too.

For example,

str_list = [None, '', 0, "Hi", '', "Hello"]
[x for x in str_list if x != '']
# [None, 0, 'Hi', 'Hello']
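
Side by side, the two conditions behave like this (truthy_only and non_empty_strings are names I made up for the comparison):

```python
str_list = [None, '', 0, "Hi", '', "Hello"]

truthy_only = [x for x in str_list if x]              # drops every falsy value
non_empty_strings = [x for x in str_list if x != '']  # drops only ''

print(truthy_only)        # ['Hi', 'Hello']
print(non_empty_strings)  # [None, 0, 'Hi', 'Hello']
```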

Answer 6

Depending on the size of your list, it may be most efficient if you use list.remove() rather than create a new list:

l = ["1", "", "3", ""]

while True:
  try:
    l.remove("")
  except ValueError:
    break

This has the advantage of not creating a new list, but the disadvantage of having to search from the beginning each time, although unlike using while '' in l as proposed above, it only requires searching once per occurrence of '' (there is certainly a way to keep the best of both methods, but it is more complicated).
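
One possible reading of "the best of both methods" is a single pass that compacts the kept items toward the front and then truncates the list. This is only a sketch of that idea, not necessarily what the answer had in mind; remove_empty_in_place is a hypothetical name:

```python
def remove_empty_in_place(items):
    """Remove '' from items in a single pass without building a second list."""
    write = 0
    for item in items:           # the read position visits every element once
        if item != '':
            items[write] = item  # compact kept elements toward the front
            write += 1
    del items[write:]            # drop the leftover tail


l = ["1", "", "3", ""]
remove_empty_in_place(l)
print(l)  # ['1', '3']
```

The write index never runs ahead of the read position, so no element is overwritten before it has been examined.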


Answer 7

Keep in mind that if you want to keep the white space within a string, some approaches may remove it unintentionally. If you have this list

['hello world', ' ', '', 'hello'], what you may want is ['hello world', 'hello'].

First strip each item so that any whitespace-only string becomes an empty string:

space_to_empty = [x.strip() for x in _text_list]

Then remove the empty strings from the list:

space_clean_list = [x for x in space_to_empty if x]

Answer 8

Use filter:

newlist = filter(lambda x: len(x) > 0, oldlist)

The drawback of using filter, as pointed out, is that it is slower than the alternatives; also, lambda is usually costly.

Or you can go for the simplest and most iterative approach of all:

# I am assuming listtext is the original list containing (possibly) empty items
newlist = []
for item in listtext:
    if item:
        newlist.append(str(item))
# You can remove str() based on the content of your original list

This is the most intuitive of the methods and runs in decent time.


Answer 9

As reported by Aziz Alto, filter(None, lstr) does not remove whitespace-only strings such as ' ', but if you are sure lstr contains only strings, you can use filter(str.strip, lstr):

>>> lstr = ['hello', '', ' ', 'world', ' ']
>>> lstr
['hello', '', ' ', 'world', ' ']
>>> ' '.join(lstr).split()
['hello', 'world']
>>> filter(str.strip, lstr)
['hello', 'world']

Compare time on my pc

>>> from timeit import timeit
>>> timeit('" ".join(lstr).split()', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
3.356455087661743
>>> timeit('filter(str.strip, lstr)', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
5.276503801345825

The fastest solution to remove '' and empty strings with a space ' ' remains ' '.join(lstr).split().

As reported in a comment the situation is different if your strings contain spaces.

>>> lstr = ['hello', '', ' ', 'world', '    ', 'see you']
>>> lstr
['hello', '', ' ', 'world', '    ', 'see you']
>>> ' '.join(lstr).split()
['hello', 'world', 'see', 'you']
>>> filter(str.strip, lstr)
['hello', 'world', 'see you']

You can see that filter(str.strip, lstr) preserves strings that contain spaces, but ' '.join(lstr).split() will split those strings.
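
The sessions above show Python 2 output, where filter returns a list. A Python 3 sketch of the same three-way comparison needs list() around each filter call:

```python
lstr = ['hello', '', ' ', 'world', '    ', 'see you']

print(list(filter(None, lstr)))       # ['hello', ' ', 'world', '    ', 'see you']
print(list(filter(str.strip, lstr)))  # ['hello', 'world', 'see you']
print(' '.join(lstr).split())         # ['hello', 'world', 'see', 'you']
```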


Answer 10

Summing up the best answers:

1. Eliminate empties WITHOUT stripping:

That is, all-space strings are retained:

slist = list(filter(None, slist))

PROs:

  • simplest;
  • fastest (see benchmarks below).

2. To eliminate empties after stripping …

2.a … when strings do NOT contain spaces between words:

slist = ' '.join(slist).split()

PROs:

  • small code
  • fast (BUT not fastest with big datasets due to memory, contrary to @paolo-melchiorre’s results)

2.b … when strings contain spaces between words:

slist = list(filter(str.strip, slist))

PROs:

  • fastest;
  • understandability of the code.

Benchmarks on a 2018 machine:

## Build test-data
#
import random, string
nwords = 10000
maxlen = 30
null_ratio = 0.1
rnd = random.Random(0)                  # deterministic results
words = [' ' * rnd.randint(0, maxlen)
         if rnd.random() > (1 - null_ratio)
         else
         ''.join(rnd.choices(string.ascii_letters, k=rnd.randint(0, maxlen)))
         for _i in range(nwords)
        ]

## Test functions
#
def nostrip_filter(slist):
    return list(filter(None, slist))

def nostrip_comprehension(slist):
    return [s for s in slist if s]

def strip_filter(slist):
    return list(filter(str.strip, slist))

def strip_filter_map(slist): 
    return list(filter(None, map(str.strip, slist))) 

def strip_filter_comprehension(slist):  # waste memory
    return list(filter(None, [s.strip() for s in slist]))

def strip_filter_generator(slist):
    return list(filter(None, (s.strip() for s in slist)))

def strip_join_split(slist):  # words without(!) spaces
    return ' '.join(slist).split()

## Benchmarks
#
%timeit nostrip_filter(words)
142 µs ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit nostrip_comprehension(words)
263 µs ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter(words)
653 µs ± 37.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_map(words)
642 µs ± 36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_comprehension(words)
693 µs ± 42.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_generator(words)
750 µs ± 28.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_join_split(words)
796 µs ± 103 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Answer 11

For a list with a combination of spaces and empty values, use a simple list comprehension:

>>> s = ['I', 'am', 'a', '', 'great', ' ', '', '  ', 'person', '!!', 'Do', 'you', 'think', 'its', 'a', '', 'a', '', 'joke', '', ' ', '', '?', '', '', '', '?']

So, as you can see, this list has a combination of whitespace-only and empty elements. Using the snippet:

>>> d = [x for x in s if x.strip()]
>>> d
['I', 'am', 'a', 'great', 'person', '!!', 'Do', 'you', 'think', 'its', 'a', 'a', 'joke', '?', '?']

Split a string into words with multiple word boundary delimiters

Question: Split a string into words with multiple word boundary delimiters

I think what I want to do is a fairly common task but I’ve found no reference on the web. I have text with punctuation, and I want a list of the words.

"Hey, you - what are you doing here!?"

should be

['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

But Python’s str.split() only works with one argument, so I have all words with the punctuation after I split with whitespace. Any ideas?


Answer 0

A case where regular expressions are justified:

import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

Answer 1

re.split()

re.split(pattern, string[, maxsplit=0])

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list. (Incompatibility note: in the original Python 1.5 release, maxsplit was ignored. This has been fixed in later releases.)

>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', 1)
['Words', 'words, words.']
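
Note the empty string at the end of the first result: it appears because the text ends with a separator. If you only want the words, one simple option is to filter the empties out afterwards (a sketch; parts and words are illustrative names):

```python
import re

text = 'Words, words, words.'
parts = re.split(r'\W+', text)
print(parts)                     # ['Words', 'words', 'words', '']

words = [p for p in parts if p]  # drop the empty string left by the trailing '.'
print(words)                     # ['Words', 'words', 'words']
```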

Answer 2

Another quick way to do this without a regexp is to replace the characters first, as below:

>>> 'a;bcd,ef g'.replace(';',' ').replace(',',' ').split()
['a', 'bcd', 'ef', 'g']
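
When there are more than a couple of separators, chaining replace calls gets unwieldy. A variation on the same idea, sketched here for Python 3 with str.maketrans and str.translate (the separators string is my own choice for the example):

```python
text = 'a;bcd,ef g'
separators = ';,'

# Map every separator character to a space, then split on whitespace
table = str.maketrans(separators, ' ' * len(separators))
print(text.translate(table).split())  # ['a', 'bcd', 'ef', 'g']
```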

Answer 3

So many answers, yet I can’t find any solution that efficiently does what the title of the question literally asks for (splitting on multiple possible separators; instead, many answers split on anything that is not a word, which is different). So here is an answer to the question in the title that relies on Python’s standard and efficient re module:

>>> import re  # Will be splitting on: , <space> - ! ? :
>>> list(filter(None, re.split(r"[, \-!?:]+", "Hey, you - what are you doing here!?")))
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

where:

  • the […] matches one of the separators listed inside,
  • the \- in the regular expression is here to prevent the special interpretation of - as a character range indicator (as in A-Z),
  • the + skips one or more delimiters (it could be omitted thanks to the filter(), but this would unnecessarily produce empty strings between matched separators), and
  • filter(None, …) removes the empty strings possibly created by leading and trailing separators (since empty strings have a false boolean value).

This re.split() precisely “splits with multiple separators”, as asked for in the question title.

This solution is furthermore immune to the problems with non-ASCII characters in words found in some other solutions (see the first comment to ghostdog74’s answer).

The re module is much more efficient (in speed and concision) than doing Python loops and tests “by hand”!
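
On Python 3, filter() returns a lazy iterator rather than a list, so the one-liner above would show a filter object instead of the words. A minimal sketch of the same idea for Python 3 follows. The function name split_on_separators and its default separator string are assumptions made for illustration; re.escape is used so that characters such as - need no manual escaping inside the character class.

```python
import re

def split_on_separators(text, separators=", -!?:"):
    """Split text on runs of any character in `separators` (hypothetical helper)."""
    # re.escape prevents "-" (and friends) from being read as range markers in [...]
    pattern = "[" + re.escape(separators) + "]+"
    # The comprehension drops the empty strings left by leading/trailing separators
    return [piece for piece in re.split(pattern, text) if piece]

words = split_on_separators("Hey, you - what are you doing here!?")
print(words)
```

The list comprehension plays the role of filter(None, …) and always yields a real list.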


回答 4

另一种方式,没有正则表达式

import string
punc = string.punctuation
thestring = "Hey, you - what are you doing here!?"
s = list(thestring)
''.join([o for o in s if not o in punc]).split()

Another way, without regex

import string
punc = string.punctuation
thestring = "Hey, you - what are you doing here!?"
s = list(thestring)
''.join([o for o in s if not o in punc]).split()

回答 5

专业提示:使用 string.translate用于Python最快的字符串操作。

一些证明…

首先,慢速的方式(对不起pprzemek):

>>> import timeit
>>> S = 'Hey, you - what are you doing here!?'
>>> def my_split(s, seps):
...     res = [s]
...     for sep in seps:
...         s, res = res, []
...         for seq in s:
...             res += seq.split(sep)
...     return res
... 
>>> timeit.Timer('my_split(S, punctuation)', 'from __main__ import S,my_split; from string import punctuation').timeit()
54.65477919578552

接下来,我们使用re.findall()(由建议的答案给出)。快多了:

>>> timeit.Timer('findall(r"\w+", S)', 'from __main__ import S; from re import findall').timeit()
4.194725036621094

最后,我们使用translate

>>> from string import translate,maketrans,punctuation 
>>> T = maketrans(punctuation, ' '*len(punctuation))
>>> timeit.Timer('translate(S, T).split()', 'from __main__ import S,T,translate').timeit()
1.2835021018981934

说明:

string.translate是用C实现的,与Python中的许多字符串操作函数不同,string.translate 它不会产生新的字符串。因此,它与字符串替换一样快。

不过,这有点尴尬,因为它需要翻译表才能执行此操作。您可以使用maketrans()便利功能制作翻译表。此处的目的是将所有不需要的字符转换为空格。一对一的替代品。同样,不会产生任何新数据。所以这很快

接下来,我们使用好old split()split()默认情况下,它将对所有空白字符起作用,将它们分组在一起以进行拆分。结果将是您想要的单词列表。而且这种方法的速度几乎快了4倍re.findall()

Pro-Tip: Use string.translate for the fastest string operations Python has.

Some proof…

First, the slow way (sorry pprzemek):

>>> import timeit
>>> S = 'Hey, you - what are you doing here!?'
>>> def my_split(s, seps):
...     res = [s]
...     for sep in seps:
...         s, res = res, []
...         for seq in s:
...             res += seq.split(sep)
...     return res
... 
>>> timeit.Timer('my_split(S, punctuation)', 'from __main__ import S,my_split; from string import punctuation').timeit()
54.65477919578552

Next, we use re.findall() (as given by the suggested answer). MUCH faster:

>>> timeit.Timer('findall(r"\w+", S)', 'from __main__ import S; from re import findall').timeit()
4.194725036621094

Finally, we use translate:

>>> from string import translate,maketrans,punctuation 
>>> T = maketrans(punctuation, ' '*len(punctuation))
>>> timeit.Timer('translate(S, T).split()', 'from __main__ import S,T,translate').timeit()
1.2835021018981934

Explanation:

string.translate is implemented in C and replaces every character in a single pass using a lookup table, so it avoids the intermediate strings that chained replace() calls create. Because strings are immutable it still returns one new string, but that is about as fast as you can get for string substitution. (The string.translate and string.maketrans functions used here exist only in Python 2; in Python 3 the equivalents are str.translate and str.maketrans.)

It’s a bit awkward, though, as it needs a translation table in order to do this magic. You can make a translation table with the maketrans() convenience function. The objective here is to translate all unwanted characters to spaces, a one-for-one substitution that keeps the string the same length. So this is fast!

Next, we use good old split(). split() by default will operate on all whitespace characters, grouping them together for the split. The result will be the list of words that you want. And in the timings above this approach is roughly 3x faster than re.findall()!
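
Since the module-level string.translate and string.maketrans are gone in Python 3, here is a minimal sketch of the same table-based approach using the str methods. The names TABLE and split_words are assumptions for illustration, and the timings above are not reproduced here.

```python
import string

# Map every punctuation character to a space (one-for-one substitution).
TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def split_words(text):
    """Replace punctuation with spaces, then split on runs of whitespace."""
    return text.translate(TABLE).split()

print(split_words("Hey, you - what are you doing here!?"))
```

Building the table once and reusing it keeps the per-call work down to a single translate() and a single split().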


回答 6

我遇到了类似的难题,不想使用’re’模块。

def my_split(s, seps):
    res = [s]
    for sep in seps:
        s, res = res, []
        for seq in s:
            res += seq.split(sep)
    return res

print my_split('1111  2222 3333;4444,5555;6666', [' ', ';', ','])
['1111', '', '2222', '3333', '4444', '5555', '6666']

I had a similar dilemma and didn’t want to use the re module.

def my_split(s, seps):
    res = [s]
    for sep in seps:
        s, res = res, []
        for seq in s:
            res += seq.split(sep)
    return res

print my_split('1111  2222 3333;4444,5555;6666', [' ', ';', ','])
['1111', '', '2222', '3333', '4444', '5555', '6666']

回答 7

首先,我想与其他人同意,正则表达式或str.translate(...)基于基础的解决方案性能最高。对于我的用例,此功能的性能并不重要,因此我想添加我考虑的该标准的想法。

我的主要目标是将其他一些答案中的想法归纳为一个解决方案,该解决方案可用于包含不仅仅是正则表达式单词的字符串(即,将标点字符的显式子集列入黑名单而将单词字符列入白名单)。

请注意,在任何方法中,都可能会考虑使用 string.punctuation代替手动定义的列表。

选项1-重新订阅

我很惊讶地发现到目前为止没有答案使用re.sub(…)。我发现这是解决此问题的一种简单自然的方法。

import re

my_str = "Hey, you - what are you doing here!?"

words = re.split(r'\s+', re.sub(r'[,\-!?]', ' ', my_str).strip())

在此解决方案中,我将调用嵌套到re.sub(...)内部re.split(...)-但如果性能至关重要,则在外部编译正则表达式可能会有所益处-对于我的用例而言,差异并不明显,因此我更喜欢简单性和可读性。

选项2-str.replace

这是另外几行,但是它具有可扩展的优点,而不必检查是否需要在正则表达式中转义某个字符。

my_str = "Hey, you - what are you doing here!?"

replacements = (',', '-', '!', '?')
for r in replacements:
    my_str = my_str.replace(r, ' ')

words = my_str.split()

能够将str.replace映射到字符串本来会很好,但是我不认为可以使用不可变的字符串来完成,并且在映射到字符列表时可以工作,对每个字符运行每个替换听起来太过分了。(编辑:有关功能示例,请参阅下一个选项。)

选项3-functools.reduce

(在Python 2中,reduce它可以在全局命名空间中使用,而无需从functools导入。)

import functools

my_str = "Hey, you - what are you doing here!?"

replacements = (',', '-', '!', '?')
my_str = functools.reduce(lambda s, sep: s.replace(sep, ' '), replacements, my_str)
words = my_str.split()

First, I want to agree with others that the regex or str.translate(...) based solutions are most performant. For my use case the performance of this function wasn’t significant, so I wanted to add the ideas that I considered with that criterion in mind.

My main goal was to generalize ideas from some of the other answers into one solution that could work for strings containing more than just regex words (i.e., blacklisting the explicit subset of punctuation characters vs whitelisting word characters).

Note that, in any approach, one might also consider using string.punctuation in place of a manually defined list.

Option 1 – re.sub

I was surprised to see no answer so far uses re.sub(…). I find it a simple and natural approach to this problem.

import re

my_str = "Hey, you - what are you doing here!?"

words = re.split(r'\s+', re.sub(r'[,\-!?]', ' ', my_str).strip())

In this solution, I nested the call to re.sub(...) inside re.split(...). If performance is critical, compiling the regex outside could be beneficial; for my use case the difference wasn’t significant, so I prefer simplicity and readability.

Option 2 – str.replace

This is a few more lines, but it has the benefit of being expandable without having to check whether you need to escape a certain character in regex.

my_str = "Hey, you - what are you doing here!?"

replacements = (',', '-', '!', '?')
for r in replacements:
    my_str = my_str.replace(r, ' ')

words = my_str.split()

It would have been nice to be able to map the str.replace to the string instead, but I don’t think it can be done with immutable strings, and while mapping against a list of characters would work, running every replacement against every character sounds excessive. (Edit: See next option for a functional example.)

Option 3 – functools.reduce

(In Python 2, reduce is available in global namespace without importing it from functools.)

import functools

my_str = "Hey, you - what are you doing here!?"

replacements = (',', '-', '!', '?')
my_str = functools.reduce(lambda s, sep: s.replace(sep, ' '), replacements, my_str)
words = my_str.split()
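
As a rough sketch of the two side remarks above (using string.punctuation instead of a hand-written list, and compiling the regex once), Option 1 could look like this on Python 3. The names PUNCT_RE and split_words are assumptions for illustration; re.escape takes care of the characters that would otherwise need escaping inside the character class.

```python
import re
import string

# Compiled once at import time; re.escape handles "-", "]", "\" and similar.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")

def split_words(text):
    """Blank out punctuation, then let str.split() collapse the whitespace."""
    return PUNCT_RE.sub(" ", text).split()

print(split_words("Hey, you - what are you doing here!?"))
```

Using str.split() with no argument also removes the need for the strip() call in Option 1.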

回答 8

join = lambda x: sum(x,[])  # a.k.a. flatten1([[1],[2,3],[4]]) -> [1,2,3,4]
# ...alternatively...
join = lambda lists: [x for l in lists for x in l]

然后这变成了三层:

fragments = [text]
for token in tokens:
    fragments = join(f.split(token) for f in fragments)

说明

这就是在Haskell中被称为List monad的东西。monad背后的想法是,一旦“在monad中”,您就“停留在monad中”,直到有东西将您带出。例如在Haskell中,假设您将python range(n) -> [1,2,...,n]函数映射到List上。如果结果是一个列表,它将被原地追加到列表中,因此您将获得类似map(range, [3,4,1]) -> [0,1,2,0,1,2,3,0]。这称为map-append(或mappend,或类似的东西)。这里的想法是,您要执行此操作(拆分令牌),并且每当执行此操作时,您都将结果加入列表。

您可以将其抽象为一个函数,并且tokens=string.punctuation默认情况下具有。

这种方法的优点:

  • 这种方法(与基于朴素的基于正则表达式的方法不同)可以与任意长度的令牌一起使用(正则表达式也可以使用更高级的语法)。
  • 您不仅限于代币;您可以使用任意逻辑代替每个标记,例如,“标记”之一可以是根据嵌套括号的拆分方式进行拆分的函数。

join = lambda x: sum(x,[])  # a.k.a. flatten1([[1],[2,3],[4]]) -> [1,2,3,4]
# ...alternatively...
join = lambda lists: [x for l in lists for x in l]

Then this becomes a three-liner:

fragments = [text]
for token in tokens:
    fragments = join(f.split(token) for f in fragments)

Explanation

This is what in Haskell is known as the List monad. The idea behind the monad is that once “in the monad” you “stay in the monad” until something takes you out. For example, say you map the Python range(n) -> [0,1,...,n-1] function over a list. Because each result is itself a list, it is concatenated into the output rather than nested, so you’d get something like map(range, [3,4,1]) -> [0,1,2,0,1,2,3,0]. This is a map followed by a concatenation (concatMap in Haskell). The idea here is that you’ve got this operation you’re applying (splitting on a token), and whenever you do that, you join the result into the list.

You can abstract this into a function and have tokens=string.punctuation by default.

Advantages of this approach:

  • This approach (unlike naive regex-based approaches) can work with arbitrary-length tokens (which regex can also do with more advanced syntax).
  • You are not restricted to mere tokens; you could have arbitrary logic in place of each token, for example one of the “tokens” could be a function which splits according to how nested parentheses are.
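
The suggestion above of abstracting this into a function with tokens=string.punctuation as the default could be sketched as follows. The names flatten and split_on_tokens are assumptions for illustration, not part of the original answer.

```python
import string

def flatten(lists):
    """Concatenate an iterable of lists into one list."""
    return [item for sub in lists for item in sub]

def split_on_tokens(text, tokens=string.punctuation):
    """Split text on every token in turn; tokens may be longer than one character."""
    fragments = [text]
    for token in tokens:
        # Split each fragment and join the results back into one flat list
        fragments = flatten(f.split(token) for f in fragments)
    return fragments

print(split_on_tokens("a,b;c"))
print(split_on_tokens("a--b--c", ["--"]))
```

Note that, like str.split(sep), this keeps empty strings between adjacent separators; filter them out afterwards if they are unwanted.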

回答 9

尝试这个:

import re

phrase = "Hey, you - what are you doing here!?"
matches = re.findall('\w+', phrase)
print matches

这将打印 ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

Try this:

import re

phrase = "Hey, you - what are you doing here!?"
matches = re.findall(r'\w+', phrase)
print matches

This will print ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']


回答 10

两次使用替换:

a = '11223FROM33344INTO33222FROM3344'
a.replace('FROM', ',,,').replace('INTO', ',,,').split(',,,')

结果是:

['11223', '33344', '33222', '3344']

Use replace two times:

a = '11223FROM33344INTO33222FROM3344'
a.replace('FROM', ',,,').replace('INTO', ',,,').split(',,,')

results in:

['11223', '33344', '33222', '3344']
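
The placeholder ',,,' only works if it never occurs in the data. As a hedged alternative sketch, multi-character separators can also be handed to re.split as an alternation. The function name split_on_any is an assumption for illustration.

```python
import re

def split_on_any(text, separators):
    """Split text on any of the given (possibly multi-character) separators."""
    # Longest first, so that a separator such as "--" wins over "-"
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "|".join(re.escape(sep) for sep in ordered)
    # Drop the empty pieces produced by adjacent or trailing separators
    return [piece for piece in re.split(pattern, text) if piece]

print(split_on_any('11223FROM33344INTO33222FROM3344', ['FROM', 'INTO']))
```

This avoids having to invent a marker string that is guaranteed not to appear in the input.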

回答 11

我喜欢re,但是这是我的解决方案:

from itertools import groupby
sep = ' ,-!?'
s = "Hey, you - what are you doing here!?"
print [''.join(g) for k, g in groupby(s, sep.__contains__) if not k]

sep .__ contains__是’in’运算符使用的方法。基本上和

lambda ch: ch in sep

但是这里比较方便。

groupby获取我们的字符串和函数。它使用该函数将字符串分成几组:每当函数值更改时,就会生成一个新的组。因此,sep .__ contains__正是我们需要的。

groupby返回一对对的序列,其中pair [0]是我们函数的结果,而pair [1]是一个组。使用‘if not k’我们用分隔符过滤掉组(因为sep .__ contains__在分隔符上为True 的结果)。好了,就是这样-现在我们有了一系列的组,每个组都是一个单词(组实际上是一个可迭代的,因此我们使用join将其转换为字符串)。

该解决方案非常通用,因为它使用一个函数来分隔字符串(可以按需要的任何条件进行拆分)。另外,它不会创建中间字符串/列表(您可以删除联接,并且表达式将变得很懒,因为每个组都是迭代器)

I like re, but here is my solution without it:

from itertools import groupby
sep = ' ,-!?'
s = "Hey, you - what are you doing here!?"
print [''.join(g) for k, g in groupby(s, sep.__contains__) if not k]

sep.__contains__ is the method used by the ‘in’ operator. Basically it is the same as

lambda ch: ch in sep

but is more convenient here.

groupby takes our string and the function. It splits the string into groups using that function: whenever the value of the function changes, a new group is started. So sep.__contains__ is exactly what we need.

groupby returns a sequence of pairs, where pair[0] is the result of our function and pair[1] is a group. Using ‘if not k’ we filter out the groups of separators (because the result of sep.__contains__ is True on separators). That’s all: we now have a sequence of groups where each one is a word (a group is actually an iterable, so we use join to convert it to a string).

This solution is quite general, because it uses a function to separate the string (you can split by any condition you need). Also, it doesn’t create intermediate strings or lists (you can remove the join and the expression will become lazy, since each group is an iterator).
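
A minimal Python 3 sketch of the lazy variant described above, written as a generator. The name iter_words is an assumption for illustration; any one-argument predicate can be passed in place of sep.__contains__.

```python
from itertools import groupby

def iter_words(text, is_separator):
    """Yield maximal runs of characters for which is_separator(ch) is false."""
    for is_sep, group in groupby(text, is_separator):
        if not is_sep:
            yield "".join(group)

sep = " ,-!?"
words = list(iter_words("Hey, you - what are you doing here!?", sep.__contains__))
print(words)
```

Because iter_words is a generator, words are produced one at a time and the full list is only built when list() is called.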


回答 12

您可以使用pandas的series.str.split方法来获得相同的结果,而不是使用re模块功能re.split。

首先,使用上面的字符串创建一个系列,然后将该方法应用于该系列。

thestring = pd.Series("Hey, you - what are you doing here!?") thestring.str.split(pat = ',|-')

参数pat接受定界符,并将拆分后的字符串作为数组返回。这里,两个定界符使用|传递。(或运算符)。输出如下:

[Hey, you , what are you doing here!?]

Instead of using the re module function re.split, you can achieve the same result using the Series.str.split method of pandas.

First, create a Series with the above string and then apply the method to the Series.

import pandas as pd

thestring = pd.Series("Hey, you - what are you doing here!?")
thestring.str.split(pat=',|-')

The pat parameter takes the delimiters, and the method returns the split strings as a Series of lists. Here the two delimiters are combined with | (the regex “or” operator). The output is as follows:

[Hey, you , what are you doing here!?]


回答 13

我正在重新熟悉Python,并需要同样的东西。findall解决方案可能更好,但是我想到了:

tokens = [x.strip() for x in data.split(',')]

I’m re-acquainting myself with Python and needed the same thing. The findall solution may be better, but I came up with this:

tokens = [x.strip() for x in data.split(',')]

回答 14

使用maketrans和翻译,您可以轻松整齐地进行操作

import string
specials = ',.!?:;"()<>[]#$=-/'
trans = string.maketrans(specials, ' '*len(specials))
body = body.translate(trans)
words = body.strip().split()

Using maketrans and translate, you can do it easily and neatly (string.maketrans is Python 2 only; in Python 3, use str.maketrans):

import string
specials = ',.!?:;"()<>[]#$=-/'
trans = string.maketrans(specials, ' '*len(specials))
body = body.translate(trans)
words = body.strip().split()

回答 15

在Python 3中,您可以使用PY4E-Python for Everybody中的方法

我们可以通过使用字符串的方法解决这两个问题lowerpunctuationtranslate。该translate是最微妙的方法。这是有关以下内容的文档translate

your_string.translate(your_string.maketrans(fromstr, tostr, deletestr))

将中的字符替换为中fromstr相同位置的tostr字符,并删除中的所有字符deletestr。该fromstrtostr可以为空字符串和deletestr可以省略参数。

您可以看到“标点符号”:

In [10]: import string

In [11]: string.punctuation
Out[11]: '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'  

例如:

In [12]: your_str = "Hey, you - what are you doing here!?"

In [13]: line = your_str.translate(your_str.maketrans('', '', string.punctuation))

In [14]: line = line.lower()

In [15]: words = line.split()

In [16]: print(words)
['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

有关更多信息,您可以参考:

In Python 3, you can use the method from PY4E – Python for Everybody.

We can solve both these problems by using the string methods lower, punctuation, and translate. The translate is the most subtle of the methods. Here is the documentation for translate:

your_string.translate(your_string.maketrans(fromstr, tostr, deletestr))

Replace the characters in fromstr with the character in the same position in tostr and delete all characters that are in deletestr. The fromstr and tostr can be empty strings and the deletestr parameter can be omitted.

You can see the “punctuation”:

In [10]: import string

In [11]: string.punctuation
Out[11]: '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'  

For your example:

In [12]: your_str = "Hey, you - what are you doing here!?"

In [13]: line = your_str.translate(your_str.maketrans('', '', string.punctuation))

In [14]: line = line.lower()

In [15]: words = line.split()

In [16]: print(words)
['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

For more information, you can refer:


回答 16

实现此目的的另一种方法是使用自然语言工具包(nltk)。

import nltk
data= "Hey, you - what are you doing here!?"
word_tokens = nltk.tokenize.regexp_tokenize(data, r'\w+')
print word_tokens

打印: ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

这种方法的最大缺点是您需要安装nltk软件包

好处是,一旦获得令牌,您就可以使用其余的nltk软件包做很多有趣的事情

Another way to achieve this is to use the Natural Language Tool Kit (nltk).

import nltk
data= "Hey, you - what are you doing here!?"
word_tokens = nltk.tokenize.regexp_tokenize(data, r'\w+')
print word_tokens

This prints: ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

The biggest drawback of this method is that you need to install the nltk package.

The benefits are that you can do a lot of fun stuff with the rest of the nltk package once you get your tokens.


回答 17

首先,我不认为您的意图是在拆分函数中实际使用标点符号作为分隔符。您的描述表明您只是想从结果字符串中消除标点符号。

我经常遇到这种情况,而我通常的解决方案不需要重新输入。

具有列表理解功能的单行lambda函数:

(要求import string):

split_without_punc = lambda text : [word.strip(string.punctuation) for word in 
    text.split() if word.strip(string.punctuation) != '']

# Call function
split_without_punc("Hey, you -- what are you doing?!")
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']


功能(传统)

作为传统函数,这仍然只有两行具有列表理解功能(除了import string):

def split_without_punctuation2(text):

    # Split by whitespace
    words = text.split()

    # Strip punctuation from each word
    return [word.strip(string.punctuation) for word in words if word.strip(string.punctuation) != '']

split_without_punctuation2("Hey, you -- what are you doing?!")
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']

它自然也会使收缩和带连字符的单词保持完整。您总是可以text.replace("-", " ")在分割之前使用连字符将其转换为空格。

没有Lambda或列表理解的常规功能

对于更通用的解决方案(您可以在其中指定要消除的字符),并且无需列表理解,您将获得:

def split_without(text: str, ignore: str) -> list:

    # Split by whitespace
    split_string = text.split()

    # Strip any characters in the ignore string, and ignore empty strings
    words = []
    for word in split_string:
        word = word.strip(ignore)
        if word != '':
            words.append(word)

    return words

# Situation-specific call to general function
import string
final_text = split_without("Hey, you - what are you doing?!", string.punctuation)
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']

当然,您也可以始终将lambda函数概括为任何指定的字符串。

First of all, I don’t think that your intention is to actually use punctuation as delimiters in the split functions. Your description suggests that you simply want to eliminate punctuation from the resultant strings.

I come across this pretty frequently, and my usual solution doesn’t require re.

One-liner lambda function w/ list comprehension:

(requires import string):

split_without_punc = lambda text : [word.strip(string.punctuation) for word in 
    text.split() if word.strip(string.punctuation) != '']

# Call function
split_without_punc("Hey, you -- what are you doing?!")
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']


Function (traditional)

As a traditional function, this is still only two lines with a list comprehension (in addition to import string):

def split_without_punctuation2(text):

    # Split by whitespace
    words = text.split()

    # Strip punctuation from each word
    return [word.strip(string.punctuation) for word in words if word.strip(string.punctuation) != '']

split_without_punctuation2("Hey, you -- what are you doing?!")
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']

It will also naturally leave contractions and hyphenated words intact. You can always use text.replace("-", " ") to turn hyphens into spaces before the split.

General Function w/o Lambda or List Comprehension

For a more general solution (where you can specify the characters to eliminate), and without a list comprehension, you get:

def split_without(text: str, ignore: str) -> list:

    # Split by whitespace
    split_string = text.split()

    # Strip any characters in the ignore string, and ignore empty strings
    words = []
    for word in split_string:
        word = word.strip(ignore)
        if word != '':
            words.append(word)

    return words

# Situation-specific call to general function
import string
final_text = split_without("Hey, you - what are you doing?!", string.punctuation)
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']

Of course, you can always generalize the lambda function to any specified string of characters as well.


回答 18

首先,在循环中执行任何RegEx操作之前,请始终使用re.compile(),因为它比常规操作更快。

因此对于您的问题,请先编译模式,然后对其执行操作。

import re
DATA = "Hey, you - what are you doing here!?"
reg_tok = re.compile("[\w']+")
print reg_tok.findall(DATA)

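First of all, compile the pattern with re.compile() before performing any regex operation in a loop: the compiled object can be reused directly, which skips the pattern-cache lookup that the module-level functions perform on every call.

So for your problem, first compile the pattern and then perform the action on it.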

import re
DATA = "Hey, you - what are you doing here!?"
reg_tok = re.compile("[\w']+")
print reg_tok.findall(DATA)
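
To illustrate the loop case mentioned above, here is a small Python 3 sketch that compiles the pattern once and reuses it for every line. The names WORD_RE and tokenize_lines are assumptions for illustration.

```python
import re

# Compiled once; reused for every line below.
WORD_RE = re.compile(r"[\w']+")

def tokenize_lines(lines):
    """Return one list of word tokens per input line."""
    return [WORD_RE.findall(line) for line in lines]

print(tokenize_lines(["Hey, you - what are you doing here!?", "Don't go!"]))
```

The apostrophe in the character class keeps contractions such as "Don't" together as a single token.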

回答 19

这是一些解释的答案。

st = "Hey, you - what are you doing here!?"

# replace all the non alpha-numeric with space and then join.
new_string = ''.join([x.replace(x, ' ') if not x.isalnum() else x for x in st])
# output of new_string
'Hey  you   what are you doing here  '

# str.split() will remove all the empty string if separator is not provided
new_list = new_string.split()

# output of new_list
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

# we can join it to get a complete string without any non alpha-numeric character
' '.join(new_list)
# output
'Hey you what are you doing here'

或者一行,我们可以这样:

(''.join([x.replace(x, ' ') if not x.isalnum() else x for x in st])).split()

# output
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

更新的答案

Here is the answer with some explanation.

st = "Hey, you - what are you doing here!?"

# replace all the non alpha-numeric with space and then join.
new_string = ''.join([x.replace(x, ' ') if not x.isalnum() else x for x in st])
# output of new_string
'Hey  you   what are you doing here  '

# str.split() will remove all the empty string if separator is not provided
new_list = new_string.split()

# output of new_list
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

# we can join it to get a complete string without any non alpha-numeric character
' '.join(new_list)
# output
'Hey you what are you doing here'

or in one line, we can do like this:

(''.join([x.replace(x, ' ') if not x.isalnum() else x for x in st])).split()

# output
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

updated answer


回答 20

创建一个函数,将两个字符串(要拆分的源字符串和定界符的splitlist字符串)作为输入,并输出一个拆分词列表:

def split_string(source, splitlist):
    output = []  # output list of cleaned words
    atsplit = True
    for char in source:
        if char in splitlist:
            atsplit = True
        else:
            if atsplit:
                output.append(char)  # append new word after split
                atsplit = False
            else: 
                output[-1] = output[-1] + char  # continue copying characters until next split
    return output

Create a function that takes as input two strings (the source string to be split and the splitlist string of delimiters) and outputs a list of split words:

def split_string(source, splitlist):
    output = []  # output list of cleaned words
    atsplit = True
    for char in source:
        if char in splitlist:
            atsplit = True
        else:
            if atsplit:
                output.append(char)  # append new word after split
                atsplit = False
            else: 
                output[-1] = output[-1] + char  # continue copying characters until next split
    return output

回答 21

我喜欢pprzemek的解决方案,因为它不假定定界符是单个字符,并且不尝试利用正则表达式(如果分隔符的数目太长了,这将不能很好地工作)。

为了清楚起见,以下是上述解决方案的可读性更高的版本:

def split_string_on_multiple_separators(input_string, separators):
    buffer = [input_string]
    for sep in separators:
        strings = buffer
        buffer = []  # reset the buffer
        for s in strings:
            buffer = buffer + s.split(sep)

    return buffer

I like pprzemek’s solution because it does not assume that the delimiters are single characters and it doesn’t try to leverage a regex (which would not work well if the list of separators became very long).

Here’s a more readable version of the above solution for clarity:

def split_string_on_multiple_separators(input_string, separators):
    buffer = [input_string]
    for sep in separators:
        strings = buffer
        buffer = []  # reset the buffer
        for s in strings:
            buffer = buffer + s.split(sep)

    return buffer

回答 22

遇到了与@ooboo相同的问题,并找到了这个主题@ ghostdog74启发了我,也许有人觉得我的解决方案有用

str1='adj:sg:nom:m1.m2.m3:pos'
splitat=':.'
''.join([ s if s not in splitat else ' ' for s in str1]).split()

如果您不想在空格处分割,请在空格处输入内容并使用相同的字符分割。

I had the same problem as @ooboo and found this topic. @ghostdog74 inspired me; maybe someone will find my solution useful:

str1='adj:sg:nom:m1.m2.m3:pos'
splitat=':.'
''.join([ s if s not in splitat else ' ' for s in str1]).split()

If you don’t want to split at spaces, substitute some other character for the space and split on that same character.


回答 23

这是我与多个决策者共同努力的结果:

def msplit( str, delims ):
  w = ''
  for z in str:
    if z not in delims:
        w += z
    else:
        if len(w) > 0 :
            yield w
        w = ''
  if len(w) > 0 :
    yield w

Here is my go at a split with multiple delimiters:

def msplit( str, delims ):
  w = ''
  for z in str:
    if z not in delims:
        w += z
    else:
        if len(w) > 0 :
            yield w
        w = ''
  if len(w) > 0 :
    yield w

回答 24

我认为以下是满足您需求的最佳答案:

\W+ 可能适合这种情况,但可能不适合其他情况。

filter(None, re.compile(r'[ ,\-!?]').split("Hey, you - what are you doing here!?"))

I think the following is the best answer to suit your needs:

\W+ may be suitable for this case, but may not be suitable for other cases.

filter(None, re.compile(r'[ ,\-!?]').split("Hey, you - what are you doing here!?"))

回答 25

这是我的看法。

def split_string(source,splitlist):
    splits = frozenset(splitlist)
    l = []
    s1 = ""
    for c in source:
        if c in splits:
            if s1:
                l.append(s1)
                s1 = ""
        else:
            s1 = s1 + c
    if s1:
        l.append(s1)
    return l

>>> out = split_string("First Name,Last Name,Street Address,City,State,Zip Code", ",")
>>> print out
['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code']

Here’s my take on it:

def split_string(source,splitlist):
    splits = frozenset(splitlist)
    l = []
    s1 = ""
    for c in source:
        if c in splits:
            if s1:
                l.append(s1)
                s1 = ""
        else:
            s1 = s1 + c
    if s1:
        l.append(s1)
    return l

>>> out = split_string("First Name,Last Name,Street Address,City,State,Zip Code", ",")
>>> print out
['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code']

回答 26

我喜欢replace()最好的方式。以下过程将字符串中定义的所有分隔符更改splitlist为第一个分隔符splitlist,然后在该分隔符上拆分文本。它还说明是否splitlist碰巧是一个空字符串。它返回单词列表,其中没有空字符串。

def split_string(text, splitlist):
    for sep in splitlist:
        text = text.replace(sep, splitlist[0])
    return filter(None, text.split(splitlist[0])) if splitlist else [text]

I like the replace() way the best. The following procedure changes all separators defined in the string splitlist to the first separator in splitlist and then splits the text on that one separator. It also handles the case where splitlist is an empty string. It returns a list of words, with no empty strings in it.

def split_string(text, splitlist):
    for sep in splitlist:
        text = text.replace(sep, splitlist[0])
    return filter(None, text.split(splitlist[0])) if splitlist else [text]
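
On Python 3, filter() returns an iterator, so the function above no longer returns a list there. A minimal Python 3 sketch of the same replace-then-split idea, keeping the original function name and arguments:

```python
def split_string(text, splitlist):
    """Split text on any single-character separator found in splitlist."""
    if not splitlist:
        return [text]
    first = splitlist[0]
    # Normalise every other separator to the first one
    for sep in splitlist[1:]:
        text = text.replace(sep, first)
    # Split once and drop the empty strings left by adjacent separators
    return [word for word in text.split(first) if word]

print(split_string("First Name,Last Name;City,,Zip", ",;"))
```

Handling the empty splitlist up front also avoids evaluating splitlist[0] on an empty string.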

回答 27

def get_words(s):
    l = []
    w = ''
    for c in s.lower():
        if c in '-!?,. ':
            if w != '': 
                l.append(w)
            w = ''
        else:
            w = w + c
    if w != '': 
        l.append(w)
    return l

这是用法:

>>> s = "Hey, you - what are you doing here!?"
>>> print get_words(s)
['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

def get_words(s):
    l = []
    w = ''
    for c in s.lower():
        if c in '-!?,. ':
            if w != '': 
                l.append(w)
            w = ''
        else:
            w = w + c
    if w != '': 
        l.append(w)
    return l

Here is the usage:

>>> s = "Hey, you - what are you doing here!?"
>>> print get_words(s)
['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

回答 28

如果要进行可逆操作(保留定界符),则可以使用以下功能:

def tokenizeSentence_Reversible(sentence):
    setOfDelimiters = ['.', ' ', ',', '*', ';', '!']
    listOfTokens = [sentence]

    for delimiter in setOfDelimiters:
        newListOfTokens = []
        for ind, token in enumerate(listOfTokens):
            ll = [([delimiter, w] if ind > 0 else [w]) for ind, w in enumerate(token.split(delimiter))]
            listOfTokens = [item for sublist in ll for item in sublist] # flattens.
            listOfTokens = filter(None, listOfTokens) # Removes empty tokens: ''
            newListOfTokens.extend(listOfTokens)

        listOfTokens = newListOfTokens

    return listOfTokens

If you want a reversible operation (preserve the delimiters), you can use this function:

def tokenizeSentence_Reversible(sentence):
    setOfDelimiters = ['.', ' ', ',', '*', ';', '!']
    listOfTokens = [sentence]

    for delimiter in setOfDelimiters:
        newListOfTokens = []
        for ind, token in enumerate(listOfTokens):
            ll = [([delimiter, w] if ind > 0 else [w]) for ind, w in enumerate(token.split(delimiter))]
            listOfTokens = [item for sublist in ll for item in sublist] # flattens.
            listOfTokens = filter(None, listOfTokens) # Removes empty tokens: ''
            newListOfTokens.extend(listOfTokens)

        listOfTokens = newListOfTokens

    return listOfTokens
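
When all delimiters are single characters, a shorter route to a reversible split is re.split with a capturing group, which keeps the matched delimiters in the result. This is a sketch of a different technique from the function above, not a rewrite of it; the name tokenize_reversible and its default delimiter string are assumptions for illustration.

```python
import re

def tokenize_reversible(sentence, delimiters=". ,*;!"):
    """Split on single-character delimiters while keeping them as tokens."""
    # The parentheses form a capturing group, so re.split returns the delimiters too
    pattern = "([" + re.escape(delimiters) + "])"
    return [token for token in re.split(pattern, sentence) if token]

tokens = tokenize_reversible("Hi, you!")
print(tokens)
print("".join(tokens) == "Hi, you!")
```

Joining the tokens back together reproduces the original sentence, which is what makes the operation reversible.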

回答 29

我最近需要执行此操作,但需要一个与标准库str.split函数有些匹配的函数,当使用0或1个参数调用时,该函数的行为与标准库相同。

def split_many(string, *separators):
    if len(separators) == 0:
        return string.split()
    if len(separators) > 1:
        table = {
            ord(separator): ord(separators[0])
            for separator in separators
        }
        string = string.translate(table)
    return string.split(separators[0])

注意:仅当分隔符由单个字符组成时(如我的用例),此功能才有用。

I recently needed to do this but wanted a function that somewhat matched the standard library str.split function. This function behaves the same as the standard library one when called with 0 or 1 separators.

def split_many(string, *separators):
    if len(separators) == 0:
        return string.split()
    if len(separators) > 1:
        table = {
            ord(separator): ord(separators[0])
            for separator in separators
        }
        string = string.translate(table)
    return string.split(separators[0])

NOTE: This function is only useful when each of your separators is a single character (as was my use case).


将字符串打印到文本文件

问题:将字符串打印到文本文件

我正在使用Python打开文本文档:

text_file = open("Output.txt", "w")

text_file.write("Purchase Amount: " 'TotalAmount')

text_file.close()

我想将字符串变量的值替换TotalAmount为文本文档。有人可以让我知道怎么做吗?

I’m using Python to open a text document:

text_file = open("Output.txt", "w")

text_file.write("Purchase Amount: " 'TotalAmount')

text_file.close()

I want to substitute the value of a string variable TotalAmount into the text document. Can someone please let me know how to do this?


回答 0

text_file = open("Output.txt", "w")
text_file.write("Purchase Amount: %s" % TotalAmount)
text_file.close()

如果使用上下文管理器,则将自动为您关闭文件

with open("Output.txt", "w") as text_file:
    text_file.write("Purchase Amount: %s" % TotalAmount)

如果您使用的是Python2.6或更高版本,则最好使用 str.format()

with open("Output.txt", "w") as text_file:
    text_file.write("Purchase Amount: {0}".format(TotalAmount))

对于python2.7及更高版本,您可以使用{}代替{0}

在Python3中,fileprint函数有一个可选参数

with open("Output.txt", "w") as text_file:
    print("Purchase Amount: {}".format(TotalAmount), file=text_file)

Python3.6引入了f字符串作为另一种选择

with open("Output.txt", "w") as text_file:
    print(f"Purchase Amount: {TotalAmount}", file=text_file)

text_file = open("Output.txt", "w")
text_file.write("Purchase Amount: %s" % TotalAmount)
text_file.close()

If you use a context manager, the file is closed automatically for you

with open("Output.txt", "w") as text_file:
    text_file.write("Purchase Amount: %s" % TotalAmount)

If you’re using Python2.6 or higher, it’s preferred to use str.format()

with open("Output.txt", "w") as text_file:
    text_file.write("Purchase Amount: {0}".format(TotalAmount))

For python2.7 and higher you can use {} instead of {0}

In Python3, there is an optional file parameter to the print function

with open("Output.txt", "w") as text_file:
    print("Purchase Amount: {}".format(TotalAmount), file=text_file)

Python 3.6 introduced f-strings as another alternative:

with open("Output.txt", "w") as text_file:
    print(f"Purchase Amount: {TotalAmount}", file=text_file)
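
The variants above can be checked with a small round trip. This is only a sketch: the value of TotalAmount and the temporary directory are assumptions, chosen so the example does not overwrite a real Output.txt.

```python
import os
import tempfile

TotalAmount = "42.50"  # assumed sample value; the question does not give one

# Work in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Output.txt")

    # print() appends a newline; text_file.write() would not.
    with open(path, "w") as text_file:
        print(f"Purchase Amount: {TotalAmount}", file=text_file)

    # Read the file back to see exactly what was written.
    with open(path) as text_file:
        written = text_file.read()

print(repr(written))
```

Note the trailing newline that print() adds; if that is not wanted, write() with an explicit format string is probably the simpler choice.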

回答 1

如果要传递多个参数,可以使用元组

price = 33.3
with open("Output.txt", "w") as text_file:
    text_file.write("Purchase Amount: %s price %f" % (TotalAmount, price))

更多:在python中打印多个参数

In case you want to pass multiple arguments you can use a tuple

price = 33.3
with open("Output.txt", "w") as text_file:
    text_file.write("Purchase Amount: %s price %f" % (TotalAmount, price))

More: Print multiple arguments in python


回答 2

如果您使用的是Python3。

然后可以使用打印功能

your_data = {"Purchase Amount": 'TotalAmount'}
with open(r'D:\log.txt', 'w') as log_file:
    print(your_data, file=log_file)

对于python2

这是Python打印字符串到文本文件的示例

def my_func():
    """
    this function return some value
    :return:
    """
    return 25.256


def write_file(data):
    """
    this function write data to file
    :param data:
    :return:
    """
    file_name = r'D:\log.txt'
    with open(file_name, 'w') as x_file:
        x_file.write('{} TotalAmount'.format(data))


def run():
    data = my_func()
    write_file(data)


run()

If you are using Python 3, you can use the print() function:

your_data = {"Purchase Amount": 'TotalAmount'}
with open(r'D:\log.txt', 'w') as log_file:
    print(your_data, file=log_file)

For Python 2, here is an example of printing a string to a text file:

def my_func():
    """
    this function return some value
    :return:
    """
    return 25.256


def write_file(data):
    """
    this function write data to file
    :param data:
    :return:
    """
    file_name = r'D:\log.txt'
    with open(file_name, 'w') as x_file:
        x_file.write('{} TotalAmount'.format(data))


def run():
    data = my_func()
    write_file(data)


run()

回答 3

如果您使用的是numpy,则只需一行即可将单个(或乘)字符串打印到文件中:

numpy.savetxt('Output.txt', ["Purchase Amount: %s" % TotalAmount], fmt='%s')

If you are using numpy, printing a single string (or multiple strings) to a file can be done with just one line:

numpy.savetxt('Output.txt', ["Purchase Amount: %s" % TotalAmount], fmt='%s')

回答 4

使用pathlib模块时,不需要缩进。

import pathlib
pathlib.Path("output.txt").write_text("Purchase Amount: {}" .format(TotalAmount))

从python 3.6开始,f字符串可用。

pathlib.Path("output.txt").write_text(f"Purchase Amount: {TotalAmount}")

With the pathlib module, no with block (and therefore no indentation) is needed.

import pathlib
pathlib.Path("output.txt").write_text("Purchase Amount: {}" .format(TotalAmount))

As of Python 3.6, f-strings are available.

pathlib.Path("output.txt").write_text(f"Purchase Amount: {TotalAmount}")
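
As a quick check of the pathlib approach, the sketch below writes and then reads back the text. The sample value and the temporary directory are assumptions for illustration; passing encoding explicitly is optional but avoids depending on the platform default.

```python
import pathlib
import tempfile

TotalAmount = 19.99  # assumed sample value

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = pathlib.Path(tmp_dir) / "output.txt"

    # write_text() opens, writes and closes the file, and returns the
    # number of characters written.
    count = out_path.write_text(f"Purchase Amount: {TotalAmount}", encoding="utf-8")
    content = out_path.read_text(encoding="utf-8")

print(count, repr(content))
```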

将字节转换为字符串

问题:将字节转换为字符串

我正在使用以下代码从外部程序获取标准输出:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]

communicate()方法返回一个字节数组:

>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

但是,我想将输出作为普通的Python字符串使用。这样我就可以像这样打印它:

>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

我认为这就是binascii.b2a_qp()方法的用途,但是当我尝试使用它时,我又得到了相同的字节数组:

>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

如何将字节值转换回字符串?我的意思是,使用“电池”而不是手动进行操作。我希望它与Python 3兼容。

I’m using this code to get standard output from an external program:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]

The communicate() method returns an array of bytes:

>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

However, I’d like to work with the output as a normal Python string, so that I could print it like this:

>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

I thought that’s what the binascii.b2a_qp() method is for, but when I tried it, I got the same byte array again:

>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

How do I convert the bytes value back to string? I mean, using the “batteries” instead of doing it manually. And I’d like it to be OK with Python 3.


回答 0

您需要解码bytes对象以产生一个字符串:

>>> b"abcde"
b'abcde'

# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8") 
'abcde'

You need to decode the bytes object to produce a string:

>>> b"abcde"
b'abcde'

# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8") 
'abcde'
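
To illustrate why the encoding has to match the data, here is a small sketch. The sample word is an assumption; any text with a non-ASCII character would show the same effect.

```python
data = "café".encode("utf-8")   # the bytes b'caf\xc3\xa9'

right = data.decode("utf-8")    # the encoding the bytes were written in
wrong = data.decode("latin-1")  # a different encoding: no error, but wrong text

print(right)
print(wrong)
```

The second decode does not raise anything, which is exactly why guessing an encoding can go unnoticed.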

回答 1

您需要解码该字节字符串,然后将其转换为字符(Unicode)字符串。

在Python 2上

encoding = 'utf-8'
'hello'.decode(encoding)

要么

unicode('hello', encoding)

在Python 3上

encoding = 'utf-8'
b'hello'.decode(encoding)

要么

str(b'hello', encoding)

You need to decode the byte string and turn it in to a character (Unicode) string.

On Python 2

encoding = 'utf-8'
'hello'.decode(encoding)

or

unicode('hello', encoding)

On Python 3

encoding = 'utf-8'
b'hello'.decode(encoding)

or

str(b'hello', encoding)

回答 2

我认为这种方式很简单:

>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'

I think this way is easy:

>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'

回答 3

如果您不知道编码,则要以Python 3和Python 2兼容的方式将二进制输入读取为字符串,请使用古老的MS-DOS CP437编码:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('cp437'))

因为编码是未知的,所以希望将非英语符号转换为字符cp437(不会翻译英语字符,因为它们在大多数单字节编码和UTF-8中都匹配)。

将任意二进制输入解码为UTF-8是不安全的,因为您可能会得到以下信息:

>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid
start byte

同样适用于latin-1,这在Python 2中很流行(默认?)。请参见“ 代码页布局”中的遗漏之处-这是Python臭名昭著的地方ordinal not in range

UPDATE 20150604:有传言称Python 3具有surrogateescape错误策略,可将内容编码为二进制数据而不会导致数据丢失和崩溃,但它需要进行转换测试[binary] -> [str] -> [binary],以验证性能和可靠性。

更新20170116:感谢评论-还可以使用backslashreplace错误处理程序对所有未知字节进行斜线转义。这仅适用于Python 3,因此即使采用这种解决方法,您仍然会从不同的Python版本获得不一致的输出:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('utf-8', 'backslashreplace'))

看到 详细信息, Python的Unicode支持

更新20170119:我决定实现适用于Python 2和Python 3的斜线转义解码。它应该比cp437解决方案要慢,但是在每个Python版本上都应产生相同的结果

# --- preparation

import codecs

def slashescape(err):
    """ codecs error handler. err is UnicodeDecode instance. return
    a tuple with a replacement for the unencodable part of the input
    and a position where encoding should continue"""
    #print err, dir(err), err.start, err.end, err.object[:err.start]
    thebyte = err.object[err.start:err.end]
    repl = u'\\x'+hex(ord(thebyte))[2:]
    return (repl, err.end)

codecs.register_error('slashescape', slashescape)

# --- processing

stream = [b'\x80abc']

lines = []
for line in stream:
    lines.append(line.decode('utf-8', 'slashescape'))

If you don’t know the encoding, then to read binary input into a string in a way that works on both Python 2 and Python 3, use the ancient MS-DOS CP437 encoding:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('cp437'))

Because encoding is unknown, expect non-English symbols to translate to characters of cp437 (English characters are not translated, because they match in most single byte encodings and UTF-8).

Decoding arbitrary binary input to UTF-8 is unsafe, because you may get this:

>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid
start byte

The same applies to latin-1, which was popular (the default?) for Python 2. See the missing points in Codepage Layout – it is where Python chokes with infamous ordinal not in range.

UPDATE 20150604: There are rumors that Python 3 has the surrogateescape error strategy for encoding stuff into binary data without data loss and crashes, but it needs conversion tests, [binary] -> [str] -> [binary], to validate both performance and reliability.

UPDATE 20170116: Thanks to comment by Nearoo – there is also a possibility to slash escape all unknown bytes with backslashreplace error handler. That works only for Python 3, so even with this workaround you will still get inconsistent output from different Python versions:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('utf-8', 'backslashreplace'))

See Python’s Unicode Support for details.

UPDATE 20170119: I decided to implement slash escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.

# --- preparation

import codecs

def slashescape(err):
    """ codecs error handler. err is UnicodeDecode instance. return
    a tuple with a replacement for the unencodable part of the input
    and a position where encoding should continue"""
    #print err, dir(err), err.start, err.end, err.object[:err.start]
    thebyte = err.object[err.start:err.end]
    repl = u'\\x'+hex(ord(thebyte))[2:]
    return (repl, err.end)

codecs.register_error('slashescape', slashescape)

# --- processing

stream = [b'\x80abc']

lines = []
for line in stream:
    lines.append(line.decode('utf-8', 'slashescape'))
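
For Python 3 only, the built-in error handlers mentioned in the updates can be compared side by side. This is a sketch rather than a benchmark; it also performs the [binary] -> [str] -> [binary] round trip with surrogateescape on a single sample value.

```python
raw = b"\x80abc"  # 0x80 is not a valid UTF-8 start byte

replaced = raw.decode("utf-8", "replace")           # U+FFFD replacement character
escaped = raw.decode("utf-8", "backslashreplace")   # literal backslash escape
smuggled = raw.decode("utf-8", "surrogateescape")   # lone surrogate U+DC80

# surrogateescape is reversible: encoding with the same handler
# restores the original bytes.
round_trip = smuggled.encode("utf-8", "surrogateescape")

print(repr(replaced), repr(escaped), repr(smuggled), round_trip)
```

Of the three, only surrogateescape keeps enough information to get the original bytes back; replace and backslashreplace are meant for display.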

回答 4

在Python 3中,默认编码为"utf-8",因此您可以直接使用:

b'hello'.decode()

相当于

b'hello'.decode(encoding="utf-8")

另一方面,在Python 2中,编码默认为默认的字符串编码。因此,您应该使用:

b'hello'.decode(encoding)

encoding您想要的编码在哪里。

注意:在Python 2.7中添加了对关键字参数的支持。

In Python 3, the default encoding is "utf-8", so you can directly use:

b'hello'.decode()

which is equivalent to

b'hello'.decode(encoding="utf-8")

On the other hand, in Python 2, encoding defaults to the default string encoding. Thus, you should use:

b'hello'.decode(encoding)

where encoding is the encoding you want.

Note: support for keyword arguments was added in Python 2.7.


回答 5

我认为您实际上想要这样:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')

亚伦的答案是正确的,除了您需要知道哪个要使用编码。而且我相信Windows使用的是“ windows-1252”。仅当您的内容中包含一些不寻常的(非ASCII)字符时,这才有意义,但这将有所作为。

顺便说一句,它事实上事情的原因了Python转移到使用两种不同类型的二进制和文本数据:它不能神奇地将它们转换之间,因为它不知道编码,除非你告诉它!您唯一知道的方法是阅读Windows文档(或在此处阅读)。

I think you actually want this:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')

Aaron’s answer was correct, except that you need to know which encoding to use. And I believe that Windows uses ‘windows-1252’. It will only matter if you have some unusual (non-ASCII) characters in your content, but then it will make a difference.

By the way, the fact that it does matter is the reason that Python moved to using two different types for binary and text data: it can’t convert magically between them, because it doesn’t know the encoding unless you tell it! The only way YOU would know is to read the Windows documentation (or read it here).


回答 6

将Universal_newlines设置为True,即

command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]

Set universal_newlines to True, i.e.

command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]

回答 7

虽然@Aaron Maenpaa的答案有效,但最近有用户问道:

有没有更简单的方法?'fhand.read().decode("ASCII")' […] 太长了!

您可以使用:

command_stdout.decode()

decode()有一个标准参数

codecs.decode(obj, encoding='utf-8', errors='strict')

While @Aaron Maenpaa’s answer just works, a user recently asked:

Is there any more simply way? 'fhand.read().decode("ASCII")' […] It’s so long!

You can use:

command_stdout.decode()

decode() has default arguments; the equivalent codecs.decode() signature is:

codecs.decode(obj, encoding='utf-8', errors='strict')


回答 8

要将字节序列解释为文本,您必须知道相应的字符编码:

unicode_text = bytestring.decode(character_encoding)

例:

>>> b'\xc2\xb5'.decode('utf-8')
'µ'

ls命令可能会产生无法解释为文本的输出。Unix上的文件名可以是任何字节序列,但斜杠b'/'和零 除外b'\0'

>>> open(bytes(range(0x100)).translate(None, b'\0/'), 'w').close()

尝试使用utf-8编码对此类字节汤进行解码将引发UnicodeDecodeError

可能会更糟。 如果使用错误的不兼容编码,解码可能会默默失败并产生mojibake

>>> '—'.encode('utf-8').decode('cp1252')
'â€”'

数据已损坏,但是您的程序仍然不知道发生了故障。

通常,要使用的字符编码不会嵌入字节序列本身。您必须带外传达此信息。一些结果比其他结果更有可能,因此chardet存在可以猜测字符编码的模块。单个Python脚本可能在不同位置使用多种字符编码。


ls可以使用os.fsdecode() 即使对于无法解码的文件名也成功的函数将输出转换为Python字符串(在Unix上使用 sys.getfilesystemencoding()surrogateescape错误处理程序):

import os
import subprocess

output = os.fsdecode(subprocess.check_output('ls'))

要获取原始字节,可以使用os.fsencode()

如果传递universal_newlines=True参数,则subprocess用于 locale.getpreferredencoding(False)解码字节,例如,它可以 cp1252在Windows上使用。

要实时解码字节流, io.TextIOWrapper() 可以使用:example

不同的命令可能对其输出使用不同的字符编码,例如,dir内部命令(cmd)可能使用cp437。要解码其输出,可以显式传递编码(Python 3.6+):

output = subprocess.check_output('dir', shell=True, encoding='cp437')

文件名可能与os.listdir()(使用Windows Unicode API)不同(例如,'\xb6'可以用'\x14'—Python的cp437编解码器映射b'\x14'代替)来控制字符U + 0014而不是U + 00B6(¶)。要支持带有任意Unicode字符的文件名,请参阅将 PowerShell输出可能包含非ASCII Unicode字符解码为Python字符串。

To interpret a byte sequence as a text, you have to know the corresponding character encoding:

unicode_text = bytestring.decode(character_encoding)

Example:

>>> b'\xc2\xb5'.decode('utf-8')
'µ'

The ls command may produce output that can’t be interpreted as text. File names on Unix may be any sequence of bytes except slash b'/' and zero b'\0':

>>> open(bytes(range(0x100)).translate(None, b'\0/'), 'w').close()

Trying to decode such byte soup using utf-8 encoding raises UnicodeDecodeError.

It can be worse. The decoding may fail silently and produce mojibake if you use a wrong incompatible encoding:

>>> '—'.encode('utf-8').decode('cp1252')
'â€”'

The data is corrupted but your program remains unaware that a failure has occurred.

In general, what character encoding to use is not embedded in the byte sequence itself. You have to communicate this info out-of-band. Some outcomes are more likely than others, which is why the chardet module exists: it can guess the character encoding. A single Python script may use multiple character encodings in different places.


ls output can be converted to a Python string using the os.fsdecode() function, which succeeds even for undecodable filenames (it uses sys.getfilesystemencoding() and the surrogateescape error handler on Unix):

import os
import subprocess

output = os.fsdecode(subprocess.check_output('ls'))

To get the original bytes, you could use os.fsencode().

If you pass universal_newlines=True parameter then subprocess uses locale.getpreferredencoding(False) to decode bytes e.g., it can be cp1252 on Windows.

To decode the byte stream on-the-fly, io.TextIOWrapper() could be used: example.

Different commands may use different character encodings for their output e.g., dir internal command (cmd) may use cp437. To decode its output, you could pass the encoding explicitly (Python 3.6+):

output = subprocess.check_output('dir', shell=True, encoding='cp437')

The filenames may differ from os.listdir() (which uses Windows Unicode API) e.g., '\xb6' can be substituted with '\x14'—Python’s cp437 codec maps b'\x14' to control character U+0014 instead of U+00B6 (¶). To support filenames with arbitrary Unicode characters, see Decode PowerShell output possibly containing non-ASCII Unicode characters into a Python string
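
The io.TextIOWrapper() approach mentioned above can be sketched as follows. The io.BytesIO object is only a stand-in for a real binary stream such as a subprocess pipe, and the sample bytes are an assumption.

```python
import io

# Stand-in for a binary stream (for example a subprocess pipe).
binary_stream = io.BytesIO("first line\nµ second\n".encode("utf-8"))

# Wrap the binary stream so that iteration yields decoded str lines.
text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8")
decoded_lines = [line.rstrip("\n") for line in text_stream]

print(decoded_lines)
```

The wrapper decodes incrementally, so a multi-byte character split across two reads is still handled correctly, which a per-chunk bytes.decode() would not guarantee.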


回答 9

由于这个问题实际上是在询问subprocess输出,因此您可以使用更直接的方法,因为它Popen接受了encoding关键字(在Python 3.6+中):

>>> from subprocess import Popen, PIPE
>>> text = Popen(['ls', '-l'], stdout=PIPE, encoding='utf-8').communicate()[0]
>>> type(text)
str
>>> print(text)
total 0
-rw-r--r-- 1 wim badger 0 May 31 12:45 some_file.txt

其他用户的一般答案是将字节解码为文本:

>>> b'abcde'.decode()
'abcde'

没有参数,sys.getdefaultencoding()将被使用。如果您的数据不是sys.getdefaultencoding(),那么您必须在decode调用中显式指定编码:

>>> b'caf\xe9'.decode('cp1250')
'café'

Since this question is actually asking about subprocess output, you have a more direct approach available since Popen accepts an encoding keyword (in Python 3.6+):

>>> from subprocess import Popen, PIPE
>>> text = Popen(['ls', '-l'], stdout=PIPE, encoding='utf-8').communicate()[0]
>>> type(text)
str
>>> print(text)
total 0
-rw-r--r-- 1 wim badger 0 May 31 12:45 some_file.txt

The general answer for other users is to decode bytes to text:

>>> b'abcde'.decode()
'abcde'

With no argument, sys.getdefaultencoding() will be used. If your data is not sys.getdefaultencoding(), then you must specify the encoding explicitly in the decode call:

>>> b'caf\xe9'.decode('cp1250')
'café'

回答 10

如果您应该尝试以下操作decode()

AttributeError: 'str' object has no attribute 'decode'

您还可以直接在转换中指定编码类型:

>>> my_byte_str
b'Hello World'

>>> str(my_byte_str, 'utf-8')
'Hello World'

If you should get the following by trying decode():

AttributeError: 'str' object has no attribute 'decode'

You can also specify the encoding type straight in a cast:

>>> my_byte_str
b'Hello World'

>>> str(my_byte_str, 'utf-8')
'Hello World'

回答 11

当使用Windows系统中的数据(以\r\n行结尾)时,我的答案是

String = Bytes.decode("utf-8").replace("\r\n", "\n")

为什么?尝试使用多行Input.txt:

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8")
open("Output.txt", "w").write(String)

您所有的行尾都将加倍(以 \r\r\n),从而导致多余的空行。Python的文本读取函数通常会规范行尾,因此字符串只能使用\n。如果您从Windows系统接收二进制数据,Python将没有机会这样做。从而,

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8").replace("\r\n", "\n")
open("Output.txt", "w").write(String)

将复制您的原始文件。

When working with data from Windows systems (with \r\n line endings), my answer is

String = Bytes.decode("utf-8").replace("\r\n", "\n")

Why? Try this with a multiline Input.txt:

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8")
open("Output.txt", "w").write(String)

All your line endings will be doubled (to \r\r\n), leading to extra empty lines. Python’s text-read functions usually normalize line endings so that strings use only \n. If you receive binary data from a Windows system, Python does not have a chance to do that. Thus,

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8").replace("\r\n", "\n")
open("Output.txt", "w").write(String)

will replicate your original file.
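
The normalization step itself can be tried without any files. The sample bytes below are an assumption; splitlines() is shown as an alternative when a list of lines is wanted rather than one string.

```python
windows_bytes = b"first\r\nsecond\r\n"

decoded = windows_bytes.decode("utf-8")
normalized = decoded.replace("\r\n", "\n")

# splitlines() recognises \r\n, \n and \r alike and drops the terminators.
lines = decoded.splitlines()

print(repr(normalized), lines)
```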


回答 12

我做了一个清理清单的功能

def cleanLists(self, lista):
    lista = [x.strip() for x in lista]
    lista = [x.replace('\n', '') for x in lista]
    lista = [x.replace('\b', '') for x in lista]
    lista = [x.encode('utf8') for x in lista]
    lista = [x.decode('utf8') for x in lista]

    return lista

I made a function to clean a list

def cleanLists(self, lista):
    lista = [x.strip() for x in lista]
    lista = [x.replace('\n', '') for x in lista]
    lista = [x.replace('\b', '') for x in lista]
    lista = [x.encode('utf8') for x in lista]
    lista = [x.decode('utf8') for x in lista]

    return lista

回答 13

对于Python 3,这是一个更安全和Python的方法来从转换bytestring

def byte_to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes): # Check if it's in bytes
        print(bytes_or_str.decode('utf-8'))
    else:
        print("Object not of byte type")

byte_to_str(b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n')

输出:

total 0
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

For Python 3, this is a much safer and Pythonic approach to convert from byte to string:

def byte_to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes): # Check if it's in bytes
        print(bytes_or_str.decode('utf-8'))
    else:
        print("Object not of byte type")

byte_to_str(b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n')

Output:

total 0
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

回答 14

sys —系统特定的参数和功能

要从标准流写入二进制数据或从标准流读取二进制数据,请使用基础二进制缓冲区。例如,要将字节写入stdout,请使用sys.stdout.buffer.write(b'abc')

From sys — System-specific parameters and functions:

To write or read binary data from/to the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').


回答 15

def toString(string):
    try:
        return string.decode("utf-8")
    except (AttributeError, ValueError):
        # str has no decode() in Python 3; undecodable bytes raise ValueError
        return string

b = b'97.080.500'
s = '97.080.500'
print(toString(b))
print(toString(s))

def toString(string):
    try:
        return string.decode("utf-8")
    except (AttributeError, ValueError):
        # str has no decode() in Python 3; undecodable bytes raise ValueError
        return string

b = b'97.080.500'
s = '97.080.500'
print(toString(b))
print(toString(s))

回答 16

对于“运行shell命令并以文本而不是字节形式获取其输出” 的特定情况,在Python 3.7上,您应该使用subprocess.run并传入text=True(以及capture_output=True捕获输出)

command_result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
command_result.stdout  # is a `str` containing your program's stdout

text过去称为universal_newlines,并在Python 3.7中进行了更改(很好,为别名)。如果要支持3.7之前的Python版本,请传入universal_newlines=True而不是text=True

For your specific case of “run a shell command and get its output as text instead of bytes”, on Python 3.7 or later, you should use subprocess.run and pass in text=True (as well as capture_output=True to capture the output):

command_result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
command_result.stdout  # is a `str` containing your program's stdout

text used to be called universal_newlines, and was changed (well, aliased) in Python 3.7. If you want to support Python versions before 3.7, pass in universal_newlines=True instead of text=True
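
A portable way to try this is to run the current Python interpreter as the child process instead of ls. That substitution is an assumption made so the sketch also runs where ls is not available; check=True is an optional extra.

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,  # Python 3.7+
    text=True,            # Python 3.7+; universal_newlines=True on older versions
    check=True,           # raise CalledProcessError on a non-zero exit status
)
output = result.stdout    # already a str, no decode() needed

print(repr(output))
```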


回答 17

如果要转换任何字节,而不仅仅是将字符串转换为字节:

import base64
import json

with open("bytesfile", "rb") as infile:
    # b85encode() returns bytes; decode as ASCII to get a str
    text = base64.b85encode(infile.read()).decode("ascii")

with open("bytesfile", "rb") as infile:
    text2 = json.dumps(list(infile.read()))

但是,这不是很有效。它将2 MB的图片变成9 MB。

If you want to convert any bytes, not just string converted to bytes:

import base64
import json

with open("bytesfile", "rb") as infile:
    # b85encode() returns bytes; decode as ASCII to get a str
    text = base64.b85encode(infile.read()).decode("ascii")

with open("bytesfile", "rb") as infile:
    text2 = json.dumps(list(infile.read()))

This is not very efficient, however. It will turn a 2 MB picture into 9 MB.
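
To get a feel for the size overhead, the sketch below encodes the same arbitrary bytes three ways and checks that each form can be turned back into the original. The payload is made up for illustration, and base64.b64encode is added only as a point of comparison.

```python
import base64
import json

payload = bytes(range(256)) * 4  # 1024 arbitrary bytes, not valid UTF-8 text

as_b64 = base64.b64encode(payload).decode("ascii")
as_b85 = base64.b85encode(payload).decode("ascii")
as_json = json.dumps(list(payload))

# Each representation can be converted back into the original bytes.
assert base64.b64decode(as_b64) == payload
assert base64.b85decode(as_b85) == payload
assert bytes(json.loads(as_json)) == payload

sizes = {
    "raw": len(payload),
    "base64": len(as_b64),
    "base85": len(as_b85),
    "json": len(as_json),
}
print(sizes)
```

The JSON list of integers is by far the largest of the three, which matches the size blow-up described above.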


回答 18

尝试这个

bytes.fromhex('c3a9').decode('utf-8') 

try this

bytes.fromhex('c3a9').decode('utf-8') 

如何在Python中获取字符串的子字符串?

问题:如何在Python中获取字符串的子字符串?

有没有一种方法可以在Python中为字符串加上字符串,以从第三个字符到字符串的末尾获取新的字符串?

也许喜欢myString[2:end]吗?

如果离开第二部分意味着“直到最后”,而如果离开第一部分,它是否从头开始?

Is there a way to substring a string in Python, to get a new string from the third character to the end of the string?

Maybe like myString[2:end]?

If leaving the second part means ’till the end’, and if you leave the first part, does it start from the start?


回答 0

>>> x = "Hello World!"
>>> x[2:]
'llo World!'
>>> x[:2]
'He'
>>> x[:-2]
'Hello Worl'
>>> x[-2:]
'd!'
>>> x[2:-2]
'llo Worl'

Python称这个概念为“切片”,它不仅适用于字符串,还适用于更多的领域。看看这里的一个全面的介绍。

>>> x = "Hello World!"
>>> x[2:]
'llo World!'
>>> x[:2]
'He'
>>> x[:-2]
'Hello Worl'
>>> x[-2:]
'd!'
>>> x[2:-2]
'llo Worl'

Python calls this concept “slicing” and it works on more than just strings. Take a look here for a comprehensive introduction.


回答 1

只是为了完整性,没有其他人提到过它。数组切片的第三个参数是一个步骤。因此,反转字符串很简单:

some_string[::-1]

或选择其他字符为:

"H-e-l-l-o- -W-o-r-l-d"[::2] # outputs "Hello World"

在字符串中前进和后退的能力保持了从头到尾排列切片的一致性。

Just for completeness, as nobody else has mentioned it: the third parameter to a slice is the step. So reversing a string is as simple as:

some_string[::-1]

Or selecting alternate characters would be:

"H-e-l-l-o- -W-o-r-l-d"[::2] # outputs "Hello World"

The ability to step forwards and backwards through the string maintains consistency with being able to array slice from the start or end.


回答 2

Substr()通常(即PHP和Perl)以这种方式工作:

s = Substr(s, beginning, LENGTH)

因此参数为beginningLENGTH

但是Python的行为是不同的。它期望从开始到结束(!)。初学者很难发现这一点。因此,正确替换Substr(s,Beginning,LENGTH)是

s = s[ beginning : beginning + LENGTH]

Substr() normally (i.e. PHP and Perl) works this way:

s = Substr(s, beginning, LENGTH)

So the parameters are beginning and LENGTH.

But Python’s behaviour is different; it expects beginning and one after END (!). This is difficult to spot by beginners. So the correct replacement for Substr(s, beginning, LENGTH) is

s = s[ beginning : beginning + LENGTH]
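
If many such calls need porting, the replacement can be wrapped in a small helper. The name substr and its signature are hypothetical, and the sketch only covers non-negative arguments (PHP and Perl give negative values special meanings that are not reproduced here).

```python
def substr(s, beginning, length):
    """Return `length` characters of `s` starting at index `beginning`.

    Hypothetical helper using the (start, length) convention; only
    non-negative arguments are handled.
    """
    return s[beginning:beginning + length]


result = substr("Hello World!", 6, 5)
clipped = substr("abc", 1, 10)  # a length past the end is simply clipped

print(result, clipped)
```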

回答 3

实现此目的的一种常见方法是通过字符串切片。

MyString[a:b] 给您一个从索引a到(b-1)的子字符串。

A common way to achieve this is by string slicing.

MyString[a:b] gives you a substring from index a to (b - 1).


回答 4

这里似乎缺少一个示例:完整(浅)副本。

>>> x = "Hello World!"
>>> x
'Hello World!'
>>> x[:]
'Hello World!'
>>> x==x[:]
True
>>>

这是用于创建序列类型(而非插入字符串)的副本的常见用法[:]。浅表复制列表,请参阅无明显原因的Python列表切片语法

One example seems to be missing here: full (shallow) copy.

>>> x = "Hello World!"
>>> x
'Hello World!'
>>> x[:]
'Hello World!'
>>> x==x[:]
True
>>>

[:] is a common idiom for making a shallow copy of a sequence such as a list. For an immutable string it gains nothing, and CPython typically returns the same object. See “Python list slice syntax used for no obvious reason”.


回答 5

有没有一种方法可以在Python中为字符串加上字符串,以从第3个字符到字符串的末尾获取新的字符串?

也许喜欢myString[2:end]吗?

是的,如果您将名称()分配或绑定end到常量单例,这实际上是可行的None

>>> end = None
>>> myString = '1234567890'
>>> myString[2:end]
'34567890'

切片符号具有3个重要参数:

  • start
  • stop
  • step

如果未指定,则默认为None-但我们可以显式传递它们:

>>> stop = step = None
>>> start = 2
>>> myString[start:stop:step]
'34567890'

如果离开第二部分意味着“直到最后”,那么如果离开第一部分,它是否从头开始?

是的,例如:

>>> start = None
>>> stop = 2
>>> myString[start:stop:step]
'12'

请注意,我们在切片中包括了开始,但是我们仅上至(不包括)停止。

当step为时None,默认情况下切片将1用于该步骤。如果您使用负整数执行操作,则Python足够聪明,可以从头到尾进行操作。

>>> myString[::-1]
'0987654321'

我在对“解释切片符号问题”的回答中会详细解释切片符号。

Is there a way to substring a string in Python, to get a new string from the 3rd character to the end of the string?

Maybe like myString[2:end]?

Yes, this actually works if you assign, or bind, the name end to the constant singleton None:

>>> end = None
>>> myString = '1234567890'
>>> myString[2:end]
'34567890'

Slice notation has 3 important arguments:

  • start
  • stop
  • step

Their defaults when not given are None – but we can pass them explicitly:

>>> stop = step = None
>>> start = 2
>>> myString[start:stop:step]
'34567890'

If leaving the second part means ’till the end’, if you leave the first part, does it start from the start?

Yes, for example:

>>> start = None
>>> stop = 2
>>> myString[start:stop:step]
'12'

Note that we include start in the slice, but we only go up to, and not including, stop.

When step is None, by default the slice uses 1 for the step. If you step with a negative integer, Python is smart enough to go from the end to the beginning.

>>> myString[::-1]
'0987654321'

I explain slice notation in great detail in my answer to Explain slice notation Question.
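
One way to see how the None defaults are resolved is slice.indices(), which returns the concrete start, stop and step for a sequence of a given length. This is a small sketch using the same sample string as above.

```python
my_string = "1234567890"

# indices(length) fills in the None defaults for that length.
forward = slice(2, None).indices(len(my_string))
backward = slice(None, None, -1).indices(len(my_string))

# Indexing with a slice object is the same as using the colon notation.
same = my_string[slice(2, None)] == my_string[2:]

print(forward, backward, same)
```

With a negative step the defaults flip: the start becomes the last index and the stop becomes "one before the first".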


回答 6

除了“结束”之外,您已经准备就绪。这称为切片符号。您的示例应为:

new_sub_string = myString[2:]

如果省略第二个参数,则它隐式为字符串的结尾。

You’ve got it right there except for “end”. It’s called slice notation. Your example should read:

new_sub_string = myString[2:]

If you leave out the second parameter it is implicitly the end of the string.


回答 7

我想在讨论中添加两点:

  1. 您可以None改为在空白处使用“从头开始”或“到末尾”来指定:

    'abcde'[2:None] == 'abcde'[2:] == 'cde'

    这在不能提供空格作为参数的函数中特别有用:

    def substring(s, start, end):
        """Return the part of string `s` from index `start` up to,
        but not including, index `end`.
    
        Examples
        --------
        >>> substring('abcde', 0, 3)
        'abc'
        >>> substring('abcde', 1, None)
        'bcde'
        """
        return s[start:end]
  2. Python具有切片对象:

    idx = slice(2, None)
    'abcde'[idx] == 'abcde'[2:] == 'cde'

I would like to add two points to the discussion:

  1. You can use None instead of an empty space to specify “from the start” or “to the end”:

    'abcde'[2:None] == 'abcde'[2:] == 'cde'
    

    This is particularly helpful in functions, where you can’t provide an empty space as an argument:

    def substring(s, start, end):
        """Return the part of string `s` from index `start` up to,
        but not including, index `end`.
    
        Examples
        --------
        >>> substring('abcde', 0, 3)
        'abc'
        >>> substring('abcde', 1, None)
        'bcde'
        """
        return s[start:end]
    
  2. Python has slice objects:

    idx = slice(2, None)
    'abcde'[idx] == 'abcde'[2:] == 'cde'
    

回答 8

如果myString包含以偏移量6开始且长度为9的帐号,则可以通过以下方式提取该帐号: acct = myString[6:][:9]

如果OP接受,他们可能想尝试一下,

myString[2:][:999999]

它可以正常工作-不会引发任何错误,也不会发生默认的“字符串填充”。

If myString contains an account number that begins at offset 6 and has length 9, then you can extract the account number this way: acct = myString[6:][:9].

If the OP accepts that, they might want to try, in an experimental fashion,

myString[2:][:999999]

It works – no error is raised, and no default ‘string padding’ occurs.
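
The difference between slicing and plain indexing for out-of-range positions can be seen in a short sketch; the sample string is an assumption.

```python
my_string = "0123456789"

clamped = my_string[2:][:999999]  # the stop is clamped to the string length
empty = my_string[50:60]          # entirely out of range: an empty string

try:
    my_string[50]                 # plain indexing is not clamped
    index_error_raised = False
except IndexError:
    index_error_raised = True

print(repr(clamped), repr(empty), index_error_raised)
```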


回答 9

也许我错过了,但是在此页面上找不到原始问题的完整答案,因为这里没有进一步讨论变量。所以我不得不继续寻找。

由于尚未允许我发表评论,因此让我在这里添加我的结论。我确定访问此页面时,我不是唯一对此感兴趣的人:

 >>>myString = 'Hello World'
 >>>end = 5

 >>>myString[2:end]
 'llo'

如果您离开第一部分,您会得到

 >>>myString[:end]
 'Hello' 

如果在中间也留下了:,则会得到最简单的子字符串,它是第5个字符(计数从0开始,因此在这种情况下为空白):

 >>>myString[end]
 ' '

Maybe I missed it, but I couldn’t find a complete answer on this page to the original question(s) because variables are not further discussed here. So I had to go on searching.

Since I’m not yet allowed to comment, let me add my conclusion here. I’m sure I was not the only one interested in it when accessing this page:

 >>>myString = 'Hello World'
 >>>end = 5

 >>>myString[2:end]
 'llo'

If you leave the first part, you get

 >>>myString[:end]
 'Hello' 

And if you left the : in the middle as well you got the simplest substring, which would be the 5th character (count starting with 0, so it’s the blank in this case):

 >>>myString[end]
 ' '

回答 10

好吧,我遇到了需要将PHP脚本转换为Python的情况,并且它有许多用法substr(string, beginning, LENGTH)
如果选择Python,string[beginning:end]则必须计算大量的结束索引,因此更简单的方法是使用string[beginning:][:length],这为我省去了很多麻烦。

Well, I had a situation where I needed to translate a PHP script to Python, and it had many usages of substr(string, beginning, LENGTH).
If I had used Python’s string[beginning:end], I would have had to calculate a lot of end indexes, so the easier way was to use string[beginning:][:length]. It saved me a lot of trouble.


回答 11

使用硬编码的索引本身可能是一团糟。

为了避免这种情况,Python提供了一个内置对象slice()

string = "my company has 1000$ on profit, but I lost 500$ gambling."

如果我们想知道我剩下多少钱。

正常解决方案:

final = int(string[15:19]) - int(string[43:46])
print(final)
>>>500

使用切片:

EARNINGS = slice(15, 19)
LOSSES = slice(43, 46)
final = int(string[EARNINGS]) - int(string[LOSSES])
print(final)
>>>500

使用切片可以获得可读性。

Using hardcoded indexes itself can be a mess.

In order to avoid that, Python offers a built-in object slice().

string = "my company has 1000$ on profit, but I lost 500$ gambling."

Suppose we want to know how much money I have left.

Normal solution:

final = int(string[15:19]) - int(string[43:46])
print(final)
>>>500

Using slices:

EARNINGS = slice(15, 19)
LOSSES = slice(43, 46)
final = int(string[EARNINGS]) - int(string[LOSSES])
print(final)
>>>500

Using slice objects improves readability.


如何在Python中小写一个字符串?

问题:如何在Python中小写一个字符串?

有没有一种方法可以将字符串从大写,甚至部分大写转换为小写?

例如,“公里”→“公里”。

Is there a way to convert a string from uppercase, or even part uppercase to lowercase?

For example, “Kilometers” → “kilometers”.


回答 0

用途.lower()-例如:

s = "Kilometer"
print(s.lower())

官方2.x文档在这里:str.lower()
官方3.x文档在这里:str.lower()

Use .lower() – For example:

s = "Kilometer"
print(s.lower())

The official 2.x documentation is here: str.lower()
The official 3.x documentation is here: str.lower()


回答 1

如何在Python中将字符串转换为小写?

有什么办法可以将整个用户输入的字符串从大写甚至部分大写转换为小写?

例如公里->公里

规范的Python方式是

>>> 'Kilometers'.lower()
'kilometers'

但是,如果目的是进行不区分大小写的匹配,则应使用大小写折叠:

>>> 'Kilometers'.casefold()
'kilometers'

原因如下:

>>> "Maße".casefold()
'masse'
>>> "Maße".lower()
'maße'
>>> "MASSE" == "Maße"
False
>>> "MASSE".lower() == "Maße".lower()
False
>>> "MASSE".casefold() == "Maße".casefold()
True

这是Python 3中的str方法,但是在Python 2中,您需要查看PyICU或py2casefold- 几个答案在此解决

Unicode Python 3

Python 3将纯字符串文字处理为unicode:

>>> string = 'Километр'
>>> string
'Километр'
>>> string.lower()
'километр'

Python 2,纯字符串文字是字节

在Python 2中,将以下内容粘贴到外壳中,使用以下命令将文字编码为字节字符串 utf-8

并且lower不映射字节会知道的任何更改,因此我们得到相同的字符串。

>>> string = 'Километр'
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.lower()
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.lower()
Километр

在脚本中,Python将反对非ascii(从Python 2.5开始,在Python 2.4中为警告)字节,该字节位于未给出编码的字符串中,因为预期的编码将是模棱两可的。有关更多信息,请参阅文档PEP 263中的Unicode操作方法。

使用Unicode文字,而不是str文字

因此,我们需要一个unicode字符串来处理此转换,只需使用unicode字符串文字即可轻松完成此操作,该字符串文字可以用u前缀消除歧义(请注意,该u前缀在Python 3中也适用):

>>> unicode_literal = u'Километр'
>>> print(unicode_literal.lower())
километр

请注意,字节与字节完全不同str-转义字符'\u'后跟2字节宽度,或这些unicode字母的16位表示形式:

>>> unicode_literal
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> unicode_literal.lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'

现在,如果我们仅使用a的形式str,则需要将其转换为unicode。Python的Unicode类型是一种通用编码格式,相对于大多数其他编码而言,它具有许多优点。我们可以使用unicode构造函数或str.decode编解码器方法将转换strunicode

>>> unicode_from_string = unicode(string, 'utf-8') # "encoding" unicode from string
>>> print(unicode_from_string.lower())
километр
>>> string_to_unicode = string.decode('utf-8') 
>>> print(string_to_unicode.lower())
километр
>>> unicode_from_string == string_to_unicode == unicode_literal
True

两种方法都转换为unicode类型-并与unicode_literal相同。

最佳做法,使用Unicode

建议始终使用Unicode文本

软件仅应在内部使用Unicode字符串,并在输出时转换为特定的编码。

必要时可以回编码

但是,要使小写字母恢复为type str,请utf-8再次将python字符串编码为:

>>> print string
Километр
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.decode('utf-8')
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower().encode('utf-8')
'\xd0\xba\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.decode('utf-8').lower().encode('utf-8')
километр

因此,在Python 2中,Unicode可以编码为Python字符串,而Python字符串可以解码为Unicode类型。

How to convert string to lowercase in Python?

Is there any way to convert an entire user inputted string from uppercase, or even part uppercase to lowercase?

E.g. Kilometers –> kilometers

The canonical Pythonic way of doing this is

>>> 'Kilometers'.lower()
'kilometers'

However, if the purpose is to do case insensitive matching, you should use case-folding:

>>> 'Kilometers'.casefold()
'kilometers'

Here’s why:

>>> "Maße".casefold()
'masse'
>>> "Maße".lower()
'maße'
>>> "MASSE" == "Maße"
False
>>> "MASSE".lower() == "Maße".lower()
False
>>> "MASSE".casefold() == "Maße".casefold()
True

This is a str method in Python 3, but in Python 2, you’ll want to look at the PyICU or py2casefold – several answers address this here.
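For case-insensitive matching, the comparison can be wrapped in a tiny helper. This is a sketch for Python 3 only; the function name `equals_ignore_case` is made up for illustration.

```python
def equals_ignore_case(left, right):
    """Compare two strings without regard to case, using case folding."""
    return left.casefold() == right.casefold()


print(equals_ignore_case("MASSE", "Maße"))             # True
print(equals_ignore_case("Kilometers", "kilometres"))  # False
```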

Unicode Python 3

Python 3 handles plain string literals as unicode:

>>> string = 'Километр'
>>> string
'Километр'
>>> string.lower()
'километр'

Python 2, plain string literals are bytes

In Python 2, the below, pasted into a shell, encodes the literal as a string of bytes, using utf-8.

And lower doesn’t map any changes that bytes would be aware of, so we get the same string.

>>> string = 'Километр'
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.lower()
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.lower()
Километр

In scripts, Python will object to non-ascii (as of Python 2.5, and warning in Python 2.4) bytes being in a string with no encoding given, since the intended coding would be ambiguous. For more on that, see the Unicode how-to in the docs and PEP 263

Use Unicode literals, not str literals

So we need a unicode string to handle this conversion, accomplished easily with a unicode string literal, which disambiguates with a u prefix (and note the u prefix also works in Python 3):

>>> unicode_literal = u'Километр'
>>> print(unicode_literal.lower())
километр

Note that the bytes are completely different from the str bytes – the escape character is '\u' followed by the 2-byte width, or 16 bit representation of these unicode letters:

>>> unicode_literal
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> unicode_literal.lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'

Now if we only have it in the form of a str, we need to convert it to unicode. Python’s Unicode type is a universal encoding format that has many advantages relative to most other encodings. We can either use the unicode constructor or str.decode method with the codec to convert the str to unicode:

>>> unicode_from_string = unicode(string, 'utf-8') # "encoding" unicode from string
>>> print(unicode_from_string.lower())
километр
>>> string_to_unicode = string.decode('utf-8') 
>>> print(string_to_unicode.lower())
километр
>>> unicode_from_string == string_to_unicode == unicode_literal
True

Both methods convert to the unicode type, and the results are equal to unicode_literal.

Best Practice, use Unicode

It is recommended that you always work with text in Unicode.

Software should only work with Unicode strings internally, converting to a particular encoding on output.

Can encode back when necessary

However, to get the lowercase back in type str, encode the python string to utf-8 again:

>>> print string
Километр
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.decode('utf-8')
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower().encode('utf-8')
'\xd0\xba\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.decode('utf-8').lower().encode('utf-8')
километр

So in Python 2, Unicode can encode into Python strings, and Python strings can decode into the Unicode type.
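For comparison, here is roughly the same round trip in Python 3, where the byte string type is `bytes` and text is `str`. This is a sketch under that assumption; the variable names are illustrative.

```python
raw = "Километр".encode("utf-8")        # bytes, comparable to a Python 2 str

text = raw.decode("utf-8")              # bytes -> str (Unicode text)
lowered = text.lower()                  # case mapping works on text
raw_lowered = lowered.encode("utf-8")   # str -> bytes for output or storage

print(lowered)
print(raw.lower() == raw)               # True: bytes.lower() only touches ASCII
```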


回答 2

对于Python 2,这不适用于UTF-8中的非英语单词。在这种情况下decode('utf-8')可以帮助:

>>> s='Километр'
>>> print s.lower()
Километр
>>> print s.decode('utf-8').lower()
километр

With Python 2, this doesn’t work for non-English words in UTF-8. In this case decode('utf-8') can help:

>>> s='Километр'
>>> print s.lower()
Километр
>>> print s.decode('utf-8').lower()
километр

回答 3

另外,您可以覆盖一些变量:

s = input('UPPER CASE')
lower = s.lower()

如果您这样使用:

s = "Kilometer"
print(s.lower())     # kilometer
print(s)             # Kilometer

它会在被调用时起作用。

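Also, you can store the lowercased result in a variable:

s = input('UPPER CASE')
lower = s.lower()

Note that lower() returns a new string and leaves the original unchanged:

s = "Kilometer"
print(s.lower())     # kilometer
print(s)             # Kilometer

The conversion applies only to the value returned by the call.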


回答 4

请勿尝试,完全不推荐,请勿这样做:

import string
s='ABCD'
print(''.join([string.ascii_lowercase[string.ascii_uppercase.index(i)] for i in s]))

输出:

abcd

由于尚无人编写,因此您可以使用 swapcase(因此大写字母将变为小写,反之亦然)(并且在我刚才提到的情况下,应使用此字母(将大写转换为小写,将小写转换为大写)):

s='ABCD'
print(s.swapcase())

输出:

abcd

This is not recommended at all; do not do this:

import string
s='ABCD'
print(''.join([string.ascii_lowercase[string.ascii_uppercase.index(i)] for i in s]))

Output:

abcd

Since no one has mentioned it yet: swapcase converts uppercase letters to lowercase and lowercase letters to uppercase, so use it only when you want to swap the case of every letter rather than lowercase the whole string:

s='ABCD'
print(s.swapcase())

Output:

abcd

如何将文件逐行读取到列表中?

问题:如何将文件逐行读取到列表中?

如何在Python中读取文件的每一行并将每一行作为元素存储在列表中?

我想逐行读取文件并将每行追加到列表的末尾。

How do I read every line of a file in Python and store each line as an element in a list?

I want to read the file line by line and append each line to the end of the list.


回答 0

with open(filename) as f:
    content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
content = [x.strip() for x in content] 

回答 1

请参阅输入和输出

with open('filename') as f:
    lines = f.readlines()

或通过删除换行符:

with open('filename') as f:
    lines = [line.rstrip() for line in f]

See Input and Output:

with open('filename') as f:
    lines = f.readlines()

or with stripping the newline character:

with open('filename') as f:
    lines = [line.rstrip() for line in f]

回答 2

这比必要的要明确,但是可以满足您的要求。

with open("file.txt") as file_in:
    lines = []
    for line in file_in:
        lines.append(line)

This is more explicit than necessary, but does what you want.

with open("file.txt") as file_in:
    lines = []
    for line in file_in:
        lines.append(line)

回答 3

这将从文件中产生行的“数组”。

lines = tuple(open(filename, 'r'))

open返回可以迭代的文件。遍历文件时,您将从该文件中获取行。tuple可以使用一个迭代器,并从赋予它的迭代器中实例化一个元组实例。lines是从文件行创建的元组。

This will yield an “array” of lines from the file.

lines = tuple(open(filename, 'r'))

open returns a file which can be iterated over. When you iterate over a file, you get the lines from that file. tuple can take an iterator and instantiate a tuple instance for you from the iterator that you give it. lines is a tuple created from the lines of the file.


回答 4

如果要\n包括在内:

with open(fname) as f:
    content = f.readlines()

如果你不想 \n包括:

with open(fname) as f:
    content = f.read().splitlines()

If you want the \n included:

with open(fname) as f:
    content = f.readlines()

If you do not want \n included:

with open(fname) as f:
    content = f.read().splitlines()

回答 5

根据Python的文件对象方法,将文本文件转换为a的最简单方法list是:

with open('file.txt') as f:
    my_list = list(f)

如果只需要遍历文本文件行,则可以使用:

with open('file.txt') as f:
    for line in f:
       ...

旧答案:

使用withreadlines()

with open('file.txt') as f:
    lines = f.readlines()

如果您不关心关闭文件,则此单行代码有效:

lines = open('file.txt').readlines()

传统的方法:

f = open('file.txt') # Open file on read mode
lines = f.read().split("\n") # Create a list containing all lines
f.close() # Close file

According to Python’s Methods of File Objects, the simplest way to convert a text file into a list is:

with open('file.txt') as f:
    my_list = list(f)

If you just need to iterate over the text file lines, you can use:

with open('file.txt') as f:
    for line in f:
       ...

Old answer:

Using with and readlines() :

with open('file.txt') as f:
    lines = f.readlines()

If you don’t care about closing the file, this one-liner works:

lines = open('file.txt').readlines()

The traditional way:

f = open('file.txt') # Open file on read mode
lines = f.read().split("\n") # Create a list containing all lines
f.close() # Close file

回答 6

如建议的那样,您可以简单地执行以下操作:

with open('/your/path/file') as f:
    my_lines = f.readlines()

请注意,此方法有两个缺点:

1)您将所有行存储在内存中。在一般情况下,这是一个非常糟糕的主意。该文件可能非常大,并且可能会用完内存。即使它不大,也只是浪费内存。

2)不允许在阅读每行时对其进行处理。因此,如果您在此之后处理行,则效率不高(需要两次通过而不是一次)。

对于一般情况,更好的方法是:

with open('/your/path/file') as f:
    for line in f:
        process(line)

在任何需要的地方定义过程功能。例如:

def process(line):
    if 'save the world' in line.lower():
         superman.save_the_world()

Superman该类的实现留给您练习)。

这对于任何文件大小都可以很好地工作,而且您只需一遍就可以浏览文件。这通常是通用解析器的工作方式。

You could simply do the following, as has been suggested:

with open('/your/path/file') as f:
    my_lines = f.readlines()

Note that this approach has 2 downsides:

1) You store all the lines in memory. In the general case, this is a very bad idea. The file could be very large, and you could run out of memory. Even if it’s not large, it is simply a waste of memory.

2) This does not allow processing of each line as you read them. So if you process your lines after this, it is not efficient (requires two passes rather than one).

A better approach for the general case would be the following:

with open('/your/path/file') as f:
    for line in f:
        process(line)

Where you define your process function any way you want. For example:

def process(line):
    if 'save the world' in line.lower():
         superman.save_the_world()

(The implementation of the Superman class is left as an exercise for you).

This will work nicely for any file size and you go through your file in just 1 pass. This is typically how generic parsers will work.
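A runnable variant of the one-pass idea is sketched below. The helper `count_matches` is hypothetical, and `io.StringIO` stands in for a real file opened with open(...).

```python
import io


def count_matches(lines, phrase):
    """Count lines containing `phrase`, case-insensitively, in one pass."""
    total = 0
    for line in lines:            # only one line is held in memory at a time
        if phrase in line.lower():
            total += 1
    return total


# io.StringIO stands in for a real file object here.
sample = io.StringIO("Save the world\nnothing here\nplease SAVE THE WORLD\n")
print(count_matches(sample, "save the world"))   # 2
```

Any iterable of lines works, so the same function accepts a file object inside a with block.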


回答 7

数据入列表

假设我们有一个文本文件,其数据如下行所示,

文字档内容:

line 1
line 2
line 3
  • 在同一目录中打开cmd(右键单击鼠标,然后选择cmd或PowerShell)
  • 运行python并在解释器中编写:

Python脚本:

>>> with open("myfile.txt", encoding="utf-8") as file:
...     x = [l.strip() for l in file]
>>> x
['line 1', 'line 2', 'line 3']

使用追加:

x = []
with open("myfile.txt") as file:
    for l in file:
        x.append(l.strip())

要么:

>>> x = open("myfile.txt").read().splitlines()
>>> x
['line 1', 'line 2', 'line 3']

要么:

>>> x = open("myfile.txt").readlines()
>>> x
['line 1\n', 'line 2\n', 'line 3\n']

要么:

>>> y = [x.rstrip() for x in open("myfile.txt")]
>>> y
['line 1', 'line 2', 'line 3']


with open('testodiprova.txt', 'r', encoding='utf-8') as file:
    lines = file.read().splitlines()
print(lines)

with open('testodiprova.txt', 'r', encoding='utf-8') as file:
    lines = file.readlines()
print(lines)

Data into list

Assume that we have a text file with our data like in the following lines,

Text file content:

line 1
line 2
line 3
  • Open the cmd in the same directory (right-click the mouse and choose cmd or PowerShell)
  • Run python and in the interpreter write:

The Python script:

>>> with open("myfile.txt", encoding="utf-8") as file:
...     x = [l.strip() for l in file]
>>> x
['line 1', 'line 2', 'line 3']

Using append:

x = []
with open("myfile.txt") as file:
    for l in file:
        x.append(l.strip())

Or:

>>> x = open("myfile.txt").read().splitlines()
>>> x
['line 1', 'line 2', 'line 3']

Or:

>>> x = open("myfile.txt").readlines()
>>> x
['line 1\n', 'line 2\n', 'line 3\n']

Or:

>>> y = [x.rstrip() for x in open("myfile.txt")]
>>> y
['line 1', 'line 2', 'line 3']


with open('testodiprova.txt', 'r', encoding='utf-8') as file:
    lines = file.read().splitlines()
print(lines)

with open('testodiprova.txt', 'r', encoding='utf-8') as file:
    lines = file.readlines()
print(lines)

回答 8

要将文件读入列表,您需要做三件事:

  • 开启档案
  • 读取文件
  • 将内容存储为列表

幸运的是,Python使执行这些操作变得非常容易,因此将文件读入列表的最短方法是:

lst = list(open(filename))

但是,我将添加更多解释。

打开文件

我假设您要打开特定文件,并且不直接处理文件句柄(或类似文件的句柄)。在Python中打开文件最常用的功能是open,它在Python 2.7中带有一个强制参数和两个可选参数:

  • 文件名
  • 模式
  • 缓冲(我将在此答案中忽略此参数)

文件名应该是代表文件路径的字符串。例如:

open('afile')   # opens the file named afile in the current working directory
open('adir/afile')            # relative path (relative to the current working directory)
open('C:/users/aname/afile')  # absolute path (windows)
open('/usr/local/afile')      # absolute path (linux)

请注意,需要指定文件扩展名。这对于Windows用户尤其重要,因为在资源管理器中查看时,默认情况下会隐藏文件扩展名(例如.txt.doc等)。

第二个参数是moder默认情况下表示“只读”。这正是您所需要的。

但是,如果您确实要创建文件和/或写入文件,则在此处需要使用其他参数。如果您需要概述,这是一个很好的答案

要读取文件,您可以省略mode或明确传递它:

open(filename)
open(filename, 'r')

两者都将以只读模式打开文件。如果要在Windows上读取二进制文件,则需要使用模式rb

open(filename, 'rb')

在其他平台上,'b'(二进制模式)将被忽略。


现在,我已经显示了如何处理open文件,让我们谈谈您总是需要close再次使用它的事实。否则,它将保持对文件的打开文件句柄,直到进程退出(或Python丢弃文件句柄)。

虽然您可以使用:

f = open(filename)
# ... do stuff with f
f.close()

当两者之间存在openclose引发异常时,将无法关闭文件。您可以使用try和来避免这种情况finally

f = open(filename)
# nothing in between!
try:
    # do stuff with f
finally:
    f.close()

但是,Python提供了具有更漂亮语法的上下文管理器(但与上面opentry和几乎相同finally):

with open(filename) as f:
    # do stuff with f
# The file is always closed after the with-scope ends.

最后一种方法是建议使用 Python打开文件的方法!

读取文件

好的,您已经打开了文件,现在如何读取?

open函数返回一个file对象,它支持Python的迭代协议。每次迭代都会给你一行:

with open(filename) as f:
    for line in f:
        print(line)

这将打印文件的每一行。但是请注意,每行\n的末尾都将包含一个换行符(您可能要检查您的Python是否具有通用换行符支持 -否则\r\n在Windows或\rMac 上也可以作为换行符)。如果您不希望这样做,可以简单地删除最后符(或Windows中的最后两个字符):

with open(filename) as f:
    for line in f:
        print(line[:-1])

但是最后一行不一定有尾随换行符,因此不应使用它。可以检查它是否以尾随换行符结尾,如果是这样,请将其删除:

with open(filename) as f:
    for line in f:
        if line.endswith('\n'):
            line = line[:-1]
        print(line)

但是您可以简单地\n字符串末尾删除所有空格(包括字符),这还将删除所有其他尾随空格,因此如果这些空格很重要,则必须小心:

with open(filename) as f:
    for line in f:
        print(line.rstrip())

但是,如果这些行以\r\n(Windows“ newlines”)结尾,.rstrip()则也将注意\r

将内容存储为列表

现在您知道了如何打开文件并阅读它,是时候将内容存储在列表中了。最简单的选择是使用以下list功能:

with open(filename) as f:
    lst = list(f)

如果要删除尾随的换行符,可以使用列表理解:

with open(filename) as f:
    lst = [line.rstrip() for line in f]

或更简单:默认情况下.readlines()file对象的方法返回list以下行中的a:

with open(filename) as f:
    lst = f.readlines()

这还将包括尾随换行符,如果您不希望它们,我将推荐这种[line.rstrip() for line in f]方法,因为它避免了在内存中保留包含所有行的两个列表。

还有一个额外的选项来获得所需的输出,但是它是“次优的”:read将整个文件放在字符串中,然后在换行符上分割:

with open(filename) as f:
    lst = f.read().split('\n')

要么:

with open(filename) as f:
    lst = f.read().splitlines()

由于split不包含字符,因此它们会自动处理尾随的换行符。但是,它们并不理想,因为您将文件保留为字符串和内存中的行列表!

摘要

  • with open(...) as f在打开文件时使用,因为您无需自己关闭文件,即使发生某些异常也可以关闭文件。
  • file对象支持迭代协议,因此逐行读取文件就像一样简单for line in the_file_object:
  • 始终浏览文档以获取可用的功能/类。在大多数情况下,任务或至少一个或两个好的任务是一个完美的选择。在这种情况下,显而易见的选择是,readlines()但是如果您要在将行存储到列表中之前对其进行处理,我建议您使用简单的列表理解。

To read a file into a list you need to do three things:

  • Open the file
  • Read the file
  • Store the contents as list

Fortunately Python makes it very easy to do these things so the shortest way to read a file into a list is:

lst = list(open(filename))

However I’ll add some more explanation.

Opening the file

I assume that you want to open a specific file and you don’t deal directly with a file-handle (or a file-like-handle). The most commonly used function to open a file in Python is open. In Python 2.7 it takes one mandatory argument and two optional ones:

  • Filename
  • Mode
  • Buffering (I’ll ignore this argument in this answer)

The filename should be a string that represents the path to the file. For example:

open('afile')   # opens the file named afile in the current working directory
open('adir/afile')            # relative path (relative to the current working directory)
open('C:/users/aname/afile')  # absolute path (windows)
open('/usr/local/afile')      # absolute path (linux)

Note that the file extension needs to be specified. This is especially important for Windows users because file extensions like .txt or .doc, etc. are hidden by default when viewed in the explorer.

The second argument is the mode; it is r by default, which means “read-only”. That’s exactly what you need in your case.

But in case you actually want to create a file and/or write to a file you’ll need a different argument here. There is an excellent answer if you want an overview.

For reading a file you can omit the mode or pass it in explicitly:

open(filename)
open(filename, 'r')

Both will open the file in read-only mode. In case you want to read in a binary file on Windows you need to use the mode rb:

open(filename, 'rb')

On other platforms the 'b' (binary mode) is simply ignored.


Now that I’ve shown how to open the file, let’s talk about the fact that you always need to close it again. Otherwise it will keep an open file-handle to the file until the process exits (or Python garbage-collects the file-handle).

While you could use:

f = open(filename)
# ... do stuff with f
f.close()

That will fail to close the file when something between open and close throws an exception. You could avoid that by using a try and finally:

f = open(filename)
# nothing in between!
try:
    # do stuff with f
finally:
    f.close()

However Python provides context managers that have a prettier syntax (but for open it’s almost identical to the try and finally above):

with open(filename) as f:
    # do stuff with f
# The file is always closed after the with-scope ends.

The last approach is the recommended approach to open a file in Python!
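That the with statement closes the file even when an exception is raised can be verified directly. The sketch below uses a temporary file and a simulated error; both are assumptions made only for the demonstration.

```python
import os
import tempfile

# Throwaway file so the sketch can run anywhere.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    with open(path) as f:
        raise ValueError("simulated failure while the file is open")
except ValueError:
    pass

closed_after_error = f.closed   # the with block closed the file on the way out
os.remove(path)

print(closed_after_error)   # True
```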

Reading the file

Okay, you’ve opened the file, now how to read it?

The open function returns a file object and it supports Python’s iteration protocol. Each iteration will give you a line:

with open(filename) as f:
    for line in f:
        print(line)

This will print each line of the file. Note however that each line will contain a newline character \n at the end (you might want to check if your Python is built with universal newlines support; otherwise you could also have \r\n on Windows or \r on Mac as newlines). If you don’t want that, you could simply remove the last character (or the last two characters on Windows):

with open(filename) as f:
    for line in f:
        print(line[:-1])

But the last line doesn’t necessarily have a trailing newline, so one shouldn’t use that. One could check if it ends with a trailing newline and if so remove it:

with open(filename) as f:
    for line in f:
        if line.endswith('\n'):
            line = line[:-1]
        print(line)

But you could simply remove all whitespace (including the \n character) from the end of the string. This will also remove any other trailing whitespace, so you have to be careful if it is important:

with open(filename) as f:
    for line in f:
        print(line.rstrip())

However if the lines end with \r\n (Windows “newlines”) that .rstrip() will also take care of the \r!
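The different ways of trimming a line end can be compared side by side. The sample string below is made up to show what each call removes.

```python
line = "  indented text \t\r\n"

print(repr(line.rstrip()))        # '  indented text'
print(repr(line.rstrip("\n")))    # '  indented text \t\r'
print(repr(line.rstrip("\r\n")))  # '  indented text \t'
```

Passing the characters to strip keeps other trailing whitespace intact, which matters when spaces or tabs at the end of a line are significant.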

Store the contents as list

Now that you know how to open the file and read it, it’s time to store the contents in a list. The simplest option would be to use the list function:

with open(filename) as f:
    lst = list(f)

In case you want to strip the trailing newlines you could use a list comprehension instead:

with open(filename) as f:
    lst = [line.rstrip() for line in f]

Or even simpler: The .readlines() method of the file object by default returns a list of the lines:

with open(filename) as f:
    lst = f.readlines()

This will also include the trailing newline characters. If you don’t want them, I would recommend the [line.rstrip() for line in f] approach because it avoids keeping two lists containing all the lines in memory.

There’s an additional option to get the desired output, however it’s rather “suboptimal”: read the complete file in a string and then split on newlines:

with open(filename) as f:
    lst = f.read().split('\n')

or:

with open(filename) as f:
    lst = f.read().splitlines()

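These take care of the trailing newlines automatically because the split character isn’t included (although split('\n') leaves an empty string as the last element when the file ends with a newline). However, they are not ideal because you keep the file both as a string and as a list of lines in memory!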
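One difference between the two calls is worth checking before choosing between them. The string below stands in for the contents of a file that ends with a newline.

```python
text = "line 1\nline 2\n"   # typical file contents ending with a newline

print(text.split("\n"))     # ['line 1', 'line 2', '']
print(text.splitlines())    # ['line 1', 'line 2']
```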

Summary

  • Use with open(...) as f when opening files because you don’t need to take care of closing the file yourself and it closes the file even if some exception happens.
  • file objects support the iteration protocol so reading a file line-by-line is as simple as for line in the_file_object:.
  • Always browse the documentation for the available functions/classes. Most of the time there’s a perfect match for the task or at least one or two good ones. The obvious choice in this case would be readlines() but if you want to process the lines before storing them in the list I would recommend a simple list-comprehension.

回答 9

将文件中的行读入列表的简洁Python方式


首先,最重要的是,您应该专注于以高效且Python方式打开文件并读取其内容。这是我个人不喜欢的方式的一个示例:

infile = open('my_file.txt', 'r')  # Open the file for reading.

data = infile.read()  # Read the contents of the file.

infile.close()  # Close the file since we're done using it.

相反,我更喜欢以下打开文件进行读写的方法,因为它非常干净,并且在使用完文件后不需要关闭文件的额外步骤。在下面的语句中,我们将打开文件进行读取,并将其分配给变量“ infile”。一旦该语句中的代码运行完毕,该文件将自动关闭。

# Open the file for reading.
with open('my_file.txt', 'r') as infile:

    data = infile.read()  # Read the contents of the file into memory.

现在,我们需要集中精力将这些数据引入Python列表中,因为它们是可迭代的,高效的和灵活的。在您的情况下,理想的目标是将文本文件的每一行放入一个单独的元素中。为此,我们将使用splitlines()方法,如下所示:

# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()

最终产品:

# Open the file for reading.
with open('my_file.txt', 'r') as infile:

    data = infile.read()  # Read the contents of the file into memory.

# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()

测试我们的代码:

  • 文本文件的内容:
     A fost odatã ca-n povesti,
     A fost ca niciodatã,
     Din rude mãri împãrãtesti,
     O prea frumoasã fatã.
  • 打印报表以进行测试:
    print my_list  # Print the list.

    # Print each line in the list.
    for line in my_list:
        print line

    # Print the fourth element in this list.
    print my_list[3]
  • 输出(由于Unicode字符而外观不同):
     ['A fost odat\xc3\xa3 ca-n povesti,', 'A fost ca niciodat\xc3\xa3,',
     'Din rude m\xc3\xa3ri \xc3\xaemp\xc3\xa3r\xc3\xa3testi,', 'O prea
     frumoas\xc3\xa3 fat\xc3\xa3.']

     A fost odatã ca-n povesti,
     A fost ca niciodatã,
     Din rude mãri împãrãtesti,
     O prea frumoasã fatã.

     O prea frumoasã fatã.

Clean and Pythonic Way of Reading the Lines of a File Into a List


First and foremost, you should focus on opening your file and reading its contents in an efficient and pythonic way. Here is an example of the way I personally DO NOT prefer:

infile = open('my_file.txt', 'r')  # Open the file for reading.

data = infile.read()  # Read the contents of the file.

infile.close()  # Close the file since we're done using it.

Instead, I prefer the below method of opening files for both reading and writing as it is very clean, and does not require an extra step of closing the file once you are done using it. In the statement below, we’re opening the file for reading, and assigning it to the variable ‘infile.’ Once the code within this statement has finished running, the file will be automatically closed.

# Open the file for reading.
with open('my_file.txt', 'r') as infile:

    data = infile.read()  # Read the contents of the file into memory.

Now we need to focus on bringing this data into a Python List because they are iterable, efficient, and flexible. In your case, the desired goal is to bring each line of the text file into a separate element. To accomplish this, we will use the splitlines() method as follows:

# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()

The Final Product:

# Open the file for reading.
with open('my_file.txt', 'r') as infile:

    data = infile.read()  # Read the contents of the file into memory.

# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()

Testing Our Code:

  • Contents of the text file:
     A fost odatã ca-n povesti,
     A fost ca niciodatã,
     Din rude mãri împãrãtesti,
     O prea frumoasã fatã.
  • Print statements for testing purposes:
    print my_list  # Print the list.

    # Print each line in the list.
    for line in my_list:
        print line

    # Print the fourth element in this list.
    print my_list[3]
  • Output (different-looking because of unicode characters):
     ['A fost odat\xc3\xa3 ca-n povesti,', 'A fost ca niciodat\xc3\xa3,',
     'Din rude m\xc3\xa3ri \xc3\xaemp\xc3\xa3r\xc3\xa3testi,', 'O prea
     frumoas\xc3\xa3 fat\xc3\xa3.']

     A fost odatã ca-n povesti,
     A fost ca niciodatã,
     Din rude mãri împãrãtesti,
     O prea frumoasã fatã.

     O prea frumoasã fatã.

回答 10

在Python 3.4中引入,它pathlib具有从文件中读取文本的非常方便的方法,如下所示:

from pathlib import Path
p = Path('my_text_file')
lines = p.read_text().splitlines()

(该splitlines调用使它从包含文件全部内容的字符串变成文件中的行列表)。

pathlib有很多方便的地方。read_text简洁明了,您不必担心打开和关闭文件的麻烦。如果您需要一次性处理所有文件,那么这是一个不错的选择。

Introduced in Python 3.4, pathlib has a really convenient method for reading in text from files, as follows:

from pathlib import Path
p = Path('my_text_file')
lines = p.read_text().splitlines()

(The splitlines call is what turns it from a string containing the whole contents of the file to a list of lines in the file).

pathlib has a lot of handy conveniences in it. read_text is nice and concise, and you don’t have to worry about opening and closing the file. If all you need to do with the file is read it all in in one go, it’s a good choice.
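A self-contained variant is sketched below. It writes a small file into a temporary directory first, and it passes an explicit encoding; the file name and contents are assumptions for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    p = Path(tmp_dir) / "my_text_file"
    p.write_text("first\nsecond\n", encoding="utf-8")
    lines = p.read_text(encoding="utf-8").splitlines()

print(lines)   # ['first', 'second']
```

Naming the encoding avoids depending on the platform default, which can differ between systems.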


回答 11

通过对文件使用列表推导,这是另一个选择。

lines = [line.rstrip() for line in open('file.txt')]

这应该是一种更有效的方法,因为大部分工作都在Python解释器中完成。

Here’s one more option, using a list comprehension over the file:

lines = [line.rstrip() for line in open('file.txt')]

This should be a more efficient way, as most of the work is done inside the Python interpreter.


回答 12

f = open("your_file.txt",'r')
out = f.readlines() # will append in the list out

现在,变量out是您想要的列表(数组)。您可以这样做:

for line in out:
    print (line)

要么:

for line in f:
    print (line)

您将获得相同的结果。

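f = open("your_file.txt",'r')
out = f.readlines() # returns a list containing every line of the file

Now the variable out is a list of the lines. You can iterate over it:

for line in out:
    print (line)

Alternatively, skip readlines() and iterate over the file object directly:

for line in f:
    print (line)

Both loops print the same lines, but only if the second one runs on a freshly opened file: once readlines() has consumed the file, iterating over f yields nothing.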
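One caveat when mixing the two loops: a file object is an iterator, so it is exhausted after readlines(). The sketch below uses `io.StringIO` as a stand-in for a real file to show this.

```python
import io

f = io.StringIO("a\nb\n")   # stands in for open("your_file.txt")

out = f.readlines()
leftover = [line for line in f]   # the iterator is already at end of file

print(out)        # ['a\n', 'b\n']
print(leftover)   # []

f.seek(0)                         # rewind to read the data again
print([line for line in f])       # ['a\n', 'b\n']
```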


回答 13

使用Python 2和Python 3读写文本文件;它适用于Unicode

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Define data
lines = ['     A first string  ',
         'A Unicode sample: €',
         'German: äöüß']

# Write text file
with open('file.txt', 'w') as fp:
    fp.write('\n'.join(lines))

# Read text file
with open('file.txt', 'r') as fp:
    read_lines = fp.readlines()
    read_lines = [line.rstrip('\n') for line in read_lines]

print(lines == read_lines)

注意事项:

  • with是所谓的上下文管理器。确保打开的文件再次关闭。
  • 这里所有产生.strip().rstrip()将无法复制的解决方案都将lines剥夺空白。

通用文件结尾

.txt

更高级的文件写入/读取

对于您的应用程序,以下内容可能很重要:

  • 其他编程语言的支持
  • 读写性能
  • 紧凑度(文件大小)

另请参阅:数据序列化格式的比较

如果您想寻找一种制作配置文件的方法,则可能需要阅读我的简短文章《Python中的配置文件》

Read and write text files with Python 2 and Python 3; it works with Unicode

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Define data
lines = ['     A first string  ',
         'A Unicode sample: €',
         'German: äöüß']

# Write text file
with open('file.txt', 'w') as fp:
    fp.write('\n'.join(lines))

# Read text file
with open('file.txt', 'r') as fp:
    read_lines = fp.readlines()
    read_lines = [line.rstrip('\n') for line in read_lines]

print(lines == read_lines)

Things to notice:

  • with is a so-called context manager. It makes sure that the opened file is closed again.
  • All solutions here that simply call .strip() or .rstrip() will fail to reproduce the lines, as they also strip the white space.

Common file endings

.txt

More advanced file writing/reading

For your application, the following might be important:

  • Support by other programming languages
  • Reading/writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python.


回答 14

另一个选项是numpy.genfromtxt,例如:

import numpy as np
data = np.genfromtxt("yourfile.dat",delimiter="\n")

这将使dataNumPy数组具有与文件中一样多的行。

Another option is numpy.genfromtxt, for example:

import numpy as np
data = np.genfromtxt("yourfile.dat",delimiter="\n")

This will make data a NumPy array with as many rows as are in your file.


回答 15

如果您想从命令行或标准输入中读取文件,也可以使用以下fileinput模块:

# reader.py
import fileinput

content = []
for line in fileinput.input():
    content.append(line.strip())

fileinput.close()

像这样将文件传递给它:

$ python reader.py textfile.txt 

在此处阅读更多信息:http : //docs.python.org/2/library/fileinput.html

If you’d like to read a file from the command line or from stdin, you can also use the fileinput module:

# reader.py
import fileinput

content = []
for line in fileinput.input():
    content.append(line.strip())

fileinput.close()

Pass files to it like so:

$ python reader.py textfile.txt 

Read more here: http://docs.python.org/2/library/fileinput.html
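The same module can also be given file names directly instead of reading them from the command line. The sketch below creates a temporary file for the demonstration; the file contents are made up.

```python
import fileinput
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\n")
    path = tmp.name

# Passing files= explicitly avoids depending on sys.argv.
with fileinput.input(files=[path]) as stream:
    content = [line.strip() for line in stream]

os.remove(path)
print(content)   # ['alpha', 'beta']
```

Using fileinput.input() as a context manager (Python 3.2 and later) closes the input without a separate fileinput.close() call.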


回答 16

最简单的方法

一种简单的方法是:

  1. 以字符串形式读取整个文件
  2. 逐行拆分字符串

在一行中,这将给出:

lines = open('C:/path/file.txt').read().splitlines()

但是,这是一种非常低效的方式,因为它将在内存中存储2个版本的内容(对于小文件来说可能不是一个大问题,但仍然如此)。[谢谢马克·阿默里]。

有2种更简单的方法:

  1. 使用文件作为迭代器
lines = list(open('C:/path/file.txt'))
# ... or if you want to have a list without EOL characters
lines = [l.rstrip() for l in open('C:/path/file.txt')]
  1. 如果您使用的是Python 3.4或更高版本,请更好地pathlib为文件创建路径,以供程序中的其他操作使用:
from pathlib import Path
file_path = Path("C:/path/file.txt") 
lines = file_path.read_text().splitlines()
# ... or ... 
lines = [l.rstrip() for l in file_path.open()]

The simplest way to do it

A simple way is to:

  1. Read the whole file as a string
  2. Split the string line by line

In one line, that would give:

lines = open('C:/path/file.txt').read().splitlines()

However, this is a rather inefficient way to do it, as it stores two versions of the content in memory (probably not a big issue for small files, but still). [Thanks Mark Amery].

There are 2 easier ways:

  1. Using the file as an iterator
lines = list(open('C:/path/file.txt'))
# ... or if you want to have a list without EOL characters
# (rstrip() also removes other trailing whitespace; use rstrip('\n') to drop only the newline)
lines = [l.rstrip() for l in open('C:/path/file.txt')]
  2. If you are using Python 3.4 or above, better use pathlib to create a path for your file that you could use for other operations in your program:
from pathlib import Path
file_path = Path("C:/path/file.txt") 
lines = file_path.read_text().splitlines()  # read_text() requires Python 3.5+
# ... or ... 
lines = [l.rstrip() for l in file_path.open()]

Answer 17

Just use the splitlines() method. Here is an example.

inp = "file.txt"
with open(inp) as data:  # the with block closes the file when done
    dat = data.read()
lst = dat.splitlines()
print(lst)  # in Python 2: print lst

In the output you will have the list of lines.


Answer 18

If you are faced with a very large file and want to read it faster (imagine you are in a Topcoder/Hackerrank coding competition), you might read a considerably bigger chunk of lines into a memory buffer at one time, rather than just iterating line by line at file level.

buffersize = 2**16
with open(path) as f: 
    while True:
        lines_buffer = f.readlines(buffersize)
        if not lines_buffer:
            break
        for line in lines_buffer:
            process(line)
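
The snippet above leaves path and process undefined. A self-contained sketch of the same idea could look like the following; the helper name iter_line_chunks, the temporary sample file, and the line counter standing in for process(line) are all assumptions made for illustration.

```python
import os
import tempfile


def iter_line_chunks(path, hint=2**16):
    """Yield lists of complete lines; each list totals roughly `hint` characters."""
    with open(path) as f:
        while True:
            chunk = f.readlines(hint)  # stops once the lines read so far exceed the hint
            if not chunk:
                break
            yield chunk


# Build a small throwaway file so the sketch is runnable (hypothetical data)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join(f"line {i}\n" for i in range(1000)))
    sample_path = tmp.name

total_lines = 0
chunk_count = 0
for chunk in iter_line_chunks(sample_path, hint=256):
    chunk_count += 1
    total_lines += len(chunk)  # stand-in for calling process(line) on each line

os.remove(sample_path)
print(total_lines)
```

Whether this is actually faster than plain iteration depends on the file and the platform, so it is worth measuring before relying on it.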

Answer 19

The easiest ways to do that with some additional benefits are:

lines = list(open('filename'))

or

lines = tuple(open('filename'))

or

lines = set(open('filename'))

In the case of set, remember that the line order is not preserved and duplicate lines are dropped.

Below I added an important supplement from @MarkAmery:

Since you’re not calling .close on the file object nor using a with statement, in some Python implementations the file may not get closed after reading and your process will leak an open file handle.

In CPython (the normal Python implementation that most people use), this isn’t a problem since the file object will get immediately garbage-collected and this will close the file, but it’s nonetheless generally considered best practice to do something like:

with open('filename') as f: lines = list(f) 

to ensure that the file gets closed regardless of what Python implementation you’re using.
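
As a rough illustration of how the three containers differ, here is a minimal sketch that uses the with form recommended above. The temporary file and its contents are made up for the example.

```python
import os
import tempfile

# Throwaway sample file with a duplicated line (hypothetical data)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("b\na\nb\n")
    sample_path = tmp.name

with open(sample_path) as f:
    as_list = list(f)    # keeps order and duplicates
with open(sample_path) as f:
    as_tuple = tuple(f)  # same content, but immutable
with open(sample_path) as f:
    as_set = set(f)      # unordered, duplicates removed

os.remove(sample_path)
print(as_list, as_tuple, as_set)
```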


Answer 20

Use this:

import pandas as pd
data = pd.read_csv(filename) # You can also add parameters such as header, sep, etc.
array = data.values

data is a DataFrame; its values attribute returns the underlying ndarray. You can also get a list by using array.tolist(). Note that read_csv treats the first line as a header unless you pass header=None.


Answer 21

Outline and Summary

With a filename, handling the file from a Path(filename) object, or directly with open(filename) as f, do one of the following:

  • list(fileinput.input(filename))
  • using with path.open() as f, call f.readlines()
  • list(f)
  • path.read_text().splitlines()
  • path.read_text().splitlines(keepends=True)
  • iterate over fileinput.input or f and list.append each line one at a time
  • pass f to a bound list.extend method
  • use f in a list comprehension

I explain the use-case for each below.

In Python, how do I read a file line-by-line?

This is an excellent question. First, let’s create some sample data:

from pathlib import Path
Path('filename').write_text('foo\nbar\nbaz')

File objects are lazy iterators, so just iterate over them.

filename = 'filename'
with open(filename) as f:
    for line in f:
        line # do something with the line

Alternatively, if you have multiple files, use fileinput.input, another lazy iterator. With just one file:

import fileinput

for line in fileinput.input(filename): 
    line # process the line

or for multiple files, pass it a list of filenames:

for line in fileinput.input([filename]*2): 
    line # process the line

Again, f and fileinput.input above both are (or return) lazy iterators. You can only use an iterator one time, so to provide functional code while avoiding verbosity I’ll use the slightly more terse fileinput.input(filename) where appropriate from here on.

In Python, how do I read a file line-by-line into a list?

Ah but you want it in a list for some reason? I’d avoid that if possible. But if you insist… just pass the result of fileinput.input(filename) to list:

list(fileinput.input(filename))

Another direct answer is to call f.readlines, which returns the contents of the file (up to an optional hint number of characters, so you could break this up into multiple lists that way).

You can get to this file object two ways. One way is to pass the filename to the open builtin:

filename = 'filename'

with open(filename) as f:
    f.readlines()

or using the new Path object from the pathlib module (which I have become quite fond of, and will use from here on):

from pathlib import Path

path = Path(filename)

with path.open() as f:
    f.readlines()

list will also consume the file iterator and return a list – a quite direct method as well:

with path.open() as f:
    list(f)

If you don’t mind reading the entire text into memory as a single string before splitting it, you can do this as a one-liner with the Path object and the splitlines() string method. By default, splitlines removes the newlines:

path.read_text().splitlines()

If you want to keep the newlines, pass keepends=True:

path.read_text().splitlines(keepends=True)

I want to read the file line by line and append each line to the end of the list.

Now this is a bit silly to ask for, given that we’ve demonstrated the end result easily with several methods. But you might need to filter or operate on the lines as you make your list, so let’s humor this request.

Using list.append would allow you to filter or operate on each line before you append it:

line_list = []
for line in fileinput.input(filename):
    line_list.append(line)

line_list

Using list.extend would be a bit more direct, and perhaps useful if you have a preexisting list:

line_list = []
line_list.extend(fileinput.input(filename))
line_list

Or more idiomatically, we could instead use a list comprehension, and map and filter inside it if desirable:

[line for line in fileinput.input(filename)]
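
To make the "map and filter inside it" remark concrete, here is a small sketch. The sample file contents and the particular cleaning rules (strip whitespace, upper-case, skip blank lines and lines starting with #) are assumptions chosen only for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample data with a blank line and a comment line
    sample = Path(tmp_dir) / "sample.txt"
    sample.write_text("foo\n\n  bar  \n# comment\nbaz")

    with sample.open() as f:
        cleaned = [
            line.strip().upper()                              # "map" step
            for line in f
            if line.strip() and not line.startswith("#")      # "filter" step
        ]

print(cleaned)  # ['FOO', 'BAR', 'BAZ']
```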

Or even more directly, to close the circle, just pass it to list to create a new list directly without operating on the lines:

list(fileinput.input(filename))

Conclusion

You’ve seen many ways to get lines from a file into a list, but I’d recommend you avoid materializing large quantities of data into a list and instead use Python’s lazy iteration to process the data if possible.

That is, prefer fileinput.input or with path.open() as f.


Answer 22

If there are also empty lines in the document, I like to read in the content and pass it through filter to drop empty string elements:

with open(myFile, "r") as f:
    excludeFileContent = list(filter(None, f.read().splitlines()))
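
A quick way to see what filter(None, ...) does here, using an in-memory string instead of a file (the sample text is made up): it drops only items that are falsy, so truly empty strings disappear while whitespace-only lines survive.

```python
text = "first\n\nsecond\n   \nthird\n"  # hypothetical file content

all_lines = text.splitlines()
non_empty = list(filter(None, all_lines))                   # drops '' only
non_blank = [line for line in all_lines if line.strip()]    # also drops whitespace-only lines

print(non_empty)
print(non_blank)
```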

Answer 23

You could also use the loadtxt function in NumPy. It checks for fewer conditions than genfromtxt, so it may be faster. Like genfromtxt, it parses values as floats by default, so it suits numeric files.

import numpy
data = numpy.loadtxt(filename, delimiter="\n")

Answer 24

I like to use the following. Reading the lines immediately.

contents = []
for line in open(filepath, 'r').readlines():
    contents.append(line.strip())

Or using list comprehension:

contents = [line.strip() for line in open(filepath, 'r').readlines()]

Answer 25

I would try one of the methods mentioned below. The example file that I use has the name dummy.txt. You can find the file here. I presume that the file is in the same directory as the code (you can change fpath to include the proper file name and folder path).

In both of the examples below, the list that you want is given by lst.

1.> First method:

fpath = 'dummy.txt'
with open(fpath, "r") as f: lst = [line.rstrip('\n \t') for line in f]

print(lst)
# ['THIS IS LINE1.', 'THIS IS LINE2.', 'THIS IS LINE3.', 'THIS IS LINE4.']

2.> In the second method, one can use the csv.reader function from the csv module in the Python standard library:

import csv
fpath = 'dummy.txt'
with open(fpath) as csv_file:
    # the delimiter must be a single character; a tab is used here on the assumption that the lines contain no tabs
    csv_reader = csv.reader(csv_file, delimiter='\t')
    lst = [row[0] for row in csv_reader] 

print(lst)
# ['THIS IS LINE1.', 'THIS IS LINE2.', 'THIS IS LINE3.', 'THIS IS LINE4.']

You can use either of the two methods. The time taken to create lst is almost equal in the two methods.


Answer 26

Here is a Python 3 helper class that I use to simplify file I/O:

import os

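# handle files using a callback method, prevents repetition
# (named _FileIO__file_handler so that the name-mangled reference __file_handler inside class FileIO resolves to it)
def _FileIO__file_handler(file_path, mode, callback = lambda f: None):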
  f = open(file_path, mode)
  try:
    return callback(f)
  except Exception as e:
    raise IOError("Failed to %s file" % ["write to", "read from"][mode.lower() in "r rb r+".split(" ")])
  finally:
    f.close()


class FileIO:
  # return the contents of a file
  def read(file_path, mode = "r"):
    return __file_handler(file_path, mode, lambda rf: rf.read())

  # get the lines of a file
  def lines(file_path, mode = "r", filter_fn = lambda line: len(line) > 0):
    return [line for line in FileIO.read(file_path, mode).strip().split("\n") if filter_fn(line)]

  # create or update a file (NOTE: can also be used to replace a file's original content)
  def write(file_path, new_content, mode = "w"):
    return __file_handler(file_path, mode, lambda wf: wf.write(new_content))

  # delete a file (if it exists)
  def delete(file_path):
    return os.remove(file_path) if os.path.isfile(file_path) else None

You would then use the FileIO.lines function, like this:

file_ext_lines = FileIO.lines("./path/to/file.ext")
for i, line in enumerate(file_ext_lines):
  print("Line {}: {}".format(i + 1, line))

Remember that the mode ("r" by default) and filter_fn (checks for empty lines by default) parameters are optional.

You could even remove the read, write and delete methods and just leave the FileIO.lines, or even turn it into a separate method called read_lines.


Answer 27

Command line version

#!/bin/python3
import os
import sys
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
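# join with a path separator; the file name is resolved relative to the script's directory
filename = os.path.join(dname, sys.argv[1])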
arr = open(filename).read().split("\n") 
print(arr)

Run with:

python3 somefile.py input_file_name.txt

Why is it string.join(list) instead of list.join(string)?

Question: Why is it string.join(list) instead of list.join(string)?

This has always confused me. It seems like this would be nicer:

my_list = ["Hello", "world"]
print(my_list.join("-"))
# Produce: "Hello-world"

Than this:

my_list = ["Hello", "world"]
print("-".join(my_list))
# Produce: "Hello-world"

Is there a specific reason it is like this?


Answer 0

It’s because any iterable can be joined (e.g., list, tuple, dict, set), but the result and the “joiner” must be strings.

For example:

'_'.join(['welcome', 'to', 'stack', 'overflow'])
'_'.join(('welcome', 'to', 'stack', 'overflow'))
'welcome_to_stack_overflow'

Using something other than strings will raise the following error:

TypeError: sequence item 0: expected str instance, int found
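
As a small sketch of both points (any iterable works, but every item must be a string), the following uses made-up sample data; the variable names are only for illustration.

```python
words = ["welcome", "to", "stack", "overflow"]

# Any iterable of strings can be joined
from_list = "_".join(words)
from_tuple = "_".join(tuple(words))
from_generator = "_".join(w for w in words)
from_dict = "_".join({"welcome": 1, "to": 2})  # iterating a dict yields its keys

# Non-string items are rejected
try:
    "_".join([1, 2, 3])
except TypeError as exc:
    error_message = str(exc)

# Converting the items first is the usual workaround
from_ints = "_".join(str(n) for n in [1, 2, 3])
print(from_list, from_dict, from_ints)
```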


Answer 1

This was discussed in the String methods… finally thread in the Python-Dev archive, and was accepted by Guido. This thread began in Jun 1999, and str.join was included in Python 1.6, which was released in Sep 2000 (and supported Unicode). Python 2.0 (which supported str methods including join) was released in Oct 2000.

  • There were four options proposed in this thread:
    • str.join(seq)
    • seq.join(str)
    • seq.reduce(str)
    • join as a built-in function
  • Guido wanted to support not only lists and tuples, but all sequences/iterables.
  • seq.reduce(str) is difficult for newcomers.
  • seq.join(str) introduces an unexpected dependency from sequences to str/unicode.
  • join() as a built-in function would support only specific data types, so using the built-in namespace is not good. If join() supported many data types, creating an optimized implementation would be difficult; if implemented using the __add__ method, it would be O(n²).
  • The separator string (sep) should not be omitted. Explicit is better than implicit.
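
To illustrate the __add__ point in the list above: a generic join built only on the + operator works for any type that supports concatenation, but every + builds a new object that copies everything accumulated so far, which is where the quadratic cost comes from. The function join_with_add below is a hypothetical sketch, not something proposed in the thread.

```python
from functools import reduce


def join_with_add(sep, items):
    """Hypothetical generic join that relies only on the + operator."""
    items = list(items)
    if not items:
        return sep[:0]  # empty value of the same type as sep
    # Each + creates a fresh object holding a copy of the accumulated result
    return reduce(lambda acc, item: acc + sep + item, items)


str_joined = join_with_add("-", ["a", "b", "c"])
bytes_joined = join_with_add(b"-", [b"a", b"b"])
list_joined = join_with_add([0], [[1], [2], [3]])
print(str_joined, bytes_joined, list_joined)
```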

There are no other reasons offered in this thread.

Here are some additional thoughts (my own, and my friend’s):

  • Unicode support was coming, but it was not final. At that time, UTF-8 was the most likely candidate to replace UCS2/4. To calculate the total buffer length of UTF-8 strings, join needs to know the character coding rule.
  • At that time, Python had already decided on a common sequence interface rule where a user could create a sequence-like (iterable) class. But Python didn’t support extending built-in types until 2.2. At that time it was difficult to provide a basic iterable class (which is mentioned in another comment).

Guido’s decision is recorded in a historical mail, deciding on str.join(seq):

Funny, but it does seem right! Barry, go for it…
–Guido van Rossum


Answer 2

Because the join() method is in the string class, instead of the list class?

I agree it looks funny.

See http://www.faqs.org/docs/diveintopython/odbchelper_join.html:

Historical note. When I first learned Python, I expected join to be a method of a list, which would take the delimiter as an argument. Lots of people feel the same way, and there’s a story behind the join method. Prior to Python 1.6, strings didn’t have all these useful methods. There was a separate string module which contained all the string functions; each function took a string as its first argument. The functions were deemed important enough to put onto the strings themselves, which made sense for functions like lower, upper, and split. But many hard-core Python programmers objected to the new join method, arguing that it should be a method of the list instead, or that it shouldn’t move at all but simply stay a part of the old string module (which still has lots of useful stuff in it). I use the new join method exclusively, but you will see code written either way, and if it really bothers you, you can use the old string.join function instead.

— Mark Pilgrim, Dive into Python


Answer 3

I agree that it’s counterintuitive at first, but there’s a good reason. Join can’t be a method of a list because:

  • it must work for different iterables too (tuples, generators, etc.)
  • it must have different behavior between different types of strings.

There are actually two join methods (Python 3.0):

>>> b"".join
<built-in method join of bytes object at 0x00A46800>
>>> "".join
<built-in method join of str object at 0x00A28D40>

If join were a method of a list, then it would have to inspect its arguments to decide which one of them to call. And you can’t join bytes and str together, so the way they have it now makes sense.
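
A minimal sketch of the two join methods side by side, with made-up sample values: the separator's type picks the implementation, and mixing the two types fails.

```python
str_result = ", ".join(["a", "b"])        # str separator, str items
bytes_result = b", ".join([b"a", b"b"])   # bytes separator, bytes items

# A str separator cannot join bytes items
try:
    ", ".join([b"a", b"b"])
except TypeError as exc:
    mixed_error = type(exc).__name__

print(str_result, bytes_result, mixed_error)
```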


Answer 4

Why is it string.join(list) instead of list.join(string)?

This is because join is a “string” method! It creates a string from any iterable. If we stuck the method on lists, what about when we have iterables that aren’t lists?

What if you have a tuple of strings? If this were a list method, you would have to convert every such iterable of strings to a list before you could join the elements into a single string! For example:

some_strings = ('foo', 'bar', 'baz')

Let’s roll our own list join method:

class OurList(list): 
    def join(self, s):
        return s.join(self)

And to use it, note that we have to first create a list from each iterable to join the strings in that iterable, wasting both memory and processing power:

>>> l = OurList(some_strings) # step 1, create our list
>>> l.join(', ') # step 2, use our list join method!
'foo, bar, baz'

So we see we have to add an extra step to use our list method, instead of just using the builtin string method:

>>> ' | '.join(some_strings) # a single step!
'foo | bar | baz'

Performance Caveat for Generators

The algorithm Python uses to create the final string with str.join actually has to pass over the iterable twice, so if you provide it a generator expression, it has to materialize it into a list first before it can create the final string.

Thus, while passing around generators is usually better than list comprehensions, str.join is an exception:

>>> import timeit
>>> min(timeit.repeat(lambda: ''.join(str(i) for i in range(10) if i)))
3.839168446022086
>>> min(timeit.repeat(lambda: ''.join([str(i) for i in range(10) if i])))
3.339879313018173

Nevertheless, the str.join operation is still semantically a “string” operation, so it still makes sense to have it on the str object rather than on miscellaneous iterables.
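
The two-pass idea described in the caveat can be sketched in pure Python. This is only an illustration of the general technique, not CPython's actual implementation; it uses bytes so that a preallocated bytearray can stand in for the output buffer, and the function name two_pass_join is made up.

```python
def two_pass_join(sep, iterable):
    """Illustrative two-pass join for bytes (not CPython's actual code)."""
    items = list(iterable)  # a generator must be materialized to be traversed twice
    if not items:
        return b""

    # Pass 1: compute the exact size of the result
    total = sum(len(item) for item in items) + len(sep) * (len(items) - 1)

    # Pass 2: copy each piece into a buffer allocated once
    buffer = bytearray(total)
    pos = 0
    for index, item in enumerate(items):
        if index:  # the separator goes between items, not before the first
            buffer[pos:pos + len(sep)] = sep
            pos += len(sep)
        buffer[pos:pos + len(item)] = item
        pos += len(item)
    return bytes(buffer)


joined = two_pass_join(b", ", (s for s in [b"foo", b"bar", b"baz"]))
print(joined)
```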


Answer 5

Think of it as the natural orthogonal operation to split.

I understand why it is applicable to anything iterable and so can’t easily be implemented just on list.

For readability, I’d like to see it in the language but I don’t think that is actually feasible – if iterability were an interface then it could be added to the interface but it is just a convention and so there’s no central way to add it to the set of things which are iterable.


Answer 6

Primarily because the result of a someString.join() is a string.

The sequence (list or tuple or whatever) doesn’t appear in the result, just a string. Because the result is a string, it makes sense as a method of a string.


Answer 7

The "-" in "-".join(my_list) declares that you are producing a string by joining the elements of a list. It is result-oriented (just for easy memorization and understanding).

I made an exhaustive cheat sheet of string methods for your reference.

string_methonds_44 = {
    'convert': ['join','split', 'rsplit','splitlines', 'partition', 'rpartition'],
    'edit': ['replace', 'lstrip', 'rstrip', 'strip'],
    'search': ['endswith', 'startswith', 'count', 'index', 'find','rindex', 'rfind',],
    'condition': ['isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isnumeric','isidentifier',
                  'islower','istitle', 'isupper','isprintable', 'isspace', ],
    'text': ['lower', 'upper', 'capitalize', 'title', 'swapcase',
             'center', 'ljust', 'rjust', 'zfill', 'expandtabs','casefold'],
    'encode': ['translate', 'maketrans', 'encode'],
    'format': ['format', 'format_map']}

Answer 8

Neither is nice.

string.join(xs, delimit) means that the string module is aware of the existence of a list, which it has no business knowing about, since the string module only works with strings.

list.join(delimit) is a bit nicer because we’re so used to strings being a fundamental type (and linguistically speaking, they are). However, this means that join needs to be dispatched dynamically, because in the arbitrary context of a.split("\n") the Python compiler might not know what a is and will need to look it up (analogously to a vtable lookup), which is expensive if you do it many times.

If the Python runtime compiler knows that list is a built-in module, it can skip the dynamic lookup and encode the intent into the bytecode directly, whereas otherwise it needs to dynamically resolve “join” of “a”, which may be up several layers of inheritance per call (since between calls, the meaning of join may have changed, because Python is a dynamic language).

Sadly, this is the ultimate flaw of abstraction: no matter what abstraction you choose, it will only make sense in the context of the problem you’re trying to solve. As such, you can never have a consistent abstraction that doesn’t become inconsistent with underlying ideologies as you start gluing them together without wrapping them in a view that is consistent with your ideology. Knowing this, Python’s approach is more flexible since it’s cheaper; it’s up to you to pay more to make it look “nicer”, either by making your own wrapper or your own preprocessor.


Answer 9

The variables my_list and "-" are both objects. Specifically, they’re instances of the classes list and str, respectively. The join function belongs to the class str. Therefore, the syntax "-".join(my_list) is used because the object "-" is taking my_list as an input.