Tag archive: string

Join a list of strings in Python and wrap each string in quotation marks

Question: Join a list of strings in Python and wrap each string in quotation marks

I’ve got:

words = ['hello', 'world', 'you', 'look', 'nice']

I want to have:

'"hello", "world", "you", "look", "nice"'

What’s the easiest way to do this with Python?


Answer 0

>>> words = ['hello', 'world', 'you', 'look', 'nice']
>>> ', '.join('"{0}"'.format(w) for w in words)
'"hello", "world", "you", "look", "nice"'

Answer 1

You may also perform a single format call:

>>> words = ['hello', 'world', 'you', 'look', 'nice']
>>> '"{0}"'.format('", "'.join(words))
'"hello", "world", "you", "look", "nice"'

Update: some benchmarking (performed on a 2009 MacBook Pro):

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; ', '.join('"{0}"'.format(w) for w in words)""").timeit(1000)
0.32559704780578613

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; '"{}"'.format('", "'.join(words))""").timeit(1000)
0.018904924392700195

So it seems that format is actually quite expensive.

Update 2: following @JCode’s comment, adding a map to ensure that join will work on non-string items (Python 2.7.12):

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; ', '.join('"{0}"'.format(w) for w in words)""").timeit(1000)
0.08646488189697266

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; '"{}"'.format('", "'.join(map(str, words)))""").timeit(1000)
0.04855608940124512

>>> timeit.Timer("""words = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 100; ', '.join('"{0}"'.format(w) for w in words)""").timeit(1000)
0.17348504066467285

>>> timeit.Timer("""words = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 100; '"{}"'.format('", "'.join(map(str, words)))""").timeit(1000)
0.06372308731079102

Answer 2

You can try this:

str(words)[1:-1]
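
One caveat worth checking before relying on this: str() of a list calls repr() on each element, so the strings normally come out in single quotes rather than the double quotes asked for in the question. A minimal sketch comparing the two, reusing the question's words list (the variable names via_str and via_join are only illustrative):

```python
words = ['hello', 'world', 'you', 'look', 'nice']

# str(list) calls repr() on each element, which normally picks single quotes.
via_str = str(words)[1:-1]

# Explicit quoting gives full control over the quote character.
via_join = ', '.join('"{}"'.format(w) for w in words)

print(via_str)   # 'hello', 'world', 'you', 'look', 'nice'
print(via_join)  # "hello", "world", "you", "look", "nice"
```

So the slice trick is fine when single quotes are acceptable, but it does not reproduce the exact output requested.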

Answer 3

>>> ', '.join(['"%s"' % w for w in words])

Answer 4

An updated version of @jamylak’s answer with f-strings (for Python 3.6+). I’ve used backticks because the string is used in a SQL script.

keys = ['foo', 'bar' , 'omg']
', '.join(f'`{k}`' for k in keys)
# result: '`foo`, `bar`, `omg`'

Split a string into 2 parts at the last occurrence of a separator

Question: Split a string into 2 parts at the last occurrence of a separator

I would like to know if there is any built-in function in Python to break a string into 2 parts, based on the last occurrence of a separator.

For example, consider the string “a b c,d,e,f”. After splitting on the separator “,”, I want the output to be

“a b c,d,e” and “f”.

I know how to manipulate the string to get the desired output, but I want to know if there is a built-in function in Python.


Answer 0

Use rpartition(s). It does exactly that.

You can also use rsplit(s, 1).
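
A short sketch of how the two calls differ, using the question's string. Both are methods on the string itself and take the separator as their argument; what happens when the separator is absent is probably the main practical difference.

```python
text = "a b c,d,e,f"

# rpartition always returns a 3-tuple: (head, separator, tail).
head, sep, tail = text.rpartition(',')

# rsplit with maxsplit=1 returns a list of at most two items.
parts = text.rsplit(',', 1)

# When the separator is missing, rpartition puts the whole string last,
# while rsplit returns a one-element list.
no_sep_partition = "abc".rpartition(',')
no_sep_rsplit = "abc".rsplit(',', 1)

print(head, tail)
print(parts)
```

rpartition is convenient when you want to unpack into a fixed number of names; rsplit is more natural if you may want more than one split later.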


Answer 1

>>> "a b c,d,e,f".rsplit(',',1)
['a b c,d,e', 'f']

Answer 2

You can split a string by the last occurrence of a separator with rsplit:

Returns a list of the words in the string, separated by the delimiter string (starting from right).

To split by the last comma:

>>> "a b c,d,e,f".rsplit(',', 1)
['a b c,d,e', 'f']

How do I unescape a backslash-escaped string?

Question: How do I unescape a backslash-escaped string?

Suppose I have a string which is a backslash-escaped version of another string. Is there an easy way, in Python, to unescape the string? I could, for example, do:

>>> escaped_str = '"Hello,\\nworld!"'
>>> raw_str = eval(escaped_str)
>>> print raw_str
Hello,
world!
>>> 

However that involves passing a (possibly untrusted) string to eval() which is a security risk. Is there a function in the standard lib which takes a string and produces a string with no security implications?


Answer 0

>>> print '"Hello,\\nworld!"'.decode('string_escape')
"Hello,
world!"

Answer 1

You can use ast.literal_eval which is safe:

Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None. (END)

Like this:

>>> import ast
>>> escaped_str = '"Hello,\\nworld!"'
>>> print ast.literal_eval(escaped_str)
Hello,
world!

Answer 2

All given answers will break on general Unicode strings. The following works for Python3 in all cases, as far as I can tell:

from codecs import encode, decode
sample = u'mon€y\\nröcks'
result = decode(encode(sample, 'latin-1', 'backslashreplace'), 'unicode-escape')
print(result)

As outlined in the comments, you can also use the literal_eval method from the ast module like so:

import ast
sample = u'mon€y\\nröcks'
print(ast.literal_eval(F'"{sample}"'))

Or like this when your string really contains a string literal (including the quotes):

import ast
sample = u'"mon€y\\nröcks"'
print(ast.literal_eval(sample))

However, if you are uncertain whether the input string uses double or single quotes as delimiters, or when you cannot assume it to be properly escaped at all, then literal_eval may raise a SyntaxError while the encode/decode method will still work.
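
To make the encode/decode round trip easier to reuse, it can be wrapped in a small function. This is only a sketch: the name unescape is hypothetical, and the body is the same two codec calls shown above, with comments on why the Latin-1 step matters.

```python
from codecs import decode, encode

def unescape(text):
    """Interpret backslash escapes in text without calling eval().

    Hypothetical helper name; it only wraps the encode/decode round trip.
    """
    # Characters outside Latin-1 become \uXXXX escapes, so nothing is lost
    # when unicode-escape later reads the remaining bytes back as Latin-1.
    raw = encode(text, 'latin-1', 'backslashreplace')
    return decode(raw, 'unicode-escape')

sample = 'mon€y\\nröcks'
result = unescape(sample)
print(result)
```

Unlike literal_eval, this does not care which quote characters appear in the input, because the input is never parsed as a Python literal.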


Answer 3

In python 3, str objects don’t have a decode method and you have to use a bytes object. ChristopheD’s answer covers python 2.

# create a `bytes` object from a `str`
my_str = "Hello,\\nworld"
# (pick an encoding suitable for your str, e.g. 'latin1')
my_bytes = my_str.encode("utf-8")

# or directly
my_bytes = b"Hello,\\nworld"

print(my_bytes.decode("unicode_escape"))
# Hello,
# world

Write to a CSV file line by line in Python

Question: Write to a CSV file line by line in Python

I have data which is being accessed via http request and is sent back by the server in a comma separated format, I have the following code :

site= 'www.example.com'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
soup = soup.get_text()
text=str(soup)

The content of text is as follows:

april,2,5,7
may,3,5,8
june,4,7,3
july,5,6,9

How can I save this data into a CSV file? I know I can do something along the lines of the following to iterate line by line:

import StringIO
s = StringIO.StringIO(text)
for line in s:

But I’m unsure how to now properly write each line to CSV.

EDIT: Thanks for the feedback. As suggested, the solution was rather simple and can be seen below.

Solution:

import StringIO
s = StringIO.StringIO(text)
with open('fileName.csv', 'w') as f:
    for line in s:
        f.write(line)
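
The snippet above is Python 2 (the StringIO module no longer exists in Python 3). A rough Python 3 equivalent, assuming text already holds the comma-separated lines; the function name save_lines and the temporary output path are illustrative choices, not part of the original solution:

```python
import io
import os
import tempfile

def save_lines(text, path):
    """Write already comma-separated text to path, one line at a time."""
    # newline='' stops Python from translating the line endings in text.
    with open(path, 'w', newline='') as f:
        for line in io.StringIO(text):
            f.write(line)

text = "april,2,5,7\nmay,3,5,8\njune,4,7,3\njuly,5,6,9\n"
out_path = os.path.join(tempfile.mkdtemp(), 'fileName.csv')
save_lines(text, out_path)
```

Since the text is already valid CSV, no csv module is needed here; it is copied through unchanged.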

Answer 0

General way:

##text=List of strings to be written to file
with open('csvfile.csv','wb') as file:
    for line in text:
        file.write(line)
        file.write('\n')

OR

Using CSV writer :

import csv
with open(<path to output_csv>, "wb") as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for line in data:
            writer.writerow(line)

OR

Simplest way:

f = open('csvfile.csv','w')
f.write('hi there\n') #Give your csv text here.
## Python will convert \n to os.linesep
f.close()
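
The 'wb' mode in the snippets above is the Python 2 convention. In Python 3 the csv module expects a text-mode file opened with newline=''. A sketch under that assumption, using the sample data from the question (the temporary output path is only there to keep the example self-contained):

```python
import csv
import os
import tempfile

text = "april,2,5,7\nmay,3,5,8\njune,4,7,3\njuly,5,6,9"

# Parse the downloaded text into rows (lists of fields).
rows = list(csv.reader(text.splitlines()))

out_path = os.path.join(tempfile.mkdtemp(), 'csvfile.csv')  # hypothetical location
# Python 3: text mode plus newline='', as the csv module documentation advises.
with open(out_path, 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    writer.writerows(rows)
```

Going through csv.reader and csv.writer only pays off if fields may need quoting; for data that is already clean, writing the lines directly is simpler.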

Answer 1

You could just write to the file as you would write any normal file.

with open('csvfile.csv','wb') as file:
    for l in text:
        file.write(l)
        file.write('\n')

If it is a list of lists, you can use the built-in csv module directly:

import csv

with open("csvfile.csv", "wb") as file:
    writer = csv.writer(file)
    writer.writerows(text)

Answer 2

I would simply write each line to a file, since it’s already in a CSV format:

write_file = "output.csv"
with open(write_file, "w") as output:
    for line in text:
        output.write(line + '\n')

I can’t recall how to write lines with line-breaks at the moment, though :p

Also, you might like to take a look at this answer about write(), writelines(), and '\n'.


Answer 3

To complement the previous answers, I whipped up a quick class to write to CSV files. It makes it easier to manage and close open files and achieve consistency and cleaner code if you have to deal with multiple files.

import csv
import os

class CSVWriter():

    filename = None
    fp = None
    writer = None

    def __init__(self, filename):
        self.filename = filename
        self.fp = open(self.filename, 'w', encoding='utf8')
        self.writer = csv.writer(self.fp, delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')

    def close(self):
        self.fp.close()

    def write(self, elems):
        self.writer.writerow(elems)

    def size(self):
        return os.path.getsize(self.filename)

    def fname(self):
        return self.filename

Example usage:

mycsv = CSVWriter('/tmp/test.csv')
mycsv.write((12,'green','apples'))
mycsv.write((7,'yellow','bananas'))
mycsv.close()
print("Written %d bytes to %s" % (mycsv.size(), mycsv.fname()))

Have fun
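
If forgetting to call close() is a concern, the same idea can be expressed as a context manager. This is a hedged variant, not the answer's class: ManagedCSVWriter is a made-up name, and the temporary path replaces /tmp/test.csv only so the sketch runs anywhere.

```python
import csv
import os
import tempfile

class ManagedCSVWriter:
    """Hypothetical variant of the class above that closes itself."""

    def __init__(self, filename):
        self.filename = filename
        self.fp = open(filename, 'w', encoding='utf8', newline='')
        self.writer = csv.writer(self.fp, delimiter=';', quotechar='"',
                                 quoting=csv.QUOTE_ALL, lineterminator='\n')

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Close the file even if the body of the with block raised.
        self.fp.close()
        return False  # do not swallow exceptions

    def write(self, elems):
        self.writer.writerow(elems)

path = os.path.join(tempfile.mkdtemp(), 'test.csv')
with ManagedCSVWriter(path) as mycsv:
    mycsv.write((12, 'green', 'apples'))
    mycsv.write((7, 'yellow', 'bananas'))
```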


Answer 4

What about this:

with open("your_csv_file.csv", "w") as f:
    f.write("\n".join(text))

From the documentation of str.join(): return a string which is the concatenation of the strings in iterable. The separator between elements is the string providing this method.


String concatenation vs. string substitution in Python

Question: String concatenation vs. string substitution in Python

In Python, the where and when of using string concatenation versus string substitution eludes me. As the string concatenation has seen large boosts in performance, is this (becoming more) a stylistic decision rather than a practical one?

For a concrete example, how should one handle construction of flexible URIs:

DOMAIN = 'http://stackoverflow.com'
QUESTIONS = '/questions'

def so_question_uri_sub(q_num):
    return "%s%s/%d" % (DOMAIN, QUESTIONS, q_num)

def so_question_uri_cat(q_num):
    return DOMAIN + QUESTIONS + '/' + str(q_num)

Edit: There have also been suggestions about joining a list of strings and for using named substitution. These are variants on the central theme, which is, which way is the Right Way to do it at which time? Thanks for the responses!


Answer 0

Concatenation is (significantly) faster according to my machine. But stylistically, I’m willing to pay the price of substitution if performance is not critical. Well, and if I need formatting, there’s no need to even ask the question… there’s no option but to use interpolation/templating.

>>> import timeit
>>> def so_q_sub(n):
...  return "%s%s/%d" % (DOMAIN, QUESTIONS, n)
...
>>> so_q_sub(1000)
'http://stackoverflow.com/questions/1000'
>>> def so_q_cat(n):
...  return DOMAIN + QUESTIONS + '/' + str(n)
...
>>> so_q_cat(1000)
'http://stackoverflow.com/questions/1000'
>>> t1 = timeit.Timer('so_q_sub(1000)','from __main__ import so_q_sub')
>>> t2 = timeit.Timer('so_q_cat(1000)','from __main__ import so_q_cat')
>>> t1.timeit(number=10000000)
12.166618871951641
>>> t2.timeit(number=10000000)
5.7813972166853773
>>> t1.timeit(number=1)
1.103492206766532e-05
>>> t2.timeit(number=1)
8.5206360154188587e-06

>>> def so_q_tmp(n):
...  return "{d}{q}/{n}".format(d=DOMAIN,q=QUESTIONS,n=n)
...
>>> so_q_tmp(1000)
'http://stackoverflow.com/questions/1000'
>>> t3= timeit.Timer('so_q_tmp(1000)','from __main__ import so_q_tmp')
>>> t3.timeit(number=10000000)
14.564135316080637

>>> def so_q_join(n):
...  return ''.join([DOMAIN,QUESTIONS,'/',str(n)])
...
>>> so_q_join(1000)
'http://stackoverflow.com/questions/1000'
>>> t4= timeit.Timer('so_q_join(1000)','from __main__ import so_q_join')
>>> t4.timeit(number=10000000)
9.4431309007150048
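
These numbers come from an old interpreter, so it may be worth re-running the comparison yourself. A minimal Python 3 harness along the same lines; the f-string variant so_q_fstr is an addition that was not part of the original measurements, and no timings are quoted here because they depend on the machine:

```python
import timeit

DOMAIN = 'http://stackoverflow.com'
QUESTIONS = '/questions'

def so_q_sub(n):
    return "%s%s/%d" % (DOMAIN, QUESTIONS, n)

def so_q_cat(n):
    return DOMAIN + QUESTIONS + '/' + str(n)

def so_q_fstr(n):
    # f-strings need Python 3.6+; not part of the original comparison.
    return f"{DOMAIN}{QUESTIONS}/{n}"

candidates = (so_q_sub, so_q_cat, so_q_fstr)
timings = {}
for func in candidates:
    # Passing a callable avoids the 'from __main__ import ...' setup string.
    timings[func.__name__] = timeit.timeit(lambda: func(1000), number=100000)

# Print fastest first.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("%-10s %.4f s" % (name, seconds))
```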

Answer 1

Don’t forget about named substitution:

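def so_question_uri_namedsub(q_num):
    # locals() only contains q_num here, so pass the module-level constants explicitly
    return "%(domain)s%(questions)s/%(q_num)d" % {
        'domain': DOMAIN, 'questions': QUESTIONS, 'q_num': q_num}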

Answer 2

Be wary of concatenating strings in a loop! The cost of string concatenation is proportional to the length of the result. Looping leads you straight to the land of N-squared. Some languages will optimize concatenation to the most recently allocated string, but it’s risky to count on the compiler to optimize your quadratic algorithm down to linear. Best to use the primitive (join?) that takes an entire list of strings, does a single allocation, and concatenates them all in one go.
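
In Python the primitive in question is str.join. A small sketch of the two styles side by side (the function names are illustrative). CPython can sometimes extend a string in place on +=, but that is an implementation detail rather than something the language guarantees, which is the risk described above.

```python
def build_with_concat(pieces):
    """Append piece by piece; each += may copy everything built so far."""
    result = ''
    for piece in pieces:
        result += piece
    return result

def build_with_join(pieces):
    """Let str.join size the result once and copy each piece once."""
    return ''.join(pieces)

pieces = [str(i) for i in range(1000)]
joined = build_with_join(pieces)
```

Both functions return the same string; only the amount of copying along the way can differ.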


Answer 3

“As the string concatenation has seen large boosts in performance…”

If performance matters, this is good to know.

However, performance problems I’ve seen have never come down to string operations. I’ve generally gotten in trouble with I/O, sorting and O(n^2) operations being the bottlenecks.

Until string operations are the performance limiters, I’ll stick with things that are obvious. Mostly, that’s substitution when it’s one line or less, concatenation when it makes sense, and a template tool (like Mako) when it’s large.


Answer 4

What you want to concatenate/interpolate and how you want to format the result should drive your decision.

  • String interpolation allows you to easily add formatting. In fact, your string interpolation version doesn’t do the same thing as your concatenation version; it actually adds an extra forward slash before the q_num parameter. To do the same thing, you would have to write return DOMAIN + QUESTIONS + "/" + str(q_num) in that example.

  • Interpolation makes it easier to format numerics; "%d of %d (%2.2f%%)" % (current, total, total/current) would be much less readable in concatenation form.

  • Concatenation is useful when you don’t have a fixed number of items to string-ize.

Also, know that Python 2.6 introduces a new version of string interpolation, called string templating:

def so_question_uri_template(q_num):
    # QUESTIONS already starts with '/', so no extra slash after {domain}
    return "{domain}{questions}/{num}".format(
        domain=DOMAIN, questions=QUESTIONS, num=q_num)

String templating is slated to eventually replace %-interpolation, but that won’t happen for quite a while, I think.


Answer 5

I was just testing the speed of different string concatenation/substitution methods out of curiosity. A google search on the subject brought me here. I thought I would post my test results in the hope that it might help someone decide.

    import timeit
    def percent_():
            return "test %s, with number %s" % (1,2)

    def format_():
            return "test {}, with number {}".format(1,2)

    def format2_():
            return "test {1}, with number {0}".format(2,1)

    def concat_():
            return "test " + str(1) + ", with number " + str(2)

    def dotimers(func_list):
            # runs a single test for all functions in the list
            for func in func_list:
                    tmr = timeit.Timer(func)
                    res = tmr.timeit()
                    print "test " + func.func_name + ": " + str(res)

    def runtests(func_list, runs=5):
            # runs multiple tests for all functions in the list
            for i in range(runs):
                    print "----------- TEST #" + str(i + 1)
                    dotimers(func_list)

…After running runtests((percent_, format_, format2_, concat_), runs=5), I found that the % method was about twice as fast as the others on these small strings. The concat method was always the slowest (barely). There were very tiny differences when switching the positions in the format() method, but switching positions was always at least .01 slower than the regular format method.

Sample of test results:

    test concat_()  : 0.62  (0.61 to 0.63)
    test format_()  : 0.56  (consistently 0.56)
    test format2_() : 0.58  (0.57 to 0.59)
    test percent_() : 0.34  (0.33 to 0.35)

I ran these because I do use string concatenation in my scripts, and I was wondering what the cost was. I ran them in different orders to make sure nothing was interfering, or getting better performance being first or last. On a side note, I threw some longer string generators into those functions, like "%s" + ("a" * 1024), and regular concat was almost 3 times as fast (1.1 vs 2.8) as using the format and % methods. I guess it depends on the strings, and what you are trying to achieve. If performance really matters, it might be better to try different things and test them. I tend to choose readability over speed, unless speed becomes a problem, but that’s just me. SO didn’t like my copy/paste; I had to put 8 spaces on everything to make it look right. I usually use 4.


Answer 6

Remember, stylistic decisions are practical decisions, if you ever plan on maintaining or debugging your code :-) There’s a famous quote from Knuth (possibly quoting Hoare?): “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

As long as you’re careful not to (say) turn an O(n) task into an O(n^2) task, I would go with whichever you find easiest to understand.


Answer 7

I use substitution wherever I can. I only use concatenation if I’m building a string up in say a for-loop.


Answer 8

Actually, the correct thing to do in this case (building paths) is to use os.path.join, not string concatenation or interpolation.
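
One thing to keep in mind: os.path.join is meant for filesystem paths and uses the platform's separator, so on Windows it would put backslashes into a URI. For the URI from the question, a possible sketch uses posixpath.join, which always joins with '/'. The helper name so_question_uri_join is made up for illustration, and stripping the leading slash is an assumption about how the pieces should combine.

```python
import posixpath

DOMAIN = 'http://stackoverflow.com'
QUESTIONS = '/questions'

def so_question_uri_join(q_num):
    """Hypothetical helper: join URL pieces with '/' on every platform."""
    # A leading '/' would make posixpath.join discard the earlier parts,
    # so strip it from every piece after the first.
    tail = [part.lstrip('/') for part in (QUESTIONS, str(q_num))]
    return posixpath.join(DOMAIN, *tail)

print(so_question_uri_join(1000))
```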


Remove the first character of a string

Question: Remove the first character of a string

I would like to remove the first character of a string.

For example, my string starts with a : and I want to remove that only. There are several occurrences of : in the string that shouldn’t be removed.

I am writing my code in Python.


Answer 0

python 2.x

s = ":dfa:sif:e"
print s[1:]

python 3.x

s = ":dfa:sif:e"
print(s[1:])

Both print:

dfa:sif:e

Answer 1

Your problem seems unclear. You say you want to remove “a character from a certain position” then go on to say you want to remove a particular character.

If you only need to remove the first character you would do:

s = ":dfa:sif:e"
fixed = s[1:]

If you want to remove a character at a particular position, you would do:

s = ":dfa:sif:e"
fixed = s[0:pos]+s[pos+1:]

If you need to remove a particular character, say ‘:’, the first time it is encountered in a string then you would do:

s = ":dfa:sif:e"
fixed = ''.join(s.split(':', 1))
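
The three variants can be run side by side to see how they differ. In this sketch pos = 4 is an arbitrary index picked for illustration, and the str.replace form with a count of 1 is an equivalent spelling of the last variant rather than something from the answer:

```python
s = ":dfa:sif:e"
pos = 4  # hypothetical index, chosen only for illustration

# Drop the first character, whatever it is.
without_first = s[1:]

# Drop the character at index pos (here the second colon).
without_pos = s[:pos] + s[pos + 1:]

# Drop the first occurrence of ':' wherever it appears.
without_first_colon = ''.join(s.split(':', 1))
same_via_replace = s.replace(':', '', 1)
```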

Answer 2

Depending on the structure of the string, you can use lstrip:

s = s.lstrip(':')

But this would remove all colons at the beginning, i.e. if you have ::foo, the result would be foo. But this function is helpful if you also have strings that do not start with a colon and you don’t want to remove the first character then.
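
If you want the "only when it is there" behaviour without stripping repeated colons, a conditional slice is one way to do it. A small sketch; strip_one_leading is a made-up helper name, and on Python 3.9+ the built-in str.removeprefix does the same job:

```python
def strip_one_leading(s, prefix=':'):
    """Remove prefix once if s starts with it; otherwise return s unchanged."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s

print(strip_one_leading('::foo'))  # :foo
print('::foo'.lstrip(':'))         # foo
```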


Answer 3

Deleting a char:

def del_char(string, indexes):

    'deletes all the indexes from the string and returns the new one'

    return ''.join((char for idx, char in enumerate(string) if idx not in indexes))

It deletes all the characters whose positions are in indexes; in your case you can call it as del_char(your_string, [0]).


Python: SyntaxError: EOL while scanning string literal

Question: Python: SyntaxError: EOL while scanning string literal

I have the above-mentioned error in s1="some very long string............"

Does anyone know what I am doing wrong?


Answer 0

You are not putting a " before the end of the line.

Use """ if you want to do this:

""" a very long string ...... 
....that can span multiple lines
"""

Answer 1

I had this problem – I eventually worked out that the reason was that I’d included \ characters in the string. If you have any of these, “escape” them with \\ and it should work fine.


Answer 2

(Assuming you don’t have/want line breaks in your string…)

How long is this string really?

I suspect there is a limit to how long a line read from a file or from the command line can be, and because the end of the line gets chopped off, the parser sees something like s1="some very long string.......... (without an ending ") and thus throws a parsing error?

You can split long lines up in multiple lines by escaping linebreaks in your source like this:

s1="some very long string.....\
...\
...."


Answer 3

In my situation, I had \r\n in my single-quoted dictionary strings. I replaced all instances of \r with \\r and \n with \\n and it fixed my issue, properly returning escaped line breaks in the eval’ed dict.

ast.literal_eval(my_str.replace('\r','\\r').replace('\n','\\n'))
  .....

Answer 4

I faced a similar problem. I had a string which contained path to a folder in Windows e.g. C:\Users\ The problem is that \ is an escape character and so in order to use it in strings you need to add one more \.

Incorrect: C:\Users\

Correct: C:\\Users\\
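
A quick sketch of the two usual ways to write such a path in a string literal. The variable names are illustrative. Note that a raw string cannot end in a single backslash, which is why the trailing one is added separately here:

```python
# Doubling each backslash gives one literal backslash per pair.
escaped = "C:\\Users\\"

# A raw string leaves backslashes alone, but it cannot end in a single
# backslash, so the trailing one is appended as a normal escaped string.
raw = r"C:\Users" + "\\"

print(escaped == raw)
print(len(escaped))
```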


Answer 5

I had this problem too. There are answers here already, but I want to add an important point: there should be no empty spaces after the /. Be aware of it.


Answer 6

I also had this exact error message; for me the problem was fixed by adding a " \".

It turns out that my long string, broken into about eight lines with " \" at the very end of each, was missing a " \" on one line.

Python IDLE didn’t specify a line number that this error was on, but it red-highlighted a totally correct variable assignment statement, throwing me off. The actual misshapen string statement (multiple lines long with " \") was adjacent to the statement being highlighted. Maybe this will help someone else.


Answer 7

In my case, I use Windows so I have to use double quotes instead of single.

C:\Users\Dr. Printer>python -mtimeit -s"a = 0"
100000000 loops, best of 3: 0.011 usec per loop

Answer 8

I was getting this error in a PostgreSQL function. I had a long SQL statement which I broke into multiple lines with \ for better readability. However, that was the problem. I removed all the backslashes and put the statement on one line to fix the issue. I was using pgAdmin III.


Answer 9

In my case with Mac OS X, I had the following statement:

model.export_srcpkg(platform, toolchain, 'mymodel_pkg.zip', 'mymodel.dylib’)

I was getting the error:

  File "<stdin>", line 1
model.export_srcpkg(platform, toolchain, 'mymodel_pkg.zip', 'mymodel.dylib’)
                                                                             ^
SyntaxError: EOL while scanning string literal

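After I changed the call to use straight double quotes (the original ended with a typographic ’ instead of a plain '):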

model.export_srcpkg(platform, toolchain, "mymodel_pkg.zip", "mymodel.dylib")

It worked…

David


Answer 10

You can try a raw string, in which backslashes are not treated as escape characters:

s = r'long\annoying\path'

Answer 11

Your variable (s1) spans multiple lines. To do this (i.e. to have your string span multiple lines), you have to use triple quotes (""").

s1="""some very long 
string............"""

Answer 12

In this case, either three single quotes or three double quotes will work. For example:

    """Parameters:
    ...Type something.....
    .....finishing statement"""

OR

    '''Parameters:
    ...Type something.....
    .....finishing statement'''

Answer 13

Most previous answers are correct, and my answer is very similar to aaronasterling's. You could also use three single quotes: s1='''some very long string…………'''


Answer 14

I faced the same problem while accessing a directory on a hard drive. I solved it this way:

 import os
 os.startfile(r"D:\folder_name\file_name") #running shortcut; raw string so \f is not read as a form feed
 os.startfile("F:") #accessing directory


How to convert a string of bytes into an int?

Question: How to convert a string of bytes into an int?

How can I convert a string of bytes into an int in python?

Say like this: 'y\xcc\xa6\xbb'

I came up with a clever/stupid way of doing it:

sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))

I know there has to be something builtin or in the standard library that does this more simply…

This is different from converting a string of hex digits for which you can use int(xxx, 16), but instead I want to convert a string of actual byte values.

UPDATE:

I kind of like James’ answer a little better because it doesn’t require importing another module, but Greg’s method is faster:

>>> from timeit import Timer
>>> Timer('struct.unpack("<L", "y\xcc\xa6\xbb")[0]', 'import struct').timeit()
0.36242198944091797
>>> Timer("int('y\xcc\xa6\xbb'.encode('hex'), 16)").timeit()
1.1432669162750244

My hacky method:

>>> Timer("sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))").timeit()
2.8819329738616943

FURTHER UPDATE:

Someone asked in comments what’s the problem with importing another module. Well, importing a module isn’t necessarily cheap, take a look:

>>> Timer("""import struct\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""").timeit()
0.98822188377380371

Including the cost of importing the module negates almost all of the advantage that this method has. I believe that this will only include the expense of importing it once for the entire benchmark run; look what happens when I force it to reload every time:

>>> Timer("""reload(struct)\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""", 'import struct').timeit()
68.474128007888794

Needless to say, if you're doing a lot of executions of this method per import, then this becomes proportionally less of an issue. It's also probably I/O cost rather than CPU, so it may depend on the capacity and load characteristics of the particular machine.


Answer 0

You can also use the struct module to do this:

>>> import struct
>>> struct.unpack("<L", "y\xcc\xa6\xbb")[0]
3148270713L

Answer 1

In Python 3.2 and later, use

>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='big')
2043455163

or

>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='little')
3148270713

according to the endianness of your byte-string.

This also works for bytestring-integers of arbitrary length, and for two’s-complement signed integers by specifying signed=True. See the docs for from_bytes.
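The points about arbitrary length and signed=True can be sketched as follows (Python 3.2+; the extra sample byte strings are made up for illustration):

```python
data = b'y\xcc\xa6\xbb'

big = int.from_bytes(data, byteorder='big')
little = int.from_bytes(data, byteorder='little')

# signed=True interprets the bytes as a two's-complement number.
minus_one = int.from_bytes(b'\xff\xff', byteorder='big', signed=True)

# Any length works, not just 4 bytes: this is a 9-byte value.
wide = int.from_bytes(b'\x01' + b'\x00' * 8, byteorder='big')

# to_bytes is the inverse; the byte length must be given explicitly.
round_trip = big.to_bytes(4, byteorder='big')

print(big, little, minus_one, wide, round_trip)
```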


Answer 2

As Greg said, you can use struct if you are dealing with binary values, but if you just have a “hex number” but in byte format you might want to just convert it like:

s = 'y\xcc\xa6\xbb'
num = int(s.encode('hex'), 16)

…this is the same as:

num = struct.unpack(">L", s)[0]

…except it’ll work for any number of bytes.
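The 'hex' codec used above only exists on Python 2 str objects. As a hedged sketch of the same idea on Python 3 (not part of the original answer), the hex digits can be produced with bytes.hex() or binascii.hexlify():

```python
import binascii

s = b'y\xcc\xa6\xbb'

# bytes.hex() is available from Python 3.5 onward.
num_a = int(s.hex(), 16)

# binascii.hexlify() returns bytes, which int() also accepts with a base.
num_b = int(binascii.hexlify(s), 16)

print(num_a, num_b)
```

Both treat the byte string as big-endian, like the ">L" struct format, and work for any number of bytes.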


Answer 3

I use the following functions (written for Python 2) to convert data between int, hex and bytes.

def bytes2int(str):
 return int(str.encode('hex'), 16)

def bytes2hex(str):
 return '0x'+str.encode('hex')

def int2bytes(i):
 h = int2hex(i)
 return hex2bytes(h)

def int2hex(i):
 return hex(i)

def hex2int(h):
 if len(h) > 1 and h[0:2] == '0x':
  h = h[2:]

 if len(h) % 2:
  h = "0" + h

 return int(h, 16)

def hex2bytes(h):
 if len(h) > 1 and h[0:2] == '0x':
  h = h[2:]

 if len(h) % 2:
  h = "0" + h

 return h.decode('hex')

Source: http://opentechnotes.blogspot.com.au/2014/04/convert-values-to-from-integer-hex.html
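The helpers above rely on the Python 2 'hex' codec. A rough Python 3 counterpart, assuming big-endian byte order and non-negative integers (both are assumptions of this sketch, not statements from the answer), might look like this:

```python
def bytes2int(data):
    # Interpret the bytes as one big-endian unsigned integer.
    return int.from_bytes(data, byteorder='big')

def int2bytes(i):
    # Use just enough bytes to hold i (at least one, so 0 -> b'\x00').
    length = max(1, (i.bit_length() + 7) // 8)
    return i.to_bytes(length, byteorder='big')

def hex2bytes(h):
    # Accept an optional '0x' prefix and pad to an even number of digits.
    if h.startswith('0x'):
        h = h[2:]
    if len(h) % 2:
        h = '0' + h
    return bytes.fromhex(h)

print(bytes2int(b'y\xcc\xa6\xbb'))
print(int2bytes(2043455163))
print(hex2bytes('0xfff'))
```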


Answer 4

import array
integerValue = array.array("I", 'y\xcc\xa6\xbb')[0]

Warning: the above is strongly platform-specific. Both the “I” specifier and the endianness of the string->int conversion are dependent on your particular Python implementation. But if you want to convert many integers/strings at once, then the array module does it quickly.


Answer 5

In Python 2.x, you could use the format specifier B for unsigned bytes and b for signed bytes with struct.unpack/struct.pack (the < prefix selects little-endian, standard-size packing).

E.g., with:

import struct
x = '\xff\x10\x11'

you can unpack the bytes into integers:

data_ints = struct.unpack('<' + 'B'*len(x), x) # (255, 16, 17)

And pack them back:

data_bytes = struct.pack('<' + 'B'*len(data_ints), *data_ints) # '\xff\x10\x11'

That * is required, because struct.pack takes the values as separate arguments!

See https://docs.python.org/2/library/struct.html#format-characters for a list of the format specifiers.
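For comparison, a sketch of the same round trip on Python 3, where the input must be a bytes object. It uses a repeat count in the format string instead of repeating the letter, which is an equivalent spelling:

```python
import struct

x = b'\xff\x10\x11'

# '<3B' means three unsigned bytes; the count is built from len(x).
data_ints = struct.unpack('<%dB' % len(x), x)

# Lowercase b reads the same bytes as signed values.
signed_ints = struct.unpack('<%db' % len(x), x)

# The * spreads the tuple into separate arguments for pack().
data_bytes = struct.pack('<%dB' % len(data_ints), *data_ints)

print(data_ints, signed_ints, data_bytes)
```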


Answer 6

>>> reduce(lambda s, x: s*256 + x, bytearray("y\xcc\xa6\xbb"))
2043455163

Test 1: inverse:

>>> hex(2043455163)
'0x79cca6bb'

Test 2: Number of bytes > 8:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAA"))
338822822454978555838225329091068225L

Test 3: Increment by one:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAB"))
338822822454978555838225329091068226L

Test 4: Append one byte, say ‘A’:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))
86738642548474510294585684247313465921L

Test 5: Divide by 256:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))/256
338822822454978555838225329091068226L

Result equals the result of Test 3, as expected.
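The session above is Python 2. A hedged Python 3 sketch of the same reduce-based approach follows; reduce has to be imported from functools, and iterating over bytes already yields integers, so no bytearray conversion is needed. The function name is made up for this example.

```python
from functools import reduce

def bytes_to_int(data):
    # Fold the bytes left to right: shift the accumulator by one byte
    # (multiply by 256) and add the next byte. Big-endian by construction.
    return reduce(lambda acc, byte: acc * 256 + byte, data, 0)

print(bytes_to_int(b"y\xcc\xa6\xbb"))
print(hex(bytes_to_int(b"y\xcc\xa6\xbb")))
```

The explicit initial value 0 makes the function return 0 for an empty byte string instead of raising TypeError.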


Answer 7

I was struggling to find a solution for arbitrary-length byte sequences that would work under Python 2.x. Finally I wrote this one. It's a bit hacky because it performs a string conversion, but it works.

Function for Python 2.x, arbitrary length

def signedbytes(data):
    """Convert a bytearray into an integer, considering the first bit as
    sign. The data must be big-endian."""
    negative = data[0] & 0x80 > 0

    if negative:
        inverted = bytearray(~d % 256 for d in data)
        return -signedbytes(inverted) - 1

    encoded = str(data).encode('hex')
    return int(encoded, 16)

This function has two requirements:

  • The input data needs to be a bytearray. You may call the function like this:

    s = 'y\xcc\xa6\xbb'
    n = signedbytes(bytearray(s))

  • The data needs to be big-endian. In case you have a little-endian value, you should reverse it first:

    n = signedbytes(bytearray(s[::-1]))

Of course, this should be used only if arbitrary length is needed. Otherwise, stick with more standard ways (e.g. struct).


Answer 8

int.from_bytes is the best solution if you are at version >=3.2. The “struct.unpack” solution requires a string so it will not apply to arrays of bytes. Here is another solution:

def bytes2int( tb, order='big'):
    if order == 'big': seq=[0,1,2,3]
    elif order == 'little': seq=[3,2,1,0]
    i = 0
    for j in seq: i = (i<<8)+tb[j]
    return i

hex(bytes2int([0x87, 0x65, 0x43, 0x21])) returns '0x87654321'.

It handles both big and little endianness and is easily modified for 8 bytes.


Answer 9

As mentioned above, using the unpack function of struct is a good way. If you want to implement your own function, here is another solution:

def bytes_to_int(data):
    # Works in Python 3, where iterating over bytes yields integers.
    result = 0
    for b in data:
        result = result * 256 + int(b)
    return result

Answer 10

In Python 3, you can easily convert a byte string into a list of integers (0..255) with list():

>>> list(b'y\xcc\xa6\xbb')
[121, 204, 166, 187]

Answer 11

A decently speedy method utilizing array.array I’ve been using for some time:

Predefined variables:

from array import array

offset = 0
size = 4
big = True # endian
arr = array('B')
arr.fromstring("\x00\x00\xff\x00") # 5 bytes (encoding issues) [0, 0, 195, 191, 0]

to int: (read)

val = 0
for v in arr[offset:offset+size][::pow(-1,not big)]: val = (val<<8)|v

from int: (write)

val = 16384
arr[offset:offset+size] = \
    array('B',((val>>(i<<3))&255 for i in range(size)))[::pow(-1,not big)]

It’s possible these could be faster though.

EDIT:
For some numbers, here’s a performance test (Anaconda 2.3.0) showing stable averages on read in comparison to reduce():

========================= byte array to int.py =========================
5000 iterations; threshold of min + 5000ns:
______________________________________code___|_______min______|_______max______|_______avg______|_efficiency
⣿⠀⠀⠀⠀⡇⢀⡀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⡀⠀⢰⠀⠀⠀⢰⠀⠀⠀⢸⠀⠀⢀⡇⠀⢀⠀⠀⠀⠀⢠⠀⠀⠀⠀⢰⠀⠀⠀⢸⡀⠀⠀⠀⢸⠀⡇⠀⠀⢠⠀⢰⠀⢸⠀
⣿⣦⣴⣰⣦⣿⣾⣧⣤⣷⣦⣤⣶⣾⣿⣦⣼⣶⣷⣶⣸⣴⣤⣀⣾⣾⣄⣤⣾⡆⣾⣿⣿⣶⣾⣾⣶⣿⣤⣾⣤⣤⣴⣼⣾⣼⣴⣤⣼⣷⣆⣴⣴⣿⣾⣷⣧⣶⣼⣴⣿⣶⣿⣶
    val = 0 \nfor v in arr: val = (val<<8)|v |     5373.848ns |   850009.965ns |     ~8649.64ns |  62.128%
⡇⠀⠀⢀⠀⠀⠀⡇⠀⡇⠀⠀⣠⠀⣿⠀⠀⠀⠀⡀⠀⠀⡆⠀⡆⢰⠀⠀⡆⠀⡄⠀⠀⠀⢠⢀⣼⠀⠀⡇⣠⣸⣤⡇⠀⡆⢸⠀⠀⠀⠀⢠⠀⢠⣿⠀⠀⢠⠀⠀⢸⢠⠀⡀
⣧⣶⣶⣾⣶⣷⣴⣿⣾⡇⣤⣶⣿⣸⣿⣶⣶⣶⣶⣧⣷⣼⣷⣷⣷⣿⣦⣴⣧⣄⣷⣠⣷⣶⣾⣸⣿⣶⣶⣷⣿⣿⣿⣷⣧⣷⣼⣦⣶⣾⣿⣾⣼⣿⣿⣶⣶⣼⣦⣼⣾⣿⣶⣷
                  val = reduce( shift, arr ) |     6489.921ns |  5094212.014ns |   ~12040.269ns |  53.902%

This is a raw performance test, so the endian pow-flip is left out.
The shift function shown applies the same shift-oring operation as the for loop, and arr is just array.array('B',[0,0,255,0]) as it has the fastest iterative performance next to dict.

I should probably also note efficiency is measured by accuracy to the average time.


How to get the size of a string in Python?

Question: How to get the size of a string in Python?

For example, I get a string:

str = "please answer my question"

I want to write it to a file.

But I need to know the size of the string before writing the string to the file. What function can I use to calculate the size of the string?


Answer 0

If you are talking about the length of the string, you can use len():

>>> s = 'please answer my question'
>>> len(s)  # number of characters in s
25

If you need the size in bytes of the string object in memory (which includes the interpreter's per-object overhead), use sys.getsizeof():

>>> import sys
>>> sys.getsizeof(s)
58

Also, don’t call your string variable str. It shadows the built-in str() function.
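Since the question is about writing the string to a file, the number that usually matters is the length of the encoded bytes rather than the in-memory size. A short Python 3 sketch, assuming the file will be written as UTF-8 (the encoding is an assumption of this example):

```python
s = 'please answer my question'

print(len(s))                   # number of characters
print(len(s.encode('utf-8')))   # number of bytes when written as UTF-8

# For non-ASCII text the two counts differ.
t = 'йцы'
print(len(t), len(t.encode('utf-8')))
```

For pure ASCII text the two values match. Each Cyrillic letter here takes two bytes in UTF-8.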


Answer 1

Python 3:

user225312’s answer is correct:

A. To count the number of characters in a str object, you can use the len() function:

>>> print(len('please answer my question'))
25

B. To get the size in bytes of the memory allocated to store a str object, you can use the sys.getsizeof() function:

>>> from sys import getsizeof
>>> print(getsizeof('please answer my question'))
50

Python 2:

It gets complicated for Python 2.

A. The len() function in Python 2 returns the number of bytes used to store the encoded characters in a str object.

Sometimes it will be equal to character count:

>>> print(len('abc'))
3

But sometimes, it won’t:

>>> print(len('йцы'))  # String contains Cyrillic symbols
6

That’s because str can use variable-length encoding internally. So, to count characters in str you should know which encoding your str object is using. Then you can convert it to unicode object and get character count:

>>> print(len('йцы'.decode('utf8'))) #String contains Cyrillic symbols 
3

B. The sys.getsizeof() function does the same thing as in Python 3: it returns the number of bytes allocated to store the whole string object.

>>> print(getsizeof('йцы'))
27
>>> print(getsizeof('йцы'.decode('utf8')))
32

Answer 2

>>> s = 'abcd'
>>> len(s)
4

Answer 3

With pandas, you can also use the .str.len() accessor to get the length of each element in a column:

data['name of column'].str.len() 

Answer 4

The most Pythonic way is to use len(). Keep in mind that the '\' character that introduces an escape sequence is not counted separately: '\f' below is a single form-feed character. An unintended escape can therefore give a surprising length, and an invalid one such as '\xoo' is a syntax error.

>>> len('foo')
3
>>> len('\foo')
3
>>> len('\xoo')
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape
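To illustrate the point about escape sequences, this small sketch compares the same text written as a normal literal, as a raw string, and with an escaped backslash:

```python
# '\f' is a single form-feed character, followed by 'o' and 'o'.
plain = '\foo'

# In a raw string the backslash is an ordinary character.
raw = r'\foo'

# Doubling the backslash has the same effect in a normal literal.
doubled = '\\foo'

print(len(plain), len(raw), len(doubled))
```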

Add a string at a specific position in Python

Question: Add a string at a specific position in Python

Is there any function in Python that I can use to insert a value in a certain position of a string?

Something like this:

"3655879ACB6" then in position 4 add "-" to become "3655-879ACB6"


Answer 0

No. Python strings are immutable.

>>> s='355879ACB6'
>>> s[4:4] = '-'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

It is, however, possible to create a new string that has the inserted character:

>>> s[:4] + '-' + s[4:]
'3558-79ACB6'

Answer 1

This seems very easy:

>>> hash = "355879ACB6"
>>> hash = hash[:4] + '-' + hash[4:]
>>> print(hash)
3558-79ACB6

However, if you would prefer a function, do it like this:

def insert_dash(string, index):
    return string[:index] + '-' + string[index:]

print(insert_dash("355879ACB6", 5))

Answer 2

As strings are immutable, another way to do this would be to turn the string into a list, which can then be indexed and modified without any slicing trickery. However, to get the list back to a string you'd have to use .join() on an empty string.

>>> hash = '355879ACB6'
>>> hashlist = list(hash)
>>> hashlist.insert(4, '-')
>>> ''.join(hashlist)
'3558-79ACB6'

I am not sure how this compares in terms of performance, but I do feel it's easier on the eyes than the other solutions. ;-)


Answer 3

A simple function to accomplish this:

def insert_str(string, str_to_insert, index):
    return string[:index] + str_to_insert + string[index:]
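As a usage sketch, here is the same function applied to the example from the question (the function is repeated so the snippet runs on its own):

```python
def insert_str(string, str_to_insert, index):
    # Slice before and after the index and join around the new text.
    return string[:index] + str_to_insert + string[index:]

print(insert_str("3655879ACB6", "-", 4))
```

This prints 3655-879ACB6, the result the question asks for.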

Answer 4

I have made a very useful method to add a string at a certain position in Python:

def insertChar(mystring, position, chartoinsert):
    # Build a new string: the part before position, the insert, then the rest.
    return mystring[:position] + chartoinsert + mystring[position:]

For example:

a = "Jorgesys was here!"

# Inserting some characters at a defined position:
print(insertChar(a, 0, '-'))
print(insertChar(a, 9, '@'))
print(insertChar(a, 14, '%'))

The output is:

-Jorgesys was here!
Jorgesys @was here!
Jorgesys was h%ere!

Answer 5

I think the above answers are fine, but I would explain that there are some unexpected-but-good side effects to them…

def insert(string_s, insert_s, pos_i=0):
    return string_s[:pos_i] + insert_s + string_s[pos_i:]

If the index pos_i is very small (too negative), the insert string gets prepended. If it is too large, the insert string gets appended. If pos_i is between -len(string_s) and len(string_s) - 1, the insert string gets inserted at the corresponding position, with negative values counting from the end.
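These edge cases follow from how slicing clamps out-of-range indices. A quick sketch with a made-up four-character string:

```python
def insert(string_s, insert_s, pos_i=0):
    return string_s[:pos_i] + insert_s + string_s[pos_i:]

print(insert("abcd", "-", 2))     # inside the string
print(insert("abcd", "-", -1))    # negative index counts from the end
print(insert("abcd", "-", -99))   # too negative: prepended
print(insert("abcd", "-", 99))    # too large: appended
```

No IndexError is raised in any of these cases, because slices never fail on out-of-range bounds.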


Answer 6

Python 3.6+ using f-string:

mys = '1362511338314'
f"{mys[:10]}_{mys[10:]}"

gives

'1362511338_314'

Answer 7

If you want to make many insertions, the ChangeCollector class from the rope library can collect them and apply them together:

from rope.base.codeanalyze import ChangeCollector

c = ChangeCollector(code)
c.add_change(5, 5, '<span style="background-color:#339999;">')
c.add_change(10, 10, '</span>')
rend_code = c.get_changed()
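If pulling in rope is not an option, the same idea can be sketched in plain Python. This is only an illustration and not rope's implementation; the helper name insert_many and the sample positions are made up. All positions refer to the original text, so earlier insertions do not shift later ones.

```python
def insert_many(text, insertions):
    """Insert each (position, string) pair; positions refer to the original text."""
    pieces = []
    last = 0
    # Process insertions in ascending position order (sorted() is stable,
    # so equal positions keep the order in which they were given).
    for pos, value in sorted(insertions, key=lambda item: item[0]):
        pieces.append(text[last:pos])
        pieces.append(value)
        last = pos
    pieces.append(text[last:])
    return ''.join(pieces)

print(insert_many("3655879ACB6", [(4, "-"), (7, "-")]))
```

Collecting the pieces in a list and joining once avoids rebuilding the whole string for every insertion.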