Tag archive: string

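How to test if a string contains one of the substrings in a list, in pandas?

Question: How to test if a string contains one of the substrings in a list, in pandas?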

Is there any function that would be the equivalent of a combination of df.isin() and df[col].str.contains()?

For example, say I have the series s = pd.Series(['cat','hat','dog','fog','pet']), and I want to find all places where s contains any of ['og', 'at'], I would want to get everything but ‘pet’.

I have a solution, but it’s rather inelegant:

searchfor = ['og', 'at']
found = [s.str.contains(x) for x in searchfor]
result = pd.DataFrame(found)
result.any()

Is there a better way to do this?


Answer 0

One option is just to use the regex | character to try to match each of the substrings in the words in your Series s (still using str.contains).

You can construct the regex by joining the words in searchfor with |:

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object

As @AndyHayden noted in the comments below, take care if your substrings have special characters such as $ and ^ which you want to match literally. These characters have specific meanings in the context of regular expressions and will affect the matching.

You can make your list of substrings safer by escaping non-alphanumeric characters with re.escape:

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

The strings in this new list will match each character literally when used with str.contains.
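
To make the two steps above concrete (escape each substring, then join with |), here is a minimal sketch that uses only the standard re module on a plain Python list. The helper name build_pattern and the sample words are made up for this illustration; with pandas, the same pattern string would be passed to s.str.contains.

```python
import re

def build_pattern(substrings):
    # Escape each substring so characters like $ and ^ match literally,
    # then join the alternatives with the regex OR operator.
    return "|".join(re.escape(sub) for sub in substrings)

words = ["cat", "hat", "dog", "fog", "pet", "$money", "x^y"]
pattern = build_pattern(["og", "at", "$money", "x^y"])

# Keep every word that contains at least one of the substrings.
matched = [w for w in words if re.search(pattern, w)]
print(matched)  # ['cat', 'hat', 'dog', 'fog', '$money', 'x^y']
```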


Answer 1

You can use str.contains alone with a regex pattern using OR (|):

s[s.str.contains('og|at')]

Or you could add the series to a dataframe then use str.contains:

df = pd.DataFrame(s)
df[s.str.contains('og|at')] 

Output:

0 cat
1 hat
2 dog
3 fog 

Answer 2

Here is a one line lambda that also works:

df["TrueFalse"] = df['col1'].apply(lambda x: 1 if any(i in x for i in searchfor) else 0)

Input:

searchfor = ['og', 'at']

df = pd.DataFrame([('cat', 1000.0), ('hat', 2000000.0), ('dog', 1000.0), ('fog', 330000.0),('pet', 330000.0)], columns=['col1', 'col2'])

   col1  col2
0   cat 1000.0
1   hat 2000000.0
2   dog 1000.0
3   fog 330000.0
4   pet 330000.0

Apply Lambda:

df["TrueFalse"] = df['col1'].apply(lambda x: 1 if any(i in x for i in searchfor) else 0)

Output:

    col1    col2        TrueFalse
0   cat     1000.0      1
1   hat     2000000.0   1
2   dog     1000.0      1
3   fog     330000.0    1
4   pet     330000.0    0
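
For readers who want to see what the lambda does to each value, here is a small pandas-free sketch of the same per-element check; the list name words is an assumption standing in for the col1 column.

```python
searchfor = ["og", "at"]
words = ["cat", "hat", "dog", "fog", "pet"]  # stand-in for df['col1']

# any(...) is True as soon as one substring is found inside the word;
# int(...) turns that into the 1/0 flag used in the answer above.
flags = [int(any(sub in word for sub in searchfor)) for word in words]
print(flags)  # [1, 1, 1, 1, 0]
```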

Replacing instances of a character in a string

Question: Replacing instances of a character in a string

This simple code, which just tries to replace semicolons (at i-specified positions) with colons, does not work:

for i in range(0,len(line)):
     if (line[i]==";" and i in rightindexarray):
         line[i]=":"

It gives the error

line[i]=":"
TypeError: 'str' object does not support item assignment

How can I work around this to replace the semicolons with colons? Using replace does not work, as that function takes no index; there might be some semicolons I do not want to replace.

Example

In the string I might have any number of semicolons, eg “Hei der! ; Hello there ;!;”

I know which ones I want to replace (I have their index in the string). Using replace does not work as I’m not able to use an index with it.


Answer 0

Strings in python are immutable, so you cannot treat them as a list and assign to indices.

Use .replace() instead:

line = line.replace(';', ':')

If you need to replace only certain semicolons, you’ll need to be more specific. You could use slicing to isolate the section of the string to replace in:

line = line[:10].replace(';', ':') + line[10:]

That’ll replace all semi-colons in the first 10 characters of the string.


Answer 1

You can do the below, to replace any char with a respective char at a given index, if you wish not to use .replace()

word = 'python'
index = 4
char = 'i'

word = word[:index] + char + word[index + 1:]
print(word)

Output: pythin

Answer 2

Turn the string into a list; then you can change the characters individually. Then you can put it back together with .join:

s = 'a;b;c;d'
slist = list(s)
for i, c in enumerate(slist):
    if slist[i] == ';' and 0 <= i <= 3: # only replaces semicolons in the first part of the text
        slist[i] = ':'
s = ''.join(slist)
print(s)  # prints a:b:c;d
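
Since the question already has the indices of the semicolons to change, the list approach can be sketched as a small helper. The function name replace_at_indices and its parameters are assumptions made for this example, not something from the thread.

```python
def replace_at_indices(line, indices, old=";", new=":"):
    # Strings are immutable, so work on a list of characters instead.
    chars = list(line)
    for i in indices:
        # Only touch positions that are in range and really hold `old`.
        if 0 <= i < len(chars) and chars[i] == old:
            chars[i] = new
    return "".join(chars)

print(replace_at_indices("a;b;c;d", {1, 5}))  # a:b;c:d
```

Indices that are out of range, or that do not point at a semicolon, are silently skipped in this sketch; raising an error instead may suit some uses better.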

Answer 3

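If you want to replace semicolons one character at a time (note that, as written, this loop replaces every semicolon; add a check such as i in rightindexarray to the condition to restrict it):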

for i in range(0,len(line)):
 if (line[i]==";"):
     line = line[:i] + ":" + line[i+1:]

I haven't tested it, though.


Answer 4

This should cover a slightly more general case, but you should be able to customize it for your purpose

def selectiveReplace(myStr):
    answer = []
    for index,char in enumerate(myStr):
        if char == ';':
            if index%2 == 1: # replace ';' at odd indices with ":"
                answer.append(":")
            else:
                answer.append("!") # replace ';' at even indices with "!"
        else:
            answer.append(char)
    return ''.join(answer)

Hope this helps


Answer 5

You cannot simply assign a value to a character in a string. Use this method to replace a particular character:

name = "India"
result = name.replace("d", '*')

Output: In*ia

Also, if you want to replace every occurrence of the first character except the first one itself with, say, *, e.g. string = babble, output = ba**le:

Code:

name = "babble"
front = name[0:1]
fromSecondCharacter = name[1:]
back = fromSecondCharacter.replace(front, '*')
result = front + back  # 'ba**le'

Answer 6

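If you are replacing by an index value specified in variable ‘n’, note that s.replace(s[n], ":") would replace every occurrence of that character, not only the one at index n. Slicing restricts the change to that position:

def missing_char(s, n):
    # rebuild the string with ":" in place of the character at index n
    return s[:n] + ":" + s[n + 1:]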

Answer 7

How about this:

sentence = 'After 1500 years of that thinking surpressed'

sentence = sentence.lower()

def removeLetter(text,char):
    # replace every occurrence of char with '*'
    return text.replace(char,'*')

text = removeLetter(sentence,'a')

Answer 8

To use the .replace() method on strings without creating a separate list, take for example the list usernames, which contains strings with some white space; we want to replace the white space with an underscore in each username string.

usernames = ["Joey Tribbiani", "Monica Geller", "Chandler Bing", "Phoebe Buffay"]

To replace the white space in each username, consider using the range function to update the list in place by index.

for i in range(len(usernames)):
    usernames[i] = usernames[i].lower().replace(" ", "_")

print(usernames)

Answer 9

To replace a character at a specific index, the function is as follows:

def replace_char(s , n , c):
    n-=1
    s = s[0:n] + s[n:n+1].replace(s[n] , c) + s[n+1:]
    return s

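where s is the string, n is the 1-based position of the character to replace (the function subtracts 1 to get the index), and c is the replacement character.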


Answer 10

I wrote this method to replace characters or strings at a specific instance. Instances start at 0 (this can easily be changed to 1 by changing the default of the optional inst argument to 1 and the initial value of the test_instance variable to 1).

def replace_instance(some_word, str_to_replace, new_str='', inst=0):
    return_word = ''
    char_index, test_instance = 0, 0
    while char_index < len(some_word):
        test_str = some_word[char_index: char_index + len(str_to_replace)]
        if test_str == str_to_replace:
            if test_instance == inst:
                return_word = some_word[:char_index] + new_str + some_word[char_index + len(str_to_replace):]
                break
            else:
                test_instance += 1
        char_index += 1
    return return_word

Answer 11

You can do this:

string = "this; is a; sample; ; python code;!;"  # your desired string
result = ""
for i in range(len(string)):
    s = string[i]
    if s == ";" and i in [4, 18, 20]:  # insert your desired list of indices
        s = ":"
    result = result + s
print(result)

Answer 12

names = ["Joey Tribbiani", "Monica Geller", "Chandler Bing", "Phoebe Buffay"]

usernames = []

for i in names:
    if " " in i:
        i = i.replace(" ", "_")
    print(i)

Output: Joey_Tribbiani Monica_Geller Chandler_Bing Phoebe_Buffay


How to convert 'false' to 0 and 'true' to 1 in Python

Question: How to convert 'false' to 0 and 'true' to 1 in Python

Is there a way to convert true of type unicode to 1 and false of type unicode to 0 (in Python)?

For example: x == 'true' and type(x) == unicode

I want x = 1

PS: I don’t want to use if-else.


Answer 0

Use int() on a boolean test:

x = int(x == 'true')

int() turns the boolean into 1 or 0. Note that any value not equal to 'true' will result in 0 being returned.
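
Because int(x == 'true') quietly maps every other value to 0, a dictionary lookup may be preferable when unexpected input should fail loudly. This is only a sketch of that alternative; the names TRUTH_TABLE and to_int are made up here, and the lower-casing and stripping are extra assumptions about the input.

```python
TRUTH_TABLE = {"true": 1, "false": 0}

def to_int(value):
    # No if/else needed: the dict lookup does the mapping and raises
    # KeyError for anything that is neither 'true' nor 'false'.
    return TRUTH_TABLE[value.strip().lower()]

print(to_int("true"), to_int(" False "))  # 1 0
```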


Answer 1

If B is a Boolean array, write

B = B*1

(A bit code golfy.)


Answer 2

You can use x.astype('uint8') where x is your Boolean array.


Answer 3

Here’s yet another solution to your problem:

def to_bool(s):
    return 1 - sum(map(ord, s)) % 2
    # return 1 - sum(s.encode('ascii')) % 2  # Alternative for Python 3

It works because the sum of the ASCII codes of 'true' is 448, which is even, while the sum of the ASCII codes of 'false' is 523 which is odd.


The funny thing about this solution is that its result is pretty random if the input is not one of 'true' or 'false'. Half of the time it will return 0, and the other half 1. The variant using encode will raise an encoding error if the input is not ASCII (thus increasing the undefined-ness of the behaviour).
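
The parity claim is easy to check by hand. The short sketch below recomputes both sums and also feeds the function two arbitrary strings ('yes' and 't', chosen only for illustration) to show how inputs other than 'true' and 'false' land on either side.

```python
def to_bool(s):
    # 1 when the sum of the code points is even, 0 when it is odd
    return 1 - sum(map(ord, s)) % 2

true_sum = sum(map(ord, "true"))    # 116 + 114 + 117 + 101
false_sum = sum(map(ord, "false"))  # 102 + 97 + 108 + 115 + 101

print(true_sum, false_sum)                # 448 523
print(to_bool("true"), to_bool("false"))  # 1 0
print(to_bool("yes"), to_bool("t"))       # 0 1
```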


Seriously, I believe the most readable, and faster, solution is to use an if:

def to_bool(s):
    return 1 if s == 'true' else 0

See some microbenchmarks:

In [14]: def most_readable(s):
    ...:     return 1 if s == 'true' else 0

In [15]: def int_cast(s):
    ...:     return int(s == 'true')

In [16]: def str2bool(s):
    ...:     try:
    ...:         return ['false', 'true'].index(s)
    ...:     except (ValueError, AttributeError):
    ...:         raise ValueError()

In [17]: def str2bool2(s):
    ...:     try:
    ...:         return ('false', 'true').index(s)
    ...:     except (ValueError, AttributeError):
    ...:         raise ValueError()

In [18]: def to_bool(s):
    ...:     return 1 - sum(s.encode('ascii')) % 2

In [19]: %timeit most_readable('true')
10000000 loops, best of 3: 112 ns per loop

In [20]: %timeit most_readable('false')
10000000 loops, best of 3: 109 ns per loop

In [21]: %timeit int_cast('true')
1000000 loops, best of 3: 259 ns per loop

In [22]: %timeit int_cast('false')
1000000 loops, best of 3: 262 ns per loop

In [23]: %timeit str2bool('true')
1000000 loops, best of 3: 343 ns per loop

In [24]: %timeit str2bool('false')
1000000 loops, best of 3: 325 ns per loop

In [25]: %timeit str2bool2('true')
1000000 loops, best of 3: 295 ns per loop

In [26]: %timeit str2bool2('false')
1000000 loops, best of 3: 277 ns per loop

In [27]: %timeit to_bool('true')
1000000 loops, best of 3: 607 ns per loop

In [28]: %timeit to_bool('false')
1000000 loops, best of 3: 612 ns per loop

Notice how the if solution is more than 2x faster than all the other solutions in these timings (112 ns versus 259 ns for the next fastest). It does not make sense to require avoiding ifs unless this is some kind of homework (in which case you shouldn’t have asked this in the first place).
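
Outside IPython, a rough version of the same comparison can be run with the standard timeit module. This is only a sketch: the loop count is arbitrary, the lambda wrapper adds its own overhead, and the absolute numbers will differ by machine and Python version.

```python
import timeit

def most_readable(s):
    return 1 if s == 'true' else 0

def int_cast(s):
    return int(s == 'true')

LOOPS = 100_000  # arbitrary; raise it for steadier numbers

for func in (most_readable, int_cast):
    # timeit returns the total seconds for LOOPS calls.
    seconds = timeit.timeit(lambda: func('true'), number=LOOPS)
    print(f"{func.__name__}: {seconds / LOOPS * 1e9:.0f} ns per call")
```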


Answer 4

If you need a general-purpose conversion from a string, which per se is not a bool, you would do better to write a routine similar to the one below. In keeping with the spirit of duck typing, I have not silently passed the error but converted it as appropriate for the current scenario.

>>> def str2bool(st):
        try:
            return ['false', 'true'].index(st.lower())
        except (ValueError, AttributeError):
            raise ValueError('no Valid Conversion Possible')


>>> str2bool('garbaze')

Traceback (most recent call last):
  File "<pyshell#106>", line 1, in <module>
    str2bool('garbaze')
  File "<pyshell#105>", line 5, in str2bool
    raise ValueError('no Valid Conversion Possible')
ValueError: no Valid Conversion Possible
>>> str2bool('false')
0
>>> str2bool('True')
1

Answer 5

bool to int: x = (x == 'true') + 0

Now x contains 1 if x was equal to 'true', else 0.

Note: x == 'true' returns a bool; adding 0 to it yields an int with value 1 if the bool is True and 0 otherwise.
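
The reason the addition works is that bool is a subclass of int in Python, so True and False behave as 1 and 0 in arithmetic. A tiny sketch, with the variable names flag and as_int picked only for this example:

```python
x = 'true'
flag = (x == 'true')   # a bool
as_int = flag + 0      # bool + int gives a plain int

print(type(flag).__name__, type(as_int).__name__)  # bool int
print(issubclass(bool, int), as_int)               # True 1
```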


Answer 6

Only with this (note that this is JavaScript, not Python):

const a = true;
const b = false;

console.log(+a); // 1
console.log(+b); // 0


Add a string prefix to each value in a string column using pandas

Question: Add a string prefix to each value in a string column using pandas

I would like to append a string to the start of each value in a said column of a pandas dataframe (elegantly). I already figured out how to kind-of do this and I am currently using:

df.ix[(df['col'] != False), 'col'] = 'str'+df[(df['col'] != False), 'col']

This seems one hell of an inelegant thing to do – do you know any other way (which maybe also adds the character to rows where that column is 0 or NaN)?

In case this is yet unclear, I would like to turn:

    col 
1     a
2     0

into:

       col 
1     stra
2     str0

Answer 0

df['col'] = 'str' + df['col'].astype(str)

Example:

>>> df = pd.DataFrame({'col':['a',0]})
>>> df
  col
0   a
1   0
>>> df['col'] = 'str' + df['col'].astype(str)
>>> df
    col
0  stra
1  str0

Answer 1

As an alternative, you can also use an apply combined with format (or better with f-strings) which I find slightly more readable if one e.g. also wants to add a suffix or manipulate the element itself:

df = pd.DataFrame({'col':['a', 0]})

df['col'] = df['col'].apply(lambda x: "{}{}".format('str', x))

which also yields the desired output:

    col
0  stra
1  str0

If you are using Python 3.6+, you can also use f-strings:

df['col'] = df['col'].apply(lambda x: f"str{x}")

yielding the same output.

The f-string version is almost as fast as @RomanPekar’s solution (python 3.6.4):

df = pd.DataFrame({'col':['a', 0]*200000})

%timeit df['col'].apply(lambda x: f"str{x}")
117 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit 'str' + df['col'].astype(str)
112 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using format, however, is indeed far slower:

%timeit df['col'].apply(lambda x: "{}{}".format('str', x))
185 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Answer 2

You can use pandas.Series.map:

df['col'].map('str{}'.format)

It will apply the word “str” before all your values.


Answer 3

If you load your table file with dtype=str
or convert the column type to string with df['a'] = df['a'].astype(str),
then you can use this approach:

df['a']= 'col' + df['a'].str[:]

This approach lets you prepend to, append to, and take substrings of the strings in df.
Works on Pandas v0.23.4 and v0.24.1. I don’t know about earlier versions.


Answer 4

Another solution with .loc:

df = pd.DataFrame({'col': ['a', 0]})
df.loc[df.index, 'col'] = 'string' + df['col'].astype(str)

This is not as quick as solutions above (>1ms per loop slower) but may be useful in case you need conditional change, like:

mask = (df['col'] == 0)
df.loc[mask, 'col'] = 'string' + df['col'].astype(str)

How to concatenate strings?

Question: How to concatenate strings?

How to concatenate strings in python?

For example:

Section = 'C_type'

Concatenate it with Sec_ to form the string:

Sec_C_type

Answer 0

The easiest way would be

Section = 'Sec_' + Section

But for efficiency, see: https://waymoot.org/home/python_string/


Answer 1

You can also do this:

section = "C_type"
new_section = "Sec_%s" % section

This lets you not only append, but also insert anywhere in the string:

section = "C_type"
new_section = "Sec_%s_blah" % section

Answer 2

Just a comment, as someone may find it useful – you can concatenate more than one string in one go:

>>> a='rabbit'
>>> b='fox'
>>> print('%s and %s' % (a, b))
rabbit and fox

Answer 3

More efficient ways of concatenating strings are:

join():

Very efficient, but a bit hard to read.

>>> Section = 'C_type'
>>> new_str = ''.join(['Sec_', Section])  # joining a list of strings
>>> print(new_str)
Sec_C_type

String formatting:

Easy to read, and in most cases faster than ‘+’ concatenation

>>> Section = 'C_type'
>>> print('Sec_%s' % Section)
Sec_C_type
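
For comparison, here are the approaches from this thread side by side, together with str.format and f-strings (Python 3.6+), which this answer does not mention. It is a sketch to show that all of them build the same string, not a statement about which is fastest.

```python
section = "C_type"

with_plus = "Sec_" + section
with_join = "".join(["Sec_", section])
with_percent = "Sec_%s" % section
with_format = "Sec_{}".format(section)
with_fstring = f"Sec_{section}"  # needs Python 3.6 or newer

results = {with_plus, with_join, with_percent, with_format, with_fstring}
print(results)  # {'Sec_C_type'}
```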

Answer 4

Use + for string concatenation as:

section = 'C_type'
new_section = 'Sec_' + section

Answer 5

To concatenate strings in python you use the “+” sign

ref: http://www.gidnetwork.com/b-40.html


Answer 6

For cases of appending to end of existing string:

string = "Sec_"
string += "C_type"
print(string)

results in

Sec_C_type

Find the nth occurrence of a substring in a string

Question: Find the nth occurrence of a substring in a string

This seems like it should be pretty trivial, but I am new at Python and want to do it the most Pythonic way.

I want to find the index corresponding to the n’th occurrence of a substring within a string.

There’s got to be something equivalent to what I WANT to do which is

mystring.find("substring", 2nd)

How can you achieve this in Python?


Answer 0

Mark’s iterative approach would be the usual way, I think.

Here’s an alternative with string-splitting, which can often be useful for finding-related processes:

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

And here’s a quick (and somewhat dirty, in that you have to choose some chaff that can’t match the needle) one-liner:

'foo bar bar bar'.replace('bar', 'XXX', 1).find('bar')
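
The one-liner works because the chaff 'XXX' has the same length as the needle, so the indices after the masked occurrence do not shift. A generalized sketch of that idea follows; the function name find_nth_by_masking and the choice of a NUL character as chaff are assumptions for this example, and it presumes the needle itself contains no NUL. Like findnth above, n is 0-based.

```python
def find_nth_by_masking(haystack, needle, n, chaff="\0"):
    # Overwrite the first n occurrences with same-length filler so that
    # find() lands on occurrence number n (0-based) at its original index.
    masked = haystack.replace(needle, chaff * len(needle), n)
    return masked.find(needle)

print(find_nth_by_masking('foo bar bar bar', 'bar', 1))  # 8
```

Note that this copies the whole string once, so it shares the memory cost of the split-based version.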

Answer 1

Here’s a more Pythonic version of the straightforward iterative solution:

def find_nth(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+len(needle))
        n -= 1
    return start

Example:

>>> find_nth("foofoofoofoo", "foofoo", 2)
6

If you want to find the nth overlapping occurrence of needle, you can increment by 1 instead of len(needle), like this:

def find_nth_overlapping(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+1)
        n -= 1
    return start

Example:

>>> find_nth_overlapping("foofoofoofoo", "foofoo", 2)
3

This is easier to read than Mark’s version, and it doesn’t require the extra memory of the splitting version or importing regular expression module. It also adheres to a few of the rules in the Zen of python, unlike the various re approaches:

  1. Simple is better than complex.
  2. Flat is better than nested.
  3. Readability counts.

Answer 2

This will find the second occurrence of substring in string.

def find_2nd(string, substring):
   return string.find(substring, string.find(substring) + 1)

Edit: I haven’t thought much about the performance, but a quick recursion can help with finding the nth occurrence:

def find_nth(string, substring, n):
   if (n == 1):
       return string.find(substring)
   else:
       return string.find(substring, find_nth(string, substring, n - 1) + 1)
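
One thing to watch with the recursive version: when an earlier occurrence is missing, the inner call returns -1, and -1 + 1 restarts the search at index 0, so asking for the 3rd 'b' in 'abc' returns 1 instead of -1. A guarded variant, offered as a sketch with the same 1-based n:

```python
def find_nth(string, substring, n):
    if n < 1:
        raise ValueError("n must be 1 or greater")
    if n == 1:
        return string.find(substring)
    previous = find_nth(string, substring, n - 1)
    if previous < 0:
        # An earlier occurrence is already missing, so stop here
        # instead of restarting the search from index 0.
        return -1
    return string.find(substring, previous + 1)

print(find_nth("ababab", "ab", 3))  # 4
print(find_nth("abc", "b", 3))      # -1
```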

Answer 3

Understanding that regex is not always the best solution, I’d probably use one here:

>>> import re
>>> s = "ababdfegtduab"
>>> [m.start() for m in re.finditer(r"ab",s)]
[0, 2, 11]
>>> [m.start() for m in re.finditer(r"ab",s)][2] #index 2 is third occurrence 
11
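
If the substring may contain regex metacharacters, or the string is long, the same idea can be wrapped up as below. This is a sketch under two assumptions that are not part of the answer: the needle is escaped with re.escape so it matches literally, and itertools.islice is used so the full list of matches is never built. The name find_nth_regex is made up, and n is 0-based as in the example above.

```python
import re
from itertools import islice

def find_nth_regex(haystack, needle, n):
    matches = re.finditer(re.escape(needle), haystack)
    # Skip the first n matches lazily and take the next one, if any.
    match = next(islice(matches, n, None), None)
    return match.start() if match else -1

print(find_nth_regex("ababdfegtduab", "ab", 2))  # 11
```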

Answer 4

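Offering some benchmarking results comparing the most prominent approaches presented so far, namely @bobince’s findnth() (based on str.split()) vs. @tgamblin’s or @Mark Byers’ find_nth() (based on str.find()). I will also compare with a C extension (_find_nth.so) to see how fast we can go. Here is find_nth.py: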

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

def find_nth(s, x, n=0, overlap=False):
    l = 1 if overlap else len(x)
    i = -l
    for c in xrange(n + 1):
        i = s.find(x, i + l)
        if i < 0:
            break
    return i
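
The listing above is Python 2 (it uses xrange). For readers on Python 3, a port of find_nth() with a couple of sample calls might look like the sketch below; only xrange and the variable name l were changed, and the sample strings are borrowed from an earlier answer in this thread. Here n is 0-based.

```python
def find_nth(s, x, n=0, overlap=False):
    # Advance by one character when overlapping matches are allowed,
    # otherwise jump past the whole previous match.
    step = 1 if overlap else len(x)
    i = -step
    for _ in range(n + 1):
        i = s.find(x, i + step)
        if i < 0:
            break
    return i

print(find_nth("foofoofoofoo", "foofoo", 1))                # 6
print(find_nth("foofoofoofoo", "foofoo", 1, overlap=True))  # 3
```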

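Of course, performance matters most if the string is large, so suppose we want to find the 1000001st newline ('\n') in a 1.3 GB file called 'bigfile'. To save memory, we would like to work on an mmap.mmap object representation of the file: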

In [1]: import _find_nth, find_nth, mmap

In [2]: f = open('bigfile', 'r')

In [3]: mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

findnth()由于mmap.mmap对象不支持,因此已经存在第一个问题split()。因此,我们实际上必须将整个文件复制到内存中:

In [4]: %time s = mm[:]
CPU times: user 813 ms, sys: 3.25 s, total: 4.06 s
Wall time: 17.7 s

哎哟! 幸运的是s,我的Macbook Air仍可容纳4 GB内存,因此让我们进行基准测试findnth()

In [5]: %timeit find_nth.findnth(s, '\n', 1000000)
1 loops, best of 3: 29.9 s per loop

显然表现糟糕。让我们看看基于的方法是如何str.find()做到的:

In [6]: %timeit find_nth.find_nth(s, '\n', 1000000)
1 loops, best of 3: 774 ms per loop

好多了!显然,findnth()问题在于它被迫在期间复制字符串split(),这已经是我们第二次在after之后复制1.3 GB的数据了s = mm[:]。这里有第二个优点find_nth():我们可以mm直接使用它,因此文件的副本是必需的:

In [7]: %timeit find_nth.find_nth(mm, '\n', 1000000)
1 loops, best of 3: 1.21 s per loop

mmvs. 上似乎有一些小的性能损失s,但这表明find_nth()与1.2 s findnth的总和(47 s)相比,可以在1.2 s内获得答案。

我发现没有任何str.find()一种方法比基于方法的性能明显差于str.split()基于方法的情况,因此,在这一点上,我认为应该接受@tgamblin或@Mark Byers的答案,而不是@bobince的答案。

在我的测试中,上述版本find_nth()是我能想到的最快的纯Python解决方案(非常类似于@Mark Byers的版本)。让我们看看使用C扩展模块可以做的更好。这里是_find_nthmodule.c

#include <Python.h>
#include <string.h>

off_t _find_nth(const char *buf, size_t l, char c, int n) {
    off_t i;
    for (i = 0; i < l; ++i) {
        if (buf[i] == c && n-- == 0) {
            return i;
        }
    }
    return -1;
}

off_t _find_nth2(const char *buf, size_t l, char c, int n) {
    const char *b = buf - 1;
    do {
        b = memchr(b + 1, c, l);
        if (!b) return -1;
    } while (n--);
    return b - buf;
}

/* mmap_object is private in mmapmodule.c - replicate beginning here */
typedef struct {
    PyObject_HEAD
    char *data;
    size_t size;
} mmap_object;

typedef struct {
    const char *s;
    size_t l;
    char c;
    int n;
} params;

int parse_args(PyObject *args, params *P) {
    PyObject *obj;
    const char *x;

    if (!PyArg_ParseTuple(args, "Osi", &obj, &x, &P->n)) {
        return 1;
    }
    PyTypeObject *type = Py_TYPE(obj);

    if (type == &PyString_Type) {
        P->s = PyString_AS_STRING(obj);
        P->l = PyString_GET_SIZE(obj);
    } else if (!strcmp(type->tp_name, "mmap.mmap")) {
        mmap_object *m_obj = (mmap_object*) obj;
        P->s = m_obj->data;
        P->l = m_obj->size;
    } else {
        PyErr_SetString(PyExc_TypeError, "Cannot obtain char * from argument 0");
        return 1;
    }
    P->c = x[0];
    return 0;
}

static PyObject* py_find_nth(PyObject *self, PyObject *args) {
    params P;
    if (!parse_args(args, &P)) {
        return Py_BuildValue("i", _find_nth(P.s, P.l, P.c, P.n));
    } else {
        return NULL;    
    }
}

static PyObject* py_find_nth2(PyObject *self, PyObject *args) {
    params P;
    if (!parse_args(args, &P)) {
        return Py_BuildValue("i", _find_nth2(P.s, P.l, P.c, P.n));
    } else {
        return NULL;    
    }
}

static PyMethodDef methods[] = {
    {"find_nth", py_find_nth, METH_VARARGS, ""},
    {"find_nth2", py_find_nth2, METH_VARARGS, ""},
    {0}
};

PyMODINIT_FUNC init_find_nth(void) {
    Py_InitModule("_find_nth", methods);
}

这是setup.py文件:

from distutils.core import setup, Extension
module = Extension('_find_nth', sources=['_find_nthmodule.c'])
setup(ext_modules=[module])

像往常一样安装python setup.py install。C代码在这里发挥了优势,因为它仅限于查找单个字符,但是让我们看一下它有多快:

In [8]: %timeit _find_nth.find_nth(mm, '\n', 1000000)
1 loops, best of 3: 218 ms per loop

In [9]: %timeit _find_nth.find_nth(s, '\n', 1000000)
1 loops, best of 3: 216 ms per loop

In [10]: %timeit _find_nth.find_nth2(mm, '\n', 1000000)
1 loops, best of 3: 307 ms per loop

In [11]: %timeit _find_nth.find_nth2(s, '\n', 1000000)
1 loops, best of 3: 304 ms per loop

显然还快很多。有趣的是,内存中情况和映射情况之间的C级别没有差异。有趣的是_find_nth2(),它基于string.hmemchr()库函数,相对于以下简单的实现方式有所失落_find_nth():额外的“优化” memchr()显然是后退式的…

总而言之,findnth()(基于str.split())中的实现确实是一个坏主意,因为(a)由于需要进行复制,因此它对于较大的字符串表现出极大的性能,(b)根本不适用于mmap.mmap对象。在find_nth()(基于str.find())中的实现在所有情况下都应优先考虑(因此是该问题的公认答案)。

还有很大的改进空间,因为C扩展比纯Python代码快将近4倍,这表明可能存在专用Python库函数的情况。

I’m offering some benchmarking results comparing the most prominent approaches presented so far, namely @bobince’s findnth() (based on str.split()) vs. @tgamblin’s or @Mark Byers’ find_nth() (based on str.find()). I will also compare with a C extension (_find_nth.so) to see how fast we can go. Here is find_nth.py:

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

def find_nth(s, x, n=0, overlap=False):
    l = 1 if overlap else len(x)
    i = -l
    for c in xrange(n + 1):
        i = s.find(x, i + l)
        if i < 0:
            break
    return i

Of course, performance matters most if the string is large, so suppose we want to find the 1000001st newline (‘\n’) in a 1.3 GB file called ‘bigfile’. To save memory, we would like to work on an mmap.mmap object representation of the file:

In [1]: import _find_nth, find_nth, mmap

In [2]: f = open('bigfile', 'r')

In [3]: mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

There is already the first problem with findnth(), since mmap.mmap objects don’t support split(). So we actually have to copy the whole file into memory:

In [4]: %time s = mm[:]
CPU times: user 813 ms, sys: 3.25 s, total: 4.06 s
Wall time: 17.7 s

Ouch! Fortunately s still fits in the 4 GB of memory of my Macbook Air, so let’s benchmark findnth():

In [5]: %timeit find_nth.findnth(s, '\n', 1000000)
1 loops, best of 3: 29.9 s per loop

Clearly a terrible performance. Let’s see how the approach based on str.find() does:

In [6]: %timeit find_nth.find_nth(s, '\n', 1000000)
1 loops, best of 3: 774 ms per loop

Much better! Clearly, findnth()’s problem is that it is forced to copy the string during split(), which is already the second time we copy the 1.3 GB of data, after s = mm[:]. This is the second advantage of find_nth(): we can use it on mm directly, so no copy of the file is required:

In [7]: %timeit find_nth.find_nth(mm, '\n', 1000000)
1 loops, best of 3: 1.21 s per loop

There appears to be a small performance penalty when operating on mm rather than s, but this illustrates that find_nth() can get us an answer in 1.2 s, compared to findnth()’s total of about 47 s (17.7 s for the copy plus 29.9 s for the search).

I found no cases where the str.find() based approach was significantly worse than the str.split() based approach, so at this point, I would argue that @tgamblin’s or @Mark Byers’ answer should be accepted instead of @bobince’s.

In my testing, the version of find_nth() above was the fastest pure Python solution I could come up with (very similar to @Mark Byers’ version). Let’s see how much better we can do with a C extension module. Here is _find_nthmodule.c:

#include <Python.h>
#include <string.h>

off_t _find_nth(const char *buf, size_t l, char c, int n) {
    off_t i;
    for (i = 0; i < l; ++i) {
        if (buf[i] == c && n-- == 0) {
            return i;
        }
    }
    return -1;
}

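off_t _find_nth2(const char *buf, size_t l, char c, int n) {
    const char *b = buf;
    const char *end = buf + l;
    for (;;) {
        /* Search only the bytes that remain, so memchr never reads past the buffer. */
        b = memchr(b, c, (size_t)(end - b));
        if (!b) return -1;
        if (n-- == 0) return b - buf;
        ++b;
    }
}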

/* mmap_object is private in mmapmodule.c - replicate beginning here */
typedef struct {
    PyObject_HEAD
    char *data;
    size_t size;
} mmap_object;

typedef struct {
    const char *s;
    size_t l;
    char c;
    int n;
} params;

int parse_args(PyObject *args, params *P) {
    PyObject *obj;
    const char *x;

    if (!PyArg_ParseTuple(args, "Osi", &obj, &x, &P->n)) {
        return 1;
    }
    PyTypeObject *type = Py_TYPE(obj);

    if (type == &PyString_Type) {
        P->s = PyString_AS_STRING(obj);
        P->l = PyString_GET_SIZE(obj);
    } else if (!strcmp(type->tp_name, "mmap.mmap")) {
        mmap_object *m_obj = (mmap_object*) obj;
        P->s = m_obj->data;
        P->l = m_obj->size;
    } else {
        PyErr_SetString(PyExc_TypeError, "Cannot obtain char * from argument 0");
        return 1;
    }
    P->c = x[0];
    return 0;
}

static PyObject* py_find_nth(PyObject *self, PyObject *args) {
    params P;
    if (!parse_args(args, &P)) {
        return Py_BuildValue("i", _find_nth(P.s, P.l, P.c, P.n));
    } else {
        return NULL;    
    }
}

static PyObject* py_find_nth2(PyObject *self, PyObject *args) {
    params P;
    if (!parse_args(args, &P)) {
        return Py_BuildValue("i", _find_nth2(P.s, P.l, P.c, P.n));
    } else {
        return NULL;    
    }
}

static PyMethodDef methods[] = {
    {"find_nth", py_find_nth, METH_VARARGS, ""},
    {"find_nth2", py_find_nth2, METH_VARARGS, ""},
    {0}
};

PyMODINIT_FUNC init_find_nth(void) {
    Py_InitModule("_find_nth", methods);
}

Here is the setup.py file:

from distutils.core import setup, Extension
module = Extension('_find_nth', sources=['_find_nthmodule.c'])
setup(ext_modules=[module])

Install as usual with python setup.py install. The C code has an advantage here, since it is limited to finding single characters, but let’s see how fast it is:

In [8]: %timeit _find_nth.find_nth(mm, '\n', 1000000)
1 loops, best of 3: 218 ms per loop

In [9]: %timeit _find_nth.find_nth(s, '\n', 1000000)
1 loops, best of 3: 216 ms per loop

In [10]: %timeit _find_nth.find_nth2(mm, '\n', 1000000)
1 loops, best of 3: 307 ms per loop

In [11]: %timeit _find_nth.find_nth2(s, '\n', 1000000)
1 loops, best of 3: 304 ms per loop

Clearly quite a bit faster still. Interestingly, there is no difference on the C level between the in-memory and mmapped cases. It is also interesting to see that _find_nth2(), which is based on string.h’s memchr() library function, loses out against the straightforward implementation in _find_nth(): the additional “optimizations” in memchr() are apparently backfiring.

In conclusion, the implementation in findnth() (based on str.split()) is really a bad idea, since (a) it performs terribly for larger strings due to the required copying, and (b) it doesn’t work on mmap.mmap objects at all. The implementation in find_nth() (based on str.find()) should be preferred in all circumstances (and therefore be the accepted answer to this question).

There is still quite a bit of room for improvement, since the C extension ran almost a factor of 4 faster than the pure Python code, indicating that there might be a case for a dedicated Python library function.
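The benchmark code above targets Python 2 (xrange, PyString_Type). As a rough sketch only, here is how the pure Python find_nth() might be ported to Python 3, assuming the same 0-based n as in the answer. The helper demo_mmap and the sample data are made up for illustration; note that an mmap object in Python 3 needs a bytes needle.

```python
import mmap
import tempfile


def find_nth(s, x, n=0, overlap=False):
    """Return the index of the n-th (0-based) occurrence of x in s, or -1."""
    step = 1 if overlap else len(x)
    i = -step
    for _ in range(n + 1):
        # Works for str, bytes and mmap objects, since all of them offer find().
        i = s.find(x, i + step)
        if i < 0:
            break
    return i


def demo_mmap():
    """Hypothetical demo: search a small memory-mapped temporary file."""
    with tempfile.TemporaryFile() as f:
        f.write(b"one\ntwo\nthree\n")
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # n=1 asks for the second newline.
            return find_nth(mm, b"\n", 1)


print(find_nth("a\nb\nc", "\n", 1))
print(demo_mmap())
```

The timings quoted in this answer were measured with the original Python 2 code, so they should not be assumed to carry over to this port.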


Answer 5

Simplest way?

text = "This is a test from a test ok" 

firstTest = text.find('test')

print(text.find('test', firstTest + 1))

Answer 6

I’d probably do something like this, using the find function that takes an index parameter:

def find_nth(s, x, n):
    # Start one match-length before the string so the first search begins at index 0.
    i = -len(x)
    for _ in range(n):
        i = s.find(x, i + len(x))
        if i == -1:
            break
    return i

print(find_nth('bananabanana', 'an', 3))

It’s not particularly Pythonic I guess, but it’s simple. You could do it using recursion instead:

def find_nth(s, x, n, i = 0):
    i = s.find(x, i)
    if n == 1 or i == -1:
        return i 
    else:
        return find_nth(s, x, n - 1, i + len(x))

print(find_nth('bananabanana', 'an', 3))

It’s a functional way to solve it, but I don’t know if that makes it more Pythonic.


Answer 7

This will give you a list of the starting indices of every match of ':' in yourstring:

import re
indices = [s.start() for s in re.finditer(':', yourstring)]

Then your nth entry would be:

n = 2
nth_entry = indices[n-1]

Of course, you have to be careful with the index bounds. You can get the number of matches in yourstring like this:

num_instances = len(indices)

Answer 8

Here is another approach using re.finditer.
The difference is that this only looks into the haystack as far as necessary.

from re import finditer
from itertools import dropwhile
needle='an'
haystack='bananabanana'
n=2  # zero-based: 2 selects the third match
next(dropwhile(lambda x: x[0]<n, enumerate(finditer(needle,haystack))))[1].start()
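The same lazy idea can also be sketched with itertools.islice, which skips the first n - 1 matches without building a list. This is only an illustration: the function name nth_match_start is made up here, n is assumed to be 1-based and at least 1, and the needle is passed through re.escape so it is treated as a literal substring rather than a pattern.

```python
import re
from itertools import islice


def nth_match_start(needle, haystack, n):
    """Return the start index of the n-th (1-based) match of needle, or -1."""
    # finditer is lazy, so scanning stops as soon as the n-th match is found.
    matches = re.finditer(re.escape(needle), haystack)
    match = next(islice(matches, n - 1, None), None)
    return match.start() if match is not None else -1


print(nth_match_start('an', 'bananabanana', 3))
```

Like re.finditer itself, this only counts non-overlapping matches.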

Answer 9

Here’s another re + itertools version that should work when searching for either a str or a RegexpObject. I will freely admit that this is likely over-engineered, but for some reason it entertained me.

import itertools
import re

def find_nth(haystack, needle, n = 1):
    """
    Find the starting index of the nth occurrence of ``needle`` in \
    ``haystack``.

    If ``needle`` is a ``str``, this will perform an exact substring
    match; if it is a ``RegexpObject``, this will perform a regex
    search.

    If ``needle`` doesn't appear in ``haystack``, return ``-1``. If
    ``needle`` doesn't appear in ``haystack`` ``n`` times,
    return ``-1``.

    Arguments
    ---------
    * ``needle`` the substring (or a ``RegexpObject``) to find
    * ``haystack`` is a ``str``
    * an ``int`` indicating which occurrence to find; defaults to ``1``

    >>> find_nth("foo", "o", 1)
    1
    >>> find_nth("foo", "o", 2)
    2
    >>> find_nth("foo", "o", 3)
    -1
    >>> find_nth("foo", "b")
    -1
    >>> import re
    >>> either_o = re.compile("[oO]")
    >>> find_nth("foo", either_o, 1)
    1
    >>> find_nth("FOO", either_o, 1)
    1
    """
    if (hasattr(needle, 'finditer')):
        matches = needle.finditer(haystack)
    else:
        matches = re.finditer(re.escape(needle), haystack)
    start_here = itertools.dropwhile(lambda x: x[0] < n, enumerate(matches, 1))
    try:
        return next(start_here)[1].start()
    except StopIteration:
        return -1

Answer 10

Building on modle13’s answer, but without the re module dependency.

def iter_find(haystack, needle):
    return [i for i in range(0, len(haystack)) if haystack[i:].startswith(needle)]

I kinda wish this was a builtin string method.

>>> iter_find("http://stackoverflow.com/questions/1883980/", '/')
[5, 6, 24, 34, 42]

Answer 11

>>> s="abcdefabcdefababcdef"
>>> j=0
>>> for n,i in enumerate(s):
...   if s[n:n+2] =="ab":
...     print n,i
...     j=j+1
...     if j==2: print "2nd occurence at index position: ",n
...
0 a
6 a
2nd occurence at index position:  6
12 a
14 a

Answer 12

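Here is another “tricky” solution, which uses split and join.

In your example, we can use the following to get the index of the second occurrence (this assumes ori contains the substring at least twice; otherwise the result is simply len(ori)):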

len("substring".join([s for s in ori.split("substring")[:2]]))
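The split trick can be generalized to the nth occurrence and made to report a missing match. The sketch below is one possible way to do that, not the author’s code: the name find_nth_by_split is hypothetical, and n is assumed to be 1-based.

```python
def find_nth_by_split(text, sub, n):
    """Return the index of the n-th (1-based) occurrence of sub, or -1."""
    # Splitting at most n times leaves everything after the n-th match in the last part.
    parts = text.split(sub, n)
    if len(parts) <= n:
        # Fewer than n occurrences were found.
        return -1
    return len(text) - len(parts[-1]) - len(sub)


print(find_nth_by_split("foo bar foo bar foo", "foo", 2))
```

As the benchmark in Answer 4 points out, split-based approaches copy the string, so this is better suited to short inputs.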

Answer 13

# Return the index of the nth occurrence (0-indexed) of substr, or -1 if it does not exist.
def find_nth(s, substr, n):
    i = -1  # start at -1 so the first search begins at index 0
    while n >= 0:
        n -= 1
        i = s.find(substr, i + 1)
        if i == -1:
            break  # stop early, otherwise the next search would restart at index 0
    return i

Answer 14

A solution without explicit loops or recursion.

Compile the required pattern and set n to the desired occurrence (1-based). The result of finditer, which is an iterator, is converted to a list, and the start of the nth match is read from index n - 1. Note that this raises IndexError if there are fewer than n matches.

import re
n=2
sampleString="this is history"
pattern=re.compile("is")
matches=pattern.finditer(sampleString)
print(list(matches)[n-1].span()[0])

Answer 15

The replace one-liner is great, but it only works because XX and bar have the same length.

A more general definition would be:

def findN(s,sub,N,replaceString="XXX"):
    return s.replace(sub,replaceString,N-1).find(sub) - (len(replaceString)-len(sub))*(N-1)

Answer 16

This is the answer you really want:

def Find(String, ToFind, Occurence=1):
    index = 0
    count = 0
    # Slicing never raises IndexError, so a plain bounded loop is enough.
    while index <= len(String) - len(ToFind):
        if String[index:index + len(ToFind)] == ToFind:
            count += 1
            if count == Occurence:
                return index
        index += 1
    # Return -1 rather than False, because False == 0 would be mistaken for index 0.
    return -1

Answer 17

Here is my solution for finding the nth occurrence of b in string a:

from functools import reduce


def findNth(a, b, n):
    return reduce(lambda x, y: -1 if y > x + 1 else a.find(b, x + 1), range(n), -1)

It is pure Python and iterative. If n is 0 or too large, it returns -1. It is a one-liner and can be used directly. Here is an example:

>>> reduce(lambda x, y: -1 if y > x + 1 else 'bibarbobaobaotang'.find('b', x + 1), range(4), -1)
7

Answer 18

For the special case where you search for the nth occurrence of a character (i.e. a substring of length 1), the following function works by building a list of all positions at which the given character occurs:

def find_char_nth(string, char, n):
    """Find the n'th occurrence of a character within a string."""
    return [i for i, c in enumerate(string) if c == char][n-1]

If there are fewer than n occurrences of the given character, it will raise IndexError: list index out of range.

This is derived from @Zv_oDD’s answer and simplified for the case of a single character.


Answer 19

Def:

def get_first_N_words(mytext, mylen = 3):
    mylist = list(mytext.split())
    if len(mylist)>=mylen: return ' '.join(mylist[:mylen])

To use:

get_first_N_words('  One Two Three Four ' , 3)

Output:

'One Two Three'

Answer 20

How about:

import os

c = os.getcwd().split('\\')
print('\\'.join(c[0:-2]))

Why is str.translate faster in Python 3.5 than in Python 3.4?

Question: Why is str.translate faster in Python 3.5 than in Python 3.4?

I was trying to remove unwanted characters from a given string using text.translate() in Python 3.4.

The minimal code is:

import sys 
s = 'abcde12345@#@$#%$'
mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$')
print(s.translate(mapper))

It works as expected. However, the same program shows a large difference in run time between Python 3.4 and Python 3.5.

The code to calculate timings is

python3 -m timeit -s "import sys;s = 'abcde12345@#@$#%$'*1000 ; mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$'); "   "s.translate(mapper)"

The Python 3.4 program takes 1.3ms whereas the same program in Python 3.5 takes only 26.4μs.

What has improved in Python 3.5 that makes it faster compared to Python 3.4?


Answer 0

TL;DR – ISSUE 21118


The long Story

Josh Rosenberg found out that the str.translate() function is very slow compared to bytes.translate(), and he raised an issue stating that:

In Python 3, str.translate() is usually a performance pessimization, not optimization.

Why was str.translate() slow?

The main reason for str.translate() to be very slow was that the lookup used to be in a Python dictionary.

Using maketrans made this problem worse. The similar approach for bytes builds a C array of 256 items for fast table lookup. Hence the use of a higher-level Python dict made str.translate() very slow in Python 3.4.

What changed?

The first approach was to add a small patch, translate_writer. However, the speed increase was not that pleasing. Soon another patch, fast_translate, was tested, and it yielded very nice results of up to 55% speedup.

The main change as can be seen from the file is that the Python dictionary lookup is changed into a C level lookup.

The speeds now are almost the same as bytes

                                unpatched           patched

str.translate                   4.55125927699919    0.7898181750006188
str.translate from bytes trans  1.8910855210015143  0.779950579000797

A small note here is that the performance enhancement is only prominent in ASCII strings.

As J.F. Sebastian mentions in a comment below, before 3.5, translate worked the same way for both the ASCII and non-ASCII cases. From 3.5 on, however, the ASCII case is much faster.

Previously, ASCII and non-ASCII performance were almost the same, but now there is a large difference.

It can be an improvement from 71.6μs to 2.33μs, as seen in this answer.

The following code demonstrates this

python3.5 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
100000 loops, best of 3: 2.3 usec per loop
python3.5 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 117 usec per loop

python3 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 91.2 usec per loop
python3 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
10000 loops, best of 3: 101 usec per loop

Tabulation of the results (μs per loop, taken from the timings above):

         Python 3.4    Python 3.5  
ASCII     101           2.3 
Unicode   91.2          117
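For the deletion task in the question, the translation table does not have to be built by scanning every code point up to sys.maxunicode. As a small sketch, the standard str.maketrans call with a third argument builds a table that maps the listed characters to None, which makes translate delete them. The variable names are only illustrative.

```python
s = 'abcde12345@#@$#%$'

# The third argument lists the characters that translate() should delete.
table = str.maketrans('', '', '@#$')
cleaned = s.translate(table)

print(table)
print(cleaned)
```

This only changes how the table is constructed; the speed of translate itself still depends on the Python version, as described above.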

Remove the first x characters from a string?

Question: Remove the first x characters from a string?

How might one remove the first x characters from a string? For example, if one had a string lipsum, how would they remove the first 3 characters and get a result of sum?


Answer 0

>>> text = 'lipsum'
>>> text[3:]
'sum'

See the official documentation on strings for more information and this SO answer for a concise summary of the notation.
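To make the edge cases of this slice explicit, here is a minimal sketch wrapped in a function. The name drop_first is made up for this example, and rejecting a negative n is an assumption added here, since text[-2:] would silently keep the last two characters instead.

```python
def drop_first(text, n):
    """Return text without its first n characters."""
    if n < 0:
        # A negative slice start counts from the end, which is not what is wanted here.
        raise ValueError("n must be non-negative")
    return text[n:]


print(drop_first('lipsum', 3))
print(repr(drop_first('lipsum', 10)))
```

Slicing past the end of the string does not raise an error; it simply yields an empty string.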


Answer 1

Another way (depending on your actual needs): If you want to pop the first n characters and save both the popped characters and the modified string:

s = 'lipsum'
n = 3
a, s = s[:n], s[n:]
print(a)
# lip
print(s)
# sum

Answer 2

>>> x = 'lipsum'
>>> x.replace(x[:3], '', 1)  # count=1, so a prefix that repeats later in the string is removed only once
'sum'

Answer 3

Use del.

Example:

>>> text = 'lipsum'
>>> l = list(text)
>>> del l[:3]
>>> ''.join(l)
'sum'

Answer 4

Example showing the last 3 digits of an account number.

x = '1234567890'   
x.replace(x[:7], '', 1)  # count=1, so only the leading copy of the prefix is removed

Output: '890'

Print very long strings completely in a pandas DataFrame

Question: Print very long strings completely in a pandas DataFrame

I am struggling with a seemingly very simple thing. I have a pandas data frame containing a very long string.

df = pd.DataFrame({'one' : ['one', 'two', 
      'This is very long string very long string very long string veryvery long string']})

Now when I try to print it, I do not see the full string; I only see part of it.

I tried the following options:

  • using print(df.iloc[2])
  • using to_html
  • using to_string
  • One of the Stack Overflow answers suggested increasing the column width using the pandas display option, but that did not work either.
  • I also did not understand how set_printoptions would help me.

Any ideas are appreciated. It looks very simple, but I am not able to get it!


Answer 0

You can use options.display.max_colwidth to specify you want to see more in the default representation:

In [2]: df
Out[2]:
                                                 one
0                                                one
1                                                two
2  This is very long string very long string very...

In [3]: pd.options.display.max_colwidth
Out[3]: 50

In [4]: pd.options.display.max_colwidth = 100

In [5]: df
Out[5]:
                                                                               one
0                                                                              one
1                                                                              two
2  This is very long string very long string very long string veryvery long string

And indeed, if you just want to inspect the one value, by accessing it (as a scalar, not as a row as df.iloc[2] does) you also see the full string:

In [7]: df.iloc[2,0]    # or df.loc[2,'one']
Out[7]: 'This is very long string very long string very long string veryvery long string'

Answer 1

Use pd.set_option('display.max_colwidth', None) for automatic linebreaks and multi-line cells (older pandas versions used -1 for this, which was deprecated in pandas 1.0).

This is a great resource on how to use Jupyter’s display with pandas to the fullest.


Answer 2

Another, pretty simple approach is to call the list function on a slice of the column:

list(df['one'][2:])
# output:
['This is very long string very long string very long string veryvery long string']

Needless to say, converting a whole column to a list is not a good idea, but for a single row, why not?


Answer 3

Another easier way to print the whole string is to call values on the dataframe.

df = pd.DataFrame({'one' : ['one', 'two', 
      'This is very long string very long string very long string veryvery long string']})

print(df.values)

The Output will be

[['one']
 ['two']
 ['This is very long string very long string very long string veryvery long string']]

Answer 4

Is this what you meant to do ?

In [7]: x =  pd.DataFrame({'one' : ['one', 'two', 'This is very long string very long string very long string veryvery long string']})

In [8]: x
Out[8]: 
                                                 one
0                                                one
1                                                two
2  This is very long string very long string very...

In [9]: x['one'][2]
Out[9]: 'This is very long string very long string very long string veryvery long string'

Answer 5

The way I often deal with the situation you describe is to use the .to_csv() method and write to stdout:

import sys

df.to_csv(sys.stdout)

Update: it should now be possible to just use None instead of sys.stdout with similar effect!

This should dump the whole dataframe, including the entirety of any strings. You can use the to_csv parameters to configure column separators, whether the index is printed, etc. It will be less pretty than rendering it properly though.
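
As a small illustration of the point above, here is a sketch. The DataFrame contents are made up to mirror the question's example, and a reasonably recent pandas is assumed. Calling to_csv with None (or with no target at all) returns the CSV text as a string instead of writing it anywhere, so it can be printed or inspected:

```python
import pandas as pd

# Sample data (illustrative, mirrors the question's example)
long_text = ("This is very long string very long string "
             "very long string veryvery long string")
df = pd.DataFrame({"one": ["one", "two", long_text]})

# With no target (None), to_csv returns the CSV text as a string.
csv_text = df.to_csv(None)
print(csv_text)

# The usual to_csv parameters still apply: drop the index, use tabs.
tab_text = df.to_csv(None, index=False, sep="\t")
print(tab_text)
```

Nothing is truncated here because to_csv does not go through the display options at all.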

I posted this originally in answer to the somewhat-related question at Output data from all columns in a dataframe in pandas


Answer 6

Just add the following line to your code before printing.

 pd.options.display.max_colwidth = 90  # set a value that suits your needs

You can set other display options in the same way:

  • Change the max_columns option to display more columns:

    import pandas as pd
    pd.options.display.max_columns = 10

    (this allows 10 columns to be displayed; change it as you need)

  • Likewise, change the max_rows option to display more rows:

    pd.options.display.max_rows = 999

    (this allows 999 rows to be printed at a time)

This should work fine.

Please refer to the documentation for more pandas options and settings.


Answer 7

I have created a small utility function, this works well for me

def display_text_max_col_width(df, width):
    with pd.option_context('display.max_colwidth', width):
        print(df)

display_text_max_col_width(train_df["Description"], 800)

I can change length of the width as per my requirement, without setting any option permanently.
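
A self-contained variant of the same idea, as a sketch only: the helper name full_repr and the sample frame are illustrative, and passing None to lift the limit assumes pandas 1.0 or later. It returns the text instead of printing it, and the option reverts to its previous value when the with block exits:

```python
import pandas as pd

# Sample data (illustrative)
long_text = ("This is very long string very long string "
             "very long string veryvery long string")
df = pd.DataFrame({"one": ["one", "two", long_text]})

def full_repr(obj, width=None):
    """Return repr(obj) with display.max_colwidth temporarily set to width."""
    with pd.option_context("display.max_colwidth", width):
        return repr(obj)

before = pd.get_option("display.max_colwidth")
text = full_repr(df)                      # None means no column-width limit
after = pd.get_option("display.max_colwidth")

print(text)
print(before == after)  # the global setting is left untouched
```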


Answer 8

If you’re using jupyter notebook, you can also print pandas dataframe as HTML table, which will print full strings.

from IPython.display import display, HTML
display(HTML(df.to_html()))

Output

    one
0   one
1   two
2   This is very long string very long string very long string veryvery long string

Most pythonic way to interleave two strings

Question: Most pythonic way to interleave two strings

What’s the most pythonic way to mesh two strings together?

For example:

Input:

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

Output:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Answer 0

For me, the most pythonic* way is the following which pretty much does the same thing but uses the + operator for concatenating the individual characters in each string:

res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

It is also faster than using two join() calls:

In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000

In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop

In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop

Faster approaches exist, but they often obfuscate the code.

Note: If the two input strings are not the same length then the longer one will be truncated as zip stops iterating at the end of the shorter string. In this case instead of zip one should use zip_longest (izip_longest in Python 2) from the itertools module to ensure that both strings are fully exhausted.
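
A minimal sketch of that zip_longest variant, assuming Python 3 (the function name interleave is just for illustration). Using fillvalue="" means the leftover characters of the longer string are appended unchanged:

```python
from itertools import zip_longest

def interleave(a, b):
    """Interleave two strings; the tail of the longer one is kept."""
    # fillvalue="" pads the shorter string with empty strings, so i + j
    # never sees None once one input runs out.
    return "".join(i + j for i, j in zip_longest(a, b, fillvalue=""))

print(interleave("ABC", "abc"))     # AaBbCc
print(interleave("foobar", "baz"))  # fboaozbar
```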


*To take a quote from the Zen of Python: Readability counts.
Pythonic = readability for me; i + j is just visually parsed more easily, at least for my eyes.


Answer 1

Faster Alternative

Another way:

res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))

Output:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Speed

Looks like it is faster:

%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)

100000 loops, best of 3: 4.75 µs per loop

than the fastest solution so far:

%timeit "".join(list(chain.from_iterable(zip(u, l))))

100000 loops, best of 3: 6.52 µs per loop

Also for the larger strings:

l1 = 'A' * 1000000; l2 = 'a' * 1000000

%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop


%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)

10 loops, best of 3: 92 ms per loop

Python 3.5.1.

Variation for strings with different lengths

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'

Shorter one determines length (zip() equivalent)

min_len = min(len(u), len(l))
res = [''] * min_len * 2 
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))

Output:

AaBbCcDdEeFfGgHhIiJjKkLl

Longer one determines length (itertools.zip_longest(fillvalue='') equivalent)

min_len = min(len(u), len(l))
res = [''] * min_len * 2 
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))

Output:

AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
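
Both variations can be folded into one helper. This is only a sketch of the slice-assignment idea above; the function name and the keep_tail flag are made up for illustration:

```python
def interleave_slices(a, b, keep_tail=True):
    """Interleave a and b via slice assignment into a preallocated list."""
    n = min(len(a), len(b))
    res = [""] * (2 * n)
    res[::2] = a[:n]    # even positions take characters from a
    res[1::2] = b[:n]   # odd positions take characters from b
    if keep_tail:
        # zip_longest-like behaviour: append whatever is left over
        res.append(a[n:] + b[n:])
    return "".join(res)

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'
print(interleave_slices(u, l, keep_tail=False))  # AaBbCcDdEeFfGgHhIiJjKkLl
print(interleave_slices(u, l))  # AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
```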

Answer 2

With join() and zip().

>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Answer 3

On Python 2, by far the fastest way to do things, at ~3x the speed of list slicing for small strings and ~30x for long ones, is

res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)

This wouldn’t work on Python 3, though. You could implement something like

res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")

but by then you’ve already lost the gains over list slicing for small strings (it’s still 20x the speed for long strings) and this doesn’t even work for non-ASCII characters yet.

FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings… here’s how to do it:

res = bytearray(len(u) * 4 * 2)

u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]

l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]

res.decode("utf_32_be")

Special-casing the common case of smaller types will help too. FWIW, this is only 3x the speed of list slicing for long strings and a factor of 4 to 5 slower for small strings.

Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.


Answer 4

If you want the fastest way, you can combine itertools with operator.add:

In [36]: from operator import add

In [37]: from itertools import  starmap, izip

In [38]: timeit "".join([i + j for i, j in izip(l1, l2)])
1 loops, best of 3: 142 ms per loop

In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop

In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop

In [41]:  "".join(starmap(add, izip(l1,l2))) ==  "".join([i + j   for i, j in izip(l1, l2)]) ==  "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True

But combining izip and chain.from_iterable is faster again

In [2]: from itertools import  chain, izip

In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop

There is also a substantial difference between chain(* and chain.from_iterable(....

In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop

join gains nothing from being handed a generator. Passing one is always going to be slower, because Python first builds a list from its contents: join makes two passes over the data, one to work out the total size needed and one to actually do the join, and that is not possible with a generator:

join.h:

 /* Here is the general case.  Do a pre-pass to figure out the total
  * amount of space we'll need (sz), and see whether all arguments are
  * bytes-like.
   */
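
To check this on your own machine, here is a rough sketch using the standard timeit module. It assumes Python 3, so zip stands in for izip; the string size and repeat count are arbitrary choices, and timings vary between machines, so none are shown:

```python
import timeit
from itertools import chain

l1 = "A" * 100000
l2 = "a" * 100000

def with_generator():
    # join has to materialise the generator before it can size the result
    return "".join(i + j for i, j in zip(l1, l2))

def with_list():
    # the list is already built, so join can do its two passes directly
    return "".join([i + j for i, j in zip(l1, l2)])

def with_chain():
    # flattens the (upper, lower) pairs without concatenating strings
    return "".join(chain.from_iterable(zip(l1, l2)))

for func in (with_generator, with_list, with_chain):
    seconds = timeit.timeit(func, number=5)
    print("{:<15} {:.4f} s".format(func.__name__, seconds))
```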

Also if you have different length strings and you don’t want to lose data you can use izip_longest :

In [22]: from itertools import izip_longest    
In [23]: a,b = "hlo","elworld"

In [24]:  "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'

For python 3 it is called zip_longest
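
For reference, a Python 3 sketch of the same call. The only change assumed here is the import (zip_longest instead of izip_longest):

```python
from itertools import chain, zip_longest

a, b = "hlo", "elworld"

# fillvalue="" keeps the tail of the longer string instead of dropping it
merged = "".join(chain.from_iterable(zip_longest(a, b, fillvalue="")))
print(merged)  # helloworld
```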

But for python2, veedrac’s suggestion is by far the fastest:

In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
   ....: 
100 loops, best of 3: 2.68 ms per loop

Answer 5

You could also do this using map and operator.add:

from operator import add

u = 'AAAAA'
l = 'aaaaa'

s = "".join(map(add, u, l))

Output:

'AaAaAaAaAa'

What map does is take each element from the first iterable u and the corresponding element from the second iterable l, and apply the function supplied as its first argument, add, to each pair. Then join just joins the results.
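
One thing worth knowing, shown here as a small Python 3 sketch with made-up inputs: when the strings differ in length, map stops at the shorter one, just like zip. (On Python 2, map instead padded the shorter input with None, so add would fail with a TypeError.)

```python
from operator import add

u = "ABCDE"
l = "abc"

# map pairs the strings up element by element and stops after 'c'
pairs = list(map(add, u, l))
print(pairs)           # ['Aa', 'Bb', 'Cc']
print("".join(pairs))  # AaBbCc
```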


Answer 6

Jim’s answer is great, but here’s my favorite option, if you don’t mind a couple of imports:

from functools import reduce
from operator import add

reduce(add, map(add, u, l))

Answer 7

A lot of these suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems that you might want to accommodate strings of differing lengths too. Or am I the only one thinking the mesh should work a bit like this:

u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"

One way to do this would be the following:

def mesh(a,b):
    minlen = min(len(a),len(b))
    return "".join(["".join(x+y for x,y in zip(a,b)),a[minlen:],b[minlen:]])

Answer 8

I like using two fors, the variable names can give a hint/reminder to what is going on:

"".join(char for pair in zip(u,l) for char in pair)

Answer 9

Just to add another, more basic approach:

st = ""
for char in u:
    st = "{0}{1}{2}".format( st, char, l[ u.index( char ) ] )
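
A caveat with this loop: u.index(char) returns the position of the first occurrence of char, so it only gives the right partner when every character in u is unique (and it rescans u on every step). A sketch of the same loop using enumerate, with made-up inputs that contain a repeated character:

```python
u = "AAB"
l = "xyz"

# Original approach: index() always finds the first 'A'
st = ""
for char in u:
    st = "{0}{1}{2}".format(st, char, l[u.index(char)])
print(st)     # AxAxBz

# enumerate() supplies the real position of each character
fixed = ""
for i, char in enumerate(u):
    fixed = "{0}{1}{2}".format(fixed, char, l[i])
print(fixed)  # AxAyBz
```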

Answer 10

It feels a bit un-pythonic not to consider the double-list-comprehension answer here, which handles n strings with O(1) effort:

"".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)

where all_strings is a list of the strings you want to interleave. In your case, all_strings = [u, l]. A full use example would look like this:

import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings) for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
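
One caveat, sketched below with a hypothetical helper name: without a fillvalue, zip_longest pads the shorter strings with None, and "".join then raises a TypeError. Passing fillvalue="" makes the same expression safe for strings of different lengths:

```python
import itertools

def interleave_all(*strings):
    """Interleave any number of strings, keeping the tails of longer ones."""
    groups = itertools.zip_longest(*strings, fillvalue="")
    return "".join(c for cs in groups for c in cs)

print(interleave_all("ABC", "abc"))         # AaBbCc
print(interleave_all("abc", "12", "XYZW"))  # a1Xb2YcZW
```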

Is it the fastest? Probably not, but it is simple and flexible. Also, without too much added complexity, it is slightly faster than the accepted answer (in general, string addition is a bit slow in Python):

In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;

In [8]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop

In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop

Answer 11

Potentially faster and shorter than the current leading solution:

from itertools import chain

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

res = "".join(chain(*zip(u, l)))

The strategy, speed-wise, is to do as much as possible at the C level. The same zip_longest() fix applies for uneven strings, and it comes from the same module as chain(), so you can’t ding me too many points there!

Other solutions I came up with along the way:

res = "".join(u[x] + l[x] for x in range(len(u)))

res = "".join(k + l[i] for i, k in enumerate(u))

Answer 12

You could use iteration_utilities.roundrobin (see note 1):

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

or the ManyIterables class from the same package:

from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

1 This is from a third-party library I have written: iteration_utilities.


Answer 13

I would use zip() to get a readable and easy way:

result = ''
for cha, chb in zip(u, l):
    result += '%s%s' % (cha, chb)

print(result)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'