如何在特定子字符串之后获取字符串？

Question 1

How can I get a string after a specific substring?

For example, I want to get the string after "world" in my_string="hello python world , I'm a beginner " (which in this case it is:, I'm a beginner)

Question 2

The easiest way is probably just to split on your target word

my_string="hello python world , i'm a beginner "
print my_string.split("world",1)[1]

split takes the word(or character) to split on and optionally a limit to the number of splits.

In this example split on “world” and limit it to only one split.

Question 3

s1 = "hello python world , i'm a beginner "
s2 = "world"

print s1[s1.index(s2) + len(s2):]

If you want to deal with the case where s2 is not present in s1, then use s1.find(s2) as opposed to index. If the return value of that call is -1, then s2 is not in s1.

Question 4

I’m surprised nobody mentioned partition.

def substring_after(s, delim):
    return s.partition(delim)[2]

IMHO, this solution is more readable than @arshajii’s. Other than that, I think @arshajii’s is the best for being the fastest — it does not create any unnecessary copies/substrings.

Question 5

You want to use str.partition():

>>> my_string.partition("world")[2]
" , i'm a beginner "

because this option is faster than the alternatives.

Note that this produces an empty string if the delimiter is missing:

>>> my_string.partition("Monty")[2]  # delimiter missing
''

If you want to have the original string, then test if the second value returned from str.partition() is non-empty:

prefix, success, result = my_string.partition(delimiter)
if not success: result = prefix

You could also use str.split() with a limit of 1:

>>> my_string.split("world", 1)[-1]
" , i'm a beginner "
>>> my_string.split("Monty", 1)[-1]  # delimiter missing
"hello python world , i'm a beginner "

However, this option is slower. For a best-case scenario, str.partition() is easily about 15% faster compared to str.split():

                                missing        first         lower         upper          last
      str.partition(...)[2]:  [3.745 usec]  [0.434 usec]  [1.533 usec]  <3.543 usec>  [4.075 usec]
str.partition(...) and test:   3.793 usec    0.445 usec    1.597 usec    3.208 usec    4.170 usec
      str.split(..., 1)[-1]:  <3.817 usec>  <0.518 usec>  <1.632 usec>  [3.191 usec]  <4.173 usec>
            % best vs worst:         1.9%         16.2%          6.1%          9.9%          2.3%

This shows timings per execution with inputs here the delimiter is either missing (worst-case scenario), placed first (best case scenario), or in the lower half, upper half or last position. The fastest time is marked with [...] and <...> marks the worst.

The above table is produced by a comprehensive time trial for all three options, produced below. I ran the tests on Python 3.7.4 on a 2017 model 15″ Macbook Pro with 2.9 GHz Intel Core i7 and 16 GB ram.

This script generates random sentences with and without the randomly selected delimiter present, and if present, at different positions in the generated sentence, runs the tests in random order with repeats (producing the fairest results accounting for random OS events taking place during testing), and then prints a table of the results:

import random
from itertools import product
from operator import itemgetter
from pathlib import Path
from timeit import Timer

setup = "from __main__ import sentence as s, delimiter as d"
tests = {
    "str.partition(...)[2]": "r = s.partition(d)[2]",
    "str.partition(...) and test": (
        "prefix, success, result = s.partition(d)\n"
        "if not success: result = prefix"
    ),
    "str.split(..., 1)[-1]": "r = s.split(d, 1)[-1]",
}

placement = "missing first lower upper last".split()
delimiter_count = 3

wordfile = Path("/usr/dict/words")  # Linux
if not wordfile.exists():
    # macos
    wordfile = Path("/usr/share/dict/words")
words = [w.strip() for w in wordfile.open()]

def gen_sentence(delimiter, where="missing", l=1000):
    """Generate a random sentence of length l

    The delimiter is incorporated according to the value of where:

    "missing": no delimiter
    "first":   delimiter is the first word
    "lower":   delimiter is present in the first half
    "upper":   delimiter is present in the second half
    "last":    delimiter is the last word

    """
    possible = [w for w in words if delimiter not in w]
    sentence = random.choices(possible, k=l)
    half = l // 2
    if where == "first":
        # best case, at the start
        sentence[0] = delimiter
    elif where == "lower":
        # lower half
        sentence[random.randrange(1, half)] = delimiter
    elif where == "upper":
        sentence[random.randrange(half, l)] = delimiter
    elif where == "last":
        sentence[-1] = delimiter
    # else: worst case, no delimiter

    return " ".join(sentence)

delimiters = random.choices(words, k=delimiter_count)
timings = {}
sentences = [
    # where, delimiter, sentence
    (w, d, gen_sentence(d, w)) for d, w in product(delimiters, placement)
]
test_mix = [
    # label, test, where, delimiter sentence
    (*t, *s) for t, s in product(tests.items(), sentences)
]
random.shuffle(test_mix)

for i, (label, test, where, delimiter, sentence) in enumerate(test_mix, 1):
    print(f"\rRunning timed tests, {i:2d}/{len(test_mix)}", end="")
    t = Timer(test, setup)
    number, _ = t.autorange()
    results = t.repeat(5, number)
    # best time for this specific random sentence and placement
    timings.setdefault(
        label, {}
    ).setdefault(
        where, []
    ).append(min(dt / number for dt in results))

print()

scales = [(1.0, 'sec'), (0.001, 'msec'), (1e-06, 'usec'), (1e-09, 'nsec')]
width = max(map(len, timings))
rows = []
bestrow = dict.fromkeys(placement, (float("inf"), None))
worstrow = dict.fromkeys(placement, (float("-inf"), None))

for row, label in enumerate(tests):
    columns = []
    worst = float("-inf")
    for p in placement:
        timing = min(timings[label][p])
        if timing < bestrow[p][0]:
            bestrow[p] = (timing, row)
        if timing > worstrow[p][0]:
            worstrow[p] = (timing, row)
        worst = max(timing, worst)
        columns.append(timing)

    scale, unit = next((s, u) for s, u in scales if worst >= s)
    rows.append(
        [f"{label:>{width}}:", *(f" {c / scale:.3f} {unit} " for c in columns)]
    )

colwidth = max(len(c) for r in rows for c in r[1:])
print(' ' * (width + 1), *(p.center(colwidth) for p in placement), sep="  ")
for r, row in enumerate(rows):
    for c, p in enumerate(placement, 1):
        if bestrow[p][1] == r:
            row[c] = f"[{row[c][1:-1]}]"
        elif worstrow[p][1] == r:
            row[c] = f"<{row[c][1:-1]}>"
    print(*row, sep="  ")

percentages = []
for p in placement:
    best, worst = bestrow[p][0], worstrow[p][0]
    ratio = ((worst - best) / worst)
    percentages.append(f"{ratio:{colwidth - 1}.1%} ")

print("% best vs worst:".rjust(width + 1), *percentages, sep="  ")

Question 6

If you want to do this using regex, you could simply use a non-capturing group, to get the word “world” and then grab everything after, like so

(?:world).*

The example string is tested here

Question 7

You can use this package called “substring”. Just type “pip install substring”. You can get the substring by just mentioning the start and end characters/indices.

For example:

import substring

s = substring.substringByChar("abcdefghijklmnop", startChar="d", endChar="n")

print(s)

Output:

s = defghijklmn

Question 8

It’s an old question but i faced a very same scenario, i need to split a string using as demiliter the word “low” the problem for me was that i have in the same string the word below and lower.

I solved it using the re module this way

import re

string = '...below...as higher prices mean lower demand to be expected. Generally, a high reading is seen as negative (or bearish), while a low reading is seen as positive (or bullish) for the Korean Won.'

use re.split with regex to match the exact word

stringafterword = re.split('\\blow\\b',string)[-1]
print(stringafterword)
' reading is seen as positive (or bullish) for the Korean Won.'

the generic code is:

re.split('\\bTHE_WORD_YOU_WANT\\b',string)[-1]

Hope this can help someone!

Question 9

Try this general approach:

import re
my_string="hello python world , i'm a beginner "
p = re.compile("world(.*)")
print (p.findall(my_string))

#[" , i'm a beginner "]

Question 10

In Python 3.9, a new removeprefix method is being added:

>>> 'TestHook'.removeprefix('Test')
'Hook'
>>> 'BaseTestCase'.removeprefix('Test')
'BaseTestCase'

Documentation: https://docs.python.org/3.9/library/stdtypes.html#str.removeprefix
Announcement: https://docs.python.org/3.9/whatsnew/3.9.html

如何在特定子字符串之后获取字符串？

问题：如何在特定子字符串之后获取字符串？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

s = defghijklmn

s = defghijklmn

回答 6

回答 7

回答 8

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

Python 流程图 — 一键转化代码为流程图

7行代码 Python热力图可视化分析缺失数据处理

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

重命名熊猫列

将日期时间格式化为字符串（以毫秒为单位）

熊猫将某些列转换为行

“ foo is None”和“ foo == None”之间有什么区别吗？

如何替换熊猫数据框的列中的文本？

如何使用Django的ORM提取随机记录？

如何在特定子字符串之后获取字符串？

问题：如何在特定子字符串之后获取字符串？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

s = defghijklmn

s = defghijklmn

回答 6

回答 7

回答 8

相关文章

排行榜展示

文章展示