How do I remove a substring from the end of a string in Python?

Question: How do I remove a substring from the end of a string in Python?

I have the following code:

url = 'abcdc.com'
print(url.strip('.com'))

I expected: abcdc

I got: abcd

Now I do

url.rsplit('.com', 1)

Is there a better way?


Answer 0

strip doesn’t mean “remove this substring”. x.strip(y) treats y as a set of characters and strips any characters in that set from the ends of x.

Instead, you could use endswith and slicing:

url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]

Or using regular expressions:

import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)
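
To make the character-set behaviour described above concrete, here is a small sketch. The variable names are illustrative only; the sample string is the one from the question.

```python
# str.strip(chars) removes any of the listed characters from both ends;
# it does not look for the literal substring.
url = 'abcdc.com'
stripped = url.strip('.com')      # strips '.', 'c', 'o' and 'm' from the ends
print(stripped)                   # abcd

# The order of the characters in the argument makes no difference.
print(url.strip('moc.'))          # abcd

# endswith + slicing removes exactly one literal suffix.
suffix = '.com'
trimmed = url[:-len(suffix)] if url.endswith(suffix) else url
print(trimmed)                    # abcdc
```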

Answer 1

If you are sure that the string only appears at the end, then the simplest way would be to use ‘replace’:

url = 'abcdc.com'
print(url.replace('.com',''))
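
The "only appears at the end" condition matters. A short sketch of what happens when it does not hold (the sample URL is made up for illustration):

```python
# str.replace removes every occurrence, not just the one at the end.
url = 'www.community.com'
everywhere = url.replace('.com', '')
print(everywhere)                 # wwwmunity

# For comparison, removing only a trailing '.com':
only_end = url[:-len('.com')] if url.endswith('.com') else url
print(only_end)                   # www.community
```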

Answer 2

def strip_end(text, suffix):
    if not text.endswith(suffix):
        return text
    return text[:len(text)-len(suffix)]

Answer 3

Since it seems like nobody has pointed this one out yet:

url = "www.example.com"
new_url = url[:url.rfind(".")]

This should be more efficient than the methods using split() as no new list object is created, and this solution works for strings with several dots.
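
One thing to watch for: rfind returns -1 when there is no dot, and url[:-1] then silently drops the last character. A minimal sketch of a guarded version, where `cut_at_last_dot` is a hypothetical helper name and not part of the answer:

```python
def cut_at_last_dot(url):
    """Return url up to the last '.', or url unchanged if it has no dot."""
    index = url.rfind('.')
    return url if index == -1 else url[:index]

print(cut_at_last_dot('www.example.com'))   # www.example
print(cut_at_last_dot('localhost'))         # localhost
print('localhost'[:'localhost'.rfind('.')]) # localhos  (unguarded version)
```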


Answer 4

It depends on what you know about your URL and exactly what you’re trying to do. If you know that it will always end in ‘.com’ (or ‘.net’ or ‘.org’), then

 url=url[:-4]

is the quickest solution. If it’s a more general URL, then you’re probably better off looking into the urlparse library that comes with Python (urllib.parse in Python 3).

If, on the other hand, you simply want to remove everything after the final ‘.’ in a string, then

url.rsplit('.',1)[0]

will work. Or, if you just want everything up to the first ‘.’, then try

url.split('.',1)[0]
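
For the more general URL case, a rough sketch using urllib.parse (the Python 3 home of urlparse). The helper name `host_without_suffix` and the sample URLs are assumptions for illustration; the helper expects a full URL with a scheme.

```python
from urllib.parse import urlsplit

def host_without_suffix(url, suffix='.com'):
    """Hypothetical helper: extract the host name, then drop a known suffix."""
    host = urlsplit(url).hostname or ''   # hostname is None without a scheme
    if suffix and host.endswith(suffix):
        host = host[:-len(suffix)]
    return host

print(host_without_suffix('http://www.example.com/index.html?q=1'))  # www.example
print(host_without_suffix('http://www.example.org/'))                # www.example.org
```

Parsing first keeps the path and query string out of the way, so a '.com' that appears somewhere other than the end of the host name is not touched.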

Answer 5

If you know it’s an extension, then

url = 'abcdc.com'
...
url.rsplit('.', 1)[0]  # split at '.', starting from the right, maximum 1 split

This works equally well with abcdc.com or www.abcdc.com or abcdc.[anything] and is more extensible.


Answer 6

In one line:

text if not text.endswith(suffix) or len(suffix) == 0 else text[:-len(suffix)]

Answer 7

How about url[:-4]?


Answer 8

For URLs (which seem to be part of the topic, judging by the given example), one can do something like this:

import os
url = 'http://www.stackoverflow.com'
name,ext = os.path.splitext(url)
print (name, ext)

#Or:
ext = '.'+url.split('.')[-1]
name = url[:-len(ext)]
print (name, ext)

In both cases, name is 'http://www.stackoverflow' and ext is '.com'.

This can also be combined with str.endswith(suffix) if you need to just split “.com”, or anything specific.
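
A possible way to combine the two, sketched under the assumption that the string ends in the host name (os.path.splitext is meant for file paths, so a URL with a trailing path would behave differently). `split_if_suffix` is a hypothetical helper name.

```python
import os

def split_if_suffix(url, suffix):
    """Split off the extension only when url ends with the given suffix."""
    if url.endswith(suffix):
        return os.path.splitext(url)
    return (url, '')

print(split_if_suffix('http://www.stackoverflow.com', '.com'))
# ('http://www.stackoverflow', '.com')
print(split_if_suffix('http://www.stackoverflow.org', '.com'))
# ('http://www.stackoverflow.org', '')
```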


Answer 9

url.rsplit('.com', 1)

is not quite right.

What you actually would need to write is

url.rsplit('.com', 1)[0]

, and it looks pretty succinct IMHO.

However, my personal preference is this option because it uses only one parameter:

url.rpartition('.com')[0]
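
The two calls differ when the suffix is absent, and neither checks that the match is actually at the end. A short sketch with made-up sample strings:

```python
# Suffix present: both give the same result.
a = 'abcdc.com'.rsplit('.com', 1)[0]        # 'abcdc'
c = 'abcdc.com'.rpartition('.com')[0]       # 'abcdc'

# Suffix absent: rsplit returns the whole string, rpartition returns ''.
b = 'abcdc.org'.rsplit('.com', 1)[0]        # 'abcdc.org'
d = 'abcdc.org'.rpartition('.com')[0]       # ''

# Both split at the last occurrence, even if it is not at the very end.
e = 'abcdc.community'.rsplit('.com', 1)[0]  # 'abcdc'

print(a, b, c, repr(d), e)
```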

Answer 10

Starting in Python 3.9, you can use removesuffix instead:

'abcdc.com'.removesuffix('.com')
# 'abcdc'

Answer 11

If you need to strip a suffix from the end of a string when it is present and otherwise do nothing, these are my best solutions. You will probably want to use one of the first two implementations; I have included the third for completeness.

For a constant suffix:

def remove_suffix(v, s):
    # The "s and" guard keeps an empty suffix from slicing to v[:-0], i.e. ''.
    return v[:-len(s)] if s and v.endswith(s) else v
remove_suffix("abc.com", ".com") == 'abc'
remove_suffix("abc", ".com") == 'abc'

For a regex:

import re

def remove_suffix_compile(suffix_pattern):
    r = re.compile(f"(.*?)({suffix_pattern})?$")
    return lambda v: r.match(v)[1]
remove_domain = remove_suffix_compile(r"\.[a-zA-Z0-9]{3,}")
remove_domain("abc.com") == "abc"
remove_domain("sub.abc.net") == "sub.abc"
remove_domain("abc.") == "abc."
remove_domain("abc") == "abc"

For a collection of constant suffixes the asymptotically fastest way for a large number of calls:

def remove_suffix_preprocess(*suffixes):
    suffixes = set(suffixes)
    try:
        suffixes.remove('')
    except KeyError:
        pass

    def helper(suffixes, pos):
        if len(suffixes) == 1:
            suf = suffixes[0]
            l = -len(suf)
            ls = slice(0, l)
            return lambda v: v[ls] if v.endswith(suf) else v
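        # Scan every suffix (including the first) so that a suffix of exactly
        # the already-matched length is always detected, whatever the order.
        ml = None
        exact = False
        for suf in suffixes:
            l = len(suf)
            if pos is not None and -l == pos:
                exact = True
            elif ml is None or l < ml:
                ml = l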
        ml = -ml
        suffix_dict = {}
        for suf in suffixes:
            sub = suf[ml:pos]
            if sub in suffix_dict:
                suffix_dict[sub].append(suf)
            else:
                suffix_dict[sub] = [suf]
        if exact:
            del suffix_dict['']
            for key in suffix_dict:
                suffix_dict[key] = helper([s[:pos] for s in suffix_dict[key]], None)
            return lambda v: suffix_dict.get(v[ml:pos], lambda v: v)(v[:pos])
        else:
            for key in suffix_dict:
                suffix_dict[key] = helper(suffix_dict[key], ml)
            return lambda v: suffix_dict.get(v[ml:pos], lambda v: v)(v)
    return helper(tuple(suffixes), None)
domain_remove = remove_suffix_preprocess(".com", ".net", ".edu", ".uk", '.tv', '.co.uk', '.org.uk')

The final one is probably significantly faster in PyPy than in CPython. The regex variant is likely faster than this for virtually all cases that do not involve huge dictionaries of potential suffixes that cannot easily be represented as a regex, at least in CPython.

In PyPy, the regex variant is almost certainly slower for a large number of calls or long strings, even if the re module uses a DFA-compiling regex engine, as the vast majority of the overhead of the lambdas will be optimized out by the JIT.

In CPython, however, the fact that you are running C code for the regex comparison almost certainly outweighs the algorithmic advantages of the suffix-collection version in almost all cases.
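
For a small collection of suffixes, a plain loop is often enough. The following is a simpler alternative sketch, not the preprocessing approach above: it is linear in the number of suffixes, and the names `remove_any_suffix` and `SUFFIXES` are hypothetical. Trying longer suffixes first is a design choice of this sketch, so that '.co.uk' wins over '.uk'.

```python
def remove_any_suffix(text, suffixes):
    """Remove the longest matching suffix from text, if any."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and text.endswith(suffix):
            return text[:-len(suffix)]
    return text

SUFFIXES = ('.com', '.net', '.edu', '.uk', '.tv', '.co.uk', '.org.uk')

print(remove_any_suffix('example.co.uk', SUFFIXES))  # example
print(remove_any_suffix('example.uk', SUFFIXES))     # example
print(remove_any_suffix('example.io', SUFFIXES))     # example.io
```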


Answer 12

If you mean to only strip the extension:

'.'.join('abcdc.com'.split('.')[:-1])
# 'abcdc'

It works with any extension, with potential other dots existing in filename as well. It simply splits the string as a list on dots and joins it without the last element.


Answer 13

import re

def rm_suffix(url='abcdc.com', suffix=r'\.com'):
    # suffix is a regular expression; '$' anchors it to the end of the string
    return re.sub(suffix + '$', '', url)

I want to repeat this answer as the most expressive way to do it. Of course, the following would take less CPU time:

def rm_dotcom(url = 'abcdc.com'):
    return(url[:-4] if url.endswith('.com') else url)

However, if CPU is the bottleneck, why write in Python?

When is CPU a bottleneck anyway? In drivers, maybe.

The advantage of using regular expressions is code reusability. What if you next want to remove ‘.me’, which only has three characters?

Same code would do the trick:

>>> rm_suffix('abcdc.me', r'\.me')
'abcdc'
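
Because the suffix is interpreted as a regular expression, an unescaped '.' matches any character. If the suffix is meant as literal text, re.escape can be applied first. A minimal sketch, where `rm_literal_suffix` is a hypothetical name:

```python
import re

def rm_literal_suffix(text, suffix):
    """Variant of rm_suffix that treats suffix as literal text, not a pattern."""
    return re.sub(re.escape(suffix) + '$', '', text)

print(rm_literal_suffix('abcdc.me', '.me'))   # abcdc
print(rm_literal_suffix('abcdcame', '.me'))   # abcdcame (no literal '.me')
print(re.sub('.me$', '', 'abcdcame'))         # abcdc    (unescaped '.' matched 'a')
```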

Answer 14

In my case I needed to raise an exception so I did:

class UnableToStripEnd(Exception):
    """An exception type to indicate that the suffix cannot be removed from the text."""

    @staticmethod
    def get_exception(text, suffix):
        return UnableToStripEnd("Could not find suffix ({0}) on text: {1}."
                                .format(suffix, text))


def strip_end(text, suffix):
    """Removes the end of a string. Otherwise fails."""
    if not text.endswith(suffix):
        raise UnableToStripEnd.get_exception(text, suffix)
    return text[:len(text)-len(suffix)]

Answer 15

Here is the simplest code I have.

url=url.split(".")[0]

Answer 16

Assuming you want to remove the domain suffix, no matter what it is (.com, .net, etc.), I recommend finding the last . and removing everything from that point on.

url = 'abcdc.com'
dot_index = url.rfind('.')
url = url[:dot_index]

Here I’m using rfind to solve the problem of urls like abcdc.com.net which should be reduced to the name abcdc.com.

If you’re also concerned about www.s, you should explicitly check for them:

if url.startswith("www."):
   url = url.replace("www.","", 1)

The 1 in replace is for strange edge cases like www.net.www.com

If your URL gets any wilder than that, look at the regex answers people have responded with.


Answer 17

I used the built-in rstrip function to do it as follows:

string = "test.com"
suffix = ".com"
newstring = string.rstrip(suffix)
print(newstring)
test
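
Note that rstrip, like strip, treats its argument as a set of characters, so this only works when the character just before the suffix is not one of '.', 'c', 'o' or 'm'. A short sketch with the string from the question and one made-up example:

```python
# Works here only because 't' is not in the set {'.', 'c', 'o', 'm'}.
ok = "test.com".rstrip(".com")
print(ok)          # test

# The string from the question loses its trailing 'c' as well.
surprise = "abcdc.com".rstrip(".com")
print(surprise)    # abcd

# Any run of those characters at the end is removed.
worse = "telecom.com".rstrip(".com")
print(worse)       # tele
```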

Answer 18

You can use split:

'abccomputer.com'.split('.com',1)[0]
# 'abccomputer'

Answer 19

This is a perfect use for regular expressions:

>>> import re
>>> re.match(r"(.*)\.com$", "hello.com").group(1)
'hello'

Answer 20

Python >= 3.9:

'abcdc.com'.removesuffix('.com')

Python < 3.9:

def remove_suffix(text, suffix):
    if suffix and text.endswith(suffix):  # an empty suffix would slice to text[:-0] == ''
        text = text[:-len(suffix)]
    return text

remove_suffix('abcdc.com', '.com')