Question: How to check if a string contains an element from a list in Python
I have something like this:
extensionsToCheck = ['.pdf', '.doc', '.xls']
for extension in extensionsToCheck:
    if extension in url_string:
        print(url_string)
I am wondering what would be a more elegant way to do this in Python (without using the for loop)? I was thinking of something like this (as in C/C++), but it didn't work:
if ('.pdf' or '.doc' or '.xls') in url_string:
    print(url_string)
Edit: I’m kinda forced to explain how this is different to the question below which is marked as potential duplicate (so it doesn’t get closed I guess).
The difference is, I wanted to check if a string is part of some list of strings whereas the other question is checking whether a string from a list of strings is a substring of another string. Similar, but not quite the same and semantics matter when you’re looking for an answer online IMHO. These two questions are actually looking to solve the opposite problem of one another. The solution for both turns out to be the same though.
Answer 0
Use a generator together with any, which short-circuits on the first True:
if any(ext in url_string for ext in extensionsToCheck):
    print(url_string)
EDIT: I see this answer has been accepted by the OP. Though my solution may be a "good enough" solution to his particular problem, and is a good general way to check whether any string in a list is found in another string, keep in mind that this is all the solution does. It does not care WHERE the string is found, e.g. whether it is at the end of the string. If this is important, as is often the case with URLs, you should look at the answer by @Wladimir Palant, or you risk getting false positives.
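A minimal sketch of the behavior and the caveat described above. The helper name contains_any and the example.com URLs are made up for illustration. The last line also shows why the expression from the question fails: or returns its first truthy operand, so only '.pdf' is ever tested.

```python
extensions_to_check = ['.pdf', '.doc', '.xls']

def contains_any(text, needles):
    """Return True if any needle occurs anywhere in text."""
    # any() stops consuming the generator at the first True result.
    return any(needle in text for needle in needles)

# Hypothetical URLs, used only for illustration.
print(contains_any('http://example.com/report.xls', extensions_to_check))  # True
print(contains_any('http://example.com/image.jpg', extensions_to_check))   # False

# False positive: '.doc' occurs inside the path, but the file is an .exe.
print(contains_any('http://example.com/foo.doc/file.exe', extensions_to_check))  # True

# `or` yields its first truthy operand, so this is just '.pdf'.
print('.pdf' or '.doc' or '.xls')  # .pdf
```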
Answer 1
extensionsToCheck = ('.pdf', '.doc', '.xls')
'test.doc'.endswith(extensionsToCheck) # returns True
'test.jpg'.endswith(extensionsToCheck) # returns False
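Note that str.endswith accepts a single string or a tuple of strings, but not a list. A small sketch, assuming the extensions start out as a list as in the question; has_extension is a hypothetical helper name:

```python
extensions_to_check = ['.pdf', '.doc', '.xls']  # a list, as in the question

def has_extension(name, extensions):
    """Return True if name ends with one of the given extensions."""
    # endswith() raises TypeError for a list, so convert to a tuple first.
    return name.endswith(tuple(extensions))

print(has_extension('test.doc', extensions_to_check))  # True
print(has_extension('test.jpg', extensions_to_check))  # False
```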
Answer 2
It is better to parse the URL properly. This way you can handle http://.../file.doc?foo and http://.../foo.doc/file.exe correctly.
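# Python 3; on Python 2 use: from urlparse import urlparse
from urllib.parse import urlparse
import os

# Take only the path component, so the query string (?foo) is ignored.
path = urlparse(url_string).path
# splitext returns (root, ext); ext is the extension of the last path segment.
ext = os.path.splitext(path)[1]
if ext in extensionsToCheck:
    print(url_string)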
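A worked sketch of the same idea on the two kinds of URL mentioned above. It assumes Python 3, where urlparse lives in urllib.parse. The host example.com stands in for the elided one, and url_extension is a hypothetical helper name.

```python
import os
from urllib.parse import urlparse

extensions_to_check = ('.pdf', '.doc', '.xls')

def url_extension(url):
    """Return the extension of the last path component, or '' if none."""
    path = urlparse(url).path         # excludes the query string and fragment
    return os.path.splitext(path)[1]  # second item of (root, ext)

# Hypothetical URLs modeled on the two cases in the answer.
for url in ('http://example.com/file.doc?foo',
            'http://example.com/foo.doc/file.exe'):
    ext = url_extension(url)
    print(url, repr(ext), ext in extensions_to_check)
```

The first URL yields '.doc' despite the query string. The second yields '.exe', so the '.doc' in the middle of the path no longer causes a false positive.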
Answer 3
Use a list comprehension if you want a single-line solution. The following code prints a list containing url_string once for each of the extensions .doc, .pdf and .xls that it contains, or an empty list when it contains none of them.
print([url_string for extension in extensionsToCheck if extension in url_string])
NOTE: This only checks whether the string contains an extension; it is not useful when you want to extract the exact word matching the extension.
Answer 4
Check if it matches this regex:
'(\.pdf$|\.doc$|\.xls$)'
Note: if your extensions are not at the end of the URL, remove the $ characters, but that weakens the check slightly.
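One way to apply such a pattern, sketched with the standard re module. The names EXTENSION_RE and has_known_extension are made up for this example, and the pattern is a more compact spelling that should behave the same as the one above.

```python
import re

# A literal dot, one of the three extensions, then the end of the string
# ($ also matches just before a trailing newline).
EXTENSION_RE = re.compile(r'\.(pdf|doc|xls)$')

def has_known_extension(text):
    """Return True if text ends with .pdf, .doc or .xls."""
    # search() scans the whole string; match() would only try position 0.
    return EXTENSION_RE.search(text) is not None

# Hypothetical URLs, used only for illustration.
print(has_known_extension('http://example.com/foo.doc'))      # True
print(has_known_extension('http://example.com/foo.doc?x=1'))  # False
```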
Answer 5
This is a variant of the list comprehension answer given by @psun.
By switching the output value, you can actually extract the matching pattern from the list comprehension (something not possible with the any() approach by @Lauritz-v-Thaulow):
extensionsToCheck = ['.pdf', '.doc', '.xls']
url_string = 'http://.../foo.doc'
print([extension for extension in extensionsToCheck if extension in url_string])
['.doc']
You can furthermore insert a regular expression if you want to collect additional information once the matched pattern is known (this could be useful when the list of allowed patterns is too long to write into a single regex pattern):
import re
print([re.search(r'(\w+)' + extension, url_string).group(0) for extension in extensionsToCheck if extension in url_string])
['foo.doc']
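Because each extension contains a dot, which matches any character in a regular expression, it is probably safer to pass the extensions through re.escape before building a pattern from them. Checking the match object also avoids an AttributeError when the extension is present but not preceded by a word character. A sketch under those assumptions; matching_names is a hypothetical helper name:

```python
import re

extensions_to_check = ['.pdf', '.doc', '.xls']

def matching_names(text, extensions):
    """Return the 'word + extension' fragment for each extension found in text."""
    results = []
    for extension in extensions:
        # re.escape makes the dot in '.doc' match only a literal dot.
        match = re.search(r'\w+' + re.escape(extension), text)
        if match:  # search() returns None when nothing matches
            results.append(match.group(0))
    return results

print(matching_names('http://.../foo.doc', extensions_to_check))  # ['foo.doc']
```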