Search for “does not contain” on a DataFrame in pandas

Question 1

I’ve done some searching and found how to filter a DataFrame with df["col"].str.contains(word); however, I’m wondering if there is a way to do the reverse: filter a DataFrame by that set’s complement, e.g. to the effect of !(df["col"].str.contains(word)).

Can this be done through a DataFrame method?

Question 2

You can use the invert (~) operator (which acts like a not for boolean data):

new_df = df[~df["col"].str.contains(word)]

where new_df is the copy returned by the right-hand side.
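As a minimal sketch of the expression above (the sample data and the value of word are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; "col" and word mirror the names used above.
df = pd.DataFrame({"col": ["apple pie", "banana split", "apple tart", "cherry cake"]})
word = "apple"

# str.contains returns a boolean Series; ~ flips True and False element-wise.
mask = df["col"].str.contains(word)
new_df = df[~mask]

print(new_df["col"].tolist())
```

The original index labels are preserved in new_df; call reset_index(drop=True) if a fresh 0..n index is wanted.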

contains also accepts a regular expression…
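Because the pattern is treated as a regular expression by default, characters such as "." have a special meaning. A small sketch of the difference, using made-up data, with regex=False and re.escape as two ways to ask for a literal match:

```python
import re
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"col": ["a.b", "axb", "a+b", "plain"]})

# Default: the pattern is a regex, so "." matches any single character.
regex_mask = df["col"].str.contains("a.b")

# regex=False treats the pattern as a plain substring.
literal_mask = df["col"].str.contains("a.b", regex=False)

# re.escape gives the same literal behaviour while staying in regex mode.
escaped_mask = df["col"].str.contains(re.escape("a.b"))

not_regex = df[~regex_mask]["col"].tolist()
not_literal = df[~literal_mask]["col"].tolist()
print(not_regex, not_literal)
```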

If the above throws a ValueError, the likely cause is that the column holds mixed datatypes (for example, missing values alongside strings), so pass na=False:

new_df = df[~df["col"].str.contains(word, na=False)]
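The na argument decides how missing values are treated before the negation, which in turn decides whether those rows survive the filter. A short sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"col": ["myword here", np.nan, "something else"]})
word = "myword"

# na=False: missing values count as "does not contain", so ~ keeps them.
kept_na = df[~df["col"].str.contains(word, na=False)]

# na=True: missing values count as "contains", so ~ drops them.
dropped_na = df[~df["col"].str.contains(word, na=True)]

print(list(kept_na.index), list(dropped_na.index))
```

Which of the two is right depends on whether rows with no value should appear in the "does not contain" result.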

Or,

new_df = df[df["col"].str.contains(word) == False]

Question 3

I was having trouble with the not (~) symbol as well, so here’s another way, taken from another Stack Overflow thread:

df[df["col"].str.contains('this|that')==False]

Question 4

You can use apply with a lambda to select rows where a column’s value is not one of the items in a list. Note that this tests whole-value equality, not substring containment. For your scenario:

df[df["col"].apply(lambda x: x not in [word1, word2, word3])]
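For whole-value membership like this, the vectorized isin method combined with ~ should give the same rows without a Python-level lambda. A sketch, where the excluded list is a hypothetical stand-in for word1, word2, word3:

```python
import pandas as pd

# Hypothetical sample data; "dark red" contains "red" but is not equal to it.
df = pd.DataFrame({"col": ["red", "green", "dark red", "blue"]})
excluded = ["red", "blue"]  # stand-ins for word1, word2, word3

# Row-by-row membership test, as in the answer above.
via_apply = df[df["col"].apply(lambda x: x not in excluded)]

# Vectorized equivalent: isin builds the mask, ~ negates it.
via_isin = df[~df["col"].isin(excluded)]

print(via_isin["col"].tolist())
```

"dark red" is kept in both results, which shows that this approach compares entire values and is not a substring search.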

Question 5

I had to get rid of the NULL values before using the command recommended by Andy above. An example:

import pandas as pd

# .ix was removed in pandas 1.0; .loc is the label-based replacement
df = pd.DataFrame(index=[0, 1, 2], columns=['first', 'second', 'third'])
df.loc[:, 'first'] = 'myword'
df.loc[0, 'second'] = 'myword'
df.loc[2, 'second'] = 'myword'
df.loc[1, 'third'] = 'myword'
df

    first   second  third
0   myword  myword   NaN
1   myword  NaN      myword 
2   myword  myword   NaN

Now running the command:

~df["second"].str.contains(word)

I get the following error:

TypeError: bad operand type for unary ~: 'float'

I got rid of the NULL values using dropna() or fillna() first and retried the command with no problem.
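The two clean-up routes mentioned here could look roughly like this. It is a sketch that rebuilds the same three-row frame; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Same shape as the example above: "second" is missing in row 1.
df = pd.DataFrame({
    "first": ["myword", "myword", "myword"],
    "second": ["myword", np.nan, "myword"],
    "third": [np.nan, "myword", np.nan],
})
word = "myword"

# Option 1: drop rows where "second" is missing, then negate the mask.
no_na = df.dropna(subset=["second"])
result_dropna = no_na[~no_na["second"].str.contains(word)]

# Option 2: replace missing values with an empty string, then negate.
filled = df["second"].fillna("")
result_fillna = df[~filled.str.contains(word)]

print(len(result_dropna), list(result_fillna.index))
```

The two options differ in what happens to the row with the missing value: dropna removes it entirely, while fillna("") keeps it as a row that does not contain the word.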

Question 6

The basics are already covered in the answers above.

I am adding a pattern for searching for multiple words and excluding the matching rows from a DataFrame.

Here 'word1', 'word2', 'word3', 'word4' = list of patterns to search for

df = DataFrame

column_a = a column name from DataFrame df

Search_for_These_values = ['word1', 'word2', 'word3', 'word4']

pattern = '|'.join(Search_for_These_values)

result = df.loc[~(df['column_a'].str.contains(pattern, case=False))]
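A runnable sketch of this multi-word pattern follows. The sample data and search terms are made up, and two additions go beyond the answer itself: each term is passed through re.escape so that regex metacharacters are matched literally, and na=False is set so missing values do not break the negation.

```python
import re
import pandas as pd

# Hypothetical sample data and search terms for illustration.
df = pd.DataFrame(
    {"column_a": ["Word1 is here", "nothing to see", "has WORD3", "c++ code"]}
)
search_for_these_values = ["word1", "word3", "c++"]

# Escape each term so characters such as "+" are matched literally,
# then join the terms with "|" to mean "any of these".
pattern = "|".join(re.escape(value) for value in search_for_these_values)

# case=False makes the match case-insensitive; ~ keeps the non-matching rows.
result = df.loc[~df["column_a"].str.contains(pattern, case=False, na=False)]

print(result["column_a"].tolist())
```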

Question 7

In addition to nanselm2’s answer, you can use 0 instead of False:

df["col"].str.contains(word)==0
