标签归档:html

在Python中转义HTML的最简单方法是什么?

问题:在Python中转义HTML的最简单方法是什么?

cgi.escape似乎是一种可能的选择。它运作良好吗?有什么更好的东西吗?

cgi.escape seems like one possible choice. Does it work well? Is there something that is considered better?


回答 0

cgi.escape很好 它逃脱了:

  • <&lt;
  • >&gt;
  • &&amp;

对于所有HTML而言,这就足够了。

编辑:如果您有非ASCII字符,您还想转义,以便包含在使用不同编码的另一个编码文档中,如Craig所说,只需使用:

data.encode('ascii', 'xmlcharrefreplace')

不要忘了解码dataunicode第一,使用任何编码它编码的。

但是根据我的经验,如果您unicode从头开始一直都在工作,那么这种编码是没有用的。只需在文档头中指定的编码末尾进行编码(utf-8以实现最大兼容性)。

例:

>>> cgi.escape(u'<a>bá</a>').encode('ascii', 'xmlcharrefreplace')
'&lt;a&gt;b&#225;&lt;/a&gt;

另外值得一提的(感谢Greg)是额外的quote参数cgi.escape。将其设置为时Truecgi.escape还转义双引号字符("),因此您可以在XML / HTML属性中使用结果值。

编辑:请注意,cgi.escape已在Python 3.2中弃用,转而使用html.escape,它的功能相同,但quote默认情况下为True。

cgi.escape is fine. It escapes:

  • < to &lt;
  • > to &gt;
  • & to &amp;

That is enough for all HTML.

EDIT: If you have non-ascii chars you also want to escape, for inclusion in another encoded document that uses a different encoding, like Craig says, just use:

data.encode('ascii', 'xmlcharrefreplace')

Don’t forget to decode data to unicode first, using whatever encoding it was encoded.

However in my experience that kind of encoding is useless if you just work with unicode all the time from start. Just encode at the end to the encoding specified in the document header (utf-8 for maximum compatibility).

Example:

>>> cgi.escape(u'<a>bá</a>').encode('ascii', 'xmlcharrefreplace')
'&lt;a&gt;b&#225;&lt;/a&gt;

Also worth of note (thanks Greg) is the extra quote parameter cgi.escape takes. With it set to True, cgi.escape also escapes double quote chars (") so you can use the resulting value in a XML/HTML attribute.

EDIT: Note that cgi.escape has been deprecated in Python 3.2 in favor of html.escape, which does the same except that quote defaults to True.


回答 1

在Python 3.2中,新 html,引入模块,该模块用于从HTML标记转义保留字符。

它具有一个功能escape()

>>> import html
>>> html.escape('x > 2 && x < 7 single quote: \' double quote: "')
'x &gt; 2 &amp;&amp; x &lt; 7 single quote: &#x27; double quote: &quot;'

In Python 3.2 a new html module was introduced, which is used for escaping reserved characters from HTML markup.

It has one function escape():

>>> import html
>>> html.escape('x > 2 && x < 7 single quote: \' double quote: "')
'x &gt; 2 &amp;&amp; x &lt; 7 single quote: &#x27; double quote: &quot;'

回答 2

如果您希望在URL中转义HTML:

这可能不是OP想要的(问题并没有明确指出转义是在哪种上下文中使用的),但是Python的本机库urllib有一种方法可以转义需要安全包含在URL中的HTML实体。

以下是一个示例:

#!/usr/bin/python
from urllib import quote

x = '+<>^&'
print quote(x) # prints '%2B%3C%3E%5E%26'

在这里找到文件

If you wish to escape HTML in a URL:

This is probably NOT what the OP wanted (the question doesn’t clearly indicate in which context the escaping is meant to be used), but Python’s native library urllib has a method to escape HTML entities that need to be included in a URL safely.

The following is an example:

#!/usr/bin/python
from urllib import quote

x = '+<>^&'
print quote(x) # prints '%2B%3C%3E%5E%26'

Find docs here


回答 3

还有出色的markupsafe软件包

>>> from markupsafe import Markup, escape
>>> escape("<script>alert(document.cookie);</script>")
Markup(u'&lt;script&gt;alert(document.cookie);&lt;/script&gt;')

markupsafe程序包经过精心设计,并且可能是逃避转义的最通用,最Python化的方法,恕我直言,因为:

  1. return(Markup)是从unicode派生的类(即isinstance(escape('str'), unicode) == True
  2. 它可以正确处理unicode输入
  3. 它适用于Python(2.6、2.7、3.3和pypy)
  4. 它尊重对象(即具有__html__属性的对象)和模板重载(__html_format__)的自定义方法。

There is also the excellent markupsafe package.

>>> from markupsafe import Markup, escape
>>> escape("<script>alert(document.cookie);</script>")
Markup(u'&lt;script&gt;alert(document.cookie);&lt;/script&gt;')

The markupsafe package is well engineered, and probably the most versatile and Pythonic way to go about escaping, IMHO, because:

  1. the return (Markup) is a class derived from unicode (i.e. isinstance(escape('str'), unicode) == True
  2. it properly handles unicode input
  3. it works in Python (2.6, 2.7, 3.3, and pypy)
  4. it respects custom methods of objects (i.e. objects with a __html__ property) and template overloads (__html_format__).

回答 4

cgi.escape 从转义HTML标记和字符实体的有限意义上讲,应该可以逃脱HTML。

但是,您可能还必须考虑编码问题:如果要引用的HTML在特定的编码中包含非ASCII字符,那么还必须注意在引用时要合理地表示这些字符。也许您可以将它们转换为实体。否则,您应确保在“源” HTML和嵌入页面之间进行正确的编码转换,以避免损坏非ASCII字符。

cgi.escape should be good to escape HTML in the limited sense of escaping the HTML tags and character entities.

But you might have to also consider encoding issues: if the HTML you want to quote has non-ASCII characters in a particular encoding, then you would also have to take care that you represent those sensibly when quoting. Perhaps you could convert them to entities. Otherwise you should ensure that the correct encoding translations are done between the “source” HTML and the page it’s embedded in, to avoid corrupting the non-ASCII characters.


回答 5

没有库,纯python,可以安全地将文本转义为html文本:

text.replace('&', '&amp;').replace('>', '&gt;').replace('<', '&lt;'
        ).encode('ascii', 'xmlcharrefreplace')

No libraries, pure python, safely escapes text into html text:

text.replace('&', '&amp;').replace('>', '&gt;').replace('<', '&lt;'
        ).encode('ascii', 'xmlcharrefreplace')

回答 6

cgi.escape 扩展的

此版本进行了改进cgi.escape。它还保留空格和换行符。返回一个unicode字符串。

def escape_html(text):
    """escape strings for display in HTML"""
    return cgi.escape(text, quote=True).\
           replace(u'\n', u'<br />').\
           replace(u'\t', u'&emsp;').\
           replace(u'  ', u' &nbsp;')

例如

>>> escape_html('<foo>\nfoo\t"bar"')
u'&lt;foo&gt;<br />foo&emsp;&quot;bar&quot;'

cgi.escape extended

This version improves cgi.escape. It also preserves whitespace and newlines. Returns a unicode string.

def escape_html(text):
    """escape strings for display in HTML"""
    return cgi.escape(text, quote=True).\
           replace(u'\n', u'<br />').\
           replace(u'\t', u'&emsp;').\
           replace(u'  ', u' &nbsp;')

for example

>>> escape_html('<foo>\nfoo\t"bar"')
u'&lt;foo&gt;<br />foo&emsp;&quot;bar&quot;'

回答 7

不是最简单的方法,但仍然很简单。与cgi.escape模块的主要区别-如果您已经&amp;在文本中使用了它,它仍然可以正常工作。从评论中可以看到:

cgi.escape版本

def escape(s, quote=None):
    '''Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
is also translated.'''
    s = s.replace("&", "&amp;") # Must be done first!
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

正则表达式版本

QUOTE_PATTERN = r"""([&<>"'])(?!(amp|lt|gt|quot|#39);)"""
def escape(word):
    """
    Replaces special characters <>&"' to HTML-safe sequences. 
    With attention to already escaped characters.
    """
    replace_with = {
        '<': '&gt;',
        '>': '&lt;',
        '&': '&amp;',
        '"': '&quot;', # should be escaped in attributes
        "'": '&#39'    # should be escaped in attributes
    }
    quote_pattern = re.compile(QUOTE_PATTERN)
    return re.sub(quote_pattern, lambda x: replace_with[x.group(0)], word)

Not the easiest way, but still straightforward. The main difference from cgi.escape module – it still will work properly if you already have &amp; in your text. As you see from comments to it:

cgi.escape version

def escape(s, quote=None):
    '''Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
is also translated.'''
    s = s.replace("&", "&amp;") # Must be done first!
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

regex version

QUOTE_PATTERN = r"""([&<>"'])(?!(amp|lt|gt|quot|#39);)"""
def escape(word):
    """
    Replaces special characters <>&"' to HTML-safe sequences. 
    With attention to already escaped characters.
    """
    replace_with = {
        '<': '&gt;',
        '>': '&lt;',
        '&': '&amp;',
        '"': '&quot;', # should be escaped in attributes
        "'": '&#39'    # should be escaped in attributes
    }
    quote_pattern = re.compile(QUOTE_PATTERN)
    return re.sub(quote_pattern, lambda x: replace_with[x.group(0)], word)

回答 8

对于Python 2.7中的旧代码,可以通过BeautifulSoup4做到:

>>> bs4.dammit import EntitySubstitution
>>> esub = EntitySubstitution()
>>> esub.substitute_html("r&d")
'r&amp;d'

For legacy code in Python 2.7, can do it via BeautifulSoup4:

>>> bs4.dammit import EntitySubstitution
>>> esub = EntitySubstitution()
>>> esub.substitute_html("r&d")
'r&amp;d'

提取正则表达式匹配项的一部分

问题:提取正则表达式匹配项的一部分

我想要一个正则表达式从HTML页面提取标题。目前我有这个:

title = re.search('<title>.*</title>', html, re.IGNORECASE).group()
if title:
    title = title.replace('<title>', '').replace('</title>', '') 

是否有一个正则表达式仅提取<title>的内容,所以我不必删除标签?

I want a regular expression to extract the title from a HTML page. Currently I have this:

title = re.search('<title>.*</title>', html, re.IGNORECASE).group()
if title:
    title = title.replace('<title>', '').replace('</title>', '') 

Is there a regular expression to extract just the contents of <title> so I don’t have to remove the tags?


回答 0

( )在正则表达式和group(1)python中检索捕获的字符串(re.search将返回None如果没有找到结果,所以不要用group()直接):

title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)

if title_search:
    title = title_search.group(1)

Use ( ) in regexp and group(1) in python to retrieve the captured string (re.search will return None if it doesn’t find the result, so don’t use group() directly):

title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)

if title_search:
    title = title_search.group(1)

回答 1

请注意,通过开始Python 3.8并引入赋值表达式(PEP 572):=运算符),可以通过在if条件中直接将匹配结果捕获为变量并将其在条件体内重复使用,从而对KrzysztofKrasoń解决方案进行一些改进:

# pattern = '<title>(.*)</title>'
# text = '<title>hello</title>'
if match := re.search(pattern, text, re.IGNORECASE):
  title = match.group(1)
# hello

Note that starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), it’s possible to improve a bit on Krzysztof Krasoń’s solution by capturing the match result directly within the if condition as a variable and re-use it in the condition’s body:

# pattern = '<title>(.*)</title>'
# text = '<title>hello</title>'
if match := re.search(pattern, text, re.IGNORECASE):
  title = match.group(1)
# hello

回答 2

尝试使用捕获组:

title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)

Try using capturing groups:

title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)

回答 3

re.search('<title>(.*)</title>', s, re.IGNORECASE).group(1)
re.search('<title>(.*)</title>', s, re.IGNORECASE).group(1)

回答 4

我可以推荐你去美丽汤。汤是一个很好的库,可以解析您的所有html文档。

soup = BeatifulSoup(html_doc)
titleName = soup.title.name

May I recommend you to Beautiful Soup. Soup is a very good lib to parse all of your html document.

soup = BeatifulSoup(html_doc)
titleName = soup.title.name

回答 5

尝试:

title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)

Try:

title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)

回答 6

提供的代码段不能应付Exceptions 我的建议

getattr(re.search(r"<title>(.*)</title>", s, re.IGNORECASE), 'groups', lambda:[u""])()[0]

如果未找到模式或第一个匹配项,则默认情况下返回空字符串。

The provided pieces of code do not cope with Exceptions May I suggest

getattr(re.search(r"<title>(.*)</title>", s, re.IGNORECASE), 'groups', lambda:[u""])()[0]

This returns an empty string by default if the pattern has not been found, or the first match.


回答 7

我认为这足够了:

#!python
import re
pattern = re.compile(r'<title>([^<]*)</title>', re.MULTILINE|re.IGNORECASE)
pattern.search(text)

…假设您的文本(HTML)位于名为“ text”的变量中。

这也假定没有其他HTML标记可以合法地嵌入HTML TITLE标记内部,并且没有办法合法地将任何其他<字符嵌入这样的容器/块中。

但是

不要在Python中使用正则表达式进行HTML解析。使用HTML解析器!(除非您要编写完整的解析器,否则当标准库中已经包含各种HTML,SGML和XML解析器时,这将是一项额外的工作。

如果您处理“真实世界” 标记汤 HTML(通常不符合任何SGML / XML验证器),请使用BeautifulSoup包。它尚未出现在标准库中,但为此目的广泛建议使用。

另一个选项是:lxml …,它是为结构正确(符合标准的HTML)编写的。但是它可以选择退回到使用BeautifulSoup作为解析器:ElementSoup

I’d think this should suffice:

#!python
import re
pattern = re.compile(r'<title>([^<]*)</title>', re.MULTILINE|re.IGNORECASE)
pattern.search(text)

… assuming that your text (HTML) is in a variable named “text.”

This also assumes that there are not other HTML tags which can be legally embedded inside of an HTML TITLE tag and no way to legally embed any other < character within such a container/block.

However

Don’t use regular expressions for HTML parsing in Python. Use an HTML parser! (Unless you’re going to write a full parser, which would be a of extra work when various HTML, SGML and XML parsers are already in the standard libraries.

If your handling “real world” tag soup HTML (which is frequently non-conforming to any SGML/XML validator) then use the BeautifulSoup package. It isn’t in the standard libraries (yet) but is wide recommended for this purpose.

Another option is: lxml … which is written for properly structured (standards conformant) HTML. But it has an option to fallback to using BeautifulSoup as a parser: ElementSoup.


将Django表单字段更改为隐藏字段

问题:将Django表单字段更改为隐藏字段

我有一个带的Django表单,与RegexField正常的文本输入字段非常相似。

我认为,在某些情况下,我想对用户隐藏它,并尝试使表单尽可能相似。将这个领域变成一个HiddenInput领域的最好方法是什么?

我知道我可以使用以下方法在字段上设置属性:

form['fieldname'].field.widget.attr['readonly'] = 'readonly'

我可以通过以下方式设置所需的初始值:

form.initial['fieldname'] = 'mydesiredvalue'

但是,这不会更改小部件的形式。

什么是使此字段成为<input type="hidden">字段的最佳/最“ django-y” /最不“ hacky”的方法?

I have a Django form with a RegexField, which is very similar to a normal text input field.

In my view, under certain conditions I want to hide it from the user, and trying to keep the form as similar as possible. What’s the best way to turn this field into a HiddenInput field?

I know I can set attributes on the field with:

form['fieldname'].field.widget.attr['readonly'] = 'readonly'

And I can set the desired initial value with:

form.initial['fieldname'] = 'mydesiredvalue'

However, that won’t change the form of the widget.

What’s the best / most “django-y” / least “hacky” way to make this field a <input type="hidden"> field?


回答 0

如果您具有自定义模板并查看,则可以排除该字段并用于{{ modelform.instance.field }}获取值。

您也可以在视图中使用:

form.fields['field_name'].widget = forms.HiddenInput()

但我不确定它是否会保护发布后的保存方法。

希望能帮助到你。

If you have a custom template and view you may exclude the field and use {{ modelform.instance.field }} to get the value.

also you may prefer to use in the view:

form.fields['field_name'].widget = forms.HiddenInput()

but I’m not sure it will protect save method on post.

Hope it helps.


回答 1

这也可能有用: {{ form.field.as_hidden }}

This may also be useful: {{ form.field.as_hidden }}


回答 2

一个对我有用的选项,将字段的原始形式定义为:

forms.CharField(widget = forms.HiddenInput(), required = False)

那么当您在新的类中覆盖它时,它将保留它的位置。

an option that worked for me, define the field in the original form as:

forms.CharField(widget = forms.HiddenInput(), required = False)

then when you override it in the new Class it will keep it’s place.


回答 3

首先,如果您不希望用户修改数据,则仅排除该字段似乎更干净。将其包含为隐藏字段只会增加更多数据以通过有线方式发送,并在您不希望恶意用户修改时邀请它们进行修改。如果确实有理由包括该字段但将其隐藏,则可以将关键字arg传递给modelform的构造函数。可能是这样的:

class MyModelForm(forms.ModelForm):
    class Meta:
        model = MyModel
    def __init__(self, *args, **kwargs):
        from django.forms.widgets import HiddenInput
        hide_condition = kwargs.pop('hide_condition',None)
        super(MyModelForm, self).__init__(*args, **kwargs)
        if hide_condition:
            self.fields['fieldname'].widget = HiddenInput()
            # or alternately:  del self.fields['fieldname']  to remove it from the form altogether.

然后在您看来:

form = MyModelForm(hide_condition=True)

在视图中,我更喜欢这种方法来修改模型形式的内部,但这只是一个问题。

Firstly, if you don’t want the user to modify the data, then it seems cleaner to simply exclude the field. Including it as a hidden field just adds more data to send over the wire and invites a malicious user to modify it when you don’t want them to. If you do have a good reason to include the field but hide it, you can pass a keyword arg to the modelform’s constructor. Something like this perhaps:

class MyModelForm(forms.ModelForm):
    class Meta:
        model = MyModel
    def __init__(self, *args, **kwargs):
        from django.forms.widgets import HiddenInput
        hide_condition = kwargs.pop('hide_condition',None)
        super(MyModelForm, self).__init__(*args, **kwargs)
        if hide_condition:
            self.fields['fieldname'].widget = HiddenInput()
            # or alternately:  del self.fields['fieldname']  to remove it from the form altogether.

Then in your view:

form = MyModelForm(hide_condition=True)

I prefer this approach to modifying the modelform’s internals in the view, but it’s a matter of taste.


回答 4

对于正常形式,您可以

class MyModelForm(forms.ModelForm):
    slug = forms.CharField(widget=forms.HiddenInput())

如果您有模型表格,则可以执行以下操作

class MyModelForm(forms.ModelForm):
    class Meta:
        model = TagStatus
        fields = ('slug', 'ext')
        widgets = {'slug': forms.HiddenInput()}

您也可以覆盖__init__方法

class Myform(forms.Form):
    def __init__(self, *args, **kwargs):
        super(Myform, self).__init__(*args, **kwargs)
        self.fields['slug'].widget = forms.HiddenInput()

For normal form you can do

class MyModelForm(forms.ModelForm):
    slug = forms.CharField(widget=forms.HiddenInput())

If you have model form you can do the following

class MyModelForm(forms.ModelForm):
    class Meta:
        model = TagStatus
        fields = ('slug', 'ext')
        widgets = {'slug': forms.HiddenInput()}

You can also override __init__ method

class Myform(forms.Form):
    def __init__(self, *args, **kwargs):
        super(Myform, self).__init__(*args, **kwargs)
        self.fields['slug'].widget = forms.HiddenInput()

回答 5

如果要始终隐藏该字段,请使用以下命令:

class MyForm(forms.Form):
    hidden_input = forms.CharField(widget=forms.HiddenInput(), initial="value")

If you want the field to always be hidden, use the following:

class MyForm(forms.Form):
    hidden_input = forms.CharField(widget=forms.HiddenInput(), initial="value")

回答 6

您可以只使用css:

#id_fieldname, label[for="id_fieldname"] {
  position: absolute;
  display: none
}

这将使该字段及其标签不可见。

You can just use css :

#id_fieldname, label[for="id_fieldname"] {
  position: absolute;
  display: none
}

This will make the field and its label invisible.


如何使用Python请求伪造浏览器访问?

问题:如何使用Python请求伪造浏览器访问?

我想从下面的网站获取内容。如果使用Firefox或Chrome这样的浏览器,则可以获取所需的真实网站页面,但是如果使用Python request软件包(或wget命令)进行获取,则它将返回完全不同的HTML页面。我以为网站的开发人员为此做了一些阻碍,所以问题是:

如何使用python请求或命令wget伪造浏览器访问?

http://www.ichangtou.com/#company:data_000008.html

I want to get the content from the below website. If I use a browser like Firefox or Chrome I could get the real website page I want, but if I use the Python requests package (or wget command) to get it, it returns a totally different HTML page. I thought the developer of the website had made some blocks for this, so the question is:

How do I fake a browser visit by using python requests or command wget?

http://www.ichangtou.com/#company:data_000008.html


回答 0

提供User-Agent标题

import requests

url = 'http://www.ichangtou.com/#company:data_000008.html'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response.content)

仅供参考,这是不同浏览器的用户代理字符串的列表:


附带说明一下,有一个非常有用的第三方程序包,称为fake-useragent,它在用户代理上提供了一个不错的抽象层:

假用户代理

最新的简单useragent伪造者与实际数据库

演示:

>>> from fake_useragent import UserAgent
>>> ua = UserAgent()
>>> ua.chrome
u'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
>>> ua.random
u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'

Provide a User-Agent header:

import requests

url = 'http://www.ichangtou.com/#company:data_000008.html'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response.content)

FYI, here is a list of User-Agent strings for different browsers:


As a side note, there is a pretty useful third-party package called fake-useragent that provides a nice abstraction layer over user agents:

fake-useragent

Up to date simple useragent faker with real world database

Demo:

>>> from fake_useragent import UserAgent
>>> ua = UserAgent()
>>> ua.chrome
u'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
>>> ua.random
u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'

回答 1

如果这个问题仍然有效

我使用了伪造的UserAgent

如何使用:

from fake_useragent import UserAgent
import requests


ua = UserAgent()
print(ua.chrome)
header = {'User-Agent':str(ua.chrome)}
print(header)
url = "https://www.hybrid-analysis.com/recent-submissions?filter=file&sort=^timestamp"
htmlContent = requests.get(url, headers=header)
print(htmlContent)

输出:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17
{'User-Agent': 'Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
<Response [200]>

if this question is still valid

I used fake UserAgent

How to use:

from fake_useragent import UserAgent
import requests


ua = UserAgent()
print(ua.chrome)
header = {'User-Agent':str(ua.chrome)}
print(header)
url = "https://www.hybrid-analysis.com/recent-submissions?filter=file&sort=^timestamp"
htmlContent = requests.get(url, headers=header)
print(htmlContent)

outPut:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17
{'User-Agent': 'Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
<Response [200]>

回答 2

尝试使用Firefox作为伪造的用户代理来执行此操作(此外,这是使用Cookie进行网络抓取的良好启动脚本):

#!/usr/bin/env python2
# -*- coding: utf8 -*-
# vim:ts=4:sw=4


import cookielib, urllib2, sys

def doIt(uri):
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    page = opener.open(uri)
    page.addheaders = [('User-agent', 'Mozilla/5.0')]
    print page.read()

for i in sys.argv[1:]:
    doIt(i)

用法:

python script.py "http://www.ichangtou.com/#company:data_000008.html"

Try doing this, using firefox as fake user agent (moreover, it’s a good startup script for web scraping with the use of cookies):

#!/usr/bin/env python2
# -*- coding: utf8 -*-
# vim:ts=4:sw=4


import cookielib, urllib2, sys

def doIt(uri):
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    page = opener.open(uri)
    page.addheaders = [('User-agent', 'Mozilla/5.0')]
    print page.read()

for i in sys.argv[1:]:
    doIt(i)

USAGE:

python script.py "http://www.ichangtou.com/#company:data_000008.html"

回答 3

答案的根源是,提出问题的人需要有一个JavaScript解释器才能获得所要查找的内容。我发现我可以在JSON网站上获取想要的所有信息,然后再用JavaScript对其进行解释。这为我节省了很多时间来解析html,希望每个网页都采用相同的格式。

因此,当您从网站收到使用请求的响应时,请真正查看html / text,因为您可能会在页脚中找到可解析的javascripts JSON。

The root of the answer is that the person asking the question needs to have a JavaScript interpreter to get what they are after. What I have found is I am able to get all of the information I wanted on a website in json before it was interpreted by JavaScript. This has saved me a ton of time in what would be parsing html hoping each webpage is in the same format.

So when you get a response from a website using requests really look at the html/text because you might find the javascripts JSON in the footer ready to be parsed.


在Django网站中将HTML渲染为PDF

问题:在Django网站中将HTML渲染为PDF

对于我的django网站,我正在寻找一种将动态html页面转换为pdf的简单解决方案。

页面包含HTML和来自Google可视化API的图表(该图表基于javascript,但必须包含这些图表)。

For my django powered site, I am looking for an easy solution to convert dynamic html pages to pdf.

Pages include HTML and charts from Google visualization API (which is javascript based, yet including those graphs is a must).


回答 0

尝试从Reportlab解决方案。

下载并像往常一样使用python setup.py install安装

您还需要安装以下模块:具有easy_install的xhtml2pdf,html5lib,pypdf。

这是一个用法示例:

首先定义此功能:

import cStringIO as StringIO
from xhtml2pdf import pisa
from django.template.loader import get_template
from django.template import Context
from django.http import HttpResponse
from cgi import escape


def render_to_pdf(template_src, context_dict):
    template = get_template(template_src)
    context = Context(context_dict)
    html  = template.render(context)
    result = StringIO.StringIO()

    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("ISO-8859-1")), result)
    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type='application/pdf')
    return HttpResponse('We had some errors<pre>%s</pre>' % escape(html))

然后,您可以像这样使用它:

def myview(request):
    #Retrieve data or whatever you need
    return render_to_pdf(
            'mytemplate.html',
            {
                'pagesize':'A4',
                'mylist': results,
            }
        )

模板:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head>
        <title>My Title</title>
        <style type="text/css">
            @page {
                size: {{ pagesize }};
                margin: 1cm;
                @frame footer {
                    -pdf-frame-content: footerContent;
                    bottom: 0cm;
                    margin-left: 9cm;
                    margin-right: 9cm;
                    height: 1cm;
                }
            }
        </style>
    </head>
    <body>
        <div>
            {% for item in mylist %}
                RENDER MY CONTENT
            {% endfor %}
        </div>
        <div id="footerContent">
            {%block page_foot%}
                Page <pdf:pagenumber>
            {%endblock%}
        </div>
    </body>
</html>

希望能帮助到你。

Try the solution from Reportlab.

Download it and install it as usual with python setup.py install

You will also need to install the following modules: xhtml2pdf, html5lib, pypdf with easy_install.

Here is an usage example:

First define this function:

import cStringIO as StringIO
from xhtml2pdf import pisa
from django.template.loader import get_template
from django.template import Context
from django.http import HttpResponse
from cgi import escape


def render_to_pdf(template_src, context_dict):
    template = get_template(template_src)
    context = Context(context_dict)
    html  = template.render(context)
    result = StringIO.StringIO()

    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("ISO-8859-1")), result)
    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type='application/pdf')
    return HttpResponse('We had some errors<pre>%s</pre>' % escape(html))

Then you can use it like this:

def myview(request):
    #Retrieve data or whatever you need
    return render_to_pdf(
            'mytemplate.html',
            {
                'pagesize':'A4',
                'mylist': results,
            }
        )

The template:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head>
        <title>My Title</title>
        <style type="text/css">
            @page {
                size: {{ pagesize }};
                margin: 1cm;
                @frame footer {
                    -pdf-frame-content: footerContent;
                    bottom: 0cm;
                    margin-left: 9cm;
                    margin-right: 9cm;
                    height: 1cm;
                }
            }
        </style>
    </head>
    <body>
        <div>
            {% for item in mylist %}
                RENDER MY CONTENT
            {% endfor %}
        </div>
        <div id="footerContent">
            {%block page_foot%}
                Page <pdf:pagenumber>
            {%endblock%}
        </div>
    </body>
</html>

Hope it helps.


回答 1

https://github.com/nigma/django-easy-pdf

模板:

{% extends "easy_pdf/base.html" %}

{% block content %}
    <div id="content">
        <h1>Hi there!</h1>
    </div>
{% endblock %}

视图:

from easy_pdf.views import PDFTemplateView

class HelloPDFView(PDFTemplateView):
    template_name = "hello.html"

如果要在Python 3上使用django-easy-pdf,请检查此处建议的解决方案。

https://github.com/nigma/django-easy-pdf

Template:

{% extends "easy_pdf/base.html" %}

{% block content %}
    <div id="content">
        <h1>Hi there!</h1>
    </div>
{% endblock %}

View:

from easy_pdf.views import PDFTemplateView

class HelloPDFView(PDFTemplateView):
    template_name = "hello.html"

If you want to use django-easy-pdf on Python 3 check the solution suggested here.


回答 2

我只是为CBV打了个招。未在生产中使用,但会为我生成PDF。可能需要为错误报告方面的事情工作,但到目前为止仍能解决问题。

import StringIO
from cgi import escape
from xhtml2pdf import pisa
from django.http import HttpResponse
from django.template.response import TemplateResponse
from django.views.generic import TemplateView

class PDFTemplateResponse(TemplateResponse):

    def generate_pdf(self, retval):

        html = self.content

        result = StringIO.StringIO()
        rendering = pisa.pisaDocument(StringIO.StringIO(html.encode("ISO-8859-1")), result)

        if rendering.err:
            return HttpResponse('We had some errors<pre>%s</pre>' % escape(html))
        else:
            self.content = result.getvalue()

    def __init__(self, *args, **kwargs):
        super(PDFTemplateResponse, self).__init__(*args, mimetype='application/pdf', **kwargs)
        self.add_post_render_callback(self.generate_pdf)


class PDFTemplateView(TemplateView):
    response_class = PDFTemplateResponse

像这样使用:

class MyPdfView(PDFTemplateView):
    template_name = 'things/pdf.html'

I just whipped this up for CBV. Not used in production but generates a PDF for me. Probably needs work for the error reporting side of things but does the trick so far.

import StringIO
from cgi import escape
from xhtml2pdf import pisa
from django.http import HttpResponse
from django.template.response import TemplateResponse
from django.views.generic import TemplateView

class PDFTemplateResponse(TemplateResponse):

    def generate_pdf(self, retval):

        html = self.content

        result = StringIO.StringIO()
        rendering = pisa.pisaDocument(StringIO.StringIO(html.encode("ISO-8859-1")), result)

        if rendering.err:
            return HttpResponse('We had some errors<pre>%s</pre>' % escape(html))
        else:
            self.content = result.getvalue()

    def __init__(self, *args, **kwargs):
        super(PDFTemplateResponse, self).__init__(*args, mimetype='application/pdf', **kwargs)
        self.add_post_render_callback(self.generate_pdf)


class PDFTemplateView(TemplateView):
    response_class = PDFTemplateResponse

Used like:

class MyPdfView(PDFTemplateView):
    template_name = 'things/pdf.html'

回答 3

使用以下包装之一尝试wkhtmltopdf

django-wkhtmltopdfpython-pdfkit

这对我来说非常有效,支持javascript和css或webkit浏览器支持的任何功能。

有关更多详细的教程,请参见此博客文章。

Try wkhtmltopdf with either one of the following wrappers

django-wkhtmltopdf or python-pdfkit

This worked great for me,supports javascript and css or anything for that matter which a webkit browser supports.

For more detailed tutorial please see this blog post


回答 4

在尝试使它工作了许多小时之后,我终于找到了这个:https : //github.com/vierno/django-xhtml2pdf

这是https://github.com/chrisglass/django-xhtml2pdf的一个分支,它为基于类的通用视图提供了mixin。我这样使用它:

    # views.py
    from django_xhtml2pdf.views import PdfMixin
    class GroupPDFGenerate(PdfMixin, DetailView):
        model = PeerGroupSignIn
        template_name = 'groups/pdf.html'

    # templates/groups/pdf.html
    <html>
    <style>
    @page { your xhtml2pdf pisa PDF parameters }
    </style>
    </head>
    <body>
        <div id="header_content"> (this is defined in the style section)
            <h1>{{ peergroupsignin.this_group_title }}</h1>
            ...

填充模板字段时,请使用您在视图中定义的所有小写字母的模型名称。由于它是GCBV,因此您可以在urls.py中将其称为“ .as_view”:

    # urls.py (using url namespaces defined in the main urls.py file)
    url(
        regex=r"^(?P<pk>\d+)/generate_pdf/$",
        view=views.GroupPDFGenerate.as_view(),
        name="generate_pdf",
       ),

After trying to get this to work for too many hours, I finally found this: https://github.com/vierno/django-xhtml2pdf

It’s a fork of https://github.com/chrisglass/django-xhtml2pdf that provides a mixin for a generic class-based view. I used it like this:

    # views.py
    from django_xhtml2pdf.views import PdfMixin
    class GroupPDFGenerate(PdfMixin, DetailView):
        model = PeerGroupSignIn
        template_name = 'groups/pdf.html'

    # templates/groups/pdf.html
    <html>
    <style>
    @page { your xhtml2pdf pisa PDF parameters }
    </style>
    </head>
    <body>
        <div id="header_content"> (this is defined in the style section)
            <h1>{{ peergroupsignin.this_group_title }}</h1>
            ...

Use the model name you defined in your view in all lowercase when populating the template fields. Because its a GCBV, you can just call it as ‘.as_view’ in your urls.py:

    # urls.py (using url namespaces defined in the main urls.py file)
    url(
        regex=r"^(?P<pk>\d+)/generate_pdf/$",
        view=views.GroupPDFGenerate.as_view(),
        name="generate_pdf",
       ),

回答 5

您可以使用iReport编辑器定义布局,并在jasper报表服务器中发布报表。发布后,您可以调用rest api以获取结果。

这是功能测试:

from django.test import TestCase
from x_reports_jasper.models import JasperServerClient

"""
    to try integraction with jasper server through rest
"""
class TestJasperServerClient(TestCase):

    # define required objects for tests
    def setUp(self):

        # load the connection to remote server
        try:

            self.j_url = "http://127.0.0.1:8080/jasperserver"
            self.j_user = "jasperadmin"
            self.j_pass = "jasperadmin"

            self.client = JasperServerClient.create_client(self.j_url,self.j_user,self.j_pass)

        except Exception, e:
            # if errors could not execute test given prerrequisites
            raise

    # test exception when server data is invalid
    def test_login_to_invalid_address_should_raise(self):
        self.assertRaises(Exception,JasperServerClient.create_client, "http://127.0.0.1:9090/jasperserver",self.j_user,self.j_pass)

    # test execute existent report in server
    def test_get_report(self):

        r_resource_path = "/reports/<PathToPublishedReport>"
        r_format = "pdf"
        r_params = {'PARAM_TO_REPORT':"1",}

        #resource_meta = client.load_resource_metadata( rep_resource_path )

        [uuid,out_mime,out_data] = self.client.generate_report(r_resource_path,r_format,r_params)
        self.assertIsNotNone(uuid)

这是调用实现的示例:

from django.db import models
import requests
import sys
from xml.etree import ElementTree
import logging 

# module logger definition
logger = logging.getLogger(__name__)

# Create your models here.
class JasperServerClient(models.Manager):

    def __handle_exception(self, exception_root, exception_id, exec_info ):
        type, value, traceback = exec_info
        raise JasperServerClientError(exception_root, exception_id), None, traceback

    # 01: REPORT-METADATA 
    #   get resource description to generate the report
    def __handle_report_metadata(self, rep_resourcepath):

        l_path_base_resource = "/rest/resource"
        l_path = self.j_url + l_path_base_resource
        logger.info( "metadata (begin) [path=%s%s]"  %( l_path ,rep_resourcepath) )

        resource_response = None
        try:
            resource_response = requests.get( "%s%s" %( l_path ,rep_resourcepath) , cookies = self.login_response.cookies)

        except Exception, e:
            self.__handle_exception(e, "REPORT_METADATA:CALL_ERROR", sys.exc_info())

        resource_response_dom = None
        try:
            # parse to dom and set parameters
            logger.debug( " - response [data=%s]"  %( resource_response.text) )
            resource_response_dom = ElementTree.fromstring(resource_response.text)

            datum = "" 
            for node in resource_response_dom.getiterator():
                datum = "%s<br />%s - %s" % (datum, node.tag, node.text)
            logger.debug( " - response [xml=%s]"  %( datum ) )

            #
            self.resource_response_payload= resource_response.text
            logger.info( "metadata (end) ")
        except Exception, e:
            logger.error( "metadata (error) [%s]" % (e))
            self.__handle_exception(e, "REPORT_METADATA:PARSE_ERROR", sys.exc_info())


    # 02: REPORT-PARAMS 
    def __add_report_params(self, metadata_text, params ):
        if(type(params) != dict):
            raise TypeError("Invalid parameters to report")
        else:
            logger.info( "add-params (begin) []" )
            #copy parameters
            l_params = {}
            for k,v in params.items():
                l_params[k]=v
            # get the payload metadata
            metadata_dom = ElementTree.fromstring(metadata_text)
            # add attributes to payload metadata
            root = metadata_dom #('report'):

            for k,v in l_params.items():
                param_dom_element = ElementTree.Element('parameter')
                param_dom_element.attrib["name"] = k
                param_dom_element.text = v
                root.append(param_dom_element)

            #
            metadata_modified_text =ElementTree.tostring(metadata_dom, encoding='utf8', method='xml')
            logger.info( "add-params (end) [payload-xml=%s]" %( metadata_modified_text )  )
            return metadata_modified_text



    # 03: REPORT-REQUEST-CALL 
    #   call to generate the report
    def __handle_report_request(self, rep_resourcepath, rep_format, rep_params):

        # add parameters
        self.resource_response_payload = self.__add_report_params(self.resource_response_payload,rep_params)

        # send report request

        l_path_base_genreport = "/rest/report"
        l_path = self.j_url + l_path_base_genreport
        logger.info( "report-request (begin) [path=%s%s]"  %( l_path ,rep_resourcepath) )

        genreport_response = None
        try:
            genreport_response = requests.put( "%s%s?RUN_OUTPUT_FORMAT=%s" %(l_path,rep_resourcepath,rep_format),data=self.resource_response_payload, cookies = self.login_response.cookies )
            logger.info( " - send-operation-result [value=%s]"  %( genreport_response.text) )
        except Exception,e:
            self.__handle_exception(e, "REPORT_REQUEST:CALL_ERROR", sys.exc_info())


        # parse the uuid of the requested report
        genreport_response_dom = None

        try:
            genreport_response_dom = ElementTree.fromstring(genreport_response.text)

            for node in genreport_response_dom.findall("uuid"):
                datum = "%s" % (node.text)

            genreport_uuid = datum      

            for node in genreport_response_dom.findall("file/[@type]"):
                datum = "%s" % (node.text)
            genreport_mime = datum

            logger.info( "report-request (end) [uuid=%s,mime=%s]"  %( genreport_uuid, genreport_mime) )

            return [genreport_uuid,genreport_mime]
        except Exception,e:
            self.__handle_exception(e, "REPORT_REQUEST:PARSE_ERROR", sys.exc_info())

    # 04: REPORT-RETRIEVE RESULTS 
    def __handle_report_reply(self, genreport_uuid ):


        l_path_base_getresult = "/rest/report"
        l_path = self.j_url + l_path_base_getresult 
        logger.info( "report-reply (begin) [uuid=%s,path=%s]"  %( genreport_uuid,l_path) )

        getresult_response = requests.get( "%s%s/%s?file=report" %(self.j_url,l_path_base_getresult,genreport_uuid),data=self.resource_response_payload, cookies = self.login_response.cookies )
        l_result_header_mime =getresult_response.headers['Content-Type']

        logger.info( "report-reply (end) [uuid=%s,mime=%s]"  %( genreport_uuid, l_result_header_mime) )
        return [l_result_header_mime, getresult_response.content]

    # public methods ---------------------------------------    

    # tries the authentication with jasperserver throug rest
    def login(self, j_url, j_user,j_pass):
        self.j_url= j_url

        l_path_base_auth = "/rest/login"
        l_path = self.j_url + l_path_base_auth

        logger.info( "login (begin) [path=%s]"  %( l_path) )

        try:
            self.login_response = requests.post(l_path , params = {
                    'j_username':j_user,
                    'j_password':j_pass
                })                  

            if( requests.codes.ok != self.login_response.status_code ):
                self.login_response.raise_for_status()

            logger.info( "login (end)" )
            return True
            # see http://blog.ianbicking.org/2007/09/12/re-raising-exceptions/

        except Exception, e:
            logger.error("login (error) [e=%s]" % e )
            self.__handle_exception(e, "LOGIN:CALL_ERROR",sys.exc_info())
            #raise

    def generate_report(self, rep_resourcepath,rep_format,rep_params):
        self.__handle_report_metadata(rep_resourcepath)
        [uuid,mime] = self.__handle_report_request(rep_resourcepath, rep_format,rep_params)
        # TODO: how to handle async?
        [out_mime,out_data] = self.__handle_report_reply(uuid)
        return [uuid,out_mime,out_data]

    @staticmethod
    def create_client(j_url, j_user, j_pass):
        client = JasperServerClient()
        login_res = client.login( j_url, j_user, j_pass )
        return client


class JasperServerClientError(Exception):

    def __init__(self,exception_root,reason_id,reason_message=None):
        super(JasperServerClientError, self).__init__(str(reason_message))
        self.code = reason_id 
        self.description = str(exception_root) + " " + str(reason_message)
    def __str__(self):
        return self.code + " " + self.description

You can use iReport editor to define the layout, and publish the report in jasper reports server. After publish you can invoke the rest api to get the results.

Here is the test of the functionality:

from django.test import TestCase
from x_reports_jasper.models import JasperServerClient

"""
    to try integraction with jasper server through rest
"""
class TestJasperServerClient(TestCase):

    # define required objects for tests
    def setUp(self):

        # load the connection to remote server
        try:

            self.j_url = "http://127.0.0.1:8080/jasperserver"
            self.j_user = "jasperadmin"
            self.j_pass = "jasperadmin"

            self.client = JasperServerClient.create_client(self.j_url,self.j_user,self.j_pass)

        except Exception, e:
            # if errors could not execute test given prerrequisites
            raise

    # test exception when server data is invalid
    def test_login_to_invalid_address_should_raise(self):
        self.assertRaises(Exception,JasperServerClient.create_client, "http://127.0.0.1:9090/jasperserver",self.j_user,self.j_pass)

    # test execute existent report in server
    def test_get_report(self):

        r_resource_path = "/reports/<PathToPublishedReport>"
        r_format = "pdf"
        r_params = {'PARAM_TO_REPORT':"1",}

        #resource_meta = client.load_resource_metadata( rep_resource_path )

        [uuid,out_mime,out_data] = self.client.generate_report(r_resource_path,r_format,r_params)
        self.assertIsNotNone(uuid)

And here is an example of the invocation implementation:

from django.db import models
import requests
import sys
from xml.etree import ElementTree
import logging 

# module logger definition
logger = logging.getLogger(__name__)

# Create your models here.
class JasperServerClient(models.Manager):

    def __handle_exception(self, exception_root, exception_id, exec_info ):
        type, value, traceback = exec_info
        raise JasperServerClientError(exception_root, exception_id), None, traceback

    # 01: REPORT-METADATA 
    #   get resource description to generate the report
    def __handle_report_metadata(self, rep_resourcepath):

        l_path_base_resource = "/rest/resource"
        l_path = self.j_url + l_path_base_resource
        logger.info( "metadata (begin) [path=%s%s]"  %( l_path ,rep_resourcepath) )

        resource_response = None
        try:
            resource_response = requests.get( "%s%s" %( l_path ,rep_resourcepath) , cookies = self.login_response.cookies)

        except Exception, e:
            self.__handle_exception(e, "REPORT_METADATA:CALL_ERROR", sys.exc_info())

        resource_response_dom = None
        try:
            # parse to dom and set parameters
            logger.debug( " - response [data=%s]"  %( resource_response.text) )
            resource_response_dom = ElementTree.fromstring(resource_response.text)

            datum = "" 
            for node in resource_response_dom.getiterator():
                datum = "%s<br />%s - %s" % (datum, node.tag, node.text)
            logger.debug( " - response [xml=%s]"  %( datum ) )

            #
            self.resource_response_payload= resource_response.text
            logger.info( "metadata (end) ")
        except Exception, e:
            logger.error( "metadata (error) [%s]" % (e))
            self.__handle_exception(e, "REPORT_METADATA:PARSE_ERROR", sys.exc_info())


    # 02: REPORT-PARAMS 
    def __add_report_params(self, metadata_text, params ):
        if(type(params) != dict):
            raise TypeError("Invalid parameters to report")
        else:
            logger.info( "add-params (begin) []" )
            #copy parameters
            l_params = {}
            for k,v in params.items():
                l_params[k]=v
            # get the payload metadata
            metadata_dom = ElementTree.fromstring(metadata_text)
            # add attributes to payload metadata
            root = metadata_dom #('report'):

            for k,v in l_params.items():
                param_dom_element = ElementTree.Element('parameter')
                param_dom_element.attrib["name"] = k
                param_dom_element.text = v
                root.append(param_dom_element)

            #
            metadata_modified_text =ElementTree.tostring(metadata_dom, encoding='utf8', method='xml')
            logger.info( "add-params (end) [payload-xml=%s]" %( metadata_modified_text )  )
            return metadata_modified_text



    # 03: REPORT-REQUEST-CALL 
    #   call to generate the report
    def __handle_report_request(self, rep_resourcepath, rep_format, rep_params):

        # add parameters
        self.resource_response_payload = self.__add_report_params(self.resource_response_payload,rep_params)

        # send report request

        l_path_base_genreport = "/rest/report"
        l_path = self.j_url + l_path_base_genreport
        logger.info( "report-request (begin) [path=%s%s]"  %( l_path ,rep_resourcepath) )

        genreport_response = None
        try:
            genreport_response = requests.put( "%s%s?RUN_OUTPUT_FORMAT=%s" %(l_path,rep_resourcepath,rep_format),data=self.resource_response_payload, cookies = self.login_response.cookies )
            logger.info( " - send-operation-result [value=%s]"  %( genreport_response.text) )
        except Exception,e:
            self.__handle_exception(e, "REPORT_REQUEST:CALL_ERROR", sys.exc_info())


        # parse the uuid of the requested report
        genreport_response_dom = None

        try:
            genreport_response_dom = ElementTree.fromstring(genreport_response.text)

            for node in genreport_response_dom.findall("uuid"):
                datum = "%s" % (node.text)

            genreport_uuid = datum      

            for node in genreport_response_dom.findall("file/[@type]"):
                datum = "%s" % (node.text)
            genreport_mime = datum

            logger.info( "report-request (end) [uuid=%s,mime=%s]"  %( genreport_uuid, genreport_mime) )

            return [genreport_uuid,genreport_mime]
        except Exception,e:
            self.__handle_exception(e, "REPORT_REQUEST:PARSE_ERROR", sys.exc_info())

    # 04: REPORT-RETRIEVE RESULTS 
    def __handle_report_reply(self, genreport_uuid ):


        l_path_base_getresult = "/rest/report"
        l_path = self.j_url + l_path_base_getresult 
        logger.info( "report-reply (begin) [uuid=%s,path=%s]"  %( genreport_uuid,l_path) )

        getresult_response = requests.get( "%s%s/%s?file=report" %(self.j_url,l_path_base_getresult,genreport_uuid),data=self.resource_response_payload, cookies = self.login_response.cookies )
        l_result_header_mime =getresult_response.headers['Content-Type']

        logger.info( "report-reply (end) [uuid=%s,mime=%s]"  %( genreport_uuid, l_result_header_mime) )
        return [l_result_header_mime, getresult_response.content]

    # public methods ---------------------------------------    

    # tries the authentication with jasperserver throug rest
    def login(self, j_url, j_user,j_pass):
        self.j_url= j_url

        l_path_base_auth = "/rest/login"
        l_path = self.j_url + l_path_base_auth

        logger.info( "login (begin) [path=%s]"  %( l_path) )

        try:
            self.login_response = requests.post(l_path , params = {
                    'j_username':j_user,
                    'j_password':j_pass
                })                  

            if( requests.codes.ok != self.login_response.status_code ):
                self.login_response.raise_for_status()

            logger.info( "login (end)" )
            return True
            # see http://blog.ianbicking.org/2007/09/12/re-raising-exceptions/

        except Exception, e:
            logger.error("login (error) [e=%s]" % e )
            self.__handle_exception(e, "LOGIN:CALL_ERROR",sys.exc_info())
            #raise

    def generate_report(self, rep_resourcepath,rep_format,rep_params):
        self.__handle_report_metadata(rep_resourcepath)
        [uuid,mime] = self.__handle_report_request(rep_resourcepath, rep_format,rep_params)
        # TODO: how to handle async?
        [out_mime,out_data] = self.__handle_report_reply(uuid)
        return [uuid,out_mime,out_data]

    @staticmethod
    def create_client(j_url, j_user, j_pass):
        client = JasperServerClient()
        login_res = client.login( j_url, j_user, j_pass )
        return client


class JasperServerClientError(Exception):

    def __init__(self,exception_root,reason_id,reason_message=None):
        super(JasperServerClientError, self).__init__(str(reason_message))
        self.code = reason_id 
        self.description = str(exception_root) + " " + str(reason_message)
    def __str__(self):
        return self.code + " " + self.description

回答 6

我得到了从html模板生成PDF的代码:

    import os

    from weasyprint import HTML

    from django.template import Template, Context
    from django.http import HttpResponse 


    def generate_pdf(self, report_id):

            # Render HTML into memory and get the template firstly
            template_file_loc = os.path.join(os.path.dirname(__file__), os.pardir, 'templates', 'the_template_pdf_generator.html')
            template_contents = read_all_as_str(template_file_loc)
            render_template = Template(template_contents)

            #rendering_map is the dict for params in the template 
            render_definition = Context(rendering_map)
            render_output = render_template.render(render_definition)

            # Using Rendered HTML to generate PDF
            response = HttpResponse(content_type='application/pdf')
            response['Content-Disposition'] = 'attachment; filename=%s-%s-%s.pdf' % \
                                              ('topic-test','topic-test', '2018-05-04')
            # Generate PDF
            pdf_doc = HTML(string=render_output).render()
            pdf_doc.pages[0].height = pdf_doc.pages[0]._page_box.children[0].children[
                0].height  # Make PDF file as single page file 
            pdf_doc.write_pdf(response)
            return response

    def read_all_as_str(self, file_loc, read_method='r'):
        if file_exists(file_loc):
            handler = open(file_loc, read_method)
            contents = handler.read()
            handler.close()
            return contents
        else:
            return 'file not exist'  

I get the code to generate the PDF from html template :

    import os

    from weasyprint import HTML

    from django.template import Template, Context
    from django.http import HttpResponse 


    def generate_pdf(self, report_id):

            # Render HTML into memory and get the template firstly
            template_file_loc = os.path.join(os.path.dirname(__file__), os.pardir, 'templates', 'the_template_pdf_generator.html')
            template_contents = read_all_as_str(template_file_loc)
            render_template = Template(template_contents)

            #rendering_map is the dict for params in the template 
            render_definition = Context(rendering_map)
            render_output = render_template.render(render_definition)

            # Using Rendered HTML to generate PDF
            response = HttpResponse(content_type='application/pdf')
            response['Content-Disposition'] = 'attachment; filename=%s-%s-%s.pdf' % \
                                              ('topic-test','topic-test', '2018-05-04')
            # Generate PDF
            pdf_doc = HTML(string=render_output).render()
            pdf_doc.pages[0].height = pdf_doc.pages[0]._page_box.children[0].children[
                0].height  # Make PDF file as single page file 
            pdf_doc.write_pdf(response)
            return response

    def read_all_as_str(self, file_loc, read_method='r'):
        if file_exists(file_loc):
            handler = open(file_loc, read_method)
            contents = handler.read()
            handler.close()
            return contents
        else:
            return 'file not exist'  

回答 7

如果您的html模板中包含上下文数据以及css和js。比使用pdfjs的选择更好

在您的代码中,您可以像这样使用。

from django.template.loader import get_template
import pdfkit
from django.conf import settings

context={....}
template = get_template('reports/products.html')
html_string = template.render(context)
pdfkit.from_string(html_string, os.path.join(settings.BASE_DIR, "media", 'products_report-%s.pdf'%(id)))

在您的HTML中,您可以链接外部或内部的CSS和js,它将生成最佳质量的pdf。

If you have context data along with css and js in your html template. Than you have good option to use pdfjs.

In your code you can use like this.

from django.template.loader import get_template
import pdfkit
from django.conf import settings

context={....}
template = get_template('reports/products.html')
html_string = template.render(context)
pdfkit.from_string(html_string, os.path.join(settings.BASE_DIR, "media", 'products_report-%s.pdf'%(id)))

In your HTML you can link extranal or internal css and js, it will generate best quality of pdf.


如何使用BeautifulSoup查找节点的子节点

问题:如何使用BeautifulSoup查找节点的子节点

我想获取所有<a>属于以下子项的标签<li>

<div>
<li class="test">
    <a>link1</a>
    <ul> 
       <li>  
          <a>link2</a> 
       </li>
    </ul>
</li>
</div>

我知道如何找到像这样的特定类的元素:

soup.find("li", { "class" : "test" }) 

但是我不知道如何找到所有<a>的孩子的孩子,<li class=test>而不是其他孩子的孩子。

就像我想选择:

<a>link1</a>

I want to get all the <a> tags which are children of <li>:

<div>
<li class="test">
    <a>link1</a>
    <ul> 
       <li>  
          <a>link2</a> 
       </li>
    </ul>
</li>
</div>

I know how to find element with particular class like this:

soup.find("li", { "class" : "test" }) 

But I don’t know how to find all <a> which are children of <li class=test> but not any others.

Like I want to select:

<a>link1</a>

回答 0

试试这个

li = soup.find('li', {'class': 'text'})
children = li.findChildren("a" , recursive=False)
for child in children:
    print child

Try this

li = soup.find('li', {'class': 'text'})
children = li.findChildren("a" , recursive=False)
for child in children:
    print(child)

回答 1

DOC中有一个超小部分,显示了如何查找/ find_all 直接子级。

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-recursive-argument

在您需要的情况下,link1是第一个直接子级:

# for only first direct child
soup.find("li", { "class" : "test" }).find("a", recursive=False)

如果您想要所有直系子女:

# for all direct children
soup.find("li", { "class" : "test" }).findAll("a", recursive=False)

There’s a super small section in the DOCs that shows how to find/find_all direct children.

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-recursive-argument

In your case as you want link1 which is first direct child:

# for only first direct child
soup.find("li", { "class" : "test" }).find("a", recursive=False)

If you want all direct children:

# for all direct children
soup.find("li", { "class" : "test" }).findAll("a", recursive=False)

回答 2

也许你想做

soup.find("li", { "class" : "test" }).find('a')

Perhaps you want to do

soup.find("li", { "class" : "test" }).find('a')

回答 3

试试这个:

li = soup.find("li", { "class" : "test" })
children = li.find_all("a") # returns a list of all <a> children of li

其他提醒:

find方法仅获取第一个出现的子元素。find_all方法获取所有后代元素,并存储在列表中。

try this:

li = soup.find("li", { "class" : "test" })
children = li.find_all("a") # returns a list of all <a> children of li

other reminders:

The find method only gets the first occurring child element. The find_all method gets all descendant elements and are stored in a list.


回答 4

“如何找到所有人a的孩子,<li class=test>而不是其他孩子?”

给定下面的HTML(我添加了另一个<a>以显示select和之间的区别select_one):

<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
    <a>link3</a>
  </li>
</div>

解决方案是使用放在两个CSS选择器之间的子组合器>):

>>> soup.select('li.test > a')
[<a>link1</a>, <a>link3</a>]

如果您只想找到第一个孩子:

>>> soup.select_one('li.test > a')
<a>link1</a>

“How to find all a which are children of <li class=test> but not any others?”

Given the HTML below (I added another <a> to show te difference between select and select_one):

<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
    <a>link3</a>
  </li>
</div>

The solution is to use child combinator (>) that is placed between two CSS selectors:

>>> soup.select('li.test > a')
[<a>link1</a>, <a>link3</a>]

In case you want to find only the first child:

>>> soup.select_one('li.test > a')
<a>link1</a>

回答 5

另一种方法-创建一个过滤器函数,该函数返回True所有所需标签:

def my_filter(tag):
    return (tag.name == 'a' and
        tag.parent.name == 'li' and
        'test' in tag.parent['class'])

然后只需调用find_all参数:

for a in soup(my_filter): # or soup.find_all(my_filter)
    print a

Yet another method – create a filter function that returns True for all desired tags:

def my_filter(tag):
    return (tag.name == 'a' and
        tag.parent.name == 'li' and
        'test' in tag.parent['class'])

Then just call find_all with the argument:

for a in soup(my_filter): # or soup.find_all(my_filter)
    print a

应用程序未拾取.css文件(烧瓶/ python)

问题:应用程序未拾取.css文件(烧瓶/ python)

我正在渲染一个模板,尝试使用外部样式表进行样式设置。文件结构如下。

/app
    - app_runner.py
    /services
        - app.py 
    /templates
        - mainpage.html
    /styles
        - mainpage.css

mainpage.html看起来像这样

<html>
    <head>
        <link rel= "stylesheet" type= "text/css" href= "../styles/mainpage.css">
    </head>
    <body>
        <!-- content --> 

我的样式均未应用。它与html是我正在渲染的模板有关吗?python看起来像这样。

return render_template("mainpage.html", variables..)

我知道这很有效,因为我仍然能够渲染模板。但是,当我尝试将样式代码从html的“ head”标记中的“样式”块移动到外部文件时,所有样式消失了,留下了一个裸露的html页面。有人看到我的文件结构有任何错误吗?

I am rendering a template, that I am attempting to style with an external style sheet. File structure is as follows.

/app
    - app_runner.py
    /services
        - app.py 
    /templates
        - mainpage.html
    /styles
        - mainpage.css

mainpage.html looks like this

<html>
    <head>
        <link rel= "stylesheet" type= "text/css" href= "../styles/mainpage.css">
    </head>
    <body>
        <!-- content --> 

None of my styles are being applied though. Does it have something to do with the fact that the html is a template I am rendering? The python looks like this.

return render_template("mainpage.html", variables..)

I know this much is working, because I am still able to render the template. However, when I tried to move my styling code from a “style” block within the html’s “head” tag to an external file, all the styling went away, leaving a bare html page. Anyone see any errors with my file structure?


回答 0

您需要有一个“静态”文件夹设置(用于css / js文件),除非您在Flask初始化期间专门覆盖了它。我假设您没有覆盖它。

您的CSS目录结构应类似于:

/app
    - app_runner.py
    /services
        - app.py 
    /templates
        - mainpage.html
    /static
        /styles
            - mainpage.css

请注意,您的/ styles目录应位于/ static下

然后做

<link rel= "stylesheet" type= "text/css" href= "{{ url_for('static',filename='styles/mainpage.css') }}">

Flask现在将在static / styles / mainpage.css下查找css文件

You need to have a ‘static’ folder setup (for css/js files) unless you specifically override it during Flask initialization. I am assuming you did not override it.

Your directory structure for css should be like:

/app
    - app_runner.py
    /services
        - app.py 
    /templates
        - mainpage.html
    /static
        /styles
            - mainpage.css

Notice that your /styles directory should be under /static

Then, do this

<link rel= "stylesheet" type= "text/css" href= "{{ url_for('static',filename='styles/mainpage.css') }}">

Flask will now look for the css file under static/styles/mainpage.css


回答 1

在jinja2模板(flask使用的模板)中,使用

href="{{ url_for('static', filename='mainpage.css')}}"

但是,这些static文件通常位于static文件夹中,除非另有配置。

In jinja2 templates (which flask uses), use

href="{{ url_for('static', filename='mainpage.css')}}"

The static files are usually in the static folder, though, unless configured otherwise.


回答 2

我已经阅读了多个主题,但都没有解决人们正在描述的问题,我也经历过。

我什至尝试离开conda并使用pip升级到python 3.7,我尝试了所有建议的编码,但均未解决。

这就是原因(问题):

默认情况下,python / flask在以下文件夹结构中搜索静态文件和模板:

/Users/username/folder_one/folder_two/ProjectName/src/app_name/<static>
 and 
/Users/username/folder_one/folder_two/ProjectName/src/app_name/<template>

您可以使用Pycharm(或其他任何工具)上的调试器自己进行验证,并检查应用程序上的值(app = Flask(name))并搜索teamplate_folder和static_folder

为了解决这个问题,您必须在创建应用程序时指定值,如下所示:

TEMPLATE_DIR = os.path.abspath('../templates')
STATIC_DIR = os.path.abspath('../static')

# app = Flask(__name__) # to make the app run without any
app = Flask(__name__, template_folder=TEMPLATE_DIR, static_folder=STATIC_DIR)

路径TEMPLATE_DIR和STATIC_DIR取决于文件应用程序所在的位置。就我而言,请参见图片,它位于src下的文件夹中。

您可以根据需要更改模板和静态文件夹,并在应用程序= Flask上注册…

实际上,当我弄乱文件夹时,我已经开始遇到问题,有时却无法正常工作。这可以彻底解决问题

html代码如下所示:

<link href="{{ url_for('static', filename='libraries/css/bootstrap.css') }}" rel="stylesheet" type="text/css" >

这个代码

这里的文件夹结构

I have read multiple threads and none of them fixed the issue that people are describing and I have experienced too.

I have even tried to move away from conda and use pip, to upgrade to python 3.7, i have tried all coding proposed and none of them fixed.

And here is why (the problem):

by default python/flask search the static and the template in a folder structure like:

/Users/username/folder_one/folder_two/ProjectName/src/app_name/<static>
 and 
/Users/username/folder_one/folder_two/ProjectName/src/app_name/<template>

you can verify by yourself using the debugger on Pycharm (or anything else) and check the values on the app (app = Flask(name)) and search for teamplate_folder and static_folder

in order to fix this, you have to specify the values when creating the app something like this:

TEMPLATE_DIR = os.path.abspath('../templates')
STATIC_DIR = os.path.abspath('../static')

# app = Flask(__name__) # to make the app run without any
app = Flask(__name__, template_folder=TEMPLATE_DIR, static_folder=STATIC_DIR)

the path TEMPLATE_DIR and STATIC_DIR depend on where the file app is located. in my case, see the picture, it was located within a folder under src.

you can change the template and static folders as you wish and register on the app = Flask…

In truth, I have started experiencing the problem when messing around with folder and at times worked at times not. this fixes the problem once and for all

the html code looks like this:

<link href="{{ url_for('static', filename='libraries/css/bootstrap.css') }}" rel="stylesheet" type="text/css" >

This the code

Here the structure of the folders


回答 3

仍然有下面由codegeek提供的解决方案后的问题:
<link rel= "stylesheet" type= "text/css" href= "{{ url_for('static',filename='styles/mainpage.css') }}">

Google Chrome浏览器中,按下重新加载按钮(F5)不会重新加载静态文件。如果遵循了公认的解决方案,但仍然看不到对CSS所做的更改,请按ctrl + shift + R忽略缓存的文件并重新加载静态文件。

Firefox中,按下重新加载按钮会出现以重新加载静态文件。

Edge中,按刷新按钮不会重新加载静态文件。按ctrl + shift + R应该忽略缓存的文件并重新加载静态文件。但是,这在我的计算机上不起作用。

Still having problems after following the solution provided by codegeek:
<link rel= "stylesheet" type= "text/css" href= "{{ url_for('static',filename='styles/mainpage.css') }}"> ?

In Google Chrome pressing the reload button (F5) will not reload the static files. If you have followed the accepted solution but still don’t see the changes you have made to CSS, then press ctrl + shift + R to ignore cached files and reload the static files.

In Firefox pressing the reload button appears to reload the static files.

In Edge pressing the refresh button does not reload the static file. Pressing ctrl + shift + R is supposed to ignore cached files and reload the static files. However this does not work on my computer.


回答 4

我正在运行flask的1.0.2版本。上面的文件结构对我不起作用,但是我找到了一个可以的文件结构,如下所示:

     app_folder/ flask_app.py/ static/ style.css/ templates/
     index.html

(请注意,“静态”和“模板”是文件夹,应命名为完全相同的名称。)

要检查您正在运行的烧瓶版本,应在终端中打开Python并相应地键入以下内容:

进口烧瓶

烧瓶-版本

I’m running version 1.0.2 of flask right now. The above file structures did not work for me, but I found one that did, which are as follows:

     app_folder/ flask_app.py/ static/ style.css/ templates/
     index.html

(Please note that ‘static’ and ‘templates’ are folders, which should be named exactly the same thing.)

To check what version of flask you are running, you should open Python in terminal and type the following accordingly:

import flask

flask –version


回答 5

烧瓶项目的结构不同。正如您所提到的,项目结构是相同的,但是唯一的问题是样式文件夹。样式文件夹必须位于静态文件夹内。

static/styles/style.css

The flask project structure is different. As you mentioned in question the project structure is the same but the only problem is wit the styles folder. Styles folder must come within the static folder.

static/styles/style.css

回答 6

还有一点要添加到该线程上。

如果在.css文件名中添加下划线,则将无法使用。

One more point to add on to this thread.

If you add an underscore in your .css file name, then it wouldn’t work.


回答 7

还有一点要添加。除了上面提到的答案外,请确保将以下行添加到app.py文件中:

app = Flask(__name__, static_folder="your path to static")

否则,flask将无法检测到静态文件夹。

One more point to add.Along with above upvoted answers, please make sure the below line is added to app.py file:

app = Flask(__name__, static_folder="your path to static")

Otherwise flask will not be able to detect static folder.


回答 8

如果以上任何方法都不起作用,并且您的代码是完美的,则请按Ctrl + F5尝试进行强制刷新。它将清除所有机会,然后重新加载文件。它为我工作。

If any of the above method is not working and you code is perfect then try hard refreshing by pressing Ctrl + F5. It will clear all the chaces and then reload file. It worked for me.


回答 9

我很确定它类似于Laravel模板,这就是我的方法。

<link rel="stylesheet" href="/folder/stylesheets/stylesheet.css" />

引用: CSS文件路径问题

一个简单的flask应用程序,它从.html文件读取其内容。外部样式表被阻止?

I’m pretty sure it’s similar to Laravel template, this is how I did mine.

<link rel="stylesheet" href="/folder/stylesheets/stylesheet.css" />

Referred: CSS file pathing problem

Simple flask application that reads its content from a .html file. External style sheet being blocked?


无法显示HTML字符串

问题:无法显示HTML字符串

我正在Android WebView中使用显示字符串HTML挣扎。

在服务器端,我下载了一个网页并转义了HTML字符和引号(我使用了Python):

my_string = html.escape(my_string, True)

在Android客户端上:字符串不通过以下方式转义:

myString = StringEscapeUtils.unescapeHtml4(myString)
webview.loadData( myString, "text/html", "encoding");

但是,webview只是将它们显示为文字字符串。结果如下:

编辑:我添加从服务器端返回的原始字符串:

链接rel =“ apple-touch-icon”;尺寸=“ 114×114” href =“&quot; /static/favicon/apple-touch-icon-114×114.png”&gt; &lt; link rel =“苹果触摸图标”;尺寸=“ 72×72” href =“&quot; /static/favicon/apple-touch-icon-72×72.png”&gt; &lt; link rel =“苹果触摸图标”;尺寸=“ 144×144” href =“&quot; /static/favicon/apple-touch-icon-144×144.png”&gt; &lt; link rel =“苹果触摸图标”;尺寸=“ 60×60” href =“&quot; /static/favicon/apple-touch-icon-60×60.png”&gt; &lt; link rel =“苹果触摸图标”;尺寸=“ 120×120” href =“&quot; /static/favicon/apple-touch-icon-120×120.png”&gt; &lt; link rel =&quot; 苹果触摸图标” 尺寸=“ 76×76” href =“&quot; /static/favicon/apple-touch-icon-76×76.png”&gt; &lt; link rel =“苹果触摸图标”;尺寸=“ 152×152” href =“&quot; /static/favicon/apple-touch-icon-152×152.png”&gt; &lt; link rel =“苹果触摸图标”;尺寸=“ 180×180” href =“&quot; /static/favicon/apple-touch-icon-180×180.png”&gt; &lt; link rel =“ icon”;type =“ image / png” href =“&quot; /static/favicon/favicon-192×192.png” 尺寸=“ 192×192” &lt; link rel =“ icon”;type =“ image / png” href =“&quot; /static/favicon/favicon-160×160.png” 尺寸=“ 160×160” &lt; link rel =&quot; 图标” type =“ image / png” href =“&quot; /static/favicon/favicon-96×96.png” 尺寸=“ 96×96” &lt; link rel =“ icon”;type =“ image / png” href =“&quot; /static/favicon/favicon-16×16.png” 尺寸=“ 16×16” &lt; link rel =“ icon”;type =“ image / png” href =“&quot; /static/favicon/favicon-32×32.png” 尺寸=“ 32×32” &lt;元名称=“ msapplication-TileColor”;内容=“#da532c” &lt;元名称=“ msapplication-TileImage”;content =“ / static / favicon / mstile-144×144.png” &lt;元名称=“ msapplication-config”;内容=“ / static / favicon / browserconfig.xml”;&gt; &lt;!-外部CSS-&gt; &lt; link rel =“ stylesheet” href =&quot;https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css&quot; &lt;!-外部字体-&gt; &lt; link href =” // maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css” rel =“ stylesheet” &lt; link href =&#x27; // fonts.googleapis.com/css?family=Open+Sans:300,600′ rel =&#x27; stylesheet&#x27; type =&#x27; text / css&#x27;&gt; &lt; link href =&#x27; // fonts.googleapis.com/css?family=Lora:400,700′ rel =&#x27; stylesheet&#x27; type =&#x27; text / css&#x27;&gt; &lt;!-[如果是IE 9]&gt; &lt; script src =” // cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.2/html5shiv.min.js”</script> &lt; script src =” // cdnjs.cloudflare.com/ajax/libs/respond.js/1.4.2/respond.min.js”&lt; / script&gt; &lt ;! [endif]-&gt; &lt;!-网站CSS-&gt; &lt; link rel =“ stylesheet” type =“ text / css”;href =“&quot; /static/css/style.css”&gt; &lt; link rel =“ stylesheet” type =“ text / css”;href =“&quot; /static/css/glyphicon.css”&gt; &lt; / head&gt; &lt; body&gt; &lt; div class =“容器文章-页面”&gt; &lt; div class =“ row”&gt; &lt; div class =“ col-md-8 col-md-offset-2”&gt; &lt; h2&gt;&lt; a href =&quot; quot; href =“&quot; /static/css/glyphicon.css”&gt; &lt; / head&gt; &lt; body&gt; &lt; div class =“容器文章-页面”&gt; &lt; div class =“ row”&gt; &lt; div class =“ col-md-8 col-md-offset-2”&gt; &lt; h2&gt;&lt; a href =&quot; quot; href =“&quot; /static/css/glyphicon.css”&gt; &lt; / head&gt; &lt; body&gt; &lt; div class =“容器文章-页面”&gt; &lt; div class =“ row”&gt; &lt; div class =“ col-md-8 col-md-offset-2”&gt; &lt; h2&gt;&lt; a href =&quot;http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html&quot;&gt ; Gov。杰里·布朗(Jerry Brown)说特德·克鲁兹(Ted Cruz)绝对不适合(&#39;)因气候变化而竞选公职的观点&lt; / a&lt; / h2&gt; &lt; h4&gt; Sam Levine&lt; / h4&gt; &lt; div class =“ article”&gt; 加利福尼亚州州长杰里·布朗(D)星期天说,德克萨斯州参议员特德·克鲁兹(R-Texas)“绝对不适合竞选公职”。因为他在气候变化方面的立场。&lt; p&gt;&quot;我刚从新罕布什尔州回来,那里到处都是冰雪。我对此的看法很简单:有关此事的辩论应遵循科学并应遵循数据,而且许多关于全球变暖的警报者,他们有一个问题,因为科学不支持他们。克鲁兹&lt; a href =&quot;https://www.youtube.com/watch?v=m0UJ_S​​c0Udk&lt ; said&lt; / a&gt; 在“与塞思·迈耶斯晚晚”上 &lt; / p&gt; &lt; p&gt;为了支持他的主张,克鲁兹引用了卫星数据,该数据显示在过去的17年中没有明显的变暖现象。但是Cruz的推理&lt; a href =&quot; http://www.politifact.com/truth-o-meter/statements/2015/mar/20 / ted-cruz / ted-cruzs-worlds-fire-not-last-17-years /“已被由Politifact揭穿,这表明科学家有足够的证据相信气候将继续变暖。&lt; p&gt;&quot;他所说的话绝对是错误的,“布朗在&lt; a href =&quot; http://www.nbcnews。 “不适合运行” n328046”“ NBC新闻”。他补充说&lt; a href =&quot; http://climate.nasa.gov/scientific-consensus/ “超过90%”。研究气候的科学家们一致认为,气候变化是人类活动造成的。“那个人象征着这种无知和对现有科学数据的直接伪造。令人震惊,我认为男人已经使自己完全不适合竞选公职。” 布朗说。&lt; p&gt;布朗补充说,气候变化具有&lt; a href =&quot; http://www.huffingtonpost.com/2015/03/06/california-drought-february- record_n_6820704.html?utm_hp_ref =加利福尼亚干旱“导致他所在州的干旱,以及东海岸的严寒和暴风雨。&lt; p&gt;虽然克鲁兹可能在新罕布什尔州到处都看到冰雪,但数据显示该国实际上正在经历&lt; a href =&quot; http://www.huffingtonpost.com/2015/02/19/cold-weather- winter_n_6713104.html“&gt;比平均水平&lt; / a&gt; / p。&lt; p&gt;布朗对克鲁兹的批评发生在得克萨斯州参议员被宣布宣布&lt; a href =&quot; http://www.huffingtonpost.com/2015/03/22/ted-cruz-2016_n_6917824.html” 总统竞选&lt; / a&gt;。&lt; / p&gt; &lt; / div&gt; &lt; div class =“原始”&gt; &lt; a href =&quot;http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html&lt; VIEW ORIGINAL&lt; / a&gt; &lt; / div&gt; &lt; / div&gt; &lt; / div&gt; &lt; / div&gt; &lt; script src =” // code.jquery.com/jquery-latest.js”</script &lt; script src =” / static / js / modal.js”&lt; / script &lt; script src =&quot; /static/js/bootbox.min.js”&lt; / script&gt; &lt; script src =” / static / js / site.js”&lt; / script &lt; script&gt; (function(i,s,o,g,r,a,m){i [&#x27; GoogleAnalyticsObject&#x27;] = r; i [r] = i [r] || function(){(i [ r] .q = i [r] .q || [])。push(arguments)},i [r] .l = 1 * new Date(); a = s.createElement(o),m = s。 getElementsByTagName(o)[0]; a.async = 1; a.src = g; m.parentNode.insertBefore(a,m)})(窗口,文档,&#x27; script&#x27;,&#x27; //万维网。google-analytics.com/analytics.js’,’ga’); ga(&#x27; create&#x27;,&#x27; UA-56257533-1&#x27;,&#x27; auto&#x27;);ga(&#xsend&#x27 ;,&#x27; pageview&#x27;); &lt; / script&gt; &lt; / body&gt; &lt; / html&gt;”

I am struggling with display string HTML in Android WebView.

On the server side, I downloaded a web page and escape HTML characters and quotes (I used Python):

my_string = html.escape(my_string, True)

On the Android client side: strings are unescaped by:

myString = StringEscapeUtils.unescapeHtml4(myString)
webview.loadData( myString, "text/html", "encoding");

However webview just display them as literal strings. Here are the result:

Edit: I add original string returned from server side:

“&lt;!DOCTYPE html&gt; &lt;html lang=&quot;en&quot;&gt; &lt;head&gt; &lt;meta charset=&quot;utf-8&quot;&gt; &lt;meta http-equiv=&quot;X-UA-Compatible&quot; content=&quot;IE=edge&quot;&gt; &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1.0&quot;&gt; &lt;meta name=&quot;description&quot; content=&quot;&quot;&gt; &lt;title&gt;Saulify&lt;/title&gt; &lt;!– All the Favicons… –&gt; &lt;link rel=&quot;shortcut icon&quot; href=&quot;/static/favicon/favicon.ico&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;57×57&quot; href=&quot;/static/favicon/apple-touch-icon-57×57.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;114×114&quot; href=&quot;/static/favicon/apple-touch-icon-114×114.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;72×72&quot; href=&quot;/static/favicon/apple-touch-icon-72×72.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;144×144&quot; href=&quot;/static/favicon/apple-touch-icon-144×144.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;60×60&quot; href=&quot;/static/favicon/apple-touch-icon-60×60.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;120×120&quot; href=&quot;/static/favicon/apple-touch-icon-120×120.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;76×76&quot; href=&quot;/static/favicon/apple-touch-icon-76×76.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;152×152&quot; href=&quot;/static/favicon/apple-touch-icon-152×152.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;180×180&quot; href=&quot;/static/favicon/apple-touch-icon-180×180.png&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-192×192.png&quot; sizes=&quot;192×192&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-160×160.png&quot; sizes=&quot;160×160&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-96×96.png&quot; sizes=&quot;96×96&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-16×16.png&quot; sizes=&quot;16×16&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-32×32.png&quot; sizes=&quot;32×32&quot;&gt; &lt;meta name=&quot;msapplication-TileColor&quot; content=&quot;#da532c&quot;&gt; &lt;meta name=&quot;msapplication-TileImage&quot; content=&quot;/static/favicon/mstile-144×144.png&quot;&gt; &lt;meta name=&quot;msapplication-config&quot; content=&quot;/static/favicon/browserconfig.xml&quot;&gt; &lt;!– External CSS –&gt; &lt;link rel=&quot;stylesheet&quot; href=&quot;https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css&quot;&gt; &lt;!– External Fonts –&gt; &lt;link href=&quot;//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css&quot; rel=&quot;stylesheet&quot;&gt; &lt;link href=&#x27;//fonts.googleapis.com/css?family=Open+Sans:300,600&#x27; rel=&#x27;stylesheet&#x27; type=&#x27;text/css&#x27;&gt; &lt;link href=&#x27;//fonts.googleapis.com/css?family=Lora:400,700&#x27; rel=&#x27;stylesheet&#x27; type=&#x27;text/css&#x27;&gt; &lt;!–[if lt IE 9]&gt; &lt;script src=&quot;//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.2/html5shiv.min.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;//cdnjs.cloudflare.com/ajax/libs/respond.js/1.4.2/respond.min.js&quot;&gt;&lt;/script&gt; &lt;![endif]–&gt; &lt;!– Site CSS –&gt; &lt;link rel=&quot;stylesheet&quot; type=&quot;text/css&quot; href=&quot;/static/css/style.css&quot;&gt; &lt;link rel=&quot;stylesheet&quot; type=&quot;text/css&quot; href=&quot;/static/css/glyphicon.css&quot;&gt; &lt;/head&gt; &lt;body&gt; &lt;div class=&quot;container article-page&quot;&gt; &lt;div class=&quot;row&quot;&gt; &lt;div class=&quot;col-md-8 col-md-offset-2&quot;&gt; &lt;h2&gt;&lt;a href=&quot;http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html&quot;&gt;Gov. Jerry Brown Says Ted Cruz Is &amp;#39;Absolutely Unfit&amp;#39; To Run For Office Because Of Climate Change Views&lt;/a&gt;&lt;/h2&gt; &lt;h4&gt;Sam Levine&lt;/h4&gt; &lt;div class=&quot;article&quot;&gt; &lt;p&gt;California Gov. Jerry Brown (D) said on Sunday that Texas Sen. Ted Cruz (R-Texas) is &quot;absolutely unfit to be running for office&quot; because of his position on climate change.&lt;/p&gt; &lt;p&gt;&quot;I just came back from New Hampshire, where there&#x27;s snow and ice everywhere. My view on this is simple: Debates on this should follow science and should follow data, and many of the alarmists on global warming, they have a problem because the science doesn&#x27;t back them up,&quot; Cruz &lt;a href=&quot;https://www.youtube.com/watch?v=m0UJ_Sc0Udk&quot;&gt;said&lt;/a&gt; on &quot;Late Night with Seth Meyers&quot; last week.&lt;/p&gt; &lt;p&gt;To back up his claim, Cruz cited satellite data that has shown a lack of significant warming over the last 17 years. But Cruz&#x27;s reasoning &lt;a href=&quot;http://www.politifact.com/truth-o-meter/statements/2015/mar/20 /ted-cruz/ted-cruzs-worlds-fire-not-last-17-years/&quot;&gt;has been debunked by Politifact&lt;/a&gt;, which has shown that scientists have ample evidence to believe that the climate will continue to warm.&lt;/p&gt; &lt;p&gt;&quot;What he said is absolutely false,” Brown said on &lt;a href=&quot;http://www.nbcnews.com/meet-the-press/california-governor-ted-cruz- unfit-be-running-n328046&quot;&gt;NBC&#x27;s &quot;Meet the Press.&quot;&lt;/a&gt; He added that &lt;a href=&quot;http://climate.nasa.gov/scientific-consensus/&quot;&gt;over 90 percent&lt;/a&gt; of scientists who study the climate agree that climate change is caused by human activity. &quot;That man betokens such a level of ignorance and a direct falsification of existing scientific data. It&#x27;s shocking, and I think that man has rendered himself absolutely unfit to be running for office,&quot; Brown said.&lt;/p&gt; &lt;p&gt;Brown added that climate change has &lt;a href=&quot;http://www.huffingtonpost.com/2015/03/06/california-drought-february- record_n_6820704.html?utm_hp_ref=california-drought&quot;&gt;caused droughts in his state&lt;/a&gt;, as well as severe cold and storms on the east coast.&lt;/p&gt; &lt;p&gt;While Cruz may have seen snow and ice everywhere in New Hampshire, data shows that the country is actually experiencing a &lt;a href=&quot;http://www.huffingtonpost.com/2015/02/19/cold-weather- winter_n_6713104.html&quot;&gt;warmer than average&lt;/a&gt; winter.&lt;/p&gt; &lt;p&gt;Brown’s criticism of Cruz comes one day before the Texas senator is set to announce a &lt;a href=&quot;http://www.huffingtonpost.com/2015/03/22 /ted-cruz-2016_n_6917824.html&quot;&gt;presidential campaign&lt;/a&gt;. &lt;/p&gt; &lt;/div&gt; &lt;div class=&quot;original&quot;&gt; &lt;a href=&quot;http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html&quot;&gt;VIEW ORIGINAL&lt;/a&gt; &lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;script src=&quot;//code.jquery.com/jquery-latest.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;/static/js/modal.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;/static/js/bootbox.min.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;/static/js/site.js&quot;&gt;&lt;/script&gt; &lt;script&gt; (function(i,s,o,g,r,a,m){i[&#x27;GoogleAnalyticsObject&#x27;]=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,&#x27;script&#x27;,&#x27;//www.google-analytics.com/analytics.js&#x27;,&#x27;ga&#x27;); ga(&#x27;create&#x27;, &#x27;UA-56257533-1&#x27;, &#x27;auto&#x27;); ga(&#x27;send&#x27;, &#x27;pageview&#x27;); &lt;/script&gt; &lt;/body&gt; &lt;/html&gt;”


回答 0

我在这里修改了代码:

public class test extends Activity {
    private WebView wv;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.test);
        wv = (WebView) findViewById(R.id.wv);
        String s = "&lt;!DOCTYPE html&gt; &lt;html lang=&quot;en&quot;&gt; &lt;head&gt; &lt;meta charset=&quot;utf-8&quot;&gt; &lt;meta http-equiv=&quot;X-UA-Compatible&quot; content=&quot;IE=edge&quot;&gt; &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1.0&quot;&gt; &lt;meta name=&quot;description&quot; content=&quot;&quot;&gt; &lt;title&gt;Saulify&lt;/title&gt; &lt;!-- All the Favicons... --&gt; &lt;link rel=&quot;shortcut icon&quot; href=&quot;/static/favicon/favicon.ico&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;57x57&quot; href=&quot;/static/favicon/apple-touch-icon-57x57.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;114x114&quot; href=&quot;/static/favicon/apple-touch-icon-114x114.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;72x72&quot; href=&quot;/static/favicon/apple-touch-icon-72x72.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;144x144&quot; href=&quot;/static/favicon/apple-touch-icon-144x144.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;60x60&quot; href=&quot;/static/favicon/apple-touch-icon-60x60.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;120x120&quot; href=&quot;/static/favicon/apple-touch-icon-120x120.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;76x76&quot; href=&quot;/static/favicon/apple-touch-icon-76x76.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;152x152&quot; href=&quot;/static/favicon/apple-touch-icon-152x152.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;180x180&quot; href=&quot;/static/favicon/apple-touch-icon-180x180.png&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-192x192.png&quot; sizes=&quot;192x192&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-160x160.png&quot; sizes=&quot;160x160&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-96x96.png&quot; sizes=&quot;96x96&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-16x16.png&quot; sizes=&quot;16x16&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-32x32.png&quot; sizes=&quot;32x32&quot;&gt; &lt;meta name=&quot;msapplication-TileColor&quot; content=&quot;#da532c&quot;&gt; &lt;meta name=&quot;msapplication-TileImage&quot; content=&quot;/static/favicon/mstile-144x144.png&quot;&gt; &lt;meta name=&quot;msapplication-config&quot; content=&quot;/static/favicon/browserconfig.xml&quot;&gt; &lt;!-- External CSS --&gt; &lt;link rel=&quot;stylesheet&quot; href=&quot;https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css&quot;&gt; &lt;!-- External Fonts --&gt; &lt;link href=&quot;//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css&quot; rel=&quot;stylesheet&quot;&gt; &lt;link href=&#x27;//fonts.googleapis.com/css?family=Open+Sans:300,600&#x27; rel=&#x27;stylesheet&#x27; type=&#x27;text/css&#x27;&gt; &lt;link href=&#x27;//fonts.googleapis.com/css?family=Lora:400,700&#x27; rel=&#x27;stylesheet&#x27; type=&#x27;text/css&#x27;&gt; &lt;!--[if lt IE 9]&gt; &lt;script src=&quot;//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.2/html5shiv.min.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;//cdnjs.cloudflare.com/ajax/libs/respond.js/1.4.2/respond.min.js&quot;&gt;&lt;/script&gt; &lt;![endif]--&gt; &lt;!-- Site CSS --&gt; &lt;link rel=&quot;stylesheet&quot; type=&quot;text/css&quot; href=&quot;/static/css/style.css&quot;&gt; &lt;link rel=&quot;stylesheet&quot; type=&quot;text/css&quot; href=&quot;/static/css/glyphicon.css&quot;&gt; &lt;/head&gt; &lt;body&gt; &lt;div class=&quot;container article-page&quot;&gt; &lt;div class=&quot;row&quot;&gt; &lt;div class=&quot;col-md-8 col-md-offset-2&quot;&gt; &lt;h2&gt;&lt;a href=&quot;http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html&quot;&gt;Gov. Jerry Brown Says Ted Cruz Is &amp;#39;Absolutely Unfit&amp;#39; To Run For Office Because Of Climate Change Views&lt;/a&gt;&lt;/h2&gt; &lt;h4&gt;Sam Levine&lt;/h4&gt; &lt;div class=&quot;article&quot;&gt; &lt;p&gt;California Gov. Jerry Brown (D) said on Sunday that Texas Sen. Ted Cruz (R-Texas) is &quot;absolutely unfit to be running for office&quot; because of his position on climate change.&lt;/p&gt; &lt;p&gt;&quot;I just came back from New Hampshire, where there&#x27;s snow and ice everywhere. My view on this is simple: Debates on this should follow science and should follow data, and many of the alarmists on global warming, they have a problem because the science doesn&#x27;t back them up,&quot; Cruz &lt;a href=&quot;https://www.youtube.com/watch?v=m0UJ_Sc0Udk&quot;&gt;said&lt;/a&gt; on &quot;Late Night with Seth Meyers&quot; last week.&lt;/p&gt; &lt;p&gt;To back up his claim, Cruz cited satellite data that has shown a lack of significant warming over the last 17 years. But Cruz&#x27;s reasoning &lt;a href=&quot;http://www.politifact.com/truth-o-meter/statements/2015/mar/20 /ted-cruz/ted-cruzs-worlds-fire-not-last-17-years/&quot;&gt;has been debunked by Politifact&lt;/a&gt;, which has shown that scientists have ample evidence to believe that the climate will continue to warm.&lt;/p&gt; &lt;p&gt;&quot;What he said is absolutely false,” Brown said on &lt;a href=&quot;http://www.nbcnews.com/meet-the-press/california-governor-ted-cruz- unfit-be-running-n328046&quot;&gt;NBC&#x27;s &quot;Meet the Press.&quot;&lt;/a&gt; He added that &lt;a href=&quot;http://climate.nasa.gov/scientific-consensus/&quot;&gt;over 90 percent&lt;/a&gt; of scientists who study the climate agree that climate change is caused by human activity. &quot;That man betokens such a level of ignorance and a direct falsification of existing scientific data. It&#x27;s shocking, and I think that man has rendered himself absolutely unfit to be running for office,&quot; Brown said.&lt;/p&gt; &lt;p&gt;Brown added that climate change has &lt;a href=&quot;http://www.huffingtonpost.com/2015/03/06/california-drought-february- record_n_6820704.html?utm_hp_ref=california-drought&quot;&gt;caused droughts in his state&lt;/a&gt;, as well as severe cold and storms on the east coast.&lt;/p&gt; &lt;p&gt;While Cruz may have seen snow and ice everywhere in New Hampshire, data shows that the country is actually experiencing a &lt;a href=&quot;http://www.huffingtonpost.com/2015/02/19/cold-weather- winter_n_6713104.html&quot;&gt;warmer than average&lt;/a&gt; winter.&lt;/p&gt; &lt;p&gt;Brown’s criticism of Cruz comes one day before the Texas senator is set to announce a &lt;a href=&quot;http://www.huffingtonpost.com/2015/03/22 /ted-cruz-2016_n_6917824.html&quot;&gt;presidential campaign&lt;/a&gt;. &lt;/p&gt; &lt;/div&gt; &lt;div class=&quot;original&quot;&gt; &lt;a href=&quot;http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html&quot;&gt;VIEW ORIGINAL&lt;/a&gt; &lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;script src=&quot;//code.jquery.com/jquery-latest.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;/static/js/modal.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;/static/js/bootbox.min.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;/static/js/site.js&quot;&gt;&lt;/script&gt; &lt;script&gt; (function(i,s,o,g,r,a,m){i[&#x27;GoogleAnalyticsObject&#x27;]=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,&#x27;script&#x27;,&#x27;//www.google-analytics.com/analytics.js&#x27;,&#x27;ga&#x27;); ga(&#x27;create&#x27;, &#x27;UA-56257533-1&#x27;, &#x27;auto&#x27;); ga(&#x27;send&#x27;, &#x27;pageview&#x27;); &lt;/script&gt; &lt;/body&gt; &lt;/html&gt;";


        wv.loadData(stripHtml(s), "text/html", "UTF-8");

    }

    public String stripHtml(String html) {
        return Html.fromHtml(html).toString();
    }

}

I have modified the code here:

public class test extends Activity {
    private WebView wv;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.test);
        wv = (WebView) findViewById(R.id.wv);
        String s = "&lt;!DOCTYPE html&gt; &lt;html lang=&quot;en&quot;&gt; &lt;head&gt; &lt;meta charset=&quot;utf-8&quot;&gt; &lt;meta http-equiv=&quot;X-UA-Compatible&quot; content=&quot;IE=edge&quot;&gt; &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1.0&quot;&gt; &lt;meta name=&quot;description&quot; content=&quot;&quot;&gt; &lt;title&gt;Saulify&lt;/title&gt; &lt;!-- All the Favicons... --&gt; &lt;link rel=&quot;shortcut icon&quot; href=&quot;/static/favicon/favicon.ico&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;57x57&quot; href=&quot;/static/favicon/apple-touch-icon-57x57.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;114x114&quot; href=&quot;/static/favicon/apple-touch-icon-114x114.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;72x72&quot; href=&quot;/static/favicon/apple-touch-icon-72x72.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;144x144&quot; href=&quot;/static/favicon/apple-touch-icon-144x144.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;60x60&quot; href=&quot;/static/favicon/apple-touch-icon-60x60.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;120x120&quot; href=&quot;/static/favicon/apple-touch-icon-120x120.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;76x76&quot; href=&quot;/static/favicon/apple-touch-icon-76x76.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;152x152&quot; href=&quot;/static/favicon/apple-touch-icon-152x152.png&quot;&gt; &lt;link rel=&quot;apple-touch-icon&quot; sizes=&quot;180x180&quot; href=&quot;/static/favicon/apple-touch-icon-180x180.png&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-192x192.png&quot; sizes=&quot;192x192&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-160x160.png&quot; sizes=&quot;160x160&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-96x96.png&quot; sizes=&quot;96x96&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-16x16.png&quot; sizes=&quot;16x16&quot;&gt; &lt;link rel=&quot;icon&quot; type=&quot;image/png&quot; href=&quot;/static/favicon/favicon-32x32.png&quot; sizes=&quot;32x32&quot;&gt; &lt;meta name=&quot;msapplication-TileColor&quot; content=&quot;#da532c&quot;&gt; &lt;meta name=&quot;msapplication-TileImage&quot; content=&quot;/static/favicon/mstile-144x144.png&quot;&gt; &lt;meta name=&quot;msapplication-config&quot; content=&quot;/static/favicon/browserconfig.xml&quot;&gt; &lt;!-- External CSS --&gt; &lt;link rel=&quot;stylesheet&quot; href=&quot;https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css&quot;&gt; &lt;!-- External Fonts --&gt; &lt;link href=&quot;//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css&quot; rel=&quot;stylesheet&quot;&gt; &lt;link href=&#x27;//fonts.googleapis.com/css?family=Open+Sans:300,600&#x27; rel=&#x27;stylesheet&#x27; type=&#x27;text/css&#x27;&gt; &lt;link href=&#x27;//fonts.googleapis.com/css?family=Lora:400,700&#x27; rel=&#x27;stylesheet&#x27; type=&#x27;text/css&#x27;&gt; &lt;!--[if lt IE 9]&gt; &lt;script src=&quot;//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.2/html5shiv.min.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;//cdnjs.cloudflare.com/ajax/libs/respond.js/1.4.2/respond.min.js&quot;&gt;&lt;/script&gt; &lt;![endif]--&gt; &lt;!-- Site CSS --&gt; &lt;link rel=&quot;stylesheet&quot; type=&quot;text/css&quot; href=&quot;/static/css/style.css&quot;&gt; &lt;link rel=&quot;stylesheet&quot; type=&quot;text/css&quot; href=&quot;/static/css/glyphicon.css&quot;&gt; &lt;/head&gt; &lt;body&gt; &lt;div class=&quot;container article-page&quot;&gt; &lt;div class=&quot;row&quot;&gt; &lt;div class=&quot;col-md-8 col-md-offset-2&quot;&gt; &lt;h2&gt;&lt;a href=&quot;http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html&quot;&gt;Gov. Jerry Brown Says Ted Cruz Is &amp;#39;Absolutely Unfit&amp;#39; To Run For Office Because Of Climate Change Views&lt;/a&gt;&lt;/h2&gt; &lt;h4&gt;Sam Levine&lt;/h4&gt; &lt;div class=&quot;article&quot;&gt; &lt;p&gt;California Gov. Jerry Brown (D) said on Sunday that Texas Sen. Ted Cruz (R-Texas) is &quot;absolutely unfit to be running for office&quot; because of his position on climate change.&lt;/p&gt; &lt;p&gt;&quot;I just came back from New Hampshire, where there&#x27;s snow and ice everywhere. My view on this is simple: Debates on this should follow science and should follow data, and many of the alarmists on global warming, they have a problem because the science doesn&#x27;t back them up,&quot; Cruz &lt;a href=&quot;https://www.youtube.com/watch?v=m0UJ_Sc0Udk&quot;&gt;said&lt;/a&gt; on &quot;Late Night with Seth Meyers&quot; last week.&lt;/p&gt; &lt;p&gt;To back up his claim, Cruz cited satellite data that has shown a lack of significant warming over the last 17 years. But Cruz&#x27;s reasoning &lt;a href=&quot;http://www.politifact.com/truth-o-meter/statements/2015/mar/20 /ted-cruz/ted-cruzs-worlds-fire-not-last-17-years/&quot;&gt;has been debunked by Politifact&lt;/a&gt;, which has shown that scientists have ample evidence to believe that the climate will continue to warm.&lt;/p&gt; &lt;p&gt;&quot;What he said is absolutely false,” Brown said on &lt;a href=&quot;http://www.nbcnews.com/meet-the-press/california-governor-ted-cruz- unfit-be-running-n328046&quot;&gt;NBC&#x27;s &quot;Meet the Press.&quot;&lt;/a&gt; He added that &lt;a href=&quot;http://climate.nasa.gov/scientific-consensus/&quot;&gt;over 90 percent&lt;/a&gt; of scientists who study the climate agree that climate change is caused by human activity. &quot;That man betokens such a level of ignorance and a direct falsification of existing scientific data. It&#x27;s shocking, and I think that man has rendered himself absolutely unfit to be running for office,&quot; Brown said.&lt;/p&gt; &lt;p&gt;Brown added that climate change has &lt;a href=&quot;http://www.huffingtonpost.com/2015/03/06/california-drought-february- record_n_6820704.html?utm_hp_ref=california-drought&quot;&gt;caused droughts in his state&lt;/a&gt;, as well as severe cold and storms on the east coast.&lt;/p&gt; &lt;p&gt;While Cruz may have seen snow and ice everywhere in New Hampshire, data shows that the country is actually experiencing a &lt;a href=&quot;http://www.huffingtonpost.com/2015/02/19/cold-weather- winter_n_6713104.html&quot;&gt;warmer than average&lt;/a&gt; winter.&lt;/p&gt; &lt;p&gt;Brown’s criticism of Cruz comes one day before the Texas senator is set to announce a &lt;a href=&quot;http://www.huffingtonpost.com/2015/03/22 /ted-cruz-2016_n_6917824.html&quot;&gt;presidential campaign&lt;/a&gt;. &lt;/p&gt; &lt;/div&gt; &lt;div class=&quot;original&quot;&gt; &lt;a href=&quot;http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html&quot;&gt;VIEW ORIGINAL&lt;/a&gt; &lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;script src=&quot;//code.jquery.com/jquery-latest.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;/static/js/modal.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;/static/js/bootbox.min.js&quot;&gt;&lt;/script&gt; &lt;script src=&quot;/static/js/site.js&quot;&gt;&lt;/script&gt; &lt;script&gt; (function(i,s,o,g,r,a,m){i[&#x27;GoogleAnalyticsObject&#x27;]=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,&#x27;script&#x27;,&#x27;//www.google-analytics.com/analytics.js&#x27;,&#x27;ga&#x27;); ga(&#x27;create&#x27;, &#x27;UA-56257533-1&#x27;, &#x27;auto&#x27;); ga(&#x27;send&#x27;, &#x27;pageview&#x27;); &lt;/script&gt; &lt;/body&gt; &lt;/html&gt;";


        wv.loadData(stripHtml(s), "text/html", "UTF-8");

    }

    public String stripHtml(String html) {
        return Html.fromHtml(html).toString();
    }

}


回答 1

试试这个代码,

if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.N){
   yourtextview.setText(Html.fromHtml(yourstring,Html.FROM_HTML_MODE_LEGACY));
}
else {
   yourtextview.setText(Html.fromHtml(yourstring));
}

Try this code,

if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.N){
   yourtextview.setText(Html.fromHtml(yourstring,Html.FROM_HTML_MODE_LEGACY));
}
else {
   yourtextview.setText(Html.fromHtml(yourstring));
}

回答 2

试试这个:

wv = (WebView) findViewById(R.id.wv);
String s = "You HTML string";
wv.loadData(stripHtml(s), "text/html", "UTF-8");

public String stripHtml(String html) {
    return Html.fromHtml(html).toString();
}

Try this:

wv = (WebView) findViewById(R.id.wv);
String s = "You HTML string";
wv.loadData(stripHtml(s), "text/html", "UTF-8");

public String stripHtml(String html) {
    return Html.fromHtml(html).toString();
}

比较两个DataFrame并并排输出它们的差异

问题:比较两个DataFrame并并排输出它们的差异

我试图突出显示两个数据框之间到底发生了什么变化。

假设我有两个Python Pandas数据框:

"StudentRoster Jan-1":
id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.11                     False                Graduated
113  Zoe    4.12                     True       

"StudentRoster Jan-2":
id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.21                     False                Graduated
113  Zoe    4.12                     False                On vacation

我的目标是输出一个HTML表:

  1. 标识已更改的行(可以是int,float,boolean,string)
  2. 输出具有相同,OLD和NEW值的行(理想情况下将其输出到HTML表中),以便使用者可以清楚地看到两个数据框之间的变化:

    "StudentRoster Difference Jan-1 - Jan-2":  
    id   Name   score                    isEnrolled           Comment
    112  Nick   was 1.11| now 1.21       False                Graduated
    113  Zoe    4.12                     was True | now False was "" | now   "On   vacation"

我想我可以逐行和逐列进行比较,但是有没有更简单的方法?

I am trying to highlight exactly what changed between two dataframes.

Suppose I have two Python Pandas dataframes:

"StudentRoster Jan-1":
id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.11                     False                Graduated
113  Zoe    4.12                     True       

"StudentRoster Jan-2":
id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.21                     False                Graduated
113  Zoe    4.12                     False                On vacation

My goal is to output an HTML table that:

  1. Identifies rows that have changed (could be int, float, boolean, string)
  2. Outputs rows with same, OLD and NEW values (ideally into an HTML table) so the consumer can clearly see what changed between two dataframes:

    "StudentRoster Difference Jan-1 - Jan-2":  
    id   Name   score                    isEnrolled           Comment
    112  Nick   was 1.11| now 1.21       False                Graduated
    113  Zoe    4.12                     was True | now False was "" | now   "On   vacation"
    

I suppose I could do a row by row and column by column comparison, but is there an easier way?


回答 0

第一部分类似于君士坦丁,您可以获取哪些行为空的布尔值*:

In [21]: ne = (df1 != df2).any(1)

In [22]: ne
Out[22]:
0    False
1     True
2     True
dtype: bool

然后,我们可以查看哪些条目已更改:

In [23]: ne_stacked = (df1 != df2).stack()

In [24]: changed = ne_stacked[ne_stacked]

In [25]: changed.index.names = ['id', 'col']

In [26]: changed
Out[26]:
id  col
1   score         True
2   isEnrolled    True
    Comment       True
dtype: bool

在这里,第一个条目是索引,第二个条目是已更改的列。

In [27]: difference_locations = np.where(df1 != df2)

In [28]: changed_from = df1.values[difference_locations]

In [29]: changed_to = df2.values[difference_locations]

In [30]: pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
Out[30]:
               from           to
id col
1  score       1.11         1.21
2  isEnrolled  True        False
   Comment     None  On vacation

*注:这是非常重要的df1,并df2在这里分享相同的索引。为了克服这种歧义,您可以确保仅使用来查看共享标签df1.index & df2.index,但我想将其保留为练习。

The first part is similar to Constantine, you can get the boolean of which rows are empty*:

In [21]: ne = (df1 != df2).any(1)

In [22]: ne
Out[22]:
0    False
1     True
2     True
dtype: bool

Then we can see which entries have changed:

In [23]: ne_stacked = (df1 != df2).stack()

In [24]: changed = ne_stacked[ne_stacked]

In [25]: changed.index.names = ['id', 'col']

In [26]: changed
Out[26]:
id  col
1   score         True
2   isEnrolled    True
    Comment       True
dtype: bool

Here the first entry is the index and the second the columns which has been changed.

In [27]: difference_locations = np.where(df1 != df2)

In [28]: changed_from = df1.values[difference_locations]

In [29]: changed_to = df2.values[difference_locations]

In [30]: pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
Out[30]:
               from           to
id col
1  score       1.11         1.21
2  isEnrolled  True        False
   Comment     None  On vacation

* Note: it’s important that df1 and df2 share the same index here. To overcome this ambiguity, you can ensure you only look at the shared labels using df1.index & df2.index, but I think I’ll leave that as an exercise.


回答 1

突出显示两个DataFrame之间的差异

可以使用DataFrame样式属性突出显示存在差异的单元格的背景色。

使用原始问题中的示例数据

第一步是使用concat功能将DataFrames水平连接,并使用keys参数区分每个帧:

df_all = pd.concat([df.set_index('id'), df2.set_index('id')], 
                   axis='columns', keys=['First', 'Second'])
df_all

交换列级别并将相同的列名称彼此相邻可能更容易:

df_final = df_all.swaplevel(axis='columns')[df.columns[1:]]
df_final

现在,更容易发现框架中的差异。但是,我们可以走得更远,并使用该style属性突出显示不同的单元格。我们定义了一个自定义函数来执行此操作,您可以在文档的此部分中看到。

def highlight_diff(data, color='yellow'):
    attr = 'background-color: {}'.format(color)
    other = data.xs('First', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''),
                        index=data.index, columns=data.columns)

df_final.style.apply(highlight_diff, axis=None)

这将突出显示两个均缺少值的单元格。您可以填充它们或提供额外的逻辑,以免突出显示它们。

Highlighting the difference between two DataFrames

It is possible to use the DataFrame style property to highlight the background color of the cells where there is a difference.

Using the example data from the original question

The first step is to concatenate the DataFrames horizontally with the concat function and distinguish each frame with the keys parameter:

df_all = pd.concat([df.set_index('id'), df2.set_index('id')], 
                   axis='columns', keys=['First', 'Second'])
df_all

It’s probably easier to swap the column levels and put the same column names next to each other:

df_final = df_all.swaplevel(axis='columns')[df.columns[1:]]
df_final

Now, its much easier to spot the differences in the frames. But, we can go further and use the style property to highlight the cells that are different. We define a custom function to do this which you can see in this part of the documentation.

def highlight_diff(data, color='yellow'):
    attr = 'background-color: {}'.format(color)
    other = data.xs('First', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''),
                        index=data.index, columns=data.columns)

df_final.style.apply(highlight_diff, axis=None)

This will highlight cells that both have missing values. You can either fill them or provide extra logic so that they don’t get highlighted.


回答 2

这个答案只是扩展了@Andy Hayden的值,使其在数字字段为时具有弹性nan,并将其包装到函数中。

import pandas as pd
import numpy as np


def diff_pd(df1, df2):
    """Identify differences between two pandas DataFrames"""
    assert (df1.columns == df2.columns).all(), \
        "DataFrame column names are different"
    if any(df1.dtypes != df2.dtypes):
        "Data Types are different, trying to convert"
        df2 = df2.astype(df1.dtypes)
    if df1.equals(df2):
        return None
    else:
        # need to account for np.nan != np.nan returning True
        diff_mask = (df1 != df2) & ~(df1.isnull() & df2.isnull())
        ne_stacked = diff_mask.stack()
        changed = ne_stacked[ne_stacked]
        changed.index.names = ['id', 'col']
        difference_locations = np.where(diff_mask)
        changed_from = df1.values[difference_locations]
        changed_to = df2.values[difference_locations]
        return pd.DataFrame({'from': changed_from, 'to': changed_to},
                            index=changed.index)

因此,对于您的数据(略作编辑以使分数列中具有NaN):

import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

DF1 = StringIO("""id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 "He was late to class"
112  Nick   1.11                     False                "Graduated"
113  Zoe    NaN                     True                  " "
""")
DF2 = StringIO("""id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 "He was late to class"
112  Nick   1.21                     False                "Graduated"
113  Zoe    NaN                     False                "On vacation" """)
df1 = pd.read_table(DF1, sep='\s+', index_col='id')
df2 = pd.read_table(DF2, sep='\s+', index_col='id')
diff_pd(df1, df2)

输出:

                from           to
id  col                          
112 score       1.11         1.21
113 isEnrolled  True        False
    Comment           On vacation

This answer simply extends @Andy Hayden’s, making it resilient to when numeric fields are nan, and wrapping it up into a function.

import pandas as pd
import numpy as np


def diff_pd(df1, df2):
    """Identify differences between two pandas DataFrames"""
    assert (df1.columns == df2.columns).all(), \
        "DataFrame column names are different"
    if any(df1.dtypes != df2.dtypes):
        "Data Types are different, trying to convert"
        df2 = df2.astype(df1.dtypes)
    if df1.equals(df2):
        return None
    else:
        # need to account for np.nan != np.nan returning True
        diff_mask = (df1 != df2) & ~(df1.isnull() & df2.isnull())
        ne_stacked = diff_mask.stack()
        changed = ne_stacked[ne_stacked]
        changed.index.names = ['id', 'col']
        difference_locations = np.where(diff_mask)
        changed_from = df1.values[difference_locations]
        changed_to = df2.values[difference_locations]
        return pd.DataFrame({'from': changed_from, 'to': changed_to},
                            index=changed.index)

So with your data (slightly edited to have a NaN in the score column):

import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

DF1 = StringIO("""id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 "He was late to class"
112  Nick   1.11                     False                "Graduated"
113  Zoe    NaN                     True                  " "
""")
DF2 = StringIO("""id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 "He was late to class"
112  Nick   1.21                     False                "Graduated"
113  Zoe    NaN                     False                "On vacation" """)
df1 = pd.read_table(DF1, sep='\s+', index_col='id')
df2 = pd.read_table(DF2, sep='\s+', index_col='id')
diff_pd(df1, df2)

Output:

                from           to
id  col                          
112 score       1.11         1.21
113 isEnrolled  True        False
    Comment           On vacation

回答 3

import pandas as pd
import io

texts = ['''\
id   Name   score                    isEnrolled                        Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.11                     False                           Graduated
113  Zoe    4.12                     True       ''',

         '''\
id   Name   score                    isEnrolled                        Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.21                     False                           Graduated
113  Zoe    4.12                     False                         On vacation''']


df1 = pd.read_fwf(io.StringIO(texts[0]), widths=[5,7,25,21,20])
df2 = pd.read_fwf(io.StringIO(texts[1]), widths=[5,7,25,21,20])
df = pd.concat([df1,df2]) 

print(df)
#     id  Name  score isEnrolled               Comment
# 0  111  Jack   2.17       True  He was late to class
# 1  112  Nick   1.11      False             Graduated
# 2  113   Zoe   4.12       True                   NaN
# 0  111  Jack   2.17       True  He was late to class
# 1  112  Nick   1.21      False             Graduated
# 2  113   Zoe   4.12      False           On vacation

df.set_index(['id', 'Name'], inplace=True)
print(df)
#           score isEnrolled               Comment
# id  Name                                        
# 111 Jack   2.17       True  He was late to class
# 112 Nick   1.11      False             Graduated
# 113 Zoe    4.12       True                   NaN
# 111 Jack   2.17       True  He was late to class
# 112 Nick   1.21      False             Graduated
# 113 Zoe    4.12      False           On vacation

def report_diff(x):
    return x[0] if x[0] == x[1] else '{} | {}'.format(*x)

changes = df.groupby(level=['id', 'Name']).agg(report_diff)
print(changes)

版画

                score    isEnrolled               Comment
id  Name                                                 
111 Jack         2.17          True  He was late to class
112 Nick  1.11 | 1.21         False             Graduated
113 Zoe          4.12  True | False     nan | On vacation
import pandas as pd
import io

texts = ['''\
id   Name   score                    isEnrolled                        Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.11                     False                           Graduated
113  Zoe    4.12                     True       ''',

         '''\
id   Name   score                    isEnrolled                        Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.21                     False                           Graduated
113  Zoe    4.12                     False                         On vacation''']


df1 = pd.read_fwf(io.StringIO(texts[0]), widths=[5,7,25,21,20])
df2 = pd.read_fwf(io.StringIO(texts[1]), widths=[5,7,25,21,20])
df = pd.concat([df1,df2]) 

print(df)
#     id  Name  score isEnrolled               Comment
# 0  111  Jack   2.17       True  He was late to class
# 1  112  Nick   1.11      False             Graduated
# 2  113   Zoe   4.12       True                   NaN
# 0  111  Jack   2.17       True  He was late to class
# 1  112  Nick   1.21      False             Graduated
# 2  113   Zoe   4.12      False           On vacation

df.set_index(['id', 'Name'], inplace=True)
print(df)
#           score isEnrolled               Comment
# id  Name                                        
# 111 Jack   2.17       True  He was late to class
# 112 Nick   1.11      False             Graduated
# 113 Zoe    4.12       True                   NaN
# 111 Jack   2.17       True  He was late to class
# 112 Nick   1.21      False             Graduated
# 113 Zoe    4.12      False           On vacation

def report_diff(x):
    return x[0] if x[0] == x[1] else '{} | {}'.format(*x)

changes = df.groupby(level=['id', 'Name']).agg(report_diff)
print(changes)

prints

                score    isEnrolled               Comment
id  Name                                                 
111 Jack         2.17          True  He was late to class
112 Nick  1.11 | 1.21         False             Graduated
113 Zoe          4.12  True | False     nan | On vacation

回答 4

我已经遇到了这个问题,但是在找到这篇文章之前找到了答案:

根据unutbu的答案,加载您的数据…

import pandas as pd
import io

texts = ['''\
id   Name   score                    isEnrolled                       Date
111  Jack                            True              2013-05-01 12:00:00
112  Nick   1.11                     False             2013-05-12 15:05:23
     Zoe    4.12                     True                                  ''',

         '''\
id   Name   score                    isEnrolled                       Date
111  Jack   2.17                     True              2013-05-01 12:00:00
112  Nick   1.21                     False                                
     Zoe    4.12                     False             2013-05-01 12:00:00''']


df1 = pd.read_fwf(io.StringIO(texts[0]), widths=[5,7,25,17,20], parse_dates=[4])
df2 = pd.read_fwf(io.StringIO(texts[1]), widths=[5,7,25,17,20], parse_dates=[4])

…定义您的diff函数…

def report_diff(x):
    return x[0] if x[0] == x[1] else '{} | {}'.format(*x)

然后,您可以简单地使用面板来得出结论:

my_panel = pd.Panel(dict(df1=df1,df2=df2))
print my_panel.apply(report_diff, axis=0)

#          id  Name        score    isEnrolled                       Date
#0        111  Jack   nan | 2.17          True        2013-05-01 12:00:00
#1        112  Nick  1.11 | 1.21         False  2013-05-12 15:05:23 | NaT
#2  nan | nan   Zoe         4.12  True | False  NaT | 2013-05-01 12:00:00

顺便说一句,如果您使用的是IPython Notebook,则可能希望使用彩色的diff函数根据单元格是不同,相等还是left / right null来赋予颜色:

from IPython.display import HTML
pd.options.display.max_colwidth = 500  # You need this, otherwise pandas
#                          will limit your HTML strings to 50 characters

def report_diff(x):
    if x[0]==x[1]:
        return unicode(x[0].__str__())
    elif pd.isnull(x[0]) and pd.isnull(x[1]):
        return u'<table style="background-color:#00ff00;font-weight:bold;">'+\
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % ('nan', 'nan')
    elif pd.isnull(x[0]) and ~pd.isnull(x[1]):
        return u'<table style="background-color:#ffff00;font-weight:bold;">'+\
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % ('nan', x[1])
    elif ~pd.isnull(x[0]) and pd.isnull(x[1]):
        return u'<table style="background-color:#0000ff;font-weight:bold;">'+\
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % (x[0],'nan')
    else:
        return u'<table style="background-color:#ff0000;font-weight:bold;">'+\
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % (x[0], x[1])

HTML(my_panel.apply(report_diff, axis=0).to_html(escape=False))

I have faced this issue, but found an answer before finding this post :

Based on unutbu’s answer, load your data…

import pandas as pd
import io

texts = ['''\
id   Name   score                    isEnrolled                       Date
111  Jack                            True              2013-05-01 12:00:00
112  Nick   1.11                     False             2013-05-12 15:05:23
     Zoe    4.12                     True                                  ''',

         '''\
id   Name   score                    isEnrolled                       Date
111  Jack   2.17                     True              2013-05-01 12:00:00
112  Nick   1.21                     False                                
     Zoe    4.12                     False             2013-05-01 12:00:00''']


df1 = pd.read_fwf(io.StringIO(texts[0]), widths=[5,7,25,17,20], parse_dates=[4])
df2 = pd.read_fwf(io.StringIO(texts[1]), widths=[5,7,25,17,20], parse_dates=[4])

…define your diff function…

def report_diff(x):
    return x[0] if x[0] == x[1] else '{} | {}'.format(*x)

Then you can simply use a Panel to conclude :

my_panel = pd.Panel(dict(df1=df1,df2=df2))
print my_panel.apply(report_diff, axis=0)

#          id  Name        score    isEnrolled                       Date
#0        111  Jack   nan | 2.17          True        2013-05-01 12:00:00
#1        112  Nick  1.11 | 1.21         False  2013-05-12 15:05:23 | NaT
#2  nan | nan   Zoe         4.12  True | False  NaT | 2013-05-01 12:00:00

By the way, if you’re in IPython Notebook, you may like to use a colored diff function to give colors depending whether cells are different, equal or left/right null :

from IPython.display import HTML
pd.options.display.max_colwidth = 500  # You need this, otherwise pandas
#                          will limit your HTML strings to 50 characters

def report_diff(x):
    if x[0]==x[1]:
        return unicode(x[0].__str__())
    elif pd.isnull(x[0]) and pd.isnull(x[1]):
        return u'<table style="background-color:#00ff00;font-weight:bold;">'+\
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % ('nan', 'nan')
    elif pd.isnull(x[0]) and ~pd.isnull(x[1]):
        return u'<table style="background-color:#ffff00;font-weight:bold;">'+\
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % ('nan', x[1])
    elif ~pd.isnull(x[0]) and pd.isnull(x[1]):
        return u'<table style="background-color:#0000ff;font-weight:bold;">'+\
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % (x[0],'nan')
    else:
        return u'<table style="background-color:#ff0000;font-weight:bold;">'+\
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % (x[0], x[1])

HTML(my_panel.apply(report_diff, axis=0).to_html(escape=False))

回答 5

如果您的两个数据帧中具有相同的ID,那么找出更改实际上是很容易的。这样做frame1 != frame2会为您提供一个布尔型DataFrame,其中每个True都是已更改的数据。由此,您可以通过轻松获得每个更改行的索引changedids = frame1.index[np.any(frame1 != frame2,axis=1)]

If your two dataframes have the same ids in them, then finding out what changed is actually pretty easy. Just doing frame1 != frame2 will give you a boolean DataFrame where each True is data that has changed. From that, you could easily get the index of each changed row by doing changedids = frame1.index[np.any(frame1 != frame2,axis=1)].


回答 6

使用concat和drop_duplicates的另一种方法:

import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd

DF1 = StringIO("""id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 "He was late to class"
112  Nick   1.11                     False                "Graduated"
113  Zoe    NaN                     True                  " "
""")
DF2 = StringIO("""id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 "He was late to class"
112  Nick   1.21                     False                "Graduated"
113  Zoe    NaN                     False                "On vacation" """)

df1 = pd.read_table(DF1, sep='\s+', index_col='id')
df2 = pd.read_table(DF2, sep='\s+', index_col='id')
#%%
dictionary = {1:df1,2:df2}
df=pd.concat(dictionary)
df.drop_duplicates(keep=False)

输出:

       Name  score isEnrolled      Comment
  id                                      
1 112  Nick   1.11      False    Graduated
  113   Zoe    NaN       True             
2 112  Nick   1.21      False    Graduated
  113   Zoe    NaN      False  On vacation

A different approach using concat and drop_duplicates:

import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd

DF1 = StringIO("""id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 "He was late to class"
112  Nick   1.11                     False                "Graduated"
113  Zoe    NaN                     True                  " "
""")
DF2 = StringIO("""id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 "He was late to class"
112  Nick   1.21                     False                "Graduated"
113  Zoe    NaN                     False                "On vacation" """)

df1 = pd.read_table(DF1, sep='\s+', index_col='id')
df2 = pd.read_table(DF2, sep='\s+', index_col='id')
#%%
dictionary = {1:df1,2:df2}
df=pd.concat(dictionary)
df.drop_duplicates(keep=False)

Output:

       Name  score isEnrolled      Comment
  id                                      
1 112  Nick   1.11      False    Graduated
  113   Zoe    NaN       True             
2 112  Nick   1.21      False    Graduated
  113   Zoe    NaN      False  On vacation

回答 7

摆弄@journois的答案后,由于Panel的贬值,我能够使用MultiIndex而不是Panel使它正常工作

首先,创建一些虚拟数据:

df1 = pd.DataFrame({
    'id': ['111', '222', '333', '444', '555'],
    'let': ['a', 'b', 'c', 'd', 'e'],
    'num': ['1', '2', '3', '4', '5']
})
df2 = pd.DataFrame({
    'id': ['111', '222', '333', '444', '666'],
    'let': ['a', 'b', 'c', 'D', 'f'],
    'num': ['1', '2', 'Three', '4', '6'],
})

然后,定义您的diff函数,在这种情况下,我将使用他的答案中的一个report_diff保持不变:

def report_diff(x):
    return x[0] if x[0] == x[1] else '{} | {}'.format(*x)

然后,我将把数据连接到一个MultiIndex数据帧中:

df_all = pd.concat(
    [df1.set_index('id'), df2.set_index('id')], 
    axis='columns', 
    keys=['df1', 'df2'],
    join='outer'
)
df_all = df_all.swaplevel(axis='columns')[df1.columns[1:]]

最后,我将report_diff向下应用每个列组:

df_final.groupby(level=0, axis=1).apply(lambda frame: frame.apply(report_diff, axis=1))

输出:

         let        num
111        a          1
222        b          2
333        c  3 | Three
444    d | D          4
555  e | nan    5 | nan
666  nan | f    nan | 6

仅此而已!

After fiddling around with @journois’s answer, I was able to get it to work using MultiIndex instead of Panel due to Panel’s deprication.

First, create some dummy data:

df1 = pd.DataFrame({
    'id': ['111', '222', '333', '444', '555'],
    'let': ['a', 'b', 'c', 'd', 'e'],
    'num': ['1', '2', '3', '4', '5']
})
df2 = pd.DataFrame({
    'id': ['111', '222', '333', '444', '666'],
    'let': ['a', 'b', 'c', 'D', 'f'],
    'num': ['1', '2', 'Three', '4', '6'],
})

Then, define your diff function, in this case I’ll use the one from his answer report_diff stays the same:

def report_diff(x):
    return x[0] if x[0] == x[1] else '{} | {}'.format(*x)

Then, I’m going to concatenate the data into a MultiIndex dataframe:

df_all = pd.concat(
    [df1.set_index('id'), df2.set_index('id')], 
    axis='columns', 
    keys=['df1', 'df2'],
    join='outer'
)
df_all = df_all.swaplevel(axis='columns')[df1.columns[1:]]

And finally I’m going to apply the report_diff down each column group:

df_final.groupby(level=0, axis=1).apply(lambda frame: frame.apply(report_diff, axis=1))

This outputs:

         let        num
111        a          1
222        b          2
333        c  3 | Three
444    d | D          4
555  e | nan    5 | nan
666  nan | f    nan | 6

And that is all!


回答 8

扩展@cge的答案,这对于提高结果的可读性非常酷:

a[a != b][np.any(a != b, axis=1)].join(pd.DataFrame('a<->b', index=a.index, columns=['a<=>b'])).join(
        b[a != b][np.any(a != b, axis=1)]
        ,rsuffix='_b', how='outer'
).fillna('')

完整的演示示例:

import numpy as np, pandas as pd

a = pd.DataFrame(np.random.randn(7,3), columns=list('ABC'))
b = a.copy()
b.iloc[0,2] = np.nan
b.iloc[1,0] = 7
b.iloc[3,1] = 77
b.iloc[4,2] = 777

a[a != b][np.any(a != b, axis=1)].join(pd.DataFrame('a<->b', index=a.index, columns=['a<=>b'])).join(
        b[a != b][np.any(a != b, axis=1)]
        ,rsuffix='_b', how='outer'
).fillna('')

Extending answer of @cge, which is pretty cool for more readability of result:

a[a != b][np.any(a != b, axis=1)].join(pd.DataFrame('a<->b', index=a.index, columns=['a<=>b'])).join(
        b[a != b][np.any(a != b, axis=1)]
        ,rsuffix='_b', how='outer'
).fillna('')

Full demonstration example:

import numpy as np, pandas as pd

a = pd.DataFrame(np.random.randn(7,3), columns=list('ABC'))
b = a.copy()
b.iloc[0,2] = np.nan
b.iloc[1,0] = 7
b.iloc[3,1] = 77
b.iloc[4,2] = 777

a[a != b][np.any(a != b, axis=1)].join(pd.DataFrame('a<->b', index=a.index, columns=['a<=>b'])).join(
        b[a != b][np.any(a != b, axis=1)]
        ,rsuffix='_b', how='outer'
).fillna('')

回答 9

这是使用选择并合并的另一种方法:

In [6]: # first lets create some dummy dataframes with some column(s) different
   ...: df1 = pd.DataFrame({'a': range(-5,0), 'b': range(10,15), 'c': range(20,25)})
   ...: df2 = pd.DataFrame({'a': range(-5,0), 'b': range(10,15), 'c': [20] + list(range(101,105))})


In [7]: df1
Out[7]:
   a   b   c
0 -5  10  20
1 -4  11  21
2 -3  12  22
3 -2  13  23
4 -1  14  24


In [8]: df2
Out[8]:
   a   b    c
0 -5  10   20
1 -4  11  101
2 -3  12  102
3 -2  13  103
4 -1  14  104


In [10]: # make condition over the columns you want to comapre
    ...: condition = df1['c'] != df2['c']
    ...:
    ...: # select rows from each dataframe where the condition holds
    ...: diff1 = df1[condition]
    ...: diff2 = df2[condition]


In [11]: # merge the selected rows (dataframes) with some suffixes (optional)
    ...: diff1.merge(diff2, on=['a','b'], suffixes=('_before', '_after'))
Out[11]:
   a   b  c_before  c_after
0 -4  11        21      101
1 -3  12        22      102
2 -2  13        23      103
3 -1  14        24      104

这是Jupyter屏幕截图中的相同内容:

Here is another way using select and merge:

In [6]: # first lets create some dummy dataframes with some column(s) different
   ...: df1 = pd.DataFrame({'a': range(-5,0), 'b': range(10,15), 'c': range(20,25)})
   ...: df2 = pd.DataFrame({'a': range(-5,0), 'b': range(10,15), 'c': [20] + list(range(101,105))})


In [7]: df1
Out[7]:
   a   b   c
0 -5  10  20
1 -4  11  21
2 -3  12  22
3 -2  13  23
4 -1  14  24


In [8]: df2
Out[8]:
   a   b    c
0 -5  10   20
1 -4  11  101
2 -3  12  102
3 -2  13  103
4 -1  14  104


In [10]: # make condition over the columns you want to comapre
    ...: condition = df1['c'] != df2['c']
    ...:
    ...: # select rows from each dataframe where the condition holds
    ...: diff1 = df1[condition]
    ...: diff2 = df2[condition]


In [11]: # merge the selected rows (dataframes) with some suffixes (optional)
    ...: diff1.merge(diff2, on=['a','b'], suffixes=('_before', '_after'))
Out[11]:
   a   b  c_before  c_after
0 -4  11        21      101
1 -3  12        22      102
2 -2  13        23      103
3 -1  14        24      104

Here is the same thing from a Jupyter screenshot:


回答 10

大熊猫> = 1.1: DataFrame.compare

使用pandas 1.1,您基本上可以通过一个函数调用来复制Ted Petrou的输出。来自文档的示例:

pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'

df1.compare(df2)

  score       isEnrolled       Comment             
   self other       self other    self        other
1  1.11  1.21        NaN   NaN     NaN          NaN
2   NaN   NaN        1.0   0.0     NaN  On vacation

此处,“自身”是指LHS数据帧,而“其他”是指RHS数据帧。默认情况下,相等值将替换为NaN,因此您可以仅关注差异。如果要显示相等的值,请使用

df1.compare(df2, keep_equal=True, keep_shape=True) 

  score       isEnrolled           Comment             
   self other       self  other       self        other
1  1.11  1.21      False  False  Graduated    Graduated
2  4.12  4.12       True  False        NaN  On vacation

您还可以使用align_axis以下方式更改比较轴:

df1.compare(df2, align_axis='index')

         score  isEnrolled      Comment
1 self    1.11         NaN          NaN
  other   1.21         NaN          NaN
2 self     NaN         1.0          NaN
  other    NaN         0.0  On vacation

这将按行而不是按列比较值。

pandas >= 1.1: DataFrame.compare

With pandas 1.1, you could essentially replicate Ted Petrou’s output with a single function call. Example taken from the docs:

pd.__version__
# '1.1.0'

df1.compare(df2)

  score       isEnrolled       Comment             
   self other       self other    self        other
1  1.11  1.21        NaN   NaN     NaN          NaN
2   NaN   NaN        1.0   0.0     NaN  On vacation

Here, “self” refers to the LHS dataFrame, while “other” is the RHS DataFrame. By default, equal values are replaced with NaNs so you can focus on just the diffs. If you want to show values that are equal as well, use

df1.compare(df2, keep_equal=True, keep_shape=True) 

  score       isEnrolled           Comment             
   self other       self  other       self        other
1  1.11  1.21      False  False  Graduated    Graduated
2  4.12  4.12       True  False        NaN  On vacation

You can also change the axis of comparison using align_axis:

df1.compare(df2, align_axis='index')

         score  isEnrolled      Comment
1 self    1.11         NaN          NaN
  other   1.21         NaN          NaN
2 self     NaN         1.0          NaN
  other    NaN         0.0  On vacation

This compares values row-wise, instead of column-wise.


回答 11

查找两个数据帧之间不对称差异的函数在以下实现:(基于熊猫的集合差异)GIST:https : //gist.github.com/oneryalcin/68cf25f536a25e65f0b3c84f9c118e03

def diff_df(df1, df2, how="left"):
    """
      Find Difference of rows for given two dataframes
      this function is not symmetric, means
            diff(x, y) != diff(y, x)
      however
            diff(x, y, how='left') == diff(y, x, how='right')

      Ref: /programming/18180763/set-difference-for-pandas/40209800#40209800
    """
    if (df1.columns != df2.columns).any():
        raise ValueError("Two dataframe columns must match")

    if df1.equals(df2):
        return None
    elif how == 'right':
        return pd.concat([df2, df1, df1]).drop_duplicates(keep=False)
    elif how == 'left':
        return pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
    else:
        raise ValueError('how parameter supports only "left" or "right keywords"')

例:

df1 = pd.DataFrame(d1)
Out[1]: 
                Comment  Name  isEnrolled  score
0  He was late to class  Jack        True   2.17
1             Graduated  Nick       False   1.11
2                         Zoe        True   4.12


df2 = pd.DataFrame(d2)

Out[2]: 
                Comment  Name  isEnrolled  score
0  He was late to class  Jack        True   2.17
1           On vacation   Zoe        True   4.12

diff_df(df1, df2)
Out[3]: 
     Comment  Name  isEnrolled  score
1  Graduated  Nick       False   1.11
2              Zoe        True   4.12

diff_df(df2, df1)
Out[4]: 
       Comment Name  isEnrolled  score
1  On vacation  Zoe        True   4.12

# This gives the same result as above
diff_df(df1, df2, how='right')
Out[22]: 
       Comment Name  isEnrolled  score
1  On vacation  Zoe        True   4.12

A function that finds asymmetrical difference between two data frames is implemented below: (Based on set difference for pandas) GIST: https://gist.github.com/oneryalcin/68cf25f536a25e65f0b3c84f9c118e03

def diff_df(df1, df2, how="left"):
    """
      Find Difference of rows for given two dataframes
      this function is not symmetric, means
            diff(x, y) != diff(y, x)
      however
            diff(x, y, how='left') == diff(y, x, how='right')

      Ref: https://stackoverflow.com/questions/18180763/set-difference-for-pandas/40209800#40209800
    """
    if (df1.columns != df2.columns).any():
        raise ValueError("Two dataframe columns must match")

    if df1.equals(df2):
        return None
    elif how == 'right':
        return pd.concat([df2, df1, df1]).drop_duplicates(keep=False)
    elif how == 'left':
        return pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
    else:
        raise ValueError('how parameter supports only "left" or "right keywords"')

Example:

df1 = pd.DataFrame(d1)
Out[1]: 
                Comment  Name  isEnrolled  score
0  He was late to class  Jack        True   2.17
1             Graduated  Nick       False   1.11
2                         Zoe        True   4.12


df2 = pd.DataFrame(d2)

Out[2]: 
                Comment  Name  isEnrolled  score
0  He was late to class  Jack        True   2.17
1           On vacation   Zoe        True   4.12

diff_df(df1, df2)
Out[3]: 
     Comment  Name  isEnrolled  score
1  Graduated  Nick       False   1.11
2              Zoe        True   4.12

diff_df(df2, df1)
Out[4]: 
       Comment Name  isEnrolled  score
1  On vacation  Zoe        True   4.12

# This gives the same result as above
diff_df(df1, df2, how='right')
Out[22]: 
       Comment Name  isEnrolled  score
1  On vacation  Zoe        True   4.12

回答 12

将pda导入为pd将numpy导入为np

df = pd.read_excel(’D:\ HARISH \ DATA SCIENCE \ 1 MY Training \ SAMPLE DATA&PROJS \ CRICKET DATA \ IPL PLAYER LIST \ IPL PLAYER LIST _ harish.xlsx’)

df1 = srh = df [df [‘TEAM’]。str.contains(“ SRH”)] df2 = csk = df [df [‘TEAM’]。str.contains(“ CSK”)]

srh = srh.iloc [:,0:2] csk = csk.iloc [:,0:2]

csk = csk.reset_index(drop = True)csk

srh = srh.reset_index(drop = True)srh

new = pd.concat([srh,csk],axis = 1)

new.head()

**玩家类型玩家类型

0戴维·华纳·蝙蝠侠… MS Dhoni Captain

1布瓦内什瓦尔·库马尔·鲍勒(Bhuvaneshwar Kumar Bowler)…

2 Manish Pandey击球手… Suresh Raina All-Rounder

3拉希德·汗·阿曼·鲍勒(Kashir Jadhav All-Rounder)

4 Shikhar Dhawan击球手…. Dwayne Bravo All-Rounder

import pandas as pd
import numpy as np

df = pd.read_excel('D:\\HARISH\\DATA SCIENCE\\1 MY Training\\SAMPLE DATA & projs\\CRICKET DATA\\IPL PLAYER LIST\\IPL PLAYER LIST _ harish.xlsx')


df1= srh = df[df['TEAM'].str.contains("SRH")]
df2 = csk = df[df['TEAM'].str.contains("CSK")]   

srh = srh.iloc[:,0:2]
csk = csk.iloc[:,0:2]

csk = csk.reset_index(drop=True)
csk

srh = srh.reset_index(drop=True)
srh

new = pd.concat([srh, csk], axis=1)

new.head()
** 
               PLAYER     TYPE           PLAYER         TYPE

0        David Warner  Batsman    ...        MS Dhoni      Captain

1  Bhuvaneshwar Kumar   Bowler  ...    Ravindra Jadeja  All-Rounder

2       Manish Pandey  Batsman   ...   Suresh Raina  All-Rounder

3   Rashid Khan Arman   Bowler     ...   Kedar Jadhav  All-Rounder

4      Shikhar Dhawan  Batsman    ....    Dwayne Bravo  All-Rounder

如何将HTML嵌入IPython输出中?

问题:如何将HTML嵌入IPython输出中?

是否可以将呈现的HTML输出嵌入到IPython输出中?

一种方法是使用

from IPython.core.display import HTML
HTML('<a href="http://example.com">link</a>')

或(IPython多行单元格别名)

%%html
<a href="http://example.com">link</a>

哪个返回格式化的链接,但是

  1. 此链接不会从控制台打开带有网页本身的浏览器。不过,IPython笔记本支持诚实渲染。
  2. 我不知道如何HTML()在列表或pandas打印表中呈现对象。您可以这样做df.to_html(),但无需在单元格内建立链接。
  3. 此输出在PyCharm Python控制台中不是交互式的(因为它不是QT)。

如何克服这些缺点并使IPython输出更具交互性?

Is it possible to embed rendered HTML output into IPython output?

One way is to use

from IPython.core.display import HTML
HTML('<a href="http://example.com">link</a>')

or (IPython multiline cell alias)

%%html
<a href="http://example.com">link</a>

Which return a formatted link, but

  1. This link doesn’t open a browser with the webpage itself from the console. IPython notebooks support honest rendering, though.
  2. I’m unaware of how to render HTML() object within, say, a list or pandas printed table. You can do df.to_html(), but without making links inside cells.
  3. This output isn’t interactive in the PyCharm Python console (because it’s not QT).

How can I overcome these shortcomings and make IPython output a bit more interactive?


回答 0

这似乎为我工作:

from IPython.core.display import display, HTML
display(HTML('<h1>Hello, world!</h1>'))

诀窍是也将其包装在“显示”中。

来源:http : //python.6.x6.nabble.com/Printing-HTML-within-IPython-Notebook-IPython-specific-prettyprint-tp5016624p5016631.html

This seems to work for me:

from IPython.core.display import display, HTML
display(HTML('<h1>Hello, world!</h1>'))

The trick is to wrap it in “display” as well.

Source: http://python.6.x6.nabble.com/Printing-HTML-within-IPython-Notebook-IPython-specific-prettyprint-tp5016624p5016631.html


回答 1

一段时间以前,Jupyter Notebooks开始从HTML内容中剥离JavaScript [ #3118 ]。这是两个解决方案:

提供本地HTML

如果要立即在页面上嵌入带有JavaScript的HTML页面,最简单的方法是使用笔记本将HTML文件保存到目录中,然后按以下方式加载HTML:

from IPython.display import IFrame

IFrame(src='./nice.html', width=700, height=600)

提供远程HTML

如果您喜欢托管解决方案,则可以在S3中将HTML页面上传到Amazon Web Services“存储桶”,更改该存储桶上的设置,以使该存储桶托管静态网站,然后在笔记本中使用Iframe组件:

from IPython.display import IFrame

IFrame(src='https://s3.amazonaws.com/duhaime/blog/visualizations/isolation-forests.html', width=700, height=600)

就像在其他任何网页上一样,这将在iframe中呈现HTML内容和JavaScript:

<iframe src='https://s3.amazonaws.com/duhaime/blog/visualizations/isolation-forests.html', width=700, height=600></iframe>

Some time ago Jupyter Notebooks started stripping JavaScript from HTML content [#3118]. Here are two solutions:

Serving Local HTML

If you want to embed an HTML page with JavaScript on your page now, the easiest thing to do is to save your HTML file to the directory with your notebook and then load the HTML as follows:

from IPython.display import IFrame

IFrame(src='./nice.html', width=700, height=600)

Serving Remote HTML

If you prefer a hosted solution, you can upload your HTML page to an Amazon Web Services “bucket” in S3, change the settings on that bucket so as to make the bucket host a static website, then use an Iframe component in your notebook:

from IPython.display import IFrame

IFrame(src='https://s3.amazonaws.com/duhaime/blog/visualizations/isolation-forests.html', width=700, height=600)

This will render your HTML content and JavaScript in an iframe, just like you can on any other web page:

<iframe src='https://s3.amazonaws.com/duhaime/blog/visualizations/isolation-forests.html', width=700, height=600></iframe>

回答 2

相关:在构造类时,def _repr_html_(self): ...可用于创建其实例的自定义HTML表示形式:

class Foo:
    def _repr_html_(self):
        return "Hello <b>World</b>!"

o = Foo()
o

将呈现为:

世界你好!

有关更多信息,请参阅IPython的文档

一个高级示例:

from html import escape # Python 3 only :-)

class Todo:
    def __init__(self):
        self.items = []

    def add(self, text, completed):
        self.items.append({'text': text, 'completed': completed})

    def _repr_html_(self):
        return "<ol>{}</ol>".format("".join("<li>{} {}</li>".format(
            "☑" if item['completed'] else "☐",
            escape(item['text'])
        ) for item in self.items))

my_todo = Todo()
my_todo.add("Buy milk", False)
my_todo.add("Do homework", False)
my_todo.add("Play video games", True)

my_todo

将呈现:

  1. ☐购买牛奶
  2. ☐做作业
  3. ☑玩电子游戏

Related: While constructing a class, def _repr_html_(self): ... can be used to create a custom HTML representation of its instances:

class Foo:
    def _repr_html_(self):
        return "Hello <b>World</b>!"

o = Foo()
o

will render as:

Hello World!

For more info refer to IPython’s docs.

An advanced example:

from html import escape # Python 3 only :-)

class Todo:
    def __init__(self):
        self.items = []

    def add(self, text, completed):
        self.items.append({'text': text, 'completed': completed})

    def _repr_html_(self):
        return "<ol>{}</ol>".format("".join("<li>{} {}</li>".format(
            "☑" if item['completed'] else "☐",
            escape(item['text'])
        ) for item in self.items))

my_todo = Todo()
my_todo.add("Buy milk", False)
my_todo.add("Do homework", False)
my_todo.add("Play video games", True)

my_todo

Will render:

  1. ☐ Buy milk
  2. ☐ Do homework
  3. ☑ Play video games

回答 3

在上面的@Harmon上展开,看来您可以将displayand print语句组合在一起……如果需要的话。或者,将整个HTML格式化为一个字符串然后使用显示可能会更容易。无论哪种方式,不错的功能。

display(HTML('<h1>Hello, world!</h1>'))
print("Here's a link:")
display(HTML("<a href='http://www.google.com' target='_blank'>www.google.com</a>"))
print("some more printed text ...")
display(HTML('<p>Paragraph text here ...</p>'))

输出如下所示:


你好,世界!

这里是一个链接:

www.google.com

一些更多的印刷文字…

此处的段落文字…


Expanding on @Harmon above, looks like you can combine the display and print statements together … if you need. Or, maybe it’s easier to just format your entire HTML as one string and then use display. Either way, nice feature.

display(HTML('<h1>Hello, world!</h1>'))
print("Here's a link:")
display(HTML("<a href='http://www.google.com' target='_blank'>www.google.com</a>"))
print("some more printed text ...")
display(HTML('<p>Paragraph text here ...</p>'))

Outputs something like this:


Hello, world!

Here’s a link:

www.google.com

some more printed text …

Paragraph text here …