EDIT: If you also want to escape non-ASCII characters, for inclusion in a document that uses a different encoding, as Craig says, just use:
data.encode('ascii', 'xmlcharrefreplace')
Don’t forget to decode data to unicode first, using whatever encoding it was encoded in.
However, in my experience that kind of encoding is useless if you just work with unicode all the time from the start. Just encode at the end to the encoding specified in the document header (UTF-8 for maximum compatibility).
Also worth noting (thanks Greg) is the extra quote parameter cgi.escape takes. With it set to True, cgi.escape also escapes double quote chars (") so you can use the resulting value in an XML/HTML attribute.
EDIT: Note that cgi.escape was deprecated in Python 3.2 in favor of html.escape, which does the same except that quote defaults to True (in which case it also escapes single quotes). cgi.escape was removed entirely in Python 3.8.
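To make the effect of the quote parameter concrete, here is a minimal sketch using html.escape on Python 3. The sample string is made up for illustration.

```python
import html

# Made-up sample value that contains every character html.escape touches.
attr_value = 'Tom & "Jerry" <3'

# quote=True is the default: & < > " and ' are all escaped,
# so the result is safe inside an attribute value.
with_quotes = html.escape(attr_value)

# quote=False leaves quotation marks alone, like cgi.escape did by default.
without_quotes = html.escape(attr_value, quote=False)

print(with_quotes)     # Tom &amp; &quot;Jerry&quot; &lt;3
print(without_quotes)  # Tom &amp; "Jerry" &lt;3
```

If the value is only ever placed in element content, quote=False is enough; for attribute values keep the default.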
This is probably NOT what the OP wanted (the question doesn’t clearly indicate in which context the escaping is meant to be used), but Python’s standard library module urllib has a function that percent-encodes characters, including the HTML special characters, so they can be included safely in a URL.
The following is an example:
#!/usr/bin/python
from urllib.parse import quote  # on Python 2: from urllib import quote
x = '+<>^&'
print(quote(x))  # prints %2B%3C%3E%5E%26
cgi.escape should be good to escape HTML in the limited sense of escaping the HTML tags and character entities.
But you might have to also consider encoding issues: if the HTML you want to quote has non-ASCII characters in a particular encoding, then you would also have to take care that you represent those sensibly when quoting. Perhaps you could convert them to entities. Otherwise you should ensure that the correct encoding translations are done between the “source” HTML and the page it’s embedded in, to avoid corrupting the non-ASCII characters.
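One way to "convert them to entities" is to escape the markup characters first and then let the xmlcharrefreplace error handler turn everything outside ASCII into numeric character references. This is only a sketch for Python 3; the helper name to_ascii_html is hypothetical, not part of any library.

```python
import html

def to_ascii_html(text):
    """Escape markup characters, then replace non-ASCII characters
    with numeric character references (hypothetical helper)."""
    escaped = html.escape(text, quote=True)
    # The result is pure ASCII, so it survives whatever encoding
    # the surrounding page uses.
    return escaped.encode('ascii', 'xmlcharrefreplace').decode('ascii')

print(to_ascii_html('Se\u00f1or <b> & "co"'))
# Se&#241;or &lt;b&gt; &amp; &quot;co&quot;
```

The order matters: escaping after the encode step would turn the & of each numeric reference into &amp;.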
Not the easiest way, but still straightforward. The main difference from cgi.escape is that it still works properly if you already have &amp; in your text. As you can see from the comments in it:
cgi.escape version
def escape(s, quote=None):
    '''Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
    is also translated.'''
    s = s.replace("&", "&amp;") # Must be done first!
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s
regex version
import re

QUOTE_PATTERN = r"""([&<>"'])(?!(amp|lt|gt|quot|#39);)"""

def escape(word):
    """
    Replaces special characters <>&"' to HTML-safe sequences.
    With attention to already escaped characters.
    """
    replace_with = {
        '<': '&lt;',
        '>': '&gt;',
        '&': '&amp;',
        '"': '&quot;', # should be escaped in attributes
        "'": '&#39;' # should be escaped in attributes
    }
    quote_pattern = re.compile(QUOTE_PATTERN)
    return re.sub(quote_pattern, lambda x: replace_with[x.group(0)], word)
Use () in regexp and group(1) in python to retrieve the captured string (re.search will return None if it doesn’t find the result, so don’t use group() directly):
title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)
if title_search:
    title = title_search.group(1)
Note that starting with Python 3.8 and the introduction of assignment expressions (PEP 572) (the := operator), it’s possible to improve a bit on Krzysztof Krasoń’s solution by capturing the match result directly within the if condition as a variable and reusing it in the condition’s body:
# pattern = '<title>(.*)</title>'
# text = '<title>hello</title>'
if match := re.search(pattern, text, re.IGNORECASE):
    title = match.group(1)
# hello
Answer 2
Try using a capture group:
title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)
import re
pattern = re.compile(r'<title>([^<]*)</title>', re.MULTILINE|re.IGNORECASE)
pattern.search(text)
… assuming that your text (HTML) is in a variable named “text.”
This also assumes that there are not other HTML tags which can be legally embedded inside of an HTML TITLE tag and no way to legally embed any other < character within such a container/block.
However …
Don’t use regular expressions for HTML parsing in Python. Use an HTML parser! (Unless you’re going to write a full parser, which would be a lot of extra work when various HTML, SGML and XML parsers are already in the standard libraries.)
If you’re handling “real world” tag soup HTML (which frequently does not conform to any SGML/XML validator) then use the BeautifulSoup package. It isn’t in the standard libraries (yet) but is widely recommended for this purpose.
Another option is lxml, which is written for properly structured (standards-conformant) HTML. But it has an option to fall back to using BeautifulSoup as a parser: ElementSoup.
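For completeness, here is a rough sketch of the parser-based approach using only html.parser from the standard library (Python 3). The class and function names are made up for this example; BeautifulSoup or lxml would be shorter, but they are third-party packages.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip() if self._parts else None


def extract_title(html_text):
    """Return the page title, or None if there is no <title> text."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


print(extract_title("<html><head><TITLE>hello</TITLE></head><body></body></html>"))
```

Unlike the regular expression, this does not care about attributes on the tag or about line breaks inside the title.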
I have a Django form with a RegexField, which is very similar to a normal text input field.
In my view, under certain conditions I want to hide it from the user while keeping the form as similar as possible. What’s the best way to turn this field into a HiddenInput field?
Firstly, if you don’t want the user to modify the data, then it seems cleaner to simply exclude the field. Including it as a hidden field just adds more data to send over the wire and invites a malicious user to modify it when you don’t want them to. If you do have a good reason to include the field but hide it, you can pass a keyword arg to the modelform’s constructor. Something like this perhaps:
class MyModelForm(forms.ModelForm):
    class Meta:
        model = MyModel

    def __init__(self, *args, **kwargs):
        from django.forms.widgets import HiddenInput
        hide_condition = kwargs.pop('hide_condition', None)
        super(MyModelForm, self).__init__(*args, **kwargs)
        if hide_condition:
            self.fields['fieldname'].widget = HiddenInput()
            # or alternately: del self.fields['fieldname'] to remove it from the form altogether.
Then in your view:
form = MyModelForm(hide_condition=True)
I prefer this approach to modifying the modelform’s internals in the view, but it’s a matter of taste.
I want to get the content from the below website. If I use a browser like Firefox or Chrome I could get the real website page I want, but if I use the Python requests package (or wget command) to get it, it returns a totally different HTML page. I thought the developer of the website had made some blocks for this, so the question is:
How do I fake a browser visit by using python requests or command wget?
from fake_useragent import UserAgent
import requests

ua = UserAgent()
print(ua.chrome)
header = {'User-Agent': str(ua.chrome)}
print(header)
url = "https://www.hybrid-analysis.com/recent-submissions?filter=file&sort=^timestamp"
htmlContent = requests.get(url, headers=header)
print(htmlContent)
Output:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17
{'User-Agent': 'Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
<Response [200]>
The root of the answer is that the person asking the question needs a JavaScript interpreter to get what they are after. What I have found is that I can often get all of the information I want from a website as JSON, before it is interpreted by JavaScript. This has saved me a ton of time that would otherwise go into parsing HTML and hoping each page is in the same format.
So when you get a response from a website using requests, really look at the HTML/text, because you might find the JavaScript’s JSON in the footer, ready to be parsed.
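As a rough illustration of that idea, the sketch below pulls a JSON blob out of a script tag and parses it. The page source, the tag attributes and the helper name are all made up; real sites embed their data in different ways, so you have to inspect the response text first. In practice page_source would be the .text of the requests response.

```python
import json
import re

# Hypothetical page source, standing in for requests.get(url).text.
SAMPLE_HTML = """
<html><body>
<div id="app"></div>
<script id="page-data" type="application/json">
{"items": [{"name": "first"}, {"name": "second"}], "total": 2}
</script>
</body></html>
"""

def extract_embedded_json(page_source):
    """Return the parsed JSON from the first application/json script tag, or None."""
    match = re.search(
        r'<script[^>]*type="application/json"[^>]*>(.*?)</script>',
        page_source,
        re.DOTALL | re.IGNORECASE,
    )
    if match is None:
        return None
    return json.loads(match.group(1))

data = extract_embedded_json(SAMPLE_HTML)
print(data["total"])  # 2
print([item["name"] for item in data["items"]])  # ['first', 'second']
```

A regular expression is tolerable here only because the target is one well-delimited script block; for anything more structural an HTML parser is the safer tool.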
Download it and install it as usual with python setup.py install
You will also need to install the following modules: xhtml2pdf, html5lib, pypdf with easy_install.
Here is a usage example:
First define this function:
import cStringIO as StringIO
from xhtml2pdf import pisa
from django.template.loader import get_template
from django.template import Context
from django.http import HttpResponse
from cgi import escape

def render_to_pdf(template_src, context_dict):
    template = get_template(template_src)
    context = Context(context_dict)
    html = template.render(context)
    result = StringIO.StringIO()
    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("ISO-8859-1")), result)
    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type='application/pdf')
    return HttpResponse('We had some errors<pre>%s</pre>' % escape(html))
Then you can use it like this:
def myview(request):
    #Retrieve data or whatever you need
    return render_to_pdf(
        'mytemplate.html',
        {
            'pagesize':'A4',
            'mylist': results,
        }
    )
I just whipped this up for CBV. Not used in production but generates a PDF for me. Probably needs work for the error reporting side of things but does the trick so far.
import StringIO
from cgi import escape
from xhtml2pdf import pisa
from django.http import HttpResponse
from django.template.response import TemplateResponse
from django.views.generic import TemplateView

class PDFTemplateResponse(TemplateResponse):

    def generate_pdf(self, retval):
        html = self.content
        result = StringIO.StringIO()
        rendering = pisa.pisaDocument(StringIO.StringIO(html.encode("ISO-8859-1")), result)
        if rendering.err:
            return HttpResponse('We had some errors<pre>%s</pre>' % escape(html))
        else:
            self.content = result.getvalue()

    def __init__(self, *args, **kwargs):
        super(PDFTemplateResponse, self).__init__(*args, mimetype='application/pdf', **kwargs)
        self.add_post_render_callback(self.generate_pdf)


class PDFTemplateView(TemplateView):
    response_class = PDFTemplateResponse
Used like:
class MyPdfView(PDFTemplateView):
    template_name = 'things/pdf.html'
# views.py
from django_xhtml2pdf.views import PdfMixin

class GroupPDFGenerate(PdfMixin, DetailView):
    model = PeerGroupSignIn
    template_name = 'groups/pdf.html'

# templates/groups/pdf.html
<html>
<head>
<style>
@page { your xhtml2pdf pisa PDF parameters }
</style>
</head>
<body>
<div id="header_content"> (this is defined in the style section)
<h1>{{ peergroupsignin.this_group_title }}</h1>
...
Use the model name you defined in your view, in all lowercase, when populating the template fields. Because it’s a GCBV, you can just call it with .as_view() in your urls.py:
# urls.py (using url namespaces defined in the main urls.py file)
url(
    regex=r"^(?P<pk>\d+)/generate_pdf/$",
    view=views.GroupPDFGenerate.as_view(),
    name="generate_pdf",
),
Here is an example of the invocation implementation:
from django.db import models
import requests
import sys
from xml.etree import ElementTree
import logging

# module logger definition
logger = logging.getLogger(__name__)

# Create your models here.
class JasperServerClient(models.Manager):

    def __handle_exception(self, exception_root, exception_id, exec_info):
        type, value, traceback = exec_info
        raise JasperServerClientError(exception_root, exception_id), None, traceback

    # 01: REPORT-METADATA
    # get resource description to generate the report
    def __handle_report_metadata(self, rep_resourcepath):
        l_path_base_resource = "/rest/resource"
        l_path = self.j_url + l_path_base_resource
        logger.info("metadata (begin) [path=%s%s]" % (l_path, rep_resourcepath))
        resource_response = None
        try:
            resource_response = requests.get("%s%s" % (l_path, rep_resourcepath), cookies=self.login_response.cookies)
        except Exception, e:
            self.__handle_exception(e, "REPORT_METADATA:CALL_ERROR", sys.exc_info())
        resource_response_dom = None
        try:
            # parse to dom and set parameters
            logger.debug(" - response [data=%s]" % (resource_response.text))
            resource_response_dom = ElementTree.fromstring(resource_response.text)
            datum = ""
            for node in resource_response_dom.getiterator():
                datum = "%s<br />%s - %s" % (datum, node.tag, node.text)
            logger.debug(" - response [xml=%s]" % (datum))
            #
            self.resource_response_payload = resource_response.text
            logger.info("metadata (end) ")
        except Exception, e:
            logger.error("metadata (error) [%s]" % (e))
            self.__handle_exception(e, "REPORT_METADATA:PARSE_ERROR", sys.exc_info())

    # 02: REPORT-PARAMS
    def __add_report_params(self, metadata_text, params):
        if(type(params) != dict):
            raise TypeError("Invalid parameters to report")
        else:
            logger.info("add-params (begin) []")
            #copy parameters
            l_params = {}
            for k, v in params.items():
                l_params[k] = v
            # get the payload metadata
            metadata_dom = ElementTree.fromstring(metadata_text)
            # add attributes to payload metadata
            root = metadata_dom #('report'):
            for k, v in l_params.items():
                param_dom_element = ElementTree.Element('parameter')
                param_dom_element.attrib["name"] = k
                param_dom_element.text = v
                root.append(param_dom_element)
            #
            metadata_modified_text = ElementTree.tostring(metadata_dom, encoding='utf8', method='xml')
            logger.info("add-params (end) [payload-xml=%s]" % (metadata_modified_text))
            return metadata_modified_text

    # 03: REPORT-REQUEST-CALL
    # call to generate the report
    def __handle_report_request(self, rep_resourcepath, rep_format, rep_params):
        # add parameters
        self.resource_response_payload = self.__add_report_params(self.resource_response_payload, rep_params)
        # send report request
        l_path_base_genreport = "/rest/report"
        l_path = self.j_url + l_path_base_genreport
        logger.info("report-request (begin) [path=%s%s]" % (l_path, rep_resourcepath))
        genreport_response = None
        try:
            genreport_response = requests.put("%s%s?RUN_OUTPUT_FORMAT=%s" % (l_path, rep_resourcepath, rep_format), data=self.resource_response_payload, cookies=self.login_response.cookies)
            logger.info(" - send-operation-result [value=%s]" % (genreport_response.text))
        except Exception, e:
            self.__handle_exception(e, "REPORT_REQUEST:CALL_ERROR", sys.exc_info())
        # parse the uuid of the requested report
        genreport_response_dom = None
        try:
            genreport_response_dom = ElementTree.fromstring(genreport_response.text)
            for node in genreport_response_dom.findall("uuid"):
                datum = "%s" % (node.text)
                genreport_uuid = datum
            for node in genreport_response_dom.findall("file/[@type]"):
                datum = "%s" % (node.text)
                genreport_mime = datum
            logger.info("report-request (end) [uuid=%s,mime=%s]" % (genreport_uuid, genreport_mime))
            return [genreport_uuid, genreport_mime]
        except Exception, e:
            self.__handle_exception(e, "REPORT_REQUEST:PARSE_ERROR", sys.exc_info())

    # 04: REPORT-RETRIEVE RESULTS
    def __handle_report_reply(self, genreport_uuid):
        l_path_base_getresult = "/rest/report"
        l_path = self.j_url + l_path_base_getresult
        logger.info("report-reply (begin) [uuid=%s,path=%s]" % (genreport_uuid, l_path))
        getresult_response = requests.get("%s%s/%s?file=report" % (self.j_url, l_path_base_getresult, genreport_uuid), data=self.resource_response_payload, cookies=self.login_response.cookies)
        l_result_header_mime = getresult_response.headers['Content-Type']
        logger.info("report-reply (end) [uuid=%s,mime=%s]" % (genreport_uuid, l_result_header_mime))
        return [l_result_header_mime, getresult_response.content]

    # public methods ---------------------------------------

    # tries the authentication with jasperserver through rest
    def login(self, j_url, j_user, j_pass):
        self.j_url = j_url
        l_path_base_auth = "/rest/login"
        l_path = self.j_url + l_path_base_auth
        logger.info("login (begin) [path=%s]" % (l_path))
        try:
            self.login_response = requests.post(l_path, params={
                'j_username': j_user,
                'j_password': j_pass
            })
            if(requests.codes.ok != self.login_response.status_code):
                self.login_response.raise_for_status()
            logger.info("login (end)")
            return True
            # see http://blog.ianbicking.org/2007/09/12/re-raising-exceptions/
        except Exception, e:
            logger.error("login (error) [e=%s]" % e)
            self.__handle_exception(e, "LOGIN:CALL_ERROR", sys.exc_info())
            #raise

    def generate_report(self, rep_resourcepath, rep_format, rep_params):
        self.__handle_report_metadata(rep_resourcepath)
        [uuid, mime] = self.__handle_report_request(rep_resourcepath, rep_format, rep_params)
        # TODO: how to handle async?
        [out_mime, out_data] = self.__handle_report_reply(uuid)
        return [uuid, out_mime, out_data]

    @staticmethod
    def create_client(j_url, j_user, j_pass):
        client = JasperServerClient()
        login_res = client.login(j_url, j_user, j_pass)
        return client


class JasperServerClientError(Exception):

    def __init__(self, exception_root, reason_id, reason_message=None):
        super(JasperServerClientError, self).__init__(str(reason_message))
        self.code = reason_id
        self.description = str(exception_root) + " " + str(reason_message)

    def __str__(self):
        return self.code + " " + self.description
You can use the iReport editor to define the layout and publish the report to the JasperReports server. After publishing, you can invoke the REST API to get the results.
Here is the test of the functionality:
from django.test import TestCase
from x_reports_jasper.models import JasperServerClient

"""
to try integration with jasper server through rest
"""
class TestJasperServerClient(TestCase):

    # define required objects for tests
    def setUp(self):
        # load the connection to remote server
        try:
            self.j_url = "http://127.0.0.1:8080/jasperserver"
            self.j_user = "jasperadmin"
            self.j_pass = "jasperadmin"
            self.client = JasperServerClient.create_client(self.j_url, self.j_user, self.j_pass)
        except Exception, e:
            # if errors could not execute test given prerequisites
            raise

    # test exception when server data is invalid
    def test_login_to_invalid_address_should_raise(self):
        self.assertRaises(Exception, JasperServerClient.create_client, "http://127.0.0.1:9090/jasperserver", self.j_user, self.j_pass)

    # test execute existent report in server
    def test_get_report(self):
        r_resource_path = "/reports/<PathToPublishedReport>"
        r_format = "pdf"
        r_params = {'PARAM_TO_REPORT': "1",}
        #resource_meta = client.load_resource_metadata( rep_resource_path )
        [uuid, out_mime, out_data] = self.client.generate_report(r_resource_path, r_format, r_params)
        self.assertIsNotNone(uuid)
Here is the code I use to generate the PDF from an HTML template:
import os
from weasyprint import HTML
from django.template import Template, Context
from django.http import HttpResponse

def generate_pdf(self, report_id):
    # Render HTML into memory and get the template firstly
    template_file_loc = os.path.join(os.path.dirname(__file__), os.pardir, 'templates', 'the_template_pdf_generator.html')
    template_contents = read_all_as_str(template_file_loc)
    render_template = Template(template_contents)
    #rendering_map is the dict for params in the template
    render_definition = Context(rendering_map)
    render_output = render_template.render(render_definition)
    # Using Rendered HTML to generate PDF
    response = HttpResponse(content_type='application/pdf')
    response['Content-Disposition'] = 'attachment; filename=%s-%s-%s.pdf' % \
        ('topic-test', 'topic-test', '2018-05-04')
    # Generate PDF
    pdf_doc = HTML(string=render_output).render()
    # Make PDF file as single page file
    pdf_doc.pages[0].height = pdf_doc.pages[0]._page_box.children[0].children[0].height
    pdf_doc.write_pdf(response)
    return response

def read_all_as_str(self, file_loc, read_method='r'):
    if file_exists(file_loc):
        handler = open(file_loc, read_method)
        contents = handler.read()
        handler.close()
        return contents
    else:
        return 'file not exist'
None of my styles are being applied though. Does it have something to do with the fact that the html is a template I am rendering? The python looks like this.
I know this much is working, because I am still able to render the template. However, when I tried to move my styling code from a “style” block within the html’s “head” tag to an external file, all the styling went away, leaving a bare html page. Anyone see any errors with my file structure?
You need to have a ‘static’ folder setup (for css/js files) unless you specifically override it during Flask initialization. I am assuming you did not override it.
I have read multiple threads, and none of them fixed the issue that people describe and that I have experienced too.
I have even tried moving away from conda to pip and upgrading to Python 3.7. I tried all the code proposed and none of it fixed the problem.
And here is why (the problem):
By default, Python/Flask searches for the static and template folders in a folder structure like:
/Users/username/folder_one/folder_two/ProjectName/src/app_name/<static>
and
/Users/username/folder_one/folder_two/ProjectName/src/app_name/<template>
You can verify this yourself using the debugger in PyCharm (or anything else): check the values on the app (app = Flask(__name__)) and look for template_folder and static_folder.
To fix this, you have to specify the values when creating the app, something like this:
import os
from flask import Flask

TEMPLATE_DIR = os.path.abspath('../templates')
STATIC_DIR = os.path.abspath('../static')
# app = Flask(__name__) # to make the app run without any
app = Flask(__name__, template_folder=TEMPLATE_DIR, static_folder=STATIC_DIR)
The paths TEMPLATE_DIR and STATIC_DIR depend on where the app file is located. In my case, it was located within a folder under src.
You can change the template and static folders as you wish and register them in app = Flask…
In truth, I started experiencing the problem when messing around with folders; at times it worked and at times it did not. This fixes the problem once and for all.
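One caveat: os.path.abspath('../templates') is resolved against the current working directory, not against the file that creates the app, which may explain the "at times worked, at times not" behaviour. A minimal sketch that anchors the folders to the app file instead is shown below. The helper name and the example path are hypothetical, and it assumes the layout described above (app module in a folder under src, with templates and static directly under src).

```python
from pathlib import Path

def project_dirs(app_file):
    """Return (template_dir, static_dir) one level above the app's own folder.

    Hypothetical helper; assumes src/app_name/<app file>, src/templates, src/static.
    """
    app_dir = Path(app_file).parent   # .../src/app_name
    base_dir = app_dir.parent         # .../src
    return base_dir / "templates", base_dir / "static"

# Made-up location of the app module, for illustration only.
template_dir, static_dir = project_dirs("/srv/project/src/app_name/app.py")
print(template_dir)
print(static_dir)
```

Inside the real app module you would call project_dirs(Path(__file__).resolve()) and pass str(template_dir) and str(static_dir) as template_folder and static_folder when constructing Flask, so the result no longer depends on where the process was started.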
Still having problems after following the solution provided by codegeek: <link rel= "stylesheet" type= "text/css" href= "{{ url_for('static',filename='styles/mainpage.css') }}"> ?
In Google Chrome pressing the reload button (F5) will not reload the static files. If you have followed the accepted solution but still don’t see the changes you have made to CSS, then press ctrl + shift + R to ignore cached files and reload the static files.
In Firefox pressing the reload button appears to reload the static files.
In Edge pressing the refresh button does not reload the static file. Pressing ctrl + shift + R is supposed to ignore cached files and reload the static files. However this does not work on my computer.
A Flask project is structured differently. The project structure you describe in the question is fine; the only problem is with the styles folder. The styles folder must be inside the static folder.
If none of the above methods works and your code is correct, then try a hard refresh by pressing Ctrl + F5. It clears the cache and then reloads the files. It worked for me.
However, the WebView just displays them as literal strings. Here is the result:
Edit: I have added the original string returned from the server side:
“<!DOCTYPE html> <html
lang="en"> <head> <meta
charset="utf-8"> <meta
http-equiv="X-UA-Compatible"
content="IE=edge"> <meta
name="viewport"
content="width=device-width,
initial-scale=1.0"> <meta
name="description"
content="">
<title>Saulify</title> <!– All the
Favicons… –> <link rel="shortcut
icon"
href="/static/favicon/favicon.ico">
<link rel="apple-touch-icon"
sizes="57×57"
href="/static/favicon/apple-touch-icon-57×57.png">
<link rel="apple-touch-icon"
sizes="114×114"
href="/static/favicon/apple-touch-icon-114×114.png">
<link rel="apple-touch-icon"
sizes="72×72"
href="/static/favicon/apple-touch-icon-72×72.png">
<link rel="apple-touch-icon"
sizes="144×144"
href="/static/favicon/apple-touch-icon-144×144.png">
<link rel="apple-touch-icon"
sizes="60×60"
href="/static/favicon/apple-touch-icon-60×60.png">
<link rel="apple-touch-icon"
sizes="120×120"
href="/static/favicon/apple-touch-icon-120×120.png">
<link rel="apple-touch-icon"
sizes="76×76"
href="/static/favicon/apple-touch-icon-76×76.png">
<link rel="apple-touch-icon"
sizes="152×152"
href="/static/favicon/apple-touch-icon-152×152.png">
<link rel="apple-touch-icon"
sizes="180×180"
href="/static/favicon/apple-touch-icon-180×180.png">
<link rel="icon"
type="image/png"
href="/static/favicon/favicon-192×192.png"
sizes="192×192"> <link
rel="icon" type="image/png"
href="/static/favicon/favicon-160×160.png"
sizes="160×160"> <link
rel="icon" type="image/png"
href="/static/favicon/favicon-96×96.png"
sizes="96×96"> <link
rel="icon" type="image/png"
href="/static/favicon/favicon-16×16.png"
sizes="16×16"> <link
rel="icon" type="image/png"
href="/static/favicon/favicon-32×32.png"
sizes="32×32"> <meta
name="msapplication-TileColor"
content="#da532c"> <meta
name="msapplication-TileImage"
content="/static/favicon/mstile-144×144.png">
<meta name="msapplication-config"
content="/static/favicon/browserconfig.xml">
<!– External CSS –> <link
rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css">
<!– External Fonts –> <link
href="//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css"
rel="stylesheet"> <link
href='//fonts.googleapis.com/css?family=Open+Sans:300,600'
rel='stylesheet'
type='text/css'> <link
href='//fonts.googleapis.com/css?family=Lora:400,700'
rel='stylesheet'
type='text/css'> <!–[if lt IE
9]> <script
src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.2/html5shiv.min.js"></script>
<script
src="//cdnjs.cloudflare.com/ajax/libs/respond.js/1.4.2/respond.min.js"></script>
<![endif]–> <!– Site CSS –>
<link rel="stylesheet"
type="text/css"
href="/static/css/style.css"> <link
rel="stylesheet" type="text/css"
href="/static/css/glyphicon.css">
</head> <body> <div
class="container article-page"> <div
class="row"> <div
class="col-md-8 col-md-offset-2">
<h2><a
href="http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html">Gov.
Jerry Brown Says Ted Cruz Is &#39;Absolutely
Unfit&#39; To Run For Office Because Of Climate Change
Views</a></h2> <h4>Sam
Levine</h4> <div
class="article"> <p>California
Gov. Jerry Brown (D) said on Sunday that Texas Sen. Ted Cruz (R-Texas)
is "absolutely unfit to be running for office"
because of his position on climate change.</p>
<p>"I just came back from New Hampshire, where
there's snow and ice everywhere. My view on this is simple:
Debates on this should follow science and should follow data, and many
of the alarmists on global warming, they have a problem because the
science doesn't back them up," Cruz <a
href="https://www.youtube.com/watch?v=m0UJ_Sc0Udk">said</a>
on "Late Night with Seth Meyers" last
week.</p> <p>To back up his claim, Cruz
cited satellite data that has shown a lack of significant warming over
the last 17 years. But Cruz's reasoning <a
href="http://www.politifact.com/truth-o-meter/statements/2015/mar/20
/ted-cruz/ted-cruzs-worlds-fire-not-last-17-years/">has
been debunked by Politifact</a>, which has shown that
scientists have ample evidence to believe that the climate will
continue to warm.</p> <p>"What he
said is absolutely false,” Brown said on <a
href="http://www.nbcnews.com/meet-the-press/california-governor-ted-cruz-
unfit-be-running-n328046">NBC's
"Meet the Press."</a> He added that
<a
href="http://climate.nasa.gov/scientific-consensus/">over
90 percent</a> of scientists who study the climate agree
that climate change is caused by human activity. "That man
betokens such a level of ignorance and a direct falsification of
existing scientific data. It's shocking, and I think that man
has rendered himself absolutely unfit to be running for
office," Brown said.</p> <p>Brown
added that climate change has <a
href="http://www.huffingtonpost.com/2015/03/06/california-drought-february-
record_n_6820704.html?utm_hp_ref=california-drought">caused
droughts in his state</a>, as well as severe cold and
storms on the east coast.</p> <p>While
Cruz may have seen snow and ice everywhere in New Hampshire, data
shows that the country is actually experiencing a <a
href="http://www.huffingtonpost.com/2015/02/19/cold-weather-
winter_n_6713104.html">warmer than
average</a> winter.</p>
<p>Brown’s criticism of Cruz comes one day before the
Texas senator is set to announce a <a
href="http://www.huffingtonpost.com/2015/03/22
/ted-cruz-2016_n_6917824.html">presidential
campaign</a>. </p> </div>
<div class="original"> <a
href="http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html">VIEW
ORIGINAL</a> </div> </div>
</div> </div> <script
src="//code.jquery.com/jquery-latest.js"></script>
<script
src="/static/js/modal.js"></script>
<script
src="/static/js/bootbox.min.js"></script>
<script
src="/static/js/site.js"></script> <script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new
Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-56257533-1',
'auto'); ga('send',
'pageview'); </script>
</body> </html>”
Answer 0
I modified the code here:
public class test extends Activity {
private WebView wv;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.test);
wv = (WebView) findViewById(R.id.wv);
String s = "<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content=""> <title>Saulify</title> <!-- All the Favicons... --> <link rel="shortcut icon" href="/static/favicon/favicon.ico"> <link rel="apple-touch-icon" sizes="57x57" href="/static/favicon/apple-touch-icon-57x57.png"> <link rel="apple-touch-icon" sizes="114x114" href="/static/favicon/apple-touch-icon-114x114.png"> <link rel="apple-touch-icon" sizes="72x72" href="/static/favicon/apple-touch-icon-72x72.png"> <link rel="apple-touch-icon" sizes="144x144" href="/static/favicon/apple-touch-icon-144x144.png"> <link rel="apple-touch-icon" sizes="60x60" href="/static/favicon/apple-touch-icon-60x60.png"> <link rel="apple-touch-icon" sizes="120x120" href="/static/favicon/apple-touch-icon-120x120.png"> <link rel="apple-touch-icon" sizes="76x76" href="/static/favicon/apple-touch-icon-76x76.png"> <link rel="apple-touch-icon" sizes="152x152" href="/static/favicon/apple-touch-icon-152x152.png"> <link rel="apple-touch-icon" sizes="180x180" href="/static/favicon/apple-touch-icon-180x180.png"> <link rel="icon" type="image/png" href="/static/favicon/favicon-192x192.png" sizes="192x192"> <link rel="icon" type="image/png" href="/static/favicon/favicon-160x160.png" sizes="160x160"> <link rel="icon" type="image/png" href="/static/favicon/favicon-96x96.png" sizes="96x96"> <link rel="icon" type="image/png" href="/static/favicon/favicon-16x16.png" sizes="16x16"> <link rel="icon" type="image/png" href="/static/favicon/favicon-32x32.png" sizes="32x32"> <meta name="msapplication-TileColor" content="#da532c"> <meta name="msapplication-TileImage" content="/static/favicon/mstile-144x144.png"> <meta name="msapplication-config" content="/static/favicon/browserconfig.xml"> <!-- External CSS --> <link rel="stylesheet" 
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css"> <!-- External Fonts --> <link href="//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet"> <link href='//fonts.googleapis.com/css?family=Open+Sans:300,600' rel='stylesheet' type='text/css'> <link href='//fonts.googleapis.com/css?family=Lora:400,700' rel='stylesheet' type='text/css'> <!--[if lt IE 9]> <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.2/html5shiv.min.js"></script> <script src="//cdnjs.cloudflare.com/ajax/libs/respond.js/1.4.2/respond.min.js"></script> <![endif]--> <!-- Site CSS --> <link rel="stylesheet" type="text/css" href="/static/css/style.css"> <link rel="stylesheet" type="text/css" href="/static/css/glyphicon.css"> </head> <body> <div class="container article-page"> <div class="row"> <div class="col-md-8 col-md-offset-2"> <h2><a href="http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html">Gov. Jerry Brown Says Ted Cruz Is &#39;Absolutely Unfit&#39; To Run For Office Because Of Climate Change Views</a></h2> <h4>Sam Levine</h4> <div class="article"> <p>California Gov. Jerry Brown (D) said on Sunday that Texas Sen. Ted Cruz (R-Texas) is "absolutely unfit to be running for office" because of his position on climate change.</p> <p>"I just came back from New Hampshire, where there's snow and ice everywhere. My view on this is simple: Debates on this should follow science and should follow data, and many of the alarmists on global warming, they have a problem because the science doesn't back them up," Cruz <a href="https://www.youtube.com/watch?v=m0UJ_Sc0Udk">said</a> on "Late Night with Seth Meyers" last week.</p> <p>To back up his claim, Cruz cited satellite data that has shown a lack of significant warming over the last 17 years. 
But Cruz's reasoning <a href="http://www.politifact.com/truth-o-meter/statements/2015/mar/20 /ted-cruz/ted-cruzs-worlds-fire-not-last-17-years/">has been debunked by Politifact</a>, which has shown that scientists have ample evidence to believe that the climate will continue to warm.</p> <p>"What he said is absolutely false,” Brown said on <a href="http://www.nbcnews.com/meet-the-press/california-governor-ted-cruz- unfit-be-running-n328046">NBC's "Meet the Press."</a> He added that <a href="http://climate.nasa.gov/scientific-consensus/">over 90 percent</a> of scientists who study the climate agree that climate change is caused by human activity. "That man betokens such a level of ignorance and a direct falsification of existing scientific data. It's shocking, and I think that man has rendered himself absolutely unfit to be running for office," Brown said.</p> <p>Brown added that climate change has <a href="http://www.huffingtonpost.com/2015/03/06/california-drought-february- record_n_6820704.html?utm_hp_ref=california-drought">caused droughts in his state</a>, as well as severe cold and storms on the east coast.</p> <p>While Cruz may have seen snow and ice everywhere in New Hampshire, data shows that the country is actually experiencing a <a href="http://www.huffingtonpost.com/2015/02/19/cold-weather- winter_n_6713104.html">warmer than average</a> winter.</p> <p>Brown’s criticism of Cruz comes one day before the Texas senator is set to announce a <a href="http://www.huffingtonpost.com/2015/03/22 /ted-cruz-2016_n_6917824.html">presidential campaign</a>. 
</p> </div> <div class="original"> <a href="http://www.huffingtonpost.com/2015/03/22/ted-cruz-climate-change_n_6919002.html">VIEW ORIGINAL</a> </div> </div> </div> </div> <script src="//code.jquery.com/jquery-latest.js"></script> <script src="/static/js/modal.js"></script> <script src="/static/js/bootbox.min.js"></script> <script src="/static/js/site.js"></script> <script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-56257533-1', 'auto'); ga('send', 'pageview'); </script> </body> </html>";
wv.loadData(stripHtml(s), "text/html", "UTF-8");
}
public String stripHtml(String html) {
return Html.fromHtml(html).toString();
}
}
Answer 1
Try this code:
if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.N) {
    yourtextview.setText(Html.fromHtml(yourstring, Html.FROM_HTML_MODE_LEGACY));
} else {
    yourtextview.setText(Html.fromHtml(yourstring));
}
I am trying to highlight exactly what changed between two dataframes.
Suppose I have two Python Pandas dataframes:
"StudentRoster Jan-1":
id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.11                     False                Graduated
113  Zoe    4.12                     True
"StudentRoster Jan-2":
id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.21                     False                Graduated
113  Zoe    4.12                     False                On vacation
My goal is to output an HTML table that:
Identifies rows that have changed (could be int, float, boolean, string)
Outputs rows with same, OLD and NEW values (ideally into an HTML table) so the consumer can clearly see what changed between two dataframes:
"StudentRoster Difference Jan-1 - Jan-2":
id   Name   score                    isEnrolled           Comment
112  Nick   was 1.11| now 1.21       False                Graduated
113  Zoe    4.12                     was True | now False was "" | now "On vacation"
I suppose I could do a row by row and column by column comparison, but is there an easier way?
Answer 0
The first part is similar to Constantine's answer: you can get a boolean Series showing which rows have changed*:
In [21]: ne = (df1 != df2).any(axis=1)
In [22]: ne
Out[22]:
0 False
1 True
2 True
dtype: bool
Then we can see which entries have changed:
In [23]: ne_stacked = (df1 != df2).stack()
In [24]: changed = ne_stacked[ne_stacked]
In [25]: changed.index.names = ['id', 'col']
In [26]: changed
Out[26]:
id col
1 score True
2 isEnrolled True
Comment True
dtype: bool
Here the first level of the index is the row label and the second is the column that has changed.
In [27]: difference_locations = np.where(df1 != df2)
In [28]: changed_from = df1.values[difference_locations]
In [29]: changed_to = df2.values[difference_locations]
In [30]: pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
Out[30]:
from to
id col
1 score 1.11 1.21
2 isEnrolled True False
Comment None On vacation
* Note: it's important that df1 and df2 share the same index here. To overcome this ambiguity, you can ensure you only look at the shared labels using df1.index.intersection(df2.index) (older pandas versions also accepted df1.index & df2.index), but I think I'll leave that as an exercise.
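For readers who want a starting point for that exercise, here is a minimal sketch. The two small frames and the names `shared` and `changed_ids` are made up for illustration and are not part of the original answer; the idea is simply to restrict both frames to their common labels before comparing.

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap.
df1 = pd.DataFrame({"score": [2.17, 1.11, 4.12]}, index=[111, 112, 113])
df2 = pd.DataFrame({"score": [1.21, 4.12, 3.00]}, index=[112, 113, 114])

# Keep only the labels present in both frames, then compare like with like.
shared = df1.index.intersection(df2.index)
ne = (df1.loc[shared] != df2.loc[shared]).any(axis=1)

# Labels of the shared rows where at least one column differs.
changed_ids = list(ne[ne].index)
print(changed_ids)
```

Rows that exist in only one frame (111 and 114 here) are ignored by this sketch; whether they should count as differences depends on the use case.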
Now it's much easier to spot the differences between the frames. But we can go further and use the style property to highlight the cells that are different. We define a custom function to do this, as described in the styling part of the pandas documentation.
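As a rough illustration of such a custom function (not the original author's code), the sketch below builds a frame of CSS strings that marks the differing cells. The name `highlight_diff`, the sample frames, and the yellow background are all assumptions.

```python
import pandas as pd

def highlight_diff(old, new, css="background-color: yellow"):
    """Return a frame of CSS strings: `css` where the frames differ, '' elsewhere."""
    # Missing values on both sides count as "unchanged", since NaN != NaN is True.
    differs = (old != new) & ~(old.isnull() & new.isnull())
    return differs.apply(lambda col: col.map(lambda flag: css if flag else ""))

# Hypothetical before/after frames with identical labels.
old = pd.DataFrame({"score": [2.17, 1.11], "Comment": ["late", None]}, index=[111, 112])
new = pd.DataFrame({"score": [2.17, 1.21], "Comment": ["late", None]}, index=[111, 112])

styles = highlight_diff(old, new)
print(styles)
```

Assuming jinja2 is installed, `new.style.apply(lambda _: styles, axis=None)` should pass these strings to the Styler so that only the changed cells are colored.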
This answer simply extends @Andy Hayden’s, making it resilient to when numeric fields are nan, and wrapping it up into a function.
import pandas as pd
import numpy as np
def diff_pd(df1, df2):
    """Identify differences between two pandas DataFrames"""
    assert (df1.columns == df2.columns).all(), \
        "DataFrame column names are different"
    if any(df1.dtypes != df2.dtypes):
        print("Data types are different, trying to convert")
        df2 = df2.astype(df1.dtypes)
    if df1.equals(df2):
        return None
    else:
        # need to account for np.nan != np.nan returning True
        diff_mask = (df1 != df2) & ~(df1.isnull() & df2.isnull())
        ne_stacked = diff_mask.stack()
        changed = ne_stacked[ne_stacked]
        changed.index.names = ['id', 'col']
        difference_locations = np.where(diff_mask)
        changed_from = df1.values[difference_locations]
        changed_to = df2.values[difference_locations]
        return pd.DataFrame({'from': changed_from, 'to': changed_to},
                            index=changed.index)
So with your data (slightly edited to have a NaN in the score column):
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
DF1 = StringIO("""id Name score isEnrolled Comment
111 Jack 2.17 True "He was late to class"
112 Nick 1.11 False "Graduated"
113 Zoe NaN True " "
""")
DF2 = StringIO("""id Name score isEnrolled Comment
111 Jack 2.17 True "He was late to class"
112 Nick 1.21 False "Graduated"
113 Zoe NaN False "On vacation" """)
df1 = pd.read_table(DF1, sep=r'\s+', index_col='id')
df2 = pd.read_table(DF2, sep=r'\s+', index_col='id')
diff_pd(df1, df2)
Output:
from to
id col
112 score 1.11 1.21
113 isEnrolled True False
Comment On vacation
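Since the question asks for an HTML table, one possible last step is to render the from/to frame with `DataFrame.to_html`. The sketch below rebuilds a small stand-in for the frame shown above by hand (the variable names `diff` and `html_table` are assumptions) so that it runs on its own.

```python
import pandas as pd

# A small stand-in for the frame diff_pd returns (values taken from the output above).
index = pd.MultiIndex.from_tuples(
    [(112, "score"), (113, "isEnrolled"), (113, "Comment")], names=["id", "col"]
)
diff = pd.DataFrame(
    {"from": ["1.11", "True", ""], "to": ["1.21", "False", "On vacation"]}, index=index
)

# to_html escapes cell text by default, so markup inside values is shown literally.
html_table = diff.to_html(na_rep="")
print(html_table)
```

The resulting string can be written to a file or embedded in a larger page.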
Answer 3
import pandas as pd
import io
texts = ['''\
id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.11                     False                Graduated
113  Zoe    4.12                     True                 ''',
'''\
id   Name   score                    isEnrolled           Comment
111  Jack   2.17                     True                 He was late to class
112  Nick   1.21                     False                Graduated
113  Zoe    4.12                     False                On vacation''']
df1 = pd.read_fwf(io.StringIO(texts[0]), widths=[5,7,25,21,20])
df2 = pd.read_fwf(io.StringIO(texts[1]), widths=[5,7,25,21,20])
df = pd.concat([df1,df2])
print(df)
# id Name score isEnrolled Comment
# 0 111 Jack 2.17 True He was late to class
# 1 112 Nick 1.11 False Graduated
# 2 113 Zoe 4.12 True NaN
# 0 111 Jack 2.17 True He was late to class
# 1 112 Nick 1.21 False Graduated
# 2 113 Zoe 4.12 False On vacation
df.set_index(['id', 'Name'], inplace=True)
print(df)
# score isEnrolled Comment
# id Name
# 111 Jack 2.17 True He was late to class
# 112 Nick 1.11 False Graduated
# 113 Zoe 4.12 True NaN
# 111 Jack 2.17 True He was late to class
# 112 Nick 1.21 False Graduated
# 113 Zoe 4.12 False On vacation
def report_diff(x):
    # x holds the two versions of one cell; use positional access to compare them
    return x.iloc[0] if x.iloc[0] == x.iloc[1] else '{} | {}'.format(*x)
changes = df.groupby(level=['id', 'Name']).agg(report_diff)
print(changes)
prints
score isEnrolled Comment
id Name
111 Jack 2.17 True He was late to class
112 Nick 1.11 | 1.21 False Graduated
113 Zoe 4.12 True | False nan | On vacation
Answer 4
I ran into this problem too, but found an answer before finding this post:
Based on unutbu's answer, load your data…
import pandas as pd
import io
texts =['''\
id Name score isEnrolled Date
111 Jack True 2013-05-01 12:00:00
112 Nick 1.11 False 2013-05-12 15:05:23
Zoe 4.12 True ''','''\
id Name score isEnrolled Date
111 Jack 2.17 True 2013-05-01 12:00:00
112 Nick 1.21 False
Zoe 4.12 False 2013-05-01 12:00:00''']
df1 = pd.read_fwf(io.StringIO(texts[0]), widths=[5,7,25,17,20], parse_dates=[4])
df2 = pd.read_fwf(io.StringIO(texts[1]), widths=[5,7,25,17,20], parse_dates=[4])
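# Note: pd.Panel was removed in pandas 0.25, so this step needs an older pandas
# (and Python 2 for the print statement below).
my_panel = pd.Panel(dict(df1=df1,df2=df2))
print my_panel.apply(report_diff, axis=0)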
# id Name score isEnrolled Date
#0 111 Jack nan | 2.17 True 2013-05-01 12:00:00
#1 112 Nick 1.11 | 1.21 False 2013-05-12 15:05:23 | NaT
#2 nan | nan Zoe 4.12 True | False NaT | 2013-05-01 12:00:00
By the way, if you're in IPython Notebook, you may like to use a colored diff function that colors each cell depending on whether the two values are different, equal, or null on the left or right:
from IPython.display import HTML
pd.options.display.max_colwidth = 500 # You need this, otherwise pandas
# will limit your HTML strings to 50 characters
def report_diff(x):
    if x[0] == x[1]:
        return unicode(x[0].__str__())
    elif pd.isnull(x[0]) and pd.isnull(x[1]):
        return u'<table style="background-color:#00ff00;font-weight:bold;">' + \
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % ('nan', 'nan')
    elif pd.isnull(x[0]) and not pd.isnull(x[1]):
        return u'<table style="background-color:#ffff00;font-weight:bold;">' + \
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % ('nan', x[1])
    elif not pd.isnull(x[0]) and pd.isnull(x[1]):
        return u'<table style="background-color:#0000ff;font-weight:bold;">' + \
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % (x[0], 'nan')
    else:
        return u'<table style="background-color:#ff0000;font-weight:bold;">' + \
            '<tr><td>%s</td></tr><tr><td>%s</td></tr></table>' % (x[0], x[1])
HTML(my_panel.apply(report_diff, axis=0).to_html(escape=False))
If your two dataframes have the same ids in them, then finding out what changed is actually pretty easy. Just doing frame1 != frame2 will give you a boolean DataFrame where each True is data that has changed. From that, you could easily get the index of each changed row by doing changedids = frame1.index[np.any(frame1 != frame2,axis=1)].
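A minimal sketch of that idea, using two made-up frames that share the same ids (the data and the name `changed_mask` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frames with identical index labels and columns.
frame1 = pd.DataFrame(
    {"Name": ["Jack", "Nick", "Zoe"], "score": [2.17, 1.11, 4.12]}, index=[111, 112, 113]
)
frame2 = pd.DataFrame(
    {"Name": ["Jack", "Nick", "Zoe"], "score": [2.17, 1.21, 4.12]}, index=[111, 112, 113]
)

# Boolean frame: True wherever a cell differs between the two frames.
changed_mask = frame1 != frame2

# Index labels of rows with at least one changed cell.
changedids = frame1.index[changed_mask.any(axis=1).to_numpy()]
print(list(changedids))
```

Here `changed_mask.any(axis=1)` plays the role of `np.any(frame1 != frame2, axis=1)` from the text. Note that NaN compares as different from NaN, so rows with missing values in both frames would also be flagged.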
Answer 6
Another approach using concat and drop_duplicates:
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
DF1 = StringIO("""id Name score isEnrolled Comment
111 Jack 2.17 True "He was late to class"
112 Nick 1.11 False "Graduated"
113 Zoe NaN True " "
""")
DF2 = StringIO("""id Name score isEnrolled Comment
111 Jack 2.17 True "He was late to class"
112 Nick 1.21 False "Graduated"
113 Zoe NaN False "On vacation" """)
df1 = pd.read_table(DF1, sep=r'\s+', index_col='id')
df2 = pd.read_table(DF2, sep=r'\s+', index_col='id')
#%%
dictionary = {1: df1, 2: df2}
df = pd.concat(dictionary)
df.drop_duplicates(keep=False)
Output:
       Name  score  isEnrolled      Comment
  id
1 112  Nick   1.11       False    Graduated
  113   Zoe    NaN        True
2 112  Nick   1.21       False    Graduated
  113   Zoe    NaN       False  On vacation
In [6]: # first let's create some dummy dataframes with some column(s) different
   ...: df1 = pd.DataFrame({'a': range(-5,0), 'b': range(10,15), 'c': range(20,25)})
   ...: df2 = pd.DataFrame({'a': range(-5,0), 'b': range(10,15), 'c': [20] + list(range(101,105))})
In [7]: df1
Out[7]:
   a   b   c
0 -5  10  20
1 -4  11  21
2 -3  12  22
3 -2  13  23
4 -1  14  24
In [8]: df2
Out[8]:
   a   b    c
0 -5  10   20
1 -4  11  101
2 -3  12  102
3 -2  13  103
4 -1  14  104
In [10]: # make condition over the columns you want to compare
    ...: condition = df1['c'] != df2['c']
    ...:
    ...: # select rows from each dataframe where the condition holds
    ...: diff1 = df1[condition]
    ...: diff2 = df2[condition]
In [11]: # merge the selected rows (dataframes) with some suffixes (optional)
    ...: diff1.merge(diff2, on=['a','b'], suffixes=('_before', '_after'))
Out[11]:
   a   b  c_before  c_after
0 -4  11        21      101
1 -3  12        22      102
2 -2  13        23      103
3 -1  14        24      104
With pandas 1.1, you could essentially replicate Ted Petrou’s output with a single function call. Example taken from the docs:
pd.__version__
# '1.1.0'
df1.compare(df2)
score isEnrolled Comment
self other self other self other
1 1.11 1.21 NaN NaN NaN NaN
2 NaN NaN 1.0 0.0 NaN On vacation
Here, "self" refers to the left-hand (LHS) DataFrame, while "other" is the right-hand (RHS) DataFrame. By default, equal values are replaced with NaNs so you can focus on just the diffs. If you want to show values that are equal as well, use
df1.compare(df2, keep_equal=True, keep_shape=True)
score isEnrolled Comment
self other self other self other
1 1.11 1.21 False False Graduated Graduated
2 4.12 4.12 True False NaN On vacation
You can also change the axis of comparison using align_axis:
df1.compare(df2, align_axis='index')
score isEnrolled Comment
1 self 1.11 NaN NaN
other 1.21 NaN NaN
2 self NaN 1.0 NaN
other NaN 0.0 On vacation
This compares values row-wise, instead of column-wise.
def diff_df(df1, df2, how="left"):
    """
    Find Difference of rows for given two dataframes
    this function is not symmetric, means
    diff(x, y) != diff(y, x)
    however
    diff(x, y, how='left') == diff(y, x, how='right')
    Ref: https://stackoverflow.com/questions/18180763/set-difference-for-pandas/40209800#40209800
    """
    if (df1.columns != df2.columns).any():
        raise ValueError("Two dataframe columns must match")
    if df1.equals(df2):
        return None
    elif how == 'right':
        return pd.concat([df2, df1, df1]).drop_duplicates(keep=False)
    elif how == 'left':
        return pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
    else:
        raise ValueError('how parameter supports only "left" or "right" keywords')
Example:
df1 = pd.DataFrame(d1)
Out[1]:
Comment Name isEnrolled score
0 He was late to class Jack True 2.17
1 Graduated Nick False 1.11
2 Zoe True 4.12
df2 = pd.DataFrame(d2)
Out[2]:
Comment Name isEnrolled score
0 He was late to class Jack True 2.17
1 On vacation Zoe True 4.12
diff_df(df1, df2)
Out[3]:
Comment Name isEnrolled score
1 Graduated Nick False 1.11
2 Zoe True 4.12
diff_df(df2, df1)
Out[4]:
Comment Name isEnrolled score
1 On vacation Zoe True 4.12
# This gives the same result as above
diff_df(df1, df2, how='right')
Out[22]:
Comment Name isEnrolled score
1 On vacation Zoe True 4.12
Answer 12
import pandas as pd
import numpy as np
df = pd.read_excel(r'D:\HARISH\DATA SCIENCE\1 MY Training\SAMPLE DATA&PROJS\CRICKET DATA\IPL PLAYER LIST\IPL PLAYER LIST _ harish.xlsx')
Is it possible to embed rendered HTML output into IPython output?
One way is to use
from IPython.core.display import HTML
HTML('<a href="http://example.com">link</a>')
or (IPython multiline cell alias)
%%html
<a href="http://example.com">link</a>
Both return a formatted link, but:
This link doesn’t open a browser with the webpage itself from the console. IPython notebooks support honest rendering, though.
I’m unaware of how to render HTML() object within, say, a list or pandas printed table. You can do df.to_html(), but without making links inside cells.
This output isn’t interactive in the PyCharm Python console (because it’s not QT).
How can I overcome these shortcomings and make IPython output a bit more interactive?
Answer 0
This seems to work for me:
from IPython.display import display, HTML
display(HTML('<h1>Hello, world!</h1>'))
Some time ago Jupyter Notebooks started stripping JavaScript from HTML content [#3118]. Here are two solutions:
Serving Local HTML
If you want to embed an HTML page with JavaScript on your page now, the easiest thing to do is to save your HTML file to the directory with your notebook and then load the HTML as follows:
from IPython.display import IFrame
IFrame(src='./nice.html', width=700, height=600)
Serving Remote HTML
If you prefer a hosted solution, you can upload your HTML page to an Amazon Web Services “bucket” in S3, change the settings on that bucket so as to make the bucket host a static website, then use an Iframe component in your notebook:
from IPython.display import IFrame
IFrame(src='https://s3.amazonaws.com/duhaime/blog/visualizations/isolation-forests.html', width=700, height=600)
This will render your HTML content and JavaScript in an iframe, just like on any other web page.
Expanding on @Harmon above, looks like you can combine the display and print statements together … if you need. Or, maybe it’s easier to just format your entire HTML as one string and then use display. Either way, nice feature.
display(HTML('<h1>Hello, world!</h1>'))
print("Here's a link:")
display(HTML("<a href='http://www.google.com' target='_blank'>www.google.com</a>"))
print("some more printed text ...")
display(HTML('<p>Paragraph text here ...</p>'))
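As a sketch of the "one string" alternative mentioned above, the pieces can be assembled with plain string formatting and shown with a single display call. The helper name `link` and the variable `page` are assumptions for illustration; the display call itself is left as a comment so the snippet also runs outside a notebook.

```python
from html import escape

def link(url, label):
    """Build an anchor tag, escaping both the URL and the visible label."""
    return '<a href="{}" target="_blank">{}</a>'.format(escape(url, quote=True), escape(label))

# Collect the fragments in order, then join them into one HTML string.
parts = [
    "<h1>Hello, world!</h1>",
    "<p>Here's a link: {}</p>".format(link("http://www.google.com", "www.google.com")),
    "<p>Paragraph text here ...</p>",
]
page = "\n".join(parts)

# In a notebook, `page` could then be shown with a single call:
# from IPython.display import display, HTML
# display(HTML(page))
print(page)
```

Building the string first keeps the output in one rendered block, whereas mixing print and display produces separate text and HTML outputs.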