标签归档:google-app-engine

有关使用Google App Engine的反馈?[关闭]

问题:有关使用Google App Engine的反馈?[关闭]

希望做一个非常小,快速而又肮脏的项目。我喜欢Google App Engine在Python上运行并内置Django的事实-给我借口尝试该平台…但是我的问题是:

除了玩具问题以外,还有人利用App Engine处理其他事情吗?我看到了一些很好的示例应用程序,因此我认为这对于真实交易来说已经足够了,但是我想获得一些反馈。

任何其他成功/失败记录都很棒。

Looking to do a very small, quick ‘n dirty side project. I like the fact that the Google App Engine is running on Python with Django built right in – gives me an excuse to try that platform… but my question is this:

Has anyone made use of the app engine for anything other than a toy problem? I see some good example apps out there, so I would assume this is good enough for the real deal, but wanted to get some feedback.

Any other success/failure notes would be great.


回答 0

我已经为小型地震监视应用程序http://quakewatch.appspot.com/尝试了应用引擎

我的目的是查看应用程序引擎的功能,因此主要要点:

  1. 默认情况下,它不是Django附带的,它有自己的Web框架,它是pythonic的,具有像Django这样的URL调度程序,并且使用Django模板。因此,如果您有Django exp。您会发现它易于使用
  2. 您无法在服务器上执行任何长时间运行的进程,您要做的就是回复请求,并且应该很快,否则appengine会杀死它。因此,如果您的应用程序需要大量后端处理,则appengine不是最佳方法,否则您将不得不进行处理在您自己的服务器上
  3. 我的quakewatch应用程序具有订阅功能,这意味着我必须通过电子邮件将最新的地震发生,但是我无法在app引擎中运行后台进程来监视新的地震解决方案,这里是使用像pingablity.com这样的第三方服务,连接到您的页面之一并执行订阅电子邮件程序,但是在这里您还必须注意不要在这里花费太多时间或将任务分成几部分
  4. 它提供了类似于Django的建模功能,但后端完全不同,但是对于新项目而言,这并不重要。

但是总的来说,我认为这对于创建不需要大量后台处理的应用程序非常有用。

编辑:现在,任务队列可用于运行批处理或计划的任务

编辑:在GAE上工作/创建一个真实的应用程序一年之后,现在我的看法是,除非您要开发需要扩展到数百万用户的应用程序,否则请不要使用GAE。由于具有分布式特性,因此在GAE中维护和执行琐碎的任务是一件令人头疼的事情,为了避免超出期限,错误地计数实体或执行复杂的查询需要复杂的代码,因此小型复杂的应用程序应坚持使用LAMP。

编辑:模型应该特别考虑到您将来希望进行的所有交易而设计,因为只能在同一实体组中的实体可以在交易中使用,并且这使得更新两个不同组的过程成为噩梦,例如将资金从user1转移到user2除非它们在同一个实体组中,否则不可能在事务中进行交易,但是对于频繁更新而言,使它们成为同一实体组可能不是最好的….阅读此http://blog.notdot.net/2009/9/Distributed-Transactions-应用引擎

I have tried app engine for my small quake watch application http://quakewatch.appspot.com/

My purpose was to see the capabilities of app engine, so here are the main points:

  1. it doesn’t come by default with Django, it has its own web framework which is pythonic has URL dispatcher like Django and it uses Django templates So if you have Django exp. you will find it easy to use
  2. You can not execute any long running process on server, what you do is reply to request and which should be quick otherwise appengine will kill it So if your app needs lots of backend processing appengine is not the best way otherwise you will have to do processing on a server of your own
  3. My quakewatch app has a subscription feature, it means I had to email latest quakes as they happend, but I can not run a background process in app engine to monitor new quakes solution here is to use a third part service like pingablity.com which can connect to one of your page and which executes the subscription emailer but here also you will have to take care that you don’t spend much time here or break task into several pieces
  4. It provides Django like modeling capabilities but backend is totally different but for a new project it should not matter.

But overall I think it is excellent for creating apps which do not need lot of background processing.

Edit: Now task queues can be used for running batch processing or scheduled tasks

Edit: after working/creating a real application on GAE for a year, now my opnion is that unless you are making a application which needs to scale to million and million of users, don’t use GAE. Maintaining and doing trivial tasks in GAE is a headache due to distributed nature, to avoid deadline exceeded errors, count entities or do complex queries requires complex code, so small complex application should stick to LAMP.

Edit: Models should be specially designed considering all the transactions you wish to have in future, because entities only in same entity group can be used in a transaction and it makes the process of updating two different groups a nightmare e.g. transfer money from user1 to user2 in transaction is impossible unless they are in same entity group, but making them same entity group may not be best for frequent update purposes…. read this http://blog.notdot.net/2009/9/Distributed-Transactions-on-App-Engine


回答 1

我正在使用GAE托管多个高流量应用程序。大约为50-100 req / sec。太好了,我不能推荐它。

我以前的Web开发经验是使用Ruby(Rails / Merb)。学习Python很容易。我并没有弄乱Django或Pylons或任何其他框架,只是从GAE示例开始,并从提供的基本webapp库中构建了我需要的东西。

如果您习惯了SQL的灵活性,那么数据存储区可能需要一些时间来适应。没什么太伤人的!最大的调整是远离JOIN。您必须摆脱标准化是至关重要的想法。

I am using GAE to host several high-traffic applications. Like on the order of 50-100 req/sec. It is great, I can’t recommend it enough.

My previous experience with web development was with Ruby (Rails/Merb). Learning Python was easy. I didn’t mess with Django or Pylons or any other framework, just started from the GAE examples and built what I needed out of the basic webapp libraries that are provided.

If you’re used to the flexibility of SQL the datastore can take some getting used to. Nothing too traumatic! The biggest adjustment is moving away from JOINs. You have to shed the idea that normalizing is crucial.

Ben


回答 2

我使用Google App Engine的令人信服的原因之一是它与您所在域的Google Apps集成。从本质上讲,它允许您创建自定义的托管Web应用程序,这些应用程序仅限于您域的(受控)登录名。

我在这段代码中的大部分经验是构建一个简单的时间/任务跟踪应用程序。模板引擎很简单,但是使多页应用程序非常容易上手。登录/用户意识api同样有用。我能够创建一个公共页面/私有页面范例而没有太多问题。(用户将登录以查看私有页面。仅向匿名用户显示公共页面。)

当我因从事“实际工作”而离开时,我只是进入项目的数据存储部分。

我能够在极短的时间内完成很多工作(尚未完成)。因为我以前从未使用过Python,所以这特别令人愉快(既因为这对我来说是一种新语言,也因为尽管使用了新语言,但开发仍然非常快)。我碰到的很少,使我相信我将无法完成任务。相反,我对功能和特性有一个相当积极的印象。

这就是我的经验。也许它不仅仅代表未完成的玩具项目,还代表对平台的知情试用,我希望能有所帮助。

One of the compelling reasons I have come across for using Google App Engine is its integration with Google Apps for your domain. Essentially it allows you to create custom, managed web applications that are restricted to the (controlled) logins of your domain.

Most of my experience with this code was building a simple time/task tracking application. The template engine was simple and yet made a multi-page application very approachable. The login/user awareness api is similarly useful. I was able to make a public page/private page paradigm without too much issue. (a user would log in to see the private pages. An anonymous user was only shown the public page.)

I was just getting into the datastore portion of the project when I got pulled away for “real work”.

I was able to accomplish a lot (it still is not done yet) in a very little amount of time. Since I had never used Python before, this was particularly pleasant (both because it was a new language for me, and also because the development was still fast despite the new language). I ran into very little that led me to believe that I wouldn’t be able to accomplish my task. Instead I have a fairly positive impression of the functionality and features.

That is my experience with it. Perhaps it doesn’t represent more than an unfinished toy project, but it does represent an informed trial of the platform, and I hope that helps.


回答 3

“运行Django的App Engine”的想法有点误导。App Engine取代了整个Django模型层,因此准备花一些时间适应App Engine的数据存储,这需要以不同的方式建模和思考数据。

The “App Engine running Django” idea is a bit misleading. App Engine replaces the entire Django model layer so be prepared to spend some time getting acclimated with App Engine’s datastore which requires a different way of modeling and thinking about data.


回答 4

我用GAE建立了http://www.muspy.com

它不仅是一个玩具项目,而且也不过分复杂。我仍然依赖于Google可以解决的一些问题,但是整体开发网站是一种令人愉快的体验。

如果您不想处理托管问题,服务器管理等问题,绝对可以推荐。特别是如果您已经了解Python和Django。

I used GAE to build http://www.muspy.com

It’s a bit more than a toy project but not overly complex either. I still depend on a few issues to be addressed by Google, but overall developing the website was an enjoyable experience.

If you don’t want to deal with hosting issues, server administration, etc, I can definitely recommend it. Especially if you already know Python and Django.


回答 5

我认为,对于小型项目,App Engine目前非常不错。有很多事情可以说,不必担心托管。该API还会引导您朝着构建可扩展应用程序的方向发展,这是一种很好的做法。

  • app-engine-patch是Django和App Engine之间的良好层,可启用auth应用程序及更多功能。
  • Google承诺在2008年底之前提供SLA和定价模型。
  • 请求必须在10秒内完成,对Web服务的子请求则需要在5秒内完成。这迫使您设计一个快速,轻量级的应用程序,将重要的处理工作卸载到其他平台(例如,托管服务或EC2实例)。
  • 更多语言即将推出!Google不会说:-)。我的钱接下来是Java。

I think App Engine is pretty cool for small projects at this point. There’s a lot to be said for never having to worry about hosting. The API also pushes you in the direction of building scalable apps, which is good practice.

  • app-engine-patch is a good layer between Django and App Engine, enabling the use of the auth app and more.
  • Google have promised an SLA and pricing model by the end of 2008.
  • Requests must complete in 10 seconds, sub-requests to web services required to complete in 5 seconds. This forces you to design a fast, lightweight application, off-loading serious processing to other platforms (e.g. a hosted service or an EC2 instance).
  • More languages are coming soon! Google won’t say which though :-). My money’s on Java next.

回答 6

这个问题已经被完全回答。哪个好 但是也许有一件事值得一提。google app引擎有一个eclipse ide插件,很高兴使用。

如果您已经使用Eclipse进行开发,您将对此感到非常高兴。

要在Google App Engine的网站上进行部署,我需要做的就是单击一个带有飞机徽标的小按钮-超级。

This question has been fully answered. Which is good. But one thing perhaps is worth mentioning. The google app engine has a plugin for the eclipse ide which is a joy to work with.

If you already do your development with eclipse you are going to be so happy about that.

To deploy on the google app engine’s web site all I need to do is click one little button – with the airplane logo – super.


回答 7

看一下sql游戏,它非常稳定,实际上将流量限制提高了一点,从而使其受到Google的限制。除了将App托管在其他人完全控制的服务器上之外,我没有看到关于App Engine的好消息。

Take a look the the sql game, it is very stable and actually pushed traffic limits at one point so that it was getting throttled by Google. I have seen nothing but good news about App Engine, other than hosting you app on servers someone else controls completely.


回答 8

我使用GAE构建了一个简单的应用程序,该应用程序接受一些参数,格式并发送电子邮件。这是非常简单和快速的。我还在GAE数据存储和内存缓存服务(http://dbaspects.blogspot.com/2010/01/memcache-vs-datastore-on-google-app.html)上做了一些性能基准测试。没有那么快。我认为GAE是执行特定方法的严肃平台。我认为它将演变为真正可扩展的平台,在该平台上根本不允许不良做法。

I used GAE to build a simple application which accepts some parameters, formats and send email. It was extremely simple and fast. I also made some performance benchmarks on the GAE datastore and memcache services (http://dbaspects.blogspot.com/2010/01/memcache-vs-datastore-on-google-app.html ). It is not that fast. My opinion is that GAE is serious platform which enforce certain methodology. I think it will evolve to the truly scalable platform, where bad practices simply not allowed.


回答 9

我将GAE用于我的Flash游戏网站Bearded Games。GAE是一个很棒的平台。我使用了Django模板,它比PHP过去要容易得多。它带有出色的管理面板,并为您提供了非常好的日志。数据存储区不同于MySQL之类的数据库,但是使用起来要容易得多。建立网站非常简单明了,他们在网站上有很多有用的建议。

I used GAE for my flash gaming site, Bearded Games. GAE is a great platform. I used Django templates which are so much easier than the old days of PHP. It comes with a great admin panel, and gives you really good logs. The datastore is different than a database like MySQL, but it’s much easier to work with. Building the site was easy and straightforward and they have lots of helpful advice on the site.


回答 10

我使用GAE和Django构建了Facebook应用程序。我以http://code.google.com/p/app-engine-patch作为起点,因为它具有Django 1.1支持。我没有尝试使用任何manage.py命令,因为我认为它们不起作用,但是我什至没有调查。该应用程序具有三个模型,还使用了pyfacebook,但这就是复杂程度。我正在构建一个更加复杂的应用程序,并开始在http://brianyamabe.com上发布博客。

I used GAE and Django to build a Facebook application. I used http://code.google.com/p/app-engine-patch as my starting point as it has Django 1.1 support. I didn’t try to use any of the manage.py commands because I assumed they wouldn’t work, but I didn’t even look into it. The application had three models and also used pyfacebook, but that was the extent of the complexity. I’m in the process of building a much more complicated application which I’m starting to blog about on http://brianyamabe.com.


Google App Engine的项目结构

问题:Google App Engine的项目结构

我刚问世时就在Google App Engine中启动了一个应用程序,以使用该技术并从事一个我一直想了很久但从未尝试过的宠物项目。结果是BowlSK。但是,随着它的增长和功能的添加,使其变得井井有条变得非常困难-主要是因为这是我的第一个python项目,在开始工作之前我对此一无所知。

我有的:

  • 主级别包含:
    • 所有.py文件(不知道如何使程序包正常工作)
    • 主页面的所有.html模板
  • 子目录:
    • 用于CSS,图片,JS等的单独文件夹。
    • 包含用于子目录类型网址的.html模板的文件夹

示例:
http : //www.bowlsk.com/映射到HomePage(默认包),模板位于“ index.html”
http://www.bowlsk.com/games/view-series.html?series=7130映射到ViewSeriesPage(同样是默认程序包),位于“ games / view-series.html”的模板

真讨厌 我如何重组?我有两个想法:

  • 主文件夹包含:appdef,索引,main.py?

    • 代码的子文件夹。这一定是我的第一个包裹吗?
    • 模板的子文件夹。文件夹层次结构将与包层次结构匹配
    • CSS,图像,JS等的单个子文件夹。
  • 主文件夹包含appdef,索引,main.py?

    • 代码+模板的子文件夹。这样,我就在模板旁边设置了处理程序类,因为在此阶段,我要添加许多功能,因此对一个进行修改意味着对另一个进行了修改。同样,我必须将此文件夹名称作为Class的第一个软件包名称吗?我希望文件夹为“ src”,但我不希望我的Class为“ src.WhateverPage”

有最佳做法吗?随着Django 1.0的出现,当它成为正式的GAE模板引擎时,我现在可以做些什么来提高与它的集成能力?我将简单地开始尝试这些事情,然后看一看似乎更好,但是pyDev的重构支持似乎不能很好地处理程序包的移动,因此使所有这些再次工作可能不是一件容易的事。

I started an application in Google App Engine right when it came out, to play with the technology and work on a pet project that I had been thinking about for a long time but never gotten around to starting. The result is BowlSK. However, as it has grown, and features have been added, it has gotten really difficult to keep things organized – mainly due to the fact that this is my first python project, and I didn’t know anything about it until I started working.

What I have:

  • Main Level contains:
    • all .py files (didn’t know how to make packages work)
    • all .html templates for main level pages
  • Subdirectories:
    • separate folders for css, images, js, etc.
    • folders that hold .html templates for subdirecty-type urls

Example:
http://www.bowlsk.com/ maps to HomePage (default package), template at “index.html”
http://www.bowlsk.com/games/view-series.html?series=7130 maps to ViewSeriesPage (again, default package), template at “games/view-series.html”

It’s nasty. How do I restructure? I had 2 ideas:

  • Main Folder containing: appdef, indexes, main.py?

    • Subfolder for code. Does this have to be my first package?
    • Subfolder for templates. Folder heirarchy would match package heirarchy
    • Individual subfolders for css, images, js, etc.
  • Main Folder containing appdef, indexes, main.py?

    • Subfolder for code + templates. This way I have the handler class right next to the template, because in this stage, I’m adding lots of features, so modifications to one mean modifications to the other. Again, do I have to have this folder name be the first package name for my classes? I’d like the folder to be “src”, but I don’t want my classes to be “src.WhateverPage”

Is there a best practice? With Django 1.0 on the horizon, is there something I can do now to improve my ability to integrate with it when it becomes the official GAE templating engine? I would simply start trying these things, and seeing which seems better, but pyDev’s refactoring support doesn’t seem to handle package moves very well, so it will likely be a non-trivial task to get all of this working again.


回答 0

首先,我建议您看看“ 使用Python,Django和Google App Engine进行快速开发

GvR在幻灯片演示文稿的第10页上描述了常规/标准项目布局。

在这里,我将从该页面发布布局/结构的略微修改版本。我本人几乎遵循这种模式。您还提到了打包方面的问题。只要确保您的每个子文件夹都有一个__init__.py文件即可。如果它为空也可以。

样板文件

  • 这些项目之间几乎没有差异
  • app.yaml:将所有非静态请求定向到main.py
  • main.py:初始化应用并发送所有请求

项目布局

  • static / *:静态文件;由App Engine直接提供
  • myapp / *。py:特定于应用的python代码
    • views.py,models.py,tests.py,__ init__.py等
  • templates / *。html:模板(或myapp / templates / *。html)

以下是一些可能也有帮助的代码示例:

main.py

import wsgiref.handlers

from google.appengine.ext import webapp
from myapp.views import *

application = webapp.WSGIApplication([
  ('/', IndexHandler),
  ('/foo', FooHandler)
], debug=True)

def main():
  wsgiref.handlers.CGIHandler().run(application)

myapp / views.py

import os
import datetime
import logging
import time

from google.appengine.api import urlfetch
from google.appengine.ext.webapp import template
from google.appengine.api import users
from google.appengine.ext import webapp
from models import *

class IndexHandler(webapp.RequestHandler):
  def get(self):
    date = "foo"
    # Do some processing        
    template_values = {'data': data }
    path = os.path.join(os.path.dirname(__file__) + '/../templates/', 'main.html')
    self.response.out.write(template.render(path, template_values))

class FooHandler(webapp.RequestHandler):
  def get(self):
    #logging.debug("start of handler")

myapp / models.py

from google.appengine.ext import db

class SampleModel(db.Model):

我认为这种布局非常适合新的和相对较小的中型项目。对于较大的项目,我建议分解视图和模型以使其具有以下子文件夹:

项目布局

  • static /:静态文件;由App Engine直接提供
    • js / *。js
    • 图片/*.gif|png|jpg
    • css / *。css
  • myapp /:应用程序结构
    • 型号/*.py
    • 视图/*.py
    • 测试/*.py
    • templates / *。html:模板

First, I would suggest you have a look at “Rapid Development with Python, Django, and Google App Engine

GvR describes a general/standard project layout on page 10 of his slide presentation.

Here I’ll post a slightly modified version of the layout/structure from that page. I pretty much follow this pattern myself. You also mentioned you had trouble with packages. Just make sure each of your sub folders has an __init__.py file. It’s ok if its empty.

Boilerplate files

  • These hardly vary between projects
  • app.yaml: direct all non-static requests to main.py
  • main.py: initialize app and send it all requests

Project lay-out

  • static/*: static files; served directly by App Engine
  • myapp/*.py: app-specific python code
    • views.py, models.py, tests.py, __init__.py, and more
  • templates/*.html: templates (or myapp/templates/*.html)

Here are some code examples that may help as well:

main.py

import wsgiref.handlers

from google.appengine.ext import webapp
from myapp.views import *

application = webapp.WSGIApplication([
  ('/', IndexHandler),
  ('/foo', FooHandler)
], debug=True)

def main():
  wsgiref.handlers.CGIHandler().run(application)

myapp/views.py

import os
import datetime
import logging
import time

from google.appengine.api import urlfetch
from google.appengine.ext.webapp import template
from google.appengine.api import users
from google.appengine.ext import webapp
from models import *

class IndexHandler(webapp.RequestHandler):
  def get(self):
    date = "foo"
    # Do some processing        
    template_values = {'data': data }
    path = os.path.join(os.path.dirname(__file__) + '/../templates/', 'main.html')
    self.response.out.write(template.render(path, template_values))

class FooHandler(webapp.RequestHandler):
  def get(self):
    #logging.debug("start of handler")

myapp/models.py

from google.appengine.ext import db

class SampleModel(db.Model):

I think this layout works great for new and relatively small to medium projects. For larger projects I would suggest breaking up the views and models to have their own sub-folders with something like:

Project lay-out

  • static/: static files; served directly by App Engine
    • js/*.js
    • images/*.gif|png|jpg
    • css/*.css
  • myapp/: app structure
    • models/*.py
    • views/*.py
    • tests/*.py
    • templates/*.html: templates

回答 1

我通常的布局如下所示:

  • app.yaml
  • index.yaml
  • request.py-包含基本的WSGI应用程序
  • LIB
    • __init__.py -常见功能,包括请求处理程序基类
  • 控制器-包含所有处理程序。request.yaml导入这些。
  • 范本
    • 控制器使用的所有django模板
  • 模型
    • 所有数据存储区模型类
  • 静态的
    • 静态文件(css,图像等)。由app.yaml映射到/ static

如果不清楚,我可以提供一些示例,说明我的app.yaml,request.py,lib / init .py和示例控制器的外观。

My usual layout looks something like this:

  • app.yaml
  • index.yaml
  • request.py – contains the basic WSGI app
  • lib
    • __init__.py – common functionality, including a request handler base class
  • controllers – contains all the handlers. request.yaml imports these.
  • templates
    • all the django templates, used by the controllers
  • model
    • all the datastore model classes
  • static
    • static files (css, images, etc). Mapped to /static by app.yaml

I can provide examples of what my app.yaml, request.py, lib/init.py, and sample controllers look like, if this isn’t clear.


回答 2

我今天实现了一个Google App Engine样板,并在github上进行了检查。这与尼克·约翰逊(Nick Johnson)(曾为Google工作)所描述的思路一致。

点击此链接gae-boilerplate

I implemented a google app engine boilerplate today and checked it on github. This is along the lines described by Nick Johnson above (who used to work for Google).

Follow this link gae-boilerplate


回答 3

我认为第一种选择是最佳实践。并将代码文件夹作为您的第一个软件包。Guido van Rossum开发的Rietveld项目是一个很好的学习模型。看看吧:http : //code.google.com/p/rietveld

关于Django 1.0,建议您开始使用Django主干代码,而不要使用内置在django端口中的GAE。再次,看看Rietveld的工作方式。

I think the first option is considered the best practice. And make the code folder your first package. The Rietveld project developed by Guido van Rossum is a very good model to learn from. Have a look at it: http://code.google.com/p/rietveld

With regard to Django 1.0, I suggest you start using the Django trunk code instead of the GAE built in django port. Again, have a look at how it’s done in Rietveld.


回答 4

我喜欢webpy,因此在Google App Engine上将其用作模板框架。
我的软件包文件夹通常是这样组织的:

app.yaml
application.py
index.yaml
/app
   /config
   /controllers
   /db
   /lib
   /models
   /static
        /docs
        /images
        /javascripts
        /stylesheets
   test/
   utility/
   views/

是一个例子。

I like webpy so I’ve adopted it as templating framework on Google App Engine.
My package folders are typically organized like this:

app.yaml
application.py
index.yaml
/app
   /config
   /controllers
   /db
   /lib
   /models
   /static
        /docs
        /images
        /javascripts
        /stylesheets
   test/
   utility/
   views/

Here is an example.


回答 5

在代码布局方面,我并没有完全了解最新的最佳实践等,但是当我完成第一个GAE应用程序时,我在第二个选项中使用了一些东西,其中代码和模板彼此相邻。

造成这种情况的原因有两个:一是将代码和模板保留在附近,其二,我的目录结构布局与网站的布局相似,这对我来说使它变得更容易一点,也记住所有内容在哪里。

I am not entirely up to date on the latest best practices, et cetera when it comes to code layout, but when I did my first GAE application, I used something along your second option, where the code and templates are next to eachother.

There was two reasons for this – one, it kept the code and template nearby, and secondly, I had the directory structure layout mimic that of the website – making it (for me) a bit easier too remember where everything was.


Flask vs webapp2(适用于Google App Engine)

问题:Flask vs webapp2(适用于Google App Engine)

我正在启动新的Google App Engine应用程序,目前正在考虑两个框架:Flaskwebapp2。我对以前的App Engine应用程序使用的内置webapp框架感到非常满意,因此我认为webapp2会更好,并且不会有任何问题。

但是,Flask有很多不错的评论,我真的很喜欢Flask的方法以及到目前为止我在文档中已经读过的所有东西,我想尝试一下。但是我有点担心Flask会遇到的局限性。

因此,问题是- 您是否知道Flask可能会带入Google App Engine应用程序的任何问题,性能问题,限制(例如,路由系统,内置授权机制等)?“问题”是指我无法在几行代码(或任何合理数量的代码和工作量)中解决的问题,或者是完全不可能的事情。

还有一个后续问题:尽管我可能会遇到任何问题,但您认为Flask中是否有任何杀手级功能可以打动我,让我使用它?

I’m starting new Google App Engine application and currently considering two frameworks: Flask and webapp2. I’m rather satisfied with built-in webapp framework that I’ve used for my previous App Engine application, so I think webapp2 will be even better and I won’t have any problems with it.

However, there are a lot of good reviews of Flask, I really like its approach and all the things that I’ve read so far in the documentation and I want to try it out. But I’m a bit concerned about limitations that I can face down the road with Flask.

So, the question is – do you know any problems, performance issues, limitations (e.g. routing system, built-in authorization mechanism, etc.) that Flask could bring into Google App Engine application? By “problem” I mean something that I can’t work around in several lines of code (or any reasonable amount of code and efforts) or something that is completely impossible.

And as a follow-up question: are there any killer-features in Flask that you think can blow my mind and make me use it despite any problems that I can face?


回答 0

免责声明:我是tipfy和webapp2的作者。

坚持使用webapp(或其自然发展版本,webapp2)的一大优势是,您不必为自己选择的框架为现有的SDK处理程序创建自己的版本。

例如,deferred使用一个webapp处理程序。要在纯Flask视图中使用werkzeug.Request和werkzeug.Response来使用它,您需要为此实现deferred(就像我在这里为tipfy 所做的那样)。

其他处理程序也会发生同样的情况:blobstore(Werkzeug仍然不支持范围请求,因此即使您创建了自己的处理程序,也需要使用WebOb -请参见tipfy.appengine.blobstore),邮件,XMPP等,或以后包含在SDK中的其他内容。

对于使用App Engine创建的库,也是如此,例如ProtoRPC,它基于webapp,并且如果您不想混合使用webapp和框架,则需要端口或适配器才能与其他框架一起使用。同一应用中的选择处理程序。

因此,即使您选择其他框架,您也将不得不结束a)在某些特殊情况下使用webapp或b)必须创建和维护特定SDK处理程序或功能的版本(如果要使用它们)。

与WebOb相比,我更偏爱Werkzeug,但是在移植和维护可与Tipfy一起使用的SDK处理程序版本超过一年之后,我意识到这是一个失败的原因-长期支持GAE,最好是保持与webapp / WebOb。它使对SDK库的支持变得轻而易举,维护变得更加容易,因为新的库和SDK功能将立即可用,因此它具有更强的前瞻性,并且大型社区可以使用相同的App Engine工具来受益。

这里总结一个特定的webapp2防御。此外,webapp2可以在App Engine之外使用,并且易于自定义以使其看起来像流行的微框架,并且您有很多吸引人的理由。而且,webapp2有很大的机会被包含在将来的SDK版本中(这是非正式的,不要引用我的:-),这将推动它向前发展并带来新的开发人员和贡献。

也就是说,我是Werkzeug和Pocoo的忠实拥护者,并向Flask和其他人(web.py,Tornado)借了很多钱,但是-我知道,我有偏见-以上webapp2的好处应该被考虑在内。

Disclaimer: I’m the author of tipfy and webapp2.

A big advantage of sticking with webapp (or its natural evolution, webapp2) is that you don’t have to create your own versions for existing SDK handlers for your framework of your choice.

For example, deferred uses a webapp handler. To use it in a pure Flask view, using werkzeug.Request and werkzeug.Response, you’ll need to implement deferred for it (like I did here for tipfy).

The same happens for other handlers: blobstore (Werkzeug still doesn’t support range requests, so you’ll need to use WebOb even if you create your own handler — see tipfy.appengine.blobstore), mail, XMPP and so on, or others that are included in the SDK in the future.

And the same happens for libraries created with App Engine in mind, like ProtoRPC, which is based on webapp and would need a port or adapter to work with other frameworks, if you don’t want to mix webapp and your-framework-of-choice handlers in the same app.

So, even if you choose a different framework, you’ll end a) using webapp in some special cases or b) having to create and maintain your versions for specific SDK handlers or features, if you’ll use them.

I much prefer Werkzeug over WebOb, but after over one year porting and maintaining versions of the SDK handlers that work natively with tipfy, I realized that this is a lost cause — to support GAE for the long term, best is to stay close to webapp/WebOb. It makes support for SDK libraries a breeze, maintenance becomes a lot easier, it is more future-proof as new libraries and SDK features will work out of the box and there’s the benefit of a large community working around the same App Engine tools.

A specific webapp2 defense is summarized here. Add to those that webapp2 can be used outside of App Engine and is easy to be customized to look like popular micro-frameworks and you have a good set of compelling reasons to go for it. Also, webapp2 has a big chance to be included in a future SDK release (this is extra-official, don’t quote me :-) which will push it forward and bring new developers and contributions.

That said, I’m a big fan of Werkzeug and the Pocoo guys and borrowed a lot from Flask and others (web.py, Tornado), but — and, you know, I’m biased — the above webapp2 benefits should be taken into account.


回答 1

您的问题非常广泛,但是在Google App Engine上使用Flask似乎没有大问题。

该邮件列表线程链接到多个模板:

http://flask.pocoo.org/mailinglist/archive/2011/3/27/google-app-engine/#4f95bab1627a24922c60ad1d0a0a8e44

以下是针对Flask / App Engine组合的教程:

http://www.franciscosouza.com/2010/08/flying-with-flask-on-google-app-engine/

另请参阅App Engine-难以访问Twitter数据-FlaskFlask消息刷新无法跨重定向重定向,以及如何使用Google App Engine管理第三方Python库?(virtualenv?pip?)解决人们对于Flask和Google App Engine的问题。

Your question is extremely broad, but there appears to be no big problems using Flask on Google App Engine.

This mailing list thread links to several templates:

http://flask.pocoo.org/mailinglist/archive/2011/3/27/google-app-engine/#4f95bab1627a24922c60ad1d0a0a8e44

And here is a tutorial specific to the Flask / App Engine combination:

http://www.franciscosouza.com/2010/08/flying-with-flask-on-google-app-engine/

Also, see App Engine – Difficulty Accessing Twitter Data – Flask, Flask message flashing fails across redirects, and How do I manage third-party Python libraries with Google App Engine? (virtualenv? pip?) for issues people have had with Flask and Google App Engine.


回答 2

对我来说,当我发现烧瓶不是(从一开始就)不是面向对象的框架,而webapp2是一个纯粹的面向对象的框架时,对webapp2的决定很容易。webapp2将基于方法的调度作为所有RequestHandler的标准使用(如烧瓶文档在MethodViews中从V0.7开始对其进行调用和实现)。在烧瓶中,MethodViews是附加组件,它是webapp2的核心设计原则。因此,使用这两种框架,您的软件设计将看起来有所不同。如今,这两个框架都使用jinja2模板,并且功能完全相同。

我更喜欢将安全检查添加到基类RequestHandler并从中继承。这对于实用程序功能等也很有用。例如,您可以在链接[3]中看到,您可以重写方法以防止调度请求。

如果您是面向对象的人,或者需要设计REST服务器,我将为您推荐webapp2。如果您希望使用带有装饰器的简单函数作为多种请求类型的处理程序,或者对OO继承不满意,请选择flask。我认为这两个框架都避免了金字塔等大型框架的复杂性和依赖性。

  1. http://flask.pocoo.org/docs/0.10/views/#method-based-dispatching
  2. https://webapp-improved.appspot.com/guide/handlers.html
  3. https://webapp-improved.appspot.com/guide/handlers.html#overriding-dispatch

For me the decision for webapp2 was easy when I discovered that flask is not an object-oriented framework (from the beginning), while webapp2 is a pure object oriented framework. webapp2 uses Method Based Dispatching as standard for all RequestHandlers (as flask documentation calls it and implements it since V0.7 in MethodViews). While in flask MethodViews are an add-on it is a core design principle for webapp2. So your software design will look different using both frameworks. Both frameworks use nowadays jinja2 templates and are fairly feature identical.

I prefer to add security checks to a base-class RequestHandler and inherit from it. This is also good for utility functions, etc. As you can see for example in link [3] you can override methods to prevent dispatching a request.

If you are an OO-person, or if you need to design a REST-server, I would recommend webapp2 for you. If you prefer simple functions with decorators as handlers for multiple request-types, or you are uncomfortable with OO-inheritance then choose flask. I think both frameworks avoid the complexity and dependencies of much bigger frameworks like pyramid.

  1. http://flask.pocoo.org/docs/0.10/views/#method-based-dispatching
  2. https://webapp-improved.appspot.com/guide/handlers.html
  3. https://webapp-improved.appspot.com/guide/handlers.html#overriding-dispatch

回答 3

我认为Google App Engine正式支持Flask框架。这里有一个示例代码和教程-> https://console.developers.google.com/start/appengine?_ga=1.36257892.596387946.1427891855

I think google app engine officially supports flask framework. There is a sample code and tutorial here -> https://console.developers.google.com/start/appengine?_ga=1.36257892.596387946.1427891855


回答 4

我没有尝试使用webapp2,但发现tipfy有点难以使用,因为它需要安装脚本和用于将python安装配置为默认设置以外的构建。由于这些原因和其他原因,我没有使我最大的项目依赖于框架,而是使用普通的webapp,而是添加了名为beaker的库以获取会话功能,并且django已经内置了许多用例通用词的翻译,因此在构建框架时本地化的应用程序django是我最大的项目的正确选择。我实际在项目中部署到生产环境的其他两个框架分别是GAEframework.com和web2py,通常看来,添加一个更改其模板引擎的框架可能会导致新旧版本不兼容。

因此,我的经验是,除非他们解决了更高级的用例(文件上传,多重身份验证,管理ui是目前尚无框架的3个更高级的用例示例),否则我不愿意在项目中添加框架处理得很好。

I didn’t try webapp2 and found that tipfy was a bit difficult to use since it required setup scripts and builds that configure your python installation to other than default. For these and other reasons I haven’t made my largest project depend on a framework and I use the plain webapp instead, add the library called beaker to get session capability and django already has builtin translations for words common to many usecases so when building a localized application django was the right choice for my largest project. The 2 other frameworks I actually deployed with projects to a production environment were GAEframework.com and web2py and generally it seems that adding a framework which changes its template engine could lead to incompatibilities between old and new versions.

So my experience is that I’m being reluctant to adding a framework to my projects unless they solve the more advanced use cases (file upload, multi auth, admin ui are 3 examples of more advanced use cases that no framework for gae at the moment handles well.


使用app.yaml将环境变量安全地存储在GAE中

问题:使用app.yaml将环境变量安全地存储在GAE中

我需要将API密钥和其他敏感信息存储app.yaml为环境变量,以便在GAE上进行部署。问题是如果我推app.yaml送到GitHub,此信息将公开(不好)。我不想将信息存储在数据存储中,因为它不适合该项目。相反,我想换出.gitignore应用程序每次部署中列出的文件中的值。

这是我的app.yaml文件:

application: myapp
version: 3 
runtime: python27
api_version: 1
threadsafe: true

libraries:
- name: webapp2
  version: latest
- name: jinja2
  version: latest

handlers:
- url: /static
  static_dir: static

- url: /.*
  script: main.application  
  login: required
  secure: always
# auth_fail_action: unauthorized

env_variables:
  CLIENT_ID: ${CLIENT_ID}
  CLIENT_SECRET: ${CLIENT_SECRET}
  ORG: ${ORG}
  ACCESS_TOKEN: ${ACCESS_TOKEN}
  SESSION_SECRET: ${SESSION_SECRET}

有任何想法吗?

I need to store API keys and other sensitive information in app.yaml as environment variables for deployment on GAE. The issue with this is that if I push app.yaml to GitHub, this information becomes public (not good). I don’t want to store the info in a datastore as it does not suit the project. Rather, I’d like to swap out the values from a file that is listed in .gitignore on each deployment of the app.

Here is my app.yaml file:

application: myapp
version: 3 
runtime: python27
api_version: 1
threadsafe: true

libraries:
- name: webapp2
  version: latest
- name: jinja2
  version: latest

handlers:
- url: /static
  static_dir: static

- url: /.*
  script: main.application  
  login: required
  secure: always
# auth_fail_action: unauthorized

env_variables:
  CLIENT_ID: ${CLIENT_ID}
  CLIENT_SECRET: ${CLIENT_SECRET}
  ORG: ${ORG}
  ACCESS_TOKEN: ${ACCESS_TOKEN}
  SESSION_SECRET: ${SESSION_SECRET}

Any ideas?


回答 0

如果是敏感数据,则不应将其存储在源代码中,因为它将被检查到源代码管理中。错误的人(组织内部或外部)可能会在此处找到它。另外,您的开发环境可能会使用与生产环境不同的配置值。如果这些值存储在代码中,则您将不得不在开发和生产中运行不同的代码,这是很麻烦的做法。

在我的项目中,我使用此类将配置数据放入数据存储区:

from google.appengine.ext import ndb

class Settings(ndb.Model):
  name = ndb.StringProperty()
  value = ndb.StringProperty()

  @staticmethod
  def get(name):
    NOT_SET_VALUE = "NOT SET"
    retval = Settings.query(Settings.name == name).get()
    if not retval:
      retval = Settings()
      retval.name = name
      retval.value = NOT_SET_VALUE
      retval.put()
    if retval.value == NOT_SET_VALUE:
      raise Exception(('Setting %s not found in the database. A placeholder ' +
        'record has been created. Go to the Developers Console for your app ' +
        'in App Engine, look up the Settings record with name=%s and enter ' +
        'its value in that record\'s value field.') % (name, name))
    return retval.value

您的应用程序将这样做以获取价值:

API_KEY = Settings.get('API_KEY')

如果数据存储中有该键的值,则将获得它。如果没有,将创建一个占位符记录并引发异常。该异常将提醒您转到开发人员控制台并更新占位符记录。

我发现这消除了对设置配置值的猜测。如果不确定要设置哪些配置值,只需运行代码,它将告诉您!

上面的代码使用了ndb库,该库使用了memcache和后台的数据存储,因此速度很快。


更新:

jelder询问如何在App Engine控制台中找到数据存储区值并进行设置。方法如下:

  1. 前往https://console.cloud.google.com/datastore/

  2. 如果尚未选择项目,请在页面顶部选择它。

  3. 种类下拉框中,选择设置

  4. 如果您运行上面的代码,您的密钥将会显示。它们都将具有值NOT SET。单击每个并设置其值。

希望这可以帮助!

If it’s sensitive data, you should not store it in source code as it will be checked into source control. The wrong people (inside or outside your organization) may find it there. Also, your development environment probably uses different config values from your production environment. If these values are stored in code, you will have to run different code in development and production, which is messy and bad practice.

In my projects, I put config data in the datastore using this class:

from google.appengine.ext import ndb

class Settings(ndb.Model):
  name = ndb.StringProperty()
  value = ndb.StringProperty()

  @staticmethod
  def get(name):
    NOT_SET_VALUE = "NOT SET"
    retval = Settings.query(Settings.name == name).get()
    if not retval:
      retval = Settings()
      retval.name = name
      retval.value = NOT_SET_VALUE
      retval.put()
    if retval.value == NOT_SET_VALUE:
      raise Exception(('Setting %s not found in the database. A placeholder ' +
        'record has been created. Go to the Developers Console for your app ' +
        'in App Engine, look up the Settings record with name=%s and enter ' +
        'its value in that record\'s value field.') % (name, name))
    return retval.value

Your application would do this to get a value:

API_KEY = Settings.get('API_KEY')

If there is a value for that key in the datastore, you will get it. If there isn’t, a placeholder record will be created and an exception will be thrown. The exception will remind you to go to the Developers Console and update the placeholder record.

I find this takes the guessing out of setting config values. If you are unsure of what config values to set, just run the code and it will tell you!

The code above uses the ndb library which uses memcache and the datastore under the hood, so it’s fast.


Update:

jelder asked for how to find the Datastore values in the App Engine console and set them. Here is how:

  1. Go to https://console.cloud.google.com/datastore/

  2. Select your project at the top of the page if it’s not already selected.

  3. In the Kind dropdown box, select Settings.

  4. If you ran the code above, your keys will show up. They will all have the value NOT SET. Click each one and set its value.

Hope this helps!


回答 1

此解决方案很简单,但可能不适合所有不同的团队。

首先,将环境变量放入env_variables.yaml中,例如,

env_variables:
  SECRET: 'my_secret'

然后,将其包含env_variables.yamlapp.yaml

includes:
  - env_variables.yaml

最后,将添加env_variables.yaml.gitignore,以使秘密变量在存储库中不存在。

在这种情况下,env_variables.yaml需要在部署管理器之间共享。

This solution is simple but may not suit all different teams.

First, put the environment variables in an env_variables.yaml, e.g.,

env_variables:
  SECRET: 'my_secret'

Then, include this env_variables.yaml in the app.yaml

includes:
  - env_variables.yaml

Finally, add the env_variables.yaml to .gitignore, so that the secret variables won’t exist in the repository.

In this case, the env_variables.yaml needs to be shared among the deployment managers.


回答 2

我的方法是将客户端机密存储在App Engine应用本身中。客户端机密既不在源代码控制中,也不在任何本地计算机上。这样的好处是,任何 App Engine合作者都可以部署代码更改,而不必担心客户端机密。

我将客户端机密直接存储在数据存储区中,并使用Memcache改善了访问机密的延迟。数据存储区实体仅需要创建一次,并将在以后的部署中保持不变。当然,可以随时使用App Engine控制台更新这些实体。

有两种方法可以执行一次性实体创建:

  • 使用App Engine 远程API交互式外壳程序创建实体。
  • 创建一个仅管理员处理程序,该处理程序将使用伪值初始化实体。手动调用此管理处理程序,然后使用App Engine控制台使用生产客户端密码更新实体。

My approach is to store client secrets only within the App Engine app itself. The client secrets are neither in source control nor on any local computers. This has the benefit that any App Engine collaborator can deploy code changes without having to worry about the client secrets.

I store client secrets directly in Datastore and use Memcache for improved latency accessing the secrets. The Datastore entities only need to be created once and will persist across future deploys. of course the App Engine console can be used to update these entities at any time.

There are two options to perform the one-time entity creation:

  • Use the App Engine Remote API interactive shell to create the entities.
  • Create an Admin only handler that will initialize the entities with dummy values. Manually invoke this admin handler, then use the App Engine console to update the entities with the production client secrets.

回答 3

最好的方法是将密钥存储在client_secrets.json文件中,并通过在.gitignore文件中列出密钥,将其从上传到git中排除。如果您在不同环境下使用不同的密钥,则可以使用app_identity api来确定应用程序ID是什么,并进行适当加载。

这里有一个相当全面的示例-> https://developers.google.com/api-client-library/python/guide/aaa_client_secrets

这是一些示例代码:

# declare your app ids as globals ...
APPID_LIVE = 'awesomeapp'
APPID_DEV = 'awesomeapp-dev'
APPID_PILOT = 'awesomeapp-pilot'

# create a dictionary mapping the app_ids to the filepaths ...
client_secrets_map = {APPID_LIVE:'client_secrets_live.json',
                      APPID_DEV:'client_secrets_dev.json',
                      APPID_PILOT:'client_secrets_pilot.json'}

# get the filename based on the current app_id ...
client_secrets_filename = client_secrets_map.get(
    app_identity.get_application_id(),
    APPID_DEV # fall back to dev
    )

# use the filename to construct the flow ...
flow = flow_from_clientsecrets(filename=client_secrets_filename,
                               scope=scope,
                               redirect_uri=redirect_uri)

# or, you could load up the json file manually if you need more control ...
f = open(client_secrets_filename, 'r')
client_secrets = json.loads(f.read())
f.close()

Best way to do it, is store the keys in a client_secrets.json file, and exclude that from being uploaded to git by listing it in your .gitignore file. If you have different keys for different environments, you can use app_identity api to determine what the app id is, and load appropriately.

There is a fairly comprehensive example here -> https://developers.google.com/api-client-library/python/guide/aaa_client_secrets.

Here’s some example code:

# declare your app ids as globals ...
APPID_LIVE = 'awesomeapp'
APPID_DEV = 'awesomeapp-dev'
APPID_PILOT = 'awesomeapp-pilot'

# create a dictionary mapping the app_ids to the filepaths ...
client_secrets_map = {APPID_LIVE:'client_secrets_live.json',
                      APPID_DEV:'client_secrets_dev.json',
                      APPID_PILOT:'client_secrets_pilot.json'}

# get the filename based on the current app_id ...
client_secrets_filename = client_secrets_map.get(
    app_identity.get_application_id(),
    APPID_DEV # fall back to dev
    )

# use the filename to construct the flow ...
flow = flow_from_clientsecrets(filename=client_secrets_filename,
                               scope=scope,
                               redirect_uri=redirect_uri)

# or, you could load up the json file manually if you need more control ...
f = open(client_secrets_filename, 'r')
client_secrets = json.loads(f.read())
f.close()

回答 4

发布时不存在此功能,但对于在这里偶然发现的其他人,Google现在提供一项称为Secret Manager的服务

这是一个简单的REST服务(当然,其中包含SDK)将您的机密存储在Google云平台上的安全位置。与Data Store相比,这是一种更好的方法,需要额外的步骤来查看存储的机密并具有更细粒度的权限模型-如果需要,您可以针对项目的不同方面以不同的方式保护单个机密。

它提供版本控制,因此您可以相对轻松地处理密码更改,以及强大的查询和管理层,使您能够在必要时在运行时发现和创建机密信息。

Python SDK

用法示例:

from google.cloud import secretmanager_v1beta1 as secretmanager

secret_id = 'my_secret_key'
project_id = 'my_project'
version = 1    # use the management tools to determine version at runtime

client = secretmanager.SecretManagerServiceClient()

secret_path = client.secret_verion_path(project_id, secret_id, version)
response = client.access_secret_version(secret_path)
password_string = response.payload.data.decode('UTF-8')

# use password_string -- set up database connection, call third party service, whatever

This didn’t exist when you posted, but for anyone else who stumbles in here, Google now offers a service called Secret Manager.

It’s a simple REST service (with SDKs wrapping it, of course) to store your secrets in a secure location on google cloud platform. This is a better approach than Data Store, requiring extra steps to see the stored secrets and having a finer-grained permission model — you can secure individual secrets differently for different aspects of your project, if you need to.

It offers versioning, so you can handle password changes with relative ease, as well as a robust query and management layer enabling you to discover and create secrets at runtime, if necessary.

Python SDK

Example usage:

from google.cloud import secretmanager_v1beta1 as secretmanager

secret_id = 'my_secret_key'
project_id = 'my_project'
version = 1    # use the management tools to determine version at runtime

client = secretmanager.SecretManagerServiceClient()

secret_path = client.secret_verion_path(project_id, secret_id, version)
response = client.access_secret_version(secret_path)
password_string = response.payload.data.decode('UTF-8')

# use password_string -- set up database connection, call third party service, whatever

回答 5

此解决方案依赖于已弃用的appcfg.py

将应用程序部署到GAE时,可以使用appcfg.py的-E命令行选项设置环境变量(appcfg.py更新)

$ appcfg.py
...
-E NAME:VALUE, --env_variable=NAME:VALUE
                    Set an environment variable, potentially overriding an
                    env_variable value from app.yaml file (flag may be
                    repeated to set multiple variables).
...

This solution relies on the deprecated appcfg.py

You can use the -E command line option of appcfg.py to setup the environment variables when you deploy your app to GAE (appcfg.py update)

$ appcfg.py
...
-E NAME:VALUE, --env_variable=NAME:VALUE
                    Set an environment variable, potentially overriding an
                    env_variable value from app.yaml file (flag may be
                    repeated to set multiple variables).
...

回答 6

大多数答案已过时。实际上,现在使用Google Cloud Datastore有点不同。https://cloud.google.com/python/getting-started/using-cloud-datastore

这是一个例子:

from google.cloud import datastore
client = datastore.Client()
datastore_entity = client.get(client.key('settings', 'TWITTER_APP_KEY'))
connection_string_prod = datastore_entity.get('value')

假设实体名称为“ TWITTER_APP_KEY”,种类为“设置”,“值”为TWITTER_APP_KEY实体的属性。

Most answers are outdated. Using google cloud datastore is actually a bit different right now. https://cloud.google.com/python/getting-started/using-cloud-datastore

Here’s an example:

from google.cloud import datastore
client = datastore.Client()
datastore_entity = client.get(client.key('settings', 'TWITTER_APP_KEY'))
connection_string_prod = datastore_entity.get('value')

This assumes the entity name is ‘TWITTER_APP_KEY’, the kind is ‘settings’, and ‘value’ is a property of the TWITTER_APP_KEY entity.


回答 7

听起来您可以采取一些方法。我们有一个类似的问题,请执行以下操作(以适合您的用例):

  • 创建一个存储任何动态app.yaml值的文件,并将其放置在构建环境中的安全服务器上。如果您确实偏执,则可以非对称地加密值。如果您需要版本控制/动态拉取,甚至可以将其保存在专用回购中,或者仅使用shell脚本将其复制/从适当的地方拉出。
  • 在部署脚本期间从git中提取
  • 在git pull之后,通过使用yaml库在纯python中读写来修改app.yaml

最简单的方法是使用持续集成服务器,例如HudsonBambooJenkins。只需添加一些插件,脚本步骤或工作流程即可完成我提到的所有上述项目。例如,您可以传入在Bamboo本身中配置的环境变量。

总之,在您只能访问的环境中,只需在构建过程中输入值即可。如果您尚未使构建自动化,则应该这样做。

另一个选项就是您所说的内容,将其放入数据库中。如果您不这样做的原因是操作太慢,则只需将值作为第二层缓存推送到内存缓存中,然后将值作为第一层缓存固定到实例即可。如果值可以更改并且您需要在不重新启动实例的情况下更新实例,则只需保留一个散列即可检查它们何时更改,或者在您进行某些更改后以某种方式触发它。应该的。

It sounds like you can do a few approaches. We have a similar issue and do the following (adapted to your use-case):

  • Create a file that stores any dynamic app.yaml values and place it on a secure server in your build environment. If you are really paranoid, you can asymmetrically encrypt the values. You can even keep this in a private repo if you need version control/dynamic pulling, or just use a shells script to copy it/pull it from the appropriate place.
  • Pull from git during the deployment script
  • After the git pull, modify the app.yaml by reading and writing it in pure python using a yaml library

The easiest way to do this is to use a continuous integration server such as Hudson, Bamboo, or Jenkins. Simply add some plug-in, script step, or workflow that does all the above items I mentioned. You can pass in environment variables that are configured in Bamboo itself for example.

In summary, just push in the values during your build process in an environment you only have access to. If you aren’t already automating your builds, you should be.

Another option option is what you said, put it in the database. If your reason for not doing that is that things are too slow, simply push the values into memcache as a 2nd layer cache, and pin the values to the instances as a first-layer cache. If the values can change and you need to update the instances without rebooting them, just keep a hash you can check to know when they change or trigger it somehow when something you do changes the values. That should be it.


回答 8

您应该使用google kms加密变量,并将其嵌入到源代码中。(https://cloud.google.com/kms/

echo -n the-twitter-app-key | gcloud kms encrypt \
> --project my-project \
> --location us-central1 \
> --keyring THEKEYRING \
> --key THECRYPTOKEY \
> --plaintext-file - \
> --ciphertext-file - \
> | base64

将加扰后的值(加密并以base64编码)放入您的环境变量(在yaml文件中)。

一些Python式代码可帮助您开始解密。

kms_client = kms_v1.KeyManagementServiceClient()
name = kms_client.crypto_key_path_path("project", "global", "THEKEYRING", "THECRYPTOKEY")

twitter_app_key = kms_client.decrypt(name, base64.b64decode(os.environ.get("TWITTER_APP_KEY"))).plaintext

You should encrypt the variables with google kms and embed it in your source code. (https://cloud.google.com/kms/)

echo -n the-twitter-app-key | gcloud kms encrypt \
> --project my-project \
> --location us-central1 \
> --keyring THEKEYRING \
> --key THECRYPTOKEY \
> --plaintext-file - \
> --ciphertext-file - \
> | base64

put the scrambled (encrypted and base64 encoded) value into your environment variable (in yaml file).

Some pythonish code to get you started on decrypting.

kms_client = kms_v1.KeyManagementServiceClient()
name = kms_client.crypto_key_path_path("project", "global", "THEKEYRING", "THECRYPTOKEY")

twitter_app_key = kms_client.decrypt(name, base64.b64decode(os.environ.get("TWITTER_APP_KEY"))).plaintext

回答 9

@Jason F 基于使用Google数据存储的答案很接近,但是基于库docs上的示例用法,代码有些过时了。这是对我有用的代码片段:

from google.cloud import datastore

client = datastore.Client('<your project id>')
key = client.key('<kind e.g settings>', '<entity name>') # note: entity name not property
# get by key for this entity
result = client.get(key)
print(result) # prints all the properties ( a dict). index a specific value like result['MY_SECRET_KEY'])

部分受此中篇文章的启发

@Jason F’s answer based on using Google Datastore is close, but the code is a bit outdated based on the sample usage on the library docs. Here’s the snippet that worked for me:

from google.cloud import datastore

client = datastore.Client('<your project id>')
key = client.key('<kind e.g settings>', '<entity name>') # note: entity name not property
# get by key for this entity
result = client.get(key)
print(result) # prints all the properties ( a dict). index a specific value like result['MY_SECRET_KEY'])

Partly inspired by this Medium post


回答 10

只是想说明一下我是如何在javascript / nodejs中解决此问题的。对于本地开发,我使用了“ dotenv” npm软件包,该软件包将环境变量从.env文件加载到process.env中。当我开始使用GAE时,我了解到需要在“ app.yaml”文件中设置环境变量。好吧,我不想将’dotenv’用于本地开发,而不想将’app.yaml’用于GAE(并在两个文件之间复制我的环境变量),所以我编写了一个小脚本,将app.yaml环境变量加载到进程中.env,用于本地开发。希望这对某人有帮助:

yaml_env.js:

(function () {
    const yaml = require('js-yaml');
    const fs = require('fs');
    const isObject = require('lodash.isobject')

    var doc = yaml.safeLoad(
        fs.readFileSync('app.yaml', 'utf8'), 
        { json: true }
    );

    // The .env file will take precedence over the settings the app.yaml file
    // which allows me to override stuff in app.yaml (the database connection string (DATABASE_URL), for example)
    // This is optional of course. If you don't use dotenv then remove this line:
    require('dotenv/config');

    if(isObject(doc) && isObject(doc.env_variables)) {
        Object.keys(doc.env_variables).forEach(function (key) {
            // Dont set environment with the yaml file value if it's already set
            process.env[key] = process.env[key] || doc.env_variables[key]
        })
    }
})()

现在,尽早将此代码包含在您的代码中,您已完成:

require('../yaml_env')

Just wanted to note how I solved this problem in javascript/nodejs. For local development I used the ‘dotenv’ npm package which loads environment variables from a .env file into process.env. When I started using GAE I learned that environment variables need to be set in a ‘app.yaml’ file. Well, I didn’t want to use ‘dotenv’ for local development and ‘app.yaml’ for GAE (and duplicate my environment variables between the two files), so I wrote a little script that loads app.yaml environment variables into process.env, for local development. Hope this helps someone:

yaml_env.js:

(function () {
    const yaml = require('js-yaml');
    const fs = require('fs');
    const isObject = require('lodash.isobject')

    var doc = yaml.safeLoad(
        fs.readFileSync('app.yaml', 'utf8'), 
        { json: true }
    );

    // The .env file will take precedence over the settings the app.yaml file
    // which allows me to override stuff in app.yaml (the database connection string (DATABASE_URL), for example)
    // This is optional of course. If you don't use dotenv then remove this line:
    require('dotenv/config');

    if(isObject(doc) && isObject(doc.env_variables)) {
        Object.keys(doc.env_variables).forEach(function (key) {
            // Dont set environment with the yaml file value if it's already set
            process.env[key] = process.env[key] || doc.env_variables[key]
        })
    }
})()

Now include this file as early as possible in your code, and you’re done:

require('../yaml_env')

回答 11

扩展马丁的答案

from google.appengine.ext import ndb

class Settings(ndb.Model):
    """
    Get sensitive data setting from DataStore.

    key:String -> value:String
    key:String -> Exception

    Thanks to: Martin Omander @ Stackoverflow
    https://stackoverflow.com/a/35261091/1463812
    """
    name = ndb.StringProperty()
    value = ndb.StringProperty()

    @staticmethod
    def get(name):
        retval = Settings.query(Settings.name == name).get()
        if not retval:
            raise Exception(('Setting %s not found in the database. A placeholder ' +
                             'record has been created. Go to the Developers Console for your app ' +
                             'in App Engine, look up the Settings record with name=%s and enter ' +
                             'its value in that record\'s value field.') % (name, name))
        return retval.value

    @staticmethod
    def set(name, value):
        exists = Settings.query(Settings.name == name).get()
        if not exists:
            s = Settings(name=name, value=value)
            s.put()
        else:
            exists.value = value
            exists.put()

        return True

Extending Martin’s answer

from google.appengine.ext import ndb

class Settings(ndb.Model):
    """
    Get sensitive data setting from DataStore.

    key:String -> value:String
    key:String -> Exception

    Thanks to: Martin Omander @ Stackoverflow
    https://stackoverflow.com/a/35261091/1463812
    """
    name = ndb.StringProperty()
    value = ndb.StringProperty()

    @staticmethod
    def get(name):
        retval = Settings.query(Settings.name == name).get()
        if not retval:
            raise Exception(('Setting %s not found in the database. A placeholder ' +
                             'record has been created. Go to the Developers Console for your app ' +
                             'in App Engine, look up the Settings record with name=%s and enter ' +
                             'its value in that record\'s value field.') % (name, name))
        return retval.value

    @staticmethod
    def set(name, value):
        exists = Settings.query(Settings.name == name).get()
        if not exists:
            s = Settings(name=name, value=value)
            s.put()
        else:
            exists.value = value
            exists.put()

        return True

回答 12

有一个名为gae_env的pypi软件包,可让您将Appengine环境变量保存在Cloud Datastore中。在后台,它还使用Memcache,因此其速度很快

用法:

import gae_env

API_KEY = gae_env.get('API_KEY')

如果数据存储中有该键的值,则将其返回。如果没有,__NOT_SET__将创建一个占位符记录并ValueNotSetError抛出一个。该异常将提醒您转到开发人员控制台并更新占位符记录。


与Martin的答案类似,这是如何更新数据存储区中键的值:

  1. 转到开发人员控制台中的“ 数据存储”部分

  2. 如果尚未选择项目,请在页面顶部选择它。

  3. 在“ 种类”下拉框中,选择GaeEnvSettings

  4. 引发异常的键将具有价值__NOT_SET__


转到软件包的GitHub页面以获取有关用法/配置的更多信息

There is a pypi package called gae_env that allows you to save appengine environment variables in Cloud Datastore. Under the hood, it also uses Memcache so its fast

Usage:

import gae_env

API_KEY = gae_env.get('API_KEY')

If there is a value for that key in the datastore, it will be returned. If there isn’t, a placeholder record __NOT_SET__ will be created and a ValueNotSetError will be thrown. The exception will remind you to go to the Developers Console and update the placeholder record.


Similar to Martin’s answer, here is how to update the value for the key in Datastore:

  1. Go to Datastore Section in the developers console

  2. Select your project at the top of the page if it’s not already selected.

  3. In the Kind dropdown box, select GaeEnvSettings.

  4. Keys for which an exception was raised will have value __NOT_SET__.


Go to the package’s GitHub page for more info on usage/configuration


如何在Python中使用正则表达式验证URL?

问题:如何在Python中使用正则表达式验证URL?

我正在Google App Engine上构建应用程序。我是Python的新手,在过去3天里,我一直对下面的问题problem之以鼻。

我有一个表示RSS Feed的类,在这个类中,我有一个名为setUrl的方法。输入此方法的是URL。

我正在尝试使用re python模块来验证RFC 3986 Reg-ex(http://www.ietf.org/rfc/rfc3986.txt

下面是一个片段,哪个应该工作?

p = re.compile('^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')
m = p.match(url)
if m:
  self.url = url
  return url

I’m building an app on Google App Engine. I’m incredibly new to Python and have been beating my head against the following problem for the past 3 days.

I have a class to represent an RSS Feed and in this class I have a method called setUrl. Input to this method is a URL.

I’m trying to use the re python module to validate off of the RFC 3986 Reg-ex (http://www.ietf.org/rfc/rfc3986.txt)

Below is a snipped which should work?

p = re.compile('^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')
m = p.match(url)
if m:
  self.url = url
  return url

回答 0

urlparsepy2py3)模块是解析(和验证)URL的一种简单方法。

正则表达式是太多的工作。


没有“验证”方法,因为几乎所有内容都是有效的URL。有一些标点符号规则可以将其拆分。没有标点符号,您仍然有一个有效的URL。

仔细检查RFC,看看是否可以构造“无效” URL。规则非常灵活。

例如:::::,一个有效的URL。路径是":::::"。漂亮的文件名,但是有效的文件名。

此外,/////也是有效的网址。netloc(“主机名”)为""。路径是"///"。再次,愚蠢。也有效。此URL规范化为"///"等效的URL 。

类似的东西"bad://///worse/////"是完全有效的。哑巴但有效。

底线。对其进行解析,然后查看片段,看看它们是否在某种程度上令人不快。

您是否希望方案始终为“ http”?您是否希望netloc始终为“ www.somename.somedomain”?您是否要让路径看起来像Unix?还是像窗户?是否要删除查询字符串?还是保留它?

这些不是RFC指定的验证。这些是您的应用程序独有的验证。

An easy way to parse (and validate) URL’s is the urlparse (py2, py3) module.

A regex is too much work.


There’s no “validate” method because almost anything is a valid URL. There are some punctuation rules for splitting it up. Absent any punctuation, you still have a valid URL.

Check the RFC carefully and see if you can construct an “invalid” URL. The rules are very flexible.

For example ::::: is a valid URL. The path is ":::::". A pretty stupid filename, but a valid filename.

Also, ///// is a valid URL. The netloc (“hostname”) is "". The path is "///". Again, stupid. Also valid. This URL normalizes to "///" which is the equivalent.

Something like "bad://///worse/////" is perfectly valid. Dumb but valid.

Bottom Line. Parse it, and look at the pieces to see if they’re displeasing in some way.

Do you want the scheme to always be “http”? Do you want the netloc to always be “www.somename.somedomain”? Do you want the path to look unix-like? Or windows-like? Do you want to remove the query string? Or preserve it?

These are not RFC-specified validations. These are validations unique to your application.


回答 1

这是解析URL的完整正则表达式。

(?:http://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.
)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)
){3}))(?::(?:\d+))?)(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F
\d]{2}))|[;:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{
2}))|[;:@&=])*))*)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{
2}))|[;:@&=])*))?)?)|(?:ftp://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?
:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-
fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-
)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?
:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!
*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*)(?:;type=[AIDaid])?)?)|(?:news:(?:
(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;/?:&=])+@(?:(?:(
?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[
a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3})))|(?:[a-zA-Z](
?:[a-zA-Z\d]|[_.+-])*)|\*))|(?:nntp://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[
a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d
])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:[a-zA-Z](?:[a-zA-Z
\d]|[_.+-])*)(?:/(?:\d+))?)|(?:telnet://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a
-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d]
)?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))/?)|(?:gopher://(?:(?:
(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:
(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+
))?)(?:/(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))(?:(?:(?:[
a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*)(?:%09(?:(?:(?:[a-zA
-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*)(?:%09(?:(?:[a-zA-Z\d$
\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*))?)?)?)?)|(?:wais://(?:(?:(?:
(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:
[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?
)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)(?:(?:/(?:(?:[a-zA
-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(
?:%[a-fA-F\d]{2}))*))|\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]
{2}))|[;:@&=])*))?)|(?:mailto:(?:(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%
[a-fA-F\d]{2}))+))|(?:file://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]
|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:
(?:\d+)(?:\.(?:\d+)){3}))|localhost)?/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(
?:%[a-fA-F\d]{2}))|[?:@&=])*))*))|(?:prospero://(?:(?:(?:(?:(?:[a-zA-Z
\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)
*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:(?:(?:(?
:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-
zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*)(?:(?:;(?:(?:(?:[
a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&])*)=(?:(?:(?:[a-zA-Z\d
$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&])*)))*)|(?:ldap://(?:(?:(?:(?:
(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:
[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?
))?/(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d])
)|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%2
0)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F
\d]{2}))*))(?:(?:(?:%0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?
:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID
|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])
?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*)(?:(
?:(?:(?:%0[Aa])?(?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))(?:(?:(?:(?:(
?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|o
id)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(
?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*))(?:(?:(?:
%0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?:(?:(?:[a-zA-Z\d]|%(
?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:
\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a
-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*))*(?:(?:(?:%0[Aa])?(?:%2
0)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))?)(?:\?(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:,(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-f
A-F\d]{2}))+))*)?)(?:\?(?:base|one|sub)(?:\?(?:((?:[a-zA-Z\d$\-_.+!*'(
),;/?:@&=]|(?:%[a-fA-F\d]{2}))+)))?)?)?)|(?:(?:z39\.50[rs])://(?:(?:(?
:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?
:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))
?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:\+(?:(?:
[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*(?:\?(?:(?:[a-zA-Z\d$\-_
.+!*'(),]|(?:%[a-fA-F\d]{2}))+))?)?(?:;esn=(?:(?:[a-zA-Z\d$\-_.+!*'(),
]|(?:%[a-fA-F\d]{2}))+))?(?:;rs=(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA
-F\d]{2}))+)(?:\+(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*)
?))|(?:cid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=
])*))|(?:mid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@
&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=]
)*))?)|(?:vemmi://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z
\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\
.(?:\d+)){3}))(?::(?:\d+))?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a
-fA-F\d]{2}))|[/?:@&=])*)(?:(?:;(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a
-fA-F\d]{2}))|[/?:@&])*)=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d
]{2}))|[/?:@&])*))*))?)|(?:imap://(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+)(?:(?:;[Aa][Uu][Tt][Hh]=(?:\*|(?:(
?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+))))?)|(?:(?:;[
Aa][Uu][Tt][Hh]=(?:\*|(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2
}))|[&=~])+)))(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[
&=~])+))?))@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])
?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:
\d+)){3}))(?::(?:\d+))?))/(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:
%[a-fA-F\d]{2}))|[&=~:@/])+)?;[Tt][Yy][Pp][Ee]=(?:[Ll](?:[Ii][Ss][Tt]|
[Ss][Uu][Bb])))|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))
|[&=~:@/])+)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[
&=~:@/])+))?(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-
9]\d*)))?)|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~
:@/])+)(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-9]\d*
)))?(?:/;[Uu][Ii][Dd]=(?:[1-9]\d*))(?:(?:/;[Ss][Ee][Cc][Tt][Ii][Oo][Nn
]=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~:@/])+)))?))
)?)|(?:nfs:(?:(?://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-
Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:
\.(?:\d+)){3}))(?::(?:\d+))?)(?:(?:/(?:(?:(?:(?:(?:[a-zA-Z\d\$\-_.!~*'
(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),
])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))?)|(?:/(?:(?:(?:(?:(?:[a-zA-Z\d
\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$\
-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?))|(?:(?:(?:(?:(?:[a-zA-
Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d
\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))

考虑到它的复杂性,我认为您应该使用urlparse方法。

为了完整起见,以下是上述正则表达式的伪BNF(作为文档):

; URL的通用形式为:

genericurl =方案“:” schemepart

; 这里定义了特定的预定义方案。新计划
; 可能已在IANA上注册

url = httpurl | ftpurl | 新闻网址|
                 nntpurl | telneturl | gopherurl |
                 waisurl | mailtourl | fileurl |
                 繁荣| 其他网址

; 新方案遵循通用语法
otherurl =通用网址

; 该计划是小写的;口译员应忽略大小写
方案= 1 * [lowalpha | 数字| “ +” | “-” | “。” ]
schemepart = * xchar | ip方案


; 基于ip协议的URL scheme部分:

ip-schemepart =“ //”登录[“ /” urlpath]

登录名= [用户[“:”密码]“ @”]主机端口
hostport =主机[“:”端口]
主机=主机名| 主机号码
主机名= * [domainlabel“。” ] toplabel
domainlabel =字母数字| 字母数字* [字母数字| “-”]字母数字
toplabel = alpha | 字母* [字母数字| “-”]字母数字
字母数字=字母| 数字
主机号=数字“。” 数字“。” 数字“。” 数字
端口=数字
用户= * [uchar | “;” | “?” | “&” | “ =”]
密码= * [uchar | “;” | “?” | “&” | “ =”]
urlpath = * xchar; 取决于协议,请参阅第3.1节

; 预定义的方案:

; FTP(另请参阅RFC959)

ftpurl =“ ftp://”登录[“ /” fpath [“; type =” ftptype]]
fpath = fsegment * [“ /” fsegment]
fsegment = * [uchar | “?” | “:” | “ @” | “&” | “ =”]
ftptype =“ A” | “我” | “ D” | “ a” | “我” | “ d”

; 文件

fileurl =“ file://” [主机| “本地主机”]“ /” fpath

; HTTP

httpurl =“ http://”主机端口[“ /” hpath [“?” 搜索]]
hpath = hsegment * [“ /” hsegment]
hsegment = * [uchar | “;” | “:” | “ @” | “&” | “ =”]
搜索= * [uchar | “;” | “:” | “ @” | “&” | “ =”]

; GOPHER(另请参阅RFC1436)

gopherurl =“ gopher://”主机端口[/ [gtype [选择器
                 [“%09”搜索[“%09” gopher + _string]]]]]
gtype = xchar
选择器= * xchar
gopher + _string = * xchar

; MAILTO(另请参阅RFC822)

mailtourl =“ mailto:”已编码822addr
encode822addr = 1 * xchar; 在RFC822中进一步定义

; 新闻(另请参阅RFC1036)

newsurl =“新闻:” grouppart
grouppart =“ *” | 组| 文章
组= alpha * [alpha | 数字| “-” | “。” | “ +” | “ _”]
文章= 1 * [uchar | “;” | “ /” | “?” | “:” | “&” | “ =”]“ @”主机

; NNTP(另请参阅RFC977)

nntpurl =“ nntp://”主机端口“ /”组[“ /”数字]

; 电信网

telneturl =“ telnet://”登录[“ /”]

; WAIS(另请参阅RFC1625)

waisurl = wais数据库| waisindex | 怀斯多克
waisdatabase =“ wais://”主机端口“ /”数据库
waisindex =“ wais://”主机端口“ /”数据库“?” 搜索
waisdoc =“ wais://”主机端口“ /”数据库“ /” wtype“ /” wpath
数据库= * uchar
wtype = * uchar
wpath = * uchar

; PROSPERO

prosperourl =“ prospero://”主机端口“ /” ppath * [fieldspec]
ppath = psegment * [“ /” psegment]
psegment = * [uchar | “?” | “:” | “ @” | “&” | “ =”]
fieldspec =“;” fieldname“ =” fieldvalue
fieldname = * [uchar | “?” | “:” | “ @” | “&”]
fieldvalue = * [uchar | “?” | “:” | “ @” | “&”]

; 其他定义

lowalpha =“ a” | “ b” | “ c” | “ d” | “ e” | “ f” | “ g” | “ h” |
                 “我” | “ j” | “ k” | “ l” | “ m” | “ n” | “ o” | “ p” |
                 “ q” | “ r” | “ s” | “ t” | “ u” | “ v” | “ w” | “ x” |
                 “ y” | “ z”
hialpha =“ A” | “ B” | “ C” | “ D” | “ E” | “ F” | “ G” | “ H” | “我” |
                 “ J” | “ K” | “ L” | “ M” | “ N” | “ O” | “ P” | “ Q” | “ R” |
                 “ S” | “ T” | “ U” | “ V” | “ W” | “ X” | “ Y” | “ Z”
alpha = lowalpha | Hialpha
digit =“ 0” | “ 1” | “ 2” | “ 3” | “ 4” | “ 5” | “ 6” | “ 7” |
                 “ 8” | “ 9”
安全=“ $” | “-” | “ _” | “。” | “ +”
extra =“!” | “ *” | “'” | “(” |“)” | “,”
national =“ {” | “}” | “ |” | “ \” | “ ^” | “〜” | “ [” | “]” | “`”
标点符号=“” | “#” | “%” |


保留=“;” | “ /” | “?” | “:” | “ @” | “&” | “ =”
十六进制=数字| “ A” | “ B” | “ C” | “ D” | “ E” | “ F” |
                 “ a” | “ b” | “ c” | “ d” | “ e” | “F”
转义=“%”十六进制十六进制

未保留= alpha | 数字| 安全| 额外
uchar =保留| 逃逸
xchar =保留| 保留| 逃逸
位数= 1 *位数

Here’s the complete regexp to parse a URL.

(?:http://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.
)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)
){3}))(?::(?:\d+))?)(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F
\d]{2}))|[;:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{
2}))|[;:@&=])*))*)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{
2}))|[;:@&=])*))?)?)|(?:ftp://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?
:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-
fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-
)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?
:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!
*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*)(?:;type=[AIDaid])?)?)|(?:news:(?:
(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;/?:&=])+@(?:(?:(
?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[
a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3})))|(?:[a-zA-Z](
?:[a-zA-Z\d]|[_.+-])*)|\*))|(?:nntp://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[
a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d
])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:[a-zA-Z](?:[a-zA-Z
\d]|[_.+-])*)(?:/(?:\d+))?)|(?:telnet://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a
-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d]
)?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))/?)|(?:gopher://(?:(?:
(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:
(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+
))?)(?:/(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))(?:(?:(?:[
a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*)(?:%09(?:(?:(?:[a-zA
-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*)(?:%09(?:(?:[a-zA-Z\d$
\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*))?)?)?)?)|(?:wais://(?:(?:(?:
(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:
[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?
)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)(?:(?:/(?:(?:[a-zA
-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(
?:%[a-fA-F\d]{2}))*))|\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]
{2}))|[;:@&=])*))?)|(?:mailto:(?:(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%
[a-fA-F\d]{2}))+))|(?:file://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]
|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:
(?:\d+)(?:\.(?:\d+)){3}))|localhost)?/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'()
,]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(
?:%[a-fA-F\d]{2}))|[?:@&=])*))*))|(?:prospero://(?:(?:(?:(?:(?:[a-zA-Z
\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)
*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)/(?:(?:(?:(?
:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?:[a-
zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*)(?:(?:;(?:(?:(?:[
a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&])*)=(?:(?:(?:[a-zA-Z\d
$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&])*)))*)|(?:ldap://(?:(?:(?:(?:
(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:
[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?
))?/(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d])
)|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%2
0)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F
\d]{2}))*))(?:(?:(?:%0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?
:(?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID
|oid)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])
?(?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*)(?:(
?:(?:(?:%0[Aa])?(?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))(?:(?:(?:(?:(
?:(?:[a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|o
id)\.(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(
?:%20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*))(?:(?:(?:
%0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?:(?:(?:[a-zA-Z\d]|%(
?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?:
\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?:[a
-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*))*(?:(?:(?:%0[Aa])?(?:%2
0)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))?)(?:\?(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:,(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-f
A-F\d]{2}))+))*)?)(?:\?(?:base|one|sub)(?:\?(?:((?:[a-zA-Z\d$\-_.+!*'(
),;/?:@&=]|(?:%[a-fA-F\d]{2}))+)))?)?)?)|(?:(?:z39\.50[rs])://(?:(?:(?
:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?
:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))
?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:\+(?:(?:
[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*(?:\?(?:(?:[a-zA-Z\d$\-_
.+!*'(),]|(?:%[a-fA-F\d]{2}))+))?)?(?:;esn=(?:(?:[a-zA-Z\d$\-_.+!*'(),
]|(?:%[a-fA-F\d]{2}))+))?(?:;rs=(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA
-F\d]{2}))+)(?:\+(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*)
?))|(?:cid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=
])*))|(?:mid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@
&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?:@&=]
)*))?)|(?:vemmi://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z
\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\
.(?:\d+)){3}))(?::(?:\d+))?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a
-fA-F\d]{2}))|[/?:@&=])*)(?:(?:;(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a
-fA-F\d]{2}))|[/?:@&])*)=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d
]{2}))|[/?:@&])*))*))?)|(?:imap://(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+
!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+)(?:(?:;[Aa][Uu][Tt][Hh]=(?:\*|(?:(
?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+))))?)|(?:(?:;[
Aa][Uu][Tt][Hh]=(?:\*|(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2
}))|[&=~])+)))(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[
&=~])+))?))@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])
?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:
\d+)){3}))(?::(?:\d+))?))/(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:
%[a-fA-F\d]{2}))|[&=~:@/])+)?;[Tt][Yy][Pp][Ee]=(?:[Ll](?:[Ii][Ss][Tt]|
[Ss][Uu][Bb])))|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))
|[&=~:@/])+)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[
&=~:@/])+))?(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-
9]\d*)))?)|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~
:@/])+)(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-9]\d*
)))?(?:/;[Uu][Ii][Dd]=(?:[1-9]\d*))(?:(?:/;[Ss][Ee][Cc][Tt][Ii][Oo][Nn
]=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~:@/])+)))?))
)?)|(?:nfs:(?:(?://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-
Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:
\.(?:\d+)){3}))(?::(?:\d+))?)(?:(?:/(?:(?:(?:(?:(?:[a-zA-Z\d\$\-_.!~*'
(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),
])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))?)|(?:/(?:(?:(?:(?:(?:[a-zA-Z\d
\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d\$\
-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?))|(?:(?:(?:(?:(?:[a-zA-
Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\d
\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))

Given its complexibility, I think you should go the urlparse way.

For completeness, here’s the pseudo-BNF of the above regex (as a documentation):

; The generic form of a URL is:

genericurl     = scheme ":" schemepart

; Specific predefined schemes are defined here; new schemes
; may be registered with IANA

url            = httpurl | ftpurl | newsurl |
                 nntpurl | telneturl | gopherurl |
                 waisurl | mailtourl | fileurl |
                 prosperourl | otherurl

; new schemes follow the general syntax
otherurl       = genericurl

; the scheme is in lower case; interpreters should use case-ignore
scheme         = 1*[ lowalpha | digit | "+" | "-" | "." ]
schemepart     = *xchar | ip-schemepart


; URL schemeparts for ip based protocols:

ip-schemepart  = "//" login [ "/" urlpath ]

login          = [ user [ ":" password ] "@" ] hostport
hostport       = host [ ":" port ]
host           = hostname | hostnumber
hostname       = *[ domainlabel "." ] toplabel
domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel       = alpha | alpha *[ alphadigit | "-" ] alphadigit
alphadigit     = alpha | digit
hostnumber     = digits "." digits "." digits "." digits
port           = digits
user           = *[ uchar | ";" | "?" | "&" | "=" ]
password       = *[ uchar | ";" | "?" | "&" | "=" ]
urlpath        = *xchar    ; depends on protocol see section 3.1

; The predefined schemes:

; FTP (see also RFC959)

ftpurl         = "ftp://" login [ "/" fpath [ ";type=" ftptype ]]
fpath          = fsegment *[ "/" fsegment ]
fsegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
ftptype        = "A" | "I" | "D" | "a" | "i" | "d"

; FILE

fileurl        = "file://" [ host | "localhost" ] "/" fpath

; HTTP

httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]
hpath          = hsegment *[ "/" hsegment ]
hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

; GOPHER (see also RFC1436)

gopherurl      = "gopher://" hostport [ / [ gtype [ selector
                 [ "%09" search [ "%09" gopher+_string ] ] ] ] ]
gtype          = xchar
selector       = *xchar
gopher+_string = *xchar

; MAILTO (see also RFC822)

mailtourl      = "mailto:" encoded822addr
encoded822addr = 1*xchar               ; further defined in RFC822

; NEWS (see also RFC1036)

newsurl        = "news:" grouppart
grouppart      = "*" | group | article
group          = alpha *[ alpha | digit | "-" | "." | "+" | "_" ]
article        = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host

; NNTP (see also RFC977)

nntpurl        = "nntp://" hostport "/" group [ "/" digits ]

; TELNET

telneturl      = "telnet://" login [ "/" ]

; WAIS (see also RFC1625)

waisurl        = waisdatabase | waisindex | waisdoc
waisdatabase   = "wais://" hostport "/" database
waisindex      = "wais://" hostport "/" database "?" search
waisdoc        = "wais://" hostport "/" database "/" wtype "/" wpath
database       = *uchar
wtype          = *uchar
wpath          = *uchar

; PROSPERO

prosperourl    = "prospero://" hostport "/" ppath *[ fieldspec ]
ppath          = psegment *[ "/" psegment ]
psegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
fieldspec      = ";" fieldname "=" fieldvalue
fieldname      = *[ uchar | "?" | ":" | "@" | "&" ]
fieldvalue     = *[ uchar | "?" | ":" | "@" | "&" ]

; Miscellaneous definitions

lowalpha       = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                 "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                 "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                 "y" | "z"
hialpha        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
alpha          = lowalpha | hialpha
digit          = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"
safe           = "$" | "-" | "_" | "." | "+"
extra          = "!" | "*" | "'" | "(" | ")" | ","
national       = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation    = "" | "#" | "%" | 


reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex            = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                 "a" | "b" | "c" | "d" | "e" | "f"
escape         = "%" hex hex

unreserved     = alpha | digit | safe | extra
uchar          = unreserved | escape
xchar          = unreserved | reserved | escape
digits         = 1*digit

回答 2

我使用的是Django所使用的,看来效果很好:

def is_valid_url(url):
    import re
    regex = re.compile(
        r'^https?://'  # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)
    return url is not None and regex.search(url)

您随时可以在这里查看最新版本:https//github.com/django/django/blob/master/django/core/validators.py#L74

I’m using the one used by Django and it seems to work pretty well:

def is_valid_url(url):
    import re
    regex = re.compile(
        r'^https?://'  # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)
    return url is not None and regex.search(url)

You can always check the latest version here: https://github.com/django/django/blob/master/django/core/validators.py#L74


回答 3

我承认,我发现你的正则表达完全无法理解。我想知道您是否可以使用urlparse代替?就像是:

pieces = urlparse.urlparse(url)
assert all([pieces.scheme, pieces.netloc])
assert set(pieces.netloc) <= set(string.letters + string.digits + '-.')  # and others?
assert pieces.scheme in ['http', 'https', 'ftp']  # etc.

它可能会比较慢,并且可能会错过条件,但是(对我而言)它似乎比URL的正则表达式更易于阅读和调试。

I admit, I find your regular expression totally incomprehensible. I wonder if you could use urlparse instead? Something like:

pieces = urlparse.urlparse(url)
assert all([pieces.scheme, pieces.netloc])
assert set(pieces.netloc) <= set(string.letters + string.digits + '-.')  # and others?
assert pieces.scheme in ['http', 'https', 'ftp']  # etc.

It might be slower, and maybe you’ll miss conditions, but it seems (to me) a lot easier to read and debug than a regular expression for URLs.


回答 4

urlparse很高兴使用无效的URL,它比任何一种验证器都更像是一个字符串字符串拆分库。例如:

from urlparse import urlparse
urlparse('http://----')
# returns: ParseResult(scheme='http', netloc='----', path='', params='', query='', fragment='')

根据情况,这可能很好。

如果您最信任数据,并且只想验证协议为HTTP,则 urlparse就是完美的选择。

如果要使该URL实际上是合法URL,请使用可笑的正则表达式

如果您想确保它是一个真实的网址,

import urllib
try:
    urllib.urlopen(url)
except IOError:
    print "Not a real URL"

urlparse quite happily takes invalid URLs, it is more a string string-splitting library than any kind of validator. For example:

from urlparse import urlparse
urlparse('http://----')
# returns: ParseResult(scheme='http', netloc='----', path='', params='', query='', fragment='')

Depending on the situation, this might be fine..

If you mostly trust the data, and just want to verify the protocol is HTTP, then urlparse is perfect.

If you want to make the URL is actually a legal URL, use the ridiculous regex

If you want to make sure it’s a real web address,

import urllib
try:
    urllib.urlopen(url)
except IOError:
    print "Not a real URL"

回答 5

http://pypi.python.org/pypi/rfc3987提供了正则表达式,以与RFC 3986和RFC 3987中的规则保持一致(即不与特定于方案的规则保持一致)。

IRI_reference的正则表达式为:

(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*):(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[
a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U0002
0000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U
00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009ff
fd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U00
0dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\
\[(?:(?:[0-9A-F]{1,4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4]
[0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0
-9A-F]{1,4}:){5}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]
?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(
?:[0-9A-F]{1,4}:){4}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|
[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F
]{1,4}:)?[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?
:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[
0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:
[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3
}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,
4})?::(?:[0-9A-F]{1,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0
-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-
9A-F]{1,4}:){,4}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]
|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|
(?:(?:[0-9A-F]{1,4}:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6
}[0-9A-F]{1,4})?::|v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(
?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][
0-9]?))|(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\
U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U000500
00-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00
090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd
\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(
?::(?P<port>[0-9]*))?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\uf
dcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\
U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007f
ffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U0
00bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-
F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7
ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000
-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U0007
0000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U
000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000eff
fd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff
\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\
U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U000700
00-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U00
0b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd
])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[\
xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U
00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006ff
fd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U00
0afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-
\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa
0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00
030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd
\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000a
fffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U
000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery
>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U000
1fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\
U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U000900
00-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U00
0d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[\
ue000-\uf8ff\U000f0000-\U000ffffd\U00100000-\U0010fffd]|/|\\?)*))?(?:\\#(?P<ifra
gment>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-
\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050
000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U0
0090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfff
d\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|
@)|/|\\?)*))?|(?:(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[a-zA-Z0-9._~-]|[\xa
0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00
030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd
\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000a
fffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U
000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\\[(?:(?:[0-9A-F]{1,
4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-
9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0-9A-F]{1,4}:){5}(?:
[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3
}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(?:[0-9A-F]{1,4}:){4
}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\
.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:)?[0-9A-F]{1
,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-
4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?
:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:[0-9A-F]{1,4}:[0-9A
-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][
0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,4})?::(?:[0-9A-F]{1
,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]
?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,4}[0-
9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[
0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}
:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6}[0-9A-F]{1,4})?::|
v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(?:25[0-5]|2[0-4][0-
9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[a-zA
-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000
-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U0006
0000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U
000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dff
fd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(?::(?P<port>[0-9]*)
)?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U0
0010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fff
d\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U000
8fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\
U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*
+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufd
f0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U000400
00-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00
080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd
\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A
-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0
-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000
-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U0008
0000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U
000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F
]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\u
fdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd
\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007
fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U
000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A
-F][0-9A-F]|[!$&'()*+,;=]|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf
\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00
040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd
\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000b
fffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][
0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery>(?:(?:(?:[a-zA-Z0-9.
_~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U000
2fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\
U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a00
00-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U00
0e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[\ue000-\uf8ff\U000f000
0-\U000ffffd\U00100000-\U0010fffd]|/|\\?)*))?(?:\\#(?P<ifragment>(?:(?:(?:[a-zA-
Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-
\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060
000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U0
00a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfff
d\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|/|\\?)*))?)

一行:

(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*):(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\\[(?:(?:[0-9A-F]{1,4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0-9A-F]{1,4}:){5}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(?:[0-9A-F]{1,4}:){4}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:)?[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,4}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6}[0-9A-F]{1,4})?::|v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(?::(?P<port>[0-9]*))?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[\ue000-\uf8ff\U000f0000-\U000ffffd\U00100000-\U0010fffd]|/|\\?)*))?(?:\\#(?P<ifragment>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|/|\\?)*))?|(?:(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\\[(?:(?:[0-9A-F]{1,4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0-9A-F]{1,4}:){5}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(?:[0-9A-F]{1,4}:){4}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:)?[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,4}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6}[0-9A-F]{1,4})?::|v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(?::(?P<port>[0-9]*))?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[\ue000-\uf8ff\U000f0000-\U000ffffd\U00100000-\U0010fffd]|/|\\?)*))?(?:\\#(?P<ifragment>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|/|\\?)*))?)

http://pypi.python.org/pypi/rfc3987 gives regular expressions for consistency with the rules in RFC 3986 and RFC 3987 (that is, not with scheme-specific rules).

A regexp for IRI_reference is:

(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*):(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[
a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U0002
0000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U
00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009ff
fd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U00
0dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\
\[(?:(?:[0-9A-F]{1,4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4]
[0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0
-9A-F]{1,4}:){5}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]
?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(
?:[0-9A-F]{1,4}:){4}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|
[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F
]{1,4}:)?[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?
:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[
0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:
[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3
}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,
4})?::(?:[0-9A-F]{1,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0
-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-
9A-F]{1,4}:){,4}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]
|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|
(?:(?:[0-9A-F]{1,4}:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6
}[0-9A-F]{1,4})?::|v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(
?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][
0-9]?))|(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\
U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U000500
00-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00
090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd
\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(
?::(?P<port>[0-9]*))?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\uf
dcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\
U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007f
ffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U0
00bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-
F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7
ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000
-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U0007
0000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U
000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000eff
fd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff
\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\
U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U000700
00-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U00
0b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd
])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[\
xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U
00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006ff
fd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U00
0afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-
\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa
0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00
030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd
\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000a
fffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U
000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery
>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U000
1fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\
U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U000900
00-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U00
0d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[\
ue000-\uf8ff\U000f0000-\U000ffffd\U00100000-\U0010fffd]|/|\\?)*))?(?:\\#(?P<ifra
gment>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-
\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050
000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U0
0090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfff
d\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|
@)|/|\\?)*))?|(?:(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[a-zA-Z0-9._~-]|[\xa
0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00
030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd
\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000a
fffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U
000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\\[(?:(?:[0-9A-F]{1,
4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-
9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0-9A-F]{1,4}:){5}(?:
[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3
}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(?:[0-9A-F]{1,4}:){4
}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\
.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:)?[0-9A-F]{1
,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-
4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?
:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:[0-9A-F]{1,4}:[0-9A
-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][
0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,4})?::(?:[0-9A-F]{1
,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]
?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,4}[0-
9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[
0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}
:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6}[0-9A-F]{1,4})?::|
v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(?:25[0-5]|2[0-4][0-
9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[a-zA
-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000
-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U0006
0000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U
000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dff
fd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(?::(?P<port>[0-9]*)
)?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U0
0010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fff
d\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U000
8fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\
U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*
+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufd
f0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U000400
00-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00
080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd
\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A
-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0
-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000
-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U0008
0000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U
000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F
]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\u
fdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd
\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007
fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U
000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A
-F][0-9A-F]|[!$&'()*+,;=]|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf
\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00
040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd
\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000b
fffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][
0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery>(?:(?:(?:[a-zA-Z0-9.
_~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U000
2fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\
U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a00
00-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U00
0e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[\ue000-\uf8ff\U000f000
0-\U000ffffd\U00100000-\U0010fffd]|/|\\?)*))?(?:\\#(?P<ifragment>(?:(?:(?:[a-zA-
Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-
\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060
000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U0
00a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfff
d\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|/|\\?)*))?)

In one line:

(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*):(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\\[(?:(?:[0-9A-F]{1,4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0-9A-F]{1,4}:){5}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(?:[0-9A-F]{1,4}:){4}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:)?[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,4}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6}[0-9A-F]{1,4})?::|v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(?::(?P<port>[0-9]*))?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[\ue000-\uf8ff\U000f0000-\U000ffffd\U00100000-\U0010fffd]|/|\\?)*))?(?:\\#(?P<ifragment>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|/|\\?)*))?|(?:(?://(?P<iauthority>(?:(?P<iuserinfo>(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:)*)@)?(?P<ihost>\\[(?:(?:[0-9A-F]{1,4}:){6}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|::(?:[0-9A-F]{1,4}:){5}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|[0-9A-F]{1,4}?::(?:[0-9A-F]{1,4}:){4}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:)?[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){3}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,2}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:){2}(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,3}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:)(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,4}[0-9A-F]{1,4})?::(?:[0-9A-F]{1,4}:[0-9A-F]{1,4}|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))|(?:(?:[0-9A-F]{1,4}:){,5}[0-9A-F]{1,4})?::[0-9A-F]{1,4}|(?:(?:[0-9A-F]{1,4}:){,6}[0-9A-F]{1,4})?::|v[0-9A-F]+\\.(?:[a-zA-Z0-9_.~-]|[!$&'()*+,;=]|:)+)\\]|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=])*)(?::(?P<port>[0-9]*))?)(?P<ipath>(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>/(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)?)|(?P<ipath>(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|@)+(?:/(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)*)*)|(?P<ipath>))(?:\\?(?P<iquery>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|[\ue000-\uf8ff\U000f0000-\U000ffffd\U00100000-\U0010fffd]|/|\\?)*))?(?:\\#(?P<ifragment>(?:(?:(?:[a-zA-Z0-9._~-]|[\xa0-\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd\U00020000-\U0002fffd\U00030000-\U0003fffd\U00040000-\U0004fffd\U00050000-\U0005fffd\U00060000-\U0006fffd\U00070000-\U0007fffd\U00080000-\U0008fffd\U00090000-\U0009fffd\U000a0000-\U000afffd\U000b0000-\U000bfffd\U000c0000-\U000cfffd\U000d0000-\U000dfffd\U000e1000-\U000efffd])|%[0-9A-F][0-9A-F]|[!$&'()*+,;=]|:|@)|/|\\?)*))?)

回答 6

注意 -Lepl不再受到维护或支持。

RFC 3696定义了URL验证的“最佳做法”-http: //www.faqs.org/rfcs/rfc3696.html

Lepl的最新版本(Python解析器库)包括RFC 3696的实现。您可以使用类似以下的方式:

from lepl.apps.rfc3696 import Email, HttpUrl

# compile the validators (do once at start of program)
valid_email = Email()
valid_http_url = HttpUrl()

# use the validators (as often as you like)
if valid_email(some_email):
    # email is ok
else:
    # email is bad
if valid_http_url(some_url):
    # url is ok
else:
    # url is bad

尽管验证器是在Lepl中定义的,Lepl是递归下降解析器,但它们很大程度上在内部编译为正则表达式。它结合了两全其美的优点-(相对)易于阅读的定义,可以根据RFC 3696 有效的实现进行检查。我的博客上有一篇文章显示了如何简化解析器-http: //www.acooke.org/cute/LEPLOptimi0.html

Lepl可以在http://www.acooke.org/lepl上获得,RFC 3696模块在http://www.acooke.org/lepl/rfc3696.html上可以找到。

这是此版本中的全新内容,因此可能包含错误。如果您有任何问题,请与我联系,我会尽快修复。谢谢。

note – Lepl is no longer maintained or supported.

RFC 3696 defines “best practices” for URL validation – http://www.faqs.org/rfcs/rfc3696.html

The latest release of Lepl (a Python parser library) includes an implementation of RFC 3696. You would use it something like:

from lepl.apps.rfc3696 import Email, HttpUrl

# compile the validators (do once at start of program)
valid_email = Email()
valid_http_url = HttpUrl()

# use the validators (as often as you like)
if valid_email(some_email):
    # email is ok
else:
    # email is bad
if valid_http_url(some_url):
    # url is ok
else:
    # url is bad

Although the validators are defined in Lepl, which is a recursive descent parser, they are largely compiled internally to regular expressions. That combines the best of both worlds – a (relatively) easy to read definition that can be checked against RFC 3696 and an efficient implementation. There’s a post on my blog showing how this simplifies the parser – http://www.acooke.org/cute/LEPLOptimi0.html

Lepl is available at http://www.acooke.org/lepl and the RFC 3696 module is documented at http://www.acooke.org/lepl/rfc3696.html

This is completely new in this release, so may contain bugs. Please contact me if you have any problems and I will fix them ASAP. Thanks.


回答 7

如今,在90%的情况下,如果您在Python中使用URL,则可能使用python-requests。因此,这里的问题是-为什么不重用请求中的URL验证?

from requests.models import PreparedRequest
import requests.exceptions


def check_url(url):
    prepared_request = PreparedRequest()
    try:
        prepared_request.prepare_url(url, None)
        return prepared_request.url
    except requests.exceptions.MissingSchema, e:
        raise SomeException

特征:

  • 不要重新发明轮子
  • 干燥
  • 离线办公
  • 最少的资源

Nowadays, in 90% of case if you working with URL in Python you probably use python-requests. Hence the question here – why not reuse URL validation from requests?

from requests.models import PreparedRequest
import requests.exceptions


def check_url(url):
    prepared_request = PreparedRequest()
    try:
        prepared_request.prepare_url(url, None)
        return prepared_request.url
    except requests.exceptions.MissingSchema, e:
        raise SomeException

Features:

  • Don’t reinvent the wheel
  • DRY
  • Work offline
  • Minimal resource

回答 8

提供的正则表达式应与http://www.ietf.org/rfc/rfc3986.txt格式的任何网址匹配;并且在python解释器中进行测试时会执行。

您遇到解析困难的网址使用哪种格式?

The regex provided should match any url of the form http://www.ietf.org/rfc/rfc3986.txt; and does when tested in the python interpreter.

What format have the URLs you’ve been having trouble parsing had?


回答 9

这些年来,我需要做很多次,并且总是以模仿别人正则表达式的方式复制自己,而正则表达式的思考方式比我想的要多。

话虽如此,Django表单代码中有一个正则表达式可以解决这个问题:

http://code.djangoproject.com/browser/django/trunk/django/forms/fields.py#L534

I’ve needed to do this many times over the years and always end up copying someone else’s regular expression who has thought about it way more than I want to think about it.

Having said that, there is a regex in the Django forms code which should do the trick:

http://code.djangoproject.com/browser/django/trunk/django/forms/fields.py#L534


回答 10

修改后的Django URL验证正则表达式:

import re

ul = '\u00a1-\uffff'  # unicode letters range (must not be a raw string)

# IP patterns 
ipv4_re = r'(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}' 
ipv6_re = r'\[[0-9a-f:\.]+\]'

# Host patterns 
hostname_re = r'[a-z' + ul + r'0-9](?:[a-z' + ul + r'0-9-]{0,61}[a-z' + ul + r'0-9])?'
domain_re = r'(?:\.(?!-)[a-z' + ul + r'0-9-]{1,63}(?<!-))*' # domain names have max length of 63 characters
tld_re = ( 
    r'\.'                                # dot 
    r'(?!-)'                             # can't start with a dash 
    r'(?:[a-z' + ul + '-]{2,63}'         # domain label 
    r'|xn--[a-z0-9]{1,59})'              # or punycode label 
    r'(?<!-)'                            # can't end with a dash 
    r'\.?'                               # may have a trailing dot 
) 
host_re = '(' + hostname_re + domain_re + tld_re + '|localhost)'

regex = re.compile( 
    r'^(?:http|ftp)s?://' # http(s):// or ftp(s)://
    r'(?:\S+(?::\S*)?@)?'  # user:pass authentication 
    r'(?:' + ipv4_re + '|' + ipv6_re + '|' + host_re + ')' # localhost or ip
    r'(?::\d{2,5})?'  # optional port
    r'(?:[/?#][^\s]*)?'  # resource path
    r'\Z', re.IGNORECASE)

来源:https : //github.com/django/django/blob/master/django/core/validators.py#L74

modified django url validation regex:

import re

ul = '\u00a1-\uffff'  # unicode letters range (must not be a raw string)

# IP patterns 
ipv4_re = r'(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}' 
ipv6_re = r'\[[0-9a-f:\.]+\]'

# Host patterns 
hostname_re = r'[a-z' + ul + r'0-9](?:[a-z' + ul + r'0-9-]{0,61}[a-z' + ul + r'0-9])?'
domain_re = r'(?:\.(?!-)[a-z' + ul + r'0-9-]{1,63}(?<!-))*' # domain names have max length of 63 characters
tld_re = ( 
    r'\.'                                # dot 
    r'(?!-)'                             # can't start with a dash 
    r'(?:[a-z' + ul + '-]{2,63}'         # domain label 
    r'|xn--[a-z0-9]{1,59})'              # or punycode label 
    r'(?<!-)'                            # can't end with a dash 
    r'\.?'                               # may have a trailing dot 
) 
host_re = '(' + hostname_re + domain_re + tld_re + '|localhost)'

regex = re.compile( 
    r'^(?:http|ftp)s?://' # http(s):// or ftp(s)://
    r'(?:\S+(?::\S*)?@)?'  # user:pass authentication 
    r'(?:' + ipv4_re + '|' + ipv6_re + '|' + host_re + ')' # localhost or ip
    r'(?::\d{2,5})?'  # optional port
    r'(?:[/?#][^\s]*)?'  # resource path
    r'\Z', re.IGNORECASE)

source: https://github.com/django/django/blob/master/django/core/validators.py#L74


回答 11

urlfinders = [
    re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?/[-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]*[^]'\\.}>\\),\\\"]"),
    re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?"),
    re.compile("(~/|/|\\./)([-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]|\\\\
)+"),
    re.compile("'\\<((mailto:)|)[-A-Za-z0-9\\.]+@[-A-Za-z0-9\\.]+"),
]

注意:在您的浏览器中看起来很丑陋,只需复制粘贴即可,并且格式应该很好

在python邮件列表中找到并用于gnome-terminal

来源:http : //mail.python.org/pipermail/python-list/2007-January/595436.html

urlfinders = [
    re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?/[-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]*[^]'\\.}>\\),\\\"]"),
    re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?"),
    re.compile("(~/|/|\\./)([-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]|\\\\
)+"),
    re.compile("'\\<((mailto:)|)[-A-Za-z0-9\\.]+@[-A-Za-z0-9\\.]+"),
]

NOTE: As ugly as it looks in your browser just copy paste and the formatting should be good

Found at the python mailing lists and used for the gnome-terminal

source: http://mail.python.org/pipermail/python-list/2007-January/595436.html


在Google App Engine上选择Java vs Python

问题:在Google App Engine上选择Java vs Python

目前,Google App Engine同时支持Python和Java。Java支持还不成熟。但是,Java似乎具有更长的库列表,尤其是对Java字节码的支持,无论用于编写该代码的语言是什么。哪种语言将提供更好的性能和更多的功能?请指教。谢谢!

编辑: http : //groups.google.com/group/google-appengine-java/web/will-it-play-in-app-engine?pli=1

编辑: “能力”是指更好的可扩展性和框架外部可用库的包含。不过,Python仅允许使用纯Python库。

Currently Google App Engine supports both Python & Java. Java support is less mature. However, Java seems to have a longer list of libraries and especially support for Java bytecode regardless of the languages used to write that code. Which language will give better performance and more power? Please advise. Thank you!

Edit: http://groups.google.com/group/google-appengine-java/web/will-it-play-in-app-engine?pli=1

Edit: By “power” I mean better expandability and inclusion of available libraries outside the framework. Python allows only pure Python libraries, though.


回答 0

我有偏见(是一名Python专家,但对Java却很生疏),但我认为GAE的Python运行时目前比Java运行时更先进和更好地开发-毕竟前者还有一年多的开发和成熟时间。

事情的进展当然很难预测-Java方面的需求可能会更强(特别是因为它不仅与Java有关,而且其他语言也位于JVM之上,因此这是运行方式,例如PHP或App Engine上的Ruby代码);但是,Python App Engine团队的确拥有加入Python的发明者,非常强大的工程师Guido van Rossum的优势。

在灵活性方面,如前所述,Java引擎确实提供了运行由不同语言(不仅是Java)制作的JVM字节码的可能性-如果您在一家多语言商店中,那肯定是很大的。反之亦然,如果您讨厌Javascript但必须在用户的浏览器中执行一些代码,则Java的GWT(通过Java级编码为您生成Javascript)比Python端的替代方案(实际上,如果您选择Python,您将为此目的自己编写一些JS,而如果您选择Java GWT,则如果您讨厌编写JS,则可以使用它。

就库而言,这几乎是一种洗礼- JVM受到足够的限制(没有线程,没有自定义类加载器,没有JNI,没有关系数据库),以至于妨碍了现有Java库的简单重用,甚至超过了现有Python。类似地,Python运行时的类似限制也限制了库。

就性能而言,尽管您应该以自己的任务为基准,但我认为这是一种洗礼–不要依赖高度优化的基于JIT的JVM实现的性能,因为它们会浪费大量的启动时间和内存,因为应用引擎环境是非常不同的(启动成本将经常支付,因为您的应用实例被启动,停止,移动到其他主机等)对您来说都是透明的-与Python相比,使用Python运行时环境这些事件通常要便宜得多。

叹气,尽管我认为XPath / XSLT的情况在两侧都不是完美的,但我认为这在JVM中可能会稍微好一些(显然,可以使Saxon的实质子集运行) ,但要小心)。我认为值得在Appengine问题页面上以XPath和XSLT作为标题来打开问题-现在只有问题需要特定的库,这是近视的:我真的不在乎如何实现良好的XPath / XSLT,只要适用于Python和/或Java。(特定的库可能会简化现有代码的迁移,但是这比能够以某种方式执行“快速应用XSLT转换”这样的任务要重要!)。我知道如果措辞得当(尤其是以与语言无关的方式),我会盯上这样的问题。

最后但并非最不重要的一点:请记住,您可以拥有不同版本的应用程序(使用相同的数据存储),其中一些版本是通过Python运行时实现的,某些版本是通过Java运行时实现的,并且您可以访问不同于“默认/活动”的版本”带有明确的网址。所以你可以同时拥有Python Java代码(在应用的不同版本中)使用和修改同一数据存储,从而为您提供更大的灵活性(尽管只有一个拥有“ nice” URL,例如foobar.appspot.com)我想这可能仅对交互式用户在浏览器上的访问很重要;-)。

I’m biased (being a Python expert but pretty rusty in Java) but I think the Python runtime of GAE is currently more advanced and better developed than the Java runtime — the former has had one extra year to develop and mature, after all.

How things will proceed going forward is of course hard to predict — demand is probably stronger on the Java side (especially since it’s not just about Java, but other languages perched on top of the JVM too, so it’s THE way to run e.g. PHP or Ruby code on App Engine); the Python App Engine team however does have the advantage of having on board Guido van Rossum, the inventor of Python and an amazingly strong engineer.

In terms of flexibility, the Java engine, as already mentioned, does offer the possibility of running JVM bytecode made by different languages, not just Java — if you’re in a multi-language shop that’s a pretty large positive. Vice versa, if you loathe Javascript but must execute some code in the user’s browser, Java’s GWT (generating the Javascript for you from your Java-level coding) is far richer and more advanced than Python-side alternatives (in practice, if you choose Python, you’ll be writing some JS yourself for this purpose, while if you choose Java GWT is a usable alternative if you loathe writing JS).

In terms of libraries it’s pretty much a wash — the JVM is restricted enough (no threads, no custom class loaders, no JNI, no relational DB) to hamper the simple reuse of existing Java libraries as much, or more, than existing Python libraries are similarly hampered by the similar restrictions on the Python runtime.

In terms of performance, I think it’s a wash, though you should benchmark on tasks of your own — don’t rely on the performance of highly optimized JIT-based JVM implementations discounting their large startup times and memory footprints, because the app engine environment is very different (startup costs will be paid often, as instances of your app are started, stopped, moved to different hosts, etc, all trasparently to you — such events are typically much cheaper with Python runtime environments than with JVMs).

The XPath/XSLT situation (to be euphemistic…) is not exactly perfect on either side, sigh, though I think it may be a tad less bad in the JVM (where, apparently, substantial subsets of Saxon can be made to run, with some care). I think it’s worth opening issues on the Appengine Issues page with XPath and XSLT in their titles — right now there are only issues asking for specific libraries, and that’s myopic: I don’t really care HOW a good XPath/XSLT is implemented, for Python and/or for Java, as long as I get to use it. (Specific libraries may ease migration of existing code, but that’s less important than being able to perform such tasks as “rapidly apply XSLT transformation” in SOME way!-). I know I’d star such an issue if well phrased (especially in a language-independent way).

Last but not least: remember that you can have different version of your app (using the same datastore) some of which are implemented with the Python runtime, some with the Java runtime, and you can access versions that differ from the “default/active” one with explicit URLs. So you could have both Python and Java code (in different versions of your app) use and modify the same data store, granting you even more flexibility (though only one will have the “nice” URL such as foobar.appspot.com — which is probably important only for access by interactive users on browsers, I imagine;-).


回答 1

观看此应用,了解Python和Java性能的变化:

http://gaejava.appspot.com/ (编辑:道歉,链接现在已断开。但是,当我看到它最后运行时,以下para仍然适用)

目前,对于此简单测试,Python和在Java中使用低级API的速度比Java上的JDO快。至少如果基础引擎发生变化,则该应用应反映出性能变化。

Watch this app for changes in Python and Java performance:

http://gaejava.appspot.com/ (edit: apologies, link is broken now. But following para still applied when I saw it running last)

Currently, Python and using the low-level API in Java are faster than JDO on Java, for this simple test. At least if the underlying engine changes, that app should reflect performance changes.


回答 2

基于在其他平台上运行这些VM的经验,我想说Java可能会比Python带来更多原始性能。但是,请不要小看Python的卖点:Python语言在代码行方面的生产力要高得多-普遍的共识是,Python需要等效Java程序代码的三分之一,同时保持可读性或可读性。这种优势乘以无需立即进行编译步骤即可立即运行代码的能力。

关于可用的库,您会发现很多扩展的Python运行时库都是现成的(就像Java一样)。AppEngine还支持流行的Django Web框架(http://www.djangoproject.com/)。

关于“电源”,很难理解您的意思,但是Python已在许多不同的领域中使用,尤其是在Web上:YouTube是用Python编写的,Sourceforge也是如此(截至上周)。

Based on experience with running these VMs on other platforms, I’d say that you’ll probably get more raw performance out of Java than Python. Don’t underestimate Python’s selling points, however: The Python language is much more productive in terms of lines of code – the general agreement is that Python requires a third of the code of an equivalent Java program, while remaining as or more readable. This benefit is multiplied by the ability to run code immediately without an explicit compile step.

With regards to available libraries, you’ll find that much of the extensive Python runtime library works out of the box (as does Java’s). The popular Django Web framework (http://www.djangoproject.com/) is also supported on AppEngine.

With regards to ‘power’, it’s difficult to know what you mean, but Python is used in many different domains, especially the Web: YouTube is written in Python, as is Sourceforge (as of last week).


回答 3

2013年6月:此视频是Google工程师的一个很好的回答:

http://www.youtube.com/watch?v=tLriM2krw2E

TLDR;是:

  • 选择您和您的团队最能用的语言
  • 如果您想为生产而构建:Java或Python(而非Go)
  • 如果您有一个庞大的团队和一个复杂的代码库:Java(由于静态代码分析和重构)
  • 快速迭代的小型团队:Python(尽管Java也可以)

June 2013: This video is a very good answer by a google engineer:

http://www.youtube.com/watch?v=tLriM2krw2E

TLDR; is:

  • Pick the language that you and your team is most productive with
  • If you want to build something for production: Java or Python (not Go)
  • If you have a big team and a complex code base: Java (because of static code analysis and refactoring)
  • Small teams that iterate quickly: Python (although Java is also okay)

回答 4

在决定Python和Java之间要考虑的一个重要问题是如何使用每种语言的数据存储(本主题已经很好地涵盖了与原始问题的大多数其他角度)。

对于Java,标准方法是使用JDO或JPA。这些对于可移植性非常有用,但不适用于数据存储。

可以使用低级API,但是对于日常使用而言,该级别太低-它更适合于构建第三方库。

对于Python,有一个专门设计用于向应用程序提供对数据存储的轻松但强大的访问的API。很棒,但是它不是便携式的,因此将您锁定在GAE中。

幸运的是,已经针对两种语言列出的弱点开发了解决方案。

对于Java而言,低级API用于开发持久性库,该持久性库比JDO / JPA(IMO)更适合于数据存储。示例包括Siena项目Objectify

我最近开始使用Objectify,发现它非常易于使用并且非常适合数据存储,并且它的日益普及已经转化为良好的支持。例如,Google的新Cloud Endpoints服务正式支持Objectify。另一方面,Objectify仅适用于数据存储,而Siena受数据存储“启发”,但旨在与各种SQL数据库和NoSQL数据存储一起使用。

对于Python,正在努力允许在GAE之外使用Python GAE数据存储区API。一个示例是Google发布用于SDK的SQLite后端,但我怀疑他们是否打算将其发展为可用于生产的产品。该TyphoonAE项目可能有更多的潜力,但我不认为这是生产准备好了吗或者(纠正我,如果我错了)。

如果任何人有使用这些替代方法的经验或了解其他替代方法,请在评论中添加它们。就我个人而言,我真的很喜欢GAE数据存储-我发现它比AWS SimpleDB有了很大的改进-因此,我希望这些努力的成功能够减轻使用它的一些问题。

An important question to consider in deciding between Python and Java is how you will use the datastore in each language (and most other angles to the original question have already been covered quite well in this topic).

For Java, the standard method is to use JDO or JPA. These are great for portability but are not very well suited to the datastore.

A low-level API is available but this is too low level for day-to-day use – it is more suitable for building 3rd party libraries.

For Python there is an API designed specifically to provide applications with easy but powerful access to the datastore. It is great except that it is not portable so it locks you into GAE.

Fortunately, there are solutions being developed for the weaknesses listed for both languages.

For Java, the low-level API is being used to develop persistence libraries that are much better suited to the datastore then JDO/JPA (IMO). Examples include the Siena project, and Objectify.

I’ve recently started using Objectify and am finding it to be very easy to use and well suited to the datastore, and its growing popularity has translated into good support. For example, Objectify is officially supported by Google’s new Cloud Endpoints service. On the other hand, Objectify only works with the datastore, while Siena is ‘inspired’ by the datastore but is designed to work with a variety of both SQL databases and NoSQL datastores.

For Python, there are efforts being made to allow the use of the Python GAE datastore API off of the GAE. One example is the SQLite backend that Google released for use with the SDK, but I doubt they intend this to grow into something production ready. The TyphoonAE project probably has more potential, but I don’t think it is production ready yet either (correct me if I am wrong).

If anyone has experience with any of these alternatives or knows of others, please add them in a comment. Personally, I really like the GAE datastore – I find it to be a considerable improvement over the AWS SimpleDB – so I wish for the success of these efforts to alleviate some of the issues in using it.


回答 5

我强烈推荐Java for GAE,这是为什么:

  1. 性能:Java可能比Python快。
  2. Python开发面临缺乏第三方库的压力。例如,根本没有用于Python / GAE的XSLT。几乎所有的Python库都是C绑定(GAE不支持这些绑定)。
  3. Memcache API:Java SDK比Python SDK具有更多有趣的功能。
  4. 数据存储区API:JDO速度很慢,但是本机Java数据存储区API却非常快速和容易。

我现在在开发中使用Java / GAE。

I’m strongly recommending Java for GAE and here’s why:

  1. Performance: Java is potentially faster then Python.
  2. Python development is under pressure of a lack of third-party libraries. For example, there is no XSLT for Python/GAE at all. Almost all Python libraries are C bindings (and those are unsupported by GAE).
  3. Memcache API: Java SDK have more interesting abilities than Python SDK.
  4. Datastore API: JDO is very slow, but native Java datastore API is very fast and easy.

I’m using Java/GAE in development right now.


回答 6

如您所确定的,使用JVM并不限制您使用Java语言。JVM语言和链接的列表可以在这里找到。然而,Google App Engine确实限制了您可以从常规Java SE集中使用的类集,并且您将要研究是否可以在App Engine上使用这些实现中的任何一种。

编辑:我看到你已经找到了这样的列表

我无法评论Python的性能。但是,由于JVM具有在运行时动态编译和优化代码的能力,因此它是一个非常强大的平台性能。

最终,性能将取决于您的应用程序的工作方式以及编码方式。在没有更多信息的情况下,我认为无法在此区域中提供更多的指针。

As you’ve identified, using a JVM doesn’t restrict you to using the Java language. A list of JVM languages and links can be found here. However, the Google App Engine does restrict the set of classes you can use from the normal Java SE set, and you will want to investigate if any of these implementations can be used on the app engine.

EDIT: I see you’ve found such a list

I can’t comment on the performance of Python. However, the JVM is a very powerful platform performance-wise, given its ability to dynamically compile and optimise code during the run time.

Ultimately performance will depend on what your application does, and how you code it. In the absence of further info, I think it’s not possible to give any more pointers in this area.


回答 7

我对Python / Django SDK的干净,直接和无问题感到惊讶。但是,我开始遇到需要开始做更多JavaScript的情况,并认为我可能想利用GWT和其他Java实用程序。我只完成了GAE Java教程的一半,却又遇到了一个问题:Eclipse配置问题,JRE版本问题,Java令人麻木的复杂性以及一个令人困惑甚至可能损坏的教程。查看此站点以及从此处链接的其他站点对我来说很重要。我将回到Python,然后研究睡衣以帮助我应对JavaScript挑战。

I’ve been amazed at how clean, straightforward, and problem free the Python/Django SDK is. However I started running into situations where I needed to start doing more JavaScript and thought I might want to take advantage of the GWT and other Java utilities. I’ve gotten just half way through the GAE Java tutorial, and have had one problem after another: Eclipse configuration issues, JRE versionitis, the mind-numbing complexity of Java, and a confusing and possibly broken tutorial. Checking out this site and others linked from here clinched it for me. I’m going back to Python, and I’ll look into Pyjamas to help with my JavaScript challenges.


回答 8

我的谈话有点晚了,但这是我的两分钱。我真的很难在Python和Java之间进行选择,因为我精通两种语言。众所周知,两者都有优点和缺点,您必须考虑自己的要求和最适合您的项目的框架。

像我通常在这种困境中所做的那样,我会寻找数字来支持我的决定。我决定使用Python的原因有很多,但就我而言,有一个情节是临界点。如果您从2014年9月开始在GitHub上搜索“ Google App Engine” ,则会发现下图:

这些数字可能有很多偏差,但总的来说,GAE Python存储库的数量是GAE Java存储库的三倍。不仅如此,而且如果按“星数”列出项目,您会看到大多数Python项目都出现在顶部(您必须考虑到Python的使用时间更长)。对我来说,这对于Python来说是一个很好的例子,因为我考虑了社区的采用和支持,文档以及开源项目的可用性。

I’m a little late to the conversation, but here are my two cents. I really had a hard time choosing between Python and Java, since I am well versed in both languages. As we all know, there are advantages and disadvantages for both, and you have to take in account your requirements and the frameworks that work best for your project.

As I usually do in this type of dilemmas, I look for numbers to support my decision. I decided to go with Python for many reasons, but in my case, there was one plot that was the tipping point. If you search “Google App Engine” in GitHub as of September 2014, you will find the following figure:

There could be many biases in these numbers, but overall, there are three times more GAE Python repositories than GAE Java repositories. Not only that, but if you list the projects by the “number of stars” you will see that a majority of the Python projects appear at the top (you have to take in account that Python has been around longer). To me, this makes a strong case for Python because I take in account community adoption & support, documentation, and availability of open-source projects.


回答 9

这是一个很好的问题,我认为许多回应都很好地说明了围栏两侧的利弊。我已经尝试了基于Python和基于JVM的AppEngine(以我为例, Gaelyk,它是为AppEngine构建的Groovy应用程序框架)。当谈到平台的性能时,我一直没有想到的一件事就是它一直盯着我,这是在“ Java防护”方面的“加载请求”的含义。使用Groovy时,这些加载请求是致命的。

我在该主题上发表了一篇文章(http://distractable.net/coding/google-appengine-java-vs-python-performance-comparison/),希望找到解决此问题的方法,但是如果不是这样的话,我想我会回到Python + Django的组合,直到冷启动Java请求的影响减小为止。

It’s a good question, and I think many of the responses have given good view points of pros and cons on both sides of the fence. I’ve tried both Python and JVM-based AppEngine (in my case I was using Gaelyk which is a Groovy application framework built for AppEngine). When it comes to performance on the platform, one thing I hadn’t considered until it was staring me in the face is the implication of “Loading Requests” that occur on the Java side of the fence. When using Groovy these loading requests are a killer.

I put a post together on the topic (http://distractable.net/coding/google-appengine-java-vs-python-performance-comparison/) and I’m hoping to find a way of working around the problem, but if not I think I’ll be going back to a Python + Django combination until cold starting java requests has less of an impact.


回答 10

根据我听到的Java人与Python用户相比抱怨AppEngine的情况,我会说Python的使用压力要小得多。

Based on how much I hear Java people complain about AppEngine compared to Python users, I would say Python is much less stressful to use.


回答 11

还有一个项目Unladen Swallow,如果不是Google所有,显然是Google资助的。他们正在尝试为Python 2.6.1字节码实现基于LLVM的后端,因此他们可以使用JIT和各种不错的本机代码/ GC /多核优化。(很好的说法:“我们希望不做任何原始工作,而是尽可能多地利用过去30年的研究成果。”)他们正在寻求将CPython的速度提高5倍。

当然,这并不能回答您的紧迫问题,而是指向未来(希望)“缩小差距”(如果有)。

There’s also project Unladen Swallow, which is apparently Google-funded if not Google-owned. They’re trying to implement a LLVM-based backend for Python 2.6.1 bytecode, so they can use a JIT and various nice native code/GC/multi-core optimisations. (Nice quote: “We aspire to do no original work, instead using as much of the last 30 years of research as possible.”) They’re looking for a 5x speed-up to CPython.

Of course this doesn’t answer your immediate question, but points towards a “closing of the gap” (if any) in the future (hopefully).


回答 12

如今,python的魅力在于它与其他语言的交流程度。例如,您可以使用Jython将python和java放在同一表上。当然,即使jython完全支持Java库,它也不完全支持python库。但是,如果您想弄乱Java库,它是一个理想的解决方案。它甚至允许您将其与Java代码混合使用而无需额外的编码。

但是,即使python本身也已经采取了一些措施。例如,参见ctypes,接近C速度,直接访问C库,而无需离开python编码。Cython更进一步,允许轻松地将c代码与python代码混合,或者即使您不想弄乱c或c ++,您仍然可以在python中进行编码,但使用静态类型变量使python程序与C应用程序一样快。顺便说一下,Cython既被Google使用也受其支持。

昨天我什至发现了用于python内联C或Assembly的工具(请参阅CorePy),您将无法获得比它更强大的功能。

Python当然是一种非常成熟的语言,不仅可以独立存在,而且可以轻松地与任何其他语言配合。我认为,即使在非常高级和苛刻的情况下,这也是使python成为理想解决方案的原因。

使用python,您可以使用几乎零附加的代码访问C / C ++,Java,.NET和许多其他库,从而为您提供一种使代码最小化,简化和美化的语言。这是一种非常诱人的语言。

The beauty of python nowdays is how well it communicates with other languages. For instance you can have both python and java on the same table with Jython. Of course jython even though it fully supports java libraries it does not support fully python libraries. But its an ideal solution if you want to mess with Java Libraries. It even allows you to mix it with Java code with no extra coding.

But even python itself has made some steps forwared. See ctypes for example, near C speed , direct accees to C libraries all of this without leaving the comfort of python coding. Cython goes one step further , allowing to mix c code with python code with ease, or even if you dont want to mess with c or c++ , you can still code in python but use statically type variables making your python programms as fast as C apps. Cython is both used and supported by google by the way.

Yesterday I even found tools for python to inline C or even Assembly (see CorePy) , you cant get any more powerful than that.

Python is surely a very mature language, not only standing on itself , but able to coooperate with any other language with easy. I think that is what makes python an ideal solution even in a very advanced and demanding scenarios.

With python you can have acess to C/C++ ,Java , .NET and many other libraries with almost zero additional coding giving you also a language that minimises, simplifies and beautifies coding. Its a very tempting language.


回答 13

即使GWT对于我正在开发的应用程序来说似乎是一个完美的选择,但Python已经不复存在。JPA在GAE上非常混乱(例如,没有@Embeddable和其他晦涩的未记录限制)。花了一个星期的时间,我可以说Java暂时不适用于GAE。

Gone with Python even though GWT seems a perfect match for the kind of an app I’m developing. JPA is pretty messed up on GAE (e.g. no @Embeddable and other obscure non-documented limitations). Having spent a week, I can tell that Java just doesn’t feel right on GAE at the moment.


回答 14

要考虑的是您打算使用的框架。并非Java方面的所有框架都非常适合在App Engine上运行的应用程序,这与传统的Java应用程序服务器有些不同。

要考虑的一件事是应用程序启动时间。使用传统的Java Web应用程序,您实际上不需要考虑这一点。应用程序启动,然后运行。启动时间是5秒钟还是几分钟并不重要。使用App Engine,您可能会遇到仅在请求进入时才启动应用程序的情况。这意味着用户在应用程序启动时正在等待。GAE的新功能(如预留实例)在这里有帮助,但请先检查。

另一件事是GAE对Java的限制不同。并非所有的框架都对可以使用哪些类的限制或不允许使用线程或无法访问本地文件系统的事实感到满意。仅通过搜索GAE兼容性,可能很容易发现这些问题。

我还看到有些人抱怨现代UI框架(即Wicket)上的会话大小问题。通常,这些框架倾向于进行某些折衷,以使开发变得有趣,快速且容易。有时,这可能会导致与App Engine限制冲突。

我最初开始使用Java开发GAE,但由于这些原因,后来切换到Python。我个人的感觉认为Python是App Engine开发的更好选择。我认为Java在Amazon的Elastic Beanstalk上更“居家”。

但是使用App Engine的事情正在发生非常迅速的变化。GAE本身正在发生变化,并且随着它的日益流行,其框架也在发生变化以克服其局限性。

One think to take into account are the frameworks you intend yo use. Not all frameworks on Java side are well suited for applications running on App Engine, which is somewhat different than traditional Java app servers.

One thing to consider is the application startup time. With traditional Java web apps you don’t really need to think about this. The application starts and then it just runs. Doesn’t really matter if the startup takes 5 seconds or couple of minutes. With App Engine you might end up in a situation where the application is only started when a request comes in. This means the user is waiting while your application boots up. New GAE features like reserved instances help here, but check first.

Another thing are the different limitations GAE psoes on Java. Not all frameworks are happy with the limitations on what classes you can use or the fact that threads are not allowed or that you can’t access local filesystem. These issues are probably easy to find out by just googling about GAE compatibility.

I’ve also seen some people complaining about issues with session size on modern UI frameworks (Wicket, namely). In general these frameworks tend to do certain trade-offs in order to make development fun, fast and easy. Sometimes this may lead to conflicts with the App Engine limitations.

I initially started developing working on GAE with Java, but then switched to Python because of these reasons. My personal feeling is that Python is a better choice for App Engine development. I think Java is more “at home” for example on Amazon’s Elastic Beanstalk.

BUT with App Engine things are changing very rapidly. GAE is changing itself and as it becomes more popular, the frameworks are also changing to work around its limitations.


ImportError:没有名为apiclient.discovery的模块

问题:ImportError:没有名为apiclient.discovery的模块

我在Google App Engine的Python使用过Google Translate API时遇到此错误,但是我不知道如何解决,

<module>
from apiclient.discovery import build
ImportError: No module named apiclient.discovery

我将尝试设置指示Google App Engine SDK的环境,然后再次上传到Google Apps Engine,始终会收到错误消息

错误:服务器错误

服务器遇到错误,无法完成您的请求。如果问题仍然存在,请报告您的问题并提及此错误消息以及引起该问题的查询。

请告诉我如何解决,

谢谢

更新:修复了在 Nijjin的帮助下,我通过添加以下文件夹修复了问题,

apiclient, gflags, httplib2, oauth2client, uritemplate

如果仍然有问题,请考虑在此页面的“答案”下获取更多信息。例如 :Varum答案等…

I got this error in Google App Engine’s Python have used Google Translate API, But I don’t know how to fix,

<module>
from apiclient.discovery import build
ImportError: No module named apiclient.discovery

I’ll try to set environment which indicates to Google App Engine SDK, And upload to Google Apps Engine again, always get the error,

Error: Server Error

The server encountered an error and could not complete your request. If the problem persists, please report your problem and mention this error message and the query that caused it.

Please tell me how to fix,

Thanks

UPDATE : Fixed Follow Nijjin’s help, I fixed problems by adding the following folders,

apiclient, gflags, httplib2, oauth2client, uritemplate

If you still got problem, please consider below Answer of this page to get more info. ex. : Varum answer, etc …


回答 0

您可以通过以下简单安装获得这些依赖项:

sudo pip install --upgrade google-api-python-client

python快速入门页面上对此进行了描述。

You should be able to get these dependencies with this simple install:

sudo pip install --upgrade google-api-python-client

This is described on the quick start page for python.


回答 1

apiclient是图书馆的原始名称。
在某个时候,它已切换为googleapiclient

如果您的代码在Google App Engine上运行,则两者均应正常工作。

如果您自己运行该应用程序,并且安装了google-api-python-client,那么两者都应该也可以正常运行。

虽然,如果我们看一下软件包模块的源代码apiclient__init__.py,我们可以看到该apiclient模块只是为了向后兼容而保留下来的。

保留apiclient作为googleapiclient的别名。

因此,您确实应该googleapiclient在代码中使用,因为apiclient别名只是为了不破坏原有代码而维护的。

# bad
from apiclient.discovery import build

# good
from googleapiclient.discovery import build

apiclient was the original name of the library.
At some point, it was switched over to be googleapiclient.

If your code is running on Google App Engine, both should work.

If you are running the application yourself, with the google-api-python-client installed, both should work as well.

Although, if we take a look at the source code of the apiclient package’s __init__.py module, we can see that the apiclient module was simply kept around for backwards-compatibility.

Retain apiclient as an alias for googleapiclient.

So, you really should be using googleapiclient in your code, since the apiclient alias was just maintained as to not break legacy code.

# bad
from apiclient.discovery import build

# good
from googleapiclient.discovery import build

回答 2

apiclient不在Appengine运行时提供的第三方库列表中:http : //developers.google.com/appengine/docs/python/tools/libraries27

你需要复制apiclient到你的项目目录和你需要复制这些uritemplatehttplib2过。

注意:文档列表中未提供的任何第三方库都必须复制到您的appengine项目目录中

apiclient is not in the list of third party library supplied by the appengine runtime: http://developers.google.com/appengine/docs/python/tools/libraries27 .

You need to copy apiclient into your project directory & you need to copy these uritemplate & httplib2 too.

Note: Any third party library that are not supplied in the documentation list must copy to your appengine project directory


回答 3

如果上述解决方案都不适合您,请考虑您是否已通过Anaconda安装了python。如果是这种情况,则使用conda安装google API库可能会解决该问题。

跑:

python --version

如果你得到类似的东西

Python 3.6.4 :: Anaconda, Inc.

然后尝试:

conda install google-api-python-client

正如bgoodr在评论中指出的那样,您可能需要指定渠道(认为存储库)才能获取google API库。在编写本文时,这意味着运行命令:

conda install -c conda-forge google-api-python-client

https://anaconda.org/conda-forge/google-api-python-client上查看更多

If none of the above solutions work for you, consider if you might have installed python through Anaconda. If this is the case then installing the google API library with conda might fix it.

Run:

python --version

If you get something like

Python 3.6.4 :: Anaconda, Inc.

Then try:

conda install google-api-python-client

As bgoodr has pointed out in a comment you might need to specify the channel (think repository) to get the google API library. At the time of writing this means running the command:

conda install -c conda-forge google-api-python-client

See more at https://anaconda.org/conda-forge/google-api-python-client


回答 4

确保只google-api-python-client安装了。如果已apiclient安装,将导致碰撞。因此,运行以下命令:

sudo pip uninstall apiclient

Make sure you only have google-api-python-client installed. If you have apiclient installed, it will cause a collision. So, run the following:

sudo pip uninstall apiclient

回答 5

对于App Engine项目,您必须通过输入本地安装lib

pip install -t lib google-api-python-client

在这里阅读更多

For app engine project you gotta install the lib locally by typing

pip install -t lib google-api-python-client

read more here


回答 6

Google API Python客户端库有一个下载文件,其中包含该库及其所有依赖项,在项目的“下载”部分中名为google-api-python-client-gae- <version> .zip之类。只需将其解压缩到您的App Engine项目中即可。

There is a download for the Google API Python Client library that contains the library and all of its dependencies, named something like google-api-python-client-gae-<version>.zip in the downloads section of the project. Just unzip this into your App Engine project.


回答 7

我通过以下方法重新安装了软件包,从而解决了该问题:

pip install --force-reinstall google-api-python-client

I fixed the problem by reinstalling the package with:

pip install --force-reinstall google-api-python-client

回答 8

对于python3这对我有用:

sudo pip3 install --upgrade google-api-python-client

for python3 this worked for me:

sudo pip3 install --upgrade google-api-python-client

回答 9

由于URITemplate模块安装中的错误,我遇到了同样的问题。

这样就解决了问题:

pip install --force-reinstall uritemplate.py

I had the same problem because of a bug in the installation of the URITemplate module.

This solved the problem:

pip install --force-reinstall uritemplate.py

回答 10

在处理从Google日历解析最近的日历事件的项目时,遇到了同样的错误。

使用pip进行标准安装对我不起作用,这是我获取所需软件包时所做的事情。

直接转到源,这里是google-api-python-client的链接,但是如果您需要其他语言,则应该不会有太大区别。

https://github.com/google/google-api-python-client

单击左上角附近的绿色“克隆或下载”按钮,并将其另存为zip文件。将zip移到您的项目文件夹中,然后将其解压缩。然后将其创建的文件夹中的所有文件切回到项目文件夹的根目录中。

是的,这确实会使您的工作空间混乱,但是许多编译器都有隐藏文件的方法。

完成此标准后

from googleapiclient import discovery

效果很好。

希望这可以帮助。

I got this same error when working on a project to parse recent calendar events from Google Calendar.

Using the standard install with pip did not work for me, here is what I did to get the packages I needed.

Go directly to the source, here is a link for the google-api-python-client, but if you need a different language it should not be too different.

https://github.com/google/google-api-python-client

Click on the green “Clone or Download” button near the top left and save it as a zip file. Move the zip to your project folder and extract it there. Then cut all the files from the folder it creates back into the root of your project folder.

Yes, this does clutter your work space, but many compilers have ways to hide files.

After doing this the standard

from googleapiclient import discovery

works great.

Hope this helps.


回答 11

“ google-api-python-client”要求:

pip install uritemplate.py

解决GAE开发服务器上的问题:

from googleapiclient.discovery import build

ImportError: No module named googleapiclient.discovery

“google-api-python-client” requires:

pip install uritemplate.py

to fix problem on GAE Development Server:

from googleapiclient.discovery import build

ImportError: No module named googleapiclient.discovery

回答 12

我遇到了同样的问题。这工作:

>>> import pkg_resources
>>> pkg_resources.require("google-api-python-client")
[google-api-python-client 1.5.3 (c:\python27), uritemplate 0.6 (c:\python27\lib\site-packages\uritemplate-0.6-py2.7.egg), six 1.10.0 (c:\python27\lib\site-packages\six-1.10.0-py2.7.egg), oauth2client 3.0.0 (c:\python27\lib\site-packages\oauth2client-3.0.0-py2.7.egg), httplib2 0.9.2 (c:\python27\lib\site-packages\httplib2-0.9.2-py2.7.egg), simplejson 3.8.2 (c:\python27\lib\site-packages\simplejson-3.8.2-py2.7-win32.egg), six 1.10.0 (c:\python27\lib\site-packages\six-1.10.0-py2.7.egg), rsa 3.4.2 (c:\python27\lib\site-packages\rsa-3.4.2-py2.7.egg), pyasn1-modules 0.0.8 (c:\python27\lib\site-packages\pyasn1_modules-0.0.8-py2.7.egg), pyasn1 0.1.9 (c:\python27\lib\site-packages\pyasn1-0.1.9-py2.7.egg)]

>>> from apiclient.discovery import build
>>> 

I encountered the same issue. This worked:

>>> import pkg_resources
>>> pkg_resources.require("google-api-python-client")
[google-api-python-client 1.5.3 (c:\python27), uritemplate 0.6 (c:\python27\lib\site-packages\uritemplate-0.6-py2.7.egg), six 1.10.0 (c:\python27\lib\site-packages\six-1.10.0-py2.7.egg), oauth2client 3.0.0 (c:\python27\lib\site-packages\oauth2client-3.0.0-py2.7.egg), httplib2 0.9.2 (c:\python27\lib\site-packages\httplib2-0.9.2-py2.7.egg), simplejson 3.8.2 (c:\python27\lib\site-packages\simplejson-3.8.2-py2.7-win32.egg), six 1.10.0 (c:\python27\lib\site-packages\six-1.10.0-py2.7.egg), rsa 3.4.2 (c:\python27\lib\site-packages\rsa-3.4.2-py2.7.egg), pyasn1-modules 0.0.8 (c:\python27\lib\site-packages\pyasn1_modules-0.0.8-py2.7.egg), pyasn1 0.1.9 (c:\python27\lib\site-packages\pyasn1-0.1.9-py2.7.egg)]

>>> from apiclient.discovery import build
>>> 

回答 13

它仅在我使用sudo时与我一起使用:

sudo pip install --upgrade google-api-python-client

It only worked with me when I used sudo:

sudo pip install --upgrade google-api-python-client

回答 14

即使遵循https://developers.google.com/drive/api/v3/quickstart/python的 Google指南,我也遇到了同样的错误,然后我意识到我必须像这样调用:

python3 quickstart.py

代替:

python quickstart.py <-- WRONG

(请注意“ 3”)

完美地工作。

我正在使用Ubuntu 18.04.4 LTS

I was getting the same error, even after following Google’s guide at https://developers.google.com/drive/api/v3/quickstart/python, then I realized I had to invoke like this:

python3 quickstart.py

Instead of:

python quickstart.py <-- WRONG

(Note the “3“)

Worked flawlessly.

I’m using Ubuntu 18.04.4 LTS.


回答 15

用这个

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

use this

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

DistutilsOptionError:必须提供home或prefix / exec-prefix —不能同时提供

问题:DistutilsOptionError:必须提供home或prefix / exec-prefix —不能同时提供

我通常是通过pip安装python软件包的。

对于Google App Engine,我需要将软件包安装到另一个目标目录。

我试过了:

pip install -I flask-restful –target ./lib

但是它失败了:

必须提供home或prefix / exec-prefix-不能同时提供

我怎样才能使它工作?

I’ve been usually installed python packages through pip.

For Google App Engine, I need to install packages to another target directory.

I’ve tried:

pip install -I flask-restful –target ./lib

but it fails with:

must supply either home or prefix/exec-prefix — not both

How can I get this to work?


回答 0

您正在使用OS X和Homebrew吗?Homebrew python页面https://github.com/Homebrew/brew/blob/master/docs/Homebrew-and-Python.md指出了pip的已知问题和解决方法。

为我工作。

通过添加具有以下内容的〜/ .pydistutils.cfg文件,可以将此“空前缀”设置为默认值:

[install]
prefix=

编辑:不要使用此Homebrew建议的选项,它将破坏正常的点操作

Are you using OS X and Homebrew? The Homebrew python page https://github.com/Homebrew/brew/blob/master/docs/Homebrew-and-Python.md calls out a known issue with pip and a work around.

Worked for me.

You can make this “empty prefix” the default by adding a ~/.pydistutils.cfg file with the following contents:

[install]
prefix=

Edit: Do not use this Homebrew recommended option, it will break normal pip operations.


回答 1

我相信有一个更简单的解决方案(在macOS上使用Homebrew的Python),不会破坏您的常规pip操作。

您要做的就是setup.cfg在项目的根目录下创建一个文件,通常在主__init__.py或可执行py文件所在的位置。因此,如果您的项目的根文件夹为:/path/to/my/project/,请setup.cfg在其中创建一个文件,然后在其中放入神奇的单词:

[install]
prefix=  

好的,现在您可以为该文件夹运行pip的命令:

pip install package -t /path/to/my/project/  

该命令仅在该文件夹中正常运行。只需将其复制setup.cfg到您可能拥有的任何其他项目中即可。无需.pydistutils.cfg在主目录上写一个。

安装完模块后,可以将其卸下 setup.cfg

I believe there is a simpler solution to this problem (Homebrew’s Python on macOS) that won’t break your normal pip operations.

All you have to do is to create a setup.cfg file at the root directory of your project, usually where your main __init__.py or executable py file is. So if the root folder of your project is: /path/to/my/project/, create a setup.cfg file in there and put the magic words inside:

[install]
prefix=  

OK, now you sould be able to run pip’s commands for that folder:

pip install package -t /path/to/my/project/  

This command will run gracefully for that folder only. Just copy setup.cfg to whatever other projects you might have. No need to write a .pydistutils.cfg on your home directory.

After you are done installing the modules, you may remove setup.cfg.


回答 2

在OSX(mac)上,假定项目文件夹为/ var / myproject

  1. cd /var/myproject
  2. 创建一个名为的文件setup.cfg并添加 [install] prefix=
  3. pip install <packagename> -t .

On OSX(mac), assuming a project folder called /var/myproject

  1. cd /var/myproject
  2. Create a file called setup.cfg and add [install] prefix=
  3. Run pip install <packagename> -t .

回答 3

针对Homebrew用户的另一种解决方案*仅是使用virtualenv

当然,无论如何,这可能会消除对目标目录的需求-但是即使没有,我发现--target在虚拟环境中时默认情况下也可以工作(例如,无需创建/修改配置文件)。


*我说解决方案;也许这只是精心使用venvs的另一动机…

Another solution* for Homebrew users is simply to use a virtualenv.

Of course, that may remove the need for the target directory anyway – but even if it doesn’t, I’ve found --target works by default (as in, without creating/modifying a config file) when in a virtual environment.


*I say solution; perhaps it’s just another motivation to meticulously use venvs…


回答 4

我在周围的其他建议中遇到了错误--install-option="--prefix=lib"。我发现唯一有效的方法就是PYTHONUSERBASE此处所述使用。

export PYTHONUSERBASE=lib
pip install -I flask-restful --user

这与并不完全相同--target,但是无论如何我都会成功。

I hit errors with the other recommendations around --install-option="--prefix=lib". The only thing I found that worked is using PYTHONUSERBASE as described here.

export PYTHONUSERBASE=lib
pip install -I flask-restful --user

this is not exactly the same as --target, but it does the trick for me in any case.


回答 5

如其他提到的那样,这是与homebrew一起安装的pip和python的已知错误。

如果~/.pydistutils.cfg使用“空前缀”指令创建文件,它将解决此问题,但会破坏正常的点操作。

在正式解决此错误之前,一种选择是创建自己的bash脚本来处理这种情况:

 #!/bin/bash

 name=''
 target=''

 while getopts 'n:t:' flag; do
     case "${flag}" in
         n) name="${OPTARG}" ;;
         t) target="${OPTARG}" ;;
     esac
 done

 if [ -z "$target" ];
 then
     echo "Target parameter must be provided"
     exit 1
 fi

 if [ -z "$name" ];
 then
     echo "Name parameter must be provided"
     exit 1
 fi

 # current workaround for homebrew bug
 file=$HOME'/.pydistutils.cfg'
 touch $file

 /bin/cat <<EOM >$file
 [install]
 prefix=
 EOM
 # end of current workaround for homebrew bug

 pip install -I $name --target $target

 # current workaround for homebrew bug
 rm -rf $file
 # end of current workaround for homebrew bug

该脚本包装了您的命令,并:

  1. 接受名称和目标参数
  2. 检查这些参数是否为空
  3. 创建~/.pydistutils.cfg其中带有“空前缀”指令的文件
  4. 使用提供的参数执行您的pip命令
  5. 删除~/.pydistutils.cfg文件

可以更改此脚本并将其改编为满足您的需求,但您有所想法。而且它使您可以在不制动点的情况下运行命令。希望能帮助到你 :)

As other mentioned, this is known bug with pip & python installed with homebrew.

If you create ~/.pydistutils.cfg file with “empty prefix” instruction it will fix this problem but it will break normal pip operations.

Until this bug is officially addressed, one of the options would be to create your own bash script that would handle this case:

 #!/bin/bash

 name=''
 target=''

 while getopts 'n:t:' flag; do
     case "${flag}" in
         n) name="${OPTARG}" ;;
         t) target="${OPTARG}" ;;
     esac
 done

 if [ -z "$target" ];
 then
     echo "Target parameter must be provided"
     exit 1
 fi

 if [ -z "$name" ];
 then
     echo "Name parameter must be provided"
     exit 1
 fi

 # current workaround for homebrew bug
 file=$HOME'/.pydistutils.cfg'
 touch $file

 /bin/cat <<EOM >$file
 [install]
 prefix=
 EOM
 # end of current workaround for homebrew bug

 pip install -I $name --target $target

 # current workaround for homebrew bug
 rm -rf $file
 # end of current workaround for homebrew bug

This script wraps your command and:

  1. accepts name and target parameters
  2. checks if those parameters are empty
  3. creates ~/.pydistutils.cfg file with “empty prefix” instruction in it
  4. executes your pip command with provided parameters
  5. removes ~/.pydistutils.cfg file

This script can be changed and adapted to address your needs but you get idea. And it allows you to run your command without braking pip. Hope it helps :)


回答 6

如果您使用的是virtualenv *,最好再次检查一下which pip您使用的是哪个。

如果看到类似情况,则说明/usr/local/bin/pip您已脱离环境。重新激活您的virtualenv将解决此问题:

VirtualEnv: $ source bin/activate

虚拟鱼: $ vf activate [environ]

*:我使用virtualfish,但我认为此技巧与两者都有关。

If you’re using virtualenv*, it might be a good idea to double check which pip you’re using.

If you see something like /usr/local/bin/pip you’ve broken out of your environment. Reactivating your virtualenv will fix this:

VirtualEnv: $ source bin/activate

VirtualFish: $ vf activate [environ]

*: I use virtualfish, but I assume this tip is relevant to both.


回答 7

我有一个类似的问题。我用–system标志,以避免错误,因为我粗糙地在这里对其他线程在这里我解释一下我的情况的具体情况。我将其发布在这里,希望可以帮助面临相同问题的任何人。

I have a similar issue. I use the –system flag to avoid the error as I decribe here on other thread where I explain the specific case of my situation. I post this here expecting that can help anyone facing the same problem.


Django模板变量和Javascript

问题:Django模板变量和Javascript

当我使用Django模板渲染器渲染页面时,可以传入包含各种值的字典变量,以使用来在页面中对其进行操作{{ myVar }}

有没有办法在Javascript中访问相同的变量(也许使用DOM,我不知道Django如何使变量可访问)?我希望能够基于传入的变量中包含的值使用AJAX查找来查找详细信息。

When I render a page using the Django template renderer, I can pass in a dictionary variable containing various values to manipulate them in the page using {{ myVar }}.

Is there a way to access the same variable in Javascript (perhaps using the DOM, I don’t know how Django makes the variables accessible)? I want to be able to lookup details using an AJAX lookup based on the values contained in the variables passed in.


回答 0

{{variable}}直接替换为HTML。查看资料;它不是“变量”或类似的变量。它只是渲染的文本。

话虽如此,您可以将这种替换形式添加到JavaScript中。

<script type="text/javascript"> 
   var a = "{{someDjangoVariable}}";
</script>

这为您提供了“动态” javascript。

The {{variable}} is substituted directly into the HTML. Do a view source; it isn’t a “variable” or anything like it. It’s just rendered text.

Having said that, you can put this kind of substitution into your JavaScript.

<script type="text/javascript"> 
   var a = "{{someDjangoVariable}}";
</script>

This gives you “dynamic” javascript.


回答 1

注意检查票证#17419,以获取有关将类似的标签添加到Django核心中的讨论以及通过将此模板标签与用户生成的数据一起使用而引入的可能的XSS漏洞。amacneil的评论讨论了故障单中提出的大多数问题。


我认为最灵活,方便的方法是为要在JS代码中使用的变量定义模板过滤器。这样可以确保您的数据已正确转义,并且可以将其用于复杂的数据结构,例如dictlist。这就是为什么我写这个答案的原因,尽管有一个被广泛接受的答案。

这是模板过滤器的示例:

// myapp/templatetags/js.py

from django.utils.safestring import mark_safe
from django.template import Library

import json


register = Library()


@register.filter(is_safe=True)
def js(obj):
    return mark_safe(json.dumps(obj))

此模板过滤器将变量转换为JSON字符串。您可以这样使用它:

// myapp/templates/example.html

{% load js %}

<script type="text/javascript">
    var someVar = {{ some_var | js }};
</script>

CAUTION Check ticket #17419 for discussion on adding similar tag into Django core and possible XSS vulnerabilities introduced by using this template tag with user generated data. Comment from amacneil discusses most of the concerns raised in the ticket.


I think the most flexible and handy way of doing this is to define a template filter for variables you want to use in JS code. This allows you to ensure, that your data is properly escaped and you can use it with complex data structures, such as dict and list. That’s why I write this answer despite there is an accepted answer with a lot of upvotes.

Here is an example of template filter:

// myapp/templatetags/js.py

from django.utils.safestring import mark_safe
from django.template import Library

import json


register = Library()


@register.filter(is_safe=True)
def js(obj):
    return mark_safe(json.dumps(obj))

This template filters converts variable to JSON string. You can use it like so:

// myapp/templates/example.html

{% load js %}

<script type="text/javascript">
    var someVar = {{ some_var | js }};
</script>

回答 2

对我有用的解决方案是使用模板中的隐藏输入字段

<input type="hidden" id="myVar" name="variable" value="{{ variable }}">

然后以这种方式在javascript中获取值,

var myVar = document.getElementById("myVar").value;

A solution that worked for me is using the hidden input field in the template

<input type="hidden" id="myVar" name="variable" value="{{ variable }}">

Then getting the value in javascript this way,

var myVar = document.getElementById("myVar").value;

回答 3

从Django 2.1开始,专门针对此用例引入了一个新的内置模板标记:json_script

https://docs.djangoproject.com/zh-CN/3.0/ref/templates/builtins/#json-script

新标签将安全地序列化模板值并防止XSS。

Django文档摘录:

安全地将Python对象作为JSON输出,包装在标记中,可以与JavaScript一起使用。

As of Django 2.1, a new built in template tag has been introduced specifically for this use case: json_script.

https://docs.djangoproject.com/en/3.0/ref/templates/builtins/#json-script

The new tag will safely serialize template values and protects against XSS.

Django docs excerpt:

Safely outputs a Python object as JSON, wrapped in a tag, ready for use with JavaScript.


回答 4

这是我很容易做的事情:我为模板修改了base.html文件,并将其放在底部:

{% if DJdata %}
    <script type="text/javascript">
        (function () {window.DJdata = {{DJdata|safe}};})();
    </script>
{% endif %}

然后,当我想在javascript文件中使用变量时,我创建了一个DJdata字典,并通过json将其添加到上下文中: context['DJdata'] = json.dumps(DJdata)

希望能帮助到你!

Here is what I’m doing very easily: I modified my base.html file for my template and put that at the bottom:

{% if DJdata %}
    <script type="text/javascript">
        (function () {window.DJdata = {{DJdata|safe}};})();
    </script>
{% endif %}

then when I want to use a variable in the javascript files, I create a DJdata dictionary and I add it to the context by a json : context['DJdata'] = json.dumps(DJdata)

Hope it helps!


回答 5

对于字典,最好先编码为JSON。您可以使用simplejson.dumps(),或者如果要从App Engine中的数据模型进行转换,则可以使用GQLEncoder库中的encode()。

For a dictionary, you’re best of encoding to JSON first. You can use simplejson.dumps() or if you want to convert from a data model in App Engine, you could use encode() from the GQLEncoder library.


回答 6

我面临着类似的问题,S.Lott建议的答案为我工作。

<script type="text/javascript"> 
   var a = "{{someDjangoVariable}}"
</script>

但是,我想在这里指出主要的实现限制。如果您打算将javascript代码放在其他文件中,然后将该文件包含在模板中。这行不通。

仅当主模板和javascript代码位于同一文件中时,此方法才有效。也许django小组可以解决这个限制。

I was facing simillar issue and answer suggested by S.Lott worked for me.

<script type="text/javascript"> 
   var a = "{{someDjangoVariable}}"
</script>

However I would like to point out major implementation limitation here. If you are planning to put your javascript code in different file and include that file in your template. This won’t work.

This works only when you main template and javascript code is in same file. Probably django team can address this limitation.


回答 7

我也一直在为此苦苦挣扎。从表面上看,上述解决方案应该可行。但是,django架构要求每个html文件都有其自己的呈现变量(即{{contact}}呈现为contact.html,而呈现{{posts}}给eg index.html等)。另一方面,<script>标记出现{%endblock%}base.htmlfrom的后面contact.htmlindex.html继承。这基本上意味着任何解决方案,包括

<script type="text/javascript">
    var myVar = "{{ myVar }}"
</script>

必然会失败,因为变量和脚本不能共存于同一文件中。

我最终想出并为我工作的一个简单解决方案是,简单地用带有id的标签包装变量,然后在js文件中引用它,如下所示:

// index.html
<div id="myvar">{{ myVar }}</div>

然后:

// somecode.js
var someVar = document.getElementById("myvar").innerHTML;

并且只包含<script src="static/js/somecode.js"></script>base.html照常进行。当然,这只是关于获取内容。关于安全性,只需遵循其他答案即可。

I’ve been struggling with this too. On the surface it seems that the above solutions should work. However, the django architecture requires that each html file has its own rendered variables (that is, {{contact}} is rendered to contact.html, while {{posts}} goes to e.g. index.html and so on). On the other hand, <script> tags appear after the {%endblock%} in base.html from which contact.html and index.html inherit. This basically means that any solution including

<script type="text/javascript">
    var myVar = "{{ myVar }}"
</script>

is bound to fail, because the variable and the script cannot co-exist in the same file.

The simple solution I eventually came up with, and worked for me, was to simply wrap the variable with a tag with id and later refer to it in the js file, like so:

// index.html
<div id="myvar">{{ myVar }}</div>

and then:

// somecode.js
var someVar = document.getElementById("myvar").innerHTML;

and just include <script src="static/js/somecode.js"></script> in base.html as usual. Of course this is only about getting the content. Regarding security, just follow the other answers.


回答 8

对于以文本形式存储在Django字段中的JavaScript对象,它需要再次成为动态插入页面脚本中的JavaScript对象,您需要同时使用escapejsJSON.parse()

var CropOpts = JSON.parse("{{ profile.last_crop_coords|escapejs }}");

Django的escapejs句柄正确处理了引号,JSON.parse()并将字符串转换回JS对象。

For a JavaScript object stored in a Django field as text, which needs to again become a JavaScript object dynamically inserted into on-page script, you need to use both escapejs and JSON.parse():

var CropOpts = JSON.parse("{{ profile.last_crop_coords|escapejs }}");

Django’s escapejs handles the quoting properly, and JSON.parse() converts the string back into a JS object.


回答 9

请注意,如果要将变量传递给外部.js脚本,则需要在脚本标签之前加上另一个声明全局变量的脚本标签。

<script type="text/javascript">
    var myVar = "{{ myVar }}"
</script>

<script type="text/javascript" src="{% static "scripts/my_script.js" %}"></script>

data 在视图中照常定义 get_context_data

def get_context_data(self, *args, **kwargs):
    context['myVar'] = True
    return context

Note, that if you want to pass a variable to an external .js script then you need to precede your script tag with another script tag that declares a global variable.

<script type="text/javascript">
    var myVar = "{{ myVar }}"
</script>

<script type="text/javascript" src="{% static "scripts/my_script.js" %}"></script>

data is defined in the view as usual in the get_context_data

def get_context_data(self, *args, **kwargs):
    context['myVar'] = True
    return context

回答 10

我在Django 2.1中使用这种方式并为我工作,这种方式很安全(参考)

Django方面:

def age(request):
    mydata = {'age':12}
    return render(request, 'test.html', context={"mydata_json": json.dumps(mydata)})

HTML方面:

<script type='text/javascript'>
     const  mydata = {{ mydata_json|safe }};
console.log(mydata)
 </script>

I use this way in Django 2.1 and work for me and this way is secure (reference):

Django side:

def age(request):
    mydata = {'age':12}
    return render(request, 'test.html', context={"mydata_json": json.dumps(mydata)})

Html side:

<script type='text/javascript'>
     const  mydata = {{ mydata_json|safe }};
console.log(mydata)
 </script>

回答 11

您可以在字符串中声明数组变量的地方组装整个脚本,如下所示,

views.py

    aaa = [41, 56, 25, 48, 72, 34, 12]
    prueba = "<script>var data2 =["
    for a in aaa:
        aa = str(a)
        prueba = prueba + "'" + aa + "',"
    prueba = prueba + "];</script>"

将生成如下字符串

prueba = "<script>var data2 =['41','56','25','48','72','34','12'];</script>"

拥有此字符串后,必须将其发送到模板

views.py

return render(request, 'example.html', {"prueba": prueba})

在模板中,您会收到它,并以书面形式将其解释为htm代码,例如您需要的javascript代码之前

模板

{{ prueba|safe  }}

在代码的其余部分下面,请记住,在示例中使用的变量是data2

<script>
 console.log(data2);
</script>

这样,您将保留数据类型,在这种情况下,这是一种安排

you can assemble the entire script where your array variable is declared in a string, as follows,

views.py

    aaa = [41, 56, 25, 48, 72, 34, 12]
    prueba = "<script>var data2 =["
    for a in aaa:
        aa = str(a)
        prueba = prueba + "'" + aa + "',"
    prueba = prueba + "];</script>"

that will generate a string as follows

prueba = "<script>var data2 =['41','56','25','48','72','34','12'];</script>"

after having this string, you must send it to the template

views.py

return render(request, 'example.html', {"prueba": prueba})

in the template you receive it and interpret it in a literary way as htm code, just before the javascript code where you need it, for example

template

{{ prueba|safe  }}

and below that is the rest of your code, keep in mind that the variable to use in the example is data2

<script>
 console.log(data2);
</script>

that way you will keep the type of data, which in this case is an arrangement


回答 12

Javascript中有两件事对我有用:

'{{context_variable|escapejs }}'

其他:在views.py中

from json import dumps as jdumps

def func(request):
    context={'message': jdumps('hello there')}
    return render(request,'index.html',context)

并在html中:

{{ message|safe }}

There are two things that worked for me inside Javascript:

'{{context_variable|escapejs }}'

and other: In views.py

from json import dumps as jdumps

def func(request):
    context={'message': jdumps('hello there')}
    return render(request,'index.html',context)

and in the html:

{{ message|safe }}