标签归档:javascript

是什么阻碍了Ruby,Python获得Javascript V8速度?[关闭]

问题:是什么阻碍了Ruby,Python获得Javascript V8速度?[关闭]

V8引擎是否有任何阻止优化实现(例如内联缓存)的Ruby / Python功能?

Python由Google家伙共同开发,因此不应被软件专利所阻止。

还是这与Google投入V8项目的资源有关。

Are there any Ruby / Python features that are blocking implementation of optimizations (e.g. inline caching) V8 engine has?

Python is co-developed by Google guys so it shouldn’t be blocked by software patents.

Or this is rather matter of resources put into the V8 project by Google.


回答 0

是什么阻碍了Ruby,Python获得Javascript V8速度?

没有。

好吧,好的:钱。(还有时间,人员,资源,但是如果您有钱,就可以购买。)

V8拥有一支由精明,高度专业,经验丰富(因此薪资很高)的工程师组成的团队,他们在创建高性能执行方面拥有数十年的经验(我个人是在讲话,而集体而言更像是几个世纪)动态OO语言的引擎。他们基本上都是创建Sun HotSpot JVM的人(还有许多其他人)。

首席开发人员Lars Bak实际上从事VM已有25年的历史(并且所有这些VM都达到了V8版本),这基本上就是他的整个(职业)生涯。有些编写Ruby VM的人甚至不到25岁。

是否有任何Ruby / Python功能阻止V8引擎执行优化(例如,内联缓存)?

至少考虑到IronRuby,JRuby,MagLev,MacRuby和Rubinius具有单态(IronRuby)或多态内联缓存,答案显然不是。

现代Ruby实现已经进行了大量优化。例如,对于某些操作,Rubinius的Hash类比YARV的类要快。现在,这听起来并不令人兴奋,直到您意识到Rubinius的Hash类是在100%纯Ruby中实现的,而YARV的类是在100%手动优化的C中实现的。

因此,至少在某些情况下,Rubinius可以生成比GCC更好的代码!

还是这与Google投入V8项目的资源有关。

是。不只是谷歌。V8的源代码沿袭至今已有25年的历史了。使用V8的人还创建了Self VM(迄今为止,这是迄今为止创建的最快的动态OO语言执行引擎之一),Animorphic Smalltalk VM(迄今为止,是迄今为止创建的最快的Smalltalk执行引擎之一),HotSpot JVM(有史以来最快的JVM,可能是最快的VM周期)和OOVM(有史以来最高效的Smalltalk VM之一)。

实际上,V8的首席开发人员Lars Bak致力于其中每一个,以及其他一些人。

What blocks Ruby, Python to get Javascript V8 speed?

Nothing.

Well, okay: money. (And time, people, resources, but if you have money, you can buy those.)

V8 has a team of brilliant, highly-specialized, highly-experienced (and thus highly-paid) engineers working on it, that have decades of experience (I’m talking individually – collectively it’s more like centuries) in creating high-performance execution engines for dynamic OO languages. They are basically the same people who also created the Sun HotSpot JVM (among many others).

Lars Bak, the lead developer, has been literally working on VMs for 25 years (and all of those VMs have lead up to V8), which is basically his entire (professional) life. Some of the people writing Ruby VMs aren’t even 25 years old.

Are there any Ruby / Python features that are blocking implementation of optimizations (e.g. inline caching) V8 engine has?

Given that at least IronRuby, JRuby, MagLev, MacRuby and Rubinius have either monomorphic (IronRuby) or polymorphic inline caching, the answer is obviously no.

Modern Ruby implementations already do a great deal of optimizations. For example, for certain operations, Rubinius’s Hash class is faster than YARV’s. Now, this doesn’t sound terribly exciting until you realize that Rubinius’s Hash class is implemented in 100% pure Ruby, while YARV’s is implemented in 100% hand-optimized C.

So, at least in some cases, Rubinius can generate better code than GCC!

Or this is rather matter of resources put into the V8 project by Google.

Yes. Not just Google. The lineage of V8’s source code is 25 years old now. The people who are working on V8 also created the Self VM (to this day one of the fastest dynamic OO language execution engines ever created), the Animorphic Smalltalk VM (to this day one of the fastest Smalltalk execution engines ever created), the HotSpot JVM (the fastest JVM ever created, probably the fastest VM period) and OOVM (one of the most efficient Smalltalk VMs ever created).

In fact, Lars Bak, the lead developer of V8, worked on every single one of those, plus a few others.


回答 1

高度优化JavaScript解释器的动力更大,这就是为什么我们看到Mozilla,Google和Microsoft之间投入了大量资源的原因。JavaScript必须在(通常是不耐烦的)人在等待时进行下载,解析,编译和实时运行,它必须在有人与之交互时运行,并且它是在不受控制的客户端中进行的可以是计算机,电话或烤面包机的环境。为了有效地在这些条件下运行,它必须高效。

Python和Ruby在开发人员/部署者控制的环境中运行。通常,功能强大的服务器或台式机系统的限制因素将是诸如内存或磁盘I / O之类的因素,而不是执行时间。或者在可以利用非引擎优化(例如缓存)的地方。对于这些语言,将重点放在语言和库功能集而不是速度优化上可能更有意义。

这样做的附带好处是,我们有两个出色的高性能开源JavaScript引擎,这些引擎可以并且正在重新用于所有类型的应用程序,例如Node.js。

There’s a lot more impetus to highly optimize JavaScript interpretors which is why we see so many resources being put into them between Mozilla, Google, and Microsoft. JavaScript has to be downloaded, parsed, compiled, and run in real time while a (usually impatient) human being is waiting for it, it has to run WHILE a person is interacting with it, and it’s doing this in an uncontrolled client-end environment that could be a computer, a phone, or a toaster. It HAS to be efficient in order to run under these conditions effectively.

Python and Ruby are run in an environment controlled by the developer/deployer. A beefy server or desktop system generally where the limiting factor will be things like memory or disk I/O and not execution time. Or where non-engine optimizations like caching can be utilized. For these languages it probably does make more sense to focus on language and library feature set over speed optimization.

The side benefit of this is that we have two great high performance open source JavaScript engines that can and are being re-purposed for all manner of applications such as Node.js.


回答 2

其中很大一部分与社区有关。大多数情况下,Python和Ruby没有公司的支持。没有人获得全职从事Python和Ruby工作的报酬(尤其是他们没有获得始终从事CPython或MRI工作的报酬)。另一方面,V8得到了世界上最强大的IT公司的支持。

此外,V8可以更快,因为对V8员工唯一重要的是解释器-他们没有标准的库可以使用,也无需担心语言设计。他们只是写翻译。而已。

它与知识产权法无关。Python也不是由Google伙计共同开发的(它的创建者与其他一些提交者在那儿一起工作,但是在Python上工作他们没有得到报酬)。

Python 3的另一个障碍是Python3。它的采用似乎是语言开发人员的主要关注点,以至于他们冻结了新语言功能的开发,直到其他实现赶上来。

关于技术细节,我对Ruby不太了解,但是Python在很多地方都可以使用优化功能(Google项目Unladen Swallow在开始努力之前就开始实现这些功能)。这是他们计划的一些优化。如果为CPython实现JIT la PyPy,我可以看到Python在将来获得V8的速度,但这在未来几年似乎不太可能(目前的重点是采用Python 3,而不是JIT)。

许多人还认为Ruby和Python可以从删除各自的全局解释器锁中受益匪浅。

您还必须了解Python和Ruby都是比JS重得多的语言-它们以标准库,语言功能和结构的方式提供了更多内容。单独的面向对象的类系统增加了很多权重(我认为这是一种很好的方式)。我几乎认为Javascript是一种旨在嵌入的语言,例如Lua(在许多方面,它们是相似的)。Ruby和Python具有更丰富的功能集,而表现力通常是以速度为代价的。

A good part of it has to do with community. Python and Ruby for the most part have no corporate backing. No one gets paid to work on Python and Ruby full-time (and they especially don’t get paid to work on CPython or MRI the whole time). V8, on the other hand, is backed by the most powerful IT company in the world.

Furthermore, V8 can be faster because the only thing that matters to the V8 people is the interpreter — they have no standard library to work on, no concerns about language design. They just write the interpreter. That’s it.

It has nothing to do with intellectual property law. Nor is Python co-developed by Google guys (its creator works there along with a few other committers, but they don’t get paid to work on Python).

Another obstacle to Python speed is Python 3. Its adoption seems to be the main concern of the language developers — to the point that they have frozen development of new language features until other implementations catch up.

On to the technical details, I don’t know much about Ruby, but Python has a number of places where optimizations could be used (and Unladen Swallow, a Google project, started to implement these before biting the dust). Here are some of the optimizations that they planned. I could see Python gaining V8 speed in the future if a JIT a la PyPy gets implemented for CPython, but that does not seem likely for the coming years (the focus right now is Python 3 adoption, not a JIT).

Many also feel that Ruby and Python could benefit immensely from removing their respective global interpreter locks.

You also have to understand that Python and Ruby are both much heavier languages than JS — they provide far more in the way of standard library, language features, and structure. The class system of object-orientation alone adds a great deal of weight (in a good way, I think). I almost think of Javascript as a language designed to be embedded, like Lua (and in many ways, they are similar). Ruby and Python have a much richer set of features, and that expressiveness is usually going to come at the cost of speed.


回答 3

性能似乎并不是核心Python开发人员的主要关注点,他们似乎认为“足够快”足够好,并且帮助程序员提高生产率的功能比帮助计算机更快地运行代码的功能更为重要。

但是,确实的确有一个Google项目(现在已被废弃),未满载吞咽,用于生成与标准解释器兼容的更快的Python解释器。PyPy是另一个旨在产生更快的Python的项目。还有PsyPy的先驱PsycoCython,它们可以在不更改整个解释器的情况下提高许多Python脚本的性能,而Cython则可以让您使用非常类似于Python语法的方式为Python编写高性能的C库。

Performance doesn’t seem to be a major focus of the core Python developers, who seem to feel that “fast enough” is good enough, and that features that help programmers be more productive are more important than features that help computers run code faster.

Indeed, however, there was a (now abandoned) Google project, unladen-swallow, to produce a faster Python interpreter compatible with the standard interpreter. PyPy is another project that intends to produce a faster Python. There is also Psyco, the forerunner of PyPy, which can provide performance boosts to many Python scripts without changing out the whole interpreter, and Cython, which lets you write high-performance C libraries for Python using something very much like Python syntax.


回答 4

误导性的问题。V8是JavaScript的JIT(即时编译器)实现,在其最流行的非浏览器实现Node.js中,它是围绕事件循环构建的。CPython不是JIT,也不是事件。但是这些在PyPy项目中最普遍地存在于Python中-一个兼容CPython 2.7(不久将成为3.0+)的JIT。并且有大量事件服务器库,例如Tornado。在运行Tornado与Node.js的PyPy之间存在真实的测试,并且性能差异很小。

Misleading question. V8 is a JIT (a just in time compiler) implementation of JavaScript and in its most popular non-browser implementation Node.js it is constructed around an event loop. CPython is not a JIT & not evented. But these exist in Python most commonly in the PyPy project – a CPython 2.7 (and soon to be 3.0+) compatible JIT. And there are loads of evented server libraries like Tornado for example. Real world tests exist between PyPy running Tornado vs Node.js and the performance differences are slight.


回答 5

我只是遇到了这个问题,而性能差异也有一个未提及的重大技术原因。Python拥有功能强大的软件扩展的庞大生态系统,但是其中大多数扩展都是用C或其他低级语言编写的,以提高性能,并且与CPython API紧密相关。

有许多众所周知的技术(JIT,现代垃圾收集器等)可用于加速CPython实现,但是所有这些都需要对API进行实质性更改,从而破坏了该过程中的大多数扩展。CPython会更快,但是很多使Python如此吸引人的东西(广泛的软件堆栈)将会丢失。恰当的例子是,那里有几种更快的Python实现,但是与CPython相比,它们几乎没有吸引力。

I just ran across this question and there is also a big technical reason for the performance difference that wasn’t mentioned. Python has a very large ecosystem of powerful software extensions, but most of these extensions are written in C or other low-level languages for performance and are heavily tied to the CPython API.

There are lots of well-known techniques (JIT, modern garbage collector, etc) that could be used to speed up the CPython implementation but all would require substantial changes to the API, breaking most of the extensions in the process. CPython would be faster, but a lot of what makes Python so attractive (the extensive software stack) would be lost. Case in point, there are several faster Python implementations out there but they have little traction compared to CPython.


回答 6

由于不同的设计优先级和用例目标,我相信。

通常,脚本(aka动态)语言的主要目的是成为本机函数调用之间的“胶水”。这些本机功能应a)覆盖最关键/经常使用的区域,并且b)尽可能有效。

这是一个示例: 导致iOS Safari冻结jQuery排序冻结是由于过度使用了按选择器调用而引起的。如果按选择器获取将以本机代码实现并且有效,那么根本就不会出现这样的问题。

考虑ray-tracer演示,它是V8演示的常用演示。在Python世界中,由于Python提供了本机扩展的所有功能,因此可以用本机代码实现。但是在V8领域(客户端沙箱)中,您没有其他选择,只能使VM发挥更大的作用。因此,唯一的选择是使用脚本代码来查看ray-tracer的实现。

因此有不同的优先事项和动机。

Sciter中,我通过本地实现几乎完整的jQurey核心进行了测试。在诸如ScIDE(由HTML / CSS / Script制成的IDE)之类的实际任务上,我相信这种解决方案比任何VM优化都有效。

Because of different design priorities and use case goals I believe.

In general main purpose of scripting (a.k.a. dynamic) languages is to be a “glue” between calls of native functions. And these native functions shall a) cover most critical/frequently used areas and b) be as effective as possible.

Here is an example: jQuery sort causing iOS Safari to freeze The freeze there is caused by excessive use of get-by-selector calls. If get-by-selector would be implemented in native code and effectively it will be no such problem at all.

Consider ray-tracer demo that is frequently used demo for V8 demonstration. In Python world it can be implemented in native code as Python provides all facilities for native extensions. But in V8 realm (client side sandbox) you have no other options rather than making VM to be [sub]effective as possible. And so the only option see ray-tracer implementation there is by using script code.

So different priorities and motivations.

In Sciter I’ve made a test by implementing pretty much full jQurey core natively. On practical tasks like ScIDE (IDE made of HTML/CSS/Script) I believe such solution works significantly better then any VM optimizations.


回答 7

正如其他人所提到的,Python具有PyPy形式的高性能JIT编译器。

进行有意义的基准测试总是很微妙的,但是我碰巧有一个用不同语言编写的K均值的简单基准测试-您可以在这里找到。约束之一是,各种语言都应实现相同的算法,并应努力做到简单和惯用(相对于速度优化而言)。我已经编写了所有实现,所以我知道我没有被欺骗,尽管我不能对所有语言都声称我所写的东西是惯用的(我只对其中的一些有所了解)。

我没有任何明确的结论,但是PyPy是我得到的最快的实现之一,远胜于Node。相反,CPython是排名最慢的一端。

As other people have mentioned, Python has a performant JIT compiler in the form of PyPy.

Making meaningful benchmarks is always subtle, but I happen to have a simple benchmark of K-means written in different languages – you can find it here. One of the constraints was that the various languages should all implement the same algorithm and should strive to be simple and idiomatic (as opposed to optimized for speed). I have written all the implementations, so I know I have not cheated, although I cannot claim for all languages that what I have written is idiomatic (I only have a passing knowledge of some of those).

I do not claim any definitive conclusion, but PyPy was among the fastest implementations I got, far better than Node. CPython, instead, was at the slowest end of the ranking.


回答 8

该说法不完全正确

就像V8只是JS的实现一样,CPython也是Python的一种实现。Pypy具有与V8匹配的性能

此外,还存在可感知的性能问题:由于V8本质上是非阻塞的,因此Web开发人员可以节省更多的IO等待,从而导致性能更高的项目。V8主要用于以IO为关键的dev Web,因此他们将其与类似项目进行了比较。但是您可以在Web开发人员之外的许多其他领域中使用Python。您甚至可以使用C扩展来完成许多任务,例如科学计算或加密,并以出色的性能处理数据。

但是在网络上,最流行的Python和Ruby项目正在阻止。尤其是Python,它具有同步WSGI标准的遗产,像著名的Django之类的框架都基于它。

您可以编写异步Python(例如Twisted,Tornado,gevent或asyncio)或Ruby。但是它并不经常执行。最好的工具仍在阻塞。

但是,这是为什么Ruby和Python中的默认实现没有V8这么快的一些原因。

经验

就像JörgW Mittag指出的那样,从事V8工作的人都是VM天才。Python是一群热情的开发人员,在很多领域都非常擅长,但并不擅长VM调优。

资源资源

Python软件基金会的资金很少:一年不到40K的资金来投资Python。当您认为像Google,Facebook或Apple这样的大公司都在使用Python时,这有点疯狂,但这是一个丑陋的事实:大多数工作都是免费完成的。自愿者在Java手工制作之前为Youtube提供支持的语言。

他们是聪明而敬业的志愿者,但是当他们确定自己在田间需要更多果汁时,就无法要求30万聘请该领域的顶尖专家。他们必须四处寻找免费提供服务的人。

在此有效的同时,这意味着您必须非常注意自己的优先事项。因此,现在我们需要查看:

目标

即使具有最新的现代功能,编写Java脚本也很糟糕。您遇到范围问题,集合极少,可怕的字符串和数组操作,几乎没有日期,数学和正则表达式之外的stdlist,甚至对于非常常见的操作也没有语法糖。

但是在V8中,您已经有了速度。

这是因为,速度是Google的主要目标,因为它是Chrome浏览器页面渲染的瓶颈。

在Python中,可用性是主要目标。因为这几乎从来不是项目的瓶颈。这里的稀缺资源是开发人员时间。针对开发人员进行了优化。

The statement is not exactly true

Just like V8 is just an implementation for JS, CPython is just one implementation for Python. Pypy has performances matching V8’s.

Also, there the problem of perceived performance : since V8 is natively non blocking, Web dev leads to more performant projects because you save the IO wait. And V8 is mainly used for dev Web where IO is key, so they compare it to similar projects. But you can use Python in many, many other areas than web dev. And you can even use C extensions for a lot of tasks, such as scientific computations or encryption, and crunch data with blazing perfs.

But on the web, most popular Python and Ruby projects are blocking. Python, especially, has the legacy of the synchronous WSGI standard, and frameworks like the famous Django are based on it.

You can write asynchronous Python (like with Twisted, Tornado, gevent or asyncio) or Ruby. But it’s not done often. The best tools are still blocking.

However, they are some reasons for why the default implementations in Ruby and Python are not as speedy as V8.

Experience

Like Jörg W Mittag pointed out, the guys working on V8 are VM geniuses. Python is dev by a bunch a passionate people, very good in a lot of domains, but are not as specialized in VM tuning.

Resources

The Python Software foundation has very little money : less than 40k in a year to invest in Python. This is kinda crazy when you think big players such as Google, Facebook or Apple are all using Python, but it’s the ugly truth : most work is done for free. The language that powers Youtube and existed before Java has been handcrafted by volunteers.

They are smart and dedicated volunteers, but when they identify they need more juice in a field, they can’t ask for 300k to hire a top notch specialist for this area of expertise. They have to look around for somebody who would do it for free.

While this works, it means you have to be very a careful about your priorities. Hence, now we need to look at :

Objectives

Even with the latest modern features, writing Javascript is terrible. You have scoping issues, very few collections, terrible string and array manipulation, almost no stdlist apart from date, maths and regexes, and no syntactic sugar even for very common operations.

But in V8, you’ve got speed.

This is because, speed was the main objective for Google, since it’s a bottleneck for page rendering in Chrome.

In Python, usability is the main objective. Because it’s almost never the bottleneck on the project. The scarce resource here is developer time. It’s optimized for the developer.


回答 9

因为JavaScript实现不需要关心其绑定的向后兼容性。

直到最近,JavaScript实现的唯一用户还是Web浏览器。由于安全性要求,仅Web浏览器供应商有权通过向运行时编写绑定来扩展功能。因此,无需保持绑定的C API向后兼容,可以允许Web浏览器开发人员随着JavaScript运行时的发展来更新其源代码。他们反正一起工作。甚至是V8,它是游戏的后来者,也是由一个非常有经验的开发人员领导的,随着它变得越来越好,它也改变了API。

OTOH Ruby(主要)用于服务器端。许多流行的ruby扩展都被编写为C绑定(考虑RDBMS驱动程序)。换句话说,如果不保持兼容性,Ruby永远不会成功。

今天,这种差异在一定程度上仍然存在。使用node.js的开发人员抱怨说,随着V8随着时间的推移更改API,很难保持其本机扩展向后兼容(这是node.js被派生的原因之一)。在这方面,IIRC红宝石仍采取更为保守的方法。

Because JavaScript implementations need not care about backwards compatibility of their bindings.

Until recently the only users of the JavaScript implementations have been web browsers. Due to security requirements, only the web browser vendors had the privilege to extend the functionality by writing bindings to the runtimes. Thus there was no need keep the C API of the bindings backwards compatible, it was permissible to request the web browser developers update their source code as the JavaScript runtimes evolved; they were working together anyways. Even V8, which was a latecomer to the game, and also lead by a very very experienced developer, have changed the API as it became better.

OTOH Ruby is used (mainly) on the server-side. Many popular ruby extensions are written as C bindings (consider an RDBMS driver). In other words, Ruby would have never succeeded without maintaining the compatibility.

Today, the difference still exist to some extent. Developers using node.js are complaining that it is hard to keep their native extensions backwards compatible, as V8 changes the API over time (and that is one of the reasons node.js has been forked). IIRC ruby is still taking a much more conservative approach in this respect.


回答 10

由于JIT,曲轴,类型推断器和数据优化代码,V8速度很快。标记指针,NaN标记双打。当然,它在中间进行常规的编译器优化。

普通的ruby,python和perl引擎都不会做这些,而只是进行一些基本的优化。

唯一接近的主要虚拟机是luajit,它甚至不进行类型推断,常量折叠,NaN标记或整数,但是使用类似的小代码和数据结构,不如不良语言那么胖。我的原型动态语言potion和p2具有与luajit类似的功能,并且胜过v8。使用可选的类型系统“渐进式键入”,您可以轻松超越v8,因为您可以绕过曲轴。参见飞镖。

已知的优化后端(如pypy或jruby)仍然遭受各种过度设计技术的困扰。

V8 is fast due to the JIT, Crankshaft, the type inferencer and data-optimized code. Tagged pointers, NaN-tagging of doubles. And of course it does normal compiler optimizations in the middle.

The plain ruby, python and perl engines don’t do neither of the those, just minor basic optimizations.

The only major vm which comes close is luajit, which doesn’t even do type inference, constant folding, NaN-tagging nor integers, but uses similar small code and data structures, not as fat as the bad languages. And my prototype dynamic languages, potion and p2 have similar features as luajit, and outperform v8. With an optional type system, “gradual typing”, you could easily outperform v8, as you can bypass crankshaft. See dart.

The known optimized backends, like pypy or jruby still suffer from various over-engineering techniques.


网站可以检测到何时在chromedriver中使用硒吗?

问题:网站可以检测到何时在chromedriver中使用硒吗?

我一直在使用Chromedriver测试Selenium,但我注意到有些页面可以检测到您正在使用Selenium,即使根本没有自动化。即使当我只是通过Selenium和Xephyr使用chrome手动浏览时,我也经常得到一个页面,指出检测到可疑活动。我已经检查了用户代理和浏览器指纹,它们与普通的chrome浏览器完全相同。

当我以普通的chrome浏览到这些站点时,一切正常,但是当我使用Selenium时,我被检测到。

从理论上讲,chromedriver和chrome在任何Web服务器上看起来都应该完全相同,但是它们可以通过某种方式检测到它。

如果您想要一些测试代码,请尝试以下方法:

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=1, size=(1600, 902))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-plugins-discovery");
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800,800)
driver.set_window_position(0,0)
print 'arguments done'
driver.get('http://stubhub.com')

如果浏览stubhub,您将在一个或两个请求中被重定向和“阻止”。我一直在对此进行调查,无法弄清楚他们如何分辨用户正在使用Selenium。

他们是怎么做到的呢?

编辑更新:

我在Firefox中安装了Selenium IDE插件,当我在普通的Firefox浏览器中仅使用附加插件访问stubhub.com时就被禁止了。

编辑:

当我使用Fiddler来回查看HTTP请求时,我注意到“假浏览器”的请求通常在响应标头中具有“ no-cache”。

编辑:

像这样的结果是否有办法从Javascript检测到我在Selenium Webdriver页面中,这表明应该没有办法检测何时使用Webdriver。但这证据表明并非如此。

编辑:

该站点将指纹上载到他们的服务器,但是我检查了一下,硒的指纹与使用chrome时的指纹相同。

编辑:

这是它们发送到服务器的指纹有效载荷之一

{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US","cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},"screen":{"width":1600,"height":900,"colorDepth":24},"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}

硒和铬相同

编辑:

VPN只能使用一次,但是在加载第一页后会被检测到。显然,正在运行一些JavaScript来检测Selenium。

I’ve been testing out Selenium with Chromedriver and I noticed that some pages can detect that you’re using Selenium even though there’s no automation at all. Even when I’m just browsing manually just using chrome through Selenium and Xephyr I often get a page saying that suspicious activity was detected. I’ve checked my user agent, and my browser fingerprint, and they are all exactly identical to the normal chrome browser.

When I browse to these sites in normal chrome everything works fine, but the moment I use Selenium I’m detected.

In theory chromedriver and chrome should look literally exactly the same to any webserver, but somehow they can detect it.

If you want some testcode try out this:

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=1, size=(1600, 902))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-plugins-discovery");
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800,800)
driver.set_window_position(0,0)
print 'arguments done'
driver.get('http://stubhub.com')

If you browse around stubhub you’ll get redirected and ‘blocked’ within one or two requests. I’ve been investigating this and I can’t figure out how they can tell that a user is using Selenium.

How do they do it?

EDIT UPDATE:

I installed the Selenium IDE plugin in Firefox and I got banned when I went to stubhub.com in the normal firefox browser with only the additional plugin.

EDIT:

When I use Fiddler to view the HTTP requests being sent back and forth I’ve noticed that the ‘fake browser\’s’ requests often have ‘no-cache’ in the response header.

EDIT:

results like this Is there a way to detect that I’m in a Selenium Webdriver page from Javascript suggest that there should be no way to detect when you are using a webdriver. But this evidence suggests otherwise.

EDIT:

The site uploads a fingerprint to their servers, but I checked and the fingerprint of selenium is identical to the fingerprint when using chrome.

EDIT:

This is one of the fingerprint payloads that they send to their servers

{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US","cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},"screen":{"width":1600,"height":900,"colorDepth":24},"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}

Its identical in selenium and in chrome

EDIT:

VPNs work for a single use but get detected after I load the first page. Clearly some javascript is being run to detect Selenium.


回答 0

对于Mac用户

cdc_使用Vim或Perl 替换变量

您可以使用vim,或如@Vic Seedoubleyew在@ Erti-Chris Eelmaa的答案中指出的那样perl,替换中的cdc_变量chromedriver请参阅@ Erti-Chris Eelmaa的帖子以了解有关该变量的更多信息)。使用vimperl防止您不得不重新编译源代码或使用十六进制编辑器。chromedriver在尝试编辑原件之前,请确保对其进行复印。另外,以下方法也在上进行了测试chromedriver version 2.41.578706


使用Vim

vim /path/to/chromedriver

在上面的代码行之后,您可能会看到一堆乱码。请执行下列操作:

  1. cdc_通过键入/cdc_并按进行搜索return
  2. 按启用编辑a
  3. 删除任意数量的,$cdc_lasutopfhvcZLmcfl然后用相等数量的字符替换删除的内容。如果您不这样做,chromedriver将会失败。
  4. 编辑完成后,按esc
  5. 要保存更改并退出,请键入:wq!并按return
  6. 如果您不想保存更改,但要退出,请键入:q!并按return
  7. 你完成了。

转到更改后的chromedriver双击。terminal应打开一个窗口。如果killed在输出中看不到,则说明您成功更改了驱动程序。


使用Perl

下面的行替换cdc_dog_

perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver

确保替换字符串的字符数与搜索字符串的字符数相同,否则chromedriver将失败。

Perl说明

s///g 表示您要搜索一个字符串并将其全局替换为另一个字符串(替换所有出现的字符串)。

例如, s/string/replacment/g

所以,

s/// 表示搜索并替换字符串。

cdc_ 是搜索字符串。

dog_ 是替换字符串。

g 是全局键,它将替换每次出现的字符串。

如何检查Perl替代品是否有效

以下行将打印每次出现的搜索字符串cdc_

perl -ne 'while(/cdc_/g){print "$&\n";}' /path/to/chromedriver

如果没有返回任何内容,cdc_则已被替换。

相反,您可以使用以下代码:

perl -ne 'while(/dog_/g){print "$&\n";}' /path/to/chromedriver

查看替换字符串,dog_现在是否在chromedriver二进制文件中。如果是这样,替换字符串将被打印到控制台。

转到更改后的chromedriver双击。terminal应打开一个窗口。如果killed在输出中看不到,则说明您成功更改了驱动程序。


包起来

更改chromedriver二进制文件后,请确保更改后的二进制文件的名称chromedriverchromedriver,并且原始二进制文件已从其原始位置移动或重命名。


我对这种方法的经验

以前,我在尝试登录时在网站上被检测到我,但是用cdc_相同大小的字符串替换后,我得以登录。但是就像其他人所说的那样,如果已经被检测到,则可能会被阻止即使使用此方法后,还有其他原因。因此,您可能必须尝试使用​​VPN,其他网络或具有什么功能的站点访问检测到您的站点。

For Mac Users

Replacing cdc_ variable using Vim or Perl

You can use vim, or as @Vic Seedoubleyew has pointed out in the answer by @Erti-Chris Eelmaa, perl, to replace the cdc_ variable in chromedriver(See post by @Erti-Chris Eelmaa to learn more about that variable). Using vim or perl prevents you from having to recompile source code or use a hex-editor. Make sure to make a copy of the original chromedriver before attempting to edit it. Also, the methods below were tested on chromedriver version 2.41.578706.


Using Vim

vim /path/to/chromedriver

After running the line above, you’ll probably see a bunch of gibberish. Do the following:

  1. Search for cdc_ by typing /cdc_ and pressing return.
  2. Enable editing by pressing a.
  3. Delete any amount of $cdc_lasutopfhvcZLmcfl and replace what was deleted with an equal amount characters. If you don’t, chromedriver will fail.
  4. After you’re done editing, press esc.
  5. To save the changes and quit, type :wq! and press return.
  6. If you don’t want to save the changes, but you want to quit, type :q! and press return.
  7. You’re done.

Go to the altered chromedriver and double click on it. A terminal window should open up. If you don’t see killed in the output, you successfully altered the driver.


Using Perl

The line below replaces cdc_ with dog_:

perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver

Make sure that the replacement string has the same number of characters as the search string, otherwise the chromedriver will fail.

Perl Explanation

s///g denotes that you want to search for a string and replace it globally with another string (replaces all occurrences).

e.g., s/string/replacment/g

So,

s/// denotes searching for and replacing a string.

cdc_ is the search string.

dog_ is the replacement string.

g is the global key, which replaces every occurrence of the string.

How to check if the Perl replacement worked

The following line will print every occurrence of the search string cdc_:

perl -ne 'while(/cdc_/g){print "$&\n";}' /path/to/chromedriver

If this returns nothing, then cdc_ has been replaced.

Conversely, you can use the this:

perl -ne 'while(/dog_/g){print "$&\n";}' /path/to/chromedriver

to see if your replacement string, dog_, is now in the chromedriver binary. If it is, the replacement string will be printed to the console.

Go to the altered chromedriver and double click on it. A terminal window should open up. If you don’t see killed in the output, you successfully altered the driver.


Wrapping Up

After altering the chromedriver binary, make sure that the name of the altered chromedriver binary is chromedriver, and that the original binary is either moved from its original location or renamed.


My Experience With This Method

I was previously being detected on a website while trying to log in, but after replacing cdc_ with an equal sized string, I was able to log in. Like others have said though, if you’ve already been detected, you might get blocked for a plethora of other reasons even after using this method. So you may have to try accessing the site that was detecting you using a VPN, different network, or what have you.


回答 1

基本上,硒检测的工作方式是,它们检测与selenium一起运行时出现的预定义javascript变量。僵尸程序检测脚本通常会在任何变量中(在窗口对象上)查找包含单词“ selenium” /“ webdriver”的内容,并记录名为$cdc_和的变量$wdc_。当然,所有这些取决于您所使用的浏览器。所有不同的浏览器都公开不同的内容。

对我来说,我使用了chrome,所以,要做的就是确保$cdc_不再存在作为文档变量,然后瞧瞧(下载chromedriver源代码,修改chromedriver并$cdc_以不同的名称重新编译。)

这是我在chromedriver中修改的功能:

call_function.js:

function getPageCache(opt_doc) {
  var doc = opt_doc || document;
  //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'randomblabla_';
  if (!(key in doc))
    doc[key] = new Cache();
  return doc[key];
}

(注意评论,我所做的我转过身$cdc_randomblabla_

这是一个伪代码,演示了僵尸网络可能使用的一些技术:

runBotDetection = function () {
    var documentDetectionKeys = [
        "__webdriver_evaluate",
        "__selenium_evaluate",
        "__webdriver_script_function",
        "__webdriver_script_func",
        "__webdriver_script_fn",
        "__fxdriver_evaluate",
        "__driver_unwrapped",
        "__webdriver_unwrapped",
        "__driver_evaluate",
        "__selenium_unwrapped",
        "__fxdriver_unwrapped",
    ];

    var windowDetectionKeys = [
        "_phantom",
        "__nightmare",
        "_selenium",
        "callPhantom",
        "callSelenium",
        "_Selenium_IDE_Recorder",
    ];

    for (const windowDetectionKey in windowDetectionKeys) {
        const windowDetectionKeyValue = windowDetectionKeys[windowDetectionKey];
        if (window[windowDetectionKeyValue]) {
            return true;
        }
    };
    for (const documentDetectionKey in documentDetectionKeys) {
        const documentDetectionKeyValue = documentDetectionKeys[documentDetectionKey];
        if (window['document'][documentDetectionKeyValue]) {
            return true;
        }
    };

    for (const documentKey in window['document']) {
        if (documentKey.match(/\$[a-z]dc_/) && window['document'][documentKey]['cache_']) {
            return true;
        }
    }

    if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true;

    if (window['document']['documentElement']['getAttribute']('selenium')) return true;
    if (window['document']['documentElement']['getAttribute']('webdriver')) return true;
    if (window['document']['documentElement']['getAttribute']('driver')) return true;

    return false;
};

根据用户@szx,也可以在十六进制编辑器中简单地打开chromedriver.exe,然后手动进行替换,而无需进行任何编译。

Basically the way the selenium detection works, is that they test for pre-defined javascript variables which appear when running with selenium. The bot detection scripts usually look anything containing word “selenium” / “webdriver” in any of the variables (on window object), and also document variables called $cdc_ and $wdc_. Of course, all of this depends on which browser you are on. All the different browsers expose different things.

For me, I used chrome, so, all that I had to do was to ensure that $cdc_ didn’t exist anymore as document variable, and voila (download chromedriver source code, modify chromedriver and re-compile $cdc_ under different name.)

this is the function I modified in chromedriver:

call_function.js:

function getPageCache(opt_doc) {
  var doc = opt_doc || document;
  //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'randomblabla_';
  if (!(key in doc))
    doc[key] = new Cache();
  return doc[key];
}

(note the comment, all I did I turned $cdc_ to randomblabla_.

Here is a pseudo-code which demonstrates some of the techniques that bot networks might use:

runBotDetection = function () {
    var documentDetectionKeys = [
        "__webdriver_evaluate",
        "__selenium_evaluate",
        "__webdriver_script_function",
        "__webdriver_script_func",
        "__webdriver_script_fn",
        "__fxdriver_evaluate",
        "__driver_unwrapped",
        "__webdriver_unwrapped",
        "__driver_evaluate",
        "__selenium_unwrapped",
        "__fxdriver_unwrapped",
    ];

    var windowDetectionKeys = [
        "_phantom",
        "__nightmare",
        "_selenium",
        "callPhantom",
        "callSelenium",
        "_Selenium_IDE_Recorder",
    ];

    for (const windowDetectionKey in windowDetectionKeys) {
        const windowDetectionKeyValue = windowDetectionKeys[windowDetectionKey];
        if (window[windowDetectionKeyValue]) {
            return true;
        }
    };
    for (const documentDetectionKey in documentDetectionKeys) {
        const documentDetectionKeyValue = documentDetectionKeys[documentDetectionKey];
        if (window['document'][documentDetectionKeyValue]) {
            return true;
        }
    };

    for (const documentKey in window['document']) {
        if (documentKey.match(/\$[a-z]dc_/) && window['document'][documentKey]['cache_']) {
            return true;
        }
    }

    if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true;

    if (window['document']['documentElement']['getAttribute']('selenium')) return true;
    if (window['document']['documentElement']['getAttribute']('webdriver')) return true;
    if (window['document']['documentElement']['getAttribute']('driver')) return true;

    return false;
};

according to user @szx, it is also possible to simply open chromedriver.exe in hex editor, and just do the replacement manually, without actually doing any compiling.


回答 2

正如我们已经在问题和发布的答案中弄清楚的那样,这里有一个反Web 爬网和一个名为“ Distil Networks”的Bot检测服务。而且,根据公司首席执行官的采访

即使他们可以创建新的机器人,我们还是想出了一种方法来识别Selenium,即他们正在使用的工具,因此,无论Selenium在该机器人上迭代多少次,我们都将阻止它。我们现在使用Python和许多不同的技术来做到这一点。一旦我们发现一种类型的漫游器出现了某种模式,那么我们就会对他们使用的技术进行反向工程并将其识别为恶意软件。

要了解它们如何精确地检测硒,需要时间和其他挑战,但是目前我们可以肯定地说些什么:

  • 它与您对硒采取的措施无关-一旦导航到该站点,便会立即被发现并被禁止。我尝试在动作之间添加人为的随机延迟,在页面加载后暂停-没有任何帮助
  • 这也不是关于浏览器指纹的-在具有干净配置文件而不是隐身模式的多个浏览器中尝试过-没有任何帮助
  • 因为根据采访中的提示,这是“逆向工程”,所以我怀疑这是通过在浏览器中执行一些JS代码完成的,这表明这是通过Selenium Webdriver自动化的浏览器

决定将其发布为答案,因为显然:

网站可以检测到何时在chromedriver中使用硒吗?

是。


另外,我还没有尝试过使用较旧的硒和较旧的浏览器版本-从理论上讲,Distil Networks僵尸检测程序当前依赖于某个特定点,硒中可能实现或添加了某些东西。然后,如果是这种情况,我们可能会检测到(是的,让我们检测检测器)在哪个点/版本上进行了相关更改,调查变更日志和变更集,并且可能会为我们提供有关在哪里查看的更多信息。以及它们用于检测由Webdriver驱动的浏览器的功能。这只是一个需要检验的理论。

As we’ve already figured out in the question and the posted answers, there is an anti Web-scraping and a Bot detection service called “Distil Networks” in play here. And, according to the company CEO’s interview:

Even though they can create new bots, we figured out a way to identify Selenium the a tool they’re using, so we’re blocking Selenium no matter how many times they iterate on that bot. We’re doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious.

It’ll take time and additional challenges to understand how exactly they are detecting Selenium, but what can we say for sure at the moment:

  • it’s not related to the actions you take with selenium – once you navigate to the site, you get immediately detected and banned. I’ve tried to add artificial random delays between actions, take a pause after the page is loaded – nothing helped
  • it’s not about browser fingerprint either – tried it in multiple browsers with clean profiles and not, incognito modes – nothing helped
  • since, according to the hint in the interview, this was “reverse engineering”, I suspect this is done with some JS code being executed in the browser revealing that this is a browser automated via selenium webdriver

Decided to post it as an answer, since clearly:

Can a website detect when you are using selenium with chromedriver?

Yes.


Also, what I haven’t experimented with is older selenium and older browser versions – in theory, there could be something implemented/added to selenium at a certain point that Distil Networks bot detector currently relies on. Then, if this is the case, we might detect (yeah, let’s detect the detector) at what point/version a relevant change was made, look into changelog and changesets and, may be, this could give us more information on where to look and what is it they use to detect a webdriver-powered browser. It’s just a theory that needs to be tested.


回答 3

在wellsfargo.com上如何实施的示例:

try {
 if (window.document.documentElement.getAttribute("webdriver")) return !+[]
} catch (IDLMrxxel) {}
try {
 if ("_Selenium_IDE_Recorder" in window) return !+""
} catch (KknKsUayS) {}
try {
 if ("__webdriver_script_fn" in document) return !+""

Example of how it’s implemented on wellsfargo.com:

try {
 if (window.document.documentElement.getAttribute("webdriver")) return !+[]
} catch (IDLMrxxel) {}
try {
 if ("_Selenium_IDE_Recorder" in window) return !+""
} catch (KknKsUayS) {}
try {
 if ("__webdriver_script_fn" in document) return !+""

回答 4

混淆JavaScript结果

我已经检查了chromedriver源代码。这会将一些javascript文件注入浏览器。
此链接上的每个javascript文件都会注入到以下网页: https : //chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/js/

因此,我使用了逆向工程,并通过十六进制编辑来模糊化js文件。现在,我确定不再使用JavaScript变量,函数名称和固定字符串来发现硒的活动。但是仍然有些站点和reCaptcha可以检测到硒!
也许他们检查由chromedriver js执行引起的修改:)


编辑1:

Chrome“导航器”参数修改

我发现“导航器”中有一些参数可以简要介绍chromedriver的使用。这些是参数:

  • “ navigator.webdriver”在非自动模式下为’undefined’。在自动模式下,它是“ true”。
  • “ navigator.plugins”在无头chrome上的长度为0。因此,我添加了一些假元素来欺骗插件长度检查过程。
  • navigator.languages”设置为默认镶边值'[“ en-US”,“ en”,“ es”]’。

因此,我需要一个Chrome扩展程序来在网页上运行javascript。我使用本文提供的js代码进行了扩展,并使用另一篇文章将压缩扩展添加到我的项目中。我已经成功更改了值;但是仍然没有改变!

我没有找到其他像这样的变量,但这并不意味着它们不存在。reCaptcha仍然检测到chromedriver,因此应该有更多变量要更改。在下一步应的检测服务,逆向工程,我不想做的事。

现在,我不确定是否值得在此自动化过程上花费更多时间或寻找替代方法!

Obfuscating JavaScripts result

I have checked the chromedriver source code. That injects some javascript files to the browser.
Every javascript file on this link is injected to the web pages: https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/js/

So I used reverse engineering and obfuscated the js files by Hex editing. Now i was sure that no more javascript variable, function names and fixed strings were used to uncover selenium activity. But still some sites and reCaptcha detect selenium!
Maybe they check the modifications that are caused by chromedriver js execution :)


Edit 1:

Chrome ‘navigator’ parameters modification

I discovered there are some parameters in ‘navigator’ that briefly uncover using of chromedriver. These are the parameters:

  • “navigator.webdriver” On non-automated mode it is ‘undefined’. On automated mode it’s ‘true’.
  • “navigator.plugins” On headless chrome has 0 length. So I added some fake elements to fool the plugin length checking process.
  • navigator.languages” was set to default chrome value ‘[“en-US”, “en”, “es”]’ .

So what i needed was a chrome extension to run javascript on the web pages. I made an extension with the js code provided in the article and used another article to add the zipped extension to my project. I have successfully changed the values; But still nothing changed!

I didn’t find other variables like these but it doesn’t mean that they don’t exist. Still reCaptcha detects chromedriver, So there should be more variables to change. The next step should be reverse engineering of the detector services that i don’t want to do.

Now I’m not sure does it worth to spend more time on this automation process or search for alternative methods!


回答 5

尝试将selenium与chrome的特定用户配置文件一起使用,以这种方式,您可以将其用作特定用户并定义所需的任何内容。这样做时,它将以“实际”用户身份运行,请使用一些进程浏览器查看chrome进程。您会看到标签的区别。

例如:

username = os.getenv("USERNAME")
userProfile = "C:\\Users\\" + username + "\\AppData\\Local\\Google\\Chrome\\User Data\\Default"
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir={}".format(userProfile))
# add here any tag you want.
options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors", "safebrowsing-disable-download-protection", "safebrowsing-disable-auto-update", "disable-client-side-phishing-detection"])
chromedriver = "C:\Python27\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=options)

chrome标签列表在这里

Try to use selenium with a specific user profile of chrome, That way you can use it as specific user and define any thing you want, When doing so it will run as a ‘real’ user, look at chrome process with some process explorer and you’ll see the difference with the tags.

For example:

username = os.getenv("USERNAME")
userProfile = "C:\\Users\\" + username + "\\AppData\\Local\\Google\\Chrome\\User Data\\Default"
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir={}".format(userProfile))
# add here any tag you want.
options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors", "safebrowsing-disable-download-protection", "safebrowsing-disable-auto-update", "disable-client-side-phishing-detection"])
chromedriver = "C:\Python27\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=options)

chrome tag list here


回答 6

partial interface Navigator { readonly attribute boolean webdriver; };

Navigator界面的webdriver IDL属性必须返回webdriver-active标志的值,该标志最初为false。

此属性使网站可以确定用户代理受WebDriver的控制,并且可以用于帮助减轻拒绝服务攻击。

直接取自2017年W3C编辑的WebDriver草案。这在很大程度上意味着,至少可以确定硒驱动程序的未来迭代,以防止滥用。最终,如果没有源代码,很难说出到底是什么导致chrome驱动程序可检测到。

partial interface Navigator { readonly attribute boolean webdriver; };

The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false.

This property allows websites to determine that the user agent is under control by WebDriver, and can be used to help mitigate denial-of-service attacks.

Taken directly from the 2017 W3C Editor’s Draft of WebDriver. This heavily implies that at the very least, future iterations of selenium’s drivers will be identifiable to prevent misuse. Ultimately, it’s hard to tell without the source code, what exactly causes chrome driver in specific to be detectable.


回答 7

据说window.navigator.webdriver === true如果使用webdriver 会设置Firefox 。这是根据较早的规范之一(例如:archive.org)得出的,但是我在新的规范中找不到它,除了附录中一些非常模糊的措词。

对它的测试是在文件fingerprint_test.js中的硒代码中,其末尾的注释显示“当前仅在firefox中实现”,但是我无法通过一些简单的grep方式识别出该方向上的任何代码,在当前(41.0.2)Firefox发行树或Chromium树中。

从2015年1月起,我还发现了有关firefox驱动程序b82512999938中有关指纹的较早提交的评论。Selenium GIT-master仍在昨天下载的Selenium GIT-master中javascript/firefox-driver/extension/content/server.js添加了注释,该注释链接到当前w3c Webdriver规范中措辞略有不同的附录。

Firefox is said to set window.navigator.webdriver === true if working with a webdriver. That was according to one of the older specs (e.g.: archive.org) but I couldn’t find it in the new one except for some very vague wording in the appendices.

A test for it is in the selenium code in the file fingerprint_test.js where the comment at the end says “Currently only implemented in firefox” but I wasn’t able to identify any code in that direction with some simple greping, neither in the current (41.0.2) Firefox release-tree nor in the Chromium-tree.

I also found a comment for an older commit regarding fingerprinting in the firefox driver b82512999938 from January 2015. That code is still in the Selenium GIT-master downloaded yesterday at javascript/firefox-driver/extension/content/server.js with a comment linking to the slightly differently worded appendix in the current w3c webdriver spec.


回答 8

除了@ Erti-Chris Eelmaa的出色答案-令人讨厌window.navigator.webdriver,它是只读的。如果将其值更改为它的事件false仍然会存在true。因此,仍然可以检测到由自动化软件驱动的浏览器。 MDN

该变量由--enable-automationchrome中的标志管理。chromedriver使用该标志启动chrome并将chrome设置window.navigator.webdrivertrue。你可以在这里找到它。您需要将标记添加到“排除开关”中。例如(golang):

package main

import (
    "github.com/tebeka/selenium"
    "github.com/tebeka/selenium/chrome"
)

func main() {

caps := selenium.Capabilities{
    "browserName": "chrome",
}

chromeCaps := chrome.Capabilities{
    Path:            "/path/to/chrome-binary",
    ExcludeSwitches: []string{"enable-automation"},
}
caps.AddChrome(chromeCaps)

wd, err := selenium.NewRemote(caps, fmt.Sprintf("http://localhost:%d/wd/hub", 4444))
}

Additionally to the great answer of @Erti-Chris Eelmaa – there’s annoying window.navigator.webdriver and it is read-only. Event if you change the value of it to false it will still have true. Thats why the browser driven by automated software can still be detected. MDN

The variable is managed by the flag --enable-automation in chrome. The chromedriver launches chrome with that flag and chrome sets the window.navigator.webdriver to true. You can find it here. You need to add to “exclude switches” the flag. For instance (golang):

package main

import (
    "github.com/tebeka/selenium"
    "github.com/tebeka/selenium/chrome"
)

func main() {

caps := selenium.Capabilities{
    "browserName": "chrome",
}

chromeCaps := chrome.Capabilities{
    Path:            "/path/to/chrome-binary",
    ExcludeSwitches: []string{"enable-automation"},
}
caps.AddChrome(chromeCaps)

wd, err := selenium.NewRemote(caps, fmt.Sprintf("http://localhost:%d/wd/hub", 4444))
}

回答 9

听起来好像它们在Web应用程序防火墙后面。看一下modsecurity和owasp,看看它们是如何工作的。实际上,您要问的是如何进行漫游器检测规避。这不是Selenium Web驱动程序的用途。它用于测试您的Web应用程序,而不打其他Web应用程序。有可能,但基本上,您必须查看WAF在其规则集中查找的内容,并且如果可以的话,特别要避免使用硒。即使那样,它仍然可能无法正常工作,因为您不知道他们在使用什么WAF。您做了正确的第一步,就是伪造用户代理。如果仍然不能解决问题,那么WAF已经到位,您可能需要变得更加棘手。

编辑:点取自其他答案。确保首先正确设置了用户代理。可能是它撞到了本地Web服务器,还是嗅探了流量。

It sounds like they are behind a web application firewall. Take a look at modsecurity and owasp to see how those work. In reality, what you are asking is how to do bot detection evasion. That is not what selenium web driver is for. It is for testing your web application not hitting other web applications. It is possible, but basically, you’d have to look at what a WAF looks for in their rule set and specifically avoid it with selenium if you can. Even then, it might still not work because you don’t know what WAF they are using. You did the right first step, that is faking the user agent. If that didn’t work though, then a WAF is in place and you probably need to get more tricky.

Edit: Point taken from other answer. Make sure your user agent is actually being set correctly first. Maybe have it hit a local web server or sniff the traffic going out.


回答 10

即使您发送了所有正确的数据(例如,Selenium并未显示为扩展名,您也具有合理的分辨率/位深度&c),但仍有许多服务和工具可以分析访问者的行为,以确定访问者的行为是否演员是用户或自动化系统。

例如,访问一个站点然后立即通过将鼠标直接移到相关按钮上不到一秒钟立即执行一些操作,这实际上是用户不会做的。

作为调试工具,使用https://panopticlick.eff.org/这样的站点来检查浏览器的独特性可能也很有用。它还将帮助您验证是否有任何特定参数表明您正在Selenium中运行。

Even if you are sending all the right data (e.g. Selenium doesn’t show up as an extension, you have a reasonable resolution/bit-depth, &c), there are a number of services and tools which profile visitor behaviour to determine whether the actor is a user or an automated system.

For example, visiting a site then immediately going to perform some action by moving the mouse directly to the relevant button, in less than a second, is something no user would actually do.

It might also be useful as a debugging tool to use a site such as https://panopticlick.eff.org/ to check how unique your browser is; it’ll also help you verify whether there are any specific parameters that indicate you’re running in Selenium.


回答 11

我所看到的漫游器检测似乎比我在下面的答案中读到的东西更加复杂或至少有所不同。

实验1:

  1. 我从Python控制台使用Selenium打开浏览器和网页。
  2. 鼠标已经位于特定的位置,我知道该链接将在页面加载后出现。我从不动鼠标。
  3. 我按下了鼠标左键一次(这是从运行Python的控制台到浏览器的焦点)。
  4. 我再次按下鼠标左键(记住,光标在给定链接的上方)。
  5. 链接会正常打开,应该打开。

实验2:

  1. 和以前一样,我从Python控制台使用Selenium打开浏览器和网页。

  2. 这次,我没有使用鼠标单击,而是使用Selenium(在Python控制台中)单击具有随机偏移量的相同元素。

  3. 链接没有打开,但是我被带到了注册页面。

含义:

  • 通过Selenium打开网络浏览器并不会阻止我出现人类
  • 像人一样移动鼠标并不一定要归类为人
  • 通过Selenium单击具有偏移量的内容仍会引发警报

似乎很神秘,但是我想他们可以确定某个动作是否源自Selenium,而他们并不关心浏览器本身是否通过Selenium打开。还是可以确定窗口是否具有焦点?听到有人有任何见识会很有趣。

The bot detection I’ve seen seems more sophisticated or at least different than what I’ve read through in the answers below.

EXPERIMENT 1:

  1. I open a browser and web page with Selenium from a Python console.
  2. The mouse is already at a specific location where I know a link will appear once the page loads. I never move the mouse.
  3. I press the left mouse button once (this is necessary to take focus from the console where Python is running to the browser).
  4. I press the left mouse button again (remember, cursor is above a given link).
  5. The link opens normally, as it should.

EXPERIMENT 2:

  1. As before, I open a browser and the web page with Selenium from a Python console.

  2. This time around, instead of clicking with the mouse, I use Selenium (in the Python console) to click the same element with a random offset.

  3. The link doesn’t open, but I am taken to a sign up page.

IMPLICATIONS:

  • opening a web browser via Selenium doesn’t preclude me from appearing human
  • moving the mouse like a human is not necessary to be classified as human
  • clicking something via Selenium with an offset still raises the alarm

Seems mysterious, but I guess they can just determine whether an action originates from Selenium or not, while they don’t care whether the browser itself was opened via Selenium or not. Or can they determine if the window has focus? Would be interesting to hear if anyone has any insights.


回答 12

我发现的另一件事是,某些网站使用检查用户代理的平台。如果该值包含:“ HeadlessChrome”,则在使用无头模式时,该行为可能会很奇怪。

解决方法是覆盖用户代理值,例如在Java中:

chromeOptions.addArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");

One more thing I found is that some websites uses a platform that checks the User Agent. If the value contains: “HeadlessChrome” the behavior can be weird when using headless mode.

The workaround for that will be to override the user agent value, for example in Java:

chromeOptions.addArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");

回答 13

一些站点正在检测到此:

function d() {
try {
    if (window.document.$cdc_asdjflasutopfhvcZLmcfl_.cache_)
        return !0
} catch (e) {}

try {
    //if (window.document.documentElement.getAttribute(decodeURIComponent("%77%65%62%64%72%69%76%65%72")))
    if (window.document.documentElement.getAttribute("webdriver"))
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72") in window)
    if ("_Selenium_IDE_Recorder" in window)
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E") in document)
    if ("__webdriver_script_fn" in document)
        return !0
} catch (e) {}

Some sites are detecting this:

function d() {
try {
    if (window.document.$cdc_asdjflasutopfhvcZLmcfl_.cache_)
        return !0
} catch (e) {}

try {
    //if (window.document.documentElement.getAttribute(decodeURIComponent("%77%65%62%64%72%69%76%65%72")))
    if (window.document.documentElement.getAttribute("webdriver"))
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72") in window)
    if ("_Selenium_IDE_Recorder" in window)
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E") in document)
    if ("__webdriver_script_fn" in document)
        return !0
} catch (e) {}

回答 14

用以下代码编写一个html页面。您将看到,在DOM硒中,在externalHTML中应用了webdriver属性

<html>
<head>
  <script type="text/javascript">
  <!--
    function showWindow(){
      javascript:(alert(document.documentElement.outerHTML));
    }
  //-->
  </script>
</head>
<body>
  <form>
    <input type="button" value="Show outerHTML" onclick="showWindow()">
  </form>
</body>
</html>

Write an html page with the following code. You will see that in the DOM selenium applies a webdriver attribute in the outerHTML

<html>
<head>
  <script type="text/javascript">
  <!--
    function showWindow(){
      javascript:(alert(document.documentElement.outerHTML));
    }
  //-->
  </script>
</head>
<body>
  <form>
    <input type="button" value="Show outerHTML" onclick="showWindow()">
  </form>
</body>
</html>

回答 15

我发现这样更改javascript“ key”变量:

//Fools the website into believing a human is navigating it
        ((JavascriptExecutor)driver).executeScript("window.key = \"blahblah\";");

在将Selenium Webdriver和Google Chrome结合使用时,某些网站可以使用,因为许多网站都会检查此变量,以避免被Selenium废弃。

I’ve found changing the javascript “key” variable like this:

//Fools the website into believing a human is navigating it
        ((JavascriptExecutor)driver).executeScript("window.key = \"blahblah\";");

works for some websites when using Selenium Webdriver along with Google Chrome, since many sites check for this variable in order to avoid being scrapped by Selenium.


回答 16

在我看来,使用Selenium做到这一点的最简单方法是拦截XHR,后者将发送回浏览器指纹。

但这是仅硒的问题,因此最好使用其他方法。硒应该使这种事情变得容易,而不是更困难。

It seems to me the simplest way to do it with Selenium is to intercept the XHR that sends back the browser fingerprint.

But since this is a Selenium-only problem, its better just to use something else. Selenium is supposed to make things like this easier, not way harder.


回答 17

您可以尝试使用参数“启用自动化”

var options = new ChromeOptions();

// hide selenium
options.AddExcludedArguments(new List<string>() { "enable-automation" });

var driver = new ChromeDriver(ChromeDriverService.CreateDefaultService(), options);

但是,我想提醒您,此功能已在ChromeDriver 79.0.3945.16中修复。因此,您可能应该使用旧版的chrome。

另外,作为另一个选择,您可以尝试使用InternetExplorerDriver而不是Chrome。对于我来说,IE不会在没有任何黑客的情况下完全阻止。

有关更多信息,请尝试在这里查看:

Selenium Webdriver:修改navigator.webdriver标志以防止硒检测

Chrome v76中无法隐藏“ Chrome正在由自动化软件控制”信息栏

You can try to use the parameter “enable-automation”

var options = new ChromeOptions();

// hide selenium
options.AddExcludedArguments(new List<string>() { "enable-automation" });

var driver = new ChromeDriver(ChromeDriverService.CreateDefaultService(), options);

But, I want to warn that this ability was fixed in ChromeDriver 79.0.3945.16. So probably you should use older versions of chrome.

Also, as another option, you can try using InternetExplorerDriver instead of Chrome. As for me, IE does not block at all without any hacks.

And for more info try to take a look here:

Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

Unable to hide “Chrome is being controlled by automated software” infobar within Chrome v76


Python和JavaScript之间的JSON日期时间

问题:Python和JavaScript之间的JSON日期时间

我想使用JSON从Python发送序列化形式的datetime.datetime对象,并使用JSON在JavaScript中反序列化。做这个的最好方式是什么?

I want to send a datetime.datetime object in serialized form from Python using JSON and de-serialize in JavaScript using JSON. What is the best way to do this?


回答 0

您可以在json.dumps中添加“默认”参数来处理此问题:

date_handler = lambda obj: (
    obj.isoformat()
    if isinstance(obj, (datetime.datetime, datetime.date))
    else None
)
json.dumps(datetime.datetime.now(), default=date_handler)
'"2010-04-20T20:08:21.634121"'

这是ISO 8601格式。

更全面的默认处理程序功能:

def handler(obj):
    if hasattr(obj, 'isoformat'):
        return obj.isoformat()
    elif isinstance(obj, ...):
        return ...
    else:
        raise TypeError, 'Object of type %s with value of %s is not JSON serializable' % (type(obj), repr(obj))

更新:添加了类型和值的输出。
更新:还处理日期

You can add the ‘default’ parameter to json.dumps to handle this:

date_handler = lambda obj: (
    obj.isoformat()
    if isinstance(obj, (datetime.datetime, datetime.date))
    else None
)
json.dumps(datetime.datetime.now(), default=date_handler)
'"2010-04-20T20:08:21.634121"'

Which is ISO 8601 format.

A more comprehensive default handler function:

def handler(obj):
    if hasattr(obj, 'isoformat'):
        return obj.isoformat()
    elif isinstance(obj, ...):
        return ...
    else:
        raise TypeError, 'Object of type %s with value of %s is not JSON serializable' % (type(obj), repr(obj))

Update: Added output of type as well as value.
Update: Also handle date


回答 1

对于跨语言项目,我发现包含RfC 3339日期的字符串是最好的选择。RfC 3339日期如下所示:

  1985-04-12T23:20:50.52Z

我认为大多数格式都是显而易见的。唯一有点不寻常的事情可能是结尾处的“ Z”。它代表GMT / UTC。您还可以为CEST添加夏令时偏移,例如+02:00(德国夏季)。我个人更喜欢将所有内容保留在UTC中,直到显示为止。

为了显示,比较和存储,您可以将其保留为所有语言的字符串格式。如果您需要日期进行计算,可以轻松将其转换为大多数语言的原始日期对象。

因此,生成这样的JSON:

  json.dump(datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ'))

不幸的是,Javascript的Date构造函数不接受RfC 3339字符串,但是Internet上有很多解析器

huTools.hujson会尝试处理Python代码中可能遇到的最常见的编码问题,包括日期/日期时间对象,同时正确处理时区。

For cross-language projects, I found out that strings containing RfC 3339 dates are the best way to go. An RfC 3339 date looks like this:

  1985-04-12T23:20:50.52Z

I think most of the format is obvious. The only somewhat unusual thing may be the “Z” at the end. It stands for GMT/UTC. You could also add a timezone offset like +02:00 for CEST (Germany in summer). I personally prefer to keep everything in UTC until it is displayed.

For displaying, comparisons and storage you can leave it in string format across all languages. If you need the date for calculations easy to convert it back to a native date object in most language.

So generate the JSON like this:

  json.dump(datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ'))

Unfortunately, Javascript’s Date constructor doesn’t accept RfC 3339 strings but there are many parsers available on the Internet.

huTools.hujson tries to handle the most common encoding issues you might come across in Python code including date/datetime objects while handling timezones correctly.


回答 2

我已经解决了。

假设您有一个使用datetime.now()创建的Python datetime对象d。其值为:

datetime.datetime(2011, 5, 25, 13, 34, 5, 787000)

您可以将其序列化为JSON作为ISO 8601日期时间字符串:

import json    
json.dumps(d.isoformat())

日期时间对象示例将序列化为:

'"2011-05-25T13:34:05.787000"'

一旦在Javascript层中收到此值,就可以构造一个Date对象:

var d = new Date("2011-05-25T13:34:05.787000");

从Javascript 1.8.5开始,Date对象具有toJSON方法,该方法返回标准格式的字符串。要将上述Javascript对象序列化回JSON,命令将是:

d.toJSON()

这会给你:

'2011-05-25T20:34:05.787Z'

一旦在Python中收到此字符串,就可以将其反序列化为datetime对象:

datetime.strptime('2011-05-25T20:34:05.787Z', '%Y-%m-%dT%H:%M:%S.%fZ')

这将导致以下日期时间对象,该对象与您开始时使用的对象相同,因此是正确的:

datetime.datetime(2011, 5, 25, 20, 34, 5, 787000)

I’ve worked it out.

Let’s say you have a Python datetime object, d, created with datetime.now(). Its value is:

datetime.datetime(2011, 5, 25, 13, 34, 5, 787000)

You can serialize it to JSON as an ISO 8601 datetime string:

import json    
json.dumps(d.isoformat())

The example datetime object would be serialized as:

'"2011-05-25T13:34:05.787000"'

This value, once received in the Javascript layer, can construct a Date object:

var d = new Date("2011-05-25T13:34:05.787000");

As of Javascript 1.8.5, Date objects have a toJSON method, which returns a string in a standard format. To serialize the above Javascript object back to JSON, therefore, the command would be:

d.toJSON()

Which would give you:

'2011-05-25T20:34:05.787Z'

This string, once received in Python, could be deserialized back to a datetime object:

datetime.strptime('2011-05-25T20:34:05.787Z', '%Y-%m-%dT%H:%M:%S.%fZ')

This results in the following datetime object, which is the same one you started with and therefore correct:

datetime.datetime(2011, 5, 25, 20, 34, 5, 787000)

回答 3

使用json,您可以子类化JSONEncoder并重写default()方法以提供自己的自定义序列化程序:

import json
import datetime

class DateTimeJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        else:
            return super(DateTimeJSONEncoder, self).default(obj)

然后,您可以这样称呼它:

>>> DateTimeJSONEncoder().encode([datetime.datetime.now()])
'["2010-06-15T14:42:28"]'

Using json, you can subclass JSONEncoder and override the default() method to provide your own custom serializers:

import json
import datetime

class DateTimeJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        else:
            return super(DateTimeJSONEncoder, self).default(obj)

Then, you can call it like this:

>>> DateTimeJSONEncoder().encode([datetime.datetime.now()])
'["2010-06-15T14:42:28"]'

回答 4

这是一个使用标准库json模块递归编码和解码datetime.datetime和datetime.date对象的完整解决方案。此后需要Python> = 2.6,因为从那时起%f仅支持datetime.datetime.strptime()格式字符串中的格式代码。对于Python 2.5支持,%f在尝试转换之前,请从ISO日期字符串中删除和除去微秒,但是,您当然会降低微秒的精度。为了与其他来源的ISO日期字符串互操作,其中可能包括时区名称或UTC偏移量,您可能还需要在转换之前剥离日期字符串的某些部分。有关ISO日期字符串(和许多其他日期格式)的完整解析器,请参见第三方dateutil模块。

仅当ISO日期字符串是JavaScript文字对象表示法中的值或对象中嵌套结构中的值时,才起作用。作为顶级数组项目的ISO日期字符串将不会被解码。

即这有效:

date = datetime.datetime.now()
>>> json = dumps(dict(foo='bar', innerdict=dict(date=date)))
>>> json
'{"innerdict": {"date": "2010-07-15T13:16:38.365579"}, "foo": "bar"}'
>>> loads(json)
{u'innerdict': {u'date': datetime.datetime(2010, 7, 15, 13, 16, 38, 365579)},
u'foo': u'bar'}

这也是:

>>> json = dumps(['foo', 'bar', dict(date=date)])
>>> json
'["foo", "bar", {"date": "2010-07-15T13:16:38.365579"}]'
>>> loads(json)
[u'foo', u'bar', {u'date': datetime.datetime(2010, 7, 15, 13, 16, 38, 365579)}]

但这不能按预期工作:

>>> json = dumps(['foo', 'bar', date])
>>> json
'["foo", "bar", "2010-07-15T13:16:38.365579"]'
>>> loads(json)
[u'foo', u'bar', u'2010-07-15T13:16:38.365579']

这是代码:

__all__ = ['dumps', 'loads']

import datetime

try:
    import json
except ImportError:
    import simplejson as json

class JSONDateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        else:
            return json.JSONEncoder.default(self, obj)

def datetime_decoder(d):
    if isinstance(d, list):
        pairs = enumerate(d)
    elif isinstance(d, dict):
        pairs = d.items()
    result = []
    for k,v in pairs:
        if isinstance(v, basestring):
            try:
                # The %f format code is only supported in Python >= 2.6.
                # For Python <= 2.5 strip off microseconds
                # v = datetime.datetime.strptime(v.rsplit('.', 1)[0],
                #     '%Y-%m-%dT%H:%M:%S')
                v = datetime.datetime.strptime(v, '%Y-%m-%dT%H:%M:%S.%f')
            except ValueError:
                try:
                    v = datetime.datetime.strptime(v, '%Y-%m-%d').date()
                except ValueError:
                    pass
        elif isinstance(v, (dict, list)):
            v = datetime_decoder(v)
        result.append((k, v))
    if isinstance(d, list):
        return [x[1] for x in result]
    elif isinstance(d, dict):
        return dict(result)

def dumps(obj):
    return json.dumps(obj, cls=JSONDateTimeEncoder)

def loads(obj):
    return json.loads(obj, object_hook=datetime_decoder)

if __name__ == '__main__':
    mytimestamp = datetime.datetime.utcnow()
    mydate = datetime.date.today()
    data = dict(
        foo = 42,
        bar = [mytimestamp, mydate],
        date = mydate,
        timestamp = mytimestamp,
        struct = dict(
            date2 = mydate,
            timestamp2 = mytimestamp
        )
    )

    print repr(data)
    jsonstring = dumps(data)
    print jsonstring
    print repr(loads(jsonstring))

Here’s a fairly complete solution for recursively encoding and decoding datetime.datetime and datetime.date objects using the standard library json module. This needs Python >= 2.6 since the %f format code in the datetime.datetime.strptime() format string is only supported in since then. For Python 2.5 support, drop the %f and strip the microseconds from the ISO date string before trying to convert it, but you’ll loose microseconds precision, of course. For interoperability with ISO date strings from other sources, which may include a time zone name or UTC offset, you may also need to strip some parts of the date string before the conversion. For a complete parser for ISO date strings (and many other date formats) see the third-party dateutil module.

Decoding only works when the ISO date strings are values in a JavaScript literal object notation or in nested structures within an object. ISO date strings, which are items of a top-level array will not be decoded.

I.e. this works:

date = datetime.datetime.now()
>>> json = dumps(dict(foo='bar', innerdict=dict(date=date)))
>>> json
'{"innerdict": {"date": "2010-07-15T13:16:38.365579"}, "foo": "bar"}'
>>> loads(json)
{u'innerdict': {u'date': datetime.datetime(2010, 7, 15, 13, 16, 38, 365579)},
u'foo': u'bar'}

And this too:

>>> json = dumps(['foo', 'bar', dict(date=date)])
>>> json
'["foo", "bar", {"date": "2010-07-15T13:16:38.365579"}]'
>>> loads(json)
[u'foo', u'bar', {u'date': datetime.datetime(2010, 7, 15, 13, 16, 38, 365579)}]

But this doesn’t work as expected:

>>> json = dumps(['foo', 'bar', date])
>>> json
'["foo", "bar", "2010-07-15T13:16:38.365579"]'
>>> loads(json)
[u'foo', u'bar', u'2010-07-15T13:16:38.365579']

Here’s the code:

__all__ = ['dumps', 'loads']

import datetime

try:
    import json
except ImportError:
    import simplejson as json

class JSONDateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        else:
            return json.JSONEncoder.default(self, obj)

def datetime_decoder(d):
    if isinstance(d, list):
        pairs = enumerate(d)
    elif isinstance(d, dict):
        pairs = d.items()
    result = []
    for k,v in pairs:
        if isinstance(v, basestring):
            try:
                # The %f format code is only supported in Python >= 2.6.
                # For Python <= 2.5 strip off microseconds
                # v = datetime.datetime.strptime(v.rsplit('.', 1)[0],
                #     '%Y-%m-%dT%H:%M:%S')
                v = datetime.datetime.strptime(v, '%Y-%m-%dT%H:%M:%S.%f')
            except ValueError:
                try:
                    v = datetime.datetime.strptime(v, '%Y-%m-%d').date()
                except ValueError:
                    pass
        elif isinstance(v, (dict, list)):
            v = datetime_decoder(v)
        result.append((k, v))
    if isinstance(d, list):
        return [x[1] for x in result]
    elif isinstance(d, dict):
        return dict(result)

def dumps(obj):
    return json.dumps(obj, cls=JSONDateTimeEncoder)

def loads(obj):
    return json.loads(obj, object_hook=datetime_decoder)

if __name__ == '__main__':
    mytimestamp = datetime.datetime.utcnow()
    mydate = datetime.date.today()
    data = dict(
        foo = 42,
        bar = [mytimestamp, mydate],
        date = mydate,
        timestamp = mytimestamp,
        struct = dict(
            date2 = mydate,
            timestamp2 = mytimestamp
        )
    )

    print repr(data)
    jsonstring = dumps(data)
    print jsonstring
    print repr(loads(jsonstring))

回答 5

如果您确定只有Javascript将使用JSON,则我更喜欢Date直接传递Javascript 对象。

对象ctime()上的方法datetime将返回Javascript Date对象可以理解的字符串。

import datetime
date = datetime.datetime.today()
json = '{"mydate":new Date("%s")}' % date.ctime()

Javascript将很乐意将其用作对象文字,并且您已经内置了Date对象。

If you’re certain that only Javascript will be consuming the JSON, I prefer to pass Javascript Date objects directly.

The ctime() method on datetime objects will return a string that the Javascript Date object can understand.

import datetime
date = datetime.datetime.today()
json = '{"mydate":new Date("%s")}' % date.ctime()

Javascript will happily use that as an object literal, and you’ve got your Date object built right in.


回答 6

比赛后期… :)

一个非常简单的解决方案是修补json模块的默认值。例如:

import json
import datetime

json.JSONEncoder.default = lambda self,obj: (obj.isoformat() if isinstance(obj, datetime.datetime) else None)

现在,您可以使用json.dumps(),就好像它一直支持日期时间一样。

json.dumps({'created':datetime.datetime.now()})

如果您需要始终对json模块进行此扩展,并且希望不更改您或其他人使用json序列化的方式(无论是否使用现有代码),则这是有道理的。

请注意,有些人可能认为以这种方式对库进行修补是不好的做法。如果您可能希望以多种方式扩展应用程序,则需要格外小心-在这种情况下,我建议使用ramen或JT解决方案,并在每种情况下选择合适的json扩展名。

Late in the game… :)

A very simple solution is to patch the json module default. For example:

import json
import datetime

json.JSONEncoder.default = lambda self,obj: (obj.isoformat() if isinstance(obj, datetime.datetime) else None)

Now, you can use json.dumps() as if it had always supported datetime…

json.dumps({'created':datetime.datetime.now()})

This makes sense if you require this extension to the json module to always kick in and wish to not change the way you or others use json serialization (either in existing code or not).

Note that some may consider patching libraries in that way as bad practice. Special care need to be taken in case you may wish to extend your application in more than one way – is such a case, I suggest to use the solution by ramen or JT and choose the proper json extension in each case.


回答 7

除了时间戳,没有什么可添加到社区Wiki答案中了!

Javascript使用以下格式:

new Date().toJSON() // "2016-01-08T19:00:00.123Z"

Python端(有关json.dumps处理程序,请参见其他答案):

>>> from datetime import datetime
>>> d = datetime.strptime('2016-01-08T19:00:00.123Z', '%Y-%m-%dT%H:%M:%S.%fZ')
>>> d
datetime.datetime(2016, 1, 8, 19, 0, 0, 123000)
>>> d.isoformat() + 'Z'
'2016-01-08T19:00:00.123000Z'

如果将Z保留在外,那么诸如angular的前端框架将无法在浏览器本地时区中显示日期:

> $filter('date')('2016-01-08T19:00:00.123000Z', 'yyyy-MM-dd HH:mm:ss')
"2016-01-08 20:00:00"
> $filter('date')('2016-01-08T19:00:00.123000', 'yyyy-MM-dd HH:mm:ss')
"2016-01-08 19:00:00"

Not much to add to the community wiki answer, except for timestamp!

Javascript uses the following format:

new Date().toJSON() // "2016-01-08T19:00:00.123Z"

Python side (for the json.dumps handler, see the other answers):

>>> from datetime import datetime
>>> d = datetime.strptime('2016-01-08T19:00:00.123Z', '%Y-%m-%dT%H:%M:%S.%fZ')
>>> d
datetime.datetime(2016, 1, 8, 19, 0, 0, 123000)
>>> d.isoformat() + 'Z'
'2016-01-08T19:00:00.123000Z'

If you leave that Z out, frontend frameworks like angular can not display the date in browser-local timezone:

> $filter('date')('2016-01-08T19:00:00.123000Z', 'yyyy-MM-dd HH:mm:ss')
"2016-01-08 20:00:00"
> $filter('date')('2016-01-08T19:00:00.123000', 'yyyy-MM-dd HH:mm:ss')
"2016-01-08 19:00:00"

回答 8

在python方面:

import time, json
from datetime import datetime as dt
your_date = dt.now()
data = json.dumps(time.mktime(your_date.timetuple())*1000)
return data # data send to javascript

在JavaScript方面:

var your_date = new Date(data)

数据来自python的地方

On python side:

import time, json
from datetime import datetime as dt
your_date = dt.now()
data = json.dumps(time.mktime(your_date.timetuple())*1000)
return data # data send to javascript

On javascript side:

var your_date = new Date(data)

where data is result from python


回答 9

我的建议是使用一个库。pypi.org上有几个可用的。

我使用这个,它很好用:https : //pypi.python.org/pypi/asjson

My advice is to use a library. There are several available at pypi.org.

I use this one, it it works good: https://pypi.python.org/pypi/asjson


回答 10

显然,“正确的” JSON(很好的JavaScript)日期格式是2012-04-23T18:25:43.511Z-UTC和“ Z”。如果没有此功能,则从字符串创建Date()对象时,JavaScript将使用Web浏览器的本地时区。

对于“天真的”时间(Python称没有时区的时间,并且假定是本地时间),以下内容将强制使用本地时区,以便随后可以将其正确转换为UTC:

def default(obj):
    if hasattr(obj, "json") and callable(getattr(obj, "json")):
        return obj.json()
    if hasattr(obj, "isoformat") and callable(getattr(obj, "isoformat")):
        # date/time objects
        if not obj.utcoffset():
            # add local timezone to "naive" local time
            # /programming/2720319/python-figure-out-local-timezone
            tzinfo = datetime.now(timezone.utc).astimezone().tzinfo
            obj = obj.replace(tzinfo=tzinfo)
        # convert to UTC
        obj = obj.astimezone(timezone.utc)
        # strip the UTC offset
        obj = obj.replace(tzinfo=None)
        return obj.isoformat() + "Z"
    elif hasattr(obj, "__str__") and callable(getattr(obj, "__str__")):
        return str(obj)
    else:
        print("obj:", obj)
        raise TypeError(obj)

def dump(j, io):
    json.dump(j, io, indent=2, default=default)

为什么这么难。

Apparently The “right” JSON (well JavaScript) date format is 2012-04-23T18:25:43.511Z – UTC and “Z”. Without this JavaScript will use the web browser’s local timezone when creating a Date() object from the string.

For a “naive” time (what Python calls a time with no timezone and this assumes is local) the below will force local timezone so that it can then be correctly converted to UTC:

def default(obj):
    if hasattr(obj, "json") and callable(getattr(obj, "json")):
        return obj.json()
    if hasattr(obj, "isoformat") and callable(getattr(obj, "isoformat")):
        # date/time objects
        if not obj.utcoffset():
            # add local timezone to "naive" local time
            # https://stackoverflow.com/questions/2720319/python-figure-out-local-timezone
            tzinfo = datetime.now(timezone.utc).astimezone().tzinfo
            obj = obj.replace(tzinfo=tzinfo)
        # convert to UTC
        obj = obj.astimezone(timezone.utc)
        # strip the UTC offset
        obj = obj.replace(tzinfo=None)
        return obj.isoformat() + "Z"
    elif hasattr(obj, "__str__") and callable(getattr(obj, "__str__")):
        return str(obj)
    else:
        print("obj:", obj)
        raise TypeError(obj)

def dump(j, io):
    json.dump(j, io, indent=2, default=default)

why is this so hard.


回答 11

对于从Python到JavaScript的日期转换,日期对象必须采用特定的ISO格式,即ISO格式或UNIX数字。如果ISO格式缺少某些信息,则可以先使用Date.parse转换为Unix数字。此外,Date.parse也可以与React一起使用,而新的Date可能会触发异常。

如果您有一个不带毫秒的DateTime对象,则需要考虑以下因素。:

  var unixDate = Date.parse('2016-01-08T19:00:00') 
  var desiredDate = new Date(unixDate).toLocaleDateString();

在API调用之后,示例日期也可以是result.data对象中的变量。

有关以所需格式显示日期的选项(例如,显示较长的工作日),请查看MDN文档

For the Python to JavaScript date conversion, the date object needs to be in specific ISO format, i.e. ISO format or UNIX number. If the ISO format lacks some info, then you can convert to the Unix number with Date.parse first. Moreover, Date.parse works with React as well while new Date might trigger an exception.

In case you have a DateTime object without milliseconds, the following needs to be considered. :

  var unixDate = Date.parse('2016-01-08T19:00:00') 
  var desiredDate = new Date(unixDate).toLocaleDateString();

The example date could equally be a variable in the result.data object after an API call.

For options to display the date in the desired format (e.g. to display long weekdays) check out the MDN doc.


Mal-MAL-做一个Lisp

MAL-做一个Lisp

描述

1.Mal是一个受Clojure启发的Lisp解释器

2.MAL是一种学习工具

MAL的每个实现被分成11个增量的、自包含的(且可测试的)步骤,这些步骤演示了Lisp的核心概念。最后一步是能够自托管(运行mal的错误实现)。请参阅make-a-lisp process
guide

Make-a-LISP步骤包括:

每个Make-a-LISP步骤都有一个关联的架构图。该步骤的新元素以红色高亮显示。以下是step A

如果您对创建mal实现感兴趣(或者只是对使用mal做某事感兴趣),欢迎您加入我们的Discord或加入#mal onlibera.chat除了make-a-lisp
process guide
还有一个mal/make-a-lisp
FAQ
在这里我试图回答一些常见的问题

3.MAL用86种语言实现(91种不同实现,113种运行模式)

语言 创建者
Ada Chris Moore
Ada #2 Nicolas Boulenguez
GNU Awk Miutsuru Kariya
Bash 4 Joel Martin
BASIC(C64和QBASIC) Joel Martin
BBC BASIC V Ben Harris
C Joel Martin
C #2 Duncan Watts
C++ Stephen Thirlwall
C# Joel Martin
ChucK Vasilij Schneidermann
Clojure(Clojure和ClojureScript) Joel Martin
CoffeeScript Joel Martin
Common Lisp Iqbal Ansari
Crystal Linda_pp
D Dov Murik
Dart Harry Terkelsen
Elixir Martin Ek
Elm Jos van Bakel
Emacs Lisp Vasilij Schneidermann
Erlang Nathan Fiedler
ES6(ECMAScript 2015) Joel Martin
F# Peter Stephens
Factor Jordan Lewis
Fantom Dov Murik
Fennel sogaiu
Forth Chris Houser
GNU Guile Mu Lei
GNU Smalltalk Vasilij Schneidermann
Go Joel Martin
Groovy Joel Martin
Haskell Joel Martin
Haxe(Neko、Python、C++和JS) Joel Martin
Hy Joel Martin
Io Dov Murik
Janet sogaiu
Java Joel Martin
Java(松露/GraalVM) Matt McGill
JavaScript(Demo) Joel Martin
jq Ali MohammadPur
Julia Joel Martin
Kotlin Javier Fernandez-Ivern
LiveScript Jos van Bakel
Logo Dov Murik
Lua Joel Martin
GNU Make Joel Martin
mal itself Joel Martin
MATLAB(GNU Octave&MATLAB) Joel Martin
miniMAL(RepoDemo) Joel Martin
NASM Ben Dudson
Nim Dennis Felsing
Object Pascal Joel Martin
Objective C Joel Martin
OCaml Chris Houser
Perl Joel Martin
Perl 6 Hinrik Örn Sigurðsson
PHP Joel Martin
Picolisp Vasilij Schneidermann
Pike Dov Murik
PL/pgSQL(PostgreSQL) Joel Martin
PL/SQL(Oracle) Joel Martin
PostScript Joel Martin
PowerShell Joel Martin
Prolog Nicolas Boulenguez
Python(2.x和3.x) Joel Martin
Python #2(3.x) Gavin Lewis
RPython Joel Martin
R Joel Martin
Racket Joel Martin
Rexx Dov Murik
Ruby Joel Martin
Rust Joel Martin
Scala Joel Martin
Scheme (R7RS) Vasilij Schneidermann
Skew Dov Murik
Standard ML Fabian Bergström
Swift 2 Keith Rollin
Swift 3 Joel Martin
Swift 4 陆遥
Swift 5 Oleg Montak
Tcl Dov Murik
TypeScript Masahiro Wakame
Vala Simon Tatham
VHDL Dov Murik
Vimscript Dov Murik
Visual Basic.NET Joel Martin
WebAssembly(WASM) Joel Martin
Wren Dov Murik
XSLT Ali MohammadPur
Yorick Dov Murik
Zig Josh Tobin

演示文稿

Mal第一次出现在2014年Clojure West的闪电演讲中(不幸的是没有视频)。参见Examples/clojurewest2014.mal了解会议上的演示文稿(是的,该演示文稿是一个MALL程序)

在Midwest.io 2015上,乔尔·马丁(Joel Martin)就MAL做了题为“解锁的成就:一条更好的语言学习之路”的演讲VideoSlides

最近,乔尔在LambdaConf 2016大会上发表了题为“用10个增量步骤打造自己的Lisp解释器”的演讲:Part 1Part 2Part 3Part 4Slides

构建/运行实现

运行任何给定实现的最简单方法是使用docker。每个实现都有一个预先构建的停靠器映像,其中安装了语言依赖项。您可以在顶层Makefile中使用一个方便的目标启动REPL(其中impl是实现目录名,stepX是要运行的步骤):

make DOCKERIZE=1 "repl^IMPL^stepX"
    # OR stepA is the default step:
make DOCKERIZE=1 "repl^IMPL"

外部实现

以下实施作为单独的项目进行维护:

HolyC

生锈

  • by Tim Morgan
  • by vi-使用Pest语法,不使用典型的MAL基础设施(货币化步骤和内置的转换测试)

问:

  • by Ali Mohammad Pur-Q实现运行良好,但它需要专有的手动下载,不能Docker化(或集成到mal CI管道中),因此目前它仍然是一个单独的项目

其他MAL项目

  • malc详细说明:MAL(Make A Lisp)编译器。将MAL程序编译成LLVM汇编语言,然后编译成二进制
  • malcc-malcc是MAL语言的增量编译器实现。它使用微型C编译器作为编译器后端,并且完全支持MAL语言,包括宏、尾部调用消除,甚至运行时求值。“I Built a Lisp Compiler”发布有关该过程的帖子
  • frock+Clojure风格的PHP。使用mal/php运行程序
  • flk-无论Bash在哪里都可以运行的LISP
  • glisp详细说明:基于Lisp的自引导图形设计工具。Live Demo

实施详情

Ada

Ada实现是在Debian上使用GNAT4.9开发的。如果您有git、gnat和make(可选)的windows版本,它也可以在windows上编译而不变。没有外部依赖项(未实现ReadLine)

cd impls/ada
make
./stepX_YYY

Ada.2

第二个Ada实现是使用GNAT 8开发的,并与GNU读取线库链接

cd impls/ada
make
./stepX_YYY

GNU awk

Mal的GNU awk实现已经使用GNU awk 4.1.1进行了测试

cd impls/gawk
gawk -O -f stepX_YYY.awk

BASH 4

cd impls/bash
bash stepX_YYY.sh

基本(C64和QBasic)

Basic实现使用一个预处理器,该预处理器可以生成与C64 Basic(CBMv2)和QBasic兼容的Basic代码。C64模式已经过测试cbmbasic(当前需要打补丁的版本来修复线路输入问题),并且QBasic模式已经过测试qb64

生成C64代码并使用cbmbasic运行:

cd impls/basic
make stepX_YYY.bas
STEP=stepX_YYY ./run

生成QBasic代码并加载到qb64中:

cd impls/basic
make MODE=qbasic stepX_YYY.bas
./qb64 stepX_YYY.bas

感谢Steven Syrek有关此实现的原始灵感,请参阅

BBC Basic V

BBC Basic V实现可以在Brandy解释器中运行:

cd impls/bbc-basic
brandy -quit stepX_YYY.bbc

或在RISC OS 3或更高版本下的ARM BBC Basic V中:

*Dir bbc-basic.riscos
*Run setup
*Run stepX_YYY

C

mal的C实现需要以下库(lib和头包):glib、libffi6、libgc以及libedit或GNU readline库

cd impls/c
make
./stepX_YYY

C.2

mal的第二个C实现需要以下库(lib和头包):libedit、libgc、libdl和libffi

cd impls/c.2
make
./stepX_YYY

C++

构建mal的C++实现需要g++-4.9或clang++-3.5和readline兼容库。请参阅cpp/README.md有关更多详细信息,请执行以下操作:

cd impls/cpp
make
    # OR
make CXX=clang++-3.5
./stepX_YYY

C#

mal的C#实现已经在Linux上使用Mono C#编译器(MCS)和Mono运行时(2.10.8.1版)进行了测试。两者都是构建和运行C#实现所必需的

cd impls/cs
make
mono ./stepX_YYY.exe

卡盘

Chuck实现已经使用Chuck 1.3.5.2进行了测试

cd impls/chuck
./run

封闭式

在很大程度上,Clojure实现需要Clojure 1.5,然而,要通过所有测试,则需要Clojure 1.8.0-RC4

cd impls/clojure
lein with-profile +stepX trampoline run

CoffeeScript

sudo npm install -g coffee-script
cd impls/coffee
coffee ./stepX_YYY

通用Lisp

该实现已经在Ubuntu 16.04和Ubuntu 12.04上使用SBCL、CCL、CMUCL、GNU CLISP、ECL和Allegro CL进行了测试,请参阅README了解更多详细信息。如果您安装了上述依赖项,请执行以下操作来运行实现

cd impls/common-lisp
make
./run

水晶

MAL的晶体实现已经用Crystal 0.26.1进行了测试

cd impls/crystal
crystal run ./stepX_YYY.cr
    # OR
make   # needed to run tests
./stepX_YYY

D

使用GDC4.8对MAL的D实现进行了测试。它需要GNU读取线库

cd impls/d
make
./stepX_YYY

省道

DART实施已使用DART 1.20进行了测试

cd impls/dart
dart ./stepX_YYY

Emacs Lisp

Emacs Lisp的MAL实现已经使用Emacs 24.3和24.5进行了测试。虽然有非常基本的读数行编辑(<backspace>C-d工作,C-c取消进程),建议使用rlwrap

cd impls/elisp
emacs -Q --batch --load stepX_YYY.el
# with full readline support
rlwrap emacs -Q --batch --load stepX_YYY.el

灵丹妙药

MAL的长生不老的实现已经在长生不老的长生不老的1.0.5中进行了测试

cd impls/elixir
mix stepX_YYY
# Or with readline/line editing functionality:
iex -S mix stepX_YYY

榆树

MAL的ELM实现已经用ELM 0.18.0进行了测试

cd impls/elm
make stepX_YYY.js
STEP=stepX_YYY ./run

二郎

Mal的Erlang实现需要Erlang/OTP R17rebar要建造

cd impls/erlang
make
    # OR
MAL_STEP=stepX_YYY rebar compile escriptize # build individual step
./stepX_YYY

ES6(ECMAScript 2015)

ES6/ECMAScript 2015实施使用babel用于生成ES5兼容JavaScript的编译器。生成的代码已经在Node 0.12.4上进行了测试

cd impls/es6
make
node build/stepX_YYY.js

F#

mal的F#实现已经在Linux上使用Mono F#编译器(Fsharpc)和Mono运行时(版本3.12.1)进行了测试。单C#编译器(MCS)也是编译readline依赖项所必需的。所有这些都是构建和运行F#实现所必需的

cd impls/fsharp
make
mono ./stepX_YYY.exe

因素

MAL的因子实现已通过因子0.97(factorcode.org)

cd impls/factor
FACTOR_ROOTS=. factor -run=stepX_YYY

幻影

MAL的幻象实现已经用幻象1.0.70进行了测试

cd impls/fantom
make lib/fan/stepX_YYY.pod
STEP=stepX_YYY ./run

茴香

Mal的Fennel实现已经在Lua5.4上使用Fennel版本0.9.1进行了测试

cd impls/fennel
fennel ./stepX_YYY.fnl

第四

cd impls/forth
gforth stepX_YYY.fs

GNU Guile 2.1+

cd impls/guile
guile -L ./ stepX_YYY.scm

GNU Smalltalk

MALL的Smalltalk实现已经在GNU Smalltalk 3.2.91上进行了测试

cd impls/gnu-smalltalk
./run

MALL的GO实现要求在路径上安装GO。该实现已经在GO 1.3.1上进行了测试

cd impls/go
make
./stepX_YYY

时髦的

mal的Groovy实现需要Groovy才能运行,并且已经使用Groovy 1.8.6进行了测试

cd impls/groovy
make
groovy ./stepX_YYY.groovy

哈斯克尔

Haskell实现需要GHC编译器版本7.10.1或更高版本以及Haskell parsec和readline(或editline)包

cd impls/haskell
make
./stepX_YYY

Haxe(Neko、Python、C++和JavaScript)

Mal的Haxe实现需要编译Haxe3.2版。支持四种不同的Haxe目标:neko、Python、C++和JavaScript

cd impls/haxe
# Neko
make all-neko
neko ./stepX_YYY.n
# Python
make all-python
python3 ./stepX_YYY.py
# C++
make all-cpp
./cpp/stepX_YYY
# JavaScript
make all-js
node ./stepX_YYY.js

干草

MAL的Hy实现已经用Hy 0.13.0进行了测试

cd impls/hy
./stepX_YYY.hy

IO

已使用IO版本20110905测试了MAL的IO实现

cd impls/io
io ./stepX_YYY.io

珍妮特

MAIL的Janet实现已经使用Janet版本1.12.2进行了测试

cd impls/janet
janet ./stepX_YYY.janet

Java 1.7

mal的Java实现需要maven2来构建

cd impls/java
mvn compile
mvn -quiet exec:java -Dexec.mainClass=mal.stepX_YYY
    # OR
mvn -quiet exec:java -Dexec.mainClass=mal.stepX_YYY -Dexec.args="CMDLINE_ARGS"

Java,将Truffle用于GraalVM

这个Java实现可以在OpenJDK上运行,但是多亏了Truffle框架,它在GraalVM上的运行速度可以提高30倍。它已经在OpenJDK 11、GraalVM CE 20.1.0和GraalVM CE 21.1.0上进行了测试

cd impls/java-truffle
./gradlew build
STEP=stepX_YYY ./run

JavaScript/节点

cd impls/js
npm install
node stepX_YYY.js

朱莉娅

Mal的Julia实现需要Julia 0.4

cd impls/julia
julia stepX_YYY.jl

JQ

针对1.6版进行了测试,IO部门存在大量作弊行为

cd impls/jq
STEP=stepA_YYY ./run
    # with Debug
DEBUG=true STEP=stepA_YYY ./run

科特林

MAL的Kotlin实现已经使用Kotlin 1.0进行了测试

cd impls/kotlin
make
java -jar stepX_YYY.jar

LiveScript

已使用LiveScript 1.5测试了mal的LiveScript实现

cd impls/livescript
make
node_modules/.bin/lsc stepX_YYY.ls

徽标

MAL的Logo实现已经用UCBLogo 6.0进行了测试

cd impls/logo
logo stepX_YYY.lg

路亚

Mal的Lua实现已经使用Lua 5.3.5进行了测试。该实现需要安装luarock

cd impls/lua
make  # to build and link linenoise.so and rex_pcre.so
./stepX_YYY.lua

男性

运行mal的错误实现包括运行其他实现之一的STEPA,并传递作为命令行参数运行的mal步骤

cd impls/IMPL
IMPL_STEPA_CMD ../mal/stepX_YYY.mal

GNU Make 3.81

cd impls/make
make -f stepX_YYY.mk

NASM

MAL的NASM实现是为x86-64 Linux编写的,并且已经在Linux 3.16.0-4-AMD64和NASM版本2.11.05上进行了测试

cd impls/nasm
make
./stepX_YYY

NIM 1.0.4

MAL的NIM实现已经使用NIM 1.0.4进行了测试

cd impls/nim
make
  # OR
nimble build
./stepX_YYY

对象PASCAL

MAL的对象Pascal实现已经使用Free Pascal编译器版本2.6.2和2.6.4在Linux上构建和测试

cd impls/objpascal
make
./stepX_YYY

目标C

Mal的Objective C实现已经在Linux上使用CLANG/LLVM3.6进行了构建和测试。它还使用XCode7在OS X上进行了构建和测试

cd impls/objc
make
./stepX_YYY

OCaml 4.01.0

cd impls/ocaml
make
./stepX_YYY

MATLAB(GNU倍频程和MATLAB)

MATLAB实现已经在GNU Octave 4.2.1上进行了测试。它还在Linux上用MATLAB版本R2014a进行了测试。请注意,matlab是一个商业产品。

cd impls/matlab
./stepX_YYY
octave -q --no-gui --no-history --eval "stepX_YYY();quit;"
matlab -nodisplay -nosplash -nodesktop -nojvm -r "stepX_YYY();quit;"
    # OR with command line arguments
octave -q --no-gui --no-history --eval "stepX_YYY('arg1','arg2');quit;"
matlab -nodisplay -nosplash -nodesktop -nojvm -r "stepX_YYY('arg1','arg2');quit;"

极小值

miniMAL是用不到1024字节的JavaScript实现的小型Lisp解释器。要运行mal的最小实现,您需要下载/安装最小解释器(这需要Node.js)

cd impls/miniMAL
# Download miniMAL and dependencies
npm install
export PATH=`pwd`/node_modules/minimal-lisp/:$PATH
# Now run mal implementation in miniMAL
miniMAL ./stepX_YYY

Perl 5

Perl 5实现应该使用Perl 5.19.3和更高版本

要获得读取行编辑支持,请从CPAN安装Term::ReadLine::Perl或Term::ReadLine::GNU

cd impls/perl
perl stepX_YYY.pl

Perl 6

Perl6实现在Rakudo Perl6 2016.04上进行了测试

cd impls/perl6
perl6 stepX_YYY.pl

PHP 5.3

mal的PHP实现需要php命令行界面才能运行

cd impls/php
php stepX_YYY.php

皮奥利普

Picolisp实现需要libreadline和Picolisp 3.1.11或更高版本

cd impls/picolisp
./run

派克

Pike实现在Pike8.0上进行了测试

cd impls/pike
pike stepX_YYY.pike

pl/pgSQL(PostgreSQL SQL过程语言)

mal的PL/pgSQL实现需要一个正在运行的PostgreSQL服务器(“kanaka/mal-test-plpgsql”docker映像自动启动PostgreSQL服务器)。该实现连接到PostgreSQL服务器并创建名为“mal”的数据库来存储表和存储过程。包装器脚本使用psql命令连接到服务器,并默认为用户“postgres”,但可以使用PSQL_USER环境变量覆盖该值。可以使用PGPASSWORD环境变量指定密码。该实现已使用PostgreSQL 9.4进行了测试

cd impls/plpgsql
./wrap.sh stepX_YYY.sql
    # OR
PSQL_USER=myuser PGPASSWORD=mypass ./wrap.sh stepX_YYY.sql

PL/SQL(Oracle SQL过程语言)

mal的PL/SQL实现需要一个正在运行的Oracle DB服务器(“kanaka/mal-test-plsql”docker映像自动启动Oracle Express服务器)。该实现连接到Oracle服务器以创建类型、表和存储过程。默认的SQL*Plus登录值(用户名/口令@CONNECT_IDENTIFIER)是“SYSTEM/ORACLE”,但是可以用ORACLE_LOGON环境变量覆盖该值。该实施已使用Oracle Express Edition 11g Release 2进行了测试。请注意,任何SQL*Plus连接警告(用户密码过期等)都会干扰包装脚本与数据库通信的能力

cd impls/plsql
./wrap.sh stepX_YYY.sql
    # OR
ORACLE_LOGON=myuser/mypass@ORCL ./wrap.sh stepX_YYY.sql

PostScript Level 2/3

mal的PostScript实现需要运行Ghostscript。它已经使用Ghostscript 9.10进行了测试

cd impls/ps
gs -q -dNODISPLAY -I./ stepX_YYY.ps

PowerShell

Mal的PowerShell实现需要PowerShell脚本语言。它已经在Linux上使用PowerShell 6.0.0 Alpha 9进行了测试

cd impls/powershell
powershell ./stepX_YYY.ps1

序言

Prolog实现使用了一些特定于SWI-Prolog的结构,包括READLINE支持,并且已经在8.2.1版的Debian GNU/Linux上进行了测试

cd impls/prolog
swipl stepX_YYY

Python(2.x和3.x)

cd impls/python
python stepX_YYY.py

Python2(3.x)

第二个Python实现大量使用类型注释并使用Arpeggio解析器库

# Recommended: do these steps in a Python virtual environment.
pip3 install Arpeggio==1.9.0
python3 stepX_YYY.py

RPython

你一定是rpython在您的路径上(随附pypy)

cd impls/rpython
make        # this takes a very long time
./stepX_YYY

R

MALL R实现需要R(r-base-core)来运行

cd impls/r
make libs  # to download and build rdyncall
Rscript stepX_YYY.r

球拍(5.3)

Mal的racket实现需要运行racket编译器/解释器

cd impls/racket
./stepX_YYY.rkt

雷克斯

Mal的Rexx实现已经使用Regina Rexx 3.6进行了测试

cd impls/rexx
make
rexx -a ./stepX_YYY.rexxpp

拼音(1.9+)

cd impls/ruby
ruby stepX_YYY.rb

生锈(1.38+)

Mal的Rust实现需要使用Rust编译器和构建工具(Cargo)来构建

cd impls/rust
cargo run --release --bin stepX_YYY

缩放比例

安装Scala和SBT(http://www.scala-sbt.org/0.13/tutorial/Installing-sbt-on-Linux.html):

cd impls/scala
sbt 'run-main stepX_YYY'
    # OR
sbt compile
scala -classpath target/scala*/classes stepX_YYY

方案(R7RS)

MAL的方案实施已在赤壁-方案0.7.3、卡瓦2.4、高车0.9.5、鸡肉4.11.0、人马座0.8.3、气旋0.6.3(Git版本)和Foment 0.4(Git版本)上进行了测试。在弄清库是如何加载的并调整了R7RS实现的基础上,您应该能够让它在其他符合R7RS标准的实现上运行Makefilerun相应地编写脚本

cd impls/scheme
make symlinks
# chibi
scheme_MODE=chibi ./run
# kawa
make kawa
scheme_MODE=kawa ./run
# gauche
scheme_MODE=gauche ./run
# chicken
make chicken
scheme_MODE=chicken ./run
# sagittarius
scheme_MODE=sagittarius ./run
# cyclone
make cyclone
scheme_MODE=cyclone ./run
# foment
scheme_MODE=foment ./run

歪斜

MAL的不对称实现已经使用不对称0.7.42进行了测试

cd impls/skew
make
node stepX_YYY.js

标准ML(Poly/ML、MLton、莫斯科ML)

Mal的标准ML实现需要一个SML97实施。Makefile支持POLY/ML、MLTON、MOVICO ML,并已在POLY/ML 5.8.1、MLTON 20210117和MOSSIONS ML版本2.10上进行了测试

cd impls/sml
# Poly/ML
make sml_MODE=polyml
./stepX_YYY
# MLton
make sml_MODE=mlton
./stepX_YYY
# Moscow ML
make sml_MODE=mosml
./stepX_YYY

斯威夫特

MALL的SWIFT实施需要SWIFT 2.0编译器(XCode 7.0)来构建。由于语言和标准库中的更改,旧版本将无法运行

cd impls/swift
make
./stepX_YYY

斯威夫特3

MALL的SWIFT 3实施需要SWIFT 3.0编译器。它已经在SWIFT 3预览版3上进行了测试

cd impls/swift3
make
./stepX_YYY

斯威夫特4

MALL的SWIFT 4实施需要SWIFT 4.0编译器。它已在SWIFT 4.2.3版本中进行了测试

cd impls/swift4
make
./stepX_YYY

SWIFT 5

MALL的SWIFT 5实施需要SWIFT 5.0编译器。它已在SWIFT 5.1.1版本中进行了测试

cd impls/swift5
swift run stepX_YYY

TCL 8.6

Mal的Tcl实现需要运行Tcl 8.6。要获得readline行编辑支持,请安装tclreadline

cd impls/tcl
tclsh ./stepX_YYY.tcl

打字稿

mal的TypeScript实现需要TypeScript 2.2编译器。它已经在Node.js V6上进行了测试

cd impls/ts
make
node ./stepX_YYY.js

瓦拉

MALL的VALA实现已经用VALA0.40.8编译器进行了测试。您将需要安装valaclibreadline-dev或同等的

cd impls/vala
make
./stepX_YYY

VHDL

用GHDL0.29对mal的vhdl实现进行了测试。

cd impls/vhdl
make
./run_vhdl.sh ./stepX_YYY

Vimscript

Mal的Vimscript实现需要运行Vim 8.0

cd impls/vimscript
./run_vimscript.sh ./stepX_YYY.vim

Visual Basic.NET

Mal的VB.NET实现已经在Linux上使用Mono VB编译器(Vbnc)和Mono运行时(2.10.8.1版)进行了测试。构建和运行VB.NET实现需要两者

cd impls/vb
make
mono ./stepX_YYY.exe

WebAssembly(Wasm)

WebAssembly实现是用Wam(WebAssembly宏语言),并在几种不同的非Web嵌入(运行时)下运行:nodewasmtimewasmerlucetwaxwacewarpy

cd impls/wasm
# node
make wasm_MODE=node
./run.js ./stepX_YYY.wasm
# wasmtime
make wasm_MODE=wasmtime
wasmtime --dir=./ --dir=../ --dir=/ ./stepX_YYY.wasm
# wasmer
make wasm_MODE=wasmer
wasmer run --dir=./ --dir=../ --dir=/ ./stepX_YYY.wasm
# lucet
make wasm_MODE=lucet
lucet-wasi --dir=./:./ --dir=../:../ --dir=/:/ ./stepX_YYY.so
# wax
make wasm_MODE=wax
wax ./stepX_YYY.wasm
# wace
make wasm_MODE=wace_libc
wace ./stepX_YYY.wasm
# warpy
make wasm_MODE=warpy
warpy --argv --memory-pages 256 ./stepX_YYY.wasm

XSLT

mal的XSLT实现是用XSLT3编写的,并在Saxon 9.9.1.6家庭版上进行了测试

cd impls/xslt
STEP=stepX_YY ./run

雷恩

MAL的WREN实现在WREN 0.2.0上进行了测试

cd impls/wren
wren ./stepX_YYY.wren

约里克

MAL的Yorick实现在Yorick 2.2.04上进行了测试

cd impls/yorick
yorick -batch ./stepX_YYY.i

之字形

MAL的Zig实现在Zig0.5上进行了测试

cd impls/zig
zig build stepX_YYY

运行测试

顶层Makefile有许多有用的目标来协助实现、开发和测试。这个helpTarget提供目标和选项的列表:

make help

功能测试

中几乎有800个通用功能测试(针对所有实现)。tests/目录。每个步骤都有相应的测试文件,其中包含特定于该步骤的测试。这个runtest.py测试工具启动MAL步骤实现,然后将测试一次一个提供给实现,并将输出/返回值与预期的输出/返回值进行比较

  • 要在所有实现中运行所有测试(请准备等待):
make test
  • 要针对单个实施运行所有测试,请执行以下操作:
make "test^IMPL"

# e.g.
make "test^clojure"
make "test^js"
  • 要对所有实施运行单个步骤的测试,请执行以下操作:
make "test^stepX"

# e.g.
make "test^step2"
make "test^step7"
  • 要针对单个实施运行特定步骤的测试,请执行以下操作:
make "test^IMPL^stepX"

# e.g
make "test^ruby^step3"
make "test^ps^step4"

自托管功能测试

  • 若要在自托管模式下运行功能测试,请指定mal作为测试实现,并使用MAL_IMPLMake Variable以更改基础主机语言(默认值为JavaScript):
make MAL_IMPL=IMPL "test^mal^step2"

# e.g.
make "test^mal^step2"   # js is default
make MAL_IMPL=ruby "test^mal^step2"
make MAL_IMPL=python "test^mal^step2"

启动REPL

  • 要在特定步骤中启动实施的REPL,请执行以下操作:
make "repl^IMPL^stepX"

# e.g
make "repl^ruby^step3"
make "repl^ps^step4"
  • 如果您省略了这一步,那么stepA使用的是:
make "repl^IMPL"

# e.g
make "repl^ruby"
make "repl^ps"
  • 若要启动自托管实现的REPL,请指定mal作为REPL实现,并使用MAL_IMPLMake Variable以更改基础主机语言(默认值为JavaScript):
make MAL_IMPL=IMPL "repl^mal^stepX"

# e.g.
make "repl^mal^step2"   # js is default
make MAL_IMPL=ruby "repl^mal^step2"
make MAL_IMPL=python "repl^mal"

性能测试

警告:这些性能测试在统计上既不有效,也不全面;运行时性能不是mal的主要目标。如果你从这些性能测试中得出任何严肃的结论,那么请联系我,了解堪萨斯州一些令人惊叹的海滨房产,我愿意以低价卖给你

  • 要针对单个实施运行性能测试,请执行以下操作:
make "perf^IMPL"

# e.g.
make "perf^js"
  • 要对所有实施运行性能测试,请执行以下操作:
make "perf"

正在生成语言统计信息

  • 要报告单个实施的行和字节统计信息,请执行以下操作:
make "stats^IMPL"

# e.g.
make "stats^js"

对接测试

每个实现目录都包含一个Dockerfile,用于创建包含该实现的所有依赖项的docker映像。此外,顶级Makefile还支持在停靠器容器中通过传递以下参数来运行测试目标(以及perf、stats、repl等“DOCKERIZE=1”在make命令行上。例如:

make DOCKERIZE=1 "test^js^step3"

现有实现已经构建了坞站映像,并将其推送到坞站注册表。但是,如果您希望在本地构建或重建坞站映像,TopLevel Makefile提供了构建坞站映像的规则:

make "docker-build^IMPL"

注意事项

  • Docker镜像被命名为“Kanaka/mal-test-iml”
  • 基于JVM的语言实现(Groovy、Java、Clojure、Scala):您可能需要首先手动运行此命令一次make DOCKERIZE=1 "repl^IMPL"然后才能运行测试,因为需要下载运行时依赖项以避免测试超时。这些依赖项被下载到/mal目录中的点文件中,因此它们将在两次运行之间保持不变

许可证

MAL(make-a-lisp)是根据MPL 2.0(Mozilla Public License 2.0)许可的。有关更多详细信息,请参阅LICENSE.txt

Social-analyzer-API、CLI和Web应用程序,用于分析和查找跨社交媒体/网站的个人资料

Social Analyzer-API、CLI和Web应用程序,用于分析和查找一个人在+800个社交媒体/网站上的个人资料。它包括不同的字符串分析和检测模块,您可以选择在调查过程中使用哪种模块组合

检测模块利用基于不同检测技术的评级机制,该机制产生从0到100(否-可能-是)的率值。本模块旨在减少误报,并在下面的文档中进行了说明Wiki链接

从该OSINT工具分析和公开提取的信息可以帮助调查与可疑或恶意活动相关的配置文件,例如cyberbullyingcybergroomingcyberstalking,以及spreading misinformation

这个项目是“目前一些执法机构在资源有限的国家使用”

Social Analyzer is in a league of its own and is a very impressive tool that I thoroughly recommend for Digital Investigators and OSINT practitioners-由Joseph Jones, Founder of Strategy Nord, Unita Insight and OS2INT

更新

  • GUI版本的新更新-您可以生成类别统计信息。此外,您还可以在设置窗口中根据网站的全球排名选择网站🔥
  • CLI的新更新-您可以根据网站的全球排名选择网站,如-网站TOP10,-网站TOP123等。🔥
  • 新的Social-Analyzer版本使用它自己的名为Ixora的自动图形可视化
  • 我已经收到了很多公共和私人的请求,要求将静电网站的信息添加到检测数据库中,这一点正在实施中,+400%的检测应该有这样的要求。如果您有任何非私人模块,并且您无法查看静电网站的信息,请下载最新版本或发送电子邮件给我以了解详细信息

所以·社会我·迪·a

使用户能够创建和共享内容或参与社交网络的网站和应用程序-牛津词典

安全测试

-------------------------------------              ---------------------------------
|        Security Testing           |              |        Social-Analyzer        |
-------------------------------------              ---------------------------------
|   Passive Information Gathering   |     <-->     |   Find Social Media Profiles  |
|                                   |              |                               |
|    Active Information Gathering   |     <-->     |    Post Analysis Activities   |
-------------------------------------              ---------------------------------

应用程序

标准本地主机Web应用URL:http://0.0.0.0:9005/app.html

CLI

功能

  • 字符串和名称分析(排列和组合)
  • 使用多种技术(HTTPS库和WebDriver)查找配置文件
  • 多层检测(OCR、普通、高级和特殊)
  • 使用Ixora(元数据和模式)可视化配置文件信息
  • 元数据和模式提取(从Qeeqbox OSINT项目添加)
  • 元数据的强制有向图(需要ExtractPatterns)
  • 自动调情到不必要的输出
  • 搜索引擎查找(Google API-可选)
  • 自定义搜索查询(Google API和DuckDuckGo API-可选)
  • 个人资料屏幕截图、标题、信息和网站描述
  • 按语言查找姓名来源、姓名相似度和常用词
  • 自定义用户-代理、代理、超时和隐式等待
  • Python CLI&NodeJS CLI(仅限于FindUserProfilesFast选项)
  • 用于更快检查的网格选项(仅限于对接合成)
  • 将日志转储到文件夹或终端(美化)
  • 调整查找\获取配置文件工作进程(默认值为15)
  • 失败配置文件的重新检查选项
  • 按好的、可能的和坏的过滤配置文件
  • 将分析另存为JSON文件
  • 简化的Web界面和客户端

特殊探测

安装和运行

Linux(作为节点WebApp)

sudo apt-get update
#Depedning on your Linux distro, you may or may not need these 2 lines
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y software-properties-common
sudo add-apt-repository ppa:mozillateam/ppa -y
sudo apt-get install -y firefox-esr tesseract-ocr git nodejs npm
git clone https://github.com/qeeqbox/social-analyzer.git
cd social-analyzer
npm install
npm start

Linux(作为python包)

sudo apt-get update
sudo apt-get install python3 python3-pip
pip3 install social-analyzer
social-analyzer --username "johndoe" --metadata
#or
python3 -m social-analyzer --username "johndoe" --metadata

Linux(作为python脚本)

sudo apt-get update
sudo apt-get install git python3 python3-pip
git clone https://github.com/qeeqbox/social-analyzer
cd social-analyzer
pip3 install –r requirements.txt
python3 app.py social-analyzer --username "johndoe" --metadata

作为对象导入(Python)

from importlib import import_module
SocialAnalyzer = import_module("social-analyzer").SocialAnalyzer(silent=True)
results = SocialAnalyzer.run_as_object(username="johndoe",silent=True)
print(results)

Linux、Windows、MacOS、Raspberry pi

  • 检查一下这个wiki适用于所有可能的安装方法
  • 检查一下这个wiki用于将社交分析器与您的OSINT工具、提要等集成

社交分析器–h

Required Arguments:
  --username   E.g. johndoe, john_doe or johndoe9999

Optional Arguments:
  --websites   Website or websites separated by space E.g. youtube, tiktok or tumblr
  --mode       Analysis mode E.g.fast -> FindUserProfilesFast, slow -> FindUserProfilesSlow or special -> FindUserProfilesSpecial
  --output     Show the output in the following format: json -> json output for integration or pretty -> prettify the output
  --options    Show the following when a profile is found: link, rate, title or text
  --method     find -> show detected profiles, get -> show all profiles regardless detected or not, both -> combine find & get
  --filter     Filter detected profiles by good, maybe or bad, you can do combine them with comma (good,bad) or use all
  --profiles   Filter profiles by detected, unknown or failed, you can do combine them with comma (detected,failed) or use all
  --extract    Extract profiles, urls & patterns if possible
  --metadata   Extract metadata if possible (pypi QeeqBox OSINT)
  --trim       Trim long strings

Listing websites & detections:
  --list       List all available websites

Setting:
  --headers    Headers as dict
  --logs_dir   Change logs directory
  --timeout    Change timeout between each request
  --silent     Disable output to screen

打开外壳

运行问题

  • 请记住,现有配置文件显示status:goodrate:%100
  • 一些网站返回blockedinvalid<-这是预期的行为
  • 使用代理、VPN、ToR或任何类似方式定期检查可疑配置文件
  • 请勿将FindUserProfilesFast与FindUserProfilesSlow和ShowUserProfilesSlow混合使用
  • 将用户代理更改为最新的用户代理或增加请求之间的随机时间
  • 使用慢速模式(CLI中不可用)以避免遇到阻塞\结果问题

目标

  • 添加通用网站检测(这些需要一些审查,但我将在2021年尝试添加它们)

资源

  • DuckDuckGo API、Google API、NodeJS、bootstrap、选择化、jQuery、维基百科、font-awawed、Selenium-Webdriver&tesseract.js
  • 如果我错过了参考资料或资源,请告诉我!

采访

一些新闻\文章

免责声明\备注

  • 请确保从GitHub下载此工具
  • 这是一个安全项目(将其视为安全项目)
  • 如果您希望将您的网站排除在此项目列表之外,请与我联系
  • 此工具将在本地使用,而不是作为服务使用(它没有任何类型的访问控制)
  • 有关以-private结尾的模块的相关问题,请直接联系我(不要在GitHub上打开问题)

其他项目

Oppia-一个免费的在线学习平台,让所有人都能获得优质教育

Oppia是一个在线学习工具,任何人都可以轻松地创建和分享互动活动(称为“探索”)。这些活动模拟与导师的一对一对话,使学生能够边做边学,同时获得反馈

除了开发Oppia平台外,团队还在开发和试点一套免费有效的lessons基础数学。这些课程针对的是无法获得教育资源的学习者。

Oppia是使用Python和AngularJS编写的,构建在Google App engine之上



安装

请参阅Installing Oppia page有关完整说明,请参阅

贡献

奥皮亚项目是由社区为社区建造的。我们欢迎每个人的贡献,特别是新的贡献者。

您可以在很多方面帮助Oppia的开发,包括艺术、编码、设计和文档

支持

如果您有任何功能请求或错误报告,请将它们记录在我们的issue tracker

请将安全问题直接报告给admin@oppia.org

许可证

Oppia代码在Apache v2 license

保持联系

我们在Gitter上也有公共聊天室:https://gitter.im/oppia/oppia-chat顺便过来打个招呼!

Js-beautify-javascript 格式化工具

JS 格式化

这个小美化器将重新格式化和重新缩进bookmarklet、丑陋的JavaScript、解压由Dean Edward的流行打包程序打包的脚本,以及对由NPM包处理的脚本进行部分去模糊处理javascript-obfuscator

我把这个放在最前面和中心,因为现有的业主目前在这个项目上的工作时间非常有限。这是一个很受欢迎并被广泛使用的项目,但它迫切需要有时间致力于修复面向客户的错误以及内部设计和实现的潜在问题的贡献者

如果您有兴趣,请看一下CONTRIBUTING.md然后修复标有“Good first issue”贴上标签并提交请购单。尽可能多地重复。谢谢!

安装

您可以安装node.js或python的美化器

Node.js JavaScript

您可以安装npm包。js-beautify全局安装时,它提供一个可执行文件js-beautify剧本。与Python脚本一样,美化结果被发送到stdout除非另有配置,否则

$ npm -g install js-beautify
$ js-beautify foo.js

您还可以使用js-beautify作为一个node库(本地安装,npm默认值):

$ npm install js-beautify

Node.js JavaScript(VNext)

以上安装了最新的稳定版本。要安装测试版或RC版,请执行以下操作:

$ npm install js-beautify@next

Web库

美容师可以作为Web库添加到您的页面上

JS美颜托管在两个CDN服务上:cdnjs和生菜

要从这些服务之一提取最新版本,请在您的文档中包含以下一组脚本标记:

“>
<script src="https://cdnjs.cloudflare.com/ajax/libs/js-beautify/1.14.0/beautify.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/js-beautify/1.14.0/beautify-css.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/js-beautify/1.14.0/beautify-html.js"></script>

<script src="https://cdnjs.cloudflare.com/ajax/libs/js-beautify/1.14.0/beautify.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/js-beautify/1.14.0/beautify-css.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/js-beautify/1.14.0/beautify-html.min.js"></script>

<script src="https://cdn.rawgit.com/beautify-web/js-beautify/v1.14.0/js/lib/beautify.js"></script>
<script src="https://cdn.rawgit.com/beautify-web/js-beautify/v1.14.0/js/lib/beautify-css.js"></script>
<script src="https://cdn.rawgit.com/beautify-web/js-beautify/v1.14.0/js/lib/beautify-html.js"></script>

通过更改版本号可以使用较旧的版本

免责声明:这些都是免费服务,所以有no uptime or support guarantees

python

要安装Python版本的美容器,请执行以下操作:

$ pip install jsbeautifier

与JavaScript版本不同,Python版本只能重新格式化JavaScript。它不适用于HTML或CSS文件,但您可以安装CSS-美化对于CSS:

$ pip install cssbeautifier

用法

您可以在Web浏览器中使用JS美化器美化javascript,也可以在命令行上使用node.js或python美化javascript

Web浏览器

打开beautifier.io选项通过UI提供

Web库

上面的脚本标记公开了三个函数:js_beautifycss_beautify,以及html_beautify

Node.js JavaScript

全局安装时,美容器提供一个可执行文件js-beautify剧本。美化结果发送到stdout除非另有配置,否则

$ js-beautify foo.js

要使用js-beautify作为一个node库(在本地安装之后),为javascript(Js)、CSS或HTML导入并调用适当的Beautifier方法。所有三个方法签名都是beautify(code, options)code是要美化的代码字符串。Options是一个具有您希望用来美化代码的设置的对象

配置选项名称与CLI名称相同,但使用下划线而不是破折号。例如,--indent-size 2 --space-in-empty-paren会是{ indent_size: 2, space_in_empty_paren: true }

var beautify = require('js-beautify').js,
    fs = require('fs');

fs.readFile('foo.js', 'utf8', function (err, data) {
    if (err) {
        throw err;
    }
    console.log(beautify(data, { indent_size: 2, space_in_empty_paren: true }));
});

python

安装后,要使用Python进行美化,请执行以下操作:

$ js-beautify file.js

美化的产出流向stdout默认情况下,

要使用jsbeautifier因为库很简单:

import jsbeautifier
res = jsbeautifier.beautify('your javascript string')
res = jsbeautifier.beautify_file('some_file.js')

或者,要指定一些选项,请执行以下操作:

opts = jsbeautifier.default_options()
opts.indent_size = 2
opts.space_in_empty_paren = True
res = jsbeautifier.beautify('some javascript', opts)

配置选项名称与CLI名称相同,但使用下划线而不是破折号。上面的示例将在命令行上设置为--indent-size 2 --space-in-empty-paren

选项

以下是Python和JS脚本的命令行标志:

CLI Options:
  -f, --file       Input file(s) (Pass '-' for stdin)
  -r, --replace    Write output in-place, replacing input
  -o, --outfile    Write output to file (default stdout)
  --config         Path to config file
  --type           [js|css|html] ["js"] Select beautifier type (NOTE: Does *not* filter files, only defines which beautifier type to run)
  -q, --quiet      Suppress logging to stdout
  -h, --help       Show this help
  -v, --version    Show the version

Beautifier Options:
  -s, --indent-size                 Indentation size [4]
  -c, --indent-char                 Indentation character [" "]
  -t, --indent-with-tabs            Indent with tabs, overrides -s and -c
  -e, --eol                         Character(s) to use as line terminators.
                                    [first newline in file, otherwise "\n]
  -n, --end-with-newline            End output with newline
  --editorconfig                    Use EditorConfig to set up the options
  -l, --indent-level                Initial indentation level [0]
  -p, --preserve-newlines           Preserve line-breaks (--no-preserve-newlines disables)
  -m, --max-preserve-newlines       Number of line-breaks to be preserved in one chunk [10]
  -P, --space-in-paren              Add padding spaces within paren, ie. f( a, b )
  -E, --space-in-empty-paren        Add a single space inside empty paren, ie. f( )
  -j, --jslint-happy                Enable jslint-stricter mode
  -a, --space-after-anon-function   Add a space before an anonymous function's parens, ie. function ()
  --space-after-named-function      Add a space before a named function's parens, i.e. function example ()
  -b, --brace-style                 [collapse|expand|end-expand|none][,preserve-inline] [collapse,preserve-inline]
  -u, --unindent-chained-methods    Don't indent chained method calls
  -B, --break-chained-methods       Break chained method calls across subsequent lines
  -k, --keep-array-indentation      Preserve array indentation
  -x, --unescape-strings            Decode printable characters encoded in xNN notation
  -w, --wrap-line-length            Wrap lines that exceed N characters [0]
  -X, --e4x                         Pass E4X xml literals through untouched
  --good-stuff                      Warm the cockles of Crockford's heart
  -C, --comma-first                 Put commas at the beginning of new line instead of end
  -O, --operator-position           Set operator position (before-newline|after-newline|preserve-newline) [before-newline]
  --indent-empty-lines              Keep indentation on empty lines
  --templating                      List of templating languages (auto,django,erb,handlebars,php,smarty) ["auto"] auto = none in JavaScript, all in html

它们对应于两个库接口的带下划线的选项键

每个CLI选项的默认值

{
    "indent_size": 4,
    "indent_char": " ",
    "indent_with_tabs": false,
    "editorconfig": false,
    "eol": "\n",
    "end_with_newline": false,
    "indent_level": 0,
    "preserve_newlines": true,
    "max_preserve_newlines": 10,
    "space_in_paren": false,
    "space_in_empty_paren": false,
    "jslint_happy": false,
    "space_after_anon_function": false,
    "space_after_named_function": false,
    "brace_style": "collapse",
    "unindent_chained_methods": false,
    "break_chained_methods": false,
    "keep_array_indentation": false,
    "unescape_strings": false,
    "wrap_line_length": 0,
    "e4x": false,
    "comma_first": false,
    "operator_position": "before-newline",
    "indent_empty_lines": false,
    "templating": ["auto"]
}

CLI中未显示的默认值

{
  "eval_code": false,
  "space_before_conditional": true
}

请注意,并非所有默认值都通过CLI显示。从历史上看,Python和JSAPI并不是100%相同的。还有一些其他情况使我们无法实现100%的API兼容性

从环境或.jsBeaufyrc加载设置(仅限JavaScript)

除了CLI参数之外,您还可以通过以下方式将配置传递给JS可执行文件:

  • 任何jsbeautify_-带前缀的环境变量
  • 一个JSON-由指示的格式化文件--config参数
  • 一个.jsbeautifyrc包含以下内容的文件JSON以上文件系统的任何级别的数据$PWD

此堆栈中前面提供的配置源将覆盖后面的配置源

设置继承和特定于语言的重写

这些设置是一个浅树,其值对于所有语言都是继承值,但可以被覆盖。这适用于在任一实现中直接传递给API的设置。在Javascript实现中,从配置文件(如.jsBeaufyrc)加载的设置也可以使用继承/覆盖

下面是一个示例配置树,显示了语言覆盖节点的所有支持位置。我们将使用indent_size要讨论此配置的行为方式,但可以继承或覆盖任意数量的设置,请执行以下操作:

{
    "indent_size": 4,
    "html": {
        "end_with_newline": true,
        "js": {
            "indent_size": 2
        },
        "css": {
            "indent_size": 2
        }
    },
    "css": {
        "indent_size": 1
    },
    "js": {
       "preserve-newlines": true
    }
}

使用上述示例将产生以下结果:

  • HTML文件
    • 继承indent_size从顶层设置开始,共4个空间
    • 这些文件还将以换行符结尾
    • HTML中的JavaScript和CSS
      • 继承HTMLend_with_newline设置
      • 将它们的缩进覆盖为2个空格
  • CSS文件
    • 将顶级设置重写为indent_size共1个空间
  • JavaScript文件
    • 继承indent_size从顶层设置开始,共4个空间
    • 设置preserve-newlinestrue

CSS和HTML

除了js-beautify可执行文件,css-beautifyhtml-beautify还提供了进入这些脚本的简单界面。或者,js-beautify --cssjs-beautify --html将分别完成相同的任务

// Programmatic access
var beautify_js = require('js-beautify'); // also available under "js" export
var beautify_css = require('js-beautify').css;
var beautify_html = require('js-beautify').html;

// All methods accept two arguments, the string to be beautified, and an options object.

CSS和HTML美化程序在范围上要简单得多,并且拥有的选项要少得多

CSS Beautifier Options:
  -s, --indent-size                  Indentation size [4]
  -c, --indent-char                  Indentation character [" "]
  -t, --indent-with-tabs             Indent with tabs, overrides -s and -c
  -e, --eol                          Character(s) to use as line terminators. (default newline - "\\n")
  -n, --end-with-newline             End output with newline
  -b, --brace-style                  [collapse|expand] ["collapse"]
  -L, --selector-separator-newline   Add a newline between multiple selectors
  -N, --newline-between-rules        Add a newline between CSS rules
  --indent-empty-lines               Keep indentation on empty lines

HTML Beautifier Options:
  -s, --indent-size                  Indentation size [4]
  -c, --indent-char                  Indentation character [" "]
  -t, --indent-with-tabs             Indent with tabs, overrides -s and -c
  -e, --eol                          Character(s) to use as line terminators. (default newline - "\\n")
  -n, --end-with-newline             End output with newline
  -p, --preserve-newlines            Preserve existing line-breaks (--no-preserve-newlines disables)
  -m, --max-preserve-newlines        Maximum number of line-breaks to be preserved in one chunk [10]
  -I, --indent-inner-html            Indent <head> and <body> sections. Default is false.
  -b, --brace-style                  [collapse-preserve-inline|collapse|expand|end-expand|none] ["collapse"]
  -S, --indent-scripts               [keep|separate|normal] ["normal"]
  -w, --wrap-line-length             Maximum characters per line (0 disables) [250]
  -A, --wrap-attributes              Wrap attributes to new lines [auto|force|force-aligned|force-expand-multiline|aligned-multiple|preserve|preserve-aligned] ["auto"]
  -i, --wrap-attributes-indent-size  Indent wrapped attributes to after N characters [indent-size] (ignored if wrap-attributes is "aligned")
  -d, --inline                       List of tags to be considered inline tags
  -U, --unformatted                  List of tags (defaults to inline) that should not be reformatted
  -T, --content_unformatted          List of tags (defaults to pre) whose content should not be reformatted
  -E, --extra_liners                 List of tags (defaults to [head,body,/html] that should have an extra newline before them.
  --editorconfig                     Use EditorConfig to set up the options
  --indent_scripts                   Sets indent level inside script tags ("normal", "keep", "separate")
  --unformatted_content_delimiter    Keep text content together between this string [""]
  --indent-empty-lines               Keep indentation on empty lines
  --templating                       List of templating languages (auto,none,django,erb,handlebars,php,smarty) ["auto"] auto = none in JavaScript, all in html

指令

指令允许您从源文件中控制美化器的行为。指令放在文件内部的注释中。指令的格式为/* beautify {name}:{value} */用CSS和JavaScript编写。在HTML中,它们的格式为<!-- beautify {name}:{value} -->

忽略指令

这个ignore指令使美化器完全忽略文件的一部分,将其视为不解析的文字文本。美化后以下输入保持不变:

// Use ignore when the content is not parsable in the current language, JavaScript in this case.var a =  1;/* beautify ignore:start */ {This is some strange{template language{using open-braces?/* beautify ignore:end */

保留指令

注意:此指令仅适用于HTML和JavaScript,不适用于CSS

这个preserve指令使美化器解析,然后保留一段代码的现有格式

美化后以下输入保持不变:

// Use preserve when the content is valid syntax in the current language, JavaScript in this case.
// This will parse the code and preserve the existing formatting.
/* beautify preserve:start */
{
    browserName: 'internet explorer',
    platform:    'Windows 7',
    version:     '8'
}
/* beautify preserve:end */

许可证

您可以自由地以任何您想要的方式使用它,以防您觉得它对您有用或有效,但您必须保留版权声明和许可证。(麻省理工学院)

学分

还要感谢杰森·戴蒙德、帕特里克·霍夫、诺姆·索松科、安德烈亚斯·施耐德、戴夫·瓦西列夫斯基、维塔利·巴特马诺夫、罗恩·鲍德温、加布里埃尔·哈里森、克里斯·J·舒尔、马蒂亚斯·拜恩斯、维托里奥·甘巴莱塔等人

(Readme.md:js-Beautify@1.14.0)

Zulip-Zulip服务器和web应用-功能强大的开源团队聊天

Zulip概述

Zulip是一款功能强大的开源群聊应用程序,它结合了实时聊天的即时性和线程化对话的生产力优势。Zulip被开源项目、财富500强公司、大型标准机构和其他需要实时聊天系统的人使用,该系统允许用户每天轻松处理数百或数千条消息。每月有700多名贡献者合并500多条提交,Zulip也是规模最大、增长最快的开源群聊项目

快速入门

单击下面的相应链接。如果什么都不适用,请加入我们的Zulip community server告诉我们怎么回事!

您可能会对以下内容感兴趣:

您可能还会有兴趣阅读我们的blog或者跟着我们走TwitterZulip分布在Apache 2.0许可证

Python Bokeh 浏览器中的交互式数据可视化




Bokeh是一个用于现代Web浏览器的交互式可视化程序库。它提供优雅、简洁的多功能图形构造,并在大型或流式数据集上提供高性能的交互性。Bokeh可以帮助任何想要快速、轻松地制作交互式绘图、仪表板和数据应用程序的人

最新版本
孔达

许可证

PyPI

赞助

实时教程

生成状态 静电分析
支持

推特

如果你喜欢伯克并愿意支持我们的使命,请考虑making a donation











































安装

安装Bokeh的最简单方法是使用Anaconda Python distribution及其包含的孔达包裹管理系统。要安装Bokeh及其所需的依赖项,请在Bash或Windows命令提示符下输入以下命令:

conda install bokeh

要使用pip进行安装,请在Bash或Windows命令提示符下输入以下命令:

pip install bokeh

有关更多信息,请参阅installation documentation

资源

安装Bokeh后,请查看first steps guides

访问full documentation site要查看User’s Guidelaunch the Bokeh tutorial要在实时Jupyter笔记本中了解Bokeh,请执行以下操作

社区支持可在Project Discourse

如果您想对Bokeh做出贡献,请查看Developer Guiderequest an invitation to the Bokeh Dev Slack workspace

注意:在Bokeh项目的代码库、问题跟踪器和论坛中互动的每个人都应该遵循Code of Conduct

跟我们走吧

关注我们的推特@bokeh

支持

财政支持

Bokeh项目对此表示感谢individual contributions以下组织和公司提供赞助和支持:

















如果您的公司使用Bokeh并能够赞助该项目,请联系info@bokeh.org

Bokeh是NumFOCUS的赞助项目,NumFOCUS是美国的501(C)(3)非营利性慈善机构。NumFOCUS为Bokeh提供财政、法律和行政支持,以帮助确保项目的健康和可持续性。参观numfocus.org了解更多信息

对Bokeh的捐款由NumFOCUS管理。对于美国的捐赠者,您的捐赠在法律规定的范围内是免税的。与任何捐赠一样,您应该就您的具体税务情况咨询您的税务顾问。

实物支持

Bokeh项目还感谢以下公司捐赠的服务:

安全性

若要报告安全漏洞,请使用Tidelift security contactTidelift将协调修复和披露

Graal-GraalVM:在🚀任何地方更快地运行程序

GraalVM是一个通用虚拟机,用于运行以JavaScript、Python、Ruby、R、基于JVM的语言(如Java、Scala、Clojure、Kotlin)以及基于LLVM的语言(如C和C++)编写的应用程序

项目网站为https://www.graalvm.org描述如何get started,如何stay connected,以及如何contribute

存储库结构

GraalVM主源存储库包括下面列出的组件。每个组件的文档都包括该组件的开发人员说明

  • GraalVM SDK包含长期支持的GraalVM API
  • GraalVM compiler用JAVA编写,支持动态编译和静电编译,可以与JAVA HotSpot VM集成,也可以独立运行
  • Truffle用于创建GraalVM的语言和工具的语言实现框架
  • Tools包含一组使用规范框架实现的GraalVM语言的工具
  • Substrate VM允许在封闭假设下提前(AOT)将Java应用程序编译为可执行映像或共享对象的框架
  • Sulong是用于在GraalVM上运行LLVM位码的引擎
  • GraalWasm是一个用于在GraalVM上运行WebAssembly程序的引擎
  • TRegex是正则表达式的实现,它利用GraalVM高效编译自动机
  • VM包括构建模块化GraalVM映像的组件
  • VS Code提供对Visual Studio代码的扩展,以支持使用GraalVM开发多语言应用程序

获得支持

相关存储库

GraalVM允许使用Truffle和GraalVM编译器在使用GraalVM内核的相关存储库中开发和测试的以下语言运行。这些是:

许可证

每个GraalVM组件均获得许可: