Tag Archives: scala

How to turn off INFO logging in Spark?

Question: How to turn off INFO logging in Spark?


I installed Spark using the AWS EC2 guide and I can launch the program fine using the bin/pyspark script to get to the Spark prompt, and can also do the Quick Start guide successfully.

However, I cannot for the life of me figure out how to stop all of the verbose INFO logging after each command.

I have tried nearly every possible scenario in the code below (commenting out, setting to OFF) in my log4j.properties file in the conf folder where I launch the application from, as well as on each node, and nothing has any effect. I still get the INFO logging statements printed after executing each statement.

I am very confused with how this is supposed to work.

# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender 
log4j.appender.console.target=System.err     
log4j.appender.console.layout=org.apache.log4j.PatternLayout 
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Here is my full classpath when I use SPARK_PRINT_LAUNCH_COMMAND:

Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin/java -cp :/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/lib/spark-assembly-1.0.1-hadoop2.2.0.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main

Contents of spark-env.sh:

#!/usr/bin/env bash

# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.

# Options read when launching programs locally with 
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH=/root/spark-1.0.1-bin-hadoop2/conf/

# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos

# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.

# Options for the daemons used in the standalone deploy mode:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"

Answer 0


Just execute this command in the spark directory:

cp conf/log4j.properties.template conf/log4j.properties

Edit log4j.properties:

# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

In the first line, replace:

log4j.rootCategory=INFO, console

with:

log4j.rootCategory=WARN, console

Save and restart your shell. It works for me for Spark 1.1.0 and Spark 1.5.1 on OS X.


Answer 1


Inspired by pyspark/tests.py, I did:

def quiet_logs(sc):
    logger = sc._jvm.org.apache.log4j
    logger.LogManager.getLogger("org").setLevel(logger.Level.ERROR)
    logger.LogManager.getLogger("akka").setLevel(logger.Level.ERROR)

Calling this just after creating the SparkContext reduced the stderr lines logged for my test from 2647 to 163. However, creating the SparkContext itself logs 163 lines, up to

15/08/25 10:14:16 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

and it’s not clear to me how to adjust those programmatically.


Answer 2


In Spark 2.0 you can also configure it dynamically for your application using setLogLevel:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.\
        master('local').\
        appName('foo').\
        getOrCreate()
    spark.sparkContext.setLogLevel('WARN')

In the pyspark console, a default spark session will already be available.


Answer 3


Edit your conf/log4j.properties file and change the following line:

   log4j.rootCategory=INFO, console

to

    log4j.rootCategory=ERROR, console

Another approach would be to:

Fire up spark-shell and type in the following:

import org.apache.log4j.Logger
import org.apache.log4j.Level

Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)

You won’t see any logs after that.


Answer 4

>>> log4j = sc._jvm.org.apache.log4j
>>> log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

Answer 5


For PySpark, you can also set the log level in your scripts with sc.setLogLevel("FATAL"). From the docs:

Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
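For example, a minimal PySpark sketch (the app name and sample job are illustrative; sc.setLogLevel itself is the documented API quoted above):

from pyspark import SparkContext

# Create a local context and silence everything below FATAL for this application.
sc = SparkContext("local", "quiet-logs-example")
sc.setLogLevel("FATAL")

# Subsequent actions run without the INFO chatter.
print(sc.parallelize(range(100)).sum())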


Answer 6


You can use setLogLevel:

val spark = SparkSession
      .builder()
      .config("spark.master", "local[1]")
      .appName("TestLog")
      .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

Answer 7


This may be due to how Spark computes its classpath. My hunch is that Hadoop’s log4j.properties file is appearing ahead of Spark’s on the classpath, preventing your changes from taking effect.

If you run

SPARK_PRINT_LAUNCH_COMMAND=1 bin/spark-shell

then Spark will print the full classpath used to launch the shell; in my case, I see

Spark Command: /usr/lib/jvm/java/bin/java -cp :::/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark/lib/datanucleus-core-3.2.2.jar:/root/spark/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path=:/root/ephemeral-hdfs/lib/native/ -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main

where /root/ephemeral-hdfs/conf is at the head of the classpath.

I’ve opened an issue [SPARK-2913] to fix this in the next release (I should have a patch out soon).

In the meantime, here are a couple of workarounds:

  • Add export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf" to spark-env.sh.
  • Delete (or rename) /root/ephemeral-hdfs/conf/log4j.properties.

Answer 8


Spark 1.6.2:

log4j = sc._jvm.org.apache.log4j
log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

Spark 2.x:

spark.sparkContext.setLogLevel('WARN')

(spark being the SparkSession)

Alternatively, the old method:

Rename conf/log4j.properties.template to conf/log4j.properties in the Spark directory.

In log4j.properties, change log4j.rootCategory=INFO, console to log4j.rootCategory=WARN, console.

Different log levels available:

  • OFF (most specific, no logging)
  • FATAL (most specific, little data)
  • ERROR – Log only in case of Errors
  • WARN – Log only in case of Warnings or Errors
  • INFO (Default)
  • DEBUG – Log details steps (and all logs stated above)
  • TRACE (least specific, a lot of data)
  • ALL (least specific, all data)

Answer 9


Programmatic way

spark.sparkContext.setLogLevel("WARN")

Available Options

ERROR
WARN 
INFO 

Answer 10


I used this with Amazon EC2 with 1 master and 2 slaves and Spark 1.2.1.

# Step 1. Change config file on the master node
nano /root/ephemeral-hdfs/conf/log4j.properties

# Before
hadoop.root.logger=INFO,console
# After
hadoop.root.logger=WARN,console

# Step 2. Replicate this change to slaves
~/spark-ec2/copy-dir /root/ephemeral-hdfs/conf/

Answer 11


Simply add the parameter below to your spark-submit command:

--conf "spark.driver.extraJavaOptions=-Dlog4jspark.root.logger=WARN,console"

This temporarily overrides the system value, only for that job. Check the exact property name (log4jspark.root.logger here) in your log4j.properties file.
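If you would rather not pass the flag on every submit, the same JVM option can also be set once in conf/spark-defaults.conf (a sketch, not part of the original answer; spark.executor.extraJavaOptions is the analogous executor-side property, and the -Dlog4jspark.root.logger name must match whatever your log4j.properties actually defines):

spark.driver.extraJavaOptions   -Dlog4jspark.root.logger=WARN,console
spark.executor.extraJavaOptions -Dlog4jspark.root.logger=WARN,console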

Hope this helps, cheers!


Answer 12


The code snippets below are for Scala users:

Option 1:

You can add the snippet below at the file level:

import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.WARN)

Option 2:

Note: this will apply to every application that uses the Spark session.

import org.apache.spark.sql.SparkSession

  private[this] implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()

spark.sparkContext.setLogLevel("WARN")

Option 3:

Note: this configuration should be added to your log4j.properties (e.g. /etc/spark/conf/log4j.properties where Spark is installed, or a project-level log4j.properties), since you are changing it at the module level. It will apply to all applications.

log4j.rootCategory=ERROR, console

IMHO, Option 1 is the wise way, since it can be switched off at the file level.


Answer 13


The way I do it is:

in the location where I run the spark-submit script, do

$ cp /etc/spark/conf/log4j.properties .
$ nano log4j.properties

change INFO to whatever level of logging you want, and then run your spark-submit.


Answer 14


If you want to keep using logging (the logging facility for Python), you can try splitting the configurations for your application and for Spark:

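# Note: LoggerManager() is presumably the poster's own logging-setup helper, not a standard
# library class; 'py4j' is the logger used by PySpark's Java gateway, which this snippet
# quiets separately from the application's own logger.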
LoggerManager()
logger = logging.getLogger(__name__)
loggerSpark = logging.getLogger('py4j')
loggerSpark.setLevel('WARNING')

Awesomo: a list of cool open-source projects

A.W.E.S.O.M.O is an extensive list of interesting open-source projects written in various languages.

If you are interested in open source and are thinking about joining the community of open-source developers, you may well find a project here that suits you.

Subscribe

We have a Telegram channel where we post news, announcements and all the open-source goodies we find every day, so subscribe to us: @the_art_of_development

Languages

Want to add an interesting project?

  • Simply fork this repository
  • Add your project to the list, using a format similar to the other projects
  • Open a new pull request

☝️ Keep in mind, though, that we do not accept mammoth shit. Only active and interesting projects with good documentation are added; dead and abandoned projects will be removed.

Want to support us?

Just share this list with your friends on Twitter, Facebook, Medium or somewhere else.

License

awesomo by @lk-geimfari

To the extent possible under law, the author has waived all copyright and related or neighboring rights to awesomo under CC0.

You should have received a copy of the CC0 legalcode along with this work. If not, see https://creativecommons.org/publicdomain/zero/1.0/

Mal - Make a Lisp

MAL - Make a Lisp

Description

1. Mal is a Clojure-inspired Lisp interpreter

2. MAL is a learning tool

Each implementation of mal is separated into 11 incremental, self-contained (and testable) steps that demonstrate core concepts of Lisp. The last step is to be able to self-host (run a mal implementation of mal). See the make-a-lisp process guide.

The make-a-lisp steps are:

Each make-a-lisp step has an associated architecture diagram. The elements that are new for that step are highlighted in red. Below is the diagram for step A.

If you are interested in creating a mal implementation (or just interested in using mal for something), you are welcome to join our Discord or join #mal on libera.chat. In addition to the make-a-lisp process guide there is also a mal/make-a-lisp FAQ where I attempt to answer some common questions.

3. MAL is implemented in 86 languages (91 different implementations and 113 runtime modes)

Language Creator
Ada Chris Moore
Ada #2 Nicolas Boulenguez
GNU Awk Miutsuru Kariya
Bash 4 Joel Martin
BASIC (C64 and QBasic) Joel Martin
BBC BASIC V Ben Harris
C Joel Martin
C #2 Duncan Watts
C++ Stephen Thirlwall
C# Joel Martin
ChucK Vasilij Schneidermann
Clojure (Clojure and ClojureScript) Joel Martin
CoffeeScript Joel Martin
Common Lisp Iqbal Ansari
Crystal Linda_pp
D Dov Murik
Dart Harry Terkelsen
Elixir Martin Ek
Elm Jos van Bakel
Emacs Lisp Vasilij Schneidermann
Erlang Nathan Fiedler
ES6(ECMAScript 2015) Joel Martin
F# Peter Stephens
Factor Jordan Lewis
Fantom Dov Murik
Fennel sogaiu
Forth Chris Houser
GNU Guile Mu Lei
GNU Smalltalk Vasilij Schneidermann
Go Joel Martin
Groovy Joel Martin
Haskell Joel Martin
Haxe (Neko, Python, C++, and JS) Joel Martin
Hy Joel Martin
Io Dov Murik
Janet sogaiu
Java Joel Martin
Java (Truffle/GraalVM) Matt McGill
JavaScript(Demo) Joel Martin
jq Ali MohammadPur
Julia Joel Martin
Kotlin Javier Fernandez-Ivern
LiveScript Jos van Bakel
Logo Dov Murik
Lua Joel Martin
GNU Make Joel Martin
mal itself Joel Martin
MATLAB (GNU Octave & MATLAB) Joel Martin
miniMAL(RepoDemo) Joel Martin
NASM Ben Dudson
Nim Dennis Felsing
Object Pascal Joel Martin
Objective C Joel Martin
OCaml Chris Houser
Perl Joel Martin
Perl 6 Hinrik Örn Sigurðsson
PHP Joel Martin
Picolisp Vasilij Schneidermann
Pike Dov Murik
PL/pgSQL(PostgreSQL) Joel Martin
PL/SQL(Oracle) Joel Martin
PostScript Joel Martin
PowerShell Joel Martin
Prolog Nicolas Boulenguez
Python (2.x and 3.x) Joel Martin
Python #2(3.x) Gavin Lewis
RPython Joel Martin
R Joel Martin
Racket Joel Martin
Rexx Dov Murik
Ruby Joel Martin
Rust Joel Martin
Scala Joel Martin
Scheme (R7RS) Vasilij Schneidermann
Skew Dov Murik
Standard ML Fabian Bergström
Swift 2 Keith Rollin
Swift 3 Joel Martin
Swift 4 陆遥
Swift 5 Oleg Montak
Tcl Dov Murik
TypeScript Masahiro Wakame
Vala Simon Tatham
VHDL Dov Murik
Vimscript Dov Murik
Visual Basic.NET Joel Martin
WebAssembly(WASM) Joel Martin
Wren Dov Murik
XSLT Ali MohammadPur
Yorick Dov Murik
Zig Josh Tobin

Presentations

Mal was first presented in a lightning talk at Clojure West 2014 (unfortunately there is no video). See examples/clojurewest2014.mal for the presentation that was given at the conference (yes, the presentation is a mal program).

At Midwest.io 2015, Joel Martin gave a talk on mal titled "Achievement Unlocked: A Better Path to Language Learning": Video, Slides.

More recently, Joel gave a talk titled "Make Your Own Lisp Interpreter in 10 Incremental Steps" at LambdaConf 2016: Part 1, Part 2, Part 3, Part 4, Slides.

Building/running implementations

The simplest way to run any given implementation is to use docker. Every implementation has a prebuilt docker image with the language dependencies installed. You can launch a REPL using a convenient target in the top-level Makefile (where IMPL is the implementation directory name and stepX is the step to run):

make DOCKERIZE=1 "repl^IMPL^stepX"
    # OR stepA is the default step:
make DOCKERIZE=1 "repl^IMPL"

External implementations

The following implementations are maintained as separate projects:

HolyC

Rust

  • by Tim Morgan
  • by vi - uses a Pest grammar and does not use the typical mal infrastructure (cargo-ised steps and built-in converted tests)

Q

  • by Ali Mohammad Pur - the Q implementation works fine, but it requires a proprietary manual download and cannot be Dockerized (or integrated into the mal CI pipeline), so for now it remains a separate project

Other mal projects

  • malc - a Mal (Make A Lisp) compiler. Compiles a Mal program to LLVM assembly language and then to a binary.
  • malcc - an incremental compiler implementation of the mal language. It uses the Tiny C Compiler as the compiler backend and has full support for the mal language, including macros, tail-call elimination, and even run-time eval. See the post "I Built a Lisp Compiler" about the process.
  • frock - Clojure-flavored PHP. Uses mal/php to run programs.
  • flk - a LISP that runs wherever Bash is.
  • glisp - a self-bootstrapping graphic design tool on Lisp. Live Demo.

Implementation details

Ada

The Ada implementation was developed with GNAT 4.9 on Debian. It also compiles unchanged on Windows if you have Windows versions of git, gnat and make (optional). There are no external dependencies (readline is not implemented).

cd impls/ada
make
./stepX_YYY

Ada.2

The second Ada implementation was developed with GNAT 8 and is linked with the GNU readline library.

cd impls/ada
make
./stepX_YYY

GNU awk

The GNU awk implementation of mal has been tested with GNU awk 4.1.1.

cd impls/gawk
gawk -O -f stepX_YYY.awk

BASH 4

cd impls/bash
bash stepX_YYY.sh

Basic (C64 and QBasic)

The Basic implementation uses a preprocessor that can generate Basic code compatible with both C64 Basic (CBM v2) and QBasic. The C64 mode has been tested with cbmbasic (a patched version is currently required to fix issues with line input) and the QBasic mode has been tested with qb64.

Generate C64 code and run it with cbmbasic:

cd impls/basic
make stepX_YYY.bas
STEP=stepX_YYY ./run

Generate QBasic code and load it into qb64:

cd impls/basic
make MODE=qbasic stepX_YYY.bas
./qb64 stepX_YYY.bas

Thanks to Steven Syrek for the original inspiration for this implementation.

BBC Basic V

The BBC Basic V implementation can run in the Brandy interpreter:

cd impls/bbc-basic
brandy -quit stepX_YYY.bbc

or in ARM BBC Basic V under RISC OS 3 or later:

*Dir bbc-basic.riscos
*Run setup
*Run stepX_YYY

C

The C implementation of mal requires the following libraries (lib and header packages): glib, libffi6, libgc, and either the libedit or GNU readline library.

cd impls/c
make
./stepX_YYY

C.2

The second C implementation of mal requires the following libraries (lib and header packages): libedit, libgc, libdl, and libffi.

cd impls/c.2
make
./stepX_YYY

C++

Building the C++ implementation of mal requires g++-4.9 or clang++-3.5 and a readline-compatible library. See cpp/README.md for more details:

cd impls/cpp
make
    # OR
make CXX=clang++-3.5
./stepX_YYY

C#

The C# implementation of mal has been tested on Linux using the Mono C# compiler (mcs) and the Mono runtime (version 2.10.8.1). Both are required to build and run the C# implementation.

cd impls/cs
make
mono ./stepX_YYY.exe

ChucK

The ChucK implementation has been tested with ChucK 1.3.5.2.

cd impls/chuck
./run

Clojure

For the most part, the Clojure implementation requires Clojure 1.5; however, to pass all tests, Clojure 1.8.0-RC4 is required.

cd impls/clojure
lein with-profile +stepX trampoline run

CoffeeScript

sudo npm install -g coffee-script
cd impls/coffee
coffee ./stepX_YYY

Common Lisp

The implementation has been tested with SBCL, CCL, CMUCL, GNU CLISP, ECL, and Allegro CL on Ubuntu 16.04 and Ubuntu 12.04; see the README for more details. Provided you have the dependencies mentioned installed, do the following to run the implementation:

cd impls/common-lisp
make
./run

Crystal

The Crystal implementation of mal has been tested with Crystal 0.26.1.

cd impls/crystal
crystal run ./stepX_YYY.cr
    # OR
make   # needed to run tests
./stepX_YYY

D

The D implementation of mal was tested with GDC 4.8. It requires the GNU readline library.

cd impls/d
make
./stepX_YYY

Dart

The Dart implementation has been tested with Dart 1.20.

cd impls/dart
dart ./stepX_YYY

Emacs Lisp

The Emacs Lisp implementation of mal has been tested with Emacs 24.3 and 24.5. While there is very basic readline editing (<backspace> and C-d work, C-c cancels the process), it is recommended to use rlwrap.

cd impls/elisp
emacs -Q --batch --load stepX_YYY.el
# with full readline support
rlwrap emacs -Q --batch --load stepX_YYY.el

Elixir

The Elixir implementation of mal has been tested with Elixir 1.0.5.

cd impls/elixir
mix stepX_YYY
# Or with readline/line editing functionality:
iex -S mix stepX_YYY

Elm

The Elm implementation of mal has been tested with Elm 0.18.0.

cd impls/elm
make stepX_YYY.js
STEP=stepX_YYY ./run

Erlang

The Erlang implementation of mal requires Erlang/OTP R17 and rebar to build.

cd impls/erlang
make
    # OR
MAL_STEP=stepX_YYY rebar compile escriptize # build individual step
./stepX_YYY

ES6 (ECMAScript 2015)

The ES6 / ECMAScript 2015 implementation uses the babel compiler to generate ES5-compatible JavaScript. The generated code has been tested with Node 0.12.4.

cd impls/es6
make
node build/stepX_YYY.js

F#

The F# implementation of mal has been tested on Linux using the Mono F# compiler (fsharpc) and the Mono runtime (version 3.12.1). The Mono C# compiler (mcs) is also necessary to compile the readline dependency. All of these are required to build and run the F# implementation.

cd impls/fsharp
make
mono ./stepX_YYY.exe

Factor

The Factor implementation of mal has been tested with Factor 0.97 (factorcode.org).

cd impls/factor
FACTOR_ROOTS=. factor -run=stepX_YYY

Fantom

The Fantom implementation of mal has been tested with Fantom 1.0.70.

cd impls/fantom
make lib/fan/stepX_YYY.pod
STEP=stepX_YYY ./run

Fennel

The Fennel implementation of mal has been tested with Fennel version 0.9.1 on Lua 5.4.

cd impls/fennel
fennel ./stepX_YYY.fnl

Forth

cd impls/forth
gforth stepX_YYY.fs

GNU Guile 2.1+

cd impls/guile
guile -L ./ stepX_YYY.scm

GNU Smalltalk

The Smalltalk implementation of mal has been tested with GNU Smalltalk 3.2.91.

cd impls/gnu-smalltalk
./run

Go

The Go implementation of mal requires that Go is installed on your path. The implementation has been tested with Go 1.3.1.

cd impls/go
make
./stepX_YYY

Groovy

The Groovy implementation of mal requires Groovy to run and has been tested with Groovy 1.8.6.

cd impls/groovy
make
groovy ./stepX_YYY.groovy

Haskell

The Haskell implementation requires the GHC compiler version 7.10.1 or later, as well as the Haskell parsec and readline (or editline) packages.

cd impls/haskell
make
./stepX_YYY

Haxe (Neko, Python, C++, and JavaScript)

The Haxe implementation of mal requires Haxe version 3.2 to compile. Four different Haxe targets are supported: Neko, Python, C++, and JavaScript.

cd impls/haxe
# Neko
make all-neko
neko ./stepX_YYY.n
# Python
make all-python
python3 ./stepX_YYY.py
# C++
make all-cpp
./cpp/stepX_YYY
# JavaScript
make all-js
node ./stepX_YYY.js

Hy

The Hy implementation of mal has been tested with Hy 0.13.0.

cd impls/hy
./stepX_YYY.hy

Io

The Io implementation of mal has been tested with Io version 20110905.

cd impls/io
io ./stepX_YYY.io

Janet

The Janet implementation of mal has been tested with Janet version 1.12.2.

cd impls/janet
janet ./stepX_YYY.janet

Java 1.7

The Java implementation of mal requires maven2 to build.

cd impls/java
mvn compile
mvn -quiet exec:java -Dexec.mainClass=mal.stepX_YYY
    # OR
mvn -quiet exec:java -Dexec.mainClass=mal.stepX_YYY -Dexec.args="CMDLINE_ARGS"

Java, using Truffle for GraalVM

This Java implementation runs on OpenJDK, but thanks to the Truffle framework it can run up to 30x faster on GraalVM. It has been tested with OpenJDK 11, GraalVM CE 20.1.0, and GraalVM CE 21.1.0.

cd impls/java-truffle
./gradlew build
STEP=stepX_YYY ./run

JavaScript/Node

cd impls/js
npm install
node stepX_YYY.js

Julia

The Julia implementation of mal requires Julia 0.4.

cd impls/julia
julia stepX_YYY.jl

jq

Tested against version 1.6, with a lot of cheating in the IO department.

cd impls/jq
STEP=stepA_YYY ./run
    # with Debug
DEBUG=true STEP=stepA_YYY ./run

Kotlin

The Kotlin implementation of mal has been tested with Kotlin 1.0.

cd impls/kotlin
make
java -jar stepX_YYY.jar

LiveScript

The LiveScript implementation of mal has been tested with LiveScript 1.5.

cd impls/livescript
make
node_modules/.bin/lsc stepX_YYY.ls

Logo

The Logo implementation of mal has been tested with UCBLogo 6.0.

cd impls/logo
logo stepX_YYY.lg

Lua

The Lua implementation of mal has been tested with Lua 5.3.5. The implementation requires luarocks to be installed.

cd impls/lua
make  # to build and link linenoise.so and rex_pcre.so
./stepX_YYY.lua

Mal

Running the mal implementation of mal involves running stepA of one of the other implementations and passing the mal step to run as a command-line argument.

cd impls/IMPL
IMPL_STEPA_CMD ../mal/stepX_YYY.mal

GNU Make 3.81

cd impls/make
make -f stepX_YYY.mk

NASM

The NASM implementation of mal is written for x86-64 Linux and has been tested with Linux 3.16.0-4-amd64 and NASM version 2.11.05.

cd impls/nasm
make
./stepX_YYY

Nim 1.0.4

The Nim implementation of mal has been tested with Nim 1.0.4.

cd impls/nim
make
  # OR
nimble build
./stepX_YYY

Object Pascal

The Object Pascal implementation of mal has been built and tested on Linux using the Free Pascal compiler versions 2.6.2 and 2.6.4.

cd impls/objpascal
make
./stepX_YYY

Objective C

The Objective C implementation of mal has been built and tested on Linux using clang/LLVM 3.6. It has also been built and tested on OS X using Xcode 7.

cd impls/objc
make
./stepX_YYY

OCaml 4.01.0

cd impls/ocaml
make
./stepX_YYY

MATLAB (GNU Octave and MATLAB)

The MATLAB implementation has been tested with GNU Octave 4.2.1. It has also been tested with MATLAB version R2014a on Linux. Note that MATLAB is a commercial product.

cd impls/matlab
./stepX_YYY
octave -q --no-gui --no-history --eval "stepX_YYY();quit;"
matlab -nodisplay -nosplash -nodesktop -nojvm -r "stepX_YYY();quit;"
    # OR with command line arguments
octave -q --no-gui --no-history --eval "stepX_YYY('arg1','arg2');quit;"
matlab -nodisplay -nosplash -nodesktop -nojvm -r "stepX_YYY('arg1','arg2');quit;"

miniMAL

miniMAL is a small Lisp interpreter implemented in less than 1024 bytes of JavaScript. To run the miniMAL implementation of mal, you need to download/install the miniMAL interpreter (which requires Node.js).

cd impls/miniMAL
# Download miniMAL and dependencies
npm install
export PATH=`pwd`/node_modules/minimal-lisp/:$PATH
# Now run mal implementation in miniMAL
miniMAL ./stepX_YYY

Perl 5

The Perl 5 implementation should work with Perl 5.19.3 and later.

For readline line-editing support, install Term::ReadLine::Perl or Term::ReadLine::GNU from CPAN.

cd impls/perl
perl stepX_YYY.pl

Perl 6

The Perl 6 implementation was tested on Rakudo Perl 6 2016.04.

cd impls/perl6
perl6 stepX_YYY.pl

PHP 5.3

The PHP implementation of mal requires the php command-line interface to run.

cd impls/php
php stepX_YYY.php

Picolisp

The Picolisp implementation requires libreadline and Picolisp 3.1.11 or later.

cd impls/picolisp
./run

Pike

The Pike implementation was tested on Pike 8.0.

cd impls/pike
pike stepX_YYY.pike

PL/pgSQL (PostgreSQL SQL Procedural Language)

The PL/pgSQL implementation of mal requires a running PostgreSQL server (the "kanaka/mal-test-plpgsql" docker image automatically starts a PostgreSQL server). The implementation connects to the PostgreSQL server and creates a database named "mal" to store tables and stored procedures. The wrapper script uses the psql command to connect to the server and defaults to the user "postgres", but this can be overridden with the PSQL_USER environment variable. A password can be specified with the PGPASSWORD environment variable. The implementation has been tested with PostgreSQL 9.4.

cd impls/plpgsql
./wrap.sh stepX_YYY.sql
    # OR
PSQL_USER=myuser PGPASSWORD=mypass ./wrap.sh stepX_YYY.sql

PL/SQL (Oracle SQL Procedural Language)

The PL/SQL implementation of mal requires a running Oracle DB server (the "kanaka/mal-test-plsql" docker image automatically starts an Oracle Express server). The implementation connects to the Oracle server to create types, tables, and stored procedures. The default SQL*Plus logon value (username/password@connect_identifier) is "system/oracle", but this can be overridden with the ORACLE_LOGON environment variable. The implementation has been tested with Oracle Express Edition 11g Release 2. Note that any SQL*Plus connection warnings (user password expiration, etc.) will interfere with the wrapper script's ability to communicate with the DB.

cd impls/plsql
./wrap.sh stepX_YYY.sql
    # OR
ORACLE_LOGON=myuser/mypass@ORCL ./wrap.sh stepX_YYY.sql

PostScript Level 2/3

The PostScript implementation of mal requires Ghostscript to run. It has been tested with Ghostscript 9.10.

cd impls/ps
gs -q -dNODISPLAY -I./ stepX_YYY.ps

PowerShell

The PowerShell implementation of mal requires the PowerShell script language. It has been tested with PowerShell 6.0.0 Alpha 9 on Linux.

cd impls/powershell
powershell ./stepX_YYY.ps1

Prolog

The Prolog implementation uses some constructs specific to SWI-Prolog, includes readline support, and has been tested on Debian GNU/Linux with version 8.2.1.

cd impls/prolog
swipl stepX_YYY

Python (2.x and 3.x)

cd impls/python
python stepX_YYY.py

Python #2 (3.x)

The second Python implementation makes heavy use of type annotations and uses the Arpeggio parser library.

# Recommended: do these steps in a Python virtual environment.
pip3 install Arpeggio==1.9.0
python3 stepX_YYY.py

RPython

You must have rpython on your path (it comes with pypy).

cd impls/rpython
make        # this takes a very long time
./stepX_YYY

R

The R implementation of mal requires R (r-base-core) to run.

cd impls/r
make libs  # to download and build rdyncall
Rscript stepX_YYY.r

Racket (5.3)

The Racket implementation of mal requires the Racket compiler/interpreter to run.

cd impls/racket
./stepX_YYY.rkt

Rexx

The Rexx implementation of mal has been tested with Regina Rexx 3.6.

cd impls/rexx
make
rexx -a ./stepX_YYY.rexxpp

Ruby (1.9+)

cd impls/ruby
ruby stepX_YYY.rb

Rust (1.38+)

The Rust implementation of mal requires the Rust compiler and build tool (cargo) to build.

cd impls/rust
cargo run --release --bin stepX_YYY

Scala

Install scala and sbt (http://www.scala-sbt.org/0.13/tutorial/Installing-sbt-on-Linux.html):

cd impls/scala
sbt 'run-main stepX_YYY'
    # OR
sbt compile
scala -classpath target/scala*/classes stepX_YYY

Scheme (R7RS)

The Scheme implementation of mal has been tested with Chibi-Scheme 0.7.3, Kawa 2.4, Gauche 0.9.5, CHICKEN 4.11.0, Sagittarius 0.8.3, Cyclone 0.6.3 (Git version), and Foment 0.4 (Git version). You should be able to get it running on other R7RS-compliant implementations after figuring out how libraries are loaded and adjusting the Makefile and run script accordingly.

cd impls/scheme
make symlinks
# chibi
scheme_MODE=chibi ./run
# kawa
make kawa
scheme_MODE=kawa ./run
# gauche
scheme_MODE=gauche ./run
# chicken
make chicken
scheme_MODE=chicken ./run
# sagittarius
scheme_MODE=sagittarius ./run
# cyclone
make cyclone
scheme_MODE=cyclone ./run
# foment
scheme_MODE=foment ./run

Skew

The Skew implementation of mal has been tested with Skew 0.7.42.

cd impls/skew
make
node stepX_YYY.js

Standard ML (Poly/ML, MLton, Moscow ML)

The Standard ML implementation of mal requires an SML97 implementation. The Makefile supports Poly/ML, MLton, and Moscow ML, and has been tested with Poly/ML 5.8.1, MLton 20210117, and Moscow ML version 2.10.

cd impls/sml
# Poly/ML
make sml_MODE=polyml
./stepX_YYY
# MLton
make sml_MODE=mlton
./stepX_YYY
# Moscow ML
make sml_MODE=mosml
./stepX_YYY

Swift

The Swift implementation of mal requires the Swift 2.0 compiler (Xcode 7.0) to build. Older versions will not work due to changes in the language and standard library.

cd impls/swift
make
./stepX_YYY

Swift 3

The Swift 3 implementation of mal requires the Swift 3.0 compiler. It has been tested with Swift 3 Preview 3.

cd impls/swift3
make
./stepX_YYY

Swift 4

The Swift 4 implementation of mal requires the Swift 4.0 compiler. It has been tested with Swift 4.2.3.

cd impls/swift4
make
./stepX_YYY

Swift 5

The Swift 5 implementation of mal requires the Swift 5.0 compiler. It has been tested with Swift 5.1.1.

cd impls/swift5
swift run stepX_YYY

Tcl 8.6

The Tcl implementation of mal requires Tcl 8.6 to run. For readline line-editing support, install tclreadline.

cd impls/tcl
tclsh ./stepX_YYY.tcl

TypeScript

The TypeScript implementation of mal requires the TypeScript 2.2 compiler. It has been tested with Node.js v6.

cd impls/ts
make
node ./stepX_YYY.js

Vala

The Vala implementation of mal has been tested with the Vala 0.40.8 compiler. You will need to install valac and libreadline-dev or the equivalent.

cd impls/vala
make
./stepX_YYY

VHDL

The VHDL implementation of mal has been tested with GHDL 0.29.

cd impls/vhdl
make
./run_vhdl.sh ./stepX_YYY

Vimscript

The Vimscript implementation of mal requires Vim 8.0 to run.

cd impls/vimscript
./run_vimscript.sh ./stepX_YYY.vim

Visual Basic.NET

The VB.NET implementation of mal has been tested on Linux using the Mono VB compiler (vbnc) and the Mono runtime (version 2.10.8.1). Both are required to build and run the VB.NET implementation.

cd impls/vb
make
mono ./stepX_YYY.exe

WebAssembly (wasm)

The WebAssembly implementation is written in Wam (WebAssembly Macro language) and runs under several different non-web embeddings (runtimes): node, wasmtime, wasmer, lucet, wax, wace, warpy.

cd impls/wasm
# node
make wasm_MODE=node
./run.js ./stepX_YYY.wasm
# wasmtime
make wasm_MODE=wasmtime
wasmtime --dir=./ --dir=../ --dir=/ ./stepX_YYY.wasm
# wasmer
make wasm_MODE=wasmer
wasmer run --dir=./ --dir=../ --dir=/ ./stepX_YYY.wasm
# lucet
make wasm_MODE=lucet
lucet-wasi --dir=./:./ --dir=../:../ --dir=/:/ ./stepX_YYY.so
# wax
make wasm_MODE=wax
wax ./stepX_YYY.wasm
# wace
make wasm_MODE=wace_libc
wace ./stepX_YYY.wasm
# warpy
make wasm_MODE=warpy
warpy --argv --memory-pages 256 ./stepX_YYY.wasm

XSLT

The XSLT implementation of mal is written in XSLT 3 and tested on Saxon 9.9.1.6 Home Edition.

cd impls/xslt
STEP=stepX_YY ./run

Wren

The Wren implementation of mal was tested on Wren 0.2.0.

cd impls/wren
wren ./stepX_YYY.wren

Yorick

The Yorick implementation of mal was tested on Yorick 2.2.04.

cd impls/yorick
yorick -batch ./stepX_YYY.i

Zig

The Zig implementation of mal was tested on Zig 0.5.

cd impls/zig
zig build stepX_YYY

Running tests

The top-level Makefile has a number of useful targets to assist with implementation development and testing. The help target provides a list of the targets and options:

make help

Functional tests

There are almost 800 generic functional tests (for all implementations) in the tests/ directory. Each step has a corresponding test file containing tests specific to that step. The runtest.py test harness launches a mal step implementation, then feeds the tests one at a time to the implementation and compares the output/return value to the expected output/return value.

  • To run all the tests across all implementations (be prepared to wait):
make test
  • To run all tests against a single implementation:
make "test^IMPL"

# e.g.
make "test^clojure"
make "test^js"
  • To run tests for a single step against all implementations:
make "test^stepX"

# e.g.
make "test^step2"
make "test^step7"
  • To run tests for a specific step against a single implementation:
make "test^IMPL^stepX"

# e.g
make "test^ruby^step3"
make "test^ps^step4"

Self-hosted functional tests

  • To run the functional tests in self-hosted mode, specify mal as the test implementation and use the MAL_IMPL make variable to change the underlying host language (the default is js):
make MAL_IMPL=IMPL "test^mal^step2"

# e.g.
make "test^mal^step2"   # js is default
make MAL_IMPL=ruby "test^mal^step2"
make MAL_IMPL=python "test^mal^step2"

Starting the REPL

  • To start the REPL of an implementation at a specific step:
make "repl^IMPL^stepX"

# e.g
make "repl^ruby^step3"
make "repl^ps^step4"
  • If you omit the step, then stepA is used:
make "repl^IMPL"

# e.g
make "repl^ruby"
make "repl^ps"
  • To start the REPL of the self-hosted implementation, specify mal as the REPL implementation and use the MAL_IMPL make variable to change the underlying host language (the default is js):
make MAL_IMPL=IMPL "repl^mal^stepX"

# e.g.
make "repl^mal^step2"   # js is default
make MAL_IMPL=ruby "repl^mal^step2"
make MAL_IMPL=python "repl^mal"

Performance tests

Warning: these performance tests are neither statistically valid nor comprehensive; runtime performance is not a primary goal of mal. If you draw any serious conclusions from these performance tests, then please contact me about some amazing oceanfront property in Kansas that I am willing to sell you for cheap.

  • To run performance tests against a single implementation:
make "perf^IMPL"

# e.g.
make "perf^js"
  • To run performance tests against all implementations:
make "perf"

Generating language statistics

  • To report line and byte statistics for a single implementation:
make "stats^IMPL"

# e.g.
make "stats^js"

Dockerized testing

Every implementation directory contains a Dockerfile used to create a docker image containing all the dependencies for that implementation. In addition, the top-level Makefile supports running the test targets (and perf, stats, repl, etc.) within a docker container by passing "DOCKERIZE=1" on the make command line. For example:

make DOCKERIZE=1 "test^js^step3"

Existing implementations already have docker images built and pushed to the docker registry. However, if you wish to build or rebuild a docker image locally, the top-level Makefile provides a rule for building docker images:

make "docker-build^IMPL"

Notes

  • Docker images are named "kanaka/mal-test-IMPL"
  • JVM-based language implementations (Groovy, Java, Clojure, Scala): you will probably need to run this command once manually first, make DOCKERIZE=1 "repl^IMPL", before you can run the tests, because the runtime dependencies need to be downloaded to avoid the tests timing out. These dependencies are downloaded to dot-files in the /mal directory, so they will persist between runs.

License

Mal (make-a-lisp) is licensed under the MPL 2.0 (Mozilla Public License 2.0). See LICENSE.txt for more details.

Spark - Apache: A unified analytics engine for large-scale data processing

Spark

Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

Online documentation

You can find the latest Spark documentation, including a programming guide, on the project web page. This README file contains only basic setup instructions.

Building Spark

Spark is built using Apache Maven. To build Spark and its example programs, run:

./build/mvn -DskipTests clean package

(You do not need to do this if you downloaded a pre-built package.)

More detailed documentation is available from the project site, at "Building Spark".

For general development tips, including info on developing Spark using an IDE, see "Useful Developer Tools".

Interactive Scala shell

The easiest way to start using Spark is through the Scala shell:

./bin/spark-shell

Try the following command, which should return 1,000,000,000:

scala> spark.range(1000 * 1000 * 1000).count()

Interactive Python shell

Alternatively, if you prefer Python, you can use the Python shell:

./bin/pyspark

And run the following command, which should also return 1,000,000,000:

>>> spark.range(1000 * 1000 * 1000).count()

Example programs

Spark also comes with several sample programs in the examples directory. To run one of them, use ./bin/run-example <class> [params]. For example:

./bin/run-example SparkPi

will run the Pi example locally.

You can set the MASTER environment variable when running examples to submit examples to a cluster. This can be a mesos:// or spark:// URL, "yarn" to run on YARN, "local" to run locally with one thread, or "local[N]" to run locally with N threads. You can also use an abbreviated class name if the class is in the examples package. For instance:

MASTER=spark://host:7077 ./bin/run-example SparkPi

Many of the example programs print usage help if no params are given.

Running tests

Testing first requires building Spark. Once Spark is built, tests can be run using:

./dev/run-tests

Please see the guidance on how to run tests for a module, or individual tests.

There is also a Kubernetes integration test; see resource-managers/kubernetes/integration-tests/README.md.

A note about Hadoop versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported storage systems. Because the protocols have changed in different versions of Hadoop, you must build Spark against the same version that your cluster runs.

Please refer to the build documentation at "Specifying the Hadoop Version and Enabling YARN" for detailed guidance on building for a particular distribution of Hadoop, including building for particular Hive and Hive Thriftserver distributions.
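For example, a sketch based on that build documentation (the -Pyarn profile and the hadoop.version property are taken from those docs; adjust the version to match your cluster):

./build/mvn -Pyarn -Dhadoop.version=2.7.3 -DskipTests clean package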

Configuration

Please refer to the Configuration Guide in the online documentation for an overview on how to configure Spark.

Contributing

Please review the Contribution to Spark guide for information on how to get started contributing to the project.