问题:如何在Spark中关闭INFO日志记录?
我使用AWS EC2指南安装了Spark,并且可以使用bin/pyspark
脚本正常启动该程序以获取Spark 提示,并且还可以成功执行快速入门Quide。
但是,我无法终生解决如何INFO
在每个命令后停止所有冗长的日志记录。
我在下面的代码(注释掉,设置为OFF)中的几乎所有可能的情况下都尝试了log4j.properties
该conf
文件夹,该文件夹位于我从中以及在每个节点上启动应用程序的文件夹中,没有任何反应。INFO
执行每个语句后,我仍然可以打印日志记录语句。
我对应该如何工作感到非常困惑。
#Set everything to be logged to the console log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
这是我使用时的完整类路径SPARK_PRINT_LAUNCH_COMMAND
:
Spark命令:/Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin/java -cp:/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1 -bin-hadoop2 / conf:/root/spark-1.0.1-bin-hadoop2/lib/spark-assembly-1.0.1-hadoop2.2.0.jar:/root/spark-1.0.1-bin-hadoop2/lib /datanucleus-api-jdo-3.2.1.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/root/spark-1.0.1-bin-hadoop2 /lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize = 128m -Djava.library.path = -Xms512m -Xmx512m org.apache.spark.deploy.Spark提交spark-shell –class org.apache.spark。代表主
的内容spark-env.sh
:
#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH=/root/spark-1.0.1-bin-hadoop2/conf/
# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
# Options for the daemons used in the standalone deploy mode:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"
I installed Spark using the AWS EC2 guide and I can launch the program fine using the bin/pyspark
script to get to the spark prompt and can also do the Quick Start quide successfully.
However, I cannot for the life of me figure out how to stop all of the verbose INFO
logging after each command.
I have tried nearly every possible scenario in the below code (commenting out, setting to OFF) within my log4j.properties
file in the conf
folder in where I launch the application from as well as on each node and nothing is doing anything. I still get the logging INFO
statements printing after executing each statement.
I am very confused with how this is supposed to work.
#Set everything to be logged to the console log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Here is my full classpath when I use SPARK_PRINT_LAUNCH_COMMAND
:
Spark Command:
/Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin/java
-cp :/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/lib/spark-assembly-1.0.1-hadoop2.2.0.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar
-XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell –class
org.apache.spark.repl.Main
contents of spark-env.sh
:
#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH=/root/spark-1.0.1-bin-hadoop2/conf/
# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
# Options for the daemons used in the standalone deploy mode:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"
回答 0
只需在spark目录中执行以下命令:
cp conf/log4j.properties.template conf/log4j.properties
编辑log4j.properties:
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
在第一行替换:
log4j.rootCategory=INFO, console
通过:
log4j.rootCategory=WARN, console
保存并重新启动您的Shell。它适用于OS X上的Spark 1.1.0和Spark 1.5.1。
Just execute this command in the spark directory:
cp conf/log4j.properties.template conf/log4j.properties
Edit log4j.properties:
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Replace at the first line:
log4j.rootCategory=INFO, console
by:
log4j.rootCategory=WARN, console
Save and restart your shell. It works for me for Spark 1.1.0 and Spark 1.5.1 on OS X.
回答 1
受到pyspark / tests.py的启发
def quiet_logs(sc):
logger = sc._jvm.org.apache.log4j
logger.LogManager.getLogger("org"). setLevel( logger.Level.ERROR )
logger.LogManager.getLogger("akka").setLevel( logger.Level.ERROR )
在创建SparkContext之后立即调用此方法,将测试时记录的stderr行从2647减少到163。但是创建SparkContext本身会记录163,直至
15/08/25 10:14:16 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
而且我还不清楚如何以编程方式进行调整。
Inspired by the pyspark/tests.py I did
def quiet_logs(sc):
logger = sc._jvm.org.apache.log4j
logger.LogManager.getLogger("org"). setLevel( logger.Level.ERROR )
logger.LogManager.getLogger("akka").setLevel( logger.Level.ERROR )
Calling this just after creating SparkContext reduced stderr lines logged for my test from 2647 to 163. However creating the SparkContext itself logs 163, up to
15/08/25 10:14:16 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
and it’s not clear to me how to adjust those programmatically.
回答 2
在Spark 2.0中,您还可以使用setLogLevel为应用程序动态配置它:
from pyspark.sql import SparkSession
spark = SparkSession.builder.\
master('local').\
appName('foo').\
getOrCreate()
spark.sparkContext.setLogLevel('WARN')
在pyspark控制台中,默认spark
会话将已经可用。
In Spark 2.0 you can also configure it dynamically for your application using setLogLevel:
from pyspark.sql import SparkSession
spark = SparkSession.builder.\
master('local').\
appName('foo').\
getOrCreate()
spark.sparkContext.setLogLevel('WARN')
In the pyspark console, a default spark
session will already be available.
回答 3
编辑您的conf / log4j.properties文件,然后更改以下行:
log4j.rootCategory=INFO, console
至
log4j.rootCategory=ERROR, console
另一种方法是:
启动spark-shell并输入以下内容:
import org.apache.log4j.Logger
import org.apache.log4j.Level
Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
之后,您将看不到任何日志。
Edit your conf/log4j.properties file and Change the following line:
log4j.rootCategory=INFO, console
to
log4j.rootCategory=ERROR, console
Another approach would be to :
Fireup spark-shell and type in the following:
import org.apache.log4j.Logger
import org.apache.log4j.Level
Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
You won’t see any logs after that.
回答 4
>>> log4j = sc._jvm.org.apache.log4j
>>> log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)
>>> log4j = sc._jvm.org.apache.log4j
>>> log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)
回答 5
对于PySpark,您还可以使用在脚本中设置日志级别sc.setLogLevel("FATAL")
。从文档:
控制我们的logLevel。这将覆盖所有用户定义的日志设置。有效的日志级别包括:ALL,DEBUG,ERROR,FATAL,INFO,OFF,TRACE,WARN
For PySpark, you can also set the log level in your scripts with sc.setLogLevel("FATAL")
. From the docs:
Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
回答 6
您可以使用setLogLevel
val spark = SparkSession
.builder()
.config("spark.master", "local[1]")
.appName("TestLog")
.getOrCreate()
spark.sparkContext.setLogLevel("WARN")
You can use setLogLevel
val spark = SparkSession
.builder()
.config("spark.master", "local[1]")
.appName("TestLog")
.getOrCreate()
spark.sparkContext.setLogLevel("WARN")
回答 7
这可能是由于Spark如何计算其类路径。我的直觉是,Hadoop的log4j.properties
文件在类路径中出现在Spark之前,从而阻止您的更改生效。
如果你跑
SPARK_PRINT_LAUNCH_COMMAND=1 bin/spark-shell
然后Spark将打印用于启动Shell的完整类路径;就我而言
Spark Command: /usr/lib/jvm/java/bin/java -cp :::/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark/lib/datanucleus-core-3.2.2.jar:/root/spark/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path=:/root/ephemeral-hdfs/lib/native/ -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main
在/root/ephemeral-hdfs/conf
类路径的最前面。
我已经发布了一个问题[SPARK-2913],可以在下一发行版中解决此问题(我应该尽快发布一个补丁)。
同时,有两种解决方法:
- 添加
export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"
到spark-env.sh
。
- 删除(或重命名)
/root/ephemeral-hdfs/conf/log4j.properties
。
This may be due to how Spark computes its classpath. My hunch is that Hadoop’s log4j.properties
file is appearing ahead of Spark’s on the classpath, preventing your changes from taking effect.
If you run
SPARK_PRINT_LAUNCH_COMMAND=1 bin/spark-shell
then Spark will print the full classpath used to launch the shell; in my case, I see
Spark Command: /usr/lib/jvm/java/bin/java -cp :::/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark/lib/datanucleus-core-3.2.2.jar:/root/spark/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path=:/root/ephemeral-hdfs/lib/native/ -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main
where /root/ephemeral-hdfs/conf
is at the head of the classpath.
I’ve opened an issue [SPARK-2913] to fix this in the next release (I should have a patch out soon).
In the meantime, here’s a couple of workarounds:
- Add
export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"
to spark-env.sh
.
- Delete (or rename)
/root/ephemeral-hdfs/conf/log4j.properties
.
回答 8
Spark 1.6.2:
log4j = sc._jvm.org.apache.log4j
log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)
Spark 2.x:
spark.sparkContext.setLogLevel('WARN')
(火花是SparkSession)
或者是旧方法
在Spark Dir中重命名conf/log4j.properties.template
为conf/log4j.properties
。
在中log4j.properties
,更改log4j.rootCategory=INFO, console
为log4j.rootCategory=WARN, console
可用的不同日志级别:
- OFF(最具体,不记录)
- 致命(最具体,数据很少)
- 错误-仅在出现错误的情况下记录
- 警告-仅在出现警告或错误时记录
- INFO(默认)
- 调试-日志详细信息步骤(以及上述所有日志)
- TRACE(最具体,很多数据)
- ALL(最少,所有数据)
Spark 1.6.2:
log4j = sc._jvm.org.apache.log4j
log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)
Spark 2.x:
spark.sparkContext.setLogLevel('WARN')
(spark being the SparkSession)
Alternatively the old methods,
Rename conf/log4j.properties.template
to conf/log4j.properties
in Spark Dir.
In the log4j.properties
, change log4j.rootCategory=INFO, console
to log4j.rootCategory=WARN, console
Different log levels available:
- OFF (most specific, no logging)
- FATAL (most specific, little data)
- ERROR – Log only in case of Errors
- WARN – Log only in case of Warnings or Errors
- INFO (Default)
- DEBUG – Log details steps (and all logs stated above)
- TRACE (least specific, a lot of data)
- ALL (least specific, all data)
回答 9
编程方式
spark.sparkContext.setLogLevel("WARN")
可用选项
ERROR
WARN
INFO
Programmatic way
spark.sparkContext.setLogLevel("WARN")
Available Options
ERROR
WARN
INFO
回答 10
我将其用于具有1个主设备和2个从设备以及Spark 1.2.1的Amazon EC2。
# Step 1. Change config file on the master node
nano /root/ephemeral-hdfs/conf/log4j.properties
# Before
hadoop.root.logger=INFO,console
# After
hadoop.root.logger=WARN,console
# Step 2. Replicate this change to slaves
~/spark-ec2/copy-dir /root/ephemeral-hdfs/conf/
I used this with Amazon EC2 with 1 master and 2 slaves and Spark 1.2.1.
# Step 1. Change config file on the master node
nano /root/ephemeral-hdfs/conf/log4j.properties
# Before
hadoop.root.logger=INFO,console
# After
hadoop.root.logger=WARN,console
# Step 2. Replicate this change to slaves
~/spark-ec2/copy-dir /root/ephemeral-hdfs/conf/
回答 11
只需将以下参数添加到您的spark-submit命令中
--conf "spark.driver.extraJavaOptions=-Dlog4jspark.root.logger=WARN,console"
这仅暂时覆盖该作业的系统值。从log4j.properties文件中检查确切的属性名称(此处为log4jspark.root.logger)。
希望这会有所帮助,加油!
Simply add below param to your spark-submit command
--conf "spark.driver.extraJavaOptions=-Dlog4jspark.root.logger=WARN,console"
This overrides system value temporarily only for that job. Check exact property name (log4jspark.root.logger here) from log4j.properties file.
Hope this helps, cheers!
回答 12
以下针对scala用户的代码段:
选项1 :
您可以在摘要下方添加文件级
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.WARN)
选项2:
注意:这将适用于所有正在使用spark会话的应用程序。
import org.apache.spark.sql.SparkSession
private[this] implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel("WARN")
选项3:
注意:此配置应添加到您的log4j.properties中。(可能类似于/etc/spark/conf/log4j.properties(其中有spark安装)或项目文件夹级别的log4j.properties),因为您在以下位置进行更改模块级别。这将适用于所有应用程序。
log4j.rootCategory=ERROR, console
恕我直言,选项1是明智的方法,因为可以在文件级别将其关闭。
This below code snippet for scala users :
Option 1 :
Below snippet you can add at the file level
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.WARN)
Option 2 :
Note : which will be applicable for all the application which is using
spark session.
import org.apache.spark.sql.SparkSession
private[this] implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel("WARN")
Option 3 :
Note : This configuration should be added to your log4j.properties.. (could be like /etc/spark/conf/log4j.properties (where the spark installation is there) or your project folder level log4j.properties)
since you are changing at module level. This will be applicable for all the application.
log4j.rootCategory=ERROR, console
IMHO, Option 1 is wise way since it can be switched off at file level.
回答 13
我这样做的方式是:
在我运行spark-submit
脚本的位置
$ cp /etc/spark/conf/log4j.properties .
$ nano log4j.properties
更改INFO
为所需的日志记录级别,然后运行spark-submit
The way I do it is:
in the location I run the spark-submit
script do
$ cp /etc/spark/conf/log4j.properties .
$ nano log4j.properties
change INFO
to what ever level of logging you want and then run your spark-submit
回答 14
我想继续使用日志记录(Python的日志记录工具),可以尝试拆分应用程序和Spark的配置:
LoggerManager()
logger = logging.getLogger(__name__)
loggerSpark = logging.getLogger('py4j')
loggerSpark.setLevel('WARNING')
I you want to keep using the logging (Logging facility for Python) you can try splitting configurations for your application and for Spark:
LoggerManager()
logger = logging.getLogger(__name__)
loggerSpark = logging.getLogger('py4j')
loggerSpark.setLevel('WARNING')