A quick late-night post about a problem I ran into a few days ago:
Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
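The error shows up during a join, and it is usually accompanied by broadcast-related exceptions. To fix it, you need to adjust the following two configuration options: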
| Property Name | Default | Meaning | Since Version |
|---|---|---|---|
| spark.sql.broadcastTimeout | 300 | Timeout in seconds for the broadcast wait time in broadcast joins. | 1.3.0 |
| spark.sql.autoBroadcastJoinThreshold | 10485760 (10 MB) | Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. By setting this value to -1, broadcasting can be disabled. Note that currently statistics are only supported for Hive Metastore tables where the command `ANALYZE TABLE <tableName> COMPUTE STATISTICS noscan` has been run. | 1.1.0 |
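The values above are the defaults. To fix the problem, I raised the broadcast timeout to 1800 seconds and disabled automatic broadcast joins: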
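import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("test")
  .config("spark.sql.broadcastTimeout", "1800")        // wait up to 1800 s for a broadcast
  .config("spark.sql.autoBroadcastJoinThreshold", "-1") // -1 disables automatic broadcast joins
  .getOrCreate()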
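
Both options are Spark SQL settings, so they can also be changed on a session that already exists instead of at build time. This is only a minimal sketch: the object name `BroadcastConfigSketch` and the `local[*]` master are my own placeholders for illustration, not part of the original setup.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastConfigSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master, used here only so the sketch is self-contained.
    val spark = SparkSession.builder
      .appName("test")
      .master("local[*]")
      .getOrCreate()

    // Raise the broadcast wait time from the default 300 s to 1800 s.
    spark.conf.set("spark.sql.broadcastTimeout", "1800")
    // -1 disables automatic broadcast joins.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // Read the values back to confirm they are active on this session.
    println(spark.conf.get("spark.sql.broadcastTimeout"))
    println(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))

    spark.stop()
  }
}
```

Setting the values at runtime is handy when only one job in a longer application hits the timeout, since the defaults can be restored afterwards with another `spark.conf.set` call.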