A broadcast problem caused by a join in Spark


A quick late-night post. This is a problem I ran into a few days ago:

Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:

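The error is raised during a join, and it comes together with several broadcast-related exceptions. To resolve it, you need to adjust the following two settings: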

Property: spark.sql.broadcastTimeout
Default: 300
Meaning: Timeout in seconds for the broadcast wait time in broadcast joins.
Since version: 1.3.0

Property: spark.sql.autoBroadcastJoinThreshold
Default: 10485760 (10 MB)
Meaning: Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. By setting this value to -1 broadcasting can be disabled. Note that currently statistics are only supported for Hive Metastore tables where the command ANALYZE TABLE <tableName> COMPUTE STATISTICS noscan has been run.
Since version: 1.1.0
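To make the threshold semantics concrete, here is a minimal sketch in plain Scala. It is a simplified model of the size check, not Spark's planner code: the object name `BroadcastThresholdSketch` and the helper `wouldAutoBroadcast` are hypothetical, and the exact comparison Spark performs internally is an assumption here.

```scala
object BroadcastThresholdSketch {
  // Default from the documentation: 10 MB expressed in bytes.
  val DefaultThresholdBytes: Long = 10L * 1024 * 1024

  // Simplified model (assumption): a table is a broadcast candidate only when
  // its estimated size is known (non-negative) and does not exceed the threshold.
  def wouldAutoBroadcast(estimatedSizeBytes: Long, thresholdBytes: Long): Boolean =
    estimatedSizeBytes >= 0 && estimatedSizeBytes <= thresholdBytes

  def main(args: Array[String]): Unit = {
    println(DefaultThresholdBytes)                                        // 10485760
    println(wouldAutoBroadcast(5L * 1024 * 1024, DefaultThresholdBytes))  // true
    println(wouldAutoBroadcast(50L * 1024 * 1024, DefaultThresholdBytes)) // false
    // With the threshold set to -1 no size can qualify, so broadcasting is off.
    println(wouldAutoBroadcast(1L, -1L))                                  // false
  }
}
```

Under this model, a threshold of -1 rules out every table, which matches the documented way to disable automatic broadcasting.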

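The values above are the defaults. To fix the problem, I raised the broadcast timeout to 1800 seconds and disabled automatic broadcast joins: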

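import org.apache.spark.sql.SparkSession

// Raise the broadcast timeout to 1800 s and disable automatic broadcast joins.
val spark = SparkSession
  .builder
  .appName("test")
  .config("spark.sql.broadcastTimeout", "1800")
  .config("spark.sql.autoBroadcastJoinThreshold", "-1")
  .getOrCreate()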
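One way to check that the settings took effect is to read them back and look at the physical plan of a join. The following is a minimal sketch, assuming a local run: the object name, the `local[*]` master, and the two tiny DataFrames are made up for illustration. It also sets the values with `spark.conf.set` on an existing session, as an alternative to passing them to the builder.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastConfigCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("broadcast-config-check")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()

    // Both settings are SQL configs, so they can also be set on a live session.
    spark.conf.set("spark.sql.broadcastTimeout", "1800")
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // Read the value back to confirm it was applied.
    println(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))

    import spark.implicits._
    // Hypothetical sample data, small enough to be broadcast under the default threshold.
    val orders = Seq((1, "a"), (2, "b")).toDF("id", "item")
    val users = Seq((1, "x"), (2, "y")).toDF("id", "name")

    // Print the physical plan of the join to see which join strategy was chosen.
    orders.join(users, "id").explain()

    spark.stop()
  }
}
```

With automatic broadcasting disabled, the printed plan should no longer contain a BroadcastHashJoin for this join and will typically show a SortMergeJoin instead. The exact plan text depends on the Spark version.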
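Copyright notice: original article from this site, published by admin on 2021-12-31, 776 characters in total.
Reposting: unless otherwise noted, articles on this site are released under the CC-4.0 license; please credit the source when reposting.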