Use mapPartitionsWithIndex, as shown below:
// Create (1, 1), (2, 2), ..., (100, 100) dataset
// and partition by key so we know what to expect
val rdd = sc.parallelize((1 to 100) map (i => (i, i)), 16)
  .partitionBy(new org.apache.spark.HashPartitioner(8))

val zeroth = rdd
  // If partition number is not zero ignore data
  .mapPartitionsWithIndex((idx, iter) => if (idx == 0) iter else Iterator())

// Check if we get expected results: 8, 16, ..., 96
assert(zeroth.keys.map(_ % 8 == 0).reduce(_ && _) && zeroth.count == 12)
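If you only need to pull the contents of a single partition onto the driver, SparkContext.runJob can evaluate just that partition instead of scanning the whole RDD. A minimal sketch, assuming the same rdd as above and Spark 2.x or later (where the three-argument runJob overload is available); the partition index 0 is just an example, pick whichever partition you need:

// Run the job only on partition 0 and bring its elements to the driver.
// runJob returns one Array per requested partition index.
val parts: Array[Array[(Int, Int)]] =
  sc.runJob(rdd, (iter: Iterator[(Int, Int)]) => iter.toArray, Seq(0))
val zerothData: Array[(Int, Int)] = parts.head

Note the trade-off: mapPartitionsWithIndex keeps the result as a distributed RDD, while runJob materializes the partition's data on the driver, so prefer the former when the partition is large.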