Table of Contents
  1. Spark 2.x Documentation Notes: spark-core
    1. 1. Quick Start
    2. 2. RDD Programming Guide

[TOC]

Spark 2.x Documentation Notes: spark-core

1. Quick Start

  • Using Caching: `cache()` (or `persist()`) keeps a Dataset in memory after its first computation, so repeated actions over the same data avoid recomputation.
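Conceptually, `cache()` is memoization applied to a distributed computation: compute once, reuse afterwards. A minimal plain-Scala sketch of that idea (the `CachedValue` class is hypothetical, for illustration only, not a Spark API):

```scala
// Hypothetical sketch: memoize an expensive computation once,
// analogous in spirit to Dataset.cache() keeping results in memory.
class CachedValue[T](compute: () => T) {
  private var cached: Option[T] = None
  var computeCount = 0 // tracks how many times compute() actually ran

  def get: T = cached match {
    case Some(v) => v // cache hit: reuse the stored value
    case None => // cache miss: compute, store, and return
      computeCount += 1
      val v = compute()
      cached = Some(v)
      v
  }
}

val cv = new CachedValue(() => (1 to 100).sum)
val a = cv.get // triggers the computation
val b = cv.get // served from the cache; compute() is not run again
```

In Spark the trade-off is the same as here: caching spends memory to avoid repeating work.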

  • spark-submit

You can submit jar, py, and R files directly.
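For example, submitting a packaged jar might look like the following (the class name, master URL, and jar path are placeholders, not values from this post):

```shell
# Placeholder values: adjust the class, master, and jar path to your job.
spark-submit \
  --class com.example.WordCount \
  --master local[*] \
  path/to/wordcount.jar
```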

2. RDD Programming Guide

  • Overview

  • Spark 2.x WordCount HelloWorld
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("WordCount")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Read the file, split lines into words, group by lowercase word, and count
val counts = spark.read.textFile("hdfs://127.0.0.1:9000/mimosa/input/hello.txt")
  .flatMap(_.split(" "))
  .groupByKey(_.toLowerCase)
  .count()
counts.show()

spark.stop()
```
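The `groupByKey(_.toLowerCase).count()` pipeline is just a distributed group-and-count; the same logic on a plain Scala collection (no Spark required) looks like this:

```scala
// Plain-Scala word count: the same grouping logic as the Spark job,
// but on a local collection instead of a distributed Dataset.
val text = "Hello world hello Spark"
val counts = text
  .split(" ")
  .groupBy(_.toLowerCase)                  // Map[String, Array[String]]
  .map { case (w, ws) => (w, ws.length) }  // Map[String, Int]
```

The difference is that Spark shuffles the grouped data across executors, while `groupBy` here runs in a single JVM.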
  • Parallelized Collections

  • External Datasets

  • RDD Operations

  • Understanding closures
```scala
var counter = 0
val rdd = sc.parallelize(data)

// Wrong: don't do this! On a cluster, each executor mutates its own
// serialized copy of `counter`; the driver's counter stays 0.
rdd.foreach(x => counter += x)

println("Counter value: " + counter)
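```

This pitfall is deceptive because in a single JVM a closure captures `counter` by reference, so the mutation really is visible afterwards; on a Spark cluster each task instead works on a serialized copy of the closure, and the driver's variable never changes (use an Accumulator for that). The single-JVM behavior in plain Scala:

```scala
// In one JVM, a closure captures the local variable by reference,
// so mutating it inside foreach is visible afterwards ...
var counter = 0
List(1, 2, 3).foreach(x => counter += x)
// ... which is exactly why the Spark anti-pattern above appears to
// work in local mode but silently leaves counter at 0 on a cluster.
```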

Reference for common operators: http://patamon.me/icemimosa/spark/Spark2.x%E5%B8%B8%E7%94%A8%E7%AE%97%E5%AD%90/

  • RDD Persistence

  • Broadcast Variables and Accumulators