This post uses the official WordCount example that ships with Spark to walk through the two ways of submitting a Spark application: spark-submit and spark-shell.
【spark-submit】
(1) Start HDFS.
(2) From the Spark root directory, run:
bin/spark-submit --master local[2] \
--class org.apache.spark.examples.streaming.NetworkWordCount \
--name NetworkWordCount \
/opt/modules/spark-2.1.0-bin-2.7.3/examples/jars/spark-examples_2.11-2.1.0.jar bigdata.ibeifeng.com 9999
【spark-shell】
(1) Start HDFS.
(2) Start the spark-shell:
./spark-shell --master local[2]
(3) Start the Hive metastore:
bin/hive --service metastore &
(4) Enter the following code in the shell:
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(4))
val lines = ssc.socketTextStream("bigdata.ibeifeng.com", 9999)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
(5) Start nc:
nc -lk 9999
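The pipeline above can be sketched without Spark at all: every micro-batch of lines goes through the same flatMap → map → reduceByKey steps. A minimal plain-Scala sketch of that per-batch transformation (the object and method names here are hypothetical, not part of the Spark example):

```scala
// Mirrors the DStream pipeline on an ordinary collection:
// flatMap to words, map to (word, 1), reduceByKey to sum the 1s per word.
object WordCountSketch {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))                            // lines.flatMap(_.split(" "))
      .map(w => (w, 1))                                 // words.map(x => (x, 1))
      .groupBy(_._1)                                    // reduceByKey groups by key...
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // ...then sums the counts

  def main(args: Array[String]): Unit = {
    println(WordCountSketch.count(Seq("hello world hello")))
  }
}
```

In Spark the same steps run distributed over each 4-second batch received from the socket; `print()` then shows the first results of every batch.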
【Alibaba Cloud】
(1) Start HDFS.
(2) Start spark-shell in local mode:
bin/spark-shell --master local[2]
(3) Install nc:
yum install -y nc
(4) Open port 9999:
nc -lk 9999
(5) Enter the following in the shell:
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

val ssc = new StreamingContext(sc, Seconds(10))
val dstream = ssc.socketTextStream("hadoop", 9999)
val resultDStream = dstream.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
resultDStream.print()
ssc.start()            // Start the computation
ssc.awaitTermination() // Wait for the computation to terminate
(6) Type some test input at the nc terminal.
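Note that this reduceByKey runs within each 10-second batch, so counts reset every interval; a running total across batches would need a stateful operation such as updateStateByKey. A hypothetical two-batch simulation in plain Scala (names are mine, not Spark's) illustrates the per-batch behavior:

```scala
// Simulates two consecutive micro-batches: the count of "hello" in the
// second batch is 1, not 2 — nothing carries over between batches.
object PerBatchDemo {
  def countBatch(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(_.split(" ")).groupBy(identity).map { case (w, ws) => (w, ws.size) }

  def main(args: Array[String]): Unit = {
    println(PerBatchDemo.countBatch(Seq("hello spark")))  // batch 1
    println(PerBatchDemo.countBatch(Seq("hello again")))  // batch 2: "hello" counted afresh
  }
}
```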
Reposted from: http://lvygi.baihongyu.com/