Installing Spark 1.2.1 on Linux (CentOS 7) with Hadoop 2.5.2

1. Install Hadoop
See /bahaidong/article/details/41865943

2. Install Scala
See /bahaidong/article/details/44220633

3. Install Spark
Download the latest Spark release, spark-1.2.1-bin-hadoop2.4.tgz, from /dyn/closer.cgi/spark/spark-1.2.1/spark-1.2.1-bin-hadoop2.4.tgz and upload it to /opt on the Linux machine:

[root@master Downloads]# cp spark-1.2.1-bin-hadoop2.4.tgz /opt

Extract it:

[root@master Downloads]# cd /opt
[root@master opt]# tar -zxf spark-1.2.1-bin-hadoop2.4.tgz

Change the ownership (to the same user that runs Hadoop):

[root@master opt]# chown -R hadoop:hadoop spark-1.2.1-bin-hadoop2.4/

Check the permissions:

[root@master opt]# ls -ll
drwxrwxr-x. 12 hadoop hadoop       4096 Apr 10 16:11 spark-1.2.1-bin-hadoop2.4
-rw-r--r--.  1 root   root    219309755 Apr 10 15:34 spark-1.2.1-bin-hadoop2.4.tgz

Add the environment variables:

[root@master spark-1.2.1-bin-hadoop2.4]# vim /etc/profile
export SPARK_HOME=/opt/spark-1.2.1-bin-hadoop2.4
export PATH=$PATH:$SPARK_HOME/bin
:wq!    # save and exit

Make them take effect immediately:

[root@master spark-1.2.1-bin-hadoop2.4]# . /etc/profile    (or: source /etc/profile)

Switch to the hadoop user:

[root@master spark-1.2.1-bin-hadoop2.4]# su hadoop

Enter the conf directory:

[hadoop@master spark-1.2.1-bin-hadoop2.4]$ cd conf

Copy spark-env.sh.template to spark-env.sh:

[hadoop@master conf]$ cp spark-env.sh.template spark-env.sh

Edit it:

[hadoop@master conf]$ vim spark-env.sh

Add the following lines:

export JAVA_HOME=/usr/java/jdk1.7.0_71
export SCALA_HOME=/usr/scala/scala-2.11.6
export SPARK_MASTER_IP=99                            # IP of the cluster master
export SPARK_WORKER_MEMORY=2g                        # maximum memory a worker gives to executors; all three machines have 2 GB
export HADOOP_CONF_DIR=/opt/hadoop-2.5.2/etc/hadoop  # directory holding the Hadoop cluster's configuration files

Edit slaves:

[hadoop@master conf]$ cp slaves.template slaves
[hadoop@master conf]$ vim slaves

Change its contents to the hostnames of the machines in the cluster:

master.hadoop
slave1.hadoop
slave2.hadoop

4. Install on the other two machines
Installing on slave1.hadoop and slave2.hadoop follows the same procedure; simply copying the files over is enough:

[hadoop@master opt]$ scp -r spark-1.2.1-bin-hadoop2.4 root@slave1:/opt/
[hadoop@master opt]$ scp -r spark-1.2.1-bin-hadoop2.4 root@slave2:/opt/

Log in to slave1.hadoop and change the ownership there:

[root@master opt]# ssh slave1.hadoop
[root@slave1 ~]# cd /opt
[root@slave1 opt]# chown -R hadoop:hadoop spark-1.2.1-bin-hadoop2.4/

Add the environment variables on slave1.hadoop:

[root@slave1 opt]# vim /etc/profile
export SPARK_HOME=/opt/spark-1.2.1-bin-hadoop2.4
export PATH=$PATH:$SPARK_HOME/bin
[root@slave1 opt]# . /etc/profile    (or: source /etc/profile)

Open a new terminal, log in to slave2.hadoop, and change the ownership there:

[root@master opt]# ssh slave2.hadoop
[root@slave2 ~]# cd /opt
[root@slave2 opt]# chown -R hadoop:hadoop spark-1.2.1-bin-hadoop2.4/

Add the environment variables on slave2.hadoop:

[root@slave2 opt]# vim /etc/profile
export SPARK_HOME=/opt/spark-1.2.1-bin-hadoop2.4
export PATH=$PATH:$SPARK_HOME/bin
[root@slave2 opt]# . /etc/profile    (or: source /etc/profile)
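With spark-env.sh and slaves in place on every node, the standalone master will (once the cluster is started in the next step) accept applications at its master URL. The following is a minimal sketch, not part of the original document, of how a standalone Scala application could connect to this cluster; the object name ClusterSmokeTest is made up here, and it assumes the default standalone master port 7077 and that master.hadoop resolves from the submitting machine.

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch: connect a standalone application to the cluster
    // configured above (assumes default master port 7077).
    object ClusterSmokeTest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("ClusterSmokeTest")
          .setMaster("spark://master.hadoop:7077") // standalone master URL
        val sc = new SparkContext(conf)
        // Trivial distributed job: sum the integers 1..1000 on the workers.
        println(sc.parallelize(1 to 1000).reduce(_ + _)) // prints 500500
        sc.stop()
      }
    }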
5. Start the cluster
First start Hadoop:

[hadoop@master hadoop-2.5.2]$ ./sbin/start-dfs.sh
[hadoop@master hadoop-2.5.2]$ ./sbin/start-yarn.sh
[hadoop@master hadoop-2.5.2]$ jps
25229 NameNode
25436 SecondaryNameNode
25862 Jps
25605 ResourceManager
[hadoop@master hadoop-2.5.2]$

This output means Hadoop started successfully. Then start Spark:

[hadoop@master spark-1.2.1-bin-hadoop2.4]$ ./sbin/start-all.sh
[hadoop@master spark-1.2.1-bin-hadoop2.4]$ jps
26070 Master
25229 NameNode
26219 Worker
25436 SecondaryNameNode
25605 ResourceManager
26314 Jps
[hadoop@master spark-1.2.1-bin-hadoop2.4]$

The extra Master and Worker processes mean Spark started successfully. The cluster's web page can be viewed at http://master.hadoop:8080/.

Enter spark-shell from the bin directory:

[hadoop@master spark-1.2.1-bin-hadoop2.4]$ cd bin
[hadoop@master bin]$ spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/12 14:53:48 INFO spark.SecurityManager: Changing view acls to: hadoop
... (further INFO lines from the startup log omitted)
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
... (SecurityManager, Remoting, BlockManager and SparkUI startup log omitted)
15/03/12 14:54:53 INFO repl.SparkILoop: Created spark context.
Spark context available as sc.

scala>
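At this point sc has already been created by spark-shell itself, so a quick sanity check confirms that jobs run. This snippet is an illustrative addition, not part of the original walkthrough; the result is deterministic (50 even numbers between 1 and 100):

    scala> sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
    res0: Long = 50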
The SparkUI can be opened in a browser at http://master.hadoop:4040.

6. Test
Copy the README.md file to HDFS (into the root directory):

[hadoop@master spark-1.2.1-bin-hadoop2.4]$ hadoop dfs -copyFromLocal README.md /

Check the file:

[hadoop@master hadoop-2.5.2]$ hadoop fs -ls -R README.md
-rw-r--r--   2 hadoop supergroup       3629 2015-03-12 15:11 README.md

Read the file from spark-shell:

scala> val file = sc.textFile("hdfs://master.hadoop:9000/README.md")

Count how many lines contain "Spark":

scala> val sparks = file.filter(line => line.contains("Spark"))
scala> sparks.count
spark-shell then runs the job: the log shows the DAGScheduler submitting one stage with two tasks, the two HDFS input splits of README.md being read, and a few deprecation warnings for old mapred.* configuration keys; when the job completes, the resulting count is printed.
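The same RDD also supports the classic word count. The lines below are an illustrative extension of this test, not part of the original document; their output depends on the file contents. flatMap splits each line into words, map pairs each word with 1, and reduceByKey sums the counts per word:

    scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    scala> counts.take(5)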

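One final optional note, again an addition to the original text: if the same RDD is queried repeatedly in the shell, caching it avoids re-reading HDFS on every action.

    scala> file.cache()    // mark the RDD to be kept in executor memory
    scala> sparks.count    // the first action reads HDFS and fills the cache
    scala> sparks.count    // later actions are served from memory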