Hadoop 2.6.0 Distributed Deployment Reference Manual

Contents
1. Environment
   1.1 Installation environment
   1.2 Hadoop cluster environment
2. Base environment installation and configuration
   2.1 Add the hadoop user
   2.2 Install JDK 1.7
   2.3 Configure passwordless SSH login
   2.4 Edit the hosts mapping file
3. Hadoop installation and configuration
   3.1 Common installation and configuration
   3.2 Per-node configuration
4. Format and start the cluster
   4.1 Format the cluster HDFS file system
   4.2 Start the Hadoop cluster
Appendix 1  Key configuration reference
   1 core-site.xml
   2 hdfs-site.xml
   3 mapred-site.xml
   4 yarn-site.xml
   5 hadoop-env.sh
   6 slaves
Appendix 2  Detailed configuration reference
   1 core-site.xml
   2 hdfs-site.xml
   3 mapred-site.xml
   4 yarn-site.xml
   5 hadoop-env.sh
   6 slaves
Appendix 3  Detailed configuration parameter reference
   conf/core-site.xml
   conf/hdfs-site.xml
      o Configurations for NameNode
      o Configurations for DataNode
   conf/yarn-site.xml
      o Configurations for ResourceManager and NodeManager
      o Configurations for ResourceManager
      o Configurations for NodeManager
      o Configurations for History Server (Needs to be moved elsewhere)
   conf/mapred-site.xml
      o Configurations for MapReduce Applications
      o Configurations for MapReduce JobHistory Server

1. Environment

1.1 Installation environment
In this example the operating system is CentOS 7.0, the JDK is Oracle HotSpot 1.7, the Hadoop release is Apache Hadoop 2.6.0, and the operating user is hadoop.

1.2 Hadoop cluster environment
The cluster nodes are as follows:

Hostname            IP address    Role(s)
ResourceManager                   ResourceManager & MR JobHistory Server
NameNode                          NameNode
SecondaryNameNode                 SecondaryNameNode
DataNode01                        DataNode & NodeManager
DataNode02                        DataNode & NodeManager
DataNode03                        DataNode & NodeManager
DataNode04                        DataNode & NodeManager
DataNode05                        DataNode & NodeManager

Note: in the table above, "&" joins multiple roles on one host; for example the host "ResourceManager" carries two roles, ResourceManager and MR JobHistory Server.

2. Base environment installation and configuration

2.1 Add the hadoop user

   useradd hadoop

The user "hadoop" is the user that installs and operates the Hadoop cluster.

2.2 Install JDK 1.7
CentOS 7 ships with OpenJDK 1.7; in this example it is replaced with Oracle HotSpot 1.7, installed by unpacking the binary package into /opt/.

1) List the currently installed JDK rpm packages:
   rpm -qa | grep jdk
   java-1.7.0-openjdk-1-.el7.x86_64
   java-1.7.0-openjdk-headless-1-.el7.x86_64

2) Remove the bundled JDK:
   rpm -e --nodeps java-1.7.0-openjdk-1-.el7.x86_64
   rpm -e --nodeps java-1.7.0-openjdk-headless-1-.el7.x86_64

3) Install the target JDK: change into the directory holding the installation package and unpack it.

4) Configure environment variables: edit ~/.bashrc or /etc/profile and add the following:
   #JAVA
   export JAVA_HOME=/opt/jdk1.7
   export PATH=$PATH:$JAVA_HOME/bin
   export CLASSPATH=$JAVA_HOME/lib
   export CLASSPATH=$CLASSPATH:$JAVA_HOME/jre/lib

2.3 Configure passwordless SSH login
1) Passwordless SSH login must be set up among the 8 hosts listed in the table above.
2) Go to the hadoop user's home directory and generate a key pair with ssh-keygen -t rsa.
3) Create the public-key authentication file authorized_keys and append the content of the id_rsa.pub file (in the ~/.ssh directory) to it:
   more id_rsa.pub >> authorized_keys
4) Set the permissions of the ~/.ssh directory and the authorized_keys file:
   chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys
5) Repeat the steps above on every node, and copy each node's ~/.ssh/id_rsa.pub public key to all of the other hosts.

The steps above can also be done with a single command line:
   rm -rf ~/.ssh; ssh-keygen -t rsa; chmod 700 ~/.ssh; more ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys; chmod 600 ~/.ssh/authorized_keys

Note: on CentOS 6, DSA keys (ssh-keygen -t dsa) can also be used for passwordless login; on CentOS 7 only RSA works, otherwise passwordless SSH succeeds only to the local host and not to the other machines.

2.4 Edit the hosts mapping file
Edit /etc/hosts on every node and add entries (IP address followed by hostname) for the following hosts:
   ResourceManager
   NameNode
   SecondaryNameNode
   DataNode01
   DataNode02
   DataNode03
   DataNode04
   DataNode05
   NodeManager01
   NodeManager02
   NodeManager03
   NodeManager04
   NodeManager05

3. Hadoop installation and configuration

3.1 Common installation and configuration
The following operations are the common part, i.e. they are identical on every node. Repeat them on each node:

1) Copy the Hadoop package (hadoop-2.6.0.tar) to /opt and unpack it:
   tar -xvf hadoop-2.6.0.tar
   The resulting hadoop-2.6.0 directory (/opt/hadoop-2.6.0) is the Hadoop installation root.

2) Change the owner of the installation directory hadoop-2.6.0 to the hadoop user:
   chown -R hadoop.hadoop /opt/hadoop-2.6.0

3) Add environment variables:
   #hadoop
   export HADOOP_HOME=/opt/hadoop-2.6.0
   export PATH=$PATH:$HADOOP_HOME/bin
   export PATH=$PATH:$HADOOP_HOME/sbin

3.2 Per-node configuration
Unpack the configuration files and distribute them to the $HADOOP_HOME/etc/hadoop directory on every node; if prompted whether to overwrite existing files, confirm.
Note: for the configuration parameter settings of each node, see "Appendix 1" or "Appendix 2" below.

4. Format and start the cluster

4.1 Format the cluster HDFS file system
After installation, log in to the NameNode node (or any DataNode node) and run:
   hdfs namenode -format
to format the cluster HDFS file system.

Note: if this is not the first time the HDFS file system is formatted, then before formatting you must empty the NameNode's dfs.namenode.name.dir directory and every DataNode's dfs.datanode.data.dir directory (in this example /home/hadoop/hadoopdata).
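For the re-format case described in the note above, a minimal sketch of the clean-up followed by the format command is shown below. It assumes the data directory layout used in this manual and passwordless SSH from the NameNode to the DataNodes (section 2.3); the loop over the DataNode hosts is only illustrative.

   # On the NameNode: empty the metadata directory (dfs.namenode.name.dir)
   rm -rf /home/hadoop/hadoopdata/hdfs/namenode/*
   # On every DataNode: empty the block data directory (dfs.datanode.data.dir)
   for host in DataNode01 DataNode02 DataNode03 DataNode04 DataNode05; do
       ssh hadoop@$host 'rm -rf /home/hadoop/hadoopdata/hdfs/datanode/*'
   done
   # Then re-format the HDFS file system
   hdfs namenode -format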
4.2 Start the Hadoop cluster
Log in to the following hosts and run the corresponding commands:
1) Log in to ResourceManager and run start-yarn.sh to start the cluster resource management system, YARN.
2) Log in to NameNode and run start-dfs.sh to start the cluster HDFS file system.
3) Run jps on each node (SecondaryNameNode and DataNode01 through DataNode05, as well as ResourceManager and NameNode) and check that the expected Java processes are running:
   - ResourceManager node: ResourceManager
   - NameNode node: NameNode
   - SecondaryNameNode node: SecondaryNameNode
   - each DataNode node: DataNode & NodeManager
If all of the above is normal, the Hadoop cluster has started correctly.
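Beyond checking processes with jps, the overall state of the freshly started cluster can be confirmed from the command line. The check below is a small optional supplement (not part of the original procedure) using standard Hadoop 2.6 client commands, run as the hadoop user on a node that has the cluster configuration:

   hdfs dfsadmin -report   # should list the five DataNodes as live and show the HDFS capacity
   yarn node -list         # should list the five NodeManagers in RUNNING state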
Appendix 1  Key configuration reference

1 core-site.xml

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode:9000</value>
    <description>NameNode URI</description>
  </property>

- The property "fs.defaultFS" is the NameNode address, in the form "hdfs://hostname(or IP):port".

2 hdfs-site.xml

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>SecondaryNameNode:50090</value>
  </property>

- The property "dfs.namenode.name.dir" is the local file system directory where the NameNode stores its namespace and edit-log metadata; the default is "/tmp/hadoop-username/dfs/name".
- The property "dfs.datanode.data.dir" is the local file system directory where a DataNode stores HDFS blocks, in the form "file:/local-directory"; the default is "/tmp/hadoop-username/dfs/data".
- The property "dfs.namenode.secondary.http-address" is the SecondaryNameNode host and port (it can be omitted if no separate SecondaryNameNode role is needed).

3 mapred-site.xml

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework set to Hadoop YARN.</description>
  </property>

- The property "mapreduce.framework.name" is the runtime framework used to execute MapReduce jobs; the default is "local" and it must be changed to "yarn".

4 yarn-site.xml

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ResourceManager</value>
    <description>ResourceManager host</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Shuffle service that needs to be set for Map Reduce applications.</description>
  </property>

- The property "yarn.resourcemanager.hostname" specifies the ResourceManager host address.
- The property "yarn.nodemanager.aux-services" is the shuffle service used by MapReduce applications.

5 hadoop-env.sh
JAVA_HOME is the current Java installation directory:
   export JAVA_HOME=/opt/jdk1.7

6 slaves
The master nodes of the cluster (NameNode and ResourceManager) must list the slave nodes they own.
The slaves file on the NameNode contains:
   DataNode01
   DataNode02
   DataNode03
   DataNode04
   DataNode05
The slaves file on the ResourceManager contains:
   NodeManager01
   NodeManager02
   NodeManager03
   NodeManager04
   NodeManager05

Appendix 2  Detailed configuration reference
Note: the configuration parameters shown in red are the ones that must be set; all other entries keep their default values.

1 core-site.xml

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode:9000</value>
    <description>NameNode URI</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>Size of read/write buffer used in SequenceFiles. The default value is 131072.</description>
  </property>

- The property "fs.defaultFS" is the NameNode address, in the form "hdfs://hostname(or IP):port".

2 hdfs-site.xml

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>SecondaryNameNode:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>

- The property "dfs.namenode.name.dir" is the local file system directory where the NameNode stores its namespace and edit-log metadata; the default is "/tmp/hadoop-username/dfs/name".
- The property "dfs.datanode.data.dir" is the local file system directory where a DataNode stores HDFS blocks, in the form "file:/local-directory"; the default is "/tmp/hadoop-username/dfs/data".
- The property "dfs.namenode.secondary.http-address" is the SecondaryNameNode host and port (it can be omitted if no separate SecondaryNameNode role is needed).

3 mapred-site.xml

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework set to Hadoop YARN.</description>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
    <description>Larger resource limit for maps.</description>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
    <description>Larger heap-size for child jvms of maps.</description>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
    <description>Larger resource limit for reduces.</description>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560M</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>10</value>
    <description>More streams merged at once while sorting files.</description>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>5</value>
    <description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>ResourceManager:10020</value>
    <description>MapReduce JobHistory Server host:port. Default port is 10020.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ResourceManager:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port. Default port is 19888.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
    <description>Directory where history files are written by MapReduce jobs. Default is /mr-history/tmp.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
    <description>Directory where history files are managed by the MR JobHistory Server. Default value is /mr-history/done.</description>
  </property>

- The property "mapreduce.framework.name" is the runtime framework used to execute MapReduce jobs; the default is "local" and it must be changed to "yarn".
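The mapreduce.jobhistory.* properties above place the MR JobHistory Server on the ResourceManager host, matching the role table in section 1.2. Section 4.2 does not list a start command for it; on Hadoop 2.6 the server is normally started with the daemon script shown below (a supplementary note, assuming the environment variables set in section 3.1):

   # on the ResourceManager host, as the hadoop user
   $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver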
4 yarn-site.xml

  <property>
    <name>yarn.acl.enable</name>
    <value>false</value>
    <description>Enable ACLs? Defaults to false. The value is true or false.</description>
  </property>
  <property>
    <name>yarn.admin.acl</name>
    <value>*</value>
    <description>ACL to set admins on the cluster. ACLs are of the form comma-separated-users space comma-separated-groups. Defaults to the special value of *, which means anyone. The special value of just a space means no one has access.</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>false</value>
    <description>Configuration to enable or disable log aggregation.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ResourceManager:8032</value>
    <description>ResourceManager host:port for clients to submit jobs. NOTE: if set, this overrides the hostname set in yarn.resourcemanager.hostname.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ResourceManager:8030</value>
    <description>ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. NOTE: if set, this overrides the hostname set in yarn.resourcemanager.hostname.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ResourceManager:8031</value>
    <description>ResourceManager host:port for NodeManagers. NOTE: if set, this overrides the hostname set in yarn.resourcemanager.hostname.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ResourceManager:8033</value>
    <description>ResourceManager host:port for administrative commands. NOTE: if set, this overrides the hostname set in yarn.resourcemanager.hostname.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ResourceManager:8088</value>
    <description>ResourceManager web-ui host:port. NOTE: if set, this overrides the hostname set in yarn.resourcemanager.hostname.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ResourceManager</value>
    <description>ResourceManager host.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>ResourceManager Scheduler class: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler. The default value is org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
    <description>Minimum limit of memory to allocate to each container request at the ResourceManager, in MB.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
    <description>Maximum limit of memory to allocate to each container request at the ResourceManager, in MB.</description>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
    <description>How long to keep aggregation logs before deleting them. -1 disables. Be careful: setting this too small will spam the name node.</description>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
    <description>Time between checks for aggregated log retention. If set to 0 or a negative value, the value is computed as one-tenth of the aggregated log retention time. Be careful: setting this too small will spam the name node.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
    <description>Available physical memory, in MB, for the given NodeManager; the default value is 8192. This defines the total resources on the NodeManager made available to running containers.</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
    <description>Maximum ratio by which virtual memory usage of tasks may exceed physical memory; the default value is 2.1. The virtual memory usage of each task may exceed its physical memory limit by this ratio, and the total virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.</description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>${hadoop.tmp.dir}/nm-local-dir</value>
    <description>Comma-separated list of paths on the local filesystem where intermediate data is written. The default value is ${hadoop.tmp.dir}/nm-local-dir. Multiple paths help spread disk I/O.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${yarn.log.dir}/userlogs</value>
    <description>Comma-separated list of paths on the local filesystem where logs are written. The default value is ${yarn.log.dir}/userlogs. Multiple paths help spread disk I/O.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
    <description>Default time (in seconds) to retain log files on the NodeManager; only applicable if log aggregation is disabled. The default value is 10800.</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
    <description>HDFS directory where the application logs are moved on application completion. Appropriate permissions need to be set. Only applicable if log aggregation is enabled. The default value is /logs or /tmp/logs.</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
    <description>Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log aggregation is enabled.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Shuffle service that needs to be set for Map Reduce applications.</description>
  </property>

- The property "yarn.resourcemanager.hostname" specifies the ResourceManager host address.
- The property "yarn.nodemanager.aux-services" is the shuffle service used by MapReduce applications.

5 hadoop-env.sh
JAVA_HOME is the current Java installation directory:
   export JAVA_HOME=/opt/jdk1.7

6 slaves
The master nodes of the cluster (NameNode and ResourceManager) must list the slave nodes they own.
The slaves file on the NameNode contains:
   DataNode01
   DataNode02
   DataNode03
   DataNode04
   DataNode05
The slaves file on the ResourceManager contains:
   NodeManager01
   NodeManager02
   NodeManager03
   NodeManager04
   NodeManager05

Appendix 3  Detailed configuration parameter reference
Configuring the Hadoop Daemons in Non-Secure Mode
This section deals with important parameters
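As a quick sanity check of the memory-related values in Appendix 2 (a worked example added for illustration, not part of the original manual):

   containers per NodeManager at the minimum allocation
      = yarn.nodemanager.resource.memory-mb / yarn.scheduler.minimum-allocation-mb
      = 8192 MB / 1024 MB
      = 8

Note also that with mapreduce.reduce.memory.mb = 1024 while mapreduce.reduce.java.opts = -Xmx2560M, the reduce-side JVM heap is allowed to grow well beyond its 1024 MB container; if these values are used verbatim, reduce containers are likely to be killed for exceeding their memory limit, so one of the two values should be adjusted so that -Xmx stays below the container size.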
