Hadoop Standalone Deployment, Cluster Deployment, and Remote Configuration and Management from a Local Eclipse on Win7.doc

Uploaded by: 仙*** · IP location: Henan · Uploaded: 2020-01-11 · Format: DOC · Pages: 39 · Size: 557 KB


Document summary

Preparation

Hadoop download for Windows: /apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-bin.tar.gz
Hadoop plugin for Eclipse: hadoop-eclipse-plugin-1.2.1.jar
Hadoop download for Linux: /apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz

On the Linux server, create a user named hadoop:

    [root@localhost ~]# useradd hadoop

Set its password:

    [root@localhost ~]# passwd hadoop

Give the hadoop user permission to run vim, vi, and other commands through sudo by adding a line for it below the root entry:

    [root@localhost ~]# vim /etc/sudoers
    root    ALL=(ALL)    ALL
    hadoop  ALL=(ALL)    ALL

/etc/sudoers is read-only, so saving may require :wq! to force the write.

Unless stated otherwise, everything below is done as the hadoop user.

1. Hadoop standalone deployment

1. Download hadoop-1.2.1.tar.gz.
2. Run tar zxvf hadoop-1.2.1.tar.gz to unpack Hadoop into a directory of your choice (mine is under /usr/local/).
3. Edit conf/hadoop-env.sh in the hadoop-1.2.1 directory and set JAVA_HOME to your JDK directory (mine is JAVA_HOME=/usr/local/jdk1.7.0_60).
4. The standalone deployment is now essentially complete.
5. Using standalone mode

By default, Hadoop is configured to run in non-distributed mode as a single Java process, which is very helpful for debugging. The following example copies the XML files from the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. The output is written to the given output directory.

    [hadoop@localhost hadoop-1.2.1]$ mkdir input
    [hadoop@localhost hadoop-1.2.1]$ cp conf/*.xml input
    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'
    [hadoop@localhost hadoop-1.2.1]$ cat output/*

Note: do not worry if the syntax is unclear; it is explained further below.

Result:

    1       dfsadmin

2. Hadoop pseudo-distributed deployment

1. Download hadoop-1.2.1.tar.gz.
2. Run tar zxvf hadoop-1.2.1.tar.gz to unpack Hadoop into a directory of your choice (mine is under /usr/local/).
3. Edit conf/hadoop-env.sh in the hadoop-1.2.1 directory and set JAVA_HOME to your JDK directory (mine is JAVA_HOME=/usr/local/jdk1.7.0_60).
4. Edit the configuration files.

Note: earlier versions used a single hadoop-site.xml. As of version 0.20 it was split into three files: core-site.xml, hdfs-site.xml, and mapred-site.xml. The underlying reason is that the Hadoop code base kept growing and was broken into three major branches that are developed independently, so the configuration files were separated as well.

    [hadoop@localhost hadoop-1.2.1]$ vim conf/core-site.xml

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    [hadoop@localhost hadoop-1.2.1]$ vim conf/hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

    [hadoop@localhost hadoop-1.2.1]$ vim conf/mapred-site.xml

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>

5. Set up passwordless ssh

Check whether you can ssh to localhost without entering a passphrase:

    [hadoop@localhost hadoop-1.2.1]$ ssh localhost

If you cannot, run the following commands:

    [hadoop@localhost hadoop-1.2.1]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    [hadoop@localhost hadoop-1.2.1]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

6. Run

First, use the hadoop command to format the Hadoop File System (HDFS) by asking the namenode to format the DFS file system. This is done once as part of the installation, but it is useful to know in case you ever need to generate a clean file system.

    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop namenode -format

Next, start the Hadoop daemons:

    [hadoop@localhost hadoop-1.2.1]$ bin/start-all.sh

Notes:
1) The daemon logs are written to the $HADOOP_LOG_DIR directory (by default $HADOOP_HOME/logs).
2) If ssh does not listen on the default port 22, Hadoop can still be started, because the port is configurable. Set it in conf/hadoop-env.sh, for example:

    export HADOOP_SSH_OPTS="-p 1234"

Browse the web interfaces of the NameNode and the JobTracker. By default they are at:

    NameNode   - http://localhost:50070/
    JobTracker - http://localhost:50030/

Copy the input files into the distributed file system:

    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop fs -put conf input

Run the example program shipped with the distribution:

    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'

To view the output, copy the output files from the distributed file system to the local file system and examine them:

    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop fs -get output output
    [hadoop@localhost hadoop-1.2.1]$ cat output/*

Or view them directly on the distributed file system:

    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop fs -cat output/*

When you are done, stop the daemons:

    [hadoop@localhost hadoop-1.2.1]$ bin/stop-all.sh

In this pseudo-distributed configuration Hadoop starts five daemons: namenode, secondarynamenode, datanode, jobtracker, and tasktracker. As each daemon starts, it prints a few messages (indicating where its log is stored). Every daemon runs in the background. Figure 1 shows the architecture of the pseudo-distributed configuration after startup.

Figure 1. Pseudo-distributed Hadoop configuration

3. Hadoop cluster setup

Three servers were used to test a Hadoop cluster deployment: …29 (referred to below as 129), …30 (referred to below as 130), and …31 (referred to below as 131). The planned layout is:
1) 129 acts as NameNode, SecondaryNameNode, and JobTracker;
2) 130 and 131 act as DataNode and TaskTracker.

1. Create the hadoop user

On each of the three Linux servers, create a user named hadoop:

    [root@localhost ~]# useradd hadoop

Set its password:

    [root@localhost ~]# passwd hadoop

Give the hadoop user permission to run vim, vi, and other commands through sudo:

    [root@localhost ~]# vim /etc/sudoers
    root    ALL=(ALL)    ALL
    hadoop  ALL=(ALL)    ALL

Saving may require :wq! to force the write.

2. Configure passwordless login

Log in to the name node (129) as the hadoop user and run:

    [hadoop@localhost ~]$ ssh-keygen -t rsa

Press Enter at every prompt. When it finishes, the file .ssh/id_rsa.pub has been generated. Then run the following commands:

    [hadoop@localhost ~]$ cd .ssh/
    [hadoop@localhost .ssh]$ cp id_rsa.pub authorized_keys
    [hadoop@localhost .ssh]$ ssh localhost
    Last login: Mon Nov 24 17:09:56 2014 from localhost
    [hadoop@localhost ~]$

If you are logged in without being asked for a password, the requirement is met. Otherwise check that the permissions of authorized_keys are 644 (-rw-r--r--). Then run:

    [hadoop@localhost ~]$ ssh-copy-id -i hadoop@…30
    [hadoop@localhost ~]$ ssh …30

If you can log in to …30 without a password, its ssh configuration is finished. Do the same for the other data node:

    [hadoop@localhost ~]$ ssh-copy-id -i hadoop@…31
    [hadoop@localhost ~]$ ssh …31

If you can log in to …31 without a password, its ssh configuration is finished.

(Passwordless login can also be set up as follows: log in to the data node servers (130, 131) as the hadoop user, create the .ssh directory, and give it 700 permissions (chmod 700 .ssh; a directory needs the execute bit to be entered, so 600 is not enough). Then copy authorized_keys from the name node 129 into the .ssh directory on the data nodes (130, 131), keeping the permissions and directory layout the same as on the name node. Finally ssh from the name node 129 to the data nodes (130, 131); if you can log in without a password, the ssh configuration is finished.)

3. Hadoop installation and cluster deployment

1) Download hadoop-1.2.1.tar.gz.
2) Run tar zxvf hadoop-1.2.1.tar.gz to unpack Hadoop into a directory of your choice (mine is under /usr/local/).
3) Edit conf/hadoop-env.sh in the hadoop-1.2.1 directory and set JAVA_HOME to your JDK directory (mine is JAVA_HOME=/usr/local/jdk1.7.0_60).
4) Edit the masters and slaves files

Edit /usr/local/hadoop-1.2.1/conf/slaves and /usr/local/hadoop-1.2.1/conf/masters: add the host names of the data nodes to slaves and the host name of the name node to masters. Several hosts can be listed, one per line. Every host name must be mapped in /etc/hosts on each server.

On 129 run:

    [hadoop@localhost hadoop-1.2.1]$ vi conf/slaves
    …30
    …31

On 130 and 131 run:

    [hadoop@localhost hadoop-1.2.1]$ vi conf/masters
    …29

5) Master (129) configuration

129 is the master node. As of version 0.20, the former hadoop-site.xml is split into three configuration files: core-site.xml, hdfs-site.xml, and mapred-site.xml. The three files on 129 look like this:

    [hadoop@localhost hadoop-1.2.1]$ cat conf/core-site.xml

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://…29:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoopData/root/tmp/hadoop</value>
      </property>
    </configuration>

    [hadoop@localhost hadoop-1.2.1]$ cat conf/hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>

    [hadoop@localhost hadoop-1.2.1]$ cat conf/mapred-site.xml

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>…29:9001</value>
      </property>
    </configuration>

6) Slave (130, 131) configuration

On the slaves (130, 131), hdfs-site.xml does not need to be configured; conf/core-site.xml and conf/mapred-site.xml are the same as on 129.

The Hadoop cluster deployment is now complete.

4. Initializing and starting the Hadoop cluster

4.1 Initialize the file system

Initialize the namenode to prepare HDFS for its first run:

    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop namenode -format

Note: the word "format" makes some people think of formatting a disk and worry that their own file system will be wiped. There is no need to worry about the local file system: this format applies to HDFS, and namenode -format only initializes some directories and files. (On a cluster that already holds data it does discard the existing HDFS metadata, so run it only for the first setup.)

4.2 Start Hadoop

Configure the user environment variables on the master node, then start the cluster on the master node …29 by running start-all.sh from the bin directory:

    [hadoop@localhost hadoop-1.2.1]$ bin/start-all.sh

To stop Hadoop:

    [hadoop@localhost hadoop-1.2.1]$ bin/stop-all.sh

4.3 Test

Create a test1 directory on HDFS and upload a file into it:

    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop fs -mkdir test1
    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop fs -put ./README.txt test1
    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop fs -ls
    Found 1 items
    drwxr-xr-x   - hadoop supergroup          0 2011-07-21 19:58 /user/hadoop/test1

Run the wordcount map-reduce example program:

    [hadoop@localhost hadoop-1.2.1]$ hadoop jar hadoop-examples-1.2.1.jar wordcount /user/hadoop/test1/README.txt output1

View the result file, which is stored on HDFS:

    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop fs -ls output1
    [hadoop@localhost hadoop-1.2.1]$ bin/hadoop fs -cat output1/part-r-00000

4.4 Management interfaces and commands

    http://…29:50070/dfshealth.jsp
    http://…29:50030/jobtracker.jsp
    http://…30:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/

5. Remote management and configuration of Hadoop from a local Eclipse on Win7

5.1 Copy the hadoop-eclipse-plugin-1.2.1.jar plugin into the plugins directory of the local Eclipse.

5.2 Unpack hadoop-1.2.1-bin.tar.gz into a local directory (mine is D:\mywork\Tools\hadoop-1.2.1).

5.3 Restart Eclipse and open the Map/Reduce perspective from the Open Perspective menu. Select the elephant icon and edit the Hadoop configuration: the value of hadoop.tmp.dir must be the value of hadoop.tmp.dir in conf/core-site.xml on the Hadoop server.

Open the Resource perspective through Window > Open Perspective > Resource, and the DFS becomes visible. From here the HDFS distributed file system can be managed normally: uploading, deleting, and so on.

To prepare for the test below, first create the directory user/root/input2 and upload two txt files into it:

    intput1.txt contains: Hello Hadoop Goodbye Hadoop
    intput2.txt contains: Hello World Bye World

With HDFS prepared, the test can begin.

Create a simple MapReduce project

Use the wizard to create a new Map/Reduce project. During this step, click to configure the Hadoop installation path.

Create a test class WordCountTest:

    package com.hadoop.learn.test;

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;
    import org.apache.log4j.Logger;

    /**
     * Runs the test program
     *
     * @author yongboy
     * @date 2012-04-16
     */
    public class WordCountTest {
        private static final Logger log = Logger.getLogger(WordCountTest.class);

        // Mapper: emits (word, 1) for every whitespace-separated token
        public static class TokenizerMapper extends
                Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                log.info("Map key : " + key);
                log.info("Map value : " + value);
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    String wordStr = itr.nextToken();
                    word.set(wordStr);
                    log.info("Map word : " + wordStr);
                    context.write(word, one);
                }
            }
        }

        // Reducer (also used as combiner): sums the counts of each word
        public static class IntSumReducer extends
                Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                log.info("Reduce key : " + key);
                log.info("Reduce value : " + values);
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                log.info("Reduce sum : " + sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args)
                    .getRemainingArgs();
            if (otherArgs.length != 2) {
                System.err.println("Usage: WordCountTest <in> <out>");
                System.exit(2);
            }

            Job job = new Job(conf, "word count");
            job.setJarByClass(WordCountTest.class);

            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Right-click the class and choose "Run Configurations". In the window that opens, click the "Arguments" tab and enter the following under "Program arguments":

    hdfs://…29:9000/user/hadoop/test1 hdfs://…29:9000/user/hadoop/output3

Under "VM arguments" enter:

    -DHADOOP_USER_NAME=hadoop

(If the VM argument is not set, the job may fail with a "Permission denied" error. If your Hadoop account is hadooptest or xxx, use -DHADOOP_USER_NAME=hadooptest or -DHADOOP_USER_NAME=xxx instead. Another solution found online is to disable permission checking in the hdfs-site.xml configuration file:

    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>
)

Note: these arguments are meant for local debugging, not for a real environment.

Click "Apply", then "Close". Now right-click and choose "Run on Hadoop" to run the job. At this point an exception similar to the following appears:

    12/04/24 15:32:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    12/04/24 15:32:44 ERROR security.UserGroupInformation: PriviledgedActionException as:Administrator cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator-519341271\.staging to 0700
    Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator-519341271\.staging to 0700
        at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
        at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
        at com.hadoop.learn.test.WordCountTest.main(WordCountTest.java:85)

This is a file-permission problem specific to Windows; on Linux the job runs normally and the problem does not occur. The fix is to edit checkReturnValue in /hadoop-1.0.2/src/core/org/apache/hadoop/fs/FileUtil.java and comment out its body (somewhat crude, but on Windows the check can be skipped):

    /**
     * Licensed to the Apache Software Foundation (ASF) under one
     * or more contributor license agreements.  See the NOTICE file
     * distributed with this work for additional information
     * regarding copyright ownership.  The ASF licenses this file
     * to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except in compliance
     * with the License.  You may obtain a copy of the License at
     *
     *     http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    package org.apache.hadoop.fs;

    import java.io.*;
    import java.util.Enumeration;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.permission.FsAction;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.nativeio.NativeIO;
    import org.apache.hadoop.util.StringUtils;
    import org.apache.hadoop.util.Shell;
    import org.apache.hadoop.util.Shell.ShellCommandExecutor;

    /**
     * A collection of file-processing util methods
     */
    public class FileUtil {
      private static final Log LOG = LogFactory.getLog(FileUtil.class);

      /**
       * convert an array of FileStatus to an array of Path
       *
       * @param stats
       *          an array of FileStatus objects
       * @return an array of paths corresponding to the input
       */
      public static Path[] stat2Paths(FileStatus[] stats) {
        if (stats == null)
          return null;
        Path[] ret = new Path[stats.length];
        for (int i = 0; i < stats.length; ++i) {
          ret[i] = stats[i].getPath();
        }
        return ret;
      }

      /**
       * convert an array of FileStatus to an array of Path.
       * If stats if null, return path
       * @param stats
       *          an array of FileStatus objects
       * @param path
       *          default path to return in stats is null
       * @return an array of paths corresponding to the input
       */
      public static Path[] stat2Paths(FileStatus[] stats, Path path) {
        if (stats == null)
          return new Path[]{path};
        else
          return stat2Paths(stats);
      }

      /**
       * Delete a directory and all its contents.  If
       * we return false, the directory may be partially-deleted.
       */
      public static boolean fullyDelete(File dir) throws IOException {
        if (!fullyDeleteContents(dir)) {
          return false;
        }
        return dir.delete();
      }

      /**
       * Delete the contents of a directory, not the directory itself.  If
       * we return false, the directory may be partially-deleted.
       */
      public static boolean fullyDeleteContents(File dir) throws IOException {
        boolean deletionSucceeded =
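Before submitting WordCountTest to the cluster, it can help to know what counts to expect for the two sample files from section 5.3. The following is a minimal local sketch, not part of the original project: the class name LocalWordCount and its count method are hypothetical, and it uses no Hadoop classes. It only imitates the job's logic, tokenizing each line with StringTokenizer as the mapper does and summing one per token as the reducer does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical, Hadoop-free imitation of the map and reduce steps of WordCountTest. */
public class LocalWordCount {

    /**
     * Splits every line on whitespace (the "map" step) and adds 1 per token
     * to that token's running total (the "reduce" step).
     */
    public static Map<String, Integer> count(List<String> lines) {
        // TreeMap keeps the words sorted by key
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                totals.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Hello Hadoop Goodbye Hadoop",  // content of intput1.txt
                "Hello World Bye World");       // content of intput2.txt
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the two sample lines this prints Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2, one tab-separated pair per line. If the real job is pointed at a directory containing only those two files, its part-r-00000 output should contain the same five counts.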
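The core-site.xml, hdfs-site.xml, and mapred-site.xml files edited in sections 2 and 3 all share one layout: a configuration root element holding property elements, each with a name and a value. As a rough illustration of that layout, the sketch below reads such a file's text with the JDK's built-in DOM parser and lists the name/value pairs. It is an assumption-laden helper for checking a hand-edited file, not how Hadoop itself loads its configuration; the class name SiteXmlReader and the method readProperties are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper that lists the properties of a Hadoop *-site.xml document. */
public class SiteXmlReader {

    /** Returns name -> value for every property element, in file order. */
    public static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML or a property missing its name/value ends up here
            throw new IllegalArgumentException("not a valid site configuration", e);
        }
    }

    public static void main(String[] args) {
        // Same content as the pseudo-distributed core-site.xml, plus hadoop.tmp.dir
        String coreSite =
                "<configuration>"
              + "<property><name>fs.default.name</name>"
              + "<value>hdfs://localhost:9000</value></property>"
              + "<property><name>hadoop.tmp.dir</name>"
              + "<value>/data/hadoopData/root/tmp/hadoop</value></property>"
              + "</configuration>";
        readProperties(coreSite).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Pasting the contents of a real conf file into the string is a quick way to confirm it is well-formed XML and that, for example, the hadoop.tmp.dir value entered in the Eclipse plugin matches the server's.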
