I. Environment (pseudo-distributed)
CentOS in a VM, hadoop-1.0.1, jdk1.6 (environment variables must be set for both), MyEclipse 8.6.

Install Java 1.6 on Linux. Make the installer executable, then run it:
chmod +x jdk-6u23-linux-i586.bin
./jdk-6u23-linux-i586.bin

Set the JAVA environment variables:
1. cd /etc/profile.d
2. touch java.sh
3. vi java.sh and add the following:
#set java environment
export JAVA_HOME=/tools/jdk1.6.0_23
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
To save in vi: press Esc for command mode, type a colon, then wq! (w writes, q quits, either can be used alone, ! forces). Shift+ZZ also saves and quits.
4. chmod 777 java.sh (make it executable)
5. source java.sh (make it take effect)
6. javac (check that it worked)

Hadoop environment variable (set this way, it is lost again after a reboot):
[root@localhost ~]# cd /tools/hadoop-1.0.1/bin
[root@localhost bin]# export PATH=$PATH:/tools/hadoop-1.0.1/bin

Configuration. Use the following:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

Set up passphraseless ssh. Check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
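The java.sh pattern above can be reused to make the Hadoop PATH survive reboots, instead of the one-off export shown later. A minimal sketch, assuming the /tools/hadoop-1.0.1 install path used in these notes; it writes to a temp file here so it is safe to run anywhere, whereas on a real box the target would be /etc/profile.d/hadoop.sh:

```shell
#!/bin/sh
# Persist the Hadoop PATH the same way java.sh persists the JDK settings.
# /tools/hadoop-1.0.1 is the install path from these notes -- adjust to yours.
# A temp file stands in for /etc/profile.d/hadoop.sh so the sketch is harmless.
target="$(mktemp)"
cat > "$target" <<'EOF'
# set hadoop environment (real target: /etc/profile.d/hadoop.sh)
export HADOOP_HOME=/tools/hadoop-1.0.1
export PATH=$HADOOP_HOME/bin:$PATH
EOF
chmod 755 "$target"   # readable and executable, like 'chmod 777 java.sh' but tighter
. "$target"           # same effect as 'source java.sh'
echo "$PATH" | grep -q '/tools/hadoop-1.0.1/bin' && echo "hadoop on PATH"
rm -f "$target"
```

Scripts in /etc/profile.d are sourced by login shells, which is why the same trick works for both the JDK and Hadoop.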
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Format a new distributed filesystem:
$ bin/hadoop namenode -format
Start the hadoop daemons:
$ bin/start-all.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Examine the output files. Either copy them from the distributed filesystem to the local filesystem:
$ bin/hadoop fs -get output output
$ cat output/*
or view them directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh

Start hadoop:
[root@localhost hadoop-1.0.1]# bin/start-all.sh
Check what is running:
[root@hadoop21 hadoop-1.0.1]# jps
6365 TaskTracker
5993 NameNode
6241 JobTracker
8202 Jps
6106 DataNode
Stop hadoop (here nothing was actually running):
[root@hadoop21 bin]# ./stop-all.sh
no jobtracker to stop
hadoop21: no tasktracker to stop
no namenode to stop
hadoop21: no datanode to stop

Create a directory:
[root@localhost bin]# hadoop fs -mkdir input
List a directory:
[root@localhost bin]# hadoop fs -ls
Found 2 items
drwxr-xr-x - root supergroup 0 2012-11-21 21:08 /user/root/input (directory)
drwxr-xr-x - root supergroup 0 2012-11-21 20:25 /user/root/output (directory)
A file shows up like this instead:
-rw-r--r-- 1 root supergroup 22 2012-11-21 20:23 /user/root/input (file)
Remove a path:
[root@localhost bin]# hadoop fs -rm input
List a directory tree recursively:
[root@localhost hadoop-1.0.1]# hadoop fs -lsr
drwxr-xr-x - root supergroup 0 2012-11-21 21:25 /user/root/input
-rw-r--r-- 1 root supergroup 22 2012-11-21 21:25 /user/root/input/file01
drwxr-xr-x - root supergroup 0 2012-11-21 21:20 /user/root/output
Delete everything under tmp:
[root@hadoop21 tmp]# rm -rf *
Leave safe mode when the namenode is stuck in it:
bin/hadoop dfsadmin -safemode leave
Job log query page:
http://localhost:50030/jobtracker.jsp
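The hadoop fs subcommands used above (-mkdir, -ls, -lsr, -rm) mirror ordinary POSIX commands, just executed against HDFS instead of the local disk. A local sketch of the same create/list/remove cycle, runnable without a cluster:

```shell
#!/bin/sh
# Local analogue of the HDFS session above; each line's hadoop fs
# counterpart is noted in the comment.
work="$(mktemp -d)"
mkdir -p "$work/input"                  # hadoop fs -mkdir input
echo "hello hadoop" > "$work/input/file01"
before="$(ls "$work")"                  # hadoop fs -ls
ls -R "$work" > /dev/null               # hadoop fs -lsr (recursive listing)
rm -r "$work/input"                     # hadoop fs -rmr input
after="$(ls "$work")"
echo "before: $before"
echo "after: ${after:-<empty>}"
rm -rf "$work"
```

One difference worth remembering: in Hadoop 1.x, hadoop fs -rm removes files only, while -rmr removes directories recursively.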
Deployment notes: /docs/r1.0.4/single_node_setup.html — one namenode, one datanode.

Hosts changes:
local machine: C:\windows\system32\drivers\etc\hosts
VM: vi /etc/hosts
Add the VM's mapping to the local machine's hosts file, and edit hosts inside the VM.
Re-initialize the namenode:
hadoop namenode -format
Set a static IP (substitute your own address and mask):
[root@localhost hadoop-1.0.1]# ifconfig eth0 <ip> netmask <mask> up
[root@localhost hadoop-1.0.1]# setup
[root@localhost hadoop-1.0.1]# service iptables status
Table: filter
Chain INPUT (policy ACCEPT)
num target prot opt source destination
Chain FORWARD (policy ACCEPT)
num target prot opt source destination
Chain OUTPUT (policy ACCEPT)
num target prot opt source destination
[root@localhost hadoop-1.0.1]# service iptables stop
Flushing firewall rules: [ OK ]
Setting chains to policy ACCEPT: filter [ OK ]
Unloading iptables modules: [ OK ]
[root@localhost conf]# vi slaves
[root@localhost conf]# vi /etc/hosts
[root@localhost conf]# cd /etc/sysconfig/
[root@localhost sysconfig]# ls
(listing of /etc/sysconfig)
[root@localhost sysconfig]# vi network
[root@localhost sysconfig]# service network restart
Shutting down interface eth0: [ OK ]
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: Determining IP information for eth0... done. [ OK ]
[root@localhost sysconfig]# ifconfig
[root@localhost sysconfig]# cd network-scripts/
[root@localhost network-scripts]# ls
[root@localhost network-scripts]# vi ifcfg-eth0
[root@localhost network-scripts]# service network restart
[root@localhost network-scripts]# ifconfig
[root@localhost network-scripts]# ping hadoop21

Other checks:
[root@localhost conf]# netstat -a | grep 50030
tcp 0 0 *:50030 *:* LISTEN
[root@localhost conf]# netstat -a | grep 50070
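The netstat checks above can be wrapped in a small loop that reports each daemon web port. A sketch with sample netstat output inlined so it runs anywhere; on a live box, replace the listeners function body with a real `netstat -a`:

```shell
#!/bin/sh
# Check whether the JobTracker (50030) and NameNode (50070) web ports are
# listening. The here-doc below is sample netstat output, an assumption made
# so the sketch is self-contained; substitute `netstat -a` on a real box.
listeners() {
cat <<'EOF'
tcp        0      0 *:50030    *:*    LISTEN
tcp        0      0 *:50070    *:*    LISTEN
EOF
}
for port in 50030 50070; do
  if listeners | grep -q ":$port .*LISTEN"; then
    echo "port $port: listening"
  else
    echo "port $port: not listening"
  fi
done
```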
tcp 0 0 *:50070 *:* LISTEN
[root@localhost conf]# ls -a
(same contents as the long listing below, plus . and ..)
[root@localhost conf]# ls -l
total 140
-rw-rw-r-- 1 root root 7457 2012-02-14 capacity-scheduler.xml
-rw-rw-r-- 1 root root  535 2012-02-14 configuration.xsl
-rw-rw-r-- 1 root root  294 11-21 22:58 core-site.xml
-rw-rw-r-- 1 root root  327 2012-02-14 fair-scheduler.xml
-rw-rw-r-- 1 root root 2232 11-21 05:17 hadoop-env.sh
-rw-rw-r-- 1 root root 1488 2012-02-14 (name truncated in the original)
-rw-rw-r-- 1 root root 4644 2012-02-14 hadoop-policy.xml
-rw-rw-r-- 1 root root  276 11-21 05:23 hdfs-site.xml
-rw-rw-r-- 1 root root 4441 2012-02-14 (name truncated in the original)
-rw-rw-r-- 1 root root 2033 2012-02-14 mapred-queue-acls.xml
-rw-rw-r-- 1 root root  290 11-21 22:59 mapred-site.xml
-rw-rw-r-- 1 root root   10 2012-02-14 masters
-rw-rw-r-- 1 root root   10 2012-02-14 slaves
-rw-rw-r-- 1 root root 1243 2012-02-14 ssl-client.xml.example
-rw-rw-r-- 1 root root 1195 2012-02-14 ssl-server.xml.example
-rw-rw-r-- 1 root root  382 2012-02-14 taskcontroller.cfg
[root@localhost conf]# vi slaves
[root@localhost conf]# vi masters
[root@localhost bin]# jps
3799 SecondaryNameNode
5477 Jps
[root@localhost bin]# kill -9 3799
[root@localhost bin]# rm -r ./logs/*
This prompts once per log file (the hadoop-root-{namenode,datanode,jobtracker,tasktracker,secondarynamenode}-*.log and .out* files, the ./logs/history directory, and the job_201211212007_*_conf.xml files); answer y each time, or skip the prompts entirely:
[root@localhost bin]# rm -rf ./logs/*

See how a file is split into blocks in HDFS and where the blocks are stored:
hadoop fsck /cgj/cw_kcmx.csv -files -blocks

II. Installing the hadoop plugin in MyEclipse 8.6
Plugin: hadoop-eclipse-plugin-1.0.0.jar
Install path: D:\myEclipse8.6\dropins\hadoop\plugins\hadoop-eclipse-plugin-1.0.0.jar
For MyEclipse 6.5, put it in D:\MyEclipse 6.5\eclipse\plugins instead.
(1) Point the plugin at the hadoop installation directory.
(2) Create a hadoop location.
(3) Create a hadoop project.
Problem hit during installation:
An internal error occurred during: "Connecting to DFS Hadoop". org/apache/commons/configuration/Configuration
First check whether port 9000 answers a telnet. If it does not:
1. Check whether the firewall was left running.
2. Check the port with netstat -ano; here port 9000 turned out to be bound in IPv6 format, so disabling IPv6 and rebooting fixed it.
To disable IPv6:
# Option 1: add the following line to /etc/sysctl.conf (note this does not stop other programs from enabling their own IPv6 support):
net.ipv6.conf.all.disable_ipv6=1
# save, exit, and reboot
# Option 2: add these two lines to /etc/modprobe.conf:
alias net-pf-10 off
alias ipv6 off
# save, exit, and reboot
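Since editing /etc/sysctl.conf needs root and a reboot (or `sysctl -p`) to take effect, the edit can first be staged and verified against a copy. A small sketch of option 1 above, using a temp file as a stand-in for /etc/sysctl.conf:

```shell
#!/bin/sh
# Stage the IPv6 disable line and verify it before touching the real
# /etc/sysctl.conf (which the temp file stands in for here).
conf="$(mktemp)"
echo 'net.ipv6.conf.all.disable_ipv6=1' >> "$conf"
grep -q '^net.ipv6.conf.all.disable_ipv6=1$' "$conf" && echo "ipv6 disable line staged"
rm -f "$conf"
```

On the real system the same grep check against /etc/sysctl.conf confirms the line landed before rebooting.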
3. As a last resort, search the web.

III. Running the main method fails: no access permission on the path
Failed to set permissions of path: \tmp\hadoop-admin\mapred\local\ttprivate to 0700
Three ways to solve this:
1. Add a mapred-site.xml configuration file to the mapreduce project.
2. Add that same mapred-site.xml to the hadoop installation directory.
3. Patch the hadoop jar: change the checkReturnValue method of the FileUtil class, recompile, and replace the original hadoop-core-1.0.0.jar.
Download address for the patched hadoop-core-1.0.0.jar / bug report: /jira/browse/HADOOP-7682

IV. Getting hadoop out of safe mode
Error: node.SafeModeException: Cannot delete /tmp/hadoop-SYSTEM/mapred/system. Name node is in safe mode.
The ratio of reported blocks 0.9412 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
Fix:
bin/hadoop dfsadmin -safemode leave (leave safe mode)
The safemode argument takes:
enter - enter safe mode
leave - force the NameNode out of safe mode
get - report whether safe mode is on
wait - block until safe mode ends

V. Local machine and VM cannot reach each other
1. Give the VM a fixed IP on the same subnet as the local machine.
2. Turn off the firewall inside the VM.

VI. Version conflict among imported jars
import org.apache.hadoop.mapred.Partitioner; (correct)
import org.apache.hadoop.mapreduce.Partitioner; (wrong)

VII. How to use Partitioner

VIII. Counting how many times each key appears in one column of an EXCEL file

IX. Testing whether HDFS I/O works
[root@hadoop21 hadoop-1.0.1]# hadoop jar hadoop-test-1.0.1.jar TestDFSIO -write -nrFile 5 -fileSize 100
(The flag is actually spelled -nrFiles; with the misspelled -nrFile it fell back to the default, which is why the log below reports nrFiles = 1.)
TestDFSIO.0.0.4
12/11/22 01:39:56 INFO fs.TestDFSIO: nrFiles = 1
12/11/22 01:39:56 INFO fs.TestDFSIO: fileSize (MB) = 100
12/11/22 01:39:56 INFO fs.TestDFSIO: bufferSize = 1000000
12/11/22 01:39:57 INFO fs.TestDFSIO: creating control file: 100 mega bytes, 1 files
12/11/22 01:39:57 INFO fs.TestDFSIO: created control files for: 1 files
12/11/22 01:39:58 INFO mapred.FileInputFormat: Total input paths to process : 1
12/11/22 01:39:58 INFO mapred.JobClient: Running job: job_201211220133_0003
12/11/22 01:39:59 INFO mapred.JobClient: map 0% reduce 0%
12/11/22 01:40:57 INFO mapred.JobClient: map 100% reduce 0%
12/11/22 01:41:24 INFO mapred.JobClient: map 100% reduce 100%
12/11/22 01:41:30 INFO mapred.JobClient: Job complete: job_201211220133_0003
12/11/22 01:41:30 INFO mapred.JobClient: Counters: 30
Counters (each line below carries the prefix "12/11/22 01:41:30 INFO mapred.JobClient:" in the real log):
Job Counters
  Launched reduce tasks=1
  SLOTS_MILLIS_MAPS=54666
  Total time spent by all reduces waiting after reserving slots (ms)=0
  Total time spent by all maps waiting after reserving slots (ms)=0
  Launched map tasks=1
  Data-local map tasks=1
  SLOTS_MILLIS_REDUCES=24114
File Input Format Counters
  Bytes Read=112
File Output Format Counters
  Bytes Written=76
FileSystemCounters
  FILE_BYTES_READ=92
  HDFS_BYTES_READ=236
  FILE_BYTES_WRITTEN=43145
  HDFS_BYTES_WRITTEN=104857676
Map-Reduce Framework
  Map output materialized bytes=92
  Map input records=1
  Reduce shuffle bytes=92
  Spilled Records=10
  Map output bytes=76
  Total committed heap usage (bytes)= (value garbled)
  CPU time spent (ms)=8970
  Map input bytes=26
  SPLIT_RAW_BYTES=124
  Combine input records=0
  Reduce input records=5
  Reduce input groups=5
  Combine output records=0
  Physical memory (bytes) snapshot=229068800
  Reduce output records=5
  Virtual memory (bytes) snapshot=749244416
  Map output records=5
12/11/22 01:41:30 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
12/11/22 01:41:30 INFO fs.TestDFSIO:            Date & time: Thu Nov 22 01:41:30 CST 2012
12/11/22 01:41:30 INFO fs.TestDFSIO:        Number of files: 1
12/11/22 01:41:30 INFO fs.TestDFSIO: Total MBytes processed: 100
12/11/22 01:41:30 INFO fs.TestDFSIO:      Throughput mb/sec: 3.2748231595493844
12/11/22 01:41:30 INFO fs.TestDFSIO: Average IO rate mb/sec: 3.2748231887817383
12/11/22 01:41:30 INFO fs.TestDFSIO:  IO rate std deviation: 7.706685757074053E-4
12/11/22 01:41:30 INFO fs.TestDFSIO:     Test exec time sec: 92.544

X. Node IDs inconsistent? Adjusting nodes? To be filled in.

XI. Exception when using DataJoin for a reduce-side join over multiple data sources
java.lang.RuntimeException: java.lang.NoSuchMethodException: com.hadoop.reducedatajoin.ReduceDataJoin$TaggedWritable.<init>() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
Solution found:
You need a default constructor for TaggedWritable: Hadoop uses reflection to create this object, and that requires a default (no-args) constructor.
You also have a problem in your readFields method: you call data.readFields(in) on the writable interface, but it has no knowledge of the actual runtime class of data. Either write out the data class name before outputting the data object itself, or look into the GenericWritable class (you'll need to extend it to define the set of allowable writable classes that can be used), and amend accordingly.

XII. How MapReduce works
1. Job submission
- request a new job id from the jobtracker
- check the job's output specification
- compute the job's input splits
- copy the resources the job needs to a directory named after the job id
- tell the jobtracker that the job is ready to run
2. Job initialization (jobtracker)
The jobtracker does two things:
1. It accepts the jobclient's request and initializes the job; a dedicated thread is responsible for this, and each job gets a new thread for its initialization.
2. It accepts tasktracker heartbeats (RPC calls) and, based on the heartbeat information, hands the tasktracker the appropriate instruction packet.
The JobTracker runs as a separate JVM; the main function it runs
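Counting how many times each key appears in one column, as described above, is itself the canonical map / shuffle / reduce flow, and can be sketched locally without a cluster: cut plays the map (emit the key), sort plays the shuffle (group equal keys together), and uniq -c plays the reduce (count each group). The CSV sample is made up for illustration:

```shell
#!/bin/sh
# Count occurrences of each key in column 2 of a CSV:
#   cut  -> map phase (emit the key from each record)
#   sort -> shuffle/sort phase (bring equal keys together)
#   uniq -c -> reduce phase (count each key group)
data='1,apple
2,pear
3,apple
4,apple
5,pear
6,plum'
counts="$(printf '%s\n' "$data" | cut -d, -f2 | sort | uniq -c | sort -rn)"
echo "$counts"   # apple 3, pear 2, plum 1 (highest count first)
```

The same pipeline shape is what a MapReduce job over the exported EXCEL column would compute, just distributed across tasktrackers.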