Hadoop Cluster Setup

Note: steps 1-4 must be done on every machine.

Reference: http://dblab.xmu.edu.cn/blog/install-hadoop-cluster/

1. System settings

  • Edit /etc/hostname; a reboot is required for the new hostname to take effect: shutdown -r now
  • Edit /etc/hosts and append an entry for every node:

    ```
    10.2.35.117 xhb-master
    10.2.35.118 xhb-slave-1
    10.2.35.119 xhb-slave-2
    10.2.35.4   xhb-slave-3
    ```

    Then test name resolution on every node: ping xhb-master

  • Add the hadoop user:

    ```
    useradd -m hadoop -G root -s /bin/bash   # -G adds the supplementary group root
    passwd hadoop
    visudo
    ```

    In visudo, find the "root ALL=(ALL) ALL" line, duplicate it, and change the copy to hadoop so the hadoop user can use sudo.

  • Set the time zone and enable NTP (see the verification sketch after this list):

    ```
    [root@xhb-slave-2 ~]# vim /etc/profile
    # append at the end of the file
    TZ='Asia/Shanghai'; export TZ
    [root@xhb-slave-2 yum.repos.d]# source /etc/profile
    [root@xhb-slave-2 yum.repos.d]# vim /etc/ntp.conf
    # replace the server lines with
    server 0.cn.pool.ntp.org
    server 1.cn.pool.ntp.org
    server 2.cn.pool.ntp.org
    server 3.cn.pool.ntp.org
    # enable and start the service
    [root@xhb-slave-2 yum.repos.d]# systemctl enable ntpd
    [root@xhb-slave-2 yum.repos.d]# systemctl start ntpd
    ```

  • Switch to Chinese mirror repositories:

    ```
    cd /etc/yum.repos.d/
    mkdir repo-back
    mv CentOS* repo-back/
    wget http://mirrors.aliyun.com/repo/Centos-7.repo
    wget http://mirrors.163.com/.help/CentOS7-Base-163.repo
    yum clean all
    yum makecache
    yum install -y epel-release   # add the EPEL repository
    ```
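
An optional check (not part of the original steps, assuming ntpd is the daemon configured above) to confirm the time zone and NTP synchronization on each node:

```
# confirm the time zone took effect
date
timedatectl
# confirm ntpd has picked an upstream server; the line marked with '*' is the current sync source
ntpq -p
```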

2. Log in as the hadoop user

3. Install and configure SSH

If the hostname is changed later, the SSH keys need to be regenerated.

  • Check whether SSH is already installed:

    ```
    [hadoop@xhb-master ~]$ rpm -qa |grep ssh
    openssh-clients-7.4p1-16.el7.x86_64
    openssh-7.4p1-16.el7.x86_64
    openssh-server-7.4p1-16.el7.x86_64
    libssh2-1.4.3-12.el7.x86_64
    ```

    If the packages are missing, install them with:

    ```
    sudo yum install openssh-clients
    sudo yum install openssh-server
    ```

    Test that SSH works:

    ```
    [hadoop@xhb-master ~]$ ssh localhost
    The authenticity of host 'localhost (::1)' can't be established.
    ...
    Are you sure you want to continue connecting (yes/no)? yes
    ...
    hadoop@localhost's password:
    ```

    Abort with Ctrl+C; a .ssh directory now exists in the home directory. Generate a key pair and authorize the public key:

    ```
    cd .ssh/
    [hadoop@xhb-master .ssh]$ ssh-keygen -t rsa
    ...
    [hadoop@xhb-master .ssh]$ cat id_rsa.pub >> authorized_keys   # authorize the key
    [hadoop@xhb-master .ssh]$ chmod 600 ./authorized_keys         # the permissions must be set
    ```

    Test passwordless login; output like the following means it works:

    ```
    [hadoop@xhb-master .ssh]$ ssh localhost
    Last login: Mon May 6 17:59:41 2019 from 10.2.35.19
    [hadoop@xhb-master ~]$ exit
    logout
    Connection to localhost closed.
    ```
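
If key-based login still prompts for a password, a common cause (assuming default sshd settings) is that the .ssh directory itself is too permissive:

```
chmod 700 ~/.ssh                    # sshd ignores keys when ~/.ssh is group/world accessible
chmod 600 ~/.ssh/authorized_keys
```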

4. Install the Java environment

The latest CentOS 7, when installed with the GNOME environment selected, already ships with JDK 1.8.

  • Check the Java environment:

    ```
    [hadoop@xhb-master .ssh]$ java -version
    openjdk version "1.8.0_181"
    OpenJDK Runtime Environment (build 1.8.0_181-b13)
    OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

    [hadoop@xhb-master .ssh]$ yum list installed |grep jdk
    ...
    java-1.8.0-openjdk.x86_64            1:1.8.0.181-7.b13.el7   @anaconda
    java-1.8.0-openjdk-headless.x86_64   1:1.8.0.181-7.b13.el7   @anaconda
    ```

    javac is still missing; even if no JDK packages are present at all, everything can be installed with:

    ```
    [hadoop@xhb-master .ssh]$ sudo yum install java-1.8.0-openjdk-devel.x86_64
    ```

  • Set the Java environment variables:

    ```
    [hadoop@xhb-master .ssh]$ cd
    [hadoop@xhb-master ~]$ vim .bashrc
    # add as the last line of the file
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
    [hadoop@xhb-master ~]$ source .bashrc
    ```

    Check that both commands report the same JDK:

    ```
    [hadoop@xhb-master ~]$ $JAVA_HOME/bin/java -version
    openjdk version "1.8.0_212"
    OpenJDK Runtime Environment (build 1.8.0_212-b04)
    OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)

    [hadoop@xhb-master ~]$ java -version
    openjdk version "1.8.0_212"
    OpenJDK Runtime Environment (build 1.8.0_212-b04)
    OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)
    ```
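
If the /usr/lib/jvm/java-1.8.0-openjdk symlink does not exist on a node, the actual JDK home can be derived from the javac binary (a generic trick, not from the original notes):

```
# resolve the javac symlink chain and strip the trailing /bin/javac
dirname $(dirname $(readlink -f $(which javac)))
# example output (the exact version suffix varies per machine):
# /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64
```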

5. Install Hadoop 2.7.7 on the master

For a beginner it is safer to pick the stable older release: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz

  • Extract and install into /usr/local/hadoop (the tarball was downloaded on Windows and copied to Linux, so the root user is used here):

    ```
    tar -xf hadoop-2.7.7.tar.gz
    [root@xhb-master data]# cp -r ./hadoop-2.7.7/ /usr/local/hadoop/
    [root@xhb-master local]# chown -R hadoop:hadoop ./hadoop/   # fix the directory ownership

    [hadoop@xhb-master hadoop]$ ./bin/hadoop version
    Hadoop 2.7.7
    ...
    ```

6. Configure passwordless SSH from the master to the slaves

  • Copy the master's public key to each slave (see the ssh-copy-id sketch after this step for a shortcut):

    ```
    [hadoop@xhb-master hadoop]$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@xhb-slave-1:/home/hadoop   # enter the hadoop password on the slave
    ```

    On xhb-slave-1:

    ```
    [hadoop@xhb-slave-1 ~]$ ll
    total 4
    -rw-r--r--. 1 hadoop hadoop 399 May  7 19:23 id_rsa.pub
    [hadoop@xhb-slave-1 ~]$ cat id_rsa.pub >> .ssh/authorized_keys
    ```

    The master can now ssh to slave-1 without a password:

    ```
    [hadoop@xhb-master hadoop]$ ssh xhb-slave-1
    Last login: Tue May 7 03:14:47 2019 from localhost
    [hadoop@xhb-slave-1 ~]$ exit
    ```

    Repeat for the other machines:

    ```
    [hadoop@xhb-master hadoop]$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@xhb-slave-2:/home/hadoop
    [hadoop@xhb-master hadoop]$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@xhb-slave-3:/home/hadoop
    ```
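
A shorter alternative (assuming openssh-clients is installed on the master) is ssh-copy-id, which appends the key to authorized_keys on the slave and sets the permissions in one step:

```
ssh-copy-id hadoop@xhb-slave-1
ssh-copy-id hadoop@xhb-slave-2
ssh-copy-id hadoop@xhb-slave-3
```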

7. Configure the PATH variable on the master

Edit ~/.bashrc and add the following as the last line, then reload it:

export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

[hadoop@xhb-master hadoop]$ source ~/.bashrc

8. Configure Hadoop on the master

The configuration files live in /usr/local/hadoop/etc/hadoop/.

Configuration defaults: https://segmentfault.com/a/1190000011832566

  • slaves file

    ```
    [hadoop@xhb-master hadoop]$ cat slaves
    localhost
    ```

    Delete the localhost line so the master acts purely as the NameNode, then add the slave hostnames:

    ```
    [hadoop@xhb-master hadoop]$ cat slaves
    xhb-slave-1
    xhb-slave-2
    xhb-slave-3
    ```
  • core-site.xml

    ```
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://xhb-master:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
        <description>The user name to filter as, on static web filters while rendering content. An example use is the HDFS web UI.</description>
      </property>
    </configuration>
    ```

  • hdfs-site.xml

    ```
    <configuration>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>xhb-master:50090</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>
    ```

  • mapred-site.xml

    Create the configuration file from the template:

    ```
    [hadoop@xhb-master hadoop]$ cp mapred-site.xml.template mapred-site.xml
    ```

    Contents:

    ```
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>xhb-master:10020</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>xhb-master:19888</value>
      </property>
    </configuration>
    ```

  • yarn-site.xml

    ```
    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>xhb-master</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>
    ```
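
One optional extra beyond the files above: when the daemons are launched over SSH they may not see the JAVA_HOME exported in .bashrc, which shows up as "JAVA_HOME is not set" errors at start-up. A minimal fix, assuming the same JDK path used in step 4, is to hard-code it in hadoop-env.sh:

```
# /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# replace the line "export JAVA_HOME=${JAVA_HOME}" with an absolute path
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
```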

9. Package the configured Hadoop

After packaging, send it straight to the slaves; the slaves then need no further configuration.

[hadoop@xhb-master hadoop]$ cd /usr/local/
[hadoop@xhb-master local]$ tar -cf ~/hadoop.master.tar ./hadoop/
[hadoop@xhb-master ~]$ ll -h
total 329M
-rw-rw-r-- 1 hadoop hadoop 329M May 7 13:26 hadoop.master.tar
[hadoop@xhb-master ~]$ scp ./hadoop.master.tar xhb-slave-1:/home/hadoop
[hadoop@xhb-master ~]$ scp ./hadoop.master.tar xhb-slave-2:/home/hadoop
[hadoop@xhb-master ~]$ scp ./hadoop.master.tar xhb-slave-3:/home/hadoop

On each slave:

tar -xf hadoop.master.tar
sudo mv hadoop /usr/local/
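
If the master has already been started once before packaging, two housekeeping steps are worth considering (assumed here, not part of the original commands): clear the runtime state so the slaves do not inherit it, and make sure the hadoop user owns the tree after extraction on each slave.

```
# on the master, before creating the tar (skip if the directories do not exist yet)
sudo rm -rf /usr/local/hadoop/tmp /usr/local/hadoop/logs
# on each slave, after the mv into /usr/local
sudo chown -R hadoop:hadoop /usr/local/hadoop
```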

10. Disable the firewall on all nodes

[hadoop@xhb-master ~]$ sudo systemctl stop firewalld.service
[hadoop@xhb-master ~]$ sudo systemctl disable firewalld.service
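
An optional check (assumed, not from the original notes) that the firewall is really off on each node:

```
sudo systemctl is-active firewalld   # expect "inactive" after the stop/disable above
```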

11. Format HDFS on the master

Initialization is only needed before the first run; it is not required afterwards.

[hadoop@xhb-master ~]$ which hdfs
/usr/local/hadoop/bin/hdfs
[hadoop@xhb-master ~]$ hdfs namenode -format
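
If the cluster ever has to be re-formatted later, note (a general HDFS behaviour, not covered by the original steps) that the DataNodes keep the old clusterID under hadoop.tmp.dir and will refuse to join the new namespace. Clearing that directory on every node first avoids the mismatch:

```
# only when re-formatting an existing cluster, on every node (path from hadoop.tmp.dir in core-site.xml)
rm -rf /usr/local/hadoop/tmp /usr/local/hadoop/logs
# then on the master
hdfs namenode -format
```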

12. Start Hadoop on the master

[hadoop@xhb-master ~]$ start-dfs.sh
Starting namenodes on [xhb-master]  
xhb-master: starting namenode, logging to...  
xhb-slave-2: starting datanode, logging to ...  
xhb-slave-3: starting datanode, logging to ...  
xhb-slave-1: starting datanode, logging to ...  
Starting secondary namenodes [xhb-master]  
xhb-master: starting secondarynamenode, logging to ...  
hadoop-hadoop-secondarynamenode-xhb-master.out  
[hadoop@xhb-master ~]$ start-yarn.sh
starting yarn daemons  
starting resourcemanager, logging to ...  
xhb-slave-3: starting nodemanager, logging to ...  
xhb-slave-1: starting nodemanager, logging to ...  
xhb-slave-2: starting nodemanager, logging to ...  
[hadoop@xhb-master ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to ...  

Check the processes on the master:

[hadoop@xhb-master hadoop]$ jps
27346 NameNode  
27570 SecondaryNameNode  
27779 ResourceManager  
28199 JobHistoryServer  
28301 Jps  

Check the processes on a slave:

[hadoop@xhb-slave-1 hadoop]$ jps
21952 DataNode  
22082 NodeManager  
22219 Jps  

If any process is missing, something is wrong with the configuration.

Check the cluster status:

[hadoop@xhb-master hadoop]$ hdfs dfsadmin -report
...
-------------------------------------------------
Live datanodes (3):

Name: 10.2.35.4:50010 (xhb-slave-3)  
Hostname: xhb-slave-3  
...


Name: 10.2.35.119:50010 (xhb-slave-2)  
Hostname: xhb-slave-2  
...
Last contact: Tue May 07 14:21:10 CST 2019


Name: 10.2.35.118:50010 (xhb-slave-1)  
Hostname: xhb-slave-1  
...
Last contact: Tue May 07 14:21:10 CST 2019  

13. Web UIs

hdfs: http://10.2.35.117:50070

mapreduce: http://10.2.35.117:8088

14. Create directories

  • Create the user directory:

    ```
    [hadoop@xhb-master hadoop]$ hdfs dfs -mkdir -p /user/hadoop
    ```

    Relative paths below are created under that user directory:

    ```
    [hadoop@xhb-master hadoop]$ hdfs dfs -mkdir input
    [hadoop@xhb-master hadoop]$ hdfs dfs -mkdir output
    ```

    The input path is /user/hadoop/input, and likewise for output.

  • Upload a file:

    ```
    [hadoop@xhb-master hadoop]$ hdfs dfs -put /data/spark-hdfs.md input
    ```

  • Run the wordcount job, reading from the input directory and writing to output/1 (to re-run it, see the note after this list):

    ```
    [hadoop@xhb-master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount input output/1
    ```

  • View the output:

    ```
    [hadoop@xhb-master hadoop]$ hdfs dfs -ls output/1
    Found 2 items
    -rw-r--r--   3 hadoop supergroup       0 2019-05-07 16:47 output/1/_SUCCESS
    -rw-r--r--   3 hadoop supergroup    9323 2019-05-07 16:47 output/1/part-r-00000
    [hadoop@xhb-master hadoop]$ hdfs dfs -cat output/1/part-r-00000
    ```

  • Download to the current working directory:

    ```
    hdfs dfs -get output/1/part-r-00000
    ```
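
Note that MapReduce refuses to write into an existing output directory, so remove it before re-running the wordcount job above:

```
hdfs dfs -rm -r output/1
```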

15. Stop

stop-yarn.sh  
stop-dfs.sh  
mr-jobhistory-daemon.sh stop historyserver