Hadoop Cluster Setup
Note: steps 1-4 must be performed on every machine.
1. System settings
- Edit /etc/hostname; a reboot (shutdown -r now) is required for the change to take effect.
- Edit /etc/hosts and append the hostname of every node:
```
10.2.35.117 xhb-master
10.2.35.118 xhb-slave-1
10.2.35.119 xhb-slave-2
10.2.35.4   xhb-slave-3
```
Test connectivity from each node with ping xhb-master.
Add the hadoop user:
```
useradd -m hadoop -G root -s /bin/bash   # -G adds the supplementary group root
passwd hadoop
visudo
```
In visudo, find the root ALL=(ALL) ALL line, duplicate it below, and change root to hadoop in the copy.
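For reference, after the edit the relevant sudoers lines should look roughly like this (a sketch; the exact spacing in the stock CentOS 7 file may differ):
```
## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL
```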
Set the time zone:
```
[root@xhb-slave-2 ~]# vim /etc/profile
```
Append at the end of the file:
```
TZ='Asia/Shanghai'; export TZ
```
Then configure NTP:
```
[root@xhb-slave-2 yum.repos.d]# source /etc/profile
[root@xhb-slave-2 yum.repos.d]# vim /etc/ntp.conf
```
Replace the default server entries with:
```
server 0.cn.pool.ntp.org
server 1.cn.pool.ntp.org
server 2.cn.pool.ntp.org
server 3.cn.pool.ntp.org
```
Enable and start the service:
```
[root@xhb-slave-2 yum.repos.d]# systemctl enable ntpd
[root@xhb-slave-2 yum.repos.d]# systemctl start ntpd
```
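A quick way to confirm ntpd is actually syncing (my addition, not part of the original steps):
```
# Lists the peers ntpd is polling; a leading '*' marks the selected source
ntpq -p
```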
Switch to domestic (China) mirrors:
```
cd /etc/yum.repos.d/
mkdir repo-back
mv CentOS* repo-back/   # the stock repo files are named CentOS-*
wget http://mirrors.aliyun.com/repo/Centos-7.repo
wget http://mirrors.163.com/.help/CentOS7-Base-163.repo
yum clean all
yum makecache
yum install -y epel-release   # add the EPEL repo
```
2. Log in as the hadoop user
3. Install and configure SSH
(If the hostname is changed later, the keys need to be regenerated.)
- Check whether SSH is already installed:
```
[hadoop@xhb-master ~]$ rpm -qa | grep ssh
openssh-clients-7.4p1-16.el7.x86_64
openssh-7.4p1-16.el7.x86_64
openssh-server-7.4p1-16.el7.x86_64
libssh2-1.4.3-12.el7.x86_64
```
If they are missing, install them with:
```
sudo yum install openssh-clients
sudo yum install openssh-server
```
Test that SSH is usable:
```
[hadoop@xhb-master ~]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
...
Are you sure you want to continue connecting (yes/no)? yes
...
hadoop@localhost's password:
```
Abort with Ctrl+C; a .ssh directory now exists under the home directory. Generate a key pair:
```
cd .ssh/
[hadoop@xhb-master .ssh]$ ssh-keygen -t rsa
...
...
```
Add the resulting public key to the authorized keys:
```
[hadoop@xhb-master .ssh]$ cat id_rsa.pub >> authorized_keys   # authorize the key
[hadoop@xhb-master .ssh]$ chmod 600 ./authorized_keys         # the permissions must be set
```
Test direct login; output like the following means it works:
```
[hadoop@xhb-master .ssh]$ ssh localhost
Last login: Mon May  6 17:59:41 2019 from 10.2.35.19
[hadoop@xhb-master ~]$ exit
logout
Connection to localhost closed.
```
4. Install the Java environment
A current CentOS 7 install with the GNOME environment selected already ships JDK 1.8.
Check the Java environment:
```
[hadoop@xhb-master .ssh]$ java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

[hadoop@xhb-master .ssh]$ yum list installed | grep jdk
...
java-1.8.0-openjdk.x86_64            1:1.8.0.181-7.b13.el7    @anaconda
java-1.8.0-openjdk-headless.x86_64   1:1.8.0.181-7.b13.el7    @anaconda
```
However, javac is missing; and even if Java is absent altogether, the following package installs it:
```
[hadoop@xhb-master .ssh]$ sudo yum install java-1.8.0-openjdk-devel.x86_64
```
Set the Java environment variable:
```
[hadoop@xhb-master .ssh]$ cd
[hadoop@xhb-master ~]$ vim .bashrc
```
Append at the end of the file:
```
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
```
```
[hadoop@xhb-master ~]$ source .bashrc
```
Verify that both resolve to the same JDK:
```
[hadoop@xhb-master ~]$ $JAVA_HOME/bin/java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)
[hadoop@xhb-master ~]$ java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)
```
5. Install Hadoop 2.7.7 on the master
Beginners are better off with a stable older release: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
Extract and install into the /usr/local/hadoop directory.
(The tarball was downloaded on Windows and copied to Linux, hence the root user here.)
```
tar -xf hadoop-2.7.7.tar.gz
[root@xhb-master data]# cp -r ./hadoop-2.7.7/ /usr/local/hadoop/
[root@xhb-master local]# chown -R hadoop:hadoop ./hadoop/   # fix the directory ownership
[hadoop@xhb-master hadoop]$ ./bin/hadoop version
Hadoop 2.7.7
...
```
6. Configure passwordless SSH from the master to the slaves
- Copy the master's public key to each slave:
```
[hadoop@xhb-master hadoop]$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@xhb-slave-1:/home/hadoop   # enter the slave's hadoop account password
```
On xhb-slave-1:
```
[hadoop@xhb-slave-1 ~]$ ll
total 4
-rw-r--r--. 1 hadoop hadoop 399 May  7 19:23 id_rsa.pub
[hadoop@xhb-slave-1 ~]$ cat id_rsa.pub >> .ssh/authorized_keys
```
The master can now SSH to slave-1 without a password:
```
[hadoop@xhb-master hadoop]$ ssh xhb-slave-1
Last login: Tue May  7 03:14:47 2019 from localhost
[hadoop@xhb-slave-1 ~]$ exit
```
Repeat for the other machines:
```
[hadoop@xhb-master hadoop]$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@xhb-slave-2:/home/hadoop
[hadoop@xhb-master hadoop]$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@xhb-slave-3:/home/hadoop
```
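As an alternative to the manual scp-and-append above, openssh's ssh-copy-id does the same in one step; a sketch:
```
# Appends the local public key to the remote ~/.ssh/authorized_keys
# and fixes its permissions; prompts once for the slave's password
ssh-copy-id hadoop@xhb-slave-1
```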
7. Configure the PATH variable on the master
```
vim ~/.bashrc
```
Append at the end of the file:
```
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
```
```
[hadoop@xhb-master hadoop]$ source ~/.bashrc
```
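To confirm the PATH change took effect (my addition, mirroring the `which hdfs` check used later in step 11):
```
# Should print /usr/local/hadoop/bin/hadoop
which hadoop
```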
8. Configure Hadoop on the master
The configuration files live in /usr/local/hadoop/etc/hadoop/.
- The slaves file
```
[hadoop@xhb-master hadoop]$ cat slaves
localhost
```
Delete the localhost line so the master serves purely as the NameNode, then add the slave hostnames:
```
[hadoop@xhb-master hadoop]$ cat slaves
xhb-slave-1
xhb-slave-2
xhb-slave-3
```
- core-site.xml
```
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://xhb-master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
    <description>The user name to filter as, on static web filters while rendering content. An example use is the HDFS web UI.</description>
  </property>
</configuration>
```
- hdfs-site.xml
```
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>xhb-master:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```
- mapred-site.xml
Copy the template to create the config file:
```
[hadoop@xhb-master hadoop]$ cp mapred-site.xml.template mapred-site.xml
```
Contents:
```
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>xhb-master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>xhb-master:19888</value>
  </property>
</configuration>
```
- yarn-site.xml
```
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>xhb-master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```
9. Pack up the configured Hadoop
Send the archive straight to the slaves; no further configuration is needed on them.
```
[hadoop@xhb-master hadoop]$ cd /usr/local/
[hadoop@xhb-master local]$ tar -cf ~/hadoop.master.tar ./hadoop/
[hadoop@xhb-master ~]$ ll -h
total 329M
-rw-rw-r-- 1 hadoop hadoop 329M May  7 13:26 hadoop.master.tar
[hadoop@xhb-master ~]$ scp ./hadoop.master.tar xhb-slave-1:/home/hadoop
[hadoop@xhb-master ~]$ scp ./hadoop.master.tar xhb-slave-2:/home/hadoop
[hadoop@xhb-master ~]$ scp ./hadoop.master.tar xhb-slave-3:/home/hadoop
```
On each slave:
```
tar -xf hadoop.master.tar
sudo mv hadoop /usr/local/
```
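One precaution not in the original steps: if /usr/local/hadoop already exists on a slave from an earlier attempt, the mv above would nest the new directory inside the old one, so remove it first:
```
# Run before the mv above if a previous install is present
sudo rm -rf /usr/local/hadoop
```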
10. Disable the firewall on all nodes
```
[hadoop@xhb-master ~]$ sudo systemctl stop firewalld.service
[hadoop@xhb-master ~]$ sudo systemctl disable firewalld.service
```
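A quick sanity check that the firewall is really off (my addition):
```
# Should print "inactive" after the stop/disable above
sudo systemctl is-active firewalld.service
```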
11. Format HDFS on the master
Initialization is required only before the first run, never again afterwards.
```
[hadoop@xhb-master ~]$ which hdfs
/usr/local/hadoop/bin/hdfs
[hadoop@xhb-master ~]$ hdfs namenode -format
```
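A caution, not in the original steps: if the NameNode ever has to be re-formatted, clear the old data directory on every node first (the path comes from hadoop.tmp.dir in core-site.xml), otherwise the DataNodes will refuse to connect because of mismatched cluster IDs:
```
# On every node, then run the format again on the master
rm -rf /usr/local/hadoop/tmp
```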
12. Start Hadoop on the master
```
[hadoop@xhb-master ~]$ start-dfs.sh
Starting namenodes on [xhb-master]
xhb-master: starting namenode, logging to...
xhb-slave-2: starting datanode, logging to ...
xhb-slave-3: starting datanode, logging to ...
xhb-slave-1: starting datanode, logging to ...
Starting secondary namenodes [xhb-master]
xhb-master: starting secondarynamenode, logging to ... hadoop-hadoop-secondarynamenode-xhb-master.out
[hadoop@xhb-master ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to ...
xhb-slave-3: starting nodemanager, logging to ...
xhb-slave-1: starting nodemanager, logging to ...
xhb-slave-2: starting nodemanager, logging to ...
[hadoop@xhb-master ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to ...
```
Check the processes on the master:
```
[hadoop@xhb-master hadoop]$ jps
27346 NameNode
27570 SecondaryNameNode
27779 ResourceManager
28199 JobHistoryServer
28301 Jps
```
Check the processes on a slave:
```
[hadoop@xhb-slave-1 hadoop]$ jps
21952 DataNode
22082 NodeManager
22219 Jps
```
If any process is missing, the configuration is wrong somewhere.
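When a daemon is missing, its log file usually explains why. The logs live under /usr/local/hadoop/logs and follow the standard hadoop-&lt;user&gt;-&lt;daemon&gt;-&lt;hostname&gt;.log naming; for example, for a missing DataNode on xhb-slave-1:
```
# Run on the affected node; the last lines normally contain the startup error
tail -n 50 /usr/local/hadoop/logs/hadoop-hadoop-datanode-xhb-slave-1.log
```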
Check the cluster status:
```
[hadoop@xhb-master hadoop]$ hdfs dfsadmin -report
...
-------------------------------------------------
Live datanodes (3):

Name: 10.2.35.4:50010 (xhb-slave-3)
Hostname: xhb-slave-3
...
Name: 10.2.35.119:50010 (xhb-slave-2)
Hostname: xhb-slave-2
...
Last contact: Tue May 07 14:21:10 CST 2019

Name: 10.2.35.118:50010 (xhb-slave-1)
Hostname: xhb-slave-1
...
Last contact: Tue May 07 14:21:10 CST 2019
```
13. View the web UIs
hdfs (NameNode): http://10.2.35.117:50070
mapreduce (YARN ResourceManager): http://10.2.35.117:8088
14. Create directories
Create the user home directory:
```
[hadoop@xhb-master hadoop]$ hdfs dfs -mkdir -p /user/hadoop
```
Relative paths are then resolved under that user directory:
```
[hadoop@xhb-master hadoop]$ hdfs dfs -mkdir input
[hadoop@xhb-master hadoop]$ hdfs dfs -mkdir output
```
The input path is /user/hadoop/input, and likewise for output.
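To double-check where the relative paths landed (my addition):
```
# Both directories should show up under the user's HDFS home
hdfs dfs -ls /user/hadoop
```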
Upload a file:
```
[hadoop@xhb-master hadoop]$ hdfs dfs -put /data/spark-hdfs.md input
```
Run the wordcount job; the input is the input directory and the output is the output/1 directory:
```
[hadoop@xhb-master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount input output/1
```
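Note that MapReduce refuses to start a job whose output directory already exists; to re-run the example, remove it first:
```
# Without this, a re-run fails with FileAlreadyExistsException
hdfs dfs -rm -r output/1
```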
View the output:
```
[hadoop@xhb-master hadoop]$ hdfs dfs -ls output/1
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2019-05-07 16:47 output/1/_SUCCESS
-rw-r--r--   3 hadoop supergroup       9323 2019-05-07 16:47 output/1/part-r-00000
[hadoop@xhb-master hadoop]$ hdfs dfs -cat output/1/part-r-00000
```
Download:
```
hdfs dfs -get output/1/part-r-00000
```
The file is downloaded to the current working directory.
15. Stop
```
stop-yarn.sh
stop-dfs.sh
mr-jobhistory-daemon.sh stop historyserver
```