Environment:

  • 192.168.1.200 hadoop000 (NN, DN, RM, NM, Hive)
  • 192.168.1.201 hadoop001 (DN)
  • 192.168.1.202 hadoop002 (DN)
  • 192.168.1.109 mysql000 (database used by Hive)

Preparation

Set the hostname

On CentOS, change each node's hostname with hostnamectl set-hostname xxx (e.g. hostnamectl set-hostname hadoop000).

Edit /etc/hosts

Switch to the root user and add all three nodes to /etc/hosts. Two points deserve special attention:

  1. The local machine also needs an entry for its own alias; for example, the hadoop000 machine needs a line mapping its IP, 192.168.1.200, to hadoop000.
  2. The localhost entries must come last; otherwise connection failures may occur.
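A minimal /etc/hosts that follows both rules (cluster aliases first, localhost entries last) might look like this:

```
192.168.1.200 hadoop000
192.168.1.201 hadoop001
192.168.1.202 hadoop002
192.168.1.109 mysql000
127.0.0.1     localhost localhost.localdomain
::1           localhost localhost.localdomain
```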

Configure passwordless SSH

The NameNode needs passwordless login to every DataNode; each DataNode needs passwordless login to itself.

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@localhost

Test with ssh hadoop@localhost; a successful password-free login means the setup works.
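Since the NameNode must reach every DataNode, the key can be distributed to all three hosts in one loop. This is a sketch, assuming the hadoop user and the hostnames configured above:

```shell
# Run on hadoop000 (the NameNode).
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa   # generate the key pair once
for host in hadoop000 hadoop001 hadoop002; do
  # prompts for the hadoop user's password once per host
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@"$host"
done
```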

Cluster setup

To keep the configuration consistent, it is best to configure everything on one machine first and then sync it to the other nodes with scp.

Configure environment variables

export JAVA_HOME=/home/hadoop/app/jdk1.8.0_162
export PATH=$JAVA_HOME/bin:$PATH

export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.15.1
export PATH=$HADOOP_HOME/bin:$PATH
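Assuming these exports were appended to the hadoop user's ~/.bash_profile (the article does not name the file), reload the shell and check that both tools resolve:

```shell
source ~/.bash_profile   # or log in again
java -version            # should report 1.8.0_162
hadoop version           # should report 2.6.0-cdh5.15.1
```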

Configure Hadoop

  • Edit $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop000:9000</value>
  </property>
</configuration>
  • Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_162
  • Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/app/tmp</value>
  </property>
</configuration>
  • Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/app/tmp/nm-local-dir</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop000</value>
  </property>
</configuration>
  • Create $HADOOP_HOME/etc/hadoop/mapred-site.xml (it can be copied from mapred-site.xml.template)
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  • Edit $HADOOP_HOME/etc/hadoop/slaves
hadoop000
hadoop001
hadoop002

Distribute

Copy the fully configured Hadoop installation directory to the other nodes.
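A sketch of the distribution step, assuming the same hadoop user and /home/hadoop/app layout on every node; copying ~/.bash_profile as well is an extra assumption so the environment variables travel with the installation:

```shell
# Run from hadoop000 after passwordless SSH is in place.
for host in hadoop001 hadoop002; do
  scp -r /home/hadoop/app/hadoop-2.6.0-cdh5.15.1 hadoop@"$host":/home/hadoop/app/
  scp ~/.bash_profile hadoop@"$host":~/   # sync the environment variables too
done
```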

Start the cluster

Format the NameNode

hadoop namenode -format

Start HDFS

Make sure the firewalls allow the nodes to reach each other (the NameNode web UI listens on port 50070; see the firewall note at the end). On the master node, run ./start-dfs.sh from $HADOOP_HOME/sbin; the daemons on the worker nodes are started automatically.
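Once start-dfs.sh returns, jps on each node shows which daemons came up; by default the script also launches a SecondaryNameNode on the master:

```shell
jps
# hadoop000: NameNode, DataNode, SecondaryNameNode
# hadoop001 / hadoop002: DataNode
```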

Start YARN

On the master node, run ./start-yarn.sh from $HADOOP_HOME/sbin; the daemons on the worker nodes are started automatically.
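After start-yarn.sh, jps should additionally show the YARN daemons. Note that NodeManagers start on every host listed in slaves, so hadoop001 and hadoop002 run one as well, even though the environment table only marks NM on hadoop000:

```shell
jps
# hadoop000: ResourceManager, NodeManager (plus the HDFS daemons above)
# hadoop001 / hadoop002: NodeManager, DataNode
```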

Verification

  • Run the PI example on hadoop002.

  • Check the job in the YARN web UI.

  • Open the HDFS web console.
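The PI job can be submitted with the examples jar that ships with Hadoop. The jar name below matches the CDH 5.15.1 release used in this article, but verify the exact filename under $HADOOP_HOME/share/hadoop/mapreduce on your installation:

```shell
# 10 map tasks, 100 samples each
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.15.1.jar pi 10 100
```

While the job runs it is visible in the YARN web UI at http://hadoop000:8088, and the HDFS console is at http://hadoop000:50070 (the Hadoop 2.x defaults).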

Miscellaneous

In a test environment you can simply turn the firewall off: systemctl stop firewalld.service (and systemctl disable firewalld.service to keep it off after a reboot).