The following table lists the minimum and optimal hardware requirements for the Hadoop cluster:
| Requirement | Sandbox Deployment | Basic or Standard Deployment | Advanced Deployment |
|---|---|---|---|
| CPU speed | 2.5 - 3.5 GHz | 2.5 - 3.5 GHz | 2.5 - 3.5 GHz |
| Logical or virtual CPU cores | | | |
| Total system memory | | | |
| Local disk space for yarn.nodemanager.local-dirs | | | |
| DFS block size | | | |
| HDFS replication factor | | | |
| Disk capacity | | 256 GB - 1 TB | |
| Total number of disks for HDFS | | | |
| Total HDFS capacity per node | | | At least 14 TB |
| Number of nodes | | | |
| Total HDFS capacity on the cluster | | | |
| Actual HDFS capacity (with replication) | | | |
| Installation disk space requirement | | | |
| Network bandwidth | | 2 Gbps (bonded channel) | 10 Gbps (Ethernet card) |
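Note that the actual HDFS capacity is lower than the raw capacity because HDFS stores each block multiple times according to the replication factor: usable capacity is roughly the total raw HDFS capacity divided by the replication factor. For example, with the default replication factor of 3, about 42 TB of raw HDFS capacity provides roughly 14 TB of usable storage.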
The yarn.nodemanager.local-dirs property in the yarn-site.xml file contains a list of directories that store localized files. You can find the localized file directory in ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. You can find the work directories of individual containers, container_${contid}, as subdirectories of the localized file directory.
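As a point of reference, the following is a minimal sketch of how this property might be defined in yarn-site.xml. The directory paths /disk1/yarn/local and /disk2/yarn/local are hypothetical placeholders, not values recommended by this document:

```xml
<!-- yarn-site.xml fragment (goes inside the <configuration> element). -->
<!-- Comma-separated list of directories where the NodeManager stores -->
<!-- localized files and container work directories. The paths below -->
<!-- are illustrative placeholders only. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/disk1/yarn/local,/disk2/yarn/local</value>
</property>
```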
MapR Cluster Recommendation
When you run mappings on the Blaze, Spark, or Hive engine, local cache files are generated under the directory specified in the yarn.nodemanager.local-dirs property in the yarn-site.xml file. On a MapR cluster, however, this directory might not have sufficient disk capacity.
To make sure that the directory has sufficient disk capacity, perform the following steps:
1. Create a volume on HDFS.
2. Mount the volume through NFS.
3. Configure the NFS mount location in yarn.nodemanager.local-dirs, as shown in the sketch after these steps.
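The following is a minimal sketch of what step 3 might look like once a volume has been created and mounted through NFS. The mount point /mnt/yarn-local is a hypothetical placeholder for the NFS mount location on each node, and the NodeManager typically must be restarted for the change to take effect:

```xml
<!-- yarn-site.xml fragment (goes inside the <configuration> element). -->
<!-- Point the NodeManager local directories at the NFS-mounted volume -->
<!-- so that cache files generated by mappings have enough disk capacity. -->
<!-- /mnt/yarn-local is an illustrative placeholder. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/yarn-local</value>
</property>
```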