Tuning and Sizing Guidelines for Data Engineering Integration (10.4.x)

The following guidelines list the minimum and optimal hardware requirements for the Hadoop cluster across three deployment sizes: Sandbox, Basic or Standard, and Advanced. Size each of the following attributes for the deployment type that you plan to run:

- CPU speed, typically 2.5 - 3.5 GHz
- Logical or virtual CPU cores
- Total system memory
- Local disk space for yarn.nodemanager.local-dirs (described below)
- DFS block size
- HDFS replication factor
- Disk capacity, for example 256 GB - 1 TB
- Total number of disks for HDFS
- Total HDFS capacity per node, at least 14 TB at the high end
- Number of nodes
- Total HDFS capacity on the cluster
- Actual HDFS capacity (with replication)
- /tmp mount point
- Installation disk space requirement
- Network bandwidth: 2 Gbps (bonded channel) or 10 Gbps (Ethernet card), depending on deployment size
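The capacity figures are related by straightforward arithmetic: raw HDFS capacity per node is the number of HDFS disks multiplied by the capacity of each disk, raw capacity on the cluster is the per-node capacity multiplied by the number of nodes, and the actual capacity divides the raw capacity by the HDFS replication factor, because every block is stored that many times. As an illustrative example, not a documented requirement: with a replication factor of 3, ten nodes with eight 1 TB disks each provide 80 TB of raw HDFS capacity on the cluster and roughly 26.7 TB of actual capacity.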

yarn.nodemanager.local-dirs is a property in yarn-site.xml that contains the list of directories where YARN stores localized files. You can find an application's localized file directory in ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. The work directories of individual containers, container_${contid}, are subdirectories of the localized file directory.
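A minimal yarn-site.xml entry for this property might look like the following sketch. The directory paths are examples only; point the property at local disks with enough free space for your deployment size:

    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/data1/yarn/local,/data2/yarn/local</value>
    </property>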

MapR Cluster Recommendation

When you run mappings on the Blaze, Spark, or Hive engine, local cache files are generated under the directories specified in the yarn.nodemanager.local-dirs property in yarn-site.xml. However, on a MapR cluster those directories might not have sufficient disk capacity.

To make sure that the directories have sufficient disk capacity, perform the following steps (a sketch of the commands follows this list):

1. Create a volume on HDFS.

2. Mount the volume through NFS.

3. Configure the NFS mount location in yarn.nodemanager.local-dirs.
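The following is a minimal sketch of those steps, assuming a MapR cluster named my.cluster.com with an NFS gateway available; the volume name, mount point, and mount options are hypothetical and must be adjusted for your environment:

    # 1. Create a volume for the YARN local directories (volume name and path are examples).
    maprcli volume create -name yarn-local-vol -path /yarn-local

    # 2. Mount the volume through the NFS gateway (gateway host, cluster name, and mount point are examples).
    mkdir -p /mnt/yarn-local
    mount -o nolock <nfs-gateway-host>:/mapr/my.cluster.com/yarn-local /mnt/yarn-local

    # 3. Point yarn.nodemanager.local-dirs in yarn-site.xml at the NFS mount location.
    #    <property>
    #      <name>yarn.nodemanager.local-dirs</name>
    #      <value>/mnt/yarn-local</value>
    #    </property>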