You can log in to any EMR server and run the following command to view task logs:
yarn logs -applicationId application_1507732460084_0057
Note: You need to run this command as the Hadoop user. To view the logs of a task submitted by another user, add the -appOwner <username> parameter.
To view the cause of a task exception, run the following command:
yarn logs -applicationId application_1507732460084_0057|grep -A20 Exception
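The two commands above can be combined in a small shell function. This is only a sketch: task_exceptions is a hypothetical helper name, and its optional owner and context-line arguments are assumptions added for illustration. The yarn logs options are the ones already shown above.

```shell
# Hypothetical helper: print exception context from the logs of one YARN application.
#   $1  application ID
#   $2  (optional) user who submitted the task, passed as -appOwner
#   $3  (optional) number of lines to print after each match, default 20
task_exceptions() {
  local app_id="$1"
  local owner="${2:-}"
  local context="${3:-20}"
  if [ -n "$owner" ]; then
    yarn logs -applicationId "$app_id" -appOwner "$owner"
  else
    yarn logs -applicationId "$app_id"
  fi | grep -A "$context" Exception
}

# Usage (run on an EMR server):
#   task_exceptions application_1507732460084_0057
#   task_exceptions application_1507732460084_0057 someuser 40
```

Raising the context value is useful when the stack trace is longer than 20 lines.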
Cluster computing resources are determined by the following two configuration items in yarn-site.xml:
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>14745</value>
</property>
By default, cpu-vcores is equal to the number of CPU cores of the server, and memory-mb is equal to 91% of the server's memory size. You can adjust them based on your actual needs, but setting them too high puts the server at risk of failure.
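As a rough illustration of these defaults, the sketch below computes the two values for one node. The function name suggest_yarn_resources and the example node size are assumptions, not part of EMR. The result is rounded down to a whole MB, and the actual default on a real server may differ slightly because the operating system reports somewhat less memory than the nominal size.

```shell
# Hypothetical helper: print suggested yarn-site.xml values for one node.
#   $1  number of CPU cores
#   $2  total memory in MB
#   $3  (optional) percentage of memory given to YARN, default 91
suggest_yarn_resources() {
  local cores="$1"
  local total_mb="$2"
  local pct="${3:-91}"
  # Integer arithmetic: the result is rounded down to a whole MB.
  local yarn_mb=$(( total_mb * pct / 100 ))
  echo "yarn.nodemanager.resource.cpu-vcores=${cores}"
  echo "yarn.nodemanager.resource.memory-mb=${yarn_mb}"
}

# Example: a node with 8 cores and 32,768 MB of memory (assumed figures).
suggest_yarn_resources 8 32768
```

Passing a lower percentage as the third argument leaves more memory for the operating system and other services on the node.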
If an out of memory error occurs when you are submitting a MapReduce task or running a SQL script through Hive, fix it by setting the following parameters:
set mapreduce.map.java.opts=-Xmx4096m;
set mapreduce.reduce.java.opts=-Xmx4096m;
The memory value can be adjusted based on your actual computation needs. These settings can also be written to the ~/.hiverc file, so that they are applied automatically each time the Hive CLI starts.
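One possible way to persist these settings is sketched below. The function add_hive_heap_opts is a hypothetical helper, and its arguments (target file and heap size in MB) are assumptions for illustration. It appends each line only if it is not already present, so running it twice does not duplicate entries.

```shell
# Hypothetical helper: append the two heap settings to a Hive rc file.
#   $1  (optional) rc file path, default ~/.hiverc
#   $2  (optional) heap size in MB, default 4096
add_hive_heap_opts() {
  local rc_file="${1:-$HOME/.hiverc}"
  local heap_mb="${2:-4096}"
  local line
  for line in \
    "set mapreduce.map.java.opts=-Xmx${heap_mb}m;" \
    "set mapreduce.reduce.java.opts=-Xmx${heap_mb}m;"
  do
    # Append only when the exact line is not already in the file.
    grep -qxF -- "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
  done
}

# Usage:
#   add_hive_heap_opts                  # writes -Xmx4096m to ~/.hiverc
#   add_hive_heap_opts ~/.hiverc 8192   # uses -Xmx8192m instead
```

If the file already contains these settings with a different heap size, edit the existing lines by hand rather than appending a second pair.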
Suppose that you need to run a SQL query. If 64 vcores and 128 GB of memory are needed to return the query result within the specified time period, and the business requires 10 concurrent queries, then the required resources are 640 vcores and 1,280 GB of memory. If each server has 24 cores and 48 GB of memory, you need around 1280 / 48 ≈ 27 servers (640 / 24 gives the same figure).
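The estimate above can be reproduced with a few lines of shell arithmetic. This is a minimal sketch: servers_needed and ceil_div are hypothetical helper names, and the calculation is a rough lower bound that, like the estimate above, ignores the memory reserved for the operating system on each node.

```shell
# Round-up integer division: ceil_div 10 3 prints 4.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical helper: estimate how many servers a workload needs.
#   $1  vcores per query      $2  memory (GB) per query
#   $3  concurrent queries    $4  vcores per server
#   $5  memory (GB) per server
servers_needed() {
  local total_vcores=$(( $1 * $3 ))
  local total_mem_gb=$(( $2 * $3 ))
  local by_cpu by_mem
  by_cpu="$(ceil_div "$total_vcores" "$4")"
  by_mem="$(ceil_div "$total_mem_gb" "$5")"
  # The scarcer resource decides the server count.
  if [ "$by_cpu" -gt "$by_mem" ]; then
    echo "$by_cpu"
  else
    echo "$by_mem"
  fi
}

# The example from the text: 64 vcores / 128 GB per query, 10 concurrent
# queries, servers with 24 cores and 48 GB each.
servers_needed 64 128 10 24 48   # prints 27
```

Taking the larger of the CPU-based and memory-based counts matters when the workload's vcore-to-memory ratio differs from that of the server specification.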
The default query in Hive is as follows:
select * from tablename where a='1' limit 10;
The default query does not start a computation task. You can start a distributed query by running set hive.fetch.task.conversion=none; before the query.
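For a one-off query from the command line, the setting and the query can be passed together to hive -e. The wrapper below is only a sketch, and run_distributed_query is a hypothetical name.

```shell
# Hypothetical helper: run one Hive query with fetch-task conversion disabled,
# so that even a simple select is executed as a distributed task.
#   $1  the query text, including its trailing semicolon
run_distributed_query() {
  hive -e "set hive.fetch.task.conversion=none; $1"
}

# Usage:
#   run_distributed_query "select * from tablename where a='1' limit 10;"
```

Because the setting is passed with the query, it applies only to that invocation and does not change the default for other sessions.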
An EMR cluster supports the following storage media: HDD local disk, SSD local disk, HDD cloud disk, SSD cloud disk, and COS. You can choose the most appropriate one based on your actual needs.
You can log in to the cluster after enabling the public IP as instructed in Logging in to Clusters.
LDAP authentication is subject to the product version. On v2.3.0 and later, it is supported and enabled by default and cannot be disabled. It is not supported on earlier versions.
No.
At present, there are three releases in an EMR ClickHouse cluster, and ClickHouse has been upgraded in each release. For more information, see Version Overview.
Components can be directly installed in the console.
Yes.
No. If this affects your business, submit a ticket for assistance.
When creating a cluster, you can use bootstrap actions to set a custom bootstrap script to achieve this (after node initialization, before cluster start, or after cluster start). For more information, see Bootstrap Action.
EMR provides six types of clusters for you to choose from based on your business needs. For more information, see Business Evaluation.