Amazon Linux AMI: overriding dhclient-script to ignore NTP

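We use EMR clusters on Amazon Web Services (AWS), with the default Amazon Linux AMI image and no customization. dhclient-script appears to pull configuration from the company DHCP (Dynamic Host Configuration Protocol) server, in particular the NTP (Network Time Protocol) settings.

For example, on the master node, dhclient-script appends the company NTP servers to the /etc/ntp.conf file: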

[hadoop@ip-10-5-21-157 ~]$ grep ^server /etc/ntp.conf 
server 0.amazon.pool.ntp.org iburst
server 1.amazon.pool.ntp.org iburst
server 2.amazon.pool.ntp.org iburst
server 3.amazon.pool.ntp.org iburst
server 10.2.78.21   # added by /sbin/dhclient-script
server 10.2.78.22   # added by /sbin/dhclient-script
server 10.2.78.23   # added by /sbin/dhclient-script
server 10.2.78.24   # added by /sbin/dhclient-script

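The IP addresses 10.2.78.21-24 resolve to clockNN.ntp.mycompany.com.

How can I prevent this so that Amazon's default settings are used?

Edit: The problem showed up while running a Pig aggregation on the EMR cluster. Here is an example exception stack trace: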

18/01/07 13:50:23 INFO tez.TezJob: DAG Status: status=FAILED, progress=TotalTasks: 4737 Succeeded: 3777 Running: 0 Failed: 1 Killed: 959 FailedTaskAttempts: 428 KilledTaskAttempts: 309, diagnostics=Vertex failed, vertexName=scope-421, vertexId=vertex_1515326570070_0001_1_04, diagnostics=[Task failed, taskId=task_1515326570070_0001_1_04_002846, diagnostics=[TaskAttempt 0 failed, info=[Container launch failed for container_1515326570070_0001_01_000599 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1515332813920 found 1515330236564
Note: System times on machines may be out of sync. Check system time and time zones.
       at sun.reflect.GeneratedConstructorAccessor51.newInstance(Unknown Source)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
       at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
       at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
       at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$Container.launch(TezContainerLauncherImpl.java:160)
       at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$EventProcessor.run(TezContainerLauncherImpl.java:353)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)
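
The two timestamps in the error message are milliseconds since the Unix epoch, so the gap between them can be worked out directly. A minimal sketch in shell arithmetic; the helper name `gap_ms` is hypothetical, and the values are copied from the message above:

```bash
#!/bin/bash
# gap_ms is a hypothetical helper: prints the difference between two
# epoch-millisecond timestamps.
gap_ms() {
    echo $(( $1 - $2 ))
}

current_ms=1515332813920   # "current time is ..." from the error message
found_ms=1515330236564     # "found ..." from the error message

diff_ms=$(gap_ms "$current_ms" "$found_ms")
diff_s=$(( diff_ms / 1000 ))
printf 'gap: %d ms (about %d min %d s)\n' "$diff_ms" $(( diff_s / 60 )) $(( diff_s % 60 ))
```

The node that rejected the token saw a current time roughly 43 minutes later than the timestamp found in the token, which fits the hint that the machine clocks may be out of sync.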

The root cause of the system time being off on the EMR machines (VM, image, node?) could be the company DNS server. (This is a wild guess.) One way to rule this out is to remove those NTP servers from the /etc/ntp.conf file and reinitialize the system time.
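
Before removing anything, the entries that dhclient-script added can be listed by the comment it appends to each line. A minimal sketch; the function name `list_dhclient_ntp_servers` is hypothetical, and it assumes the tag text is exactly the `# added by /sbin/dhclient-script` seen in the grep output above:

```bash
#!/bin/bash
# Hypothetical helper: print the server lines that carry the tag
# dhclient-script appends (tag text assumed from the grep output above).
list_dhclient_ntp_servers() {
    local conf="${1:-/etc/ntp.conf}"
    grep -E '^server .*# added by /sbin/dhclient-script' "$conf"
}

# Only run against the real file when it exists and is readable.
if [ -r /etc/ntp.conf ]; then
    list_dhclient_ntp_servers /etc/ntp.conf || echo "no dhclient-added servers found"
fi
```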

— 허치

After some research, I came up with the following.

I created the file modify_ntp_config.sh, to be stored in S3:

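#!/bin/bash

set -eEu

# Path of the ntp config to edit; falls back to a local file named example_config.
ntp_config_file="${1:-example_config}"
echo "Removing 'server 10.*' entries from \"$ntp_config_file\""
# Anchor the pattern and escape the dot so only "server 10.x.x.x" lines are deleted.
sudo sed -i -e '/^server 10\./d' "$ntp_config_file"
echo "Reinitialize ntp"
# Stop ntpd so ntpdate can set the clock, then start it again.
sudo service ntpd stop
sudo ntpdate -s time.nist.gov
sudo service ntpd start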
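
Because the script falls back to a file named `example_config` when no argument is given, the sed step can be tried on a scratch copy before touching `/etc/ntp.conf`. A minimal sketch, assuming GNU sed and sample contents taken from the grep output in the question; the function name `strip_private_ntp_servers` is hypothetical, and it uses the anchored pattern `^server 10\.`:

```bash
#!/bin/bash
# Hypothetical helper: delete "server 10.x.x.x" lines from the given file in place.
# Assumes GNU sed, as found on Amazon Linux.
strip_private_ntp_servers() {
    sed -i -e '/^server 10\./d' "$1"
}

# Build a scratch file that mimics the server lines from the question.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
server 0.amazon.pool.ntp.org iburst
server 1.amazon.pool.ntp.org iburst
server 10.2.78.21   # added by /sbin/dhclient-script
server 10.2.78.22   # added by /sbin/dhclient-script
EOF

strip_private_ntp_servers "$scratch"
cat "$scratch"     # only the amazon.pool.ntp.org lines should remain
rm -f "$scratch"
```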

I copied this file to S3:

$ aws s3 cp /var/tmp/modify_ntp_config.sh \
    s3://<s3-bucket-name>/data/scripts/modify_ntp_config.sh

Then, using the aws CLI tools:

aws emr create-cluster --name "..." [...cluster create options ...] \
    --steps \
Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://<region>.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://<s3-bucket-name>/data/scripts/modify_ntp_config.sh","/etc/ntp.conf"]

This produced the following log output (copied from S3 to local disk):

$ aws s3 cp --recursive s3://<s3-bucket-name>/log/<cluster-id>/steps/<step-id>/ /var/tmp/5HKO7
download: s3://[...]/stdout.gz to ../../var/tmp/5HKO7/stdout.gz
download: s3://[...]/stderr.gz to ../../var/tmp/5HKO7/stderr.gz
download: s3://[...]/controller.gz to ../../var/tmp/5HKO7/controller.gz

$ zcat /var/tmp/5HKO7/stdout.gz 
Downloading 's3://<s3-bucket-name>/data/scripts/modify_ntp_config.sh' to '/mnt/var/lib/hadoop/steps/[...]/.'
Removing 'server 10.*' entries from "/etc/ntp.conf"
Reinitialize ntp
Shutting down ntpd: [  OK  ]
Starting ntpd: [  OK  ]

$ zcat /var/tmp/5HKO7/stderr.gz 
Command exiting with ret '0'
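
The check on the step's exit status can be scripted as well. A minimal sketch, assuming the step's stderr.gz contains a status line of the `Command exiting with ret '0'` form shown above; the function name `step_succeeded` is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the gzipped stderr log of a step
# reports exit status 0 (line format assumed from the output above).
step_succeeded() {
    gzip -dc "$1" | grep -q "Command exiting with ret '0'"
}

log=/var/tmp/5HKO7/stderr.gz
if [ -f "$log" ]; then
    if step_succeeded "$log"; then
        echo "step succeeded"
    else
        echo "step failed or status line missing"
    fi
fi
```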

Note: another option is to run the script on an already-running EMR cluster with aws emr add-steps:

$ aws emr add-steps --cluster-id "j-<emr_cluster_id>"\
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://<region>.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://<s3-bucket-name>/data/scripts/modify_ntp_config.sh","/etc/ntp.conf"]

References:
https://docs.aws.amazon.com/emr/latest/DeveloperGuide//emr-hadoop-script.html
https://docs.aws.amazon.com/cli/latest/reference/emr/add-steps.html
https://askubuntu.com/questions/254826/how-to-force-a-clock-update-using-ntp
https://unix.stackexchange.com/questions/158802/how-to-update-ntp-out-shutting-down-the-ntp-daemon

— 허치