dataproc/Check Port Exhaustion
Verify whether port exhaustion has occurred.
Product: Cloud Dataproc
Step Type: COMPOSITE STEP
Description
None
Failure Reason
Log messages related to “{log}” were found on the cluster: {cluster_name}.
Failure Remediation
This issue occurs when Spark jobs cannot find an available port after 1000 retries. CLOSE_WAIT connections are a possible cause. To identify CLOSE_WAIT connections, analyze the netstat output:
- Run `netstat -plant >> open_connections.txt`.
- Run `cat open_connections.txt | grep "CLOSE_WAIT"`.
If the CLOSE_WAIT connections belong to a specific application, restart that application. Alternatively, restart the master node to release the affected connections.
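To see which application owns the CLOSE_WAIT connections, the saved netstat output can be grouped by process. The following is a minimal sketch, not part of the runbook itself: the function name `count_close_wait` is hypothetical, and it assumes the usual `netstat -plant` column layout on Linux, with the connection state in column 6 and `PID/Program name` in column 7.

```shell
# Hypothetical helper: count CLOSE_WAIT connections per owning process.
# Assumes `netstat -plant` output: state in column 6, PID/Program in column 7.
count_close_wait() {
  # $1 is the file holding the saved netstat output.
  awk '$6 == "CLOSE_WAIT" { count[$7]++ }
       END { for (p in count) print count[p], p }' "$1" | sort -rn
}

# Usage (after saving the netstat output as described above):
#   count_close_wait open_connections.txt
```

Each output line holds a connection count followed by the `PID/Program name` value, with the largest count first. A process shown as `-` usually means netstat was run without sufficient privileges to see the owner.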
Success Reason
No log messages related to “{log}” were found on the cluster: {cluster_name}.