Job Submission

We can submit and run a Spark job with the spark-submit script in bin/. Internally, this script invokes the Main class in the Launcher module. The Main class is responsible for building the java command and its arguments, including the entry class used to submit Spark jobs. For example, when we use spark-submit to launch a job, org.apache.spark.deploy.SparkSubmit is the class that finishes the task.
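As a concrete illustration, a submission might look like the following; the application class and jar names are placeholders, not something from the source:

```shell
# Example invocation (com.example.MyApp and my-app.jar are placeholders).
# Under the hood, bin/spark-submit runs org.apache.spark.launcher.Main,
# which assembles the final java command with
# org.apache.spark.deploy.SparkSubmit as the entry class.
./bin/spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --deploy-mode cluster \
  my-app.jar
```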

SparkSubmit also has a main function. If the action is to submit a job, the SparkSubmit class decides which client to use based on the master URL and the deploy mode. For a standalone cluster, org.apache.spark.deploy.rest.RestSubmissionClient is used to submit the job. For a YARN cluster, org.apache.spark.deploy.yarn.Client takes that responsibility. For a Mesos cluster, jobs can only be submitted to the master through the REST API.

After the running mode has been resolved, runMain in SparkSubmit tries to submit the job. For example, in YARN cluster mode, the run method of org.apache.spark.deploy.yarn.Client is invoked to submit the application to the YARN ResourceManager, which allocates a node for the application master, i.e. the Spark driver. org.apache.spark.deploy.yarn.ApplicationMaster is the main entry point for the YARN application master. As with other YARN applications, all parameters, environment variables, and GC arguments are assembled into a launch command, and that command is executed by a YARN NodeManager to start the application master, a.k.a. the Spark driver.
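The mapping from master URL to submission client can be sketched as a small shell function. This is only an illustration of the dispatch described above, not Spark's actual code; the real selection logic lives inside SparkSubmit itself:

```shell
# Hypothetical sketch of how the master URL determines the submission
# client. Spark performs this selection internally in SparkSubmit; the
# function below only mirrors the mapping for illustration.
pick_submit_client() {
  case "$1" in
    spark://*) echo "org.apache.spark.deploy.rest.RestSubmissionClient" ;;  # standalone cluster
    yarn*)     echo "org.apache.spark.deploy.yarn.Client" ;;                # YARN cluster
    mesos://*) echo "org.apache.spark.deploy.rest.RestSubmissionClient" ;;  # Mesos: REST API only
    *)         echo "unknown master URL" ;;
  esac
}

pick_submit_client "yarn"                 # prints org.apache.spark.deploy.yarn.Client
pick_submit_client "spark://master:6066"  # prints org.apache.spark.deploy.rest.RestSubmissionClient
```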

In different running modes, different classes and clients are used to submit the job. After submission, it is the driver's role to schedule the job. In the following chapter, I will use the YARN application master to illustrate how a Spark job runs on a YARN cluster.
