Metric System

SparkEnv, Master, and Worker each create a MetricsSystem at startup. Sources are registered with the MetricsSystem, which relies on the MetricRegistry from Dropwizard's Metrics library. Sinks report the metrics collected from the sources to an external destination.
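The relationship between sources, the registry, and sinks can be sketched with a toy model. This is not Spark's or Dropwizard's API: every class here (ToyRegistry, ToySink, ToyMetricSystem) and the metric name receivedRecords are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List


class ToyRegistry:
    """Hypothetical stand-in for a metric registry: names mapped to gauges."""

    def __init__(self) -> None:
        self.gauges: Dict[str, Callable[[], float]] = {}

    def register(self, name: str, gauge: Callable[[], float]) -> None:
        self.gauges[name] = gauge


class ToySink:
    """Hypothetical sink: snapshots every gauge each time report() is called."""

    def __init__(self, registry: ToyRegistry) -> None:
        self.registry = registry
        self.reports: List[Dict[str, float]] = []

    def report(self) -> None:
        snapshot = {name: gauge() for name, gauge in self.registry.gauges.items()}
        self.reports.append(snapshot)


class ToyMetricSystem:
    """Owns one registry; sources register into it and sinks read from it."""

    def __init__(self) -> None:
        self.registry = ToyRegistry()
        self.sinks: List[ToySink] = []

    def register_source(self, source_name: str,
                        gauges: Dict[str, Callable[[], float]]) -> None:
        # Prefix each metric with the source name to keep names unique.
        for metric_name, gauge in gauges.items():
            self.registry.register(f"{source_name}.{metric_name}", gauge)

    def add_sink(self) -> ToySink:
        sink = ToySink(self.registry)
        self.sinks.append(sink)
        return sink

    def report(self) -> None:
        for sink in self.sinks:
            sink.report()


# Usage: one source exposing a single gauge, one sink polled twice.
state = {"records": 0}
system = ToyMetricSystem()
system.register_source("streaming", {"receivedRecords": lambda: state["records"]})
sink = system.add_sink()
system.report()
state["records"] = 42
system.report()
```

The point of the sketch is that a sink never talks to a source directly; both sides only know about the shared registry.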

Metric sources can be configured through metrics.properties. Some useful sources include:

  • StreamingSource: keeps track of the number of received records, completed batches, etc.
  • ExecutorSource: keeps track of executor metrics
  • BlockManagerSource: keeps track of BlockManager metrics

Sinks can also be configured through metrics.properties. Some useful sinks include:

  • GraphiteSink: sends metric data to Graphite, using Dropwizard's GraphiteReporter internally
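As a rough illustration, a metrics.properties fragment that enables a Graphite sink for all instances and a JVM source for the master might look like the following. The host and port values are placeholders, not values taken from this document, and the set of supported options should be checked against the metrics.properties.template shipped with your Spark version.

```properties
# Syntax: [instance].sink|source.[name].[options]=[value]
# "*" applies the setting to every instance (master, worker, driver, executor, ...).
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
# Placeholder endpoint: replace with your own Graphite host and port.
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds

# Enable the JVM source for the master instance only.
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```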

Accumulators

Accumulators are created through SparkContext. Once created, they can be used inside the functions passed to RDD operations. When the DAGScheduler submits tasks (each task represents one partition in a stage), those accumulators are serialised into the task binary as well.

When resources are offered and a task is launched, the Executor deserialises the task binary. When the deserialiser encounters an Accumulator, its readObject() method registers it with the Executor. Once the task finishes, the Executor sends a completion event to the DAGScheduler that carries the latest Accumulator values. These updates are then merged to obtain the aggregated value.

One consequence is that if a task takes a long time to finish, the aggregated value of its Accumulators is not updated until that task completes.
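The flow described above can be mimicked with a small simulation. This is a toy model rather than Spark's API: ToyAccumulator, run_task, and merge_updates are hypothetical names, and the assumption that each task starts from a zero-valued local copy is made here for illustration.

```python
from typing import Dict, List


class ToyAccumulator:
    """Hypothetical additive accumulator identified by an id."""

    def __init__(self, acc_id: int, value: int = 0) -> None:
        self.acc_id = acc_id
        self.value = value

    def add(self, amount: int) -> None:
        self.value += amount


def run_task(acc_id: int, partition: List[int]) -> Dict[int, int]:
    """Executor side: update a local copy, report it only on completion."""
    local = ToyAccumulator(acc_id)  # assumption: local copy starts at zero
    for record in partition:
        local.add(record)
    # The partial value leaves the task only here, as part of its completion event.
    return {acc_id: local.value}


def merge_updates(driver_accs: Dict[int, ToyAccumulator],
                  updates: Dict[int, int]) -> None:
    """Driver side: fold one finished task's partial values into the totals."""
    for acc_id, partial in updates.items():
        driver_accs[acc_id].add(partial)


total = ToyAccumulator(acc_id=0)
driver_accs = {total.acc_id: total}
partitions = [[1, 2, 3], [4, 5], [6]]  # one task per partition

seen: List[int] = []
for partition in partitions:
    updates = run_task(total.acc_id, partition)
    merge_updates(driver_accs, updates)
    seen.append(total.value)  # driver-side value after each task completes
```

In this model the driver-side value only moves in steps, one per finished task, which is why a long-running task contributes nothing to the visible total until it completes.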

WebUI

The Spark WebUI is built on a Jetty server. SparkUI is created when SparkContext starts. It initialises all handlers and starts a Jetty server to handle requests, serving both static and dynamic pages.

The Spark UI has several tabs, including JobsTab, StagesTab, StorageTab, EnvironmentTab, and ExecutorsTab. Each tab contains several pages; JobsTab, for example, has AllJobsPage and JobPage. When a tab is created, its pages are attached to it, and the tabs are later attached to the Spark UI. During attachment, the server handlers are created and registered with the Jetty server.
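The attach sequence (pages to tabs, tabs to the UI, handlers into the server) can be sketched as follows. This is a toy model with hypothetical classes (ToyPage, ToyTab, ToyUI); a plain dictionary stands in for the Jetty handler collection, and the URL prefixes are illustrative assumptions.

```python
from typing import Callable, Dict, List


class ToyPage:
    """Hypothetical page: a path prefix plus a render function."""

    def __init__(self, prefix: str, render: Callable[[], str]) -> None:
        self.prefix = prefix
        self.render = render


class ToyTab:
    """Hypothetical tab: a path prefix plus the pages attached to it."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.pages: List[ToyPage] = []

    def attach_page(self, page: ToyPage) -> None:
        self.pages.append(page)


class ToyUI:
    """Hypothetical UI: attaching a tab creates one handler per page."""

    def __init__(self) -> None:
        self.tabs: List[ToyTab] = []
        self.handlers: Dict[str, Callable[[], str]] = {}  # stand-in for the server

    def attach_tab(self, tab: ToyTab) -> None:
        self.tabs.append(tab)
        for page in tab.pages:
            # Join non-empty prefixes, so an empty page prefix maps to the tab root.
            parts = [p for p in (tab.prefix, page.prefix) if p]
            self.handlers["/" + "/".join(parts)] = page.render

    def handle(self, path: str) -> str:
        handler = self.handlers.get(path)
        return handler() if handler else "404 Not Found"


# Usage: pages are attached to the tab first, then the tab to the UI.
jobs_tab = ToyTab("jobs")
jobs_tab.attach_page(ToyPage("", lambda: "all jobs"))
jobs_tab.attach_page(ToyPage("job", lambda: "one job"))

ui = ToyUI()
ui.attach_tab(jobs_tab)
```

In this sketch a page attached to a tab after the tab has been attached to the UI gets no handler, which is why the order of attachment in the usage lines matters.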
