Traffic Flow Diagram

Broadcast

When calling SparkContext.broadcast(), the value will be wrapped in Broadcast object, which is created by BroadcastManager through BroadcastFactory. The BlockManager in the driver will save the broadcasted value.

Access Broadcast Value

User codes can get the broadcasted through Broadcast.value(). It will trigger getValue() in Broadcast, which will propagate the call to local BlockManager with current Executor. The red traffic flow in the above diagram indicates how the data is fetched.

The first step is check the broadcasted value in local BlockManager by broadcasted id. Each broadcast value has its unique broadcast id. The StorageLevel for broadcasted value is MEMORY_AND_DISK_SER, meaning the data is serialized and saved into memory if the memory space is enough, or save onto disk if not. So we can see that while reading data from local BlockManager, Spark tries to get data from local through getLocalValues, which will access MemoryStore or DiskStore if it exists.

Once broadcasted data is not available locally, local BlockManager tries to get the location of broadcasted data from Driver / Master. When calling SparkContext.broadcast(), the broadcasted value with its broadcast id is registered into BlockManagerMaster, which keeps records of all registered broadcasted values. When the location is identified, the BlockTransferServer will fetch data from another BlockManager containing the broadcasted value and save it locally. Currently, there is only on implementation for BlockTransferServer - NettyBlockTransferServer.

results matching ""

    No results matching ""