Off-heap

Setting spark.memory.offHeap.enabled to true lets Spark leverage off-heap memory for both execution and storage.
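For example, off-heap can be enabled when building a session; a minimal sketch, where the app name, the local master, and the 2g size are placeholder values:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: enable off-heap memory when building a session.
// The app name, master, and the 2g size are placeholder example values.
val spark = SparkSession.builder()
  .appName("offheap-demo")
  .master("local[*]")
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "2g")
  .getOrCreate()
```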

Parameters

spark.memory.offHeap.enabled: Enables off-heap memory use

spark.memory.offHeap.size: The total amount of off-heap memory; must be set to a positive value when off-heap is enabled

spark.memory.storageFraction: The fraction of the unified memory region reserved for block manager storage; this portion is immune to eviction by execution

The off-heap memory available to execution is spark.memory.offHeap.size * (1 - spark.memory.storageFraction); see the worked example after this parameter list.

spark.executor.memory: The amount of heap memory allocated to each executor

spark.memory.fraction: The fraction of heap memory used for the unified execution and storage region
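A quick worked example of the off-heap split described above (the 2 GB size is a placeholder; 0.5 is the default storage fraction):

```scala
// Example: how spark.memory.offHeap.size is split between storage and
// execution, per the formula in the parameter list above.
val offHeapSize: Long = 2L * 1024 * 1024 * 1024 // spark.memory.offHeap.size = 2g
val storageFraction   = 0.5                     // spark.memory.storageFraction (default)

val offHeapStorageMemory   = (offHeapSize * storageFraction).toLong // 1 GB for storage
val offHeapExecutionMemory = offHeapSize - offHeapStorageMemory     // 1 GB for execution
```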

MemoryManager

UnifiedMemoryManager: The storage and execution memory share a unified region and can borrow from each other.

  • max heap memory: (Runtime.getRuntime.maxMemory - 300 MB reserved system memory) * spark.memory.fraction
  • max on-heap storage memory: max heap memory * spark.memory.storageFraction
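A simplified Scala sketch of this sizing (the 300 MB reserved constant and the 0.6 / 0.5 defaults come from Spark's UnifiedMemoryManager; treat this as an illustration rather than the exact implementation):

```scala
// Simplified sketch of UnifiedMemoryManager sizing. Spark reserves
// 300 MB of heap for the system before applying spark.memory.fraction.
val systemMemory    = Runtime.getRuntime.maxMemory
val reservedMemory  = 300L * 1024 * 1024   // RESERVED_SYSTEM_MEMORY_BYTES
val memoryFraction  = 0.6                  // spark.memory.fraction (default)
val storageFraction = 0.5                  // spark.memory.storageFraction (default)

val maxHeapMemory    = ((systemMemory - reservedMemory) * memoryFraction).toLong
val maxOnHeapStorage = (maxHeapMemory * storageFraction).toLong
// Execution may evict cached blocks to borrow from the storage side,
// and storage may borrow unused execution memory.
```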

StaticMemoryManager (the legacy manager, the default before Spark 1.6): Unlike UnifiedMemoryManager, the storage and execution pools are sized independently and cannot be shared.

  • max on-heap execution memory: Runtime.getRuntime.maxMemory * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction
  • max on-heap storage memory: Runtime.getRuntime.maxMemory * spark.storage.memoryFraction * spark.storage.safetyFraction
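A corresponding sketch for StaticMemoryManager, using its default fractions (0.2 and 0.8 for shuffle, 0.6 and 0.9 for storage):

```scala
// Sketch of StaticMemoryManager sizing: the execution (shuffle) and
// storage pools are fixed at startup and independent of each other.
val systemMemory = Runtime.getRuntime.maxMemory

val maxExecutionMemory =
  (systemMemory * 0.2 * 0.8).toLong // spark.shuffle.memoryFraction * spark.shuffle.safetyFraction
val maxStorageMemory =
  (systemMemory * 0.6 * 0.9).toLong // spark.storage.memoryFraction * spark.storage.safetyFraction
```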

spark.storage.unrollMemoryThreshold: Initial memory to request before unrolling any block (default: 1 MB)

Unrolling a Block (Result Iterator)

While an RDD is being computed, it produces result data that may be saved into memory through the BlockManager. Saving the result is called unrolling the result iterator, and it happens step by step: Spark first requests an initial chunk of memory (spark.storage.unrollMemoryThreshold) and materializes part of the iterator into it. Whenever that memory is exhausted, another allocation request is issued, and this repeats until either all data in the result iterator has been materialized or no more memory can be granted.

If memory runs out and the block's storage level allows disk, all the data in the result iterator is saved into the DiskStore instead of the MemoryStore.
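A simplified sketch of this unroll loop, loosely modeled on MemoryStore.putIteratorAsValues; the reserveMemory and estimateSize callbacks and the doubling growth policy are illustrative assumptions, not Spark's exact code:

```scala
// Simplified sketch of unrolling an iterator into memory.
// `reserveMemory` stands in for a MemoryManager allocation request and
// `estimateSize` for Spark's SizeEstimator; both are assumptions here.
def unroll[T](values: Iterator[T],
              initialRequest: Long,           // spark.storage.unrollMemoryThreshold
              reserveMemory: Long => Boolean,
              estimateSize: T => Long): Option[Vector[T]] = {
  if (!reserveMemory(initialRequest)) return None // could not even start unrolling
  var granted = initialRequest
  var used = 0L
  val buffer = Vector.newBuilder[T]
  var keepUnrolling = true
  while (values.hasNext && keepUnrolling) {
    val v = values.next()
    buffer += v
    used += estimateSize(v)
    if (used >= granted) {
      // Current allocation exhausted: issue another memory request.
      val extra = granted // simple doubling policy, for the sketch only
      keepUnrolling = reserveMemory(extra)
      if (keepUnrolling) granted += extra
    }
  }
  if (keepUnrolling) Some(buffer.result()) // everything fit in memory
  else None // caller falls back to DiskStore when the storage level allows disk
}
```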
