Off-heap
By setting spark.memory.offHeap.enabled to true, Spark can leverage off-heap memory.
Parameters
spark.memory.offHeap.enabled: enable off-heap memory use
spark.memory.offHeap.size: total size of off-heap memory (shared by execution and storage)
spark.memory.storageFraction: fraction of the unified memory region that is immune to eviction and reserved for block manager storage
off-heap memory available to execution = spark.memory.offHeap.size * (1 - spark.memory.storageFraction)
spark.executor.memory: size of the executor's heap
spark.memory.fraction: fraction of the heap (after a 300 MB reserve) used for execution and storage
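The parameters above can be set together in spark-defaults.conf (or via --conf on spark-submit). The sizes below are illustrative values for a sketch, not recommendations:

```properties
# Executor heap (on-heap) size
spark.executor.memory          4g
# Fraction of (heap - 300 MB) shared by execution and storage
spark.memory.fraction          0.6
# Half of the unified region is immune to eviction (storage side)
spark.memory.storageFraction   0.5
# Enable 2 GB of off-heap memory; execution gets
# offHeap.size * (1 - storageFraction) = 1 GB of it
spark.memory.offHeap.enabled   true
spark.memory.offHeap.size      2g
```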
MemoryManager
UnifiedMemoryManager: storage and execution memory live in one region and can borrow from each other.
- max heap memory: (Runtime.getRuntime.maxMemory - 300 MB reserved) * spark.memory.fraction
- max on-heap storage memory: max heap memory * spark.memory.storageFraction
StaticMemoryManager (legacy): unlike UnifiedMemoryManager, the storage and execution regions are fixed and cannot be shared.
- max on-heap execution (shuffle) memory: Runtime.getRuntime.maxMemory * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction
- max on-heap storage memory: Runtime.getRuntime.maxMemory * spark.storage.memoryFraction * spark.storage.safetyFraction
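The sizing formulas above can be checked with a small sketch. The function names and the example 4 GB heap are illustrative, not Spark API; the fraction defaults match Spark's documented defaults (0.6/0.5 for the unified manager, 0.2 * 0.8 and 0.6 * 0.9 for the legacy one):

```python
# Sketch of the memory-sizing formulas above (not Spark's real API).

RESERVED_MEMORY = 300 * 1024 * 1024  # 300 MB reserved by Spark

def unified_max_memory(heap_bytes, memory_fraction=0.6):
    """UnifiedMemoryManager: memory shared by execution + storage."""
    return int((heap_bytes - RESERVED_MEMORY) * memory_fraction)

def unified_max_storage(heap_bytes, memory_fraction=0.6, storage_fraction=0.5):
    """Storage portion of the unified region that is immune to eviction."""
    return int(unified_max_memory(heap_bytes, memory_fraction) * storage_fraction)

def static_max_execution(heap_bytes, shuffle_fraction=0.2, safety_fraction=0.8):
    """StaticMemoryManager: fixed execution (shuffle) region."""
    return int(heap_bytes * shuffle_fraction * safety_fraction)

def static_max_storage(heap_bytes, storage_fraction=0.6, safety_fraction=0.9):
    """StaticMemoryManager: fixed storage region."""
    return int(heap_bytes * storage_fraction * safety_fraction)

heap = 4 * 1024**3  # a 4 GB executor heap, for illustration
unified_total = unified_max_memory(heap)
unified_storage = unified_max_storage(heap)
```

With a 4 GB heap, the unified region is roughly 2.2 GB, of which half is immune-to-eviction storage; under the legacy manager the fixed regions are smaller because the safety fractions leave headroom.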
spark.storage.unrollMemoryThreshold: Initial memory to request before unrolling any block
Unrolling a Block (Result Iterator)
When an RDD is computed, the resulting records may be cached in memory through the BlockManager. Because the result arrives as an iterator of unknown size, Spark materializes it incrementally, a process called unrolling: it first requests an initial chunk of memory (spark.storage.unrollMemoryThreshold) and puts some of the iterator's records into it. Whenever that allocation fills up, another memory request is issued, and this repeats until either every record in the iterator has been unrolled or no more memory can be granted.
If memory runs out and the storage level allows disk, the data is written to the DiskStore instead of the MemoryStore.
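The unrolling loop can be sketched as follows. MemoryPool, the threshold value, and the disk fallback here are simplified stand-ins for Spark's MemoryStore/DiskStore internals, not the real API:

```python
# Simplified sketch of unrolling a result iterator (not Spark's real API).

UNROLL_THRESHOLD = 4  # stand-in for spark.storage.unrollMemoryThreshold

class MemoryPool:
    """Toy memory pool that grants at most `capacity` units."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def acquire(self, amount):
        if self.used + amount > self.capacity:
            return False  # request denied: not enough memory left
        self.used += amount
        return True

def unroll(iterator, pool, initial=UNROLL_THRESHOLD):
    """Materialize records step by step while memory can be granted.

    Returns ("memory", records) if everything fit, otherwise
    ("disk", all_records) to mimic the fallback to the DiskStore.
    """
    records = []
    budget = initial if pool.acquire(initial) else 0
    for record in iterator:
        if len(records) >= budget:
            # Current allocation is full: issue another memory request.
            if pool.acquire(initial):
                budget += initial
            else:
                # Memory exhausted: spill everything to "disk" instead.
                return "disk", records + [record] + list(iterator)
        records.append(record)
    return "memory", records

pool = MemoryPool(capacity=10)
where, data = unroll(iter(range(8)), pool)   # 8 records fit in 10 units
```

A second call with 20 records against the same-sized pool would exhaust it mid-unroll and take the disk branch, which mirrors how a partially unrolled block ends up in the DiskStore when the storage level permits disk.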