Defined in File comp_node.h
abstraction of a streaming computing resource on localhost (a thread on CPU, a cuda stream, etc.)
Note that most of the operations are asynchronous with respect to the caller thread
computing device type
for an “xpu” comp node, this would be mapped to an available comp node on the current system
Whether computing recorder is supported on this comp node (i.e. whether non-zero comp_node_seq_record_level is allowed)
Whether dynamic memory allocation is supported in seq recorder. If this flag is not set, ComputingSequence::do_execute() would skip the warm-up and allow the seq recorder to start immediately
Whether the capacity of the asynchronous execution queue on this comp node is limited. If this flag is set, tasks on multiple comp nodes would be dispatched from multiple cpu threads.
Whether this comp node supports copy stream, so computation and I/O can be parallelized
Destructing an event is unsafe if the comp node is not synchronized; setting this flag would cause computing sequence to sync the comp node in its dtor.
CompNode is available even when there is no thread support, i.e. MGB_HAVE_THREAD=0. Usually this means that execution on the CompNode is synchronous, i.e. it behaves like cpu:default
Whether this comp node supports unified addressing, i.e. both CPU and CUDA support unified addressing.
allocate memory on this computing node
Note: allocation of device memory is synchronous with the host, meaning that the memory can be used immediately; however deallocation is asynchronous to ensure that the memory can be used by already-launched kernels on the computing node.
Exception should be raised if allocation fails.
deallocate device buffer; see alloc_device() for more details
allocate memory on host that is associated with the device, which may accelerate I/O
Both allocation and deallocation on host are synchronous.
copy from underlying device to host
copy from host to underlying device
copy from this device to another device; would use the computing resource on dest_node
src: source memory that must be allocated on this device
get alignment requirement in bytes; guaranteed to be a power of 2
get the size of the padding that must be reserved at the end of a memory chunk; guaranteed to be a power of 2
wait for an event created on another CompNode
block host thread to wait for all previous operations on this computing node to finish
get id of underlying memory node; comp nodes that share the same mem node can access memory allocated by each other.
get total and free memory on the computing device in bytes
change to another stream on the same memory node
get string representation of physical device
get string representation of logical device
get the physical locator that created this comp node
get the logical locator that created this comp node
get device type of this comp node
MGB_WARN_UNUSED_RESULT std::unique_ptr< MegBrainError > check_async_error () const
check for error on the asynchronous computing stream
This is used for devices with limited error handling such as CUDA.
It will return MegBrainError with error messages rather than directly throw exception; return nullptr if no error.
create a CompNodeSeqRecorder associated with this computing node
Note: the implementation must be thread safe: simultaneous calls to create_seq_recorder() must block until existing CompNodeSeqRecorder objects are either destructed or stopped.
the recorder object; nullptr is returned if recording is not supported
insert a callback into the current compute stream. The callback is called after all currently enqueued items in the stream have completed, and later tasks in the stream must wait for the callback to finish.
Public Static Functions
manually destroy all comp node resources
load a computing node from a logical locator ID
create a CompNode object from logical locator
release consecutive free chunks on all devices to defragment; see DevMemAlloc::try_coalesce_free
synchronize all computing nodes
apply function to each initialized comp node
get total number of specific devices on this system
get default CPU comp node
set whether to enable affinity setting for CPU comp nodes
If enabled, computation on cpux would be bound to the x’th CPU.
This is disabled by default.
(implemented in comp_node/cpu/comp_node.cpp)
Public Static Attributes
implementations are allocated statically, so no memory management is needed
event associated with a CompNode node, used for cross-device synchronization
record this event on the comp node that creates it
Note that if an event is recorded multiple times, subsequent calls overwrite its internal state, and methods that examine the status only examine the completion of the most recent call to record().
whether this event has finished; it must have been recorded
block the host thread (caller thread) to wait for this event
get elapsed time in seconds from this to another event; the events must be finished
record an action on another comp node so it would wait for this event
get the comp node to which this event is associated
flags when this event is created
set CPU resource usage level when performing synchronization
level: CPU waiting level:
0. condition variable (the default)
1. busy wait with yield
Protected Static Attributes
pool of events that can be reused
assert that all allocated events have been freed
memory free might be called after finalize(); so we should not rely on virtual functions for this
an identifier to specify a computing node
Note: logical locator is directly parsed from a string identifier given by user; it should be translated to physical locator by calling to_physical() before actual use.
Unless explicitly specified otherwise, all locators are physical locators.
get corresponding physical Locator
DeviceType::UNSPEC would be resolved, and device map would be applied on device number
get string description of this locator that can be parsed again
corresponding to a physical computing device; memories between different devices are not shared.
device == -1 means logical default device (maps to 0 by default, and can be changed by set_device_map)
multiple streams can execute on one computing device and share memory; when the comp node type is multithread, this field also stands for nr_threads
parse a string identifier
currently supported ID format: (gpu|cpu)<n>[:m], where n is the device number and the optional m is the stream id.
set mapping between device numbers of a device type
set the actual device type to be used for DeviceType::UNSPEC
special device number for the “cpu default” comp node, which dispatches all tasks in the caller thread
special device number for the “multithread_default” comp node, which dispatches all tasks to thread pool and the caller thread is the main thread of thread pool
predefined special streams