lite/network.h¶
-
using lite::StartCallback = std::function<void(const std::unordered_map<std::string, std::pair<IO, std::shared_ptr<Tensor>>>&)>¶
the start/finish callback function type
- Param unordered_map
map from the IO tensor name to the pair of the user configuration information and the actual input or output tensor.
-
using lite::FinishCallback = std::function<void(const std::unordered_map<std::string, std::pair<IO, std::shared_ptr<Tensor>>>&)>¶
-
using lite::AsyncCallback = std::function<void(void)>¶
the network async callback function type
-
using lite::ThreadAffinityCallback = std::function<void(int thread_id)>¶
the thread affinity callback function type
- Param thread_id
the ID of the current thread, a number from 0 to (nr_threads - 1); the thread with ID (nr_threads - 1) is the main worker thread.
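A brief sketch of callables matching these callback types; the printed messages and variable names are illustrative only:

    #include <cstdio>
    #include "lite/network.h"

    // A FinishCallback that reports each output tensor by name once forwarding ends.
    lite::FinishCallback finish_cb =
            [](const std::unordered_map<std::string,
                                        std::pair<lite::IO, std::shared_ptr<lite::Tensor>>>& outputs) {
                for (auto& item : outputs) {
                    std::printf("output %s is ready\n", item.first.c_str());
                }
            };

    // A ThreadAffinityCallback that simply logs which worker thread has started.
    lite::ThreadAffinityCallback affinity_cb = [](int thread_id) {
        std::printf("worker thread %d started\n", thread_id);
    };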
-
struct Options¶
the inference options that can optimize the network forwarding performance (a short usage sketch follows the member list below)
- Param weight_preprocess
preprocess the network weights ahead of time to optimize inference performance
- Param fuse_preprocess
fuse preprocess patterns, such as astype + pad_channel + dimshuffle
- Param fake_next_exec
whether to perform only non-computing tasks (such as memory allocation and queue initialization) for the next exec; this will be reset to false after the graph is executed.
- Param var_sanity_check_first_run
controls the var sanity check on the first run; the var sanity check is enabled on the first execution by default and can be used to find potential memory access errors in operators
- Param const_shape
used to reduce memory usage and improve performance, since some static inference data structures can be omitted and some operators can be computed before forwarding
- Param force_dynamic_alloc
force dynamic memory allocation for all vars
- Param force_output_dynamic_alloc
force dynamic memory allocation for output tensors that are used as the input of the CallbackCaller operator
- Param no_profiling_on_shape_change
do not re-profile to select the best implementation algorithm when the input shape changes (use the previous algorithm)
- Param jit_level
Execute supported operators with JIT (MLIR and NVRTC are supported). Can only be used on NVIDIA GPUs and x86 CPUs. This value indicates the JIT level:
level 1: JIT execution of basic elemwise operators
level 2: JIT execution of elemwise and reduce operators
- Param record_level
flag to optimize inference performance by recording the kernel tasks on the first run; afterwards, all the inference needs to do is execute the recorded tasks.
level = 0: normal inference
level = 1: record inference
level = 2: record inference and free the extra memory
- Param graph_opt_level
network optimization level:
0: disable
1: level-1: inplace arithmetic transformations during graph construction
2: level-2: level-1, plus global optimization before graph compiling
3: also enable JIT
- Param async_exec_level
level of dispatch on separate threads for different comp_nodes:
0: do not perform async dispatch
1: dispatch async if there is more than one comp node with limited queue
mask 0b10: async if there are multiple comp nodes
mask 0b100: always async
Public Members
-
bool weight_preprocess = false¶
-
bool fuse_preprocess = false¶
-
bool fake_next_exec = false¶
-
bool var_sanity_check_first_run = true¶
-
bool const_shape = false¶
-
bool force_dynamic_alloc = false¶
-
bool force_output_dynamic_alloc = false¶
-
bool force_output_use_user_specified_memory = false¶
-
bool no_profiling_on_shape_change = false¶
-
uint8_t jit_level = 0¶
-
uint8_t comp_node_seq_record_level = 0¶
-
uint8_t graph_opt_level = 2¶
-
uint16_t async_exec_level = 1¶
-
bool enable_nchw44 = false¶
layout transform options
-
bool enable_nchw44_dot = false¶
-
bool enable_nchw88 = false¶
-
bool enable_nhwcd4 = false¶
-
bool enable_nchw4 = false¶
-
bool enable_nchw32 = false¶
-
bool enable_nchw64 = false¶
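A minimal sketch of adjusting a few of these options through a Config (the Config::options member is documented under struct Config below; the chosen values are illustrative):

    lite::Config config;
    config.options.weight_preprocess = true;            // preprocess weights ahead of time
    config.options.var_sanity_check_first_run = false;  // skip the first-run sanity check
    config.options.comp_node_seq_record_level = 1;      // record kernel tasks on the first run
    config.options.graph_opt_level = 2;                 // default level-2 graph optimization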
-
struct IO¶
configure a network input or output item; the input and output tensor information is described here
Note
if another layout is set on the input tensor before forwarding, this layout will not take effect
if no layout is set before forwarding, the model will forward with its original layout
if a layout is set on the output tensor, it will be used to check whether the layout computed from the network is correct
- Param name
the input/output tensor name
- Param is_host
used to mark where the input tensor comes from and where the output tensor will be copied to; if is_host is true, the input comes from the host and the output is copied to the host, otherwise they stay on the device. Sometimes the input comes from the device and the output does not need to be copied to the host. The default is true.
- Param io_type
the IO type, which can be SHAPE or VALUE; when SHAPE is set, the input or output tensor value is invalid and only the shape will be set. The default is VALUE.
- Param config_layout
The layout of input or output tensor
Public Members
-
std::string name¶
-
bool is_host = true¶
-
LiteIOType io_type = LiteIOType::LITE_IO_VALUE¶
-
struct Config¶
Configuration used when loading and compiling a network (a short usage sketch follows the member list below).
- Param has_compression
flag indicating whether the model is compressed; the compression method is stored in the model
- Param device_id
configure the device id of a network
- Param device_type
configure the device type of a network
- Param backend
configure the inference backend of a network; currently only MegEngine is supported
- Param bare_model_cryption_name
the encryption method name of a bare model; a bare model does not pack JSON information data inside
- Param options
configuration of Options
Public Members
-
bool has_compression = false¶
-
int device_id = 0¶
-
LiteDeviceType device_type = LiteDeviceType::LITE_CPU¶
-
LiteBackend backend = LiteBackend::LITE_DEFAULT¶
-
std::string bare_model_cryption_name = {}¶
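A short sketch of a Config targeting the CPU device; the encryption method name shown here is a hypothetical example and must match whatever method the model was actually encrypted with:

    lite::Config config;
    config.device_type = LiteDeviceType::LITE_CPU;     // run on CPU
    config.device_id = 0;                              // first device
    config.has_compression = false;                    // the model is not compressed
    config.bare_model_cryption_name = "AES_default";   // hypothetical cryption method name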
-
struct NetworkIO¶
the input and output information used when loading the network; the NetworkIO will remain in the network until the network is destroyed (a short usage sketch follows below).
- Param inputs
the information of all input tensors that will be configured into the network
- Param outputs
the information of all output tensors that will be configured into the network
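A minimal sketch of describing the network IO before loading; it assumes NetworkIO stores its items in inputs/outputs containers of IO, as the Param descriptions above suggest, and the tensor names are illustrative:

    lite::NetworkIO network_io;

    lite::IO input_io;
    input_io.name = "data";      // input tensor name in the model (illustrative)
    input_io.is_host = false;    // the input already resides on the device
    network_io.inputs.push_back(input_io);

    lite::IO output_io;
    output_io.name = "prob";     // output tensor name in the model (illustrative)
    network_io.outputs.push_back(output_io);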
-
class Allocator¶
A user-implemented allocator interface; the user can register an allocator to MegEngine, and all the runtime memory will then be allocated by this allocator (a short sketch follows below).
Public Functions
-
virtual ~Allocator() = default¶
-
virtual void *allocate(LiteDeviceType device_type, int device_id, size_t size, size_t align) = 0¶
allocate memory of the given size on the given device with the given alignment
- Parameters
device_type – the device type the memory will be allocated from
device_id – the device id the memory will be allocated from
size – the byte size of the memory to be allocated
align – the alignment required when allocating the memory
-
virtual void free(LiteDeviceType device_type, int device_id, void *ptr) = 0¶
free the memory pointed to by ptr on the given device
- Parameters
device_type – the device type the memory was allocated from
device_id – the device id the memory was allocated from
ptr – the memory pointer to be freed
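A small sketch of a user allocator for the CPU device that relies on standard aligned allocation (assumes C++17 for std::aligned_alloc; error handling and non-CPU devices are omitted):

    #include <cstdlib>
    #include "lite/network.h"

    class SimpleCpuAllocator : public lite::Allocator {
    public:
        void* allocate(LiteDeviceType device_type, int device_id, size_t size,
                       size_t align) override {
            (void)device_type;
            (void)device_id;
            // std::aligned_alloc requires the size to be a multiple of the alignment.
            size_t rounded = (size + align - 1) / align * align;
            return std::aligned_alloc(align, rounded);
        }

        void free(LiteDeviceType device_type, int device_id, void* ptr) override {
            (void)device_type;
            (void)device_id;
            std::free(ptr);
        }
    };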
-
class Network¶
Network is the main class used to perform forwarding; it is constructed from a model and implements model loading, initialization, forwarding, and display of some model information.
Constructor
Construct a network with given configuration and IO information
- param config
The configuration to create the network
- param networkio
The NetworkIO to describe the input and output tensor of the network
- friend class NetworkHelper
-
void load_model(void *model_mem, size_t size)¶
load the model from memory
-
void load_model(std::string model_path)¶
load the model from a model path
-
void compute_only_configured_output()¶
only compute the output tensor configured by the IO information
-
std::shared_ptr<Tensor> get_io_tensor(std::string io_name, LiteTensorPhase phase = LiteTensorPhase::LITE_IO)¶
get a network input or output tensor whose layout is synchronized from the MegEngine tensor; when an input and an output tensor share the same name, use LiteTensorPhase to distinguish them
- Parameters
io_name – the name of the tensor
phase – indicates whether the tensor is an input or an output tensor, in case an input tensor has the same name as an output tensor
-
Network &set_async_callback(const AsyncCallback &async_callback)¶
set the network forwarding in async mode and set the AsyncCallback callback function
-
Network &set_start_callback(const StartCallback &start_callback)¶
set the start forwarding callback function of type StartCallback, which will be executed before forwarding; this can be used to check network inputs or dump model inputs for debugging
-
Network &set_finish_callback(const FinishCallback &finish_callback)¶
set the finish forwarding callback function of type FinishCallback, which will be executed after forwarding; this can be used to dump model outputs for debugging (see the sketch below)
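A short continuation of the earlier callback sketch, installing start/finish callbacks; here network is assumed to be a loaded lite::Network and finish_cb is the FinishCallback sketched near the callback typedefs above:

    // network: a loaded lite::Network; finish_cb: sketched near the callback typedefs.
    network.set_start_callback(
            [](const std::unordered_map<std::string,
                                        std::pair<lite::IO, std::shared_ptr<lite::Tensor>>>& inputs) {
                std::printf("about to forward %zu input tensors\n", inputs.size());
            });
    network.set_finish_callback(finish_cb);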
-
void forward()¶
forward the network with the filled input data and fill the output data into the output tensors
-
void wait()¶
wait until forwarding finishes in sync mode
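A compact sketch of a full synchronous run using the documented calls above: load a model, fill an input, forward, wait, and fetch an output (the model path and input tensor name are illustrative):

    lite::Config config;
    lite::NetworkIO network_io;
    std::shared_ptr<lite::Network> network =
            std::make_shared<lite::Network>(config, network_io);

    network->load_model("./model.mge");

    std::shared_ptr<lite::Tensor> input = network->get_io_tensor("data");
    // ... copy the input data into the tensor's memory here ...

    network->forward();
    network->wait();

    std::shared_ptr<lite::Tensor> output =
            network->get_io_tensor(network->get_output_name(0));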
-
std::string get_input_name(size_t index) const¶
get the input tensor name by index
-
std::string get_output_name(size_t index) const¶
get the output tensor name by index
-
std::vector<std::string> get_all_input_name() const¶
get all the input tensor names
-
std::vector<std::string> get_all_output_name() const¶
get all the output tensor names
-
int get_device_id() const¶
get the network forwarding device id
-
int get_stream_id() const¶
get the network stream id
-
void enable_profile_performance(std::string profile_file_path)¶
enable profiling of the network; a profile file will be generated at the given path
-
const std::string &get_model_extra_info()¶
get the model extra info; the extra information is packed into the model by the user
-
LiteDeviceType get_device_type() const¶
get the network device type
-
void get_static_memory_alloc_info(const std::string &log_dir = "logs/test") const¶
get static peak memory info shown by graph visualization
-
void extra_configure(const ExtraConfig &extra_config)¶
apply the extra configuration to the network
- Parameters
extra_config – the extra configuration to set into the network
Public Functions
-
~Network()¶
-
class Runtime¶
All the runtime configuration functions are defined in the Runtime class as static member functions.
Public Static Functions
The multi-thread number setter and getter interface. When the device is CPU, this interface sets the network to run in multi-thread mode with the given thread number (a usage sketch follows this block).
- Parameters
dst_network – the target network to set/get the thread number
nr_threads – the thread number set to the target network
set the thread affinity callback
- Parameters
dst_network – the target network to set the thread affinity callback
thread_affinity_callback – the ThreadAffinityCallback callback to set the thread affinity
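A minimal sketch of configuring the CPU thread number and the thread affinity callback before loading; the static function names set_cpu_threads_number and set_runtime_thread_affinity are assumed from the MegEngine Lite API since the signatures are not shown here and may differ between versions:

    // Assumed function names: set_cpu_threads_number / set_runtime_thread_affinity.
    std::shared_ptr<lite::Network> network = std::make_shared<lite::Network>();
    lite::Runtime::set_cpu_threads_number(network, 4);  // run with 4 worker threads
    lite::Runtime::set_runtime_thread_affinity(network, [](int thread_id) {
        // bind thread_id to a core here; (nr_threads - 1) is the main worker thread
    });
    network->load_model("./model.mge");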
Set the CPU default mode when the device is CPU; on some low-computation or single-core devices, this mode achieves good performance.
- Parameters
dst_network – the target network to set/get CPU inplace mode
Set the network forwarding to use TensorRT.
set the operator algorithm selection strategy in the target network
- Parameters
dst_network – the target network to set the algorithm strategy
strategy – the algorithm strategy to set to the network; if multiple strategies should be set, they can be packed together with the | operator
shared_batch_size – the batch size used by fast-run; a non-zero value means fast-run uses this batch size regardless of the batch size of the model, while zero means fast-run uses the batch size of the model
binary_equal_between_batch – if set to true, then when the content of each input batch is binary equal, the content of each output batch is guaranteed to be equal as well; otherwise it is not
set the operator workspace limitation in the target network; some operators may use a large workspace to achieve good performance, and setting a workspace limitation can save memory but may affect performance
- Parameters
dst_network – the target network to set/get workspace limitation
workspace_limit – the workspace limitation in bytes
set the network runtime memory Allocator; the Allocator is defined by the user, and through this method the user can implement a memory pool for network forwarding
- Parameters
dst_network – the target network
user_allocator – the user defined Allocator
share the runtime memory with another network; the weights are not shared
Warning
the src network and the dst network cannot execute simultaneously
- Parameters
dst_network – the target network that shares the runtime memory from src_network
src_network – the source network that shares its runtime memory with dst_network
dump all input/output tensors of all operators to the output file in txt format; the user can use this function to debug compute errors
- Parameters
dst_network – the target network to dump its tensors
io_txt_out_file – the output txt file path
dump all input/output tensors of all operators to the output file in binary format; the user can use this function to debug compute errors
- Parameters
dst_network – the target network to dump its tensors
io_bin_out_dir – the binary file output directory
load a new network that will share weights with the src network; this can reduce memory usage when the user wants to load the same model multiple times
- Parameters
dst_network – the target network that shares weights from src_network
src_network – the source network that shares its weights with dst_network
set global layout transform optimization for the network; global layout optimization can automatically determine the layout of every operator in the network by profiling, which can improve the performance of network forwarding
dump the network after global layout transform optimization to the specified path
-
static NetworkIO get_model_io_info(const std::string &model_path, const Config &config = {})¶
get the model IO information from the model path before the model is loaded.
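A small sketch of inspecting a model's IO information before loading it; the model path is illustrative, and it assumes NetworkIO exposes its inputs as a container of IO, as the Param descriptions of NetworkIO above suggest:

    lite::NetworkIO io_info = lite::Runtime::get_model_io_info("./model.mge");
    for (auto& input : io_info.inputs) {
        std::printf("model input: %s\n", input.name.c_str());
    }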