megenginelite.network¶
- class LiteOptions[source]¶
the inference options used to optimize network forwarding performance
- Variables
weight_preprocess – optimize the inference performance by preprocessing the weights of the network ahead of execution
fuse_preprocess – fuse the preprocess pattern, like astype + pad_channel + dimshuffle
fake_next_exec – whether only to perform non-computing tasks (like memory allocation and queue initialization) for next exec. This will be reset to false when the graph is executed.
var_sanity_check_first_run – Disable var sanity check on the first run. Var sanity check is enabled on the first-time execution by default, and can be used to find some potential memory access errors in the operator
const_shape – used to reduce memory usage and improve performance, since some static inference data structures can be omitted and some operators can be computed before forwarding
force_dynamic_alloc – force dynamic memory allocation for all vars
force_output_dynamic_alloc – force dynamic memory allocation for output tensors that are used as the input of the CallbackCaller operator
no_profiling_on_shape_change – do not re-profile to select the best implementation algorithm when the input shape changes (use the previous algorithm)
jit_level –
Execute supported operators with JIT, please check MGB_JIT_BACKEND for more details; this value indicates the JIT level:
level 1: JIT execution of basic elemwise operators
level 2: JIT execution of elemwise and reduce operators
record_level –
flags to optimize the inference performance by recording the kernel tasks in the first run; afterwards, inference only needs to execute the recorded tasks.
level = 0 means normal inference
level = 1 means use record inference
level = 2 means record inference and free the extra memory
graph_opt_level –
network optimization level:
0: disable
1: level-1: inplace arith transformations during graph construction
2: level-2: level-1, plus global optimization before graph compiling
3: also enable JIT
async_exec_level –
level of dispatch on separate threads for different comp_node.
0: do not perform async dispatch
1: dispatch async if there are more than one comp node with limited queue
mask 0b10: async if there are multiple comp nodes with unlimited queue
mask 0b100: always async
Examples
from megenginelite import *

options = LiteOptions()
options.weight_preprocess = True
options.record_level = 1
options.fuse_preprocess = True
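A hedged sketch of the level-style options described above; the numeric values follow the level descriptions, and anything else is only an example:

from megenginelite import *

options = LiteOptions()
options.graph_opt_level = 2    # level-1 plus global optimization before graph compiling
options.jit_level = 1          # JIT execution of basic elemwise operators
options.async_exec_level = 1   # async dispatch when there is more than one comp node
options.const_shape = True     # shapes are static, allow static-inference savings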
- async_exec_level¶
Structure/Union member
- comp_node_seq_record_level¶
Structure/Union member
- const_shape¶
Structure/Union member
- enable_nchw32¶
Structure/Union member
- enable_nchw4¶
Structure/Union member
- enable_nchw44¶
Structure/Union member
- enable_nchw44_dot¶
Structure/Union member
- enable_nchw64¶
Structure/Union member
- enable_nchw88¶
Structure/Union member
- enable_nhwcd4¶
Structure/Union member
- fake_next_exec¶
Structure/Union member
- force_dynamic_alloc¶
Structure/Union member
- force_output_dynamic_alloc¶
Structure/Union member
- force_output_use_user_specified_memory¶
Structure/Union member
- fuse_preprocess¶
Structure/Union member
- graph_opt_level¶
Structure/Union member
- jit_level¶
Structure/Union member
- no_profiling_on_shape_change¶
Structure/Union member
- var_sanity_check_first_run¶
Structure/Union member
- weight_preprocess¶
Structure/Union member
- class LiteConfig(device_type=LiteDeviceType.LITE_CPU, option=None)[source]¶
Configuration used when loading and compiling a network
- Variables
has_compression – flag whether the model is compressed; the compression method is stored in the model
device_id – configure the device id of a network
device_type – configure the device type of a network
backend – configure the inference backend of a network, currently only megengine is supported
bare_model_cryption_name – the name of the encryption method of a bare model; a bare model is not packed with json information, and this name is needed to decrypt the encrypted bare model
options – configuration of Options
auto_optimize_inference – lite will detect the device information and set the options heuristically
discrete_input_name – configure which input is composed of discrete multiple tensors
Examples
from megenginelite import *

config = LiteConfig()
config.has_compression = False
config.device_type = LiteDeviceType.LITE_CPU
config.backend = LiteBackend.LITE_DEFAULT
config.bare_model_cryption_name = "AES_default".encode("utf-8")
config.auto_optimize_inference = False
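The options field holds a LiteOptions instance; a minimal sketch, assuming the options are passed through the constructor's option argument shown in the signature above:

from megenginelite import *

options = LiteOptions()
options.weight_preprocess = True

config = LiteConfig(device_type=LiteDeviceType.LITE_CPU, option=options)
config.device_id = 0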
- auto_optimize_inference¶
Structure/Union member
- backend¶
Structure/Union member
- property bare_model_cryption_name¶
- device_id¶
Structure/Union member
- device_type¶
Structure/Union member
- discrete_input_name¶
Structure/Union member
- has_compression¶
Structure/Union member
- options¶
Structure/Union member
- class LiteIO(name, is_host=True, io_type=LiteIOType.LITE_IO_VALUE, layout=None)[source]¶
configure a network input or output item; the input or output tensor information is described here
- Variables
name – the tensor name in the graph corresponding to the IO
is_host – used to mark where the input tensor comes from and where the output tensor will be copied to; if is_host is True, the input comes from the host and the output is copied to the host, otherwise they stay on the device. Sometimes the input comes from the device and the output does not need to be copied to the host; default is True
io_type – the IO type, it can be SHAPE or VALUE; when SHAPE is set, the input or output tensor value is invalid and only the shape will be set; default is VALUE
config_layout – the layout of the config from the user; if another layout is set before forward (or gotten after forward), this layout is bypassed. If no other layout is set before forward, this layout takes effect. If this layout is not set, the model forwards with its original layout. For an output, it is used for checking.
Note
if another layout is set on the input tensor before forwarding, this layout will not take effect
if no layout is set before forwarding, the model will forward with its original layout
if a layout is set on an output tensor, it will be used to check whether the layout computed from the network is correct
Examples
from megenginelite import *

io = LiteIO(
    "data2",
    is_host=True,
    io_type=LiteIOType.LITE_IO_SHAPE,
    layout=LiteLayout([2, 4, 4]),
)
- config_layout¶
Structure/Union member
- io_type¶
Structure/Union member
- is_host¶
Structure/Union member
- property name¶
get the name of IO item
- class LiteNetworkIO(inputs=None, outputs=None)[source]¶
the input and output information used when loading the network; the NetworkIO will remain in the network until the network is destroyed.
- Variables
inputs – all the input tensor information that will be configured to the network
outputs – all the output tensor information that will be configured to the network
Examples
from megenginelite import *

input_io = LiteIO("data", is_host=False, io_type=LiteIOType.LITE_IO_VALUE)
io = LiteNetworkIO()
io.add_input(input_io)
output_io = LiteIO("out", is_host=True, layout=LiteLayout([1, 1000]))
io.add_output(output_io)
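The constructed LiteNetworkIO is typically handed to a LiteNetwork together with a LiteConfig; a minimal sketch, where the model path is hypothetical:

from megenginelite import *

io = LiteNetworkIO()
io.add_input(LiteIO("data", is_host=True))
io.add_output(LiteIO("out", is_host=True))

config = LiteConfig(device_type=LiteDeviceType.LITE_CPU)
network = LiteNetwork(config=config, io=io)  # the IO information takes effect when the model is loaded
network.load("model_path")                   # hypothetical model path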
- class LiteNetwork(config=None, io=None)[source]¶
the network to load a model and forward
Examples
from megenginelite import *

config = LiteConfig()
config.device_type = LiteDeviceType.LITE_CPU
network = LiteNetwork(config)
network.load("model_path")

input_name = network.get_input_name(0)
input_tensor = network.get_io_tensor(input_name)
output_name = network.get_output_name(0)
output_tensor = network.get_io_tensor(output_name)

input_tensor.set_data_by_copy(input_data)
network.forward()
network.wait()
- async_with_callback(async_callback)[source]¶
set the network forwarding in async mode and set the AsyncCallback callback function
- Parameters
async_callback – the callback to set for network
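A minimal sketch of asynchronous forwarding, assuming the callback takes no arguments and returns 0, and that forward() returns without blocking once async mode is set:

from megenginelite import *

finished = False

def async_callback():
    # assumed to be invoked when forwarding completes
    global finished
    finished = True
    return 0

network = LiteNetwork()
network.load("model_path")   # hypothetical model path
network.async_with_callback(async_callback)

# fill the input tensors, then forward() returns immediately in async mode
network.forward()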
- property device_id¶
get the device id
- Returns
the device id of current network used
- dump_layout_transform_model(model_file)[source]¶
dump the network after the global layout transform optimization to the specified path
- Parameters
model_file – the file path to dump model
- enable_cpu_inplace_mode()[source]¶
set cpu forwarding to inplace mode, in which cpu forwarding creates only one thread
Note
this must be set before the network is loaded
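Since inplace mode must be configured before loading, a minimal sketch looks like:

from megenginelite import *

network = LiteNetwork()
network.enable_cpu_inplace_mode()  # must be called before load()
network.load("model_path")         # hypothetical model path
assert network.is_cpu_inplace_mode()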
- enable_global_layout_transform()[source]¶
set global layout transform optimization for the network; global layout optimization can automatically determine the layout of every operator in the network by profiling, thus improving the performance of network forwarding
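A minimal sketch of enabling the global layout transform before loading and dumping the optimized model afterwards; the file paths are hypothetical:

from megenginelite import *

network = LiteNetwork()
network.enable_global_layout_transform()          # assumed to be set before load()
network.load("model_path")                        # hypothetical model path
network.dump_layout_transform_model("opt_model")  # hypothetical output path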
- enable_profile_performance(profile_file)[source]¶
enable profiling of the network performance and save the profiled information into the given file
- Parameters
profile_file – the file to save profile information
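A minimal sketch, assuming profiling may be enabled after the model is loaded and before forwarding; the file paths are hypothetical:

from megenginelite import *

network = LiteNetwork()
network.load("model_path")                          # hypothetical model path
network.enable_profile_performance("profile.json")  # hypothetical profile file
network.forward()
network.wait()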
- forward()[source]¶
forward the network with filled input data and fill the output data to the output tensor
- get_all_input_name()[source]¶
get all the input tensor names in the network
- Returns
the names of all input tensors in the network
- get_all_output_name()[source]¶
get all the output tensor names in the network
- Returns
the names of all output tensors in the network
- get_discrete_tensor(name, n_idx, phase=LiteTensorPhase.LITE_INPUT)[source]¶
get the n_idx-th tensor of the network input whose name is name and whose input consists of multiple discrete tensors
- Parameters
name – the name of input tensor
n_idx – the tensor index
phase – the type of LiteTensor, this is useful to separate input tensor with the same name
- Returns
the tensors with given name and type
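A hedged sketch, assuming the discrete input is declared through LiteConfig.discrete_input_name and that the field accepts bytes like bare_model_cryption_name; the input name and model path are hypothetical:

from megenginelite import *

config = LiteConfig()
config.discrete_input_name = "data".encode("utf-8")  # assumption: bytes field

network = LiteNetwork(config)
network.load("model_path")                            # hypothetical model path

# fetch the first two tensors that together form the discrete input "data"
t0 = network.get_discrete_tensor("data", 0)
t1 = network.get_discrete_tensor("data", 1)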
- get_input_name(index)[source]¶
get the input name by the index in the network
- Parameters
index – the index of the input name
- Returns
the name of the input tensor with the given index
- get_io_tensor(name, phase=LiteTensorPhase.LITE_IO)[source]¶
get input or output tensor by its name
- Parameters
name – the name of io tensor
phase – the type of LiteTensor, this is useful to separate input or output tensor with the same name
- Returns
the tensor with given name and type
- get_output_name(index)[source]¶
get the output name by the index in the network
- Parameters
index – the index of the output name
- Returns
the name of the output tensor with the given index
- get_static_memory_alloc_info(log_dir='logs/test')[source]¶
get the static peak memory info shown by graph visualization
- Parameters
log_dir – the directory to save information log
- io_bin_dump(bin_dir)[source]¶
dump all input/output tensors of all operators to the output file in binary format; users can use this function to debug compute errors
- Parameters
bin_dir – the binary file directory
- io_txt_dump(txt_file)[source]¶
dump all input/output tensors of all operators to the output file in txt format; users can use this function to debug compute errors
- Parameters
txt_file – the txt file
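A minimal sketch of both dump helpers, set up before forwarding so the dump is produced during execution; all paths are hypothetical:

from megenginelite import *

network = LiteNetwork()
network.load("model_path")          # hypothetical model path
network.io_txt_dump("io_dump.txt")  # text dump of every operator's inputs/outputs
network.io_bin_dump("io_bin_dir")   # binary dump written into this directory
network.forward()
network.wait()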
- is_cpu_inplace_mode()[source]¶
whether the network runs in cpu inplace mode
- Returns
return True if inplace mode is used, otherwise False
- set_finish_callback(finish_callback)[source]¶
when the network finishes forwarding, the callback will be called; the finish_callback receives a parameter mapping from LiteIO to the corresponding LiteTensor
- Parameters
finish_callback – the callback to set for network
- set_network_algo_policy(policy, shared_batch_size=0, binary_equal_between_batch=False)[source]¶
set the network algorithm search policy for fast-run
- Parameters
shared_batch_size – the batch size used by fast-run; a non-zero value means fast-run uses this batch size regardless of the batch size of the model, and zero means fast-run uses the batch size of the model
binary_equal_between_batch – if the content of each input batch is binary equal, whether the content of each output batch is promised to be equal
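A hedged sketch, assuming the LiteAlgoSelectStrategy enumeration exported by megenginelite provides a LITE_ALGO_PROFILE strategy for fast-run:

from megenginelite import *

network = LiteNetwork()
network.load("model_path")   # hypothetical model path

network.set_network_algo_policy(
    LiteAlgoSelectStrategy.LITE_ALGO_PROFILE,  # assumption: profile-based fast-run strategy
    shared_batch_size=1,                       # profile with batch size 1 regardless of the model
    binary_equal_between_batch=True,
)
network.forward()
network.wait()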
- set_network_algo_workspace_limit(size_limit)[source]¶
set the operator workspace limitation in the target network; some operators may use a large workspace to get good performance, so setting a workspace limitation can save memory but may influence the performance
- Parameters
size_limit – the workspace limitation in bytes
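A minimal sketch that limits each operator's workspace to 100 MiB; the limit value is only an example:

from megenginelite import *

network = LiteNetwork()
network.load("model_path")                                   # hypothetical model path
network.set_network_algo_workspace_limit(100 * 1024 * 1024)  # 100 MiB, in bytes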
- set_start_callback(start_callback)[source]¶
when the network starts forwarding, the callback will be called; the start_callback receives a parameter mapping from LiteIO to the corresponding LiteTensor
- Parameters
start_callback – the callback to set for network
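A minimal sketch of both callbacks, assuming each receives a dict mapping LiteIO to the corresponding LiteTensor and returns 0:

from megenginelite import *

def start_callback(ios):
    # inspect the input tensors right before forwarding
    for io, tensor in ios.items():
        print("input:", io.name, tensor.layout)
    return 0

def finish_callback(ios):
    # inspect the output tensors right after forwarding
    for io, tensor in ios.items():
        print("output:", io.name, tensor.to_numpy().mean())
    return 0

network = LiteNetwork()
network.load("model_path")   # hypothetical model path
network.set_start_callback(start_callback)
network.set_finish_callback(finish_callback)
network.forward()
network.wait()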
share runtime memory with the source network
- Parameters
src_network – the network to share runtime memory
share weights with the loaded network
- Parameters
src_network – the network to share weights
- property stream_id¶
get the stream id
- Returns
the value of the stream id set for the network
- property threads_number¶
get the thread number of the network
- Returns
the number of threads set in the network