megenginelite.network

class LiteOptions[source]

the inference options that can be used to optimize network forwarding performance

Variables
  • weight_preprocess – optimize inference performance by preprocessing the network weights ahead of time

  • fuse_preprocess – fuse preprocessing patterns, like astype + pad_channel + dimshuffle

  • fake_next_exec – whether to only perform non-computing tasks (like memory allocation and queue initialization) for the next exec. This will be reset to false when the graph is executed.

  • var_sanity_check_first_run – whether to perform the var sanity check on the first run. The var sanity check is enabled on the first execution by default, and can be used to find potential memory access errors in operators

  • const_shape – used to reduce memory usage and improve performance, since some static inference data structures can be omitted and some operators can be computed before forwarding

  • force_dynamic_alloc – force dynamic memory allocation for all vars

  • force_output_dynamic_alloc – force dynamic memory allocation for output tensors that are used as the input of the CallbackCaller operator

  • no_profiling_on_shape_change – do not re-profile to select the best implementation algorithm when the input shape changes (use the previous algorithm)

  • jit_level

    Execute supported operators with JIT (see MGB_JIT_BACKEND for more details). This value indicates the JIT level:

    level 1: JIT execution of basic elemwise operators

    level 2: JIT execution of elemwise and reduce operators

  • record_level

    flag to optimize inference performance by recording the kernel tasks on the first run; afterwards, inference only needs to execute the recorded tasks.

    level = 0: normal inference

    level = 1: use record inference

    level = 2: record inference and free the extra memory

  • graph_opt_level

    network optimization level:

    0: disable

    1: level-1: inplace arith transformations during graph construction

    2: level-2: level-1, plus global optimization before graph compiling

    3: also enable JIT

  • async_exec_level

    level of dispatch on separate threads for different comp_node.

    0: do not perform async dispatch

    1: dispatch async if there are more than one comp node with limited queue

    mask 0b10: async if there are multiple comp nodes with

    mask 0b100: always async

Examples

from megenginelite import *
options = LiteOptions()
options.weight_preprocess = True
options.record_level = 1
options.fuse_preprocess = True

async_exec_level

Structure/Union member

comp_node_seq_record_level

Structure/Union member

const_shape

Structure/Union member

enable_nchw32

Structure/Union member

enable_nchw4

Structure/Union member

enable_nchw44

Structure/Union member

enable_nchw44_dot

Structure/Union member

enable_nchw64

Structure/Union member

enable_nchw88

Structure/Union member

enable_nhwcd4

Structure/Union member

fake_next_exec

Structure/Union member

force_dynamic_alloc

Structure/Union member

force_output_dynamic_alloc

Structure/Union member

force_output_use_user_specified_memory

Structure/Union member

fuse_preprocess

Structure/Union member

graph_opt_level

Structure/Union member

jit_level

Structure/Union member

no_profiling_on_shape_change

Structure/Union member

var_sanity_check_first_run

Structure/Union member

weight_preprocess

Structure/Union member

class LiteConfig(device_type=LiteDeviceType.LITE_CPU, option=None)[source]

Configuration used when loading and compiling a network

Variables
  • has_compression – flag indicating whether the model is compressed; the compression method is stored in the model

  • device_id – configure the device id of a network

  • device_type – configure the device type of a network

  • backend – configure the inference backend of a network, currently only MegEngine is supported

  • bare_model_cryption_name – the encryption method name of the bare model. A bare model is not packed with JSON information; this encryption method name is used to decrypt an encrypted bare model

  • options – the configuration of LiteOptions

  • auto_optimize_inference – lite will detect the device information and set the options heuristically

  • discrete_input_name – configure which input is composed of multiple discrete tensors

Examples

from megenginelite import *
config = LiteConfig()
config.has_compression = False
config.device_type = LiteDeviceType.LITE_CPU
config.backend = LiteBackend.LITE_DEFAULT
config.bare_model_cryption_name = "AES_default".encode("utf-8")
config.auto_optimize_inference = False

auto_optimize_inference

Structure/Union member

backend

Structure/Union member

property bare_model_cryption_name

device_id

Structure/Union member

device_type

Structure/Union member

discrete_input_name

Structure/Union member

has_compression

Structure/Union member

options

Structure/Union member

class LiteIO(name, is_host=True, io_type=LiteIOType.LITE_IO_VALUE, layout=None)[source]

configure a network input or output item; the input and output tensor information is described here

Variables
  • name – the tensor name in the graph corresponding to the IO

  • is_host – used to mark where the input tensor comes from and where the output tensor will be copied to. If is_host is True, the input comes from the host and the output is copied to the host; otherwise they stay on the device. Sometimes the input comes from the device and the output does not need to be copied to the host. Default is True.

  • io_type – the IO type, it can be SHAPE or VALUE. When SHAPE is set, the input or output tensor value is invalid and only the shape will be set. Default is VALUE

  • config_layout – the layout configured by the user. If another layout is set before forward (or obtained after forward), this layout will be bypassed; if no other layout is set before forward, this layout takes effect; if this layout is not set, the model forwards with its original layout. For an output, it is used for checking.

Note

if another layout is set on the input tensor before forwarding, this layout will not take effect

if no layout is set before forwarding, the model will forward with its original layout

if a layout is set on the output tensor, it will be used to check whether the layout computed by the network is correct

Examples

from megenginelite import *
io = LiteIO(
    "data2",
    is_host=True,
    io_type=LiteIOType.LITE_IO_SHAPE,
    layout=LiteLayout([2, 4, 4]),
)

config_layout

Structure/Union member

io_type

Structure/Union member

is_host

Structure/Union member

property name

get the name of IO item

class LiteNetworkIO(inputs=None, outputs=None)[source]

the input and output information used when loading the network; the NetworkIO will remain in the network until the network is destroyed.

Variables
  • inputs – all the input tensor information that will be configured to the network

  • outputs – all the output tensor information that will be configured to the network

Examples

from megenginelite import *
input_io = LiteIO("data", is_host=False, io_type=LiteIOType.LITE_IO_VALUE)
io = LiteNetworkIO()
io.add_input(input_io)
output_io = LiteIO("out", is_host=True, layout=LiteLayout([1, 1000]))
io.add_output(output_io)

add_input(obj, is_host=True, io_type=LiteIOType.LITE_IO_VALUE, layout=None)[source]

add input information into LiteNetworkIO

add_output(obj, is_host=True, io_type=LiteIOType.LITE_IO_VALUE, layout=None)[source]

add output information into LiteNetworkIO
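
Besides add_input and add_output, the IO items can also be passed to the constructor. A minimal sketch, assuming the inputs and outputs arguments accept lists of LiteIO items (mirroring the example above):

from megenginelite import *
# build the IO description up front and hand it to the network
data_io = LiteIO("data", is_host=True, io_type=LiteIOType.LITE_IO_VALUE)
out_io = LiteIO("out", is_host=True, layout=LiteLayout([1, 1000]))
net_io = LiteNetworkIO(inputs=[data_io], outputs=[out_io])
network = LiteNetwork(config=LiteConfig(), io=net_io)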

class LiteNetwork(config=None, io=None)[source]

the network to load a model and forward

Examples

from megenginelite import *
config = LiteConfig()
config.device_type = LiteDeviceType.LITE_CPU
network = LiteNetwork(config)
network.load("model_path")

input_name = network.get_input_name(0)
input_tensor = network.get_io_tensor(input_name)
output_name = network.get_output_name(0)
output_tensor = network.get_io_tensor(output_name)

input_tensor.set_data_by_copy(input_data)

network.forward()
network.wait()

async_with_callback(async_callback)[source]

set the network forwarding in async mode and set the AsyncCallback callback function

Parameters

async_callback – the callback to set for network
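
A minimal sketch of asynchronous forwarding; it assumes the callback is invoked without arguments once forwarding completes (the model path is hypothetical):

from megenginelite import *
import time

network = LiteNetwork()
network.load("model_path")

finished = False

def on_finish():
    # assumed to be called with no arguments when forward is done
    global finished
    finished = True

network.async_with_callback(on_finish)
network.forward()
while not finished:
    time.sleep(0.001)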

property device_id

get the device id

Returns

the device id of current network used

dump_layout_transform_model(model_file)[source]

dump the network after the global layout transform optimization to the specified path

Parameters

model_file – the file path to dump model

enable_cpu_inplace_mode()[source]

set CPU forward to inplace mode, in which CPU forward creates only one thread

Note

this must be set before the network is loaded
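
A minimal sketch of the required call order (the model path is hypothetical):

from megenginelite import *
network = LiteNetwork()
network.enable_cpu_inplace_mode()  # must be called before load
network.load("model_path")
assert network.is_cpu_inplace_mode()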

enable_global_layout_transform()[source]

enable the global layout transform optimization for the network. Global layout optimization can automatically determine the layout of every operator in the network by profiling, which can improve network forwarding performance
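
A minimal sketch combining this with dump_layout_transform_model (file paths are hypothetical; enabling the transform before load follows the usual usage pattern):

from megenginelite import *
network = LiteNetwork()
network.enable_global_layout_transform()
network.load("model_path")
network.forward()
network.wait()
# save the layout-transformed model for later reuse
network.dump_layout_transform_model("transformed_model_path")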

enable_profile_performance(profile_file)[source]

enable profiling of network performance and save the profiled information into the given file

Parameters

profile_file – the file to save profile information
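
A minimal usage sketch (file names are hypothetical; profiling is enabled before load here):

from megenginelite import *
network = LiteNetwork()
network.enable_profile_performance("profile.json")
network.load("model_path")
network.forward()
network.wait()
# the profiled information is saved to profile.json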

extra_configure(extra_config)[source]

Extra configuration for the network.

forward()[source]

run the network forward with the filled input data and fill the output data into the output tensors

get_all_input_name()[source]

get all the input tensor names in the network

Returns

the names of all input tensors in the network

get_all_output_name()[source]

get all the output tensor names in the network

Returns

the names of all output tensors in the network

get_discrete_tensor(name, n_idx, phase=LiteTensorPhase.LITE_INPUT)[source]

get the n_idx-th tensor of the network input whose tensor name is name and which consists of multiple discrete tensors

Parameters
  • name – the name of input tensor

  • n_idx – the tensor index

  • phase – the type of LiteTensor, this is useful to separate input tensor with the same name

Returns

the tensors with given name and type
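
A minimal sketch, assuming the "data" input was declared discrete via LiteConfig.discrete_input_name (the input name and model path are hypothetical):

from megenginelite import *
config = LiteConfig()
config.discrete_input_name = "data".encode("utf-8")
network = LiteNetwork(config)
network.load("model_path")
# fetch the tensors that make up the discrete "data" input one by one
t0 = network.get_discrete_tensor("data", 0)
t1 = network.get_discrete_tensor("data", 1)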

get_input_name(index)[source]

get the input name by the index in the network

Parameters

index – the index of the input name

Returns

the name of the input tensor with the given index

get_io_tensor(name, phase=LiteTensorPhase.LITE_IO)[source]

get input or output tensor by its name

Parameters
  • name – the name of io tensor

  • phase – the type of LiteTensor, this is useful to separate input or output tensor with the same name

Returns

the tensor with given name and type

get_output_name(index)[source]

get the output name by the index in the network

Parameters

index – the index of the output name

Returns

the name of the output tensor with the given index

get_static_memory_alloc_info(log_dir='logs/test')[source]

get the static peak memory info, which can be shown by graph visualization

Parameters

log_dir – the directory to save information log

io_bin_dump(bin_dir)[source]

dump all input/output tensors of all operators to output files in binary format; users can use this function to debug compute errors

Parameters

bin_dir – the binary file directory

io_txt_dump(txt_file)[source]

dump all input/output tensors of all operators to the output file in txt format; users can use this function to debug compute errors

Parameters

txt_file – the txt file
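
A minimal debugging sketch (paths are hypothetical); the dump is requested before forwarding so the next run is recorded:

from megenginelite import *
network = LiteNetwork()
network.load("model_path")
network.io_txt_dump("io_dump.txt")      # text dump of every operator's I/O
# network.io_bin_dump("io_dump_dir")    # or binary dump into a directory
network.forward()
network.wait()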

is_cpu_inplace_mode()[source]

whether the network runs in CPU inplace mode

Returns

True if inplace mode is used, otherwise False

load(file)[source]

load network from given file or file object

set_finish_callback(finish_callback)[source]

when the network finishes forwarding, the callback will be called; finish_callback takes a mapping from LiteIO to the corresponding LiteTensor as its parameter

Parameters

finish_callback – the callback to set for network

set_network_algo_policy(policy, shared_batch_size=0, binary_equal_between_batch=False)[source]

set the network algorithm search policy for fast-run

Parameters
  • shared_batch_size – the batch size used by fast-run. A non-zero value means fast-run uses this batch size regardless of the batch size of the model; zero means fast-run uses the batch size of the model

  • binary_equal_between_batch – whether the content of each output batch is guaranteed to be binary equal when the content of each input batch is binary equal
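
A minimal sketch, assuming the LiteAlgoSelectStrategy enum exported by megenginelite provides a profile strategy value:

from megenginelite import *
network = LiteNetwork()
network.load("model_path")
# assumed strategy value; shared_batch_size=1 fixes the fast-run batch size
network.set_network_algo_policy(
    LiteAlgoSelectStrategy.LITE_ALGO_PROFILE, shared_batch_size=1
)
network.forward()
network.wait()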

set_network_algo_workspace_limit(size_limit)[source]

set the operator workspace limit in the target network. Some operators may use a large workspace to get good performance; setting a workspace limit can save memory but may affect performance

Parameters

size_limit – the workspace limit in bytes

set_start_callback(start_callback)[source]

when the network starts forwarding, the callback will be called; start_callback takes a mapping from LiteIO to the corresponding LiteTensor as its parameter

Parameters

start_callback – the callback to set for network
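
A minimal sketch of both callbacks; per the descriptions above, each callback is assumed to receive a mapping from LiteIO to the corresponding LiteTensor (the model path is hypothetical):

from megenginelite import *
network = LiteNetwork()
network.load("model_path")

def on_start(io_to_tensor):
    # io_to_tensor: mapping from LiteIO to LiteTensor of the inputs
    print("forward started,", len(io_to_tensor), "input items")

def on_finish(io_to_tensor):
    # io_to_tensor: mapping from LiteIO to LiteTensor of the outputs
    print("forward finished,", len(io_to_tensor), "output items")

network.set_start_callback(on_start)
network.set_finish_callback(on_finish)
network.forward()
network.wait()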

share_runtime_memroy(src_network)[source]

share runtime memory with the source network

Parameters

src_network – the network to share runtime memory with

share_weights_with(src_network)[source]

share weights with the loaded network

Parameters

src_network – the network to share weights with
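
A minimal sketch of sharing weights between two networks (the model path is hypothetical; the destination network is assumed to take its weights directly from the already-loaded source network):

from megenginelite import *
src_network = LiteNetwork()
src_network.load("model_path")

new_network = LiteNetwork()
new_network.share_weights_with(src_network)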

property stream_id

get the stream id

Returns

the value of the stream id set for the network

property threads_number

get the thread number of the network

Returns

the number of threads set in the network

use_tensorrt()[source]

use TensorRT

Note

this must be set before the network is loaded

wait()[source]

wait until the forward finishes in sync mode