VisionTransform

class VisionTransform(order=None)[source]

Base class of all transforms used in computer vision. Calling logic: apply_batch() -> apply() -> _apply_image() and the other _apply_*() methods. To implement a custom transform for images, override the _apply_image() method in a subclass; a sketch follows the parameter description below.

Parameters

order

input type order. The input is a tuple containing different data structures; order specifies the order of those structures. For example, if the input is of type (image, boxes), then order should be ("image", "boxes"). The currently available strings and their data types are described below:

  • "image": input image, with shape of (H, W, C).

  • "coords": coordinates, with shape of (N, 2).

  • "boxes": bounding boxes, with shape of (N, 4), in "xyxy" format; the first "xy" is the top-left point of a box, the second "xy" is the bottom-right point.

  • "mask": map used for segmentation, with shape of (H, W, 1).

  • "keypoints": keypoints, with shape of (N, K, 3), where N is the number of instances and K is the number of keypoints per instance. Along the last axis, the first two values are the coordinates of a keypoint and the third is its label.

  • "polygons": a sequence of numpy arrays whose length is the number of instances; each array holds the polygon coordinates of one instance.

  • "category": categories for some data types. For example, "image_category" means the category of the input image, and "boxes_category" means the categories of the bounding boxes.

  • "info": information about the image, such as its shape and file path.

You can also add custom data types, provided you implement the corresponding _apply_*() methods; otherwise a NotImplementedError will be raised.
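For instance, a minimal subclass might look like the sketch below. It assumes VisionTransform is importable from megengine.data.transform; the AddGaussianNoise name and its std parameter are illustrative, not part of this API. Only the _apply_*() hooks come from VisionTransform itself.

    import numpy as np

    # NOTE: this import path is an assumption; adjust it to your installation.
    from megengine.data.transform import VisionTransform

    class AddGaussianNoise(VisionTransform):
        """Hypothetical transform: perturb the image, leave geometry untouched."""

        def __init__(self, std=10.0, order=None):
            super().__init__(order)
            self.std = std

        def _apply_image(self, image):
            # image arrives with shape (H, W, C); add pixel-wise noise.
            noise = np.random.normal(0.0, self.std, size=image.shape)
            out = image.astype(np.float32) + noise
            return np.clip(out, 0, 255).astype(np.uint8)

        def _apply_boxes(self, boxes):
            # Noise does not move pixels, so boxes pass through unchanged.
            return boxes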

apply(input)[source]

Apply the transform on a single input sample.

apply_batch(inputs)[source]

Apply the transform on a batch of input data.
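Putting both methods together, a hedged usage sketch (reusing the hypothetical AddGaussianNoise subclass from above): apply() consumes one tuple whose layout matches order, and, per the calling logic described at the top, apply_batch() dispatches to apply() once per sample.

    import numpy as np

    # One sample matching order=("image", "boxes").
    image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
    boxes = np.array([[10.0, 20.0, 100.0, 120.0]], dtype=np.float32)  # xyxy

    transform = AddGaussianNoise(std=5.0, order=("image", "boxes"))

    # Transform a single sample ...
    noisy_image, same_boxes = transform.apply((image, boxes))

    # ... or a batch of samples, one apply() call per sample.
    outputs = transform.apply_batch([(image, boxes), (image, boxes)])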