

# 3D Point Cloud Input Data
<a name="sms-point-cloud-input-data"></a>

To create a 3D point cloud labeling job, you must create an input manifest file. Use this topic to learn the formatting requirements of the input manifest file for each task type. To learn about the raw input data formats Ground Truth accepts for 3D point cloud labeling jobs, see the section [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md).

Use your [labeling job task type](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-task-types.html) to choose a topic from [Input Manifest Files for 3D Point Cloud Labeling Jobs](sms-point-cloud-input-manifest.md) and learn about the formatting requirements for each line of your input manifest file.

**Topics**
+ [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md)
+ [Input Manifest Files for 3D Point Cloud Labeling Jobs](sms-point-cloud-input-manifest.md)
+ [Understand Coordinate Systems and Sensor Fusion](sms-point-cloud-sensor-fusion-details.md)

# Accepted Raw 3D Data Formats
<a name="sms-point-cloud-raw-data-types"></a>

Ground Truth uses your 3D point cloud data to render the 3D scenes that workers annotate. This section describes the raw data formats that are accepted for point cloud data and sensor fusion data for a point cloud frame. To learn how to create an input manifest file to connect your raw input data files with Ground Truth, see [Input Manifest Files for 3D Point Cloud Labeling Jobs](sms-point-cloud-input-manifest.md).

For each frame, Ground Truth supports Compact Binary Pack Format (.bin) and ASCII (.txt) files. These files contain information about the location (`x`, `y`, and `z` coordinates) of all points that make up that frame, and, optionally, information about the pixel color of each point for colored point clouds. When you create a 3D point cloud labeling job input manifest file, you can specify the format of your raw data in the `format` parameter. 

The following table lists elements that Ground Truth supports in point cloud frame files to describe individual points. 



| Symbol | Value | 
| --- | --- | 
|  `x`  |  The x coordinate of the point.  | 
|  `y`  |  The y coordinate of the point.  | 
|  `z`  |  The z coordinate of the point.  | 
|  `i`  |  The intensity of the point.  | 
|  `r`  |  The red color channel component. An 8-bit value (0-255).  | 
|  `g`  |  The green color channel component. An 8-bit value (0-255).  | 
|  `b`  |  The blue color channel component. An 8-bit value (0-255).  | 

Ground Truth assumes the following about your input data:
+ All of the positional coordinates (x, y, z) are in meters. 
+ All pose headings (`qx`, `qy`, `qz`, `qw`) are expressed as [spatial quaternions](https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation).
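
If your pose source provides Euler angles rather than quaternions, a conversion along these lines produces the `qx`, `qy`, `qz`, `qw` values the input manifest expects. This is a sketch, not Ground Truth code; it assumes angles in radians and the ZYX (yaw-pitch-roll) rotation convention:

```python
import math

def heading_from_euler(yaw, pitch=0.0, roll=0.0):
    """Convert ZYX Euler angles (radians) to a unit quaternion in the
    (qx, qy, qz, qw) order used by the input manifest."""
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    return {
        "qx": sr * cp * cy - cr * sp * sy,
        "qy": cr * sp * cy + sr * cp * sy,
        "qz": cr * cp * sy - sr * sp * cy,
        "qw": cr * cp * cy + sr * sp * sy,
    }

# Zero rotation yields the identity quaternion.
print(heading_from_euler(0.0))  # {'qx': 0.0, 'qy': 0.0, 'qz': 0.0, 'qw': 1.0}
```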

## Compact Binary Pack Format
<a name="sms-point-cloud-raw-data-cbpf-format"></a>

The Compact Binary Pack Format represents a point cloud as an ordered stream of points. Each point in the stream is an ordered binary pack of 4-byte float values in some variant of the form `xyzirgb`. The `x`, `y`, and `z` elements are required, and additional information about each point can be included in a variety of ways using `i`, `r`, `g`, and `b`. 

To use a binary file to input point cloud frame data to a Ground Truth 3D point cloud labeling job, enter `binary/<elements>` in the `format` parameter for your input manifest file and replace `<elements>` with the order of elements in each binary pack. For example, you may enter one of the following for the `format` parameter. 
+ `binary/xyzi` – When you use this format, your point element stream would be in the following order: `x1y1z1i1x2y2z2i2...`
+ `binary/xyzrgb` – When you use this format, your point element stream would be in the following order: `x1y1z1r1g1b1x2y2z2r2g2b2...`
+ `binary/xyzirgb` – When you use this format, your point element stream would be in the following order: `x1y1z1i1r1g1b1x2y2z2i2r2g2b2...`

When you use a binary file for your point cloud frame data, if you do not enter a value for `format`, the default pack format `binary/xyzi` is used. 
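
As an illustration, a `binary/xyzi` frame file can be produced and parsed with Python's standard `struct` module. This is a sketch, not part of any Ground Truth SDK; the byte order is not stated in this documentation, so little-endian (`<`) is an assumption here:

```python
import struct

# Hypothetical example points: (x, y, z, intensity) per point; coordinates in meters.
points = [(1.0, 2.0, 3.0, 0.5), (4.0, 5.0, 6.0, 0.9)]

# Pack the stream x1 y1 z1 i1 x2 y2 z2 i2 ... as 4-byte floats.
# Little-endian ("<") is an assumption; check what your sensor pipeline emits.
payload = b"".join(struct.pack("<4f", *p) for p in points)
with open("frame1.bin", "wb") as f:
    f.write(payload)

# Reading the frame back is the same pack in reverse: 16 bytes per xyzi point.
with open("frame1.bin", "rb") as f:
    raw = f.read()
restored = [struct.unpack_from("<4f", raw, off) for off in range(0, len(raw), 16)]
```

For `binary/xyzrgb` or `binary/xyzirgb`, change the pack string (for example `"<6f"` or `"<7f"`) and the per-point stride accordingly.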

## ASCII Format
<a name="sms-point-cloud-raw-data-ascii-format"></a>

The ASCII format uses a text file to represent a point cloud, where each line in the file represents a single point. Each point's line contains white space separated values, each of which is a 4-byte float written as ASCII text. The `x`, `y`, and `z` elements are required for each point, and additional information about that point can be included in a variety of ways using `i`, `r`, `g`, and `b`.

To use a text file to input point cloud frame data to a Ground Truth 3D point cloud labeling job, enter `text/<elements>` in the `format` parameter for your input manifest file and replace `<elements>` with the order of point elements on each line. 

For example, if you enter `text/xyzi` for `format`, your text file for each point cloud frame should look similar to the following: 

```
x1 y1 z1 i1
x2 y2 z2 i2
...
...
```

If you enter `text/xyzrgb`, your text file should look similar to the following: 

```
x1 y1 z1 r1 g1 b1
x2 y2 z2 r2 g2 b2
...
...
```

When you use a text file for your point cloud frame data, if you do not enter a value for `format`, the default format `text/xyzi` will be used. 
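
For example, a `text/xyzi` frame file can be written with a few lines of Python. This is a sketch with made-up point values:

```python
# Hypothetical example points: (x, y, z, intensity) per point; coordinates in meters.
points = [(1.0, 2.0, 3.0, 0.5), (4.0, 5.0, 6.0, 0.9)]

# One point per line, values separated by white space.
with open("frame1.txt", "w") as f:
    for x, y, z, i in points:
        f.write(f"{x} {y} {z} {i}\n")

# Parsing the file back recovers the same values.
with open("frame1.txt") as f:
    parsed = [tuple(float(v) for v in line.split()) for line in f]
```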

## Point Cloud Resolution Limits
<a name="sms-point-cloud-resolution"></a>

Ground Truth does not impose a resolution limit on 3D point cloud frames. However, we recommend that you limit each point cloud frame to 500K points for optimal performance. Ground Truth renders the 3D point cloud visualization on your workers' computers, so rendering performance depends on their hardware. Point cloud frames that are larger than 1 million points may not render on standard machines, or may take too long to load. 
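
If your frames exceed this budget, one simple way to stay under it is random subsampling before you upload the frames. The sketch below uses only the Python standard library (`downsample` and `MAX_POINTS` are hypothetical names; a voxel-grid filter, not shown here, preserves spatial structure better than random sampling):

```python
import random

MAX_POINTS = 500_000  # recommended per-frame budget for smooth rendering

def downsample(points, max_points=MAX_POINTS, seed=0):
    """Randomly subsample a frame that exceeds the recommended point budget."""
    if len(points) <= max_points:
        return points
    return random.Random(seed).sample(points, max_points)
```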

# Input Manifest Files for 3D Point Cloud Labeling Jobs
<a name="sms-point-cloud-input-manifest"></a>

When you create a labeling job, you provide an input manifest file where each line of the manifest describes a unit of task to be completed by annotators. The format of your input manifest file depends on your task type. 
+ If you are creating a 3D point cloud **object detection** or **semantic segmentation** labeling job, each line in your input manifest file contains information about a single 3D point cloud frame. This is called a *point cloud frame input manifest*. To learn more, see [Create a Point Cloud Frame Input Manifest File](sms-point-cloud-single-frame-input-data.md). 
+ If you are creating a 3D point cloud **object tracking** labeling job, each line of your input manifest file contains a sequence of 3D point cloud frames and associated data. This is called a *point cloud sequence input manifest*. To learn more, see [Create a Point Cloud Sequence Input Manifest](sms-point-cloud-multi-frame-input-data.md). 

# Create a Point Cloud Frame Input Manifest File
<a name="sms-point-cloud-single-frame-input-data"></a>

The manifest is a UTF-8 encoded file in which each line is a complete and valid JSON object. Each line is delimited by a standard line break, `\n` or `\r\n`. Because each line must be a valid JSON object, you can't have unescaped line break characters. In the single-frame input manifest file, each line in the manifest contains data for a single point cloud frame. The point cloud frame data can be stored in either binary or ASCII format (see [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md)). This is the manifest file formatting required for 3D point cloud object detection and semantic segmentation. Optionally, you can also provide camera sensor fusion data for each point cloud frame. 

Ground Truth supports point cloud and video camera sensor fusion in the [world coordinate system](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-world-coordinate-system) for all modalities. If you can obtain your 3D sensor extrinsic (like a LiDAR extrinsic), we recommend that you transform 3D point cloud frames into the world coordinate system using the extrinsic. For more information, see [Sensor Fusion](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-sensor-fusion). 

However, if you cannot obtain a point cloud in the world coordinate system, you can provide coordinates in the original coordinate system that the data was captured in. If you are providing camera data for sensor fusion, we recommend that you provide the LiDAR sensor and camera poses in the world coordinate system. 

To create a single-frame input manifest file, you will identify the location of each point cloud frame that you want workers to label using the `source-ref` key. Additionally, you must use the `source-ref-metadata` key to identify the format of your dataset, a timestamp for that frame, and, optionally, sensor fusion data and video camera images.

The following example demonstrates the syntax used for an input manifest file for a single-frame point cloud labeling job. The example includes two point cloud frames. For details about each parameter, see the table following this example. 

**Important**  
Each line in your input manifest file must be in [JSON Lines](http://jsonlines.org/) format. The following code block shows an input manifest file with two JSON objects. Each JSON object is used to point to and provide details about a single point cloud frame. The JSON objects have been expanded for readability, but you must minify each JSON object to fit on a single line when creating an input manifest file. An example is provided under this code block.

```
{
    "source-ref": "s3://amzn-s3-demo-bucket/examplefolder/frame1.bin",
    "source-ref-metadata":{
        "format": "binary/xyzi",
        "unix-timestamp": 1566861644.759115,
        "ego-vehicle-pose":{
            "position": {
                "x": -2.7161461413869947,
                "y": 116.25822288149078,
                "z": 1.8348751887989483
            },
            "heading": {
                "qx": -0.02111296123795955,
                "qy": -0.006495469416730261,
                "qz": -0.008024565904865688,
                "qw": 0.9997181192298087
            }
        },
        "prefix": "s3://amzn-s3-demo-bucket/lidar_singleframe_dataset/someprefix/",
        "images": [
        {
            "image-path": "images/frame300.bin_camera0.jpg",
            "unix-timestamp": 1566861644.759115,
            "fx": 847.7962624528487,
            "fy": 850.0340893791985,
            "cx": 576.2129134707038,
            "cy": 317.2423573573745,
            "k1": 0,
            "k2": 0,
            "k3": 0,
            "k4": 0,
            "p1": 0,
            "p2": 0,
            "skew": 0,
            "position": {
                "x": -2.2722515189268138,
                "y": 116.86003310568965,
                "z": 1.454614668542299
            },
            "heading": {
                "qx": 0.7594754093069037,
                "qy": 0.02181790885672969,
                "qz": -0.02461725233103356,
                "qw": -0.6496916273040025
            },
            "camera-model": "pinhole"
        }]
    }
}
{
    "source-ref": "s3://amzn-s3-demo-bucket/examplefolder/frame2.bin",
    "source-ref-metadata":{
        "format": "binary/xyzi",
        "unix-timestamp": 1566861632.759133,
        "ego-vehicle-pose":{
            "position": {
                "x": -2.7161461413869947,
                "y": 116.25822288149078,
                "z": 1.8348751887989483
            },
            "heading": {
                "qx": -0.02111296123795955,
                "qy": -0.006495469416730261,
                "qz": -0.008024565904865688,
                "qw": 0.9997181192298087
            }
        },
        "prefix": "s3://amzn-s3-demo-bucket/lidar_singleframe_dataset/someprefix/",
        "images": [
        {
            "image-path": "images/frame300.bin_camera0.jpg",
            "unix-timestamp": 1566861644.759115,
            "fx": 847.7962624528487,
            "fy": 850.0340893791985,
            "cx": 576.2129134707038,
            "cy": 317.2423573573745,
            "k1": 0,
            "k2": 0,
            "k3": 0,
            "k4": 0,
            "p1": 0,
            "p2": 0,
            "skew": 0,
            "position": {
                "x": -2.2722515189268138,
                "y": 116.86003310568965,
                "z": 1.454614668542299
            },
            "heading": {
                "qx": 0.7594754093069037,
                "qy": 0.02181790885672969,
                "qz": -0.02461725233103356,
                "qw": -0.6496916273040025
            },
            "camera-model": "pinhole"
        }]
    }
}
```

When you create an input manifest file, you must collapse your JSON objects to fit on a single line. For example, the code block above would appear as follows in an input manifest file:

```
{"source-ref":"s3://amzn-s3-demo-bucket/examplefolder/frame1.bin","source-ref-metadata":{"format":"binary/xyzi","unix-timestamp":1566861644.759115,"ego-vehicle-pose":{"position":{"x":-2.7161461413869947,"y":116.25822288149078,"z":1.8348751887989483},"heading":{"qx":-0.02111296123795955,"qy":-0.006495469416730261,"qz":-0.008024565904865688,"qw":0.9997181192298087}},"prefix":"s3://amzn-s3-demo-bucket/lidar_singleframe_dataset/someprefix/","images":[{"image-path":"images/frame300.bin_camera0.jpg","unix-timestamp":1566861644.759115,"fx":847.7962624528487,"fy":850.0340893791985,"cx":576.2129134707038,"cy":317.2423573573745,"k1":0,"k2":0,"k3":0,"k4":0,"p1":0,"p2":0,"skew":0,"position":{"x":-2.2722515189268138,"y":116.86003310568965,"z":1.454614668542299},"heading":{"qx":0.7594754093069037,"qy":0.02181790885672969,"qz":-0.02461725233103356,"qw":-0.6496916273040025},"camera-model":"pinhole"}]}}
{"source-ref":"s3://amzn-s3-demo-bucket/examplefolder/frame2.bin","source-ref-metadata":{"format":"binary/xyzi","unix-timestamp":1566861632.759133,"ego-vehicle-pose":{"position":{"x":-2.7161461413869947,"y":116.25822288149078,"z":1.8348751887989483},"heading":{"qx":-0.02111296123795955,"qy":-0.006495469416730261,"qz":-0.008024565904865688,"qw":0.9997181192298087}},"prefix":"s3://amzn-s3-demo-bucket/lidar_singleframe_dataset/someprefix/","images":[{"image-path":"images/frame300.bin_camera0.jpg","unix-timestamp":1566861644.759115,"fx":847.7962624528487,"fy":850.0340893791985,"cx":576.2129134707038,"cy":317.2423573573745,"k1":0,"k2":0,"k3":0,"k4":0,"p1":0,"p2":0,"skew":0,"position":{"x":-2.2722515189268138,"y":116.86003310568965,"z":1.454614668542299},"heading":{"qx":0.7594754093069037,"qy":0.02181790885672969,"qz":-0.02461725233103356,"qw":-0.6496916273040025},"camera-model":"pinhole"}]}}
```
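
One way to produce such a manifest is to build each entry as a Python dictionary and let `json.dumps` serialize it onto a single line. This is a sketch with abbreviated metadata; the bucket names and timestamps are taken from the example above, and in practice you would extend each `source-ref-metadata` object with pose and image data:

```python
import json

# Hypothetical entries with abbreviated metadata.
entries = [
    {"source-ref": "s3://amzn-s3-demo-bucket/examplefolder/frame1.bin",
     "source-ref-metadata": {"format": "binary/xyzi",
                             "unix-timestamp": 1566861644.759115}},
    {"source-ref": "s3://amzn-s3-demo-bucket/examplefolder/frame2.bin",
     "source-ref-metadata": {"format": "binary/xyzi",
                             "unix-timestamp": 1566861632.759133}},
]

# json.dumps never emits literal line breaks, so each object stays on one line,
# which is exactly what the JSON Lines format requires.
with open("manifest.jsonl", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry, separators=(",", ":")) + "\n")
```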

The following table shows the parameters you can include in your input manifest file:



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `source-ref`  |  Yes  |  String **Accepted string value format**:  `s3://<bucket-name>/<folder-name>/point-cloud-frame-file`  |  The Amazon S3 location of a single point cloud frame.  | 
|  `source-ref-metadata`  |  Yes  |  JSON object **Accepted parameters**:  `format`, `unix-timestamp`, `ego-vehicle-pose`, `position`, `prefix`, `images`  |  Use this parameter to include additional information about the point cloud in `source-ref`, and to provide camera data for sensor fusion.   | 
|  `format`  |  No  |  String **Accepted string values**: `"binary/xyz"`, `"binary/xyzi"`, `"binary/xyzrgb"`, `"binary/xyzirgb"`, `"text/xyz"`, `"text/xyzi"`, `"text/xyzrgb"`, `"text/xyzirgb"` **Default Values**:  When the file identified in `source-ref` has a .bin extension, `binary/xyzi` When the file identified in `source-ref` has a .txt extension, `text/xyzi`  |  Use this parameter to specify the format of your point cloud data. For more information, see [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md).  | 
|  `unix-timestamp`  |  Yes  |  Number A unix timestamp.   |  The unix timestamp is the number of seconds since January 1st, 1970 until the UTC time that the data was collected by a sensor.   | 
|  `ego-vehicle-pose`  |  No  |  JSON object  |  The pose of the device used to collect the point cloud data. For more information about this parameter, see [Include Vehicle Pose Information in Your Input Manifest](#sms-point-cloud-single-frame-ego-vehicle-input).  | 
|  `prefix`  |  No  |  String **Accepted string value format**:  `s3://<bucket-name>/<folder-name>/`  |  The location in Amazon S3 where your metadata, such as camera images, is stored for this frame.  The prefix must end with a forward slash: `/`.  | 
|  `images`  |  No  |  List  |  A list of parameters describing color camera images used for sensor fusion. You can include up to 8 images in this list. For more information about the parameters required for each image, see [Include Camera Data in Your Input Manifest](#sms-point-cloud-single-frame-image-input).   | 

## Include Vehicle Pose Information in Your Input Manifest
<a name="sms-point-cloud-single-frame-ego-vehicle-input"></a>

Use the ego-vehicle location to provide information about the pose of the vehicle used to capture point cloud data. Ground Truth uses this information to compute the LiDAR extrinsic matrix. 

Ground Truth uses extrinsic matrices to project labels to and from the 3D scene and 2D images. For more information, see [Sensor Fusion](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-sensor-fusion).

The following table provides more information about the `position` and orientation (`heading`) parameters that are required when you provide ego-vehicle information. 



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `position`  |  Yes  |  JSON object **Required Parameters**: `x`, `y`, and `z`. Enter numbers for these parameters.   |  The translation vector of the ego vehicle in the world coordinate system.   | 
|  `heading`  |  Yes  |  JSON Object **Required Parameters**: `qx`, `qy`, `qz`, and `qw`. Enter numbers for these parameters.   |  The orientation of the frame of reference of the device or sensor mounted on the vehicle sensing the surroundings, measured in [quaternions](https://en.wikipedia.org/wiki/Quaternion), (`qx`, `qy`, `qz`, `qw`), in the world coordinate system.  | 

## Include Camera Data in Your Input Manifest
<a name="sms-point-cloud-single-frame-image-input"></a>

If you want to include video camera data with a frame, use the following parameters to provide information about each image. The **Required** column below applies when the `images` parameter is included in the input manifest file under `source-ref-metadata`. You are not required to include images in your input manifest file. 

If you include camera images, you must include information about the camera `position` and `heading` used to capture the images, in the world coordinate system.

If your images are distorted, Ground Truth can automatically undistort them using information you provide about the image in your input manifest file, including distortion coefficients (`k1`, `k2`, `k3`, `k4`, `p1`, `p2`), the camera model, and the camera intrinsic matrix. The intrinsic matrix is made up of the focal length (`fx`, `fy`) and the principal point (`cx`, `cy`). See [Intrinsic Matrix](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-intrinsic) to learn how Ground Truth uses the camera intrinsic matrix. If distortion coefficients are not included, Ground Truth will not undistort an image. 
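
To illustrate how the intrinsic parameters relate pixels to 3D points: under the pinhole model, once distortion is removed, a 3D point in the camera frame projects to pixel coordinates via `u = fx*x/z + cx` and `v = fy*y/z + cy`. A minimal sketch (not Ground Truth code; `project_pinhole` is a hypothetical helper, and the intrinsic values are rounded from the example above):

```python
def project_pinhole(x, y, z, fx, fy, cx, cy):
    """Project a camera-frame 3D point (meters) to pixel coordinates,
    assuming lens distortion has already been removed."""
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v

# A point 10 m straight ahead of the camera lands on the principal point.
u, v = project_pinhole(0.0, 0.0, 10.0, fx=847.8, fy=850.0, cx=576.2, cy=317.2)
```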



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `image-path`  |  Yes  |  String **Example of format**:  `<folder-name>/<imagefile.png>`  |  The relative location, in Amazon S3, of your image file. This relative path will be appended to the path you specify in `prefix`.   | 
|  `unix-timestamp`  |  Yes  |  Number  |  The unix timestamp is the number of seconds since January 1st, 1970 until the UTC time that the data was collected by a camera.   | 
|  `camera-model`  |  No  |  String: **Accepted Values**: `"pinhole"`, `"fisheye"` **Default**: `"pinhole"`  |  The model of the camera used to capture the image. This information is used to undistort camera images.   | 
|  `fx, fy`  |  Yes  |  Numbers  |  The focal length of the camera, in the x (`fx`) and y (`fy`) directions.  | 
|  `cx, cy`  |  Yes  | Numbers |  The x (`cx`) and y (`cy`) coordinates of the principal point.   | 
|  `k1, k2, k3, k4`  |  No  |  Number  |  Radial distortion coefficients. Supported for both **fisheye** and **pinhole** camera models.   | 
|  `p1, p2`  |  No  |  Number  |  Tangential distortion coefficients. Supported for **pinhole** camera models.  | 
|  `skew`  |  No  |  Number  |  A parameter to measure the skew of an image.   | 
|  `position`  |  Yes  |  JSON object **Required Parameters**: `x`, `y`, and `z`. Enter numbers for these parameters.   | The location or origin of the frame of reference of the camera mounted on the vehicle capturing images. | 
|  `heading`  |  Yes  |  JSON Object **Required Parameters**: `qx`, `qy`, `qz`, and `qw`. Enter numbers for these parameters.   |  The orientation of the frame of reference of the camera mounted on the vehicle capturing images, measured using [quaternions](https://en.wikipedia.org/wiki/Quaternion), (`qx`, `qy`, `qz`, `qw`), in the world coordinate system.   | 

## Point Cloud Frame Limits
<a name="sms-point-cloud-single-frame-limits"></a>

You can include up to 100,000 point cloud frames in your input manifest file. 3D point cloud labeling jobs have longer pre-processing times than other Ground Truth task types. For more information, see [Job pre-processing time](sms-point-cloud-general-information.md#sms-point-cloud-job-creation-time).

# Create a Point Cloud Sequence Input Manifest
<a name="sms-point-cloud-multi-frame-input-data"></a>

The manifest is a UTF-8 encoded file in which each line is a complete and valid JSON object. Each line is delimited by a standard line break, `\n` or `\r\n`. Because each line must be a valid JSON object, you can't have unescaped line break characters. In the point cloud sequence input manifest file, each line in the manifest contains a sequence of point cloud frames. The point cloud data for each frame in the sequence can be stored in either binary or ASCII format. For more information, see [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md). This is the manifest file formatting required for 3D point cloud object tracking. Optionally, you can also provide point attribute and camera sensor fusion data for each point cloud frame. When you create a sequence input manifest file, you must provide LiDAR and video camera sensor fusion data in a [world coordinate system](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-world-coordinate-system). 

The following example demonstrates the syntax used for an input manifest file when each line in the manifest is a sequence file. Each line in your input manifest file must be in [JSON Lines](http://jsonlines.org/) format.

```
{"source-ref": "s3://amzn-s3-demo-bucket/example-folder/seq1.json"}
{"source-ref": "s3://amzn-s3-demo-bucket/example-folder/seq2.json"}
```

The data for each sequence of point cloud frames needs to be stored in a JSON data object. The following is an example of the format you use for a sequence file. Information about each frame is included as a JSON object and listed in the `frames` list. This example shows a sequence file with two point cloud frame files, `frame300.bin` and `frame303.bin`. The *...* is used to indicate where you should include information for additional frames. Add a JSON object for each frame in the sequence.

The following code block includes a JSON object for a single sequence file. The JSON object has been expanded for readability.

```
{
  "seq-no": 1,
  "prefix": "s3://amzn-s3-demo-bucket/example_lidar_sequence_dataset/seq1/",
  "number-of-frames": 100,
  "frames":[
    {
        "frame-no": 300, 
        "unix-timestamp": 1566861644.759115, 
        "frame": "example_lidar_frames/frame300.bin", 
        "format": "binary/xyzi", 
        "ego-vehicle-pose":{
            "position": {
                "x": -2.7161461413869947,
                "y": 116.25822288149078,
                "z": 1.8348751887989483
            },
            "heading": {
                "qx": -0.02111296123795955,
                "qy": -0.006495469416730261,
                "qz": -0.008024565904865688,
                "qw": 0.9997181192298087
            }
        }, 
        "images": [
        {
            "image-path": "example_images/frame300.bin_camera0.jpg",
            "unix-timestamp": 1566861644.759115,
            "fx": 847.7962624528487,
            "fy": 850.0340893791985,
            "cx": 576.2129134707038,
            "cy": 317.2423573573745,
            "k1": 0,
            "k2": 0,
            "k3": 0,
            "k4": 0,
            "p1": 0,
            "p2": 0,
            "skew": 0,
            "position": {
                "x": -2.2722515189268138,
                "y": 116.86003310568965,
                "z": 1.454614668542299
            },
            "heading": {
                "qx": 0.7594754093069037,
                "qy": 0.02181790885672969,
                "qz": -0.02461725233103356,
                "qw": -0.6496916273040025
            },
            "camera-model": "pinhole"
        }]
    },
    {
        "frame-no": 303, 
        "unix-timestamp": 1566861644.759115, 
        "frame": "example_lidar_frames/frame303.bin", 
        "format": "text/xyzi", 
        "ego-vehicle-pose":{...}, 
        "images":[{...}]
    },
     ...
  ]
}
```
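
A sequence file like the one above can be assembled programmatically. The sketch below (hypothetical frame records with abbreviated metadata; add `ego-vehicle-pose` and `images` per frame as needed) keeps `number-of-frames` consistent with the `frames` list by construction:

```python
import json

# Hypothetical frame records with abbreviated metadata.
# Timestamps must be distinct and sequential across frames.
frames = [
    {"frame-no": 300, "unix-timestamp": 1566861644.759115,
     "frame": "example_lidar_frames/frame300.bin", "format": "binary/xyzi"},
    {"frame-no": 303, "unix-timestamp": 1566861647.759115,
     "frame": "example_lidar_frames/frame303.bin", "format": "binary/xyzi"},
]

sequence = {
    "seq-no": 1,
    "prefix": "s3://amzn-s3-demo-bucket/example_lidar_sequence_dataset/seq1/",
    "number-of-frames": len(frames),  # must equal the length of "frames"
    "frames": frames,
}

with open("seq1.json", "w") as f:
    json.dump(sequence, f)
```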

The following table provides details about the top-level parameters of a sequence file. For detailed information about the parameters required for individual frames in the sequence file, see [Parameters for Individual Point Cloud Frames](#sms-point-cloud-multi-frame-input-single-frame).



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `seq-no`  |  Yes  |  Integer  |  The ordered number of the sequence.   | 
|  `prefix`  |  Yes  |  String **Accepted Values**: `s3://<bucket-name>/<prefix>/`  |  The Amazon S3 location where the sequence files are located.  The prefix must end with a forward slash: `/`.  | 
|  `number-of-frames`  |  Yes  |  Integer  |  The total number of frames included in the sequence file. This number must match the total number of frames listed in the `frames` parameter in the next row.  | 
|  `frames`  |  Yes  |  List of JSON objects  |  A list of frame data. The length of the list must equal `number-of-frames`. In the worker UI, frames appear in the same order as they are listed in this array.  For details about the format of each frame, see [Parameters for Individual Point Cloud Frames](#sms-point-cloud-multi-frame-input-single-frame).   | 

## Parameters for Individual Point Cloud Frames
<a name="sms-point-cloud-multi-frame-input-single-frame"></a>

The following table shows the parameters you can include in your input manifest file.



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `frame-no`  |  No  |  Integer  |  A frame number. This is an optional identifier specified by the customer to identify the frame within a sequence. It is not used by Ground Truth.  | 
|  `unix-timestamp`  |  Yes  |  Number  |  The unix timestamp is the number of seconds since January 1st, 1970 until the UTC time that the data was collected by a sensor.  The timestamp for each frame must be different and timestamps must be sequential because they are used for cuboid interpolation. Ideally, this should be the real timestamp when the data was collected. If this is not available, you must use an incremental sequence of timestamps, where the first frame in your sequence file corresponds to the first timestamp in the sequence.  | 
|  `frame`  |  Yes  |  String **Example of format** `<folder-name>/<frame-file.bin>`  |  The relative location, in Amazon S3, of your point cloud frame file. This relative path will be appended to the path you specify in `prefix`.  | 
|  `format`  |  No  |  String **Accepted string values**: `"binary/xyz"`, `"binary/xyzi"`, `"binary/xyzrgb"`, `"binary/xyzirgb"`, `"text/xyz"`, `"text/xyzi"`, `"text/xyzrgb"`, `"text/xyzirgb"` **Default Values**:  When the file identified in `frame` has a .bin extension, `binary/xyzi` When the file identified in `frame` has a .txt extension, `text/xyzi`  |  Use this parameter to specify the format of your point cloud data. For more information, see [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md).  | 
|  `ego-vehicle-pose`  |  No  |  JSON object  |  The pose of the device used to collect the point cloud data. For more information about this parameter, see [Include Vehicle Pose Information in Your Input Manifest](#sms-point-cloud-multi-frame-ego-vehicle-input).  | 
|  `prefix`  |  No  |  String **Accepted string value format**:  `s3://<bucket-name>/<folder-name>/`  |  The location in Amazon S3 where your metadata, such as camera images, is stored for this frame.  The prefix must end with a forward slash: `/`.  | 
|  `images`  |  No  |  List  |  A list of parameters describing color camera images used for sensor fusion. You can include up to 8 images in this list. For more information about the parameters required for each image, see [Include Camera Data in Your Input Manifest](#sms-point-cloud-multi-frame-image-input).   | 

## Include Vehicle Pose Information in Your Input Manifest
<a name="sms-point-cloud-multi-frame-ego-vehicle-input"></a>

Use the ego-vehicle location to provide information about the pose of the vehicle used to capture point cloud data. Ground Truth uses this information to compute LiDAR extrinsic matrices. 

Ground Truth uses extrinsic matrices to project labels to and from the 3D scene and 2D images. For more information, see [Sensor Fusion](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-sensor-fusion).

The following table provides more information about the `position` and orientation (`heading`) parameters that are required when you provide ego-vehicle information. 



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `position`  |  Yes  |  JSON object **Required Parameters**: `x`, `y`, and `z`. Enter numbers for these parameters.   |  The translation vector of the ego vehicle in the world coordinate system.   | 
|  `heading`  |  Yes  |  JSON Object **Required Parameters**: `qx`, `qy`, `qz`, and `qw`. Enter numbers for these parameters.   |  The orientation of the frame of reference of the device or sensor mounted on the vehicle sensing the surroundings, measured in [quaternions](https://en.wikipedia.org/wiki/Quaternion), (`qx`, `qy`, `qz`, `qw`), in the world coordinate system.  | 

## Include Camera Data in Your Input Manifest
<a name="sms-point-cloud-multi-frame-image-input"></a>

If you want to include color camera data with a frame, use the following parameters to provide information about each image. The **Required** column in the following table applies when the `images` parameter is included in the input manifest file. You are not required to include images in your input manifest file. 

If you include camera images, you must include information about the `position` and orientation (`heading`) of the camera used to capture the images. 

If your images are distorted, Ground Truth can automatically undistort them using information you provide about the image in your input manifest file, including distortion coefficients (`k1`, `k2`, `k3`, `k4`, `p1`, `p2`), the camera model, the focal length (`fx`, `fy`), and the principal point (`cx`, `cy`). To learn more about these coefficients and undistorting images, see [Camera calibration With OpenCV](https://docs.opencv.org/2.4.13.7/doc/tutorials/calib3d/camera_calibration/camera_calibration.html). If distortion coefficients are not included, Ground Truth will not undistort an image. 


****  

|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `image-path`  |  Yes  |  String **Example of format**:  `<folder-name>/<imagefile.png>`  |  The relative location, in Amazon S3, of your image file. This relative path will be appended to the path you specify in `prefix`.   | 
|  `unix-timestamp`  |  Yes  |  Number  |  The timestamp of the image.   | 
|  `camera-model`  |  No  |  String: **Accepted Values**: `"pinhole"`, `"fisheye"` **Default**: `"pinhole"`  |  The model of the camera used to capture the image. This information is used to undistort camera images.   | 
|  `fx, fy`  |  Yes  |  Numbers  |  The focal length of the camera, in the x (`fx`) and y (`fy`) directions.  | 
|  `cx, cy`  |  Yes  | Numbers |  The x (`cx`) and y (`cy`) coordinates of the principal point.   | 
|  `k1, k2, k3, k4`  |  No  |  Number  |  Radial distortion coefficients. Supported for both **fisheye** and **pinhole** camera models.   | 
|  `p1, p2`  |  No  |  Number  |  Tangential distortion coefficients. Supported for **pinhole** camera models.  | 
|  `skew`  |  No  |  Number  |  A parameter to measure any known skew in the image.  | 
|  `position`  |  Yes  |  JSON object **Required Parameters**: `x`, `y`, and `z`. Enter numbers for these parameters.   |  The location or origin of the frame of reference of the camera mounted on the vehicle capturing images.  | 
|  `heading`  |  Yes  |  JSON Object **Required Parameters**: `qx`, `qy`, `qz`, and `qw`. Enter numbers for these parameters.   |  The orientation of the frame of reference of the camera mounted on the vehicle capturing images, measured using [quaternions](https://en.wikipedia.org/wiki/Quaternion), (`qx`, `qy`, `qz`, `qw`).   | 
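
For illustration, a single image object using these parameters might look like the following; all values are made up, and the optional distortion parameters are omitted:

```
{
    "image-path": "frame-images/frame-0.png",
    "unix-timestamp": 1566861644.759115,
    "camera-model": "pinhole",
    "fx": 847.79,
    "fy": 850.03,
    "cx": 576.21,
    "cy": 317.24,
    "position": { "x": -2.27, "y": 116.25, "z": 1.45 },
    "heading": { "qx": 0.759, "qy": 0.021, "qz": -0.024, "qw": -0.649 }
}
```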

## Sequence File and Point Cloud Frame Limits
<a name="sms-point-cloud-multi-frame-limits"></a>

You can include up to 100,000 point cloud frame sequences in your input manifest file. You can include up to 500 point cloud frames in each sequence file. 

Keep in mind that 3D point cloud labeling jobs have longer pre-processing times than other Ground Truth task types. For more information, see [Job pre-processing time](sms-point-cloud-general-information.md#sms-point-cloud-job-creation-time).

# Understand Coordinate Systems and Sensor Fusion
<a name="sms-point-cloud-sensor-fusion-details"></a>

Point cloud data is always located in a coordinate system. This coordinate system may be local to the vehicle or the device sensing the surroundings, or it may be a world coordinate system. When you use Ground Truth 3D point cloud labeling jobs, all the annotations are generated using the coordinate system of your input data. For some labeling job task types and features, you must provide data in a world coordinate system. 

In this topic, you'll learn the following:
+ When you *are required to* provide input data in a world coordinate system or global frame of reference.
+ What a world coordinate system is and how you can convert point cloud data to a world coordinate system. 
+ How you can use your sensor and camera extrinsic matrices to provide pose data when using sensor fusion. 

## Coordinate System Requirements for Labeling Jobs
<a name="sms-point-cloud-sensor-fusion-coordinate-requirements"></a>

If your point cloud data was collected in a local coordinate system, you can use an extrinsic matrix of the sensor used to collect the data to convert it to a world coordinate system or a global frame of reference. If you cannot obtain an extrinsic for your point cloud data and, as a result, cannot obtain point clouds in a world coordinate system, you can provide point cloud data in a local coordinate system for 3D point cloud object detection and semantic segmentation task types. 

For object tracking, you must provide point cloud data in a world coordinate system. This is because when you are tracking objects across multiple frames, the ego vehicle itself is moving in the world and so all of the frames need a point of reference. 

If you include camera data for sensor fusion, it is recommended that you provide camera poses in the same world coordinate system as the 3D sensor (such as a LiDAR sensor). 

## Using Point Cloud Data in a World Coordinate System
<a name="sms-point-cloud-world-coordinate-system"></a>

This section explains what a world coordinate system (WCS), also referred to as a *global frame of reference*, is, and how you can provide point cloud data in a world coordinate system.

### What is a World Coordinate System?
<a name="sms-point-cloud-what-is-wcs"></a>

A WCS or global frame of reference is a fixed universal coordinate system in which vehicle and sensor coordinate systems are placed. For example, if multiple point cloud frames are located in different coordinate systems because they were collected from two sensors, a WCS can be used to translate all of the coordinates in these point cloud frames into a single coordinate system, where all frames have the same origin, (0,0,0). This transformation is done by translating the origin of each frame to the origin of the WCS using a translation vector, and rotating the three axes (typically x, y, and z) to the right orientation using a rotation matrix. This rigid body transformation is called a *homogeneous transformation*.
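
As a concrete sketch, a homogeneous transformation applies the rotation and the translation in a single 4x4 matrix multiplication; the matrix and point below are illustrative values:

```python
import numpy as np

# Illustrative 4x4 homogeneous transformation: a rotation block R (90 degrees
# about the x axis here) and a translation column T, combined in one matrix
transform = np.array([[1.0, 0.0,  0.0, 10.0],
                      [0.0, 0.0, -1.0,  5.0],
                      [0.0, 1.0,  0.0,  2.0],
                      [0.0, 0.0,  0.0,  1.0]])

# A point in the frame's local coordinates, in homogeneous form [x, y, z, 1]
point_frame = np.array([0.0, 1.0, 0.0, 1.0])

# Rotate and translate in a single multiplication
point_wcs = transform @ point_frame
```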

A world coordinate system is important in global path planning, localization, mapping, and driving scenario simulations. Ground Truth uses a right-handed Cartesian world coordinate system such as the one defined in [ISO 8855](https://www.iso.org/standard/51180.html), where the x axis points forward in the direction of the car's movement, the y axis points left, and the z axis points up from the ground. 

The global frame of reference depends on the data. Some datasets use the LiDAR position in the first frame as the origin. In this scenario, all the frames use the first frame as a reference and device heading and position will be near the origin in the first frame. For example, KITTI datasets have the first frame as a reference for world coordinates. Other datasets use a device position that is different from the origin.

Note that this is not the GPS/IMU coordinate system, which is typically rotated by 90 degrees along the z-axis. If your point cloud data is in a GPS/IMU coordinate system (such as OxTS in the open source AV KITTI dataset), then you need to transform the origin to a world coordinate system (typically the vehicle's reference coordinate system). You apply this transformation by multiplying your data with transformation matrices (the rotation matrix and translation vector). This will transform the data from its original coordinate system to a global reference coordinate system. Learn more about this transformation in the next section. 
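
As an illustrative sketch, undoing a 90 degree z-axis rotation looks like the following; the sign of the angle depends on your sensor's axis conventions, so treat it as an assumption:

```python
import numpy as np

# If the GPS/IMU frame is rotated 90 degrees about the z axis relative to the
# vehicle frame, applying the opposite (-90 degree) z rotation aligns them.
theta = -np.pi / 2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])

point_imu = np.array([1.0, 0.0, 0.0])
point_vehicle = Rz @ point_imu
```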

### Convert 3D Point Cloud Data to a WCS
<a name="sms-point-cloud-coordinate-system-general"></a>

Ground Truth assumes that your point cloud data has already been transformed into a reference coordinate system of your choice. For example, you can choose the reference coordinate system of the sensor (such as LiDAR) as your global reference coordinate system. You can also take point clouds from various sensors and transform them from the sensor's view to the vehicle's reference coordinate system view. You use a sensor's extrinsic matrix, made up of a rotation matrix and translation vector, to convert your point cloud data to a WCS or global frame of reference. 

Collectively, the translation vector and rotation matrix can be used to make up an *extrinsic matrix*, which can be used to convert data from a local coordinate system to a WCS. For example, your LiDAR extrinsic matrix may be composed as follows, where `R` is the rotation matrix and `T` is the translation vector:

```
LiDAR_extrinsic = [R T;0 0 0 1]
```
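
As a sketch, you can compose such an extrinsic matrix from `R` and `T` with NumPy (the helper name is ours):

```python
import numpy as np

# Compose a 4x4 extrinsic matrix [R T; 0 0 0 1] from a 3x3 rotation matrix R
# and a 3-element translation vector T
def compose_extrinsic(R, T):
    extrinsic = np.eye(4)
    extrinsic[:3, :3] = R
    extrinsic[:3, 3] = T
    return extrinsic

# Illustrative values: no rotation, translation of (1.7, 0.6, 0.9)
lidar_extrinsic = compose_extrinsic(np.eye(3), np.array([1.7, 0.6, 0.9]))
```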

For example, the autonomous driving KITTI dataset includes a rotation matrix and translation vector for the LiDAR extrinsic transformation matrix for each frame. The [pykitti](https://github.com/utiasSTARS/pykitti) Python module can be used to load the KITTI data. In the dataset, `dataset.oxts[i].T_w_imu` gives the LiDAR extrinsic transform for the `i`th frame, which can be multiplied with the points in that frame to convert them to a world frame: `np.matmul(lidar_transform_matrix, points)`. Multiplying a point in the LiDAR frame by the LiDAR extrinsic matrix transforms it into world coordinates. Multiplying a point in the world frame by the camera extrinsic matrix gives the point coordinates in the camera's frame of reference.

The following code example demonstrates how you can convert point cloud frames from the KITTI dataset into a WCS. 

```
import pykitti
import numpy as np

basedir = '/Users/nameofuser/kitti-data'
date = '2011_09_26'
drive = '0079'

# The 'frames' argument is optional - default: None, which loads the whole dataset.
# Calibration, timestamps, and IMU data are read automatically. 
# Camera and velodyne data are available via properties that create generators
# when accessed, or through getter methods that provide random access.
data = pykitti.raw(basedir, date, drive, frames=range(0, 50, 5))

# i is frame number
i = 0

# lidar extrinsic for the ith frame
lidar_extrinsic_matrix = data.oxts[i].T_w_imu

# velodyne raw point cloud in the lidar scanner's own coordinate system
points = data.get_velo(i)

# transform points from the lidar frame to the global frame using lidar_extrinsic_matrix
def generate_transformed_pcd_from_point_cloud(points, lidar_extrinsic_matrix):
    tps = []
    for point in points:
        transformed_point = np.matmul(
            lidar_extrinsic_matrix,
            np.array([point[0], point[1], point[2], 1], dtype=np.float32).reshape(4, 1)
        ).tolist()
        if len(point) > 3 and point[3] is not None:
            # keep the intensity value alongside the transformed x, y, z
            tps.append([transformed_point[0][0], transformed_point[1][0], transformed_point[2][0], point[3]])
        else:
            tps.append([transformed_point[0][0], transformed_point[1][0], transformed_point[2][0]])

    return tps

# transform the points from the lidar frame to the global frame
transformed_pcl = generate_transformed_pcd_from_point_cloud(points, lidar_extrinsic_matrix)
```
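
The per-point loop above can also be vectorized with NumPy. The following sketch (the helper name is ours) assumes `points` is an N x 4 NumPy array of `[x, y, z, intensity]` rows, as returned by `get_velo`:

```python
import numpy as np

def transform_points_vectorized(points, lidar_extrinsic_matrix):
    # Append a homogeneous 1 to each [x, y, z] row: N x 4
    xyz1 = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])
    # Apply the 4x4 transform to all points at once
    transformed = (np.asarray(lidar_extrinsic_matrix) @ xyz1.T).T
    # Keep the original intensity column, drop the homogeneous 1
    return np.hstack([transformed[:, :3], points[:, 3:4]])
```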

## Sensor Fusion
<a name="sms-point-cloud-sensor-fusion"></a>

Ground Truth supports sensor fusion of point cloud data with up to 8 video camera inputs. This feature allows human labelers to view the 3D point cloud frame side-by-side with the synchronized video frame. In addition to providing more visual context for labeling, sensor fusion allows workers to adjust annotations in the 3D scene and in 2D images, and the adjustments are projected into the other view. The following video demonstrates a 3D point cloud labeling job with LiDAR and camera sensor fusion. 

![\[Gif showing a 3D point cloud labeling job with LiDAR and camera sensor fusion.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pointcloud/gifs/object_tracking/sensor-fusion.gif)


For best results, when using sensor fusion, your point cloud should be in a WCS. Ground Truth uses your sensor (such as LiDAR), camera, and ego vehicle pose information to compute extrinsic and intrinsic matrices for sensor fusion. 

### Extrinsic Matrix
<a name="sms-point-cloud-extrinsics"></a>

Ground Truth uses sensor (such as LiDAR) extrinsic matrices and camera extrinsic and intrinsic matrices to project objects between the point cloud data's frame of reference and the camera's frame of reference. 

For example, in order to project a label from the 3D point cloud to the camera image plane, Ground Truth transforms 3D points from the LiDAR’s own coordinate system to the camera's coordinate system. This is typically done by first transforming 3D points from the LiDAR’s own coordinate system to a world coordinate system (or a global reference frame) using the LiDAR extrinsic matrix. Ground Truth then uses the camera inverse extrinsic (which converts points from a global frame of reference to the camera's frame of reference) to transform the 3D points from the world coordinate system obtained in the previous step into the camera image plane. The LiDAR extrinsic matrix can also be used to transform 3D data into a world coordinate system. If your 3D data is already transformed into a world coordinate system, then the first transformation doesn’t have any impact on label translation, and label translation only depends on the camera inverse extrinsic. A view matrix is used to visualize projected labels. To learn more about these transformations and the view matrix, see [Ground Truth Sensor Fusion Transformations](#sms-point-cloud-extrinsic-intrinsic-explanation).

Ground Truth computes these extrinsic matrices by using LiDAR and camera *pose data* that you provide: `heading` (in quaternions: `qx`, `qy`, `qz`, and `qw`) and `position` (`x`, `y`, `z`). For the vehicle, the heading and position are typically described in the vehicle's reference frame in a world coordinate system and are called the *ego vehicle pose*. For each camera extrinsic, you can add pose information for that camera. For more information, see [Pose](#sms-point-cloud-pose).

### Intrinsic Matrix
<a name="sms-point-cloud-intrinsic"></a>

Ground Truth uses the camera extrinsic and intrinsic matrices to compute view matrices to transform labels between the 3D scene and camera images. Ground Truth computes the camera intrinsic matrix using the camera focal length (`fx`, `fy`) and optical center coordinates (`cx`, `cy`) that you provide. For more information, see [Intrinsic and Distortion](#sms-point-cloud-camera-intrinsic-distortion).
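
As a sketch, an intrinsic matrix built from these parameters, and its use to map a normalized camera coordinate to a pixel, looks like the following; the helper name and values are illustrative:

```python
import numpy as np

# Camera intrinsic calibration matrix from focal lengths and principal point
def intrinsic_matrix(fx, fy, cx, cy, skew=0.0):
    return np.array([[fx, skew, cx],
                     [0.0, fy,  cy],
                     [0.0, 0.0, 1.0]])

# Map a normalized camera coordinate (x/z, y/z, 1) to pixel coordinates
K = intrinsic_matrix(800.0, 800.0, 640.0, 360.0)
pixel = K @ np.array([0.1, -0.05, 1.0])
```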

### Image Distortion
<a name="sms-point-cloud-image-distortion"></a>

Image distortion can occur for a variety of reasons. For example, images may be distorted due to barrel or fish-eye effects. Ground Truth uses intrinsic parameters along with distortion coefficients to undistort images you provide when creating 3D point cloud labeling jobs. If a camera image has already been undistorted, all distortion coefficients should be set to 0.

For more information about the transformations Ground Truth performs to undistort images, see [Camera Calibrations: Extrinsic, Intrinsic and Distortion](#sms-point-cloud-extrinsic-camera-explanation).

### Ego Vehicle
<a name="sms-point-cloud-ego-vehicle"></a>

To collect data for autonomous driving applications, the measurements used to generate point cloud data are taken from sensors mounted on a vehicle, or the *ego vehicle*. To project label adjustments to and from the 3D scene and 2D images, Ground Truth needs your ego vehicle pose in a world coordinate system. The ego vehicle pose consists of position coordinates and an orientation quaternion. 

Ground Truth uses your ego vehicle pose to compute rotation and transformation matrices. Rotations in 3 dimensions can be represented by a sequence of 3 rotations around a sequence of axes. In theory, any three axes spanning 3D Euclidean space are enough. In practice, the axes of rotation are chosen to be the basis vectors. The three rotations are expected to be in a global frame of reference (extrinsic). Ground Truth does not support a body-centered frame of reference (intrinsic), which is attached to, and moves with, the object under rotation. To track objects, Ground Truth needs to measure from a global reference where all vehicles are moving. When using Ground Truth 3D point cloud labeling jobs, z specifies the axis of rotation (extrinsic rotation), and yaw Euler angles, in radians, specify the rotation angle.
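
As a sketch, a rotation by a yaw angle about the z axis maps to a quaternion as follows (the helper name is ours):

```python
import math

# Rotation by a yaw angle (in radians) about the z axis, as a quaternion
def yaw_to_quaternion(yaw):
    return {"qx": 0.0,
            "qy": 0.0,
            "qz": math.sin(yaw / 2.0),
            "qw": math.cos(yaw / 2.0)}
```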

### Pose
<a name="sms-point-cloud-pose"></a>

Ground Truth uses pose information for 3D visualizations and sensor fusion. Pose information you input through your manifest file is used to compute extrinsic matrices. If you already have an extrinsic matrix, you can use it to extract sensor and camera pose data. 

For example, in the autonomous driving KITTI dataset, the [pykitti](https://github.com/utiasSTARS/pykitti) Python module can be used to load the KITTI data. In the dataset, `dataset.oxts[i].T_w_imu` gives the LiDAR extrinsic transform for the `i`th frame, which can be multiplied with the points to convert them to a world frame: `np.matmul(lidar_transform_matrix, points)`. This transform can be converted into the position (translation vector) and heading (in quaternions) of the LiDAR for the input manifest file JSON format. The camera extrinsic transform for `cam0` in the `i`th frame can be calculated with `inv(matmul(dataset.calib.T_cam0_velo, inv(dataset.oxts[i].T_w_imu)))`, and this can be converted into a heading and position for `cam0`.

```
import numpy as np
from scipy.spatial.transform import Rotation as R

rotation = [[ 9.96714314e-01, -8.09890350e-02,  1.16333982e-03],
            [ 8.09967396e-02,  9.96661051e-01, -1.03090934e-02],
            [-3.24531964e-04,  1.03694477e-02,  9.99946183e-01]]

origin = [1.71104606e+00,
          5.80000039e-01,
          9.43144935e-01]

# position is the origin
position = origin
r = R.from_matrix(np.asarray(rotation))

# heading in WCS using scipy
heading = r.as_quat()
print(f"position:{position}\nheading: {heading}")
```

**Position**  
In the input manifest file, `position` refers to the position of the sensor with respect to the world frame. If you are unable to put the device position in a world coordinate system, you can use LiDAR data with local coordinates. Similarly, for mounted video cameras, you can specify the position and heading in a world coordinate system. For cameras, if you do not have position information, use (0, 0, 0). 

The following are the fields in the position object:

1.  `x` (float) – x coordinate of ego vehicle, sensor, or camera position in meters. 

1.  `y` (float) – y coordinate of ego vehicle, sensor, or camera position in meters. 

1.  `z` (float) – z coordinate of ego vehicle, sensor, or camera position in meters. 

The following is an example of a `position` JSON object: 

```
{
    "position": {
        "y": -152.77584902657554,
        "x": 311.21505956090624,
        "z": -10.854137529636024
      }
}
```

**Heading**  
In the input manifest file, `heading` is an object that represents the orientation of a device with respect to the world frame. Heading values should be quaternions. A [quaternion](https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation) is a representation of orientation consistent with geodesic spherical properties. If you are unable to put the sensor heading in world coordinates, use the identity quaternion `(qx = 0, qy = 0, qz = 0, qw = 1)`. Similarly, for cameras, specify the heading in quaternions. If you are unable to obtain extrinsic camera calibration parameters, also use the identity quaternion. 

The fields in the `heading` object are as follows:

1.  `qx` (float) - x component of ego vehicle, sensor, or camera orientation. 

1.  `qy` (float) - y component of ego vehicle, sensor, or camera orientation. 

1.  `qz` (float) - z component of ego vehicle, sensor, or camera orientation. 

1. `qw` (float) - w component of ego vehicle, sensor, or camera orientation. 

The following is an example of a `heading` JSON object: 

```
{
    "heading": {
        "qy": -0.7046155108831117,
        "qx": 0.034278837280808494,
        "qz": 0.7070617895701465,
        "qw": -0.04904659893885366
      }
}
```

To learn more, see [Compute Orientation Quaternions and Position](#sms-point-cloud-ego-vehicle-orientation).

## Compute Orientation Quaternions and Position
<a name="sms-point-cloud-ego-vehicle-orientation"></a>

Ground Truth requires that all orientation, or heading, data be given in quaternions. A [quaternion](https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation) is a representation of orientation consistent with geodesic spherical properties that can be used to approximate rotation. Compared to [Euler angles](https://en.wikipedia.org/wiki/Euler_angles), quaternions are simpler to compose and avoid the problem of [gimbal lock](https://en.wikipedia.org/wiki/Gimbal_lock). Compared to rotation matrices, they are more compact, more numerically stable, and more efficient. 

You can compute quaternions from a rotation matrix or a transformation matrix.

If you have a rotation matrix (made up of the axis rotations) and a translation vector (or origin) in a world coordinate system instead of a single 4x4 rigid transformation matrix, then you can directly use the rotation matrix and translation vector to compute quaternions. Libraries like [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.transform.Rotation.html) and [pyquaternion](http://kieranwynn.github.io/pyquaternion/#explicitly-by-rotation-or-transformation-matrix) can help. The following code block shows an example using scipy to compute a quaternion from a rotation matrix. 

```
import numpy as np
from scipy.spatial.transform import Rotation as R

rotation = [[ 9.96714314e-01, -8.09890350e-02,  1.16333982e-03],
            [ 8.09967396e-02,  9.96661051e-01, -1.03090934e-02],
            [-3.24531964e-04,  1.03694477e-02,  9.99946183e-01]]

origin = [1.71104606e+00,
          5.80000039e-01,
          9.43144935e-01]

# position is the origin
position = origin
r = R.from_matrix(np.asarray(rotation))

# heading in WCS using scipy
heading = r.as_quat()
print(f"position:{position}\nheading: {heading}")
```

A UI tool like [3D Rotation Converter](https://www.andre-gaschler.com/rotationconverter/) can also be useful.

If you have a 4x4 extrinsic transformation matrix, note that the transformation matrix is in the form `[R T; 0 0 0 1]` where `R` is the rotation matrix and `T` is the origin translation vector. That means you can extract the rotation matrix and translation vector from the transformation matrix as follows.

```
import numpy as np
from scipy.spatial.transform import Rotation as R

transformation = [
    [ 9.96714314e-01, -8.09890350e-02,  1.16333982e-03, 1.71104606e+00],
    [ 8.09967396e-02,  9.96661051e-01, -1.03090934e-02, 5.80000039e-01],
    [-3.24531964e-04,  1.03694477e-02,  9.99946183e-01, 9.43144935e-01],
    [              0,               0,               0,              1]]

transformation = np.array(transformation)
rotation = transformation[0:3, 0:3]
translation = transformation[0:3, 3]

# position is the origin translation
position = translation
r = R.from_matrix(np.asarray(rotation))

# heading in WCS using scipy
heading = r.as_quat()
print(f"position:{position}\nheading: {heading}")
```

With your own setup, you can compute an extrinsic transformation matrix using the GPS/IMU position and orientation (latitude, longitude, altitude and roll, pitch, yaw) with respect to the LiDAR sensor on the ego vehicle. For example, you can compute pose from KITTI raw data using `pose = convertOxtsToPose(oxts)` to transform the OxTS data into local Euclidean poses, specified by 4x4 rigid transformation matrices. You can then transform this pose transformation matrix to a global reference frame using the reference frame's transformation matrix in the world coordinate system. The following function converts yaw, pitch, and roll angles (in radians) into a quaternion.

```
#include <cmath>

struct Quaternion
{
    double w, x, y, z;
};

Quaternion ToQuaternion(double yaw, double pitch, double roll) // yaw (Z), pitch (Y), roll (X)
{
    // Abbreviations for the various angular functions
    double cy = cos(yaw * 0.5);
    double sy = sin(yaw * 0.5);
    double cp = cos(pitch * 0.5);
    double sp = sin(pitch * 0.5);
    double cr = cos(roll * 0.5);
    double sr = sin(roll * 0.5);

    Quaternion q;
    q.w = cr * cp * cy + sr * sp * sy;
    q.x = sr * cp * cy - cr * sp * sy;
    q.y = cr * sp * cy + sr * cp * sy;
    q.z = cr * cp * sy - sr * sp * cy;

    return q;
}
```
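
For reference, the same conversion sketched in Python (the helper name is ours):

```python
import math

# Convert yaw (Z), pitch (Y), roll (X), in radians, to a quaternion
def to_quaternion(yaw, pitch, roll):
    cy, sy = math.cos(yaw * 0.5), math.sin(yaw * 0.5)
    cp, sp = math.cos(pitch * 0.5), math.sin(pitch * 0.5)
    cr, sr = math.cos(roll * 0.5), math.sin(roll * 0.5)
    return {
        "qw": cr * cp * cy + sr * sp * sy,
        "qx": sr * cp * cy - cr * sp * sy,
        "qy": cr * sp * cy + sr * cp * sy,
        "qz": cr * cp * sy - sr * sp * cy,
    }
```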

## Ground Truth Sensor Fusion Transformations
<a name="sms-point-cloud-extrinsic-intrinsic-explanation"></a>

The following sections go into greater detail about the Ground Truth sensor fusion transformations that are performed using the pose data you provide.

### LiDAR Extrinsic
<a name="sms-point-cloud-extrinsic-lidar-explanation"></a>

In order to project to and from a 3D LiDAR scene to a 2D camera image, Ground Truth computes rigid transformation projection matrices using the ego vehicle pose and heading. Ground Truth computes the rotation and translation of world coordinates into the 3D plane by performing a simple sequence of rotations and translations. 

Ground Truth computes the rotation matrix using the heading quaternions as follows:

![\[Equation: Ground Truth point cloud rotation metrics.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pointcloud/sms-point-cloud-rotation-matrix.png)


Here, `[x, y, z, w]` corresponds to the parameters in the `heading` JSON object, `[qx, qy, qz, qw]`. Ground Truth computes the translation column vector as `T = [poseX, poseY, poseZ]`. The extrinsic matrix is then simply as follows:

```
LiDAR_extrinsic = [R T;0 0 0 1]
```

### Camera Calibrations: Extrinsic, Intrinsic and Distortion
<a name="sms-point-cloud-extrinsic-camera-explanation"></a>

*Geometric camera calibration*, also referred to as *camera resectioning*, estimates the parameters of a lens and image sensor of an image or video camera. You can use these parameters to correct for lens distortion, measure the size of an object in world units, or determine the location of the camera in the scene. Camera parameters include intrinsics and distortion coefficients.

#### Camera Extrinsic
<a name="sms-point-cloud-camera-extrinsic"></a>

If the camera pose is given, then Ground Truth computes the camera extrinsic based on a rigid transformation from the 3D plane into the camera plane. The calculation is the same as the one used for the [LiDAR Extrinsic](#sms-point-cloud-extrinsic-lidar-explanation), except that Ground Truth uses camera pose (`position` and `heading`) and computes the inverse extrinsic.

```
 camera_inverse_extrinsic = inv([Rc Tc;0 0 0 1]) #where Rc and Tc are camera pose components
```

#### Intrinsic and Distortion
<a name="sms-point-cloud-camera-intrinsic-distortion"></a>

Some cameras, such as pinhole or fisheye cameras, may introduce significant distortion in photos. This distortion can be corrected using distortion coefficients and the camera focal length. To learn more, see [Camera calibration With OpenCV](https://docs.opencv.org/2.4.13.7/doc/tutorials/calib3d/camera_calibration/camera_calibration.html) in the OpenCV documentation.

There are two types of distortion Ground Truth can correct for: radial distortion and tangential distortion.

*Radial distortion* occurs when light rays bend more near the edges of a lens than they do at its optical center. The smaller the lens, the greater the distortion. Radial distortion manifests in the form of the *barrel* or *fish-eye* effect, and Ground Truth uses Formula 1 to undistort it. 

**Formula 1:**

![\[Formula 1: equations for x_{corrected} and y_{corrected}, to undistort radial distortion.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pointcloud/sms-point-cloud-camera-distortion-1.png)


*Tangential distortion* occurs because the lenses used to take the images are not perfectly parallel to the imaging plane. This can be corrected with Formula 2. 

**Formula 2:**

![\[Formula 2: equations for x_{corrected} and y_{corrected}, to correct for tangential distortion.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pointcloud/sms-point-cloud-camera-distortion-2.png)


In the input manifest file, you can provide distortion coefficients and Ground Truth will undistort your images. All distortion coefficients are floats. 
+ `k1`, `k2`, `k3`, `k4` – Radial distortion coefficients. Supported for both fisheye and pinhole camera models.
+ `p1` ,`p2` – Tangential distortion coefficients. Supported for pinhole camera models.

If images are already undistorted, all distortion coefficients should be 0 in your input manifest. 
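
As a sketch, the pinhole terms of Formulas 1 and 2 can be combined into a single correction over normalized image coordinates; the helper name is ours, the fisheye-specific `k4` term is omitted, and with all coefficients set to 0 the input is returned unchanged:

```python
# Radial (Formula 1) and tangential (Formula 2) correction for the pinhole
# model, applied to normalized image coordinates (x, y)
def correct_distortion(x, y, k1=0.0, k2=0.0, k3=0.0, p1=0.0, p2=0.0):
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_corrected = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_corrected = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_corrected, y_corrected
```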

In order to correctly reconstruct the corrected image, Ground Truth does a unit conversion of the images based on focal lengths. If a common focal length is used for both axes with a given aspect ratio (such as 1), the upper formula has a single focal length. The matrix containing these four parameters is referred to as the *camera intrinsic calibration matrix*. 

![\[The in camera intrinsic calibration matrix.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pointcloud/sms-point-cloud-camera-intrinsic.png)


While the distortion coefficients are the same regardless of the camera resolution used, the intrinsic parameters should be scaled from the calibrated resolution to the current resolution. 
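
A minimal sketch of that scaling (the helper name and signature are ours):

```python
# Scale calibrated intrinsics to a different image resolution; the distortion
# coefficients stay unchanged. Sizes are (width, height) pixel tuples.
def scale_intrinsics(fx, fy, cx, cy, calib_size, current_size):
    sx = current_size[0] / calib_size[0]
    sy = current_size[1] / calib_size[1]
    return fx * sx, fy * sy, cx * sx, cy * sy
```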

The following are float values. 
+ `fx` - focal length in x direction.
+ `fy` - focal length in y direction.
+ `cx` - x coordinate of principal point.
+ `cy` - y coordinate of principal point.

Ground Truth uses the camera extrinsic and camera intrinsic matrices to compute the view matrix, as shown in the following code block, to transform labels between the 3D scene and 2D images.

```
import numpy as np

def generate_view_matrix(intrinsic_matrix, extrinsic_matrix):
    intrinsic_matrix = np.c_[intrinsic_matrix, np.zeros(3)]
    view_matrix = np.matmul(intrinsic_matrix, extrinsic_matrix)
    view_matrix = np.insert(view_matrix, 2, np.array((0, 0, 0, 1)), 0)
    return view_matrix
```
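
For illustration, the following self-contained sketch repeats that helper and projects one homogeneous world point into pixel coordinates; the intrinsic matrix, extrinsic matrix, and point are made-up values:

```python
import numpy as np

def generate_view_matrix(intrinsic_matrix, extrinsic_matrix):
    intrinsic_matrix = np.c_[intrinsic_matrix, np.zeros(3)]
    view_matrix = np.matmul(intrinsic_matrix, extrinsic_matrix)
    view_matrix = np.insert(view_matrix, 2, np.array((0, 0, 0, 1)), 0)
    return view_matrix

# Illustrative intrinsic matrix and an identity inverse extrinsic
# (camera at the world origin, looking down the z axis)
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
camera_inverse_extrinsic = np.eye(4)

view = generate_view_matrix(K, camera_inverse_extrinsic)

# Project a homogeneous 3D world point; the depth lands in the last
# component, so divide by it to get pixel coordinates
point_world = np.array([0.5, 0.25, 5.0, 1.0])
projected = view @ point_world
u, v = projected[0] / projected[3], projected[1] / projected[3]
```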