# Intel® Media Transcode Accelerator

## OVERVIEW

Intel® Media Transcode Accelerator provides hardware acceleration for video encoding and decoding. It improves efficiency and performance by moving these operations from the CPU to the media hardware, which frees a significant number of CPU cores.

This README applies to Intel® GNR-D with a Media device.

## Installation of Intel® Media Transcode Accelerator

### Requirements and Dependencies

**Supported OS and kernels:**

- Ubuntu 24.04:
  - kernel 6.8
  - kernel 6.14
- RHEL 9.6:
  - kernel 5.14.0-570
- RHEL 10:
  - kernel 6.12.0-55

**Libraries, tools and development packages:**

- gcc >= 10
- libva >= 2.18 (2.20 recommended)
- libva-devel >= 2.18 (2.20 recommended)
- libudev-devel
- pciutils
- kernel-devel
- openssl-devel
- zlib-devel
- pcre2-devel
- git
- autoconf
- automake
- libtool
- httpd-tools
- boost-devel
- libnl3-devel.x86_64
- cmake
- pkgconf
- libdrm-devel
- nasm
- yasm
- libvpl-devel
- libvpl-tools
- libva-utils

**For RHEL:**

```sh
dnf install -y dnf-plugins-core libudev-devel pciutils gcc gcc-c++ openssl-devel zlib-devel pcre2-devel git autoconf automake kernel-devel libtool httpd-tools boost-devel libnl3-devel pkgconf patch libdrm-devel libva-devel cmake
 
# NASM (not included in RHEL 9.4 package)
wget https://www.nasm.us/pub/nasm/releasebuilds/2.16.01/nasm-2.16.01.tar.bz2 && tar -xjf nasm-2.16.01.tar.bz2 && cd nasm-2.16.01 && ./configure --prefix=/usr/local && make -j$(nproc) && sudo make install && cd ..
 
# YASM (not included in RHEL 9.4 package)
wget https://www.tortall.net/projects/yasm/releases/yasm-1.3.0.tar.gz && tar -xzf yasm-1.3.0.tar.gz && cd yasm-1.3.0 && ./configure --prefix=/usr/local && make -j$(nproc) && sudo make install && cd ..
 
# libva-utils (not included in RHEL 9.4 package)
git clone https://github.com/intel/libva-utils.git && cd libva-utils && ./autogen.sh --prefix=/usr/local && make -j$(nproc) && sudo make install && cd ..
 
# Intel VPL (not included in RHEL 9.4 package)
git clone https://github.com/intel/libvpl.git && cd libvpl && mkdir build && cd build && cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local && make -j$(nproc) && sudo make install && sudo ldconfig

```

**For Ubuntu:**

```sh
apt-get install apache2 autoconf automake bison cmake diffutils dwarves findutils flex g++ gawk gcc git libboost-all-dev libdrm-dev libnl-3-dev libnl-genl-3-dev libssl-dev libtool libudev-dev libva-dev libva-drm2 mawk nasm ncurses-base ncurses-bin pciutils pkgconf udev vainfo yasm zlib1g zlib1g-dev libvpl-dev onevpl-tools
```

**Intel QAT 2.2 (Intel® QuickAssist Technology) package**  

Copy the uncompressed package to:
`<Intel® Media Transcode Accelerator path>/src/aux_drv/qat_aux`

### Installation

> **RUN ALL COMMANDS AS ROOT!**

Make sure that IOMMU is enabled, i.e. that the kernel command line contains `intel_iommu=on,sm_on`:

```sh
cat /proc/cmdline
```

or

```sh
dmesg | grep -i iommu
```
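
The check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the helper name `iommu_args_present` is hypothetical and not part of the package.

```sh
# Hypothetical helper: succeeds if the given kernel command line string
# contains the parameter intel_iommu=on,sm_on.
iommu_args_present() {
    case " $1 " in
        *" intel_iommu=on,sm_on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line of the running kernel.
if iommu_args_present "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "IOMMU boot parameters found"
else
    echo "IOMMU boot parameters missing"
fi
```

The helper only looks for the exact parameter string used in this README; it does not confirm that the IOMMU was actually initialized, so the `dmesg` output is still worth checking.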

To enable IOMMU, add the boot parameters `iommu=on intel_iommu=on,sm_on`.

**For RHEL:**

```sh
grubby --update-kernel=ALL --args="iommu=on intel_iommu=on,sm_on"
reboot
```
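
The command above covers RHEL only. On Ubuntu, `grubby` is usually not installed, and a common approach is to append the parameters to `GRUB_CMDLINE_LINUX` in `/etc/default/grub` and regenerate the GRUB configuration. The sketch below assumes a default Ubuntu GRUB setup and GNU `sed`; the helper name `add_iommu_args` is hypothetical.

```sh
# Hypothetical helper: appends the IOMMU parameters to the
# GRUB_CMDLINE_LINUX line of the given GRUB defaults file.
# It is not idempotent: run it only once per file.
add_iommu_args() {
    sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"$/GRUB_CMDLINE_LINUX="\1 iommu=on intel_iommu=on,sm_on"/' "$1"
}

# Typical use on Ubuntu (as root):
#   add_iommu_args /etc/default/grub
#   update-grub
#   reboot
```

After the reboot, `cat /proc/cmdline` should show the new parameters.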

Red Hat does NOT automatically generate the `signing_key.pem` required for signing external kernel modules. To load the Intel® Media Transcode Accelerator driver, generate this key manually: go to `<kernel_source>/certs` (e.g. `/usr/src/kernels/5.14.0-503.11.1.el9_5.x86_64/certs`) and run:

```sh
openssl req -new -x509 -newkey rsa:2048 -keyout signing_key.pem -outform PEM -out signing_key.x509 -days 365 -nodes -subj "/CN=Module Signing Key"
```

**Install Media:**

```sh
source setenv.sh
./install.sh
```

### Uninstallation

Run the installation script with the `clean` command:

```sh
./install.sh clean
```

Alternatively, uninstall manually.

First, uninstall Intel® QAT.

Go to `<Intel® Media Transcode Accelerator path>/src/aux_drv/qat_aux` and run:

```sh
make uninstall
```

Then uninstall Intel® Media Transcode Accelerator.

Go to `<Intel® Media Transcode Accelerator path>/src/aux_drv` and run:

```sh
make uninstall
```

## Usage of Intel® Media Transcode Accelerator

Intel® Media Transcode Accelerator creates DRM devices that FFmpeg uses for hardware acceleration.
These devices work in one of two modes:

- **UQ (User Queue)** - default on a host without SR-IOV enabled,
- **WQM (Worker Queue Mode)** - default on a virtual machine or on a host with SR-IOV.

UQ mode allows several users to use the device at the same time; WQM allows only one user to access the device.

The following processing modes are supported:

**Full Offload Mode:** The hardware performs all processing. It is supported in FFmpeg and in the sample applications for VPL,
which you can find in the `<Intel® Media Transcode Accelerator path>/src/libvpl-media-ip-rt/build/bin/Release` folder.

**Hybrid Mode:** The work is shared between the CPU and the hardware. It is supported in the x264 and x265 applications.

### Device check

- **vainfo:**

  ```sh
  su -c "vainfo --display drm --device <device driver>"
  ```

  `<device driver>` - e.g. `/dev/dri/renderD128` or `/dev/dri/renderD129`  
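
When several render nodes exist, it can help to list them before running `vainfo` on each. This is a minimal sketch, assuming the devices appear as `renderD*` nodes under `/dev/dri` as in the examples above; the helper name `list_render_nodes` is hypothetical.

```sh
# Hypothetical helper: prints every DRM render node found in the given
# directory (default: /dev/dri), one path per line.
list_render_nodes() {
    dir=${1:-/dev/dri}
    for node in "$dir"/renderD*; do
        # Skip the unexpanded pattern when no render node exists.
        [ -e "$node" ] && echo "$node"
    done
    return 0
}
```

For example, `for dev in $(list_render_nodes); do vainfo --display drm --device "$dev"; done` runs the device check on every node that was found.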

### Full offload mode

- **FFmpeg decode:**

  ```sh
  ffmpeg -hwaccel vaapi -hwaccel_device <device driver> -hwaccel_output_format vaapi -i <input file> -vf 'hwdownload,format=nv12' -pix_fmt yuv420p <output file>
  ```

  `<device driver>` - e.g. `/dev/dri/renderD128` or `/dev/dri/renderD129`  
  `<input file>` - a supported video file (AVC, AV1 or HEVC)  
  `<output file>` - an uncompressed YUV file

- **FFmpeg transcode:**

  ```sh
  ffmpeg -hwaccel vaapi -init_hw_device vaapi=hw:<device driver> -hwaccel_output_format vaapi -v verbose -i <input file> -an -c:v <codec> -profile:v <profile> -rc_mode <rc mode> -g <GoP size> -slices <slices> -compression_level <compression_level> -y <output file>
  ```

  `<device driver>` - e.g. `/dev/dri/renderD128` or `/dev/dri/renderD129`  
  `<input file>` - a supported video file (AVC, AV1 or HEVC)  
  `<codec>` - codec used for encoding: `av1_vaapi`, `h264_vaapi`, `hevc_vaapi` or `mjpeg_vaapi` (the `-profile:v` parameter is ignored for JPEG encoding, which does not support profiles)  
  `<profile>` - coding profile (color depth); supported profiles: `main` (AVC, AV1, HEVC), `main10` (HEVC), `high` (AVC), `high10` (AVC)  
  `<rc mode>` - rate control mode; each mode requires additional parameters:
  - **VBR**: `-b:v <bitrate> -maxrate <bitrate> -bufsize <bufsize>` - Variable Bitrate (average target)
  - **CBR**: `-b:v <bitrate>` - Constant Bitrate
  - **CQP**: `-qp <value>` - Constant Quantization Parameter (supported values: 0-51 for AVC/HEVC, 0-63 for AV1; 0 gives the best quality; suggested values: 20-45, with 26 treated as the default/optimal value)

  Note: These modes are mutually exclusive; only one can be used per encoding session.

  `<GoP size>` - Group of Pictures size used for encoding; it must be higher than the mini GoP value, which depends on the low delay mode and the codec (see "Supported GOP Structures and Low Delay Mode")  
  `<slices>` - the number of fragments into which each frame is divided during encoding, which allows parallel processing and reduces the risk that an encoding error affects the entire frame; the supported number of slices depends on the codec (AVC: 1-300, HEVC: 1-64, AV1: 1-64), but it is usually between 1 and 16; values between 1 and 4 are recommended  
  `<compression_level>` - 0 (default) gives the best quality; supported values: 0-3 for AV1, 0-2 for AVC, 0-4 for HEVC  
  `<output file>` - output file path

- **FFmpeg encode:**

  ```sh
  ffmpeg -init_hw_device vaapi=hw:<device driver> -filter_hw_device hw -f rawvideo -pix_fmt yuv420p -s 720x420 -i <input file> -vf 'format=nv12,hwupload' -c:v <codec> -profile:v <profile> <output file>
  ```
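
Raw video files carry no header, so the `-pix_fmt` and `-s` values in the encode command must match the input file exactly. One way to sanity-check this is to compare the file size with the expected frame size. The sketch below assumes 8-bit `yuv420p` input with even dimensions; the helper names are hypothetical.

```sh
# Bytes per 8-bit yuv420p frame: one full-resolution luma plane plus two
# chroma planes at quarter resolution, i.e. width * height * 3 / 2.
yuv420p_frame_bytes() {  # args: width height
    echo $(( $1 * $2 * 3 / 2 ))
}

# Hypothetical helper: number of whole frames in a raw file of a given size.
yuv420p_frame_count() {  # args: file_size_bytes width height
    echo $(( $1 / $(yuv420p_frame_bytes "$2" "$3") ))
}

# Frame size for the 720x420 resolution used in the encode command above.
yuv420p_frame_bytes 720 420
```

For 720x420 this gives 453600 bytes per frame. If the size of the input file (e.g. from `stat -c %s <input file>`) is not a multiple of that value, the resolution or pixel format passed to ffmpeg probably does not match the file.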

### Hybrid Mode

Hybrid mode can be used for HEVC and AVC encoding with the x265 and x264 encoders. It combines hardware and software encoding features; adjust the hardware preset (`--hwpreset`) and the software preset (`--preset`) to find the best balance between performance and quality.

**Example HEVC encode command:**

```sh
x265 --tune psnr --temporal-layers 4 --no-scenecut --no-open-gop --ctu 64 --min-cu-size 8 --tu-inter-depth 4 --tu-intra-depth 3 --rect --amp --no-temporal-mvp --no-strong-intra-smoothing --no-signhide --no-weightp --no-weightb --hwax 1 --hwapi 0 --fps 60000/1000 --bitrate 6000 --input-res 2048x858 --hwpreset 3 --preset 4 --keyint 200 --min-keyint 200 --frames 50 --input-depth 8 --input <input> -o <output file>
```

**Example AVC encode command:**

```sh
x264 --tune psnr --no-scenecut --hwax 1 --fps 60000/1000 --bitrate 6000 --input-res 2048x858 --hwpreset 0 --preset slow --keyint 200 --min-keyint 200 --frames 50  --input-depth 8 -o <output file> <input>
```

**Explanation of options:**

- `hwax` - enables hardware acceleration when set to 1
- `hwapi` - API selection: 0 = VA-API, 1 = VPL (x265 only)
- `hwpreset` - hardware preset: 0 (default), 1, 2 or 3; higher values may correspond to different hardware profiles (e.g., faster or more energy-efficient)
- `preset` - the trade-off between encoding speed and compression efficiency; the slower the preset, the better the compression, but the longer the encoding time: ultrafast (0), superfast (1), veryfast (2), faster (3), fast (4), medium (5, default), slow (6), slower (7), veryslow (8), placebo (9, highest quality, slowest)
- `<input>` - 8-bit YUV input file
- `<output file>` - output file path

CRF (Constant Rate Factor) mode can also be used for quality-based encoding by setting the `--crf` parameter (e.g., `--crf 23`, `--crf 28`) instead of `--bitrate`. It cannot be combined with VBR (Variable Bitrate); only one of these modes may be active at a time.
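
As an illustration, the AVC example above can be switched to CRF by replacing `--bitrate 6000` with `--crf 23`. This is a sketch only: the file names are hypothetical, and the command is printed rather than executed so it can be reviewed first.

```sh
# Hypothetical file names; replace them with your own.
INPUT=input.yuv
OUTPUT=output.264

# Same options as the AVC example, with --bitrate 6000 replaced by --crf 23.
CRF_CMD="x264 --tune psnr --no-scenecut --hwax 1 --fps 60000/1000 --crf 23 --input-res 2048x858 --hwpreset 0 --preset slow --keyint 200 --min-keyint 200 --frames 50 --input-depth 8 -o $OUTPUT $INPUT"

# Print the command for review; run it with: eval "$CRF_CMD"
echo "$CRF_CMD"
```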

## Quality Optimizations

### Lookahead

Lookahead allows the encoder to analyze future frames to make better decisions about bitrate allocation, resulting in higher quality video at the same bitrate at the expense of increased latency.  
Depending on the `compression_level` setting, lookahead might be enabled by default. To override the default setting, use the `--lookahead <lookahead depth>` option.

### Improved Mini GOP Structure

The improved mini GOP structure refines how frames are organized within a group of pictures (GOP), leading to better compression efficiency and video quality.

### Supported GOP Structures and Low Delay Mode

- By default, Intel® Media Transcode Accelerator uses a hierarchical (pyramidal) GOP structure with 7 B-frames (mini GOP 8), optimized for quality. The `-bf` parameter does not affect the GOP structure; the encoder always uses the same structure.
- For low latency use cases, enable Low Delay mode by adding `-low_delay 1` to your ffmpeg command. Low delay mode is available only through this parameter.
- When low delay mode is enabled:
  - Lookahead is automatically disabled.
  - The GOP structure changes: for HEVC, a mini GOP of 4 is used; for AVC, a mini GOP of 1 is used.
  - All references are to previous frames (no reordering).

**Example of a command using low latency:**

```sh
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -i <input file> -c:v hevc_vaapi -low_delay 1 <output file>
```
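
Because the `-g` value must be higher than the mini GOP, the rules in this section can be turned into a small pre-flight check. The sketch below encodes only what is stated in this README (mini GOP 8 by default, 4 for HEVC and 1 for AVC in low delay mode, no low delay support for AV1); the helper names and the codec labels `avc`, `hevc` and `av1` are hypothetical.

```sh
# Prints the mini GOP size for a codec and low delay setting.
# Fails for combinations without low delay support (e.g. AV1).
mini_gop() {  # args: codec (avc|hevc|av1), low_delay (0|1)
    if [ "$2" = 1 ]; then
        case "$1" in
            hevc) echo 4 ;;
            avc)  echo 1 ;;
            *)    return 1 ;;
        esac
    else
        echo 8
    fi
}

# Succeeds if the GOP size is higher than the mini GOP for this setup.
gop_size_ok() {  # args: gop_size codec low_delay
    mini=$(mini_gop "$2" "$3") || return 1
    [ "$1" -gt "$mini" ]
}
```

For example, `gop_size_ok 64 hevc 0` succeeds, so `-g 64` is a valid choice for a default HEVC encode, while `gop_size_ok 4 hevc 1` fails because the GOP size must exceed the low delay mini GOP of 4.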

### Other

> Please note, that Intel® Media Transcode Accelerator software stack includes an optimized (patched) version of FFmpeg which is part of the Intel® Media Transcode Accelerator installation package. The optimized version of FFmpeg will display "Intel® Media Transcode Accelerator optimized version" in its banner.

## Limitations

- GStreamer is not supported yet; support is planned for the future.
- In Low Delay mode, the AV1 codec is not supported yet; support is planned.

For the complete list of limitations, refer to the release notes.

## Telegraf support for Intel® Media Transcode Accelerator

The Intel® Media Transcode Accelerator (MTA) Input Plugin for Telegraf collects hardware utilization metrics from Intel MTA devices using the Linux sysfs telemetry interface.
It periodically reads low-level counters directly from the device, providing real-time insights into encoding, decoding, and image/media processing utilization.
The plugin supports configurable telemetry intervals and averaging windows, allowing users to fine-tune the granularity and smoothing of reported metrics.
It is designed for Linux systems and may require elevated permissions to access device telemetry data.

**For now, the plugin is available only as a Telegraf patch.**

### Patch installation

1. Install Go (required by Telegraf)

   For RHEL:

   ```sh
   dnf install -y golang
   ```

   For Ubuntu:

   ```sh
   apt-get install golang-go
   ```

2. Clone Telegraf repository

   ```sh
   git clone https://github.com/influxdata/telegraf
   ```

3. Copy patch from Intel® Media Transcode Accelerator package

   ```sh
   cp <Intel® Media Transcode Accelerator path>/src/telegraf_intel_mta_plugin.patch <Telegraf path>
   ```

4. Apply the patch (run this from `<Telegraf path>`)

   ```sh
   git apply telegraf_intel_mta_plugin.patch
   ```

5. Build Telegraf

   ```sh
   make
   ```

6. Configure the plugin
   - Edit your `telegraf.conf` and add the `[[inputs.intel_mta]]` section as described in the plugin README, or use the provided sample config.
7. Run Telegraf

   ```sh
   ./telegraf --config telegraf.conf
   ```

## Utilization Monitor

Utilization Monitor is a Python script that displays real-time utilization data for Intel® Media Transcode Accelerator devices. It can also save the utilization values over time to a CSV file, which can then be used to create utilization graphs. The script works on a GNR-D system with properly set up Intel® QuickAssist Technology 2.2 and Intel® Media Transcode Accelerator drivers.

Usage:

```sh
./utilmon.py -i 1 -o ./util.csv
```

Arguments:  
`-i` - telemetry interval (0 = 1 ms, 1 = 33 ms, 2 = 268 ms, 3 = 1 s), optional  
`-o` - path to the output CSV file, optional  

If run with no arguments, the default interval is 0 (1 ms) and utilization data is not saved to CSV. Utilization Monitor runs until it is closed with Ctrl+C. While it is running, launch an encoding, decoding or transcoding task in a separate terminal; Utilization Monitor should show the real-time usage.

## Virtualization Support – Preview

- Support for virtualized environments using QEMU is currently in preview. While SR-IOV VF passthrough has been
  validated with select bitstreams, some limitations may still occur. Evaluate virtualization
  use cases accordingly.

  Note: Containerized deployments using Docker are fully supported and offer a stable alternative for most use cases.

## Virtualization usage guidelines

- For QEMU, WQM mode with "virtio-iommu-pci" (paravirtualized vIOMMU) or "intel-iommu"
  is supported.
- vIOMMU is necessary to allocate and map large memory buffers needed by the HW
  and SW stack during the operation.
- iommufd support on the host side is necessary to link the host IOMMU with the one
  in the VM.
- QEMU is sensitive to the order of device declarations.
- This combination of parameters (order is important) adds an iommufd object, creates
  a vIOMMU, and binds a VF to the VM with vIOMMU support:

```sh
    -object iommufd,id=iommufd0 \
    -device virtio-iommu-pci \
    -device vfio-pci,host=0000:01:00.1,iommufd=iommufd0 \
```

  or

```sh
    -object iommufd,id=iommufd0 \
    -device intel-iommu,caching-mode=on,x-scalable-mode=on,x-pasid-mode=on \
    -device vfio-pci,host=0000:01:00.1,iommufd=iommufd0 \
```

- The (para)virtualized kernel must be built with `CONFIG_VIRTIO_IOMMU=y` or
  `CONFIG_INTEL_IOMMU=y`.
  This is distribution-specific; check your distribution's
  documentation.
- Sample scripts are included in the `samples/` directory.
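
The kernel configuration requirement can be checked from inside the VM. This is a minimal sketch, assuming the requirement refers to the kernel running in the guest and that the distribution ships its kernel config as a plain text file (commonly `/boot/config-$(uname -r)`); the helper name `guest_iommu_configured` is hypothetical.

```sh
# Hypothetical helper: succeeds if the given kernel config file has at
# least one of the two vIOMMU drivers built in (=y).
guest_iommu_configured() {  # args: path to kernel config file
    grep -Eq '^CONFIG_(VIRTIO_IOMMU|INTEL_IOMMU)=y$' "$1"
}
```

For example, `guest_iommu_configured "/boot/config-$(uname -r)" && echo "vIOMMU driver built in"` reports whether the running guest kernel meets the requirement.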

## Host SR-IOV mode usage guidelines

- Only WQM mode is supported on an SR-IOV host. To enable it, set the environment
  variables `VA_USE_SVM=0` and `VA_USE_WQM=1`.
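
A minimal sketch of setting these variables, assuming they are read from the environment of the process that opens the device:

```sh
# Assumption: the variables must be present in the environment of the
# application that uses the device, so they are exported before it starts.
export VA_USE_SVM=0
export VA_USE_WQM=1
```

Any ffmpeg or `vainfo` command started afterwards from the same shell inherits both variables.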

## Legal/Disclaimers

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL(R) PRODUCTS.
NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL
PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S
TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY
WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO
SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING
TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY
PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel® products are
not intended for use in medical, life saving, life sustaining, critical control
or safety systems, or in nuclear facility applications.

Intel® may make changes to specifications and product descriptions at any time,
without notice.

(C) Intel® Corporation 2025

- Other names and brands may be claimed as the property of others.