# Intel® Media Transcode Accelerator

## JULY 2025

## OVERVIEW

Intel® Media Transcode Accelerator provides hardware acceleration for video encoding and decoding. It improves efficiency and performance by moving these operations from the CPU to Media HW, which frees a significant number of CPU cores.

This README applies to Intel® GNR-D with Media device.

## LICENSING

Software in this project is licensed under GPL-2.0-only.

Copyright (C) 2025 Intel® Corporation

## Installation of Intel® Media Transcode Accelerator

### Requirements and Dependencies

**Supported OS and kernels:**

- Ubuntu 24.04 kernel 6.8
- RHEL 9.4 kernel 5.14

**Libraries, tools and development packages:**

- gcc >= 10
- libva >= 2.18 (2.20 recommended)
- libva-devel >= 2.18 (2.20 recommended)
- libudev-devel
- pciutils
- kernel-devel
- openssl-devel
- zlib-devel
- pcre-devel
- git
- autoconf
- automake
- libtool
- httpd-tools
- boost-devel
- libnl3-devel.x86_64
- cmake
- pkgconf
- libdrm-devel
- nasm
- yasm
- libvpl-devel
- libvpl-tools
- libva-utils

**For RHEL:**

```sh
dnf install -y dnf-plugins-core libudev-devel pciutils gcc gcc-c++ openssl-devel zlib-devel pcre-devel git autoconf automake kernel-devel libtool httpd-tools boost-devel libnl3-devel pkgconf libdrm-devel libva-devel cmake
 
# NASM (not included in RHEL 9.4 package)
wget https://www.nasm.us/pub/nasm/releasebuilds/2.16.01/nasm-2.16.01.tar.bz2 && tar -xjf nasm-2.16.01.tar.bz2 && cd nasm-2.16.01 && ./configure --prefix=/usr/local && make -j$(nproc) && sudo make install && cd ..
 
# YASM (not included in RHEL 9.4 package)
wget https://www.tortall.net/projects/yasm/releases/yasm-1.3.0.tar.gz && tar -xzf yasm-1.3.0.tar.gz && cd yasm-1.3.0 && ./configure --prefix=/usr/local && make -j$(nproc) && sudo make install && cd ..
 
# libva-utils (not included in RHEL 9.4 package)
git clone https://github.com/intel/libva-utils.git && cd libva-utils && ./autogen.sh --prefix=/usr/local && make -j$(nproc) && sudo make install && cd ..
 
# Intel VPL (not included in RHEL 9.4 package)
git clone https://github.com/intel/libvpl.git && cd libvpl && mkdir build && cd build && cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local && make -j$(nproc) && sudo make install && sudo ldconfig

```

**For Ubuntu:**

```sh
apt-get install apache2 autoconf automake bison cmake diffutils dwarves findutils flex g++ gawk gcc git libboost-all-dev libdrm-dev libnl-3-dev libnl-genl-3-dev libssl-dev libtool libudev-dev libva-dev libva-drm2 mawk nasm ncurses-base ncurses-bin pciutils pkgconf udev vainfo yasm zlib1g zlib1g-dev libvpl-dev onevpl-tools
```

**Intel QAT 2.2 (Intel® QuickAssist Technology) package**  
*Recommended QAT package path:*  
`<Intel® Media Transcode Accelerator path>/src/aux_drv/qat_aux`

If QAT is placed in a different location, the installer prompts for the actual path during installation.

### Installation
>
> **RUN ALL COMMANDS AS ROOT!**

Make sure that IOMMU is enabled (`intel_iommu=on,sm_on`):

```sh
cat /proc/cmdline
```

or

```sh
dmesg | grep -i iommu
```
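The check can also be scripted. The following is a minimal sketch, assuming the parameter appears on the kernel command line exactly as `intel_iommu=on,sm_on`; the function name `has_iommu_args` is illustrative and not part of the package:

```sh
# Return 0 if the given kernel command line contains intel_iommu=on,sm_on.
# has_iommu_args is a hypothetical helper name, not part of the package.
has_iommu_args() {
    # Pad with spaces so the parameter is matched as a whole word.
    case " $1 " in
        *" intel_iommu=on,sm_on "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_iommu_args "$(cat /proc/cmdline)"; then
    echo "intel_iommu=on,sm_on is present on the kernel command line"
else
    echo "intel_iommu=on,sm_on is missing from the kernel command line"
fi
```

This only confirms that the parameter was passed to the kernel; the `dmesg` output remains the place to see whether the IOMMU was actually initialized.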

To enable IOMMU, add the boot parameters `iommu=on intel_iommu=on,sm_on` to the kernel command line.

**For RHEL:**

```sh
grubby --update-kernel=ALL --args="iommu=on intel_iommu=on,sm_on"
reboot
```

Red Hat does NOT automatically generate the `signing_key.pem` required for signing external kernel modules. To load the Intel® Media Transcode Accelerator driver, generate this key manually: go to `<kernel_source>/certs` (e.g. `/usr/src/kernels/5.14.0-503.11.1.el9_5.x86_64/certs`) and run:

```sh
openssl req -new -x509 -newkey rsa:2048 -keyout signing_key.pem -outform PEM -out signing_key.x509 -days 365 -nodes -subj "/CN=Module Signing Key"
```
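Before running the installer, it may help to confirm that the two files written by the command above are in place. A minimal sketch, assuming the RHEL layout `/usr/src/kernels/<kernel version>/certs`; `check_signing_key` is a hypothetical helper name:

```sh
# Return 0 if both files produced by the openssl command exist and are non-empty.
# check_signing_key is a hypothetical helper; $1 is the <kernel_source>/certs directory.
check_signing_key() {
    certs_dir=$1
    for f in signing_key.pem signing_key.x509; do
        if [ ! -s "$certs_dir/$f" ]; then
            echo "missing or empty: $certs_dir/$f" >&2
            return 1
        fi
    done
    echo "signing key files present in $certs_dir"
}

# Example: check the certs directory of the running kernel (RHEL path layout assumed).
check_signing_key "/usr/src/kernels/$(uname -r)/certs" || true
```

The certificate created above is valid for 365 days (`-days 365`); `openssl x509 -in signing_key.x509 -noout -enddate` prints its expiry date.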

**Install Media:**

```sh
source setenv.sh
./install.sh
```
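After installation, a quick way to see whether the devices are visible is to list the DRM render nodes. This is a sketch only, assuming the driver exposes its devices under `/dev/dri` as in the usage examples later in this README; `list_render_nodes` is a hypothetical helper name:

```sh
# Print every DRM render node found in a directory (default: /dev/dri).
# list_render_nodes is a hypothetical helper name, not part of the package.
list_render_nodes() {
    dri_dir=${1:-/dev/dri}
    status=1
    for node in "$dri_dir"/renderD*; do
        # An unmatched glob stays literal, so check that the path exists.
        [ -e "$node" ] || continue
        echo "$node"
        status=0
    done
    return "$status"
}

list_render_nodes || echo "no render nodes found under /dev/dri"
```

Each printed path can be used as the `<device driver>` value in the commands below.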

### Uninstallation

First, uninstall Intel® QAT.

Go to `<Intel® Media Transcode Accelerator path>/src/aux_drv/qat_aux`  
Run:

```sh
make uninstall
```

Then uninstall Intel® Media Transcode Accelerator.

Go to `<Intel® Media Transcode Accelerator path>/src/aux_drv`  
Run:

```sh
make uninstall
```

## Usage of Intel® Media Transcode Accelerator

Once everything is installed, you're ready to start processing videos! Here's what you can do:

**Full Offload Mode:** Think of this as the autopilot for video processing. Whether you're decoding or encoding, the hardware takes care of it all, no extra help needed.

**Hybrid Mode:** This is a team effort between your computer's CPU and the video hardware. Sometimes the CPU will take over, and other times it'll pass the baton to the hardware for certain tasks.

**Shared Virtual Memory (SVM):** This is like a shared workspace where your computer's CPU and the video hardware can both access the same files and information, making things more efficient.

**User Mode Command Dispatch:** You can send commands directly to the video hardware, which is great for multitasking across different users.

Intel® Media Transcode Accelerator creates DRM devices that FFmpeg uses for hardware acceleration. These devices work in:

**UQ (User Queue)** mode. A User Queue Manager acts as a help desk in front of the hardware: applications drop off their requests without needing to understand how the hardware sorts and schedules them, and the manager takes care of the rest. The User Queue is like a shared dropbox where everyone can leave their messages. This mode works especially well with Shared Virtual Memory (SVM), which is like a common area where both the hardware and the applications sending requests can see and access everything they need.

**Simple use cases:**

- **vainfo:**

  ```sh
  su -c "vainfo --display drm --device <device driver>"
  ```

  `<device driver>` – e.g.: /dev/dri/renderD128 or /dev/dri/renderD129  

- **FFmpeg decode:**

  ```sh
  ffmpeg -hwaccel vaapi -hwaccel_device <device driver> -hwaccel_output_format vaapi -i <input file> -vf 'hwdownload,format=nv12' -pix_fmt yuv420p <output file>
  ```

  `<device driver>` – e.g.: /dev/dri/renderD128 or /dev/dri/renderD129  
  `<input file>` – a supported video file (AVC, AV1 or HEVC)  
  `<output file>` – an uncompressed yuv file

- **FFmpeg transcode:**

  ```sh
  ffmpeg -hwaccel vaapi -init_hw_device vaapi=hw:<device driver> -hwaccel_output_format vaapi -v verbose -i <input file> -an -c:v <codec> -profile:v <profile> -rc_mode <rc mode> -g <GoP size> -slices <slices> -y <output file>
  ```

  `<device driver>` – e.g.: /dev/dri/renderD128 or /dev/dri/renderD129  
  `<input file>` – a supported video file (AVC, AV1 or HEVC)  
  `<codec>` - codec selected for encoding: av1_vaapi, h264_vaapi, hevc_vaapi or mjpeg_vaapi (profile:v parameter is ignored for JPEG coding, as it does not support profiles)  
  `<profile>` - coding profile (color depth); supported profiles: main (AVC, AV1, HEVC), main10 (HEVC), high (AVC), high10 (AVC)  
  `<rc mode>` - rate control mode. The FFmpeg parameters it requires depend on the codec used, especially for hardware encoders (e.g., VAAPI). Intel® Media Transcode Accelerator supports CBR (Constant Bitrate) for AVC and HEVC; VBR (Variable Bitrate) for AVC, HEVC and AV1; and CQP (Constant Quantization Parameter) for AVC, HEVC and AV1.  
  Required parameters for each rate control mode:

  - **CBR**: `-b:v <bitrate> -maxrate <bitrate> -bufsize <bufsize>` (constant bitrate)
  - **VBR**: `-b:v <bitrate>` (variable bitrate, average target)
  - **CQP**: `-qp <value>` (constant quantization parameter; supported values are 0-51 for AVC/HEVC and 0-63 for AV1, where 0 means the best quality; suggested values are 20-45, with 26 treated as the default/optimal value)

  Note: These modes are mutually exclusive; only one can be used per encoding session.

  `<GoP size>` - Group of Pictures size used for encoding; the value should be higher than the mini GoP value, which depends on the low delay mode and codec (see Supported GOP Structures and Low Delay Mode)  
  `<slices>` - the number of fragments into which each frame is divided during encoding, which allows processing to be parallelized and reduces the risk that an encoding error affects the entire frame; the supported number of slices depends on the codec (AVC: 1-300, HEVC: 1-64, AV1: 1-64), but it is usually between 1 and 16; we recommend choosing values between 1 and 4  
  `<output file>` – output file path  

- **FFmpeg encode:**

  ```sh
  ffmpeg -init_hw_device vaapi=hw:<device driver> -filter_hw_device hw -f rawvideo -pix_fmt yuv420p -s 720x420 -i <input file> -vf 'format=nv12,hwupload' -c:v <codec> -profile:v <profile> <output file>
  ```

  `<device driver>` – e.g.: /dev/dri/renderD128 or /dev/dri/renderD129  
  `<input file>` – an uncompressed yuv file  
  `<codec>` - codec selected for encoding: av1_vaapi, h264_vaapi, hevc_vaapi or mjpeg_vaapi (profile:v parameter is ignored for JPEG coding, as it does not support profiles)  
  `<profile>` - coding profile (color depth); supported profiles: main (AVC, AV1, HEVC), main10 (HEVC), high (AVC), high10 (AVC)  
  `<output file>` – output file path  
  
- **Other supported parameters (examples):**

  `-compression_level` - at the moment, FFmpeg itself does not support this option for the av1_vaapi, h264_vaapi and hevc_vaapi codecs, but the changes in this package make it available, so it can optionally be added for encoding and transcoding (supported values for AV1: 0-3, AVC: 0-2, HEVC: 0-4)  
  `-vframes` - specifies how many video frames are encoded or written to the output file  
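Because each rate control mode requires a different set of options, a small wrapper can keep them consistent. The sketch below follows the required parameters listed for the transcode example above; `rc_args` is a hypothetical helper, and the device path, file names and numeric values are placeholders:

```sh
# Print the ffmpeg rate-control options for one mode.
# rc_args is a hypothetical helper, not part of the package.
#   rc_args CBR <bitrate> <bufsize>
#   rc_args VBR <bitrate>
#   rc_args CQP <qp>
rc_args() {
    case "$1" in
        CBR) echo "-rc_mode CBR -b:v $2 -maxrate $2 -bufsize $3" ;;
        VBR) echo "-rc_mode VBR -b:v $2" ;;
        CQP) echo "-rc_mode CQP -qp $2" ;;
        *)   echo "unsupported rate control mode: $1" >&2; return 1 ;;
    esac
}

# Example: print an HEVC transcode command using a constant QP of 26.
# Drop the leading "echo" to run it. $(rc_args ...) is left unquoted on
# purpose so that it splits into separate options.
echo ffmpeg -hwaccel vaapi -init_hw_device vaapi=hw:/dev/dri/renderD128 \
    -hwaccel_output_format vaapi -i input.mp4 -an -c:v hevc_vaapi \
    -profile:v main $(rc_args CQP 26) -g 32 -slices 1 -y output.mp4
```

The helper does not check codec support; as noted above, CBR is listed for AVC and HEVC only.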

**Hybrid Mode (HEVC):**

Hybrid mode can be used for HEVC (H.265) encoding, leveraging the x265 encoder with various presets and parameters. Hybrid mode allows you to combine hardware and software encoding features for optimal performance and quality. Adjust the hardware and software preset parameters to find the best balance.

Example command:

```sh
x265 --tune psnr --temporal-layers 4 --hwax 1 --bitrate <VBR> --input-res 1920x1080 --hwpreset <HWpreset> --preset <preset> --keyint 200 --input <input file> -o <output file>
```

Hybrid mode acceleration is enabled by setting `--hwax 1`.

`<VBR>` - target average bitrate for Variable Bitrate mode, in kbps (e.g. 6000)  
`<HWpreset>` - 0 (default), 1, 2 or 3; higher values may correspond to different hardware profiles (e.g., faster or more energy-efficient)  
`<preset>` - the trade-off between encoding speed and compression efficiency; the slower the preset, the better the compression, but the longer the encoding time. Values: ultrafast (0), superfast (1), veryfast (2), faster (3), fast (4), medium (5, default), slow (6), slower (7), veryslow (8), placebo (9, highest quality, slowest)  
`<input file>` - 8-bit YUV file  
`<output file>` - output file path  

CRF (Constant Rate Factor) mode can be used instead for quality-based encoding: set the `--crf` parameter (e.g., `--crf 23` or `--crf 28`) in place of `--bitrate`. CRF cannot be combined with VBR (Variable Bitrate); only one of these modes may be active at a time.
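The `<input file>` is headerless raw video, so `--input-res` must match the file contents. One way to sanity-check this is to verify that the file size is a whole number of frames. The sketch below assumes 8-bit 4:2:0 input, which is x265's default input color space; `yuv420_frames` is a hypothetical helper name:

```sh
# Print the number of whole frames in a raw 8-bit 4:2:0 YUV file.
# One frame takes width * height * 3 / 2 bytes: a full-size luma plane plus
# two quarter-size chroma planes. yuv420_frames is a hypothetical helper.
yuv420_frames() {
    file=$1
    width=$2
    height=$3
    frame_bytes=$((width * height * 3 / 2))
    file_bytes=$(wc -c < "$file")
    if [ $((file_bytes % frame_bytes)) -ne 0 ]; then
        echo "size of $file is not a multiple of ${width}x${height} frames" >&2
        return 1
    fi
    echo $((file_bytes / frame_bytes))
}

# Example (file name is a placeholder):
#   yuv420_frames input.yuv 1920 1080
```

A non-zero exit status usually means that the resolution passed to the helper, and therefore to `--input-res`, does not match the file, or that the file is truncated.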

## Quality Optimizations

### Lookahead

Lookahead allows the encoder to analyze future frames to make better decisions about bitrate allocation, resulting in higher quality video at the same bitrate at the expense of increased latency.  
Depending on the `compression_level` setting, lookahead might be enabled by default. To override the default setting, use the `--lookahead <lookahead depth>` option.

### Improved Mini GOP Structure

Improved mini GOP structure enhances the group of pictures (GOP) structure, leading to better compression efficiency and video quality.

### Supported GOP Structures and Low Delay Mode

- By default, Intel® Media Transcode Accelerator uses a hierarchical (pyramidal) GOP structure with 7 B-frames (mini GOP 8), optimized for quality.
- The GOP structure cannot be changed via the `-bf` parameter; outside low delay mode the encoder always uses the same pyramidal structure.
- For low latency use cases, enable low delay mode by adding `-low_delay 1` to your ffmpeg command. Low delay mode is only available through the `-low_delay` parameter.
- When low delay mode is enabled:
  - Lookahead is automatically disabled.
  - The GOP structure changes: for HEVC, a mini GOP of 4 is used; for AVC, a mini GOP of 1 is used.
  - All references are to previous frames (no reordering).

**Example of a command using low latency:**

```sh
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -i <input file> -c:v hevc_vaapi -low_delay 1 <output file>
```
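Since the `-g` value should be higher than the mini GOP, the values from the list above can be captured in a small lookup. This is a sketch only: `mini_gop` and `gop_size_ok` are hypothetical helpers, and it assumes the default mini GOP of 8 applies to AVC, HEVC and AV1 alike:

```sh
# Print the mini GOP size for a codec and low-delay setting (0 or 1).
# mini_gop is a hypothetical helper; codec names follow the FFmpeg encoders.
mini_gop() {
    codec=$1
    low_delay=${2:-0}
    case "$codec:$low_delay" in
        av1_vaapi:0|h264_vaapi:0|hevc_vaapi:0) echo 8 ;;
        hevc_vaapi:1) echo 4 ;;
        h264_vaapi:1) echo 1 ;;
        *) echo "no mini GOP known for codec=$codec low_delay=$low_delay" >&2
           return 1 ;;
    esac
}

# Return 0 if a -g value is higher than the mini GOP for the configuration.
#   gop_size_ok <GoP size> <codec> [low_delay]
gop_size_ok() {
    min=$(mini_gop "$2" "${3:-0}") || return 1
    [ "$1" -gt "$min" ]
}

# Example: gop_size_ok 32 hevc_vaapi 1 && echo "GoP size is valid"
```

AV1 with `-low_delay 1` is rejected because that combination is listed under Limitations.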

### Other
>
> Note that the Intel® Media Transcode Accelerator software stack includes an optimized (patched) version of FFmpeg, which is part of the Intel® Media Transcode Accelerator installation package. The optimized version of FFmpeg displays "Intel® Media Transcode Accelerator optimized version" in its banner.

## Limitations

- The current build system does not save environment variables, so the `setenv.sh` script must be run again in each new session.
- GStreamer is not supported yet; support is planned for the future.
- The AV1 codec is not supported in Low Delay mode yet; support is planned.

For the complete list of limitations, please refer to the release notes.

## Telegraf support for Intel® Media Transcode Accelerator

The Intel® Media Transcode Accelerator (MTA) Input Plugin for Telegraf collects hardware utilization metrics from Intel MTA devices using the Linux sysfs telemetry interface.
It periodically reads low-level counters directly from the device, providing real-time insights into encoding, decoding, and image/media processing utilization.
The plugin supports configurable telemetry intervals and averaging windows, allowing users to fine-tune the granularity and smoothing of reported metrics.
It is designed for Linux systems and may require elevated permissions to access device telemetry data.

**For now, the plugin is available only as a Telegraf patch.**

### Patch installation

1. Install Go (required by Telegraf)

   For RHEL:

   ```sh
   dnf install -y golang
   ```

   For Ubuntu:

   ```sh
   apt-get install golang-go
   ```

2. Clone Telegraf repository

   ```sh
   git clone https://github.com/influxdata/telegraf
   ```

3. Copy patch from Intel® Media Transcode Accelerator package

   ```sh
   cp <Intel® Media Transcode Accelerator path>/src/telegraf_intel_mta_plugin.patch <Telegraf path>
   ```

4. Apply the patch

   ```sh
   cd <Telegraf path>
   git apply telegraf_intel_mta_plugin.patch
   ```

5. Build Telegraf

   ```sh
   make
   ```

6. Configure the plugin
   - Edit your *telegraf.conf* and add the `[[inputs.intel_mta]]` section as described in the plugin README, or use the provided sample config.

7. Run Telegraf

   ```sh
   ./telegraf --config telegraf.conf
   ```
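Before starting a long-running Telegraf instance, it can be useful to confirm that the configuration actually enables the plugin. A minimal sketch, assuming the section name `[[inputs.intel_mta]]` from step 6; `has_mta_input` is a hypothetical helper name:

```sh
# Return 0 if a Telegraf config file contains an uncommented
# [[inputs.intel_mta]] section. has_mta_input is a hypothetical helper.
has_mta_input() {
    grep -q '^[[:space:]]*\[\[inputs\.intel_mta]]' "$1"
}

# Example: run a single collection cycle for this plugin only and print the
# gathered metrics (--test and --input-filter are standard Telegraf flags):
#   if has_mta_input telegraf.conf; then
#       ./telegraf --config telegraf.conf --input-filter intel_mta --test
#   fi
```

As noted above, reading device telemetry may require elevated permissions, so the test run may need to be started as root.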

## Utilization Monitor

Utilization Monitor is a Python script which displays real-time utilization data for Intel® Media Transcode Accelerator devices. It can also save the utilization values over time as CSV which allows for creation of utilization graphs. The script will work on a GNR-D system with properly set up Intel® QuickAssist Technology 2.2 and Intel® Media Transcode Accelerator drivers.  

Usage:

```sh
./utilmon.py -i 1 -o ./util.csv
```

Arguments:  
`-i` - telemetry interval (0 - 1 ms, 1 - 33 ms, 2 - 268 ms, 3 - 1 s), optional  
`-o` - path to the output CSV file, optional  

If run with no arguments, the script uses the default interval 0 (1 ms) and does not save utilization data to CSV. Utilization Monitor runs until it is closed with CTRL+C. While it is running, launch an encoding, decoding or transcoding task in a separate terminal instance, and Utilization Monitor should show real-time usage.

## Virtualization Support – Preview

- Support for virtualized environments using QEMU is currently in preview mode. While SR-IOV VF passthrough has been
  validated with select bitstreams, some limitations may still occur. Users are advised to evaluate virtualization
  use cases accordingly.

  Note: Containerized deployments using Docker are fully supported and offer a stable alternative for most use cases.

## Virtualization usage guidelines

- For QEMU, WQM mode with "virtio-iommu-pci" (paravirtualized vIOMMU) or "intel-iommu"
  is supported.
- vIOMMU is necessary to allocate and map large memory buffers needed by the HW
  and SW stack during the operation.
- iommufd support on the host side is necessary to link the host IOMMU with the one
  in the VM.
- QEMU is sensitive to the order of device declarations.
- Either of the following combinations of parameters (order is important) adds an iommufd object,
  creates a vIOMMU, and binds a VF to the VM with vIOMMU support:

```sh
    -object iommufd,id=iommufd0 \
    -device virtio-iommu-pci \
    -device vfio-pci,host=0000:01:00.1,iommufd=iommufd0 \
```
  or
```sh
    -object iommufd,id=iommufd0 \
    -device intel-iommu,caching-mode=on,x-scalable-mode=on,x-pasid-mode=on \
    -device vfio-pci,host=0000:01:00.1,iommufd=iommufd0 \
```
- The (para)virtualized kernel needs to be built with `CONFIG_VIRTIO_IOMMU=y` or
  `CONFIG_INTEL_IOMMU=y`.
  Whether this is the case is distribution-specific; check your distribution's
  documentation.
- Sample scripts are included in the `samples/` directory.
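Inside the guest, the kernel configuration requirement can be checked directly. This is a sketch, assuming the distribution ships its kernel config as `/boot/config-<kernel version>` (as Ubuntu and RHEL do); `has_viommu_support` is a hypothetical helper name:

```sh
# Return 0 if a kernel config file has CONFIG_VIRTIO_IOMMU=y or
# CONFIG_INTEL_IOMMU=y. has_viommu_support is a hypothetical helper;
# $1 defaults to the config of the running kernel.
has_viommu_support() {
    config=${1:-/boot/config-$(uname -r)}
    grep -Eq '^CONFIG_(VIRTIO_IOMMU|INTEL_IOMMU)=y$' "$config"
}

if has_viommu_support 2>/dev/null; then
    echo "guest kernel has built-in vIOMMU support"
else
    echo "CONFIG_VIRTIO_IOMMU=y or CONFIG_INTEL_IOMMU=y not found"
fi
```

The check matches only `=y`, so an option built as a module (`=m`) is reported as not found.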

## Legal/Disclaimers

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL(R) PRODUCTS.
NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL
PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S
TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY
WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO
SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING
TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY
PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel® products are
not intended for use in medical, life saving, life sustaining, critical control
or safety systems, or in nuclear facility applications.

Intel® may make changes to specifications and product descriptions at any time,
without notice.

(C) Intel® Corporation 2025

\*Other names and brands may be claimed as the property of others.