# Intel® Media Transcode Accelerator

## OVERVIEW

Intel® Media Transcode Accelerator provides hardware acceleration for video encoding and decoding. It improves efficiency and performance by moving operations from the CPU to Media HW, freeing a significant number of CPU cores.

This README applies to Intel® GNR-D with Media device.

## Installation of Intel® Media Transcode Accelerator

### Requirements and Dependencies

**Supported OS and kernels:**

- Ubuntu 24.04:
  - kernel 6.8
  - kernel 6.14
- RHEL 9.6:
  - kernel 5.14.0-570
- RHEL 10:
  - kernel 6.12.0-55

**Libraries, tools and development packages:**

- gcc >= 10
- libva >= 2.18 (2.20 recommended)
- libva-devel >= 2.18 (2.20 recommended)
- libudev-devel
- pciutils
- kernel-devel
- openssl-devel
- zlib-devel
- pcre2-devel
- git
- autoconf
- automake
- libtool
- httpd-tools
- boost-devel
- libnl3-devel.x86_64
- cmake
- pkgconf
- libdrm-devel
- nasm
- yasm
- libvpl-devel
- libvpl-tools
- libva-utils

**For RHEL:**

```sh
dnf install -y dnf-plugins-core libudev-devel pciutils gcc gcc-c++ openssl-devel zlib-devel pcre2-devel git autoconf automake kernel-devel libtool httpd-tools boost-devel libnl3-devel pkgconf patch libdrm-devel libva-devel cmake

# NASM (not included in RHEL 9.4 package)
wget https://www.nasm.us/pub/nasm/releasebuilds/2.16.01/nasm-2.16.01.tar.bz2 && tar -xjf nasm-2.16.01.tar.bz2 && cd nasm-2.16.01 && ./configure --prefix=/usr/local && make -j$(nproc) && sudo make install && cd ..

# YASM (not included in RHEL 9.4 package)
wget https://www.tortall.net/projects/yasm/releases/yasm-1.3.0.tar.gz && tar -xzf yasm-1.3.0.tar.gz && cd yasm-1.3.0 && ./configure --prefix=/usr/local && make -j$(nproc) && sudo make install && cd ..

# libva-utils (not included in RHEL 9.4 package)
git clone https://github.com/intel/libva-utils.git && cd libva-utils && ./autogen.sh --prefix=/usr/local && make -j$(nproc) && sudo make install && cd ..

# Intel VPL (not included in RHEL 9.4 package)
git clone https://github.com/intel/libvpl.git && cd libvpl && mkdir build && cd build && cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local && make -j$(nproc) && sudo make install && sudo ldconfig
```

**For Ubuntu:**

```sh
apt-get install apache2 autoconf automake bison cmake diffutils dwarves findutils flex g++ gawk gcc git libboost-all-dev libdrm-dev libnl-3-dev libnl-genl-3-dev libssl-dev libtool libudev-dev libva-dev libva-drm2 mawk nasm ncurses-base ncurses-bin pciutils pkgconf udev vainfo yasm zlib1g zlib1g-dev libvpl-dev onevpl-tools
```

**Intel QAT 2.2 (Intel® QuickAssist Technology) package**

Copy the uncompressed package to: `/src/aux_drv/qat_aux`

### Installation

> **RUN ALL COMMANDS AS ROOT!**

Make sure that IOMMU is enabled (`intel_iommu=on,sm_on`):

```sh
cat /proc/cmdline
```

or

```sh
dmesg | grep -i iommu
```

To enable IOMMU, add the boot line parameters `iommu=on intel_iommu=on,sm_on`.

**For RHEL:**

```sh
grubby --update-kernel=ALL --args="iommu=on intel_iommu=on,sm_on"
reboot
```

Red Hat does NOT automatically generate the `signing_key.pem` required for signing external kernel modules. To load the Intel® Media Transcode Accelerator driver, make sure to generate this key manually. Go to `/certs` (e.g.
`/usr/src/kernels/5.14.0-503.11.1.el9_5.x86_64/certs`) and run:

```sh
openssl req -new -x509 -newkey rsa:2048 -keyout signing_key.pem -outform PEM -out signing_key.x509 -days 365 -nodes -subj "/CN=Module Signing Key"
```

**Install Media:**

```sh
source setenv.sh
./install.sh
```

### Uninstallation

Run the installation script with the `clean` command:

```sh
./install.sh clean
```

Or uninstall manually.

First uninstall Intel® QAT. Go to `/src/aux_drv/qat_aux` and run:

```sh
make uninstall
```

Then uninstall Intel® Media Transcode Accelerator. Go to `/src/aux_drv` and run:

```sh
make uninstall
```

## Usage of Intel® Media Transcode Accelerator

Intel® Media Transcode Accelerator allows the creation of DRM devices, which FFmpeg uses for HW acceleration. These devices work in two modes:

- **UQ (User Queue)** - default on a host without SRIOV enabled,
- **WQM (Worker Queue Mode)** - default on a virtual machine or a host with SRIOV.

UQ mode allows several users to use the device at the same time; WQM allows only one user to access the device.

Modes supported:

**Full Offload Mode**

The hardware handles all processing. It is supported in FFmpeg and in the sample applications for VPL. You can find them in the `/src/libvpl-media-ip-rt/build/bin/Release` folder.

**Hybrid Mode**

The CPU and the hardware share the work. It is supported in the x264 and x265 applications.
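Because the accelerator is exposed as DRM devices, it can help to see which render nodes are present before running any of the commands that take a device path. The following is a minimal sketch, not part of the package: the function name `list_render_nodes` is hypothetical, and it assumes the devices appear as `renderD*` entries under `/dev/dri`, as in the example paths used in this README.

```sh
# Hypothetical helper: list DRM render nodes in a directory (default: /dev/dri).
# Prints one path per line; prints nothing when no render node is found.
list_render_nodes() {
    dri_dir="${1:-/dev/dri}"
    for node in "$dri_dir"/renderD*; do
        # When the glob matches nothing, the pattern itself is returned; skip it.
        if [ -e "$node" ]; then
            printf '%s\n' "$node"
        fi
    done
    return 0
}

list_render_nodes
```

Each printed path can then be passed wherever a command in this README expects a device such as `/dev/dri/renderD128`.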
### Device check

- **vainfo:**

```sh
su -c vainfo --display drm --device
```

`` – e.g.: /dev/dri/renderD128 or /dev/dri/renderD129

### Full offload mode

- **FFmpeg decode:**

```sh
ffmpeg -hwaccel vaapi -hwaccel_device -hwaccel_output_format vaapi -i -vf 'hwdownload,format=nv12' -pix_fmt yuv420p
```

`` – e.g.: /dev/dri/renderD128 or /dev/dri/renderD129

`` – a supported video file (AVC, AV1 or HEVC)

`` – an uncompressed yuv file

- **FFmpeg transcode:**

```sh
ffmpeg -hwaccel vaapi -init_hw_device vaapi=hw: -hwaccel_output_format vaapi -v verbose -i -an -c:v -profile:v -rc_mode -g -slices -compression_level -y
```

`` – e.g.: /dev/dri/renderD128 or /dev/dri/renderD129

`` – a supported video file (AVC, AV1 or HEVC)

`` - codec selected for encoding: av1_vaapi, h264_vaapi, hevc_vaapi or mjpeg_vaapi (profile:v parameter is ignored for JPEG coding, as it does not support profiles)

`` - coding profile (color depth); supported profiles: main (AVC, AV1, HEVC), main10 (HEVC), high (AVC), high10 (AVC)

`` - rate control mode. Requires additional parameters:

- **VBR**: `-b:v -maxrate -bufsize ` - Variable Bitrate (average target)
- **CBR**: `-b:v ` - Constant Bitrate
- **CQP**: `-qp ` - Constant Quantization Parameter (supported values 0-51 for AVC/HEVC, 0-63 for AV1, '0' means the best quality, suggested values: 20-45, where 26 is treated as default/optimal value)

Note: These modes are mutually exclusive - only one can be used per encoding session.
`` - Group of Pictures size used for encoding; the number should be higher than the mini GoP value, which depends on the low_delay mode and the codec (see Supported GOP Structures and Low Delay Mode)

`` - the number of slices each frame is divided into during encoding, which allows parallel processing and limits the impact of an encoding error to part of the frame; the supported number of slices depends on the codec (AVC: 1-300, HEVC: 1-64, AV1: 1-64), typical values are between 1 and 16, and we recommend choosing values between 1 and 4

`` – output file path

`compression_level` - 0 (default) for best quality, supported values for AV1: 0-3, AVC: 0-2, HEVC: 0-4

- **FFmpeg encode:**

```sh
ffmpeg -init_hw_device vaapi=hw: -filter_hw_device hw -f rawvideo -pix_fmt yuv420p -s 720x420 -i -vf 'format=nv12,hwupload' -c:v -profile:v
```

### Hybrid Mode

Hybrid mode can be used for HEVC and AVC encoding. It combines hardware and software encoding features, and the encoder accepts various presets and parameters. Adjust the hardware and software preset parameters to find the best balance between performance and quality.
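One way to look for that balance is to sweep the hardware preset while holding the software preset fixed and compare the results. The sketch below is a dry run that only prints candidate x264 command lines; it does not run the encoder. The function name `print_hwpreset_sweep`, the file names, and the argument order are illustrative assumptions; the options `--hwax`, `--hwpreset`, `--preset`, `--input-res` and `--input-depth`, and the hardware preset values 0 to 3, are the ones used for hybrid mode in this README.

```sh
# Hypothetical dry-run helper: print one x264 command per hardware preset (0-3).
# Arguments: input YUV file, resolution (WxH), software preset (default: medium).
print_hwpreset_sweep() {
    input="$1"
    res="$2"
    sw_preset="${3:-medium}"
    for hw in 0 1 2 3; do
        # Output file names (out_hw<N>.264) are placeholders for illustration.
        printf 'x264 --hwax 1 --hwpreset %s --preset %s --input-res %s --input-depth 8 -o out_hw%s.264 %s\n' \
            "$hw" "$sw_preset" "$res" "$hw" "$input"
    done
}

print_hwpreset_sweep input.yuv 2048x858 slow
```

Printing the commands first makes it easy to review them, add rate control options such as `--bitrate`, and then run them one by one.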
**Example HEVC encode command:**

```sh
x265 --tune psnr --temporal-layers 4 --no-scenecut --no-open-gop --ctu 64 --min-cu-size 8 --tu-inter-depth 4 --tu-intra-depth 3 --rect --amp --no-temporal-mvp --no-strong-intra-smoothing --no-signhide --no-weightp --no-weightb --hwax 1 --hwapi 0 --fps 60000/1000 --bitrate 6000 --input-res 2048x858 --hwpreset 3 --preset 4 --keyint 200 --min-keyint 200 --frames 50 --input-depth 8 --input -o
```

**Example AVC encode command:**

```sh
x264 --tune psnr --no-scenecut --hwax 1 --fps 60000/1000 --bitrate 6000 --input-res 2048x858 --hwpreset 0 --preset slow --keyint 200 --min-keyint 200 --frames 50 --input-depth 8 -o
```

**Explanation of options:**

- `hwax` - use acceleration (1)
- `hwapi` - API select 0=VA-API, 1=VPL (x265 only)
- `hwpreset` - 0 – default, 1, 2 or 3, higher values may correspond to different hardware profiles (e.g., faster or more energy-efficient)
- `preset` - the trade-off between encoding speed and compression efficiency: the slower the preset, the better the compression, but the longer the encoding time; ultrafast (0), superfast (1), veryfast (2), faster (3), fast (4), medium (5, default), slow (6), slower (7), veryslow (8), placebo (9, highest quality, slowest)
- `` - 8-bit YUV file
- `` - output file path

CRF (Constant Rate Factor) mode can also be used for quality-based encoding by setting the `--crf` parameter (e.g., `--crf 23`, `--crf 28`) instead of `--bitrate`. It cannot be combined with VBR (Variable Bitrate): only one of these modes may be active at a time.

## Quality Optimizations

### Lookahead

Lookahead allows the encoder to analyze future frames to make better decisions about bitrate allocation, resulting in higher quality video at the same bitrate at the expense of increased latency. Depending on the `compression_level` setting, lookahead might be enabled by default. To override the default setting, use the `--lookahead ` option.
### Improved Mini GOP Structure

Improved mini GOP structure enhances the group of pictures (GOP) structure, leading to better compression efficiency and video quality.

### Supported GOP Structures and Low Delay Mode

- By default, Intel® Media Transcode Accelerator uses a hierarchical GOP structure with 7 B-frames (mini GOP 8), optimized for quality. The `-bf` parameter does not affect the GOP structure; the encoder always uses the same (pyramidal) structure.
- For low latency use cases, enable Low Delay mode by adding `-low_delay 1` to your ffmpeg command.
- The GOP structure cannot be changed via the `-bf` parameter. Low delay mode is only available through the `-low_delay` parameter.
- When low delay mode is enabled:
  - Lookahead is automatically disabled.
  - The GOP structure changes: for HEVC, a mini GOP of 4 is used; for AVC, a mini GOP of 1 is used.
  - All references are to previous frames (no reordering).

**Example of a command using low latency:**

```sh
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -i -c:v hevc_vaapi -low_delay 1
```

### Other

> Please note that the Intel® Media Transcode Accelerator software stack includes an optimized (patched) version of FFmpeg, which is part of the Intel® Media Transcode Accelerator installation package. The optimized version of FFmpeg will display "Intel® Media Transcode Accelerator optimized version" in its banner.

## Limitations

- GStreamer is not supported, but support for it is planned for the future.
- In Low Delay mode, the AV1 codec is not supported, but support for it is planned.

For the complete list of limitations, please refer to the release notes.

## Telegraf support for Intel® Media Transcode Accelerator

The Intel® Media Transcode Accelerator (MTA) Input Plugin for Telegraf collects hardware utilization metrics from Intel MTA devices using the Linux sysfs telemetry interface.
It periodically reads low-level counters directly from the device, providing real-time insights into encoding, decoding, and image/media processing utilization. The plugin supports configurable telemetry intervals and averaging windows, allowing users to fine-tune the granularity and smoothing of reported metrics. It is designed for Linux systems and may require elevated permissions to access device telemetry data.

**For now, the plugin is available as a Telegraf patch only.**

### Patch installation

1. Install Go (required by Telegraf)

For RHEL:

```sh
dnf install -y golang
```

For Ubuntu:

```sh
apt-get install golang-go
```

2. Clone Telegraf repository

```sh
git clone https://github.com/influxdata/telegraf
```

3. Copy patch from Intel® Media Transcode Accelerator package

```sh
cp /src/telegraf_intel_mta_plugin.patch
```

4. Apply the patch

```sh
git apply telegraf_intel_mta_plugin.patch
```

5. Build Telegraf

```sh
make
```

6. Configure the plugin

- Edit your *telegraf.conf* and add the `[[inputs.intel_mta]]` section as described in the plugin README, or use the provided sample config.

7. Run Telegraf

```sh
./telegraf --config telegraf.conf
```

## Utilization Monitor

Utilization Monitor is a Python script that displays real-time utilization data for Intel® Media Transcode Accelerator devices. It can also save the utilization values over time as CSV, which allows utilization graphs to be created. The script works on a GNR-D system with properly set up Intel® QuickAssist Technology 2.2 and Intel® Media Transcode Accelerator drivers.

Usage:

```sh
./utilmon.py -i 1 -o ./util.csv
```

Arguments:

- `-i` - telemetry interval (0 - 1ms, 1 - 33ms, 2 - 268ms, 3 - 1s), optional
- `-o` - path to output csv file, optional

If run with no arguments, the default interval is 0 (1ms) and utilization data is not saved to csv.

This launches Utilization Monitor, which runs until closed with CTRL+C.
In a separate terminal instance, an encoding/decoding/transcoding task can be launched and Utilization Monitor should show real-time usage.

## Virtualization Support – Preview

- Support for virtualized environments using QEMU is currently in preview mode. While SR-IOV VF passthrough has been validated with select bitstreams, some limitations may still occur. Users are advised to evaluate virtualization use cases accordingly.

Note: Containerized deployments using Docker are fully supported and offer a stable alternative for most use cases.

## Virtualization usage guidelines

- For QEMU, WQM mode with "virtio-iommu-pci" (paravirtualized vIOMMU) or "intel-iommu" is supported.
- vIOMMU is necessary to allocate and map the large memory buffers needed by the HW and SW stack during operation.
- iommufd support on the host side is necessary to link the host IOMMU with the one in the VM.
- QEMU is sensitive to the order of device declaration.
- This combination of parameters (order is important) adds an iommufd object, creates a vIOMMU, and binds a VF to the VM with vIOMMU support:

```sh
-object iommufd,id=iommufd0 \
-device virtio-iommu-pci \
-device vfio-pci,host=0000:01:00.1,iommufd=iommufd0 \
```

or

```sh
-object iommufd,id=iommufd0 \
-device intel-iommu,caching-mode=on,x-scalable-mode=on,x-pasid-mode=on \
-device vfio-pci,host=0000:01:00.1,iommufd=iommufd0 \
```

- The (para)virtualized kernel needs to be built with CONFIG_VIRTIO_IOMMU=y or CONFIG_INTEL_IOMMU=y. This is distribution-specific and should be clarified in the distribution documentation.
- Sample scripts are included in the "samples/" directory.

## Host SRIOV mode usage guidelines

- Only WQM mode is supported for SRIOV host. To enable it, set the environment variables `VA_USE_SVM=0` and `VA_USE_WQM=1` (for example with `export VA_USE_WQM=1`).

## Legal/Disclaimers

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL(R) PRODUCTS.
NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel® products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Intel® may make changes to specifications and product descriptions at any time, without notice. (C) Intel® Corporation 2025 - Other names and brands may be claimed as the property of others.