Intel(R) QuickAssist Technology Software Readme
===============================================

Intel(R) QuickAssist Technology Software Package Version: QATSWPkgVersion
Intel(R) QuickAssist Technology Driver Version: 2.6.0.x

Contents
========

- License
- Details/Limitations of this Release
- Software Installation
- Intel QuickAssist Technology Compression library - QATzip
- QATzip - Introduction
- QATzip - Features
- QATzip - Hardware Requirements
- QATzip - Software Requirements
- QATzip - API Manual
- QATzip - Additional Information
- QATzip - Limitations
- QATzip software Application (Parcomp)
- Cryptography performance micro-benchmark Tool (CNGTest)
- Rate Limiting
- Rate Limiting - API Programmer's Guide
- Troubleshooting

License
=======

Refer to license.txt in this package for the Intel software license agreement before using this software.

In addition to Intel software, this package includes the following components:

1) Built-in sample code from Microsoft samples, licensed with MS-LPL (license.rtf)
   * License file included in this package and detailed at:
     http://code.msdn.microsoft.com/windowshardware/Windows-8-Driver-Samples-5e1aa62e
2) This software package uses the Intel(R) Storage Acceleration library (isa-l) to perform data compression using software.
   * The isa-l library is used under the terms of the license listed at:
     https://github.com/intel/isa-l/blob/master/LICENSE
   * This license file is included in this package as a file named isa-l_LICENSE.txt
3) This software package also uses the LZ4 software library to perform data compression using the LZ4 algorithm.
   * The LZ4 software library is used under the terms of the license listed at:
     https://github.com/lz4/lz4/blob/dev/lib/LICENSE
   * This license file is included in this package as a file named lz4_LICENSE.txt

Details/Limitations of this Release
===================================

* This software is only supported on Windows Server 2022.
* 32-bit applications are not supported on Windows Server 2022 when using the Intel QuickAssist Technology CNG provider.
* This software package supports virtualization (SR-IOV) using Hyper-V for Intel QuickAssist Technology devices.
  - Virtualization is only supported with Linux VMs running Ubuntu v18.04 or Ubuntu v20.04, over Hyper-V.
  - Virtualization is also supported on Windows VMs (Windows Server 2019 or newer) running over Hyper-V.
* Windows Remote Desktop is not supported if Intel QuickAssist Technology CNG providers are registered as default providers for cryptographic algorithms.

Software Installation
=====================

To install the Intel(R) QuickAssist Accelerator software:

- Navigate to the QuickAssist\Setup sub-folder (within the folder where the package was extracted)
- Run QatSetup.exe
- Follow all instructions as displayed by the installation program.
- For virtualization (SR-IOV) support without QAT Host services, select the option to install as a "virtualization host" in the installation program. A system restart is required at the end of the installation in order to fully enable virtualization support.
- When the driver is installed, check the Device Manager for four devices under 'Security Accelerator'.
  - Ensure that the devices are in 'Enabled' state and 'Hardware Ids' in the 'Details' tab shows 4940, 4942, 4944 or 4946
- In a Windows Virtual Machine (VM), after the driver is installed, check the Device Manager for accelerator devices under 'Security Accelerator'.
  - Ensure that the devices are in 'Enabled' state and 'Hardware Ids' in the 'Details' tab shows 4941, 4943, 4945 or 4947

To uninstall the Intel(R) QuickAssist Accelerator software:

- Open "Programs and Features" from the Control Panel application
- Click on the installed application "Intel(R) QuickAssist Technology 2.6.0.xxxx"
- Choose Uninstall
- Reboot

Release Notes
=============

For the latest information about this release, download the "Release Notes" from the same location where you downloaded this software package.

Getting Started Guide
=====================

For general information on how to use this software package, download the "Getting Started Guide" from the same location where you downloaded this software package.

Intel(R) QuickAssist Technology Compression library - QATzip
============================================================

* Other names and brands may be claimed as the property of others

The Intel(R) QuickAssist Technology Compression library, called QATzip, and its associated header file can be found in the "\Intel\Intel(R) QuickAssist Technology\Compression\Library" folder.

The components of the library are:

1) qatzip.h - Header file describing the QATzip API.
2) libqatzip.lib - Static library containing the implementation of the QATzip API
3) qatzip.lib - Import library to interface with qatzip.dll
4) qatzip.dll - DLL containing the implementation of the QATzip API - installed into the folder

The library and header file can be compiled and linked into any software component that requires Intel(R) QuickAssist Technology compression and/or decompression services.

More information on the QATzip API and other details about the library are available upon request.

QATzip - Introduction
=====================

QATzip is a user space library which builds on top of the Intel(R) QuickAssist Technology user space library, to provide extended accelerated compression and decompression services by offloading the actual compression and decompression request(s) to the Intel(R) QuickAssist Accelerator.

QATzip produces data using the standard Gzip* format (RFC 1952) with extended headers encapsulated with an additional 4 bytes to accelerate data decompression. QATzip is designed to take full advantage of the performance provided by Intel(R) QuickAssist Technology.

The currently supported formats include:

* Formats based on algorithms:

| Data Format | Parcomp Provider | Description |
| :---------------: | :---------------: | :------------------------------------------------------------: |
| `QZ_DEFLATE_4B` | qat | Data is in DEFLATE* with a 4 byte header |
| `QZ_DEFLATE_GZIP` | qatgzip | Data is in DEFLATE* wrapped by Gzip* header and footer |
| `QZ_DEFLATE_GZIP_EXT` | qatgzipext | Data is in DEFLATE* wrapped by Intel(R) QAT Gzip* extension header and footer |
| `QZ_DEFLATE_RAW` | N/A | Data is in raw DEFLATE* without any additional header. (Not supported since release 1.4.) |

* Available compression algorithms:

| Compression Algorithm | Parcomp Provider | Description |
| :---------------: | :-----------------------: | :------------------------------------------------------------: |
| `QZ_DEFLATE` | qat, qatgzip, qatgzipext | Data is in DEFLATE* |
| `QZ_MSZIP_COMPATIBLE` | qatms | MSZIP* format wrapped with a 4 byte header. Deprecated in release 1.4 |
| `QZ_ZLIB_COMPATIBLE` | qatzlib | zlib* format wrapped with a 4 byte header |
| `QZ_SW_XPRESS` | xpress | Software Compression using Xpress* algorithm wrapped with a 4 byte header |
| `QZ_SW_IGZIP` | igzip | Software Compression using DEFLATE* algorithm wrapped with a 4 byte header |
| `QZ_LZ4` | qatlz4 | Compression using LZ4* algorithm |

QATzip - Features
=================

* Acceleration of compression and decompression utilizing Intel(R) QuickAssist Technology, including a utility to compress and decompress files.
* Instance over-subscription, allowing a number of threads in the same process to seamlessly share a smaller number of hardware instances.
* Optional software fallback for both compression and decompression services. QATzip Microsoft(R) Windows(TM) may switch to software if there are insufficient system resources, including acceleration instances or memory. This feature allows for a common software stack between server platforms that have acceleration devices and non-accelerated platforms.
* Intel(R) QATzip 4 byte header: This header is composed of an unsigned integer [4 bytes] indicating the length of the compressed block, followed by the standard header for the data format used.
* Introduction of QATzip Gzip* format. This consists of the standard 10 byte Gzip* header, which is structured as follows:

  `| ID1(0x1F) 1B | ID2(0x8B) 1B | Compression Method (8 = DEFLATE*) 1B | Flags 1B | Modification Time 4B | Extra Flags 1B | Operating System 1B |`

* Introduction of QATzip Gzip* extended format. This consists of the standard 10 byte Gzip* header and follows RFC 1952 to extend the header by an additional 14 bytes. Below is an outline of the extended header's structure:

  `| Length of ext. header 2B | ID1('Q') 1B | ID2('Z') 1B | Length of subheader 2B | Intel(R) defined field 'Chunksize' 4B | Intel(R) defined field 'Blocksize' 4B |`

  Chunksize and Blocksize are unsigned integers, which store the original size of the data and the size of the compressed data block respectively.
* Introduction of Dynamically Linked Library for QATzip Microsoft(R) Windows(TM).
* Introduction of QATzip LZ4* format. This format compresses using multiple blocks, which are organized into a frame that is structured as follows:

  `| MagicNb(0x184D2204) 4B | FLG(0x60) 1B | BD(0x60) 1B | HC(0x51) 1B | Block 1 size 4B | Block 1 | ... | Block N size 4B | Block N | EndMark(0x00000000) 4B |`

* Introduction of Programmable Cyclic Redundancy Check (CRC). This allows custom CRC64 configurations to be set for a provided session with a user defined set of parameters as follows.

| Parameter | Description |
| :---------------: | :-----------------------: |
| `polynomial` | Polynomial used for CRC64 calculation. Default 0x42F0E1EBA9EA3693 |
| `initial_value` | Defaults to 0x0000000000000000 |
| `reflect_in` | Reflect bit order before CRC calculation. Default 0 |
| `reflect_out` | Reflect bit order after CRC calculation. Default 0 |
| `xor_out` | Defaults to 0x0000000000000000 |

  A custom programmable CRC64 configuration can only be set on a session after setup. A state machine tracks the state of the session to only allow programmable CRC64 configurations to be set in the `setup` state. The states of a session are defined as follows.

                   | Updating |
                     ^      |
                     |      ⌄
    | Created | -> | Setup | -> | Active | -> | Closing |

  In order to propagate a new CRC64 configuration the session must be restarted. Requests received while multithreading in the 'updating' state will be rejected. The CRC64 value is calculated for the src buffer in compression and dst buffer in decompression.
  The completed compression or decompression blocks are placed in the output buffer and the CRC64 checksum will be in the user provided buffer *crc.

QATzip - Hardware Requirements
==============================

This QATzip library supports compression and decompression offload to the following acceleration devices:

* Intel(R) 4XXX Accelerator

QATzip - API Manual
===================

Refer to the file `QATzip-man.pdf`, available at: https://github.com/intel/QATzip/blob/master/docs/QATzip-man.pdf

QATzip - Limitations
====================

* When passing data into the library for compression, pass in the complete payload rather than subdividing it, because the "last" bit is set on the final compressed block.
* This software is only supported on Microsoft(R) Windows(TM) Server 2022.
* The largest compressible file size is 999MB.
* Since release 1.4, "RAW" DEFLATE* (QZ_DEFLATE_RAW) is not a supported data format.
* Software fallback in QATzip is not applicable for the following formats:
  * `QZ_ZLIB_COMPATIBLE`
  * `QZ_MSZIP_COMPATIBLE`
  * `QZ_SW_XPRESS`
  * `QZ_SW_IGZIP`
* Gzip* decompression is currently only supported using software offload as Gzip* does not contain a blocksize value.

QATzip - Known Issues
=====================

* When decompressing a file, it is important to match the chunk size/hw_buff_sz to the value that was specified to compress the data. If no value was specified, 64KB is the default. This value is used during decompression to provision appropriate space between the inflated blocks to minimise the number of buffer copies during parallel driver decompression. Using an incorrect value will result in inflated blocks which overlap or blocks spaced too far apart. This issue applies to all formats with the exception of the QZ_DEFLATE_GZIP compressed format. The primary mitigation for this issue is to record the chunk size/hw_buff_sz used during compression.
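
One way to apply the mitigation above is to store the chunk size next to the compressed file and read it back when decompressing. The following Python sketch is illustrative only and is not part of this package: the sidecar file name (`<compressed file>.meta.json`) and all helper names are hypothetical. The only Parcomp options it relies on are `-p`, `-d`, `-c`, `-i` and `-o` as listed in the Parcomp command-line options, and it only builds the command line rather than running it.

```python
import json
from pathlib import Path

# Default chunk size in KB when none was recorded (64KB per the note above).
DEFAULT_CHUNK_KB = 64


def sidecar_path(compressed_path):
    """Hypothetical naming convention: <compressed file>.meta.json."""
    p = Path(compressed_path)
    return p.with_name(p.name + ".meta.json")


def record_chunk_size(compressed_path, chunk_kb=DEFAULT_CHUNK_KB):
    """Write the chunk size used for compression to a small sidecar file."""
    sidecar_path(compressed_path).write_text(json.dumps({"chunk_kb": chunk_kb}))


def load_chunk_size(compressed_path):
    """Return the recorded chunk size, or the 64KB default if none exists."""
    meta = sidecar_path(compressed_path)
    if not meta.exists():
        return DEFAULT_CHUNK_KB
    return int(json.loads(meta.read_text())["chunk_kb"])


def build_decompress_command(compressed_path, output_path, provider="qat"):
    """Build (but do not run) a parcomp decompression command line that
    passes the recorded chunk size via -c."""
    chunk_kb = load_chunk_size(compressed_path)
    return [
        "parcomp", "-p", provider, "-d",
        "-c", str(chunk_kb),
        "-i", str(compressed_path),
        "-o", str(output_path),
    ]
```

A caller would invoke `record_chunk_size()` immediately after compressing with a given `-c` value, and later pass the list returned by `build_decompress_command()` to whatever process launcher it uses.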

QATzip software Application (Parcomp)
=====================================

This package comes with a tool called 'parcomp' to test the performance of the Intel(R) QuickAssist Technology compression accelerator. Parcomp has been built using the QATzip API and library, and can be found in the "\Intel\Intel(R) QuickAssist Technology\Compression" folder. It can be used to measure and report the rate at which compression and decompression operations are performed using the accelerator, as well as those operations performed by the default services offered by the system OS.

Note:- When the Parcomp application is running in Windows Virtual Machines (VMs), the memory (RAM) allocated to the VM should be of sufficient size. For example, when using Parcomp to compress a file of 999MB with 2 threads, the memory allocated to the VM should be at least 4GB.

To run Parcomp:
---------------

1) Launch Command Prompt (cmd.exe) as Administrator.
2) Navigate to the following sub-folder where the software package was installed:
   "\Intel\Intel(R) QuickAssist Technology\Compression"
3) Run Parcomp, for example:
   For compression:
       parcomp -p qat -i srcFilename -o dstFilename
   For decompression:
       parcomp -p qat -d -i srcFilename -o dstFilename
4) To see a list of supported command-line options, run the application without any command-line options (or use the -h option)

Command-line options:
---------------------

Usage: parcomp.exe -i srcFilename -o dstFilename [options]

Required options:

    -i srcFilename          Input (source) filename
    -o dstFilename          Output (destination) filename

Optional options can be:

    -b [cold|warm]          Use cold buffer or not.
    -p providerName         Specifies the provider (implementation). Options include:
                            'qat'        - QuickAssist accelerated DEFLATE* algorithm
                            'qatzlib'    - QuickAssist accelerated DEFLATE* algorithm with zlib* header.
                            'qatms'      - Deprecated QuickAssist accelerated DEFLATE* algorithm with MSZIP* header.
                                           Only decompression supported.
                            'qatgzip'    - QuickAssist accelerated DEFLATE* algorithm with Gzip* header.
                                           Note: With qatgzip provider, options -k -t -Q cannot be used in decompression direction.
                                           The operation will be executed in a single thread with SW implementation.
                                           Parcomp application cannot process Gzip* source buffers bigger than 1GB and the
                                           number of iterations is 1.
                            'qatgzipext' - QuickAssist accelerated DEFLATE* algorithm with Gzip* extended header.
                            'xpress'     - Software based Xpress* algorithm.
                            'igzip'      - Software based DEFLATE* algorithm using igzip dll.
                            'qatlz4'     - QuickAssist accelerated algorithm with lz4* header.
    -c chunkSizeInKB        Chunk size, in KB. Default is 64. Files will be divided into chunks of this size
                            (the last chunk may be smaller), and each chunk is compressed separately.
                            State is not maintained between chunks. The chunk size used for decompression
                            must match the value used for compression.
    -crc bitWidth           CRC bitwidth. Valid value is 64.
    -pcrc index             CRC64 configuration to use. Valid values are 0 (ECMA-182) or 1 (Rocksoft). Default is 0.
    -l compressionLevel     Compression level. Default is 1. compressionLevel can range from 1 - 4.
                            Lower values give a lower compression ratio in less time.
    -d                      Decompress the input file. Default is to compress.
    -v (or -g)              Verbose (or debug)
    -x numLines             Print a summary of the inputs and outputs in a comma-separated variable (CSV) format
                            for easy importing into a spreadsheet. Specify numLines as 1 for data only,
                            or 2 to also include a header summary
    -t numThreads           Creates specified number of threads, splits input file into numThreads (near-)equal
                            chunks, and performs the operation in each thread. If specifying multiple threads,
                            -Q is also required.
    -f cpuFreqInMHz         Specifies the CPU frequency in MHz. If not specified, this will be measured
                            (takes approx. 1 second)
    -n numIterations        Specifies the number of iterations (allows you to run the same operation
                            numIterations times). Default is 1.
    -Q                      Test independent threads within one process using one session. Uses multiple copies
                            of same input file, outputs one output file per thread.
                            Must be used with -t and threadcount of 1 or more.
    -k blockSizeInKB        Separate the source data into several blocks of size specified by blockSizeInKB.
                            -k uses -Q by default.
    -h                      Print this help message.

The following are applicable for the providers qat (all options) and qatlz4 (-FB, -SW and -FT) only:

    -j maxOutstandingJobs   Maximum number of jobs (requests) that may be outstanding at any one time.
                            Default is 30.
    -s                      Static compression. Default is dynamic.
    -D                      Dynamic compression. This is default.
    -FB                     Enable igzip and LZ4 software fallback.
    -FT thresSizeInKB       Threshold value for fallback. If the offload size is less than the threshold,
                            software provider is used.

Sample Results
--------------

Here are a few examples of the results obtained by running Parcomp for compression and decompression:

Note: These examples are for illustrative purposes only.

1) Using Intel(R) QuickAssist Technology accelerator for compression and decompression:

    parcomp -p qat -i largetext -o largetext.compressed
    ---------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext (1000000000 Bytes)
    Writing output file: C:\CompressionFiles\largetext.compressed (489318805 Bytes)
    Deflation Ratio (%age)     : 48.9
    Thruput (uncompressed Mbps): 12312.576
    Time (ms)                  : 571.054

    Note:- All times exclude file I/O and are measured around the call to the qzCompress() API only.

    parcomp -p qat -d -i largetext.compressed -o largetext.original
    ---------------------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext.compressed (489318806 Bytes)
    Writing output file: C:\CompressionFiles\largetext.original (1000000000 Bytes)
    Inflation Ratio (%age)     : 204.4
    Thruput (uncompressed Mbps): 23407.884
    Time (ms)                  : 280.976

    Note:- All times exclude file I/O and are measured around the call to the qzDecompress() API only.

2) Using Gzip* extended header with Intel(R) QuickAssist Technology accelerator for compression and decompression:

    parcomp -p qatgzipext -i largetext -o largetext.compressed
    ----------------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2020, Intel(R) Corporation

    Reading input file: largetext.txt (1000000000 Bytes)
    Writing output file: largetext.compressed (396497299 Bytes)
    Deflation Ratio (%age)     : 39.6
    Thruput (uncompressed Mbps): 14677.658
    Time (ms)                  : 356.475

    Note:- All times exclude file I/O and are measured around the call to the qzCompress() API only.

    parcomp -p qatgzipext -d -i largetext.compressed -o largetext.original
    ----------------------------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2020, Intel(R) Corporation

    Reading input file: largetext.compressed (396497299 Bytes)
    Writing output file: largetext.original (1000000000 Bytes)
    Inflation Ratio (%age)     : 252.2
    Thruput (uncompressed Mbps): 24304.489
    Time (ms)                  : 188.323

    Note:- All times exclude file I/O and are measured around the call to the qzDecompress() API only.

3) Using Windows(TM) API (Xpress*) for compression and decompression:

    parcomp -p xpress -i largetext -o largetext.compressed
    ------------------------------------------------------
    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext (1000000000 Bytes)
    Writing output file: C:\CompressionFiles\largetext.compressed (573799425 Bytes)
    Deflation Ratio (%age)     : 57.4
    Thruput (uncompressed Mbps): 1070.143
    Time (ms)                  : 7283.892

    Note:- All times exclude file I/O and are measured around the call to the qzCompress() API only.

    parcomp -p xpress -d -i largetext.compressed -o largetext.original
    ------------------------------------------------------------------
    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext.compressed (573799425 Bytes)
    Writing output file: C:\CompressionFiles\largetext.original (1000000000 Bytes)
    Inflation Ratio (%age)     : 174.3
    Thruput (uncompressed Mbps): 3129.774
    Time (ms)                  : 2408.823

    Note:- All times exclude file I/O and are measured around the call to the qzDecompress() API only.

4) For performance test of QAT compression and decompression:

    **NOTE**: Using -Q in the compression command produces a number of identical output files, each with a numeric suffix appended to its name, starting with 0. This suffix is also required in the input filename of the decompression command. Using -k implicitly enables the -Q option.

    parcomp.exe -p qat -Q -t 6 -c 64 -k 4096 -j 60 -n 200 -i largetext -o largetext.compressed
    ------------------------------------------------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext (1000000000 Bytes)
    All threads completed as Expected.
    Deflation Ratio (%age)     : 47.9
    Thruput (uncompressed Mbps): 63967.443
    Processing Block size      : 4096 KB
    Block count                : 239
    Time/block (ms)            : 3.140

    Note:- All times exclude file I/O and are measured around the call to the qzCompress() API only.

    parcomp.exe -p qat -d -Q -t 6 -c 64 -k 4096 -j 60 -n 200 -i largetext.compressed0 -o largetext.original
    ------------------------------------------------------------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext.compressed0 (489318782 Bytes)
    All threads completed as Expected.
    Inflation Ratio (%age)     : 204.4
    Thruput (uncompressed Mbps): 131133.883
    Processing Block size      : 4096 KB
    Block count                : 477
    Time/block (ms)            : 1.535

    Note:- All times exclude file I/O and are measured around the call to the qzDecompress() API only.

Cryptography performance micro-benchmark Tool (cngtest)
=======================================================

The software driver and providers come with a micro-benchmark tool (cngtest) to test the performance of various cryptography algorithms.
This tool can be used to measure and report the rate at which crypto algorithm operations (like encrypt, decrypt, signhash, verifysignature, finalizekeypair, secretagreement, etc.) are performed using the Windows(TM) Cryptography Next-Generation (CNG) framework, with either of two providers:

- the default software provider (provided by Microsoft and which is part of the OS)
- the provider based on Intel(R) QuickAssist Technology.

Using the cngtest tool, it is possible to quickly see the substantial CPU savings that can be gained by offloading public key cryptography (for example, the RSA 2048 decrypt operations used by a web server during SSL handshakes) from the CPU to the hardware accelerator.

You can use the batch file Perf_User.bat, installed in the following location:

    \Intel\Intel(R) QuickAssist Technology\Crypto\Samples\bin

to obtain results using cngtest for user mode tests. Kernel-mode tests are no longer supported. The batch file contains cngtest commands to perform various cryptographic operations using different algorithms and parameters.

To use the batch file, you will need to open a command-prompt window with Administrator privileges.

1. Navigate to the \Intel\Intel(R) QuickAssist Technology\Crypto\Samples\bin folder
2. For user mode performance of RSA, DSA, ECDSA, DH, ECDH algorithms, run: Perf_User.bat

How to run cngtest independently
--------------------------------

The cngtest application (cngtest.exe) is located in the "\Intel\Intel(R) QuickAssist Technology\Crypto\Samples\bin" folder.

- Launch the Command Prompt (cmd.exe) window as Administrator.
- Navigate to the following sub-folder: \Intel\Intel(R) QuickAssist Technology\Crypto\Samples\bin

This micro-benchmark tool is a command-line utility which allows you to specify, via command line parameters, the provider (HW vs. SW), the algorithm, the number of operations to perform, the number of software threads across which to spread the requests, the CPU cores to which those software threads should be affinitized, and the key length.

CNGTest Flags
-------------

The micro-benchmark tool includes some brief usage help, which can be seen by running:

    cngtest -help

cngtest is a "microbenchmark" which measures and reports the rate (measured in ops/second) at which encrypt and decrypt operations are performed using the CNG framework, with one of two providers: the default (software) provider, or a provider based on Intel(R) QuickAssist Technology (this requires the presence on the platform of a hardware accelerator).

Usage: cngtest [flags]

The flags are as follows:

-provider={sw|qa}
    Specifies the provider to use. The default value is "sw", meaning use the default (software) provider. The only other legal value is "qa", which means to use the QuickAssist provider. Note that the remaining parameters have different defaults depending on the provider, as indicated below:

-algo=
    Specifies the algorithm to test. The supported values are:

    rsa:   This measures the rate at which RSA decrypt operations (using CRT), and encrypt operations, are performed.
           Additional flags that can be provided in this case include:
           -keyLength=
               Specifies the key size (modulus size) for the operation. The default is 2048. Other legal values include 512, 1024, 1536, 3072 and 4096. Any other value will result in the default 2048 value being used.

    ecdsa: This measures the performance of the ECDSA algorithm.
           An additional flag that can be provided in this case is:
           -ecccurve=
               Specifies the ECC curve name used for the ECDSA and ECDH algorithms. The default curve name is nistP256; for other legal values, refer to "CNG Named Elliptic Curves" on MSDN.

    ecdh:  This measures the performance of the ECDH algorithm.
           An additional flag that can be provided in this case is:
           -ecccurve=
               Specifies the ECC curve name used for the ECDSA and ECDH algorithms. The default curve name is nistP256; for other legal values, refer to "CNG Named Elliptic Curves" on MSDN.

    dsa:   This measures the performance of the DSA algorithm (the DSA algorithm is not supported in kernel mode; if kernel mode is specified, the user mode test is run instead).
           Additional flags that can be provided in this case include:
           -keyLength=
               Specifies the key size (modulus size) for the operation. The default is 2048. Other legal values include 1024 and 3072. Any other value will result in the default 2048 value being used.

    dh:    This measures the performance of the DH algorithm.
           Additional flags that can be provided in this case include:
           -keyLength=
               Specifies the key size (modulus size) for the operation. The default is 2048. Other legal values include 768, 1024, 1536, 3072 and 4096. Any other value will result in the default 2048 value being used.

-padding=
    RSA algorithm only, ignored for other algorithms.
    pkcs1: PKCS1 padding mode.
    oaep : OAEP padding mode.
    pss  : PSS padding mode.

-numThreads=
    Specifies the number of software threads to spawn. The default value is 2 (sw) or 150 (qa). Note that the number of outstanding requests required to "max out" the hardware is approximately 150. Note too that specifying numThreads=n is equivalent to specifying -minThreads=n and -maxThreads=n.

-numIter=
    Specifies the number of iterations to perform. The default value is 10000 (sw) or 100000 (qa).

-affinityMask=
    Specifies the CPU/core affinity mask for the threads. The affinity mask is interpreted as a bitmask, with each bit indicating a CPU core to which a thread should be affinitized, where core number c is represented as 2^c. Note that the mask is specified as a hexadecimal number, and must begin with the prefix "0x". For example, to run the software threads on cores 2 and 3, specify -affinityMask=0x0C (binary 00001100).
    Software threads are assigned to the cores in a round-robin fashion, with the first software thread being assigned to the lowest numbered core, etc.

-minThreads=
    Specifies the minimum number of software threads. The benchmark will be performed using each number of software threads from minThreads to maxThreads. In this case, the value of -numThreads is ignored.

-maxThreads=
    Specifies the maximum number of software threads. The benchmark will be performed using each number of software threads from minThreads to maxThreads. In this case, the value of -numThreads is ignored. If minThreads is larger than maxThreads, minThreads and maxThreads are both set to the default value of 1. The maxThreads limit is 150.

-check
    Specifies that the software and hardware providers should both be executed exactly once each, and the results compared. This is a purely functional check. All other parameters are ignored in this case.

-encrypt
    Measure performance of only the encryption operation for the specified algorithm test. By default, performance is measured over both encryption and decryption operations.

-decrypt
    Measure performance of only the decryption operation for the specified algorithm test.

-derivekey
    Measure performance of only the derive key operation for the specified algorithm (ECDH or DH) test. This option is not supported when using the software provider in kernel mode.

-secretderive
    Measure performance of the secretagreement + derive key operation for the specified algorithm (ECDH or DH) test. This option is not supported when using the software provider in kernel mode.

-finalizesecret
    Measure performance of the generate key + secretagreement operation for the specified algorithm (ECDH or DH) test.

-finalizekey
    Measure performance of only the finalize key operation for the specified algorithm (ECDH or DH) test.

-secretagreement
    Measure performance of only the secretagreement operation for the specified algorithm (ECDH or DH) test.

-sign
    Measure performance of only the sign hash operation for the specified algorithm (ECDSA or DSA) test.

-verify
    Measure performance of only the verify signature operation for the specified algorithm (ECDSA or DSA) test.

-generatekey
    Measure performance of generate key. This parameter is ignored if algo=DSA is specified.

-debug
    Print debug messages.

Cngtest Sample Results
----------------------

Here is example output of cngtest results for asymmetric cryptography operations, obtained using the Intel(R) QuickAssist Accelerator Software script (Perf_User.bat) to test user-mode performance:

This script must be run in a Command-prompt/Shell window with Administrator privileges

(1) Test 100000 RSA decrypt operations using a key size of 2048:

    Running in user mode...
    Time [ms]                         : 1029
    Number of iterations              : 100000
    RSA Decrypt Ops/s                 : 97181.73
    CPU core utilization percentage   : 9%
    CPU overall utilizaion percentage : 4%

(2) Test 100000 DSA signhash/verifysignature operations using a key size of 2048:

    Running in user mode...
    Time [ms]                         : 563
    Number of iterations              : 100000
    DSA Sign Hash Ops/s               : 177619.89
    CPU core utilization percentage   : 0%
    CPU overall utilizaion percentage : 12%

    Time [ms]                         : 1044
    Number of iterations              : 100000
    DSA Verify Signature Ops/s        : 95785.44
    CPU core utilization percentage   : 0%
    CPU overall utilizaion percentage : 3%

(3) Test 100000 ECDSA signhash/verifysignature operations using the P-256 curve:

    Running in user mode...
    Time [ms]                         : 673
    Number of iterations              : 100000
    ECDSA Sign Hash Ops/s             : 148588.41
    CPU core utilization percentage   : 1%
    CPU overall utilizaion percentage : 8%

    Time [ms]                         : 1371
    Number of iterations              : 100000
    ECDSA Verify Signature Ops/s      : 72939.46
    CPU core utilization percentage   : 0%
    CPU overall utilizaion percentage : 2%

(4) Test 100000 DH finalizekeypair/secretagreement operations using a key size of 2048:

    Running in user mode...
    Time [ms]                         : 2950
    Number of iterations              : 100000
    DH Stage 1 Ops/s                  : 33898.31
    CPU core utilization percentage   : 0%
    CPU overall utilizaion percentage : 6%

    Time [ms]                         : 3708
    Number of iterations              : 100000
    DH Stage 2 Ops/s                  : 26968.72
    CPU core utilization percentage   : 0%
    CPU overall utilizaion percentage : 1%

(5) Test 100000 ECDH finalizekeypair/secretagreement operations using the P-256 curve:

    Running in user mode...
    Time [ms]                         : 1724
    Number of iterations              : 100000
    ECDH Stage 1 Ops/s                : 58004.64
    CPU core utilization percentage   : 0%
    CPU overall utilizaion percentage : 11%

    Time [ms]                         : 914
    Number of iterations              : 100000
    ECDH Stage 2 Ops/s                : 109409.19
    CPU core utilization percentage   : 0%
    CPU overall utilizaion percentage : 11%

Rate Limiting
=============

Rate limiting is a solution designed in the Intel(R) QuickAssist Technology accelerator software to enforce Service Level Agreements (SLA), which allocates a specified amount of acceleration capacity for a specified service, including symmetric cryptography (SYM), PKE (ASYM) and compression (DC), at a ring-pair or queue-pair (QP) granularity.

The rate limiting solution provides the following features:

1. Virtualization technology agnostic SLA management API
2. Ability to query rate limiting information for QAT instances in the Guest/Host
3. Ability to query rate limiting information for QAT devices in the Host
4. Ability to configure rate limiting SLA for service Instances based on QP

Rate Limiting - API Programmer's Guide
======================================

To configure rate limits on your Intel(R) QuickAssist Technology device, refer to the Rate Limiting API Guide on Intel's QuickAssist Technology website.