Intel(R) QuickAssist Technology Software Readme
===============================================

Intel(R) QuickAssist Technology Software Package Version: QAT1.7.W.1.5.0-0007
Intel(R) QuickAssist Technology Driver Version: 1.70.15.4

Contents
========

- License
- Details/Limitations of this Release
- Software Installation
- Intel QuickAssist Technology Compression library - QATzip
    - QATzip - Introduction
    - QATZip - Features
    - QATzip - Hardware Requirements
    - QATzip - Software Requirements
    - QATzip - API Manual
    - QATzip - Additional Information
    - QATzip - Limitations
- QATzip software Application (Parcomp)
- Cryptography performance micro-benchmark Tool (CNGTest)
- Troubleshooting

License
=======

Refer to license.txt in this package for the Intel software license agreement before using this software.

In addition to Intel software, this package includes the following components:

1) Built in sample code from Microsoft samples, licenced with MS-LPL (license.rtf)
    * License file included in this package and detailed at:
      http://code.msdn.microsoft.com/windowshardware/Windows-8-Driver-Samples-5e1aa62e

Details/Limitations of this Release
===================================

* This software is only supported on the Windows Server 2019 and Windows Server 2016 operating systems.
* This software has been certified for use on Windows Server 2019 and Windows Server 2016.
* 32-bit applications are not supported on Windows Server 2019/2016 when using the Intel QuickAssist Technology CNG provider.
* This software package supports virtualization (SR-IOV) using Hyper-V for Intel QuickAssist Technology devices.
    - Virtualization (SR-IOV) is only supported on C62X series Intel QuickAssist Technology devices.
    - Virtualization is only supported with Linux VMs running Ubuntu v18.04 or Ubuntu v20.04, over Hyper-V.
    - Software fallback support for encryption on virtualized devices is only supported on Windows Server 20H1 and newer service channel releases.
    - Virtualization is not supported on Windows VMs running over Hyper-V.
* IIS web server on Windows Server 2019/2016 cannot offload SSL encryption to the Intel QuickAssist Technology CNG provider.
    - In Windows Server 2019/2016, the default SSL settings use ECDH with ECC25519 as the ECC Curve order. Since ECC25519 is not supported by the Intel QuickAssist Technology device, IIS web server cannot offload SSL encryption functions using the default SSL Configuration Settings.
* Windows Remote Desktop is not supported if Intel QuickAssist Technology CNG providers are registered as default providers for cryptographic algorithms.

Software Installation
=====================

To install the Intel(R) QuickAssist Accelerator software:

- Navigate to the QuickAssist\Setup sub-folder (within the folder where the package was extracted)
- Run Setup.exe
- Follow all instructions as displayed by the installation program.
- For virtualization (SR-IOV) support without QAT Host services, select the option to install as a "virtualization host" in the installation program. A system restart is required at the end of the installation in order to fully enable virtualization support.
- When the driver is installed, check the Device Manager for three devices under 'Security Accelerator'.
- Ensure that the devices are in 'Enabled' state and 'Hardware Ids' in the 'Details' tab shows 37C8.

To uninstall the Intel(R) QuickAssist Accelerator software:

- Open "Programs and Features" from the Control Panel application
- Click on the installed application "Intel(R) QuickAssist Technology 1.5.0.0007"
- Choose Uninstall
- Reboot

Intel(R) QuickAssist Technology Compression library - QATzip
============================================================

* Other names and brands may be claimed as the property of others

The Intel(R) QuickAssist Technology Compression library called QATzip and its associated header file can be found in the "\Intel\Intel(R) QuickAssist Technology\Compression\Library" folder.
The components of the library are:

1) qatzip.h - Header file describing the QATzip API.
2) libqatzip.lib - Static library containing the implementation of the QATzip API
3) qatzip.lib - Import library to interface with qatzip.dll
4) qatzip.dll - DLL containing the implementation of the QATzip API - installed into the folder

The library and header file can be compiled and linked into any software component that requires Intel(R) QuickAssist Technology compression and/or decompression services.
The library also includes support for compatibility with zlib* software compression and decompression.

Compression and decompression using QZ_SW_MSZIP algorithm has been removed in release 1.4.
Compression using QZ_MSZIP_COMPATIBLE algorithm has been disabled in release 1.4. Only decompression is available in release 1.4 for backward compatibility.

More information on the QATzip API and other details about the library are available upon request.

QATzip - Introduction
=====================

QATzip is a user space library which builds on top of the Intel(R) QuickAssist Technology user space library, to provide extended accelerated compression and decompression services by offloading the actual compression and decompression request(s) to the Intel(R) QuickAssist Accelerator.

QATzip produces data using the standard Gzip* format (RFC 1952) with extended headers encapsulated with an additional 4 bytes to accelerate data decompression.

QATzip is designed to take full advantage of the performance provided by Intel(R) QuickAssist Technology.
The currently supported formats include:

* Formats based on algorithms:

|Data Format|Parcomp Provider|Description|
| :---------------: | :---------------: | :------------------------------------------------------------: |
| `QZ_DEFLATE_4B` | qat |Data is in DEFLATE* with a 4 byte header|
| `QZ_DEFLATE_GZIP` | qatgzip |Data is in DEFLATE* wrapped by Gzip* header and footer|
| `QZ_DEFLATE_GZIP_EXT` | qatgzipext |Data is in DEFLATE* wrapped by Intel(R) QAT Gzip* extension header and footer|
| `QZ_DEFLATE_RAW` | N/A |Data is in raw DEFLATE* without any additional header. (Not supported in release 1.4.)|

* Available compression algorithms:

|Compression Algorithm|Parcomp Provider|Description|
| :---------------: | :-----------------------: | :------------------------------------------------------------: |
| `QZ_DEFLATE` | qat, qatgzip, qatgzipext |Data is in DEFLATE*|
| `QZ_MSZIP_COMPATIBLE` | qatms |MSZIP* format wrapped with a 4 byte header. Deprecated in release 1.4|
| `QZ_ZLIB_COMPATIBLE` | qatzlib |zlib* format wrapped with a 4 byte header|
| `QZ_SW_XPRESS` | xpress |Software Compression using Xpress* algorithm wrapped with a 4 byte header|
| `QZ_SW_IGZIP` | igzip |Software Compression using DEFLATE* algorithm wrapped with a 4 byte header|

QATZip - Features
=================

* Acceleration of compression and decompression utilizing Intel(R) QuickAssist Technology, including a utility to compress and decompress files.
* Instance over-subscription, allowing a number of threads in the same process to seamlessly share a smaller number of hardware instances.
* Optional software fallback for both compression and decompression services. QATzip Microsoft(R) Windows(TM) may switch to software if there are insufficient system resources, including acceleration instances or memory. This feature allows for a common software stack between server platforms that have acceleration devices and non-accelerated platforms.
* Intel(R) QATzip 4 byte header: This header is composed of an unsigned integer [4 bytes] indicating the length of the compressed block followed by the standard header for the data format used. The following formats are wrapped with the `QZ_DEFLATE_4B` 4 byte header:
    * `QZ_DEFLATE`
    * `QZ_MSZIP_COMPATIBLE`
    * `QZ_ZLIB_COMPATIBLE`
    * `QZ_SW_XPRESS`
    * `QZ_SW_IGZIP`

  **NOTE**: All formats which are wrapped with the 4 byte header (as listed above) do not provide a checksum of the uncompressed data, hence decompressed data can't be validated against the original payload after decompression.
  The `QZ_DEFLATE_4B` data format is recommended for high performance processing in use cases that can tolerate faults in decompressed blocks.

* Introduction of QATzip Gzip* format. This consists of 10 bytes as the standard Gzip* data format, which is structured as follows:

  `| ID1(0x1F) 1B | ID2(0x8B) 1B | Compression Method (8 = DEFLATE*) 1B | Flags 1B | Modification Time 4B | Extra Flags 1B | Operating System 1B |`

* Introduction of QATzip Gzip* extended format. This consists of the standard 10 byte Gzip* header and follows RFC 1952 to extend the header by an additional 14 bytes. Below is an outline of the extended header's structure:

  `| Length of ext. header 2B | ID1('Q') 1B | ID2('Z') 1B | Length of subheader 2B | Intel(R) defined field 'Chunksize' 4B | Intel(R) defined field 'Blocksize' 4B |`

  Chunksize and Blocksize are unsigned integers, which store the original size of the data and the size of the compressed data block respectively.

* Introduction of Dynamically Linked Library for QATzip Microsoft(R) Windows(TM).
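
The Gzip* extended header layout outlined above can be illustrated with a short parsing sketch. This is not part of the QATzip API: the function name `parse_qz_gzip_ext_header` is hypothetical, the FEXTRA flag bit (0x04) and the little-endian length fields come from RFC 1952, and little-endian byte order for the Intel(R) defined 'Chunksize' and 'Blocksize' fields is an assumption of this sketch.

```python
import struct

GZIP_FIXED_HEADER_LEN = 10  # standard Gzip* header size stated above
QZ_EXTRA_LEN = 14           # size of the extension stated above
FEXTRA = 0x04               # RFC 1952 flag bit: an extra field follows


def parse_qz_gzip_ext_header(buf: bytes) -> dict:
    """Parse the 10 byte Gzip header plus the 14 byte 'QZ' extension.

    Hypothetical helper for illustration; not a QATzip function.
    """
    if len(buf) < GZIP_FIXED_HEADER_LEN + QZ_EXTRA_LEN:
        raise ValueError("buffer too short for a QATzip extended header")

    # Standard header: ID1, ID2, method, flags, mtime (4B), extra flags, OS.
    id1, id2, method, flags, mtime, xfl, os_id = struct.unpack_from(
        "<BBBBIBB", buf, 0
    )
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a Gzip stream")
    if method != 8:
        raise ValueError("compression method is not DEFLATE")
    if not flags & FEXTRA:
        raise ValueError("Gzip header carries no extra field")

    # Extension: length (2B), 'Q', 'Z', subheader length (2B),
    # Chunksize (4B), Blocksize (4B).
    # ASSUMPTION: the two 4 byte fields are little-endian.
    xlen, si1, si2, sublen, chunksize, blocksize = struct.unpack_from(
        "<HccHII", buf, GZIP_FIXED_HEADER_LEN
    )
    if (si1, si2) != (b"Q", b"Z"):
        raise ValueError("extra field is not a 'QZ' subheader")

    return {
        "ext_header_len": xlen,
        "subheader_len": sublen,
        "chunksize": chunksize,  # original size of the data
        "blocksize": blocksize,  # size of the compressed data block
    }
```

A decompressor could use the returned `blocksize` to locate the next block without inflating the current one, which is the reason the extension exists; the exact field semantics should be confirmed against the QATzip API manual.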

QATzip - Hardware Requirements
==============================

This QATzip library supports compression and decompression offload to the following acceleration devices:

* Intel(R) C62X Series Chipset
* Intel(R) QuickAssist Adapter 8970
* Intel(R) C3XXX Series Chipset

QATzip - Software Requirements
==============================

This release was validated on the following:

* QATzip has been tested with the latest Intel(R) QuickAssist Acceleration Driver. Please download the QAT driver from the link https://01.org/intel-quickassist-technology
* QATzip has been tested by Intel(R) on Windows(TM) Server 2019
* igzip and software fallback have a dependency on isa-l DLL module installation. The isa-l DLL module can be found and built from the link https://github.com/intel/isa-l

QATzip - API Manual
===================

Please refer to file `QATzip-man.pdf` found at this link https://github.com/intel/QATzip/blob/master/docs/QATzip-man.pdf

QATzip - Additional Information
===============================

* The compression level in QATzip could be mapped to standard zlib* as below:
    * QATzip level 1 - 4, similar to zlib* level 1 - 4.
* The current default supported data format is DEFLATE* with a 4 byte header (QZ_DEFLATE_4B) and it is backward compatible with the output produced in release 1.3 (and older versions) when using QZ_DEFLATE_RAW (former default).
* Compression and decompression using QZ_SW_MSZIP algorithm has been removed in release 1.4.
* Compression using QZ_MSZIP_COMPATIBLE algorithm has been disabled in release 1.4. Only decompression is available in release 1.4 for backward compatibility.

QATzip - Limitations
====================

* When passing data for compression into the library the complete payload for compression should be passed in rather than sub divided due to the "last" bit being set on the final compressed block.
* For MSZIP* decompression, chunk size of 32KB should be provided in line with MSZIP* format for optimisation purposes.
* This software is only supported on the Microsoft(R) Windows(TM) Server 2019 and Microsoft(R) Windows(TM) Server 2016 operating systems.
* This software has been certified for use on Microsoft(R) Windows(TM) Server 2019 and Microsoft(R) Windows(TM) Server 2016.
* Largest compressible file size limitation of 999MB.
* QATzip level 5 - 8 mapped to QATzip level 4.
* In release 1.4, "RAW" DEFLATE* (QZ_DEFLATE_RAW) is not a supported data format.
* Software fallback in QATzip is not applicable for the following formats:
    * `QZ_ZLIB_COMPATIBLE`
    * `QZ_MSZIP_COMPATIBLE`
    * `QZ_SW_XPRESS`
    * `QZ_SW_IGZIP`
* Gzip* decompression is currently only supported using software offload as Gzip* does not contain a blocksize value.

QATzip - Known Issues
=====================

* When decompressing a file it is important to match the chunk size/hw_buff_sz to the value that was specified to compress the data. If no value was specified 64KB is the default. This value is used during decompression to provision appropriate space between the inflated blocks to minimise the number of buffer copies during parallel driver decompression. Using an incorrect value will result in inflated blocks which overlap or blocks spaced too far apart. This issue applies to all formats with the exception of QZ_DEFLATE_GZIP compressed format. The primary mitigation for this issue is to record the chunk size/hw_buff_sz used during compression.

QATzip software Application (Parcomp)
=====================================

This package comes with a tool called 'parcomp' to test the performance of the Intel(R) QuickAssist Technology compression accelerator. Parcomp has been built using the QATzip API and library, and can be found in the "\Intel\Intel(R) QuickAssist Technology\Compression" folder.

It can be used to measure and report the rate at which compression and decompression operations are performed using the accelerator as well as those operations performed by the default services offered by the system OS.
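
Because the chunk size used for decompression must match the one used for compression (see Known Issues), it can help to generate both Parcomp command lines from a single place. The sketch below only builds argument lists and does not run Parcomp; the helper names are hypothetical, and the only options it uses are -p, -c, -d, -i and -o as documented under Command-line options.

```python
def parcomp_args(src, dst, provider="qat", chunk_kb=64, decompress=False):
    """Build a Parcomp argument list (hypothetical helper, illustration only)."""
    args = ["parcomp", "-p", provider, "-c", str(chunk_kb)]
    if decompress:
        args.append("-d")  # default direction is compression
    args += ["-i", src, "-o", dst]
    return args


def round_trip_commands(src, compressed, restored, provider="qat", chunk_kb=64):
    """Return (compress, decompress) argument lists sharing one chunk size."""
    return (
        parcomp_args(src, compressed, provider, chunk_kb),
        parcomp_args(compressed, restored, provider, chunk_kb, decompress=True),
    )


if __name__ == "__main__":
    for cmd in round_trip_commands(
        "largetext", "largetext.compressed", "largetext.original", chunk_kb=64
    ):
        print(" ".join(cmd))
```

Keeping the chunk size as a single parameter of `round_trip_commands` is one way to apply the mitigation of recording the value used during compression.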

To run Parcomp:
--------------

1) Launch Command Prompt (cmd.exe) as Administrator.
2) Navigate to the following sub-folder where the software package was installed:
   "\Intel\Intel(R) QuickAssist Technology\Compression"
3) Run Parcomp (eg:)
   For compression:
       parcomp -p qat -i srcFilename -o dstFilename
   For de-compression:
       parcomp -p qat -d -i srcFilename -o dstFilename
4) To see a list of supported command-line options, simply run the application without any command-line options (or use the -h option)

Command-line options:
--------------------

Usage: parcomp.exe -i srcFilename -o dstFilename [options]

Required options:
  -i srcFilename         Input (source) filename
  -o dstFilename         Output (destination) filename

Optional options can be:
  -b [cold|warm]         Use cold buffer or not.
  -p providerName        Specifies the provider (implementation). Options include:
                         'qat' - QuickAssist accelerated DEFLATE* algorithm
                         'qatzlib' - QuickAssist accelerated DEFLATE* algorithm with zlib* header.
                         'qatms' - Deprecated QuickAssist accelerated DEFLATE* algorithm with MSZIP* header. Only decompression supported.
                         'qatgzip' - QuickAssist accelerated DEFLATE* algorithm with Gzip* header.
                             Note: With qatgzip provider, options -k -t -Q cannot be used in decompression direction. The operation will be executed in a single thread with SW implementation. Parcomp application cannot process Gzip* source buffers bigger than 1Gb and the number of iterations is 1.
                         'qatgzipext' - QuickAssist accelerated DEFLATE* algorithm with Gzip* extended header.
                         'xpress' - Software based Xpress* algorithm.
                         'igzip' - Software based DEFLATE* algorithm using igzip dll.
  -c chunkSizeInKB       Chunk size, in KB. Default is 64. Files will be divided into chunks of this size (the last chunk may be smaller), and each chunk is compressed separately. State is not maintained between chunks. The chunk size used for decompression must match the value used for compression.
  -l compressionLevel    Compression level. Default is 1. compressionLevel can range from 1 - 4. Lower values imply less compressibility in less time.
  -d                     Decompress the input file. Default is to compress.
  -v (or -g)             Verbose (or debug)
  -x numLines            Print a summary of the inputs and outputs in a comma-separated values (CSV) format for easy importing into a spreadsheet. Specify numLines as 1 for data only, or 2 to also include a header summary
  -t numThreads          Creates specified number of threads, splits input file into numThreads (near-)equal chunks, and performs the operation in each thread. If specifying multiple threads, -Q is also required.
  -f cpuFreqInMHz        Specifies the CPU frequency in MHz. If not specified, this will be measured (takes approx. 1 second)
  -n numIterations       Specifies the number of iterations (allows you to run the same operation numIterations times). Default is 1.
  -Q                     Test independent threads writing one process using one session. Uses multiple copies of same input file, outputs one output file per thread. Must be used with -t and threadcount of 1 or more.
  -k blockSizeInKB       Separate the source data into several blocks of size specified by blockSizeInKB. -k uses -Q by default.
  -h                     Print this help message.

The following options are applicable for the qat provider only:
  -j maxOutstandingJobs  Maximum number of jobs (requests) that may be outstanding at any one time. Default is 30.
  -s                     Static compression. Default is dynamic.
  -D                     Dynamic compression. This is default.
  -FB                    Enable igzip fallback.
  -FT thresSizeInKB      Threshold value for fallback. If the offload size is less than the threshold, software provider is used.

Sample Results
--------------

Here are a few examples of the results obtained by running Parcomp for compression and decompression:

Note: These examples are for illustrative purposes only.

1) Using Intel(R) QuickAssist Technology accelerator for compression and decompression:

    parcomp -p qat -i largetext -o largetext.compressed
    ---------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext (1000000000 Bytes)
    Writing output file: C:\CompressionFiles\largetext.compressed (489318805 Bytes)

    Deflation Ratio (%age) : 48.9
    Thruput (uncompressed Mbps): 12312.576
    Time (ms) : 571.054

    Note:- All times exclude file I/O and are measured around the call to the qzCompress() API only.

    parcomp -p qat -d -i largetext.compressed -o largetext.original
    ---------------------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext.compressed (489318806 Bytes)
    Writing output file: C:\CompressionFiles\largetext.original (1000000000 Bytes)

    Inflation Ratio (%age) : 204.4
    Thruput (uncompressed Mbps): 23407.884
    Time (ms) : 280.976

    Note:- All times exclude file I/O and are measured around the call to the qzDecompress() API only.

2) Using Gzip* extended header with Intel(R) QuickAssist Technology accelerator for compression and decompression:

    parcomp -p qatgzipext -i largetext -o largetext.compressed
    ----------------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2020, Intel(R) Corporation

    Reading input file: largetext.txt (1000000000 Bytes)
    Writing output file: largetext.compressed (396497299 Bytes)

    Deflation Ratio (%age) : 39.6
    Thruput (uncompressed Mbps): 14677.658
    Time (ms) : 356.475

    Note:- All times exclude file I/O and are measured around the call to the qzCompress() API only.

    parcomp -p qatgzipext -d -i largetext.compressed -o largetext.original
    ----------------------------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2020, Intel(R) Corporation

    Reading input file: largetext.compressed (396497299 Bytes)
    Writing output file: largetext.original (1000000000 Bytes)

    Inflation Ratio (%age) : 252.2
    Thruput (uncompressed Mbps): 24304.489
    Time (ms) : 188.323

    Note:- All times exclude file I/O and are measured around the call to the qzDecompress() API only.

3) Using Windows(TM) API (Xpress*) for compression and decompression:

    parcomp -p xpress -i largetext -o largetext.compressed
    ------------------------------------------------------
    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext (1000000000 Bytes)
    Writing output file: C:\CompressionFiles\largetext.compressed (573799425 Bytes)

    Deflation Ratio (%age) : 57.4
    Thruput (uncompressed Mbps): 1070.143
    Time (ms) : 7283.892

    Note:- All times exclude file I/O and are measured around the call to the qzCompress() API only.

    parcomp -p xpress -d -i largetext.compressed -o largetext.original
    -----------------------------------------------------------------
    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext.compressed (573799425 Bytes)
    Writing output file: C:\CompressionFiles\largetext.original (1000000000 Bytes)

    Inflation Ratio (%age) : 174.3
    Thruput (uncompressed Mbps): 3129.774
    Time (ms) : 2408.823

    Note:- All times exclude file I/O and are measured around the call to the qzDecompress() API only.

4) For performance test of QAT compression and decompression:

    **NOTE**: Using -Q in the compression command produces a number of identical output files, each with an appended numeric suffix starting with 0. This suffix must also be included in the input filename of the decompression command. Using -k implicitly enables the -Q option.

    parcomp.exe -p qat -Q -t 6 -c 64 -k 4096 -j 60 -n 200 -i largetext -o largetext.compressed
    ------------------------------------------------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext (1000000000 Bytes)
    All threads completed as Expected.

    Deflation Ratio (%age) : 47.9
    Thruput (uncompressed Mbps): 63967.443
    Processing Block size : 4096 KB
    Block count : 239
    Time/block (ms) : 3.140

    Note:- All times exclude file I/O and are measured around the call to the qzCompress() API only.

    parcomp.exe -p qat -d -Q -t 6 -c 64 -k 4096 -j 60 -n 200 -i largetext.compressed0 -o largetext.original
    ------------------------------------------------------------------------------------------------------
    Warning: The hw_buff_sz parameter value used for decompression must match the value used for compression.
    Default hw_buff_sz value: 65536 bytes.
    hw_buff_sz value used in current execution: 65536 bytes.

    Parcomp: Tool to test compression & decompression
    (c) 2018, Intel(R) Corporation

    Reading input file: C:\CompressionFiles\largetext.compressed0 (489318782 Bytes)
    All threads completed as Expected.

    Inflation Ratio (%age) : 204.4
    Thruput (uncompressed Mbps): 131133.883
    Processing Block size : 4096 KB
    Block count : 477
    Time/block (ms) : 1.535

    Note:- All times exclude file I/O and are measured around the call to the qzDecompress() API only.

Cryptography performance micro-benchmark Tool (cngtest)
=======================================================

The software driver and providers come with a micro-benchmark tool (cngtest) to test the performance of various cryptography algorithms. This tool can be used to measure and report the rate at which crypto algorithm operations (like encrypt, decrypt, signhash, verifysignature, finalizekeypair, secretagreement, etc.) are performed using the Windows(TM) Cryptography Next-Generation (CNG) framework, with either of two providers:

- the default software provider (provided by Microsoft and which is part of the OS)
- the provider based on Intel(R) QuickAssist Technology.

Using the cngtest tool, it is possible to quickly see the substantial CPU savings that can be gained by offloading public key cryptography (for example, the RSA 2048 decrypt operations used by a web server during SSL handshakes) from the CPU to the hardware accelerator.

You can use the batch files installed in the \Intel\Intel(R) QuickAssist Technology\Crypto\Samples\bin folder to obtain results using cngtest for user mode and kernel mode tests.
Both batch files contain cngtest commands to perform various cryptographic operations using different algorithms and parameters.

To use these batch files, you will need to open a command-prompt window with Administrator privileges.

1. Navigate to the \Intel\Intel(R) QuickAssist Technology\Crypto\Samples\bin folder
2. For example user mode performance of RSA, DSA, ECDSA, DH, ECDH algorithms, run:
   Perf_User.bat
3. For example kernel mode performance of RSA, DH, ECDH, ECDSA algorithms, run:
   Perf_Kern.bat

How to run cngtest independently
--------------------------------

The cngtest application (cngtest.exe) is located in the "\Intel\Intel(R) QuickAssist Technology\Crypto\Samples\bin" folder.

- Launch the Command Prompt (cmd.exe) window as Administrator.
- Navigate to the following sub-folder:
  \Intel\Intel(R) QuickAssist Technology\Crypto\Samples\bin

This micro-benchmark tool is a command-line utility which allows you to specify, via command line parameters, the provider (HW vs. SW), the algorithm, the number of operations to perform, the number of software threads across which to spread the requests, the CPU cores to which those software threads should be affinitized, and the key length.

CNGTest Flags
-------------

The micro-benchmark tool includes some brief usage help, which can be seen by running:

    cngtest -help

cngtest is a "microbenchmark" which measures and reports the rate (measured in ops/second) at which encrypt and decrypt operations are performed using the CNG framework, with one of two providers: the default (software) provider, or a provider based on Intel(R) QuickAssist Technology (this requires the presence on the platform of a hardware accelerator).

Usage: cngtest [flags]

The flags are as follows:

-provider={sw|qa}
    Specifies the provider to use. The default value is "sw", meaning use the default (software) provider. The only other legal value is "qa", which means to use the QuickAssist provider.
    Note that the remaining parameters have different defaults depending on the provider, as indicated below:

-algo=
    Specifies the algorithm to test. The supported values are:

    rsa: This measures the rate at which RSA decrypt operations (using CRT), and encrypt operations, are performed.
        Additional flags that can be provided in this case include:
        -keyLength=
            Specifies the key size (modulus size) for the operation. The default is 2048. Other legal values include 512, 1024, 1536, 3072 and 4096. Any other value will result in the default 2048 value being used.
    ecdsa_256: This measures the performance of ECDSA_P256 algorithm
    ecdsa_384: This measures the performance of ECDSA_P384 algorithm
    ecdsa_521: This measures the performance of ECDSA_P521 algorithm
    ecdsa: This measures the performance of ECDSA algorithm.
        Additional flags that can be provided in this case include:
        -ecccurve=
            Specifies the ECC curve name used for the ECDSA and ECDH algorithms. The default curve name is nistP256; for other legal values, refer to CNG Named Elliptic Curves on MSDN.
    ecdh_256: This measures the performance of ECDH_P256 algorithm
    ecdh_384: This measures the performance of ECDH_P384 algorithm
    ecdh_521: This measures the performance of ECDH_P521 algorithm
    ecdh: This measures the performance of ECDH algorithm.
        Additional flags that can be provided in this case include:
        -ecccurve=
            Specifies the ECC curve name used for the ECDSA and ECDH algorithms. The default curve name is nistP256; for other legal values, refer to CNG Named Elliptic Curves on MSDN.
    dsa: This measures the performance of DSA algorithm (the DSA algorithm is not supported in kernel mode; if kernel mode is specified, the test is routed to run in user mode).
        Additional flags that can be provided in this case include:
        -keyLength=
            Specifies the key size (modulus size) for the operation. The default is 2048. Other legal values include 1024 and 3072. Any other value will result in the default 2048 value being used.
    dh: This measures the performance of DH algorithm.
        Additional flags that can be provided in this case include:
        -keyLength=
            Specifies the key size (modulus size) for the operation. The default is 2048. Other legal values include 512, 1024, 1536, 3072 and 4096. Any other value will result in the default 2048 value being used.

-padding=
    RSA algorithm only, ignored for other algorithms.
    pkcs1: PKCS1 padding mode.
    oaep : OAEP padding mode.
    pss  : PSS padding mode.

-user
    Runs the test in user space. Default is kernel space.

-numThreads=
    Specifies the number of software threads to spawn. The default value is 2 (sw) or 150 (qa). Note that the number of outstanding requests required to "max out" the hardware is approximately 150. Note too that specifying numThreads=n is equivalent to specifying -minThreads=n and -maxThreads=n.

-numIter=
    Specifies the number of iterations to perform. The default value is 10000 (sw) or 100000 (qa).

-affinityMask=
    Specifies the CPU/core affinity mask for the threads. The affinity mask is interpreted as a bitmask, with each bit indicating a CPU core to which a thread should be affinitized, where core number c is represented as 2^c. Note that the mask is specified as a hexadecimal number, and must begin with the prefix "0x". For example, to run the software threads on cores 2 and 3, specify -affinityMask=0x0C (binary 00001100). Software threads are assigned to the cores in a round-robin fashion, with the first software thread being assigned to the lowest numbered core, etc.

-minThreads=
    Specifies the minimum number of software threads. The benchmark will be performed using each number of software threads from minThreads to maxThreads. In this case, the value of -numThreads is ignored.

-maxThreads=
    Specifies the maximum number of software threads. The benchmark will be performed using each number of software threads from minThreads to maxThreads. In this case, the value of -numThreads is ignored.
    If minThreads is larger than maxThreads, minThreads and maxThreads are both set to the default value 1.

-check
    Specifies that the software and hardware providers should both be executed exactly once each, and the results compared. This is a purely functional check. All other parameters are ignored in this case.

-encrypt
    Measure performance of only the encryption operation for the specified algorithm test. By default performance is measured over both encryption and decryption operations.

-decrypt
    Measure performance of only the decryption operation for the specified algorithm test.

-derivekey
    Measure performance of only the derive key operation for the specified algorithm (ECDH or DH) test. This option is not supported when using software provider in kernel mode.

-secretderive
    Measure performance of the secretagreement + derive key operation for the specified algorithm (ECDH or DH) test. This option is not supported when using software provider in kernel mode.

-finalizesecret
    Measure performance of the generate key + secretagreement operation for the specified algorithm (ECDH or DH) test.

-finalizekey
    Measure performance of only the finalize key operation for the specified algorithm (ECDH or DH) test.

-secretagreement
    Measure performance of only the secretagreement operation for the specified algorithm (ECDH or DH) test.

-sign
    Measure performance of only the sign hash operation for the specified algorithm (ECDSA or DSA) test.

-verify
    Measure performance of only the verify signature operation for the specified algorithm (ECDSA or DSA) test.

-generatekey
    Measure performance of generate key. This parameter is ignored if algo=dsa is specified.

Cngtest Sample Results
----------------------

Here is an example output of cngtest test results for asymmetric crypto.

1) When using Perf_Kern.bat

    Intel(R) QuickAssist Accelerator Software script to test Kernel mode performance
    This script must be run in a Command-prompt/Shell window with Administrator privileges

    (1) Test 100000 RSA decrypt operations using a key size of 2048:
    Running in kernel mode...
    Time [ms] : 1050
    Number of iterations : 100000
    RSA Decrypt Ops/s : 95238.10

    (2) Test 100000 DH finalizekeypair/secretagreement operations using a key size of 2048:
    Running in kernel mode...
    Time [ms] : 3012
    Number of iterations : 100000
    DH Stage 1 Ops/s : 33200.53
    Time [ms] : 3825
    Number of iterations : 100000
    DH Stage 2 Ops/s : 26143.79

    (3) Test 100000 ECDH finalizekeypair/secretagreement operations using the P-256 curve:
    Running in kernel mode...
    Time [ms] : 1466
    Number of iterations : 100000
    ECDH Stage 1 Ops/s : 68212.82
    Time [ms] : 776
    Number of iterations : 100000
    ECDH Stage 2 Ops/s : 128865.98

    (4) Test 100000 ECDSA signhash/verifysignature operations using the P-256 curve:
    Running in kernel mode...
    Time [ms] : 664
    Number of iterations : 100000
    ECDSA Sign Hash Ops/s : 150602.41
    Time [ms] : 1423
    Number of iterations : 100000
    ECDSA Verify Signature Ops/s : 70274.07

2) When using Perf_User.bat

    Intel(R) QuickAssist Accelerator Software script to test User mode performance
    This script must be run in a Command-prompt/Shell window with Administrator privileges

    (1) Test 100000 RSA decrypt operations using a key size of 2048:
    Running in user mode...
    Time [ms] : 1029
    Number of iterations : 100000
    RSA Decrypt Ops/s : 97181.73
    CPU core utilization percentage : 9%
    CPU overall utilizaion percentage : 4%

    (2) Test 100000 DSA signhash/verifysignature operations using a key size of 2048:
    Running in user mode...
    Time [ms] : 563
    Number of iterations : 100000
    DSA Sign Hash Ops/s : 177619.89
    CPU core utilization percentage : 0%
    CPU overall utilizaion percentage : 12%
    Time [ms] : 1044
    Number of iterations : 100000
    DSA Verify Signature Ops/s : 95785.44
    CPU core utilization percentage : 0%
    CPU overall utilizaion percentage : 3%

    (3) Test 100000 ECDSA signhash/verifysignature operations using the P-256 curve:
    Running in user mode...
    Time [ms] : 673
    Number of iterations : 100000
    ECDSA Sign Hash Ops/s : 148588.41
    CPU core utilization percentage : 1%
    CPU overall utilizaion percentage : 8%
    Time [ms] : 1371
    Number of iterations : 100000
    ECDSA Verify Signature Ops/s : 72939.46
    CPU core utilization percentage : 0%
    CPU overall utilizaion percentage : 2%

    (4) Test 100000 DH finalizekeypair/secretagreement operations using a key size of 2048:
    Running in user mode...
    Time [ms] : 2950
    Number of iterations : 100000
    DH Stage 1 Ops/s : 33898.31
    CPU core utilization percentage : 0%
    CPU overall utilizaion percentage : 6%
    Time [ms] : 3708
    Number of iterations : 100000
    DH Stage 2 Ops/s : 26968.72
    CPU core utilization percentage : 0%
    CPU overall utilizaion percentage : 1%

    (5) Test 100000 ECDH finalizekeypair/secretagreement operations using the P-256 curve:
    Running in user mode...
    Time [ms] : 1724
    Number of iterations : 100000
    ECDH Stage 1 Ops/s : 58004.64
    CPU core utilization percentage : 0%
    CPU overall utilizaion percentage : 11%
    Time [ms] : 914
    Number of iterations : 100000
    ECDH Stage 2 Ops/s : 109409.19
    CPU core utilization percentage : 0%
    CPU overall utilizaion percentage : 11%
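
The Ops/s figures in the sample results are consistent with dividing the number of iterations by the elapsed time in seconds. The short sketch below reproduces that arithmetic for a few of the kernel mode rows; the function name `ops_per_second` is hypothetical and is not part of cngtest.

```python
def ops_per_second(iterations: int, elapsed_ms: float) -> float:
    """Operations per second from an iteration count and a time in ms."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed time must be positive")
    return iterations / (elapsed_ms / 1000.0)


if __name__ == "__main__":
    # (label, iterations, Time [ms]) taken from the kernel mode sample output.
    rows = [
        ("RSA Decrypt", 100000, 1050),
        ("DH Stage 1", 100000, 3012),
        ("ECDH Stage 2", 100000, 776),
    ]
    for label, iterations, elapsed_ms in rows:
        print(f"{label} Ops/s : {ops_per_second(iterations, elapsed_ms):.2f}")
```

The same calculation can be used to compare runs with different -numIter values, since it normalizes by elapsed time.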