METHODS OF USING SIMD INSTRUCTIONS ON X86 COMPATIBLE OLDER GENERATION PROCESSORS

Authors

  • A. Yanko
  • A. Martynenko
  • O. But

DOI:

https://doi.org/10.26906/SUNZ.2021.4.044

Keywords:

arithmetic operation, vector instruction, set of processor instructions, operand constants, optimization of data processing, parallelism at the instruction level, parallel calculation

Abstract

This paper considers the use of vector SIMD instructions on x86-compatible processors to improve the efficiency of computation and data processing. A vector instruction set increases the number of operations performed per clock cycle by exploiting instruction-level or data-level parallelism. Reducing the number of branches in an algorithm also speeds up program execution, because it places less load on the processor's branch prediction unit. Vector instructions also help to optimize the use of cache lines and the transfer of data between the L1 cache and the CPU registers, because in modern processors the bus that carries data between the L1 cache and the registers is at least 128 bits wide. In some cases this can play a significant role in computational optimization. In addition, these factors improve the ability of modern processors to execute instructions out of order. An important condition for the effective use of SIMD instructions is to determine accurately the type and nature of the data and the desired end result of the algorithm, because an algorithm built on SIMD is difficult to modify later and depends entirely on these factors. Some software still has to run on older x86 processor cores, which does not always allow the use of the newer vector instruction sets, starting with SSE4.1. The main disadvantage of the earlier vector instruction sets is the lack of logical and arithmetic operations for some data types, especially integers. The properties of the binary representation of signed integers make it possible to compensate for the lack of logical operations for these data types.
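As an illustration of the last point, the following is a minimal sketch in C that uses only SSE2 intrinsics. It is not taken from the paper; the function names (cmpgt_epu32_sse2, select_sse2, min_epi32_sse2, abs_epi32_sse2) are hypothetical and chosen here for the example. It shows how two's-complement properties can stand in for operations that appear only in later instruction sets: an unsigned comparison built from the signed one, a branchless select, a signed minimum (pminsd is SSE4.1) and an absolute value (pabsd is SSSE3).

```c
#include <emmintrin.h>  /* SSE2 intrinsics only */
#include <stdint.h>

/* Unsigned 32-bit "greater than" (hypothetical helper).
   SSE2 offers only the signed compare (pcmpgtd). Flipping the sign bit
   of both operands maps unsigned ordering onto signed ordering. */
static __m128i cmpgt_epu32_sse2(__m128i a, __m128i b)
{
    const __m128i sign = _mm_set1_epi32(INT32_MIN);  /* 0x80000000 per lane */
    return _mm_cmpgt_epi32(_mm_xor_si128(a, sign), _mm_xor_si128(b, sign));
}

/* Branchless per-lane select: mask ? x : y.
   Each mask lane is expected to be all ones or all zeros. */
static __m128i select_sse2(__m128i mask, __m128i x, __m128i y)
{
    return _mm_or_si128(_mm_and_si128(mask, x), _mm_andnot_si128(mask, y));
}

/* Signed 32-bit minimum without SSE4.1: where a > b take b, else a. */
static __m128i min_epi32_sse2(__m128i a, __m128i b)
{
    return select_sse2(_mm_cmpgt_epi32(a, b), b, a);
}

/* Absolute value without SSSE3. The arithmetic shift yields m = 0 for
   non-negative lanes and m = -1 for negative ones; (a ^ m) - m then
   negates exactly the negative lanes. INT32_MIN stays INT32_MIN. */
static __m128i abs_epi32_sse2(__m128i a)
{
    __m128i m = _mm_srai_epi32(a, 31);
    return _mm_sub_epi32(_mm_xor_si128(a, m), m);
}
```

The select helper also shows how a data-dependent branch can be replaced by a mask computation, which is the kind of branch reduction mentioned above.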
Exploiting the degenerate and indirect properties of some instructions helps to compensate for the absence of arithmetic operations on the required data types and on integers of a different bit width, and also to optimize mathematical operations such as the sum, difference, product and scalar product.
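The idea of degenerate and indirect instruction use can be sketched as follows, again in C with SSE2 intrinsics only. This is an illustrative sketch and not the authors' code; all function names are hypothetical. The sum of absolute differences against zero degenerates into a plain byte sum, the multiply-add instruction pmaddwd serves as the core of a 16-bit scalar product, and the 32x32 to 64-bit unsigned multiply is used indirectly to obtain the low 32 bits of a 32-bit product (pmulld is SSE4.1). The scalar product assumes that the accumulated result fits in 32 bits.

```c
#include <emmintrin.h>  /* SSE2 intrinsics only */
#include <stddef.h>
#include <stdint.h>

/* Sum of 16 unsigned bytes. With a zero operand |x - 0| = x, so psadbw
   returns the plain sum of each group of 8 bytes in a 64-bit lane. */
static uint32_t sum_u8x16_sse2(__m128i v)
{
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());
    __m128i hi  = _mm_srli_si128(sad, 8);          /* upper sum to lane 0 */
    return (uint32_t)_mm_cvtsi128_si32(_mm_add_epi32(sad, hi));
}

/* Horizontal sum of four 32-bit lanes without SSSE3 (no phaddd). */
static int32_t hsum_epi32_sse2(__m128i v)
{
    __m128i t = _mm_add_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)));
    t = _mm_add_epi32(t, _mm_shuffle_epi32(t, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(t);
}

/* Scalar product of two int16 arrays. pmaddwd multiplies 8 pairs and
   adds adjacent products into four 32-bit lanes.
   Assumption: the total fits in a 32-bit signed integer. */
static int32_t dot_i16_sse2(const int16_t *a, const int16_t *b, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(va, vb));
    }
    int32_t sum = hsum_epi32_sse2(acc);
    for (; i < n; ++i)                              /* scalar tail */
        sum += (int32_t)a[i] * b[i];
    return sum;
}

/* Low 32 bits of a 32-bit product per lane. pmuludq multiplies lanes 0
   and 2 as unsigned; the low half of the product is the same for signed
   and unsigned operands in two's complement. */
static __m128i mullo_epi32_sse2(__m128i a, __m128i b)
{
    __m128i even = _mm_mul_epu32(a, b);                    /* lanes 0, 2 */
    __m128i odd  = _mm_mul_epu32(_mm_srli_si128(a, 4),
                                 _mm_srli_si128(b, 4));    /* lanes 1, 3 */
    /* Gather the low halves of the products and interleave them. */
    return _mm_unpacklo_epi32(
        _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0)),
        _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0)));
}
```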

Published

2021-12-01
