add missing ceil avx, sse functions #1207

Merged
merged 1 commit into from
Jun 1, 2022

36 changes: 34 additions & 2 deletions runtime/libpgmath/lib/common/ceil.c
@@ -6,8 +6,40 @@
*/

#include "mthdecls.h"
#if defined(__SSE4_1__) || defined(__AVX__)
#include <immintrin.h>

#if defined(TARGET_X8664)
/*
* For X8664, implement both SSE and AVX versions of __mth_i_ceil using ISA
* instruction extensions.
*
* Using inline assembly allows both the SSE and AVX versions of the routine
* to be compiled in a single unit.
*
* The following asm statements are equivalent to:
* return _mm_cvtss_f32(_mm_ceil_ss(_mm_set1_ps(x), _mm_set1_ps(x)));
* but without the need for separate compilations for the SSE4.1 and AVX ISA
* extensions.
*/

float
__mth_i_ceil_sse(float x)
{
__asm__(
"roundss $0x2,%0,%0"
Collaborator

Is it correct to hardcode the rounding mode here? What if the program has called ieee_set_rounding_mode?

From Intel's documentation:
Round to nearest (even), 00B: Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (i.e., the integer value with the least-significant bit of zero).

Round down (toward −∞), 01B: Rounded result is closest to but no greater than the infinitely precise result.

Round up (toward +∞), 10B: Rounded result is closest to but no less than the infinitely precise result.

Round toward zero (Truncate), 11B: Rounded result is closest to but no greater in absolute value than the infinitely precise result.

Whatever rounding mode the user has specified should not affect the computation of ceiling or floor.

Collaborator

Thanks for the clarification.

:"+x"(x)
);
return x;
}

float
__mth_i_ceil_avx(float x)
{
__asm__(
"vroundss $0x2,%0,%0,%0"
:"+x"(x)
);
return x;
}
#endif

float
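
The review question about the hardcoded rounding mode can be checked directly. The following is a minimal sketch, not part of this PR, assuming an x86-64 target with SSE4.1 and a GCC- or Clang-compatible compiler. The names `demo_ceil_imm`, `demo_round_mxcsr`, `demo_round_under`, and `demo_ceil_ignores_mxcsr` are hypothetical helpers introduced for illustration. The sketch uses the same `roundss $0x2` form as `__mth_i_ceil_sse` and contrasts it with immediate `0x4`, which tells the instruction to take its rounding mode from MXCSR instead.

```c
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* _MM_SET_ROUNDING_MODE, _MM_GET_ROUNDING_MODE (SSE) */

#if !defined(__x86_64__)
#error "This sketch assumes an x86-64 target with SSE4.1 support."
#endif

/* Hypothetical helper: same instruction and immediate as __mth_i_ceil_sse.
 * imm8 = 0x2 leaves bit 2 clear, so the rounding mode comes from imm8[1:0]
 * (10B, round up) and MXCSR.RC is ignored. */
__attribute__((noinline)) float
demo_ceil_imm(float x)
{
  __asm__ volatile("roundss $0x2,%0,%0" : "+x"(x));
  return x;
}

/* Hypothetical contrast case: imm8 = 0x4 sets bit 2, which makes roundss
 * use the rounding mode currently stored in MXCSR.RC. */
__attribute__((noinline)) float
demo_round_mxcsr(float x)
{
  __asm__ volatile("roundss $0x4,%0,%0" : "+x"(x));
  return x;
}

/* Run demo_round_mxcsr(x) with MXCSR.RC temporarily set to `mode`
 * (one of the _MM_ROUND_* constants), then restore the previous mode. */
float
demo_round_under(unsigned mode, float x)
{
  unsigned saved = _MM_GET_ROUNDING_MODE();
  float r;
  _MM_SET_ROUNDING_MODE(mode);
  r = demo_round_mxcsr(x);
  _MM_SET_ROUNDING_MODE(saved);
  return r;
}

/* Return 1 if demo_ceil_imm(2.25f) yields 3.0f under all four MXCSR
 * rounding modes, 0 otherwise. The previous mode is restored on exit. */
int
demo_ceil_ignores_mxcsr(void)
{
  static const unsigned modes[] = {
    _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO
  };
  unsigned saved = _MM_GET_ROUNDING_MODE();
  int ok = 1;
  size_t i;
  for (i = 0; i < sizeof modes / sizeof modes[0]; ++i) {
    _MM_SET_ROUNDING_MODE(modes[i]);
    if (demo_ceil_imm(2.25f) != 3.0f)
      ok = 0;
  }
  _MM_SET_ROUNDING_MODE(saved);
  return ok;
}
```

Calling `demo_ceil_ignores_mxcsr()` exercises the `0x2` form under every MXCSR rounding mode, while `demo_round_under(_MM_ROUND_DOWN, 2.25f)` shows the `0x4` form following the mode set by the caller. The sketch sets MXCSR through the `xmmintrin.h` macros rather than `fesetround` only to avoid a dependency on the math library.
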
8 changes: 7 additions & 1 deletion runtime/libpgmath/lib/x86_64/math_tables/mth_ceildefs.h
@@ -5,7 +5,13 @@
*
*/

MTHINTRIN(ceil , ss , any , __mth_i_ceil , __mth_i_ceil , __mth_i_ceil ,__math_dispatch_error)
MTHINTRIN(ceil , ss , em64t , __mth_i_ceil , __mth_i_ceil , __mth_i_ceil ,__math_dispatch_error)
MTHINTRIN(ceil , ss , sse4 , __mth_i_ceil_sse , __mth_i_ceil_sse , __mth_i_ceil_sse ,__math_dispatch_error)
MTHINTRIN(ceil , ss , avx , __mth_i_ceil_avx , __mth_i_ceil_avx , __mth_i_ceil_avx ,__math_dispatch_error)
MTHINTRIN(ceil , ss , avxfma4 , __mth_i_ceil_avx , __mth_i_ceil_avx , __mth_i_ceil_avx ,__math_dispatch_error)
MTHINTRIN(ceil , ss , avx2 , __mth_i_ceil_avx , __mth_i_ceil_avx , __mth_i_ceil_avx ,__math_dispatch_error)
MTHINTRIN(ceil , ss , avx512knl , __mth_i_ceil_avx , __mth_i_ceil_avx , __mth_i_ceil_avx ,__math_dispatch_error)
MTHINTRIN(ceil , ss , avx512 , __mth_i_ceil_avx , __mth_i_ceil_avx , __mth_i_ceil_avx ,__math_dispatch_error)
MTHINTRIN(ceil , ds , em64t , __mth_i_dceil , __mth_i_dceil , __mth_i_dceil ,__math_dispatch_error)
MTHINTRIN(ceil , ds , sse4 , __mth_i_dceil_sse , __mth_i_dceil_sse , __mth_i_dceil_sse ,__math_dispatch_error)
MTHINTRIN(ceil , ds , avx , __mth_i_dceil_avx , __mth_i_dceil_avx , __mth_i_dceil_avx ,__math_dispatch_error)