add missing ceil avx, sse functions #1207
__mth_i_ceil_sse(float x)
{
  __asm__(
    "roundss $0x2,%0,%0"
Is it correct to hardcode the rounding mode here? What if the program has called ieee_set_rounding_mode?
From Intel's documentation:
Round to nearest (even), 00B: Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (i.e., the integer value with the least-significant bit of zero).
Round down (toward −∞), 01B: Rounded result is closest to but no greater than the infinitely precise result.
Round up (toward +∞), 10B: Rounded result is closest to but no less than the infinitely precise result.
Round toward zero (Truncate), 11B: Rounded result is closest to but no greater in absolute value than the infinitely precise result.
The immediate $0x2 selects round up (10B) in the instruction itself. Whatever rounding mode the user has currently specified should not affect computing ceiling or floor.
Thanks for the clarification.
LGTM.
This was noticed in #1120, but we decided to implement it as a separate PR.
cc @d-parks