
Clang compiles _Generic expressions very slowly #137891

Open
@JacksonAllan


Hello :)

Clang appears to compile C’s _Generic expressions far more slowly than other compilers. I noticed this because Clang takes more than three times as long as GCC to compile the unit tests of my own library, which relies heavily on _Generic. These unit tests make more than 4,000 library API calls that together contain more than 35,000 _Generic expressions. Another library I maintain also makes ample (albeit less extensive) use of _Generic, and it too compiles much more slowly with Clang than with GCC (see the previous link).
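
To illustrate how a few thousand API calls can add up to tens of thousands of _Generic expressions (the figures above work out to something on the order of nine per call), here is a minimal, hypothetical sketch of a type-generic API in which a single call expands to more than one _Generic expression. The names `abs_int`, `GENERIC_ABS`, and `SUM_ABS` are illustrative assumptions and are not taken from either of my libraries.

```c
/* Hypothetical library code, not taken from the libraries mentioned above. */

/* Type-specific implementations. */
static int abs_int( int x ){ return x < 0 ? -x : x; }
static long abs_long( long x ){ return x < 0 ? -x : x; }
static double abs_double( double x ){ return x < 0 ? -x : x; }

/* One _Generic expression: picks an implementation from the type of x,
   then calls it with x. */
#define GENERIC_ABS( x ) _Generic( ( x ), \
  int: abs_int,                           \
  long: abs_long,                         \
  double: abs_double                      \
)( x )

/* A single "API call" that expands to two _Generic expressions,
   one per argument. */
#define SUM_ABS( a, b ) ( GENERIC_ABS( a ) + GENERIC_ABS( b ) )
```

For example, `SUM_ABS( -3, 4L )` dispatches to `abs_int` and `abs_long` and yields 7, while the compiler has to process two full _Generic selections for that one call. Real libraries typically nest or chain such macros further, which multiplies the count.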

To demonstrate, I’ve created a simple benchmark that uses the preprocessor to generate 100,000 _Generic expressions, each with 24 associations that map a type to an empty function:

```c
static void handle_unsigned_char( unsigned char x ){}
static void handle_signed_char( signed char x ){}
static void handle_unsigned_short( unsigned short x ){}
static void handle_short( short x ){}
static void handle_unsigned_int( unsigned int x ){}
static void handle_int( int x ){}
static void handle_unsigned_long( unsigned long x ){}
static void handle_long( long x ){}
static void handle_unsigned_long_long( unsigned long long x ){}
static void handle_long_long( long long x ){}
static void handle_float( float x ){}
static void handle_double( double x ){}
static void handle_unsigned_char_ptr( unsigned char *x ){}
static void handle_signed_char_ptr( signed char *x ){}
static void handle_unsigned_short_ptr( unsigned short *x ){}
static void handle_short_ptr( short *x ){}
static void handle_unsigned_int_ptr( unsigned int *x ){}
static void handle_int_ptr( int *x ){}
static void handle_unsigned_long_ptr( unsigned long *x ){}
static void handle_long_ptr( long *x ){}
static void handle_unsigned_long_long_ptr( unsigned long long *x ){}
static void handle_long_long_ptr( long long *x ){}
static void handle_float_ptr( float *x ){}
static void handle_double_ptr( double *x ){}

int main( void )
{
  int foo = 0;

  // Each expansion of X is one _Generic expression with 24 associations.
  #define X                                              \
  (void)_Generic( foo,                                   \
    unsigned char: handle_unsigned_char,                 \
    signed char: handle_signed_char,                     \
    unsigned short: handle_unsigned_short,               \
    short: handle_short,                                 \
    unsigned int: handle_unsigned_int,                   \
    int: handle_int,                                     \
    unsigned long: handle_unsigned_long,                 \
    long: handle_long,                                   \
    unsigned long long: handle_unsigned_long_long,       \
    long long: handle_long_long,                         \
    float: handle_float,                                 \
    double: handle_double,                               \
    unsigned char *: handle_unsigned_char_ptr,           \
    signed char *: handle_signed_char_ptr,               \
    unsigned short *: handle_unsigned_short_ptr,         \
    short *: handle_short_ptr,                           \
    unsigned int *: handle_unsigned_int_ptr,             \
    int *: handle_int_ptr,                               \
    unsigned long *: handle_unsigned_long_ptr,           \
    long *: handle_long_ptr,                             \
    unsigned long long *: handle_unsigned_long_long_ptr, \
    long long *: handle_long_long_ptr,                   \
    float *: handle_float_ptr,                           \
    double *: handle_double_ptr                          \
  )( foo );

  // Each level repeats the previous one 10 times, so X100000 expands to
  // 10^5 copies of X.
  #define X10 X X X X X X X X X X
  #define X100 X10 X10 X10 X10 X10 X10 X10 X10 X10 X10
  #define X1000 X100 X100 X100 X100 X100 X100 X100 X100 X100 X100
  #define X10000 X1000 X1000 X1000 X1000 X1000 X1000 X1000 X1000 X1000 X1000
  #define X100000 X10000 X10000 X10000 X10000 X10000 X10000 X10000 X10000 X10000 X10000

  X100000
}
```
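
In the benchmark, `foo` is an `int`, so every expansion of `X` selects `handle_int`. The minimal sketch below shows the same selection mechanism in a form whose result can be checked directly: instead of a function, each association yields an identifier. `TYPE_ID` and `enum type_id` are hypothetical names, and only a subset of the benchmark’s types is included, plus a `default` association.

```c
/* Hypothetical identifiers for a subset of the benchmark's types. */
enum type_id {
  TYPE_SCHAR, TYPE_UCHAR, TYPE_SHORT, TYPE_INT,
  TYPE_LONG, TYPE_DOUBLE, TYPE_INT_PTR, TYPE_OTHER
};

/* Selects an identifier from the static type of x. The controlling
   expression is not evaluated, and no integer promotion is applied to it,
   so a short argument still matches "short". */
#define TYPE_ID( x ) _Generic( ( x ), \
  signed char: TYPE_SCHAR,            \
  unsigned char: TYPE_UCHAR,          \
  short: TYPE_SHORT,                  \
  int: TYPE_INT,                      \
  long: TYPE_LONG,                    \
  double: TYPE_DOUBLE,                \
  int *: TYPE_INT_PTR,                \
  default: TYPE_OTHER                 \
)
```

With `int foo = 0;`, `TYPE_ID( foo )` is `TYPE_INT` and `TYPE_ID( &foo )` is `TYPE_INT_PTR`, mirroring the benchmark’s choice of `handle_int`. Plain `char` is a distinct type from both `signed char` and `unsigned char`, so a `char` variable falls through to `default` here; since the benchmark’s _Generic has neither a `char` nor a `default` association, it would reject such an argument at compile time.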

Here are the results I get (best of five attempts) on Windows with an AMD Ryzen 7 5800H, using MSVC 19.43.34810, and GCC 14.2.0 and Clang 19.1.7 via the WinLibs distribution of MinGW-w64:

[Image: benchmark build times for MSVC, GCC, and Clang]

As we can see, Clang is approximately 3x slower than GCC. The fact that the build time for Clang varies little between -O1 and -O3 suggests that the issue is not related to optimizations.

I get slightly better results on Replit (which runs Linux), which provides older versions of Clang and GCC (17.0.6 and 13.2.0, respectively). There, Clang is only approximately 2x slower than GCC.

Obviously, I’d like to see Clang improve to handle _Generic at a speed more on par with other compilers, but I understand that this might not be a high priority. Let me know if I can contribute, e.g. by writing more comprehensive benchmarks.
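
As one possible direction for more comprehensive benchmarks, the number of _Generic expressions and the number of associations per expression could be varied independently, to see how compile time scales with each. The sketch below is a hypothetical generator for such test files: `write_generic_benchmark`, the `bench_types` table, and the `h0` to `h23` handler names are assumptions for illustration, not part of the benchmark above. Unlike the original, it writes every expression out in full instead of relying on nested macros.

```c
#include <stddef.h>
#include <stdio.h>

/* The benchmark's 24 types, in the same order (hypothetical table). */
static const char *const bench_types[] = {
  "unsigned char", "signed char", "unsigned short", "short",
  "unsigned int", "int", "unsigned long", "long",
  "unsigned long long", "long long", "float", "double",
  "unsigned char *", "signed char *", "unsigned short *", "short *",
  "unsigned int *", "int *", "unsigned long *", "long *",
  "unsigned long long *", "long long *", "float *", "double *"
};
#define BENCH_MAX_ASSOC ( sizeof bench_types / sizeof bench_types[0] )

/* Hypothetical generator: writes a complete C translation unit to `out`
   containing n_exprs _Generic expressions with n_assoc associations each.
   n_assoc is clamped to [6, 24] so that the "int" association (index 5)
   is always present and `foo` has a match. Returns 0 on success, or -1
   if the stream reports a write error. */
static int write_generic_benchmark( FILE *out, unsigned long n_exprs, size_t n_assoc )
{
  if( n_assoc < 6 )
    n_assoc = 6;
  if( n_assoc > BENCH_MAX_ASSOC )
    n_assoc = BENCH_MAX_ASSOC;

  /* One empty handler per association; (void)x silences unused warnings. */
  for( size_t i = 0; i < n_assoc; ++i )
    fprintf( out, "static void h%zu( %s x ){ (void)x; }\n", i, bench_types[ i ] );

  fputs( "int main( void )\n{\n  int foo = 0;\n", out );

  /* Each output line is one _Generic expression, written out in full. */
  for( unsigned long e = 0; e < n_exprs; ++e )
  {
    fputs( "  (void)_Generic( foo", out );
    for( size_t i = 0; i < n_assoc; ++i )
      fprintf( out, ", %s: h%zu", bench_types[ i ], i );
    fputs( " )( foo );\n", out );
  }

  fputs( "  return 0;\n}\n", out );
  return ferror( out ) ? -1 : 0;
}
```

For example, calling `write_generic_benchmark( stdout, 100000, 24 )` from a small driver and redirecting the output to a file should give a translation unit of roughly the same shape as the benchmark above (apart from handler names and the macro-expansion step), while smaller values of either parameter show how build time scales.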

Thanks for reading :)
