Thursday, November 10, 2016

"missing" optimizations — constant address comparison

Sometimes the compiler intentionally fails to optimize certain constructs for arguably good reasons. For example, compiling the following with GCC
extern int a;
extern int b;

int foo(void)
{
return &a != &b;
}

generates code doing a comparison
foo:
movl    $a, %eax
cmpq    $b, %rax
setne   %al
movzbl  %al, %eax
ret

even though the C standard ensures the addresses of a and b are different.

It is not entirely clear why GCC keeps this comparison, but the discussion in GCC bug 78035 mentions the C defect report DR #078 and the expressiveness of the ELF format. DR #078 notes that
unsigned int h(void)
{
return memcpy != memmove;
}

may return 0, which happens on implementations where the C standard library uses the same code for memcpy and memmove (this cannot be expressed in C itself, but the standard library does not need to be written in C). This does not mean that the compiler must be able to handle different symbols mapping to the same address; it only says that C programs must not assume too much about the standard library. But ELF supports exporting multiple symbols for the same address, and GCC tries to honor what ELF allows (such as symbol interposition, which limits optimizations for shared libraries).
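To make the ELF point concrete, here is a minimal sketch of one object exported under two symbol names. It assumes GCC or Clang on an ELF target such as Linux, and it relies on the alias attribute, which is a compiler extension rather than standard C. The helper name addresses_differ is made up for the illustration.

```c
/* Sketch only: assumes GCC or Clang targeting ELF (e.g. Linux).
   The alias attribute is a compiler extension, not standard C. */

/* One object, defined once ... */
int a = 0;

/* ... and a second symbol, b, exported at the same address. */
extern int b __attribute__((alias("a")));

/* Returns 0 here, because a and b name the same object. */
int addresses_differ(void)
{
    return &a != &b;
}
```

A translation unit that only sees the two extern declarations from the first example cannot tell this situation apart from two genuinely separate objects, which is presumably the case the emitted comparison is meant to cover.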

I'm not convinced it makes sense for GCC to keep these comparisons in the generated code. Other optimizations, such as alias analysis, treat global symbols as having different addresses, so they are likely to break code that has two symbols with the same address. For example,
extern int a;
extern int b;

int foo(void)
{
a = 1;
b = 5;
a++;
return &a != &b;
}

optimizes the accesses to a and b as if they have different addresses (the store to b is assumed not to change a, so a = 1 and a++ are folded into a single store of 2), even though the comparison is emitted:
foo:
movl    $a, %eax
movl    $5, b(%rip)
movl    $2, a(%rip)
cmpq    $b, %rax
setne   %al
movzbl  %al, %eax
ret

This missing optimization probably does not make any difference in practice (although I could imagine some macro or template that relies on this being optimized), but the inconsistency in what is optimized annoys me...

2. GCC sometimes uses the same code for identical functions, but it ensures they have different addresses. E.g. if foo and bar are identical, then foo is compiled to
_Z3foov:
jmp     _Z3barv
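The address guarantee in this note can also be checked from C. The sketch below uses two placeholder functions, foo and bar, with identical bodies, plus a made-up helper functions_differ. Even if the compiler shares the code between them, comparing their addresses is expected to report them as different.

```c
/* Two functions with identical bodies; the names are placeholders. */
int foo(void)
{
    return 42;
}

int bar(void)
{
    return 42;
}

/* Distinct functions must compare unequal, so with GCC this is
   expected to return 1 even when foo and bar share their code. */
int functions_differ(void)
{
    return foo != bar;
}
```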