Saturday, July 1, 2017

Hard-coded hardware addresses in C/C++

I read a blog post “reinterpret_cast vs. constant expression” that discusses how to get rid of C-style casts for code such as
#define FOO ((struct S*)0xdff000)
But there is no need to have hard-coded addresses in the code – it is better to declare a normal structure
extern struct S hw_s;
and tell the linker to place it at address 0xdff000 using an assembly file containing the lines
.global hw_s
hw_s = 0xdff000
FOO can now be defined without a cast
#define FOO (&hw_s)
although it is probably better to use hw_s directly...
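To make this concrete, here is a minimal sketch of what the C side could look like. The post does not show the contents of struct S, so the field names (ctrl, count), the register widths, and the helper s_start are all hypothetical; only hw_s, FOO, and the base address come from the text above.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* Hypothetical register block: names, widths, and offsets are assumptions. */
struct S {
    volatile uint32_t reserved0[2]; /* offsets 0x00..0x07, unused here */
    volatile uint32_t ctrl;         /* offset 0x08 */
    volatile uint32_t reserved1;    /* offset 0x0c, unused here */
    volatile uint32_t count;        /* offset 0x10 */
};

/* Catch layout mistakes at compile time instead of on the hardware. */
static_assert(offsetof(struct S, ctrl) == 0x08, "ctrl must sit at offset 0x08");
static_assert(offsetof(struct S, count) == 0x10, "count must sit at offset 0x10");

/* Declaration only: the address is supplied by the assembly file. */
extern struct S hw_s;

/* The macro is still possible; the parentheses keep FOO->ctrl well-formed. */
#define FOO (&hw_s)

/* Hypothetical helper taking a pointer, so it can be exercised on a host
   with an ordinary struct S object instead of the real registers. */
static inline void s_start(struct S *s, uint32_t n)
{
    s->count = n;
    s->ctrl = 1;
}
```

On the target you would call s_start(&hw_s, 10), or simply assign to hw_s.count and hw_s.ctrl directly. Nothing in the C code mentions 0xdff000.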

It is good to get rid of hard-coded addresses in C/C++ code even if you do not care about ugly casts. One reason is that the compiler cannot know which objects the hard-coded addresses point to, which restricts the data flow analysis. Another reason is that hard-coded addresses interact badly with instruction selection in the backend. This is especially true for code accessing hardware registers, which expands to assignments of the form
*(volatile int *)(0xdff008) = 0;
*(volatile int *)(0xdff010) = 10;
The best way to generate this code depends on the CPU architecture, but it usually involves loading a base address into a register and storing using a “base + offset” addressing mode. The compiler therefore needs to split the addresses and re-combine them, which is complicated: there are often restrictions on which offsets are valid, the cost of materializing the base depends on its value, etc. The ARM backend is good at this, but I have seen many cases where much slower and larger code than necessary is generated for more obscure architectures. For example, GCC 7.1 for RISC-V compiles
void foo(void)
{
  *(volatile int *)(0xfffff00023400008) = 0;
  *(volatile int *)(0xfffff00023400010) = 10;
}
to
foo:
 lui a5,%hi(.LC0)
 ld a5,%lo(.LC0)(a5)
 li a4,10
 sw zero,0(a5)
 lui a5,%hi(.LC1)
 ld a5,%lo(.LC1)(a5)
 sw a4,0(a5)
 ret
.LC0:
 .dword -17591594647544
.LC1:
 .dword -17591594647536
instead of the smaller and faster
foo:
 lui a5,%hi(.LC0)
 ld a5,%lo(.LC0)(a5)
 li a4,10
 sw zero,8(a5)
 sw a4,16(a5)
 ret
.LC0:
 .dword -17591594647552
you get by writing through a normal structure.
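For reference, the structure version of foo could look roughly like the sketch below. The field names and the HW_S_FROM_ASSEMBLER switch are hypothetical, and the layout assumes a 4-byte int. On the real target, hw_s would only be declared, and the assembly file would set hw_s = 0xfffff00023400000 (the base address that .LC0 holds in the second listing). The fallback definition is there only so the sketch links and runs on an ordinary host.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

/* Hypothetical layout, chosen so the two registers land at the offsets
   used by the stores in the listing (8 and 16). Assumes 4-byte int. */
struct S {
    volatile int reserved0[2]; /* offsets 0..7 */
    volatile int ctrl;         /* offset 8 */
    volatile int reserved1;    /* offset 12 */
    volatile int count;        /* offset 16 */
};

static_assert(offsetof(struct S, ctrl) == 8, "ctrl must sit at offset 8");
static_assert(offsetof(struct S, count) == 16, "count must sit at offset 16");

#ifdef HW_S_FROM_ASSEMBLER
extern struct S hw_s; /* target build: address comes from the assembly file */
#else
struct S hw_s;        /* host stand-in: an ordinary object in RAM */
#endif

void foo(void)
{
    hw_s.ctrl = 0;   /* replaces *(volatile int *)(0xfffff00023400008) = 0;  */
    hw_s.count = 10; /* replaces *(volatile int *)(0xfffff00023400010) = 10; */
}
```

Both stores now go through the same symbol, so the compiler sees one base and two small constant offsets, and does not have to rediscover that the two 64-bit constants are close together.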

3 comments:

  1. Interesting article. However, I do not share your idea of writing linker-specific stuff to define one's register addresses: linker scripts are highly non-portable, and the code will be difficult to port to a new compiler in the future.

    I was not aware of the improved optimization when using structures, but I can confirm that the STM32 hardware libraries (for ARM) use structures to encapsulate the hardware registers of each peripheral.

    1. I agree that using linker scripts is a bad idea. But my example does not use a linker script – it creates an absolute symbol using the assembler! It may be that different toolchains have a different syntax for this, but it should be trivial to change when porting (and I guess most projects working with fixed hardware addresses contain additional assembly files to port anyway...)

  2. A compiler will have a better chance of exploiting base+displacement addressing when given hard-coded addresses than when given independent external symbols. Further, while the Standard only requires compilers to treat accesses of volatile objects as ordered with respect to other volatile accesses, and not with respect to all accesses to things that aren't guarded by "restrict", there are many cases where the stronger ordering can be useful. Achieving optimal code generation on systems using the stronger ordering would require use of "restrict" pointers, but I'm not sure why that should be seen as a problem.
