Not so position independent PIE
04 / 06 / 2022

I was adding new changes to the project I work on and (like almost everything on AMD64) the project's main binary is a 64-bit PIE (Position Independent Executable) ELF. Some (not all!) of the machines compiling the source emitted this warning:

/usr/bin/ld: supersecretfilename.s.o: warning: relocation in read-only section `.rodata'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE

I recognize the ".rodata" because it is part of the patch, but what is DT_TEXTREL, what does it have to do with PIE and why would I care? Turns out I do care about the warning, because it warns me about wasting CPU cycles! (I would never do THAT!)

AMD64 introduced a new addressing mode along with long mode - some readers probably know it as "64bit x86". Most instructions that take a memory operand can, in long mode, address it as a signed 32-bit displacement from the RIP register (the 64bit program counter); the usual base, index and scale are not available in this form. With GNU AS in Intel syntax it looks like this:

lea rax, [rip + SOME_SYMBOL_NAME]

This helps with writing PIC (Position Independent Code), which avoids the relocation problem well known from, for example, protected mode. One might think it is useful only for shared libraries, but when an executable is built as PIC one can very easily apply ASLR to it, or load it into the memory of a running process and just use it without any address fixing, i.e. relocating.

That is great: the binary addresses things by offsets from the current position of the program counter, so it doesn't care what address it was loaded at. No relocations have to be done, you just copy the binary into memory and use it.
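The arithmetic behind this can be sketched in a few lines of C. This is only an illustration: the function names and the addresses in the usage note are made up, and the "address of the next instruction" convention is how RIP-relative operands are evaluated on AMD64.

```c
#include <stdint.h>

/* Effective address of a RIP-relative operand: the address of the
   NEXT instruction plus the signed 32-bit displacement. */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp32)
{
    return next_insn_addr + (uint64_t)(int64_t)disp32;
}

/* Displacement that has to be encoded so that the instruction ending at
   next_insn_addr reaches symbol_addr (assumes the distance fits in 32 bits). */
int32_t rip_relative_disp(uint64_t next_insn_addr, uint64_t symbol_addr)
{
    return (int32_t)(symbol_addr - next_insn_addr);
}
```

If both the instruction and the symbol are moved by the same load offset, the displacement stays the same, e.g. rip_relative_disp(0x1010, 0x4000) and rip_relative_disp(0x1010 + bias, 0x4000 + bias) give the same value for any bias. That is why nothing has to be patched at load time.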

But here is the problem, right.. Turns out, my PIE is not so position independent ;]

This is a short code example that shows the problem. It can be compiled with GCC and I am using Intel syntax for asm because AT&T is a telecommunications company, not a hardware one, and it shows.

#include <stdio.h>

// Defined in asm
int exec_func(int r);

int main(void)
{
    int r = 2;

    r = exec_func(r);

    printf("%i\n", r);

    return 0;
}

.intel_syntax noprefix


.section .text

.globl exec_func
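exec_func:
    and rdi, 3                  # index 0..3; the table has only 3 entries, so 3 is out of range
    lea rax, [rip + func_arr]   # RIP-relative address of the table itself
    mov rax, [rax + rdi * 8]    # load the pointer stored in the selected slot
    call rax
    ret                         # without this we would fall through into func1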

func1:
    inc rdi
    mov rax, rdi
    ret

func2:
    add rdi, 2
    mov rax, rdi
    ret

func3:
    add rdi, 3
    mov rax, rdi
    ret


.section .rodata

.globl func_arr
func_arr:
.quad func1
.quad func2
.quad func3

All cool, right? I use RIP relative addressing, so it should all be ok. Turns out it's not.

The problem is in func_arr, which is an array of pointers to functions. What should the assembler generate here? It can't generate absolute addresses because we are building a PIE. It can't generate relative addresses because it doesn't know where in the code the table is used from. If the optimizer could prove that the table is referenced from only one place, it would be able to generate pointers relative to that one place and it would all be good.

Turns out GNU's ld handles this by generating absolute addresses there (even for PIE), and when the loader loads the binary into memory it checks for the DT_TEXTREL entry in the dynamic section and does the usual relocation on those entries!! That invokes the whole relocation procedure when loading my program, making program start longer.

For completeness, documentation of DT_TEXTREL from man elf:

              DT_TEXTREL
                     Absence of this entry indicates that no relocation
                     entries should apply to a nonwritable segment
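One way to see whether a binary carries this marker is to walk its dynamic section. Below is a minimal sketch, assuming Linux on AMD64 with a glibc-style <elf.h>. The function names are mine. The DF_TEXTREL check is there because the same information can also be recorded as a bit in DT_FLAGS. _DYNAMIC is the linker-provided start of the dynamic section; I declare it weak so a fully static build still links.

```c
#include <elf.h>
#include <stddef.h>

/* Start of the running binary's dynamic section, provided by the linker.
   Weak, so that it is simply NULL when there is no dynamic section. */
extern Elf64_Dyn _DYNAMIC[] __attribute__((weak));

/* Returns 1 if the given dynamic section announces text relocations,
   0 otherwise (including when there is no dynamic section at all). */
int has_text_relocations(const Elf64_Dyn *dyn)
{
    if (dyn == NULL)
        return 0;

    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_TEXTREL)
            return 1;
        if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_TEXTREL))
            return 1;
    }
    return 0;
}

/* Convenience wrapper for the binary this code is linked into. */
int running_binary_has_textrel(void)
{
    return has_text_relocations(_DYNAMIC);
}
```

Calling running_binary_has_textrel() from the example program above should report the marker when the linker printed the warning; taking the table as a parameter keeps the scanning logic usable on synthetic data as well.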

Using an array of pointers to solve some kind of problem is common practice, so I started wondering what code GCC would generate when I write this in C. I was getting this warning because I was writing assembly. We were using arrays of pointers throughout the C code and we had never seen the warning before.
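For reference, a C counterpart of the assembly table could look like the sketch below. It is not the project's code: the function bodies just mirror the assembly example, and the modulo clamp is my addition so that the index can never run past the table. Compiling something like this with gcc -fPIE -S (or inspecting the object with readelf -S) is one way to see which section the table ends up in; an optimizer is free to remove the table entirely, so low optimization levels are easier to read.

```c
static int func1(int x) { return x + 1; }
static int func2(int x) { return x + 2; }
static int func3(int x) { return x + 3; }

/* Constant table of absolute function addresses. In a PIE every slot
   has to be fixed up at load time, even though the table is const. */
static int (*const func_arr[])(int) = { func1, func2, func3 };

int exec_func(int r)
{
    /* Clamp the index to the table size (assumption: the assembly
       version masks with 3, which allows one index too many). */
    unsigned idx = (unsigned)r % 3u;

    /* Like the assembly, pass the clamped index as the argument. */
    return func_arr[idx]((int)idx);
}
```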

So my GCC generated this table not in ".rodata" (which would have made sense to me because it is read-only throughout program execution), but in ".data.rel.ro". To deal with read-only data that needs relocation, GCC would either place the data into .data.rel.ro or create DT_TEXTREL so that the loader knows it has to do the relocation. Funny thing is that because relocation needs write privileges, the loader first has to map those read-only sections as writable, apply the relocations and then mprotect them back to read-only.

Lots of work, right? IMHO both .data.rel.ro and DT_TEXTREL in a binary are a bad idea, so I propose a fix for the engineers who care ;] Instead of keeping pointers to functions in read-only sections of your binary, just use trampolines, like this (pseudo C code):

// Assumed to be defined elsewhere
void func0(void);
void func1(void);

void call_me_a_func(int index)
{
    switch (index) {
        case 0:
            func0();
            break;
        case 1:
            func1();
            break;
        // ... and so on
    }
}

Or use tools (perhaps the compiler) to generate it for you, perhaps slightly more efficiently. Depending on the use case, trade-offs and target machine it is possible to make it work better in a number of ways, but this would need a blog post of its own, so I am going to just leave it at that.
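One of those ways, sketched under assumptions: store distances instead of pointers. The table below holds 32-bit offsets from a base label, which are link-time constants and need no fixing at load time. It relies on the "labels as values" GNU C extension (supported by GCC and Clang, not ISO C), and the function name and bodies are mine, mirroring the assembly example.

```c
int exec_func_offsets(int r)
{
    /* Distances between labels inside this function. They do not depend
       on where the binary is loaded, so the table can stay truly read-only. */
    static const int offsets[] = {
        &&do_func1 - &&do_func1,
        &&do_func2 - &&do_func1,
        &&do_func3 - &&do_func1,
    };

    unsigned idx = (unsigned)r % 3u;   /* clamp to the table size */
    int x = (int)idx;

    /* Base label address (computed RIP-relative) + stored distance. */
    goto *(&&do_func1 + offsets[idx]);

do_func1:
    return x + 1;
do_func2:
    return x + 2;
do_func3:
    return x + 3;
}
```

The cost is one extra addition per dispatch and a table that only works for targets inside a single function; whether that beats a plain switch depends on the compiler and the workload.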

There is one more thing I wanted to share. As I mentioned earlier, not all the machines were showing the warning about creating DT_TEXTREL. When I dug into the sources of GNU's ld I found out that this is because some late optimizer might actually change the code so that generating DT_TEXTREL is not necessary. I imagine it does so by proving the table can only be used from one place and generating RIP-relative addresses from that place, because changing my section to .data.rel.ro seems wrong to me, but I forgot to actually check that ;(.