Sysprog

Operating System

Publish Date: 2022-10-27

Lab 3: A Minimal Debugger

Exercise 1:

Now read the source file and answer the following questions.

How is the regs command implemented?

Solution

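Define a variable regs of type struct user_regs_struct to receive the register set.
Call ptrace(PTRACE_GETREGS, c->pid, 0, &regs) so that the kernel copies the register values into regs.
Print the value of each register.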
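The three steps can be sketched roughly as follows, assuming Linux on x86-64. The function name print_regs is hypothetical (it is not the name used in ssedb), and only three registers are printed to keep the sketch short. On other platforms the sketch simply reports failure.

```c
#include <stdio.h>
#include <sys/types.h>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/ptrace.h>
#include <sys/user.h>
#endif

/* Hypothetical helper: print a few registers of a stopped tracee.
 * Returns 0 on success and -1 if the registers cannot be read. */
int print_regs(pid_t pid)
{
#if defined(__linux__) && defined(__x86_64__)
    /* Step 1: a variable that receives the register set. */
    struct user_regs_struct regs;

    /* Step 2: ask the kernel to copy the tracee's registers into regs.
     * This fails unless pid is a stopped process traced by the caller. */
    if (ptrace(PTRACE_GETREGS, pid, 0, &regs) == -1) {
        perror("PTRACE_GETREGS");
        return -1;
    }

    /* Step 3: print the values, zero-padded to 16 hex digits. */
    printf("rip 0x%016llx\n", (unsigned long long)regs.rip);
    printf("rsp 0x%016llx\n", (unsigned long long)regs.rsp);
    printf("rax 0x%016llx\n", (unsigned long long)regs.rax);
    return 0;
#else
    /* The register layout above is specific to x86-64 Linux. */
    (void)pid;
    return -1;
#endif
}
```

Checking the return value of ptrace matters here: if the call fails, regs is left uninitialized and printing it would show garbage.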

Exercise 2:

Now, how is the breakpoint command b 0xaddr implemented? In fact, there is a serious bug in the current implementation we gave you. Let's find out where this bug is. First run this:

objdump -d hello

and figure out the address of the function print; suppose that address is 0xaddr on your machine. Now, in ssedb, set a breakpoint at address 0xaddr by typing (remember that this address must be in hexadecimal, with a leading 0x):

(ssedb) b 0xaddr

Now peek at the registers:

(ssedb) regs

What's the value of rip? Is this value right? Why?

And then disassemble the content at address 0xaddr:

(ssedb) x/x 0xaddr

What's there? Have you detected the bug? How can this bug be fixed?

Solution

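The value of rip has to be considered from two angles. rip holds the address of the next instruction. Setting a breakpoint overwrites the first byte at the target address with 0xcc, the int 3 software-interrupt instruction. When the child executes it, the child stops with SIGTRAP and the parent is notified. Since int 3 has already executed, the value of rip is technically correct: it really does point to the instruction after int 3. This is not what we want, though. We overwrote the first byte only to make the child stop at the breakpoint, so the original instruction at that address has not executed yet. The parent therefore has to restore the breakpoint site, that is:

rip = rip - 1
change 0xcc back to the original value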

case CMD_KIND_BREAK:{
    unsigned long brk = 0x00000000000000cc;   // int 3 opcode
    unsigned long data;
    unsigned long back;     // copy of the original data

    data = ptrace(PTRACE_PEEKDATA, c->pid, c->u.addr, 0);
    back = data;
    printf ("init data     = %lx\n", data);
    // replace the lowest byte (the byte at c->u.addr) with int 3
    data = (data & 0xffffffffffffff00) | brk;
    printf ("revised data  = %lx\n", data);
    ptrace(PTRACE_POKEDATA, c->pid, c->u.addr, data);
    // the child will stop at the int 3
    ptrace(PTRACE_CONT, c->pid, 0, 0);
    waitChild ();

    // move rip back to the breakpoint address
    struct user_regs_struct regs;
    ptrace (PTRACE_GETREGS, c->pid, 0, &regs);
    regs.rip--;
    ptrace (PTRACE_SETREGS, c->pid, 0, &regs);
    // write back the original data
    ptrace(PTRACE_POKEDATA, c->pid, c->u.addr, back);

    return;
}
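The byte arithmetic in the handler above can be checked in isolation, without a traced process. The sketch below assumes a little-endian x86-64 target, where the least significant byte of the word returned by PTRACE_PEEKDATA is the byte stored at the breakpoint address. The helper names insert_int3, is_int3 and breakpoint_address are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

#define INT3_OPCODE 0xccULL   /* one-byte int 3 instruction */

/* Hypothetical helper: replace the lowest byte of a word with int 3.
 * On little-endian x86-64 this is the byte at the breakpoint address. */
uint64_t insert_int3(uint64_t word)
{
    return (word & ~0xffULL) | INT3_OPCODE;
}

/* Hypothetical helper: does the word start with an int 3? */
int is_int3(uint64_t word)
{
    return (uint8_t)word == 0xcc;
}

/* Hypothetical helper: after the trap, rip points one byte past the
 * int 3, so the breakpoint address is rip - 1. */
uint64_t breakpoint_address(uint64_t rip_after_trap)
{
    return rip_after_trap - 1;
}
```

Restoring the breakpoint is then just writing the saved original word back with PTRACE_POKEDATA and setting rip to breakpoint_address(rip).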

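The leading zeros are not displayed. Specifying the correct format when printing is enough, i.e. printf ("%lx: %016lx\n", addr, data)

The output shows that the lowest byte has been changed back to its original value.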
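As a small illustration of the format fix, the sketch below formats into a buffer instead of printing, so the effect of %016lx is easy to inspect. format_word is a hypothetical helper name, and the address used in the usage note is made up.

```c
#include <stdio.h>

/* Hypothetical helper: format "addr: data" with data zero-padded to
 * 16 hex digits, as x/x should display a 64-bit word.
 * Returns the number of characters written (excluding the NUL). */
int format_word(char *buf, size_t size, unsigned long addr, unsigned long data)
{
    return snprintf(buf, size, "%lx: %016lx", addr, data);
}
```

For example, format_word(buf, sizeof buf, 0x401136UL, 0xccUL) yields the string 401136: 00000000000000cc, whereas a plain %lx would print only cc for the data.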

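Improvement

Because the provided code runs ptrace(PTRACE_CONT, c->pid, 0, 0); immediately after b addr, the traced program hits the breakpoint right away. I therefore removed the CONT call so that b only sets the breakpoint: it writes an int 3 instruction and saves the original data. A struct breakpoint is defined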

struct breakpoint{
    char *name;             // name stored by update_breakpoint_table()
    long addr;
    unsigned long origin;
};

to hold the breakpoint's address and original data, while the code that restores the breakpoint site after a real hit moves into case CMD_KIND_CONT

case CMD_KIND_CONT:{
    ptrace(PTRACE_CONT, c->pid, 0, 0);
    int status;
    wait (&status);

    if(WIFEXITED(status)) {     // the child exited: no registers left to read
        printf("process %d has exited!\n", c->pid);
        return;
    }

    // check whether the byte just before rip is an int 3
    struct user_regs_struct regs;
    ptrace (PTRACE_GETREGS, c->pid, 0, &regs);
    unsigned long long rip = regs.rip - 1;
    unsigned long data = ptrace(PTRACE_PEEKDATA, c->pid, rip, 0);
    if((uint8_t)data == 0xcc) {      // hit a breakpoint
        // move rip back to the breakpoint address
        regs.rip--;
        ptrace(PTRACE_SETREGS, c->pid, 0, &regs);
        // write back the original data
        unsigned long origin = get_origin();
        ptrace(PTRACE_POKEDATA, c->pid, rip, origin);
        printf("meet the break point %d at 0x%016llx\n", idx, rip);
    }
    return;
}
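The handler above calls get_origin(), whose body is not shown in this post. One plausible shape, offered purely as an assumption, is a linear search of the breakpoint table keyed by address. In the sketch below, add_breakpoint and find_breakpoint are hypothetical names, and the table is a simplified stand-in for the bp array.

```c
#include <stddef.h>

#define MAX_BREAKPOINTS 100

struct breakpoint {
    const char *name;       /* symbol or text the user typed */
    long addr;              /* address where int 3 was written */
    unsigned long origin;   /* original word at that address */
};

static struct breakpoint bp[MAX_BREAKPOINTS];
static int len = 0;         /* number of breakpoints stored */

/* Hypothetical helper: record a breakpoint.
 * Returns its index, or -1 if the table is full. */
int add_breakpoint(const char *name, long addr, unsigned long origin)
{
    if (len >= MAX_BREAKPOINTS)
        return -1;
    bp[len].name = name;
    bp[len].addr = addr;
    bp[len].origin = origin;
    return len++;
}

/* Hypothetical helper: find the breakpoint set at addr.
 * Returns NULL if no breakpoint was set there. */
const struct breakpoint *find_breakpoint(long addr)
{
    for (int i = 0; i < len; i++)
        if (bp[i].addr == addr)
            return &bp[i];
    return NULL;
}
```

In the CMD_KIND_CONT handler the lookup key would be regs.rip - 1, and a NULL result would mean the 0xcc byte belongs to the program itself rather than to a breakpoint set by the debugger.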

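As a result, b addr only sets a breakpoint and does not resume the traced program, while c resumes the traced program and can hit the breakpoint. Since the breakpoint information is now available, I also added a new command, info, which prints all breakpoints. It is implemented as follows:

int len = 0;	// number of breakpoints
int idx = 0;	// number of breakpoints hit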

struct breakpoint bp[100];

void update_breakpoint_table(char *name, long addr, unsigned long origin){
    bp[len].name = name;
    bp[len].addr = addr;
    bp[len].origin = origin;
    len++;
}

A new case is added in execCommand

case CMD_KIND_INFO:{
    for(int i = 0; i < len; i++)
        printf("break point %d  :  0x%016lx\n", i + 1, (unsigned long)bp[i].addr);
    return ;
}

The final result:

Exercise 3:

There is also a command to disassemble binary into assembly instructions, but it has not been completed. Now run

(ssedb) x/i 0xaddr

you'll see an error message indicating the file position where you should supply code. Implement it. (Hint: manual disassembling is tedious and error-prone, so you may find some libraries helpful, such as zydis.)

Solution

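I use the third-party library zydis. Instructions are stored little-endian on my machine, so the 8-byte hex value returned by each read has to be reversed and stored one byte (two hex digits) at a time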

ZyanU8 arr[80];  // define byte arr
long addr = c->u.addr;
for (int i = 0; i < 10; i++){
	long data = ptrace(PTRACE_PEEKDATA, c->pid, addr, 0);
	for(int j = 0; j < 8; j++){
		arr[8 * i + j] = data & 0x00000000000000ff;
		data = ((data & 0xffffffffffffff00) >> 8);
	}
	addr += 8;
}

This way the machine code is stored in arr in ascending address order, ready for disassembly. It is then simply handed to zydis for decoding

ZyanU64 runtime_address = c->u.addr;

// Loop over the instructions in our buffer.
ZyanUSize offset = 0;
ZydisDisassembledInstruction instruction;

while (ZYAN_SUCCESS(ZydisDisassembleIntel(
	/* machine_mode:    */ ZYDIS_MACHINE_MODE_LONG_64,
	/* runtime_address: */ runtime_address,
	/* buffer:          */ arr + offset,
	/* length:          */ sizeof(arr) - offset,
	/* instruction:     */ &instruction
))) {
	printf("%016" PRIX64 "  %s\n", runtime_address, instruction.text);
	offset += instruction.info.length;
	runtime_address += instruction.info.length;
}
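The byte-unpacking loop used earlier in this solution can be tested on its own, without ptrace or zydis. The sketch below assumes a little-endian target, so the least significant byte of each word is the byte at the lowest address. unpack_word_le is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split one 64-bit word into 8 bytes in address
 * order, assuming the word was read from little-endian memory. */
void unpack_word_le(uint64_t word, uint8_t out[8])
{
    for (size_t j = 0; j < 8; j++) {
        out[j] = (uint8_t)(word & 0xff);   /* lowest byte = lowest address */
        word >>= 8;                        /* move on to the next byte */
    }
}
```

For example, the word 0xe5894855fa1e0ff3 unpacks to the bytes f3 0f 1e fa 55 48 89 e5, which is the encoding of endbr64, push rbp and mov rbp, rsp in that order, a common function prologue on x86-64.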

Challenge:

Another feature missing from ssedb is debugging symbols. For instance, when setting breakpoints, we'd like to just type a symbolic name, such as:

(ssedb) b main

instead of an ugly hexadecimal address for main. Implement this feature.

Solution

I call the BFD library, which can record each symbol together with its value (address). Given a symbol as input, we only need to look up its address

asymbol **symbol_table;	// symbol table
long num_symbols;		// length

// store symbol's name and value
void bfd_func(char *file){
    long storage_needed;
    bfd *abfd;

    bfd_init(); // magic

    abfd = bfd_openr(file, NULL);
    assert(abfd != NULL);
    bfd_check_format(abfd, bfd_object);
    storage_needed = bfd_get_symtab_upper_bound(abfd);
    assert(storage_needed >= 0);

    symbol_table = (asymbol**)malloc(storage_needed);
    assert(symbol_table != 0);
    num_symbols = bfd_canonicalize_symtab(abfd, symbol_table);
    assert(num_symbols >= 0);
}

// find symbol address; returns -1 if the symbol is not found
long symbol_address(char *s){
    symbol_info symbolinfo;
    for(int i = 0; i < num_symbols; i++){
        if (symbol_table[i]->section == NULL) continue;

        bfd_symbol_info(symbol_table[i], &symbolinfo);
        if (strcmp(s, symbolinfo.name))  continue;

        printf("%s : 0x%lx\n", symbolinfo.name, (unsigned long)symbolinfo.value);
        return (long)symbolinfo.value;
    }
    return -1;
}

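main.c calls bfd_func() to obtain the symbol table, and parseCommand then decides whether the input is an address or a symbol. If it is a symbol, symbol_address() is called to look up its address. As a side effect, x/i symbol and x/x symbol work as well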
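The post does not show how parseCommand tells an address from a symbol. One simple rule, given here only as an assumption, is to treat the argument as an address when it is 0x followed entirely by hex digits, and as a symbol otherwise. parse_hex_address is a hypothetical helper name.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out if arg
 * is "0x" followed only by hex digits; returns 0 otherwise, in which
 * case the caller can treat arg as a symbol name. */
int parse_hex_address(const char *arg, unsigned long *out)
{
    if (arg[0] != '0' || (arg[1] != 'x' && arg[1] != 'X') || arg[2] == '\0')
        return 0;

    /* Reject anything that is not a hex digit after the prefix. */
    for (const char *p = arg + 2; *p != '\0'; p++)
        if (!isxdigit((unsigned char)*p))
            return 0;

    errno = 0;
    unsigned long value = strtoul(arg, NULL, 16);  /* base 16 accepts the 0x prefix */
    if (errno == ERANGE)
        return 0;                                  /* does not fit in unsigned long */

    *out = value;
    return 1;
}
```

With this rule, b 0x401136 is taken as an address, while b main fails the check and would fall through to symbol_address("main").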

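Summary

Overall this lab was not hard and the provided code was quite complete. The difficult part was adding the third-party libraries to the project, which takes some cmake knowledge. At first I thought Exercise 2 was very simple and that the leading zeros were the only problem; I did not understand why the first byte had to be replaced with 0xcc. Only after implementing x/i did I realize that an int 3 instruction is needed to stop the child process. That made the problem interesting: once control returns to the parent, we have to restore the original contents at the breakpoint.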

Paranoid

http://Paran0idy.github.io/posts/1234.html