Debugging AMD64 crash dumps is slightly trickier than debugging SPARC ones because AMD64 has no register windows. To determine the value that was originally passed into a function, we cannot simply look at a register in the previous register window; we must use the stack instead.

This topic comes up frequently, and it's one that I don't get enough practise at (and therefore tend to forget), so it's worthy of a blog entry. If you're really interested, all of this (and a lot more) is covered in Frank Hofmann's excellent book, The Solaris Operating System on x86 Platforms: Crashdump Analysis, Operating System Internals, which I can't recommend highly enough.

    > ffffffff9bc9cc60::findstack -v
    stack pointer for thread ffffffff9bc9cc60: fffffe800145d890
    [ fffffe800145d890 _resume_from_idle+0xf8() ]
      fffffe800145d8c0 swtch+0x12a()
      fffffe800145d8e0 cv_wait+0x68()
      fffffe800145d910 pr_p_lock+0x79()
      fffffe800145d960 pr_lookup_piddir+0x7e()
      fffffe800145d9c0 prlookup+0xd4()
      fffffe800145da10 fop_lookup+0x35()
      fffffe800145dbe0 lookuppnvp+0x1bf()
      fffffe800145dc50 lookuppnat+0xf9()
      fffffe800145dd10 lookupnameat+0x86()
      fffffe800145de40 vn_openat+0x2aa()
      fffffe800145def0 copen+0x1e5()
      fffffe800145df00 open+0x19()
      fffffe800145df10 sys_syscall+0x17b()

In the above stack we are interested in finding the first argument to pr_lookup_piddir(), which is a vnode_t pointer. We know that prlookup() calls pr_lookup_piddir(), so it must load that argument into the register from which pr_lookup_piddir() reads it. A callee expects to find its first argument (arg0) in register %rdi (this is part of the AMD64 ABI; more details are in Frank's book and in the Solaris 64-bit Developer's Guide: AMD64 ABI Features). By disassembling the calling function we can therefore check where %rdi comes from:

    > prlookup+0xd4::dis
    prlookup+0xae:  orl    %edx,%eax
    prlookup+0xb0:  testb  $0x1,%al
    prlookup+0xb2:  jne    +0xbf
    prlookup+0xb8:  cmpl   $0x24,%r12d
    prlookup+0xbc:  je     +0xb5
    prlookup+0xc2:  movl   %r12d,%edx
    prlookup+0xc5:  xorl   %eax,%eax
    prlookup+0xc7:  movq   %r14,%rsi
    prlookup+0xca:  movq   %rbx,%rdi
    prlookup+0xcd:  call   *0xfffffffffbd0e460(,%rdx,8)
    prlookup+0xd4:  cmpq   $0x1,%rax

At prlookup+0xca (just prior to calling pr_lookup_piddir()) we see that the contents of register %rbx are moved to the callee's input register, %rdi. The call itself, at prlookup+0xcd, is indirect through a table indexed by %rdx; the stack trace tells us that the target was pr_lookup_piddir(). We now know that at the time we enter pr_lookup_piddir() both %rdi and %rbx contain the same value (a vnode_t pointer). %rbx is a callee-saved register, so if pr_lookup_piddir() is to use %rbx for scratch it must save the value so that it can restore it before returning control to prlookup().
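The register rules relied on here can be written down as a small lookup. This is only a sketch of the integer/pointer part of the AMD64 System V calling convention; the helper names `arg_register` and `is_callee_saved` are made up for illustration and are not part of mdb or any Solaris tool.

```python
# Integer/pointer argument registers in the AMD64 System V ABI, in order.
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

# Registers a callee must preserve across a call (the stack pointer,
# %rsp, is also restored on return but is left out of this set).
CALLEE_SAVED = {"rbx", "rbp", "r12", "r13", "r14", "r15"}


def arg_register(index):
    """Return the register carrying integer argument `index` (0-based),
    or None if that argument is passed on the stack instead."""
    if 0 <= index < len(ARG_REGS):
        return ARG_REGS[index]
    return None


def is_callee_saved(reg):
    """True if a callee must save `reg` before using it as scratch."""
    return reg in CALLEE_SAVED


print(arg_register(0))         # rdi
print(is_callee_saved("rbx"))  # True
print(is_callee_saved("rdi"))  # False
```

Because %rdi is not callee-saved, its value may be long gone by the time of the crash dump; the copy in %rbx is the one a well-behaved callee has to keep somewhere.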

We can disassemble pr_lookup_piddir() to get an idea of what it's doing (truncated for this example):

    > pr_lookup_piddir::dis
    pr_lookup_piddir:       pushq  %rbp
    pr_lookup_piddir+1:     movq   %rsp,%rbp
    pr_lookup_piddir+4:     pushq  %r15
    pr_lookup_piddir+6:     movq   %rdi,%r15
    pr_lookup_piddir+9:     pushq  %r14
    pr_lookup_piddir+0xb:   xorl   %r14d,%r14d
    pr_lookup_piddir+0xe:   pushq  %r13
    pr_lookup_piddir+0x10:  movq   %rsi,%r13
    pr_lookup_piddir+0x13:  pushq  %r12
    pr_lookup_piddir+0x15:  pushq  %rbx

Above, we save the caller's frame pointer (pushq %rbp) and set up our own (movq %rsp,%rbp) before pushing onto the stack the registers that we wish to reuse (the remaining pushq instructions).

Of particular interest is pr_lookup_piddir+0x15, where we push %rbx onto the stack. From the top of the function this is the sixth pushq instruction, and %rbx is therefore the sixth register that we have stored on the stack. We can use this knowledge to find the vnode_t pointer that was passed into pr_lookup_piddir().
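The counting can be turned into simple arithmetic. The sketch below takes the prologue text from the ::dis output above and pr_lookup_piddir()'s frame pointer from the ::findstack output, and works out which stack slot holds the saved %rbx. The helpers `push_order` and `saved_slot` are hypothetical names, and the calculation assumes that the prologue does nothing but 8-byte pushes (no other adjustment of %rsp) and that the first push is the saved %rbp at the frame pointer itself.

```python
import re

# Prologue of pr_lookup_piddir(), copied from the ::dis output above.
PROLOGUE = """\
pr_lookup_piddir: pushq %rbp
pr_lookup_piddir+1: movq %rsp,%rbp
pr_lookup_piddir+4: pushq %r15
pr_lookup_piddir+6: movq %rdi,%r15
pr_lookup_piddir+9: pushq %r14
pr_lookup_piddir+0xb: xorl %r14d,%r14d
pr_lookup_piddir+0xe: pushq %r13
pr_lookup_piddir+0x10: movq %rsi,%r13
pr_lookup_piddir+0x13: pushq %r12
pr_lookup_piddir+0x15: pushq %rbx
"""


def push_order(disassembly):
    """Return the registers pushed by the prologue, in push order."""
    return re.findall(r"pushq\s+%(\w+)", disassembly)


def saved_slot(frame_pointer, disassembly, reg):
    """Address of the slot where `reg` was pushed.

    Assumes the first push (%rbp) lands at `frame_pointer` and that each
    later push sits 8 bytes lower, with no other %rsp adjustment between.
    """
    index = push_order(disassembly).index(reg)  # 0 for %rbp
    return frame_pointer - 8 * index


FP = 0xfffffe800145d960  # pr_lookup_piddir()'s fp from ::findstack
print(push_order(PROLOGUE))                  # ['rbp', 'r15', 'r14', 'r13', 'r12', 'rbx']
print(hex(saved_slot(FP, PROLOGUE, "rbx")))  # 0xfffffe800145d938
```

Real prologues can interleave a subq on %rsp with the pushes, in which case the offsets need adjusting by hand; the disassembly is always the final authority.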

Looking back at the ::findstack output, we can see the function names on the right and the frame pointers on the left. pr_lookup_piddir() is the function that is pushing to the stack, so we'll start with pr_p_lock()'s fp (fffffe800145d910) and print down the stack, including pr_lookup_piddir()'s fp (fffffe800145d960):

    > fffffe800145d910,10/naP
    0xfffffe800145d910:
    0xfffffe800145d910:             0xfffffe800145d960
    0xfffffe800145d918:             pr_lookup_piddir+0x7e
    0xfffffe800145d920:             0xffffffff80037008
    0xfffffe800145d928:             0xc
    0xfffffe800145d930:             0xffffffffb0939700
    0xfffffe800145d938:             0xffffffffb297e240
    0xfffffe800145d940:             2
    0xfffffe800145d948:             0xffffffffb0939700
    0xfffffe800145d950:             0xfffffe800145dab0
    0xfffffe800145d958:             0xfffffe800145da88
    0xfffffe800145d960:             0xfffffe800145d9c0
    0xfffffe800145d968:             prlookup+0xd4
    0xfffffe800145d970:             0xfffffe800145d990
    0xfffffe800145d978:             0x19c691b80
    0xfffffe800145d980:             0xffffffff816b6440
    0xfffffe800145d988:             5
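The dump above follows the usual frame-pointer layout: the word at each frame pointer holds the caller's frame pointer, and the word 8 bytes above it holds the return address into the caller. As a rough illustration, the sketch below walks that chain using only four words copied from the dump; `MEMORY` and `walk_frames` are illustrative names, and the walk stops as soon as it reaches an address that was not copied in.

```python
# Words copied from the dump above: address -> value stored there.
# Return addresses are kept as the symbolic strings that mdb printed.
MEMORY = {
    0xfffffe800145d910: 0xfffffe800145d960,  # saved %rbp (caller's fp)
    0xfffffe800145d918: "pr_lookup_piddir+0x7e",  # return address
    0xfffffe800145d960: 0xfffffe800145d9c0,  # saved %rbp (caller's fp)
    0xfffffe800145d968: "prlookup+0xd4",  # return address
}


def walk_frames(memory, fp):
    """Follow saved frame pointers, collecting (fp, return address).

    The return address stored at fp+8 points into the caller, whose own
    frame pointer is the value stored at fp.
    """
    frames = []
    while fp in memory:
        frames.append((fp, memory[fp + 8]))
        fp = memory[fp]
    return frames


for fp, ret in walk_frames(MEMORY, 0xfffffe800145d910):
    print(hex(fp), "returns to", ret)
```

The two frames it reports match the pr_lookup_piddir+0x7e and prlookup+0xd4 lines of the ::findstack output, which is a useful sanity check before trusting any slot counted relative to those frame pointers.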

At 0xfffffe800145d960 we have prlookup()'s fp; this was the first register that pr_lookup_piddir() pushed to the stack. Counting five values up the listing (towards lower addresses, since the stack grows downwards) we get to 0xfffffe800145d938, which holds the sixth value pushed onto pr_lookup_piddir()'s stack. This value, 0xffffffffb297e240, is the value of prlookup()'s %rbx register when pr_lookup_piddir() was called. As we've shown above, this is also the register from which %rdi was loaded, and it is therefore a vnode_t pointer:

    > 0xffffffffb297e240::print vnode_t v_path
    v_path = 0xffffffff91155060 "/proc/21391"

    > 0t21391::pid2proc|::ps -f
    S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
    R  21391  21379  21350  21265  41311 0x4a004000 ffffffff836021a0 /app/common/java/jdk1.5.0_14/bin/amd64/java -server -Xms1g -Xmx1g -Duser.langua

Just as expected! Furthermore, since we were waiting for a CV we were able to determine from the vnode what path we were waiting on and, since this was in /proc, we could even look up the process.
