
uvm_fault in softdep_fsync


I must have a talent for getting myself into trouble with the kernel. I'm
running 2.8-stable on i386. The problem is pretty much reproducible,
although it takes some weird conditions to trigger it: I ssh in and run
xemacs in a terminal, and if there's a network problem, the kernel panics.
By inspecting *_curproc I found that the crash happens in xemacs:

uvm_fault (0xec4bf19c, 0, 0, 1) -> 1
kernel: page fault trap, code=0
Stopped at _softdep_fsync + 0x1d: mov    0x20(%edx),%eax
ddb> t
_sys_fsync at _sys_fsync+0x65
_syscall() at _syscall+0x242
-- syscall (number 95)

show registers shows %edx is 0.

Here's _softdep_fsync disassembled:
e0241ccc <_softdep_fsync>:
e0241ccc:       55                      push   %ebp
e0241ccd:       89 e5                   mov    %esp,%ebp
e0241ccf:       83 ec 3c                sub    $0x3c,%esp
e0241cd2:       57                      push   %edi
e0241cd3:       56                      push   %esi
e0241cd4:       53                      push   %ebx
e0241cd5:       a1 ac 6c 3c e0          mov    0xe03c6cac,%eax
e0241cda:       89 45 dc                mov    %eax,0xffffffdc(%ebp)
e0241cdd:       8b 55 08                mov    0x8(%ebp),%edx
e0241ce0:       8b 92 a4 00 00 00       mov    0xa4(%edx),%edx
e0241ce6:       89 55 e4                mov    %edx,0xffffffe4(%ebp)
e0241ce9:       8b 42 20                mov    0x20(%edx),%eax

Looking at the softdep_fsync source, I suspect it's ip that is 0:

	struct vnode *vp;	/* the "in_core" copy of the inode */
	struct diradd *dap, *olddap;
	struct inodedep *inodedep;
	struct pagedep *pagedep;
	struct worklist *wk;
	struct mount *mnt;
	struct vnode *pvp;
	struct inode *ip;
	struct buf *bp;
	struct fs *fs;
	struct proc *p = CURPROC;		/* XXX */
	int error, ret, flushparent;
	struct timespec ts;
	ino_t parentino;
	ufs_lbn_t lbn;

	ip = VTOI(vp);
	fs = ip->i_fs; <--- I guess it breaks here (0x20 is i_fs offset
                            in struct inode)

Note the address 0x404f280f: it's actually inside __thread_sys_fsync
in libc (I have to confess, xemacs was linked against libc.so.24.4):

0005e808 <__thread_sys_fsync>:
   5e808:       b8 5f 00 00 00          mov    $0x5f,%eax
   5e80d:       cd 80                   int    $0x80
here>   5e80f:       72 ef                   jb     5e800 <_mpool_sync+0x268>
   5e811:       c3                      ret    

I guess mount output would be in order:

[greg@home greg]$ mount
/dev/wd0a on / type ffs (local)
/dev/wd0d on /var type ffs (local)
/dev/wd0e on /usr type ffs (local)
/dev/wd0f on /home type ffs (local)
/dev/wd0g on /usr/src type ffs (asynchronous, NFS exported, local)
mfs:31060 on /tmp type mfs (asynchronous, local, size=256000 512-blocks)
kernfs on /kern type kernfs (local)
amd:8198 on /a type nfs (v2, udp, intr, timeo=100, retrans=100)

Funny thing, I don't have any file systems with softdeps enabled...

I would really appreciate it if you could give me a hint as to what I can
do to fix it.


P.S. I sent this info to art@, but I think he's away or too busy, so
maybe somebody else could take a look at this?