Server Help

ASSS Questions - Lots of segmentation faults?

Initrd.gz - Wed May 06, 2009 8:34 pm
Post subject: Lots of segmentation faults?
Hi all.
My ASSS program keeps closing, giving me a "Segmentation fault" message and leaving a file in the root directory. Is there a way to stop the crashes, or at least a script that reliably restarts ASSS, like in HS?

If it helps at all, I checked ASSS with Valgrind, and it detected several leaks in the core modules (I remember the config module was one).
Dr Brain - Wed May 06, 2009 8:43 pm
Post subject:
What does the file look like? If it's a backtrace, you can find out why the zone was crashing.

Check out scripts/run-asss

You'll have to edit ASSSHOME to get it to work. After that, just run the script.
Initrd.gz - Wed May 06, 2009 9:18 pm
Post subject:
Dr Brain wrote:
What does the file look like? If it's a backtrace, you can find out why the zone was crashing.

Currently, the backtrace files just say gdb isn't installed. It is now, so the next time it happens, it should do a dump.

Dr Brain wrote:
Check out scripts/run-asss

You'll have to edit ASSSHOME to get it to work. After that, just run the script.
Mmk, thanks. I'll look at that.
Dr Brain - Wed May 06, 2009 9:29 pm
Post subject:
Oh, to answer your question about valgrind: you'll have to remember to shut down the zone, or valgrind picks up some things that aren't really leaks. Especially player related things.

There was a minor leak in config, probably the one you noticed. The fix will be part of the 1.5.0 release. If you care, the fix is to add afree(ch) on line 500 of config.c:

Code:

--- a/src/core/config.c Tue Feb 03 15:55:10 2009 -0600
+++ b/src/core/config.c Sun Feb 08 18:24:52 2009 -0500
@@ -497,6 +497,7 @@
    removed = LLRemove(&cf->handles, ch);
    pthread_mutex_unlock(&cf->mutex);
    assert(removed);
+    afree(ch);
}

Initrd.gz - Sun May 10, 2009 9:28 pm
Post subject:
Here's the backtrace
Dr Brain - Sun May 10, 2009 9:42 pm
Post subject:
Here's the important part of that backtrace:

Code:
#7  0x0804f0f4 in afree (ptr=0x58850026) at main/util.c:136
#8  0x080952b5 in paction (p=0x8150b58, action=1) at core/filetrans.c:301
   ud = (struct upload_data *) 0x8151128


So paction in core/filetrans.c is calling afree on something that's not valid. I can't tell more from the backtrace.

Can you consistently reproduce the crash? What sort of conditions does it happen under?
Initrd.gz - Mon May 11, 2009 5:15 pm
Post subject:
I have absolutely no idea. No one goes on my server (or at least they shouldn't be at the moment). There are no recent log messages before the server crashes.

I have played on my server every now and then with another player. We can play for 30 minutes straight, using ?quickfix to mess around with the settings, and even do the speed/thrust glitch that makes you go through walls without problems.
D1st0rt - Tue May 12, 2009 8:33 am
Post subject:
A trick I learned when using Valgrind: you can get rid of the ??? trace frames in dynamically loaded modules by commenting out part of cmod.c, so the modules are never dlclose'd. This also skips afree(cmd), so it's only meant for debugging:
Code:
local int unload_c_module(mod_args_t *args)
{
   c_mod_data_t *cmd = args->privdata;
   if (cmd->main)
      if ((cmd->main)(MM_UNLOAD, mm, ALLARENAS) == MM_FAIL)
         return MM_FAIL;
   /*if (cmd->handle && !cmd->ismyself)
      dlclose(cmd->handle);
   afree(cmd);*/
   return MM_OK;
}

Initrd.gz - Tue May 26, 2009 6:46 pm
Post subject:
Mmk, we were messing around in my zone, and it crashed three times. I think it may be linked to sound macros ("%##"), but I still can't get it to crash consistently.

EDIT: I also use quickfix a bit.
Initrd.gz - Wed Jun 03, 2009 6:16 pm
Post subject:
Looks like the object passed to afree is a string allocated by astrdup. I don't see how it's going wrong.

Do you think I could rewrite filetrans in Python, since string manipulation is easier and there are no allocation/deallocation calls?
Goldeye - Wed Jun 10, 2009 4:04 am
Post subject: Re: Lots of segmentation faults?
Initrd.gz wrote:
Hi all.
My ASSS program keeps closing, giving me a "Segmentation fault" message and leaving a file in the root directory. Is there a way to stop the crashes, or at least a script that reliably restarts ASSS, like in HS?

If it helps at all, I checked ASSS with Valgrind, and it detected several leaks in the core modules (I remember the config module was one).


Btw, the file in the root directory is a core file. You can run "gdb -c <core file left behind> <asss bin file>" to inspect it.
Initrd.gz - Wed Jun 17, 2009 6:17 pm
Post subject:
I made a change in filetrans.c so that, instead of juggling string allocations around, it simply has a 256-byte character array and uses astrncpy to copy the strings. Hopefully it will work.
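
For anyone following along, here is a minimal sketch of that fixed-buffer idea in plain C. It is not the actual filetrans.c change: the struct, the UD_STR_LEN constant, and the copy_bounded helper are hypothetical names made up for illustration, and copy_bounded is a stand-in rather than ASSS's astrncpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UD_STR_LEN 256  /* hypothetical; the 256 bytes mentioned above */

/* Hypothetical stand-in for the per-player upload state. */
struct upload_sketch {
    char fname[UD_STR_LEN];
    char work_dir[UD_STR_LEN];
};

/* Copy src into dst (capacity dst_size), always NUL-terminating.
 * Returns 1 if the whole string fit, 0 if it had to be truncated. */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && dst_size > 0);
    if (src == NULL)
        src = "";

    len = strlen(src);
    if (len >= dst_size) {
        /* Too long: keep the first dst_size - 1 bytes and terminate. */
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return 0;
    }
    memcpy(dst, src, len + 1);  /* includes the terminating NUL */
    return 1;
}

/* "Clearing" a field just empties the buffer; there is nothing to free. */
static void clear_upload_sketch(struct upload_sketch *ud)
{
    ud->fname[0] = '\0';
    ud->work_dir[0] = '\0';
}
```

The trade-off is that the strings live inside the struct, so no code path can free them twice or free a stale pointer, at the cost of silently truncating anything longer than 255 characters unless the return value is checked.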
Initrd.gz - Wed Jun 24, 2009 7:33 pm
Post subject:
Nope. The random crashes have stopped, but every time someone leaves the zone, the server crashes...

EDIT:
I figured out how to use GDB, and in the function cleanup_ud it says that ud->work_dir and ud->fname are "out of range" pointers.
Dr Brain - Thu Jun 25, 2009 7:10 am
Post subject:
Try adding a watchpoint on them in GDB. See if something is corrupting them. You may have to unload the deadlock module to prevent things from dying when you're typing in commands in gdb.
Initrd.gz - Thu Jun 25, 2009 10:30 pm
Post subject:
Code:
$ gdb bin/asss
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) break core/filetrans.c:paction
Breakpoint 1 at 0x8095250: file core/filetrans.c, line 292.
(gdb) run
Starting program: /home/initrdgz/asss-1.4.4/bin/asss
[Thread debugging using libthread_db enabled]
asss 1.4.4 built at Jun 25 2009 00:59:28
Loading modules...
<loading modules removed>

Breakpoint 1, paction (p=0x814fbe8, action=0) at core/filetrans.c:292
292             struct upload_data *ud = PPDATA(p, udkey);
(gdb) watch ud
Hardware watchpoint 2: ud
(gdb) c
Continuing.
Hardware watchpoint 2: ud

Old value = (struct upload_data *) 0x8150114
New value = (struct upload_data *) 0x815016c
paction (p=0x814fbe8, action=0) at core/filetrans.c:293
293             LOCK();
(gdb) print ud->work_dir
$1 = 0x0
(gdb) c
Continuing.

Watchpoint 2 deleted because the program has left the block in
which its expression is valid.
0x08057ac1 in process_player_states (v=0x0) at core/core.c:342
342                                     DO_CBS(CB_PLAYERACTION,
(gdb) info watchpoint
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   0x08095250 in paction at core/filetrans.c:292
        breakpoint already hit 1 time
(gdb)


Have not unloaded deadlock yet.
Dr Brain - Fri Jun 26, 2009 9:29 am
Post subject:
It looks like gdb won't watch things that go out of scope. I'm not sure how to get around that. I tried setting a watch on the address on my machine, but that didn't seem to work. Instead it locked up the process.
Initrd.gz - Fri Jun 26, 2009 2:40 pm
Post subject:
Yeah, I tried that too. I think it converts the hex address into an integer and tries to watch that. I'll see if I can compile gdb manually and look for a configuration option.
Dr Brain - Fri Jun 26, 2009 3:09 pm
Post subject:
Well, a decimal integer and a hex integer are still the same address.
Initrd.gz - Fri Jun 26, 2009 9:34 pm
Post subject:
I read a document saying that Valgrind can detect this sort of thing.

I did a test. I ran:
Quote:
valgrind --trace-children=yes --tool=memcheck --leak-check=yes --track-origins=yes --log-file=valgrind.log bin/asss

Complete log file is attached. The interesting part is at around line 461:
Quote:
==8341== Invalid free() / delete / delete[]
==8341== at 0x4025E5A: free (vg_replace_malloc.c:323)
==8341== by 0x804F0F9: afree (util.c:136)
==8341== by 0x80952DE: paction (filetrans.c:303)
==8341== by 0x8057F50: process_player_states (core.c:435)
==8341== by 0x805A0DF: RunLoop (mainloop.c:63)
==8341== by 0x804CF9B: main (main.c:293)
==8341== Address 0x384c0077 is 1158375 bytes inside data symbol "temporary"

Perhaps the data has already been freed? From what I can tell, the module nulls all the pointers, and I have FREE_DOESNT_CHECK_NULL defined, so afree doesn't free null pointers.

PS. for 1.5.0, looks like afree only checks the pointer if FREE_DOESNT_CHECK_NULL is defined. It should be #ifndef or FREE_CHECKS_NULL.
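
To make the null-check discussion concrete, here is a rough sketch of a guarded free in plain C. It is not ASSS's afree: guarded_free and free_and_clear are hypothetical names, and the macro is defined locally only so the guarded path compiles. A guard like this only catches NULL. The address in the Valgrind report (0x384c0077) is non-null and is not a heap block, so a null check would let it straight through to free().

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: defined here only so this sketch compiles the guarded path. */
#define FREE_DOESNT_CHECK_NULL 1

/* Hypothetical wrapper, not ASSS's afree.
 * Returns 1 if free() was actually called, 0 if the pointer was skipped. */
static int guarded_free(void *ptr)
{
#ifdef FREE_DOESNT_CHECK_NULL
    if (ptr == NULL)
        return 0;
#endif
    free(ptr);
    return 1;
}

/* Free whatever *slot points to and reset it to NULL,
 * so a second call on the same slot is harmless. */
static int free_and_clear(char **slot)
{
    int freed;

    assert(slot != NULL);
    freed = guarded_free(*slot);
    *slot = NULL;
    return freed;
}
```

Resetting the slot right after freeing protects against a double free through that same slot, but it does nothing if the pointer value itself was overwritten with garbage beforehand, which is what the trace above looks like.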
Initrd.gz - Tue Jul 28, 2009 5:17 pm
Post subject:
Amazing. I redownloaded it and everything works...

Dr Brain - Tue Jul 28, 2009 10:26 pm
Post subject:
At least it works now, even if we'll never know what was wrong.
All times are -5 GMT