Server Help

ASSS Questions - Deadlock?

Anonymous - Sun Mar 23, 2008 6:25 pm
Post subject: Deadlock?
So I thought I had the "deadlock" situation under control by putting pthread_mutex_lock and pthread_mutex_unlock around every call to LLAdd and LLRemove, but I got a deadlock today while a buddy and I were playing around in the zone. Can someone explain to me what exactly a deadlock is and how to fix it?
-me
tcsoccerman - Sun Mar 23, 2008 7:47 pm
Post subject:
As far as I know, it happens after any crash. Review any modules you made or added. Remove modules one at a time to see if it still happens. Reproduce the deadlock situation. Test. Debug. Yes, it sucks.
JoWie - Sun Mar 23, 2008 8:03 pm
Post subject:
It can happen in two ways:

Main thread (The thread you are most likely working in) hangs / crashes
Some thread holds a mutex lock and never unlocks it (or just takes too long)


The deadlock module aborts ASSS if the main thread hasn't responded for over 10 seconds
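
How the deadlock module does this internally is not shown in this thread. The general idea of such a watchdog can be sketched as a heartbeat check: the main loop records a timestamp on every pass, and a separate checker treats the thread as hung once the timestamp is too old. The Watchdog struct and both function names below are hypothetical, not ASSS code.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical heartbeat record; not taken from the ASSS source. */
typedef struct {
    time_t last_beat;     /* when the main loop last reported in */
    double timeout_secs;  /* how long silence is tolerated */
} Watchdog;

/* Called by the main loop on every pass. */
void watchdog_beat(Watchdog *w, time_t now)
{
    assert(w != NULL);
    w->last_beat = now;
}

/* Called by the checker thread; returns 1 if the main loop has been
 * silent for longer than the timeout, 0 otherwise. */
int watchdog_expired(const Watchdog *w, time_t now)
{
    assert(w != NULL);
    return difftime(now, w->last_beat) > w->timeout_secs;
}
```

A real watchdog would call watchdog_expired from its own thread and abort the process when it returns 1.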
Bak - Mon Mar 24, 2008 1:35 am
Post subject:
Well here's how a by-definition deadlock occurs:



Both threads need to acquire two locks, lock A and B. Thread 1 executes first and acquires lock A. After some time, thread 2 starts to execute (after all, one processor can only execute one thread at a time so it switches between them). It acquires lock B and then tries to get lock A. It can't get lock A since thread 1 holds the lock so thread 2 must stop executing until lock A is free. Thread 1 then executes and tries to acquire lock B. Since thread 2 holds lock B it can't get the lock. Now both threads are stuck forever.

Now this situation heavily depends on when tasks get preempted by the scheduler so it's a difficult bug to find and fix. Also, as mentioned before, if you never release a lock somewhere it can occur (as a thread will never be able to acquire the lock). An infinite loop inside a locked region (a critical section) may also cause a deadlock somewhere. And who said multi-threaded programming was supposed to be easy (until transactional memory becomes popular)!?!
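
The two-lock scenario above cannot happen if every thread takes the locks in the same order. Here is a minimal sketch with plain pthreads (not ASSS code). The names lock_a, lock_b, worker and run_two_workers are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical locks standing in for "lock A" and "lock B". */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter = 0;

/* Every thread takes A before B, so no thread can ever hold B
 * while waiting for A. */
static void *worker(void *arg)
{
    int iterations = *(int *)arg;
    int i;

    for (i = 0; i < iterations; i++) {
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);
        shared_counter++;
        pthread_mutex_unlock(&lock_b);  /* release in reverse order */
        pthread_mutex_unlock(&lock_a);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final counter. */
int run_two_workers(int iterations)
{
    pthread_t t1, t2;
    int rc;

    shared_counter = 0;
    rc = pthread_create(&t1, NULL, worker, &iterations);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, &iterations);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

If one of the workers took lock_b first instead, this program could hang exactly as described above, depending on how the scheduler interleaves the two threads.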

I'd say do a

lm->Log(L_DRIVEL,"got lock for MyLinkedList at line %i in file %s.\n", __LINE__, __FILE__);

when you acquire locks and

lm->Log(L_DRIVEL,"released lock for MyLinkedList at line %i in file %s.\n", __LINE__, __FILE__);

whenever you release a lock. Then disable the deadlock module so it doesn't restart your server. When the program hangs, check the log to see if there's some lock that you haven't released (do the same for arenalist locks and playerlist locks). If all locks are okay, log every time you enter and leave a function in your code (or at least the popular ones like callbacks and interface functions); that way you can check for infinite loops. Lastly, it wouldn't surprise me if ASSS had deadlocks in the main code, since they're some of the hardest bugs to find and fix. However, since others aren't experiencing all these deadlocks, it's most likely your code.
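
To avoid typing those log lines by hand around every lock, the logging can be folded into a pair of macros. This is only a sketch under assumptions: LOGGED_LOCK, LOGGED_UNLOCK, list_mtx and add_item are hypothetical names, and it writes to stderr rather than logman so the block stands alone outside ASSS.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Lock, then report where the lock was taken. */
#define LOGGED_LOCK(m) do { \
    int rc_ = pthread_mutex_lock(m); \
    assert(rc_ == 0); \
    fprintf(stderr, "got lock %s at line %d in file %s\n", \
            #m, __LINE__, __FILE__); \
} while (0)

/* Report where the lock is being released, then unlock. */
#define LOGGED_UNLOCK(m) do { \
    int rc_; \
    fprintf(stderr, "releasing lock %s at line %d in file %s\n", \
            #m, __LINE__, __FILE__); \
    rc_ = pthread_mutex_unlock(m); \
    assert(rc_ == 0); \
} while (0)

/* Hypothetical stand-in for the mutex guarding a linked list. */
pthread_mutex_t list_mtx = PTHREAD_MUTEX_INITIALIZER;
int list_len = 0;

/* Pretends to add an item to the list; returns the new length. */
int add_item(void)
{
    int len;

    LOGGED_LOCK(&list_mtx);
    len = ++list_len;
    LOGGED_UNLOCK(&list_mtx);
    return len;
}
```

When the server hangs, a "got lock" line with no matching "releasing lock" line after it points at the lock that is still held.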
Anonymous - Mon Mar 24, 2008 10:27 am
Post subject:
Okay, so this is the situation: any time I kill a fake player, the server "deadlocks".
This wasn't always the case so I assume I added something that is causing this, but I have no idea what it could be. The deadlock described by definition does not occur, because I tried adding the messages and all locks are locked/released as they should be.
Next I tried adding messages at the beginning of the necessary bot functions to see where the hang occurs. No hints there either, no messages are sent/received before the hang.
So I don't really know what to do now. It seems like the Bot_Killed function just isn't being called, which doesn't make sense to me because it's worked before and nothing has changed in it. But when the bot dies, the server freezes instantly and "u killed it" is never sent.
Note: Just to be sure, I also tried again by logman logging it and there's no message in the console log either.

So, what is the next step? I'm at a loss.

Code:

dmg->AddFake(bd->p, &bd->pos, Bot_Killed, Bot_Respawn, bd);

void Bot_Killed(Player *p, Player *killer, void *clos)
{
   chat->SendMessage(killer, "u killed it");
   
   BotData *bd = clos;
   void *v;
   
   chat->SendMessage(killer, "u killed it");
   
   stats->IncrementStat(killer, STAT_FLAG_POINTS, p->position.bounty);
   stats->SendUpdates(v);
   
   chat->SendMessage(killer, "u killed it");
   kill_bot(bd);
   
   if (get_bases(0) < 1)
   {
      chat->SendArenaMessage(ALLARENAS, "RTZ game has been won by team 1!");
      new_game();
   }
   else if (get_bases(1) < 1)
   {
      chat->SendArenaMessage(ALLARENAS, "RTZ game has been won by team 0!");
      new_game();
   }
}


The hang occurs when the bot dies, but none of the Bot_Killed code is executed.
-me
Bak - Mon Mar 24, 2008 11:56 am
Post subject:
Try using printf instead of logman or SendArenaMessage. Also make sure you end your lines with a \n so it'll flush the stream.


logman and sendarenamessage do buffering so they aren't instant.
Dr Brain - Mon Mar 24, 2008 6:42 pm
Post subject:
Try using the Hyperspace version from monotone. Branch asss.asss.hs. We had an issue with destroying fakes from a locked context, and I committed fixes to the monotone to solve them.

The patch to fake.c is all you really need, actually. http://asss.yi.org/viewmtn/viewmtn.py/revision/info/ae7e0babc0bc7d9862e7e0c3fe1ce0709d4abc1f
Bak - Tue Mar 25, 2008 8:26 am
Post subject:
Wait, is the problem when you kill a fake player or when you make the fake player leave the arena?
Animate Dreams - Wed Mar 26, 2008 11:21 am
Post subject:
Bak wrote:
Try using printf instead of logman or SendArenaMessage. Also make sure you end your lines with a \n so it'll flush the stream.


logman and sendarenamessage do buffering so they aren't instant.


Does \n really flush the stream? =\ My professors told me the difference between using \n and using std::endl was that endl would flush the stream.
Bak - Wed Mar 26, 2008 11:23 am
Post subject:
It depends on whether you're using printf or cout. In my experience \n flushes printf. If you want to flush explicitly, do "cout << flush;" or "fflush(stdout);"

It wouldn't be hard to test, run this code:
Code:

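#include <stdio.h>

int main(void)
{
   printf("hello");      /* no newline, so the text may still be in the stdio buffer */
   *((int*)0) = 0xbeef;  /* deliberate crash: write through a null pointer */
   return 0;
}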
Then observe if it prints anything out before it crashes, try adding a '\n', try it with cout. Probably depends on your operating system too.
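
If the goal is just to be sure a trace message gets out before a hang or crash, the buffering question can be avoided by flushing explicitly after every message. A minimal sketch follows; the helper name trace is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes msg to stdout and pushes it out of the stdio buffer
 * immediately, newline or not. Returns 0 on success, -1 on error. */
int trace(const char *msg)
{
    assert(msg != NULL);
    if (fputs(msg, stdout) == EOF)
        return -1;
    return fflush(stdout) == 0 ? 0 : -1;
}
```

Calling trace("entering Bot_Killed") at the top of a callback should then show up even if the process freezes on the next line. Writing to stderr is another option, since that stream is not fully buffered by default.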
Animate Dreams - Wed Mar 26, 2008 11:26 am
Post subject:
Oh. I assumed since they were both streams, they'd operate basically the same as far as flushing.
Smong - Wed Apr 02, 2008 10:06 am
Post subject:
Code:
stats->SendUpdates(v);
v isn't initialised, but I doubt you would notice any side effects. You may want to use stats->SendUpdates(NULL) instead. Also are you attaching the points_kill module to the arena? You seem to be duplicating some of the code here.

Code:
kill_bot(bd);
Can you post the code for this?

As for \n flushing I would agree it probably depends on the OS/environment.
All times are -5 GMT