Monday, October 8, 2018

Pthread - Understanding deadlock

In this blog, we will understand deadlock in software applications. We will create deadlock using coding examples and analyze it. For coding examples, I have used pthreads in Ubuntu Linux.


What is deadlock


Deadlock is a situation when process1 has acquired resource1 and is waiting for resource2. At the same time, resource2 has been acquired by another process2 who is waiting for resource1. 

So, both the processes are holding one resource and waiting for the other resource. None of the processes is ready to release the already acquired resource. This is causing deadlock.


Image result for deadlock diagram



Deadlock can happen between processes or between threads of a process. We will be using both the terms for explaining the concept.

Deadlock happens when four conditions met

  • Circular wait for resources: It means a process/thread is holding a resource and trying to acquire another resource which is held by another process/thread.
  • Mutual exclusion: Mutual exclusion means resources are not shared. It can be owned by one process or one thread at a time.
  • Hold and wait: It means to hold a resource while waiting for another resource.
  • No pre-emption: It means the thread will not release the acquired resource until it finishes the task.

How to avoid deadlock

We can break any of the above four conditions to avoid deadlock.
  • No mutual exclusion for resources, which means resources can be shared between processes or threads.
  • No hold and wait. To implement this, need to ensure that only if all the required resources are available, allow a process/thread to proceed with acquiring the resources.
  • The circular wait is not allowed. This can be done by putting a check to ensure that wait for resources is making a circle.
  • Support pre-emption of resources. One way to implement this is if there is a delay in acquiring resources, release the already acquired resource.    

Deadlock symptoms

Deadlock happens when processes or threads are waiting for resources. What happens when a process/thread is waiting for a resource?

It goes into sleep. So, if there is a deadlock, the ps command output shows participating processes/threads in sleep and it remains in sleep for a prolonged period. The processes/threads may appear hung. The ps output shows thread's state as D or S if it's in sleep.
               D    uninterruptible sleep (usually IO)
               S    interruptible sleep (waiting for an event to complete)


Deadlock analysis with examples

We will create a deadlock and analyze it using two examples.  First with a producer-consumer example where the wrong placement of signal causes deadlock. In the second example, a wrong code block is causing all threads in wait.

Example 1: Producer-consumer example


// Producer
void *threadProducer(void *arg)
{
  int i;
  for(i = 0; i < loop; i++)
  {
        produce();
        sem_wait(&mtx);  // Continue if consumer is not consuming
        append();
        n++;
        if(n == 1) sem_post(&delay);    // Unblock consumer
        sem_post(&mtx);
  }
  printf("Producer DONE\n");
  return NULL;
}
// Consumer
void *threadConsumer(void *arg)
{
  int i;
  int tmp;
  sem_wait(&delay);
  for(i = 0; i < loop; i++)
  {
        sem_wait(&mtx);  // Continue if producer is not producing
        take();
        n--;
        if(n == 0 && i < loop - 1) sem_wait(&delay);
                                     // Block self if no data items
        consume();
        sem_post(&mtx);  // Signal done with consuming
  }
  printf("Consumer DONE\n");
  return NULL;
}
int main()
{
        sem_init(&mtx, 0, 1);
        sem_init(&delay, 0, 0);
        pthread_t prod, cons;

        pthread_create(&cons, NULL, threadConsumer, NULL);
        pthread_create(&prod, NULL, threadProducer, NULL);

        pthread_join(prod, NULL);
        pthread_join(cons, NULL);

        sem_destroy(&delay);
        sem_destroy(&mtx);
        return 0;
}

In the above example, the following sequence is creating deadlock:
  1. The mutexes mtx is initialized with 1 and delay with 0.
  2. The consumer is waiting in sem_wait(&delay);
  3. The producer produces one item and makes the value of mtx and delay to 1.
  4. The consumer is inside for loop and value of both semaphores is 0.
  5. Next the line "if(n == 0 && i < loop - 1) sem_wait(&delay);" causes wait in sem_wait()
  6. The producer is waiting in line "sem_wait(&mtx);" as mtx value is 0.
  7. Now both producer and consumer are in a deadlock.
To avoid deadlock, sem_post(&mtx) should be before sem_wait(&delay).


Ps output

All the three threads for process id 7113 are in sleep. It is shown by 'S' in the state option of ps output.


xxxx# ps -eflT|grep a.out
F S UID        PID  SPID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 S root      7469  7469  2074  0  80   0 - 22155 futex_ 22:00 pts/0    00:00:00 ./a.out
1 S root      7469  7470  2074  0  80   0 - 22155 futex_ 22:00 pts/0    00:00:00 ./a.out
1 S root      7469  7471  2074  0  80   0 - 22155 futex_ 22:00 pts/0    00:00:00 ./a.out


GDB output

The first thread is in sleep while executing pthread_join() for the producer and consumer threads. Second and third threads are for producer and consumer. It is in sleep in executing sem_wait() statement.



(gdb) info thread
  Id   Target Id         Frame 
* 1    Thread 0x7ffff7fdf740 (LWP 7135) "a.out" 0x00007ffff7bbed2d in __GI___pthread_timedjoin_ex (
    threadid=140737337112320, thread_return=0x0, abstime=0x0, block=) at pthread_join_common.c:89
  2    Thread 0x7ffff77c4700 (LWP 7139) "a.out" 0x00007ffff7bc66d6 in futex_abstimed_wait_cancelable (private=0, 
    abstime=0x0, expected=0, futex_word=0x555555756060 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  3    Thread 0x7ffff6fc3700 (LWP 7140) "a.out" 0x00007ffff7bc66d6 in futex_abstimed_wait_cancelable (private=0, 
    abstime=0x0, expected=0, futex_word=0x555555756040 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:205


Example 2: Wrong code block causing deadlock

In the below example, a wrong condition is causing all threads to enter into the wait.
The block "while (id != n)" may cause deadlock here. It's possible that all the threads might meet at condition "(id != n)" and result in wait. With no thread to signal, it will cause deadlock. 




void *func(void *v)
{
        int n = *((int*)v);
        int count = 0;
        while (count < 10)
        {
                pthread_mutex_lock(&token);
                while (id != n) // This block is causing deadlock
                {
                        printf("Its not my turn, n=%d id = %d\n",n, id);
                        pthread_cond_wait(&cond, &token);
                }
                count += 1;
                printf ("My turn! id= %d\n",n);
                printf("count %d\n", count);
                if (id == 3)
                      id = 0;
                else
                      id += 1;

                pthread_mutex_unlock(&token);
                pthread_cond_signal(&cond);
        }
        printf ("ID=%d has finished\n",n);
        return(NULL);
}


In this example, to avoid deadlock, need to put a check so that all threads are not in wait.


Ps output

Ps output shows all threads in sleep as shown by state S for all threads of process id 7361.


xxxx# ps -eflT|grep a.out
F S UID        PID  SPID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 S root      7361  7361  7152  0  80   0 - 26253 futex_ 18:44 pts/2    00:00:00 ./a.out
1 S root      7361  7362  7152  0  80   0 - 26253 futex_ 18:44 pts/2    00:00:00 ./a.out
1 S root      7361  7363  7152  0  80   0 - 26253 futex_ 18:44 pts/2    00:00:00 ./a.out
1 S root      7361  7364  7152  0  80   0 - 26253 futex_ 18:44 pts/2    00:00:00 ./a.out
1 S root      7361  7365  7152  0  80   0 - 26253 futex_ 18:44 pts/2    00:00:00 ./a.out



GDB output

The main thread is waiting at pthread_join(). Rest of the threads are waiting at futex_wait_cancelable() which is called by pthread_cond_wait().



(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7ffff7fdf740 (LWP 7324) "a.out" 0x00007ffff7bbed2d in __GI___pthread_timedjoin_ex (
    threadid=140737345505024, thread_return=0x0, abstime=0x0, block=) at pthread_join_common.c:89
  2    Thread 0x7ffff77c4700 (LWP 7328) "a.out" 0x00007ffff7bc39f3 in futex_wait_cancelable (private=, 
    expected=0, futex_word=0x5555557560cc ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  3    Thread 0x7ffff6fc3700 (LWP 7329) "a.out" 0x00007ffff7bc39f3 in futex_wait_cancelable (private=, 
    expected=0, futex_word=0x5555557560c8 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  4    Thread 0x7ffff67c2700 (LWP 7330) "a.out" 0x00007ffff7bc39f3 in futex_wait_cancelable (private=, 
    expected=0, futex_word=0x5555557560cc ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  5    Thread 0x7ffff5fc1700 (LWP 7331) "a.out" 0x00007ffff7bc39f3 in futex_wait_cancelable (private=, 
    expected=0, futex_word=0x5555557560cc ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88



Extra bytes

GDB output shows that Semaphore, mutex and condition variables are created using futex (fast userspace mutex).

No comments:

Post a Comment