What is deadlock
Deadlock is a situation when process1 has acquired resource1 and is waiting for resource2. At the same time, resource2 has been acquired by another process2 who is waiting for resource1.
So, both the processes are holding one resource and waiting for the other resource. None of the processes is ready to release the already acquired resource. This is causing deadlock.
Deadlock can happen between processes or between threads of a process. We will be using both the terms for explaining the concept.
Deadlock happens when four conditions met
- Circular wait for resources: It means a process/thread is holding a resource and trying to acquire another resource which is held by another process/thread.
- Mutual exclusion: Mutual exclusion means resources are not shared. It can be owned by one process or one thread at a time.
- Hold and wait: It means to hold a resource while waiting for another resource.
- No pre-emption: It means the thread will not release the acquired resource until it finishes the task.
How to avoid deadlock
We can break any of the above four conditions to avoid deadlock.
- No mutual exclusion for resources, which means resources can be shared between processes or threads.
- No hold and wait. To implement this, need to ensure that only if all the required resources are available, allow a process/thread to proceed with acquiring the resources.
- The circular wait is not allowed. This can be done by putting a check to ensure that wait for resources is making a circle.
- Support pre-emption of resources. One way to implement this is if there is a delay in acquiring resources, release the already acquired resource.
Deadlock symptoms
Deadlock happens when processes or threads are waiting for resources. What happens when a process/thread is waiting for a resource?
It goes into sleep. So, if there is a deadlock, the ps command output shows participating processes/threads in sleep and it remains in sleep for a prolonged period. The processes/threads may appear hung. The ps output shows thread's state as D or S if it's in sleep.
D uninterruptible sleep (usually IO)
S interruptible sleep (waiting for an event to complete)
Deadlock analysis with examples
We will create a deadlock and analyze it using two examples. First with a producer-consumer example where the wrong placement of signal causes deadlock. In the second example, a wrong code block is causing all threads in wait.Example 1: Producer-consumer example
// Producer
void *threadProducer(void *arg)
{
int i;
for(i = 0; i < loop; i++)
{
produce();
sem_wait(&mtx); // Continue if consumer is not consuming
append();
n++;
if(n == 1) sem_post(&delay); // Unblock consumer
sem_post(&mtx);
}
printf("Producer DONE\n");
return NULL;
}
// Consumer
void *threadConsumer(void *arg)
{
int i;
int tmp;
sem_wait(&delay);
for(i = 0; i < loop; i++)
{
sem_wait(&mtx); // Continue if producer is not producing
take();
n--;
if(n == 0 && i < loop - 1) sem_wait(&delay);
// Block self if no data items
consume();
sem_post(&mtx); // Signal done with consuming
}
printf("Consumer DONE\n");
return NULL;
}
int main()
{
sem_init(&mtx, 0, 1);
sem_init(&delay, 0, 0);
pthread_t prod, cons;
pthread_create(&cons, NULL, threadConsumer, NULL);
pthread_create(&prod, NULL, threadProducer, NULL);
pthread_join(prod, NULL);
pthread_join(cons, NULL);
sem_destroy(&delay);
sem_destroy(&mtx);
return 0;
}
- The mutexes mtx is initialized with 1 and delay with 0.
- The consumer is waiting in sem_wait(&delay);
- The producer produces one item and makes the value of mtx and delay to 1.
- The consumer is inside for loop and value of both semaphores is 0.
- Next the line "if(n == 0 && i < loop - 1) sem_wait(&delay);" causes wait in sem_wait()
- The producer is waiting in line "sem_wait(&mtx);" as mtx value is 0.
- Now both producer and consumer are in a deadlock.
Ps output
All the three threads for process id 7113 are in sleep. It is shown by 'S' in the state option of ps output.
xxxx# ps -eflT|grep a.out
F S UID PID SPID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S root 7469 7469 2074 0 80 0 - 22155 futex_ 22:00 pts/0 00:00:00 ./a.out
1 S root 7469 7470 2074 0 80 0 - 22155 futex_ 22:00 pts/0 00:00:00 ./a.out
1 S root 7469 7471 2074 0 80 0 - 22155 futex_ 22:00 pts/0 00:00:00 ./a.out
GDB output
The first thread is in sleep while executing pthread_join() for the producer and consumer threads. Second and third threads are for producer and consumer. It is in sleep in executing sem_wait() statement.
(gdb) info thread
Id Target Id Frame
* 1 Thread 0x7ffff7fdf740 (LWP 7135) "a.out" 0x00007ffff7bbed2d in __GI___pthread_timedjoin_ex (
threadid=140737337112320, thread_return=0x0, abstime=0x0, block=) at pthread_join_common.c:89
2 Thread 0x7ffff77c4700 (LWP 7139) "a.out" 0x00007ffff7bc66d6 in futex_abstimed_wait_cancelable (private=0,
abstime=0x0, expected=0, futex_word=0x555555756060 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
3 Thread 0x7ffff6fc3700 (LWP 7140) "a.out" 0x00007ffff7bc66d6 in futex_abstimed_wait_cancelable (private=0,
abstime=0x0, expected=0, futex_word=0x555555756040 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
Example 2: Wrong code block causing deadlock
In the below example, a wrong condition is causing all threads to enter into the wait.The block "while (id != n)" may cause deadlock here. It's possible that all the threads might meet at condition "(id != n)" and result in wait. With no thread to signal, it will cause deadlock.
void *func(void *v)
{
int n = *((int*)v);
int count = 0;
while (count < 10)
{
pthread_mutex_lock(&token);
while (id != n) // This block is causing deadlock
{
printf("Its not my turn, n=%d id = %d\n",n, id);
pthread_cond_wait(&cond, &token);
}
count += 1;
printf ("My turn! id= %d\n",n);
printf("count %d\n", count);
if (id == 3)
id = 0;
else
id += 1;
pthread_mutex_unlock(&token);
pthread_cond_signal(&cond);
}
printf ("ID=%d has finished\n",n);
return(NULL);
}
In this example, to avoid deadlock, need to put a check so that all threads are not in wait.
Ps output
Ps output shows all threads in sleep as shown by state S for all threads of process id 7361.
xxxx# ps -eflT|grep a.out
F S UID PID SPID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S root 7361 7361 7152 0 80 0 - 26253 futex_ 18:44 pts/2 00:00:00 ./a.out
1 S root 7361 7362 7152 0 80 0 - 26253 futex_ 18:44 pts/2 00:00:00 ./a.out
1 S root 7361 7363 7152 0 80 0 - 26253 futex_ 18:44 pts/2 00:00:00 ./a.out
1 S root 7361 7364 7152 0 80 0 - 26253 futex_ 18:44 pts/2 00:00:00 ./a.out
1 S root 7361 7365 7152 0 80 0 - 26253 futex_ 18:44 pts/2 00:00:00 ./a.out
GDB output
The main thread is waiting at pthread_join(). Rest of the threads are waiting at futex_wait_cancelable() which is called by pthread_cond_wait().
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7ffff7fdf740 (LWP 7324) "a.out" 0x00007ffff7bbed2d in __GI___pthread_timedjoin_ex (
threadid=140737345505024, thread_return=0x0, abstime=0x0, block=) at pthread_join_common.c:89
2 Thread 0x7ffff77c4700 (LWP 7328) "a.out" 0x00007ffff7bc39f3 in futex_wait_cancelable (private=,
expected=0, futex_word=0x5555557560cc ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
3 Thread 0x7ffff6fc3700 (LWP 7329) "a.out" 0x00007ffff7bc39f3 in futex_wait_cancelable (private=,
expected=0, futex_word=0x5555557560c8 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
4 Thread 0x7ffff67c2700 (LWP 7330) "a.out" 0x00007ffff7bc39f3 in futex_wait_cancelable (private=,
expected=0, futex_word=0x5555557560cc ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
5 Thread 0x7ffff5fc1700 (LWP 7331) "a.out" 0x00007ffff7bc39f3 in futex_wait_cancelable (private=,
expected=0, futex_word=0x5555557560cc ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
No comments:
Post a Comment