Wednesday, December 4, 2019

DS: Termination Detection using Snapshots

Disclaimer: This article provides review and summary of  the given book & journal reference.



1. Introduction
One of fundamental problem in DS is to determine if a distributed computation has terminated. In DS, a problem is typically solved in a distributed manner with the cooperation of a number of processes. It is necessary to determine when the execution of a particular subproblem has ended so that the execution of the next subproblem may begin, or if the whole problem has been solved, and no longer additional computation is needed.

The detection of the termination in DS is non-trivial since no process has complete knowledge of the global state, and global time does not exist. A termination detection (TD) algorithm was developed to solved this issue. TD must ensure the following:

  1. Execution of a TD algorithm cannot indefinitely delay the underlying computation.
  2. TD algorithm must not require addition of new communication channels between processes.
This article provides a glance introduction of TD based on literature review.


2. System model of a distributed computation
Distributed computation consists of a fixed set of processes that communicate solely by message passing; usually this is done asynchronously. This means messages sent over the same communication channel may not obey the FIFO ordering.
A distributed computation has the following characteristics:

  1. During execution, a process can be in only one of the two states: active-busy or idle-passive.
  2. An active process can become idle at any time, once it finishes.
  3. An idle process can become active only on the receipt of a message from another process.
  4. Only active processes can send message.
  5. A message can be received by a process in idle or active.
  6. The sending of a message and the receipt of a message occur as atomic actions.
TD is obviously not necessary if the DS contains some processed that never been idle. The termination of distributed computation is model as:

pi(t) = the state (active or idle) of process pi at instant t
ci,j(t) = the number of messages in transit in the channel at instant t from process pi to process pj.



3. Termination detection using distributed snapshots
The algorithm was originally introduced by Huang. The algorithm uses the fact that a consistent snapshot of a distributed system captures stable properties. Termination of a distributed computation is a stable property. Thus, if a consistent snapshot of a distributed computation is taken after the distributed computation has terminated, the snapshot will capture the termination of the computation. The main idea behind the algorithm is as follows: when a computation terminates, there must exist a unique process which became idle last:

  1. When a process goes from active to idle, it issues a request to all other processes to take a local snapshot, and also requests itself to take a local snapshot.
  2. When a process receives the request, if it agrees that the requester became idle before itself, it grants the request by taking a local snapshot for the request.
  3. A request is said to be successful if all processes have taken a local snapshot for it.
  4. The requester or any external agent may collect all the local snapshots of a request.
  5. If a request is successful, a global snapshot of the request can thus be obtained and the recorded state will indicate termination of the computation: in the recorded snapshot, all the processes are idle and there is no message in transit to any of the processes.
The algorithm can be described in the following 7 steps.

  1. Each process i maintains a logical clock denoted by x, initialized to zero at the start of the computation.
  2. A process increments its x by one each time it becomes idle.
  3. A basic message sent by a process at its logical time x is of the form B(x).
  4. A control message that requests processes to take local snapshot issued by process i at its logical time x is of the form R(x, i).
  5. Each process synchronizes its logical clock x loosely with the logical clocks x’s on other processes in such a way that it is the maximum of clock values ever received or sent in messages.
  6. A process also maintains a variable k such that when the process is idle, (x,k) is the maximum of the values (x, k) on all messages R(x, k) ever received or sent by the process.
  7. Logical time is compared as follows: (x, k) > (x’, k’) if (x > x’) or ((x=x’) and (k>k’)), i.e., a tie between x and x’ is broken by the process identification numbers k and k’

The algorithm is defined by the following four rules:

The last process to terminate will have the largest clock value. Therefore, every process will take a snapshot for it, however, it will not take a snapshot for any other process.


4. Other TD Models
More TD models and algorithms were introduced to handle more DS implementations, such as:

  1. Weight throwing TD Algorithm
  2. A spanning-tree-based TD Algorithm
  3. Message-optimal termination detection
Discussion on these models are not covered in this article. The readers are encouraged to read about these models in the reference materials.


5. Conclusion
The TD is a basic activity of any DS model. This article provides introduction of the simplest form of TD algorithm known as Distributed Snapshot approach. A number of algorithms have been developed to detect the termination of a distributed computation. However, a comparative study between these algorithms was not seemed to be done yet. This article also encourages other researchers to continue the study in hope that the comparative result can be obtained in the future.


6. References
Distributed Computing: Principles, Algorithms, and Systems. 2008. Cambridge University Press.
Author: Ajay D.Kshemkalyani, Mukesh Singhai
Termination Detection by Using Distributed Snapshots. 1989. Information Processing Letters Vol. 32 No. 3. Elsevier Science Publishers.
Author: Shing-Tsaan HUANG

Tuesday, August 7, 2018

VirtualBox: Guest OS Loses Network Connection

Wow, its been 8 years since my last post. haha.

Well recently I found issue when dealing with VirtualBox and I think it will be best to put it here for other people experiencing similar problem.

One of the problem similar to this can be found in virtualbox website. https://www.virtualbox.org/ticket/6559


Environment:
VirtualBox 5 installed in host running under Ubuntu 16 with multiple network cards.
1 guest with same OS version with the host and set up with bridge mode. One of the host's eth was assigned directly to the guest.
Both host and Guest has static IP configured, and can access internet and ping each other without error.

Symptoms:
The issue occurs after a while (5-10 minutes). It appeared that the Guest Network is inactive. All ssh connection broken and new ssh will directed to the host not the guest.
Ifconfig working ok and display normal information. Testing connectivity to the internet by pinging to google.com, surprisingly bring the connectivity back.
After some googling, found some similar issues that the solution was using continuous ping to google
(just type ping google.com in the virtual box guest window)

Having try various methods to overcome this, for example:
  1. Modify Grup loader with parameter acpi=off apm=off
    But cousing problem in booting up guest. Must to log to safe mode and redo the change in grub loader config
  2. Changing to other Network Card on the Guest Network Setting
    Problem still exist
  3. etc.
Workaround:
Finally miracle happens, I try to change Promiscuous Mode from Deny to Allow All. The start the guest again. And wow the connection is maintained.
Ow forgot to mention my last Guest Adapter Network Type: is virtio-net. Previously it is Intel T Server, Don't know if this relevant though. But I will stick with this for a while, until issue for the adapter rised.

Actually I haven't verified whether this is permanent solution or not, but its been more than 1 hour since I changed the configuration, so I guest this can be marked as the answer.
EDIT: It is confirmed that the issue still occurs. The Real Solution had been found below.

Real Solution:
The real culprit of this issue was finally found. It is because the IP address. The current configuation: it the VMs' IP address defined in the host and the IP assigned in the VM itself is same IP. This seemed cause problem in the network, that some time the  network got lost due too network activation in both host and vm.
So to solve the issue we have changed the host eth ip, that is expected to use VM, to other IP address. So all host eth and guests' eth doesn't have same IP at all.

Wednesday, February 17, 2010

Tips & Tricks - General

Tips and tricks when programming VB6.

Thursday, February 4, 2010

Ms. Word Automation

How to automate MS. Word by using VB6?