Computational Graphs
The key to parallel programming is determining which steps within a program can be executed in parallel and then figuring out how to coordinate them, and one tool to help model how steps in a program relate to each other is a computational graph.
Example
Consider the steps to make a very simple salad. I'll need to chop some lettuce, chop some tomatoes, mix those chopped ingredients together, and then finally add salad dressing. Each of those steps represents a task which is a unit of execution or a unit of work. I can draw those tasks as nodes in a graph and use arrows to indicate progression from one task to the next. A task cannot begin executing until all of the tasks with arrows feeding into it have completed. When those four tasks are arranged sequentially like this, they represent a single path of execution which could be implemented as a single thread or process.
If I do those steps in that order, I'll make a somewhat boring salad, but that's not the only ordering that could work.
The tasks of chopping lettuce and chopping tomatoes can occur asynchronously, meaning the order in which they happen relative to each other doesn't really matter. I could chop the lettuce first or the tomatoes first, or ideally, I'll chop them both in parallel to save some time.
So I'll add another node to the front of my diagram labeled spawn which has two output arrows leading into the two asynchronous chopping tasks. Both tasks can begin running at any time after the spawn operation. Now, there is a dependency between those two chopping tasks and the third task, because I can't mix the chopped ingredients together until both of the chopping tasks are complete.
My program will need some form of synchronization, so I'll add another node to represent that operation. This sync node will not be able to execute until both of the chopping tasks that feed into it are complete. I've used the terms spawn and sync here but you'll also see the terms fork and join used in this context. The fact that there are not any direct edges or connections between the chopped lettuce and chopped tomato tasks indicates the possibility for parallel execution.
So if this program was implemented using two threads as shown here, with a second thread spawning or forking from the main thread, the two chopping tasks can run at the same time.
This type of diagram is called a directed acyclic graph or DAG.
Directed because each edge is directed from one node or vertex to another, and acyclic meaning it doesn't have any loops that cycle back on itself.
There are several variations and ways to draw these types of computational graphs, but their general-purpose is to provide an abstract representation of the program. They help to visualize the relationship and dependencies between tasks. They can also be used to get a sense of how parallel a program can be. Every node represents a task or operation, and for each one, I'll indicate the amount of time it takes to execute. For this example, I'm just using made-up numbers and units of seconds.
If I add together the execution times for all of the nodes, that gives me a metric called work which represents the time it would take to execute all of these tasks on a single processor. For this example, that comes out to 60 seconds. Next, I'll identify the longest possible path through this graph following the directed edges from node to node which is referred to as the critical path. It represents the longer series of sequential operations through the program. If I add together the times for all of those nodes along the critical path, I get another metric called span which indicates the shortest possible execution time if this program was parallelized as much as possible.
In this case, that's 45 seconds. The ratio of work to span indicates the ideal parallelism of this program. How much faster can the parallel version of this program possibly run using multiple processors than the sequential version running on just one processor? The parallelism ratio of 1.33 means that at very best, the parallel version of this program will be 33% faster than the sequential version. We may not be able to reduce the total amount of work a program needs to do, so minimizing the critical path is important in designing parallel algorithms, because span determines the shortest possible execution time.
Last updated