surprising that processor utilization is quite sensitive to the values of h1, but is much less sensitive to the values of tc.

Processor utilization as a function of h2, the hit rate of the second-level cache, is shown in Figure 3 for two values of the main memory access time, tm = 25 and tm = 50. Processor utilization is rather insensitive to values of h2, and does not change much with tm.

Figure 3. Processor utilization as a function of second-level cache hit rate for h1 = 0.9, ps = 0.2.

Processor utilization as a function of the probability of pipeline stalls, ps = ps1 + 2*ps2, is shown in Figure 4 for three combinations of values of tc and tm. Again, processor utilization is rather insensitive to the probability of pipeline stalls as well as the values of tc and tm.

Figure 4. Processor utilization as a function of probability of pipeline stalls for h1 = 0.9, h2 = 0.8.

80 Petri Nets in Science and Engineering

4. Shared-memory bus-based systems

An outline of a shared-memory bus-based multiprocessor is shown in Figure 5. The system is composed of n identical processors, which access the shared memory using a system bus. To reduce the average access time to the shared memory, the processors use (multilevel) cache memories. It is assumed that memory consistency is provided by a cache coherence mechanism [16], which usually increases the miss ratio of accessing caches (and is otherwise not represented in the model).

A timed Petri net model of a shared-memory bus-based multiprocessor is shown in Figure 6. It contains models of n processors (only two are shown in Figure 6), which are copies of the model shown in Figure 1, except that the main memory (transition Tmem) becomes the shared memory in Figure 6. The remaining part of Figure 6 models the bus that coordinates the accesses of processors to the shared memory.

Figure 5. A shared-memory bus-based multiprocessor.


Performance Analysis of Shared-Memory Bus-Based Multiprocessors Using Timed Petri Nets

http://dx.doi.org/10.5772/intechopen.75589


Figure 6. A timed Petri net model of bus-based shared-memory multiprocessor.

When a processor i, i = 1, …, n, requests an access to shared memory, place Pri becomes marked. If the bus is available (i.e., if place Bus is marked), the occurrence of transition Tai indicates that processor i begins its access to shared memory. Transitions Ta1, …, Tan constitute a conflict class with a fair resolution of conflicts (i.e., all conflicting processors have the same probability of being selected for accessing memory). In real systems, accessing the shared bus is often based on priorities assigned to processors; such priorities could easily be represented using inhibitor arcs in Petri nets.

Place Pmem collects memory access requests from all processors (occurrences of transitions Tai). The end of memory access (i.e., the end of the occurrence of Tmem) is indicated by an occurrence of transition Tei of the processor which initiated memory access. The occurrence of Tei also returns a token to Bus, allowing another access to shared memory to be executed.
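The arbitration just described can be illustrated with a small token game. The sketch below is a simplified, untimed reading of Figure 6: the place and transition names (Pri, Bus, Pmem, Tai, Tei) follow the text, while the marking, the number of processors, and the Python encoding are illustrative assumptions.

```python
import random

# Untimed token game for the bus arbitration of Figure 6:
# transitions Ta1, ..., Tan compete for the single token in place Bus.
n = 3
marking = {"Bus": 1, "Pmem": 0}
for i in range(1, n + 1):
    marking[f"Pr{i}"] = 1  # every processor has a pending memory request

def enabled_Ta(m):
    """Tai is enabled when both Pri and Bus are marked."""
    return [i for i in range(1, n + 1) if m[f"Pr{i}"] > 0 and m["Bus"] > 0]

def fire_Ta(m, i):
    """Occurrence of Tai: processor i begins its access to shared memory."""
    m[f"Pr{i}"] -= 1; m["Bus"] -= 1; m["Pmem"] += 1

def fire_Te(m):
    """Occurrence of Tei: end of memory access, the token returns to Bus."""
    m["Pmem"] -= 1; m["Bus"] += 1

# Fair conflict resolution: every enabled Tai has the same probability.
while any(marking[f"Pr{i}"] > 0 for i in range(1, n + 1)):
    fire_Ta(marking, random.choice(enabled_Ta(marking)))
    fire_Te(marking)  # the occurrence of Tmem is omitted in this sketch

print(marking)  # all requests served, the bus token is back in place Bus
```

Because Bus holds a single token, at most one Tai can occur between consecutive occurrences of Tei, which is exactly the mutual exclusion the bus provides.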

Figure 7 shows the utilization of processors and the bus as functions of the number of processors in a shared-memory system for the values of modeling parameters shown in Table 1.


Figure 7. Processor and bus utilization as functions of the number of processors for h1 = 0.9, h2 = 0.8, ps = 0.2.

In Figure 7, the bus utilization approaches 100% at about five processors. The degradation of the processors' performance caused by the increasing waiting times for accessing the bus (and shared memory) is also well illustrated there.

The average waiting time (in processor cycles) for accessing shared memory (i.e., the time from requesting memory access in place Pri to granting this access by an occurrence of Tai) is shown in Figure 8 as a function of the number of processors in the system.

Figure 8 shows that the waiting times increase almost linearly with the number of processors when this number is greater than 5, i.e., when the bus (and shared memory) utilization is close to 100%.


Figure 8. The average waiting time for accessing shared memory as a function of the number of processors for h1 = 0.9, h2 = 0.8, ps = 0.2.
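The saturation behavior of Figures 7 and 8 can be reproduced with a short discrete-event sketch. This is not the timed Petri net itself but an equivalent queueing view of it: each processor alternates between a local phase (instruction execution and cache accesses) and a memory access that holds the bus. The exponential burst lengths, the mean burst of 85 cycles, the access time of 25 cycles, and the FIFO discipline are all illustrative assumptions.

```python
import heapq
import random
from collections import deque

def simulate(n_proc, n_bus=1, t_access=25.0, t_burst=85.0,
             horizon=500_000.0, seed=7):
    """Each processor repeats: a local burst (exponential, mean t_burst),
    then one shared-memory access holding a bus for t_access cycles.
    Waiting requests are served in FIFO order (fair arbitration)."""
    rng = random.Random(seed)
    heap = [(rng.expovariate(1.0 / t_burst), "req", p) for p in range(n_proc)]
    heapq.heapify(heap)
    free, fifo = n_bus, deque()
    busy, waits = 0.0, []
    while True:
        t, kind, p = heapq.heappop(heap)
        if t > horizon:
            break
        if kind == "req":             # processor p requests the bus
            if free > 0:
                free -= 1
                waits.append(0.0)
                busy += t_access
                heapq.heappush(heap, (t + t_access, "end", p))
            else:
                fifo.append((t, p))   # request queued, processor waits
        else:                         # access ends, the bus is released
            heapq.heappush(heap,
                           (t + rng.expovariate(1.0 / t_burst), "req", p))
            if fifo:
                t0, q = fifo.popleft()
                waits.append(t - t0)
                busy += t_access
                heapq.heappush(heap, (t + t_access, "end", q))
            else:
                free += 1
    return busy / (horizon * n_bus), sum(waits) / len(waits)

for n in (2, 5, 8):
    util, wait = simulate(n)
    print(f"{n} processors: bus utilization {util:.2f}, "
          f"mean wait {wait:.1f} cycles")
```

With these assumed parameters the bus utilization climbs toward 100% at around four to five processors, and the mean waiting time grows roughly linearly once the bus saturates, matching the shape of Figures 7 and 8.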


Figure 9. Processor and bus utilization as functions of the number of processors for h1 = 0.9, h2 = 0.9, ps = 0.2.

If the value of the second-level cache hit rate, h2, increases (and the other parameters do not change), the number of accesses to main memory is reduced, so the performance of the processors and of the whole system improves. Figure 9 shows the utilization of processors and the bus as functions of the number of processors in the system for h2 = 0.9. It also shows that the reduced (in comparison with Figure 7) utilization of the bus allows the number of processors to be increased without significant degradation of their performance.

By a similar argument, a reduced hit rate of the first-level cache, h1, increases the number of accesses to the second-level cache as well as to the main memory, which results in reduced performance of the system. Figure 10 shows the utilization of processors and the bus as functions of the number of processors in the system for h1 = 0.8. It provides a good illustration of the degradation of performance when compared with Figure 7.

Figure 10. Processor and bus utilization as functions of the number of processors for h1 = 0.8, h2 = 0.8, ps = 0.2.
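The sensitivity to the two hit rates comes down to the fraction of instructions that reach the shared memory, (1 - h1)*(1 - h2). A quick check for the three parameter combinations of Figures 7, 9, and 10:

```python
# Probability that an instruction misses both cache levels and needs
# the bus: (1 - h1) * (1 - h2).
for fig, h1, h2 in [("Figure 7", 0.9, 0.8),
                    ("Figure 9", 0.9, 0.9),
                    ("Figure 10", 0.8, 0.8)]:
    print(f"{fig}: h1={h1}, h2={h2} -> "
          f"p(memory access) = {(1 - h1) * (1 - h2):.3f}")
```

Raising h2 from 0.8 to 0.9 halves the memory traffic (0.020 to 0.010 accesses per instruction), while lowering h1 from 0.9 to 0.8 doubles it (0.020 to 0.040), which is exactly the improvement and the degradation visible in Figures 9 and 10.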

The number of processors for which the bus is used to almost 100% can be estimated by the following formula:

$$n\_p = \frac{1 + p\_{s1} + 2 \ast p\_{s2} + (1 - h\_1) \ast (t\_c + (1 - h\_2) \ast t\_m)}{(1 - h\_1) \ast (1 - h\_2) \ast t\_m}. \tag{3}$$

For the case shown in Figure 7, this number is:

$$\frac{1 + 0.1 + 0.1 + 0.1 \ast (5 + 0.2 \ast 25)}{0.1 \ast 0.2 \ast 25} = \frac{1.2 + 1.0}{0.5} = 4.4. \tag{4}$$

For the case shown in Figure 9, this value is 8.2.
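Equation (3) is easy to evaluate programmatically. In the sketch below, tc = 5 and tm = 25 are taken from the worked example (4), and the split ps1 = 0.1, ps2 = 0.05 is an assumption consistent with ps = ps1 + 2*ps2 = 0.2 and with the numerator of (4); with these assumed values, the Figure 9 case (h2 = 0.9) evaluates to about 7.8 rather than the quoted 8.2, so the exact parameter values used for that figure may differ slightly.

```python
def n_p(h1, h2, tc, tm, ps1, ps2):
    """Equation (3): the number of processors at which the single bus
    (and the shared memory) approaches 100% utilization."""
    cycle = 1 + ps1 + 2 * ps2 + (1 - h1) * (tc + (1 - h2) * tm)
    bus_time = (1 - h1) * (1 - h2) * tm  # bus holding time per instruction
    return cycle / bus_time

# Figure 7 case, reproducing (4):
print(round(n_p(0.9, 0.8, tc=5, tm=25, ps1=0.1, ps2=0.05), 1))  # 4.4
```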


There are several ways in which the number of processors can be increased in bus-based systems without sacrificing the processors' performance. The simplest approach is to introduce a second bus, which allows two concurrent accesses to shared memory, provided that the memory is dual port (i.e., it supports two concurrent accesses). Figure 11 outlines a dual bus shared-memory system.

The Petri net model of a dual bus system is the same as in Figure 6; the only difference is the initial marking of place Bus, which now contains two tokens to represent two concurrent accesses to shared memory. It should be observed that, for a small number of processors, the utilization of each bus in Figure 12 is one half of that in Figure 7, and the number of processors that can be used in such a dual bus system without degradation of their performance is twice as large as in a single bus system (Figure 7).
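The only change for the dual bus model is the initial marking of place Bus. A minimal sketch (the marking, the number of processors, and the encoding are illustrative assumptions; the place and transition names follow the text) shows that two Tai occurrences can now happen before any Tei:

```python
# Initial marking with two bus tokens: place Bus = 2.
marking = {"Bus": 2, "Pmem": 0, "Pr1": 1, "Pr2": 1, "Pr3": 1}

def fire_Ta(m, i):
    """Occurrence of Tai: needs a pending request and a free bus token."""
    assert m[f"Pr{i}"] > 0 and m["Bus"] > 0
    m[f"Pr{i}"] -= 1; m["Bus"] -= 1; m["Pmem"] += 1

fire_Ta(marking, 1)
fire_Ta(marking, 2)     # a second, concurrent access is now possible
print(marking["Pmem"])  # 2 memory accesses in progress
```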

Figure 11. A dual bus shared-memory multiprocessor.

The results shown in Figure 12 are very similar to those shown in Figure 9. The second bus in a shared-memory system makes it possible to perform two concurrent accesses to the shared memory. From a single processor's point of view, this effect is similar to halving the number of accesses to the shared memory, which is also the effect of halving the miss rate of the second-level cache (as shown in Figure 9).

Figure 12. Processor and bus utilization as functions of the number of processors—dual bus system with h1 = 0.9, h2 = 0.8, ps = 0.2.

If dual port memory cannot be used, the shared memory can be split into several independent modules which can be accessed concurrently by the processors, provided that the bus is also split into sections associated with each module, with processors accessing all such sections, as shown in Figure 13 for four independent memory modules. The main difference between a multibus system (Figure 11) and a system with a split bus is in accessing the shared memory: in a multiple bus system, the whole shared memory is accessed by each bus, while in a split bus system (Figure 13), each section of the bus accesses only one memory module. In the system shown in Figure 13, up to four (the number of memory modules) memory accesses can be performed concurrently, but if two (or more) processors request access to the same memory module, the requests are served one after another.

Figure 13. A shared-memory multiprocessor with multiple memory modules.

Figure 14. Petri net model of a shared-memory multiprocessor with multiple memory modules.
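The per-module serialization of conflicting requests can be sketched with a small time-stepped simulation. The module count matches Figure 13, while the request probability, the access time, and the FIFO service within a module are illustrative assumptions:

```python
import random

def simulate_modules(n_proc, n_mod=4, t_access=25, p_req=0.02,
                     steps=50_000, seed=3):
    """Each cycle, an idle processor issues a memory request with
    probability p_req to a module chosen uniformly at random (the
    free-choice selection of Figure 14).  A busy module serves the
    conflicting requests one after another."""
    rng = random.Random(seed)
    module_free = [0] * n_mod  # cycle at which each module becomes free
    proc_free = [0] * n_proc   # cycle at which each processor is idle again
    served = 0
    for t in range(steps):
        for p in range(n_proc):
            if proc_free[p] <= t and rng.random() < p_req:
                m = rng.randrange(n_mod)           # uniform module choice
                start = max(t, module_free[m])     # wait if m is busy
                module_free[m] = start + t_access  # serialized access
                proc_free[p] = module_free[m]
                served += 1
    # average utilization of the n_mod bus sections / memory modules
    return min(1.0, served * t_access / (steps * n_mod))

for n in (4, 12, 20):
    print(f"{n} processors: module utilization {simulate_modules(n):.2f}")
```

Even well past the single-bus saturation point, the four sections together keep the average utilization below 100%, which is why the split-bus system of Figure 13 accommodates more processors (compare Figure 15).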

A Petri net model of the system outlined in Figure 13 is shown in Figure 14, where only two processors and two memory modules are detailed.

In Figure 14, there is a free-choice place Pri for each processor i, i = 1, …, n. This free-choice place selects the requested memory module by transitions Tij, j = 1, …, 4, and forwards the memory access request to the selected memory module (place Pij). If the selected module is available, i.e., if place Busj is marked, the access to shared memory is initiated by the occurrence of Taij. When this memory access is completed, the occurrence of Teij releases the memory module (by returning a token to Busj) and resumes instruction execution in the processor that requested the memory access.

If a memory module is not available when it is requested, the memory access is delayed (in Pij) until the requested module becomes available.

It is possible that several processors are waiting for the same memory module. The processor that gains access first is selected at random, with the same probability assigned to all waiting processors. In real systems, there is usually some priority scheme that determines the order in which the waiting processors access the bus. Such a priority scheme could easily be modeled if needed (for example, for studying the starvation effect which can be created when the system is overloaded).

In Figure 14, the selection of memory modules is random, with the same probabilities for all modules. If this policy is not realistic, a different memory accessing policy can be implemented; for example, the probabilities of accessing consecutive memory modules by each processor could be used to model sequential processing of large arrays, and so on.

Figure 15 shows the utilization of processors and busses as functions of the number of processors in the system outlined in Figure 13.

Figure 15. Processor and bus utilization as functions of the number of processors—system with four memory modules and h1 = 0.9, h2 = 0.8, ps = 0.2.

In Figure 15, even for 20 processors, the average utilization of the bus is close to 80%, so the system can accommodate more processors.

5. Concluding remarks

The chapter uses timed Petri nets to model shared-memory bus-based architectures at the level of instruction execution to study the effects of modeling parameters on the performance of the system. The models are rather simple, with a straightforward representation of modeling parameters. The performance results presented in this chapter have been obtained by simulation of the developed Petri net models. However, the model shown in Figure 7 has only 10 states, so its analytical solution (for different values of modeling parameters) can easily be obtained and compared with simulation results to verify their accuracy. Table 2 shows such a comparison of processor utilization for several combinations of parameters h1 and h2. In all cases, the simulation-based results are very close to the analytical ones.

h1     h2     Simulated results    Analytical results
0.8    0.8    0.3341               0.3333
0.8    0.9    0.3846               0.3846
0.9    0.8    0.4763               0.4762
0.9    0.9    0.5255               0.5263

Table 2. A comparison of simulation and analytical results.

The models of multiprocessor systems are usually composed of many copies of the same submodel of a processor and possibly other elements of the system. Colored Petri nets [17] can significantly simplify such models by eliminating copies of similar subsystems. Analysis of colored Petri nets is, however, much more complex than that of ordinary Petri nets.

Finally, it should be noted that the performance of real-life multiprocessor systems can very rarely be described by a set of parameters that remain stable for any significant period of time. The basic parameters, like the hit rates, depend upon the executed programs as well as their data, and can change very quickly in a significant way. Consequently, the characteristics presented in this chapter can only be used as some insight into the complex behavior of multiprocessor systems.
