Application of mesh topology

Defining a VPN

In Firewall Policies and VPN Configurations, 2006

Meshed Topology

Like their traditional WAN counterparts, meshed VPN topologies can be implemented in a fully or partially meshed configuration. Fully meshed configurations have a large number of alternate paths to any given destination. In addition, fully meshed configurations have exceptional redundancy because every VPN device provides connections to every other VPN device. This topology was illustrated in Figure 5.1. A simpler compromise is the partial-mesh topology, in which each site is connected to only some of the other sites rather than to all of them. A partial-mesh topology is shown in Figure 5.5.

Figure 5.5. Partial-Mesh VPN Topology

Mesh topology has an inherent advantage: there is no single point of failure, so the overall performance of the setup does not depend on any single node or system, and sites that are geographically close can communicate with each other directly. Its main drawbacks are configuration maintenance and key management. In a fully meshed network, whenever a new node is added, all the other nodes must be updated. Even when replacing traditional WAN services such as frame relay or leased lines, fully meshed topologies can be expensive to implement because a VPN device must be purchased for every link in the mesh.

Note

Another issue you should be aware of with full versus partial-mesh topology is the number of tunnels you need to configure and manage. If you have 100 sites and add one router, think of all the connections you must make to rebuild a full mesh! In essence, the partial mesh is the way you want to go, but you might see an extra hop in the route from place to place because you will no longer have a single hop to any single destination. There is always give and take. Think about what method suits your design needs, and implement that method accordingly.
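The tunnel-count arithmetic behind this note is easy to check: a full mesh of n sites needs n(n − 1)/2 tunnels. The short sketch below (Python, for illustration only) works through the 100-site example:

```python
def full_mesh_tunnels(n):
    """Number of point-to-point tunnels in a full mesh of n sites."""
    return n * (n - 1) // 2

print(full_mesh_tunnels(100))                            # 4950 tunnels
print(full_mesh_tunnels(101) - full_mesh_tunnels(100))   # adding 1 site adds 100
```

Going from 100 to 101 sites adds 100 tunnels, one to every existing site, which is exactly the rebuilding work the note warns about.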

https://www.sciencedirect.com/science/article/pii/B9781597490887500074

ZigBee and IEEE 802.15.4 Protocol Layers

Shahin Farahani, in ZigBee Wireless Networks and Transceivers, 2008

3.4.5 Mesh Topology

In a mesh topology, in contrast to the tree topology, there are no hierarchical relationships. Any device in a mesh topology is allowed to attempt to contact any other device, either directly or by taking advantage of routing-capable devices that relay the message on behalf of the message originator. In a mesh topology, the route from the source device to the destination is created on demand and can be modified if the environment changes. The capability of a mesh network to create and modify routes dynamically increases the reliability of the wireless connections. If, for any reason, the source device cannot communicate with the destination device using a previously established route, the routing-capable devices in the network can cooperate to find an alternative path from the source device to the destination device. This is clarified further in the route discovery and maintenance subsections.
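As an intuition aid for on-demand route creation, the sketch below finds a route over the current link set with a breadth-first search and, when a link fails, finds an alternative path over the remaining links. This is a deliberate simplification, not the AODV-style route discovery the route discovery and maintenance subsections describe, and the node names are invented for the example:

```python
from collections import deque

def discover_route(links, src, dst):
    """Breadth-first route discovery over the current link set.
    Returns a list of hops from src to dst, or None if unreachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:      # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in links.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None

mesh = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
print(discover_route(mesh, "A", "D"))   # a two-hop route, via B or via C
# If the A-B link fails, the routing-capable nodes still reach D via C:
mesh["A"].discard("B"); mesh["B"].discard("A")
print(discover_route(mesh, "A", "D"))   # ['A', 'C', 'D']
```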

https://www.sciencedirect.com/science/article/pii/B9780750683937000030

Switch Fabric Technology

Gary Lee, in Cloud Networking, 2014

Mesh topology

In a mesh topology, bandwidth between devices is improved compared to a ring structure by providing a dedicated link between every pair of devices, as shown in Figure 3.6. Although this improves fabric performance, it requires N − 1 ports on every FIC when N devices are connected to the fabric, limiting the scale of the fabric. The FIC must also direct traffic to a given output port instead of simply placing all traffic on a single ring port. An example use case is an ATCA backplane, which can use either a mesh or a star configuration.

Figure 3.6. Mesh topology block diagram.

Fabric designers have also scaled mesh and ring designs in multiple dimensions, creating fabric configurations called two-dimensional rings or torus structures. These are very complex topologies that are typically deployed only in applications such as high-performance computing, so we will not go into any detail in this book, which focuses on cloud data center networks.

https://www.sciencedirect.com/science/article/pii/B9780128007280000035

Evolutionary Developments of DRAM Device Architecture

Bruce Jacob, ... David T. Wang, in Memory Systems, 2008

Heat Density, Heat Spreaders, and Enforced Idle Cycles

In the classical mesh topology of SDRAM and DDRx SDRAM memory systems, multiple DRAM devices are connected in parallel to form a given rank of memory. Moreover, high-data-rate DDR2 and DDR3 SDRAM devices do not contain enough banks in parallel in a single-rank configuration to fully saturate the memory channel. Consequently, the on-chip and in-system data movements associated with any given command issued by the memory controller are always distributed across multiple DRAM devices in a single rank, and typically also across different ranks, in standard 64- or 72-bit-wide SDRAM and DDRx SDRAM memory systems. In contrast, the Direct RDRAM memory system is architected for high bandwidth throughput, and a single Direct RDRAM device provides full bandwidth for a given channel of Direct RDRAM devices. However, the ability of a single Direct RDRAM device to provide full bandwidth to the memory channel means that the on-chip and in-system data movements associated with a given command issued by the memory controller are always limited to a single DRAM device. Moreover, Figure 12.31 illustrates that in the worst-case memory-access pattern, a sustained stream of row-activation and column-access commands can be pipelined to a single device in a given channel of the Direct RDRAM memory system. Consequently, localized hot spots associated with high access rates to a given device can appear and disappear on different sections of the Direct RDRAM channel. These localized hot spots can, in turn, change the electrical characteristics of the transmission lines that Direct RDRAM memory systems rely on to deliver command and data packets, threatening the functional correctness of the memory system itself.

To counter the problem of localized hot spots in Direct RDRAM memory systems, Rambus Corp. deployed two solutions: heat spreaders and new command-issue rules designed to limit access rates to a given Direct RDRAM device. Unfortunately, the use of heat spreaders on the RIMMs further increased the cost of the Direct RDRAM memory system, and the new command-issue rules further increased controller complexity and decreased available memory bandwidth in the Direct RDRAM memory system.
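The command-issue side of the fix can be pictured as a per-device rate limiter in the memory controller. The sketch below illustrates the general idea only; it is not Rambus's actual issue rules, and the class name and window parameters are invented for the example:

```python
from collections import deque

class DeviceRateLimiter:
    """Illustrative sketch: cap the number of row-activation commands any
    single device may receive within a sliding window of cycles, forcing
    the controller to stall or reorder rather than hammer one device."""
    def __init__(self, max_activations, window_cycles):
        self.max = max_activations
        self.window = window_cycles
        self.history = {}  # device id -> deque of recent issue cycles

    def can_issue(self, device, cycle):
        hist = self.history.setdefault(device, deque())
        while hist and cycle - hist[0] >= self.window:
            hist.popleft()            # drop activations outside the window
        return len(hist) < self.max

    def issue(self, device, cycle):
        if not self.can_issue(device, cycle):
            return False              # enforced idle cycle for this device
        self.history[device].append(cycle)
        return True
```

A limiter like this trades bandwidth for thermal safety: back-to-back activations to one device are spread out in time, which is precisely why the real rules decreased available memory bandwidth.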

FIGURE 12.31. Worst-case memory-access pattern can create localized hot spots in DRDRAM system topology.

https://www.sciencedirect.com/science/article/pii/B978012379751350014X

Routing algorithms for workload consolidation

In Networks-On-Chip, 2015

5.5.4.1 Configuration

Here, we extend the analysis to the CMesh topology [2, 33]. As a case study, we use radix-4 CMeshes [2, 33]; four cores are concentrated around one router, with two cores in each dimension, as shown in Figure 5.18a. Here, Core0, Core1, Core4, and Core5 are concentrated on router (0,0). Each core has its own injection/ejection channels to the router. Based on a CMesh latency model, the network channel has a two-cycle delay, while the injection/ejection channel has a one-cycle delay [33]. The router pipeline is the same as discussed in Section 5.4.1.2. As shown in Figure 5.18, we evaluate 16-core and 64-core platforms. For the 64-core platform, both single-region and multiple-region experiments are conducted. For the multiple-region configuration (Figure 5.18b), regions R1, R2, and R3 run uniform random traffic with an injection rate of 4% and we vary the pattern in region R0.
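Under the row-major core numbering shown in Figure 5.18a, the four-to-one core concentration reduces to a simple coordinate mapping. The function below is an illustrative sketch; the name and the row-major assumption come from the figure description, not from the cited latency model:

```python
def cmesh_router(core_id, cores_per_row=4):
    """Map a core to its router in a radix-4 CMesh (2 cores per dimension),
    assuming row-major core IDs as in Figure 5.18a."""
    x, y = core_id % cores_per_row, core_id // cores_per_row
    return (x // 2, y // 2)   # halve each coordinate: 2x2 cores per router

# Core0, Core1, Core4, and Core5 all concentrate on router (0, 0):
print([cmesh_router(c) for c in (0, 1, 4, 5)])  # [(0, 0), (0, 0), (0, 0), (0, 0)]
```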

Figure 5.18. The CMesh configurations. (a) 16-core platform and (b) 64-core platform.

https://www.sciencedirect.com/science/article/pii/B9780128009796000056

Network Architectures

José Duato, ... Lionel Ni, in Interconnection Networks, 2003

The Reliable Router

The Reliable Router chip is targeted for fault-tolerant operation in 2-D mesh topologies [75]. The block diagram of the Reliable Router is shown in Figure 7.23. There are six input channels corresponding to the four physical directions in the 2-D mesh, and two additional physical ports: the local processor interface and a separate port for diagnostics. The input and output channels are connected through a full crossbar switch, although some input/output connections may be prohibited by the routing function.

Figure 7.23. Block diagram of the Reliable Router.

While message packets can be of arbitrary length, the flit length is 64 bits. There are four flit types: head, data, tail, and token. The format of the head flit is shown in Figure 7.24. The size of the physical channel, or phit size, is 23 bits. The channel structure is illustrated in Figure 7.23. To permit the use of chip carriers with fewer than 300 pins, these physical channels utilize half-duplex channels with simultaneous bidirectional signaling. Flits are transferred in one direction across the physical channel as four 23-bit phits called frames, producing 92-bit transfers. The format of a data flit and its constituent frames are shown in Figure 7.25. The 28 bits in excess of the flit size are used for byte parity bits (BP), the kind of flit (Kind), virtual channel identification (VCI), flow control implementing the unique token protocol (Copied Kind, Copied VCI, Freed), link status information (U/D, PE), and two user bits (USR1, USR0).

For a given direction of transfer, the clock is transmitted along with the four data frames as illustrated in the figure. Data are driven on both edges of the clock. To enable the receiver to distinguish between the four frames and reassemble them into a flit, the transmitting side of the channel also sends a pulse on the TxPhase signal with the relative timing shown. The flit is then assembled and presented to the routing logic. This reassembly process takes two cycles. Each router runs off a locally generated 100 MHz clock, removing the problems of distributing a single global clock. Reassembled flits pass through a synchronization module that transfers flits from the transmit clock domain to the receive clock domain with a worst-case penalty of one cycle (see [75] for a detailed description of the data synchronization protocol). The aggregate physical bandwidth in one direction across the channel is 3.2 Gbits/s.
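The frame arithmetic above (four 23-bit phits = 92 bits = a 64-bit flit plus 28 control bits) can be checked with a small pack/unpack sketch. The bit layout below is invented for the example; the real field placement is defined by Figure 7.25:

```python
FLIT_BITS, CTRL_BITS, PHIT_BITS = 64, 28, 23

def flit_to_frames(flit, ctrl):
    """Split a 64-bit flit plus 28 control bits into four 23-bit frames."""
    word = (ctrl << FLIT_BITS) | flit          # 92 bits total
    mask = (1 << PHIT_BITS) - 1
    return [(word >> (PHIT_BITS * i)) & mask for i in range(4)]

def frames_to_flit(frames):
    """Reassemble four frames into the (flit, ctrl) pair."""
    word = 0
    for i, f in enumerate(frames):
        word |= f << (PHIT_BITS * i)
    return word & ((1 << FLIT_BITS) - 1), word >> FLIT_BITS

flit, ctrl = 0x0123456789ABCDEF, 0x5A5A5A5   # 64-bit payload, 28 control bits
assert frames_to_flit(flit_to_frames(flit, ctrl)) == (flit, ctrl)
```

The round trip confirms that nothing is lost across the four-frame transfer, mirroring the two-cycle reassembly the receiver performs.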

Figure 7.24. Reliable Router head flit format.

Figure 7.25. Reliable Router frame format for data flits.

While the output controllers simply transmit the flit across the channel, generating the appropriate control and error-checking signals, the input controllers contain the core functionality of the chip and are organized as illustrated in Figure 7.26. The crossbar provides switching between the physical input channels and the physical output channels. Each physical channel supports five virtual channels, which share a single crossbar port: two channels for fully adaptive routing, two channels for dimension-order routing, and one channel for fault handling. The fault-handling channel utilizes turn model routing. The two dimension-ordered channels support two priority levels for user packets. The router is input buffered. The FIFO block is actually partitioned into five distinct FIFO buffers, one corresponding to each virtual channel. Each FIFO buffer is 16 flits deep. Bandwidth allocation is demand driven: only allocated channels with data to be transmitted or channels with new requests (i.e., head flits) compete for bandwidth. Virtual channels with message packets are accorded access to the crossbar bandwidth in a round-robin fashion by the scheduler in the virtual channel controller. Data flits simply flow through the virtual channel FIFO buffers, and the virtual channel number is appended to each flit prior to its transmission through the crossbar and across the physical channel to the next router. Flow control guarantees the presence of buffer space: when buffers at the adjacent node are full, the corresponding virtual channel does not compete for crossbar bandwidth.

Figure 7.26. Block diagram of the input controller.

A more involved sequence of operations takes place in routing a head flit. On arrival, the head flit is transferred both to the FIFO buffer and to a block that compares the destination address with the local node address. The result is the routing problem, which contains all of the information used to route the message. This information is stored in the virtual channel module, along with the routing information: the address of the output controller and the virtual channel ID. Recall from the protocol description in Section 6.7.2 that the header information must be retained in the router to construct additional messages in the presence of a fault. To minimize the time spent in arbitration for various shared resources, the routing logic is replicated within each virtual channel module. This eliminates serialized access to the routing logic and leaves only arbitration for the output controllers, which is handled by the crossbar allocator. The routing logic first attempts to route a message along an adaptive channel; failing that, a dimension-order channel is requested. If the packet fails in arbitration, on the next flit cycle the router again first attempts an adaptive channel. The virtual channel module uses counters and status signals from adjacent nodes (see the Freed bits in Figure 7.25) to keep track of buffer availability in adjacent routers. A virtual channel module is eligible to bid for access to the crossbar output only if the channel is routed, buffer space is available on the adjacent node, and there are data to be transmitted. When a fault occurs during transmission, the channel switches to a state where computation for retransmission occurs. After a period of two cycles, the channel switches back to a regular state and competes for access to the crossbar outputs. A header can be routed and the crossbar allocated in two cycles (i.e., 20 ns). The worst-case latency through the router is projected to be eight cycles, or 80 ns.
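The adaptive-first selection policy can be sketched as follows; the function name and the list-based representation of free virtual channels are invented for illustration:

```python
def select_channel(adaptive_free, dimension_order_free):
    """Per-flit-cycle channel selection as described above: prefer an
    adaptive virtual channel, fall back to a dimension-ordered one, and
    on total failure retry (adaptive first) on the next flit cycle."""
    if adaptive_free:
        return ("adaptive", adaptive_free[0])
    if dimension_order_free:
        return ("dimension-order", dimension_order_free[0])
    return None   # lost arbitration; caller re-attempts next flit cycle

print(select_channel([2], [0]))   # ('adaptive', 2)
print(select_channel([], [0]))    # ('dimension-order', 0)
print(select_channel([], []))     # None
```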

Arbitration is limited to the output of the crossbar and is resolved via three packet priority levels: two user priority levels and the highest level reserved for system packets. Starvation is prevented by changing the packet priority to level 3 after an input controller has failed to gain access to an output controller after seven tries. Packet selection within a priority level is randomized.
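A minimal sketch of this arbitration policy, assuming the priority levels and the seven-try starvation threshold described above (the dictionary representation of a request is invented for the example):

```python
import random

def arbitrate(requests):
    """Crossbar output arbitration sketch: the highest-priority request
    wins, ties are broken randomly, and a request that has lost seven
    times is promoted to priority 3 to prevent starvation."""
    top = max(req["priority"] for req in requests)
    winner = random.choice([r for r in requests if r["priority"] == top])
    for req in requests:
        if req is winner:
            req["fails"] = 0
        else:
            req["fails"] += 1
            if req["fails"] >= 7:
                req["priority"] = 3   # system-level priority
    return winner

reqs = [{"priority": 1, "fails": 0}, {"priority": 2, "fails": 0}]
for _ in range(7):
    arbitrate(reqs)          # the priority-2 request wins every round
print(reqs[0]["priority"])   # 3 -- the starved request has been promoted
```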

The implementation of the unique token protocol for reliable transmission necessitates some special handling and architectural support. The token at the end of a message must be forwarded only after the corresponding flit queue has successfully emptied. Retransmission on failure involves generation of duplicate tokens. Finally, flow control must span two routers to ensure that a duplicate copy of each data flit is maintained at all times in adjacent routers. When a data flit is successfully transmitted across a channel, the copy of the data flit two routers upstream must be deallocated. The relevant flow control signals are passed through frames as captured in Figure 7.25.

https://www.sciencedirect.com/science/article/pii/B9781558608528500109

Network-on-chip customizations for message passing interface primitives

In Networks-On-Chip, 2015

9.4.1 Architecture overview

Figure 9.1 shows a block diagram of the proposed implementation architecture with a baseline 8 × 8 mesh topology. We consider a multicore processor chip where each core has a private L1 cache and logically shares a large L2 cache. The L2 cache may be physically distributed on the chip, with one slice associated with each core (essentially forming a tiled chip). As shown in Figure 9.1, the underlying NoC design is the actual medium used to transfer messages, which can be designed with consideration for specialized features of MPI communications. Each node also has an MU between the core and the network interface (NI), which is used to execute corresponding instructions for MPI primitives.

Figure 9.1. The architecture overview of the proposed design.

The latency and bandwidth of the NoC are important factors that affect the efficiency of computations with numerous intercore dependencies. To support MPI efficiently, good service for control messages must be provided. Control messages may be used to signal network barriers and changes in network configuration, as well as the onset or termination of computation. This traffic requires minimal bandwidth but needs very low latency for broadcast (one-to-all or all-to-all) communication. However, the multihop nature of conventional NoCs and their inefficient support for multicast (one-to-many) and broadcast (one-to-all) traffic degrade the performance of such communications. To facilitate the efficient transmission of data and control messages, a customized network is needed. In the proposed design, a hierarchical on-chip network, called VBON, is introduced.

By directly executing the MPI primitives and interrupting service routines, the MU reduces the context switching overhead in the cores and accelerates software processing. The MU also performs the message buffer management as well as the fast buffer copying for the cores. The MU transfers messages to and from dynamically allocated message buffers in the memory to avoid buffer copying between system and user buffers. This process eliminates the need for the sending process to wait for the message buffer to be released by the communication channel. The MU also reserves a set of buffers for incoming messages. With use of the above methods, the long message transmission protocol can be simplified to reduce transmission latency. In the following subsections, we will introduce the architecture of the proposed NoC and MU.
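The MU's eager buffering can be pictured with the small sketch below. This is an illustrative model only, not the proposed hardware: the class and method names are invented, and a real MU would apply back-pressure rather than drop messages when the reserved receive buffers overflow.

```python
from collections import deque

class MessagingUnit:
    """Sketch of MU-style eager buffering: the payload is copied once into
    an MU-managed buffer, so the sending process returns immediately
    instead of waiting for the communication channel."""
    def __init__(self, reserved_buffers=4):
        self.outgoing = deque()   # dynamically allocated send buffers
        # Reserved buffers for incoming messages; oldest dropped on overflow.
        self.incoming = deque(maxlen=reserved_buffers)

    def send(self, payload):
        self.outgoing.append(bytes(payload))   # copy once; sender proceeds
        return True

    def deliver_next(self):
        """Move one buffered message across the network to the receiver."""
        if self.outgoing:
            self.incoming.append(self.outgoing.popleft())

    def receive(self):
        return self.incoming.popleft() if self.incoming else None
```

Because `send` only copies into an MU buffer, the sending process never blocks on the channel, which is the waiting the text says this design eliminates.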

https://www.sciencedirect.com/science/article/pii/B9780128009796000093
