What is the importance of concurrency control? Explain with an example of the timestamp ordering protocol.

In the previous chapter, you learned about lock-based protocols in DBMS, which maintain the integrity of the database. In this chapter, you will learn about the timestamp-based ordering protocol.

  • The timestamp ordering protocol maintains the order of transactions based on their timestamps.
  • A timestamp is a unique identifier created by the DBMS when a transaction enters the system. It can be based on the system clock or on a logical counter maintained by the system.
  • Timestamps identify the older transactions (those that have been waiting longer to be executed) and give them higher priority than newer transactions. This ensures that no transaction stays pending for a long period of time.
  • The protocol also maintains timestamps for the last read and the last write on each data item.
  • For example, suppose an old transaction T1 has timestamp TS(T1), and a new transaction T2 enters the system and is assigned timestamp TS(T2). Since TS(T1) < TS(T2), T1 is given higher priority than T2. This is how the timestamp-based protocol maintains the serializability order.

How does a timestamp ordering protocol work?

Let’s see how a timestamp ordering protocol works in a DBMS. Suppose there is a data item A in the database.

W_TS(A) is the largest timestamp of any transaction that executed the operation write(A) successfully.
R_TS(A) is the largest timestamp of any transaction that executed the operation read(A) successfully.

  1. Whenever a transaction Tn issues a Write(A) operation, the protocol checks the following conditions (a code sketch of both rules appears after this list):
    • If R_TS(A) > TS(Tn) or W_TS(A) > TS(Tn), then abort and roll back transaction Tn and reject the Write(A) operation.
    • Otherwise, i.e., if R_TS(A) <= TS(Tn) and W_TS(A) <= TS(Tn), execute the Write(A) operation of Tn and set W_TS(A) to TS(Tn).
  2. Whenever a transaction Tn issues a Read(A) operation, the protocol checks the following conditions:
    • If W_TS(A) > TS(Tn), then abort and roll back transaction Tn and reject the Read(A) operation.
    • If W_TS(A) <= TS(Tn), then execute the Read(A) operation of Tn and set R_TS(A) to max(R_TS(A), TS(Tn)).
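
Expressed in code, the two rules look roughly as follows. This is a minimal Python sketch under simplified assumptions (timestamps are plain integers starting at 1, a single data item, no scheduler); the names DataItem, read, and write are invented for illustration, not a real DBMS API.

```python
# Minimal sketch of the timestamp-ordering rules above.

class TransactionAborted(Exception):
    """Raised when an operation violates the timestamp order."""

class DataItem:
    def __init__(self):
        self.r_ts = 0      # R_TS(A): largest timestamp of a successful read
        self.w_ts = 0      # W_TS(A): largest timestamp of a successful write
        self.value = None

def write(item, ts, value):
    # Rule 1: reject the write if a younger transaction has already
    # read or written the item.
    if item.r_ts > ts or item.w_ts > ts:
        raise TransactionAborted(f"write by transaction with ts={ts} rejected")
    item.value = value
    item.w_ts = ts

def read(item, ts):
    # Rule 2: reject the read if a younger transaction has already
    # written the item.
    if item.w_ts > ts:
        raise TransactionAborted(f"read by transaction with ts={ts} rejected")
    item.r_ts = max(item.r_ts, ts)
    return item.value

# T1 has ts=1 and T2 has ts=2. Once T2 has read A, a second write by
# T1 arrives "too late" and T1 must be rolled back.
A = DataItem()
write(A, 1, 100)          # T1 writes A; W_TS(A) = 1
read(A, 2)                # T2 reads A; R_TS(A) = 2
try:
    write(A, 1, 200)      # R_TS(A) = 2 > TS(T1) = 1, so T1 is aborted
except TransactionAborted as e:
    print(e)
```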

Advantages of Timestamp based protocol

  • Schedules managed using the timestamp-based protocol are serializable, just like schedules under the two-phase locking protocols.
  • Since transactions never wait for one another (a conflicting transaction is rolled back and restarted instead), this protocol is free from deadlock.

Concurrency Control

Jan L. Harrington, in Relational Database Design and Implementation (Fourth Edition), 2016

Solution #3: Multiversion Concurrency Control (Timestamping)

Multiversion concurrency control, or timestamping, is a concurrency control method that does not rely on locking. Instead, it assigns a timestamp to each piece of data retrieved by a transaction and uses the chronological ordering of the timestamps to determine whether an update will be permitted.

Each time a transaction reads a piece of data, it receives a timestamp on that data. An update of the data will be permitted as long as no other transaction holds an earlier timestamp on the data. Therefore, only the transaction holding the earliest timestamp will be permitted to update, although any number of transactions can read the data.
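
As a rough illustration of that rule, here is a toy Python sketch. All names are invented, and a real multiversion system would keep several versions of each item rather than a single value; the point is only the check that the updater holds the earliest outstanding timestamp.

```python
# Toy sketch: many readers may hold timestamps on a data item, but only
# the holder of the earliest timestamp may update it.

class VersionedItem:
    def __init__(self, value):
        self.value = value
        self.reader_ts = {}  # transaction id -> timestamp of its read

    def read(self, txn_id, ts):
        self.reader_ts[txn_id] = ts
        return self.value

    def try_update(self, txn_id, new_value):
        ts = self.reader_ts[txn_id]
        # Permit the update only if no other transaction holds an
        # earlier timestamp on this item.
        if any(other != txn_id and t < ts
               for other, t in self.reader_ts.items()):
            return False  # caller must restart the transaction
        self.value = new_value
        self.reader_ts.clear()
        return True

item = VersionedItem(10)
item.read("T1", ts=1)
item.read("T2", ts=2)
print(item.try_update("T2", 20))  # False: T1 holds an earlier timestamp
print(item.try_update("T1", 30))  # True: T1 holds the earliest timestamp
```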

Timestamping is efficient in environments where most of the database activity is retrieval because nothing blocks retrieval. However, as the proportion of update transactions increases, so does the number of transactions that are prevented from updating and must be restarted.

URL: https://www.sciencedirect.com/science/article/pii/B9780128043998000223

Computer Data Processing Hardware Architecture

Paul J. Fortier, Howard E. Michel, in Computer Systems Performance Evaluation and Prediction, 2003

2.10.4 Concurrency control manager

The concurrency control manager coordinates the actions of interactive access to the database by concurrently running transactions. The goal of concurrency control is to coordinate execution so that the VIEW or effect from the database's perspective is the same as if the concurrently executing transactions were executed in a serial fashion. This scheme is referred to as the serializable execution of transactions. Concurrency control's serializability theory has two basic modes: the simplest concerns the serializable execution of the read and write sets from conflicting transactions and is based on locking, timestamp ordering, or optimistic read and write conflict resolution. The second concurrency control concept is more complex and uses semantic knowledge of a transaction's execution to aid in coordination. The major difference is that the granularity of the serialization operator is not the read and write but rather complex functions and procedures as well as complex data objects. The criterion of correct execution, however, remains serialization across concurrent transactions.

URL: https://www.sciencedirect.com/science/article/pii/B9781555582609500023

Storage Systems

Dan C. Marinescu, in Cloud Computing, 2013

8.4 General Parallel File System

Parallel I/O implies execution of multiple input/output operations concurrently. Support for parallel I/O is essential to the performance of many applications [236]. Therefore, once distributed file systems became ubiquitous, the natural next step in the evolution of the file system was to support parallel access. Parallel file systems allow multiple clients to read and write concurrently from the same file.

Concurrency control is a critical issue for parallel file systems. Several semantics for handling the shared access are possible. For example, when the clients share the file pointer, successive reads issued by multiple clients advance the file pointer; another semantic is to allow each client to have its own file pointer. Early supercomputers such as the Intel Paragon took advantage of parallel file systems to support applications based on the same program, multiple data (SPMD) paradigm.

The General Parallel File System (GPFS) [317] was developed at IBM in the early 2000s as a successor to the TigerShark multimedia file system [159]. GPFS is a parallel file system that emulates closely the behavior of a general-purpose POSIX system running on a single system. GPFS was designed for optimal performance of large clusters; it can support a file system of up to 4 PB consisting of up to 4,096 disks of 1 TB each (see Figure 8.6).

Figure 8.6. A GPFS configuration. The disks are interconnected by a SAN and the compute servers are distributed in four LANs, LAN1–LAN4. The I/O nodes/servers are connected to LAN1.

The maximum file size is (2^63 − 1) bytes. A file consists of blocks of equal size, ranging from 16 KB to 1 MB, striped across several disks. The system can support not only very large files but also a very large number of files. The directories use extensible hashing techniques to access a file. The system maintains user data, file metadata such as the time the file was last modified, and file system metadata such as allocation maps. Metadata, such as file attributes and data block addresses, is stored in inodes and indirect blocks.

Reliability is a major concern in a system with many physical components. To recover from system failures, GPFS records all metadata updates in a write-ahead log file. Write-ahead means that updates are written to persistent storage only after the log records have been written. For example, when a new file is created, a directory block must be updated and an inode for the file must be created. These records are transferred from cache to disk after the log records have been written. If the directory block is written but the I/O node fails before writing the inode, the system ends up in an inconsistent state, and the log file allows the system to re-create the inode record.

The log files are maintained by each I/O node for each file system it mounts; thus, any I/O node is able to initiate recovery on behalf of a failed node. Disk parallelism is used to reduce access time. Multiple I/O read requests are issued in parallel and data is prefetched in a buffer pool.

Data striping allows concurrent access and improves performance but can have unpleasant side-effects. Indeed, when a single disk fails, a large number of files are affected. To reduce the impact of such undesirable events, the system attempts to mask a single disk failure or the failure of the access path to a disk. The system uses RAID devices with the stripes equal to the block size and dual-attached RAID controllers. To further improve the fault tolerance of the system, GPFS data files as well as metadata are replicated on two different physical disks.

Consistency and performance, critical to any distributed file system, are difficult to balance. Support for concurrent access improves performance but faces serious challenges in maintaining consistency. In GPFS, consistency and synchronization are ensured by a distributed locking mechanism; a central lock manager grants lock tokens to local lock managers running in each I/O node. Lock tokens are also used by the cache management system.

Lock granularity has important implications in the performance of a file system, and GPFS uses a variety of techniques for various types of data. Byte-range tokens are used for read and write operations to data files as follows: the first node attempting to write to a file acquires a token covering the entire file, [0, ∞]. This node is allowed to carry out all reads and writes to the file without any need for permission until a second node attempts to write to the same file. Then the range of the token given to the first node is restricted. More precisely, if the first node writes sequentially at offset fp1 and the second one at offset fp2 > fp1, the ranges of the two tokens become [0, fp2] and [fp2, ∞], respectively, and the two nodes can operate concurrently without further negotiation. Byte-range tokens are rounded to block boundaries.
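
A toy Python sketch of this token split follows, with invented names; real GPFS token negotiation involves the required and desired ranges discussed below and is considerably more elaborate.

```python
# Toy version of the byte-range token split: node 1 holds [0, inf]
# until node 2 asks to write at offset fp2, at which point the range
# is divided so both nodes can proceed without further negotiation.

INF = float("inf")

def split_token(fp1, fp2):
    """Node 1 writes sequentially at offset fp1, node 2 at fp2 > fp1.
    Node 1's whole-file token is restricted to [0, fp2] and node 2
    receives [fp2, INF]."""
    assert fp2 > fp1
    return (0, fp2), (fp2, INF)

token1, token2 = split_token(fp1=4096, fp2=1_048_576)
print(token1)  # (0, 1048576)
print(token2)  # (1048576, inf)
```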

Byte-range token negotiations among nodes use the required range and the desired range for the offset and for the length of the current and future operations, respectively. Data shipping, an alternative to byte-range locking, allows fine-grained data sharing. In this mode the file blocks are controlled by the I/O nodes in a round-robin manner. A node forwards a read or write operation to the node controlling the target block, the only one allowed to access the file.

A token manager maintains the state of all tokens; it creates and distributes tokens, collects tokens once a file is closed, and downgrades or upgrades tokens when additional nodes request access to a file. Token management protocols attempt to reduce the load placed on the token manager; for example, when a node wants to revoke a token, it sends messages to all the other nodes holding the token and forwards the reply to the token manager.

Access to metadata is synchronized. For example, when multiple nodes write to the same file, the file size and the modification dates are updated using a shared write lock to access an inode. One of the nodes assumes the role of a metanode, and all updates are channeled through it. The file size and the last update time are determined by the metanode after merging the individual requests. The same strategy is used for updates of the indirect blocks. GPFS global data such as access control lists (ACLs), quotas, and configuration data are updated using the distributed locking mechanism.

GPFS uses disk maps to manage the disk space. The GPFS block size can be as large as 1 MB, and a typical block size is 256 KB. A block is divided into 32 subblocks to reduce disk fragmentation for small files; thus, the block map has 32 bits to indicate whether a subblock is free or used. The system disk map is partitioned into n regions, and each disk map region is stored on a different I/O node. This strategy reduces conflicts and allows multiple nodes to allocate disk space at the same time. An allocation manager running on one of the I/O nodes is responsible for actions involving multiple disk map regions. For example, it updates free space statistics and helps with deallocation by sending periodic hints of the regions used by individual nodes.
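
As a toy illustration of the arithmetic here, assuming a 256 KB block divided into 32 subblocks of 8 KB each, a single 32-bit integer can serve as the per-block allocation map. The function name is invented; real GPFS allocation maps are far more elaborate.

```python
# One bit per subblock: 32 subblocks per block fit in a 32-bit map.

SUBBLOCKS_PER_BLOCK = 32

def allocate_subblock(bitmap):
    """Return (new_bitmap, index) for the first free subblock,
    or None if the block is full."""
    for i in range(SUBBLOCKS_PER_BLOCK):
        if not bitmap & (1 << i):
            return bitmap | (1 << i), i
    return None

bitmap = 0b0111                       # subblocks 0-2 already in use
bitmap, idx = allocate_subblock(bitmap)
print(idx, bin(bitmap))               # 3 0b1111
```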

A detailed discussion of system utilities and the lessons learned from the deployment of the file system at several installations in 2002 can be found in [317]; the documentation of the GPFS is available from [177].

URL: https://www.sciencedirect.com/science/article/pii/B9780124046276000087

Transactions and Concurrency Control

Joe Celko, in Joe Celko's SQL for Smarties (Fifth Edition), 2015

2.5 Pessimistic Concurrency Control

Pessimistic concurrency control is based on the idea that transactions are expected to conflict with each other, so we need to design a system to avoid the problems before they start.

All pessimistic concurrency control schemes use locks. A lock is a flag placed in the database that gives exclusive access to a schema object to one user. Imagine an airplane toilet door, with its “occupied” sign.

The differences among schemes lie in the level of locking they use; setting those flags on and off costs time and resources. If you lock the whole database, then you have a serial batch processing system, since only one transaction at a time is active. In practice you would do this only for system maintenance work on the whole database. If you lock at the table level, then performance can suffer because users must wait for the most common tables to become available. However, there are transactions which do involve the whole table, and those will use only one flag.

If you lock the table at the row level, then other users can get to the rest of the table and you will have the best possible shared access. You will also have a huge number of flags to process and performance will suffer. This approach is generally not practical.

Page locking is in between table and row locking. This approach puts a lock on subsets of rows within the table, which include the desired values. The name comes from the fact that this is usually implemented with pages of physical disk storage. Performance depends on the statistical distribution of data in physical storage, but it is generally a good compromise.

URL: https://www.sciencedirect.com/science/article/pii/B9780128007617000024

Transactions and Concurrency Control

Joe Celko, in Joe Celko's SQL for Smarties (Fourth Edition), 2011

2.4 Pessimistic Concurrency Control

Pessimistic concurrency control is based on the idea that transactions are expected to conflict with each other, so we need to design a system to avoid the problems before they start.

All pessimistic concurrency control schemes use locks. A lock is a flag placed in the database that gives exclusive access to a schema object to one user. Imagine an airplane toilet door, with its “occupied” sign.

But again, you will find different kinds of locking schemes. For example, DB2 for z/OS has “latches” that are a little different from traditional locks. The important differences are the level of locking they use; setting those flags on and off costs time and resources. If you lock the whole database, then you have a serial batch processing system, since only one transaction at a time is active. In practice you would do this only for system maintenance work on the whole database. If you lock at the table level, then performance can suffer because users must wait for the most common tables to become available. However, there are transactions that do involve the whole table, and this will use only one flag.

If you lock the table at the row level, then other users can get to the rest of the table and you will have the best possible shared access. You will also have a huge number of flags to process and performance will suffer. This approach is generally not practical.

Page locking is in between table and row locking. This approach puts a lock on subsets of rows within the table, which include the desired values. The name comes from the fact that this is usually implemented with pages of physical disk storage. Performance depends on the statistical distribution of data in physical storage, but it is generally a good compromise.

URL: https://www.sciencedirect.com/science/article/pii/B9780123820228000028

Streaming Databases and Complex Events

Joe Celko, in Joe Celko’s Complete Guide to NoSQL, 2014

5.1.1 Optimistic Concurrency

Optimistic concurrency control assumes that conflicts are exceptional and we have to handle them after the fact. The model for optimistic concurrency is microfilm! Most database people today have not even seen microfilm, so, if you have not, you might want to Google it. This approach predates databases by decades. It was implemented manually in the central records department of companies when they started storing data on microfilm. A user did not get the microfilm, but instead the records manager made a timestamped photocopy for him. The user took the copy to his desk, marked it up, and returned it to the central records department. The central records clerk timestamped the updated document, photographed it, and added it to the end of the roll of microfilm.

But what if a second user, user B, also went to the central records department and got a timestamped photocopy of the same document? The central records clerk had to look at both timestamps and make a decision. If user A attempted to put his updates into the database while user B was still working on her copy, then the clerk had to either hold the first copy, wait for the second copy to show up, or return the second copy to user A. When both copies were in hand, the clerk stacked the copies on top of each other, held them up to the light, and looked to see if there were any conflicts. If both updates could be made to the database, the clerk did so. If there were conflicts, the clerk must either have rules for resolving the problems or reject both transactions. This represents a kind of row-level locking, done after the fact.

The copy has a timestamp on it; call it t0 or start_timestamp. The changes are committed by adding the new version of the data to the end of the file with a timestamp, t1, that is unique within the system. Since modern machinery can work with nanoseconds, an actual timestamp, and not just a sequential numbering, will work. If you want to play with this model, you can get a copy of Borland's InterBase or its open-source offshoot, Firebird.
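
A bare-bones Python sketch of this scheme: each transaction remembers the timestamp of the version it copied (t0), and at commit time the update is rejected if someone else has committed a newer version in the meantime. The class and method names are invented for illustration.

```python
import itertools

class VersionStore:
    _clock = itertools.count(1)        # shared commit-timestamp generator

    def __init__(self, value):
        self.versions = [(0, value)]   # list of (commit_timestamp, value)

    def begin_read(self):
        """Hand out a 'photocopy': the latest value plus its timestamp t0."""
        ts, value = self.versions[-1]
        return ts, value

    def try_commit(self, t0, new_value):
        """Append a new version unless someone committed after t0."""
        latest_ts, _ = self.versions[-1]
        if latest_ts > t0:
            return False               # conflict: transaction must restart
        self.versions.append((next(self._clock), new_value))
        return True

doc = VersionStore("draft")
t0_a, _ = doc.begin_read()             # user A takes a timestamped copy
t0_b, _ = doc.begin_read()             # user B copies the same document
print(doc.try_commit(t0_a, "A's edits"))   # True: first commit wins
print(doc.try_commit(t0_b, "B's edits"))   # False: B must redo her work
```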

URL: https://www.sciencedirect.com/science/article/pii/B9780124071926000054

Database Administration

Ming Wang, in Encyclopedia of Information Systems, 2003

III.C.2. Timestamping

Timestamping methods for concurrency control are quite different from locking methods. No locks are involved, and therefore there can be no deadlock. Locking methods generally prevent conflicts by making transactions wait. With timestamp methods, there is no waiting. Transactions involved in conflict are simply rolled back and restarted.

A timestamp is a unique identifier created by the DBMS that indicates the relative starting time of a transaction. Timestamps can be generated by simply using the system clock at the time the transaction starts, or by incrementing a logical counter every time a new transaction starts.

With timestamping, a transaction that attempts to read or write a data item is allowed to proceed only if the last update on that data item was carried out by an older transaction. Otherwise, the transaction requesting the read/write is restarted and given a new timestamp, to prevent it from being continually aborted and restarted. Timestamping ensures that conflicting operations are executed in a serializable order.

URL: https://www.sciencedirect.com/science/article/pii/B0122272404000253

Distributed Databases

M. Tamer Özsu, in Encyclopedia of Information Systems, 2003

VI. Distributed Concurrency Control

Whenever multiple users access (read and write) a shared database, these accesses need to be synchronized to ensure database consistency. The synchronization is achieved by means of concurrency control algorithms that enforce a correctness criterion such as serializability. User accesses are encapsulated as transactions, whose operations at the lowest level are a set of read and write operations to the database. Concurrency control algorithms enforce the isolation property of transaction execution, which states that the effects of one transaction on the database are isolated from other transactions until the first completes its execution.

The most popular concurrency control algorithms are locking-based. In such schemes, a lock, in either shared or exclusive mode, is placed on some unit of storage (usually a page) whenever a transaction attempts to access it. These locks can be of two types: shared, indicating that more than one transaction is allowed to access the data, and exclusive, indicating that the transaction needs to be the only one accessing the data. Shared locks are also called read locks, since multiple transactions can read the same data unit, while exclusive locks are also called write locks, since two transactions cannot revise the values of the data unit concurrently. The locks are placed according to lock compatibility rules such that read-write, write-read, and write-write conflicts are avoided. The compatibility rules are the following:

1. If transaction T1 holds a shared lock on data unit D1, transaction T2 can also obtain a shared lock on D1 (no conflict).

2. If transaction T1 holds a shared lock on data unit D1, transaction T2 cannot obtain an exclusive lock on D1 (read-write conflict).

3. If transaction T1 holds an exclusive lock on data unit D1, transaction T2 cannot obtain a shared lock (write-read conflict) or an exclusive lock (write-write conflict) on D1.
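
These three rules collapse into a small compatibility matrix: only shared-shared is compatible. Below is a short illustrative Python sketch with invented names, not any particular DBMS's lock manager.

```python
SHARED, EXCLUSIVE = "S", "X"

COMPATIBLE = {
    (SHARED, SHARED): True,         # rule 1: no conflict
    (SHARED, EXCLUSIVE): False,     # rule 2: read-write conflict
    (EXCLUSIVE, SHARED): False,     # rule 3: write-read conflict
    (EXCLUSIVE, EXCLUSIVE): False,  # rule 3: write-write conflict
}

def can_grant(held_modes, requested):
    """Grant `requested` on a data unit only if it is compatible with
    every lock already held by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant([SHARED], SHARED))     # True
print(can_grant([SHARED], EXCLUSIVE))  # False
print(can_grant([EXCLUSIVE], SHARED))  # False
```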

It is a well-known theorem that if lock actions on behalf of concurrent transactions obey a simple rule, then it is possible to ensure the serializability of these transactions: “No lock on behalf of a transaction should be set once a lock previously held by the transaction is released.” This is known as two-phase locking, since transactions go through a growing phase, when they obtain locks, and a shrinking phase, when they release locks. In general, releasing locks prior to the end of a transaction is problematic. Thus, most locking-based concurrency control algorithms are strict in that they hold on to their locks until the end of the transaction.
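
The two-phase rule quoted above is easy to check mechanically. The sketch below, with invented names, verifies that a transaction's sequence of lock and unlock actions never acquires a lock after a release.

```python
def is_two_phase(ops):
    """ops is a sequence of ("lock", item) / ("unlock", item) actions
    issued by one transaction, in order."""
    released = False
    for action, _item in ops:
        if action == "unlock":
            released = True
        elif action == "lock" and released:
            return False  # growing phase resumed after shrinking began
    return True

print(is_two_phase([("lock", "x"), ("lock", "y"),
                    ("unlock", "x"), ("unlock", "y")]))  # True
print(is_two_phase([("lock", "x"), ("unlock", "x"),
                    ("lock", "y")]))                     # False
```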

In distributed DBMSs, the challenge is to extend both the serializability argument and the concurrency control algorithms to the distributed execution environment. In these systems, the operations of a given transaction may execute at multiple sites where they access data. In such a case, the serializability argument is more difficult to specify and enforce. The complication is due to the fact that the serialization order of the same set of transactions may be different at different sites. Therefore, the execution of a set of distributed transactions is serializable if and only if the execution of the set of transactions at each site is serializable, and the serialization orders of these transactions at all these sites are identical.

Distributed concurrency control algorithms enforce this notion of global serializability. In locking-based algorithms there are three alternative ways of enforcing global serializability: centralized locking, primary copy locking, and distributed locking.

In centralized locking, there is a single lock table for the entire distributed database. This lock table is placed at one of the sites, under the control of a single lock manager. The lock manager is responsible for setting and releasing locks on behalf of transactions. Since all locks are managed at one site, this is similar to centralized concurrency control and it is straightforward to enforce the global serializability rule. These algorithms are simple to implement, but suffer from two problems. The central site may become a bottleneck, both because of the amount of work it is expected to perform and because of the traffic that is generated around it; and the system may be less reliable since the failure or inaccessibility of the central site would cause system unavailability. Primary copy locking is a concurrency control algorithm that is useful in replicated databases where there may be multiple copies of a data item stored at different sites. One of the copies is designated as a primary copy, and it is this copy that has to be locked in order to access that item. All the sites know the set of primary copies for each data item in the distributed system, and the lock requests on behalf of transactions are directed to the appropriate primary copy. If the distributed database is not replicated, primary copy locking degenerates into a distributed locking algorithm.

In distributed (or decentralized) locking, the lock management duty is shared by all the sites in the system. The execution of a transaction involves the participation and coordination of lock managers at more than one site. Locks are obtained at each site where the transaction accesses a data item. Distributed locking algorithms do not have the overhead of centralized locking ones. However, both the communication overhead to obtain all the locks and the complexity of the algorithm are greater.

One side effect of all locking-based concurrency control algorithms is that they can cause deadlocks. The detection and management of deadlocks in a distributed system is difficult. Nevertheless, the relative simplicity and better performance of locking algorithms make them more popular than alternatives such as timestamp-based algorithms or optimistic concurrency control.

URL: https://www.sciencedirect.com/science/article/pii/B0122272404000460

Distributed Information Resources

J.B. Lim, A.R. Hurson, in Advances in Computers, 1999

4 The MDAS Environment

There are similarities in the objectives of effectively accessing data in a multidatabase and a wireless-mobile computing environment. This chapter proposes to superimpose a wireless-mobile computing environment on an MDBMS to realize a system capable of effectively accessing data over a wireless medium. This new system is called a mobile data access system (MDAS). By superimposing an MDBMS onto a mobile computing environment, one should be able to map solutions easily from one environment to another. This section will discuss the issues needed to support this new computing environment. Furthermore, we will discuss the structure of the summary schemas model (SSM), a heterogeneous multidatabase environment [11], and show how the SSM is used as the underlying multidatabase environment in an MDAS.

A summary of the issues facing a multidatabase and those involved in a mobile system is given in Table V. Both systems have autonomy and heterogeneity requirements, where the mobility of a system introduces more complexity. The objective of either system is to provide access to data, where the clients and servers of an MDBMS are typically connected through fixed network connections, and the clients in a mobile environment are typically connected through a wireless connection. Both systems must address the data, software, and hardware heterogeneity issues as discussed in sections 2.2 and 2.3. The larger number of potential data sources, the mobility, and the resource constraints in a mobile environment further complicate access to the data. The literature has addressed the heterogeneity issues in an MDBMS [10, 11]. However, these issues have not been addressed in a wireless-mobile computing environment. Traditionally, wireless-mobile computing researchers have investigated and addressed wireless connection, mobility, and portability issues, but have been inclined to ignore many of the issues related to heterogeneity. Consequently, the combined solutions from an MDBMS and a wireless-mobile system should be developed to form an integral part of any MDAS.

Table V. Multidatabase and Mobile System Comparison

Requirement | Mobile system | Multidatabase system
Site autonomy
Heterogeneous interoperability
Transaction management and concurrency control
Disconnect and weak connection support
Support for resource scarce systems
Distribution transparency
Location transparency
Location dependency
System transparency
Representation transparency
Intelligent search and browsing of data
Intelligent query resolution

Key: ✓ Required; ● Desirable; ○ Optional; ✗ Not required

Autonomy of a system implies that the system should have complete control over the local data and resources, and be able to operate independently. In a multidatabase system, this is referred to as site autonomy, where a local DBMS is autonomous with respect to other systems in the MDBMS. In a mobile system, autonomy refers to the mobile user/application, where the level of autonomy is a function of the available resources (network, processing, storage, etc.). The level of autonomy also varies depending upon the mobile awareness of a particular application, and the support provided by the system. The quality of the wireless/fixed network connection and the processing capacity of the hardware are the primary factors in determining the level of application-autonomy that is required. This type of variable autonomy is only possible if the system and application support this functionality. Some systems may only provide partial application autonomy, or may not even provide any support for this functionality. An MDAS should support both site-level and application-level autonomy.

Schema integration issues include data representation, system, and location transparency issues. As with heterogeneity, these issues have been extensively researched in multidatabase systems [10, 11]. A primary goal of an MDBMS is to present the user with an integrated, global schema of the data. However, in a wireless-mobile computing environment, researchers have overlooked the importance of schema integration in a data access system. Particularly since mobility tends to increase the degree of heterogeneous data available, an MDAS must address schema integration issues in order to present the user with a viable solution for accessing heterogeneous data. Furthermore, mobility introduces an additional challenge in that it may be desirable to have location dependence when accessing data. In such instances, the content and representation of the data could actually depend upon the location of the user accessing the data.

Query processing issues are well understood in an MDBMS. A global query is submitted to the system, and the query is decomposed into a set of sub-queries, one for each local DBMS involved in the transaction. In a mobile environment, where the processing power, storage, and energy may be restricted, query processing is non-trivial. If a mobile unit has sufficient resources to perform the query processing, then the query in the MDAS could be processed and executed similar to a query in an MDBMS. However, if the resources are limited, then the processing should be performed by a fixed, more resourceful computing device in the MDAS. One of the disadvantages of this method is that there may be an increase in the network traffic, which poses a problem in a wireless connection. Different strategies to address these issues include object-oriented designs, dynamic adjustment to bandwidth changes, data distillation, and query processing on fixed network devices [10, 23, 44].

Effectively accessing the data in a heterogeneous environment may require an efficient means of searching/browsing the data, in addition to an efficient mechanism to resolve and process a user’s query. In a mobile environment, this may be more difficult to realize due to network, storage, processing power, and energy restrictions. Similar to query processing, the processing could be performed by a fixed, more resourceful computing device in the MDAS if the local host does not have the resources to search/browse data. Furthermore, network traffic increases depending upon the storage capacity of the mobile unit. The lower the storage space of the local unit, the greater the likelihood of generating network traffic. In other words, if the local node could store more information and data about the global schema and data, additional local processing (and hence less network traffic) could be achieved.

Transaction processing and concurrency control is an important, yet extremely challenging aspect of data processing. MDBMS researchers have been faced with the problem of maintaining serializability for global transactions. The problem of maintaining serializability in a multidatabase is complicated by the presence of local transactions that are invisible at the global level. There are two methods used to maintain serializability in an MDBMS:

1. Bottom-up approach. The global serializability is verified by collecting local information from the local DBMSs and validating the serialization orders at the global level. The global scheduler is responsible for detecting and resolving incompatibilities between global transactions and local serialization orders. Optimistic concurrency control mechanisms are usually used with the bottom-up approach. This optimistic nature allows for a higher degree of concurrency among transactions. However, the higher throughput is achieved at the expense of lower resource utilization and more overhead due to rollbacks from failed transactions.

2. Top-down approach. The global scheduler is allowed to determine the serialization order of global transactions before they are submitted to the local sites. The local DBMS must then enforce this order at the local site. This method is a pessimistic approach and consequently leads to a potentially lower degree of concurrency. It forces the global order on the local schedulers. Consequently, runtime decisions regarding the ordering of transactions are not needed.

In an MDAS environment, the system should be able to provide global serializability and local autonomy to a user using a wireless connection. The restrictions imposed by a wireless connection have led to the use of optimistic concurrency control schemes in mobile-wireless environments [22, 25, 36, 39]. In addition, an application in an MDAS may be required to use both weak and strong consistency rules, where the application is required to adapt to changing environmental conditions. An operation uses weak consistency guidelines when data in a write operation is updated or written without immediate confirmation, and data in a read operation is based upon an approximately accurate value. In MDBMSs, weak consistency is used to increase global transaction throughput. In an MDAS, weak consistency may be required due to disconnection and/or a weak network connection.

Mobility and some of its consequences—e.g., disconnection and weak connections (communication restrictions), processing, storage, display, and energy restrictions—introduce additional complexities when a user accesses data. A local cache and prefetching in a mobile unit have been extensively used to address the problems associated with disconnection and weak connections. The idea is that when a disconnection occurs, the mobile unit operates in an autonomous state while performing operations on the local cache. When the connection is re-established, a resynchronization between the cache in the local unit and the server occurs [4, 15, 47]. Various prefetch schemes have been used to ensure that the required data is available in the cache during a disconnection [4, 36, 44, 46, 51]. Additionally, some type of queuing mechanism is usually provided in order to perform operations on data that may not be contained in the cache [4]. Predictive schemes, where the system is actually able to anticipate a disconnection, are used to lessen the impact of a disconnection. Finally, broadcasting of data on wireless channels has been suggested in order to reduce network traffic [52].

Finally, processing power and display limitations in a mobile unit introduce additional challenges to an MDAS. Offloading the processing performed on the local unit to fixed hosts is commonly used in wireless-mobile environments. Data distillation is commonly used to address the display and network limitations of a mobile unit. Many mobile units are not capable of displaying multimedia data (video, images, sound, etc.). Data distillation is a process where incoming data is “distilled,” or processed, such that only the portions of the data that the unit is capable of displaying are shown on the screen. Furthermore, if network bandwidth is limited, data distillation is used to reduce the network traffic by distilling video, images, or sound. In order to address the limitations inherent in a mobile unit, an MDAS should use some or all of these aforementioned methods. A summary of the issues and possible solutions in an MDAS is given in Table VI.

Table VI. Summary of Issues and Solutions for an MDAS

Characteristic | Issues | Solutions
Site autonomy | Autonomy is required in an MDAS system. | Provide both site-level and application-level autonomy.
Heterogeneous interoperability | Heterogeneous interoperability is required in an MDAS system. | Use traditional methods for heterogeneity from MDBMSs.
Transaction management and concurrency control | Provide global serializability to transactions. | Use a bottom-up approach with optimistic concurrency control.
Disconnect and weak connection support | A wireless medium results in lower bandwidth and disconnections. | Local cache, prefetching, and broadcasts.
Support for resource-scarce systems | Limited processing power, storage, energy, and display. | Object-oriented design, multi-tiered architecture, offloading processing to fixed hosts, data distillation, and broadcasting.
Distribution, location, system, and representation transparency; location dependency | Schema integration issues, with greater impact due to mobility. | Use traditional methods for schema integration from MDBMSs.
Intelligent search and browsing of data | Limited processing power, storage, energy, and display. | Reduction of local data storage requirements, object-oriented design, multi-tiered architecture, and offloading processing to fixed hosts.
Intelligent query resolution | Limited processing power, storage, energy, and display. | Object-oriented design, multi-tiered architecture, offloading processing to fixed hosts, and data distillation.

4.1 Summary Schemas Model for Multidatabase Systems

Accessing a heterogeneous multidatabase system is a challenging problem. Multidatabase language and global schema systems suffer from inefficiencies and scalability problems. The SSM has been proposed as an efficient means to access data in a heterogeneous multidatabase environment [11]. The SSM primarily acts as a backbone to a multidatabase for query resolution. It uses a hierarchical meta structure that provides an incrementally concise view of the data in the form of summary schemas. The hierarchical data structure of the SSM consists of leaf nodes and summary schema nodes. Each leaf node represents a portion of a local database that is globally shared. The summary schema nodes provide a more concise view of the data by summarizing the schema of each child node. Figure 5 depicts the architecture of the SSM. The terms in the schemas are related through synonym, hypernym and hyponym links [11]. Synonyms are words with a similar definition, and are used to link terms within the same level of the hierarchy. A hypernym is a word that has a more comprehensive and general definition, whereas a hyponym is a word that has a more precise definition. Hypernyms and hyponyms are used to establish links between terms of parent and child nodes.

Fig. 5. Summary Schemas Model, N levels, and M local nodes.

The SSM intelligently resolves user queries [11]. When the user knows the precise access terms of a local database, the query is directly submitted to the corresponding local data source. However, when the precise access terms and/or the location of the data are not known to the user, the SSM processes the query in the user’s own terms and submits a global query to the system. A simulation and benchmark of the SSM model was performed, and the benefits are shown in [10, 11, 14]. The simulator compared and contrasted the costs of querying a multidatabase system using precise queries (access terms and locations are known) against the intelligent query processing using the SSM for imprecise queries (access terms and locations are unknown). The results showed that the intelligent query processing of the SSM and an exact query incurred very comparable costs (i.e., there were only small overhead costs to using the SSM) [11]. Interestingly, intelligent SSM query processing actually outperformed an exact query in certain circumstances [14]. These results are very relevant to an MDAS in that a user would most likely require this functionality while accessing heterogeneous data sources.

The SSM provides several benefits to the user over traditional multidatabase systems, which can be directly applied to an MDAS. These include:

The SSM provides global access to data without requiring precise knowledge of local access terms or local views. The system can intelligently process a user’s query in his/her own terms.

The SSM’s hierarchical structure of hypernym/hyponym relationships produces incrementally concise views of the global data. The overall memory requirements for the SSM, compared to the requirements of a global schema, are drastically reduced by up to 94% [14]. Subsequently, the SSM could be kept in main memory, thus reducing the access time and query processing time. Furthermore, for very resource limited devices in an MDAS, only portions of the upper levels of the SSM meta data structure could be stored locally, which would still provide a global view (albeit less detailed) of the system.

The SSM can be used to browse/view global data in the multidatabase system. The user can either (1) follow semantic links in a summary schemas node, or (2) query the system for terms that are similar to the user’s access terms. In either case, the SSM could be used to browse data by “stepping” through the hierarchy, or view semantically similar data through queries. Moreover, for resource limited devices in an MDAS, portions of the upper levels of the SSM could be used to provide a global view (albeit less detailed) of the system, which could be used for browsing/searching.

URL: https://www.sciencedirect.com/science/article/pii/S0065245808600194

Locking

Philip A. Bernstein, Eric Newcomer, in Principles of Transaction Processing (Second Edition), 2009

Equivalence of Histories

When we design a concurrency control algorithm, we need to show that every execution of transactions that is permitted by the algorithm has the same effect as a serial execution. So to start, we need a formal model of an execution of transactions. As in Section 6.1, we model an execution as a history, which is a sequence of the read, write, and commit operations issued by different transactions. To simplify matters, we do not consider aborted transactions in this analysis, although they can be included with some modest additional complexity to the theory. For clarity, we do include commit operations, denoted by ci for transaction Ti.

We formalize the concept of “has the same effect as” by the concept of equivalence between two histories. Informally, we say that two histories are equivalent if each transaction reads the same input in both histories and the final value of each data item is the same in both histories. Formally, we say that two histories are equivalent if they have the same operations and conflicting operations are in the same order in both histories. This captures the informal notion of “has the same effect as” because changing the relative order of conflicting operations is the only way to affect the result of two histories that have the same operations. For example, the following histories are equivalent:

H1 = r1[x] r2[x] w1[x] c1 w2[y] c2
H2 = r2[x] r1[x] w1[x] c1 w2[y] c2
H3 = r2[x] r1[x] w2[y] c2 w1[x] c1
H4 = r2[x] w2[y] c2 r1[x] w1[x] c1

But none of them are equivalent to

H5 = r1[x] w1[x] c1 r2[x] w2[y] c2

The reason is that r2[x] and w1[x] conflict, and r2[x] precedes w1[x] in H1–H4, but r2[x] follows w1[x] in H5.

We model a serial execution as a serial history, which is a history in which the operations of different transactions are not interleaved. For example, H4 and H5 are serial histories, but H1–H3 are not.
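
The equivalence test defined above can be phrased directly in code. In the sketch below, an operation is a tuple such as ("r", 1, "x") for r1[x] or ("c", 1, None) for c1; the representation and function names are invented for illustration, not from the book.

```python
def conflict_order(history):
    """Return the set of ordered conflicting pairs in a history:
    pairs of operations on the same item, by different transactions,
    at least one of which is a write."""
    pairs = set()
    for i, a in enumerate(history):
        for b in history[i + 1:]:
            if (a[0] in "rw" and b[0] in "rw"   # both data operations
                    and a[1] != b[1]            # different transactions
                    and a[2] == b[2]            # same data item
                    and "w" in (a[0], b[0])):   # at least one write
                pairs.add((a, b))
    return pairs

def equivalent(h1, h2):
    # Same operations, and every conflicting pair in the same order.
    return set(h1) == set(h2) and conflict_order(h1) == conflict_order(h2)

H1 = [("r", 1, "x"), ("r", 2, "x"), ("w", 1, "x"), ("c", 1, None),
      ("w", 2, "y"), ("c", 2, None)]
H4 = [("r", 2, "x"), ("w", 2, "y"), ("c", 2, None),
      ("r", 1, "x"), ("w", 1, "x"), ("c", 1, None)]
H5 = [("r", 1, "x"), ("w", 1, "x"), ("c", 1, None),
      ("r", 2, "x"), ("w", 2, "y"), ("c", 2, None)]

print(equivalent(H1, H4))  # True: r2[x] precedes w1[x] in both
print(equivalent(H1, H5))  # False: r2[x] and w1[x] swap order
```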

URL: https://www.sciencedirect.com/science/article/pii/B9781558606234000068

What is concurrency control? Explain its importance with an example.

If many transactions try to access the same data, inconsistency can arise. Concurrency control is required to maintain data consistency. For example, consider ATM machines: without concurrency control, multiple people could not withdraw money at the same time in different places. This is where we need concurrency control.

What is concurrency? Explain the timestamp-based protocol with an example.

The timestamp-based protocol ensures that all conflicting read and write operations are executed in timestamp order. The older transaction is always given priority in this method. It uses system time to determine the timestamp of a transaction. This is one of the most commonly used concurrency protocols.

What is the importance of concurrency?

It enables multiple applications to run at the same time. It allows resources left unused by one application to be used by other applications. Without concurrency, each application would have to run to completion before the next one could start. It enables better performance from the operating system.

What is concurrency? Write the importance of concurrency control in distributed systems.

Concurrency control is the activity of coordinating concurrent accesses to a database in a multiuser database management system (DBMS). Concurrency control permits users to access a database in a multiprogrammed fashion while preserving the illusion that each user is executing alone on a dedicated system.