Concurrency

Definitions

  • Concurrency occurs when different processes or applications need the same resource to continue to work.

  • When computations overlap in time and which may permit the sharing of a common resource (ie memory) between those overlapped computations

  • Systems must support many simultaneous users

    • need to guarantee correctness

Issues

  • It is hard, due to number of possibilities of order of executions

    • Hard to test

    • Hard to debug

  • Let others handle it and dont think about it, like transaction managers

    • When all actions that need to occur happen in a transaction, nothing bad happens

    • But not all actions can be placed in a transaction, especially database transactions

  • Ignored about

Execution contexts

  • Processing always occurs in some kind of context, usually more than one.

  • An enterprise application will usually have Requests and Sessions both as a server to clients and as a client to other applications like a DB server or a 3rd party application API.

  • Application servers where supporting multiple threads in an application server system.

Definitions

  • Request

    • A single short term call from the outside world to the application, which may result in it returning a response

    • During a request the processing is largely in the server's court and the client is assumed to wait for a response.

      • This could be a message and client may not need to wait for the processing on the server to finish

    • More often a client may issue another request that may interfere with one it just sent.

      • So a client may ask to place an order and then issue a separate request to cancel that order.

      • From the client's view the two requests may be obviously connected, but depending on your protocol that may not be so obvious to the server.

  • Session

    • A long running interaction between client and server, usually consisting of a series of requests.

    • ie loggin in, doing multiple actions, logging off on purpose or via timeout

  • Process

    • A heavyweight context that provides isolation to the data it works on.

  • Thread

    • A lightweight context that is set up so that several threads are ran in a single process.

    • They support multiple requests in a single process, helping to make good usage of resources.

    • However, because they run in a single process, they share memory, which leads to concurrency problems.

  • Transaction

    • A transaction groups together several Requests that the application wants to treat as a single Request.

    • We can have system transactions (ie. connecting to a DB) or business transactions (ie. a user interacting with the application)

Issues

  • Not all context line up perfectly

  • In theory each session would have an exclusive relationship with a process for its whole lifetime

    • Since processes are properly isolated from each other, this would help reduce concurrency conflicts

    • But this does not exists

    • Alternatively, start a new process for each request

      • but bad as ties up a lot of resources on one machine

      • but good for multi cores or microservices ie replicas

Concurrency problems

  • Concurrency problems occur when more than one active agent, such as a process or thread, has access to the same piece of data

  • Concurrency solutions tries to solve these issues

  • inconsistent reads and lost updates

  • failure of correctness/safety

    • result in incorrect behavior that would not have occurred without two people trying to work with the same data at the same time

  • it's not enough to worry about correctness;

    • have to worry about liveness: how much concurrent activity can go on.

  • Often people need to sacrifice some correctness to gain more liveness

  • The solutions lead to other problems

Inconsistent reads

  • when you read two things that are correct pieces of information but not correct at the same time.

  • when a user reads some data and that data is changed by another user while he is still using the data.

    • The data he read and is still using, is then inconsistent.

  • Solution

    • Pessimistic lock solution

      • Using a pessimistic lock we need to acquire a shared lock for reading and an exclusive lock for writing. However, the rules to acquire a lock are:

        • Any number of shared locks can be acquired at any given time, as long as there is no exclusive lock acquired.

        • An exclusive lock can be acquired at any given time, as long as no other lock (shared or exclusive) has been acquired.

    • Optimistic lock solution

      • With an optimistic lock, every piece of data has a version tag on it.

      • When making an update, the system checks for differences in data versions. If a difference is detected, a conflict is raised.

Lost updates

  • When two or more users update the same resource (ie file) but one saves before the other user saves their update before the first user, the first user does not get this update and saves, so the resource does not include both updates

Race conditions

  • When 2 or more users are updating that same data, a race condition can occur causing corrupt data

Isolation and Immutability solutions

  • Solve concurrency problems

isolation

  • Partition the data so that any piece of it can only be accessed by one active agent.

  • arrange things so that the programs enters an isolated zone, within which you don't have to worry about concurrency

  • Processes work like this in operating system memory

    • The operating system allocates memory exclusively to a single process, and only that process can read or write the data

    • file locks, once open no one else can update it, but multiple users can open read only copies

  • Good

    • Reduce errors

Immutability

  • Avoids all concurrency problems

  • Cannot make everything immutable, as most systems is designed to mutate data

  • by identifying some data as immutable, or at least immutable almost all the time

  • separate applications that are only reading data, and have them use copied data sources, from which we can then relax all concurrency controls

Optimistic and Pessimistic Concurrency Control

  • USed when cannot isolate mutable data

  • Forms of locking

  • Can cause a set of different problems

Optimistic

  • A copy is made for each user. The first to finish commits update as normal, the following users to commit must handle the conflicts between their changes and previously saved changes from the same copy. Once resolved then can commit

  • Git

  • About conflict detection

  • lock held during the commit

  • Good

    • Can increase concurrency

  • Bad

    • How to handle conflicts, as it will probably need tooling or manual intervention

    • How to do with business data

Pessimistic

  • whoever checks out the file first prevents anyone else from editing it.

  • about conflict prevention

  • Bad

    • prevents concurrency, everyone must wait, bottle neck

Preventing inconsistent reads

  • easy to miss because most people tend to focus on lost updates as the essential problem in concurrency

  • Pessimistic locks have a well-worn way of dealing with this problem through read and write locks.

  • To read data you need a read (or shared) lock; to write data you need a write (or exclusive) lock.

  • Many people can have read locks on the data at one time

  • if anyone has a read lock nobody can get a write lock

  • once somebody has a write lock, then nobody else can have any lock.

  • Optimistic locks

    • use some form of version marker on the data

    • timestampe or sequential counter

    • To detect lost updates the system checks the version marker of your update with the version marker of the shared data.

    • If they're the same, the system allows the update and updates the version marker

    • In this case every bit of data that was read also needs to have its version marker compared with the shared data. Any differences indicate a conflict.

    • Bad

      • Controlling access to every bit of data that's read often causes unnecessary problems due to conflicts or waits on data that doesn't actually matter that much

        • Fix - separating out data you've used from data you've merely read.

      • Figuring out what needs to be controlled and not is tough

  • Temproal Reads

    • These prefix each read of data with some kind of timestamp or immutable label, and the database returns the data as it was according to that time or label.

    • similar to event sourcing

    • Bad

      • data source needs to provide a full temporal history of changes, which takes time and space to process

      • difficult for databases

Deadlocks

  • Issue with Pessimistic locking

  • Multiple users need access to another users file which is under going change. But everyone has a lock on that file. Cannot release the lock until all their changes are done. And their wont be done unless they change the file which has a lock. Neither can make progress until the other completes

  • happens when two processes have locks on different resources and suddenly they both require each others locked resources in order to finish their job and releasing their lock. This means they would both be waiting for each others locks to be released.

  • Fix

    • Have an external detector, which picks a victim(s) who have to give up their lock so others can progress

      • The victims(s) changes will be lost

      • It is hard to detect

    • Give all locks a timeout

      • Once you hit that limit you lose your locks and your work—essentially becoming a victim.

      • This can cause victims to lose changes, when no deadlock has occurred

    • Prevent from happening at the start

      • Deadlocks occur when people who already have locks try to get more or to upgrade from read to write locks.

      • Is to force people to acquire all their locks at once at the beginning of their work and then prevent them gaining more

        • ie get locks on files in alphabetical order

      • If lock is already taken, then this new user becomes victim

    • Automatic victim

      • when a process tries to acquire a lock on an already locked resource, it automatically becomes a victim.

    • mix

  • Best to use timeout and getting all locks at the beginning

Transactions

  • what guarantees that a sequence of actions are all successfully and fully executed, or none is.

  • Bounded sequence of work

  • All participating resources are in a consistent state before and after the transaction

  • Must have ACID properties

  • Primary tool for handling concurrency

  • Example

    • ATM transaction, till you get the money out of the ATM machine, your amount of money in your account value will be same.

ACID

Atomicity

  • All actions in a transaction are guaranteed to complete successfully or none is executed

  • If not completed successfully then all steps must be rolled back

  • There is no such idea of partial completion

  • Committing is the act of saying all steps in the transaction have happened and confirmed

Consistency

  • A system resources must be in a consistent, non corrupt state at the start and end of a transaction.

Isolation

  • The result of a transaction is only visible when the whole transaction is finished.

Durability

  • The result of a transaction must be permanent.

Transactional Resources

  • to mean anything that's transactional

    • that uses transactions to control concurrency

  • Transactions generally occur with databases

    • others queues, printers, atm

  • Higher throughput means transactions must be short as possible

    • This is a request transaction

    • should not span multiple requests

    • a common approach is to start a transaction at the beginning of a request and complete it at the end.

    • Modern systems use AOP and annotations tag methods as transactions

  • Long transactions are transactions that span multiple requests

  • Late transaction

    • Do all the reads outside of a transaction, then only start the transaction when doing the updates

    • If there's a lengthy time lag between the opening of the transaction and the first write, this may improve liveness.

    • can lead to inconsistent reads

    • not worth doing, unless have heavy contention

  • Be aware of locks

    • Many db locks the rows involved to allow multiple transactions on the same table

    • if a transaction locks a lot of rows in a table, then the database has more locks than it can handle and escalates the locking to the entire table—locking out other transactions.

    • This lock escalation

Reducing Transaction Isolation for Liveness

  • If you have full isolation, you get serializable transactions.

  • Transactions are serializable if they can be executed concurrently and you get a result that's the same as you get from some sequence of executing the transactions serially.

  • serializability guarantees that he gets a result that corresponds to completing his transaction either entirely before another transaction starts or after it finishes

  • Serializability can't guarantee which result, as in this case, but at least it guarantees a correct one

  • Serializability is strongest level of isolation

    • defines correctness

    • reduces liveness

    • might need to reduce serializability to increase through put

  • repeatable read

    • Allows phantoms

      • Phantoms occur when you add some elements to a collection and the reader sees only some of them

      • always things that are inserted

  • read committed

    • allow unrepeatable reads and phantoms

    • read before another transaction gives one value, afterwards the same read will give different value that has been affected by the transaction

    • It's easier for databases to spot unrepeatable reads than phantoms, so the repeatable read gives you more correctness than read committed but less liveness.

  • read uncommitted

    • lowest level of isolation

    • allows dirty reads, unrepeatable reads and phantoms

    • you can read data that another transaction hasn't actually committed yet

    • Two errors

      • user a might look at the locking package when user b adds the first of his files but before he adds the second. As a result he sees n + 1 files in the locking package, instead of the final outcome n + 2 files

      • user b adds his files but then rolls back his transaction, thus user a sees files that were never really there

Business and system transactions

  • A database transaction is a group of SQL commands delimited by instructions to begin and end it.

    • If the fourth statement in the transaction results in an integrity constraint violation, the database must roll back the effects of the first three statements and notify the caller that the transaction has failed.

    • If all four statements had completed successfully all would have been made visible to other users at the same time rather than one at a time.

  • to supporting the ACID properties of a business transaction is to execute the entire business transaction within a single system transaction

  • business transactions often take multiple requests to complete, so using a single system transaction to implement one results in a long system transaction.

  • Most transaction systems don't work very efficiently with long transactions.

  • the application won't be scalable because long transactions will turn the database into a major bottleneck

  • refactoring from long to small transactions is hard

  • Business transactions need to be done as short transactions, and done at the beginning

    • Thus ACID properties are handled by the system rather than in the database

    • Called Offline Concurrency

  • Business transactions are closely tied to sessions. In the user's view each session is a sequence of business transactions (unless they're only reading data), so we usually make the assumption that all business transactions execute in a single client session

Types

  • Long transaction

    • Spans several requests

  • Request transaction

    • Always starts in the beginning of a request and finishes in the end of the request.

  • Late transaction

    • As a request transaction, but it is open as late as possible, allowing the reads to be made outside of the transaction.

  • System transaction

    • A transaction that operates at the infrastructure level, usually when communicating to the DB.

  • Business transaction

    • A transaction that operates at a use case level, ie. encompasses several user screens from the beginning of the use case flow until its end.

offline concurrency,

  • concurrency control for data that's manipulated during multiple database transactions.

Application Server Concurrency

Locking

  • To avoid problems of lost updates or inconsistent reads, need to lock access to the shared data

  • Why

    • provides correctness

    • reducess concurrency

    • only one person/thread can work on the data at the same time

  • Bad

    • Liveness suffers

      • Reduce the number of concurrent activity occurs

      • as the lock creates a bottle neck where other threads must wait to finish and release the lock

Optimistic Offline Lock

Pessimistic Offline Lock

Coarse-Grained Lock

Implicit Lock

Last updated

Was this helpful?