Performance

  • It's about time and the software system's ability to meet timing requirements

    • When events occur marking the passage of time

      • like

        • interrupts

        • messages

        • requests from users or other systems

        • clock events

      • the system, or some element of the system, must respond to them in time.

  • Example

    • web-based system

      • the desired response might be expressed as the number of transactions that can be processed in a minute

    • engine control system

      • the response might be the allowable variation in the firing time.

  • performance has been the driving factor in system architecture for much of the history of software engineering, largely due to hardware constraints

    • This has compromised other quality attributes

    • As the price/performance ratio of hardware continues to plummet and the cost of developing software continues to rise, other attributes are gaining more importance

  • all systems have performance requirements

    • even if they are not expressed

    • Example: a user of a word processor expects a typed character to appear on the screen essentially immediately, not an hour after pressing the key

  • Performance is often linked to scalability

    • increasing your system's capacity for work, while still performing well

    • Technically, scalability is making your system easy to change in a particular way, and so is a kind of modifiability

General Scenario

  • it begins with an event arriving at the system

    • Events can arrive in

      • periodic - predictable patterns

        • arrive predictably at regular time intervals.

        • e.g., an event may arrive every 10 milliseconds

        • most often seen in real-time systems

      • Stochastic - mathematical distributions

        • events arrive according to some probabilistic distribution

      • Sporadic - unpredictable patterns

        • These can sometimes be bounded; for example:

          • we might know that at most 600 events will occur in a minute

          • or that there will be at least 200 milliseconds between the arrival of any two events

        • But sporadic arrivals are otherwise hard to specify precisely (see the sketch below)
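
A minimal Python sketch of how the three arrival patterns could be simulated; the 10 ms period, the exponential distribution, and the 200 ms minimum gap are invented parameters for illustration:

```python
import random

def periodic_arrivals(period_ms=10.0, count=5):
    """Events arrive predictably at regular intervals, e.g., every 10 ms."""
    return [i * period_ms for i in range(count)]

def stochastic_arrivals(mean_gap_ms=10.0, count=5):
    """Events arrive according to a probabilistic (here exponential) distribution."""
    t, times = 0.0, []
    for _ in range(count):
        t += random.expovariate(1.0 / mean_gap_ms)
        times.append(t)
    return times

def sporadic_arrivals(min_gap_ms=200.0, count=5):
    """Unpredictable, but bounded: at least 200 ms between any two events."""
    t, times = 0.0, []
    for _ in range(count):
        t += min_gap_ms + random.uniform(0, 500)  # gap never below the bound
        times.append(t)
    return times

print(periodic_arrivals(), stochastic_arrivals(), sporadic_arrivals(), sep="\n")
```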

  • Responding correctly to the event requires resources (including time) to be consumed

    • can be measured by the following:

      • Latency

        • The time between the arrival of the stimulus and the system's response to it

      • Deadlines in processing

        • In the engine controller, for example, the fuel should ignite when the cylinder is in a particular position, thus introducing a processing deadline

      • The throughput of the system

        • usually given as the number of transactions the system can process in a unit of time.

      • The jitter of the response

        • is the allowable variation in latency

      • The number of events not processed because the system was too busy to respond

  • While this is happening, the system may be simultaneously servicing other events

  • Scenario

    • Source of stimulus

      • The stimuli arrive either from external (possibly multiple) or internal sources.

    • Stimulus

      • The stimuli are the event arrivals. The arrival pattern can be periodic, stochastic, or sporadic, characterized by numeric parameters

    • Artifact

      • The artifact is the system or one or more of its components.

    • Environment

      • The system can be in various operational modes, such as normal, emergency, peak load, or overload.

    • Response

      • The system must process the arriving events. This may cause a change in the system environment (e.g., from normal to overload mode).

    • Response measure

      • The response measures are the time it takes to process the arriving events (latency or a deadline), the variation in this time (jitter), the number of events that can be processed within a particular time interval (throughput), or a characterization of the events that cannot be processed (miss rate)
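
A small sketch of how these response measures could be computed from recorded timestamps; the event data and the 60 ms observation window are hypothetical:

```python
# Hypothetical (arrival_time, completion_time) pairs in milliseconds;
# None marks an event the system was too busy to process.
events = [(0, 12), (10, 25), (20, None), (30, 41), (40, 55)]

processed = [(a, c) for a, c in events if c is not None]
latencies = [c - a for a, c in processed]

latency_avg = sum(latencies) / len(latencies)             # stimulus-to-response time
jitter = max(latencies) - min(latencies)                  # variation in latency
window_ms = 60.0                                          # assumed observation window
throughput = len(processed) / (window_ms / 1000.0)        # events processed per second
miss_rate = (len(events) - len(processed)) / len(events)  # unprocessed fraction

print(latency_avg, jitter, throughput, miss_rate)
```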

Concurrency

  • operations occurring in parallel

    • either on the same processor or multiple processors

  • Concurrency occurs any time your system creates a new thread,

    • because threads, by definition, are independent sequences of control

  • Multitasking on your system is supported by independent threads

  • Multiple users are simultaneously supported on your system through the use of threads.

  • Concurrency occurs any time your system is executing on more than one processor

    • e.g., multiple cores on the same box, or different computers

  • Scheduling is important when processing multiple threads on the same processor

  • Allowing operations to occur in parallel improves performance, because delays introduced in one thread allow the processor to progress on another thread

    • because interleaving can produce race conditions, concurrency must also be carefully managed

    • race conditions can occur when there are two threads of control and there is shared state

  • The management of concurrency frequently comes down to managing how state is shared

    • Solutions:

      • use locks to enforce sequential access to shared state

      • partition the state based on the thread executing a portion of code

  • Race conditions are one of the hardest types of bugs to discover (a sketch of both solutions is shown below)
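
A minimal Python sketch of the two solutions above, assuming a simple shared counter: a lock enforces sequential access, while partitioning gives each thread private state to merge at the end:

```python
import threading

counter = 0
lock = threading.Lock()

def locked_increment(n):
    # Without the lock, "counter += 1" is a read-modify-write that two
    # threads can interleave, losing updates (a race condition).
    global counter
    for _ in range(n):
        with lock:                 # enforce sequential access to shared state
            counter += 1

def partitioned_increment(results, idx, n):
    local = 0                      # state partitioned per thread: no sharing, no race
    for _ in range(n):
        local += 1
    results[idx] = local

results = [0, 0]
threads = [threading.Thread(target=locked_increment, args=(100_000,)) for _ in range(2)]
threads += [threading.Thread(target=partitioned_increment, args=(results, i, 100_000))
            for i in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(counter, sum(results))       # 200000 200000, deterministically
```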

Tactics

  • The goal of performance tactics is to generate a response to an event arriving at the system within some time-based constraint.

    • The event can be a single event or a stream of events, and it is the trigger to perform computation.

    • tactics control the time within which a response is generated

  • At any instant during the period after an event arrives but before the system's response to it is complete,

    • either

      • the system is working to respond to that event or

      • the processing is blocked for some reason

  • two basic contributors to the response time:

    • processing time (when the system is working to respond)

      • Processing consumes resources, which takes time

      • Events are handled by the execution of one or more components, whose time expended is a resource

        • Hardware resources

          • CPU, data stores, network communication bandwidth, and memory

        • Software resources

          • include entities defined by the system under design.

          • For example, buffers must be managed and access to critical sections must be made sequential

            • A critical section is a section of code in a multi-threaded system in which at most one thread may be active at any time

      • Different resources behave differently as they become saturated

        • as their utilization approaches their capacity

        • For example

          • as a CPU becomes more heavily loaded, performance usually degrades fairly steadily

          • when you start to run out of memory, at some point the page swapping becomes overwhelming and performance crashes suddenly

    • Blocked time (when the system is unable to respond)

      • A computation can be blocked because

        • contention for some needed resource

          • Many resources can be used by only a single client at a time

          • meaning that other clients must wait for access until the current client has finished using the resource

            • e.g., a resource that can process only 5 requests at a time forces the sixth to wait

          • Multiple streams vying for the same resource or different events in the same stream vying for the same resource contribute to latency.

          • The more contention for a resource, the more likelihood of latency being introduced

        • Availability of resources

          • computation cannot proceed if a resource is unavailable

          • Unavailability may be caused by the resource being offline or by failure of the component

        • the computation depends on the result of other computations that are not yet available

          • A computation may have to wait because

            • it must synchronize with the results of another computation or

            • it is waiting for the results of a computation that it initiated

          • This waiting can be lengthened by network latency or by the slow performance of other hardware

  • Type of tactics

    • Control resource demand.

      • This tactic operates on the demand side to produce smaller demand on the resources that will have to service the events

      • Tactic: Manage sampling rate

        • If it is possible to reduce the sampling frequency at which a stream of environmental data is captured, then demand can be reduced, typically with some attendant loss of fidelity

        • This helps maintain predictable levels of latency

        • You must decide whether a lower-fidelity but consistent stream of data is preferable to losing packets of data (see the sketch below)
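
An illustrative sketch of downsampling a data stream; the keep-one-in-four rate is an arbitrary assumption:

```python
def downsample(stream, keep_every=4):
    """Reduce demand by keeping 1 of every `keep_every` readings."""
    return stream[::keep_every]

readings = list(range(100))        # stand-in for raw environmental data
reduced = downsample(readings)     # 4x less data to process, lower fidelity
print(len(readings), "->", len(reduced))
```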

      • Tactic: Limit event response.

        • When discrete events arrive at the system too rapidly to be processed, then the events must be queued until they can be processed

        • Because these events are discrete, it is typically not desirable to "downsample" them

        • The system may choose to process events only up to a set maximum rate, thereby ensuring more predictable processing when the events are actually processed

        • This tactic could be triggered by a queue size or processor utilization measure exceeding some warning level

        • if it is unacceptable to lose any events, then you must ensure that your queues are large enough to handle the worst case.

        • If you choose to drop events,

          • then you need to choose a policy for handling this situation:

            • Do you log the dropped events, or simply ignore them?

            • Do you notify other systems, users, or administrators?
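
A sketch of limiting event response with a token-bucket-style rate cap; the RateLimiter class, its 200-events-per-second limit, and the drop handling are all invented for illustration:

```python
import time

class RateLimiter:
    """Process events only up to a set maximum rate (token bucket sketch)."""
    def __init__(self, max_per_second):
        self.capacity = max_per_second
        self.tokens = max_per_second
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill tokens in proportion to elapsed time, up to the bucket capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.capacity)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False               # caller must queue, drop, or log the event

limiter = RateLimiter(max_per_second=200)
dropped = [e for e in range(1000) if not limiter.allow()]
print(len(dropped), "events exceeded the rate; a drop or queue policy applies")
```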

      • Tactic: Prioritize events

        • If not all events are equally important, you can impose a priority scheme that ranks events according to how important it is to service them

        • If there are not enough resources available to service them when they arise, low-priority events might be ignored

        • Ignoring events consumes minimal resources (including time), and thus increases performance compared to a system that services all events all the time
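
A sketch of event prioritization using a priority queue; the events, their priorities, and the two-event service budget are invented:

```python
import heapq

# (priority, event): lower number = more important; values are invented
events = [(2, "log rotation"), (0, "engine misfire"), (1, "user request")]
heapq.heapify(events)

budget = 2  # pretend there is only capacity to service two events
while events and budget:
    priority, event = heapq.heappop(events)   # always service the most important first
    print("servicing", event)
    budget -= 1

print("ignored low-priority events:", [e for _, e in events])
```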

      • Tactic: Reduce overhead

        • The use of intermediaries increases the resources consumed in processing an event stream, and so removing them improves latency

        • Separation of concerns, another linchpin of modifiability, can also increase the processing overhead necessary to service an event if it leads to an event being serviced by a chain of components rather than a single component

        • The context switching and intercomponent communication costs add up,

          • especially when the components are on different nodes on a network

        • Another way to reduce overhead is to co-locate resources

          • Co-location may mean hosting cooperating components on the same processor to avoid the time delay of network communication

          • putting the resources in the same runtime software component to avoid even the expense of a subroutine call

          • perform a periodic cleanup of resources that have become inefficient

          • execute single-threaded servers (for simplicity and avoiding contention) and split workload across them

      • Tactic: Bound execution times

        • Place a limit on how much execution time is used to respond to an event

        • Timeouts are a common instance of this tactic

        • For iterative, data-dependent algorithms, limiting the number of iterations is a method for bounding execution times (see the sketch below)

          • The cost is usually a less accurate computation
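
A sketch of bounding execution time by capping iterations in a data-dependent algorithm; the Newton's-method square root is an arbitrary example, stopping after max_iter rounds and trading accuracy for a bounded cost:

```python
def bounded_sqrt(x, max_iter=5, tol=1e-12):
    """Newton's method with an iteration cap: bounded time, possibly less accuracy."""
    guess = x if x > 1 else 1.0
    for _ in range(max_iter):          # execution time is bounded by max_iter
        nxt = 0.5 * (guess + x / guess)
        if abs(nxt - guess) < tol:     # converged early: stop before the cap
            break
        guess = nxt
    return guess

print(bounded_sqrt(2.0, max_iter=3))   # close to 1.41421..., cheaper but coarser
```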

      • Tactic: Increase resource efficiency

        • Improving the algorithms used in critical areas will decrease latency

    • Manage resources.

      • This tactic operates on the response side to make the resources at hand work more effectively in handling the demands put to them

      • Tactic: Increase resources

        • Faster processors, additional processors, additional memory, and faster networks all have the potential for reducing latency

        • These can be costly, and at the upper limit the added expense may not be cost-effective

      • Tactic: Introduce concurrency

        • If requests can be processed in parallel, the blocked time can be reduced

        • Concurrency can be introduced by

          • processing different streams of events on different threads

          • by creating additional threads to process different sets of activities

        • scheduling policies can be used to achieve the goals you find desirable

        • Different scheduling policies may

          • maximize fairness (all requests get equal time)

          • throughput (shortest time to finish first)

          • other goals
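
A sketch of introducing concurrency with a thread pool so that blocked, I/O-bound requests overlap; the 50 ms simulated delay and the pool size of 8 are arbitrary:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle(request_id):
    time.sleep(0.05)            # stand-in for blocking I/O
    return request_id

start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:   # 8 streams processed in parallel
    results = list(pool.map(handle, range(16)))
print(f"16 blocked requests in {time.monotonic() - start:.2f}s "
      "(~0.8s sequentially, ~0.1s with 8 workers)")
```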

      • Tactic: Maintain multiple copies of computations

        • Multiple servers in a client-server pattern are replicas of computation

        • purpose of replicas is to reduce the contention that would occur if all computations took place on a single server

        • A load balancer is a piece of software that assigns new work to one of the available duplicate servers

          • criteria for assignment vary but can be as simple as round-robin or assigning the next request to the least busy server.
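
A minimal round-robin load balancer sketch; the replica names are stand-ins, and a real balancer would also track health and load:

```python
import itertools

class RoundRobinBalancer:
    """Assign each new request to the next replica in a fixed rotation."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def assign(self, request):
        server = next(self._cycle)
        return f"{request} -> {server}"

balancer = RoundRobinBalancer(["replica-a", "replica-b", "replica-c"])
for req in ["r1", "r2", "r3", "r4"]:
    print(balancer.assign(req))   # r1->a, r2->b, r3->c, r4->a
```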

      • Tactic: Maintain multiple copies of data

        • Caching is a tactic that involves keeping copies of data (possibly one a subset of the other) on storage with different access speeds.

          • The different access speeds may be inherent (memory versus secondary storage) or may be due to the necessity for network communication.

        • Data replication involves keeping separate copies of the data to reduce the contention from multiple simultaneous accesses

        • Because the data being cached or replicated is usually a copy of existing data, keeping the copies consistent and synchronized becomes a responsibility that the system must assume

        • What to cache needs to be decided

          • Some caches operate by merely keeping copies of whatever was recently requested, but it is also possible to predict users' future requests based on patterns of behavior, and begin the calculations or prefetches necessary to comply with those requests before the user has made them.
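
A sketch of caching recently requested results with functools.lru_cache; slow_lookup and its 100 ms cost are invented stand-ins for a slower storage tier or a network hop:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=256)            # keep copies of recently requested results
def slow_lookup(key):
    time.sleep(0.1)                # stand-in for secondary storage or a network hop
    return key.upper()

slow_lookup("user:42")             # miss: pays the slow access once
start = time.monotonic()
slow_lookup("user:42")             # hit: served from the fast copy
print(f"cached access took {time.monotonic() - start:.4f}s")
```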

      • Tactic: Bound queue sizes

        • This controls the maximum number of queued arrivals and consequently the resources used to process the arrivals

        • You need to adopt a policy for what happens when the queues overflow, and decide whether not responding to lost events is acceptable (see the sketch below)
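
A sketch of a bounded queue with an explicit overflow policy; the size of 3 and the record-the-losses policy are arbitrary choices:

```python
import queue

arrivals = queue.Queue(maxsize=3)   # bound the resources queued arrivals may use
lost = []

for event in range(5):
    try:
        arrivals.put_nowait(event)
    except queue.Full:
        lost.append(event)          # overflow policy: record events not responded to

print("queued:", arrivals.qsize(), "lost:", lost)  # queued: 3 lost: [3, 4]
```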

      • Tactic: Schedule resources

        • Whenever there is contention for a resource, the resource must be scheduled

        • understand the characteristics of each resource's use and choose the scheduling strategy that is compatible with it

Scheduling Policies

  • A scheduling policy conceptually has two parts

    • a priority assignment

      • e.g., as simple as FIFO (first-in/first-out)

      • it can be tied to the deadline of the request or its semantic importance.

      • Competing criteria for scheduling include

        • optimal resource usage

        • request importance

        • minimizing the number of resources used

        • minimizing latency

        • maximizing throughput

        • preventing starvation to ensure fairness

    • dispatching

  • A high-priority event stream can be dispatched only if the resource to which it is being assigned is available.

    • Sometimes this depends on preempting the current user of the resource

      • preemption options:

        • can occur anytime

        • can occur only at specific preemption points

        • executing processes cannot be preempted

  • Scheduling policies

    • First-in/first-out

      • treat all requests for resources as equals and satisfy them in turn

      • Issues

        • one request can be stuck behind another that takes a long time to generate a response

          • OK if all requests are equal, but not if some have different priorities

    • Fixed-priority scheduling

      • assigns each source of resource requests a particular priority and assigns the resources in that priority order.

      • Issues

        • it admits the possibility of a lower priority, but important, request taking an arbitrarily long time to be serviced, because it is stuck behind a series of higher priority requests.

      • prioritization strategies

        • Semantic importance

          • Each stream is assigned a priority statically according to some domain characteristic of the task that generates it

        • Deadline monotonic

          • is a static priority assignment that assigns higher priority to streams with shorter deadlines.

          • This scheduling policy is used when streams of different priorities with real-time deadlines are to be scheduled

        • Rate monotonic

          • is a static priority assignment for periodic streams that assigns higher priority to streams with shorter periods.

          • This scheduling policy is a special case of deadline monotonic but is better known and more likely to be supported by the operating system.

    • Dynamic priority scheduling

      • Round-robin

        • is a scheduling strategy that orders the requests and then, at every assignment possibility, assigns the resource to the next request in that order

        • A special form of round-robin is a cyclic executive, where assignment possibilities are at fixed time intervals

      • Earliest-deadline-first

        • assigns the highest priority to the pending request with the earliest deadline

      • Least-slack-first

        • This strategy assigns the highest priority to the job having the least "slack time," which is the difference between the execution time remaining and the time to the job's deadline.

  • For a single processor and processes that are preemptible (it is possible to suspend processing of one task in order to service a task whose deadline is drawing near)

    • the earliest-deadline and least-slack scheduling strategies are optimal (see the sketch below)
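
A sketch comparing the two strategies on invented jobs; each job is (name, remaining execution time, deadline) in milliseconds:

```python
# Jobs: (name, remaining_execution_ms, deadline_ms). Values are invented.
jobs = [("A", 30, 100), ("B", 10, 40), ("C", 20, 60)]

edf_order = sorted(jobs, key=lambda j: j[2])         # earliest deadline first
lsf_order = sorted(jobs, key=lambda j: j[2] - j[1])  # least slack (deadline - remaining) first

print("EDF:", [name for name, _, _ in edf_order])    # B, C, A
print("LSF:", [name for name, _, _ in lsf_order])    # B (slack 30), C (40), A (70)
```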

  • Static scheduling

    • A cyclic executive schedule is a scheduling strategy where the preemption points and the sequence of assignment to the resource are determined offline. The runtime overhead of a scheduler is thereby obviated.

Checklist for Performance

  • Allocation of Responsibilities

    • Determine the system's responsibilities that will involve

      • heavy loading

      • have time-critical response requirements

      • are heavily used

      • impact portions of the system where

        • heavy loads occur

        • time-critical events occur.

    • For those responsibilities

      • Identify the processing requirements of each responsibility

      • determine whether they may cause bottlenecks

    • identify additional responsibilities to recognize and process requests appropriately, including

      • Responsibilities that result from a thread of control crossing process or processor boundaries

      • Responsibilities to manage the threads of control: allocation and deallocation of threads, maintaining thread pools, and so forth

      • Responsibilities for scheduling shared resources or managing performance-related artifacts such as

        • queues

        • buffers

        • caches

    • For the responsibilities and resources you identified, ensure that the required performance response can be met

  • Coordination Model

    • Determine the elements of the system that must coordinate with each other, directly or indirectly, and choose communication and coordination mechanisms that do the following:

      • Support any introduced

        • concurrency (for example, is it thread safe?)

        • event prioritization

        • scheduling strategy

      • Ensure that the required performance response can be delivered

      • Can capture periodic, stochastic, or sporadic event arrivals, as needed

      • Have the appropriate properties of the communication mechanisms;

        • for example:

          • stateful

          • stateless

          • synchronous

          • asynchronous

          • guaranteed delivery

          • throughput

          • latency

  • Data Model

    • Determine those portions of the data model that will be

      • heavily loaded

      • have time-critical response requirements

      • are heavily used

      • impact portions of the system where heavy loads or time-critical events occur

    • For those data abstractions, determine the following:

      • Whether maintaining multiple copies of key data would benefit performance

      • Whether partitioning data would benefit performance

      • Whether reducing the processing requirements for the creation, initialization, persistence, manipulation, translation, or destruction of the enumerated data abstractions is possible

      • Whether adding resources to reduce bottlenecks for the creation, initialization, persistence, manipulation, translation, or destruction of the enumerated data abstractions is feasible

  • Mapping among Architectural Elements

    • Where heavy network loading will occur, determine whether co-locating some components will reduce loading and improve overall efficiency.

    • Ensure that components with heavy computation requirements are assigned to processors with the most processing capacity

    • Determine where introducing concurrency (that is, allocating a piece of functionality to two or more copies of a component running simultaneously) is feasible and has a significant positive effect on performance

    • Determine whether the choice of threads of control and their associated responsibilities introduces bottlenecks.

  • Resource Management

    • Determine which resources in your system are critical for performance.

    • For these resources, ensure that they will be monitored and managed under normal and overloaded system operation. For example:

      • System elements that need to be aware of, and manage, time and other performance-critical resources

      • Process/thread models

      • Prioritization of resources and access to resources

      • Scheduling and locking strategies

      • Deploying additional resources on demand to meet increased loads

  • Binding Time

    • For each element that will be bound after compile time, determine the following:

      • Time necessary to complete the binding

      • Additional overhead introduced by using the late binding mechanism

    • Ensure that these values do not pose unacceptable performance penalties on the system

  • Choice of Technology

    • Will your choice of technology let you set and meet hard real-time deadlines?

    • Do you know its characteristics under load and its limits?

    • Does your choice of technology give you the ability to set the following:

      • Scheduling policy

      • Priorities

      • Policies for reducing demand

      • Allocation of portions of the technology to processors

      • Other performance-related parameters

    • Does your choice of technology introduce excessive overhead for heavily used operations?
