Performance
It's about time and the software system's ability to meet timing requirements
When events occur
like
interrupts
messages
requests from users or other systems
clock events marking the passage of time
the system, or some element of the system, must respond to them in time.
Example
web-based system
the desired response might be expressed as the number of transactions that can be processed in a minute.
engine control system
the response might be the allowable variation in the firing time.
Performance has been the driving factor in system architecture for much of history, due to hardware constraints
This has compromised other quality attributes
As the price/performance ratio of hardware continues to plummet and the cost of developing software continues to rise, other attributes are gaining more importance
all systems have performance requirements
even if they are not expressed
Example: a word processor user does not expect to wait an hour between pressing a key and seeing the character appear on the screen
Performance is often linked to scalability
increasing your system's capacity for work, while still performing well
scalability is making your system easy to change in a particular way, and so is a kind of modifiability
General Scenario
it begins with an event arriving at the system
Events can arrive in
Periodic - predictable patterns
arrive predictably at regular time intervals.
e.g., an event may arrive every 10 milliseconds
most often seen in real-time systems
Stochastic - mathematical distributions
events arrive according to some probabilistic distribution
Sporadic - unpredictable patterns
These can still be partially characterized, for example:
we might know that at most 600 events will occur in a minute,
or that there will be at least 200 milliseconds between the arrival of any two events,
but they are hard to specify precisely
Responding correctly to the event requires resources (including time) to be consumed
can be measured by the following:
Latency
The time between the arrival of the stimulus and the system's response to it
Deadlines in processing
In the engine controller, for example, the fuel should ignite when the cylinder is in a particular position, thus introducing a processing deadline
The throughput of the system
usually given as the number of transactions the system can process in a unit of time.
The jitter of the response
is the allowable variation in latency
The number of events not processed because the system was too busy to respond
While this is happening, the system may be simultaneously servicing other events
Scenario
Source of stimulus
The stimuli arrive either from external (possibly multiple) or internal sources.
Stimulus
The stimuli are the event arrivals. The arrival pattern can be periodic, stochastic, or sporadic, characterized by numeric parameters
Artifact
The artifact is the system or one or more of its components.
Environment
The system can be in various operational modes, such as normal, emergency, peak load, or overload.
Response
The system must process the arriving events. This may cause a change in the system environment (e.g., from normal to overload mode).
Response measure
The response measures are the time it takes to process the arriving events (latency or a deadline), the variation in this time (jitter), the number of events that can be processed within a particular time interval (throughput), or a characterization of the events that cannot be processed (miss rate)
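A minimal sketch of how these measures can be derived from an event log; the timestamps and the 100 ms deadline are illustrative assumptions, not from the source:

```python
# Sketch: deriving latency, jitter, throughput, and miss rate from a
# hypothetical event log; timestamps and the 100 ms deadline are made up.
from statistics import pstdev

# (arrival, completion) in seconds; None marks an event the system dropped
events = [(0.00, 0.04), (0.10, 0.16), (0.20, None), (0.30, 0.33), (0.40, 0.52)]

served = [(a, c) for a, c in events if c is not None]
latencies = [c - a for a, c in served]

avg_latency = sum(latencies) / len(latencies)          # latency
jitter = pstdev(latencies)                             # variation in latency
window = max(c for _, c in served) - min(a for a, _ in events)
throughput = len(served) / window                      # events per unit time
miss_rate = (len(events) - len(served)) / len(events)  # unprocessed events
deadline_misses = sum(1 for l in latencies if l > 0.100)

print(f"latency={avg_latency * 1000:.0f}ms jitter={jitter * 1000:.0f}ms "
      f"throughput={throughput:.1f}/s miss={miss_rate:.0%} "
      f"deadline_misses={deadline_misses}")
```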
Concurrency
operations occurring in parallel
either on the same processor or multiple processors
Concurrency occurs any time your system creates a new thread,
because threads, by definition, are independent sequences of control
Multi-tasking on your system is supported by independent threads
Multiple users are simultaneously supported on your system through the use of threads.
Concurrency occurs any time your system is executing on more than one processor
multi-core on the same box, or on different computers
Scheduling is important when processing multiple threads on same processor
Allowing operations to occur in parallel improves performance, because delays introduced in one thread allow the processor to progress on another thread
because interleaving can produce race conditions, concurrency must also be carefully managed
race conditions can occur when there are two threads of control and there is shared state
The management of concurrency frequently comes down to managing how state is shared
Solutions (see the sketch below):
prevent race conditions by using locks to enforce sequential access to state
partition the state based on the thread executing a portion of code
Race conditions are one of the hardest types of bugs to discover
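A minimal Python sketch of both solutions, assuming a shared counter as the contested state:

```python
# Sketch: the two fixes named above, applied to a shared counter.
# Without the lock, "count += 1" interleaves across threads and loses updates.
import threading

count = 0
lock = threading.Lock()

def worker(n):
    global count
    for _ in range(n):
        with lock:          # Solution 1: lock enforces sequential access
            count += 1      # read-modify-write is now atomic across threads

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(count)  # reliably 400000; without the lock it can come up short

# Solution 2: partition the state -- give each thread its own counter and
# merge the partial counts at the end, so no state is shared while running.
```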
Tactics
The goal of performance tactics is to generate a response to an event arriving at the system within some time-based constraint.
The event can be single or a stream and is the trigger to perform computation.
tactics control the time within which a response is generated
At any instant during the period after an event arrives but before the system's response to it is complete,
either
the system is working to respond to that event or
the processing is blocked for some reason
two basic contributors to the response time:
processing time (when the system is working to respond)
Processing consumes resources, which takes time
Events are handled by the execution of one or more components, whose time expended is a resource
Hardware resources
CPU, data stores, network communication bandwidth, and memory
Software resources
include entities defined by the system under design.
For example, buffers must be managed and access to critical sections must be made sequential
A critical section is a section of code in a multi-threaded system in which at most one thread may be active at any time
Different resources behave differently as they become saturated
as their utilization approaches their capacity
For example
as a CPU becomes more heavily loaded, performance usually degrades fairly steadily
when you start to run out of memory, at some point the page swapping becomes overwhelming and performance crashes suddenly
Blocked time (when the system is unable to respond)
A computation can be blocked by contention for a needed resource, by unavailability of the resource, or by dependency on other computations
Contention for resources
Many resources can only be used by a single client at a time
meaning other clients must wait for access to those resources until the current client has finished using them
e.g., a resource may only be able to process 5 requests at a time
Multiple streams vying for the same resource or different events in the same stream vying for the same resource contribute to latency.
The more contention for a resource, the more likelihood of latency being introduced
Availability of resources
computation cannot proceed if a resource is unavailable
Unavailability may be caused by the resource being offline or by failure of the component
Dependency on other computations
a computation cannot proceed if it depends on results of other computations that are not yet available
A computation may have to wait because
it must synchronize with the results of another computation or
it is waiting for the results of a computation that it initiated
These waits can grow with network latency or slow hardware
Type of tactics
Control resource demand.
This tactic operates on the demand side to produce smaller demand on the resources that will have to service the events
Tactic: Manage sampling rate
if you reduce the sampling frequency at which a stream of environmental data is captured, demand is reduced, typically with some attendant loss of fidelity
maintain predictable levels of latency
decide whether having a lower fidelity but consistent stream of data is preferable to losing packets of data
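A minimal sketch of managing the sampling rate, assuming a hypothetical sensor stream; keeping every second reading halves demand at the cost of fidelity:

```python
# Sketch: halving the sampling rate of an environmental data stream.
# Keeping every 2nd reading trades fidelity for predictably lower demand.
def downsample(readings, keep_every=2):
    """Return every keep_every-th reading from a sensor stream."""
    return readings[::keep_every]

sensor = [20.1, 20.2, 20.4, 20.3, 20.5, 20.8]   # hypothetical readings
print(downsample(sensor))                        # [20.1, 20.4, 20.5]
```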
Tactic: Limit event response.
When discrete events arrive at the system too rapidly to be processed, then the events must be queued until they can be processed
Because these events are discrete, it is typically not desirable to "downsample" them
may choose to process events only up to a set maximum rate, thereby ensuring more predictable processing when the events are actually processed
could be triggered by a queue size or processor utilization measure exceeding some warning level
if it is unacceptable to lose any events, then you must ensure that your queues are large enough to handle the worst case.
If you choose to drop events,
then you need to choose a policy for handling this situation:
Do you log the dropped events, or simply ignore them?
Do you notify other systems, users, or administrators?
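A minimal sketch of limiting event response, assuming a hypothetical one-second window and a log-and-drop policy:

```python
# Sketch: capping event intake at a maximum rate and logging what is dropped.
# max_per_sec and the log-only policy are illustrative choices.
import time, logging

class RateLimitedIntake:
    def __init__(self, max_per_sec):
        self.max_per_sec = max_per_sec
        self.window_start = time.monotonic()
        self.count = 0

    def accept(self, event):
        now = time.monotonic()
        if now - self.window_start >= 1.0:          # start a new 1 s window
            self.window_start, self.count = now, 0
        if self.count < self.max_per_sec:
            self.count += 1
            return True                              # hand event to processing
        logging.warning("dropped event %r", event)   # policy: log, don't notify
        return False
```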
Tactic: Prioritize events
If not all events are equally important, you can impose a priority scheme that ranks events according to how important it is to service them
If there are not enough resources available to service them when they arise, low-priority events might be ignored
Ignoring events consumes minimal resources (including time), and thus increases performance compared to a system that services all events all the time
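A minimal sketch of event prioritization with a heap; the events and the overload cutoff are illustrative:

```python
# Sketch: a priority scheme over events using heapq; lower number = higher
# priority. Under load, low-priority entries can simply be left unserviced.
import heapq

queue = []
seq = 0                                   # tie-breaker keeps heap ordering stable
for priority, event in [(2, "log rotate"), (0, "engine alarm"), (1, "user click")]:
    heapq.heappush(queue, (priority, seq, event))
    seq += 1

while queue:
    priority, _, event = heapq.heappop(queue)
    if priority > 1:                      # illustrative cutoff when overloaded
        continue                          # ignoring consumes almost no resources
    print("servicing", event)             # engine alarm first, then user click
```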
Tactic: Reduce overhead
The use of intermediaries increases the resources consumed in processing an event stream, and so removing them improves latency
Separation of concerns, another linchpin of modifiability, can also increase the processing overhead necessary to service an event if it leads to an event being serviced by a chain of components rather than a single component
The context switching and intercomponent communication costs add up,
especially when the components are on different nodes on a network
Another way to reduce overhead is to co-locate resources
Co-location may mean hosting cooperating components on the same processor to avoid the time delay of network communication
putting the resources in the same runtime software component to avoid even the expense of a subroutine call
perform a periodic cleanup of resources that have become inefficient
execute single-threaded servers (for simplicity and avoiding contention) and split workload across them
Tactic: Bound execution times
Place a limit on how much execution time is used to respond to an event
Timeouts
For iterative, data-dependent algorithms, limiting the number of iterations is a method for bounding execution times.
The cost is usually a less accurate computation.
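A minimal sketch of bounding execution time, using Newton's method for a square root as a stand-in for an iterative, data-dependent algorithm; the iteration limit and timeout are illustrative:

```python
# Sketch: bounding an iterative computation both by iteration count and by
# wall-clock timeout; the tolerances here are illustrative.
import time

def iterative_sqrt(x, max_iters=20, timeout_s=0.01):
    """Newton's method for sqrt(x), stopped early if either bound is hit."""
    deadline = time.monotonic() + timeout_s
    guess = x
    for _ in range(max_iters):               # bound 1: iteration limit
        if time.monotonic() > deadline:      # bound 2: timeout
            break
        guess = (guess + x / guess) / 2
    return guess                              # possibly less accurate result

print(iterative_sqrt(2.0))  # ~1.41421356
```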
Tactic: Increase resource efficiency
Improving the algorithms used in critical areas will decrease latency
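A minimal sketch of the idea, assuming a hypothetical blocklist check in a critical path; swapping a list scan for a set lookup cuts per-event cost:

```python
# Sketch: a classic algorithmic improvement in a critical area -- replacing a
# per-event linear scan (O(n)) with a hash-set lookup (O(1) average).
blocked_ids = ["u17", "u42", "u99"]           # hypothetical blocklist

def is_blocked_slow(uid):
    return uid in blocked_ids                  # list scan: O(n) per event

blocked_set = set(blocked_ids)

def is_blocked_fast(uid):
    return uid in blocked_set                  # hash lookup: O(1) average

assert is_blocked_slow("u42") == is_blocked_fast("u42") == True
```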
Manage resources.
This tactic operates on the response side to make the resources at hand work more effectively in handling the demands put to them
Tactic: Increase resources
Faster processors, additional processors, additional memory, and faster networks all have the potential for reducing latency
These come at a cost; at the upper limit, adding more may not be cost-effective
Tactic: Introduce concurrency
If requests can be processed in parallel, the blocked time can be reduced
Concurrency can be introduced by
processing different streams of events on different threads
by creating additional threads to process different sets of activities
scheduling policies can be used to achieve the goals you find desirable
Different scheduling policies may
maximize fairness (all requests get equal time)
throughput (shortest time to finish first)
other goals
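A minimal sketch of introducing concurrency with a thread pool; the pool size and the simulated I/O delay are illustrative:

```python
# Sketch: processing independent requests on a pool of threads so that one
# blocked request (e.g., waiting on I/O) does not stall the others.
from concurrent.futures import ThreadPoolExecutor
import time

def handle(request_id):
    time.sleep(0.1)                      # stand-in for blocking I/O
    return f"done {request_id}"

with ThreadPoolExecutor(max_workers=4) as pool:   # 4 threads: illustrative
    results = list(pool.map(handle, range(8)))    # ~0.2 s instead of ~0.8 s
print(results)
```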
Tactic: Maintain multiple copies of computations
Multiple servers in a client-server pattern are replicas of computation
purpose of replicas is to reduce the contention that would occur if all computations took place on a single server
A load balancer is a piece of software that assigns new work to one of the available duplicate servers
criteria for assignment vary but can be as simple as round-robin or assigning the next request to the least busy server.
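A minimal sketch of round-robin assignment across replica servers; the server names are made up:

```python
# Sketch: a round-robin load balancer over replica servers. itertools.cycle
# hands each new request to the next replica in turn.
from itertools import cycle

replicas = cycle(["server-a", "server-b", "server-c"])

def assign(request):
    return next(replicas)                 # round-robin assignment criterion

for req in range(5):
    print(req, "->", assign(req))         # a, b, c, a, b
```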
Tactic: Maintain multiple copies of data
Caching is a tactic that involves keeping copies of data (possibly one being a subset of the other) on storage with different access speeds.
The different access speeds may be inherent (memory versus secondary storage) or may be due to the necessity for network communication.
Data replication involves keeping separate copies of the data to reduce the contention from multiple simultaneous accesses
Because the data being cached or replicated is usually a copy of existing data, keeping the copies consistent and synchronized becomes a responsibility that the system must assume
What to cache needs to be decided
Some caches operate by merely keeping copies of whatever was recently requested, but it is also possible to predict users' future requests based on patterns of behavior, and begin the calculations or prefetches necessary to comply with those requests before the user has made them.
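A minimal caching sketch using Python's functools.lru_cache; the function and cache size are illustrative, and consistency is a non-issue here only because the computed result never changes:

```python
# Sketch: keeping copies of recently requested results in a small LRU cache.
# functools.lru_cache handles eviction of the least recently used entries.
from functools import lru_cache

@lru_cache(maxsize=128)                   # cache size is an illustrative choice
def price_report(region):
    print("expensive fetch for", region)  # slow path: hits storage/network
    return 42                             # stand-in for the computed result

price_report("eu")   # miss: computes and caches
price_report("eu")   # hit: served from the fast copy, no fetch printed
```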
Tactic: Bound queue sizes
This controls the maximum number of queued arrivals and consequently the resources used to process the arrivals
need to adopt a policy for what happens when the queues overflow and decide if not responding to lost events is acceptable.
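A minimal sketch of a bounded queue with an explicit overflow policy; the bound of 100 and the drop-newest policy are illustrative choices:

```python
# Sketch: a bounded queue plus an explicit overflow policy. queue.Queue(maxsize)
# caps memory use; put_nowait raises queue.Full, which is where the policy lives.
import queue

inbox = queue.Queue(maxsize=100)          # bound sized for the expected worst case

def enqueue(event):
    try:
        inbox.put_nowait(event)
        return True
    except queue.Full:
        # policy decision: drop the newest event and record the loss
        print("overflow: dropped", event)
        return False
```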
Tactic: Schedule resources
Whenever there is contention for a resource, the resource must be scheduled
understand the characteristics of each resource's use and choose the scheduling strategy that is compatible with it
Scheduling Policies
A scheduling policy conceptually has two parts
a priority assignment
e.g., FIFO
or it can be tied to the deadline of the request or its semantic importance.
Competing criteria for scheduling include
optimal resource usage
request importance
minimizing the number of resources used
minimizing latency
maximizing throughput
preventing starvation to ensure fairness
dispatching
A high-priority event stream can be dispatched only if the resource to which it is being assigned is available.
Sometimes this depends on preempting the current user of the resource
preemption options:
can occur anytime
can occur only at specific preemption points
executing processes cannot be preempted
Scheduling policies
First-in/first-out
treat all requests for resources as equals and satisfy them in turn
Issues
one request will be stuck behind another one that takes a long time to generate a response
acceptable if all requests are truly equal, but not if some have different priorities
Fixed-priority scheduling
assigns each source of resource requests a particular priority and assigns the resources in that priority order.
Issues
it admits the possibility of a lower priority, but important, request taking an arbitrarily long time to be serviced, because it is stuck behind a series of higher priority requests.
prioritization strategies
Semantic importance
Each stream is assigned a priority statically according to some domain characteristic of the task that generates it
Deadline monotonic
is a static priority assignment that assigns higher priority to streams with shorter deadlines.
This scheduling policy is used when streams of different priorities with real-time deadlines are to be scheduled
Rate monotonic
is a static priority assignment for periodic streams that assigns higher priority to streams with shorter periods.
This scheduling policy is a special case of deadline monotonic but is better known and more likely to be supported by the operating system.
Dynamic priority scheduling
Round-robin
is a scheduling strategy that orders the requests and then, at every assignment possibility, assigns the resource to the next request in that order.
A special form of round-robin is a cyclic executive, where assignment possibilities are at fixed time intervals
Earliest-deadline-first
assigns priorities based on the pending requests with the earliest deadline
Least-slack-first
This strategy assigns the highest priority to the job having the least "slack time," which is the difference between the execution time remaining and the time to the job's deadline.
For a single processor and processes that are preemptible (it is possible to suspend processing of one task in order to service a task whose deadline is drawing near)
the earliest-deadline and least-slack scheduling strategies are optimal (see the sketch after this list).
Static scheduling
A cyclic executive schedule is a scheduling strategy where the preemption points and the sequence of assignment to the resource are determined offline. The runtime overhead of a scheduler is thereby obviated.
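A minimal sketch of how earliest-deadline-first and least-slack-first would order the same three pending jobs; the job data is hypothetical:

```python
# Sketch: ordering pending jobs under EDF vs. least-slack-first.
# Slack = deadline - remaining execution time; all times in ms from now.
jobs = [                                   # (name, deadline_ms, remaining_exec_ms)
    ("A", 100, 95),
    ("B",  60, 20),
    ("C",  80, 10),
]

edf = sorted(jobs, key=lambda j: j[1])           # earliest deadline first
lsf = sorted(jobs, key=lambda j: j[1] - j[2])    # least slack first

print([j[0] for j in edf])   # ['B', 'C', 'A']
print([j[0] for j in lsf])   # ['A', 'B', 'C']  slack: A=5, B=40, C=70
```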
Checklist for Performance
Allocation of Responsibilities
Determine the system's responsibilities that
involve heavy loading
have time-critical response requirements
are heavily used
impact portions of the system where heavy loads or time-critical events occur
For those responsibilities
Identify the processing requirements of each responsibility,
determine whether they may cause bottlenecks
identify additional responsibilities to recognize and process requests appropriately, including
Responsibilities that result from a thread of control crossing process or processor boundaries
Responsibilities to manage the threads of control: allocation and deallocation of threads, maintaining thread pools, and so forth
Responsibilities for scheduling shared resources or managing performance-related artifacts such as
queues
buffers
caches
For the responsibilities and resources you identified, ensure that the required performance response can be met
Coordination Model
Determine the elements of the system that must coordinate with each other (directly or indirectly) and choose communication and coordination mechanisms that do the following:
Support any introduced
concurrency (for example, is it thread safe?)
event prioritization
scheduling strategy
Ensure that the required performance response can be delivered
Can capture periodic, stochastic, or sporadic event arrivals, as needed
Have the appropriate properties of the communication mechanisms;
for example:
stateful
stateless
synchronous
asynchronous
guaranteed delivery
throughput
latency
Data Model
Determine those portions of the data model that
will be heavily loaded
have time-critical response requirements
are heavily used
impact portions of the system where heavy loads or time-critical events occur
For those data abstractions, determine the following:
Whether maintaining multiple copies of key data would benefit performance
Whether partitioning data would benefit performance
Whether reducing the processing requirements for the creation, initialization, persistence, manipulation, translation, or destruction of the enumerated data abstractions is possible
Whether adding resources to reduce bottlenecks for the creation, initialization, persistence, manipulation, translation, or destruction of the enumerated data abstractions is feasible
Mapping among Architectural Elements
Where heavy network loading will occur, determine whether co-locating some components will reduce loading and improve overall efficiency.
Ensure that components with heavy computation requirements are assigned to processors with the most processing capacity
Determine where introducing concurrency (that is, allocating a piece of functionality to two or more copies of a component running simultaneously) is feasible and has a significant positive effect on performance
Determine whether the choice of threads of control and their associated responsibilities introduces bottlenecks.
Resource Management
Determine which resources in your system are critical for performance.
For these resources, ensure that they will be monitored and managed under normal and overloaded system operation. For example:
System elements that need to be aware of, and manage, time and other performance-critical resources
Process/thread models
Prioritization of resources and access to resources
Scheduling and locking strategies
Deploying additional resources on demand to meet increased loads
Binding Time
For each element that will be bound after compile time, determine the following:
Time necessary to complete the binding
Additional overhead introduced by using the late binding mechanism
Ensure that these values do not pose unacceptable performance penalties on the system
Choice of Technology
Will your choice of technology let you set and meet hard real-time deadlines?
Do you know its characteristics under load and its limits?
Does your choice of technology give you the ability to set the following:
Scheduling policy
Priorities
Policies for reducing demand
Allocation of portions of the technology to processors
Other performance-related parameters
Does your choice of technology introduce excessive overhead for heavily used operations?