Programmed I/O is a mapping between I/O-related instructions that the processor fetches from memory and commands that the processor issues to I/O modules
Instruction forms depend on addressing policies for external devices
Devices given a unique address
When a processor, main memory and I/O share a bus, two addressing modes are possible
Memory-mapped
Same address bus used for both memory and I/O
Memory and I/O devices mapped into a single address space
Simple, and can use general-purpose memory instructions (see the sketch after this list)
Portions of address space must be reserved
Isolated
Bus may have input and output command lines, as well as usual read/write
Command lines specify if address is a memory location or I/O device
Leaves full range of memory address space for processor
Requires extra hardware
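A minimal sketch of the memory-mapped case in C, assuming a hypothetical UART whose registers sit at fixed addresses (the addresses, register names, and status bit are invented for illustration):

```c
#include <stdint.h>

/* Hypothetical UART registers mapped into the address space. */
#define UART_STATUS ((volatile uint8_t *)0x4000F000)
#define UART_DATA   ((volatile uint8_t *)0x4000F004)
#define TX_READY    0x01

/* Ordinary load/store instructions drive the device: no special
   I/O instructions are needed, which is the appeal of this mode. */
static void uart_putc(char c)
{
    while ((*UART_STATUS & TX_READY) == 0)
        ;                      /* spin until the transmitter is free */
    *UART_DATA = (uint8_t)c;   /* a plain store issues the command   */
}
```

Under isolated I/O the same logic would use dedicated I/O instructions (e.g. x86 in/out) instead of plain loads and stores.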
Most I/O devices are much slower than CPU, so need some way to synchronise
Busy-wait polling is when CPU constantly polls I/O device for status
Can interleave polling with other tasks
Polling is simple but wastes CPU time and power
When interleaved, can lead to delayed responses
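A sketch of busy-wait input polling, carrying over the hypothetical UART registers from the sketch above (RX_READY is an assumed status bit):

```c
#define RX_READY 0x02   /* assumed status bit: byte received */

/* Spin on the status register until a byte arrives, then read it.
   Simple, but the CPU does no useful work (and burns power) while
   it waits. */
static uint8_t uart_getc(void)
{
    while ((*UART_STATUS & RX_READY) == 0)
        ;                 /* busy-wait: poll the status bit repeatedly */
    return *UART_DATA;    /* assumed to clear RX_READY when read */
}
```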
Interrupt-driven I/O is when devices send interrupts to CPU
IRQs (interrupt requests) and NMIs (non-maskable interrupts)
Interrupt forces CPU to jump to interrupt service routine
Fast response, and does not waste CPU time/power
Complex, and data transfer still controlled by CPU
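A sketch of the interrupt-driven alternative, again reusing the hypothetical UART; how the ISR is registered with the interrupt controller is platform-specific and omitted:

```c
#include <stdint.h>

#define BUF_SIZE 64
static volatile uint8_t  rx_buf[BUF_SIZE];
static volatile unsigned rx_head, rx_tail;

/* Interrupt service routine: the device raises an IRQ when a byte
   arrives and the CPU jumps here, so no cycles are spent polling
   in between -- but note the CPU still moves the data itself. */
void uart_rx_isr(void)
{
    rx_buf[rx_head % BUF_SIZE] = *UART_DATA;  /* read also acks the device */
    rx_head++;
}

/* Foreground code drains the buffer whenever convenient. */
int uart_read_nonblocking(uint8_t *out)
{
    if (rx_tail == rx_head)
        return 0;                       /* nothing pending */
    *out = rx_buf[rx_tail % BUF_SIZE];
    rx_tail++;
    return 1;
}
```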
DMA avoids the CPU bottleneck by transferring data between device and memory without the processor handling each word (a programming sketch follows this list)
Used where large amounts of data needed at high speed
Control of system busses surrendered to DMA controller
DMAC can use cycle stealing or force processor to suspend operation in burst mode
DMA can be more than 10x faster than CPU-driven I/O
Involves addition of dedicated hardware on the system bus
Can have a single bus with a detached DMAC, where all modules share the bus
Can connect I/O devices directly to DMA, which reduces bus cycles by integrating I/O and DMA functions
Can have separate I/O bus, DMA connected to system and I/O bus, devices connected to I/O bus
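A sketch of programming a transfer on a hypothetical DMA controller; the register addresses, names, and status bits are all assumptions:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical DMAC registers (illustrative addresses). */
#define DMA_SRC   ((volatile uint32_t *)0x40010000)
#define DMA_DST   ((volatile uint32_t *)0x40010004)
#define DMA_COUNT ((volatile uint32_t *)0x40010008)
#define DMA_CTRL  ((volatile uint32_t *)0x4001000C)
#define DMA_START 0x1
#define DMA_DONE  0x2

/* The CPU only programs the transfer; the DMAC then takes the bus
   (cycle stealing or burst mode) and moves the data itself. */
void dma_copy(uint32_t src_addr, uint32_t dst_addr, size_t bytes)
{
    *DMA_SRC   = src_addr;
    *DMA_DST   = dst_addr;
    *DMA_COUNT = (uint32_t)bytes;
    *DMA_CTRL  = DMA_START;
    /* A real driver would take a completion interrupt; polling the
       done bit here just keeps the sketch self-contained. */
    while ((*DMA_CTRL & DMA_DONE) == 0)
        ;
}
```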
Thunderbolt is a general purpose I/O channel developed by Apple and Intel
Combines data, audio, video, power into single high speed connection (up to 10Gbps)
Based on the Thunderbolt controller, a high-speed crossbar switch
InfiniBand is an I/O spec aimed at high-end servers
As performance increased there was a need for larger and faster secondary storage, and one solution is to use disk arrays
Two general ways to utilise a disk array
Data striping transparently distributes data over multiple disks to make them appear as a single large disk
Improves I/O performance by allowing multiple requests to be serviced in parallel
Multiple independent requests can be serviced in parallel by separate disks
Single, multi-block requests can be serviced by disks acting in coordination
More disks = more performance (see the mapping sketch after this list)
Redundancy duplicates data across disks
Allows continuous operation without data loss in case of a disk failure in an array
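The mapping behind striping is simple arithmetic; a sketch, assuming fixed-size striping units distributed round-robin over the disks:

```c
#include <stddef.h>

typedef struct { size_t disk; size_t offset; } Location;

/* Map a logical block to (disk, offset) under round-robin striping. */
Location stripe_map(size_t logical_block, size_t num_disks)
{
    Location loc;
    loc.disk   = logical_block % num_disks;  /* which disk holds it */
    loc.offset = logical_block / num_disks;  /* block index on disk */
    return loc;
}
/* E.g. with 4 disks, logical blocks 0..3 land on disks 0..3 at
   offset 0, so a 4-block request is serviced by all disks at once. */
```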
RAID 0 - non-redundant striping
Lowest cost as there is no redundancy
Data is striped across all disks
Best write performance as no need to duplicate data
Any 1 disk failure will result in data loss
Used where performance is more important than reliability
RAID 1 - mirrored
Two copies of all info are kept, on separate disks
Uses twice as many disks as a non-redundant array, hence is expensive
On read, data can be retrieved from either disk, hence gives good read performance
If a disk fails, another copy is used
Data can also be striped as well as mirrored, which is RAID 10
RAID 2 - redundancy through Hamming codes
Very small stripes are used, often single byte or word
Employs fewer disks than mirroring by using Hamming codes, error-correcting codes that can correct single-bit errors and detect double-bit errors
Number of redundant disks is proportional to the log of the total number of data disks in the system (check-disk arithmetic sketched after this list)
On a single write, all data and parity disks must be accessed
Read access not slowed as controller can detect and correct single-bit errors
Overkill and not really used; only cost-effective when disk error rates are high
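A sketch of the check-disk arithmetic, assuming a classic single-error-correcting Hamming code: the number of check disks k is the smallest value satisfying 2^k >= m + k + 1 for m data disks.

```c
/* Smallest k with 2^k >= m + k + 1: check disks needed for m data
   disks under a single-error-correcting Hamming code. */
unsigned hamming_check_disks(unsigned m)
{
    unsigned k = 1;
    while ((1u << k) < m + k + 1)
        k++;
    return k;
}
/* hamming_check_disks(4) == 3 (the Hamming(7,4) code): the check
   count grows only logarithmically with the number of data disks. */
```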
RAID 3 - bit-interleaved parity
Parallel access, with data in small strips
Bit parity is computed for the set of bits in the same position on all data disks
If a drive fails, the parity is accessed and data reconstructed from the remaining devices (sketched after this list)
Only one redundant disk required
Can achieve high data rates
Simple to implement, but only one I/O request can be executed at a time
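A sketch of the reconstruction step: because the parity is the XOR across all data disks, the contents of a failed disk are the XOR of the parity with the surviving disks.

```c
#include <stdint.h>
#include <stddef.h>

/* Rebuild a failed disk's strip: P = D0 ^ D1 ^ ... ^ Dn-1 implies
   Dk = P ^ (XOR of every surviving Dj). */
void raid3_reconstruct(const uint8_t *const *surviving, size_t nsurviving,
                       const uint8_t *parity, uint8_t *rebuilt, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t b = parity[i];
        for (size_t d = 0; d < nsurviving; d++)
            b ^= surviving[d][i];
        rebuilt[i] = b;
    }
}
```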
RAID 4 - block-interleaved parity
Data striping used, with relatively large strips
Bit-by-bit parity calculated across corresponding strips on each data disk; parity bits stored in the corresponding strip on the parity disk
Involves a write penalty for small I/O requests
Parity computed by noting differences between old and new data (sketched after this list)
Management software must read the old data and parity, then write the new data and updated parity
For large writes that touch all blocks on all disks, parity computed by XORing the new data across all disks
Parity disk can become bottleneck
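A sketch of that small-write update: the new parity is the old parity XORed with the difference between old and new data, so only two disks are touched.

```c
#include <stdint.h>
#include <stddef.h>

/* Read-modify-write for a small write:
   new_parity = old_parity ^ old_data ^ new_data.
   Only the target strip and the parity strip are read and written,
   but the single parity disk is hit on *every* write -- the RAID 4
   bottleneck noted above. */
void parity_small_write(const uint8_t *old_data, const uint8_t *new_data,
                        uint8_t *parity, size_t len)
{
    for (size_t i = 0; i < len; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}
```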
RAID 5 - block-interleaved distributed parity
Eliminates the parity disk bottleneck by distributing parity across all disks
Among the best performance for small reads, large reads, and large writes
Small write requests are still inefficient compared to mirroring, due to the read-modify-write operations needed to update parity
Best parity distribution is left-symmetric (sketched after this list)
When traversing striping units sequentially, you access each disk once before accessing any disk twice, which reduces disk conflicts when servicing a large request
Commonly used in file servers, most versatile RAID level
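A sketch of left-symmetric placement, assuming the parity strip rotates one disk to the left on each successive stripe and data strips start just after it:

```c
#include <stddef.h>

/* Parity rotates left one disk per stripe across n disks. */
size_t parity_disk(size_t stripe, size_t n)
{
    return (n - 1) - (stripe % n);
}

/* Data strip `strip` (0..n-2) of a stripe sits just after parity. */
size_t data_disk(size_t stripe, size_t strip, size_t n)
{
    return (parity_disk(stripe, n) + 1 + strip) % n;
}
/* With n = 5: stripe 0 puts data on disks 0,1,2,3 (parity on 4),
   stripe 1 on disks 4,0,1,2 (parity on 3), so sequential blocks
   visit every disk once before revisiting any -- the property
   described above. */
```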
RAID 6 - dual redundancy
Multiple disk failures require a stronger code than parity
When a disk fails, data must be reconstructed using the redundancy information on the remaining disks
One scheme, called P + Q redundancy, uses Reed-Solomon codes to protect against up to two disk failures using a bare minimum of two redundant disks (sketched after this list)
Three disks need to fail for data loss
Significant write penalty, but good for mission-critical applications
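A sketch of the P + Q computation, assuming the common formulation where P is plain XOR parity and Q is a Reed-Solomon syndrome over GF(2^8) with generator 2 and reduction polynomial 0x11D:

```c
#include <stdint.h>
#include <stddef.h>

/* Multiply by 2 in GF(2^8), reducing by the polynomial 0x11D. */
static uint8_t gf_mul2(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1D : 0x00));
}

/* P = D0 ^ D1 ^ ... ^ Dn-1          (ordinary parity)
   Q = D0 ^ 2*D1 ^ 4*D2 ^ ... in GF(2^8), computed Horner-style.
   Two independent equations allow solving for any two lost disks,
   at the cost of updating both P and Q on every write. */
void raid6_pq(const uint8_t *const *disks, size_t n, size_t len,
              uint8_t *p, uint8_t *q)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t pv = 0, qv = 0;
        for (size_t d = n; d-- > 0; ) {   /* highest-numbered disk first */
            qv = gf_mul2(qv) ^ disks[d][i];
            pv ^= disks[d][i];
        }
        p[i] = pv;
        q[i] = qv;
    }
}
```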
SSDs use NAND flash
Becoming more popular as cost drops and performance increases
High performance I/O
More durable than HDDs
Longer lifespan, lower power consumption, quieter, cooler
Lower access times and latency
Still have some issues
Performance tends to slow over the device's lifetime
Flash becomes unusable after a certain number of writes
Techniques exist for prolonging life, such as front-ending the drive with a cache, or using the drives in RAID arrays
Storage area networks (SANs) share storage between many users on a network so that any of them can access it
Must protect against:
Drive failures - use RAID
Power failures - have redundant power supplies (UPS)
Storage controller failures - have dual active controllers
System unit failures - controllers connect to multiple hosts
Interface failures - have redundant links
Site failures - keep backups offsite
Flash copies produce an instantaneous copy while an application is running, e.g. for online backups
Use a copy-on-write algorithm
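A sketch of the copy-on-write idea, assuming a simple in-memory block map: the snapshot is "taken" instantly because nothing is copied until a block is first overwritten.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <stddef.h>

#define NBLOCKS 1024
#define BLKSIZE 512

static uint8_t live[NBLOCKS][BLKSIZE];   /* production volume          */
static uint8_t saved[NBLOCKS][BLKSIZE];  /* side area for old contents */
static bool    copied[NBLOCKS];          /* block preserved yet?       */

/* Writes after the snapshot preserve the old block on first touch. */
void cow_write(size_t blk, const uint8_t *data)
{
    if (!copied[blk]) {
        memcpy(saved[blk], live[blk], BLKSIZE);  /* save old contents */
        copied[blk] = true;
    }
    memcpy(live[blk], data, BLKSIZE);
}

/* Reading the snapshot: the saved copy if preserved, else live data. */
const uint8_t *snapshot_read(size_t blk)
{
    return copied[blk] ? saved[blk] : live[blk];
}
```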
Remote copies are maintained at secondary sites for disaster recovery
Can use synchronous copy, where data is copied to the secondary before each write completes on the host, keeping the secondary copy always in sync
Asynchronous copy is done after host executes command, which means data lags but is much more scalable and does not impact host performance
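A sketch of the ordering difference between the two modes; the helpers standing in for the replication link are stubs invented for illustration:

```c
#include <stdio.h>

/* Stubs standing in for the replication link. */
static void apply_locally(int blk)       { printf("local write %d\n", blk); }
static void send_to_secondary(int blk)   { printf("replicate %d\n", blk); }
static void wait_for_secondary_ack(void) { /* block for remote confirm */ }
static void queue_for_secondary(int blk) { printf("queued %d\n", blk); }

/* Synchronous: the host write completes only after the secondary
   has the data, so the remote copy never lags. */
void write_sync(int blk)
{
    send_to_secondary(blk);
    wait_for_secondary_ack();    /* remote round trip on every write */
    apply_locally(blk);
}

/* Asynchronous: complete locally at once, replicate later; the
   secondary lags but host performance is unaffected. */
void write_async(int blk)
{
    apply_locally(blk);
    queue_for_secondary(blk);    /* drained in the background */
}
```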