UNIT 4: Mass Storage Structure
I/O System: Mass storage structure - overview, disk structure, disk attachment, disk scheduling algorithms, swap space management, RAID types
Overview
Mass storage is a critical component of computer systems, enabling the storage of large volumes of data for
long-term retention. It plays a vital role in various computing environments, from personal computers to
enterprise-level data centers. The mass storage structure encompasses the organization, management, and
access methods for efficiently storing and retrieving data. Key aspects of the mass storage structure
include:
- Storage Media: Mass storage typically involves the use of magnetic disks (hard disk
drives or HDDs), solid-state drives (SSDs), optical discs (CDs, DVDs, Blu-ray), or tape drives. Each
storage medium has its own characteristics in terms of capacity, performance, and durability.
- File Systems: File systems provide the structure and format for organizing data on
storage devices. They define how files are named, stored, and accessed, as well as how data is organized
into directories and subdirectories. Common file systems include NTFS, FAT32, exFAT, ext4, HFS+, APFS,
and others.
- Partitioning: Storage devices are often divided into partitions, each treated as a
separate logical volume. Partitioning allows for better organization, management, and isolation of data,
as well as support for multiple operating systems on the same physical disk.
- Volume Management: Volume management involves the creation, resizing, and management of
logical volumes within partitions. It enables features such as spanning multiple disks, creating
redundant copies for fault tolerance (mirroring), and striping data across multiple disks for
performance (RAID).
- Access Methods: Mass storage devices can be accessed using various methods, including
block-level access (reading and writing data in fixed-size blocks), file-level access (accessing files
and directories through a file system), and network protocols (such as NFS, SMB/CIFS, and iSCSI for
remote access); a minimal block-read sketch follows this list.
- Data Protection: Ensuring the integrity and security of stored data is crucial. This
involves implementing mechanisms such as redundancy (RAID), data checksums, encryption, access controls,
and backup/recovery solutions to protect against data loss, corruption, and unauthorized access.
- Performance Optimization: Optimizing the performance of mass storage systems requires
considerations such as disk layout (placement of data on disks), caching (temporary storage of
frequently accessed data in faster memory), disk scheduling (order in which I/O requests are serviced),
and parallelism (using multiple disks or SSDs in parallel for increased throughput).
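To make the block-level access mentioned above concrete, here is a minimal Python sketch that reads one fixed-size block at a given offset. The file name, block size, and block index are illustrative assumptions; with sufficient privileges on Linux, the same call pattern also works against a raw block device.

```python
import os

BLOCK_SIZE = 4096  # an assumed block size; 512-byte sectors are also common

def read_block(path, block_number, block_size=BLOCK_SIZE):
    """Block-level access: read exactly one fixed-size block at a byte offset."""
    with open(path, "rb") as f:
        f.seek(block_number * block_size)   # position the file offset by block index
        return f.read(block_size)

if __name__ == "__main__":
    # File-level access: create a small file through the file system...
    with open("demo.bin", "wb") as f:
        f.write(os.urandom(3 * BLOCK_SIZE))
    # ...then treat it as a sequence of fixed-size blocks.
    block = read_block("demo.bin", 2)
    print(f"read {len(block)} bytes from block 2 of demo.bin")
```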
Disk Structure
A disk, also known as a hard disk drive (HDD) or simply a hard drive, is a common mass storage device used in
computers to store and retrieve digital data. It consists of one or more circular platters coated with a
magnetic material, typically made of aluminum or glass. Each platter is divided into concentric circular
tracks, and each track is further divided into sectors.
Components of a Disk:
- Platters: Platters are the main components of a disk and are responsible for storing
data. They are usually made of a rigid material and coated with a magnetic material that can hold
digital data.
- Tracks: Tracks are concentric circles on the surface of each platter. Data is stored
along these tracks, and each track represents a specific distance from the center of the disk.
- Sectors: Sectors are pie-shaped divisions of each track. They represent the smallest
unit of storage on a disk and typically store a fixed amount of data, such as 512 bytes or 4 KB.
- Heads: Disk heads are the read/write mechanisms that fly just above the platter surfaces
and transfer data to and from the magnetic coating. A drive has one head per recording surface, and all
heads are mounted on a common actuator arm, so they move together and sit over the same track of every
surface at any given time.
- Actuator: The actuator is a mechanical arm that moves the disk heads across the surface
of the platters. It positions the heads over the desired track for reading or writing data.
- Spindle: The spindle is a motorized shaft that rotates the platters at a constant
speed. The rotation speed, measured in revolutions per minute (RPM), affects the performance of the
disk.
- Cylinder: A cylinder consists of all the tracks that reside at the same position on
each platter. It represents the vertical alignment of tracks across multiple platters and is used as a
logical unit for organizing data.
Data Organization:
Data is organized on a disk in a hierarchical manner, with files and directories stored within a file system.
The file system manages the allocation of disk space, tracks the location of files, and provides mechanisms
for reading and writing data. Common file systems include NTFS, FAT32, ext4, and HFS+.
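The geometry described above underlies the classical cylinder-head-sector (CHS) addressing scheme. Modern drives present logical block addresses (LBA) instead, but the standard CHS-to-LBA mapping is a useful way to see how cylinders, heads, and sectors relate. The sketch below uses an assumed geometry; the numbers are illustrative, not real drive parameters.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Map a (cylinder, head, sector) address to a logical block address.

    Sectors are conventionally numbered from 1, hence the (sector - 1) term.
    """
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

# Assumed, illustrative geometry: 4 recording surfaces, 63 sectors per track.
HEADS_PER_CYLINDER = 4
SECTORS_PER_TRACK = 63

print(chs_to_lba(0, 0, 1, HEADS_PER_CYLINDER, SECTORS_PER_TRACK))  # first sector -> LBA 0
print(chs_to_lba(2, 1, 5, HEADS_PER_CYLINDER, SECTORS_PER_TRACK))  # -> LBA 571
```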
Disk Attachment
Disks can be attached to a computer system using various interfaces, each offering different levels of
performance, flexibility, and compatibility. Here are some common disk attachment interfaces:
- SATA (Serial ATA): SATA is a widely used interface for connecting internal hard disk
drives (HDDs) and solid-state drives (SSDs) to the motherboard of a computer. It offers high data
transfer rates, hot-swapping support, and compatibility with a wide range of devices. SATA is commonly
used in desktops, laptops, and consumer-grade storage devices.
- SCSI (Small Computer System Interface): SCSI is a high-performance interface used
primarily in server and enterprise environments for connecting multiple disks to a single controller.
SCSI supports a wide range of devices, including hard drives, tape drives, and optical drives. It offers
advanced features such as command queuing, multiple device support, and high reliability. SCSI is
commonly used in RAID arrays and storage area networks (SANs).
- USB (Universal Serial Bus): USB is a popular interface for connecting external storage
devices, such as USB flash drives, external hard disk drives (HDDs), and solid-state drives (SSDs), to
computers and other devices. USB offers plug-and-play compatibility, high-speed data transfer rates, and
broad device support. It is commonly used for portable storage and backup solutions.
- NVMe (Non-Volatile Memory Express): NVMe is a high-performance interface designed
specifically for solid-state drives (SSDs) to leverage the high-speed PCIe (Peripheral Component
Interconnect Express) bus. NVMe offers significantly faster data transfer rates and lower latency
compared to SATA and SAS (Serial Attached SCSI) interfaces. It is commonly used in high-end desktops, workstations, and
enterprise storage systems that require maximum performance.
When choosing a disk attachment interface, consider factors such as performance requirements, compatibility
with existing hardware, ease of installation, and cost. The appropriate interface will depend on the
specific use case and system requirements.
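Whatever the attachment interface, the operating system ultimately exposes each disk as a block device. On Linux, one simple way to enumerate attached disks is to read /sys/block; the sketch below assumes a Linux host and relies on the kernel's convention of reporting device sizes in 512-byte sectors.

```python
import os

def list_block_devices(sys_block="/sys/block"):
    """Yield (device_name, size_in_gib) for each block device the kernel exposes."""
    for name in sorted(os.listdir(sys_block)):
        size_path = os.path.join(sys_block, name, "size")
        try:
            with open(size_path) as f:
                sectors = int(f.read().strip())       # reported in 512-byte sectors
        except OSError:
            continue
        yield name, sectors * 512 / 2**30

if __name__ == "__main__":
    for name, gib in list_block_devices():
        print(f"{name:12s} {gib:8.1f} GiB")
```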
Disk Scheduling Algorithms
Disk scheduling algorithms are used to optimize the order in which disk I/O requests are serviced, with the
goal of minimizing seek time and maximizing disk throughput. Here are some common disk scheduling
algorithms:
- FCFS (First-Come, First-Served): FCFS processes disk I/O requests in the order they
arrive, without considering the location of data on the disk. While simple and fair, FCFS can result in
long wait times for requests that are far from the current disk head position.
- SSTF (Shortest Seek Time First): SSTF services the disk I/O request with the shortest
seek time, prioritizing requests that are closer to the current disk head position. SSTF minimizes disk
arm movement and can improve disk access times, but may lead to starvation of requests further from the
current head position.
- SCAN: SCAN moves the disk head back and forth across the disk, servicing requests in
the current direction until reaching the edge, then reverses direction. SCAN prevents starvation of
requests at the edge of the disk but may result in longer wait times for requests in the opposite
direction.
- C-SCAN (Circular SCAN): C-SCAN services requests only while the head moves in one
direction; on reaching the end of the disk, the head returns immediately to the opposite end without
servicing requests and begins the next sweep, treating the cylinders as a circular list. C-SCAN provides
a more uniform wait time than SCAN, at the cost of the unserviced return sweep.
- LOOK: LOOK is similar to SCAN, but instead of travelling all the way to the edge of the
disk, the head reverses direction as soon as the last request in the current direction has been
serviced. This eliminates unnecessary head movement compared to SCAN and can improve disk throughput.
- C-LOOK (Circular LOOK): C-LOOK is the circular variant of LOOK: the head services
requests in one direction only, up to the last pending request, then jumps back to the lowest pending
request and begins the next sweep. Like C-SCAN, it trades some extra movement on the return jump for
more uniform wait times.
When selecting a disk scheduling algorithm, consider factors such as workload characteristics, disk access
patterns, and performance requirements. Each algorithm has its strengths and weaknesses, and the best choice
depends on the specific use case and system configuration.
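To make the trade-offs concrete, the following sketch simulates total head movement for a few of the algorithms above on a single request queue. The queue, the starting head position, and the cylinder count are assumptions chosen for illustration, and the SCAN/LOOK variants here assume the head sweeps toward higher cylinder numbers first.

```python
def fcfs(requests, head):
    """First-come, first-served: service requests strictly in arrival order."""
    total = 0
    for r in requests:
        total += abs(r - head)
        head = r
    return total

def sstf(requests, head):
    """Shortest seek time first: always service the closest pending request."""
    pending, total = list(requests), 0
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total

def scan(requests, head, last_cylinder):
    """SCAN (elevator): sweep up to the disk edge, then reverse for the rest."""
    lower = [r for r in requests if r < head]
    total = last_cylinder - head                # travel up to the final cylinder
    if lower:
        total += last_cylinder - min(lower)     # reverse down to the lowest request
    return total

def look(requests, head):
    """LOOK: like SCAN, but reverse at the last request instead of the disk edge."""
    upper = [r for r in requests if r >= head]
    lower = [r for r in requests if r < head]
    total = (max(upper) - head) if upper else 0
    if lower:
        turnaround = max(upper) if upper else head
        total += turnaround - min(lower)        # reverse down to the lowest request
    return total

if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]   # assumed request queue (cylinder numbers)
    start, last = 53, 199                          # assumed head position and highest cylinder
    print("FCFS:", fcfs(queue, start))   # 640 cylinders of head movement
    print("SSTF:", sstf(queue, start))   # 236
    print("SCAN:", scan(queue, start, last))
    print("LOOK:", look(queue, start))
```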
Swap Space Management
Swap space is a reserved area on a disk that the operating system uses to back virtual memory, temporarily holding inactive memory pages when physical RAM (random access memory) is insufficient. Swap space management involves allocating and deallocating swap space efficiently to optimize system performance. Here are some strategies for swap space management:
- Dynamic Allocation: Dynamic allocation involves allocating swap space as needed based on system memory usage. When the system requires additional virtual memory due to high memory demand, swap space is allocated dynamically. Similarly, swap space can be deallocated when memory demand decreases. Dynamic allocation helps optimize disk usage and ensures efficient utilization of available resources.
- Swappiness: Swappiness is a parameter that controls the aggressiveness of swapping in a Linux-based system. It determines the tendency of the kernel to move processes from physical memory to swap space. By adjusting the swappiness value, administrators can balance between utilizing swap space to increase available memory and preserving system responsiveness. A higher swappiness value makes the system more aggressive in swapping, while a lower value limits swapping to essential cases.
- Swap Partition vs. Swap File: When configuring swap space, administrators have the option to use either a dedicated swap partition or a swap file. A swap partition is a separate partition on a disk reserved exclusively for swap space, while a swap file is a regular file stored within the file system. Choosing between a swap partition and a swap file depends on various factors, including performance considerations and flexibility. A swap partition generally offers better performance due to its dedicated space and fixed location, while a swap file provides more flexibility in resizing and managing swap space without repartitioning the disk.
Effective swap space management is crucial for maintaining system stability and performance, especially in environments with limited physical memory or heavy workload demands. By implementing appropriate swap space management strategies, administrators can optimize system resources and ensure smooth operation under varying conditions.
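On a Linux system, the swappiness parameter and current swap usage can be inspected through the /proc interface. The following is a minimal, read-only sketch; it assumes a Linux host where /proc/sys/vm/swappiness and /proc/meminfo are available, and it does not change any settings.

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the kernel's current swappiness value."""
    with open(path) as f:
        return int(f.read().strip())

def read_swap_usage(path="/proc/meminfo"):
    """Return (swap_total_kb, swap_free_kb) as reported by the kernel."""
    info = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.strip().split()[0])   # /proc/meminfo reports values in kB
    return info["SwapTotal"], info["SwapFree"]

if __name__ == "__main__":
    total_kb, free_kb = read_swap_usage()
    print(f"swappiness        : {read_swappiness()}")
    print(f"swap total / free : {total_kb} kB / {free_kb} kB")
```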
RAID Types
RAID (Redundant Array of Independent Disks) is a technology that combines multiple disk drives into a single logical unit for data redundancy, performance improvement, or both. Different RAID levels offer various combinations of performance, redundancy, and capacity utilization. Here are some common RAID types:
- RAID 0 (Striping): RAID 0 distributes data across multiple disks (striping) to improve performance by parallelizing data access. However, RAID 0 offers no redundancy, meaning that the failure of any disk in the array results in data loss for the entire array.
- RAID 1 (Mirroring): RAID 1 mirrors data across multiple disks, keeping an exact copy (mirror) of the data on every disk in the array. RAID 1 provides redundancy and fault tolerance, and read performance can improve because reads may be served from any mirror; write performance does not improve, since every write must be duplicated to all mirrored disks.
- RAID 5 (Striping with Parity): RAID 5 distributes data and parity information across multiple disks for both performance improvement and redundancy. Parity information allows for data recovery in the event of a single disk failure. RAID 5 requires a minimum of three disks and provides a balance between performance, redundancy, and capacity utilization.
- RAID 6 (Striping with Double Parity): RAID 6 is similar to RAID 5 but includes additional parity information for improved fault tolerance. RAID 6 can withstand the failure of up to two disks in the array without losing data. However, RAID 6 requires more storage overhead compared to RAID 5 due to the additional parity information.
- RAID 10 (RAID 1+0): RAID 10 combines mirroring and striping for both performance improvement and redundancy. It first mirrors data across multiple disk pairs and then stripes the mirrored pairs. RAID 10 offers high performance and fault tolerance but requires a larger number of disks and has higher storage overhead compared to other RAID levels.
When choosing a RAID level, consider factors such as performance requirements, fault tolerance, capacity utilization, and cost. Each RAID level has its own trade-offs, and the best choice depends on the specific needs and priorities of the application or system.
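The single-disk fault tolerance of RAID 5 rests on XOR parity: the parity block is the bitwise XOR of the corresponding data blocks, so any one lost block can be recomputed from the survivors. The sketch below illustrates the parity arithmetic on in-memory byte strings; it is not a RAID implementation, and the block size and disk count are assumptions.

```python
import os

def xor_blocks(blocks):
    """Bitwise XOR of equal-length byte blocks (the RAID 5 parity operation)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

if __name__ == "__main__":
    # One stripe spread over three assumed "data disks", 16 bytes per block.
    stripe = [os.urandom(16) for _ in range(3)]
    parity = xor_blocks(stripe)                  # block stored on the parity disk

    # Simulate losing data disk 1 and rebuilding it from the survivors plus parity.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    assert rebuilt == stripe[1]
    print("lost block reconstructed from parity")
```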