Current Research on Data Storage Hardware and Software

Introduction

Data persistence is a problem that is solved only by coordinated and complex interactions at every level of computing. Users need to decide what data to create or use. A significant portion of application program code is devoted to manipulating data in a friendly user interface; the same program also needs to implement storage and retrieval of the data in a domain-specific manner. Operating systems in turn abstract the underlying details in order to present the program with a logical view of storage, while device drivers and the operating system kernel must deal with file system- and hardware-specific implementations of raw storage. Finally, hardware providers impact (and are influenced by) the evolution of data storage designs.

There is an increasing demand for new software and hardware data storage mechanisms. This demand comes from two directions — a growing performance gap between aging storage hardware technology and processor/memory speed, and fundamental changes in the way users and applications manage data. Strong current research in new file system and storage hardware technology is introducing many promising new ideas, some of which may be the seeds of a radically different future for data storage implementation in operating systems.

The past decade has seen large amounts of research at all levels of this hierarchy, caused both by growing demand for better storage solutions and expanding opportunities in hardware and software technology. After considering these forces and reviewing research for both hardware and software, we will use examples of real and proposed systems to expand our understanding of data storage trends. Specifically, each section will focus on topics relevant to operating system design: storage hardware technology and interfaces, file systems and their alternatives, and the operating system’s API.

Demand for New Storage Hardware Technology

Though venerable hard disk technology is still advancing rapidly, many factors are creating a strong demand for research into alternative storage technology or more radical changes in magnetic disk design.

A serious threat to continued growth in magnetic disk capacity is the superparamagnetic effect (SPE). As the density of the bits stored on the surface of a magnetic disk grows, the domain (physical location) of each bit must become smaller and closer to other domains. With the current method of magnetically aligning the particles within a domain to represent a bit, a smaller domain means less physical material to hold the magnetization. At some point, the density will become so great and the domains so small that thermal energy and other noise will overpower the magnetization written by the disk write head, and the bits of data will become randomized. [TOIGO] Thus, there is a limiting density that disks of this design can achieve. Though estimates of this limit have been revised over the years, current magnetic disk technology, no matter how well we can miniaturize the components, will reach a practical limit imposed by SPE. History has shown that demand for more storage capacity will not recede, so we will soon (probably in less than a decade) need an entirely new storage technology to pick up where today’s magnetic disk technology will fail.

Other trends are pushing the limits of hard disks. Every programmer and system designer dreams of having not only unlimited storage, but also instantaneous access. Though speed has been increasing over the years, seek time and rotational latency are still measured in milliseconds (thousands of times slower than main memory) and dominate the net transfer rate for any hard drive and associated storage software. Toigo states, “Although the capacity of hard-disk drives is surging by 130 percent annually, access rates are increasing by a comparatively tame 40 percent.” [TOIGO] Because the hard disk is a complex mechanical device, its access times will never approach the speeds achieved even by today’s solid-state memory. This is one of the strongest arguments for researching alternative mass storage technologies. Software-level approaches such as caching and the hardware/software solution of RAID help improve the effective speed of magnetic disk storage, but these can only be temporary solutions.
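
The widening gap in Toigo's figures can be made concrete with a rough calculation. The sketch below assumes that both growth rates stay constant from year to year and that "access rate" can be treated as the rate at which data is read off the disk; neither assumption comes from the source. Under those assumptions, it estimates how much longer it takes to read an entire disk as the years pass.

```python
# Rough sketch: how the capacity/access-rate gap compounds over time.
# The two growth rates come from the Toigo quotation; holding them constant
# and treating "access rate" as read throughput are illustrative assumptions.

CAPACITY_GROWTH = 1.30   # +130 percent per year, i.e. a multiplier of 2.3
ACCESS_GROWTH = 0.40     # +40 percent per year, i.e. a multiplier of 1.4


def full_scan_slowdown(years: int) -> float:
    """Factor by which the time to read a whole disk grows after `years`."""
    capacity = (1 + CAPACITY_GROWTH) ** years
    access_rate = (1 + ACCESS_GROWTH) ** years
    return capacity / access_rate


for n in (1, 3, 5):
    print(f"after {n} year(s): a full scan takes {full_scan_slowdown(n):.1f}x as long")
```

Even if the real rates drift, the ratio of the two multipliers is what matters: as long as capacity grows faster than access rate, every byte on the disk becomes relatively more expensive to reach.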

Less obvious areas in which hard drives may fall behind include overall device size and power management. Though hard drives have shrunk from original models as large as refrigerators down to the 5.25-inch and finally the 3.5-inch form factors, it may not be practical to introduce tiny hard drives suitable for portable devices. For a given density, capacity is directly proportional to the recording surface area, so small drives will have relatively low capacity. Also, the moving parts may take a larger fraction of volume away from storage area. Other mechanical concerns include the noise and heat generated by rotating disks, which would not be negligible for portable devices. Another major concern is power management. Disks can power down when not in use, but it takes a noticeable amount of time to spin up a disk on standby. Users, for a variety of reasons, are often even less patient when using portable devices such as phones and PDAs, so delays caused by spin-up would not be well tolerated. With all the metals and moving parts involved, hard drives are also somewhat heavy. Rotating magnetic disks will probably never make an impact in the small portable consumer device market, where semiconductor memory technologies currently rule.

Recent technological advances are also driving research in several areas. For example, the availability of cheap, miniaturized, mass-produced lasers is renewing interest in advanced optical storage (beyond CD-ROM technology). Advances in miniaturization, in addition to being a major factor in continued improvements in magnetic storage capacity, are providing new opportunities in novel mass-storage systems that work on microscopic scales.

Research in Data Storage Hardware

One cutting-edge storage technology in the labs works as arrays of microscopic devices. Called atomic resolution storage (ARS), it promises to record bits on a surface with nanometer precision. A grid of probes, provided by the latest atomic probe microscopy efforts, would read and write upon domains the size of a mere handful of atoms. [TOIGO] Instead of rotating platters, the medium would likely be a single flat surface that moves laterally, possibly in two dimensions, under the array of tips. A density of more than one terabit per square inch would be possible. With the large array of probes, there is a potential for high bandwidth by using all probes in parallel to read and write entire “pages” of data at once. The entire device would fit in a mobile phone or possibly a watch. It would have far less power consumption than current technologies. Also, it would have zero power consumption when not in use: the device would simply stop and wait for the next instruction. Spin-up time and rotational latency would not exist, and seek time would be small.

A challenging technology that, if successful, may eclipse everything else being researched is called holographic memory. It has actually been researched for decades, but recent improvements in related components such as miniaturized lasers have renewed interest in it. The many unique properties of holographic storage would result in “trillions of bytes … in a piece of crystalline material the size of a sugar cube” [TOIGO]. Not only would its capacity surpass the full potential of magnetic disk and probably most other mechanical approaches, but holographic memory would also have very high bandwidth, very short access times, and good fault tolerance, and the storage medium would be subject to no mechanical wear. More advantages may be found for specific applications such as multimedia (one is mentioned below).

Holographic memory is an optical storage system. Data are stored as three-dimensional “images” filling the interior of the storage medium. These holograms are created by crossing two lasers, one of which is modulated to encode the data, inside the medium. The intersection of the beams causes a unique interference pattern that is recorded through physical or chemical changes in the medium. To read the data, the unmodulated (reference) beam is aimed at the crystal, and the original encoded beam is produced from the interference pattern. The beam can be read using, for example, CCDs from the digital video industry. When writing the hologram, a sort of LCD screen creates the data pattern in the laser.

This technology has many unique properties. Data is stored in a fully three-dimensional manner, maximizing space efficiency. Each bit, as dispersed by the interference pattern, is actually stored throughout the entire volume [PSALTIS], so minor damage to the recording medium often does not result in any loss of data. Using the two-dimensional pattern on the LCD recording screen, data is written and read in entire “pages” at once. These pages may be large, possibly millions of bits [TOIGO], which results in very high throughput. Large numbers of these pages may be written into essentially the same space by changing the incident angle or wavelength of the lasers when reading and writing the hologram. As more pages are written, however, the “signal” produced when reading a page weakens. Thus, storage density is limited by the precision of the recording equipment and the quality of the medium rather than by a fixed physical ceiling. Seek time will be very small because only a tiny change in the angle of a mirror can bring up completely different pages of data. Using wavelength modulation or advanced laser deflection techniques would provide a completely non-mechanical seek method.

Though there is still a lot of work to do before holographic memory becomes commercially viable, the system has been successfully demonstrated many times. Prototypes have already demonstrated random seek times measured in microseconds, a thousand times faster than today’s hard disks. This speed will only improve. One demonstration stored 10,000 pages, about 100 megabytes of data, in a single crystal. The raw error rate was one bit per million.

A unique capability of holographic memory is its “associative” nature, discovered by Dennis Gabor. [PSALTIS] An associative read is performed in reverse, by illuminating the hologram with the data beam instead of the reference beam. The hologram emits a pattern of beams that provides quantitative information about the similarity of the data to what is stored in the holographic memory. If the memory stores a specific database of information (images and multimedia, especially), data provided by the user can be compared against the whole database in a single operation, without reading a single page from it. This technique was demonstrated with a self-directed vehicle that compared input from its on-board camera to a holographic database of images. The vehicle successfully navigated a building by comparing its location to the holograms. [PSALTIS]
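
A loose software analogy may help illustrate associative recall. In the sketch below, each stored "page" is a short list of bits and a query is scored against every page by counting matching positions. The page contents, function names, and scoring rule are all invented for illustration. A real hologram would produce every score optically in a single step, whereas this code loops over the pages.

```python
# Software analogy for associative holographic recall (illustrative only).
# A real hologram yields all similarity scores optically at once; the loop
# here merely stands in for that. Page contents are made-up bit patterns.

from typing import List, Sequence


def similarity(query: Sequence[int], page: Sequence[int]) -> int:
    """Count the bit positions where the query and a stored page agree."""
    return sum(1 for q, p in zip(query, page) if q == p)


def associative_lookup(query: Sequence[int],
                       pages: List[Sequence[int]]) -> List[int]:
    """Return one score per stored page, analogous to the pattern of output
    beams whose intensity reflects how closely each page matches."""
    return [similarity(query, page) for page in pages]


stored_pages = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
noisy_query = [1, 0, 1, 1, 0, 1, 1, 0]   # page 0 with one bit flipped

scores = associative_lookup(noisy_query, stored_pages)
best_page = max(range(len(scores)), key=scores.__getitem__)
print(scores, "-> best match is page", best_page)
```

The query never has to equal a stored page exactly. The navigation demonstration relies on the same idea, since a camera image only needs to resemble a stored image closely enough to stand out from the rest.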

ARS and holographic memory, along with other technologies in development, are real possibilities, but it is not clear when any of them will become competitive in the market. Until that time, hard disk drives and semiconductor memory will be the dominant means of mass storage. Several novel ideas may help extend the viable life of the magnetic disk. Most of these ideas involve delaying or avoiding SPE by changing the topography of the disk surface or using new materials. However, they focus only on maintaining the growth rate in hard disk capacity, and do not address seek time and other problems. Regardless, hard disks will soon begin their decline, and it is quite possible that a radically new technology will open brand new possibilities for storage hardware and software. If we find a technology that can dramatically improve the performance of mass storage, operating system designers will have much greater freedom to explore simplified or brand-new persistence mechanisms.

Demand for New Storage Software Models

Programmers and operating system designers have always wanted a more abstract, simplified model for memory. Ideally, programmers and users should not care how the data is stored, and should not perceive any limitations or complications resulting from a lack of hardware capability. Current operating systems do their best to provide a simplified, abstract storage interface, but the cost is very high complexity within the operating system, and some of the complexity leaks through to the programmer.

In a perfect world, there would not need to be a separation between main memory and secondary memory (permanent mass storage). Ideally, there would be one interface for data persistence that would have the speed of main memory and the capacity of mass storage, and could store any data structure without modification. This is not currently possible for desktop computers (some portable devices can work this way; the Palm OS is described in detail below). Also, no major desktop operating system attempts to provide this type of interface, via encapsulation of the two hardware systems, for the programmer and user. Thus, the demand for this kind of system is still largely unsatisfied.

More specific business problems are influencing research in storage systems, and have produced some successful specialized systems. The need to store and manage ever-increasing volumes of data is no longer just a hardware capacity/performance issue, but a software issue. Operating systems do a good job of storing the data and preventing storage errors, but the management of that data is becoming an increasingly important issue. Many users are asking for file systems that have more built-in capabilities for ensuring data integrity, managing redundancy, and working with distributed systems, as their data sets grow larger and more widely dispersed every day. Charles Foley of Amdahl Corp. says that providing “a single set of data” [DEPOMPA] is a major goal of most businesses. These needs may continue to grow as distributed systems and client-server architectures proliferate.

Another development driving research in file systems, and operating systems in general, is the introduction of new types of computing devices. In the past, there was not much differentiation beyond the mainframe–minicomputer–microcomputer hierarchy. Now, we have desktops, servers, mainframes (again), laptops, PDAs and organizers, mobile phones, portable and set-top media appliances, smart cards, and more. All of these different types of systems clearly have different data storage and management needs. As operating systems are designed for new devices, some new data storage research is tested and brought from the university to the market. Some examples are discussed below.

The common hierarchical file system is only one storage model. Record and database storage systems are alternatives that tend to reduce the work the application programmer must do to flatten, store, and rebuild data structures. Additional data management and browsing tools can easily be built into record and database storage systems, at the operating system level, making the design more attractive for the user as well. Another interesting idea is expanding the scope of the file system to abstract the concept of local data and remote data. For example, on an existing UNIX-style file system, access to files on Internet domains could be provided by mounting a special file system at /http [VAHDAT]. This idea has increasing potential as computing and data become more distributed through the influence of the Internet. At the other end of the storage model spectrum, and especially applicable with the rise of object-oriented programming, is the idea of throwing out traditional file-based storage altogether in favor of direct serialization of objects, a sort of built-in version of Java’s object serialization services. Web services, Java’s RMI, and similar technologies may give insight into how to provide services for remote object access at the operating system level as well.
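
To illustrate the /http idea, the sketch below shows one way a file system layer might translate a path under that mount point into a URL. The mapping rule (first path component is the host, the rest is the remote path) and the function name path_to_url are assumptions made for this example; they are not taken from [VAHDAT].

```python
# Hypothetical sketch: translating paths under a /http mount point to URLs.
# The rule "first component = host, remainder = remote path" is an assumed
# convention for illustration, not a description of any real system.

from pathlib import PurePosixPath

MOUNT_POINT = PurePosixPath("/http")   # mount point named in the text


def path_to_url(path: str) -> str:
    """Map /http/<host>/<remote path> to http://<host>/<remote path>."""
    try:
        relative = PurePosixPath(path).relative_to(MOUNT_POINT)
    except ValueError:
        raise ValueError(f"{path} is not under {MOUNT_POINT}") from None
    if not relative.parts:
        raise ValueError("path names the mount point itself, not a host")
    host, *rest = relative.parts
    return "http://" + host + "/" + "/".join(rest)


print(path_to_url("/http/www.example.com/docs/index.html"))
```

With a translation like this inside the file system, an application could open a remote document with the same calls it uses for local files and never handle a URL itself.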

Research in Storage Systems

An exciting possibility being researched now is called grid computing. This paradigm harnesses the full power of the Internet to transform common computing resources — processing and storage — into true utilities that are distributed and managed much like electricity. Distribution centers would coordinate the sale of processor cycles and storage access to clients, while the clients themselves would be providers of these utilities. In this system, distributed storage would be fully abstracted. A file could be stored in fragments across thousands of computing nodes. A distribution center or some other entity would coordinate the secure storage and retrieval of the file. Its contents would surely be encrypted, and fragments would be stored at duplicate locations to allow assembly and retrieval of the file even if a considerable portion of the nodes the file was originally distributed to are down. Clients would need no knowledge of where the file is physically stored. The grid concept cannot be fully realized without faster and more ubiquitous broadband Internet access, because communication among distributed nodes must be fast enough for remote computation to appear to be occurring locally. [FOSTER-I]
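
The fragment-and-replicate scheme described above can be sketched as a toy model. Everything here is an assumption made for illustration: nodes are plain dictionaries, fragments are placed round-robin, and encryption and networking are left out entirely. The point is only to show that a file remains retrievable when some nodes are down, provided at least one copy of every fragment is reachable.

```python
# Toy model of fragmented, replicated storage across grid nodes.
# Node layout, fragment size, and replica count are illustrative assumptions;
# encryption and network communication are deliberately omitted.

from typing import Dict, List, Set, Tuple

Node = Dict[int, bytes]   # fragment index -> fragment contents


def distribute(data: bytes, fragment_size: int, node_count: int,
               copies: int) -> Tuple[List[Node], int]:
    """Split data into fragments and place each on `copies` distinct nodes."""
    if copies > node_count:
        raise ValueError("cannot place more copies than there are nodes")
    nodes: List[Node] = [{} for _ in range(node_count)]
    fragments = [data[i:i + fragment_size]
                 for i in range(0, len(data), fragment_size)]
    for index, fragment in enumerate(fragments):
        for copy in range(copies):
            nodes[(index + copy) % node_count][index] = fragment
    return nodes, len(fragments)


def retrieve(nodes: List[Node], fragment_count: int, down: Set[int]) -> bytes:
    """Reassemble the file using only nodes that are currently reachable."""
    parts = []
    for index in range(fragment_count):
        for node_id, store in enumerate(nodes):
            if node_id not in down and index in store:
                parts.append(store[index])
                break
        else:
            raise IOError(f"fragment {index} has no reachable copy")
    return b"".join(parts)


original = b"grid storage keeps files available"
nodes, count = distribute(original, fragment_size=5, node_count=6, copies=3)
recovered = retrieve(nodes, count, down={1, 4})   # two of six nodes offline
print(recovered == original)
```

The caller of retrieve never names a node, which mirrors the requirement that clients need no knowledge of where the file is physically stored.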

Few operating systems are implemented in object-oriented languages. One reason is that few object-oriented languages are low-level and compile to efficient code. However, some object-oriented systems exist, such as Choices, which provides an extensible programming framework. Subcomponents of the operating system, including the file system as a whole, can be subclassed and thus implemented in differing ways. [MADANY] The designers created several object-oriented file systems, including ones that conform to UNIX and MS-DOS. An extension of this idea is to represent the files themselves as objects in the conventional sense, with attributes and methods. This could simplify file manipulation from an object-oriented programming language.
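
The subclassing idea can be sketched in a few lines. The class names and the in-memory storage below are hypothetical and are not drawn from Choices itself; the sketch only shows how a shared base class might hold common behavior while subclasses supply UNIX-style or MS-DOS-style naming rules.

```python
# Hypothetical sketch of a subclassable file system framework.
# Class names and in-memory storage are illustrative, not Choices' design.

from abc import ABC, abstractmethod
from typing import Dict


class FileSystem(ABC):
    """Common behavior; each subclass decides what counts as a valid name."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    @abstractmethod
    def normalize(self, name: str) -> str:
        """Validate a file name and return its canonical stored form."""

    def write(self, name: str, data: bytes) -> None:
        self._files[self.normalize(name)] = data

    def read(self, name: str) -> bytes:
        return self._files[self.normalize(name)]


class UnixStyleFileSystem(FileSystem):
    def normalize(self, name: str) -> str:
        if not name or "/" in name:
            raise ValueError(f"invalid UNIX-style name: {name!r}")
        return name                      # case-sensitive, stored as given


class DosStyleFileSystem(FileSystem):
    def normalize(self, name: str) -> str:
        base, _, ext = name.upper().partition(".")
        if not 1 <= len(base) <= 8 or len(ext) > 3 or "." in ext:
            raise ValueError(f"invalid 8.3 name: {name!r}")
        return f"{base}.{ext}" if ext else base   # case-insensitive


dos = DosStyleFileSystem()
dos.write("Readme.txt", b"hello")
print(dos.read("README.TXT"))
```

Code written against FileSystem works with either subclass, which is the kind of flexibility the framework approach is meant to provide.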

Finally, the Grasshopper operating system represents a completely different approach to storage, called orthogonal persistence. This is discussed in detail below.

Examples

The following two examples highlight some of the successes and problems that result from designing new storage systems. The Palm OS is a success story about an operating system effectively implementing the capabilities of its storage hardware, and vice versa. The Grasshopper operating system reveals the challenges and inefficiencies of our current mass storage hardware, while giving a glimpse of a very bright future, should we obtain good enough hardware.

Palm OS: a Database File System

The Palm Operating System (general credits to [FOSTER-L]) is representative of the unique needs of portable devices. In addition to their small size, these devices have considerable memory and storage restrictions, not unlike microcomputers and minicomputers before the 1980s. However, Palm and many other portable devices have more modern hardware and software capabilities.

The Palm OS arranges all storage in its equivalent of main memory, which consists entirely of RAM and ROM hardware such as Flash ROM and standard memory cards. Dynamic memory is small (96 KB as of OS version 3) and is arranged in a single heap that the operating system and application programs share. The remainder of the hardware’s storage capacity is a single large heap of “storage memory”. Palm devices have no secondary memory; all permanent data is stored directly in RAM in the storage memory. This is a striking difference from the decades-old designs of traditional desktop systems. The benefits of this design include fewer moving parts, thanks to the choice of solid-state memory technology, and smaller operating system size and overhead. This solution is scalable for the foreseeable future: the OS uses a logical view of the memory hardware that consists of cards (which may or may not match up to physical memory cards) composing a 32-bit (4 gigabyte) address space. All dynamic and storage memory addresses are accessible using this method.

In addition to simplifying the model for accessing memory, the designers scrapped all concepts of PC file systems in favor of a modern system that provides an elegant solution for the Palm’s special needs and uses. The majority of permanent storage, including user data, actual program files, and some preferences, resides in databases that are customizable record collections. These records can be located anywhere within a single logical card. The Palm OS API provides all functions necessary for managing databases, and requires that applications manage all storage memory through the API methods [FOSTER-L 23]. When a record is opened for editing, it is opened, locked, read, and written in its original location in storage memory. No copy of the data is stored in dynamic memory. Similarly, there is no need for a full-fledged paged or segmented virtual memory system, because all permanent storage is equally accessible. This is another example of the streamlined approach taken by the Palm OS storage system. It is effective because there is little need for concurrent file access, for two reasons: PDA users do not run many applications at once, and systems often limit the amount of multitasking anyway. The Palm OS (as of version 3.5) limits the user to one single-threaded GUI application at a time (background and system programs may also be running).
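
The edit-in-place behavior can be illustrated with a small model. The class and method names below are invented for this sketch and do not correspond to the actual Palm OS API. A bytearray stands in for the storage heap, and a memoryview gives the caller a window onto the record's original location, so that edits happen there and no working copy is made.

```python
# Hypothetical model of records edited in place inside a storage heap.
# Names are invented for illustration; this is not the Palm OS API.

from typing import List, Set, Tuple


class RecordDatabase:
    def __init__(self, heap_size: int) -> None:
        self._heap = bytearray(heap_size)          # stands in for storage memory
        self._records: List[Tuple[int, int]] = []  # (offset, length) per record
        self._locked: Set[int] = set()
        self._next_free = 0

    def new_record(self, data: bytes) -> int:
        """Copy data into the heap once and return the new record's index."""
        if self._next_free + len(data) > len(self._heap):
            raise MemoryError("storage heap is full")
        offset = self._next_free
        self._heap[offset:offset + len(data)] = data
        self._next_free += len(data)
        self._records.append((offset, len(data)))
        return len(self._records) - 1

    def open_for_edit(self, index: int) -> memoryview:
        """Lock the record and return a window onto its original location."""
        if index in self._locked:
            raise RuntimeError(f"record {index} is already locked")
        self._locked.add(index)
        offset, length = self._records[index]
        return memoryview(self._heap)[offset:offset + length]

    def release(self, index: int, view: memoryview) -> None:
        view.release()
        self._locked.discard(index)

    def read(self, index: int) -> bytes:
        offset, length = self._records[index]
        return bytes(self._heap[offset:offset + length])


db = RecordDatabase(heap_size=64)
memo = db.new_record(b"hello palm")
window = db.open_for_edit(memo)
window[0:5] = b"HELLO"        # modifies the heap directly, no working copy
db.release(memo, window)
print(db.read(memo))
```

Because the window aliases the heap, there is nothing to write back when editing finishes. A desktop program, by contrast, typically loads a file into memory, edits it there, and saves it back to disk.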

Most user interaction with a database is via the database’s application program. The lesser degree of direct file management is by design, for two reasons: (1) handheld devices are not well suited to, and not used for, heavy data entry, and (2) because storage space is at a premium on most portable devices, most data is application-specific and currently useful (not archived). A full-featured file/database browser packaged with the operating system, analogous to Windows Explorer, would be a space-consuming and unnecessary feature. Of course, the user needs some capabilities to browse the device. The Palm main menu is one example; it can simply search the resource databases for known programs. A unique feature of the Palm OS is its Find utility. This is a system dialog box asking for a simple keyword input. When the keyword is submitted, the OS invokes each application with special startup parameters. If the application supports the Find utility, it can search its database (using its own implementation) and return any matching records. The OS handles displaying the matches. [FOSTER-L 458] As another useful feature, the Palm OS tracks changes in application databases, facilitating synchronization of data with the user’s PC. [KAZMIERCZAK]
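
The Find utility's division of labor can be sketched as follows. The launch code constant, class names, and method signatures are hypothetical stand-ins rather than the real Palm OS interfaces. The system function only dispatches the request and collects results, while each application decides for itself how to search its own data.

```python
# Hypothetical sketch of a system-wide Find that delegates to applications.
# The launch code and class names are invented; this is not the Palm OS API.

from typing import Dict, List, Optional

FIND_LAUNCH_CODE = "find"   # stands in for the special startup parameters


class MemoApp:
    name = "Memo"

    def __init__(self, memos: List[str]) -> None:
        self.memos = memos

    def launch(self, code: str, argument: str = "") -> Optional[List[str]]:
        if code == FIND_LAUNCH_CODE:
            needle = argument.lower()   # the app chooses its own search method
            return [memo for memo in self.memos if needle in memo.lower()]
        return None


class CalculatorApp:
    name = "Calculator"

    def launch(self, code: str, argument: str = "") -> Optional[List[str]]:
        return None                     # this app does not support Find


def system_find(apps, keyword: str) -> Dict[str, List[str]]:
    """Invoke every application with the find code and gather any matches."""
    results: Dict[str, List[str]] = {}
    for app in apps:
        matches = app.launch(FIND_LAUNCH_CODE, keyword)
        if matches:
            results[app.name] = matches
    return results


apps = [MemoApp(["Call the dentist", "Buy milk"]), CalculatorApp()]
found = system_find(apps, "dentist")
print(found)
```

The operating system needs no knowledge of any application's record format, which keeps the system small and still gives the user one search box for the whole device.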

The Palm OS is a good example of older, simpler file system concepts implemented in modern ways that are effective for the fast-growing mass-produced portable consumer device market. Other small devices increasingly need simple operating systems with effective storage systems, and the Palm OS has set a good precedent with its elegant implementation. Mobile phones, for example, are moving closer to PDAs as they include more third-party applications (often written in Java) and more varied data storage such as photos, contact lists, and organizational data. These devices could successfully adopt the Palm strategy because such an operating system provides a simple, foolproof low-level interface by simplifying the hardware and raw data organization, and provides a secure application interface that has a small number of highly functional features. In general, this is something for which all operating system designers should aim.

Grasshopper: Orthogonal Persistence

Orthogonal persistence is a powerful idea for the operating system designer, application programmer, and the end user. The term describes the ability of a system to store applications’ data structures for as long as needed, with no extra effort on the part of the applications, and to ensure that the data structures are internally and mutually consistent. For users, this means their data is always safe, up to the minute. In the event of a problem such as power loss, successful orthogonal persistence means that the user’s data (and perhaps the very state of his or her applications) is all up to date and consistent, with as little time lost as possible. A system fully supporting orthogonal persistence will also provide some built-in mechanisms for auditing and categorizing data, for example by distinguishing among “‘stable’, ‘development’ and ‘backup’ releases” [TUNES] of a document.

For the operating system designer, orthogonal persistence is currently a highly complex challenge: it is best approached by designing a new system from the ground up [DEARLE-2]. This is because the gigantic difference in performance and capacity between memory and mass storage hardware on most systems means the persistent operating system must maintain and synchronize in-memory and on-disk versions of objects in use (compare this to the Palm OS), and it must do so efficiently. To completely hide these problems and provide a consistent abstraction of a single persistent store to the application and user is a significant challenge that has not been widely attempted outside of academia. The true power of orthogonal persistence will be realized if we develop a mass storage technology that is fast enough to replace primary memory. Then, the hardware architecture would match the software architecture, creating a truly effective system.

Despite the current difficulties, there has been plenty of research in this area, presenting a handful of thoroughly detailed operating system designs. Grasshopper and TUNES are two such systems. Grasshopper was built with orthogonal persistence as a design goal from the beginning. Consequently, its structure is described in terms that are probably not familiar to most: containers, loci and capabilities [DEARLE-1]. “Capabilities” is the most familiar term; it refers to the security model adopted by Grasshopper. Containers and loci, though, are the heart of its persistence mechanism. Containers provide a single abstract interface for all referencing environments. A process runs in the context of a container, and all data structures, whether temporary variables existing for milliseconds or years-old database data, are manipulated as members of the container. Containers are abstract and flexible: they may be as large as needed, even larger than 32-bit virtual address spaces. Loci are the active objects in Grasshopper, that is, processes and threads. The designers of Grasshopper go into much greater detail, but with the description above, the simplicity and flexibility of the operating system’s storage interface is clear.

Underneath this interface, Grasshopper manages container mappings, manages locus execution, and implements the persistent storage mechanism itself. The primary means of this implementation is through managers. Managers control the flow of data from the low-level hardware representation to the data’s final mapping through a container. On today’s machines, this includes movement of data between memory and disk, and synchronization of those copies. Specifically, managers implement stability algorithms to ensure data integrity and synchronization. They also manage data representation tasks related to distributed computing. [DEARLE-2]
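
One simple way to keep an on-disk copy consistent with an in-memory one is to write a complete snapshot to a temporary file and then atomically rename it over the previous snapshot. The sketch below uses that technique as a stand-in for the stability algorithms mentioned above; it is not Grasshopper's actual algorithm, and the Container class here is a hypothetical dictionary-backed object. It also requires an explicit checkpoint call, whereas under true orthogonal persistence a manager would do this on the application's behalf.

```python
# Hypothetical sketch: a dictionary-backed "container" whose state is made
# stable by writing a snapshot and atomically renaming it into place.
# This is a stand-in technique, not Grasshopper's actual stability algorithm.

import json
import os
import tempfile


class Container:
    def __init__(self, path: str) -> None:
        self.path = path
        self.data = {}
        if os.path.exists(path):               # recover the last stable state
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def checkpoint(self) -> None:
        """Write a full snapshot, then swap it in with an atomic rename."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())               # force the snapshot to disk
        os.replace(tmp_path, self.path)        # old or new state, never a mix


with tempfile.TemporaryDirectory() as workdir:
    store = os.path.join(workdir, "container.json")

    running = Container(store)
    running.data["draft"] = "hello"
    running.checkpoint()
    running.data["draft"] = "change made after the last checkpoint"
    del running                                # simulate a crash

    recovered_state = dict(Container(store).data)

print(recovered_state)
```

After the simulated crash, the recovered container holds the last checkpointed state instead of a partially written one. That all-or-nothing property is what a stability mechanism has to guarantee, whatever algorithm provides it.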

Conclusion

Heavy research is taking place at all levels of data storage management. There are many exciting possibilities on the horizon, but the hard disk drive and hierarchical file system are highly tuned systems that will continue to dominate, at least in desktop PCs, for the near future. New technologies and file system designs are gaining in specialty markets, such as the database file system of the Palm OS for handhelds. Eventually, radically different storage technologies such as holographic memory or atomic resolution storage will enter the market via fringe applications, and one of them will eventually take over the mainstream market as hard disks reach their inevitable limitations. As new technologies saturate the market, it is likely that new operating systems harnessing the power of the new technologies, perhaps descendants of Grasshopper and TUNES, will see success, if not widespread popularity.

Works Cited