<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="../../resources/paper-xhtml.xsl"?>

<!DOCTYPE project>

<project 
 title="Current Research on Data Storage Hardware and Software"
 course="CMP 431, Alfred University, Spring 2004"
 subject="Operating Systems"
 author="Mike Mertsock"
 authoremail="mjm2@alfred.edu" 
 deadline="May 2004"
>

<properties>
	<history date="April 18 2004">Created XML file, chose general topic</history>
	<history date="May 1 2004">First high-level outline skeleton done</history>
	<history date="May 4 2004">Lots of research done, came up with the Case Studies olitem</history>
	<history date="May 5 2004">o-intro done, most sources/notes done, outline fleshed out more</history>
	<history date="May 7 2004">Final draft 1.0 complete</history>
</properties>

<outlineroot>

<olitem id="o-intro"><!-- o-intro rev 1 spellchecked/grammer checked 5/6 1:01. NOT hand-reviewed -->
	<olitem title="Introduction">
		<!-- here's the topic, and a hook too -->
		<p>Data persistence is a problem that is solved only by coordinated and complex interactions at every level of computing. Users need to decide what data to create or use. A significant portion of application program code is devoted to manipulating data in a friendly user interface; the same program also needs to implement storage and retrieval of the data in a domain-specific manner. Operating systems in turn abstract the underlying details in order to present the program with a logical view of storage, while device drivers and the operating system kernel must deal with file system- and hardware-specific implementations of raw storage. Finally, hardware providers impact (and are influenced by) the evolution of data storage designs.</p>
		<!-- what's my thesis? -->
		<resourceref resid="n-thesis" resolved="true">
		<p>There is an increasing demand for new software and hardware data storage mechanisms. This demand comes from two directions - a growing performance gap between aging storage hardware technology and processor/memory speed, and fundamental changes in the way users and applications manage data. Strong current research in new file system and storage hardware technology is introducing many promising new ideas, some of which may be the seeds of a radically different future for data storage implementation in operating systems.</p>
		</resourceref>
		<!-- how will this paper work? -->
		<p>The past decade has seen large amounts of research at all levels of this hierarchy, caused both by growing demand for better storage solutions and expanding opportunities in hardware and software technology. After considering these forces and reviewing research for both hardware and software, we will use examples of real and proposed systems to expand our understanding of data storage trends. Specifically, each section will focus on topics relevant to operating system design: storage hardware technology and interfaces, file systems and their alternatives, and the operating system's <acronym title="application program interface">API</acronym>.</p>
	</olitem>
</olitem>

<olitem id="o-body">

	<!-- o-need4tech rev 2 spellchecked/grammer checked 5/6 21:00. NOT hand-reviewed -->
	<olitem id="o-need4tech" title="Demand for New Storage Hardware Technology">
		<p>Though venerable hard disk technology is still advancing rapidly, many factors are creating a strong demand for research into alternative storage technology or more radical changes in magnetic disk design.</p>
		
		<resourceref resid="n-datacrunchcause">
		<p><cite srcid="TOIGO">A serious threat to continued growth in magnetic disk capacity is called the superparamagnetic effect (SPE). As the density of the bits stored on the surface of a magnetic disk grows, the domain (physical location) of each bit must become smaller and closer to other domains. With the current method of magnetically arranging the particles within a domain to represent a bit, a smaller domain means less physical matter to represent the magnetic charge. At some point, the density will become so great and the domains so small that thermal energy and other noise will overpower the magnetic charge written by the disk write head, and the bits of data will suffer randomization.</cite> Thus, there is a specific limiting density that disks can achieve. Though this number has been adjusted over the years, current magnetic disk technology, no matter how well we can miniaturize the components, will reach a practical limit imposed by SPE. History has shown that demand for more storage capacity will not recede, so we will soon (in probably less than a decade) need an entirely new storage technology to pick up where today's magnetic disk technology will fail.</p>
		
		<p>Other trends are pushing the limits of hard disks. Of course, any programmer's (and system designer's) dream is to have not only unlimited storage, but also instantaneous access. Though speed has been increasing over the years, seek time and rotational latency are still measured in milliseconds - thousands of times slower than main memory - and dominate the net transfer rate for any hard drive and associated storage software. <cite srcid="TOIGO">Toigo states, <quote>Although the capacity of hard-disk drives is surging by 130 percent annually, access rates are increasing by a comparatively tame 40 percent.</quote></cite> Because a hard disk is a complex mechanical device, its access times will never approach the speeds possible even in today's solid-state memory. This is one of the strongest arguments for researching alternative mass storage technologies. Software-level approaches such as caching and the hardware/software solution of RAID help improve the effective speed of magnetic disk storage, but again this can only be a temporary solution.</p>
		</resourceref>
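		<p>To see how quickly these two rates diverge, the following minimal Python sketch compounds them year over year. It assumes that Toigo's figures mean capacity multiplies by 2.3 and access rate by 1.4 each year, and that both rates hold steady; <term>growth_gap</term> is a name chosen for this illustration and does not come from the source.</p>

```python
def growth_gap(years, capacity_growth=1.30, access_growth=0.40):
    """Return how many times faster capacity has grown than access rate.

    Assumption: both figures are compound annual growth rates.
    """
    capacity_factor = (1 + capacity_growth) ** years
    access_factor = (1 + access_growth) ** years
    return capacity_factor / access_factor


# Print the widening gap after one, five, and ten years.
for years in (1, 5, 10):
    print(years, round(growth_gap(years), 1))
```

		<p>Under these assumptions, the ratio is a rough measure of how much longer it takes to touch every byte on a drive as the years pass, which is why caching and RAID can only postpone the problem.</p>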
		
		<p>Less obvious areas in which hard drives may fall behind include overall device size and power management. Though hard drives have definitely shrunk from original models as large as refrigerators down to the 5.25-inch and finally the 3.5-inch form factors, it may not be practical to introduce tiny hard drives suitable for portable devices. Capacity will always be directly proportional to overall disk size for a given density; small drives will have a relatively low capacity. Also, the moving parts may take a larger fraction of volume away from storage area. Other mechanical concerns include the noise and heat generated by rotating disks, which would not be negligible for portable devices. Another major concern is power management. Disks can definitely power down when not in use, but it takes a noticeable amount of time to spin up a disk on standby. Users, for a variety of reasons, are often even less patient when using portable devices such as phones and PDAs, so delays caused by spin-up would not be well tolerated. With all the metal and moving parts involved, hard drives are also somewhat heavy. Rotating magnetic disks will probably never make an impact in the small portable consumer device market, where semiconductor memory technologies currently rule.</p>

		<!-- now we can segue to the 'supply-side' research end of new tech - what kind of opportunities in technology are driving research (as opposed to...what research is beign done? - thats the next olitem) -->

		<p>Recent technological advances are also driving research in several areas. For example, the availability of cheap, miniaturized, mass-produced lasers is renewing interest in advanced optical storage (beyond CD-ROM technology). Advances in miniaturization, in addition to being a major factor in continued improvements in magnetic storage capacity, are providing new opportunities in novel mass-storage systems that work on microscopic scales.</p>
	</olitem>
	
	<!-- o-hwresearch rev 1 spellchecked/grammer checked 5/6 23:04. NOT hand-reviewed -->
	<olitem id="o-hwresearch" title="Research in Data Storage Hardware">
		<resourceref resid="n-ars" resolved="true">
		<p>One cutting-edge storage technology in the labs works as arrays of microscopic devices. Called atomic resolution storage (ARS), it promises to record bits on a surface with nanometer precision. <cite srcid="TOIGO">A grid of probes, provided by the latest atomic probe microscopy efforts, would read and write upon domains the size of a mere handful of atoms.</cite> Instead of rotating platters, the medium would likely be a single flat surface that moves laterally, possibly in two dimensions, under the array of tips. A density of more than one terabit per square inch would be possible. With the large array of probes, there is a potential for high bandwidth by using all probes in parallel to read and write entire "pages" of data at once. The entire device would fit in a mobile phone or possibly a watch. It would have far less power consumption than current technologies. Also, it would have zero power consumption when not in use: the device would simply stop and wait for the next instruction. Spin-up time and rotational latency would not exist, and seek time would be small.</p>
		</resourceref>

		<resourceref resid="n-holomem-dc" resolved="true">
		<p>A challenging technology that, if successful, may eclipse everything else being researched is called holographic memory. It has actually been researched for decades, but recent improvements in related components such as miniaturized lasers have renewed interest in holographic memory. The many unique properties of holographic storage would result in <cite srcid="TOIGO"><quote>trillions of bytes ... in a piece of crystalline material the size of a sugar cube</quote></cite>. Not only would its capacity surpass the full potential of magnetic disk and probably most other mechanical approaches, but holographic memory would also have very high bandwidth, very low access time, and good fault tolerance, and the storage medium would be subject to no mechanical wear. More advantages may be found for specific applications such as multimedia (one is mentioned below).</p>
		<p>Holographic memory is an optical storage system. Data is stored as three-dimensional 'images' filling the interior of the storage medium. These holograms are created by crossing two lasers, one of which is modulated to encode the data, inside the medium. The intersection of the beams causes a unique interference pattern that is recorded through physical or chemical changes in the medium. To read the data, the unmodulated (reference) beam is aimed at the crystal, and the original encoded beam is produced from the interference pattern. The beam can be read using, for example, <acronym title="Charge-Coupled Devices">CCDs</acronym> from the digital video industry. When writing the hologram, a sort of LCD screen creates the data pattern in the laser.</p>
		<p>This technology has many unique properties. Data is stored in a fully three-dimensional manner, maximizing space efficiency. <cite srcid="PSALTIS">Each bit, as dispersed by the interference pattern, is actually stored through the entire space</cite>, so minor damage to the recording medium often does not result in any loss of data. <cite srcid="TOIGO">Using the two-dimensional pattern on the LCD recording screen, data is written - and read - in entire "pages" at once. These pages may be large - possibly millions of bits.</cite> This results in extra-high throughput. Large numbers of these "pages" may be written into essentially the same space. This is accomplished by changing the incident angle or wavelength of the lasers when reading and writing the hologram. As more pages are written, the "signal" produced when reading a page weakens. Thus, storage density is a function of the precision of the recording equipment and the quality of the medium, with no real upper limit. Seek time will be very small because only a tiny change in the angle of a mirror can bring up completely different pages of data. Using wavelength modulation or advanced laser deflection techniques would provide a completely non-mechanical seek method.</p>
		</resourceref>
		
		<resourceref resid="n-holomem-other" resolved="false">
		<p>Though there is still a lot of work to do before holographic memory becomes commercially viable, the system has been successfully demonstrated many times. Prototypes have already demonstrated random seek times measured in microseconds, a thousand times faster than today's hard disks. This speed will only improve. <cite srcid="PSALTIS">A crystal with 10,000 pages, worth about 100 megabytes of data, was demonstrated. The raw error rate was one bit per million. A unique capability of holographic memory is its "associative" nature, discovered by Dennis Gabor.</cite> A read operation is done in reverse by using the data beam instead of the reference beam. The hologram emits a pattern of beams that provides quantitative information about the similarity of the data to what is stored in the holographic memory. If the memory stores a specific database of information (images and multimedia, especially), data provided by the user can be compared in a single operation, without reading a single page from the database. <cite srcid="PSALTIS">This technique was demonstrated with a self-directed vehicle that compared input from its on-board camera to a holographic database of images. The vehicle successfully navigated a building by comparing its location to the holograms.</cite></p>
		</resourceref>
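		<p>The associative read described above can be imitated in software. The following Python sketch is only an analogy: it scores a query against every stored page and reports the closest match, whereas the optical system would produce all of the similarity signals in a single operation. The function names and the bit-agreement measure are assumptions made for this illustration.</p>

```python
def similarity(page, query):
    """Fraction of bit positions where the stored page and the query agree."""
    matches = sum(1 for a, b in zip(page, query) if a == b)
    return matches / len(query)


def associative_lookup(pages, query):
    """Score every stored page against the query; return (best_index, scores)."""
    scores = [similarity(page, query) for page in pages]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Three stored "pages" of eight bits each (made-up data).
pages = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 0, 1, 1],
]
noisy_query = [1, 0, 1, 1, 0, 1, 1, 0]  # page 0 with one bit flipped
best, scores = associative_lookup(pages, noisy_query)
print(best, scores)
```

		<p>The query never has to match a page exactly; the highest score identifies the most similar page, which is the property the navigation experiment relied on.</p>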
		
		<p>ARS and holographic memory, along with other technologies in development, are real possibilities, but it is not clear when any of these technologies will become competitive in the market. Until that time, hard disk drives and semiconductor memory will be the dominant means of mass storage. Several novel ideas may help extend the viable life of the magnetic disk. Most of these ideas involve delaying or avoiding SPE by changing the topography of the disk surface or using new materials. However, most of these ideas focus only on maintaining the growth rate in hard disk capacity, and do not address seek time and other problems. Regardless, hard disks will soon begin their decline, and it is quite possible that a radically new technology will open brand new possibilities for storage hardware and software. If we find a technology that can dramatically improve the performance of mass storage, operating system designers will have much greater freedom to explore simplified or brand-new persistence mechanisms.</p>
	</olitem>
	
	<olitem id="o-persistDemand" title="Demand for New Storage Software Models">
		<p>Programmers and operating system designers have always wanted a more abstract, simplified model for memory. Ideally, programmers and users should not care how the data is stored, and should not perceive any limitations or complications resulting from a lack of hardware capability. Current operating systems do their best to provide a simplified, abstract storage interface, but the cost is very high complexity within the operating system, and some of the complexity leaks through to the programmer.</p>
		<p>In a perfect world, there would not need to be a separation between main memory and secondary memory (permanent mass storage). Ideally, there would be one interface for data persistence that would have the speed of main memory and the capacity of mass storage, and could store any data structure without modification. This is not currently possible for desktop computers (some portable devices can work this way; the Palm OS is described in detail below). Also, no major desktop operating system attempts to provide this type of interface, via encapsulation of the two hardware systems, for the programmer and user. Thus, the demand for this kind of system is still largely unsatisfied.</p>
		<resourceref resid="n-SDSdemand" resolved="true">
		<p>More specific business problems are influencing research in storage systems, and have produced some successful specialized systems. The need to store and manage ever-increasing volumes of data is no longer just a hardware capacity/performance issue, but a software issue. Operating systems do a good job of storing the data and preventing storage errors, but the <em>management</em> of that data is becoming an increasingly important issue. Many users are asking for file systems that have more built-in capabilities for ensuring data integrity, managing redundancy, and working with distributed systems, as their data sets grow larger and more widely dispersed every day. <cite srcid="DEPOMPA">Charles Foley of Amdahl Corp. says that providing <quote>a single set of data</quote></cite> is a major goal of most businesses. These needs may continue to grow as distributed systems and client-server architectures proliferate.</p>
		</resourceref>
		<p>Another development driving research in file systems, and operating systems in general, is the introduction of new types of computing devices. In the past, there was not much differentiation beyond the mainframe - minicomputer - microcomputer hierarchy. Now, we have desktops, servers, mainframes (again), laptops, PDAs and organizers, mobile phones, portable and set-top media appliances, smart cards, and more. All of these different types of systems clearly have different data storage and management needs. As operating systems are designed for new devices, some new data storage research is tested and brought from the university to the market. Some examples are discussed below.</p>
		<!-- data management - a "single view of data" - reduction of redunancy, coordinated management of distributed data, etc., are asking for a file system that can handle these needs. -->

		<!-- remember: talk about demand for new tech, and the 'supply-side' research end of new tech - what kind of opportunities in technology are driving research (as opposed to...what research is beign done? - thats the next olitem) -->
		<p>The common hierarchical file system is only one storage model. Record and database storage systems are alternatives that tend to reduce the work the application programmer must do to flatten, store, and rebuild data structures. Additional data management and browsing tools can easily be built into record and database storage systems, at the operating system level, to also make the design more attractive for the user. <cite srcid="VAHDAT">Another interesting idea is expanding the scope of the file system to abstract the concept of local data and remote data. For example, on an existing UNIX-style file system, access to files on Internet domains could be provided by mounting a special file system at <term>/http</term></cite>. This idea has increasing potential as computing and data become more distributed, via the influence of the Internet. At the other end of the storage model spectrum, and especially applicable with the rise of object-oriented programming, is the idea of throwing out traditional file-based storage altogether in favor of direct serialization of objects - a sort of built-in version of Java's object serialization services. Web services, Java's <acronym title="Remote Method Invocation">RMI</acronym>, and similar technologies may give insight into how to provide services for remote object access at the operating system level as well.</p>
	</olitem>
	
	<olitem id="o-persistResearch" title="Research in Storage Systems">
		<p><cite srcid="FOSTER I">An exciting possibility being researched now is called grid computing. This paradigm harnesses the full power of the Internet to transform common computing resources - processing and storage - into true utilities that are distributed and managed much like electricity. Distribution centers would coordinate the sale of processor cycles and storage access to clients, while the clients themselves would be providers of these utilities. In this system, distributed storage would be fully abstracted. A file could be stored in fragments across thousands of computing nodes. A distribution center or some other entity would coordinate the secure storage and retrieval of the file. Its contents would surely be encrypted, and fragments would be stored at duplicate locations to allow assembly and retrieval of the file even if a considerable portion of the nodes the file was originally distributed to are down. Clients would need no knowledge of where the file is physically stored. The grid concept cannot be fully realized without faster and more ubiquitous broadband Internet access, because communication among distributed nodes must be fast enough for remote computation to appear to be occurring locally.</cite></p>
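		<p>The fragment-and-duplicate scheme described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: nodes are plain dictionary keys, each fragment is copied to neighboring nodes in a ring, and encryption and the distribution center are left out. The names <term>place_fragments</term> and <term>retrieve</term> are hypothetical.</p>

```python
def place_fragments(data, fragment_size, nodes, copies=2):
    """Split data into fragments and store each one on `copies` distinct nodes.

    Assumption: copies is no larger than the number of nodes.
    """
    fragments = [data[i:i + fragment_size]
                 for i in range(0, len(data), fragment_size)]
    placement = {node: {} for node in nodes}
    for index, fragment in enumerate(fragments):
        for replica in range(copies):
            node = nodes[(index + replica) % len(nodes)]
            placement[node][index] = fragment
    return placement, len(fragments)


def retrieve(placement, fragment_count, down=()):
    """Reassemble the file from nodes that are up; fail if a fragment is lost."""
    parts = []
    for index in range(fragment_count):
        for node, stored in placement.items():
            if node not in down and index in stored:
                parts.append(stored[index])
                break
        else:
            raise LookupError(f"fragment {index} unavailable")
    return b"".join(parts)


nodes = ["a", "b", "c", "d"]
data = b"persistent data on a grid"
placement, count = place_fragments(data, 5, nodes)
print(retrieve(placement, count, down={"b"}) == data)
```

		<p>With two copies of each fragment, the file survives the loss of any single node; losing both nodes that hold the same fragment raises an error, which is why a real grid would tune the number of duplicates to the expected failure rate.</p>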
		<resourceref resid="n-choices" resolved="true">
		<p>Few operating systems are implemented with object-oriented languages. One reason for this is that there are few object-oriented languages that are low-level and compile efficiently. However, there are some object-oriented systems, such as Choices. <cite srcid="MADANY">It provides an extensible programming framework. Subcomponents of the operating system, including the file system as a whole, can be subclassed and thus implemented in differing ways.</cite> The designers created variations of object-oriented file systems, including ones that conform to UNIX and MS-DOS. An extension of the idea of object-oriented file systems is to represent the files themselves as more conventional objects, including attributes and methods. This could possibly simplify file manipulation when using an object-oriented programming language.</p>
		</resourceref>
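		<p>The idea of subclassing a file system can be illustrated with a small Python sketch. It does not reproduce the Choices class hierarchy; the class names are hypothetical, storage is an in-memory dictionary, and the MS-DOS naming rule is simplified to plain truncation.</p>

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Hypothetical framework base class; shared logic lives here."""

    def __init__(self):
        self._files = {}

    @abstractmethod
    def normalize(self, name):
        """Map a user-supplied name to the form this file system stores."""

    def write(self, name, data):
        self._files[self.normalize(name)] = data

    def read(self, name):
        return self._files[self.normalize(name)]


class UnixStyleFileSystem(FileSystem):
    def normalize(self, name):
        return name  # case-sensitive names, kept as given


class DosStyleFileSystem(FileSystem):
    def normalize(self, name):
        # Simplified 8.3 rule: uppercase, base cut to 8, extension cut to 3.
        base, _, ext = name.upper().partition(".")
        return base[:8] + ("." + ext[:3] if ext else "")


dos = DosStyleFileSystem()
dos.write("Readme.text", b"hello")
print(dos.read("README.TEX"))
```

		<p>Only the naming policy differs between the two subclasses; reading and writing are inherited unchanged, which is the kind of reuse an extensible framework is meant to provide.</p>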
		<p>Finally, the Grasshopper operating system represents a completely different approach to storage, called orthogonal persistence. This is discussed in detail below.</p>
	</olitem>
	
	<olitem id="o-examples" title="Examples">
	
		<!-- <olitem id="o-examplesintro"> -->
		<p>The following two examples highlight some of the successes and problems that result from designing new storage systems. The Palm OS is a success story about an operating system effectively implementing the capabilities of its storage hardware, and vice versa. The Grasshopper operating system reveals the challenges and inefficiencies of our current mass storage hardware, while giving a glimpse of a very bright future, should we obtain good enough hardware.</p>
		<!-- </olitem> -->
	
		<olitem title="Palm OS: a Database File System">
		<p>The Palm Operating System (<cite srcid="FOSTER L">general credits to </cite>) is representative of the unique needs of portable devices. In addition to their small size, these devices have considerable memory and storage restrictions, not unlike microcomputers and minicomputers before the 1980s. However, Palm and many other portable devices have more modern hardware and software capabilities.</p>
		
		<p>The Palm OS arranges all storage in its equivalent of main memory, which is entirely RAM and ROM hardware such as Flash ROM and standard memory cards. Dynamic memory is small (96KB as of OS version 3), and arranged in a single heap that the operating system and application programs share. The remainder of the hardware's storage capacity is a single large heap of "storage memory". Palm devices have no secondary memory; all permanent data is stored directly to RAM in the storage memory. This is a striking difference from the decades-old designs of traditional desktop systems. The benefits of this design include fewer moving parts by choosing a solid-state memory technology, and smaller operating system size and overhead. This solution is scalable for the foreseeable future - the OS uses a logical view of the memory hardware that consists of cards (which may or may not match up to physical memory cards) composing a 32-bit (4 gigabyte) address space. All dynamic and storage memory addresses are accessible using this method.</p>
		
		<p>As well as a simplified model for accessing memory, all concepts of PC file systems were scrapped for a modern system that provides an elegant solution for the Palm's special needs and uses. The majority of permanent storage - including user data, actual program files, and some preferences - occurs in databases that are customizable record collections. These records can be located anywhere within a single logical card. The Palm OS API provides all functions necessary for managing databases, and <cite srcid="FOSTER L" pages="23">requires that applications manage all storage memory through the API methods</cite>. When a record is opened for editing, it is opened, locked, read, and written on its original location in storage memory. No copy of the data is stored in dynamic memory. Similarly, there is no need for a full-fledged paged or segmented virtual memory system - all permanent storage is equally accessible. This is another example of the streamlined approach taken by the Palm OS storage system. It is effective because there is little need for concurrent file access, for two reasons: PDA users do not run many applications at once, and systems often limit the amount of multitasking anyway. The Palm OS (as of version 3.5) limits the user to one single-threaded GUI application at a time (background and system programs may also be running).</p>
		
		<p>Most user interaction with a database is via the database's application program. The lesser degree of direct file management is by design, for two reasons: (1) handheld devices are not well-suited and not used for heavy data entry, and (2) because storage space is at a premium on most portable devices, most data is application-specific and currently useful (not archived). A full-featured file/database browser packaged with the operating system, analogous to Windows Explorer, would be a space-consuming and unnecessary feature. Of course, the user needs some capabilities to browse the device. The Palm main menu is such an example; it can simply search the resource databases for known programs. <cite srcid="FOSTER L" pages="458">A unique feature of the Palm OS is its <term>Find</term> utility. This is a system dialog box asking for a simple keyword input. When submitted, the OS will invoke each application with special startup parameters. If the application supports the <term>Find</term> utility, it can search its database (using its own implementation) and return any matching records. The OS handles displaying the matches.</cite> <cite srcid="KAZMIERCZAK">As another useful feature, the Palm OS will track changes in application databases, facilitating synchronization of data with the user's PC.</cite> <!-- http://cs.alfred.edu/~kazmiekr/palm/palm_data_manager.html accessed 05/07/2004 01:34:28 --></p>
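		<p>The division of labor in the <term>Find</term> utility, where the system dispatches the keyword and each application searches its own records, can be sketched as follows. This Python model is not the Palm OS API; the classes, the <term>find</term> method, and the sample records are hypothetical stand-ins for the startup parameters and databases described above.</p>

```python
class MemoApp:
    """Hypothetical application that supports Find over its own records."""
    name = "Memo"

    def __init__(self, records):
        self.records = records

    def find(self, keyword):
        # The application decides how to search its own database.
        return [r for r in self.records if keyword.lower() in r.lower()]


class ClockApp:
    """Hypothetical application with no Find support."""
    name = "Clock"


def system_find(apps, keyword):
    """Ask each application to search its data; collect (app name, record)."""
    results = []
    for app in apps:
        handler = getattr(app, "find", None)
        if handler is None:
            continue  # this application does not support Find
        results.extend((app.name, record) for record in handler(keyword))
    return results


apps = [MemoApp(["Buy milk", "Call Milo", "Pay rent"]), ClockApp()]
matches = system_find(apps, "mil")
print(matches)
```

		<p>The system needs no knowledge of any record format; it only collects and displays what the applications return, so a new application becomes searchable simply by answering the request.</p>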
		
		<p>The Palm OS is a good example of older, simpler file system concepts implemented in modern ways that are effective for the fast-growing mass-produced portable consumer device market. Other small devices are increasingly finding the need for simple operating systems with effective storage systems, and the Palm OS has set a good precedent of an elegant implementation. Mobile phones, for example, are moving closer to PDAs as they include more third-party applications (often written in Java) and more varied data storage such as photos, contact lists, and organizational data. These devices could successfully adopt the Palm strategy because such an operating system provides a simple, foolproof low-level interface by simplifying the hardware and raw data organization, and provides a secure, functional application interface that has a low number of highly functional features. In general, this is something for which all operating system designers should aim.</p>
		</olitem>
		
		<!-- One or two theoretical OS things - direct serialization, alternative file systems, etc. -->
		<olitem title="Grasshopper: Orthogonal Persistence">
		<p>Orthogonal persistence is a powerful idea for the operating system designer, application programmer, and the end user. The term describes the ability of a system to allow applications to make no extra effort to store their data structures for as long as needed, and to ensure that the data structures are internally and mutually consistent. For users, this means their data is always safe, up to the minute. In the event of a problem such as power loss, successful orthogonal persistence means that the user's data (and perhaps the very state of his or her applications) is all up to date and consistent, with as little time lost as possible. A system fully supporting orthogonal persistence will also provide some built-in mechanisms for auditing and categorizing data - for example, distinguishing among <cite srcid="TUNES"><quote>'stable', 'development' and 'backup' releases</quote></cite> of a document.</p>
		<resourceref resid="n-gs-intro" resolved="true">
		<p>For the operating system designer, orthogonal persistence is currently a highly complex challenge: <cite srcid="DEARLE 2">it is best approached by designing a new system from the ground up</cite>. This is because the gigantic difference in performance and capacity between memory and mass storage hardware on most systems means the persistent operating system must maintain and synchronize in-memory and on-disk versions of objects in use (compare this to the Palm OS), and it must do so efficiently. To completely hide these problems and provide a consistent abstraction of a single persistent store to the application and user is a significant challenge that has not been widely attempted outside of academia. The true power of orthogonal persistence will be realized if we develop a mass storage technology that is fast enough to replace primary memory. Then, the hardware architecture would match the software architecture, creating a truly effective system.</p>
		</resourceref>
		<resourceref resid="n-gs-more" resolved="true">
		<p>Despite the current difficulties, there has been plenty of research in this area, presenting a handful of thoroughly detailed operating system designs. Grasshopper and <acronym title="TUNES is a Useful, Nevertheless Expedient, System">TUNES</acronym> are two such systems. Grasshopper was built with orthogonal persistence as a design goal from the beginning. Consequently, its structure consists of terms that are probably not familiar to most: <cite srcid="DEARLE 1">containers, loci and capabilities</cite>. Capabilities is the most familiar term; it refers to the security model adopted by Grasshopper. Containers and loci, though, are the heart of its persistence mechanism. Containers provide a single abstract interface for all referencing environments. A process runs in the context of a container, and all data structures, whether temporary variables existing for milliseconds or years-old database data, are manipulated as members of the container. Containers are abstract and flexible - they may be as large as needed, even larger than 32-bit virtual address spaces. Loci are the active objects in Grasshopper - processes and threads. The designers of Grasshopper go into much greater detail, but with the description above, the simplicity and flexibility of the operating system's storage interface are clear.</p>
		<p>Underneath this interface, Grasshopper manages container mappings, manages locus execution, and implements the persistent storage mechanism itself. The primary means of this implementation is through managers. <cite srcid="DEARLE 2">Managers control the flow of data from the low-level hardware representation to the data's final mapping through a container. On today's machines, this includes movement of data between memory and disk, and synchronization of those copies. Specifically, managers implement stability algorithms to ensure data integrity and synchronization. They also manage data representation tasks related to distributed computing.</cite></p>
		</resourceref>
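		<p>The role of a stability algorithm can be shown with a deliberately simple Python sketch. It is not Grasshopper's design: the <term>Container</term> class here is hypothetical, the "disk" is just a second in-memory copy, and the whole container is copied at once, where a real manager would presumably work incrementally.</p>

```python
import copy


class Container:
    """Hypothetical container: one view of data, backed by a stable snapshot."""

    def __init__(self):
        self.live = {}     # working copy, standing in for main memory
        self._stable = {}  # last consistent snapshot, standing in for disk

    def stabilize(self):
        """Record a consistent snapshot of everything in the container."""
        self._stable = copy.deepcopy(self.live)

    def recover(self):
        """Simulate a crash: drop the working copy and reload the snapshot."""
        self.live = copy.deepcopy(self._stable)


c = Container()
c.live["draft"] = ["line 1"]
c.stabilize()
c.live["draft"].append("line 2")  # change made after the last snapshot
c.recover()
print(c.live["draft"])
```

		<p>The application code never opens or writes a file; it only manipulates ordinary data structures, and whatever was present at the last stable point survives the simulated failure.</p>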
		</olitem>
		
	</olitem><!-- end of o-examples -->
	
</olitem><!-- end of o-body -->

<olitem id="o-conclusion">
	<olitem title="Conclusion">
		<!-- summary w/outlook of future, pros and cons of changes in this stuff in general -->
		Heavy research is taking place at all levels of data storage management. There are many exciting possibilities on the horizon, but the hard disk drive and the hierarchical file system are highly tuned technologies that will continue to dominate, at least in desktop PCs, for the near future. New technologies and file system designs are gaining ground in specialty markets, such as the database file system of the Palm OS for handhelds. Eventually, radically different storage technologies such as holographic memory or atomic resolution storage will enter the market via fringe applications, and one of them will take over the mainstream market as hard disks reach their inevitable limitations. As these technologies saturate the market, it is likely that new operating systems built to harness them, perhaps descendants of Grasshopper and TUNES, will see success, if not widespread popularity.
	</olitem>
</olitem>

</outlineroot>

<!-- <appendices>
	<citing/> --><!-- provide attributes or elements here for info that can't be determined from other 
	   elements. (like the project start tag). displays as a 'citing this paper' section or something. -->
<!-- </appendices> -->

<notes>

	<note id="n-thesis" timestamp="05/01/2004 12:17:00">
	<!-- Thesis try number 1 revision 2 -->
	There is an increasing demand for new software and hardware data storage mechanisms. This demand comes from two directions - a growing performance gap between aging storage hardware technology and processor/memory speed, and fundamental changes in the way users and applications manage data. Strong current research in new file system and storage hardware technology is introducing many promising new ideas, some of which may be the seeds of a radically different future for data storage implementation in operating systems.
	</note>
	
	<note id="n-SDSdemand" timestamp="05/02/2004 21:28:38" sourceref="DEPOMPA">
	Shared data storage will grow
	  because it is inherent to the nature of client/server systems and
	  distributed computing.
	
	Why share data? Because the concept is driven by the very nature of
	distributed, client-server computing, where information is stored in so many
	disparate locations. Even though many organizations are placing servers in a
	central data center, almost all would like their storage subsystems to be
	centrally managed, monitored, and controlled. These businesses want to ensure
	that data stored in those subsystems is never lost or corrupted, especially
	when it's needed for data warehousing, data mining, and other advanced
	analytical applications.
	
	Charles Foley, VP of open enterprise storage at Amdahl Corp. in Sunnyvale,
	Calif., says that providing "a single set of data" is the goal of most
	businesses today. But Foley adds that it will take a concerted effort on the
	part of storage hardware makers, database manufacturers, and operating systems
	developers to provide the right interfaces and verification services to allow
	users to share data without losing or corrupting data.
	</note>
	
	<note id="n-datacrunchcause" timestamp="05/04/2004 21:00:00" sourceref="TOIGO">
	SEE ALSO datacrunch.gif
	
	"Many corporations find that the volume of data generated by their computers doubles every year. Gargantuan databases containing more than a terabyte--that is, one trillion bytes--are becoming the norm as companies begin to keep more and more of their data on-line, stored on hard-disk drives, where the information can be accessed readily."

	"In the coming years the technology could reach a limit imposed by the superparamagnetic effect, or SPE. Simply described, SPE is a physical phenomenon that occurs in data storage when the energy that holds the magnetic spin in the atoms making up a bit (either a 0 or 1) becomes comparable to the ambient thermal energy. When that happens, bits become subject to random "flipping" between 0's and 1's, corrupting the information they represent." ... "With the current pace of miniaturization, some experts believe the industry could hit the SPE wall as early as 2005." --- because this was a 2000 article, remove the "2005" and just use "within a few years"

	"But storage capacity is not the only issue. Indeed, the rate with which data can be accessed is becoming an important factor that may also determine the useful life span of magnetic disk-drive technology. Although the capacity of hard-disk drives is surging by 130 percent annually, access rates are increasing by a comparatively tame 40 percent."
	</note>
	
	<note id="n-holomem-dc" timestamp="05/04/2004 21:00:00" sourceref="TOIGO">
	"For nearly four decades, holographic memory has been the great white whale of technology research. Despite enormous expenditures, a complete, general-purpose system that could be sold commercially continues to elude industrial and academic researchers. Nevertheless, they continue to pursue the technology aggressively because of its staggering promise."
	
	"Theoretical projections suggest that it will eventually be possible to use holographic techniques to store trillions of bytes--an amount of information corresponding to the contents of millions of books--in a piece of crystalline material the size of a sugar cube or a standard CD platter. Moreover, holographic technologies permit retrieval of stored data at speeds not possible with magnetic methods. In short, no other storage technology under development can match holography's capacity and speed potential."
	
	"An important one is the storage and retrieval of entire pages of data at one time. These pages might contain thousands or even millions of bits. Each of these pages of data is stored in the form of an optical-interference pattern within a photosensitive crystal or polymer material. The pages are written into the material, one after another, using two laser beams. One of them, known as the object or signal beam, is imprinted with the page of data to be stored when it shines through a liquid-crystal-like screen known as a spatial-light modulator. The screen displays the page of data as a pattern of clear and opaque squares that resembles a crossword puzzle."
	
	about content-based storage/retrieval stuff (low level optimization for 
	data mining, etc.):
	After data are stored to a holographic medium, a single desired data page can be projected that will reconstruct all reference beams for similarly patterned data stored in the media. The intensity of each reference beam indicates the degree to which the corresponding stored data pattern matches the desired data page. "Today we search for data on a disk by its sector address, not by the content of the data," Coufal explains. "We go to an address and bring information in and compare it with other patterns. With holographic storage, you could compare data optically without ever having to retrieve it. When searching large databases, you would be immediately directed to the best matches."
	</note>
	
	<note id="n-ars" timestamp="05/04/2004 21:00:00" sourceref="TOIGO">
	ARS - atomic resolution storage
	
	"Chuck Morehouse, director of Hewlett-Packard's Information Storage Technology Lab in Palo Alto, Calif., is quick to point out that atomic resolution storage (ARS) will probably never completely replace rotational magnetic storage. Existing hard-disk drives and drive arrays play well in desktops and data centers where device size is not a major issue. But what about the requirements for mass storage on a wristwatch or in a spacecraft, where form factor, mass and power consumption are overriding criteria?"
	
	"The ARS program at Hewlett-Packard (HP) aims to provide a thumbnail-size device with storage densities greater than one terabit (1,000 gigabits) per square inch. The technology builds on advances in atomic probe microscopy, in which a probe tip as small as a single atom scans the surface of a material to produce images accurate within a few nanometers. Probe storage technology would employ an array of atom-size probe tips to read and write data to spots on the storage medium. A micromover would position the medium relative to the tips."

	Morehouse says "One key advantage is low power consumption. When ARS is not being asked to perform an operation, it has no power consumption."
	</note>
	
	<note id="n-holomem-other" timestamp="05/04/2004 22:42:16" sourceref="PSALTIS">
	from the holomem.htm paper
	
	"The main advantages of holographic storage--high density and speed--come from three-dimensional recording and from the simultaneous readout of an entire page of data at one time. Uniquely, holographic memories store each bit as an interference pattern throughout the entire volume of the medium."
	
	Holographic memory is especially well suited to multimedia: the (already low) error rate is of little consequence, and the ability to store and retrieve entire pages at once is a significant performance advantage. Even the basic physical mechanisms borrow from related hardware: a variation of an LCD screen helps to write each page, and the hologram reader uses the CCD technology found in digital cameras.
	
	Before such a "super CD" becomes a commercial reality, holographic memories may be used in specialized, high-speed systems. Some might exploit the associative nature of holographic storage, a feature first expounded on in 1969 by Dennis Gabor, who was awarded the 1971 Nobel Prize for Physics for the invention of holography. 
	
	Given a hologram, either one of the two beams that interfered to create it can be used to reconstruct the other. What this means, in a holographic memory, is that it is possible not only to orient a reference beam into the crystal at a certain angle to select an individual holographic page but also to accomplish the reverse. Illuminating a crystal with one of the stored images gives rise to an approximation of the associated reference beam, reproduced as a plane wave emanating from the crystal at the appropriate angle. 
	
	A lens can focus this wave to a small spot whose lateral position is determined by the angle and therefore reveals the identity of the input image. If the crystal is illuminated with a hologram that is not among the stored patterns, multiple reference beams--and therefore multiple focused spots--are the result. The brightness of each spot is proportional to the degree of similarity between the input image and each of the stored patterns. In other words, the array of spots is an encoding of the input image, in terms of its similarity with the stored database of images.
	
	Earlier this year at Caltech, Pu, Robert Denkewalter and Psaltis used a holographic memory in this mode to drive a small car through the corridors and laboratories of the electrical engineering building there.
	</note>
	
	<note id="n-filesystemintro" timestamp="05/03/2004 00:38:50" sourceref="SMITH">
	SMITH/BARNES pg. 8 - five categories of data (in)dependence
	
	SMITH/BARNES pg. 6 (last paragraph):
	concerning geographic independence (category 5)
	<quote>Ceri and Pelagatti [1] suggest that for a network to be considered a distributed database, each node in the network must be capable of autonomous operations and must also participate in at least one global operation. In the case of a bank, the transfer of funds between accounts at different nodes is such a global operation.</quote>
	
	SMITH/BARNES pg. 9 (top)
	Three issues within the file processing environment are important: file access techniques, the interdependence of application programs and data files, and data redundancy across data files. We briefly present these issues here, and we examine them in detail in Chapters 3 through 9.
	</note>
	
	<note id="n-gs-intro" timestamp="05/05/2004 23:20:20" sourceref="DEARLE 2">
	"Despite the fact that the basic idea behind orthogonal persistence is very simple, research groups are finding it extremely hard to develop scalable and efficient persistent stores. One of the major difficulties derives from the fact that persistence provides a fundamentally different model of computing from that supported by conventional operating systems. It is therefore not surprising that we are finding that such operating systems are inappropriate for persistent systems research."
	</note>
	
	<note id="n-gs-more" timestamp="05/05/2004 23:20:20" sourceref="DEARLE 1">
	"Conventional programming systems require the programmer to translate data resident in virtual memory into a format suitable for long term storage. For example, graph structures must be flattened when they are mapped onto files or relations; this activity is both complex and error prone. In persistent systems, the programmer is not required to perform this mapping since data of any type with arbitrary longevity is supported by the system."
	
	"To date, most persistent systems, with a few exceptions [6, 9, 34], have been constructed above conventional operating systems. Implementors of persistent languages are invariably forced to construct an abstract machine above the operating system, since the components of a persistent system are different in nature to the components of a conventional operating system. For example, in [37], Tanenbaum lists the four major components of an operating system as being memory management, file system, input-output and process management. In persistent systems, the file system and memory management components are unified. In many operating systems, input-output is presented using the same abstractions as the file system; clearly this is not appropriate in a persistent environment. Some persistent systems require that the state of a process persists. This is not easily supported using conventional operating systems in which all processes are transitory."
	
	"In Grasshopper, loci are the abstraction over execution (processes). In its simplest form, a locus is simply the contents of the registers of the machine on which it is executing. Like containers, a locus is maintained by the Grasshopper kernel and is inherently persistent. Making the locus persistent is a departure from other operating system designs and frees the programmer from much complexity."
	
	"A key concept behind orthogonal persistence is that the programmer is not required to manage the movement of data between primary and secondary storage. Instead, application programs execute within a stable, resilient addressing environment in which data locality is invisible. A number of persistent systems have been constructed which support particular programming languages on a variety of architectures."
	</note>
	
	<note id="n-choices" timestamp="05/05/2004 22:35:42" sourceref="MADANY">
	Madany
	<!-- [47.] P. W. Madany, R. H. Campbell, V. Russo, and D. E. Leyens. ``A Class Hierarchy for Building Stream-Oriented File Systems''. In ECOOP '89, pages 311-328, Nottingham, UK, 1988. -->
	A class hierarchy for building stream-oriented file systems
	http://choices.cs.uiuc.edu/Papers/Conferences/Ecoop89.stream.file.pdf
	5/5/2004 22:24
	
	"... The file system is one such example of a traditional kernel service. In Choices, we have chosen to implement the file system as a collection of server objects; each object implements an independent component of the file system."
	
	"The file system is a major operating system subsystem. Following the design goals of Choices, we are building a spectrum of file systems to enhance the customization of the operating system family to applications. An application may use a customized file system that has components which are tailored to improve its performance, to optimize its utilization of storage, or to provide compatibility with other file systems. Stream-oriented file systems comprise a major category of file systems and form the basis for our initial work. In the future, we plan to examine record-oriented file systems, data bases and object-oriented file systems."
	
	"Another goal of Choices is to develop object-oriented file systems that users may extend and customize for their particular applications. The work we describe here is a milestone in this research. It is an object-oriented design and implementation study of the integration of several existing stream-oriented file systems into one class hierarchy with a set of abstract access protocols that may be used on any of the specific file systems."
	</note>
	
	<note id="n-tunes" timestamp="05/05/2004 22:03:36" sourceref="TUNES">
	"One alleged reason why orthogonal persistence was not accepted is because it is said to be too expensive to implement. It is not, as some systems like Eumel or Lisp systems showed. A tradition developed to have non-persistent systems, and require users/programmers to explicitly save and restore the state of the objects they use from low-level persistent media."
	
	"Actually, as the discrepancy of speed between of memory components and computing units grows everyday larger, it becomes everyday more obvious that DRAM is really another cache between the CPU and persistent storage, just like (zero to two levels of) SRAM and CPU registers before it. There is no reason why normal users should still have to explicitly fill and empty this cache, when all these things could much more reliably be done automatically by the computer itself. The fact that flushing be done by system software or system hardware is utterly irrelevant to the user, who considers these two as a whole when using them."
	
	"The problem with Orthogonal Persistence is thus one of reliability and performance."
	
	"To have fast and reliable Orthogonal Persistence would be easy if only computers or disks were equipped with battery backed up memory, or "TRAM" (TRAM -- transactional RAM) as they have been dubbed. Power failures would thus be gracefully handled without sacrificing speed by requiring changes to be committed to slow disk before continuing."
	
	"One misunderstanding about Orthogonal Persistence is that by getting rid of filesystems, one would also throw away tree-like hierarchies of directories. Such tree-like hierarchies are indeed a simple and natural way to organize thoughts, although by far not the only one. Even with orthogonal persistence, there would still be a lot of nested "dictionaries", that bind objects to human-readable and typeable names. However, said objects won't be files anymore; instead of raw streams of contiguous bytes, they will be just any structured data that your computing system can manipulate."
	</note>

</notes>

<sources>
<source id="DEARLE 1" type="html" author="Dearle, Alan, et al." 
 title="Grasshopper: An orthogonally persistent operating system"
 publishdate="26 October 1994" timestamp="5 May 2004"
 href="http://os.dcs.st-and.ac.uk/GH/Papers/GH10/gh10.html"
/>
<source id="DEARLE 2" type="html" author="Dearle, Alan, et al." 
 title="The Grasshopper Operating System"
 publishdate="18 January 2001" timestamp="5 May 2004"
 href="http://os.dcs.st-and.ac.uk/GH/index.html"
/>
<source id="DEPOMPA" type="article" author="DePompa, Barbara" 
 title="First line of defense" in="InformationWeek"
 issuedate="5 February 1996" pages="58+"
/>
<source id="FOSTER I" type="article" author="Foster, Ian" 
 title="The Grid: Computing without Bounds" in="Scientific American"
 issuedate="April 2003" pages="78-85"
/>
<source id="FOSTER L" type="book" author="Foster, Lonnon R" 
 title="Palm OS Programming Bible" 
 publisher="Hungry Minds, Inc" publishdate="2000" publishlocation="New York"
 totalpages="893"
/>
<source id="KAZMIERCZAK" type="html" author="Kazmierczak, Kevin" 
 title="Palm OS Architecture"
 publishdate="14 November 2001" timestamp="7 May 2004"
 href="http://cs.alfred.edu/~kazmiekr/palm/palm_main.html"
/>
<source id="MADANY" type="html" author="Madany, P., et al." 
 title="A Class Hierarchy for Building Stream-Oriented File Systems"
 publishdate="1988" timestamp="5 May 2004"
 href="http://choices.cs.uiuc.edu/Papers/Conferences/Ecoop89.stream.file.pdf"
/><!-- at UIUC -->
<source id="PSALTIS" type="article" author="Psaltis, Demetri and Fai Mok" 
 title="Holographic Memories" in="Scientific American"
 issuedate="November 1995" 
/>
<!-- <source id="SMITH" type="book" author="Smith, Peter D., and G. Michael Barnes" 
 title="Files and Databases: An Introduction" 
 publisher="Addison-Wesley" publishdate="1987" publishlocation="Reading, MA"
 totalpages="390"
/> -->
<!-- <source id="TANENBAUM" type="book" author="Tanenbaum, Andrew S" 
 title="Modern Operating Systems" 
 publisher="Pearson Education" publishdate="2001" publishlocation="Upper Saddle River, NJ"
 edition="2" totalpages="976"
/> -->
<source id="TOIGO" type="article" author="Toigo, Jon William" 
 title="Avoiding a Data Crunch" in="Scientific American"
 issuedate="May 2000"
/><!-- length: 13 pages --><!-- AVAILABLE http://www.sciam.com/article.cfm?chanID=sa006&amp;colID=1&amp;articleID=00014C5C-2FB1-1C75-9B81809EC588EF21 -->
<source id="TUNES" type="html" 
 title="Orthogonal Persistence"
 publishdate="30 November 2003" timestamp="5 May 2004"
 href="http://cliki.tunes.org/Orthogonal%20Persistence"
/><!-- author: Anonymous -->
<source id="VAHDAT" type="html" author="Vahdat, Amin M., Paul Eastham and Thomas Anderson" 
 title="WebFS: A Global Cache Coherent File System"
 publishdate="December 1996" timestamp="6 May 2004"
 href="http://www.cs.duke.edu/~vahdat/webfs/webfs.html"
/>

<!-- Sample, Herbert A. "Identity Theft Is among Fastest Growing White-Collar Crimes,
    FBI Says." Knight-Ridder/Tribune Business News. 12 Nov. 2001.
    General Reference Center Gold Gale Group Databases. Dallas Public
    Lib., TX. 25 Oct. 2002 <http://www.infotrac.galegroup.com>. -->
</sources>

</project>