Monday, April 8, 2019

Self-healing Operating Systems Essay Example for Free

The dependability of computer systems is one of the key issues of the technological era. Our daily lives are now governed by complex computer systems (Haugk, Lax, Royer and Williams, 1985). Operating systems that manage key applications should be able to cope with the increasing rate of software problems, malicious attacks and hardware flaws (Parhami, 2005; Lohr, 2001). One of the most significant requirements for operating systems is therefore resilience to faults.

Most operating systems stop working as soon as they encounter a problem with the hardware or software. This results in the loss of the applications and data running in the system. Common examples of such failures are Windows blue-screen errors and kernel panics in UNIX (David and Campbell, n.d.). This is unfortunate, since the users' main concern is their applications and data. They are afraid of losing data because of a fault that is not of their making. Even after a fault occurs in the software or hardware, users want their data to remain intact and recoverable.

Self-healing operating systems were developed to address this problem. They are systems that automatically detect, diagnose and repair localized software and hardware problems. Once an error has been detected, the operating system can use various techniques to recover (Andrzejak, Geihs, Shehory and Wilkes, 2009).

Code reloading

Temporary memory errors, or memory corruption caused by erroneous code, can lead to errors such as illegal instructions in the software code. Although ECC memory can detect and fix some temporary memory faults, it cannot handle corruption that produces invalid instructions.
The simplest and most effective technique for handling such a problem is code reloading. This recovery technique reloads the faulty memory page from permanent memory. If the fault is permanent, which can be established through testing, it may still be possible to recover by remapping the faulty hardware page using virtual memory support.

If the processor branches to an undefined instruction exception, the handler reloads the instruction from a copy of the system code in memory-mapped permanent memory, and the reloaded instruction is executed. This recovery procedure is the simplest to implement. However, it cannot detect memory corruption that turns one opcode into another legal opcode (David and Campbell, n.d.).

Regular checking of the operating system code improves the detection of flaws in memory. Hashing and checksums are simple methods of verifying running system code, and if a fault is detected, a reload is triggered immediately. This preventive strategy can detect flaws before they cause errors, including faults that turn one opcode into another legitimate opcode (Demsky and Rinard, 2002). Choices periodically computes a CRC-32 checksum of critical kernel code to make sure that the memory holding the instructions has not been corrupted. If the checksum changes because of corrupted memory, the corrupted block of memory is reloaded from permanent memory, and the instruction cache is flushed so that all affected instructions are discarded. The checksum can also be computed as soon as an operating system error is detected, to make sure that the system and recovery code are not themselves affected (Liedtke, 1995). Modern ARM-based processor designs include Run Time Integrity Checker (RTIC) hardware.
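To make the periodic check concrete, here is a minimal Python sketch of the checksum-and-reload idea described above. It is not the Choices implementation: memory is modelled with byte arrays, and the names `CodeRegion`, `verify_and_repair`, `periodic_check` and `flush_icache` are hypothetical. Only `zlib.crc32` is a real library call.

```python
import zlib


def flush_icache(region):
    """Hypothetical placeholder: real hardware would invalidate the
    instruction cache here so that no stale corrupted instructions run."""


class CodeRegion:
    """Hypothetical model of a kernel code region: a working copy in RAM
    plus a pristine copy kept in permanent (e.g. flash) memory."""

    def __init__(self, name, code):
        self.name = name
        self.permanent_copy = bytes(code)          # read-only backing copy
        self.ram = bytearray(code)                 # copy that actually executes
        self.expected_crc = zlib.crc32(self.permanent_copy)

    def verify_and_repair(self):
        """Recompute the CRC-32 of the RAM copy; on mismatch, reload the
        region from permanent memory. Returns True if a reload happened."""
        if zlib.crc32(self.ram) == self.expected_crc:
            return False
        self.ram[:] = self.permanent_copy          # reload from permanent memory
        flush_icache(self)                         # discard affected instructions
        return True


def periodic_check(regions):
    """One pass of the periodic scan; returns the names of reloaded regions."""
    return [r.name for r in regions if r.verify_and_repair()]


regions = [CodeRegion("scheduler", b"\x01\x02\x03\x04"),
           CodeRegion("vm", b"\x10\x20\x30")]
regions[0].ram[2] ^= 0xFF                          # simulate a memory bit flip
print(periodic_check(regions))                     # ['scheduler']
print(bytes(regions[0].ram) == regions[0].permanent_copy)  # True
```

On processors with RTIC hardware, this kind of comparison can be performed by dedicated hardware rather than by a software loop.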
This hardware can be configured by the operating system to compute and verify SHA-1 hashes of specific code regions. Once an error is identified, the processor is notified through an interrupt. The same kind of checksum verification can be applied to check the integrity of fixed data; checking the integrity of changing data is harder. One weakness of this recovery procedure is that it cannot be applied directly to code that is generated at run time or to self-modifying code. Care must therefore be taken to store a copy of any generated code in permanent memory (Shapiro, 2004).

Component micro-rebooting

This technique has been proven effective for application programs, and applying it to the OS is also practicable (Voas and McGraw, 1998). It can help in recovering from temporary hardware flaws and some system bugs. The Nooks project used this technique, in the form of extension restarts, to recover the Linux kernel. The technique involves reinitialising the corrupted component, or destroying and recreating it, and then re-issuing the request to the component. While code reloading fixes errors only in processor instructions, this technique fixes errors in kernel data structures. It works together with component isolation: the wrapper elements that isolate the components are also used to manage the recovery. The fault model addressed by micro-rebooting is component-level fault containment, which can be partly implemented through component isolation (Tanenbaum, Herder and Bos, 2006).

Automatic service restarts

If a crucial operating system service, such as the paging daemon, stops working, it brings the entire system to a halt. Once the failure of such a crucial process is detected, restarting the process can solve the problem and allow the operating system to continue executing.
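Component micro-rebooting and automatic service restarts share one pattern: when a request fails, discard the faulty instance, create a fresh one, and re-issue the request. The following Python sketch illustrates that pattern under simplifying assumptions; `RestartingWrapper`, `Cache` and the `max_restarts` limit are hypothetical, and a raised exception stands in for whatever fault detection a real kernel wrapper would use.

```python
class RestartingWrapper:
    """Hypothetical isolation wrapper that micro-reboots a component on failure."""

    def __init__(self, factory, max_restarts=3):
        self.factory = factory            # builds a freshly initialised component
        self.component = factory()
        self.max_restarts = max_restarts  # give up after this many restarts per call
        self.restarts = 0                 # total restarts performed so far

    def call(self, method, *args):
        attempts = 0
        while True:
            try:
                return getattr(self.component, method)(*args)
            except Exception:
                if attempts >= self.max_restarts:
                    raise                 # recovery failed; escalate to the caller
                attempts += 1
                self.restarts += 1
                # Destroy the faulty instance and recreate it from scratch,
                # discarding any corrupted internal state, then retry.
                self.component = self.factory()


class Cache:
    """Toy component whose internal state can become corrupted."""

    def __init__(self):
        self.table = {}

    def get(self, key):
        if not isinstance(self.table, dict):
            raise RuntimeError("corrupted internal state")
        return self.table.get(key, "miss")


svc = RestartingWrapper(Cache)
svc.component.table = None                # simulate corruption of internal state
print(svc.call("get", "k"))               # miss
print(svc.restarts)                       # 1
```

Because the fresh instance starts from its initial state, anything the old instance held is lost, which is the trade-off that transactional roll-back avoids.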
The fault model handled by automatic service restart is single-process failure, usually without corruption of external state. In a micro-kernel OS, this essentially involves detecting and restarting failed system services that run as application processes (David, Carlyle and Campbell, 2007). In Minix3, for instance, this is done by the reincarnation server. A system process can be designed so that it is automatically restarted once it encounters an exception. One particular system process loops constantly, waiting for a ready process and switching to it: the process dispatcher. The system becomes completely unusable if the process dispatcher crashes, which is why some systems run the dispatcher as a restartable process that can be recovered after a crash (Demsky and Rinard, n.d.).

Process restarts may fail when the process uses locks to access shared data structures. This commonly happens when the process dies while holding one or more locks. Even if the shared data structures are unaffected, or can be corrected, recovery will not happen unless all the locks held by the process are released. The system should therefore track all the locks held by processes and forcibly release any that are still held once a process is halted. Lock tracking and forced unlocking can be implemented so that the system keeps running once a fault has been identified and fixed (Tanenbaum, Herder and Bos, 2006).

Watch-dog based recovery

This technique uses external hardware watchdog timers to detect situations in which the operating system is not doing any useful work, for example when the OS is stuck in an infinite loop. The operating system must reset the timer regularly. A signal is sent to the processor once the timer expires.
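The kick-or-expire behaviour of the watchdog timer just described can be simulated in a few lines. This Python sketch assumes a tick-driven timer and uses a callback (`on_expire`, hypothetical) in place of the hardware signal to the processor; a real watchdog is a hardware device programmed through device-specific registers.

```python
class Watchdog:
    """Simulated hardware watchdog that counts down in discrete ticks."""

    def __init__(self, timeout_ticks, on_expire):
        self.timeout = timeout_ticks
        self.remaining = timeout_ticks
        self.on_expire = on_expire        # stands in for the signal/reset line

    def kick(self):
        """Called regularly by a healthy OS to restart the countdown."""
        self.remaining = self.timeout

    def tick(self):
        """Advance hardware time by one tick; fire the signal on expiry."""
        self.remaining -= 1
        if self.remaining <= 0:
            self.on_expire()
            self.remaining = self.timeout


resets = []
wd = Watchdog(timeout_ticks=3, on_expire=lambda: resets.append("reset"))
for t in range(8):
    hung = t >= 4                         # from tick 4 on, the OS is stuck in a loop
    if not hung:
        wd.kick()                         # healthy OS resets the timer
    wd.tick()
print(len(resets))                        # 1
```

While the OS keeps kicking the timer it never expires; once the kicks stop, expiry follows after `timeout_ticks` ticks.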
The timers are commonly wired to the processor's reset pin, so an expiry leads to a complete reboot of the system. A complete reboot has a weakness: it causes the loss of user data and of applications held in volatile memory. However, since memory is preserved across a processor reset, both the operating system state and the user state can be reconstructed. This makes it possible to continue operating after the reset, so user data is recovered and reliability is higher (Andrzejak, Geihs, Shehory and Wilkes, 2009).

This technique has been successfully implemented in Linux and Choices. Once the memory management unit (MMU), the interrupt subsystem, the watchdog timer and the processor have been reset, the system continues to operate effectively. To avoid losing user data, the reset handler bypasses the usual boot procedure when the reset was triggered by the watchdog timer. The reset handler turns the memory management unit back on, deactivates the process that was running, reinitialises the interrupts and jumps to the OS's process dispatch loop. The system then runs the next ready process (Shapiro, 2004). All that is lost is the state of the process that was running when the processor was reset. That process cannot be scheduled again, so it is removed from the process queue. An alternative remedy for a lock-up is to deliver an exception to the locked-up thread, so that the thread can attempt local recovery instead of being forced to terminate. Watch-dog based recovery assumes a fault model of a single process crash without external state corruption. The technique uses the lock-tracking code to release shared resources held by a terminated process. Another kind of lock-up that can trigger a watchdog timeout is a deadlock.
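The lock tracking and forced release mentioned above might look like the following Python sketch. It models locks as plain objects with an owner field rather than real kernel locks, and the names `TrackedLock`, `LockTracker` and `force_release_all` are hypothetical.

```python
class TrackedLock:
    """Simplified lock: records which process (pid) currently owns it."""

    def __init__(self, name):
        self.name = name
        self.owner = None


class LockTracker:
    """Hypothetical registry of the locks each process currently holds."""

    def __init__(self):
        self.held = {}                    # pid -> set of TrackedLock

    def acquire(self, pid, lock):
        if lock.owner is not None:
            return False                  # a real kernel would block here
        lock.owner = pid
        self.held.setdefault(pid, set()).add(lock)
        return True

    def release(self, pid, lock):
        if lock.owner != pid:
            raise ValueError(f"pid {pid} does not hold {lock.name}")
        lock.owner = None
        self.held[pid].discard(lock)

    def force_release_all(self, pid):
        """Called when pid is terminated: free every lock it still holds."""
        freed = self.held.pop(pid, set())
        for lock in freed:
            lock.owner = None
        return sorted(lock.name for lock in freed)


tracker = LockTracker()
runq, fs = TrackedLock("runqueue"), TrackedLock("fs_table")
tracker.acquire(7, runq)
tracker.acquire(7, fs)
# Process 7 crashes while holding both locks.
print(tracker.force_release_all(7))       # ['fs_table', 'runqueue']
print(tracker.acquire(8, runq))           # True
```

Forcibly releasing locks handles a holder that has died. It does not resolve a deadlock among processes that are still alive, which requires breaking the cycle of waiting processes.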
Recovery in this case can be attempted by restarting some components so as to break the cycle (Andrzejak, Geihs, Shehory and Wilkes, 2009).

Transactional roll-back

When an error causes an exception during an operation, the state of the component can be rolled back by aborting the operation, which is then retried. In Choices, transactions are managed by the same wrapper elements that provide isolation: when an exception is not handled, the wrapper aborts the transaction and the state of the component is rolled back. The multi-threaded, non-blocking execution offered by RSTM can also be used for better performance (Brown and Patterson, 2001). Supporting a transactional model for components incurs overheads in space and time. The space overhead comes from storing backup copies of state before transactions; the time overhead comes from performing memory copies and managing memory during transaction setup and commit (Marathe et al., 2006).

Transactional roll-back differs from component micro-rebooting in that roll-back undoes only the changes made by the current operation, while micro-rebooting reinitialises the entire internal state of the component. Either technique can be used, depending on the kind of component. In particular, if the component holds crucial state information that would be lost under micro-rebooting, transactional roll-back can be used to retain that state. Micro-rebooting is useful when the component can tolerate state reinitialisation, and it incurs few overheads (Demsky and Rinard, n.d.).

Process-level recovery

When local recovery does not work, or when the recovery process itself fails, specific process states can be saved to permanent memory. This is done as a last resort, when none of the other techniques works.
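A minimal Python sketch of the save, reboot and selective-restore sequence of process-level recovery follows. It assumes process state can be serialised as JSON and uses an ordinary file in place of permanent memory; `save_process_states`, `reboot_kernel` and `restore_selected` are hypothetical names, and a real implementation would have to capture registers, address spaces and open resources.

```python
import json
import os
import tempfile


def save_process_states(processes, path):
    """Write the state of each user process to permanent storage."""
    with open(path, "w") as f:
        json.dump(processes, f)
        f.flush()
        os.fsync(f.fileno())              # make sure the data reaches the disk


def reboot_kernel():
    """Stand-in for a full reboot: all in-memory OS state starts fresh."""
    return {"processes": {}, "kernel_tables": {}}


def restore_selected(path, kernel, keep):
    """Reload only the processes named in `keep` into the fresh kernel."""
    with open(path) as f:
        saved = json.load(f)
    for name, state in saved.items():
        if name in keep:
            kernel["processes"][name] = state
    return kernel


procs = {"editor": {"buffer": "draft text"}, "faulty_driver": {"regs": [1, 2]}}
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
save_process_states(procs, path)
kernel = restore_selected(path, reboot_kernel(), keep={"editor"})
print(sorted(kernel["processes"]))        # ['editor']
```

Leaving the faulty process out of `keep` mirrors the idea of restoring user applications selectively while discarding the state implicated in the fault.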
Once the user states are stored, the system can attempt a full reboot. All operating system state is reinitialised by the reboot, which is likely to remove transient errors. Process-level recovery ensures that user applications are not lost when the fault affects only a few system applications or a limited part of the operating system state. The technique can be combined with file system snapshots to ensure that file integrity is not compromised by erroneous processes that continue to run after recovery. This procedure needs minimal support from the operating system: only a working permanent memory driver and code to manage user process state. The stored processes can then be restored selectively after the reboot (Ghosh, Sharman, Rao and Upadhyaya, 2007).

Conclusion

The reliability of computer systems is one of the key issues in modern society, because computers have become central to our lives and we depend on them for many of our activities. A reliable computer system is one that can recover from a fault or an error effectively, without losing user applications or data. This is why operating systems have been developed to be self-healing, meaning that they can automatically detect, diagnose and repair localized software and hardware problems. The recovery techniques discussed in this essay are code reloading, component micro-rebooting, automatic service restarts, watch-dog based recovery, transactional roll-back and process-level recovery.

Annotated Bibliography

Andrzejak, A., Geihs, K., Shehory, O. and Wilkes, J. (2009). Self-Healing and Self-Adaptive Systems. Dagstuhl Seminar 09201, May 10-15, 2009.
This paper, presented at the Dagstuhl Seminar, tackles various aspects of self-healing and self-adaptive systems. The issues discussed include fault detection and diagnosis, recovery and repair techniques, frameworks and architectures for self-adaptive systems, self-healing solutions in IT infrastructures, and fault management for application systems. Its discussion of recovery and repair techniques makes the paper an important resource for this project.

Brown, A. and Patterson, D. (2001). Embracing Failure: A Case for Recovery-Oriented Computing (ROC). High Performance Transaction Processing Symposium, Asilomar, CA, October 2001.

This paper is mainly about recovery-oriented computing. Brown and Patterson discuss various aspects of recovery from faults and errors in computing, including the role of operating systems in recovery, which is the focus of this research. The paper therefore provides very important information for the project. The authors are experts in recovery, so the information they provide is reliable for understanding recovery in computing.

David, F. and Campbell, R. (n.d.). Building a Self-Healing Operating System. Urbana, IL: University of Illinois.

This paper by David and Campbell discusses the rationale behind the development of self-healing operating systems. The authors go on to discuss recovery techniques that ensure user applications and data in temporary storage are not lost when an operating system crashes. The techniques discussed are code reloading, component micro-rebooting, automatic service restarts, watch-dog based recovery, transactional roll-back and process-level recovery, which makes the paper an important resource for this project.

David, F., Carlyle, J. and Campbell, R. (2007). Exploring Recovery from Operating System Lockups. In USENIX Annual Technical Conference, Santa Clara, CA.

In the recovery process, process restarts may be impossible when the process holds locks.
This mostly happens when the process terminates while holding one or more locks. This resource provides crucial information on how to deal with such lock-ups so that recovery is effective. The paper explains what lock-ups are and how to handle them under different recovery methods, which makes it an important source for this essay.

Demsky, B. and Rinard, M. (2002). Automatic Detection and Repair of Errors in Data Structures. Technical Report MIT-LCS-TR-875, Massachusetts Institute of Technology.

This paper is on the automatic detection and repair of errors in computer systems. The idea of automatic detection and repair implies that the operating system is involved in both detection and recovery. The paper describes how a self-healing operating system detects and repairs errors in data structures; these detection and recovery techniques are the main focus of this essay.

Demsky, B. and Rinard, M. (n.d.). Automatic Data Structure Repair for Self-Healing Systems. Retrieved on August 3, 2010 from http://people.csail.mit.edu/rinard/paper/sms03.pdf

In this paper, Demsky and Rinard describe a system they developed that accepts specifications of key data structure constraints, detects and repairs violations of these constraints, and so allows the program to recover from errors and continue working effectively. The paper presents the procedures the authors use for detection and recovery, which makes it significant for this research.

Ghosh, D., Sharman, R., Rao, R. and Upadhyaya, S. (2007). Self-healing systems: survey and synthesis. Decision Support Systems, 42(4).

Ghosh, Sharman, Rao and Upadhyaya give a detailed analysis of self-healing systems. Theirs is an analysis of contemporary software-based systems and applications in a world where such systems have gained significant importance.
They discuss the ability of self-healing systems to manage conflicting resources and serve different user needs. They go on to discuss why and how system faults should be discovered and rectified and errors recovered from. They argue that these systems attempt to heal themselves by recovering from faults and regaining normal performance levels.

Haugk, G., Lax, F., Royer, R. and Williams, J. (1985). The 5ESS(TM) switching system: Maintenance capabilities. AT&T Technical Journal, 64(6, part 2).

This paper discusses the maintenance capabilities of the 5ESS switching system. It is a useful resource for an essay that discusses the self-healing of operating systems from a historical point of view. Computer systems have been affected by software bugs and hardware faults from the beginning, and this article discusses how the bugs and faults that cause errors have been handled since the early days of computer hardware and software.

Liedtke, J. (1995). On micro-kernel construction. In SOSP '95: Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles. New York: ACM Press.

This paper appears in the proceedings of the 1995 ACM Symposium on Operating Systems Principles. The volume contains a discussion of component micro-rebooting, which has proven effective for application programs, and the author argues that applying this technique to operating systems is also practicable. For the Nooks project, this technique was used in the form of extension restarts to recover the Linux kernel. The work therefore contains important information on component micro-rebooting as a recovery technique for self-healing operating systems.

Lohr, S. (2001). Go To: The Story of the Math Majors, Bridge Players, Engineers, Chess Wizards, Maverick Scientists and Iconoclasts, the Programmers Who Created the Software Revolution. New York: Basic Books.

This book provides important information on the evolution and workings of software, and reliable information on software management.
Software bugs are among the problems that cause errors in processes, and the book offers a clear understanding of these bugs and of ways of dealing with them.

Marathe, V. et al. (2006). Lowering the Overhead of Software Transactional Memory. Technical Report TR 893, Computer Science Department, University of Rochester, March 2006.

According to this paper, supporting a transactional model for components incurs overheads in space and time. The space overhead results from storing backup copies of state before transactions; the time overhead results from performing memory copies and managing memory during transaction setup and commit. The authors then discuss ways of reducing these overheads.

Parhami, B. (2005). Computer Architecture: From Microprocessors to Supercomputers. New York: Oxford University Press.

As technology has advanced, so has the need for more reliable systems. This book has a section that discusses computer operations, and it is this section that holds significant information for the essay. Faults in computer hardware matter as much as software faults in error detection and recovery, which makes the book important for this research: the research would not be complete without an understanding of computer hardware.

Shapiro, M. (2004). Self-Healing in Modern Operating Systems. Retrieved on August 3, 2010 from http://queue.acm.org/detail.cfm?id=1039537

Shapiro introduces the topic of self-healing operating systems by first discussing the role played by the operating system in a computer system. It is not possible to understand self-healing operating systems without understanding operating systems in general, and this is the strength of the article for this research. He goes on to discuss the self-healing system model, which leads to self-healing operating systems, the focus of this research.

Tanenbaum, A. S., Herder, J. N. and Bos, H. (2006). Can We Make Operating Systems Reliable and Secure? Computer, 39(5), 44-51.

The reliability of computer systems is one of the key issues in modern society. This article explains why computer systems need to be made reliable and dependable. The authors go on to discuss ways in which operating systems can be made more reliable in a computing environment prone to hardware faults and software bugs. The article is an important resource for the essay since it provides solutions to the problem.

Voas, J. M. and McGraw, G. (1998). Software Fault Injection. New York: Wiley.

Software Fault Injection is a book that recognises that software bugs can make computer systems unreliable. It discusses how bugs and errors in computer systems can be identified and what should be done about them. The solutions suggested by Voas and McGraw relate to operating systems, leading to what are referred to as self-healing operating systems. The section on how a system can solve problems with its software offers important information for this research.
