Hardware Maintenance Best Practices

No computer system can run forever without incident. Yet the mean time between failures (MTBF) of a mainframe system can often be measured in decades.

The Five Nines

The mainframe embodies the concept of non-disruptive hardware and software maintenance and installation: these activities can be performed while other systems continue to process work. This near-constant availability of 99.999 percent is commonly called “the five nines.” It means continuous operation with only about 5 minutes of unplanned downtime over the course of a year.
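
The "five nines" figure follows from simple arithmetic: the allowed downtime is the unavailable fraction (1 minus the availability) multiplied by the minutes in a year. The short sketch below assumes a 365-day year; the function name is illustrative only. For 99.999 percent it works out to about 5.26 minutes per year.

```python
# Minutes in a non-leap year: 365 days * 24 hours * 60 minutes (assumption).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability: float) -> float:
    """Return the maximum yearly downtime, in minutes, for an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    # The unavailable fraction of the year, converted to minutes.
    return (1.0 - availability) * MINUTES_PER_YEAR


# Compare three, four and five nines.
for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    print(f"{nines} nines ({availability:.3%}): "
          f"{allowed_downtime_minutes(availability):.2f} min/year")
```

Each additional nine cuts the yearly downtime budget by a factor of ten.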

This is all very good in theory. But unforeseen events can and will occur, so it's best to be prepared. This guide should help.

Proactive Prevention

Develop and implement a proactive preventive maintenance strategy in which maintenance is applied on a regular schedule rather than only after a problem appears.

System administrators should architect and plan for continuous availability. They should know and understand the mechanics of routine firmware and software preventive maintenance. They should also exploit tools and features like concurrent firmware maintenance (CFM), Service Update Management Assistant (SUMA), Service Agent (SA) and subscriptions to maintenance updates.

IBM's website features Best Practices documents: whitepapers designed as quick references to the latest strategies for architecting for availability.

Corrective Maintenance

Corrective maintenance becomes necessary after you encounter a problem. Your primary goal should be to return your system to the same functional state it was in before the issue occurred. This type of maintenance addresses a specific and immediate issue and should be applied on an as-needed basis.

Often, System z itself can determine why a failure occurred. This allows hardware and software elements to be replaced while affecting as little of your operational system as possible.

New Uses for Old...

The life expectancy of servers can be increased: virtualization and server hardware improvements may extend server life. Clustering technologies can compensate for the increased risk of failure as server hardware ages, giving administrators additional options.

Older hardware can also be repurposed to handle non-critical workloads or environments where a single server fault won’t wreak havoc on business operations, even if that hardware runs somewhat slower than state-of-the-art machines.

Putting old data center hardware to use may be a cost-effective option when consolidating to take advantage of new technologies such as cloud computing and virtualization.

Maintaining server hardware and maximizing its potential for as long as possible could be your best bet if finances are tight and new hardware isn't in the budget.

Refreshing

In some cases, a server hardware upgrade can solve performance issues at a fraction of the cost of a new server. Adding CPUs or memory can significantly increase a server’s performance. This may extend the usual three-to-four-year hardware refresh cycle. IT administrators still have to make hard decisions about when upgrades are needed to ensure adequate performance and efficiency.

Stay Up To Date

Not all systems are upgradable, and upgrades don’t always fix poorly performing hardware. New technology often provides fresh opportunities, allowing administrators to deploy more virtual machines or take on more demanding workloads.

Virtualization

Virtualization on a large scale can demand powerful servers, which often come with equally large up-front costs. Server hardware must be powerful enough to handle CPU- and memory-hungry virtual machines (VMs). Administrators should also avoid over-consolidation, in which too many VMs on one physical server lead to deteriorating performance and system instability.
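
One way to watch for over-consolidation is to compare what the VMs on a host have been promised against what the host physically has. The sketch below is illustrative only: the Host and VM types, the function name and both limits (a 4:1 vCPU-to-core ratio and no memory overcommit) are hypothetical assumptions, not vendor guidance. Real limits depend on the hypervisor, the workload mix and measured utilization.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Host:
    physical_cores: int
    memory_gib: float


@dataclass
class VM:
    name: str
    vcpus: int
    memory_gib: float


# Hypothetical limits for illustration only.
MAX_VCPU_PER_CORE = 4.0   # assumed acceptable vCPU overcommit ratio
MAX_MEMORY_COMMIT = 1.0   # assumed: never promise more RAM than the host has


def consolidation_report(host: Host, vms: list[VM]) -> dict[str, float | bool]:
    """Summarize CPU and memory commitment for a set of VMs on one host."""
    total_vcpus = sum(vm.vcpus for vm in vms)
    total_mem = sum(vm.memory_gib for vm in vms)
    vcpu_ratio = total_vcpus / host.physical_cores
    mem_ratio = total_mem / host.memory_gib
    return {
        "vcpu_per_core": vcpu_ratio,
        "memory_commit": mem_ratio,
        # Flag the host if either resource exceeds its (assumed) limit.
        "over_consolidated": vcpu_ratio > MAX_VCPU_PER_CORE
        or mem_ratio > MAX_MEMORY_COMMIT,
    }


# Usage: twelve 4-vCPU, 16 GiB VMs on a 16-core, 256 GiB host.
host = Host(physical_cores=16, memory_gib=256)
vms = [VM(f"vm{i}", vcpus=4, memory_gib=16) for i in range(12)]
print(consolidation_report(host, vms))
```

Adding more VMs of the same size eventually pushes one of the ratios past its limit, which is the point where performance and stability start to suffer.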

Purchasing a large server intended for virtualization could turn out to be more expensive than buying several physical servers. When choosing a server, focusing on CPU and RAM performance and expandability will help prevent performance bottlenecks.

Parallel Sysplex

A sysplex, or systems complex, is a computer system running on one or more physical partitions, each of which can run a different release of the z/OS operating system. It generally provides resource sharing (tape, consoles, catalogs, etc.) between the communicating systems. The sysplex increases the number of functioning processing units and z/OS operating systems, which in turn increases the amount of work that can be processed.

Parallel Sysplex technology allows multiple mainframes to act as one. It's a clustering technology that can provide near-continuous availability. A properly configured Parallel Sysplex cluster will remain available to its users and applications with minimal downtime.
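
Why clustering helps availability can be seen with a textbook idealization: if a cluster stays up while at least one member is up, it is down only when every member is down at once, so its availability is 1 - (1 - a)^n for n members of availability a. The sketch below assumes independent failures and that any single surviving member can carry the whole workload. It is not a model of how a real Parallel Sysplex behaves, since shared infrastructure and correlated failures reduce the gain in practice.

```python
def cluster_availability(member_availability: float, members: int) -> float:
    """Availability of a cluster that stays up while at least one member is up.

    Idealized model (assumption): members fail independently, and any single
    survivor can carry the entire workload.
    """
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    if not 0.0 <= member_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    # Probability that every member is down at the same time.
    all_down = (1.0 - member_availability) ** members
    return 1.0 - all_down


# Usage: members that are each 99.9 percent available.
for n in (1, 2, 3):
    print(n, f"{cluster_availability(0.999, n):.9f}")
```

Under these assumptions, two 99.9 percent members already reach 99.9999 percent, which shows why clustering is the usual route to near-continuous availability.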

Know When to Quit

There comes a point where sticking with old server hardware can significantly reduce efficiency and pose unnecessary risks to business operations. The key is to identify when increased performance, energy-efficiency requirements and reduced risk of hardware failure will justify a new purchase.

First, determine whether new hardware is actually needed. Will a hardware upgrade solve your performance problems? Or is purchasing a used server an option? Creative financing options can even make leasing a server more cost-efficient in the long run.
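
A rough way to weigh these choices is to compare the expected total cost of each option over the same planning horizon: up-front cost, plus yearly running costs (energy, support or lease payments), plus the expected cost of outages. The sketch below is a simplified model under that assumption. The Option type, the function name and every figure in it are hypothetical placeholders to replace with your own estimates.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    upfront: float              # purchase price or lease setup cost
    yearly_cost: float          # energy, support contract or lease payments
    failure_probability: float  # estimated chance of a serious outage per year
    outage_cost: float          # estimated business cost of one such outage


def expected_total_cost(option: Option, years: int) -> float:
    """Up-front cost plus running costs plus expected outage losses over `years`."""
    yearly_risk = option.failure_probability * option.outage_cost
    return option.upfront + years * (option.yearly_cost + yearly_risk)


# Hypothetical figures for illustration only.
options = [
    Option("keep old server", 0, 9000, 0.15, 50000),
    Option("buy new server", 30000, 4000, 0.02, 50000),
    Option("lease new server", 2000, 10000, 0.02, 50000),
]

# Rank the options from cheapest to most expensive over a four-year horizon.
for opt in sorted(options, key=lambda o: expected_total_cost(o, years=4)):
    print(f"{opt.name}: {expected_total_cost(opt, years=4):,.0f}")
```

The model deliberately ignores factors such as financing terms, resale value and performance gains; it only makes the trade-off between up-front cost, running cost and failure risk explicit.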

Money Talk

When changing IT vendors or negotiating a new contract with your current vendor, carefully review the contract terms to avoid unexpected pitfalls. There should be a clear exit strategy and a detailed contract termination process.

Other Considerations

Your firmware-maintenance strategy will be largely dictated by your business priorities. You should periodically review the methodology and criteria being used to define your firmware strategy and modify them as necessary.

In environments that have only minimal opportunity to perform maintenance, a once-per-year strategy that updates both firmware and software after thorough testing may be the best option.