- Easy does it.
- If it hurts, don't do it.
- Always keep a backup.
Story one: a customer's tech support person calls our "help line" and says her system has crashed. The first question in cases like this is always "What changed? What did you change?" The answer was that she had increased a certain kind of system resource used by TPX (VTAM virtual terminals) by several orders of magnitude (in this case, by several thousand), but had failed to make the corresponding parameter adjustments. The manual does not tell you exactly what those adjustments are: the system is so complex and flexible, and can be used in such wildly different environments, that it is difficult to predict what adjustments to make. So she was told "Easy does it!" Make changes slowly. Observe the results, especially in storage usage, and adjust parameters (in this case slot pool percentages) accordingly. When the system comes up and is usable, add more resources. Iterate until you have reached your goal.
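The "easy does it" loop can be sketched in a few lines of Python. This is only an illustration of the idea, not anything TPX provides: `ramp_up` and the three callbacks (`apply_change`, `is_healthy`, `tune`) are hypothetical names standing in for whatever your own system uses to set a resource level, check that it is still usable, and adjust a parameter such as a pool percentage.

```python
from typing import Callable


def ramp_up(
    current: int,
    goal: int,
    step: int,
    apply_change: Callable[[int], None],  # hypothetical: set the resource level
    is_healthy: Callable[[], bool],       # hypothetical: is the system usable?
    tune: Callable[[], None],             # hypothetical: adjust one parameter
    max_tunes: int = 3,
) -> int:
    """Raise a resource toward `goal` in small steps.

    Returns the last level at which the system was healthy.
    """
    if step <= 0:
        raise ValueError("step must be positive")

    while current < goal:
        target = min(current + step, goal)
        apply_change(target)

        # Observe, then adjust parameters a limited number of times.
        attempts = 0
        while not is_healthy() and attempts < max_tunes:
            tune()
            attempts += 1

        if not is_healthy():
            # Tuning did not help: go back to the last known-good level.
            apply_change(current)
            break

        current = target

    return current
```

The point of the small `step` is that when something does break, you know which change broke it and you have a recent known-good level to return to.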
Story two has the same source and scenario. The system was slow, and frustrated users at the customer site would repeatedly hit a certain key (the ATTN key) trying to disconnect. Instead, the system crashed. Yes, we had a bug. When we got the documentation of the bug (IBM mainframes have this wonderful thing called "a dump" which lets you perform a post-mortem on the system), we fixed the bug. But until then we told customers to tell their users: "If it hurts, don't do it!"
As for Story three, this is obvious. Always keep at least one backup. If some data is very important, keep more than one and keep them in different places.
For a hardware approach to redundancy when the data is stored in a physically secure location, see RAID.
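For story three, here is a minimal sketch of "more than one copy, in different places" using only the Python standard library. The function names `sha256_of` and `back_up` are made up for this example, and the destination directories are whatever you choose (ideally on different disks or machines). Each copy is checked against the original's checksum, since a backup you have not verified is only a hope.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into every destination directory and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy = dest_dir / source.name
        shutil.copy2(source, copy)  # copies contents and file metadata
        if sha256_of(copy) != expected:
            raise IOError(f"backup at {copy} does not match {source}")
        copies.append(copy)
    return copies
```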