Problem Management Process
Objective
The primary objectives of the Problem Management process are to prevent problems and resulting incidents from happening, to eliminate recurring events, and to minimize the impact of incidents that cannot be avoided.
Scope
The scope of Problem Management includes root cause analysis to determine the underlying cause of one or more incidents and to identify a permanent fix or workaround for the resulting problems. It is also responsible for ensuring that the resolution is implemented through the appropriate control procedures, such as the Change Management process.
The scope of SoftwareOne Problem Management consists of both reactive and proactive approaches:
Reactive Problem Management | Originates from an Incident that has already occurred. |
Proactive Problem Management | Driven from a continual improvement perspective, before an incident occurs. The main techniques of proactive problem management include trend analysis, risk assessment, and affinity mapping. |
Problem Management Policy Statements
SoftwareOne support teams shall use the currently approved documented Problem Management process and standardized methodology.
All problems must be recorded in ServiceNow.
All problems will be reported, recorded, managed, and appropriately communicated.
Any activity performed for a problem must be documented in the work notes section of the problem ticket.
If a problem is related to an incident, the incident record must be attached to the problem ticket.
The Problem Manager will ensure the Problem Management process is followed.
The Problem Manager of the specific service/technology will assign appropriate resources to conduct Problem Management activities, such as Root Cause Analysis (RCA), creation of workarounds, and proactive trend analysis.
If a Change Request is created to resolve a Problem, the Change ticket shall be attached to the Problem ticket.
Any Problem that requires a Change Request to aid in its Resolution can only be closed after successfully implementing the Change Request and validating that no further incidents occur due to the identified error.
As soon as the diagnosis is complete, and particularly where a workaround has been found (even though it may not yet be a permanent resolution), a Known Error Record must be raised and placed in the Known Error Database so that if further incidents arise, they can be identified and resolved quickly.
Criteria for raising a Problem ticket
Major Incident
Detection of recurring incidents by the support team.
A support agent has resolved an incident without determining a definitive cause and suspects it is likely to recur. In that case, a Problem ticket shall be raised to perform root cause analysis.
A supplier or contractor is notified of a problem, and an RCA is required.
Analysis of events/incidents as part of proactive Problem Management, resulting in the need to raise a Problem ticket.
Process

RACI Matrix: Problem Management
Key:
R=Responsible
A=Accountable
C=Consulted
I=Informed
Stages | Customer | L1/L2/L3 Support Agent (As Applicable) | Problem Manager | Problem Review Board |
Create a problem ticket | I | R | A | C |
Conduct root cause analysis | C | R | A | C |
Identify & communicate workaround | I | A/R | C | C |
Create a known error record | I | A/R | C | C |
Identify solution (permanent fix) | I | A/R | C | C |
Prepare RCA report | I | A/R | C | C |
Conduct RCA review meeting | I | R | A | C |
Review RCA report | I | C | C | A/R |
Share RCA report | I | A/R | C | I |
Implement, Verify & Test solution | I | A/R | C | C |
Customer/Stakeholders communication during the problem lifecycle | I | A/R | C | I |
Retire known error record | I | A/R | C | I |
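The RACI matrix above can also be held as data so that assignments are easy to look up and sanity-check. The following is a minimal sketch, not part of the documented process: the names `ROLES`, `RACI`, `accountable_roles`, and `stages_without_single_accountable` are hypothetical, and the "exactly one Accountable per stage" check is a common RACI convention assumed here rather than a rule stated in this document.

```python
# Roles in the column order used by the RACI matrix above.
ROLES = ("Customer", "Support Agent", "Problem Manager", "Problem Review Board")

# Stage -> RACI codes per role, transcribed from the matrix above.
RACI = {
    "Create a problem ticket": ("I", "R", "A", "C"),
    "Conduct root cause analysis": ("C", "R", "A", "C"),
    "Identify & communicate workaround": ("I", "A/R", "C", "C"),
    "Create a known error record": ("I", "A/R", "C", "C"),
    "Identify solution (permanent fix)": ("I", "A/R", "C", "C"),
    "Prepare RCA report": ("I", "A/R", "C", "C"),
    "Conduct RCA review meeting": ("I", "R", "A", "C"),
    "Review RCA report": ("I", "C", "C", "A/R"),
    "Share RCA report": ("I", "A/R", "C", "I"),
    "Implement, Verify & Test solution": ("I", "A/R", "C", "C"),
    "Customer/Stakeholders communication during the problem lifecycle": ("I", "A/R", "C", "I"),
    "Retire known error record": ("I", "A/R", "C", "I"),
}


def accountable_roles(stage: str) -> list[str]:
    """Return the roles marked Accountable (A or A/R) for a stage."""
    return [
        role
        for role, code in zip(ROLES, RACI[stage])
        if "A" in code.split("/")
    ]


def stages_without_single_accountable() -> list[str]:
    """Assumed convention: each stage should have exactly one Accountable role."""
    return [stage for stage in RACI if len(accountable_roles(stage)) != 1]
```

For example, `accountable_roles("Review RCA report")` returns the Problem Review Board, and `stages_without_single_accountable()` returns an empty list for the matrix as transcribed.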
Root Cause Analysis
What is Root Cause Analysis?
Root Cause Analysis (RCA) is a systematic process for identifying the root cause of a problem or event. RCA aims not only to establish how the issue arose but also to find a solution that prevents it from happening again.
Purpose of Root Cause Analysis
The primary purpose of Root Cause Analysis is to analyze problems or events to identify:
What happened
How it happened
Why it happened
Actions for preventing recurrence
Root Cause Analysis Model
Define the Problem.
Gather information, data, and evidence.
Identify all issues and events that contributed to the problem.
Determine root causes.
Identify the solution for eliminating or mitigating the recurrence of problems or events.
Implement the identified solution.
Root Cause Analysis techniques
Root Cause Analysis should identify all the factors that contributed to a problem or event, which are often multiple. It can be performed using any suitable analysis method.
Many methodologies, approaches, and techniques exist for conducting root cause analysis. Commonly used methods include:
5-Whys Analysis: A simple problem-solving technique that helps users quickly get to the root of a problem. It involves looking at a problem and asking "why" it occurred and "what caused the problem." The answer to the first "why" prompts a second "why" and so on, providing the basis for the "5-why" analysis.
Fish-Bone Diagram or Ishikawa Diagram: Derived from the quality management process, it's an analysis tool that provides a systematic way of looking at effects and the causes that create or contribute to those effects. The diagram design looks much like a fish skeleton, hence the designation "fishbone" diagram.
Pareto Analysis: A decision-making technique based on the 80/20 rule. It statistically separates the limited number of input factors that have the most significant impact on an outcome, whether desirable or undesirable.
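As an illustration of the Pareto technique described above, the following minimal sketch ranks incident categories by frequency and returns the "vital few" that together reach a cumulative share of incidents. The function name `pareto_vital_few`, the 0.8 default threshold, and the sample categories are assumptions chosen for illustration, not part of the documented process.

```python
from collections import Counter


def pareto_vital_few(categories, threshold=0.8):
    """Return (category, count, cumulative_share) rows, most frequent first,
    stopping once the cumulative share reaches the threshold."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return []

    vital = []
    cumulative = 0
    for category, count in counts.most_common():
        cumulative += count
        share = cumulative / total
        vital.append((category, count, share))
        if share >= threshold:
            break
    return vital


# Hypothetical incident categories, e.g. exported from ticket records.
incidents = ["disk"] * 50 + ["network"] * 30 + ["auth"] * 12 + ["other"] * 8

for category, count, share in pareto_vital_few(incidents):
    print(f"{category}: {count} incidents, cumulative share {share:.0%}")
```

With this sample data, two of the four categories account for 80% of the incidents, so they would be the first candidates for root cause analysis.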
Major Problem Review
Upon closure of the Problem ticket, the Problem Manager may conduct a review while memories are still fresh to learn any lessons for the future.
Note: A review may be conducted for a Problem triggered by a Major Incident. It is up to the discretion of the Problem Manager to determine when a Problem Review will be performed.
A Problem Review is conducted to determine:
Things that were done correctly
Things that were done wrong
What could be done better in the future
How to prevent a recurrence
Such reviews can be used as part of training and awareness activities for support staff, and any lessons learned should be documented in appropriate procedures, work instructions, diagnostic scripts, or Known Error Records. The Problem Manager shall facilitate the session and document any agreed actions.
Priority Matrix
Priority Calculator
The priorities are derived from impact and urgency.
Urgency (rows) / Impact (columns) | Critical | High | Medium | Low |
Critical | P1 | P2 | P2 | P3 |
High | P2 | P2 | P3 | P3 |
Medium | P2 | P3 | P3 | P4 |
Low | P3 | P3 | P4 | P4 |
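The priority matrix above amounts to a two-key lookup. A minimal sketch follows, assuming the impact and urgency values are passed as the exact labels used in the matrix; the names `LEVELS`, `PRIORITY`, and `calculate_priority` are hypothetical and do not refer to any ServiceNow configuration.

```python
LEVELS = ("Critical", "High", "Medium", "Low")

# PRIORITY[urgency][impact], transcribed from the matrix above.
PRIORITY = {
    "Critical": {"Critical": "P1", "High": "P2", "Medium": "P2", "Low": "P3"},
    "High":     {"Critical": "P2", "High": "P2", "Medium": "P3", "Low": "P3"},
    "Medium":   {"Critical": "P2", "High": "P3", "Medium": "P3", "Low": "P4"},
    "Low":      {"Critical": "P3", "High": "P3", "Medium": "P4", "Low": "P4"},
}


def calculate_priority(impact: str, urgency: str) -> str:
    """Look up the priority for an impact/urgency pair."""
    if impact not in LEVELS:
        raise ValueError(f"Unknown impact: {impact!r}")
    if urgency not in LEVELS:
        raise ValueError(f"Unknown urgency: {urgency!r}")
    return PRIORITY[urgency][impact]


print(calculate_priority(impact="High", urgency="Medium"))
```

Keeping the matrix as a literal table rather than a formula makes it easy to compare against this document if the priority assignments are ever revised.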
Impact Definitions
Impact | Description | Detailed Description |
Critical | Business critical system is down | An application or infrastructure service that is essential to the operations of the business (e.g. supports key business functions such as email, finance or customer service) is unavailable. |
High | Production system down | An application or infrastructure service that provides a service to the business but is not essential to the business's daily operations or core functions (e.g. training systems, project management tools and internal content management systems) is unavailable. |
Medium | Production system impaired | A business critical or production system is available but not functioning optimally. It may be experiencing downtime or other problems that limit its ability to provide service to its users. |
Low | System impaired | A non-production system (e.g. a test environment, a training environment or a research application) is available but not functioning optimally. It may be experiencing downtime or other problems that limit its ability to provide service to its users. |
Urgency Definitions
Urgency | Description | Detailed Description |
Critical | No viable workaround and affected work is time sensitive | An issue is affecting work that is time sensitive and critical to the operations of the business (e.g. processing urgent financial transactions, processing customer orders or dealing with medical emergencies) and there is no viable workaround. |
High | Workaround available and affected work is time sensitive | An issue is affecting work that is time sensitive and critical to the operations of the business (e.g. processing urgent financial transactions, processing customer orders or dealing with medical emergencies) but there is a viable workaround (e.g. a backup system or alternate manual process). |
Medium | No viable workaround and affected work is not time sensitive | An issue is affecting work that is not time sensitive (e.g. data entry, research, test and development) and there is no viable workaround. This type of work does not have an immediate deadline and can be completed within a flexible timeframe. |
Low | Workaround available and affected work is not time sensitive | An issue is affecting work that is not time sensitive (e.g. data entry, research, test and development) and there is a viable workaround (e.g. a backup system or alternate manual process). This type of work does not have an immediate deadline and can be completed within a flexible timeframe. |
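The four urgency levels defined above follow from two yes/no questions: whether the affected work is time sensitive, and whether a viable workaround exists. A minimal sketch of that decision, assuming both answers are already known for the ticket (the function name `derive_urgency` and its parameters are hypothetical):

```python
def derive_urgency(time_sensitive: bool, workaround_available: bool) -> str:
    """Map the two urgency criteria to the urgency level defined above."""
    if time_sensitive:
        # Time-sensitive work: a workaround lowers Critical to High.
        return "High" if workaround_available else "Critical"
    # Work that is not time sensitive: a workaround lowers Medium to Low.
    return "Low" if workaround_available else "Medium"


print(derive_urgency(time_sensitive=True, workaround_available=False))
```

In practice the judgment of what counts as "time sensitive" or a "viable" workaround still rests with the support agent; the sketch only encodes the mapping once those two answers are given.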