
Problem Management Process

Objective

The primary objectives of the Problem Management process are to prevent problems and resulting incidents from happening, to eliminate recurring events, and to minimize the impact of incidents that cannot be avoided.

Scope

The scope of Problem Management includes root cause analysis of incidents to determine the underlying cause of one or more Incidents and to identify the permanent fix or workaround to those problems. It is also responsible for ensuring that the Resolution is implemented through the appropriate control procedures, such as the Change Management process.

The scope of SoftwareOne Problem Management consists of both reactive and proactive approaches:

Reactive Problem Management

Reactive problem management originates from an Incident that has already occurred.

Proactive Problem Management

Proactive problem management is driven from a continual improvement perspective. Its main techniques include trend analysis, risk assessment, and affinity mapping.

Problem Management Policy Statements

  1. SoftwareOne support teams shall use the currently approved documented Problem Management process and standardized methodology.

  2. All problems must be recorded in ServiceNow.

  3. All problems will be reported, recorded, managed, and appropriately communicated.

  4. Any activity performed for a problem must be documented in the work notes section of the problem ticket.

  5. If a problem is related to an incident, the incident record must be attached to the problem ticket.

  6. The Problem Manager will ensure the Problem Management process is followed.

  7. The Problem Manager of the specific service/technology will assign appropriate resources to conduct Problem Management activities, such as Root Cause Analysis (RCA), creation of workarounds, and proactive trend analysis.

  8. If a Change Request is created to resolve a Problem, the Change ticket shall be attached to the Problem ticket.

  9. Any Problem that requires a Change Request to aid in its Resolution can only be closed after successfully implementing the Change Request and validating that no further incidents occur due to the identified error.

  10. As soon as the diagnosis is complete, and particularly where a workaround has been found (even though it may not yet be a permanent resolution), a Known Error Record must be raised and placed in the Known Error Database so that if further incidents arise, they can be identified and resolved quickly.
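As an illustration of policy statement 9, the closure rule can be sketched as a simple check. This is a minimal sketch, not a ServiceNow implementation: the `ProblemRecord` class and all of its field names are hypothetical and chosen only for readability.

```python
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    # All field names are hypothetical; they are not ServiceNow column names.
    change_request_required: bool
    change_implemented: bool = False
    no_further_incidents_confirmed: bool = False


def may_close(problem: ProblemRecord) -> bool:
    """Apply policy statement 9 to a problem record.

    A Problem that needs a Change Request stays open until the change has
    been implemented and it has been validated that no further incidents occur.
    """
    if not problem.change_request_required:
        # Statement 9 only constrains Problems that need a Change Request.
        return True
    return problem.change_implemented and problem.no_further_incidents_confirmed


# A change was implemented, but recurrence has not been ruled out yet.
print(may_close(ProblemRecord(True, change_implemented=True)))  # False
```

The check deliberately says nothing about Problems that do not need a Change Request, because statement 9 does not cover them.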

Criteria for raising a Problem ticket

  1. Major Incident

  2. Detection of recurring incidents by the support team.

  3. A support agent has resolved an incident but has not determined a definitive cause and suspects it is likely to recur. A Problem ticket shall be raised to perform root cause analysis.

  4. A supplier or contractor is notified of a problem, and an RCA is required.

  5. Analysis of events/incidents as part of proactive Problem Management, resulting in the need to raise a Problem ticket.

Process

RACI Matrix: Problem Management

Key:

R=Responsible
A=Accountable
C=Consulted
I=Informed

| Stages | Customer | L1/L2/L3 Support Agent (As Applicable) | Problem Manager | Problem Review Board |
|---|---|---|---|---|
| Create a problem ticket | I | R | A | C |
| Conduct root cause analysis | C | R | A | C |
| Identify & communicate workaround | I | A/R | C | C |
| Create a known error record | I | A/R | C | C |
| Identify solution (permanent fix) | I | A/R | C | C |
| Prepare RCA report | I | A/R | C | C |
| Conduct RCA review meeting | I | R | A | C |
| Review RCA report | I | C | C | A/R |
| Share RCA report | I | A/R | C | I |
| Implement, Verify & Test solution | I | A/R | C | C |
| Customer/Stakeholders communication during the problem lifecycle | I | A/R | C | I |
| Retire known error record | I | A/R | C | I |

Root Cause Analysis

What is Root Cause Analysis?

Root Cause Analysis (RCA) is a systematic process for identifying the root cause of a problem or event. RCA aims not only to determine where the issue originated but also to find a solution that prevents it from happening again.

Purpose of Root Cause Analysis

The primary purpose of Root Cause Analysis is to analyze problems or events to identify:

  • What happened

  • How it happened

  • Why it happened

  • Actions for preventing reoccurrence

Root Cause Analysis Model

  • Define the Problem.

  • Gather information, data, and evidence.

  • Identify all issues and events that contributed to the problem.

  • Determine root causes.

  • Identify the Solution for eliminating or mitigating the reoccurrence of problems or events.

  • Implement the identified solution.

Root Cause Analysis techniques

Root Cause Analysis seeks to identify all contributing factors to a problem or event, of which there may be several. It can be performed using any of the analysis methods listed below.

Many methodologies, approaches, and techniques exist for conducting root cause analysis. Some methods used to conduct RCA include:

  • The "5-Whys Analysis" is a simple problem-solving technique that helps users quickly get to the root of the problem. This strategy involves looking at a problem and asking "why" and "what caused the problem." The answer to the first "why" prompts a second "why" and so on, providing the basis for the "5-why" analysis.

  • Fish-Bone Diagram or Ishikawa Diagram: Derived from the quality management process, it's an analysis tool that provides a systematic way of looking at effects and the causes that create or contribute to those effects. The diagram design looks much like a fish skeleton, hence the designation "fishbone" diagram.

  • Pareto Analysis: A decision-making technique based on the 80/20 rule. It statistically separates a limited number of input factors as having the most significant impact on an outcome, either desirable or undesirable.
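To make the Pareto technique concrete, the sketch below ranks incident causes by frequency and keeps the smallest set that accounts for roughly 80% of incidents. It is a minimal illustration only: the function name `pareto_vital_few`, the 0.8 threshold default, and the sample cause labels are assumptions, not part of the documented process.

```python
from collections import Counter


def pareto_vital_few(causes, threshold=0.8):
    """Return the most frequent causes that together reach the threshold share.

    causes: iterable of cause labels, one entry per incident.
    threshold: cumulative share of incidents to cover (0.8 for the 80/20 rule).
    """
    counts = Counter(causes)
    total = sum(counts.values())
    vital_few = []
    cumulative = 0
    # most_common() yields (cause, count) pairs, most frequent first.
    for cause, count in counts.most_common():
        if cumulative / total >= threshold:
            break
        vital_few.append((cause, count))
        cumulative += count
    return vital_few


# Hypothetical incident causes, for illustration only.
sample = (
    ["disk full"] * 5
    + ["expired certificate"] * 3
    + ["dns failure"]
    + ["config drift"]
)
print(pareto_vital_few(sample))
# [('disk full', 5), ('expired certificate', 3)]
```

In this sample, two of the four causes account for 8 of 10 incidents, so they would be the first candidates for a Problem ticket.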

Major Problem Review

Upon closure of the Problem ticket, the Problem Manager may conduct a review while memories are still fresh to learn any lessons for the future.

Note: A review may be conducted for a Problem triggered by a Major Incident. It is up to the discretion of the Problem Manager to determine when a Problem Review will be performed.

A Problem Review is conducted to determine:

  • What was done correctly

  • What was done wrong

  • What could be done better in the future

  • How to prevent a recurrence

Such reviews can be used as part of training and awareness activities for support staff, and any lessons learned should be documented in appropriate procedures, work instructions, diagnostic scripts, or Known Error Records. The Problem Manager shall facilitate the session and document any agreed actions.

Priority Matrix

Priority Calculator

The priorities are derived from impact and urgency.

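Priority (rows: Urgency, columns: Impact)

| Urgency \ Impact | Critical | High | Medium | Low |
|---|---|---|---|---|
| Critical | P1 | P2 | P2 | P3 |
| High | P2 | P2 | P3 | P3 |
| Medium | P2 | P3 | P3 | P4 |
| Low | P3 | P3 | P4 | P4 |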
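The priority matrix above can be expressed as a small lookup. The following is a sketch rather than the actual ServiceNow configuration: the names `LEVELS`, `PRIORITY_MATRIX`, and `calculate_priority` are illustrative assumptions. The values are transcribed from the matrix, which is symmetric, so the result is the same whichever axis is treated as impact or urgency.

```python
# Impact and urgency share the same four levels.
LEVELS = ("Critical", "High", "Medium", "Low")

# Keyed by urgency; each tuple is ordered by impact following LEVELS.
PRIORITY_MATRIX = {
    "Critical": ("P1", "P2", "P2", "P3"),
    "High":     ("P2", "P2", "P3", "P3"),
    "Medium":   ("P2", "P3", "P3", "P4"),
    "Low":      ("P3", "P3", "P4", "P4"),
}


def calculate_priority(impact: str, urgency: str) -> str:
    """Return the priority (P1 to P4) for an impact/urgency pair."""
    if impact not in LEVELS or urgency not in LEVELS:
        raise ValueError(f"impact and urgency must each be one of {LEVELS}")
    return PRIORITY_MATRIX[urgency][LEVELS.index(impact)]


print(calculate_priority(impact="Critical", urgency="High"))  # P2
```

Keeping the matrix as data rather than nested conditionals makes it easy to compare against the table line by line.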

Impact Definitions

| Impact | Description | Detailed Description |
|---|---|---|
| Critical | Business critical system is down | An application or infrastructure service that is essential to the operations of the business (e.g. supports key business functions such as email, finance or customer service) is unavailable. |
| High | Production system down | An application or infrastructure service that provides a service to the business but is not essential to the business's daily operations or core functions (e.g. training systems, project management tools and internal content management systems) is unavailable. |
| Medium | Production system impaired | A business critical or production system is available but not functioning optimally. It may be experiencing downtime or other problems that limit its ability to provide service to its users. |
| Low | System impaired | A non-production system (e.g. a test environment, a training environment or a research application) is available but not functioning optimally. It may be experiencing downtime or other problems that limit its ability to provide service to its users. |
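The impact definitions above combine two questions: what class of system is affected, and whether it is unavailable or only impaired. The sketch below encodes that reading. The `SystemClass` enum and `derive_impact` function are hypothetical names, and the handling of an unavailable non-production system is an assumption, since the definitions do not cover that case.

```python
from enum import Enum


class SystemClass(Enum):
    BUSINESS_CRITICAL = "business critical"
    PRODUCTION = "production"          # in production, but not essential
    NON_PRODUCTION = "non-production"  # test, training, research


def derive_impact(system_class: SystemClass, unavailable: bool) -> str:
    """Map a system class and its state to an impact level."""
    if unavailable:
        if system_class is SystemClass.BUSINESS_CRITICAL:
            return "Critical"
        if system_class is SystemClass.PRODUCTION:
            return "High"
        # Assumption: the definitions do not cover this case, so the sketch
        # refuses to guess and leaves the decision to the caller.
        raise ValueError("impact is not defined for an unavailable non-production system")
    # The system is available but not functioning optimally.
    if system_class is SystemClass.NON_PRODUCTION:
        return "Low"
    return "Medium"


print(derive_impact(SystemClass.PRODUCTION, unavailable=False))  # Medium
```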

Urgency Definitions

| Urgency | Description | Detailed Description |
|---|---|---|
| Critical | No viable workaround and affected work is time sensitive | An issue is affecting work that is time sensitive and critical to the operations of the business (e.g. processing urgent financial transactions, processing customer orders or dealing with medical emergencies) and there is no viable workaround. |
| High | Workaround available and affected work is time sensitive | An issue is affecting work that is time sensitive and critical to the operations of the business (e.g. processing urgent financial transactions, processing customer orders or dealing with medical emergencies) but there is a viable workaround (e.g. a backup system or alternate manual process). |
| Medium | No viable workaround and affected work is not time sensitive | An issue is affecting work that is not time sensitive (e.g. data entry, research, test and development) and there is no viable workaround. This type of work does not have an immediate deadline and can be completed within a flexible timeframe. |
| Low | Workaround available and affected work is not time sensitive | An issue is affecting work that is not time sensitive (e.g. data entry, research, test and development) and there is a viable workaround (e.g. a backup system or alternate manual process). This type of work does not have an immediate deadline and can be completed within a flexible timeframe. |
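The four urgency levels above follow from two yes/no questions: is the affected work time sensitive, and is a viable workaround available? A minimal sketch of that decision follows; the function and parameter names are illustrative assumptions.

```python
def derive_urgency(time_sensitive: bool, workaround_available: bool) -> str:
    """Map the two urgency questions to an urgency level."""
    if time_sensitive:
        # Time-sensitive work: a workaround lowers Critical to High.
        return "High" if workaround_available else "Critical"
    # Work that is not time sensitive: a workaround lowers Medium to Low.
    return "Low" if workaround_available else "Medium"


print(derive_urgency(time_sensitive=True, workaround_available=False))  # Critical
```

In both branches, a viable workaround lowers the urgency by exactly one level.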

 
