The case for doubling down on refinery reliability now

Refinery reliability problems have affected product prices in recent years, and these persistent problems are creating both challenges and opportunities for refiners.

As the wider industry continues to struggle with unplanned outages, refiners can invest to improve reliability and capture value. Operators can do this by revamping fundamental processes and using digital tools, such as generative AI (gen AI), to drive further improvements in their systems.

In this article, we explore some of the processes and digital tools that may allow operators to advance their reliability systems and unlock value, starting with the fundamentals.

Unplanned refinery outages create opportunities for reliable operators to capture margin

Major external events often lead to sudden spikes in product prices, such as when winter storm Uri led to refinery outages across the United States in February 2021 (Exhibit 1).

Exhibit 1
Unplanned outages reduce capacity, increasing margins by between $6 and $12 per barrel of crude oil.

However, these external events often mask underlying reliability issues across the industry, as became apparent in the summer of 2023. Product prices rose suddenly during this period despite the absence of major weather events. At the same time, several North American oil refineries experienced loss of containment and unexpected unit malfunctions, pointing to potential reliability-related production losses.

Markets responded swiftly to these unplanned outages, resulting in an approximately 75 percent increase in peak margins compared to the seasonal low for the same year. Product cracks showed price increases of between $6 and $12 per barrel during these outages.

Our work with refiners suggests that reliability-related lost profit opportunities can range from $20 million to $50 million per year for mid-size refineries when comparing median with top-quartile performers. A conservative estimate indicates that the value from market uplift during these outages for a 200-kilobarrel-per-day refiner on the US Gulf Coast could be between $6 million and $12 million in a single month, just from having better reliability processes and performance than its peers. For a commercially savvy refiner that is better positioned to capture margin opportunities, the value from improving reliability performance is likely to be even higher.
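The single-month estimate above can be sanity-checked with simple arithmetic. The sketch below assumes, purely for illustration, that the refiner earns the full $6 to $12 per barrel uplift on every barrel of throughput for a fixed number of days. The function name and the five-day window are hypothetical and are not taken from the analysis behind this article.

```python
def market_uplift_usd(throughput_bpd: float, uplift_usd_per_bbl: float, days: float) -> float:
    """Value of running through a price spike, assuming every barrel earns the full uplift."""
    return throughput_bpd * uplift_usd_per_bbl * days


THROUGHPUT_BPD = 200_000  # 200 kilobarrels per day, as in the text
ASSUMED_DAYS = 5          # hypothetical duration of full uplift capture

low = market_uplift_usd(THROUGHPUT_BPD, 6.0, ASSUMED_DAYS)
high = market_uplift_usd(THROUGHPUT_BPD, 12.0, ASSUMED_DAYS)
print(f"${low / 1e6:.0f} million to ${high / 1e6:.0f} million")
# prints "$6 million to $12 million"
```

Under these assumptions, roughly five days of full uplift capture would be enough to reach the $6 million to $12 million range, which is consistent with calling the estimate conservative.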

There are also potential cost savings from the knock-on effects of improved reliability, such as lower spend on equipment replacements and reduced need for maintenance contractors.

Establishing rock-solid reliability fundamentals is not easy, but is critical for success

Building and maintaining a best-in-class reliability program is difficult, and many refiners have not done enough to strengthen basic reliability practices.

Effective reliability processes must be embedded in equipment design and maintenance practices, and require consistent execution over time. To succeed, reliability programs must be supported by a culture of ownership across functions and organizational levels (Exhibit 2).

Exhibit 2
To improve reliability, rock-solid foundations need to be laid.

The four fundamentals of reliability

In our work in the industry, we observed that highly reliable operators focus on four foundational reliability levers: asset strategies, asset health and monitoring, reliable operations, and maintenance execution. These levers define what to do, when to do it, and how to do it.

Asset strategies: Determining the appropriate maintenance strategy for all assets in a refinery, or across a fleet of refineries, is essential. These strategies are defined by understanding the vulnerabilities and risks of a system, then conducting failure mode and effects analysis (FMEA) for critical equipment. These system-level and equipment-specific risks inform preventive tasks, both during outages and normal operations. Further analysis can then identify which scope of work reduces the risk of failure most effectively, so that it can be prioritized for investment based on cost-benefit trade-offs or ROI calculations.
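As a minimal sketch of the cost-benefit prioritization described above, the example below scores failure modes with a risk priority number (RPN), a common FMEA convention that multiplies severity, occurrence, and detection ratings on 1-to-10 scales. The article does not specify a scoring method, so the use of RPN, the equipment names, the ratings, and the costs are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    equipment: str
    description: str
    severity: int              # 1 (minor) to 10 (catastrophic)
    occurrence: int            # 1 (rare) to 10 (frequent)
    detection: int             # 1 (easily detected) to 10 (hard to detect)
    mitigation_cost_usd: float
    rpn_after_mitigation: int  # expected RPN once the preventive task is in place

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

    @property
    def rpn_reduction_per_kusd(self) -> float:
        """Risk reduction bought per thousand dollars of mitigation spend."""
        return (self.rpn - self.rpn_after_mitigation) / (self.mitigation_cost_usd / 1000)


# Hypothetical failure modes; none of these figures come from the article.
modes = [
    FailureMode("K-301 compressor", "bearing failure", 9, 3, 5, 120_000, 45),
    FailureMode("P-101 charge pump", "mechanical seal leak", 7, 5, 4, 40_000, 56),
    FailureMode("E-205 exchanger", "tube fouling", 4, 7, 3, 25_000, 36),
]

ranked = sorted(modes, key=lambda m: m.rpn_reduction_per_kusd, reverse=True)
for m in ranked:
    print(f"{m.equipment}: RPN {m.rpn}, {m.rpn_reduction_per_kusd:.2f} RPN points per $1,000")
```

In this illustration the compressor has nearly the highest raw RPN but ranks last once mitigation cost is considered, which is the kind of trade-off a cost-benefit or ROI view is meant to surface.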

Asset health and monitoring: Capturing high-quality reliability data is paramount to tracking the success of maintenance strategies. Recording failure-rate data for critical equipment and maintaining a comprehensive work order database are key enablers of a best-in-class reliability program. Wireless monitoring sensors are becoming more advanced, more cost-efficient, and more widespread, making reliability data easier to collect every year. High-quality asset-health data allows operators to deploy condition-based monitoring rather than time- or usage-based approaches to preventive maintenance.
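The difference between time-based and condition-based triggers can be illustrated with a short sketch. The function names, the three-reading averaging window, the alarm level, and the vibration readings below are hypothetical; a real program would take its limits from equipment-specific standards and vendor guidance.

```python
from statistics import mean


def time_based_due(hours_since_service: float, interval_hours: float) -> bool:
    """Time-based rule: service is due once a fixed interval has elapsed."""
    return hours_since_service >= interval_hours


def condition_based_due(readings: list, alarm_level: float, window: int = 3) -> bool:
    """Condition-based rule: service is due when the average of the most
    recent `window` readings reaches the alarm level."""
    if len(readings) < window:
        return False
    return mean(readings[-window:]) >= alarm_level


# Hypothetical pump vibration trend in mm/s, oldest reading first.
vibration_mm_s = [2.1, 2.3, 2.2, 3.9, 4.6, 5.1]

print(time_based_due(hours_since_service=1500, interval_hours=4000))  # False
print(condition_based_due(vibration_mm_s, alarm_level=4.5))           # True
```

Here the fixed interval would leave the pump running for another 2,500 hours, while the sensor trend flags it for attention now.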

Reliable operations: To ensure reliability, operators can focus on setting and adhering to reliable operating windows, monitoring equipment, and conducting operator rounds to identify abnormal conditions, while following operations procedures to avoid or manage upsets.
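A reliable operating window can be expressed as a simple pair of limits per instrument. The sketch below is one hypothetical way to check readings against such windows; the tag names, limits, and units are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatingWindow:
    tag: str     # instrument tag (hypothetical)
    low: float   # lower limit of the window
    high: float  # upper limit of the window
    unit: str

    def excursion(self, value: float) -> Optional[str]:
        """Return a message if the reading is outside the window, else None."""
        if value < self.low:
            return f"{self.tag}: {value} {self.unit} is below the {self.low} {self.unit} limit"
        if value > self.high:
            return f"{self.tag}: {value} {self.unit} is above the {self.high} {self.unit} limit"
        return None


windows = [
    OperatingWindow("TI-1001", 340.0, 370.0, "degC"),
    OperatingWindow("PI-2002", 1.5, 2.5, "barg"),
]
readings = {"TI-1001": 374.2, "PI-2002": 2.1}  # hypothetical round readings

alerts = []
for window in windows:
    message = window.excursion(readings[window.tag])
    if message is not None:
        alerts.append(message)

for alert in alerts:
    print(alert)
```

The same check could be run against readings captured on operator rounds or against live sensor data.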

Maintenance execution: High-quality execution of maintenance activities in the field is critical for achieving the benefits laid out in an asset strategy. The typical maintenance workforce in a refinery contends with quick turnarounds, changing priorities, and a long backlog of work across equipment criticalities. Leaders of the maintenance organization have a responsibility to set operators up with the right tools for success. Implementing best practices that drive consistent troubleshooting, more effective planning and scheduling, and efficient execution helps maximize labor productivity.

What high-reliability organizations get right

Digital tools can accelerate improvements in reliability processes

Augmenting a rock-solid reliability program with digital solutions can help refiners improve the efficiency and sustainability of fundamental reliability processes, potentially creating a competitive advantage compared to peers.

Developing robust asset strategies efficiently via FMEA

FMEAs can be extremely time-intensive, with reliability engineers needing to generate a comprehensive list of failure modes while simultaneously processing thousands of unstructured maintenance records (Exhibit 3). Refiners could instead use automated FMEA to develop robust equipment strategies.

Exhibit 3
Robust equipment strategies can be developed and updated with AI assistance.

An automated FMEA tool can significantly reduce manual hours by reading asset descriptions, scraping publicly available information such as OEM manuals, and reviewing work order history to draft a hierarchy of equipment systems, subcomponents, failure modes, and maintenance actions. Reliability engineers can then review and modify outputs at each step.
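To make the idea of a drafted hierarchy concrete, the sketch below groups work order records into a nested structure of system, subcomponent, failure mode, and maintenance actions. It assumes the records have already been converted into structured fields, which in practice is the hard part that an automated tool would handle. The field names and sample records are hypothetical, and this is not a description of any particular product.

```python
from collections import defaultdict

# Hypothetical work orders, already parsed into structured fields.
work_orders = [
    {"system": "Crude charge", "component": "Pump seal", "failure_mode": "Leak", "action": "Replace seal"},
    {"system": "Crude charge", "component": "Pump seal", "failure_mode": "Leak", "action": "Inspect flush plan"},
    {"system": "Crude charge", "component": "Motor bearing", "failure_mode": "Overheating", "action": "Regrease bearing"},
    {"system": "Overhead", "component": "Condenser tubes", "failure_mode": "Fouling", "action": "Clean tube bundle"},
]


def draft_hierarchy(records):
    """Group records as system -> component -> failure mode -> set of actions."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for record in records:
        tree[record["system"]][record["component"]][record["failure_mode"]].add(record["action"])
    return tree


hierarchy = draft_hierarchy(work_orders)
for system, components in hierarchy.items():
    print(system)
    for component, failure_modes in components.items():
        for failure_mode, actions in failure_modes.items():
            print(f"  {component} / {failure_mode}: {', '.join(sorted(actions))}")
```

A draft in this form gives reliability engineers a concrete structure to review and modify.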

Improving maintenance efficiency via a gen AI assistant

Given recent workforce turnover, building capabilities quickly and putting safeguards in place are both critical to ensuring a maintenance organization's efficiency and quality of execution.

A gen AI maintenance “assistant” allows technicians to ask targeted questions about specific equipment and conditions, reducing the time they spend troubleshooting (Exhibit 4).

Exhibit 4

Ultimately, a gen AI assistant that provides a synthesized view from all relevant data sources, such as maintenance logs, checklists, and manuals, can make technicians more effective and efficient while also freeing up supervisor time (Exhibit 5). Similar tools can help operators run their plants reliably, for example by facilitating issue identification during rounds, creating high-quality maintenance notifications, rapidly troubleshooting issues, and accessing procedures during operational upsets.

Exhibit 5
Maintenance assistants can improve organizational efficiency.

In this way, machine learning and AI tools could enable organizations to achieve consistent maintenance execution and increased productivity. In addition to the examples noted above, refiners can use gen AI tools and techniques to clean process and equipment monitoring data, write and update maintenance procedures, and refresh training manuals and guides. Deploying gen AI in these ways can help the next generation of refiners reduce downtime and improve productivity, both of which help deliver reliable business results in an increasingly unpredictable world.


The journey to becoming a best-in-class operator begins with getting the basics right. Digital solutions are not a replacement for the foundations of reliability in refining, but when built on a solid reliability program, they could help operators rapidly mature their reliability management systems and unlock a competitive edge.
