Resilience KPIs: From Uptime to Recovery Value

Move beyond uptime. Discover how "Recovery Value" KPIs help infrastructure owners measure speed, cost, and carbon impact to build true climate resilience.

CLIMATE-RESILIENT INFRASTRUCTURE & CIRCULAR MATERIALS

TDC Ventures LLC

3/18/2026 · 17 min read

Crews repairing a burst pipe and power lines in a flooded street at sunset.

1. Context: Why Resilience KPIs Matter Now

The turbulence of climate change has redefined the operating environment for infrastructure owners around the world. Every year, extreme weather events—spanning torrential floods, devastating wildfires, urban heatwaves, and severe storms—expose the vulnerabilities of legacy systems and test the effectiveness of resilience strategies. In this new climate reality, the traditional focus on “uptime” as the sole benchmark of success no longer suffices. While maintaining high service availability remains important, rapid and resource-conserving recovery after a disruption has become the most critical differentiator.

Utilities, municipal resilience officers, and infrastructure asset managers are under mounting pressure to provide transparent, defensible proof that their assets and systems don’t just bounce back, but do so efficiently, affordably, and in a climate-conscious manner. The growing expectations of investors, regulatory bodies, and insurance carriers have triggered a transformation in how resilience is tracked and reported. Today, executive leadership teams demand a broader view—one that links operational capacity to financial impact, environmental stewardship, and social accountability.

Regulatory frameworks such as the EU Taxonomy and Europe’s Sustainable Finance Disclosure Regulation (SFDR) are not just influencing long-term planning; they are shaping day-to-day asset management decisions. In the United States, the Securities and Exchange Commission’s climate disclosure rules are in doubt after the agency voted in March 2025 to stop defending them in court, but lenders, insurers, and bond markets continue to ask for climate-risk data. Municipalities and private operators alike must now demonstrate that their infrastructure assets are not only robust but adaptable, capable of sustainable recovery and backed by data. The ability to report quickly on the speed, cost, and carbon impact of recovery is fast becoming table stakes in infrastructure management.

Key Takeaway:
Resilience KPIs now sit at the crossroads of operations, sustainability, finance, and compliance. Reporting “percent uptime” is no longer enough: modern leaders must show sustained performance under disruption and fast, low-carbon recovery, both backed by tightly measured metrics.

2. Defining the Real Problem and Opportunity

The Problem:

Despite millions spent on digital dashboards and monitoring software, most infrastructure operators still track only a narrow set of performance indicators. Uptime may mask severe weaknesses—such as slow repairs, unsustainable restoration processes, or repeated asset failures following major events. For CFOs, city officials, and asset managers, this gap can obscure everything from hidden supply chain risks to inflated insurance costs and missed ESG targets.

Traditional approaches ignore the true cost of recovery: the cumulative delays from sourcing hard-to-find materials, the mounting logistics expenses, the climate impact of using virgin resources, and the reputational damage when service interruptions drag on. Just as problematic, these dashboards offer little visibility into opportunities for innovation, such as using recycled metals to speed up recovery and lower its carbon footprint.

The Opportunity:

The shift to comprehensive resilience KPIs—capturing not just how long assets remain functional, but how rapidly, affordably, and sustainably they get restored—unlocks powerful operational and strategic advantages:

Smarter Capital Allocation: By uncovering which assets deliver high “recovery value,” organizations can direct investment toward upgrades or retrofits that maximize resilience returns.
Clearer Role for Sustainable Materials: Recycled metals and circular materials can cut embodied carbon by up to 90% compared with virgin materials, depending on the metal, and can shorten restoration lead times where local and regional supply is more readily available.
Enhanced ESG Reporting and Compliance: New standards reward organizations that can document both adaptation and mitigation—proving their infrastructure doesn’t just survive, but recovers with measurable climate and resource efficiency.
Optimized Insurance Terms: Where underwriters can verify resilience KPIs, asset owners can negotiate better (lower) premiums and contract conditions based on quantified recovery data.
Stronger Public and Stakeholder Trust: Demonstrating proven, sustainable, and data-rich resilience paves the way for access to grant funding, public-private partnerships, and community buy-in.

For example, according to a 2023 report from the World Economic Forum, cities that adopted advanced resilience KPIs saw a 15–25% reduction in recovery costs and a similar decrease in restoration times compared to peers using only traditional metrics.

3. Key Concepts and Definitions

Achieving world-class resilience reporting and low-carbon performance requires a precise, common vocabulary—one that enables teams from operations to finance to sustainability to collaborate effectively and make informed decisions. Here’s a detailed breakdown to ground your strategy:

Resilience:
The measure of an asset or system’s ability to withstand shocks (e.g., climate events, supply chain disruption) and recover critical functions in a timely, sustainable manner. In infrastructure management, this means robust design coupled with rapid, low-impact restoration processes.
Key Performance Indicator (KPI):
A KPI in this context is more than a number on a dashboard—it's a strategic signal of how well your operations balance service, cost, and climate impact. For resilience, these could include metrics such as “mean hours to recovery,” “total carbon emitted per recovery event,” or “percent use of recycled content in repairs.”
Recovery Value:
This holistic KPI layers in every major factor that influences restoration, including:
- Speed: Hours or days needed to resume full operation.
- Cost: Total dollars spent, including labor, materials, and indirect losses.
- Carbon: Embodied greenhouse gas emissions in the restoration process.
- Resource Efficiency: Proportion of materials that are recycled or circular.
- Durability: Likelihood of asset failure recurrence.
By integrating these dimensions, “recovery value” gives decision-makers a data-driven way to compare interventions and to demonstrate value to insurers and the public.
Low Carbon Infrastructure:
Refers to systems consciously designed to minimize greenhouse gas emissions from day one, through construction, operation, and eventual recovery. The hallmark is a high reliance on circular processes and recycled content—especially metals whose extraction and manufacture are among the largest sources of industrial emissions.
Recycled Metals:
Metals salvaged from decommissioned infrastructure, cars, appliances, and construction scrap, refined to meet modern standards. According to the International Energy Agency, using recycled metals cuts energy use for steel by ~75% and for aluminum by ~95% compared to virgin mining—creating both direct emissions reductions and faster access to building blocks in times of crisis.
Circularity:
The principle of designing out waste and keeping materials in use as long as possible. In practical terms, this means maximizing recycled content in new builds and repairs, thus ensuring supply chain resilience, cost predictability, and lower carbon impact.

Key Takeaway:
The latest generation of resilience KPIs depends on explicitly integrating recycled and low-carbon materials, which in turn requires shifts in culture and procurement strategy across the infrastructure sector.

4. Core Framework: From Uptime to Recovery Value

The RCV (Resilience & Circular Value) Dashboard in Practice

To operationalize these new priorities, organizations need a multi-layered dashboard that blends technical, financial, and environmental data. This is where the Resilience and Circular Value (RCV) framework comes in, providing both real-time and strategic feedback for leadership and field teams.

Step 1: Baseline Your Current State

Begin with traditional resilience metrics: percent uptime, frequency of outages, and Mean Time to Recovery (MTTR). Collect at least 2–3 years of historical data for meaningful context.
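As a rough illustration of Step 1, the three baseline metrics can be computed directly from an outage log. This is a minimal sketch, assuming a hypothetical log format of non-overlapping (start, end) timestamps for one reporting year; real incident data usually needs cleaning first.

```python
from datetime import datetime, timedelta

HOURS_PER_YEAR = 8760  # non-leap year; use 8784 for a leap year

def baseline_metrics(outages, period_hours=HOURS_PER_YEAR):
    """Percent uptime, outage count, and MTTR (hours) for one reporting period.

    `outages` is a list of (start, end) datetime pairs, assumed not to overlap.
    """
    durations = [(end - start) / timedelta(hours=1) for start, end in outages]
    downtime = sum(durations)
    return {
        "uptime_pct": 100.0 * (1 - downtime / period_hours),
        "outage_count": len(durations),
        "mttr_hours": downtime / len(durations) if durations else 0.0,
    }

log = [
    (datetime(2025, 2, 3, 6, 0), datetime(2025, 2, 3, 9, 0)),     # 3-hour outage
    (datetime(2025, 9, 14, 22, 0), datetime(2025, 9, 15, 5, 0)),  # 7-hour outage
]
print(baseline_metrics(log))
```

Running the same function once per year of history gives the baseline trend that later steps build on.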

Step 2: Map Material and Supply Chain Flows

Identify:

Where original and spare materials are sourced.
How much is procured locally/regionally.
The recycled content and certification of those materials.

This mapping can uncover sources of regular bottlenecks and hidden cost overruns.
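Once flows are mapped, the local-sourcing share and recycled-content share can be summarized by tonnage. The sketch below uses a hypothetical `MaterialFlow` record and made-up quantities; weighting by tonnes is one reasonable choice, and weighting by spend would work the same way.

```python
from dataclasses import dataclass

@dataclass
class MaterialFlow:
    material: str          # e.g. "steel", "aluminium"
    tonnes: float          # quantity procured for repairs and spares
    local: bool            # sourced locally or regionally?
    recycled_share: float  # certified recycled content, 0.0 to 1.0

def flow_summary(flows):
    """Tonnage-weighted local-sourcing share and recycled-content share."""
    total = sum(f.tonnes for f in flows)
    if total == 0:
        return {"local_share": 0.0, "recycled_share": 0.0}
    local = sum(f.tonnes for f in flows if f.local)
    recycled = sum(f.tonnes * f.recycled_share for f in flows)
    return {"local_share": local / total, "recycled_share": recycled / total}

flows = [
    MaterialFlow("steel", 40.0, True, 0.9),
    MaterialFlow("aluminium", 10.0, False, 0.5),
]
print(flow_summary(flows))
```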

Step 3: Expand the KPI Set

Track KPIs including:

Downtime per Event: Direct indicator of impact on community and productivity.
Recovery Cost: Total outlay, including hidden costs (e.g., expedited shipping, emergency labor).
Embodied Carbon: Calculated using material-based life cycle analysis (LCA).
Percent Recycled Content: Certified by suppliers—this supports both ESG and procurement compliance.
Failure Frequency and Durability: Key to identifying need for more resilient design or materials.
Recovery Logistics Complexity: Fewer suppliers and simpler repair pathways reduce time/cost risk.
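The cost and carbon KPIs in Step 3 reduce to simple sums once event data is captured. The sketch below uses a basic tonnage-times-emission-factor approach as a stand-in for a full LCA. The factor values are placeholders chosen for illustration, not published figures; replace them with supplier environmental product declarations or an LCA database.

```python
# Placeholder emission factors in tCO2e per tonne (illustrative only, not sourced data).
EMISSION_FACTORS = {
    ("steel", "recycled"): 0.7,
    ("steel", "primary"): 2.0,
    ("aluminium", "recycled"): 0.6,
    ("aluminium", "primary"): 12.0,
}

def recovery_cost(direct, hidden):
    """Total outlay: direct repair cost plus hidden costs (freight, overtime, ...)."""
    return direct + sum(hidden.values())

def embodied_carbon(line_items):
    """Sum of tonnes x factor over (material, route, tonnes) line items."""
    return sum(tonnes * EMISSION_FACTORS[(mat, route)] for mat, route, tonnes in line_items)

cost = recovery_cost(200_000, {"expedited_shipping": 35_000, "emergency_labor": 25_000})
carbon = embodied_carbon([("steel", "recycled", 10.0), ("aluminium", "primary", 1.0)])
print(f"cost=${cost:,.0f}  carbon={carbon:.1f} tCO2e")
```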

Step 4: Set Scenario-Driven Targets

Model your targets off credible threat assessments (e.g., 100-year flood, wildfire risk) and align to board-level business goals—such as “80%+ recycled content in all major repairs by 2027” or “Restore Tier 1 assets within 12 hours.”
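Targets like the two examples above can be encoded as checks run against every recovery event. This is a sketch under assumptions: the `RecoveryEvent` fields, the tier numbering, and the `target_breaches` helper are hypothetical, and the 2027 deadline is left out for brevity.

```python
from dataclasses import dataclass

@dataclass
class RecoveryEvent:
    asset_tier: int          # 1 = most critical (hypothetical tiering)
    hours_to_restore: float
    recycled_share: float    # recycled share of repair material, 0.0 to 1.0
    major_repair: bool

def target_breaches(event, max_tier1_hours=12.0, min_recycled=0.80):
    """Return the names of the targets this event failed to meet."""
    breaches = []
    if event.asset_tier == 1 and event.hours_to_restore > max_tier1_hours:
        breaches.append("tier1_restore_time")
    if event.major_repair and event.recycled_share < min_recycled:
        breaches.append("recycled_content")
    return breaches

print(target_breaches(RecoveryEvent(1, 15.5, 0.92, True)))
```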

Step 5: Roll Up Into a Recovery Value Index

This composite score uses weighted KPIs, supports site-to-site comparisons, and feeds into larger enterprise risk/ESG dashboards. You can customize weights for different risk exposures or asset classes.
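The article does not specify how the KPIs are normalized before weighting. One common option, assumed here, is min-max scaling across the sites being compared, with "lower is better" KPIs inverted. In symbols, with weights w_k and normalized scores n_{s,k} between 0 and 1, the index for site s is RVI_s = 100 × Σ_k w_k n_{s,k} / Σ_k w_k. The KPI names, weights, and site data below are hypothetical.

```python
# Direction of each KPI: True if a higher value is better (assumed KPI set).
HIGHER_IS_BETTER = {
    "hours_to_restore": False,
    "recovery_cost": False,
    "embodied_tco2e": False,
    "recycled_share": True,
    "repeat_failure_rate": False,
}

def recovery_value_index(sites, weights):
    """Score each site from 0 (worst) to 100 (best) relative to the peer set.

    `sites` maps site name -> {kpi: value}; `weights` maps kpi -> weight,
    rescaled so the weights sum to 1.
    """
    total_w = sum(weights.values())
    scores = dict.fromkeys(sites, 0.0)
    for kpi, w in weights.items():
        values = [s[kpi] for s in sites.values()]
        lo, hi = min(values), max(values)
        for name, s in sites.items():
            if hi == lo:
                norm = 1.0                        # all sites tie on this KPI
            else:
                norm = (s[kpi] - lo) / (hi - lo)  # min-max scaling to 0..1
                if not HIGHER_IS_BETTER[kpi]:
                    norm = 1.0 - norm             # invert "lower is better" KPIs
            scores[name] += 100.0 * (w / total_w) * norm
    return scores

sites = {
    "north": {"hours_to_restore": 8, "recovery_cost": 310_000, "embodied_tco2e": 22,
              "recycled_share": 0.92, "repeat_failure_rate": 0.00},
    "south": {"hours_to_restore": 20, "recovery_cost": 260_000, "embodied_tco2e": 70,
              "recycled_share": 0.30, "repeat_failure_rate": 0.15},
}
weights = {"hours_to_restore": 0.3, "recovery_cost": 0.2, "embodied_tco2e": 0.2,
           "recycled_share": 0.1, "repeat_failure_rate": 0.2}
print(recovery_value_index(sites, weights))
```

Because min-max scores are relative, adding or removing a site changes everyone's index. Using fixed bounds (for example, the Step 4 targets) instead of the observed min and max gives scores that are comparable year to year.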

Worked Example: Flood Recovery for a City Utility

Suppose a city utility previously judged itself on annual uptime (99.6%, or roughly 35 hours of downtime per year) and was satisfied with that reliability. After severe storms in back-to-back years, it reconstructs its incident logs:

Event Analysis: The latest major flood caused 8 hours of citywide downtime, a swift result compared with a peer average of 20 hours. The direct recovery cost was $260,000. By specifying 92% recycled steel and aluminum for critical replacements, the utility cut embodied CO₂ emissions for the event to 22 tons (vs. 70 tons historically, a reduction of about 69%), and local suppliers filled orders with 30% shorter lead times.
Result: Over the following 18 months, these “resilient by design” repairs led to zero repeat failures. The CFO now reports to the board not just on restored uptime but on the value delivered: faster recovery, documented carbon reduction, and insurance savings that support green bond issuance and positive ESG ratings.
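The headline figures in this example can be checked with a few lines of arithmetic. The sketch uses only the numbers stated in the example and assumes an 8,760-hour year.

```python
HOURS_PER_YEAR = 8760

uptime = 0.996
annual_downtime_h = (1 - uptime) * HOURS_PER_YEAR   # about 35 hours per year

carbon_before_t, carbon_after_t = 70.0, 22.0
carbon_cut = 1 - carbon_after_t / carbon_before_t   # about 69% lower

event_h, peer_h = 8.0, 20.0
faster_than_peers = 1 - event_h / peer_h            # 60% less downtime than peers

print(f"downtime ≈ {annual_downtime_h:.1f} h/yr, carbon cut ≈ {carbon_cut:.0%}, "
      f"downtime vs peers ≈ {faster_than_peers:.0%} lower")
```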

The playbook: how to move from KPI theory to field execution

The biggest mistake infrastructure owners make is assuming that better resilience reporting starts with a software purchase. It usually does not. It starts with a decision: which outcomes matter enough to measure, who owns them, and what gets counted after a disruption. NIST’s resilience work has long pushed practitioners toward recovery time as a core metric because time to restore function is understandable across technical, executive, and community audiences. DOE guidance on grid resilience makes the same point in another way: traditional reliability metrics such as SAIDI, SAIFI, and CAIDI are useful, but resilience work needs metrics across hazard phases, including recovery costs, asset damage, and restoration performance during major events.

That means the playbook begins by defining critical functions, not assets in isolation. A water utility does not exist to own pipes. It exists to provide safe water pressure and potable supply. A transit authority does not exist to own rolling stock. It exists to restore passenger movement and network throughput. A port does not exist to own cranes. It exists to resume cargo handling, gate processing, and vessel turnaround. This sounds obvious, but it is where many KPI programs fail. They measure equipment status, not recovery of service. NIST’s community resilience framework emphasizes exactly this kind of function-based recovery thinking. FEMA’s National Disaster Recovery Framework takes the same view at the system level, focusing on restoring and revitalizing social, economic, natural, and built systems rather than simply checking off repaired components.

The first step is to build a resilience service map. For each critical service, identify the minimum acceptable service level, the target time to restore that level, the full-service restoration target, the upstream dependencies, and the repair bottlenecks that have historically slowed recovery. This map becomes the bridge between operational teams and finance teams. It lets leadership answer questions such as: Which dependencies actually drive downtime? Which materials have the longest lead times? Which repairs create the highest carbon burden? Which assets fail once, and which create repeat restoration costs for years? DOE’s resilience materials repeatedly note that resilience cannot be understood from a single index alone. It requires a set of descriptors that show how infrastructure withstands, absorbs, and recovers from disruptive events.
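A resilience service map can start as a plain data structure. The sketch below shows one possible shape; the `ServiceEntry` fields and the two water-utility entries are hypothetical. It also answers one of the questions above: which historical bottlenecks recur across the most services.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ServiceEntry:
    service: str
    min_service_level: str         # minimum acceptable level, in plain words
    hours_to_min_service: float    # target time to restore the minimum level
    hours_to_full_service: float   # full-service restoration target
    dependencies: list[str] = field(default_factory=list)
    past_bottlenecks: list[str] = field(default_factory=list)

def top_bottlenecks(service_map, n=3):
    """Rank bottlenecks by how many services they have slowed down."""
    counts = Counter(b for entry in service_map for b in set(entry.past_bottlenecks))
    return counts.most_common(n)

service_map = [
    ServiceEntry("potable water", "reduced pressure, safe to drink", 12, 72,
                 ["grid power", "chlorine supply"], ["switchgear lead time", "pump motors"]),
    ServiceEntry("wastewater pumping", "no overflows", 24, 96,
                 ["grid power"], ["switchgear lead time"]),
]
print(top_bottlenecks(service_map))
```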

The second step is to build an event taxonomy. Too many operators throw all outages into one bucket. A feeder fault on a normal operating day is not the same as a substation outage during a major flood. A telecom backhaul interruption caused by wind is not the same as a wildfire-driven corridor shutdown. Recovery value depends on context. Your dashboard should classify events by hazard type, scale, service impact, repair pathway, and restoration constraints. That makes year-on-year benchmarking more honest. It also prevents a high uptime number from hiding catastrophic weakness during major events, which is one of the central flaws in legacy dashboards. DOE’s current utility resilience planning work explicitly separates metrics across hazard types and phases for this reason.
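Classifying events before aggregating them is what stops a strong blended figure from hiding poor major-event performance. This minimal sketch uses only two of the dimensions named above (hazard type and a major/minor scale flag); the `Hazard` values and `Event` fields are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean

class Hazard(Enum):
    ROUTINE = "routine"   # blue-sky faults
    FLOOD = "flood"
    WILDFIRE = "wildfire"
    WIND = "wind"

@dataclass
class Event:
    hazard: Hazard
    major: bool           # met the organization's major-event threshold
    hours_down: float

def downtime_by_class(events):
    """Mean downtime per (hazard, major) class instead of one blended average."""
    classes = {}
    for e in events:
        classes.setdefault((e.hazard, e.major), []).append(e.hours_down)
    return {k: mean(v) for k, v in classes.items()}

events = [Event(Hazard.ROUTINE, False, 1.5)] * 20 + [Event(Hazard.FLOOD, True, 40.0)]
blended = mean(e.hours_down for e in events)
print(f"blended mean: {blended:.1f} h")
for (hazard, major), hours in downtime_by_class(events).items():
    print(hazard.value, "major" if major else "minor", f"{hours:.1f} h")
```

Here the blended mean of about 3.3 hours conceals a 40-hour flood outage, which is exactly the distortion the taxonomy is meant to expose.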

The third step is to redesign procurement around recovery, not only cost. This is where the article’s recovery value argument becomes operational. If a replacement material has a lower sticker price but requires long-distance sourcing, higher embodied carbon, slower fabrication, and greater repeat-failure risk, it may destroy recovery value. By contrast, certified recycled steel, aluminium, and other circular inputs can improve availability, shorten procurement cycles, and lower emissions. The International Aluminium Institute reports that recycled aluminium requires about 8.3 gigajoules per tonne versus about 186 gigajoules per tonne for primary aluminium, a roughly 95.5% energy saving. Worldsteel notes that all steel production uses scrap and that steel production from scrap in electric-arc furnaces can run with up to 100% scrap input, making scrap a central resilience and emissions lever.

The fourth step is to treat post-event reviews as a KPI gold mine. Every disruption should trigger a structured recovery value review within thirty days. The review should capture not only outage duration and repair spend, but also material lead time, logistics friction, number of suppliers involved, embodied carbon of replacement materials, waste generated, mutual-aid use, emergency freight premiums, and repeat-failure probability. The reason is simple: resilience spending decisions get better when they are grounded in actual recovery friction, not assumptions. EPA’s work on utility resilience and its Water Network Tool for Resilience reinforce this idea by focusing on damage estimation, efficient repair strategies, preparedness, and priority actions rather than on narrow compliance reporting alone.
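A structured review is easier to enforce when the required fields and the thirty-day window are checked automatically. The sketch encodes the field list from this step; the field names and the `review_status` helper are hypothetical. A field counts as missing only when it is `None`, so a legitimate zero (for example, no mutual aid used) still counts as captured.

```python
from datetime import date, timedelta

REVIEW_WINDOW = timedelta(days=30)
REQUIRED_FIELDS = [
    "outage_hours", "repair_spend", "material_lead_time_days", "logistics_friction",
    "supplier_count", "embodied_tco2e", "waste_tonnes", "mutual_aid_used",
    "emergency_freight_premium", "repeat_failure_probability",
]

def review_status(event_date, review, today):
    """Report missing fields and whether an incomplete review is past 30 days."""
    missing = [f for f in REQUIRED_FIELDS if review.get(f) is None]
    overdue = bool(missing) and today > event_date + REVIEW_WINDOW
    return {"missing": missing, "overdue": overdue}

review = {f: 0 for f in REQUIRED_FIELDS}
review["embodied_tco2e"] = None   # not yet estimated
print(review_status(date(2026, 1, 10), review, today=date(2026, 2, 15)))
```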

The fifth step is executive integration. Resilience KPIs should not live only in operations. They belong in quarterly capital planning, insurance renewal discussions, bond and lender conversations, and sustainability governance. The macro case for this is already established. The World Bank and GFDRR estimate that the net benefit of more resilient infrastructure in low- and middle-income countries is about $4.2 trillion, or about four dollars in benefit for every dollar invested. UNDRR’s 2025 Global Assessment Report goes even broader, arguing that disaster costs now exceed $2.3 trillion annually when cascading and ecosystem costs are included. If the external cost of disruption is that large, the internal KPI system cannot stop at uptime.

Measurement: the KPI stack that actually matters

The best resilience measurement systems work in layers. They do not rely on one score. They move from base operational metrics to financial metrics, then to carbon and circularity metrics, then to strategic portfolio metrics. That structure matters because it gives each audience what it needs. Operators need speed and fault visibility. CFOs need cost, avoided loss, and capital prioritization. Sustainability teams need emissions and material traceability. Boards need a clear signal that connects all three. DOE’s resilience guidance for utilities reflects this multi-layer approach by combining standard outage metrics with build metrics, impact metrics, and community and equity metrics.

At the operational layer, the essentials remain outage count, downtime by event, mean time to recovery, mean time to partial recovery, service availability during major event days, and repeat failure rates. These remain useful because they tell you how fast a system returns to function and how often repaired assets fail again. But they need to be separated by event class. A network that performs well under blue-sky conditions but collapses during flood, fire, or heat stress has weak resilience, even if its annual uptime looks impressive. This distinction is why DOE and NIST materials keep returning to recovery time and hazard-specific performance as the core of resilience measurement.
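Of these operational metrics, repeat-failure rate is often the one left uncalculated. One way to compute it is sketched below. It assumes failure records of (asset ID, date) and a 12-month recurrence window, both chosen here for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

def repeat_failure_rate(failures, window=timedelta(days=365)):
    """Share of failures followed by another failure of the same asset within `window`.

    `failures` is a list of (asset_id, failure_date) tuples. Recent failures
    have not yet had a full window to recur, so the latest period is understated.
    """
    by_asset = defaultdict(list)
    for asset, day in failures:
        by_asset[asset].append(day)
    repeats = 0
    for days in by_asset.values():
        days.sort()
        # Count consecutive failures on the same asset that fall inside the window.
        repeats += sum(1 for earlier, later in zip(days, days[1:]) if later - earlier <= window)
    return repeats / len(failures) if failures else 0.0

failures = [
    ("pump-7", date(2025, 1, 10)),
    ("valve-3", date(2025, 3, 1)),
    ("pump-7", date(2025, 6, 1)),   # recurs 142 days after the January failure
    ("pump-7", date(2026, 8, 1)),   # 426 days later: outside the window
]
print(f"repeat-failure rate: {repeat_failure_rate(failures):.0%}")
```

Running it separately for each event class keeps the blue-sky and major-event figures apart, in line with the point above.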

At the financial layer, the essential metrics are recovery cost per event, avoided loss from pre-positioned resilience measures, emergency logistics premium, contractor surge premium, uninsured loss exposure, and lifecycle repair cost over a multi-year window. This is where recovery value becomes visible to finance teams. It becomes possible to show that the cheapest restoration choice was not actually the cheapest. Swiss Re reported that global economic losses from disaster events in 2024 reached USD 318 billion, with 57% uninsured. That protection gap matters because it means a large share of recovery cost will continue to land directly on operators, governments, and communities.
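To show that the cheapest restoration choice may not be the cheapest over time, options can be compared on lifecycle repair cost over a multi-year window. This sketch uses hypothetical numbers and a deliberately simple model: expected repeat failures equal annual failure probability times years, with no discounting.

```python
def lifecycle_cost(upfront, annual_failure_prob, cost_per_failure, years):
    """Upfront repair cost plus expected repeat-failure costs over `years` (undiscounted)."""
    return upfront + annual_failure_prob * cost_per_failure * years

cheap = lifecycle_cost(upfront=120_000, annual_failure_prob=0.30, cost_per_failure=150_000, years=10)
durable = lifecycle_cost(upfront=180_000, annual_failure_prob=0.05, cost_per_failure=150_000, years=10)
print(f"cheap fix: ${cheap:,.0f}   durable fix: ${durable:,.0f}")
```

A production version would discount future costs and add downtime losses per failure, but even this crude model shows how a lower invoice can carry a higher lifecycle cost.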

At the carbon and materials layer, the core measures are embodied carbon per recovery event, percentage recycled content in repairs, local or regional material sourcing share, waste diverted from landfill, and carbon intensity per unit of restored service. These measures are no longer side notes. They increasingly affect access to capital, procurement scoring, investor confidence, and insurer trust. The EU Taxonomy remains one of the most important market frameworks here because it provides a common system for identifying environmentally sustainable activity and reducing greenwashing risk. Even where reporting regimes differ by jurisdiction, the direction of travel is clear: infrastructure owners are expected to show that climate adaptation and climate mitigation are being managed together.
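Two of these measures are simple ratios once event data exists. The sketch below assumes restored service is counted in customers restored, which is only one possible service unit (a transit agency might use passenger-trips, a port container moves). The function name and inputs are hypothetical.

```python
def carbon_and_waste_kpis(embodied_tco2e, customers_restored, waste_t, diverted_t):
    """Embodied carbon per 1,000 customers restored, and landfill diversion rate."""
    if customers_restored <= 0:
        raise ValueError("customers_restored must be positive")
    return {
        "tco2e_per_1000_customers": 1000 * embodied_tco2e / customers_restored,
        "waste_diversion": diverted_t / waste_t if waste_t else 0.0,
    }

kpis = carbon_and_waste_kpis(embodied_tco2e=22.0, customers_restored=48_000,
                             waste_t=15.0, diverted_t=12.0)
print(kpis)
```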

At the strategic layer, the most useful measure is a composite Recovery Value Index, but only if the reader can still drill into its parts. The index should include restoration speed, total cost, embodied carbon, circular material share, and repeat-failure risk. Different sectors can weight these differently. A hospital campus may put the highest weight on time to clinical function. A municipal bridge authority may weigh lifecycle durability more heavily. A port operator may give stronger weight to cargo throughput restored within the first 24 to 72 hours. The point is not to create a magic score. The point is to create a board-level number that does not flatten the evidence behind it. NIST’s resilience work has consistently supported this logic by tying performance goals to time-based recovery targets across systems and functions.

A useful rule is this: if a KPI cannot influence capital planning, procurement specifications, emergency contracts, or insurer discussions, it is not doing enough work. That is the standard mature resilience programs should use.

Case scenarios: what good resilience KPI practice looks like on the ground

Consider a coastal water utility facing repeated flood and storm-surge damage to pumping, switchgear enclosures, and sections of above-ground pipework. Under the old model, leadership might report 99.5% annual uptime and assume performance is satisfactory. Under a recovery value model, the picture changes. The utility starts tracking event-class downtime, hours to minimum service restoration, procurement lead time for replacement metals, emergency freight spend, and embodied carbon of each recovery event. It then redesigns its spare strategy around regionally available materials with higher recycled content and shorter replacement cycles. EPA’s utility resilience research and WNTR work support exactly this kind of scenario-based planning, where the value lies in estimating damage, identifying efficient repair strategies, and prioritizing resilience actions before the next event.

Now imagine a power distribution utility exposed to hurricanes and non-winter storms. Under a narrow reliability frame, it tracks SAIDI, SAIFI, and CAIDI. Under a resilience frame, it expands to include outage recovery cost, equipment replaced by damage cause, vegetation-related event severity, and performance during major event days. DOE’s guidance for Bipartisan Infrastructure Law grid resilience metrics explicitly points utilities toward this broader approach, including automated feeder performance, annual outage recovery cost, and devices replaced because of equipment damage or failure. That shift matters because it pushes utilities to assess how systems recover under stress, not only how often they interrupt customers under routine conditions.

Take a third scenario in urban transport or municipal public works. A city replaces damaged bridge railings, signal structures, and lighting poles after repeated storm events. In the legacy model, the city might measure repair completion and budget variance. In the recovery value model, it also tracks local material availability, carbon intensity of replacement materials, frequency of repeat interventions, and the effect of repair choices on reopening times. Over time, this lets the city compare whether one repair specification led to faster network reopening, fewer callbacks, and lower emissions than another. This is exactly the kind of practical value behind the World Bank’s resilient infrastructure thesis: the upside of resilience is not abstract. It shows up in avoided disruption, stronger service continuity, and better long-run economics.

There is also a procurement scenario that many organizations miss. Picture two bids for replacement aluminium components after a severe weather event. Bid A is lower on invoice price but depends on primary metal, overseas shipment, and a six- to eight-week lead time. Bid B is slightly higher on invoice price but uses recycled aluminium with materially lower energy intensity and a shorter regional supply route. If Bid B shortens restoration by even a few days for a critical asset, its total recovery value can be far higher once downtime costs, emergency workarounds, and avoided emissions are counted. The International Aluminium Institute’s data on secondary aluminium energy savings makes this logic easy to defend internally.
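The bid comparison can be made explicit by pricing downtime alongside the invoice. Everything here is hypothetical (invoices, tonnage, restoration days, and the daily downtime cost) except the 186 and 8.3 GJ per tonne energy intensities, which come from the International Aluminium Institute figures quoted in this article. Energy is reported next to cost rather than monetized.

```python
from dataclasses import dataclass

# Energy intensity in GJ per tonne of aluminium (International Aluminium Institute figures).
GJ_PER_TONNE = {"primary": 186.0, "recycled": 8.3}

@dataclass
class Bid:
    name: str
    invoice_usd: float
    route: str            # "primary" or "recycled"
    tonnes: float
    restore_days: float   # days until the critical asset is back in service

def total_recovery_cost(bid, downtime_cost_per_day):
    """Invoice price plus the cost of keeping the asset out of service."""
    return bid.invoice_usd + bid.restore_days * downtime_cost_per_day

def process_energy_gj(bid):
    """Production energy for the metal in the bid."""
    return bid.tonnes * GJ_PER_TONNE[bid.route]

bid_a = Bid("A", 90_000, "primary", 5.0, 49)    # seven-week lead time
bid_b = Bid("B", 98_000, "recycled", 5.0, 21)   # regional supply, assumed three weeks
for bid in (bid_a, bid_b):
    cost = total_recovery_cost(bid, downtime_cost_per_day=4_000)
    print(f"Bid {bid.name}: ${cost:,.0f} all-in, {process_energy_gj(bid):,.0f} GJ")
```

With these assumptions Bid B costs $8,000 more on the invoice but comes out about $104,000 cheaper all-in, with far lower production energy.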

The most important lesson from these scenarios is that resilience KPIs should change behavior. If the same materials, same contracts, same suppliers, and same capital decisions remain in place year after year, then the dashboard is decorative, not strategic.

Frequently asked questions

One common question is whether uptime is still important. Yes, absolutely. Uptime still matters because it reflects service continuity during normal and moderate conditions. The problem is that uptime alone can hide poor recovery performance during severe events. Resilience programs should keep uptime, but place it beside recovery time, recovery cost, and recovery carbon. DOE and NIST both make clear in different ways that resilience measurement must go beyond routine reliability indicators.

Another frequent question is whether a Recovery Value Index is too complicated for public-sector or utility use. It does not have to be. Most organizations already collect part of the needed data. The issue is that the data sits in separate systems: outage logs, ERP records, procurement files, contractor invoices, and sustainability worksheets. The index becomes practical when you start with five metrics only: time to minimum service, time to full service, total event recovery cost, embodied carbon of repairs, and percentage circular or recycled material share. That first version is enough to improve decisions. More detail can be added later.
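The practical first hurdle is joining the five starter metrics from separate systems. A minimal sketch, assuming each system can export records keyed by a shared event ID (an assumption; many organizations first have to create one). Missing values stay `None` so that data gaps are visible instead of silently reading as zero.

```python
def build_starter_records(outages, costs, carbon):
    """Join per-event data held in separate systems on a shared event ID."""
    records = {}
    for event_id, o in outages.items():
        c = carbon.get(event_id, {})
        records[event_id] = {
            "hours_to_min_service": o.get("hours_to_min"),
            "hours_to_full_service": o.get("hours_to_full"),
            "recovery_cost": costs.get(event_id),
            "embodied_tco2e": c.get("tco2e"),
            "recycled_share": c.get("recycled_share"),
        }
    return records

outages = {"EV-101": {"hours_to_min": 6, "hours_to_full": 30}}  # outage log export
costs = {"EV-101": 260_000}                                      # ERP / invoice export
carbon = {}                                                      # sustainability sheet not yet filled in
records = build_starter_records(outages, costs, carbon)
gaps = {eid: [k for k, v in r.items() if v is None] for eid, r in records.items()}
print(gaps)
```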

A third question is whether low-carbon materials actually improve resilience or whether they only help ESG reporting. In many cases they do both. Recycled and circular materials can reduce embodied emissions, but they can also improve recovery by widening regional supply options, reducing dependence on high-risk primary supply chains, and supporting pre-negotiated local sourcing. That does not mean every recycled material choice is automatically superior. It means material selection should be scored on restoration performance as well as carbon. The data on recycled aluminium and scrap-based steel production makes clear that the emissions opportunity is real. The resilience upside comes when procurement and logistics are designed around it.

A fourth question is whether this only applies to rich cities or large private operators. It does not. The World Bank’s resilience work is centered heavily on low- and middle-income countries because disruption costs hit hardest where infrastructure systems are already stressed. The argument for better resilience KPIs becomes stronger, not weaker, when capital is constrained. If you cannot afford to rebuild twice, you need better measures for choosing the first intervention correctly.

A fifth question is whether climate disclosure uncertainty in the United States weakens the case for resilience KPIs. No. It changes the compliance landscape, but not the underlying business need. After the SEC voted in March 2025 to stop defending its climate disclosure rules in court, the federal securities rule can no longer be treated as a stable pillar for resilience reporting. But insurers, lenders, procurement authorities, rating agencies, public bond markets, and EU-aligned capital channels still care deeply about climate risk, adaptation, and credible reporting. The business case for strong resilience metrics survives even when one regulatory path becomes uncertain.

The toolkit: what every infrastructure owner should put in place now

A serious resilience KPI toolkit has six parts.

The first is a resilience function register. This is a simple but disciplined list of critical services, recovery thresholds, full restoration targets, dependencies, and asset-to-service linkages. Without this, teams will continue to measure components instead of public or customer function.

The second is an event review template. After every major disruption, teams should capture duration, partial restoration time, total restoration time, direct repair cost, contractor and logistics premiums, material lead times, substitute materials used, waste generated, circular content share, and repeat-failure risk. This document should be mandatory, light enough to finish, and good enough to inform the next capital cycle.

The third is a supplier resilience scorecard. Every supplier of critical repair materials should be scored on lead time, geographic concentration, recycled content, certification quality, emergency surge capacity, and history during recent events. When recovery depends on a single distant source, that is a resilience problem even before the next hazard arrives.
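A supplier scorecard can be a weighted sum over the six criteria listed above, plus a check for single-source dependencies. The 0 to 5 rating scale, the weights, and the supplier names are hypothetical; each organization would set its own.

```python
# Hypothetical criterion weights; each criterion is rated 0-5 by the review team.
CRITERIA_WEIGHTS = {
    "lead_time": 0.25, "geographic_spread": 0.20, "recycled_content": 0.15,
    "certification": 0.15, "surge_capacity": 0.15, "recent_event_history": 0.10,
}

def supplier_score(ratings):
    """Weighted 0-5 score; raises if any criterion has not been rated."""
    missing = CRITERIA_WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"unrated criteria: {sorted(missing)}")
    return sum(w * ratings[c] for c, w in CRITERIA_WEIGHTS.items())

def single_source_risks(suppliers_by_material):
    """Critical materials that depend on exactly one supplier."""
    return [m for m, names in suppliers_by_material.items() if len(names) == 1]

ratings = {"lead_time": 4, "geographic_spread": 2, "recycled_content": 5,
           "certification": 4, "surge_capacity": 3, "recent_event_history": 4}
suppliers = {"switchgear": ["Supplier X"], "steel plate": ["Supplier X", "Supplier Y"]}
print(f"score {supplier_score(ratings):.2f}/5, single-source: {single_source_risks(suppliers)}")
```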

The fourth is a carbon-and-materials data layer. Many organizations delay this because they think it requires perfect lifecycle assessment from day one. It does not. Start with material category, tonnage, source region, and a basic embodied carbon factor. Refine over time. What matters at first is directional decision quality.

The fifth is a scenario library. This should include your five to ten most plausible severe-event cases: flood, wildfire smoke plus heat, storm surge, major wind event, extended grid outage, telecom corridor disruption, drought-related water stress, and supply-chain interruption. Each scenario should include the expected service impacts, critical dependencies, target recovery windows, and likely material bottlenecks. EPA, DOE, FEMA, and NIST resources all support this type of scenario-based resilience planning.

The sixth is a board-ready dashboard. This should fit on one page. It should show, at minimum, trendlines for major-event downtime, mean recovery time, recovery cost per major event, repeat-failure rate, embodied carbon per recovery event, and circular material share in repairs. It should also show the top three assets or systems with the worst recovery value profile. Senior leaders do not need fifty metrics. They need the six that change money, risk, and trust.

If you want a practical maturity path, use this one. In the first ninety days, build the function register, event template, and first-cut dashboard. In six months, add supplier scoring, carbon factors, and scenario targets. Within twelve months, connect the KPI stack to capital planning, insurance renewals, and procurement language. That is how resilience reporting moves from a presentation topic to a management system.

Conclusion: the future belongs to organizations that can prove recovery value

The old infrastructure story was simple: keep the lights on, keep the water flowing, keep the network up. That story is no longer enough. In a world of more frequent climate shocks, tighter capital, harder insurance markets, and sharper public scrutiny, the real question is not whether assets fail. The real question is how intelligently, how quickly, and at what total cost they recover.

That is why resilience KPIs are shifting from uptime to recovery value. Uptime tells you whether service was available. Recovery value tells you whether the organization is prepared for the world it actually operates in. It measures speed, cost, carbon, circularity, and durability together. It gives finance teams a better basis for capital allocation. It gives operations teams a clearer target for post-event performance. It gives sustainability teams proof that adaptation and mitigation are being managed together. It gives insurers, lenders, and public stakeholders something far more credible than general claims about preparedness.

The economic case is already strong. The World Bank’s work shows large net benefits from resilient infrastructure investment. UNDRR’s 2025 reporting shows the scale of loss the global economy is already carrying. Swiss Re’s catastrophe data shows how much of that burden remains uninsured. The organizations that win in this environment will not be the ones with the prettiest dashboard. They will be the ones that can show, with evidence, that their next recovery will be faster, cleaner, cheaper, and more durable than their last.

Connect

Your trusted partner for scrap metal procurement.

haroon@tdcventures.com

+1-307-655-7593
