Correlation Identifier
Note: This document describes the legacy static correlation identifier system. For the new dynamic STAC-based correlation approach, see:
- STAC-Based Correlation Algorithms: Main algorithm documentation
- Correlation Examples: Practical examples with code
- Migration Guide: How to transition from static to dynamic correlation
The static correlation identifier described below is still generated and remains available for backward compatibility, but the new dynamic approach offers greater flexibility and better support for multi-hazard events.
This page describes the legacy algorithm used to generate the static correlation identifier for items in the Monty system.
Context
The elements of context below are based on the article "Connecting the dots: the importance of recognising multi-hazard events in disaster reporting".
In the past year, the world has experienced numerous severe disasters caused by multiple overlapping hazards. In February 2023, two severe earthquakes struck Syria and Turkey in quick succession, followed by two more powerful earthquakes and over a hundred aftershocks in the subsequent weeks. This disaster resulted in over 48,000 fatalities, with many people still missing. Similarly, in June 2022, Afghanistan was hit by an earthquake while already grappling with a multi-year drought, which was then followed by extreme rainfall and flooding in August. Pakistan also faced a series of unfortunate events throughout 2022. After a period of drought, a heatwave-induced glacier melt combined with a heavy rain season led to devastating flooding, landslides, and disease outbreaks. Additionally, the country contended with severe wildfires in May and June 2022.
The United Nations Office for Disaster Risk Reduction defines such complex events as multi-hazard events involving the simultaneous or sequential occurrence of two or more hazards and their potentially interrelated impacts. For example, the flash floods in Pakistan were likely exacerbated by the wildfires in other regions. Due to connections and feedback between multiple events, the combined impact of a multi-hazard event can be different from the sum of the impacts of multiple individual disasters.
This requires a shift from a fragmented to a more comprehensive approach to disaster reporting, including more frequent and consistent use of multi-risk terminology and adopting a broader definition of multi-hazard events. Improved recognition of multi-hazard events in disaster reporting will provide a more accurate representation of such complex disasters and their impacts, enhancing public awareness and understanding of multi-hazard events. Additionally, more attention to interconnected natural hazards in the reporting of official disaster data can support disaster risk science and management efforts.
Monty is a system that aims to address this challenge by providing a comprehensive and consistent approach to disaster data collection, management, and reporting. To that end, the Monty model is designed to support the identification and tracking of events, hazards, impacts, and responses from various sources. It must be flexible enough to cope with the heterogeneity of the data sources and their different representations of disaster data.
Monty Approach: correlation identifier
The Monty system uses a correlation identifier to link the items in the system. This identifier is generated based on the metadata collected from the data sources.
The Monty database collects four types of items that cover most of the disaster data from various sources:
- Event item: A document that represents a disaster event that has occurred or is forecasted to occur. This document contains information about the event, such as its date, time, and location. It also identifies the hazards involved in the event by referencing hazard codes and linking to the related hazard items.
- Hazard item: A document that represents a process, phenomenon or human activity that may cause loss of life, injury or other health impacts. This document contains the information about the hazard, such as the name, description, a unique hazard code and severity level or magnitude.
- Impact item: A document that represents the impact of a hazard on an event. This document contains the information about the impact, such as the number of fatalities, injuries, and affected people.
- Response item: TBD
Important
Every source produces at least event items. They are necessary to provide the context for the other items. As described in the next section, the correlation identifier is generated based on the event item.
Event Pairing Algorithm
For any type of item, there is a link to an event that represents the context of the item. The event pairing algorithm is used to generate a correlation identifier for the items based on the event metadata. The correlation identifier is used to link the items in the system.
The event pairing algorithm is a function that takes the following parameters:
- Hazard codes: An array with the codes of the event hazards based on different hazard classification systems. The primary hazard code from the 2025 UNDRR-ISC Hazard Information Profiles is used as the main hazard identifier.
- Country code: The country code of the related event in the ISO 3166-1 alpha-3 format.
- Date and Time: The date and time of the related event.
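As an illustration, the three parameters can be pictured as a small input record. This is a minimal sketch only: the `EventKey` class and its field names are hypothetical and are not taken from the pystac.monty module, and the date in the example is an illustrative value. The validation checks only the shape of the country code (three uppercase letters), not whether the code is actually assigned in ISO 3166-1.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import re


@dataclass(frozen=True)
class EventKey:
    """Hypothetical container for the inputs of the event pairing algorithm."""

    hazard_codes: list[str]   # primary hazard code first
    country_code: str         # ISO 3166-1 alpha-3 format, e.g. "PAK"
    event_datetime: datetime  # date and time of the related event

    def __post_init__(self) -> None:
        if not self.hazard_codes:
            raise ValueError("at least one hazard code is required")
        # Shape check only: exactly three uppercase ASCII letters.
        if not re.fullmatch(r"[A-Z]{3}", self.country_code):
            raise ValueError(f"not an alpha-3 country code: {self.country_code!r}")


# Illustrative values: a river flood (MH0600) in Pakistan.
key = EventKey(["MH0600"], "PAK", datetime(2022, 8, 25, tzinfo=timezone.utc))
```

Rejecting malformed input early keeps badly formed codes out of any identifier built from this record.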
The algorithm returns a string with the event id. The event id is a string with the following format:
Where:
- {datetime}: The date and time of the event in the format YYYYMMDD or YYYYMMDDThhmmssZ (ISO 8601).
- {country_code}: The country code of the related event in the ISO 3166-1 alpha-3 format.
- {hazard_code}: The primary hazard code from the 2025 UNDRR-ISC Hazard Information Profiles (e.g., MH0600 for River Flood, GH0101 for Earthquake). For multi-hazard events, the first hazard code in the array is used. Legacy codes may use simplified forms (e.g., FL for flood).
- {episode_number}: A number that represents the episode of the event. This number is used to differentiate between events that have the same date, country, and hazard code. The episode number starts at 1 and is incremented by 1 for each new event with the same date, country, and hazard.
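The way the four components combine can be sketched as follows. This is not the pystac.monty reference implementation: the `CorrelationIdGenerator` class and the `to_utc` helper are hypothetical, the hyphen separator is an assumption (the separator is not stated in this section), and the in-memory counter treats every call as a new event, whereas a real system would need a persistent way to decide whether an event is new.

```python
from collections import defaultdict
from datetime import datetime, timezone

SEPARATOR = "-"  # assumption: the separator is not specified in this section


def to_utc(dt: datetime) -> datetime:
    """Convert aware datetimes to UTC; naive ones are assumed to be UTC already."""
    return dt.astimezone(timezone.utc) if dt.tzinfo is not None else dt


class CorrelationIdGenerator:
    """Hypothetical helper; not the pystac.monty implementation."""

    def __init__(self) -> None:
        # Episode counters keyed by (date, country code, primary hazard code).
        self._episodes = defaultdict(int)

    def generate(self, hazard_codes, country_code, dt, with_time=False):
        dt = to_utc(dt)
        primary = hazard_codes[0]  # the first code in the array is the primary one
        key = (dt.strftime("%Y%m%d"), country_code, primary)
        self._episodes[key] += 1  # first event gets 1, the next one 2, and so on
        stamp = dt.strftime("%Y%m%dT%H%M%SZ" if with_time else "%Y%m%d")
        return SEPARATOR.join([stamp, country_code, primary, str(self._episodes[key])])


generator = CorrelationIdGenerator()
quake_day = datetime(2023, 2, 6, tzinfo=timezone.utc)
first_id = generator.generate(["GH0101"], "TUR", quake_day)
second_id = generator.generate(["GH0101"], "TUR", quake_day)
print(first_id)   # 20230206-TUR-GH0101-1
print(second_id)  # 20230206-TUR-GH0101-2
```

Keying the counter on the date rather than the full timestamp follows the description of the episode number above, which distinguishes events sharing the same date, country, and hazard.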
A reference implementation is provided with the pystac.monty module. TBD