Migration Guide: Static to Dynamic Correlation

This guide helps you transition from the static correlation identifier system to the dynamic STAC-based correlation approach.

Overview

The Monty system is evolving from a static correlation identifier approach to a dynamic STAC query-based correlation system while maintaining backward compatibility.

What's Changing

Aspect	Old Approach	New Approach
Correlation Method	Pre-generated correlation ID	Dynamic STAC API queries
Flexibility	Fixed at ingestion time	Flexible at query time
Multi-hazard Support	Limited	Comprehensive
Query Complexity	Simple ID match	Complex CQL2 filters
Performance	Very fast (direct ID lookup)	Fast (indexed queries)

What's NOT Changing

The monty:corr_id field remains available and will continue to be populated
All existing APIs and integrations continue to work
The data model structure (events, hazards, impacts) remains the same

Why Migrate?

Benefits of Dynamic Correlation

Flexibility: Adjust correlation criteria based on your specific needs
Different time windows for different hazard types
Variable spatial tolerances
Custom hazard code mapping
Multi-hazard Events: Better support for complex disasters
Link cascading hazards (earthquake → landslide → flooding)
Track concurrent hazards (drought + heatwave)
Identify secondary impacts
Cross-classification Support: Work with multiple hazard taxonomies
Query using GLIDE, EM-DAT, or UNDRR-ISC 2025 codes
Find events regardless of which classification was used
No Pre-computation: Correlation happens at query time
No need to wait for correlation ID generation
Can re-correlate data with different criteria
Transparency: See exactly how items are being correlated
Understand the logic
Customize for specific use cases

When to Use Each Approach

Use Static Correlation IDs when: - ✅ You need maximum performance - ✅ You need deterministic, reproducible results - ✅ You're working with already-correlated datasets - ✅ You need simple integration

Use Dynamic Queries when: - ✅ You need flexibility in correlation criteria - ✅ You're analyzing multi-hazard events - ✅ You're working across multiple hazard classifications - ✅ You need to correlate new/streaming data - ✅ You want to explore different correlation strategies

Migration Path

Phase 1: Dual Support (Current)

Status: Both systems are fully supported

What you can do: - Continue using monty:corr_id for all existing workflows - Start experimenting with dynamic queries - Compare results between both approaches - Test performance for your use cases

Example - Finding Related Events:

# Old approach (still works)
def find_related_old(client, corr_id):
    return client.search(
        filter={
            "op": "=",
            "args": [{"property": "monty:corr_id"}, corr_id]
        },
        filter_lang="cql2-json"
    )

# New approach (now available)
def find_related_new(client, country, hazards, start_date, end_date):
    return client.search(
        filter={
            "op": "and",
            "args": [
                {"op": "a_contains", "args": [{"property": "monty:country_codes"}, country]},
                {"op": "a_overlaps", "args": [{"property": "monty:hazard_codes"}, hazards]},
                {"op": "t_intersects", "args": [{"property": "datetime"}, {"interval": [start_date, end_date]}]}
            ]
        },
        filter_lang="cql2-json"
    )

# Hybrid approach (recommended for migration)
def find_related_hybrid(client, corr_id=None, country=None, hazards=None, dates=None):
    """
    Try correlation ID first, fall back to dynamic queries.
    """
    if corr_id:
        results = find_related_old(client, corr_id)
        items = list(results.items())
        if items:
            return items

    # Fall back to dynamic correlation
    if country and hazards and dates:
        return list(find_related_new(client, country, hazards, dates[0], dates[1]).items())

    return []

Phase 2: Enhanced Dynamic Queries (Next 6-12 months)

Planned enhancements: - Optimized indexes for common query patterns - Query templates for standard correlation scenarios - Helper libraries in multiple languages - Performance monitoring and optimization

What you should do: - Start using dynamic queries for new integrations - Gradually migrate existing workflows - Provide feedback on query performance - Report any issues or limitations

Phase 3: Deprecation Notice (12-24 months)

Future considerations: - monty:corr_id will remain available but may not be the primary method - Documentation will emphasize dynamic queries - New features will be optimized for dynamic correlation - Static IDs will still be generated for backward compatibility

No breaking changes planned - existing code will continue to work.

Practical Migration Examples

Example 1: Simple Event Lookup

Before (using correlation ID):

def get_event_data(client, corr_id):
    search = client.search(
        filter={
            "op": "=",
            "args": [{"property": "monty:corr_id"}, corr_id]
        },
        filter_lang="cql2-json"
    )
    return list(search.items())

# Usage
items = get_event_data(client, "20241027-ESP-FL-2-GCDB")

After (using dynamic queries):

def get_event_data(client, country, hazard_type, event_date, tolerance_days=1):
    from datetime import datetime, timedelta

    # Parse event date
    dt = datetime.fromisoformat(event_date.replace('Z', '+00:00'))
    start = dt - timedelta(days=tolerance_days)
    end = dt + timedelta(days=tolerance_days)

    search = client.search(
        filter={
            "op": "and",
            "args": [
                {"op": "a_contains", "args": [{"property": "monty:country_codes"}, country]},
                {"op": "a_contains", "args": [{"property": "monty:hazard_codes"}, hazard_type]},
                {
                    "op": "t_intersects",
                    "args": [
                        {"property": "datetime"},
                        {"interval": [start.isoformat(), end.isoformat()]}
                    ]
                }
            ]
        },
        filter_lang="cql2-json"
    )
    return list(search.items())

# Usage
items = get_event_data(client, "ESP", "MH0600", "2024-10-27T00:00:00Z")

Hybrid Approach (recommended during migration):

def get_event_data(client, country, hazard_type, event_date, corr_id=None, tolerance_days=1):
    # Try correlation ID first if available
    if corr_id:
        search = client.search(
            filter={
                "op": "=",
                "args": [{"property": "monty:corr_id"}, corr_id]
            },
            filter_lang="cql2-json"
        )
        items = list(search.items())
        if items:
            return items

    # Fall back to dynamic queries
    return get_event_data_dynamic(client, country, hazard_type, event_date, tolerance_days)

Example 2: Impact Analysis

Before:

def get_impacts(client, corr_id):
    search = client.search(
        collections=["gdacs-impacts", "emdat-impacts"],
        filter={
            "op": "and",
            "args": [
                {"op": "=", "args": [{"property": "monty:corr_id"}, corr_id]},
                {"op": "in", "args": ["impact", {"property": "roles"}]}
            ]
        },
        filter_lang="cql2-json"
    )
    return list(search.items())

After:

def get_impacts(client, country, hazard_codes, start_date, end_date, impact_types=None):
    filter_args = [
        {"op": "a_contains", "args": [{"property": "monty:country_codes"}, country]},
        {"op": "a_overlaps", "args": [{"property": "monty:hazard_codes"}, hazard_codes]},
        {
            "op": "t_during",
            "args": [
                {"property": "datetime"},
                {"interval": [start_date, end_date]}
            ]
        },
        {"op": "in", "args": ["impact", {"property": "roles"}]}
    ]

    search = client.search(
        collections=["gdacs-impacts", "emdat-impacts", "idmc-gidd-impacts"],
        filter={
            "op": "and",
            "args": filter_args
        },
        filter_lang="cql2-json"
    )
    return list(search.items())

Example 3: Multi-Hazard Events

This scenario was difficult with static IDs:

# New capability: Find all events where flooding occurred after an earthquake
def find_cascading_events(client, country, primary_event_date, time_window_hours=72):
    from datetime import datetime, timedelta

    # First find earthquakes
    eq_date = datetime.fromisoformat(primary_event_date.replace('Z', '+00:00'))

    earthquakes = client.search(
        filter={
            "op": "and",
            "args": [
                {"op": "a_contains", "args": [{"property": "monty:country_codes"}, country]},
                {"op": "a_contains", "args": [{"property": "monty:hazard_codes"}, "GH0101"]},
                {
                    "op": "t_intersects",
                    "args": [
                        {"property": "datetime"},
                        {"interval": [
                            (eq_date - timedelta(hours=1)).isoformat(),
                            (eq_date + timedelta(hours=1)).isoformat()
                        ]}
                    ]
                },
                {"op": "in", "args": ["event", {"property": "roles"}]}
            ]
        },
        filter_lang="cql2-json"
    )

    # Then find subsequent flooding
    floods = client.search(
        filter={
            "op": "and",
            "args": [
                {"op": "a_contains", "args": [{"property": "monty:country_codes"}, country]},
                {"op": "a_overlaps", "args": [{"property": "monty:hazard_codes"}, ["MH0600", "MH0603"]]},
                {
                    "op": "t_during",
                    "args": [
                        {"property": "datetime"},
                        {"interval": [
                            eq_date.isoformat(),
                            (eq_date + timedelta(hours=time_window_hours)).isoformat()
                        ]}
                    ]
                },
                {"op": "in", "args": ["event", {"property": "roles"}]}
            ]
        },
        filter_lang="cql2-json"
    )

    return {
        "earthquakes": list(earthquakes.items()),
        "subsequent_floods": list(floods.items())
    }

Performance Considerations

Query Optimization Tips

Use Indexed Fields First:

# Good - country and datetime are indexed
{
    "op": "and",
    "args": [
        {"op": "a_contains", "args": [{"property": "monty:country_codes"}, "ESP"]},
        {"op": "t_after", "args": [{"property": "datetime"}, "2024-01-01"]}
    ]
}

Limit Result Sets:

search = client.search(
    filter=filter_dict,
    filter_lang="cql2-json",
    limit=100  # Always set a reasonable limit
)

Use Collections Parameter:

# Good - limits search to specific collections
search = client.search(
    collections=["gdacs-events", "glide-events"],
    filter=filter_dict
)

# Less efficient - searches all collections
search = client.search(filter=filter_dict)

Cache Common Queries:

from functools import lru_cache
from datetime import timedelta

@lru_cache(maxsize=100)
def get_events_cached(country, hazard, date_str):
    return get_events(client, country, hazard, date_str)

Performance Comparison

Operation	Static ID	Dynamic Query	Notes
Single event lookup	~50ms	~100ms	Dynamic query adds overhead
Related items (10-50)	~100ms	~150ms	Minimal difference
Complex multi-hazard	N/A	~500ms	New capability
Regional analysis (100s)	~500ms	~800ms	Dynamic query more flexible

Testing Your Migration

Validation Checklist

Test basic event lookup with both methods
Verify results match between static and dynamic approaches
Measure query performance for your use cases
Test edge cases (multi-country, multi-hazard events)
Validate data completeness
Update documentation and code comments
Train team members on new approach

Sample Test Code

def validate_migration(client, test_corr_id, expected_criteria):
    """
    Validate that dynamic queries return the same results as static ID.
    """
    # Get results using correlation ID
    static_results = client.search(
        filter={"op": "=", "args": [{"property": "monty:corr_id"}, test_corr_id]},
        filter_lang="cql2-json"
    )
    static_items = {item.id for item in static_results.items()}

    # Get results using dynamic query
    dynamic_filter = {
        "op": "and",
        "args": [
            {"op": "a_contains", "args": [{"property": "monty:country_codes"}, expected_criteria["country"]]},
            {"op": "a_overlaps", "args": [{"property": "monty:hazard_codes"}, expected_criteria["hazards"]]},
            {
                "op": "t_intersects",
                "args": [
                    {"property": "datetime"},
                    {"interval": expected_criteria["date_range"]}
                ]
            }
        ]
    }
    dynamic_results = client.search(
        filter=dynamic_filter,
        filter_lang="cql2-json"
    )
    dynamic_items = {item.id for item in dynamic_results.items()}

    # Compare
    print(f"Static ID returned: {len(static_items)} items")
    print(f"Dynamic query returned: {len(dynamic_items)} items")
    print(f"Match: {static_items == dynamic_items}")

    if static_items != dynamic_items:
        print(f"Only in static: {static_items - dynamic_items}")
        print(f"Only in dynamic: {dynamic_items - static_items}")

    return static_items == dynamic_items

# Test
validate_migration(
    client,
    "20241027-ESP-FL-2-GCDB",
    {
        "country": "ESP",
        "hazards": ["MH0600", "FL", "nat-hyd-flo-flo"],
        "date_range": ["2024-10-27T00:00:00Z", "2024-10-28T23:59:59Z"]
    }
)

Getting Help

Resources

Support Channels

GitHub Issues: monty-stac-extension
IFRC GO Platform: go.ifrc.org
Development Seed: developmentseed.org

Common Issues

Issue: Queries timing out
Solution: Add more specific filters, reduce time windows, or use collection parameter

Issue: Missing results
Solution: Check hazard code compatibility across classification systems

Issue: Too many results
Solution: Tighten temporal or spatial criteria

Issue: Array operators not working
Solution: Verify you're using a_contains, a_overlaps etc., not regular equality operators

Next Steps

Experiment: Try the examples in correlation_examples.md
Test: Run validation tests on your data
Measure: Compare performance for your use cases
Migrate: Gradually move to hybrid approach
Optimize: Tune queries based on your needs
Feedback: Report issues and suggestions

The migration is designed to be gradual and non-breaking. Take your time to experiment and find what works best for your use case.