Calculate Outliers Summary by Year — calculate_outliers

calculate_outliers_summary provides an annual overview of data quality by summarizing extreme outliers for health indicators. Outliers are identified with a robust statistic, the Median Absolute Deviation (MAD), and a value is flagged when it lies more than five MADs from the median.

Usage

calculate_outliers_summary(
  .data,
  admin_level = c("national", "adminlevel_1", "district"),
  include_year = TRUE
)

Arguments

.data: A data frame with district-level health indicators. It must include district and year columns, along with the indicator columns used for calculating outliers. Outlier flags must be computed beforehand and stored in columns named with the suffix _outlier5std (e.g., anc1_outlier5std), where 1 indicates an outlier and 0 a non-outlier.
admin_level: Character. The administrative level at which to summarize outliers. Must be one of "national", "adminlevel_1" or "district".
include_year: Logical. Whether to include the year in the summary. Defaults to TRUE.
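
As a rough illustration of the input shape described above, the sketch below builds a small data frame in base R. The district and year columns and the _outlier5std suffix come from the argument description; the adminlevel_1 column, the anc1 indicator values, and the object name example_data are assumptions made up for this example.

```r
# Hypothetical input: two districts observed over two years.
# All values are invented for illustration only.
example_data <- data.frame(
  adminlevel_1     = c("Region A", "Region A", "Region B", "Region B"), # assumed column
  district         = c("D1", "D1", "D2", "D2"),
  year             = c(2021L, 2022L, 2021L, 2022L),
  anc1             = c(120, 135, 98, 940),   # raw indicator values
  anc1_outlier5std = c(0L, 0L, 0L, 1L)       # 1 = outlier, 0 = non-outlier
)

# Columns carrying precomputed outlier flags, found by their suffix
outlier_cols <- grep("_outlier5std$", names(example_data), value = TRUE)
```

Checking for the suffix this way is one simple means of confirming that the flags are in place before calling the function.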

Value

A cd_outliers_summary object (tibble) with:

Each indicator's non-outlier percentage (_outlier5std columns).
Overall non-outlier summaries across all indicators, vaccination indicators, and tracers.

Details

Outlier Detection: Outliers are identified with Hampel’s robust X84 method, which is based on the Median Absolute Deviation (MAD). A value is flagged when it lies more than five MADs from the median. Because the median and the MAD are themselves resistant to extreme values, a few very large or very small observations have little effect on the thresholds.
Annual Non-Outlier Rate: For each indicator and each year, the function calculates the percentage of non-outliers. Additionally, the function aggregates the non-outlier rates across all indicators, as well as vaccination-only and tracer-only indicators, providing an overall data quality summary.
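
The two steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper flag_mad_outliers() is hypothetical, the data are invented, grouping is by year only, and the raw (unscaled) MAD is assumed via constant = 1. The package may scale the MAD or group the data differently.

```r
# Hypothetical helper: flag values more than k MADs from the median.
# constant = 1 gives the raw MAD; this scaling is an assumption.
flag_mad_outliers <- function(x, k = 5) {
  med <- median(x, na.rm = TRUE)
  mad_raw <- mad(x, center = med, constant = 1, na.rm = TRUE)
  as.integer(abs(x - med) > k * mad_raw)
}

# Invented values for one indicator
values <- c(100, 102, 98, 101, 99, 400)
flags  <- flag_mad_outliers(values)

# Spread the invented flags over two years
toy <- data.frame(
  year = c(2021, 2021, 2021, 2022, 2022, 2022),
  anc1_outlier5std = flags
)

# Percentage of non-outliers per year: mean of (1 - flag) * 100
non_outlier_pct <- aggregate(
  anc1_outlier5std ~ year,
  data = toy,
  FUN = function(f) mean(1 - f) * 100
)
```

With these toy values the median is 100.5 and the raw MAD is 1.5, so only 400 is flagged. The 2021 rows are then 100% non-outliers and the 2022 rows about 66.7%.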

Examples

if (FALSE) { # \dontrun{
  # Check for extreme outliers in indicator data
  calculate_outliers_summary(data)
} # }