calculate_outliers_summary provides an annual overview of data quality by
summarizing extreme outliers for health indicators. Outliers are identified
based on robust statistical metrics (Median Absolute Deviation, MAD) and
flagged when they deviate significantly (beyond five MADs from the median).
Usage
calculate_outliers_summary(
  .data,
  admin_level = c("national", "adminlevel_1", "district"),
  include_year = TRUE
)
Arguments
- .data
A data frame with district-level health indicators. It must include a district column and a year column, along with the indicator columns used for calculating outliers. Outlier flags should be computed beforehand and stored in columns named with the suffix _outlier5std (e.g., anc1_outlier5std, where 1 indicates an outlier and 0 a non-outlier).
- admin_level
Character. The administrative level at which to summarize outliers. Must be one of "national", "adminlevel_1" or "district".
- include_year
Logical. Whether to include the year. Defaults to TRUE.
Value
A cd_outliers_summary object (tibble) with:
- Each indicator's non-outlier percentage (_outlier5std columns).
- Overall non-outlier summaries across all indicators, vaccination indicators, and tracers.
Details
Outlier Detection: Outliers are identified using Hampel’s robust X84 method, which is based on the Median Absolute Deviation (MAD). A value is flagged when it lies more than five MADs from the median. Because the median and the MAD are themselves insensitive to extreme values, a few extreme observations have little effect on the thresholds.
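The flagging rule can be illustrated with a minimal base R sketch. The helper flag_mad_outliers and the toy anc1 values are hypothetical and not part of the package. Whether the package scales the MAD (for example by the usual 1.4826 consistency constant) is not stated on this page, so the unscaled MAD used here is an assumption.

```r
# Hypothetical helper (not part of the package): flag values that lie more
# than `k` MADs from the median. Returns 1 for outliers and 0 otherwise.
flag_mad_outliers <- function(x, k = 5, constant = 1) {
  med <- median(x, na.rm = TRUE)
  # constant = 1 gives the raw, unscaled MAD; this choice is an assumption.
  mad_value <- mad(x, center = med, constant = constant, na.rm = TRUE)
  as.integer(abs(x - med) > k * mad_value)
}

# Made-up monthly counts for one indicator; the last value is extreme.
anc1 <- c(100, 102, 98, 101, 99, 103, 97, 500)
anc1_outlier5std <- flag_mad_outliers(anc1)
print(anc1_outlier5std)
```

In this toy series the median is 100.5 and the unscaled MAD is 2, so only values further than 10 from the median are flagged.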
Annual Non-Outlier Rate: For each indicator and each year, the function calculates the percentage of non-outliers. Additionally, the function aggregates the non-outlier rates across all indicators, as well as vaccination-only and tracer-only indicators, providing an overall data quality summary.
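As a rough illustration of the annual non-outlier rate, the sketch below computes the percentage of non-outliers per year from precomputed flag columns using base R. The data frame, its values, and the indicator names are made up for illustration. Taking the overall figure as the simple mean of the per-indicator rates is an assumption; the package may aggregate differently.

```r
# Toy input with the columns described above; all values are made up.
flags <- data.frame(
  district = c("A", "B", "C", "D", "A", "B", "C", "D"),
  year = c(2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023),
  anc1_outlier5std = c(0, 0, 1, 0, 0, 0, 0, 0),
  penta3_outlier5std = c(0, 1, 1, 0, 0, 0, 0, 1)
)

# Pick up every flag column by its suffix.
flag_cols <- grep("_outlier5std$", names(flags), value = TRUE)

# Percentage of non-outliers per year: 100 * mean(1 - flag).
non_outlier_rate <- aggregate(
  flags[flag_cols],
  by = list(year = flags$year),
  FUN = function(x) 100 * mean(1 - x, na.rm = TRUE)
)

# Overall rate across indicators, taken here as the simple row mean
# (an assumption about how the indicators are combined).
non_outlier_rate$overall <- rowMeans(non_outlier_rate[flag_cols])
print(non_outlier_rate)
```

The same pattern could be restricted to a subset of flag_cols to obtain vaccination-only or tracer-only summaries.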