add_outlier5std_column calculates and appends outlier flags for specified
indicators in a dataset. An outlier is defined as a value falling outside the
range defined by the median ± 5 times the Median Absolute Deviation (MAD).
This method is applied on a per-group basis, allowing flexibility for grouping
by columns like district.
Value
A tibble with additional columns for each indicator, named
{indicator}_outlier5std, containing a value of 1 if the observation is
an outlier and 0 otherwise.
Details
Median and MAD Calculation: For each indicator, the median and MAD are calculated using data from years prior to the most recent year in the dataset. The last year is excluded to prevent potential contamination of the baseline statistics.
Outlier Definition: A value is flagged as an outlier if it falls outside the range from Lower Bound = median - 5 * MAD to Upper Bound = median + 5 * MAD.
Grouping: Calculations are performed separately for each group specified by the group_by parameter.
Generated Columns: For each indicator, the function generates an outlier flag column named {indicator}_outlier5std. The column contains 1 if the value is an outlier and 0 otherwise.
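The steps described in Details can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name flag_outliers_mad, the year column name, and the toy indicator anc1 are all hypothetical. It also assumes that the MAD is computed with stats::mad() and its default scaling constant (1.4826), which the text above does not specify.

```r
# Hypothetical helper illustrating the median +/- 5 * MAD rule.
# Assumptions (not confirmed by the documentation above):
#   - the year column is called "year"
#   - MAD is stats::mad() with its default scaling constant of 1.4826
flag_outliers_mad <- function(data, indicator, group_col,
                              year_col = "year", k = 5) {
  # The most recent year in the whole dataset is excluded from the baseline.
  last_year <- max(data[[year_col]], na.rm = TRUE)
  flag <- integer(nrow(data))

  for (g in unique(data[[group_col]])) {
    in_group <- data[[group_col]] == g

    # Baseline statistics use only years before the most recent one.
    baseline <- data[[indicator]][in_group & data[[year_col]] < last_year]
    med <- median(baseline, na.rm = TRUE)
    dev <- mad(baseline, na.rm = TRUE)

    # Flag every observation in the group, including the last year.
    # Missing values stay NA rather than being flagged.
    x <- data[[indicator]][in_group]
    flag[in_group] <- as.integer(x < med - k * dev | x > med + k * dev)
  }

  data[[paste0(indicator, "_outlier5std")]] <- flag
  data
}

# Toy data: two districts, five years, one made-up indicator.
toy <- data.frame(
  district = rep(c("A", "B"), each = 5),
  year     = rep(2019:2023, times = 2),
  anc1     = c(100, 102, 98, 101, 300, 50, 52, 49, 51, 50)
)

result <- flag_outliers_mad(toy, indicator = "anc1", group_col = "district")
print(result)
```

In this toy data only the 2023 value for district A (300) falls outside its group's bounds, so it is the only row flagged with 1. If the package uses an unscaled MAD instead, the bounds would be narrower than in this sketch.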
See also
add_mad_med_columns() for computing and appending the median and MAD columns.