Welcome to the metaprivBIDS documentation!
This documentation provides an overview of the metaprivBIDS Graphical User Interface, including installation instructions, basic usage, and a tutorial to get you started.
metaprivBIDS provides tools for data risk assessment, including the following methods:
- K-anonymity [1]
Checks whether each record in the dataset is indistinguishable from at least k − 1 other records with respect to a set of quasi-identifiers.
- ℓ-diversity [2]
Examines the diversity of the sensitive attribute. A dataset satisfies ℓ-diversity if each group of indistinguishable records contains at least ℓ different values for the sensitive attribute(s), which prevents easy inference of sensitive information.
- Sample Unique Detection Algorithm (SUDA) [3]
SUDA identifies records in a dataset that are unique on a combination of quasi-identifiers. It flags records with rare attribute combinations, which indicate a higher risk of re-identification.
- Personal Information Factor (PIF) [4]
The Personal Information Factor (PIF) measures the risk of re-identification by analyzing how a record’s quasi-identifiers deviate from the overall distribution in the dataset. A higher PIF suggests that the record’s combination of attributes is rare, making it more likely to be uniquely identifiable within the population.
- K-Global
K-Global attempts to capture each variable's contribution to K-anonymity in the context of all other quasi-identifiers. It does so by evaluating the change in the number of unique rows when a given variable is removed. Because variables with many distinct values (e.g. continuous variables) often produce more unique rows, the result is normalised by the number of unique values in the column. This penalises variables that have few unique values but a high impact on the number of unique rows.
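To make the definitions above concrete, here is a minimal pandas sketch of K-anonymity, ℓ-diversity, and a K-Global-style score. It is an illustration only, not the metaprivBIDS implementation or API: the function names (k_anonymity, l_diversity, unique_rows, k_global_sketch), the example columns, and the toy data are all hypothetical. The ℓ-diversity function assumes the simplest variant (counting distinct sensitive values per group), and the K-Global function is one possible reading of the description above (drop in unique rows when a variable is removed, divided by that variable's number of unique values).

```python
import pandas as pd


def k_anonymity(df, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    return int(df.groupby(quasi_identifiers, dropna=False).size().min())


def l_diversity(df, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any such group
    (assumption: the 'distinct' variant of l-diversity)."""
    return int(df.groupby(quasi_identifiers, dropna=False)[sensitive].nunique().min())


def unique_rows(df, columns):
    """Number of records whose combination of values in `columns` occurs exactly once."""
    if not columns:
        return 0
    sizes = df.groupby(columns, dropna=False).size()
    return int((sizes == 1).sum())


def k_global_sketch(df, quasi_identifiers):
    """Hypothetical K-Global-style score per variable: the drop in unique rows
    when the variable is removed, normalised by its number of unique values."""
    total = unique_rows(df, quasi_identifiers)
    scores = {}
    for col in quasi_identifiers:
        rest = [c for c in quasi_identifiers if c != col]
        drop = total - unique_rows(df, rest)
        scores[col] = drop / df[col].nunique()
    return scores


# Toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [25, 25, 30, 30, 41],
    "sex": ["F", "F", "M", "M", "F"],
    "diagnosis": ["A", "B", "A", "A", "C"],
})
qi = ["age", "sex"]

k = k_anonymity(df, qi)
l = l_diversity(df, qi, "diagnosis")
scores = k_global_sketch(df, qi)
print(k, l, scores)
```

In this toy table the record with age 41 is alone in its group, so the dataset is only 1-anonymous. Removing "age" eliminates that unique row while removing "sex" does not, so "age" receives the higher score.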
License
metaprivBIDS is licensed under the MIT License (https://opensource.org/licenses/MIT).