Resources for Statistical Disclosure Control

This page contains a range of resources that we have developed during various projects with the UK Office for National Statistics.

Example datasets used to develop and test cell suppression algorithms.

Code

Statistical Disclosure Control Frameworks

We have recently developed a completely new open-source framework for solving large-scale statistical disclosure control problems in parallel. It is intended to support a range of different problems and mathematical models. It currently implements the incremental and grouping algorithms for solving the Cell Suppression Problem, using modifications of Fischetti and Salazar’s incremental linear model. It also has built-in links to a range of hypergraph partitioning algorithms for tackling very large problems. We are keen to work with the community to validate the framework and to extend it to new problems and to new formulations of the Cell Suppression Problem. Please contact Jim Smith (james.smith@uwe.ac.uk) if you would like access to this framework.
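To illustrate why hypergraph partitioning fits this setting, the Python sketch below shows one natural encoding. It is our assumption, not a description of the framework's internals: each table cell (including marginal totals) is a vertex, and each additivity equation is a hyperedge containing the cells it involves. The helper names `table_hypergraph` and `cut_hyperedges` are hypothetical. A partitioner would try to split the cells into balanced blocks while keeping the number of cut hyperedges (equations spanning more than one block) small, so that the blocks can be worked on largely independently.

```python
from itertools import product


def table_hypergraph(n_rows, n_cols):
    """Hypothetical encoding of an n_rows x n_cols table with margins.

    Cell (r, c) with r == n_rows or c == n_cols is a marginal total.
    Each row equation and each column equation becomes one hyperedge
    holding every cell that appears in that equation.
    """
    cells = list(product(range(n_rows + 1), range(n_cols + 1)))
    # Row r: its interior cells plus its row total (r, n_cols).
    edges = [frozenset((r, c) for c in range(n_cols + 1)) for r in range(n_rows + 1)]
    # Column c: its interior cells plus its column total (n_rows, c).
    edges += [frozenset((r, c) for r in range(n_rows + 1)) for c in range(n_cols + 1)]
    return cells, edges


def cut_hyperedges(edges, part):
    """Count hyperedges whose cells are assigned to more than one block."""
    return sum(1 for e in edges if len({part[v] for v in e}) > 1)


cells, edges = table_hypergraph(2, 3)
# Assign columns 0-1 to block 0 and columns 2-3 (incl. row totals) to block 1.
part = {v: 0 if v[1] < 2 else 1 for v in cells}
print(len(cells), len(edges), cut_hyperedges(edges, part))  # 12 7 3
```

Splitting a 2 x 3 table between its columns cuts every row equation but no column equation, which is the quantity a partitioner would aim to minimise.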

Surrogate Models and Incremental Genetic Algorithm used for GECCO 2016 paper

The code used to run the experiments described in our GECCO 2016 paper may be found here. This includes a version of the genetic algorithm which attempts to find the best sequence in which to solve the Cell Suppression Problem using the incremental approach.
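The idea of searching over solution orders can be sketched as a permutation genetic algorithm. This is a minimal sketch, not the code from the paper: the cost function `toy_cost` is a hypothetical stand-in (it simply counts inversions), whereas in practice each evaluation would involve running the incremental Cell Suppression solver in the given order or, as the heading suggests, consulting a surrogate model. The operators (a slice-preserving order crossover, swap mutation and truncation selection) are common textbook choices and are assumptions here.

```python
import random


def toy_cost(order):
    """Hypothetical stand-in cost: number of inversions relative to 0..n-1.

    In the real problem this would come from solving the Cell Suppression
    Problem incrementally in this order (or from a surrogate model).
    """
    n = len(order)
    return sum(1 for a in range(n) for b in range(a + 1, n) if order[a] > order[b])


def order_crossover(p1, p2, rng):
    """Copy a random slice of p1, then fill the gaps with p2's remaining items in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1])
    fill = iter(g for g in p2 if g not in kept)
    return [g if g is not None else next(fill) for g in child]


def swap_mutation(perm, rng, rate):
    """With probability `rate`, swap two randomly chosen positions."""
    perm = list(perm)
    if rng.random() < rate:
        a, b = rng.sample(range(len(perm)), 2)
        perm[a], perm[b] = perm[b], perm[a]
    return perm


def evolve(n_items, cost, pop_size=20, generations=50, mutation_rate=0.3, seed=0):
    """Truncation-selection GA over orderings; the better half survives each generation."""
    rng = random.Random(seed)
    pop = [rng.sample(range(n_items), n_items) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            child = order_crossover(p1, p2, rng)
            children.append(swap_mutation(child, rng, mutation_rate))
        pop = parents + children
    return min(pop, key=cost)


best = evolve(8, toy_cost)
print(best, toy_cost(best))
```

Because the parents survive unchanged, the best ordering found so far is never lost between generations.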

Older Code

UWE_ExternalAttackerOnJJ.exe: This program uses an open-source COIN-OR solver to find the upper and lower calculable bounds for each primary cell (there is no need to install a separate solver). When it is run from a batch file, the first parameter is the name of the .jj file to be validated and the second is a tag (any text string used to identify the test). The output is written to the file intermediate.txt, where E is the number of exposed primary cells, Ef the number whose values can be calculated exactly, and Ep the number whose values can be calculated to within their protection limits. The corresponding output files list the cell numbers of the exposed primary cells. This program is best used on tables with at most 40,000 cells.
UWE_ExternalUnpickerOnJJ.exe: This program uses an unpicking algorithm to find the upper and lower calculable bounds for each primary cell (there is no need to install a separate solver). When it is run from a batch file, the first parameter is the name of the .jj file to be validated and the second is a tag (any text string used to identify the test). The output is written to the file intermediate.txt, where Eu is the number of primary cells that can be unpicked, Euf the number whose values can be unpicked exactly, and Eup the number whose values can be unpicked to within their protection limits. The corresponding output files list the cell numbers of the exposed primary cells. This program will not identify every primary cell that can be exposed, only those that can be unpicked. It can validate tables with over 1,000,000 cells and runs very quickly.
If you would like a copy of any of these programs, please email Martin Serpell (martin 2 dot serpell at uwe dot ac dot uk) or Jim Smith (james dot smith at uwe dot ac dot uk), as we wish to keep a record of who is using them.
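Once the calculable bounds for each primary cell are known, summarising them into counts such as E, Ef and Ep is straightforward. The Python sketch below assumes the common convention that a primary cell with value v, upper protection level UPL and lower protection level LPL is protected only if its calculable upper bound is at least v + UPL and its calculable lower bound is at most v - LPL. It also assumes that Ep counts exposed cells that are not exact, so E = Ef + Ep; the programs' own definitions may differ. The `PrimaryCell` fields are hypothetical and do not describe the .jj file format. The same counting would apply to Eu, Euf and Eup from the unpicker.

```python
from dataclasses import dataclass


@dataclass
class PrimaryCell:
    cell_id: int
    value: float       # true (published-before-suppression) value
    lower_pl: float    # lower protection level (hypothetical field name)
    upper_pl: float    # upper protection level (hypothetical field name)
    calc_lower: float  # attacker's calculable lower bound
    calc_upper: float  # attacker's calculable upper bound


def classify(cells, tol=1e-9):
    """Split primary cells into exactly calculable and partially exposed ones."""
    exact, partial = [], []
    for c in cells:
        if c.calc_upper - c.calc_lower <= tol:
            # The bounds collapse, so the value is known exactly.
            exact.append(c.cell_id)
        elif (c.calc_upper < c.value + c.upper_pl - tol
              or c.calc_lower > c.value - c.lower_pl + tol):
            # The interval is narrower than the protection levels require.
            partial.append(c.cell_id)
    counts = {"E": len(exact) + len(partial), "Ef": len(exact), "Ep": len(partial)}
    return counts, exact, partial


cells = [
    PrimaryCell(1, 10, 2, 2, 10, 10),  # bounds collapse to the true value
    PrimaryCell(2, 10, 2, 2, 9, 11),   # upper bound 11 < 10 + 2
    PrimaryCell(3, 10, 2, 2, 5, 20),   # interval [5, 20] covers [8, 12]
]
counts, exact, partial = classify(cells)
print(counts)  # {'E': 2, 'Ef': 1, 'Ep': 1}
```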
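A simple form of unpicking can be sketched as constraint propagation: repeatedly look for an additivity equation with exactly one unpublished cell, solve it, and repeat until nothing changes. This is our illustration of the general idea, not the algorithm inside UWE_ExternalUnpickerOnJJ.exe; in particular, it only recovers exact values (the Euf case) and does not compute the bounds that the program reports. The equation format and the names `unpick` and `publish` are hypothetical.

```python
def unpick(equations, published):
    """Recover suppressed cells by solving equations with one unknown.

    equations: list of (total_id, [component_ids]) meaning total = sum(components).
    published: dict cell_id -> value for every cell that was NOT suppressed.
    Returns a dict of recovered cell_id -> value.
    """
    known = dict(published)
    recovered = {}
    progress = True
    while progress:
        progress = False
        for total, parts in equations:
            unknown = [c for c in (total, *parts) if c not in known]
            if len(unknown) != 1:
                continue  # nothing to learn, or too many unknowns
            u = unknown[0]
            if u == total:
                known[u] = sum(known[p] for p in parts)
            else:
                known[u] = known[total] - sum(known[p] for p in parts if p != u)
            recovered[u] = known[u]
            progress = True
    return recovered


# A 2 x 2 table a b / c d with row totals r0, r1, column totals c0, c1, grand total g.
values = {"a": 1, "b": 2, "c": 3, "d": 4, "r0": 3, "r1": 7, "c0": 4, "c1": 6, "g": 10}
equations = [("r0", ["a", "b"]), ("r1", ["c", "d"]), ("c0", ["a", "c"]),
             ("c1", ["b", "d"]), ("g", ["r0", "r1"]), ("g", ["c0", "c1"])]


def publish(suppressed):
    return {k: v for k, v in values.items() if k not in suppressed}


print(unpick(equations, publish({"a", "b"})))            # {'a': 1, 'b': 2}
print(unpick(equations, publish({"a", "b", "c", "d"})))  # {}
```

Suppressing a and b together is not enough, because each can be recovered from its column total. Suppressing all four interior cells blocks this simple unpicking, although an LP-based attacker could still compute bounds for them.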

Papers on cell suppression algorithms.