An AI model trained on data that looks real but won’t leak personal information


In this article, IBM unveils a new method for bringing privacy-preserving synthetic data closer to its real-world analog to improve the predictive value of models trained on it.

A revolution in how businesses handle customer data could be around the corner, and it’s based entirely on made-up information.

Banks, health care providers, and other highly regulated fields are sitting on piles of spreadsheet data that could be mined for insights with AI — if only it were easier and safer to access. Sharing data, even internally, comes with a high risk of leaking or exposing sensitive information. And the risks have only increased with the passage of new data-privacy laws in many countries.

Synthetic data has become an essential alternative. This is data that's been generated algorithmically to mimic the statistical distribution of real data, without revealing information that could be used to reconstruct the original sample. Synthetic data lets companies build predictive models and quickly, safely test new ideas before going to the effort of validating them on real data.
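To make the idea concrete, here is a minimal sketch of the simplest form of this technique: fit a distribution to the real table's summary statistics (mean and covariance) and sample fresh rows from it. This is an illustration only, not the method IBM describes in the article, and the function name and toy columns are hypothetical; production synthetic-data generators add privacy guarantees (such as differential privacy) on top of the statistical fit.

```python
import numpy as np

def synthesize(real_data: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Draw synthetic rows from a Gaussian fitted to the real data's mean
    and covariance. The output mimics the joint statistics of the original
    table, but no individual real record is copied into it.

    NOTE: a toy sketch for illustration; it carries no formal privacy
    guarantee on its own.
    """
    rng = np.random.default_rng(seed)
    mean = real_data.mean(axis=0)                 # per-column means
    cov = np.cov(real_data, rowvar=False)         # column covariance matrix
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Toy "real" table: two correlated columns (say, income and spending).
rng = np.random.default_rng(42)
real = rng.multivariate_normal([50.0, 30.0], [[9.0, 6.0], [6.0, 16.0]], size=5000)

synthetic = synthesize(real, n_samples=5000)

# The synthetic table matches the real one statistically, so a model
# trained on it should behave similarly to one trained on the real data.
print(synthetic.shape)
```

A real generator would replace the single Gaussian with a richer model (copulas, GANs, or diffusion models are common choices) and inject calibrated noise so that no statistic leaks any one customer's record.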
