Watch the Demystifying the Data Anonymization Process: Myths and Best Practices Webinar

July 12th, 2022

Many well-meaning and ambitious researchers get tangled in the process of data anonymization. A thoughtful approach to anonymization is critical to achieving regulatory compliance and protecting individual privacy while maintaining the usefulness of the data for other researchers. Planning for data protection across the research lifecycle both mitigates risks to research subjects and facilitates sharing of data for replication and reuse.

Many researchers have expressed a desire to share data, along with concerns about complying with privacy standards and limiting liability. COS felt it was important to address these concerns through a thoughtful and informed discussion.

To that end, COS hosted a webinar on June 29, 2022, where Dr. Micah Altman discussed myths about information privacy that deter researchers from sharing data, best practices for anonymization, and the treatment of personally identifiable information in datasets.


Speaker:

  • Micah Altman, Research Scientist, Center for Research in Equitable and Open Scholarship, Massachusetts Institute of Technology

“There’s more information from individuals that’s available than ever before,” Altman said. “The laws, technologies, and uses are all changing rapidly.”

Altman opened by discussing data protection issues in the headlines, from breaches of anonymity and controversies over consent to hacking and identity theft. He explained that information threats are changing in several ways in the current environment: information travels wider and faster, cyber attacks are increasing, common storage platforms expose information in new ways, and privacy leakage accumulates.

“Regardless of the form of the information – whether it’s a map or a picture or a story – we can accumulate privacy losses,” Altman said.

Altman discussed key concepts for data protection in three domains: policy concepts (privacy, informational harms), technical concepts (information security, anonymization), and legal concepts (personally identifiable information, statistical purposes).

“We can think about data protection at every [research] lifecycle stage, from collection through post-process,” Altman said, outlining the following stages and concepts to keep in mind as part of research workflows.

  • Research Design: Evaluate privacy and security of measurement and data collection; Identify legal requirements for information management; Develop lifecycle data management plan
  • Research Implementation: Data collection and transmission; Transformation; Retention
  • Research Analysis: Internal data sharing and use; Disclosure limitation from research results
  • Post Analysis: Auditing; Adverse event monitoring; Correction; Data destruction

“How we protect the data will be a factor of both how identifiable the data is, what the risk of learning about individuals from that data is, and the sensitivity of that data,” Altman said.

Altman noted that modern approaches to privacy require planning for tiered modes of access, ranging from all users to gated users to vetted users. Further, he defined best practices for researchers, including:

  • Articulating privacy principles
  • Identifying the legal and institutional requirements applying to your specific research
  • Enumerating significant informational harms that are part of your research
  • Developing a principled lifecycle data protection plan
  • Selecting and applying privacy protection tools and methods
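As a rough illustration of how identifiability and sensitivity might feed into the tiered modes of access Altman described, here is a minimal Python sketch. The 0 to 2 scoring scale, the thresholds, and the names `Tier` and `choose_access_tier` are hypothetical and were not presented in the webinar; a real plan would also reflect the legal and institutional requirements that apply to the specific research.

```python
from enum import Enum


class Tier(Enum):
    """The three access modes named in the webinar recap."""
    ALL_USERS = "all users"
    GATED_USERS = "gated users"
    VETTED_USERS = "vetted users"


def choose_access_tier(identifiability: int, sensitivity: int) -> Tier:
    """Map two hypothetical scores (0 = low, 1 = medium, 2 = high) to a tier.

    The thresholds below are illustrative assumptions, not a standard.
    """
    for name, score in (("identifiability", identifiability),
                        ("sensitivity", sensitivity)):
        if score not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2, got {score!r}")

    # Combine the two factors into a single rough risk score (0 to 4).
    risk = identifiability + sensitivity
    if risk <= 1:
        return Tier.ALL_USERS      # low risk: open access
    if risk == 2:
        return Tier.GATED_USERS    # moderate risk: registration or agreement
    return Tier.VETTED_USERS       # high risk: reviewed, approved users only


if __name__ == "__main__":
    # Example: moderately identifiable, highly sensitive data.
    print(choose_access_tier(identifiability=1, sensitivity=2).value)
```

Adding the two scores is the simplest possible combination; a project might instead weight sensitivity more heavily or always require vetting for highly identifiable data.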

“People participate in research because they see it as a public good,” Altman said. “We have an ethical responsibility both to protect their privacy and make sure the benefits of the research compensate for the inevitable privacy loss.”

Access slides and resources.
