Title: Privacy Ethics – A Big Data Problem
Speaker: Raghu Gollamudi
See the table of contents for more blog posts from the conference.
GDPR (General Data Protection Regulation) – took effect May 25, 2018
Data is exploding
- Cost of storing data so low that it is essentially free
- 250 petabytes of data a month. What comes after petabytes?
- Getting more data when acquiring other companies
- IoT data is ending up in massive data lakes
Sensitive information – varies by domain
- Usernames
- user base – customers could be sensitive for a law firm
- location – the issue with a fitness tracker identifying the location of a military base
- purchases – disclosing someone is pregnant before they tell people
- employee data
Changes over time – more data gets collected after the decision to log is made
Privacy vs security
- privacy – individual right, focus on how data used, depends on context
- security – protect information, focus on confidentiality/accessibility, explicit controls
- privacy is an underinvested market. Security is more mature [but still an issue]
Solutions
- culture
- invest more – GDPR fines are orders of magnitude higher than the privacy budget
- include in performance reviews
- barrier to entry – must do at least what Facebook does if in that space
- security – encrypt, anonymization/pseudonymization, audit logs, store credentials in a vault
- reuse – use solutions available to you
- design for data integrity, authorization, conservative approach to privacy settings
- include privacy related tasks in sprint
- design in data retention – how long do you need it for
- automation – label data (tag/classify/confidence score) so compliance can be automated. The score helps reduce false positives
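The talk did not show code, so here is my own minimal sketch of how the labeling and pseudonymization bullets could fit together. Everything in it is an assumption for illustration: the detector patterns, the confidence values, the 0.8 threshold, and the function names (classify, pseudonymize, scrub) are hypothetical and not from the session. Each field gets a tag and a confidence score, and only fields above the threshold are replaced with a keyed hash.

```python
import hashlib
import hmac
import re

# Hypothetical detectors: (tag, pattern, confidence assigned when the pattern matches).
# The patterns and scores are illustrative assumptions, not production rules.
DETECTORS = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), 0.95),
    ("phone", re.compile(r"^\+?\d[\d\s().-]{7,}$"), 0.60),
]


def classify(value):
    """Return (tag, confidence) for the best-matching detector, or (None, 0.0)."""
    best = (None, 0.0)
    for tag, pattern, confidence in DETECTORS:
        if pattern.match(value) and confidence > best[1]:
            best = (tag, confidence)
    return best


def pseudonymize(value, key):
    """Replace a value with a keyed SHA-256 hash so equal inputs still match."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(record, key, threshold=0.8):
    """Label every field, then pseudonymize those tagged with confidence >= threshold."""
    cleaned, labels = {}, {}
    for field, value in record.items():
        tag, confidence = classify(str(value))
        labels[field] = (tag, confidence)
        if tag is not None and confidence >= threshold:
            cleaned[field] = pseudonymize(str(value), key)
        else:
            cleaned[field] = value
    return cleaned, labels


if __name__ == "__main__":
    record = {"contact": "ann@example.com", "order_id": "12345678"}
    cleaned, labels = scrub(record, key=b"demo-key-keep-real-keys-in-a-vault")
    print(labels)
    print(cleaned)
```

In this sketch the order_id value looks enough like a phone number to get tagged, but its low score keeps it below the threshold, so it is left alone. That is the false positive reduction the score is meant to provide. A real key would come from a vault rather than being hard-coded.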
The EU currently has the strictest privacy policy; Germany and Brazil are working on theirs. There was a debate on whether it applies to EU citizens or residents. Mostly agreement that physical location matters.
My take
I was expecting this to be more technical. There was a little about the implications of big data, like automation, but it felt glossed over. I would have liked to see an example of some technique that involves big data. The session was fine. It covered a lot of areas in passing, which makes for a good opening session – it lets you know where to plan. I think not having the “what you will learn” section in the abstract made it harder to know what to expect. Maybe QCon should make this mandatory?