Open Source Projects and the GDPR
In 2018, the European Union implemented the General Data Protection Regulation (the GDPR), a sweeping privacy law that touches almost every corner of the planet directly or indirectly. The law exists because for too long, organizations of all types and in all countries were taking advantage of the wealth of personal data generated by individuals.
In some cases, the EU argued that companies were outright abusing personal data for their own gain and without regard to the risks to the rights of the individual.
The goal of the GDPR is to hand back privacy rights to the people (referred to as data subjects) and emphasize accountability among data controllers and processors.
But what does the GDPR mean for open source projects and their communities?
They typically eschew any type of 'organization,' which makes them notoriously harder to regulate. However, if you use open source code in your projects, then you have new obligations under the law.
What is an Open Source Project?
Open source refers to a project that's publicly accessible and fits within a broader set of values that relies on collaboration, transparency, meritocracy, and generally open exchanges and development.
The term goes back to the early days of open source software, but today it's a common mode of creating for both programmers and non-programmers.
In theory, the practices of transparency and open exchange make open source principles and GDPR rules fast friends. The GDPR also calls for transparency from data controllers and processors as well as accessibility for data subjects. Many of the data subject's rights rest on the premise that people can access their own information.
However, the two ideas collide in the concept of Privacy by Design. Article 25 of the GDPR requires data processors and controllers to consider privacy as a core part of design, and not something to be tacked on at the end of a build. It says:
"The controller shall, both at the time of the determination of the means of processing and at the time of processing itself, implement appropriate technical and organisational measures, such as pseudonymisation, which are designed to implement data-protection principles..."
The nature of open source projects sits uneasily with this. Both the code and its vulnerabilities are available for the world to see. In theory, open source code is no big deal as long as it doesn't reveal any personally identifiable information about an EU data subject.
However, the use of open source code by data controllers and processors (that's you) does fall under the umbrella of the GDPR in quite a unique way.
Why Are Open Source Projects Unique in the Realm of Privacy?
The primary clash between GDPR principles and open source projects lies in the known vulnerabilities that exist in open source programs.
Open source components are the 'building blocks' of software, and they can make up as much as 80% of the code base.
Exploring known vulnerabilities in the open is the beauty of open source projects. But it's also their GDPR downfall, because hackers wait in the wings to see not only what vulnerabilities exist but also how experts intend to patch them.
Because all of this work happens in an open repository, the answers are public, and attackers have no need to dig through the code on their own.
In other words, using open source code at will and with no other precautions is a massive security risk if you're also processing the data of EU residents.
The concern isn't baseless. The Equifax hack of 2017 was the product of hackers targeting a known vulnerability in the company's web application, which used the open source Apache Struts 2 framework. It resulted in the GDPR's worst nightmare: hackers stole the personally identifiable information of 145.9 million people.
How the GDPR Impacts Open Source Project Security Obligations
There are two articles of the GDPR that specifically apply to open source projects: Article 25 (Privacy by Design, as mentioned previously) and Article 32 (Security of Processing).
The GDPR requires Privacy by Design as per Article 25, but does that mean open source is now out of the question? Very likely not, but data controllers and data processors who must comply with the legislation now need to be much more careful in how they integrate open source components with their products.
To meet GDPR requirements, you need to ensure you don't use open source code with known vulnerabilities. But that's not easy: 1 in 18 component downloads includes a known security vulnerability, and 84% of projects don't fix them.
Compliance often means running a Software Composition Analysis (SCA) tool that compares the open source components you're interested in against a database of known vulnerabilities. The comparison tells you whether the piece of code you hope to use carries known vulnerabilities that could undermine your GDPR compliance, and it helps enhance your own security and reduce your risk generally.
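To make the comparison step concrete, here is a minimal sketch in Python of what an SCA-style check does at its core. It is not how any particular SCA product works: the package names, advisory IDs, and the in-memory advisory list are all hypothetical, and a real tool would pull advisories from a maintained vulnerability database and handle far messier version schemes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    package: str               # affected component name (hypothetical)
    fixed_in: tuple[int, ...]  # first version that contains the fix
    advisory_id: str           # made-up identifier, for illustration only


# Hypothetical advisory data standing in for a real vulnerability database.
KNOWN_ADVISORIES = [
    Advisory("example-web-framework", (2, 5, 13), "EXAMPLE-2017-0001"),
    Advisory("example-json-parser", (1, 4, 0), "EXAMPLE-2018-0002"),
]


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a simple dotted version such as '2.5.10' into (2, 5, 10)."""
    return tuple(int(part) for part in text.split("."))


def find_vulnerable(dependencies: dict[str, str],
                    advisories: list[Advisory] = KNOWN_ADVISORIES):
    """Return (name, version, advisory_id) for each dependency older than its fix."""
    findings = []
    for name, version in sorted(dependencies.items()):
        for advisory in advisories:
            if advisory.package == name and parse_version(version) < advisory.fixed_in:
                findings.append((name, version, advisory.advisory_id))
    return findings


# A project's declared dependencies (assumed names and versions).
deps = {
    "example-web-framework": "2.5.10",  # older than the fixed release
    "example-json-parser": "1.4.2",     # already includes the fix
    "example-logger": "0.9.0",          # no advisory on record
}
findings = find_vulnerable(deps)
for name, version, advisory_id in findings:
    print(f"{name} {version} is affected by {advisory_id}")
```

A check like this only knows about published advisories, so a clean result means "no known vulnerabilities," not "no vulnerabilities."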
Automattic also provides a helpful security clause:
Remember that You Need Permission to Process Public Data
Much of the discussion about the GDPR and the open source community has focused on security, but an overlooked facet of the law in this context is this: The GDPR doesn't distinguish between personal information in the public domain and personal information collected privately and directly from the individual.
The traditional interpretation of public domain data is that it's free to use as long as it's already out there. However, that belief applies to a piece of art that's no longer considered to be owned. It doesn't apply to personally identifying data, which can include data that identifies someone directly, indirectly, or when combined with other information.
As a result, it's important for any open source platform, community, forum, or company to respect the GDPR's rules for collecting and processing data from EU residents.
To start, you need a legal basis for collecting, storing, processing, or transferring any personal data, even if it's in the public domain. Personal data is any information about an EU data subject that can identify that person directly, or indirectly when placed in context or combined with other information.
If your regular activities involve any sort of personal data under the protection of the GDPR, then you'll need to comply with the privacy standards that now apply to data processors and controllers.
In practical terms, it means:
- Establishing a legal basis, such as consent, before collecting any data
- Practicing data minimization
- Allowing data subjects to access their own data and exercise their other GDPR-granted rights
- Storing and processing the data securely
- Erasing the data using proper procedures
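As a rough illustration of how the points above might look in a community's member database, here is a small Python sketch. Everything in it is an assumption made for the example: the `MemberStore` class, the `ALLOWED_FIELDS` set, and the use of consent as the only legal basis are hypothetical, and a real system would also need persistent storage, audit trails, and proper key management.

```python
from __future__ import annotations

import hashlib
import hmac

# Hypothetical: the only fields this community actually needs (data minimization).
ALLOWED_FIELDS = {"username", "email"}


class MemberStore:
    """In-memory sketch of a member registry; not production code."""

    def __init__(self, secret_key: bytes):
        self._secret_key = secret_key
        self._records: dict[str, dict[str, str]] = {}

    def _pseudonym(self, email: str) -> str:
        # Keyed hash so the storage key does not expose the email address itself.
        digest = hmac.new(self._secret_key, email.lower().encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    def register(self, submitted: dict[str, str], consent_given: bool) -> str:
        # Consent is used as the legal basis here purely for simplicity.
        if not consent_given:
            raise PermissionError("no legal basis recorded for processing")
        # Keep only the fields we need and drop everything else.
        record = {k: v for k, v in submitted.items() if k in ALLOWED_FIELDS}
        member_id = self._pseudonym(record["email"])
        self._records[member_id] = record
        return member_id

    def export(self, member_id: str) -> dict[str, str]:
        # Right of access: hand back a copy of what is stored about the member.
        return dict(self._records[member_id])

    def erase(self, member_id: str) -> bool:
        # Right to erasure: remove the record; True if something was deleted.
        return self._records.pop(member_id, None) is not None


store = MemberStore(secret_key=b"demo-key-only")
member_id = store.register(
    {"username": "ada", "email": "ada@example.org", "birthday": "1990-01-01"},
    consent_given=True,
)
exported = store.export(member_id)  # the birthday was never stored
erased = store.erase(member_id)
```

The keyed hash is a form of pseudonymisation rather than anonymisation: anyone holding the key and the email address can recompute the identifier, so the key needs the same protection as the data.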
Open Source Projects Can Be GDPR Compliant
Even though the open source concept and the GDPR have common goals, the GDPR does present a challenge to the use of open source code and programs. Companies who fall under the GDPR's purview must be vigilant about what code they use and how they use it.
In many cases, it means adopting additional tools to make sure the code is free from known vulnerabilities that could create a security risk and lead to a loss of privacy for data subjects.
If you run an open source community, you also have obligations to protect the privacy of your members by employing GDPR standards in all data collection and processing.
It's hard to overstate the need to comply with the GDPR. Failing to do so can result in real damage to your company's finances and your image. But it's worth remembering what matters to both the GDPR and the open source community: providing the end user with the freedom and control they deserve online (and on their computers).