8 How To License Your Data
Optimize for breadth, FAIRness, & ethical re-use
“The world’s most valuable resource is no longer oil, but data”
- Kiran Bhageshpur, the Economist, 2017
If you do not license your work, others are by default restricted from re-using it without first asking your permission. In most jurisdictions you retain the copyright to your work by default, so if you want to permit full re-use you must say so explicitly; the best way to do this is usually to apply an existing open license. Here is how to pick and use one.
8.1 Data
It is fairly common to release research data publicly under the CC-0 ‘public domain’ license, under which the author reserves no rights and anyone may use the data for any purpose. Alternatively, CC-BY can be an option: under this license anyone using the data must attribute it to its author, e.g. through citation, in order to comply with the copyright on the work. In the scientific community it is the norm to provide some form of attribution even for CC-0 licensed data, but unlike with CC-BY this is not a legal obligation under the terms of the license. These licences are the ones recommended in Wellcome’s data guidelines for authors.
It is commonly stated that research data is a public good and should be released with as few restrictions as possible. Requiring attribution is an additional restriction, but it improves the provenance of the data by making its source known. Research data is a public good; research data with good metadata and provenance is a still greater good.
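Stating the license in machine-readable metadata, not just in a README, helps with the provenance point above. The sketch below builds a minimal metadata record that declares a dataset's license by its SPDX identifier (`CC0-1.0` or `CC-BY-4.0`). The field layout is an assumption, loosely modelled on the Frictionless Data Package `licenses` field, and the dataset name is a hypothetical placeholder; adapt both to whatever your data repository expects.

```python
import json
from pathlib import Path

# SPDX identifiers mapped to a human-readable title and the canonical license URL.
DATA_LICENSES = {
    "CC0-1.0": (
        "CC0 1.0 Universal",
        "https://creativecommons.org/publicdomain/zero/1.0/",
    ),
    "CC-BY-4.0": (
        "Creative Commons Attribution 4.0 International",
        "https://creativecommons.org/licenses/by/4.0/",
    ),
}


def build_metadata(dataset_name: str, license_id: str) -> dict:
    """Return a minimal metadata record declaring the dataset's license.

    Assumption: the `licenses` layout loosely follows the Frictionless
    Data Package convention; it is not a required schema.
    """
    if license_id not in DATA_LICENSES:
        raise ValueError(f"unsupported license: {license_id}")
    title, url = DATA_LICENSES[license_id]
    return {
        "name": dataset_name,
        "licenses": [{"name": license_id, "title": title, "path": url}],
    }


def write_metadata(directory: Path, metadata: dict) -> Path:
    """Write the record as datapackage.json in `directory`."""
    out = directory / "datapackage.json"
    out.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return out


if __name__ == "__main__":
    # "example-dataset" is a hypothetical placeholder name.
    print(json.dumps(build_metadata("example-dataset", "CC-BY-4.0"), indent=2))
```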
8.1.1 Images
Like software, images, figures and diagrams should come with licenses. In most copyright regimes, if an image lacks a license its creator reserves all rights by default. So if you make a figure or diagram, people will not be able to use it without exposing themselves to potential copyright claims unless you apply a licence that explicitly permits others to use your work.
If you need freely licensed images to use in your own scientific diagrams, illustrations or figures, bioicons is an excellent source.
The Creative Commons licenses are probably the best choice here. The base Creative Commons licence has a number of modifiers that can be applied to it in combination:
- BY - Attribution: you must attribute the work to its originator in order to re-use it
- SA - Share alike: if you redistribute the work or a derivative of it, you must do so under the same licence as the original
- NC - Non-commercial: you may not use the work for commercial purposes
- ND - No derivatives: you may redistribute the work, but only in unmodified form
For example, the most restrictive combination is CC-BY-NC-ND: the work can only be redistributed if the author is credited, it is unchanged, and it is not used for commercial purposes. SA and ND are never combined, because share alike governs how derivatives are licensed and ND forbids derivatives altogether.
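The set of available Creative Commons licenses follows from one rule: any subset of the NC, SA and ND modifiers can be added to BY, except that SA and ND cannot appear together. The sketch below enumerates the combinations to make that rule explicit. It is an illustration only, not an official Creative Commons tool, and the names it produces follow this chapter's hyphenated style.

```python
from itertools import combinations

# Optional modifiers added to the base CC-BY license, in the order
# Creative Commons uses when naming a license.
MODIFIERS = ("NC", "SA", "ND")


def is_valid(modifiers: frozenset) -> bool:
    """SA and ND are mutually exclusive: share alike governs how
    derivatives are licensed, while ND forbids derivatives altogether."""
    return not {"SA", "ND"} <= modifiers


def license_name(modifiers: frozenset) -> str:
    """Build a name in this chapter's style, e.g. 'CC-BY-NC-SA'."""
    ordered = [m for m in MODIFIERS if m in modifiers]
    return "-".join(["CC", "BY", *ordered])


def valid_licenses() -> list:
    """Enumerate every subset of modifiers and keep the valid ones."""
    names = []
    for k in range(len(MODIFIERS) + 1):
        for combo in combinations(MODIFIERS, k):
            modifiers = frozenset(combo)
            if is_valid(modifiers):
                names.append(license_name(modifiers))
    return names


if __name__ == "__main__":
    for name in valid_licenses():
        print(name)
```

Running it prints six names: CC-BY, CC-BY-NC, CC-BY-SA, CC-BY-ND, CC-BY-NC-SA and CC-BY-NC-ND. CC-0 sits outside this scheme because it reserves no rights at all.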
You can use the Creative Commons License Chooser to find a suitable license and to generate attributions for Creative Commons content that link appropriately to the original work and to the license text.
These licenses can be applied to any multimedia, audio, video or other digital files and research products that you produce.
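Creative Commons' recommended practice for attribution is often summarized as TASL: Title, Author, Source, License, each linked where possible. The sketch below assembles such an attribution as an HTML snippet, linking the license to its deed URL on creativecommons.org. The function names and the example work are hypothetical, and for one-off attributions the License Chooser remains the simpler route.

```python
from html import escape


def license_url(elements: str, version: str = "4.0") -> str:
    """Map license elements such as 'BY-NC-SA' to the Creative Commons deed URL."""
    return f"https://creativecommons.org/licenses/{elements.lower()}/{version}/"


def tasl_attribution(title: str, author: str, source_url: str,
                     elements: str, version: str = "4.0") -> str:
    """Return an HTML attribution in Title, Author, Source, License order."""
    license_label = f"CC {elements.upper()} {version}"
    # escape() guards against '&', '<', '>' and quotes in user-supplied text.
    return (
        f'<a href="{escape(source_url)}">{escape(title)}</a> '
        f"by {escape(author)} is licensed under "
        f'<a href="{license_url(elements, version)}">{license_label}</a>.'
    )


if __name__ == "__main__":
    # Hypothetical work, author and URL, for illustration only.
    print(tasl_attribution("Cell diagram", "A. Researcher",
                           "https://example.org/cell-diagram", "BY-SA"))
```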
8.2 Software
- You should avoid publishing code without an accompanying license, as in most copyright regimes the author reserves all rights by default. Consequently, anyone using code with no associated licence is opening themselves up to copyright liability.
- Software produced by members of the HDBI should be licensed under a license approved by the Open Source Initiative (OSI) (or by the more opinionated Free Software Foundation (FSF)), in accordance with the guidelines from the Wellcome Trust.
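In practice, applying a license usually means adding a `LICENSE` file with the full license text at the root of the repository. A common complementary convention, used by the REUSE specification among others, is a short SPDX header at the top of each source file. The sketch below prepends such a header to Python source text if it lacks one. The copyright holder, year and the choice of `GPL-3.0-or-later` in the example are placeholders, and the "already licensed" check is deliberately naive.

```python
HEADER_TEMPLATE = (
    "# SPDX-FileCopyrightText: {year} {holder}\n"
    "# SPDX-License-Identifier: {license_id}\n"
)


def add_spdx_header(source: str, license_id: str, holder: str, year: int) -> str:
    """Prepend SPDX copyright and license comments to Python source text.

    Naive check: any existing 'SPDX-License-Identifier:' text leaves the
    source unchanged. A leading shebang line is kept first so the script
    still runs.
    """
    if "SPDX-License-Identifier:" in source:
        return source
    header = HEADER_TEMPLATE.format(year=year, holder=holder, license_id=license_id)
    if source.startswith("#!"):
        shebang, _, rest = source.partition("\n")
        return f"{shebang}\n{header}{rest}"
    return header + source


if __name__ == "__main__":
    # Placeholder holder and year, for illustration only.
    print(add_spdx_header("print('hello')\n", "GPL-3.0-or-later", "Jane Doe", 2024))
```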
8.2.1 Quick Primer on choosing a software license
Software licences can be placed into three broad categories: proprietary (or ‘copyright’), permissive, and copy left.
- In proprietary software the source code is not generally available, though some is ‘source available’ (which is not the same as ‘open source’). Its internal operations are therefore not generally transparent to the end user, which is problematic for the transparency of the scientific process. Users of proprietary software lease or ‘buy’ permission to use it under the terms of a license. These leases are not always paid for with money; they are often paid for with access to user data, which can be monetized through services to the software’s customers.
- In permissively licensed software the source code is available and the user is free to do more or less whatever they like with it.
- ‘Copy left’ licensed software requires that if you distribute any derivatives of the original software you must publish the source code under the same, or a compatible, license.
8.2.1.1 Recommendations
- Copy left
  - GPL v>=3.0 (General Public License) [My preferred license for non-network software]
    - Use a GPL licence when you want to ensure that your software and any derivatives of it remain freely available to the community and cannot be re-packaged, extended and re-sold under proprietary licenses. Paid services are still possible with GPL code, e.g. hosting or additional development work under contract.
  - AGPL (Affero General Public License) [My preferred license for software used over a network]
    - The AGPL is essentially identical to the GPL, with an extra stipulation for software that runs on a server that others might use as a service: the source code must be made available to anyone using the software over a network.
  - LGPL (Lesser General Public License)
    - The Lesser GPL permits a software library to be used in a proprietary application while keeping the library itself copy left.
- Permissive
  - Apache 2.0 [My preferred permissive License]
    - Patent and copyright are distinct areas of law, and in some jurisdictions patents exist on software processes, so code that is permissively licensed for copyright purposes can still infringe patents. The Apache license grants a perpetual, royalty-free license to use any patents held by the licensor (and other contributors) that are used in the software. This only applies to patents held by those who contribute to and license the software, so it cannot protect you from infringing patents held by third parties.
  - ‘MIT’ (aka Expat)
    - A short, simple permissive license covering the software and its documentation. It permits use of the software essentially without restriction, other than keeping the copyright and license notice, and comes with no warranty.
I default to the ‘copy left’ or ‘share alike’ licenses, as I regard them as the most ethical choice in most contexts. There are, however, reasons why you might not want to use them. They can be an impediment to working with certain commercial partners whose business models rely on proprietary licensing, which is not always compatible with copy left / share alike licensing. Third parties that offer proprietary commercial software products may avoid using your code if it has a copy left license like the GPL. This might be a problem if, for example, you wrote a library that reads a particular type of file and a company wants to use your library to read that file type in a proprietary analysis tool that they sell licenses for. So if you are writing tools that you would like companies to be willing to include in software products with paid licenses, you may want to opt instead for a permissive licence like Apache 2.0.
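The trade-off described above can be summarized as a lookup from license to category. The sketch below records the recommended licenses by SPDX identifier (the specific `-or-later` identifiers and LGPL version are assumptions) together with a deliberately simplified check of whether a company could ship the code inside a closed-source product it distributes. The rule only illustrates the reasoning in this section and is not legal advice: real compatibility depends on how code is combined and distributed, and the LGPL, for instance, attaches specific conditions to linking.

```python
from enum import Enum


class Category(Enum):
    PERMISSIVE = "permissive"
    WEAK_COPYLEFT = "weak copy left"  # e.g. LGPL: copy left covers the library itself
    STRONG_COPYLEFT = "copy left"     # e.g. GPL, AGPL: copy left covers derivative works


# SPDX identifiers for the licenses recommended above (versions assumed).
LICENSES = {
    "GPL-3.0-or-later": Category.STRONG_COPYLEFT,
    "AGPL-3.0-or-later": Category.STRONG_COPYLEFT,
    "LGPL-3.0-or-later": Category.WEAK_COPYLEFT,
    "Apache-2.0": Category.PERMISSIVE,
    "MIT": Category.PERMISSIVE,
}


def usable_in_proprietary_product(spdx_id: str) -> bool:
    """Simplified rule, not legal advice: a distributed closed-source product
    may bundle permissive code and may use an LGPL library (subject to its
    conditions), but cannot incorporate GPL/AGPL code without releasing its
    own source."""
    category = LICENSES.get(spdx_id)
    if category is None:
        raise KeyError(f"no category recorded for {spdx_id!r}")
    return category is not Category.STRONG_COPYLEFT


if __name__ == "__main__":
    for spdx_id, category in LICENSES.items():
        print(f"{spdx_id:<20} {category.value:<15} {usable_in_proprietary_product(spdx_id)}")
```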
Patents function somewhat differently from copyright, despite the common conflation of these two distinct areas of law under the term IP (intellectual property). Copyright applies automatically, with the default presumption that all rights are retained by the author, whereas patenting is an affirmative process: there are generally fees associated with applying for and maintaining a patent, and patents are subject to approval by the patent office in your jurisdiction. Prior disclosure of an invention or process that you wish to patent, in any venue including a conference paper or an online post, can be an impediment to being granted a patent. How strict this is varies between jurisdictions: Europe, for example, requires absolute novelty, whereas the USA allows a limited grace period for an inventor’s own disclosures. Not all jurisdictions have a concept of software patents like the one that exists in the USA under the rubric of business method patents. If you are working on something that you are interested in patenting, you should not publish anything revealing its patentable aspects.
I would advise reviewing Wellcome’s guide on ‘intellectual property’ for relevant guidance from HDBI’s funder.
Contributor License Agreements (CLAs) are agreements made by contributors to an open source software project. They may cede the contributors’ copyright on the code they write to the organisation that stewards the project. Alternatively, they may more narrowly grant permission to dual-license code over which the original author retains copyright. This can facilitate the organisation’s ability to dual-license the code for commercial purposes. It also makes it easier for the organisation to take a code base closed source and make future development proprietary, as it no longer needs the consent of all contributors to change the licensing but can act unilaterally.
8.3 Retaining Rights
Before submitting a work for publication, authors should apply a suitable license, such as CC BY, to it. This lets the authors retain rights to the work, so that the ‘author accepted manuscript’ can be freely redistributed by the authors under the terms of that license, for example to update a pre-print to match the ‘version of record’, including changes made during peer review. Taking this approach allows works to be released immediately rather than after a potentially lengthy embargo period, as many funders now require.
This text, or similar, should be included in submitted manuscripts and mentioned in submission cover letters:
For the purpose of open access, the author(s) has(have) applied a Creative Commons Attribution (CC BY) licence to any Author’s Accepted Manuscript version arising from this submission.
To find out more about rights retention, see the Plan S Rights Retention Strategy and ‘Rights retention: A Primer’ from UKRN.
I am not a lawyer, and this is not legal advice. If you have any questions about how any of these considerations apply to your work, please consult a suitable professional legal expert.
8.4 Resources
- For a deeper dive into licensing, check out the chapter on licensing in The Turing Way.
- Creative Commons License Chooser
- Freely licensed images to use in your own scientific diagrams, illustrations or figures: bioicons.
- The REUSE initiative, started by the Free Software Foundation Europe (FSFE), provides useful tooling to ensure that the licences of your code are clearly denoted. This is most useful in larger projects that also ship code under other licences, but it is also a useful reference for practical licensing best practices.
- A short video guide to software licences
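The REUSE project provides its own command-line tool for checking a repository (`reuse lint`). Purely to illustrate the idea, the sketch below walks a directory and lists Python files whose first few lines lack an `SPDX-License-Identifier` tag. It is a simplified stand-in rather than a replacement for the real tool, and the `*.py` filter and ten-line window are arbitrary assumptions.

```python
from pathlib import Path

TAG = "SPDX-License-Identifier:"


def has_license_tag(path: Path, max_lines: int = 10) -> bool:
    """Return True if the SPDX tag appears within the first `max_lines` lines."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        for i, line in enumerate(handle):
            if i >= max_lines:
                break
            if TAG in line:
                return True
    return False


def files_missing_tag(root: Path, pattern: str = "*.py") -> list:
    """List files under `root` matching `pattern` that lack a license tag."""
    return sorted(
        p for p in root.rglob(pattern) if p.is_file() and not has_license_tag(p)
    )
```

For example, `files_missing_tag(Path("src"))` returns the files under a hypothetical `src` directory that still need a header; a CI job could fail whenever that list is non-empty.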