Posts Tagged ‘licenses’
Last week we attended LinuxTag in Berlin, where we gave two talks. I gave the first one, about identifying reused code between two FLOSS projects. The second one, given by Daniel Izquierdo, explained the importance of studying FLOSS communities.
The main aim of my presentation was to show that it is possible (and easy!) to get very interesting results about the code shared between two FLOSS projects using FLOSS tools. The ones we used in this case were CCFinder, Cloc, Ninka and Grep. The study identified not only the common code but also possible license issues. This kind of study can be interesting from different points of view, which I’ve summed up in the following questions:
- how different are two software projects?
- is it feasible to propose a merge of the code?
- how is the derivative project using the original code?
- are the licenses being respected? what about the copyright?
- is the new project using new licenses that could be interesting to the team that created the original work? are they improving the code?
- what changes did the second team make to the original code?
- is your source code being adopted by a certain community?
The presentation is available here.
[This entry is part of the work I do at Bitergia and it is also available here]
Last month I worked on a report aimed at finding code clones between two libre software projects. The method we used was essentially the one detailed in the paper “Code siblings: Technical and Legal Implications” by German, D., Di Penta, M., Gueheneuc, Y. and Antoniol, G.
It is an interesting case, and I’m pretty sure this kind of report will become more and more interesting for entities that publish code under a libre software license. Imagine you are part of a big libre software project into which you have put your copyright and even money. It would be very useful to know whether another project is using your code while respecting your copyright and the rights you granted to users through the license. To identify these scenarios, our study did the following:
- extraction of clones with CCFinderX
- detection of licenses with Ninka
- detection of copyright holders with shell scripts
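The copyright step was done with shell scripts that are not reproduced here. The following minimal Python sketch gives a rough idea of what such a script might do. It extracts holder names from “Copyright” header lines. The regular expression, the function name `copyright_holders` and the sample header are all hypothetical, and real-world headers vary far more than this pattern assumes.

```python
import re

# Hypothetical pattern for header lines such as "Copyright (C) 2008-2009 Jane Doe".
# It skips an optional "(c)"/"©" marker and the years, and captures the rest as the holder.
COPYRIGHT_LINE = re.compile(
    r"copyright\s*(?:\(c\)|©)?\s*(?P<years>[\d\s,\-]*)(?P<holder>.+)",
    re.IGNORECASE,
)


def copyright_holders(source: str) -> set[str]:
    """Return the copyright holders named in the comment lines of a source file."""
    holders = set()
    for line in source.splitlines():
        # Drop leading comment markers such as "/*", " *", "#" or "//".
        text = line.strip().lstrip("/*#").strip()
        match = COPYRIGHT_LINE.match(text)
        if match:
            # Remove a trailing "*/" in case the notice and the comment end share a line.
            holder = match.group("holder").strip().rstrip("*/").strip()
            if holder:
                holders.add(holder)
    return holders


header = """/*
 * Copyright (C) 2008-2009 Jane Doe <jane@example.org>
 * Copyright (c) 2010 ACME Corp.
 */
int main(void) { return 0; }
"""
print(sorted(copyright_holders(header)))
```

Running the same function over both projects yields one set of holders per file. Those sets are what the later comparison step needs.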
CCFinderX, the tool used in the first phase, reports the parts of the code that files have in common. It detects shared sequences of tokens between two files (at least 50 tokens by default), and this threshold should be adjusted depending on what you are looking for. In the following example, the second and third columns identify each file and its cloned fragment using the syntax (file id).(first token)-(last token). The example therefore shows that the file with id 1974 shares code with the files with ids 11, 13 and 14.
19108 11.85-139 1974.70-124
19108 13.156-210 1974.70-124
19108 14.260-314 1974.70-124
12065 17.1239-1306 2033.118-185
12065 17.1239-1306 2033.185-252
12065 17.1239-1306 2033.252-319
12065 17.1239-1306 2141.319-386
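A small parser makes output like the lines above easier to work with. The following Python sketch assumes the clone pairs have already been dumped as plain text lines in exactly this layout. It treats the first column as a clone set identifier; this is an assumption drawn from the sample, where every pair sharing the fragment 1974.70-124 has the same value. The names `parse_clone_pairs` and `partners_by_file` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set

# Assumed layout of a clone-pair line:
#   <clone set id> <file id>.<begin>-<end> <file id>.<begin>-<end>
FRAGMENT = re.compile(r"^(\d+)\.(\d+)-(\d+)$")


class Fragment(NamedTuple):
    file_id: int
    begin: int  # token position where the cloned fragment starts
    end: int    # token position where it ends


class ClonePair(NamedTuple):
    clone_set: int
    left: Fragment
    right: Fragment


def parse_fragment(text: str) -> Fragment:
    match = FRAGMENT.match(text)
    if match is None:
        raise ValueError(f"unexpected fragment: {text!r}")
    file_id, begin, end = map(int, match.groups())
    return Fragment(file_id, begin, end)


def parse_clone_pairs(lines: Iterable[str]) -> List[ClonePair]:
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a clone pair
        clone_set, left, right = fields
        pairs.append(ClonePair(int(clone_set), parse_fragment(left), parse_fragment(right)))
    return pairs


def partners_by_file(pairs: Iterable[ClonePair]) -> Dict[int, Set[int]]:
    """Map every file id to the ids of the files it shares code with."""
    partners: Dict[int, Set[int]] = defaultdict(set)
    for pair in pairs:
        partners[pair.left.file_id].add(pair.right.file_id)
        partners[pair.right.file_id].add(pair.left.file_id)
    return partners


SAMPLE = """\
19108 11.85-139 1974.70-124
19108 13.156-210 1974.70-124
19108 14.260-314 1974.70-124
12065 17.1239-1306 2033.118-185
12065 17.1239-1306 2033.185-252
12065 17.1239-1306 2033.252-319
12065 17.1239-1306 2141.319-386
"""

pairs = parse_clone_pairs(SAMPLE.splitlines())
print(sorted(partners_by_file(pairs)[1974]))  # [11, 13, 14]
```

Grouping by file id this way also makes the 1:n relationships visible. For example, file 17 appears with both 2033 and 2141.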
In our report we only wanted to estimate the percentage of code from the “original” project used in the derivative work, but several factors have to be taken into account. First, code clones can appear among files of the same project (which, by the way, is a clear sign that refactoring is needed). Second, different parts of a file can have clones in different files (a 1:n relationship) in both projects. Ideally, you would study the relationships of each file with the others, file by file, and discard the repeated ones.
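One way to avoid counting the same original code twice when a fragment has several clones is to merge overlapping token ranges per file before adding them up. The Python sketch below does that and also ignores clones between two files of the same project. It assumes that token ranges are half-open (end exclusive), that you already know which file ids belong to each project, and that you have the token count of every original file. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int]       # (begin, end) token positions, end treated as exclusive
Side = Tuple[int, int, int]  # (file id, begin, end)


def merge_spans(spans: Iterable[Span]) -> List[Span]:
    """Merge overlapping or touching spans so that no token is counted twice."""
    merged: List[Span] = []
    for begin, end in sorted(spans):
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


def reused_share(
    pairs: Iterable[Tuple[Side, Side]],
    original_files: Set[int],
    derived_files: Set[int],
    original_sizes: Dict[int, int],
) -> float:
    """Percentage of the original project's tokens that were cloned into the derived one."""
    spans: Dict[int, List[Span]] = defaultdict(list)
    for left, right in pairs:
        for mine, other in ((left, right), (right, left)):
            # Count only original code found in the derived project; clones
            # between two files of the same project are skipped.
            if mine[0] in original_files and other[0] in derived_files:
                spans[mine[0]].append((mine[1], mine[2]))
    reused = sum(end - begin for ranges in spans.values() for begin, end in merge_spans(ranges))
    return 100 * reused / sum(original_sizes.values())


# Toy data (hypothetical): files 1 and 2 are "original", files 10-12 are "derived".
pairs = [
    ((1, 0, 60), (10, 0, 60)),
    ((1, 40, 100), (11, 5, 65)),  # overlaps the previous fragment of file 1
    ((1, 0, 60), (2, 200, 260)),  # clone inside the original project: ignored
    ((2, 0, 50), (12, 0, 50)),
]
print(reused_share(pairs, {1, 2}, {10, 11, 12}, {1: 200, 2: 300}))  # 30.0
```

Without merging, file 1 would contribute 120 tokens instead of 100, and the estimate would rise to 34%.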
Once the relationships among files have been established, it is the turn of license and copyright detection. In this phase the method simply compares the output of the two detectors. The result is a matrix from which you can tell whether the copyright holders were respected and the license was used correctly.
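The comparison step can be pictured as a cross-tabulation. In this Python sketch, each cloned file pair is keyed by three things: the license detected in the original file, the license detected in the derived file, and whether all original copyright holders still appear in the derived file. The license labels, holder names and the function name `compliance_matrix` are made up for illustration. A differing license is only a flag for manual review, since some combinations (BSD code inside a GPL project, for example) are perfectly legitimate.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

FilePair = Tuple[str, str]  # (file in the original project, file in the derived project)


def compliance_matrix(
    file_pairs: Iterable[FilePair],
    original_licenses: Dict[str, str],
    derived_licenses: Dict[str, str],
    original_holders: Dict[str, Set[str]],
    derived_holders: Dict[str, Set[str]],
) -> Tuple[Counter, List[FilePair]]:
    """Cross the license and copyright detections for every cloned file pair.

    Returns a counter keyed by (original license, derived license, holders kept)
    and the list of pairs that deserve a closer look.
    """
    matrix: Counter = Counter()
    suspicious: List[FilePair] = []
    for original, derived in file_pairs:
        license_a = original_licenses.get(original, "UNKNOWN")
        license_b = derived_licenses.get(derived, "UNKNOWN")
        # Every original holder should still be credited in the copied file.
        kept = original_holders.get(original, set()) <= derived_holders.get(derived, set())
        matrix[(license_a, license_b, kept)] += 1
        if license_a != license_b or not kept:
            suspicious.append((original, derived))
    return matrix, suspicious


matrix, suspicious = compliance_matrix(
    [("a.c", "x.c"), ("b.c", "y.c"), ("c.c", "z.c")],
    {"a.c": "GPLv2+", "b.c": "GPLv2+", "c.c": "BSD3"},
    {"x.c": "GPLv2+", "y.c": "BSD3", "z.c": "BSD3"},
    {"a.c": {"Alice"}, "b.c": {"Alice"}, "c.c": {"Bob"}},
    {"x.c": {"Alice", "Carol"}, "y.c": {"Carol"}, "z.c": {"Bob"}},
)
print(suspicious)  # [('b.c', 'y.c')]
```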
Daniel German’s team found interesting things in their study of the FreeBSD and Linux kernels. For example, they found GPL code in FreeBSD, in the xfs file system. The trick that allows this code to be distributed alongside BSD-licensed code is to ship it disabled (it is not compiled into FreeBSD) and let users choose whether to compile it. If a developer compiles the kernel with xfs support, the resulting kernel must be distributed under the terms of the GPL.
[This entry is part of the work I do at LibreSoft and it is also available in my blog at libresoft.es]
The debian-legal mailing list is a great source of knowledge about legal issues related to FLOSS. A couple of days ago one of its contributors sent a mail reporting that a computer shop had taken the Debian logo and used it for its business.
The Debian Open User Logo without the word “Debian” (they call it DOUL-nd) is released under the terms of a license similar to the MIT license, as specified at http://www.debian.org/logos/
If the people from http://www.legendpc.co.nz/ copied the logo from Debian, it could be a copyright infringement, because there is no mention of the license. The violation concerns the following clause:
“The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.”
If they included that notice, they would be in compliance with the license of the DOUL-nd.
On the other hand, what if they simply created it from scratch? According to a message on the debian-legal mailing list, the swirl is just one of the defaults. Some contributors discussed whether that would be relevant to a trademark violation. In such cases what matters is whether the image is confusingly similar to an existing trademark. The point is that the swirl itself is not a trademark. What Debian has trademarked is the Debian name and, as a consequence, the logo that contains the “Debian” label.
This morning Stefano Zacchiroli (the Debian Project Leader) forwarded the issue to the SPI lawyer. It seems the idea is to send a letter to the domain owner asking them to come into compliance with the licensing terms of the Debian swirl.
I used to be one of those people who spend a lot of time every day browsing photos on flickr, trying to learn tricks from the best photographers. Soon I started uploading photos under the CC BY-NC license. To be honest, I didn’t think these photos would be less free than the software I use (libre/free software). About a year after I joined flickr, I was talking with a friend about contributing photos to libre software projects. He told me that the license I was using was not compatible with projects like Wikipedia. Why? Because it conflicts with the redistribution freedom in the definition of free licenses. If a photo with a non-commercial clause were included in Wikipedia, it would no longer be possible to sell DVD copies of Wikipedia or to distribute copies that contain advertisements. Basically, if you want to create a free cultural work using Creative Commons, you have to license your photo under either the CC Attribution license or the Attribution-ShareAlike license. The first is similar to the MIT or BSD licenses used in software; the second is copyleft.
So, on this cloudy night I wondered how many free cultural works are hosted on flickr, and I came up with some approximate numbers:
- According to blog.flickr.net there are more than 5 billion photos on flickr
- According to the search engine, fewer than 167 million use a CC license
So we can say that about 3.3 out of 100 photos use Creative Commons licenses. Let’s dig a bit deeper:
- 167 million use CC
- 38 million photos are licensed under CC BY or CC BY-SA
So roughly 0.8 out of 100 photos stored on flickr are free cultural works, while about 2.6 out of 100 use a CC license that is not considered free. I wonder how many people still don’t know how important that free contribution would be.
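The shares quoted above follow from plain division of the approximate figures. A short Python check using the post’s rough numbers is shown below. Those numbers are themselves loose bounds, so the results are only indicative.

```python
# Approximate flickr figures quoted in the post.
TOTAL_PHOTOS = 5_000_000_000  # "more than 5 billion"
CC_PHOTOS = 167_000_000       # "less than 167 million" under any CC license
FREE_CC_PHOTOS = 38_000_000   # CC BY or CC BY-SA


def per_hundred(part: int, whole: int) -> float:
    """How many photos out of every 100 fall into `part`."""
    return 100 * part / whole


cc_share = per_hundred(CC_PHOTOS, TOTAL_PHOTOS)
free_share = per_hundred(FREE_CC_PHOTOS, TOTAL_PHOTOS)
non_free_cc_share = per_hundred(CC_PHOTOS - FREE_CC_PHOTOS, TOTAL_PHOTOS)

print(f"any CC license: {cc_share:.2f} per 100")                  # 3.34
print(f"free cultural works: {free_share:.2f} per 100")           # 0.76
print(f"non-free CC licenses: {non_free_cc_share:.2f} per 100")   # 2.58
```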
Over the last three weeks I’ve been diving into cvsanaly to refresh my Python skills. My first contributions were a couple of easy fixes, but now I’m finishing the integration of ohcount, a tool that detects the license used in source code files (see my previous entries about ohcount).
This afternoon, with 35 °C outside, I’m staying very close to the air conditioning while testing and cleaning up the code before submitting the patch to my colleague carlosgc. With this new extension we get a table that relates files, revisions and licenses. See the picture below.
Ohcount is a very interesting tool: thanks to it we even realized that 31 source files of cvsanaly had incorrect headers. The new extension allows us to detect these changes. For instance, the image below shows the different licenses of one of the cvsanaly files over time. As you can see, the file had two licenses (GPL and LGPL) before revision 609. That happened because of an incorrect header that mixed GPL and LGPL text together.
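Given a table like the one the extension produces, spotting license changes is mostly a matter of walking each file’s history in revision order. The following sketch is not the cvsanaly code. It assumes rows of (file, revision number, set of detected licenses), and the function names, file path and revisions are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import FrozenSet, Iterable, List, Tuple

Row = Tuple[str, int, FrozenSet[str]]  # (file path, revision, licenses detected in it)
Change = Tuple[str, int, FrozenSet[str], FrozenSet[str]]


def license_changes(rows: Iterable[Row]) -> List[Change]:
    """Report every revision where a file's set of detected licenses changes."""
    changes: List[Change] = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # by path, then revision
    for path, history in groupby(ordered, key=itemgetter(0)):
        previous = None
        for _, revision, licenses in history:
            if previous is not None and licenses != previous:
                changes.append((path, revision, previous, licenses))
            previous = licenses
    return changes


def mixed_headers(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    """Revisions in which more than one license was detected in the same file."""
    return sorted((path, revision) for path, revision, licenses in rows if len(licenses) > 1)


# Hypothetical history of one file whose header once mixed GPL and LGPL text.
rows = [
    ("lib/example.py", 10, frozenset({"gpl", "lgpl"})),
    ("lib/example.py", 20, frozenset({"gpl", "lgpl"})),
    ("lib/example.py", 30, frozenset({"gpl"})),
]
print(license_changes(rows))
print(mixed_headers(rows))  # [('lib/example.py', 10), ('lib/example.py', 20)]
```

Files with more than one detected license are a cheap first filter for broken headers. License changes over time are the kind of fact the next step of the study would look at.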
So our plan is to use ohcount to study the licenses used in new code and to start looking for significant trends over time. I hope the code will be committed to git://git.libresoft.es/git/cvsanaly by the end of next week. In any case, drop me a mail if you are interested in it and I’ll let you know.
This afternoon I did some simple tests with Ohcount, Ohloh’s source code line counter. I did not manage to compile the 3.0 release, but the latest version from git worked properly.
For me, the most interesting feature is the ability to get the license of a source code file with the “-l” flag:
$ ./bin/ohcount -l /tmp/evince/
UPDATE: I found a bug in this version of the tool while studying the evince code: it wrongly identifies C++ (cpp) code in the libview directory. I’ve reported the bug to the main developer on SourceForge.