Libre Software People's Front

don't confuse it with People's Front of Open Source

Posts Tagged ‘licenses’

Analysis of reused code using FLOSS tools


Last week we attended LinuxTag in Berlin to give two talks. The first one, which I gave, was about identifying reused code between two FLOSS projects. The second one, given by Daniel Izquierdo, explained the importance of studying FLOSS communities.

The main aim of my presentation was to show that it is possible (and easy!) to get very interesting results about the code shared between two FLOSS projects using only FLOSS tools; in this case we used CCFinder, Cloc, Ninka and Grep. The study identified not only the common code but also possible license issues. This kind of study can be interesting from different points of view, which I’ve summed up in the following questions:

  • how different are two software projects?
  • is it feasible to propose a merge of the code?
  • how is the derivative project using the original code?
  • are the licenses being respected? what about the copyright?
  • is the new project using new licenses that could be interesting for the team that created the original work? are they improving the code?
  • what changes did the second team make to the original code?
  • is your source code being adopted by a certain community?

The presentation is available here.

[This entry is part of the work I do in Bitergia and it is also available here]


Written by sanacl

May 29, 2012 at 11:05 am

Finding code clones between two libre software projects


Over the last month I’ve been working on a report aimed at finding code clones between two libre software projects. The method we used is basically the one detailed in the paper Code siblings: Technical and Legal Implications by German, D., Di Penta, M., Gueheneuc, Y. and Antoniol, G.

It is an interesting case, and I’m pretty sure this kind of report will become more and more interesting for entities that publish code under a libre software license. Imagine you are part of a big libre software project in which you have invested your copyright and even your money: it would be very useful to know whether another project is using your code, and whether it respects your copyright and the rights you gave to users with the license. To identify these scenarios, our study did the following:

  • extraction of clones with CCFinderX
  • detection of license with Ninka
  • detection of the copyright with shell scripts

The CCFinderX tool used in the first phase gives you information about the common parts of the code. It reports a clone when two files share a sequence of tokens of at least a given length (50 by default); this parameter should be changed depending on what you are looking for. In the following example, the second and third columns contain information about the file and the common code. The syntax is (id of the file).(source file tokens), so the example shows that the file with id 1974 shares code with the files with ids 11, 13 and 14.

clone_pairs {
19108 11.85-139 1974.70-124
19108 13.156-210 1974.70-124
19108 14.260-314 1974.70-124
12065 17.1239-1306 2033.118-185
12065 17.1239-1306 2033.185-252
12065 17.1239-1306 2033.252-319
12065 17.1239-1306 2141.319-386
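The excerpt above can be turned into a file-to-file relationship with a few lines of Python. This is only a sketch: it assumes every pair line has exactly the three columns described in the post, it treats the first column as an opaque group number, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line layout, following the description in the post:
#   <first column> <file id>.<first token>-<last token> <file id>.<first token>-<last token>
PAIR_RE = re.compile(r"^(\d+)\s+(\d+)\.(\d+)-(\d+)\s+(\d+)\.(\d+)-(\d+)$")


def parse_clone_pairs(text):
    """Return a list of (group, left, right); left and right are (file_id, start, end)."""
    pairs = []
    for line in text.splitlines():
        match = PAIR_RE.match(line.strip())
        if match is None:
            continue  # e.g. the "clone_pairs {" header or a blank line
        nums = [int(value) for value in match.groups()]
        pairs.append((nums[0], tuple(nums[1:4]), tuple(nums[4:7])))
    return pairs


def files_sharing_code(pairs):
    """Map each file id to the set of file ids it shares a clone with."""
    related = defaultdict(set)
    for _group, left, right in pairs:
        related[left[0]].add(right[0])
        related[right[0]].add(left[0])
    return related


SAMPLE = """clone_pairs {
19108 11.85-139 1974.70-124
19108 13.156-210 1974.70-124
19108 14.260-314 1974.70-124
12065 17.1239-1306 2033.118-185
12065 17.1239-1306 2033.185-252
12065 17.1239-1306 2033.252-319
12065 17.1239-1306 2141.319-386
"""

pairs = parse_clone_pairs(SAMPLE)
related = files_sharing_code(pairs)
print(sorted(related[1974]))
```

Looking up `related[1974]` gives the files that share code with file 1974, which is the reading of the excerpt described above.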

In our report we only wanted to estimate the percentage of code from the “original” project that is used in the derivative work, but there are some variables that must be taken into account. First, code clones can appear among the files of the same project (btw, this is a clear sign that refactoring is needed). Second, different parts of a file can have clones in different files (a 1:n relationship) in both projects. The ideal solution would be to study, file by file, the relationship with the others and to remove the repeated clones.
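One way to avoid counting repeated clones twice is to merge the overlapping token ranges of each file before adding them up. The sketch below is not the method used in the report; it assumes the token ranges are half-open (the end token is not included), and the total file size is a made-up number used only to show the percentage step.

```python
from collections import defaultdict


def merge_ranges(ranges):
    """Merge overlapping or touching (start, end) token ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(item) for item in merged]


def cloned_token_count(ranges):
    """Number of tokens covered by at least one clone (assumes half-open ranges)."""
    return sum(end - start for start, end in merge_ranges(ranges))


# (file id, first token, last token), taken from the clone_pairs excerpt in the post
fragments = [
    (17, 1239, 1306), (17, 1239, 1306), (17, 1239, 1306), (17, 1239, 1306),
    (2033, 118, 185), (2033, 185, 252), (2033, 252, 319),
    (2141, 319, 386),
]

ranges_by_file = defaultdict(list)
for file_id, start, end in fragments:
    ranges_by_file[file_id].append((start, end))

cloned = {file_id: cloned_token_count(r) for file_id, r in ranges_by_file.items()}

# Hypothetical size of file 17 in tokens; the real value would come from the tokenizer.
TOTAL_TOKENS_FILE_17 = 1340
percent_17 = 100 * cloned[17] / TOTAL_TOKENS_FILE_17
print(cloned, percent_17)
```

File 17 appears four times in the excerpt with the same range, so without merging its cloned tokens would be counted four times.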

Once the relationships among files have been established, it is the turn of license and copyright detection. In this phase the method simply compares the output of the two detectors for each pair of related files. The result is a matrix that shows whether the copyright holders were respected and whether the license was used correctly.
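The comparison step could look roughly like the sketch below. The file names, licenses and holders are invented, and the dictionaries stand in for whatever the license and copyright detectors actually produce; their real output formats are not reproduced here. A different license in the derived file is only a flag for manual review, since it may still be compatible with the original one.

```python
def check_pair(original, derived):
    """Compare detector results for one (original file, derived file) pair."""
    missing = sorted(set(original["holders"]) - set(derived["holders"]))
    return {
        "same_license": original["license"] == derived["license"],
        "missing_holders": missing,
    }


# Hypothetical detector results, keyed by file path.
original_files = {
    "orig/io.c": {"license": "GPLv2+", "holders": ["Alice Example"]},
    "orig/net.c": {"license": "BSD3", "holders": ["Bob Example"]},
}
derived_files = {
    "fork/io.c": {"license": "GPLv2+", "holders": ["Alice Example", "Fork Team"]},
    "fork/net.c": {"license": "GPLv2+", "holders": ["Fork Team"]},
}

# Pairs of related files, as obtained from the clone detection phase.
clone_pairs = [("orig/io.c", "fork/io.c"), ("orig/net.c", "fork/net.c")]

matrix = {
    (orig, fork): check_pair(original_files[orig], derived_files[fork])
    for orig, fork in clone_pairs
}

for pair, result in matrix.items():
    print(pair, result)
```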

Daniel German’s team found interesting things in their study of the FreeBSD and Linux kernels. They found GPL code in FreeBSD, in the xfs file system. The trick that allows this code to be shipped with a BSD-licensed kernel is to distribute it disabled (it is not compiled into FreeBSD) and to leave the choice of compiling it to the user. If a developer compiles the kernel with xfs support, the resulting kernel must be distributed under the terms of the GPL.

[This entry is part of the work I do in LibreSoft and it is also available in my blog at]

Written by sanacl

May 11, 2011 at 7:49 pm

Finding free cultural works on flickr


I used to be one of those people who spend a lot of time every day browsing photos on flickr, trying to learn tricks from the best photographers. Soon I started to upload photos licensed under CC by-nc and, to be honest, I didn’t think these photos would be less free than the software I use (libre/free software). About a year after I joined flickr, I was talking with a friend about contributing photos to libre software projects, and he told me that the license I was using was not compatible with projects like wikipedia. Why? Because it conflicts with one of the freedoms in the definition of free licenses, the one about redistribution. If wikipedia included a photo with the non-commercial clause in its license, it wouldn’t be possible to sell DVD copies of wikipedia and it couldn’t contain advertisements. Basically, if you want to create a free cultural work using Creative Commons you have to license your photo under the CC Attribution license or the Attribution-ShareAlike license. The first one is similar to the MIT or BSD licenses used in software; the second one is copyleft.

So, this cloudy night I wondered how many free cultural works are hosted on flickr, and I obtained some approximate numbers:

  • According to there are more than 5 billion photos in flickr
  • According to the search engine fewer than 167 million are using a CC license
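The share of CC-licensed photos follows directly from these two figures. Since the first is a lower bound ("more than") and the second an upper bound ("fewer than"), the result is a rough upper bound rather than an exact value:

```python
total_photos = 5_000_000_000  # "more than 5 billion" photos on flickr
cc_photos = 167_000_000       # "fewer than 167 million" under a CC license

# Photos under any CC license per 100 photos hosted.
cc_per_hundred = 100 * cc_photos / total_photos
print(f"{cc_per_hundred:.1f} out of 100")
```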

So far we can say that about 3.3 out of 100 photos are using Creative Commons licenses. Let’s dig a bit deeper:

[Image: Flickr CC Licenses]

So, around 1 out of 100 photos stored on flickr is a free cultural work, and 2 out of 100 use a CC license that is not considered free. I wonder how many people do not yet know how important that free contribution would be.

Written by sanacl

November 9, 2010 at 11:43 pm

Mining software licenses with cvsanaly and ohcount


During the last three weeks I’ve been diving into cvsanaly to refresh my Python skills. My first contributions were a couple of easy fixes, but now I’m finishing the integration of the ohcount tool, which detects the license used in source code files (see my previous entries about ohcount).

This afternoon with 35ºC outside I’m very close to the air conditioning while testing and cleaning up the code before submitting the patch to my colleague carlosgc. With this new extension we get a table which relates files, revisions and licenses. See the picture below.

Ohcount is a very interesting tool; thanks to it we even realized that we had incorrect headers in 31 source files of cvsanaly. The new extension allows us to detect these changes. For instance, the image below shows the licenses of one of the cvsanaly files over time. As you can see, the file had two licenses (gpl and lgpl) before revision 609. That happened because of an incorrect header that mixed gpl and lgpl text together.
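Once files, revisions and licenses are related, spotting a mixed header like that one is a simple grouping exercise. The rows below are invented for illustration and do not reflect the real schema of the cvsanaly extension; a real query would read them from the database.

```python
from collections import defaultdict

# Hypothetical (file, revision, license) rows; not the real table layout.
rows = [
    ("example.py", 607, "gpl"),
    ("example.py", 607, "lgpl"),
    ("example.py", 608, "gpl"),
    ("example.py", 608, "lgpl"),
    ("example.py", 609, "gpl"),
    ("example.py", 610, "gpl"),
]


def mixed_license_revisions(rows):
    """Return the (file, revision) pairs in which more than one license was detected."""
    licenses = defaultdict(set)
    for path, revision, license_name in rows:
        licenses[(path, revision)].add(license_name)
    return {key: sorted(found) for key, found in licenses.items() if len(found) > 1}


mixed = mixed_license_revisions(rows)
for (path, revision), found in sorted(mixed.items()):
    print(path, revision, found)
```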

So, our plan is to integrate ohcount to study the licenses used in new code and to start studying whether there are significant changes over time. I hope the code will be committed to git:// by the end of next week; in any case, drop me a mail if you are interested in it and I’ll let you know.

Written by sanacl

August 27, 2010 at 3:53 pm

Ohcount, Ohloh’s line counter


This afternoon I did some simple tests with Ohcount, Ohloh’s source code line counter. I did not manage to compile the 3.0 release, but the latest version downloaded from git worked properly.

With the default parameters it is similar to sloccount: it gives more information about the code, but nothing about effort estimation.

For me, the most interesting part is the possibility of getting the license of each source code file with the flag “-l”:

$ ./bin/ohcount -l /tmp/evince/
lgpl evince-document.h
gpl ev-document-model.c
gpl ev-annotation-window.h
gpl ev-stock-icons.c
gpl ev-view-presentation.c
gpl ev-job-scheduler.h
gpl ev-document-model.h
lgpl ev-timeline.c
gpl ev-page-cache.h
gpl ev-jobs.c
lgpl ev-transition-animation.c
gpl ephy-zoom-control.h
gpl ev-previewer.c
gpl ev-previewer-window.c
gpl ev-previewer-window.h
gpl evince-thumbnailer.c
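Output in this shape is easy to summarize. The sketch below assumes each line is a license label followed by a file name, as in the listing above, and counts how many files carry each license; the function name is made up.

```python
from collections import Counter

# Output of "ohcount -l" copied from the listing above.
OHCOUNT_OUTPUT = """\
lgpl evince-document.h
gpl ev-document-model.c
gpl ev-annotation-window.h
gpl ev-stock-icons.c
gpl ev-view-presentation.c
gpl ev-job-scheduler.h
gpl ev-document-model.h
lgpl ev-timeline.c
gpl ev-page-cache.h
gpl ev-jobs.c
lgpl ev-transition-animation.c
gpl ephy-zoom-control.h
gpl ev-previewer.c
gpl ev-previewer-window.c
gpl ev-previewer-window.h
gpl evince-thumbnailer.c
"""


def tally_licenses(output):
    """Count files per license, assuming '<license> <file name>' on each line."""
    counts = Counter()
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            counts[parts[0]] += 1
    return counts


counts = tally_licenses(OHCOUNT_OUTPUT)
for license_name, total in counts.most_common():
    print(license_name, total)
```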

This tool looks promising. I’m going to test it thoroughly so that I can propose using it in Melquiades (flossmetrics) and in the FusionForge metrics plugin that we are developing these days.

UPDATE: I’ve found a bug in this version of the tool while studying the evince code. It reports cpp code in the libview directory, which is wrong. I’ve reported the bug to the main developer on sourceforge.

Written by sanacl

July 6, 2010 at 8:28 pm