Libre Software People's Front

don't confuse it with People's Front of Open Source

Posts Tagged ‘ninka

Analysis of reused code using FLOSS tools

leave a comment »

Last week we attended Linux Tag in Berlin to give two talks. First one was about identifying reused code between two FLOSS projects and it was given by me. The second one explained the importance of studying FLOSS software communities and was given by Daniel Izquierdo.

The main aim of my presentation was to show that it is possible (and easy!) to get very interesting results about the shared code between two FLOSS projects using FLOSS tools; the ones we used in this case were: CCFinder, Cloc, Ninka and Grep. The study identified not only the common code but also the possible license issues that were found. These kind of studies can be interesting from different points of view, I’ve summed them up in the following questions:

  • how different are two software projects?
  • is it feasible to propose a merge of the code?
  • how is the derivate project using the original code?
  • are the licenses being respected? what about the copyright?
  • is the new project using new licenses that could be interested for the team that created the original work? are they improving the code?
  • what changes performed the second team on the original code?
  • is your source code being adopted by a certain community?

The presentation that was presented is available here.

[This entry is part of the work I do in Bitergia and it is also available here]

Written by sanacl

May 29, 2012 at 11:05 am

Finding code clones between two libre software projects

leave a comment »

Last month I’ve been working in the creation of a report with the aim of finding out code clones between two libre software projects. The method we used was basically the one that was detailed in the paper Code siblings: Technical and Legal Implications by German, D., Di Penta M., Gueheneuc Y. and Antoniol, G.

It is an interesting case and I’m pretty sure this kind of reports will be more and more interesting for entities that publish code using a libre software license. Imagine you are part of a big libre software project and your copyright and even money is there, it would be very useful to you knowing whether a project is using your code and respecting your copyright and the rights you gave to the users with the license. With the aim of identifying these scenarios we did in our study the following:

  • extraction of clones with CCFinderX
  • detection of license with Ninka
  • detection of the copyright with shell scripts

The CCFinderX tool used in the first phase gives you information about common parts of the code, it detects a common set of tokens (by default it is 50) between two files, this parameter should be changed depending on what it is being looked for. In the following example the second and third column contain information about the file and the common code. The syntax is (id of the file).(source file tokens) so the example shows that the file with id 1974 contains common code with files with id 11, 13 and 14.

...
clone_pairs {
19108 11.85-139 1974.70-124
19108 13.156-210 1974.70-124
19108 14.260-314 1974.70-124
12065 17.1239-1306 2033.118-185
12065 17.1239-1306 2033.185-252
12065 17.1239-1306 2033.252-319
12065 17.1239-1306 2141.319-386
...

In the report we did we only wanted to estimate the percent of code used from the “original” project in the derivative work, but there are some variables that are necessary to take into account. First, code clones can appear among the files of the same project (btw this is clear sign of needing refactorization). Second, different parts of a file can have clones in different files (a 1:n relationship) in both projects. The ideal solution would be to study file by file the relationship with others and to remove the repeated ones.

Once the relationship among files is created is the turn of the license and copyright detection. In this phase the method just compares the output of the two detectors and finally you get a matrix where it is possible to detect whether the copyright holders were respected and the license was correctly used.

Daniel German’s team found interesting things in their study of the FreeBSD and Linux kernels. They found GPL code in FreeBSD in the xfs file system. The trick to distribute this code under a BSD license is to distribute it disabled (is not compiled into FreeBSD) and let the user the election of compiling it or not. If a developer compiles the kernel with xfs support, the resulting kernel must be distributed under the terms of the GPLx licence.

[This entry is part of the work I do in LibreSoft and it is also available in my blog at libresoft.es]

Written by sanacl

May 11, 2011 at 7:49 pm