Posts Tagged ‘ohcount’
Is this a good question? I would say it is something that could be asked by the bridgekeeper in the film Monty Python and the Holy Grail. In our research group we started using sloccount a long time ago and it is currently integrated with some of our data mining tools like cvsanaly but after testing ohcount a couple of times during the last summer I started to though we should either improve it or replace it. Yesterday I read a post by Nicolas Betternburg on a study of PostgreSQL 9 and he mentioned he used cloc to count the SLOC (source lines of code) so I decided to test a couple of tools during this cold morning in Madrid.
The test I performed use the latest code of fusionforge and nautilus with the aim of getting the total SLOC and the SLOC for the primary language. I started with the tools cloc 1.09, sloccount 2.26 and ohcount (latest from git), but I obtained weird results from ohcount (very different from the version used in ohloh) and I discarded it. This is what I got:
SLOC in Fusionforge:
- 452508 in 27 different languages according to cloc
Primary language in Fusionforge (PHP SLOC):
- 244435 according to cloc
- 243355 according to sloccount
SLOC in Nautilus:
- 182967 in 10 different languages according to cloc
- 146595 in 5 different languages according to sloccount
Primary language in Nautilus (C SLOC):
- 134675 according to cloc
- 134680 according to sloccount
At this point I think we should test deeply cloc but I still don’t understand some numbers in the fusionforge report where it says there are 1600 PHP files and using the “file” command I only found 1355.
So .. the good think of sloccount is the time and effort estimation (but it is based on the SLOC), the great feature of ohcount is to identify the FLOSS licenses of the code and as far as I’ve seen the best SLOC counter is cloc. Do you agree?
[This post is also available in my blog at libresoft.es]
During the last three weeks I’ve been diving into cvsanaly to refresh my python skills. My first contributions have been a couple of easy fixes but now I’m finishing the integration of the ohcount tool which detects the license used in source code files ( see my previous entries about ohcount ).
This afternoon with 35ºC outside I’m very close to the air conditioning while testing and cleaning up the code before submitting the patch to my colleague carlosgc. With this new extension we get a table which relates files, revisions and licenses. See the picture below.
Ohcount is a very interesting tool, we even realized we had incorrect headers in 31 source files of cvsanaly. The new extension allow us to detect these changes. For instance the image below reflects the different licenses over time on one of the cvsanaly files, as you can see the file had two licenses (gpl and lpgl) before revision 609. That happened due to a incorrect header which mixed gpl and lgpl text together.
So, our plan is to integrate ohcount to study the licenses used in the fresh code and start studying if there are significant facts over time. I hope the code will be committed to git://git.libresoft.es/git/cvsanaly by the end of next week, in any case drop me a mail if you are interested on it and I’ll let you know.
This afternoon I did some simple tests with Ohcount which is the Ohloh’s source code line counter. I did not manage to compile the 3.0 release, but the latest version downloaded from git worked properly.
For me, the most interesting part is the possibility to get the license from a source code file with the flag “-l”
$ ./bin/ohcount -l /tmp/evince/
UPDATE I’ve found a bug in this version of the tool while studying the evince code. It identifies cpp code in the libview directory which is false. I’ve reported the bug to the main developer in sourceforge.