Herodotus: Difference between revisions

(Herodotus is inspired by the [[#The core: herodotos tool|herodotos]] tool, and its implementation is partly based on it. Note that the name of this tool and the name of our project are spelled differently. Named after [https://en.wikipedia.org/wiki/Herodotus Herodotus].)


==Introduction==
;Which computed or external meta-information for a package is tracked:
:*'''Analytic facts''' (computed from the "internal" content of package releases):


=====herodotos in p8=====
* {{(+)|p8, test-only [http://git.altlinux.org/tasks/243195/ task #243195]}} {{pkg|herodotos}}
: (or: test-only [http://git.altlinux.org/tasks/257760/ task #257760] {{pkg|herodotos}} with the optional dependency on {{pkg|gumtree}} excluded because it is missing in p8, so that one can install it)
: Requires:
:* {{(+)|p8}} {{pkg|ocaml-bolt}} (Upstream: [http://bolt.x9c.fr/downloads.html downloads]; <code>darcs get http://bolt.x9c.fr/</code>)
:* {{(+)|p8}} {{pkg|ocaml-parmap}}
:* {{(-)|p8 (works, but not quite ready)}} {{pkg|ocaml-postgresql}}
: Requires (for counting LOC; this dependency became optional in recent herodotos versions):
:* {{(+)|p8}} {{pkg|sloccount}}
: Needed mainly for reproducing the author's experiments with Linux sources as a way of testing {{pkg|herodotos}}:
:* {{(+)|p8}} {{pkg|coccinelle}} with support for embedded Python
: Needed optionally for better correlation:
:* {{(-)|p8}} {{pkg|gumtree}}
:: Requires:
::* {{(+)|p8}} {{pkg|cgum}}


=====herodotos in p9=====
* {{(-)|p9}} {{pkg|herodotos}}
: Requires:
:* {{(-)|p9, test-only [http://git.altlinux.org/tasks/243204/ task #243204]}} {{pkg|ocaml-bolt}} (Upstream: [http://bolt.x9c.fr/downloads.html downloads]; <code>darcs get http://bolt.x9c.fr/</code>)
:* {{(+)|p9}} {{pkg|ocaml-parmap}}
:* {{(+)|p9 (another version, ok?)}} {{pkg|ocaml-postgresql}}
: Requires (for counting LOC; this dependency became optional in recent herodotos versions):
:* {{(+)|p9}} {{pkg|sloccount}}
: Needed mainly for reproducing the author's experiments with Linux sources as a way of testing {{pkg|herodotos}}:
:* {{(+)|p9}} {{pkg|coccinelle}} with support for embedded Python
: Needed optionally for better correlation:
:* {{(+)|p9}} {{pkg|gumtree}}
:: Requires:
::* {{(+)|p9}} {{pkg|cgum}}


One can install herodotos (for p8) from [http://git.altlinux.org/tasks/214330/ task 214330].
=====herodotos in Sisyphus=====
* {{(-)|sisyphus}} {{pkg|herodotos}}
: Requires:
:* {{(-)|sisyphus, test-only [http://git.altlinux.org/tasks/243205/ task #243205]}} {{pkg|ocaml-bolt}} (Upstream: [http://bolt.x9c.fr/downloads.html downloads]; <code>darcs get http://bolt.x9c.fr/</code>)
:* {{(+)|sisyphus}} {{pkg|ocaml-parmap}}
:* {{(+)|sisyphus (another version, ok?)}} {{pkg|ocaml-postgresql}}
: Requires (for counting LOC; this dependency became optional in recent herodotos versions):
:* {{(+)|sisyphus}} {{pkg|sloccount}}
: Needed mainly for reproducing the author's experiments with Linux sources as a way of testing {{pkg|herodotos}}:
:* {{(-)|sisyphus, test-only [http://git.altlinux.org/tasks/243259/ task #243259]}} {{pkg|coccinelle}} with support for embedded Python
: Needed optionally for better correlation:
:* {{(+)|sisyphus}} {{pkg|gumtree}}
:: Requires:
::* {{(+)|sisyphus}} {{pkg|cgum}}


====How to try herodotos====
 
If you want to try the ''herodotos'' tool, try to reproduce the authors' work https://github.com/coccinelle/faults-in-linux . (It is more recent; the older work http://coccinelle.lip6.fr/papers/aosd10.pdf with their [http://coccinelle.lip6.fr/aosd10/ data and configuration] is not suitable for the current herodotos 0.8+ version.)
I've adapted their herodotos config files and made it a [[Gear]] repo: http://git.altlinux.org/people/imz/public/faults-in-Linux.git , so that one can easily pass it to {{prg|hasher}} and do the processing in an isolated, easily reproducible [[hasher]] environment.
 
* First, prepare: clone my repo and set up the sources for APT:
 
$ git clone --depth=20 git://git.altlinux.org/people/imz/public/faults-in-Linux.git
$ cd faults-in-Linux
$ apt-repo  --hsh-apt-config=/home/imz/.hasher/p8/apt.conf add 214330
 
:Here is what the APT sources config for the hasher should be like (and our current working dir):
 
$ apt-repo  --hsh-apt-config=/home/imz/.hasher/p8/apt.conf
rpm [updates] file:/ALT/p8 x86_64 classic
rpm [updates] file:/ALT/p8 noarch classic
rpm http://git.altlinux.org repo/214330/x86_64 task
$ pwd
/space/home/imz/wip/2018-10-herodotos-cppcheck/faults-in-Linux
 
* Then, we execute the authors' processing rules under the control of my [http://git.altlinux.org/people/imz/public/faults-in-Linux.git?p=faults-in-Linux.git;a=blob;f=.gear/faults-in-Linux.spec;h=c95bec62f0357de8d95886a338073ae088159bbe;hb=HEAD .gear/faults-in-Linux.spec]-file from the [http://git.altlinux.org/people/imz/public/faults-in-Linux.git?p=faults-in-Linux.git;a=shortlog;h=refs/heads/master master] branch. It automatically gets and checks out various revisions of the linux sources, so you must have enough space to hold them:
 
$ export share_network=1
$ gear-hsh  --apt-config=/home/imz/.hasher/p8/apt.conf --without-stuff 2>&1 | tee hsh.log.1
 
:It stops after the step of applying the static analyzer ({{prg|coccinelle}}) to each version of the sources (linux). The results are saved at {{path|/usr/src/HERODOTOS/}} (inside hasher). I've copied them and saved in [http://git.altlinux.org/people/imz/public/faults-in-Linux.git?p=faults-in-Linux.git;a=commitdiff;h=ad458b0c2b24e92cc09c4f2043e6021bb0d2af05 commit ad458b0c2] in the [http://git.altlinux.org/people/imz/public/faults-in-Linux.git?p=faults-in-Linux.git;a=shortlog;h=refs/heads/EXPERI/imz2/apply-analyzer-results EXPERI/imz2/apply-analyzer-results] branch, so that you can look and get an idea what they look like:
 
:* the individual '''per-version {{path|*.orig.org}} files'''.
 
{{path|/usr/src/HERODOTOS/}} is used as the place to cache the analyzed sources and to save the (intermediate and final) results, so it won't be cleaned if you run {{cmd|1=<nowiki>gear --hasher | hsh-rebuild</nowiki>}} again (after editing the Git repo with the Makefiles, configs etc). (TODO: Unfortunately, the automatically filled {{path|faults/.projects_study.hc}} file is not relocatable in a similar manner.)
 
* The next step (correlation of the warnings between versions by {{prg|herodotos}}) has to be run manually (because I wanted to be able to commit the results of the previous step first):
 
hsh-shell --mount=/proc,/dev/pts
cd /usr/src/RPM/BUILD/faults-in-Linux-20181023/faults/
make correl
 
:or as a single command:
 
hsh-run --mount=/proc -- sh -c 'cd /usr/src/RPM/BUILD/faults-in-Linux-20181023/faults/ && make correl'
 
:I saved the results in [http://git.altlinux.org/people/imz/public/faults-in-Linux.git?p=faults-in-Linux.git;a=commitdiff;h=c3f5e56dd7ecc9a9c7bc70379bc05f1a0c6be1b6 commit c3f5e56dd7e] in the [http://git.altlinux.org/people/imz/public/faults-in-Linux.git?p=faults-in-Linux.git;a=shortlog;h=refs/heads/EXPERI/imz2/correl-gnudiff-results EXPERI/imz2/correl-gnudiff-results] branch, so that you can look and get an idea what they look like:
 
:* some non-empty '''{{path|*.correl.org}} files with undecided possible correlations''' (marked as <code>TODO</code>);
:* the '''{{path|*.new.org}} files with merged warnings from all versions'''. It is to be decided whether each of them (marked as <code>TODO</code> initially) is a real error or a false warning.
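
To see how much manual triage is still pending in such files, one can count the entries per status keyword. The following is only a minimal sketch: the function name <code>count_by_status</code> is made up for this page, and it assumes that every warning is an org-mode headline of the form <code>* STATUS [[view:...</code>, which may not cover everything that herodotos writes.

```shell
#!/bin/sh
# Count org-mode headlines per status keyword (e.g. TODO, BUG).
# Assumption: every warning is a headline of the form "* STATUS [[view:...".
count_by_status() {
    awk '/^\*+ [A-Z]+ / { n[$2]++ } END { for (s in n) print s, n[s] }' "$@" | sort
}

# Demo on a made-up file; real input would be a *.new.org file.
demo=$(mktemp)
printf '%s\n' \
    '* TODO [[view:/src/a.c::face=ovl-face1::linb=1::colb=1::cole=2][first]]' \
    '* BUG [[view:/src/b.c::face=ovl-face1::linb=2::colb=1::cole=2][second]]' \
    '* TODO [[view:/src/c.c::face=ovl-face1::linb=3::colb=1::cole=2][third]]' >"$demo"
count_by_status "$demo"
rm -f "$demo"
```

Several files can be passed at once, in which case the counts are summed over all of them.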
 
(In this example, I [http://git.altlinux.org/people/imz/public/faults-in-Linux.git?p=faults-in-Linux.git;a=commitdiff;h=8e6adc3941e8b856a7c33ddfe234e3155b71b1e7 made] {{prg|herodotos}} use the {{cmd|--diff gnudiff}} option, because the default and better {{cmd|--diff hybrid}} mode requires {{prg|gumtree}} and doesn't work correctly if it is absent.)


A follow-up scenario would be to first mark some warnings as checked and then add another version of the project into consideration (by editing the pattern in {{path|faults/study.hc.base}}) and see how the warnings concerning the new version are merged with the marks for the old versions. Let's explore this.
See a concrete example: [[/How to try herodotos|/How to try herodotos]].


===coccinelle support===


==Usage==
 
===Herodotus as server===
Think of the work with Herodotus as a server (which is in some respect similar to [[girar]]).
 
The main task of the server is to store the analysis of a package (for each known package) in a "normal" form, i.e., after having done the best effort of correlating the analyses for each known version (release) of the package.
 
The stored analysis of a package can be updated upon request with new information. Most commonly, the new information is:
 
* a new analysis of a specific new version (release) of the package;
* or additional manual correlations between warnings from the old analyses.
 
After getting new information, the Herodotus server must "normalize" it (i.e., make the best effort to automatically correlate) and save.
 
The Herodotus server uses a Git repository for each package as a way to store the current and past states of the analysis of this package.
 
====Action: "update" (performed by the server)====
 
(Applicable for each individual package. Parameter: a branch name, which is to be updated.)
 
;Result: a "normal" (automatically correlated) analysis in the top commit of the specified branch (on the server).
;Input: a Git commit with analysis results (and optional correlations) in herodotos format. (The old head of the Git branch on the server should be an ancestor of this new Git commit.)
 
The Git commit that comes as an input to the "update" action can be created by the "analyze" action described below.
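
The ancestry requirement on the input commit can be checked with plain Git before the server accepts an update. This is a minimal sketch, not part of herodotus-server: the function name <code>is_fast_forward</code> and the ref names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-check for the "update" action (not part of herodotus-server).
# Succeeds only if OLD_HEAD is an ancestor of NEW_COMMIT, i.e. the branch
# could be fast-forwarded to the submitted commit.
is_fast_forward() {
    old_head=$1
    new_commit=$2
    git merge-base --is-ancestor "$old_head" "$new_commit"
}

# Usage (inside the per-package repository; the names are placeholders):
#   is_fast_forward refs/heads/c7 FETCH_HEAD && echo "acceptable input"
```

Note that <code>git merge-base --is-ancestor</code> also succeeds when both arguments name the same commit, so re-submitting the current head passes this check.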
 
====Action: "analyze" (performed anywhere)====
 
(Applicable for each version (release) of each individual package.)
 
;Result: a Git commit containing an analysis of the specified version (release) of the package in herodotos format.
;Input: a herodotos config with a specification of the version (release) of the package to be analyzed.
 
Commonly, one creates the commit with the analysis of a new version (release) on top of a previous commit with old analyses.
 
====Manual action: edit the analyses or correlations in herodotos format====
 
A special Emacs mode (extension to org-mode) can be used to do this conveniently, side-by-side with exploring the actual corresponding source code.
 
===herodotus-server commands===
 
[http://git.altlinux.org/people/imz/packages/herodotus-server.git herodotus-server.git] contains scripts and related data that represent the model of Herodotus as a server.
 
* {{cmd|herodotus-update}}
* {{cmd|herodotus-helper-analyze-each}} (automatically runs cppcheck for each known version (release) of a package)
 
====herodotus-helper-analyze-each====
 
herodotus-helper-analyze-each STEM PKGNAME
 
automatically runs cppcheck for each known version (release) of a package (and saves the results).
 
It is to be run inside a Git repository. The directory may be empty; it will be re-initialized as a Git repo.
 
<code>STEM</code> is the stem of the Git branch name, which will be used to save the results. (At the same time, it is used to get the sources from this branch in the ALT archive.)
 
<code>PKGNAME</code> is the name of the package to be analyzed. (The srpms are taken from the ALT archive.)
 
=====Example of a one-shot use of herodotus-helper-analyze-each=====
 
Example: analyze each release in the history of package "anacron" in ALT c7 branch:
 
$ mkdir anacron
$ cd anacron/
$ /home/imz/wip/2018-10-herodotos-cppcheck/herodotus-server/herodotus-helper-analyze-each c7 anacron
 
======What herodotus-helper-analyze-each has done internally======
 
(It has invoked various tools in a way similar to how the author of herodotos used it to analyze the Linux kernel sources.)
 
It has looked up the list of the known versions (releases) in {{path|c7/index/src/a/anacron/d-t-s-evr.list}} under {{path|/usr/src/HERODOTOS/INPUT/anacron/../ALT/repo/}} (the path is constructed according to the configuration in {{path|study.hc.base}} from {{pkg|herodotus-server.git}}); here is what it has seen there (so that you better understand what has been going on):
 
$ cat /usr/src/HERODOTOS/INPUT/anacron/../ALT/repo/c7/index/src/a/anacron/d-t-s-evr.list
1381657390 105852 - 1:2.3-alt6
$
 
It has cached the unpacked/"prepared" sources of each of the releases under {{path|/usr/src/HERODOTOS/INPUT/anacron/}}
 
$ ls -l /usr/src/HERODOTOS/INPUT/anacron/
total 0
drwxrwxrwx 5 imz imz 260 дек 21  2018 anacron-1@2.3-alt6
$
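
The two lookups above (where the release list lives, and how a release is named in the cache) can be sketched as shell helpers. Both functions are hypothetical and the rules are only inferred from the single "anacron" example: it is an assumption that the one-letter index directory is the first letter of the package name, that the last field of a list line is the EVR, and that <code>:</code> is replaced by <code>@</code> in the directory name.

```shell
#!/bin/sh
# Hypothetical helpers; the layout rules are inferred from the single
# "anacron" example above and may not hold for every package.

# Path of the release list for PKG in branch STEM, relative to the cache root.
# Assumption: the one-letter index directory is the first letter of PKG.
evr_list_path() {
    projects=$1 stem=$2 pkg=$3
    first=$(printf '%s' "$pkg" | cut -c1)
    printf '%s/%s/../ALT/repo/%s/index/src/%s/%s/d-t-s-evr.list\n' \
        "$projects" "$pkg" "$stem" "$first" "$pkg"
}

# Turn the lines of such a list (read from stdin) into cache directory names.
# Assumption: the last field is the EVR, and ":" becomes "@" in the name.
evr_list_to_dirs() {
    awk -v pkg="$1" 'NF >= 4 { evr = $NF; gsub(/:/, "@", evr); print pkg "-" evr }'
}

evr_list_path /usr/src/HERODOTOS/INPUT c7 anacron
printf '1381657390\t105852\t-\t1:2.3-alt6\n' | evr_list_to_dirs anacron
```

For the example input, the first call reproduces the path used with <code>cat</code> above, and the second prints the directory name shown by <code>ls</code>.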
 
(One kind of trick or another is needed if you want to share the cache between multiple users of the system. TODO: describe them in detail.)
 
Then it has run the analyzer on each of the versions (releases) of the sources; in this case, it has been cppcheck (according to the configuration in {{path|study.hc.base}} from {{pkg|herodotus-server.git}}).
 
======Results of herodotus-helper-analyze-each======
 
It has made a Git commit with the results:
 
$ git ls-files
.depend
.depend.erase
.depend.patterns
.projects_study.hc
RESULTS/anacron/.depend.Linux
RESULTS/anacron/Linux_cppcheck.orig.org
RESULTS/anacron/anacron-1@2.3-alt6/Linux_cppcheck.log
RESULTS/anacron/anacron-1@2.3-alt6/Linux_cppcheck.orig.org
study.hc
$
 
It has saved the analysis (done by cppcheck) of each release, namely, of the single known release in this case:
 
$ cat RESULTS/anacron/anacron-1\@2.3-alt6/Linux_cppcheck.orig.org
* TODO [[view:/usr/src/HERODOTOS/INPUT/anacron/anacron-1@2.3-alt6/anacron/matchrx.c::face=ovl-face1::linb=52::colb=1::cole=2][Memory leak: sub_offsets]]
$
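
Outside Emacs, an entry like the one above can be turned into a compiler-style <code>STATUS FILE:LINE: MESSAGE</code> line. This is a rough sketch only: the function name <code>org_warnings</code> is made up, and the pattern assumes the <code>view:</code> link layout seen in this one example (no colons in the file path, a <code>linb=</code> field holding the line number).

```shell
#!/bin/sh
# Hypothetical converter from herodotos-style org entries (on stdin) to
# "STATUS FILE:LINE: MESSAGE" lines. Lines that do not match are skipped.
# Assumptions: the file path contains no ":", and "linb=" is the line number.
org_warnings() {
    sed -n 's/^\*\{1,\} \([A-Z]\{1,\}\) \[\[view:\([^:]*\)::.*linb=\([0-9]\{1,\}\).*]\[\(.*\)]]$/\1 \2:\3: \4/p'
}

printf '%s\n' \
    '* TODO [[view:/usr/src/HERODOTOS/INPUT/anacron/anacron-1@2.3-alt6/anacron/matchrx.c::face=ovl-face1::linb=52::colb=1::cole=2][Memory leak: sub_offsets]]' |
    org_warnings
```

The resulting format is the one that many editors and tools already understand for jumping to a source position.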
 
(TODO: of course, analyzing deleted packages seems like a strange behavior, so it should be fixed. But we show it here nevertheless just for the demonstration of the structure of the results.)
 
All results from all releases will be put together in a single common file after correlation (the "update" action on the Herodotus server), which we haven't done yet; so, it is empty for now:
 
$ wc -l RESULTS/anacron/Linux_cppcheck.orig.org
0 RESULTS/anacron/Linux_cppcheck.orig.org
$
 
====herodotus-update====
 
===Test example: correlating cppcheck histories for each package in a repository branch===
 
====Installation====
 
=====herodotos tool=====
 
The herodotos tool is installed from the task mentioned above in [[#herodotos_in_p8]]:
 
apt-repo test 257760 herodotos
 
or a locally-built package:
 
rpm -Uhv ~imz/hasher-p8-herodotos/repo/x86_64/RPMS.hasher/herodotos-0.8.0.0.21-alt1.x86_64.rpm
 
=====herodotus-server=====
 
For now, I use a locally checked-out copy of {{pkg|herodotus-server}}:
 
# ls -l /space/home/herodotus/bin/
total 8
lrwxrwxrwx 1 herodotus herodotus 87 ноя  9  2018 herodotus-helper-analyze-each -> /home/imz/wip/2018-10-herodotos-cppcheck/herodotus-server/herodotus-helper-analyze-each
lrwxrwxrwx 1 herodotus herodotus 74 ноя  9  2018 update -> /home/imz/wip/2018-10-herodotos-cppcheck/herodotus-server/herodotus-update
#
 
The "analyze" action is usually run by other users, so they must use the complete path to {{cmd|herodotus-helper-analyze-each}}.
 
(Remark: {{cmd|herodotus-helper-analyze-each}} determines the name of the analyzed package from the current working dir. The current working dir will be used to save the results of the analysis. After it has determined the name of the package, it gets the list of releases from a local mirror of {{path|/ALT/repo/}}, and the source packages (if not cached already) from a local mirror of the repo.)
 
=====Cache of unpacked sources=====
 
Whenever you invoke the "analysis" action of herodotos tool, it unpacks the sources to be analyzed.
 
According to the {{path|study.hc.base}} template from {{pkg|herodotus-server}} package, the "cache" of the unpacked sources is located under {{path|/usr/src/HERODOTOS/INPUT/}} (one directory per project/package; one subdir per release):
 
projects="/usr/src/HERODOTOS/INPUT"
 
The same location (in the "cache") will appear in the paths in the results (in the .org format) of the "analyze" or "correlate" actions. So, if you explore the warnings in the results and want to view the corresponding source code, Emacs will open the file in the cache.
 
This location '''must be writable''' by all users who run the "analyze" action themselves and readable (and maybe writable) by the "herodotus" user, under which the server runs the "correlate" action.
 
There are cleverer ways to achieve the needed permissions, but I won't describe them now. The simplest way is to set the permissions like this (recursively):
 
# ls -ld /usr/src/HERODOTOS/INPUT/
drwxrwxrwx 813 herodotus herodotus 16260 дек 23  2018 /usr/src/HERODOTOS/INPUT/
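
One way to get such permissions is sketched below. The helper name <code>setup_cache</code> is hypothetical, and it implements only the simplest world-writable variant, which is acceptable on a trusted machine at best.

```shell
#!/bin/sh
# Hypothetical setup helper for the shared cache of unpacked sources.
# It makes the directory tree readable and writable by everybody, which is
# the simplest (and least restrictive) variant described above.
setup_cache() {
    cache_dir=$1
    mkdir -p "$cache_dir" &&
    chmod -R a+rwX "$cache_dir"
}

# Usage (as root, for the location configured in study.hc.base):
#   setup_cache /usr/src/HERODOTOS/INPUT
```

The capital <code>X</code> adds the execute bit only to directories and to files that are already executable, so ordinary source files do not become executable.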
 
=====ALT/repo=====
 
In my Git repo {{pkg|herodotos}}, I have a helper script for mounting {{path|/ALT/repo}}, which is needed to get the list of the releases of a package and to get the srpms: {{path|herodotos/scm/alt_archive.mount}}.
 
There are different possibilities (to use the public mirror via ftp, or a private one via ssh); here is an example:
 
sshfs -o ro,allow_other,kernel_cache team.alt:/ALT/repo /usr/src/HERODOTOS/INPUT/ALT/repo
 
It is found according to {{path|study.hc.base}} from {{pkg|herodotus-server}} package:
 
project Linux {
  // Too old (archived under /alt0/repo/sisyphus/task/archive and not available -- FIXME)
  local_scm = STRINGIFY(unpack+alt_archive:../ALT/repo c7 PKGNAME)
...
 
(When it is working with a "project" under {{path|/usr/src/HERODOTOS/INPUT/}}, as described above, then according to this configuration line it must go one dir up and look for {{path|ALT/repo}} there to find the source repo.)
 
=====pregirar=====
 
{{pkg|pregirar}} is a package with utilities that help to run the analysis in parallel.
 
====Running the analysis====
 
=====Preparation=====
 
$ pwd
/home/imz/wip/2018-10-herodotos-cppcheck
$ mkdir ANALYSES.4
$ cp -av ANALYSES.3/n* -t ANALYSES.4/
'ANALYSES.3/n' -> 'ANALYSES.4/n'
'ANALYSES.3/n-a-k' -> 'ANALYSES.4/n-a-k'
'ANALYSES.3/n-l-z' -> 'ANALYSES.4/n-l-z'
$ cd ANALYSES.4
$ make -f ../ANALYSES.mk mkdirs -o n_v_r-a-k -o n_v_r-l-z
mkdir -p GEARS
(cd GEARS && xargs mkdir -p) <n
$
 
=====Running in parallel=====
 
{{pkg|pregirar}} helps to run in parallel.
 
I run the following under {{cmd|screen}} in the dir {{path|ANALYSES.4}} (created above).
 
pregirar-in_each-parallel 31 31 analyze-each /home/imz/wip/2018-10-herodotos-cppcheck/herodotus-server/herodotus-helper-analyze-each c7 <n
 
====Evaluation of the test====
 
=====Logs of the failed tasks=====
Compare the logs of the tasks that finished with an error:
 
diff -r ANALYSES.{2,3}/in_each-analyze-each/error | less
 
or (to see new failed tasks):
 
diff -qr ANALYSES.{2,3}/in_each-analyze-each/error | fgrep Only | less
 
Compare both the "error" and "success" logs:
 
diff -qr ANALYSES.{2,3}/in_each-analyze-each -x archive | fgrep Only | less
 
<code>archive</code> is excluded because there might have been a different number of tries in different tests.
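
The "Only" filter used above can be narrowed to the entries that are present only in the newer test run. This is a small sketch with a made-up function name, <code>only_in_new</code>. It relies on the <code>Only in DIR: NAME</code> message format that <code>diff</code> prints in the C locale.

```shell
#!/bin/sh
# List entries that exist only under NEW (e.g. logs of newly failed tasks).
# diff exits with status 1 when the trees differ; that is expected here,
# and grep exits with 1 when nothing matches, hence the "|| true".
only_in_new() {
    old=$1
    new=$2
    LC_ALL=C diff -qr "$old" "$new" | grep -F "Only in $new" || true
}

# Usage with the directories from the example above:
#   only_in_new ANALYSES.2/in_each-analyze-each/error ANALYSES.3/in_each-analyze-each/error
```

Swapping the two arguments lists the tasks that failed in the older run but no longer fail in the newer one.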
 
=====Comparing the RESULTS=====
 
diff -r ANALYSES.{2,3}/GEARS  -x .git -x .projects_study.hc -x filelist  -x sloc_hashes -x top_dir -x Linux_cppcheck.log | less
 
In this diff, you basically see which RESULTS are absent (or what their diff is).
 
* Comparing the <code>Linux_cppcheck.log</code> files is not very useful, because the timings always differ.
* <code>-x filelist  -x sloc_hashes -x top_dir</code> is to exclude the working files of {{cmd|sloccount}}, which might fail sometimes. (FIXME)
* <code>.projects_study.hc</code> might differ due to failed {{cmd|sloccount}} (it includes the number of lines of code)...
 
===Exploring analysis results===
 
Here is an illustrated example of exploring analysis results (from svace) in Herodotus with emacs.
 
Step 1: find the corresponding .org-file
<br />
[[File:Herodotus-emacs-explore-svace-guile18-01-dired.png|800px]]
 
Step 2: the .org-file opened in org-mode
<br />
[[File:Herodotus-emacs-explore-svace-guile18-02-org-mode.png|800px]]
 
Step 3: look at the source code by clicking an item
<br />
[[File:Herodotus-emacs-explore-svace-guile18-03-view-source-code-at-this-position.png|800px]]
 
Step 4: label the warning as a BUG after thinking about it (with <kbd>C-c C-t</kbd> keys)
<br />
[[File:Herodotus-emacs-explore-svace-guile18-04-label-BUG.png|800px]]

Latest revision as of 01:08, 16 September 2020


(It's a work in progress.)

Herodotus is a project for tracking and linking analytic and synthetic facts about (the releases of) a package. The tracking is to be done independently per package. (Sisyphus is an example of a repository of packages where this can be applied.)

(Herodotus is inspired by the herodotos tool, and its implementation is partly based on it. Note that the name of this tool and the name of our project are spelled differently. Named after Herodotus.)

Introduction

Which computed or external meta-information for a package is tracked
  • Analytic facts (computed from the "internal" content of package releases):
  • Static analysis of the C/C++ code (warnings):
  • by coccinelle
  • by cppcheck
  • ...
  • Discovery of source files which are not used during the build of the package (by means of strace or by the access time)
  • ...
  • Synthetic facts (added "externally" by maintainers)
  • Resolutions for the warnings from the static analysis (a reason why they are invalid or a fix).
  • ...
Representation
  • Each fact is linked to the corresponding Git (Gear) commit or tag.
  • (The facts can be stored in the same Git repository in a separate branch.)
  • If the "same" fact appears for several releases, all its occurrences are linked together, so that a maintainer can view them as a single fact. It should need attention only when the facts change between releases.
User interfaces
  • Files (obtained via Git), org-mode editor (Emacs; org-mode is like a personal wiki)
  • ...

Implementation details

The core: herodotos tool

herodotos tool runs the analyzers for different releases and then links identical facts (modulo the diff, i.e., the changes of the source code).

Description of herodotos tool

herodotos in ALT repos

herodotos in p8
(or: test-only task #257760 herodotos with the optional dependency on gumtree excluded because it is missing in p8, so that one can install it)
Requires:
Requires (for counting LOC; this dependency became optional in recent herodotos versions):
  • (+) p8 sloccount
Needed mainly for reproducing the author's experiments with Linux sources as a way of testing herodotos:
  • (+) p8 coccinelle with support for embedded Python
Needed optionally for better correlation:
  • (-) p8 gumtree
Requires:
  • (+) p8 cgum
herodotos in p9
  • (-) p9 herodotos
Requires:
Requires (for counting LOC; this dependency became optional in recent herodotos versions):
  • (+) p9 sloccount
Needed mainly for reproducing the author's experiments with Linux sources as a way of testing herodotos:
  • (+) p9 coccinelle with support for embedded Python
Needed optionally for better correlation:
  • (+) p9 gumtree
Requires:
  • (+) p9 cgum
herodotos in Sisyphus
  • (-) sisyphus herodotos
Requires:
Requires (for counting LOC; this dependency became optional in recent herodotos versions):
  • (+) sisyphus sloccount
Needed mainly for reproducing the author's experiments with Linux sources as a way of testing herodotos:
  • (-) sisyphus, test-only task #243259 coccinelle with support for embedded Python
Needed optionally for better correlation:
  • (+) sisyphus gumtree
Requires:
  • (+) sisyphus cgum

How to try herodotos

If you want to try the herodotos tool, try to reproduce the authors' work https://github.com/coccinelle/faults-in-linux . (It is more recent; the older work http://coccinelle.lip6.fr/papers/aosd10.pdf with their data and configuration is not suitable for the current herodotos 0.8+ version.)

See a concrete example: /How to try herodotos.

coccinelle support

coccinelle is natively supported by herodotos tool.

Actually, herodotos tool can work with any analyzer which gives output in the org-mode format.

coccinelle in Sisyphus

cppcheck support

  • cppcheck is supported by flycheck (an Emacs package)
  • flycheck can be hacked to output the information in the format suitable for herodotos tool (org-mode)

So, we could easily get the support for any analyzer known to flycheck.

cppcheck in Sisyphus

  • (+) cppcheck
  • (-) emacs-mode-flycheck
  • (-) flycheck output in org-mode format

Discovery of source files which are not used during the build of the package

Either builds under strace can be used to discover files which are not used, or the access time (an idea by boyarsh@, which has already been probably implemented by him).

Extensions to be implemented

Ad hoc sources for herodotos

Ad hoc ways to feed herodotos some specific sources (which are not covered by the configuration "*SCM" and "versions" parameters):

(-) herodotos preinit-add git REPO TAG
(-) herodotos preinit-add rpm-bp FILE
(-) herodotos preinit-add srpm FILE

More; easy to implement; but not really needed much (as for now):

(-) herodotos preinit-add rpm-bp+gear REPO TAG
(-) herodotos preinit-add srpm+gear REPO TAG

Here, the way the git option is processed is similar to how the git: sources from the configuration are treated. (An exercise in implementing preinit-add on the base of the existing code.)

The rpm-bp option would invoke rpm to prepare the source tree (with all the patches applied etc. by performing the prep stage with rpmbuild -bp, optionally under hasher); the srpm option is about a stupid unpacking of an .src.rpm and of the archives it contains. The +gear options are about getting the srpm from a Gear repo.

More methods for herodotos to get sources

In the spirit of the current way to write the configuration file, in addition to git: (combined with versions to select the tags), one could implement more methods for herodotos to get sources from some other kinds of repositories:

  • (-) rpm-bp+gear: (or srpm+gear:)
  • (-) the Sisyphus (and branches) archive (whereby the repo index might help to learn the releases and their place in the archive).

This could be useful for a more automated study of packages from Sisyphus and branches.

Usage

Herodotus as server

Think of the work with Herodotus as a server (which is in some respect similar to girar).

The main task of the server is to store the analysis of a package (for each known package) in a "normal" form, i.e., after having done the best effort of correlating the analyses for each known version (release) of the package.

The stored analysis of a package can be updated upon request with new information. Most commonly, the new information is:

  • a new analysis of a specific new version (release) of the package;
  • or additional manual correlations between warnings from the old analyses.

After getting new information, the Herodotus server must "normalize" it (i.e., make the best effort to automatically correlate) and save.

The Herodotus server uses a Git repository for each package as a way to store the current and past states of the analysis of this package.

Action: "update" (performed by the server)

(Applicable for each individual package. Parameter: a branch name, which is to be updated.)

Result
a "normal" (automatically correlated) analysis in the top commit of the specified branch (on the server).
Input
a Git commit with analysis results (and optional correlations) in herodotos format. (The old head of the Git branch on the server should be an ancestor of this new Git commit.)

The Git commit that comes as an input to the "update" action can be created by the "analyze" action described below.

Action: "analyze" (performed anywhere)

(Applicable for each version (release) of each individual package.)

Result
a Git commit containing an analysis of the specified version (release) of the package in herodotos format.
Input
a herodotos config with a specification of the version (release) of the package to be analyzed.

Commonly, one creates the commit with the analysis of a new version (release) on top of a previous commit with old analyses.

Manual action: edit the analyses or correlations in herodotos format

A special Emacs mode (extension to org-mode) can be used to do this conveniently, side-by-side with exploring the actual corresponding source code.

herodotus-server commands

herodotus-server.git contains scripts and related data that represent the model of Herodotus as a server.

  • herodotus-update
  • herodotus-helper-analyze-each (automatically runs cppcheck for each known version (release) of a package)

herodotus-helper-analyze-each

herodotus-helper-analyze-each STEM PKGNAME

automatically runs cppcheck for each known version (release) of a package (and saves the results).

It is to be run inside a Git repository. The directory may be empty; it will be re-initialized as a Git repo.

STEM is the stem of the Git branch name, which will be used to save the results. (At the same time, it is used to get the sources from this branch in the ALT archive.)

PKGNAME is the name of the package to be analyzed. (The srpms are taken from the ALT archive.)

Example of a one-shot use of herodotus-helper-analyze-each

Example: analyze each release in the history of package "anacron" in ALT c7 branch:

$ mkdir anacron
$ cd anacron/
$ /home/imz/wip/2018-10-herodotos-cppcheck/herodotus-server/herodotus-helper-analyze-each c7 anacron
What herodotus-helper-analyze-each has done internally

(It has invoked various tools in a way similar to how the author of herodotos used it to analyze the Linux kernel sources.)

It has looked up the list of the known versions (releases) in c7/index/src/a/anacron/d-t-s-evr.list under /usr/src/HERODOTOS/INPUT/anacron/../ALT/repo/ (the path is constructed according to the configuration in study.hc.base from herodotus-server.git); here is what it has seen there (so that you better understand what has been going on):

$ cat /usr/src/HERODOTOS/INPUT/anacron/../ALT/repo/c7/index/src/a/anacron/d-t-s-evr.list 
1381657390	105852	-	1:2.3-alt6
$

It has cached the unpacked/"prepared" sources of each of the releases under /usr/src/HERODOTOS/INPUT/anacron/

$ ls -l /usr/src/HERODOTOS/INPUT/anacron/
total 0
drwxrwxrwx 5 imz imz 260 дек 21  2018 anacron-1@2.3-alt6
$
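Judging by the two listings above, the name of each cache subdirectory seems to be PKGNAME-EVR with the colon after the epoch replaced by "@". Here is a minimal sketch of that mapping. list_release_dirs is a hypothetical helper (it is not part of herodotus-server), and it assumes that the list is tab-separated with the EVR in the fourth column:

```shell
# Hypothetical helper, not part of herodotus-server.
# Reads the lines of a d-t-s-evr.list on stdin (assumed to be tab-separated,
# with the EVR in the 4th column) and prints the presumed names of the
# cache subdirectories for package $1.
list_release_dirs() {
    awk -F '\t' -v pkg="$1" '{
        evr = $4
        sub(/:/, "@", evr)      # "1:2.3-alt6" -> "1@2.3-alt6"
        print pkg "-" evr
    }'
}

# Demo with the single line seen above:
printf '1381657390\t105852\t-\t1:2.3-alt6\n' | list_release_dirs anacron
# prints: anacron-1@2.3-alt6
```

A release without an epoch has no colon in its EVR, so its name would pass through unchanged.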

(One kind of trick or another is needed if you want to share the cache between multiple users of the system. TODO: describe them in detail.)

Then it has run the analyzer on each of the versions (releases) of the sources; in this case, it has been cppcheck (according to the configuration in study.hc.base from herodotus-server.git).

Results of herodotus-helper-analyze-each

It has made a Git commit with the results:

$ git ls-files
.depend
.depend.erase
.depend.patterns
.projects_study.hc
RESULTS/anacron/.depend.Linux
RESULTS/anacron/Linux_cppcheck.orig.org
RESULTS/anacron/anacron-1@2.3-alt6/Linux_cppcheck.log
RESULTS/anacron/anacron-1@2.3-alt6/Linux_cppcheck.orig.org
study.hc
$

It has saved the analysis (done by cppcheck) of each release, namely, of the single known release in this case:

$ cat RESULTS/anacron/anacron-1\@2.3-alt6/Linux_cppcheck.orig.org 
* TODO [[view:/usr/src/HERODOTOS/INPUT/anacron/anacron-1@2.3-alt6/anacron/matchrx.c::face=ovl-face1::linb=52::colb=1::cole=2][Memory leak: sub_offsets]]
$
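Each entry in such a file is an org-mode heading with a state (TODO here) and a "view:" link that carries the file path, the position (linb, colb, cole) and the message of the analyzer. For a quick overview outside Emacs, the entries can be flattened into one line each. This is only a sketch: org_summary is a hypothetical helper, and its pattern is derived from the single sample entry above, so other kinds of entries might not match:

```shell
# Hypothetical helper: summarize the entries of a .org file (read from
# stdin) as "STATE FILE:LINE MESSAGE". Lines that do not look like the
# sample entry are skipped silently.
org_summary() {
    sed -n 's/^\* \([A-Z]*\) \[\[view:\([^:]*\)::.*linb=\([0-9]*\)::.*]\[\(.*\)]]$/\1 \2:\3 \4/p'
}
```

Feeding it the file shown above (org_summary <RESULTS/anacron/anacron-1\@2.3-alt6/Linux_cppcheck.orig.org) should give a single line: the state TODO, then the path of matchrx.c followed by :52, then the message "Memory leak: sub_offsets".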

(TODO: of course, analyzing deleted packages seems like a strange behavior, so it should be fixed. But we show it here nevertheless just for the demonstration of the structure of the results.)

All results from all releases will be put together in a single common file after correlation (the "update" action on the Herodotus server), which we haven't done yet; so, it is empty for now:

$ wc -l RESULTS/anacron/Linux_cppcheck.orig.org
0 RESULTS/anacron/Linux_cppcheck.orig.org
$

herodotus-update

Test example: correlating cppcheck histories for each package in a repository branch

Installation

herodotos tool

The herodotos tool is installed from the task mentioned above in #herodotos_in_p8:

apt-repo test 257760 herodotos

or a locally-built package:

rpm -Uhv ~imz/hasher-p8-herodotos/repo/x86_64/RPMS.hasher/herodotos-0.8.0.0.21-alt1.x86_64.rpm

herodotus-server

For now, I use a locally checked-out copy of herodotus-server:

# ls -l /space/home/herodotus/bin/
total 8
lrwxrwxrwx 1 herodotus herodotus 87 ноя  9  2018 herodotus-helper-analyze-each -> /home/imz/wip/2018-10-herodotos-cppcheck/herodotus-server/herodotus-helper-analyze-each
lrwxrwxrwx 1 herodotus herodotus 74 ноя  9  2018 update -> /home/imz/wip/2018-10-herodotos-cppcheck/herodotus-server/herodotus-update

The "analyze" action is usually run by other users, so they must use the complete path to herodotus-helper-analyze-each.

(Remark: herodotus-helper-analyze-each determines the name of the analyzed package from the current working directory, which is also where the results of the analysis are saved. After it has determined the name of the package, it gets the list of releases from a local mirror of /ALT/repo/, and the source packages (if not cached already) from a local mirror of the repo.)
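How exactly the script derives the name is not shown here. A plausible minimal sketch, assuming that the name is simply the last component of the directory path (pkgname_from_dir is a hypothetical function, not taken from herodotus-server):

```shell
# Hypothetical sketch: take the package name from the last component of a
# directory path (by default, the current working directory).
pkgname_from_dir() {
    dir=${1:-$PWD}
    basename "$dir"
}
```

Under this assumption, running the helper in a directory named anacron would analyze the package anacron, which matches the one-shot example above.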

Cache of unpacked sources

Whenever you invoke the "analysis" action of herodotos tool, it unpacks the sources to be analyzed.

According to the study.hc.base template from herodotus-server package, the "cache" of the unpacked sources is located under /usr/src/HERODOTOS/INPUT/ (one directory per project/package; one subdir per release):

projects="/usr/src/HERODOTOS/INPUT"

The same location (in the "cache") will appear in the paths in the results (in the .org format) of the "analyze" or "correlate" actions. So, if you explore the warnings in the results and want to view the corresponding source code, Emacs will open the file in the cache.

This location must be writable by all users who run the "analyze" action themselves and readable (and maybe writable) by the "herodotus" user, under which the server runs the "correlate" action.

There are cleverer ways to achieve the needed permissions, but I won't describe them now. The simplest way is to have permissions like this (recursively):

# ls -ld /usr/src/HERODOTOS/INPUT/
drwxrwxrwx 813 herodotus herodotus 16260 дек 23  2018 /usr/src/HERODOTOS/INPUT/

ALT/repo

In my Git repo herodotos, I have a helper script for mounting /ALT/repo, which is needed to get the list of the releases of a package and to get the srpms: herodotos/scm/alt_archive.mount.

There are different possibilities (to use the public mirror via ftp, or a private one via ssh); here is an example:

sshfs -o ro,allow_other,kernel_cache team.alt:/ALT/repo /usr/src/HERODOTOS/INPUT/ALT/repo

The mount point is determined by study.hc.base from the herodotus-server package:

project Linux {

 // Too old (archived under /alt0/repo/sisyphus/task/archive and not available -- FIXME)
 local_scm = STRINGIFY(unpack+alt_archive:../ALT/repo c7 PKGNAME)
...

(When it is working with a "project" under /usr/src/HERODOTOS/INPUT/, as described above, then, according to this configuration line, to find the source repo it must go one directory up and look for ALT/repo there.)
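To make the path construction concrete, here is a small sketch that rebuilds the location of the release list from the pieces of the configuration. release_list_path is a hypothetical function; the layout index/src/LETTER/PKGNAME/d-t-s-evr.list is taken from the anacron example earlier on this page, and the assumption is that LETTER is the first character of the package name:

```shell
# Hypothetical sketch: build the path of the release list the way it
# appears in the anacron example.
#   $1 = value of "projects" from study.hc.base
#   $2 = branch stem (e.g. c7)
#   $3 = package name
release_list_path() {
    projects=$1 stem=$2 pkg=$3
    first=$(printf '%.1s' "$pkg")   # assumed: first character of the name
    printf '%s/%s/../ALT/repo/%s/index/src/%s/%s/d-t-s-evr.list\n' \
        "$projects" "$pkg" "$stem" "$first" "$pkg"
}

release_list_path /usr/src/HERODOTOS/INPUT c7 anacron
# prints: /usr/src/HERODOTOS/INPUT/anacron/../ALT/repo/c7/index/src/a/anacron/d-t-s-evr.list
```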

pregirar

pregirar is a package with utilities that help to run the analysis in parallel.

Running the analysis

Preparation

$ pwd
/home/imz/wip/2018-10-herodotos-cppcheck
$ mkdir ANALYSES.4
$ cp -av ANALYSES.3/n* -t ANALYSES.4/
'ANALYSES.3/n' -> 'ANALYSES.4/n'
'ANALYSES.3/n-a-k' -> 'ANALYSES.4/n-a-k'
'ANALYSES.3/n-l-z' -> 'ANALYSES.4/n-l-z'
$ cd ANALYSES.4
$ make -f ../ANALYSES.mk mkdirs -o n_v_r-a-k -o n_v_r-l-z
mkdir -p GEARS
(cd GEARS && xargs mkdir -p) <n
$

Running in parallel

pregirar helps to run in parallel.

I run the following under screen in the directory ANALYSES.4 (created above):

pregirar-in_each-parallel 31 31 analyze-each /home/imz/wip/2018-10-herodotos-cppcheck/herodotus-server/herodotus-helper-analyze-each c7 <n
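If pregirar is not at hand, the general idea (run the same command in each of the directories listed in n, several at a time) can be approximated with xargs. This is only a rough sketch and not a description of how pregirar works: in_each_parallel is a hypothetical function, it relies on the -P option of xargs (available in GNU and BSD xargs), and it does not keep per-task logs of successes and errors:

```shell
# Hypothetical sketch (not pregirar): read directory names from stdin and
# run the given command inside GEARS/NAME for each of them, with at most
# $1 commands running at the same time.
in_each_parallel() {
    njobs=$1; shift
    # For each input line, xargs starts: sh -c SCRIPT sh NAME COMMAND ARGS...
    xargs -P "$njobs" -I '{}' \
        sh -c 'cd "GEARS/$1" && shift && "$@"' sh '{}' "$@"
}
```

A run comparable to the one above might then look like in_each_parallel 31 /home/imz/wip/2018-10-herodotos-cppcheck/herodotus-server/herodotus-helper-analyze-each c7 <n; the package name is not passed, because the helper takes it from the current working directory.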

Evaluation of the test

Logs of the failed tasks

Compare the logs of the tasks that finished with an error:

diff -r ANALYSES.{2,3}/in_each-analyze-each/error | less

or (to see new failed tasks):

diff -qr ANALYSES.{2,3}/in_each-analyze-each/error | fgrep Only | less
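Note that "Only in" lines are printed for both sides, so tasks that failed only in the older test show up as well. To list just the tasks that failed in the newer test but not in the older one, something like the following can be used. It is a sketch with a hypothetical function name, and it assumes that each failed task leaves one entry (named after the task) directly under the error directory:

```shell
# Hypothetical sketch: print the names of entries that exist in the newer
# "error" directory ($2) but not in the older one ($1).
new_failures() {
    old=$1 new=$2
    for f in "$new"/*; do
        [ -e "$f" ] || continue          # the directory is empty
        name=${f##*/}
        [ -e "$old/$name" ] || printf '%s\n' "$name"
    done
}
```

For the two tests compared here, that would be new_failures ANALYSES.2/in_each-analyze-each/error ANALYSES.3/in_each-analyze-each/error.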

Compare both the "error" and "success" logs:

diff -qr ANALYSES.{2,3}/in_each-analyze-each -x archive | fgrep Only | less

archive is excluded because there might have been a different number of tries in different tests.

Comparing the RESULTS

diff -r ANALYSES.{2,3}/GEARS  -x .git -x .projects_study.hc -x filelist  -x sloc_hashes -x top_dir -x Linux_cppcheck.log | less

In this diff, you basically see which RESULTS are absent (or what their diff is).

  • Comparing the Linux_cppcheck.log files is not very useful, because the timings always differ.
  • -x filelist -x sloc_hashes -x top_dir is to exclude the working files of sloccount, which might fail sometimes. (FIXME)
  • .projects_study.hc might differ due to failed sloccount (it includes the number of lines of code)...

Exploring analysis results

Here is an illustrated example of exploring analysis results (from svace) in Herodotus with emacs.

Step 1: find the corresponding .org-file
Herodotus-emacs-explore-svace-guile18-01-dired.png

Step 2: the .org-file opened in org-mode
Herodotus-emacs-explore-svace-guile18-02-org-mode.png

Step 3: look at the source code by clicking an item
Herodotus-emacs-explore-svace-guile18-03-view-source-code-at-this-position.png

Step 4: label the warning as a BUG after thinking about it (with C-c C-t keys)
Herodotus-emacs-explore-svace-guile18-04-label-BUG.png