Skip to content

Conversation

@fantazio
Copy link
Collaborator

@fantazio fantazio commented Jan 24, 2026

Fix #47

The .exp files are updated to reflect the expected dissociation and new tests are added to witness details of the optional arguments reporting semantics (see #47).

The invalid results were due to an inconsistency between sourcepaths built from cmt_infos.cmt_sourcefile and filenames found in the typedtrees' locations. The former could contain a .pp extension, unlike the latter. The .pp was already removed from compilation units for consistency (f090928).
Because of this inconsistency, DeadCommon.VdNode.eof would not be able to discard opt args "internal" declarations and, thus, dissociate them from their corresponding "external" declarations. Therefore, the internal and external uses of a value were all linked to the internal declaration, and the reports reflected this.
Removing .pp directly from the sourcepath rather than on compilation units fixes the inconsistency and ensures that it is mechanically absent from compilation units.

This change improves the performance on Frama-C 31.0. I measured a gain of ~10s (from ~22s to ~12s) when running the analyzer with -v --all on my machine, without any noticeable impact on memory consumption.
The change in results only affects the optional arguments reports. I reviewed 25% of it in-depth and another 25% quickly to confirm the update works as expected and only brings fixes.

Optional arguments always/never used have a peculiar semantic. If a
function is only used internally, then the optional arguments reports
only contain references to the implementation (`.ml`). If it is only
used externally, and there is an interface (`.mli`), then they only
contain references to the interface (`.mli`). However, when it is used
both internally and externally, then they contain references to both,
depending on the internal and external uses of the optional arguments.
This is why the optional arguments reports may appear as "duplicated" :
the same optional argument is referenced as always/never used both in
the `.ml` and the `.mli`.

The expected reports for optional arguments in
`preprocessed_lib/preprocessed.ml` were therefore invalid because they
were "unified", as if internal and external uses all counted as internal
uses.
By following the described semantics, the `?internal` (resp.
`?external`) optional argument is now expected to be reported as always
used for the `.ml` (resp. `.mli`). Similarly for the `?external` (resp.
`?internal`) which is expected to be reported as never used in the `.ml`
(resp. `.mli`).
Because of this dissociation, the `?external` and `?internal` arguments
are not expected to be reported as almost always/never used anymore in
the threshold-3-0.5 test scenario.
The new tests are witnesses of the non-dissociation of the internal and
external uses of the optional arguments. In particular, the optional
arguments of the `Preprocessd.externally_used_f` function should only be
reported with locations in `preprocessed_lib/preprocessed.mli` but are
currently wrongly reported in `preprocessed_lib/preprocessed.ml`, as if
there was no `.mli` like in the case of `Preprocessed_no_intf`.
Rather than removing the `.pp` extension in `Utils.unit`, it is directly
removed from the sourcepaths read in `.cmt` files. This way,
compilation units remain consistent; and file paths read in locations
are now also consistent with the sourcepaths. More details on why this
fixes the observed FN and FP below.

With this change, I measured a gain of ~10s (from ~22s to ~12s) when
running the analyzer with `-v --all` on
[Frama-C 31.0](https://git.frama-c.com/pub/frama-c/-/tree/31.0?ref_type=tags)
on my machine. This is coherent with the few computations of the cmts'
sourcepaths (once per `.cmt` read) vs the amount of time the compilation
unit is computed (multiple times on every potentially meaningful
location).

As mentionned in f090928, there was a difference between the compilation
unit of a preprocessed file and the one of an unpreprocessed file.
Forcing them to have the same compilation unit by removing the `.pp`
extension is sufficient in most cases.
However, in the case of optional arguments, some computations relied on
finding whether a location belongs to the current or a previously
analyzed sourcefile, or if it belongs to an unknown (potentially
analyzed later) sourcefile. In particular, `VdNode.eof` relies on this
to discard "internal" locations and, thus, dissociate them from their
corresponding "external" locations.
This property is checked in `DeadCommon.VdNode.seen` by verifying if the
location has a corresponding sourcepath in the `DeadCommon.abspath`
table. This table associates compilation units (as obtained by
`Utils.unit`) with sourcepaths (read in the `.cmi` and `.cmt` files) and
is filled when starting the analysis of either a `.cmi` or a `.cmt`.
Verifying that a location belongs to a known file is done by checking if
its filename is a suffix of one of the sourcepaths associated with its
compilation unit.
The sourcepath of a `.cmt` file may end with `.pp.ml`, while locaitons
found in the typedtree will not contain the `.pp` part. Therefore, the
inconsistency observed in f090928 still remains for sourcepaths. Because
of this inconsistency, the internal and external locations are not
dissociated at the end of the analysis of a `.cmt`, leading to the
observed FN and FP.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dissociate optional arguments in preprocessed files

1 participant