Skip to content

Fix #94: resolve instance cross-references to typed Python objects instead of raw dicts#95

Merged
Raphael-Gazzotti merged 4 commits into
openMetadataInitiative:pipelinefrom
apdavison:issue94
Jun 26, 2026
Merged

Fix #94: resolve instance cross-references to typed Python objects instead of raw dicts#95
Raphael-Gazzotti merged 4 commits into
openMetadataInitiative:pipelinefrom
apdavison:issue94

Conversation

@apdavison

@apdavison apdavison commented Jun 8, 2026

Copy link
Copy Markdown
Member

Summary

Fixes #94. Library instances such as Accessibility.direct_virtual_open_access stored
properties like payment_models (and similarly channel, eligibility, form,
process, and equivalents on other classes) as raw {"@id": "..."} dicts rather than
as references to the typed Python objects they point to (e.g.
PaymentModelType.zero_cost_payment_model). This broke validate(), and caused a
KeyError in Collection.load() after a save/reload round trip, because the referenced
instance was never included in the saved collection.

Root cause

During code generation, the builder treated {"@id": "..."} values found in instance
data as opaque strings and emitted them verbatim, instead of resolving them to references
to the corresponding generated class attributes.

Fix

Resolve @id references to typed objects. Instance properties whose value is a
{"@id": "..."} dict (or a list of such dicts) that matches the @id of another loaded
instance are now emitted as references to that instance's generated Python attribute
(e.g. payment_models=[PaymentModelType.zero_cost_payment_model]) rather than as a
literal dict. References that don't resolve to a known instance (pre-existing dangling
@ids in the upstream data) fall back to the previous raw-dict behaviour.

Handle mutually-referencing classes. Resolving references this way exposed circular
import problems wherever two classes reference each other's instances (e.g.
ParcellationEntity and ParcellationEntityVersion in sands/atlas, v3/v4). The
generation of instance code was restructured so that such references no longer need to
be resolved at class-definition time: instances are first created with the properties
that don't create a cycle, and the cyclic ones are filled in afterwards, once every
class involved has been defined.

Test plan

  • Added a regression test (pipeline/tests/test_regressions.py) reproducing the
    reported scenario: building a collection containing Accessibility instances,
    validating, saving, and reloading, and asserting that cross-referenced properties
    are typed objects rather than dicts.
  • Full pytest -v pipeline/tests suite passes against the rebuilt package, including
    checks that mutually-referencing instances (e.g. ParcellationEntity /
    ParcellationEntityVersion) resolve to typed objects with no import errors.

apdavison added 2 commits June 5, 2026 12:13
…w dicts instead of typed objects

Problem: Instances such as Accessibility.direct_virtual_open_access stored
properties like payment_models as raw {"@id": "..."} dicts rather than typed
objects (e.g. PaymentModelType.zero_cost_payment_model). This caused KeyError
in Collection.load() and broke round-trip serialisation.

Root cause: The builder treated {"@id": "..."} values in instance data as
opaque strings rather than resolving them to references to typed Python objects.

Fix (part 1): Resolve {"@id": "..."} values to typed Python object references
(e.g. PaymentModelType.zero_cost_payment_model) during code generation.

Fix (part 2): Implementing part 1 surfaced circular import problems when two
classes in the same directory have instances that reference each other (e.g.
ParcellationEntity <-> ParcellationEntityVersion in v4). Rather than generating
instance assignments inline in each class module, generate separate *_instances.py
files that can be imported independently, and later.
Class modules are now pure class definitions and trigger
instance loading via `from . import *_instances as _`. To avoid cyclic imports,
certain files are patched in a second phase.
@apdavison apdavison added the bug Something isn't working label Jun 8, 2026
Comment thread pipeline/translator.py Outdated
Comment thread pipeline/utils.py Outdated
Comment thread pipeline/utils.py
Comment on lines +78 to +85
def _instance_needs_iri(props):
"""
True if any property of props will be rendered as IRI(...) by the instances template,
i.e. is a string starting with 'http' in a property other than 'id'.
"""
return any(
isinstance(value, str) and value.startswith("http") for key, value in props.items() if key != "id"
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a heuristic. Are we validating against the schema to ensure that an IRI is actually allowed here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good point, but I think it's out of scope for this pull request (we were already doing this elsewhere). I've opened #99 to keep track of it.

The list branch of _resolve_links() raised KeyError when an item's
identifier was absent from node_lookup, whereas the scalar branch already
fell back to keeping the Link. Use node_lookup.get(item.identifier, item)
in the list branch too, so unresolvable links are kept (and can be resolved
later, e.g. against the KG) instead of crashing.

This is needed by fairgraph's initialise_instances, which resolves the
recast openMINDS library instances against an id->object lookup that need
not contain every referenced id.

Adds a regression test (test_issue0094_resolve_links_tolerates_missing_id).
…ef and docstring

- Store PythonRef as separate class_name/attr_name components instead of a
  pre-joined string, removing the format-then-split round-trip in the
  cyclic-reference checks (_has_cyclic_ref, _collect_python_ref_classes).
- Stop referencing the internal PythonRef class from the translator docstring.
@Raphael-Gazzotti Raphael-Gazzotti merged commit 0f13fc0 into openMetadataInitiative:pipeline Jun 26, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Accessibility library instances store payment_models as dicts instead of PaymentModelType objects

2 participants