
gsoc26: Mapping Conversion Layer (Layer 3), #60 #63

Open
DhanashreePetare wants to merge 15 commits into dbpedia:gsoc-2026 from DhanashreePetare:gsoc-2026


@DhanashreePetare

Collaborator

Pull Request

Description

Implements Layer 3 (cross-class mapping conversion) for the Databus Python Client download pipeline, building on the Layer 2 format handlers (TripleHandler, QuadHandler, TSDHandler) merged in #[link Layer 2 PR number]. Users can convert RDF triples to and from CSV, and convert between RDF triple and quad formats, by combining --format with the new --graph-name and --base-uri flags.

Related Issues
Issue #60

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • This change requires a documentation update
  • Housekeeping

Checklist:

  • My code follows the ruff code style of this project.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
    • poetry run pytest - all tests passed
    • poetry run ruff check - no linting errors

What was added:

  • databusclient/filehandling/mapping.py: 5 mapping functions that use the Layer 2 handlers (TripleHandler/QuadHandler/TSDHandler) as the intermediate representation (IR). A companion .meta.json file preserves RDF datatypes and language tags for lossless CSV round trips. Blank nodes are serialized with the _: prefix so they can be reconstructed correctly on the way back.
  • --graph-name and --base-uri flags added to download command.
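To make the companion-file idea concrete, here is a minimal sketch of a lossless triples to CSV mapping and back. It is not the code in mapping.py: the Literal class, the triples_to_csv/csv_to_triples names and the {"literals": {row: {...}}} layout of the metadata are hypothetical. IRIs are plain strings and blank nodes keep their _: prefix in the CSV cell, while each literal's datatype and language tag go into the metadata dict, which would be saved as the .meta.json file.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """An RDF literal: lexical value plus optional datatype IRI or language tag."""
    value: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


# Hypothetical IR: IRIs are plain strings, blank nodes are strings with "_:".
Term = Union[str, Literal]
Triple = Tuple[str, str, Term]


def triples_to_csv(triples: List[Triple]) -> Tuple[str, Dict]:
    """Write triples as CSV text and return it with a companion metadata dict.

    The CSV cell holds only a literal's lexical value; its datatype and
    language tag are stored in the metadata, keyed by data-row index.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])
    literals: Dict[str, Dict[str, Optional[str]]] = {}
    for row, (s, p, o) in enumerate(triples):
        if isinstance(o, Literal):
            # JSON object keys must be strings, so the row index is stored as str.
            literals[str(row)] = {"datatype": o.datatype, "lang": o.lang}
            writer.writerow([s, p, o.value])
        else:
            writer.writerow([s, p, o])  # IRI or "_:" blank node, written as-is
    return buf.getvalue(), {"literals": literals}


def csv_to_triples(csv_text: str, meta: Dict) -> List[Triple]:
    """Rebuild triples from CSV text plus the companion metadata."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    literals = meta.get("literals", {})
    triples: List[Triple] = []
    for row, (s, p, o) in enumerate(reader):
        info = literals.get(str(row))
        if info is None:
            triples.append((s, p, o))  # no metadata: treat as IRI or blank node
        else:
            triples.append((s, p, Literal(o, info["datatype"], info["lang"])))
    return triples


# Example: persist the metadata as JSON (the sidecar file name is hypothetical).
rows = [("https://example.org/data/alice", "https://example.org/vocab/age",
         Literal("29", datatype="http://www.w3.org/2001/XMLSchema#integer"))]
csv_text, meta = triples_to_csv(rows)
meta_json = json.dumps(meta)  # would be written to e.g. data.csv.meta.json
assert csv_to_triples(csv_text, json.loads(meta_json)) == rows
```

In this sketch, calling csv_to_triples with an empty dict (as if the companion file were missing) returns every object as a plain string, so literals can no longer be told apart from IRIs.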

Tests:

  • 5 mapping round-trip tests in tests/test_format_round_trips.py (IR captured before conversion, consistent with the Layer 2 pattern)
  • 29 functional tests in tests/test_mapping_conversions.py covering all 5 directions plus edge cases (blank nodes, missing companion file, missing required flags)

Closes #60

@coderabbitai

coderabbitai Bot commented Jun 20, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Comment on lines +49 to +77
```python
# Sample Turtle with typed literals, blank nodes, multi-valued predicates
SAMPLE_TTL_CONTENT = """\
@base <https://example.org/data/> .
@prefix ex: <https://example.org/vocab/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<alice> foaf:name "Alice" ;
ex:age 29 ;
ex:livesAt _:address1 .

_:address1 ex:city "Leipzig" ;
ex:country "Germany" .

<bob> foaf:name "Bob" ;
ex:age 34 ;
ex:knows <alice> .

<project1> ex:title "Databus Example Project" ;
ex:member <alice> .
"""

SAMPLE_NQ_CONTENT = """\
<https://example.org/data/alice> <http://xmlns.com/foaf/0.1/name> "Alice" <https://example.org/graph/people> .
<https://example.org/data/alice> <https://example.org/vocab/age> "29"^^<http://www.w3.org/2001/XMLSchema#integer> <https://example.org/graph/people> .
<https://example.org/data/bob> <http://xmlns.com/foaf/0.1/name> "Bob" <https://example.org/graph/people> .
<https://example.org/data/project1> <https://example.org/vocab/title> "Databus Example Project" <https://example.org/graph/projects> .
<https://example.org/data/project1> <https://example.org/vocab/member> <https://example.org/data/alice> <https://example.org/graph/projects> .
"""
```

Contributor

Please move this test data into files under tests/resources.
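A minimal sketch of what that could look like, assuming the sample strings move into files such as tests/resources/sample.ttl and tests/resources/sample.nq (hypothetical names) and that the test module sits directly in tests/. The helper name read_resource is also hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_resource(name: str, base: Optional[Path] = None) -> str:
    """Return the text of a test-data file stored under tests/resources/.

    Assumes this helper lives in a module directly inside tests/, so the
    default location is the sibling directory "resources".
    """
    if base is None:
        base = Path(__file__).resolve().parent / "resources"
    return (base / name).read_text(encoding="utf-8")


# Hypothetical file names; the inline strings would move into these files.
# SAMPLE_TTL_CONTENT = read_resource("sample.ttl")
# SAMPLE_NQ_CONTENT = read_resource("sample.nq")
```

Resolving the directory from the module's own location keeps the tests independent of the working directory pytest is started from.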

Contributor

We should align these tests more closely with the round-trip mapping idea from the paper:

Within a round trip mapping test, we take a file i and convert it to file c of the format of another equivalence class before we convert c back to file o of the same format as i. Therefore, i first has to be read into the internal data structure of the equivalence class of i (see (1) in Fig. 3). Then this data is mapped to the internal data structure of the equivalence class of c (2) before it is written out to c (3). Next, c is read into the internal data structure of its equivalence class (4). That resulting data is mapped back to the internal data structure of the equivalence class i (5). In the last step (6), this internal data structure data is written out to o. If the information of the input file i is equal to the information of output file o, the round trip test succeeds.
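The six steps can be written as a generic test harness. This is a sketch: the function and parameter names are hypothetical, and each callable stands in for a reader, writer or mapping of the relevant equivalence class. The key point is the final comparison, which checks the information in o against i (by re-reading o into i's internal structure) rather than only checking that files exist or counts are non-zero.

```python
from typing import Callable, TypeVar

A = TypeVar("A")  # internal data structure of i's equivalence class
B = TypeVar("B")  # internal data structure of c's equivalence class


def round_trip_preserves_information(
    i_text: str,
    read_i: Callable[[str], A],
    map_i_to_c: Callable[[A], B],
    write_c: Callable[[B], str],
    read_c: Callable[[str], B],
    map_c_to_i: Callable[[B], A],
    write_i: Callable[[A], str],
    same_information: Callable[[A, A], bool],
) -> bool:
    """Run steps (1)-(6) and compare the information in i and o."""
    ir_i = read_i(i_text)              # (1) read i into its class's structure
    ir_c = map_i_to_c(ir_i)            # (2) map to c's class structure
    c_text = write_c(ir_c)             # (3) write out c
    ir_c_back = read_c(c_text)         # (4) read c into its class's structure
    ir_i_back = map_c_to_i(ir_c_back)  # (5) map back to i's class structure
    o_text = write_i(ir_i_back)        # (6) write out o
    # Compare information, not bytes: re-read o and compare with i's structure.
    return same_information(ir_i, read_i(o_text))
```

Comparing the re-read structures rather than the raw text avoids false failures from harmless differences such as statement order or whitespace.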

The core tests should not merely check that each conversion direction runs, that files are created, or that counts are non-zero. Instead, they should verify that the information in the input file survives the round trip unchanged.

So I would suggest adding explicit round-trip tests such as the following, each comparing the original RDF graph with the reconstructed RDF graph:

  • Triple → Quad → Triple
  • Triple → CSV/TSV → Triple
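A sketch of a Triple → Quad → Triple test, kept dependency-free. The small N-Triples/N-Quads tokenizer below handles only a simplified subset of the grammar, and all function names are hypothetical stand-ins for the PR's handlers and mapping functions. Because the toy parser keeps blank-node labels unchanged, comparing sets of statements is enough here; with a real parser that relabels blank nodes, such as rdflib, the graphs would instead be compared with rdflib.compare.isomorphic.

```python
import re
from typing import Set, Tuple

# One N-Triples/N-Quads term: an IRI, a blank node, or a literal with an
# optional datatype or language tag. A simplified subset of the grammar.
TERM = re.compile(
    r'<[^>\s]*>'
    r'|_:[A-Za-z0-9_\-]+'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>\s]*>|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?'
)


def parse_statements(text: str) -> Set[Tuple[str, ...]]:
    """Parse N-Triples or N-Quads text into a set of term tuples."""
    statements = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            statements.add(tuple(TERM.findall(line)))
    return statements


def serialize(statements: Set[Tuple[str, ...]]) -> str:
    """Write statements back out, one per line, in a stable order."""
    return "".join(" ".join(terms) + " .\n" for terms in sorted(statements))


def triples_to_quads(triples: Set[Tuple[str, ...]], graph_term: str) -> Set[Tuple[str, ...]]:
    """Triple -> Quad: put every triple into one named graph."""
    return {(s, p, o, graph_term) for s, p, o in triples}


def quads_to_triples(quads: Set[Tuple[str, ...]]) -> Set[Tuple[str, ...]]:
    """Quad -> Triple: drop the graph term."""
    return {(s, p, o) for s, p, o, _ in quads}


def check_triple_quad_triple_round_trip(nt_text: str, graph_term: str) -> None:
    original = parse_statements(nt_text)
    nq_text = serialize(triples_to_quads(original, graph_term))
    nt_back = serialize(quads_to_triples(parse_statements(nq_text)))
    # Blank-node labels are not relabeled by this toy parser, so set equality suffices.
    assert parse_statements(nt_back) == original
```

Calling check_triple_quad_triple_round_trip with N-Triples text and a graph IRI such as "<https://example.org/graph/people>" raises AssertionError if any statement is lost or altered along the way.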

For mappings like Quad → Triple → Quad, you have to be creative. When one quad file produces two triple files (for example, one per named graph), both can be converted back to quads separately, merged, and then compared with the original.
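One way to do this, sketched at the level of the internal structures with hypothetical names: split the quads into one triple set per named graph (one output file each), convert each set back to quads with the graph it came from, merge, and compare. It assumes every quad has a named graph and that the graph name for each triple file can be recovered when converting back (for example, supplied via --graph-name per file); this is an assumption, not something the PR specifies.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]


def split_by_graph(quads: Set[Quad]) -> Dict[str, Set[Triple]]:
    """Quad -> Triple: one triple set (one output file) per named graph."""
    files: Dict[str, Set[Triple]] = defaultdict(set)
    for s, p, o, g in quads:
        files[g].add((s, p, o))
    return dict(files)


def to_quads(triples: Set[Triple], graph: str) -> Set[Quad]:
    """Triple -> Quad for a single file, given the graph it came from."""
    return {(s, p, o, graph) for s, p, o in triples}


def quad_triple_quad_round_trip(quads: Set[Quad]) -> bool:
    """Convert each per-graph triple file back separately, merge, compare."""
    merged: Set[Quad] = set()
    for graph, triples in split_by_graph(quads).items():
        merged |= to_quads(triples, graph)
    return merged == quads
```

Because each graph keeps its own triple set, a statement that appears in two graphs is still restored to both.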

Of course, a test like Quad → TSD → Quad does not work, since there is no TSD → Quad mapping.

