Update supported file list to match new Archivematica processing by liam-lloyd · Pull Request #37 · PermanentOrg/functional-test

liam-lloyd · 2026-06-17T22:20:34Z

This commit updates the supported file list that these tests check conversions against. It removes the intermediate file types that the legacy process generated but the new Archivematica pipeline does not. It also removes some file types that we don't officially support, increases the time the tests wait for files to finish processing, and changes how the tests determine whether a file is done processing to reflect the behavior of the new pipeline.

slifty

I would recommend splitting the correctness portion of the PR (the updated types) from the test-tooling portion (improving how tests run) via separate commits. That said, it's also fine either way!

Copilot

Pull request overview

This PR updates the functional upload/conversion test harness to align with the new Archivematica processing pipeline by updating the supported/expected output formats list, increasing polling timeout, and changing “processing complete” detection to include expected derivative formats.

Changes:

Update supported_file_types.csv to remove legacy intermediate/unsupported conversions and reflect new expected outputs.
Add expected-format loading and pass expected formats into upload polling to detect completion based on produced formats.
Increase upload polling timeout from 60s to 300s.
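The longer timeout can be pictured as a deadline-based polling loop. This is only a minimal sketch: the helper name `poll_until`, the 5-second interval, and the injectable `clock`/`sleep` parameters are assumptions for illustration, not the code in `permanent_upload/permanent.py`. Only the 300-second default comes from this PR.

```python
import time


def poll_until(check, timeout=300, interval=5, clock=time.monotonic, sleep=time.sleep):
    """Call check() repeatedly until it returns True or the timeout elapses.

    poll_until, interval, clock and sleep are hypothetical names; only the
    300 second default mirrors the timeout mentioned in this PR.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True  # condition met before the deadline
        if clock() >= deadline:
            return False  # gave up: the deadline has passed
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; in production use the defaults apply.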

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
permanent_upload/validation.py	Refactors dataset loading and adds `load_expected_formats()` used to drive “expected derivative formats” checks.
permanent_upload/permanent.py	Updates post-upload polling to wait for both OK status and presence of expected formats.
permanent_upload/data/supported_file_types.csv	Updates expected conversion outputs to match the new Archivematica pipeline.
permanent_upload/main.py	Wires expected formats into upload polling and increases timeout for the new pipeline.

Comments suppressed due to low confidence (1)

permanent_upload/main.py:54

The f-string building the base URL uses double quotes inside the {...} expression while the f-string itself is also delimited by double quotes. Before Python 3.12 this is a syntax error (as is a backslash inside the expression, e.g. f"{foo[\"bar\"]}"); Python 3.12 and later accept it under PEP 701. Using single quotes inside the expression parses on every version.

    timeout = 300
    print(f"Current timeout is {timeout} seconds")

    api = PermanentAPI(
        f"https://{"app" if environment == "www" else "app." + environment}.permanent.org"
    )
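One way to sidestep the quoting question on any Python version is to compute the subdomain before formatting. A minimal sketch, where the helper name `base_url` is hypothetical and `"dev"` is only an illustrative environment value:

```python
def base_url(environment):
    """Build the API base URL for an environment (hypothetical helper)."""
    # "www" maps to the bare "app" subdomain; anything else becomes "app.<environment>".
    subdomain = "app" if environment == "www" else "app." + environment
    return f"https://{subdomain}.permanent.org"
```

With this shape the f-string contains no nested quotes at all, so the result does not depend on the interpreter version.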


+def validate_supported_types(results, data_file="data/supported_file_types.csv"):
+    validation_dataset = _load_validation_dataset(data_file)
    for result in results:
        extension = result[0].split(".")[-1]
        assert validation_dataset[extension]


+        extension = os.path.splitext(f)[1].lstrip(".")
+        expected_formats = formats_by_extension.get(extension, set())
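The fragments above use two different idioms to get an extension, `name.split(".")[-1]` and `os.path.splitext(name)[1].lstrip(".")`, and they disagree on some edge cases. The sketch below compares them; the function names and sample file names are made up for illustration.

```python
import os


def ext_by_split(name):
    # Everything after the last dot; returns the whole name if there is no dot.
    return name.split(".")[-1]


def ext_by_splitext(name):
    # Extension as os.path sees it, without the leading dot; "" if there is none.
    return os.path.splitext(name)[1].lstrip(".")


for name in ["photo.jpg", "archive.tar.gz", "README", ".bashrc", "Photo.JPG"]:
    print(f"{name!r}: split={ext_by_split(name)!r} splitext={ext_by_splitext(name)!r}")
```

Both agree on `photo.jpg` and `archive.tar.gz`, but for `README` and `.bashrc` the `split` idiom returns `README` and `bashrc` while `splitext` returns an empty string. Neither lowercases the result, so `Photo.JPG` yields `JPG`, which would miss a lookup keyed on lowercase extensions.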


+def load_expected_formats(data_file="data/supported_file_types.csv"):
+    return {
+        ext: set(row["conversions"].split(","))
+        for ext, row in _load_validation_dataset(data_file).items()
+    }
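To see what `load_expected_formats()` would produce, here is a self-contained sketch. The body of `_load_validation_dataset` is not shown in this PR, so `load_validation_dataset` below is a hypothetical stand-in that reads CSV text with an assumed `extension` column; only the `conversions` column name comes from the diff. The sample rows are invented.

```python
import csv
import io


def load_validation_dataset(csv_text):
    """Hypothetical stand-in for _load_validation_dataset: rows keyed by extension."""
    return {row["extension"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def load_expected_formats(csv_text):
    # Same comprehension as in the diff, applied to the stand-in loader.
    return {
        ext: set(row["conversions"].split(","))
        for ext, row in load_validation_dataset(csv_text).items()
    }


# Invented sample data; the real supported_file_types.csv may look different.
SAMPLE = 'extension,conversions\ndocx,"docx,pdf"\npdf,pdf\ntxt,\n'

formats = load_expected_formats(SAMPLE)
print(formats["docx"])  # a set containing "docx" and "pdf"
print(formats["txt"])   # {''} because "".split(",") returns [""]
```

One thing the sketch surfaces: an empty `conversions` cell becomes the set `{""}`, which is truthy and can never be a subset of real formats. If the real CSV can contain empty cells, filtering with `{c for c in row["conversions"].split(",") if c}` would avoid that.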


+            actual_formats = {
+                vo["type"].split(".")[-1] for vo in (record.get("FileVOs") or [])
+            }
+            processing_complete = bool(expected_formats) and expected_formats.issubset(
+                actual_formats
+            )
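The completion check above can be exercised in isolation. In this sketch the wrapper name `processing_complete` and the sample `type` strings such as `"type.file.pdf"` are assumptions; the expression itself mirrors the diff.

```python
def processing_complete(record, expected_formats):
    """Return True once every expected format appears among the record's FileVOs."""
    # The last dot-separated segment of each FileVO type is treated as its format.
    actual_formats = {
        vo["type"].split(".")[-1] for vo in (record.get("FileVOs") or [])
    }
    # An empty expected set never counts as complete.
    return bool(expected_formats) and expected_formats.issubset(actual_formats)


# Hypothetical records for illustration only.
partial = {"FileVOs": [{"type": "type.file.docx"}]}
done = {"FileVOs": [{"type": "type.file.docx"}, {"type": "type.file.pdf"}]}

print(processing_complete(partial, {"docx", "pdf"}))  # False: pdf not produced yet
print(processing_complete(done, {"docx", "pdf"}))     # True
print(processing_complete(done, set()))               # False: nothing expected
```

Note the last case: a file whose extension has no entry in the expected-formats mapping gets an empty set and is never reported complete by this expression alone, so such files would presumably run until the polling timeout unless the caller handles them separately.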


liam-lloyd requested review from cecilia-donnelly and slifty June 17, 2026 22:20

slifty requested a review from Copilot June 18, 2026 17:39

Copilot started reviewing on behalf of slifty June 18, 2026 17:40

slifty approved these changes Jun 18, 2026


Copilot AI reviewed Jun 18, 2026

liam-lloyd wants to merge 1 commit into main from remove_expected_intermediated_formats
