Update supported file list to match new Archivematica processing #37
Open
liam-lloyd wants to merge 1 commit into
This commit updates the supported file list that these tests check conversions against. It removes the intermediate file types that were generated by the legacy process but are not generated by the new Archivematica pipeline. It also removes some file types that we don't officially support, increases the time the tests wait for files to finish processing, and changes how the tests determine whether a file is done processing to reflect the behavior of the new pipeline.
slifty approved these changes on Jun 18, 2026
Pull request overview
This PR updates the functional upload/conversion test harness to align with the new Archivematica processing pipeline by updating the supported/expected output formats list, increasing polling timeout, and changing “processing complete” detection to include expected derivative formats.
Changes:
- Update supported_file_types.csv to remove legacy intermediate/unsupported conversions and reflect new expected outputs.
- Add expected-format loading and pass expected formats into upload polling to detect completion based on produced formats.
- Increase upload polling timeout from 60s to 300s.
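For readers following along, a timeout-bounded polling loop of the kind described above might look roughly like this. It is a minimal sketch, not the code in this PR: `poll_until`, `is_done`, `interval`, and the injectable `clock`/`sleep` parameters are hypothetical names chosen for illustration. Only the 300-second default comes from the change list.

```python
import time


def poll_until(is_done, timeout=300, interval=5,
               clock=time.monotonic, sleep=time.sleep):
    """Call is_done() repeatedly until it returns True or the timeout elapses.

    Hypothetical helper: clock and sleep are injectable so the loop can be
    exercised in tests without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_done():
            return True          # condition met before the deadline
        if clock() >= deadline:
            return False         # gave up after `timeout` seconds
        sleep(interval)
```

Checking the condition before the deadline guarantees at least one check even with a zero timeout, and using a monotonic clock keeps the deadline stable if the wall clock is adjusted during a long run.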
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| permanent_upload/validation.py | Refactors dataset loading and adds load_expected_formats() used to drive “expected derivative formats” checks. |
| permanent_upload/permanent.py | Updates post-upload polling to wait for both OK status and presence of expected formats. |
| permanent_upload/data/supported_file_types.csv | Updates expected conversion outputs to match the new Archivematica pipeline. |
| permanent_upload/main.py | Wires expected formats into upload polling and increases timeout for the new pipeline. |
Comments suppressed due to low confidence (1)
permanent_upload/main.py:54
- The f-string building the base URL uses double quotes inside the {...} expression while the f-string itself is also delimited by double quotes. This pattern is a syntax error on Python versions before 3.12 (e.g., similar to f"{foo[\"bar\"]}"); Python 3.12 and later accept it (PEP 701). Using single quotes inside the expression avoids the parse error on older interpreters.
timeout = 300
print(f"Current timeout is {timeout} seconds")
api = PermanentAPI(
    f"https://{"app" if environment == "www" else "app." + environment}.permanent.org"
)
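One way to sidestep the quoting question on any Python version is to compute the subdomain before formatting. This is a sketch only: `base_url` is a hypothetical helper, not a function in this repository, and `"dev"` in the usage below is an assumed environment name.

```python
def base_url(environment):
    """Build the API base URL without nesting quotes inside the f-string."""
    # "www" maps to the bare "app" subdomain; other environments get "app.<env>".
    subdomain = "app" if environment == "www" else "app." + environment
    return f"https://{subdomain}.permanent.org"


print(base_url("www"))  # https://app.permanent.org
print(base_url("dev"))  # https://app.dev.permanent.org
```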
Comment on lines +23 to 27
def validate_supported_types(results, data_file="data/supported_file_types.csv"):
    validation_dataset = _load_validation_dataset(data_file)
    for result in results:
        extension = result[0].split(".")[-1]
        assert validation_dataset[extension]
Comment on lines +69 to +70
extension = os.path.splitext(f)[1].lstrip(".")
expected_formats = formats_by_extension.get(extension, set())
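The diff extracts extensions in two ways: `result[0].split(".")[-1]` in `validate_supported_types` and `os.path.splitext(f)[1].lstrip(".")` here. The sketch below compares them on a few made-up file names. The helper names `ext_by_split` and `ext_by_splitext` are hypothetical wrappers around the two expressions, added only for illustration.

```python
import os


def ext_by_split(name):
    # Mirrors result[0].split(".")[-1]
    return name.split(".")[-1]


def ext_by_splitext(name):
    # Mirrors os.path.splitext(f)[1].lstrip(".")
    return os.path.splitext(name)[1].lstrip(".")


for name in ["photo.JPG", "archive.tar.gz", "README"]:
    print(name, repr(ext_by_split(name)), repr(ext_by_splitext(name)))
# photo.JPG 'JPG' 'JPG'
# archive.tar.gz 'gz' 'gz'
# README 'README' ''
```

The two agree for ordinary names but differ when there is no dot: `split` returns the whole name, while `splitext` returns an empty string. Neither normalizes case, so whether `JPG` matches a lowercase key depends on how the CSV keys are written.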
Comment on lines +16 to +20
def load_expected_formats(data_file="data/supported_file_types.csv"):
    return {
        ext: set(row["conversions"].split(","))
        for ext, row in _load_validation_dataset(data_file).items()
    }
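One property of `str.split(",")` worth keeping in mind with this snippet: an empty cell yields `[""]`, so the resulting set is `{""}` rather than empty, and spaces after commas are kept. A more defensive parse might look like the following sketch. `parse_conversions` is a hypothetical helper, and whether the CSV actually contains empty or space-padded cells is an assumption.

```python
def parse_conversions(cell):
    """Split a comma-separated cell into a set, dropping blanks and padding."""
    return {part.strip() for part in cell.split(",") if part.strip()}


print(set("".split(",")))             # {''}
print(parse_conversions(""))          # set()
print(parse_conversions("pdf, jpg") == {"pdf", "jpg"})  # True
```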
Comment on lines +92 to +97
actual_formats = {
    vo["type"].split(".")[-1] for vo in (record.get("FileVOs") or [])
}
processing_complete = bool(expected_formats) and expected_formats.issubset(
    actual_formats
)
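To make the behavior of this check easier to reason about, here is the same expression wrapped in a function and run against a few sample records. This is a sketch: the wrapper name `is_processing_complete` and the sample `type` strings such as `"type.file.pdf"` are assumptions for illustration, not values taken from the API.

```python
def is_processing_complete(record, expected_formats):
    """Return True once every expected format appears among the record's FileVOs."""
    actual_formats = {
        vo["type"].split(".")[-1] for vo in (record.get("FileVOs") or [])
    }
    return bool(expected_formats) and expected_formats.issubset(actual_formats)


# Hypothetical record shape; the real "type" values may differ.
record = {"FileVOs": [{"type": "type.file.pdf"}, {"type": "type.file.jpg"}]}

print(is_processing_complete(record, {"pdf"}))           # True
print(is_processing_complete(record, {"pdf", "png"}))    # False
print(is_processing_complete(record, set()))             # False
print(is_processing_complete({"FileVOs": None}, {"pdf"}))  # False
```

Note the third case: an empty `expected_formats` never counts as complete. Combined with `formats_by_extension.get(extension, set())`, a file whose extension is missing from the CSV would keep polling until the timeout.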