Update supported file list to match new Archivematica processing #37

Open
liam-lloyd wants to merge 1 commit into main from remove_expected_intermediated_formats

Conversation

@liam-lloyd

Member

This commit updates the supported file list that these tests use to check conversions against. It removes the intermediate file types that were generated by the legacy process but are not generated by the new Archivematica pipeline. It also removes some file types that we don't officially support from the list, increases the time the tests wait for files to finish processing, and changes how the tests determine whether a file is done processing to reflect the behavior of the new pipeline.

@slifty left a comment

I would recommend splitting the correctness portion of the PR (the updated types) from the test-tooling portion (improving how tests run) via separate commits. That said, it's also fine either way!

Copilot AI left a comment

Pull request overview

This PR updates the functional upload/conversion test harness to align with the new Archivematica processing pipeline by updating the supported/expected output formats list, increasing polling timeout, and changing “processing complete” detection to include expected derivative formats.

Changes:

  • Update supported_file_types.csv to remove legacy intermediate/unsupported conversions and reflect new expected outputs.
  • Add expected-format loading and pass expected formats into upload polling to detect completion based on produced formats.
  • Increase upload polling timeout from 60s to 300s.
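
The polling change summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `poll_until`, `fetch`, `is_done`, and the `"status"` key used in the usage note are hypothetical names, and only the 300 second default comes from the change list.

```python
import time
from typing import Callable, Optional


def poll_until(
    fetch: Callable[[], dict],
    is_done: Callable[[dict], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() until is_done(record) is true or `timeout` seconds elapse.

    Returns the finished record, or None if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        record = fetch()
        if is_done(record):
            return record
        # Give up only after a fetch at or past the deadline came back unfinished.
        if clock() >= deadline:
            return None
        sleep(interval)
```

A caller might pass something like `is_done=lambda r: r.get("status") == "ok"` (a made-up field) combined with the expected-formats check. Injecting `clock` and `sleep` keeps the loop testable without real waiting.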

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| permanent_upload/validation.py | Refactors dataset loading and adds load_expected_formats() used to drive “expected derivative formats” checks. |
| permanent_upload/permanent.py | Updates post-upload polling to wait for both OK status and presence of expected formats. |
| permanent_upload/data/supported_file_types.csv | Updates expected conversion outputs to match the new Archivematica pipeline. |
| permanent_upload/main.py | Wires expected formats into upload polling and increases timeout for the new pipeline. |

Comments suppressed due to low confidence (1)

permanent_upload/main.py:54

  • The f-string building the base URL uses double quotes inside the {...} expression while the f-string itself is also delimited by double quotes. On Python versions before 3.12 this pattern is a syntax error (e.g., similar to f"{foo[\"bar\"]}"); PEP 701 allows reusing the enclosing quote type from 3.12 onward. Using single quotes inside the expression parses on every version.
    timeout = 300
    print(f"Current timeout is {timeout} seconds")

    api = PermanentAPI(
        f"https://{"app" if environment == "www" else "app." + environment}.permanent.org"
    )
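
One way to sidestep the quoting question entirely is to compute the host prefix before formatting, which parses on any Python version that has f-strings. A small sketch, assuming a hypothetical helper name `base_url` and an illustrative environment value `"dev"`:

```python
def base_url(environment: str) -> str:
    """Build the API base URL without nesting quotes inside an f-string."""
    # Compute the host prefix first so the f-string holds only a bare name.
    subdomain = "app" if environment == "www" else "app." + environment
    return f"https://{subdomain}.permanent.org"
```

With this shape, `base_url("www")` gives `https://app.permanent.org`, and any other environment name is inserted after `app.`.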

Comment on lines +23 to 27

    def validate_supported_types(results, data_file="data/supported_file_types.csv"):
        validation_dataset = _load_validation_dataset(data_file)
        for result in results:
            extension = result[0].split(".")[-1]
            assert validation_dataset[extension]

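
If `validation_dataset` is a plain dict (an assumption; the excerpt does not show its type), indexing it with an unlisted extension raises `KeyError` before the `assert` is evaluated. A hedged sketch of a variant that fails with a descriptive assertion message instead, using a hypothetical helper name `check_extension_supported`:

```python
def check_extension_supported(validation_dataset: dict, filename: str) -> None:
    """Assert that the extension of `filename` has an entry in the dataset."""
    extension = filename.split(".")[-1]
    # dict.get returns None for a missing key instead of raising KeyError,
    # so an unlisted extension fails with the assertion message below.
    assert validation_dataset.get(extension), (
        f"no supported-type entry for extension {extension!r} ({filename})"
    )
```

Whether a `KeyError` or an `AssertionError` is preferable here depends on how the test runner reports failures, so this is one option rather than a required change.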
Comment on lines +69 to +70

    extension = os.path.splitext(f)[1].lstrip(".")
    expected_formats = formats_by_extension.get(extension, set())

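
The extension lookup in this excerpt is case-sensitive, so a file named `photo.JPG` would miss a `jpg` key and fall back to the empty set. A minimal sketch of a normalizing helper, assuming (this is not confirmed by the PR) that the CSV keys are lowercase; `extension_of` is a hypothetical name:

```python
import os


def extension_of(filename: str) -> str:
    """Return the lowercased extension of `filename` without the leading dot."""
    # os.path.splitext returns ("name", ".ext"); the extension part is ""
    # when the name has no dot or consists only of a leading dot.
    return os.path.splitext(filename)[1].lstrip(".").lower()
```

For a name without an extension the helper returns an empty string, so `formats_by_extension.get(extension, set())` would still yield the empty-set default.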
Comment on lines +16 to +20

    def load_expected_formats(data_file="data/supported_file_types.csv"):
        return {
            ext: set(row["conversions"].split(","))
            for ext, row in _load_validation_dataset(data_file).items()
        }

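
A detail worth keeping in mind with `split(",")`: an empty cell produces `[""]`, so the resulting set is `{""}` rather than empty, and spaces around commas are preserved. A small sketch of a stricter parser, with `parse_conversions` as a hypothetical helper name and no claim about what the real CSV cells contain:

```python
def parse_conversions(cell: str) -> set:
    """Split a comma-separated cell into a set of non-empty, trimmed names."""
    # Strip whitespace around each entry and drop entries that end up empty.
    return {part.strip() for part in cell.split(",") if part.strip()}
```

If every `conversions` cell in the data file is already non-empty and free of spaces, the original one-liner behaves identically and this extra handling is unnecessary.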
Comment on lines +92 to +97

    actual_formats = {
        vo["type"].split(".")[-1] for vo in (record.get("FileVOs") or [])
    }
    processing_complete = bool(expected_formats) and expected_formats.issubset(
        actual_formats
    )
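
To make the behavior of this check concrete, the same expression can be wrapped in a function and exercised with sample records. The function name and the dotted `type` values in the usage note are made up for illustration; only the expression itself comes from the excerpt.

```python
def processing_complete(record: dict, expected_formats: set) -> bool:
    """Return True when every expected format appears among the record's files."""
    # Take the last dot-separated segment of each file's type; a missing or
    # None "FileVOs" entry is treated as an empty list.
    actual_formats = {
        vo["type"].split(".")[-1] for vo in (record.get("FileVOs") or [])
    }
    # An empty expected set is never complete, because bool(set()) is False.
    return bool(expected_formats) and expected_formats.issubset(actual_formats)
```

For a record such as `{"FileVOs": [{"type": "example.jpg"}, {"type": "example.pdf"}]}`, expecting `{"jpg"}` is complete while expecting `{"jpg", "png"}` is not. Because an empty expected set always returns False, a file whose extension has no entry in the CSV would presumably keep polling until the timeout, assuming the poller stops early only on this condition.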
3 participants