Add the implementation for downloading mnist#1383
Conversation
Add the implementation for downloading mnist
|
Add the implementation for downloading mnist |
|
Add the implementation for downloading mnist |
|
|
||
| def check_exist_or_download(url): | ||
|
|
||
| download_dir = '/tmp/' # downloaded to the /tmp/ folder |
There was a problem hiding this comment.
download_dir is currently hard-coded to /tmp/, but the PEFT MNIST loader defaults to /tmp/mnist and expects the dataset files under that directory:
- Default dataset path: https://github.com/eyumboo/singa/blob/dev-postgresql/examples/singa_peft/examples/data/mnist.py#L85
- File lookup logic assumes
/tmp/mnist: https://github.com/eyumboo/singa/blob/dev-postgresql/examples/singa_peft/examples/data/mnist.py#L38
In addition, the example runner explicitly passes -dir /tmp/mnist:
https://github.com/eyumboo/singa/blob/dev-postgresql/examples/singa_peft/examples/run.sh#L21
Because of this mismatch, even after running the new downloader, training fails since it looks for files like /tmp/mnist/train-images-idx3-ubyte.gz, which are not present.
It would be better if the downloader:
- Writes directly to
/tmp/mnist(and creates the directory if it doesn’t exist) - Ideally supports the same
--dir-pathargument used by the training pipeline for consistency
| print("Downloading %s" % url) | ||
| urllib.request.urlretrieve(url, filename) | ||
| else: | ||
| print("Already Downloaded: %s" % url) |
There was a problem hiding this comment.
Suggestion: improve download logging clarity
The current download log is helpful, but it would be even better if we also print the resolved local path (or target directory) where files are stored. This makes it easier to debug storage issues and verify where artifacts end up, especially since the download location may depend on defaults or environment variables (e.g. /tmp/mnist).
In addition, it would improve clarity to include the local filename/path in both download states.
Proposed changes:
if not os.path.isfile(filename):
print("Downloading %s -> %s" % (url, filename))
urllib.request.urlretrieve(url, filename)
else:
print("Already Downloaded: %s -> %s" % (url, filename))|
One remaining integration suggestion for a future PR: As a result, running the downloader with its default settings is not sufficient for training to work out of the box. A future improvement could be to either:
Ideally, both scripts should share the same default directory to avoid this mismatch. Another non-blocking suggestion for a future cleanup is the directory structure under Neither of these is a blocking issue for merging this PR, but addressing them in a follow-up PR would improve the out-of-the-box experience, project organization, and overall clarity for future users. |
This PR adds download_mnist.py to examples/singa_peft/examples/data.