Python zarr version: 3.2.1
zarr-java version: 0.1.3
Sharded Zarr arrays that can be read without issue from the local filesystem fail with a "The checksum of the sharding index is invalid" error when the same data is read from S3.
Example
I created a simple 2D sharded Zarr array with the following Python code:
import numpy as np
import zarr

size = 1024
zarrcube = zarr.create_array(
    store='/path/to/testsquare.zarr',
    shape=(size, size),
    shards=(size // 4, size // 4),    # 256 x 256 shards
    chunks=(size // 16, size // 16),  # 64 x 64 inner chunks, 16 per shard
    dtype='uint8')
zarrcube[:, :] = np.random.randint(0, 256, (size, size))
I then uploaded the data to a publicly-readable S3 bucket in us-east-1 using the following commands:
cd testsquare.zarr
aws s3 sync . 's3://my-public-zarr-bucket/shardtest/testsquare.zarr'
I then attempted to read both the local and the remote copy using the following Java code:
String localPath = "/path/to/testsquare.zarr";
URI endpoint = new URI("https://s3.us-east-1.amazonaws.com");
S3ClientBuilder clientBuilder = S3Client.builder()
        .httpClientBuilder(UrlConnectionHttpClient.builder()
                .socketTimeout(Duration.ofMinutes(5)));
clientBuilder.endpointOverride(endpoint);
clientBuilder.region(Region.US_EAST_1);
S3Configuration s3Config = S3Configuration.builder()
        .pathStyleAccessEnabled(true)
        .build();
clientBuilder.serviceConfiguration(s3Config);
clientBuilder.credentialsProvider(AnonymousCredentialsProvider.create());
S3Client client = clientBuilder.build();
S3Store store = new S3Store(client, "gs-public-zarr-dev", "shardtest");
Array s3Array = Array.open(store.resolve("testsquare.zarr"));
Array localArray = Array.open(localPath);
for (int i = 0; i < 10; i++) {
    long[] offset = new long[] {100L * i, 100L * i};
    long[] shape = new long[] {100L, 100L};
    localArray.read(offset, shape);  // succeeds
    s3Array.read(offset, shape);     // throws the exception below
}
The following exception was thrown:
java.lang.RuntimeException: dev.zarr.zarrjava.ZarrException: The checksum of the sharding index is invalid. Stored: 1384733839 Computed: -246033701
at dev.zarr.zarrjava.core.Array.lambda$read$2(Array.java:437)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:686)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:765)
at dev.zarr.zarrjava.core.Array.read(Array.java:407)
at dev.zarr.zarrjava.core.Array.read(Array.java:344)
at com.glencoesoftware.omero.zarr.ZarrPixelBufferTest.testSharding(ZarrPixelBufferTest.java:336)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:93)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:40)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:520)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:748)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:443)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:211)
Caused by: dev.zarr.zarrjava.ZarrException: The checksum of the sharding index is invalid. Stored: 1384733839 Computed: -246033701
at dev.zarr.zarrjava.v3.codec.core.Crc32cCodec.decode(Crc32cCodec.java:40)
at dev.zarr.zarrjava.core.codec.CodecPipeline.decode(CodecPipeline.java:114)
at dev.zarr.zarrjava.v3.codec.core.ShardingIndexedCodec.decodeInternal(ShardingIndexedCodec.java:205)
at dev.zarr.zarrjava.v3.codec.core.ShardingIndexedCodec.decodePartial(ShardingIndexedCodec.java:254)
at dev.zarr.zarrjava.core.codec.CodecPipeline.decodePartial(CodecPipeline.java:95)
at dev.zarr.zarrjava.core.Array.lambda$read$2(Array.java:422)
... 42 more
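In case it helps with triage, here is a minimal sketch of how the index checksum could be recomputed outside zarr-java, to tell whether the bytes themselves are intact. It is not zarr-java code: the class name `ShardIndexCheck` and its helpers are hypothetical, and it assumes the zarr-python defaults for this array, namely that the shard index sits at the end of the shard file, that each entry is two little-endian uint64 values (offset and nbytes), and that the index is followed by a 4-byte little-endian CRC32C. With 256 x 256 shards and 64 x 64 chunks there are 16 chunks per shard, so the index would be 16 * 16 + 4 = 260 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.zip.CRC32C;

// Hypothetical helper, not part of zarr-java.
public class ShardIndexCheck {

    // Assumption: one index entry = (offset, nbytes) as two uint64 values.
    static final int ENTRY_BYTES = 16;
    // Assumption: the crc32c codec appends a 4-byte little-endian checksum.
    static final int CHECKSUM_BYTES = 4;

    // Total size in bytes of the encoded index, checksum included.
    static int indexLength(int chunksPerShard) {
        return ENTRY_BYTES * chunksPerShard + CHECKSUM_BYTES;
    }

    // CRC32C over everything except the trailing checksum, as a signed int
    // so it can be compared with the values printed in the exception.
    static int computedChecksum(byte[] index) {
        CRC32C crc = new CRC32C();
        crc.update(index, 0, index.length - CHECKSUM_BYTES);
        return (int) crc.getValue();
    }

    // The checksum stored in the last four bytes of the index.
    static int storedChecksum(byte[] index) {
        return ByteBuffer.wrap(index, index.length - CHECKSUM_BYTES, CHECKSUM_BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
    }

    static boolean isValid(byte[] index) {
        return storedChecksum(index) == computedChecksum(index);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: ShardIndexCheck <path-to-shard-file>");
            return;
        }
        int n = indexLength(16);  // 16 inner chunks per shard for this array
        byte[] shard = Files.readAllBytes(Paths.get(args[0]));
        // Assumption: the index is the last n bytes of the shard file.
        byte[] index = Arrays.copyOfRange(shard, shard.length - n, shard.length);
        System.out.printf("Stored: %d Computed: %d Valid: %b%n",
                storedChecksum(index), computedChecksum(index), isValid(index));
    }
}
```

Running this against a shard file from the local copy and against the same object downloaded again from S3 (for example with `aws s3 cp`) would show whether the uploaded bytes differ from the local ones, or whether the mismatch only appears when zarr-java fetches the index through `S3Store`.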
Please let me know if there's any other information you need from me.