What happened?
Parsing a date string whose month or day has no leading zero fails on the pyspark backend. Example: the format %m/%d/%Y does not parse 1/1/2026 on pyspark (the result is NaT). The same expression works fine on duckdb.
The following produces an assertion error:
import datetime as dt

import ibis
from pyspark.sql import SparkSession

# `spark` was not defined in the original snippet; any active SparkSession works
spark = SparkSession.builder.getOrCreate()

# Use the duckdb connection as a baseline
for con in ibis.duckdb.connect(), ibis.pyspark.connect(spark):
    # 12/12/2026 has a two-digit month and day; 1/1/2026 does not
    for d in dt.datetime(2026, 12, 12), dt.datetime(2026, 1, 1):
        raw_date = f"{d.month}/{d.day}/{d.year}"
        t0 = con.sql(f"SELECT '{raw_date}' as raw_date")
        t1 = t0.mutate(parsed_date=t0.raw_date.as_date("%m/%d/%Y"))
        val = t1.execute().at[0, "parsed_date"]
        print(f"{raw_date!r} parsed as {val!r} using {con.dialect}")
        assert val == d, (con.dialect, raw_date, val)
Output:
'12/12/2026' parsed as Timestamp('2026-12-12 00:00:00') using <class 'sqlglot.dialects.duckdb.DuckDB'>
'1/1/2026' parsed as Timestamp('2026-01-01 00:00:00') using <class 'sqlglot.dialects.duckdb.DuckDB'>
'12/12/2026' parsed as Timestamp('2026-12-12 00:00:00') using <class 'ibis.backends.sql.dialects.PySpark'>
'1/1/2026' parsed as NaT using <class 'ibis.backends.sql.dialects.PySpark'>
AssertionError: (<class 'ibis.backends.sql.dialects.PySpark'>, '1/1/2026', NaT)
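Possible workaround, not verified against pyspark: zero-pad the month and day before parsing, so the string matches a strictly two-digit pattern. This assumes the cause is that %m and %d end up as Spark's MM and dd patterns, which expect two digits when parsing. I have not confirmed that in the ibis source. The sketch below only shows the normalization in plain Python; `pad_month_day` is a hypothetical helper, not part of ibis or pyspark.

```python
import datetime as dt
import re


def pad_month_day(raw_date: str) -> str:
    """Zero-pad the month and day of an M/D/YYYY string (hypothetical helper)."""
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", raw_date)
    if match is None:
        raise ValueError(f"unexpected date string: {raw_date!r}")
    month, day, year = match.groups()
    return f"{int(month):02d}/{int(day):02d}/{year}"


for raw in ("12/12/2026", "1/1/2026"):
    padded = pad_month_day(raw)
    # With padded input, %m/%d/%Y sees two digits for both month and day
    parsed = dt.datetime.strptime(padded, "%m/%d/%Y")
    print(f"{raw!r} -> {padded!r} -> {parsed!r}")
```

The same padding could presumably be applied to the string column with a regex replace before calling as_date, but I have not tested that on the pyspark backend.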
What version of ibis are you using?
12.0.0
What backend(s) are you using, if any?
pyspark