bug: to_date fails to handle single-digit values in pyspark #12004

@btw08

Description

What happened?

Parsing a string with a date format fails on the pyspark backend when the month or day has no leading 0. For example, %m/%d/%Y does not parse 1/1/2026 in pyspark; the result is NaT. The same expression works in duckdb.

The following produces an assertion error:

import datetime as dt

import ibis
from pyspark.sql import SparkSession

# Any active SparkSession works here.
spark = SparkSession.builder.getOrCreate()

# Use the duckdb connection as a baseline.
for con in ibis.duckdb.connect(), ibis.pyspark.connect(spark):
    # Parsing works as expected when month and day both have two digits.
    for d in dt.datetime(2026, 12, 12), dt.datetime(2026, 1, 1):
        raw_date = f"{d.month}/{d.day}/{d.year}"
        t0 = con.sql(f"SELECT '{raw_date}' AS raw_date")
        t1 = t0.mutate(parsed_date=t0.raw_date.as_date("%m/%d/%Y"))
        val = t1.execute().at[0, "parsed_date"]
        print(f"{raw_date!r} parsed as {val!r} using {con.dialect}")
        assert val == d, (con.dialect, raw_date, val)

Output:

'12/12/2026' parsed as Timestamp('2026-12-12 00:00:00') using <class 'sqlglot.dialects.duckdb.DuckDB'>
'1/1/2026' parsed as Timestamp('2026-01-01 00:00:00') using <class 'sqlglot.dialects.duckdb.DuckDB'>
'12/12/2026' parsed as Timestamp('2026-12-12 00:00:00') using <class 'ibis.backends.sql.dialects.PySpark'>
'1/1/2026' parsed as NaT using <class 'ibis.backends.sql.dialects.PySpark'>
AssertionError: (<class 'ibis.backends.sql.dialects.PySpark'>, '1/1/2026', NaT)
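
A possible stopgap, sketched under the assumption (not confirmed in this report) that the failure comes from the pyspark side requiring exactly two digits for %m and %d: zero-pad the month and day before parsing. The helper name pad_mdy is hypothetical and not part of ibis; the snippet uses only the Python standard library to show the normalization itself.

```python
import datetime as dt
import re

# Hypothetical helper, not part of ibis: zero-pad any lone digit in an
# M/D/YYYY string so that a strict two-digit parser would accept it.
_LONE_DIGIT = re.compile(r"(?<!\d)(\d)(?!\d)")


def pad_mdy(raw: str) -> str:
    """Return raw with single-digit fields padded to two digits."""
    return _LONE_DIGIT.sub(r"0\1", raw)


for raw in ("1/1/2026", "12/12/2026", "3/14/2026"):
    padded = pad_mdy(raw)
    # Python's own strptime is lenient; it is used here only to confirm
    # that the padded string still represents the same date.
    parsed = dt.datetime.strptime(padded, "%m/%d/%Y")
    print(f"{raw!r} -> {padded!r} -> {parsed:%Y-%m-%d}")
```

The same padding could presumably be applied to the column before calling as_date, but regex replacement syntax differs between backends, so that part is left out of the sketch.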

What version of ibis are you using?

12.0.0

What backend(s) are you using, if any?

pyspark

Labels

bug: Incorrect behavior inside of ibis

Status
backlog
