bug: basic Table operations fail if the empty string is a column name #10514
Comments
Thank you for bringing this to our attention and for providing the code to reproduce the issue. This behavior is quite interesting. I would like someone to examine this in more detail, but it appears that DuckDB does not support zero-length identifiers. Using the DuckDB CLI:
$ duckdb
v1.1.2 f680b7d08f
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D SELECT 1 AS "";
Parser Error: zero-length delimited identifier at or near """"
LINE 1: SELECT 1 AS "";
In the meantime, you might consider switching the default backend to Polars, as it should better support your memtable usage and allow a zero-length identifier as a column name.
import ibis
import polars as pl
ibis.set_backend("polars")
url = "https://raw.githubusercontent.com/PythonCharmers/PythonCharmersData/refs/heads/master/palmerpenguins.csv"
penguins_pl = pl.read_csv(url)
penguins = ibis.memtable(penguins_pl)
result = penguins.to_polars()  # fails with DuckDB; the issue reports that it succeeds with Polars
It looks as if DuckDB (or maybe this is something Ibis handles internally) automatically assigns a column name in this case. The DuckDB documentation describes something similar under Deduplicating Identifiers, but I think this may be a bit different. Using the Polars backend will still keep the index column as an empty string.
In [1]: from ibis.interactive import *
In [2]: data = """,name,amount
   ...: 0,Alice,100
   ...: 1,Bob,200
   ...: 2,Charlie,300"""
In [3]: with open("/tmp/example.csv", "w") as f:
   ...:     f.write(data)
   ...:
In [4]: ibis.read_csv("/tmp/example.csv")
Out[4]:
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ column0 ┃ name    ┃ amount ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ int64   │ string  │ int64  │
├─────────┼─────────┼────────┤
│       0 │ Alice   │    100 │
│       1 │ Bob     │    200 │
│       2 │ Charlie │    300 │
└─────────┴─────────┴────────┘
In [5]: ibis.set_backend("polars")
In [6]: ibis.read_csv("/tmp/example.csv")
Out[6]:
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃       ┃ name    ┃ amount ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ int64 │ string  │ int64  │
├───────┼─────────┼────────┤
│     0 │ Alice   │    100 │
│     1 │ Bob     │    200 │
│     2 │ Charlie │    300 │
└───────┴─────────┴────────┘
DuckDB doesn't support zero-length identifiers. One option is for us to enforce adding identifiers to columns when we materialize a
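To make that option concrete, here is a minimal sketch of filling in empty column names before data reaches a backend that rejects zero-length identifiers. It is not what Ibis or DuckDB actually does: `fill_empty_names` and its `prefix` parameter are hypothetical, and the positional `column0`-style placeholder is only modeled on the name shown in the DuckDB output above.

```python
def fill_empty_names(names, prefix="column"):
    """Replace empty column names with positional placeholders.

    Hypothetical helper for illustration only; not part of the Ibis API.
    """
    # Names already in use, so a placeholder never collides with a real column.
    taken = {name for name in names if name}
    result = []
    for position, name in enumerate(names):
        if name:
            result.append(name)
            continue
        candidate = f"{prefix}{position}"
        suffix = 1
        # If the placeholder is already taken, append a numeric suffix.
        while candidate in taken:
            candidate = f"{prefix}{position}_{suffix}"
            suffix += 1
        taken.add(candidate)
        result.append(candidate)
    return result


print(fill_empty_names(["", "name", "amount"]))
```

The resulting list could then be applied as a rename before building the table, which matches the workaround reported in the issue of renaming the `''` column to something else.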
What happened?
This code succeeds with the sqlite and polars backends but fails for me with the duckdb backend:
The final line raises a ValueError:
Note that a ValueError also occurs if the source or destination is a Pandas dataframe, as in this code:
However, after renaming the column named '' (the empty string) to anything else, like ' ' (a space), displaying the Table works with the DuckDB backend too:
It may seem perverse and weird to have a column name as the empty string, but note that this is the default CSV output format produced by Pandas:
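The snippet that originally followed is not preserved here. As a stand-in, this standard-library sketch parses the sample CSV from the comments, which has the shape pandas writes when an unnamed index is included, and shows that the first header field is the empty string. The variable names are illustrative.

```python
import csv
import io

# Same sample data as in the comment thread: the leading comma in the
# header row means the first column has an empty name.
data = """,name,amount
0,Alice,100
1,Bob,200
2,Charlie,300
"""

reader = csv.reader(io.StringIO(data))
header = next(reader)  # first row holds the column names
rows = list(reader)    # remaining rows hold the data

print(header)
print(repr(header[0]))
```

Any reader that takes this header row literally ends up with a column named `''`, which is how such a table can reach Ibis in the first place.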
What version of ibis are you using?
Ibis 9.5.0
What backend(s) are you using, if any?
DuckDB 1.1.3
Relevant log output
No response