feat(snowflake): transpile Snowflake IDENTIFIER function to DuckDB#7723
feat(snowflake): transpile Snowflake IDENTIFIER function to DuckDB#7723fivetran-kwoodbeck wants to merge 5 commits into
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Hey @fivetran-kwoodbeck, I don't think this is the right approach. We should parse the IDENTIFIER(...) argument, and only when it's a string, otherwise you risk breaking case-sensitive identifiers. I wonder if we should just use to_column and to_table directly and produce semantically equivalent SQL without the IDENTIFIER call.
Falling back to an Anonymous node in other cases is good. I would also have a try-except block to make sure we gracefully handle parse failures and fall back to the IDENTIFIER(...) syntax.
Check this out too: https://github.com/fivetran/sqlglot-integration-tests/pull/452#pullrequestreview-4467142494.
| any_token: bool = True, | ||
| tokens: Collection[TokenType] | None = None, | ||
| ) -> exp.Expr | None: | ||
| if self._match_text_seq("IDENTIFIER", "("): |
There was a problem hiding this comment.
How come we need both this and an entry in FUNCTION_PARSERS to handle IDENTIFIER(...)?
There was a problem hiding this comment.
_parse_identifier_function is used for calls such as SELECT IDENTIFIER('foo') FROM t whereas _parse_id_var is used for calls such as CREATE TABLE IDENTIFIER('foo')
There was a problem hiding this comment.
Did you verify that this is still needed though? If we remove the IDENTIFIER()-related logic from this method, does the parser fail to handle SQL like your example?
12b5a1f to
d97ad27
Compare
d97ad27 to
12e2d66
Compare
|
|
||
|
|
||
| class DynamicIdentifier(Expression): | ||
| arg_types = {"this": True} |
There was a problem hiding this comment.
| arg_types = {"this": True} | |
| pass |
| return self.name | ||
|
|
||
|
|
||
| class DynamicIdentifier(Expression): |
There was a problem hiding this comment.
| class DynamicIdentifier(Expression): | |
| # https://docs.snowflake.com/en/sql-reference/identifier-literal | |
| class DynamicIdentifier(Expression): |
There was a problem hiding this comment.
I'm not sure this should inherit from Expression, but instead from Func. If we do that:
- We can do a simple
rename_funcin Snowflake's generator - We don't need to have an entry in
FUNCTION_PARSERS, only map"IDENTIFIER"to it inFUNCTIONSand the rest is taken care of.
Just a couple of simplifications. Since this is really a function, it also aligns better with intuition.
There was a problem hiding this comment.
sounds good, will update
| if this and this.is_string: | ||
| return this.name |
There was a problem hiding this comment.
Don't you still need to parse this when transpiling it? maybe_parse(this.name).sql(self.dialect). Otherwise if you transpile into something like Databricks, for example, you'll emit "-delimited text which is a string literal, not an identifier.
| this = expression.this | ||
| if this and this.is_string: | ||
| return this.name | ||
| return self.func("IDENTIFIER", this) |
There was a problem hiding this comment.
We should call unsupported if control falls to this return statement.
| any_token: bool = True, | ||
| tokens: Collection[TokenType] | None = None, | ||
| ) -> exp.Expr | None: | ||
| if self._match_text_seq("IDENTIFIER", "("): |
There was a problem hiding this comment.
Did you verify that this is still needed though? If we remove the IDENTIFIER()-related logic from this method, does the parser fail to handle SQL like your example?
| def _fold_identifier_literal(self, arg: exp.Expr | None) -> exp.Expr: | ||
| return self.expression(exp.DynamicIdentifier(this=arg)) |
There was a problem hiding this comment.
This looks unnecessary, let's inline this where it's needed.
There was a problem hiding this comment.
The branch is needed, if we remove it, the following test fails:
CREATE TABLE IDENTIFIER('foo') (COLUMN1 VARCHAR, COLUMN2 VARCHAR)
| return self.name | ||
|
|
||
|
|
||
| class DynamicIdentifier(Expression): |
There was a problem hiding this comment.
I'm not sure this should inherit from Expression, but instead from Func. If we do that:
- We can do a simple
rename_funcin Snowflake's generator - We don't need to have an entry in
FUNCTION_PARSERS, only map"IDENTIFIER"to it inFUNCTIONSand the rest is taken care of.
Just a couple of simplifications. Since this is really a function, it also aligns better with intuition.
12e2d66 to
bdeb4b5
Compare
Snowflake's
IDENTIFIER(), which resolves a string to a runtime identifier, previously fell through toexp.Anonymous. This folds it into a typedexp.Identifiernode (flagged withidentifier_func=True) so the Snowflake generator reconstructsIDENTIFIER('name')and the DuckDB generator emits a native identifier. It includes splitting dot-qualified names likeIDENTIFIER('t1.col')into a properexp.Dotchain.Non-foldable args like
IDENTIFIER($var)remain asexp.Anonymousso they round-trip correctly.Behavior