Table: Support SELECT aliases in GROUP BY and ORDER BY #17843

Open. DaZuiZui wants to merge 3 commits into apache:master from DaZuiZui:feat/fix-table-select-alias-group-order

@DaZuiZui (Contributor) commented Jun 4, 2026

Description

This PR implements Part 1 of #17797 for the table model SQL analyzer.

It allows explicit SELECT aliases to be referenced in GROUP BY and ORDER BY.

For example:

SELECT date_bin(1h, time) AS hour_time, AVG(s1) AS avg_s1
FROM table1
GROUP BY hour_time
ORDER BY hour_time;

The alias is resolved during analysis, so existing semantic checks still apply after alias resolution.

Alias precedence rules

This PR documents and implements the name resolution rules discussed in #17797:

  • GROUP BY prefers current-query input columns over SELECT aliases. If an unqualified name does not resolve to a local input column, it may resolve to a matching SELECT alias.
  • ORDER BY prefers SELECT output aliases over input columns. If no SELECT alias matches, it falls back to the existing ORDER BY name resolution behavior.
  • ORDER BY alias resolution also applies to SELECT DISTINCT, for example SELECT DISTINCT s1 AS x FROM table1 ORDER BY x.
  • Duplicate matching SELECT aliases are rejected with an ambiguity error.
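
The precedence rules above can be sketched in a few lines. This is a minimal, hypothetical model rather than the StatementAnalyzer code: expressions are plain strings, and `AliasPrecedenceSketch`, `SelectAlias`, `findAliasExpression`, `resolveGroupBy`, and `resolveOrderBy` are illustrative names that do not come from the PR. Names are compared exactly, so identifier canonicalization is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the precedence rules; not the StatementAnalyzer code.
// Names are compared exactly here; identifier canonicalization is left out.
final class AliasPrecedenceSketch {

  /** One explicit SELECT alias, i.e. "expression AS name". */
  static final class SelectAlias {
    final String name;
    final String expression;

    SelectAlias(String name, String expression) {
      this.name = name;
      this.expression = expression;
    }
  }

  /** Returns the expression of the single matching alias, or null if none matches. */
  static String findAliasExpression(String name, List<SelectAlias> aliases) {
    List<SelectAlias> matches = new ArrayList<>();
    for (SelectAlias alias : aliases) {
      if (alias.name.equals(name)) {
        matches.add(alias);
      }
    }
    if (matches.size() > 1) {
      // Duplicate matching aliases are rejected rather than silently picking one.
      throw new IllegalArgumentException("SELECT alias '" + name + "' is ambiguous");
    }
    return matches.isEmpty() ? null : matches.get(0).expression;
  }

  /** GROUP BY: a local input column wins; a SELECT alias is only the fallback. */
  static String resolveGroupBy(String name, Set<String> inputColumns, List<SelectAlias> aliases) {
    if (inputColumns.contains(name)) {
      return name;
    }
    String aliased = findAliasExpression(name, aliases);
    return aliased != null ? aliased : name;
  }

  /** ORDER BY: a SELECT alias wins; otherwise the name is left to the existing resolution. */
  static String resolveOrderBy(String name, List<SelectAlias> aliases) {
    String aliased = findAliasExpression(name, aliases);
    return aliased != null ? aliased : name;
  }
}
```

In this model, a name `x` that is both an input column and an alias stays `x` in GROUP BY but is replaced by the aliased expression in ORDER BY.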

Scope

This PR only handles Part 1 of #17797:

  • SELECT alias in GROUP BY
  • SELECT alias in ORDER BY

The following items are intentionally left out of scope for a follow-up PR:

  • Lateral column alias references in the SELECT list
  • Alias references in WHERE
  • Alias references in HAVING

Refs #17797


This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage.

Key changed/added classes (or packages if there are too many classes) in this PR
  • StatementAnalyzer
  • SelectAliasReuseTest
  • TestMetadata

Test

./mvnw test -pl iotdb-core/datanode -am -Dtest=SelectAliasReuseTest -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false -DskipITs

Result:

Tests run: 12, Failures: 0, Errors: 0, Skipped: 0
BUILD SUCCESS

@Caideyipi (Collaborator) commented

I found two correctness issues in the alias resolution path:

  1. GROUP BY input-column precedence currently treats outer-scope columns as input columns. resolvesToInputColumn() calls scope.tryResolveField(...).isPresent(), but that can resolve through the query boundary into a correlated outer scope. As a result, a correlated subquery such as:
SELECT x,
       (SELECT COUNT(*) FROM table1 GROUP BY x)
FROM table_with_x

or an inner query with SELECT expr AS x ... GROUP BY x can have GROUP BY x blocked from resolving to the inner SELECT alias just because the outer query has a column named x. The stated rule is that GROUP BY prefers input columns, which should mean columns of the current query's source scope, not outer query columns. The check should verify that the resolved field is local to the current source scope, e.g. via ResolvedField.isLocal() / relation id, before suppressing alias resolution.

  2. ORDER BY aliases that point to window functions are re-analyzed as window functions in the ORDER BY phase. For example:
SELECT row_number() OVER (ORDER BY s1) AS rn
FROM table1
ORDER BY rn

ORDER BY rn is rewritten to the row_number() OVER (...) expression, so the function is collected both in analysis.getWindowFunctions(node) and analysis.getOrderByWindowFunctions(orderBy). QueryPlanner then plans SELECT window functions first and ORDER BY window functions again after switching to the ORDER BY scope, producing duplicate window planning instead of ordering by the SELECT output alias. For ORDER BY alias references, the planner should reuse the SELECT output symbol / field reference rather than treating the alias target as a fresh ORDER BY window expression.
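
The first issue (outer-scope columns shadowing an inner SELECT alias) can be illustrated with a small model. This is only a sketch under assumptions: the `Scope` and `ResolvedField` classes below are stand-ins written for this example, not the real analyzer classes, and `resolvesToInputColumnAnyScope` is a hypothetical name for the behavior described as problematic.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical scope model; the real Scope and ResolvedField classes are richer.
final class LocalScopeSketch {

  static final class ResolvedField {
    final String name;
    final boolean local;

    ResolvedField(String name, boolean local) {
      this.name = name;
      this.local = local;
    }

    /** True only when the field belongs to the scope the lookup started from. */
    boolean isLocal() {
      return local;
    }
  }

  static final class Scope {
    final Set<String> columns;
    final Scope outer; // enclosing query scope for a correlated subquery, or null

    Scope(Scope outer, String... columns) {
      this.outer = outer;
      this.columns = new HashSet<>(Arrays.asList(columns));
    }

    /** Resolves a name, walking outward across query boundaries like a correlated lookup. */
    Optional<ResolvedField> tryResolveField(String name) {
      for (Scope scope = this; scope != null; scope = scope.outer) {
        if (scope.columns.contains(name)) {
          return Optional.of(new ResolvedField(name, scope == this));
        }
      }
      return Optional.empty();
    }
  }

  /** Problematic variant: any hit, including an outer-scope column, suppresses the alias. */
  static boolean resolvesToInputColumnAnyScope(Scope scope, String name) {
    return scope.tryResolveField(name).isPresent();
  }

  /** Suggested variant: only a field local to the current source scope counts. */
  static boolean resolvesToInputColumn(Scope scope, String name) {
    return scope.tryResolveField(name).filter(ResolvedField::isLocal).isPresent();
  }
}
```

With an outer scope that has a column `x` and an inner scope that does not, the first variant reports `x` as an input column while the second does not, so the inner GROUP BY x is free to fall back to the SELECT alias.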

@JackieTien97 (Contributor) left a comment

Found one correctness gap in the GROUP BY alias resolution path.

column = outputExpressions.get(toIntExact(ordinal - 1));
verifyNoAggregateWindowOrGroupingFunctions(column, "GROUP BY clause");
} else {
column = resolveGroupBySelectAlias(column, scope, selectAliases);

Could we apply this same alias-resolution step to the GroupingSets branch below as well? Right now only SimpleGroupBy rewrites SELECT aliases, so queries such as SELECT s1 AS x, COUNT(*) FROM table1 GROUP BY ROLLUP(x) still reach analyzeExpression(column, scope) with x unresolved and fail with Column 'x' cannot be resolved. Since ROLLUP, CUBE, and GROUPING SETS are still GROUP BY grouping elements, they should follow the same input-column precedence and SELECT-alias fallback rule.
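
One way to picture the request is a single resolver applied uniformly to every grouping element. The sketch below is hypothetical: grouping sets are modeled as lists of column names, and the `resolver` parameter stands in for the alias-resolution step already used in the simple GROUP BY branch. None of these names come from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: grouping sets are modeled as lists of column names, and the
// resolver parameter stands in for the alias-resolution step used for simple GROUP BY.
final class GroupingSetsAliasSketch {

  /** Applies the same resolver to every column of every grouping set, keeping the structure. */
  static List<List<String>> resolveGroupingSets(
      List<List<String>> groupingSets, UnaryOperator<String> resolver) {
    List<List<String>> resolved = new ArrayList<>();
    for (List<String> set : groupingSets) {
      List<String> resolvedSet = new ArrayList<>();
      for (String column : set) {
        resolvedSet.add(resolver.apply(column));
      }
      resolved.add(resolvedSet);
    }
    return resolved;
  }
}
```

Because ROLLUP, CUBE, and GROUPING SETS all reduce to lists of grouping columns in this model, routing each column through the same resolver keeps the input-column precedence and alias fallback identical across all GROUP BY forms.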

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the table-model SQL analyzer to allow explicit SELECT ... AS <alias> aliases to be reused by name in GROUP BY and ORDER BY, following the documented precedence rules (GROUP BY prefers input columns; ORDER BY prefers output aliases). It also adds focused unit tests and test metadata to validate alias resolution, ambiguity detection, and scope boundaries.

Changes:

  • Implement SELECT alias collection during analysis and reuse those aliases in GROUP BY / ORDER BY resolution.
  • Add unit tests covering precedence rules, ambiguity errors, invalid alias usages, and subquery scoping.
  • Extend test metadata with a new table schema used for name-collision scenarios.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/StatementAnalyzer.java | Adds alias capture + resolution logic for GROUP BY / ORDER BY during semantic analysis. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/SelectAliasReuseTest.java | New test suite verifying alias reuse semantics and error cases. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/TestMetadata.java | Adds table_with_x schema to test alias-vs-input-column precedence. |

for (SortItem item : sortItems) {
Expression expression = item.getSortKey();
Scope expressionScope = sourceScope;

@JackieTien97 (Contributor) left a comment

Please add the related test cases to an existing IT class, and make sure they cover both the normal and the corner cases. All new functionality needs ITs in addition to UTs.

Comment on lines +4217 to +4218
Scope sourceScope,
Scope orderByScope,

Why do we need another sourceScope?


private static final class SelectAlias {
private final String canonicalName;
private final Expression expression;

Is there any need to store this? For resolveGroupBySelectAlias, we only need the position as well. With the position, you can pick the expression from outputExpressions, just as the existing LongLiteral (ordinal) branch does.
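
A minimal sketch of this suggestion, under assumptions: expressions are modeled as strings, `position` is taken to be a zero-based index into the SELECT output expressions, and `SelectAliasByPositionSketch` and `expressionFor` are hypothetical names introduced only for illustration.

```java
import java.util.List;

// Hypothetical sketch of the reviewer's suggestion; field and method names are assumptions.
final class SelectAliasByPositionSketch {

  /** Keeps only the alias name and where its item sits in the SELECT list. */
  static final class SelectAlias {
    final String canonicalName;
    final int position; // assumed zero-based index into the SELECT output expressions

    SelectAlias(String canonicalName, int position) {
      this.canonicalName = canonicalName;
      this.position = position;
    }
  }

  /** Looks the expression up on demand, much like an ordinal reference such as GROUP BY 1. */
  static String expressionFor(SelectAlias alias, List<String> outputExpressions) {
    return outputExpressions.get(alias.position);
  }
}
```

Storing only the position avoids keeping a second copy of the expression, so the alias path and the ordinal path read from the same list of output expressions.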
