feat: add Spark History Server link to job details #113
Shivang Nagta (ShivangNagta) wants to merge 3 commits into
Conversation
<Button
  styleType='text-blue'
  as='externalLink'
  href={`https://spark-history.data-platform.aws.pattern.com/history/${jobData?.spark_application_id}/jobs/`}
Shivang Nagta (@ShivangNagta) this is going to be shipped with the OSS Docker image. Let's find another way to inject this into the UI.
Yash Shrivastava (alephys26)
left a comment
If this is going into the jobs table, add something more generic: maybe a JSON column, extra_job_attributes, that stores key-value pairs. The frontend then uses each key as the display text and its value as the hyperlink, for all the attributes present in that column.
Or some better approach. Either way, a specific column for the Spark application ID is not what I would like to see.
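The generic approach suggested in this comment could be sketched on the frontend roughly as follows. This is only an illustration under assumptions: the `ExtraJobAttributes` type, the `AttributeLink` shape, and the `toAttributeLinks` helper are hypothetical names, not taken from the Heimdall codebase.

```typescript
// Hypothetical shape of the extra_job_attributes column: display text -> URL.
type ExtraJobAttributes = Record<string, string>;

// Hypothetical descriptor the UI could map over to render one link per attribute.
interface AttributeLink {
  label: string; // the key, used as display text
  href: string;  // the value, used as the hyperlink target
}

// Convert the stored key-value pairs into link descriptors.
// Attributes with an empty value are skipped, so nothing is rendered for them.
function toAttributeLinks(
  attrs: ExtraJobAttributes | null | undefined
): AttributeLink[] {
  if (!attrs) return [];
  const links: AttributeLink[] = [];
  for (const label of Object.keys(attrs)) {
    const href = attrs[label];
    if (typeof href === "string" && href.trim() !== "") {
      links.push({ label, href });
    }
  }
  return links;
}
```

With this shape, the job details page would not need to know about Spark at all: a job whose column holds `{"Spark History": "<url>"}` gets one link, and a job with an empty object gets none.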
I have added an
Also, the spark history server URL is now added in the cluster context instead
I have moved the logic for defining the template for extra job attribute values into the config itself (cluster). The runtime metadata is now stored in memory (the output field in the job struct) and is rendered according to the template that was passed in. The result is finally persisted to the DB column extra_job_attributes, as before.
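To make the template idea concrete, here is a minimal, language-agnostic sketch written in TypeScript. It is not the PR's implementation: the `{{name}}` placeholder syntax, the `renderTemplate` and `renderExtraJobAttributes` helpers, and the example URL are all assumptions made for illustration.

```typescript
// Fill {{name}} placeholders (assumed syntax) from runtime metadata.
// Returns null if any placeholder has no value yet, so that no
// half-rendered URL is persisted.
function renderTemplate(
  template: string,
  metadata: Record<string, string>
): string | null {
  let missing = false;
  const rendered = template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (_match: string, key: string): string => {
      const value = metadata[key];
      if (value === undefined || value === "") {
        missing = true;
        return "";
      }
      return encodeURIComponent(value);
    }
  );
  return missing ? null : rendered;
}

// Render every configured attribute template; keep only those that resolved.
function renderExtraJobAttributes(
  templates: Record<string, string>,
  metadata: Record<string, string>
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const name of Object.keys(templates)) {
    const template = templates[name];
    if (template === undefined) continue;
    const value = renderTemplate(template, metadata);
    if (value !== null) result[name] = value;
  }
  return result;
}
```

For example, a cluster-level template such as `https://history.example.com/history/{{spark_application_id}}/jobs/` would resolve only once the plugin has captured the application id at runtime; until then the attribute is simply omitted.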
prasadlohakpure
left a comment
Nice work, LGTM
Description
Adds a "Spark History" link on the job details page for Spark-on-EKS jobs.
The link needs Spark's runtime application id, which surfaces on the SparkApplication Custom Resource status. So the sparkeks plugin now captures Status.SparkApplicationID during job monitoring and persists it on the job (new spark_application_id column, added as not null default ''); the UI then renders the History Server link when an id is present, and hides it otherwise.
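The render-or-hide rule described here can be sketched as a small helper. This is an assumption-laden illustration rather than the PR's code: `sparkHistoryUrl` is a hypothetical name and the base URL is passed in as a parameter; only the `/history/<id>/jobs/` path shape comes from the diff under review.

```typescript
// Hypothetical helper: build the History Server URL for a job, or return
// null when there is no application id (the column defaults to '' for
// non-Spark jobs), in which case the UI should render no link.
function sparkHistoryUrl(
  baseUrl: string,
  sparkApplicationId: string | null | undefined
): string | null {
  if (!sparkApplicationId) return null;
  // Drop trailing slashes so the joined URL has exactly one separator.
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/history/${encodeURIComponent(sparkApplicationId)}/jobs/`;
}
```

A component could then render the button only when the helper returns a non-null value, which keeps the empty-string default from producing a broken link.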
Test
Tested locally (migration, persistence, UI).
Haven't done e2e testing in sandbox, as this adds no extra API call for the id. The monitor loop already fetches Status (for AppState), and SparkApplicationID is just another field on that same object, so reading it adds no new call or failure mode.
Confirmed the spark-operator populates Status.SparkApplicationID at runtime by running the operator's spark-pi example on a local kind cluster and reading the field back.

Manual seeding for testing (for spark and non-spark job)

Button Rendering in Job Details Page (for a spark job)

Some Notes (open for comments)
spark_application_id is the first plugin-specific column on jobs (the other columns are generic). It looks a little odd to me, but Claude's reasoning for it was: "there's no generic home for plugin runtime metadata as of now". That seems to be true, because our use case is to store runtime-generated data (spark_application_id) in the heimdall database, and I could not find any other plugin doing something like this.
Some other options could be:
a. Add a separate table for storing spark_application_id, with a foreign key reference to the original job table. This separates the Spark-specific data from the generic job table, but it adds an extra API/read call, and we would still have to add a Spark-specific table update somewhere in the updateAsyncJobStatus function.
b. If more plugins need to store runtime metadata, a generic metadata column may be preferable to per-plugin columns. But this seems like an early abstraction.