To confirm this is an architecture problem rather than a model quality problem, Databricks reran published STaRK baselines ...