Boston-based data lake analytics company Starburst today announced an integration with transformation tool dbt Cloud to help users of the platform build data pipelines spanning multiple data sources via one central plane.
The integration, which is now live as a dedicated adapter inside dbt Cloud, connects to Starburst’s SaaS offering, Starburst Galaxy. It gives enterprises that still juggle highly distributed data environments a way to federate their data assets.
Starburst says the connection is easy to deploy and can be up and running in a matter of minutes.
How does Starburst Galaxy help with data transformations?
Starburst Galaxy is the cloud-native, fully managed version of Starburst’s massively parallel processing (MPP) query engine. It allows enterprise users to query a variety of data sources, or to join data across multiple sources in a single query.
With the latest integration, dbt users can apply this federated query capability to transform all of their data assets, regardless of where they reside. That removes the need to prepare and move data through manually configured ETL pipelines, which can be cumbersome, expensive and risky.
“By integrating its federated query engine with dbt’s transformation engine, Starburst aims to help data teams increase the amount of data they prepare for analytics projects. dbt users can query data in distributed locations, then clean, model, test, deliver and document those datasets for consumption. There are more than 50,000 users of the open-source dbt tool, so it’s a significant addressable market,” Kevin Petrie, VP of research at Eckerson Group, told VentureBeat.
To use the integration, a user creates a new dbt Cloud project, selects Starburst as the data platform, enters credentials and connects. Once authentication completes, the user can start transforming distributed data with Starburst’s query engine.
“Users have to write queries as normal, using SQL JOINs between data from multiple sources while Starburst intelligently determines where to send requests,” Matt Fuller, co-founder and VP of product at Starburst, told VentureBeat. He emphasized that part of the power of this integration is how easy it is to implement and use.
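To make the idea of a cross-source JOIN concrete, here is a minimal Python sketch that assembles the kind of SQL a transformation model might contain. It assumes tables are addressed in `catalog.schema.table` form, with each catalog standing for a separate data source. The catalog, schema, table and column names are all hypothetical and are not taken from Starburst or dbt documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """A table addressed as catalog.schema.table (all names hypothetical)."""
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        # Join the three parts into one dotted identifier.
        return f"{self.catalog}.{self.schema}.{self.table}"


def federated_join_sql(left: TableRef, right: TableRef, key: str) -> str:
    """Build a SELECT that joins two tables from different catalogs on one key."""
    return (
        "SELECT o.order_id, o.order_total, c.customer_name\n"
        f"FROM {left.qualified()} AS o\n"
        f"JOIN {right.qualified()} AS c\n"
        f"  ON o.{key} = c.{key}"
    )


# Assumption: one catalog is backed by an operational database,
# the other by object storage. Neither name comes from the article.
orders = TableRef("postgres_prod", "sales", "orders")
customers = TableRef("lake", "crm", "customers")

query = federated_join_sql(orders, customers, "customer_id")
print(query)
```

The point of the sketch is that the query text itself looks like an ordinary JOIN. Only the catalog prefix on each table name indicates that the data lives in different systems.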
Goal to maximize data coverage
While global enterprises continue to shift toward centralized data warehouses, many companies still have data assets spread across multiple distributed platforms, including on-prem databases and object storage. The new dbt-Starburst integration is meant to ensure that these data assets are also prepared and used for analytics and machine learning projects.
“This integration addresses the needs of the enterprise customer base, helping them get the most out of their existing systems and extending dbt’s analytics engineering workflow platform to new cloud-first use cases without additional operational overhead,” Harrison Johnson, head of technology partnerships at Starburst, said.
The trend of having highly distributed data environments is expected to continue in the near future, making solutions like this valuable for data engineers and data analytics engineers, Petrie said.