Building a Data Lake House — It’s Not Just for the Summer Vacation—It's Year-Round Enjoyment!

Friday, October 29, 2021 | 1:15PM–1:35PM ET
Viewing Location: Online
Session Type: Breakout Session
Delivery Format: Presentation/Panel Session
This session will be recorded for later viewing
Many institutions have implemented a traditional data warehouse for their data management and reporting requirements. WGU followed this common path and built its own data warehouse. As enrollment increased, WGU, an online university, had to establish a larger data repository, a data lake, to keep pace with internal data reporting and the integration of external vendor and publisher data. The data warehouse satisfied the demands of canned reporting and handled structured data well, but scaling it was expensive and technology-dependent. The data lake could host unstructured data, but it lacked several critical features, such as data quality processes and consistent support for transactions. There was a need for a unified data platform that would combine the benefits of both architectures. Thus, the data lake house was created: a more open and standardized design with the structures and management features of a data warehouse and the low-cost storage of a data lake. Some of the benefits of the data lake house are: (1) it is built on open-source Apache Spark, (2) coding options include SQL, Python, Scala, and R, (3) data files are stored in optimized formats (Parquet and Delta), and (4) it is cloud agnostic. This new unified data architecture has the capacity to meet the university's growing reporting needs and support its plans for predictive modeling. WGU is aiming to be a data-centric university, moving from BI to AI.

Presenters

  • Kurt Gunnell
    Director, Institutional Research, Western Governors University

  • Narendra Pandya
    Distinguished Data Engineer, Western Governors University