Introduction
Data is the foundation upon which scientific and engineering discoveries sit. Gleaning practical conclusions from observed data is the culmination of the scientific method, which provides the most basic framework for communication and collaboration between technical professionals. Today, data collection has become incredibly efficient. Digitized analogue data and sensors directly connected to the web create a constant stream of information that accumulates into an overwhelming mass. Leveraging such large datasets is a daunting task; organizing, validating, analyzing, and visualizing them manually is very time-consuming and often not feasible within project budgets.
Modern programming tools streamline the analysis of large datasets. Specifically, high-level languages such as Julia, Python, and R provide an interactive programming experience, a simple syntax, and an interface that automates more complicated aspects of programming such as data typing and memory management. Furthermore, the scientific community has embraced these languages, developing free-to-use frameworks for seemingly endless use cases. Learning one of these languages, along with its data analysis frameworks, has many benefits.
Discussed below are three project-oriented applications of modern programming developed by water resources engineers at CDM Smith.
Database Driven Digital Reporting for the Middle Santa Ana River TMDL
Interactive plots and dashboards are tremendously popular on the web. These tools provide a better data exploration experience, allowing for panning, zooming, hover tooltips, selections, and other capabilities that make it easier for people to glean meaningful conclusions from a single plot. These interactive plots rely on JavaScript, the programming language that runs natively in every web browser. JavaScript is very popular because of its dominance in web browser coding, but most scientists and engineers choose to learn languages like Python, R, or Julia, which have much larger libraries of public scientific computing code. Until recently, interactive plotting required knowledge of JavaScript or the use of costly and restrictive dashboard tools like Tableau and PowerBI. Several popular Python tools like Plotly, Bokeh, and Altair have emerged to bridge the gap between Python and the web browser. Those tools provide a simple syntax for making well-formatted interactive plots that run on the web and, in the case of Plotly, for developing dashboard web applications entirely in Python. CDM Smith leveraged these Python tools to develop a web application dashboard for presenting water quality data collected by the Santa Ana Bacteria Monitoring Program.
The Santa Ana is a major river in Southern California, running from the San Bernardino Mountains to the Pacific Ocean; it has a 2,650-mi² watershed spanning Los Angeles, San Bernardino, Orange, and Riverside Counties. Consistent bacteria water quality monitoring throughout the watershed is required by a 2015 amendment to the Region’s water quality control plan. A monitoring program is administered by the Santa Ana Watershed Project Authority (SAWPA), which retained CDM Smith to fulfill the program’s requirements. One of those requirements is submitting quarterly reports of bacteria monitoring results to the regulating body.
CDM Smith has been developing and submitting PDF bacteria monitoring reports quarterly since 2016, but in 2021 the team proposed that those quarterly reports be moved to a digital platform. An interactive Python web application was developed that provides public access to all historical monitoring data and streamlines the analysis and contextualization of new data as they come in (sawpa.cdmsmith.com). Using a relational database along with Python code to provide tables, plots, and maps of data from 2016 through the most recent quarter, the application serves as a central repository of all data collected by the monitoring program. Users can more easily evaluate trends by comparing bacteria concentrations over time and at sites across the region, which helps them contextualize newly collected data. Python’s immense popularity, dynamic typing, and concise syntax made developing this application a modest effort. Consequently, the application could be budgeted with a no-cost scope modification that replaced the existing PDF quarterly reports with the digital online dashboard.
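As a rough illustration of the database-plus-Python pattern described above, the sketch below loads a few monitoring results into an in-memory SQLite database and summarizes them by site and quarter. The table name, column names, site labels, and concentration values are hypothetical placeholders, not the program's actual schema or data. The geometric mean is used only because it is a common summary statistic for bacteria counts; the article does not say which statistics the dashboard reports.

```python
import sqlite3
from statistics import geometric_mean

# Hypothetical rows (site, sample date, concentration); values are made up.
ROWS = [
    ("SITE-A", "2021-01-12", 120.0),
    ("SITE-A", "2021-02-09", 300.0),
    ("SITE-A", "2021-04-13", 90.0),
    ("SITE-B", "2021-01-12", 40.0),
    ("SITE-B", "2021-03-16", 250.0),
]


def build_database(rows):
    """Create an in-memory database with an assumed 'results' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (site TEXT, sample_date TEXT, concentration REAL)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def quarter_label(iso_date):
    """Convert 'YYYY-MM-DD' to a label such as '2021-Q1'."""
    year, month = int(iso_date[:4]), int(iso_date[5:7])
    return f"{year}-Q{(month - 1) // 3 + 1}"


def quarterly_geomeans(conn):
    """Return {(site, quarter): geometric mean concentration}."""
    grouped = {}
    query = (
        "SELECT site, sample_date, concentration "
        "FROM results ORDER BY site, sample_date"
    )
    for site, sample_date, conc in conn.execute(query):
        grouped.setdefault((site, quarter_label(sample_date)), []).append(conc)
    return {key: geometric_mean(vals) for key, vals in grouped.items()}


conn = build_database(ROWS)
summary = quarterly_geomeans(conn)
for (site, quarter), value in sorted(summary.items()):
    print(f"{site} {quarter}: {value:.1f}")
```

In a dashboard, a summary like this would typically feed a table or plot, so each new quarter of data appears alongside the historical record without manual rework.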
Interactive Early Warning System Model Visualization with GIS in R
The Delaware Valley Early Warning System (EWS) is a notification system used to alert its subscribers of surface water contamination within the lower Delaware River and Schuylkill River watersheds (Pennsylvania and New Jersey). The EWS was developed, and is being improved and maintained, by the Philadelphia Water Department in conjunction with supporting public and industrial water systems. The EWS relays real-time notifications and spill trajectory model results to over 450 registered users within minutes of an event being reported. Spill modeling is based on particle tracking simulations run in real time within the EWS framework. Simulations are driven by model results gathered from NOAA CO-OPS’ Delaware Bay Operational Forecast System (NOAA 2022).
CDM Smith water resources professionals, working on behalf of the Philadelphia Water Department, continually seek to improve EWS trajectory model performance. Recent efforts have focused on minimizing the computational demands associated with the Lagrangian TRANSport (LTRANS) 3D particle tracking model (North 2012) that is used by the EWS to simulate spill trajectories in the tidal Delaware River. However, rigorous model testing and analysis are required prior to implementing any proposed changes to the EWS. To support these efforts, an automated workflow and a simple application (both developed with the R programming language) were created as internal tools to efficiently process and visualize the thousands of output files generated by LTRANS.
The LTRANS Viewer is an application developed using Shiny, the leading R package for web app development. LTRANS model results are first post-processed into an SQLite database by a separate R script. The LTRANS Viewer then queries the database (e.g., by model scenario or timestamp) and displays the queried particles on a fully interactive map. Color-coded simulations may be superimposed and “played,” which allows the user to visualize the particles moving in both space and time. Additional functionality includes spatial aggregation options (e.g., center of mass) to reduce on-screen particle counts, toggleable polygon layers, and interactive measuring tools, all of which rely on either user-defined or open-source functions written in R.
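The query-then-aggregate step described above can be sketched as follows. The actual viewer is written in R with Shiny; Python is used here purely for illustration, and the table layout, scenario names, and coordinates are hypothetical. The sketch also assumes every particle carries equal mass, so the center of mass reduces to the mean position, and it averages longitude and latitude directly, which is only a reasonable approximation over a small area.

```python
import sqlite3

# Hypothetical particle output: (scenario, time_step, lon, lat). Values are made up.
PARTICLES = [
    ("baseline", 0, -75.10, 39.90),
    ("baseline", 0, -75.12, 39.92),
    ("baseline", 1, -75.14, 39.94),
    ("baseline", 1, -75.18, 39.98),
    ("reduced", 0, -75.10, 39.90),
]


def load_particles(rows):
    """Create an in-memory database with an assumed 'particles' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE particles "
        "(scenario TEXT, time_step INTEGER, lon REAL, lat REAL)"
    )
    conn.executemany("INSERT INTO particles VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def center_of_mass(conn, scenario):
    """Return {time_step: (mean_lon, mean_lat, particle_count)} for one scenario.

    Equal particle mass is assumed, so the centroid is a simple average.
    """
    query = """
        SELECT time_step, AVG(lon), AVG(lat), COUNT(*)
        FROM particles
        WHERE scenario = ?
        GROUP BY time_step
        ORDER BY time_step
    """
    rows = conn.execute(query, (scenario,))
    return {step: (lon, lat, n) for step, lon, lat, n in rows}


conn = load_particles(PARTICLES)
track = center_of_mass(conn, "baseline")
for step, (lon, lat, n) in track.items():
    print(f"step {step}: {n} particles, centroid at ({lon:.2f}, {lat:.2f})")
```

Letting the database do the grouping means only one point per time step reaches the map, which is the kind of reduction in on-screen particle counts that the aggregation options are meant to provide.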
Using R Scripts for Batch-Processing Model Runs
In addition to visualizing model results, R can be used to batch and process model runs efficiently. This approach was used to evaluate system performance for a water management district in Florida, which has been constructing and maintaining a series of wetland stormwater treatment areas as part of a habitat restoration project. The stormwater treatment areas have been constructed at several key locations to reduce the nutrient load in runoff upstream of the lake while simultaneously creating wetland habitats. The key to effective nutrient removal using the treatment systems is consistent hydration within a range of depths from 6 to 36 inches, which can be challenging due to the flashy and intermittent nature of stormwater runoff.
A decision support tool was developed to evaluate system hydration under the current configuration and to provide sizing guidelines for flow equalization basins, which are storage reservoirs used to provide consistent hydration for wetland treatment systems. The system was modeled using Stella, a graphical simulation tool designed by isee Systems to evaluate conceptual design and operations of complex flow systems. R scripts were used to pre-process model inputs, run the model, post-process model results, and generate numerous figures. The scripts led to significant time savings and allowed additional scenarios to be evaluated easily and efficiently.
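The batch pattern described above (loop over scenarios, run the model, post-process the output) can be sketched in a few lines. The project itself used R scripts and Stella; this Python sketch is only illustrative. The `run_model` function is a hypothetical stand-in that returns canned depth series rather than launching a real simulation, and the scenario names and depth values are made up. The only numbers taken from the text are the 6 to 36 inch target range.

```python
# Target hydration range from the text, in inches.
MIN_DEPTH_IN, MAX_DEPTH_IN = 6.0, 36.0

# Hypothetical scenarios with made-up water depth series (inches per time step).
SCENARIOS = [
    {
        "name": "current configuration",
        "depths_in": [2.0, 4.0, 8.0, 40.0, 30.0, 5.0, 3.0, 1.0],
    },
    {
        "name": "with equalization basin",
        "depths_in": [7.0, 9.0, 12.0, 34.0, 30.0, 20.0, 12.0, 5.0],
    },
]


def run_model(scenario):
    """Hypothetical stand-in for a model run.

    A real workflow would write input files, launch the simulation, and read
    its output; here the canned depth series is returned unchanged.
    """
    return scenario["depths_in"]


def hydration_fraction(depths_in):
    """Fraction of time steps with depth inside the target range (inclusive)."""
    if not depths_in:
        raise ValueError("depth series is empty")
    in_range = sum(MIN_DEPTH_IN <= d <= MAX_DEPTH_IN for d in depths_in)
    return in_range / len(depths_in)


def batch_run(scenarios):
    """Run every scenario and return {scenario name: hydration fraction}."""
    return {s["name"]: hydration_fraction(run_model(s)) for s in scenarios}


results = batch_run(SCENARIOS)
for name, fraction in results.items():
    print(
        f"{name}: {fraction:.0%} of time steps within "
        f"{MIN_DEPTH_IN:g}-{MAX_DEPTH_IN:g} in"
    )
```

Because each scenario is just an entry in a list, adding another basin size or operating rule means adding one more entry rather than repeating a manual workflow.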
Summary
The examples presented in this article demonstrate the capabilities of an emerging, modernized toolset for engineers and scientists. Interpreted programming languages coupled with popular open-source libraries provide an approachable means for advanced and scalable data analysis. Today’s abundance of free resources gives engineers and scientists a realistic opportunity to learn and apply programming in their everyday work. Programmed data wrangling, workflow automation, and powerful graphics packages (e.g., Plotly, Matplotlib, ggplot2) overcome many of the limitations inherent in spreadsheet work, saving time and money without sacrificing high standards of quality. Developing dashboards and other web applications in-house with Python and R Shiny avoids outsourcing that work to third parties, which can separate the engineer from the project and may lead to miscommunication or inefficiencies. By incorporating these tools into their daily workflows, engineers can free up time previously spent on tedious tasks and enjoy creative problem solving in the deep sandbox these programming languages provide.
References
North 2012: https://northweb.hpl.umces.edu/LTRANS.htm
NOAA 2022: https://tidesandcurrents.noaa.gov/ofs/dbofs/dbofs.html