Description: A framework for processing earthquake arrival times in parallel with Apache Spark. More on GitHub.
Techniques: distributed computing, grid-search earthquake location
Tools: Spark (PySpark), Arrow, AWS (S3, EC2), MySQL, Dash, Python (pandas, numpy)
This project was completed in three weeks for the Insight Data Engineering program in Seattle, Fall 2019.
A more thorough description can be found on GitHub.
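To give a sense of what grid-search earthquake location involves, here is a minimal NumPy sketch. It is not the project's code: the function name `grid_search_location`, the flat 2-D geometry, the single constant velocity, and the synthetic stations are all assumptions made for illustration. For each candidate grid node it predicts travel times to every station, solves for the best-fitting origin time, and keeps the node with the lowest RMS residual.

```python
import numpy as np


def grid_search_location(stations, arrivals, velocity, xs, ys):
    """Hypothetical helper: locate one event by exhaustive grid search.

    stations : (n_sta, 2) array of station x, y coordinates (km)
    arrivals : (n_sta,) array of observed arrival times (s)
    velocity : assumed constant wave speed (km/s)
    xs, ys   : 1-D arrays of candidate source coordinates (km)
    Returns (x, y, origin_time, rms) for the best-fitting node.
    """
    gx, gy = np.meshgrid(xs, ys, indexing="ij")  # each has shape (nx, ny)
    # Distance from every grid node to every station: shape (nx, ny, n_sta)
    dist = np.hypot(gx[..., None] - stations[:, 0],
                    gy[..., None] - stations[:, 1])
    travel = dist / velocity
    # For a fixed node, the least-squares origin time is the mean of
    # (observed arrival - predicted travel time) over the stations.
    t0 = (arrivals - travel).mean(axis=-1)
    resid = arrivals - travel - t0[..., None]
    rms = np.sqrt((resid ** 2).mean(axis=-1))
    i, j = np.unravel_index(np.argmin(rms), rms.shape)
    return xs[i], ys[j], t0[i, j], rms[i, j]


# Synthetic example (made-up stations and source, noise-free arrivals).
stations = np.array([[0.0, 0.0], [30.0, 5.0], [-25.0, 10.0],
                     [10.0, -30.0], [-15.0, -20.0]])
true_xy, true_t0, velocity = np.array([12.0, -7.0]), 3.0, 6.0
arrivals = true_t0 + np.linalg.norm(stations - true_xy, axis=1) / velocity

grid = np.arange(-20.0, 21.0, 1.0)
x, y, t0, rms = grid_search_location(stations, arrivals, velocity, grid, grid)
print(x, y, t0, rms)
```

Because each event is located independently of the others, a function of this kind is a natural unit of work to distribute across a cluster, one event (or batch of events) per task.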
Here is a walkthrough of the web frontend: