Turbocharge Your Data Science Workbench
Use Web Components with Jupyter Notebooks
This is where you start: a blank Jupyter Notebook, ready for action. We assume that all necessary software has been installed, including additional packages such as Pandas, NumPy and Matplotlib.
Introduction to Data
For the purpose of demonstration, we will analyse flight delays using publicly available data. The United States Department of Transportation (www.bts.gov) publishes Airline On-Time Statistics on its web site. Typically, this data is available within a month. We will analyse the flight delays for July 2017 and aim to build something like the figure below.
The site already offers good analyses and graphical representations. However, as Data Scientists, we are always curious to analyse the data further, using our own algorithms and approaches. The starting point, however, is to reproduce the published results first. The raw data contains one line per airline per airport. The file contains 21 fields, but for now we will focus on the two most relevant ones: the number of flights that landed and the number of flights that were delayed.
Figure 3: Snapshot of raw data
Prepare for Analysis
Ingest Data in memory
Pandas is the most obvious choice in Python for reading the data into memory. Since the data is already in Comma Separated Values (CSV) format, Pandas supports it out of the box.
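As a minimal sketch of this step, the snippet below reads a CSV into a DataFrame and totals the two fields of interest. The column names arr_flights and arr_del15 are assumptions and should be checked against the header of the file you actually downloaded. The helper ontime(df) here is a hypothetical stand-in written for illustration, not the function from the working code. A tiny inline sample replaces the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Assumed column names; verify them against the header of the downloaded file.
FLIGHTS_COL = "arr_flights"  # flights that landed (assumption)
DELAYED_COL = "arr_del15"    # flights that were delayed (assumption)


def load_delays(source):
    """Read the CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


def ontime(df):
    """Hypothetical helper: total flights, delayed flights and on-time share."""
    total = float(df[FLIGHTS_COL].sum())
    delayed = float(df[DELAYED_COL].sum())
    # Guard against an empty file so we never divide by zero.
    pct = round(100.0 * (total - delayed) / total, 2) if total else float("nan")
    return {"flights": int(total), "delayed": int(delayed), "on_time_pct": pct}


# Two made-up rows standing in for the real data (one line per airline per airport).
sample = io.StringIO(
    "carrier,airport,arr_flights,arr_del15\n"
    "AA,JFK,100,20\n"
    "DL,ATL,300,60\n"
)

df = load_delays(sample)
summary = ontime(df)
print(summary)
```

To run this against the real data, pass the path of the downloaded CSV file to load_delays instead of the inline sample.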
The first parameter of the execute call can be a simple Python command, e.g. df.head(), or a function call, e.g. ontime(df) as above. That’s it. This needs to be done only once. Now we are ready to use the component. Please note that the code pieces above only illustrate the concepts. The entire working code can be downloaded from here.
Use the component in Jupyter Notebooks
And it all boils down to one line of code
From here on, you can apply your HTML creativity to build a highly interactive application that looks and behaves like a web application, yet still draws on the power of Python and Jupyter Notebook to deliver a Data Science application. We were able to build an interactive web application in Jupyter in which a user can select options from a dropdown menu and see the results immediately, without clicking any of the IPython buttons such as “Run”. Like other web applications that react to changes in a text input, we could connect the entered text to the query and deliver the results instantly. Here are some snapshots from the final Notebook.
In conclusion
This blog is one more in the series of blogs related to Web Components. Even though the standards are still being finalized and implemented, the combination of Web Components with Jupyter Notebooks has tremendous potential in today’s Data Science development work.
References
- Custom Elements v1: Reusable Web Components by Eric Bidelman (https://developers.google.com/web/fundamentals/web-components/customelements)
- Airline On-Time Statistics (https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp)
- Install Python 3 (https://www.python.org), Jupyter (http://jupyter.org/), Pandas (http://pandas.pydata.org/) and Matplotlib (https://matplotlib.org/).