Public data available from USAfacts.org is a great resource for raw data. While that website offers some graphics content, the raw-data availability, coupled with USCensus.org, invites more in-depth analysis. This short project shows a few examples of what is possible with analytics.
A table of the most recent unemployment rates by state is low-hanging fruit, and it reveals the current wide gap in unemployment across the USA. Midwest states have the lowest rates, around 3% unemployment, while the west coast and Hawaii are above 9%. The gap is somewhat surprising, but it becomes readily apparent in a sorted view of the data, as shown.
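The sorted view described above can be sketched with pandas. The state names and rates below are placeholder values chosen only to mirror the pattern in the text (Midwest near 3%, Hawaii above 9%); they are not the actual USAfacts figures.

```python
import pandas as pd

# Placeholder unemployment rates (%) -- illustrative values only,
# not the actual USAfacts data described in the text.
rates = {
    "Nebraska": 3.0,
    "South Dakota": 3.1,
    "Iowa": 3.6,
    "California": 9.0,
    "Hawaii": 9.3,
}

# Build a one-column table and sort descending so the highest-rate
# states appear at the top, as in the sorted view described above.
df = pd.DataFrame.from_dict(rates, orient="index", columns=["unemployment_pct"])
df = df.sort_values("unemployment_pct", ascending=False)
print(df)
```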
A second analysis of unemployment claims by state, collected over a one-year period starting in January 2020, reveals a peak in claims by mid-2020 with a consistently high level of claims on the west coast of the US.
A third analysis brings in a potential correlation between state population and the number of unemployment claims. While states with the lowest populations do have significantly fewer claims than the more populous ones, there is evidence of order-of-magnitude swings for states of every population size over the course of the year. Initial unemployment claims clearly peaked during the summer of 2020 (green curve in the plot above). By January 2021, initial unemployment claims across the country had dropped to the lowest levels recorded over the one-year period. Thankfully, at least based on this data, things appear to be getting better.
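The peak and trough described above can be located programmatically. The monthly claim totals below are made-up numbers shaped only to follow the trend in the text (a mid-2020 peak, lowest in January 2021); they are not real claims data.

```python
# Hypothetical monthly initial-claims totals for one state, shaped to
# mirror the trend described in the text (illustrative values only).
monthly_claims = {
    "2020-01": 12_000, "2020-03": 95_000, "2020-06": 240_000,
    "2020-09": 130_000, "2020-12": 40_000, "2021-01": 9_000,
}

# Identify the peak and trough months of the series.
peak_month = max(monthly_claims, key=monthly_claims.get)
low_month = min(monthly_claims, key=monthly_claims.get)
print(peak_month, low_month)
```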
A company only remains in business because of one reason: Customers. To know your customers is to know your business inside and out. Digging into your company’s transactional data and learning about customer behavior is essential to understand where they were, where they are now, and to anticipate where your best business growth opportunities lie.
To properly determine customer segments from transactional data, the customers (by name), categories of products, and sales volume for every distinct combination are analyzed with two algorithms: K-means clustering and UMAP dimensionality reduction. This approach allows multidimensional categories of product types, product lines, product feature sets, etc. to be included and displayed as a two-dimensional map.
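A minimal sketch of that pipeline, assuming a customer-by-category sales table. The customer data here is synthetic, and while the text's approach uses UMAP (available in the umap-learn package) for the two-dimensional map, PCA is substituted below only so the sketch runs with scikit-learn alone.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic transactional summary: rows are customers, columns are
# product categories, values are sales volume (illustrative only).
sales = pd.DataFrame(
    rng.gamma(2.0, 100.0, size=(50, 6)),
    index=[f"customer_{i}" for i in range(50)],
    columns=[f"category_{j}" for j in range(6)],
)

# Standardize so high-volume categories don't dominate the distances.
X = StandardScaler().fit_transform(sales)

# K-means assigns each customer to a segment.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Reduce to two dimensions for the cluster map (the text uses UMAP;
# PCA is a stand-in here to keep the sketch dependency-light).
coords = PCA(n_components=2, random_state=0).fit_transform(X)
print(coords.shape, len(set(labels)))
```

Plotting `coords` colored by `labels` yields the kind of two-dimensional segment map the text describes.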
The mapping of customer clusters reveals similarities within each group and also what differentiates groups (segments) of customer purchasing habits. This sort of data-driven approach allows senior management to develop a strategy for future business development.
This analysis characterized a batch process for production of inkjet ink to meet the pH specification for various ink products. Since the nominal pH targets for different inks made by the same process are not the same, the deviation from target pH was analyzed. The results show a manufacturing process that is stable, and thus in a state of statistical control.
The capability of the process, with Cpk = 1.42, indicates that the specification limits of +/- 1.0 pH unit are achieved with a very remote chance of producing non-conforming product. Furthermore, as the process is fairly well centered within the specification limits, there is ample room for the process to drift up or down while still producing a quality product.
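The Cpk arithmetic can be sketched directly. The mean and standard deviation below are assumed values chosen only to reproduce a capability near the reported 1.42; they are not the actual batch records.

```python
def cpk(mean, std, lsl, usl):
    """Process capability index: distance from the mean to the nearer
    specification limit, in units of three standard deviations."""
    return min(usl - mean, mean - lsl) / (3.0 * std)

# Deviation-from-target pH with +/- 1.0 pH unit specification limits.
# Mean and std are illustrative values, not the actual batch data.
value = cpk(mean=0.02, std=0.23, lsl=-1.0, usl=1.0)
print(round(value, 2))  # prints 1.42
```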
For any laboratory with instruments that measure properties of materials, maintaining proper calibration is important to ensure valid results. Knowing that an instrument is stable over time requires calibration records, which many laboratories keep, yet few analyze. This project applies statistical process control methods to determine the stability of an instrument.
Calibration data (collected daily) for an instrument that measures surface tension (using the Du Noüy ring method) of ethanol were analyzed over time to determine the stability of the measuring device. Five daily measurements were taken each week over the course of 20 weeks, for a total of 100 individual observations. The accepted value for the surface tension of ethanol is 22.39 dyne/cm. The overall average of the data reported is 22.38 dyne/cm, and thus the accuracy of the measuring device is confirmed to be within 0.1 dyne/cm. The x-bar and R charts demonstrate the variation of the measurements over time, and the x-bar chart reveals two data points that fall outside of the statistical control limits, suggesting that there is assignable cause in the measurement in addition to natural random variation. Further investigative work will be necessary to determine the root cause(s) of the excessive variation. Even so, the measurement process as-is can reliably yield surface tension measurements for ethanol to within the existing +/- 0.5 dyne/cm specification limits with the instrument under test.
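The control limits behind those charts can be computed with the standard x-bar/R constants for subgroups of five (A2 = 0.577, D3 = 0, D4 = 2.114). The weekly readings below are synthetic stand-ins for the actual calibration log.

```python
import numpy as np

# Synthetic calibration log: 20 weekly subgroups of 5 daily surface
# tension readings (dyne/cm), centered near ethanol's accepted 22.39.
rng = np.random.default_rng(1)
subgroups = 22.39 + rng.normal(0.0, 0.1, size=(20, 5))

xbars = subgroups.mean(axis=1)                           # subgroup means
ranges = subgroups.max(axis=1) - subgroups.min(axis=1)   # subgroup ranges
xbar_bar, r_bar = xbars.mean(), ranges.mean()

# Shewhart constants for subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

ucl_x, lcl_x = xbar_bar + A2 * r_bar, xbar_bar - A2 * r_bar  # x-bar chart limits
ucl_r, lcl_r = D4 * r_bar, D3 * r_bar                        # R chart limits

# Points outside the x-bar limits suggest assignable-cause variation.
out_of_control = int(((xbars > ucl_x) | (xbars < lcl_x)).sum())
print(round(ucl_x, 2), round(lcl_x, 2), out_of_control)
```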
This is an example of a slightly more involved JSON object. This chunk of data, describing the weather to expect on several future dates, is “structured data” that is passed from a source (like the National Weather Service) to recipients like you and me when we want to check the weather forecast (for more than just one day) from our cell phones.
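A sketch of what such a multi-day object might look like; every field name here is an assumption for illustration, not the actual National Weather Service schema.

```json
{
  "latitude": 40.71,
  "longitude": -74.01,
  "forecast": [
    {
      "date": "2021-05-01",
      "conditions": "sunny",
      "temp_max_c": 21,
      "temp_min_c": 11,
      "wind_speed_kph": 14,
      "dangerous": false
    },
    {
      "date": "2021-05-02",
      "conditions": "raining",
      "temp_max_c": 17,
      "temp_min_c": 9,
      "wind_speed_kph": 30,
      "dangerous": false
    }
  ]
}
```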
This object specifies a particular location by longitude and latitude coordinates, and the range of days for the forecast is an array (or list) of dates, with a forecast for each. Here’s the corresponding documentation:
A new endeavor of mine is writing API documentation. This may sound daunting, but after a short introductory online course I find it only moderately challenging. First, what is an API? This is an acronym for “Application Programming Interface,” and what an API does is exchange data between a device (like a cell phone or a laptop) and a server. For example, we all have apps on our smartphones these days, and the display of today’s weather forecast may well be handled by an API; it might look something like this:
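A sketch of such a payload; the field names are invented for the example and are not the actual National Weather Service schema.

```json
{
  "date": "2021-05-01",
  "conditions": "partly cloudy",
  "temp_max_c": 19,
  "temp_min_c": 10,
  "wind_speed_kph": 18,
  "dangerous": false
}
```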
This is an example of a JSON object, where JSON stands for JavaScript Object Notation (the other common structured-data format is XML, an acronym for eXtensible Markup Language). This chunk of data, describing what to expect of tomorrow’s weather, is “structured data” that is passed from a source (like the National Weather Service) to recipients like you and me when we want to check the weather forecast from our cell phones.
The documentation part is what is in high demand these days. Documentation is needed so that software developers who want to create useful apps can find and implement the thousands upon thousands of existing APIs, both in public places (like free software projects) and in private ones (like a corporate private network).
So here’s how I would document the example JSON API above:
- The date of the forecast; format is YYYY-MM-DD.
- The type of weather expected; may only have these values: “sunny”, “overcast”, “partly cloudy”, “raining”, and “snowing”.
- Maximum temperature on the forecast date, in degrees Celsius.
- Minimum temperature on the forecast date, in degrees Celsius.
- Expected wind speed on the forecast date, in kilometers per hour (kph).
- Flag indicating whether conditions are expected to be dangerous: true if conditions are expected to be dangerous, otherwise false.