Public data from USAfacts.org is a great raw-data resource. The website offers some graphics, but the availability of the raw data, coupled with data from USCensus.org, invites more in-depth analysis. This short project shows a few examples of what is possible with analytics.
A table of the most recent unemployment rates by state is low-hanging fruit, and it shows the current wide gap in unemployment across the USA. Midwest states have the lowest rates, around 3% unemployment, while the west coast and Hawaii are above 9%. The gap is somewhat surprising, but it becomes readily apparent in a sorted view of the data, as shown.
A second analysis of unemployment claims by state, collected over a one-year period starting in January 2020, reveals a peak in claims by mid-2020 with a consistently high level of claims on the west coast of the US.
A third analysis looks for a potential correlation between state population and the number of unemployment claims. While states with the lowest populations do have significantly fewer claims than the more populous ones, claims in states of every population size varied by roughly an order of magnitude over the course of the year. Initial unemployment claims clearly peaked during the summer of 2020 (green curve in the plot above). By January 2021, initial unemployment claims across the country had dropped to the lowest levels recorded over the one-year period. Thankfully, at least based on this data, things appear to be getting better.
A company remains in business for one reason: customers. To know your customers is to know your business inside and out. Digging into your company’s transactional data and learning about customer behavior is essential to understanding where your customers were, where they are now, and where your best business growth opportunities lie.
To determine customer segments from transactional data, the customers (by name), the product categories, and the sales volume for every distinct customer-category combination are analyzed with two algorithms: K-means clustering and UMAP dimensionality reduction. This approach allows multidimensional categories such as product types, product lines, and product feature sets to be included and displayed as a two-dimensional map.
The map of customer clusters reveals the similarities within each group and what differentiates the groups (segments) in their purchasing habits. This sort of data-driven approach allows senior management to develop strategy for future business development.
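As a rough illustration of the clustering step, the sketch below builds a customer-by-category matrix of spend shares and groups the rows with a minimal K-means written in plain Python. The transaction records, the customer names, and the helper names spend_matrix and kmeans are all hypothetical, and the seeding from the first k rows is a simplification chosen for repeatability, not necessarily what the original analysis did. The UMAP projection to two dimensions is not shown; it would need a dedicated library such as umap-learn.

```python
import math

def spend_matrix(transactions):
    """Turn (customer, category, amount) records into rows of spend shares."""
    customers = sorted({c for c, _, _ in transactions})
    categories = sorted({cat for _, cat, _ in transactions})
    totals = {c: {cat: 0.0 for cat in categories} for c in customers}
    for c, cat, amount in transactions:
        totals[c][cat] += amount
    rows = []
    for c in customers:
        spend = sum(totals[c].values())
        rows.append([totals[c][cat] / spend if spend else 0.0 for cat in categories])
    return customers, categories, rows

def kmeans(points, k, iterations=50):
    """Minimal K-means; seeds centroids with the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical transactions, for illustration only.
transactions = [
    ("acme", "ink", 90.0), ("acme", "paper", 10.0),
    ("bolt", "ink", 5.0), ("bolt", "paper", 95.0),
    ("core", "ink", 80.0), ("core", "paper", 20.0),
]
customers, categories, rows = spend_matrix(transactions)
labels, _ = kmeans(rows, k=2)
for name, label in zip(customers, labels):
    print(name, "-> segment", label)
```

Using spend shares rather than raw sales keeps large and small customers with the same product mix in the same segment; clustering on raw volume is an equally reasonable choice when account size itself should drive the segments.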
This analysis characterized a batch process for the production of inkjet ink against the pH specification for various ink products. Because the nominal pH targets differ among the inks, even though they are manufactured by the same process, the deviation from target pH was analyzed. The results show a manufacturing process that is stable, and thus in a state of statistical control.
The process capability, Cpk = 1.42, indicates that the specification of +/- 1.0 pH unit is met with only a very remote chance of producing non-conforming product. Furthermore, as the process is fairly well centered within the specification limits, there is ample room for the process to drift up or down while still producing a quality product.
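For readers unfamiliar with the index, Cpk is the distance from the process mean to the nearer specification limit, divided by three standard deviations. The sketch below computes it for deviations from target pH with limits of -1.0 and +1.0. The data values are hypothetical, and the use of the overall sample standard deviation is an assumption; the original analysis may have used a within-subgroup estimate of sigma instead.

```python
from statistics import mean, stdev

def cpk(values, lsl, usl):
    """Cpk = min(USL - mean, mean - LSL) / (3 * sigma).

    Sigma here is the overall sample standard deviation (an assumption).
    """
    mu = mean(values)
    sigma = stdev(values)
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Hypothetical deviations from target pH, for illustration only.
deviations = [0.10, -0.20, 0.05, 0.30, -0.15, 0.00, 0.20, -0.25, 0.15, -0.10]
print(round(cpk(deviations, lsl=-1.0, usl=1.0), 2))
```

Working with the deviation from target, rather than raw pH, is what lets inks with different nominal targets share one set of limits.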
For any laboratory with instruments that measure material properties, maintaining proper calibration is important to ensure valid results. Knowing that an instrument is stable over time requires calibration records; many laboratories keep them, yet few analyze the log. This project applies statistical process control methods to determine the stability of an instrument.
Calibration data (collected daily) for an instrument that measures the surface tension of ethanol (using the Du Noüy ring method) were analyzed over time to determine the stability of the measuring device. Five daily measurements were taken each week over the course of 20 weeks, for a total of 100 individual observations. The accepted value for the surface tension of ethanol is 22.39 dyne/cm. The overall average of the reported data is 22.38 dyne/cm, and thus the accuracy of the measuring device is confirmed to be within 0.1 dyne/cm. The x-bar and R charts show the variation of the measurements over time, and the x-bar chart reveals two data points that fall outside the statistical control limits, suggesting that an assignable cause is present in addition to natural random variation. Further investigative work will be necessary to determine the root cause(s) of the excessive variation. As it stands, the measurement process can reliably yield surface tension measurements in the range of ethanol to within the existing +/- 0.5 dyne/cm specification limits with the instrument under test.
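The control limits behind x-bar and R charts can be sketched in a few lines. The example below assumes each week's five measurements form one subgroup, which the description suggests but does not state outright, and uses the standard chart constants for subgroups of five (A2 = 0.577, D3 = 0, D4 = 2.114). The sample readings and the function names are hypothetical.

```python
from statistics import mean

# Standard control chart constants for a subgroup size of 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return subgroup means and the x-bar and R chart center lines and limits."""
    xbars = [mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = mean(xbars)
    rbar = mean(ranges)
    return {
        "xbars": xbars,
        "xbar_center": grand_mean,
        "xbar_lcl": grand_mean - A2 * rbar,
        "xbar_ucl": grand_mean + A2 * rbar,
        "r_center": rbar,
        "r_lcl": D3 * rbar,
        "r_ucl": D4 * rbar,
    }

def out_of_control(subgroups):
    """Indices of subgroups whose mean falls outside the x-bar limits."""
    lim = xbar_r_limits(subgroups)
    return [i for i, x in enumerate(lim["xbars"])
            if x < lim["xbar_lcl"] or x > lim["xbar_ucl"]]

# Hypothetical weekly subgroups of surface tension readings (dyne/cm).
weeks = [
    [22.35, 22.40, 22.38, 22.41, 22.36],
    [22.39, 22.37, 22.42, 22.36, 22.40],
    [22.38, 22.41, 22.35, 22.39, 22.37],
    [22.40, 22.36, 22.39, 22.38, 22.41],
]
limits = xbar_r_limits(weeks)
print("x-bar limits:", round(limits["xbar_lcl"], 3), round(limits["xbar_ucl"], 3))
print("flagged weeks:", out_of_control(weeks))
```

Note that control limits come from the process's own variation, while the +/- 0.5 dyne/cm specification limits come from the requirement, so a point can be out of control and still well within specification.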