Data Science Hierarchy of Needs for Beginners

Understanding the Data Science Hierarchy of Needs

You may have heard the saying, “Data is the new gold.” But in fact, it’s the data insights that are worth their weight in gold. You’ll know the difference after reading this.


Data science is the process of collecting and analyzing large amounts of data to uncover trends and relationships. These relationships can then be used to build models and solutions that provide insights, drive decision-making, and inform business strategies.


Data science can be broken down into a “Hierarchy of Needs” to better understand how data is processed and the corresponding value it achieves.


Level 1: Data Collection - Financial Filings

At the base of the data science “Hierarchy of Needs” is Data Collection. This step involves gathering the necessary raw data from sources like databases, the internet, or existing files. It is essential that the collected data is high-quality, accurate, and current.


What is Raw Data?

Data is the fuel for machine learning models. It’s considered "raw data" when it hasn't been processed yet. It's just bulk information that needs to be organized into something meaningful for analysis purposes.


AnalytixInsight collects financial statement data from 50,000 publicly traded companies globally as its source of raw data.


Level 2: Data Cleaning & Storage - Financial Filters

This stage involves preparing the data so that patterns and correlations can be detected in it, using methods like data cleaning, encoding, and normalizing. This step is where raw data starts to become meaningful information that can be used to answer questions.


What is Data Cleaning?

Data cleaning is the critical process of detecting and removing errors and inconsistencies in data. It involves using algorithms to detect errors such as missing values, outliers, duplicate records and erroneous values. The goal is to improve the quality of your dataset so that it's ready for analysis.
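
To make this concrete, here is a minimal sketch of three common cleaning steps: removing duplicate records, dropping rows with missing values, and filtering out extreme outliers. The company name, the revenue figures, and the outlier threshold (`max_ratio`) are all made up for illustration; this is not AnalytixInsight's actual pipeline.

```python
from statistics import median

# Hypothetical raw records: (company, quarter, revenue in millions).
raw_rows = [
    ("ACME", "2023Q1", 120.0),
    ("ACME", "2023Q1", 120.0),    # exact duplicate
    ("ACME", "2023Q2", None),     # missing value
    ("ACME", "2023Q3", 125.0),
    ("ACME", "2023Q4", 118.0),
    ("ACME", "2024Q1", 12200.0),  # likely a unit or data-entry error
    ("ACME", "2024Q2", 123.0),
]


def clean(rows, max_ratio=5.0):
    """Drop duplicates, rows with missing revenue, and values far from the median."""
    # 1. Remove exact duplicates while preserving the original order.
    unique = list(dict.fromkeys(rows))
    # 2. Remove rows where the revenue figure is missing.
    complete = [row for row in unique if row[2] is not None]
    # 3. Keep only values within a factor of max_ratio of the median.
    #    The threshold is an assumption; real pipelines tune this per field.
    mid = median(row[2] for row in complete)
    return [row for row in complete if mid / max_ratio <= row[2] <= mid * max_ratio]


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

In practice a suspicious value is often sent for review or corrected rather than silently dropped, but the idea is the same: only trusted rows move on to analysis.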


What is Structured Data?

Structured data is data that is organized and formatted in a way that makes it easy to query and manipulate. It’s considered structured if the data can be represented as a table, spreadsheet or other tabular format.


When processing raw financial data, AnalytixInsight extracts more than 140 data points per financial filing. The data is then standardized and filters are applied to remove incorrect, duplicate, or corrupted data. This ensures we’re using quality data for subsequent analysis and modeling.


Level 3: Descriptive Analytics - Financial Analysis

The next level in the Hierarchy of Needs is descriptive analytics, which helps when trying to understand a problem or make decisions. It focuses on summarizing what has happened in the past and turning that summary into actionable insights that inform current decisions.


For example, AnalytixInsight’s data analytics platform analyzes the financial data of a company and generates insightful descriptions from this analysis, such as: “At the current level of operating expenses per quarter, this is sufficient cash for approximately 4.87 quarters”.
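
A statement like this comes down to simple arithmetic: cash on hand divided by operating expenses per quarter. The sketch below assumes hypothetical figures (48.7 million in cash, 10 million in quarterly operating expenses) chosen only to reproduce the 4.87 in the quote; the function name is also illustrative.

```python
def cash_runway_quarters(cash, operating_expenses_per_quarter):
    """Return how many quarters the cash balance covers at the current spend."""
    if operating_expenses_per_quarter <= 0:
        raise ValueError("operating expenses must be positive")
    return cash / operating_expenses_per_quarter


# Hypothetical figures in millions, not taken from any real filing.
cash = 48.7
opex = 10.0

runway = cash_runway_quarters(cash, opex)
sentence = (
    "At the current level of operating expenses per quarter, "
    f"this is sufficient cash for approximately {runway:.2f} quarters."
)
print(sentence)
```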


It’s important to remember that descriptive analytics provides only a partial picture of the bigger analytical landscape. Adding other approaches, such as prescriptive analytics, can yield even greater insight into related behavior and future outcomes.


Ultimately, descriptive analytics provides an essential foundation for predictive analytics, which takes this data one step further by using pattern recognition to forecast probable outcomes associated with certain behaviors or conditions.


Level 4: Diagnostic Analytics - Financial Insights

Diagnostic analytics plays a crucial role in understanding the underlying cause-and-effect relationships within datasets. It's used to determine which factors matter most in a dataset, helping us uncover hidden patterns, anomalies, and trends and drill down to identify root causes.
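
One simple form of drill-down is to ask which line item drove a change in net income between two periods. The sketch below uses invented income statement figures and a hypothetical `rank_drivers` helper; it illustrates the idea only and is not the method used by any particular platform.

```python
# Hypothetical income statement line items (in millions) for two periods.
# Items that add to net income are positive; expenses are stored as negatives.
previous = {
    "revenue": 500.0,
    "cost_of_goods": -300.0,
    "operating_expenses": -120.0,
    "interest": -10.0,
}
current = {
    "revenue": 510.0,
    "cost_of_goods": -345.0,
    "operating_expenses": -125.0,
    "interest": -10.0,
}


def rank_drivers(before, after):
    """Rank line items by the size of their contribution to the change in net income."""
    changes = {name: after[name] - before[name] for name in before}
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)


net_change = sum(current.values()) - sum(previous.values())
drivers = rank_drivers(previous, current)

print(f"Net income changed by {net_change:+.1f}")
for name, change in drivers:
    print(f"  {name}: {change:+.1f}")
```

With these numbers, revenue grew slightly, yet net income fell because cost of goods rose far more. The ranking puts that root cause at the top of the list.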


Diagnostic analytics also reveals outliers, or discrepancies in data points, providing greater overall transparency into the health of a company's finances. For example, AnalytixInsight’s analysis can diagnose a company’s earnings to uncover insights such as: “relatively strong net income margin for the last twelve months combined with relatively low accruals suggest possible aggressive accounting and an overstatement of its reported net income”.


When diagnostic analytics is coupled with powerful tools like machine learning algorithms and natural language generation, meaningful insights can be extracted from vast amounts of seemingly unrelated data points and turned into machine-generated research reports.


Level 5: Predictive Modeling - Corporate Actions

Predictive modeling draws on a variety of techniques, such as machine learning algorithms, artificial intelligence, statistics, and natural language processing.


With these tools, we are able to build models that become more robust over time as new data points are added. This serves as a powerful tool that allows us to unlock insights from past information to better inform future decisions and help predict the likelihood of future events.
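
As a toy illustration of predicting the likelihood of a future event, the sketch below fits a one-feature logistic regression with plain gradient descent. The training data is invented (cash as a fraction of market capitalization, paired with whether a buyback followed), and the feature choice is an assumption made for the example, not a description of AnalytixInsight's models.

```python
import math

# Hypothetical training data: (cash as a fraction of market cap, buyback followed?).
samples = [
    (0.02, 0), (0.05, 0), (0.08, 0), (0.12, 0),
    (0.15, 1), (0.20, 1), (0.25, 1), (0.30, 1),
]


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, learning_rate=0.5, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradients of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_logistic(samples)


def predict(x):
    """Estimated probability of the event for feature value x."""
    return sigmoid(w * x + b)


print(f"cash ratio 0.02 -> {predict(0.02):.2f}")
print(f"cash ratio 0.30 -> {predict(0.30):.2f}")
```

Refitting the model whenever new labeled examples arrive is one simple way a model can improve as data points are added.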


For example, as we explored in “Using AI to Predict META’s $40 Billion share buyback,” AnalytixInsight analyzes whether the financial conditions are present that would favor a company’s board of directors approving corporate actions such as buying back shares, cutting or initiating a dividend, or acquiring or being acquired within the peer group.


Level 6: Prescriptive Optimization - Robo-Advisor

At the highest level of the Hierarchy of Needs, prescriptive optimization is an advanced area of data science that has the potential to revolutionize decision-making. It combines predictive analytics, machine learning, and AI to generate insight-driven outcomes. It's a step above predictive analytics because it recommends decisions based on both what you know and what you want to happen.


Prescriptive optimization focuses on determining what action should be taken in order to achieve the best outcome possible. By analyzing large amounts of data and using sophisticated algorithms, it can provide insights into what action should be taken at different stages of the process in order to maximize a desired result.
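
A minimal sketch of this idea: given a risk limit, search over possible stock/bond splits and recommend the one with the highest expected return that stays within the limit. The return, volatility, and correlation figures are hypothetical placeholders rather than market data, and the grid search is the simplest possible optimizer, used here only for illustration.

```python
# Hypothetical inputs: expected annual return and volatility per asset class,
# plus the correlation between them. None of these come from real market data.
STOCK_RETURN, STOCK_VOL = 0.08, 0.18
BOND_RETURN, BOND_VOL = 0.03, 0.05
CORRELATION = 0.2


def portfolio_stats(stock_weight):
    """Return (expected return, volatility) for a two-asset portfolio."""
    bond_weight = 1.0 - stock_weight
    expected = stock_weight * STOCK_RETURN + bond_weight * BOND_RETURN
    variance = (
        (stock_weight * STOCK_VOL) ** 2
        + (bond_weight * BOND_VOL) ** 2
        + 2 * stock_weight * bond_weight * STOCK_VOL * BOND_VOL * CORRELATION
    )
    return expected, variance ** 0.5


def recommend_allocation(max_volatility, steps=100):
    """Pick the stock weight with the highest expected return within the risk limit.

    Returns (stock_weight, expected_return, volatility), or None if no
    allocation on the grid satisfies the limit.
    """
    best = None
    for i in range(steps + 1):
        weight = i / steps
        expected, vol = portfolio_stats(weight)
        if vol <= max_volatility and (best is None or expected > best[1]):
            best = (weight, expected, vol)
    return best


cautious = recommend_allocation(max_volatility=0.08)
bold = recommend_allocation(max_volatility=0.15)
print("cautious investor:", cautious)
print("bold investor:", bold)
```

The key difference from predictive modeling is the output: not a forecast, but a recommended action that depends on the investor's stated goal and constraint.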


A robo-advisor is an example of prescriptive optimization. By analyzing the universe of securities, an advanced robo-advisor tool can buy or sell securities, provide real-time alerts and recommendations based on market conditions, and balance a portfolio based on an investor’s Know-Your-Client risk tolerance and investment goals (more on this in a future blog post!).


Machine Learning and Artificial Intelligence (AI)

Machine learning is a subset of AI. Data science overlaps with both rather than containing them: it is a broad term that covers everything from simple descriptive analysis through complex predictive modeling, and it draws on machine learning and AI techniques at the higher levels of the hierarchy.


Conclusion

The data science Hierarchy of Needs is a great visualization for those who wish to understand the complexity of data science and stay up-to-date on this ever-evolving field. This hierarchy provides a structure to help us better understand what data science is, how it works, and which technologies are used at each stage in order to reach desired outcomes.
