Data exploration is the process of describing the main characteristics of a data set, and it is often the first step in data analysis. During this phase, analysts familiarize themselves with the data while deciding which questions it can answer, which hypotheses it is likely to support, and which variables have meaningful relationships.
Data Visualization
The most common way analysts explore data is with data visualization software. Quick access to the data, combined with the ability to view relationships graphically, lets users see which variables appear to be related and deserve further analysis. For example, a user can create a scatter plot of two variables in a BI tool, swap one of the variables for another, and repeat this process until a correlation shows up in the plot. They may then investigate that relationship further to determine whether it yields information that could benefit their company.
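The swap-and-look workflow described above is interactive, but the same search can be sketched in code to decide which pairs are worth plotting first. The Python sketch below is only an illustration: it assumes the data is held as a dict mapping column names to equal-length numeric lists, and the helper names `pearson` and `rank_pairs` as well as the columns `ad_spend`, `sales`, and `store_age` are hypothetical. It computes the Pearson correlation coefficient for every pair of columns and ranks the pairs by the strength of the relationship.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Pearson r is undefined for a constant column; treated as 0.0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_pairs(columns):
    """Return (name_a, name_b, r) for every column pair, strongest |r| first."""
    results = []
    for a, b in combinations(columns, 2):
        results.append((a, b, pearson(columns[a], columns[b])))
    return sorted(results, key=lambda item: abs(item[2]), reverse=True)


# Hypothetical example data: three numeric columns of equal length.
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "sales": [12, 24, 33, 41, 55],
    "store_age": [5, 3, 8, 1, 7],
}

for a, b, r in rank_pairs(data):
    print(f"{a} vs {b}: r = {r:.2f}")
```

Sorting by the absolute value of r surfaces strong negative relationships as well as positive ones. Because Pearson r measures only linear association, a weak coefficient does not rule out a curved relationship, which is one reason the visual check in a scatter plot remains useful.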
Steps of Data Exploration
There are two main steps in performing data exploration: univariate analysis and bivariate analysis.
- Univariate Analysis: This type of analysis examines each variable individually and determines whether it is categorical or numerical. Categorical variables take one of two or more distinct categories, such as gender. Numerical variables take numeric values within a range, such as height or weight. Knowing each variable's type matters because it determines which methods are appropriate for examining its relationships with other variables.
- Bivariate Analysis: This analysis examines the relationship between two variables. Depending on the variable types identified during univariate analysis, the relationship can be numerical-numerical, categorical-categorical, or numerical-categorical.