Statistical Inquiry into Employee Data

Problem: Develop a regression model to predict an employee’s monthly income and answer related statistical questions about an HR dataset.

Minitab • Excel • Tests of Hypothesis • Regression Model • Statistical Analysis

May 2023

Solo Project

Preparing Data

I used the popular 'IBM HR Analytics' dataset for this study and retrieved it from this link. I used pandas for data cleaning and preparation. I kept only the variables relevant to the project (listed under Descriptive Statistics) and dropped the rest. I then drew a random sample of 200 rows and used it for the calculations so that the data would work in Minitab.
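The preparation step can be sketched in pandas roughly as follows. This is a minimal sketch, not the exact script I ran: the column list is taken from the variables in this write-up, while the function name, the seed, and the file names in the usage comment are assumptions made for illustration.

```python
import pandas as pd

# Variables used in this write-up; every other column is dropped.
KEEP_COLUMNS = [
    "Age", "MonthlyIncome", "DistanceFromHome", "PercentSalaryHike",
    "YearsAtCompany", "BusinessTravel", "Department", "JobSatisfaction",
    "Gender",
]

def prepare_sample(df: pd.DataFrame, n: int = 200, seed: int = 42) -> pd.DataFrame:
    """Keep the study columns and draw a reproducible random sample of n rows."""
    subset = df[KEEP_COLUMNS]
    # random_state fixes the draw so the same 200 rows come back on every run
    return subset.sample(n=n, random_state=seed).reset_index(drop=True)

# Hypothetical usage (both file names are assumptions):
# raw = pd.read_csv("ibm_hr_analytics.csv")
# prepare_sample(raw).to_csv("hr_sample_200.csv", index=False)
```

Fixing the seed matters here because every figure in the rest of the study depends on which 200 rows were drawn.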

Descriptive Statistics

Breakdown of Continuous Variables:

Age

MonthlyIncome

DistanceFromHome

PercentSalaryHike

YearsAtCompany

Categorical Variables:

BusinessTravel

Department

JobSatisfaction

Binary Variable:

Gender: There are 118 (59%) male employees and 82 (41%) female employees in our sample

Answering Managerial Questions

Q1. The company has had a benchmark mean monthly income of $6,000. Does the sample mean MonthlyIncome meet the benchmark?
The sample mean MonthlyIncome did not exceed $6,000. However, at $5,998 it was very close to the benchmark.

Q2. Is there a difference in the proportion of male and female employees?
The sample proportions of male and female employees are not equal: 59% of employees were male and 41% were female. This may indicate a potential gender imbalance in the company's workforce.
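As a rough cross-check of Q2, the one-proportion test can be reproduced by hand from the sample counts (118 males out of 200). The sketch below assumes a two-sided test of H0: p = 0.5 with the normal approximation; the function name is my own, and Minitab may use an exact method instead, so its p-value can differ slightly.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float = 0.5):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    # The standard error is computed under the null hypothesis
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(118, 200)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these counts the approximation gives z of roughly 2.55 and a p-value of roughly 0.011, so the 59/41 split is unlikely to be sampling noise around an even split.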

Q3. Is there a significant pay gap between male and female employees?
There is no significant difference between the mean income of male employees and that of female employees.

Q4. It is generally found that jobs that require travel lead to more satisfied workers. Is there any significant relationship between frequency of travel due to business reasons and job satisfaction?
No significant relationship was found between BusinessTravel and JobSatisfaction.

Q5. Is there any significant relationship between gender and job satisfaction at GNM?
No significant relationship was found between Gender and JobSatisfaction.
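Q4 and Q5 both rest on a test of independence between two categorical variables. Assuming this is the usual chi-square test, the statistic can be sketched as below. The function name and the 2x2 counts are made up purely for illustration and are not taken from the sample.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical counts, NOT the study data
toy_table = [[10, 20], [20, 10]]
statistic, dof = chi_square_independence(toy_table)
print(f"chi-square = {statistic:.3f}, dof = {dof}")
```

The statistic is then compared against a chi-square distribution with the returned degrees of freedom; a small statistic, as in Q4 and Q5, gives a large p-value and no evidence of a relationship.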

Q6. With an increase in age, is there an increase or decrease in MonthlyIncome?
We found that 22.11% of the variability in MonthlyIncome can be explained by the variability in Age. For each additional year of age, MonthlyIncome increases by $235 on average.
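The slope and R-sq reported for Q6 come from an ordinary least-squares fit of MonthlyIncome on Age. A minimal sketch of that calculation follows; the function name and the toy data are assumptions for illustration, not the study data or Minitab's implementation.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor, R-squared equals the squared correlation
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical ages and incomes, NOT the study data
ages = [25, 30, 35, 40, 45]
incomes = [3000, 4500, 5200, 7000, 6800]
slope, intercept, r_squared = simple_linear_regression(ages, incomes)
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}, R-sq = {r_squared:.2%}")
```

The slope is read as the average change in income per additional year of age, and R-sq as the share of income variability that the fitted line accounts for.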

Q7. Develop the best model to predict MonthlyIncome using the continuous variables.
We found that the most parsimonious model to predict MonthlyIncome uses two variables: Age and YearsAtCompany. About 27.76% of the variability in MonthlyIncome can be explained by these two variables.

Statistical Output

I performed a test of hypothesis for one mean for Q1, a test of hypothesis for one proportion for Q2, a test of hypothesis for the difference between two means for Q3, tests of independence for Q4 and Q5, simple linear regression for Q6, and multiple regression for Q7.
For the final regression model to predict MonthlyIncome, I used the stepwise regression method to find the best model.

DistanceFromHome is not a significant predictor of MonthlyIncome and is not included in the model.
In Step 3, adding PercentSalaryHike raises R-sq to 28.73%, only about 1 percentage point more than in Step 2. PercentSalaryHike also has a p-value of 0.105, which is not statistically significant at the conventional 0.05 level.
We therefore do not include PercentSalaryHike in the regression model.

The most parsimonious model thus uses two variables: Age and YearsAtCompany.

Regression Equation
MonthlyIncome = -3013 + 210.0 Age + 221.0 YearsAtCompany

Holding YearsAtCompany constant, a 1-year increase in Age is associated with a $210 increase in MonthlyIncome. Holding Age constant, a 1-year increase in YearsAtCompany is associated with a $221 increase in MonthlyIncome.
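To show how the equation is used, here is a small sketch that plugs values into the fitted coefficients. The function name and the example employee are hypothetical; the coefficients are the ones from the regression equation.

```python
def predict_monthly_income(age: float, years_at_company: float) -> float:
    """Point prediction from the fitted model: -3013 + 210.0 Age + 221.0 YearsAtCompany."""
    return -3013 + 210.0 * age + 221.0 * years_at_company

# Hypothetical employee: 35 years old with 5 years at the company
print(predict_monthly_income(35, 5))  # 5442.0
```

Since the model explains only about 27.76% of the variability in MonthlyIncome, such predictions are rough estimates, and they are only meaningful for ages and tenures within the range of the sample.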

VATSAL LAHOTI
