Open
Description
Add a Challenging SQL & Pandas Problem: "Employee Performance Analysis"
Description
I propose that we add a new challenging question to our repository that requires solving using both SQL and Pandas. The question involves real-world data analysis and combines multiple concepts. Below is the proposed problem statement.
Problem: Employee Performance Analysis
Context:
A company maintains a database of its employees along with their monthly performance metrics. The data is stored in two tables: Employees
and Performance
.
Table: Employees
Column Name | Type | Description |
---|---|---|
employee_id | INT | Unique identifier for each employee. |
name | VARCHAR | Employee name. |
department | VARCHAR | Department of the employee. |
Table: Performance
Column Name | Type | Description |
---|---|---|
employee_id | INT | Foreign key referencing Employees.employee_id . |
month | DATE | The month of the performance record (assume it is the first day of the month). |
sales | INT | Number of sales made by the employee in that month. |
targets_met | INT | Number of targets met by the employee in that month. |
Task:
Write a solution that reports, for each department:
- Employee Count: Total number of employees.
- Average Monthly Sales: Average monthly sales per employee (across all available months).
- Target Achievement Rate: The overall rate of target achievement calculated as:
[
\text{Target Achievement Rate} = \frac{\text{Total Targets Met}}{\text{Total Targets Assigned}}
]
(Assume each employee has a fixed target of 10 per month.)
Return the result table with the following columns:
department
, employee_count
, avg_monthly_sales
, target_achievement_rate
.
Round avg_monthly_sales
and target_achievement_rate
to two decimal places.
Requirements:
-
SQL Solution:
- Write a query that computes the required results using appropriate JOINs, GROUP BY, and aggregation functions.
- Ensure the query handles cases where performance records might be missing for some employees.
-
Pandas (Python) Solution:
- Write a function that accepts two DataFrames (one for
Employees
and one forPerformance
) and returns a DataFrame with the required results. - Use Pandas operations such as
merge
,groupby
, and aggregation functions to compute the values.
- Write a function that accepts two DataFrames (one for
Example (Simplified):
Given sample data, your output might look like:
department | employee_count | avg_monthly_sales | target_achievement_rate |
---|---|---|---|
IT | 5 | 250.75 | 0.92 |
Finance | 3 | 180.50 | 0.88 |
HR | 2 | 200.00 | 0.95 |
Additional Notes:
- The SQL solution should utilize functions like
ROUND()
to format the results. - The Pandas solution should be encapsulated in a function that takes the two DataFrames as input and returns the resulting DataFrame.
- This question combines multiple SQL concepts (JOINs, GROUP BY, aggregations, rounding) along with equivalent Pandas operations, making it a great practice problem for both SQL and Python skills.
Metadata
Metadata
Assignees
Labels
No labels