Skip to content

Add a Challenging SQL & Pandas Problem: "Employee Performance Analysis" #95

Open
@iamAntimPal

Description

@iamAntimPal

Add a Challenging SQL & Pandas Problem: "Employee Performance Analysis"

Description

I propose that we add a new challenging question to our repository that requires solving using both SQL and Pandas. The question involves real-world data analysis and combines multiple concepts. Below is the proposed problem statement.


Problem: Employee Performance Analysis

Context:
A company maintains a database of its employees along with their monthly performance metrics. The data is stored in two tables: Employees and Performance.

Table: Employees

Column Name Type Description
employee_id INT Unique identifier for each employee.
name VARCHAR Employee name.
department VARCHAR Department of the employee.

Table: Performance

Column Name Type Description
employee_id INT Foreign key referencing Employees.employee_id.
month DATE The month of the performance record (assume it is the first day of the month).
sales INT Number of sales made by the employee in that month.
targets_met INT Number of targets met by the employee in that month.

Task:
Write a solution that reports, for each department:

  • Employee Count: Total number of employees.
  • Average Monthly Sales: Average monthly sales per employee (across all available months).
  • Target Achievement Rate: The overall rate of target achievement calculated as:
    [
    \text{Target Achievement Rate} = \frac{\text{Total Targets Met}}{\text{Total Targets Assigned}}
    ]
    (Assume each employee has a fixed target of 10 per month.)

Return the result table with the following columns:
department, employee_count, avg_monthly_sales, target_achievement_rate.
Round avg_monthly_sales and target_achievement_rate to two decimal places.


Requirements:

  1. SQL Solution:

    • Write a query that computes the required results using appropriate JOINs, GROUP BY, and aggregation functions.
    • Ensure the query handles cases where performance records might be missing for some employees.
  2. Pandas (Python) Solution:

    • Write a function that accepts two DataFrames (one for Employees and one for Performance) and returns a DataFrame with the required results.
    • Use Pandas operations such as merge, groupby, and aggregation functions to compute the values.

Example (Simplified):

Given sample data, your output might look like:

department employee_count avg_monthly_sales target_achievement_rate
IT 5 250.75 0.92
Finance 3 180.50 0.88
HR 2 200.00 0.95

Additional Notes:

  • The SQL solution should utilize functions like ROUND() to format the results.
  • The Pandas solution should be encapsulated in a function that takes the two DataFrames as input and returns the resulting DataFrame.
  • This question combines multiple SQL concepts (JOINs, GROUP BY, aggregations, rounding) along with equivalent Pandas operations, making it a great practice problem for both SQL and Python skills.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions