Commit 1f652c3

Update leaderboard.md
1 parent 6b3222f commit 1f652c3

1 file changed: 19 additions, 16 deletions

docs/leaderboard.md

@@ -4,22 +4,25 @@
 
 # SciCode Leaderboard
 
-| Model | Main Problem Resolve Rate |
-|------------------------|---------------------------|
-| 🥇OpenAI o1-preview | 7.7% |
-| 🥈Claude3.5-Sonnet | 4.6% |
-| 🥉Deepseek-Coder-v2 | 3.1% |
-| GPT-4o | 1.5% |
-| GPT-4-Turbo | 1.5% |
-| OpenAI o1-mini | 1.5% |
-| Gemini 1.5 Pro | 1.5% |
-| Claude3-Opus | 1.5% |
-| Claude3-Sonnet | 1.5% |
-| Qwen2-72B-Instruct | 1.5% |
-| Llama-3.1-405B-Instruct| 0% |
-| Llama-3.1-70B-Instruct | 0% |
-| Mixtral-8x22B-Instruct | 0% |
-| Llama-3-70B-Chat | 0% |
+| Models | Main Problem Resolve Rate | <span style="background-color:lightgrey">Subproblem</span> |
+|--------------------------|-------------------------------------|-------------------------------------|
+| 🥇 OpenAI o1-preview | <div align="center">7.7</div> | <div align="center" style="background-color:lightgrey">28.5</div> |
+| 🥈 Claude3.5-Sonnet | <div align="center">4.6</div> | <div align="center" style="background-color:lightgrey">26.0</div> |
+| 🥉 Claude3.5-Sonnet (new) | <div align="center">4.6</div> | <div align="center" style="background-color:lightgrey">25.3</div> |
+| Deepseek-Coder-v2 | <div align="center">3.1</div> | <div align="center" style="background-color:lightgrey">21.2</div> |
+| GPT-4o | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">25.0</div> |
+| GPT-4-Turbo | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">22.9</div> |
+| OpenAI o1-mini | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">22.2</div> |
+| Gemini 1.5 Pro | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">21.9</div> |
+| Claude3-Opus | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">21.5</div> |
+| Llama-3.1-405B-Chat | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">19.8</div> |
+| Claude3-Sonnet | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">17.0</div> |
+| Qwen2-72B-Instruct | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">17.0</div> |
+| Llama-3.1-70B-Chat | <div align="center">0.0</div> | <div align="center" style="background-color:lightgrey">17.0</div> |
+| Mixtral-8x22B-Instruct | <div align="center">0.0</div> | <div align="center" style="background-color:lightgrey">16.3</div> |
+| Llama-3-70B-Chat | <div align="center">0.0</div> | <div align="center" style="background-color:lightgrey">14.6</div> |
+
+Note: If the models tie in the Main Problem resolve rate, we will then compare the Subproblems.
 
 <!-- Once you've added the results to the submission repository,
 bring back the table here -->
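
The note added in this commit describes a two-level ordering: rank by Main Problem resolve rate first, and fall back to the Subproblem rate when models tie. A minimal sketch of that rule, assuming both rates are percentages and higher is better; the `Entry` type and `rank_entries` helper are hypothetical names for illustration and are not part of the SciCode repository:

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    main_rate: float  # Main Problem Resolve Rate, in percent
    sub_rate: float   # Subproblem resolve rate, in percent


def rank_entries(entries):
    """Sort by main rate (descending); break ties by subproblem rate (descending)."""
    return sorted(entries, key=lambda e: (-e.main_rate, -e.sub_rate))


# A few rows copied from the updated table, deliberately out of order.
rows = [
    Entry("GPT-4o", 1.5, 25.0),
    Entry("Claude3.5-Sonnet (new)", 4.6, 25.3),
    Entry("OpenAI o1-preview", 7.7, 28.5),
    Entry("Deepseek-Coder-v2", 3.1, 21.2),
    Entry("Claude3.5-Sonnet", 4.6, 26.0),
    Entry("GPT-4-Turbo", 1.5, 22.9),
]

ranked = [e.model for e in rank_entries(rows)]

for position, name in enumerate(ranked, start=1):
    print(position, name)
```

Under this rule the two Claude3.5-Sonnet entries, tied at 4.6 on the main metric, are separated by their Subproblem rates (26.0 vs. 25.3), while Deepseek-Coder-v2 stays ahead of GPT-4o despite a lower Subproblem rate because its main rate is higher. The note does not say how rows that tie on both columns are ordered, so the sketch leaves them in input order.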
