Dhruv/minions groq #65

Open
wants to merge 7 commits into
base: main
130 changes: 130 additions & 0 deletions tutorials/minions-groq/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
# Minions with Groq API Cookbook for Multi-Hop Reasoning

## 1. Introduction to Minions

Minions is a framework developed by Stanford's [Hazy Research lab](https://hazyresearch.stanford.edu/blog/2025-02-24-minions) that enables efficient collaboration between small, local models running on your device and large, powerful models running in the cloud. Pairing Minions with Groq's fast, low-cost inference reduces cost and latency while keeping answer quality close to what you would get from using a large model alone. This cookbook first introduces Minions in general, then shows why it is well suited to multi-hop question answering, a class of questions that frontier models still struggle with.

## 2. Minion and MinionS Protocols

The framework offers two main protocols:

- **Minion**: a small on-device (local) model chats with a cloud-based model to reach a solution without sending the long context to the cloud.
- **MinionS**: the cloud model decomposes the task into smaller subtasks, the local model executes them in parallel, and the cloud model aggregates the results into a final answer.

Minion is the more cost-effective of the two but performs slightly worse than MinionS. Choose between them based on the complexity of the task and the trade-off between cost and performance you are willing to accept.
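
To make the difference in control flow concrete, here is a minimal sketch that uses plain Python functions as stand-ins for the two models. Every function name below is hypothetical and none of it is the Minions library API; the point is only to show who sees the long context (the local side) and who sees short notes (the remote side).

```python
from concurrent.futures import ThreadPoolExecutor

# Stub "models": plain functions standing in for real LLM calls (all hypothetical).
def remote_decompose(question: str) -> list[str]:
    """Remote model splits the question into independent subtasks."""
    return [f"Extract facts about {topic}" for topic in ("Britain", "France")]

def local_answer(request: str, context: str) -> str:
    """Local model reads the full context on-device and returns a short note."""
    return f"[{request}] note drawn from {len(context)} characters of context"

def remote_aggregate(question: str, notes: list[str]) -> str:
    """Remote model sees only the short notes, never the long context."""
    return f"Answer to {question!r} based on {len(notes)} notes"

def run_minion_chat(question: str, context: str, max_turns: int = 2) -> str:
    """Minion-style flow: a sequential exchange of short messages."""
    notes = []
    for turn in range(max_turns):
        request = f"Turn {turn + 1}: what does the document say about {question}?"
        notes.append(local_answer(request, context))
    return remote_aggregate(question, notes)

def run_minions_parallel(question: str, context: str) -> str:
    """MinionS-style flow: decompose, run subtasks in parallel, aggregate."""
    subtasks = remote_decompose(question)
    with ThreadPoolExecutor() as pool:
        notes = list(pool.map(lambda task: local_answer(task, context), subtasks))
    return remote_aggregate(question, notes)
```

In both flows the `context` argument is only ever passed to `local_answer`, which mirrors the idea that the long document stays on the device.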

## 3. What is Multi-Hop Reasoning?

### Traditional Approaches to Multi-Hop QA

Multi-hop reasoning refers to answering questions that require connecting multiple pieces of information across different sources or parts of a document. Traditionally, multi-hop QA has been approached through:

1. **Retrieval-Augmented Generation (RAG)**: Systems retrieve multiple relevant passages and then generate answers, but often struggle with complex reasoning chains.

2. **Pipeline Approaches**: Breaking questions into sub-questions, retrieving information for each, then combining results - but these systems are complex to build and maintain.

3. **Single Large Model Calls**: Using frontier models with large context windows to process all information at once - effective but extremely expensive and often inefficient.

### Examples of Multi-Hop Questions

Consider these examples that require multiple reasoning steps:
- "How did the 2008 housing crisis affect average retirement savings by 2010?"
- "Compare NVIDIA and Apple's stock performance during the AI boom of 2023"
- "How did Britain's and France's economic recovery differ in the Great Depression?"

#### Anatomy of a Multi-Hop Question

Let's break down how one would approach the question about Britain and France's economic recovery:

1. **First Hop**: Identify when the Great Depression occurred (approximate timeframe)

2. **Second Hop**: Gather information about Britain's economic recovery during this period

3. **Third Hop**: Gather information about France's economic recovery during this period

4. **Fourth Hop**: Compare the two countries' economic recovery

5. **Final Synthesis**: Draw conclusions about how each country recovered from and was affected by the Great Depression

This multi-step process requires gathering different pieces of information and connecting them in a logical sequence - exactly what makes multi-hop reasoning challenging.
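
One way to see the structure of these hops is to treat them as a small dependency graph. The sketch below is illustrative only (the hop names are made up for this example) and uses the standard library's `graphlib` to group hops into stages, where hops in the same stage do not depend on each other and could run in parallel.

```python
from graphlib import TopologicalSorter

# Each hop maps to the set of hops it depends on (names are illustrative).
hops = {
    "timeframe": set(),
    "britain_recovery": {"timeframe"},
    "france_recovery": {"timeframe"},
    "comparison": {"britain_recovery", "france_recovery"},
    "synthesis": {"comparison"},
}

def parallel_stages(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group hops into stages; every hop in a stage has all its inputs ready."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages

stages = parallel_stages(hops)
for number, stage in enumerate(stages, start=1):
    print(f"Stage {number}: {', '.join(stage)}")
```

The second stage contains both country-specific hops, which is the part of the question that lends itself to parallel processing.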

### Current Limitations

Single-call approaches to multi-hop reasoning face several challenges:
- High tendency for models to hallucinate when connections aren't explicit
- High token costs when processing large documents
- Difficulty maintaining focus across multiple steps
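
To get a feel for the token-cost point, the following back-of-the-envelope sketch compares how much input text a remote model would receive in each setup. It assumes a rough heuristic of about four characters per token, uses placeholder text in place of a real document, and ignores output tokens and the decomposition round, so treat the numbers as an illustration and not a measurement.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)

def remote_tokens_single_call(document: str, question: str) -> int:
    """Single large-model call: the remote model reads the whole document."""
    return estimate_tokens(document) + estimate_tokens(question)

def remote_tokens_with_local_notes(question: str, notes: list[str]) -> int:
    """Local-first setup: the remote model reads only short local notes."""
    return estimate_tokens(question) + sum(estimate_tokens(note) for note in notes)

# Placeholder inputs, not real data.
document = "lorem ipsum " * 5000
question = "How did Britain's and France's economic recovery differ?"
notes = [
    "Britain devalued its currency early and recovered faster.",
    "France stayed on the gold standard and recovered later.",
]

single = remote_tokens_single_call(document, question)
local_first = remote_tokens_with_local_notes(question, notes)
print(f"single call: ~{single} remote input tokens")
print(f"local notes: ~{local_first} remote input tokens")
```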

## 4. Why MinionS is Suited for Multi-Hop Reasoning

The MinionS architecture of decomposing a query into tasks and synthesizing a final response is well-suited for multi-hop question-answering, as it helps form connections, identify separate pieces of information, and compare them effectively.

### Current Implementation

Consider a multi-hop question similar to the one used in the provided example file:
"How did Britain's and France's economic recovery differ in the Great Depression?"

Here is how MinionS answers this question:

**1. Initial Task Decomposition:**
The remote model hosted on Groq will typically break the question into two parallel tasks, such as:
```text
Task 1: "Extract information about Britain's economic recovery during the Great Depression."
Task 2: "Extract information about France's economic recovery during the Great Depression."
```
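
As a rough illustration of how such tasks can be turned into independent units of work, the sketch below pairs every task with every chunk of the context. The `Job` type, the paragraph-based chunking, and the function names are assumptions made for this example; the library's actual chunking and job format may differ.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Job:
    """One unit of local work: a single task applied to a single chunk."""
    task_id: int
    chunk_id: int
    instruction: str
    chunk: str

def split_into_chunks(document: str) -> list[str]:
    """Assumed chunking strategy: one chunk per blank-line-separated paragraph."""
    return [part.strip() for part in document.split("\n\n") if part.strip()]

def build_jobs(tasks: list[str], document: str) -> list[Job]:
    """Pair every task with every chunk so each job can run independently."""
    chunks = split_into_chunks(document)
    return [
        Job(task_id, chunk_id, task, chunk)
        for (task_id, task), (chunk_id, chunk) in product(
            enumerate(tasks), enumerate(chunks)
        )
    ]

tasks = [
    "Extract information about Britain's economic recovery during the Great Depression.",
    "Extract information about France's economic recovery during the Great Depression.",
]
sample_document = "Paragraph about the U.S.\n\nParagraph about Britain.\n\nParagraph about France."
jobs = build_jobs(tasks, sample_document)
print(f"{len(jobs)} jobs from {len(tasks)} tasks")
```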

**2. Parallel Local Processing:**
The local model will run these tasks against the context document, returning structured outputs:
```json
{
  "explanation": "Britain devalued its currency early and experienced less severe impacts...",
  "citation": "Britain, Argentina and Brazil, all of which devalued their currencies early and returned to normal patterns of growth relatively rapidly...",
  "answer": "Britain's early currency devaluation helped it recover more quickly..."
}
```
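
Small local models do not always return well-formed or useful JSON, so it can help to validate each output before passing it on. The sketch below is a standard-library approximation of that step: the `LocalOutput` dataclass mirrors the three fields shown above, while `parse_local_output` and `keep_useful` are hypothetical helpers and not part of the Minions package.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class LocalOutput:
    """Mirrors the three fields of the structured local output."""
    explanation: str
    citation: Optional[str] = None
    answer: Optional[str] = None

def parse_local_output(raw: str) -> Optional[LocalOutput]:
    """Return a LocalOutput, or None if the text is not usable JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or "explanation" not in data:
        return None
    return LocalOutput(
        explanation=str(data["explanation"]),
        citation=data.get("citation"),
        answer=data.get("answer"),
    )

def keep_useful(outputs: list[Optional[LocalOutput]]) -> list[LocalOutput]:
    """Drop unparseable outputs and outputs where no answer was found."""
    return [out for out in outputs if out is not None and out.answer]

raw_outputs = [
    '{"explanation": "Found it", "citation": "Britain ...", "answer": "Devalued early"}',
    '{"explanation": "Nothing relevant in this chunk", "citation": null, "answer": null}',
    "not json at all",
]
useful = keep_useful([parse_local_output(raw) for raw in raw_outputs])
print(f"kept {len(useful)} of {len(raw_outputs)} local outputs")
```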

**3. Expert Aggregation:**
Finally, the remote model on Groq synthesizes the parallel outputs from the local models into a final answer. This final answer effectively answers the multi-hop question, combining information from several parts of the context.
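
The aggregation step can be pictured as assembling the local answers into one compact prompt for the remote model. The prompt wording and the `build_aggregation_prompt` helper below are invented for illustration; the actual prompts used by the framework are not shown in this cookbook.

```python
def build_aggregation_prompt(question: str, notes_by_task: dict[str, list[str]]) -> str:
    """Combine per-task local answers into a single prompt for the remote model."""
    lines = [f"Question: {question}", "", "Notes from the local model:"]
    for task, answers in notes_by_task.items():
        lines.append(f"- {task}")
        for answer in answers:
            lines.append(f"  * {answer}")
    lines.extend(["", "Using only the notes above, write a final answer."])
    return "\n".join(lines)

prompt = build_aggregation_prompt(
    "How did Britain's and France's economic recovery differ in the Great Depression?",
    {
        "Britain's recovery": ["Devalued its currency early and recovered relatively rapidly."],
        "France's recovery": ["Stayed on the gold standard and recovered later in the decade."],
    },
)
print(prompt)
```

Because the prompt contains only the short notes, its size depends on the number of tasks and answers, not on the length of the original document.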

## 5. Conclusion

By leveraging the Minions framework with Groq's fast inference capabilities, developers can build applications that handle complex multi-hop reasoning tasks efficiently and cost-effectively. The MinionS protocol in particular offers a powerful alternative to traditional approaches: it maintains near-frontier model quality while significantly reducing costs, because most of the document processing runs on local devices.

## Prerequisites

1. Clone the repository:
```bash
git clone https://github.com/HazyResearch/minions.git
cd minions
```

2. Create a virtual environment (optional):
```bash
python3 -m venv .venv
source .venv/bin/activate
```

3. Install the package and dependencies:
```bash
pip install -e .
```

4. Install Ollama and pull the Llama 3.2 model:
```bash
# Install Ollama from https://ollama.ai
ollama pull llama3.2
```

5. Get your Groq API key from [Groq Cloud](https://console.groq.com) and export it as an environment variable:
```bash
export GROQ_API_KEY=YOUR_GROQ_API_KEY
```
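
If the key is missing, the failure may only surface later as an authentication error from the API. A small check like the one below, run before creating any clients, can make the problem obvious earlier. The `require_groq_api_key` helper is hypothetical and only reads the `GROQ_API_KEY` variable named above.

```python
import os
from typing import Mapping, Optional

def require_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Groq API key, or raise a clear error if it is not set."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Export it in your shell before running the example."
        )
    return key
```

Calling `require_groq_api_key()` at the top of a script reads from the real environment; passing a dictionary instead is convenient for testing.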

## Example Code
Go ahead and run the example code:
```bash
python groq_minions.py
```

## Additional Resources
- [Minions GitHub Repository](https://github.com/HazyResearch/minions)
- [Groq API Documentation](https://console.groq.com/docs)
- [Minions Research Paper](https://arxiv.org/abs/2502.15964)
63 changes: 63 additions & 0 deletions tutorials/minions-groq/groq_minions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
from minions.clients.ollama import OllamaClient
from minions.clients.groq import GroqClient
from minions.minions import Minions
from pydantic import BaseModel
import os

class StructuredLocalOutput(BaseModel):
    explanation: str
    citation: str | None
    answer: str | None

# Set up the local model; here we use Llama 3.2 3B via Ollama
local_client = OllamaClient(
    model_name="llama3.2",
    temperature=0.0,
    structured_output_schema=StructuredLocalOutput
)

# Set up the remote model; here we use Llama 3.3 70B via Groq
remote_client = GroqClient(
    model_name="llama-3.3-70b-versatile",
    api_key=os.getenv("GROQ_API_KEY")
)

# Instantiate the Minions object with the local and remote models
minions = Minions(local_client, remote_client)

# Context taken from Wikipedia about the Great Depression
context = """
The Great Depression was a severe global economic downturn from 1929 to 1939. The period was characterized by high rates of unemployment and poverty; drastic reductions in liquidity, industrial production, and trade; and widespread bank and business failures around the world. The economic contagion began in 1929 in the United States, the largest economy in the world, with the devastating Wall Street stock market crash of October 1929 often considered the beginning of the Depression. Among the countries with the most unemployed were the U.S., the United Kingdom, and Germany.

The Depression was preceded by a period of industrial growth and social development known as the "Roaring Twenties". Much of the profit generated by the boom was invested in speculation, such as on the stock market, which resulted in growing wealth inequality. Banks were subject to minimal regulation under laissez-faire economic policies, resulting in loose lending and widespread debt. By 1929, declining spending had led to reductions in manufacturing output and rising unemployment. Share values continued to rise until the Wall Street crash, after which the slide continued for three years, accompanied by a loss of confidence in the financial system. By 1933, the unemployment rate in the U.S. had risen to 25%, about one-third of farmers had lost their land, and about half of its 25,000 banks had gone out of business. The U.S. federal government under President Herbert Hoover was unwilling to intervene heavily in the economy. In the 1932 presidential election, Hoover was defeated by Franklin D. Roosevelt, who from 1933 pursued a set of expansive New Deal programs in order to provide relief and create jobs. In Germany, which depended heavily on U.S. loans, the crisis caused unemployment to rise to nearly 30% and fueled political extremism, paving the way for Adolf Hitler's Nazi Party to rise to power in 1933.

Between 1929 and 1932, worldwide gross domestic product (GDP) fell by an estimated 15%; in the U.S., the Depression resulted in a 30% contraction in GDP. Recovery varied greatly around the world. Some economies, such as the U.S., Germany and Japan started to recover by the mid-1930s; others, like France, did not return to pre-shock growth rates until later in the decade. The Depression had devastating economic effects on both wealthy and poor countries: all experienced drops in personal income, prices (deflation), tax revenues, and profits. International trade fell by more than 50%, and unemployment in some countries rose as high as 33%. Cities around the world, especially those dependent on heavy industry, were heavily affected. Construction virtually halted in many countries, and farming communities and rural areas suffered as crop prices fell by up to 60%. Faced with plummeting demand and few job alternatives, areas dependent on primary sector industries suffered the most. The outbreak of World War II in 1939 ended the Depression, as it stimulated factory production, providing jobs for women as militaries absorbed large numbers of young, unemployed men.

The precise causes for the Great Depression are disputed. One set of historians, for example, focuses on non-monetary economic causes. Among these, some regard the Wall Street crash itself as the main cause; others consider that the crash was a mere symptom of more general economic trends of the time, which had already been underway in the late 1920s. A contrasting set of views, which rose to prominence in the later part of the 20th century, ascribes a more prominent role to failures of monetary policy. According to those authors, while general economic trends can explain the emergence of the downturn, they fail to account for its severity and longevity; they argue that these were caused by the lack of an adequate response to the crises of liquidity that followed the initial economic shock of 1929 and the subsequent bank failures accompanied by a general collapse of the financial markets.

Beyond the United States
At first, the decline in the U.S. economy was the factor that triggered economic downturns in most other countries due to a decline in trade, capital movement, and global business confidence. Then, internal weaknesses or strengths in each country made conditions worse or better. For example, the U.K. economy, which experienced an economic downturn throughout most of the late 1920s, was less severely impacted by the shock of the depression than the U.S. By contrast, the German economy saw a similar decline in industrial output as that observed in the U.S. Some economic historians attribute the differences in the rates of recovery and relative severity of the economic decline to whether particular countries had been able to effectively devaluate their currencies or not. This is supported by the contrast in how the crisis progressed in, e.g., Britain, Argentina and Brazil, all of which devalued their currencies early and returned to normal patterns of growth relatively rapidly and countries which stuck to the gold standard, such as France or Belgium.

Frantic attempts by individual countries to shore up their economies through protectionist policies – such as the 1930 U.S. Smoot–Hawley Tariff Act and retaliatory tariffs in other countries – exacerbated the collapse in global trade, contributing to the depression. By 1933, the economic decline pushed world trade to one third of its level compared to four years earlier.

Economic Indicators Change (1929-1932):
| Country | Industrial Production | Wholesale Prices | Foreign Trade | Unemployment |
|----------------|---------------------|-----------------|--------------|--------------|
| United States | -46% | -32% | -70% | +607% |
| United Kingdom | -23% | -33% | -60% | +129% |
| France | -24% | -34% | -54% | +214% |
| Germany | -41% | -29% | -61% | +232% |
"""

task = "How did the combination of currency devaluation decisions and trade policies affect the economic recovery rates of Britain versus France during the Great Depression, and what does this reveal about the effectiveness of different policy responses to the crisis?"

# Execute the MinionS protocol for up to two communication rounds
output = minions(
    task=task,
    doc_metadata="Historical Economic Analysis",
    context=[context],
    max_rounds=2
)

# Print the final answer
print(output["final_answer"])