# Chain-of-Thought Prompting

## Overview

Chain-of-Thought (CoT) prompting elicits step-by-step reasoning from LLMs, dramatically improving performance on complex reasoning, math, and logic tasks.
## Core Techniques

### Zero-Shot CoT

Add a simple trigger phrase to elicit reasoning:

```python
def zero_shot_cot(query: str) -> str:
    """Append the zero-shot CoT trigger phrase to a query."""
    return f"""{query}

Let's think step by step:"""


# Example
query = "If a train travels 60 mph for 2.5 hours, how far does it go?"
prompt = zero_shot_cot(query)

# Typical model output:
# "Let's think step by step:
# 1. Speed = 60 miles per hour
# 2. Time = 2.5 hours
# 3. Distance = Speed × Time
# 4. Distance = 60 × 2.5 = 150 miles
# Answer: 150 miles"
```
### Few-Shot CoT

Provide examples with explicit reasoning chains:

```python
few_shot_examples = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 balls. How many tennis balls does he have now?
A: Let's think step by step:
1. Roger starts with 5 balls
2. He buys 2 cans, each with 3 balls
3. Balls from cans: 2 × 3 = 6 balls
4. Total: 5 + 6 = 11 balls
Answer: 11

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many do they have?
A: Let's think step by step:
1. Started with 23 apples
2. Used 20 for lunch: 23 - 20 = 3 apples left
3. Bought 6 more: 3 + 6 = 9 apples
Answer: 9

Q: {user_query}
A: Let's think step by step:"""

# Substitute the user's question into the scaffold
prompt = few_shot_examples.format(user_query="How many wheels do 7 tricycles have?")
```
### Self-Consistency

Generate multiple reasoning paths and take the majority vote:

```python
from collections import Counter

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def extract_final_answer(text: str) -> str:
    """Naive extraction: take whatever follows the last 'Answer:' marker."""
    return text.rsplit("Answer:", 1)[-1].strip()


def self_consistency_cot(query, n=5, temperature=0.7, model="gpt-4o"):
    # Any chat model that honors `temperature` works here; the model
    # name is illustrative, so substitute your own deployment.
    prompt = f"{query}\n\nLet's think step by step:"

    responses = []
    for _ in range(n):
        # temperature > 0 so each sampled reasoning path can differ
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        responses.append(extract_final_answer(response.choices[0].message.content))

    # Take majority vote over the sampled final answers
    answer_counts = Counter(responses)
    final_answer = answer_counts.most_common(1)[0][0]

    return {
        'answer': final_answer,
        'confidence': answer_counts[final_answer] / n,
        'all_responses': responses,
    }
```
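Sampling more paths raises cost linearly, and the `confidence` field (the vote share of the winning answer) gives a cheap escalation signal. A hypothetical usage:

```python
question = ("A bat and a ball cost $1.10 together. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")
result = self_consistency_cot(question)
if result["confidence"] < 0.6:  # weak agreement: sample more paths
    result = self_consistency_cot(question, n=11)
print(result["answer"])
```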
## Advanced Patterns
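The sketches below call an unspecified `get_llm_response` helper. A minimal version, reusing the `client` from the self-consistency example (the model name is again illustrative), might be:

```python
def get_llm_response(prompt: str, model: str = "gpt-4o") -> str:
    """Single-turn convenience wrapper around the Chat Completions API."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```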
### Least-to-Most Prompting

Break complex problems into simpler subproblems, solve them in order, then integrate:

```python
def least_to_most_prompt(complex_query):
    # Stage 1: Decomposition
    decomp_prompt = f"""Break down this complex problem into simpler subproblems:

Problem: {complex_query}

Subproblems (one per line):"""

    decomposition = get_llm_response(decomp_prompt)
    # Parse the reply into a list of subproblems, one per non-empty line
    subproblems = [line.strip() for line in decomposition.splitlines() if line.strip()]

    # Stage 2: Sequential solving, feeding earlier solutions back in as context
    context = ""
    for subproblem in subproblems:
        solve_prompt = f"""{context}

Solve this subproblem:
{subproblem}

Solution:"""
        solution = get_llm_response(solve_prompt)
        context += f"\n\nPreviously solved: {subproblem}\nSolution: {solution}"

    # Stage 3: Final integration
    final_prompt = f"""Given these solutions to subproblems:
{context}

Provide the final answer to: {complex_query}

Final Answer:"""

    return get_llm_response(final_prompt)
```
### Tree-of-Thought (ToT)

Explore multiple reasoning branches and keep the best-scoring path:

```python
import re


class TreeOfThought:
    def __init__(self, llm_client, max_depth=3, branches_per_step=3):
        self.client = llm_client  # any object exposing .complete(prompt) -> str
        self.max_depth = max_depth
        self.branches_per_step = branches_per_step

    def solve(self, problem):
        # Generate initial thought branches, explore each, keep the best path
        best_path, best_score = None, -1.0
        for thought in self.generate_thoughts(problem):
            path, score = self.explore_branch(problem, thought, depth=1)
            if score > best_score:
                best_score, best_path = score, path
        return best_path

    def explore_branch(self, problem, path, depth):
        # Depth-first expansion: recursively extend the path and
        # return the highest-scoring continuation found
        score = self.evaluate_thought(problem, path)
        if depth >= self.max_depth:
            return path, score
        best_path, best_score = path, score
        for thought in self.generate_thoughts(problem, context=path):
            child_path, child_score = self.explore_branch(
                problem, f"{path}\n{thought}", depth + 1
            )
            if child_score > best_score:
                best_score, best_path = child_score, child_path
        return best_path, best_score

    def generate_thoughts(self, problem, context=""):
        prompt = f"""Problem: {problem}
{context}

Generate {self.branches_per_step} different next steps in solving this problem:

1."""
        response = self.client.complete(prompt)
        return self.parse_thoughts("1." + response)

    def parse_thoughts(self, text):
        # Naive parsing: split the numbered list at line-leading "N." markers
        thoughts = [t.strip() for t in re.split(r"(?m)^\s*\d+\.", text) if t.strip()]
        return thoughts[: self.branches_per_step]

    def evaluate_thought(self, problem, thought_path):
        prompt = f"""Problem: {problem}

Reasoning path so far:
{thought_path}

Rate this reasoning path from 0-10 for:
- Correctness
- Likelihood of reaching solution
- Logical coherence

Reply with a single number.
Score:"""
        # Pull the first number from the reply rather than assuming a bare float
        match = re.search(r"\d+(?:\.\d+)?", self.client.complete(prompt))
        return float(match.group()) if match else 0.0
```
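The search multiplies LLM calls: with `branches_per_step` b and `max_depth` d, the exhaustive expansion above visits on the order of b^d reasoning paths, each costing an evaluation plus a generation per expanded node, so keep both parameters small. A hypothetical wiring with the helper from earlier:

```python
class CompletionClient:
    """Adapter exposing the .complete(prompt) interface TreeOfThought expects."""
    def complete(self, prompt):
        return get_llm_response(prompt)

tot = TreeOfThought(CompletionClient(), max_depth=2, branches_per_step=3)
best_path = tot.solve("Use each of the numbers 4, 7, 8, 8 exactly once to make 24.")
```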
### Verification Step

Add an explicit verification pass to catch errors:

```python
def cot_with_verification(query):
    # Step 1: Generate reasoning and an answer
    reasoning_prompt = f"""{query}

Let's solve this step by step:"""

    reasoning_response = get_llm_response(reasoning_prompt)

    # Step 2: Verify the reasoning
    verification_prompt = f"""Original problem: {query}

Proposed solution:
{reasoning_response}

Verify this solution by:
1. Checking each step for logical errors
2. Verifying arithmetic calculations
3. Ensuring the final answer makes sense

Is this solution correct? If not, what's wrong?

Verification:"""

    verification = get_llm_response(verification_prompt)

    # Step 3: Revise if needed (naive keyword check; see the sturdier
    # structured-verdict variant below)
    if "incorrect" in verification.lower() or "error" in verification.lower():
        revision_prompt = f"""The previous solution had errors:
{verification}

Please provide a corrected solution to: {query}

Corrected solution:"""
        return get_llm_response(revision_prompt)

    return reasoning_response
```
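The keyword check misfires when the verifier replies with phrases like "no errors found". One sturdier option (a sketch, assuming the model follows the format instruction) is to request an explicit verdict:

```python
import json

def get_verdict(verification_prompt: str) -> bool:
    """Ask the verifier for a JSON verdict instead of scanning for keywords."""
    reply = get_llm_response(
        verification_prompt
        + '\n\nReply only with JSON of the form {"correct": true, "issues": ""}.'
    )
    try:
        return bool(json.loads(reply).get("correct", False))
    except json.JSONDecodeError:
        return False  # unparseable verdict: treat as failed verification
```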
## Domain-Specific CoT

### Math Problems

```python
math_cot_template = """
Problem: {problem}

Solution:
Step 1: Identify what we know
- {list_known_values}

Step 2: Identify what we need to find
- {target_variable}

Step 3: Choose relevant formulas
- {formulas}

Step 4: Substitute values
- {substitution}

Step 5: Calculate
- {calculation}

Step 6: Verify and state answer
- {verification}

Answer: {final_answer}
"""
```
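One way to use such a template (the same applies to the debugging and logic templates below) is to fill it with a worked exemplar and prepend it to a new question as a few-shot demonstration. For instance, filled with the train problem from earlier:

```python
worked_example = math_cot_template.format(
    problem="If a train travels 60 mph for 2.5 hours, how far does it go?",
    list_known_values="speed = 60 mph, time = 2.5 hours",
    target_variable="distance traveled",
    formulas="distance = speed × time",
    substitution="distance = 60 mph × 2.5 hours",
    calculation="60 × 2.5 = 150",
    verification="150 miles / 2.5 hours = 60 mph, consistent with the given speed",
    final_answer="150 miles",
)
# new_problem: the question you actually want solved
prompt = worked_example + "\nProblem: " + new_problem + "\n\nSolution:"
```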
### Code Debugging

```python
debug_cot_template = """
Code with error:
{code}

Error message:
{error}

Debugging process:
Step 1: Understand the error message
- {interpret_error}

Step 2: Locate the problematic line
- {identify_line}

Step 3: Analyze why this line fails
- {root_cause}

Step 4: Determine the fix
- {proposed_fix}

Step 5: Verify the fix addresses the error
- {verification}

Fixed code:
{corrected_code}
"""
```
### Logical Reasoning

```python
logic_cot_template = """
Premises:
{premises}

Question: {question}

Reasoning:
Step 1: List all given facts
{facts}

Step 2: Identify logical relationships
{relationships}

Step 3: Apply deductive reasoning
{deductions}

Step 4: Draw conclusion
{conclusion}

Answer: {final_answer}
"""
```
## Performance Optimization

### Caching Reasoning Patterns

```python
import numpy as np


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


class ReasoningCache:
    def __init__(self, embed_fn):
        self.embed = embed_fn  # text -> vector, e.g. an embeddings-API wrapper
        self.cache = {}  # problem -> (embedding, reasoning)

    def get_similar_reasoning(self, problem, threshold=0.85):
        problem_embedding = self.embed(problem)

        # Reuse cached reasoning for any sufficiently similar problem
        for cached_embedding, reasoning in self.cache.values():
            if cosine_similarity(problem_embedding, cached_embedding) > threshold:
                return reasoning

        return None

    def add_reasoning(self, problem, reasoning):
        # Store the embedding alongside the reasoning so lookups
        # don't have to re-embed every cached problem
        self.cache[problem] = (self.embed(problem), reasoning)
```
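A hypothetical wiring against the OpenAI embeddings endpoint (model name illustrative, `client` from the earlier examples):

```python
def openai_embed(text: str):
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

cache = ReasoningCache(embed_fn=openai_embed)
```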
### Adaptive Reasoning Depth

```python
def adaptive_cot(problem, initial_depth=3, max_depth=10):
    # generate_cot and is_solution_complete are assumed helpers: the first
    # prompts for a given number of reasoning steps, the second checks
    # (heuristically or via the LLM) whether an answer was reached.
    depth = initial_depth
    response = None

    while depth <= max_depth:
        response = generate_cot(problem, num_steps=depth)

        # Stop as soon as the solution looks complete
        if is_solution_complete(response):
            return response

        depth += 2  # otherwise allow more reasoning steps

    return response  # return the deepest attempt
```
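A crude `is_solution_complete` (an assumption, not from the original) can simply look for an answer marker:

```python
def is_solution_complete(response: str) -> bool:
    # Heuristic: treat an explicit answer marker as completion
    return "answer:" in response.lower()
```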
## Evaluation Metrics

```python
def evaluate_cot_quality(reasoning_chain):
    # Each helper below is a placeholder for a heuristic or LLM-based judge
    metrics = {
        'coherence': measure_logical_coherence(reasoning_chain),
        'completeness': check_all_steps_present(reasoning_chain),
        'correctness': verify_final_answer(reasoning_chain),
        'efficiency': count_unnecessary_steps(reasoning_chain),
        'clarity': rate_explanation_clarity(reasoning_chain),
    }
    return metrics
```
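As one concrete instance, a heuristic `check_all_steps_present` (a sketch; the other judges follow the same single-argument shape) might verify that numbered steps run 1, 2, 3, ... without gaps:

```python
import re

def check_all_steps_present(reasoning_chain: str) -> bool:
    # Collect line-leading step numbers like "1." or "Step 2:" in order
    steps = [int(n) for n in re.findall(r"(?m)^\s*(?:Step\s*)?(\d+)[.:]", reasoning_chain)]
    return bool(steps) and steps == list(range(1, len(steps) + 1))
```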
## Best Practices

1. **Clear Step Markers**: Use numbered steps or clear delimiters (see the prompt skeleton after this list)
2. **Show All Work**: Don't skip steps, even obvious ones
3. **Verify Calculations**: Add explicit verification steps
4. **State Assumptions**: Make implicit assumptions explicit
5. **Check Edge Cases**: Consider boundary conditions
6. **Use Examples**: Show the reasoning pattern with examples first
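
A prompt skeleton that folds in practices 1-5 (a sketch, not a canonical template):

```python
best_practice_prompt = """{problem}

Solve this step by step:
- First, state any assumptions you are making.
- Number each step and show all work, even obvious steps.
- After the last step, re-check every calculation.
- Consider edge cases before committing to an answer.

Answer:"""
```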
## Common Pitfalls

- **Premature Conclusions**: Jumping to the answer without full reasoning
- **Circular Logic**: Using the conclusion to justify the reasoning
- **Missing Steps**: Skipping intermediate calculations
- **Overcomplication**: Adding unnecessary steps that confuse the model
- **Inconsistent Format**: Changing step structure mid-reasoning
## When to Use CoT

**Use CoT for:**
- Math and arithmetic problems
- Logical reasoning tasks
- Multi-step planning
- Code generation and debugging
- Complex decision making

**Skip CoT for:**
- Simple factual queries
- Direct lookups
- Creative writing
- Tasks requiring conciseness
- Real-time, latency-sensitive applications
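
These criteria can be folded into a simple router, sketched below (the task categories are illustrative):

```python
COT_TASK_TYPES = {"math", "logic", "planning", "debugging", "decision"}

def build_prompt(query: str, task_type: str) -> str:
    # Only pay the extra tokens and latency of CoT when the task benefits
    if task_type in COT_TASK_TYPES:
        return f"{query}\n\nLet's think step by step:"
    return query
```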
## Resources

- Benchmark datasets for CoT evaluation
- Pre-built CoT prompt templates
- Reasoning verification tools
- Step extraction and parsing utilities