Engineering · 3 min read
Deploying a Free LLM Model on Modal.com with the Hermes OpenCode Agent
A guide to deploying an LLM model on a free T4 GPU on Modal.com using the Hermes OpenCode Agent.
In this guide, I'll show you how to deploy an LLM model on a free T4 GPU on the Modal.com platform using the Hermes OpenCode Agent.
What is Modal.com?
Modal.com is a serverless platform that lets you run your Python code on GPUs. It is optimized in particular for ML/AI workloads.
| Feature | Value |
|---|---|
| GPU | T4 (free tier) |
| RAM | 16GB |
| Storage | 256GB |
| Price | 40 hours/month free |
Setup
1. Install the Modal CLI

```shell
pip install modal
modal setup
```

2. Get an API Key
Create an account at modal.com and generate an API key.
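If you prefer not to use the interactive browser flow, the Modal CLI can also accept a token directly; a sketch of the non-interactive alternative (the `<token-id>` and `<token-secret>` values are placeholders for the credentials shown in the Modal dashboard):

```shell
# Non-interactive alternative to `modal setup`:
# paste the token values created in the Modal dashboard (placeholders shown)
modal token set --token-id <token-id> --token-secret <token-secret>
```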
Hermes OpenCode Agent Deployment Script
```python
#!/usr/bin/env python3
"""
Hermes OpenCode Agent - Modal.com Deployment
"""
import modal

app = modal.App("hermes-opencode-agent")

# Custom container image
image = modal.Image.debian_slim(python_version="3.11").pip_install([
    "fastapi==0.110.3",
    "uvicorn==0.29.0",
    "pydantic==2.10.0",
    "requests==2.32.3",
    "transformers==4.45.0",
    "torch==2.3.0",
    "sentencepiece==0.2.0",
    "protobuf==3.20.3",
    "accelerate==1.1.0",
])


@app.function(
    image=image,
    gpu="T4",  # Free tier GPU
    timeout=3600,
)
@modal.asgi_app()  # expose the returned FastAPI app at a *.modal.run URL
def serve_llm():
    """Serve the Phi-3 mini LLM behind a FastAPI app."""
    import torch
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Phi-3 mini model
    MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"

    print("Loading Phi-3 mini model...")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

    # Add a padding token if the tokenizer does not define one
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,
    )
    print(f"Model loaded on {model.device}!")

    # FastAPI app (named web_app to avoid shadowing the modal.App above)
    web_app = FastAPI(title="Hermes OpenCode LLM Agent")

    class CodeRequest(BaseModel):
        prompt: str
        max_tokens: int = 256

    class CodeResponse(BaseModel):
        code: str
        model: str
        tokens: int

    @web_app.get("/")
    async def root():
        return {
            "message": "Hermes OpenCode LLM Agent",
            "model": MODEL_NAME,
            "status": "ready",
            "gpu": "T4",
        }

    @web_app.post("/generate")
    async def generate_code(request: CodeRequest):
        """Generate code with the Phi-3 model."""
        try:
            # Build the prompt in Phi-3's chat format
            prompt = (
                "<|system|>\n"
                "You are a helpful AI assistant that generates high-quality code. "
                "Always respond with complete, runnable code blocks.<|end|>\n"
                "<|user|>\n"
                f"{request.prompt}<|end|>\n"
                "<|assistant|>\n"
            )

            # Tokenize
            inputs = tokenizer(
                prompt, return_tensors="pt", padding=True, truncation=True
            ).to(model.device)

            # Generate
            with torch.no_grad():
                outputs = model.generate(
                    **inputs,
                    max_new_tokens=request.max_tokens,
                    temperature=0.7,
                    do_sample=True,
                    pad_token_id=tokenizer.eos_token_id,
                    eos_token_id=tokenizer.eos_token_id,
                )

            # Decode only the newly generated tokens. (Splitting the full
            # decoded text on "<|assistant|>" would not work, because
            # skip_special_tokens strips that tag from the output.)
            new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
            assistant_response = tokenizer.decode(
                new_tokens, skip_special_tokens=True
            ).strip()

            return CodeResponse(
                code=assistant_response,
                model=MODEL_NAME,
                tokens=len(outputs[0]),
            )
        except Exception as e:
            return {"error": str(e)}

    @web_app.get("/health")
    async def health():
        return {
            "status": "healthy",
            "model": MODEL_NAME,
            "device": str(model.device),
            "dtype": str(model.dtype),
        }

    return web_app


@app.local_entrypoint()
def deploy():
    """Print deployment info; use `modal deploy` to publish the web app."""
    print("🚀 Deploying Hermes OpenCode LLM Agent to Modal.com...")
    print("📦 Model: microsoft/Phi-3-mini-4k-instruct")
    print("🎯 GPU: T4 (Free tier)")
    print("⏳ This may take a few minutes...")
    print("🔗 App URL: https://your-username--hermes-opencode-agent.modal.run")
    print("📊 Health endpoint: /health")
    print("⚡ Generate endpoint: /generate")
```
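The prompt assembled inside `generate_code` follows Phi-3's chat markup (`<|system|>`, `<|user|>`, `<|assistant|>`, `<|end|>` tags, as documented on the model card). As a rough sketch, the same structure can be factored into a small helper; in practice `tokenizer.apply_chat_template` builds this for you:

```python
def phi3_prompt(system: str, user: str) -> str:
    """Assemble a Phi-3-style chat prompt from a system and a user message."""
    return (
        f"<|system|>\n{system}<|end|>\n"
        f"<|user|>\n{user}<|end|>\n"
        "<|assistant|>\n"
    )

# The model continues the text after the final <|assistant|> tag.
print(phi3_prompt("You generate code.", "Write a hello world in Python"))
```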
Deployment
Deploy the script:

```shell
modal deploy hermes_modal_deploy.py
```
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Status page |
| /health | GET | System health status |
| /generate | POST | Code generation |
Example Usage

```shell
# Health check
curl https://your-username--hermes-opencode-agent.modal.run/health

# Code generation
curl -X POST https://your-username--hermes-opencode-agent.modal.run/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Generate a Fibonacci sequence in Python",
    "max_tokens": 256
  }'
```
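The same call can be made from Python with the standard library alone. A minimal client sketch (the base URL is the placeholder from above, and the request body mirrors the `CodeRequest` schema):

```python
import json
import urllib.request


def build_request(base_url: str, prompt: str,
                  max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request matching the /generate schema (prompt, max_tokens)."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, prompt: str, max_tokens: int = 256) -> dict:
    """Call the deployed /generate endpoint and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(base_url, prompt, max_tokens)) as resp:
        return json.loads(resp.read())


# Example (replace with your real app URL):
# result = generate("https://your-username--hermes-opencode-agent.modal.run",
#                   "Generate a Fibonacci sequence in Python")
# print(result["code"])
```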
Cost Analysis
| Resource | Cost | Limit |
|---|---|---|
| GPU hours | $0 | 40 hours/month |
| Bandwidth | $0 | 10GB/month |
| Storage | $0 | 256GB |
Total cost: $0 🎉
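To put the 40 free GPU-hours in perspective, a back-of-the-envelope estimate; the 5-second average latency is an assumption, not a measured number, and it assumes the container only runs while serving requests:

```python
FREE_GPU_HOURS = 40            # free-tier allowance from the table above
AVG_SECONDS_PER_REQUEST = 5    # assumed average generation time on a T4

# How many requests fit into the free tier each month
requests_per_month = FREE_GPU_HOURS * 3600 // AVG_SECONDS_PER_REQUEST
print(requests_per_month)  # 28800 requests/month within the free tier
```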
Advantages
- Free: 40 hours/month of T4 GPU time at no cost
- Fast: high performance on NVIDIA GPUs
- Easy: one-command deployment from a Python script
- Scalable: automatic scaling
- Open source: support for open-source models
Conclusion
The Modal.com + Hermes OpenCode Agent combination gives you:
- ✅ Free GPU access
- ✅ Easy deployment
- ✅ A production-ready API
- ✅ Open-source model support
- ✅ Automatic scaling
A perfect starting point for all your AI projects! 🚀

