After you’ve built and deployed a model on the Plexe Platform, you can use its API endpoint to make predictions (inferences). This guide explains how to interact with deployed models to get predictions for your data.
## Prerequisites

- A model that has been successfully deployed (status: `READY`)
- The deployment ID or endpoint URL
- A valid API key with appropriate permissions
- The input schema for your model (what data format it expects)
## Getting the Endpoint URL

Before making predictions, you need to know your model’s endpoint URL. You can obtain it in two ways:
### From the Console

1. Log in to Plexe Console
2. Navigate to Models → Deployments
3. Select your deployment
4. Copy the endpoint URL from the Overview tab
### Via the API

When you deploy a model on the Plexe Platform, you can get the inference URL from the model status. The URL has the format:

```
https://api.plexe.ai/models/{model_name}/{model_version}/infer
```

Where:

- `{model_name}` is the name of your model
- `{model_version}` is the version number of your model
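For example, a hypothetical model named `house-price-predictor` deployed at version `1` would be reachable at:

```
https://api.plexe.ai/models/house-price-predictor/1/infer
```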
## Making Single Predictions
For predictions with a single data point, use the simple prediction endpoint.
### Using cURL
```bash
curl -X POST https://api.plexe.ai/models/{model_name}/{model_version}/infer \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "feature1": "value1",
    "feature2": 42,
    "feature3": true
  }'
```
The response will contain the prediction result:
```json
{
  "request_id": "req_789012",
  "result": {
    "predicted_class": "category_b",
    "confidence": 0.87
  },
  "model_id": "model_789xyz",
  "model_version": "1",
  "processing_time_ms": 28
}
```
### Using Python
```python
import requests
import json

API_KEY = "YOUR_API_KEY"
MODEL_NAME = "your_model_name"
MODEL_VERSION = "1"  # Typically "1" for the first version
BASE_URL = "https://api.plexe.ai"

def get_prediction(input_data):
    """
    Get a prediction from a deployed model.

    Args:
        input_data (dict): The input data matching the model's expected schema

    Returns:
        dict: The prediction result
    """
    headers = {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
    endpoint = f"{BASE_URL}/models/{MODEL_NAME}/{MODEL_VERSION}/infer"

    try:
        response = requests.post(
            endpoint,
            headers=headers,
            data=json.dumps(input_data)
        )
        response.raise_for_status()  # Raise exception for error status codes
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error making prediction: {e}")
        if hasattr(e, 'response') and e.response is not None:
            print(f"Response: {e.response.text}")
        return None

# Example usage
input_data = {
    "square_footage": 1950,
    "bedrooms": 3,
    "bathrooms": 2.5,
    "location": "suburban"
}

result = get_prediction(input_data)
if result:
    print(f"Prediction result: {result['result']}")
    print(f"Request ID: {result['request_id']}")
    print(f"Processing time: {result['processing_time_ms']} ms")
```
### Using JavaScript
```javascript
async function getPrediction(inputData) {
  const apiKey = "YOUR_API_KEY";
  const modelName = "your_model_name";
  const modelVersion = "1"; // Typically "1" for the first version
  const endpoint = `https://api.plexe.ai/models/${modelName}/${modelVersion}/infer`;

  try {
    const response = await fetch(endpoint, {
      method: 'POST',
      headers: {
        'x-api-key': apiKey,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(inputData)
    });

    if (!response.ok) {
      throw new Error(`HTTP error! Status: ${response.status}`);
    }

    return await response.json();
  } catch (error) {
    console.error("Error making prediction:", error);
    return null;
  }
}

// Example usage
const inputData = {
  square_footage: 1950,
  bedrooms: 3,
  bathrooms: 2.5,
  location: "suburban"
};

getPrediction(inputData)
  .then(result => {
    if (result) {
      console.log("Prediction result:", result.result);
      console.log("Request ID:", result.request_id);
      console.log("Processing time:", result.processing_time_ms, "ms");
    }
  });
```
## Handling Errors

### Common Error Codes
| HTTP Status | Error Code | Description |
|---|---|---|
| 400 | `invalid_input` | Input doesn’t match the model schema |
| 401 | `unauthorized` | Missing or invalid API key |
| 403 | `forbidden` | API key doesn’t have permission for this deployment |
| 404 | `not_found` | Deployment ID doesn’t exist |
| 429 | `rate_limit_exceeded` | Too many requests in the allowed time period |
| 500 | `prediction_error` | Error occurred during model prediction |
| 503 | `deployment_unavailable` | Deployment is not in `READY` state |
When a request fails, the response body includes a structured error you can use for troubleshooting:

```json
{
  "error": {
    "code": "invalid_input",
    "message": "Input validation failed",
    "details": {
      "feature2": "must be a number between 0 and 100"
    }
  },
  "request_id": "req_789012"
}
```
### Python Error Handling Example
```python
import requests
import json
import time

API_KEY = "YOUR_API_KEY"
MODEL_NAME = "your_model_name"
MODEL_VERSION = "1"
BASE_URL = "https://api.plexe.ai"

def get_prediction_with_retry(input_data, max_retries=3, initial_backoff=1):
    """
    Get a prediction with retry logic for transient errors.

    Args:
        input_data (dict): The input data
        max_retries (int): Maximum number of retry attempts
        initial_backoff (float): Initial backoff time in seconds (doubles each retry)

    Returns:
        dict: The prediction result or None if failed
    """
    headers = {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
    endpoint = f"{BASE_URL}/models/{MODEL_NAME}/{MODEL_VERSION}/infer"
    backoff = initial_backoff

    for attempt in range(max_retries + 1):
        try:
            response = requests.post(
                endpoint,
                headers=headers,
                data=json.dumps(input_data),
                timeout=10  # 10-second timeout
            )

            # If successful, return the result
            if response.status_code == 200:
                return response.json()

            # Handle different error types
            if response.status_code == 400:
                # Bad request - no point retrying
                error_data = response.json().get("error", {})
                print(f"Input validation error: {error_data.get('message')}")
                print(f"Details: {error_data.get('details')}")
                return None
            elif response.status_code == 429:
                # Rate limit - retry with backoff
                if attempt < max_retries:
                    print(f"Rate limit exceeded. Retrying in {backoff} seconds...")
                    time.sleep(backoff)
                    backoff *= 2  # Exponential backoff
                    continue
                else:
                    print("Maximum retries reached. Rate limit still exceeded.")
                    return None
            elif response.status_code in (503, 504):
                # Service unavailable or gateway timeout - retry
                if attempt < max_retries:
                    print(f"Service temporarily unavailable. Retrying in {backoff} seconds...")
                    time.sleep(backoff)
                    backoff *= 2
                    continue
                else:
                    print("Maximum retries reached. Service still unavailable.")
                    return None

            # Other errors
            try:
                error_data = response.json().get("error", {})
                print(f"Error: {error_data.get('code')} - {error_data.get('message')}")
            except ValueError:  # Response body was not valid JSON
                print(f"HTTP Error: {response.status_code} - {response.text}")
            return None

        except requests.exceptions.Timeout:
            if attempt < max_retries:
                print(f"Request timed out. Retrying in {backoff} seconds...")
                time.sleep(backoff)
                backoff *= 2
                continue
            else:
                print("Maximum retries reached. Requests still timing out.")
                return None
        except requests.exceptions.RequestException as e:
            print(f"Request error: {e}")
            return None

    return None  # Should not be reached, but kept as a safeguard
```
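A quick usage sketch, reusing the house-price input from the single-prediction example above:

```python
input_data = {
    "square_footage": 1950,
    "bedrooms": 3,
    "bathrooms": 2.5,
    "location": "suburban"
}

result = get_prediction_with_retry(input_data, max_retries=3)
if result:
    print(f"Prediction result: {result['result']}")
    print(f"Request ID: {result['request_id']}")
```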
## Best Practices
- Input Validation: Validate inputs before sending to avoid unnecessary API calls
- Error Handling: Implement robust error handling with retries for transient errors
- Logging: Log request IDs and responses for troubleshooting
- Batch Processing: Use batch endpoints for high-volume prediction needs instead of looping over single predictions
- Rate Limiting: Manage your request rate to avoid hitting rate limits
- Monitoring: Track latency and error rates for your production deployments
- Caching: Consider caching prediction results for identical inputs (see the sketch after this list)
- Timeouts: Set appropriate timeouts for your application’s needs
- Minimize Payload Size: Only include required fields in your requests
- Connection Pooling: Reuse HTTP connections for multiple requests (see the sketch after this list)
- Use CDNs: If serving a model in user-facing applications, consider a CDN in front of your API calls
- Regional Endpoints: Use the endpoint closest to your application (if multiple regions are supported)
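A minimal sketch of the caching and connection-pooling points above, assuming the same `/models/{model_name}/{model_version}/infer` endpoint and `x-api-key` header used earlier. The in-memory dictionary cache is illustrative only, not part of the Plexe API:

```python
import hashlib
import json
import requests

API_KEY = "YOUR_API_KEY"
ENDPOINT = "https://api.plexe.ai/models/your_model_name/1/infer"

# A Session reuses the underlying TCP/TLS connection across requests
# (connection pooling), avoiding a new handshake per prediction.
session = requests.Session()
session.headers.update({
    "x-api-key": API_KEY,
    "Content-Type": "application/json",
})

# Illustrative in-memory cache keyed on a hash of the input payload.
# In production, consider a bounded cache or an external store.
_cache = {}

def cached_prediction(input_data):
    # Serialize with sorted keys so logically identical inputs hash the same
    key = hashlib.sha256(
        json.dumps(input_data, sort_keys=True).encode()
    ).hexdigest()
    if key in _cache:
        return _cache[key]

    response = session.post(ENDPOINT, data=json.dumps(input_data), timeout=10)
    response.raise_for_status()
    result = response.json()
    _cache[key] = result
    return result
```

Note that caching like this only makes sense when the model returns deterministic results for identical inputs.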
## Security Considerations
- API Key Management: Rotate keys regularly and use the principle of least privilege
- Input Sanitization: Validate and sanitize all inputs before sending to the API
- TLS/HTTPS: Always use HTTPS (the API will reject HTTP requests)
- Response Handling: Don’t expose full API responses to end users
- Rate Limiting: Implement your own client-side rate limiting to avoid service disruption (a minimal sketch follows below)
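As one way to act on that last point, here is a minimal client-side token-bucket throttle. It is a sketch, not part of any Plexe SDK, and the requests-per-second budget is an assumed placeholder; set it below your plan’s actual quota:

```python
import threading
import time

class ClientRateLimiter:
    """Simple token-bucket throttle for outbound prediction requests.

    The requests_per_second default is illustrative, not a documented
    Plexe quota; set it below your plan's actual limit.
    """

    def __init__(self, requests_per_second=10):
        self.capacity = requests_per_second
        self.tokens = float(requests_per_second)
        self.rate = float(requests_per_second)
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        """Block until a request slot is available."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill tokens based on elapsed time, capped at capacity
                self.tokens = min(
                    self.capacity,
                    self.tokens + (now - self.last_refill) * self.rate,
                )
                self.last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
            time.sleep(1.0 / self.rate)

# Usage: call limiter.acquire() before each prediction request, e.g.
#   limiter = ClientRateLimiter(requests_per_second=5)
#   limiter.acquire()
#   result = get_prediction(input_data)
```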