How-To Guides
Use Deployed Models
Learn how to make predictions with deployed models on the Plexe Platform.
After you’ve built and deployed a model on the Plexe Platform, you can use its API endpoint to make predictions (inferences). This guide explains how to interact with deployed models to get predictions for your data.
Prerequisites
- A model that has been successfully deployed (status: READY)
- The deployment ID or endpoint URL
- A valid API key with appropriate permissions
- The input schema for your model (what data format it expects)
Getting the Endpoint URL
Before making predictions, you need to know your model’s endpoint URL. You can obtain this in several ways:
From the Console
- Log in to the Plexe Console
- Navigate to Models → Deployments
- Select your deployment
- Copy the endpoint URL from the Overview tab
Via the API
When you deploy a model on the Plexe Platform, you’ll be able to get the inference URL from the model status. The URL will be in the format:
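The exact host and path are shown in the console and in the deployment status response; the shape below is an illustrative placeholder, not the authoritative format:

```
https://<your-plexe-endpoint>/models/{model_name}/{model_version}/infer
```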
Where:
- {model_name} is the name of your model
- {model_version} is the version number of your model
Making Single Predictions
For predictions with a single data point, use the simple prediction endpoint.
Using cURL
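A minimal request sketch, assuming an illustrative endpoint URL, an x-api-key header, and example input fields; substitute the values shown for your deployment in the Plexe Console:

```bash
# Illustrative only: replace the URL, API key header, and input fields
# with the values shown for your deployment.
curl -X POST "https://<your-plexe-endpoint>/models/my-model/1/infer" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $PLEXE_API_KEY" \
  -d '{"inputs": {"feature_a": 42, "feature_b": "blue"}}'
```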
The response will contain the prediction result:
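An illustrative response shape; the actual fields depend on your model's output schema:

```json
{
  "prediction": 0.87,
  "request_id": "req_abc123"
}
```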
Using Python
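A minimal sketch using the requests library, with the same assumed endpoint URL, header name, and example inputs as above:

```python
import os

import requests

# Illustrative sketch: the endpoint URL, header name, and payload fields are
# placeholders; use the values shown for your deployment in the Plexe Console.
ENDPOINT_URL = "https://<your-plexe-endpoint>/models/my-model/1/infer"
API_KEY = os.environ["PLEXE_API_KEY"]

payload = {"inputs": {"feature_a": 42, "feature_b": "blue"}}

response = requests.post(
    ENDPOINT_URL,
    json=payload,
    headers={"x-api-key": API_KEY},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```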
Using JavaScript
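A minimal sketch using fetch (Node 18+ or the browser), with the same assumed placeholders:

```javascript
// Illustrative sketch: the endpoint URL, header name, and payload fields are
// placeholders; use the values shown for your deployment in the Plexe Console.
const ENDPOINT_URL = "https://<your-plexe-endpoint>/models/my-model/1/infer";

async function predict(inputs) {
  const response = await fetch(ENDPOINT_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": process.env.PLEXE_API_KEY,
    },
    body: JSON.stringify({ inputs }),
  });
  if (!response.ok) {
    throw new Error(`Prediction failed: ${response.status}`);
  }
  return response.json();
}

predict({ feature_a: 42, feature_b: "blue" }).then(console.log);
```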
Handling Errors
Common Error Codes
| HTTP Status | Error Code | Description |
|---|---|---|
| 400 | invalid_input | Input doesn't match model schema |
| 401 | unauthorized | Missing or invalid API key |
| 403 | forbidden | API key doesn't have permission for this deployment |
| 404 | not_found | Deployment ID doesn't exist |
| 429 | rate_limit_exceeded | Too many requests in the allowed time period |
| 500 | prediction_error | Error occurred during model prediction |
| 503 | deployment_unavailable | Deployment is not in READY state |
Error Response Format
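Errors are returned in a structured body; the shape below is illustrative (field names are assumptions, not the authoritative format):

```json
{
  "error": {
    "code": "invalid_input",
    "message": "Field 'feature_a' is required but was not provided.",
    "request_id": "req_abc123"
  }
}
```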
Python Error Handling Example
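A minimal sketch that retries transient errors (429, 503) with exponential backoff and raises on errors that retrying won't fix; the endpoint URL and header name are the same assumed placeholders as above:

```python
import os
import time

import requests

# Placeholders: substitute your deployment's endpoint URL and API key.
ENDPOINT_URL = "https://<your-plexe-endpoint>/models/my-model/1/infer"
API_KEY = os.environ["PLEXE_API_KEY"]


def predict_with_retries(payload, max_retries=3):
    for attempt in range(max_retries):
        response = requests.post(
            ENDPOINT_URL,
            json=payload,
            headers={"x-api-key": API_KEY},
            timeout=30,
        )
        if response.ok:
            return response.json()
        if response.status_code in (429, 503):
            # Transient error: back off and retry.
            time.sleep(2 ** attempt)
            continue
        # Client or server error that retrying will not fix: log and raise.
        print(response.status_code, response.text)
        response.raise_for_status()
    raise RuntimeError(f"Prediction failed after {max_retries} attempts")


if __name__ == "__main__":
    result = predict_with_retries({"inputs": {"feature_a": 42, "feature_b": "blue"}})
    print(result)
```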
Best Practices
- Input Validation: Validate inputs before sending to avoid unnecessary API calls
- Error Handling: Implement robust error handling with retries for transient errors
- Logging: Log request IDs and responses for troubleshooting
- Batch Processing: Use batch endpoints for high-volume prediction needs
- Rate Limiting: Manage your request rate to avoid hitting rate limits
- Monitoring: Track latency and error rates for your production deployments
- Caching: Consider caching prediction results for identical inputs
- Timeouts: Set appropriate timeouts for your application’s needs
Performance Optimization
- Batch When Possible: Use batch predictions for multiple inputs
- Minimize Payload Size: Only include required fields in your requests
- Connection Pooling: Reuse HTTP connections for multiple requests (see the sketch after this list)
- Use CDNs: If serving predictions in user-facing applications, consider placing a CDN in front of your API calls
- Regional Endpoints: Use the endpoint closest to your application (if multiple regions are supported)
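A minimal connection-pooling sketch with requests.Session, reusing the underlying TCP/TLS connection across calls instead of opening a new one per request; the endpoint URL, header name, and input fields are the same assumed placeholders as in the earlier examples:

```python
import os

import requests

# Placeholders: substitute your deployment's endpoint URL and API key.
ENDPOINT_URL = "https://<your-plexe-endpoint>/models/my-model/1/infer"

session = requests.Session()
session.headers.update({"x-api-key": os.environ["PLEXE_API_KEY"]})

payloads = [
    {"inputs": {"feature_a": 42, "feature_b": "blue"}},
    {"inputs": {"feature_a": 7, "feature_b": "green"}},
]

# Each post() reuses the pooled connection held by the session.
for payload in payloads:
    resp = session.post(ENDPOINT_URL, json=payload, timeout=30)
    resp.raise_for_status()
    print(resp.json())
```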
Security Considerations
- API Key Management: Rotate keys regularly and use the principle of least privilege
- Input Sanitization: Validate and sanitize all inputs before sending to the API
- TLS/HTTPS: Always use HTTPS (the API will reject HTTP requests)
- Response Handling: Don’t expose full API responses to end users
- Rate Limiting: Implement your own rate limiting to avoid service disruption