After you’ve built and deployed a model on the Plexe Platform, you can use its API endpoint to make predictions (inferences). This guide explains how to interact with deployed models to get predictions for your data.
## Prerequisites

- A model that has been successfully deployed (status: `READY`)
- The deployment ID or endpoint URL
- A valid API key with appropriate permissions
- The input schema for your model (what data format it expects)
## Getting the Endpoint URL

Before making predictions, you need to know your model’s endpoint URL. You can obtain it in two ways:
### From the Console

1. Log in to Plexe Console
2. Navigate to Models → Deployments
3. Select your deployment
4. Copy the endpoint URL from the Overview tab
### Via the API

When you deploy a model on the Plexe Platform, you can get the inference URL from the model status. The URL has the format:

```
https://api.plexe.ai/models/{model_name}/{model_version}/infer
```

Where:

- `{model_name}` is the name of your model
- `{model_version}` is the version number of your model
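For example, a hypothetical model named `house-price-predictor` deployed at version `1` would be reachable at:

```
https://api.plexe.ai/models/house-price-predictor/1/infer
```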
## Making Single Predictions
For predictions with a single data point, use the simple prediction endpoint.
### Using cURL
```bash
curl -X POST https://api.plexe.ai/models/{model_name}/{model_version}/infer \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "feature1": "value1",
    "feature2": 42,
    "feature3": true
  }'
```
The response will contain the prediction result:
```json
{
  "request_id": "req_789012",
  "result": {
    "predicted_class": "category_b",
    "confidence": 0.87
  },
  "model_id": "model_789xyz",
  "model_version": "1",
  "processing_time_ms": 28
}
```
### Using Python
```python
import requests
import json

API_KEY = "YOUR_API_KEY"
MODEL_NAME = "your_model_name"
MODEL_VERSION = "1"  # Typically "1" for the first version
BASE_URL = "https://api.plexe.ai"

def get_prediction(input_data):
    """
    Get a prediction from a deployed model.

    Args:
        input_data (dict): The input data matching the model's expected schema

    Returns:
        dict: The prediction result
    """
    headers = {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
    endpoint = f"{BASE_URL}/models/{MODEL_NAME}/{MODEL_VERSION}/infer"

    try:
        response = requests.post(
            endpoint,
            headers=headers,
            data=json.dumps(input_data)
        )
        response.raise_for_status()  # Raise exception for error status codes
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error making prediction: {e}")
        if hasattr(e, 'response') and e.response is not None:
            print(f"Response: {e.response.text}")
        return None

# Example usage
input_data = {
    "square_footage": 1950,
    "bedrooms": 3,
    "bathrooms": 2.5,
    "location": "suburban"
}

result = get_prediction(input_data)
if result:
    print(f"Prediction result: {result['result']}")
    print(f"Request ID: {result['request_id']}")
    print(f"Processing time: {result['processing_time_ms']} ms")
```
### Using JavaScript
```javascript
async function getPrediction(inputData) {
  const apiKey = "YOUR_API_KEY";
  const modelName = "your_model_name";
  const modelVersion = "1"; // Typically "1" for the first version
  const endpoint = `https://api.plexe.ai/models/${modelName}/${modelVersion}/infer`;

  try {
    const response = await fetch(endpoint, {
      method: 'POST',
      headers: {
        'x-api-key': apiKey,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(inputData)
    });

    if (!response.ok) {
      throw new Error(`HTTP error! Status: ${response.status}`);
    }

    return await response.json();
  } catch (error) {
    console.error("Error making prediction:", error);
    return null;
  }
}

// Example usage
const inputData = {
  square_footage: 1950,
  bedrooms: 3,
  bathrooms: 2.5,
  location: "suburban"
};

getPrediction(inputData)
  .then(result => {
    if (result) {
      console.log("Prediction result:", result.result);
      console.log("Request ID:", result.request_id);
      console.log("Processing time:", result.processing_time_ms, "ms");
    }
  });
```
## Handling Errors

### Common Error Codes
| HTTP Status | Error Code | Description |
|---|---|---|
| 400 | `invalid_input` | Input doesn’t match the model schema |
| 401 | `unauthorized` | Missing or invalid API key |
| 403 | `forbidden` | API key doesn’t have permission for this deployment |
| 404 | `not_found` | Deployment ID doesn’t exist |
| 429 | `rate_limit_exceeded` | Too many requests in the allowed time period |
| 500 | `prediction_error` | Error occurred during model prediction |
| 503 | `deployment_unavailable` | Deployment is not in `READY` state |
When a request fails, the response body includes a structured error you can use for troubleshooting:

```json
{
  "error": {
    "code": "invalid_input",
    "message": "Input validation failed",
    "details": {
      "feature2": "must be a number between 0 and 100"
    }
  },
  "request_id": "req_789012"
}
```
### Python Error Handling Example
```python
import requests
import json
import time

API_KEY = "YOUR_API_KEY"
MODEL_NAME = "your_model_name"
MODEL_VERSION = "1"
BASE_URL = "https://api.plexe.ai"

def get_prediction_with_retry(input_data, max_retries=3, initial_backoff=1):
    """
    Get a prediction with retry logic for transient errors.

    Args:
        input_data (dict): The input data
        max_retries (int): Maximum number of retry attempts
        initial_backoff (float): Initial backoff time in seconds (doubles each retry)

    Returns:
        dict: The prediction result or None if failed
    """
    headers = {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
    endpoint = f"{BASE_URL}/models/{MODEL_NAME}/{MODEL_VERSION}/infer"
    backoff = initial_backoff

    for attempt in range(max_retries + 1):
        try:
            response = requests.post(
                endpoint,
                headers=headers,
                data=json.dumps(input_data),
                timeout=10  # 10-second timeout
            )

            # If successful, return the result
            if response.status_code == 200:
                return response.json()

            # Handle different error types
            if response.status_code == 400:
                # Bad request - no point retrying
                error_data = response.json().get("error", {})
                print(f"Input validation error: {error_data.get('message')}")
                print(f"Details: {error_data.get('details')}")
                return None
            elif response.status_code == 429:
                # Rate limit - retry with backoff
                if attempt < max_retries:
                    print(f"Rate limit exceeded. Retrying in {backoff} seconds...")
                    time.sleep(backoff)
                    backoff *= 2  # Exponential backoff
                    continue
                else:
                    print("Maximum retries reached. Rate limit still exceeded.")
                    return None
            elif response.status_code in (503, 504):
                # Service unavailable or gateway timeout - retry
                if attempt < max_retries:
                    print(f"Service temporarily unavailable. Retrying in {backoff} seconds...")
                    time.sleep(backoff)
                    backoff *= 2
                    continue
                else:
                    print("Maximum retries reached. Service still unavailable.")
                    return None

            # Other errors
            try:
                error_data = response.json().get("error", {})
                print(f"Error: {error_data.get('code')} - {error_data.get('message')}")
            except ValueError:  # Response body was not valid JSON
                print(f"HTTP Error: {response.status_code} - {response.text}")
            return None

        except requests.exceptions.Timeout:
            if attempt < max_retries:
                print(f"Request timed out. Retrying in {backoff} seconds...")
                time.sleep(backoff)
                backoff *= 2
                continue
            else:
                print("Maximum retries reached. Requests still timing out.")
                return None
        except requests.exceptions.RequestException as e:
            print(f"Request error: {e}")
            return None

    return None  # Should not be reached, but kept as a safeguard
```
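A quick usage sketch, reusing the house-price input from the single-prediction example above:

```python
input_data = {
    "square_footage": 1950,
    "bedrooms": 3,
    "bathrooms": 2.5,
    "location": "suburban"
}

result = get_prediction_with_retry(input_data, max_retries=3)
if result:
    print(f"Prediction result: {result['result']}")
    print(f"Request ID: {result['request_id']}")
```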
## Best Practices
- Input Validation: Validate inputs before sending to avoid unnecessary API calls
- Error Handling: Implement robust error handling with retries for transient errors
- Logging: Log request IDs and responses for troubleshooting
- Batch Processing: Use batch endpoints for high-volume prediction needs instead of looping over single predictions
- Rate Limiting: Manage your request rate to avoid hitting rate limits
- Monitoring: Track latency and error rates for your production deployments
- Caching: Consider caching prediction results for identical inputs (see the sketch after this list)
- Timeouts: Set appropriate timeouts for your application’s needs
- Minimize Payload Size: Only include required fields in your requests
- Connection Pooling: Reuse HTTP connections for multiple requests (see the sketch after this list)
- Use CDNs: If serving a model in user-facing applications, consider a CDN in front of your API calls
- Regional Endpoints: Use the endpoint closest to your application (if multiple regions are supported)
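A minimal sketch of the caching and connection-pooling points above, assuming the same `/models/{model_name}/{model_version}/infer` endpoint and `x-api-key` header used earlier. The in-memory dictionary cache is illustrative only, not part of the Plexe API:

```python
import hashlib
import json
import requests

API_KEY = "YOUR_API_KEY"
ENDPOINT = "https://api.plexe.ai/models/your_model_name/1/infer"

# A Session reuses the underlying TCP/TLS connection across requests
# (connection pooling), avoiding a new handshake per prediction.
session = requests.Session()
session.headers.update({
    "x-api-key": API_KEY,
    "Content-Type": "application/json",
})

# Illustrative in-memory cache keyed on a hash of the input payload.
# In production, consider a bounded cache or an external store.
_cache = {}

def cached_prediction(input_data):
    # Serialize with sorted keys so logically identical inputs hash the same
    key = hashlib.sha256(
        json.dumps(input_data, sort_keys=True).encode()
    ).hexdigest()
    if key in _cache:
        return _cache[key]

    response = session.post(ENDPOINT, data=json.dumps(input_data), timeout=10)
    response.raise_for_status()
    result = response.json()
    _cache[key] = result
    return result
```

Note that caching like this only makes sense when the model returns deterministic results for identical inputs.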
## Security Considerations
- API Key Management: Rotate keys regularly and use the principle of least privilege
- Input Sanitization: Validate and sanitize all inputs before sending to the API
- TLS/HTTPS: Always use HTTPS (the API will reject HTTP requests)
- Response Handling: Don’t expose full API responses to end users
- Rate Limiting: Implement your own client-side rate limiting to avoid service disruption (a minimal sketch follows below)
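As one way to act on that last point, here is a minimal client-side token-bucket throttle. It is a sketch, not part of any Plexe SDK, and the requests-per-second budget is an assumed placeholder; set it below your plan’s actual quota:

```python
import threading
import time

class ClientRateLimiter:
    """Simple token-bucket throttle for outbound prediction requests.

    The requests_per_second default is illustrative, not a documented
    Plexe quota; set it below your plan's actual limit.
    """

    def __init__(self, requests_per_second=10):
        self.capacity = requests_per_second
        self.tokens = float(requests_per_second)
        self.rate = float(requests_per_second)
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        """Block until a request slot is available."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill tokens based on elapsed time, capped at capacity
                self.tokens = min(
                    self.capacity,
                    self.tokens + (now - self.last_refill) * self.rate,
                )
                self.last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
            time.sleep(1.0 / self.rate)

# Usage: call limiter.acquire() before each prediction request, e.g.
#   limiter = ClientRateLimiter(requests_per_second=5)
#   limiter.acquire()
#   result = get_prediction(input_data)
```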