After you’ve built and deployed a model on the Plexe Platform, you can use its API endpoint to make predictions (inferences). This guide explains how to interact with deployed models to get predictions for your data.

Prerequisites

  • A model that has been successfully deployed (status: READY)
  • The deployment ID or endpoint URL
  • A valid API key with appropriate permissions
  • The input schema for your model (what data format it expects)

Getting the Endpoint URL

Before making predictions, you need to know your model’s endpoint URL. You can obtain this in several ways:

From the Console

  1. Log in to Plexe Console
  2. Navigate to ModelsDeployments
  3. Select your deployment
  4. Copy the endpoint URL from the Overview tab

Via the API

When you deploy a model on the Plexe Platform, you’ll be able to get the inference URL from the model status. The URL will be in the format:

https://api.plexe.ai/models/{model_name}/{model_version}/infer

Where:

  • {model_name} is the name of your model
  • {model_version} is the version number of your model

Making Single Predictions

For predictions with a single data point, use the simple prediction endpoint.

Using cURL

curl -X POST https://api.plexe.ai/models/{model_name}/{model_version}/infer \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "feature1": "value1",
    "feature2": 42,
    "feature3": true
  }'

The response will contain the prediction result:

{
  "request_id": "req_789012",
  "result": {
    "predicted_class": "category_b",
    "confidence": 0.87
  },
  "model_id": "model_789xyz",
  "model_version": "1",
  "processing_time_ms": 28
}

Using Python

import requests
import json

API_KEY = "YOUR_API_KEY"
MODEL_NAME = "your_model_name"
MODEL_VERSION = "1"  # Typically "1" for the first version
BASE_URL = "https://api.plexe.ai"

def get_prediction(input_data):
    """
    Get a prediction from a deployed model.
    
    Args:
        input_data (dict): The input data matching the model's expected schema
        
    Returns:
        dict: The prediction result
    """
    headers = {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
    
    endpoint = f"{BASE_URL}/models/{MODEL_NAME}/{MODEL_VERSION}/infer"
    
    try:
        response = requests.post(
            endpoint,
            headers=headers,
            data=json.dumps(input_data)
        )
        response.raise_for_status()  # Raise exception for error status codes
        
        return response.json()
    
    except requests.exceptions.RequestException as e:
        print(f"Error making prediction: {e}")
        if hasattr(e, 'response') and e.response is not None:
            print(f"Response: {e.response.text}")
        return None

# Example usage
input_data = {
    "square_footage": 1950,
    "bedrooms": 3,
    "bathrooms": 2.5,
    "location": "suburban"
}

result = get_prediction(input_data)
if result:
    print(f"Prediction result: {result['result']}")
    print(f"Request ID: {result['request_id']}")
    print(f"Processing time: {result['processing_time_ms']} ms")

Using JavaScript

async function getPrediction(inputData) {
  const apiKey = "YOUR_API_KEY";
  const modelName = "your_model_name";
  const modelVersion = "1";  // Typically "1" for the first version
  const endpoint = `https://api.plexe.ai/models/${modelName}/${modelVersion}/infer`;
  
  try {
    const response = await fetch(endpoint, {
      method: 'POST',
      headers: {
        'x-api-key': apiKey,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(inputData)
    });
    
    if (!response.ok) {
      throw new Error(`HTTP error! Status: ${response.status}`);
    }
    
    return await response.json();
    
  } catch (error) {
    console.error("Error making prediction:", error);
    return null;
  }
}

// Example usage
const inputData = {
  square_footage: 1950,
  bedrooms: 3,
  bathrooms: 2.5,
  location: "suburban"
};

getPrediction(inputData)
  .then(result => {
    if (result) {
      console.log("Prediction result:", result.result);
      console.log("Request ID:", result.request_id);
      console.log("Processing time:", result.processing_time_ms, "ms");
    }
  });

Handling Errors

Common Error Codes

HTTP StatusError CodeDescription
400invalid_inputInput doesn’t match model schema
401unauthorizedMissing or invalid API key
403forbiddenAPI key doesn’t have permission for this deployment
404not_foundDeployment ID doesn’t exist
429rate_limit_exceededToo many requests in the allowed time period
500prediction_errorError occurred during model prediction
503deployment_unavailableDeployment is not in READY state

Error Response Format

{
  "error": {
    "code": "invalid_input",
    "message": "Input validation failed",
    "details": {
      "feature2": "must be a number between 0 and 100"
    }
  },
  "request_id": "req_789012"
}

Python Error Handling Example

import requests
import json
import time

API_KEY = "YOUR_API_KEY"
DEPLOYMENT_ID = "dep_abc123def456"
BASE_URL = "https://api.plexe.ai"

def get_prediction_with_retry(input_data, max_retries=3, initial_backoff=1):
    """
    Get a prediction with retry logic for transient errors.
    
    Args:
        input_data (dict): The input data
        max_retries (int): Maximum number of retry attempts
        initial_backoff (float): Initial backoff time in seconds (doubles each retry)
        
    Returns:
        dict: The prediction result or None if failed
    """
    headers = {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
    
    endpoint = f"{BASE_URL}/predict/{DEPLOYMENT_ID}"
    backoff = initial_backoff
    
    for attempt in range(max_retries + 1):
        try:
            response = requests.post(
                endpoint,
                headers=headers,
                data=json.dumps(input_data),
                timeout=10  # 10-second timeout
            )
            
            # If successful, return the result
            if response.status_code == 200:
                return response.json()
            
            # Handle different error types
            if response.status_code == 400:
                # Bad request - no point retrying
                error_data = response.json().get("error", {})
                print(f"Input validation error: {error_data.get('message')}")
                print(f"Details: {error_data.get('details')}")
                return None
                
            elif response.status_code == 429:
                # Rate limit - should retry with backoff
                if attempt < max_retries:
                    print(f"Rate limit exceeded. Retrying in {backoff} seconds...")
                    time.sleep(backoff)
                    backoff *= 2  # Exponential backoff
                    continue
                else:
                    print("Maximum retries reached. Rate limit still exceeded.")
                    return None
                    
            elif response.status_code in (503, 504):
                # Service unavailable or gateway timeout - retry
                if attempt < max_retries:
                    print(f"Service temporarily unavailable. Retrying in {backoff} seconds...")
                    time.sleep(backoff)
                    backoff *= 2
                    continue
                else:
                    print("Maximum retries reached. Service still unavailable.")
                    return None
            
            # Other errors
            try:
                error_data = response.json().get("error", {})
                print(f"Error: {error_data.get('code')} - {error_data.get('message')}")
            except:
                print(f"HTTP Error: {response.status_code} - {response.text}")
            
            return None
                
        except requests.exceptions.Timeout:
            if attempt < max_retries:
                print(f"Request timed out. Retrying in {backoff} seconds...")
                time.sleep(backoff)
                backoff *= 2
                continue
            else:
                print("Maximum retries reached. Requests still timing out.")
                return None
                
        except requests.exceptions.RequestException as e:
            print(f"Request error: {e}")
            return None
    
    return None  # Should not reach here but just in case

Best Practices

  1. Input Validation: Validate inputs before sending to avoid unnecessary API calls
  2. Error Handling: Implement robust error handling with retries for transient errors
  3. Logging: Log request IDs and responses for troubleshooting
  4. Batch Processing: Use batch endpoints for high-volume prediction needs
  5. Rate Limiting: Manage your request rate to avoid hitting rate limits
  6. Monitoring: Track latency and error rates for your production deployments
  7. Caching: Consider caching prediction results for identical inputs
  8. Timeouts: Set appropriate timeouts for your application’s needs

Performance Optimization

  1. Batch When Possible: Use batch predictions for multiple inputs
  2. Minimize Payload Size: Only include required fields in your requests
  3. Connection Pooling: Reuse HTTP connections for multiple requests
  4. Use CDNs: If serving model in user-facing applications, consider a CDN in front of your API calls
  5. Regional Endpoints: Use the endpoint closest to your application (if multiple regions are supported)

Security Considerations

  1. API Key Management: Rotate keys regularly and use the principle of least privilege
  2. Input Sanitization: Validate and sanitize all inputs before sending to the API
  3. TLS/HTTPS: Always use HTTPS (the API will reject HTTP requests)
  4. Response Handling: Don’t expose full API responses to end users
  5. Rate Limiting: Implement your own rate limiting to avoid service disruption