
    How to Build a Machine Learning System on Serverless Architecture

    August 27, 2025

    Let’s say you’ve built a fantastic machine learning model that performs beautifully in notebooks.

    But a model isn’t truly valuable until it’s in production, serving real users and solving real problems.

    In this article, you’ll learn how to ship a production-ready ML application built on serverless architecture.

    Table of Contents

    • Prerequisites

    • What We’re Building

      • AI Pricing for Retailers

      • The Models

      • Tuning and Training

      • The Prediction

      • Performance Validation

    • The System Architecture

      • Core AWS Resources in the Architecture

    • The Deployment Workflow in Action

      • Step 1: Draft Python Scripts

      • Step 2: Configure Feature/Model Stores in S3

      • Step 3: Create a Flask Application with API Endpoints

      • Step 4: Publish a Docker Image to ECR

      • Step 5: Create a Lambda Function

      • Step 6: Configure AWS Resources

    • Building a Client Application (Optional)

      • The React Application

    • Final Results

    • Conclusion

    Prerequisites

    This project requires some basic experience with:

    • Machine Learning / Deep Learning: The full lifecycle, including data handling, model training, tuning, and validation.

    • Coding: Proficiency in Python, with experience using major ML libraries such as PyTorch and Scikit-Learn.

    • Full-stack deployment: Experience deploying applications using RESTful APIs.

    What We’re Building

    AI Pricing for Retailers

    This project aims to help a mid-sized retailer compete with large players like Amazon.

    Smaller companies often can’t afford deep price discounts, so they face challenges finding optimal price points as they expand their product lines.

    Our goal is to leverage AI models to recommend the best price for a selected product to maximize sales for the retailer, and display it on a client-side user interface (UI):

    Fig. What the UI will look like

    You can explore the UI from here.

    The Models

    I’ll train and tune multiple models so that when the primary model fails, a backup model gets loaded to serve predictions.

    • Primary Model: Multi-layered feedforward network (built with PyTorch)

    • Backup Models: LightGBM, SVR, and Elastic Net (built with Scikit-Learn)

    The backup models are prioritized by learning capability, as sketched below.
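
    To make the fallback order concrete, here's a minimal sketch of walking a priority list until one model loads (the loader mapping and helper function are hypothetical, not the article's actual code):

    # a hypothetical sketch: walk a priority list until one model loads
    MODEL_PRIORITY = ['dfn', 'gbm', 'svr', 'en']  # feedforward net > LightGBM > SVR > Elastic Net

    def load_first_available(loaders: dict):
        """Return the first model whose loader succeeds, following MODEL_PRIORITY."""
        for name in MODEL_PRIORITY:
            try:
                return name, loaders[name]()
            except Exception:
                continue  # loading failed; fall through to the next backup
        raise RuntimeError('no model could be loaded')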

    Tuning and Training

    The primary model was trained on a dataset of around 500,000 samples (source) and fine-tuned using Optuna’s Bayesian Optimization, with grid search available for further refinement.

    The backups are also trained on the same samples and tuned using the Scikit-Optimize framework.

    The Prediction

    All models serve predictions on logged quantity values.

    Logarithmic transformations of the quantity data make the distribution denser, which helps models learn patterns more effectively. This is because logarithms reduce the impact of extreme values, or outliers, and can help normalize skewed data.
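
    As a quick illustration, here is a minimal sketch of the transformation and its inverse (this sketch uses log1p/expm1, a common choice when quantities can be zero):

    import numpy as np

    quantity = np.array([1, 3, 12, 250, 4800])   # skewed, with extreme values
    y_logged = np.log1p(quantity)                # log(1 + q): denser, less skewed target
    y_restored = np.expm1(y_logged)              # exact inverse back to the original scale
    print(y_logged, y_restored)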

    Performance Validation

    We’ll evaluate model performance using different metrics for the transformed and original data, with a lower value always indicating better performance (see the sketch after this list):

    • Logged values: Mean Squared Error (MSE)

    • Actual values: Root Mean Squared Log Error (RMSLE) and Mean Absolute Error (MAE)
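
    For reference, here is a minimal sketch of computing these metrics with Scikit-Learn (the toy arrays are placeholders):

    import numpy as np
    from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_squared_log_error

    y_true = np.array([10.0, 120.0, 3.0])    # actual quantities
    y_pred = np.array([12.0, 100.0, 2.5])    # predicted quantities

    mse_logged = mean_squared_error(np.log1p(y_true), np.log1p(y_pred))  # MSE on logged values
    rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))              # RMSLE on actual values
    mae = mean_absolute_error(y_true, y_pred)                            # MAE on actual values
    print(mse_logged, rmsle, mae)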

    The System Architecture

    We’re going to build a complete ecosystem around an AWS Lambda function to create a scalable ML system:

    Fig. The system architecture (Created by Kuriko IWAI)

    AWS Lambda is a serverless compute service that lets you run an application without managing servers. Once you upload the code, AWS takes on the responsibility of managing the underlying infrastructure.

    In a serverless environment, the code is deployed as a stateless function that runs only when it’s triggered by an event like an HTTP request or a scheduled task.

    This event-driven nature makes serverless computing extremely efficient in resource allocation because:

    • There’s no server management: The cloud provider takes care of operational tasks.

    • You have automatic scaling: Serverless applications automatically scale up or down based on demand.

    • You have pay-per-use billing: You’re charged for the exact amount of compute resources the application consumes.

    Note that other cloud ecosystems like Google Cloud Platform (GCP) and Microsoft Azure offer comprehensive alternatives to AWS. Which one you choose depends on your budget, project type, and familiarity with each ecosystem.

    Core AWS Resources in the Architecture

    The system architecture focuses on the following points:

    • The application is fully containerized on Docker for universal accessibility.

    • The container image is stored in AWS Elastic Container Registry (ECR).

    • The API Gateway’s REST API endpoints trigger an event to invoke the Lambda function.

    • The Lambda function loads the container image from ECR and performs inference.

    • Trained models, processors, and input features are stored in AWS S3 buckets.

    • A Redis client serves cached analytical data and past predictions stored in ElastiCache.

    And to build the system, we’ll use the following AWS resources:

    • Lambda: Serves the function that performs inference.

    • API Gateway: Routes API calls to the Lambda function.

    • S3 Storage: Serves as the feature store and model store.

    • ElastiCache: Stores cached predictions and analytical data.

    • ECR: Stores Docker container images to allow Lambda to pull the image.

    Each resource requires configuration. I’ll explore those details in the next section.

    The Deployment Workflow in Action

    The deployment workflow involves the following steps:

    1. Draft data preparation, model training, and serialization scripts

    2. Configure designated feature store and model store in S3

    3. Create a Flask application with API endpoints

    4. Publish a Docker image to ECR

    5. Create a Lambda function

    6. Configure related AWS resources

    We’ll now walk through each of these steps to help you fully understand the process.

    For your reference, here is the repository structure:

    .
    .venv/                  [.gitignore]    # stores uv venv
    │
    └── data/               [.gitignore]
    │     └──raw/                           # stores raw data
    │     └──preprocessed/                  # stores processed data after imputation and engineering
    │
    └── models/             [.gitignore]    # stores serialized model after training and tuning
    │     └──dfn/                           # deep feedforward network
    │     └──gbm/                           # light gbm
    │     └──en/                            # elastic net
    │     └──production/                    # models to be stored in S3 for production use
    |
    └── notebooks/                          # stores experimentation notebooks
    │
    └── src/                                # core functions
    │     └──_utils/                        # utility functions
    │     └──data_handling/                 # functions to engineer features
    │     └──model/                         # functions to train, tune, validate models
    │     │     └── sklearn_model
    │     │     └── torch_model
    │     │     └── ...
    │     └──main.py                        # main script to run the inference locally
    │
    └──app.py                               # Flask application (API endpoints)
    └──pyproject.toml                       # project configuration
    └──.env                [.gitignore]     # environment variables
    └──uv.lock                              # dependency locking
    └──Dockerfile                           # for Docker container image
    └──.dockerignore
    └──requirements.txt
    └──.python-version                      # python version locking (3.12)
    

    Step 1: Draft Python Scripts

    The first step is to draft Python scripts for data preparation, model training and tuning.

    We’ll run these scripts in a batch process because they are resource-intensive, stateful tasks that aren’t suited to serverless functions, which are optimized for short-lived, stateless, event-driven work.

    Serverless functions can also experience cold starts. With heavy tasks inside the function, the API Gateway would time out before predictions are served.

    src/main.py

    import os
    import torch
    import warnings
    import pickle
    import joblib
    import numpy as np
    import lightgbm as lgb
    from sklearn.linear_model import ElasticNet
    from sklearn.svm import SVR
    from skopt.space import Real, Integer, Categorical
    from dotenv import load_dotenv
    
    import src.data_handling as data_handling
    import src.model.torch_model as t
    import src.model.sklearn_model as sk
    
    
    if __name__ == '__main__': 
        load_dotenv(override=True)
        # path constants (PRODUCTION_MODEL_FOLDER_PATH, *_FILE_PATH, and so on) and the
        # sklearn_models config are assumed to be defined elsewhere in the module
        os.makedirs(PRODUCTION_MODEL_FOLDER_PATH, exist_ok=True)
    
        # create train, validation, test datasets
        X_train, X_val, X_test, y_train, y_val, y_test, preprocessor = data_handling.main_script()
    
        # store the trained preprocessor in local storage
        joblib.dump(preprocessor, PREPROCESSOR_PATH)
    
        # model tuning and training
        best_dfn_full_trained, checkpoint = t.main_script(X_train, X_val, y_train, y_val)
    
        # serialize the trained model
        torch.save(checkpoint, DFN_FILE_PATH)
    
        # svr
        best_svr_trained, best_hparams_svr = sk.main_script(
            X_train, X_val, y_train, y_val, **sklearn_models[1]
        )
        if best_svr_trained is not None:
            with open(SVR_FILE_PATH, 'wb') as f:
                pickle.dump({ 'best_model': best_svr_trained, 'best_hparams': best_hparams_svr }, f)
    
        # elastic net
        best_en_trained, best_hparams_en = sk.main_script(
            X_train, X_val, y_train, y_val, **sklearn_models[0]
        )
        if best_en_trained is not None:
            with open(EN_FILE_PATH, 'wb') as f:
                pickle.dump({ 'best_model': best_en_trained, 'best_hparams': best_hparams_en }, f)
    
        # light gbm
        best_gbm_trained, best_hparams_gbm = sk.main_script(
            X_train, X_val, y_train, y_val, **sklearn_models[2]
        )
    
        if best_gbm_trained is not None:
            with open(GBM_FILE_PATH, 'wb') as f:
                pickle.dump({'best_model': best_gbm_trained, 'best_hparams': best_hparams_gbm }, f)
    

    Run the script to train and serialize the models using the uv package manager:

    $uv venv
    $source .venv/bin/activate
    $uv run src/main.py
    

    The main.py script includes several key components.

    Scripts for Data Handling

    These scripts load the original data, structure missing values, and engineer the features necessary for prediction.

    src/data_handling/main.py

    import os
    import joblib
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    
    import src.data_handling.scripts as scripts
    from src._utils import main_logger
    
    
    # load and save the original data frame in parquet
    df = scripts.load_original_dataframe()
    df.to_parquet(ORIGINAL_DF_PATH, index=False)
    
    # imputation
    df = scripts.structure_missing_values(df=df)
    
    # feature engineering
    df = scripts.handle_feature_engineering(df=df)
    
    # save processed df in csv and parquet
    scripts.save_df_to_csv(df=df)
    df.to_parquet(PROCESSED_DF_PATH, index=False)
    
    
    # for preprocessing, classify numerical and categorical columns
    num_cols, cat_cols = scripts.categorize_num_cat_cols(df=df, target_col=target_col)
    if cat_cols:
        for col in cat_cols: df[col] = df[col].astype('string')
    
    # creates training, validation, and test datasets (test dataset is for inference only)
    y = df[target_col]
    X = df.copy().drop(target_col, axis='columns')
    test_size, random_state = 50000, 42
    X_tv, X_test, y_tv, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_tv, y_tv, test_size=test_size, random_state=random_state
    )
    
    # transform the input datasets
    X_train, X_val, X_test, preprocessor = scripts.transform_input(
        X_train, X_val, X_test, num_cols=num_cols, cat_cols=cat_cols
    )
    
    # retrain and serialize the preprocessor
    if preprocessor is not None: preprocessor.fit(X)
    joblib.dump(preprocessor, PREPROCESSOR_PATH)
    

    Scripts for Model Training and Tuning (PyTorch Model)

    The scripts involve initiating the model, searching for the optimal neural architecture and hyperparameters, and serializing the fully trained model so that the system can load it when performing inference.

    Because the primary model is built on PyTorch and the backups use Scikit-Learn, we’re drafting the scripts separately.

    1. PyTorch Models

    The training script trains the model, validating it on a subset of the training data.

    It also contains early-stopping logic that halts training when the validation loss hasn’t improved for a given number of consecutive epochs (here, 10 epochs).

    src/model/torch_model/scripts/training.py

    import torch
    import torch.nn as nn
    import optuna # type: ignore
    from sklearn.model_selection import train_test_split
    
    from src._utils import main_logger
    
    # device
    device_type = device_type if device_type else 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
    device = torch.device(device_type)
    
    # gradient scaler for stability (only applicable to cuda)
    scaler = torch.GradScaler(device=device_type) if device_type == 'cuda' else None
    
    # start training
    best_val_loss = float('inf')
    epochs_no_improve = 0
    for epoch in range(num_epochs):
        model.train()
        for batch_X, batch_y in train_data_loader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            optimizer.zero_grad()
    
            try:
                # pytorch's AMP system automatically handles the casting of tensors to Float16 or Float32
                with torch.autocast(device_type=device_type):
                    outputs = model(batch_X)
                    loss = criterion(outputs, batch_y)
    
                    # break the training loop when models return nan or inf
                    if torch.any(torch.isnan(outputs)) or torch.any(torch.isinf(outputs)):
                        main_logger.error(
                            'pytorch model returns nan or inf. break the training loop.'
                        )
                        break
    
                # create scaled gradients of losses
                if scaler is not None:
                    scaler.scale(loss).backward()
                scaler.unscale_(optimizer)  # unscale the gradients before clipping
                nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip gradients
                scaler.step(optimizer)  # take the optimizer step with unscaled gradients
                scaler.update()  # update the scale factor
    
                else:
                    loss.backward()
                nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) # clip gradients
                    optimizer.step()
    
            except Exception:  # fall back to full-precision training if AMP fails
                outputs = model(batch_X)
                loss = criterion(outputs, batch_y)
                loss.backward()
                optimizer.step()
    
    
        # run validation on a subset of the training dataset
        model.eval()
        val_loss = 0.0
    
        # switch the torch mode
        with torch.inference_mode():
            for batch_X_val, batch_y_val in val_data_loader:
                batch_X_val, batch_y_val = batch_X_val.to(device), batch_y_val.to(device)
                outputs_val = model(batch_X_val)
                val_loss += criterion(outputs_val, batch_y_val).item()
    
        val_loss /= len(val_data_loader)
    
        # check if early stop
        if val_loss < best_val_loss - min_delta:
            best_val_loss = val_loss
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                main_logger.info(f'early stopping at epoch {epoch + 1}')
                break
    

    The tuning script uses the study component from the Optuna library to run the Bayesian Optimization.

    The study component chooses a neural architecture and hyperparameter set to test from the global search space.

    Then, it builds, trains, and validates the model to find the optimal neural architecture that can minimize the loss (MSE, for instance).

    src/model/torch_model/scripts/tuning.py

    import itertools
    import pandas as pd
    import numpy as np
    import optuna
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, TensorDataset
    from sklearn.model_selection import train_test_split
    
    from src.model.torch_model.scripts.pretrained_base import DFN
    from src.model.torch_model.scripts.training import train_model
    from src._utils import main_logger
    
    # device
    device_type = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
    device = torch.device(device_type)
    
    # loss function
    criterion = nn.MSELoss()
    
    # define objective function for optuna
    def objective(trial):
        # model
        num_layers = trial.suggest_int('num_layers', 1, 20)
        batch_norm = trial.suggest_categorical('batch_norm', [True, False])
        dropout_rates = []
        hidden_units_per_layer = []
        for i in range(num_layers):
            dropout_rates.append(trial.suggest_float(f'dropout_rate_layer_{i}', 0.0, 0.6))
            hidden_units_per_layer.append(trial.suggest_int(f'n_units_layer_{i}', 8, 256)) # hidden units per layer
    
        model = DFN(
            input_dim=X_train.shape[1],
            num_layers=num_layers,
            dropout_rates=dropout_rates,
            batch_norm=batch_norm,
            hidden_units_per_layer=hidden_units_per_layer
        ).to(device)
    
        # optimizer
        learning_rate = trial.suggest_float('learning_rate', 1e-10, 1e-1, log=True)
        optimizer_name = trial.suggest_categorical('optimizer', ['adam', 'rmsprop', 'sgd', 'adamw', 'adamax', 'adadelta', 'radam'])
        optimizer = _handle_optimizer(optimizer_name=optimizer_name, model=model, lr=learning_rate)
    
        # data loaders
        batch_size = trial.suggest_categorical('batch_size', [32, 64, 128, 256])
        test_size = 10000 if len(X_train) > 15000 else int(len(X_train) * 0.2)
        X_train_search, X_val_search, y_train_search, y_val_search = train_test_split(X_train, y_train, test_size=test_size, random_state=42)
        train_data_loader = create_torch_data_loader(X=X_train_search, y=y_train_search, batch_size=batch_size)
        val_data_loader = create_torch_data_loader(X=X_val_search, y=y_val_search, batch_size=batch_size)
    
        # training
        num_epochs = 3000 # ensure enough epochs (early stopping would stop the loop when overfitting)
        _, best_val_loss = train_model(
            train_data_loader=train_data_loader,
            val_data_loader=val_data_loader,
            model=model,
            optimizer=optimizer,
            criterion = criterion,
            num_epochs=num_epochs,
            trial=trial,
        )
        return best_val_loss
    
    
    # start to optimize hyperparameters and architecture
    study = optuna.create_study(direction='minimize', sampler=optuna.samplers.TPESampler())
    study.optimize(objective, n_trials=50, timeout=600)
    
    # best 
    best_trial = study.best_trial
    best_hparams = best_trial.params
    
    # construct the model based on the tuning results
    best_lr = best_hparams['learning_rate']
    best_batch_size = best_hparams['batch_size']
    input_dim = X_train.shape[1]
    best_model = DFN(
        input_dim=input_dim,
        num_layers=best_hparams['num_layers'],
        hidden_units_per_layer=[v for k, v in best_hparams.items() if 'n_units_layer_' in k],
        batch_norm=best_hparams['batch_norm'],
        dropout_rates=[v for k, v in best_hparams.items() if 'dropout_rate_layer_' in k],
    ).to(device)
    
    # construct an optimizer based on the tuning results
    best_optimizer_name = best_hparams['optimizer']
    best_optimizer = _handle_optimizer(
        optimizer_name=best_optimizer_name, model=best_model, lr=best_lr
    )
    
    # create torch data loaders
    train_data_loader = create_torch_data_loader(
        X=X_train, y=y_train, batch_size=best_batch_size
    )
    val_data_loader = create_torch_data_loader(
        X=X_val, y=y_val, batch_size=best_batch_size
    )
    
    # retrain the best model with full training dataset applying the optimal batch size and optimizer
    best_model, _ = train_model(
        train_data_loader=train_data_loader,
        val_data_loader=val_data_loader,
        model=best_model,
        optimizer=best_optimizer,
        criterion = criterion,
        num_epochs=1000
    )
    
    # create a checkpoint for serialization (reconstruct the model using the checkpoint)
    checkpoint = {
        'state_dict': best_model.state_dict(),
        'hparams': best_hparams,
        'input_dim': X_train.shape[1],
        'optimizer': best_optimizer,
        'batch_size': best_batch_size
    }
    
    # serialize the model w/ checkpoint
    torch.save(checkpoint, FILE_PATH)
    

    2. Scikit-Learn Models (Backups)

    For Scikit-Learn models, we’ll run k-fold cross validation during training to prevent overfitting.

    K-fold cross-validation is a technique for evaluating a machine learning model’s performance by training and testing it on different subsets of training data.

    We define the run_kfold_validation function where the model is trained and validated using 5-fold cross-validation.

    src/model/sklearn_model/scripts/tuning.py

    from sklearn.model_selection import KFold
    from sklearn.metrics import mean_squared_error

    from src._utils import main_logger
    
    def run_kfold_validation(
            X_train,
            y_train,
            base_model,
            hparams: dict,
            n_splits: int = 5, # the number of folds 
            early_stopping_rounds: int = 10,
            max_iters: int = 200
        ) -> float:
    
        mses = 0.0
    
        # create k-fold component
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    
        for fold, (train_index, val_index) in enumerate(kf.split(X_train)):
            # create a subset of training and validation datasets from the entire training data
            X_train_fold, X_val_fold = X_train.iloc[train_index], X_train.iloc[val_index]
            y_train_fold, y_val_fold = y_train.iloc[train_index], y_train.iloc[val_index]
    
            # reconstruct a model
            model = base_model(**hparams)
    
            # start the cross validation
            best_val_mse = float('inf')
            patience_counter = 0
            best_model_state = None
            best_iteration = 0
    
            for iteration in range(max_iters):
                # train on a subset of the training data
                try:
                    model.train_one_step(X_train_fold, y_train_fold, iteration)
                except:
                    model.fit(X_train_fold, y_train_fold)
    
                # make a prediction on validation data 
                y_pred_val_kf = model.predict(X_val_fold)
    
                # compute validation loss (MSE)
                current_val_mse = mean_squared_error(y_val_fold, y_pred_val_kf)
    
                # check if epochs should be stopped (early stopping)
                if current_val_mse < best_val_mse:
                    best_val_mse = current_val_mse
                    patience_counter = 0
                    best_model_state = model.get_params()
                    best_iteration = iteration
                else:
                    patience_counter += 1

                # execute early stopping when patience_counter exceeds early_stopping_rounds
                if patience_counter >= early_stopping_rounds:
                    main_logger.info(f"Fold {fold}: Early stopping triggered at iteration {iteration} (best at {best_iteration}). Best MSE: {best_val_mse:.4f}")
                    break
    
    
            # after training epochs, reconstruct the best performing model 
            if best_model_state: model.set_params(**best_model_state)
    
            # make prediction
            y_pred_val_kf = model.predict(X_val_fold)
    
            # add MSEs
            mses += mean_squared_error(y_pred_val_kf, y_val_fold)
    
        # compute the final loss (average of MSEs across folds)
        ave_mse = mses / n_splits
        return ave_mse
    

    Then, for the tuning script, we use the gp_minimize function from the Scikit-Optimize library to tune hyperparameters with Bayesian optimization.

    This function intelligently searches the best hyperparameter set that can minimize the model’s error, which is calculated using the run_kfold_validation function defined earlier.

    The best-performing hyperparameters are then used to reconstruct and train the final model.

    src/model/sklearn_model/scripts/tuning.py

    from functools import partial
    from skopt import gp_minimize
    
    
    # define the objective function for Bayesian Optimization using Scikit-Optimize
    def objective(params, X_train, y_train, base_model, hparam_names):
        hparams = {item: params[i] for i, item in enumerate(hparam_names)}
        ave_mse = run_kfold_validation(X_train=X_train, y_train=y_train, base_model=base_model, hparams=hparams)
        return ave_mse
    
    # extract hyperparameter names from the search space (space is defined per model)
    hparam_names = [s.name for s in space]
    objective_partial = partial(objective, X_train=X_train, y_train=y_train, base_model=base_model, hparam_names=hparam_names)
    
    # search the optimal hyperparameters
    results = gp_minimize(
        func=objective_partial,
        dimensions=space,
        n_calls=n_calls,
        random_state=42,
        verbose=False,
        n_initial_points=10,
    )
    # results
    best_hparams = dict(zip(hparam_names, results.x))
    best_mse = results.fun
    
    # reconstruct the model with the best hyperparameters
    best_model = base_model(**best_hparams)
    
    # retrain the model with full training dataset
    best_model.fit(X_train, y_train)
    

    Step 2: Configure Feature/Model Stores in S3

    The trained models and the processed data are stored in the S3 bucket (the processed data as Parquet files).

    We’ll draft the s3_upload function where the Boto3 client, a low-level interface to an AWS service, initiates the connection to S3:

    import os
    import boto3
    from dotenv import load_dotenv
    
    from src._utils import main_logger
    
    def s3_upload(file_path: str):
        # initiate the boto3 client
        load_dotenv(override=True)
        S3_BUCKET_NAME = os.environ.get('S3_BUCKET_NAME') # the bucket created in s3
        s3_client = boto3.client('s3', region_name=os.environ.get('AWS_REGION_NAME')) # your default region
    
        if s3_client:
            # create s3 key and upload the file to the bucket
            s3_key = file_path if file_path[0] != '/' else file_path[1:]
            s3_client.upload_file(file_path, S3_BUCKET_NAME, s3_key)
            main_logger.info(f"file uploaded to s3://{S3_BUCKET_NAME}/{s3_key}")
        else:
            main_logger.error('failed to create an S3 client.')
    

    Model Store

    Trained PyTorch models are serialized (converted) into .pth files.

    Then, these files are uploaded to the S3 bucket, enabling the system to load the trained model when it performs inference in production.

    import torch
    
    from src._utils import s3_upload
    
    # model serialization, store in local
    torch.save(trained_model.state_dict(), MODEL_FILE_PATH)
    
    # upload to s3 model store
    s3_upload(file_path=MODEL_FILE_PATH)
    

    Feature Store

    The processed data is converted into CSV and Parquet file formats.

    Then, the Parquet files are uploaded to the S3 bucket, enabling the system to load the lightweight data when it creates prediction data to perform inference in production.

    from src._utils import s3_upload
    
    # store csv and parquet files in local
    df.to_csv(file_path, index=False)
    df.to_parquet(DATA_FILE_PATH, index=False)
    
    # store in s3 feature store
    s3_upload(file_path=DATA_FILE_PATH)
    
    # trained preprocessor is also stored to transform the prediction data
    s3_upload(file_path=PROCESSOR_PATH)
    

    Step 3: Create a Flask Application with API Endpoints

    Next, we’ll create a Flask application with API endpoints.

    The Flask application is configured in the app.py file located at the root of the project repository.

    As shown in the code snippet below, the app.py file contains the following components, in order:

    1. AWS Boto3 client setup,

    2. Flask app configuration and API endpoint setup,

    3. Loading the trained preprocessor, processed input data X_test, and trained models,

    4. Invoking the Lambda function via API Gateway, and

    5. The local test section.

    Note that X_test should never be used during model training to avoid data leakage.

    app.py

    import os
    import json

    import awsgi
    import boto3
    import numpy as np
    import pandas as pd
    import torch
    from flask import Flask, request, jsonify
    from flask_cors import cross_origin
    from waitress import serve
    from dotenv import load_dotenv

    from src._utils import main_logger
    
    # global variables (will be loaded from the S3 buckets)
    _redis_client = None
    X_test = None
    preprocessor = None
    model = None
    backup_model = None
    
    # load env if local else skip (lambda refers to env in production)
    AWS_LAMBDA_RUNTIME_API = os.environ.get('AWS_LAMBDA_RUNTIME_API', None)
    if AWS_LAMBDA_RUNTIME_API is None: load_dotenv(override=True)
    
    
    #### <---- 1. AWS BOTO3 CLIENT ---->
    # boto3 client 
    S3_BUCKET_NAME = os.environ.get('S3_BUCKET_NAME', 'ml-sales-pred')
    s3_client = boto3.client('s3', region_name=os.environ.get('AWS_REGION_NAME', 'us-east-1'))
    try:
        # test connection to boto3 client
        sts_client = boto3.client('sts')
        identity = sts_client.get_caller_identity()
        main_logger.info(f"Lambda is using role: {identity['Arn']}")
    except Exception as e:
        main_logger.error(f"Lambda credentials/permissions error: {e}")
    
    #### <---- 2. FLASK CONFIGURATION & API ENDPOINTS ---->
    # configure the flask app
    app = Flask(__name__)
    app.config['CORS_HEADERS'] = 'Content-Type'
    
    # add a simple API endpoint to serve the prediction by price point to test
    @app.route('/v1/predict-price/<string:stockcode>', methods=['GET', 'OPTIONS'])
    @cross_origin(origins=origins, methods=['GET', 'OPTIONS'], supports_credentials=True)
    def predict_price(stockcode):
        df_stockcode = None
    
        # fetch request params
        data = request.args.to_dict()
    
        try:
            # fetch cache
            if _redis_client is not None:
                # returns cached prediction results if any without performing inference
                cached_prediction_result = _redis_client.get(cache_key_prediction_result_by_stockcode)
                if cached_prediction_result: 
                    return jsonify(json.loads(cached_prediction_result))
    
                # historical data of the selected product
                cached_df_stockcode = _redis_client.get(cache_key_df_stockcode)
                if cached_df_stockcode: df_stockcode = json.loads(cached_df_stockcode)
    
    
            # define the price range to make predictions. can be a request param, or historical min/max prices
            min_price = float(data.get('unitprice_min', df_stockcode['unitprice_min'][0]))
            max_price = float(data.get('unitprice_max', df_stockcode['unitprice_max'][0]))
    
            # create bins in the price range. as the number of bins increases, the prediction curve becomes smoother, but requires more computational cost
            NUM_PRICE_BINS = int(data.get('num_price_bins', 100))
            price_range = np.linspace(min_price, max_price, NUM_PRICE_BINS)
    
            # create a prediction dataset by merging X_test (dataset never used in model training) and df_stockcode
            price_range_df = pd.DataFrame({ 'unitprice': price_range })
            if X_test is not None:
                test_sample = X_test.sample(n=1000, random_state=42)
                test_sample_merged = test_sample.merge(price_range_df, how='cross')
            else:
                test_sample_merged = price_range_df
            # keep the unitprice column coming from the price range
            test_sample_merged.drop('unitprice_x', axis=1, inplace=True, errors='ignore')
            test_sample_merged.rename(columns={'unitprice_y': 'unitprice'}, inplace=True)
    
            # preprocess the dataset
            X = preprocessor.transform(test_sample_merged) if preprocessor else test_sample_merged
    
            # perform inference
            y_pred_actual = None
            epsilon = 0
            # try using the primary model
            if model:
                input_tensor = torch.tensor(X, dtype=torch.float32)
                model.eval()
                with torch.inference_mode():
                    y_pred = model(input_tensor)
                    y_pred = y_pred.cpu().numpy().flatten()
                    y_pred_actual = np.exp(y_pred + epsilon)
    
            # if not, use backups
            elif backup_model:
                y_pred = backup_model.predict(X)
                y_pred_actual = np.exp(y_pred + epsilon)
    
    
            # finalize the outcome for client app
            df_ = test_sample_merged.copy()
            df_['quantity'] = np.floor(y_pred_actual) # quantity must be an integer
            df_['sales'] = df_['quantity'] * df_['unitprice'] # compute sales
            df_ = df_.sort_values(by='unitprice')
    
            # aggregate the results by the unitprice in the price range
            df_results = df_.groupby('unitprice').agg(
                quantity=('quantity', 'median'),
                quantity_min=('quantity', 'min'),
                quantity_max=('quantity', 'max'),
                sales=('sales', 'median'),
            ).reset_index()
    
            # find the optimal price point
            optimal_row = df_results.loc[df_results['sales'].idxmax()]
            optimal_price = optimal_row['unitprice']
            optimal_quantity = optimal_row['quantity']
            best_sales = optimal_row['sales']
    
            all_outputs = []
            for _, row in df_results.iterrows():
                current_output = {
                    "stockcode": stockcode,
                    "unit_price": float(row['unitprice']),
                    'quantity': int(row['quantity']),
                    'quantity_min': int(row['quantity_min']),
                    'quantity_max': int(row['quantity_max']),
                    "predicted_sales": float(row['sales']),
                }
                all_outputs.append(current_output)
    
            # store the prediction results in cache
            if all_outputs and _redis_client is not None:
                serialized_data = json.dumps(all_outputs)
                _redis_client.set(
                    cache_key_prediction_result_by_stockcode, 
                    serialized_data,
                    ex=3600     # expire in an hour
                )
    
            # return a list of all outputs
            return jsonify(all_outputs)
    
        except Exception as e:
            main_logger.error(f'prediction failed: {e}')
            return jsonify([])
    
    
    # request header management (for the process from API gateway to the Lambda)
    @app.after_request
    def add_header(response):
        response.headers['Cache-Control'] = 'public, max-age=0'
        response.headers['Access-Control-Allow-Origin'] = CLIENT_A
        response.headers['Access-Control-Allow-Headers'] = 'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token,Origin'
        response.headers['Access-Control-Allow-Methods'] = 'GET, POST, OPTIONS'
        response.headers['Access-Control-Allow-Credentials'] = 'true'
        return response
    
    #### <---- 3. LOADING PROCESSOR, DATASET, AND MODELS ---->
    load_preprocessor()
    load_x_test()
    load_model()
    
    #### <---- 4. INVOKE LAMBDA ---->
    def handler(event, context):
        main_logger.info("lambda handler invoked.")
        try:
            # connecting the redis client after the lambda is invoked
            get_redis_client()
        except Exception as e:
            main_logger.critical(f"failed to establish initial Redis connection in handler: {e}")
            return {
                'statusCode': 500,
                'body': json.dumps({'error': 'Failed to initialize Redis client. Check environment variables and network config.'})
            }
    
        # use the awsgi package to convert JSON to WSGI
        return awsgi.response(app, event, context)
    
    
    #### <---- 5. FOR LOCAL TEST ---->
    # serve the application locally on WSGI server, waitress
    # lambda will ignore this section.
    if __name__ == '__main__':   
        if os.getenv('ENV') == 'local':
            main_logger.info("...start the operation (local)...")
            serve(app, host='0.0.0.0', port=5002)
        else:
            app.run(host='0.0.0.0', port=8080)
    

    I’ll test the endpoint locally using the uv package manager:

    $uv run app.py --cache-clear
    
    $curl http://localhost:5002/v1/predict-price/{STOCKCODE}
    

    The system provided a list of sales predictions for each price point:

    Fig. Screenshot of the Flask app local response

    Key Points on Flask App Configuration

    There are various points you should take into consideration when configuring a Flask application with Lambda. Let’s go over them now:

    1. A Few API Endpoints Per Container

    Adding many API endpoints to a single serverless instance can lead to a monolithic-function problem, where issues in one endpoint impact the others.

    In this project, we’ll focus on a single endpoint per container – and if needed, we can add separate Lambda functions to the system.

    2. Understanding the handler Function and the role of AWSGI

    The handler function is invoked every time the Lambda function receives a client request from the API Gateway.

    The function takes the event argument that includes the request details in a JSON dictionary and passes it to the Flask application.

    AWSGI acts as an adapter, translating a Lambda event in JSON format into a WSGI request that a Flask application can understand, and converts the application’s response back into a JSON format that Lambda and API Gateway can process.
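
    To make this concrete, here is roughly what an API Gateway REST event looks like before awsgi translates it (field values are illustrative):

    # an illustrative API Gateway (REST) event passed to handler(event, context)
    event = {
        'httpMethod': 'GET',
        'path': '/v1/predict-price/85123A',
        'headers': {'Accept': 'application/json'},
        'queryStringParameters': {'num_price_bins': '100'},
        'body': None,
        'isBase64Encoded': False,
    }
    # awsgi.response(app, event, context) maps these fields onto a WSGI environ,
    # routes the request through Flask, and returns {'statusCode', 'headers', 'body'}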

    3. Using Cache Storage

    The get_redis_client function is called when the API Gateway invokes the handler function. This allows the Flask application to store and fetch cached data from the Redis client:

    import os

    import redis
    import redis.cluster
    from redis.cluster import ClusterNode

    from src._utils import main_logger
    
    _redis_client = None
    
    def get_redis_client():
        global _redis_client
        if _redis_client is None:
            REDIS_HOST = os.environ.get("REDIS_HOST")
            REDIS_PORT = int(os.environ.get("REDIS_PORT", 6379))
            REDIS_TLS = os.environ.get("REDIS_TLS", "true").lower() == "true"
            try:
                startup_nodes = [ClusterNode(host=REDIS_HOST, port=REDIS_PORT)]
                _redis_client = redis.cluster.RedisCluster(
                    startup_nodes=startup_nodes,
                    decode_responses=True,
                    skip_full_coverage_check=True,
                    ssl=REDIS_TLS,                  # elasticache has encryption in transit: enabled -> must be true
                    ssl_cert_reqs=None,
                    socket_connect_timeout=5,
                    socket_timeout=5,
                    health_check_interval=30,
                    retry_on_timeout=True,
                    retry_on_error=[
                        redis.exceptions.ConnectionError,
                        redis.exceptions.TimeoutError
                    ],
                    max_connections=10,            # limit connections for Lambda
                    max_connections_per_node=2     # limit per node
                )
                _redis_client.ping()
                main_logger.info("successfully connected to ElastiCache Redis Cluster (Configuration Endpoint)")
            except Exception as e:
                main_logger.error(f"an unexpected error occurred during Redis Cluster connection: {e}", exc_info=True)
                _redis_client = None
        return _redis_client
    
    4. Handling Heavy Tasks Outside of the handler Function

    Serverless functions can experience cold start delays.

    While a Lambda function can run for up to 15 minutes, its associated API Gateway has a timeout of 29 seconds (29,000 ms) for a RESTful API.

    So, any heavy tasks like loading preprocessors, input data, or models should be performed once outside of the handler function, ensuring they are ready before the API endpoint is called.

    Here are the loading functions called in app.py.

    app.py

    import os

    import joblib
    import pandas as pd
    import torch

    import src.model.torch_model as t
    from src._utils import s3_load, s3_load_to_temp_file

    # assumption: Lambda runs on CPU, so models are mapped to the CPU device
    device = torch.device('cpu')
    
    preprocessor = None
    X_test = None
    model = None
    backup_model = None
    
    
    # load processor
    def load_preprocessor():
        global preprocessor
        preprocessor_tempfile_path = s3_load_to_temp_file(PREPROCESSOR_PATH)
        preprocessor = joblib.load(preprocessor_tempfile_path)
        os.remove(preprocessor_tempfile_path)
    
    
    # load input data
    def load_x_test():
        global X_test
        x_test_io = s3_load(file_path=X_TEST_PATH)
        X_test = pd.read_parquet(x_test_io)
    
    
    # load model
    def load_model():
        global model, backup_model
        # try loading & reconstructing the primary model
        try:
            # first load io file from the s3 bucket
            model_data_bytes_io_ = s3_load(file_path=DFN_FILE_PATH)
            # convert to checkpoint dictionary (containing hyperparameter set)
            checkpoint_ = torch.load(
                model_data_bytes_io_, 
                weights_only=False, 
                map_location=device
            )
            # reconstruct the model
            model = t.scripts.load_model(checkpoint=checkpoint_, file_path=DFN_FILE_PATH)
            # set the model evaluation mode
            model.eval()
    
        # else, fall back to the backup model
        except Exception:
            load_artifacts_backup_model()
    
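    The article doesn’t show the s3_load and s3_load_to_temp_file helpers referenced above; a minimal Boto3 sketch might look like this (the bucket and region come from environment variables, as in s3_upload):

    import io
    import os
    import tempfile

    import boto3

    s3_client = boto3.client('s3', region_name=os.environ.get('AWS_REGION_NAME'))
    S3_BUCKET_NAME = os.environ.get('S3_BUCKET_NAME')

    def s3_load(file_path: str) -> io.BytesIO:
        # download the object into memory and return a file-like buffer
        obj = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=file_path.lstrip('/'))
        return io.BytesIO(obj['Body'].read())

    def s3_load_to_temp_file(file_path: str) -> str:
        # download the object to a temp file and return its path (caller deletes it)
        fd, tmp_path = tempfile.mkstemp()
        os.close(fd)
        s3_client.download_file(S3_BUCKET_NAME, file_path.lstrip('/'), tmp_path)
        return tmp_path
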

    Step 4: Publish a Docker Image to ECR

    After configuring the Flask application, we’ll containerize the entire application on Docker.

    Containerization packages the application as a container, which in a machine learning context includes the models, their dependencies, and configuration.

    Docker creates a container image based on the instructions defined in a Dockerfile, and the Docker engine uses the image to run the isolated container.

    In this project, we’ll upload the Docker container image to ECR, so the Lambda function can access it in production.

    Before building the image, we’ll define the .dockerignore file to keep the container image lean:

    .dockerignore

    # any irrelevant data
    __pycache__/
    .ruff_cache/
    .DS_Store/
    .venv/
    dist/
    .vscode
    *.psd
    *.pdf
    [a-f]*.log
    tmp/
    awscli-bundle/
    
    # add any experimental models, unnecessary data
    dfn_bayesian/
    dfn_grid/
    data/
    notebooks/
    

    Dockerfile

    # serve from aws ecr 
    FROM public.ecr.aws/lambda/python:3.12
    
    # define a working directory in the container
    WORKDIR /app
    
    # copy the entire repository (except .dockerignore) into the container at /app
    COPY . /app/
    
    # install dependencies defined in the requirements.txt
    RUN pip install --no-cache-dir -r requirements.txt
    
    # define commands
    ENTRYPOINT [ "python" ]
    CMD [ "-m", "awslambdaric", "app.handler" ]
    

    Test in Local

    Next, we’ll test the Docker image by building the container named my-app locally:

    $docker build -t my-app -f Dockerfile .
    

    Then, we’ll run the container with the waitress server locally:

    $docker run -p 5002:5002 -e ENV=local my-app app.py
    

    The -e ENV=local flag sets the environment variable inside the container, which will trigger the waitress.serve() call in the app.py.

    In the terminal, you’ll find a message saying the application is being served:

    Fig. Flask app response

    You can also call the endpoint created to see the results returned:

    $uv run app.py --cache-clear
    
    $curl http://localhost:5002/v1/predict-price/{STOCKCODE}
    

    Publish the Docker Image to ECR

    To publish the Docker image, we first need to configure the default AWS credentials and region:

    • From the AWS account console, issue an access token and check the default region.

    • Store them in the ~/.aws/credentials and ~/.aws/config files:

    ~/.aws/credentials

    [default] 
    aws_secret_access_key=
    aws_access_key_id=
    

    ~/.aws/config

    [default]
    region=
    

    After the configuration, we’ll publish the Docker image to ECR.

    # authenticate the docker client to ECR
    $aws ecr get-login-password --region <your-aws-region> | docker login --username AWS --password-stdin <your-aws-account-id>.dkr.ecr.<your-aws-region>.amazonaws.com
    
    # create repository
    $aws ecr create-repository --repository-name <your-repo-name> --region <your-aws-region>
    
    # tag the docker image
    $docker tag <your-repo-name>:<your-app-version>  <your-aws-account-id>.dkr.ecr.<your-aws-region>.amazonaws.com/<your-repo-name>:<your-app-version>
    
    # push
    $docker push <your-aws-account-id>.dkr.ecr.<your-aws-region>.amazonaws.com/<your-repo-name>:<your-app-version>
    

    Here’s what’s going on:

    • <your-aws-region>: Your default AWS region (for example, us-east-1 ).

    • <your-aws-account-id>: 12-digit AWS account ID.

    • <your-repo-name>: Your desired repository name.

    • <your-app-version>: Your desired tag name (for example, v1.0).

    Now, the Docker image is stored in ECR with the tag:

    Fig. Screenshot of the AWS ECR console

    Step 5: Create a Lambda Function

    Next, we’ll create a Lambda function.

    From the Lambda console, choose:

    • The Container Image option,

    • The container image URL from the pull down list,

    • A function name of our choice, and

    • An architecture type (arm64 is recommended for better price-performance).

    Fig. Screenshot of AWS Lambda function configuration

    The Lambda function my-app was successfully launched.

    Connect the Lambda function to API Gateway

    Next, we’ll add API Gateway as an event trigger to the Lambda function.

    First, visit the API Gateway console and create REST API methods using the ARN of the Lambda function:

    Fig. Screenshot of the AWS API Gateway configuration

    Then, add resources to the created API Gateway to create an endpoint:
    API Gateway > APIs > Resources > Create Resource

    • Align the resource endpoint with the API endpoint defined in the app.py.

    • Configure CORS (for example, accept specific origins).

    • Deploy the resource to the stage.

    Going back to the Lambda console, you’ll find the API Gateway is connected as an event trigger:
    Lambda > Function > my-app (your function name)

    Fig. Screenshot of the AWS Lambda dashboard

    Step 6: Configure AWS Resources

    Lastly, we’ll configure the related AWS resources to make the system work in production.

    This process involves the following steps:

    1. The IAM Role: Controls Who Can Access Resources

    AWS requires IAM roles to grant temporary, secure permissions to users, mitigating security risks related to long-term credentials like passwords.

    The IAM role leverages policies to grant access to the selected services. Policies can be issued by AWS or customized by defining an inline policy (see the sample inline policy after the list below).

    It is important to avoid overly permissive access rights for the IAM role.

    1. In the Lambda function console, check the execution role:
      Lambda > Function > <FUNCTION> > Permissions > Execution role.

    2. Set up the following policies to allow the Lambda’s IAM role to handle necessary operations:

      • Lambda AWSLambdaExecute: Allows executing the function.

      • EC2 Inline policy: Allows controlling the security group and the VPC of the Lambda function.

      • ECR AmazonElasticContainerRegistryPublicFullAccess + Inline policy: Allows storing and pulling the Docker image.

      • ElastiCache AmazonElastiCacheFullAccess + Inline policy: Allows storing and pulling caches.

      • S3: AmazonS3ReadOnlyAccess + Inline policy: Allows reading and storing contents.
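
    As an illustration, an inline policy scoping the role’s S3 access to the project bucket might look like this (a sketch; the bucket name is a placeholder):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowModelAndFeatureStoreAccess",
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
          "Resource": [
            "arn:aws:s3:::<YOUR BUCKET NAME>",
            "arn:aws:s3:::<YOUR BUCKET NAME>/*"
          ]
        }
      ]
    }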

    Now, the IAM role can access these resources and perform the allowed actions.

    2. The Security Group: Controls Network Traffic

    A security group is a virtual firewall that controls inbound and outbound network traffic for AWS resources.

    It uses stateful (allowing return traffic automatically) “allow-only” rules based on protocol, port, and IP address, where it denies all traffic by default.

    Create a new security group for the Lambda function:
    EC2 > Security Groups > <YOUR SECURITY GROUP>

    Now, we’ll want to set up the inbound and outbound traffic rules.

    The inbound rules:

    • S3 → Lambda: Type: HTTPS / Protocol: TCP / Port range: 443 / Source: Custom*

    • ElastiCache → Lambda: Type: Custom TCP / Port range: 6379 / Source: Custom*

    *Choose the security group created for the Lambda function as the custom source.

    The outbound rules:

    • Lambda → Internet: Type: HTTPS / Protocol: TCP / Port range: 443 / Destination: 0.0.0.0/0

    • ElastiCache → Internet: Type: All traffic / Destination: 0.0.0.0/0

    3. The Virtual Private Cloud (VPC)

    A Virtual Private Cloud (VPC) provides a logically isolated private network for the AWS resources, acting as our own private data center within AWS.

    AWS can create a Hyperplane ENI (Elastic Network Interface) for the Lambda function and its connected resources in the subnets of the VPC.

    Though it’s optional, we’ll use the VPC to connect the Lambda function to the S3 storage and ElastiCache.

    This process involves:

    1. Creating a VPC from the VPC console: VPC > Create VPC.

    2. Creating an STS (Security Token Service) endpoint:
      VPC > PrivateLink and Lattice > Endpoints > Create Endpoint >

      • Type: AWS service

      • Service name: com.amazonaws.<YOUR REGION>.sts

      • Type: Interface

      • VPC: Select the VPC created earlier.

      • Subnets: Select all subnets.

      • Security groups: Select the security group of the Lambda function.

      • Policy: Full access

      • Enable DNS names

    The VPC must have a dedicated endpoint for STS to receive temporary credentials from STS.

    3. Creating an S3 endpoint in the VPC:
      VPC > PrivateLink and Lattice > Endpoints > Create Endpoint >

      • Type: AWS service

      • Service name: com.amazonaws.<YOUR REGION>.s3

      • Type: Gateway

      • VPC: Select the VPC created earlier.

      • Subnets: Select all subnets.

      • Security groups: Select the security group of the Lambda function.

      • Policy: Full access

    Lastly, check the security group of the Lambda function and ensure that its VPC ID points to the VPC we created: EC2 > Security Group > <YOUR SECURITY GROUP FOR THE LAMBDA FUNCTION> > VPC ID.

    That’s all for the deployment flow.

    We can now test the API endpoint in production. Copy the Invoke URL of the deployed API endpoint: API Gateway > APIs > Stages > Invoke URL. Then call the API endpoint and check that it returns predictions:

    $curl -H 'Authorization: Bearer YOUR_API_TOKEN' -H 'Accept: application/json' \
         '<INVOKE URL>/<ENDPOINT>'
    

    For logging and debugging, we’ll use CloudWatch Live Tail: CloudWatch > Live Tail.

    Building a Client Application (Optional)

    For full-stack deployment, we’ll build a simple React application to display the prediction using the recharts library for visualization.

    Other options for quick frontend deployment include Streamlit or Gradio.

    The React Application

    The React application creates a web page that fetches and visualizes sales predictions from an external API, recommending an optimal price point.

    The app uses useState to manage its data and state, including the selected product, the list of sales predictions, and the loading/error status.

    When the user initiates a request, a useEffect hook triggers a fetch request to a Flask backend. It handles the API response as a data stream, processing it line by line to progressively update the predictions.

    The AreaChart from the recharts library then visualizes this data. The X-axis represents the price and the Y-axis represents the sales. The chart updates in real-time as the data streams in. Finally, the app displays the optimal price once all the predictions are received.

    App.js: (in a separate React app)

    import { useState, useEffect } from "react"
    import { AreaChart, Area, XAxis, YAxis, CartesianGrid, Tooltip, ResponsiveContainer, ReferenceLine } from 'recharts'

    // placeholder product catalog; replace histPrices/errorPrices with real fallback data
    const productOptions = [
      {
        id: '85123A',
        histPrices: [],   // fallback chart data shown before any prediction arrives
        errorPrices: [],  // fallback chart data shown when the API call fails
      },
    ]

    // minimal loading indicator shown while predictions are fetched
    const Loading = () => <p>Loading predictions…</p>


    function App() {
      // state
      const [predictions, setPredictions] = useState([])
      const [start, setStart] = useState(false)
      const [isLoading, setIsLoading] = useState(false)

      // product data
      const selectedStockcode = '85123A'
      const selectedProduct = productOptions.find(item => item.id === selectedStockcode)

      // api endpoint
      const flaskBackendUrl = "YOUR FLASK BACKEND URL"
    
      // create chart data to display
      const chartDataSales = predictions && predictions.length > 0
        ? predictions
          .map(item => ({
            price: item.unit_price,
            sales: item.predicted_sales,
            volume: item.unit_price !== 0 ? item.predicted_sales / item.unit_price : 0
          }))
          .sort((a, b) => a.price - b.price)
        : [...selectedProduct['histPrices']]
    
      // optimal price to display
      const optimalPrice = predictions.length > 0
        ? predictions.sort((a, b) => b.predicted_sales - a.predicted_sales)[0]['unit_price']
        : 0
    
      // fetch prediction results
      useEffect(() => {
        const handlePrediction = async () => {
          setIsLoading(true)
          setPredictions([])
          const errorPrices = selectedProduct['errorPrices']

          try {
            const res = await fetch(flaskBackendUrl)
            if (res.status !== 200) {
              setPredictions(errorPrices)
              return
            }
            const data = await res.json()
            // fall back to the error prices when the response is empty
            setPredictions(data && data.length > 0 ? data : errorPrices)
          } catch (err) {
            setPredictions(errorPrices)
          } finally {
            // reset flags whether the request succeeded or failed
            setIsLoading(false)
            setStart(false)
          }
        }

        if (start) handlePrediction()
      }, [flaskBackendUrl, start])
    
    
      // render
      if (isLoading) return <Loading />
      return (
        <div>
          {/* chartDataSales is already sorted by price when it's built above */}
          <ResponsiveContainer width="100%" height={400}>
            <AreaChart
              key={chartDataSales.length}
              data={chartDataSales}
              margin={{ top: 10, right: 30, left: 0, bottom: 0 }}
            >
              {/* gradient referenced by the area's fill="url(#colorSales)" below */}
              <defs>
                <linearGradient id="colorSales" x1="0" y1="0" x2="0" y2="1">
                  <stop offset="5%" stopColor="#8884d8" stopOpacity={0.8} />
                  <stop offset="95%" stopColor="#8884d8" stopOpacity={0} />
                </linearGradient>
              </defs>

              <CartesianGrid strokeDasharray="3 3" strokeOpacity={0.6} />
    
              <XAxis
                dataKey="price"
                label={{ value: "Unit Price ($)", position: "insideBottom", offset: 0, fontSize: 12, marginTop: 10 }}
                tickFormatter={(tick) => `$${parseFloat(tick).toFixed(2)}`}
                tick={{ fontSize: 12 }}
                padding={{ left: 20, right: 20 }}
              />
    
              <YAxis
                label={{ value: "Predicted Sales ($)", angle: -90, position: "insideLeft", fontSize: 12 }}
                tick={{ fontSize: 12 }}
                tickFormatter={(tick) => `$${tick.toLocaleString()}`}
              />
    
              {/* tooltips with the prediction result data */}
              <Tooltip
                contentStyle={{
                  borderRadius: '8px',
                  padding: '10px',
                  boxShadow: '0px 0px 15px rgba(0,0,0,0.5)'
                }}
                formatter={(value, name) => {
                  if (name === 'sales') {
                    return [`$${value.toFixed(4)}`, 'Predicted Sales']
                  }
                  if (name === 'volume') {
                    return [`${value.toFixed(0)}`, 'Volume']
                  }
                  return value
                }}
                labelFormatter={(label) => `Price: $${label.toFixed(2)}`}
              />
    
              {/* chart area = sales */}
              <Area
                type="monotone"
                dataKey="sales"
                fillOpacity={1}
                fill="url(#colorSales)"
              />
    
              {/* vertical line for the optimal price */}
          {optimalPrice > 0 &&
                <ReferenceLine
                  x={optimalPrice}
                  strokeDasharray="4 4"
                  ifOverflow="visible"
                  label={{
                    value: `Optimal Price: $${optimalPrice !== null && optimalPrice > 0 ? Math.ceil(optimalPrice * 10000) / 10000 : ''}`,
                    position: "right",
                    fontSize: 12,
                    offset: 10
                  }}
                />
              }
            </AreaChart>
          </ResponsiveContainer>
    
      {optimalPrice > 0 && <p>Optimal Price: $ {Math.ceil(optimalPrice * 10000) / 10000}</p>}
    
        </div>
      )
    }
    
    export default App
    

    Final Results

    Now, the application is ready to serve.

    You can explore the UI from here.

    All the backend code is available in my GitHub repo.

    Conclusion

    Building a machine learning system requires thoughtful project scoping and architecture design.

    In this article, we built a dynamic pricing system as a simple single interface on containerized serverless architecture.

    Moving forward, we’d need to consider potential drawbacks of this minimal architecture:

    • Longer cold starts: The awsgi WSGI adapter layer adds a small overhead, and loading a larger container image takes more time.

    • Monolithic function: Adding endpoints to the Lambda function can lead to a monolithic function where an issue in one endpoint impacts others.

    • Less granular observability: AWS CloudWatch cannot provide individual invocation/error metrics per API endpoint without custom instrumentation.

    To scale the application effectively, extracting functionality into separate microservices would be a good next step.

    I’m Kuriko IWAI, and you can find more of my work and learn more about me here:

    Portfolio / LinkedIn / Github

    All images, unless otherwise noted, are by the author. This application utilizes synthetic dataset licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    This information about AWS is current as of August 2025 and is subject to change.

    Source: freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More 
