I would like to show you how I implement loss functions inside my Keras models, which gives you more flexibility.

There are numerous arguments for why this way is better, but here are my main points for using this method with more complex models:

  • Loss calculation is encapsulated in the model and can be reached by the name of a layer
  • You can define multiple losses with just a few model outputs, and vice versa

So how should we work with Keras model losses originally? (reference)

We would need to define our loss function(s) when we compile the tf-graph:

from keras import losses
model = build_model()
model.compile(loss=losses.mean_squared_error, optimizer='sgd')

Now this works when our model has 1 output.
Of course, with multiple outputs (for example 2) you could define the loss functions like this:

model.compile(loss=[losses.mean_squared_error, losses.binary_crossentropy], optimizer='sgd')

As you can see, we simply passed a list of loss functions. This means the first function from the list will be used for the first model output and the second loss for the second output.

And this is a very good way of providing an understandable API for basic use. Most of the time you can use this method.
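To make this concrete, here is a minimal sketch of a two-output model compiled with one loss per output. It assumes tf.keras; the layer names, sizes, and output heads are made up for illustration and are not from the article:

```python
from tensorflow import keras

# Illustrative toy model with two outputs (names/sizes are hypothetical)
inp = keras.Input(shape=(8,))
hidden = keras.layers.Dense(16, activation="relu")(inp)
out_reg = keras.layers.Dense(1, name="regression")(hidden)
out_cls = keras.layers.Dense(1, activation="sigmoid", name="classification")(hidden)

model = keras.Model(inp, [out_reg, out_cls])
# Losses are matched to outputs by position in the list:
model.compile(loss=[keras.losses.mean_squared_error,
                    keras.losses.binary_crossentropy],
              optimizer="sgd")
```

Keras pairs the first loss with the first output and the second loss with the second output, purely by position.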

The drawback of this is that every loss function must use the same signature:

my_custom_loss_function(y_true, y_pred)

So… what if my first loss needed the result of the second one? Or what if I would like to use some intermediate tensor from inside my model in my loss calculation?

In the original scenario you would need to pack everything together into the y_pred variable and then unpack it inside the loss function, which produces hard-to-read code and a huge loss function.
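To illustrate the packing problem, here is a hedged sketch (NumPy stands in for the backend ops, and all names, shapes, and the penalty term are made up): the extra tensor gets concatenated onto y_pred, and the loss has to slice it apart again before it can do anything.

```python
import numpy as np

# Hypothetical layout: y_pred carries 4 prediction columns plus 2 columns of
# an intermediate tensor the loss also needs, packed together by the model.
def packed_loss(y_true, y_pred):
    preds = y_pred[:, :4]           # the actual predictions
    intermediate = y_pred[:, 4:]    # the smuggled-in intermediate tensor
    mse = np.mean((y_true - preds) ** 2)
    penalty = np.mean(intermediate ** 2)  # some made-up regularization term
    return mse + penalty

batch = 3
y_true = np.zeros((batch, 4))
y_pred = np.concatenate([np.ones((batch, 4)), np.zeros((batch, 2))], axis=1)
print(packed_loss(y_true, y_pred))
```

Every loss that needs extra information has to know this column layout, which is exactly the kind of coupling that makes the loss function huge and fragile.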

Instead of this approach let’s calculate the loss inside the model!

# Building my model from layer to layer...
# ...
# ...
# ...
# my_models_code ....

# Define Loss functions as a Keras Layer
my_custom_loss_1 = keras.layers.Lambda(lambda x: custom_loss_sub_graph(*x),
                           name="my_loss")([tensor_1, tensor_3, tensor_16])

# Defining inputs and outputs
inputs = [...]
outputs = [..., my_custom_loss_1, ...]
model = keras.models.Model(inputs, outputs)

As you can see, I created my model (I left the code out as it is not relevant), then I defined a new Lambda layer with a name. The loss calculation happens inside this layer, and we can reference it by its name.

  • custom_loss_sub_graph is just a function which receives 3 tensors and calculates something (your custom loss)
  • Why do I use a Lambda layer?
    • Because Keras requires that the outputs of the model are outputs of Keras layers
    • (Of course you can write a separate layer for the calculation to avoid wrapping a sub-graph into a Lambda layer)
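For reference, custom_loss_sub_graph could look something like this. This is only a sketch: the actual math (a reconstruction term plus a weighted regularizer) and the 0.1 weight are invented for illustration, and the real calculation depends entirely on your model:

```python
import tensorflow as tf

# Hypothetical loss sub-graph: receives three tensors from inside the model
# and returns a per-sample loss tensor. The formula itself is made up.
def custom_loss_sub_graph(tensor_1, tensor_3, tensor_16):
    reconstruction = tf.reduce_mean(tf.square(tensor_1 - tensor_3), axis=-1)
    regularizer = tf.reduce_mean(tf.abs(tensor_16), axis=-1)
    return reconstruction + 0.1 * regularizer
```

Since it returns a tensor, it can be wrapped in the Lambda layer exactly as shown above.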

When we define the outputs of the model, we include this layer as well, so it becomes part of the model's graph.

Now here comes the tricky part which feels a bit hacky (and I would love to have a better solution).

Compiling model with custom losses and metrics

After the step above we need to compile our model.

import tensorflow as tf

loss_layer_names = {"my_loss", ...}

# Adding losses
for name in loss_layer_names:
    layer = model.get_layer(name)
    loss = tf.reduce_mean(layer.output, keepdims=True)
    model.add_loss(loss)

# Adding metrics
for name in loss_layer_names:
    layer = model.get_layer(name)
    loss = tf.reduce_mean(layer.output, keepdims=True)
    model.metrics_names.append(name)
    model.metrics_tensors.append(loss)

model.compile(optimizer="adam", loss=[None] * len(model.outputs))

What happens here?

  • First, I define the names of the loss layers
  • Add losses
    • Add the mean of the tensor for every layer I defined using model.add_loss
  • Add metrics
    • Add the name of the layer as a metric name with model.metrics_names.append
    • Add the mean value of the tensor like at losses with model.metrics_tensors.append
  • Inside the model.compile function set every loss for every model output to None

That’s it!
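As a side note for newer versions: metrics_tensors no longer exists in current tf.keras, so on TF 2.x the same "loss lives inside the model" idea is usually expressed with a custom layer that calls self.add_loss. The sketch below is my own illustration of that alternative, not the article's original code; the layer, its names, and the reconstruction loss are all made up:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# A layer that computes a (hypothetical) reconstruction loss and registers
# it with the model via self.add_loss — no compile-time loss needed.
class ReconstructionLoss(keras.layers.Layer):
    def call(self, inputs):
        x, x_hat = inputs
        per_sample = tf.reduce_mean(tf.square(x - x_hat), axis=-1)
        self.add_loss(tf.reduce_mean(per_sample))  # scalar loss for training
        return per_sample                          # still a normal output tensor

inp = keras.Input(shape=(4,))
hidden = keras.layers.Dense(8, activation="relu")(inp)
x_hat = keras.layers.Dense(4)(hidden)
loss_out = ReconstructionLoss(name="my_loss")([inp, x_hat])

model = keras.Model(inp, [x_hat, loss_out])
model.compile(optimizer="adam")  # no per-output losses: add_loss provides them
```

With this setup you can call model.fit(x) without explicit targets, since every training signal comes from add_loss, and the loss tensor is still reachable as a named model output.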

I think this is a somewhat cleaner way of implementing our custom loss functions, and I like the idea that losses are part of my model. Of course there are other ways of doing this; I just wanted to show you my preferred method.