• I’ve seen this a few times. Not sure it works.
  • I assume it works better than just a linear layer
    • After this layer you still have the final layer / output, so it’s just the layer that aggregates the results for the output.