- https://kikaben.com/swiglu-2020/#swiglu-activation
-
- I think the product in this case is element-wise (Hadamard), not a matrix multiplication
- since the output of Swish is an n×n matrix and xV is an n×n matrix, the two have the same shape and can be multiplied entry by entry
- Sigmoid Linear Unit (SiLU), i.e. Swish with β = 1
-
- GPT
- used by Mixtral
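
To check the shape argument in the notes above, here is a minimal sketch of SwiGLU in plain Python, assuming no bias terms and β = 1. The helper names (`swish`, `matmul`, `swiglu`) and the toy 2×2 matrices are made up for illustration and are not taken from the linked article.

```python
import math

def swish(z, beta=1.0):
    """Swish: z * sigmoid(beta * z). With beta = 1 this is SiLU."""
    return z / (1.0 + math.exp(-beta * z))

def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]

def swiglu(x, w, v):
    """Swish(xW) multiplied entry by entry with xV (bias terms omitted)."""
    gate = matmul(x, w)    # xW, passed through Swish below
    value = matmul(x, v)   # xV, same shape as xW
    return [[swish(g) * u for g, u in zip(g_row, u_row)]
            for g_row, u_row in zip(gate, value)]

# Toy inputs chosen only for illustration.
x = [[1.0, 2.0], [0.0, -1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]   # identity, so xW == x
v = [[2.0, 0.0], [0.0, 2.0]]   # so xV == 2 * x
out = swiglu(x, w, v)
```

Each entry of `out` depends only on the matching entries of Swish(xW) and xV, which is why the two matrices must have the same shape. A matrix multiplication of the same two operands would mix rows and columns and generally give a different result.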