We can describe the architecture, training procedure, and inference flow of an LLM, but what about its hidden/internal state?
What do all these layers in an LLM do?