Hi,
Because each API call currently sends the entire message history to the API endpoint, this causes:
  1. Increasing token consumption
  2. Messages reaching the context length limit
Therefore, I would like an option to limit the number of messages sent per request for each chat to n. In that case, each request would include the system prompt, the first user message and its first reply, and the last n messages. This would reduce the context sent with each request, avoiding rapid token consumption and hitting the context length limit.
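If it helps, here is a minimal sketch of the truncation I have in mind, assuming OpenAI-style message dicts; the function name and parameters are only illustrative:

```python
def truncate_history(messages, n):
    """Keep the system prompt, the first user/assistant exchange,
    and the last n messages of the conversation.

    `messages` is assumed to be a chronological list of
    {"role": ..., "content": ...} dicts.
    """
    # System prompt(s), usually at the start of the conversation.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # First user message and the first assistant reply.
    head = rest[:2]

    # Last n messages, excluding the ones already kept in `head`.
    tail = rest[2:][-n:] if n > 0 else []

    return system + head + tail
```

For example, with a 50-message chat and n = 6, each request would only send the system prompt, the first two messages, and the latest six, instead of all 50.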
I hope you will consider this suggestion.
Thank you.