Comment on Reddit about the possible need for async
There is an interesting discussion on Reddit about why Oppkey is looking at async tools.
We think we need async because multiple people are making multiple LLM calls. We do not have thousands of requests a second. However, we do have streams of data coming in from multiple sources, including the Django database, which is PostgreSQL with pgvector acting as a vector database.
We have things working on Django, but it is confusing for us because the application was originally written as synchronous code.
These are my initial notes:
htmx-tutorials/docs/djangjo-vs-fastapi.md at main · codetricity/htmx-tutorials · GitHub
Our previous experience was with Django and data coming from PostgreSQL, which is generally fast. In the past, we had some problems waiting for a complex sort to finish, but we generally solved them by optimizing the SQL calls or dividing the work up.
Now, we have a call to an LLM which might take several minutes.
I want several people to be able to start several LLM calls at the same time and do other things while they wait. So there may be hundreds, but not thousands, of requests running at the same time.
The LLM is going to be OpenAI or Anthropic eventually. Our experience is that the response is slow.
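To make the "hundreds of slow calls at once" case concrete, here is a minimal sketch using only the Python standard library. It is not our actual code: `fake_llm_call` is a hypothetical stand-in that sleeps instead of calling OpenAI or Anthropic, and the prompt text and delay are made up for illustration.

```python
import asyncio
import time


async def fake_llm_call(prompt: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a slow OpenAI or Anthropic request."""
    # While this coroutine waits, the event loop is free to run other work.
    await asyncio.sleep(delay)
    return f"response to: {prompt}"


async def run_many(prompts: list[str], delay: float = 0.2) -> list[str]:
    """Start every call at once and wait until all of them finish."""
    return await asyncio.gather(*(fake_llm_call(p, delay) for p in prompts))


def demo() -> tuple[list[str], float]:
    """Run 100 simulated calls concurrently and time the whole batch."""
    prompts = [f"question {i}" for i in range(100)]
    start = time.perf_counter()
    results = asyncio.run(run_many(prompts))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = demo()
    print(len(results), "responses in", round(elapsed, 2), "seconds")
```

Because the waits overlap, the batch should take roughly as long as one call rather than 100 calls back to back. A real LLM client would replace the sleep with an awaitable network request.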
Is there an easier approach I should look at?
Your assessment is likely influenced by this set of videos, which focused on streaming:
- FastAPI Introduction - Publish HTML Directly - https://youtu.be/fmrQVbrQ9kw
- FastAPI Streaming to HTMX with SSE - https://youtu.be/D5l_A_kqUhI
- FastAPI and Ollama - Getting Response with HTMX - https://youtu.be/El_-vCpxmTQ
- HTMX with Stream of Chunks from LLM - https://youtu.be/pL86FqeRX08
In addition, we can do the following:
- send status messages while the person is waiting
- run a report in the background
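
The two items above can be sketched together with `asyncio.create_task`. This is only an illustration under assumptions: `slow_report` is a hypothetical placeholder for the real report or LLM call, and the status messages are collected in a list here, whereas a web app might send each one to the browser (for example as an SSE event).

```python
import asyncio


async def slow_report(delay: float = 0.3) -> str:
    """Hypothetical stand-in for a long-running report or LLM call."""
    await asyncio.sleep(delay)
    return "report ready"


async def report_with_status(delay: float = 0.3, tick: float = 0.1) -> list[str]:
    """Run the report in the background and collect status messages meanwhile."""
    messages: list[str] = []
    # create_task schedules the report to run in the background.
    task = asyncio.create_task(slow_report(delay))
    while not task.done():
        # In a web app, this is where a status update could be pushed to the user.
        messages.append("still working...")
        # Yield control so the report (and other requests) can make progress.
        await asyncio.sleep(tick)
    messages.append(task.result())
    return messages


if __name__ == "__main__":
    for message in asyncio.run(report_with_status()):
        print(message)
```

The point of the design is that the status loop and the report share one event loop, so neither blocks the other while the slow work is pending.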