chore(docs): updated ML documentation (#4063)

2026-01-21 00:26:03 +00:00 · 2023-09-12 02:22:42 -04:00
parent 7173af60e4
commit cb437829f3
5 changed files with 75 additions and 17 deletions
--- a/machine-learning/README.md
+++ b/machine-learning/README.md
@@ -17,6 +17,8 @@ Be sure to commit the `poetry.lock` and `pyproject.toml` files to reflect any ch

 To measure inference throughput and latency, you can use [Locust](https://locust.io/) using the provided `locustfile.py`.
 Locust works by querying the model endpoints and aggregating their statistics, meaning the app must be deployed.
-You can run `load_test.sh` to automatically deploy the app locally and start Locust, optionally adjusting its env variables as needed.
+You can change the models or adjust options like score thresholds through the Locust UI.

-Alternatively, for more custom testing, you may also run `locust` directly: see the [documentation](https://docs.locust.io/en/stable/index.html). Note that in Locust's jargon, concurrency is measured in `users`, and each user runs one task at a time. To achieve a particular per-endpoint concurrency, multiply that number by the number of endpoints to be queried. For example, if there are 3 endpoints and you want each of them to receive 8 requests at a time, you should set the number of users to 24.
+To get started, run `locust --web-host 127.0.0.1` and open `localhost:8089` in a browser to access the UI. See the [Locust documentation](https://docs.locust.io/en/stable/index.html) for more information on running Locust.
+
+Note that in Locust's jargon, concurrency is measured in `users`, and each user runs one task at a time. To achieve a particular per-endpoint concurrency, multiply that number by the number of endpoints to be queried. For example, if there are 3 endpoints and you want each of them to receive 8 requests at a time, you should set the number of users to 24.
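
The user-count arithmetic in the note above can be sketched as follows. This is an illustration only: the helper name `total_users` is hypothetical and is not part of the repository or of Locust, and it assumes requests are spread evenly across endpoints.

```python
def total_users(per_endpoint_concurrency: int, num_endpoints: int) -> int:
    """Return the Locust user count for a target per-endpoint concurrency.

    Hypothetical helper for illustration; not part of the repository.
    """
    if per_endpoint_concurrency < 0 or num_endpoints < 0:
        raise ValueError("both arguments must be non-negative")
    # Each Locust user runs one task at a time, so the total is the
    # per-endpoint concurrency multiplied by the number of endpoints.
    return per_endpoint_concurrency * num_endpoints


if __name__ == "__main__":
    # The example from the note: 3 endpoints, 8 concurrent requests each.
    print(total_users(8, 3))  # prints 24
```

The result is the value to enter as the number of users in the Locust UI.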