REST API
MLBench provides a basic REST API through which most functionality can also be used. It is accessible through the /api/ endpoints on the dashboard URL.
Pods
GET /api/pods/
Lists all worker pods available in the cluster, including status information.
Example request:
    GET /api/pods HTTP/1.1
    Host: example.com
    Accept: application/json, text/javascript
Example response:
    HTTP/1.1 200 OK
    Vary: Accept
    Content-Type: text/javascript

    [
      {
        "name": "worn-mouse-mlbench-worker-55bbdd4d8c-4mxh5",
        "labels": "{'app': 'mlbench', 'component': 'worker', 'pod-template-hash': '1166880847', 'release': 'worn-mouse'}",
        "phase": "Running",
        "ip": "10.244.2.58"
      },
      {
        "name": "worn-mouse-mlbench-worker-55bbdd4d8c-bwwsp",
        "labels": "{'app': 'mlbench', 'component': 'worker', 'pod-template-hash': '1166880847', 'release': 'worn-mouse'}",
        "phase": "Running",
        "ip": "10.244.3.57"
      }
    ]
- Request Headers
Accept – the response content type depends on Accept header
- Response Headers
Content-Type – this depends on Accept header of request
- Status Codes
200 OK – no error
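In the example response above, `labels` is a string holding a Python-style dict (single quotes), not a nested JSON object, so a client needs a second parsing step. The following is a minimal sketch using only the standard library. The base URL and the helper names `fetch_pods`, `parse_labels`, and `running_worker_ips` are illustrative assumptions, not part of MLBench.

```python
import ast
import json
import urllib.request

BASE_URL = "http://example.com"  # assumption: replace with your dashboard URL


def fetch_pods(base_url=BASE_URL):
    """GET /api/pods/ and return the decoded JSON list of pods."""
    request = urllib.request.Request(
        base_url + "/api/pods/", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_labels(pod):
    """Turn the string-encoded `labels` field into a real dict."""
    return ast.literal_eval(pod["labels"])


def running_worker_ips(pods):
    """Return the IPs of pods in phase "Running" that are labelled as workers."""
    return [
        pod["ip"]
        for pod in pods
        if pod["phase"] == "Running"
        and parse_labels(pod).get("component") == "worker"
    ]
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for this field.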
Metrics
GET /api/metrics/
Gets metrics (CPU, memory, etc.) for all worker pods.
Example request:
    GET /api/metrics HTTP/1.1
    Host: example.com
    Accept: application/json, text/javascript
Example response:
    HTTP/1.1 200 OK
    Vary: Accept
    Content-Type: text/javascript

    {
      "quiet-mink-mlbench-worker-0": {
        "container_cpu_usage_seconds_total": [
          { "date": "2018-08-03T09:21:38.594282Z", "value": "0.188236813" },
          { "date": "2018-08-03T09:21:50.244277Z", "value": "0.215950298" }
        ]
      },
      "quiet-mink-mlbench-worker-1": {
        "container_cpu_usage_seconds_total": [
          { "date": "2018-08-03T09:21:29.347960Z", "value": "0.149286015" },
          { "date": "2018-08-03T09:21:44.266181Z", "value": "0.15325329" }
        ],
        "container_cpu_user_seconds_total": [
          { "date": "2018-08-03T09:21:29.406238Z", "value": "0.1" },
          { "date": "2018-08-03T09:21:44.331823Z", "value": "0.1" }
        ]
      }
    }
- Request Headers
Accept – the response content type depends on Accept header
- Response Headers
Content-Type – this depends on Accept header of request
- Status Codes
200 OK – no error
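Metric values arrive as strings and dates as ISO 8601 timestamps with a trailing `Z`. A series such as `container_cpu_usage_seconds_total` appears, going by its name, to be a cumulative counter, so a rate is often more useful than the raw value. The sketch below computes an average rate under that assumption and also assumes the samples are ordered by date; `average_rate` is a hypothetical helper, not part of MLBench.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches the timestamps in the example response


def average_rate(samples):
    """Average increase per second between the first and last sample.

    Returns None when fewer than two samples exist or no time has elapsed.
    """
    if len(samples) < 2:
        return None
    first, last = samples[0], samples[-1]
    elapsed = (
        datetime.strptime(last["date"], DATE_FORMAT)
        - datetime.strptime(first["date"], DATE_FORMAT)
    ).total_seconds()
    if elapsed <= 0:
        return None
    # Values are strings in the API response, so convert before subtracting.
    return (float(last["value"]) - float(first["value"])) / elapsed
```

For a CPU-seconds counter, the result can be read as the average fraction of one core used over the sampled interval.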
GET /api/metrics/(str: pod_name_or_run_id)/
Gets metrics (CPU, memory, etc.) for a single worker pod or run, selected by pod name or run id.
Example request:
    GET /api/metrics/quiet-mink-mlbench-worker-1/ HTTP/1.1
    Host: example.com
    Accept: application/json, text/javascript
Example response:
    HTTP/1.1 200 OK
    Vary: Accept
    Content-Type: text/javascript

    {
      "container_cpu_usage_seconds_total": [
        { "date": "2018-08-03T09:21:29.347960Z", "value": "0.149286015" },
        { "date": "2018-08-03T09:21:44.266181Z", "value": "0.15325329" }
      ],
      "container_cpu_user_seconds_total": [
        { "date": "2018-08-03T09:21:29.406238Z", "value": "0.1" },
        { "date": "2018-08-03T09:21:44.331823Z", "value": "0.1" }
      ]
    }
- Query Parameters
since – only return metrics newer than this date (default: 1970-01-01T00:00:00.000000Z)
metric_type – one of pod or run; determines what kind of metric to get (default: pod)
- Request Headers
Accept – the response content type depends on Accept header
- Response Headers
Content-Type – this depends on Accept header of request
- Status Codes
200 OK – no error
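The `since` and `metric_type` query parameters can be combined with the path parameter as in the sketch below. It only builds the URL, so it can be used with any HTTP client. `metrics_url` is a hypothetical helper and `http://example.com` stands in for the dashboard URL.

```python
from urllib.parse import quote, urlencode


def metrics_url(base_url, pod_name_or_run_id, since=None, metric_type="pod"):
    """Build the URL for GET /api/metrics/(pod_name_or_run_id)/."""
    if metric_type not in ("pod", "run"):
        raise ValueError("metric_type must be 'pod' or 'run'")
    params = {"metric_type": metric_type}
    if since is not None:
        # Expected format per the docs: 1970-01-01T00:00:00.000000Z
        params["since"] = since
    path = "/api/metrics/{}/".format(quote(str(pod_name_or_run_id), safe=""))
    return base_url.rstrip("/") + path + "?" + urlencode(params)


# Metrics of one pod, newer than a given timestamp.
print(metrics_url(
    "http://example.com",
    "quiet-mink-mlbench-worker-1",
    since="2018-08-03T09:21:44.000000Z",
))
```

Passing the last seen `date` as `since` on the next call is one way to fetch only new samples when polling.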
POST /api/metrics
Saves metrics. “pod_name” and “run_id” are mutually exclusive. The fields of metrics and their types are defined in mlbench/api/models/kubemetrics.py.
Example request:
    POST /api/metrics HTTP/1.1
    Host: example.com
    Accept: application/json, text/javascript

    {
      "pod_name": "quiet-mink-mlbench-worker-1",
      "name": "accuracy",
      "date": "2018-08-03T09:21:44.331823Z",
      "value": "0.7845",
      "cumulative": false,
      "metadata": "some additional data"
    }
Example response:
    HTTP/1.1 201 CREATED
    Vary: Accept
    Content-Type: text/javascript

    {
      "pod_name": "quiet-mink-mlbench-worker-1",
      "name": "accuracy",
      "date": "2018-08-03T09:21:44.331823Z",
      "value": "0.7845",
      "cumulative": false,
      "metadata": "some additional data"
    }
- Request Headers
Accept – the response content type depends on Accept header
- Response Headers
Content-Type – this depends on Accept header of request
- Status Codes
201 Created – no error
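Because “pod_name” and “run_id” are mutually exclusive, it can help to check the payload on the client before sending it. The sketch below builds a request body with the field names from the example request. `build_metric_payload` is a hypothetical helper, and sending `value` as a string simply mirrors the example rather than a confirmed requirement.

```python
import json


def build_metric_payload(name, value, date, pod_name=None, run_id=None,
                         cumulative=False, metadata=""):
    """Build the JSON body for POST /api/metrics."""
    if pod_name is not None and run_id is not None:
        raise ValueError("pod_name and run_id are mutually exclusive")
    payload = {
        "name": name,
        "date": date,
        "value": str(value),  # the documented example sends values as strings
        "cumulative": cumulative,
        "metadata": metadata,
    }
    if pod_name is not None:
        payload["pod_name"] = pod_name
    if run_id is not None:
        payload["run_id"] = run_id
    return payload


payload = build_metric_payload(
    "accuracy", 0.7845, "2018-08-03T09:21:44.331823Z",
    pod_name="quiet-mink-mlbench-worker-1",
)
# json.dumps serializes Python's False as the JSON literal false.
print(json.dumps(payload, indent=2))
```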
Runs
GET /api/runs/
Gets all active/failed/finished runs.
Example request:
    GET /api/runs/ HTTP/1.1
    Host: example.com
    Accept: application/json, text/javascript
Example response:
    HTTP/1.1 200 OK
    Vary: Accept
    Content-Type: text/javascript

    [
      {
        "id": 1,
        "name": "Name of the run",
        "created_at": "2018-08-03T09:21:29.347960Z",
        "state": "STARTED",
        "job_id": "5ec9f286-e12d-41bc-886e-0174ef2bddae",
        "job_metadata": {...}
      },
      {
        "id": 2,
        "name": "Another run",
        "created_at": "2018-08-02T08:11:22.123456Z",
        "state": "FINISHED",
        "job_id": "add4de0f-9705-4618-93a1-00bbc8d9498e",
        "job_metadata": {...}
      }
    ]
- Request Headers
Accept – the response content type depends on Accept header
- Response Headers
Content-Type – this depends on Accept header of request
- Status Codes
200 OK – no error
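Since the list mixes runs in different states, a client will often want to group them. The sketch below does this for an already decoded response. `runs_by_state` is a hypothetical helper, and the only state strings it relies on are the ones that appear in the example response.

```python
from collections import defaultdict


def runs_by_state(runs):
    """Map each state string to the ids of the runs in that state."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["state"]].append(run["id"])
    return dict(grouped)


# Trimmed-down version of the example response.
runs = [
    {"id": 1, "name": "Name of the run", "state": "STARTED"},
    {"id": 2, "name": "Another run", "state": "FINISHED"},
]
print(runs_by_state(runs))
```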
GET /api/runs/(int: run_id)/
Gets a run by id.
Example request:
    GET /api/runs/1/ HTTP/1.1
    Host: example.com
    Accept: application/json, text/javascript
Example response:
    HTTP/1.1 200 OK
    Vary: Accept
    Content-Type: text/javascript

    {
      "id": 1,
      "name": "Name of the run",
      "created_at": "2018-08-03T09:21:29.347960Z",
      "state": "STARTED",
      "job_id": "5ec9f286-e12d-41bc-886e-0174ef2bddae",
      "job_metadata": {...}
    }
- Parameters
run_id – the id of the run
- Request Headers
Accept – the response content type depends on Accept header
- Response Headers
Content-Type – this depends on Accept header of request
- Status Codes
200 OK – no error
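A common use of this endpoint is polling a run until it completes. The sketch below keeps the HTTP call out of the loop by taking a `fetch_run` callable, a hypothetical function that returns the decoded JSON for one run. Only "FINISHED" is treated as done by default because it is the only completed state shown in the examples; the state string for failed runs is not documented here and would need to be added to `done_states`.

```python
import time


def wait_for_run(fetch_run, run_id, done_states=("FINISHED",),
                 interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll a run until its state is in `done_states`, then return it.

    Raises TimeoutError after `max_polls` unsuccessful polls.
    """
    for attempt in range(max_polls):
        run = fetch_run(run_id)
        if run["state"] in done_states:
            return run
        if attempt < max_polls - 1:
            sleep(interval)  # wait before asking again
    raise TimeoutError("run {} did not finish in time".format(run_id))
```

Injecting `sleep` and `fetch_run` makes the loop easy to exercise without a running dashboard.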
POST /api/runs/
Starts a new run.
Example request:
    POST /api/runs/ HTTP/1.1
    Host: example.com
    Accept: application/json, text/javascript
- Request JSON Object
name (string) – Name of the run
num_workers (int) – Number of worker nodes for the run
num_cpus (json) – Number of cores used by each worker
Example response:
    HTTP/1.1 200 OK
    Vary: Accept
    Content-Type: text/javascript

    {
      "id": 1,
      "name": "Name of the run",
      "created_at": "2018-08-03T09:21:29.347960Z",
      "state": "STARTED",
      "job_id": "5ec9f286-e12d-41bc-886e-0174ef2bddae",
      "job_metadata": {...}
    }
- Request Headers
Accept – the response content type depends on Accept header
- Response Headers
Content-Type – this depends on Accept header of request
- Status Codes
200 OK – no error
409 Conflict – a run is already active
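The 409 Conflict status means a client should be prepared for the request to be rejected while another run is active. The sketch below sends the three documented fields as a JSON body and returns `None` on a conflict. `build_run_request` and `start_run` are hypothetical helpers, and the `Content-Type: application/json` header is an assumption about what the server expects.

```python
import json
import urllib.error
import urllib.request


def build_run_request(base_url, name, num_workers, num_cpus):
    """Prepare the POST /api/runs/ request without sending it."""
    body = json.dumps(
        {"name": name, "num_workers": num_workers, "num_cpus": num_cpus}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/runs/",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumption, see lead-in
            "Accept": "application/json",
        },
    )


def start_run(base_url, name, num_workers, num_cpus,
              opener=urllib.request.urlopen):
    """Start a run; return its JSON, or None if a run is already active."""
    request = build_run_request(base_url, name, num_workers, num_cpus)
    try:
        with opener(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 409:
            return None  # a run is already active
        raise
```

Other HTTP errors are re-raised so that failures unrelated to an active run are not silently ignored.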