Hi folks, I have JupyterHub running on a server and I use DockerSpawner to spawn a container for each user. I want to use a monitoring tool like Grafana to monitor the Hub logs as well as the container logs. I know this is possible for JupyterHub on Kubernetes, but how do I do the same without Kubernetes?
Grafana is typically used for visualising metrics, e.g. resource consumption, number of servers, performance, etc. JupyterHub exposes some Prometheus metrics at its `/hub/metrics` endpoint.
These can be scraped with Prometheus and viewed in Grafana, regardless of the hosting platform (Kubernetes, Docker, etc.).
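For example, a minimal `prometheus.yml` fragment for scraping the Hub might look like the sketch below. The target host/port is a placeholder for wherever your Hub is reachable, and on recent JupyterHub versions the metrics endpoint requires authentication, so you may also need to supply an API token:

```yaml
# Sketch of a Prometheus scrape config for JupyterHub.
# "jupyterhub.example.org:8000" is a placeholder; point it at your Hub.
scrape_configs:
  - job_name: jupyterhub
    metrics_path: /hub/metrics
    static_configs:
      - targets: ["jupyterhub.example.org:8000"]
    # Recent JupyterHub versions require a token with the read:metrics
    # scope to access /hub/metrics; if yours does, add e.g.:
    # authorization:
    #   credentials_file: /etc/prometheus/jupyterhub-token
```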
If you’re actually interested in gathering logs, try searching for “log aggregation”, “centralised logging”, or similar, as this is a topic that applies to all infrastructure, not just JupyterHub.
We did this sort of thing using Promtail and Grafana Loki. Our Ansible repos are private so I can’t share them here, but here is the Loki config file (a sketch of the Promtail side follows it):
```yaml
# WARNING: This file is Ansible managed. Do not modify it.
# Loki config file
# based on https://github.com/grafana/loki/blob/master/cmd/loki/loki-docker-config.yaml
# Documentation: https://grafana.com/docs/loki/latest/configuration/
# Reference: https://github.com/grafana/loki/issues/4613#issuecomment-1018367471

# Enables authentication through the X-Scope-OrgID header, which must be present
# if true. If false, the OrgID will always be set to "fake".
auth_enabled: false

# Configures the server of the launched module(s).
server:
  http_listen_address: localhost
  http_listen_port: 3100
  http_server_read_timeout: 310s   # allow longer time span queries
  http_server_write_timeout: 310s  # allow longer time span queries
  grpc_server_max_recv_msg_size: 33554432  # 32MiB (in bytes), default 4MB
  grpc_server_max_send_msg_size: 33554432  # 32MiB (in bytes), default 4MB
  # Log only messages with the given severity or above. Supported values [debug,
  # info, warn, error]
  # CLI flag: -log.level
  log_level: info

# Configures the ingester and how the ingester will register itself to a
# key value store.
ingester:
  wal:
    enabled: true
    dir: /var/lib/loki/wal
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h    # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h        # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576  # Loki will attempt to build chunks up to this size (1MiB here), flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled

schema_config:
  configs:
    - from: 2020-05-15
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb:
    directory: /var/lib/loki/index
  filesystem:
    directory: /var/lib/loki/chunks
  boltdb_shipper:
    active_index_directory: /var/lib/loki/boltdb-shipper-active
    cache_location: /var/lib/loki/boltdb-shipper-cache
    cache_ttl: 72h  # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem

compactor:
  working_directory: /var/lib/loki/boltdb-shipper-compactor
  shared_store: filesystem
  compaction_interval: 2h
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

limits_config:
  retention_period: 168h
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 84h
  # Per-user ingestion rate limit in sample size per second. Units in MB.
  # CLI flag: -distributor.ingestion-rate-limit-mb
  ingestion_rate_mb: 8  # <float> | default = 4
  # Per-user allowed ingestion burst size (in sample size). Units in MB.
  # The burst size refers to the per-distributor local rate limiter even in the
  # case of the "global" strategy, and should be set at least to the maximum logs
  # size expected in a single push request.
  # CLI flag: -distributor.ingestion-burst-size-mb
  ingestion_burst_size_mb: 16  # <int> | default = 6
  # Maximum byte rate per second per stream,
  # also expressible in human readable forms (1MB, 256KB, etc).
  # CLI flag: -ingester.per-stream-rate-limit
  per_stream_rate_limit: 5MB  # <string|int> | default = "3MB"
  # Maximum burst bytes per stream,
  # also expressible in human readable forms (1MB, 256KB, etc).
  # This is how far above the rate limit a stream can "burst" before the stream is limited.
  # CLI flag: -ingester.per-stream-rate-limit-burst
  per_stream_rate_limit_burst: 15MB  # <string|int> | default = "15MB"
  # The limit to length of chunk store queries. 0 to disable.
  # CLI flag: -store.max-query-length
  max_query_length: 168h  # <duration> | default = 721h
  # Limit how far back in time series data and metadata can be queried,
  # up until lookback duration ago.
  # This limit is enforced in the query frontend, the querier and the ruler.
  # If the requested time range is outside the allowed range, the request will not fail,
  # but will be modified to only query data within the allowed time range.
  # The default value of 0 does not set a limit.
  # CLI flag: -querier.max-query-lookback
  max_query_lookback: 168h
  # Split queries by a time interval and execute in parallel.
  # The value 0 disables splitting by time.
  # This also determines how cache keys are chosen when result caching is enabled.
  split_queries_by_interval: 30m
  # Maximum number of active streams per user, across the cluster. 0 to disable.
  # When the global limit is enabled, each ingester is configured with a dynamic
  # local limit based on the replication factor and the current number of healthy
  # ingesters, and is kept updated whenever the number of ingesters changes.
  # CLI flag: -ingester.max-global-streams-per-user
  max_global_streams_per_user: 100000  # <int> | default = 5000
  # Limit the maximum number of unique series returned by a metric query.
  # When the limit is reached an error is returned.
  # CLI flag: -querier.max-query-series
  max_query_series: 100000  # <int> | default = 500
  # Timeout when querying backends (ingesters or storage) during the execution of
  # a query request. If a specific per-tenant timeout is used, this timeout is
  # ignored.
  # CLI flag: -querier.query-timeout
  query_timeout: 5m  # default = 1m

frontend:
  # Maximum number of outstanding requests per tenant per frontend; requests
  # beyond this error with HTTP 429.
  # CLI flag: -querier.max-outstanding-requests-per-tenant
  max_outstanding_per_tenant: 2048  # default = 100
  # Compress HTTP responses.
  # CLI flag: -querier.compress-http-responses
  compress_responses: true  # default = false
  # Log queries that are slower than the specified duration. Set to 0 to disable.
  # Set to < 0 to enable on all queries.
  # CLI flag: -frontend.log-queries-longer-than
  log_queries_longer_than: 20s

frontend_worker:
  grpc_client_config:
    # The maximum size in bytes the client can send.
    # CLI flag: -<prefix>.grpc-max-send-msg-size
    max_send_msg_size: 33554432  # 32MiB, default = 16777216
    max_recv_msg_size: 33554432

ingester_client:
  grpc_client_config:
    # The maximum size in bytes the client can send.
    # CLI flag: -<prefix>.grpc-max-send-msg-size
    max_send_msg_size: 33554432  # 32MiB, default = 16777216
    max_recv_msg_size: 33554432

query_scheduler:
  max_outstanding_requests_per_tenant: 2048
  grpc_client_config:
    # The maximum size in bytes the client can send.
    # CLI flag: -<prefix>.grpc-max-send-msg-size
    max_send_msg_size: 33554432  # 32MiB, default = 16777216
    max_recv_msg_size: 33554432

# Don't enable anonymous usage reporting.
analytics:
  reporting_enabled: false
```
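The Promtail side isn’t included above, so here is a minimal sketch of what a matching config might look like. It assumes Promtail runs on the Docker host, that containers use Docker’s default json-file logging driver, and that the Hub writes its log to a file; the paths and labels are illustrative, not from our actual setup:

```yaml
# Minimal Promtail sketch (not the original Ansible-managed file).
# Assumes the Loki instance above is listening on localhost:3100.
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml  # where Promtail remembers read offsets

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  # JupyterHub's own log; adjust the path to wherever your Hub logs land
  # (e.g. a file managed by systemd or your process supervisor).
  - job_name: jupyterhub
    static_configs:
      - targets: [localhost]
        labels:
          job: jupyterhub
          __path__: /var/log/jupyterhub/*.log

  # Logs of the user containers spawned by DockerSpawner, read straight
  # from Docker's json-file logs; the docker stage parses that format.
  - job_name: docker
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      - docker: {}
```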
You can then set up dashboards in Grafana to explore these logs. I hope that gives you an idea!
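If you manage Grafana itself with provisioning, a minimal datasource file (e.g. dropped under `/etc/grafana/provisioning/datasources/`) could look like this; the URLs assume Loki and Prometheus run on the same host as Grafana:

```yaml
# Hypothetical Grafana datasource provisioning sketch.
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
```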