This might be because I don't understand how the single-user server works (or Kubernetes scheduling, for that matter…), but I exhaust the GPU limit as soon as I start a single server:
My config looks like this:
```yaml
singleuser:
  storage:
    dynamic:
      storageClass: openebs-zfspv
  nodeSelector:
    node-role.kubernetes.io/worker: worker
  image:
    name: my-singleuser-gpu-image
    tag: v1.5.0
  extraResource:
    limits:
      nvidia.com/gpu: 1
```
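To rule out the chart dropping the setting somewhere, I also checked what the spawned pod actually requests. A quick sketch, assuming a `jhub` namespace and the usual `jupyter-<username>` pod naming (adjust both to your release):

```bash
# Show the resource requests/limits the spawner actually put on the pod
# (namespace and pod name are placeholders):
kubectl -n jhub get pod jupyter-<username> \
  -o jsonpath='{.spec.containers[0].resources}'
```

The output confirmed the pod carries `nvidia.com/gpu: 1` in its limits.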
`nvidia-smi` within a notebook for the first user shows only one of the eight GPUs, so that server should not be using all of them.
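For reference, this is how I verified it from the notebook's terminal (assuming the NVIDIA container runtime is in use, which sets `NVIDIA_VISIBLE_DEVICES` on the container):

```bash
# Inside the single-user container: the device plugin should expose only
# the one allocated GPU, not all eight.
nvidia-smi -L                    # expect a single "GPU 0: ..." line
echo "$NVIDIA_VISIBLE_DEVICES"   # UUID/index of the GPU granted to this pod
```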
Yet the resources on the node look OK:
```
Allocatable:
  cpu:                256
  ephemeral-storage:  397152651836
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             2101193248Ki
  nvidia.com/gpu:     8
  pods:               110
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests         Limits
  --------           --------         ------
  cpu                100m (0%)        100m (0%)
  memory             1126170624 (0%)  50Mi (0%)
  ephemeral-storage  0 (0%)           0 (0%)
  hugepages-1Gi      0 (0%)           0 (0%)
  hugepages-2Mi      0 (0%)           0 (0%)
  nvidia.com/gpu     1                1
```
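To double-check where the other seven GPUs supposedly went, one can sum the GPU limits of every pod scheduled on the node. A sketch, assuming `jq` is installed (`<node>` is a placeholder for the node name):

```bash
# Sum nvidia.com/gpu limits over all containers scheduled on the node:
kubectl get pods --all-namespaces \
  --field-selector spec.nodeName=<node> -o json \
  | jq '[.items[].spec.containers[].resources.limits["nvidia.com/gpu"]
         // "0" | tonumber] | add'
```

In my case this also summed to 1, which matches the `Allocated resources` output above.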
Does anyone have an idea what I am missing here?
I restarted the master node after reading this:
(Quoted below: a kubernetes/kubernetes issue, opened 29 Sep 2016, closed 14 Apr 2021; labels: sig/scheduling, area/nodecontroller, lifecycle/rotten.)
**Kubernetes version** (use `kubectl version`):
```
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.7", GitCommit:"a2cba278cba1f6881bb0a7704d9cac6fca6ed435", GitTreeState:"clean", BuildDate:"2016-09-12T23:15:30Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.7", GitCommit:"a2cba278cba1f6881bb0a7704d9cac6fca6ed435", GitTreeState:"clean", BuildDate:"2016-09-12T23:08:43Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
```
**Environment**:
- **Cloud provider or hardware configuration**: AWS, masters (Count: 3, Size: m3.medium), minions (Count: 5, Size: m4.xlarge)
- **OS** (e.g. from /etc/os-release): Ubuntu 14.04.5 LTS (Trusty Tahr)
- **Kernel** (e.g. `uname -a`): Master: 3.13.0-95-generic Minion: 4.4.0-38-generic
- **Install tools**: Ansible using modified contrib playbooks: https://github.com/kubernetes/contrib/tree/master/ansible
- **Others**:
**What happened**: When scheduling pods with a low CPU resource request (15m), we receive the message "Insufficient CPU" from every node attempting to schedule the pod. We are using multi-container pods, and `kubectl describe` shows nodes with resources available to schedule the pods. However, k8s refuses to schedule on any node.
[kubectl_output.txt](https://github.com/kubernetes/kubernetes/files/501431/kubectl_output.txt)
**What you expected to happen**:
**How to reproduce it** (as minimally and precisely as possible):
Below is a sample manifest that we can use to reproduce the problem.
[manifest.txt](https://github.com/kubernetes/kubernetes/files/501443/manifest.txt)
Scheduling succeeds up to about 10-14 pods, and then we run into this problem. See the graph below:
![screen shot 2016-09-29 at 11 30 56 am](https://cloud.githubusercontent.com/assets/5752233/18967327/9a73acde-8639-11e6-9b93-1e545fa3d94f.png)
It works now after the restart, so this was not related to Jupyter.
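Follow-up for anyone who lands here with the same symptom: before restarting the master, it is worth checking why the scheduler leaves the pod Pending, since the reason usually appears in the pod's events. A minimal sketch (pod name and namespace are placeholders):

```bash
# Print only the Events section of the pending pod's description:
kubectl -n jhub describe pod jupyter-<username> | sed -n '/Events:/,$p'
```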