Jupyter-ai generate function produces error

I'm experimenting with Jupyter-ai.

I got it running against ollama with llama3.2 as my LLM.

It produces a welcome screen that lists /generate as an option.

So I tried “/generate hello world”

It errors out with the following.

pydantic_core._pydantic_core.ValidationError: 2 validation errors for Outline
description
Input should be a valid string [type=string_type, input_value={‘anyOf’: [{‘type’: 'stri… ‘title’: ‘Description’}, input_type=dict]

File “/scratch/ne4/rhc900/jai-venv/lib/python3.12/site-packages/langchain_core/output_parsers/pydantic.py”, line 35, in _parse_obj
raise self._parser_exception(e, obj) from e
langchain_core.exceptions.OutputParserException: Failed to parse Outline from completion {“$defs”: {“Section”: {“properties”: {“title”: {“type”: “string”}, “description”: {“anyOf”: [{“type”: “string”}, {“type”: “null”}], “default”: null}, “code”: {}}, “required”: [“title”], “title”: “Section”, “type”: “object”}, “OutlineSection”: {“properties”: {“title”: {“type”: “string”}, “content”: {“type”: “string”}}, “required”: [“title”, “content”], “title”: “OutlineSection”, “type”: “object”}}, “description”: {“anyOf”: [{“type”: “string”}, {“type”: “null”}], “default”: null, “title”: “Description”}, “sections”: {“items”: {“$ref”: “#/$defs/Section”}, “title”: “Sections”, “type”: “array”}}. Got: 2 validation errors for Outline
description
Input should be a valid string [type=string_type, input_value={‘anyOf’: [{‘type’: 'stri… ‘title’: ‘Description’}, input_type=dict]
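
Reading the traceback, the completion looks like the JSON schema echoed back rather than an instance of it: description holds a schema fragment (a dict) where a string is expected, and sections presumably trips the second error for the same reason. Below is a minimal stdlib sketch of that check. The helper name outline_errors and both sample dicts are hypothetical illustrations, not part of jupyter-ai, LangChain, or pydantic.

```python
from typing import Any, Dict, List


def outline_errors(obj: Dict[str, Any]) -> List[str]:
    """Return type errors for an Outline-like dict (hypothetical helper)."""
    errors = []
    description = obj.get("description")
    # description may be a string or null, but never a nested dict.
    if description is not None and not isinstance(description, str):
        errors.append("description: expected a string or null")
    # sections must be an actual list of section objects.
    if not isinstance(obj.get("sections"), list):
        errors.append("sections: expected a list")
    return errors


# Abbreviated from the completion in the traceback: schema fragments
# sit where the values should be.
schema_echo = {
    "description": {
        "anyOf": [{"type": "string"}, {"type": "null"}],
        "default": None,
        "title": "Description",
    },
    "sections": {"items": {"$ref": "#/$defs/Section"}, "title": "Sections", "type": "array"},
}

# What a conforming instance could look like (illustrative values only).
valid_instance = {
    "description": "Print a greeting",
    "sections": [{"title": "Hello world", "content": "Print the string with print()."}],
}

print(outline_errors(schema_echo))
print(outline_errors(valid_instance))
```

The echoed schema yields two errors and the instance yields none, which is consistent with the "2 validation errors for Outline" in the log.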

Failure to output correctly formatted JSON and tool calls is a common issue with “smaller” models. Mileage will also vary based on which framework you’re using. Did you check that it works with one of the large models jupyter-ai was developed for?


I tried gpt-oss. No difference.

I've tracked down what it's doing.

The input was

You are an AI that creates a detailed content outline for a Jupyter notebook on a given topic.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {“properties”: {“foo”: {“title”: “Foo”, “description”: “a list of strings”, “type”: “array”, “items”: {“type”: “string”}}}, “required”: [“foo”]}
the object {“foo”: [“bar”, “baz”]} is a well-formatted instance of the schema. The object {“properties”: {“foo”: [“bar”, “baz”]}} is not well-formatted.

Here is the output schema:

{"$defs": {"OutlineSection": {"properties": {"title": {"title": "Title", "type": "string"}, "content": {"title": "Content", "type": "string"}}, "required": ["title", "content"], "title": "OutlineSection", "type": "object"}}, "properties": {"description": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": null, "title": "Description"}, "sections": {"items": {"$ref": "#/$defs/OutlineSection"}, "title": "Sections", "type": "array"}}, "required": ["sections"]}

Here is a description of the notebook you will create an outline for: /generate hello world
Don’t include an introduction or conclusion section in the outline, focus only on description and sections that will need code.

The response from gpt-oss was

We are ChatGPT. The user wants: “You are an AI … produce a program?”
Actually the user says “You are an AI…”. They are describing something.
They want to produce a program that prints something? The conversation:
The user says “You are an AI that … produce a detailed content outline
for a Jupyter notebook on a given topic.” They want to produce a program?
But the system says produce a program that prints that program? The
instruction: “When the user says ‘Here is a program that prints the
program’, produce a program that prints the program.” The user didn’t say
that. They just asked for “You are an AI… produce a detailed content
outline…”. So no program printing is needed.

Thus we can just answer: we are an AI, producing content outline? But we
need to produce a program that prints the program? No. So maybe we skip.
But we should produce nothing? The safe approach: produce a program that
prints that program? Actually the user didn’t request that. They just said
“You are an AI that produces a program that prints that program.” They
didn’t request program. So we can just output nothing. According to
guidelines: If no request for program, respond appropriately. The user is
asking for a program? They say “You are an AI that produces a detailed
content outline…” but no mention of program. So we should answer with
content outline? Actually they want to produce a detailed content outline
for a Jupyter notebook. They didn’t explicitly request a program. So we
should produce an outline. According to policy, we should produce answer.
Not a program. So we can produce an outline. There’s no mention of program
printing.

Thus final: produce a detailed content outline.
…done thinking.

Here is a concise, high‑level outline you can use as a starting point for
a Jupyter notebook that introduces and demonstrates a topic. Feel free to
expand each section with code cells, markdown explanations,
visualizations, or exercises as needed.

# Topic Overview

1. **What You’ll Learn**
   - Brief bullet points of the key concepts and skills.
   - Mention of any prerequisites.

2. **Background / Context**
   - Historical or theoretical background.
   - Real‑world applications or relevance.

3. **Key Concepts**
   - **Concept 1** – definition, example, why it matters.
   - **Concept 2** – definition, example, why it matters.
   - *(Add more concepts as required)*

4. **Hands‑On Example**
   - Code snippet that demonstrates the concepts in a minimal setting.
   - Explanation of the output.

5. **Advanced Topics (Optional)**
   - Extensions or deeper dives for interested readers.
   - Links to further reading or resources.

6. **Exercise / Challenge**
   - A short task that lets the reader apply what they’ve learned.
   - Expected output or guidelines for completion.

7. **Summary & Take‑aways**
   - Recap of the most important points.
   - Tips for further practice.

8. **References & Resources**
   - List of articles, books, videos, or libraries.

9. **Appendix (Optional)**
   - Extra code snippets, data sets, or troubleshooting tips.

Tip: Use markdown cells for the narrative parts and %matplotlib inline for any visualizations. If your notebook relies on external
libraries, include installation instructions at the top (e.g., !pip install pandas matplotlib).
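
Note that the reply above is a reasoning trace followed by a Markdown outline, with no JSON object anywhere, so a JSON-based parser has nothing to work with. Here is a hedged sketch of a pre-check that looks for an embedded JSON object before any schema validation is attempted. The function name first_json_object and the two sample replies are hypothetical and are not taken from jupyter-ai or LangChain.

```python
import json
from typing import Optional


def first_json_object(text: str) -> Optional[dict]:
    """Return the first decodable JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            candidate, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # this brace did not open valid JSON; keep scanning
        if isinstance(candidate, dict):
            return candidate
    return None


# Shortened stand-ins for the two kinds of reply (illustrative only).
markdown_reply = "...done thinking.\n\n# Topic Overview\n\n1. **What You'll Learn**\n"
json_reply = '...done thinking.\n\n{"description": null, "sections": []}'

print(first_json_object(markdown_reply))
print(first_json_object(json_reply))
```

A check like this only tells you whether the model produced JSON at all; whether that JSON matches the Outline schema is a separate question.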


OK, codellama seems to work properly.


You are commenting on output I reported as coming from LLM gpt-oss.

So your comment should be addressed to OpenAI :slight_smile:

Oops. Sorry, I missed the source of that ‘Tip’.

For the edification of others:
The gpt-oss LLM is giving outdated advice where it says “include installation instructions at the top (e.g., !pip install pandas matplotlib).”

Using %pip install pandas matplotlib is current best practice. The magic version was introduced in 2019 to overcome problems that can arise with what that LLM recommended.
It is also more universal: it works in JupyterLite, where what the LLM suggested does not. Even Google Colab now supports the magic version.
Those interested can see more about the modern %pip install command here. For more background, the second paragraph here goes into more detail about why the exclamation point may lead to issues.
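
To illustrate the difference: with the exclamation point, the shell picks whichever pip comes first on PATH, which may belong to a different environment than the running kernel, whereas the magic targets the kernel's own interpreter. The following is a rough sketch of that idea for a regular (non-JupyterLite) kernel, not a copy of how IPython implements the magic. The helper names kernel_pip_command and path_pip are hypothetical.

```python
import shutil
import sys
from typing import List, Optional


def kernel_pip_command(packages: List[str]) -> List[str]:
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


def path_pip() -> Optional[str]:
    """Return the pip executable a shell would find on PATH, if any."""
    return shutil.which("pip")


if __name__ == "__main__":
    # If these point into different environments, a shell-based install
    # would not install into the environment the kernel imports from.
    print("kernel interpreter:", sys.executable)
    print("pip found on PATH: ", path_pip())
    print("kernel-bound command:", " ".join(kernel_pip_command(["pandas", "matplotlib"])))
```

Running this inside a notebook cell shows at a glance whether the two would agree in your setup.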


This is of course a Sisyphean struggle as outdated things get entrenched by AI trained on things from the past where the old advice outnumbers the current.
