Apply sentence case in Jupyter Notebook markdown cells

shuai-zhou · December 22, 2020, 10:42pm

Is there an extension, plugin, or workaround that can automatically convert the text in Jupyter Notebook markdown cells to sentence case? For example, it would automatically uppercase the first letter after a period, question mark, exclamation mark, etc., and also capitalize the single letter “i”. This could be helpful for people who rely on Jupyter Notebook for documenting, presenting, lecturing, etc.

fomightez · December 22, 2020, 11:55pm

Here is a workaround based on this, which traces back to nbclean’s brilliant use of nbformat as described here.

Code updated: fixed a typo and added the additional request from below.

def cap_letters_after_punc(s):
    '''
    Take a string and capitalize every letter after an end-of-sentence-marking
    punctuation mark.

    Returns a string with those changes.
    Based on https://stackoverflow.com/a/28639714/8508004
    Found https://stackoverflow.com/a/63192453/8508004 after writing this; it
    looks to do the same thing with a different approach using regex.
    '''
    punctuation_marks = [". ", "! ", "? "]
    for pm in punctuation_marks:
        one_pass_result = ""
        for item in s.split(pm):
            # item[:1] rather than item[0], so empty pieces (an empty cell, or
            # text ending in ". ") don't raise IndexError
            one_pass_result += item[:1].upper() + item[1:] + pm
        # remove the mark added after the last item (every mark is 2 characters)
        s = one_pass_result[:-2]
    return s

def capitalize_line_starts(s):
    '''
    Take a string and capitalize the first letter of every line.
    '''
    # Joining with "\n" (instead of appending "\n" after every line) avoids
    # adding an extra newline after the last line; empty lines pass through.
    return "\n".join(line[:1].upper() + line[1:] for line in s.split("\n"))

import nbformat as nbf

nb_file = "name_of_notebook.ipynb" # REPLACE WITH YOUR NOTEBOOK NAME
ntbk = nbf.read(nb_file, as_version=nbf.NO_CONVERT)
for cell in ntbk.cells:
    if cell.cell_type == "markdown":
        cell['source'] = cell['source'].replace(" i ", " I ") # capitalize `i` alone
        cell['source'] = cap_letters_after_punc(cell['source'])
        cell['source'] = capitalize_line_starts(cell['source'])
# Write to a new file so the original notebook is left untouched
nbf.write(ntbk, "fixed_" + nb_file, version=nbf.NO_CONVERT)
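The `.replace(" i ", " I ")` step only catches an “i” with a space on both sides, so “i'm”, “i,” or an “i” at the start of a line stays lowercase. Below is a minimal regex sketch of a broader rule; `capitalize_lone_i` is a hypothetical name, and the `(?!\.\w)` lookahead that leaves “i.e.” alone is an assumption about which abbreviations matter. Like the original, it will also change an `i` inside inline code such as `x[i]`.

```python
import re

# Hypothetical replacement for the `.replace(" i ", " I ")` step.
# \bi\b matches a lone "i", including before punctuation or an apostrophe
# ("i'm", "i,", "do i."); the lookahead (?!\.\w) skips "i.e."-style abbreviations.
_LONE_I = re.compile(r"\bi\b(?!\.\w)")

def capitalize_lone_i(s: str) -> str:
    """Uppercase every standalone lowercase 'i' in s."""
    return _LONE_I.sub("I", s)

print(capitalize_lone_i("i think i'm right, i.e. mostly, and so do i."))
# I think I'm right, i.e. mostly, and so do I.
```

In the loop above, `cell['source'] = capitalize_lone_i(cell['source'])` could stand in for the `.replace` line.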

A notebook to test the code on is available here.

shuai-zhou · December 23, 2020, 4:20am

Hi, this works. Thanks. There is actually one typo where you missed a quotation mark. It should be:

nbf.write(new_ntbk, "fixed_"+nb_file, version=nbf.NO_CONVERT)

Anyway, it does work. But I noticed something else the function doesn't cover: if I start a new line in the same markdown cell, the first letter of the new line is not capitalized. Could you please add this rule to the function as well? I tried adding “\n” to the punctuation_marks list, but it does not work as expected.
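Adding “\n” to punctuation_marks breaks because each pass in cap_letters_after_punc() ends by slicing off a fixed two characters (`[:-2]`), the length of ". ", "! " and "? ". With the one-character "\n" that slice removes the appended newline plus the last real character of the cell, and with the original `item[0]` a blank line would raise IndexError. The small reproduction below makes the slice length a parameter; `cap_after_mark` is a hypothetical helper written only to illustrate this.

```python
def cap_after_mark(s: str, pm: str, strip: int) -> str:
    """One pass of the loop in cap_letters_after_punc, with the trailing
    slice length made a parameter instead of the hard-coded 2."""
    result = ""
    for item in s.split(pm):
        # item[:1] instead of item[0]: blank lines produce empty items
        result += item[:1].upper() + item[1:] + pm
    return result[:-strip]  # drop the mark appended after the last item

text = "first line.\nsecond line."
print(repr(cap_after_mark(text, "\n", strip=2)))          # 'First line.\nSecond line'
print(repr(cap_after_mark(text, "\n", strip=len("\n"))))  # 'First line.\nSecond line.'
```

Slicing by `len(pm)` (or joining the pieces with `pm`) avoids tying the loop to two-character marks.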

fomightez · December 23, 2020, 5:59pm

I updated my original post to fix the typo and added a step that addresses your additional request; the update is noted in that post. Besides being easier to code, keeping the new step as a separate function allows more customization.

shuai-zhou · December 29, 2020, 3:51am

Thanks for the updates; they fixed the issues (although the punctuation at the end of some lines goes missing after the conversion). So this approach converts an existing Jupyter Notebook file and writes a new file with the capitalization rules applied. I was wondering whether, to your knowledge, there is a way to do this while typing in the notebook instead of fixing it afterwards.

fomightez · December 29, 2020, 5:39am

Do you have any examples of the starting text where punctuation ends up going away?

I’d suggest starting a separate thread whose title asks about an extension that does this actively as you type (or about any extension that modifies contents as you write, which you could use as a basis to adapt to your needs). To help people suggest what will work best for you, be specific about whether a JupyterLab solution is fine or whether you can only use the classic notebook interface.
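One middle ground that stays in Python is a save hook rather than a true as-you-type extension. Jupyter’s “file save hooks” documentation describes `FileContentsManager.pre_save_hook`, a function called with the notebook model just before it is written to disk. The sketch below assumes nbformat 4 notebooks; `sentence_case_pre_save` is a hypothetical name, and the regex only covers line starts and letters after ". ", "! " or "? ". It changes the saved file, not the editor, so the open notebook likely keeps showing the uncorrected text until it is reloaded.

```python
import re

# Lowercase letter at the start of a line, or after ".", "!" or "?" plus spaces
_SENTENCE_START = re.compile(r"(^|[.!?] +)([a-z])", re.MULTILINE)

def sentence_case_pre_save(model, **kwargs):
    """Hypothetical pre-save hook: sentence-case markdown cells on every save."""
    if model.get("type") != "notebook":
        return  # only act on notebooks, not plain files
    content = model["content"]
    if content.get("nbformat") != 4:
        return
    for cell in content["cells"]:
        if cell.get("cell_type") != "markdown":
            continue
        source = cell["source"]
        if isinstance(source, list):  # JSON may store source as a list of lines
            source = "".join(source)
        cell["source"] = _SENTENCE_START.sub(
            lambda m: m.group(1) + m.group(2).upper(), source)

# Registration, in jupyter_notebook_config.py (assumed config location):
# c.FileContentsManager.pre_save_hook = sentence_case_pre_save
```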

shuai-zhou · December 31, 2020, 4:27am

The two pics show the results before and after the conversion. It does capitalize the expected letters, but (1) the numbers in the title are missing, and (2) all the original lines end with periods, but after the conversion some periods are missing.

As for your suggestion about starting a new thread, I did ask the same question on the Jupyter Notebook and Jupyter notebook extensions GitHub pages long ago, but got no response, and it seems they were not interested. This is certainly not a big issue, but for people who rely heavily on Jupyter Notebook for their work, such as documenting, presenting, blogging, or writing a book, this feature could be very useful.

fomightez · December 31, 2020, 7:06pm

If it was long ago, I don’t think a thread here would hurt. There are always priorities, and when you posted before it may have been far down a long list. Plus, you never know who might see your post here now and be able to add the features you want to a current extension (or know where something very close already exists and could be adapted further). How extensions are made has also evolved a lot with JupyterLab.

I have now edited the script above further, and I think it fixes the characters being removed at the end. In capitalize_line_starts(), I was removing the last character, following the trimming done in cap_letters_after_punc(); however, that step was not needed to get correct results and was in fact causing the problems you noted.

shuai-zhou · January 4, 2021, 3:21am

Yes, the updated function perfectly fixed all the issues I have encountered.

I will start a new issue on Jupyter notebook extensions’ GitHub page and keep an eye on it. Thanks for your help.
