Is there an extension/plugin/workaround that can automatically change text in Jupyter Notebook markdown cells to sentence case? For example, automatically capitalize the first letter after a period, question mark, exclamation mark, etc., and also capitalize the single letter “i”? This could be helpful for people who rely on Jupyter Notebook for documenting, presenting, lecturing, etc.
Workaround based on here, which traces back to nbclean’s brilliant use of nbformat as described here.
Code updated to fix a typo and to add the additional request from below.
import nbformat as nbf

def cap_letters_after_punc(s):
    '''
    Take a string and capitalize every letter after an end-of-sentence-marking
    punctuation mark.
    Returns a string with those changes.
    based on https://stackoverflow.com/a/28639714/8508004
    Found https://stackoverflow.com/a/63192453/8508004 after writing this; it
    looks to do this with a different approach using regex.
    '''
    punctuation_marks = [". ","! ","? "]
    for pm in punctuation_marks:
        one_pass_result = ""
        for item in s.split(pm):
            # item[:1] avoids an IndexError on empty pieces (e.g. an empty cell
            # or text that ends with ". ")
            one_pass_result += item[:1].upper() + item[1:] + pm
        s = one_pass_result[:-2] # remove extra punctuation added to last item
    return s

def capitalize_line_starts(s):
    '''
    Take a string and capitalize every letter starting a line.
    '''
    # Rejoining with "\n" keeps the text intact and avoids appending an extra
    # newline after the last line.
    return "\n".join(item[:1].upper() + item[1:] for item in s.split("\n"))

nb_file = "name_of_notebook.ipynb" # REPLACE WITH YOUR NOTEBOOK NAME
ntbk = nbf.read(nb_file, nbf.NO_CONVERT)
cells_to_keep = []
for cell in ntbk.cells:
    if cell.cell_type == "markdown":
        cell['source'] = cell['source'].replace(" i "," I ") # capitalize `i` alone
        cell['source'] = cap_letters_after_punc(cell['source'])
        cell['source'] = capitalize_line_starts(cell['source'])
    cells_to_keep.append(cell) # keep every cell, modified or not
new_ntbk = ntbk
new_ntbk.cells = cells_to_keep
nbf.write(new_ntbk, "fixed_"+nb_file, version=nbf.NO_CONVERT)
A notebook to test the code on is available here.
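For comparison, here is a minimal sketch of the regex-style approach that the docstring above mentions. It is not the code from the linked answer; the name `sentence_case` and the two patterns are assumptions made for illustration. It capitalizes a lowercase letter at the start of a line or after `.`, `!` or `?` followed by whitespace, and then capitalizes the standalone word `i`.

```python
import re

# Hypothetical helper, not part of the script above.
# A lowercase letter at the start of any line, or after ., ! or ? plus whitespace.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)([a-z])", flags=re.MULTILINE)
# The word "i" on its own (also matches the "i" in "i'm").
_LONE_I = re.compile(r"\bi\b")

def sentence_case(text):
    """Capitalize sentence and line starts, plus the standalone word 'i'."""
    text = _SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)
    return _LONE_I.sub("I", text)

print(sentence_case("hello there. i think i can! can you?\nnew line here."))
# Hello there. I think I can! Can you?
# New line here.
```

Like the split-based version, this is a blunt rule: it will also capitalize after abbreviations such as "e.g. ", it will turn a loop variable `i` inside inline code into `I`, and it leaves lines that begin with markdown syntax such as `#` or `-` unchanged.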
Hi, this works. Thanks. There is actually one typo where you missed a quotation mark. It should be:
nbf.write(new_ntbk, "fixed_"+nb_file, version=nbf.NO_CONVERT)
Anyway, it does work. But I found something else that this function does not cover. When starting a new line in the same markdown cell, the function does not capitalize the first letter of the new line. Could you please also add this rule to the function? I tried adding “\n” to the punctuation_marks list, but it does not work as expected.
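One likely reason that adding “\n” to punctuation_marks misbehaves is the hard-coded `[:-2]` slice: it assumes every separator is two characters long, while “\n” is only one. The sketch below uses a hypothetical helper, `cap_after`, written only to show the difference between trimming a fixed two characters and trimming the separator's actual length.

```python
def cap_after(s, separators, trim=None):
    """Capitalize the first character of each piece between separators.

    trim=None trims len(sep) characters after each pass; an integer
    reproduces a hard-coded slice such as [:-2].
    """
    for sep in separators:
        rebuilt = "".join(
            piece[:1].upper() + piece[1:] + sep for piece in s.split(sep)
        )
        cut = len(sep) if trim is None else trim
        s = rebuilt[:-cut]  # drop the separator appended after the last piece
    return s

text = "first line.\nsecond line."
print(repr(cap_after(text, ["\n"], trim=2)))  # 'First line.\nSecond line'
print(repr(cap_after(text, ["\n"])))          # 'First line.\nSecond line.'
```

With the fixed slice, the final period is removed along with the extra newline; trimming by `len(sep)` leaves the text intact.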
In my original post, I fixed the typo and added a step to address your additional request. I noted the update in that post. Besides being easier to code separately, I kept the additional step as a separate function because that allows more customization.
Thanks for the updates; they fixed the issues (although the punctuation at the end of some lines is missing after the conversion). So this approach converts the existing Jupyter Notebook file and writes it to a new file while applying the rules to capitalize certain characters. I was wondering whether, to your knowledge, there is a way to do this while typing in the Jupyter Notebook file instead of remedying it afterwards.
Do you have any examples of the starting text where punctuation ends up going away?
I’d suggest starting a separate thread asking in the title about an extension that may do it actively as you type. (Or an example of any extension that can modify contents as you write so that you can use that as a basis to adapt to do what you need.) And to help you get what will work best for you, I’d be specific about whether JupyterLab is fine or whether you can only use the classic notebook interface.
The two pics show the results before and after converting. It does capitalize certain letters as expected, but (1) the numbers in the Title are missing; (2) all the original lines end with periods, but after converting, some periods are missing.
As for your suggestion about starting a new thread, I did ask the same question on the GitHub pages for Jupyter Notebook and for the Jupyter notebook extensions long ago, but I have not received any response, and it seems that this was not a priority for them. This is not a big issue for sure, but for people who rely heavily on Jupyter Notebook for their work, like documenting, presenting, blogging, or writing a book, this function can be very useful.
If it was long ago, I wouldn’t think a thread here would hurt. There are always priorities, and back when you posted before it may have been way down on a long list. Plus you never know who might see your post here now and be able to add the features you desire to a current extension. (Or know where it or something very close already exists and could be adapted further.) How extensions are made has evolved a lot with JupyterLab.
I have now edited the script above further. I think it fixes the removing of characters at the end. In capitalize_line_starts(), I was removing the last character based on the cap_letters_after_punc(); however, that step was not necessary to get things correct, and in fact, was causing the problems you noted.
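A quick way to catch this kind of problem is to check that a capitalization step changes nothing except letter case. The helper below, `only_case_changed`, is a hypothetical name introduced here for illustration; it compares the text before and after conversion with case folded away.

```python
def only_case_changed(before, after):
    """Return True when the two strings differ at most in letter case."""
    return before.casefold() == after.casefold()

before = "1. intro to the course.\nsecond line."
good = "1. Intro to the course.\nSecond line."
bad = "1. Intro to the course.\nSecond line"   # final period lost

print(only_case_changed(before, good))  # True
print(only_case_changed(before, bad))   # False
```

A False result means characters were added or removed somewhere, which includes a newline added or dropped at the end of the cell, so it is worth inspecting the end of the text first.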
Yes, the updated function perfectly fixed all the issues I have encountered.
I will start a new issue on Jupyter notebook extensions’ GitHub page and keep an eye on it. Thanks for your help.