Notes Rich Text to XHTML: the code
¡Ay, caramba! In my previous post, I talked about finding the Philosopher's Stone: converting Notes Rich Text into valid XHTML. The stone turned out to be a mountain. But here's a solution. It is based on a script by Julian Robichaux, but my version probably works only in Notes 6.5 or higher.
How it works
To get the HTML from a Notes RichText field, you have to copy it into another RichText field that has the setting "Store contents as HTML and MIME" enabled.
I first tried to work directly on a field with this setting as does the IBM blog, but I noticed that in my case, the content of the field itself changed every time I edited it: line breaks were added, table information was lost, etc.
So I went for the same solution as Julian Robichaux and BlogSphere: creating a separate document with a copy of the RichText field and shaking it about until it spat out the required HTML.
This separate document later also serves as a ghost document from which the inline images are retrieved. I didn't find another way to do it.
The Form code
All the code is in a LotusScript library, which you have to include in the (Globals)Document Options of the form:
Option Public
Option Declare
Use "HtmlEngine"
The conversion itself is triggered in the Document PostSave event:
Sub PostSave(Source As Notesuidocument)
    Dim s As New NotesSession
    Dim thisdb As NotesDatabase
    Dim thisdoc As NotesDocument
    Dim html As String
    Set thisdb = s.CurrentDatabase
    Set thisdoc = source.Document
    html=convertRichText(thisDoc,"Body",source.FieldGetText("DbPath"))
    Call thisDoc.ReplaceItemValue("HtmlBody",html)
    Call thisDoc.Save(True,True)
    Call source.Reload
End Sub
The transformation script
The function convertRichText in the HtmlEngine library first copies the field into the convert document. Next it gets the HTML from that document, and then it converts the image tags.
Function convertRichText(doc As NotesDocument, Byval fieldName As String,Byval dbPath As String) As String
    On Error Goto catch
    Dim session As New NotesSession
    Dim mText As String
    Dim db As NotesDatabase
    Dim newDoc As NotesDocument
    Dim noteID As String
    Dim currentSessionMimeSetting As Integer
    Dim rtitem As NotesRichTextItem
    Dim rtitem2 As NotesRichTextItem
    Dim docItem As NotesItem
    Dim mimeItem As NotesItem
    Dim mime As NotesMIMEEntity
    Dim child As NotesMIMEEntity
    Dim imgTag As String
    Dim altTag As String
    Dim imgCount As Integer
    Dim x As Integer
    Dim y As Integer
    Dim z As Integer
    Set rtitem=doc.GetFirstItem(fieldName)
    If (rtitem Is Nothing) Then
        Exit Function
    End If
    ' Get the convert document or create a new one if necessary
    currentSessionMimeSetting = session.ConvertMime
    session.ConvertMime=True
    Set db = session.CurrentDatabase
    If doc.MimeDocId(0)>"" Then
        Set newDoc=db.GetDocumentByid(doc.MimeDocId(0) & "")
    End If
    If newDoc Is Nothing Then
        Set newDoc=New NotesDocument(db)
        newDoc.Form=CONVERTMIMEFORM
    Else
        While newDoc.HasItem(FieldName)
            Call newDoc.RemoveItem(FieldName)
        Wend
    End If
    newDoc.subject=doc.subject(0) & " - images"
    Set rtitem2=New NotesRichTextItem(newDoc, FieldName)
    Call rtitem2.AppendRTItem(rtitem)
    Call newDoc.Save(True, True)
    noteId=RefreshDocFields(newDoc)
    doc.MimeDocId=noteId
    Set newDoc=Nothing
    ' Get the HTML
    ' With ConvertMime off, the MIME item is not converted back to rich text on load
    session.ConvertMime=False
    Set newDoc=db.GetDocumentById(noteID)
    Set mimeItem=newDoc.GetFirstItem(FieldName)
    If Not (mimeItem Is Nothing) Then
        If (mimeItem.Type=MIME_PART) Then
            Set mime=mimeItem.GetMimeEntity
            If Not (mime Is Nothing) Then
                Call mime.DecodeContent
                If (mime.ContentType="multipart") Then
                    ' Collect the text of every html part
                    Set child = mime.GetFirstChildEntity
                    While Not(child Is Nothing)
                        If child.ContentSubType="html" Then
                            Call child.DecodeContent
                            mText = mText & child.ContentAsText
                        End If
                        ' 1724 = SEARCH_BREADTH
                        Set child=child.GetNextEntity(1724)
                    Wend
                Else
                    mText=mText & mime.ContentAsText
                End If
            End If
        End If
    End If
    ' Restore the session setting saved at the start
    session.ConvertMIME=currentSessionMimeSetting
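    ' Converting the image tags
    ' imgCount is incremented before use, so the first image becomes part M2
    imgCount=1
    x = Instr(1,mText,"<img src=cid:")
    While x<>0
        imgCount=imgCount +1
        ' y = end of the current img tag, z = first character after " alt="
        y=Instr(x,mText,">")
        z=Instr(x, mText, | alt=|) +5
        If z>5 And z<y Then
            ' Reuse the existing alt text
            altTag=|"|+Mid$(mText, z, y-z)+|"|
        Else
            ' No alt attribute inside this tag: fall back to the part name
            altTag=|"M| + Cstr(imgCount) +|"|
        End If
        ' Avoid doubled quotes when the alt text was already quoted
        altTag=Replace(altTag, |""|, |"|)
        imgTag = |<img src="|+dbPath+|0/|+newDoc.universalId+|/|+ fieldName +|/M| + Cstr(imgCount) + |?OpenElement"|
        mText = Left(mText,x-1) + imgTag + | alt=| + altTag + | />| + Right(mText,Len(mText)-(y))
        ' The rewritten tag no longer matches, so the search can restart from position 1
        x = Instr(1,mText,|<img src=cid:|)
    Wend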
    mText=htmlTidy(mText)
    Goto finally
catch:
    Msgbox "Error " & Err & " in line " & Erl & ": " & Error$
    Resume finally
finally:
    ConvertRichText=mText
End Function
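To make the image rewriting concrete: each <img src=cid:...> tag ends up pointing at a MIME part of the convert document. Here is a minimal sketch of how that URL is assembled, following the concatenation used for imgTag in the loop above. The helper name buildImageUrl and the sample values are hypothetical; they are not part of the HtmlEngine library.

```lotusscript
' Hypothetical helper, not part of the HtmlEngine library.
' Mirrors the string concatenation used for imgTag in convertRichText.
Function buildImageUrl(Byval dbPath As String, Byval unid As String, Byval fieldName As String, Byval partNo As Integer) As String
    buildImageUrl = dbPath + |0/| + unid + |/| + fieldName + |/M| + Cstr(partNo) + |?OpenElement|
End Function

' Example with made-up values; dbPath is assumed to end with a slash.
' buildImageUrl("/blog.nsf/", "0123456789ABCDEF0123456789ABCDEF", "Body", 2)
' gives /blog.nsf/0/0123456789ABCDEF0123456789ABCDEF/Body/M2?OpenElement
```

This is why the convert document has to stay in the database: the browser fetches every inline image from it through such a URL.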
Tidying up the HTML to get valid XHTML
Based on what I found in BlogSphere and the IBM blog template, here's what I brewed to transform the HTML4 to valid XHTML:
(see the sample db in the zip file)
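The real htmlTidy is only in the sample database. As a rough idea of the kind of string replacement involved, here is a minimal sketch that closes the empty elements XHTML does not allow to stay open. The name htmlTidySketch and the choice of replacements are assumptions for illustration, not the code from the zip file.

```lotusscript
' Hypothetical sketch, NOT the htmlTidy from the sample database.
Function htmlTidySketch(Byval html As String) As String
    Dim tmp As String
    tmp = html
    ' XHTML requires empty elements to be explicitly closed.
    ' With the default Option Compare settings Replace is case-sensitive,
    ' so both spellings are handled separately.
    tmp = Replace(tmp, "<br>", "<br />")
    tmp = Replace(tmp, "<BR>", "<br />")
    tmp = Replace(tmp, "<hr>", "<hr />")
    tmp = Replace(tmp, "<HR>", "<hr />")
    htmlTidySketch = tmp
End Function
```

A full conversion needs more than this, for example quoting attribute values, which XHTML also requires.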
The result
This is the output of my test page. All valid XHTML. BTW: you have to scroll a very long way to get to the download, haven't you?
But here's the download. Use at your own risk. Don't come asking if something does not work.
Download
notesrichtext-to-html.zip (75 kB)
Comments
09/07/2007 00:05:41, Jan Schulz
Hi You!
I didn't like the 'unstructured' way this was done, so I replaced <br> with paragraphs, and if a whole line is in font size 3 or 4, it gets replaced by h2 or h1. This is the code (in htmltidy).
tmp = html
crlf=Chr$(13) & Chr$(10)
' tmp=Replace(tmp, "<br>", "<br />")
p=Split(tmp,crlf)
For x=1 To Ubound(p)
    ' first: if it has a <br> on the start, we have now a > there...
    If Left$(p(x),4) = "<br>" Then
        ' To be a headline, at the start must be a <font size=x and the next </font> must be at the end of the line
        If Left$(p(x),9) = "<br><font" Then
            If Not (Instr(5, p(x), "</font>" ) < Len(p(x))-7) Then
                ' ok, we are a line which might be a headline, but only if the size is right...
                tEnd = Instr(5,p(x),">")
                tmpx=Mid$(p(x),1,tEnd)
                If Instr(1,tmpx,"size=3") <> 0 Then
                    ' replace start with <h2> and next </font> with a </h2>
                    p(x) = "<h2>" + Mid$(p(x), tend+1, Len(p(x))-7-Len(tmpx) ) + "</h2>"
                    p(x)=Replace(p(x), "<em>","")
                    p(x)=Replace(p(x), "</em>","")
                    p(x)=Replace(p(x), "<strong>","")
                    p(x)=Replace(p(x), "</strong>","")
                Elseif Instr(1,tmpx,"size=4") <> 0 Then
                    ' replace start with <h1> and next </font> with a </h1>
                    p(x) = "<h1>" + Mid$(p(x), tend+1, Len(p(x))-7-Len(tmpx) ) + "</h1>"
                    p(x)=Replace(p(x), "<em>","")
                    p(x)=Replace(p(x), "</em>","")
                    p(x)=Replace(p(x), "<strong>","")
                    p(x)=Replace(p(x), "</strong>","")
                Else
                    ' just a paragraph...
                    p(x) = "<p>" + Mid$(p(x), 5) + "</p>"
                End If
            Else
                ' just a paragraph...
                p(x) = "<p>" + Mid$(p(x), 5) + "</p>"
            End If
        Else
            p(x) = "<p>" + Mid$(p(x), 5) + "</p>"
        End If
    End If
Next
tmp = Join(p, crlf)
x=Instr(1,tmp,"<font")
09/07/2007 00:10:27, Jan Schulz
Örks, that's not what I wanted...
Available here: http://www.stud.uni-karlsruhe.de/~urla/tidyhtml.txt
Enjoy!
19/09/2007 13:41:10, Declan Kelly
Hi Michel,
I'm getting a "Cannot find external name: HTMLPAGE" error when I try to run the "test-convert" agent. Any ideas? Deck
19/09/2007 13:48:54, Declan Kelly
Ooops, sorry Michel, I take that back... I didn't read the documentation first. Deck
29.10.2007 16:24:11, Brane mxm
I am copying the HTML back onto the original form, but the problem is that the field breaks my tags. Example: text text </(end of line) font>. Then the HTML doesn't recognize the </font> tag because it is split across two lines.
Help
08/13/2008 07:14:47 PM, Renate W Ravnaas
Hi! Your blog and download were very useful, but I have a question about rich text fields that contain one image with several rectangular hotspots. These do not seem to be converted properly. I just want to check with you whether your agent supports that. Or maybe someone else has looked into this? Thanks
29/09/2008 02:38:55, Carlos Collao
Dear Michell: Let me send you my congratulations on your sample... it is great. Michell, I customized a web app similar to TinyMCE: http://www.codestore.net/apps/tinymce3.nsf on my site. I don't know if you've seen it before, but the web app lets you save documents with rich text (using Tiny: http://tinymce.moxiecode.com/). I would like to know how I could implement a read-only web document. I mean, on the web I create documents using rich text... it saves formatted text only... it works fine (images... nothing) and the documents are stored in a view. Now I need to copy the rich text that I created before and paste it into, or create, another read-only web document in a view that anonymous users can see... example: http://www.rree.gob.pe/portal/boletinInf.nsf/WEBNotasPrensa?OpenView Please help me with your good advice. Regards, Carlos. PERU.
11/06/2008 04:49:31 PM, Muhammad Nasir Javed
Hi Michell,
Thanks for the great stuff. I used this in my project where I need to grab all the contents from a Notes email document as html and it worked really awesome.
I have one question: I want to use this code in a backend (scheduled) agent, but the RefreshDocFields function fails because it uses UI classes to convert the contents to HTML. Do you have any suggestion how I can achieve this on the server side?
I will post if I figure out a solution.
Thanks again for great stuff.
Best Regards, Nasir