Domino 2.0 Rich Internet Applications with IBM Lotus Notes/Domino

Notes Rich Text to XHTML: the code

¡Ay, caramba! In my previous post, I talked about finding the Philosopher's stone: converting Notes Rich Text into valid XHTML. The stone turned out to be a mountain. But here's a solution. It is based on a script by Julian Robichaux, but my version probably only works with Notes 6.5 or higher.

How it works

To get the HTML from a Notes RichText field, you have to copy it into another RichText field that has the field option "Store contents as HTML and MIME" enabled.

I first tried to work directly on a field with this setting, as the IBM blog does, but I noticed that in my case the content of the field itself changed every time I edited it: line breaks were added, table information was lost, etc.

So I went for the same solution as Julian Robichaux and BlogSphere: creating a separate document with a copy of the RichText field and shaking it about until it spat out the required HTML.

Later, this separate document also serves as a ghost document from which the inline images are retrieved. I didn't find another way to do it.
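Addressing the images through the ghost document comes down to a URL built from the database path, the ghost document's universal ID, the field name and a part number, just as the image loop in convertRichText does (that loop starts counting at M2). A minimal sketch of that URL scheme as a separate helper, assuming the DbPath field holds a web path with a trailing slash such as `/blog.nsf/`; `buildImageUrl` is a hypothetical name, not part of the HtmlEngine library:

```lotusscript
' Hypothetical helper, not part of HtmlEngine: builds the Domino URL of the
' n-th MIME part of a field in the ghost document, using the same pattern
' as the image-tag loop in convertRichText.
' Assumes dbPath ends with "/" (for example "/blog.nsf/").
Function buildImageUrl(Byval dbPath As String, Byval unid As String, Byval fieldName As String, Byval partIndex As Integer) As String
    buildImageUrl = dbPath & "0/" & unid & "/" & fieldName & "/M" & Cstr(partIndex) & "?OpenElement"
End Function
```

For example, `buildImageUrl("/blog.nsf/", "0123ABCD", "Body", 2)` returns `/blog.nsf/0/0123ABCD/Body/M2?OpenElement`.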

The Form code

All the code is in a LotusScript library, which you have to include in the form's (Globals)Document > Options section:

Option Public
Option Declare
Use "HtmlEngine"

The conversion itself is triggered in the Document PostSave event:

Sub PostSave(Source As NotesUIDocument)
    Dim thisDoc As NotesDocument
    Dim html As String
    
    ' Back-end document behind the UI document that was just saved
    Set thisDoc = Source.Document
    
    ' Convert the "Body" rich text; the DbPath field supplies the URL prefix for the image links
    html = convertRichText(thisDoc, "Body", Source.FieldGetText("DbPath"))
    Call thisDoc.ReplaceItemValue("HtmlBody", html)
    
    ' Save the back end again (this also stores MimeDocId), then refresh the UI
    Call thisDoc.Save(True, True)
    Call Source.Reload
    
End Sub

The transformation script

The function convertRichText in the HtmlEngine library first copies the field into the convert document. Next it gets the HTML from that copy, and then it converts the image tags.

Function convertRichText(doc As NotesDocument, Byval fieldName As String,Byval dbPath As String) As String
    On Error Goto catch
    Dim session As New NotesSession
    Dim mText As String
    Dim db As NotesDatabase
    Dim newDoc As NotesDocument
    Dim noteID As String
    Dim currentSessionMimeSetting As Integer
    Dim rtitem As NotesRichTextItem
    Dim rtitem2 As NotesRichTextItem
    Dim docItem As NotesItem
    Dim mimeItem As NotesItem
    Dim mime As NotesMIMEEntity
    Dim child As NotesMIMEEntity
    Dim imgTag As String
    Dim altTag As String
    Dim imgCount As Integer
    Dim x As Integer
    Dim y As Integer
    Dim z As Integer
    
    Set rtitem=doc.GetFirstItem(fieldName)
    If (rtitem Is Nothing) Then
        Exit Function
    End If
    
    ' Get the convert document, or create a new one if necessary
    currentSessionMimeSetting = session.ConvertMime
    session.ConvertMime = True
    Set db = session.CurrentDatabase
    If doc.MimeDocId(0) > "" Then
        Set newDoc = db.GetDocumentByID(doc.MimeDocId(0) & "")
    End If
    If newDoc Is Nothing Then
        Set newDoc = New NotesDocument(db)
        newDoc.Form = CONVERTMIMEFORM
    Else
        ' Remove every copy of the old field before appending the new content
        While newDoc.HasItem(fieldName)
            Call newDoc.RemoveItem(fieldName)
        Wend
    End If
    newDoc.Subject = doc.Subject(0) & " - images"
    Set rtitem2 = New NotesRichTextItem(newDoc, fieldName)
    Call rtitem2.AppendRTItem(rtitem)
    Call newDoc.Save(True, True)
    ' RefreshDocFields (elsewhere in HtmlEngine, it relies on the UI classes) turns the copy
    ' into HTML/MIME and returns the note ID of the convert document
    noteID = RefreshDocFields(newDoc)
    doc.MimeDocId = noteID
    Set newDoc = Nothing
    
    ' Get the HTML: reopen the convert document without MIME conversion
    session.ConvertMime=False
    Set newDoc=db.GetDocumentById(noteID)
    Set mimeItem=newDoc.GetFirstItem(FieldName)
    If Not (mimeItem Is Nothing) Then
        If (mimeItem.Type=MIME_PART) Then
            Set mime=mimeItem.GetMimeEntity
            If Not (mime Is Nothing) Then
                Call mime.DecodeContent
                If (mime.ContentType="multipart") Then
                    Set child = mime.GetFirstChildEntity
                    While Not(child Is Nothing)
                        If child.ContentSubType="html" Then
                            Call child.DecodeContent
                            mText = mText & child.ContentAsText
                        End If
                        Set child=child.GetNextEntity(1724)
                    Wend
                Else
                    mText=mText & mime.ContentAsText
                End If
            End If
        End If
    End If
    session.ConvertMIME=currentSessionMimeSetting
    
    ' Convert the cid: image tags into URLs that point at the convert (ghost) document
    imgCount=1
    x = Instr(1,mText,"<img src=cid:")
    While x<>0
        imgCount=imgCount +1
        y=Instr(x,mText,">")
        z=Instr(x, mText, | alt=|) +5
        If z>5 And z<y Then
            altTag=|"|+Mid$(mText, z, y-z)+|"|
        Else
            altTag=|"M| + Cstr(imgCount) +|"|
        End If
        altTag=Replace(altTag, |""|, |"|)
        imgTag = |<img src="|+dbPath+|0/|+newDoc.universalId+|/|+ fieldName +|/M| + Cstr(imgCount) + |?OpenElement"|
        mText = Left(mText,x-1) + imgTag + | alt=| + altTag + | />| + Right(mText,Len(mText)-(y))
        x = Instr(1,mText,|<img src=cid:|)
    Wend
    mText=htmlTidy(mText)
    
    Goto finally
catch:
    Msgbox "Error " & Err & " in line " & Erl & ": " & Error$
    Resume finally
finally:
    ConvertRichText=mText
End Function
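The image-tag loop is the least obvious part of the function. A minimal sketch of the same string logic pulled out into a pure function, so it can be tried without a Notes document; `rewriteCidImages` and its `baseUrl` parameter are hypothetical. Like the original, it assumes Domino writes `<img src=cid:...>` unquoted, that `alt` (when present) is the last attribute before `>`, and that numbering starts at M2. Unlike the original, it stops on an unterminated tag instead of looping forever.

```lotusscript
' Hypothetical, UI-free version of the image-tag loop in convertRichText.
' baseUrl is everything before the part number, e.g. "/blog.nsf/0/<UNID>/Body/M".
Function rewriteCidImages(Byval html As String, Byval baseUrl As String) As String
    Dim x As Long
    Dim y As Long
    Dim z As Long
    Dim imgCount As Integer
    Dim altTag As String
    Dim imgTag As String
    
    imgCount = 1
    x = Instr(1, html, "<img src=cid:")
    Do While x <> 0
        imgCount = imgCount + 1              ' the first image becomes M2, as in the original
        y = Instr(x, html, ">")              ' end of this tag
        If y = 0 Then Exit Do                ' unterminated tag: stop instead of looping forever
        z = Instr(x, html, " alt=") + 5      ' start of the alt value (5 when not found)
        If z > 5 And z < y Then
            ' Keep the existing alt text: wrap it in quotes, then collapse doubled quotes
            altTag = |"| & Mid$(html, z, y - z) & |"|
            altTag = Replace(altTag, |""|, |"|)
        Else
            ' No alt attribute in this tag: fall back to the part name
            altTag = |"M| & Cstr(imgCount) & |"|
        End If
        imgTag = |<img src="| & baseUrl & Cstr(imgCount) & |?OpenElement"|
        html = Left$(html, x - 1) & imgTag & | alt=| & altTag & | />| & Right$(html, Len(html) - y)
        x = Instr(1, html, "<img src=cid:")
    Loop
    rewriteCidImages = html
End Function
```

For example, `<p><img src=cid:_1_ABC alt="Logo"></p>` with base URL `/blog.nsf/0/U1/Body/M` becomes `<p><img src="/blog.nsf/0/U1/Body/M2?OpenElement" alt="Logo" /></p>`.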

Tidying up the HTML to get valid XHTML

Based on what I found in BlogSphere and the IBM blog template, I brewed a function, htmlTidy, that transforms the HTML4 into valid XHTML. You can find it in the sample db in the zip file.
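htmlTidy itself ships only in the sample database. To give a feel for the kind of rewriting involved, here is a minimal sketch that is not the author's htmlTidy: it only turns the HTML4 empty elements `<br>` and `<hr>` into self-closing XHTML elements. `closeEmptyTags` is a hypothetical name; a real tidy step has much more to do, since XHTML also requires, for example, quoted attribute values and lower-case element names.

```lotusscript
' Hypothetical sketch, not the htmlTidy function from the sample database:
' rewrites HTML4 empty elements as self-closing XHTML elements.
Function closeEmptyTags(Byval html As String) As String
    Dim result As String
    result = html
    ' Upper-case variants are lower-cased too, because XHTML element names are lower case
    result = Replace(result, "<br>", "<br />")
    result = Replace(result, "<BR>", "<br />")
    result = Replace(result, "<hr>", "<hr />")
    result = Replace(result, "<HR>", "<hr />")
    closeEmptyTags = result
End Function
```

For example, `closeEmptyTags("a<br>b<HR>c")` returns `a<br />b<hr />c`.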

The result

This is the output of my test page. All valid XHTML. BTW: you have to scroll a very long way to get to the download, don't you?

But here's the download. Use at your own risk. Don't come asking if something does not work.

Download

notesrichtext-to-html.zip (75 kB)


Comments

  1. 09/07/2007 00:05:41, Jan Schulz

    Hi You!

    I didn't like the 'unstructured' way this was done, so I replace <br> with paragraphs, and if a whole line is in font size 3 or 4, it gets replaced by h2 or h1. This is the code (in htmltidy).

    tmp = html
    crlf = Chr$(13) & Chr$(10)
    ' tmp=Replace(tmp, "<br>", "<br />")
    p = Split(tmp, crlf)
    For x = 1 To Ubound(p)
        ' first: if it has a <br> at the start, we now have a > there...
        If Left$(p(x), 4) = "<br>" Then
            ' To be a headline, the line must start with <font size=x and the next </font> must be at the end of the line
            If Left$(p(x), 9) = "<br><font" Then
                If Not (Instr(5, p(x), "</font>") < Len(p(x)) - 7) Then
                    ' ok, this line might be a headline, but only if the size is right...
                    tEnd = Instr(5, p(x), ">")
                    tmpx = Mid$(p(x), 1, tEnd)
                    If Instr(1, tmpx, "size=3") <> 0 Then
                        ' replace start with <h2> and next </font> with a </h2>
                        p(x) = "<h2>" + Mid$(p(x), tEnd + 1, Len(p(x)) - 7 - Len(tmpx)) + "</h2>"
                        p(x) = Replace(p(x), "<em>", "")
                        p(x) = Replace(p(x), "</em>", "")
                        p(x) = Replace(p(x), "<strong>", "")
                        p(x) = Replace(p(x), "</strong>", "")
                    Elseif Instr(1, tmpx, "size=4") <> 0 Then
                        ' replace start with <h1> and next </font> with a </h1>
                        p(x) = "<h1>" + Mid$(p(x), tEnd + 1, Len(p(x)) - 7 - Len(tmpx)) + "</h1>"
                        p(x) = Replace(p(x), "<em>", "")
                        p(x) = Replace(p(x), "</em>", "")
                        p(x) = Replace(p(x), "<strong>", "")
                        p(x) = Replace(p(x), "</strong>", "")
                    Else
                        ' just a paragraph...
                        p(x) = "<p>" + Mid$(p(x), 5) + "</p>"
                    End If
                Else
                    ' just a paragraph...
                    p(x) = "<p>" + Mid$(p(x), 5) + "</p>"
                End If
            Else
                p(x) = "<p>" + Mid$(p(x), 5) + "</p>"
            End If
        End If
    Next
    tmp = Join(p, crlf)
    x = Instr(1, tmp, "<font")

  2. 09/07/2007 00:10:27, Jan Schulz

    Ugh, that's not what I wanted...

    Available here: http://www.stud.uni-karlsruhe.de/~urla/tidyhtml.txt

    Enjoy!

  3. 19/09/2007 13:41:10, Declan Kelly

    Hi Michel,

    I'm getting a "Cannot find external name: HTMLPAGE" error when I try to run the "test-convert" agent. Any ideas? Deck

  4. 19/09/2007 13:48:54, Declan Kelly

    Ooops, sorry Michel, I take that back... I didn't read the documentation first. Deck

  5. 29.10.2007 16:24:11, Brane mxm

    I am copying the HTML back onto the original form, but the problem is that the field breaks my tags, for example: text text </(end of line) font>. The HTML then doesn't recognize the </font> tag because it is split across two lines.

    Help

  6. 08/13/2008 07:14:47 PM, Renate W Ravnaas

    Hi! Your blog and download were very useful, but I have a question about rich text fields that contain one image with several rectangular hotspots. These do not seem to be converted properly. I just want to check with you whether your agent supports that. Or maybe someone else has looked into this? Thanks

  7. 29/09/2008 02:38:55, Carlos Collao

    Dear Michell: Let me send you my congratulations on your sample... it is great. Michell, I customized a web app similar to TinyMCE (http://www.codestore.net/apps/tinymce3.nsf) on my site. I don't know if you've seen it before, but the web app lets you save documents with rich text (using Tiny: http://tinymce.moxiecode.com/). I would like to know how I could implement a read-only web document. I mean, on the web I create documents using rich text; it saves formatted text only (no images) and works fine, and the documents are stored in a view. Now I need to copy the rich text I created before and paste it into, or create, another read-only web document in a view that anonymous users can see, for example: http://www.rree.gob.pe/portal/boletinInf.nsf/WEBNotasPrensa?OpenView Please help me with your good advice. Regards, Carlos. PERU.

  8. 11/06/2008 04:49:31 PM, Muhammad Nasir Javed

    Hi Michell,

    Thanks for the great stuff. I used this in my project, where I need to grab all the contents of a Notes email document as HTML, and it worked really well.

    I have one question: I want to use this code in a back-end agent (a scheduled agent), but the RefreshDocFields function fails because it uses the UI classes to convert the contents to HTML. Do you have any suggestion on how I can achieve this on the server side?

    I will post if I figured out any solution.

    Thanks again for great stuff.

    Best Regards, Nasir
