A Tool I Made To Export From Notes to SQL (ETL)
As I watch all my customers leave Notes/Domino behind, all I can do is try to assist as best I can. As part of this process I've found myself creating a Notes-to-SQL export tool.
The tool is Notes-based and involves the following three-part process:
Step 1. Loop through and analyse every single document in the target database and build a list of all unique combinations of Form + Field + Data Type. For each combination it finds, a Notes document is created which lists the Form name, the Field name and the type of data (Text, Authors, Rich Text, Numbers etc.). There's a rough sketch of this loop after step 3 below.
The resulting list is then analysed and any duplicate fields (conflicting data types) or unnecessary fields are removed.
Step 2. Taking the results of step 1 it then builds the SQL tables (using the document's UNID as the table's primary key). For each form it creates a table of the same name. For each field related to that form it creates a column in the SQL table with the equivalent data type.
Step 3. Finally, step 3 populates the SQL tables with data by once again looping through ALL documents in the database and exporting each document to a new row in SQL, according to the map of form fields we've decided on.
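To make that more concrete, here's roughly how the step 1 loop could look as a Java agent running in the target database. It's a simplified sketch rather than the tool's actual code: the lotus.domino calls are the standard ones, but the "FieldMap" form name and the typeName helper are just illustrative.

```java
import lotus.domino.*;
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;

public class JavaAgent extends AgentBase {
    public void NotesMain() {
        try {
            Session session = getSession();
            Database db = session.getAgentContext().getCurrentDatabase();

            Set<String> seen = new HashSet<String>();  // unique Form + Field + Type combinations
            DocumentCollection all = db.getAllDocuments();
            Document doc = all.getFirstDocument();

            while (doc != null) {
                String form = doc.getItemValueString("Form");
                Vector items = doc.getItems();
                for (int i = 0; i < items.size(); i++) {
                    Item item = (Item) items.elementAt(i);
                    String key = form + "~" + item.getName() + "~" + item.getType();
                    if (seen.add(key)) {
                        // New combination: record it as a Notes document for later review
                        Document map = db.createDocument();
                        map.replaceItemValue("Form", "FieldMap");   // example form name
                        map.replaceItemValue("SourceForm", form);
                        map.replaceItemValue("FieldName", item.getName());
                        map.replaceItemValue("FieldType", typeName(item.getType()));
                        map.save(true, false);
                        map.recycle();
                    }
                }
                Document next = all.getNextDocument(doc);
                doc.recycle();  // essential when looping large databases
                doc = next;
            }
        } catch (NotesException e) {
            e.printStackTrace();
        }
    }

    // Translate the Notes item type constant into something readable
    private String typeName(int type) {
        switch (type) {
            case Item.TEXT:      return "Text";
            case Item.NUMBERS:   return "Numbers";
            case Item.DATETIMES: return "DateTime";
            case Item.RICHTEXT:  return "RichText";
            case Item.AUTHORS:   return "Authors";
            default:             return "Other(" + type + ")";
        }
    }
}
```

Step 2 then just walks the resulting FieldMap documents and issues a CREATE TABLE per form, and step 3 repeats the document loop, writing an INSERT per document.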
Once it's done we can then go into SQL and define relationships between parent/child documents by joining the tables in a Database Diagram. As an enhancement to the tool, defining these relationships could become part of the export process itself.
As part of the initial configuration of the tool you can tell it which Forms to ignore, which fields to ignore, which fields to always treat as UNIDs, etc.
Once the data is in SQL then you can do the T and L of the ETL process.
Attachments?
For attachments there's a special table called Files, which stores the UNID and filename of any files found during the export. The files themselves are detached to disk and the path to each one is also stored in the Files table. For any given row in any of the SQL tables you can then find the associated attachments by looking in the Files table for a matching UNID.
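To illustrate, here's a simplified sketch of how the attachment step could work for a single document. The export folder, the JDBC connection and the Files column names are placeholders rather than the tool's actual schema.

```java
import lotus.domino.*;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Vector;

public class AttachmentExporter {
    // Detach each attachment to disk and record it in the Files table
    public void exportAttachments(Session session, Document doc, Connection sql)
            throws NotesException, SQLException {
        String unid = doc.getUniversalID();
        Vector names = session.evaluate("@AttachmentNames", doc);

        for (int i = 0; i < names.size(); i++) {
            String fileName = (String) names.elementAt(i);
            if (fileName == null || fileName.length() == 0) continue;

            EmbeddedObject eo = doc.getAttachment(fileName);
            if (eo == null) continue;

            // Detach to disk, keyed by UNID so file names can't collide
            String path = "C:\\export\\" + unid + "_" + fileName;
            eo.extractFile(path);
            eo.recycle();

            // Record it so any SQL row can find its attachments by UNID
            PreparedStatement ps = sql.prepareStatement(
                "INSERT INTO Files (DocUNID, FileName, FilePath) VALUES (?, ?, ?)");
            ps.setString(1, unid);
            ps.setString(2, fileName);
            ps.setString(3, path);
            ps.executeUpdate();
            ps.close();
        }
    }
}
```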
Taking it To Market?
I've always known I need a product to sell. Could this be it? It's some way off being ready to sell directly off the shelf. However, with some support from yours truly I reckon it could be used in most situations or tailored to work where it doesn't quite fit.
If you're interested in using the tool to migrate data from Lotus Notes to SQL Server (or any other RDBMS system) then drop me a line on jake.howlett@rockalldesign.com.
In my experience, when customers move away from Notes they are often so frustrated that they don't want to spend a single penny on the old infrastructure (but countless amounts on the new).
Having said that, an SQL migration tool could still work, because even companies that keep Notes have an increasing demand for storing data in SQL (and little interest in XPages).
Someone who has done something similar comes to mind: Johannes Fiala, see
http://www.fwd.at/fwd2/notesdb/index.php
Midas probably also has some interesting capabilities in this area.
And for the rest there still is DECS.
In my case most customers keep Domino around for some time, so I recommend considering the new Domino 9 REST interface and getting the data from there. Where I work there's no Codestore fame, so hardly anyone wants to keep their existing data models.
Starting fresh is the new black here.
If you decide to do it I wish you all the best.
Reply
Jake, how do you handle multi-value fields and rich text? I'd be interested in hearing your approach for these data types.
Reply
In the application as it stands, multi-value fields are concatenated with a delimiter (such as "$$") and stored in one column. However, my plan is to move to a normalized table that stores the values separately.
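Roughly, the normalised version would look something like this; the FieldValues table and its column names are just placeholders for now.

```java
import lotus.domino.*;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Vector;

public class MultiValueExporter {
    // One row per value in a child table, instead of "val1$$val2$$val3" in one column
    public void exportItem(Document doc, Item item, Connection sql)
            throws NotesException, SQLException {
        Vector values = item.getValues();
        if (values == null) return;

        PreparedStatement ps = sql.prepareStatement(
            "INSERT INTO FieldValues (DocUNID, FieldName, Position, Value) VALUES (?, ?, ?, ?)");
        for (int i = 0; i < values.size(); i++) {
            ps.setString(1, doc.getUniversalID());
            ps.setString(2, item.getName());
            ps.setInt(3, i);                                   // keep the original order
            ps.setString(4, String.valueOf(values.elementAt(i)));
            ps.addBatch();
        }
        ps.executeBatch();
        ps.close();
    }
}
```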
As for Rich Text, I've not had to deal with that yet, luckily.
Reply
Good idea. I'm sure there are enough companies out there with data "stuck" in domino without the technical resources on hand to make it accessible that you would probably sell more than a few licenses. It can't hurt to publicize it and make it as "turn-key" as possible. And just because you sell a product doesn't mean you can't keep doing what you do, so I don't see a down side to marketing your tool. Put it out there and don't worry overly much if it succeeds right off. Getting market share is a long, hard slog requiring persistence and a perpetual positive attitude.
A suggested feature might be making it target system agnostic. You could do this with MSADO as your data layer with simple connection string changes between MSSQL, DB2 and ORACLE.
Reply
I would say that you don't need to concentrate on making the tool 'turn-key' as there are probably already products on the market that can do this.
Instead provide the source code with the tool so that users can customise it to their needs, and make this your selling point. You could add license conditions stopping people redistributing your code.
Reply
What ends up being the presentation layer most of the time? Do you design ASP pages for your SQL data? Do you put it all in Sharepoint lists?
Reply
In the project for which I wrote the app the target was MS Dynamics CRM. However, this part is just the E of ETL (Extract, Transform and Load). Where it ends up living shouldn't matter.
Reply
nitpicker:
In step 2, I would save the DocUNID as a char field and let the SQL DB generate a LongInt PK. Numeric keys are much faster in SQL than char keys.
Embedded Objects are hard.
@Keyth:
- Multi-value fields could be saved in their own m:n join table.
- For RichText I once pulled the content out as HTML with the Jakarta HTTP client from Domino. The NSF was running on a Domino server and I wrote a loop which fired HTTP GET requests for all the documents containing rich text.
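Very roughly, something like this (plain java.net here rather than the Jakarta client; the host, database path and "0" view reference are only examples, and the documents must be readable over HTTP):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RichTextFetcher {
    // Ask the Domino HTTP task to render the document, then keep the HTML
    public String fetchHtml(String unid) throws Exception {
        URL url = new URL("http://server/path/db.nsf/0/" + unid + "?OpenDocument");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        StringBuilder html = new StringBuilder();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            html.append(line).append("\n");
        }
        in.close();
        conn.disconnect();
        return html.toString();  // the rich text body still needs parsing out of the page
    }
}
```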
kind regards
Axel
Reply
Embedded objects as in embedded OLE objects.
There is C API magic to extract the file from those, but AFAIK the code to do it depends on the OLE object type.
Reply
Good point about the PK value. I did it with the UNID of the Notes doc as it's then easier to tie the tables together in a parent-child manner. I guess a LongInt could be used as the PK and the tables still joined on the UNIDs...
Speed was never really a factor though. The target SQL tables are just temporary containers for the (E)xtracted data, from where it's (T)ransformed before being (L)oaded into the target platform (ETL).
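For what it's worth, the two could co-exist. A sketch of what a step 2 table might look like with both keys, assuming SQL Server syntax (table and column names are illustrative):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class TableBuilder {
    // Identity BIGINT as the actual primary key, with the Notes UNID kept
    // as a unique char(32) column that the parent/child joins still use.
    public void createTable(Connection sql, String formName) throws SQLException {
        Statement st = sql.createStatement();
        st.executeUpdate(
            "CREATE TABLE [" + formName + "] (" +
            "  Id BIGINT IDENTITY(1,1) PRIMARY KEY," +
            "  DocUNID CHAR(32) NOT NULL UNIQUE," +
            "  ParentUNID CHAR(32) NULL" +      // response docs point at their parent's UNID
            ")");
        st.close();
    }
}
```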
Reply
Wouldn't it be better to offer a DBDC interface such that newly written applications using SQL could access the data currently held in Notes databases?
Reply
I think the point of these companies is to get away from Notes, not to keep it running :) Or am I misunderstanding your comment?
Reply
Excellent idea! I don't know the value of your tool in detail, but knowing you I am sure it is great quality. I would suggest extending it to other database types, at least SQL ones. In addition to your initiative, do you remember the hype a few years ago about the Puakma server, an SQL+Java alternative to Notes? I would appreciate knowing your point of view on such an approach. Also, do you know of an alternative that offers the same features as Notes, other than Puakma?
Reply
Hey Jake,
Just read your post. I did a near-identical setup to yours for a large data migration (over 5 million documents, from multiple Lotus Notes databases and forms) about 5 years ago, from Lotus Notes to IBM DB2.
As part of that migration we also migrated and sync'd data records from a number of MS Excel spreadsheets, MS Access databases, Lotus Approach databases, Lotus 1-2-3 spreadsheets and MS Word tables.
Had the same issues with the RichText migration. We chose to automatically print all the Notes documents to PDF files (to retain the format) and then just migrated the unformatted text.
We had many issues, as we were migrating multiple Lotus Notes databases (built by multiple dev's, over numerous years) and other data sources, which required a lot of data cleansing reports for missing fields, incorrect data types, etc...
Also, given that we were moving to a new system (IBM WebSphere and DB2), there was a lot of work required in data mapping between the old and new systems. We had to provide mechanisms for the SMEs to do this mapping, with the results being used for the final migration to the new system.
Feel free to give me a shout some time and we can compare notes. I may have a few tips and ideas which may help you.
Regards
Patrick
Reply
Jake,
you can't hide forever. We do miss you.
Reply
Jake, yes, we do miss you...
Reply
I reckon Jake is holding out until 14th Jan 2015 for the relaunch of this beloved blog :)
Reply
I chose a somewhat different approach. I modified the agent from the help documentation that goes through every Notes item in all documents of a database and tests for its type, then built/updated a view, and then basically exported to CSV. As for the files, I just wrote them to disk by creating folders named after the form, and within each a folder per UNID containing all the attachments pertaining to that doc.
Although I was able to fetch all the attachments, I did find it hard to work out exactly where all the embedded objects live.
Another thing to keep in mind in a Java agent is that, in order not to run out of memory, you have to recycle the objects you no longer need.
Reply
Surely it's time to return, Jake. Lotus Notes to SharePoint must be put on the agenda :-) plus the writing you did on front-end development techniques. With the sheer number of possibilities it's good to have such a dedicated forum, and your style of teaching was very helpful to me in the 100% Lotus Notes days.
Awaiting the next post.
Reply
+1
Luckily, I don't have to do any Sharepoint coding (and I sincerely hope I never will), but Jake on Sharepoint would still be better than codestore rusting away ...
Reply
I think Jake cashed in on this amazing tool, Jaap, the proof being the lack of posts since publicising it. The old soul is flush with cash and probably off to another delightful holiday on a Greek isle. ;-)
Reply
Hey Jake,
How about a post just to say what you're up to nowadays? Or maybe the continued pursuit of the perfect desk! lol
Reply