Work with complex Microsoft Excel files from Java (the low-budget way)

Posted on Jan 12, 2019

There are numerous ways to work with Excel Spreadsheets in Java. Apache POI looks like the way to go but if your spreadsheet contains complex formulas, cross-references and all this fancy stuff, you are pretty much on your own. I think Apache POI actually can do some formula computation but definitely nothing more advanced.

If Apache POI is sufficient for your needs, continue your happy life and stop reading this. If not, I hope you have some spare money to invest in JExcel (cheapest licence 700$). Let’s be real, JExcel is fucking expensive but it will save you from a lot of pain and suffering. It uses the Microsoft Excel COM API and provides a very nice API.

Sidenote: If you are able to switch to .NET: DO IT. .NET has a Office-COM-Wrapper already build in. Just include some fancy assembly (google it on your own!) and Microsoft.Office.Tools.Excel is at your service.

Another Sidenote: Read this if you consider server-side automation of office

The pain starts here. What is left for you, is to access the Microsoft Excel COM Interface on your own. You will ask yourself …

  1. How do I actually access COM-Objects from Java?
  2. Where can I look up COM-Interfaces?

Accessing COM-Objects from Java

Good luck googling that. There actually are some libraries for doing that but many of them are not maintained anymore and/or don’t work with complex COM-Interfaces like the one of Microsoft Excel.

While looking at all those libraries I had an idea which could have eased my pain: It should be possible to access .NET from Java (to use Microsoft.Office.Tools.Excel). After hours of trying to get jni4net to generate Java Interfaces from the Microsoft.Office.Interop.Excel.dll assembly, I gave up.

Java Native Access (JNA)

java-native-access/jna

JNA provides Java programs easy access to native shared libraries without writing anything but Java code — no JNI or native code is required. This functionality is comparable to Windows’ Platform/Invoke and Python’s ctypes.

At first, JNA doesn’t sound that well equipped for our needed task, but they actually support using COM Objects on Windows. Its API looks very complicated, but it does the job.

A very very nice person contributed Microsoft Office example code to the project. The code includes a very limited wrapper for some Excel and Word COM-Objects. Just have a look into the com.sun.jna.platform.win32.COM.util.office.excel namespace.

// snippet from MSOfficeExcelDemo
public class MSOfficeExcelDemo {
  public static void main(String[] argv) throws IOException {
    Ole32.INSTANCE.CoInitializeEx(Pointer.NULL, Ole32.COINIT_MULTITHREADED);
     try {
       testExcel();
     } finally {
       Ole32.INSTANCE.CoUninitialize();
     }
   }
 
   public static void testExcel() throws IOException {
     ComIApplication msExcel = null;
     Factory factory = new Factory();
     try {
       ComExcel_Application excelObject = factory.createObject(ComExcel_Application.class);
       msExcel = excelObject.queryInterface(ComIApplication.class);
       
       System.out.println("MSExcel version: " + msExcel.getVersion());
       
       msExcel.setVisible(true);
       
       // ...
     } finally {
       // Make sure the excel instance is shut down
       if (null != msExcel) {
         msExcel.Quit();
       }
       
       // Release all objects acquired by the factory
       factory.disposeAll();
     }
   }
}

If you are lucky, the existing interfaces match all your needs and you can proceed your happy life as before. If you are not lucky, the suffering continues. You need to create those ComInterface wrappers yourself.

Creating ComInterface wrappers is not hard, look at the already existing ones and you are good to go. The only problem left is to get to know the COM Objects you want to access, their ids, methods, return values and so on.

Looking up COM Interfaces

You may think there is a website for browsing COM Objects like in JavaDoc or similar. COM-Objects are a quite old technology, so there must be something.

You are wrong.

I found one way to look into those great COM-Objects. It is OLE/COM Object Viewer (Microsoft Ⓡ) aka oleview.exe.

OLE/COM Object Viewer is a very dated application included in the Windows SDK. On my Windows 10 64bit installation it is located at C:\Program Files (x86)\Windows Kits\10\bin\10.0.17763.0\x86.

Oh boy, I missed those endless tree views …

  • Open Type Libraries
  • Look for Microsoft Excel 16.0 Object Library (Ver 1.9)
  • DOUBLE CLICK IT

Welcome to the place of your dreams

At the right hand side, you see all those interfaces and objects in their naked glory. (They could at least have added syntax highlighting …) Now you are able to create your java-wrapper-interfaces on your own.

Creating Wrappers yourself

Example Code: ComIWorksheet

  • Find the object you want to wrap in oleview.exe
  • Get it’s uuid from the ‘code view’
  • Create a java interface with @ComInterface annotation (use uuid as iid)
  • Add getters for properties: @ComProperty (matches by name, no id needed)
  • Add methods: @ComMethod (also matches by name)
  • And you are done.