Avoka Blog: Adobe LiveCycle

April 14, 2008

How Can You Interface a PDF Form to a Third-party Application?

Filed under: Designing Forms, LiveCycle — htreisman @ 9:36 am

A customer new to LiveCycle recently asked for some technical information about what’s required to interface a smart form to a third-party application.

Unfortunately (or perhaps fortunately), there are several ways of interfacing smart forms to a third-party application – the technique you choose depends on a number of factors, including exactly what you’re trying to achieve and the capabilities of the application.

I’ll try to go through some of them:

1. You can submit the data from a smart form directly into your application via HTTP or HTTPS (this is similar to submitting an HTML form to the same application). In order to do this, the application must already expose a web-based interface that accepts HTTP POST requests. It’s quite straightforward to create a button in a form that invokes an HTTP POST.
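
For example, in Designer you can either use the built-in HTTP Submit Button, or put something like the following in the click event of a regular button (script language JavaScript). This is a minimal sketch: the URL is a placeholder for your application’s POST handler, and cSubmitAs controls what gets posted.

    // Click event of a button in LiveCycle Designer (language: JavaScript).
    // The URL below is a placeholder for your application's POST handler.
    // event.target is the hosting document; submitForm() performs the HTTP POST.
    // cSubmitAs: "XML" posts just the form data, "PDF" posts the whole form.
    event.target.submitForm({
        cURL: "https://yourserver.example.com/forms/submit",
        cSubmitAs: "XML",
        cCharset: "utf-8"
    });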

2. You can submit the data directly from the form to the application via web services (i.e. SOAP). Again, in this case, the application must already expose the appropriate web services. Alternatively, you could write a web-service front-end to the application, using the tool of your choice. Calling a web service from within a form is reasonably straightforward, since a form can simply import the WSDL (Web Services Description Language); a short sketch follows the list below. There are some limitations:

  • Some types of web services are not supported by Reader
  • Some complex WSDL definitions may exceed the capability of Reader to invoke them easily. (You can drop down to JavaScript code, but this is quite a lot more difficult.)
  • There are some issues with secure web services, and with authenticating proxies.
  • It is quite fiddly to change the destination of the web service. For example, if you have two different web services, one for testing and the other for production, it generally requires two versions of the form, each pointing to a different destination.
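
As a rough illustration of the happy path, here is what calling a web service from a form can look like, assuming the WSDL has already been imported as a data connection in Designer (named “MyServiceConnection” purely as a placeholder) and the request and response fields have been bound to it:

    // Click event of a button (language: JavaScript).
    // "MyServiceConnection" is a placeholder for the name of the WSDL data
    // connection created in Designer (File > New Data Connection).
    // execute(0) runs the call without re-merging the data model;
    // execute(1) re-merges the response data back into the form.
    xfa.connectionSet.MyServiceConnection.execute(1);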

3. You can submit the data directly from the form to the application by sending an email. It’s quite easy to create a button in a form that will send either the form’s data or the full form as an attachment. Obviously, in this case the application will have to poll a particular mailbox and process the data that arrives.
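
The Email Submit Button in Designer does this for you; in script, it amounts to something like the following sketch, where the address is a placeholder for the mailbox that your application polls:

    // Click event of a button (language: JavaScript).
    // The mailto: address is a placeholder for the polled mailbox.
    // Submitting to a mailto: URL sends the data via the user's mail client;
    // cSubmitAs: "XML" attaches just the data, "PDF" attaches the whole form.
    event.target.submitForm({
        cURL: "mailto:forms.inbox@example.com?subject=Form%20submission",
        cSubmitAs: "XML"
    });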

4. You can submit the data from the form using any of the above methods (HTTP/SOAP/email) to an intermediate web application, which you build. The web application can optionally process the data, and then invoke the third-party application. You can build the intermediate application in whatever language/platform you like.
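
Just to illustrate how small such an intermediate application can be, here is a sketch of an HTTP endpoint that accepts the posted form data (Node.js here purely for illustration – as noted, any language/platform will do). The processing and the call to the third-party application are left as comments, since they depend entirely on your environment:

    // A minimal intermediate web application (Node.js, for illustration only).
    // It accepts the data posted by the form; the transformation and the call
    // to the third-party application would go where indicated.
    var http = require("http");

    http.createServer(function (req, res) {
        var body = "";
        req.on("data", function (chunk) { body += chunk; });
        req.on("end", function () {
            // "body" now holds the submitted form data (e.g. the XML).
            // Transform it and invoke the third-party application here.
            console.log("Received " + body.length + " bytes from the form");
            res.writeHead(200, { "Content-Type": "text/plain" });
            res.end("Thank you, your form has been received.");
        });
    }).listen(8080);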

5. You can submit the data from the form using HTTP/SOAP/email to the LiveCycle orchestration engine. You can then use any of the capabilities of the LiveCycle engine to process or manipulate the data, and then submit it to the third-party application. In order to communicate with the third-party application, you can either use one of the “out of the box” capabilities of the orchestration engine (e.g. database access, messaging, file system, web services, and many more), or you can create a custom component for this purpose.

The final option is by far the most powerful, and possibly the simplest to modify if requirements change. You will need LiveCycle Foundation; however, this is an “almost free” product that Adobe bundles with just about every LiveCycle product it sells. LiveCycle Foundation gives you a rich graphical, flowchart-like environment for defining exactly what should happen when the data from your form is submitted, and a wide variety of ways of integrating with your third-party application. A full discussion is beyond the scope of this short blog, but it is very powerful and easy to use. Please see Adobe’s web site for more details.

Avoka provides an add-on component for invoking a LiveCycle process directly as an HTTP or HTTPS request from a form. Please see our web site at http://avoka.dnsalias.com/confluence/display/Public/Avoka+Process+Invoker for details.

Another type of integration with a form is populating the form with data from the application – we’ll get to this another time.

Limitations of List Variables, Some Workflow Internals Exposed, and a Solution

Filed under: Designing Processes, LiveCycle — htreisman @ 9:30 am

On a recent project, we created a workflow to send an email to each of around 3000 users – the users’ email addresses and names came from a database table.

Our first attempt was to use the SQL component to read the users’ names and email addresses into list variables (separate lists for names and email addresses). Once we had the data in lists, it was a reasonably easy workflow task to iterate over them:

  • Find out the size of the list using an XPath expression, and store it in
    • /process_data/@count
  • Define an integer counter variable, /process_data/@index
  • Define a simple loop in the workflow, indexing each element in the lists like this:
    • /process_data/email_list[/process_data/@index] and
    • /process_data/name_list[/process_data/@index]
  • Loop with the following condition:
    • /process_data/@index < /process_data/@count

This all worked fine during simple testing, but once we started testing with large numbers of users, we suddenly experienced a massive performance degradation, around 2-3 seconds for every single step in the workflow. We suspected the very large list variables to be the culprit, but interestingly, even steps in the workflow that did not access the list variables were running slowly. While this was happening, both CPU and disk were maxing out. What was going on?

A little knowledge of what is really going on under the covers when a workflow runs explains what is happening. When a long-lived workflow runs, every single step of the workflow effectively runs as a separate, asynchronous step, generally within its own transaction. This means that before each step runs, the workflow engine must establish the “process context”, loading it from the database, and after each step, the process context must be persisted back to the database. The process context basically contains all information about the process, including the values of all variables. The process context is loaded/persisted, regardless of whether a particular variable is actually used by a particular step, because the process engine doesn’t actually know whether a particular variable is going to be used or not.
So what is going on is that we have two very large Java List variables that are being loaded into memory from the database and deserialized, and then serialized and stored back into the database, on each and every step – regardless of whether the list is actually needed in that step or not. For two large lists, this accounted for the 2-3 seconds of activity on each step.

How to solve the problem?

Our first approach was to simply change the workflow to a short-lived workflow. Short-lived workflows don’t persist their data; they run more like a regular Java method, simply creating all variables in memory at the beginning of the process and garbage-collecting them at the end – all variables are in memory all the time, the process context is made available to each of the steps, and there is no intermediate serialization or storage.

A short-lived workflow did indeed solve the problem, but introduced several new problems related to error recovery:

  1. If the process failed for any reason (and it did – every now and then, the SendEmail step would fail due to a very overworked mail server being too busy to respond), the process threw an exception, but all the data was discarded, and we had no record of how many emails had been sent. It was impossible to restart the stalled process, because short-lived processes don’t stall; they simply throw an exception and they’re done.
  2. We could simply restart the failed process, but then some users would get two emails – not a good look.
  3. The process took several hours to complete, and this exceeded the default transaction timeout on the workflow. LiveCycle assumes that short-lived workflows are transactional by default. This can be fixed by modifying the workflow’s transactional properties, but it felt “wrong”.

Finally, we solved the problem by completely removing the list variables and adding a status column to the database table. The logic in our long-lived workflow was something like this (a plain-code sketch of the same loop follows the list):

  1. Using the database component, select a row using a query something like:
    “select id, name, email from usertable where status is null”.
    Retrieve only the first matching row into simple variables called “rowid”, “username” and “email”.
  2. Do stuff. In this case, this meant creating a document, and sending an email.
  3. Using the database component, update the previous row using a statement something like:
    “update usertable set status = 'sent' where id = ?”
    and set the parameter to /process_data/@rowid
  4. Loop. The next time around the loop, the query will pick up the second row, because the first row no longer has a null status. Repeat until no row is returned.
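
To make the loop concrete, here is the same logic written out as plain code. This is purely illustrative (the real process is drawn graphically in Workbench), and queryFirstRow, createDocument, sendEmail and updateStatus are hypothetical stand-ins for the database, document and email steps:

    // Illustrative JavaScript only - the real process is built graphically.
    // queryFirstRow, createDocument, sendEmail and updateStatus are
    // hypothetical helpers standing in for the workflow steps.
    var query = "select id, name, email from usertable where status is null";
    var row = queryFirstRow(query);       // step 1: first unsent row, or null
    while (row !== null) {
        createDocument(row.name);         // step 2: "do stuff"
        sendEmail(row.email);
        updateStatus("update usertable set status = 'sent' where id = ?",
                     row.id);             // step 3: mark the row as sent
        row = queryFirstRow(query);       // step 4: repeat until no row remains
    }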

This approach solves the entire problem very elegantly:

  • The status column ensures that you always know where you’re up to. It’s persisted in the database, so it is resilient to any type of failure.
  • There are no large collections slowing down the running of the process – performance was almost an order of magnitude faster.
  • If the workflow stalls at any step, you can just restart it, and things will continue where they left off.
  • Even if the workflow stalls in a weird way, where restarting it doesn’t actually cause it to continue (we have seen this occasionally), you can simply start another instance of the workflow, and it will pick up the next row in the database. The only danger is that one user may get the same email twice, although since it’s usually the mail server that stalls, not the database query, this is unlikely.

Notes:

  • We’re using a long-lived process, not because we have a user-step in the workflow, but because we want the auditability and reliability that long-lived processes deliver.
  • We believe (although we haven’t verified it) that Document variables have some “smarts” that Lists and Maps don’t have. In particular, they are cached in memory between steps. It is therefore probable that a Document variable (perhaps containing the XML data from the query) would not create the loading/storing problems that we experienced with Lists. However, the XML data would still need to be parsed into an XML document each time a row was needed, and for a large data set this could introduce delays. Overall, we believe that the above solution is superior.
