Split off the ingest module

Now, it gets interesting. In a Streams application, data flows from operator to operator on streams, which are fast and flexible transport links. The Streams application developer is not concerned with how these are implemented. They might work differently between operators running on different hosts, in different PEs on the same host, or in the same PE, but the logic of the graph stays the same. An application requires explicit source and sink operators to exchange data with the outside world through file I/O, database connections, TCP ports, HTTP REST APIs, message queues, and so on.

However, for Streams applications that run in the same instance, another mode of data exchange is possible: Exportand Import. An application can export a stream, which makes it available to other applications running in the instance. One or more applications can import such a stream based on flexible criteria. After exported streams are connected, they behave like all the other streams that run between PEs in an application, and they are fast and flexible. It’s only at the time a job is submitted or canceled that the runtime services get involved to see which links need to be made or broken. After that’s done, there is no difference in runtime behavior (well, almost none, but the difference is beyond the scope of this lab), and there is no performance penalty.

But there is a tremendous gain in flexibility. Application stream connections can be made based on publish-and-subscribe criteria, and this allows developers to design completely modular solutions where one module can evolve and be replaced, removed, or replicated without affecting the other modules. It keeps individual modules small and specialized.

In the lab so far, you built a monolithic application, but there is a logical division. The front end of the application from DirectoryScan to Throttle reads data, in this case from files, and replays that data in a controlled fashion to make it look like a real-time feed.

The rest of the app from Split to FileSinks performs analysis and writes out the results. If you split off the front end into a separate Ingest module, you can imagine that it’s easy to have another module alongside it or as a replacement that produces tuples that have the same structure and similar contents but that come from a completely different source. And that is exactly what this lab will do: add another module that reads a live data feed and makes the data available for processing by the rest of this application.

  1. In the graphical editor, drag a Composite (under Design in the palette) and drop it on the canvas outside of any graphical object, not on the existing main composite. The editor will call it Composite. Rename it to FileIngest.
    Notice that the new composite appears in the Project Explorer, but it does not have a build associated with it.
  2. Create a build for the new composite. In the Project Explorer, right-click the FileIngest main composite. Select New > Build Configuration. In the dialog, change the Configuration name to BuildConfig, accept all other defaults, and click OK.
  3. Move the three front-end operators from the old main composite to the new:
    1. In MyMainComposite, select the three operators FilesObservations, and Throttled. To do this, hold down the Ctrl key while you click each one. Cut them to the clipboard by pressing Ctrl+X or right-click and select Cut.
    2. Select the FileIngest Paste the three operators in by pressing Ctrl+V or right-click and select Paste.
      You now have two applications (main composites) in the same code module (SPL file). This is not standard practice, but it does work. However, the applications are not complete: you have broken the link between Throttled and IDChecker.

  4. Set up the new application (FileIngest) for stream export:
    1. In the palette, find the Export operator and drop one (not a template) into the FileIngest
    2. Drag a stream from Throttled to the Export Note that the schema is remembered even when there was no stream because it belongs to the output port of Throttled.
    3. Edit the Export operator’s properties. Rename it to FileExporter.
    4. In the Param tab, add the properties parameter. In the Value field for properties, enter the following:
      { category = "vehicle positions", feed = "sample file" }
      This action publishes the stream with a set of properties that are completely arbitrary pairs of names and values. The idea is that an importing application can look for streams that satisfy a certain subscription, which is a set of properties that need to match.
    5. The FileIngest application builds, but MyMainComposite still has errors.
  5. Set up the original application for stream import:
    1. In the palette, find the Import operator and drop it into the old main composite.
    2. Drag a stream from Import_11 to IDChecker.
    3. Assign a schema to this stream by dragging and dropping LocationType from the palette.
    4. Rename the new stream to Observations. There is already another stream called Observations, but it is now in a different main composite, so there is no name collision.
    5. Select the Import operator and rename it to Observations by blanking out the alias.
    6. In the Param tab, edit the value for the subscription parameter. Replace the placeholder parameterValue with the following Boolean expression:
      category == "vehicle positions"
      Notice that this is only looking for one property: the key category and the value “vehicle positions”. You can ignore the other one that happens to be available, because if the subscription predicate is satisfied, the connection is made as long as the stream types match.
    7. Save.
  6. Test the new arrangement of the two collaborating applications.
    1. In the Instance Graph, minimize or hide any remaining jobs. Don’t cancel them if you want to observe back-pressure later on. Set the color scheme to Flow Under 100 [nTuples/s]. Enlarge the view so you can easily see the two jobs.
    2. In the Project Explorer, launch the old application MyMainComposite.
    3. Launch the new application FileIngest.
      Notice that the tuples flow from operator to operator throughout the instance graph even though the application is divided into two main composites. Leave the two applications running. You’ll be adding two more.