Optional: Investigate back-pressure

This section builds on your exploration of the Streams Console in Lab 3. It assumes that you have kept the job from Lab 3 running for at least 40 minutes. To proceed, go back to the Application Dashboard in the Streams Console.

Because the file is read every 45 seconds but the throttled drawdown takes slightly longer (47.55 s), the Throttle’s input buffer never fully drains: each cycle leaves roughly 2.55 seconds’ worth of tuples behind, so the backlog grows until the buffer is full. If you let the job or jobs run long enough, a red square or yellow triangle appears in the PE Connection Congestion row of the Summary card. (The congestion metric for a stream tells you how full the destination buffer is, expressed as a percentage.) At the same time, the Flow Rate Chart shows more frequent, lower peaks: the bursts are now limited by the filling up of the Throttle’s input buffer rather than by the data available in the file.
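
For reference, here is a minimal SPL sketch of the slice of the graph involved. The operator and stream names (DirectoryScan, FileSource, Observations, Throttle, Throttled) come from the lab; the schema, directory path, and throttle rate are illustrative assumptions, chosen only so that the arithmetic matches the 47.55 s drawdown.

```
// Minimal sketch of the congested slice of the graph; names match the lab,
// while the schema, path, and rate are assumptions.
composite BackPressureSketch {
    graph
        // Emits a file name each time the watched directory changes.
        stream<rstring file> Files = DirectoryScan() {
            param directory: "/some/data/dir"; // placeholder path
        }

        // Re-reads the whole file whenever a name arrives on its input port.
        stream<rstring line> Observations = FileSource(Files) {
            param format: line;
        }

        // Drains its input buffer at a fixed rate. At an assumed 40.0
        // tuples/s, a 1,902-line file takes 47.55 s to drain, longer than
        // the 45 s re-read period, so the buffer never fully empties.
        stream<rstring line> Throttled = Throttle(Observations) {
            param rate: 40.0;
        }
}
```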

  1. Hover over the information tool in the PE Connection Congestion row in the Summary card to find out exactly which PEs are congested and how badly.

    Also, notice how the pattern in the Flow Rate Chart differs from what you saw when the job first started.

  2. View the information panel; it shows Observations > Observations at the top of the list. This means that congestion is observed on the stream called Observations at the output port of the operator of the same name. (The stream and the operator share a name because you removed the operator alias.)
  3. Scroll down the right side of the panel to see the congestionFactor metric, which is at its maximum value of 100. Note, however, that while Observations is the stream that suffers the congestion, it is the downstream operator, named Throttled, that causes it.
    What will happen eventually if you let this run for a long time? Will the FileSource operator continue to read the entire file every 45 seconds? What happens to its input buffer on the port receiving the file names? How will the DirectoryScan operator respond?
    These questions are intended to get you thinking about a phenomenon called back-pressure, an important concept in stream processing. As long as buffers can absorb the peaks and valleys in tuple flow rates, everything continues to run smoothly. But if buffers fill up and are never fully drained, the congestion propagates toward the front of the graph, and something has to give: unless the flow can be slowed down at the source (as conveniently happens here), data loss is unavoidable.
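
    How a design “gives” is, in part, up to the developer. As a minimal illustration (not part of the lab code), SPL lets you attach a threaded port with an explicit congestion policy to an operator’s input: Sys.Wait blocks the upstream sender, which is the blocking back-pressure you are observing, while Sys.DropFirst and Sys.DropLast shed tuples once the queue is full. The rate and queue size below are assumed values.

    ```
    // Illustrative only: a threaded input port with an explicit congestion
    // policy on the Throttle invocation. Sys.Wait preserves every tuple by
    // blocking upstream; Sys.DropLast would keep the source flowing at the
    // cost of discarding data.
    stream<rstring line> Throttled = Throttle(Observations) {
        param
            rate: 40.0; // assumed value, as in the earlier sketch
        config
            threadedPort: queue(Observations, Sys.Wait, 2000); // or Sys.DropLast
    }
    ```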