Get started with IBM Streams



Course Details

IBM® Streams enables the continuous processing and fast analysis of potentially massive volumes of moving data to speed up business insight and decision making. Streams provides an execution platform for user-developed applications that ingest, filter, analyze, and correlate the information in streaming data. Streams helps you:

- Analyze data in motion: Provides sub-millisecond response times, allowing you to view and act on information and events as they unfold.
- Simplify development of streaming applications: Uses an Eclipse-based integrated development environment (IDE) called Streams Studio.
- Extend the value of existing systems: Integrates with your applications and supports both structured and unstructured data sources.

This course introduces you to IBM Streams so that you can start building applications that analyze real-time data.
This is an IBM Open Badge eligible activity. Visit the dW Course badges page for more information and to earn a knowledge badge for this course.

Learning objectives
Learn to use the components and features of IBM Streams and, more specifically, Streams Studio. You'll build and enhance a simple application based on a connected-car automotive scenario in which you track vehicle locations, speeds, and other variables. Although the underlying Streams Processing Language (SPL) code is always accessible, this course requires no programming and does not describe the syntax and other features of SPL.

This course shows you:

- The basics of stream computing, the fundamental concepts of IBM Streams, and the IBM Streams runtime environment
- How to use Streams Studio for creating and importing projects, submitting and canceling jobs, and viewing jobs, health, metrics, and data
- How to use the graphical editor to design and enhance a Streams application
- How to use the data visualization capabilities in the Streams Console

This course includes four labs that help you learn the capabilities of IBM Streams:

- Lab 1: Build a simple IBM Streams application. Then, run it in the Streams runtime environment and inspect the results.
- Lab 2: Enhance the application by adding the ability to read multiple files from a given directory, and slow down the flow so that you can watch things happen.
- Lab 3: Add an operator to your application to compute the average speed every five observations, separately for two cars. Use the Streams Console to visualize results.
- Lab 4: Use exported application streams to create a modular application. Bring in live vehicle location data. Show the live and simulated location data on a map.
Install the Quick Start Edition
You have different options for what to install, but it's recommended that you install the Quick Start Edition virtual machine. The following software is already installed on the Quick Start VMware image:

- Red Hat Enterprise Linux 6.8 (64-bit)
- IBM Streams Quick Start Edition 4.2.0.0, including Streams Studio

The Quick Start VMware image has the following configuration:

- Host name: streamsqse (streamsqse.localdomain)
- User and administrator ID: streamsadmin (logged in automatically)
- User home directory: /home/streamsadmin
- User password: passw0rd (password with a zero for the O)
- root password: passw0rd
- Streams domain: StreamsDomain (started automatically)
- Streams instance: StreamsInstance (started automatically)

In the Quick Start VMware image, a domain (StreamsDomain) and an instance (StreamsInstance) are already created and automatically started. This means that everything you need to run and test your applications is already prepared for you.

A domain is a logical grouping of resources in a network for the purpose of common management and administration. The domain is managed by a small number of Linux services (daemons) for tasks such as authentication and authorization, auditing, and supporting the Streams Console. A domain can contain one or more Streams instances that share the domain's security model. An instance provides the runtime environment to which you can submit applications. It consists of a small number of additional services, for example, a resource manager, an application manager, and a scheduler. The labs in this course do not explore the creation and administration of domains and instances.

System requirements:

- Operating system: A 64-bit operating system that supports VMware. VMware is supported on Apple Mac OS X, Linux, and Microsoft Windows.
- Memory: 8 GB minimum. The amount of memory that is required by IBM Streams depends on the applications that are developed and deployed.
This minimum requirement is based on the memory requirements of the Commodity...

Install the lab
After you start the Quick Start VM image, you need to install the lab projects, data files, and toolkits. You must have Internet access from your virtual machine (VM).

To install the lab package on the Linux VM:

1. From your VM, download the lab ZIP file (GitHub).
2. Extract the files to the streamsadmin home folder.
3. Double-click the IntroLab_Install.sh shell script. Then, click Run in terminal. Detailed progress messages are written to the IntroLab_Install.log file (in the same folder). The installation takes about five minutes, depending on the speed of your Internet connection, because the installation program builds a required toolkit, which involves downloading additional components.
4. When the installation is complete, press any key to terminate the script and close the terminal.

The script removes itself and the installation files archive, so after successful completion you have only the downloaded file, the installation log, and an uninstall script in your home folder (in addition to any files already there before you started). The uninstall script, IntroLab_4.1.1_Uninstall.sh, removes all installed files related to this lab, including toolkits, data, and desktop launchers. It does not remove any work you might have done in projects in your own Streams Studio workspace. Use this script if you want to clean up your environment later.

If the installation fails: check the installation log. After a failure, the script removes any files and directories it already installed. If you are not using the Quick Start VM and do not have Apache Ant (1.8 or later) or Apache Maven (3.2 or later) installed in your environment, the installation will fail. These two utilities are required to build the Internet Toolkit, which is used in the final lab. Explaining exactly how to install these utilities is beyond the scope of this document. Other causes of failure might be that files...

Explore Streams
The IBM Streams platform consists of the following components:

- Streams Studio: An Eclipse-based integrated development environment (IDE) for creating, compiling, running, visualizing, and debugging Streams applications.
- Streams Console: A web-based graphical user interface for managing and monitoring the runtime environment and applications. This is where you can work with application graphs.
- Streams runtime: A set of processes that work together to let you run stream processing applications on a group of host computers ("resources") in a cluster.
- Streams Processing Language (SPL): A declarative and procedural language and framework for writing stream processing applications.
- Development and management interfaces: APIs for creating building blocks (toolkits) and interfaces for interacting manually and programmatically with the runtime and applications.
- Toolkits: Packages of building blocks for developing applications. They provide operators, functions, and types for many kinds of analytics, and adapters for interfacing to external systems.

You can use an application graph to visualize the overall status of running applications and view information about application jobs, processing elements (PEs), operators, ports, and connections. You can also view or download log and trace data for components in the application graph. To select options for certain objects in the graph, such as a PE or an operator output port, right-click the object. The following image shows an IBM Streams application that's represented as a graph of connected operators.

To see the Studio interface:

1. On the VM desktop, double-click the Streams Studio icon.
2. Accept the prepopulated workspace (/home/streamsadmin/workspace) and click OK.

The Eclipse integrated development environment (IDE) is divided into multiple windows, which are called views:

- Project Explorer: Shows project contents and details.
- Streams Explorer: Shows domain and instance information, including any jobs that are running.
- Editing pane: Empty when Studio is started. This is where you edit code or create application graphs.
- Outline: When a Streams Processing Language (SPL) source file is open...


Lab 1: develop a simple application

Create a project
You're ready to start building your application. First, you'll create a Streams project, which is a collection of files in a directory tree in the Eclipse workspace, and then an application in that project. The New SPL Application Project wizard takes care of a number of steps in one pass: it creates a project, a namespace, an SPL source file, and a main composite. In Streams, a main composite is an application. Usually, each main composite lives in its own source file (of the same name), but this is not required. This lab does not explore composite operators or what distinguishes a main composite from any other composite.

To create a project and a main composite:

1. Click File > New > Project. Alternatively, right-click in the Project Explorer and select New > Project.
2. In the New Project dialog, expand IBM Streams Studio, and select SPL Application Project. Click Next.
3. In the New SPL Application Project wizard, enter the following information:
   - Project name: MyProject
   - Namespace: my.name.space
   - Main Composite Name: MyMainComposite
4. Click Next.
5. On the SPL Project Configuration panel, change the Toolkit name to MyToolkit. In the Dependencies field, clear Toolkit Locations. Click Finish.

Project dependencies: In the Dependencies field, you can signal that an application requires operators or other resources from one or more toolkits, a key aspect of the extensibility that makes Streams such a flexible and powerful platform. For now, you will use only building blocks from the built-in Standard Toolkit.

Check your results. You should see the following items in Streams Studio:

- The new project shows in the Project Explorer view on the left.
- The code module named MyMainComposite.spl opens in the graphical editor with an empty composite named MyMainComposite on the canvas.

Tip for adding space: If you want to give the editor more space, close the Outline view, and collapse the Layers and Color Schemes palettes. (This lab does not use them.)
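If you are curious what the wizard generated, opening MyMainComposite.spl with the SPL editor shows an empty main composite along these lines (a sketch; the exact boilerplate that Streams Studio emits may differ):

```spl
namespace my.name.space;

// A main composite is an application. The graphical editor will
// fill in type and graph clauses as you build the graph.
composite MyMainComposite {
}
```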
Review the Project Explorer
The Project Explorer shows both an object-based and a file-based view of all the projects in the workspace.

- In Project Explorer, note that you can expand and collapse MyProject by clicking the twisty on the left. The next level shows namespaces (only one, in this case), a dependencies entry, and resources (directories and files).
- Below the namespace, my.name.space, are main composites. Other objects, such as types and functions, if you had any, would appear there also. Under my.name.space, see the main composite MyMainComposite.
- The next level shows build configurations. Here, there is only one build configuration, named BuildConfig, which is created by default. You can create multiple builds, for debugging, different optimization levels, and other variations.
- Expand Resources. The next level shows a number of directories and two XML files that contain descriptions of the current application or toolkit. (In Streams, a toolkit and an application are the same in terms of project organization and metadata.) The only directory you will use in this lab is the data directory. By default, build configurations in Streams Studio specify this as the root for any relative path names (paths that do not begin with a forward slash "/") for input and output data files.

Default data directory: Streams applications do not have a default data directory unless you explicitly set one in the build specification. Here, you are simply taking advantage of a feature of Streams Studio, which provides that specification by default. It works because you have only a single host. Because Streams is a distributed platform that does not require a shared file system, you need to be careful when you specify file paths. A process accessing a file must run on a host that can reach it. In general, this means specifying absolute paths and constraining where a particular process can run. Using relative paths and a default...

Define a stream type
Rather than separately defining the schema (stream type) in the declaration of each stream, create a type first so that each stream can simply refer to that type. Keeping the type definition in one place eliminates code duplication and improves consistency and maintainability. Create a type named LocationType for the vehicle location data that will be the main kind of tuple flowing through the application. Use the following attributes:

- id (rstring): Vehicle ID (an rstring uses "raw" 8-bit characters)
- time (int64): Observation timestamp (milliseconds since 00:00:00 on January 1, 1970)
- latitude (float64): Latitude (degrees)
- longitude (float64): Longitude (degrees)
- speed (float64): Vehicle speed (km/h)
- heading (float64): Direction of travel (degrees, clockwise from north)

To define a stream type:

1. In the graphical editor, right-click anywhere on the canvas outside the main composite (MyMainComposite), and click Edit. You see a Properties view, which floats above all the other views. Be sure that it looks the same as the following screen capture, with the three tabs for General, Uses, and Types. If your Properties view does not look the same as the screen capture, dismiss the view and right-click again in the graphical editor outside the main composite.
2. In the Properties view, click Types.
3. Click Add New Type in the Name column. Enter LocationType, and then press Enter.
4. Click Add Attribute below LocationType in the Name column, and enter id. Press the Tab key to go to the Type column, and enter rstring.

   Tip for content assist: In the Type column, press Ctrl+Space to get a list of available types. Begin typing (for example, "r" for rstring) to narrow down the list. When the type you want is selected, press Enter to assign it to the field. This reduces keyboard effort and typing errors.

5. Press the Tab key to go to the next Name field. Enter the remaining attribute names and types listed in the table above. Leave the Properties view open.
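In SPL source, the type you just created in the Properties view corresponds to a declaration in the composite's type clause, roughly like this (a sketch of what the graphical editor generates for you):

```spl
composite MyMainComposite {
    type
        // Vehicle location tuple: id, timestamp (ms since the epoch),
        // position in degrees, speed in km/h, heading in degrees
        LocationType = rstring id, int64 time,
                       float64 latitude, float64 longitude,
                       float64 speed, float64 heading;
}
```

Because every stream in the application refers to LocationType by name, changing an attribute later means editing one declaration instead of every stream.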
Tip for obscured views: The floating Properties view might obscure other...

Create an application graph
You are now ready to construct the application graph. You will need the following data for this section:

- Input file: /home/streamsadmin/data/all.cars
- Output file: filtered.cars
- File format: CSV (both input and output); do not use quotation marks around strings on output
- Filter condition: vehicle ID is "C101" or "C133"
- Stream names: Observations (before filter), Filtered (after filter)

With this information, you can create the entire application. You will use the graphical editor; there will be no SPL coding in this lab.

Tip for seeing the code: If you want to see the SPL code for what you are creating, right-click anywhere in the graphical editor and select Open with SPL Editor.

To drop the operators that you want into the graph, you need to find them in the palette, which is the panel to the left of the canvas.

Operator templates: Some operators appear once in the palette. Others (the ones you will use) have twisties and expand into one or more subentries. These are templates: invocations of the operator with specific settings, for example, a Filter operator with a second output port for rejected tuples. In this lab, the generic version (with the twisty) is always the correct one. Don't use the templates.

Operator names: The editor generates placeholder names for the operators that you drag onto the canvas. These placeholders include the operator type ("FileSink") and a sequence number ("1"). The sequence number depends on the order in which the operators are added to the graph, and yours might not match this document. You can safely ignore that. It does not affect anything in the application, and in any case, you will change the generated names later to match the role each operator plays.

Organize layout and maximize in view: To organize the layout, click the Layout button in the...

Specify stream properties
The streams are what hold the graph together, so give meaning to them first; you will tell the operators how to do their jobs later. To assign a name and a type to a stream:

1. Select the stream (dashed arrow) connecting FileSource_2 and Filter_3. Sometimes you need to try a few times before the cursor catches the stream instead of selecting the enclosing main composite. The Properties view, which you used earlier to create LocationType, now shows stream properties. Reposition and resize the view if necessary so that it doesn't obscure the graph you're editing. If you closed the Properties view, double-click the stream to reopen it.
2. Enter descriptive stream names, which are preferable to the placeholder names generated by the editor, by clicking Rename in the Properties view (General tab). In the Rename dialog, under Specify a new name, enter Observations and click OK. Notice that this saves the file and starts a build. This is because renaming an identifier is a form of refactoring, which means that not only the identifier itself but also any references to it must be found and updated. This requires a compilation step to ensure that the code is consistent and all references are known.
3. Specify the stream schema. You can do that in the Properties view, but because you already created a type for this, you can also drag and drop it in the graphical editor like any other object:
   - In the graphical editor, clear the palette filter by clicking the Eraser button next to where you entered fil. This makes all the objects visible again.
   - Under Current Graph, expand Schemas. This shows the LocationType type and the names of the two streams in the graph.
   - Select LocationType and drag it into the graph. Drop it onto the Observations stream, which is the one between FileSource_2 and Filter_3. Make sure that the stream's selection handles turn green...
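After the rename and the drag-and-drop of the schema, the generated SPL declares the stream by name and type, approximately like this (a sketch; FileSource_2 is the editor's placeholder operator name, and the parameters are filled in later):

```spl
// The stream is now named Observations and carries LocationType tuples
stream<LocationType> Observations = FileSource() {
    param
        file : "..."; // set in the next section
}
```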

Specify operator properties
With the streams fully defined, it is time to configure the operators.

1. In the graphical editor, select FileSink_1. In the Properties view, click the Param tab. This shows one mandatory parameter, file, with a placeholder value of parameterValue (not a valid value, hence the error marker).
2. Click the field that says parameterValue, type "filtered.cars" (with the double quotes), and press Enter. Note that this is a relative path: it doesn't start with "/", so this file will go in the data subdirectory of the current project, as specified by default for this application.
3. Click Add to add two more parameters. In the Select parameters dialog, select format and quoteStrings. You might need to scroll down to find them. Click OK.
4. For the value of format, enter csv. Do not use quotation marks; this is an enumerated value.
5. For the value of quoteStrings, enter false. Do not use quotation marks; this is a Boolean value.

The FileSource operator needs to know what file to read:

6. In the graphical editor, select the FileSource_2 operator. In the Properties view (Param tab), click Add.
7. In the Select parameters dialog, select file and format. Click OK.
8. In the value for file, enter "/home/streamsadmin/data/all.cars" (with quotes and exactly as shown, all lowercase). For format, enter csv.

You have to tell the Filter operator what to filter on. Without a filter condition, it simply copies every input tuple to the output:

9. In the graphical editor, select Filter_3. In the Properties view (Param tab), click Add.
10. In the Select parameters dialog, select filter, and click OK.
11. In the value field, enter the Boolean expression id in ["C101","C133"] to indicate that only tuples for which that expression evaluates to true should be passed along to the output. (An expression with the keyword in followed by a list evaluates to true only if an element of the list matches the item on the left.)
12. Save your changes.
The error markers disappear, and the application is valid. Make a few final changes. Select the FileSink operator again. Go back to the General tab in the Properties view. Rename the operator in...
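Once all three operators are configured and renamed, the SPL behind the graph amounts to roughly the following (a sketch; operator and stream names reflect the roles described in this lab rather than the generated placeholders):

```spl
composite MyMainComposite {
    type
        LocationType = rstring id, int64 time,
                       float64 latitude, float64 longitude,
                       float64 speed, float64 heading;
    graph
        // Read the vehicle observations from a CSV file
        stream<LocationType> Observations = FileSource() {
            param
                file   : "/home/streamsadmin/data/all.cars";
                format : csv;
        }
        // Keep only the two vehicles of interest
        stream<LocationType> Filtered = Filter(Observations) {
            param
                filter : id in ["C101", "C133"];
        }
        // Write the result as unquoted CSV, relative to the data directory
        () as Writer = FileSink(Filtered) {
            param
                file         : "filtered.cars";
                format       : csv;
                quoteStrings : false;
        }
}
```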

Run your application
You are now ready to run this program or, in Streams Studio parlance, launch the build.

1. In the Project Explorer, right-click MyMainComposite. You might need to expand MyProject and my.name.space.
2. Select Launch > Launch Active Build Config To Running Instance.
3. In the Edit Configuration dialog, click Apply, and then click Launch. You can set or change several options when you launch an application; for now, ignore them. The Streams Launch progress dialog appears briefly.
4. To see what happened, switch consoles from the SPL Build console to the Streams Studio console. In the Console view, click the Display Selected Console button on the right to switch consoles. The Streams Studio console shows that job number 0 was submitted to the instance called StreamsInstance in the domain StreamsDomain.

Because nothing else seems to have happened, you need to look for some result of what you've done. First, you'll view the job running in the instance. Then, you'll inspect the results.

5. Switch to the Streams Explorer, which is the second tab in the view on the left, behind the Project Explorer.
6. Expand the StreamsDomains folder, and the Resources and Instances folders under that. The Resources folder refers to the machines that are available to the domain. In this virtual machine, there is only one. Resource tags let you dedicate different hosts to different purposes, such as running runtime services or application processes. In this single-resource environment, this is not relevant.

Tip: For convenience, the Streams Jobs and Streams Instances folders repeat information from the Instances folders under the listed domains, but they lack further drill-down information.

You know that you have one instance, StreamsInstance, in the domain StreamsDomain. The entry tells you that it is the default instance (where launched applications run unless otherwise specified), that it is running, and its current wall-clock time. (You might need to widen the view to see the status.)
Expand default:StreamsInstance@StreamsDomain. This shows a...


Lab 2: understand the flow of data

Overview
In this lab, you will further develop the vehicle data filtering application and get a more detailed understanding of the data flow and of the facilities in Studio for monitoring and examining a running application. To make this easier, you will make two enhancements that let you see what is happening before the data runs out: you will slow the flow down (left to its own devices, Streams is just too fast), and you will make it possible to read multiple files. This is a general design pattern for development and debugging.

Prerequisites: If you successfully completed the previous lab, skip this section and go to the next step, "Add operators to enhance monitoring." If you did not successfully complete the previous lab, you can import a Streams project that has been prepared for you and that contains the expected results from Lab 1.

To import the Streams project:

1. In the Project Explorer, right-click the current project (MyProject), and select Close Project. This gets it out of the way for builds or name conflicts without deleting any files.
2. In the top Eclipse menu, click File > Import.
3. In the Import dialog, select IBM Streams Studio > SPL Project, then click Next.
4. Click Browse. In the file browser, expand My Home. Scroll down, expand Labs, select IntroLab, and then click OK.
5. Select MyProject1, and click Finish. This starts a build, but you don't need to wait until it finishes.
6. In the Project Explorer, expand MyProject1 and then my.name.space. Double-click MyMainComposite to open it in the graphical editor.
7. In the editor palette, right-click Toolkits. In the context menu, clear Show All Toolkits.
FREE
Add operators to enhance monitoring
Two new operators are needed to make your application easier to monitor and debug. The Throttle operator copies tuples from input to output at a specified rate rather than as fast as possible. The DirectoryScan operator periodically scans a given directory; for each new file that satisfies optional criteria, it sends out a tuple that contains the file's full path. Instead of using the palette's filter field to quickly pick out the operators you want, let's browse the full palette to achieve the same result.

1. In the graphical editor's palette, expand spl (under Toolkits), and then spl.adapter. Drag DirectoryScan into the main composite. The editor names the operator DirectoryScan_4.
2. Scroll down in the palette and expand spl.utility. Scroll down further and find Throttle. Drag and drop it onto the stream Observations, exactly as you did with the LocationType schema previously. (Make sure the stream is highlighted by green handles before you let go.) The operator will be called Throttle_5. The editor automatically connects the Observations stream to its input port and creates a new stream, with the same schema as Observations, from its output port to the input of Filtered. There is no need to adjust the schema of this new stream: the Throttle operator merely controls the rate at which tuples flow, without changing their contents.
3. To straighten out the graph, click Layout and Fit to Content.
4. Rename the new stream to Throttled. Rename the operator to the name of the stream by blanking out its alias. (That's in the General tab of the Properties view; review Lab 1 if you forgot how to get there.)
5. Drag a stream from the output of DirectoryScan_4 to the input of Observations.
6. Click Layout > Fit to Content. Your graph should look like this at this point:

Tip for input ports: The FileSource operator can have an input port, but it is optional. In the original...

Define the new stream and operator details
Now, you need to define the schema for the stream from the DirectoryScan operator and tell that operator where to look for files. The Observations operator now gets its instructions from an input stream rather than a static parameter, so you have to adjust its configuration. Finally, you need to tell the Throttle operator the desired flow rate.

1. The DirectoryScan operator's output port supports only one schema: a single attribute of type rstring, which will hold the full path to the file. You can call that attribute anything you like. Select the output stream from DirectoryScan_4 and rename it to Files.
2. In the Schema tab in the Properties view, click the first Name field (placeholder varName) and enter file. Press the Tab key to move to the next field (placeholder varType) and enter rstring. Remember to use content assist (Ctrl+Space) to reduce typing and avoid errors. Press Enter.
3. In the editor, select the DirectoryScan_4 operator. In the Properties view, go to the Param tab and set the directory parameter to the value "/home/streamsadmin/data". Remember to include the double quotation marks. Rename the operator (to Files) by removing its alias.
4. A FileSource operator knows which file or files to read either from a static parameter (called file) or from the tuples coming in on an input stream, but not both. Now that you are getting file names from a DirectoryScan operator, the file parameter you used previously is no longer needed; you'll get an error if you keep it. Select the Observations operator in the editor. In the Properties view (Param tab), click the file parameter and then click Remove.
5. The Throttle operator has a mandatory parameter for specifying the desired flow rate. It is a floating-point number with a unit of tuples per second. In the editor, select Throttled. In the Properties view (Param tab), click the Value field next to the rate parameter and enter 40.0. The decimal point is necessary to indicate...
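Taken together, the steps above turn the front of the pipeline into something like the following SPL (a sketch of the generated code, using the stream and operator names from this lab):

```spl
// Emit one tuple, carrying the full path, for each new file found
stream<rstring file> Files = DirectoryScan() {
    param
        directory : "/home/streamsadmin/data";
}
// The static file parameter is gone; file names now arrive
// on the input port from the DirectoryScan operator
stream<LocationType> Observations = FileSource(Files) {
    param
        format : csv;
}
// Slow the flow to 40 tuples per second so you can watch it
stream<LocationType> Throttled = Throttle(Observations) {
    param
        rate : 40.0;
}
```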

Monitor the application by using the instance graph
The Instance Graph in Streams Studio provides many ways to monitor what your application does and how data flows through a running job. This part of the lab explores those capabilities. There is much more to the Instance Graph than this section can cover, so don't hesitate to go beyond these instructions and discover more on your own.

1. Launch the application. In the Project Explorer, right-click the main composite (MyMainComposite) and select Launch > Launch Active Build Config To Running Instance. In the Edit Configuration dialog, click Apply if necessary, and then click Launch.
2. Maximize the Instance Graph view. You now have two running jobs: the one you just launched and the one from the previous lab. The old one is dormant; it's not getting any data. However, leave it running for now.
3. To the right of the Instance Graph, a layout options drop-down menu and two selection panes for Layers and Color Schemes let you control the display. Explore the various options.

The layout options control how the operators in the graph are grouped:

- By Composite: This is the default. You see two boxes representing the two main composites (that is, the two applications), and inside each composite, you see the operators that make up the application: three for the old job and five for the new one.
- By PE: A PE is a processing element, which is like an operating system process. Operators can be combined (fused) into a single PE. This couples them tightly and reduces communication latencies. Operator fusion is a performance optimization topic beyond the scope of this lab. In the preconfigured instance, the default behavior is to fuse all operators running on the same resource. This layout option shows two PEs, one for each job. It looks the same as the By Composite view.
- By Resource: Because the virtual...

View stream data
While developing an application, you often want to inspect not just the overall tuple flow, but also the actual data. Previously, you looked at the results file, but you can also see the data in the Instance Graph. This way, you don't need to add FileSinks whenever you want to capture the output of a particular operator. Let's look at the input to and output from the Filter operator to see whether it's working as expected.

1. In the Instance Graph, right-click the stream Throttled (output of the Throttled operator, input to Filtered). Select Show Data.
2. In the Data Visualization settings dialog, verify that the tuple type is what you expect (attributes id, time, latitude, longitude, speed, and heading) and click OK. A Properties view appears.
3. Repeat the previous step for the stream Filtered, between the operators Filtered and Writer.
4. Move and resize both Properties views so that you can see both tables and the Instance Graph. Notice that, as expected, the Filtered stream contains only tuples with an ID value of C101 or C133, whereas the Throttle output contains a greater mix of vehicle IDs.
5. When you have seen enough data, dismiss the two floating Properties views.
6. In preparation for the next lab, cancel all jobs. If you used the Filter graph button to hide a job in the Instance Graph, bring it back: click Filter graph, clear all options, and then click OK. Select all jobs in the Instance Graph by holding down the Ctrl key and clicking each one. Right-click one of them and click Cancel job. The Instance Graph should now be empty.

Lab 3: apply enhanced analytics

Overview
In this lab, you will enhance the app you've built by adding an operator that computes an average speed over every five observations, separately for each vehicle tracked. After that, you will use the Streams Console to monitor results.

So far, the operators you've used look at each tuple in isolation, and there was no need to keep any history. However, many analytical processes must remember some history to compute the desired results. In stream computing, there is no such thing as "the entire data set," but it is possible to define buffers that hold a limited sequence of consecutive tuples, for example, to compute the average of one or more numeric attributes over that limited subset of tuples. Such buffers are called windows. In this part, you will use an Aggregate operator to compute just such an average.

Prerequisites

If you successfully completed the previous lab, skip this section and go to Step 1. If you did not successfully complete the previous lab, you can continue with this lab by importing a Streams project that has been prepared for you and that contains the expected results from Lab 2. To import the Streams project:
1. In the Project Explorer, right-click the current project (MyProject or MyProject1) and select Close Project. This gets it out of the way for builds or name conflicts without deleting any files.
2. In the top Eclipse menu, click File > Import.
3. In the Import dialog, click IBM Streams Studio > SPL Project. Then, click Next.
4. Click Browse. In the file browser, expand My Home, scroll down, expand Labs, and select IntroLab. Click OK.
5. Select MyProject2 and click Finish. This starts a build, but you don't need to wait until it finishes.
6. In the Project Explorer, expand MyProject2 and then my.name.space. Double-click MyMainComposite to open it in the graphical editor.
7. In the editor palette, right-click Toolkits. In the context menu, clear Show All Toolkits.
Add a window-based operator
You will compute average speeds over a window, separately for vehicles C101 and C133. Use a tumbling window of a fixed number of tuples: each time the window collects the required number of tuples, the operator computes the result, submits an output tuple, discards the window contents, and is again ready to collect tuples in a now empty window. Window partitioning based on a given attribute means that the operator allocates a separate buffer for each value of that attribute: in effect, as if you had split the stream by attribute and applied a separate operator to each substream.

The specifications are summarized in the following table:
- Operator type: Aggregate
- Window specification: tumbling, based on tuple count, 5 tuples
- Window partitioning: yes, based on vehicle ID (id)
- Stream to be aggregated: Filtered
- Output schema: id (rstring), time (int64), avgSpeed (float64)
- Aggregate computation: Average(speed)
- Results destination: file average.speed

Add the two required operators:
1. In the graphical editor's palette filter box, enter agg. Drag an Aggregate operator into the main composite. The editor calls it Aggregate_6. This is your main analytical operator.
2. In the palette filter, enter filesink. Drag a FileSink into the main composite: FileSink_7. This will let you write the analytical results to a file.

Fold the two new operators into the graph by connecting one existing stream and adding another:
1. Drag a stream from Filtered to Aggregate_6. This means that Aggregate_6 taps into the same stream that Writer is already consuming, so the schema is already defined. This is indicated in the editor by a solid arrow.
2. Drag another stream from Aggregate_6 to FileSink_7. This stream does not yet have a schema, so the arrow is dashed.
3. Click Layout and Fit to Content.

Rename the new stream and operators:
1. Rename the stream to Averaged.
2. Rename the Aggregate operator to Averaged by blanking...
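The specified behavior of the Aggregate operator, a tumbling count-based window of 5 tuples partitioned by id, can be modeled in a few lines of Python. This is an illustrative sketch of the windowing semantics, not the SPL operator itself, and the sample data is invented:

```python
from collections import defaultdict

def partitioned_tumbling_avg(tuples, size=5):
    """Model of a tumbling count(size) window partitioned by id.

    Each vehicle id gets its own buffer. When a buffer reaches `size`
    tuples, one output tuple with the average speed is emitted and the
    buffer is discarded (the window "tumbles")."""
    buffers = defaultdict(list)
    out = []
    for t in tuples:
        buf = buffers[t["id"]]
        buf.append(t)
        if len(buf) == size:
            out.append({
                "id": t["id"],
                "time": t["time"],  # time of the last tuple in the window
                "avgSpeed": sum(x["speed"] for x in buf) / size,
            })
            buf.clear()  # tumble: start over with an empty window
    return out

# Ten interleaved observations yield one average per vehicle.
data = [{"id": v, "time": i, "speed": s}
        for i, (v, s) in enumerate(
            [("C101", 50), ("C133", 60), ("C101", 52), ("C133", 62),
             ("C101", 54), ("C133", 64), ("C101", 56), ("C133", 66),
             ("C101", 58), ("C133", 68)])]
results = partitioned_tumbling_avg(data)
```

With this input, C101's five speeds (50..58) average to 54.0 and C133's (60..68) to 64.0, each emitted as soon as that vehicle's fifth observation arrives.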

Explore the Application Dashboard
Let’s look more closely at your running application. While the Management Dashboard is designed for administrators, the Application Dashboard is more useful for developers. You can even set up your own dashboard by saving a set of cards in your preferred arrangement, with a query to focus on just the jobs that interest you.

In the title bar, click Management Dashboard > Open Dashboard > Application Dashboard.

Some of the cards are equivalent to similar ones in the Management Dashboard:
- Metrics Scatter Chart: shows the same information as PEs.
- Metrics Bar Chart: by default, shows the same information as Resources.

In addition, there are other cards with useful information:
- Summary card: shows at a glance the health or exception status of jobs, operators, streams, and congestion (and consistent regions, which this lab does not explore).
- Streams Tree: similar to the Streams Explorer in Studio.
- Streams Graph: similar to the Instance Graph in Studio; if you have more than one job running, you must expand twisties to see their graphs.
- Flow Rate Chart: shows the tuple submission rates of all source operators from all jobs.

The Flow Rate Chart is interesting. It shows sudden bursts of activity separated by periods of quiet. The source operator (FileSource, in this case) reads the file as fast as it can until it runs out of data. This fills the input port buffer of the Throttle, which slowly draws down that buffer at 40 tuples per second. At just about the right time, when the Throttle operator is almost out of data, the same file is reported to the FileSource again, which reads it in one sharp burst. The chart shows the flow rate at zero most of the time, with peaks of just over 600 tuples per second spaced 45 seconds apart. Note that the chart shows...

Monitor the domain with the Streams Console
In this section, you will learn about the Streams Console, a general-purpose, web-based administration tool for IBM Streams. You will explore various parts of the Console, such as the Application Dashboard.

The Streams Console

Each Streams domain has its own console environment. The console interacts with one specific domain at a time, based on its Streams Web Service (SWS) URL. In addition to managing and monitoring instances, resources, jobs, logging and tracing, and more, it serves as a simple data visualization tool. It is not intended to be a production-quality dashboard, but mainly a useful facility for monitoring applications and understanding data during development.

There are several ways to launch the Console: with a desktop launcher, or by looking up the URL and opening it directly in Firefox or any other browser on any machine with HTTPS access to the Streams environment. Normal user authentication and security apply. In this lab, you open it from within Studio.

To open the Streams Console:
1. In the Streams Explorer, expand Streams Domains.
2. Right-click StreamsDomain (the only domain listed) and select Open Streams Console.
3. In the Untrusted Connection page, expand I Understand the Risks and click Add Exception. If the Add Security Exception dialog appears, keep Permanently store this exception selected and click Confirm Security Exception.
4. Log in as user streamsadmin with the password passw0rd.

The initial view is the Management Dashboard, which monitors the domain from an administrator’s point of view. Each of the views, called cards, shows a specific type of object (PEs, jobs, instances, and so on) with a graphical view that lets you see at a glance what is going on. The image shows a snapshot highlighting some of the graphically depicted information. For example, the PEs card quickly shows which processing elements consume little memory and CPU (in the bottom left) and which consume...


Lab 4: develop modular applications

Add a test for unexpected data
A best practice is to validate the data that you receive from a feed. Data formats might not be well defined, ill-formed data can occur, and transmission noise can also appear. You do not want your application to fail when the data does not conform to its expectations. In this lab, you will be receiving live data.

As an example of this kind of validation, you will add an operator that checks one attribute: the vehicle ID (id). In the data file all.cars, all records have an id value of the form Cnnn (presumably, with “C” for “car”). Even though it doesn’t at the moment, assume that your application depends on this format; for example, it could take a different action depending on the type of vehicle indicated by that first letter (say, “B” for “bus”). Also, there might be a system requirement that all vehicle IDs must be exactly four characters. Rather than silently dropping a tuple that does not match requirements, it is better practice to save the “bad” data so that you can audit what happened and perhaps later enhance the application.

In summary, the vehicle ID (id attribute) specifications are as follows:
- First character: “C”
- Length: 4

Therefore, if any data comes in with an unexpected value for id, your program will shunt it aside as invalid data. Several operators can take care of this; which one you use is to some degree a matter of taste. You have already used one that works well, the Filter, so let’s use a different one here. The Split operator sends tuples to different output ports (or none) depending on the evaluation of an arbitrary expression. This expression can, but does not need to, involve attribute values from the incoming tuple. It...
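The routing that the Split operator performs here can be sketched in Python. The validity rule is the one stated above (first character "C", length exactly 4); the two lists stand in for the operator's two output ports, and the sample IDs are invented:

```python
def is_valid_id(vid: str) -> bool:
    # Spec from the lab: first character "C", exactly four characters.
    return len(vid) == 4 and vid.startswith("C")

def split_by_validity(tuples):
    """Sketch of a two-way Split: each tuple goes to exactly one
    output port depending on a predicate over its attributes."""
    valid, invalid = [], []
    for t in tuples:
        (valid if is_valid_id(t["id"]) else invalid).append(t)
    return valid, invalid

# "C101" is well formed; "B205" fails the first-character rule,
# and "C13" fails the length rule.
good, bad = split_by_validity([{"id": "C101"}, {"id": "B205"}, {"id": "C13"}])
```

Keeping the invalid tuples (rather than discarding them) is what lets you audit the bad data later, as the text recommends.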

Split off the ingest module
Now, it gets interesting. In a Streams application, data flows from operator to operator on streams, which are fast and flexible transport links. The Streams application developer is not concerned with how these are implemented. They might work differently between operators running on different hosts, in different PEs on the same host, or in the same PE, but the logic of the graph stays the same.

An application requires explicit source and sink operators to exchange data with the outside world through file I/O, database connections, TCP ports, HTTP REST APIs, message queues, and so on. However, for Streams applications that run in the same instance, another mode of data exchange is possible: Export and Import. An application can export a stream, which makes it available to other applications running in the instance. One or more applications can import such a stream based on flexible criteria.

After exported streams are connected, they behave like all the other streams that run between PEs in an application: they are fast and flexible. It is only when a job is submitted or canceled that the runtime services get involved to see which links need to be made or broken. After that’s done, there is no difference in runtime behavior (well, almost none, but the difference is beyond the scope of this lab), and there is no performance penalty. But there is a tremendous gain in flexibility. Application stream connections can be made based on publish-and-subscribe criteria, which allows developers to design completely modular solutions where one module can evolve and be replaced, removed, or replicated without affecting the other modules. It keeps individual modules small and specialized.

In the lab so far, you built a monolithic application, but there is a logical division. The front end of the application, from DirectoryScan to Throttle, reads data, in this case...
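The publish-and-subscribe idea can be illustrated with a small Python sketch. Export and Import actually support richer subscription expressions; this model assumes the simplest case, where a subscription is a set of name/value properties that must all be present on the exported stream (the property names below are invented for illustration):

```python
def matches(export_props: dict, subscription: dict) -> bool:
    """A subscription matches an exported stream when every name/value
    pair it asks for appears among the stream's exported properties.
    (Real Import subscriptions can be more expressive than this.)"""
    return all(export_props.get(k) == v for k, v in subscription.items())

# One exporter, two would-be importers (hypothetical property names).
exported = {"category": "vehicle positions", "feed": "nextbus"}
map_sub = {"category": "vehicle positions"}          # matches
weather_sub = {"category": "weather observations"}   # does not match

connected = matches(exported, map_sub)
not_connected = matches(exported, weather_sub)
```

Because matching happens at job submission or cancellation, a new importer can join or leave at any time without touching the exporting application, which is exactly the modularity the text describes.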

Add a live feed
Rather than building a live-data ingest application from scratch, you will import a Streams project that has already been prepared. This application uses an operator called HTTPGetXMLContent to connect to a web services feed from NextBus.com and periodically (every 30 seconds) download the current locations, speeds, and headings of San Francisco Muni’s buses and trams. That operator comes from a version of the com.ibm.streamsx.inet toolkit that is available only on GitHub. This toolkit was installed in your environment when you installed the lab files.

The application parses, filters, and transforms the NextBus.com data and makes the result look similar to the file data, although some differences remain. It exports the resulting stream with a set of properties that match the subscription of your processing application. When you launch the NextBus application, the connection is made automatically, and data flows continuously until you cancel the job.

To add a live feed:

1. Before you can use the NextBus project, tell Studio where to find the version of the com.ibm.streamsx.inet toolkit that it depends on:
   a. In the Streams Explorer, expand IBM Streams Installations [4.2.0.0] > IBM Streams 4.2.0.0 > Toolkit Locations.
   b. Right-click Toolkit Locations and select Add Toolkit Location.
   c. In the Add toolkit location dialog, click Directory and browse to My Home > Toolkits. (My Home is at the top; the dialog starts in the separate Root tree.) Select Toolkits and click OK.
   d. Click OK. If you expand the new location (Local) /home/streamsadmin/Toolkits, you see com.ibm.streamsx.inet[2.7.4]. This is different from the version of this toolkit that is installed with Streams (2.0.2), so the NextBus project can select the right one by version. The 2.0.2 version is under the location STREAMS_SPLPATH.
2. Import the NextBus project:
   a. In the top Eclipse menu, click File > Import.
   b. In the Import dialog, click IBM Streams Studio > SPL Project. Then, click Next.
   c. Click Browse and in the file browser, expand My Home and select Toolkits. Click OK.
   d. Select NextBus and click Finish.
3. Expand the project NextBus and the namespace ibm.streamslab.transportation.nextbus.
4. Launch the application NextBusIngest. You might need to wait...

Show location data on the map
The NextBus toolkit comes with another application that lets you view data in a way that is more natural for moving geographic locations: on a map. Similar to MyMainComposite, this application connects to the kind of stream exported by NextBusIngest and FileIngest. Without any further configuration, it can take the latitude and longitude values in the tuples, along with an ID attribute, and generate an appropriate map. The map is simple and is intended only as a quick method for learning about your data.

1. In the NextBus toolkit, launch NextBusVisualize. In the Edit Configuration dialog, scroll down to the Submission Time Values. Note the value of the port variable: 8080. Widen the Name column to see the full name.
2. In the Instance Graph, each of the two exported streams is connected to each of the downstream jobs. The arrows look a bit confusing, but if you select each of the branches, you can untangle them.
3. To open the map in Firefox, double-click the Live Map desktop launcher. Minimize the Studio window or move it out of the way to see it.

You will see a map of the San Francisco Bay Area with a large number of green bus markers crowding the city and blue cars concentrated in the downtown area. Use the map controls or mouse wheel to zoom in and pan (hold down the left mouse button to drag and center the map) so that you can see the individual vehicles. The buses jump around as their locations are updated. You can see that the map is live! The map refreshes every second, but remember that NextBus data is updated only every 30 seconds. If you zoom in far enough by clicking the zoom tool three times from the starting level, you can see the simulated cars from the file move...

Optional: Investigate back-pressure
This section builds on your exploration of the Streams Console in Lab 3. It assumes that you have kept the job from Lab 3 running for at least 40 minutes. To proceed, go back to the Application Dashboard in the Streams Console.

Because the file is read every 45 seconds and the throttled drawdown takes a little longer than that (47.55 s), the Throttle’s input buffer eventually fills up. If you let the job or jobs run long enough, a red square or yellow triangle shows in the PE Connection Congestion row of the Summary card. (The congestion metric for a stream tells you how full the destination buffer is, expressed as a percentage.) At the same time, the Flow Rate Chart shows more frequent, lower peaks: the bursts are now limited by the filling of the Throttle’s input buffer instead of by the data available in the file.

1. Hover over the information tool in the PE Connection Congestion row in the Summary card to find out exactly which PEs are congested and how badly. Also, notice how the pattern in the Flow Rate Chart now differs from when the job was young.
2. View the information panel that shows Observations > Observations at the top of the list. This means that congestion is observed on a stream called Observations at the output port of an operator of the same name. (You did that by removing the operator alias.)
3. Scroll down the right side of the panel to see the congestionFactor metric, which is at the maximum value of 100.

Note, however, that while Observations is the stream that suffers congestion, it is the next operator, named Throttled, that causes it. What will happen eventually if you let this run for a long time? Will the FileSource operator continue to read the entire file every 45 seconds? What happens to its input buffer on the port receiving the file...
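The arithmetic behind the congestion can be made explicit. Using only the figures quoted in the text (a re-read every 45 s, a drawdown of 47.55 s at 40 tuples per second), each cycle delivers slightly more tuples than the Throttle can drain, so the buffer must grow without bound until back-pressure kicks in:

```python
# Figures from the lab text.
rate = 40.0          # tuples per second drained through the Throttle
drain_time = 47.55   # seconds needed to drain one file's worth of tuples
period = 45.0        # seconds between re-reads of the file

tuples_per_file = rate * drain_time        # tuples arriving in each burst
drained_per_period = rate * period         # tuples leaving per 45 s cycle
net_growth = tuples_per_file - drained_per_period  # surplus per cycle

print(tuples_per_file, drained_per_period, net_growth)
```

Each burst delivers about 1902 tuples while only 1800 drain per cycle, a surplus of roughly 102 tuples every 45 seconds. That is why congestion appears only after the job has run for a while: the buffer fills gradually, and once it is full, back-pressure limits how much of each file the FileSource can read.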


Summary

More Courses by this Instructor