In Part II of this tutorial we'll look at creating our job in Talend Open Studio (TOS), but before we get started I want to cover a bit more technical background that we'll need along the way.
First off is the data file format. My weather software (Virtual Weather Station, or VWS) stores its data in a file with each value separated by a comma. This type of file is called a Comma Separated Values, or CSV, file. For this project to work you'll need to find out:
- Where on your PC your weather software stores its data file.
- The format of the file: is it a CSV file or something similar? This project will work with any text-based data file; you might just have to adjust things as you go along.
- The order of the fields in the file. You should be able to find this in your weather station software's documentation.
- The position of the "key field" in the file.
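If your software's documentation is vague about the field order, a quick look at the raw file helps. The Python sketch below is not part of the TOS job; it just prints each value in the first row next to its position so you can match fields to their meanings. The function name `describe_first_row` and the file path are hypothetical, and the sample row is made up apart from the date/time key taken from the VWS example.

```python
import csv
import io
from typing import Iterable, List, Tuple


def describe_first_row(lines: Iterable[str], delimiter: str = ",") -> List[Tuple[int, str]]:
    """Return (position, value) pairs for the first row of a delimited file.

    Positions are 1-based so they line up with "first field", "second field", etc.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    first_row = next(reader, None)  # None if the file is empty
    if first_row is None:
        return []
    return [(pos, value.strip()) for pos, value in enumerate(first_row, start=1)]


if __name__ == "__main__":
    # Hypothetical sample: only the first value (the date/time key) comes from
    # the VWS example; the other readings are invented for illustration.
    sample = io.StringIO("201003171442,12.5,81,1012.3\n201003171443,12.4,82,1012.3\n")
    for pos, value in describe_first_row(sample):
        print(f"field {pos}: {value}")

    # To inspect your real file instead (placeholder path, adjust to suit):
    # with open("path/to/your/datafile.csv", newline="") as f:
    #     for pos, value in describe_first_row(f):
    #         print(f"field {pos}: {value}")
```

If your file uses a different separator, such as semicolons or tabs, pass it as `delimiter`.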
The key field in this case is probably a field containing the date and time. In the VWS data file, for example, the key field is the first field and contains values in the format YYYYMMDDHHMI, such as 201003171442 (17 March 2010, 14:42). This is the key field because it uniquely identifies each row of data: you can't have two rows with the same date and time. Your data file might have the date and time split into two fields, in which case your key is the combination of the two. In database terms this key is called the primary key.
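To make the idea of a primary key concrete, here is a small Python sketch, again outside TOS, that decodes a YYYYMMDDHHMI key and checks that no key appears twice. The function names are hypothetical, and `key_columns` is an assumption that lets you treat two separate date and time fields as one combined (composite) key.

```python
from datetime import datetime
from typing import Iterable, List, Sequence, Tuple

KEY_FORMAT = "%Y%m%d%H%M"  # YYYYMMDDHHMI, e.g. 201003171442


def parse_key(key: str) -> datetime:
    """Turn a VWS-style key such as '201003171442' into a datetime."""
    return datetime.strptime(key, KEY_FORMAT)


def find_duplicate_keys(
    rows: Iterable[Sequence[str]], key_columns: Sequence[int] = (0,)
) -> List[Tuple[str, ...]]:
    """Return each key that appears more than once, in order of first repeat.

    key_columns holds 0-based column indexes; pass two, e.g. (0, 1), when the
    date and time live in separate fields and together form the key.
    """
    seen = set()
    reported = set()
    duplicates = []
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        if key in seen and key not in reported:
            duplicates.append(key)
            reported.add(key)
        seen.add(key)
    return duplicates


if __name__ == "__main__":
    print(parse_key("201003171442"))  # 2010-03-17 14:42:00
    rows = [["201003171442", "12.5"], ["201003171443", "12.4"], ["201003171442", "12.6"]]
    print(find_duplicate_keys(rows))  # [('201003171442',)]
```

An empty result means the chosen column(s) really do identify each row uniquely and can serve as the primary key.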
OK, that's enough chit-chat; let's get down to some work. The first thing to do is open Talend Open Studio. When you first open it you'll see something like this:

Before we can do anything we have to set up a repository by clicking the ... button indicated. The repository is simply where we'll store our project, which in this case will be locally on our PC:

Select a Local repository, enter your email address and choose where you want your workspace. I've put mine under my Documents folder, but you can leave yours in the default location if you wish. Click OK to set up the repository:

Now click the drop-down list, select "Create a new local project", then click Go!

Enter a name for your project; the description is optional. Then select the Generation language. Here we're using Perl, but you can use Java if you want. However, there'll be a step later on that is Perl-specific, so unless you know Java, stick with Perl. Don't worry that we'll be writing huge screeds of code: it's only one very small step that needs to be written in Perl. Finally, click Finish to create the project.

Click the Open button to open the project in TOS:

Once the workspace is open, it should look like this. The main areas we'll be working with are:
- The Repository. Here we set up information about our data: where it's coming from and where it's going. Once this information is in the repository it can be used in other projects.
- The Palette. This will contain all the methods we can use to manipulate the data. It's empty now; once we create a job it'll be populated.
- The Job Design area. Here we'll manipulate the data; much more on this later.
- The tabbed area. Its many tabs let us configure our data objects and methods, and run our job.
Lastly, let's create a job. Right-click Job Designs (in the Repository, top left) and select "Create job":

Enter the job details. You have to give the job a name; the rest is optional.

When you click Finish you'll be returned to TOS. Your Palette will be populated with all the tools you can use, and the Job Design area will become an active grid.
That's all for now. In the next installment we'll be creating some data objects and taking our first look at them in TOS.