Tuesday 25 December 2018

Managing the monthly Flow allocation using a scheduled FTP file grab

Running a flow for every new file in the FTP repository is eating away at my monthly allowance.
Therefore, since I'm not interested in real-time monitoring, a more efficient approach is to fetch files only once a day. Note that the FTP directory listing gives a body we can run a 'foreach' loop over, and then select only the wanted files whose filenames satisfy the condition.
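For comparison, here is roughly the same logic sketched in Python with ftplib, to be run once a day by whatever scheduler you prefer. The host, directory and filename prefix are made-up placeholders, not the actual source I'm fetching from.

```python
from ftplib import FTP

# Hypothetical host, directory and filename condition, for illustration only
FTP_HOST = "ftp.example.com"
REMOTE_DIR = "/data"
WANTED_PREFIX = "dx_"   # stand-in for "filenames that satisfy the condition"

with FTP(FTP_HOST) as ftp:
    ftp.login()                                 # anonymous login; pass credentials if needed
    ftp.cwd(REMOTE_DIR)
    for name in ftp.nlst():                     # directory listing: the body for the 'foreach'
        if name.startswith(WANTED_PREFIX):      # the filename condition
            with open(name, "wb") as local_file:
                ftp.retrbinary(f"RETR {name}", local_file.write)
```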
If you got this far, Merry Xmas!!!

Sunday 23 December 2018

FTP fetching and maintaining filenames with Microsoft Flow

In previous posts I looked at grabbing web content with Microsoft Flow on a scheduled trigger, and at triggering similar website grabs from RSS feeds. A more old-fashioned scenario involves downloading files from an FTP server. In that case the content already exists in the form of a file and has a filename, so there is no need to 'construct' a unique filename; we can simply name it the same way it is named on the server. On the other hand, this can be seen as an advantage over traditional FTP clients, in that we can actually rename our files based on our own convention rather than being forced to keep the original filenames.
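Outside Flow, the choice between keeping the server's filename and applying your own convention looks something like the Python sketch below. The host, remote filename and naming convention are assumptions for illustration only.

```python
from datetime import datetime, timezone
from ftplib import FTP

# Illustrative only: host, remote filename and naming convention are placeholders
with FTP("ftp.example.com") as ftp:
    ftp.login()
    server_name = "spots_latest.xml"        # the name as it exists on the server

    # Option 1: keep the original filename
    local_name = server_name
    # Option 2: rename using our own convention, e.g. a UTC fetch timestamp
    # local_name = datetime.now(timezone.utc).strftime("spots_%Y%m%dT%H%M%SZ.xml")

    with open(local_name, "wb") as f:
        ftp.retrbinary(f"RETR {server_name}", f.write)
```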
Finally, having changed my mind about downloading all files and wanting to introduce a condition, I find that I was wrong in my previous post: I can drag and drop the 'Create file' component inside the 'yes' branch of the condition.


Monday 17 December 2018

Operational Intelligence basics: Looking for data loss with a scheduled data feed

It is worrying how many people, from the novice visualisation enthusiast to the experienced data scientist, just assume all is well with the underlying dataset and go on to visualise it, feed it through algorithms, and so on.

My first use of Flow was to capture DX cluster data. I was requesting an XML file every few hours, each containing 500 records. Now, the web source is designed for near real-time monitoring, not for people like me to download a complete archive. So sometimes 500 records do not go far back enough to avoid a gap with the previous fetch, as shown on the morning of the 9th December below. Colouring by the filename (effectively the date and time of the fetch) helpfully shows that the gap corresponds to a colour change, which very possibly means there was data loss. On the other hand, the gap on the 12th December happens 'within' a particular colour band, i.e. in the middle of a fetched file, so it is probably a genuine lull in activity rather than data loss. I have since changed my flow to fetch more frequently.
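The same gap check can be expressed in a few lines of Python. This is only a sketch of the idea: it assumes each fetch has already been reduced to a list of record timestamps, which is not how the raw XML arrives.

```python
from datetime import timedelta

# `fetches` is assumed to be a list of (fetch_name, [record_timestamps]) tuples,
# ordered by fetch time; the names and the tolerance value are illustrative.
def find_possible_gaps(fetches, tolerance=timedelta(minutes=0)):
    gaps = []
    for (prev_name, prev_times), (name, times) in zip(fetches, fetches[1:]):
        if not prev_times or not times:
            continue
        if min(times) > max(prev_times) + tolerance:
            # no overlap with the previous fetch: likely data loss rather than quiet conditions
            gaps.append((prev_name, name, max(prev_times), min(times)))
    return gaps
```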

But what about the actual lack of activity? We can look in a bit more detail by adding frequency in tens of kHz on the vertical axis and being a bit more careful about the accuracy of the mark thickness on the horizontal axis, using the calculated field constant 1/(24*60), which is one minute expressed as a fraction of a day.
Now compare this slow dying down and picking up again to a data gap below:

Sunday 16 December 2018

Conditionals in Microsoft Flow for RSS feed processing

For the Flow aficionados amongst you, check out the Flow online conference videos on YouTube.
 
I've referred to the golden age of Web 2.0 at Yahoo! in a previous post. One of the cool products of that time was Yahoo! Pipes, which effectively provided pipework for RSS feeds.

With similar applications in mind, Microsoft Flow can be a worthy replacement. I needed to retrieve the web page linked from an RSS feed only if the title contains a particular keyword. See the screenshot below. The Flow web editor does not allow dragging and dropping an action into the alternative paths following the conditional. I guess this will become possible when editing Flows in Visio becomes available. This might look like a peculiar combination to those of us who think of Visio as a tool for drawing diagrams, but Visio already has SharePoint workflow functionality, and Microsoft Flow is set to replace SharePoint workflows, as you'll hear in the conference videos.
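For readers who prefer to see the logic as code rather than as a Flow screenshot, here is a rough standard-library Python equivalent. The feed URL and keyword are placeholders, and a plain RSS 2.0 structure (<channel><item> with <title> and <link>) is assumed.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/feed.xml"   # placeholder feed
KEYWORD = "propagation"                     # placeholder keyword

with urllib.request.urlopen(FEED_URL) as resp:
    root = ET.fromstring(resp.read())

for item in root.iter("item"):
    title = item.findtext("title", default="")
    link = item.findtext("link")
    if KEYWORD.lower() in title.lower() and link:    # the condition step
        with urllib.request.urlopen(link) as page:   # the 'get the linked web page' step
            html = page.read()
        # ...save `html` somewhere, much as the Flow's create file action would
```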


Monday 10 December 2018

Using Excel to turn XML format data into a table

Back in the early noughties, when I was trying to learn a bit about web development, XML was all the rage. These days it has been displaced in various applications by JSON and other formats. Configs, which should have been safe XML territory, are also threatened by JSON, with Microsoft Flow being an example of such use of JSON. Even so, a number of legacy uses of XML are still going, and it is worth being able to decipher it.

Of course you could simply open an XML file in an editor and do lots of manual editing to turn it into CSV. At the other end of the spectrum, scripting languages are good at parsing this format; I have found Python's ElementTree library (xml.etree.ElementTree) useful for that.
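As a rough idea of what that route looks like, here is a small ElementTree sketch that flattens a simple, flat XML file into CSV. The file names are placeholders, and it assumes every record has the same child elements, which holds for the DX cluster example but not for deeper XML.

```python
import csv
import xml.etree.ElementTree as ET

tree = ET.parse("spots.xml")                 # placeholder input file
records = list(tree.getroot())               # each child of the root is one record
if records:
    columns = [child.tag for child in records[0]]
    with open("spots.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        for rec in records:
            writer.writerow([rec.findtext(col, default="") for col in columns])
```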

A middle road exists, which is simply to drag and drop the file onto Excel. This functionality is hardly surprising given that the Excel file format itself is based on XML. Using this example data: http://dxlite.g7vjr.org/?band=vhf&limit=500&xml=1
Dismiss the style pop-up question, and you are then given a few options, followed by a warning about the schema. Sticking with all the defaults results in the desired table in Excel.
Things will of course be more complicated with more complex XML data, and it will be worth looking at the file in an editor and considering a proper script to convert it. But Excel can be a useful quick-and-dirty solution.

Saturday 8 December 2018

Fixing the flaws of Flow

In a previous post I complained about Flow not being able to compress files on the go. The issue there was that, as my files were not stored locally by default, once I started the archiving process with 7-Zip it would first trigger a local download of each file and then add it to the archive. With a flaky network, something would often disrupt the download, which would then mess up the archive. The workaround has been to specify that all the files in that folder are stored locally, thus decoupling the download from the archiving.

I know, not something to write home about. Let's look at the configuration of the HTTP component in more detail. Oops! There is an automatic decompression setting turned 'On' by default! Make sure it's off unless you really want decompression.

As I had a few failed runs, I've changed the retry policy from the default exponential one, which starts with an interval of a few seconds, to a fixed interval of quite a few minutes. I find this works better for a website that has gone down temporarily, especially given that I don't particularly need the flow to respond in real time. The fact that this can be done from inside the configuration, rather than by adding more steps to the flow, is something I learned from the great Serge Luca in his SharePoint Saturday presentation.
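As an illustration of the difference outside Flow, here is a small Python sketch of a fixed-interval retry around an HTTP fetch. The URL, retry count and interval are arbitrary values, not the actual settings in my flow.

```python
import time
import urllib.request
from urllib.error import URLError

def fetch_with_fixed_retries(url, retries=4, interval_minutes=10):
    """Retry a fetch a few times at a fixed, fairly long interval."""
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except URLError:
            if attempt == retries:
                raise
            time.sleep(interval_minutes * 60)   # fixed wait suits a site that is briefly down
```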