Downloading Web Resources with http.stream – basics

Introduction

The http.stream class is one of the Reggae stream classes, in other words a data source. In a chain of Reggae objects, an http.stream instance is always the first object and has only one port, an output. An http.stream object may also be used standalone, not connected to anything, just to retrieve any data resource reachable via the HTTP protocol, in particular with its GET and POST requests. From this point of view, http.stream is simply an embeddable HTTP/1.1 client with a simple yet powerful API. A brief list of its features is given below:

The class has some disadvantages, however. Some of them may be removed in future versions:

Minimal Example

If we skip all error handling, the whole process of downloading data via the HTTP protocol reduces to three lines of code:

#define DATA_LENGTH 7465      /* just example value */

UBYTE buffer[DATA_LENGTH];    /* place for data */
Object *http;

http = NewObject(NULL, "http.stream", MMA_StreamName, "www.morphzone.org", TAG_END);
DoMethod(http, MMM_Pull, 0, buffer, DATA_LENGTH);
DisposeObject(http);

We assume here that the http.stream class has been loaded previously with OpenLibrary() (see Opening and closing individual classes). The code downloads the first 7465 bytes of the MorphZone main page (HTML code), assuming no error occurs. This assumption is rather risky, because a network operation can fail for numerous reasons. If NewObject() fails, it returns NULL; the code then calls a method on a NULL pointer and later disposes NULL, which can even crash the application. For this reason http.stream offers a few ways of handling errors. They will be discussed later; for now, minimal error handling means checking the NewObject() result against NULL. This is used in a simple example that downloads the first 1000 bytes of a resource specified on the command line and dumps them to the console. Note that using this program for binary resources (like images) may result in rather weird output. I recommend running this example along with MediaLogger to learn about the protocol debugging features of http.stream.

Length of data

The usefulness of the above example is limited. It downloads only a predefined amount of data (or less, if the resource turns out to be shorter). Usually we want to download all the data, which implies getting its length somehow. A few scenarios are possible:

The length of data is known before downloading

This is the easiest but also the rarest case. It can be handled exactly as in the example from the previous section: a statically sized buffer and a single MMM_Pull() call.

The server sends a static file

In this case the server knows the size and passes it in the response header (the Content-Length field). The http.stream object extracts it automatically, and the data length may then be obtained by reading the MMA_StreamLength attribute. This means the length is known before the data is downloaded, so a buffer may be allocated dynamically. The attribute is 64-bit, so it should be read as follows:

QUAD length;

length = MediaGetPort64(http, 0, MMA_StreamLength);

This example shows MMA_StreamLength usage. It creates the object, queries the data length, allocates a buffer, downloads the data into the buffer and finally stores the buffer in a file.
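Turning a 64-bit length into a buffer needs a little care, because 0 means "unknown" and a very large value may not fit into a single allocation. The following is only a portable C sketch of that check, not Reggae code: alloc_for_length() and max_bytes are hypothetical names, and the length argument stands for the value read from MMA_StreamLength.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: 'length' stands for the value read from
   MMA_StreamLength, 'max_bytes' is an application-chosen upper
   limit for a single allocation. Returns NULL when a one-shot
   buffer is not possible. */
static void *alloc_for_length(int64_t length, size_t max_bytes)
{
  assert(max_bytes > 0);

  if (length <= 0)
    return NULL;                 /* 0 means the length is unknown */

  if ((uint64_t)length > (uint64_t)max_bytes)
    return NULL;                 /* too big for one buffer */

  return malloc((size_t)length); /* may still return NULL */
}
```

When the helper returns NULL, the application can still download the resource in blocks instead of in one piece.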

The server sends dynamically generated data

Such data is usually generated by a server-side script written in PHP or another language. In this case the server does not know the length a priori, so it switches to HTTP chunked transfer mode. The http.stream object handles this automatically and reports 0 as MMA_StreamLength, which means that the length is unknown. The only way to process such data is to download it in blocks in a loop until the object reports the MMERR_END_OF_DATA error code. The loop may look like this:

LONG chunk, error = 0;

while (!error)
{
  chunk = DoMethod(http, MMM_Pull, 0, buffer, BUFFER_SIZE);

  /* Do something with 'chunk' bytes of data in 'buffer'. */

  if (chunk < BUFFER_SIZE)
  {
    if (MediaGetPort(http, 0, MMA_ErrorCode) == MMERR_END_OF_DATA)
    {
      break;      /* downloading finished */
    }
    else
    {
      error = 1;  /* downloading failed */
    }
  }
}

The same loop, enhanced with progress and error reporting, is used in the complete example, which is a very simple, universal, Reggae-based HTTP downloader application. Interestingly, it deals properly with data longer than 4 GB, provided the filesystem holding the destination file is 64-bit.

It is important to note that RFC 2616, the HTTP specification, does not require static files to be served without chunked transfer. Conversely, the server is not forced to use chunked transfer for dynamically generated content. The assumption that a server will not use chunks for a file just because the file is static may therefore fail. The safe way is to always use the download loop and treat MMA_StreamLength as a hint only.
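The "length as a hint" approach can be sketched in portable C as follows. This is a minimal sketch under stated assumptions, not Reggae code: pull_fn, mock_src, mock_pull() and download_all() are hypothetical names, and the callback merely stands in for an MMM_Pull call followed by an MMA_ErrorCode check. The hint only pre-sizes the buffer; the loop itself ends when the source signals end of data, and a short read without that signal is treated as a failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pull call: copies up to 'size' bytes to
   'dst', returns the byte count, sets *eod to 1 at end of data. */
typedef size_t (*pull_fn)(void *ctx, unsigned char *dst, size_t size, int *eod);

/* Mock source backed by memory, used here instead of a network stream. */
struct mock_src { const unsigned char *data; size_t len, pos; };

static size_t mock_pull(void *ctx, unsigned char *dst, size_t size, int *eod)
{
  struct mock_src *s = ctx;
  size_t left = s->len - s->pos;
  size_t n = (size < left) ? size : left;

  memcpy(dst, s->data + s->pos, n);
  s->pos += n;
  *eod = (s->pos == s->len);
  return n;
}

/* Downloads everything. 'hint' (0 = unknown) only pre-sizes the buffer.
   Returns a malloc()ed buffer with the real length in *out_len,
   or NULL on failure. */
static unsigned char *download_all(pull_fn pull, void *ctx, size_t hint, size_t *out_len)
{
  size_t cap = hint ? hint : 4096, used = 0;
  unsigned char *buf;
  int eod = 0;

  assert(pull != NULL && out_len != NULL);
  buf = malloc(cap);
  if (!buf) return NULL;

  while (!eod)
  {
    size_t want, got;

    if (used == cap)              /* hint too small or unknown: grow */
    {
      unsigned char *bigger = realloc(buf, cap * 2);
      if (!bigger) { free(buf); return NULL; }
      buf = bigger;
      cap *= 2;
    }

    want = cap - used;
    got = pull(ctx, buf + used, want, &eod);
    used += got;

    if (got < want && !eod)       /* short read without end of data */
    {
      free(buf);
      return NULL;
    }
  }

  *out_len = used;
  return buf;
}
```

Calling download_all(mock_pull, &src, hint, &len) with a hint smaller than the real size still returns the complete data, because the buffer is doubled whenever it fills up before end of data is reported.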