Improvement to the Log Analyzer SQL example #77

The following code can be improved by better leveraging SQL:

```
// Calculate statistics based on the content size.
Tuple4<Long, Long, Long, Long> contentSizeStats =
    sqlContext.sql("SELECT SUM(contentSize), COUNT(*), MIN(contentSize), MAX(contentSize) FROM logs")
        .map(row -> new Tuple4<>(row.getLong(0), row.getLong(1), row.getLong(2), row.getLong(3)))
        .first();
System.out.println(String.format("Content Size Avg: %s, Min: %s, Max: %s",
    contentSizeStats._1() / contentSizeStats._2(),
    contentSizeStats._3(),
    contentSizeStats._4()));
```

Namely, SQL already supports calculating an average via the AVG function, so the sum and count no longer need to be fetched and divided by hand. The improved code snippet might look as follows:

```
// Calculate statistics based on the content size.
Tuple3<Double, Long, Long> contentSizeStats =
    sqlContext.sql("SELECT AVG(contentSize), MIN(contentSize), MAX(contentSize) FROM logs")
        .map(row -> new Tuple3<>(row.getDouble(0), row.getLong(1), row.getLong(2)))
        .first();
System.out.println(String.format("Content Size Avg: %s, Min: %s, Max: %s",
    contentSizeStats._1(),
    contentSizeStats._2(),
    contentSizeStats._3()));
```
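
A side effect of this change is that the printed average may differ slightly. The original snippet divides two `Long` values, which truncates the result, whereas the improved snippet reads the average with `getDouble`. The following plain-Java sketch illustrates the difference without Spark; the class name, method names, and sample sizes are made up for illustration and are not part of the Log Analyzer code.

```java
import java.util.Arrays;

public class ContentSizeAverage {
    // Mirrors the original approach: SUM / COUNT using two long values,
    // so any fractional part of the average is discarded.
    static long truncatedAverage(long[] sizes) {
        long sum = Arrays.stream(sizes).sum();
        return sum / sizes.length;
    }

    // Mirrors reading the average as a double, as the improved snippet
    // does with row.getDouble(0). Returns NaN for an empty input.
    static double exactAverage(long[] sizes) {
        return Arrays.stream(sizes).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        long[] sizes = {1L, 2L, 2L, 5L}; // hypothetical content sizes
        System.out.println("Truncated avg: " + truncatedAverage(sizes));
        System.out.println("Exact avg: " + exactAverage(sizes));
    }
}
```

With these sample values the sum is 10 over 4 entries, so the long division yields 2 while the floating-point average is 2.5.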
