Open
Description
The following code can be improved by better leveraging SQL:
// Calculate statistics based on the content size.
Tuple4<Long, Long, Long, Long> contentSizeStats =
    sqlContext.sql("SELECT SUM(contentSize), COUNT(*), MIN(contentSize), MAX(contentSize) FROM logs")
        .map(row -> new Tuple4<>(row.getLong(0), row.getLong(1), row.getLong(2), row.getLong(3)))
        .first();
System.out.println(String.format("Content Size Avg: %s, Min: %s, Max: %s",
    contentSizeStats._1() / contentSizeStats._2(),
    contentSizeStats._3(),
    contentSizeStats._4()));
Namely, SQL already supports calculating an average via the AVG function, so there is no need to fetch SUM and COUNT and divide them by hand. The improved code snippet could look as follows:
// Calculate statistics based on the content size.
Tuple3<Double, Long, Long> contentSizeStats =
    sqlContext.sql("SELECT AVG(contentSize), MIN(contentSize), MAX(contentSize) FROM logs")
        .map(row -> new Tuple3<>(row.getDouble(0), row.getLong(1), row.getLong(2)))
        .first();
System.out.println(String.format("Content Size Avg: %s, Min: %s, Max: %s",
    contentSizeStats._1(),
    contentSizeStats._2(),
    contentSizeStats._3()));
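One side effect of this change is that the printed average may differ slightly. The original snippet divides one Long by another in Java, which discards the fractional part, while AVG is read back here as a double. Below is a minimal plain-Java sketch of that difference, with no Spark involved. The class name ContentSizeStatsSketch, its two helper methods, and the sample sizes are all hypothetical and only serve as illustration. It assumes a non-empty input.

```java
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

// Hypothetical illustration only; not part of the project code.
public class ContentSizeStatsSketch {

    // Mirrors SUM(contentSize) / COUNT(*) done with Java long arithmetic:
    // the fractional part of the quotient is discarded.
    // Assumes sizes is non-empty (an empty array would divide by zero).
    static long truncatedAverage(long[] sizes) {
        LongSummaryStatistics stats = LongStream.of(sizes).summaryStatistics();
        return stats.getSum() / stats.getCount();
    }

    // Mirrors reading AVG(contentSize) as a double: the fraction is kept.
    static double exactAverage(long[] sizes) {
        return LongStream.of(sizes).summaryStatistics().getAverage();
    }

    public static void main(String[] args) {
        // Made-up content sizes whose sum (55) is not divisible by the count (3).
        long[] sizes = {10L, 20L, 25L};

        System.out.println(String.format("SUM/COUNT style avg: %s, AVG style avg: %s",
            truncatedAverage(sizes),
            exactAverage(sizes)));
    }
}
```

If the truncated value is what should be displayed, the double could be cast to long or formatted explicitly before printing.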