Skip to content

Scala Collection Views

jsuereth edited this page Nov 28, 2010 · 3 revisions

I'm doing investigation into why the .view/.force paradigm for collections does not bank out in situations where it should perform well, for example, it seems the following should be faster by converting to a view first and forcing to a collection later:

  val a = Array.range(1,1000000).map(_.toString)
  val views = a.view.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1).force
  // is slower than
  val collections = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
  // is slower than
  val closedstate = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

In the above, it would seem that the deferred execution of each row when using view, should eventually be inlined by the JIT and perform nearly equivalent to the lower piece of code. The reality is that using views is significantly slower than the imperative version.

Views are the best option currently when performing any sort of "take" operation, which is somewhat rare. Here's an example from SPerformance:

object ViewShootOut2 extends sperformance.dsl.PerformanceDSLTest {
  import collection.mutable.ArrayBuffer
  def setup(size : Int) = {
    val data : Iterable[Int] = ArrayBuffer.range(1,size)
    new Iterable[Int] {
      override def iterator : Iterator[Int] = data.iterator
    }
  }
  performance of "views" in {
    measure method "takeWithMap" in {
      withSize upTo 1000 withSetup setup run { col =>
        col.view.map(_ * 2).take(100).force
      }
    }
    measure method "takeWithFilter" in {
      withSize upTo 1000 withSetup setup run { col =>
        col.view.filter(_ % 2 == 0).take(100).force
      }
    }
    measure method "takeWithZipAndFilter" in {
      withSize upTo 1000 withSetup setup run { col =>
        col.view.zipWithIndex.filter(_._2 % 2 == 0).take(100).force
      }
    }
  }
  performance of "non-views" in {
    measure method "takeWithMap" in {
      withSize upTo 1000 withSetup setup run { col =>
        col.map(_ * 2).take(100)
      }
    }
    measure method "takeWithFilter" in {
      withSize upTo 1000 withSetup setup run { col =>
        col.filter(_ % 2 == 0).take(100)
      }
    }
    measure method "takeWithZipAndFilter" in {
      withSize upTo 1000 withSetup setup run { col =>
        col.zipWithIndex.filter(_._2 % 2 == 0).take(100)
      }
    }
  }
}

The resulting graphs are below.

The first graph shows calling filter and then take. As you can see, the view wins and limits the runtime cost to O( takesize ).

The second graph shows a similar win from views.

This is why it's odd that zipWithIndex causes such a massive slowdown of views. You can see here that the zipWithIndex version causes a large overhead that can eventually be overcome by the early-exit from take:

My theories on this revolve around the way zip is implemented on views, especially IndexedSeq views.

Clone this wiki locally