How do I avoid Clojure's chunking behavior for lazy seqs that I want to short circuit?

Tag: clojure , chunking , lazy-sequences Author: echo1125 Date: 2010-07-19

I have a long, lazy sequence that I want to reduce and test lazily. As soon as two sequential elements are not = (or some other predicate) to each other, I want to stop consuming the list, which is expensive to produce. Yes, this sounds like take-while, but read further.

I wanted to write something simple and elegant like this (pretending for a minute that every? works like reduce):

(every? = (range 100000000))

But that does not work lazily and so it hangs on infinite seqs. I discovered that this works almost as I wanted:

(apply = (range 100000000))

However, I noticed that sequence chunking was resulting in extra, unnecessary elements being created and tested. At least, this is what I think this is what happening in the following bit of code:

;; Displays chunking behavior in groups of four on my system and prints 1 2 3 4
(apply = (map #(do (println %) %) (iterate inc 1)))

;; This prints 0 to 31
(apply = (map #(do (println %) %) (range)))

I found a workaround using take-while, and count to check the number of elements taken, but that is rather cumbersome.

Should I politely suggest to Rich Hickey that he make some combination of reduce and every? short circuit properly, or am I missing some obvious way that already exists?

EDIT: Two kind people posted solutions for avoiding chunking on the lazy sequences, but how do I avoid chunking when doing the apply, which seems to be consuming in chunked groups of four?

EDIT #2: As Stuart Sierra notes and I discovered independently, this isn't actually chunking. It's just apply acting normally, so I'll call this closed and give him the answer. I included a small function in a separate answer to do the reduce'ing part of the problem, for those who are interested.

Best Answer

CORRECTED TWICE: A simpler way to un-chunk a lazy sequence:

(defn unchunk [s]
  (when (seq s)
      (cons (first s)
            (unchunk (next s))))))

First version omitted (when ... so it returned an infinite seq of nil's after the input sequence ended.

Second version used first instead of seq so it stopped on nil.

RE: your other question, "how do I avoid chunking when doing the apply, which seems to be consuming in chunked groups of four":

This is due to the definition of =, which, when given a sequence of arguments, forces the first 4:

(defn =
  ;; ... other arities ...
  ([x y & more]
   (if (= x y)
 (if (next more)
   (recur y (first more) (next more))
   (= y (first more)))


That is better, but I'm still getting chunking (albeit only 4) when I do this: (apply = (map #(do (println %) %) (unchunk (range))))
This is not chunking. Compare (first (map #(doto % println) (unchunk (range)))) to (first (map #(doto % println) (range))). This is something with apply and/or =. Also note: you might want to use (defn unchunk [s] (when-let [s (seq s)] (cons (first s) (unchunk (rest s))))), see (unchunk [1 2 3]).
kotorak- Fixed the missing when. Now your examples work correctly.
Thanks to @chrishouser for correcting my mistakes.

Other Answer1

Looking in clojure.core at the definition of apply makes it obvious as to why it was chunking in groups of four when apply is used with an infinite sequence. Reduce doesn't short circuit, I'm left writing my own solution:

(defn reducep
  "Like reduce, but for use with a predicate. Short-circuits on first false."
  ([p coll]
     (if-let [s (seq coll)]
       (reducep p (first s) (next s))
  ([p val coll]
     (if-let [s (seq coll)]
       (if-let [v (p val (first s))]
         (recur p (first s) (next s))

Then using Stuart's unchunk (with an extra and)

(defn unchunk [s]
   (cons (first s)
         (and (next s)
              (unchunk (next s))))))

I get:

(reducep = (map #(do (print %) %) (unchunk (range)))) ;; Prints 01, returns false
(reducep = (map #(do (print %) %) (repeat 20 1))) ;; returns true
(reducep = (map #(do (print %) %) (unchunk [0 0 2 4 5]))) ;; Returns false
(reducep = (map #(do (print %) %) (unchunk [2 2 2 2 2]))) ;; returns true

If that works for you too, mod this up.

EDIT: Stuart's modified version of unchunk after his edit is probably preferable to the one in this earlier post.

Other Answer2

I found this post while hitting a time limit on a 4clojure problem and I found another way to avoid the 32-chunks:

;; add another dummy sequence parameter to the map:
(apply = (map #(do (prn %2) %) (range) (range)))

The higher arity forms of map don't appear to use chunked sequences (clojure 1.5)

You have to do something with the second parameter, so being explicit about that might be better:

(apply = (map (fn [i _] (prn i) i) (range) (range)))

This isn't as neat as the other solutions but might be good for quick and dirty uses, like testing "is this slow because of chunking?".

Regarding apply, you could use partition to get pairs from the sequence and = them:

(every? #(apply = %) (partition 2 1
    (map (fn [i _] (prn i) i) (range) (range))))

Although reducep looks useful too.

PS. I don't want to give the impression that the chunked sequence is slower, it's not. My issue is that the 4clojure test case calls "first" on my seq generating function over a range of values, so chunking means I do 32 times the work. (PPS. My code is still too slow)