The input data in this case is a GnuCash accounts file, which looks like this:
<gnc:transaction version="2.0.0"> ... <trn:splits> <trn:split> ... <split:value>4270/100</split:value> <split:account type="guid">b5e2367d505a1fdabd9bf53b1b307a6a</split:account> </trn:split> <trn:split> <split:value>-4270/100</split:value> <split:account type="guid">9ded5f95a5c0aca5b019beec91282bcf</split:account> </trn:split> </trn:splits> </gnc:transaction>
Note that even though the file is XML, in this case there's no nested structure or other potentially difficult stuff - GnuCash stores the transactions flatly and sequentially. Therefore we don't really need to use a proper XML parser for this one.
The idea then is to scan through the file and, for a given account, pair up the positive and negative transactions for that account. For instance, given the transaction above, we'd expect to eventually find another transaction with the value -4270/100 for the account b5etc (and a transaction with the value 4270/100 for the account 9detc). If the counteracting transaction isn't found, the account is out of balance. It's easy to make a parser that can tell us which transactions are the guilty ones, instead of just summing everything up like GnuCash does: put the values in a map and you'll see exactly which ones cancel each other out.
Here's a simple but sufficient parser in pseudocode:
TARGET = 'b5e2367d505a1fdabd9bf53b1b307a6a' in_trans = False curr_val = -123 for line in open("input"): if line.startswith('<gnc:transaction'): in_trans = True elif line.startswith('</gnc:transaction'): in_trans = False elif in_trans: if line.startswith('<split:value>'): curr_val = line.replace('<split:value>', '').replace('</split:value>', '') elif line.startswith('<split:accounttype="guid">'): if line.find(TARGET) >= 0: process_value(curr_val) def process_value(val): # add it to a hash map, etc
Works, good enough.
The Clojure version is rather different. The following is based on the elevator example from The Joy of Clojure. I added simple line parsing and some state variables.
(def target-guid-pattern (re-pattern ".*b5e2367d505a1fdabd9bf53b1b307a6a.*")) (defn parse [lines] (letfn [(not-in-trans [[line & rest] result] #(cond (empty? rest) result (re-matches #"^[ \t]*<gnc.transaction.*" line) (in-trans rest result) :else (not-in-trans rest result))) (in-trans [[line & rest] result] #(cond (re-matches #"^[ \t]*</gnc:transaction.*" line) (not-in-trans rest result) (empty? rest) result ;; actually an error (premature end of input) :else (cond (re-matches #"^[ \t]*<split:value>.*" line) (let [val (-> line (.replaceAll "<split:value>" "") (.replaceAll "</split:value>" "") (.replaceAll " " "") (read-string))] ;; note: probably unsafe (in-trans rest (assoc result :last-value val))) (and (re-matches #"^[ \t]*<split:account type.*" line) (re-matches target-guid-pattern line)) (let [val (get result :last-value) vals (get result :values) new-vals (update-values vals val)] (in-trans rest (assoc result :values new-vals))) :else (in-trans rest result))))] (trampoline not-in-trans lines {:values {}, :last-value -1})))
(Note that in real code you don't want to use read-string for parsing random input. Since this is a quick hack, I'm using it here to easily turn the values into Clojure fractions, which are nice and exact.)
The idea is that the different states are represented by mutually recursive functions. Since the functions return ready-made functions that represent the next state, Clojure's trampoline can ensure that the recursive function calls happen in constant space. The resulting code turns out to be nicer than the simple if-else hack; the function representation of states is more natural. The performance, though obviously linear in this case, might be somewhat worse than the if-else parser's, but I haven't tested this.
A benefit of the functional representation is that it can be relatively easily extended into a more proper parser in case you do have arbitrary recursive structures to deal with. Of course using an existing parser where applicable will probably be a better solution, though a lot more boring.
To finish up, here's a simple update-values function that remembers the values not yet balanced:
(defn update-values [val-map val] (let [val-sign (if (neg? val) -1 1) val-abs (* val val-sign) curr-count (get val-map val-abs 0) new-count (+ curr-count val-sign) new-map (if (= 0 new-count) (dissoc val-map val-abs) (assoc val-map val-abs new-count))] new-map))
And to parse an input file, you can use slurp to read it into memory:
(def input (map #(.replaceAll % "\n" "") (re-seq #".*\n" (slurp "input")))) (parse input)
Or use a reader such as the one in duck-streams to read line by line:
(with-open [rdr (clojure.contrib.duck-streams/reader "input")] (parse (line-seq rdr)))
That's all for now - thanks for reading.
No comments:
Post a Comment