Reading the underscore Source Code | 黯羽轻扬

Preface

In front of the source code, there are no secrets.

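I don't remember where I heard this, but it seems to hold some truth. The last thing I read was Step, an asynchronous flow-control solution implemented in about 100 lines of code. My conclusion was that the API design is clever (it exposes just one API) while the flow control itself is average, so I didn't pick up any practical moves, only something a bit "mystical":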

The general idea is to not impose structure on the programmer, but rather help them write good callback based functions.

Following Node async flow control further, I came across TJ, the author of co. He has quite a reputation, so I dug up a little gossip:

haha wow I’m impressed that you found out I started with design! When I was a few years into design I was playing around with Flash which led me to scripting. Later when I was doing design work for a local company in Victoria I decided that if I was going to do side work that I would like to be able to do every aspect so I started coding. As far as the “how” – nothing special really, I don’t read books, never went to school, I just read other people’s code and always wonder how things work

(Quoted from TJ Holowaychuk's answer to "How did TJ Holowaychuk learn to program?" on Quora)

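So at the very least, source-reading therapy works. Whether what you pick up is a concrete move or an underlying principle, anything that lands in the basket counts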

I. Why read underscore?

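The source is short (about 1,500 lines)
It is said to help with learning functional programming (it is not useless for that, but it is still a long way from FP)
It is useful (my.js still lacks a set of collection utilities)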

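Reading 50 to 100 lines for an hour every morning, I finished in a little over a month, so the cost is not high

Since then my reliance on _ has been moderate: I can often think of a usable _.xxx() right away, but there are few cases where nothing else would do (except _.groupBy(), which I used yesterday and which really is convenient)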

II. Highlights

1. Callback wrapping

Many _.xxx() functions rely on callback wrapping, as follows:

  // Internal function that returns an efficient (for current engines) version
  // of the passed-in callback, to be repeatedly applied in other Underscore
  // functions.
  // Optimization for the context case: bind func to context up front (similar to bind)
  var optimizeCb = function(func, context, argCount) {
    // No context: return func as is
    if (context === void 0) return func;
    // argCount defaults to 3 when the third argument is omitted
    switch (argCount == null ? 3 : argCount) {
      // Exactly 1 argument
      case 1: return function(value) {
        return func.call(context, value);
      };
      // Exactly 2 arguments: value, other
      case 2: return function(value, other) {
        return func.call(context, value, other);
      };
      // Default, 3 arguments: item, index, arr
      case 3: return function(value, index, collection) {
        return func.call(context, value, index, collection);
      };
      // Exactly 4 arguments: accumulator, item, index, arr
      case 4: return function(accumulator, value, index, collection) {
        return func.call(context, accumulator, value, index, collection);
      };
    }
    // Any other argCount (for example more than 4): fall back to apply
    return function() {
      return func.apply(context, arguments);
    };
  };

  // A mostly-internal function to generate callbacks that can be applied
  // to each element in a collection, returning the desired result — either
  // identity, an arbitrary callback, a property matcher, or a property accessor.
  // A very useful callback generator; many public methods are built on top of cb
  var cb = function(value, context, argCount) {
    // No first argument: return the identity function, x => x
    if (value == null) return _.identity;
    // Function: return the context-bound callback
    if (_.isFunction(value)) return optimizeCb(value, context, argCount);
    // Object: return a matcher, obj => whether obj contains every key/value pair of value
    if (_.isObject(value)) return _.matcher(value);
    // Otherwise: treat value as a key and return the accessor obj => obj[key]
    return _.property(value);
  };
  _.iteratee = function(value, context) {
    // Returns a callback (one of the four cases in cb) that can be applied to each element of a collection
    return cb(value, context, Infinity);
  };

Wrapping whatever is passed in into a suitable callback saves a lot of trouble in the collection functions
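
To make the four branches of cb concrete, here is a minimal standalone sketch. The names toCallback and mapWith are hypothetical and not part of underscore, and the matcher branch is simplified to a === comparison of own enumerable keys, so treat it as an illustration of the idea only.

```javascript
// Hypothetical stand-in for cb(); this is not underscore's actual code.
function toCallback(value, context) {
  // null/undefined: identity
  if (value == null) return function (x) { return x; };
  // function: bind it to the given context
  if (typeof value === 'function') {
    return function () { return value.apply(context, arguments); };
  }
  // object: property matcher (simplified, shallow === comparison)
  if (typeof value === 'object') {
    var keys = Object.keys(value);
    return function (obj) {
      return keys.every(function (key) {
        return obj != null && obj[key] === value[key];
      });
    };
  }
  // anything else: property accessor
  return function (obj) { return obj == null ? void 0 : obj[value]; };
}

// Hypothetical helper: a map that accepts any of the four shorthand forms.
function mapWith(list, iteratee, context) {
  var fn = toCallback(iteratee, context);
  return list.map(function (item, index) { return fn(item, index, list); });
}

var users = [{ name: 'ann', age: 30 }, { name: 'bob', age: 25 }];
console.log(mapWith(users, 'name'));      // [ 'ann', 'bob' ]
console.log(mapWith(users, { age: 30 })); // [ true, false ]
console.log(mapWith([1, 2, 3]));          // [ 1, 2, 3 ]
console.log(mapWith([1, 2, 3], function (n) { return n * this.k; }, { k: 2 })); // [ 2, 4, 6 ]
```

The caller never has to write the accessor or matcher function by hand, which is where the convenience comes from.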

2. Use functions instead of simple values

  // Return all the elements that pass a truth test.
  // Aliased as `select`.
  // Filter: pick a subset, keeping what falls through the sieve
  _.filter = _.select = function(obj, predicate, context) {
    var results = [];
    // Convert the filter rule into callback(item, index, arr)
    predicate = cb(predicate, context);
    // Iterate and filter
    _.each(obj, function(value, index, list) {
      // Push into the result set when the predicate is truthy
      if (predicate(value, index, list)) results.push(value);
    });
    return results;
  };

  // Return all the elements for which a truth test fails.
  // The opposite of filter: keeps what stays in the sieve
  _.reject = function(obj, predicate, context) {
    return _.filter(obj, _.negate(cb(predicate)), context);
  };

As we can see, _.reject() is very simple: it negates the predicate and runs filter() again. _.negate(), which does the negation, is nothing mysterious either:

  // Returns a negated version of the passed-in predicate.
  // Wrap the predicate in one more layer that negates its return value
  _.negate = function(predicate) {
    return function() {
      return !predicate.apply(this, arguments);
    };
  };

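The reason that negating predicate is enough to implement reject(), the opposite of filter(), comes down to a small functional programming trick:

Use functions instead of simple values wherever possible

Extracting the filter condition into a function, instead of passing in a series of primitive values and filtering with an internal if...else, brings great flexibility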
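
A minimal sketch of that trick outside underscore. filterBy, rejectBy and negate are illustrative names that I made up here; the point is only that once the condition is a function, the opposite operation costs one wrapper.

```javascript
// Standalone sketch, not underscore's code.
function negate(predicate) {
  return function () { return !predicate.apply(this, arguments); };
}

function filterBy(list, predicate) {
  return list.filter(function (value, index) { return predicate(value, index, list); });
}

// reject is just filter with the predicate flipped
function rejectBy(list, predicate) {
  return filterBy(list, negate(predicate));
}

function isEven(n) { return n % 2 === 0; }

var nums = [1, 2, 3, 4, 5, 6];
console.log(filterBy(nums, isEven)); // [ 2, 4, 6 ]
console.log(rejectBy(nums, isEven)); // [ 1, 3, 5 ]
```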

3. The power of function composition

Simply combining 2 functions is enough to implement fairly complex features:

  // Convenience version of a common use case of `map`: fetching a property.
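  // Pull the given property out of each object in the collection into a new array
  // Like looking up a table and taking one column
  _.pluck = function(obj, key) {
    // Map with y = prop(key)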
    return _.map(obj, _.property(key));
  };

  // Convenience version of a common use case of `filter`: selecting only objects
  // containing specific `key:value` pairs.
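  // Select the elements of a collection that contain the given set of key/value pairs
  _.where = function(obj, attrs) {
    // Take the own properties of attrs, then keep the elements of obj that contain all of them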
    return _.filter(obj, _.matcher(attrs));
  };
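
The same composition can be tried without underscore. The four functions below are simplified re-implementations that I wrote for illustration (arrays only, shallow === matching), so they are assumptions about the shape of the idea rather than the library's behavior in every edge case.

```javascript
// Simplified re-implementations for illustration; not underscore's code.
function property(key) {
  return function (obj) { return obj == null ? void 0 : obj[key]; };
}

function matcher(attrs) {
  var keys = Object.keys(attrs);
  return function (obj) {
    return keys.every(function (k) { return obj != null && obj[k] === attrs[k]; });
  };
}

// pluck = map + property, where = filter + matcher
function pluck(list, key) { return list.map(property(key)); }
function where(list, attrs) { return list.filter(matcher(attrs)); }

var books = [
  { title: 'A', year: 2001, lang: 'en' },
  { title: 'B', year: 2005, lang: 'zh' },
  { title: 'C', year: 2001, lang: 'zh' }
];

// "select title from books where year = 2001"
console.log(pluck(where(books, { year: 2001 }), 'title')); // [ 'A', 'C' ]
```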

There is something even more clever. First define the powerful _.partial():

  // Partially apply a function by creating a version that has had some of its
  // arguments pre-filled, without changing its dynamic `this` context. _ acts
  // as a placeholder, allowing any combination of arguments to be pre-filled.
  // Similar to currying, but with placeholder support
  // Placeholders let you skip positions, which is more powerful than plain bind-based partial application
  _.partial = function(func) {
    // All arguments after func are to be bound to func
    var boundArgs = slice.call(arguments, 1);
    // The partially applied function
    var bound = function() {
      var position = 0, length = boundArgs.length;
      var args = Array(length);
      for (var i = 0; i < length; i++) {
        // If a bound argument is _ (the placeholder, which is also underscore itself), fill in the next newly passed argument
        //! e.g. _.partial((a, b, c, d) => console.log(a, b, c, d), 1, _, _, 4)(2, 3);
        // Otherwise keep the pre-bound value
        args[i] = boundArgs[i] === _ ? arguments[position++] : boundArgs[i];
      }
      // Any newly passed arguments left over after filling the holes go at the end of the list
      while (position < arguments.length) args.push(arguments[position++]);
      // Execute with the bound arguments
      return executeBound(func, bound, this, this, args);
    };
    return bound;
  };

Then all sorts of tricks become possible, such as implementing nextTick via _.delay():

  // Defers a function, scheduling it to run after the current call stack has
  // cleared.
  // nextTick: run after a 1 ms delay
  // A clever implementation: partially apply _.delay with _.partial, leaving func open and binding only wait=1
  // So _.defer(func) is equivalent to _.delay(func, 1)
  _.defer = _.partial(_.delay, _, 1);

Or implementing once via _.before():

  // Returns a function that will be executed at most one time, no matter how
  // often you call it. Useful for lazy initialization.
  // Run only once
  // A special case of _.before(), obtained by partially applying _.before
  _.once = _.partial(_.before, 2);

_.partial() is like Wild Pyromancer or Shrinkmeister (the Hearthstone cards): it opens up endless possibilities
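
To play with the placeholder idea in isolation, here is a rough sketch. It assumes a Symbol named HOLE as the placeholder (underscore uses _ itself) and replaces executeBound with a plain apply, so constructor calls are not handled.

```javascript
// Sketch only: HOLE is a hypothetical placeholder standing in for `_`.
var HOLE = Symbol('placeholder');

function partial(func) {
  var boundArgs = Array.prototype.slice.call(arguments, 1);
  return function () {
    var callArgs = arguments;
    var position = 0;
    // Fill each hole with the next call-time argument, keep the rest as bound
    var args = boundArgs.map(function (arg) {
      return arg === HOLE ? callArgs[position++] : arg;
    });
    // Append whatever call-time arguments are left over
    while (position < callArgs.length) args.push(callArgs[position++]);
    return func.apply(this, args);
  };
}

function join4(a, b, c, d) { return [a, b, c, d].join('-'); }
var fill = partial(join4, 1, HOLE, HOLE, 4);
console.log(fill(2, 3)); // 1-2-3-4

// Skipping the first parameter: fix the radix of parseInt, leave the string open
var toDecimal = partial(parseInt, HOLE, 10);
console.log(['08', '10', '11'].map(toDecimal)); // [ 8, 10, 11 ]
```

The second example is the same shape as _.defer: the first parameter stays open and a later one is pinned, which a plain bind cannot express.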

4. OOP support

_ supports chained calls, which they call the OOP style

  // OOP
  // ---------------
  // If Underscore is called as a function, it returns a wrapped object that
  // can be used OO-style. This wrapper holds altered versions of all the
  // underscore functions. Wrapped objects may be chained.

So how do you make the n static methods hanging off _ chainable?

First, create an object:

  // Create a safe reference to the Underscore object for use below.
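  // Enables chaining, while all the methods below still exist as static methods
  var _ = function(obj) {
    // Already wrapped: return it directly
    if (obj instanceof _) return obj;
    // Called without new: wrap obj in a new instance
    if (!(this instanceof _)) return new _(obj);
    // Hold on to the wrapped object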
    this._wrapped = obj;
  };

Then find a way to hand the static methods to these objects:

  // Add your own custom functions to the Underscore object.
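  // Extend _
  // Copy all static methods onto the prototype
  _.mixin = function(obj) {
    // Iterate over the names of all methods on obj
    _.each(_.functions(obj), function(name) {
      // The current method
      var func = _[name] = obj[name];
      // Attach it to the prototype of _
      _.prototype[name] = function() {
        // Prepare the arguments, with the wrapped object first
        var args = [this._wrapped];
        // Append the arguments passed at call time
        push.apply(args, arguments);
        // Call the method with the prepared arguments and _ as the context
        // result() decides whether the return value needs to stay chainable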
        return result(this, func.apply(_, args));
      };
    });
  };

Finally, copy all static methods onto the prototype of _:

  // Add all of the Underscore functions to the wrapper object.
  //! This is why the OOP style works
  // Copy its own static methods onto the prototype
  _.mixin(_);
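
Put together, the three steps can be sketched in a few lines. W is a hypothetical stand-in for _, and the chain()/value() bookkeeping done by result() is deliberately left out, so the prototype methods here return raw values.

```javascript
// Minimal sketch of the wrapper + mixin idea; W is a made-up name.
function W(obj) {
  if (obj instanceof W) return obj;
  if (!(this instanceof W)) return new W(obj);
  this._wrapped = obj;
}

// Two "static" functions
W.first = function (arr) { return arr[0]; };
W.double = function (arr) { return arr.map(function (n) { return n * 2; }); };

// Copy every function on source onto W and onto W.prototype,
// prepending the wrapped value to the argument list
W.mixin = function (source) {
  Object.keys(source).forEach(function (name) {
    var func = source[name];
    if (typeof func !== 'function') return;
    W[name] = func;
    W.prototype[name] = function () {
      var args = [this._wrapped];
      Array.prototype.push.apply(args, arguments);
      return func.apply(W, args);
    };
  });
};
W.mixin(W);

console.log(W.first([1, 2, 3]));    // 1  (static style)
console.log(W([1, 2, 3]).first());  // 1  (OO style)
console.log(W([1, 2, 3]).double()); // [ 2, 4, 6 ]
```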

5. A small regex performance trick

  // Functions for escaping and unescaping strings to/from HTML interpolation.
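  // Escaper
  // Escapes/unescapes according to the map passed in
  var createEscaper = function(map) {
    // Look up the map
    var escaper = function(match) {
      return map[match];
    };
    // Regexes for identifying a key that needs to be escaped
    // Build the pattern by joining the keys to be escaped
    var source = '(?:' + _.keys(map).join('|') + ')';
    // Test regex, single match
    var testRegexp = RegExp(source);
    // Replace regex, global
    var replaceRegexp = RegExp(source, 'g');
    return function(string) {
      // Normalize the input: undefined/null become the empty string
      string = string == null ? '' : '' + string;
      //! Performance optimization
      //! Check with the test regex first; run the replace regex (match, look up, substitute) only if something needs escaping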
      return testRegexp.test(string) ? string.replace(replaceRegexp, escaper) : string;
    };
  };
  // Escape HTML
  _.escape = createEscaper(escapeMap);
  // Unescape
  _.unescape = createEscaper(unescapeMap);

Here is the trick:

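// Test regex, single match
var testRegexp = RegExp(source);
// Replace regex, global
var replaceRegexp = RegExp(source, 'g');
//...
//! Performance optimization
//! Check with the test regex first; run the replace regex (match, look up, substitute) only if something needs escaping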
return testRegexp.test(string) ? string.replace(replaceRegexp, escaper) : string;
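
Here is a standalone version to experiment with. The entityMap below is a shortened map that I assumed for the example (underscore's escapeMap has more entries), and escapeHtml is a made-up name.

```javascript
// Sketch with an assumed, shortened entity map.
var entityMap = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };

function createEscaper(map) {
  var source = '(?:' + Object.keys(map).join('|') + ')';
  var testRegexp = new RegExp(source);         // no g flag, so test() carries no state
  var replaceRegexp = new RegExp(source, 'g'); // g flag, so replace() hits every match
  return function (string) {
    string = string == null ? '' : '' + string;
    // Cheap check first; only strings that need escaping pay for replace()
    return testRegexp.test(string)
      ? string.replace(replaceRegexp, function (match) { return map[match]; })
      : string;
  };
}

var escapeHtml = createEscaper(entityMap);
console.log(escapeHtml('a < b && c')); // a &lt; b &amp;&amp; c
console.log(escapeHtml('plain text')); // plain text
console.log(escapeHtml(null) === '');  // true
```

Keeping the test regex free of the g flag also matters for correctness: a global regex remembers lastIndex between test() calls, so reusing one regex for both jobs could skip matches.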

III. Caveats

Reading the source revealed a few uncomfortable spots

1. uniqueId

The code speaks for itself

  // Generate a unique integer id (unique within the entire client session).
  // Useful for temporary DOM ids.
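  // Private counter
  var idCounter = 0;
  // Generate a client-side unique id
  //!!! Without a prefix the ids are just 1, 2, 3... which collide easily
  // Mostly used for temporary DOM ids
  _.uniqueId = function(prefix) {
    // Increment first, so ids start at 1
    var id = ++idCounter + '';
    // Prepend the prefix if given, otherwise return the bare 1, 2, 3...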
    return prefix ? prefix + id : id;
  };

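Backbone's cid is built on this. The implementation is very simple, or rather weak; it is not the robust unique id you might imagine

Take care when using it: to keep ids unique, generate them only with _.uniqueId() and don't mix several id schemes, because bare 1, 2, 3... collide all too easily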
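
The collision is easy to reproduce. In this sketch, makeIdGenerator is a hypothetical helper; the two generators it returns stand in for underscore's counter and for some other id scheme living on the same page.

```javascript
// Sketch: two independent counters, each "unique" only within itself.
function makeIdGenerator() {
  var counter = 0;
  return function (prefix) {
    var id = ++counter + '';
    return prefix ? prefix + id : id;
  };
}

var uniqueId = makeIdGenerator();   // plays the role of _.uniqueId
var otherLibId = makeIdGenerator(); // some other scheme, assumed for the example

console.log(uniqueId(), uniqueId()); // 1 2
console.log(otherLibId());           // 1  <- same as the first id above
console.log(uniqueId('view_'));      // view_3
```

A prefix per use case narrows the risk, but uniqueness still only holds among ids that come from the same counter.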

2. unique

If the collection is unsorted, deduplication performance is not great

  // Produce a duplicate-free version of the array. If the array has already
  // been sorted, you have the option of using a faster algorithm.
  // Aliased as `unique`.
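  // Deduplicate
  // If the array is sorted, pass a truthy isSorted for a single pass
  // If unsorted, it runs a containment check per element, which is much slower than a dictionary lookup
  _.uniq = _.unique = function(array, isSorted, iteratee, context) {
    // If isSorted is not a boolean, handle the 3-argument form
    // Map (array, iteratee, context) onto the 4 parameter positions, with isSorted = false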
    if (!_.isBoolean(isSorted)) {
      context = iteratee;
      iteratee = isSorted;
      isSorted = false;
    }
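    // If an iteratee (which computes the comparison key) was passed, wrap it as callback(item, index, arr)
    if (iteratee != null) iteratee = cb(iteratee, context);
    // Result set and temporary state
    var result = [];
    var seen = [];
    // Iterate
    for (var i = 0, length = getLength(array); i < length; i++) {
      // Current value and computed key (without an iteratee, the key is the value itself)
      var value = array[i],
          computed = iteratee ? iteratee(value, i, array) : value;
      // Sorted: seen holds the previous key, single pass
      if (isSorted) {
        // i === 0 or the previous key differs from the current one: add to the result set
        if (!i || seen !== computed) result.push(value);
        // Update state
        seen = computed;
      } else if (iteratee) {
        // Unsorted, but an iteratee was passed
        // If seen does not contain the current key, add the value to the result set and the key to seen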
        if (!_.contains(seen, computed)) {
          seen.push(computed);
          result.push(value);
        }
      } else if (!_.contains(result, value)) {
        // Unsorted, no iteratee, and the result set does not yet contain the value: add it
        result.push(value);
      }
    }
    return result;
  };

This is a containment check inside a loop, and a linear _.contains(arr, value) lookup is clearly slower than a dictionary-style key in dict
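
For comparison, here is what a dictionary-style version might look like. This is an alternative sketch, not underscore's implementation: it assumes an ES2015+ runtime and uses a Set (which compares with SameValueZero) instead of a plain object, because object keys are coerced to strings and would treat 1 and '1' as the same key. uniqBy is a made-up name.

```javascript
// Alternative sketch: one pass, one Set lookup per element.
function uniqBy(array, iteratee) {
  var seen = new Set();
  var result = [];
  for (var i = 0; i < array.length; i++) {
    var value = array[i];
    // Without an iteratee, the key is the value itself
    var computed = iteratee ? iteratee(value, i, array) : value;
    if (!seen.has(computed)) {
      seen.add(computed);
      result.push(value);
    }
  }
  return result;
}

console.log(uniqBy([3, 1, 3, 2, 1]));                  // [ 3, 1, 2 ]
console.log(uniqBy([1.2, 1.7, 2.1, 2.9], Math.floor)); // [ 1.2, 2.1 ]
console.log(uniqBy([1, '1']));                         // [ 1, '1' ]
```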

3. before

  // Returns a function that will only be executed up to (but not including) the Nth call.
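  // Run only a limited number of times
  //! It runs only times-1 times. Why not include the Nth call? It makes _.once() awkward to read
  _.before = function(times, func) {
    // Cached return value
    var memo;
    return function() {
      // The first times-1 calls
      if (--times > 0) {
        memo = func.apply(this, arguments);
      }
      // Later calls are ignored and just return the last result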
      if (times <= 1) func = null;
      return memo;
    };
  };

So _.once() looks like this:

_.once = _.partial(_.before, 2);
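
Counting the real invocations makes the off-by-one obvious. The before below is a standalone copy of the logic above; atMost is a hypothetical inclusive wrapper that I added to show how once would read with the other convention.

```javascript
// Standalone copy of the _.before logic, to count actual invocations.
function before(times, func) {
  var memo;
  return function () {
    if (--times > 0) memo = func.apply(this, arguments);
    if (times <= 1) func = null;
    return memo;
  };
}

var count = 0;
var limited = before(3, function () { return ++count; });
var results = [limited(), limited(), limited(), limited()];
console.log(count);   // 2, not 3
console.log(results); // [ 1, 2, 2, 2 ]

// Hypothetical inclusive variant: run func at most n times
function atMost(n, func) { return before(n + 1, func); }

var hits = 0;
var once = atMost(1, function () { return ++hits; });
once(); once(); once();
console.log(hits); // 1
```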

4. isFunction

  // Optimize `isFunction` if appropriate. Work around some typeof bugs in old v8,
  // IE 11 (#1621), and in Safari 8 (#1929).
  // Function check, compatible with old v8, IE 11 and Safari 8
  // Browser hack
  // If typeof a regex is not 'function' and typeof Int8Array is not 'object'
  if (typeof /./ != 'function' && typeof Int8Array != 'object') {
    // Override the function check: typeof must return 'function'
    //! || false works around a weird issue in IE8 & 11 (typeof a DOM element is sometimes 'function', and || false somehow fixes it), see:
    //! https://github.com/jashkenas/underscore/issues/1621
    _.isFunction = function(obj) {
      return typeof obj == 'function' || false;
    };
  }

I couldn't see what || false was for, so I went and filed an issue, and that is how I learned about this bit of history

5. sortBy

  // Sort the object's values by a criterion produced by an iteratee.
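  // Sort the elements of a collection by the criterion given by iteratee
  _.sortBy = function(obj, iteratee, context) {
    // Convert to callback(item, index, arr)
    iteratee = cb(iteratee, context);
    // 1. Map with fx = (v, i, w): compute each element's criterion and record its index
    // 2. Sort with the native sort, ascending by criterion, keeping the original order on ties
    // 3. Pluck the value column from the result table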
    return _.pluck(_.map(obj, function(value, index, list) {
      return {
        value: value,
        index: index,
        criteria: iteratee(value, index, list)
      };
    }).sort(function(left, right) {
      var a = left.criteria;
      var b = right.criteria;
      if (a !== b) {
        // undefined is treated as very large, so in ascending order all undefined values end up last
        if (a > b || a === void 0) return 1;
        if (a < b || b === void 0) return -1;
      }
      return left.index - right.index;
    }), 'value');
  };

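Two things to watch out for:

In the default ascending order, undefined is treated as very large and ends up last
undefined can cause the sort to fail

For example: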

// Ascending by default
_.sortBy([,,1,,2]);
// [1, 2, undefined, undefined, undefined]
_.sortBy([,,1,,2], v => v)
// [1, 2, undefined, undefined, undefined]
_.sortBy([,,1,,2], v => v * 1)
// [undefined, undefined, 1, undefined, 2]

The reason is simple: undefined * 1 evaluates to NaN, and NaN is neither greater than nor less than any x, so:

// NaN fails both of these checks
if (a > b || a === void 0) return 1;
if (a < b || b === void 0) return -1;
// and execution falls through to
return left.index - right.index;

So the original order is kept and the sort fails. When using _.sortBy(obj, fn), watch out for the undefined pitfall
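
The fall-through is easy to see by calling the comparator directly. Below, compare is a standalone copy of the comparator above; sortBy and nanToUndefined are hypothetical helpers of mine, and mapping NaN to undefined is just one possible guard, not something underscore provides.

```javascript
// Standalone copy of the comparator used by _.sortBy.
function compare(left, right) {
  var a = left.criteria;
  var b = right.criteria;
  if (a !== b) {
    if (a > b || a === void 0) return 1;
    if (a < b || b === void 0) return -1;
  }
  return left.index - right.index;
}

// NaN vs. a number: both checks fail, only the index difference is left
console.log(compare({ criteria: NaN, index: 0 }, { criteria: 1, index: 2 }));       // -2
console.log(compare({ criteria: 1, index: 2 }, { criteria: NaN, index: 0 }));       // 2
// undefined vs. a number: handled explicitly, undefined sorts last
console.log(compare({ criteria: undefined, index: 0 }, { criteria: 1, index: 2 })); // 1

// One possible guard: turn NaN criteria into undefined before comparing
function nanToUndefined(iteratee) {
  return function (value) {
    var criteria = iteratee(value);
    return criteria !== criteria ? void 0 : criteria; // NaN is the only value !== itself
  };
}

// Simplified sortBy for dense arrays, built on the comparator above
function sortBy(list, iteratee) {
  return list
    .map(function (value, index) {
      return { value: value, index: index, criteria: iteratee(value) };
    })
    .sort(compare)
    .map(function (entry) { return entry.value; });
}

var input = [undefined, undefined, 1, undefined, 2];
var sorted = sortBy(input, nanToUndefined(function (v) { return v * 1; }));
console.log(sorted); // [ 1, 2, undefined, undefined, undefined ]
```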

IV. Annotated source

Git repository: https://github.com/ayqy/underscore-1.8.3

P.S. The source is 1,500 lines; my hand-annotated version is 2,200 lines, which is detailed enough

References

http://underscorejs.org/
#1621: a very strange issue in IE8 & 11