🌸 Java Basics (2): Dissecting HashMap (Part 2)

## The HashMap Specification

This is how the Javadoc specifies HashMap:

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

• Implements the Map interface
• Permits null keys and values (see the demo after this list)
• Unsynchronized
• Makes no guarantee about iteration order (such as insertion order)
• Does not guarantee that the order stays constant over time
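
A quick sketch of the null-handling contrast with Hashtable (the NullKeyDemo class name is my own):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "ok"); // HashMap permits one null key...
        hashMap.put("k", null);  // ...and null values
        System.out.println(hashMap.get(null)); // prints: ok

        Map<String, String> hashtable = new Hashtable<>();
        try {
            hashtable.put(null, "boom"); // Hashtable.put throws on null keys or values
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }
    }
}
```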

## Two Factors

The Javadoc defines them as follows:

• Initial capacity. The capacity is the number of buckets in the hash table; the initial capacity is simply the capacity at the time the hash table is created.
• Load factor. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

### Initial Capacity

HashMap's default initial capacity is 16, and making it a power of two is deliberate. Recall how the bucket index is computed:

```
index = hash(Key) & (length - 1)
```

With the default capacity of 16, length - 1 is 0b1111, so the mask keeps only the low four bits of the hash. The hash itself is not the raw hashCode(): HashMap first spreads it with a static helper:

```java
static final int hash(Object key) {
    int h;
    // XOR the high 16 bits onto the low 16 bits so that high-order bits
    // still influence the bucket index after masking
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

The comment on hash() in the JDK source explains the trade-off:

Computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don’t benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.
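
A small sketch of the effect (SpreadDemo and the sample hash values are my own; hash() is copied from HashMap above). Two hash codes that differ only in their high bits collide under the raw mask, but separate once spread:

```java
public class SpreadDemo {
    // The same spreading function as java.util.HashMap.hash()
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int mask = 16 - 1; // default table size 16

        // Integer.hashCode() is the value itself; these differ only above bit 4
        int h1 = 0x10000;
        int h2 = 0x20000;

        // Raw masking discards the differing high bits: both land in bucket 0
        System.out.println((h1 & mask) + " vs " + (h2 & mask));           // 0 vs 0

        // After spreading, the high bits reach the index: buckets 1 and 2
        System.out.println((hash(h1) & mask) + " vs " + (hash(h2) & mask)); // 1 vs 2
    }
}
```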

### Load Factor

The Javadoc's guidance on choosing it:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
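
One practical consequence: if you know roughly how many entries the map will hold, you can size it so that no rehash ever happens. A small sketch (the 1,000-entry workload is a made-up example):

```java
import java.util.HashMap;
import java.util.Map;

public class SizingDemo {
    public static void main(String[] args) {
        int expectedEntries = 1000; // hypothetical workload

        // Per the Javadoc: no rehash occurs if the initial capacity is
        // greater than the expected entry count divided by the load factor.
        int initialCapacity = (int) (expectedEntries / 0.75f) + 1;

        Map<String, Integer> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedEntries; i++) {
            map.put("key-" + i, i); // no resize during these puts
        }
    }
}
```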

## A Tree

On a lookup, JDK 8's HashMap proceeds as follows:

• First compute the hash from hashCode() (spread as above), then mask it to get the bucket index;
• If the head node of that bucket does not hold the key we want, walk the linked list, comparing keys with equals() (a sketch follows this list).
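
Below is a simplified sketch of that walk, loosely modeled on java.util.HashMap.getNode; the Node class is trimmed down and tree bins are omitted:

```java
// Simplified bucket lookup, loosely following java.util.HashMap.getNode.
class LookupSketch<K, V> {
    static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    Node<K, V>[] table; // the bucket array

    Node<K, V> getNode(int hash, Object key) {
        if (table == null || table.length == 0) return null;
        // Step 1: mask the spread hash to pick the bucket
        Node<K, V> e = table[hash & (table.length - 1)];
        // Step 2: walk the chain; compare hashes first (cheap), then keys
        for (; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                return e;
            }
        }
        return null;
    }
}
```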

JDK 8 converts a long chain into a red-black tree only reluctantly, because the choice is a trade-off (the governing thresholds appear after this list):

• Space cost. A red-black tree improves insert, delete, and lookup performance over a linked list, but its nodes are about twice the size of list nodes.
• Conversion cost. Turning a linked list into a red-black tree takes time.
• Data-structure performance. As noted earlier, when a bucket has many collisions, a red-black tree performs far better than a linked list.
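
For reference, these are the constants that govern the conversions, as they appear in the JDK 8 java.util.HashMap source:

```java
// From java.util.HashMap (JDK 8)
static final int TREEIFY_THRESHOLD = 8;     // a bin is treeified once it holds this many nodes
static final int UNTREEIFY_THRESHOLD = 6;   // a tree bin shrinks back to a list at or below this (on resize)
static final int MIN_TREEIFY_CAPACITY = 64; // with a smaller table, resize instead of treeifying
```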

Ideally, under random hash codes, the number of nodes in a bin follows a Poisson distribution with parameter $\lambda \approx 0.5$ at the default load factor:

$P(X=k)=\frac{\lambda^k}{k!}e^{-\lambda}$

The comment in the HashMap source spells this out:

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:

0: 0.60653066
1: 0.30326533
2: 0.07581633
…
8: 0.00000006
more: less than 1 in ten million

The 0.5 parameter, in other words, presumably assumes that on average the number of elements in the HashMap is about 50% of the number of buckets.
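
Those probabilities are easy to reproduce from the formula above. A quick sketch (the PoissonCheck class name is my own):

```java
public class PoissonCheck {
    public static void main(String[] args) {
        double lambda = 0.5; // average nodes per bin at the default threshold
        double factorial = 1.0;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) factorial *= k;
            // P(X = k) = e^(-lambda) * lambda^k / k!
            double p = Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
            System.out.printf("%d: %.8f%n", k, p); // 0: 0.60653066, 1: 0.30326533, ...
        }
    }
}
```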