介绍
讨论三种线性时间复杂度的排序算法:计数排序、基数排序和桶排序。
8.1 排序算法的下界
-
比较排序的下界是Ω(nlgn)
-
在一个比较排序算法中,我们只使用元素间的比较来获得输入序列中的元素间次序的信息。
-
在最坏情况下,任何比较排序算法都需要做Ω(nlgn)次比较。因为有n!个叶节点。
-
堆排序和归并排序都是渐进最优的比较排序算法。
-
练习8.1-1 What is the smallest possible depth of a leaf in a decision tree for a comparison sort?
最少需要n-1次比较,最小深度是n-1
- 练习8.1-2 Obtain asymptotically tight bounds on lg(n!) without using Stirling’s approximation. Instead, evaluate the summation.
可以用积分来近似求和~
- 练习8.1-3 Show that there is no comparison sort whose running time is linear for at least half of the n! inputs of length n. What about a fraction of 1/n of the inputs of length n? What about a fraction 1/2^n?
线性运行时间的比较排序最多有2^n个,所以n!/2, n!/n, n!/2^n要小于2^n才行。
- 练习8.1-4 You are given a sequence of n elements to sort. The input sequence consists of n/k subsequences, each containing k elements. The elements in a given subsequence are all smaller than the elements in the succeeding subsequence and larger than the elements in the preceding subsequence. Thus, all that is needed to sort the whole sequence of length n is to sort the k elements in each of the n/k subsequences. Show an Ω(n lg k) lower bound on the number of comparisons needed to solve this variant of the sorting problem. (Hint: It is not rigorous to simply combine the lower bounds for the individual subsequences.)
(k!)^(n/k) < 2^h
-> h > lg(k!)^(n/k)
-> h > n/k*lg(k!)
-> h > n/k*k*lg(k)
-> h = Ω(nlgk)
2^h要大于所有可能输出才行。
计数排序
- 假设n个输入元素中的每一个都是在0到k区间内的一个整数,其中k为某个整数,当k=O(n)时,排序运行时间为θ(n).
COUNTING-SORT(A, B, k)
let C[0..k] be a new array
for i = 0 to k
c[i] = 0
for j = 1 to A.length
C[A[j]] = C[A[j]] + 1
for i = 1 to k
C[i] = C[i] + C[i-1]
for j = A.length downto 1
B[C[A[j]]] = A[j]
C[A[j]] = C[A[j]] - 1
-
使用输入元素的实际值来确定其在数组中的位置。
-
计数排序是稳定的。
-
练习8.2-1 Using Figure 8.2 as a model, illustrate the operation of COUNTING-SORT on the array A = [6, 0, 2, 0, 1, 3, 4, 6, 1, 3, 2].
C: [2, 4, 6, 8, 9, 9, 11]
B: [0, 0, 1, 1, 2, 2, 3, 3, 4, 6, 6]
- 练习8.2-2 Prove that COUNTING-SORT is stable.
最后一个for循环从后往前扫描输出,遇到相同元素把它放到同大小的最后位置,因此是稳定的。
- 练习8.2-3 Suppose that the for loop header in line 9 of the COUNTING-SORT procedure is rewritten as 9 for j ← 1 to length[A]. Show that the algorithm still works properly. Is the modified algorithm stable?
是正确的,但是不稳定了。
- 练习8.2-4 Describe an algorithm that, given n integers in the range 0 to k, preprocesses its input and then answers any query about how many of the n integers fall into a range [a…b] in O(1) time. Your algorithm should use Θ(n + k) preprocessing time.
COUNT-INTERVAL(A, k)
let C[0..k] be new array
for i = 0 to k
C[i] = 0
for j = 0 to A.length
C[A[j]] += 1
for i = 1 to k
C[i] += C[i-1]
return C
落在a, b区间的是C[b] - C[a-1], C[-1] = 0.
基数排序
- n个d位数,每一数位有k个可能的取值,如果RADIX-SORT使用的稳定排序算法耗时Θ(n+k),则RADIX-SORT耗时Θ(d(n+k))
RADIX-SORT(A, d)
for i = 1 to d
use a stable sort to srot array A on digit i
- 练习8.3-1 Using Figure 8.3 as a model, illustrate the operation of RADIX-SORT on the following list of English words: COW, DOG, SEA, RUG, ROW, MOB, BOX, TAB, BAR, EAR, TAR, DIG, BIG, TEA, NOW, FOX.
COW SEA TAB BAR
DOG TEA BAR BIG
SEA MOB EAR BOX
RUG TAB TAR COW
ROW DOG SEA DIG
MOB RUG TEA DOG
BOX DIG DIG EAR
TAB BIG BIG FOX
BAR BAR MOB MOB
EAR EAR DOG NOW
TAR TAR COW ROW
DIG COW ROW ROG
BIG ROW NOW SEA
TEA NOW BOX TAB
NOW BOX FOX TAR
FOX FOX ROG TEA
- 练习8.3-2 Which of the following sorting algorithms are stable: insertion sort, merge sort, heapsort, and quicksort? Give a simple scheme that makes any sorting algorithm stable. How much additional time and space does your scheme entail?
插入排序稳定
归并排序稳定
堆排序不稳定
快速排序不稳定
给每个元素再增加一个index属性,最后相同的元素根据index再排一次。
- 练习8.3-3 Use induction to prove that radix sort works. Where does your proof need the assumption that the intermediate sort is stable?
循环不变式:在每次循环开始前,最后i-1个元素是排好序的。
初始:开始i-1=0,故成立。
维持:i位不同肯定符合,i位相同因为上一位稳定排序,因此也符合。
终止:所有位都是排好序的。
- 练习8.3-4 Show how to sort n integers in the range 0 to n^3 - 1 in O(n) time.、
把这n个整数都看成n进制,T = O(d(n+k)), d=3, k=n, 所以是O(n)
- 练习8.3-5 In the first card-sorting algorithm in this section, exactly how many sorting passes are needed to sort d-digit decimal numbers in the worst case? How many piles of cards would an operator need to keep track of in the worst case?
需要递归来做,需要k^d次,要同时记录nk堆数据.
桶排序
- 桶排序假设输入数据服从均匀分布,平均情况下它的时间代价是O(n)
BUCKET-SORT(A)
n = A.length
let B[0..n-1] be a ney array
for i = 0 to n-1
make B[i] an empty list
for i = 1 to n
insert A[i] to list B[floor(nA[i])]
for i = 0 to n-1
sort list B[i] with insertion sort
concatenate the lists B[0], B[1],...,B[n-1] together in order
-
只要输入数据满足:所有桶的大小的平方和总的元素数呈线性关系,桶排序就可以在线性时间完成。
-
练习8.4-1 Using Figure 8.4 as a model, illustrate the operation of BUCKET-SORT on the array A =[.79, .13, .16, .64, .39, .20, .89, .53, .71, .42]
B:
1 /
2 .13 .16
3 .20
4 .39
5 .42
6 .53
7 .64
8 .71 .79
9 .89
10 /
- 练习8.4-2 What is the worst-case running time for the bucket-sort algorithm? What simple change to the algorithm preserves its linear expected running time and makes its worst-case running time O(n lg n)?
当所有元素都落到一个桶内时,插入排序所以运行时间是O(n^2)
可以对这些元素用merge-sort来排序。
- 练习8.4-3 Let X be a random variable that is equal to the number of heads in two flips of a fair coin. What is E [X^2]? What is E^2[X]?
3/2
1
- 练习8.4-4
划分n个面积相等的圆环。
- 练习8.4-5 A probability distribution function P(x) for a random variable X is defined by P(x) = Pr {X ≤ x}. Suppose that a list of n random variables X1, X2, . . .,Xn is drawn from a continuous probability distribution function P that is computable in O(1) time. Show how to sort these numbers in linear expected time.
先计算桶划分点用时O(n)
再桶排序用时O(n)
思考题
- 8-1 Average-case lower bounds on comparison sorting
a.
每一种输入对应一种输出,共有n!种输入,因此也有n!种输出。且都是等可能的,
故n!个叶节点标有1/n!,其它都不可能出现,是0
b.
有多少叶节点就要加多少个1,因此加k.
c.
从所有可能中取最小的。
d.
求导。
e.
一共有n!个叶节点,所以D(T) > d(n!) = Ω(n!lgn!)
Ω(n!lg(n!))/n! = Ω((nlgn)
f.
非随机化版本有n!种路径,随机化在其中。
- 8-2
a.
计数排序
b.
用思考7.1中的HOARE-PARTITION
c.
merge-sort
d.
a可以
b不stable
c需要θ(lgn)时间
e.
用一个大小为k的数组来count,不稳定。
- 8-3 Sorting variable-length items
a.
先将所有整数按照位数多少分组,每组再用radix-sort
b.
先reverse所有字符串,然后用radix-sort.
如果第i次迭代遇到字符串长度小于i,以后不再迭代它。
- 8-4 Water jugs
a.
用二重循环,从而在θ(n^2)时间内可以完成。
b.
共有n!种可能输出,所以Omega(nlgn)
c.
可以先用随机选蓝壶的办法快排红壶,
然后用随机选红壶的办法快排蓝壶,
然后一一配对
- 8-5 Average sorting
a.
说明它和正常排序一样,完全排好序。
b.
2, 1, 3, 5, 4, 6, 7, 8, 9, 10
c.
移项即可。
d.
分为k组,用mergesort的算法。
e.
6.5-9
f.
k是常量,忽略掉了。
- 8-6 Lower bound on merging sorted lists
a.
从2n中随便选n个不考虑顺序
b.
树的高度大于2n
c.
比较过才知道放到哪里~
d.
只有前后元素才比较,因此2n-1
- 8-7 0-1 排序引理和列排序
a.
你好像就是这样定义B的
c.
因为这个算法不依赖于待排序元素的值。
defgh下次做。。