CLRS7 快速排序

2017-05-11 | 阅读：次

介绍

快速排序虽然最坏时间复杂度是O(n^2),但通常是实际排序应用中最好的选择。它的期望时间复杂度是θ(nlgn)，而且其中隐含的常数因子非常小，它还能进行原址排序。

7.1 快速排序的描述。

QUICKSORT(A, p, r)
    if p < r
        q = PARTITION(A, p, r)
        QUICKSORT(A, p, q-1)
        QUICKSORT(A, q+1, r)


QUICKSORT(A, 1, A.length)


PARTITION(A, p, r)
    x = A[r]
    i = p - 1
    for j = p to r-1
        if A[j] <= x
            i += 1
            exchange A[i] with A[j]
    exchange A[i+1] with A[r]
    return i + 1

练习7.1-1 Using Figure 7.1 as a model, illustrate the operation of PARTITION on the array A = [13, 19, 3, 5, 12, 8, 7, 4, 21, 2, 6, 11].

13 19 3 5 12 8 7 4 21 2 6 11
13 19 3 5 12 8 7 4 21 2 6 11
13 19 3 5 12 8 7 4 21 2 6 11
3 19 13 5 12 8 7 4 21 2 6 11
3 5 13 19 12 8 7 4 21 2 6 11
3 5 8 19 12 13 7 4 21 2 6 11
3 5 8 7 12 13 19 4 21 2 6 11
3 5 8 7 4 13 19 12 21 2 6 11
3 5 8 7 4 13 19 12 21 2 6 11
3 5 8 7 4 2 19 12 21 13 6 11
3 5 8 7 4 2 6 12 21 13 19 11
3 5 8 7 4 2 6 11 21 13 19 12

练习7.1-2 What value of q does PARTITION return when all elements in the array A[p…r] have the same value? Modify PARTITION so that q = (p+r)/2 when all elements in the array A[p…r] have the same value.

返回r.
在迭代过程中计算出等于主元的元素的数目count，在结果中减去count/2

练习7.1-3 Give a brief argument that the running time of PARTITION on a subarray of size n is Θ(n).

It only iterate the array once.

练习7.1-4 How would you modify QUICKSORT to sort into nonincreasing order?

PARTITION(A, p, r)
    x = A[r]
    i = p - 1
    for j = p to r - 1:
        if A[j] <= x:
            i += 1
            exchange A[i] with A[j]
    exchange A[i+1] with A[r]
    return i + 1

7.2 快速排序的性能

快速排序的平均运行时间更接近于其最好情况。任何一种常数比例的划分都会产生深度为θ(lgn）的递归树，即运行时间总是O(nlgn)
练习7.2-1 Use the substitution method to prove that the recurrence T (n) = T (n - 1) + Θ(n) has the solution T (n) = Θ(n^2), as claimed at the beginning of Section 7.2.

T(n) = T(n-1) + Θ(n) = Θ(n)+Θ(n-1)+...+Θ(1) = Θ(n^2)

练习7.2-2 What is the running time of QUICKSORT when all elements of array A have the same value?

这时出现最坏情况，Θ(n^2)

练习7.2-3 Show that the running time of QUICKSORT is Θ(n^2) when the array A contains distinct elements and is sorted in decreasing order.

同上也是最坏情况，T(n) = T(n-1) + Θ(n)

练习7.2-4 Banks often record transactions on an account in order of the times of the transactions, but many people like to receive their bank statements with checks listed in order by check number. People usually write checks in order by check number, and merchants usually cash them with reasonable dispatch. The problem of converting time-of-transaction ordering to check-number ordering is therefore the problem of sorting almost-sorted input. Argue that the procedure INSERTION-SORT would tend to beat the procedure QUICKSORT on this problem.

原数组逆序对越少插入排序越快。

练习7.2-5 Suppose that the splits at every level of quicksort are in the proportion 1 - α to α, where 0 < α ≤ 1/2 is a constant. Show that the minimum depth of a leaf in the recursion tree is approximately - lg n/ lg α and the maximum depth is approximately -lg n/ lg(1 - α). (Don’t worry about integer round-off.)

最小深度总是得到上一层的α，所以n*α^h = 1, h = - lg n/ lg α
最大深度总是得到上一层的1 - α，所以n*(1-α)^h = 1, h = -lg n/ lg(1 - α)

练习7.2-6 Argue that for any constant 0 < α ≤ 1/2, the probability is approximately 1 - 2α that on a random input array, PARTITION produces a split more balanced than 1-α to α.

p = ((1 - α) - α) / 1

7.3 快速排序的随机化版本

随机选取一个元素作为主元。

RANDOMIZED-PARTITION(A, p, r)
    i = RANDOM(p, r)
    exchange A[r] with A[i]
    return PARTITION(A, p, r)

RANDOMIZED-QUICKSORT(A, p, r)
    if p < r
        q = RANDOMIZED-PARTITION(A, p, r)
        RANDOMIZED-QUICKSORT(A, p, q-1)
        RANDOMIZED-QUICKSORT(A, q+1, p)

练习7.3-1 Why do we analyze the average-case performance of a randomized algorithm and not its worst-case performance?

因为最坏情况极少发生，而且没有一种特定的输入会产生极端情况。

练习7.3-2 During the running of the procedure RANDOMIZED-QUICKSORT, how many calls are made to the random-number generator RANDOM in the worst case? How about in the best case? Give your answer in terms of Θ-notation.

最好情况: T(n) = 2T(n/2) + 1 = Θ(n)
最坏情况: T(n) = T(n-1) + 1 = Θ(n)

7.4 快速排序分析

证明很有趣~
练习7.4-1

令q = 1，得证。

练习7.4-2 Show that quicksort’s best-case running time is Ω(nlgn).

如上题，令q = n/2,得证。

练习7.4-3

开口向上的抛物线。

练习7.4-4 Show that RANDOMIZED-QUICKSORT’s expected running time is Ω(n lg n).

已经证了最好情况都是Ω(n lg n).

练习7.4-5 The running time of quicksort can be improved in practice by taking advantage of the fast running time of insertion sort when its input is “nearly” sorted. When quicksort is called on a subarray with fewer than k elements, let it simply return without sorting the subarray. After the top-level call to quicksort returns, run insertion sort on the entire array to finish the sorting process. Argue that this sorting algorithm runs in O(nk + nlg(n/k)) expected time. How should k be picked, both in theory and in practice?

当深度为lg(n/k)便停止了快排，所以快速排序时间是O(nlg(n/k))
插入排序有n/k个小数组，每个需要O(k^2),时间是n/k*O(k^2) = O(nk)

The main idea is to note that the recursion stops when n^2*i = k, that is i = log2(n)(k). The recursion takes in total O(n * lg(n)(k)). The resulting array is composed of k subarrays of size n=k, where the elements in each subarray are all less than all the subarrays following it. Running Insertion-Sort on the entire array is thus equivalent to sorting each of the n/k subarrays of size k, which takes on the average n/k * O(k2) = O(nk) (the expected running time of Insertion-Sort is O(n^2)).

If k is chosen too big, then the O(nk) cost of insertion becomes bigger than (n lg n). Therefore k must be O(lg n). Furthermore it must be that O(nk + n lg (n)(k) ) = O(n lg n). If the constant factors in the big-oh notation are ignored, than it follows that k should be such that k < lg k which is impossible (unless k = 1) - the error comes from ignoring the constant factors. Let c1 be the constant factor in quicksort, and c2 be the constant factor in insertion sort. Than k must be chosen such that c2k + c1 lg n k < c1 lg n which requires c1k < c2 lg k. In practice these constants cannot be ignored (also there can be lower order terms in O(n lg n)) and k should be chosen experimentally.

练习7.4-6 Consider modifying the PARTITION procedure by randomly picking three elements from array A and partitioning about their median (the middle value of the three elements). Approximate the probability of getting at worst an α-to-(1 - α) split, as a function of α in the range 0 < α < 1.

1,2,3都在中间：(1-2α)^3
1,2在中间，3不在：(1-2α)^2*α
2,3在中间，1不在：(1-2α)^2*α
2在中间，1，3不在：(1-2α)*α*α
加起来。

思考题

7-1 Hoare partition correctness

a.
       13  19  9  5  12  8  7  4  11  2  6  21
    i  x                                        j
       6  19  9  5  12  8  7  4  11  2  13  21
       i                                j
       6  2  9  5  12  8  7  4  11  19  13  21
          i                         j
       6  2  9  5  12  8  7  4  11  19  13  21
                                j   i  

b.
    i, j从两端往中间靠拢。

c.
    j肯定会减,而且至多减到p。

d.
    It's obvious.

e.
    QUICI-SORT(A, p, r)
        if p < r
            q = partition(A, p, r)
            quicksort(A, p, q-1)
            quicksort(A, q+1, r)

7-2 针对相同元素值的快速排序

a.
    Θ(n^2)

b.
    def partition(A, p, r):
        less, bigger, equal = [], [], []
        x = A[r]
        for i in A(p: r+1):
            if i < x: less.append(i)
            elif i == x: equal.append(i)
            else: bigger.append(i)
        A[p:r+1] = less + equal + bigger
        return (len(less), len(less)+len(equal)-1)

c.
    RANDOMIZED-PARTITION(A, p, r)
        i = RANDOM(p, r)
        exchange A[r] with A[i]
        return partition(A, p, r)

    RANDOMIZED-QUICKSORT(A, p, r)
        a, b = RANDOMIZED-RANTITION(A, p, r)
        RANDOMIZED-QUICKSORT(A, p, a-1)
        RANDOMIZED-QUICKSORT(A, p, b+1)

d.
    这样产生的结果肯定比所有元素互异要好，所以也是O(nlgn)

7-3 Alternative quicksort analysis

a.
    E[Xi] = 1/n
b.
    Obvious.
c.
    化简即可。
d.
    ......
e.
    ......

7-4 Stack depth for quicksort

a.
    先排左边，再排右边。
b.
    每次partition都返回r.
c.
    总是迭代小的一边。

7-5 Median-of-3 partition

a.
    6(i-1)(n-i)/(n(n-1)(n-2))

b.
    150%

c.
    13/27

d.
    无法保证取得最优划分点。与普通分法相比并没有很大提升。

7-6 Fuzzy sorting of intervals

import random
import copy

def intersects(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

def before(a, b):
    return a[1] < b[0]

def partition(items, p, r):
    pick = random.randint(p, r)
    items[pick], items[r] = items[r], items[pick]
    intersection = copy.deepcopy(items[r])

    for i in range(p, r):
        if intersects(intersection, items[i]):
            if itesm[i][0] > intersection[0]:
                intersection[0] = items[i][0]
            if items[i][1] < intersection[1]:
                intersection[1] = items[i][1]

    s = p
    for i in range(p, r):
        if before(items[i], intersection):
            items[i], items[s] = items[s], items[i]
            s += 1

    items[r], items[s] = items[s], items[r]

    t = s + 1
    while t <= i:
        if intersects(items[i], intersection):
            items[t], items[i] = items[i], items[t]
            t += 1
        else:
            i -= 1

    return (s, t)

def fuzzy_sort(items, p, r):
    if (p < r):
        pivot = partition(items, p, r)
        fuzzy_sort(itmes, p, pivot[0])
        fuzzy_sort(items, pivot[1], r)

Sasuke

Jing

CLRS7 快速排序

介绍

7.1 快速排序的描述。

7.2 快速排序的性能

7.3 快速排序的随机化版本

7.4 快速排序分析

思考题

用积分近似累加和