Sorting and Searching

Python Algorithm Study - Week 7, 8

Algorithmpython

2017-08-23


Problem Solving with Algorithms and Data Structures using Python ์ฑ…์„ ์ฝ์œผ๋ฉฐ ์Šคํ„ฐ๋””ํ•œ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•˜์—ฌ ์—ฐ์žฌํ•ฉ๋‹ˆ๋‹ค.


1. Sequential search

  • Sequential search๋Š” ๋ฆฌ์ŠคํŠธ์—์„œ item์„ ์ฐพ์„ ๋•Œ, ์•ž์—์„œ ํ•˜๋‚˜์”ฉ ๋น„๊ตํ•˜๋ฉฐ ์ฐพ๋Š” ๋ฐฉ์‹์ด๋‹ค.
  • Analysis
  • Unordered list
Case Best Case Worst Case Average Case
item is present 1 n n/2
item is not present n n n
  1. Ordered list
Case Best Case Worst Case Average Case
item is present 1 n n/2
item is not present 1 n n/2

2. Binary search

  • Binary search๋Š” ๋ฆฌ์ŠคํŠธ๊ฐ€ ํฌ๊ธฐ์ˆœ์œผ๋กœ ์ •๋ ฌ์ด ๋˜์–ด์žˆ์„ ๋•Œ, ๊ฐ€์šด๋ฐ ์žˆ๋Š” ๋ฐ์ดํ„ฐ์™€ ์ฐพ๊ณ ์ž ํ•˜๋Š” ์•„์ดํ…œ์„ ๋น„๊ตํ•ด์„œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ˜์”ฉ ์ค„์—ฌ๋‚˜๊ฐ€๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.
  • Binary search์˜ ์‹œ๊ฐ„๋ณต์žก๋„๋Š” O(logn)์ด๋‹ค. Sequential search๋ณด๋‹ค ์‹œ๊ฐ„๋ณต์žก๋„๋Š” ๋‚ฎ์ง€๋งŒ ๋ฌด์กฐ๊ฑด '์ •๋ ฌ'์„ ๊ฑฐ์ณ์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๊ฒฝ์šฐ์— ๋”ฐ๋ผ ๋น„์šฉ์ด ๋” ๋“ค์ˆ˜๋„ ์žˆ๋‹ค. (If we can sort once and then search many times, the cost of the sort is not so significant. However, for large lists, sorting even once can be so expensive that simply performing a sequential search from the start may be the best choice.)
# Recursive binary search
def binarySearch(alist, item):
    if len(alist) == 0:
        return False
    else:
        midpoint = len(alist)//2
        if alist[midpoint]==item:
            return True
        else:
            if item<alist[midpoint]:
                return binarySearch(alist[:midpoint],item)
            else:
                return binarySearch(alist[midpoint+1:],item)

2. Hashing

  1. Hashing์ด๋ž€?

    • ์ž„์˜์˜ ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ๋ฅผ ๊ณ ์ •๋œ ํฌ๊ธฐ์˜ ๋ฐ์ดํ„ฐ๋กœ ๋ณ€ํ™˜์‹œํ‚ค๋Š” ๊ฒƒ. ์ฃผ๋กœ ์ž„์˜์˜ ๋ฌธ์ž์—ด์„ ๋น ๋ฅด๊ฒŒ ์ฐพ๊ธฐ ์œ„ํ•ด ์งง์€ ์ •์ˆ˜๋‚˜ ์•ŒํŒŒ๋ฒณ์„ ํ• ๋‹นํ•ด์ค€๋‹ค.
    • ์ด์ƒ์ ์œผ๋กœ๋Š” ์‹œ๊ฐ„๋ณต์žก๋„๋ฅผ O(1)๊ณผ ๊ฐ€์žฅ ๊ฐ€๊น๊ฒŒ ๋งŒ๋“ค์–ด ๊ฒ€์ƒ‰์†๋„๋ฅผ ๋น ๋ฅด๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•์ด๋‚˜, ์™„์ „ํ•œ O(1)์€ ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค.
    • AWS EC2 ์ธ์Šคํ„ด์Šค๋ฅผ ๋งŒ๋“ค ๋•Œ ์ƒ์„ฑ๋˜๋Š” ํ‚คํŽ˜์–ด์˜ ์›๋ฆฌ์ธ - SHA(Secure Hash Algorithm)์™€ MD5๊ฐ™์€ ์•”ํ˜ธํ™”์™€๋„ ๊ด€๋ จ์žˆ๋‹ค. (์ถœ์ฒ˜)
  2. Hash table / Slot / Hash function / Load factor

    • Hash table์€ ๋ฐ์ดํ„ฐ๋“ค์˜ ๋ชจ์Œ์ด๋ฉฐ ์ผ์ •ํ•œ ํฌ๊ธฐ(m)๋ฅผ ๊ฐ€์ง„๋‹ค.
    • ์ด ํ…Œ์ด๋ธ”์˜ ๊ฐ ์ž๋ฆฌ๋ฅผ slot์ด๋ผ๊ณ  ํ•˜๋ฉฐ slot์€ 0๋ถ€ํ„ฐ (m-1)๊นŒ์ง€ ์ธ๋ฑ์Šค๋ฅผ ๊ฐ–๋Š”๋‹ค.
    • ๊ฐ ๋ฐ์ดํ„ฐ๋ฅผ ์•Œ๋งž์€ slot์— ๋งคํ•‘ํ•ด์ฃผ๋Š” ์‹์„ hash function์ด๋ผ๊ณ  ํ•œ๋‹ค.
    • ํ…Œ์ด๋ธ” ํฌ๊ธฐ์— ๋Œ€ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง„ slot์˜ ๊ฐœ์ˆ˜๋ฅผ load factor(๋ถ€ํ•˜๊ณ„์ˆ˜)๋ผ๊ณ  ํ•œ๋‹ค. ํ…Œ์ด๋ธ” ํฌ๊ธฐ๊ฐ€ 11์ด๊ณ  ๋ฐ์ดํ„ฐ๊ฐ€ ์ฑ„์›Œ์ง„ slot์ด 6๊ฐœ๋ผ๋ฉด load factor(ฮป)๋Š” 6/11์ด๋‹ค.
    • ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ๋ฐฉ์‹์˜ hash function ์ด๋‹ค. ๋ฐ์ดํ„ฐ๋ฅผ ํ•ด์‹œํ…Œ์ด๋ธ”์˜ ํฌ๊ธฐ๋กœ ๋‚˜๋ˆˆ ๋‚˜๋จธ์ง€๋กœ ํ• ๋‹นํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. img
    • ๊ทธ๋Ÿฌ๋‚˜ ์ด ๊ฒฝ์šฐ, ๊ฐ™์€ ๋‚˜๋จธ์ง€ ๊ฐ’์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ๋Š” ํ•œ ์Šฌ๋กฏ์—์„œ ์ถฉ๋Œํ•˜๊ฒŒ ๋œ๋‹ค. ์œ„์˜ ์˜ˆ์—์„œ ๋งŒ์•ฝ 44 ๋ผ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ์„ ๊ฒฝ์šฐ '44 % 11 = 0'์ด๊ธฐ ๋•Œ๋ฌธ์— 77 ๊ณผ ์ถฉ๋Œํ•˜๊ฒŒ ๋œ๋‹ค.
    • ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š” hash function ์„ Perfect hash function์ด๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ž„์˜์˜ ๋ฐ์ดํ„ฐ๋“ค์˜ ๋ชจ์Œ์ด ์žˆ์„ ๋•Œ, ์™„๋ฒฝํ•œ ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“œ๋Š” ๊ฒƒ์€ ๋ฐ์ดํ„ฐ ์ˆซ์ž๋งŒํผ ํ•ด์‹œ ํ…Œ์ด๋ธ”์˜ ์ˆซ์ž๋ฅผ ๋Š˜๋ฆฌ์ง€ ์•Š๋Š” ์ด์ƒ ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค.
    • ์™„๋ฒฝํ•œ ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์—†๋‹ค๋ฉด ์ถฉ๋Œ์„ ์ตœ์†Œํ™” ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ฐพ์œผ๋ฉด ๋œ๋‹ค.
    • ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ๋ฐฉ์‹์˜ hash function์ด๋‹ค. ๋ฐ์ดํ„ฐ๋ฅผ ํ•ด์ƒˆํ…Œ์ด๋ธ”์˜ ํฌ๊ธฐ๋กœ ๋‚˜๋ˆˆ ๋‚˜๋จธ์ง€๋กœ ํ• ๋‹นํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. img
    • ๊ทธ๋Ÿฌ๋‚˜ ์ด ๊ฒฝ์šฐ, ๊ฐ™์€ ๋‚˜๋จธ์ง€ ๊ฐ’์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ๋Š” ํ•œ ์Šฌ๋กฏ์—์„œ ์ถฉ๋Œํ•˜๊ฒŒ ๋œ๋‹ค. ์œ„์˜ ์˜ˆ์—์„œ ๋งŒ์•ฝ 44๋ผ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ์„ ๊ฒฝ์šฐ '44 % 11 = 0'์ด๊ธฐ ๋•Œ๋ฌธ์— 77๊ณผ ์ถฉ๋Œํ•˜๊ฒŒ ๋œ๋‹ค.
    • ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š” hash function์„ Perfect hash function์ด๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ž„์˜์˜ ๋ฐ์ดํ„ฐ๋“ค์˜ ๋ชจ์Œ์ด ์žˆ์„ ๋•Œ, ์™„๋ฒฝํ•œ ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“œ๋Š” ๊ฒƒ์€ ๋ฐ์ดํ„ฐ ์ˆซ์ž๋งŒํผ ํ•ด์‹œ ํ…Œ์ด๋ธ”์˜ ์ˆซ์ž๋ฅผ ๋Š˜๋ฆฌ์ง€ ์•Š๋Š” ์ด์ƒ ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค.
    • ์™„๋ฒฝํ•œ ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์—†๋‹ค๋ฉด ์ถฉ๋Œ์„ ์ตœ์†Œํ™” ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ฐพ์œผ๋ฉด ๋œ๋‹ค.
  3. Collision Resolution (์ถฉ๋Œ์ด ๋ฐœ์ƒํ•  ๋•Œ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ๋ฒ•)

    1. Open addressing & Linear probing
    2. ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ๊ฐ€ ๊ธฐ์กด ๋ฐ์ดํ„ฐ์™€ ๊ฐ™์€ ์Šฌ๋กฏ์—์„œ ์ถฉ๋Œํ•  ๋•Œ, ๋น„์–ด์žˆ๋Š” ์Šฌ๋กฏ์„ ์ฐพ์•„ ํ•œ์นธ์”ฉ ์˜†์œผ๋กœ ๊ฐ€๋Š” ๋ฐฉ๋ฒ•. ์ด๋ฅผ rehashing์ด๋ผ๊ณ  ๋ถ€๋ฅด๊ธฐ๋„ ํ•œ๋‹ค.
    3. ์ถฉ๋Œ์ด ๋งŽ์ด ๋‚˜๋Š” ์Šฌ๋กฏ ๊ทผ์ฒ˜์— ๋ฐ์ดํ„ฐ๊ฐ€ ์Œ“์ด๋Š” clustering์ด ์ƒ๊ธด๋‹ค๋Š” ๋‹จ์ ์ด ์žˆ๋‹ค.
    4. Chaining
    5. ํ•˜๋‚˜์˜ ์Šฌ๋กฏ์— ์—ฌ๋Ÿฌ๊ฐœ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ๋“ค์–ด๊ฐ€๋Š” ๊ฒƒ์„ ํ—ˆ์šฉํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค.
    6. ์ธ์Šคํƒ€๊ทธ๋žจ์˜ hashtag๋‚˜, ์‚ฌ์ „์—์„œ ๋‹จ์–ด๋ฅผ ์ฐพ์„ ๋•Œ ๋งจ ์•ž ๊ธ€์ž๊ฐ€ ์†ํ•œ ๊ณณ๋ถ€ํ„ฐ ์ฐพ๋Š”, '์ƒ‰์ธ'๊ณผ ๊ฐ™์€ ๊ฐœ๋…์ด๋‹ค.
    7. ํ•จ๊ป˜ ๋ณด๊ธฐ :point_right: How to resize a hash table?
  4. Map abstract data type
  5. ํŒŒ์ด์ฌ์—์„œ Dictionary๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” ๋ฐ์ดํ„ฐ ํƒ€์ž…์€ key์™€ value๊ฐ€ ์ง์„ ์ด๋ฃจ๋Š” Map ์ž๋ฃŒ๊ตฌ์กฐ์™€ ๊ฐ™๋‹ค.
# Open addressing ๋ฐฉ์‹์˜ Hash๋กœ Map ๋ฐ์ดํ„ฐํƒ€์ž… ๋งŒ๋“ค๊ธฐ
class HashTable:
    def __init__(self):
        self.size = 11
        self.slots = [None] * self.size
        self.data = [None] * self.size

    def put(self,key,data):
    hashvalue = self.hashfunction(key,len(self.slots))

    if self.slots[hashvalue] == None:
        self.slots[hashvalue] = key
        self.data[hashvalue] = data
    else:
        if self.slots[hashvalue] == key:
            self.data[hashvalue] = data  #replace
        else:   # Open addressing ๋ฐฉ๋ฒ•
            nextslot = self.rehash(hashvalue,len(self.slots))
            while self.slots[nextslot] != None and self.slots[nextslot] != key:
                nextslot = self.rehash(nextslot,len(self.slots))

            if self.slots[nextslot] == None:
                self.slots[nextslot]=key
                self.data[nextslot]=data
            else:
                self.data[nextslot] = data #replace

    def hashfunction(self,key,size):
         return key%size

    def rehash(self,oldhash,size):
        return (oldhash+1)%size

    def get(self,key):
        startslot = self.hashfunction(key,len(self.slots))

        data = None
        stop = False
        found = False
        position = startslot
        while self.slots[position] != None and not found and not stop:
            if self.slots[position] == key:
                found = True
                data = self.data[position]
            else:
                position=self.rehash(position,len(self.slots))
                if position == startslot:
                    stop = True
        return data

    def __getitem__(self,key):
        return self.get(key)

    def __setitem__(self,key,data):
        self.put(key,data)
  1. Analysis of Hashing
  2. Hash table์—์„œ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๊ฒƒ์€ load factor(ฮป)์ด๋‹ค. ฮป์ด ์ž‘์„์ˆ˜๋ก ๋น„์–ด์žˆ๋Š” slot์ด ๋งŽ๋‹ค๋Š” ๋œป์ด๋ฉฐ, ์ถฉ๋Œ๋„ ๋น„๊ต์  ์ ๋‹ค. ๋ฐ˜๋Œ€๋กœ ฮป์ด ํด์ˆ˜๋ก hash table์ด ๊ฝ‰ ์ฐจ์žˆ๋‹ค๋Š” ์˜๋ฏธ์ด๋ฉฐ ๋”ฐ๋ผ์„œ ์ถฉ๋Œ๋„ ๋” ๋งŽ์„ ์ˆ˜ ์žˆ๋‹ค.

3. Sorting

  1. The Bubble Sort

    • (n-1๊ณผ n์„ ๋น„๊ต) X n๋ฒˆ
    • O(n^2)
def bubble_sort(list):
    for passnum in range(len(list)-1, 0, -1):
        for i in range(passnum):
            if list[i] > list[i+1]:
                list[i], list[i+1] = list[i+1], list[i]
  1. The Selection Sort

    • ์ œ์ผ ํฐ๊ฑฐ search ํ•ด์„œ ๋’ค์—์—์„œ n๋ฒˆ์งธ์— ๋‘๊ธฐ X ๋ฐ˜๋ณต
    • O(n^2)
    • pythontutor.com ์—์„œ ๊ณผ์ • ๋ณผ์ˆ˜ ์žˆ์Œ
  2. The Insertion Sort

    • ๋ฆฌ์ŠคํŠธ์˜ ํฌ๊ธฐ๋ฅผ 1์”ฉ ํ‚ค์›Œ๊ฐ€๋ฉฐ sorting
    • ์ถ”๊ฐ€๋˜๋Š” value๋Š” ์ด๋ฏธ sorting ๋˜์–ด์žˆ๋Š” ๊ฐ’๊ณผ ๋น„๊ตํ•ด์„œ ๋งž๋Š” ์ž๋ฆฌ์— ๋„ฃ๊ธฐ
  3. The Shell Sort

    • ๋ช‡๊ฐœ์˜ sublist ๋กœ ๋‚˜๋ˆ ์„œ sorting ํ•œ๋‹ค์Œ ํ•ฉ์น˜๊ธฐ
    • ๋‹จ, sublist๋กœ ๋‚˜๋ˆŒ ๋•Œ ๋ฐ”๋กœ ์˜†์— ์žˆ๋Š”๊ฑฐ๋ž‘ ๊ฐ™์ด ๋“ค์–ด๊ฐ€์ง€ ์•Š๊ณ  step์„ ๋‘๊ณ  ๋“ค์–ด๊ฐ.
    • interval ๊ฐ’์„ ๊ตฌํ•˜๋Š” ์‹๋„ ๋”ฐ๋กœ ์žˆ์Œ.
  4. The Merge Sort ์žฌ๊ท€

    • ๋‘๊ฐœ์”ฉ ๋น„๊ต ํ•ด์„œ sorting ํ•˜๊ณ  ๋น„๊ต๋œ๊ฑธ ํ•ฉ์ณ์„œ ๋‹ค์‹œ sorting ํ•˜๊ณ ... ๋ชจ๋‘ ํ•ฉ์ณ์งˆ๋•Œ๊นŒ์ง€ sorting.
    • ๋‘๊ฐœ์”ฉ ๋น„๊ตํ•˜๊ธฐ ๋•Œ๋ฌธ์— logn, ๋งˆ์ง€๋ง‰์— n๋ฒˆ ๋Œ๋ฉฐ ํ•˜๋‚˜์”ฉ ๋น„๊ตํ•˜๊ธฐ ๋•Œ๋ฌธ์— O(nlogn)์ด ๋‚˜์˜จ๋‹ค.
  5. The Quick Sort ์žฌ๊ท€

    • pivot value๋ผ๋Š” ๊ธฐ์ค€์ ์„ ํ•˜๋‚˜ ์ •ํ•œ๋‹ค(์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋‹ค์–‘ํ•จ).
    • pivot value๋ฅผ ์ œ์™ธํ•œ ๊ฐ€์žฅ ์™ผ์ชฝ ๊ฐ’๊ณผ ๊ฐ€์žฅ ์˜ค๋ฅธ์ชฝ ๊ฐ’์„ ๊ฐ๊ฐ leftmark, rightmark๋ผ๊ณ  ํ•œ๋‹ค.
    • leftmark์™€ pivot value ๋ฅผ ๋น„๊ตํ•ด์„œ leftmark๊ฐ€ ๋” ํด๋•Œ leftmark๋Š” ์›€์ง์ž„์„ ๋ฉˆ์ถ˜๋‹ค. ์ด๋•Œ leftmark์™€ rightmark๋ž‘ ๋น„๊ตํ•ด์„œ leftmark๊ฐ€ ๋” ํฌ๋ฉด swapํ•œ๋‹ค.
    • leftmark์™€ rightmark ๊ฐ€ ๋งŒ๋‚ ๋•Œ๊นŒ์ง€ ๋ฐ˜๋ณตํ•˜๋ฉด pivot value๋ฅผ ๊ธฐ์ค€์œผ๋กœ lefthalf์™€ righthalf๊ฐ€ ๋‚˜๋‰˜๊ฒŒ ๋œ๋‹ค.
    • ๋‚˜๋‰œ lefthalf์—์„œ ๋‹ค์‹œ quick sort๋ฅผ, righthalf์—์„œ๋„ quick sort๋ฅผ ๊ฐ๊ฐ ํ•œ๋‹ค.
    • ๋ฐ˜๋ณต