Web Crawling_pyautogui ( Port-MIS Freight Rate and Charge Lookup )

정으주 2021. 7. 16. 16:18

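What I want to do is scrape the freight rate and charge lookup data from the Port-MIS site.

As the screenshot below shows,

the route, container ownership, import/export type, container type, size, and publication date are all required fields; the search will not run without them.

Run a quick search and a table appears below, along with a button for downloading it as an Excel file.

I wrote some code to collect every one of those Excel files.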

from selenium import webdriver
from selenium.webdriver.common.by import By

# Selenium 3 style: the chromedriver path is passed positionally.
# Recent Selenium 4 releases expect a Service object (or no path at all) instead.
driver = webdriver.Chrome('driver/chromedriver')
driver.get("https://new.portmis.go.kr/portmis/websquare/websquare.jsp?w2xPath=/portmis/w2/main/index.xml&page=/portmis/w2/cm/sys/UI-PM-MT-001-021.xml&menuId=0045&menuCd=M4735&menuNm=%BB%E7%C0%CC%C6%AE%B8%CA")
driver.implicitly_wait(5)

# Build the list of routes (sea lanes)
xpath = """//*[@id="mf_tacMain_contents_M9843_body_wq_uuid_972_input_0"]"""

sealane_list_raw = driver.find_element(By.XPATH, xpath)
sealane_list = sealane_list_raw.find_elements(By.TAG_NAME, "option")

sealane_names = [option.get_attribute("value") for option in sealane_list]
del sealane_names[0]  # drop the first option (presumably the placeholder entry)
sealane_names

# Build the list of container ownership options
xpath = """//*[@id="mf_tacMain_contents_M9843_body_sbx_contnOwnSe_input_0"]"""
container_own_list_raw = driver.find_element(By.XPATH, xpath)
container_own_list = container_own_list_raw.find_elements(By.TAG_NAME, "option")

container_own_names = [option.get_attribute("value") for option in container_own_list]
del container_own_names[0]
container_own_names

# Build the list of import/export types (the first option is kept here)
xpath = """//*[@id="mf_tacMain_contents_M9843_body_sbx_imxprtSe_input_0"]"""
portation_type_list_raw = driver.find_element(By.XPATH, xpath)
portation_type_list = portation_type_list_raw.find_elements(By.TAG_NAME, "option")

portation_type_names = [option.get_attribute("value") for option in portation_type_list]
portation_type_names

# Build the list of container types
xpath = """//*[@id="mf_tacMain_contents_M9843_body_sbx_contnCnd_input_0"]"""
container_type_list_raw = driver.find_element(By.XPATH, xpath)
container_type_list = container_type_list_raw.find_elements(By.TAG_NAME, "option")

container_type_names = [option.get_attribute("value") for option in container_type_list]
del container_type_names[0]
container_type_names

# Build the list of container sizes
xpath = """//*[@id="mf_tacMain_contents_M9843_body_sbx_contnStdStndrd_input_0"]"""
container_size_list_raw = driver.find_element(By.XPATH, xpath)
container_size_list = container_size_list_raw.find_elements(By.TAG_NAME, "option")

container_size_names = [option.get_attribute("value") for option in container_size_list]
del container_size_names[0]
container_size_names

First I opened the site, grabbed the xpath of each option field, and built a list of values from each one.
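The five list-building steps all follow the same pattern, so they could be folded into one helper. This is only a sketch: `option_values` and its `skip_first` flag are names made up here, and the locator strings `"id"` and `"tag name"` are the plain string values behind Selenium's `By.ID` and `By.TAG_NAME`, used so the snippet has no imports of its own.

```python
def option_values(driver, select_id, skip_first=True):
    """Return the value attribute of every <option> under the element with id select_id.

    driver is an already-open Selenium WebDriver (assumed, not created here).
    skip_first drops the first option, mirroring the `del names[0]` lines in the post.
    """
    select = driver.find_element("id", select_id)         # same as By.ID
    options = select.find_elements("tag name", "option")  # same as By.TAG_NAME
    values = [opt.get_attribute("value") for opt in options]
    return values[1:] if skip_first else values


# Hypothetical usage with the ids from the post:
# sealane_names = option_values(driver, "mf_tacMain_contents_M9843_body_wq_uuid_972_input_0")
# portation_type_names = option_values(
#     driver, "mf_tacMain_contents_M9843_body_sbx_imxprtSe_input_0", skip_first=False)
```

The import/export list is the one case where the first option is kept, which is why the flag exists.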

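Then I picked some arbitrary options and ran a search,

and ran into an unexpected obstacle.

The window that appears when you click "download all"!!

Getting rid of this popup-like thing that isn't really a popup was the biggest problem.

Its xpath changed every time, so I couldn't hard-code it. After flailing around for a while I got annoyed and just went to bed, lol.

The next day I woke up and stewed over it for a while, then remembered clicking screen coordinates in a macro tool I had used before, so I looked up how to do that!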

https://youtu.be/MOd8pkeA__k

This video explains it very clearly.

Thank you so much.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
import pyautogui

# Click the search button
xpath = """//*[@id="mf_tacMain_contents_M9843_body_wq_uuid_969_btnSearch"]"""
driver.find_element(By.XPATH, xpath).click()

time.sleep(5)

# Click the Excel download button
xpath = """//*[@id="mf_tacMain_contents_M9843_body_btn_ExcelAll"]"""
driver.find_element(By.XPATH, xpath).click()

time.sleep(10)

#print(pyautogui.position()) # read the coordinates of the close button

# Capture the 40x40 area whose top-left corner is at (1134, 335)
pyautogui.screenshot('closepopup.png',region=(1134,335,40,40))

# Find that image on screen and click its center to close the popup
closepopup = pyautogui.locateCenterOnScreen('closepopup.png')
pyautogui.click(closepopup)

# Earlier attempts, left commented out
#da = Alert(driver)
#da.accept()
#xpath = """//*[@id="ja_"]/div/div[1]"""
#driver.find_element(By.XPATH, xpath).click()

#time.sleep(2)

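The Python library pyautogui does the job, and it turned out to have a lot of really neat and fun features.

Python, as always... the wonderland that is Python...

You can read the coordinates of any spot on the screen, cut a png file out of that area, and then find that image on screen and click it.

First, use print(pyautogui.position()) to read the coordinates of the close button on the offending window. (It has to be the top-left corner of the button, because the region argument of screenshot is given as left, top, width, height.)

Then use screenshot to crop just that area into a png file,

and have the script find that image and click it. Done!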
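To make the top-left convention concrete, here is a small sketch of how the region turns into a click point. `region_center` and `close_popup` are hypothetical helpers, and `gui` stands for the imported `pyautogui` module, passed in as an argument so the snippet can be read and tested without a display. Depending on the pyautogui version, a failed image match either returns None or raises, so the sketch falls back to the region's center in both cases.

```python
def region_center(left, top, width, height):
    """Center of a (left, top, width, height) region, the layout used by region=."""
    return (left + width // 2, top + height // 2)


def close_popup(gui, region=(1134, 335, 40, 40), image_path="closepopup.png"):
    """Capture the close-button area, locate it on screen, and click it.

    gui is assumed to be the pyautogui module. The default region is the one
    used in the post and only fits that screen layout.
    """
    gui.screenshot(image_path, region=region)
    try:
        target = gui.locateCenterOnScreen(image_path)
    except Exception:  # some versions raise instead of returning None
        target = None
    if target is None:
        target = region_center(*region)
    x, y = target
    gui.click(x, y)
    return (x, y)


print(region_center(1134, 335, 40, 40))  # (1154, 355)
```

Since the screenshot is taken right before the lookup, the match will normally be the captured region itself, so the click lands at or near (1154, 355) either way.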

import time
from selenium.webdriver.common.by import By
from tqdm import tqdm_notebook
import pyautogui

for s in tqdm_notebook(sealane_names):
    # Select the route
    element = driver.find_element(By.ID, "mf_tacMain_contents_M9843_body_wq_uuid_972_input_0")
    element.send_keys(s)

    for co in tqdm_notebook(container_own_names):
        # Select the container ownership
        element = driver.find_element(By.ID, "mf_tacMain_contents_M9843_body_sbx_contnOwnSe_input_0")
        element.send_keys(co)

        for pt in tqdm_notebook(portation_type_names):
            # Select the import/export type
            element = driver.find_element(By.ID, "mf_tacMain_contents_M9843_body_sbx_imxprtSe_input_0")
            element.send_keys(pt)

            for ct in tqdm_notebook(container_type_names):
                # Select the container type
                element = driver.find_element(By.ID, "mf_tacMain_contents_M9843_body_sbx_contnCnd_input_0")
                element.send_keys(ct)

                for cs in tqdm_notebook(container_size_names):
                    # Select the container size
                    element = driver.find_element(By.ID, "mf_tacMain_contents_M9843_body_sbx_contnStdStndrd_input_0")
                    element.send_keys(cs)

                    time.sleep(3)

                    # Click the search button (the uuid part of this id was 969 in the earlier run, 963 here)
                    xpath = """//*[@id="mf_tacMain_contents_M9843_body_wq_uuid_963_btnSearch"]"""
                    driver.find_element(By.XPATH, xpath).click()

                    time.sleep(5)

                    # Click the Excel download button
                    xpath = """//*[@id="mf_tacMain_contents_M9843_body_btn_ExcelAll"]"""
                    driver.find_element(By.XPATH, xpath).click()

                    time.sleep(10)

                    #print(pyautogui.position()) # read the coordinates of the close button

                    pyautogui.screenshot('closepopup.png',region=(1134,335,40,40))

                    closepopup = pyautogui.locateCenterOnScreen('closepopup.png')
                    pyautogui.click(closepopup) # close the popup

                    time.sleep(3)
driver.close()

Now it's just a matter of looping with for statements to run the search for every combination of options.
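The five nested loops visit every combination of the five option lists, which is what `itertools.product` produces. The sketch below uses made-up placeholder values rather than real Port-MIS option codes, and `iter_searches` is a hypothetical name. It also gives a rough lower bound on the run time from the sleeps in the loop (3 + 5 + 10 + 3 = 21 seconds per search).

```python
from itertools import product


def iter_searches(sealanes, owners, trade_types, container_types, sizes):
    """Yield one (sealane, owner, trade_type, container_type, size) tuple per search."""
    yield from product(sealanes, owners, trade_types, container_types, sizes)


# Placeholder values for illustration only.
demo = list(iter_searches(["S1", "S2"], ["O1"], ["IM", "EX"], ["C1"], ["20", "40"]))

print(len(demo))  # 2 * 1 * 2 * 1 * 2 = 8
print(demo[0])    # ('S1', 'O1', 'IM', 'C1', '20')

SLEEP_PER_SEARCH = 3 + 5 + 10 + 3  # seconds of time.sleep per iteration in the loop
print(len(demo) * SLEEP_PER_SEARCH)  # 168
```

One difference from the nested version: with a flat loop, all five fields would have to be set on every iteration, whereas the nested loops only touch the outer fields when they change.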

I assumed the xpaths were fixed values, but when I checked the next morning a few of them kept changing..

Looks like I'll have to fix that too...
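One possible fix, untested against the real site: the two search-button ids seen in this post (`..._wq_uuid_969_btnSearch` and `..._wq_uuid_963_btnSearch`) differ only in the number in the middle, so matching on the end of the id might survive the change. This assumes the `_btnSearch` suffix stays stable and is unique on the page. `id_suffix_css` and `find_by_id_suffix` are hypothetical helpers, and `"css selector"` is the string value of Selenium's `By.CSS_SELECTOR`.

```python
def id_suffix_css(suffix):
    """CSS attribute selector matching any element whose id ends with suffix."""
    return '[id$="{}"]'.format(suffix)


def find_by_id_suffix(driver, suffix):
    """Look up an element by the stable tail of its id.

    driver is an already-open Selenium WebDriver (assumed, not created here).
    """
    return driver.find_element("css selector", id_suffix_css(suffix))


print(id_suffix_css("_btnSearch"))  # [id$="_btnSearch"]

# Hypothetical usage:
# find_by_id_suffix(driver, "_btnSearch").click()
```

The same idea would not help with the route dropdown, whose id ends in a generic `_input_0`, so that one would need a different anchor.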
