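Handling the Geetest slider CAPTCHA with Java and Selenium: an example
I needed to crawl a site that is protected by a Geetest CAPTCHA, and I spent this week working out how to get past it. After a lot of searching I found a Zhihu thread where someone had asked the same question, and I built a rough implementation along the lines suggested there.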

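1. Using HtmlUnit (this approach did not work for me: no drag track was generated after the simulated mouse drag, so feel free to skip it)
I work in Java, so my first idea was to use HtmlUnit directly. I started with some initialization: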
private void initWebClient() {
if (webClient != null) {
return;
}
webClient = new WebClient(BrowserVersion.FIREFOX_24);
webClient.getOptions().setProxyConfig(new ProxyConfig("127.0.0.1",8888));
webClient.getOptions().setActiveXNative(true);
webClient.getOptions().setUseInsecureSSL(true); // accept any SSL certificate
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(true);
webClient.setCssErrorHandler(new SilentCssErrorHandler());
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
CookieManager cookieManager = new CookieManager();
List<org.apache.http.cookie.Cookie> httpCookies = client.getCookies();// cookies obtained by other means
for (org.apache.http.cookie.Cookie cookie : httpCookies) {
cookieManager.addCookie(new com.gargoylesoftware.htmlunit.util.Cookie(cookie));
}
webClient.setCookieManager(cookieManager);
}
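With the proxy and cookies initialized, the client can be used as usual:
HtmlPage page = webClient.getPage("http://www.qixin.com/login");// Qixinbao
gePageInfor(page);
Next I fetch the images, restore them, and simulate the drag. I think something is wrong here: my simulated drag is probably incorrect, so the JS it triggers does not generate a valid track. If you can spot the mistake, please let me know.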
private void gePageInfor(HtmlPage page) {
String[] img_slice={"div", "class", "gt_cut_fullbg_slice"};
String[] img_bg_slice={"div", "class", "gt_cut_bg_slice"};
HtmlDivision div = (HtmlDivision) page.getElementById("captcha");
int deCAPTCHA = 0;
try {
byte[] img_slice_binary = client.get(getImgUrl(img_slice, div, true)).getBinary();// fetch the image bytes
byte[] img_bg_slice_binary = client.get(getImgUrl(img_bg_slice, div, false)).getBinary();
// restore the shuffled images
BufferedImage geetestImg = ImgTest.getGeetestImg(img_slice_binary, ImgTest.imgArray);
BufferedImage geetestImg2 = ImgTest.getGeetestImg(img_bg_slice_binary, ImgTest.imgArray);
// compute the drag offset (still broken; should be replaced with a third-party image recognition library)
deCAPTCHA =ImgTest.deCAPTCHA(geetestImg,geetestImg2);
System.out.println(deCAPTCHA);
} catch (IOException | FetchException e) {
e.printStackTrace();
}
HtmlDivision div_slider_knob = get_div_slider_knob(page,"gt_slider_knob gt_show");// the div to drag
HtmlPage mouseOver = (HtmlPage) div_slider_knob.mouseOver();
HtmlPage mouseDownPage = (HtmlPage)div_slider_knob.mouseDown();
div_slider_knob = get_div_slider_knob(mouseDownPage,"gt_slider_knob gt_show moving");
mouseMoveX(deCAPTCHA, div_slider_knob, mouseDownPage);
HtmlPage newPage =(HtmlPage)div_slider_knob.mouseOver();
// newPage =(HtmlPage)div_slider_knob.mouseDown();
System.out.println(newPage.asXml());
div = (HtmlDivision)newPage.getElementById("captcha");
HtmlElement htmlElement = div.getElementsByAttribute("div", "class", "gt_slice gt_show moving").get(0);
System.out.println(htmlElement);
newPage =(HtmlPage)div_slider_knob.mouseUp();// fires the JS, but no drag track is generated
System.out.println("---------------");
System.out.println(newPage.asXml());
if (newPage.getElementById("captcha")!=null) {// retry on failure
//gePageInfor(newPage);
}
}
private void mouseMoveX(int deCAPTCHA, HtmlDivision div_slider_knob, HtmlPage mouseDown) {
MouseEvent mouseEvent = new MouseEvent(div_slider_knob, MouseEvent.TYPE_MOUSE_MOVE, false, false, false, MouseEvent.BUTTON_LEFT);
mouseEvent.setClientX( mouseEvent.getClientX()+((deCAPTCHA!=0)?deCAPTCHA:99)); // shift the x coordinate
ScriptResult scriptResult = mouseDown.getDocumentElement().fireEvent(mouseEvent);
}
private HtmlDivision get_div_slider_knob(HtmlPage page,String classString) {
return (HtmlDivision)(((HtmlDivision) page.getElementById("captcha")).getElementsByAttribute("div", "class", classString).get(0));
}
private String getImgUrl(String[] img_slice, HtmlDivision div, boolean isNeedCheckPostion) {
String url ="";
int[] postion = new int[2];
boolean empty = div.getElementsByAttribute(img_slice[0],img_slice[1],img_slice[2]).isEmpty();
if (div.hasChildNodes() && !empty) {
List<HtmlElement> elementsByAttribute = div.getElementsByAttribute(img_slice[0],img_slice[1],img_slice[2]);
for(int i = 0;i<elementsByAttribute.size();i++){
HtmlDivision div_img = (HtmlDivision)elementsByAttribute.get(i);
String style = div_img.getAttribute("style");
String[] imge_url_position = style.split(";");
if(StringUtils.isBlank(url)){// pick up the URL once
url = StringUtils.replacePattern(imge_url_position[0], ".*\\(", "").replace(")", "");
}
if (isNeedCheckPostion) {// read the tile position; both images are cut the same way, e.g. background-position: -157px -58px
// String[] positionS = StringUtils.split(StringUtils.remove(imge_url_position[1], "px").replace("-", "").replaceAll(".*:", ""), null);
String[] positionS = StringUtils.split(StringUtils.removePattern(imge_url_position[1], "[^\\d+ \\s]"),null);
postion[0] = Integer.parseInt(positionS[0]);
postion[1] = Integer.parseInt(positionS[1]);
int[] is = ImgTest.imgArray[i];
if (is[0]!=postion[0]||is[1]!=postion[1]) {
logger.debug("updating tile position");
ImgTest.imgArray[i] = postion;
}
System.out.println(ImgTest.imgArray);
isNeedCheckPostion= false;
}
}
}
return url;
}
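The style parsing in getImgUrl can also be written with two regular expressions instead of splitting on ";". The following is only a minimal sketch: the class name TileStyleParser and its two methods are hypothetical and not part of the original code, and it assumes the inline style has the form "background-image: url(...); background-position: -157px -58px" with both offsets given in px.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original article's code.
class TileStyleParser {
    // Captures the address inside url(...), with or without quotes.
    private static final Pattern URL =
            Pattern.compile("url\\(\\s*[\"']?([^\"')]+)[\"']?\\s*\\)");
    // Captures both numbers of "background-position: -157px -58px" (assumes px units).
    private static final Pattern POSITION =
            Pattern.compile("background-position\\s*:\\s*(-?\\d+)px\\s+(-?\\d+)px");

    /** Returns the image URL found in the inline style, or null if there is none. */
    static String parseUrl(String style) {
        Matcher m = URL.matcher(style);
        return m.find() ? m.group(1) : null;
    }

    /** Returns {x, y} as non-negative pixel offsets into the image, or null if absent. */
    static int[] parsePosition(String style) {
        Matcher m = POSITION.matcher(style);
        if (!m.find()) {
            return null;
        }
        // The CSS offsets are negative; the tile origin inside the image is their absolute value.
        return new int[] {
            Math.abs(Integer.parseInt(m.group(1))),
            Math.abs(Integer.parseInt(m.group(2)))
        };
    }
}
```

Matching each property by name avoids depending on the order of the declarations inside the style attribute, which the split-based version relies on.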
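The method that compares the two images to get the offset (deCAPTCHA) is wrong, so I will not post it. Below is the method used to restore the image. Once you inspect the elements in the browser, it becomes clear how the image is reassembled; each tile read here is 10px wide and 58px high.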
public static BufferedImage getGeetestImg(byte[] binary, int[][] imgArray) throws IOException {
BufferedImage img = ImageIO.read(new ByteArrayInputStream(binary));
List<BufferedImage> list = new ArrayList<>();
for (int i=0;i< imgArray.length;i++) {
BufferedImage subimage = img.getSubimage(imgArray[i][0], imgArray[i][1], 10, 58);
list.add(subimage);
// ImageIO.write(subimage, "jpg", new File("d:\\image\\imgs"+i+".jpg"));
}
BufferedImage mergeImageUp = null;
BufferedImage mergeImageDown = null;
int mid = list.size()>>>1;
for (int i = 0; i <mid-1 ; i++) {
mergeImageUp = mergeImage(mergeImageUp==null?list.get(i):mergeImageUp, list.get(i+1), true);
}
for(int i = mid;i<list.size()-1;i++){
mergeImageDown = mergeImage(mergeImageDown==null?list.get(i):mergeImageDown,list.get(i+1), true);
}
img = mergeImage(mergeImageUp, mergeImageDown, false);
return img;
}
public static BufferedImage mergeImage(BufferedImage img1,
BufferedImage img2, boolean isHorizontal) throws IOException {
int w1 = img1.getWidth();
int h1 = img1.getHeight();
int w2 = img2.getWidth();
int h2 = img2.getHeight();
// Read the RGB values from both images
int[] ImageArrayOne = new int[w1 * h1];
ImageArrayOne = img1.getRGB(0, 0, w1, h1, ImageArrayOne, 0, w1); // scan the pixels row by row into the array
int[] ImageArrayTwo = new int[w2 * h2];
ImageArrayTwo = img2.getRGB(0, 0, w2, h2, ImageArrayTwo, 0, w2);
// Build the new image
BufferedImage DestImage = null;
if (isHorizontal) { // merge horizontally
DestImage = new BufferedImage(w1+w2, h1, BufferedImage.TYPE_INT_RGB);
DestImage.setRGB(0, 0, w1, h1, ImageArrayOne, 0, w1); // set the left half
DestImage.setRGB(w1, 0, w2, h2, ImageArrayTwo, 0, w2); // set the right half
} else { // merge vertically
DestImage = new BufferedImage(w1, h1 + h2,
BufferedImage.TYPE_INT_RGB);
DestImage.setRGB(0, 0, w1, h1, ImageArrayOne, 0, w1); // set the top half
DestImage.setRGB(0, h1, w2, h2, ImageArrayTwo, 0, w2); // set the bottom half
}
return DestImage;
}
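The same restoration can be done in a single pass by copying each tile straight into its final position, rather than merging pairs of images repeatedly. This is a sketch under assumptions, not the code used above: the class TileStitcher and its restore method are hypothetical names, the tile list is assumed to have an even length, and the layout is assumed to match the one described here (first half of the tiles on the top row, second half on the bottom row).

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the original article's code.
class TileStitcher {
    /**
     * Rebuilds an image from shuffled tiles.
     * positions[i] is the {x, y} origin of the i-th tile inside the source image;
     * tiles are placed left to right, first half on the top row, second half below.
     * Assumes positions.length is even.
     */
    static BufferedImage restore(BufferedImage source, int[][] positions, int tileW, int tileH) {
        int perRow = positions.length / 2;
        BufferedImage out = new BufferedImage(perRow * tileW, 2 * tileH, BufferedImage.TYPE_INT_RGB);
        for (int i = 0; i < positions.length; i++) {
            // Read one tile's pixels from the shuffled source image.
            int[] pixels = source.getRGB(positions[i][0], positions[i][1], tileW, tileH, null, 0, tileW);
            // Write them at the tile's restored position.
            int dx = (i % perRow) * tileW;
            int dy = (i / perRow) * tileH;
            out.setRGB(dx, dy, tileW, tileH, pixels, 0, tileW);
        }
        return out;
    }
}
```

With the tile size mentioned above, a call would look like TileStitcher.restore(img, imgArray, 10, 58). Allocating the output once avoids creating a new intermediate BufferedImage for every merge step.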
2. Using Selenium
Later I suspected that the problem lay in how I simulated the mouse action, so I turned to Selenium (2.42.2). It can also drive HtmlUnit, and its mouse actions seem to be wrapped more completely.
When I tried it, though, I found that HtmlUnitMouse does not implement this action:
public void mouseMove(Coordinates where, long xOffset, long yOffset) {
throw new UnsupportedOperationException("Moving to arbitrary X,Y coordinates not supported.");
}
Fine, so let's drive Chrome instead:
System.setProperty("webdriver.chrome.driver","C:\\chromedriver.exe");
Proxy proxy = new Proxy();
// set the proxy server address
proxy.setHttpProxy("127.0.0.1:8888");
// DesiredCapabilities capabilities = DesiredCapabilities.htmlUnitWithJs();
DesiredCapabilities capabilities = DesiredCapabilities.chrome();
capabilities.setCapability(CapabilityType.PROXY, proxy);
// final WebDriver driver = new HtmlUnitDriver(capabilities);
WebDriver driver = new ChromeDriver(capabilities);
driver.get("http://www.qixin.com/login");
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
checkPage(driver,"return $('.gt_cut_fullbg_slice');");
// print the page title
System.out.println("1 Page title is: " + driver.getTitle());
// dump the page source
String pageSource = driver.getPageSource();
System.out.println(pageSource);
org.openqa.selenium.JavascriptExecutor executor = (org.openqa.selenium.JavascriptExecutor)driver;
boolean equals = executor.executeScript("return document.readyState").equals("complete");
int moveX =99;// drag distance in pixels (hard-coded for now)
if (equals) {
WebElement element = driver.findElement(By.className("gt_slider_knob"));//(".gt_slider_knob"));
Point location = element.getLocation();
element.getSize();
Actions action = new Actions(driver);
// action.clickAndHold().perform();// press and hold at the current mouse position
// action.clickAndHold(element).perform();// press and hold on the element
// action.clickAndHold(element).moveByOffset(moveX, 0).release().perform(); // hold the source element -> move by (xOffset, yOffset) -> release
action.dragAndDropBy(element, moveX, 0).perform(); // the arguments are offsets relative to the element, not absolute coordinates
// action.dragAndDrop(element,newelement).perform();
pageSource = driver.getPageSource();
}
// copy the browser cookies back to the HTTP client
Set<org.openqa.selenium.Cookie> cookies = driver.manage().getCookies();
Set<Cookie> cookies2 = new HashSet<>();
for (org.openqa.selenium.Cookie cookie : cookies) {
cookies2.add((Cookie) new Cookie(cookie.getDomain(), cookie.getName(), cookie.getValue(), cookie.getPath(), cookie.getExpiry(), true));
}
for (Cookie cookie : cookies2) {
org.apache.http.cookie.Cookie httpClient = cookie.toHttpClient();
}
System.out.println(pageSource);
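A form submitted this way does carry a drag track. I hard-coded the drag distance here; it can be derived from the restored images above, or located with an open-source image recognition tool. With that, the slider CAPTCHA should be solvable.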