.NET: Collecting All Videos and Their Information from a Youku Album (VB.NET Code)
Updated: 2010-02-07 11:50:59
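I want to build a video-on-demand site, so I started looking into video collection (scraping).
The method is simple: extract the Youku album (playlist) ID, then loop over the video indexes one by one, download each page's source, and pull the title tag and the VID out of it. Nothing technically sophisticated.
The collection uses the HttpWebRequest and HttpWebResponse classes from .NET, and the page source is analyzed with regular expressions.
This code is not very efficient: processing a single page takes between 0.5 and 2 seconds, so it is not suitable for large-scale collection. Perhaps porting it to JavaScript would make it a little faster.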
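To see where the 0.5 to 2 seconds goes, the regex step can be timed on its own against an in-memory string. The following is a minimal sketch, not part of the original program: the module name TimingSketch, the helper ExtractVideoId, and the sample page fragment are all made up for illustration. If this step turns out to take far less than the figures above, the time is probably spent on the HTTP round trip rather than on parsing.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module TimingSketch
    ' Hypothetical helper: returns the videoId found in the page text, or "" if none is found.
    Function ExtractVideoId(ByVal html As String) As String
        Dim m As Match = Regex.Match(html, "var\s+videoId\s*=\s*""(\d+)""\s*;")
        If m.Success Then Return m.Groups(1).Value
        Return ""
    End Function

    Sub Main()
        ' Made-up fragment standing in for a downloaded page (an assumption, not real Youku output).
        Dim html As String = "<script>var videoId = ""12345"";</script>"

        ' Time only the regex extraction; no network access is involved.
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim vid As String = ExtractVideoId(html)
        sw.Stop()

        Console.WriteLine("videoId={0}, regex step took {1:F3} ms", vid, sw.Elapsed.TotalMilliseconds)
    End Sub
End Module
```

The same Stopwatch pattern can be wrapped around the GetResponse and ReadToEnd calls in the full program to compare the download time with the parsing time.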
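That is as far as I have taken it for now, so I am posting the code as-is for everyone to share.
The code is VB.NET. Create a new form named frmMain, add one TextBox, one ListBox, and two Buttons, then paste in the following code: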
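Imports System.Net
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Public Class frmMain

    ' One collected video: its index in the album, its title, and the two IDs found on the page.
    Structure VList
        Dim id As Integer
        Dim title As String
        Dim vid1 As String
        Dim vid2 As String

        ' Overrides ValueType.ToString so the structure can format itself.
        Public Overrides Function ToString() As String
            Return String.Format("{0}:<{1}> [{2}]", id, title, vid1)
        End Function
    End Structure

    ' Videos collected by the most recent run.
    Dim myList As New List(Of VList)
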
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Declare the working variables once, outside the loop.
        Dim wr1 As HttpWebRequest
        Dim wr2 As HttpWebResponse
        Dim ret As String
        Dim reg As Match
        Dim g As Group
        Dim preVid As String = "" ' VID of the previous video
        Dim nowid As Integer = 0 ' index of the current video within the album
        Dim listUrl As String = TextBox1.Text ' album URL, e.g. http://www.youku.com/playlist_show/id_2350764.html
        Dim tarUrl As String = "http://v.youku.com/v_playlist/f{0}" ' {0} = playlist ID

        ' Extract the playlist ID from the album URL.
        reg = Regex.Match(listUrl, "playlist_show/id_(\d+).*\.html")
        If Not reg.Success Then
            MsgBox("Failed to extract the playlist ID from the album URL!")
            Exit Sub
        End If
        g = reg.Groups(1)
        tarUrl = String.Format(tarUrl, g.Value) & "o{1}p{0}.html" ' {0} = video index, {1} = sort order

        ' Download the album page and read the album name from its title tag.
        wr1 = DirectCast(WebRequest.Create(listUrl), HttpWebRequest)
        wr2 = DirectCast(wr1.GetResponse(), HttpWebResponse)
        ret = New StreamReader(wr2.GetResponseStream(), Encoding.GetEncoding(wr2.CharacterSet)).ReadToEnd()
        wr2.Close() ' release the album-page response before the per-video requests
        reg = Regex.Match(ret, "<title>(.+) - 专辑 - 优酷视频</title>")
        If Not reg.Success Then
            MsgBox("Failed to extract the album name!")
        Else
            g = reg.Groups(1)
            MsgBox("Album name: " & g.Value)
        End If

        Do
            ' Download the page text for the current video.
            wr1 = DirectCast(WebRequest.Create(String.Format(tarUrl, nowid, "0")), HttpWebRequest) ' sort order 0: look up videos in reverse order
            wr2 = DirectCast(wr1.GetResponse(), HttpWebResponse)
            ret = New StreamReader(wr2.GetResponseStream(), Encoding.GetEncoding(wr2.CharacterSet)).ReadToEnd()
            'TextBox2.Text = ret

            ' Temporary record for the current video.
            Dim nlist As New VList
            nlist.id = nowid ' index within the album

            ' Extract videoId.
            reg = Regex.Match(ret, "var\s+videoId\s*=\s*""(\d+)""\s*;")
            If Not reg.Success Then Exit Do
            g = reg.Groups(1)
            ' If the VID equals the previous VID, stop.
            If g.Value = preVid Then Exit Do
            nlist.vid1 = g.Value

            ' Extract videoId2 (word characters and "=").
            reg = Regex.Match(ret, "var\s+videoId2\s*=\s*""((\w|=)+)""\s*;") ' earlier pattern: "var\s+videoId2\s*=\s*""(\w+)""\s*;"
            If Not reg.Success Then Exit Do
            g = reg.Groups(1)
            nlist.vid2 = g.Value

            ' Extract the video title (second capture group of the title tag).
            reg = Regex.Match(ret, "<title>(.+) - (.+) - 视频 - 优酷视频 - 在线观看 - </title>")
            If Not reg.Success Then
                nlist.title = "{title not found}"
            Else
                g = reg.Groups(2)
                nlist.title = g.Value
            End If

            ' Wrap up this iteration.
            myList.Add(nlist) ' add to the overall list
            preVid = nlist.vid1 ' remember the last VID
            wr2.Close()
            Me.Text = nowid & " : processed!"
            nowid += 1
        Loop
        wr2.Close() ' close the response that was still open when the loop exited
        MsgBox(nowid & " videos collected and processed!")
        Button2_Click(sender, e)
    End Sub

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        ' Show the collected videos in the ListBox, then empty the list.
        ListBox1.Items.Clear()
        For Each ls As VList In myList
            ListBox1.Items.Add(String.Format("{0}:<{1}> [{2}]", ls.id, ls.title, ls.vid1))
        Next
        myList.Clear()
    End Sub
End Class
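
The three regex extractions in the loop can also be pulled out into a function that works on a string, which makes them easy to try without any network access. This is only a sketch under assumptions: the names ParseSketch, VideoInfo, and TryParseVideoPage are hypothetical, and the sample page text is made up to fit the patterns rather than copied from Youku.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ParseSketch
    ' Hypothetical container for the fields the program extracts from one page.
    Structure VideoInfo
        Dim Vid1 As String
        Dim Vid2 As String
        Dim Title As String
    End Structure

    ' Returns True when both IDs are found; Title is "" if the title tag does not match.
    Function TryParseVideoPage(ByVal html As String, ByRef info As VideoInfo) As Boolean
        Dim m As Match = Regex.Match(html, "var\s+videoId\s*=\s*""(\d+)""\s*;")
        If Not m.Success Then Return False
        info.Vid1 = m.Groups(1).Value

        m = Regex.Match(html, "var\s+videoId2\s*=\s*""((\w|=)+)""\s*;")
        If Not m.Success Then Return False
        info.Vid2 = m.Groups(1).Value

        m = Regex.Match(html, "<title>(.+) - (.+) - 视频 - 优酷视频 - 在线观看 - </title>")
        info.Title = If(m.Success, m.Groups(2).Value, "")
        Return True
    End Function

    Sub Main()
        ' Made-up page text shaped to fit the patterns above (an assumption).
        Dim html As String = "<title>Demo album - Episode 1 - 视频 - 优酷视频 - 在线观看 - </title>" &
                             "<script>var videoId = ""12345""; var videoId2 = ""XNDU2Nzg5""; </script>"
        Dim info As VideoInfo
        If TryParseVideoPage(html, info) Then
            Console.WriteLine("{0} {1} {2}", info.Vid1, info.Vid2, info.Title)
        Else
            Console.WriteLine("No video IDs found.")
        End If
    End Sub
End Module
```

Keeping the parsing separate from the download also makes it easier to adjust the patterns if the page layout changes.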
Original work by 夜闻香
Blog: http://clso.cnblogs.com
Homepage: http://cleclso.cn
QQ: 315514678 E-mail: clso#qq.com
Technical discussion is welcome!

