
Tuesday, August 31, 2004

AutoTagging Ballplayers

Since I write a lot of posts that include the names of baseball players such as Willie Mays, Sammy Sosa, and Babe Ruth, I wanted a way to automatically create hyperlinks to each player's statistics without having to search Baseball-reference.com manually. Since I couldn't find anything out there that did this, I wrote a small Windows application using the .NET Framework.

Here's the simple interface.



When the application starts or is activated, it copies whatever text is on the clipboard into its main window using the Clipboard object in the Framework. Clicking Go parses the text in the window, looking for player names. I defined a player name as any two consecutive words that begin with capital letters (I know this will miss some players, like J.D. Drew). To deal with HTML that already contains hyperlinks, and with text that spans several paragraphs, the program works on a copy of the text in which it adds spaces around angle brackets and replaces carriage return/line feed (CRLF) pairs with spaces. It then splits the copy into words using the Split method of the String class and searches for names, ignoring punctuation with the handy Char.IsPunctuation method.
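As a rough illustration of the matching rule described above (a sketch, not the tool's actual code), the console module below trims punctuation from each word and then collects every pair of consecutive capitalized words. The names NameFinderSketch, TrimPunctuation, and FindCandidateNames are made up for this example.

```vbnet
Imports System
Imports System.Collections
Imports Microsoft.VisualBasic

Module NameFinderSketch

    ' Hypothetical helper: strip punctuation from both ends of a word.
    Function TrimPunctuation(ByVal word As String) As String
        Dim first As Integer = 0
        Dim last As Integer = word.Length - 1
        While first <= last AndAlso Char.IsPunctuation(word.Chars(first))
            first += 1
        End While
        While last >= first AndAlso Char.IsPunctuation(word.Chars(last))
            last -= 1
        End While
        Return word.Substring(first, last - first + 1)
    End Function

    ' Collect every pair of consecutive capitalized words as a candidate name.
    Function FindCandidateNames(ByVal text As String) As ArrayList
        Dim candidates As New ArrayList
        Dim previous As String = ""
        Dim word As String
        For Each word In text.Replace(vbCrLf, " ").Split(" "c)
            Dim clean As String = TrimPunctuation(word)
            If clean.Length > 0 AndAlso Char.IsUpper(clean.Chars(0)) Then
                If previous.Length > 0 Then
                    candidates.Add(previous & " " & clean)
                End If
                ' A word that ends in punctuation closes a phrase,
                ' so it cannot be the first word of a name.
                If Char.IsPunctuation(word.Chars(word.Length - 1)) Then
                    previous = ""
                Else
                    previous = clean
                End If
            Else
                previous = ""
            End If
        Next
        Return candidates
    End Function

    Sub Main()
        Dim sample As String = "Willie Mays, Sammy Sosa and Babe Ruth."
        Dim name As String
        For Each name In FindCandidateNames(sample)
            Console.WriteLine(name)
        Next
    End Sub

End Module
```

For the sample sentence this should list Willie Mays, Sammy Sosa, and Babe Ruth, one per line. A rule this loose also flags ordinary capitalized pairs such as the start of a sentence followed by a proper noun, which is why each candidate still has to be confirmed by the site search.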

When it finds a name, it asks the user to confirm the lookup if the checkbox is checked; otherwise it goes straight on to make an HTTP POST against the search page at baseball-reference.com using the classes in the System.Net namespace. On that site, if the search does not return a unique player, the response comes back from the same URL that was posted to, so the program simply moves on. Of course, this means that when multiple players share a name, such as Randy Johnson, the name will not be tagged. If the URL returned is different, it is the player's page, which is saved and inserted into the text as a hyperlink. When all of the replacements have been made, the text in the window is copied back to the clipboard so I can easily paste it into the blogging window.
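The clipboard round trip at either end of this process can be sketched roughly as follows. This is a minimal stand-alone form written for illustration, not the tool itself; the class name ClipboardSketchForm and the control names are assumptions, and the Go button here only copies the text back without doing any tagging.

```vbnet
Imports System
Imports System.Windows.Forms

Public Class ClipboardSketchForm
    Inherits Form

    Private TextBox1 As New TextBox
    Private WithEvents GoButton As New Button

    Public Sub New()
        TextBox1.Multiline = True
        TextBox1.Dock = DockStyle.Fill
        GoButton.Text = "Go"
        GoButton.Dock = DockStyle.Bottom
        Controls.Add(TextBox1)
        Controls.Add(GoButton)
    End Sub

    ' Pull any text on the clipboard into the window each time the form
    ' is activated (this also fires when the form is first shown).
    Protected Overrides Sub OnActivated(ByVal e As EventArgs)
        MyBase.OnActivated(e)
        Dim data As IDataObject = Clipboard.GetDataObject()
        If Not data Is Nothing AndAlso data.GetDataPresent(DataFormats.Text) Then
            TextBox1.Text = CStr(data.GetData(DataFormats.Text))
        End If
    End Sub

    ' Push the text back to the clipboard so it can be pasted elsewhere.
    ' Passing True keeps the data available after the application exits.
    Private Sub GoButton_Click(ByVal sender As Object, _
            ByVal e As EventArgs) Handles GoButton.Click
        Clipboard.SetDataObject(TextBox1.Text, True)
    End Sub

    ' Clipboard access requires a single-threaded apartment.
    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New ClipboardSketchForm)
    End Sub

End Class
```

One consequence of reading the clipboard on every activation is that switching away and back replaces whatever is in the window, so any edits need to be copied back before leaving the form.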

Here is the part of the program that parses the text:


' Requires: Imports System.Text (StringBuilder) and System.Collections (ArrayList)
Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click

    ' Parse the text for names,
    ' defined as 2 consecutive words with capital letters

    Dim text As String = TextBox1.Text

    ' Replacements are made against the original, unpadded text
    Dim sbText As New StringBuilder(text)

    ' Make some room for hyperlinks
    text = text.Replace(">", " > ")
    text = text.Replace("<", " < ")
    text = text.Replace(vbCrLf, " ")
    ' Create an array of words
    Dim words() As String = text.Split(" "c)
    Dim searchText, s, lastWord, searchTextOrig As String
    Dim firstUpper As Boolean = False
    Dim replacedText As New ArrayList
    For Each s In words
        If s.Length > 0 AndAlso Char.IsUpper(s.Chars(0)) Then
            If firstUpper Then
                ' Last two are upper case so search
                searchText = lastWord & " " & s
                ' See if we've replaced it already
                If Not replacedText.Contains(searchText) Then
                    ' Keep the plain name; SearchForName (not shown) appears to
                    ' take searchText ByRef and turn it into the hyperlinked form
                    searchTextOrig = searchText
                    If SearchForName(searchText) Then
                        sbText.Replace(searchTextOrig, searchText)
                        replacedText.Add(searchTextOrig)
                    End If
                End If
            End If

            ' Words that end in punctuation are not considered
            ' as the first word of a name
            If Not Char.IsPunctuation(s.Chars(s.Length - 1)) Then
                firstUpper = True
                lastWord = s
            Else
                ' Reset so a stale lastWord is not paired with the next word
                firstUpper = False
            End If
        Else
            firstUpper = False
        End If
    Next

    TextBox1.Text = sbText.ToString
    ' Put the data in the clipboard
    Windows.Forms.Clipboard.SetDataObject(sbText.ToString)
    ' replacements is declared outside this snippet
    sbMessage.Panels(0).Text = "Done. " & replacements.ToString & _
        " replacements made."

End Sub


And here is the part that searches baseball-reference.com.

' Requires: Imports System.IO, System.Net and System.Web (HttpUtility)
Private Function SearchRef(ByVal searchText As String) As String

    Try
        Dim brHttp As HttpWebRequest = _
            CType(WebRequest.Create(searchUri), HttpWebRequest)

        ' *** Send the POST data
        Dim brPostData As String = "search=" & _
            HttpUtility.UrlEncode(searchText)
        brHttp.Method = "POST"

        Dim lbPostBuffer() As Byte = _
            System.Text.Encoding.GetEncoding(1252).GetBytes(brPostData)
        brHttp.ContentLength = lbPostBuffer.Length

        Dim sPostData As Stream = brHttp.GetRequestStream()
        sPostData.Write(lbPostBuffer, 0, lbPostBuffer.Length)
        sPostData.Close()

        ' Get the response and check and see if it's the search page
        Dim loWebResponse As HttpWebResponse = _
            CType(brHttp.GetResponse(), HttpWebResponse)
        ' Read the final URI before closing the response
        Dim responseUri As String = loWebResponse.ResponseUri.ToString
        loWebResponse.Close()

        If responseUri <> searchUri Then
            ' We ended up somewhere else, so this is the player's page
            Return responseUri
        Else
            ' Still on the search page: no unique match
            Return ""
        End If

    Catch ex As Exception
        MsgBox("Could not search the site: " & ex.Message, _
            MsgBoxStyle.Critical)
        Return ""
    End Try

End Function
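The parsing code calls a SearchForName function that isn't shown. As a guess at its general shape only, the sketch below takes the name by reference, looks up a URL, and rewrites the name as a hyperlink when one is found. The delegate type, the FakeLookup stand-in, and the example.com address are all assumptions made so the sample runs on its own; in the tool, the lookup would presumably be SearchRef, and the optional confirmation prompt is left out.

```vbnet
Imports System

Module SearchForNameSketch

    ' Hypothetical signature for whatever performs the lookup.
    Delegate Function LookupFunction(ByVal searchText As String) As String

    ' Returns True and rewrites searchText as a hyperlink when the lookup
    ' yields a URL; otherwise leaves searchText untouched and returns False.
    Function SearchForName(ByRef searchText As String, _
            ByVal lookup As LookupFunction) As Boolean
        Dim url As String = lookup(searchText)
        If url Is Nothing OrElse url.Length = 0 Then
            Return False
        End If
        searchText = "<a href=""" & url & """>" & searchText & "</a>"
        Return True
    End Function

    ' Stand-in for the real site search; the URL is a placeholder.
    Function FakeLookup(ByVal searchText As String) As String
        If searchText = "Willie Mays" Then
            Return "http://www.example.com/players/mays.html"
        End If
        Return ""
    End Function

    Sub Main()
        Dim name As String = "Willie Mays"
        If SearchForName(name, AddressOf FakeLookup) Then
            ' name now holds the anchor tag wrapped around the original text
            Console.WriteLine(name)
        End If
    End Sub

End Module
```

Passing the name ByRef matches how the parsing code uses the result: it keeps a copy of the plain name, then replaces that copy in the text with whatever the function left in searchText.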

BTW, this was the first post processed through this tool.
