Parse usage - Extract data



Counting words in a text:

Red []

a: "Not great Britain nor small Britain, just Britain"

count: 0

parse a [any [thru "Britain" (count: count + 1)]]

print count



3



Explaining the code:


As long as thru "Britain" finds a "Britain", any repeats the rule, and each match adds 1 to count.






Notice that if you used to instead of thru, the input would be moved to just BEFORE the match rather than after it. This would create an endless loop, since the same "Britain" would be matched over and over again.
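
The same any/thru pattern can be wrapped in a function so that it works for any text and any word. This is only a minimal sketch: count-occurrences is a hypothetical name chosen for illustration, not a function that ships with Red.

```red
Red []

; Hypothetical helper (not part of Red): counts how many times
; `needle` occurs in `text`, using the any/thru pattern shown above.
count-occurrences: func [text [string!] needle [string!] /local n][
    n: 0
    ; a word inside a parse rule is replaced by its value,
    ; so `thru needle` behaves like `thru "Britain"`
    parse text [any [thru needle (n: n + 1)]]
    n
]

print count-occurrences "Not great Britain nor small Britain, just Britain" "Britain"
```

With the sentence from the example above, this should print 3 again.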



Extracting the middle part of a text:

To extract the remaining part of a text from a given point, you may use word:, as explained in the Storing Input chapter. To extract the text between two parse matches, you may use copy:


Red []

txt: "They are one person, they are two together"

parse txt [thru "person, " copy b to " two"]

print b



they are
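
To compare the two approaches mentioned above, the sketch below runs both on the same text. The word names rest and middle are just illustrative choices.

```red
Red []

txt: "They are one person, they are two together"

; a set-word marks the current position: `rest` refers to the
; input from that point up to the end of the text
parse txt [thru "person, " rest: to end]
print rest

; copy keeps only what was matched between the two positions
parse txt [thru "person, " copy middle to " two" to end]
print middle
```

The first print should show everything after "person, ", while the second should show only the part before " two".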



Extract data from the Internet:


This is a very basic example. I have created an HTML page at helpin.red: http://helpin.red/samples/samplehtml1.html. The HTML is very simple, and you can see it by typing print read http://helpin.red/samples/samplehtml1.html at the console.

Since I know the structure of the HTML, I can extract some information with the code below:


Red []

txt: read http://helpin.red/samples/samplehtml1.html

parse txt [

       thru "today"

       2 thru ">"

       copy weather1 to "<"

       thru "tomorrow"

       2 thru ">"

       copy weather2 to "<"

       thru "week"

       2 thru ">"

       copy weather3 to "<"

]

print {According to the helpin.red website, the weather will be: }

print [] ; just adding an empty line

print ["Today:     "  weather1]

print ["Tomorrow:  "  weather2]

print ["Next week: "  #"^(tab)"  weather3] ; just showing the use of tab



According to the helpin.red website, the weather will be:


Today:      sunny

Tomorrow:   horrible

Next week:          really really horrible
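
Because the same three steps are repeated for each value, they can also be written once as a named rule. The sketch below is only an illustration: next-cell and value are hypothetical names, and the html string is an assumed, shortened stand-in for the downloaded page so that the example runs without a network connection. The real page may differ.

```red
Red []

; Assumed stand-in for the downloaded page; the real HTML may differ.
html: {<tr><td>weather today:</td><td>sunny</td></tr>
<tr><td>weather tomorrow:</td><td>horrible</td></tr>
<tr><td>weather next week:</td><td>really really horrible</td></tr>}

; Hypothetical helper rule: skip past two ">" characters, then
; copy the cell text up to the next tag into `value`.
next-cell: [2 thru ">" copy value to "<"]

parse html [
    thru "today"    next-cell (weather1: value)
    thru "tomorrow" next-cell (weather2: value)
    thru "week"     next-cell (weather3: value)
]

print ["Today:    " weather1]
print ["Tomorrow: " weather2]
print ["Next week:" weather3]
```

Each copy creates a new string, so the three weather words keep their own values even though they all come from the same value word.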



I will show how the parsing works when extracting the weather for "today" into the weather1 variable:


thru "today" ; skips all text up to and including the first "today" string.


border="1" cellpadding="2" cellspacing="2">

 <tbody>

   <tr>

     <td style="color: black;">weather today:</td>

     <td style="color: black;">sunny</td>

   </tr>

   <tr>


2 thru ">" ; skips text until just after the character ">", and does this 2 times.


border="1" cellpadding="2" cellspacing="2">

 <tbody>

   <tr>

     <td style="color: black;">weather today:</td>        ; 1

     <td style="color: black;">sunny</td>                 ; 2

   </tr>

   <tr>


copy weather1 to "<" ; copies into weather1 everything it finds up to (but not including) the next "<".


border="1" cellpadding="2" cellspacing="2">

 <tbody>

   <tr>

     <td style="color: black;">weather today:</td>

     <td style="color: black;">sunny</td>                  ; ==> weather1

   </tr>

   <tr>
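
The three steps above can also be observed directly by marking the input position after each one. This is a minimal sketch under assumptions: the html string is a two-line excerpt modelled on the fragment shown above, and p1 and p2 are hypothetical marker names.

```red
Red []

; Assumed excerpt, modelled on the fragment shown above.
html: {<td style="color: black;">weather today:</td>
<td style="color: black;">sunny</td>}

parse html [
    thru "today" p1:       ; p1 marks the input just after "today"
    2 thru ">"   p2:       ; p2 marks the input just after the second ">"
    copy weather1 to "<"   ; copy up to (not including) the next "<"
]

print mold copy/part p1 6  ; the first characters left after step 1
print mold copy/part p2 5  ; the first characters left after step 2
print weather1
```

After the first step the remaining input starts with the rest of the first cell's closing tag. After the second step it starts right at the text that is then copied into weather1.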


