What is CGI?
The Common Gateway Interface, or CGI, is a set of standards that define how information is exchanged between the web server and a custom script.
The CGI specification was originally developed at the NCSA, which defines CGI as follows:
The Common Gateway Interface, or CGI, is a standard for external gateway programs to interface with information servers such as HTTP servers.
The current version is CGI/1.1, which is documented in RFC 3875. A CGI/1.2 revision was started but never finalized.
Web Browsing
To understand the concept of CGI, let's see what happens when we click a hyperlink to browse a particular web page or URL.
Your browser contacts the HTTP web server and requests the URL, i.e. the filename.
The web server parses the URL and looks for the file. If it finds the file, it sends it back to the browser; otherwise it sends an error message indicating that you have requested a file that does not exist.
The web browser takes the response from the web server and displays either the received file or the error message.
However, it is possible to set up the HTTP server so that whenever a file in a certain directory is requested, that file is not sent back; instead it is executed as a program, and whatever that program outputs is sent back for your browser to display. This mechanism is called the Common Gateway Interface, or CGI, and the programs are called CGI scripts. A CGI program can be a Perl script, a shell script, a C or C++ program, and so on.
CGI Architecture Diagram
Web Server Support & Configuration
Before you proceed with CGI programming, make sure that your web server supports CGI and is configured to handle CGI programs. All the CGI programs to be executed by the HTTP server are kept in a pre-configured directory. This directory is called the CGI directory and by convention it is named /cgi-bin. By convention, Perl CGI files have the extension .cgi.
First CGI Program
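The original listing is not present in this copy, so the following is only a minimal sketch of what a first CGI script could look like. The helper name hello_response and the page text are assumptions made for illustration. It builds the header line, the blank line that ends the header block, and a small HTML body.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the whole response: header line, blank line, then the HTML body.
# hello_response is a hypothetical helper name, not part of any library.
sub hello_response {
    my $body = join "\n",
        '<html>',
        '<head><title>Hello World</title></head>',
        '<body><h2>Hello World! This is my first CGI program</h2></body>',
        '</html>',
        '';
    # "\r\n\r\n" ends the header block; everything after it is page content.
    return "Content-type:text/html\r\n\r\n" . $body;
}

print hello_response();
```

Saved with a .cgi extension in the CGI directory and made executable, a script like this should be served as a web page rather than downloaded as a file.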
Output
HTTP Header
The line Content-type:text/html\r\n\r\n is part of the HTTP header, which is sent to the browser so that it can understand the content. Every HTTP header has the following form:
HTTP Field Name: Field Content
For example:
Content-type:text/html\r\n\r\n
There are a few other important HTTP headers which you will use frequently in your CGI programming.
S.No. | Header | Description |
---|---|---|
1 | Content-type: String | A MIME string defining the format of the file being returned. Example: Content-type:text/html |
2 | Expires: Date String | The date the information becomes invalid. The browser should use this to decide when a page needs to be refreshed. A valid date string has the format Thu, 01 Jan 1998 12:00:00 GMT. |
3 | Location: URL String | The URL that should be returned instead of the URL requested. You can use this field to redirect a request to any file. |
4 | Last-modified: String | The date of last modification of the resource. |
5 | Content-length: String | The length, in bytes, of the data being returned. The browser uses this value to report the estimated download time for a file. |
6 | Set-Cookie: String | Sets the cookie passed through the string. |
CGI Environment Variables
Every CGI program has access to a set of environment variables supplied by the web server, such as QUERY_STRING, REQUEST_METHOD, CONTENT_LENGTH and HTTP_COOKIE. These variables play an important role when writing any CGI program.
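As a rough sketch of how a script can inspect these variables, the program below prints every entry of Perl's built-in %ENV hash as an HTML line. The helper names escape_html and env_listing are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Return one "<b>NAME</b>: value<br>" line per variable, sorted by name.
sub env_listing {
    my (%env) = @_;
    return join '',
        map { '<b>' . escape_html($_) . '</b>: ' . escape_html($env{$_}) . "<br>\n" }
        sort keys %env;
}

print "Content-type:text/html\r\n\r\n";
print env_listing(%ENV);
```

The values are escaped because some of them, such as QUERY_STRING, come straight from the client.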
Output
How To Raise a 'File Download' Dialog Box?
Sometimes you may want to offer a link that, when clicked, pops up a 'File Download' dialog box instead of displaying the actual content. This is very easy and is achieved through an HTTP header.
This HTTP header is different from the header mentioned in the previous section.
For example, if you want to make a file named FileName downloadable from a given link, the syntax is as follows.
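The listing for this step is missing from this copy. A minimal sketch, assuming the usual approach of an application/octet-stream content type together with a Content-Disposition: attachment header, might look like this. The helper names download_headers and send_file are hypothetical, and FileName is the placeholder name used in the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header block that asks the browser to save the response under $filename.
sub download_headers {
    my ($filename) = @_;
    return "Content-Type:application/octet-stream\r\n"
         . "Content-Disposition: attachment; filename=\"$filename\"\r\n\r\n";
}

# Send the headers, then copy the file to the output handle in binary mode.
sub send_file {
    my ($path, $out) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    binmode $in;
    binmode $out;
    my ($name) = $path =~ m{([^/]+)\z};    # strip any directory part
    print {$out} download_headers($name);
    my $buffer;
    while (read($in, $buffer, 4096)) {
        print {$out} $buffer;
    }
    close $in;
    return;
}

my $file = 'FileName';                      # placeholder name from the text
send_file($file, \*STDOUT) if -r $file;
```

The file is copied in fixed-size chunks so that a large download does not have to fit in memory.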
GET and POST Methods
You must have come across many situations where you need to pass some information from your browser to the web server and ultimately to your CGI program. Most frequently the browser uses one of two methods to pass this information to the web server: the GET method and the POST method.
Passing Information using GET method
The GET method sends the encoded user information appended to the page request. The page and the encoded information are separated by the ? character as follows:
http://www.test.com/cgi-bin/hello.cgi?key1=value1&key2=value2
The GET method is the default method for passing information from the browser to the web server, and it produces a long string that appears in your browser's location box. Never use the GET method if you have a password or other sensitive information to pass to the server. The GET method also has a size limitation: the traditional guideline is to keep the request string under 1024 characters, and the actual limit depends on the browser and the server.
This information is passed as the query string of the URL and is accessible in your CGI program through the QUERY_STRING environment variable.
You can pass information by simply concatenating key and value pairs along with any URL or you can use HTML <FORM> tags to pass information using GET method.
Simple URL Example: GET Method
Here is a simple URL which will pass two values to hello_get.cgi program using GET method.
Below is hello_get.cgi script to handle input given by web browser.
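The script itself is missing from this copy. The following is a sketch of how such a script could decode QUERY_STRING by hand. The field names first_name and last_name and the helper names are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a hex-encoded byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split "key1=value1&key2=value2" into a hash of decoded names and values.
sub parse_query {
    my ($query) = @_;
    my %form;
    for my $pair (split /&/, $query // '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $form{ url_decode($name) } = url_decode($value // '');
    }
    return %form;
}

# Escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my %form  = parse_query($ENV{QUERY_STRING});
my $first = $form{first_name} // '';    # hypothetical field name
my $last  = $form{last_name}  // '';    # hypothetical field name

print "Content-type:text/html\r\n\r\n";
print "<html><body><h2>Hello ", escape_html("$first $last"), "</h2></body></html>\n";
```

With these assumed field names, a request ending in ?first_name=ZARA&last_name=ALI would greet ZARA ALI.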
Output
Simple FORM Example: GET Method
Here is a simple example which passes two values using HTML FORM and submit button. We are going to use same CGI script hello_get.cgi to handle this input.
Passing Information using POST method
A generally more reliable method of passing information to a CGI program is the POST method. It packages the information in exactly the same way as the GET method, but instead of sending it as a text string after a ? in the URL, it sends it as a separate message in the request body. This message reaches the CGI script on standard input, and its length in bytes is given by the CONTENT_LENGTH environment variable.
Below is hello_post.cgi script to handle input given by web browser. This script will handle GET as well as POST method.
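The script is missing from this copy. A minimal sketch, assuming the same hypothetical first_name and last_name fields, could pick the data source from REQUEST_METHOD: standard input for POST, QUERY_STRING otherwise. The helper names read_form_data and parse_form are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the raw URL-encoded form data for the request described by $env.
sub read_form_data {
    my ($env, $in) = @_;
    my $method = uc($env->{REQUEST_METHOD} // 'GET');
    if ($method eq 'POST') {
        my $length = $env->{CONTENT_LENGTH} // 0;
        my $buffer = '';
        # POST data arrives on standard input; read exactly CONTENT_LENGTH bytes.
        read($in, $buffer, $length) if $length > 0;
        return $buffer;
    }
    return $env->{QUERY_STRING} // '';    # GET data arrives in the URL
}

# Decode "+" and %XX sequences in one component.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Turn "a=1&b=2" into a hash of decoded names and values.
sub parse_form {
    my ($data) = @_;
    my %form;
    for my $pair (split /&/, $data // '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $form{ url_decode($name) } = url_decode($value // '');
    }
    return %form;
}

sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my %form = parse_form(read_form_data(\%ENV, \*STDIN));
my $name = join ' ', map { $form{$_} // '' } qw(first_name last_name);

print "Content-type:text/html\r\n\r\n";
print "<html><body><h2>Hello ", escape_html($name), "</h2></body></html>\n";
```

Passing the environment and the input handle as arguments keeps the parsing logic easy to exercise outside a web server.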
Let us again take the same example as above, which passes two values using an HTML FORM and a submit button. We are going to use the CGI script hello_post.cgi to handle this input.
Passing Checkbox Data to CGI Program
Checkboxes are used when more than one option is required to be selected.
Here is example HTML code for a form with two checkboxes
Below is checkbox.cgi script to handle input given by web browser for checkboxes.
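The script is missing from this copy. One point worth showing in a sketch is that a browser sends a checkbox field only when the box is ticked, so the script has to treat a missing key as "off". The field names maths and physics and the helper names below are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode "+" and %XX sequences in one component.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Turn "a=1&b=2" into a hash of decoded names and values.
sub parse_form {
    my ($data) = @_;
    my %form;
    for my $pair (split /&/, $data // '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $form{ url_decode($name) } = url_decode($value // '');
    }
    return %form;
}

# Unchecked boxes are simply absent from the submitted data.
sub checkbox_flag {
    my ($form, $name) = @_;
    return exists $form->{$name} ? 'ON' : 'OFF';
}

my %form = parse_form($ENV{QUERY_STRING});

print "Content-type:text/html\r\n\r\n";
print "<html><body>\n";
# maths and physics are hypothetical checkbox names.
print "<h2>CheckBox Maths is : ",   checkbox_flag(\%form, 'maths'),   "</h2>\n";
print "<h2>CheckBox Physics is : ", checkbox_flag(\%form, 'physics'), "</h2>\n";
print "</body></html>\n";
```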
Passing Radio Button Data to CGI Program
Radio Buttons are used when only one option is required to be selected.
Here is example HTML code for a form with two radio buttons −
Below is radiobutton.cgi script to handle input given by web browser for radio button.
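The script is missing from this copy. Because a radio group submits exactly one of a few known values, a sketch can extract that single field and accept it only if it is on a list of allowed values. The field name subject, the values maths and physics, and the helper radio_choice are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the submitted value of $name if it is one of @allowed, else undef.
sub radio_choice {
    my ($query, $name, @allowed) = @_;
    # Match "name=value" at the start of the string or right after an "&".
    my ($value) = ($query // '') =~ /(?:^|&)\Q$name\E=([^&]*)/;
    return undef unless defined $value;
    my $ok = grep { $_ eq $value } @allowed;
    return $ok ? $value : undef;
}

# subject, maths and physics are hypothetical names for this sketch.
my $subject = radio_choice($ENV{QUERY_STRING}, 'subject', qw(maths physics));
$subject //= 'Not set';

print "Content-type:text/html\r\n\r\n";
print "<html><body><h2>Selected Subject is $subject</h2></body></html>\n";
```

Since only values from the allowed list are ever printed, the output needs no further escaping.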
Passing Text Area Data to CGI Program
The TEXTAREA element is used when multiline text has to be passed to the CGI program.
Here is example HTML code for a form with a TEXTAREA box −
Below is textarea.cgi script to handle input given by web browser.
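The script is missing from this copy. Multiline text arrives URL-encoded, with line breaks sent as %0D%0A, so a sketch has to decode the field, normalise the line endings, and escape the text before echoing it. The field name textcontent and the helper names are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract one field from URL-encoded data and return its decoded text.
sub textarea_text {
    my ($data, $name) = @_;
    my ($raw) = ($data // '') =~ /(?:^|&)\Q$name\E=([^&]*)/;
    return '' unless defined $raw;
    $raw =~ tr/+/ /;
    $raw =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    $raw =~ s/\r\n?/\n/g;    # line breaks are submitted as CR LF
    return $raw;
}

# Escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Read the request body for POST, or the query string otherwise.
my $data = '';
if (($ENV{REQUEST_METHOD} // '') eq 'POST') {
    read(STDIN, $data, $ENV{CONTENT_LENGTH} // 0);
}
else {
    $data = $ENV{QUERY_STRING} // '';
}

my $text = textarea_text($data, 'textcontent');    # hypothetical field name

print "Content-type:text/html\r\n\r\n";
print "<html><body><h2>Entered Text Content is</h2>\n";
print "<pre>", escape_html($text), "</pre>\n";
print "</body></html>\n";
```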
Passing Drop Down Box Data to CGI Program
Drop Down Box is used when we have many options available but only one or two will be selected.
Here is example HTML code for a form with one drop down box
Below is dropdown.cgi script to handle input given by web browser.
Using Cookies in CGI
HTTP is a stateless protocol, but a commercial website needs to maintain session information across different pages. For example, a user registration may end only after the user has completed many pages. How, then, do you maintain the user's session information across all those web pages?
In many situations, using cookies is the most efficient method of remembering and tracking preferences, purchases, commissions, and other information required for better visitor experience or site statistics.
How It Works
Your server sends some data to the visitor's browser in the form of a cookie. The browser may accept the cookie. If it does, it is stored as a plain text record on the visitor's hard drive. Now, when the visitor arrives at another page on your site, the cookie is available for retrieval. Once retrieved, your server knows/remembers what was stored.
Cookies are a plain text data record of 5 variable-length fields −
Expires − The date the cookie will expire. If this is blank, the cookie will expire when the visitor quits the browser.
Domain − The domain name of your site.
Path − The path to the directory or web page that set the cookie. This may be blank if you want to retrieve the cookie from any directory or page.
Secure − If this field contains the word 'secure' then the cookie may only be retrieved with a secure server. If this field is blank, no such restriction exists.
Name=Value − Cookies are set and retrieved in the form of key and value pairs.
Setting up Cookies
It is very easy to send cookies to the browser. Cookies are sent along with the HTTP header. Assuming you want to set UserID and Password as cookies, it is done as follows:
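The listing is missing from this copy. A minimal sketch could emit one Set-Cookie line per cookie before the Content-type line. The helper name set_cookie_header, the cookie values, the expiry date, the domain www.example.com and the path /perl are all placeholders chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one Set-Cookie header line; the attributes are optional.
sub set_cookie_header {
    my ($name, $value, %attr) = @_;
    my $line = "Set-Cookie:$name=$value";
    for my $key (qw(Expires Domain Path)) {
        $line .= "; $key=$attr{$key}" if defined $attr{$key};
    }
    $line .= "; Secure" if $attr{Secure};
    return "$line\r\n";
}

# All values below are placeholders for this sketch.
print set_cookie_header('UserID', 'XYZ');
print set_cookie_header(
    'Password', 'XYZ123',
    Expires => 'Tue, 01 Jan 2030 00:00:00 GMT',
    Domain  => 'www.example.com',
    Path    => '/perl',
);

# The cookie lines belong to the header block, so they come before this line.
print "Content-type:text/html\r\n\r\n";
print "<html><body><p>Cookies sent.</p></body></html>\n";
```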
Cookies are set with the Set-Cookie HTTP header.
Setting cookie attributes such as Expires, Domain, and Path is optional. Note that cookies are set before sending the magic line Content-type:text/html\r\n\r\n.
Retrieving Cookies
It is very easy to retrieve all the cookies that have been set. Cookies are stored in the CGI environment variable HTTP_COOKIE and have the following form:
key1=value1; key2=value2; key3=value3
Here is an example of how to retrieve cookies.
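The listing is missing from this copy, so this is only a sketch of one way to do it: split HTTP_COOKIE on semicolons, then split each pair on the first equals sign. The helper name parse_cookies and the cookie names UserID and Password are assumptions carried over from the text above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn "key1=value1; key2=value2" into a hash of cookie names and values.
sub parse_cookies {
    my ($header) = @_;
    my %cookies;
    for my $pair (split /;\s*/, $header // '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name;
        $name =~ s/^\s+|\s+$//g;    # trim stray whitespace around the name
        next unless length $name;
        $cookies{$name} = $value // '';
    }
    return %cookies;
}

my %cookies = parse_cookies($ENV{HTTP_COOKIE});

# Plain text output, so the cookie values need no HTML escaping.
print "Content-type:text/plain\r\n\r\n";
print "User ID  = ", ($cookies{UserID}   // ''), "\n";
print "Password = ", ($cookies{Password} // ''), "\n";
```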
CGI Modules and Libraries
You will find many ready-made modules on the internet that provide functions you can use directly in your CGI programs. Two widely used ones are the CGI module (CGI.pm) and the Berkeley cgi-lib.pl library.
Reading a file
Once a FILEHANDLE is assigned a file, various operations like reading, writing and appending can be done. There are a number of different ways of reading a file:
- Using the filehandle operator
- Using the getc function
- Using the read function
The FileHandle Operator
The main method of reading information from an open filehandle is the <FILEHANDLE> operator. When the operator is used in list context, it returns a list of lines from the specified filehandle; in scalar context, it returns a single line.
The example below reads one line from the file and stores it in a scalar. Let the content of the file "GFG.txt" be as given below:
GeeksforGeeks
Hello Geek
Geek a revolution
Geeks are the best
Example: GFG.pl
If there was an error or the filehandle is at the end of the file, the operator returns undef.
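The GFG.pl listing is missing from this copy. As a sketch of the two contexts, the helpers below read either one line (scalar context) or every line (list context) from a file. The helper names first_line and all_lines are assumptions, and the script only prints something if a readable GFG.txt exists in the current directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalar context: <$fh> returns one line, or undef at end of file.
sub first_line {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line if defined $line;
    return $line;
}

# List context: <$fh> returns every remaining line at once.
sub all_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    chomp @lines;
    return @lines;
}

my $file = 'GFG.txt';
if (-r $file) {
    my $line = first_line($file);
    print "First line: ", (defined $line ? $line : '(empty file)'), "\n";
    print "Line count: ", scalar(all_lines($file)), "\n";
}
```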
read Function
The read function is used to read binary data from a file using a filehandle.
Syntax
read FILEHANDLE, SCALAR, LENGTH, OFFSET
read FILEHANDLE, SCALAR, LENGTH
Here, LENGTH represents the length of the data to be read, and the data is placed at the start of SCALAR if no OFFSET is specified. Otherwise, the data is placed after OFFSET bytes in SCALAR. On success, the function returns the number of bytes read, zero at end of file, or undef if there was an error.
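Both forms of the syntax can be sketched as follows. The helper names read_chunk and slurp_in_chunks are assumptions made for this example: the first reads a fixed number of bytes into the start of a scalar, and the second passes the current buffer length as OFFSET so that each chunk is appended. The script only prints something if a readable GFG.txt exists in the current directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three-argument form: the data is placed at the start of $buffer.
sub read_chunk {
    my ($path, $length) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode $fh;
    my $buffer = '';
    my $count = read($fh, $buffer, $length);
    die "Read failed: $!" unless defined $count;    # undef signals an error
    close $fh;
    return ($count, $buffer);
}

# Four-argument form: OFFSET = current length, so each chunk is appended.
sub slurp_in_chunks {
    my ($path, $chunk_size) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode $fh;
    my $buffer = '';
    while (1) {
        my $count = read($fh, $buffer, $chunk_size, length($buffer));
        die "Read failed: $!" unless defined $count;
        last if $count == 0;                        # zero means end of file
    }
    close $fh;
    return $buffer;
}

my $file = 'GFG.txt';
if (-r $file) {
    my ($count, $data) = read_chunk($file, 5);
    print "Read $count bytes: $data\n";
    print "Total bytes: ", length(slurp_in_chunks($file, 4)), "\n";
}
```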