Fetch a Table From a URL in Google Sheets When IMPORTHTML Fails (WAF Blocking)
I maintain a personal Google Sheet dashboard to track the latest releases of PHP, Drupal, Drush, and various other open-source solutions in a single place. For most tracking, the fantastic site Endoflife.date works flawlessly.
However, some sites utilize aggressive Web Application Firewalls (WAFs) that block automated scraping requests. When this happens, Google Sheets’ native =IMPORTHTML() function fails and throws a generic error.
The culprits are usually strict security rules blocking the default user-agent string sent by Google’s servers. To bypass these simpler WAF restrictions, I wrote a custom Google Apps Script that mimics IMPORTHTML but allows you to modify the user-agent header to look like a standard web browser request.
Feel free to use the Apps Script below if you are hitting a similar wall!
/**
* Fetches and parses an HTML table from a URL, mimicking the IMPORTHTML function.
* This version uses UrlFetchApp to include a User-Agent header, which helps
* bypass restrictions on standard Google Sheets imports.
*
* @param {string} url The URL of the page to fetch.
* @param {string} query The type of element to look for. Currently, only "table" is supported.
* @param {number} index The 1-based index of the table to retrieve from the page.
* @return {Array<Array<string>>} A 2D array of the table data.
* @customfunction
*/
function IMPORTHTML_WITH_USER_AGENT(url, query, index) {
// Check if the required arguments are provided.
if (!url || !query || !index) {
return "Error: Missing required arguments. Usage: IMPORTHTML_WITH_USER_AGENT(url, query, index)";
}
// This script currently only supports fetching tables.
if (query.toLowerCase() !== "table") {
return `Error: Query type "${query}" is not supported. Only "table" is available.`;
}
// Adjust index to be zero-based for array access.
const tableIndex = index - 1;
if (tableIndex < 0) {
return "Error: Index must be greater than 0.";
}
try {
// Set options for the fetch request, including a standard browser User-Agent.
const options = {
'method': 'get',
'headers': {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.4472.124 Safari/537.36'
},
'muteHttpExceptions': true // Prevent script from stopping on HTTP errors (e.g., 404, 503).
};
// Fetch the content of the URL.
const response = UrlFetchApp.fetch(url, options);
const responseCode = response.getResponseCode();
const content = response.getContentText();
// Handle non-successful HTTP responses.
if (responseCode !== 200) {
return `Error: Request failed. URL returned status code ${responseCode}.`;
}
// Use regular expressions to find all tables in the HTML content.
const tableRegex = /<table[\s\S]*?<\/table>/is;
const tables = content.match(new RegExp(tableRegex.source, 'gis'));
if (!tables || tables.length === 0) {
return "Error: No tables found on the page.";
}
if (tableIndex >= tables.length) {
return `Error: Table with index ${index} not found. Only ${tables.length} tables were found.`;
}
const targetTableHtml = tables[tableIndex];
const result = [];
// Find all table rows (<tr>) within the selected table.
const rowRegex = /<tr[\s\S]*?<\/tr>/is;
const rows = targetTableHtml.match(new RegExp(rowRegex.source, 'gis'));
if (rows) {
rows.forEach(rowHtml => {
const rowData = [];
// Find all table cells (<th> for headers, <td> for data) within each row.
const cellRegex = /<(t[dh])[\s\S]*?>([\s\S]*?)<\/\1>/is;
const cells = rowHtml.match(new RegExp(cellRegex.source, 'gis'));
if (cells) {
cells.forEach(cellHtml => {
// Extract the inner content of the cell.
const cellMatch = cellHtml.match(cellRegex);
let cellContent = cellMatch ? cellMatch[2] : '';
// Clean the content by removing any remaining HTML tags and trimming whitespace.
cellContent = cellContent.replace(/<[^>]*>/g, '').trim();
rowData.push(cellContent);
});
}
result.push(rowData);
});
}
// Return the 2D array, which will populate the cells in the spreadsheet.
return result;
} catch (e) {
// Return a descriptive error message if the script fails.
return `Error: An unexpected error occurred. ${e.toString()}`;
}
}
How to Set It Up in Google Sheets
Deploying custom scripts in Google Sheets is straightforward. Follow these steps to get started:
- Open your target spreadsheet in Google Sheets.
- In the top menu, navigate to Extensions > Apps Script.
- Clear out any default boilerplate code inside the editor block.
- Copy the complete script provided above and paste it into the editor.
- Click the Save project icon (the floppy disk symbol) at the top or press
Ctrl + S(Cmd + Son Mac). - Return to your spreadsheet. You can now use the new custom function just like the native alternative:
Excel
=IMPORTHTML_WITH_USER_AGENT("https://example.com/data-page", "table", 1)
Technical Disclaimer: This script utilizes Regular Expressions (RegEx) to process and extract the DOM layout. While this lightweight approach works flawlessly for traditional data tables, heavily structured sites using deeply nested modern tables or dynamic client-side rendering may require more granular adjustments.

