Reddit_traitor 2 points (+2|-0)

Plant_Boy [S] 0 points (+0|-0)

Ta, I'll have a look there and see what I can dig up!

Reddit_traitor 0 points (+0|-0)

I figured the wiki would give you a good start.

RustyEquipment 2 points (+2|-0)

DISCLAIMER: Not an expert, kinda rusty at HTML.

HTML is a markup language that, when written out, uses tags: <html> </html> <title> <body>. It looks a lot like XML (both descend from SGML), but it isn't actually XML-based.

To grab the title (or any other element), you need to know which tag holds it; then you should be able to parse it out. As for how to do that... I'm not an expert, and that part takes a bit more understanding, I think. You would be looking for some sort of parsing function to grab the <title> tag. An HTML parser is the safer bet, since a strict XML parser often chokes on real-world HTML....
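A small illustration of the XML-vs-HTML point, using Python's standard-library xml.etree.ElementTree (the helper name parses_as_xml is made up for this example): markup where every tag is closed parses fine, but ordinary valid HTML with an unclosed void tag such as <meta> is rejected by a strict XML parser.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(markup):
    """Hypothetical helper: True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Every tag closed (XHTML style): an XML parser copes and can find the title.
xhtml = "<html><head><title>Hello</title></head><body></body></html>"
print(parses_as_xml(xhtml))                           # True
print(ET.fromstring(xhtml).find("head/title").text)   # Hello

# Valid HTML, but <meta> has no closing tag, so it is not well-formed XML.
html = '<html><head><meta charset="utf-8"><title>Hello</title></head><body></body></html>'
print(parses_as_xml(html))                            # False
```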

corrections?

Plant_Boy [S] 1 point (+1|-0)

Gives me a direction to start looking!

infamousEB 1 point (+1|-0)

Check out BeautifulSoup
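For reference, a minimal Beautiful Soup sketch (needs the third-party package, installed with `pip install beautifulsoup4`; the function name title_with_bs4 and the sample markup are illustrative only). It uses Python's built-in "html.parser" backend, so no extra parser library is required.

```python
# Third-party dependency: pip install beautifulsoup4
from bs4 import BeautifulSoup

def title_with_bs4(markup):
    """Return the first <title>'s text with whitespace collapsed, or None if absent."""
    soup = BeautifulSoup(markup, "html.parser")  # stdlib parser backend
    if soup.title is None:                       # soup.title finds the first <title> tag
        return None
    # get_text() decodes entities such as &amp;; split/join collapses line breaks
    return " ".join(soup.title.get_text().split())

sample = "<html><head><title>\n  Ham &amp; Eggs </title></head><body>...</body></html>"
print(title_with_bs4(sample))  # Ham & Eggs
```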

Plant_Boy [S] 0 points (+0|-0)

Someone else mentioned it too, and I think it has the features I need!

TheOmniscientOne 1 point (+1|-0)

https://stackoverflow.com/questions/4348912/get-title-of-website-via-link


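<?php

// Fetch a page and return the text of its first <title>, or null if there isn't one.
function get_title($url) {
  $str = file_get_contents($url);              // returns false on failure
  if ($str === false || strlen($str) === 0) {
    return null;
  }
  $str = trim(preg_replace('/\s+/', ' ', $str)); // collapse line breaks inside <title>
  // (.*?) is non-greedy, so it stops at the first </title>; [^>]* allows attributes; i = ignore case
  if (preg_match('#<title[^>]*>(.*?)</title>#i', $str, $title)) {
    return html_entity_decode(trim($title[1]), ENT_QUOTES | ENT_HTML5, 'UTF-8');
  }
  return null;                                 // no <title> found
}

// Example:
echo get_title("http://f4ct.co/");

?>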
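Since you're working in Python, here is a rough translation of the same regex idea (title_from_html is a made-up name). It only parses a string you already have; the commented lines show one way to fetch the page with urllib. A regex is fine for a quick script, but unusual markup can fool it, which is why a real parser is the more robust option.

```python
import re
import html

# Non-greedy match of the first <title>...</title>; tolerates attributes, any case, newlines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def title_from_html(markup):
    """Hypothetical helper: text of the first <title> element, or None if there isn't one."""
    match = TITLE_RE.search(markup)
    if match is None:
        return None
    # Collapse runs of whitespace (including line breaks), then decode entities like &amp;
    return html.unescape(" ".join(match.group(1).split()))

# Fetching is kept separate so the parsing can be tried offline, e.g.:
# from urllib.request import urlopen
# with urlopen("http://f4ct.co/") as resp:
#     print(title_from_html(resp.read().decode("utf-8", errors="replace")))
print(title_from_html("<html><head><TITLE>\n  Ham &amp; Eggs\n</TITLE></head></html>"))  # Ham & Eggs
```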


Plant_Boy [S] 0 points (+0|-0)

PHP may be something I have to look into more.

I'm attempting to learn programming in my spare time, and Python has been my main language so far.

TheOmniscientOne 0 points (+0|-0) (edited)

Stick with Python over PHP. I only use PHP because, when I developed some of my first big apps, it had more modules and better portability. Python is better.

TheOmniscientOne 0 points (+0|-0)

I would try a method that only grabbed the page's <head> section (where <title> lives), but it would be difficult to implement reliably across all the various server types. What I mean is: if you could grab just the head without the rest of the document, that would be great. Some coders stream the response into a buffer and close the connection as soon as they see the </head> tag, but I think that would be rude [and likely to get your bot blacklisted].
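For completeness, a sketch of the "stop reading at </head>" approach described above, using only the standard library (read_head_section and fetch_head_section are made-up names, and the 64 KB cap is an arbitrary assumption). Given the caveat about site operators, treat it as an illustration rather than a recommendation. The parsing half takes any iterable of byte chunks, so it can be tried without a network.

```python
from urllib.request import urlopen

def read_head_section(chunks, limit=64 * 1024):
    """Accumulate byte chunks until '</head>' (any case) appears or `limit` bytes are read."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        # Search the whole buffer so a tag split across two chunks is still found.
        end = buf.lower().find(b"</head>")
        if end != -1:
            return buf[: end + len(b"</head>")]
        if len(buf) >= limit:
            break
    return buf

def fetch_head_section(url, chunk_size=1024):
    """Hypothetical helper: stream a page and stop reading once the <head> is complete."""
    with urlopen(url) as resp:  # leaving the with-block closes the connection early
        # iter(callable, sentinel) keeps calling resp.read() until it returns b"" (EOF)
        return read_head_section(iter(lambda: resp.read(chunk_size), b""))
```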

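Here's some background on HTTP headers (these are separate from the HTML <head> element; a HEAD request returns only these headers, not the page's <title>):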

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers
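To see what HTTP headers alone give you, here is a small standard-library sketch that sends a HEAD request (build_head_request and fetch_headers are hypothetical names). The result contains entries such as Content-Type, but no page title, because the <title> lives in the HTML body.

```python
from urllib.request import Request, urlopen

def build_head_request(url):
    # method="HEAD" asks the server for the status line and headers only, no body
    return Request(url, method="HEAD")

def fetch_headers(url):
    """Return response headers as a dict (repeated names such as Set-Cookie collapse to one)."""
    with urlopen(build_head_request(url)) as resp:
        return dict(resp.headers.items())
```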

Veridic 1 point (+1|-0)

Beautiful Soup, Python.

Plant_Boy [S] 0 points (+0|-0)

This looks like something I'd like to bodge into my code! Thanks!

psimonster 0 points (+0|-0)

For general HTML parsing with Python, see the standard-library html.parser module: https://docs.python.org/3/library/html.parser.html
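A minimal sketch of that module applied to this thread's problem (TitleParser and get_title are illustrative names, not part of the library): subclass HTMLParser, set a flag on the <title> start tag, collect text until the matching end tag, and ignore any later <title> (for example one inside an inline SVG).

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title_text(self):
        # Collapse whitespace; an empty or missing title gives None.
        text = " ".join("".join(self._parts).split())
        return text or None

def get_title(markup):
    parser = TitleParser()
    parser.feed(markup)
    parser.close()
    return parser.title_text

page = "<html><head><TITLE>Ham &amp;\n Eggs</TITLE></head><body><svg><title>icon</title></svg></body></html>"
print(get_title(page))  # Ham & Eggs
```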