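Python BeautifulSoup: getting and setting HTML tag content with string
The string property reads a tag's content and can also be assigned to replace it.
The comment on string in the bs4 source reads: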
If this element has a single string child, return
value is that string. If this element has one child tag,
return value is the 'string' attribute of the child tag,
recursively. If this element is itself a string, has no
children, or has more than one child, return value is None.
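The rules in that comment can be checked on a small document. This is a minimal sketch rather than anything from the library's own docs; the markup and variable names are made up for illustration, and it assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup chosen to hit each rule in the comment above.
soup = BeautifulSoup(
    "<b>text</b><i><u>nested</u></i><p>a<br/>b</p><em></em>",
    "html.parser",
)

single_string = soup.b.string      # one string child: that string
nested = soup.i.string             # one child tag: resolved recursively via <u>
several_children = soup.p.string   # more than one child: None
no_children = soup.em.string       # no children: None

print(single_string, nested, several_children, no_children)
```

The examples that follow walk through the same cases one at a time.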
Example: returns a non-None value (single string child)
from bs4 import BeautifulSoup
html_content = '''
<div id="content">
测试01
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
print(content_div.string)
Output (the returned string keeps the newlines that surround the text inside the div):
测试01
Example: returns a non-None value (single child tag)
from bs4 import BeautifulSoup
html_content = '''
<div id="content"><p>测试01</p></div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
print(content_div.string)
Output:
测试01
Example: returns None (whitespace around the child tag)
from bs4 import BeautifulSoup
html_content = '''
<div id="content">
<p>测试01</p>
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
print(content_div.string)
Output:
None
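None is returned here because <div id="content"> contains more than <p>测试01</p>: the newlines before and after the <p> tag are parsed as whitespace-only string children, so the div has more than one child.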
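One way to see those extra children is to print the tag's contents list. The sketch below reuses the markup from this example; get_text(strip=True) is shown as one possible way to get the text anyway, not as the only option.

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content">
<p>测试01</p>
</div>
'''.strip()
soup = BeautifulSoup(html_content, "html.parser")
content_div = soup.select_one("#content")

# The newline strings count as children alongside the <p> tag.
children = content_div.contents
print(children)

# get_text(strip=True) strips each string and skips the empty ones.
text = content_div.get_text(strip=True)
print(text)
```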
Example: returns None (more than one child)
from bs4 import BeautifulSoup
html_content = '''
<div id="content">x
<p>测试01</p>
<span>测试02</span>
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
print(content_div.string)
Output:
None
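When a tag has several children, its text can still be collected piece by piece. The following is a sketch using stripped_strings and get_text on the markup from this example; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content">x
<p>测试01</p>
<span>测试02</span>
</div>
'''.strip()
soup = BeautifulSoup(html_content, "html.parser")
content_div = soup.select_one("#content")

# stripped_strings yields every descendant string, stripped,
# and skips strings that are whitespace only.
parts = list(content_div.stripped_strings)
print(parts)

# get_text joins the same pieces with the given separator.
joined = content_div.get_text(" ", strip=True)
print(joined)
```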
Example: setting content (the assignment replaces all existing children of the tag)
from bs4 import BeautifulSoup
html_content = '''
<div id="content">
<p>测试01</p>
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
content_div.string = 'xx'
print(soup)
Output:
<div id="content">xx</div>
<div>测试03</div>
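Two details of assignment are worth checking: the old children are discarded, and the new value is stored as plain text instead of being parsed as markup. The sketch below assumes the default output formatter, which escapes angle brackets; the assigned value is made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div id="content">\n<p>测试01</p>\n</div>', "html.parser")
content_div = soup.select_one("#content")

children_before = len(content_div.contents)  # newline, <p>, newline

# The value is inserted as a string child, not parsed into a <b> tag.
content_div.string = "<b>xx</b>"
children_after = len(content_div.contents)   # a single string child

rendered = str(content_div)
print(rendered)  # the angle brackets appear as &lt; and &gt;
```

To insert real tags instead of text, the new markup has to be parsed or built as tag objects first and then added to the div.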